Applications of data mining algorithms to analysis of medical data.
Independent thesis Advanced level (degree of Master (One Year)), Student thesis
Medical datasets have grown to enormous sizes. These data may contain valuable information that awaits extraction, encapsulated in patterns and regularities hidden in the data. Such knowledge may prove priceless in future medical decision making. The data analyzed here comes from the Polish National Breast Cancer Prevention Program, run in Poland in 2006. The first aim of this master's thesis is to evaluate the analytical data from the Program to see whether the domain can be subjected to data mining. The next step is to evaluate several data mining methods with respect to their applicability to the given data, to show which techniques are particularly usable for this dataset. Finally, the research aims at extracting some tangible medical knowledge from the set.
The research uses a data warehouse to store the data. The data is assessed via the ETL process. The performance of the data mining models is measured with lift charts and confusion (classification) matrices. The medical knowledge is extracted based on the indications of the majority of the models. The experiments are conducted in Microsoft SQL Server 2005.
The results of the analyses show that the Program did not deliver good-quality data. Many missing values and various discrepancies make it especially difficult to build good models and draw any medical conclusions. It is very hard to decide unequivocally which method is particularly suitable for the given data, so it is advisable to test a set of methods before applying them in real systems. The data mining models were not unanimous about patterns in the data. Thus the medical knowledge is not certain and requires verification by medical professionals. However, most of the models strongly associated the patient's age, tissue type, hormonal therapies and disease in the family with the malignancy of cancers.
The next step of the research is to present the findings to medical professionals for verification. In the future, the outcomes may provide a good basis for the development of a Medical Decision Support System.
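The two evaluation measures named in the abstract, the confusion (classification) matrix and the lift chart, can be illustrated with a minimal sketch. This is not the thesis's implementation, which relied on the tooling in Microsoft SQL Server 2005; the function names, the 0/1 label encoding (1 = malignant) and the idea of a per-case model score are assumptions made here purely for illustration.

```python
from typing import List, Tuple


def confusion_matrix(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class.

    Assumption for illustration: 1 = malignant, 0 = not malignant.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def cumulative_lift(y_true: List[int], scores: List[float], fraction: float) -> float:
    """Lift for the top `fraction` of cases ranked by model score.

    Lift = (positive rate among the selected cases) / (overall positive rate).
    A value above 1 means the model ranks positives better than random selection.
    """
    n = len(y_true)
    overall_rate = sum(y_true) / n
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive cases")
    k = max(1, round(n * fraction))  # number of top-ranked cases to select
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_positives = sum(label for _, label in ranked[:k])
    return (top_positives / k) / overall_rate
```

Evaluating `cumulative_lift` at several fractions (for example 0.1, 0.2, and so on up to 1.0) gives the points of one lift curve per model, and curves from different models can then be compared on the same held-out cases.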
Place, publisher, year, edition, pages
2007. 104 p.
medical data mining, medical data warehouse, medical data, breast cancer.
Computer Science; Software Engineering
Identifiers
URN: urn:nbn:se:bth-4253
Local ID: oai:bth.se:arkivexDE0F987698BB6DC9C125737600667AB7
OAI: oai:DiVA.org:bth-4253
DiVA: diva2:831582