Machine Learning Based Analysis of DNA Methylation Patterns in Pediatric Acute Leukemia
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
ORCID iD: 0000-0002-9615-5079
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Maskininlärningsbaserad analys av DNA-metyleringsmönster i pediatrisk akut lymfatisk leukemi (Swedish)
Abstract [en]

Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer in the Nordic countries. Recent evidence indicates that DNA methylation (DNAm) plays a central role in the development and progression of the disease.

DNAm profiles of a collection of ALL patient samples and a panel of non-leukemic reference samples were analyzed using the Infinium 450k methylation assay. State-of-the-art machine learning algorithms were used to search the large amounts of data produced for patterns predictive of future relapses, in vitro drug resistance, and cytogenetic subtypes, aiming at improving our understanding of the disease and ultimately improving treatment.

In paper I, the predictive modeling framework developed to perform the analyses of the DNAm datasets was presented. It focused on uncompromising statistical rigor and computational efficiency, while allowing a high level of modeling flexibility and usability. In paper II, the DNAm landscape of ALL was comprehensively characterized, revealing widespread aberrant methylation at diagnosis that was strongly influenced by cytogenetic subtype. The aberrantly methylated regions were enriched for genes repressed by polycomb group proteins, repressively marked histones in healthy cells, and genes associated with embryonic development. A consistent trend of hypermethylation at relapse was also discovered. In paper III, a tool for DNAm-based subtyping was presented, validated using blinded samples, and used to re-classify samples with incomplete phenotypic information. Using RNA-sequencing, previously undetected non-canonical aberrations were found in many re-classified samples. In paper IV, the relationship between DNAm and in vitro drug resistance was investigated, and predictive signatures were obtained for seven of the eight therapeutic drugs studied. Interpretation was challenging due to poor correlation between DNAm and gene expression, and was further complicated by the discovery that random subsets of the array can yield comparable classification accuracy. Paper V presented a novel Bayesian method for multivariate density estimation with variable bandwidths. Simulations showed performance comparable to the current state-of-the-art methods and an advantage on skewed distributions.

In conclusion, the studies characterize the information contained in the aberrant DNAm patterns of ALL and assess its predictive capabilities for future relapses, in vitro drug sensitivity and subtyping. They also present three publicly available tools for the scientific community to use.
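To make the idea of variable bandwidths in paper V concrete, the following is a minimal univariate sketch of an adaptive kernel density estimator using the square root law, in which each sample's bandwidth is scaled by the inverse square root of a pilot density estimate. It is not the Bayesian model averaging method of the thesis, which is multivariate; the function names, the sample values, and the global bandwidth `h` are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, data, h):
    """Fixed-bandwidth Gaussian kernel density estimate at x."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def local_bandwidths(data, h):
    """Square root law: h_i = h * sqrt(g / pilot(x_i)).

    pilot is a fixed-bandwidth estimate and g is its geometric mean
    over the sample, so sparse regions receive wider kernels.
    """
    pilot = [fixed_kde(xi, data, h) for xi in data]
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * math.sqrt(g / p) for p in pilot]


def adaptive_kde(x, data, bandwidths):
    """Kernel density estimate at x with one bandwidth per sample."""
    total = sum(gaussian_kernel((x - xi) / hi) / hi
                for xi, hi in zip(data, bandwidths))
    return total / len(data)


# Hypothetical right-skewed sample: dense near zero, with a long tail.
data = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.7, 1.0, 1.8, 4.0]
h = 0.3  # assumed global bandwidth, not tuned
bandwidths = local_bandwidths(data, h)

# Riemann sum over a wide grid to check that the estimate integrates to about 1.
step = 0.01
area = sum(adaptive_kde(-10.0 + i * step, data, bandwidths) * step
           for i in range(2500))
print(f"widest bandwidth: {max(bandwidths):.3f}, area: {area:.4f}")
```

The isolated point in the tail receives the widest kernel, which is the behaviour that can help on skewed distributions.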

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015. 68 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 1651-6206 ; 1069
National Category
Bioinformatics (Computational Biology) Hematology Cancer and Oncology
Identifiers
URN: urn:nbn:se:uu:diva-242544
ISBN: 978-91-554-9151-2 (print)
OAI: oai:DiVA.org:uu-242544
DiVA: diva2:784373
Public defence
2015-03-13, Auditorium minus, Museum Gustavianum, Akademigatan 3, Uppsala, 14:00 (English)
Funder
Swedish Foundation for Strategic Research, RBc08-008
Available from: 2015-02-19 Created: 2015-01-27 Last updated: 2015-03-27. Bibliographically approved
List of papers
1. Developer Friendly and Computationally Efficient Predictive Modeling without Information Leakage: The emil Package for R
(English) In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660. Article in journal (Other academic) Submitted
Abstract [en]

Machine learning-based solutions to predictive modeling problems (classification, regression, or survival analysis) typically involve a number of steps, beginning with data pre-processing and ending with performance evaluation. Many R packages provide tools for the individual steps, but none facilitate assembling them into complete modeling procedures or rigorously evaluating their combined performance.

We present a new package for R denoted emil (evaluation of modeling without information leakage) that is designed to be a flexible backbone of modeling procedures having the following properties:
(1) Enable evaluation of performance and variable importance by means of resampling methods without introducing information leakage.
(2) Return parameter tuning statistics and final prediction models.
(3) Transparent, highly customizable and easy to debug structure.
(4) Offer the user direct control over memory and CPU-intensive steps of the calculations.
(5) Comprehensive yet concise documentation.

First we explain emil's functionality in the context of standard usage, resampling, and customization. Specific application examples are presented to show its potential in terms of parallelization, customization for survival analysis, and memory management.

The result is a computationally efficient and developer friendly framework that enables resampling-based analyses using several hundreds of thousands of variables, is easy to extend, and allows development of scalable solutions.
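The notion of information leakage in resampling can be illustrated with a small sketch. The code below is not the emil API and is written in Python rather than R; all function names are hypothetical. It assumes a simple mean-difference feature filter and a nearest-centroid classifier, and compares cross-validation where feature selection is repeated inside each training fold against a leaky variant where selection also sees the held-out samples.

```python
import random
from statistics import mean


def k_folds(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint test folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def select_features(X, y, rows, top):
    """Rank features by absolute difference in class means, using only `rows`."""
    def score(j):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:top]


def centroid_predict(X, y, train, feats, i):
    """Nearest-centroid prediction for sample i on the selected features."""
    dist = {}
    for label in (0, 1):
        members = [r for r in train if y[r] == label]
        dist[label] = sum((X[i][j] - mean(X[r][j] for r in members)) ** 2
                          for j in feats)
    return min(dist, key=dist.get)


def cross_validate(X, y, k, top, rng, leaky=False):
    """Return cross-validated accuracy; `leaky=True` selects features on all rows."""
    n = len(X)
    all_rows = list(range(n))
    correct = 0
    for test in k_folds(n, k, rng):
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Leak-free: selection sees training rows only.
        # Leaky: selection also sees the rows it is later evaluated on.
        feats = select_features(X, y, all_rows if leaky else train, top)
        correct += sum(centroid_predict(X, y, train, feats, i) == y[i]
                       for i in test)
    return correct / n


# Pure-noise data: no feature carries any real class information.
rng = random.Random(1)
n, p = 40, 500
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]

honest = cross_validate(X, y, k=5, top=10, rng=random.Random(2))
leaky = cross_validate(X, y, k=5, top=10, rng=random.Random(2), leaky=True)
print(f"leak-free accuracy: {honest:.2f}, leaky accuracy: {leaky:.2f}")
```

On noise data like this, the leaky estimate would typically be the more optimistic of the two, since the selected features were chosen with knowledge of the test labels.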

Keyword
predictive modeling, machine learning, performance evaluation, resampling, high performance computing
National Category
Computational Mathematics
Research subject
Materials Science
Identifiers
urn:nbn:se:uu:diva-242353 (URN)
Funder
Swedish Foundation for Strategic Research, RBc08-008
Available from: 2015-01-25 Created: 2015-01-25 Last updated: 2017-12-05. Bibliographically approved
2. Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia
2013 (English) In: Genome Biology, ISSN 1465-6906, E-ISSN 1474-760X, Vol. 14, no 9, r105. Article in journal (Refereed) Published
Abstract [en]

BACKGROUND:

Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resistance and relapse in ALL is poorly understood.

RESULTS:

We surveyed the DNA methylation levels of 435,941 CpG sites in samples from 764 children at diagnosis of ALL and from 27 children at relapse. This survey uncovered four characteristic methylation signatures. First, compared with control blood cells, the methylomes of ALL cells shared 9,406 predominantly hypermethylated CpG sites, independent of cytogenetic background. Second, each cytogenetic subtype of ALL displayed a unique set of hyper- and hypomethylated CpG sites. The CpG sites that constituted these two signatures differed in their functional genomic enrichment to regions with marks of active or repressed chromatin. Third, we identified subtype-specific differential methylation in promoter and enhancer regions that were strongly correlated with gene expression. Fourth, a set of 6,612 CpG sites was predominantly hypermethylated in ALL cells at relapse, compared with matched samples at diagnosis. Analysis of relapse-free survival identified CpG sites with subtype-specific differential methylation that divided the patients into different risk groups, depending on their methylation status.

CONCLUSIONS:

Our results suggest an important biological role for DNA methylation in the differences between ALL subtypes and in their clinical outcome after treatment.

National Category
Medical Genetics
Identifiers
urn:nbn:se:uu:diva-208296 (URN)
10.1186/gb-2013-14-9-r105 (DOI)
000328195700011 ()
24063430 (PubMedID)
Note

The first two authors share first authorship.

Available from: 2013-09-27 Created: 2013-09-27 Last updated: 2017-12-06. Bibliographically approved
3. DNA methylation-based subtype prediction for pediatric acute lymphoblastic leukemia
2015 (English) In: Clinical Epigenetics, E-ISSN 1868-7083, Vol. 7, 11. Article in journal (Refereed) Published
Abstract [en]

Background

We present a method that utilizes DNA methylation profiling for prediction of the cytogenetic subtypes of acute lymphoblastic leukemia (ALL) cells from pediatric ALL patients. The primary aim of our study was to improve risk stratification of ALL patients into treatment groups using DNA methylation as a complement to current diagnostic methods. A secondary aim was to gain insight into the functional role of DNA methylation in ALL.

Results

We used the methylation status of ~450,000 CpG sites in 546 well-characterized patients with T-ALL or seven recurrent B-cell precursor ALL subtypes to design and validate sensitive and accurate DNA methylation classifiers. After repeated cross-validation, a final classifier was derived that consisted of only 246 CpG sites. The mean sensitivity and specificity of the classifier across the known subtypes were 0.90 and 0.99, respectively. We then used DNA methylation classification to screen for subtype membership of 210 patients with undefined karyotype (normal or no result) or non-recurrent cytogenetic aberrations (‘other’ subtype). Nearly half (n = 106) of the patients lacking cytogenetic subgrouping displayed methylation profiles highly similar to those of the patients in the known recurrent groups. We verified the subtype of 20% of the newly classified patients by examination of diagnostic karyotypes, array-based copy number analysis, and detection of fusion genes by quantitative polymerase chain reaction (PCR) and RNA-sequencing (RNA-seq). Using RNA-seq data from ALL patients where cytogenetic subtype and DNA methylation classification did not agree, we discovered several novel fusion genes involving ETV6, RUNX1, and PAX5.

Conclusions

Our findings indicate that DNA methylation profiling contributes to the clarification of the heterogeneity in cytogenetically undefined ALL patient groups and could be implemented as a complementary method for diagnosis of ALL. The results of our study provide clues to the origin and development of leukemic transformation. The methylation status of the CpG sites constituting the classifiers also highlights relevant biological characteristics in otherwise unclassified ALL patients.
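The mean sensitivity and specificity across subtypes reported in the Results can be read as one-vs-rest measures averaged over classes. The sketch below shows that calculation under this assumption; the paper's exact averaging procedure is not stated here, and the labels "A", "B" and "C" are hypothetical placeholders rather than real subtype calls.

```python
from statistics import mean


def per_class_sensitivity_specificity(truth, predicted):
    """One-vs-rest sensitivity and specificity for every label in `truth`.

    Assumes at least two distinct labels, so each class has both
    positive and negative samples.
    """
    pairs = list(zip(truth, predicted))
    results = {}
    for label in sorted(set(truth)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        results[label] = (tp / (tp + fn), tn / (tn + fp))
    return results


# Hypothetical toy labels: one "A" sample is misclassified as "B".
truth = ["A", "A", "B", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "B", "C"]

results = per_class_sensitivity_specificity(truth, predicted)
mean_sensitivity = mean(s for s, _ in results.values())
mean_specificity = mean(s for _, s in results.values())
print(results)
print(f"mean sensitivity {mean_sensitivity:.2f}, "
      f"mean specificity {mean_specificity:.2f}")
```

Averaging per class in this way weights every subtype equally regardless of how many patients it contains, which matters when subtype sizes are unbalanced.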

National Category
Hematology
Identifiers
urn:nbn:se:uu:diva-242351 (URN)
10.1186/s13148-014-0039-z (DOI)
000350260800001 ()
25729447 (PubMedID)
Funder
Swedish Foundation for Strategic Research, RBc08-008
Note

The last two authors share last authorship.

Available from: 2015-01-25 Created: 2015-01-25 Last updated: 2017-12-05. Bibliographically approved
4. DNA methylation-based prediction of in vitro drug resistance in primary pediatric acute lymphoblastic leukemia patient samples
(English) Manuscript (preprint) (Other academic)
National Category
Cancer and Oncology Hematology Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:uu:diva-242543 (URN)
Funder
Swedish Foundation for Strategic Research, RBc08-008
Available from: 2015-01-27 Created: 2015-01-27 Last updated: 2015-03-11
5. Bayesian model averaging of adaptive bandwidth kernel density estimators yields state-of-the-art performance
(English) Manuscript (preprint) (Other academic)
Keyword
Variable kernel density estimation, adaptive kernel density estimation, Bayesian model averaging, variable bandwidth, square root law
National Category
Probability Theory and Statistics
Identifiers
urn:nbn:se:uu:diva-242354 (URN)
Funder
Swedish Foundation for Strategic Research, RBc08-008
EU, FP7, Seventh Framework Programme, PROACTIVE
Available from: 2015-01-27 Created: 2015-01-25 Last updated: 2015-03-11
