Rule-based Models of Transcriptional Regulation and Complex Diseases: Applications and Development
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

As we gain increased understanding of genetic disorders and gene regulation, more focus has turned towards complex interactions. Combinations of genes, or of genes and environmental factors, have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors important for asthma and allergy.

First, publicly available ChIP-seq data for 38 HMs were studied (Paper I). We trained a classifier to predict exon inclusion levels from the HM signals. We identified HMs important for splicing and showed that splicing could be predicted from the HM patterns. Next, we applied a similar methodology to data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors important for allergic diseases, which confirmed earlier results, and found candidate gene-gene and gene-environment interactions.

To interpret and present the classifiers, we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis to classifiers trained on both simulated and real data and compared our tool to another methodology for interaction detection using classification. Finally, we continued the earlier study on epigenetics by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs whose signals differed between unidirectional and bidirectional genes. Among these, the TF CTCF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.
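
The Monte Carlo feature selection step named in the abstract can be illustrated with a heavily simplified sketch. The general idea is to draw many random feature subsets, train a classifier on random train/test splits of each, and credit the features the classifier relied on. Everything below is an assumption made for illustration: the toy data, the parameter names, the use of a one-level decision stump in place of a full decision tree, and the scoring by plain test accuracy. It is not the procedure or the software used in the thesis.

```python
import random
from collections import Counter, defaultdict


def best_stump(rows, labels, features):
    """Return (feature, value -> majority label) that best fits the training rows."""
    best = None
    for f in features:
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[f]][y] += 1
        table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        hits = sum(c[table[v]] for v, c in by_value.items())
        if best is None or hits > best[0]:
            best = (hits, f, table)
    return best[1], best[2]


def mcfs_scores(rows, labels, n_subsets=60, subset_size=2, n_splits=5, seed=0):
    """Score features over random feature subsets and random train/test splits."""
    rng = random.Random(seed)
    features = sorted(rows[0])
    scores = dict.fromkeys(features, 0.0)
    indices = list(range(len(rows)))
    for _ in range(n_subsets):
        subset = rng.sample(features, subset_size)
        for _ in range(n_splits):
            rng.shuffle(indices)
            cut = (2 * len(indices)) // 3  # two thirds for training
            train, test = indices[:cut], indices[cut:]
            feature, table = best_stump(
                [rows[i] for i in train], [labels[i] for i in train], subset
            )
            correct = sum(table.get(rows[i][feature]) == labels[i] for i in test)
            # Simplification: credit the chosen feature with the test accuracy.
            scores[feature] += correct / len(test)
    return scores


# Hypothetical toy data: "signal" determines the label, the others do not.
rows = [{"signal": i % 2, "noise1": (i // 2) % 2, "noise2": (i // 4) % 2}
        for i in range(30)]
labels = ["case" if i % 2 == 0 else "control" for i in range(30)]
scores = mcfs_scores(rows, labels)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Features that rank high in such a procedure would then be passed on to the rule-based classifier, so that rules are built only from attributes that carried predictive signal.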

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2014. 69 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1167
Keyword [en]
Histone modification, Transcription factor, Transcriptional regulation, Next-generation sequencing, Feature selection, Machine learning, Rule-based classification, Asthma, Allergy
National Category
Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Research subject
URN: urn:nbn:se:uu:diva-230159
ISBN: 978-91-554-9005-8
OAI: diva2:739042
Public defence
2014-10-03, BMC C8:301, Husargatan 3, Uppsala, 13:15 (English)
Available from: 2014-09-12 Created: 2014-08-19 Last updated: 2015-01-22
List of papers
1. Combinations of histone modifications mark exon inclusion levels
2012 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 7, no 1, e29911. Article in journal (Refereed), Published
Abstract [en]

Splicing is a complex process regulated by sequence at the classical splice sites and other motifs in exons and introns with an enhancing or silencing effect. In addition, specific histone modifications on nucleosomes positioned over the exons have been shown to correlate both positively and negatively with exon expression. Here, we trained a model of "IF … THEN …" rules to predict exon inclusion levels in a transcript from histone modification patterns. Furthermore, we showed that combinations of histone modifications, in particular those residing on nucleosomes preceding or succeeding the exon, are better predictors of exon inclusion levels than single modifications. The resulting model was evaluated with cross validation and had an average accuracy of 72% for 27% of the exons, which demonstrates that epigenetic signals substantially mark alternative splicing.
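
The statement that the model reached 72% accuracy for 27% of the exons reflects a property of rule-based classifiers: a rule set only classifies the objects that at least one rule matches, so accuracy is reported together with coverage. A minimal sketch of that bookkeeping follows. The feature names, discretized values, rules, and examples are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """IF every condition holds THEN predict `label`."""
    conditions: dict  # feature name -> required discretized value
    label: str

    def matches(self, example: dict) -> bool:
        return all(example.get(f) == v for f, v in self.conditions.items())


def evaluate(rules, examples):
    """Return (accuracy on covered examples, fraction of examples covered)."""
    covered = correct = 0
    for features, true_label in examples:
        votes = [r.label for r in rules if r.matches(features)]
        if not votes:
            continue  # no rule fires: the example stays unclassified
        covered += 1
        predicted = max(sorted(set(votes)), key=votes.count)  # majority vote
        correct += predicted == true_label
    accuracy = correct / covered if covered else 0.0
    return accuracy, covered / len(examples)


# Hypothetical rules over hypothetical histone-modification features.
rules = [
    Rule({"H3K36me3_exon": "high", "H4K20me1_preceding": "high"}, "included"),
    Rule({"H3K9me3_exon": "high"}, "excluded"),
]
examples = [
    ({"H3K36me3_exon": "high", "H4K20me1_preceding": "high", "H3K9me3_exon": "low"}, "included"),
    ({"H3K36me3_exon": "low", "H4K20me1_preceding": "low", "H3K9me3_exon": "high"}, "excluded"),
    ({"H3K36me3_exon": "low", "H4K20me1_preceding": "high", "H3K9me3_exon": "high"}, "included"),
    ({"H3K36me3_exon": "low", "H4K20me1_preceding": "low", "H3K9me3_exon": "low"}, "included"),
]
accuracy, coverage = evaluate(rules, examples)
```

In this toy case three of four examples are covered and two of those three are classified correctly, which mirrors how an accuracy figure can apply to only a subset of the exons.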

National Category
Cell and Molecular Biology
urn:nbn:se:uu:diva-175875 (URN), 10.1371/journal.pone.0029911 (DOI), 000312662100045 (), 22242188 (PubMedID)
Knut and Alice Wallenberg Foundation, Swedish Foundation for Strategic Research, Swedish Research Council, Swedish Cancer Society
Available from: 2012-06-13 Created: 2012-06-13 Last updated: 2015-08-11 Bibliographically approved
2. Rule-Based Models of the Interplay between Genetic and Environmental Factors in Childhood Allergy
2013 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 8, no 11, e80080. Article in journal (Refereed), Published
Abstract [en]

Both genetic and environmental factors are important for the development of allergic diseases. However, a detailed understanding of how such factors act together is lacking. To elucidate the interplay between genetic and environmental factors in allergic diseases, we used a novel bioinformatics approach that combines feature selection and machine learning. In two materials, PARSIFAL (a European cross-sectional study of 3113 children) and BAMSE (a Swedish birth-cohort including 2033 children), genetic variants as well as environmental and lifestyle factors were evaluated for their contribution to allergic phenotypes. Monte Carlo feature selection and rule-based models were used to identify and rank rules describing how combinations of genetic and environmental factors affect the risk of allergic diseases. Novel interactions between genes were suggested and replicated, such as between ORMDL3 and RORA, where certain genotype combinations gave odds ratios for current asthma of 2.1 (95% CI 1.2-3.6) and 3.2 (95% CI 2.0-5.0) in the BAMSE and PARSIFAL children, respectively. Several combinations of environmental factors appeared to be important for the development of allergic disease in children. For example, use of baby formula and antibiotics early in life was associated with an odds ratio of 7.4 (95% CI 4.5-12.0) for developing asthma. Furthermore, genetic variants together with environmental factors seemed to play a role in allergic diseases, such as the use of antibiotics early in life and COL29A1 variants for asthma, and farm living and NPSR1 variants for allergic eczema. Overall, combinations of environmental and lifestyle factors appeared more frequently in the models than combinations solely involving genes. In conclusion, a new bioinformatics approach is described for analyzing complex data, including extensive genetic and environmental information. Interactions identified with this approach could provide useful hints for further in-depth studies of etiological mechanisms and may also strengthen the basis for risk assessment and prevention.
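
The odds ratios and 95% confidence intervals quoted above are computed from a 2x2 table of exposure against disease status. The abstract does not say which interval method was used. The sketch below assumes the common log-based (Woolf) approximation, and the counts in the usage example are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (Woolf method).

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for one rule, e.g. a genotype or exposure combination.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

For a rule describing a combination of factors, "exposed" would mean that the rule's conditions all hold for a child, so each rule yields its own 2x2 table and odds ratio.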

National Category
Medical and Health Sciences
urn:nbn:se:uu:diva-213817 (URN), 10.1371/journal.pone.0080080 (DOI), 000327311900057 ()
Available from: 2014-01-05 Created: 2014-01-04 Last updated: 2015-01-22 Bibliographically approved
3. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
2014 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 15, 139. Article in journal (Refereed), Published
Abstract [en]

Background: The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods.

Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings.

Conclusions: Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.
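
One way to picture a rule network is as a weighted graph in which nodes are rule conditions and an edge links two conditions that occur together in the same IF-THEN rule. The sketch below builds such edge weights. The input format and the choice to sum a per-rule weight are assumptions for illustration, since the abstract does not describe how Ciruvis scores connections.

```python
from collections import defaultdict
from itertools import combinations


def rule_network(rules):
    """Build edge weights between conditions that co-occur in rules.

    rules: iterable of (conditions, weight), where conditions is a list of
    "feature=value" strings and weight is a per-rule quality score
    (hypothetical here, e.g. derived from rule support and accuracy).
    Returns a dict mapping an alphabetically sorted condition pair to the
    summed weight of all rules containing both conditions.
    """
    edges = defaultdict(float)
    for conditions, weight in rules:
        for a, b in combinations(sorted(set(conditions)), 2):
            edges[(a, b)] += weight
    return dict(edges)


# Hypothetical rules: two share the pair A=1 / B=0, one has a single condition.
rules = [
    (["A=1", "B=0"], 0.5),
    (["B=0", "A=1", "C=1"], 0.25),
    (["C=1"], 1.0),
]
edges = rule_network(rules)
strongest = max(edges, key=edges.get)
```

Pairs with a high summed weight are candidates for interacting features. Single-condition rules add no edges, which is why the third rule leaves the network unchanged.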

Keyword [en]
Visualization, Rules, Interactions, Interaction detection, Classification, Rule-based classification
National Category
Biochemistry and Molecular Biology
urn:nbn:se:uu:diva-228027 (URN), 10.1186/1471-2105-15-139 (DOI), 000336679600001 ()
Available from: 2014-07-02 Created: 2014-07-02 Last updated: 2015-01-22 Bibliographically approved
4. Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription
2015 (English). In: BMC Genomics, ISSN 1471-2164, Vol. 16, 300. Article in journal (Refereed), Published
Abstract [en]

Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.
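
A position such as "60-80 bp upstream of the TSS" is strand-relative: upstream means lower genomic coordinates for a gene on the plus strand and higher coordinates for a gene on the minus strand. The sketch below shows that conversion. The function names and coordinates are hypothetical, and the window bounds simply restate the range given in the abstract.

```python
def offset_from_tss(peak_pos, tss_pos, strand):
    """Signed distance from TSS to peak in the direction of transcription.

    Negative values are upstream of the TSS, positive values downstream.
    """
    if strand == "+":
        return peak_pos - tss_pos
    if strand == "-":
        return tss_pos - peak_pos
    raise ValueError("strand must be '+' or '-'")


def in_upstream_window(offset, near=60, far=80):
    """True if the offset lies between `near` and `far` bp upstream."""
    return -far <= offset <= -near


# Hypothetical peak summits around a TSS at genomic position 1000.
plus_offset = offset_from_tss(930, 1000, "+")    # lower coordinate, plus strand
minus_offset = offset_from_tss(1070, 1000, "-")  # higher coordinate, minus strand
downstream = offset_from_tss(1070, 1000, "+")
```

Both the plus-strand and the minus-strand example fall 70 bp upstream once strand is taken into account, whereas the same higher coordinate on the plus strand is downstream.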

Keyword [en]
Antisense transcription, CTCF, RAD21, Cohesin, CAGE, Epigenetics, Transcription factor, Histone modification
National Category
Bioinformatics and Systems Biology
urn:nbn:se:uu:diva-230158 (URN), 10.1186/s12864-015-1485-5 (DOI), 000355166000001 (), 25881024 (PubMedID)
Available from: 2014-08-19 Created: 2014-08-19 Last updated: 2015-06-26 Bibliographically approved

Author
Bornelöv, Susanne