Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Modeling the intronic regulation of Alternative Splicing using Deep Convolutional Neural Nets
KTH, School of Computer Science and Communication (CSC).
2015 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
En metod baserad på djupa neurala nätverk för att modellera regleringen av Alternativ Splicing (Swedish)
Abstract [en]

This paper investigates the use of deep Convolutional Neural Networks for modeling the intronic regulation of Alternative Splicing on the basis of DNA sequence. By training the CNN on massively parallel synthetic DNA libraries of Alternative 5'-splicing and Alternatively Skipped exon events, the model is capable of predicting the relative abundance of alternatively spliced mRNA isoforms on held-out library data to a very high accuracy (R2 = 0.77 for Alt. 5'-splicing). Furthermore, the CNN is shown to generalize alternative splicing across cell lines efficiently. The Convolutional Neural Net is tested against a Logistic regression model and the results show that while prediction accuracy on the synthetic library is notably higher compared to the LR model, the CNN is worse at generalizing to new intronic contexts. Tests on non-synthetic human SNP genes suggest the CNN is dependent on the relative position of the intronic region it was trained for, a problem which is alleviated with LR.

The increased library prediction accuracy of the CNN compared to Logistic regression is concluded to come from the non-linearity introduced by the deep layer architecture. It adds the capacity to model complex regulatory interactions and combinatorial RBP effects which studies have shown largely affect alternative splicing. However, the architecture makes interpreting the CNN hard, as the regulatory interactions are encoded deep within the layers. Nevertheless, high-performance modeling of alternative splicing using CNNs may still prove useful in numerous Synthetic biology applications, for example to model differentially spliced genes as is done in this paper.

Abstract [sv]

Den här uppsatsen undersöker hur djupa neurala nätverk baserade på faltning ("Convolutions") kan användas för att modellera den introniska regleringen av Alternativ Splicing med endast DNA-sekvensen som indata. Nätverket tränas på ett massivt parallelt bibliotek av syntetiskt DNA innehållandes Alternativa Splicing-event där delar av de introniska regionerna har randomiserats. Uppsatsen visar att nätverksarkitekturen kan förutspå den relativa mängden alternativt splicat RNA till en mycket hög noggrannhet inom det syntetiska biblioteket. Modellen generaliserar även alternativ splicing mellan mänskliga celltyper väl. Hursomhelst, tester på icke-syntetiska mänskliga gener med SNP-mutationer visar att nätverkets prestanda försämras när den introniska region som används som indata flyttas i jämförelse till den relativa position som modellen tränats på.

Uppsatsen jämför modellen med Logistic regression och drar slutsatsen att nätverkets förbättrade prestanda grundar sig i dess förmåga att modellera icke-linjära beroenden i datan. Detta medför dock svårigheter i att tolka vad modellen faktiskt lärt sig, eftersom interaktionen mellan reglerande element är inbäddat i nätverkslagren. Trots det kan högpresterande modellering av alternativ splicing med hjälp av neurala nät vara användbart, exempelvis inom Syntetisk biologi där modellen kan användas för att kontrollera regleringen av splicing när man konstruerar syntetiska gener.

Place, publisher, year, edition, pages
2015. , 39 p.
Keyword [en]
Machine Learning, Deep Learning, CNN, CNNs, Convolutional Neural Networks, Convolutional Neural Nets, Neural Networks, Neural Nets, Synthetic Biology, Alternative Splicing, AS, Splicing, DNA Regulation, Synthetic DNA, Massively Parallel Library
Keyword [sv]
Maskinlärning, CNN, CNNs, Convolutional Neural Networks, Convolutional Neural Nets, Faltning, Neurala Nätverk, Neurala Nät, Syntetisk Biologi, Alternativ Splicing, Splicing, AS, DNA, Massivt Parallellt Bibliotek, Syntetiskt DNA
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-172327OAI: oai:DiVA.org:kth-172327DiVA: diva2:846714
External cooperation
Seelig Lab, University of Washington
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2015-08-21 Created: 2015-08-18 Last updated: 2015-08-21Bibliographically approved

Open Access in DiVA

fulltext(10387 kB)303 downloads
File information
File name FULLTEXT01.pdfFile size 10387 kBChecksum SHA-512
c12d50573bbff2164a3b23cf6952e9d2a0035d9134ede0c9a7c503340fa8d5ea3ef161668625d43a2d9b11d74d2b6b2ccb6cba9966f28dae7fe79202665d13ed
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 303 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 2316 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf