Enabling massive genomic and transcriptomic analysis
KTH, School of Biotechnology (BIO), Gene Technology.
2011 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In recent years there have been tremendous advances in our ability to rapidly and cost-effectively sequence DNA. This has revolutionized the fields of genetics and biology, leading to a deeper understanding of the molecular events in life processes. The rapid advances have enormously expanded sequencing opportunities and applications, but also imposed heavy strains on steps prior to sequencing, as well as the subsequent handling and analysis of the massive amounts of sequence data that are generated, in order to exploit the full capacity of these novel platforms. The work presented in this thesis (based on six appended papers) has contributed to balancing the sequencing process by developing techniques to accelerate the rate-limiting steps prior to sequencing, facilitating sequence data analysis and applying the novel techniques to address biological questions.

 

Papers I and II describe techniques to eliminate expensive and time-consuming preparatory steps by automating library preparation procedures prior to sequencing. The automated procedures were benchmarked against standard manual procedures and were found to substantially increase throughput while maintaining high reproducibility. In Paper III, a novel algorithm for fast classification of sequences in complex datasets is described. The algorithm was first optimized and validated using a synthetic metagenome dataset and then shown to enable faster analysis of an experimental metagenome dataset than conventional long-read aligners, with similar accuracy. Paper IV presents an investigation of the molecular effects on the p53 gene of exposing human skin to sunlight during the course of a summer holiday. There was evidence of previously accumulated persistent p53 mutations in 14% of all epidermal cells. Most of these mutations are likely to be passenger events, as the affected cell compartments showed no apparent growth advantage. An annual rate of 35,000 novel sun-induced persistent p53 mutations was estimated to occur in sun-exposed skin of a human individual. Paper V assesses the effect of using RNA obtained from whole cell extracts (total RNA) or cytoplasmic RNA on the quantification of transcripts detected in subsequent analysis. Overall, more differentially detected genes were identified when using cytoplasmic RNA. The major reason for this is the reduced complexity of cytoplasmic RNA, but it is apparently also due (at least partly) to the nuclear retention of transcripts with long, structured 5’- and 3’-untranslated regions or long protein-coding sequences. The last paper, VI, describes whole-genome sequencing of a large, consanguineous family with a history of Leber hereditary optic neuropathy (LHON) on the maternal side. The analysis identified new candidate genes, which could be important in the aetiology of LHON. However, these candidates require further validation before any firm conclusions can be drawn regarding their contribution to the manifestation of LHON.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. 45 p.
Series
Trita-BIO-Report, ISSN 1654-2312 ; 2011:24
Keyword [en]
DNA, RNA, sequencing, massively parallel sequencing, alignment, assembly, single nucleotide polymorphism, LHON
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:kth:diva-45957
ISBN: 978-91-7501-164-6 (print)
OAI: oai:DiVA.org:kth-45957
DiVA: diva2:456668
Public defence
2011-12-02, Petrén‐salen, Nobels väg 12B, Karolinska Institute Campus Solna, Stockholm, 13:00 (English)
Note
QC 20111115. Available from: 2011-11-15. Created: 2011-11-01. Last updated: 2011-11-15. Bibliographically approved.
List of papers
1. Increased Throughput by Parallelization of Library Preparation for Massive Sequencing
2010 (English). In: PLOS ONE, ISSN 1932-6203, Vol. 5, no. 3, e10029. Article in journal (Refereed). Published.
Abstract [en]

Background: Massively parallel sequencing systems continue to improve on data output, while leaving labor-intensive library preparations a potential bottleneck. Efforts are currently under way to relieve the crucial and time-consuming work to prepare DNA for high-throughput sequencing. Methodology/Principal Findings: In this study, we demonstrate an automated parallel library preparation protocol using generic carboxylic acid-coated superparamagnetic beads and polyethylene glycol precipitation as a reproducible and flexible method for DNA fragment length separation. With this approach the library preparation for DNA sequencing can easily be adjusted to a desired fragment length. The automated protocol, here demonstrated using the GS FLX Titanium instrument, was compared to the standard manual library preparation, showing higher yield, throughput and great reproducibility. In addition, 12 libraries were prepared and uniquely tagged in parallel, and the distribution of sequence reads between these indexed samples could be improved using quantitative PCR-assisted pooling. Conclusions/Significance: We present a novel automated procedure that makes it possible to prepare 36 indexed libraries per person and day, which can be increased to up to 96 libraries processed simultaneously. The yield, speed and robust performance of the protocol constitute a substantial improvement to present manual methods, without the need of extensive equipment investments. The described procedure enables a considerable efficiency increase for small to midsize sequencing centers.
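The quantitative PCR-assisted pooling mentioned in this abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that qPCR yields one concentration estimate per indexed library and that the goal is for every library to contribute the same amount of DNA to the pool. The function name `equimolar_pool_volumes`, the library names and the units are hypothetical.

```python
def equimolar_pool_volumes(concentrations, amount_per_library):
    """Return the volume of each library that contributes the same amount.

    concentrations: dict mapping library name -> concentration
        (amount per microlitre, e.g. as estimated by qPCR; assumed input).
    amount_per_library: target amount per library, in the same amount unit.
    """
    volumes = {}
    for name, conc in concentrations.items():
        if conc <= 0:
            raise ValueError(f"library {name!r} has a non-positive concentration")
        # volume = amount / concentration, so dilute libraries need more volume
        volumes[name] = amount_per_library / conc
    return volumes


# Hypothetical qPCR estimates for three indexed libraries (amount per microlitre).
qpcr_estimates = {"lib1": 2.0, "lib2": 4.0, "lib3": 8.0}
volumes = equimolar_pool_volumes(qpcr_estimates, amount_per_library=8.0)
for name, vol in volumes.items():
    print(f"{name}: pipette {vol:.1f} microlitres")
```

Under these assumptions, a library measured at half the concentration of another is pooled at twice the volume, which is one way the read distribution between indexed samples could be evened out.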

Keyword
POLYETHYLENE-GLYCOL, DNA, PRECIPITATION
National Category
Other Industrial Biotechnology
Identifiers
urn:nbn:se:kth:diva-28306 (URN); 10.1371/journal.pone.0010029 (DOI); 000276420400007 (); 2-s2.0-77956313182 (Scopus ID)
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation
Note
QC 20110113. Available from: 2011-01-13. Created: 2011-01-12. Last updated: 2012-11-16. Bibliographically approved.
2. Scalable Transcriptome Preparation for Massive Parallel Sequencing
2011 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 6, no. 7, e21910. Article in journal (Refereed). Published.
Abstract [en]

Background: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. Methodology: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was compared to the standard manual sample preparation. Conclusion/Significance: The automated procedure was used to generate libraries for gene expression profiling on the Illumina HiSeq 2000 platform with the capacity of 12 samples per preparation with a significantly improved throughput compared to the standard manual preparation. The data analysis shows consistent gene expression profiles in terms of sensitivity and quantification of gene expression between the two library preparation methods.

National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-37159 (URN); 10.1371/journal.pone.0021910 (DOI); 000292655400026 (); 2-s2.0-79960041380 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note
QC 20110803. Available from: 2011-08-03. Created: 2011-08-02. Last updated: 2017-12-08. Bibliographically approved.
3. Classification of DNA sequences using Bloom filters
2010 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no. 13, p. 1595-1600. Article in journal (Refereed). Published.
Abstract [en]

Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the 'novel' sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences.
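As an illustration of the general idea named in the paper title, the following is a minimal sketch of classifying reads against a reference with a Bloom filter. It is not the FACS implementation: the k-mer decomposition, the double-hashing scheme built on SHA-256, the filter size, and the `min_fraction` threshold are all assumptions chosen for readability, and the names `BloomFilter`, `build_reference_filter` and `classify` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_filter(reference, k, num_bits=1 << 20, num_hashes=4):
    """Insert all k-mers of the reference sequence into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference.upper(), k):
        bf.add(kmer)
    return bf


def classify(read, ref_filter, k, min_fraction=0.8):
    """Return True if at least min_fraction of the read's k-mers hit the filter."""
    read_kmers = list(kmers(read.upper(), k))
    if not read_kmers:
        return False  # read shorter than k: nothing to test
    hits = sum(1 for kmer in read_kmers if kmer in ref_filter)
    return hits / len(read_kmers) >= min_fraction


# Toy usage with a made-up reference sequence.
reference = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCA"
ref_filter = build_reference_filter(reference, k=8)
print(classify(reference[5:30], ref_filter, k=8))
print(classify("T" * 25, ref_filter, k=8))
```

A membership test touches only a few bits per k-mer regardless of reference size, which is one plausible reason a filter-based approach can be faster than alignment; the price is a small, tunable false-positive rate.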

National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:kth:diva-27282 (URN); 10.1093/bioinformatics/btq230 (DOI); 000278967500003 (); 2-s2.0-77954187316 (Scopus ID)
Note
QC 20101214. Available from: 2010-12-14. Created: 2010-12-09. Last updated: 2017-12-11. Bibliographically approved.
4. Sun‐induced Missense Mutations Are Extensively Accumulated and Tolerated in Phenotypically Intact Stem Cell Compartments of Human Skin
(English). Article in journal (Refereed). Submitted.
Abstract [en]

Here we demonstrate that intermittently sun‐exposed human skin contains an extensive number of phenotypically intact stem cell compartments bearing missense mutations in the p53 tumor suppressor gene. Deep sequencing of sun‐exposed and shielded microdissected skin from mid‐life individuals revealed that persistent p53 mutations had accumulated in 14% of all epidermal cells, with no apparent signs of a growth advantage of the affected cell compartments. Furthermore, 6% of the mutated epidermal cells encoded a truncated protein. The abundance of these events, not taking into account intron mutations and mutations in other genes that also may have functional implications, suggests an extensive tolerance of human cells to severe genetic alterations caused by ultraviolet light, with an estimated annual rate of accumulation of approximately 35,000 new persistent protein‐altering p53 mutations in sun‐exposed skin of a human individual.

Identifiers
urn:nbn:se:kth:diva-12404 (URN)
Note
QC 20100416. Available from: 2010-04-16. Created: 2010-04-16. Last updated: 2011-11-15. Bibliographically approved.
5. Transcript nuclear retention effects quantification of gene expression levels
(English). Manuscript (preprint) (Other academic)
Abstract [en]

The majority of published differential gene expression studies have used RNA isolated from whole cell extracts (total RNA), overlooking the potential impact of including the nuclear transcriptome in the analyses. It has not been firmly established that the contribution of nuclear RNA is negligible or how its inclusion affects quantification of gene expression. Previous studies have estimated that the nuclear transcriptome is five to ten times more complex than the cytoplasmic [1]. Hence, RNA purified solely from the cytoplasm should have fewer unique transcripts, resulting in more sequence counts per transcript and thus increased power to detect the remaining transcripts. In this study, cytoplasmic and total mRNA were prepared from three human cell lines and sequenced using massive parallel sequencing. The resulting sequence data were analyzed regarding the effect of the number of biological replicates, read length and transcript fractionation on calling differentially detected genes. In addition, the impact of the length and secondary structure of mRNA untranslated regions (UTRs), and of coding sequence length, on nucleus-to-cytoplasm transportation rates of mRNAs was studied. We observe that the number of differentially detected genes was not significantly increased by adding more than three biological replicates or by increasing the sequence read length beyond 35 bp. More differentially detected genes were found in the cytoplasmic RNA compared to the total RNA, and a nuclear retention effect was observed for transcripts with long and structured 5’‐ and 3’‐UTRs or long protein coding sequences.

Keyword
RNA, RNA-Seq, transcriptome, nuclear retention
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-48039 (URN)
Note
QS 2011. Available from: 2011-11-15. Created: 2011-11-15. Last updated: 2011-11-15. Bibliographically approved.
6. A strategy for identifying nuclear modifier genes by massively parallel whole-genome sequencing
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Leber hereditary optic neuropathy (LHON) results from mutations in mtDNA, but additional factors are required for disease expression. LHON is thus a model for the concept of modifiers affecting expression of single gene diseases. No modifier factor has yet been clearly identified. Here we describe a large, consanguineous family affected by LHON with offspring showing variable disease expression. This provides an opportunity to investigate the presence of nuclear modifiers in homozygous genomic regions. We analyzed genomes from six members, parents and four siblings. Each genome was sequenced to >23x coverage and approximately 3.8 million single nucleotide variants and small indels per individual were called, where 17,000‐20,000 were located in the exome. As a first step, we hypothesize that a modifier gene affecting penetrance of the LHON mutation, and another modifier gene predisposing to an aggravated phenotype, are located in the protein‐coding parts of the genome (the exome). As we gain experience in data analysis, this can be followed by extended analyses of additional genomic regions. Our initial, simple hypothesis generated five lists of candidate modifier genes, conforming to five different models of inheritance. In total, 86 candidate genes were identified and 11 of these genes contained 14 variants that were further validated by Sanger sequencing. Additional Sanger validation in another two affected siblings reduced the number of candidate genes to two potential disease‐causing variants.

Keyword
LHON, whole-genome sequencing, DNA, SNVs
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-48041 (URN)
Note
QS 2011. Available from: 2011-11-15. Created: 2011-11-15. Last updated: 2011-11-15. Bibliographically approved.

Open Access in DiVA

fulltext (1362 kB), 338 downloads
File information
File name: FULLTEXT01.pdf
File size: 1362 kB
Checksum (SHA-512): 55211f17abcd608feb4ecd181a1fd2acaaa1fee20f3326b1494923f6a9722ea9deb4ede53143f5d4c3a459e128cc257dd218ecf18b8feb9eebea8e270250558b
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Stranneheim, Henrik
By organisation
Gene Technology
Biological Sciences
