Analysis of RNA and DNA sequencing data: Improved bioinformatics applications
KTH, School of Biotechnology (BIO), Gene Technology.
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Massively parallel sequencing has rapidly revolutionized DNA and RNA research. Sample preparations are steadfastly advancing, sequencing costs have plummeted and throughput is ever growing. This progress has resulted in exponential growth in data generation with a corresponding demand for bioinformatic solutions. This thesis addresses methodological aspects of this sequencing revolution and applies it to selected biological topics.

Papers I and II are technical in nature and concern sample preparation and data analysis of RNA sequencing data. Paper I focuses on RNA degradation and Paper II on generating strand-specific RNA-seq libraries.

Papers III and IV deal with current biological issues. In Paper III, whole exomes of cancer patients undergoing chemotherapy are sequenced and their genetic variants are associated with toxicity-induced adverse drug reactions. In Paper IV, a comprehensive view of gene expression in the endometrium is assessed at two time points of the menstrual cycle.

Together these papers show relevant aspects of contemporary sequencing technologies and how they can be applied to diverse biological topics.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 135 p.
Series
TRITA-BIO-Report, ISSN 1654-2312 ; 2016:2
Keyword [en]
RNA sequencing, exome sequencing, bioinformatics, gene expression, differential expression, variant calling
National Category
Bioinformatics and Systems Biology
Research subject
Biotechnology
Identifiers
URN: urn:nbn:se:kth:diva-184158
ISBN: 978-91-7595-894-1 (print)
OAI: oai:DiVA.org:kth-184158
DiVA: diva2:915153
Public defence
2016-04-22, Inghesalen, Tomtebodavägen 18A, Solna, Stockholm, 10:00 (English)
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation
Note

QC 20160329

Available from: 2016-03-29. Created: 2016-03-29. Last updated: 2016-03-29. Bibliographically approved
List of papers
1. Sequencing Degraded RNA Addressed by 3' Tag Counting
2014 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 9, no 3, e91851. Article in journal (Refereed), Published
Abstract [en]

RNA sequencing has become widely used in gene expression profiling experiments. Prior to any RNA sequencing experiment the quality of the RNA must be measured to assess whether or not it can be used for further downstream analysis. The RNA integrity number (RIN) is a scale used to measure the quality of RNA that runs from 1 (completely degraded) to 10 (intact). Ideally, samples with high RIN (>8) are used in RNA sequencing experiments. RNA, however, is a fragile molecule which is susceptible to degradation and obtaining high quality RNA is often hard, or even impossible when extracting RNA from certain clinical tissues. Thus, occasionally, working with low quality RNA is the only option the researcher has. Here we investigate the effects of RIN on RNA sequencing and suggest a computational method to handle data from samples with low quality RNA which also enables reanalysis of published datasets. Using RNA from a human cell line we generated and sequenced samples with varying RINs and illustrate what effect the RIN has on the basic procedure of RNA sequencing; both quality aspects and differential expression. We show that the RIN has systematic effects on gene coverage, false positives in differential expression and the quantification of duplicate reads. We introduce 3' tag counting (3TC) as a computational approach to reliably estimate differential expression for samples with low RIN. We show that using the 3TC method in differential expression analysis significantly reduces false positives when comparing samples with different RIN, while retaining reasonable sensitivity.
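The abstract names 3' tag counting but does not spell out the procedure. The following is a minimal sketch, assuming the idea is to count only reads that fall within a fixed window at the 3' end of each gene, so that degraded samples with coverage skewed toward the 3' end are compared on the same region. The `Gene` type, the function names, the single-coordinate read representation and the 500-base window are all hypothetical, and the sketch works on genomic coordinates without handling introns.

```python
from typing import Iterable, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def three_prime_window(gene: Gene, window: int) -> tuple[int, int]:
    """Return the [start, end) interval covering the last `window` bases at the 3' end."""
    if gene.strand == "+":
        # On the plus strand the 3' end is the highest coordinate.
        return max(gene.start, gene.end - window), gene.end
    # On the minus strand the 3' end is the lowest coordinate.
    return gene.start, min(gene.end, gene.start + window)


def count_three_prime_tags(genes: Iterable[Gene],
                           read_positions: Iterable[int],
                           window: int = 500) -> dict[str, int]:
    """Count reads whose position lies inside each gene's 3' window.

    `read_positions` holds one coordinate per read (an assumption made
    to keep the sketch short; real alignments span an interval).
    """
    windows = [(g.name, *three_prime_window(g, window)) for g in genes]
    counts = {name: 0 for name, _, _ in windows}
    for pos in read_positions:
        for name, lo, hi in windows:
            if lo <= pos < hi:
                counts[name] += 1
    return counts
```

The resulting per-gene counts could then be passed to any count-based differential expression tool in place of whole-gene counts.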

Keyword
SEQ, Quantification, Transcription, Degradation, Integrity, Number
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-144364 (URN)
10.1371/journal.pone.0091851 (DOI)
000332858400109 ()
2-s2.0-84897972088 (Scopus ID)
Funder
Swedish Research Council; Knut and Alice Wallenberg Foundation; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Note

QC 20140423

Available from: 2014-04-23. Created: 2014-04-22. Last updated: 2017-12-05. Bibliographically approved
2. Analysis of stranded information using an automated procedure for strand specific RNA sequencing
2014 (English). In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 15, no 1, 631. Article in journal (Refereed), Published
Abstract [en]

Background: Strand specific RNA sequencing is rapidly replacing conventional cDNA sequencing as an approach for assessing information about the transcriptome. Alongside improved laboratory protocols the development of bioinformatical tools is steadily progressing. In the current procedure the Illumina TruSeq library preparation kit is used, along with additional reagents, to make stranded libraries in an automated fashion which are then sequenced on Illumina HiSeq 2000. By the use of freely available bioinformatical tools we show, through quality metrics, that the protocol is robust and reproducible. We further highlight the practicality of strand specific libraries by comparing expression of strand specific libraries to non-stranded libraries, by looking at known antisense transcription of pseudogenes and by identifying novel transcription. Furthermore, two ribosomal depletion kits, RiboMinus and RiboZero, and two sequence aligners, Tophat2 and STAR, are compared.

Results: The non-stranded Illumina TruSeq kit can be adapted to generate strand specific libraries and can be used to access detailed information on the transcriptome. The RiboZero kit is very effective in removing ribosomal RNA from total RNA and the STAR aligner produces high mapping yield in a short time. Strand specific data gives more detailed and correct results than non-stranded data, as we show when estimating expression values and when assembling transcripts. Even well annotated genomes need improvements and corrections, which can be achieved using strand specific data.

Conclusions: Researchers in the field should strive to use strand specific data; it allows for more confidence in the data analysis and is less likely to lead to false conclusions. If faced with analysing non-stranded data, researchers should be well aware of the caveats of that approach.
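What strand information adds can be illustrated with a small sketch: once the strand of each read is known, a read overlapping an annotated gene can be labelled sense or antisense, a distinction that a non-stranded library cannot make. The function names and the `reads_match_transcript` flag are hypothetical. Whether reads from a given stranded protocol have the same orientation as the transcript or the opposite one depends on the library chemistry, which the abstract does not specify, so the flag is left to the caller.

```python
from collections import Counter
from typing import Iterable


def classify_read(read_strand: str, gene_strand: str,
                  reads_match_transcript: bool) -> str:
    """Label a read overlapping a gene as 'sense' or 'antisense'.

    `reads_match_transcript` is True when the protocol yields reads in the
    same orientation as the original transcript, False when reads are its
    reverse complement (an assumption the caller must supply).
    """
    if read_strand not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("strands must be '+' or '-'")
    same_strand = read_strand == gene_strand
    return "sense" if same_strand == reads_match_transcript else "antisense"


def strand_summary(read_strands: Iterable[str], gene_strand: str,
                   reads_match_transcript: bool) -> Counter:
    """Tally sense and antisense reads for one gene."""
    return Counter(
        classify_read(s, gene_strand, reads_match_transcript)
        for s in read_strands
    )
```

A high antisense share for a gene or pseudogene would be a candidate for the kind of antisense transcription the abstract mentions, subject to the usual checks on mapping quality.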

Keyword
Antisense RNA, Bioinformatics, Ribosomal depletion, RNA sequencing, Strand specificity
National Category
Biological Sciences
Identifiers
urn:nbn:se:kth:diva-161768 (URN)
10.1186/1471-2164-15-631 (DOI)
2-s2.0-84904776692 (Scopus ID)
Funder
Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish Research Council
Note

QC 20150317

Available from: 2015-03-17. Created: 2015-03-17. Last updated: 2017-12-04. Bibliographically approved
3. Genetic association of gemcitabine/carboplatin induced myelosuppression in patients with non-small cell lung cancer using whole exome sequencing
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Purpose: Chemotherapy induced myelosuppression is a recurrent problem in cancer treatment, both for the patients’ quality of life and response. Severe hematological toxicities lead to dose reduction, postponed or ceased treatment, affecting the treatment effect. Identifying genetic markers associated with toxicity is an important factor for individualized chemotherapy and might increase the overall effect of the treatment.

Material and methods: Non-small cell lung cancer patients undergoing gemcitabine/carboplatin chemotherapy were included and their exomes were sequenced. Genetic variants from 212 exomes were correlated to thrombocytopenia, leukopenia, and neutropenia on single nucleotide and gene level. Results were processed through enrichment analysis and variants were validated using externally available datasets.

Results: SNV analysis identified 103, 131 and 112 variants to be associated with thrombocytopenia, leukopenia and neutropenia, respectively. Gene based analysis identified 21, 54 and 31 genes to be associated with thrombocytopenia, leukopenia and neutropenia, respectively. Using external data sets 8, 26 and 9 SNVs were validated through linkage disequilibrium for thrombocytopenia, leukopenia and neutropenia, respectively.

The variant rs61739531 (CADD = 25.7) in the gene MYO1G was identified to be associated with high toxicity in all forms of myelosuppression. Validated variants include rs6118 (CADD = 22.3) in SERPINA5, rs16910526 (CADD = 35.0) in CLEC7A and rs79350244 (CADD = 24.2) in DNAH2. Enrichment analysis of associated genes identified the pathways hemostasis, HIF-1 alpha transcription factor network and vitamin B12 metabolism to be involved in thrombocytopenia, leukopenia and neutropenia, respectively.

Factors involved in megakaryocyte development and platelet production were also associated with thrombocytopenia through three genes: JMJD1C with the variant rs34491125 (CADD = 22.1), DOCK8 with the variant rs10491684 (CADD = 11.7), and CAPZA2, based on three variants in the gene-based analysis.

Conclusion: The results highlight genetic markers and relevant pathways associated with chemotherapy induced myelosuppression and form a strong foundation for further investigation into toxicity induced myelosuppression. 
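The abstract does not state which statistical test was used to relate single variants to toxicity. As an illustration only, one common choice for a variant-by-outcome comparison is a two-sided Fisher's exact test on a 2x2 table of carrier status against toxicity group. The sketch below implements that test with the standard library; the table layout and the function name are assumptions, not the authors' method.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are variant carriers and
    non-carriers, columns are high-toxicity and low-toxicity patients.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers in the high-toxicity column.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

With many variants tested, as in an exome-wide scan, the resulting p-values would need correction for multiple testing before any variant is reported as associated.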

Keyword
chemotherapy, adverse drug reaction, exome sequencing, lung cancer, myelosuppression
National Category
Cancer and Oncology
Research subject
Biotechnology
Identifiers
urn:nbn:se:kth:diva-184155 (URN)
Funder
Swedish Cancer Society; Swedish Research Council
Note

QC 20160404

Available from: 2016-03-29. Created: 2016-03-29. Last updated: 2016-04-04. Bibliographically approved
4. Comprehensive RNA sequencing of healthy human endometrium at two time points of the menstrual cycle
2016 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Endometrial receptivity is crucial for implantation and establishment of a normal pregnancy. The shift from proliferative to receptive endometrium is still far from understood. In this paper we comprehensively present the transcriptome of the human endometrium by comparing endometrial biopsies from the proliferative phase with consecutive biopsies taken 7-9 days after ovulation. The results show a clear difference in expression between the two time points using both total and small RNA sequencing. In total, 3297 mRNAs, 516 long non-coding RNAs and 102 small non-coding RNAs were identified as significantly differentially expressed between the two time points. We give a thorough description of the change in mRNA between the two time points and report lncRNAs, snoRNAs and snRNAs not previously described in the healthy human endometrium. In conclusion, this paper reports in detail the shift in RNA expression from the proliferative to the receptive endometrium.

Keyword
RNA-seq, endometrium, menstruation cycle, differential expression, small RNA, total RNA
National Category
Endocrinology and Diabetes
Identifiers
urn:nbn:se:kth:diva-184156 (URN)
Funder
Swedish Research Council
Note

QC 20161124

Available from: 2016-03-29. Created: 2016-03-29. Last updated: 2016-11-24. Bibliographically approved

Open Access in DiVA

fulltext (4981 kB), 149 downloads
File information
File name: FULLTEXT01.pdf
File size: 4981 kB
Checksum (SHA-512): 60a58a1f57205d992f2a4f4febb35e9d73ffd98447491c4da0c2ea68829fd4d18ab8ed23696c20cbecb06c2d8e2978cd98b3c82fe4a96a8c12615f2e8fed6dec
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Sigurgeirsson, Benjamín
By organisation
Gene Technology
Bioinformatics and Systems Biology
