ConDeTri: A content dependent read trimmer for Illumina data
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
2011 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 6, no 10, e26314. Article in journal (Refereed), Published
Abstract [en]

During the last few years, DNA and RNA sequencing have started to play an increasingly important role in biological and medical applications, especially due to the greater amount of sequencing data yielded by the new sequencing machines and the enormous decrease in sequencing costs. In particular, Illumina/Solexa sequencing has had an increasing impact on gathering data from model and non-model organisms. However, accurate and easy-to-use tools for quality filtering have not yet been established. We present ConDeTri, a method for content-dependent read trimming of next-generation sequencing data using the quality score of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. Another aspect of the method is to incorporate read trimming into next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequence data of arbitrary length, and it is independent of sequencing coverage and user interaction. ConDeTri is able to trim and remove reads with low quality scores to save computational time and memory usage during de novo assemblies. Low-coverage or large-genome sequencing projects will especially gain from trimming reads. The method can easily be incorporated into preprocessing and analysis pipelines for Illumina data.

Availability and implementation:

Freely available on the web at

Place, publisher, year, edition, pages
2011. Vol. 6, no 10, e26314.
Keyword [en]
Next Generation Sequencing, Software, Sequencing Errors
National Category
Bioinformatics and Systems Biology
Research subject
URN: urn:nbn:se:uu:diva-159761; DOI: 10.1371/journal.pone.0026314; ISI: 000296507500049; OAI: diva2:446816
Available from: 2011-10-10 Created: 2011-10-10 Last updated: 2011-11-30. Bibliographically approved
In thesis
1. Birds as a Model for Comparative Genomic Studies
2011 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Comparative genomics provides a tool to investigate large biological datasets, i.e. genomic datasets. In my thesis I focused on inferring patterns of selection in coding and non-coding regions of avian genomes. Until recently, large comparative studies on selection were mainly restricted to model species with sequenced genomes. This limitation has been overcome with advances in sequencing technologies, and it is now possible to gather large genomic datasets for non-model species.

Next-generation sequencing data was used to study patterns of nucleotide substitutions, and from this we inferred how selection has acted in the genomes of 10 non-model bird species. In general, we found evidence for a negative correlation between neutral substitution rate and chromosome size in birds. In a follow-up study, we investigated expression levels in different tissues and patterns of selection in two closely related bird species. We found that between 2% and 18% of all genes were differentially expressed between the two species.

We showed that non-coding regions adjacent to genes are under evolutionary constraint in birds, which suggests that non-coding DNA plays an important functional role in the genome. Regions downstream of genes (3’) showed a particularly high level of constraint. The level of constraint in these regions was not correlated with the length of untranslated regions, which suggests that other causes also play a role in sequence conservation.

We compared the rate of nonsynonymous substitutions to the rate of synonymous substitutions in order to infer levels of selection in protein-coding sequences. Synonymous substitutions are often assumed to evolve neutrally. We studied synonymous substitutions by estimating constraint on 4-fold degenerate sites of avian genes and found significant evolutionary constraint on this category of sites (between 24% and 43%). These results call for a reappraisal of synonymous substitution rates being used as neutral standards in molecular evolutionary analysis (e.g. the dN/dS ratio to infer positive selection).

Finally, the problem of sequencing errors in next-generation sequencing data was investigated. We developed a program that removes erroneous bases from the reads. We showed that low coverage sequencing projects and large genome sequencing projects will especially gain from trimming erroneous reads.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2011. 62 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 868
Birds, Selection, Gene expression, Sequence evolution, Next-generation sequencing, Comparative genomics, Molecular evolution, Genomics, Substitution Rates, Non-coding DNA
National Category
Evolutionary Biology Bioinformatics and Systems Biology
Research subject
Biology with specialization in Molecular Evolution
urn:nbn:se:uu:diva-159766 (URN); 978-91-554-8186-5 (ISBN)
Public defence
2011-11-25, Lindahlsalen, Evolutionary Biology Centre, Norbyvägen 18A, Uppsala, 13:00 (English)
Available from: 2011-11-04 Created: 2011-10-10 Last updated: 2011-11-10. Bibliographically approved

By author/editor
Smeds, Linnea; Künstner, Axel