Seqenv: linking sequences to environments through text mining
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Limnology.
Univ Glasgow, Sch Engn, Infrastruct & Environm Res Div, Glasgow, Lanark, Scotland.
Univ Copenhagen, Fac Hlth & Med Sci, Novo Nordisk Fdn Ctr Prot Res, Copenhagen, Denmark.
Curtin Univ Technol, Dept Chem, WA OIGC, Bentley, WA, Australia.
2016 (English). In: PeerJ, ISSN 2167-8359, E-ISSN 2167-8359, Vol. 4, e2690. Article in journal (Refereed), Published
Abstract [en]

Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, out of every hit, extracts, if it is available, the textual metadata field. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography.
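
The last two steps described in the abstract (matching isolation-source text against a controlled vocabulary, then summarizing a sample by weighted summation) can be illustrated with a minimal Python sketch. This is not seqenv's actual code or API: the tiny vocabulary, the function names, and the example data are all hypothetical, and simple whole-word matching stands in for the text mining algorithm the authors used. The similarity search against "nt" is assumed to have already produced the isolation-source strings.

```python
import re
from collections import Counter

# Hypothetical mini-vocabulary standing in for EnvO terms (not the real ontology).
VOCAB = {"soil", "lake", "marine sediment", "sea water"}


def match_terms(text, vocab=VOCAB):
    """Return vocabulary terms that occur as whole words in a free-text field."""
    lowered = text.lower()
    return [
        term
        for term in sorted(vocab)
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]


def sequence_profile(isolation_sources):
    """Normalized term frequencies over all search hits of one sequence."""
    counts = Counter()
    for text in isolation_sources:
        counts.update(match_terms(text))
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def sample_profile(seq_profiles, abundances):
    """Abundance-weighted sum of per-sequence profiles for one sample."""
    total = sum(abundances.values())
    if total == 0:
        return {}
    summed = Counter()
    for seq_id, profile in seq_profiles.items():
        weight = abundances.get(seq_id, 0) / total
        for term, freq in profile.items():
            summed[term] += weight * freq
    return dict(summed)


if __name__ == "__main__":
    # Made-up isolation sources per sequence, as if taken from search hits.
    hits = {
        "otu1": ["forest soil", "agricultural soil", "lake water"],
        "otu2": ["Black Sea marine sediment"],
    }
    profiles = {seq_id: sequence_profile(texts) for seq_id, texts in hits.items()}
    print(sample_profile(profiles, {"otu1": 30, "otu2": 10}))
```

Whole-word matching is a deliberate simplification: it avoids false hits such as "lake" inside "snowflake", but it also misses plurals and synonyms that a real ontology-aware tagger would be expected to handle.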

Place, publisher, year, edition, pages
2016. Vol. 4, e2690
Keyword [en]
Bioinformatics, Ecology, Microbiology, Genomics, Sequence analysis, Text processing, Statistics, Pipeline, Open source software
National Category
Bioinformatics and Systems Biology; Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-313528
DOI: 10.7717/peerj.2690
ISI: 000390057600002
OAI: oai:DiVA.org:uu-313528
DiVA: diva2:1071014
Funder
Swedish Foundation for Strategic Research, ICA10-0015
NERC - the Natural Environment Research Council, NE/L011956/1
Novo Nordisk, NNF14CC0001
EU, FP7, Seventh Framework Programme, 264089
Available from: 2017-02-02. Created: 2017-01-20. Last updated: 2018-01-13. Bibliographically approved

Open Access in DiVA

fulltext (2551 kB), 23 downloads
File information
File name: FULLTEXT01.pdf
File size: 2551 kB
Checksum (SHA-512): dd9d2216c3bd614836fe8b95e6ba850e67757ed7d09e65655e3941b3c4d687ab2d740ffbab4be4c5f6dd153d19a16b193527848d8479b1e46fcbf5fe449f3826
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Sinclair, Lucas; Eiler, Alexander