Discovering viral genomes in human metagenomic data by predicting unknown protein families
Karolinska Inst, Sweden.
Stockholm Univ, Sweden.
Karolinska Inst, Sweden.
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
2018 (English). In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 8, article id 28. Article in journal (Refereed). Published.
Abstract [en]

Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows little or no homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up of a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the Microviridae family; hence it was named bacteriophage HFM.
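The general idea of similarity-based clustering followed by filtering can be illustrated with a toy sketch. This is not the authors' pipeline: the k-mer Jaccard similarity, the single-linkage grouping, the thresholds, and all function and parameter names (`cluster_orfs`, `filter_families`, `min_similarity`, `min_members`, `known_hits`) are assumptions chosen only for illustration. A real analysis would use alignment-based similarity and the filters described in the paper itself.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_orfs(orfs, k=3, min_similarity=0.5):
    """Group ORFs by single-linkage clustering on k-mer similarity.

    orfs: dict mapping an ORF id to its (hypothetical) protein sequence.
    Returns a list of sets of ORF ids.
    """
    parent = {name: name for name in orfs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(seq, k) for name, seq in orfs.items()}
    for a, b in combinations(orfs, 2):
        if jaccard(profiles[a], profiles[b]) >= min_similarity:
            parent[find(a)] = find(b)

    clusters = {}
    for name in orfs:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


def filter_families(clusters, known_hits, min_members=2):
    """Keep clusters that look like candidate ORFan families.

    Illustrative filters only: a cluster is kept if it has at least
    min_members ORFs and none of them is in known_hits (ids that
    already match a known sequence).
    """
    return [
        members for members in clusters
        if len(members) >= min_members and not (members & known_hits)
    ]


# Made-up sequences: "a" and "b" differ in one residue, "c" is unrelated.
example_orfs = {"a": "MKTAYIAKQR", "b": "MKTAYIAKQK", "c": "GGGGSSSSPP"}
candidates = filter_families(cluster_orfs(example_orfs), known_hits=set())
```

In this toy input, `a` and `b` fall into one cluster and survive the filters, while the singleton `c` is discarded by the minimum-size filter.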

Place, publisher, year, edition, pages
Nature Publishing Group, 2018. Vol. 8, article id 28
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:liu:diva-144887
DOI: 10.1038/s41598-017-18341-7
ISI: 000419441300028
PubMedID: 29311716
OAI: oai:DiVA.org:liu-144887
DiVA, id: diva2:1181673
Note

Funding agencies: Swedish Research Council; Knut and Alice Wallenberg Foundation

Available from: 2018-02-09 Created: 2018-02-09 Last updated: 2018-03-05

By author/editor
Lysholm, Fredrik