rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering.
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0002-3772-8279
2019 (English). In: PLoS ONE, E-ISSN 1932-6203, Vol. 14, no 8, article id e0220182. Article in journal (Refereed). Published
Abstract [en]

In recent decades, the bioinformatics community has made huge efforts to develop machine learning-based methods for predicting structural features of proteins, in the hope of answering fundamental questions about how proteins function and their involvement in several illnesses. The recent advent of Deep Learning has renewed interest in neural networks, with dozens of methods being developed to take advantage of these new architectures. However, most methods still rely heavily on pre-processing of the input data, as well as on the extraction and integration of multiple hand-picked, manually designed features. Multiple Sequence Alignments (MSA) are the most common source of information in de novo prediction methods. Deep Networks that automatically refine the MSA and extract useful features from it would be immensely powerful. In this work, we propose a new paradigm for the prediction of protein structural features called rawMSA. The core idea behind rawMSA is borrowed from the field of natural language processing: amino acid sequences are mapped into an adaptively learned continuous space. This allows the whole MSA to be input into a Deep Network, thus rendering pre-calculated features such as sequence profiles and other features calculated from MSA obsolete. We showcased the rawMSA methodology on three different prediction problems: secondary structure, relative solvent accessibility and inter-residue contact maps. We have rigorously trained and benchmarked rawMSA on a large set of proteins and have determined that it outperforms classical methods based on position-specific scoring matrices (PSSM) when predicting secondary structure and solvent accessibility, while performing on par with methods using more pre-calculated features in the inter-residue contact map prediction category in CASP12 and CASP13. This clearly demonstrates that rawMSA represents a promising development that can pave the way for improved methods using rawMSA instead of sequence profiles to represent evolutionary information in the coming years.
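
The mapping of amino acids into a continuous space, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' implementation: the alphabet, the embedding dimension, the toy alignment and the helper names `encode_msa` and `embed` are assumptions made for illustration, and the embedding table is randomly initialised here, whereas in a trained network it would be learned together with the other weights.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def encode_msa(msa):
    """Convert aligned sequences into an integer matrix of shape (n_seqs, length)."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return np.array([[INDEX[aa] for aa in seq] for seq in msa], dtype=np.int64)


def embed(encoded, table):
    """Look up one vector per residue: (n_seqs, length) -> (n_seqs, length, dim)."""
    return table[encoded]


rng = np.random.default_rng(0)
EMBED_DIM = 4  # illustrative value, not taken from the paper

# Random stand-in for an embedding table that a network would learn.
table = rng.normal(size=(len(ALPHABET), EMBED_DIM))

# Toy alignment (hypothetical sequences, not real data).
msa = ["MKV-L", "MRVAL", "MKI-L"]
encoded = encode_msa(msa)
embedded = embed(encoded, table)
print(encoded.shape, embedded.shape)  # (3, 5) (3, 5, 4)
```

The point of the sketch is that the raw alignment itself, rather than a profile computed from it, becomes the numeric input that later network layers operate on.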

Place, publisher, year, edition, pages
PUBLIC LIBRARY SCIENCE, 2019. Vol. 14, no 8, article id e0220182
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:liu:diva-161199
DOI: 10.1371/journal.pone.0220182
ISI: 000485017200012
PubMedID: 31415569
OAI: oai:DiVA.org:liu-161199
DiVA, id: diva2:1365676
Note

Funding Agencies: Swedish Research Council [2016-05369]; Foundation Blanceflor Boncompagni Ludovisi, nee Bildt; Nvidia Corporation

Available from: 2019-10-25 Created: 2019-10-25 Last updated: 2019-12-10

Open Access in DiVA

fulltext (1134 kB), 11 downloads
File information
File name: FULLTEXT01.pdf
File size: 1134 kB
Checksum (SHA-512):
c793604adeaaad24991390824b4751d31f73aeada71205babdfa2cfe6c91a35a218175d2740ba1131d109191e6d83e790d29d9cbd80a6969f55802f26461ce15
Type: fulltext
Mimetype: application/pdf


Search in DiVA

By author/editor
Mirabello, Claudio; Wallner, Björn
By organisation
Bioinformatics; Faculty of Science & Engineering