Digitala Vetenskapliga Arkivet

Predicting the route: from protein sequence to sorting in eukaryotic cell
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory.
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins must be localised in the correct compartment of a eukaryotic cell to function correctly, so each protein needs to be transported to the right location. Specific signals present in the protein sequence direct proteins to different subcellular localisations. Correct transport is essential for the life of the cell, while errors during transport can cause irreversible damage and interfere with the activities of surrounding proteins. For more than 30 years, the development of methods to identify the localisation of proteins using both experimental and computational approaches has been an important research area. The objective of this thesis is to develop better computational methods for the classification of the subcellular localisation of eukaryotic proteins. I first describe the development of a consensus method, SubCons, which improves the subcellular localisation prediction of human proteins. Next, I present the SubCons web-server as well as an additional benchmark using protein annotation from novel mass-spectrometry studies in two eukaryotic organisms, Mus musculus and Drosophila melanogaster. Then, I present the new version of TargetP and show how deep learning can improve the identification of N-terminal sorting signals by focusing on relevant biological signatures. Finally, I describe the development of a novel method for sub-nuclear localisation prediction. Here, I show that the performance of a deep convolutional neural network improves when it is trained on a dataset augmented with homologous proteins.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2019. p. 65
Keywords [en]
eukaryotic cell, sorting signals, subcellular localisation, machine learning, biological sequence analysis, bioinformatics
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-171434
ISBN: 978-91-7797-801-5 (print)
ISBN: 978-91-7797-802-2 (electronic)
OAI: oai:DiVA.org:su-171434
DiVA, id: diva2:1341238
Public defence
2019-09-27, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 4: Manuscript.

Available from: 2019-09-04 Created: 2019-08-08 Last updated: 2022-02-26 Bibliographically approved
List of papers
1. SubCons: a new ensemble method for improved human subcellular localization predictions
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2464-2470. Article in journal (Refereed) Published
Abstract [en]

Motivation: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein. Unfortunately, large-scale experimental studies are limited in their accuracy. Therefore, the development of prediction methods has been limited by the amount of accurate experimental data. However, recent large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular predictions in human cells. Using these data, we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier. Results: SubCons outperforms earlier methods on a dataset of proteins where two independent methods confirm the subcellular localization. Given nine subcellular localizations, SubCons achieves an F1-score of 0.79 compared to 0.70 for the second-best method. Furthermore, at a false positive rate (FPR) of 1%, the true positive rate (TPR) is over 58% for SubCons compared to less than 50% for the best individual predictor.
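The "TPR at a FPR of 1%" figure quoted in the abstract can be illustrated with a minimal sketch. This is not the evaluation code used in the paper: the function name `tpr_at_fpr`, the binary (one-vs-rest) framing, and the toy labels and scores are all assumptions made for illustration. The idea is to lower the decision threshold step by step and keep the highest true positive rate reached while the false positive rate stays within the allowed limit.

```python
def tpr_at_fpr(labels, scores, max_fpr):
    """Return the highest TPR reachable while keeping FPR <= max_fpr.

    labels: iterable of 0/1 ground-truth values (1 = protein is in the compartment)
    scores: iterable of predictor confidences, higher = more likely positive
    """
    labels = list(labels)
    scores = list(scores)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank by descending score; every distinct score is a candidate threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every example tied at this score before evaluating the rates.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR only grows as the threshold drops further
        best_tpr = tp / positives
    return best_tpr


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
strict = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.0)
relaxed = tpr_at_fpr(toy_labels, toy_scores, max_fpr=0.34)
```

With nine localizations, a measure like this would typically be computed per class in a one-vs-rest fashion; how the paper aggregates the classes is not stated in the abstract.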

National Category
Biological Sciences Environmental Biotechnology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-147084 (URN)
10.1093/bioinformatics/btx219 (DOI)
000407139800005 ()
28407043 (PubMedID)
Available from: 2017-10-16 Created: 2017-10-16 Last updated: 2022-02-28 Bibliographically approved
2. The SubCons webserver: A user friendly web interface for state-of-the-art subcellular localization prediction
2018 (English). In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 27, no 1, p. 195-201. Article in journal (Refereed) Published
Abstract [en]

SubCons is a recently developed method that predicts the subcellular localization of a protein. It combines predictions from four predictors using a Random Forest classifier. Here, we present the user-friendly web-interface implementation of SubCons. Starting from a protein sequence, the server rapidly predicts the subcellular localization of an individual protein. In addition, the server accepts the submission of sets of proteins, either by uploading files or programmatically by using command-line WSDL API scripts. This makes SubCons ideal for proteome-wide analyses, allowing the user to scan a whole proteome in a few days. From the web page, it is also possible to download precalculated predictions for several eukaryotic organisms. To evaluate the performance of SubCons, we present a benchmark of LocTree3 and SubCons using two recent mass-spectrometry-based datasets of mouse and Drosophila proteins. The server is available at http://subcons.bioinfo.se/

Keywords
subcellular localization, sequence analysis, machine learning
National Category
Biochemistry Molecular Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-152492 (URN)
10.1002/pro.3297 (DOI)
000418254300019 ()
28901589 (PubMedID)
Available from: 2018-02-07 Created: 2018-02-07 Last updated: 2025-02-20 Bibliographically approved
3. Detecting Novel Sequence Signals in Targeting Peptides Using Deep Learning
(English). Manuscript (preprint) (Other academic)
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-171426 (URN)
Available from: 2019-08-07 Created: 2019-08-07 Last updated: 2022-02-26 Bibliographically approved
4. Improved sub-nuclear prediction by Deep Learning using an augmented dataset
(English). Manuscript (preprint) (Other academic)
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-171425 (URN)
Available from: 2019-08-07 Created: 2019-08-07 Last updated: 2022-02-26 Bibliographically approved

Open Access in DiVA

Predicting the route: from protein sequence to sorting in eukaryotic cell (10223 kB), 1442 downloads
File information
File name: FULLTEXT01.pdf
File size: 10223 kB
Checksum (SHA-512): 86e333b7cc084e1796291b68338c42788cfe034c2260e816850cb48bd449842f077e94f8ed6bf46b0451090de57c7d7ee36388c31e02554b150ee8ee337fccd7
Type: fulltext
Mimetype: application/pdf
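
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_record` are illustrative, and the local path passed to them is whatever location the reader saved the PDF to.

```python
import hashlib

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "86e333b7cc084e1796291b68338c42788cfe034c2260e816850cb48bd449842f"
    "077e94f8ed6bf46b0451090de57c7d7ee36388c31e02554b150ee8ee337fccd7"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large PDF is never loaded into memory at once."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file at `path` has the checksum listed in the record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

For example, `matches_record("FULLTEXT01.pdf")` would return `True` only if the local file is byte-for-byte identical to the archived one.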

Search in DiVA

By author/editor
Salvatore, Marco
By organisation
Department of Biochemistry and Biophysics
Bioinformatics (Computational Biology)
