Peptide Retention Time Prediction using Artificial Neural Networks
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Peptid retentionstids prediktering med artificiella neuronnät (Swedish)
Abstract [en]

This thesis describes the development and evaluation of an artificial neural network trained to predict the chromatographic retention times of peptides from their amino acid sequences. The purpose of accurately predicting retention times is to increase the number of protein identifications in shotgun proteomics and to improve targeted mass spectrometry experiments. The model presented in this thesis is a branched convolutional neural network (CNN) consisting of two convolutional layers followed by three fully connected layers, all using the leaky rectifier as the activation function. Each amino acid sequence is represented by a 20-by-25 matrix X, with each row corresponding to a certain amino acid and the columns representing the position of the amino acid in the peptide. This model achieves an RMSE corresponding to 3.8% of the total running time of the liquid chromatography and a 95% confidence interval corresponding to 14% of the running time when trained on 20 000 unique peptides from a yeast sample. The CNN predicts retention times slightly more accurately than the software ELUDE when trained on a larger dataset, whereas ELUDE performs better on smaller datasets. The CNN does, however, have a considerably shorter training time.
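The input representation and the error measure described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the row order of the amino acids, the binary (one-hot) entries, the zero padding of unused columns, and the names `encode_peptide`, `rmse_percent_of_run` and `max_len` are all assumptions made for illustration, and the number of columns is left as a parameter.

```python
import numpy as np

# Assumption: rows follow the 20 standard amino acids in alphabetical
# order of their one-letter codes. The thesis may use a different order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ROW_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence, max_len):
    """Encode a peptide as a 20-by-max_len matrix X.

    X[i, j] is 1 if the residue at position j is amino acid i, else 0.
    Columns beyond the end of the peptide stay zero (assumed padding).
    """
    if len(sequence) > max_len:
        raise ValueError("peptide is longer than max_len")
    X = np.zeros((len(AMINO_ACIDS), max_len), dtype=np.float32)
    for j, aa in enumerate(sequence):
        if aa not in ROW_INDEX:
            raise ValueError(f"unknown amino acid: {aa!r}")
        X[ROW_INDEX[aa], j] = 1.0
    return X


def rmse_percent_of_run(y_true, y_pred, run_time):
    """Root-mean-square error expressed as a percentage of the run time."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / run_time


if __name__ == "__main__":
    # Hypothetical peptide and column count, chosen only for the example.
    X = encode_peptide("ACDK", max_len=25)
    print(X.shape)
    # Hypothetical retention times in minutes on a 100-minute run.
    print(rmse_percent_of_run([10.0, 20.0], [13.0, 16.0], run_time=100.0))
```

Keeping the column count as a parameter means the same encoder can be reused if peptides of a different maximum length are admitted; a matrix of this shape is what a convolutional layer would then take as input.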

Abstract [sv] (English translation)

This degree project describes the development and evaluation of an artificial neural network trained to predict the chromatographic retention time of peptides from their amino acid sequence. The purpose of predicting retention times is to identify more peptides in shotgun proteomics experiments and to improve targeted mass spectrometry experiments. The final model in this work is a convolutional neural network (CNN) consisting of two convolutional layers followed by three layers of fully connected neurons, all with the leaky rectifier as the activation function. Each amino acid sequence is represented by a 20x25 matrix X, where each row represents a specific amino acid and the columns describe the position of the amino acid in the peptide. This model achieves a root-mean-square error corresponding to 3.8% of the run time of the liquid chromatography and a 95% confidence interval corresponding to 14% of the run time when the CNN model is trained on 20 000 unique peptides from a yeast sample. The CNN model performs marginally better than the software ELUDE when both are trained on a large dataset, but for limited datasets ELUDE performs better. The CNN model does, however, train considerably faster.

Place, publisher, year, edition, pages
2016.
Series
TRITA-MAT-E, 2016:48
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-190995
OAI: oai:DiVA.org:kth-190995
DiVA: diva2:954471
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2016-08-22
Created: 2016-08-20
Last updated: 2016-08-22
Bibliographically approved

Open Access in DiVA

fulltext (917 kB), 24 downloads
File information
File name: FULLTEXT01.pdf
File size: 917 kB
Checksum SHA-512: 07d60324479b641729743f78bb58f57d1fffc7af01d777112e0915166a6e9a84923b5956e1aeba46602b82487b99eb99b958bd6019cc47d61bf70a77b836ad66
Type: fulltext
Mimetype: application/pdf
