NOVEL APPROACH TO STORAGE AND SORTING OF NEXT GENERATION SEQUENCING DATA FOR THE PURPOSE OF FUNCTIONAL ANNOTATION TRANSFER
Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits
Student thesis
The problem of functional annotation of novel sequences has been a significant issue for many laboratories that decided to apply next generation sequencing techniques to less studied species. In particular, experiments such as transcriptome analysis suffer heavily from this problem because their results cannot be placed in a relevant biological context. Several tools have been proposed to solve this problem through homology annotation transfer. The principle behind this strategy is that homologous genes share common functions in different organisms, and therefore annotations are transferable between these genes. Commonly, BLAST reports are used to identify a suitable homologous gene in a well annotated species, and the annotation is then transferred from the homologue to the novel sequence. Not all homologues, however, possess valid functional annotations. The aim of this project was to devise an algorithm to process BLAST reports and provide a criterion to discriminate between homologues with a biologically informative annotation and those with an uninformative one. In addition, all data obtained from the BLAST report is to be stored in a relational database for ease of consultation and visualization. To test the robustness of the system, we used 750 novel sequences obtained by applying next generation sequencing techniques to Avena sativa samples. This species particularly suits our needs as it represents the typical target for homology annotation transfer: it lacks a reference genome and is difficult to annotate functionally. The system was able to perform all the required tasks. Comparisons between best hits as determined by BLAST and best hits as determined by the algorithm showed a significant increase in the biological significance of the results when the algorithm's sorting system was applied.
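The workflow summarized in the abstract (parse BLAST hits, separate informative from uninformative annotations, store everything in a relational database, then pick a best hit) can be illustrated with a minimal sketch. The abstract does not state the actual criterion or schema, so everything here is an assumption made for illustration: the keyword list in `UNINFORMATIVE_TERMS`, the `hit` table layout, the function names, and the example hits are all hypothetical, and SQLite stands in for whichever database system the thesis used.

```python
import sqlite3

# Assumed markers of an uninformative annotation (illustrative only,
# not the criterion used in the thesis).
UNINFORMATIVE_TERMS = ("hypothetical", "predicted protein", "unknown",
                       "uncharacterized", "unnamed protein product")


def is_informative(description):
    """Return True if the hit description contains none of the assumed markers."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE_TERMS)


def store_hits(conn, hits):
    """Store (query, subject, description, evalue, bitscore) tuples in a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hit ("
        "query_id TEXT, subject_id TEXT, description TEXT, "
        "evalue REAL, bitscore REAL, informative INTEGER)"
    )
    conn.executemany(
        "INSERT INTO hit VALUES (?, ?, ?, ?, ?, ?)",
        [(q, s, d, e, b, int(is_informative(d))) for q, s, d, e, b in hits],
    )
    conn.commit()


def best_hit(conn, query_id):
    """Prefer informative hits; break ties by the highest bit score."""
    return conn.execute(
        "SELECT subject_id, description FROM hit WHERE query_id = ? "
        "ORDER BY informative DESC, bitscore DESC LIMIT 1",
        (query_id,),
    ).fetchone()


# Hypothetical hits for one novel sequence: BLAST alone would rank subjA first.
hits = [
    ("seq1", "subjA", "hypothetical protein", 1e-50, 200.0),
    ("seq1", "subjB", "sucrose synthase 2", 1e-45, 180.0),
]
conn = sqlite3.connect(":memory:")
store_hits(conn, hits)
print(best_hit(conn, "seq1"))
```

In this toy case the top-scoring BLAST hit carries no usable function, so the sorted query returns the slightly lower-scoring but annotated homologue instead. If no informative hit exists, the query falls back to the best-scoring hit overall.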
Place, publisher, year, edition, pages
2012. 47 p.
homology annotation transfer, blast parsing, relational database, functional information
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:his:diva-6043
OAI: oai:DiVA.org:his-6043
DiVA: diva2:536688
Subject / course
Bioinformatics - Master’s Programme
2012-06-05, Skovde, 14:20 (English)
Uppsok
Life Earth Science
Lindlof, Angelica, Dr
Lubovac, Zelmina, Dr