Automated annotation of protein families
Linköping University, Department of Physics, Chemistry and Biology, Bioinformatics.
2011 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Automatiserad annotering av proteinfamiljer (Swedish)
Abstract [en]

Introduction: The central challenge in bioinformatics is data integration. The amount of available data keeps increasing, and there is no common, unified standard for where or how the data should be stored. The aim of this work is to build an automated tool that annotates the different member families within the protein superfamily of medium-chain dehydrogenases/reductases (MDR) by finding properties common to the member proteins. The goal is to increase the understanding of the MDR superfamily as well as of its individual member families. This adds to the knowledge gained for free when a new, unannotated protein is matched as a member of a specific MDR member family.
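The core idea of annotating a family by the properties its members share can be sketched in Python, the language the tool was written in. This is a minimal illustration under stated assumptions, not the thesis's code: the input format (a mapping from protein identifier to a set of annotation strings), the function name `common_properties`, the 80 % default threshold and the toy data are all hypothetical.

```python
from collections import Counter


def common_properties(family, min_fraction=0.8):
    """Return {property: fraction} for properties shared by >= min_fraction of members.

    `family` maps a protein identifier to the set of annotation strings collected
    for it from the integrated sources (a hypothetical input format).
    """
    if not family:
        return {}
    counts = Counter()
    for annotations in family.values():
        counts.update(set(annotations))  # count each property at most once per protein
    size = len(family)
    return {prop: n / size for prop, n in counts.items() if n / size >= min_fraction}


# Toy data invented for illustration; these are not results from the thesis.
toy_family = {
    "P1": {"Pfam:ADH_N", "Pfam:ADH_zinc_N", "cofactor:NAD"},
    "P2": {"Pfam:ADH_N", "cofactor:NADP"},
    "P3": {"Pfam:ADH_N", "Pfam:ADH_zinc_N", "cofactor:NAD"},
}
print(sorted(common_properties(toy_family, min_fraction=0.6)))
# ['Pfam:ADH_N', 'Pfam:ADH_zinc_N', 'cofactor:NAD']
```

In this sketch, a protein later matched to the family would simply inherit the returned properties as candidate annotations.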

Method: Each type of available data required its own handling. Textual data was mainly compared as strings, while numeric data required special handling such as statistical calculations. Ontological data was handled as tree nodes, where ancestry between terms had to be taken into account. The tool was implemented as a plugin-based system so that it can easily be extended with additional data sources of different types.
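A minimal sketch of such a plugin-based design follows, assuming one summariser function per data type registered in a dictionary. The names `plugin`, `PLUGINS`, `summarise`, the data-type keys and the toy ontology are hypothetical. The ontology is assumed to be a strict tree given as a child-to-parent mapping; real ontologies such as GO are directed acyclic graphs, where a term can have several parents.

```python
import statistics
from typing import Callable, Dict

# Registry of summariser plugins, keyed by a (hypothetical) data-type name.
PLUGINS: Dict[str, Callable[..., object]] = {}


def plugin(data_type: str):
    """Decorator that registers a summariser for one kind of data."""
    def register(func):
        PLUGINS[data_type] = func
        return func
    return register


def summarise(data_type, values, **context):
    """Dispatch one source's per-protein values to the matching plugin."""
    return PLUGINS[data_type](values, **context)


@plugin("text")
def summarise_text(values, **_):
    """Textual data: normalise and compare as strings; keep values every member has."""
    sets = [{s.strip().lower() for s in member} for member in values]
    return set.intersection(*sets) if sets else set()


@plugin("numeric")
def summarise_numeric(values, **_):
    """Numeric data: mean and sample standard deviation over members with a value."""
    nums = [float(v) for v in values if v is not None]
    if len(nums) < 2:
        return None  # too few values to estimate a spread
    return {"n": len(nums), "mean": statistics.mean(nums), "stdev": statistics.stdev(nums)}


def ancestors(term, parent):
    """The term itself plus every ancestor, following a child -> parent mapping."""
    seen = set()
    while term is not None and term not in seen:
        seen.add(term)
        term = parent.get(term)
    return seen


@plugin("ontology")
def summarise_ontology(values, parent=None, **_):
    """Ontology terms: a member annotated with a term implicitly has all its
    ancestors; keep the most specific terms shared by every member."""
    parent = parent or {}
    closures = []
    for member_terms in values:
        closure = set()
        for term in member_terms:
            closure |= ancestors(term, parent)
        closures.append(closure)
    if not closures:
        return set()
    shared = set.intersection(*closures)
    # A shared term is redundant if it is a strict ancestor of another shared term.
    redundant = set()
    for term in shared:
        redundant |= ancestors(term, parent) - {term}
    return shared - redundant


# Toy ontology (invented term names), child -> parent.
toy_parent = {"ADH": "oxidoreductase", "QOR": "oxidoreductase", "oxidoreductase": "catalytic"}
print(summarise("ontology", [["ADH"], ["QOR"], ["ADH", "QOR"]], parent=toy_parent))
# {'oxidoreductase'}
```

In this design, supporting a new data source only means registering another function with `@plugin(...)`; the dispatcher itself does not change.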

Results: The biggest challenge was data incompleteness, which yielded little (or no) output for some families and thus lowered the statistical significance of the findings. The results show that all human and mouse MDR members have a Pfam ADH domain (ADH_N and/or ADH_zinc_N) and take part in an oxidation-reduction process, often with NAD or NADP as cofactor. Many of the proteins contain zinc and are expressed in liver tissue.
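Why incompleteness weakens significance can be made concrete with an enrichment test. The abstract does not name the statistic the tool used; the sketch below assumes a one-sided hypergeometric test, and the function name `enrichment_p_value` and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: family members (among those with data) that have the property
    n: family members that have any data from this source
    K: background proteins (among those with data) that have the property
    N: background proteins that have any data from this source
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts, not data from the thesis: same 100 % support, different amounts of data.
print(enrichment_p_value(k=10, n=10, K=20, N=100))  # all 10 members have data
print(enrichment_p_value(k=2, n=2, K=20, N=100))    # only 2 members have data
```

With these toy counts, a property seen in all 10 members gives a p-value on the order of 1e-8, whereas the same 100 % support observed in only the 2 members that have any data gives 190/4950, roughly 0.038.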

Conclusions: A Python-based tool has been created to automatically annotate the different MDR member families. The tool is easily extended to new databases, and many of its results agree with information found in the literature. The utility and necessity of the system, as well as the quality of its results, are expected to increase over time even if no further extensions are made, since the system can make more detailed inferences as more data become available.

Place, publisher, year, edition, pages
2011, 47 p.
Keyword [en]
data integration
National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:liu:diva-69393
ISRN: LiTH-IFM-EX--11/2551--SE
OAI: diva2:428331
Presentation
2011-06-17, Jordan-Fermi, Fysikhuset, Campus Valla, Linköping, 11:00 (Swedish)
Available from: 2011-06-30
Created: 2011-06-26
Last updated: 2011-06-30
Bibliographically approved

Open Access in DiVA

fulltext (585 kB)
File information
File name: FULLTEXT01.pdf
File size: 585 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
