Evolution of protein indels in plants, animals and fungi
Affiliation (both authors): Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
2013 (English). In: BMC Evolutionary Biology, ISSN 1471-2148, Vol. 13, article 140. Article in journal (Refereed), published.
Abstract [en]

Background: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.

Results: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with 1 aa the most frequent size class. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs) and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold.

Conclusions: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, simple indels in universal proteins appear to be too rare and too homoplasy-rich to be used for purely indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than deletions do.
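The insertion excess rests on polarizing singleton indels: a present/absent state found in only one taxon is read as a single change on that taxon's branch, assuming the majority state is ancestral. The abstract does not spell out the exact procedure, so the following is only a minimal Python sketch of that reading, not the authors' pipeline. It assumes each indel region is given as a mapping from taxon name to its aligned segment, with an all-gap segment meaning "absent"; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def indel_state(segment: str) -> str:
    """Collapse an aligned segment to 'absent' (only gaps) or 'present'."""
    return "absent" if set(segment) == {"-"} else "present"


def classify_singleton(region: Dict[str, str]) -> Optional[str]:
    """Call a present/absent indel region a singleton insertion or deletion.

    Returns None when the region is not a singleton, or when it cannot be
    polarized (e.g. only two taxa, one present and one absent).
    Assumption: the majority state is treated as ancestral.
    """
    states = [indel_state(seg) for seg in region.values()]
    n_present = states.count("present")
    n_absent = states.count("absent")
    if n_present == 1 and n_absent > 1:
        return "insertion"   # one taxon carries residues all others lack
    if n_absent == 1 and n_present > 1:
        return "deletion"    # one taxon lacks residues all others share
    return None


def insertion_deletion_ratio(regions: List[Dict[str, str]]) -> float:
    """Ratio of singleton insertions to singleton deletions across regions."""
    calls = [classify_singleton(r) for r in regions]
    n_ins, n_del = calls.count("insertion"), calls.count("deletion")
    if n_del == 0:
        raise ValueError("no singleton deletions; ratio is undefined")
    return n_ins / n_del
```

For example, a region where only taxon A has residues ("KL" against "--" in three other taxa) counts as an insertion in A, while a region shared by two of four taxa is not a singleton and is ignored.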

Place, publisher, year, edition, pages
2013. Vol. 13, article 140.
Keyword [en]
Indels, Rare genomic changes, Phylogeny, Insertion/deletion, Multiple sequence alignment, Eukaryote evolution, Indel profiles
National Category
Natural Sciences
URN: urn:nbn:se:uu:diva-204971
DOI: 10.1186/1471-2148-13-140
ISI: 000321461800001
OAI: diva2:641383
Available from: 2013-08-16. Created: 2013-08-13. Last updated: 2014-04-17. Bibliographically approved.
In thesis
1. Mine the Gaps: Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to effectively study them. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis.
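SeqFIRE's own detection criteria are not described in this summary. As a rough illustration of the first step, locating candidate indel regions in a protein alignment, the sketch below (hypothetical function name) reports maximal runs of alignment columns that contain at least one gap. It assumes gaps are written as "-" and omits what a real tool would need, such as conserved flanking columns and rules for splitting adjacent, independent events.

```python
from typing import Dict, List, Optional, Tuple


def gapped_column_runs(alignment: Dict[str, str]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of consecutive alignment
    columns that contain at least one gap ('-')."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for col in range(length):
        gapped = any(s[col] == "-" for s in seqs)
        if gapped and start is None:
            start = col                  # a gapped run begins
        elif not gapped and start is not None:
            runs.append((start, col))    # the run ended at the previous column
            start = None
    if start is not None:                # run extends to the alignment end
        runs.append((start, length))
    return runs
```

Each returned interval could then be sliced out of every sequence to give the per-taxon segments that later steps classify.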

I then used SeqFIRE to build an indel database from 299 single-copy proteins drawn from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single-amino-acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (on average 2.31 times as many insertions as deletions). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade-defining indels, or CDIs), but none for Metazoa.
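An exponential decrease of frequency with length means counts follow roughly count = a * exp(-b * length). The summary does not say how this relationship was estimated, so the sketch below assumes one common choice, ordinary least squares on log counts; the function name is hypothetical, and the self-check uses synthetic counts generated from known parameters, not data from the study.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(lengths: Sequence[float],
                          counts: Sequence[float]) -> Tuple[float, float]:
    """Fit counts ~ a * exp(-b * length) by least squares on log(counts).

    Returns (a, b). Zero counts must be removed first, since log(0) is undefined.
    """
    if len(lengths) != len(counts) or len(lengths) < 2:
        raise ValueError("need at least two paired (length, count) values")
    xs = [float(x) for x in lengths]
    ys = [math.log(c) for c in counts]      # linearize: log c = log a - b * x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("lengths must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Self-check on synthetic counts generated from a = 400, b = 0.6 (not study data).
demo_lengths = [1, 2, 3, 4, 5, 6, 7]
demo_counts = [400 * math.exp(-0.6 * L) for L in demo_lengths]
a_hat, b_hat = fit_exponential_decay(demo_lengths, demo_counts)
print(round(a_hat, 3), round(b_hat, 3))
```

Fitting in log space is simple but gives rare, long indels the same weight as common short ones; a Poisson regression on the raw counts is a common alternative when counts are small.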

To study the 3,806 complex indels, I first classified them by number of states. Analysis of the 2-state complex and simple indels combined ("bi-state indels") confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels have three to nine states ("slightly complex indels"). I also developed a tree-assisted search method, which identified 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi.
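Classifying indels by number of states can be sketched as counting the distinct gap patterns in a region. The definitions below are assumptions for illustration, not the thesis's exact rules: a "state" is a distinct residue/gap mask, "simple" means exactly two states of which one is entirely gaps and one entirely residues, and the label for more than nine states is a hypothetical placeholder. Function names are likewise hypothetical.

```python
from typing import Dict


def gap_pattern(segment: str) -> str:
    """Reduce an aligned segment to a residue/gap mask, e.g. 'AK-G' -> 'XX-X'."""
    return "".join("-" if ch == "-" else "X" for ch in segment)


def classify_indel(region: Dict[str, str]) -> str:
    """Classify an indel region by its number of distinct gap patterns."""
    patterns = {gap_pattern(seg) for seg in region.values()}
    n_states = len(patterns)
    if n_states < 2:
        return "not an indel"                      # every taxon looks the same
    if n_states == 2:
        all_gap = any("X" not in p for p in patterns)
        gap_free = any("-" not in p for p in patterns)
        # Present/absent counts as simple; any other 2-pattern case is complex.
        return "simple" if (all_gap and gap_free) else "complex, 2 states"
    if n_states <= 9:
        return "slightly complex"                  # three to nine states
    return "complex, >9 states"


def is_bi_state(region: Dict[str, str]) -> bool:
    """Simple indels plus 2-state complex indels ('bi-state indels')."""
    return classify_indel(region) in ("simple", "complex, 2 states")
```

For example, "KLM" in two taxa against "---" in a third is simple, while adding a fourth taxon with "K--" makes the region slightly complex.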

Forty-two proteins were also found to host complex-indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian-specific CDIs. Parsimony analysis of these indels places Ctenophora as the sister taxon to all other Metazoa, including Porifera. Six CDIs also place Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and that my tree-assisted search strategy could be automated and implemented in SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast amount of protist genome data soon to become available.
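Parsimony analysis of indel characters scores each candidate tree by the minimum number of state changes the indels require on it. The summary does not name the software used; the sketch below is a standard Fitch small-parsimony implementation, assuming each indel is coded as a character (for example "1" present, "0" absent) and trees are rooted binary trees written as nested tuples. The taxa A to D in the usage example are placeholders, not the thesis data.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a taxon name (leaf) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one character.

    Returns the candidate state set at this node and the number of
    state changes required in the subtree below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one extra change


def tree_length(tree: Tree, characters: List[Dict[str, str]]) -> int:
    """Total parsimony length of a tree over all indel characters."""
    return sum(fitch(tree, ch)[1] for ch in characters)


# Usage: one indel shared by A and B fits ((A,B),C),D better than ((A,C),B),D.
indel = {"A": "1", "B": "1", "C": "0", "D": "0"}
tree_ab = ((("A", "B"), "C"), "D")
tree_ac = ((("A", "C"), "B"), "D")
print(tree_length(tree_ab, [indel]), tree_length(tree_ac, [indel]))
```

The tree with the smaller total length is preferred under parsimony; the Fitch count does not depend on where the tree is rooted.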

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2014. 58 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1131
indel, insertion/deletion, protein evolution, bioinformatics, non-bilateria, eukaryotes, phylogeny
National Category
Bioinformatics and Systems Biology Biological Systematics
Research subject
Biology with specialization in Systematics; Biology with specialization in Molecular Evolution
urn:nbn:se:uu:diva-220727 (URN)
978-91-554-8904-5 (ISBN)
Public defence
2014-05-07, Lindahlsalen, Norbyvägen 18, Uppsala, 10:00 (English)
Available from: 2014-04-15. Created: 2014-03-19. Last updated: 2014-04-29. Bibliographically approved.

By author/editor
Ajawatanawong, Pravech; Baldauf, Sandra L.