SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
2012 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, W340-W347 p. Article in journal (Refereed) Published
Abstract [en]

Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at
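The two ideas named in the abstract, locating indel regions and scoring columns by entropy, can be illustrated with a minimal sketch. This is not SeqFIRE's implementation: the function names `column_entropy` and `find_indel_regions`, the toy alignment, and the rule that any column containing a gap belongs to an indel region are all assumptions made for illustration.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def find_indel_regions(alignment):
    """Return 0-based inclusive (start, end) ranges of consecutive columns
    that contain at least one gap (an assumed, simplified indel definition)."""
    regions = []
    start = None
    for i, column in enumerate(zip(*alignment)):
        has_gap = "-" in column
        if has_gap and start is None:
            start = i
        elif not has_gap and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(alignment[0]) - 1))
    return regions

# Hypothetical toy alignment: three aligned protein fragments of equal length.
alignment = [
    "MKT--LLVA",
    "MKTGSLLVA",
    "MRT--LIVA",
]
indel_regions = find_indel_regions(alignment)
entropies = [column_entropy(col) for col in zip(*alignment)]
print(indel_regions)
print([round(h, 3) for h in entropies])
```

Low-entropy, gap-free columns would be candidates for conserved blocks, while high-entropy columns hint at fast-evolving sites; the actual thresholds are user-adjustable in the program and are not given in this record.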

Place, publisher, year, edition, pages
2012. Vol. 40, no W1, W340-W347 p.
Keyword [en]
Indels, Alignment, Conserved blocks
National Category
Bioinformatics (Computational Biology) Bioinformatics and Systems Biology
URN: urn:nbn:se:uu:diva-179937 DOI: 10.1093/nar/gks561 ISI: 000306670900056 OAI: diva2:547110
Available from: 2012-08-27 Created: 2012-08-27 Last updated: 2014-04-17 Bibliographically approved
In thesis
1. Mine the Gaps: Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to effectively study them. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis.

I then used SeqFIRE to build an indel database, using 299 single copy proteins from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single amino acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (on average 2.31 times as many insertions as deletions). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade defining indels or CDIs), but none for Metazoa.

To study the 3,806 complex indels, I first classified them by number of states. Analysis of the 2-state complex and simple indels combined (“bi-state indels”) confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels had three to nine states (“slightly complex indels”). I developed a tree-assisted search method, which allowed me to identify 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi.
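Classification by number of states can be sketched as follows. The abstract does not define how a state is determined, so this example assumes that each distinct number of residues present in the indel region counts as one state; the function names and the label for indels with more than nine states are hypothetical.

```python
def count_states(region_rows):
    """Count distinct states in an indel region, taking a state to be the
    number of non-gap residues a sequence has in that region (an assumption)."""
    return len({len(row.replace("-", "")) for row in region_rows})

def classify_by_states(n_states):
    """Map a state count to the categories named in the abstract."""
    if n_states < 2:
        return "no indel"
    if n_states == 2:
        return "bi-state"
    if n_states <= 9:
        return "slightly complex"
    # The abstract gives no name for this class; this label is a placeholder.
    return "more than nine states"

# Hypothetical indel region: one row per sequence, columns of the region only.
region = ["--", "GS", "--", "G-"]
n_states = count_states(region)
label = classify_by_states(n_states)
print(n_states, label)
```

Here the rows carry zero, one, or two residues, giving three states, so the region falls in the three-to-nine-state class.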

Forty-two proteins were also found to host complex indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian specific CDIs. Parsimony analysis of these indels places Ctenophora as sister taxon to all other Metazoa including Porifera. Six CDIs were also found placing Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and my tree-assisted search strategy could be automated and implemented in the program SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast resource of protist genome data soon to become available.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2014. 58 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1131
Keyword [en]
indel, insertion/deletion, protein evolution, bioinformatics, non-bilateria, eukaryotes, phylogeny
National Category
Bioinformatics and Systems Biology Biological Systematics
Research subject
Biology with specialization in Systematics; Biology with specialization in Molecular Evolution
urn:nbn:se:uu:diva-220727 (URN) 978-91-554-8904-5 (ISBN)
Public defence
2014-05-07, Lindahlsalen, Norbyvägen 18, Uppsala, 10:00 (English)
Available from: 2014-04-15 Created: 2014-03-19 Last updated: 2014-04-29 Bibliographically approved

By author/editor
Ajawatanawong, Pravech; Baldauf, Sandra L.