Uppsala University Publications (uu.se)
Mine the Gaps: Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to effectively study them. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis.
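To make the idea of extracting indel regions from an alignment concrete, here is a minimal Python sketch. It is not SeqFIRE's algorithm, which the abstract does not describe in detail and which exposes adjustable parameters. It only assumes that an indel region is a maximal run of alignment columns in which at least one sequence has a gap. The function name `find_indel_regions` and the toy alignment are hypothetical.

```python
def find_indel_regions(alignment, gap="-"):
    """Return (start, end) column ranges, 0-based and end-exclusive,
    covering maximal runs of columns that contain at least one gap.

    Assumption for illustration only: any gapped column counts as part
    of an indel region. A real tool would apply further thresholds.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions = []
    start = None  # first column of the gapped run currently being scanned
    for col in range(length):
        has_gap = any(seq[col] == gap for seq in alignment)
        if has_gap and start is None:
            start = col
        elif not has_gap and start is not None:
            regions.append((start, col))
            start = None
    if start is not None:  # alignment ends inside a gapped run
        regions.append((start, length))
    return regions


# Hypothetical toy alignment of three protein sequences.
toy_alignment = [
    "MKT--LVAG",
    "MKTAALVAG",
    "MKTA-LV-G",
]
regions = find_indel_regions(toy_alignment)
```

The columns outside the returned ranges are the gap-free part of the alignment, which is the starting point for extracting conserved blocks.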

I then used SeqFIRE to build an indel database, using 299 single copy proteins from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single amino acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (2.31 x deletions on average). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade defining indels or CDIs), but none for Metazoa.

To study the 3,806 complex indels, I first classified them by number of states. Analysis of the 2-state complex and simple indels combined (“bi-state indels”) confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels had three to nine states (“slightly complex indels”). I developed a tree-assisted search method that allowed me to identify 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi.
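The classification by number of states can be sketched as follows. The abstract does not say how a state is defined, so this sketch assumes that each distinct gap pattern within an indel region is one state. The function names, the toy alignment, and the label used for more than nine states are hypothetical. Only the "bi-state" and "slightly complex" (three to nine states) classes come from the text.

```python
def count_states(alignment, start, end, gap="-"):
    """Count distinct gap patterns among the sequences in columns
    start..end (end-exclusive). Assumption: one pattern = one state."""
    patterns = {tuple(ch == gap for ch in seq[start:end]) for seq in alignment}
    return len(patterns)


def classify_by_states(n_states):
    """Map a state count to the classes named in the abstract."""
    if n_states < 2:
        return "no indel"
    if n_states == 2:
        return "bi-state"
    if n_states <= 9:
        return "slightly complex"
    return "more than nine states"  # hypothetical label, not from the source


# Hypothetical toy alignment with indel regions at columns 3-4 and 7.
toy = [
    "MKT--LVAG",
    "MKTAALVAG",
    "MKTA-LV-G",
]
first = classify_by_states(count_states(toy, 3, 5))
second = classify_by_states(count_states(toy, 7, 8))
```

In the toy alignment the first region shows three different gap patterns and the second shows two, so they fall into different classes.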

Forty-two proteins were also found to host complex indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian specific CDIs. Parsimony analysis of these indels places Ctenophora as sister taxon to all other Metazoa including Porifera. Six CDIs were also found placing Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and my tree-assisted search strategy could be automated and implemented in the program SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast resource of protist genome data soon to become available.
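The basic criterion behind a clade-defining indel can be illustrated with a short Python sketch. This is not the tree-assisted search method of the thesis, whose details are not given in this record. It only assumes that an indel state defines a clade when every sampled member of the clade carries the same state and no taxon outside the clade carries it. The function name, taxon labels, and state codes are hypothetical.

```python
def is_clade_defining(states, clade):
    """Return True if all sampled clade members share one indel state
    and that state occurs in no taxon outside the clade.

    states: dict mapping taxon name -> indel state (any hashable value)
    clade:  iterable of taxon names forming the candidate clade
    """
    clade = set(clade)
    inside = {state for taxon, state in states.items() if taxon in clade}
    outside = {state for taxon, state in states.items() if taxon not in clade}
    # Require one shared state inside, at least one outgroup taxon,
    # and no overlap between ingroup and outgroup states.
    return len(inside) == 1 and bool(outside) and inside.isdisjoint(outside)


# Hypothetical character: state 1 = insertion present, 0 = absent.
clean = {"fungus_1": 1, "fungus_2": 1, "plant_1": 0, "animal_1": 0}
homoplastic = {"fungus_1": 1, "fungus_2": 1, "plant_1": 1, "animal_1": 0}
fungi = {"fungus_1", "fungus_2"}

clean_result = is_clade_defining(clean, fungi)
homoplastic_result = is_clade_defining(homoplastic, fungi)
```

The second character fails the test because the same state also appears outside the clade, which is the kind of homoplasy the abstracts describe as limiting simple indels.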

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2014. 58 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1131
Keyword [en]
indel, insertion/deletion, protein evolution, bioinformatics, non-bilateria, eukaryotes, phylogeny
National Category
Bioinformatics and Systems Biology; Biological Systematics
Research subject
Biology with specialization in Systematics; Biology with specialization in Molecular Evolution
URN: urn:nbn:se:uu:diva-220727
ISBN: 978-91-554-8904-5
OAI: oai:DiVA.org:uu-220727
DiVA: diva2:706331
Public defence
2014-05-07, Lindahlsalen, Norbyvägen 18, Uppsala, 10:00 (English)
Available from: 2014-04-15 Created: 2014-03-19 Last updated: 2014-04-29 Bibliographically approved
List of papers
1. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments
2012 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, W340-W347 p. Article in journal (Refereed) Published
Abstract [en]

Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
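As a rough illustration of scoring alignment columns by entropy, the sketch below computes the Shannon entropy of a single column. How SeqFIRE combines conservation and entropy is not specified in this abstract, so this is only the standard entropy formula, with the assumption (made here for simplicity) that gap characters are ignored. The function name `column_entropy` is hypothetical.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Assumption: gaps are skipped, so the entropy reflects only the
    residues actually observed. An all-gap column scores 0.0.
    """
    residues = [ch for ch in column if ch != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical columns: fully conserved, two residues, four residues.
conserved = column_entropy("AAAA")
two_way = column_entropy("AACC")
variable = column_entropy("ACDE")
```

A fully conserved column has zero entropy, and the value grows as more residue types share the column evenly, so a user-chosen entropy threshold can flag candidate fast-evolving sites.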

Keyword [en]
Indels, Alignment, Conserved blocks
National Category
Bioinformatics (Computational Biology); Bioinformatics and Systems Biology
urn:nbn:se:uu:diva-179937 (URN) 10.1093/nar/gks561 (DOI) 000306670900056 ()
Available from: 2012-08-27 Created: 2012-08-27 Last updated: 2014-04-17 Bibliographically approved
2. Evolution of protein indels in plants, animals and fungi
2013 (English) In: BMC Evolutionary Biology, ISSN 1471-2148, Vol. 13, 140- p. Article in journal (Refereed) Published
Abstract [en]

Background: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.

Results: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold.

Conclusions: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
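The insertion-versus-deletion comparison for singleton indels can be sketched in Python as follows. The abstract does not state how polarity was assigned, so the rule here is an assumption: an indel is a singleton when exactly one taxon differs from all others, and it is scored as an insertion if that lone taxon carries the residues and as a deletion if it carries the gap. The function names, taxon labels, and toy characters are hypothetical and are not data from the study.

```python
def singleton_polarity(presence):
    """Classify a present/absent indel as a singleton insertion or deletion.

    presence: dict mapping taxon name -> True if residues are present in
              the indel region, False if the taxon has a gap there.
    Returns "insertion", "deletion", or None if the indel is not a singleton.
    Assumption: the state found in a single taxon is the derived one.
    """
    present = [taxon for taxon, has in presence.items() if has]
    absent = [taxon for taxon, has in presence.items() if not has]
    if len(present) == 1 and len(absent) > 1:
        return "insertion"
    if len(absent) == 1 and len(present) > 1:
        return "deletion"
    return None


def insertion_deletion_ratio(indels):
    """Ratio of singleton insertions to singleton deletions, or None
    when no deletions are observed."""
    calls = [singleton_polarity(indel) for indel in indels]
    insertions = calls.count("insertion")
    deletions = calls.count("deletion")
    return insertions / deletions if deletions else None


# Hypothetical toy characters, not data from the study.
toy_indels = [
    {"taxon_a": True, "taxon_b": False, "taxon_c": False},   # insertion
    {"taxon_a": False, "taxon_b": True, "taxon_c": False},   # insertion
    {"taxon_a": True, "taxon_b": True, "taxon_c": False},    # deletion
]
ratio = insertion_deletion_ratio(toy_indels)
```

With only three taxa the rule is fragile, since a single misaligned sequence flips the call, which is one reason a broad taxon sample matters for this kind of estimate.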

Keyword [en]
Indels, Rare genomic changes, Phylogeny, Insertion/deletion, Multiple sequence alignment, Eukaryote evolution, Indel profiles
National Category
Natural Sciences
urn:nbn:se:uu:diva-204971 (URN) 10.1186/1471-2148-13-140 (DOI) 000321461800001 ()
Available from: 2013-08-16 Created: 2013-08-13 Last updated: 2014-04-17 Bibliographically approved
3. An automatable method for high throughput analysis of evolutionary patterns in slightly complex indels and its application to the deep phylogeny of Metazoa
2014 (English) Article in journal (Refereed) Submitted
Abstract [en]

Insertions/deletions (indels) in protein sequences are potentially powerful evolutionary markers. However, these characters have rarely been explored systematically at deep phylogenetic levels. Previous analyses of simple (2-state) clade-defining indels (CDIs) in universal eukaryotic proteins found none to support any major animal clade. We hypothesized that CDIs might still be found in the remaining population of indels, which we term complex indels. Here, we propose a method for analyzing the simplest class of complex indels, the “slightly complex indels”, and use these to investigate deep branches in animal phylogeny. Complex indels with two states, called bi-state indels, show evolutionary patterns similar to those of singleton simple indels and confirm that insertion mutations are more common than deletions. Exploration of CDIs in 2- to 9-state complex indels shows strong support for all examined branches of fungi and Archaeplastida. Surprisingly, we also found CDIs supporting major branches in animals, particularly in vertebrates. We then expanded the search to non-bilaterian animals (Porifera, Cnidaria and Ctenophora). The phylogenetic tree reconstructed from CDIs places the ctenophore Mnemiopsis leidyi as the deepest branch of animals, supported by 6 CDIs. Trichoplax adhaerens is placed close to the Bilateria. Moreover, the indel phylogeny shows Nematostella vectensis and Hydra magnipapillata as a paraphyletic group, and the position of the cnidarian branches appears problematic in the indel phylogeny because of homoplasy. This might be resolved by discovering CDIs in animal-specific proteins, which emerged after the universal orthologous proteins.

Evolutionary Patterns in Slightly Complex Protein Insertions/Deletions (Indels) and Their Application to the Study of Deep Phylogeny in Metazoa

National Category
Other Biological Topics
urn:nbn:se:uu:diva-216842 (URN)
Available from: 2014-01-27 Created: 2014-01-27 Last updated: 2014-04-17 Bibliographically approved

By author/editor
Ajawatanawong, Pravech