Uppsala University Publications
Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology. Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England.
Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England.
Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
2015 (English)
In: Genome Biology and Evolution, ISSN 1759-6653, E-ISSN 1759-6653, Vol. 7, no 8, 2102-2116 p.
Article in journal (Refereed), Published
Abstract [en]

Evolutionary studies usually use a two-step process to investigate sequence data. Step one estimates a multiple sequence alignment (MSA), and step two applies phylogenetic methods to ask evolutionary questions of that MSA. Modern phylogenetic methods infer evolutionary parameters using maximum likelihood or Bayesian inference, mediated by a probabilistic substitution model that describes sequence change over a tree. The statistical properties of these methods mean that more data translates directly into increased confidence in downstream results, provided that the substitution model is adequate and the MSA is correct. Many studies have investigated the robustness of phylogenetic methods in the presence of substitution model misspecification, but few have examined the statistical properties of those methods when the MSA is unknown. This simulation study examines the statistical properties of the complete two-step process when inferring sequence divergence and the phylogenetic tree topology. Both nucleotide and amino acid analyses are negatively affected by the alignment step, through inaccurate guide tree estimates and through overfitting to that guide tree. For many alignment tools, these effects become more pronounced when additional sequences are added to the analysis. Nucleotide sequences are particularly susceptible: MSA errors lead to statistical support for long-branch attraction artifacts, which are usually associated with gross substitution model misspecification. Amino acid MSAs are more robust, but they tend to resolve multifurcations arbitrarily in favor of the guide tree. No inference strategy produces consistently accurate estimates of divergence between sequences, although amino acid MSAs are again more accurate than their nucleotide counterparts. We conclude with some practical suggestions about how to limit the effect of MSA uncertainty on evolutionary inference.
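A toy pairwise example can show how an alignment error carries through to a divergence estimate. This is not the study's pipeline, which analysed simulated data using maximum likelihood and Bayesian inference. The sketch instead uses the closed-form Jukes-Cantor (JC69) distance, a simple substitution-model-based estimate of divergence between two sequences. The function name `jc69_distance`, the rule of skipping gapped columns, and the example sequences are all illustrative assumptions.

```python
import math


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Alignment columns where either sequence has a gap ('-') are skipped;
    this pairwise-deletion convention is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped column: ignored under this convention
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 distance is not finite
    # d = -3/4 * ln(1 - 4p/3), written so that p = 0 yields +0.0
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * p / 3.0))


if __name__ == "__main__":
    # Two valid alignments of the same unaligned pair ACGTTGCA / ACGTGCA;
    # they differ only in where the single gap is placed.
    alignment_1 = ("ACGTTGCA", "ACG-TGCA")
    alignment_2 = ("ACGTTGCA", "ACGTGCA-")
    print(round(jc69_distance(*alignment_1), 3))  # 0.0
    print(round(jc69_distance(*alignment_2), 3))  # 0.635
```

Both alignments reduce to the same pair of unaligned sequences. Moving the single gap shifts three downstream columns out of register, and the estimated divergence rises from zero to about 0.64 expected substitutions per site.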

Place, publisher, year, edition, pages
2015. Vol. 7, no 8, 2102-2116 p.
Keyword [en]
multiple sequence alignment, phylogenetic inference, maximum likelihood, sequence divergence, tree inference
National Category
Evolutionary Biology
Identifiers
URN: urn:nbn:se:uu:diva-263531
DOI: 10.1093/gbe/evv127
ISI: 000360819300003
OAI: oai:DiVA.org:uu-263531
DiVA: diva2:858911
Available from: 2015-10-05
Created: 2015-10-02
Last updated: 2017-12-01
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1135 kB
Checksum (SHA-512): 10b988710bef8fc87ac55f77518249b96a805935d015483cfd7ac19f49aaf7942fb74cb25871151043f7373ed8836b14cf268bba8ff40cb06b22a1c7e124e4f8
Type: fulltext
MIME type: application/pdf

Authority records

Whelan, Simon
