Unsupervised genome-wide recognition of local relationship patterns
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology. Uppsala University, Science for Life Laboratory, SciLifeLab.
2013 (English). In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 14, 347. Article in journal (Refereed). Published.
Abstract [en]

BACKGROUND

Phenomena such as incomplete lineage sorting, horizontal gene transfer, gene duplication and subsequent sub- and neo-functionalisation can result in distinct local phylogenetic relationships that are discordant with species phylogeny. In order to assess the possible biological roles for these subdivisions, they must first be identified and characterised, preferably on a large scale and in an automated fashion.

RESULTS

We developed Saguaro, a combination of a Hidden Markov Model (HMM) and a Self Organising Map (SOM), to characterise local phylogenetic relationships among aligned sequences using "cacti", which are matrices of pair-wise distance measures. While the HMM determines the genomic boundaries from aligned sequences, the SOM hypothesises new cacti in an unsupervised and iterative fashion based on the regions that were modelled least well by existing cacti. After testing the software on simulated data, we demonstrate the utility of Saguaro by analysing two different data sets: (i) 181 Dengue virus strains, and (ii) 5 primate genomes. Saguaro identifies regions under lineage-specific constraint in the first data set, and genomic segments that we attribute to incomplete lineage sorting in the second. Intriguingly, for the primate data, Saguaro also classified an additional ~3% of the genome as most incompatible with the expected species phylogeny. A substantial fraction of these regions was found to overlap genes associated with both the innate and adaptive immune systems.
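
To make the notion of a cactus concrete, the following is a minimal Python sketch of a pair-wise distance matrix computed over one window of an alignment. It is an illustration only and not Saguaro's implementation: the abstract does not specify the distance measure, so the p-distance (fraction of differing sites), the function name `pairwise_distance_matrix`, the `gap_chars` parameter and the toy sequences are all assumptions made for this example.

```python
from itertools import combinations


def pairwise_distance_matrix(alignment, gap_chars="-N"):
    """Return a symmetric matrix of p-distances between aligned sequences.

    Assumption: distance = mismatching sites / compared sites, where sites
    with a gap or unknown base in either sequence are skipped. The measure
    used by Saguaro itself may differ.
    """
    n = len(alignment)
    if n == 0:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        compared = 0
        mismatches = 0
        for a, b in zip(alignment[i], alignment[j]):
            if a in gap_chars or b in gap_chars:
                continue  # skip sites that cannot be compared
            compared += 1
            if a != b:
                mismatches += 1
        # With no comparable sites the distance is set to 0.0 (a choice
        # made for this sketch, not a property of the method).
        dist = mismatches / compared if compared else 0.0
        matrix[i][j] = matrix[j][i] = dist
    return matrix


# Hypothetical toy window of three aligned sequences.
window = ["ACGTACGT", "ACGTACGA", "ACCTAC-A"]
cactus = pairwise_distance_matrix(window)
for row in cactus:
    print(" ".join(f"{d:.3f}" for d in row))
```

In this simplified picture, different genomic regions would yield different matrices of this kind, and the HMM described above would be responsible for deciding where one matrix stops describing the data well and another takes over.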

CONCLUSIONS

Saguaro detects distinct cacti describing local phylogenetic relationships without requiring any a priori hypotheses. We have successfully demonstrated Saguaro's utility with two contrasting data sets, one containing many members with short sequences (Dengue viral strains: n = 181, genome size = 10,700 nt), and the other with few members but complex genomes (related primate species: n = 5, genome size = 3 Gb), suggesting that the software is applicable to a wide variety of experimental populations. Saguaro is written in C++, runs on the Linux operating system, and can be downloaded from http://saguarogw.sourceforge.net/.

Place, publisher, year, edition, pages
2013. Vol. 14, 347.
National Category
Bioinformatics and Systems Biology; Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-200422
DOI: 10.1186/1471-2164-14-347
ISI: 000319820200001
OAI: oai:DiVA.org:uu-200422
DiVA: diva2:623509
Note

The first five authors share first authorship.

Available from: 2013-05-27. Created: 2013-05-27. Last updated: 2017-12-06. Bibliographically approved.

Open Access in DiVA

fulltext (3763 kB), 343 downloads
File information
File name: FULLTEXT01.pdf
File size: 3763 kB
Checksum (SHA-512): 8e0c0a4f5119c392af37668bf6513fb22da407761dfbf648abf3bde7a81674bf0a5ec27485a1b22d08ade4047792a36481ef4df4cc59280b556249b58c91b74f
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text: http://www.biomedcentral.com/1471-2164/14/347

Authority records BETA

Zamani, Neda; Lantz, Henrik; Hoeppner, Marc; Meadows, Jennifer; Vijay, Nagarjun; Lindblad-Toh, Kerstin; Jern, Patric; Grabherr, Manfred

