Uppsala University Publications
AutoPhylo, a bioinformatic tool for identifying and retrieving sequences
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre. (SLBaldauf)
2010 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
Abstract [en]

The task of constructing a molecular phylogenetic tree consists of finding homologous sequences, making a multiple sequence alignment, perhaps removing gaps and ambiguous positions from the alignment, and finally inferring the tree under one or more evolutionary models. Often the tree must then be refined by removing inappropriate sequences. For each step of this process there is a tool to accomplish the task. Starting with a sequence of interest, BLAST (Basic Local Alignment Search Tool) is used to find homologous sequences in various databases. Multiple sequence alignment programs such as MUSCLE or CLUSTAL can then be used to align the sequences. The alignment often contains regions of gaps and ambiguous positions, which can be identified and removed with programs such as GBlocks. Finally, a phylogenetic tree can be reconstructed from the alignment using selected methods and models. Depending on the scientific goals and the data, this process generally needs to be repeated several times in order to “refine” the tree. To construct a tree with the correct phylogeny, “true” homologous positions in the alignment must be used. In addition, if the tree is to reconstruct the correct relationships among species (rather than just among genes), it is also necessary to use orthologous sequences rather than sequences that have undergone duplication (paralogs). Tree reconstruction is further complicated by problems such as long branch attraction (where fast-evolving sequences cluster together even if they are unrelated) and horizontal gene transfer (where possibly unrelated cells exchange genes), either of which can mislead the phylogeny. At present no program is sophisticated enough to correct these kinds of problems without careful manual examination. However, many of the steps are simple and repetitive.
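As an illustration of the "removing gaps and ambiguous positions" step, the following minimal Python sketch drops alignment columns in which too many sequences have a gap or an ambiguous character. It is not the GBlocks algorithm and not code from the thesis; the function name `trim_alignment`, the `max_bad_fraction` threshold, and the default set of characters treated as uninformative are all assumptions made for this example.

```python
def trim_alignment(aligned, max_bad_fraction=0.5, bad_chars="-?"):
    """Remove gap-rich columns from a multiple sequence alignment.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths).
    max_bad_fraction: a column is kept only if the fraction of sequences
        with a character from bad_chars is <= this value (hypothetical
        default, chosen for illustration).
    bad_chars: characters counted as gaps or ambiguous positions.
    """
    if not aligned:
        return {}
    rows = list(aligned.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")

    bad = set(bad_chars)
    # Indices of the columns that pass the threshold.
    keep = [
        i for i in range(length)
        if sum(row[i] in bad for row in rows) / len(rows) <= max_bad_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in aligned.items()}


if __name__ == "__main__":
    example = {"seq_a": "AC-GT", "seq_b": "A--GT", "seq_c": "ACTG-"}
    for name, seq in trim_alignment(example).items():
        print(name, seq)
```

A simple per-column threshold like this is much cruder than block-based trimming, but it shows why the step is easy to automate: the decision for each column depends only on the characters in that column.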
It is the goal of bioinformatics to automate as many of these simple, tedious steps as possible, so that large amounts of data can be processed quickly and accurately. In this paper a tool that streamlines the phylogenetic tree reconstruction process is presented. The tool, named AutoPhylo, identifies and retrieves sequences from NCBI (a collection of databases) or from a user-defined local database via BLAST searches. These sequences are then used to construct a tree that can be examined in a graphical user interface (GUI). The GUI allows the user to identify and remove unwanted sequences in order to refine the tree. The sequences are retrieved in groups, each defined by one or more queries that limit the selection to specific species, genes or other valid NCBI queries. Tests are presented to show that the program is useful and can accelerate parts of the process.
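The idea of retrieval groups restricted by species or gene can be sketched as a small query builder. This is a hypothetical illustration, not AutoPhylo's actual interface: the function name `build_group_query` and its parameters are invented here, and the only NCBI-specific elements assumed are the Entrez search field tags `[Organism]` and `[Gene Name]` combined with `AND`.

```python
def build_group_query(organism=None, gene=None, extra_terms=()):
    """Compose an Entrez-style query string for one retrieval group.

    organism: species name to restrict the search to, or None.
    gene: gene name to restrict the search to, or None.
    extra_terms: any further query terms, already written in
        Entrez syntax, passed through unchanged.
    """
    terms = []
    if organism:
        # Quote the name so multi-word species are treated as one phrase.
        terms.append(f'"{organism}"[Organism]')
    if gene:
        terms.append(f"{gene}[Gene Name]")
    terms.extend(extra_terms)
    return " AND ".join(terms)


if __name__ == "__main__":
    # Two hypothetical groups, each limiting the search to one species.
    groups = {
        "human_actin": build_group_query("Homo sapiens", "actin"),
        "yeast_actin": build_group_query("Saccharomyces cerevisiae", "actin"),
    }
    for name, query in groups.items():
        print(name, "->", query)
```

Keeping each group as a plain query string means the same definition could, in principle, be reused whenever a search has to be repeated during tree refinement.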

Place, publisher, year, edition, pages
2010. 24 p.
National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:uu:diva-142609; OAI: oai:DiVA.org:uu-142609; DiVA: diva2:387752
Educational program
Bachelor Programme in Biology / Molecular Biology
Life Earth Science
The root of the eukaryote tree
Available from: 2011-03-04. Created: 2011-01-14. Last updated: 2012-04-23. Bibliographically approved.

