Uppsala University Publications
TRAP: Tandem Repeat Assembly Program produces improved shotgun assemblies of repetitive sequences
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Genetics and Pathology.
2003 (English). In: Computer Methods and Programs in Biomedicine, ISSN 0169-2607, E-ISSN 1872-7565, Vol. 70, no. 1, pp. 47-59. Article in journal (Refereed), Published.
Abstract [en]

The software commonly used for assembly of shotgun sequence data has several limitations, one of which becomes obvious when repetitive sequences are encountered. Shotgun assembly is a difficult task even for non-repetitive regions, but quality assessments of the data and efficient matching algorithms have made it possible to assemble most sequences efficiently. In highly repetitive sequences, however, these algorithms fail to distinguish sequencing errors from single base differences in regions containing nearly identical repeats. None of the currently available fragment assembly programs can correctly assemble highly similar repetitive data, and we therefore present a novel shotgun assembly program, the Tandem Repeat Assembly Program (TRAP). Its main feature is the ability to separate long repetitive regions from each other by distinguishing single base substitutions, as well as insertions/deletions, from sequencing errors. This is accomplished with a novel analysis method based on multiple alignment. Since repeats are a common complication in most sequencing projects, this software should be useful to the whole sequencing community.
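The abstract describes the analysis only at a high level. As a rough illustration of a first step such an analysis needs, the following Python sketch scans the columns of a multiple alignment of reads and flags columns where a second character (a base, or a gap `-` standing for an insertion/deletion) is shared by several reads rather than appearing in a single read. The row representation (equal-length strings, with a space for positions a read does not cover), the function name `candidate_columns` and the `min_support` threshold are assumptions made for this example; quality values are ignored, and this is not TRAP's published procedure.

```python
from collections import Counter


def candidate_columns(alignment: list[str], min_support: int = 2) -> list[int]:
    """Return indices of alignment columns where a minority character
    (base or gap '-') is shared by at least `min_support` reads.

    Assumptions (not from the source): rows are equal-length strings,
    ' ' marks positions a read does not cover, quality values are ignored.
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all alignment rows must have the same length")
    candidates = []
    for col in range(width):
        # Count characters only from reads that actually cover this column.
        counts = Counter(row[col] for row in alignment if row[col] != " ")
        ranked = counts.most_common()
        # A lone deviating read looks like a random error; a deviation
        # shared by several reads is a candidate repeat-copy difference.
        if len(ranked) >= 2 and ranked[1][1] >= min_support:
            candidates.append(col)
    return candidates


alignment = [
    "ACGTACGT",
    "ACGTACGT",
    "ACCTACGT",  # C at column 2, shared with the next read
    "ACCT-CGT",  # gap at column 4, shared with the next read
    "ACGT-CGT",
    "ACGTAGGT",  # single G at column 5: looks like a sequencing error
]
print(candidate_columns(alignment))  # [2, 4]
```

Requiring agreement between reads is what separates the shared C and the shared gap from the isolated G; a real tool would also weigh base qualities and coverage.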

Place, publisher, year, edition, pages
2003. Vol. 70, no. 1, pp. 47-59.
National Category
Medical and Health Sciences
Identifiers
URN: urn:nbn:se:uu:diva-89932
DOI: 10.1016/S0169-2607(01)00194-8
PubMedID: 12468126
OAI: oai:DiVA.org:uu-89932
DiVA: diva2:161815
Available from: 2002-05-23. Created: 2002-05-23. Last updated: 2013-07-24. Bibliographically approved.
In thesis
1. Software Tools and Algorithms for Shotgun Sequence Assembly
2002 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last ten years, a genomics revolution has changed the way biological research is carried out. The draft sequence of the human genome is available, as are the sequences of 84 other completed genomes. High-throughput genomics technologies, such as genome sequencing with its associated bioinformatics tools, have been instrumental in this process. The draft genome sequences were determined using the shotgun sequencing strategy, in which long DNA molecules are randomly sheared into small pieces whose sequences are then determined. These fragments are assembled by computer programs to reconstruct the original genome sequence. Ubiquitous repeated sequences, together with errors in the sequencing process, complicate the assembly of shotgun fragments, and in most genome projects this complication causes gaps.
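To make the shotgun strategy concrete, the sketch below reassembles a toy sequence from overlapping reads with a naive greedy approach: repeatedly merge the pair of reads with the longest suffix-prefix overlap. Greedy merging is a generic textbook technique chosen for illustration, not the algorithm used by TRAP or any particular assembler. The names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical, and the reads are assumed to be error-free and not contained in one another.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`,
    provided it is at least `min_len` long; otherwise 0."""
    start = 0
    while True:
        # Look for the first min_len characters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that the whole remaining suffix of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlaps remain.
    Returns the resulting contigs (ideally a single sequence)."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]))
# ['ATTAGACCTGCCGGAATAC']
```

Because the merge decision uses only overlap length, reads drawn from two copies of an exact repeat longer than the reads are indistinguishable, which is how repeats lead naive assemblers to collapse or misjoin contigs.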

This thesis presents methods and algorithms for separating repeated sequences in shotgun projects. The Tandem Repeat Assembly Program (TRAP) builds multiple alignments of reads, which are then analyzed to discriminate sequencing errors from real differences between highly similar repeats. The method relies on the fact that sequencing errors are randomly distributed, whereas mutations in repeat copies are distributed systematically. The TRAP assembler was shown to correctly assemble a 2000 bp repeat present as eight tandem copies, with a difference of 1.0% between repeat copies and a maximum sequencing error of 11%.
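The argument that random errors rarely agree, while a real repeat difference recurs in every read from the same copy, can be quantified under simple assumptions. The sketch below assumes that errors are independent across reads, that each read errs at a given column with probability `error_rate`, and that a substitution error turns the true base into one of the three other bases uniformly. It then asks how likely it is that `k_agree` or more of `n_reads` reads show the same deviating base purely by chance. The binomial model, the function names and the `alpha` cutoff are illustrative assumptions, not the statistics used in the thesis.

```python
from math import comb


def chance_agreement(n_reads: int, k_agree: int, p: float) -> float:
    """P(X >= k_agree) for X ~ Binomial(n_reads, p): the probability that at
    least k_agree reads show the same specific deviation by chance."""
    return sum(
        comb(n_reads, i) * p**i * (1 - p) ** (n_reads - i)
        for i in range(k_agree, n_reads + 1)
    )


def looks_systematic(n_reads: int, k_agree: int, error_rate: float,
                     alpha: float = 1e-3) -> bool:
    """True if k_agree identical deviations among n_reads are too many to be
    explained by independent random errors (illustrative model only)."""
    # Assumption: a substitution error picks one of the three other bases
    # with equal probability, so a *specific* wrong base has rate error_rate/3.
    p_specific = error_rate / 3
    return chance_agreement(n_reads, k_agree, p_specific) < alpha


# 20 reads covering a column, 3% per-base error rate:
print(looks_systematic(20, 5, 0.03))  # True: five agreeing reads are unlikely by chance
print(looks_systematic(20, 1, 0.03))  # False: one deviating read is expected noise
```

With a higher error rate, such as the 11% maximum reported above, the chance tail grows, so more agreeing reads are needed before a deviation passes the same cutoff.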

A refined method based on single base differences between repeat copies has been developed to further improve repeat separation. Results show that in the same sequence, 87% of all the single base differences present in the repeats can be detected, with an error of only 1.6%.
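Once the columns carrying real single base differences are known, reads can be assigned to repeat copies by the bases they carry at those columns. The sketch below simply groups reads with identical signatures. It assumes every read covers every selected column and uses exact matching, so it is a simplified illustration of the idea rather than the refined method described above; the function name `group_by_signature` is hypothetical.

```python
from collections import defaultdict


def group_by_signature(alignment: list[str], columns: list[int]) -> dict[str, list[int]]:
    """Group read indices by the characters they carry at `columns`.

    Assumes every read covers every selected column; matching is exact,
    so a sequencing error at a selected column would split a group.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(alignment):
        signature = "".join(row[c] for c in columns)
        groups[signature].append(idx)
    return dict(groups)


alignment = [
    "ACCTAAG",  # copy A: C at column 2, A at column 5
    "ACCTAAG",
    "ACTTAGG",  # copy B: T at column 2, G at column 5
    "ACTTAGG",
    "ACCTTAG",  # copy A with a sequencing error at column 4 (not selected)
]
print(group_by_signature(alignment, [2, 5]))
# {'CA': [0, 1, 4], 'TG': [2, 3]}
```

The error at column 4 does not affect grouping because only the selected difference columns enter the signature, which is why detecting those columns accurately matters for repeat separation.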

In addition, a novel pattern-matching algorithm was developed. This algorithm exploits the inherent symmetry between indices that can be computed for similar words of the same length, and it was implemented in the error correction software MisEd. The results show that up to 99.3% of the sequencing errors can be corrected, while up to 87% of the single base differences are retained.
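The abstract does not specify which indices MisEd computes. One standard way to index words of length k is to read them as base-4 numbers (A=0, C=1, G=2, T=3). Two words that differ only at position i, with bases a and b, then have indices that differ by (a - b) * 4^(k-1-i), a relation that simply changes sign when the two words are swapped, so all single-substitution neighbours of a word can be enumerated by arithmetic on its index alone. The sketch below uses this property to flag rare words that have a frequent one-substitution neighbour, a common heuristic for spotting likely sequencing errors. The encoding, the thresholds `rare_max` and `solid_min`, and all function names are assumptions for illustration; this is not MisEd's algorithm.

```python
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed base-4 encoding


def word_index(word: str) -> int:
    """Read a DNA word as a base-4 number."""
    idx = 0
    for base in word:
        idx = idx * 4 + CODE[base]
    return idx


def word_from_index(idx: int, k: int) -> str:
    """Inverse of word_index for words of length k."""
    bases = []
    for _ in range(k):
        bases.append("ACGT"[idx % 4])
        idx //= 4
    return "".join(reversed(bases))


def substitution_neighbours(idx: int, k: int) -> list[int]:
    """Indices of all words at Hamming distance 1, using index arithmetic only."""
    neighbours = []
    for pos in range(k):
        weight = 4 ** (k - 1 - pos)
        digit = (idx // weight) % 4  # base code at this position
        for other in range(4):
            if other != digit:
                neighbours.append(idx + (other - digit) * weight)
    return neighbours


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count the index of every length-k word in every read."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[word_index(read[i:i + k])] += 1
    return counts


def suspicious_words(counts: Counter, k: int, rare_max: int = 1,
                     solid_min: int = 3) -> dict[int, list[int]]:
    """Map each rare word index to its frequent one-substitution neighbours."""
    flagged = {}
    for idx, n in counts.items():
        if n <= rare_max:
            solid = [nb for nb in substitution_neighbours(idx, k)
                     if counts.get(nb, 0) >= solid_min]
            if solid:
                flagged[idx] = solid
    return flagged


reads = ["ACGTAC"] * 4 + ["ACTTAC"]  # last read: G->T error at position 2
flagged = suspicious_words(kmer_counts(reads, 4), 4)
print(sorted(word_from_index(i, 4) for i in flagged))  # ['ACTT', 'CTTA', 'TTAC']
```

A plain frequency rule like this cannot tell a rare sequencing error from a real single base difference in a low-coverage repeat copy; retaining such differences while correcting errors is the balance the results above report for MisEd.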

All methods described have thus been shown to be functional, and it is clear that these programs will facilitate genome sequencing and assembly.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2002. 55 p.
Series
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 0282-7476; 1169
Keyword
Genetics, Shotgun sequencing, multiple alignment, fragment assembly, repeats, sequencing error, Genetik
National Category
Medical Genetics
Research subject
Medical Genetics
Identifiers
urn:nbn:se:uu:diva-2176 (URN)
91-554-5361-9 (ISBN)
Public defence
2002-09-06, Rudbecksalen, Uppsala, 13:15
Opponent
Available from: 2002-05-23. Created: 2002-05-23. Bibliographically approved.

Open Access in DiVA

No full text