Uppsala University Publications
Correcting Errors in Shotgun Sequences Using DNPs and a Novel Pattern Matching Algorithm
Uppsala University, Medicinska vetenskapsområdet, Faculty of Medicine, Department of Genetics and Pathology.
In: Bioinformatics. Article in journal (Refereed), Submitted
URN: urn:nbn:se:uu:diva-89934; OAI: oai:DiVA.org:uu-89934; DiVA: diva2:161817
Available from: 2002-05-23. Created: 2002-05-23. Bibliographically approved
In thesis
1. Software Tools and Algorithms for Shotgun Sequence Assembly
2002 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

During the last ten years, a genomics revolution has changed the way biological research is carried out. The draft sequence of the human genome is available, as well as the sequences of 84 other completed genomes. High-throughput genomics technologies, such as genome sequencing with its associated bioinformatics tools, have been instrumental in this process. The draft genome sequences were determined using the shotgun sequencing strategy, in which long DNA molecules are randomly sheared into small pieces from which sequences are determined. These are assembled by computer programs in order to reconstruct the original genome sequence. Ubiquitous repeated sequences, together with errors in the sequencing process, complicate the assembly of shotgun fragments. In most genome projects, this complication is the cause of gaps in the assembly.

This thesis presents methods and algorithms to separate repeated sequences in shotgun projects. The Tandem Repeat Assembly Program (TRAP) builds multiple alignments of reads, which are then analyzed in order to discriminate sequencing errors from real differences between highly similar repeats. The method is based on the fact that sequencing errors are randomly distributed, as opposed to the systematic distribution of mutations in repeat copies. The TRAP assembler was shown to correctly assemble 2000 bp repeat copies that are repeated in tandem eight times. The degree of difference between repeat copies was 1.0%, and the maximum sequencing error rate was 11%.
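The distinction between random errors and systematic differences can be illustrated with a small sketch. This is not TRAP's implementation; it only shows the general idea that a minority base recurring in the same alignment column across several reads is less likely to be a random sequencing error than one seen once. The function name `candidate_difference_columns`, the threshold `min_minor_count`, and the toy reads are assumptions made for illustration.

```python
from collections import Counter


def candidate_difference_columns(alignment, min_minor_count=2):
    """Return indices of columns whose second most common base occurs
    at least `min_minor_count` times.

    `alignment` is a list of equal-length strings (one row per read).
    Characters other than A, C, G, T (for example gaps) are ignored.
    The threshold is a hypothetical parameter, not a value from the thesis.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have equal length")

    flagged = []
    for col in range(width):
        counts = Counter(row[col] for row in alignment if row[col] in "ACGT")
        ranked = counts.most_common(2)
        # A recurring minority base suggests a real difference between
        # repeat copies; a base seen only once looks like a random error.
        if len(ranked) == 2 and ranked[1][1] >= min_minor_count:
            flagged.append(col)
    return flagged


# Toy alignment: column 3 has a recurring T/A split, column 6 a lone C.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGAACGT",
    "ACGTACCT",
]
print(candidate_difference_columns(reads))  # prints [3]
```

A fixed count threshold is the simplest possible rule; a real tool would presumably account for coverage and the expected error rate, which this sketch does not attempt.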

A refined method based on single base differences between repeat copies has been developed to further improve repeat separation. Results show that in the same sequence, 87% of all the single base differences present in the repeats can be detected, with an error of only 1.6%.

In addition, a novel pattern-matching algorithm was developed. This algorithm takes advantage of the inherent symmetry between indices that can be computed for similar words of the same length and was implemented in the error correction software, MisEd. The results show that up to 99.3% of the sequencing errors can be corrected, while up to 87% of the single base differences remain.
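The abstract does not describe the pattern-matching algorithm itself, so the following is only an illustration of one well-known relationship between the indices of similar words, not the MisEd method. If a DNA word of length k is read as a base-4 number, every word that differs from it by a single substitution has an index that differs by a multiple of a power of 4. The encoding `CODE` and the function names are assumptions chosen for this sketch.

```python
# Hypothetical base-4 encoding of nucleotides (an assumption for this sketch).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def word_index(word):
    """Interpret a DNA word as a base-4 number, leftmost base most significant."""
    index = 0
    for base in word:
        index = index * 4 + CODE[base]
    return index


def substitution_neighbor_indices(word):
    """Return the sorted indices of all words that differ from `word`
    by exactly one substitution, computed arithmetically from the
    index of `word` without building the neighbor strings."""
    k = len(word)
    base_index = word_index(word)
    neighbors = []
    for pos, base in enumerate(word):
        weight = 4 ** (k - 1 - pos)  # place value of this position
        for code in range(4):
            if code != CODE[base]:
                neighbors.append(base_index + (code - CODE[base]) * weight)
    return sorted(neighbors)


print(word_index("ACG"))                     # prints 6
print(substitution_neighbor_indices("ACG"))  # prints [2, 4, 5, 7, 10, 14, 22, 38, 54]
```

Each word of length k has 3k such neighbors, so a lookup table indexed this way can be probed for near matches with simple integer arithmetic.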

All methods described have thus been shown to be functional, and it is clear that these programs will facilitate genome sequencing and assembly.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2002. 55 p.
Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine, ISSN 0282-7476 ; 1169
Keywords
Genetics, Shotgun sequencing, multiple alignment, fragment assembly, repeats, sequencing error, Genetik
National Category
Medical Genetics
Research subject
Medical Genetics
urn:nbn:se:uu:diva-2176 (URN); 91-554-5361-9 (ISBN)
Public defence
2002-09-06, Rudbecksalen, Uppsala, 13:15
Available from: 2002-05-23. Created: 2002-05-23. Bibliographically approved

Open Access in DiVA

No full text
