Using paired-end sequences to optimise parameters for alignment of sequence reads against related genomes
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
2010 (English). In: BMC Genomics, ISSN 1471-2164, Vol. 11, 458- p. Article in journal (Refereed), Published
Abstract [en]

Background: The advent of cheap high-throughput sequencing methods has facilitated low-coverage skims of a large number of organisms. To maximise the utility of the sequences, assembly into contigs and then ordering of those contigs is required. Whilst sequences can be assembled into contigs de novo, using assembled genomes of closely related organisms as a framework can considerably aid the process. However, the search programs and parameters that will optimise the sensitivity and specificity of the alignments between the sequence reads and the framework genome(s) are not necessarily obvious. Here we demonstrate a process that uses paired-end sequence reads to choose an optimal program and alignment parameters.

Results: Unlike two single fragment reads, the two sequences in a paired-end read, such as a pair of BAC-end sequences, have a known positional relationship in the original genome. This provides an additional level of confidence, beyond match scores and e-values, in the accuracy of the positional assignment of the reads in the comparative genome. Three commonly used sequence alignment programs (MegaBLAST, Blastz and PatternHunter) were used to align a set of ovine BAC-end sequences against the equine genome assembly. A range of search parameters, with a particular focus on contiguous and discontiguous seeds, was used for each program. The number of reads with a hit and the number of read pairs with hits for the two end sequences in the tail-to-tail paired-end configuration were plotted relative to the theoretical maximum expected curve. Of the programs tested, MegaBLAST with short contiguous seed lengths (word size 8-11) performed best in this particular task. In addition, the data provide estimates of the false positive and false negative rates, which can be used to determine appropriate values of additional parameters, such as score cut-off, to balance sensitivity and specificity. To determine whether the approach also worked for the alignment of shorter reads, the first 240 bases of each BAC-end sequence were also aligned to the equine genome. Again, contiguous MegaBLAST performed best in optimising the sensitivity and specificity with which sheep BAC-end reads map to the equine and bovine genomes.

Conclusions: Paired-end reads, such as BAC-end sequences, provide an efficient mechanism to optimise sequence alignment parameters, for example for comparative genome assemblies, by providing an objective standard against which to evaluate performance.
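
The counting step described in the Results can be illustrated with a minimal Python sketch. It is not the authors' code, and several things in it are assumptions made only for illustration: the `Hit` record, the `max_span` distance limit, the use of "opposite strands on the same chromosome" as a stand-in for the tail-to-tail configuration, and the reading of the "theoretical maximum expected curve" as the number of pairs expected if both ends hit independently at the observed per-read hit rate.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Hit:
    """Best alignment of one end read (hypothetical record layout)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def is_concordant(a: Hit, b: Hit, max_span: int) -> bool:
    # Assumed proxy for the tail-to-tail configuration: same chromosome,
    # opposite strands, and no further apart than max_span bases.
    return (
        a.chrom == b.chrom
        and a.strand != b.strand
        and abs(a.pos - b.pos) <= max_span
    )


def summarise(
    pairs: Dict[str, Tuple[Optional[Hit], Optional[Hit]]],
    max_span: int = 500_000,  # hypothetical limit, not taken from the paper
) -> Dict[str, float]:
    """Count reads with a hit and pairs in the expected configuration."""
    n_pairs = len(pairs)
    reads_with_hit = sum(
        (a is not None) + (b is not None) for a, b in pairs.values()
    )
    concordant_pairs = sum(
        1
        for a, b in pairs.values()
        if a is not None and b is not None and is_concordant(a, b, max_span)
    )
    # Assumed expectation: both ends hit independently with probability p.
    p = reads_with_hit / (2 * n_pairs) if n_pairs else 0.0
    return {
        "reads_with_hit": reads_with_hit,
        "concordant_pairs": concordant_pairs,
        "expected_max_pairs": n_pairs * p * p,
    }


example = {
    "clone1": (Hit("chr1", 1_000, "+"), Hit("chr1", 150_000, "-")),
    "clone2": (Hit("chr2", 5_000, "+"), Hit("chr2", 90_000, "+")),
    "clone3": (Hit("chr3", 7_000, "-"), None),
    "clone4": (None, None),
}
print(summarise(example))
```

Running such a summary once per program and parameter set would give one point per run (reads with a hit against pairs in the expected configuration), which is the kind of comparison the abstract describes.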

Place, publisher, year, edition, pages
2010. Vol. 11, 458- p.
National Category
Medical and Health Sciences; Biological Sciences
URN: urn:nbn:se:uu:diva-134172; DOI: 10.1186/1471-2164-11-458; ISI: 000282788300001; OAI: oai:DiVA.org:uu-134172; DiVA: diva2:371801
Available from: 2010-11-22. Created: 2010-11-22. Last updated: 2010-11-22. Bibliographically approved.
