uu.se Uppsala University Publications
De novo search for non-coding RNA genes in the AT-rich genome of Dictyostelium discoideum: Performance of Markov-dependent genome feature scoring
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
2008 (English) In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 18, no 6, 888-899 p. Article in journal (Refereed) Published
Abstract [en]

Genome data are increasingly important in the computational identification of novel regulatory non-coding RNAs (ncRNAs). However, most ncRNA gene-finders are either specialized to well-characterized ncRNA gene families or require comparisons of closely related genomes. We developed a method for de novo screening for ncRNA genes with a nucleotide composition that stands out against the background genome based on a partial sum process. We compared the performance when assuming independent and first-order Markov-dependent nucleotides, respectively, and used Karlin-Altschul and Karlin-Dembo statistics to evaluate the significance of hits. We hypothesized that a first-order Markov-dependent process might have better power to detect ncRNA genes since nearest-neighbor models have been shown to be successful in predicting RNA structures. A model based on a first-order partial sum process (analyzing overlapping dinucleotides) had better sensitivity and specificity than a zeroth-order model when applied to the AT-rich genome of the amoeba Dictyostelium discoideum. In this genome, we detected 94% of previously known ncRNA genes (at this sensitivity, the false positive rate was estimated to be 25% in a simulated background). The predictions were further refined by clustering candidate genes according to sequence similarity and/or searching for an ncRNA-associated upstream element. We experimentally verified six out of 10 tested ncRNA gene predictions. We conclude that higher-order models, in combination with other information, are useful for identification of novel ncRNA gene families in single-genome analysis of D. discoideum. Our generalizable approach extends the range of genomic data that can be searched for novel ncRNA genes using well-grounded statistical methods.
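
As a rough illustration of the significance step mentioned in the abstract, the sketch below covers only the simplest case, Karlin-Altschul statistics for independent nucleotides. It solves for the scale parameter lambda in sum_i p_i * exp(lambda * s_i) = 1 and applies the standard approximation P ~ 1 - exp(-K * n * exp(-lambda * S)). The function names, the toy score scheme, and the choice to pass K in as a plain parameter are assumptions made for this example and are not taken from the paper; the Markov-dependent (Karlin-Dembo) case is more involved and is not shown.

```python
import math

def solve_lambda(probs, scores, tol=1e-12):
    """Positive root of sum_i p_i * exp(lam * s_i) = 1 (independent letters)."""
    mean = sum(p * s for p, s in zip(probs, scores))
    if mean >= 0 or max(scores) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in zip(probs, scores)) - 1.0

    # f(0) = 0 and f is convex with negative slope at 0, so f < 0 just to the
    # right of 0 and f > 0 beyond the root. Bracket the root, then bisect.
    hi = 1.0
    while f(hi) <= 0:
        hi *= 2.0
    lo = hi
    while f(lo) >= 0:
        lo /= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def segment_pvalue(score, lam, K, n):
    """Approximate P(max segment score >= score) in a random sequence of length n.

    K is assumed to be supplied by the caller; computing it is not shown here.
    """
    return -math.expm1(-K * n * math.exp(-lam * score))

# Toy scheme (hypothetical): +1 with probability 0.25, -1 with probability 0.75.
lam = solve_lambda([0.25, 0.75], [1, -1])
```

For this toy scheme the equation reduces to 0.25 * e^lambda + 0.75 * e^(-lambda) = 1, so lambda equals ln 3; real log-odds scores would come from the estimated foreground and background nucleotide frequencies.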

Place, publisher, year, edition, pages
2008. Vol. 18, no 6, 888-899 p.
National Category
Biochemistry and Molecular Biology
URN: urn:nbn:se:uu:diva-97961
DOI: 10.1101/gr.069104.107
ISI: 000256356200006
PubMedID: 18347326
OAI: oai:DiVA.org:uu-97961
DiVA: diva2:173092
Available from: 2009-01-01 Created: 2009-01-01 Last updated: 2011-10-13 Bibliographically approved
In thesis
1. Computational Approaches to the Identification and Characterization of Non-Coding RNA Genes
2009 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Non-coding RNAs (ncRNAs) have emerged as highly diverse and powerful key players in the cell, the range of capabilities spanning from catalyzing essential processes in all living organisms, e.g. protein synthesis, to being highly specific regulators of gene expression. To fully understand the functional significance of ncRNAs, it is of critical importance to identify and characterize the repertoire of ncRNAs in the cell. Practically every genome-wide screen to identify ncRNAs has revealed large numbers of expressed ncRNAs and often identified species-specific ncRNA families of unknown function. Recent years' advancement in high-throughput sequencing techniques necessitates efficient and reliable methods for computational identification and annotation of genes. A major aim in the work underlying this thesis has been to develop and use computational tools for the identification and characterization of ncRNA genes.

We used computational approaches in combination with experimental methods to study the ncRNA repertoire of the model organism Dictyostelium discoideum. We report ncRNA genes belonging to well-characterized gene families as well as previously unknown and potentially species-specific ncRNA families. The complicated task of de novo ncRNA gene prediction was successfully addressed by developing a method for nucleotide composition-based gene prediction using maximal-scoring partial sums and considering overlapping dinucleotides.
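
A minimal sketch of what scoring overlapping dinucleotides and taking a maximal-scoring partial sum could look like is given below. It assumes each dinucleotide score is a log-likelihood ratio between a foreground (ncRNA-like) and a background model; the function names, the dictionary-based score table, and the toy scores are hypothetical and are not the thesis implementation.

```python
import math

def log_odds(fg, bg):
    """Per-dinucleotide log-likelihood ratios from two probability tables."""
    return {k: math.log(fg[k] / bg[k]) for k in fg}

def max_scoring_segment(seq, score):
    """Return (best_score, start, end) of the highest-scoring segment.

    Scores are summed over overlapping dinucleotides seq[i:i+2]; the running
    partial sum restarts whenever it drops to zero or below. The indices are
    nucleotide coordinates, with end exclusive.
    """
    best, best_start, best_end = 0.0, 0, 0
    run, run_start = 0.0, 0
    for i in range(len(seq) - 1):
        run += score[seq[i:i + 2]]
        if run <= 0:
            run, run_start = 0.0, i + 1
        elif run > best:
            best, best_start, best_end = run, run_start, i + 2
    return best, best_start, best_end

# Toy score table (hypothetical): each G or C adds 1, each A or T subtracts 1.
toy = {a + b: (1 if a in "GC" else -1) + (1 if b in "GC" else -1)
       for a in "ACGT" for b in "ACGT"}
result = max_scoring_segment("ATATGCGCATAT", toy)
```

In this toy run the GC-rich core `GCGC` stands out against the AT-rich flanks. With real data, the table would instead be built with something like `log_odds` from dinucleotide frequencies estimated on known ncRNA genes and on the background genome.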

We also report substantial heterogeneity among human spliceosomal snRNAs. Northern blot analysis and cDNA cloning, as well as bioinformatic analysis of publicly available microarray data, revealed a large number of expressed snRNAs. In particular, we identified U1 snRNA variants with several nucleotide substitutions that could potentially have dramatic effects on splice site recognition.

In conclusion, by combining computational approaches with experimental analysis, we have identified a rich and diverse ncRNA repertoire in the eukaryotes D. discoideum and Homo sapiens. The surprising diversity among the snRNAs in H. sapiens suggests a functional involvement in recognition of non-canonical introns and regulation of messenger RNA splicing.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 57 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 589
ncRNA, snRNA, U1, splice site, alternative splicing, Dictyostelium, nucleotide composition, partial sums
National Category
Bioinformatics (Computational Biology)
urn:nbn:se:uu:diva-9518 (URN)
978-91-554-7386-0 (ISBN)
Public defence
2009-01-30, B21, Biomedicinskt Centrum, Husargatan 3, Uppsala, 10:00
Available from: 2009-01-01 Created: 2009-01-01 Bibliographically approved

By author/editor
Hinas, Andrea