Uppsala University Publications (uu.se)
Distinguishing protein-coding and noncoding genes in the human genome
2007 (English). In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 104, no 49, 19428-19433 p. Article in journal (Refereed), Published
Abstract [en]

Although the Human Genome Project was completed 4 years ago, the catalog of human protein-coding genes remains a matter of controversy. Current catalogs list a total of ≈24,500 putative protein-coding genes. It is broadly suspected that a large fraction of these entries are functionally meaningless ORFs present by chance in RNA transcripts, because they show no evidence of evolutionary conservation with mouse or dog. However, there is currently no scientific justification for excluding ORFs simply because they fail to show evolutionary conservation: the alternative hypothesis is that most of these ORFs are actually valid human genes that reflect gene innovation in the primate lineage or gene loss in the other lineages. Here, we reject this hypothesis by carefully analyzing the nonconserved ORFs—specifically, their properties in other primates. We show that the vast majority of these ORFs are random occurrences. The analysis yields, as a by-product, a major revision of the current human catalogs, cutting the number of protein-coding genes to ≈20,500. Specifically, it suggests that nonconserved ORFs should be added to the human gene catalog only if there is clear evidence of an encoded protein. It also provides a principled methodology for evaluating future proposed additions to the human gene catalog. Finally, the results indicate that there has been relatively little true innovation in mammalian protein-coding genes.

Place, publisher, year, edition, pages
2007. Vol. 104, no 49, 19428-19433 p.
Keyword [en]
Animals, Base Sequence, DNA Transposable Elements/genetics, Dogs, Genes/genetics, Genetic Code, Genome; Human/*genetics, Genomics, Humans, Mice, Molecular Sequence Data, Open Reading Frames/*genetics, Proteins/*genetics, Pseudogenes/genetics, Sequence Analysis; DNA
National Category
Medical and Health Sciences
Identifiers
URN: urn:nbn:se:uu:diva-13091
DOI: 10.1073/pnas.0709013104
PubMedID: 18040051
OAI: oai:DiVA.org:uu-13091
DiVA: diva2:40861
Available from: 2008-01-21. Created: 2008-01-21. Last updated: 2017-12-11. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text
PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=18040051&dopt=Citation

Authority records BETA

Lindblad-Toh, Kerstin

Organisation
Department of Medical Biochemistry and Microbiology