Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular epidemiology. Uppsala University, Science for Life Laboratory, SciLifeLab.
Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular epidemiology. ORCID iD: 0000-0003-2256-6972
2015 (English) In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 16, no 4, 563-575 p. Article in journal (Refereed) Published
Abstract [en]

It is common and recommended practice in biomedical research to validate experimental or observational findings in a population different from the one in which the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance of observing false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as the rediscovery rate (RDR). In high-throughput studies, the RDR is a function of the false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation sample has not yet been collected, the RDR can be used to decide the optimal combination of the proportion of findings taken to validation and the size of the validation study. Second, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
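
To make the dependence of the RDR on false-positive rate and power concrete, the sketch below computes an expected RDR under a deliberately simple two-component mixture model. This is an illustrative assumption, not the estimator described in the article or implemented in its online tool: the function name `expected_rdr`, the parameter `pi0` (proportion of true null features), and the assumption that training and validation tests are independent given the true status of each feature are all hypothetical choices made here.

```python
def expected_rdr(pi0, alpha_train, power_train, alpha_valid, power_valid):
    """Expected rediscovery rate under an assumed two-component mixture.

    pi0          -- assumed proportion of features with no true effect
    alpha_train  -- false-positive rate (significance level) in training
    power_train  -- power to detect true effects in training
    alpha_valid  -- false-positive rate (significance level) in validation
    power_valid  -- power to detect true effects in validation

    Assumes training and validation tests are independent given whether
    a feature is truly null or truly non-null (a simplification).
    """
    params = {
        "pi0": pi0,
        "alpha_train": alpha_train,
        "power_train": power_train,
        "alpha_valid": alpha_valid,
        "power_valid": power_valid,
    }
    for name, value in params.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")

    # Fraction of all features that are significant in training,
    # split into false positives and true positives.
    sig_null = pi0 * alpha_train
    sig_alt = (1.0 - pi0) * power_train
    sig_total = sig_null + sig_alt
    if sig_total == 0.0:
        raise ValueError("no features are expected to be significant in training")

    # Among training-significant features, the fraction expected to be
    # significant again in the validation sample.
    return (sig_null * alpha_valid + sig_alt * power_valid) / sig_total


if __name__ == "__main__":
    # Hypothetical inputs: 90% nulls, alpha = 0.05 and power = 0.8 in both samples.
    print(expected_rdr(0.9, 0.05, 0.8, 0.05, 0.8))
```

Under this simplified model, a stricter training threshold lowers the share of false positives carried forward, which raises the expected RDR even when the validation power is unchanged.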

Place, publisher, year, edition, pages
2015. Vol. 16, no 4, 563-575 p.
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-244119; DOI: 10.1093/bib/bbu033; ISI: 000359083200002; PubMedID: 25256289; OAI: oai:DiVA.org:uu-244119; DiVA: diva2:787852
Funder
Swedish Heart Lung Foundation; Swedish Research Council
Available from: 2015-02-11. Created: 2015-02-11. Last updated: 2017-12-04. Bibliographically approved.

Authority records

Ingelsson, Erik
