Uppsala University Publications (uu.se)
Identification of sample annotation errors in gene expression datasets
TU Dortmund Univ, Dept Stat, D-44227 Dortmund, Germany.
TU Dortmund Univ, Dept Stat, D-44227 Dortmund, Germany.
Dortmund TU, Leibniz Res Ctr Working Environm & Human Factors, Dortmund, Germany.
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
2015 (English). In: Archives of Toxicology, ISSN 0340-5761, E-ISSN 1432-0738, Vol. 89, no 12, p. 2265-2272. Article in journal (Refereed). Published. Text.
Abstract [en]

The comprehensive transcriptomic analysis of clinically annotated human tissue has found widespread use in oncology, cell biology, immunology, and toxicology. In cancer research, microarray-based gene expression profiling has successfully been applied to subclassify disease entities, predict therapy response, and identify cellular mechanisms. Public accessibility of raw data, together with corresponding information on clinicopathological parameters, offers the opportunity to reuse previously analyzed data and to gain statistical power by combining multiple datasets. However, results and conclusions obviously depend on the reliability of the available information. Here, we propose gene expression-based methods for identifying sample misannotations in public transcriptomic datasets. Sample mix-up can be detected by a classifier that differentiates between samples from male and female patients. Correlation analysis identifies multiple measurements of material from the same sample. The analysis of 45 datasets (including 4913 patients) revealed that erroneous sample annotation, affecting 40 % of the analyzed datasets, may be a more widespread phenomenon than previously thought. Removal of erroneously labelled samples may influence the results of the statistical evaluation in some datasets. Our methods may help to identify individual datasets that contain numerous discrepancies and could be routinely included into the statistical analysis of clinical gene expression data.
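
The two checks named in the abstract, a male/female classifier and a correlation analysis, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifier, so a simple comparison of two sex-specific marker genes (XIST, expressed in female cells, and the Y-linked RPS4Y1) stands in for it. The function names, the gene-by-sample matrix layout, and the 0.99 correlation threshold are all assumptions made for illustration.

```python
import numpy as np


def predict_sex(expr, genes, female_gene="XIST", male_gene="RPS4Y1"):
    """Predict sex per sample from two marker genes.

    expr  : 2-D array, rows = genes, columns = samples (log-scale expression).
    genes : list of gene names, one per row of expr.
    NOTE: a stand-in rule, not the classifier used in the paper.
    """
    female_signal = expr[genes.index(female_gene)]
    male_signal = expr[genes.index(male_gene)]
    return np.where(female_signal > male_signal, "female", "male")


def flag_sex_mismatches(expr, genes, annotated_sex):
    """Return indices of samples whose predicted sex differs from the annotation."""
    predicted = predict_sex(expr, genes)
    return [i for i, (p, a) in enumerate(zip(predicted, annotated_sex)) if p != a]


def find_duplicate_pairs(expr, threshold=0.99):
    """Return sample index pairs whose Pearson correlation exceeds the threshold.

    The threshold is a hypothetical value; a real analysis would have to
    calibrate it against the correlation distribution of the dataset.
    """
    corr = np.corrcoef(expr.T)  # samples x samples correlation matrix
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n) if corr[i, j] > threshold]
```

Samples returned by `flag_sex_mismatches` are candidates for a sample mix-up, and pairs returned by `find_duplicate_pairs` are candidates for repeated measurements of the same material. Both are starting points for manual review rather than proof of an annotation error.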

Place, publisher, year, edition, pages
2015. Vol. 89, no 12, 2265-2272 p.
Keyword [en]
Gene expression, Microarray, Misannotation, Quality control, Male-female classifier
National Category
Pharmacology and Toxicology
URN: urn:nbn:se:uu:diva-272120
DOI: 10.1007/s00204-015-1632-4
ISI: 000366155200007
PubMedID: 26608184
OAI: oai:DiVA.org:uu-272120
DiVA: diva2:893188
German Research Foundation (DFG), RA 870/4-1; German Research Foundation (DFG), RA 870/5-1; Swedish Cancer Society
Available from: 2016-01-12; Created: 2016-01-12; Last updated: 2016-01-12; Bibliographically approved

By author/editor
Mattsson, Johanna S. M.; Botling, Johan; Micke, Patrick