Identification of Biomarkers and Signatures in Protein Data
2015 (English). In: 2015 IEEE 11th International Conference on e-Science, 2015, 411-419 p. Conference paper (Refereed)
The correct diagnosis of cancer patients conventionally depends on the pathologist's experience and ability to distinguish cancer tissue from normal tissue under a microscope. Advances in technology for measuring the abundance of, e.g., proteins and mRNAs in tissue samples make it interesting to search for an optimal subset of these for classifying samples as cancer or normal. We discuss issues in identifying biomarkers that provide distinct signatures for predicting tissues as cancer or normal, exemplified by our recent study of cancer signalling signatures in human colon cancer, characterised with regard to protein abundance using high-sensitivity isoelectric focusing. We show that the optimal subset for separating cancer tissues from normal tissues does not contain any of the proteins in the top quintile in terms of significant difference between the groups according to the Mann-Whitney U test or correlation to the diagnosis. In fact, one of the proteins belongs to the tertile with the lowest significance and correlation. This highlights the weakness of the practice of looking only for significant differences in the abundance of individual proteins, and raises the question of how many lifesaving discoveries have been missed because of it. We also demonstrate how Monte Carlo simulations of the separation with random class assignment can be used to calculate p-values for observing any specific separation by chance, and to select the optimal number of proteins in the subset based on these p-values. Both selection of the optimal number of biomarkers and calculation of p-values corrected for multiple hypothesis testing are essential to obtain a subset of biomarkers that yields robust predictions for clinical use.
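The Monte Carlo idea in the abstract, re-running the separation with randomly assigned class labels to see how often a given separation arises by chance, can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the abstract does not say how separation is scored, so the nearest-centroid accuracy used here, the exhaustive search over subsets of size k, the function names and the toy data are all assumptions made for the example. Repeating the full subset search under every random labelling is one common way to have the p-value account for the many subsets tested.

```python
import itertools
import random


def centroid_accuracy(X, y, cols):
    """Assumed separation score: fraction of samples that lie closer to
    their own class centroid than to the other one, using only `cols`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, label in zip(X, y):
        dist = {
            c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
            for c in classes
        }
        if min(dist, key=dist.get) == label:
            correct += 1
    return correct / len(y)


def best_subset(X, y, k):
    """Exhaustively search all subsets of k proteins (columns) and
    return (best score, column indices of the best subset)."""
    n_features = len(X[0])
    return max(
        ((centroid_accuracy(X, y, cols), cols)
         for cols in itertools.combinations(range(n_features), k)),
        key=lambda pair: pair[0],
    )


def monte_carlo_p_value(X, y, k, n_sim=200, seed=0):
    """Estimate how often random class assignment gives a best-subset
    score at least as good as the one observed with the true labels."""
    rng = random.Random(seed)
    observed, cols = best_subset(X, y, k)
    labels = list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # random class assignment, class sizes preserved
        null_score, _ = best_subset(X, labels, k)
        if null_score >= observed:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return observed, cols, (hits + 1) / (n_sim + 1)


# Hypothetical toy data: 12 samples, 4 "proteins"; only column 0 is informative.
noise = random.Random(1)
y = ["cancer"] * 6 + ["normal"] * 6
X = [[v] + [noise.gauss(0, 1) for _ in range(3)]
     for v in [5, 6, 5, 6, 5, 6, 0, 1, 0, 1, 0, 1]]

observed, cols, p = monte_carlo_p_value(X, y, k=1)
print(observed, cols, p)
```

Under these assumptions, the number of proteins could be chosen by calling `monte_carlo_p_value` for k = 1, 2, 3, ... and comparing the resulting p-values. Exhaustive search grows combinatorially with k, so a real protein panel would likely need a cheaper search strategy.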
Place, publisher, year, edition, pages
2015. 411-419 p.
Proceeding IEEE International Conference on e-Science (e-Science), ISSN 2325-372X
Biomarkers, Mann-Whitney U test, Student's t-test, Spearman's rank correlation, Subset selection, Variable selection, Feature selection, Monte Carlo simulations, p-values, Cancer, Colon cancer, Protein abundance
Computer and Information Science; Cancer and Oncology
Identifiers: URN: urn:nbn:se:uu:diva-303183; DOI: 10.1109/eScience.2015.46; ISI: 000380433500055; ISBN: 9781467393256; OAI: oai:DiVA.org:uu-303183; DiVA: diva2:971151
IEEE International Conference On eScience, AUG 31-SEP 04, 2015, Munich, GERMANY