External cross-validation for unbiased evaluation of protein family detectors: application to allergens
2005 (English). In: Proteins, ISSN 0887-3585, Vol. 61, no 4, p. 918-925. Article in journal (Refereed), Published.
Key issues in protein science and computational biology are the design and evaluation of algorithms aimed at detecting proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The detectors designed in each of the 10 iterations are evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
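The external CV loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D simulated data, the k-nearest-neighbour detector, the 5-fold internal validation, and all function names (`simulate`, `design_detector`, `external_cv`) are assumptions chosen only to keep the example self-contained. The point it shows is that the held-out external fold is never seen by the internal parameter selection, and that the averaged external error describes the design procedure rather than any single detector.

```python
import random
from collections import Counter

def simulate(n, seed):
    """Hypothetical data: 1-D feature, class 1 shifted by +1.0 relative to class 0."""
    rng = random.Random(seed)
    return [(rng.gauss(float(i % 2), 1.0), i % 2) for i in range(n)]

def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, test, k):
    """Fraction of test examples misclassified by a k-NN detector built on train."""
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)

def cv_error(data, folds, k):
    """Average error of parameter k over the given folds of data."""
    errs = []
    for fold in folds:
        held = set(fold)
        train = [p for i, p in enumerate(data) if i not in held]
        test = [data[i] for i in fold]
        errs.append(error_rate(train, test, k))
    return sum(errs) / len(errs)

def design_detector(data, candidates, rng, n_inner=5):
    """Design procedure: select the parameter with the smallest internal CV error."""
    folds = make_folds(len(data), n_inner, rng)  # same folds for every candidate
    scores = {k: cv_error(data, folds, k) for k in candidates}
    best = min(candidates, key=lambda k: scores[k])
    return best, scores[best]

def external_cv(data, candidates, n_outer=10, seed=0):
    """External loop: run the whole design procedure without the held-out fold."""
    rng = random.Random(seed)
    outer_errs, inner_errs, chosen = [], [], []
    for fold in make_folds(len(data), n_outer, rng):
        held = set(fold)
        design = [p for i, p in enumerate(data) if i not in held]
        test = [data[i] for i in fold]
        best_k, inner_err = design_detector(design, candidates, rng)
        outer_errs.append(error_rate(design, test, best_k))  # untouched examples
        inner_errs.append(inner_err)  # the selected (smallest) internal estimate
        chosen.append(best_k)
    return sum(outer_errs) / n_outer, sum(inner_errs) / n_outer, Counter(chosen)

data = simulate(100, seed=1)
outer_err, inner_err, chosen = external_cv(data, [1, 3, 5, 7])
print("external CV error of the design procedure:", outer_err)
print("mean selected internal error estimate:", inner_err)
print("how often each parameter was selected:", dict(chosen))
```

The returned counter mirrors the abstract's report of which settings are most frequently selected. The internal estimate is printed only for comparison with the external one; whether it comes out lower on a given simulated data set is not guaranteed by this sketch.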
Keywords: protein classification, bias, bioinformatics, supervised learning, allergy, sequence alignment
Medical and Health Sciences
Identifiers: URN: urn:nbn:se:uu:diva-76006; DOI: 10.1002/prot.20656; PubMedID: 16231294; OAI: oai:DiVA.org:uu-76006; DiVA: diva2:103917