Benchmarking Variable Selection in QSAR
2012 (English). In: Molecular Informatics, ISSN 1868-1743, Vol. 31, no 2, p. 173-179. Article in journal (Refereed), Published
Variable selection is important in QSAR modeling since it can improve model performance and transparency, as well as reduce the computational cost of model fitting and predictions. Which variable selection methods perform well in QSAR settings is largely unknown. To address this question, we rigorously investigated, in a total of 1728 benchmarking experiments, how eight variable selection methods affect the predictive performance and transparency of random forest models fitted to seven QSAR datasets covering different endpoints, descriptor sets, types of response variables, and numbers of chemical compounds. The results show that univariate variable selection methods are suboptimal and that the number of variables in the benchmarked datasets can be reduced by about 60% without significant loss in model performance when using multivariate adaptive regression splines (MARS) and forward selection.
Keywords
Variable selection, Benchmarking, Optimization, Model performance
Identifiers
URN: urn:nbn:se:uu:diva-171437, DOI: 10.1002/minf.201100142, ISI: 000300675200007, OAI: oai:DiVA.org:uu-171437, DiVA: diva2:510899