A Comparison of Two Strategies for Building an Exposure Prediction Model
2016 (English). In: Annals of Occupational Hygiene, ISSN 0003-4878, E-ISSN 1475-3162, Vol. 60, no. 1, pp. 74-89. Article in journal (Refereed), Published.
Cost-efficient assessments of job exposures in large populations may be obtained from models in which 'true' exposures assessed by expensive measurement methods are estimated from easily accessible and cheap predictors. Typically, the models are built on the basis of a validation study comprising 'true' exposure data as well as an extensive collection of candidate predictors from questionnaires or company data, which cannot all be included in the models due to restrictions in the degrees of freedom available for modeling. In these situations, predictors need to be selected using procedures that can identify the best possible subset of predictors among the candidates. The present study compares two strategies for selecting a set of predictor variables. One strategy relies on stepwise hypothesis testing of associations between predictors and exposure, while the other uses cluster analysis to reduce the number of predictors without relying on empirical information about the measured exposure. Both strategies were applied to the same dataset on biomechanical exposure and candidate predictors among computer users, and they were compared in terms of identified predictors of exposure as well as the resulting model fit using bootstrapped resamples of the original data. The identified predictors were, to a large extent, different between the two strategies, and the initial model fit was better for the stepwise testing strategy than for the clustering approach. Internal validation of the models using bootstrap resampling with fixed predictors revealed an equally reduced model fit in resampled datasets for both strategies. However, when predictor selection was incorporated in the validation procedure for the stepwise testing strategy, the model fit was reduced to the extent that both strategies showed similar model fit. Thus, both strategies would be expected to perform poorly with respect to predicting biomechanical exposure in other samples of computer users.
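The contrast drawn in the abstract between bootstrap validation with fixed predictors and validation that repeats predictor selection inside each resample can be illustrated with a minimal Python sketch. It assumes ordinary least-squares models scored by R², and it swaps the study's p-value-based stepwise testing for a simpler forward selection that keeps adding predictors while adjusted R² improves. It uses the bootstrap optimism correction (apparent R² minus the mean difference between each resample's in-sample R² and that model's R² on the original data). The function names, the `max_k` cap on the number of predictors, and the noise data in the usage example are hypothetical; this is not the authors' code, and the clustering strategy is not shown.

```python
import random

def fit(X, y, cols):
    """OLS coefficients [intercept, b_1, ...] for the predictor columns in `cols`."""
    A = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(A[0])
    # Augmented normal equations [A^T A | A^T y], solved by Gauss-Jordan elimination.
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)]
         + [sum(a[r] * yi for a, yi in zip(A, y))] for r in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("singular design matrix")
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[r][p] / M[r][r] for r in range(p)]

def r_squared(X, y, cols, b):
    """R^2 of coefficients `b` evaluated on (X, y); may be negative out of sample."""
    mean_y = sum(y) / len(y)
    sse = sst = 0.0
    for row, yi in zip(X, y):
        pred = b[0] + sum(bj * row[j] for bj, j in zip(b[1:], cols))
        sse += (yi - pred) ** 2
        sst += (yi - mean_y) ** 2
    return 1.0 - sse / sst

def forward_select(X, y, max_k):
    """Greedy forward selection on adjusted R^2 (a stand-in for stepwise testing)."""
    n, m = len(y), len(X[0])
    chosen, best_adj = [], 0.0  # adjusted R^2 of the intercept-only model is 0
    while len(chosen) < max_k:
        best = None
        for j in range(m):
            if j in chosen:
                continue
            cols = chosen + [j]
            r2 = r_squared(X, y, cols, fit(X, y, cols))
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - len(cols) - 1)
            if best is None or adj > best[0]:
                best = (adj, j)
        if best is None or best[0] <= best_adj:
            break  # no remaining candidate improves adjusted R^2
        best_adj = best[0]
        chosen.append(best[1])
    return chosen

def optimism_corrected_r2(X, y, max_k=5, n_boot=30, seed=1):
    """Apparent R^2 and bootstrap-corrected R^2, with fixed vs re-selected predictors."""
    rng = random.Random(seed)
    n = len(y)
    selected = forward_select(X, y, max_k)
    apparent = r_squared(X, y, selected, fit(X, y, selected))
    opt_fixed, opt_sel = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Fixed: reuse the original predictors. Selected: rerun selection on the resample.
        for cols, store in ((selected, opt_fixed),
                            (forward_select(Xb, yb, max_k), opt_sel)):
            b = fit(Xb, yb, cols)
            store.append(r_squared(Xb, yb, cols, b) - r_squared(X, y, cols, b))
    return {
        "selected": selected,
        "apparent": apparent,
        "corrected_fixed": apparent - sum(opt_fixed) / n_boot,
        "corrected_selected": apparent - sum(opt_sel) / n_boot,
    }

# Usage with hypothetical pure-noise data: 60 subjects, 12 candidate predictors.
_rng = random.Random(0)
X_demo = [[_rng.gauss(0, 1) for _ in range(12)] for _ in range(60)]
y_demo = [_rng.gauss(0, 1) for _ in range(60)]
print(optimism_corrected_r2(X_demo, y_demo))
```

On noise data the selection step still finds "predictors" with a positive apparent R². Repeating selection inside every resample typically yields a larger optimism estimate than keeping the predictors fixed, which is the pattern the abstract reports for the stepwise strategy.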
Keywords: bias, optimism, statistical performance, variable selection
National Category: Environmental Health and Occupational Health
Identifiers: URN: urn:nbn:se:uu:diva-280906; DOI: 10.1093/annhyg/mev072; ISI: 000369997400007; PubMed ID: 26424806; OAI: oai:DiVA.org:uu-280906; DiVA: diva2:912309