Developer Friendly and Computationally Efficient Predictive Modeling without Information Leakage: The emil Package for R
(English) In: Journal of Statistical Software, ISSN 1548-7660. Article in journal (Other academic). Submitted.
Machine learning-based solutions to predictive modeling problems (classification, regression, or survival analysis) typically involve a number of steps, beginning with data pre-processing and ending with performance evaluation. Many R packages provide tools for the individual steps, but not for assembling them into complete modeling procedures or for rigorously evaluating their combined performance.
We present a new R package called emil (evaluation of modeling without information leakage), designed to be a flexible backbone for modeling procedures with the following properties:
(1) Enables evaluation of performance and variable importance by means of resampling methods without introducing information leakage.
(2) Returns parameter tuning statistics and final prediction models.
(3) Has a transparent, highly customizable, and easy-to-debug structure.
(4) Offers the user direct control over memory- and CPU-intensive steps of the calculations.
(5) Provides comprehensive yet concise documentation.
First we explain emil's functionality in the context of standard usage, resampling, and customization. Specific application examples are presented to show its potential in terms of parallelization, customization for survival analysis, and memory management.
The result is a computationally efficient and developer-friendly framework that enables resampling-based analyses using several hundred thousand variables, is easy to extend, and allows development of scalable solutions.
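The idea of resampling without information leakage can be illustrated with a minimal sketch. It is written in Python with only the standard library and does not use or reproduce emil's R interface; every function name below is hypothetical, and the nearest-centroid classifier is chosen only for brevity. The point it shows is that the pre-processing step (here, standardization) is fitted on the training part of each fold only and then applied unchanged to the held-out part.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_scaler(rows):
    """Estimate per-column mean and standard deviation from the given rows only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Constant columns have zero spread; fall back to 1.0 to avoid division by zero.
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, sds


def apply_scaler(rows, scaler):
    """Standardize rows with previously fitted parameters."""
    means, sds = scaler
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for lab in sorted(set(labels)):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [statistics.fmean(c) for c in zip(*members)]
    return centroids


def predict(centroids, rows):
    """Assign each row to the class with the closest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda lab: dist(centroids[lab], r)) for r in rows]


def cross_validate(x, y, k=5, seed=0):
    """Return one accuracy per fold; all fitting uses training rows only."""
    accuracies = []
    for test_idx in k_fold_indices(len(x), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        x_train = [x[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        x_test = [x[i] for i in test_idx]
        y_test = [y[i] for i in test_idx]
        # Leakage-free: the scaler never sees the held-out observations.
        scaler = fit_scaler(x_train)
        model = fit_centroids(apply_scaler(x_train, scaler), y_train)
        pred = predict(model, apply_scaler(x_test, scaler))
        accuracies.append(sum(p == t for p, t in zip(pred, y_test)) / len(y_test))
    return accuracies
```

Fitting the scaler on the full data set before splitting would let the held-out observations influence the model and could make the performance estimate optimistic; keeping every data-dependent step inside the fold loop avoids that.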
Keywords: predictive modeling, machine learning, performance evaluation, resampling, high performance computing
Research subject: Materials Science
Identifiers: URN: urn:nbn:se:uu:diva-242353; OAI: oai:DiVA.org:uu-242353; DiVA: diva2:783296
Funder: Swedish Foundation for Strategic Research, RBc08-008