SimSel: a new simulation method for variable selection
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Mathematical Statistics.
2012 (English). In: Journal of Statistical Computation and Simulation, ISSN 0094-9655, E-ISSN 1563-5163, Vol. 82, no. 4, pp. 515-527. Article in journal (Refereed), Published.
Abstract [en]

We propose a new simulation method, SimSel, for variable selection in linear and nonlinear modelling problems. SimSel works by disturbing the input data with pseudo-errors. We then study how this disturbance affects the quality of an approximative model fitted to the data. The main idea is that disturbing unimportant variables does not affect the quality of the model fit. The use of an approximative model has the advantage that the true underlying function does not need to be known and that the method becomes insensitive to model misspecifications. We demonstrate SimSel on simulated data from linear and nonlinear models and on two real data sets. The simulation studies suggest that SimSel works well in complicated situations, such as nonlinear errors-in-variables models.
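
The main idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' SimSel procedure, whose details are not given in this record. It assumes an ordinary least-squares fit as the approximative model, Gaussian noise as the pseudo-errors, and the increase in residual sum of squares as the measure of fit quality. The names `rss_linear`, `disturbance_effect`, and `demo`, and the parameters `noise_sd` and `n_rep`, are hypothetical.

```python
import numpy as np


def rss_linear(X, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def disturbance_effect(X, y, noise_sd=1.0, n_rep=50, seed=0):
    """Average increase in RSS when one column at a time is disturbed.

    Assumption: Gaussian pseudo-errors with standard deviation noise_sd
    are added to a single column, and the linear model is refitted.
    """
    rng = np.random.default_rng(seed)
    base = rss_linear(X, y)
    effect = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_rep):
            Xd = X.copy()
            Xd[:, j] += rng.normal(0.0, noise_sd, size=X.shape[0])
            effect[j] += rss_linear(Xd, y) - base
    return effect / n_rep


def demo():
    """Simulated data: columns 0 and 1 matter, column 2 does not."""
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
    return disturbance_effect(X, y)
```

Calling `demo()` returns one value per column. In this toy setting, disturbing either of the two informative columns should raise the residual sum of squares substantially, while disturbing the third column should leave it nearly unchanged, which is the behaviour the abstract describes for unimportant variables.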

Keyword [en]
variable selection, simulation method, pseudo-error, pseudo-variable
National Category
Computer and Information Science
URN: urn:nbn:se:uu:diva-109360
DOI: 10.1080/00949655.2010.543981
ISI: 000303234800003
OAI: oai:DiVA.org:uu-109360
DiVA: diva2:272075
Available from: 2009-10-14; Created: 2009-10-14; Last updated: 2012-07-26; Bibliographically approved
In thesis
1. eScience Approaches to Model Selection and Assessment: Applications in Bioinformatics
2009 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

High-throughput experimental methods, such as DNA and protein microarrays, have become ubiquitous and indispensable tools in biology and biomedicine, and the number of high-throughput technologies is constantly increasing. They provide the power to measure thousands of properties of a biological system in a single experiment and have the potential to revolutionize our understanding of biology and medicine. However, the high expectations of high-throughput methods are challenged by the problem of statistically modelling the wealth of data in order to translate it into concrete biological knowledge, new drugs, and clinical practices. In particular, the huge number of properties measured in high-throughput experiments makes statistical model selection and assessment exigent. To use high-throughput data in critical applications, it must be warranted that the models we construct reflect the underlying biology and are not just hypotheses suggested by the data. We must furthermore have a clear picture of the risk of making incorrect decisions based on the models.

The rapid improvements in computers and information technology have opened up new ways to approach the problem of model selection and assessment. Specifically, eScience, i.e. computationally intensive science that is carried out in distributed network environments, provides computational power and the means to efficiently access previously acquired scientific knowledge. This thesis investigates how we can use eScience to improve our chances of constructing biologically relevant models from high-throughput data. Novel methods for model selection and assessment are proposed that leverage computational power and prior scientific information to "guide" the model selection towards models that a priori are likely to be relevant. In addition, a software system for deploying new methods and making them easily accessible to end users is presented.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 51 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 112
Keyword [en]
bioinformatics, high-throughput biology, eScience, model selection, model assessment
National Category
Bioinformatics and Systems Biology
URN: urn:nbn:se:uu:diva-109437
ISBN: 978-91-554-7634-2
Public defence
2009-11-28, B42, BMC, Husargatan 3, Uppsala, 10:15 (English)
Available from: 2009-11-06; Created: 2009-10-15; Last updated: 2011-05-11; Bibliographically approved

Authors
Eklund, Martin; Zwanzig, Silvelyn