Prediction of drug class and adverse side effects based on induced gene expression profiles - a feasibility study
Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits
Student thesis
A core task of biomedical research is to establish connections among diseases, genes and drugs, which remains a fundamental challenge in pharmacological and medical research because few effective tools are available. Gene expression profiling has historically served as a valuable resource for elucidating the mechanisms underlying biological pathways, for instance the molecular pathological mechanisms of certain diseases. However, little effort has been devoted to using gene expression profiling to gain deeper knowledge of medication. The aim of the project reported here was to establish a systematic approach to discovering functional connections among gene expression, drug classification and drug action, using statistical (machine) learning techniques available and refined in R and in RapidMiner. Based on data derived from the “Connectivity Map” resource (a large collection of 22000-dimensional gene expression profiles induced in cultured human cells treated with 1309 different drug molecules), the feasibility of establishing a well-performing classifier that predicts drug groups according to the Anatomical Therapeutic Chemical (ATC) system was explored. The same classification approach was also applied to predict adverse side effects of drugs, as listed in the SIDER online database, using the same “Connectivity Map” gene expression data. To avoid information leaks between classifier design and the subsequent test on new examples, which could lead to over-optimistic conclusions, all feasibility studies were performed using carefully designed cross-validation procedures. Although we succeeded in building a well-performing classifier for one particular ATC group at the second level, the overall performance of the classifiers, when evaluated properly (no information leaks), was less promising than expected for both ATC and side effect prediction.
Therefore the main conclusion is that the “Connectivity Map” resource seems to contain surprisingly limited information with respect to these two prediction tasks.
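The information-leak concern mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the thesis's R or RapidMiner pipeline: the nearest-centroid classifier, the mean-difference feature ranking, the pure-noise data and all function names below are assumptions chosen only to keep the example short. It contrasts selecting features once on all samples (leaky, because the test labels influence the selection) with selecting them inside each training fold (clean).

```python
import random
import statistics

def select_features(X, y, k):
    """Return indices of the k features with the largest class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        m0 = statistics.fmean(x[j] for x, label in zip(X, y) if label == 0)
        m1 = statistics.fmean(x[j] for x, label in zip(X, y) if label == 1)
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def fit_centroids(X, y, features):
    """Compute one centroid per class, restricted to the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    return centroids

def predict(centroids, x, features):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def cross_validate(X, y, n_folds, k, leaky):
    """Estimate accuracy by k-fold CV, with or without leaky feature selection."""
    n = len(X)
    folds = [list(range(i, n, n_folds)) for i in range(n_folds)]
    # Leaky variant: features are ranked once using ALL labels, test folds included.
    global_features = select_features(X, y, k) if leaky else None
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        # Clean variant: features are ranked on the training fold only.
        features = global_features if leaky else select_features(train_X, train_y, k)
        centroids = fit_centroids(train_X, train_y, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test_idx)
    return correct / n

def make_noise_data(n_samples, n_features, seed):
    """Pure Gaussian noise with alternating labels: no real signal to learn."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

X, y = make_noise_data(n_samples=40, n_features=1000, seed=0)
leaky_acc = cross_validate(X, y, n_folds=5, k=10, leaky=True)
clean_acc = cross_validate(X, y, n_folds=5, k=10, leaky=False)
print(f"leaky estimate: {leaky_acc:.2f}, clean estimate: {clean_acc:.2f}")
```

Because the data contain no signal, an honest estimate should stay near chance level. With many more features than samples, the leaky variant tends to report an accuracy well above chance, which is the kind of over-optimistic conclusion the cross-validation design is meant to prevent.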
Place, publisher, year, edition, pages
2013. 41 p.
drug class, side effect, classification, machine learning
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:uu:diva-195836
OAI: oai:DiVA.org:uu-195836
DiVA: diva2:608427
Master Programme in Bioinformatics
2012-11-15, C2:305, BMC, Husargatan 3, Uppsala, 13:57 (English)
Gustafsson, Mats, Professor
Andersson, Claes, Doctor
Josefsson, Lars-Göran, Doctor