Uppsala University Publications (uu.se)
  • 1.
    Ahlberg, Ernst
    et al.
    AstraZeneca Innovat Med & Early Dev, Drug Safety & Metab, Molndal, Sweden.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Hasselgren, Catrin
    Univ New Mexico, Internal Med, Albuquerque, NM 87131 USA.
    Carlsson, Lars
    AstraZeneca Innovat Med & Early Dev, Drug Safety & Metab, Molndal, Sweden.
    Interpretation of Conformal Prediction Classification Models. 2015. In: Statistical Learning and Data Sciences, 2015, p. 323-334. Conference paper (Refereed)
    Abstract [en]

    We present a method for interpretation of conformal prediction models. The discrete gradient of the largest p-value is calculated with respect to object space. A criterion is applied to identify the most important component of the gradient and the corresponding part of the object is visualized. The method is exemplified with data from drug discovery relating chemical compounds to mutagenicity. Furthermore, a comparison is made to already established important subgraphs with respect to mutagenicity and this initial assessment shows very useful results with respect to interpretation of a conformal predictor.

  • 2.
    Ahmed, Laeeq
    et al.
    Georgiev, Valentin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Capuccini, Marco
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Toor, Salman
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Laure, Erwin
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Efficient iterative virtual screening with Apache Spark and conformal prediction. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed)
  • 3.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Pharmacology.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Brunn: an open source laboratory information system for microplates with a graphical plate layout design process. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no 1, article id 179. Article in journal (Refereed)
    Abstract [en]

    Background:

    Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight open system for handling plate design, and validation and preparation of data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.

  • 4.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Carlsson, Lars
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines. 2014. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, p. 3211-3217. Article in journal (Refereed)
    Abstract [en]

    QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
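    The recommended defaults in this abstract can be written down as a small search grid. The sketch below is only an illustration: the ranges for C, γ and signature height come from the abstract, while the power-of-ten spacing and the names `powers_of_ten`, `HEIGHTS` and `parameter_grid` are assumptions made here, not taken from the paper.

```python
import itertools


def powers_of_ten(low_exp, high_exp):
    """Return 10**k for every integer exponent k from low_exp to high_exp inclusive."""
    return [10.0 ** k for k in range(low_exp, high_exp + 1)]


# Ranges taken from the abstract; the power-of-ten spacing is an assumption.
C_VALUES = powers_of_ten(0, 2)        # C from 1 to 100
GAMMA_VALUES = powers_of_ten(-3, -1)  # gamma from 0.001 to 0.1

# Signature height ranges (start, end) recommended for each fingerprint type.
HEIGHTS = {"count": (0, 2), "bit": (0, 3)}


def parameter_grid(fingerprint):
    """List every (heights, C, gamma) combination for one fingerprint type."""
    heights = HEIGHTS[fingerprint]
    return [(heights, c, g) for c, g in itertools.product(C_VALUES, GAMMA_VALUES)]


if __name__ == "__main__":
    for combo in parameter_grid("count"):
        print(combo)
```

    Each tuple would then be passed to whatever SVM implementation is in use; a finer or wider grid only changes the two value lists.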

  • 5.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Engkvist, Ola
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Noeske, Tobias
    Ligand-Based Target Prediction with Signature Fingerprints. 2014. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, p. 2647-2653. Article in journal (Refereed)
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regards to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
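    Tanimoto-based similarity searching, as mentioned in this abstract, compares two fingerprints by the ratio of shared to total features. The sketch below shows the standard set form for bit fingerprints and one common min/max generalization for count fingerprints. Whether the paper used exactly this count formula is an assumption, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto_bits(a, b):
    """Tanimoto coefficient for bit fingerprints given as collections of set bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Two empty fingerprints share nothing; return 0.0 by convention (assumption).
    return len(a & b) / union if union else 0.0


def tanimoto_counts(a, b):
    """Min/max generalization of Tanimoto for count fingerprints (feature -> count)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


if __name__ == "__main__":
    # Toy fingerprints, not real molecular signatures.
    print(tanimoto_bits({1, 2, 3}, {2, 3, 4}))
    print(tanimoto_counts({"x": 2, "y": 1}, {"x": 1, "z": 1}))
```

    A similarity search would rank reference compounds by this score against the query and read target annotations off the top hits.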

  • 6.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale ligand-based predictive modelling using support vector machines. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 39. Article in journal (Refereed)
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

  • 7.
    Ameur, Adam
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Yankovski, Vladimir
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    The LCB Data Warehouse. 2006. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 22, no 8, p. 1024-1026. Article in journal (Refereed)
    Abstract [en]

    The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.

  • 8.
    Arvidsson, Staffan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Carlsson, Lars
    AstraZeneca R&D.
    Toccaceli, Paolo
    Royal Holloway University of London.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors. 2017. In: Conformal and Probabilistic Prediction with Applications (COPA) 2017 / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos, 2017, Vol. 60, p. 118-131. Conference paper (Refereed)
    Abstract [en]

    Prediction of drug metabolism is an important topic in the drug discovery process, and we here present a study using probabilistic predictions applying Cross Venn-ABERS Predictors (CVAPs) on data for site-of-metabolism. We used a dataset of 73599 biotransformations, applied SMIRKS to define biotransformations of interest and constructed five datasets where chemical structures were represented using signatures descriptors. The results show that CVAP produces well-calibrated predictions for all datasets with good predictive capability, making CVAP an interesting method for further exploration in drug discovery applications.

  • 9.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Ahmed, Laeeq
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Laure, Erwin
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale virtual screening on public cloud resources with Apache Spark. 2017. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 15. Article in journal (Refereed)
  • 10.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Conformal prediction in Spark: Large-scale machine learning with confidence. 2015. In: Proc. 2nd International Symposium on Big Data Computing, Los Alamitos, CA: IEEE Computer Society, 2015, p. 61-67. Conference paper (Refereed)
  • 11.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 362. Article in journal (Refereed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive. Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics. Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

  • 12.
    Carlsson, Lars
    et al.
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Boyer, Scott
    AstraZeneca R&D.
    Model building in Bioclipse Decision Support applied to open datasets. 2012. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, no Suppl., p. S62. Article in journal (Refereed)
    Abstract [en]

    Bioclipse Decision Support (DS) is a system capable of building predictive models of any collection of SAR data, and making them available in a simple user interface based on Bioclipse (www.bioclipse.net).

    The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results.

    We here present the results from applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from the sources and to build models which can be integrated with user specific models.

  • 13.
    Claesson, Alf
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    On Mechanisms of Reactive Metabolite Formation from Drugs. 2013. In: Mini-Reviews in Medicinal Chemistry, ISSN 1389-5575, E-ISSN 1875-5607, Vol. 13, no 5, p. 720-729. Article in journal (Refereed)
    Abstract [en]

    Idiosyncratic adverse drug reactions (IADRs) cause a broad range of clinically severe conditions of which drug induced liver injury (DILI) in particular is one of the most frequent causes of safety-related drug withdrawals. The underlying cause is almost invariably formation of reactive metabolites (RM) which by attacking macromolecules induce organ injuries. Attempts are being made in the pharmaceutical industry to lower the risk of selecting unfit compounds as clinical candidates. Approaches vary but do not seem to be overly successful at the initial design/synthesis stage. We review here the most frequent categories of mechanisms for RM formation and propose that many cases of RMs encountered within early ADME screening can be foreseen by applying chemical and metabolic knowledge. We also mention a web tool, SpotRM, which can be used for efficient look-up and learning about drugs that have recognized IADRs likely caused by RM formation.

  • 14.
    Dahlö, Martin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Haziza, Frédéric
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Kallio, Aleksi
    Korpelainen, Eija
    Bongcam-Rudloff, Erik
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    BioImg.org: A catalog of virtual machine images for the life sciences. 2015. In: Bioinformatics and Biology Insights, ISSN 1177-9322, E-ISSN 1177-9322, Vol. 9, p. 125-128. Article in journal (Refereed)
    Abstract [en]

    Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education.

  • 15.
    Dahlö, Martin
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Scofield, Douglas
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Tracking the NGS revolution: managing life science research on shared high-performance computing clusters. 2018. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 7, no 5, article id giy028. Article in journal (Refereed)
  • 16.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    An eScience-Bayes strategy for analyzing omics data. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 282. Article in journal (Refereed)
    Abstract [en]

    Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge to analyze the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

  • 17.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    The C1C2: a framework for simultaneous model selection and assessment. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 360. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of used data. Since the number of conceivable models in general is vast, it was also of interest to investigate the employment of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, a lower accuracy of the generalization error estimates was observed. CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assess its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.

  • 18.
    Exner, T. E.
    et al.
    Douglas Connect GmbH, Basel, Switzerland.
    Dokler, J.
    Douglas Connect GmbH, Basel, Switzerland.
    Bachler, D.
    Douglas Connect GmbH, Basel, Switzerland.
    Farcal, L. R.
    Douglas Connect GmbH, Basel, Switzerland.
    Evelo, C. T.
    Maastricht Univ, Dept Bioinformat, Maastricht, Netherlands.
    Willighagen, E.
    Maastricht Univ, Dept Bioinformat, Maastricht, Netherlands.
    Jennen, D. G. J.
    Maastricht Univ, Dept Toxicogen, Maastricht, Netherlands.
    Jacobs, M.
    Fraunhofer Gesell, SCAI Bioinformat, St Augustin, Germany.
    Doganis, P.
    Natl Tech Univ Athens, Athens, Greece.
    Sarimveis, H.
    Natl Tech Univ Athens, Athens, Greece.
    Lynch, I.
    Univ Birmingham, Birmingham, W Midlands, England.
    Gkoutos, G.
    Univ Birmingham, Birmingham, W Midlands, England.
    Kramer, S.
    Johannes Gutenberg Univ Mainz, Mainz, Germany.
    Notredame, C.
    Fundacio Ctr Regulacio Genom, Barcelona, Spain.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Jennings, P.
    Vrije Univ Amsterdam, Amsterdam, Netherlands.
    Dudgeon, T.
    Informat Matters Ltd, Oxford, England.
    Bols, F.
    Inst Natl Environm & Risques, Verneuil En Halatte, France.
    Hardy, B.
    Douglas Connect GmbH, Basel, Switzerland.
    OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration and in silico analysis and modelling in risk assessment. 2018. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 295, p. S104-S104. Article in journal (Other academic)
  • 19.
    Gauraha, Niharika
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Synergy Conformal Prediction. Manuscript (preprint) (Other academic)
    Abstract [en]

    Conformal Prediction is a machine learning methodology that produces valid prediction regions under mild conditions. Ensembles of conformal predictors have been proposed to improve the informational efficiency of inductive conformal predictors by combining p-values; however, the validity of such methods has been an open problem. We introduce Synergy Conformal Prediction, an ensemble method that combines monotonic conformity scores and is capable of producing valid prediction intervals. We study its applicability in two scenarios: where data is partitioned in order to reduce the total model training time, and where an ensemble of different machine learning methods is used to improve the overall efficiency of predictions. We evaluate the method on 10 data sets and show that the synergy conformal predictor produces valid predictions and improves informational efficiency compared to inductive conformal prediction and existing ensemble methods. The results indicate that synergy conformal prediction has advantageous properties compared to contemporary approaches, and we also envision that it will have an impact in Big Data and federated environments.
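
    The idea of combining scores rather than p-values can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes nonconformity scores (higher means stranger, the mirror image of conformity scores), assumes plain averaging as the aggregation rule, and uses made-up score values and hypothetical function names. Only the p-value formula is the standard one for inductive conformal prediction.

```python
from statistics import mean


def combined_scores(per_model_scores):
    """Average the scores of several models, example by example."""
    return [mean(scores) for scores in zip(*per_model_scores)]


def icp_p_value(calibration_scores, test_score):
    """Standard inductive conformal p-value for one candidate label:
    the fraction of scores (calibration plus the test example itself)
    that are at least as large as the test score."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


# Hypothetical nonconformity scores from two models on four calibration examples.
model_a = [0.25, 0.5, 0.5, 1.0]
model_b = [0.25, 0.25, 0.75, 0.5]
calibration = combined_scores([model_a, model_b])

# Scores the same two models assign to the test example for one candidate label.
test_score = mean([0.5, 0.75])
p_value = icp_p_value(calibration, test_score)
```

    A candidate label would then be kept in the prediction set whenever its p-value exceeds the chosen significance level.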

  • 20.
    Gauraha, Niharika
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Synergy Conformal Prediction for Regression. Manuscript (preprint) (Other academic)
    Abstract [en]

    Large and distributed data sets pose many challenges for machine learning, including requirements on computational resources and training time. One approach is to train multiple models in parallel on subsets of data and aggregate the resulting predictions. Large data sets can then be partitioned into smaller chunks, and for distributed data the need for pooling can be avoided. Combining results from conformal predictors using synergy rules has been shown to have advantageous properties for classification problems. In this paper we extend the methodology to regression problems, and we show that it produces valid and efficient predictors compared to inductive conformal predictors and cross-conformal predictors for 10 different data sets from the UCI machine learning repository using three different machine learning methods. The approach offers a straightforward and compelling alternative to pooling data, such as when working in distributed environments.

  • 21.
    Gauraha, Niharika
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Söderdahl, Fabian
    Statisticon AB.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Robust Knowledge Transfer in Learning Under Privileged Information Framework. Manuscript (preprint) (Other academic)
    Abstract [en]

    Learning Under Privileged Information (LUPI) enables the inclusion of additional (privileged) information when training machine learning models, that is, data that is not available when making predictions. The methodology has been successfully applied to a diverse set of problems from various fields. SVM+ was the first realization of the LUPI paradigm; it showed fast convergence but did not scale well. To address the scalability issue, knowledge transfer approaches were proposed to estimate privileged information from standard features in order to construct improved decision rules. Most available knowledge transfer methods use regression techniques and the same data for approximating the privileged features as for learning the transfer function. Inspired by the cross-validation approach, we propose to partition the training data into K folds and use each fold for learning a transfer function and the remaining folds for approximations of privileged features; we refer to this as robust knowledge transfer. We conduct an empirical evaluation considering four different experimental setups using one synthetic and three real datasets. These experiments demonstrate that our approach yields improved accuracy compared to LUPI with standard knowledge transfer.
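
    The fold scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single standard feature and a single privileged feature, a least-squares line as the transfer function, and averaging of the out-of-fold approximations (the abstract does not say how they are combined). All names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b for one-dimensional data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def robust_transfer(x, x_priv, k):
    """Approximate the privileged feature of every example using only
    transfer functions learned on folds that do not contain that example."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(x)
    folds = [list(range(start, n, k)) for start in range(k)]
    sums, counts = [0.0] * n, [0] * n
    for fold in folds:
        # Learn the transfer function on this fold only.
        a, b = fit_line([x[i] for i in fold], [x_priv[i] for i in fold])
        in_fold = set(fold)
        # Apply it to the examples in the remaining folds.
        for i in range(n):
            if i not in in_fold:
                sums[i] += a * x[i] + b
                counts[i] += 1
    # Assumption: average the k - 1 approximations each example receives.
    return [s / c for s, c in zip(sums, counts)]


# Hypothetical data where the privileged feature is exactly 2 * x + 1.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x_priv = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
approx_priv = robust_transfer(x, x_priv, k=3)
```

    Each fold must contain at least two distinct values of the standard feature for the line fit to be defined.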

  • 22.
    Georgieva, Polina
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Exploring the usefulness of morphological profiling of cells to study toxicity mechanisms. 2018. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 295, p. S203-S203. Article in journal (Other academic)
  • 23.
    Gholami, Ali
    et al.
    Royal Institute of Technology.
    Laure, Erwin
    Royal Institute of Technology.
    Somogyi, Peter
    Karolinska Institutet.
    Spjuth, Ola
    Swedish e-Science Research Center and Department of Medical Epidemiology and Biostatistics, Karolinska Institute.
    Niazi, Salman
    Swedish Institute of Computer Science.
    Dowling, Jim
    Swedish Institute of Computer Science.
    Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers. 2015. In: Journal of medical and bioengineering, ISSN 2301-3796, Vol. 4, no 2, p. 117-125. Article in journal (Refereed)
    Abstract [en]

    Medical organizations collect, store and process vast amounts of sensitive information about patients. Easy access to this information by researchers is crucial to improving medical research, but in many institutions, cumbersome security measures and walled gardens have created a situation where even information about what medical data is out there is not available. One of the main security challenges in this area is enabling researchers to cross-link different medical studies while preserving the privacy of the patients involved. In this paper, we introduce a privacy-preserving system for publishing sample availability data that allows researchers to make queries that crosscut different studies. That is, researchers can ask questions such as how many patients have had both diabetes and prostate cancer, where the diabetes and prostate cancer information originates from different clinical registries. We realize our solution by having a two-level anonymization mechanism, where our toolkit for publishing availability data first pseudonymizes personal identifiers and then anonymizes sensitive attributes. Our toolkit also includes a web-based server that stores the encrypted pseudonymized sample data and allows researchers to execute cross-linked queries across different study data. We believe that our toolkit contributes a first step to support the privacy-preserving publication of data containing personal identifiers.
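
    A two-level scheme of this kind might look roughly like the sketch below. It is not the toolkit described in the abstract: the keyed hash (HMAC-SHA256) used for pseudonymization, the age banding used to coarsen a sensitive attribute, the shared key, and the registry contents are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Hypothetical secret shared by the publishing registries; never hard-code a real key.
SECRET_KEY = b"demo-key-not-for-production"


def pseudonymize(personal_id: str) -> str:
    """Level 1: replace a personal identifier with a keyed hash."""
    return hmac.new(SECRET_KEY, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def age_band(age: int, width: int = 10) -> str:
    """Level 2: coarsen a sensitive attribute into a band such as '70-79'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def publish(registry):
    """Return the publishable view of a registry: pseudonym -> age band."""
    return {pseudonymize(pid): age_band(age) for pid, age in registry.items()}


# Made-up registries keyed by made-up identifiers.
diabetes_registry = {"patient-001": 71, "patient-002": 59, "patient-003": 50}
cancer_registry = {"patient-001": 71, "patient-003": 50, "patient-004": 33}

diabetes_published = publish(diabetes_registry)
cancer_published = publish(cancer_registry)

# Cross-linked query: how many patients appear in both registries?
in_both = diabetes_published.keys() & cancer_published.keys()
```

    Because both registries derive pseudonyms with the same key, the same person maps to the same pseudonym in each published view, which is what makes the cross-registry count possible without exposing the original identifiers.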

  • 24.
    Grafström, Roland C
    et al.
    Nymark, Penny
    Hongisto, Vesa
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Ceder, Rebecca
    Willighagen, Egon
    Hardy, Barry
    Kaski, Samuel
    Kohonen, Pekka
    Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures. 2015. In: ATLA (Alternatives to Laboratory Animals), ISSN 0261-1929, Vol. 43, no 5, p. 325-332. Article in journal (Refereed)
    Abstract [en]

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently-developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing), to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serve to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data, to information relevant to human health and environmental safety.

  • 25.
    Guha, Rajarshi
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Willighagen, Egon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Collaborative Cheminformatics Applications. 2011. In: Collaborative Computational Technologies for Biomedical Research / [ed] Sean Ekins, Maggie A. Z. Hupcey, Antony J. Williams, Hoboken, N.J.: John Wiley & Sons, 2011. Chapter in book (Other academic)
  • 26.
    Gupta, Anindya
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Harrison, Philip J.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wieslander, Håkan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Pielawski, Nicolas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Kartasalo, Kimmo
    Partel, Gabriele
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Solorzano, Leslie
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Suveer, Amit
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Klemm, Anna H.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Sintorn, Ida-Maria
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Wählby, Carolina
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Deep Learning in Image Cytometry: A Review. 2019. In: Cytometry Part A, ISSN 1552-4922, E-ISSN 1552-4930, Vol. 95, no 6, p. 366-380. Article, review/survey (Refereed)
  • 27.
    Hardy, Barry
    et al.
    DouglasConnect.
    Apic, Gordana
    Cambridge Cell Networks.
    Carthew, Philip
    Unilever.
    Clark, Dominic
    EMBL-EBI.
    Cook, David
    AstraZeneca.
    Dix, Ian
    AstraZeneca.
    Escher, Sylvia
    Fraunhofer Institute for Toxicology & Experimental Medicine.
    Hastings, Janna
    EMBL-EBI.
    Heard, David J
    Novartis.
    Jeliazkova, Nina
    Ideaconsult.
    Judson, Philip
    Lhasa Ltd.
    Matis-Mitchell, Sherri
    AstraZeneca.
    Mitic, Dragana
    Cambridge Cell Networks.
    Myatt, Glenn
    Leadscope.
    Shah, Imran
    US EPA.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Tcheremenskaia, Olga
    Istituto Superiore di Sanita.
    Toldo, Luca
    Merck KGaA.
    Watson, David
    Lhasa Ltd.
    White, Andrew
    Unilever.
    Yang, Chihae
    Altamira.
    Toxicology Ontology Perspectives. 2012. In: ALTEX. Alternatives zu Tierexperimenten, ISSN 0946-7785, Vol. 29, no 2, p. 139-156. Article in journal (Refereed)
    Abstract [en]

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  • 28.
    Hardy, Barry
    et al.
    DouglasConnect.
    Apic, Gordana
    Cambridge Cell Networks.
    Carthew, Philip
    Unilever.
    Clark, Dominic
    EMBL-EBI.
    Cook, David
    AstraZeneca.
    Dix, Ian
    AstraZeneca.
    Escher, Sylvia
    Fraunhofer Institute for Toxicology & Experimental Medicine.
    Hastings, Janna
    EMBL-EBI.
    Heard, David J
    Novartis.
    Jeliazkova, Nina
    Ideaconsult.
    Judson, Philip
    Lhasa Ltd.
    Matis-Mitchell, Sherri
    AstraZeneca.
    Mitic, Dragana
    Cambridge Cell Networks.
    Myatt, Glenn
    Leadscope.
    Shah, Imran
    US EPA.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Tcheremenskaia, Olga
    Istituto Superiore di Sanita.
    Toldo, Luca
    Merck KGaA.
    Watson, David
    Lhasa Ltd.
    White, Andrew
    Unilever.
    Yang, Chihae
    Altamira.
    Food for thought...: A toxicology ontology roadmap. 2012. In: ALTEX. Alternatives zu Tierexperimenten, ISSN 0946-7785, Vol. 29, no 2, p. 129-137. Article in journal (Refereed)
    Abstract [en]

    Foreign substances can have a dramatic and unpredictable adverse effect on human health. In the development of new therapeutic agents, it is essential that the potential adverse effects of all candidates be identified as early as possible. The field of predictive toxicology strives to profile the potential for adverse effects of novel chemical substances before they occur, both with traditional in vivo experimental approaches and increasingly through the development of in vitro and computational methods which can supplement and reduce the need for animal testing. To be maximally effective, the field needs access to the largest possible knowledge base of previous toxicology findings, and such results need to be made available in such a fashion so as to be interoperable, comparable, and compatible with standard toolkits. This necessitates the development of open, public, computable, and standardized toxicology vocabularies and ontologies so as to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. Such ontology development will support data management, model building, integrated analysis, validation and reporting, including regulatory reporting and alternative testing submission requirements as required by guidelines such as the REACH legislation, leading to new scientific advances in a mechanistically-based predictive toxicology. Numerous existing ontology and standards initiatives can contribute to the creation of a toxicology ontology supporting the needs of predictive toxicology and risk assessment. Additionally, new ontologies are needed to satisfy practical use cases and scenarios where gaps currently exist. Developing and integrating these resources will require a well-coordinated and sustained effort across numerous stakeholders engaged in a public-private partnership. 
    In this communication, we set out a roadmap for the development of an integrated toxicology ontology, harnessing existing resources where applicable. We describe the stakeholders’ requirements analysis from the academic and industry perspectives, timelines, and expected benefits of this initiative, with a view to engagement with the wider community.

  • 29.
    Herman, Stephanie
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Emami Khoonsari, Payam
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Aftab, Obaid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Krishnan, Shibu
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Strömbom, Emil
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Hammerling, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Gustafsson, Mats G
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Mass spectrometry based metabolomics for in vitro systems pharmacology: pitfalls, challenges, and computational solutions. 2017. In: Metabolomics, ISSN 1573-3882, E-ISSN 1573-3890, Vol. 13, no 7, article id 79. Article in journal (Refereed)
    Abstract [en]

    INTRODUCTION: Mass spectrometry based metabolomics has become a promising complement and alternative to transcriptomics and proteomics in many fields including in vitro systems pharmacology. Despite several merits, metabolomics based on liquid chromatography mass spectrometry (LC-MS) is a developing area that is still associated with several pitfalls and challenges. To reach a high level of reliability and robustness, these issues need to be tackled by implementing refined experimental and computational protocols.

    OBJECTIVES: This study illustrates some key pitfalls in LC-MS based metabolomics and introduces an automated computational procedure to compensate for them.

    METHOD: Non-cancerous mammary gland derived cells were exposed to 27 chemicals from four pharmacological classes plus a set of six pesticides. Changes in the metabolome of cell lysates were assessed after 24 h using LC-MS. A data processing pipeline was established and evaluated to handle issues including contaminants, carry-over effects, intensity decay and inherent methodology variability and biases. A key component in this pipeline is a latent variable method called OOS-DA (optimal orthonormal system for discriminant analysis), which is theoretically more easily motivated than PLS-DA in this context, as it is rooted in pattern classification rather than regression modeling.

    RESULT: The pipeline is shown to reduce experimental variability/biases and is used to confirm that LC-MS spectra hold drug class specific information.

    CONCLUSION: LC-MS based metabolomics is a promising methodology, but comes with pitfalls and challenges. Key difficulties can be largely overcome by means of a computational procedure of the kind introduced and demonstrated here. The pipeline is freely available at www.github.com/stephanieherman/MS-data-processing.

  • 30.
    Herman, Stephanie
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Emami Khoonsari, Payam
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Tolf, Andreas
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Steinmetz, Julia
    Zetterberg, Henrik
    Åkerfeldt, Torbjörn
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Jakobsson, Per-Johan
    Larsson, Anders
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Burman, Joachim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Integration of magnetic resonance imaging and protein and metabolite CSF measurements to enable early diagnosis of secondary progressive multiple sclerosis. 2018. In: Theranostics, ISSN 1838-7640, E-ISSN 1838-7640, Vol. 8, no 16, p. 4477-4490. Article in journal (Refereed)
    Abstract [en]

    Molecular networks in neurological diseases are complex. Despite this fact, contemporary biomarkers are in most cases interpreted in isolation, leading to a significant loss of information and power. We present an analytical approach to scrutinize and combine information from biomarkers originating from multiple sources with the aim of discovering a condensed set of biomarkers that in combination could distinguish the progressive degenerative phenotype of multiple sclerosis (SPMS) from the relapsing-remitting phenotype (RRMS).

    Methods: Clinical and magnetic resonance imaging (MRI) data were integrated with data from protein and metabolite measurements of cerebrospinal fluid, and a method was developed to sift through all the variables to establish a small set of highly informative measurements. This prospective study included 16 SPMS patients, 30 RRMS patients and 10 controls. Protein concentrations were quantitated with multiplexed fluorescent bead-based immunoassays and ELISA. The metabolome was recorded using liquid chromatography-mass spectrometry. Clinical follow-up data of the SPMS patients were used to assess disease progression and development of disability.

    Results: Eleven variables were in combination able to distinguish SPMS from RRMS patients with high confidence superior to any single measurement. The identified variables consisted of three MRI variables: the size of the spinal cord and the third ventricle and the total number of T1 hypointense lesions; six proteins: galectin-9, monocyte chemoattractant protein-1 (MCP-1), transforming growth factor alpha (TGF-α), tumor necrosis factor alpha (TNF-α), soluble CD40L (sCD40L) and platelet-derived growth factor AA (PDGF-AA); and two metabolites: 20β-dihydrocortisol (20β-DHF) and indolepyruvate. The proteins myelin basic protein (MBP) and macrophage-derived chemokine (MDC), as well as the metabolites 20β-DHF and 5,6-dihydroxyprostaglandin F1a (5,6-DH-PGF1), were identified as potential biomarkers of disability progression.

    Conclusion: Our study demonstrates, in a limited but well-defined and data-rich cohort, the importance and value of combining multiple biomarkers to aid diagnostics and track disease progression.

  • 31.
    Herman, Stephanie
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Niemelä, Valter
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Emami Khoonsari, Payam
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Sundblom, Jimmy
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurosurgery.
    Burman, Joachim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Landtblom, Anne-Marie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Nyholm, Dag
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Alterations in the tyrosine and phenylalanine pathways revealed by biochemical profiling in cerebrospinal fluid of Huntington's disease subjects. 2019. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 9, article id 4129. Article in journal (Refereed)
    Abstract [en]

    Huntington's disease (HD) is a severe neurological disease leading to psychiatric symptoms, motor impairment and cognitive decline. The disease is caused by a CAG expansion in the huntingtin (HTT) gene, but how this translates into the clinical phenotype of HD remains elusive. Using liquid chromatography mass spectrometry, we analyzed the metabolome of cerebrospinal fluid (CSF) from premanifest and manifest HD subjects as well as control subjects. Inter-group differences revealed that the tyrosine metabolism, including tyrosine, thyroxine, L-DOPA and dopamine, was significantly altered in manifest compared with premanifest HD. These metabolites demonstrated moderate to strong associations with measures of disease severity and symptoms. Thyroxine and dopamine also correlated with the five-year risk of onset in premanifest HD subjects. The phenylalanine and the purine metabolisms were also significantly altered, but were less strongly associated with disease severity. Decreased levels of lumichrome were commonly found in mutated HTT carriers and the levels correlated with the five-year risk of disease onset in premanifest carriers. These biochemical findings demonstrate that the CSF metabolome can be used to characterize molecular pathogenesis occurring in HD, which may be essential for future development of novel HD therapies.

  • 32.
    Herman, Stephanie
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Åkerfeldt, Torbjörn
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Burman, Joachim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Biochemical Differences in Cerebrospinal Fluid between Secondary Progressive and Relapsing-Remitting Multiple Sclerosis. 2019. In: Cells, ISSN 2073-4409, Vol. 8, no 2, article id 84. Article in journal (Refereed)
    Abstract [en]

    To better understand the pathophysiological differences between secondary progressive multiple sclerosis (SPMS) and relapsing-remitting multiple sclerosis (RRMS), and to identify potential biomarkers of disease progression, we applied high-resolution mass spectrometry (HRMS) to investigate the metabolome of cerebrospinal fluid (CSF). The biochemical differences were determined using partial least squares discriminant analysis (PLS-DA), connected to biochemical pathways, and associated with clinical and radiological measures. Tryptophan metabolism was significantly altered, with perturbed levels of kynurenate, 5-hydroxytryptophan, 5-hydroxyindoleacetate, and N-acetylserotonin in SPMS patients compared with RRMS and controls. SPMS patients had altered kynurenine compared with RRMS patients, and altered indole-3-acetate compared with controls. Regarding the pyrimidine metabolism, SPMS patients had altered levels of uridine and deoxyuridine compared with RRMS and controls, and altered thymine and glutamine compared with RRMS patients. Metabolites from the pyrimidine metabolism were significantly associated with disability, disease activity and brain atrophy, making them of particular interest for understanding the disease mechanisms and as markers of disease progression. Overall, these findings are of importance for the characterization of the molecular pathogenesis of SPMS and support the hypothesis that the CSF metabolome may be used to explore changes that occur in the transition between the RRMS and SPMS pathologies.

  • 33.
    Junaid, Muhammad
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lapins, Maris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Proteochemometric Modeling of the Susceptibility of Mutated Variants of the HIV-1 Virus to Reverse Transcriptase Inhibitors. 2010. In: PLoS ONE, E-ISSN 1932-6203, Vol. 5, no 12, p. e14353-. Article in journal (Refereed)
    Abstract [en]

    Background

    Reverse transcriptase is a major drug target in highly active antiretroviral therapy (HAART) against HIV, which typically comprises two nucleoside/nucleotide analog reverse transcriptase (RT) inhibitors (NRTIs) in combination with a non-nucleoside RT inhibitor or a protease inhibitor. Unfortunately, HIV is capable of escaping the therapy by mutating into drug-resistant variants. Computational models that correlate HIV drug susceptibilities to the virus genotype and to drug molecular properties might facilitate selection of improved combination treatment regimens.

    Methodology/Principal Findings

    We applied our previously developed proteochemometric modeling technology to analyze HIV mutant susceptibility to the eight clinically approved NRTIs. The data set covered 728 virus variants genotyped for 240 sequence residues of the DNA polymerase domain of the RT; 165 of these residues contained mutations. In total, the data set covered susceptibility data for 4,495 inhibitor-RT combinations. Inhibitors and RT sequences were represented numerically by 3D-structural and physicochemical property descriptors, respectively. The two sets of descriptors and their derived cross-terms were correlated to the susceptibility data by partial least-squares projections to latent structures. The model identified more than ten frequently occurring mutations, each conferring more than two-fold loss of susceptibility for one or several NRTIs. The most deleterious mutations were K65R, Q151M, M184V/I, and T215Y/F, each of them decreasing susceptibility to most of the NRTIs. The predictive ability of the model was estimated by cross-validation and by external predictions for new HIV variants; both procedures showed very high correlation between the predicted and actual susceptibility values (Q2 = 0.89 and Q2ext = 0.86). The model is available at www.hivdrc.org as a free web service for the prediction of the susceptibility to any of the clinically used NRTIs for any HIV-1 mutant variant.

    Conclusions/Significance

    Our results give directions on how to develop approaches for selecting genome-based optimum combination therapy for patients harboring mutated HIV variants.

  • 34.
    Kensert, Alexander
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Norinder, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Evaluating parameters for ligand-based modeling with random forest on sparse data sets. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 49. Article in journal (Refereed)
    Abstract [en]

    Ligand-based predictive modeling is widely used to generate predictive models aiding decision making in e.g. drug discovery projects. With growing data sets and requirements on low modeling time comes the necessity to analyze data sets efficiently to support rapid and robust modeling. In this study we analyzed four data sets and studied the efficiency of machine learning methods on sparse data structures, utilizing Morgan fingerprints of different radii and hash sizes, and compared them with the molecular signatures descriptor of different heights. We specifically evaluated the effect these parameters had on modeling time, predictive performance, and memory requirements using two implementations of random forest: Scikit-learn and FEST. We also compared with a support vector machine implementation. Our results showed that unhashed fingerprints yield significantly better accuracy than hashed fingerprints (p <= 0.05), with no pronounced deterioration in modeling time and memory usage. Furthermore, the fast execution and low memory usage of the FEST algorithm suggest that it is a good alternative for large, high-dimensional sparse data. Support vector machines and random forest performed equally well, but the results indicate that the support vector machine was better at using the extra information from larger values of the Morgan fingerprint's radius.

  • 35.
    Kensert, Alexander
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Harrison, Philip J
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Transfer Learning with Deep Convolutional Neural Networks for Classifying Cellular Morphological Changes. 2019. In: SLAS discovery : advancing life sciences R & D, ISSN 2472-5552, Vol. 24, no 4, p. 466-475. Article in journal (Refereed)
    Abstract [en]

    The quantification and identification of cellular phenotypes from high-content microscopy images has proven to be very useful for understanding biological activity in response to different drug treatments. The traditional approach has been to use classical image analysis to quantify changes in cell morphology, which requires several nontrivial and independent analysis steps. Recently, convolutional neural networks have emerged as a compelling alternative, offering good predictive performance and the possibility to replace traditional workflows with a single network architecture. In this study, we applied the pretrained deep convolutional neural networks ResNet50, InceptionV3, and InceptionResnetV2 to predict cell mechanisms of action in response to chemical perturbations for two cell profiling datasets from the Broad Bioimage Benchmark Collection. These networks were pretrained on ImageNet, enabling much quicker model training. We obtain higher predictive accuracy than previously reported, between 95% and 97%. The ability to quickly and accurately distinguish between different cell morphologies from a scarce amount of labeled data illustrates the combined benefit of transfer learning and deep convolutional neural networks for interrogating cell-based images.

  • 36.
    Kohonen, Pekka
    et al.
    Karolinska Institutet.
    Ceder, Rebecca
    Karolinska Institutet.
    Smit, Ines
    Karolinska Institutet.
    Vesa, Hongisto
    VTT Technical Research Centre of Finland.
    Glenn, Myatt
    Leadscope.
    Barry, Hardy
    Douglas connect.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Grafström, Roland
    Karolinska Institutet.
    Cancer Biology, Toxicology and Alternative Methods Development Go Hand-in-Hand. 2014. In: Basic & Clinical Pharmacology & Toxicology, ISSN 1742-7835, E-ISSN 1742-7843, Vol. 115, no 1, p. 50-58. Article, review/survey (Refereed)
    Abstract [en]

    Toxicological research faces the challenge of integrating knowledge from diverse fields and novel technological developments generally in the biological and medical sciences. We discuss herein the fact that the multiple facets of cancer research, including discovery related to mechanisms, treatment and diagnosis, overlap many up and coming interest areas in toxicology, including the need for improved methods and analysis tools. Common to both disciplines, in vitro and in silico methods serve as alternative investigation routes to animal studies. Knowledge on cancer development helps in understanding the relevance of chemical toxicity studies in cell models, and many bioinformatics-based cancer biomarker discovery tools are also applicable to computational toxicology. Robotics-aided cell-based high throughput screening, microscale immunostaining techniques, and gene expression profiling analyses are common tools in cancer research, and when sequentially combined, form a tiered approach to structured safety evaluation of thousands of environmental agents, novel chemicals or engineered nanomaterials. Comprehensive tumour data collections in databases have been translated into clinically useful data, and this concept serves as template for computer-driven evaluation of toxicity data into meaningful results. Future “cancer research-inspired knowledge management” of toxicological data will aid the translation of basic discovery results and chemicals- and materials-testing data to information relevant to human health and environmental safety.

  • 37.
    Laeeq, Ahmed
    et al.
    Royal Institute of Technology.
    Edlund, Åke
    Royal Institute of Technology.
    Laure, Erwin
    Royal Institute of Technology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Using Iterative MapReduce for Parallel Virtual Screening. In: Journal of medical and bioengineering, ISSN 2301-3796. Article in journal (Refereed)
    Abstract [en]

    MapReduce and its different implementations have been successfully used on commodity clusters for data analysis in problems where the datasets become very large. Virtual screening is a technique in chemoinformatics used for drug discovery by searching large libraries of molecule structures, making it a great candidate for MapReduce. However, in this study we used SVM-based virtual screening, which is resource demanding. Such virtual screening not only involves huge datasets but is also computationally expensive, with a complexity that can grow at least up to n^2. Most SVM-based applications use MPI, but MPI has its own limitations, such as lack of fault tolerance and low productivity. This study shows that MapReduce can be used effectively for implementing SVM-based virtual screening. The results illustrate that MapReduce performs quite well as the number of nodes in the cluster increases. For the experiments, we used Spark, an iterative MapReduce programming model. We also provide the program flow and the results to show the efficiency of iterative MapReduce.

  • 38.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Arvidsson Mc Shane, Staffan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Berg, Arvid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Ahlberg, Ernst
    Predictive Compound ADME & Safety, Drug Safety & Metabolism, AstraZeneca IMED Biotech Unit, Mölndal, Sweden.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Predicting off-target binding profiles with confidence using Conformal Prediction. 2018. In: Frontiers in Pharmacology, ISSN 1663-9812, E-ISSN 1663-9812, Vol. 9, article id 1256. Article in journal (Refereed)
    Abstract [en]

    Ligand-based models can be used in drug discovery to obtain an early indication of potential off-target interactions that could be linked to adverse effects. Another application is to combine such models into a panel, making it possible to compare and search for compounds with similar profiles. Most contemporary methods and implementations, however, lack valid measures of confidence in their predictions and provide only point predictions. We here describe the use of conformal prediction for predicting off-target interactions with models trained on data from 31 targets in the ExCAPE dataset, selected for their utility in broad early hazard assessment. Chemicals were represented by the signature molecular descriptor and support vector machines were used as the underlying machine learning method. By using conformal prediction, the results from predictions come in the form of confidence p-values for each class. The full pre-processing and model training process is openly available as scientific workflows on GitHub, rendering it fully reproducible. We illustrate the usefulness of the methodology on a set of compounds extracted from DrugBank. The resulting models are published online and are available via a graphical web interface and an OpenAPI interface for programmatic access.

  • 39.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 67. Article in journal (Refereed)
    Abstract [en]

    Predictive modelling in drug discovery is challenging to automate as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or when using computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can aid in many of these challenges, but the currently available systems are lacking in the functionality needed to enable agile and flexible predictive modelling. We here present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system which we name SciLuigi. We also discuss the experiences from using the approach when modelling a large set of biochemical interactions using a shared computer cluster.

  • 40.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab. Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Stockholm, Sweden.
    Dahlö, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. 2019. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 8, no 5, article id giz044. Article in journal (Refereed)
    Abstract [en]

    Background: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as those with nested loops, dynamic scheduling, and parametrization, which are common in, e.g., machine learning. Findings: SciPipe is a workflow programming library implemented in the programming language Go, for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics pipeline, and a transcriptomics pipeline. Conclusions: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists used to programming or scripting.

  • 41.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Dahlö, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    SciPipe-Turning Scientific Workflows into Computer Programs. 2019. In: Computing in science & engineering (Print), ISSN 1521-9615, E-ISSN 1558-366X, Vol. 21, no 3, p. 109-113. Article in journal (Refereed)
  • 42.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Dahlö, Martin
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Olason, Pall I
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Hagberg, Jonas
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. 2013. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 2, no 1, p. 1-10. Article in journal (Refereed)
    Abstract [en]

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the needs of data storage and analysis of NGS data in Sweden, serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts as well as a web portal and support ticket system. UPPNEX applications are numerous and diverse, and include whole genome-, de novo- and exome sequencing, targeted resequencing, SNP discovery, RNASeq, and methylation analysis. Over 300 projects utilize UPPNEX, including large undertakings such as the sequencing of the flycatcher and Norwegian spruce. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the successful and less successful decisions made.

  • 43.
    Lampa, Samuel
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Hagberg, Jonas
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    UPPNEX: A solution for next generation sequencing data management and analysis. 2012. In: EMBnet.journal, ISSN 2226-6089, Vol. 17, no Suppl. B, p. 44-44. Article in journal (Other academic)
  • 44.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Willighagen, Egon
    Maastricht Univ, Dept Bioinformat BiGCaT, NUTRIM, POB 616,UNS50 Box 19, NL-6200 MD Maastricht, Netherlands.
    Kohonen, Pekka
    Karolinska Inst, Inst Environm Med, SE-17177 Stockholm, Sweden.; Misvik Biol Oy, Div Toxicol, Turku, Finland..
    King, Ali
    FanDuel Inc, Edinburgh, Midlothian, Scotland.
    Vrandečić, Denny
    Google Inc, 345 Spear St, San Francisco, CA USA.
    Grafström, Roland
    Karolinska Inst, Inst Environm Med, SE-17177 Stockholm, Sweden.; Misvik Biol Oy, Div Toxicol, Turku, Finland..
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    RDFIO: extending Semantic MediaWiki for interoperable biomedical data management. 2017. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 8, article id 35. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Biological sciences are characterised not only by an increasing amount but also by the extreme complexity of their data. This stresses the need for efficient ways of integrating these data in a coherent description of biological systems. In many cases, biological data need organization before integration. This is often a collaborative effort, and it is thus important that tools for data integration support a collaborative way of working. Wiki systems with support for structured semantic data authoring, such as Semantic MediaWiki, provide a powerful solution for collaborative editing of data combined with machine-readability, so that data can be handled in an automated fashion in any downstream analyses. Semantic MediaWiki lacks a built-in data import function, however, which hinders efficient round-tripping of data between interoperable Semantic Web formats such as RDF and the internal wiki format.

    RESULTS: To address this deficiency, the RDFIO suite of tools is presented, which supports importing of RDF data into Semantic MediaWiki, with the metadata needed to export it again in the same RDF format, or ontology. Additionally, the new functionality enables mash-ups of automated data imports combined with manually created data presentations. The application of the suite of tools is demonstrated by importing drug discovery related data about rare diseases from Orphanet and acid dissociation constants from Wikidata. The RDFIO suite of tools is freely available for download via pharmb.io/project/rdfio.

    CONCLUSIONS: Through a set of biomedical demonstrators, it is demonstrated how the new functionality enables a number of usage scenarios where the interoperability of SMW and the wider Semantic Web is leveraged for biomedical data sets, to create an easy to use and flexible platform for exploring and working with biomedical data.

  • 45.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Arvidsson, Staffan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Berg, Arvid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    A confidence predictor for logD using conformal regression and a support-vector machine. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, no 1, article id 17. Article in journal (Refereed)
    Abstract [en]

    Lipophilicity is a major determinant of ADMET properties and overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) for chemical compounds, aiding drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, models are created and evaluated by a support-vector machine with a linear kernel using conformal prediction methodology, outputting prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text], with the best performing nonconformity measure having a median prediction interval of [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service via an OpenAPI interface and a web page with a molecular editor, and we also publish predictive values at 90% confidence level for 91 M PubChem structures in RDF format for download and as a URI resolver service.

  • 46.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Prusis, Peteris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Proteochemometric modeling of HIV protease susceptibility. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 181-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND

    A major obstacle in treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations.

    RESULTS

    The model provided excellent predictability (R2 = 0.92, Q2 = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each one of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q2_inhibitors = 0.72.

    CONCLUSION

    Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use, and is located at HIV Drug Research Centre.
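
    The external validation described in the results, in which one inhibitor at a time is omitted and predicted from models built on the rest, can be sketched as a leave-one-group-out loop. This is a minimal sketch under stated assumptions: a single hypothetical numeric descriptor, a least-squares line in place of the actual proteochemometric model, toy records, and Q2 taken as 1 - PRESS / SS_total around the overall mean. The paper's descriptors, regression method and exact Q2 definition are not reproduced here.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (stand-in model, by assumption)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def leave_one_group_out_q2(records):
    """records: list of (group, x, y) tuples.

    Each group is held out in turn, a model is fitted on the remaining
    groups, and squared errors on the held-out group are accumulated.
    """
    groups = sorted({g for g, _, _ in records})
    press = 0.0  # predictive residual sum of squares
    for held_out in groups:
        train = [(x, y) for g, x, y in records if g != held_out]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        for g, x, y in records:
            if g == held_out:
                press += (y - (a * x + b)) ** 2
    mean_y = sum(y for _, _, y in records) / len(records)
    ss_total = sum((y - mean_y) ** 2 for _, _, y in records)
    return 1.0 - press / ss_total


# Toy records (illustrative only): three hypothetical inhibitors A, B, C.
toy = [
    ("A", 0.0, 1.2), ("A", 1.0, 2.9), ("A", 2.0, 5.1),
    ("B", 3.0, 7.0), ("B", 4.0, 8.8),
    ("C", 5.0, 11.1), ("C", 6.0, 13.0),
]
print(f"leave-one-inhibitor-out Q2: {leave_one_group_out_q2(toy):.3f}")
```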

  • 47.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Worachartcheewan, Apilak
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Georgiev, Valentin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Prachayasittikul, Virapong
    Nantasenamat, Chanin
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    A Unified Proteochemometric Model for Prediction of Inhibition of Cytochrome P450 Isoforms. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no 6, p. e66566. Article in journal (Refereed)
    Abstract [en]

    A unified proteochemometric (PCM) model for the prediction of the ability of drug-like chemicals to inhibit five major drug-metabolizing CYP isoforms (i.e. CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) was created and made publicly available under the Bioclipse Decision Support open source system at www.cyp450model.org. For the proteochemometric modeling, we represented the chemical compounds by molecular signature descriptors and the CYP isoforms by an alignment-independent description of the composition and transition of amino acid properties of their protein primary sequences. The entire training dataset contained 63 391 interactions, and the best PCM model was obtained using signature descriptors of height 1, 2 and 3 and inducing the model with a support vector machine. The model showed excellent predictive ability, with an internal AUC = 0.923 and an external AUC = 0.940, as evaluated on a large external dataset. An advantage of PCM models is their extensibility, which makes it possible to extend our model to new CYP isoforms and polymorphic CYP forms. A key benefit of PCM is that all proteins are confined in one single model, which makes it generally more stable and predictive than single-target models. The inclusion of the model in Bioclipse Decision Support makes it possible to make virtually instantaneous predictions (∼100 ms per prediction) while interactively drawing or modifying chemical structures in the Bioclipse chemical structure editor.

  • 48.
    Novella, Jon Ander
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Emami Khoonsari, Payam
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Herman, Stephanie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Whitenack, Daniel
    Capuccini, Marco
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Burman, Joachim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Neurology.
    Kultima, Kim
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Chemistry.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Container-based bioinformatics with Pachyderm. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, p. 839-846. Article in journal (Refereed)
  • 49.
    O'Boyle, Noel
    et al.
    University College Cork.
    Guha, Rajarshi
    NIH Center for Translational Therapeutics.
    Willighagen, Egon
    Karolinska Institutet.
    Adams, Samuel
    University of Cambridge.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Bradley, Jean-Claude
    Drexel University.
    Filippov, Igor
    NCI-Frederick.
    Hansson, Robert
    St. Olaf College.
    Hanwell, Marcus
    Kitware, Inc.
    Hutchison, Geoffrey
    University of Pittsburgh.
    James, Craig
    eMolecules Inc.
    Jeliazkova, Nina
    Ideaconsult Ltd.
    Lang, Andrew
    Oral Roberts University.
    Langner, Karol
    Leiden University.
    Lonie, David
    State University of New York at Buffalo.
    Lowe, Daniel
    University of Cambridge.
    Pansanel, Jerome
    Université de Strasbourg.
    Pavlov, Dmitry
    GGA Software Service.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Steinbeck, Christoph
    European Bioinformatics Institute.
    Tenderholt, Adam
    University of Washington.
    Thiesen, Kevin
    Chemlabs.
    Murray-Rust, Peter
    University of Cambridge.
    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on. 2011. In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 3, p. 37. Article in journal (Refereed)
    Abstract [en]

    Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.

    Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.

    Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.

  • 50.
    Oki, Noffisat
    et al.
    Douglas Connect GmbH, Basel, Switzerland.
    Exner, Thomas
    Douglas Connect GmbH, Basel, Switzerland.
    Kramer, Stefan
    Johannes Gutenberg Univ Mainz, Mainz, Germany.
    Notredame, Cedric
    Fundacio Ctr Regulacio Genom, Barcelona, Spain.
    Jennen, Danyel
    Univ Maastricht, Maastricht, Netherlands.
    Gkoutos, Georgios
    Univ Birmingham, Birmingham, W Midlands, England.
    Sarimveis, Haralambos
    Natl Tech Univ Athens, Athens, Greece.
    Jacobs, Marc
    Fraunhofer Gesell Foerderung Angewandten Forsch E, Munich, Germany.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Dudgeon, Tim
    Informat Matters Ltd, Kidlington, England.
    Bois, Frederic
    Inst Natl Environm & Risques, Verneuil En Halatte, France.
    Jennings, Paul
    Vrije Univ Amsterdam, Amsterdam, Netherlands.
    Hardy, Barry
    Douglas Connect GmbH, Basel, Switzerland.
    OpenRiskNet, an open e-infrastructure to support data sharing, knowledge integration, in silico analysis and modelling in risk assessment. 2018. In: Abstract of Papers of the American Chemical Society, ISSN 0065-7727, Vol. 255. Article in journal (Other academic)