Uppsala University Publications
  • 1.
    Ahlberg, Ernst
    et al.
    AstraZeneca Innovative Medicines & Early Development, Drug Safety & Metabolism, Mölndal, Sweden.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Hasselgren, Catrin
    University of New Mexico, Internal Medicine, Albuquerque, NM 87131, USA.
    Carlsson, Lars
    AstraZeneca Innovative Medicines & Early Development, Drug Safety & Metabolism, Mölndal, Sweden.
    Interpretation of Conformal Prediction Classification Models (2015). In: Statistical Learning and Data Sciences, 2015, 323-334 p. Conference paper (Refereed)
    Abstract [en]

    We present a method for interpretation of conformal prediction models. The discrete gradient of the largest p-value is calculated with respect to object space. A criterion is applied to identify the most important component of the gradient and the corresponding part of the object is visualized. The method is exemplified with data from drug discovery relating chemical compounds to mutagenicity. Furthermore, a comparison is made to already established important subgraphs with respect to mutagenicity and this initial assessment shows very useful results with respect to interpretation of a conformal predictor.
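
    To make the gradient idea concrete, here is a minimal sketch under stated assumptions: objects are binary fingerprints, the nonconformity measure is a toy nearest-neighbour score, and the criterion for the most important component is simply the largest absolute change in the largest p-value when a single bit is flipped. None of these choices is taken from the paper, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (binary fingerprint, class label)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two binary fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def nonconformity(x: Sequence[int], label: int, train: List[Example]) -> float:
    """Toy nonconformity score (an assumption, not the paper's measure):
    distance to the nearest training example of `label` minus the distance
    to the nearest training example of any other label. Assumes every
    label, and at least one other label, occurs in `train`."""
    same = min(hamming(x, t) for t, y in train if y == label)
    other = min(hamming(x, t) for t, y in train if y != label)
    return same - other


def p_values(x: Sequence[int], train: List[Example],
             calib: List[Example]) -> Dict[int, float]:
    """Inductive conformal p-value for every candidate label of `x`."""
    calib_scores = [nonconformity(c, y, train) for c, y in calib]
    result = {}
    for label in sorted({y for _, y in train}):
        alpha = nonconformity(x, label, train)
        at_least = sum(s >= alpha for s in calib_scores)
        result[label] = (at_least + 1) / (len(calib_scores) + 1)
    return result


def most_important_component(x: Sequence[int], train: List[Example],
                             calib: List[Example]) -> Tuple[int, float]:
    """Discrete gradient: flip each bit of `x`, recompute the p-value of the
    label that had the largest p-value, and return (index, change) for the
    bit whose flip changes it the most (first index wins on ties)."""
    base = p_values(x, train, calib)
    label = max(base, key=base.get)  # label with the largest p-value
    gradient = []
    for j in range(len(x)):
        flipped = list(x)
        flipped[j] = 1 - flipped[j]
        gradient.append(p_values(flipped, train, calib)[label] - base[label])
    j = max(range(len(x)), key=lambda k: abs(gradient[k]))
    return j, gradient[j]


if __name__ == "__main__":
    train = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 0, 1), 0), ((0, 1, 1), 0)]
    calib = [((1, 0, 0), 1), ((0, 1, 1), 0), ((1, 0, 1), 1), ((0, 0, 1), 1)]
    print(p_values((1, 1, 0), train, calib))
    print(most_important_component((1, 1, 0), train, calib))
```

    In a chemical setting the returned index would map back to a substructure that can be highlighted; that mapping is outside this sketch.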

  • 2.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Clinical Pharmacology.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Brunn: an open source laboratory information system for microplates with a graphical plate layout design process (2011). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 12, no 1, 179. Article in journal (Refereed)
    Abstract [en]

    Background:

    Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling these data are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight open system for handling plate design and for validating and preparing data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
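
    The abstract does not say how Brunn computes IC50 values. As a minimal sketch, assuming measurements given as (concentration, % activity remaining) pairs with positive concentrations, an IC50 can be estimated by linear interpolation in log concentration between the two measurements that bracket 50 %. The function name and input format are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def ic50_by_interpolation(points: List[Tuple[float, float]]) -> Optional[float]:
    """Estimate IC50 from (concentration, % activity remaining) pairs.

    Assumed method: linear interpolation in log10(concentration) between
    the first pair of adjacent measurements that brackets 50 %.
    Concentrations must be positive. Returns None if 50 % is never crossed.
    """
    pts = sorted(points)  # ascending concentration
    for (c1, a1), (c2, a2) in zip(pts, pts[1:]):
        if a1 != a2 and (a1 - 50.0) * (a2 - 50.0) <= 0:
            x1, x2 = math.log10(c1), math.log10(c2)
            x = x1 + (50.0 - a1) * (x2 - x1) / (a2 - a1)
            return 10 ** x
    return None
```

    Fitting a four-parameter logistic curve is the more usual choice for dose-response data; interpolation is shown only because it needs no curve-fitting library.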

  • 3.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Carlsson, Lars
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines (2014). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, 3211-3217 p. Article in journal (Refereed)
    Abstract [en]

    QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
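
    The parameter roles described above can be written down directly. This sketch shows the RBF kernel, in which γ scales the squared distance between two descriptor vectors, and enumerates a search grid within the recommended ranges. The specific grid points (powers of ten) and the dictionary layout are assumptions; the abstract only gives the ranges.

```python
import itertools
import math
from typing import Dict, List, Sequence, Union

Params = Dict[str, Union[float, tuple]]


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - z||^2): a larger gamma makes the kernel
    narrower, so similarity decays faster with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def recommended_grid(version: str) -> List[Params]:
    """Candidate settings within the recommended ranges.

    `heights` is the (lowest, highest) signature height, inclusive.
    The grid points for C and gamma (powers of ten) are an assumption.
    """
    heights = {"count": (0, 2), "bit": (0, 3)}[version]
    c_values = [1.0, 10.0, 100.0]
    gamma_values = [0.001, 0.01, 0.1]
    return [{"heights": heights, "C": c, "gamma": g}
            for c, g in itertools.product(c_values, gamma_values)]
```

    Each of the nine combinations would then be scored by cross-validation; the abstract's point is that searching this small grid is usually enough.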

  • 4.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Engkvist, Ola
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Noeske, Tobias
    Ligand-Based Target Prediction with Signature Fingerprints (2014). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, 2647-2653 p. Article in journal (Refereed)
    Abstract [en]

    When evaluating a potential drug candidate, it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at the known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct chemical fingerprints based on the molecular signature descriptor and to evaluate their performance for target binding prediction. For the comparison we used the area under the receiver operating characteristic curve (AUC), complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit version and a count version, and evaluated their performance against a set of established fingerprints with regard to predicting binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version slightly outperformed the bit version; however, the count version is more complex and takes more computing time and memory to run, so its usage should probably be evaluated on a case-by-case basis. The NRI-based tests complemented the AUC-based ones and showed signs of higher power.
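
    A minimal sketch of Tanimoto-based similarity searching for target prediction, assuming fingerprints given as sets of feature identifiers (bit version) or as feature counts (count version). The count form used here, the sum of minima over the sum of maxima, is one common generalization of the Tanimoto coefficient and is an assumption; the abstract does not give the paper's exact formula. The feature names, targets, and the nearest-neighbour rule in `predict_targets` are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def tanimoto_bits(a: Set[str], b: Set[str]) -> float:
    """Tanimoto coefficient of two bit fingerprints given as sets of set features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Count-based Tanimoto (assumed form): sum of minima / sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)
    den = sum(max(a[k], b[k]) for k in keys)
    return num / den if den else 0.0


def predict_targets(query: Counter,
                    reference: List[Tuple[Counter, Set[str]]],
                    k: int = 1) -> Set[str]:
    """Hypothetical rule: predict the union of the known targets of the k
    reference compounds most similar to the query."""
    ranked = sorted(reference, key=lambda ref: tanimoto_counts(query, ref[0]),
                    reverse=True)
    targets: Set[str] = set()
    for _, known in ranked[:k]:
        targets |= known
    return targets
```

    The count version keeps how often each feature occurs, which is also why it needs more memory than a set of on-bits.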

  • 5.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale ligand-based predictive modelling using support vector machines (2016). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 39. Article in journal (Refereed)
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

  • 6.
    Ameur, Adam
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Yankovski, Vladimir
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    The LCB Data Warehouse (2006). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 22, no 8, 1024-1026 p. Article in journal (Refereed)
    Abstract [en]

    The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.

  • 7.
    Arvidsson, Staffan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Carlsson, Lars
    AstraZeneca R&D.
    Toccaceli, Paolo
    Royal Holloway University of London.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors (2017). In: Conformal and Probabilistic Prediction with Applications (COPA) 2017 / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos, 2017, Vol. 60, 118-131 p. Conference paper (Refereed)
    Abstract [en]

    Prediction of drug metabolism is an important topic in the drug discovery process, and here we present a study applying probabilistic predictions from Cross Venn-ABERS Predictors (CVAPs) to site-of-metabolism data. We used a dataset of 73,599 biotransformations, applied SMIRKS to define biotransformations of interest, and constructed five datasets in which chemical structures were represented using signature descriptors. The results show that CVAPs produce well-calibrated predictions with good predictive capability for all datasets, making CVAP an interesting method for further exploration in drug discovery applications.
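
    The core of a Venn-ABERS predictor can be sketched as follows. An inductive Venn-ABERS predictor fits isotonic regression to calibration scores twice, once with the test object provisionally labelled 0 and once labelled 1, giving a pair (p0, p1); a cross Venn-ABERS predictor repeats this over K folds and merges the pairs. The merge rule used here, G1 / (G0 + G1) with G1 the geometric mean of the p1 values and G0 the geometric mean of the (1 - p0) values, is one rule from the Venn-ABERS literature and is assumed rather than taken from the paper. The underlying scoring model is not shown; its per-fold scores are taken as input, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _isotonic_value_at(scores: Sequence[float], labels: Sequence[int],
                       s: float) -> float:
    """Isotonic (non-decreasing) regression of labels on scores via the
    pool-adjacent-violators algorithm; returns the fitted value at score s,
    which must be one of `scores`."""
    # Pool tied scores first so that equal scores share one fitted value.
    xs: List[float] = []
    sums: List[float] = []
    weights: List[float] = []
    for x, y in sorted(zip(scores, labels)):
        if xs and xs[-1] == x:
            sums[-1] += y
            weights[-1] += 1.0
        else:
            xs.append(x)
            sums.append(float(y))
            weights.append(1.0)
    # Each block is [sum, weight, first_index, last_index].
    blocks: list = []
    for i, (sm, w) in enumerate(zip(sums, weights)):
        blocks.append([sm, w, i, i])
        # Merge backwards while the block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            sm2, w2, _, last = blocks.pop()
            blocks[-1][0] += sm2
            blocks[-1][1] += w2
            blocks[-1][3] = last
    idx = xs.index(s)
    for sm, w, first, last in blocks:
        if first <= idx <= last:
            return sm / w
    raise ValueError("score not found")


def ivap(cal_scores: Sequence[float], cal_labels: Sequence[int],
         s: float) -> Tuple[float, float]:
    """Inductive Venn-ABERS pair (p0, p1) for a test object with score s."""
    xs = list(cal_scores) + [s]
    p0 = _isotonic_value_at(xs, list(cal_labels) + [0], s)
    p1 = _isotonic_value_at(xs, list(cal_labels) + [1], s)
    return p0, p1


def cvap_merge(pairs: Sequence[Tuple[float, float]]) -> float:
    """Merge per-fold (p0, p1) pairs into one probability (assumed rule)."""
    k = len(pairs)
    g1 = math.exp(sum(math.log(p1) for _, p1 in pairs) / k)
    g0 = math.exp(sum(math.log(1.0 - p0) for p0, _ in pairs) / k)
    return g1 / (g0 + g1)


def cross_vap(folds: Sequence[Tuple[Sequence[float], Sequence[int], float]]) -> float:
    """Each fold supplies (calibration scores, calibration labels, test score)."""
    return cvap_merge([ivap(sc, lb, s) for sc, lb, s in folds])
```

    The logarithms are safe: the block containing the test score always includes the provisional label, so p1 is above 0 and p0 is below 1. The width of each (p0, p1) pair indicates how uncertain the calibration is for that object.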

  • 8.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Ahmed, Laeeq
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Laure, Erwin
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale virtual screening on public cloud resources with Apache Spark (2017). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, 15. Article in journal (Refereed)
  • 9.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Carlsson, Lars
    Norinder, Ulf
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Conformal prediction in Spark: Large-scale machine learning with confidence (2015). In: Proc. 2nd International Symposium on Big Data Computing, Los Alamitos, CA: IEEE Computer Society, 2015, 61-67 p. Conference paper (Refereed)
  • 10.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse (2010). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 362. Article in journal (Refereed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid rapid compound optimisation. No interactive tool exists, and most of the useful tools are quite expensive.

    Results: Here we present a fast and reliable method for analysing ligands and visualising potential metabolic sites, based on annotated metabolic data described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced cheminformatics features.

    Conclusions: Because predictions are fast (less than 50 ms per molecule), scientists can get real-time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require a network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

  • 11.
    Carlsson, Lars
    et al.
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Boyer, Scott
    AstraZeneca R&D.
    Model building in Bioclipse Decision Support applied to open datasets (2012). In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, Suppl., S62. Article in journal (Refereed)
    Abstract [en]

    Bioclipse Decision Support (DS) is a system capable of building predictive models from any collection of SAR data and making them available in a simple user interface based on Bioclipse (www.bioclipse.net).

    The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results.

    Here we present the results of applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from these sources and build models that can be integrated with user-specific models.

  • 12.
    Claesson, Alf
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    On Mechanisms of Reactive Metabolite Formation from Drugs (2013). In: Mini-Reviews in Medicinal Chemistry, ISSN 1389-5575, Vol. 13, no 5, 720-729 p. Article in journal (Refereed)
    Abstract [en]

    Idiosyncratic adverse drug reactions (IADRs) cause a broad range of clinically severe conditions, of which drug-induced liver injury (DILI) in particular is one of the most frequent causes of safety-related drug withdrawals. The underlying cause is almost invariably the formation of reactive metabolites (RMs), which induce organ injury by attacking macromolecules. The pharmaceutical industry is attempting to lower the risk of selecting unfit compounds as clinical candidates. Approaches vary but do not seem to be overly successful at the initial design/synthesis stage. Here we review the most frequent categories of mechanisms for RM formation and propose that many cases of RMs encountered in early ADME screening can be foreseen by applying chemical and metabolic knowledge. We also mention a web tool, SpotRM, which can be used for efficient look-up and learning about drugs that have recognized IADRs likely caused by RM formation.

  • 13.
    Dahlö, Martin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Haziza, Frédéric
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Kallio, Aleksi
    Korpelainen, Eija
    Bongcam-Rudloff, Erik
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    BioImg.org: A catalog of virtual machine images for the life sciences (2015). In: Bioinformatics and Biology Insights, ISSN 1177-9322, E-ISSN 1177-9322, Vol. 9, 125-128 p. Article in journal (Refereed)
    Abstract [en]

    Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education.

  • 14.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    An eScience-Bayes strategy for analyzing omics data (2010). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 282. Article in journal (Refereed)
    Abstract [en]

    Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems.

    Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data.

    Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

  • 15.
    Eklund, Martin
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    The C1C2: a framework for simultaneous model selection and assessment (2008). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 9, 360. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on partitioning the data so that model choice and model assessment use separate data. Since the number of conceivable models is generally vast, we also investigated the use of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model.

    RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice, but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, the generalization error estimates were less accurate.

    CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. Completely separating the data used for model choice from the data used for model assessment improves the estimates of the generalization error.
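
    The abstract does not specify the C1C2 partitioning scheme, so the following is a generic nested cross-validation sketch, not the C1C2 itself. It only illustrates the principle that the data used to choose a model are kept apart from the data used to assess it. The model (a one-variable, no-intercept ridge regression whose penalty is the only choice to be made), the deterministic fold split, and all function names are assumptions chosen for brevity.

```python
from typing import List, Sequence, Tuple

Data = List[Tuple[float, float]]  # (x, y) observations


def fit_ridge_1d(data: Data, lam: float) -> float:
    """Slope of a no-intercept ridge regression y ~ beta * x."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


def mse(data: Data, beta: float) -> float:
    """Mean squared prediction error of slope `beta` on `data`."""
    return sum((y - beta * x) ** 2 for x, y in data) / len(data)


def folds(data: Data, k: int) -> List[Tuple[Data, Data]]:
    """Deterministic k-fold split into (training part, held-out part).
    Assumes len(data) >= k so that no held-out part is empty."""
    parts = [data[i::k] for i in range(k)]
    return [([p for j, part in enumerate(parts) if j != i for p in part], parts[i])
            for i in range(k)]


def choose_lambda(data: Data, grid: Sequence[float], k: int) -> float:
    """Model choice using only `data` (inner cross-validation)."""
    def cv_error(lam: float) -> float:
        return sum(mse(test, fit_ridge_1d(train, lam)) for train, test in folds(data, k)) / k
    return min(grid, key=cv_error)


def nested_estimate(data: Data, grid: Sequence[float],
                    k_outer: int = 5, k_inner: int = 4) -> Tuple[float, List[float]]:
    """Outer folds assess; each penalty is chosen without seeing its outer test fold."""
    errors, chosen = [], []
    for train, test in folds(data, k_outer):
        lam = choose_lambda(train, grid, k_inner)
        chosen.append(lam)
        errors.append(mse(test, fit_ridge_1d(train, lam)))
    return sum(errors) / len(errors), chosen
```

    Reusing the same folds both to pick the penalty and to report its error would bias the estimate downward, which is the problem the separation addresses.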

  • 16.
    Gholami, Ali
    et al.
    Royal Institute of Technology.
    Laure, Erwin
    Royal Institute of Technology.
    Somogyi, Peter
    Karolinska Institutet.
    Spjuth, Ola
    Swedish e-Science Research Center and Department of Medical Epidemiology and Biostatistics, Karolinska Institute.
    Niazi, Salman
    Swedish Institute of Computer Science.
    Dowling, Jim
    Swedish Institute of Computer Science.
    Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers (2015). In: Journal of Medical and Bioengineering, ISSN 2301-3796, Vol. 4, no 2, 117-125 p. Article in journal (Refereed)
    Abstract [en]

    Medical organizations collect, store and process vast amounts of sensitive information about patients. Easy access to this information by researchers is crucial to improving medical research, but in many institutions, cumbersome security measures and walled gardens have created a situation where even information about what medical data exists is unavailable. One of the main security challenges in this area is enabling researchers to cross-link different medical studies while preserving the privacy of the patients involved. In this paper, we introduce a privacy-preserving system for publishing sample availability data that allows researchers to make queries that cut across different studies. That is, researchers can ask questions such as how many patients have had both diabetes and prostate cancer, where the diabetes and prostate cancer information originates from different clinical registries. We realize our solution with a two-level anonymization mechanism, in which our toolkit for publishing availability data first pseudonymizes personal identifiers and then anonymizes sensitive attributes. Our toolkit also includes a web-based server that stores the encrypted pseudonymized sample data and allows researchers to execute cross-linked queries across data from different studies. We believe that our toolkit is a first step toward supporting the privacy-preserving publication of data containing personal identifiers.
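
    A minimal sketch of the two-level idea, with every specific choice assumed rather than taken from the paper: HMAC-SHA256 with a key shared by the registries as the pseudonymization step, ten-year age bands as the anonymization step, and set intersection on pseudonyms for a cross-registry count query. Server-side encryption and the paper's actual anonymization scheme are not shown; the identifiers and field names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def pseudonymize(personal_id: str, key: bytes) -> str:
    """Level 1 (assumed scheme): keyed hash of the personal identifier.
    Registries must share `key` for their pseudonyms to match."""
    return hmac.new(key, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age: int, width: int = 10) -> str:
    """Level 2 (assumed scheme): publish an age band instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def publish(rows: Iterable[Tuple[str, int]], key: bytes) -> List[Record]:
    """Turn (personal_id, age) rows into publishable availability records."""
    return [{"pid": pseudonymize(pid, key), "age_band": generalize_age(age)}
            for pid, age in rows]


def count_in_both(registry_a: List[Record], registry_b: List[Record]) -> int:
    """Cross-linked query: number of distinct pseudonyms present in both."""
    return len({r["pid"] for r in registry_a} & {r["pid"] for r in registry_b})


if __name__ == "__main__":
    key = b"shared-secret"
    diabetes = publish([("person-001", 67), ("person-002", 55)], key)
    prostate = publish([("person-001", 67), ("person-003", 48)], key)
    print(count_in_both(diabetes, prostate))
```

    A keyed hash is used rather than a plain hash because personal identifiers come from a small, guessable space; an unkeyed hash could be reversed by hashing every candidate identifier.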

  • 17.
    Grafström, Roland C
    et al.
    Nymark, Penny
    Hongisto, Vesa
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Ceder, Rebecca
    Willighagen, Egon
    Hardy, Barry
    Kaski, Samuel
    Kohonen, Pekka
    Toward the Replacement of Animal Experiments through the Bioinformatics-driven Analysis of 'Omics' Data from Human Cell Cultures (2015). In: ATLA (Alternatives to Laboratory Animals), ISSN 0261-1929, Vol. 43, no 5, 325-332 p. Article in journal (Refereed)
    Abstract [en]

    This paper outlines the work for which Roland Grafström and Pekka Kohonen were awarded the 2014 Lush Science Prize. The research activities of the Grafström laboratory have, for many years, covered cancer biology studies, as well as the development and application of toxicity-predictive in vitro models to determine chemical safety. Through the integration of in silico analyses of diverse types of genomics data (transcriptomic and proteomic), their efforts have proved to fit well into the recently developed Adverse Outcome Pathway paradigm. Genomics analysis within state-of-the-art cancer biology research and Toxicology in the 21st Century concepts share many technological tools. A key category within the Three Rs paradigm is the Replacement of animals in toxicity testing with alternative methods, such as bioinformatics-driven analyses of data obtained from human cell cultures exposed to diverse toxicants. This work was recently expanded within the pan-European SEURAT-1 project (Safety Evaluation Ultimately Replacing Animal Testing) to replace repeat-dose toxicity testing with data-rich analyses of sophisticated cell culture models. The aims and objectives of the SEURAT project have been to guide the application, analysis, interpretation and storage of 'omics' technology-derived data within the service-oriented sub-project, ToxBank. Particularly addressing the Lush Science Prize focus on the relevance of toxicity pathways, a 'data warehouse' that is under continuous expansion, coupled with the development of novel data storage and management methods for toxicology, serves to address data integration across multiple 'omics' technologies. The prize winners' guiding principles and concepts for modern knowledge management of toxicological data are summarised. The translation of basic discovery results ranged from chemical-testing and material-testing data to information relevant to human health and environmental safety.

  • 18.
    Guha, Rajarshi
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Willighagen, Egon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Collaborative Cheminformatics Applications. 2011. In: Collaborative Computational Technologies for Biomedical Research / [ed] Sean Ekins, Maggie A. Z. Hupcey, Antony J. Williams, Hoboken, N.J.: John Wiley & Sons, 2011. Chapter in book (Other academic)
  • 19.
    Hardy, Barry
    et al.
    DouglasConnect.
    Apic, Gordana
    Cambridge Cell Networks.
    Carthew, Philip
    Unilever.
    Clark, Dominic
    EMBL-EBI.
    Cook, David
    AstraZeneca.
    Dix, Ian
    AstraZeneca.
    Escher, Sylvia
    Fraunhofer Institute for Toxicology & Experimental Medicine.
    Hastings, Janna
    EMBL-EBI.
    Heard, David J
    Novartis.
    Jeliazkova, Nina
    Ideaconsult.
    Judson, Philip
    Lhasa Ltd.
    Matis-Mitchell, Sherri
    AstraZeneca.
    Mitic, Dragana
    Cambridge Cell Networks.
    Myatt, Glenn
    Leadscope.
    Shah, Imran
    US EPA.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Tcheremenskaia, Olga
    Istituto Superiore di Sanita.
    Toldo, Luca
    Merck KGaA.
    Watson, David
    Lhasa Ltd.
    White, Andrew
    Unilever.
    Yang, Chihae
    Altamira.
    Toxicology Ontology Perspectives. 2012. In: ALTEX. Alternatives zu Tierexperimenten, ISSN 0946-7785, Vol. 29, no 2, 139-156 p. Article in journal (Refereed)
    Abstract [en]

    The field of predictive toxicology requires the development of open, public, computable, standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. In this article we review ontology developments based on a set of perspectives showing how ontologies are being used in predictive toxicology initiatives and applications. Perspectives on resources and initiatives reviewed include OpenTox, eTOX, Pistoia Alliance, ToxWiz, Virtual Liver, EU-ADR, BEL, ToxML, and Bioclipse. We also review existing ontology developments in neighboring fields that can contribute to establishing an ontological framework for predictive toxicology. A significant set of resources is already available to provide a foundation for an ontological framework for 21st century mechanistic-based toxicology research. Ontologies such as ToxWiz provide a basis for application to toxicology investigations, whereas other ontologies under development in the biological, chemical, and biomedical communities could be incorporated in an extended future framework. OpenTox has provided a semantic web framework for the implementation of such ontologies into software applications and linked data resources. Bioclipse developers have shown the benefit of interoperability obtained through ontology by being able to link their workbench application with remote OpenTox web services. Although these developments are promising, an increased international coordination of efforts is greatly needed to develop a more unified, standardized, and open toxicology ontology framework.

  • 20.
    Hardy, Barry
    et al.
    DouglasConnect.
    Apic, Gordana
    Cambridge Cell Networks.
    Carthew, Philip
    Unilever.
    Clark, Dominic
    EMBL-EBI.
    Cook, David
    AstraZeneca.
    Dix, Ian
    AstraZeneca.
    Escher, Sylvia
    Fraunhofer Institute for Toxicology & Experimental Medicine.
    Hastings, Janna
    EMBL-EBI.
    Heard, David J
    Novartis.
    Jeliazkova, Nina
    Ideaconsult.
    Judson, Philip
    Lhasa Ltd.
    Matis-Mitchell, Sherri
    AstraZeneca.
    Mitic, Dragana
    Cambridge Cell Networks.
    Myatt, Glenn
    Leadscope.
    Shah, Imran
    US EPA.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Tcheremenskaia, Olga
    Istituto Superiore di Sanita.
    Toldo, Luca
    Merck KGaA.
    Watson, David
    Lhasa Ltd.
    White, Andrew
    Unilever.
    Yang, Chihae
    Altamira.
    Food for thought...: A toxicology ontology roadmap. 2012. In: ALTEX. Alternatives zu Tierexperimenten, ISSN 0946-7785, Vol. 29, no 2, 129-137 p. Article in journal (Refereed)
    Abstract [en]

    Foreign substances can have dramatic and unpredictable adverse effects on human health. In the development of new therapeutic agents, it is essential that the potential adverse effects of all candidates be identified as early as possible. The field of predictive toxicology strives to profile the potential for adverse effects of novel chemical substances before they occur, both with traditional in vivo experimental approaches and, increasingly, through the development of in vitro and computational methods that can supplement and reduce the need for animal testing. To be maximally effective, the field needs access to the largest possible knowledge base of previous toxicology findings, and such results need to be made available in a form that is interoperable, comparable, and compatible with standard toolkits. This necessitates the development of open, public, computable, and standardized toxicology vocabularies and ontologies to support the applications required by in silico, in vitro, and in vivo toxicology methods and related analysis and reporting activities. Such ontology development will support data management, model building, integrated analysis, validation and reporting, including regulatory reporting and alternative testing submissions as required by legislation such as REACH, leading to new scientific advances in mechanistically based predictive toxicology. Numerous existing ontology and standards initiatives can contribute to the creation of a toxicology ontology supporting the needs of predictive toxicology and risk assessment. Additionally, new ontologies are needed to satisfy practical use cases and scenarios where gaps currently exist. Developing and integrating these resources will require a well-coordinated and sustained effort across numerous stakeholders engaged in a public-private partnership.

    In this communication, we set out a roadmap for the development of an integrated toxicology ontology, harnessing existing resources where applicable. We describe the stakeholders’ requirements analysis from the academic and industry perspectives, the timelines, and the expected benefits of this initiative, with a view to engagement with the wider community.

  • 21.
    Herman, Stephanie
    et al.
    Emami Khoonsari, Payam
    Aftab, Obaid
    Krishnan, Shibu
    Strömbom, Emil
    Larsson, Rolf
    Hammerling, Ulf
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kultima, Kim
    Gustafsson, Mats
    Mass spectrometry based metabolomics for in vitro systems pharmacology: pitfalls, challenges, and computational solutions. 2017. In: Metabolomics, ISSN 1573-3882, E-ISSN 1573-3890, Vol. 13, no 7, 79. Article in journal (Refereed)
    Abstract [en]

    INTRODUCTION: Mass spectrometry-based metabolomics has become a promising complement and alternative to transcriptomics and proteomics in many fields, including in vitro systems pharmacology. Despite several merits, metabolomics based on liquid chromatography mass spectrometry (LC-MS) is still a developing area that is associated with several pitfalls and challenges. To reach a high level of reliability and robustness, these issues need to be tackled through the implementation of refined experimental and computational protocols.

    OBJECTIVES: This study illustrates some key pitfalls in LC-MS-based metabolomics and introduces an automated computational procedure to compensate for them.

    METHOD: Non-cancerous mammary gland-derived cells were exposed to 27 chemicals from four pharmacological classes plus a set of six pesticides. Changes in the metabolome of cell lysates were assessed after 24 h using LC-MS. A data processing pipeline was established and evaluated to handle issues including contaminants, carry-over effects, intensity decay, and inherent methodological variability and biases. A key component of this pipeline is a latent variable method called OOS-DA (optimal orthonormal system for discriminant analysis), which in this context is theoretically better motivated than PLS-DA because it is rooted in pattern classification rather than regression modeling.

    RESULT: The pipeline is shown to reduce experimental variability and biases and is used to confirm that LC-MS spectra hold drug-class-specific information.

    CONCLUSION: LC-MS-based metabolomics is a promising methodology, but it comes with pitfalls and challenges. Key difficulties can be largely overcome by means of a computational procedure of the kind introduced and demonstrated here. The pipeline is freely available on www.github.com/stephanieherman/MS-data-processing.

  • 22.
    Junaid, Muhammad
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lapins, Maris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Proteochemometric Modeling of the Susceptibility of Mutated Variants of the HIV-1 Virus to Reverse Transcriptase Inhibitors. 2010. In: PLoS ONE, E-ISSN 1932-6203, Vol. 5, no 12, e14353. Article in journal (Refereed)
    Abstract [en]

    Background

    Reverse transcriptase is a major drug target in highly active antiretroviral therapy (HAART) against HIV, which typically comprises two nucleoside/nucleotide analog reverse transcriptase (RT) inhibitors (NRTIs) in combination with a non-nucleoside RT inhibitor or a protease inhibitor. Unfortunately, HIV is capable of escaping the therapy by mutating into drug-resistant variants. Computational models that correlate HIV drug susceptibilities with the virus genotype and with drug molecular properties might facilitate the selection of improved combination treatment regimens.

    Methodology/Principal Findings

    We applied our previously developed proteochemometric modeling technology to analyze the susceptibility of HIV mutants to the eight clinically approved NRTIs. The data set covered 728 virus variants genotyped for 240 sequence residues of the DNA polymerase domain of the RT, 165 of which contained mutations; in total, the data set covered susceptibility data for 4,495 inhibitor-RT combinations. Inhibitors and RT sequences were represented numerically by 3D-structural and physicochemical property descriptors, respectively. The two sets of descriptors and their derived cross-terms were correlated to the susceptibility data by partial least-squares projections to latent structures. The model identified more than ten frequently occurring mutations, each conferring a more than two-fold loss of susceptibility to one or several NRTIs. The most deleterious mutations were K65R, Q151M, M184V/I, and T215Y/F, each of which decreased susceptibility to most of the NRTIs. The predictive ability of the model was estimated by cross-validation and by external predictions for new HIV variants; both procedures showed very high correlation between the predicted and actual susceptibility values (Q² = 0.89 and Q²ext = 0.86). The model is available at www.hivdrc.org as a free web service for predicting the susceptibility of any HIV-1 mutant variant to any of the clinically used NRTIs.

    Conclusions/Significance

    Our results provide directions for developing approaches to select genome-based optimum combination therapy for patients harboring mutated HIV variants.
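    Sketch (not from the paper): proteochemometric models describe each inhibitor-RT pair by concatenating the inhibitor descriptors, the protein descriptors and their cross-terms. A common way to form cross-terms is to take the product of every ligand descriptor with every protein descriptor; the minimal Python sketch below assumes exactly that and uses made-up descriptor values. It omits any centering or scaling the authors may apply before forming products, and it does not reproduce the PLS regression step.

```python
from itertools import product
from typing import List, Sequence


def pcm_features(ligand: Sequence[float], protein: Sequence[float]) -> List[float]:
    """Build one PCM row: ligand descriptors, protein descriptors, then all
    ligand x protein cross-terms (pairwise products, ligand-major order).
    Assumption: cross-terms are plain products of unscaled descriptors."""
    cross = [l * p for l, p in product(ligand, protein)]
    return list(ligand) + list(protein) + cross


# Hypothetical toy descriptors: 2 inhibitor descriptors, 3 RT-variant descriptors.
inhibitor = [0.5, -1.0]
rt_variant = [1.0, 0.0, 2.0]
x = pcm_features(inhibitor, rt_variant)
print(len(x))  # 2 + 3 + 2 * 3 = 11
```

    The cross-terms are what let a single linear model express inhibitor-specific effects of particular sequence features, which neither descriptor block can capture on its own.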

  • 23.
    Kohonen, Pekka
    et al.
    Karolinska Institutet.
    Ceder, Rebecca
    Karolinska Institutet.
    Smit, Ines
    Karolinska Institutet.
    Hongisto, Vesa
    VTT Technical Research Centre of Finland.
    Myatt, Glenn
    Leadscope.
    Hardy, Barry
    Douglas Connect.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Grafström, Roland
    Karolinska Institutet.
    Cancer Biology, Toxicology and Alternative Methods Development Go Hand-in-Hand. 2014. In: Basic & Clinical Pharmacology & Toxicology, ISSN 1742-7835, E-ISSN 1742-7843, Vol. 115, no 1, 50-58 p. Article, review/survey (Refereed)
    Abstract [en]

    Toxicological research faces the challenge of integrating knowledge from diverse fields and from novel technological developments in the biological and medical sciences generally. We discuss herein how the multiple facets of cancer research, including discovery related to mechanisms, treatment and diagnosis, overlap with many emerging areas of interest in toxicology, including the need for improved methods and analysis tools. Common to both disciplines, in vitro and in silico methods serve as alternative investigation routes to animal studies. Knowledge of cancer development helps in understanding the relevance of chemical toxicity studies in cell models, and many bioinformatics-based cancer biomarker discovery tools are also applicable to computational toxicology. Robotics-aided cell-based high-throughput screening, microscale immunostaining techniques, and gene expression profiling analyses are common tools in cancer research and, when combined sequentially, form a tiered approach to the structured safety evaluation of thousands of environmental agents, novel chemicals or engineered nanomaterials. Comprehensive tumour data collections in databases have been translated into clinically useful data, and this concept serves as a template for the computer-driven translation of toxicity data into meaningful results. Future “cancer research-inspired knowledge management” of toxicological data will aid the translation of basic discovery results and chemical- and material-testing data into information relevant to human health and environmental safety.

  • 24.
    Laeeq, Ahmed
    et al.
    Royal Institute of Technology.
    Edlund, Åke
    Royal Institute of Technology.
    Laure, Erwin
    Royal Institute of Technology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Using Iterative MapReduce for Parallel Virtual Screening. In: Journal of medical and bioengineering, ISSN 2301-3796. Article in journal (Refereed)
    Abstract [en]

    MapReduce and its various implementations have been used successfully on commodity clusters to analyze data for problems where the datasets become very large. Virtual screening is a chemoinformatics technique used in drug discovery to search large libraries of molecular structures, which makes it a strong candidate for MapReduce. In this study, however, we used SVM-based virtual screening, which is resource-demanding. Such virtual screening not only involves huge datasets but is also computationally expensive, with a complexity that can grow to at least n². Most SVM-based applications use MPI, but MPI has its own limitations, such as a lack of fault tolerance and low productivity. This study shows that MapReduce can be used effectively to implement SVM-based virtual screening. The results illustrate that MapReduce performs well as the number of nodes in the cluster increases. For the experiments, we used Spark, an iterative MapReduce programming model. We also present the program flow and results that show the efficiency of iterative MapReduce.
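    As an illustration of the map/reduce pattern described above, and not the authors' Spark code, the following minimal Python sketch scores molecules partition by partition (map) and merges per-partition top-k hit lists (reduce). The linear decision function stands in for a trained SVM; its feature names, weights and bias are hypothetical. In Spark, the same shape could be expressed with RDD operations such as `mapPartitions` and `reduce`.

```python
import heapq
from functools import reduce
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[float, str]                  # (decision value, molecule id)
Molecule = Tuple[str, Dict[str, float]]  # (molecule id, sparse feature vector)


def decision_value(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Hypothetical linear-SVM decision function w.x + b on sparse features."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items()) + bias


def map_partition(partition: Iterable[Molecule], weights: Dict[str, float],
                  bias: float, k: int) -> List[Hit]:
    """Map step: score every molecule in one partition and keep the local top-k."""
    scored = ((decision_value(feats, weights, bias), mol_id) for mol_id, feats in partition)
    return heapq.nlargest(k, scored)


def merge_top_k(a: List[Hit], b: List[Hit], k: int) -> List[Hit]:
    """Reduce step: the top-k of two top-k lists is the top-k of their union."""
    return heapq.nlargest(k, a + b)


def screen(partitions: List[List[Molecule]], weights: Dict[str, float],
           bias: float, k: int) -> List[Hit]:
    # Sequential here; on a cluster each partition would be mapped on its own node.
    mapped = [map_partition(p, weights, bias, k) for p in partitions]
    return reduce(lambda a, b: merge_top_k(a, b, k), mapped, [])


# Made-up library split into two partitions (as if stored on two nodes).
partitions = [
    [("mol-A", {"f1": 2.0}), ("mol-B", {"f2": 1.0})],
    [("mol-C", {"f1": 1.0, "f2": 1.0}), ("mol-D", {"f1": 3.0})],
]
weights = {"f1": 1.0, "f2": -0.5}
top = screen(partitions, weights, bias=-0.1, k=2)
print([mol_id for _, mol_id in top])  # ['mol-D', 'mol-A']
```

    Keeping only a local top-k in each partition means the reduce step moves k hits per node rather than the whole scored library.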

  • 25.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 67. Article in journal (Refereed)
    Abstract [en]

    Predictive modelling in drug discovery is challenging to automate, as it often contains multiple analysis steps and might involve cross-validation and parameter tuning that create complex dependencies between tasks. With large-scale data or computationally demanding modelling methods, e-infrastructures such as high-performance or cloud computing are required, adding to the existing challenges of fault-tolerant automation. Workflow management systems can help with many of these challenges, but currently available systems lack the functionality needed to enable agile and flexible predictive modelling. Here we present an approach inspired by elements of the flow-based programming paradigm, implemented as an extension of the Luigi system, which we name SciLuigi. We also discuss our experiences from using the approach when modelling a large set of biochemical interactions on a shared computer cluster.
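    The core flow-based idea of separating component definitions from the wiring between named ports can be sketched in a few lines of Python. This is a hypothetical toy, not SciLuigi's or Luigi's API: each component's function returns named out-ports, the workflow connects in-ports to upstream out-ports, and outputs are computed lazily on demand, loosely mirroring how Luigi resolves dependencies. The load/split/fit/evaluate steps are placeholders.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class Component:
    """Hypothetical flow-based component: keyword arguments of `func` are
    in-ports; keys of the dict it returns are out-ports."""

    def __init__(self, name: str, func: Callable[..., Dict[str, Any]]) -> None:
        self.name = name
        self.func = func
        # in-port name -> (upstream component, upstream out-port name)
        self.in_ports: Dict[str, Tuple["Component", str]] = {}
        self._outputs: Optional[Dict[str, Any]] = None

    def connect(self, in_port: str, upstream: "Component", out_port: str) -> None:
        """Wire one in-port to an upstream out-port (workflow definition)."""
        self.in_ports[in_port] = (upstream, out_port)

    def run(self) -> Dict[str, Any]:
        """Pull inputs from upstream components, run once, and cache outputs."""
        if self._outputs is None:
            inputs = {port: up.run()[out] for port, (up, out) in self.in_ports.items()}
            self._outputs = self.func(**inputs)
        return self._outputs


# Components are defined independently of how they are wired together.
load = Component("load", lambda: {"data": [1.0, 2.0, 3.0, 4.0]})
split = Component("split", lambda data: {"train": data[:3], "test": data[3:]})
fit = Component("fit", lambda train: {"model": sum(train) / len(train)})  # mean predictor
evaluate = Component(
    "evaluate",
    lambda model, test: {"mae": sum(abs(t - model) for t in test) / len(test)},
)

# Network definition: connect named ports.
split.connect("data", load, "data")
fit.connect("train", split, "train")
evaluate.connect("model", fit, "model")
evaluate.connect("test", split, "test")

print(evaluate.run()["mae"])  # 2.0
```

    Because wiring lives outside the components, a component such as `fit` could be reused in a different network (for example, once per cross-validation fold) without changing its code.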

  • 26.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Dahlö, Martin
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Olason, Pall I
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Hagberg, Jonas
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lessons learned from implementing a national infrastructure in Sweden for storage and analysis of next-generation sequencing data. 2013. In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 2, no 1, 1-10 p. Article in journal (Refereed)
    Abstract [en]

    Analyzing and storing data and results from next-generation sequencing (NGS) experiments is a challenging task, hampered by ever-increasing data volumes and frequent updates of analysis methods and tools. Storage and computation have grown beyond the capacity of personal computers, and there is a need for suitable e-infrastructures for processing. Here we describe UPPNEX, an implementation of such an infrastructure, tailored to the data storage and analysis needs of NGS in Sweden and serving various labs and multiple instruments from the major sequencing technology platforms. UPPNEX comprises resources for high-performance computing, large-scale and high-availability storage, an extensive bioinformatics software suite, up-to-date reference genomes and annotations, a support function with system and application experts, and a web portal and support ticket system. UPPNEX applications are numerous and diverse and include whole-genome, de novo and exome sequencing, targeted resequencing, SNP discovery, RNA-Seq, and methylation analysis. More than 300 projects use UPPNEX, including large undertakings such as the sequencing of the flycatcher and Norway spruce genomes. We describe the strategic decisions made when investing in hardware, setting up maintenance and support, and allocating resources, and we illustrate major challenges such as managing data growth. We conclude by summarizing our experiences and observations with UPPNEX to date, providing insights into the more and less successful decisions made.

  • 27.
    Lampa, Samuel
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Hagberg, Jonas
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    UPPNEX: A solution for next generation sequencing data management and analysis. 2012. In: EMBnet.journal, ISSN 2226-6089, Vol. 17, no Suppl. B, 44 p. Article in journal (Other academic)
  • 28.
    Lampa, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Willighagen, Egon
    Maastricht University.
    Kohonen, Pekka
    Karolinska Institutet.
    King, Ali
    FanDuel Inc.
    Vrandečić, Denny
    Google Inc.
    Grafström, Roland
    Karolinska Institutet.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    RDFIO: extending Semantic MediaWiki for interoperable biomedical data management. 2017. In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 8, no 1, 35. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: The biological sciences are characterised not only by an increasing amount of data but also by the extreme complexity of those data. This stresses the need for efficient ways of integrating these data into a coherent description of biological systems. In many cases, biological data need organization before integration. This is often a collaborative effort, and it is thus important that tools for data integration support a collaborative way of working. Wiki systems with support for structured semantic data authoring, such as Semantic MediaWiki (SMW), provide a powerful solution for collaborative editing of data combined with machine-readability, so that data can be handled in an automated fashion in any downstream analyses. However, Semantic MediaWiki lacks a built-in data import function, which hinders efficient round-tripping of data between interoperable Semantic Web formats such as RDF and the internal wiki format.

    RESULTS: To solve this deficiency, the RDFIO suite of tools is presented, which supports importing RDF data into Semantic MediaWiki, together with the metadata needed to export it again in the same RDF format or ontology. Additionally, the new functionality enables mash-ups of automated data imports combined with manually created data presentations. The application of the suite of tools is demonstrated by importing drug discovery related data about rare diseases from Orphanet and acid dissociation constants from Wikidata. The RDFIO suite of tools is freely available for download via pharmb.io/project/rdfio.

    CONCLUSIONS: A set of biomedical demonstrators shows how the new functionality enables a number of usage scenarios in which the interoperability of SMW and the wider Semantic Web is leveraged for biomedical data sets, creating an easy-to-use and flexible platform for exploring and working with biomedical data.

  • 29.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Prusis, Peteris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Wikberg, Jarl E S
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences, Pharmaceutical Pharmacology.
    Proteochemometric modeling of HIV protease susceptibility. 2008. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 9, 181. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND

    A major obstacle in treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations.

    RESULTS

    The model provided excellent predictability (R² = 0.92, Q² = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q²inhibitors = 0.72.

    CONCLUSION

    Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is publicly available at the HIV Drug Research Centre.
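    The Q² statistics quoted in these abstracts are cross-validated analogues of R². A commonly used definition is Q² = 1 - PRESS / TSS, where PRESS is the sum of squared errors of predictions made for observations held out of model fitting and TSS is the total sum of squares around the mean of the observations. The minimal Python sketch below assumes this definition and uses made-up values; definitions of external Q² vary (for example, in which mean is used), and the exact variants used in the papers are not stated here.

```python
from typing import Sequence


def q_squared(y_obs: Sequence[float], y_pred_cv: Sequence[float]) -> float:
    """Q^2 = 1 - PRESS / TSS.

    PRESS: sum of squared errors of cross-validated (held-out) predictions.
    TSS:   total sum of squares of the observations around their mean.
    """
    if len(y_obs) != len(y_pred_cv) or len(y_obs) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    mean = sum(y_obs) / len(y_obs)
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_pred_cv))
    tss = sum((y - mean) ** 2 for y in y_obs)
    if tss == 0:
        raise ValueError("observations have zero variance; Q^2 is undefined")
    return 1.0 - press / tss


# Made-up susceptibility values and held-out predictions, for illustration only.
obs = [0.1, 0.8, 1.5, 2.2, 0.4]
pred = [0.3, 0.7, 1.2, 2.0, 0.6]
print(round(q_squared(obs, pred), 3))  # 0.924
```

    Unlike R², which is computed from fitted values, Q² uses only predictions for data the model did not see, so it can become negative when predictions are worse than simply using the mean.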

  • 30.
    Lapins, Maris
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Worachartcheewan, Apilak
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Georgiev, Valentin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Prachayasittikul, Virapong
    Nantasenamat, Chanin
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    A Unified Proteochemometric Model for Prediction of Inhibition of Cytochrome P450 Isoforms. 2013. In: PLoS ONE, ISSN 1932-6203, Vol. 8, no 6, e66566. Article in journal (Refereed)
    Abstract [en]

    A unified proteochemometric (PCM) model for predicting the ability of drug-like chemicals to inhibit five major drug-metabolizing CYP isoforms (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) was created and made publicly available in the Bioclipse Decision Support open source system at www.cyp450model.org. For the proteochemometric modeling, we represented the chemical compounds by molecular signature descriptors and the CYP isoforms by an alignment-independent description of the composition and transition of amino acid properties of their protein primary sequences. The entire training dataset contained 63 391 interactions, and the best PCM model was obtained using signature descriptors of heights 1, 2 and 3 and inducing the model with a support vector machine. The model showed excellent predictive ability, with an internal AUC of 0.923 and an external AUC of 0.940, as evaluated on a large external dataset. An advantage of PCM models is their extensibility, which makes it possible to extend our model to new CYP isoforms and polymorphic CYP forms. A key benefit of PCM is that all proteins are included in one single model, which generally makes it more stable and predictive than single-target models. The inclusion of the model in Bioclipse Decision Support makes it possible to make virtually instantaneous predictions (∼100 ms per prediction) while interactively drawing or modifying chemical structures in the Bioclipse chemical structure editor.
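    The internal and external AUC values above refer to the area under the ROC curve for classifying inhibitors versus non-inhibitors. A minimal Python sketch, assuming the standard rank-based definition (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half) and made-up labels and scores; it is not the authors' evaluation code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. Pairwise O(P*N) form, fine for small examples."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = inhibitor, 0 = non-inhibitor) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values such as 0.923 and 0.940 indicate strong discrimination.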

  • 31.
    O'Boyle, Noel
    et al.
    University College Cork.
    Guha, Rajarshi
    NIH Center for Translational Therapeutics.
    Willighagen, Egon
    Karolinska Institutet.
    Adams, Samuel
    University of Cambridge.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Bradley, Jean-Claude
    Drexel University.
    Filippov, Igor
    NCI-Frederick.
    Hansson, Robert
    St. Olaf College.
    Hanwell, Marcus
    Kitware, Inc.
    Hutchison, Geoffrey
    University of Pittsburgh.
    James, Craig
    eMolecules Inc.
    Jeliazkova, Nina
    Ideaconsult Ltd.
    Lang, Andrew
    Oral Roberts University.
    Langner, Karol
    Leiden University.
    Lonie, David
    State University of New York at Buffalo.
    Lowe, Daniel
    University of Cambridge.
    Pansanel, Jerome
    Université de Strasbourg.
    Pavlov, Dmitry
    GGA Software Service.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Steinbeck, Christoph
    European Bioinformatics Institute.
    Tenderholt, Adam
    University of Washington.
    Thiesen, Kevin
    Chemlabs.
    Murray-Rust, Peter
    University of Cambridge.
    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk five years on. 2011. In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 3, 37. Article in journal (Refereed)
    Abstract [en]

    Background: The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.

    Results: This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.

    Conclusions: We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.

  • 32.
    Rostkowski, Michal
    et al.
    University of Copenhagen.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Rydberg, Patrik
    University of Copenhagen.
    WhichCyp: Prediction of Cytochromes P450 Inhibition. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 29, no 16, 2051-2052 p. Article in journal (Refereed)
    Abstract [en]

    SUMMARY: In this work we present WhichCyp, a tool for predicting which cytochrome P450 isoforms (among 1A2, 2C9, 2C19, 2D6 and 3A4) a given molecule is likely to inhibit. The models are built from experimental high-throughput data using support vector machines and molecular signatures.

    AVAILABILITY: The WhichCyp server is freely available for use on the web at http://drug.ku.dk/whichcyp, where the WhichCyp Java program and source code are also available for download.

    CONTACT: pry@sund.ku.dk

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

  • 33.
    Schaal, Wesley
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Hammerling, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Gustafsson, Mats G
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Automated QuantMap for rapid quantitative molecular network topology analysis2013In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 29, no 18, 2369-2370 p.Article in journal (Refereed)
    Abstract [en]

    SUMMARY:

    The previously disclosed QuantMap method for grouping chemicals by biological activity used online services for much of the data gathering and some of the numerical analysis. The present work attempts to streamline this process by using local copies of the databases and in-house analysis. Using computational methods similar or identical to those used in the previous work, a qualitatively equivalent result was found in just a few seconds on the same dataset (collection of 18 drugs). We use the user-friendly Galaxy framework to enable users to analyze their own datasets. Hopefully, this will make the QuantMap method more practical and accessible and help achieve its goals of providing substantial assistance to drug repositioning, pharmacology evaluation and toxicology risk assessment.

    AVAILABILITY:

    http://galaxy.predpharmtox.org

    CONTACT:

    mats.gustafsson@medsci.uu.se or ola.spjuth@farmbio.uu.se

    SUPPLEMENTARY INFORMATION:

    Supplementary data are available at Bioinformatics online.

  • 34.
    Shoombuatong, Watshara
    et al.
    Prathipati, Philip
    Prachayasittikul, Veda
    Schaduangrat, Nalini
    Malik, Aijaz Ahmad
    Pratiwi, Reny
    Wanwimolruk, Sompon
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Gleeson, Matthew Paul
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Nantasenamat, Chanin
    Towards Predicting the Cytochrome P450 Modulation: From QSAR to proteochemometric modeling.2017In: Current Drug Metabolism, ISSN 1389-2002, E-ISSN 1875-5453, Vol. 18, no 6, 540-555 p.Article in journal (Refereed)
    Abstract [en]

    Drug metabolism determines the fate of a drug when it enters the human body and is a critical factor in defining its absorption, distribution, metabolism, excretion and toxicity (ADMET) characteristics. Among the various drug-metabolizing enzymes, cytochrome P450s (CYP450) constitute an important protein family that, aside from functioning in xenobiotic metabolism, is also responsible for a diverse array of other roles encompassing steroid and cholesterol biosynthesis, fatty acid metabolism, calcium homeostasis, neuroendocrine functions and growth regulation. Although CYP450s typically convert xenobiotics into safe metabolites, in some situations the metabolite is more toxic than its parent molecule. Computational modeling has been instrumental in CYP450 research by rationalizing the nature of the binding event (i.e. whether a compound inhibits or induces CYP450s) or the metabolic stability of query compounds of interest. A plethora of computational approaches encompassing ligand-, structure- and systems-based approaches have been utilized to model CYP450-ligand interactions. This review provides a brief background on the CYP450 family (i.e. its roles, advantages and disadvantages as well as its modulators) and then discusses the various computational approaches that have been used to model CYP450-ligand interaction. Particular focus is given to the use of quantitative structure-activity relationship (QSAR) and more recent proteochemometric modeling studies. Finally, a perspective on the current state of the art and future trends of the field is provided.

  • 35.
    Simeon, Saw
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lapins, Maris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Nabu, Sunanta
    Anuwongcharoen, Nuttapat
    Prachayasittikul, Virapong
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Nantasenamat, Chanin
    Origin of aromatase inhibitory activity via proteochemometric modeling2016In: PeerJ, ISSN 2167-8359, E-ISSN 2167-8359, Vol. 4, e1979Article in journal (Refereed)
    Abstract [en]

    Aromatase, the rate-limiting enzyme that catalyzes the conversion of androgen to estrogen, plays an essential role in the development of estrogen-dependent breast cancer. Side effects due to aromatase inhibitors (AIs) necessitate the pursuit of novel inhibitor candidates with high selectivity, lower toxicity and increased potency. Designing a novel therapeutic agent against aromatase could be achieved computationally by means of ligand-based and structure-based methods. For over a decade, we have used both approaches to design potential AIs, applying quantitative structure-activity relationships and molecular docking to explore the inhibitory mechanisms of AIs towards aromatase. However, such approaches do not consider the effects that aromatase variants have on different AIs. In this study, proteochemometric modeling was applied to analyze the interaction space between AIs and aromatase variants as a function of their substructural and amino acid features. Good predictive performance was achieved, as rigorously verified by 10-fold cross-validation, external validation, leave-one-compound-out cross-validation, leave-one-protein-out cross-validation and Y-scrambling tests. The investigations presented herein provide important insights into the mechanisms of aromatase inhibitory activity that could aid in the design of novel potent AIs as breast cancer therapeutic agents.
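    The validation schemes listed in this abstract differ in what they hold out. The Python sketch below is an illustration only, not the authors' code: it assumes each measurement is a (compound, protein) pair with numeric descriptor vectors, builds a proteochemometric feature vector from ligand descriptors, protein descriptors and their cross-products (a common PCM choice; whether this study used cross-terms is an assumption), and generates leave-one-compound-out and leave-one-protein-out splits plus a Y-scrambled response vector. All helper names and identifiers are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical layout: one row per (compound_id, protein_id) pair,
# e.g. one inhibitor measured against one aromatase variant.
Pair = Tuple[str, str]


def pcm_features(ligand: Sequence[float], protein: Sequence[float]) -> List[float]:
    """Ligand descriptors + protein descriptors + all ligand-protein products.

    The cross-terms let even a linear model represent interactions; whether
    the cited study used cross-terms is an assumption of this sketch.
    """
    cross = [lig * prot for lig in ligand for prot in protein]
    return list(ligand) + list(protein) + cross


def leave_one_group_out(
    pairs: Sequence[Pair], by: str
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices).

    by="compound" gives leave-one-compound-out CV,
    by="protein" gives leave-one-protein-out CV.
    """
    if by not in ("compound", "protein"):
        raise ValueError("by must be 'compound' or 'protein'")
    pos = 0 if by == "compound" else 1
    for group in sorted({pair[pos] for pair in pairs}):
        test = [i for i, pair in enumerate(pairs) if pair[pos] == group]
        train = [i for i, pair in enumerate(pairs) if pair[pos] != group]
        yield group, train, test


def y_scramble(y: Sequence[float], seed: int = 0) -> List[float]:
    """Randomly permute the responses; a model refit on these should fail,
    which is the point of the Y-scrambling control."""
    shuffled = list(y)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    # Toy identifiers only; no real data.
    pairs = [("c1", "p1"), ("c1", "p2"), ("c2", "p1")]
    for held_out, train, test in leave_one_group_out(pairs, by="protein"):
        print(held_out, train, test)
```

    Holding out every measurement for one protein tests whether a model generalizes to an unseen aromatase variant, which ordinary 10-fold cross-validation over pairs does not guarantee.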

  • 36.
    Siretskiy, Alexey
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    HTSeq-Hadoop: Extending HTSeq for Massively Parallel Sequencing Data Analysis using Hadoop2014In: Proc. 10th International Conference on e-Science, IEEE Computer Society, 2014, 317-323 p.Conference paper (Refereed)
    Abstract [en]

    Hadoop is a convenient framework in e-Science enabling scalable distributed data analysis. In molecular biology, next-generation sequencing produces vast amounts of data and requires flexible frameworks for constructing analysis pipelines. We extend the popular HTSeq package into the Hadoop realm by introducing massively parallel versions of short read quality assessment as well as functionality to count genes mapped by the short reads. We use the Hadoop-streaming library, which allows the components to run on both Hadoop and regular Linux systems, and evaluate their performance in two different execution environments: a single node on a computational cluster and a Hadoop cluster in a private cloud. We compare the implementations with Apache Pig, showing improved runtime performance of our methods. We also integrate the components into the graphical platform Cloudgene to simplify user interaction.
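    Hadoop streaming runs any executable that reads lines on standard input and writes tab-separated key/value lines to standard output, which is why the same components can also run on a plain Linux system. A minimal Python sketch of a gene-counting step in that style follows. It is not the HTSeq-Hadoop code: as a simplifying assumption, its input is one "read_id<TAB>gene_id" line per already-assigned read rather than SAM alignments plus a GFF annotation, and the script name count_genes.py is hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'gene<TAB>1' for every read that was assigned to a gene.

    Assumed input: 'read_id<TAB>gene_id' per line; lines with no gene id
    (unassigned reads) or no tab are skipped.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1]:
            continue
        yield f"{fields[1]}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per gene. Relies on the input being sorted by key,
    which Hadoop's shuffle (or a local `sort`) guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if "\t" in line)
    for gene, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{gene}\t{sum(int(count) for _, count in group)}"


def main(stage_name: str) -> None:
    stages = {"map": mapper, "reduce": reducer}
    if stage_name not in stages:
        raise SystemExit("usage: count_genes.py map|reduce")
    for out in stages[stage_name](sys.stdin):
        print(out)


# Only run as a streaming stage when a stage name is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

    Locally the two stages chain as cat assignments.tsv | python3 count_genes.py map | sort | python3 count_genes.py reduce (assignments.tsv is a hypothetical input file). On a cluster, the same two commands would be passed to Hadoop streaming through its -mapper and -reducer options, with Hadoop's shuffle taking the place of sort.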

  • 37.
    Siretskiy, Alexey
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Sundqvist, Tore
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Voznesenskiy, Mikhail
    St Petersburg State Univ, Inst Chem, Dept Phys Chem, St Petersburg 199034, Russia.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    A quantitative assessment of the Hadoop framework for analyzing massively parallel DNA sequencing data2015In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 4, 26Article in journal (Refereed)
    Abstract [en]

    Background: New high-throughput technologies, such as massively parallel sequencing, have transformed the life sciences into a data-intensive field. The most common e-infrastructure for analyzing these data consists of batch systems based on high-performance computing resources; however, the bioinformatics software built on this platform does not scale well in the general case. Recently, the Hadoop platform has emerged as an interesting option to address the challenges of increasingly large datasets with distributed storage, distributed processing, built-in data locality, fault tolerance, and an appealing programming methodology.

    Results: In this work we introduce metrics and report on a quantitative comparison between Hadoop and a single node of conventional high-performance computing resources for the tasks of short read mapping and variant calling. We calculate efficiency as a function of data size and observe that the Hadoop platform is more efficient for biologically relevant data sizes in terms of computing hours for both split and un-split data files. We also quantify the advantages of the data locality provided by Hadoop for NGS problems, and show that a classical architecture with network-attached storage will not scale when computing resources increase in numbers. Measurements were performed using ten datasets of different sizes, up to 100 gigabases, using the pipeline implemented in Crossbow. To make a fair comparison, we implemented an improved preprocessor for Hadoop with better performance for splittable data files. For improved usability, we implemented a graphical user interface for Crossbow in a private cloud environment using the CloudGene platform. All of the code and data in this study are freely available as open source in public repositories.

    Conclusions: From our experiments we conclude that the improved Hadoop pipeline scales better than the same pipeline on high-performance computing resources; we also conclude that Hadoop is an economically viable option for the common data sizes currently used in massively parallel sequencing. Given that datasets are expected to grow over time, we envision that Hadoop will have an increasingly important role in future biological data analysis.

  • 38.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Using Bioclipse to integrate bioinformatics functionality2005In: EMBnet.news, ISSN 1023-4144, Vol. 13, no 1, 5-11 p.Article in journal (Other (popular science, discussion, etc.))
  • 39.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eine visuelle Open-Source-Platform für Chemo- und Bioinformatik [A visual open-source platform for chemo- and bioinformatics]2006In: JAVAmagazin, ISSN 1619-795X, no 8Article in journal (Other (popular science, discussion, etc.))
  • 40.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Bioclipse: Integration of Data and Software in the Life Sciences2009Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    New high-throughput experimental techniques have turned the life sciences into a data-intensive field. Scientists are faced with new types of problems, such as managing voluminous sources of information, integrating heterogeneous data, and applying the proper analysis algorithms, all in order to reach reliable conclusions. These challenges call for an infrastructure of algorithms and technologies to supply researchers with the tools and methods necessary to maximize the usefulness of the data. eScience has emerged as a promising technology to take on these challenges; the term denotes integrated science carried out in highly distributed network environments, or science that makes use of large data sets and requires high-performance computing resources.

    In this thesis I present standards, exchange formats, algorithms, and software implementations for empowering researchers in the life sciences with the tools of eScience. The work is centered around Bioclipse - an extensible workbench developed in the frame of this thesis - which provides users with instruments for carrying out integrated research and where technical details are hidden under simple graphical interfaces. Bioclipse is a Rich Client that takes full advantage of the many offerings of eScience, such as networked databases and online services. The benefits of mixing local and remote software in a unifying platform are demonstrated with an integrated approach for predicting metabolic sites in chemical structures. To overcome the limitations of the commonly used technologies for interacting with networked services, I also present a new technology using the XMPP protocol. This enables service discovery and asynchronous communication between the client and server, which is ideal for long-running analyses.

    To maximize the usefulness of the available data there is a need for standards, ontologies, and exchange formats, in order to define what information should be captured and how it should be structured and exchanged. A novel format for exchanging QSAR data sets in a fully interoperable and reproducible form is presented, together with an implementation in Bioclipse that takes advantage of eScience components during the setup process.

    Bioclipse has been well received by the scientific community, attracted a large group of international users and developers, and has been awarded three international prizes for its innovative character. With continued development, the project has a good chance of becoming an important component in a sustainable infrastructure for the life sciences.

    List of papers
    1. Bioclipse: an open source workbench for chemo- and bioinformatics
    2007 (English)In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 59- p.Article in journal (Refereed) Published
    Abstract [en]

    BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework.

    RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction.

    CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source and commercial plugins. Bioclipse is freely available at http://www.bioclipse.net.

    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-104257 (URN); 10.1186/1471-2105-8-59 (DOI); 000244600100001 (ISI); 17316423 (PubMedID)
    Available from: 2009-05-28 Created: 2009-05-28 Last updated: 2015-09-11 Bibliographically approved
    2. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services
    2009 (English)In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 279- p.Article in journal (Refereed) Published
    Abstract [en]

    BACKGROUND: The life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine-accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability and the inability of services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

    RESULTS: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), consisting of an extension (IO Data) to comprise discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

    CONCLUSION: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need for an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need for an external semantic description. The many advantages over existing technologies make XMPP a highly interesting candidate for next-generation online services in bioinformatics.

    Keyword
    xmpp, bioclipse, cloud, service, protocol, bioinformatics, cheminformatics, life sciences
    National Category
    Industrial Biotechnology
    Research subject
    Pharmaceutical Pharmacology
    Identifiers
    urn:nbn:se:uu:diva-109290 (URN); 10.1186/1471-2105-10-279 (DOI); 000271117300001 (ISI)
    Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-05-04 Bibliographically approved
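    The contrast between polling and asynchronous invocation described in the XMPP abstract above can be illustrated without any XMPP machinery. The Python asyncio sketch below is a toy model under that assumption: ToyService, polling_client and push_client are hypothetical names, the "service" runs in the same process, and the callback stands in for the result message an XMPP service would send back on completion. It does not implement XMPP or the IO Data extension.

```python
import asyncio
from typing import Callable, Dict, Optional, Set, Tuple


class ToyService:
    """Stand-in for a long-running remote analysis (not an XMPP implementation)."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.status: Dict[int, str] = {}
        self.results: Dict[int, int] = {}
        self._tasks: Set["asyncio.Task[None]"] = set()  # keep running tasks referenced

    def submit(self, x: int, notify: Optional[Callable[[int], None]] = None) -> int:
        job_id = len(self.status)
        self.status[job_id] = "running"
        task = asyncio.get_running_loop().create_task(self._run(job_id, x, notify))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _run(self, job_id: int, x: int, notify: Optional[Callable[[int], None]]) -> None:
        await asyncio.sleep(self.duration)  # simulated computation
        self.results[job_id] = x * x
        self.status[job_id] = "done"
        if notify is not None:
            notify(self.results[job_id])  # the service pushes the result


async def polling_client(service: ToyService, x: int, interval: float) -> Tuple[int, int]:
    """HTTP-style pattern: repeatedly ask for status until the job is done."""
    job_id = service.submit(x)
    polls = 0
    while service.status[job_id] != "done":
        polls += 1
        await asyncio.sleep(interval)
    return service.results[job_id], polls


async def push_client(service: ToyService, x: int) -> int:
    """Asynchronous pattern: register a callback and wait to be notified."""
    future: "asyncio.Future[int]" = asyncio.get_running_loop().create_future()
    service.submit(x, notify=future.set_result)
    return await future
```

    The polling client issues roughly one status request per polling interval for as long as the job runs, whereas the push client makes a single request and is resumed once, when the result arrives.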
    3. Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse
    2010 (English)In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 362- p.Article in journal (Refereed) Published
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid rapid compound optimisation. No interactive tool exists, and most of the useful tools are quite expensive.

    Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented, based on annotated metabolic data described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics.

    Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real-time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require a network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

    Keyword
    bioclipse, metaprint2d, prediction, metabolic, safety, assessment
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Bioinformatics; Pharmaceutical Pharmacology
    Identifiers
    urn:nbn:se:uu:diva-109301 (URN); 10.1186/1471-2105-11-362 (DOI); 000281440200001 (ISI); 20594327 (PubMedID)
    Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-05-04 Bibliographically approved
    4. Towards interoperable and reproducible QSAR analyses: Exchange of data sets
    2010 (English)In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 2, 5Article in journal (Refereed) Published
    Abstract [en]

    BACKGROUND: QSAR/QSPR is a widely used method to relate chemical structures and responses based on experimental observations. In QSAR, chemical structures are expressed as descriptors, which are mathematical representations like calculated properties or enumerated fragments. Many existing QSAR data sets are based on a combination of different software tools mixed with in-house developed solutions, with datasets manually assembled in spreadsheets. Currently there exists no agreed-upon definition of descriptors and no standard for exchanging data sets in QSAR, which together with numerous different descriptor implementations makes it a virtually impossible task to reproduce and validate analyses, and significantly hinders collaborations and re-use of data.

    RESULTS: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR/QSPR data sets, comprising an open XML format (QSAR-ML) and an open extensible descriptor ontology (Blue Obelisk Descriptor Ontology). The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a data set described by QSAR-ML makes its setup completely reproducible. We also provide an implementation as a set of plugins for Bioclipse that simplifies QSAR data set formation, and allows for exporting in QSAR-ML as well as traditional CSV formats. The implementation facilitates addition of new descriptor implementations, from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services.

    CONCLUSIONS: Standardized QSAR data sets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible dataset formation, solving the problems of defining which software components were used, their versions, and the case of multiple names for the same descriptor. This makes it easy to join, extend, and combine data sets, and also to work collectively. The presented Bioclipse plugins equip scientists with intuitive tools that make QSAR-ML widely available to the community.

    Place, publisher, year, edition, pages
    BioMed Central, 2010
    Keyword
    QSAR, Bioclipse, standard, ontology, life sciences, bioinformatics, cheminformatics, reproducible
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-109302 (URN); 10.1186/1758-2946-2-5 (DOI); 000208222200004 (ISI); 20591161 (PubMedID)
    Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-08-14 Bibliographically approved
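    The reproducibility argument for QSAR-ML rests on recording, for every descriptor, a unique ontology term together with the software and version that computed it. The Python data structure below sketches that idea only; it is not the QSAR-ML schema or the Blue Obelisk Descriptor Ontology, and the field names, synonym table and identifiers such as "desc:logP" and "ToolX" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DescriptorImpl:
    """One descriptor as used in a data set: what it is, and what computed it."""
    ontology_id: str  # hypothetical ontology term, e.g. "desc:logP"
    software: str     # toolkit that computed the values
    version: str      # exact version; different versions may give different values


@dataclass
class QsarDataset:
    # Hypothetical synonym table: tool-specific names -> one ontology term.
    synonyms: Dict[str, str] = field(default_factory=dict)
    descriptors: List[DescriptorImpl] = field(default_factory=list)

    def add(self, name: str, software: str, version: str) -> DescriptorImpl:
        """Record a descriptor under its ontology term rather than its local name."""
        impl = DescriptorImpl(self.synonyms.get(name, name), software, version)
        if impl not in self.descriptors:
            self.descriptors.append(impl)
        return impl

    def setup_key(self) -> Tuple[Tuple[str, str, str], ...]:
        """Order-independent summary of the setup: equal keys mean the same
        descriptors were computed by the same software versions."""
        return tuple(sorted((d.ontology_id, d.software, d.version) for d in self.descriptors))
```

    With this layout, two data sets built from differently named but equivalent descriptors compare equal, while a change of software version is detected as a different setup.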
    5. Bioclipse 2: A scriptable integration platform for the life sciences
    2009 (English)In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 397- p.Article in journal (Refereed) Published
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with the advanced tools required to carry out complex analyses in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. That Bioclipse 2 is based on an advanced and widely used service platform ensures wide extensibility, and new algorithms, visualizations and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

    Keyword
    Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
    National Category
    Bioinformatics and Systems Biology; Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-109304 (URN); 10.1186/1471-2105-10-397 (DOI); 000273329400001 (ISI)
    Available from: 2009-12-16 Created: 2009-10-13 Last updated: 2015-05-12 Bibliographically approved
  • 41.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    NGS data management and analysis for hundreds of projects: Experiences from Sweden2013In: NGS Data after the Gold Rush, 2013Conference paper (Other academic)
    Abstract [en]

    UPPNEX is a national e-infrastructure for next-generation sequencing data storage and analysis in Sweden. This presentation covers strategic decisions made regarding hardware, software, maintenance and support, and resource allocation, and illustrates challenges such as managing data growth in a shared system with over 400 research projects of varying types. Insights into bioinformatics usage patterns are also presented, together with ongoing development to extend the e-infrastructure with redundant resources, a secure system for analyzing sensitive data, and a private cloud.

  • 42.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Johan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Krachtus, Dieter
    University of Heidelberg, Germany.
    Bioclipse 2.0: Life Science setzt auf die Stärken von Eclipse. [Bioclipse 2.0: Life science builds on the strengths of Eclipse]2009In: Eclipse Magazine, ISSN 1861-2296, no 4Article in journal (Other (popular science, discussion, etc.))
  • 43.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Bioclipse 2: Towards integrated biocheminformatics2009In: EMBnet.news, ISSN 1023-4144, Vol. 15, no 3, 25-27 p.Article in journal (Other (popular science, discussion, etc.))
  • 44.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Berg, Arvid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Kuhn, Stefan
    European Bioinformatics Institute, Hinxton, UK.
    Mäsak, Carl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Torrance, Gilleain
    European Bioinformatics Institute, Hinxton, UK.
    Wagener, Johannes
    Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
    Willighagen, Egon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Steinbeck, Christoph
    European Bioinformatics Institute, Hinxton, UK.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Bioclipse 2: A scriptable integration platform for the life sciences2009In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 397- p.Article in journal (Refereed)
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible enough to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with the advanced tools required to carry out complex analyses in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today's powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. Because Bioclipse 2 is based on an advanced and widely used service platform, it is widely extensible, and new algorithms, visualizations, and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

  • 45.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Berg, Arvid
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Willighagen, Egon
    Department of Bioinformatics - BiGCaT, Maastricht University.
    Applications of the InChI in cheminformatics with the CDK and Bioclipse (2013). In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 5, no 14. Article in journal (Refereed)
    Abstract [en]

    Background

    The InChI algorithms are written in C++ and are not available as a Java library. Integration into software written in Java therefore requires a bridge between C and Java libraries, provided by the Java Native Interface (JNI) technology.

    Results

    Here we describe how the InChI library is used in the Bioclipse workbench and the Chemistry Development Kit (CDK) cheminformatics library. To make this possible, a JNI bridge to the InChI library, JNI-InChI, was developed, allowing Java software to access the InChI algorithms. Using this bridge, the CDK project packages the InChI binaries in a module and offers easy access from Java through the CDK API. The Bioclipse project packages and offers InChI as a dynamic OSGi bundle that can easily be used by any OSGi-compliant software, in addition to the regular Java Archive and Maven bundles. Bioclipse itself uses the InChI as a key component and calculates it on the fly when visualizing and editing chemical structures. We demonstrate the utility of InChI with various applications in CDK and Bioclipse, such as decision support for chemical liability assessment, tautomer generation, and knowledge aggregation using a linked data approach.

    Conclusions

    These results show that the InChI library can be used in a variety of Java library dependency solutions, making its functionality easily accessible to Java software such as the CDK. The applications illustrate various ways in which the InChI has been used to enrich the functionality of Bioclipse.

  • 46.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Bongcam-Rudloff, Erik
    Carrasco Hernández, Guillermo
    Forer, Lukas
    Giovacchini, Mario
    Guimera, Roman Valls
    Kallio, Aleksi
    Korpelainen, Eija
    Kańduła, Maciej M.
    Krachunov, Milko
    Kreil, David P.
    Kulev, Ognyan
    Łabaj, Paweł P.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Pireddu, Luca
    Schönherr, Sebastian
    Siretskiy, Alexey
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Vassilev, Dimitar
    Experiences with workflows for automating data-intensive bioinformatics (2015). In: Biology Direct, ISSN 1745-6150, E-ISSN 1745-6150, Vol. 10, 43. Article, review/survey (Refereed)
  • 47.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Bongcam-Rudloff, Erik
    Dahlberg, Johan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences.
    Dahlö, Martin
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Kallio, Aleksi
    Pireddu, Luca
    Vezzi, Francesco
    Korpelainen, Eija
    Recommendations on e-infrastructures for next-generation sequencing (2016). In: GigaScience, ISSN 2047-217X, E-ISSN 2047-217X, Vol. 5, 26. Article in journal (Refereed)
  • 48.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Carlsson, Lars
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Georgiev, Valentin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Willighagen, Egon
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Open source drug discovery with Bioclipse (2012). In: Current Topics in Medicinal Chemistry, ISSN 1568-0266, E-ISSN 1873-4294, Vol. 12, no 18, 1980-1986 p. Article, review/survey (Refereed)
    Abstract [en]

    We present the open source components for drug discovery that have been developed and integrated into the graphical workbench Bioclipse. Building on a solid open source cheminformatics core, Bioclipse has advanced functionality for managing and visualizing chemical structures and related information. The features presented here include QSAR/QSPR modeling, various predictive solutions such as decision support for chemical liability assessment, site-of-metabolism prediction, virtual screening, and knowledge discovery and integration. We demonstrate the utility of the described tools with examples from computational pharmacology, toxicology, and ADME. Bioclipse is used in both academia and industry, and is a good example of open source leading to new solutions for drug discovery.

  • 49.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Helgee, Ernst Ahlberg
    Boyer, Scott
    Carlsson, Lars
    Integrated Decision Support for Assessing Chemical Liabilities (2011). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 51, no 8, 1840-1847 p. Article in journal (Refereed)
    Abstract [en]

    Chemical liabilities, such as adverse effects and toxicity, have a major impact on today's drug discovery process. In silico prediction of chemical liabilities is an important approach that can reduce costs and animal testing by complementing or replacing in vitro and in vivo liability models. There is a lack of integrated, extensible decision support systems for chemical liability assessment that run quickly and have easily interpretable results. Here we present a method that integrates similarity searches, structural alerts, and QSAR models, all of which are available from the Bioclipse workbench. Emphasis has been placed on interpretation of results, and substructures that are important for predictions are highlighted in the original chemical structures. This allows for interactively changing chemical structures with instant visual feedback and can be used for hypothesis testing of single chemical structures as well as compound collections. The system has a clear separation between methods and data, and the extensible architecture enables straightforward extension via addition of more plugins (such as new data sets and computational models). We demonstrate our method on three important safety end points: mutagenicity, carcinogenicity, and aryl hydrocarbon receptor (AhR) activation. Bioclipse and the decision support implementation are free, open source, and available from http://www.bioclipse.net/decision-support.

  • 50.
    Spjuth, Ola
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lapins, Maris
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Junaid, Muhammad
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Wikberg, Jarl
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Services for prediction of drug susceptibility for HIV proteases and reverse transcriptases at the HIV Drug Research Centre (2011). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 12, 1719-1720 p. Article in journal (Refereed)
    Abstract [en]

    Summary: The HIV Drug Research Centre (HIVDRC) has established Web services for prediction of drug susceptibility for HIV proteases and reverse transcriptases. The services are based on two proteochemometric models that accept a protease or reverse transcriptase sequence in amino acid form and output the predicted drug susceptibility values. The predictions are based on a comprehensive analysis in which all relevant inhibitors are included, resulting in models with excellent predictive capabilities.

    Availability and Implementation: The services are implemented as interoperable Web services (REST and XMPP), with supporting web pages to allow for individual analyses. A set of plugins was also developed that makes the services available from the Bioclipse workbench for life science. Services are available at http://www.hivdrc.org/services.
