uu.se: Uppsala University publications
Ligand-based Methods for Data Management and Modelling
Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
2015 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Drug discovery is a complicated and expensive process, with costs in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s, large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was carried out and various Bioclipse plugins were developed.

An introduction to the field is provided in the thesis summary, which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. Paper II describes Brunn, a laboratory information system for supporting work with dose-response studies on microtiter plates. Paper III presents the creation of a molecular fingerprint based on the molecular signature descriptor; the new fingerprints are evaluated for target prediction and found to perform on par with industry-standard commercial molecular fingerprints. Paper IV explores the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) with the radial basis function (RBF) kernel, and finds reasonable default values. Paper V studies the performance of SVM-based QSAR on large datasets with the molecular signature descriptor; a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2015, p. 73
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 200
Keywords [en]
QSAR, ligand-based drug discovery, bioclipse, information system, cheminformatics, bioinformatics
National category
Research subject
Pharmaceutical Pharmacology; Bioinformatics
Identifiers
URN: urn:nbn:se:uu:diva-248964; ISBN: 978-91-554-9237-3 (print); OAI: oai:DiVA.org:uu-248964; DiVA, id: diva2:801538
Public defence
2015-06-05, B22 BMC, Husargatan 3, Uppsala, 09:15 (English)
Opponent
Supervisor
Available from: 2015-05-12 Created: 2015-04-09 Last updated: 2018-01-11
List of papers
1. Bioclipse 2: A scriptable integration platform for the life sciences
2009 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 397- Article in journal (Refereed) Published
Abstract [en]

Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. Because Bioclipse 2 is based on an advanced and widely used service platform, it is widely extensible, and new algorithms, visualizations and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

Keywords
Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
National category
Identifiers
urn:nbn:se:uu:diva-109304 (URN); 10.1186/1471-2105-10-397 (DOI); 000273329400001 ()
Available from: 2009-12-16 Created: 2009-10-13 Last updated: 2018-01-12 Bibliographically approved
2. Brunn: an open source laboratory information system for microplates with a graphical plate layout design process
2011 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no. 1, article id 179 Article in journal (Refereed) Published
Abstract [en]

Background:

Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling these data are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight, open system for handling plate design and the validation and preparation of data.

Results:

A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

Conclusions:

Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
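
As a rough illustration of what an IC50 calculation from a dose-response series can look like, the Python sketch below interpolates linearly on a log-concentration scale between the two neighbouring doses that bracket a 50% response. This is an assumption made for illustration and is not claimed to be the method Brunn uses; the function name `estimate_ic50` and the example data are hypothetical.

```python
import math

def estimate_ic50(concentrations, responses, threshold=50.0):
    """Estimate IC50 by log-linear interpolation (illustrative only).

    concentrations: doses in ascending order (same unit, all > 0)
    responses: measured response in percent of control, one per dose
    Returns None if the response never crosses the threshold.
    """
    points = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        # Find the first pair of neighbouring doses that bracket the threshold.
        if (r_lo - threshold) * (r_hi - threshold) <= 0 and r_lo != r_hi:
            fraction = (r_lo - threshold) / (r_lo - r_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    return None

# Hypothetical data: survival (%) at increasing doses (µM).
doses = [0.1, 1.0, 10.0, 100.0]
survival = [95.0, 80.0, 20.0, 5.0]
ic50 = estimate_ic50(doses, survival)
```

With these made-up values the crossing lies halfway (on the log scale) between 1 and 10 µM. A production system would more likely fit a sigmoidal model to all points rather than interpolate between two.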

Keywords
brunn, microtiter, bioclipse, screening, information system, lis, lims
National category
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-153210 (URN); 10.1186/1471-2105-12-179 (DOI); 000292027200001 (); 21599898 (PubMedID)
Available from: 2011-05-09 Created: 2011-05-09 Last updated: 2018-01-12 Bibliographically approved
3. Ligand-Based Target Prediction with Signature Fingerprints
2014 (English) In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no. 10, p. 2647-2653 Article in journal (Refereed) Published
Abstract [en]

When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristic curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
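
To make the difference between a bit and a count fingerprint concrete, here is a minimal Python sketch of Tanimoto-based similarity ranking. It assumes a fingerprint is simply a list of feature strings: the bit version keeps only presence or absence, while the count version uses the common min/max generalisation of the Tanimoto coefficient. The paper may define the count similarity differently, and the feature strings and molecule names below are hypothetical placeholders, not real atom signatures.

```python
from collections import Counter

def tanimoto_bit(fp_a, fp_b):
    """Tanimoto coefficient on sets of present features."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def tanimoto_count(fp_a, fp_b):
    """Min/max generalisation of Tanimoto for feature counts."""
    a, b = Counter(fp_a), Counter(fp_b)
    keys = set(a) | set(b)
    numerator = sum(min(a[k], b[k]) for k in keys)
    denominator = sum(max(a[k], b[k]) for k in keys)
    return numerator / denominator if denominator else 0.0

def rank_by_similarity(query, library, similarity):
    """Return library names sorted from most to least similar to the query."""
    scored = [(similarity(query, fp), name) for name, fp in library.items()]
    return [name for score, name in sorted(scored, key=lambda t: -t[0])]

# Hypothetical feature lists standing in for atom signatures.
query = ["C(CO)", "C(CO)", "O(C)"]
library = {
    "mol_a": ["C(CO)", "O(C)"],
    "mol_b": ["C(CO)", "C(CO)", "O(C)", "N(C)"],
    "mol_c": ["N(C)", "N(C)"],
}

bit_ranking = rank_by_similarity(query, library, tanimoto_bit)
count_ranking = rank_by_similarity(query, library, tanimoto_count)
```

In this toy case the two versions rank `mol_a` and `mol_b` in opposite order, because only the count version can see that the query contains one feature twice.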

National category
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-237934 (URN); 10.1021/ci500361u (DOI); 000343849600004 (); 25230336 (PubMedID)
Funder
Swedish Research Council, VR-2011-6129; eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC)
Available from: 2014-12-08 Created: 2014-12-08 Last updated: 2018-01-11 Bibliographically approved
4. Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
2014 (English) In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no. 11, p. 3211-3217 Article in journal (Refereed) Published
Abstract [en]

QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
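
The following Python sketch shows the radial basis function kernel that γ parameterises and a small search grid covering the recommended ranges. The ranges for C, γ and height come from the abstract; the power-of-ten spacing of the grid and the helper names `rbf_kernel` and `log_grid` are assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)

def log_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive."""
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

# Ranges recommended in the abstract; the spacing is an assumption.
c_values = log_grid(0, 2)        # C from 1 to 100
gamma_values = log_grid(-3, -1)  # gamma from 0.001 to 0.1
heights = {"count": range(0, 3), "bit": range(0, 4)}  # heights 0-2 and 0-3

# Every (C, gamma) combination that a grid search would evaluate.
grid = [(c, g) for c in c_values for g in gamma_values]
```

A larger γ makes the kernel value fall off faster with distance, so each training example influences a narrower neighbourhood. With three values per parameter the grid holds nine combinations per fingerprint type, which is what keeps the default search cheap.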

National category
Identifiers
urn:nbn:se:uu:diva-240239 (URN); 10.1021/ci500344v (DOI); 000345551000017 (); 25318024 (PubMedID)
Funder
eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
Available from: 2015-01-07 Created: 2015-01-06 Last updated: 2018-01-11 Bibliographically approved
5. Large-scale ligand-based predictive modelling using support vector machines
2016 (English) In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 39 Article in journal (Refereed) Published
Abstract [en]

The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.
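
The structural difference between a linear and an RBF-kernel model can be sketched in a few lines of Python. The abstract reports model building time; this sketch only illustrates the two decision functions, not the training algorithms in LIBLINEAR or libsvm. A linear model is a single weight vector, whereas a kernel model keeps its support vectors, whose number typically grows with the training set. All names, weights and feature keys below are hypothetical.

```python
import math

def predict_linear(weights, bias, x):
    """Linear model: one sparse dot product with a single weight vector."""
    return sum(weights.get(f, 0.0) * v for f, v in x.items()) + bias

def predict_rbf(support_vectors, coefficients, bias, x, gamma):
    """Kernel model: one kernel evaluation per stored support vector."""
    total = bias
    for sv, alpha in zip(support_vectors, coefficients):
        keys = set(sv) | set(x)
        sq_dist = sum((sv.get(k, 0.0) - x.get(k, 0.0)) ** 2 for k in keys)
        total += alpha * math.exp(-gamma * sq_dist)
    return total

# Hypothetical sparse input: feature name -> count.
x = {"sigA": 2.0, "sigC": 1.0}

linear_score = predict_linear({"sigA": 0.5, "sigB": -1.0}, 0.1, x)
kernel_score = predict_rbf(
    [{"sigA": 2.0, "sigC": 1.0}, {"sigB": 1.0}],  # stored support vectors
    [1.0, -0.5],                                   # their coefficients
    0.0,
    x,
    gamma=0.1,
)
```

Sparse dictionaries are used here because signature descriptors yield many possible features, of which each molecule has only a few.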

Keywords
Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
National category
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-248959 (URN); 10.1186/s13321-016-0151-5 (DOI); 000381186100001 (); 27516811 (PubMedID)
Funder
Swedish National Infrastructure for Computing (SNIC), b2013262 b2015001; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; eSSENCE - An eScience Collaboration
Available from: 2015-04-09 Created: 2015-04-09 Last updated: 2018-08-28 Bibliographically approved

Open Access in DiVA

fulltext (1275 kB), 334 downloads
File information
File name: FULLTEXT02.pdf; File size: 1275 kB; Checksum: SHA-512
0948769c9e0d549467923157ecf56650a7d08a4a85232a6602994f7f335835202df9860a475c7ce5ad78370030f4e12b324ea90b64418f8fc6aba0e2b1b957cf
Type: fulltext; Mimetype: application/pdf
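
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The Python sketch below uses the standard `hashlib` module; the helper names and the assumption that the file was saved locally as `FULLTEXT02.pdf` are illustrative.

```python
import hashlib

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "0948769c9e0d549467923157ecf56650a7d08a4a85232a6602994f7f33583520"
    "2df9860a475c7ce5ad78370030f4e12b324ea90b64418f8fc6aba0e2b1b957cf"
)

def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_record(path, expected=EXPECTED_SHA512):
    """True if the file's SHA-512 digest equals the expected checksum."""
    return sha512_of_file(path) == expected.lower()
```

Calling `matches_record("FULLTEXT02.pdf")` on the downloaded file should return `True` if the download is intact and is the same version as the one described here.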

Author
Alvarsson, Jonathan