Uppsala University Publications
Bioclipse: Integration of Data and Software in the Life Sciences
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. (Prof. Wikberg) ORCID iD: 0000-0002-8083-2864
2009 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

New high throughput experimental techniques have turned the life sciences into a data-intensive field. Scientists are faced with new types of problems, such as managing voluminous sources of information, integrating heterogeneous data, and applying the proper analysis algorithms; all to end up with reliable conclusions. These challenges call for an infrastructure of algorithms and technologies to supply researchers with the tools and methods necessary to maximize the usefulness of the data. eScience has emerged as a promising technology to take on these challenges, and denotes integrated science carried out in highly distributed network environments, or science that makes use of large data sets and requires high performance computing resources.

In this thesis I present standards, exchange formats, algorithms, and software implementations for empowering researchers in the life sciences with the tools of eScience. The work is centered around Bioclipse - an extensible workbench developed in the frame of this thesis - which provides users with instruments for carrying out integrated research and where technical details are hidden under simple graphical interfaces. Bioclipse is a Rich Client that takes full advantage of the many offerings of eScience, such as networked databases and online services. The benefits of mixing local and remote software in a unifying platform are demonstrated with an integrated approach for predicting metabolic sites in chemical structures. To overcome the limitations of the commonly used technologies for interacting with networked services, I also present a new technology using the XMPP protocol. This enables service discovery and asynchronous communication between the client and server, which is ideal for long-running analyses.

To maximize the usefulness of the available data there is a need for standards, ontologies, and exchange formats, in order to define what information should be captured and how it should be structured and exchanged. A novel format for exchanging QSAR data sets in a fully interoperable and reproducible form is presented, together with an implementation in Bioclipse that takes advantage of eScience components during the setup process.

Bioclipse has been well received by the scientific community, attracted a large group of international users and developers, and has been awarded three international prizes for its innovative character. With continued development, the project has a good chance of becoming an important component in a sustainable infrastructure for the life sciences.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 53 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy, ISSN 1651-6192 ; 111
Keyword [en]
Bioclipse, integration, life sciences, bioinformatics, cheminformatics, chemoinformatics, eclipse, rich client, xmpp, qsar-ml, web service, standard, ontology
National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:uu:diva-109305; ISBN: 978-91-554-7633-5 (print); OAI: oai:DiVA.org:uu-109305; DiVA: diva2:272465
Public defence
2009-11-27, B42, Uppsala Biomedical Center (BMC), Husargatan 3, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2009-11-06 Created: 2009-10-13 Last updated: 2015-05-04 Bibliographically approved
List of papers
1. Bioclipse: an open source workbench for chemo- and bioinformatics
2007 (English) In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 59. Article in journal (Refereed) Published
Abstract [en]

BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework.

RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction.

CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under the Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is open to open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

National Category
Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-104257 (URN); 10.1186/1471-2105-8-59 (DOI); 000244600100001 (); 17316423 (PubMedID)
Available from: 2009-05-28 Created: 2009-05-28 Last updated: 2015-09-11 Bibliographically approved
2. XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services
2009 (English) In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 279. Article in journal (Refereed) Published
Abstract [en]

BACKGROUND: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

RESULTS: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), with an extension (IO Data) that covers discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

CONCLUSION: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.
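
The difference between polling and asynchronous invocation described in this abstract can be illustrated with a minimal in-process sketch. It does not use XMPP or the IO Data extension: the names `run_service`, `invoke_polling`, and `invoke_async` are hypothetical, and `asyncio.sleep` merely stands in for a long-running analysis.

```python
import asyncio
from typing import Callable, Tuple


async def run_service(job_input: str, on_done: Callable[[str], None]) -> None:
    """Hypothetical service: performs the work, then pushes the result to the client."""
    await asyncio.sleep(0.01)  # placeholder for a long-running analysis
    on_done(f"result for {job_input}")


async def invoke_polling(job_input: str, interval: float = 0.002) -> Tuple[str, int]:
    """Polling client: repeatedly checks whether a result has arrived."""
    state = {"result": None}
    task = asyncio.create_task(
        run_service(job_input, lambda r: state.update(result=r))
    )
    polls = 0
    while state["result"] is None:
        polls += 1  # each loop iteration corresponds to one status request
        await asyncio.sleep(interval)
    await task
    return state["result"], polls


async def invoke_async(job_input: str) -> str:
    """Asynchronous client: waits once and is notified when the service finishes."""
    result: asyncio.Future = asyncio.get_running_loop().create_future()
    task = asyncio.create_task(run_service(job_input, result.set_result))
    value = await result  # no status requests are issued while waiting
    await task
    return value


def demo() -> Tuple[Tuple[str, int], str]:
    """Run both clients against the same hypothetical service."""
    return asyncio.run(invoke_polling("CCO")), asyncio.run(invoke_async("CCO"))
```

Both clients obtain the same result; the polling client issues at least one status check per interval, whereas the asynchronous client issues none and simply receives the pushed result.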

Keyword
xmpp, bioclipse, cloud, service, protocol, bioinformatics, cheminformatics, life sciences
National Category
Industrial Biotechnology
Research subject
Pharmaceutical Pharmacology
Identifiers
urn:nbn:se:uu:diva-109290 (URN); doi:10.1186/1471-2105-10-279 (DOI); 000271117300001 ()
Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-05-04 Bibliographically approved
3. Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse
2010 (English) In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 362. Article in journal (Refereed) Published
Abstract [en]

Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive.

Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics.

Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require a network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.
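
The general idea behind a circular fingerprint, describing an atom by the atoms found at increasing bond distances around it, can be sketched as follows. This is a simplified illustration only and is not MetaPrint2D's actual encoding: the function name `atom_environment`, the plain element symbols used as atom types, and the tuple-of-layers output are all assumptions made for the example.

```python
from collections import deque
from typing import List, Sequence, Tuple


def atom_environment(
    atoms: Sequence[str],
    bonds: Sequence[Tuple[int, int]],
    center: int,
    max_depth: int = 2,
) -> Tuple[Tuple[str, ...], ...]:
    """Group atom symbols by bond distance from `center`, up to `max_depth` bonds."""
    # Build an undirected adjacency list from the bond list.
    neighbours: List[List[int]] = [[] for _ in atoms]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Breadth-first search records the shortest bond distance to each atom.
    depth = {center: 0}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        if depth[current] == max_depth:
            continue  # do not expand beyond the outermost layer
        for nxt in neighbours[current]:
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)

    # Collect symbols per layer; sorting makes the result independent of atom order.
    layers: List[List[str]] = [[] for _ in range(max_depth + 1)]
    for index, d in depth.items():
        layers[d].append(atoms[index])
    return tuple(tuple(sorted(layer)) for layer in layers)


# Heavy atoms of ethanol (C-C-O), written out by hand for the example.
ETHANOL_ATOMS = ["C", "C", "O"]
ETHANOL_BONDS = [(0, 1), (1, 2)]
```

For the middle carbon, `atom_environment(ETHANOL_ATOMS, ETHANOL_BONDS, 1)` yields `(("C",), ("C", "O"), ())`: the atom itself, its two direct neighbours, and an empty second layer. Environments of this kind can be compared against environments stored with annotated data, which is the role the abstract assigns to circular fingerprints.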

Keyword
bioclipse, metaprint2d, prediction, metabolic, safety, assessment
National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics; Pharmaceutical Pharmacology
Identifiers
urn:nbn:se:uu:diva-109301 (URN); 10.1186/1471-2105-11-362 (DOI); 000281440200001 (); 20594327 (PubMedID)
Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-05-04 Bibliographically approved
4. Towards interoperable and reproducible QSAR analyses: Exchange of data sets
2010 (English) In: Journal of Cheminformatics, ISSN 1758-2946, Vol. 2, 5. Article in journal (Refereed) Published
Abstract [en]

BACKGROUND: QSAR/QSPR is a widely used method to relate chemical structures and responses based on experimental observations. In QSAR, chemical structures are expressed as descriptors, which are mathematical representations like calculated properties or enumerated fragments. Many existing QSAR data sets are based on a combination of different software tools mixed with in-house developed solutions, with datasets manually assembled in spreadsheets. Currently there exists no agreed-upon definition of descriptors and no standard for exchanging data sets in QSAR, which together with numerous different descriptor implementations makes it a virtually impossible task to reproduce and validate analyses, and significantly hinders collaborations and re-use of data.

RESULTS: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR/QSPR data sets, comprising an open XML format (QSAR-ML) and an open extensible descriptor ontology (Blue Obelisk Descriptor Ontology). The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a data set described by QSAR-ML makes its setup completely reproducible. We also provide an implementation as a set of plugins for Bioclipse that simplifies QSAR data set formation, and allows for exporting in QSAR-ML as well as traditional CSV formats. The implementation facilitates addition of new descriptor implementations, from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services.

CONCLUSIONS: Standardized QSAR data sets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible dataset formation, solving the problems of defining which software components were used, their versions, and the case of multiple names for the same descriptor. This makes it easy to join, extend, and combine data sets and also to work collectively. The presented Bioclipse plugins equip scientists with intuitive tools that make QSAR-ML widely available for the community.
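
Why recording the descriptor definition, the software, and its version makes data sets joinable can be shown with a small data-structure sketch. It does not reproduce the QSAR-ML schema or the Blue Obelisk Descriptor Ontology: the classes `DescriptorRef` and `QsarDataSet`, their fields, the identifier `"xlogp"`, the provider `"toolA"`, and the numeric values are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class DescriptorRef:
    """Identifies one descriptor implementation (all fields are illustrative)."""
    ontology_id: str  # unique descriptor definition, e.g. an ontology entry
    provider: str     # software component that computed the value
    version: str      # version of that software component


@dataclass
class QsarDataSet:
    """Maps (structure, descriptor implementation) pairs to computed values."""
    values: Dict[Tuple[str, DescriptorRef], float] = field(default_factory=dict)

    def add(self, structure: str, descriptor: DescriptorRef, value: float) -> None:
        self.values[(structure, descriptor)] = value

    def descriptors(self) -> Set[DescriptorRef]:
        return {descriptor for (_, descriptor) in self.values}

    def join(self, other: "QsarDataSet") -> "QsarDataSet":
        """Combine two data sets, refusing silently conflicting values."""
        merged = QsarDataSet(dict(self.values))
        for key, value in other.values.items():
            if key in merged.values and merged.values[key] != value:
                raise ValueError(f"conflicting value for {key}")
            merged.values[key] = value
        return merged
```

Because the key includes provider and version, two data sets that both contain a descriptor named `"xlogp"` but computed by different versions remain distinguishable after `join`, rather than being merged into one ambiguous column as can happen in a hand-assembled spreadsheet.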

Place, publisher, year, edition, pages
BioMed Central, 2010
Keyword
QSAR, Bioclipse, standard, ontology, life sciences, bioinformatics, cheminformatics, reproducible
National Category
Bioinformatics and Systems Biology
Research subject
Bioinformatics
Identifiers
urn:nbn:se:uu:diva-109302 (URN); 10.1186/1758-2946-2-5 (DOI); 000208222200004 (); 20591161 (PubMedID)
Available from: 2009-10-13 Created: 2009-10-13 Last updated: 2015-08-14 Bibliographically approved
5. Bioclipse 2: A scriptable integration platform for the life sciences
2009 (English) In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 397. Article in journal (Refereed) Published
Abstract [en]

Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. Because Bioclipse 2 is based on an advanced and widely used service platform, it is widely extensible, and new algorithms, visualizations, and scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

Keyword
Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
National Category
Bioinformatics and Systems Biology; Pharmaceutical Sciences
Identifiers
urn:nbn:se:uu:diva-109304 (URN); 10.1186/1471-2105-10-397 (DOI); 000273329400001 ()
Available from: 2009-12-16 Created: 2009-10-13 Last updated: 2015-05-12 Bibliographically approved

Open Access in DiVA

fulltext (1693 kB), 1209 downloads
File information
File name: FULLTEXT02.pdf
File size: 1693 kB
Checksum (SHA-512): d95e87336c25bb36e2decef7e0a488408a372aa23617c3f62e30d91d2d01601e198aa0e745a40ed2b63d85f929c2fa377a9f5e5c8e9b67a209ccc0e2d53b8b12
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Spjuth, Ola
