uu.se Uppsala University Publications
1 - 45 of 45
  • 1.
    Al-Jaff, Mohammed
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Sandström, Eric
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Grabherr, Manfred
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala Univ, Bioinformat Infrastruct Life Sci, S-75123 Uppsala, Sweden.
    microTaboo: a general and practical solution to the k-disjoint problem
    2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, article ID 228. Journal article (Peer-reviewed)
    Abstract [en]

    Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

    Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

    Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
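
The k-disjoint definition in the abstract above can be illustrated with a brute-force sketch. This is not microTaboo's algorithm, which the abstract does not describe; it only restates the definition as code. The function names and the split into one `target` sequence and a list of `others` are assumptions made for illustration.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint_windows(target: str, others: List[str], w: int, k: int) -> List[str]:
    """Return the length-w windows of `target` whose Hamming distance to
    every length-w window of the sequences in `others` is greater than k.

    Brute force, quadratic in the number of windows: a hypothetical
    illustration of the definition, not an efficient solution.
    """
    background = [s[i:i + w] for s in others for i in range(len(s) - w + 1)]
    result = []
    for i in range(len(target) - w + 1):
        window = target[i:i + w]
        if all(hamming(window, b) > k for b in background):
            result.append(window)
    return result


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    print(k_disjoint_windows("ACGTAAAA", ["ACGTCCCC"], w=4, k=1))
```

Raising `k` makes the uniqueness criterion stricter, so fewer windows survive.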

  • 2.
    Alvarsson, Jonathan
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Andersson, Claes
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Larsson, Rolf
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk farmakologi.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Brunn: an open source laboratory information system for microplates with a graphical plate layout design process
    2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, no. 1, article ID 179. Journal article (Peer-reviewed)
    Abstract [en]

    Background:

    Compound profiling and drug screening generate large amounts of data and are generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight, and there is a need for a flexible, lightweight open system for handling plate design, and validation and preparation of data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
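
As a rough illustration of what calculating an IC50 value from a dose-response curve involves, the sketch below interpolates linearly on log10(concentration) between the two tested doses that bracket 50% response. The abstract does not say how Brunn computes IC50, so the method, the function name `ic50_log_interp`, and the input layout are all assumptions.

```python
import math
from typing import Sequence


def ic50_log_interp(concs: Sequence[float], survival: Sequence[float]) -> float:
    """Estimate IC50 by linear interpolation on log10(concentration).

    `concs` must be positive and increasing; `survival` is the response as
    a fraction of the untreated control (1.0 means no effect).
    Hypothetical helper, not Brunn's implementation.
    """
    points = list(zip(concs, survival))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        # Find the first pair of doses between which survival crosses 50%.
        if s_lo >= 0.5 >= s_hi and s_lo != s_hi:
            frac = (s_lo - 0.5) / (s_lo - s_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("survival does not cross 50% within the tested range")


if __name__ == "__main__":
    # Invented example data: concentrations in arbitrary units.
    print(ic50_log_interp([1.0, 100.0], [1.0, 0.0]))
```

A fitted sigmoid model would normally be preferred when replicates are available; interpolation is used here only because it needs no external libraries.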

  • 3.
    Andersson, Claes R.
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Isaksson, Anders
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för genetik och patologi.
    Gustafsson, Mats G.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för genetik och patologi. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Bayesian detection of periodic mRNA time profiles without use of training examples
    2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, pp. 63-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Detection of periodically expressed genes from microarray data without use of known periodic and non-periodic training examples is an important problem, e.g. for identifying genes regulated by the cell cycle in poorly characterised organisms. Commonly the investigator is only interested in genes expressed at a particular frequency that characterizes the process under study, but this frequency is seldom exactly known. Previously proposed detector designs require access to labelled training examples and do not allow systematic incorporation of diffuse prior knowledge available about the period time.

    Results: A learning-free Bayesian detector that does not rely on labelled training examples and allows incorporation of prior knowledge about the period time is introduced. It is shown to outperform two recently proposed alternative learning-free detectors on simulated data generated with models that are different from the one used for detector design. Results from applying the detector to mRNA expression time profiles from S. cerevisiae show that the genes detected as periodically expressed only contain a small fraction of the cell-cycle genes inferred from mutant phenotype. For example, when the probability of false alarm was equal to 7%, only 12% of the cell-cycle genes were detected. The genes detected as periodically expressed were found to have a statistically significant overrepresentation of known cell-cycle regulated sequence motifs. One known sequence motif and 18 putative motifs, previously not associated with periodic expression, were also overrepresented.

    Conclusion: In comparison with recently proposed alternative learning-free detectors for periodic gene expression, Bayesian inference allows systematic incorporation of diffuse a priori knowledge about, e.g., the period time. This results in relative performance improvements due to increased robustness against errors in the underlying assumptions. Results from applying the detector to mRNA expression time profiles from S. cerevisiae include several new findings that deserve further experimental studies.

  • 4.
    Ausmees, Kristiina
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    John, Aji
    Toor, Salman Z.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Hellander, Andreas
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    Nettelblad, Carl
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Avdelningen för beräkningsvetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Institutionen för informationsteknologi, Tillämpad beräkningsvetenskap.
    BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data
    2018. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, pp. 240:1-11, article ID 240. Journal article (Peer-reviewed)
  • 5.
    Barrio, Alvaro Martínez
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Lagercrantz, Erik
    Sperber, Göran O.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för neurovetenskap, Fysiologi.
    Blomberg, Jonas
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk virologi.
    Bongcam-Rudloff, Erik
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Annotation and visualization of endogenous retroviral sequences using the Distributed Annotation System (DAS) and eBioX
    2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, Suppl. 6, pp. S18-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: The Distributed Annotation System (DAS) is a widely used network protocol for sharing biological information. The distributed aspects of the protocol enable the use of various reference and annotation servers for connecting biological sequence data to pertinent annotations in order to depict an integrated view of the data for the final user.

    Results: An annotation server has been devised to provide information about the endogenous retroviruses detected and annotated by a specialized in silico tool called RetroTector. We describe the procedure to implement the DAS 1.5 protocol commands necessary for constructing the DAS annotation server. We use our server to exemplify those steps. Data distribution is kept separate from visualization, which is carried out by eBioX, an easy-to-use open source program incorporating multiple bioinformatics utilities. Some well characterized endogenous retroviruses are shown in two different DAS clients. A rapid analysis of areas free from retroviral insertions could be facilitated by our annotations.

    Conclusion: The DAS protocol has been shown to be advantageous in the distribution of endogenous retrovirus data. The distributed nature of the protocol is also found to aid in combining annotation and visualization along a genome in order to enhance the understanding of ERV contribution to its evolution. Reference and annotation servers are conjointly used by eBioX to provide visualization of ERV annotations as well as other data sources. Our DAS data source can be found in the central public DAS service repository, http://www.dasregistry.org, or at http://loka.bmc.uu.se/das/sources.

  • 6.
    Besnier, Francois
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Carlborg, Örjan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik. Swedish University of Agricultural Sciences, Uppsala, Sweden.
    A general and efficient method for estimating continuous IBD functions for use in genome scans for QTL
    2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, article ID 440. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Identity by descent (IBD) matrix estimation is a central component in mapping of Quantitative Trait Loci (QTL) using variance component models. A large number of algorithms have been developed for estimation of IBD between individuals in populations at discrete locations in the genome for use in genome scans to detect QTL affecting various traits of interest in experimental animal, human and agricultural pedigrees. Here, we propose a new approach to estimate IBD as continuous functions rather than as discrete values.

    Results: Estimation of IBD functions improved the computational efficiency and memory usage in genome scanning for QTL. We have explored two approaches to obtain continuous marker-bracket IBD functions. By re-implementing an existing and fast deterministic IBD-estimation method, we show that this approach results in IBD functions that produce the exact same IBD as the original algorithm, but with a greater than 2-fold improvement of the computational efficiency and a considerably lower memory requirement for storing the resulting genome-wide IBD. By developing a general IBD function approximation algorithm, we show that it is possible to estimate marker-bracket IBD functions from IBD matrices estimated at marker locations by any existing IBD estimation algorithm. The general algorithm provides approximations that lead to QTL variance component estimates that even in worst-case scenarios are very similar to the true values. The approach of storing IBD as polynomial IBD functions was also shown to reduce the amount of memory required in genome scans for QTL.

    Conclusion: In addition to direct improvements in computational and memory efficiency, estimation of IBD functions is a fundamental step needed to develop and implement new efficient optimization algorithms for high precision localization of QTL. Here, we discuss and test two approaches for estimating IBD functions based on existing IBD estimation algorithms. Our approaches provide immediately useful techniques for use in single QTL analyses in the variance component QTL mapping framework. They will, however, be particularly useful in genome scans for multiple interacting QTL, where the improvements in both computational and memory efficiency are the key for successful development of efficient optimization algorithms to allow widespread use of this methodology.

  • 7.
    Bornelöv, Susanne
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Marillet, Simon
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
    2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, pp. 139-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods.

    Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings.

    Conclusions: Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.
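
One simple way to picture a rule network built from IF-THEN rules is as a co-occurrence graph: features are nodes, and an edge is weighted by the number of rules in which two features appear together in the IF part. The sketch below makes that idea concrete. The abstract does not specify how Ciruvis scores connections, so the plain co-occurrence count, the rule representation, and the function name `rule_network` are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Hypothetical rule representation: (set of features in the IF part, decision).
Rule = Tuple[Set[str], str]


def rule_network(rules: Iterable[Rule]) -> Dict[Tuple[str, str], int]:
    """Count, for each pair of features, the rules whose IF part holds both.

    The resulting weighted edge list is a crude stand-in for a rule
    network; pairs with high counts are candidate feature interactions.
    """
    edges: Counter = Counter()
    for conditions, _decision in rules:
        # Sorting gives each pair a canonical (a, b) orientation.
        for a, b in combinations(sorted(conditions), 2):
            edges[(a, b)] += 1
    return dict(edges)


if __name__ == "__main__":
    # Invented toy rules.
    rules = [
        ({"geneA", "geneB"}, "tumor"),
        ({"geneA", "geneB", "geneC"}, "tumor"),
        ({"geneC"}, "normal"),
    ]
    print(rule_network(rules))
```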

  • 8.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse
    2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, pp. 362-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive.

    Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics.

    Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

  • 9.
    Chantzi, Efthymia
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Jarvius, Malin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Enarsson, Maria
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Neuroonkologi.
    Segerman, Anna
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Neuroonkologi. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Gustafsson, Mats G
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    COMBImage2: a parallel computational framework for higher-order drug combination analysis that includes automated plate design, matched filter based object counting and temporal data mining
    2019. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, article ID 304. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Pharmacological treatment of complex diseases using more than two drugs is commonplace in the clinic due to better efficacy, decreased toxicity and reduced risk for developing resistance. However, many of these higher-order treatments have not undergone any detailed preceding in vitro evaluation that could support their therapeutic potential and reveal disease related insights. Despite the increased medical need for discovery and development of higher-order drug combinations, very few reports from systematic large-scale studies along this direction exist. A major reason is lack of computational tools that enable automated design and analysis of exhaustive drug combination experiments, where all possible subsets among a panel of pre-selected drugs have to be evaluated.

    Results: Motivated by this, we developed COMBImage2, a parallel computational framework for higher-order drug combination analysis. COMBImage2 goes far beyond its predecessor COMBImage in many different ways. In particular, it offers automated 384-well plate design, as well as quality control that involves resampling statistics and inter-plate analyses. Moreover, it is equipped with a generic matched filter based object counting method that is currently designed for apoptotic-like cells. Furthermore, apart from higher-order synergy analyses, COMBImage2 introduces a novel data mining approach for identifying interesting temporal response patterns and disentangling higher- from lower- and single-drug effects. COMBImage2 was employed in the context of a small pilot study focused on the CUSP9v4 protocol, which is currently used in the clinic for treatment of recurrent glioblastoma. For the first time, all 246 possible combinations of order 4 or lower of the 9 single drugs constituting the CUSP9v4 cocktail were evaluated on an in vitro clonal culture of glioma initiating cells.

    Conclusions: COMBImage2 is able to automatically design and robustly analyze exhaustive and in general higher-order drug combination experiments. Such a versatile video microscopy oriented framework is likely to enable, guide and accelerate systematic large-scale drug combination studies not only for cancer but also other diseases.
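
The figure of 246 combinations quoted in the Results can be checked by enumeration. It matches the number of subsets of order 2, 3 and 4 drawn from 9 drugs (36 + 84 + 126), that is, with the 9 single drugs counted separately; this reading is inferred from the arithmetic and is not stated in the abstract. The drug labels `D1` to `D9` and the function name are placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def drug_combinations(drugs: Sequence[str], min_order: int = 2,
                      max_order: int = 4) -> List[Tuple[str, ...]]:
    """Enumerate every subset of `drugs` with size min_order..max_order."""
    return [combo
            for order in range(min_order, max_order + 1)
            for combo in combinations(drugs, order)]


if __name__ == "__main__":
    panel = [f"D{i}" for i in range(1, 10)]  # 9 placeholder drug labels
    combos = drug_combinations(panel)
    print(len(combos))  # 36 pairs + 84 triples + 126 quadruples
```

The count grows quickly with panel size and maximum order, which is why automated plate design matters for exhaustive experiments of this kind.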

  • 10.
    Chantzi, Efthymia
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    Jarvius, Malin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin. Uppsala Univ, In Vitro Syst Pharmacol Facil, SciLifeLab Drug Discovery & Dev, Uppsala, Sweden.
    Niklasson, Mia
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Neuroonkologi.
    Segerman, Anna
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Neuroonkologi.
    Gustafsson, Mats G
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Cancerfarmakologi och beräkningsmedicin.
    COMBImage: a modular parallel processing framework for pairwise drug combination analysis that quantifies temporal changes in label-free video microscopy movies
    2018. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, article ID 453. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Large-scale pairwise drug combination analysis has lately gained momentum in drug discovery and development projects, mainly due to the employment of advanced experimental-computational pipelines. This is fortunate as drug combinations are often required for successful treatment of complex diseases. Furthermore, most new drugs cannot totally replace the current standard-of-care medication, but rather have to enter clinical use as add-on treatment. However, there is a clear deficiency of computational tools for label-free and temporal image-based drug combination analysis that go beyond the conventional but relatively uninformative end point measurements.

    Results: COMBImage is a fast, modular and instrument independent computational framework for in vitro pairwise drug combination analysis that quantifies temporal changes in label-free video microscopy movies. Jointly with automated analyses of temporal changes in cell morphology and confluence, it performs and displays conventional cell viability and synergy end point analyses. The image processing algorithms are parallelized using Google's MapReduce programming model and optimized with respect to method-specific tuning parameters. COMBImage is shown to process time-lapse microscopy movies from 384-well plates within minutes on a single quad core personal computer. This framework was employed in the context of an ongoing drug discovery and development project focused on glioblastoma multiforme, the most deadly form of brain cancer. Interesting add-on effects of two investigational cytotoxic compounds when combined with vorinostat were revealed on recently established clonal cultures of glioma-initiating cells from patient tumor samples. Therapeutic synergies, when normal astrocytes were used as a toxicity cell model, reinforced the pharmacological interest regarding their potential clinical use.

    Conclusions: COMBImage enables, for the first time, fast and optimized pairwise drug combination analyses of temporal changes in label-free video microscopy movies. Providing this jointly with conventional cell viability based end point analyses, it could help accelerate and guide any drug discovery and development project, without use of cell labeling and the need to employ a particular live cell imaging instrument.
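
The MapReduce programming model mentioned in the Results can be sketched in a few lines: a map step turns each image frame into a (well, measurement) pair independently of all other frames, and a reduce step groups the pairs by well into a time series. The sketch below runs serially and uses a toy confluence measure (fraction of foreground pixels in a binary mask); the frame layout, function names and measure are assumptions and do not reflect COMBImage's actual code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical frame layout: (well id, time index, flattened binary cell mask).
Frame = Tuple[str, int, List[int]]


def map_frame(frame: Frame) -> Tuple[str, Tuple[int, float]]:
    """Map step: one frame in, one (well, (time, confluence)) pair out.

    Each call is independent of the others, which is what makes the map
    step straightforward to run in parallel.
    """
    well, t, mask = frame
    confluence = sum(mask) / len(mask)
    return well, (t, confluence)


def reduce_by_well(pairs: Iterable[Tuple[str, Tuple[int, float]]]) -> Dict[str, List[float]]:
    """Reduce step: collect values per well and order them by time index."""
    grouped = defaultdict(list)
    for well, value in pairs:
        grouped[well].append(value)
    return {well: [c for _t, c in sorted(values)] for well, values in grouped.items()}


if __name__ == "__main__":
    # Invented toy frames, deliberately out of time order.
    frames = [("A1", 1, [1, 1, 0, 0]), ("A1", 0, [1, 0, 0, 0]), ("B2", 0, [1, 1, 1, 1])]
    print(reduce_by_well(map(map_frame, frames)))
```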

  • 11.
    D'Elia, Domenica
    et al.
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Gisel, Andreas
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Eriksson, Nils-Einar
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Kossida, Sophia
    Bioinformatics & Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece.
    Mattila, Kimmo
    CSC – IT Center for Science Ltd., Keilaranta 14, 02100 Espoo, Finland.
    Klucar, Lubos
    Institute of Molecular Biology, Slovak Academy of Sciences, Dubravska cesta 21, 84551 Bratislava, Slovakia.
    Bongcam-Rudloff, Erik
    Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75024 Uppsala, Sweden.
    The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community
    2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, Suppl. 6, pp. S1-. Journal article (Peer-reviewed)
    Abstract [en]

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in.

  • 12.
    Duforet-Frebourg, Nicolas
    et al.
    Gattepaille, Lucie M.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för ekologi och genetik, Evolutionsbiologi.
    Blum, Michael G. B.
    Jakobsson, Mattias
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för ekologi och genetik, Evolutionsbiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    HaploPOP: a software that improves population assignment by combining markers into haplotypes
    2015. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 16, article ID 242. Journal article (Peer-reviewed)
    Abstract [en]

    Background: In ecology and forensics, some population assignment techniques use molecular markers to assign individuals to known groups. However, assigning individuals to known populations can be difficult if the level of genetic differentiation among populations is small. Most assignment studies handle independent markers, often by pruning markers in Linkage Disequilibrium (LD), ignoring the information contained in the correlation among markers due to LD.

    Results: To improve the accuracy of population assignment, we present an algorithm, implemented in the HaploPOP software, that combines markers into haplotypes, without requiring independence. The algorithm is based on the Gain of Informativeness for Assignment that provides a measure to decide if a pair of markers should be combined into haplotypes, or not, in order to improve assignment. Because complete exploration of all possible solutions for constructing haplotypes is computationally prohibitive, our approach uses a greedy algorithm based on windows of fixed sizes. We evaluate the performance of HaploPOP to assign individuals to populations using a split-validation approach. We investigate both simulated SNP data and dense genotype data from individuals from Spain and Portugal.

    Conclusions: Our results show that constructing haplotypes with HaploPOP can substantially reduce assignment error. The HaploPOP software is freely available as command-line software at www.ieg.uu.se/Jakobsson/software/HaploPOP/.

  • 13.
    Eklund, Martin
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    An eScience-Bayes strategy for analyzing omics data. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 282-. Journal article (Refereed)
    Abstract [en]

    Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology and permit coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

  • 14.
    Eklund, Martin
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    The C1C2: a framework for simultaneous model selection and assessment. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 360-. Journal article (Refereed)
    Abstract [en]

    BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of the data used. Since the number of conceivable models is in general vast, it was also of interest to investigate the use of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, a lower accuracy of the generalization error estimates was observed. CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.

  • 15. Flores, Samuel
    FlexOracle: predicting flexible hinges by identification of stable domains. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 215-. Journal article (Refereed)
  • 16. Flores, Samuel
    Hinge Atlas: relating sequence features to sites of structural flexibility. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 167-. Journal article (Refereed)
  • 17.
    Flores, Samuel
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi.
    Gerstein, Mark
    Yale University.
    Predicting protein ligand binding motions with the Conformation Explorer. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 417-. Journal article (Refereed)
    Abstract [en]

    Background

    Knowledge of the structure of proteins bound to known or potential ligands is crucial for biological understanding and drug design. Often the 3D structure of the protein is available in some conformation, but binding the ligand of interest may involve a large scale conformational change which is difficult to predict with existing methods.

    Results

    We describe how to generate ligand binding conformations of proteins that move by hinge bending, the largest class of motions. First, we predict the location of the hinge between domains. Second, we apply an Euler rotation to one of the domains about the hinge point. Third, we compute a short-time dynamical trajectory using Molecular Dynamics to equilibrate the protein and ligand and correct unnatural atomic positions. Fourth, we score the generated structures using a novel fitness function which favors closed or holo structures. By iterating the second through fourth steps we systematically minimize the fitness function, thus predicting the conformational change required for small ligand binding for five well studied proteins.

    Conclusions

    We demonstrate that the method in most cases successfully predicts the holo conformation given only an apo structure.
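
    The second step described above (rotating one domain about the hinge point) can be illustrated with a minimal sketch. Note the substitution: the abstract speaks of an Euler rotation, whereas this sketch uses a single axis-angle rotation (Rodrigues' formula) about an axis through the hinge. The function name `rotate_about_hinge` and the coordinate layout are hypothetical and not taken from the Conformation Explorer.

```python
import math

def rotate_about_hinge(coords, hinge, axis, angle_rad):
    """Rotate 3D points about an axis passing through `hinge`.

    Illustrative only: axis-angle rotation (Rodrigues' formula), used here
    in place of the Euler rotation mentioned in the abstract.
    """
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    kx, ky, kz = ax / norm, ay / norm, az / norm  # unit rotation axis
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    rotated = []
    for x, y, z in coords:
        # Express the point relative to the hinge.
        vx, vy, vz = x - hinge[0], y - hinge[1], z - hinge[2]
        # Cross product k x v and dot product k . v.
        cx = ky * vz - kz * vy
        cy = kz * vx - kx * vz
        cz = kx * vy - ky * vx
        dot = kx * vx + ky * vy + kz * vz
        # Rodrigues: v*cos + (k x v)*sin + k*(k . v)*(1 - cos)
        rx = vx * c + cx * s + kx * dot * (1 - c)
        ry = vy * c + cy * s + ky * dot * (1 - c)
        rz = vz * c + cz * s + kz * dot * (1 - c)
        rotated.append((rx + hinge[0], ry + hinge[1], rz + hinge[2]))
    return rotated

# Toy "domain" of one atom, rotated 90 degrees about the z-axis through the hinge.
moved = rotate_about_hinge([(2.0, 1.0, 0.0)], hinge=(1.0, 1.0, 0.0),
                           axis=(0.0, 0.0, 1.0), angle_rad=math.pi / 2)
```

    In an iterative search of the kind the abstract outlines, such a rotation would be applied only to the atoms of the moving domain, followed by equilibration and scoring.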

  • 18.
    Freyhult, Eva
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Prusis, Peteris
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Lapinsh, Maris
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Wikberg, Jarl E S
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Moulton, Vincent
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper, Signalbehandling.
    Gustafsson, Mats G
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Tekniska sektionen, Institutionen för teknikvetenskaper.
    Unbiased descriptor and parameter selection confirms the potential of proteochemometric modelling. 2005. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, p. 50-. Journal article (Refereed)
    Abstract [en]

    Background

    Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data without the need for 3D structure information. Several reported proteochemometric models of ligand-receptor interactions have already yielded significant insights into various forms of bio-molecular interactions. The proteochemometric models are multivariate regression models that predict binding affinity for a particular combination of features of the ligand and protein. Although proteochemometric models have already offered interesting results in various studies, no detailed statistical evaluation of their average predictive power has been performed. In particular, variable subset selection performed to date has always relied on using all available examples, a situation also encountered in microarray gene expression data analysis.

    Results

    A methodology for an unbiased evaluation of the predictive power of proteochemometric models was implemented and results from applying it to two of the largest proteochemometric data sets yet reported are presented. A double cross-validation loop procedure is used to estimate the expected performance of a given design method. The unbiased performance estimates (P²) obtained for the data sets that we consider confirm that properly designed single proteochemometric models have useful predictive power, but that a standard design based on cross-validation may yield models with quite limited performance. The results also show that different commercial software packages employed for the design of proteochemometric models may yield very different and therefore misleading performance estimates. In addition, the differences in the models obtained in the double CV loop indicate that detailed chemical interpretation of a single proteochemometric model is uncertain when data sets are small.

    Conclusion

    The double CV loop employed offers unbiased performance estimates of a given proteochemometric modelling procedure, making it possible to identify cases where the proteochemometric design does not result in useful predictive models. Chemical interpretations of single proteochemometric models are uncertain and should instead be based on all the models selected in the double CV loop employed here.
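
    A double (nested) cross-validation loop of the kind described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a toy one-variable penalized regression, all function names are hypothetical, and the performance measure is assumed to be 1 - SS_res/SS_tot computed over the outer-loop predictions.

```python
import random

def k_folds(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pick(values, indices):
    return [values[i] for i in indices]

def fit_slope(xs, ys, lam):
    """Penalized least-squares slope for y ~ a*x (toy model, ridge penalty lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def select_lambda(xs, ys, lambdas, k_inner=3):
    """Inner loop: choose the penalty using only the data passed in."""
    folds = k_folds(len(xs), k_inner, seed=1)
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in range(len(xs)) if i not in held]
            a = fit_slope(pick(xs, train), pick(ys, train), lam)
            err += sum((ys[i] - a * xs[i]) ** 2 for i in fold)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

def double_cv_p2(xs, ys, lambdas, k_outer=5):
    """Outer loop: held-out folds never influence the choice of penalty."""
    preds = [0.0] * len(xs)
    for fold in k_folds(len(xs), k_outer):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        tx, ty = pick(xs, train), pick(ys, train)
        lam = select_lambda(tx, ty, lambdas)   # model design on training part only
        a = fit_slope(tx, ty, lam)
        for i in fold:
            preds[i] = a * xs[i]               # prediction for unseen examples
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot               # assumed definition of the estimate

xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x for x in xs]                     # noise-free toy data
p2 = double_cv_p2(xs, ys, [0.0, 1.0, 10.0])
```

    The point of the design is that parameter selection is repeated inside every outer fold, so the outer test examples play no part in any design decision.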

  • 19.
    Hooper, Sean D
    et al.
    The Breakthrough Breast Cancer Research Centre, The Institute of Cancer Research.
    Jiao, Xiang
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi.
    Rosenlund, Magnus
    Tellgren-Roth, Christian
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi.
    Cavelier, Lucia
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Medicinsk genetik.
    Sjöblom, Tobias
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Genomik.
    Interpreting translocations detected by paired-end sequencing of cancer samples. 2012. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105. Journal article (Refereed)
  • 20.
    Kavakiotis, Ioannis
    et al.
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece..
    Xochelli, Aliki
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Experimentell och klinisk onkologi. CERTH, Inst Appl Biosci, Thessaloniki, Greece..
    Agathangelidis, Andreas
    Ist Sci San Raffaele, Div Mol Oncol, Milan, Italy.;Ist Sci San Raffaele, Dept Oncohematol, Milan, Italy..
    Tsoumakas, Grigorios
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece..
    Maglaveras, Nicos
    CERTH, Inst Appl Biosci, Thessaloniki, Greece.;Aristotle Univ Thessaloniki, Sch Med, Lab Comp & Med Informat, Thessaloniki, Greece..
    Stamatopoulos, Kostas
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Experimentell och klinisk onkologi. CERTH, Inst Appl Biosci, Thessaloniki, Greece..
    Hadzidimitriou, Anastasia
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Experimentell och klinisk onkologi. CERTH, Inst Appl Biosci, Thessaloniki, Greece..
    Vlahavas, Ioannis
    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki, Greece..
    Chouvarda, Ioanna
    CERTH, Inst Appl Biosci, Thessaloniki, Greece.;Aristotle Univ Thessaloniki, Sch Med, Lab Comp & Med Informat, Thessaloniki, Greece..
    Integrating multiple immunogenetic data sources for feature extraction and mining somatic hypermutation patterns: the case of "towards analysis" in chronic lymphocytic leukaemia. 2016. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, article id 173. Journal article (Refereed)
    Abstract [en]

    Background: Somatic Hypermutation (SHM) refers to the introduction of mutations within rearranged V(D)J genes, a process that increases the diversity of Immunoglobulins (IGs). The analysis of SHM has offered critical insight into the physiology and pathology of B cells, leading to strong prognostication markers for clinical outcome in chronic lymphocytic leukaemia (CLL), the most frequent adult B-cell malignancy. In this paper we present a methodology for integrating multiple immunogenetic and clinicobiological data sources in order to extract features and create high-quality datasets for SHM analysis in IG receptors of CLL patients. This dataset is used as the basis for a higher-level integration procedure, inspired by social choice theory. This is applied in the Towards Analysis, our attempt to investigate the potential ontogenetic transformation of genes belonging to specific stereotyped CLL subsets towards other genes or gene families, through SHM. Results: The data integration process, followed by feature extraction, resulted in the generation of a dataset containing information about mutations occurring through SHM. The Towards analysis, performed on the integrated dataset by applying voting techniques, revealed the distinct behaviour of subset #201 compared to other subsets, as regards SHM-related movements among gene clans, both in allele-conserved and non-conserved gene areas. With respect to movement between genes, a high percentage of movement towards pseudogenes was found in all CLL subsets. Conclusions: This data integration and feature extraction process can set the basis for exploratory analysis or a fully automated computational data mining approach on many as yet unanswered, clinically relevant biological questions.

  • 21.
    Kruczyk, Marcin
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi.
    Umer, Husen Muhammad
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi.
    Enroth, Stefan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Genomik.
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräknings- och systembiologi.
    Peak Finder Metaserver - a novel application for finding peaks in ChIP-seq data. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, p. 280-. Journal article (Refereed)
    Abstract [en]

    Background: Finding peaks in ChIP-seq is an important process in biological inference. In some cases, such as positioning nucleosomes with specific histone modifications or finding transcription factor binding specificities, the precision of the detected peak plays a significant role. There are several applications for finding peaks (called peak finders) based on different algorithms (e.g. MACS, Erange and HPeak). Benchmark studies have shown that the existing peak finders identify different peaks for the same dataset and it is not known which one is the most accurate. We present the first meta-server called Peak Finder MetaServer (PFMS) that collects results from several peak finders and produces consensus peaks. Our application accepts three standard ChIP-seq data formats: BED, BAM, and SAM. Results: Sensitivity and specificity of seven widely used peak finders were examined. For the experiments we used three previously studied Transcription Factors (TF) ChIP-seq datasets and identified three of the selected peak finders that returned results with high specificity and very good sensitivity compared to the remaining four. We also ran PFMS using the three selected peak finders on the same TF datasets and achieved higher specificity and sensitivity than the peak finders individually. Conclusions: We show that combining outputs from up to seven peak finders yields better results than individual peak finders. In addition, three of the seven peak finders outperform the remaining four, and running PFMS with these three returns even more accurate results. Another added value of PFMS is a separate report of the peaks returned by each of the included peak finders.
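
    The abstract does not state how consensus peaks are derived from the individual peak finders. As one plausible illustration only, the sketch below keeps every genomic region that is covered by at least `min_support` peak finders, using a sweep over interval endpoints. The function name, the half-open `(start, end)` representation on a single chromosome, and the voting rule are all assumptions and should not be read as the PFMS algorithm.

```python
def consensus_peaks(peak_sets, min_support):
    """Return regions covered by at least `min_support` of the peak sets.

    Assumptions: each peak is a half-open (start, end) interval on one
    chromosome, and peaks within a single peak set do not overlap.
    """
    events = []
    for peaks in peak_sets:
        for start, end in peaks:
            events.append((start, 1))    # a peak opens
            events.append((end, -1))     # a peak closes
    # Tuples sort closings (-1) before openings (+1) at equal positions,
    # so abutting intervals are not counted as overlapping.
    events.sort()

    consensus = []
    depth = 0
    region_start = None
    for pos, delta in events:
        depth += delta
        if region_start is None and depth >= min_support:
            region_start = pos
        elif region_start is not None and depth < min_support:
            consensus.append((region_start, pos))
            region_start = None
    return consensus

# Toy peak calls from three hypothetical peak finders.
finder_a = [(100, 200), (500, 600)]
finder_b = [(150, 250)]
finder_c = [(180, 220), (550, 650)]
majority = consensus_peaks([finder_a, finder_b, finder_c], min_support=2)
unanimous = consensus_peaks([finder_a, finder_b, finder_c], min_support=3)
```

    Raising `min_support` trades sensitivity for specificity, which is the kind of trade-off the benchmark in the abstract examines.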

  • 22.
    Kuhn, Thomas
    et al.
    Fachhochschule Gelsenkirchen.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Zielesny, Achim
    Fachhochschule Gelsenkirchen.
    Steinbeck, Christoph
    European Bioinformatics Institute, Cambridge, UK.
    CDK-Taverna: an open workflow environment for cheminformatics. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 159-. Journal article (Refereed)
    Abstract [en]

    Background

    Small molecules are of increasing interest for bioinformatics in areas such as metabolomics and drug discovery. The recent release of large open access chemistry databases generates a demand for flexible tools to process them and discover new knowledge. To freely support open science based on these data resources, it is desirable for the processing tools to be open-source and available for everyone.

    Results

    Here we describe a novel combination of the workflow engine Taverna and the cheminformatics library Chemistry Development Kit (CDK) resulting in an open source workflow solution for cheminformatics. We have implemented more than 160 different workers to handle specific cheminformatics tasks. We describe the applications of CDK-Taverna in various usage scenarios.

    Conclusions

    The combination of the workflow engine Taverna and the Chemistry Development Kit provides the first open source cheminformatics workflow solution for the biosciences. With the Taverna-community working towards a more powerful workflow engine and a more user-friendly user interface, CDK-Taverna has the potential to become a free alternative to existing proprietary workflow tools.

  • 23.
    Kultima, Kim
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Scholz, Birger
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alm, Henrik
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Sköld, Karl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Svensson, Marcus
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Crossman, Alan
    Bezard, Erwan
    Andrén, Per E.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Lönnstedt, Ingrid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap. Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Matematisk-datavetenskapliga sektionen, Matematiska institutionen.
    Normalization and expression changes in predefined sets of proteins using 2D gel electrophoresis: A proteomic study of L-DOPA induced dyskinesia in an animal model of Parkinson's disease using DIGE. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 475-. Journal article (Refereed)
    Abstract [en]

    Background: Two-Dimensional Difference In Gel Electrophoresis (2D-DIGE) is a powerful tool for measuring differences in protein expression between samples or conditions. However, to remove systematic variability within and between gels the data has to be normalized.

    In this study we examined the ability of four existing and four novel normalization methods to remove systematic bias in data produced with 2D-DIGE. We also propose a modification of an existing method where the statistical framework determines whether a set of proteins shows an association with the predefined phenotypes of interest. This method was applied to our data generated from a monkey model (Macaca fascicularis) of Parkinson's disease.

    Results: Using 2D-DIGE we analysed the protein content of the striatum from 6 control and 21 MPTP-treated monkeys, with or without de novo or long-term L-DOPA administration.

    There was an intensity and spatial bias in the data of all the gels examined in this study. Only two of the eight normalization methods evaluated ('2D loess+scale' and 'SC-2D+quantile') successfully removed both the intensity and spatial bias. In 'SC-2D+quantile' we extended the commonly used loess normalization method against dye bias in two-channel microarray systems to suit systems with three or more channels. Further, by using the proposed method, Differential Expression in Predefined Proteins Sets (DEPPS), several sets of proteins associated with the priming effects of L-DOPA in the striatum in parkinsonian animals were identified. Three of these sets are proteins involved in energy metabolism and one set involved proteins which are part of the microtubule cytoskeleton.

    Conclusion: Comparison of the different methods leads to a series of methodological recommendations for the normalization and the analysis of data, depending on the experimental design. Due to the nature of 2D-DIGE data we recommend that the p-values obtained in significance tests should be used as rankings only. Individual proteins may be interesting as such, but by studying sets of proteins the interpretation of the results is probably more accurate and biologically informative.

  • 24.
    Lapins, Maris
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Prusis, Peteris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Proteochemometric modeling of HIV protease susceptibility. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 181-. Journal article (Refereed)
    Abstract [en]

    BACKGROUND

    A major obstacle in treatment of HIV is the ability of the virus to mutate rapidly into drug-resistant variants. A method for predicting the susceptibility of mutated HIV strains to antiviral agents would provide substantial clinical benefit as well as facilitate the development of new candidate drugs. Therefore, we used proteochemometrics to model the susceptibility of HIV to protease inhibitors in current use, utilizing descriptions of the physico-chemical properties of mutated HIV proteases and 3D structural property descriptions for the protease inhibitors. The descriptions were correlated to the susceptibility data of 828 unique HIV protease variants for seven protease inhibitors in current use; the data set comprised 4792 protease-inhibitor combinations.

    RESULTS

    The model provided excellent predictability (R² = 0.92, Q² = 0.87) and identified general and specific features of drug resistance. The model's predictive ability was verified by external prediction in which the susceptibilities to each one of the seven inhibitors were omitted from the data set, one inhibitor at a time, and the data for the six remaining compounds were used to create new models. This analysis showed that the overall predictive ability for the omitted inhibitors was Q²_inhibitors = 0.72.

    CONCLUSION

    Our results show that a proteochemometric approach can provide generalized susceptibility predictions for new inhibitors. Our proteochemometric model can directly analyze inhibitor-protease interactions and facilitate treatment selection based on viral genotype. The model is available for public use, and is located at HIV Drug Research Centre.
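
    The external prediction scheme described in the results (omit all data for one inhibitor, build a model on the rest, predict the omitted inhibitor) is a leave-one-group-out split. The sketch below shows only the splitting and scoring logic; the function names are hypothetical, and the Q² formula used (1 - PRESS/SS_tot around the mean of the observed values) is one common convention, assumed here rather than taken from the paper.

```python
from collections import defaultdict

def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group.

    `groups` holds one label per observation, e.g. the inhibitor of each
    protease-inhibitor combination.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for i, other in enumerate(groups) if other != g]
        yield g, train_idx, test_idx

def q2(observed, predicted):
    """Cross-validated Q2 = 1 - PRESS / SS_tot (assumed convention)."""
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - press / ss_tot

# Toy labels: four observations involving three hypothetical inhibitors.
splits = list(leave_one_group_out(["A", "B", "A", "C"]))
```

    Each training set contains no observation of the held-out inhibitor, so the resulting score reflects prediction for a compound the model has never seen.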

  • 25.
    Lapins, Maris
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl E. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques. 2010. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 11, p. 339-. Journal article (Refereed)
    Abstract [en]

    Background

    Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treatment of these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents that show better efficacy and lower toxicity.

    Results

    We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) to the respective combination's interaction dissociation constant (K_d). We compared six approaches for description of protein kinases and several linear and non-linear correlation methods. The best performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged over P² = 0.67-0.73, and for new kinases over P²_kin = 0.65-0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curves ranging over AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Even when using only 10% of all data, a valid model was still obtained, with P² = 0.47, P²_kin = 0.42 and AUC = 0.83.

    Conclusions

    Our results strongly support the applicability of proteochemometrics for kinome-wide interaction modelling. Proteochemometrics might be used to speed-up identification and optimization of protein kinase targeted and multi-targeted inhibitors.
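
    The area under the ROC curve quoted in the results measures how well predicted scores separate interacting from non-interacting pairs. As a minimal sketch, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and the toy labels and scores below are illustrative and unrelated to the kinase data set.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1) is
    scored above a randomly chosen negative (label 0); ties count 0.5.
    Quadratic in the number of examples, which is fine for a small sketch.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: two interacting (1) and two non-interacting (0) pairs.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values in context.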

  • 26. Masseroli, Marco
    et al.
    Mons, Barend
    Bongcam-Rudloff, Erik
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi.
    Ceri, Stefano
    Kel, Alexander
    Rechenmann, Francois
    Lisacek, Frederique
    Romano, Paolo
    Integrated Bio-Search: challenges and trends for the integration, search and comprehensive processing of biological information. 2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, no. S1, p. S2-. Article, review/survey (Refereed)
    Abstract [en]

    Many efforts exist to design and implement approaches and tools for data capture, integration and analysis in the life sciences. Challenges are not only the heterogeneity, size and distribution of information sources, but also the danger of producing too many solutions for the same problem. Methodological, technological, infrastructural and social aspects appear to be essential for the development of a new generation of best practices and tools. In this paper, we analyse and discuss these aspects from different perspectives, by extending some of the ideas that arose during the NETTAB 2012 Workshop, making reference especially to the European context. First, the relevance of using data and software models for the management and analysis of biological data is stressed. Second, some of the most relevant community achievements of recent years, which should be taken as a starting point for future efforts in this research domain, are presented. Third, some of the main outstanding issues, challenges and trends are analysed. The challenges related to the tendency to fund and create large scale international research infrastructures and public-private partnerships in order to address the complex challenges of data intensive science are especially discussed. The needs and opportunities of Genomic Computing (the integration, search and display of genomic information at a very specific level, e.g. at the level of a single DNA region) are then considered. In the current data and network-driven era, social aspects can become crucial bottlenecks. How these may best be tackled to unleash the technical abilities for effective data integration and validation efforts is then discussed. In particular, the apparent lack of incentives for already overwhelmed researchers appears to be a limitation for sharing information and knowledge with other scientists. We also point out how the bioinformatics market is growing at an unprecedented speed due to the impact that new powerful in silico analysis promises to have on better diagnosis, prognosis, drug discovery and treatment, towards personalized medicine. An open business model for bioinformatics, which appears to be able to reduce undue duplication of efforts and support the increased reuse of valuable data sets, tools and platforms, is finally discussed.

  • 27. Oja, Merja
    et al.
    Peltonen, Jaakko
    Blomberg, Jonas
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper.
    Kaski, Samuel
    Methods for estimating human endogenous retrovirus activities from EST databases (2007). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. S11-. Article in journal (Refereed)
    Abstract [en]

    Background: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. Results: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of the individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of the HERV activities were previously unknown. Conclusion: (i) Our methods estimate activity accurately based on experiments on simulated data. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly into HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.
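
    The mixture idea in the abstract above can be illustrated with a small sketch. It is not the authors' model: it assumes that a likelihood of each EST under each HERV sequence is already available (in the paper these would come from Hidden Markov Models) and estimates the relative activities as mixture proportions by expectation-maximisation. The function name and the toy numbers are hypothetical.

```python
def estimate_activities(likelihoods, n_iter=200):
    """Estimate mixture proportions (relative activities) by EM.

    likelihoods[i][k] is an assumed, precomputed likelihood of EST i
    under HERV k; it stands in for an HMM-based likelihood.
    """
    n_herv = len(likelihoods[0])
    pi = [1.0 / n_herv] * n_herv  # start from uniform activities
    for _ in range(n_iter):
        counts = [0.0] * n_herv
        for row in likelihoods:
            # E-step: responsibility of each HERV for this EST
            weighted = [p * lik for p, lik in zip(pi, row)]
            total = sum(weighted)
            for k, w in enumerate(weighted):
                counts[k] += w / total
        # M-step: activities are the normalised expected counts
        pi = [c / len(likelihoods) for c in counts]
    return pi


# Toy data (hypothetical): three ESTs fit HERV 0 best, one fits HERV 1.
toy = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.1, 0.9]]
activities = estimate_activities(toy)
```

    The returned proportions sum to one, so they are relative rather than absolute expression levels, which matches the "relative activities" wording of the abstract.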

  • 28. Pemberton, Trevor J.
    et al.
    Sandefur, Conner I.
    Jakobsson, Mattias
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för evolution, genomik och systematik, Evolutionsbiologi.
    Rosenberg, Noah A.
    Sequence determinants of human microsatellite variability (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. e612-. Article, review/survey (Refereed)
    Abstract [en]

    Background

    Microsatellite loci are frequently used in genomic studies of DNA sequence repeats and in population studies of genetic variability. To investigate the effect of sequence properties of microsatellites on their level of variability we have analyzed genotypes at 627 microsatellite loci in 1,048 worldwide individuals from the HGDP-CEPH cell line panel together with the DNA sequences of these microsatellites in the human RefSeq database.

    Results

    Calibrating PCR fragment lengths in individual genotypes by using the RefSeq sequence enabled us to infer repeat number in the HGDP-CEPH dataset and to calculate the mean number of repeats (as opposed to the mean PCR fragment length), under the assumption that differences in PCR fragment length reflect differences in the numbers of repeats in the embedded repeat sequences. We find the mean and maximum numbers of repeats across individuals to be positively correlated with heterozygosity. The size and composition of the repeat unit of a microsatellite are also important factors in predicting heterozygosity, with tetra-nucleotide repeat units high in G/C content leading to higher heterozygosity. Finally, we find that microsatellites containing more separate sets of repeated motifs generally have higher heterozygosity.

    Conclusions

    These results suggest that sequence properties of microsatellites have a significant impact in determining the features of human microsatellite variability.
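
    The calibration step described under Results can be sketched as simple arithmetic. The sketch assumes, as the abstract does, that differences in PCR fragment length reflect only differences in the number of repeats; the function name and the example lengths are hypothetical and not taken from the paper.

```python
def infer_repeat_number(fragment_len, ref_fragment_len, ref_repeats, unit_size):
    """Infer the repeat count of an allele from its PCR fragment length.

    ref_fragment_len and ref_repeats describe the reference sequence;
    unit_size is the repeat unit length in base pairs.
    """
    diff = fragment_len - ref_fragment_len
    if diff % unit_size != 0:
        # The length difference is not a whole number of repeat units,
        # so the stated assumption does not hold for this allele.
        raise ValueError("fragment length is not a whole number of repeats")
    return ref_repeats + diff // unit_size


# Hypothetical tetranucleotide locus: the reference fragment is 200 bp
# long and contains 12 repeats.
alleles = [192, 208]
repeats = [infer_repeat_number(a, 200, 12, 4) for a in alleles]
mean_repeats = sum(repeats) / len(repeats)
```

    Working with repeat counts instead of raw fragment lengths makes loci with different flanking-sequence lengths comparable.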

  • 29.
    Polychronidou, Eleftheria
    et al.
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Kalamaras, Ilias
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Agathangelidis, Andreas
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Sutton, Lesley Ann
    Uppsala universitet, Science for Life Laboratory, SciLifeLab. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Experimentell och klinisk onkologi. Tech Univ Denmark, Dept Immunol, Copenhagen, Denmark.
    Yan, Xiao-Jie
    Feinstein Inst Med Res, Karches Ctr Chron Lymphocyt Leukemia Res, Manhasset, NY USA.
    Bikos, Vasilis
    Masaryk Univ, Cent European Inst Technol, Brno, Czech Republic.
    Vardi, Anna
    G Papanikolaou Hosp, Hematol Dept, Thessaloniki, Greece;G Papanikolaou Hosp, HCT Unit, Thessaloniki, Greece.
    Mochament, Konstantinos
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Chiorazzi, Nicholas
    Feinstein Inst Med Res, Karches Ctr Chron Lymphocyt Leukemia Res, Manhasset, NY USA.
    Belessi, Chrysoula
    Nikea Gen Hosp, Dept Hematol, Piraeus, Greece.
    Rosenquist, Richard
    Uppsala universitet, Science for Life Laboratory, SciLifeLab. Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för immunologi, genetik och patologi, Experimentell och klinisk onkologi. Tech Univ Denmark, Dept Immunol, Copenhagen, Denmark.
    Ghia, Paolo
    IRCCS San Raffaele Sci Inst, Milan, Italy;Univ Milan, VitaSalute, San Raffaele, Div Expt Oncol, Milan, Italy.
    Stamatopoulos, Kostas
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Vlamos, Panayiotis
    Ionian Univ, Dept Informat, Corfu, Greece.
    Chailyan, Anna
    Carlsberg Res Lab, Copenhagen, Denmark.
    Overby, Nanna
    Tech Univ Denmark, Ctr Biol Sequence Anal, Copenhagen, Denmark.
    Marcatili, Paolo
    Tech Univ Denmark, Ctr Biol Sequence Anal, Copenhagen, Denmark.
    Hatzidimitriou, Anastasia
    Ctr Res & Technol Hellas, Inst Appl Biosci, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Tzovaras, Dimitrios
    Ctr Res & Technol Hellas, Informat Technol Inst, 6th Km Harilaou Thermi Rd, Thessaloniki, Greece.
    Automated shape-based clustering of 3D immunoglobulin protein structures in chronic lymphocytic leukemia (2018). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 19, article id 414. Article in journal (Refereed)
    Abstract [en]

    Background: Although the etiology of chronic lymphocytic leukemia (CLL), the most common type of adult leukemia, is still unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Primary and 3D structure analysis has been utilised in order to discover indications of antigenic pressure. The latter has been mostly based on the 3D models of the clonotypic B cell receptor immunoglobulin (BcR IG) amino acid sequences. Therefore, their accuracy is directly dependent on the quality of the model construction algorithms and the specific methods used to compare the ensuing models. Thus far, reliable and robust methods that can group the IG 3D models based on their structural characteristics are missing. Results: Here we propose a novel method for clustering a set of proteins based on their 3D structure focusing on 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from the areas of bioinformatics, 3D object recognition and machine learning. The clustering procedure is based on the extraction of 3D descriptors, encoding various properties of the local and global geometrical structure of the proteins. The descriptors are extracted from aligned pairs of proteins. A combination of individual 3D descriptors is also used as an additional method. The comparison of the automatically generated clusters to manual annotation by experts shows an increased accuracy when using the 3D descriptors compared to plain bioinformatics-based comparison. The accuracy is increased even more when using the combination of 3D descriptors. Conclusions: The experimental results verify that the use of 3D descriptors commonly used for 3D object recognition can be effectively applied to distinguishing structural differences of proteins. 
The proposed approach can be applied to provide hints for the existence of structural groups in a large set of unannotated BcR IG protein files in both CLL and, by logical extension, other contexts where it is relevant to characterize BcR IG structural similarity. The method does not present any limitations in application and can be extended to other types of proteins.

  • 30.
    Prusis, Peteris
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Uhlén, Staffan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Petrovska, Ramona
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Lapinsh, Maris
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Prediction of indirect interactions in proteins (2006). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 167-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Both direct and indirect interactions determine molecular recognition of ligands by proteins. Indirect interactions can be defined as effects on recognition controlled from distant sites in the proteins, e.g. by changes in protein conformation and mobility, whereas direct interactions occur in close proximity of the protein's amino acids and the ligand. Molecular recognition is traditionally studied using three-dimensional methods, but with such techniques it is difficult to predict the effects caused by mutational changes of amino acids located far away from the ligand-binding site. We recently developed an approach, proteochemometrics, to the study of molecular recognition that models the chemical effects involved in the recognition of ligands by proteins using statistical sampling and mathematical modelling. RESULTS: A proteochemometric model was built, based on the interaction of a statistically designed protein library (melanocortin receptors) with three peptides, and used to predict which amino acids and sequence fragments are involved in direct and indirect ligand interactions. The model predictions were confirmed by directed mutagenesis. The predicted presumed direct interactions were in good agreement with previous three-dimensional studies of ligand recognition. In addition, the model could correctly predict the location of indirect effects on ligand recognition arising from distant sites in the receptors, something that three-dimensional modelling could not provide. CONCLUSION: We demonstrate experimentally that proteochemometric modelling can be used with high accuracy to predict the site of origin of direct and indirect effects on ligand recognition by proteins.

  • 31. Rantalainen, Mattias
    et al.
    Cloarec, Olivier
    Ebbels, Timothy M. D.
    Lundstedt, Torbjörn
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för läkemedelskemi, Avdelningen för organisk farmaceutisk kemi.
    Nicholson, Jeremy K.
    Holmes, Elaine
    Trygg, Johan
    Piecewise multivariate modelling of sequential metabolic profiling data (2008). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 105-. Article in journal (Refereed)
    Abstract [en]

    Background: Modelling the time-related behaviour of biological systems is essential for understanding their dynamic responses to perturbations. In metabolic profiling studies, the sampling rate and number of sampling points are often restricted due to experimental and biological constraints. Results: A supervised multivariate modelling approach with the objective to model the time-related variation in the data for short and sparsely sampled time-series is described. A set of piecewise Orthogonal Projections to Latent Structures (OPLS) models are estimated, describing changes between successive time points. The individual OPLS models are linear, but the piecewise combination of several models accommodates modelling and prediction of changes which are non-linear with respect to the time course. We demonstrate the method on both simulated and metabolic profiling data, illustrating how time related changes are successfully modelled and predicted. Conclusion: The proposed method is effective for modelling and prediction of short and multivariate time series data. A key advantage of the method is model transparency, allowing easy interpretation of time-related variation in the data. The method provides a competitive complement to commonly applied multivariate methods such as OPLS and Principal Component Analysis (PCA) for modelling and analysis of short time-series data.

  • 32. Rögnvaldsson, Thorsteinn
    et al.
    Etchells, A
    You, Liwen
    Garwicz, Daniel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper.
    Jarman, Ian
    Lisboa, Paulo J. G.
    How to find simple and accurate rules for viral protease cleavage specificities (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 149-. Article in journal (Refereed)
    Abstract [en]

    Background: Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results: A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion: A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data and are also more transparent to expert users. 
The approach yields rules that are easy to use and useful for interpreting experimental data.

  • 33. Sennblad, Bengt
    et al.
    Schreil, Eva
    Berglund Sonnhammer, Ann-Charlotte
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Lagergren, Jens
    Arvestad, Lars
    primetv: a viewer for reconciled trees (2007). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 148-. Article in journal (Refereed)
    Abstract [en]

    Background: Evolutionary processes, such as gene family evolution or parasite-host cospeciation, can often be viewed as a tree evolving inside another tree. Relating two given trees under such a constraint is known as reconciling them. Adequate software tools for generating illustrations of tree reconciliations are instrumental for presenting and communicating results and ideas regarding these phenomena. Available visualization tools have been limited to illustrations of the most parsimonious reconciliation. However, there exists a plethora of biologically relevant non-parsimonious reconciliations. Illustrations of these general reconciliations may not be achieved without manual editing. Results: We have developed a new reconciliation viewer, primetv. It is a simple and compact visualization program that is the first automatic tool for illustrating general tree reconciliations. It reads reconciled trees in an extended Newick format and outputs them as tree-within-tree illustrations in a range of graphic formats. Output attributes, such as colors and layout, can easily be adjusted by the user. To enhance the construction of input to primetv, two helper programs, readReconciliation and reconcile, accompany primetv. Detailed examples of all programs' usage are provided in the text. For the casual user a web-service provides a simple user interface to all programs. Conclusion: With primetv, the first visualization tool for general reconciliations, illustrations of trees-within-trees are easy to produce. Because it clarifies and accentuates an underlying structure in a reconciled tree, e.g., the impact of a species tree on a gene-family phylogeny, it will enhance scientific presentations as well as pedagogic illustrations in an educational setting. primetv is available at http://prime.sbc.su.se/primetv, both as a standalone command-line tool and as a web service. The software is distributed under the GNU General Public License.

  • 34.
    Sperber, Göran
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för neurovetenskap, Fysiologi.
    Lövgren, Anders
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Eriksson, Nils-Einar
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Benachenhou, Farid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk virologi.
    Blomberg, Jonas
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper, Klinisk virologi.
    RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10 Suppl 6, p. S4-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several hundred million years. A better understanding of the structure and function of these sequences can have profound biological and medical consequences. METHODS: RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. RESULT: ROL (http://www.fysiologi.neuro.uu.se/jbgs/) was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences (5 to 10,000 kilobases) by GenBank accession number, file or FASTA cut-and-paste. Up to ten submissions can be made simultaneously, allowing batch analysis of ≤ 100 megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a full analysis of a 1 megabase sequence is completed in 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity. DISCUSSION: Proviral sequences can be hard to recognize, especially if the integration occurred many millions of years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. CONCLUSION: ROL provides (1) annotation and presentation of known retroviral sequences, and (2) detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission.

  • 35.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Alvarsson, Jonathan
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Berg, Arvid
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Kuhn, Stefan
    European Bioinformatics Institute, Hinxton, UK.
    Mäsak, Carl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Torrance, Gilleain
    European Bioinformatics Institute, Hinxton, UK.
    Wagener, Johannes
    Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Steinbeck, Christoph
    European Bioinformatics Institute, Hinxton, UK.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Bioclipse 2: A scriptable integration platform for the life sciences (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 397-. Article in journal (Refereed)
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. That Bioclipse 2 is based on an advanced and widely used service platform ensures wide extensibility, and new algorithms, visualizations as well as scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

  • 36.
    Spjuth, Ola
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Helmus, Tobias
    Willighagen, Egon L
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Kuhn, Stefan
    Eklund, Martin
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Wagener, Johannes
    Murray-Rust, Peter
    Steinbeck, Christoph
    Wikberg, Jarl E S
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap, Avdelningen för farmaceutisk farmakologi.
    Bioclipse: an open source workbench for chemo- and bioinformatics (2007). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 59-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

  • 37.
    Stenberg, Johan
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för genetik och patologi.
    Nilsson, Mats
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för genetik och patologi.
    Landegren, Ulf
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för genetik och patologi.
    ProbeMaker: an extensible framework for design of sets of oligonucleotide probes (2005). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, p. 229-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Procedures for genetic analyses based on oligonucleotide probes are powerful tools that can allow highly parallel investigations of genetic material. Such procedures require the design of large sets of probes using application-specific design constraints. RESULTS: ProbeMaker is a software framework for computer-assisted design and analysis of sets of oligonucleotide probe sequences. The tool assists in the design of probes for sets of target sequences, incorporating sequence motifs for purposes such as amplification, visualization, or identification. An extension system allows the framework to be equipped with application-specific components for evaluation of probe sequences, and provides the possibility to include support for importing sequence data from a variety of file formats. CONCLUSION: ProbeMaker is a suitable tool for many different oligonucleotide design and analysis tasks, including the design of probe sets for various types of parallel genetic analyses, experimental validation of design parameters, and in silico testing of probe sequence evaluation algorithms.

  • 38.
    Strömbergsson, Helena
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Kleywegt, Gerard J
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi.
    A chemogenomics view on protein-ligand spaces (2009). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, no. Suppl.6, p. S13-. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Chemogenomics is an emerging inter-disciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces. RESULTS: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the approved drug-drug target space. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0, 1 and 2 dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets. 
    CONCLUSION: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
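
The abstract above mentions a nearest neighbour method, computed on principal components, as a measure of overlap between two datasets. The following is a minimal sketch of one way such a measure could look, not the authors' procedure: it assumes each dataset is already a list of principal component scores, and it assumes that "overlap" means the fraction of points whose nearest neighbour belongs to the other dataset. The function name `nearest_neighbour_overlap` is hypothetical.

```python
import math


def nearest_neighbour_overlap(a, b):
    """Fraction of points whose nearest neighbour lies in the other dataset.

    a, b: lists of equal-length coordinate tuples (e.g. PCA scores).
    This overlap definition is an illustrative assumption.
    """
    labelled = [(p, 0) for p in a] + [(p, 1) for p in b]
    if len(labelled) < 2:
        raise ValueError("need at least two points in total")
    crossing = 0
    for i, (p, label) in enumerate(labelled):
        best_dist, best_label = math.inf, None
        for j, (q, other_label) in enumerate(labelled):
            if i == j:
                continue  # a point is not its own neighbour
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_dist, best_label = d, other_label
        if best_label != label:
            crossing += 1
    return crossing / len(labelled)
```

Under this definition a value near 0 would indicate two well-separated datasets, and a value near 1 would indicate that the points are thoroughly interleaved in the projected space.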

  • 39.
    Torabi Moghadam, Behrooz
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräkningsbiologi och bioinformatik.
    Dabrowski, Michal
    Polish Acad Sci, Nencki Inst Expt Biol, Lab Bioinformat, Neurobiol Ctr, Warsaw, Poland..
    Kaminska, Bozena
    Polish Acad Sci, Nencki Inst Expt Biol, Lab Bioinformat, Lab Mol Neurobiol, Warsaw, Poland..
    Grabherr, Manfred G.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Beräkningsbiologi och bioinformatik. Polish Acad Sci, Inst Comp Sci, PL-01248 Warsaw, Poland..
    Combinatorial identification of DNA methylation patterns over age in the human brain. 2016. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, article ID 393. Journal article (Peer-reviewed)
    Abstract [en]

    Background: DNA methylation plays a key role in developmental processes, which is reflected in changing methylation patterns at specific CpG sites over the lifetime of an individual. The underlying mechanisms are complex and possibly affect multiple genes or entire pathways. Results: We applied a multivariate approach to identify combinations of CpG sites that undergo modifications when transitioning between developmental stages. Monte Carlo feature selection produced a list of ranked and statistically significant CpG sites, while rule-based models allowed for identifying particular methylation changes in these sites. Our rule-based classifier reports combinations of CpG sites, together with changes in their methylation status in the form of easy-to-read IF-THEN rules, which allows for identification of the genes associated with the underlying sites. Conclusion: We utilized machine learning and statistical methods to discretize decision class (age) values to get a general pattern of methylation changes over the lifespan. The CpG sites present in the significant rules were annotated to genes involved in brain formation, general development, as well as genes linked to cancer and Alzheimer's disease.

  • 40.
    Wagener, Johannes
    et al.
    Max von Pettenkofer-Institut, Ludwig-Maximilians-Universität, Munich, Germany.
    Spjuth, Ola
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Willighagen, Egon
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    Wikberg, Jarl
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Farmaceutiska fakulteten, Institutionen för farmaceutisk biovetenskap.
    XMPP for cloud computing in bioinformatics supporting discovery and invocation of asynchronous web services. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 279. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND: Life sciences make heavy use of the web for both data provision and analysis. However, the increasing amount of available data and the diversity of analysis tools call for machine accessible interfaces in order to be effective. HTTP-based Web service technologies, like the Simple Object Access Protocol (SOAP) and REpresentational State Transfer (REST) services, are today the most common technologies for this in bioinformatics. However, these methods have severe drawbacks, including lack of discoverability, and the inability for services to send status notifications. Several complementary workarounds have been proposed, but the results are ad-hoc solutions of varying quality that can be difficult to use.

    RESULTS: We present a novel approach based on the open standard Extensible Messaging and Presence Protocol (XMPP), with an extension (IO Data) that covers discovery, asynchronous invocation, and definition of data types in the service. Because XMPP cloud services are capable of asynchronous communication, clients do not have to poll repeatedly for status; instead, the service sends the results back to the client upon completion. Implementations for Bioclipse and Taverna are presented, as are various XMPP cloud services in bio- and cheminformatics.

    CONCLUSION: XMPP with its extensions is a powerful protocol for cloud services that demonstrates several advantages over traditional HTTP-based Web services: 1) services are discoverable without the need of an external registry, 2) asynchronous invocation eliminates the need for ad-hoc solutions like polling, and 3) input and output types defined in the service allow for generation of clients on the fly without the need of an external semantics description. The many advantages over existing technologies make XMPP a highly interesting candidate for next generation online services in bioinformatics.

  • 41.
    Westholm, Jakub Orzechowski
    et al.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Xu, Feifei
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Ronne, Hans
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi.
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Genome-scale study of the importance of binding site context for transcription factor binding and gene regulation. 2008. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 9, p. 484. Journal article (Peer-reviewed)
    Abstract [en]

    BACKGROUND

    The rate of mRNA transcription is controlled by transcription factors that bind to specific DNA motifs in promoter regions upstream of protein coding genes. Recent results indicate that not only the presence of a motif but also motif context (for example the orientation of a motif or its location relative to the coding sequence) is important for gene regulation.

    RESULTS

    In this study we present ContextFinder, a tool that is specifically aimed at identifying cases where motif context is likely to affect gene regulation. We used ContextFinder to examine the role of motif context in S. cerevisiae both for DNA binding by transcription factors and for effects on gene expression. For DNA binding we found significant patterns of motif location bias, whereas motif orientations did not seem to matter. Motif context appears to affect gene expression even more than it affects DNA binding, as biases in both motif location and orientation were more frequent in promoters of co-expressed genes. We validated our results against data on nucleosome positioning, and found a negative correlation between preferred motif locations and nucleosome occupancy.

    CONCLUSION

    We conclude that the requirement for stable binding of transcription factors to DNA and their subsequent function in gene regulation can impose constraints on motif context.

  • 42.
    Wilczynski, Bartek
    et al.
    Hvidsten, Torgeir R.
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Kryshtafovych, Andriy
    Tiuryn, Jerzy
    Komorowski, Jan
    Uppsala universitet, Teknisk-naturvetenskapliga vetenskapsområdet, Biologiska sektionen, Institutionen för cell- och molekylärbiologi, Centrum för bioinformatik.
    Fidelis, Krzysztof
    Using local gene expression similarities to discover regulatory binding site modules. 2006. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 7, p. 505. Journal article (Peer-reviewed)
    Abstract [en]

    Background: We present an approach designed to identify gene regulation patterns using sequence and expression data collected for Saccharomyces cerevisiae. Our main goal is to relate the combinations of transcription factor binding sites (also referred to as binding site modules) identified in gene promoters to the expression of these genes. The novel aspects include local expression similarity clustering and an exact IF-THEN rule inference algorithm. We also provide a method of rule generalization to include genes with unknown expression profiles. Results: We have implemented the proposed framework and tested it on publicly available datasets from the yeast S. cerevisiae. The testing procedure consists of thorough statistical analyses of the groups of genes matching the rules we infer from expression data against known sets of coregulated genes. For this purpose we have used published ChIP-Chip data and Gene Ontology annotations. In order to make these tests more objective we compare our results with recently published similar studies. Conclusion: The results we obtain show that local expression similarity clustering greatly enhances the overall quality of the derived rules, both in terms of enrichment of Gene Ontology functional annotation and coherence with ChIP-Chip binding data. Our approach thus provides reliable hypotheses on co-regulation that can be experimentally verified. An important feature of the method is its reliance only on widely accessible sequence and expression data. The same procedure can be easily applied to other microbial organisms.

  • 43.
    Willighagen, Egon
    et al.
    Cologne University.
    O'Boyle, Noel
    Cambridge Crystallographic Data Centre.
    Gopalakrishnan, Harini
    Indiana University.
    Jiao, Dazhi
    Indiana University.
    Guha, Rajarshi
    Indiana University.
    Steinbeck, Christoph
    University of Tübingen.
    Wild, David
    Indiana University.
    Userscripts for the Life Sciences. 2007. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 8, p. 487. Journal article (Peer-reviewed)
    Abstract [en]

    Background

    The web has seen an explosion of chemistry and biology related resources in the last 15 years: thousands of scientific journals, databases, wikis, blogs and resources are available with a wide variety of types of information. There is a huge need to aggregate and organise this information. However, the sheer number of resources makes it unrealistic to link them all in a centralised manner. Instead, search engines to find information in those resources flourish, and formal languages like Resource Description Framework and Web Ontology Language are increasingly used to allow linking of resources. A recent development is the use of userscripts to change the appearance of web pages, by on-the-fly modification of the web content. This opens possibilities to aggregate information and computational results from different web resources into the web page of one of those resources.

    Results

    Several userscripts are presented that enrich biology and chemistry related web resources by incorporating or linking to other computational or data sources on the web. The scripts make use of Greasemonkey-like plugins for web browsers and are written in JavaScript. Information from third-party resources is extracted using open Application Programming Interfaces, while common Uniform Resource Locator schemes are used to make deep links to related information in that external resource. The userscripts presented here use a variety of techniques and resources, and show the potential of such scripts.

    Conclusion

    This paper discusses a number of userscripts that aggregate information from two or more web resources. Examples are shown that enrich web pages with information from other resources, and show how information from web pages can be used to link to, search, and process information in other resources. Due to the nature of userscripts, scientists are able to select those scripts they find useful on a daily basis, as the scripts run directly in their own web browser rather than on the web server. This flexibility allows the scientists to tune the features of web resources to optimise their productivity.

  • 44.
    Zamani, Neda
    et al.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Sundström, Görel
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Meadows, Jennifer R. S.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Höppner, Marc P.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Dainat, Jacques
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Lantz, Henrik
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    Haas, Brian J.
    Grabherr, Manfred G.
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinsk biokemi och mikrobiologi. Uppsala universitet, Science for Life Laboratory, SciLifeLab.
    A universal genomic coordinate translator for comparative genomics. 2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, p. 227. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases as N^2 with the number of available genomes, N. Results: Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Conclusions: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
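
The idea of translating coordinates indirectly through a graph of pairwise alignments, rather than aligning every genome to every other, can be illustrated with a small sketch. This is not Kraken's implementation or data format. The `PAIRWISE` table, the block layout (ungapped, same strand, half-open intervals) and all function names are assumptions made for illustration, and a breadth-first search stands in for the recursive traversal the abstract describes.

```python
from collections import deque

# Hypothetical pairwise maps: (source, target) -> list of ungapped blocks
# given as (source_start, source_end, target_start), half-open, same strand.
PAIRWISE = {
    ("human", "mouse"): [(100, 200, 1100)],
    ("mouse", "rat"): [(1000, 1300, 5000)],
}


def find_path(maps, src, dst):
    """Breadth-first search for a chain of genomes linking src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for a, b in maps:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None


def translate_step(blocks, start, end):
    """Map [start, end) through one pairwise map.

    Returns None unless a single block covers the whole interval.
    """
    for s, e, t in blocks:
        if s <= start and end <= e:
            offset = t - s
            return start + offset, end + offset
    return None


def translate(maps, src, dst, start, end):
    """Translate an interval from src to dst via intermediate genomes."""
    path = find_path(maps, src, dst)
    if path is None:
        return None
    for a, b in zip(path, path[1:]):
        step = translate_step(maps[(a, b)], start, end)
        if step is None:
            return None
        start, end = step
    return start, end
```

With only N - 1 pairwise maps, an interval in `human` can still be expressed in `rat` coordinates by passing through `mouse`. The sketch tries only the first path found and gives up if any step is not covered by a single block.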

  • 45.
    Önskog, Jenny
    et al.
    Freyhult, Eva
    Uppsala universitet, Medicinska och farmaceutiska vetenskapsområdet, Medicinska fakulteten, Institutionen för medicinska vetenskaper.
    Landfors, Mattias
    Ryden, Patrik
    Hvidsten, Torgeir R.
    Classification of microarrays: synergistic effects between normalization, gene selection and machine learning. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 390. Journal article (Peer-reviewed)
    Abstract [en]

    Background: Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning. Results: In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well performing individual methods and synergies between different methods. Conclusion: Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
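
Double (nested) cross validation, as mentioned in the abstract above, keeps parameter selection inside an inner loop so that the outer test folds are never used for tuning. The sketch below shows the general pattern only and makes no claim about the study's actual setup. `error_fn` is a hypothetical user-supplied callable that trains on one index list, tests on another and returns an error rate, and the fold counts are arbitrary defaults.

```python
def k_fold(indices, k):
    """Yield k (train, test) splits of the given index list."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def double_cross_validation(n_samples, params, error_fn, outer_k=5, inner_k=4):
    """Mean outer-fold error, with the parameter chosen in the inner loop only.

    error_fn(train, test, param) -> error rate (hypothetical callable).
    """
    outer_errors = []
    for train, test in k_fold(list(range(n_samples)), outer_k):

        def inner_error(param):
            # Average error over inner folds drawn from the outer training set.
            scores = [error_fn(tr, va, param) for tr, va in k_fold(train, inner_k)]
            return sum(scores) / len(scores)

        best = min(params, key=inner_error)
        # The outer test fold is used once, after the parameter is fixed.
        outer_errors.append(error_fn(train, test, best))
    return sum(outer_errors) / len(outer_errors)
```

In the setting the abstract describes, `params` could be, for example, candidate numbers of selected genes, with normalization and gene selection refitted inside `error_fn` on each training split.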
