Uppsala University Publications
  • 1.
    Aftab, Obaid
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Fryknäs, Mårten
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Hammerling, Ulf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Gustafsson, Mats
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Detection of cell aggregation and altered cell viability by automated label-free video microscopy: A promising alternative to endpoint viability assays in high throughput screening. 2015. In: Journal of Biomolecular Screening, ISSN 1087-0571, E-ISSN 1552-454X, Vol. 20, no 3, p. 372-381. Article in journal (Refereed)
    Abstract [en]

    Automated phase-contrast video microscopy now makes it feasible to monitor a high-throughput (HT) screening experiment in a 384-well microtiter plate format by collecting one time-lapse video per well. Being a very cost-effective and label-free monitoring method, its potential as an alternative to cell viability assays was evaluated. Three simple morphology feature extraction and comparison algorithms were developed and implemented for analysis of differentially time-evolving morphologies (DTEMs) monitored in phase-contrast microscopy videos. The most promising layout, pixel histogram hierarchy comparison (PHHC), was able to detect several compounds that did not induce any significant change in cell viability, but made the cell population appear as spheroidal cell aggregates. According to recent reports, all these compounds seem to be involved in inhibition of platelet-derived growth factor receptor (PDGFR) signaling. Thus, automated quantification of DTEM (AQDTEM) holds strong promise as an alternative or complement to viability assays in HT in vitro screening of chemical compounds.

  • 2.
    Agarwal, Prasoon
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Hematology and Immunology.
    Regulation of Gene Expression in Multiple Myeloma Cells and Normal Fibroblasts: Integrative Bioinformatic and Experimental Approaches. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work presented in this thesis applies integrative genomic and experimental approaches to investigate mechanisms involved in regulation of gene expression in the context of disease and normal cell biology.

    In papers I and II, we have explored the role of epigenetic regulation of gene expression in multiple myeloma (MM). By using a bioinformatic approach we identified the Polycomb repressive complex 2 (PRC2) to be a common denominator for the underexpressed gene signature in MM. By using inhibitors of the PRC2 we showed an activation of the genes silenced by H3K27me3 and a reduction in the tumor load and increased overall survival in the in vivo 5TMM model. Using ChIP-sequencing we defined the distribution of H3K27me3 and H3K4me3 marks in MM patient cells. In an integrated bioinformatic approach, the H3K27me3-associated genes significantly correlated to under-expression in patients with less favorable survival. Thus, our data indicate the presence of a common under-expressed gene profile and provide a rationale for implementing new therapies focusing on epigenetic alterations in MM.

    In paper III we address the existence of a small cell population in MM presenting with differential tumorigenic properties in the 5T33MM murine model. We report that the predominant population of CD138+ cells had higher engraftment potential and higher clonogenic growth, whereas the CD138- MM cells presented with a less mature phenotype and higher drug resistance. Our findings suggest that, when designing treatment regimes for MM, both cell populations must be targeted.

    In paper IV we have studied the general mechanism of differential gene expression regulation by CGGBP1 in response to growth signals in normal human fibroblasts. We found that CGGBP1 binding affects global gene expression by RNA Polymerase II. This is mediated by Alu RNA-dependent inhibition of RNA Polymerase II. In the presence of growth signals, CGGBP1 is retained in the nuclei and exhibits enhanced Alu binding, thus inhibiting RNA Polymerase III binding on Alus. Hence we suggest a mechanism by which CGGBP1 orchestrates Alu RNA-mediated regulation of RNA Polymerase II. This thesis provides new insights for using integrative bioinformatic approaches to decipher gene expression regulation mechanisms in MM and in normal cells.

    List of papers
    1. Polycomb target genes are silenced in multiple myeloma
    2010 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 5, no 7, p. e11483. Article in journal (Refereed) Published
    Abstract [en]

    Multiple myeloma (MM) is a genetically heterogeneous disease, which to date remains fatal. Finding a common mechanism for initiation and progression of MM continues to be challenging. By means of integrative genomics, we identified an underexpressed gene signature in MM patient cells compared to normal counterpart plasma cells. This profile was enriched for previously defined H3K27-tri-methylated genes, targets of the Polycomb group (PcG) proteins in human embryonic fibroblasts. Additionally, the silenced gene signature was more pronounced in ISS stage III MM compared to stage I and II. Using chromatin immunoprecipitation (ChIP) assay on purified CD138+ cells from four MM patients and on two MM cell lines, we found enrichment of H3K27me3 at genes selected from the profile. As the data implied that the Polycomb-targeted gene profile would be highly relevant for pharmacological treatment of MM, we used two compounds to chemically revert the H3K27-tri-methylation mediated gene silencing. The S-adenosylhomocysteine hydrolase inhibitor 3-Deazaneplanocin (DZNep) and the histone deacetylase inhibitor LBH589 (Panobinostat), reactivated the expression of genes repressed by H3K27me3, depleted cells from the PRC2 component EZH2 and induced apoptosis in human MM cell lines. In the immunocompetent 5T33MM in vivo model for MM, treatment with LBH589 resulted in gene upregulation, reduced tumor load and increased overall survival. Taken together, our results reveal a common gene signature in MM, mediated by gene silencing via the Polycomb repressor complex. The importance of the underexpressed gene profile in MM tumor initiation and progression should be subjected to further studies.

    National Category
    Hematology
    Identifiers
    urn:nbn:se:uu:diva-133207 (URN); 10.1371/journal.pone.0011483 (DOI); 000279715300003 (); 20634887 (PubMedID)
    Available from: 2010-11-03. Created: 2010-11-03. Last updated: 2017-12-12. Bibliographically approved
    2. The epigenomic map of multiple myeloma reveals the importance of Polycomb gene silencing for the malignancy
    (English). Manuscript (preprint) (Other academic)
    Abstract [en]

    Multiple myeloma (MM) is characterized by accumulation of post-germinal center, isotype switched, long-living plasma cells with retained proliferation capacity within the bone marrow. MM is highly heterogeneous and remains fatal. This heterogeneity has hampered identification of a common underlying mechanism for disease establishment and the development of targeted therapy. We recently provided proof-of-principle that gene silencing associated with H3K27me3 contributes to the malignancy of MM. Here we present the first epigenomic map of MM for H3K27me3 and H3K4me3 derived by ChIP- and RNA sequencing from freshly-isolated bone marrow plasma cells from four patients. We compile lists of targets common among the patients as well as unique to MM when compared with PBMCs. Indicating the clinical relevance of our findings, we find increased silencing of H3K27me3 targets with disease progression and in patients presenting with a poor prognosis. Bivalent genes further significantly correlated to under-expressed genes in MM and were unique to MM when compared to PBMCs. Furthermore, bivalent genes, unlike H3K27me3 targets, significantly associated with transcriptional activation upon Polycomb inhibition indicating a potential for drug targeting. Thus, we suggest that gene silencing by Polycomb plays an important role in the development of the malignant phenotype of the MM cell during tumor progression.

    National Category
    Cell and Molecular Biology
    Research subject
    Oncology
    Identifiers
    urn:nbn:se:uu:diva-199492 (URN)
    Available from: 2013-05-06. Created: 2013-05-06. Last updated: 2018-01-11. Bibliographically approved
    3. Tumor-initiating capacity of CD138- and CD138+ tumor cells in the 5T33 multiple myeloma model
    2012 (English). In: Leukemia, ISSN 0887-6924, E-ISSN 1476-5551, Vol. 26, no 6, p. 1436-1439. Article in journal, Letter (Refereed) Published
    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-177948 (URN); 10.1038/leu.2011.373 (DOI); 000305081000040 (); 22289925 (PubMedID)
    Available from: 2012-07-25. Created: 2012-07-20. Last updated: 2017-12-07. Bibliographically approved
    4. Growth signals employ CGGBP1 to suppress transcription of Alu-SINEs
    2016 (English). In: Cell Cycle, ISSN 1538-4101, E-ISSN 1551-4005, Vol. 15, no 12, p. 1558-1571. Article in journal (Refereed) Published
    Abstract [en]

    CGGBP1 (CGG triplet repeat-binding protein 1) regulates cell proliferation, stress response, cytokinesis, telomeric integrity and transcription. It could affect these processes by modulating target gene expression under different conditions. Identification of CGGBP1-target genes and their regulation could reveal how a transcription regulator affects such diverse cellular processes. Here we describe the mechanisms of differential gene expression regulation by CGGBP1 in quiescent or growing cells. By studying global gene expression patterns and genome-wide DNA-binding patterns of CGGBP1, we show that a possible mechanism through which it affects the expression of RNA Pol II-transcribed genes in trans depends on Alu RNA. We also show that it regulates Alu transcription in cis by binding to Alu promoter. Our results also indicate that potential phosphorylation of CGGBP1 upon growth stimulation facilitates its nuclear retention, Alu-binding and dislodging of RNA Pol III therefrom. These findings provide insights into how Alu transcription is regulated in response to growth signals.

    Keywords
    Alu-SINEs; CGGBP1; ChIP-seq; growth signals; RNA Pol III; transcription; tyrosine phosphorylation
    National Category
    Cell Biology
    Research subject
    Bioinformatics; Biology
    Identifiers
    urn:nbn:se:uu:diva-230959 (URN); 10.4161/15384101.2014.967094 (DOI); 000379743800011 (); 25483050 (PubMedID)
    Funder
    Swedish Cancer Society; Swedish Research Council
    Available from: 2014-09-01. Created: 2014-09-01. Last updated: 2017-12-05. Bibliographically approved
  • 3.
    Ahmed, Laeeq
    et al.
    Royal Institute of Technology, KTH.
    Georgiev, Valentin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Capuccini, Marco
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Toor, Salman
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Laure, Erwin
    Royal Institute of Technology, KTH.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Efficient iterative virtual screening with Apache Spark and conformal prediction. 2018. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 10, article id 8. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands.

    CONTRIBUTION: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling.

    RESULTS: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub ( https://github.com/laeeq80/spark-cpvs ) and can be run on high-performance computers as well as on cloud resources.
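
The exclusion step described in this abstract can be illustrated with a minimal sketch of inductive conformal prediction. This is not the spark-cpvs implementation: the function names, the labels "high" and "low", and the rule for dropping a ligand are assumptions made for illustration, and the nonconformity scores are taken as given rather than derived from an SVM.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration nonconformity scores at least as large as the test score."""
    n_ge = sum(1 for s in calibration_scores if s >= test_score)
    return (n_ge + 1) / (len(calibration_scores) + 1)


def prediction_set(cal_scores_by_label, test_scores_by_label, epsilon):
    """Labels whose p-value exceeds the significance level epsilon."""
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(cal_scores_by_label[label], score) > epsilon
    }


def keep_for_docking(pred_set):
    """Hypothetical rule: skip a ligand only if 'low' is the sole predicted label."""
    return pred_set != {"low"}


# Toy calibration scores (assumed values, not data from the paper).
calibration = {"high": [0.1, 0.2, 0.3, 0.4], "low": [0.1, 0.2, 0.3, 0.4]}
confident_low = prediction_set(calibration, {"high": 0.9, "low": 0.05}, epsilon=0.25)
uncertain = prediction_set(calibration, {"high": 0.25, "low": 0.25}, epsilon=0.25)
```

Under this rule a ligand with an ambiguous prediction set stays in the docking queue, which errs on the side of not discarding potential hits.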

  • 4.
    Ajawatanawong, Pravech
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Atkinson, Gemma C.
    Watson-Haigh, Nathan S.
    MacKenzie, Bryony
    Baldauf, Sandra L.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, p. W340-W347. Article in journal (Refereed)
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
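
As a rough illustration of the two ideas named in this abstract, the sketch below finds gap-containing column runs in an alignment and computes per-column Shannon entropy. It is a simplified stand-in, not SeqFIRE's algorithm: the function names and the rule "any gap marks an indel column" are assumptions, and SeqFIRE's adjustable criteria are not reproduced here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def indel_regions(alignment):
    """Half-open (start, end) column ranges where at least one sequence has a gap."""
    length = len(alignment[0])
    regions, start = [], None
    for i in range(length):
        has_gap = any(seq[i] == "-" for seq in alignment)
        if has_gap and start is None:
            start = i  # a gap run opens here
        elif not has_gap and start is not None:
            regions.append((start, i))  # the gap run closed at the previous column
            start = None
    if start is not None:
        regions.append((start, length))
    return regions


# Toy alignment (assumed example, equal-length rows with '-' as the gap symbol).
toy_alignment = ["MKT--LV", "MKTAALV", "MRT-ALI"]
toy_regions = indel_regions(toy_alignment)
```

Low-entropy columns outside the reported regions would be candidates for conserved blocks, while high-entropy columns point to fast-evolving sites.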

  • 5.
    Al-Jaff, Mohammed
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Sandström, Eric
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Grabherr, Manfred
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology. Uppsala Univ, Bioinformat Infrastruct Life Sci, S-75123 Uppsala, Sweden.
    microTaboo: a general and practical solution to the k-disjoint problem. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, article id 228. Article in journal (Refereed)
    Abstract [en]

    Background: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable.

    Results: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining.

    Conclusions: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly-free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo
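
The k-disjoint criterion defined in this abstract can be made concrete with a brute-force sketch. This is not microTaboo's method, which is designed to be efficient on whole genomes; the function names are hypothetical, and the comparison is simplified to one target sequence against a set of reference sequences.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def k_disjoint_windows(target, references, w, k):
    """Windows of length w in target that differ by more than k mismatches
    from every length-w window of every reference sequence."""
    ref_windows = {
        ref[i:i + w] for ref in references for i in range(len(ref) - w + 1)
    }
    unique = []
    for i in range(len(target) - w + 1):
        window = target[i:i + w]
        if all(hamming(window, other) > k for other in ref_windows):
            unique.append((i, window))
    return unique


# Toy sequences (assumed example): windows of length 4 with up to 1 mismatch tolerated.
hits = k_disjoint_windows("AAAATTTT", ["AAAACCCC"], w=4, k=1)
```

The cost grows with the product of the number of target and reference windows, which is why an exhaustive yet efficient method matters at genome scale.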

  • 6.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Engkvist, Ola
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Carlsson, Lars
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Noeske, Tobias
    Ligand-Based Target Prediction with Signature Fingerprints. 2014. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, p. 2647-2653. Article in journal (Refereed)
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regard to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
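
To make the bit versus count distinction concrete, here is a minimal sketch of Tanimoto similarity for both fingerprint types. It assumes a bit fingerprint is a set of feature identifiers and a count fingerprint is a mapping from feature to occurrence count; the min/max form used for counts is one common generalization and is not confirmed to be the one used in the study. All names and toy values are hypothetical.

```python
def tanimoto_bits(a, b):
    """Tanimoto coefficient for two sets of 'on' features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def tanimoto_counts(a, b):
    """Count-based Tanimoto: sum of per-feature minima over sum of per-feature maxima."""
    keys = set(a) | set(b)
    denominator = sum(max(a.get(key, 0), b.get(key, 0)) for key in keys)
    if denominator == 0:
        return 0.0
    numerator = sum(min(a.get(key, 0), b.get(key, 0)) for key in keys)
    return numerator / denominator


# Toy fingerprints (assumed feature names, not real molecular signatures).
bit_similarity = tanimoto_bits({1, 2, 3}, {2, 3, 4})
count_similarity = tanimoto_counts({"C": 2, "N": 1}, {"C": 1, "O": 1})
```

The count version keeps one integer per feature instead of one bit, which is in line with the abstract's remark about higher memory and computing cost.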

  • 7.
    Alvarsson, Jonathan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Lampa, Samuel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Andersson, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Wikberg, Jarl E. S.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale ligand-based predictive modelling using support vector machines. 2016. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, article id 39. Article in journal (Refereed)
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

  • 8.
    Ameur, Adam
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    A Bioinformatics Study of Human Transcriptional Regulation. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Regulation of transcription is a central mechanism in all living cells that now can be investigated with high-throughput technologies. Data produced from such experiments give new insights to how transcription factors (TFs) coordinate the gene transcription and thereby regulate the amounts of proteins produced. These studies are also important from a medical perspective since TF proteins are often involved in disease. To learn more about transcriptional regulation, we have developed strategies for analysis of data from microarray and massively parallel sequencing (MPS) experiments.

    Our computational results consist of methods to handle the steadily increasing amount of data from high-throughput technologies. Microarray data analysis tools have been assembled in the LCB-Data Warehouse (LCB-DWH) (paper I), and other analysis strategies have been developed for MPS data (paper V). We have also developed a de novo motif search algorithm called BCRANK (paper IV).

    The analysis has led to interesting biological findings in human liver cells (papers II-V). The investigated TFs appeared to bind at several thousand sites in the genome, which we have identified at base pair resolution. The investigated histone modifications are mainly found downstream of transcription start sites, and correlated to transcriptional activity. These histone marks are frequently found for pairs of genes in a bidirectional conformation. Our results suggest that a TF can bind in the shared promoter of two genes and regulate both of them.

    From a medical perspective, the genes bound by the investigated TFs are candidates to be involved in metabolic disorders. Moreover, we have developed a new strategy to detect single nucleotide polymorphisms (SNPs) that disrupt the binding of a TF (paper IV). We further demonstrated that SNPs can affect transcription in the immediate vicinity. Ultimately, our method may prove helpful to find disease-causing regulatory SNPs.

    List of papers
    1. The LCB Data Warehouse
    2006 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 22, no 8, p. 1024-1026. Article in journal (Refereed) Published
    Abstract [en]

    The Linnaeus Centre for Bioinformatics Data Warehouse (LCB-DWH) is a web-based infrastructure for reliable and secure microarray gene expression data management and analysis that provides an online service for the scientific community. The LCB-DWH is an effort towards a complete system for storage (using the BASE system), analysis and publication of microarray data. Important features of the system include: access to established methods within R/Bioconductor for data analysis, built-in connection to the Gene Ontology database and a scripting facility for automatic recording and re-play of all the steps of the analysis. The service is up and running on a high performance server. At present there are more than 150 registered users.

    National Category
    Natural Sciences
    Identifiers
    urn:nbn:se:uu:diva-97704 (URN); 10.1093/bioinformatics/btl036 (DOI); 16455749 (PubMedID)
    Available from: 2008-11-06. Created: 2008-11-06. Last updated: 2017-12-14. Bibliographically approved
    2. Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays
    2005 (English). In: Human Molecular Genetics, ISSN 0964-6906, E-ISSN 1460-2083, Vol. 14, no 22, p. 3435-3447. Article in journal (Refereed) Published
    Abstract [en]

    We present a detailed in vivo characterization of hepatocyte transcriptional regulation in HepG2 cells, using chromatin immunoprecipitation and detection on PCR fragment-based genomic tiling path arrays covering the Encyclopedia of DNA Elements (ENCODE) regions. Our data suggest that HNF-4α and HNF-3β, which were commonly bound to distal regulatory elements, may cooperate in the regulation of a large fraction of the liver transcriptome and that both HNF-4α and USF1 may promote H3 acetylation to many of their targets. Importantly, bioinformatic analysis of the sequences bound by each transcription factor (TF) shows an over-representation of motifs highly similar to the in vitro established consensus sequences. On the basis of these data, we have inferred tentative binding sites at base pair resolution. Some of these sites have been previously found by in vitro analysis and some were verified in vitro in this study. Our data suggest that a similar approach could be used for the in vivo characterization of all predicted/uncharacterized TFs and that the analysis could be scaled to the whole genome.

    Keywords
    Base Pairing/*genetics, Binding Sites/genetics, Cell Line; Tumor, Chromatin/*metabolism, Chromatin Immunoprecipitation/methods, Consensus Sequence, Genome; Human, Hepatocyte Nuclear Factor 3-beta/physiology, Hepatocyte Nuclear Factor 4/physiology, Hepatocytes/metabolism, Histones/metabolism, Humans, Metabolic Diseases/*metabolism, Oligonucleotide Array Sequence Analysis/methods, Promoter Regions (Genetics), Research Support; N.I.H.; Extramural, Research Support; Non-U.S. Gov't, Sequence Analysis; DNA, Transcription Factors/genetics/*metabolism, Upstream Stimulatory Factors/metabolism
    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-80603 (URN); 10.1093/hmg/ddi378 (DOI); 16221759 (PubMedID)
    Available from: 2006-05-19. Created: 2006-05-19. Last updated: 2017-12-14. Bibliographically approved
    3. Whole-genome maps of USF1 and USF2 binding and histone H3 acetylation reveal new aspects of promoter structure and candidate genes for common human disorders
    2008 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 18, no 3, p. 380-392. Article in journal (Refereed) Published
    Abstract [en]

    Transcription factors and histone modifications are crucial regulators of gene expression that mutually influence each other. We present the DNA binding profiles of upstream stimulatory factors 1 and 2 (USF1, USF2) and acetylated histone H3 (H3ac) in a liver cell line for the whole human genome using ChIP-chip at a resolution of 35 base pairs. We determined that these three proteins bind mostly in proximity to the transcription start sites (TSSs) of protein-coding genes, and that their binding is positively correlated with gene expression levels. Based on the spatial and functional relationship between USFs and H3ac at protein-coding gene promoters, we found similar promoter architecture for known genes and for the novel and less-characterized transcripts (human mRNAs and spliced ESTs). Furthermore, our analysis revealed a previously underestimated abundance of genes in a bidirectional conformation, where USFs are bound in between TSSs. After taking into account this promoter conformation, the results indicate that H3ac is mainly located downstream of the TSS, and it is at this genomic location that it positively correlates with gene expression. Finally, USF1, which is associated with familial combined hyperlipidemia, was found to bind and potentially regulate nuclear mitochondrial genes as well as genes for lipid and cholesterol metabolism, frequently in collaboration with GA binding protein transcription factor alpha (GABPA, nuclear respiratory factor 2 [NRF-2]). This expands our understanding of the transcriptional control of metabolic processes and its alteration in metabolic disorders.

    National Category
    Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-97706 (URN), 10.1101/gr.6880908 (DOI), 000253766700004 (), 18230803 (PubMedID)
    Available from: 2008-11-06. Created: 2008-11-06. Last updated: 2017-12-14. Bibliographically approved
    4. New algorithm and ChIP-analysis identifies candidate functional SNPs
    In: PNAS. Article in journal (Refereed) Submitted
    Identifiers
    urn:nbn:se:uu:diva-97707 (URN)
    Available from: 2008-11-06. Created: 2008-11-06. Bibliographically approved
    5. Differential binding and co-binding pattern of FOXA1 and FOXA3 and their relation to H3K4me3 in HepG2 cells revealed by ChIP-seq
    2009 (English). In: Genome Biology, ISSN 1465-6906, E-ISSN 1474-760X, Vol. 10, no 11, p. R129. Article in journal (Refereed) Published
    Abstract [en]

    BACKGROUND: The forkhead box/winged helix family members FOXA1, FOXA2, and FOXA3 are of high importance in development and specification of the hepatic lineage and the continued expression of liver-specific genes. RESULTS: Here, we present a genome-wide location analysis of FOXA1 and FOXA3 binding sites in HepG2 cells through chromatin immunoprecipitation with detection by sequencing (ChIP-seq) studies and compare these with our previous results on FOXA2. We found that these factors often bind close to each other in different combinations and consecutive immunoprecipitation of chromatin for one and then a second factor (ChIP-reChIP) shows that this occurs in the same cell and on the same DNA molecule, suggestive of molecular interactions. Using co-immunoprecipitation, we further show that FOXA2 interacts with both FOXA1 and FOXA3 in vivo, while FOXA1 and FOXA3 do not appear to interact. Additionally, we detected diverse patterns of trimethylation of lysine 4 on histone H3 (H3K4me3) at transcriptional start sites and directionality of this modification at FOXA binding sites. Using the sequence reads at polymorphic positions, we were able to predict allele specific binding for FOXA1, FOXA3, and H3K4me3. Finally, several SNPs associated with diseases and quantitative traits were located in the enriched regions. CONCLUSIONS: We find that ChIP-seq can be used not only to create gene regulatory maps but also to predict molecular interactions and to inform on the mechanisms for common quantitative variation.

    National Category
    Medical and Health Sciences; Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-119751 (URN), 10.1186/gb-2009-10-11-r129 (DOI), 000273344600016 (), 19919681 (PubMedID)
    Note

    The first two (2) authors share first authorship.

    Available from: 2010-03-01. Created: 2010-03-01. Last updated: 2017-12-12. Bibliographically approved
  • 9.
    Amrein, Beat Anton
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology.
    Steffen-Munsberg, Fabian
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Szeler, Ireneusz
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology.
    Purg, Miha
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kulkarni, Yashraj
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Kamerlin, Shina Caroline Lynn
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Structure and Molecular Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    CADEE: Computer-Aided Directed Evolution of Enzymes
    2017. In: IUCrJ, ISSN 0972-6918, E-ISSN 2052-2525, Vol. 4, no 1, p. 50-64. Article in journal (Refereed)
    Abstract [en]

    The tremendous interest in enzymes as biocatalysts has led to extensive work in enzyme engineering, as well as associated methodology development. Here, a new framework for computer-aided directed evolution of enzymes (CADEE) is presented which allows a drastic reduction in the time necessary to prepare and analyze in silico semi-automated directed evolution of enzymes. A pedagogical example of the application of CADEE to a real biological system is also presented in order to illustrate the CADEE workflow.

  • 10.
    Arvidsson, Staffan
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Carlsson, Lars
    AstraZeneca R&D.
    Paulo, Toccaceli
    Royal Holloway University of London.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Prediction of Metabolic Transformations using Cross Venn-ABERS Predictors
    2017. In: Conformal and Probabilistic Prediction with Applications (COPA) 2017 / [ed] Alex Gammerman, Vladimir Vovk, Zhiyuan Luo, Harris Papadopoulos, 2017, Vol. 60, p. 118-131. Conference paper (Refereed)
    Abstract [en]

    Prediction of drug metabolism is an important topic in the drug discovery process, and we here present a study using probabilistic predictions applying Cross Venn-ABERS Predictors (CVAPs) on data for site-of-metabolism. We used a dataset of 73599 biotransformations, applied SMIRKS to define biotransformations of interest and constructed five datasets where chemical structures were represented using signatures descriptors. The results show that CVAP produces well-calibrated predictions for all datasets with good predictive capability, making CVAP an interesting method for further exploration in drug discovery applications.

  • 11.
    Attwood, T.K.
    et al.
    Faculty of Life Sciences & School of Computer Science, University of Manchester.
    Gisel, A
    Institute for Biomedical Technologies, CNR, Italy.
    Eriksson, Nils-Einar
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Bongcam-Rudloff, Erik
    Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences.
    Concepts, Historical Milestones and the Central Place of Bioinformatics in Modern Biology: A European Perspective
    2011. In: Bioinformatics: Trends and Methodologies / [ed] Mahmood A. Mahdavi, InTech, 2011, p. 3-26. Chapter in book (Refereed)
  • 12.
    Austin, Peter C.
    et al.
    Inst Clin Evaluat Sci, G106,2075 Bayview Ave, Toronto, ON M4N 3M5, Canada.;Univ Toronto, Inst Hlth Management Policy & Evaluat, Toronto, ON, Canada.;Sunnybrook Res Inst, Schulich Heart Res Program, Toronto, ON, Canada..
    Wagner, Philippe
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Medicinska och farmaceutiska vetenskapsområdet, centrumbildningar mm, Centre for Clinical Research, County of Västmanland. Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden..
    Merlo, Juan
    Lund Univ, Unit Social Epidemiol, Fac Med, Malmo, Sweden.;Region Skane, Ctr Primary Hlth Care Res, Malmo, Sweden..
    The median hazard ratio: a useful measure of variance and general contextual effects in multilevel survival analysis
    2017. In: Statistics in Medicine, ISSN 0277-6715, E-ISSN 1097-0258, Vol. 36, no 6, p. 928-938. Article in journal (Refereed)
    Abstract [en]

    Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis.
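
    The definition of the MHR given in this abstract can be illustrated with a small calculation. The following is a minimal Python sketch, not the R code the authors provide. It assumes normally distributed cluster-level random effects on the log-hazard scale with variance `cluster_variance` (a hypothetical parameter name), and it uses the closed form known from the MOR, exp(sqrt(2 * variance) * Phi^{-1}(0.75)), where Phi^{-1} is the standard normal quantile function.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_hazard_ratio(cluster_variance: float) -> float:
    """Return the MHR for a given random-effect variance.

    Assumption: cluster effects are normal on the log-hazard scale, so the
    difference between two random clusters has variance 2 * cluster_variance,
    and the median of its absolute value sits at the 75th normal percentile.
    """
    if cluster_variance < 0:
        raise ValueError("cluster_variance must be non-negative")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of N(0, 1), about 0.6745
    return exp(sqrt(2.0 * cluster_variance) * z75)


if __name__ == "__main__":
    # Illustrative variances only; these are not values from the case study.
    for variance in (0.0, 0.1, 0.5, 1.0):
        print(f"variance={variance:.1f}  MHR={median_hazard_ratio(variance):.3f}")
```

    Under these assumptions a variance of zero gives an MHR of exactly 1 (no between-cluster heterogeneity), and the MHR grows monotonically with the variance.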

  • 13.
    Baltzer, Nicholas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Stockholm County, Sweden.
    Sundström, Karin
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm Count, Sweden..
    Nygård, Jan F.
    Canc Registry Norway, Dept Registry Informat, Oslo, Oslo County, Norway..
    Dillner, Joakim
    Karolinska Inst, Dept Lab Med, Stockholm, Stockholm Count, Sweden..
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics. Polish Acad Sci, Inst Comp Sci, Warsaw, Warsaw County, Poland..
    Risk stratification in cervical cancer screening by complete screening history: Applying bioinformatics to a general screening population
    2017. In: International Journal of Cancer, ISSN 0020-7136, E-ISSN 1097-0215, Vol. 141, no 1, p. 200-209. Article in journal (Refereed)
    Abstract [en]

    Women screened for cervical cancer in Sweden are currently treated under a one-size-fits-all programme, which has been successful in reducing the incidence of cervical cancer but does not use all of the participants' available medical information. This study aimed to use women's complete cervical screening histories to identify diagnostic patterns that may indicate an increased risk of developing cervical cancer. A nationwide case-control study was performed where cervical cancer screening data from 125,476 women with a maximum follow-up of 10 years were evaluated for patterns of SNOMED diagnoses. The cancer development risk was estimated for a number of different screening history patterns and expressed as Odds Ratios (OR), with a history of 4 benign cervical tests as reference, using logistic regression. The overall performance of the model was moderate (64% accuracy, 71% area under curve) with 61-62% of the study population showing no specific patterns associated with risk. However, predictions for high-risk groups as defined by screening history patterns were highly discriminatory with ORs ranging from 8 to 36. The model for computing risk performed consistently across different screening history lengths, and several patterns predicted cancer outcomes. The results show the presence of risk-increasing and risk-decreasing factors in the screening history. Thus it is feasible to identify subgroups based on their complete screening histories. Several high-risk subgroups identified might benefit from an increased screening density. Some low-risk subgroups identified could likely have a moderately reduced screening density without additional risk.

  • 14.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    A Graph – String Model of Gene Assembly in Ciliates
    2006. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2006, Vol. 10, p. 521-534. Conference paper (Refereed)
    Abstract [en]

    The ciliates are a family of unicellular organisms characterized by having two types of nuclei, micro- and macronuclei. During cell mating the genetic material must change from the micronuclear to the macronuclear form. The paper summarises a formal model for this change. The model, which is described in recent works, is based on strings and graphs. It shows that complex computational operations have to take place inside the cell.

  • 15.
    Bartoszek, Krzysztof
    Gdansk University of Technology.
    The Bootstrap and Other Methods of Testing Phylogenetic Trees
    2007. In: Zeszyty Naukowe Wydzialu ETI Politechniki Gdanskiej, 2007, Vol. 12, p. 103-108. Conference paper (Refereed)
    Abstract [en]

    The final step of a phylogenetic analysis is the test of the generated tree. This is not an easy task, and there is no obvious methodology for it, because we do not know the full probabilistic model of evolution. A number of methods have been proposed, but there is a wide debate concerning the interpretation of the results they produce.

  • 16.
    Bartoszek, Krzysztof
    et al.
    Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg.
    Liò, Pietro
    University of Cambridge.
    Sorathiya, Anil
    University of Cambridge.
    Influenza differentiation and evolution
    2010. In: Acta Physica Polonica B Proceedings Supplement, 2010, Vol. 3, p. 417-452. Conference paper (Refereed)
    Abstract [en]

    The aim of the study is to carry out a very wide analysis of the HA, NA and M influenza gene segments to find short nucleotide regions which differentiate between strains (i.e. H1, H2, etc.), hosts, geographic regions, the time when the sequence was found, and the combination of time and region, using a simple methodology. Finding regions that differentiate between strains has as its goal the construction of a Luminex microarray which will allow quick and efficient strain recognition. Discovery for the other splitting factors could shed light on structures significant for host specificity and on the history of influenza evolution. A large number of places in the HA, NA and M gene segments were found that can differentiate between hosts, regions, time and the combination of time and region. Very good differentiation between different Hx strains can also be seen. We link one of our findings to a proposed stochastic model of creation of viral phylogenetic trees.

  • 17.
    Bartoszek, Krzysztof
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Applied Mathematics and Statistics.
    Liò, Pietro
    Cambridge University.
    A novel algorithm to reconstruct phylogenies using gene sequences and expression data
    2014. In: International Proceedings of Chemical, Biological & Environmental Engineering; Environment, Energy and Biotechnology III, 2014, p. 8-12. Conference paper (Refereed)
    Abstract [en]

    Phylogenies based on single loci should be viewed with caution and the best approach for obtaining robust trees is to examine numerous loci across the genome. It often happens that, for the same set of species, trees derived from different genes conflict with each other. There are several methods that combine information from different genes in order to infer the species tree. One novel approach is to use information from different -omics. Here we describe a phylogenetic method based on an Ornstein–Uhlenbeck process that combines sequence and gene expression data. We test our method on genes belonging to the histidine biosynthetic operon. We found that the method provides interesting insights into selection pressures and adaptive hypotheses concerning gene expression levels.

  • 18.
    Bashardanesh, Zahedeh
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Lötstedt, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis.
    Efficient Green's function reaction dynamics (GFRD) simulations for diffusion-limited, reversible reactions
    2018. In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 357, p. 78-99. Article in journal (Refereed)
  • 19.
    Bebris, Kristaps
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Animal ecology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Local adaptation of Grauer's gorilla gut microbiome
    2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The availability of high-throughput sequencing technologies has enabled metagenomic investigations into complex bacterial communities with unprecedented resolution and throughput. The production of dedicated data sets for metagenomic analyses is, however, a costly process and, frequently, the first research questions focus on the study species itself. If the source material is represented by fecal samples, target capture of host-specific sequences is applied to enrich the complex DNA mixtures contained within a typical fecal DNA extract. Yet, even after this enrichment, the samples still contain a large amount of environmental DNA that is usually left unanalysed. In my study I investigate the possibility of using shotgun sequencing data that has been subjected to target enrichment for mtDNA from the host species, Grauer’s gorilla (Gorilla beringei graueri), for further analysis of the microbial community present in these samples. The purpose of these analyses is to study the differences in the bacterial communities present within a high-altitude Grauer’s gorilla, low-altitude Grauer’s gorilla, and a sympatric chimpanzee population. Additionally, I explore the adaptive potential of the gut microbiota within these great ape populations.

    I evaluated the impact that the enrichment process had on the microbial community by using pre- and post-capture museum preserved samples. In addition to this, I also analysed the effect of two different extraction methods on the bacterial communities.

    My results show that the relative abundances of the bacterial taxa remain relatively unaffected by the enrichment process and the extraction methods. The overall number of taxa is, however, reduced by each additional capture round and is not consistent between the extraction methods. This means that both the enrichment and extraction processes introduce biases that require the usage of abundance-based distance measures for biological inferences. Additionally, even if the data cannot be used to study the bacterial communities in an unbiased manner, it provides useful comparative insights for samples that were treated in the same fashion.

    With this background, I used museum and fecal samples to perform cluster analysis to explore the relationships between the gut microbiota of the three great ape populations. I found that populations cluster by species first, and only then group according to habitat. I further found that a bacterial taxon that degrades plant matter is enriched in the gut microbiota of all three great ape species, where it could help with the digestion of vegetative foods. Another bacterial taxon that consumes glucose is enriched in the gut microbiota of the low-altitude gorilla and chimpanzee populations, where it could help with the modulation of the host’s mucosal immune system, and could point to the availability of fruit in the animals’ diet. In addition, I found a bacterial taxon that is linked with diarrhea in humans to be part of the gut microbiota of the habituated high-altitude gorilla population, which could indicate that this pathogen has been transmitted to the gorillas from their interaction with humans, or it could be indicative of the presence of a contaminated water source.

  • 20.
    Bergman, Ebba
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution.
    Haplotype Inference as a case of Maximum Satisfiability: A strategy for identifying multi-individual inversion points in computational phasing
    2017. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Phasing genotypes from sequence data is an important step between data gathering and downstream analysis in population genetics, disease studies, and multiple other fields. This determination of the sequences of markers corresponding to the individual chromosomes can be done on data where the markers are in low density across the chromosome, such as from single nucleotide polymorphism (SNP) microarrays, or on data with a higher local density of markers like in next generation sequencing (NGS). The sorted markers may then be used for many different analyses and data processing such as linkage analysis, or inference of missing genotypes in the process of imputation.

    cnF2freq is a haplotype phasing program that uses an uncommon approach allowing it to divide big groups of related individuals into smaller ones. It sets an initial haplotype phase and then iteratively changes it using estimations from Hidden Markov Models. If a marker is judged to have been placed in the wrong haplotype, a switch needs to be made so that it belongs to the correct phase. The objective of this project was to go from allowing only one individual within a group to be switched in an iteration to allowing multiple switches that are dependent on each other.

    The result of this project is a theoretical solution for allowing multiple dependent switches in cnF2freq, and an implemented solution using the max-SAT solver toulbar2.

  • 21.
    Bornelöv, Susanne
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Rule-based Models of Transcriptional Regulation and Complex Diseases: Applications and Development
    2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    As we gain increased understanding of genetic disorders and gene regulation more focus has turned towards complex interactions. Combinations of genes or gene and environmental factors have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for use on such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors with importance for asthma and allergy.

    Firstly, publicly available ChIP-seq data (Paper I) for 38 HMs was studied. We trained a classifier for predicting exon inclusion levels based on the HMs signals. We identified HMs important for splicing and illustrated that splicing could be predicted from the HM patterns. Next, we applied a similar methodology on data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors with importance for allergic diseases which confirmed earlier results and found candidate gene-gene and gene-environment interactions.

    In order to interpret and present the classifiers we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis on classifiers trained on both simulated and real data and compared our tool to another methodology for interaction detection using classification. Finally, we continued the earlier study on epigenetics by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs with different signals between unidirectional and bidirectional genes. Among these, the CTCF TF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.

    List of papers
    1. Combinations of histone modifications mark exon inclusion levels
    2012 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 7, no 1, article id e29911. Article in journal (Refereed) Published
    Abstract [en]

    Splicing is a complex process regulated by sequence at the classical splice sites and other motifs in exons and introns with an enhancing or silencing effect. In addition, specific histone modifications on nucleosomes positioned over the exons have been shown to correlate both positively and negatively with exon expression. Here, we trained a model of "IF … THEN …" rules to predict exon inclusion levels in a transcript from histone modification patterns. Furthermore, we showed that combinations of histone modifications, in particular those residing on nucleosomes preceding or succeeding the exon, are better predictors of exon inclusion levels than single modifications. The resulting model was evaluated with cross validation and had an average accuracy of 72% for 27% of the exons, which demonstrates that epigenetic signals substantially mark alternative splicing.

    National Category
    Cell and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-175875 (URN), 10.1371/journal.pone.0029911 (DOI), 000312662100045 (), 22242188 (PubMedID)
    Funder
    Knut and Alice Wallenberg Foundation, Swedish Foundation for Strategic Research, Swedish Research Council, Swedish Cancer Society
    Available from: 2012-06-13. Created: 2012-06-13. Last updated: 2018-01-12. Bibliographically approved
    2. Rule-Based Models of the Interplay between Genetic and Environmental Factors in Childhood Allergy
    2013 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no 11, p. e80080. Article in journal (Refereed) Published
    Abstract [en]

    Both genetic and environmental factors are important for the development of allergic diseases. However, a detailed understanding of how such factors act together is lacking. To elucidate the interplay between genetic and environmental factors in allergic diseases, we used a novel bioinformatics approach that combines feature selection and machine learning. In two materials, PARSIFAL (a European cross-sectional study of 3113 children) and BAMSE (a Swedish birth-cohort including 2033 children), genetic variants as well as environmental and lifestyle factors were evaluated for their contribution to allergic phenotypes. Monte Carlo feature selection and rule based models were used to identify and rank rules describing how combinations of genetic and environmental factors affect the risk of allergic diseases. Novel interactions between genes were suggested and replicated, such as between ORMDL3 and RORA, where certain genotype combinations gave odds ratios for current asthma of 2.1 (95% CI 1.2-3.6) and 3.2 (95% CI 2.0-5.0) in the BAMSE and PARSIFAL children, respectively. Several combinations of environmental factors appeared to be important for the development of allergic disease in children. For example, use of baby formula and antibiotics early in life was associated with an odds ratio of 7.4 (95% CI 4.5-12.0) of developing asthma. Furthermore, genetic variants together with environmental factors seemed to play a role for allergic diseases, such as the use of antibiotics early in life and COL29A1 variants for asthma, and farm living and NPSR1 variants for allergic eczema. Overall, combinations of environmental and lifestyle factors appeared more frequently in the models than combinations solely involving genes. In conclusion, a new bioinformatics approach is described for analyzing complex data, including extensive genetic and environmental information. Interactions identified with this approach could provide useful hints for further in-depth studies of etiological mechanisms and may also strengthen the basis for risk assessment and prevention.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-213817 (URN), 10.1371/journal.pone.0080080 (DOI), 000327311900057 ()
    Available from: 2014-01-05. Created: 2014-01-04. Last updated: 2017-12-06. Bibliographically approved
    3. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
    2014 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, p. 139. Article in journal (Refereed) Published
    Abstract [en]

    Background: The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods. Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings. Conclusions: Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.

    Keywords
    Visualization, Rules, Interactions, Interaction detection, Classification, Rule-based classification
    National Category
    Biochemistry and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-228027 (URN); 10.1186/1471-2105-15-139 (DOI); 000336679600001 (ISI)
    Available from: 2014-07-02 Created: 2014-07-02 Last updated: 2017-12-05 Bibliographically approved
    4. Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription
    2015 (English). In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 16, article id 300. Article in journal (Refereed). Published
    Abstract [en]

    Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

    Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

    Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.

    Keywords
    Antisense transcription, CTCF, RAD21, Cohesin, CAGE, Epigenetics, Transcription factor, Histone modification
    National Category
    Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-230158 (URN); 10.1186/s12864-015-1485-5 (DOI); 000355166000001 (ISI); 25881024 (PubMedID)
    Available from: 2014-08-19 Created: 2014-08-19 Last updated: 2017-12-05 Bibliographically approved
  • 22.
    Bornelöv, Susanne
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Genomics. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Visualization of Rules in Rule-Based Classifiers
    2012. In: INTELLIGENT DECISION TECHNOLOGIES (IDT'2012), VOL 1, 2012, Vol. 15, p. 329-338. Conference paper (Refereed)
    Abstract [en]

    Interpretation and visualization of classification models are important parts of machine learning. Rule-based classifiers often contain too many rules to be easily interpreted by humans, and methods for post-classification analysis of the rules are needed. Here we present a strategy for circular visualization of sets of classification rules. The Circos software was used to generate graphs showing all pairs of conditions that were present in the rules as edges inside a circle. We showed using simulated data that all two-way interactions in the data were found by the classifier and displayed in the graph, although the single attributes were constructed to have no correlation to the decision class. For all examples we used rules trained using rough set theory, but the visualization would be applicable to any sort of classification rules. This method for rule visualization may be useful for applications where interaction terms are expected and the size of the model limits its interpretability.
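The core idea of drawing "all pairs of conditions that were present in the rules as edges" can be sketched as follows. This is a simplified illustration, not the authors' implementation: the rules are hypothetical, each edge is weighted by a plain co-occurrence count (the paper may weight edges differently, for example by rule support or accuracy), and the output is a printed edge list rather than a Circos plot.

```python
from collections import Counter
from itertools import combinations

# Hypothetical IF-THEN rules: (set of conditions, decision class).
rules = [
    ({"A=1", "B=0"}, "case"),
    ({"A=1", "B=0", "C=1"}, "case"),
    ({"C=1", "D=0"}, "control"),
    ({"A=1"}, "case"),  # single-condition rule: contributes no edge
]

def rule_pair_edges(rules):
    """Count how often each unordered pair of conditions co-occurs in a rule."""
    edges = Counter()
    for conditions, _decision in rules:
        # Sorting makes (u, v) and (v, u) map to the same key.
        for pair in combinations(sorted(conditions), 2):
            edges[pair] += 1
    return edges

edges = rule_pair_edges(rules)
for (u, v), weight in edges.most_common():
    print(f"{u} -- {v}  (weight {weight})")
```

Pairs that recur across many rules get heavier edges, which is what makes them candidates for two-way interactions in an exploratory reading of the graph.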

  • 23.
    Boukharta, Lars
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Gutierréz de Terán, Hugo
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Åqvist, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Computational prediction of alanine scanning and ligand binding energetics in G-protein coupled receptors
    2014. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 10, no 4, p. e1003585. Article in journal (Refereed)
    Abstract [en]

    Site-directed mutagenesis combined with binding affinity measurements is widely used to probe the nature of ligand interactions with GPCRs. Such experiments, as well as structure-activity relationships for series of ligands, are usually interpreted with computationally derived models of ligand binding modes. However, systematic approaches for accurate calculations of the corresponding binding free energies are still lacking. Here, we report a computational strategy to quantitatively predict the effects of alanine scanning and ligand modifications based on molecular dynamics free energy simulations. A smooth stepwise scheme for free energy perturbation calculations is derived and applied to a series of thirteen alanine mutations of the human neuropeptide Y1 receptor and a series of eight analogous antagonists. The robustness and accuracy of the method enable univocal interpretation of existing mutagenesis and binding data. We show how these calculations can be used to validate structural models and demonstrate their ability to discriminate against suboptimal ones.

    Author Summary

    G-protein coupled receptors constitute a family of drug targets of outstanding interest, with more than 30% of the marketed drugs targeting a GPCR. The combination of site-directed mutagenesis, biochemical experiments and computationally generated 3D structural models has traditionally been used to investigate these receptors. The increasing number of GPCR crystal structures now paves the way for detailed characterization of receptor-ligand interactions and energetics using advanced computer simulations. Here, we present an accurate computational scheme to predict and interpret the effects of alanine scanning experiments, based on molecular dynamics free energy simulations. We apply the technique to antagonist binding to the neuropeptide Y receptor Y1, the structure of which is still unknown. A structural model of a Y1-antagonist complex was derived and used as starting point for computational characterization of the effects on binding of alanine substitutions at thirteen different receptor positions. Further, we used the model and computational scheme to predict the binding of a series of seven antagonist analogs. The results are in excellent agreement with available experimental data and provide validation of both the methodology and structural models of the complexes.

  • 24.
    Bringeland, Nathalie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Functional Pharmacology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    DNA methylation correlation networks in overweight and normal-weight adolescents reveal differential coordination
    2013. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Multiple health issues are associated with obesity and numerous factors are causative of the disease. The role of genetic factors is well established, as is the knowledge that dietary and sedentary behavior promotes weight gain. Although there is strong suspicion towards the role of epigenetics as a driving force toward disease, this field remains l in the context of obesity. DNA methylation correlation networks were profiled from blood samples of 69 adolescents of two distinct weight classes: obese (n=35) and normal-weight (n=34). The network analysis revealed major differences in the organization of the networks, where the network of the obese had less modularity compared to that of the normal-weight. This is manifested by more and smaller clusters, pertaining to genes of related functions and pathways, in the network of the obese than in the network of the normal-weight. Consequently, this suggests that biological pathways have a lower order of coordination between each other in terms of DNA methylation in the obese than in the normal-weight. Analysis of highly connected genes, hubs, in the two networks suggests that the difference in coordination between biological pathways may be driven by changes in the methylation pattern of these hubs; highly connected genes in one network had an intriguingly low connectivity in the other. In conclusion, the results suggest differential regulation of transcription through changes in the coordination of DNA methylation in overweight and normal-weight individuals. The findings of this study are a major step towards understanding the role of DNA methylation in obesity and provide potential biomarkers for diagnosing and predicting obesity.

  • 25.
    Bäcklin, Christofer
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Machine Learning Based Analysis of DNA Methylation Patterns in Pediatric Acute Leukemia
    2015. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Acute lymphoblastic leukemia (ALL) is the most common pediatric cancer in the Nordic countries. Recent evidence indicates that DNA methylation (DNAm) plays a central role in the development and progression of the disease.

    DNAm profiles of a collection of ALL patient samples and a panel of non-leukemic reference samples were analyzed using the Infinium 450k methylation assay. State-of-the-art machine learning algorithms were used to search the large amounts of data produced for patterns predictive of future relapses, in vitro drug resistance, and cytogenetic subtypes, aiming at improving our understanding of the disease and ultimately improving treatment.

    In paper I, the predictive modeling framework developed to perform the analyses of DNAm dataset was presented. It focused on uncompromising statistical rigor and computational efficiency, while allowing a high level of modeling flexibility and usability. In paper II, the DNAm landscape of ALL was comprehensively characterized, discovering widespread aberrant methylation at diagnosis strongly influenced by cytogenetic subtype. The aberrantly methylated regions were enriched for genes repressed by polycomb group proteins, repressively marked histones in healthy cells, and genes associated with embryonic development. A consistent trend of hypermethylation at relapse was also discovered. In paper III, a tool for DNAm-based subtyping was presented, validated using blinded samples and used to re-classify samples with incomplete phenotypic information. Using RNA-sequencing, previously undetected non-canonical aberrations were found in many re-classified samples. In paper IV, the relationship between DNAm and in vitro drug resistance was investigated and predictive signatures were obtained for seven of the eight therapeutic drugs studied. Interpretation was challenging due to poor correlation between DNAm and gene expression, further complicated by the discovery that random subsets of the array can yield comparable classification accuracy. Paper V presents a novel Bayesian method for multivariate density estimation with variable bandwidths. Simulations showed comparable performance to the current state-of-the-art methods and an advantage on skewed distributions.

    In conclusion, the studies characterize the information contained in the aberrant DNAm patterns of ALL and assess its predictive capabilities for future relapses, in vitro drug sensitivity and subtyping. They also present three publicly available tools for the scientific community to use.

    List of papers
    1. Developer Friendly and Computationally Efficient Predictive Modeling without Information Leakage: The emil Package for R
    (English). In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660. Article in journal (Other academic). Submitted
    Abstract [en]

    Machine learning-based solutions to predictive modeling problems (classification, regression, or survival analysis) typically involve a number of steps beginning with data pre-processing and ending with performance evaluation. A large number of packages providing tools for the individual steps are available for R but not for facilitating the assembly of them into complete modeling procedures or rigorously evaluating their combined performance.

    We present a new package for R denoted emil (evaluation of modeling without information leakage) that is designed to be a flexible backbone of modeling procedures having the following properties: (1) Enable evaluation of performance and variable importance by means of resampling methods without introducing information leakage. (2) Return parameter tuning statistics and final prediction models. (3) Transparent, highly customizable and easy to debug structure. (4) Offer the user direct control over memory and CPU-intensive steps of the calculations. (5) Comprehensive yet concise documentation.

    First we explain emil's functionality in the context of standard usage, resampling, and customization. Specific application examples are presented to show its potential in terms of parallelization, customization for survival analysis, and memory management.

    The result is a computationally efficient and developer friendly framework that enables resampling-based analyses using several hundreds of thousands of variables, is easy to extend, and allows development of scalable solutions.
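The notion of resampling "without introducing information leakage" can be illustrated with a small sketch. It is written in Python rather than R and does not use or reproduce the emil API; every function name below is hypothetical. The point it shows is that any data-dependent pre-processing (here, standardization) is fitted on the training fold only and then applied unchanged to the held-out fold. The data are simulated and the classifier is a simple nearest-centroid rule chosen only to keep the example self-contained.

```python
import random
import statistics

def k_folds(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint, shuffled test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_scaler(rows):
    """Column means and standard deviations, estimated from `rows` only."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero SD
    return means, sds

def transform(rows, scaler):
    means, sds = scaler
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def fit_centroids(rows, labels):
    """Nearest-centroid model: one mean vector per class."""
    centroids = {}
    for lab in set(labels):
        members = [r for r, l in zip(rows, labels) if l == lab]
        centroids[lab] = [statistics.fmean(c) for c in zip(*members)]
    return centroids

def predict(centroids, row):
    def sq_dist(lab):
        return sum((x - c) ** 2 for x, c in zip(row, centroids[lab]))
    return min(centroids, key=sq_dist)

def cross_validate(X, y, k=5):
    """Per-fold accuracy; the scaler never sees the held-out fold."""
    accuracies = []
    for test_idx in k_folds(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        scaler = fit_scaler(X_train)  # fitted on the training fold only
        model = fit_centroids(transform(X_train, scaler), y_train)
        X_test = transform([X[i] for i in test_idx], scaler)
        correct = sum(predict(model, r) == y[i] for r, i in zip(X_test, test_idx))
        accuracies.append(correct / len(test_idx))
    return accuracies

# Simulated two-class data, for illustration only.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

accs = cross_validate(X, y, k=5)
print("fold accuracies:", accs)
```

Fitting the scaler on the full data set before splitting would be the leaky variant: the held-out observations would then influence the pre-processing applied to the training data.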

    Keywords
    predictive modeling, machine learning, performance evaluation, resampling, high performance computing
    National Category
    Computational Mathematics
    Research subject
    Materials Science
    Identifiers
    urn:nbn:se:uu:diva-242353 (URN)
    Funder
    Swedish Foundation for Strategic Research, RBc08-008
    Available from: 2015-01-25 Created: 2015-01-25 Last updated: 2017-12-05 Bibliographically approved
    2. Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia
    2013 (English). In: Genome Biology, ISSN 1465-6906, E-ISSN 1474-760X, Vol. 14, no 9, p. r105. Article in journal (Refereed). Published
    Abstract [en]

    BACKGROUND:

    Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resistance and relapse in ALL is poorly understood.

    RESULTS:

    We surveyed the DNA methylation levels of 435,941 CpG sites in samples from 764 children at diagnosis of ALL and from 27 children at relapse. This survey uncovered four characteristic methylation signatures. First, compared with control blood cells, the methylomes of ALL cells shared 9,406 predominantly hypermethylated CpG sites, independent of cytogenetic background. Second, each cytogenetic subtype of ALL displayed a unique set of hyper- and hypomethylated CpG sites. The CpG sites that constituted these two signatures differed in their functional genomic enrichment to regions with marks of active or repressed chromatin. Third, we identified subtype-specific differential methylation in promoter and enhancer regions that were strongly correlated with gene expression. Fourth, a set of 6,612 CpG sites was predominantly hypermethylated in ALL cells at relapse, compared with matched samples at diagnosis. Analysis of relapse-free survival identified CpG sites with subtype-specific differential methylation that divided the patients into different risk groups, depending on their methylation status.

    CONCLUSIONS:

    Our results suggest an important biological role for DNA methylation in the differences between ALL subtypes and in their clinical outcome after treatment.

    National Category
    Medical Genetics
    Identifiers
    urn:nbn:se:uu:diva-208296 (URN); 10.1186/gb-2013-14-9-r105 (DOI); 000328195700011 (ISI); 24063430 (PubMedID)
    Note

    The first two authors share first authorship.

    Available from: 2013-09-27 Created: 2013-09-27 Last updated: 2018-01-11 Bibliographically approved
    3. DNA methylation-based subtype prediction for pediatric acute lymphoblastic leukemia
    2015 (English). In: Clinical Epigenetics, E-ISSN 1868-7083, Vol. 7, article id 11. Article in journal (Refereed). Published
    Abstract [en]

    Background

    We present a method that utilizes DNA methylation profiling for prediction of the cytogenetic subtypes of acute lymphoblastic leukemia (ALL) cells from pediatric ALL patients. The primary aim of our study was to improve risk stratification of ALL patients into treatment groups using DNA methylation as a complement to current diagnostic methods. A secondary aim was to gain insight into the functional role of DNA methylation in ALL.

    Results

    We used the methylation status of ~450,000 CpG sites in 546 well-characterized patients with T-ALL or seven recurrent B-cell precursor ALL subtypes to design and validate sensitive and accurate DNA methylation classifiers. After repeated cross-validation, a final classifier was derived that consisted of only 246 CpG sites. The mean sensitivity and specificity of the classifier across the known subtypes was 0.90 and 0.99, respectively. We then used DNA methylation classification to screen for subtype membership of 210 patients with undefined karyotype (normal or no result) or non-recurrent cytogenetic aberrations (‘other’ subtype). Nearly half (n = 106) of the patients lacking cytogenetic subgrouping displayed highly similar methylation profiles as the patients in the known recurrent groups. We verified the subtype of 20% of the newly classified patients by examination of diagnostic karyotypes, array-based copy number analysis, and detection of fusion genes by quantitative polymerase chain reaction (PCR) and RNA-sequencing (RNA-seq). Using RNA-seq data from ALL patients where cytogenetic subtype and DNA methylation classification did not agree, we discovered several novel fusion genes involving ETV6, RUNX1, and PAX5.
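For readers unfamiliar with how a "mean sensitivity and specificity ... across the known subtypes" is obtained for a multi-class classifier, the following sketch computes one-vs-rest sensitivity and specificity per class and then averages them. The labels are hypothetical placeholders ("A", "B", "C") and do not correspond to the study's subtypes or results; whether the authors averaged in exactly this unweighted way is an assumption.

```python
def per_class_sens_spec(y_true, y_pred):
    """One-vs-rest sensitivity and specificity for each class label."""
    results = {}
    pairs = list(zip(y_true, y_pred))
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)
        results[cls] = (tp / (tp + fn), tn / (tn + fp))
    return results

# Hypothetical true and predicted labels, for illustration only.
y_true = ["A", "A", "B", "B", "B", "C", "C", "C"]
y_pred = ["A", "A", "B", "B", "C", "C", "C", "B"]

results = per_class_sens_spec(y_true, y_pred)
mean_sens = sum(s for s, _ in results.values()) / len(results)
mean_spec = sum(sp for _, sp in results.values()) / len(results)
print(results)
print(f"mean sensitivity = {mean_sens:.2f}, mean specificity = {mean_spec:.2f}")
```

With many classes, specificity tends to be high almost by construction because each class's negatives vastly outnumber its positives, which is worth keeping in mind when comparing the two averages.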

    Conclusions

    Our findings indicate that DNA methylation profiling contributes to the clarification of the heterogeneity in cytogenetically undefined ALL patient groups and could be implemented as a complementary method for diagnosis of ALL. The results of our study provide clues to the origin and development of leukemic transformation. The methylation status of the CpG sites constituting the classifiers also highlights relevant biological characteristics in otherwise unclassified ALL patients.

    National Category
    Hematology
    Identifiers
    urn:nbn:se:uu:diva-242351 (URN); 10.1186/s13148-014-0039-z (DOI); 000350260800001 (ISI); 25729447 (PubMedID)
    Funder
    Swedish Foundation for Strategic Research, RBc08-008
    Note

    The last two authors share last authorship.

    Available from: 2015-01-25 Created: 2015-01-25 Last updated: 2017-12-05 Bibliographically approved
    4. DNA methylation-based prediction of in vitro drug resistance in primary pediatric acute lymphoblastic leukemia patient samples
    (English). Manuscript (preprint) (Other academic)
    National Category
    Cancer and Oncology; Hematology; Bioinformatics (Computational Biology)
    Identifiers
    urn:nbn:se:uu:diva-242543 (URN)
    Funder
    Swedish Foundation for Strategic Research, RBc08-008
    Available from: 2015-01-27 Created: 2015-01-27 Last updated: 2018-01-11
    5. Bayesian model averaging of adaptive bandwidth kernel density estimators yields state-of-the-art performance
    (English). Manuscript (preprint) (Other academic)
    Keywords
    Variable kernel density estimation, adaptive kernel density estimation, Bayesian model averaging, variable bandwidth, square root law
    National Category
    Probability Theory and Statistics
    Identifiers
    urn:nbn:se:uu:diva-242354 (URN)
    Funder
    Swedish Foundation for Strategic Research, RBc08-008; EU, FP7, Seventh Framework Programme, PROACTIVE
    Available from: 2015-01-27 Created: 2015-01-25 Last updated: 2015-03-11
  • 26.
    Bäcklin, Christofer
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Freyhult, Eva
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Frost, Britt-Marie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Women's and Children's Health, Pediatrics.
    Palle, Josefine
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Women's and Children's Health, Pediatrics.
    Larsson, Rolf
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Syvänen, Ann-Christine
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular Medicine.
    Lönnerholm, Gudmar
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Women's and Children's Health, Pediatrics.
    Gustafsson, Mats
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    DNA methylation-based prediction of in vitro drug resistance in primary pediatric acute lymphoblastic leukemia patient samples
    Manuscript (preprint) (Other academic)
  • 27.
    Capuccini, Marco
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Structure-Based Virtual Screening in Spark
    2015. Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis
  • 28.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Ahmed, Laeeq
    Schaal, Wesley
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Laure, Erwin
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Large-scale virtual screening on public cloud resources with Apache Spark
    2017. In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 9, article id 15. Article in journal (Refereed)
  • 29.
    Carreras-Puigvert, Jordi
    et al.
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Zitnik, Marinka
    Univ Ljubljana, Fac Comp & Informat Sci, SI-1000 Ljubljana, Slovenia.; Stanford Univ, Dept Comp Sci, Palo Alto, CA 94305 USA.
    Jemth, Ann-Sofie
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Carter, Megan
    Stockholm Univ, Dept Biochem & Biophys, S-10691 Stockholm, Sweden.
    Unterlass, Judith E
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Hallström, Björn
    KTH Royal Inst Technol, Sci Life Lab, Cell Profiling Affin Prote, S-17165 Stockholm, Sweden.
    Loseva, Olga
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Karem, Zhir
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Calderón-Montaño, José Manuel
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    Lindskog, Cecilia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Edqvist, Per-Henrik D
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Experimental and Clinical Oncology.
    Matuszewski, Damian J.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Ait Blal, Hammou
    KTH Royal Inst Technol, Sci Life Lab, Cell Profiling Affin Prote, S-17165 Stockholm, Sweden.
    Berntsson, Ronnie P A
    Stockholm Univ, Dept Biochem & Biophys, S-10691 Stockholm, Sweden.
    Häggblad, Maria
    Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Biochem & Cellular Screening Facil, S-17165 Stockholm, Sweden.
    Martens, Ulf
    Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Biochem & Cellular Screening Facil, S-17165 Stockholm, Sweden.
    Studham, Matthew
    Stockholm Univ, Dept Biochem & Biophys, Stockholm Bioinformat Ctr, Sci Life Lab, Box 1031, S-17121 Solna, Sweden.
    Lundgren, Bo
    Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Biochem & Cellular Screening Facil, S-17165 Stockholm, Sweden.
    Wählby, Carolina
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Sonnhammer, Erik L L
    Stockholm Univ, Dept Biochem & Biophys, Stockholm Bioinformat Ctr, Sci Life Lab, Box 1031, S-17121 Solna, Sweden.
    Lundberg, Emma
    KTH Royal Inst Technol, Sci Life Lab, Cell Profiling Affin Prote, S-17165 Stockholm, Sweden.
    Stenmark, Pål
    Stockholm Univ, Dept Biochem & Biophys, S-10691 Stockholm, Sweden.
    Zupan, Blaz
    Univ Ljubljana, Fac Comp & Informat Sci, SI-1000 Ljubljana, Slovenia.; Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA.
    Helleday, Thomas
    Karolinska Inst, Div Translat Med & Chem Biol, Dept Mol Biochem & Biophys, Sci Life Lab, S-17165 Stockholm, Sweden.
    A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family
    2017. In: Nature Communications, ISSN 2041-1723, E-ISSN 2041-1723, Vol. 8, no 1, article id 1541. Article in journal (Refereed)
    Abstract [en]

    The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.

  • 30.
    Che, Huiwen
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Evaluation of de novo assembly using PacBio long reads. 2016. Independent thesis Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    New sequencing technologies show promise for the construction of complete and accurate genome sequences through de novo assembly, a process that joins overlapping reads into longer contiguous sequences without the need for a reference genome. High-quality de novo assembly leads to a better understanding of genetic variation. The purpose of this thesis is to evaluate human genome sequences obtained from the PacBio sequencing platform, a new technology suitable for de novo assembly of large genomes. The evaluation focuses on comparing sequence identity between our own de novo assemblies and the available human reference, and thereby benchmarking the accuracy of our data. Sequences that are absent from the reference genome are also investigated for potential unannotated genes. We also assess complex structural variation using different approaches. Our assemblies show high consensus with the human reference genome, with ~98.6% of the bases in the assemblies mapped to the human reference. We also detect more than ten thousand structural variants, including some large rearrangements, with respect to the reference.

  • 31.
    Christoffersson, Gustaf
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Cell Biology.
    Lomei, Jalal
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Cell Biology.
    O'Callaghan, Paul
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Cell Biology.
    Kreuger, Johan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Cell Biology.
    Engblom, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Phillipson, Mia
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Cell Biology.
    Vascular sprouts induce local attraction of proangiogenic neutrophils. 2017. In: Journal of Leukocyte Biology, ISSN 0741-5400, E-ISSN 1938-3673, Vol. 102, p. 741-751. Article in journal (Refereed)
  • 32.
    Clauson, Björn
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Evaluation of methodologies for estimation of change in systemic drug exposure in renally impaired patients: Elucidation of possible causes to discrepancies in results based on phase I and III data. 2015. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Introduction: Regulatory authorities require certain subpopulations, such as patients with renal impairment (RI) to be studied specifically. This may be done in phase I analyzed with Non-Compartmental Analysis (NCA), and/or as part of phase III utilizing population pharmacokinetic (PopPK) methods. However, it has been suggested that phase I data analyzed with NCA may overestimate the effect of RI, as compared with PopPK analysis.

    Aim: This project aimed to investigate causes for the discrepancy previously observed when calculating the exposure increase over different RI groups based on phase I and III data, and to examine the effect of erroneous assumptions made during PopPK model development, which can be of potential benefit in drug development.

    Materials and Methods: Phase I and III data were simulated based on PopPK models. Potential causes, related to the methods used, to the over-prediction by NCA were investigated. For phase III data the influence of model misspecification on the estimation of exposure increase in RI was explored.

    Results: The observed over-predictions by NCA were suggested to be due mainly to sub-optimal NCA and bias calculations, the latter with respect to the creatinine clearance (CrCL) reference value. In PopPK analysis of phase III data, using an erroneous structural and/or covariate model may result in severe bias in the estimation of the effect of RI, while disregarding the effect of inter-occasion variability led to low bias.

    Conclusions: The previously observed over-prediction by the NCA method appears to mainly be an artefact due to inappropriate methodology. When investigating exposure increase in RI patients using PopPK for phase III data, careful consideration regarding assumptions should be made, especially with lower fraction excreted, as results suggest large bias when an erroneous PopPK model is applied. 
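
    The abstract does not spell out the NCA calculation. As a minimal sketch of one common NCA building block, the following computes the area under the concentration-time curve (AUC) with the linear trapezoidal rule and expresses the exposure change as a ratio of AUCs. The function names and the example numbers are hypothetical and are not taken from the thesis.

```python
def auc_linear_trapezoid(times, concentrations):
    """AUC from the first to the last sample by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matching time/concentration points")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        area += dt * (concentrations[i] + concentrations[i - 1]) / 2.0
    return area


def exposure_ratio(auc_impaired, auc_reference):
    """Fold change in exposure of an impaired group relative to a reference group."""
    return auc_impaired / auc_reference


# Illustrative (made-up) profiles: same sampling times, different concentrations.
reference_auc = auc_linear_trapezoid([0, 1, 2], [0.0, 2.0, 0.0])
impaired_auc = auc_linear_trapezoid([0, 1, 2], [0.0, 3.0, 0.0])
ratio = exposure_ratio(impaired_auc, reference_auc)
```

    In such a calculation the choice of reference group (for example, the CrCL value taken to represent normal renal function) directly scales the ratio, which is the kind of sensitivity the Results paragraph points to.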

  • 33.
    Dahlqvist, Bengt
    et al.
    Uppsala University.
    Bengtsson, Ewert
    Uppsala University.
    Eriksson, Olle
    Uppsala University.
    Jarkrans, Torsten
    Uppsala University.
    Nordin, Bo
    Uppsala University.
    Stenkvist, Björn
    A Computer Program for Logistic Prediction Modelling. 1985. In: Computer Programs in Biomedicine, ISSN 0010-468X, no 19, p. 235-238. Article in journal (Refereed)
  • 34.
    Dahlö, Martin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Haziza, Frédéric
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
    Kallio, Aleksi
    Korpelainen, Eija
    Bongcam-Rudloff, Erik
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    BioImg.org: A catalog of virtual machine images for the life sciences. 2015. In: Bioinformatics and Biology Insights, ISSN 1177-9322, E-ISSN 1177-9322, Vol. 9, p. 125-128. Article in journal (Refereed)
    Abstract [en]

    Virtualization is becoming increasingly important in bioscience, enabling assembly and provisioning of complete computer setups, including operating system, data, software, and services packaged as virtual machine images (VMIs). We present an open catalog of VMIs for the life sciences, where scientists can share information about images and optionally upload them to a server equipped with a large file system and fast Internet connection. Other scientists can then search for and download images that can be run on the local computer or in a cloud computing environment, providing easy access to bioinformatics environments. We also describe applications where VMIs aid life science research, including distributing tools and data, supporting reproducible analysis, and facilitating education.

  • 35.
    Dakshinamurthi, Ashwin Kumar
    et al.
    Department of Biotechnology, Sri Venkateswara College of Engineering, Sriperumbudur, Tamilnadu, India.
    Chidambaram, Manthira Vasagam
    Department of Biotechnology, Sri Venkateswara College of Engineering, Sriperumbudur, Tamilnadu, India.
    Manivel, Vivek Anand
    Department of Biotechnology, Sri Venkateswara College of Engineering, Sriperumbudur, Tamilnadu, India.
    Detchanamurthy, Swaminathan
    Department of Chemical and Process Engineering, University of Canterbury, Christchurch, New Zealand.
    Site directed mutagenesis of human Interleukin-2 gene to increase the stability of the gene product: A Bioinformatics Approach. 2009. In: International Journal of Bioinformatics Research, ISSN 0975–3087, Vol. 1, no 2, p. 4-13. Article in journal (Refereed)
    Abstract [en]

    Interleukin-2 (IL-2) is an immunoregulatory cytokine whose biological effects are mediated through interaction with specific receptors on the surface of target cells. Due to its presumed role in generating a normal immune response, IL-2 is being evaluated for the treatment of a variety of tumors, in addition to infectious diseases. The main drawback of human IL-2 is that the molecule is relatively unstable. Therefore, with the objective of increasing the stability of the molecule, site-directed mutagenesis of the human IL-2 gene was carried out. Early studies indicated that mutations at three cysteine residues (58, 105, 125), which are in the active sites of human IL-2, reduced both the stability and the biological activity of the molecule. Therefore, mutations were carried out at amino acid positions outside the receptor binding sites: Valine 111 to Arginine, Lysine 117 to Glutamine and Threonine 133 to Asparagine. These positions were chosen by comparing the human sequence with the bovine sequence, which has higher stability than its human counterpart, using the SWISS PDB tool. To understand the biological activity of the mutated IL-2, energy minimization studies were carried out using SWISS-PDB. Docking studies between the IL-2 receptor and its mutated ligand were performed with HEX DOCK, ARGUS LAB and PATCH DOCK to check the reliability of the results. These docking results also confirmed the reliability of the mutated IL-2 gene. The stability, half-life and ADME characteristics of these mutants can be studied in detail in in vivo studies.

  • 36.
    Das, Sarbashis
    et al.
    Duggal, Priyanka
    Roy, Rahul
    Myneedu, Vithal P
    Behera, Digamber
    Prasad, Hanumanthappa K
    Bhattacharya, Alok
    Identification of hot and cold spots in genome of Mycobacterium tuberculosis using Shewhart Control Charts. 2012. In: Scientific reports, ISSN 2045-2322, Vol. 2, p. 297. Article in journal (Refereed)
    Abstract [en]

    The organization of genomic sequences is dynamic and undergoes change during the process of evolution. Many of the variations arise spontaneously, and the observed genomic changes can either be distributed uniformly throughout the genome or be preferentially localized to some regions (hot spots) compared to others. Conversely, cold spots tend to accumulate very few variations or none at all. In order to identify such regions statistically, we have developed a method based on the Shewhart control chart. The method was used for identification of hot and cold spots of single-nucleotide variations (SNVs) in Mycobacterium tuberculosis genomes. The predictions have been validated by sequencing some of these regions derived from clinical isolates. This method can be used for analysis of other genome sequences, particularly those of infectious microbes.
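
    The abstract does not state which chart variant or window size the authors used. As a minimal sketch of the general idea, the following bins SNV positions into fixed-size windows and flags windows whose counts fall outside generic Shewhart-style limits (mean plus or minus k standard deviations, with k = 3 by convention). The function names, the window scheme and the example counts are assumptions made for illustration.

```python
from statistics import mean, pstdev


def window_counts(snv_positions, genome_length, window):
    """Count SNVs (0-based positions) in consecutive fixed-size windows."""
    n_windows = (genome_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snv_positions:
        counts[pos // window] += 1
    return counts


def shewhart_flags(counts, k=3.0):
    """Return (hot, cold) window indices outside mean +/- k*sigma control limits."""
    center = mean(counts)
    sigma = pstdev(counts)
    upper = center + k * sigma
    lower = max(0.0, center - k * sigma)  # counts cannot be negative
    hot = [i for i, c in enumerate(counts) if c > upper]
    cold = [i for i, c in enumerate(counts) if c < lower]
    return hot, cold


# Made-up example: 19 windows with 2 SNVs each and one window with 20.
example_counts = [2] * 19 + [20]
hot_windows, cold_windows = shewhart_flags(example_counts)
```

    With sparse counts the lower limit is often clipped to zero, so no window can be flagged as cold; detecting cold spots then needs larger windows or a different count model.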

  • 37.
    Das, Sarbashis
    et al.
    Roychowdhury, Tanmoy
    Kumar, Parameet
    Kumar, Anil
    Kalra, Priya
    Singh, Jitendra
    Singh, Sarman
    Prasad, H K
    Bhattacharya, Alok
    Genetic heterogeneity revealed by sequence analysis of Mycobacterium tuberculosis isolates from extra-pulmonary tuberculosis patients. 2013. In: BMC genomics, ISSN 1471-2164, Vol. 14, p. 404. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Tuberculosis remains a major public health problem. Clinical tuberculosis manifests often as pulmonary and occasionally as extra-pulmonary tuberculosis. The emergence of drug resistant tubercle bacilli and its association with HIV is a formidable challenge to curb the spread of tuberculosis. There have been concerted efforts by whole genome sequencing and bioinformatics analysis to identify genomic patterns and to establish a relationship between the genotype of the organism and clinical manifestation of tuberculosis. Extra-pulmonary TB constitutes 15-20 percent of the total clinical cases of tuberculosis reported among immunocompetent patients, whereas among HIV patients the incidence is more than 50 percent. Genomic analysis of M. tuberculosis isolates from extra pulmonary patients has not been explored.

    RESULTS: The genomic DNA of 5 extra-pulmonary clinical isolates of M. tuberculosis, derived from cerebrospinal fluid and lymph node fine needle aspirates (FNAC) / biopsies, was sequenced. A next generation sequencing (NGS) approach was employed to identify single nucleotide variations (SNVs), and computational methods were used to predict their consequences for functional genes. Analysis of the distribution of SNVs led to the finding that there are mixed genotypes in patient isolates and that many SNVs are likely to influence either gene function or gene expression. The phylogenetic relationship between the isolates correlated with the origin of the isolates. In addition, insertion sites of IS elements were identified, and their distribution revealed a variation in the number and position of the element in the 5 extra-pulmonary isolates compared to the reference M. tuberculosis H37Rv strain.

    CONCLUSIONS: The results suggest that NGS is able to identify small variations in genomes of M. tuberculosis isolates, including changes in IS element insertion sites. Moreover, variations in isolates of M. tuberculosis from non-pulmonary sites were documented. The analysis of our results indicates genomic heterogeneity in the clinical isolates.

  • 38.
    Das, Sarbashis
    et al.
    Vishnoi, Anchal
    Bhattacharya, Alok
    ABWGAT: anchor-based whole genome analysis tool. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 24, p. 3319-20. Article in journal (Refereed)
    Abstract [en]

    SUMMARY: Large numbers of genomes are being sequenced regularly and the rate will go up in future due to the availability of new genome sequencing techniques. In order to understand genotype to phenotype relationships, it is necessary to identify sequence variations at the genomic level. Alignment of a pair of genomes and parsing the alignment data is an accepted approach for identification of variations. Though there are a number of tools available for whole-genome alignment, none of these allows automatic parsing of the alignment and identification of different kinds of genomic variants with a high degree of sensitivity. Here we present an easy-to-use web-based interface for whole genome comparison named ABWGAT (Anchor-Based Whole Genome Analysis Tool). The output is a list of variations such as SNVs, indels, repeat expansions and inversions.

    AVAILABILITY: The web server is freely available to non-commercial users at the following address http://abwgc.jnu.ac.in/_sarba. Supplementary data are available at http://abwgc.jnu.ac.in/_sarba/cgi-bin/abwgc_retrival.cgi using job id 524, 526 and 528.

    CONTACT: dsarbashis@gmail.com; alok.bhattacharya@gmail.com

  • 39.
    D'Elia, Domenica
    et al.
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Gisel, Andreas
    Institute for Biomedical Technologies, CNR, Via Amendola 122/D, 70126 Bari, Italy.
    Eriksson, Nils-Einar
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Kossida, Sophia
    Bioinformatics & Medical Informatics Team, Biomedical Research Foundation of the Academy of Athens, 11527 Athens, Greece.
    Mattila, Kimmo
    CSC – IT Center for Science Ltd., Keilaranta 14, 02100 Espoo, Finland.
    Klucar, Lubos
    Institute of Molecular Biology, Slovak Academy of Sciences, Dubravska cesta 21, 84551 Bratislava, Slovakia.
    Bongcam-Rudloff, Erik
    Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, 75024 Uppsala, Sweden.
    The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, no Suppl. 6, p. S1. Article in journal (Refereed)
    Abstract [en]

    The EMBnet Conference 2008, focusing on 'Leading Applications and Technologies in Bioinformatics', was organized by the European Molecular Biology network (EMBnet) to celebrate its 20th anniversary. Since its foundation in 1988, EMBnet has been working to promote collaborative development of bioinformatics services and tools to serve the European community of molecular biology laboratories. This conference was the first meeting organized by the network that was open to the international scientific community outside EMBnet. The conference covered a broad range of research topics in bioinformatics with a main focus on new achievements and trends in emerging technologies supporting genomics, transcriptomics and proteomics analyses such as high-throughput sequencing and data managing, text and data-mining, ontologies and Grid technologies. Papers selected for publication, in this supplement to BMC Bioinformatics, cover a broad range of the topics treated, providing also an overview of the main bioinformatics research fields that the EMBnet community is involved in.

  • 40.
    Dvirnas, Albertas
    et al.
    Lund Univ, Dept Astron & Theoret Phys, Lund, Sweden.
    Pichler, Christoffer
    Lund Univ, Dept Astron & Theoret Phys, Lund, Sweden.
    Stewart, Callum L.
    Lund Univ, Dept Astron & Theoret Phys, Lund, Sweden.
    Quaderi, Saair
    Lund Univ, Dept Astron & Theoret Phys, Lund, Sweden.; Chalmers Univ Technol, Dept Biol & Biol Engn, Gothenburg, Sweden.
    Nyberg, Lena K.
    Chalmers Univ Technol, Dept Biol & Biol Engn, Gothenburg, Sweden.
    Muller, Vilhelm
    Chalmers Univ Technol, Dept Biol & Biol Engn, Gothenburg, Sweden.
    Bikkarolla, Santosh Kumar
    Chalmers Univ Technol, Dept Biol & Biol Engn, Gothenburg, Sweden.
    Kristiansson, Erik
    Univ Gothenburg, Chalmers Univ Technol, Dept Math Sci, Gothenburg, Sweden.
    Sandegren, Linus
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Westerlund, Fredrik
    Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden.
    Ambjornsson, Tobias
    Lund Univ, Dept Astron & Theoret Phys, Lund, Sweden.
    Facilitated sequence assembly using densely labeled optical DNA barcodes: A combinatorial auction approach. 2018. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 13, no 3, article id e0193900. Article in journal (Refereed)
    Abstract [en]

    The output from whole genome sequencing is a set of contigs, i.e. short non-overlapping DNA sequences (sizes 1-100 kilobasepairs). Piecing the contigs together is an especially difficult task for previously unsequenced DNA, and may not be feasible due to factors such as the lack of sufficient coverage or larger repetitive regions which generate gaps in the final sequence. Here we propose a new method for scaffolding such contigs. The proposed method uses densely labeled optical DNA barcodes from competitive binding experiments as scaffolds. On these scaffolds we position theoretical barcodes which are calculated from the contig sequences. This allows us to construct longer DNA sequences from the contig sequences. This proof-of-principle study extends previous studies which use sparsely labeled DNA barcodes for scaffolding purposes. Our method applies a probabilistic approach that allows us to discard "foreign" contigs from mixed samples with contigs from different types of DNA. We satisfy the contig non-overlap constraint by formulating the contig placement challenge as a combinatorial auction problem. Our exact algorithm for solving this problem reduces computational costs compared to previous methods in the combinatorial auction field. We demonstrate the usefulness of the proposed scaffolding method both for synthetic contigs and for contigs obtained using Illumina sequencing for a mixed sample with plasmid and chromosomal DNA.

  • 41.
    Dyakova, O.
    et al.
    Mueller, M.M.
    Egelhaaf, M.
    Nordström, K.
    Predicting unconstrained field flight behaviour from image statistics. Article in journal (Refereed)
  • 42.
    Emami Khoonsari, Payam
    et al.
    Moreno, Pablo
    Bergmann, Sven
    Burman, Joachim
    Capuccini, Marco
    Carone, Matteo
    Cascante, Marta
    Atauri, Pedro de
    Dudova, Zdenka
    Foguet, Carles
    Gonzalez-Beltran, Alejandra
    Hankemeier, Thomas
    Haug, Kenneth
    He, Sijin
    Herman, Stephanie
    Johnson, David
    Kale, Namrata
    Larsson, Anders
    Salek, Reza M
    Neumann, Steffen
    Peters, Kristian
    Pireddu, Luca
    Rocca-Serra, Philippe
    Roger, Pierrick
    Rueedi, Rico
    Ruttkies, Christoph
    Sadawi, Noureddin
    Sansone, Susanna-Assunta
    Schober, Daniel
    Selivanov, Vitaly
    Thévenot, Etienne A.
    Vliet, Michael van
    Zanetti, Gianluigi
    Steinbeck, Christoph
    Kultima, Kim
    Spjuth, Ola
    Interoperable and scalable metabolomics data analysis with microservices. Manuscript (preprint) (Other academic)
    Abstract [en]

    Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We here present a generic method based on microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The method was developed within the PhenoMeNal consortium to support flexible metabolomics data analysis and was designed as a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, multivariate statistics, and metabolite identification. Microservices are a generic methodology that can serve any scientific discipline and open up new types of large-scale integrative data analysis.

  • 43.
    Foster, David V.
    et al.
    Pluribus Systems.
    Rorick, Mary M.
    Howard Hughes Medical Institute, Univ. of Michigan.
    Gesell, Tanja
    University of Vienna.
    Feeney, Laura Marie
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
    Foster, Jacob G.
    University of Chicago.
    Dynamic landscapes: A model of context and contingency in evolution. 2013. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 334, p. 162-172. Article in journal (Refereed)
    Abstract [en]

    Although the basic mechanics of evolution have been understood since Darwin, debate continues over whether macroevolutionary phenomena are driven by the fitness structure of genotype space or by ecological interaction. In this paper we propose a simple model capturing key features of fitness-landscape and ecological models of evolution. Our model describes evolutionary dynamics in a high-dimensional, structured genotype space with interspecies interaction. We find promising qualitative similarity with the empirical facts about macroevolution, including broadly distributed extinction sizes and realistic exploration of the genotype space. The abstraction of our model permits numerous applications beyond macroevolution, including protein and RNA evolution.

  • 44.
    Fowlkes, Charless C.
    et al.
    Department of Computer Science, University of California Irvine.
    Eckenrode, Kelly B.
    Department of Systems Biology, Harvard Medical School.
    Bragdon, Meghan D.
    Department of Systems Biology, Harvard Medical School.
    Meyer, Miriah
    School of Engineering and Applied Sciences, Harvard University.
    Wunderlich, Zeba
    Department of Systems Biology, Harvard Medical School.
    Simirenko, Lisa
    California Institute for Quantitative Biosciences, University of California Berkeley.
    Luengo Hendriks, Cris L.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Keränen, Soile V. E.
    Genomics and Life Sciences Division, Lawrence Berkeley National Laboratory.
    Henriquez, Clara
    Genomics and Life Sciences Division, Lawrence Berkeley National Laboratory.
    Knowles, David W.
    Genomics and Life Sciences Division, Lawrence Berkeley National Laboratory.
    Biggin, Mark D.
    Genomics and Life Sciences Division, Lawrence Berkeley National Laboratory.
    Eisen, Michael B.
    California Institute for Quantitative Biosciences, University of California Berkeley.
    DePace, Angela H.
    Department of Systems Biology, Harvard Medical School.
    A Conserved Developmental Patterning Network Produces Quantitatively Different Output in Multiple Species of Drosophila. 2011. In: PLoS Genetics, ISSN 1553-7390, Vol. 7, no 10, p. e1002346. Article in journal (Refereed)
  • 45.
    Freyhult, Eva
    Uppsala University, Disciplinary Domain of Science and Technology, Faculty of Science and Technology, Biology, The Linnaeus Centre for Bioinformatics.
    A Study in RNA Bioinformatics: Identification, Prediction and Analysis. 2007. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Research in the last few decades has revealed the great capacity of the RNA molecule. RNA, which was previously assumed to play a main role only as an intermediate in the translation of genes to proteins, is today known to play many important roles in the cell in addition to those of messenger RNA and transfer RNA, including the ability to catalyze reactions and to regulate genes at various levels.

    This thesis investigates several computational aspects of RNA. We will discuss identification of novel RNAs and RNAs that are known to exist in related species, RNA secondary structure prediction, as well as more general tools for analyzing, visualizing and classifying RNA sequences.

    We present two benchmark studies concerning RNA identification, both de novo identification/characterization of single RNA sequences and homology search methods.

    We develop a novel algorithm for analysis of the RNA folding landscape that is based on the nearest neighbor energy model adopted in many secondary structure prediction programs. We implement this algorithm, which computes structural neighbors of a given RNA secondary structure, in the program RNAbor, which is accessible on a web server.

    Furthermore, we combine a mutual information based structure prediction algorithm with a sequence logo visualization to create a novel visualization tool for analyzing an RNA alignment and identifying covarying sites.
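
    The summary does not give the formula used in the thesis. As a minimal sketch of the standard quantity, the following computes the mutual information (in bits) between two columns of an RNA alignment from observed frequencies; covarying, base-paired columns score high, and fully conserved columns score zero. The function names and the toy alignment are assumptions made for illustration, and gap handling and small-sample corrections are left out.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return the characters found in column i of an alignment (list of equal-length strings)."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information in bits between alignment columns i and j."""
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # Compare the joint frequency with the product of the marginals.
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 3 covary as Watson-Crick pairs, columns 1 and 2 are conserved.
toy_alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]
covarying = mutual_information(toy_alignment, 0, 3)
conserved = mutual_information(toy_alignment, 1, 2)
```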

    Finally, we present extensions to sequence logos for the purpose of tRNA identity analysis. We introduce function logos, which display features that distinguish functional subclasses within a large set of structurally related sequences, as well as the inverse logos, which display underrepresented features. For the purpose of comparing tRNA identity elements between different taxa we introduce two contrasting logos, the information difference and the Kullback-Leibler divergence difference logos.

    List of papers
    1. A comparison of RNA folding measures
    2005. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 6, p. 241. Article in journal (Refereed). Published
    Identifiers
    urn:nbn:se:uu:diva-96423 (URN)
    Available from: 2007-11-13. Created: 2007-11-13. Bibliographically approved
    2. Exploring genomic dark matter: A critical assessment of the performance of homology search methods on noncoding RNA
    2007 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 17, no 1, p. 117-125. Article in journal (Refereed). Published
    Abstract [en]

    Homology search is one of the most ubiquitous bioinformatic tasks, yet it is unknown how effective the currently available tools are for identifying noncoding RNAs (ncRNAs). In this work, we use reliable ncRNA data sets to assess the effectiveness of methods such as BLAST, FASTA, HMMer, and Infernal. Surprisingly, the most popular homology search methods are often the least accurate. As a result, many studies have used inappropriate tools for their analyses. On the basis of our results, we suggest homology search strategies using the currently available tools and some directions for future development.

    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-96424 (URN); 10.1101/gr.5890907 (DOI); 000243191400015 (); 17151342 (PubMedID)
    Available from: 2007-11-13. Created: 2007-11-13. Last updated: 2017-12-14. Bibliographically approved
    3. Boltzmann probability of RNA structural neighbors and riboswitch detection
    2007 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 16, p. 2054-2062. Article in journal (Refereed). Published
    Abstract [en]

    Motivation: We describe algorithms implemented in a new software package, RNAbor, to investigate structures in a neighborhood of an input secondary structure $S$ of an RNA sequence s. The input structure could be the minimum free energy structure, the secondary structure obtained by analysis of the X-ray structure or by comparative sequence analysis, or an arbitrary intermediate structure.

    Results: A secondary structure $T$ of s is called a $\delta$-neighbor of $S$ if $T$ and $S$ differ by exactly $\delta$ base pairs. RNAbor computes the number (N), the Boltzmann partition function (Z) and the minimum free energy (MFE) and corresponding structure over the collection of all $\delta$-neighbors of $S$. This computation is done simultaneously for all $\delta \le m$, in run time $O(mn^3)$ and memory $O(mn^2)$, where n is the sequence length. We apply RNAbor for the detection of possible RNA conformational switches, and compare RNAbor with the switch detection method paRNAss. We also provide examples of how RNAbor can at times improve the accuracy of secondary structure prediction.

    National Category
    Biological Sciences Computer and Information Sciences
    Identifiers
    urn:nbn:se:uu:diva-96425 (URN); 10.1093/bioinformatics/btm314 (DOI); 000249818300004 ()
    Available from: 2007-11-13. Created: 2007-11-13. Last updated: 2018-01-13. Bibliographically approved
    4. RNAbor: a web server for RNA structural neighbors
    2007 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 35, no Suppl. S: Web Server issue, p. W305-W309. Article in journal (Refereed). Published
    Abstract [en]

    RNAbor provides a new tool for researchers in the biological and related sciences to explore important aspects of RNA secondary structure and folding pathways. RNAbor computes statistics concerning delta-neighbors of a given input RNA sequence and structure (the structure can, for example, be the minimum free energy (MFE) structure). A delta-neighbor is a structure that differs from the input structure by exactly delta base pairs, that is, it can be obtained from the input structure by adding and/or removing exactly delta base pairs. For each distance delta, RNAbor computes the density of delta-neighbors, the number of delta-neighbors, and the MFE structure, or MFE-delta structure, among all delta-neighbors. RNAbor can be used to study possible folding pathways, to determine alternate low-energy structures, to predict potential nucleation sites and to explore structural neighbors of an intermediate, biologically active structure. The web server is available at http://bioinformatics.bc.edu/clotelab/RNAbor.
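
    The delta-neighbor definition above can be sketched as a base-pair distance between two structures. The following is a minimal illustration, assuming structures are written in dot-bracket notation without pseudoknots. It only checks the neighbor relation and does not reproduce RNAbor's partition-function computation. The function names are hypothetical.

```python
def base_pairs(dot_bracket):
    """Parse a dot-bracket string into a set of (i, j) base pairs with i < j."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def base_pair_distance(structure_s, structure_t):
    """Number of base pairs present in exactly one of the two structures."""
    if len(structure_s) != len(structure_t):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_s) ^ base_pairs(structure_t))


def is_delta_neighbor(structure_s, structure_t, delta):
    """True if structure_t is obtained from structure_s by adding/removing exactly delta pairs."""
    return base_pair_distance(structure_s, structure_t) == delta
```

    For example, `is_delta_neighbor("((..))", "(....)", 1)` holds because removing the inner pair of the first structure gives the second.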

    Keywords
    Cluster Analysis, Computational Biology/*methods, Computer Simulation, Conserved Sequence, Databases, Genetic, Internet, Molecular Sequence Data, Nucleic Acid Conformation, RNA/*chemistry, RNA, Untranslated, Regulatory Sequences, Ribonucleic Acid, Sequence Alignment, Sequence Analysis, RNA, Sequence Homology, Nucleic Acid
    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-96426 (URN) 10.1093/nar/gkm255 (DOI) 000255311500057 () 17526527 (PubMedID)
    Available from: 2007-11-13 Created: 2007-11-13 Last updated: 2017-12-14 Bibliographically approved
    5. Predicting RNA structure using mutual information
    2005 In: Applied Bioinformatics, ISSN 1175-5636, Vol. 4, no 1, p. 53-59 Article in journal (Refereed) Published
    Identifiers
    urn:nbn:se:uu:diva-96427 (URN)
    Available from: 2007-11-13 Created: 2007-11-13 Bibliographically approved
    6. Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos
    2006 In: Nucleic Acids Research, ISSN 0305-1048, Vol. 34, no 3, p. 905-916 Article in journal (Refereed) Published
    Identifiers
    urn:nbn:se:uu:diva-96428 (URN)
    Available from: 2007-11-13 Created: 2007-11-13 Bibliographically approved
    7. New computational methods reveal tRNA identity element divergence between Proteobacteria and Cyanobacteria
    2007 (English) In: Biochimie, ISSN 0300-9084, E-ISSN 1638-6183, Vol. 89, no 10, p. 1276-1288 Article in journal (Refereed) Published
    Abstract [en]

    There are at least 21 subfunctional classes of tRNAs in most cells that, despite a very highly conserved and compact common structure, must interact specifically with different cliques of proteins or cause grave organismal consequences. Protein recognition of specific tRNA substrates is achieved in part through class-restricted tRNA features called tRNA identity determinants. In earlier work we used TFAM, a statistical classifier of tRNA function, to show evidence of unexpectedly large diversity among bacteria in tRNA identity determinants. We also created a data reduction technique called function logos to visualize identity determinants for a given taxon. Here we show evidence that determinants for lysylated isoleucine tRNAs are not the same in Proteobacteria as in other bacterial groups including the Cyanobacteria. Consistent with this, the lysylating biosynthetic enzyme TilS lacks a C-terminal domain in Cyanobacteria that is present in Proteobacteria. We present here, using function logos, a map estimating all potential identity determinants generally operational in Cyanobacteria and Proteobacteria. To further isolate the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria, we created two new data reduction visualizations to contrast sequence and function logos between two taxa. One, called Information Difference logos (ID logos), shows the evolutionary gain or retention of functional information associated to features in one lineage. The other, Kullback–Leibler divergence Difference logos (KLD logos), shows recruitments or shifts in the functional associations of features, especially those informative in both lineages. We used these new logos to specifically isolate and visualize the differences in potential tRNA identity determinants between Proteobacteria and Cyanobacteria. Our graphical results point to numerous differences in potential tRNA identity determinants between these groups. Although more differences in general are explained by shifts in functional association rather than gains or losses, the apparent identity differences in lysylated isoleucine tRNAs appear to have evolved through both mechanisms.

    Keywords
    tRNA identity, Function logos, tRNA identity determinants, Lysylated isoleucine tRNA, TilS, Aminoacyl-tRNA synthetase, Proteobacteria, Cyanobacteria, Kullback–Leibler Divergence
    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-96429 (URN) 10.1016/j.biochi.2007.07.013 (DOI) 000250613600015 () 17889982 (PubMedID)
    Available from: 2007-11-13 Created: 2007-11-13 Last updated: 2017-12-14 Bibliographically approved
  • 46.
    Freyhult, Eva
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Gustafsson, Mats G.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    Strömbergsson, Helena
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Cancer Pharmacology and Computational Medicine.
    A Machine Learning Approach to Explain Drug Selectivity to Soluble and Membrane Protein Targets
    2015 In: Molecular Informatics, ISSN 1868-1751, Vol. 34, no 1, p. 44-52 Article in journal (Refereed)
  • 47.
    Fälth Savitski, Maria
    Uppsala University, Medicinska vetenskapsområdet, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Improved Neuropeptide Identification: Bioinformatics and Mass Spectrometry
    2008 Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Bioinformatic methods were developed for improved identification of endogenous peptides using mass spectrometry. As a framework for these methods, a database for endogenous peptides, SwePep, was created. It was designed for storing information about endogenous peptides including tandem mass spectra. SwePep can be used for identification and validation of endogenous peptides by comparing experimentally derived masses of peptides and their fragments with information in the database. To improve automatic identification of neuropeptides, targeted sequence collections that better mimic the peptidomic sample were derived from the SwePep database. Three sequence collections were created: SwePep precursors, SwePep peptides, and SwePep predicted. The searches for neuropeptides performed against these three sequence collections were compared with searches performed against the entire mouse proteome, and it was observed that three times as many peptides were identified with the targeted SwePep sequence collections. Applying the targeted SwePep sequence collections to identification of previously uncharacterized peptides yielded 27 novel potentially bioactive neuropeptides.

    Two fragmentation studies were performed using high mass accuracy tandem mass spectra of tryptic peptides. For this purpose, two databases were created: SwedCAD and SwedECD for CID and ECD tandem mass spectra, respectively. In the first study, the fragmentation pattern of peptides with missed cleavage sites was studied using SwedCAD. It was observed that peptides with two arginines positioned next to each other have the same ability to immobilize two protons as peptides with two distant arginines. In the second study, SwedECD was used for studying small neutral losses from the reduced species in ECD fragmentation. The neutral losses were characterized with regard to their specificity and sensitivity to function as reporter ions for revealing the presence of specific amino acids in the peptide sequence. The results from these two studies can be used to improve identification of both tryptic and endogenous peptides.

    In summary, a collection of methods was developed that greatly improved the sensitivity of mass spectrometry peptide identification.

    List of papers
    1. SwePep, a database designed for endogenous peptides and mass spectrometry
    2006 (English) In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 5, no 6, p. 998-1005 Article in journal (Refereed) Published
    Abstract [en]

    A new database, SwePep, specifically designed for endogenous peptides, has been constructed to significantly speed up the identification process from complex tissue samples utilizing mass spectrometry. In the identification process the experimental peptide masses are compared with the peptide masses stored in the database both with and without possible post-translational modifications. This intermediate identification step is fast and singles out peptides that are potential endogenous peptides and can later be confirmed with tandem mass spectrometry data. Successful applications of this methodology are presented. The SwePep database is a relational database developed using MySQL and Java. The database contains 4180 annotated endogenous peptides from different tissues originating from 394 different species as well as 50 novel peptides from brain tissue identified in our laboratory. Information about the peptides, including mass, isoelectric point, sequence, and precursor protein, is also stored in the database. This new approach holds great potential for removing the bottleneck that occurs during the identification process in the field of peptidomics. The SwePep database is available to the public.
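
As a rough illustration of the mass-comparison step described above, the sketch below matches one observed mass against stored peptide masses with and without a modification shift. The peptide names and masses, the ppm tolerance and the three listed modifications are assumptions chosen for illustration; SwePep itself is a MySQL and Java system whose actual query logic is not given here.

```python
from dataclasses import dataclass

# Monoisotopic mass shifts in Da for a small, illustrative set of modifications.
PTM_SHIFTS = {
    "none": 0.0,
    "amidation": -0.984016,
    "acetylation": 42.010565,
    "phosphorylation": 79.966331,
}


@dataclass(frozen=True)
class Peptide:
    name: str          # hypothetical identifier
    mono_mass: float   # unmodified monoisotopic mass in Da


def match_mass(observed: float, peptides: list[Peptide], tol_ppm: float = 10.0):
    """Return (peptide name, modification, error in ppm) for every candidate within tolerance."""
    hits = []
    for pep in peptides:
        for ptm, shift in PTM_SHIFTS.items():
            theoretical = pep.mono_mass + shift
            error_ppm = (observed - theoretical) / theoretical * 1e6
            if abs(error_ppm) <= tol_ppm:
                hits.append((pep.name, ptm, error_ppm))
    return hits
```

Candidates returned by such a lookup would still need confirmation from tandem mass spectra, as the abstract notes.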

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-97816 (URN) 10.1074/mcp.M500401-MCP200 (DOI) 16501280 (PubMedID)
    Available from: 2008-11-20 Created: 2008-11-20 Last updated: 2017-12-14 Bibliographically approved
    2. Neuropeptidomics strategies for specific and sensitive identification of endogenous peptides
    2007 (English) In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 6, no 7, p. 1188-1197 Article in journal (Refereed) Published
    Abstract [en]

    A new approach using targeted sequence collections has been developed for identifying endogenous peptides. This approach enables a fast, specific, and sensitive identification of endogenous peptides. Three different sequence collections were constituted in this study to mimic the peptidomic samples: SwePep precursors, SwePep peptides, and SwePep predicted. The searches for neuropeptides performed against these three sequence collections were compared with searches performed against the entire mouse proteome, which is commonly used to identify neuropeptides. These four sequence collections were searched with both Mascot and X! Tandem. Evaluation of the sequence collections was achieved using a set of manually identified and previously verified peptides. By using the three new sequence collections, which more accurately mimic the sample, 3 times as many peptides were significantly identified, with a false-positive rate below 1%, in comparison with the mouse proteome. The new sequence collections were also used to identify previously uncharacterized peptides from brain tissue; 27 previously uncharacterized peptides and potentially bioactive neuropeptides were identified. These novel peptides are cleaved from the peptide precursors at sites that are characteristic for prohormone convertases, and some of them have post-translational modifications that are characteristic for neuropeptides. The targeted protein sequence collections for different species are publicly available for download from SwePep.

    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-97817 (URN) 10.1074/mcp.M700016-MCP200 (DOI) 000247850200008 () 17401030 (PubMedID)
    Available from: 2008-11-20 Created: 2008-11-20 Last updated: 2018-01-13 Bibliographically approved
    3. Validation of endogenous peptide identifications using a database of tandem mass spectra
    2008 (English) In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 7, no 7, p. 3049-3053 Article in journal (Refereed) Published
    Abstract [en]

    The SwePep database is designed for endogenous peptides and mass spectrometry. It contains information about the peptides such as mass, pI, precursor protein and potential post-translational modifications. Here, we have improved and extended the SwePep database with tandem mass spectra, by adding a locally curated version of the global proteome machine database (GPMDB). In peptidomic practice, many peptide sequences are represented by multiple tandem mass spectra of differing quality. The new tandem mass spectra database in SwePep enables validation of low quality spectra using high quality tandem mass spectra. The validation is performed by comparing the fragmentation patterns of the two spectra using algorithms for calculating the correlation coefficient between the spectra. The present study is the first step in developing a tandem spectrum database for endogenous peptides that can be used for spectrum-to-spectrum identifications instead of peptide identifications using traditional protein sequence database searches.
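
One simple way to compare two spectra by a correlation coefficient is sketched below. Binning peaks into fixed-width m/z bins and using the Pearson coefficient are assumptions made for illustration; the abstract does not say which correlation algorithms SwePep uses, and the function names are hypothetical.

```python
import math


def bin_spectrum(peaks, mz_max=2000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins; peaks is a list of (mz, intensity)."""
    n_bins = int(mz_max / bin_width) + 1
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if 0.0 <= mz <= mz_max:
            vec[int(mz / bin_width)] += intensity
    return vec


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectrum_correlation(peaks_a, peaks_b, mz_max=2000.0, bin_width=1.0):
    """Correlation between two peak lists after binning them on a common m/z grid."""
    return pearson(
        bin_spectrum(peaks_a, mz_max, bin_width),
        bin_spectrum(peaks_b, mz_max, bin_width),
    )
```

Because the Pearson coefficient is invariant to scaling, a weak spectrum with the same fragmentation pattern as a strong reference spectrum still scores close to 1 in this sketch.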

    Keywords
    bioinformatics, neuropeptides, peptidomics, peptide identification, MS/MS database
    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-97818 (URN) 10.1021/pr800036d (DOI) 000257449500045 ()
    Available from: 2008-11-20 Created: 2008-11-20 Last updated: 2018-01-13 Bibliographically approved
    4. SwedCAD, a database of annotated high-mass accuracy MS/MS spectra of tryptic peptides
    2007 (English) In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 6, no 10, p. 4063-4067 Article in journal (Refereed) Published
    Abstract [en]

    A database of high-mass accuracy tryptic peptides has been created. The database contains 15897 unique, annotated MS/MS spectra. It is possible to search for peptides according to their mass, number of missed cleavages, and sequence motifs. All of the data contained in the database is downloadable, and each spectrum can be visualized. An example is presented of how the database can be used for studying peptide fragmentation. Fragmentation of different types of missed cleaved peptides has been studied, and the results can be used to improve identification of these types of peptides.

    Keywords
    database, CAD, MS/MS, peptide fragmentation
    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-97819 (URN) 10.1021/pr070345h (DOI) 000249983500021 () 17711326 (PubMedID)
    Available from: 2008-11-20 Created: 2008-11-20 Last updated: 2018-01-13 Bibliographically approved
    5. Analytical utility of small neutral losses from reduced species in electron capture dissociation studied using SwedECD database
    2008 (English) In: Analytical Chemistry, ISSN 0003-2700, E-ISSN 1520-6882, Vol. 80, no 21, p. 8089-8094 Article in journal (Refereed) Published
    Abstract [en]

    Small neutral losses from charge-reduced species [M + nH]^((n-1)+•) are among the most abundant fragmentation channels in both electron capture dissociation (ECD) and electron transfer dissociation (ETD). Several groups have previously studied these losses on particular examples. Now, the availability of a large (11491 entries) SwedECD database (http://www.bmms.uu.se/CAD/indexECD.html) of high-resolution ECD data sets on doubly charged tryptic peptides has made possible a systematic study involving statistical evaluation of neutral losses from [M + 2H]^(+•) ions. Several new types of losses are discovered, and 16 specific (>94%) losses are characterized according to their specificity and sensitivity, as well as occurrence for peptides of different lengths. On average, there is more than one specific loss per ECD mass spectrum, and two-thirds of all MS/MS data sets in SwedECD contain at least one specific loss. Therefore, specific neutral losses are analytically useful for improved database searching and de novo sequencing. In particular, N and GG isomeric sequences can be distinguished. The pattern of neutral losses was found to be remarkably dissimilar to the losses from radical z• fragment ions: e.g., there is no direct formation of w ions from the reduced species. This finding emphasizes the difference in fragmentation behaviors of hydrogen-abundant and hydrogen-deficient species.

    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-97820 (URN) 10.1021/ac800944u (DOI) 000260567000028 ()
    Available from: 2008-11-20 Created: 2008-11-20 Last updated: 2018-01-13 Bibliographically approved
  • 48.
    Ganna, Andrea
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular epidemiology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lee, Donghwan
    Ingelsson, Erik
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Sciences, Molecular epidemiology.
    Pawitan, Yudi
    Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies
    2015 In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 16, no 4, p. 563-575 Article in journal (Refereed)
    Abstract [en]

    It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
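
The observed RDR defined above, the proportion of findings significant in the training sample that are also significant in the validation sample, can be computed directly from paired p-values. This sketch covers only that observed proportion under assumed significance thresholds; it is not the paper's t-statistic-based estimator or its online tool, and the function name is hypothetical.

```python
def rediscovery_rate(train_p, valid_p, alpha_train=0.05, alpha_valid=0.05):
    """Observed RDR: share of training-significant tests that are also validation-significant.

    train_p and valid_p hold p-values for the same tests in the same order.
    Returns None when no test is significant in the training sample.
    """
    if len(train_p) != len(valid_p):
        raise ValueError("p-value lists must have the same length")
    selected = [i for i, p in enumerate(train_p) if p <= alpha_train]
    if not selected:
        return None
    rediscovered = sum(1 for i in selected if valid_p[i] <= alpha_valid)
    return rediscovered / len(selected)
```

In a high-throughput setting the two thresholds would normally be corrected for multiple testing before being passed in.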

  • 49.
    Gholami, Ali
    et al.
    Royal Institute of Technology.
    Laure, Erwin
    Royal Institute of Technology.
    Somogyi, Peter
    Karolinska Institutet.
    Spjuth, Ola
    Swedish e-Science Research Center and Department of Medical Epidemiology and Biostatistics, Karolinska Institute.
    Niazi, Salman
    Swedish Institute of Computer Science.
    Dowling, Jim
    Swedish Institute of Computer Science.
    Privacy-Preservation for Publishing Sample Availability Data with Personal Identifiers
    2015 In: Journal of medical and bioengineering, ISSN 2301-3796, Vol. 4, no 2, p. 117-125 Article in journal (Refereed)
    Abstract [en]

    Medical organizations collect, store and process vast amounts of sensitive information about patients. Easy access to this information by researchers is crucial to improving medical research, but in many institutions, cumbersome security measures and walled gardens have created a situation where even information about what medical data exists is not available. One of the main security challenges in this area is enabling researchers to cross-link different medical studies while preserving the privacy of the patients involved. In this paper, we introduce a privacy-preserving system for publishing sample availability data that allows researchers to make queries that cut across different studies. That is, researchers can ask questions such as how many patients have had both diabetes and prostate cancer, where the diabetes and prostate cancer information originates from different clinical registries. We realize our solution with a two-level anonymization mechanism, where our toolkit for publishing availability data first pseudonymizes personal identifiers and then anonymizes sensitive attributes. Our toolkit also includes a web-based server that stores the encrypted pseudonymized sample data and allows researchers to execute cross-linked queries across different study data. We believe that our toolkit contributes a first step to support the privacy-preserving publication of data containing personal identifiers.
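
The cross-linking idea can be illustrated with a keyed hash: if every registry pseudonymizes personal identifiers with the same secret key, records for the same person can be counted across studies without exposing the identifier. Using HMAC-SHA256 for this is an assumption made for the sketch, not the mechanism the paper describes, and the function names are hypothetical. The second level, anonymizing sensitive attributes, is not shown.

```python
import hashlib
import hmac


def pseudonymize(personal_id: str, secret_key: bytes) -> str:
    """Map a personal identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    normalized = personal_id.strip().replace("-", "")  # tolerate formatting differences
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def count_overlap(pseudonyms_a, pseudonyms_b) -> int:
    """Number of individuals present in both pseudonymized registries."""
    return len(set(pseudonyms_a) & set(pseudonyms_b))
```

A query such as "how many patients appear in both the diabetes and the prostate cancer registry" then reduces to `count_overlap` on the two pseudonym lists.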

  • 50.
    Golkaram, Mahdi
    et al.
    Jang, Jiwon
    Hellander, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Kosik, Kenneth S.
    Petzold, Linda R.
    The role of chromatin density in cell population heterogeneity during stem cell differentiation
    2017 In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, p. 13307:1-11, article id 13307 Article in journal (Refereed)