Publications from Uppsala University
  • 1.
    Ahlström, Anna
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Testing the specificity of the pBAD arabinose reporter. 2017. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    The project highlights the ability of Salmonella enterica subspecies enterica serovar Typhimurium (S. Tm) to metabolize simple sugars released from dead commensal bacteria, by using the pBAD (araBAD promoter) system as a reporter of L-arabinose availability. Using bioinformatics and homology of conserved L-arabinose transporter genes shared by Escherichia coli K12 (E. coli) and S. Tm, we aimed to create an S. Tm mutant strain unable to obtain L-arabinose from its environment. During the course of the project it was discovered that L-arabinose transporters are not a shared gene trait between E. coli and S. Tm, and that putative L-arabinose transporter orthologues may exist in the S. Tm genome.

  • 2.
    Ajawatanawong, Pravech
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Mine the Gaps: Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny. 2014. Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to effectively study them. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis.

    I then used SeqFIRE to build an indel database, using 299 single copy proteins from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single amino acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (2.31 x deletions on average). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade defining indels or CDIs), but none for Metazoa.

    To study the 3,806 complex indels, they were first classified by number of states. Analysis of the 2-state complex and simple indels combined (“bi-state indels”) confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels had three to nine states (“slightly complex indels”). A tree-assisted search method was developed, allowing me to identify 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi.

    Forty-two proteins were also found to host complex indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian specific CDIs. Parsimony analysis of these indels places Ctenophora as sister taxon to all other Metazoa including Porifera. Six CDIs were also found placing Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and my tree-assisted search strategy could be automated and implemented in the program SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast resource of protist genome data soon to become available.

    List of papers
    1. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments
    2012 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, p. W340-W347. Article in journal (Refereed). Published.
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.

    Keywords
    Indels, Alignment, Conserved blocks
    National Category
    Bioinformatics (Computational Biology) Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-179937 (URN); 10.1093/nar/gks561 (DOI); 000306670900056 ()
    Available from: 2012-08-27. Created: 2012-08-27. Last updated: 2019-08-28. Bibliographically approved.
    2. Evolution of protein indels in plants, animals and fungi
    2013 (English). In: BMC Evolutionary Biology, E-ISSN 1471-2148, Vol. 13, p. 140-. Article in journal (Refereed). Published.
    Abstract [en]

    Background: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns focusing first on the major groups of multicellular eukaryotes. Results: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa) with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold. Conclusions: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.

    Keywords
    Indels, Rare genomic changes, Phylogeny, Insertion/deletion, Multiple sequence alignment, Eukaryote evolution, Indel profiles
    National Category
    Natural Sciences
    Identifiers
    urn:nbn:se:uu:diva-204971 (URN); 10.1186/1471-2148-13-140 (DOI); 000321461800001 ()
    Available from: 2013-08-16. Created: 2013-08-13. Last updated: 2024-01-17. Bibliographically approved.
    3. An automatable method for high throughput analysis of evolutionary patterns in slightly complex indels and its application to the deep phylogeny of Metazoa
    2014 (English). Manuscript (preprint) (Other academic).
    Abstract [en]

    Insertions/deletions (indels) in protein sequences are potentially powerful evolutionary markers. However, these characters have rarely been explored systematically at deep phylogenetic levels. Previous analyses of simple (2-state) clade-defining indels (CDIs) in universal eukaryotic proteins found none to support any major animal clade. We hypothesized that CDIs might still be found in the remaining population of indels, which we term complex indels. Here, we propose a method for analyzing the simplest class of complex indels, the “slightly complex indels”, and use these to investigate deep branches in animal phylogeny. Complex indels with two states, called bi-state indels, show similar evolutionary patterns to singleton simple indels and confirm that insertion mutations are more common than deletions. Exploration of CDIs in 2- to 9-state complex indels shows strong support for all examined branches of Fungi and Archaeplastida. Surprisingly, we also found CDIs supporting major branches in animals, particularly in vertebrates. We then expanded the search to non-bilaterian animals (Porifera, Cnidaria and Ctenophora). The phylogenetic tree reconstructed from CDIs places the ctenophore Mnemiopsis leidyi as the deepest branch of animals, supported by 6 CDIs. Trichoplax adhaerens is closely related to the Bilateria. Moreover, the indel phylogeny shows that Nematostella vectensis and Hydra magnipapillata form a paraphyletic group, and the position of the cnidarian branches seems to be problematic in the indel phylogeny because of homoplasy. This might be solved if we discover CDIs from animal-specific proteins, which emerged after the universal orthologous proteins.

    Evolutionary Patterns in Slightly Complex Protein Insertions/Deletions (Indels) and Their Application to the Study of Deep Phylogeny in Metazoa

    National Category
    Other Biological Topics
    Identifiers
    urn:nbn:se:uu:diva-216842 (URN)
    Available from: 2014-01-27. Created: 2014-01-27. Last updated: 2024-04-03. Bibliographically approved.
  • 3.
    Ajawatanawong, Pravech
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Atkinson, Gemma C.
    Watson-Haigh, Nathan S.
    MacKenzie, Bryony
    Baldauf, Sandra L.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, p. W340-W347. Article in journal (Refereed).
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
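
The idea of locating indel regions in a protein alignment can be illustrated with a minimal sketch. This is not SeqFIRE's algorithm, which also weighs conservation and entropy; it only assumes that an indel region is a maximal run of alignment columns in which at least one sequence has a gap. The function name `find_indel_regions` and the toy alignment are hypothetical.

```python
from typing import List, Tuple


def find_indel_regions(alignment: List[str], gap: str = "-") -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, 0-based with end exclusive,
    covering maximal runs of columns where at least one sequence has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    regions: List[Tuple[int, int]] = []
    start = None  # column where the current gap run began, if any
    for col in range(length):
        has_gap = any(seq[col] == gap for seq in alignment)
        if has_gap and start is None:
            start = col
        elif not has_gap and start is not None:
            regions.append((start, col))
            start = None
    if start is not None:  # run extends to the end of the alignment
        regions.append((start, length))
    return regions


# Toy alignment (assumed data, for illustration only).
toy_alignment = [
    "MKT--LLVA",
    "MKTAGLLVA",
    "MKT--LL-A",
]
indel_regions = find_indel_regions(toy_alignment)
```

A real tool would additionally require conserved flanking columns before accepting a region as an indel; that check is deliberately left out here.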

  • 4.
    Al Jewari, Caesar
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Examining the Root of the Eukaryotic Tree of Life. 2017. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    Identifying the evolutionary root of the eukaryotic tree of life (eToL) is a central problem in systematic biology that has been receiving growing attention. This task has been aided by the development of advanced phylogenetic methods and the availability of large amounts of genomic data from across the tree. Recently, two studies have tried a novel approach to define the eToL root, using euBacteria (instead of the more distantly related Archaea) as the outgroup. These two studies use partially overlapping datasets but produce contradictory results. One study, using mixed eubacterial data (euBac), makes the case for a neozoan-excavate root, while the other study, using alpha-proteobacterial (aP) data, concluded in favour of the traditional unikont-bikont root. These two results suggest different theories of early eukaryote evolution. However, there is also evidence of substantial artefacts in these datasets and traces of horizontal gene transfer (HGT), the exchange of DNA between unrelated organisms. This project aims to re-examine the datasets of both publications (61 total protein markers). The work started with updating both datasets with solid new phylogenomic data from the supervisor's lab and new publicly available data. I then used these data to systematically investigate the phylogenetic signals of the 61 protein markers across 88 taxa (68 eukaryotes and 20 Bacteria). These were first subjected to preliminary phylogenetic analyses to sort orthologues from paralogues. All orthologues were then combined into a single dataset and subjected to in-depth phylogenetic analyses to evaluate the support for various hypotheses. I also investigated potential sources of artefact in the data using traditional methods and novel methods that I devised and developed myself, including computer scripts specifically written for this work.

    I created a pipeline for the data curation process to make it fast and efficient by automating various parts of the workflow, including concatenating the multigene dataset into a super matrix. I estimated the level of incongruence in each dataset, excluded the protein markers that have a strong phylogenetic bias, and reconstructed new datasets. I conclude that the data in hand (protein markers and taxa) contain conflicting and inconsistent phylogenetic signal and that a few proteins can have a very strong effect on the results of the analyses. However, a third possible hypothesis is clearly rejected. This suggests that there are specific artefacts in the data, favouring one or the other of the two remaining hypotheses.

  • 5.
    Alacamli, Erkin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Developing an Advanced Method for Kinship from Ancient DNA Data. 2023. Independent thesis, Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis.
    Abstract [en]

    The analysis of kinship from ancient DNA (aDNA) data has the potential to provide insight into the social structures of prehistoric societies. Kinship analysis is gaining popularity as optimised wet-lab methods allow for studies with sample sizes on the level of whole cemeteries. However, the specifics of ancient DNA require different methods from those used for modern DNA. A common approach is to use sites that are identical-by-descent (IBD); however, detecting these is often challenging, since it is not easy to determine whether a locus shared between two individuals is inherited from a common ancestor or whether another factor caused the similarity. Most methods used in the field are able to identify up to 2nd or 3rd degree relatives from aDNA data but do not distinguish between different types of relationship of the same degree, for instance between parent-offspring and full-sibling relationships in the first degree. The aDNA kinship methods often use either window-based or single-site approaches; however, these two approaches have not been formally compared before in terms of effectiveness and efficiency. In this work, READv2 is presented as a re-implementation of a popular kinship analysis method for aDNA studies with additional features such as accepting .bed files as input, which take up less space than the previous input type, plain-text .tped files. It is shown that the new version works more efficiently in terms of runtime. However, the memory requirements seem to be increased with the new implementation. Furthermore, a window-based approach, with varying window sizes, is compared with the single-site approach of READv2, using benchmarked simulation data containing approximately 700 individuals with known 1st degree, 2nd degree and 3rd degree relationships. According to the comparison, the sensitivity of the method does not vary between the approaches and different window sizes for high coverages.

    However, the single-site approach has been shown to be superior by a small margin for lower coverages. In addition, the variance of non-shared alleles in windows along the genome has been used to implement a method to differentiate between first-degree relationships, parent-offspring and siblings. The method is tested with an independent dataset from the 1000 Genomes Project, which shows that the proposed method is able to work with different datasets with varying sets of SNPs. Nevertheless, the first-degree classification method requires further analyses to determine the stress-point where the True Positive rates for both categories start to drop. Additionally, some changes and decisions are required for READv2 to be a user-friendly method that can be used by other researchers. The preliminary release of READv2, including example data as well as instructions to install the necessary packages and to run the algorithm, can be found at https://github.com/GuntherLab/READv2/releases/tag/READ.
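
The window-based statistic mentioned in the abstract can be sketched as follows. This is not READv2's code; it assumes pseudo-haploid data with one allele call per site and individual, `None` for missing calls, and fixed windows counted in sites rather than base pairs. The function name `windowed_mismatch` and the toy genotypes are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Optional, Sequence


def windowed_mismatch(
    a: Sequence[Optional[str]],
    b: Sequence[Optional[str]],
    window: int,
) -> List[float]:
    """Proportion of non-shared alleles between two individuals per window.

    Sites missing (None) in either individual are skipped; windows with no
    usable site are omitted from the result.
    """
    if len(a) != len(b):
        raise ValueError("both individuals must cover the same sites")
    if window <= 0:
        raise ValueError("window must be positive")

    rates: List[float] = []
    for start in range(0, len(a), window):
        pairs = [
            (x, y)
            for x, y in zip(a[start:start + window], b[start:start + window])
            if x is not None and y is not None
        ]
        if pairs:
            rates.append(sum(x != y for x, y in pairs) / len(pairs))
    return rates


# Toy pseudo-haploid calls at eight sites (assumed data, for illustration only).
ind1 = list("AACGTTGA")
ind2 = list("ATCCTTGA")
rates = windowed_mismatch(ind1, ind2, window=4)

# Summaries across windows: the mean relates to the degree of relatedness,
# and the abstract describes using the variance to separate parent-offspring
# from sibling pairs.
mean_rate = mean(rates)
rate_variance = pvariance(rates)
```

Any thresholds for classifying a pair from these summaries would have to come from the thesis or from simulations; none are assumed here.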

  • 6.
    Alexiou, Athanasios
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Sets of Genes Predict Survival of Glioblastoma Patients. 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
  • 7.
    Allalou, Amin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Pinidiyaarachchi, Amalka
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Wählby, Carolina
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Robust signal detection in 3D fluorescence microscopy. 2010. In: Cytometry. Part A, ISSN 1552-4922, Vol. 77A, no 1, p. 86-96. Article in journal (Refereed).
    Abstract [en]

    Robust detection and localization of biomolecules inside cells is of great importance for better understanding the functions related to them. Fluorescence microscopy and specific staining methods make biomolecules appear as point-like signals in image data, often acquired in 3D. Visual detection of such point-like signals can be time-consuming and problematic if the 3D images are large and contain many, sometimes overlapping, signals. This creates a demand for robust automated methods for accurate detection of signals in 3D fluorescence microscopy. We propose a new 3D point-source signal detection method that is based on Fourier series. The method consists of two parts: a detector, which is a cosine filter to enhance the point-like signals, and a verifier, which is a sine filter to validate the result from the detector. Compared to conventional methods, our method shows better robustness to noise and good ability to resolve signals that are spatially close. Tests on image data show that the method has equivalent accuracy in signal detection in comparison to visual detection by experts. The proposed method can be used as an efficient point-like signal detection tool for various types of biological 3D image data.

  • 8.
    Alneberg, Johannes
    et al.
    KTH Royal Inst Technol, Sch Engn Sci Chem Biotechnol & Hlth, Dept Gene Technol, Sci Life Lab, Stockholm, Sweden.
    Karlsson, Christofer M. G.
    Linnaeus Univ, Ctr Ecol & Evolut Microbial Model Syst, EEMiS, Kalmar, Sweden.
    Divne, Anna-Maria
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Bergin, Claudia
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Homa, Felix
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lindh, Markus V.
    Linnaeus Univ, Ctr Ecol & Evolut Microbial Model Syst, EEMiS, Kalmar, Sweden;Lund Univ, Dept Biol, Lund, Sweden.
    Hugerth, Luisa W.
    KTH Royal Inst Technol, Sch Engn Sci Chem Biotechnol & Hlth, Dept Gene Technol, Sci Life Lab, Stockholm, Sweden;Karolinska Inst, Ctr Translat Microbiome Res, Dept Mol Tumour & Cell Biol, Sci Life Lab, Solna, Sweden.
    Ettema, Thijs J. G.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Bertilsson, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Limnology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Andersson, Anders F.
    KTH Royal Inst Technol, Sch Engn Sci Chem Biotechnol & Hlth, Dept Gene Technol, Sci Life Lab, Stockholm, Sweden.
    Pinhassi, Jarone
    Linnaeus Univ, Ctr Ecol & Evolut Microbial Model Syst, EEMiS, Kalmar, Sweden.
    Genomes from uncultivated prokaryotes: a comparison of metagenome-assembled and single-amplified genomes. 2018. In: Microbiome, E-ISSN 2049-2618, Vol. 6, article id 173. Article in journal (Refereed).
    Abstract [en]

    Background: Prokaryotes dominate the biosphere and regulate biogeochemical processes essential to all life. Yet, our knowledge about their biology is for the most part limited to the minority that has been successfully cultured. Molecular techniques now allow for obtaining genome sequences of uncultivated prokaryotic taxa, facilitating in-depth analyses that may ultimately improve our understanding of these key organisms.

    Results: We compared results from two culture-independent strategies for recovering bacterial genomes: single-amplified genomes and metagenome-assembled genomes. Single-amplified genomes were obtained from samples collected at an offshore station in the Baltic Sea Proper and compared to previously obtained metagenome-assembled genomes from a time series at the same station. Among 16 single-amplified genomes analyzed, seven were found to match metagenome-assembled genomes, affiliated with a diverse set of taxa. Notably, genome pairs between the two approaches were nearly identical (average 99.51% sequence identity; range 98.77-99.84%) across overlapping regions (30-80% of each genome). Within matching pairs, the single-amplified genomes were consistently smaller and less complete, whereas the genetic functional profiles were maintained. For the metagenome-assembled genomes, only on average 3.6% of the bases were estimated to be missing from the genomes due to wrongly binned contigs.

    Conclusions: The strong agreement between the single-amplified and metagenome-assembled genomes emphasizes that both methods generate accurate genome information from uncultivated bacteria. Importantly, this implies that the research questions and the available resources are allowed to determine the selection of genomics approach for microbiome studies.

  • 9.
    Alström, Per
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Animal ecology.
    Sundev, Gombobaatar
    Mongolian Short-toed Lark (Calandrella dukhunensis). 2021. Other (Other academic).
  • 10.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Ligand-based Methods for Data Management and Modelling. 2015. Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    Drug discovery is a complicated and expensive process in the billion-dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

    The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available to users through the Bioclipse workbench. To this end, a series of molecular signature studies was done and various Bioclipse plugins were developed.

    An introduction to the field is provided in the thesis summary which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. In Paper II the laboratory information system Brunn for supporting work with dose-response studies on microtiter plates is described. In Paper III the creation of a molecular fingerprint based on the molecular signature descriptor is presented and the new fingerprints are evaluated for target prediction and found to perform on par with industrial standard commercial molecular fingerprints. In Paper IV the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) using the radial basis function (RBF) kernel is explored and reasonable default values are found. In Paper V the performance of SVM based QSAR using large datasets with the molecular signature descriptor is studied, and a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

    List of papers
    1. Bioclipse 2: A scriptable integration platform for the life sciences
    2009 (English). In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 10, p. 397-. Article in journal (Refereed). Published.
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. That Bioclipse 2 is based on an advanced and widely used service platform ensures wide extensibility, and new algorithms, visualizations as well as scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

    Keywords
    Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
    National Category
    Bioinformatics and Systems Biology Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-109304 (URN) 10.1186/1471-2105-10-397 (DOI) 000273329400001 ()
    Available from: 2009-12-16 Created: 2009-10-13 Last updated: 2024-01-17 Bibliographically approved
    2. Brunn: an open source laboratory information system for microplates with a graphical plate layout design process
    2011 (English) In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 12, no 1, article id 179. Article in journal (Refereed) Published
    Abstract [en]

    Background:

    Compound profiling and drug screening generates large amounts of data and is generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight and there is a need for a flexible lightweight open system for handling plate design, and validation and preparation of data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.

    Keywords
    brunn, microtiter, bioclipse, screening, information system, lis, lims
    National Category
    Pharmacology and Toxicology
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-153210 (URN) 10.1186/1471-2105-12-179 (DOI) 000292027200001 () 21599898 (PubMedID)
    Available from: 2011-05-09 Created: 2011-05-09 Last updated: 2024-01-17 Bibliographically approved
    3. Ligand-Based Target Prediction with Signature Fingerprints
    2014 (English) In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, p. 2647-2653. Article in journal (Refereed) Published
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regards to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
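
    The Tanimoto-based similarity searching mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, bit fingerprints are assumed to be sets of on-bit indices, and the min/max form used for count fingerprints is one common generalization that the abstract does not confirm.

```python
from collections import Counter


def tanimoto_bits(a: set, b: set) -> float:
    """Tanimoto coefficient for bit fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Count-based variant (assumed form): sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    numerator = sum(min(a[k], b[k]) for k in keys)    # Counter gives 0 for missing keys
    denominator = sum(max(a[k], b[k]) for k in keys)
    return numerator / denominator if denominator else 0.0


def rank_by_similarity(query: set, library: dict) -> list:
    """Return library names sorted by decreasing bit-Tanimoto similarity to the query."""
    return sorted(library, key=lambda name: tanimoto_bits(query, library[name]), reverse=True)
```

    In a similarity search of this kind, the known targets of the top-ranked library compounds would serve as the predicted targets for the query compound.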

    National Category
    Pharmaceutical Sciences Bioinformatics (Computational Biology)
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-237934 (URN) 10.1021/ci500361u (DOI) 000343849600004 () 25230336 (PubMedID)
    Funder
    Swedish Research Council, VR-2011-6129; eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC)
    Available from: 2014-12-08 Created: 2014-12-08 Last updated: 2018-01-11 Bibliographically approved
    4. Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
    2014 (English) In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, p. 3211-3217. Article in journal (Refereed) Published
    Abstract [en]

    QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and gamma in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
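
    The recommended parameter ranges can be written down as a small search grid. The sketch below is only illustrative: the abstract gives ranges, not grid points, so the powers of ten used here are an assumption, and the names `C_VALUES`, `GAMMA_VALUES`, `HEIGHTS`, and `default_grid` are hypothetical.

```python
from itertools import product

# Grid points are an assumption: the abstract states only the ranges
# C in [1, 100] and gamma in [0.001, 0.1], not how they were sampled.
C_VALUES = [1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 0.1]

# Signature height ranges recommended in the abstract for each fingerprint version.
HEIGHTS = {"count": (0, 2), "bit": (0, 3)}


def default_grid(version: str) -> list:
    """Return every (heights, C, gamma) combination for the given fingerprint version."""
    heights = HEIGHTS[version]
    return [
        {"heights": heights, "C": c, "gamma": gamma}
        for c, gamma in product(C_VALUES, GAMMA_VALUES)
    ]
```

    Each dictionary returned by `default_grid` would then be passed to whichever SVM training routine is in use; with three points per parameter, the grid has nine candidates per fingerprint version.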

    National Category
    Medical Biotechnology Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-240239 (URN) 10.1021/ci500344v (DOI) 000345551000017 () 25318024 (PubMedID)
    Funder
    eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
    Available from: 2015-01-07 Created: 2015-01-06 Last updated: 2018-01-11 Bibliographically approved
    5. Large-scale ligand-based predictive modelling using support vector machines
    2016 (English) In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 8, article id 39. Article in journal (Refereed) Published
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

    Keywords
    Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
    National Category
    Pharmaceutical Sciences Bioinformatics (Computational Biology)
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-248959 (URN) 10.1186/s13321-016-0151-5 (DOI) 000381186100001 () 27516811 (PubMedID)
    Funder
    Swedish National Infrastructure for Computing (SNIC), b2013262 b2015001; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; eSSENCE - An eScience Collaboration
    Available from: 2015-04-09 Created: 2015-04-09 Last updated: 2022-05-10 Bibliographically approved
  • 11.
    Ammunet, Tea
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Evolution and diversification of secreted protein effectors in the order Legionellales. 2018. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The evolution of a large, diverse group of intracellular bacteria was previously very difficult to study. Recent advancements in both metagenomic methods and bioinformatics have made it possible. This thesis investigates the evolution of the order Legionellales. The study concentrates on a group of proteins essential for pathogenesis and host manipulation in the order, called effector proteins. The role of effectors in host adaptation, evolutionary history and the diversification of the order were investigated using a multitude of bioinformatics methods. First, the abundance and distribution of the known effector proteins in the order was found to cover newly discovered clades. There was a clear distinction between the proteins present in Legionellales and the outgroup, indicating the important role of the effectors in the order. Further, the effectors with known functions found in the new clades, particularly in Berkiella, revealed potential modes of host manipulation of this group. Secondly, the evolution of the effector gene content in the order shed light on the evolution of the order, as well as on the potential evolutionary differences between Legionellaceae and Coxiellaceae. In general, most of the effectors were gained early in the last common ancestor of Legionellales and Legionellaceae, as further indication of their role in the diversification of the order. New effector genes were acquired in the Legionellaceae even up to recent speciation events, whereas Coxiellaceae have lost more protein coding genes with time. These differences may be due to horizontal gene transfer in the case of gene gains in Legionellaceae and loss of selection in the case of gene losses in Coxiellaceae. Third, the early evolution of core gained effector proteins for the order was studied. Two of the eight investigated core effectors seem to have a connection to eukaryotes, the rest to other bacteria, indicating both inter-domain and within-bacteria horizontal gene transfer. In particular, one effector protein with a eukaryotic motif, gained at the last common ancestor of Legionellales, was found in all the clades and is therefore an important evolutionary link that may have allowed Legionellales to utilize eukaryotic hosts.

  • 12.
    Andersson, Axel
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Partel, Gabriele
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Solorzano, Leslie
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Wählby, Carolina
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Transcriptome-Supervised Classification of Tissue Morphology Using Deep Learning. 2020. In: IEEE 17th International Symposium on Biomedical Imaging (ISBI), 2020, p. 1630-1633. Conference paper (Refereed)
    Abstract [en]

    Deep learning has proven to successfully learn variations in tissue and cell morphology. Training of such models typically relies on expensive manual annotations. Here we conjecture that spatially resolved gene expression, i.e., the transcriptome, can be used as an alternative to manual annotations. In particular, we trained five convolutional neural networks with patches of different size extracted from locations defined by spatially resolved gene expression. The network is trained to classify tissue morphology related to two different genes, general tissue, as well as background, on an image of fluorescence stained nuclei in a mouse brain coronal section. Performance is evaluated on an independent tissue section from a different mouse brain, reaching an average Dice score of 0.51. Results may indicate that novel techniques for spatially resolved transcriptomics together with deep learning may provide a unique and unbiased way to find genotype-phenotype relationships.
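
    For readers unfamiliar with the metric, the average Dice score reported above can be illustrated with a minimal sketch. It assumes flat sequences of per-pixel class labels and an unweighted mean over classes; the abstract does not state how the authors averaged, and the function names are hypothetical.

```python
def dice_score(pred, truth, label) -> float:
    """Dice coefficient for one class: 2 * |P and T| / (|P| + |T|)."""
    n_pred = sum(1 for p in pred if p == label)
    n_truth = sum(1 for t in truth if t == label)
    n_both = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if n_pred + n_truth == 0:
        return 1.0  # convention chosen here: class absent from both counts as perfect
    return 2 * n_both / (n_pred + n_truth)


def mean_dice(pred, truth, labels) -> float:
    """Unweighted mean of per-class Dice scores (averaging scheme is an assumption)."""
    return sum(dice_score(pred, truth, label) for label in labels) / len(labels)
```

    A score of 1 means the predicted and reference regions of a class coincide exactly, and 0 means they do not overlap at all.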

  • 13.
    Andersson, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Decoding the Structural Layer of Transcriptional Regulation: Computational Analyses of Chromatin and Chromosomal Aberrations. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Gene activity is regulated at two separate layers. Through structural and chemical properties of DNA – the primary layer of encoding – local signatures may enable, or disable, the binding of proteins or complexes of them with regulatory potential to the DNA. At a higher level – the structural layer of encoding – gene activity is regulated through the properties of higher order DNA structure, chromatin, and chromosome organization. Cells with abnormal chromosome compaction or organization, e.g. cancer cells, may thus have perturbed regulatory activities resulting in abnormal gene activity.

    Hence, there is a great need to decode the transcriptional regulation encoded in both layers to further our understanding of the factors that control activity and life of a cell and, ultimately, an organism. Modern genome-wide studies with those aims rely on data-intense experiments requiring sophisticated computational and statistical methods for data handling and analyses. This thesis describes recent advances of analyzing experimental data from quantitative biological studies to decipher the structural layer of encoding in human cells.

    Adopting an integrative approach when possible, combining multiple sources of data, allowed us to study the influences of chromatin (Papers I and II) and chromosomal aberrations (Paper IV) on transcription. Combining chromatin data with chromosomal aberration data allowed us to identify putative driver oncogenes and tumor-suppressor genes in cancer (Paper IV).

    Bayesian approaches, which allow background information to be incorporated in the models and allow the models to adapt to data, have been very useful. Their use yielded accurate and narrow detection of chromosomal breakpoints in cancer (Papers III and IV) and reliable positioning of nucleosomes and their dynamics during transcriptional regulation at functionally relevant regulatory elements (Paper II).

    Using massively parallel sequencing data, we explored the chromatin landscapes of human cells (Papers I and II) and concluded that there is a preferential and evolutionarily conserved nucleosome positioning at internal exons that is nearly unaffected by the transcriptional level. We also observed a strong association between certain histone modifications and the inclusion or exclusion of an exon in the mature gene transcript, suggesting a functional role in splicing.

    List of papers
    1. Nucleosomes are well positioned in exons and carry characteristic histone modifications
    2009 (English) In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no 10, p. 1732-1741. Article in journal (Refereed) Published
    Abstract [en]

    The genomes of higher organisms are packaged in nucleosomes with functional histone modifications. Until now, genome-wide nucleosome and histone modification studies have focused on transcription start sites (TSSs) where nucleosomes in RNA polymerase II (RNAPII) occupied genes are well positioned and have histone modifications that are characteristic of expression status. Using public data, we here show that there is a higher nucleosome-positioning signal in internal human exons and that this positioning is independent of expression. We observed a similarly strong nucleosome-positioning signal in internal exons of C. elegans. Among the 38 histone modifications analyzed in man, H3K36me3, H3K79me1, H2BK5me1, H3K27me1, H3K27me2 and H3K27me3 had evidently higher signal in internal exons than in the following introns and were clearly related to exon expression. These observations are suggestive of roles in splicing. Thus, exons are not only characterized by their coding capacity but also by their nucleosome organization, which seems evolutionarily conserved since it is present in both primates and nematodes.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-107609 (URN) 10.1101/gr.092353.109 (DOI) 000270389700005 () 19687145 (PubMedID)
    Note

    The first three authors share first authorship.

    Available from: 2009-08-19 Created: 2009-08-19 Last updated: 2022-01-28 Bibliographically approved
    2. Strand-based mixture modeling of nucleosome positioning in HepG2 cells and their regulatory dynamics in response to TGF-beta treatment
    (English) Manuscript (preprint) (Other academic)
    Identifiers
    urn:nbn:se:uu:diva-130998 (URN)
    Available from: 2010-09-20 Created: 2010-09-20 Last updated: 2010-11-11
    3. A Segmental Maximum A Posteriori Approach to Genome-wide Copy Number Profiling
    2008 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 6, p. 751-758. Article in journal (Other academic) Published
    Abstract [en]

    MOTIVATION: Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis. RESULTS: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performances of SMAP on synthetic data when compared with two recent methods. When applied on our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the prediction of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-13616 (URN) 10.1093/bioinformatics/btn003 (DOI) 000254010400003 () 18204059 (PubMedID)
    Available from: 2008-08-21 Created: 2008-08-21 Last updated: 2022-01-28 Bibliographically approved
    4. Integrative epigenomic and genomic analysis of malignant pheochromocytoma
    2010 (English) In: Experimental and Molecular Medicine, ISSN 1226-3613, E-ISSN 2092-6413, Vol. 42, no 7, p. 484-502. Article in journal (Refereed) Published
    Abstract [en]

    Epigenomic and genomic changes affect gene expression and contribute to tumor development. The histone modifications trimethylated histone H3 lysine 4 (H3K4me3) and lysine 27 (H3K27me3) are epigenetic regulators associated with active and silenced genes, respectively, and alterations of these modifications have been observed in cancer. Furthermore, genomic aberrations such as DNA copy number changes are common events in tumors. Pheochromocytoma is a rare endocrine tumor of the adrenal gland that mostly occurs sporadically with unknown epigenetic/genetic cause. The majority of cases are benign. Here we aimed to combine the genome-wide profiling of H3K4me3 and H3K27me3, obtained by the ChIP-chip methodology, and DNA copy number data with global gene expression examination in a malignant pheochromocytoma sample. The integrated analysis of the tumor expression levels, in relation to normal adrenal medulla, indicated that either histone modifications or chromosomal alterations, or both, have great impact on the expression of a substantial fraction of the genes in the investigated sample. Candidate tumor suppressor genes identified with decreased expression, a H3K27me3 mark and/or in regions of deletion were for instance TGIF1, DSC3, TNFRSF10B, RASSF2, HOXA9, PTPRE and CDH11. More genes were found with increased expression, a H3K4me3 mark, and/or in regions of gain. Potential oncogenes detected among those were GNAS, INSM1, DOK5, ETV1, RET, NTRK1, IGF2, and the H3K27 trimethylase gene EZH2. Our approach of associating histone methylations and DNA copy number changes with gene expression revealed an apparent impact on global gene transcription, and enabled the identification of candidate tumor genes for further exploration.

    Keywords
    histone code, DNA copy number changes, gene expression, oncogenes, pheochromocytoma, tumor suppressor genes
    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-129532 (URN) 10.3858/emm.2010.42.7.050 (DOI) 000280558100002 () 20534969 (PubMedID)
    Available from: 2010-08-18 Created: 2010-08-18 Last updated: 2022-01-28 Bibliographically approved
  • 14.
    Andreev, Georgy
    et al.
    Insilico Med AI Ltd, Masdar City 145748, U Arab Emirates.
    Kovalenko, Max
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing.
    Bozdaganyan, Marine E.
    Shenzhen MSU BIT Univ, Fac Biol, Shenzhen 518172, Peoples R China.
    Orekhov, Philipp S.
    Shenzhen MSU BIT Univ, Fac Biol, Shenzhen 518172, Peoples R China.
    Colabind: A Cloud-Based Approach for Prediction of Binding Sites Using Coarse-Grained Simulations with Molecular Probes. 2024. In: Journal of Physical Chemistry B, ISSN 1520-6106, E-ISSN 1520-5207, Vol. 128, no 13, p. 3211-3219. Article in journal (Refereed)
    Abstract [en]

    Binding site prediction is a crucial step in understanding protein-ligand and protein-protein interactions (PPIs) with broad implications in drug discovery and bioinformatics. This study introduces Colabind, a robust, versatile, and user-friendly cloud-based approach that employs coarse-grained molecular dynamics simulations in the presence of molecular probes, mimicking fragments of drug-like compounds. Our method has demonstrated high effectiveness when validated across a diverse range of biological targets spanning various protein classes, successfully identifying orthosteric binding sites, as well as known druggable allosteric or PPI sites, in both experimentally determined and AI-predicted protein structures, consistently placing them among the top-ranked sites. Furthermore, we suggest that careful inspection of the identified regions with a high affinity for specific probes can provide valuable insights for the development of pharmacophore hypotheses. The approach is available at https://github.com/porekhov/CG_probeMD

  • 15.
    Anlind, Alice
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Improvments and evaluation of data processing in LC-MS metabolomics: for application in in vitro systems pharmacology. 2017. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Resistance to established medicines is rapidly increasing, while the rate of discovery of new drugs and treatments has not increased during the last decades (Spiro et al. 2008). Systems pharmacology can be used to find new combinations or concentrations of established drugs to find new treatments faster (Borisy et al. 2003). A recent study aimed to use high-resolution liquid chromatography–mass spectrometry (LC-MS) for in vitro systems pharmacology, but encountered problems with unwanted variability and batch effects (Herman et al. 2017). This thesis builds on this work by improving the pipeline, comparing alternative methods, and evaluating the methods used. The evaluation indicated that data quality was often not improved substantially by complex methods and pipelines. Instead, simpler methods such as binning for feature extraction performed best. In fact, many of the commonly used preprocessing methods proved to have negative or negligible effects on the resulting data quality. Finally, the recently introduced Optimal Orthonormal System for Discriminant Analysis (OOS-DA) for batch removal was found to be a good alternative to the more complex ComBat method.

  • 16.
    Anyango, Stephen Omondi Otieno
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre. Uppsala University.
    VisuNet: Visualizing Networks of feature interactions in rule-based classifiers. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 17.
    Araújo, Naiara Pereira
    et al.
    de Lima, Leonardo Gomes
    Dias, Guilherme
    Kuhn, Gustavo Campos Silva
    de Melo, Alan Lane
    Yonenaga-Yassuda, Yatiyo
    Stanyon, Roscoe
    Svartman, Marta
    Identification and characterization of a subtelomeric satellite DNA in Callitrichini monkeys. 2017. In: DNA research, ISSN 1340-2838, E-ISSN 1756-1663, Vol. 24, no 4, p. 377-385. Article in journal (Refereed)
    Abstract [en]

    Repetitive DNAs are abundant fast-evolving components of eukaryotic genomes, which often possess important structural and functional roles. Despite their ubiquity, repetitive DNAs are poorly studied when compared with the genic fraction of genomes. Here, we took advantage of the availability of the sequenced genome of the common marmoset Callithrix jacchus to assess its satellite DNAs (satDNAs) and their distribution in Callitrichini. After clustering analysis of all reads and comparisons by similarity, we identified a satDNA composed of 171 bp motifs, named MarmoSAT, which makes up 1.09% of the C. jacchus genome. Fluorescent in situ hybridization on chromosomes of species from the genera Callithrix, Mico and Callimico showed that MarmoSAT had a subtelomeric location. In addition to the common monomeric form, we found that MarmoSAT was also organized in higher-order repeats of 338 bp in Callimico goeldii. Our phylogenetic analyses showed that MarmoSAT repeats from C. jacchus lack chromosome-specific features, suggesting exchange events among subterminal regions of non-homologous chromosomes. MarmoSAT is transcribed in several tissues of C. jacchus, with the highest transcription levels in spleen, thymus and heart. The transcription profile and subtelomeric location suggest that MarmoSAT may be involved in the regulation of telomerase and modulation of telomeric chromatin.

  • 18.
    Araújo, Naiara Pereira
    et al.
    Dias, Guilherme
    Amaro, Beatriz Dias
    Kuhn, Gustavo Campos Silva
    Svartman, Marta
    The complete mitochondrial genomes of two Atlantic spiny rats, genus Trinomys (Rodentia: Echimyidae), from low-pass shotgun sequencing. 2016. In: Gene Reports, ISSN 2452-0144, Vol. 5, p. 18-22. Article in journal (Refereed)
  • 19.
    Ardell, David H.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Andersson, Siv G. E.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    TFAM detects co-evolution of tRNA identity rules with lateral transfer of histidyl-tRNA synthetase. 2006. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 34, no 3, p. 893-904. Article in journal (Refereed)
    Abstract [en]

    We present TFAM, an automated, statistical method to classify the identity of tRNAs. TFAM, currently optimized for bacteria, classifies initiator tRNAs and predicts the charging identity of both typical and atypical tRNAs such as suppressors with high confidence. We show statistical evidence for extensive variation in tRNA identity determinants among bacterial genomes due to variation in overall tDNA base content. With TFAM we have detected the first case of eukaryotic-like tRNA identity rules in bacteria. An alpha-proteobacterial clade encompassing Rhizobiales, Caulobacter crescentus and Silicibacter pomeroyi, unlike a sister clade containing the Rickettsiales, Zymomonas mobilis and Gluconobacter oxydans, uses the eukaryotic identity element A73 instead of the highly conserved prokaryotic element C73. We confirm divergence of bacterial histidylation rules by demonstrating perfect covariation of alpha-proteobacterial tRNA(His) acceptor stems and residues in the motif IIb tRNA-binding pocket of their histidyl-tRNA synthetases (HisRS). Phylogenomic analysis supports lateral transfer of a eukaryotic-like HisRS into the alpha-proteobacteria followed by in situ adaptation of the bacterial tDNA(His) and identity rule divergence. Our results demonstrate that TFAM is an effective tool for the bioinformatics, comparative genomics and evolutionary study of tRNA identity.

  • 20.
    Arevalo, Sergio
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Chemistry, Department of Chemistry - Ångström. Univ Utrecht, Biol Dept, NL-3584 CH Utrecht, Netherlands; CSIC, Inst Bioquim Vegetal & Fotosintesis, Seville 41092, Spain; Univ Seville, Seville 41092, Spain; Rensselaer Polytech Inst, Dept Biol Sci, Troy, NY 12180 USA.
    Rico, Daniel Perez
    Univ Utrecht, Biol Dept, NL-3584 CH Utrecht, Netherlands.
    Abarca, Dolores
    Univ Utrecht, Biol Dept, NL-3584 CH Utrecht, Netherlands; Univ Alcala, Dept Life Sci, Alcala De Henares, Spain.
    Dijkhuizen, Laura W.
    Univ Utrecht, Biol Dept, NL-3584 CH Utrecht, Netherlands.
    Sarasa-Buisan, Cristina
    CSIC, Inst Bioquim Vegetal & Fotosintesis, Seville 41092, Spain; Univ Seville, Seville 41092, Spain.
    Lindblad, Peter
    Uppsala University, Disciplinary Domain of Science and Technology, Chemistry, Department of Chemistry - Ångström, Molecular Biomimetics.
    Flores, Enrique
    CSIC, Inst Bioquim Vegetal & Fotosintesis, Seville 41092, Spain; Univ Seville, Seville 41092, Spain.
    Nierzwicki-Bauer, Sandra
    Rensselaer Polytech Inst, Dept Biol Sci, Troy, NY 12180 USA.
    Schluepmann, Henriette
    Univ Utrecht, Biol Dept, NL-3584 CH Utrecht, Netherlands.
    Genome Engineering by RNA-Guided Transposition for Anabaena sp. PCC 7120. 2024. In: ACS Synthetic Biology, E-ISSN 2161-5063, Vol. 13, no 3, p. 901-912. Article in journal (Refereed)
    Abstract [en]

    In genome engineering, the integration of incoming DNA has been dependent on enzymes produced by dividing cells, which has been a bottleneck toward increasing DNA insertion frequencies and accuracy. Recently, RNA-guided transposition with CRISPR-associated transposase (CAST) was reported as highly effective and specific in Escherichia coli. Here, we developed Golden Gate vectors to test CAST in filamentous cyanobacteria and to show that it is effective in Anabaena sp. strain PCC 7120. The comparatively large plasmids containing CAST and the engineered transposon were successfully transferred into Anabaena via conjugation using either suicide or replicative plasmids. Single guide RNAs (sgRNAs) encoding the leading but not the reverse complement strand of the target were effective with the protospacer-associated motif (PAM) sequence included in the sgRNA. In four out of six cases analyzed over two distinct target loci, the insertion site was exactly 63 bases after the PAM. CAST on a replicating plasmid was toxic, which could be used to cure the plasmid. In all six cases analyzed, only the transposon cargo defined by the sequence spanning the left to right elements was inserted at the target loci; therefore, RNA-guided transposition resulted from cut and paste. No endogenous transposons were remobilized by exposure to CAST enzymes. This work is foundational for genome editing by RNA-guided transposition in filamentous cyanobacteria, whether in culture or in complex communities.

  • 21.
    Attwood, Misty
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    The gene repertoire and functional characterization of membrane bound proteins: with focus on three- and four-transmembrane regions. 2015. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 22.
    Attwood, Misty M.
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Schiöth: Functional Pharmacology.
    Schiöth, Helgi B.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Neuroscience, Schiöth: Functional Pharmacology. Sechenov First Moscow State Med Univ, Inst Translat Med & Biotechnol, Moscow, Russia.
    Characterization of five transmembrane proteins: With focus on the Tweety, Sideroflexin, and YIP1 domain families. 2021. In: Frontiers in Cell and Developmental Biology, E-ISSN 2296-634X, Vol. 9, article id 708754. Article in journal (Refereed)
    Abstract [en]

    Transmembrane proteins are involved in many essential cell processes such as signal transduction, transport, and protein trafficking, and hence many are implicated in different disease pathways. Further, as the structure and function of proteins are correlated, investigating a group of proteins with the same tertiary structure, i.e. the same number of transmembrane regions, may improve understanding of their functional roles and potential as therapeutic targets. This analysis investigates the previously unstudied group of proteins with five transmembrane-spanning regions (5TM). More than half of the 58 proteins identified with the 5TM architecture belong to twelve families with two or more members, with ten complete families that do not have any other homologous human proteins identified. Interestingly, more than half the proteins in the dataset function in localization activities through movement or tethering of cell components and more than one-third are involved in transport activities, particularly in the mitochondria. Surprisingly, no receptor activity was identified within this family, in marked contrast with other TM families. The three major 5TM families include the Tweety family, which are pore-forming subunits of the swelling-dependent volume regulated anion channel in astrocytes; the sideroflexin family, which act as mitochondrial amino acid transporters; and the Yip1 domain family, engaged in vesicle budding and intra-Golgi transport. About 30% of the 5TM proteins have enhanced expression in the brain, liver, or testis. Importantly, 60% of these proteins are identified as cancer prognostic markers, where they are associated with clinical outcomes of various tumour types, indicating further investigation into the function and expression of these proteins is important. This study provides the first comprehensive analysis of proteins with 5TM providing details of the unique characteristics

  • 23. Axelsson, Nils
    et al.
    Mårsäter, David
    Computational modelling of quorum sensing using cascade delay. 2022. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    The scope of this project was to implement a quorum sensing model capable of synchronised oscillations from the article "A synchronized quorum of genetic clocks" [1] in the software framework URDME [2]. The model consists of a system of partial differential equations describing time-delayed and coupled biochemical reactions. In URDME, the time delay system was formed using a cascade of reactions in which the rate of each reaction was set so that the expected total time for all reactions in the cascade corresponds to a certain delay time. One reason for this cascade delay model is that it might better capture the inherently stochastic nature of the delay mechanism in the quorum sensing network, as opposed to a model using explicit delays. Another reason is simplicity of implementation, as delays are not explicitly supported in URDME.

    After initial tests suggested that the cascade delay model gave satisfying results, it was incorporated into the quorum sensing model from the article, which was implemented by rewriting the differential equations as a system of biochemical reactions. Simulations in one and two dimensions were then done, with both stochastic and deterministic solution methods. The one dimensional and two dimensional simulations yielded distinct synchronised oscillations with a cascade delay containing five sub-reactions. Several results from the simulations of the original article could be reproduced.

    From the results, it was concluded that the proposed cascade delay model was successful in modelling the delayed reactions in the quorum sensing network. In future studies, it is suggested that the individual cells, in which most of the reactions in the quorum sensing network happen, are modelled with greater resolution.
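
    As a rough illustration of the cascade delay idea described in this abstract, the sketch below replaces a fixed delay tau with a chain of n first-order steps, each with rate n / tau, so that the expected total waiting time equals tau. It is a minimal Python sketch and not URDME code: the function names, the delay of 10 time units, the seed and the sample count are assumptions chosen for illustration, while the five-step cascade follows the abstract. The total time through such a chain is Erlang distributed with mean tau and variance tau^2 / n, so a longer cascade approximates a sharper delay.

```python
import random


def cascade_delay_sample(tau, n_steps, rng):
    """Draw one total waiting time through a cascade of n_steps
    sequential first-order reactions, each with rate n_steps / tau."""
    rate = n_steps / tau
    return sum(rng.expovariate(rate) for _ in range(n_steps))


def summarize(tau, n_steps, n_samples=20000, seed=1):
    """Return the sample mean and sample variance of the cascade delay."""
    rng = random.Random(seed)
    samples = [cascade_delay_sample(tau, n_steps, rng) for _ in range(n_samples)]
    mean = sum(samples) / n_samples
    var = sum((s - mean) ** 2 for s in samples) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    tau = 10.0  # hypothetical delay time, not taken from the thesis
    for n_steps in (1, 5, 50):
        mean, var = summarize(tau, n_steps)
        # Theory: mean = tau, variance = tau**2 / n_steps
        print(n_steps, round(mean, 2), round(var, 2), tau ** 2 / n_steps)
```

    Choosing the number of steps is a trade-off: more steps give a delay that is closer to deterministic, at the cost of more species and reactions in the simulated system.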

  • 24.
    Bajalan, Amanj
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Improved methods for virus detection and discovery in metagenomic sequence data. 2020. Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis
  • 25.
    Baltzer, Nicholas
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Predictive Healthcare: Cervical Cancer Screening Risk Stratification and Genetic Disease Markers. 2019. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The use of Machine Learning is rapidly expanding into previously uncharted waters. In the medical fields there are vast troves of data available from hospitals, biobanks and registries that are now being explored thanks to the tremendous advances in computer science and its related hardware. The progress in genomic extraction and analysis has made it possible for any individual to know their own genetic code. Genetic testing has become affordable and can be used as a tool in treatment, discovery, and prognosis of individuals in a wide variety of healthcare settings. This thesis addresses three different approaches towards predictive healthcare and disease exploration: first, the exploitation of diagnostic data in Nordic screening programmes for the purpose of identifying individuals at high risk of developing cervical cancer so that their screening schedules can be intensified in search of new disease developments. Second, the search for genomic markers that can be used either as additions to diagnostic data for risk predictions or as candidates for further functional analysis. Third, the development of a Machine Learning pipeline called ||-ROSETTA that can effectively process large datasets in the search for common patterns. Together, this provides a functional approach to predictive healthcare that allows intervention at early stages of disease development, resulting in treatments with reduced health consequences at a lower financial burden.

    List of papers
    1. Risk stratification in cervical cancer screening by complete screening history: Applying bioinformatics to a general screening population
    2017 (English). In: International Journal of Cancer, ISSN 0020-7136, E-ISSN 1097-0215, Vol. 141, no 1, p. 200-209. Article in journal (Refereed). Published
    Abstract [en]

    Women screened for cervical cancer in Sweden are currently treated under a one-size-fits-all programme, which has been successful in reducing the incidence of cervical cancer but does not use all of the participants' available medical information. This study aimed to use women's complete cervical screening histories to identify diagnostic patterns that may indicate an increased risk of developing cervical cancer. A nationwide case-control study was performed where cervical cancer screening data from 125,476 women with a maximum follow-up of 10 years were evaluated for patterns of SNOMED diagnoses. The cancer development risk was estimated for a number of different screening history patterns and expressed as Odds Ratios (OR), with a history of 4 benign cervical tests as reference, using logistic regression. The overall performance of the model was moderate (64% accuracy, 71% area under curve) with 61-62% of the study population showing no specific patterns associated with risk. However, predictions for high-risk groups as defined by screening history patterns were highly discriminatory with ORs ranging from 8 to 36. The model for computing risk performed consistently across different screening history lengths, and several patterns predicted cancer outcomes. The results show the presence of risk-increasing and risk-decreasing factors in the screening history. Thus it is feasible to identify subgroups based on their complete screening histories. Several high-risk subgroups identified might benefit from an increased screening density. Some low-risk subgroups identified could likely have a moderately reduced screening density without additional risk.
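
    To make the odds ratios quoted in this abstract concrete: for a single screening-history pattern compared against a reference group, the odds ratio from a logistic regression with one binary predictor equals the cross-product ratio of the corresponding 2x2 case-control table. The Python sketch below computes that ratio with an approximate 95% confidence interval (Woolf's log method). The function name and all counts are hypothetical and are not taken from the study, which fitted a fuller model.

```python
import math


def odds_ratio(cases_pattern, controls_pattern, cases_ref, controls_ref):
    """Odds ratio of a pattern group versus a reference group, with an
    approximate 95% confidence interval on the log scale (Woolf method)."""
    or_value = (cases_pattern * controls_ref) / (controls_pattern * cases_ref)
    se_log = math.sqrt(
        1 / cases_pattern + 1 / controls_pattern + 1 / cases_ref + 1 / controls_ref
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log)
    upper = math.exp(math.log(or_value) + 1.96 * se_log)
    return or_value, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 40 cases and 10 controls show the pattern,
    # 100 cases and 200 controls are in the reference group.
    print(odds_ratio(40, 10, 100, 200))
```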

    Keywords
    bioinformatics, cervical cancer, screening, personalized medicine, machine learning
    National Category
    Cancer and Oncology; Bioinformatics (Computational Biology)
    Identifiers
    urn:nbn:se:uu:diva-323754 (URN); 10.1002/ijc.30725 (DOI); 000400766500021 (); 28383102 (PubMedID)
    Available from: 2017-06-12 Created: 2017-06-12 Last updated: 2019-10-07. Bibliographically approved
    2. Allele specific chromatin signals, 3D interactions, and motif predictions for immune and B cell related diseases
    2019 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 9, article id 2695. Article in journal (Refereed). Published
    Abstract [en]

    Several Genome Wide Association Studies (GWAS) have reported variants associated to immune diseases. However, the identified variants are rarely the drivers of the associations and the molecular mechanisms behind the genetic contributions remain poorly understood. ChIP-seq data for TFs and histone modifications provide snapshots of protein-DNA interactions allowing the identification of heterozygous SNPs showing significant allele specific signals (AS-SNPs). AS-SNPs can change a TF binding site resulting in altered gene regulation and are primary candidates to explain associations observed in GWAS and expression studies. We identified 17,293 unique AS-SNPs across 7 lymphoblastoid cell lines. In this set of cell lines we interrogated 85% of common genetic variants in the population for potential regulatory effect and we identified 237 AS-SNPs associated to immune GWAS traits and 714 to gene expression in B cells. To elucidate possible regulatory mechanisms we integrated long-range 3D interactions data to identify putative target genes and motif predictions to identify TFs whose binding may be affected by AS-SNPs yielding a collection of 173 AS-SNPs associated to gene expression and 60 to B cell related traits. We present a systems strategy to find functional gene regulatory variants, the TFs that bind differentially between alleles and novel strategies to detect the regulated genes.
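
    The abstract does not say which statistical test was used to call allele-specific signals. One common way to test a heterozygous SNP for allelic imbalance is an exact two-sided binomial test of the reference and alternative read counts against an expected 50:50 ratio. The Python sketch below shows that test under this assumption; the function name and the read counts are hypothetical, and a real pipeline would also correct for multiple testing and mapping bias.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for the null hypothesis that
    reads at a heterozygous SNP come from both alleles with probability 0.5."""
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    # Probability of a count at least as extreme in the lower tail;
    # by symmetry at p = 0.5 the two-sided value is twice that, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Hypothetical read counts at three heterozygous sites.
    for ref, alt in [(30, 10), (22, 18), (0, 10)]:
        print(ref, alt, allelic_imbalance_p(ref, alt))
```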

    Place, publisher, year, edition, pages
    NATURE PUBLISHING GROUP, 2019
    National Category
    Medical Genetics
    Identifiers
    urn:nbn:se:uu:diva-379258 (URN); 10.1038/s41598-019-39633-0 (DOI); 000459571100059 (); 30804403 (PubMedID)
    Funder
    Swedish Research Council, 78081; Swedish National Infrastructure for Computing (SNIC); EXODIAB - Excellence of Diabetes Research in Sweden; Swedish Diabetes Association; Ernfors Foundation; Swedish Cancer Society, 160518; German Research Foundation (DFG), GR-3526/1; German Research Foundation (DFG), GR-3526/2
    Available from: 2019-03-15 Created: 2019-03-15 Last updated: 2022-09-15. Bibliographically approved
    3. Risk Stratification in Cervical Cancer Screening – Validation and Generalization of a Data-driven Screening Recall Model
    (English). Manuscript (preprint) (Other academic)
    Keywords
    Cervical Cancer, Screening, Classification, Bioinformatics, Rough Sets
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-394291 (URN)
    Available from: 2019-10-07 Created: 2019-10-07 Last updated: 2019-10-07
    4. Studies of liver tissue identify functional gene regulatory elements associated to gene expression, type 2 diabetes, and other metabolic diseases
    2019 (English). In: Human Genomics, ISSN 1473-9542, E-ISSN 1479-7364, Vol. 13, article id 20. Article in journal (Refereed). Published
    Abstract [en]

    Background:

    Genome-wide association studies (GWAS) of diseases and traits have found associations to gene regions but not the functional SNP or the gene mediating the effect. Difference in gene regulatory signals can be detected using chromatin immunoprecipitation and next-gen sequencing (ChIP-seq) of transcription factors or histone modifications by aligning reads to known polymorphisms in individual genomes. The aim was to identify such regulatory elements in the human liver to understand the genetics behind type 2 diabetes and metabolic diseases.

    Methods:

    The genome of liver tissue was sequenced using 10X Genomics technology to call polymorphic positions. Using ChIP-seq for two histone modifications, H3K4me3 and H3K27ac, and the transcription factor CTCF, and our established bioinformatics pipeline, we detected sites with significant difference in signal between the alleles.

    Results:

    We detected 2329 allele-specific SNPs (AS-SNPs) including 25 associated to GWAS SNPs linked to liver biology, e.g., 4 AS-SNPs at two type 2 diabetes loci. Two hundred ninety-two AS-SNPs were associated to liver gene expression in GTEx, and 134 AS-SNPs were located on 166 candidate functional motifs and most of them in EGR1-binding sites.

    Conclusions:

    This study provides a valuable collection of candidate liver regulatory elements for further experimental validation.

    Keywords
    ChIP-seq, T2D, Regulatory SNPs
    National Category
    Medical Genetics; Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-383513 (URN); 10.1186/s40246-019-0204-8 (DOI); 000466335200001 (); 31036066 (PubMedID)
    Available from: 2019-05-16 Created: 2019-05-16 Last updated: 2020-11-18. Bibliographically approved
    5. ||-ROSETTA
    (English). Manuscript (preprint) (Other academic)
    Keywords
    bioinformatics, Rough Sets
    National Category
    Computer Sciences; Bioinformatics (Computational Biology)
    Research subject
    Bioinformatics; Computer Science
    Identifiers
    urn:nbn:se:uu:diva-393477 (URN)
    Available from: 2019-10-07 Created: 2019-10-07 Last updated: 2019-10-07
  • 26.
    Baltzer, Nicholas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Komorowski, Jan
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Sundström, Karin
    Nygård, Jan
    Nygård, Mari
    Dillner, Joakim
    Risk Stratification in Cervical Cancer Screening – Validation and Generalization of a Data-driven Screening Recall Model. Manuscript (preprint) (Other academic)
  • 27.
    Bartoszek, Krzysztof
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Applied Mathematics and Statistics.
    Phylogenetic effective sample size. 2016. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 407, p. 371-386. Article in journal (Refereed)
    Abstract [en]

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes, the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an existing concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or effective number of species. Lastly I discuss how the concept of the phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades and deciding on the importance of phylogenetic correlations.
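
    For orientation, a widely used definition of an effective sample size for estimating a mean from correlated observations is n_e = 1^T R^{-1} 1, where R is the correlation matrix of the tip values; it equals n for independent tips and approaches 1 as the tips become perfectly correlated. The Python sketch below computes this quantity and is offered on the assumption that it corresponds to the mean effective sample size mentioned in the abstract. It does not implement the regression effective sample size proposed in the paper, and the function names and the four-tip example are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (a is a list of rows, b a list of floats)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_effective_sample_size(corr):
    """Return 1^T R^{-1} 1 for a non-singular correlation matrix R."""
    ones = [1.0] * len(corr)
    return sum(solve(corr, ones))


if __name__ == "__main__":
    # Hypothetical four-tip tree ((A,B),(C,D)) under Brownian motion:
    # tips in the same cherry share half of their path from the root.
    corr = [
        [1.0, 0.5, 0.0, 0.0],
        [0.5, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.5],
        [0.0, 0.0, 0.5, 1.0],
    ]
    print(mean_effective_sample_size(corr))
```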

  • 28.
    Belin, Stella
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Transparent Machine Learning for Multi-Omics Analysis of Mental Disorders. 2020. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Schizophrenia and bipolar disorder are two severe mental disorders that affect more than 65 million individuals worldwide. The aim of this project was to find co-prediction mechanisms for genes associated with schizophrenia and bipolar disorder using a multi-omics data set and a transparent machine learning approach. The overall purpose of the project was to further understand the biological mechanisms of these complex disorders. In this work, publicly available multi-omics data collected from post-mortem brain tissue were used. The omics types included were gene expression, DNA methylation, and SNP array data. The data consisted of samples from individuals with schizophrenia, bipolar disorder, and healthy controls. Individuals with schizophrenia or bipolar disorder were considered as a combined CASE class.

    Using machine learning techniques, a multi-omics pipeline was developed to integrate these data in a manner such that all types were adequately represented. A feature selection was performed on methylation and SNP data, where the most important sites were estimated and mapped to their corresponding genes. Next, those genes were intersected with the gene expression data, and another feature selection was performed on the gene expression data. The most important genes were used to develop an interpretable rule-based model with an accuracy of 88%. The model was then visualized as a network. The graph highlighted genes that may be of biological importance, including CACNG8, RTN4, TERT, OSBPL8, and ANTXR1. Moreover, strong co-predictions were found, most notably between CNKSR4 and KDM4C in CASE samples. However, further investigations would need to be performed in order to prove that these are real biological interactions.

    Through the methods developed and the results found in this project, we hope to shed new light on the analysis of multi-omics data as well as to reveal more about the underlying mechanisms of psychiatric disorders.

  • 29.
    Berglund, Eva Caroline
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Genome Evolution and Host Adaptation in Bartonella. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Bacteria of the genus Bartonella infect the red blood cells of a wide range of wild and domestic mammals and are transmitted between hosts by blood-sucking insects. Although most Bartonella infections are asymptomatic, the genus contains several human pathogens. In this work, host adaptation and host switches in Bartonella have been studied from a genomic perspective, with special focus on the acquisition and evolution of genes involved in host interactions.

    As part of this study, the complete genome of B. grahamii isolated from a Swedish wood mouse was sequenced. A genus-wide comparison revealed that rodent-associated Bartonella species, which have rarely been associated with human disease, have the largest genomes and the largest number of host-adaptability genes. Analysis of known and putative genes for host interactions identified several families of autotransporters as horizontally transferred to the Bartonella ancestor, with a possible role both during early host adaptation and subsequent host shifts.

    In B. grahamii, the association of a gene transfer agent (GTA) and phage-derived run-off replication of a large genomic segment was demonstrated for the first time. Among all acquisitions to the Bartonella ancestor, the only well conserved gene clusters are those that encode the GTA and contain the origin of the run-off replication. This conservation, along with a high density of host-adaptability genes in the amplified region suggest that the GTA provides a strong selective advantage, possibly by increasing recombination frequencies of host-adaptability genes, thereby facilitating evasion of the host immune system and colonization of new hosts.

    B. grahamii displays stronger geographic pattern and higher recombination frequencies than the cat-associated B. henselae, probably caused by different lifestyles and/or population sizes of the hosts. The genomic diversity of B. grahamii is markedly lower in Europe and North America than in Asia, possibly an effect of reduced host variability in these areas following the latest ice age.

    List of papers
    1. Run-off replication of host-adaptability genes is associated with gene transfer agents in the genome of mouse-infecting Bartonella grahamii
    2009 (English). In: PLoS genetics, ISSN 1553-7404, Vol. 5, no 7, p. e1000546. Article in journal (Refereed). Published
    Abstract [en]

    The genus Bartonella comprises facultative intracellular bacteria adapted to mammals, including previously recognized and emerging human pathogens. We report the 2,341,328 bp genome sequence of Bartonella grahamii, one of the most prevalent Bartonella species in wild rodents. Comparative genomics revealed that rodent-associated Bartonella species have higher copy numbers of genes for putative host-adaptability factors than the related human-specific pathogens. Many of these gene clusters are located in a highly dynamic region of 461 kb. Using hybridization to a microarray designed for the B. grahamii genome, we observed a massive, putatively phage-derived run-off replication of this region. We also identified a novel gene transfer agent, which packages the bacterial genome, with an over-representation of the amplified DNA, in 14 kb pieces. This is the first observation associating the products of run-off replication with a gene transfer agent. Because of the high concentration of gene clusters for host-adaptation proteins in the amplified region, and since the genes encoding the gene transfer agent and the phage origin are well conserved in Bartonella, we hypothesize that these systems are driven by selection. We propose that the coupling of run-off replication with gene transfer agents promotes diversification and rapid spread of host-adaptability factors, facilitating host shifts in Bartonella.

    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-108371 (URN); 10.1371/journal.pgen.1000546 (DOI); 000269219500042 (); 19578403 (PubMedID)
    Available from: 2009-09-17 Created: 2009-09-17 Last updated: 2022-01-28. Bibliographically approved
    2. Genome dynamics of Bartonella grahamii in micro-populations of woodland rodents
    2010 (English). In: BMC Genomics, E-ISSN 1471-2164, Vol. 11, p. 152. Article in journal (Refereed). Published
    Abstract [en]

    Background: Rodents represent a high-risk reservoir for the emergence of new human pathogens. The recent completion of the 2.3 Mb genome of Bartonella grahamii, one of the most prevalent blood-borne bacteria in wild rodents, revealed a higher abundance of genes for host-cell interaction systems than in the genomes of closely related human pathogens. The sequence variability within the global B. grahamii population was recently investigated by multi locus sequence typing, but no study on the variability of putative host-cell interaction systems has been performed.

    Results: To study the population dynamics of B. grahamii, we analyzed the genomic diversity on a whole-genome scale of 27 B. grahamii strains isolated from four different species of wild rodents in three geographic locations separated by less than 30 km. Even using highly variable spacer regions, only 3 sequence types were identified. This low sequence diversity contrasted with a high variability in genome content. Microarray comparative genome hybridizations identified genes for outer surface proteins, including a repeated region containing the fha gene for filamentous hemagglutinin and a plasmid that encodes a type IV secretion system, as the most variable. The estimated generation times in liquid culture medium for a subset of strains ranged from 5 to 22 hours, but did not correlate with sequence type or presence/absence patterns of the fha gene or the plasmid.

    Conclusion: Our study has revealed a geographic microstructure of B. grahamii in wild rodents. Despite near-identity in nucleotide sequence, major differences were observed in gene presence/absence patterns that did not segregate with host species. This suggests that genetically similar strains can infect a range of different hosts.

    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-108379 (URN); 10.1186/1471-2164-11-152 (DOI); 000276363100003 ()
    Available from: 2009-09-23 Created: 2009-09-17 Last updated: 2024-01-17. Bibliographically approved
    3. Diversification by recombination in Bartonella grahamii from wild rodents in Asia contrasts with a clonal population structure in Northern Europe and America
    (English) Manuscript (preprint) (Other academic)
    Identifiers
    urn:nbn:se:uu:diva-108384 (URN)
    Available from: 2009-09-24 Created: 2009-09-17 Last updated: 2010-01-14
    4. Evolution of Host Adaptation Systems in the Mammalian Blood Specialist Bartonella
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    Bacteria of the genus Bartonella are facultative intracellular bacteria infecting the red blood cells of mammals. Bartonella isolates have now been reported from a wide range of mammalian host species, including humans, domestic animals such as pets and livestock, as well as many wild animals such as deer, moose, kangaroo, and whales. Here, we present the first major genus-wide investigation of host-adaptation systems in Bartonella, using 5 published and 5 draft genome sequences. The sampling includes both clinical and natural isolates, and represents well the major phylogenetic diversity of the genus. Our study reveals four distinct protein families of Type V Secretion Systems (T5SS) shared by all sequenced members of the genus. We also show that a recently identified gene transfer agent (GTA) consisting of a defective phage is, surprisingly, the most conserved gene cluster among all Bartonella-specific or imported genes, strongly emphasizing the functional importance of this system for the life-style and evolution of Bartonella.

    Keywords
    host adaptation, pathogen, secretion systems, flagella, gene transfer agent, evolution
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Evolutionary Genetics
    Identifiers
    urn:nbn:se:uu:diva-107784 (URN)
    Available from: 2009-08-26 Created: 2009-08-26 Last updated: 2010-01-14
    5. Low-coverage pyrosequencing reveals recombination and run-off replication in Bartonella henselae strains
    (English) Manuscript (preprint) (Other academic)
    Abstract [en]

    Bartonella henselae is a natural intracellular colonizer of cats, and is transferred by blood-sucking insect vectors. It is also an opportunistic human pathogen. Two strains of B. henselae, thought to be representative of the diversity of the species, were selected for low-coverage 454 sequencing. The comparison of these two strains to the published Houston-1 reveals very high nucleotide identity and low substitution and recombination, with the remarkable exception of phages and host-interaction genes such as type IV and V secretion systems. Among the few variable genes of unknown function, BH14680, an alpha-Proteobacteria-specific gene, shows faster evolution in Bartonella compared to other alpha-Proteobacteria. Its 5’ end, which is likely coding for a domain exposed extracellularly, is under positive or very relaxed selection, and might be involved in host-interaction processes. Finally, we show that a simple genome coverage analysis reveals major genomic events such as duplications and unusual replication modes, such as run-off replication. The latter, combined with a gene transfer agent, is thought to be a novel way to increase substitution and recombination frequencies. An extensive analysis of all bacterial pyrosequencing projects showed that it is probably Bartonella-specific.

    Keywords
    pathogen, recombination, run-off replication, phage, gene transfer agent, pyrosequencing, evolution
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Evolutionary Genetics
    Identifiers
    urn:nbn:se:uu:diva-107785 (URN)
    Available from: 2009-08-27 Created: 2009-08-26 Last updated: 2010-01-14
  • 30.
    Besnier, Francois
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Development of Variance Component Methods for Genetic Dissection of Complex Traits. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis presents several developments of the variance component (VC) approach for quantitative trait locus (QTL) mapping.

    The first part consists of methodological improvements: a new fast and efficient method for estimating IBD matrices has been developed. The new method makes better use of computer resources in terms of computational power and storage memory, facilitating further improvements by resolving methodological bottlenecks in algorithms to scan for multiple QTL. A new VC model has also been developed in order to consider and evaluate the correlation of the allelic effects within parental lines origin in experimental outbred crosses. The method was tested on simulated and experimental data and revealed a higher or similar power to detect QTL than linear regression based QTL mapping.

    The second part focused on the prospect of analyzing multi-generational pedigrees with the VC approach. The IBD estimation algorithm was extended to include haplotype information in addition to genotype and pedigree to improve the accuracy of the IBD estimates, and a new haplotyping algorithm was developed for limiting the risk of haplotyping errors in multigenerational pedigrees. Those newly developed methods were subsequently applied to the analysis of a nine-generation AIL pedigree obtained after crossing two chicken lines divergently selected for body weight. Nine QTL described in an F2 population were replicated in the AIL pedigree, and our strategy to use both genotype and phenotype information from all individuals in the entire pedigree clearly made efficient use of the available genotype information provided in AIL.

    List of papers
    1. An Improved Method for Quantitative Trait Loci Detection and Identification of Within-Line Segregation in F2 Intercross Designs
    2008 (English). In: Genetics, ISSN 0016-6731, E-ISSN 1943-2631, Vol. 178, no 4, p. 2315-2326. Article in journal (Refereed) Published
    Abstract [en]

    We present a new flexible, simple, and powerful genome-scan method (flexible intercross analysis, FIA) for detecting quantitative trait loci (QTL) in experimental line crosses. The method is based on a pure random-effects model that simultaneously models between- and within-line QTL variation for single as well as epistatic QTL. It utilizes the score statistic and thereby facilitates computationally efficient significance testing based on empirical significance thresholds obtained by means of permutations. The properties of the method are explored using simulations and analyses of experimental data. The simulations showed that the power of FIA was as good as, or better than, Haley–Knott regression and that FIA was rather insensitive to the level of allelic fixation in the founders, especially for pedigrees with few founders. A chromosome scan was conducted for a meat quality trait in an F2 intercross in pigs where a mutation in the halothane (Ryanodine receptor, RYR1) gene with a large effect on meat quality was known to segregate in one founder line. FIA obtained significant support for the halothane-associated QTL and identified the base generation allele with the mutated allele. A genome scan was also performed in a previously analyzed chicken F2 intercross. In the chicken intercross analysis, four previously detected QTL were confirmed at a 5% genomewide significance level, and FIA gave strong evidence (P < 0.01) for two of these QTL to be segregating within the founder lines. FIA was also extended to account for epistasis and using simulations we show that the method provides good estimates of epistatic QTL variance even for segregating QTL. Extensions of FIA and its applications on other intercross populations including backcrosses, advanced intercross lines, and heterogeneous stocks are also discussed.

    National Category
    Genetics
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101358 (URN) 10.1534/genetics.107.083162 (DOI) 000255239600039 () 18430952 (PubMedID)
    Available from: 2009-05-06 Created: 2009-04-23 Last updated: 2017-12-13 Bibliographically approved
    2. Fine mapping and replication of QTL in outbred chicken advanced intercross lines
    2011 (English). In: Genetics Selection Evolution, ISSN 0999-193X, E-ISSN 1297-9686, Vol. 43, p. 3- Article in journal (Refereed) Published
    Abstract [en]

    BACKGROUND: Linkage mapping is used to identify genomic regions affecting the expression of complex traits. However, when experimental crosses such as F2 populations or backcrosses are used to map regions containing a Quantitative Trait Locus (QTL), the size of the regions identified remains quite large, i.e. 10 or more Mb. Thus, other experimental strategies are needed to refine the QTL locations. Advanced Intercross Lines (AIL) are produced by repeated intercrossing of F2 animals and successive generations, which decrease linkage disequilibrium in a controlled manner. Although this approach is seen as promising, both to replicate QTL analyses and fine-map QTL, only a few AIL datasets, all originating from inbred founders, have been reported in the literature.

    METHODS: We have produced a nine-generation AIL pedigree (n = 1529) from two outbred chicken lines divergently selected for body weight at eight weeks of age. All animals were weighed at eight weeks of age and genotyped for SNP located in nine genomic regions where significant or suggestive QTL had previously been detected in the F2 population. In parallel, we have developed a novel strategy to analyse the data that uses both genotype and pedigree information of all AIL individuals to replicate the detection of and fine-map QTL affecting juvenile body weight.

    RESULTS: Five of the nine QTL detected with the original F2 population were confirmed and fine-mapped with the AIL, while for the remaining four, only suggestive evidence of their existence was obtained. All original QTL were confirmed as a single locus, except for one, which split into two linked QTL.

    CONCLUSIONS: Our results indicate that many of the QTL, which are genome-wide significant or suggestive in the analyses of large intercross populations, are true effects that can be replicated and fine-mapped using AIL. Key factors for success are the use of large populations and powerful statistical tools. Moreover, we believe that the statistical methods we have developed to efficiently study outbred AIL populations will increase the number of organisms for which in-depth complex traits can be analyzed.

     

    National Category
    Genetics
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101398 (URN) 10.1186/1297-9686-43-3 (DOI) 000287133300001 () 21241486 (PubMedID)
    Available from: 2009-04-24 Created: 2009-04-24 Last updated: 2022-01-28
    3.
    The record could not be found. The reason may be that the record is no longer available or you may have typed in a wrong id in the address field.
    4. A genetic algorithm based method for stringent haplotyping of family data
    2009 (English). In: BMC Genetics, E-ISSN 1471-2156, Vol. 10, article id 57. Article in journal (Refereed) Published
    Abstract [en]

    Background: The linkage phase, or haplotype, is an extra level of information that in addition to genotype and pedigree can be useful for reconstructing the inheritance pattern of the alleles in a pedigree, and computing for example Identity By Descent probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to only use haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases.

    Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors.

    Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications as it is able to fulfil different requirements regarding the tolerance for haplotype errors, or uncertain marker-phases.

    National Category
    Bioinformatics and Systems Biology
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101397 (URN) 10.1186/1471-2156-10-57 (DOI) 000270360900001 () 19761594 (PubMedID)
    Note

    Manuscript title in list of papers in thesis: A genetic algorithm based haplotyping method provides better control on haplotype error rate

    Available from: 2009-04-24 Created: 2009-04-24 Last updated: 2024-01-17 Bibliographically approved
  • 31.
    Besnier, Francois
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Carlborg, Örjan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    A genetic algorithm based method for stringent haplotyping of family data. 2009. In: BMC Genetics, E-ISSN 1471-2156, Vol. 10, article id 57. Article in journal (Refereed)
    Abstract [en]

    Background: The linkage phase, or haplotype, is an extra level of information that in addition to genotype and pedigree can be useful for reconstructing the inheritance pattern of the alleles in a pedigree, and computing for example Identity By Descent probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to only use haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases.

    Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors.

    Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications as it is able to fulfil different requirements regarding the tolerance for haplotype errors, or uncertain marker-phases.

  • 32.
    Bhawe, Harshal Kunal
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Phylogenomics of Ascetosporea. 2022. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Ascetosporea is a class of poorly studied unicellular eukaryotes that function as parasites of marine invertebrates. These parasites cause mass mortality events in aquaculture species such as oysters and mussels. The economic importance of these aquaculture species should lead to more attention on the genomics of Ascetosporea and their place on the evolutionary tree of life. With the onset of global warming and rising sea levels and temperatures, many emerging pathogens have been seen and until these are sequenced and analysed, it is difficult to make any conclusions about their relationships and evolution. As there aren’t many genomes and transcriptomes available for Ascetosporea, their position in the larger eukaryotic tree of life remains hypothetical. To attempt to remedy this lack of information, the Burki lab has recently generated sequencing data through sample collection and sequencing for these organisms (genomes and transcriptomes).

    A curated dataset of the various eukaryotic species was previously created and newly sampled and sequenced Ascetosporean genomes of Paramarteilia sp., Marteilia pararefringens, Paramikrocytos canceri, etc. from multiple sampling locations like Ireland, Norway, Sweden, and the UK were included. These could increase the genomic and transcriptomic data available for Ascetosporea and help to resolve the relationships within Ascetosporea. A few reasons why this group has not yet been placed on the tree of life are that the samples are from host tissue, which makes it difficult to sequence these parasites. These Ascetosporeans have also been seen to be very fast-evolving.

    After building phylogenetic relationships with single gene trees to allow for the identification of possible contaminants and paralogs, it was seen that there was a lot of contamination in Ascetosporea, due to the sampling being from host tissue material (hosts are open to the environment). After cleaning and filtering the possible contaminated genes, the trees were remade and a possible link between a fungal group called Microsporidia and Ascetosporea was observed in a few genes. This was hypothesized to be lateral gene transfer between the two groups resulting from their similar lifestyles and infection of invertebrates.

    Complications such as contamination and short BLAST hits arose during the analysis, and these could be caused by fragmentation in the genome. This fragmentation could have negative effects on genome annotation predictions and consequently on phylogenetic and phylogenomic analysis. Due to this and the challenging nature of collecting samples, the read coverage for the genomes is low, but it can be used to perform phylogenetic and phylogenomic studies using currently available data and methods. Another expected result was that the sequenced data had contaminants, and a thorough and comprehensive search would have to be conducted on a dataset-wide level to remove any contaminants.

  • 33.
    Blomberg, Louise
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Characterization and contribution of Plavaka elements in the genome of Lactarius deliciosus (Milk-cap mushrooms). 2023. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
  • 34.
    Bornelöv, Susanne
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Rule-based Models of Transcriptional Regulation and Complex Diseases: Applications and Development. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    As we gain increased understanding of genetic disorders and gene regulation more focus has turned towards complex interactions. Combinations of genes or gene and environmental factors have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for use on such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors with importance for asthma and allergy.

    Firstly, publicly available ChIP-seq data for 38 HMs was studied (Paper I). We trained a classifier for predicting exon inclusion levels based on the HM signals. We identified HMs important for splicing and illustrated that splicing could be predicted from the HM patterns. Next, we applied a similar methodology to data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors with importance for allergic diseases, which confirmed earlier results, and found candidate gene-gene and gene-environment interactions.

    In order to interpret and present the classifiers we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis on classifiers trained on both simulated and real data and compared our tool to another methodology for interaction detection using classification. Finally, we continued the earlier study on epigenetics by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs with different signals between unidirectional and bidirectional genes. Among these, the CTCF TF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.

    List of papers
    1. Combinations of histone modifications mark exon inclusion levels
    2012 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 7, no 1, article id e29911. Article in journal (Refereed) Published
    Abstract [en]

    Splicing is a complex process regulated by sequence at the classical splice sites and other motifs in exons and introns with an enhancing or silencing effect. In addition, specific histone modifications on nucleosomes positioned over the exons have been shown to correlate both positively and negatively with exon expression. Here, we trained a model of "IF … THEN …" rules to predict exon inclusion levels in a transcript from histone modification patterns. Furthermore, we showed that combinations of histone modifications, in particular those residing on nucleosomes preceding or succeeding the exon, are better predictors of exon inclusion levels than single modifications. The resulting model was evaluated with cross validation and had an average accuracy of 72% for 27% of the exons, which demonstrates that epigenetic signals substantially mark alternative splicing.

    National Category
    Cell and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-175875 (URN) 10.1371/journal.pone.0029911 (DOI) 000312662100045 () 22242188 (PubMedID)
    Funder
    Knut and Alice Wallenberg Foundation, Swedish Foundation for Strategic Research, Swedish Research Council, Swedish Cancer Society
    Available from: 2012-06-13 Created: 2012-06-13 Last updated: 2021-06-14 Bibliographically approved
    2. Rule-Based Models of the Interplay between Genetic and Environmental Factors in Childhood Allergy
    2013 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 8, no 11, p. e80080- Article in journal (Refereed) Published
    Abstract [en]

    Both genetic and environmental factors are important for the development of allergic diseases. However, a detailed understanding of how such factors act together is lacking. To elucidate the interplay between genetic and environmental factors in allergic diseases, we used a novel bioinformatics approach that combines feature selection and machine learning. In two materials, PARSIFAL (a European cross-sectional study of 3113 children) and BAMSE (a Swedish birth-cohort including 2033 children), genetic variants as well as environmental and lifestyle factors were evaluated for their contribution to allergic phenotypes. Monte Carlo feature selection and rule based models were used to identify and rank rules describing how combinations of genetic and environmental factors affect the risk of allergic diseases. Novel interactions between genes were suggested and replicated, such as between ORMDL3 and RORA, where certain genotype combinations gave odds ratios for current asthma of 2.1 (95% CI 1.2-3.6) and 3.2 (95% CI 2.0-5.0) in the BAMSE and PARSIFAL children, respectively. Several combinations of environmental factors appeared to be important for the development of allergic disease in children. For example, use of baby formula and antibiotics early in life was associated with an odds ratio of 7.4 (95% CI 4.5-12.0) of developing asthma. Furthermore, genetic variants together with environmental factors seemed to play a role for allergic diseases, such as the use of antibiotics early in life and COL29A1 variants for asthma, and farm living and NPSR1 variants for allergic eczema. Overall, combinations of environmental and lifestyle factors appeared more frequently in the models than combinations solely involving genes. In conclusion, a new bioinformatics approach is described for analyzing complex data, including extensive genetic and environmental information. Interactions identified with this approach could provide useful hints for further in-depth studies of etiological mechanisms and may also strengthen the basis for risk assessment and prevention.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-213817 (URN) 10.1371/journal.pone.0080080 (DOI) 000327311900057 ()
    Available from: 2014-01-05 Created: 2014-01-04 Last updated: 2021-06-14 Bibliographically approved
    3. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
    2014 (English). In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 15, p. 139- Article in journal (Refereed) Published
    Abstract [en]

    Background: The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods. Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings. Conclusions: Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.

    Keywords
    Visualization, Rules, Interactions, Interaction detection, Classification, Rule-based classification
    National Category
    Biochemistry and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-228027 (URN) 10.1186/1471-2105-15-139 (DOI) 000336679600001 ()
    Available from: 2014-07-02 Created: 2014-07-02 Last updated: 2024-01-17 Bibliographically approved
    4. Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription
    2015 (English). In: BMC Genomics, E-ISSN 1471-2164, Vol. 16, article id 300. Article in journal (Refereed) Published
    Abstract [en]

    Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

    Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

    Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.

    Keywords
    Antisense transcription, CTCF, RAD21, Cohesin, CAGE, Epigenetics, Transcription factor, Histone modification
    National Category
    Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-230158 (URN) 10.1186/s12864-015-1485-5 (DOI) 000355166000001 () 25881024 (PubMedID)
    Available from: 2014-08-19 Created: 2014-08-19 Last updated: 2024-01-17 Bibliographically approved
  • 35.
    Bornelöv, Susanne
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Wadelius, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription. 2015. In: BMC Genomics, E-ISSN 1471-2164, Vol. 16, article id 300. Article in journal (Refereed)
    Abstract [en]

    Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

    Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

    Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.

    Download full text (pdf)
    fulltext
  • 36.
    Breton, Gwenna
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Human Evolution.
    Johansson, Anna C. V.
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Sjödin, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Human Evolution.
    Schlebusch, Carina
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Human Evolution. Uppsala University, Science for Life Laboratory, SciLifeLab. Univ Johannesburg, Palaeo Res Inst, POB 524, ZA-2006 Auckland Pk, South Africa.
    Jakobsson, Mattias
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Human Evolution. Univ Johannesburg, Palaeo Res Inst, POB 524, ZA-2006 Auckland Pk, South Africa.
    Comparison of sequencing data processing pipelines and application to underrepresented African human populations. 2021. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 22, no 1, article id 488. Article in journal (Refereed)
    Abstract [en]

    Background: Population genetic studies of humans make increasing use of high-throughput sequencing in order to capture diversity in an unbiased way. There is an abundance of sequencing technologies and bioinformatic tools, and the number of available genomes is increasing. Studies have evaluated and compared some of these technologies and tools, such as the Genome Analysis Toolkit (GATK) and its "Best Practices" bioinformatic pipelines. However, studies often focus on a few genomes of Eurasian origin in order to detect technical issues. We instead surveyed the use of the GATK tools and established a pipeline for processing high coverage full genomes from a diverse set of populations, including Sub-Saharan African groups, in order to reveal challenges from human diversity and stratification.

    Results: We surveyed 29 studies using high-throughput sequencing data, and compared their strategies for data pre-processing and variant calling. We found that processing of data is very variable across studies and that the GATK "Best Practices" are seldom followed strictly. We then compared three versions of a GATK pipeline, differing in the inclusion of an indel realignment step and with a modification of the base quality score recalibration step. We applied the pipelines to a diverse set of 28 individuals. We compared the pipelines in terms of count of called variants and overlap of the callsets. We found that the pipelines resulted in similar callsets, in particular after callset filtering. We also ran one of the pipelines on a larger dataset of 179 individuals. We noted that including more individuals at the joint genotyping step resulted in different counts of variants. At the individual level, we observed that the average genome coverage was correlated with the number of variants called.

    Conclusions: We conclude that applying the GATK "Best Practices" pipeline, including their recommended reference datasets, to underrepresented populations does not lead to a decrease in the number of called variants compared to alternative pipelines. We recommend aiming for coverage of > 30X if identifying most variants is important, and working with large sample sizes at the variant calling stage, also for underrepresented individuals and populations.
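
    The callset comparison mentioned in the abstract (variant counts and overlap between pipelines) can be illustrated with a minimal sketch. The pipeline names and variant tuples below are hypothetical toy data, and the Jaccard index is an assumed overlap measure; the abstract does not state which measure the study used.

```python
from itertools import combinations


def callset_overlap(callsets):
    """Return pairwise shared-variant counts and Jaccard indices.

    callsets maps a pipeline name to a set of (chrom, pos, ref, alt) tuples.
    """
    result = {}
    for a, b in combinations(sorted(callsets), 2):
        shared = callsets[a] & callsets[b]
        union = callsets[a] | callsets[b]
        # Two empty callsets are treated as identical.
        jaccard = len(shared) / len(union) if union else 1.0
        result[(a, b)] = (len(shared), jaccard)
    return result


# Hypothetical toy callsets for two pipeline variants.
callsets = {
    "with_realignment": {
        ("1", 100, "A", "G"), ("1", 250, "C", "T"), ("2", 40, "G", "A"),
    },
    "without_realignment": {
        ("1", 100, "A", "G"), ("1", 250, "C", "T"), ("2", 90, "T", "C"),
    },
}
counts = {name: len(variants) for name, variants in callsets.items()}
overlap = callset_overlap(callsets)
```

    Representing each variant as a hashable tuple keeps the comparison to plain set operations, which is sufficient for counting shared and private calls.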

    Download full text (pdf)
    FULLTEXT01
  • 37.
    Buinovskaja, Greta
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Identifying structural variants from plant short-read sequencing data. 2022. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Download full text (pdf)
    Master_Thesis_Greta_Buinovskaja
  • 38.
    Byström, Petter
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Evaluating the Suitability of Genetically Engineered Cell Models in Morphological Cell Profiling Experiments. 2024. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Morphological Cell Profiling has gained substantial traction in drug discovery since its inception in 2016, primarily due to its cost-effectiveness and high-throughput capabilities compared to traditional drug discovery methods. This technique captures comprehensive cellular responses to treatments, enabling the identification of novel drug effects and mechanisms of action. 

    In this thesis, an alternative experimental approach to morphological profiling utilizing genetically engineered cells is proposed. It presents a significant advancement by eliminating the need for labor-intensive staining procedures through genetic modifications that express fluorescent proteins in targeted organelles. This innovation reduces preparation time and costs, making high-content screening more accessible, particularly for small-scale laboratories, and more effective at screening large libraries of compounds. 

    Comparative analysis reveals that the genetic cell models generally exhibit higher grit scores and better separation of compounds from controls than traditional Cell Painting methods. Although Cell Painting often achieves superior inter-compound distance separation, the genetic cell models demonstrate a more consistent capacity for clustering compounds, suggesting an enhanced ability to identify phenotypic shifts. Moreover, while the cell models and Cell Painting show comparable performance in Mechanism of Action (MoA) classification, differences in the sensitivity to specific cellular features hint at the genetic models' potential advantages in certain contexts.

    The genetic models' reliance on easily transfected cell lines, such as U-2 OS, enables the development of specific disease models, although it poses challenges with more sensitive cell lines. Nevertheless, the versatility and reduced operational costs of the genetic cell models, combined with their capability to produce detailed morphological data, position them as a promising tool for drug discovery and disease research. Future work should focus on expanding the genetic cell models to incorporate additional organelles, thereby enhancing their comprehensiveness and applicability in morphological profiling.

    The full text will be freely available from 2025-06-01 15:27
  • 39.
    Bălan, Mirela
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Integrative bioinformatic analysis of SARs-CoV-2 data. 2021. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Download full text (pdf)
    fulltext
  • 40.
    Camargo Romera, Paula
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Computational prediction of cell-cell interactions in the brain-tumour microenvironment. 2023. Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis
    Abstract [en]

    Glioblastoma is the fastest-growing and most common malignant brain tumour in adults. It is normally treated with surgery and radio- or chemotherapy, but life expectancy is approximately 15 months, with a high probability of recurrence. Therefore, there is a need to decrease its severity. Bulk and single-cell RNA sequencing allow the identification of cellular states in tumours affected by cell-intrinsic and extrinsic factors. Four different cellular states have been identified in glioblastoma: neural progenitor-like, oligodendrocyte progenitor-like, astrocyte-like, and mesenchymal-like. As glioblastoma is an immunosuppressive tumour, it can alter the immune system and promote the tumour's immune escape by segregating immunosuppressive factors or interacting with the brain microenvironment.

    Two datasets were used in this study to explore whether the localization of the tumour in the brain microenvironment and the tendency of glioblastomas to activate microglial cells are due to particular ligand-receptor interactions. Data quality control was applied to both datasets, and the SingleCellSignalR and CellphoneDB packages were used to predict the possible interactions. A total of seven experiments were designed for this study. The first dataset, GBmap, allowed a comparison between tumour cells and microglia, tumour cells and other cell types in the brain, and the four cellular states of glioblastoma with microglia and macrophages. Next, healthy microglia from GBmap were compared with the tumour bulk data from the second dataset, HGCC. The bootstrap technique was used to compare bulk data with single-cell data, and a comparison between tumour cells and microglia or other cell types was analysed.

    Results showed specific and shared interactions between cell types or cellular states, revealing that the different localization of the tumour cells depends on the expressed ligand-receptor pairs. In addition, four patterns of interactions with differing tendencies to activate microglial cells were found across the 50 samples; these are promising results for further exploring drugs that interfere with these interactions, or how the interactions relate to patient survival. Furthermore, although glioblastoma is a heterogeneous disease, more interactions were predicted with microglial/macrophage cells, without a uniform pattern between patients. This study is therefore a starting point, and further in vitro studies are needed to examine the predicted interactions as potential targets to stop the progression of this type of cancer.

    Download full text (pdf)
    fulltext
  • 41.
    Cao, Shuowen
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    The expression and regulatory functions of Ms1 RNA in Mycobacterium marinum. 2021. Independent thesis Advanced level (degree of Master (Two Years)), 30 credits / 45 HE credits. Student thesis
    Abstract [en]

    Ms1 RNA (Mycobacterium smegmatis small RNA 1) is a recently characterized small non-coding RNA. It appears in some Mycobacterium species and exhibits a stress-related expression pattern. Despite its essential regulatory function, there is still much to learn about its expression, functions, and structure. Here we quantify the Ms1 RNA expression levels in different growth phases using the model system Mycobacterium marinum (Mmar) and show that the high Ms1 RNA level in the stationary phase is due to a high expression level rather than accumulation. The data also suggest that the aceA gene is downregulated by Ms1 RNA in the exponential phase and upregulated in the stationary phase, while the alkB_2 gene is upregulated by Ms1 RNA in all phases. Furthermore, we identify Ms1 RNA functional regions by overexpressing the mutant Ms1 RNA genes and then observing the downstream regulatory effects. All 3 mutations changed the general expression patterns or the regulation of the aceA or alkB_2 gene, which indicates that the 3 regions (29~41, 81~103, and 259~295 bases) are potential Ms1 RNA functional regions. We also attempted to compensate for the knock-out of the genomic Ms1 RNA gene with a vector-carried Ms1 RNA under the control of a Tet-inducible promoter, but not a single colony was obtained. In addition, we generated large amounts of Ms1 RNA by T7 in vitro expression, which can be used for structure probing.

    Download (pdf)
    popular summary
  • 42.
    Capuccini, Marco
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Larsson, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Carone, Matteo
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Novella, Jon Ander
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Sadawi, Noureddin
    Gao, Jianliang
    Toor, Salman
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. Uppsala University, Science for Life Laboratory, SciLifeLab.
    On-demand virtual research environments using microservices. 2019. In: PeerJ Computer Science, E-ISSN 2376-5992, Vol. 5, article id e232. Article in journal (Refereed)
  • 43.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse. 2010. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 11, p. 362-. Article in journal (Refereed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive.

    Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics.

    Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

  • 44.
    Carlsson, Lars
    et al.
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Boyer, Scott
    AstraZeneca R&D.
    Model building in Bioclipse Decision Support applied to open datasets. 2012. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, no Suppl., p. S62-. Article in journal (Refereed)
    Abstract [en]

    Bioclipse Decision Support (DS) is a system capable of building predictive models of any collection of SAR data, and making them available in a simple user interface based on Bioclipse (www.bioclipse.net).

    The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results.

    We here present the results from applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from the sources and to build models which can be integrated with user specific models.

  • 45.
    Cavalli, Marco
    et al.
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medicinsk genetik och genomik. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Baltzer, Nicholas
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Pan, Gang
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medicinsk genetik och genomik. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Walls, Jose Ramon Barcenas
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medicinsk genetik och genomik.
    Garbulowska, Karolina Smolinska
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Kumar, Chanchal
    AstraZeneca, Gothenburg, Sweden.
    Skrtic, Stanko
    AstraZeneca, Gothenburg, Sweden.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics. Polish Acad Sci, Inst Comp Sci, Warsaw, Poland.
    Wadelius, Claes
    Uppsala University, Science for Life Laboratory, SciLifeLab. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Medicinsk genetik och genomik.
    Studies of liver tissue identify functional gene regulatory elements associated to gene expression, type 2 diabetes, and other metabolic diseases. 2019. In: Human Genomics, ISSN 1473-9542, E-ISSN 1479-7364, Vol. 13, article id 20. Article in journal (Refereed)
    Abstract [en]

    Background:

    Genome-wide association studies (GWAS) of diseases and traits have found associations to gene regions but not the functional SNP or the gene mediating the effect. Difference in gene regulatory signals can be detected using chromatin immunoprecipitation and next-gen sequencing (ChIP-seq) of transcription factors or histone modifications by aligning reads to known polymorphisms in individual genomes. The aim was to identify such regulatory elements in the human liver to understand the genetics behind type 2 diabetes and metabolic diseases.

    Methods:

    The genome of liver tissue was sequenced using 10X Genomics technology to call polymorphic positions. Using ChIP-seq for two histone modifications, H3K4me3 and H3K27ac, and the transcription factor CTCF, and our established bioinformatics pipeline, we detected sites with significant difference in signal between the alleles.
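
    One common way to test for a difference in signal between two alleles is an exact binomial test on the reads covering a heterozygous SNP. The sketch below is an illustration under that assumption only; the abstract does not specify which statistical test the authors' pipeline uses, and the function name and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for allelic imbalance.

    Null hypothesis: each read comes from either allele with probability 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    observed = comb(n, ref_reads)
    # Under p = 0.5 every outcome k has probability comb(n, k) / 2**n, so we
    # sum the outcomes that are no more likely than the observed one.
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n


# Hypothetical read counts at two heterozygous sites.
p_balanced = binomial_two_sided_p(5, 5)   # equal support for both alleles
p_skewed = binomial_two_sided_p(10, 0)    # all reads carry one allele
```

    Working with integer binomial coefficients avoids floating-point error until the final division; in a genome-wide analysis the resulting p-values would additionally need multiple-testing correction.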

    Results:

    We detected 2329 allele-specific SNPs (AS-SNPs) including 25 associated to GWAS SNPs linked to liver biology, e.g., 4 AS-SNPs at two type 2 diabetes loci. Two hundred ninety-two AS-SNPs were associated to liver gene expression in GTEx, and 134 AS-SNPs were located on 166 candidate functional motifs and most of them in EGR1-binding sites.

    Conclusions:

    This study provides a valuable collection of candidate liver regulatory elements for further experimental validation.

    Download full text (pdf)
    FULLTEXT01
  • 46.
    Claesson, Alf
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    On Mechanisms of Reactive Metabolite Formation from Drugs. 2013. In: Mini-Reviews in medical chemistry, ISSN 1389-5575, E-ISSN 1875-5607, Vol. 13, no 5, p. 720-729. Article in journal (Refereed)
    Abstract [en]

    Idiosyncratic adverse drug reactions (IADRs) cause a broad range of clinically severe conditions of which drug induced liver injury (DILI) in particular is one of the most frequent causes of safety-related drug withdrawals. The underlying cause is almost invariably formation of reactive metabolites (RM) which by attacking macromolecules induce organ injuries. Attempts are being made in the pharmaceutical industry to lower the risk of selecting unfit compounds as clinical candidates. Approaches vary but do not seem to be overly successful at the initial design/synthesis stage. We review here the most frequent categories of mechanisms for RM formation and propose that many cases of RMs encountered within early ADME screening can be foreseen by applying chemical and metabolic knowledge. We also mention a web tool, SpotRM, which can be used for efficient look-up and learning about drugs that have recognized IADRs likely caused by RM formation.

  • 47.
    Coulier, Adrien
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Hellander, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    Hellander, Andreas
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science.
    A multiscale compartment-based model of stochastic gene regulatory networks using hitting-time analysis. 2021. In: Journal of Chemical Physics, ISSN 0021-9606, E-ISSN 1089-7690, Vol. 154, no 18, article id 184105. Article in journal (Refereed)
    Abstract [en]

    Spatial stochastic models of single cell kinetics are capable of capturing both fluctuations in molecular numbers and the spatial dependencies of the key steps of intracellular regulatory networks. The spatial stochastic model can be simulated both on a detailed microscopic level using particle tracking and on a mesoscopic level using the reaction–diffusion master equation. However, despite substantial progress on simulation efficiency for spatial models in the last years, the computational cost quickly becomes prohibitively expensive for tasks that require repeated simulation of thousands or millions of realizations of the model. This limits the use of spatial models in applications such as multicellular simulations, likelihood-free parameter inference, and robustness analysis. Further approximation of the spatial dynamics is needed to accelerate such computational engineering tasks. We here propose a multiscale model where a compartment-based model approximates a detailed spatial stochastic model. The compartment model is constructed via a first-exit time analysis on the spatial model, thus capturing critical spatial aspects of the fine-grained simulations, at a cost close to the simple well-mixed model. We apply the multiscale model to a canonical model of negative-feedback gene regulation, assess its accuracy over a range of parameters, and demonstrate that the approximation can yield substantial speedups for likelihood-free parameter inference.

    Download full text (pdf)
    fulltext
  • 48.
    Csombordi, Rajmund
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Metabolomics database resolver. 2020. Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis
    Abstract [en]

    Metabolomics is a rising field that combines bioinformatics and cheminformatics. A major component of research is having a reliable data source, which usually comes in the form of metabolomic databases. This paper documents issues that arise when categorizing metabolome compounds within databases, and presents a possible solution in the form of an R package capable of matching metabolome identifiers that originate from different metabolome databases. Using this package, we then reflect on the average coverage of external references between metabolome databases to highlight the lack of a universal primary compound identifier.
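
    The idea of matching identifiers across databases can be sketched by treating each cross-reference as a link and grouping linked identifiers with a union-find structure. This is an illustrative Python sketch, not the thesis's R package; the function name and the identifiers (`dbA:001` and so on) are hypothetical.

```python
def resolve_identifiers(cross_refs):
    """Group identifiers that refer to the same compound.

    cross_refs is an iterable of (id_a, id_b) pairs, each stating that one
    database entry cross-references the other.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in cross_refs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for identifier in parent:
        groups.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical cross-references between three databases.
cross_refs = [
    ("dbA:001", "dbB:X7"),
    ("dbB:X7", "dbC:42"),
    ("dbA:002", "dbC:99"),
]
groups = resolve_identifiers(cross_refs)
```

    Here `dbA:001` and `dbC:42` end up in the same group even though neither references the other directly, which is the transitive matching a resolver needs when no universal identifier exists.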

    Download full text (png)
    Thesis planning
    Download (pdf)
    Thesis
    Download (zip)
    Discovery test results csv
    Download (zip)
    R package
    Download (png)
    Algorithm flow diagram
  • 49.
    Cumlin, Tomas
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Filtering of Clinical NGS Data to Improve Low Allele Frequency Variant Calling. 2022. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Massive parallel sequencing (NGS) is useful in detecting and later classifying somatic driver mutations in cancer tumours. False-positive variants occur in the NGS workflow and they may be mistaken for low frequency somatic cancer mutations in a patient sample. This pushes the need for decreasing the noise rate in the NGS workflow since it may improve the detection of rare allele frequency variants, in particular cancer mutations.

    In this project, the aim was to reduce the level of false-positive variants in an NGS workflow. The scope was limited to looking at substitution errors and their neighbouring nucleotides. Alongside this, it was also a way to understand how different types of substitution errors are distributed in the data, if their frequencies are affected by neighbouring nucleotides and how data processing may affect these substitution rates. A bioinformatic pipeline was set up where a commercially available genomic DNA sample with known variants was subjected to different trimming and filtering settings. The goal was to reduce the substitution error rate as much as possible, without removing any true variants from the data. The optimised settings were trimming the sequencing reads with 5 bp from the tail and filtering sequencing reads that contained 5 or more substitutions. Three additional samples, whereof two were clinical and the third commercial, were tested with these settings.
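
    The two optimised settings (trimming 5 bp from the read tail and discarding reads with 5 or more substitutions) can be sketched as follows. The sketch assumes ungapped reads that start at the same position as the reference segment they are compared against; the function names and sequences are hypothetical, and the thesis's actual pipeline tools are not named in the abstract.

```python
def trim_tail(read, n=5):
    """Remove the last n bases from a read (no-op for n <= 0)."""
    return read[:-n] if n > 0 else read


def count_substitutions(read, reference):
    """Count mismatching positions between an aligned read and its reference."""
    return sum(1 for r, g in zip(read, reference) if r != g)


def filter_reads(pairs, trim=5, max_subs=5):
    """Trim each read, then drop reads with max_subs or more substitutions."""
    kept = []
    for read, reference in pairs:
        trimmed = trim_tail(read, trim)
        if count_substitutions(trimmed, reference) < max_subs:
            kept.append(trimmed)
    return kept


# Hypothetical 20-bp reference segment and two reads aligned to it.
reference = "ACGTACGTACGTACGTACGT"
clean_read = "ACGTACGTACGTACGTACGA"  # single mismatch, in the trimmed tail
noisy_read = "TCGAACCTACTTAGGAACGT"  # five mismatches before the tail
kept = filter_reads([(clean_read, reference), (noisy_read, reference)])
```

    Trimming is applied before counting so that tail errors, which the trimming step already removes, do not also count towards the filtering threshold.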

    The results showed that in all samples, C:G>T:A substitutions were of a higher frequency compared to the rest of the substitution types. For all samples, A:T>C:G substitutions, where the neighbouring nucleotide was a C or a G on each side, had a higher frequency compared to A:T>C:G substitutions with other neighbouring nucleotides on both sides. Those substitution types were especially targeted by the trimming. For the two commercial samples, substitutions that resulted in the nucleotide combinations >XAA or >XTT were of a higher frequency compared to the same substitution types that did not result in those nucleotide combinations. Filtering reads with 5 or more substitutions particularly targeted these substitution types. Consequently, filtering had a greater effect on the commercial samples, compared to the clinical samples. Overall, trimming and filtering helped reduce transversions more than the transitions, increasing the transition/transversion ratio after processing the data.
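
    The transition/transversion ratio referred to above follows from classifying each substitution by base type: purine-to-purine and pyrimidine-to-pyrimidine changes are transitions, all others are transversions. The substitution lists below are hypothetical toy data, not results from the thesis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True if ref>alt stays within purines or within pyrimidines."""
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


def ts_tv_ratio(substitutions):
    """Transition/transversion ratio for a list of (ref, alt) base pairs."""
    ts = sum(1 for ref, alt in substitutions if is_transition(ref, alt))
    tv = sum(
        1 for ref, alt in substitutions
        if ref != alt and not is_transition(ref, alt)
    )
    return ts / tv if tv else float("inf")


# Hypothetical substitution errors before and after processing.
before = [("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]
after = [("C", "T"), ("G", "A"), ("A", "C")]
ratio_before = ts_tv_ratio(before)
ratio_after = ts_tv_ratio(after)
```

    In this toy example, removing one transversion raises the ratio, mirroring the direction of the effect described in the abstract.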

    The results suggest that trimming and filtering can be a useful method to computationally reduce the transversion errors introduced in an NGS workflow, but transition errors to a lesser extent, in particular A:T>G:C transitions. To confirm these findings, more samples should be tested using this methodology. To better understand the effect of trimming and filtering on variant calling, the scope could in the future be expanded to also look at small insertions and deletions.

    Download full text (pdf)
    fulltext
  • 50.
    Cvijovic, Marija
    et al.
    Almquist, Joachim
    Hagmar, Jonas
    Hohmann, Stefan
    Kaltenbach, Hans-Michael
    Klipp, Edda
    Krantz, Marcus
    Mendes, Pedro
    Nelander, Sven
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Cancer and Vascular Biology.
    Nielsen, Jens
    Pagnani, Andrea
    Przulj, Natasa
    Raue, Andreas
    Stelling, Joerg
    Stoma, Szymon
    Tobin, Frank
    Wodke, Judith A. H.
    Zecchina, Riccardo
    Jirstrand, Mats
    Bridging the gaps in systems biology. 2014. In: Molecular Genetics and Genomics, ISSN 1617-4615, E-ISSN 1617-4623, Vol. 289, no 5, p. 727-734. Article, review/survey (Refereed)
    Abstract [en]

    Systems biology aims at creating mathematical models, i.e., computational reconstructions of biological systems and processes that will result in a new level of understanding: the elucidation of the basic and presumably conserved "design" and "engineering" principles of biomolecular systems. Thus, systems biology will move biology from a phenomenological to a predictive science. Mathematical modeling of biological networks and processes has already greatly improved our understanding of many cellular processes. However, given the massive amount of qualitative and quantitative data currently produced and the number of burning questions in health care and biotechnology that need to be solved, the field is still in its early phases. The field requires novel approaches for abstraction, for modeling bioprocesses that follow different biochemical and biophysical rules, and for combining different modules into larger models that still allow realistic simulation with the computational power available today. We have identified and discussed the currently most prominent problems in systems biology: (1) how to bridge different scales of modeling abstraction, (2) how to bridge the gap between topological and mechanistic modeling, and (3) how to bridge the wet and dry laboratory gap. The future success of systems biology largely depends on bridging the recognized gaps.
