Uppsala University Publications (uu.se)
  • 1.
    Ahlström, Anna
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Testing the specificity of the pBAD arabinose reporter. 2017. Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
    Abstract [en]

    The project highlights the ability of Salmonella enterica subspecies enterica serovar Typhimurium (S. Tm) to metabolize simple sugars released from dead commensal bacteria, by using the pBAD (araBAD promoter) system as a reporter of L-arabinose availability. Using bioinformatics and homology of conserved L-arabinose transporter genes shared in Escherichia coli K12 (E. coli) and S. Tm, we aimed to create an S. Tm mutant strain unable to obtain L-arabinose from its environment. During the course of the project it was discovered that L-arabinose transporters are not a shared gene trait between E. coli and S. Tm, and that putative L-arabinose transporter orthologues may exist in the S. Tm genome.

    The full text will be freely available from 2018-06-30 15:30
  • 2.
    Ajawatanawong, Pravech
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Mine the Gaps: Evolution of Eukaryotic Protein Indels and their Application for Testing Deep Phylogeny. 2014. Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    Insertions/deletions (indels) are potentially powerful evolutionary markers, but little is known about their evolution and few tools exist to effectively study them. To address this, I developed SeqFIRE, a tool for automated identification and extraction of indels from protein multiple sequence alignments. The program also extracts conserved alignment blocks, thus covering all major steps in preparing multiple sequence alignments for phylogenetic analysis.

    I then used SeqFIRE to build an indel database, using 299 single copy proteins from a broad taxonomic sampling of mainly multicellular eukaryotes. A total of 4,707 indels were extracted, of which 901 are simple (one genetic event) and 3,806 are complex (multiple events). The most abundant indels are single amino acid simple indels. Indel frequency decreases exponentially with length and shows a linear relationship with host protein size. Singleton indels reveal a strong bias towards insertions (2.31 x deletions on average). These analyses also identify 43 indels marking major clades in Plantae and Fungi (clade defining indels or CDIs), but none for Metazoa.

    In order to study the 3,806 complex indels, they were first classified by number of states. Analysis of the 2-state complex and simple indels combined (“bi-state indels”) confirms that insertions are over 2.5 times as frequent as deletions. Three-quarters of the complex indels had three to nine states (“slightly complex indels”). A tree-assisted search method was developed, allowing me to identify 1,010 potential CDIs supporting all examined major branches of Plantae and Fungi.

    Forty-two proteins were also found to host complex indel CDIs for the deepest branches of Metazoa. After expanding the taxon set for these proteins, I identified a total of 49 non-bilaterian specific CDIs. Parsimony analysis of these indels places Ctenophora as sister taxon to all other Metazoa including Porifera. Six CDIs were also found placing Placozoa as sister to Bilateria. I conclude that slightly complex indels are a rich source of CDIs, and my tree-assisted search strategy could be automated and implemented in the program SeqFIRE to facilitate their discovery. This will have important implications for mining the phylogenomic content of the vast resource of protist genome data soon to become available.

    List of papers
    1. SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments
    2012 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, W340-W347 p. Article in journal (Refereed). Published.
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.

    Keyword
    Indels, Alignment, Conserved blocks
    National Category
    Bioinformatics (Computational Biology) Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-179937 (URN); 10.1093/nar/gks561 (DOI); 000306670900056 ()
    Available from: 2012-08-27. Created: 2012-08-27. Last updated: 2014-04-17. Bibliographically approved.
    2. Evolution of protein indels in plants, animals and fungi
    2013 (English). In: BMC Evolutionary Biology, ISSN 1471-2148, Vol. 13, 140- p. Article in journal (Refereed). Published.
    Abstract [en]

    Background: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes. Results: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa) with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold. Conclusions: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.

    Keyword
    Indels, Rare genomic changes, Phylogeny, Insertion/deletion, Multiple sequence alignment, Eukaryote evolution, Indel profiles
    National Category
    Natural Sciences
    Identifiers
    urn:nbn:se:uu:diva-204971 (URN); 10.1186/1471-2148-13-140 (DOI); 000321461800001 ()
    Available from: 2013-08-16. Created: 2013-08-13. Last updated: 2014-04-17. Bibliographically approved.
    3. An automatable method for high throughput analysis of evolutionary patterns in slightly complex indels and its application to the deep phylogeny of Metazoa
    2014 (English). Article in journal (Refereed). Submitted.
    Abstract [en]

    Insertions/deletions (indels) in protein sequences are potentially powerful evolutionary markers. However, these characters have rarely been explored systematically at deep phylogenetic levels. Previous analyses of simple (2-state) clade defining indels (CDIs) in universal eukaryotic proteins found none to support any major animal clade. We hypothesized that CDIs might still be found in the remaining population of indels, which we term complex indels. Here, we propose a method for analyzing the simplest class of complex indels, the “slightly complex indels”, and use these to investigate deep branches in animal phylogeny. Complex indels with two states, called bi-state indels, show similar evolutionary patterns to singleton simple indels and confirm that insertion mutations are more common than deletions. Exploration of CDIs in 2- to 9-state complex indels shows strong support for all examined branches of fungi and Archaeplastida. Surprisingly, we also found CDIs supporting major branches in animals, particularly in vertebrates. We then expanded the search to non-bilaterian animals (Porifera, Cnidaria and Ctenophora). The phylogenetic tree reconstructed from CDIs places the ctenophore Mnemiopsis leidyi as the deepest branch of animals, supported by 6 CDIs. Trichoplax adhaerens is closely related to the Bilateria. Moreover, the indel phylogeny shows Nematostella vectensis and Hydra magnipapillata as a paraphyletic group, and the position of the cnidarian branches seems to be problematic in the indel phylogeny because of homoplasy. This might be solved if we discover CDIs from animal-specific proteins, which emerged after the universal orthologous proteins.

    Evolutionary Patterns in Slightly Complex Protein Insertions/Deletions (Indels) and Their Application to the Study of Deep Phylogeny in Metazoa

    National Category
    Other Biological Topics
    Identifiers
    urn:nbn:se:uu:diva-216842 (URN)
    Available from: 2014-01-27. Created: 2014-01-27. Last updated: 2014-04-17. Bibliographically approved.
  • 3.
    Ajawatanawong, Pravech
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    Atkinson, Gemma C.
    Watson-Haigh, Nathan S.
    MacKenzie, Bryony
    Baldauf, Sandra L.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organismal Biology, Systematic Biology.
    SeqFIRE: a web application for automated extraction of indel regions and conserved blocks from protein multiple sequence alignments. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no W1, W340-W347 p. Article in journal (Refereed).
    Abstract [en]

    Analyses of multiple sequence alignments generally focus on well-defined conserved sequence blocks, while the rest of the alignment is largely ignored or discarded. This is especially true in phylogenomics, where large multigene datasets are produced through automated pipelines. However, some of the most powerful phylogenetic markers have been found in the variable length regions of multiple alignments, particularly insertions/deletions (indels) in protein sequences. We have developed Sequence Feature and Indel Region Extractor (SeqFIRE) to enable the automated identification and extraction of indels from protein sequence alignments. The program can also extract conserved blocks and identify fast evolving sites using a combination of conservation and entropy. All major variables can be adjusted by the user, allowing them to identify the sets of variables most suited to a particular analysis or dataset. Thus, all major tasks in preparing an alignment for further analysis are combined in a single flexible and user-friendly program. The output includes a numbered list of indels, alignments in NEXUS format with indels annotated or removed and indel-only matrices. SeqFIRE is a user-friendly web application, freely available online at www.seqfire.org/.
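    The idea of extracting indel regions and an indel-only matrix from a protein alignment can be illustrated with a minimal sketch. This is not SeqFIRE's algorithm: it simply treats every run of columns containing at least one gap as an indel region, and the function names, the `-` gap symbol, and the presence/absence coding are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def find_indel_regions(alignment: Dict[str, str], gap: str = "-") -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges in which at least one sequence has a gap."""
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    regions: List[Tuple[int, int]] = []
    start = None
    for col in range(length):
        has_gap = any(s[col] == gap for s in seqs)
        if has_gap and start is None:
            start = col                      # a gapped run begins here
        elif not has_gap and start is not None:
            regions.append((start, col))     # the run ended at the previous column
            start = None
    if start is not None:                    # run extends to the end of the alignment
        regions.append((start, length))
    return regions


def indel_matrix(alignment: Dict[str, str], regions: List[Tuple[int, int]], gap: str = "-") -> Dict[str, List[int]]:
    """Code each region per taxon: 0 if the taxon is all gaps there, 1 otherwise."""
    return {
        name: [0 if set(seq[s:e]) == {gap} else 1 for s, e in regions]
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment of three taxa.
toy = {"A": "MKT--LLV", "B": "MKTAALLV", "C": "MKTA-LL-"}
toy_regions = find_indel_regions(toy)
toy_matrix = indel_matrix(toy, toy_regions)
```

    A real tool would additionally need conservation thresholds for the flanking blocks, which the abstract says are user-adjustable; this sketch leaves them out.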

  • 4.
    Al Jewari, Caesar
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Examining the Root of the Eukaryotic Tree of Life. 2017. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
    Abstract [en]

    Identifying the evolutionary root of the eukaryotic tree of life (eToL) is a central problem in systematic biology that has been receiving growing attention. This task has been aided by the development of advanced phylogenetic methods and the availability of large amounts of genomic data from across the tree. Recently, two studies have tried a novel approach to define the eToL root, using euBacteria (instead of the more distantly related Archaea) as the outgroup. These two studies use partially overlapping datasets, yet produce contradictory results. One study, using mixed eubacterial data (euBac), makes the case for a neozoan-excavate root, while the other study, using alpha-proteobacterial (aP) data, supports the traditional unikont-bikont root. These two results suggest different theories of early eukaryote evolution. However, there is also evidence of substantial artefacts in these datasets and traces of horizontal gene transfer (HGT), the exchange of DNA between unrelated organisms. This project aims to re-examine the datasets of both publications (61 total protein markers). The work started with updating both datasets with solid new phylogenomic data from the supervisor's lab and new publicly available data. I then used these data to systematically investigate the phylogenetic signals of the 61 protein markers across 88 taxa (68 eukaryotes and 20 Bacteria). These were first subjected to preliminary phylogenetic analyses to sort orthologues from paralogues. All orthologues were then combined into a single dataset and subjected to in-depth phylogenetic analyses to evaluate the support for various hypotheses. I also investigated potential sources of artefact in the data using traditional methods and novel methods that I devised and developed myself, including computer scripts specifically written for this work.
    I created a pipeline for the data curation process to make it fast and efficient by automating various parts of the workflow, including concatenating the multigene dataset into a supermatrix. I estimated the level of incongruence in each dataset, excluded the protein markers that have a strong phylogenetic bias, and reconstructed new datasets. I conclude that the data in hand (protein markers and taxa) contain conflicting and inconsistent phylogenetic signal and that a few proteins can have a very strong effect on the results of the analyses. However, a third possible hypothesis is clearly rejected. This suggests that there are specific artefacts in the data, favouring one or the other of the two remaining hypotheses.
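    The concatenation step mentioned in the abstract can be sketched as follows. This is a generic illustration of building a supermatrix, not the thesis pipeline: the function name, the input layout (gene name mapped to per-taxon aligned sequences), and the use of `?` for taxa missing from a gene are all assumptions.

```python
from typing import Dict, List


def concatenate_supermatrix(genes: Dict[str, Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-gene alignments into one sequence per taxon.

    Taxa absent from a gene are padded with the `missing` symbol so that
    every row of the resulting matrix has the same length.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for gene_name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene_name}: aligned sequences differ in length or are absent")
        width = lengths.pop()
        for taxon in taxa:
            # Use the aligned sequence if present, otherwise a block of missing data.
            parts[taxon].append(aln.get(taxon, missing * width))

    return {taxon: "".join(blocks) for taxon, blocks in parts.items()}


# Hypothetical two-gene example; "Yeast" is missing from geneB.
toy_genes = {
    "geneA": {"Homo": "MKT", "Yeast": "MRT"},
    "geneB": {"Homo": "LLVA"},
}
supermatrix = concatenate_supermatrix(toy_genes)
```

    Genes are concatenated in dictionary insertion order, so a partition file recording each gene's column range could be produced in the same loop.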

  • 5.
    Alexiou, Athanasios
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Sets of Genes Predict Survival of Glioblastoma Patients. 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
  • 6.
    Allalou, Amin
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Pinidiyaarachchi, Amalka
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Wählby, Carolina
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis.
    Robust signal detection in 3D fluorescence microscopy. 2010. In: Cytometry. Part A, ISSN 1552-4922, Vol. 77A, no 1, 86-96 p. Article in journal (Refereed).
    Abstract [en]

    Robust detection and localization of biomolecules inside cells is of great importance for better understanding their functions. Fluorescence microscopy and specific staining methods make biomolecules appear as point-like signals in image data, often acquired in 3D. Visual detection of such point-like signals can be time-consuming and problematic if the 3D images are large and contain many, sometimes overlapping, signals. This creates a demand for robust automated methods for accurate detection of signals in 3D fluorescence microscopy. We propose a new 3D point-source signal detection method that is based on Fourier series. The method consists of two parts: a detector, which is a cosine filter to enhance the point-like signals, and a verifier, which is a sine filter to validate the result from the detector. Compared to conventional methods, our method shows better robustness to noise and a good ability to resolve signals that are spatially close. Tests on image data show that the method has accuracy in signal detection equivalent to that of visual detection by experts. The proposed method can be used as an efficient point-like signal detection tool for various types of biological 3D image data.

  • 7.
    Alvarsson, Jonathan
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Ligand-based Methods for Data Management and Modelling. 2015. Doctoral thesis, comprehensive summary (Other academic).
    Abstract [en]

    Drug discovery is a complicated and expensive process in the billion dollar range. One way of making the drug development process more efficient is better information handling, modelling and visualisation. The majority of today's drugs are small molecules, which interact with drug targets to cause an effect. Since the 1980s large numbers of compounds have been systematically tested by robots in so-called high-throughput screening. Ligand-based drug discovery is based on modelling drug molecules. In the field known as Quantitative Structure–Activity Relationship (QSAR), molecules are described by molecular descriptors, which are used for building mathematical models. Based on these models, molecular properties can be predicted, and using the molecular descriptors, molecules can be compared for, e.g., similarity. Bioclipse is a workbench for the life sciences which provides ligand-based tools through a point-and-click interface.

    The aims of this thesis were to research and develop new or improved ligand-based methods and open source software, and to work towards making these tools available for users through the Bioclipse workbench. To this end, a series of molecular signature studies was done and various Bioclipse plugins were developed.

    An introduction to the field is provided in the thesis summary which is followed by five research papers. Paper I describes the Bioclipse 2 software and the Bioclipse scripting language. In Paper II the laboratory information system Brunn for supporting work with dose-response studies on microtiter plates is described. In Paper III the creation of a molecular fingerprint based on the molecular signature descriptor is presented and the new fingerprints are evaluated for target prediction and found to perform on par with industrial standard commercial molecular fingerprints. In Paper IV the effect of different parameter choices when using the signature fingerprint together with support vector machines (SVM) using the radial basis function (RBF) kernel is explored and reasonable default values are found. In Paper V the performance of SVM based QSAR using large datasets with the molecular signature descriptor is studied, and a QSAR model based on 1.2 million substances is created and made available from the Bioclipse workbench.

    List of papers
    1. Bioclipse 2: A scriptable integration platform for the life sciences
    2009 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 397- p. Article in journal (Refereed). Published.
    Abstract [en]

    Background: Contemporary biological research integrates neighboring scientific domains to answer complex questions in fields such as systems biology and drug discovery. This calls for tools that are intuitive to use, yet flexible to adapt to new tasks.

    Results: Bioclipse is a free, open source workbench with advanced features for the life sciences. Version 2.0 constitutes a complete rewrite of Bioclipse, and delivers a stable, scalable integration platform for developers and an intuitive workbench for end users. All functionality is available both from the graphical user interface and from a built-in novel domain-specific language, supporting the scientist in interdisciplinary research and reproducible analyses through advanced visualization of the inputs and the results. New components for Bioclipse 2 include a rewritten editor for chemical structures, a table for multiple molecules that supports gigabyte-sized files, as well as a graphical editor for sequences and alignments.

    Conclusions: Bioclipse 2 is equipped with advanced tools required to carry out complex analysis in the fields of bio- and cheminformatics. Developed as a Rich Client based on Eclipse, Bioclipse 2 leverages today’s powerful desktop computers to provide a responsive user interface, but also takes full advantage of the Web and networked (Web/Cloud) services for more demanding calculations or retrieval of data. That Bioclipse 2 is based on an advanced and widely used service platform ensures wide extensibility, and new algorithms, visualizations as well as scripting commands can easily be added. The intuitive tools for end users and the extensible architecture make Bioclipse 2 ideal for interdisciplinary and integrative research. Bioclipse 2 is released under the Eclipse Public License (EPL), a flexible open source license that allows additional plugins to be of any license. Bioclipse 2 is implemented in Java and supported on all major platforms; source code and binaries are freely available at http://www.bioclipse.net.

    Keyword
    Bioclipse, bioinformatics, cheminformatics, scriptable, script, workbench, life science, platform
    National Category
    Bioinformatics and Systems Biology Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-109304 (URN); 10.1186/1471-2105-10-397 (DOI); 000273329400001 ()
    Available from: 2009-12-16. Created: 2009-10-13. Last updated: 2015-05-12. Bibliographically approved.
    2. Brunn: an open source laboratory information system for microplates with a graphical plate layout design process
    2011 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 12, no 1, 179. Article in journal (Refereed). Published.
    Abstract [en]

    Background:

    Compound profiling and drug screening generates large amounts of data and is generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight and there is a need for a flexible lightweight open system for handling plate design, and validation and preparation of data.

    Results:

    A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves.

    Conclusions:

    Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.

    Keyword
    brunn, microtiter, bioclipse, screening, information system, lis, lims
    National Category
    Pharmacology and Toxicology
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-153210 (URN); 10.1186/1471-2105-12-179 (DOI); 000292027200001 (); 21599898 (PubMedID)
    Available from: 2011-05-09. Created: 2011-05-09. Last updated: 2015-07-22. Bibliographically approved.
    3. Ligand-Based Target Prediction with Signature Fingerprints
    2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 10, 2647-2653 p. Article in journal (Refereed). Published.
    Abstract [en]

    When evaluating a potential drug candidate it is desirable to predict target interactions in silico prior to synthesis in order to assess, e.g., secondary pharmacology. This can be done by looking at known target binding profiles of similar compounds using chemical similarity searching. The purpose of this study was to construct and evaluate the performance of chemical fingerprints based on the molecular signature descriptor for performing target binding predictions. For the comparison we used the area under the receiver operating characteristics curve (AUC) complemented with net reclassification improvement (NRI). We created two open source signature fingerprints, a bit and a count version, and evaluated their performance compared to a set of established fingerprints with regards to predictions of binding targets using Tanimoto-based similarity searching on publicly available data sets extracted from ChEMBL. The results showed that the count version of the signature fingerprint performed on par with well-established fingerprints such as ECFP. The count version outperformed the bit version slightly; however, the count version is more complex and takes more computing time and memory to run so its usage should probably be evaluated on a case-by-case basis. The NRI based tests complemented the AUC based ones and showed signs of higher power.
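    The Tanimoto-based similarity searching mentioned in the abstract can be sketched for both a bit and a count fingerprint. This is an illustration only, not the paper's implementation: fingerprints are represented here as a set of feature identifiers (bit version) or a mapping from feature to count (count version), the min/max form of the count coefficient is one common generalisation and is assumed rather than taken from the paper, and all names are hypothetical.

```python
from typing import Dict, Hashable, List, Mapping, Set, Tuple


def tanimoto_bits(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Tanimoto coefficient for bit fingerprints: |A and B| / |A or B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tanimoto_counts(a: Mapping[Hashable, int], b: Mapping[Hashable, int]) -> float:
    """Min/max generalisation of Tanimoto for count fingerprints (an assumption)."""
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    denominator = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return numerator / denominator if denominator else 0.0


def rank_by_similarity(query: Set[Hashable], library: Dict[str, Set[Hashable]]) -> List[Tuple[str, float]]:
    """Rank library compounds by bit-fingerprint similarity to the query, best first."""
    scored = [(name, tanimoto_bits(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: integers stand in for hashed signature features.
bit_score = tanimoto_bits({1, 2, 3}, {2, 3, 4})
count_score = tanimoto_counts({"x": 2, "y": 1}, {"x": 1, "z": 1})
ranking = rank_by_similarity({1, 2}, {"cmpd_a": {1, 2, 3}, "cmpd_b": {7}})
```

    In a target-prediction setting, the known targets of the top-ranked library compounds would then be proposed as candidate targets for the query.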

    National Category
    Pharmaceutical Sciences Bioinformatics (Computational Biology)
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-237934 (URN); 10.1021/ci500361u (DOI); 000343849600004 (); 25230336 (PubMedID)
    Funder
    Swedish Research Council, VR-2011-6129; eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; Swedish National Infrastructure for Computing (SNIC)
    Available from: 2014-12-08. Created: 2014-12-08. Last updated: 2015-05-12. Bibliographically approved.
    4. Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
    2014 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, Vol. 54, no 11, 3211-3217 p. Article in journal (Refereed). Published.
    Abstract [en]

    QSAR modeling using molecular signatures and support vector machines with a radial basis function is increasingly used for virtual screening in the drug discovery field. This method has three free parameters: C, γ, and signature height. C is a penalty parameter that limits overfitting, γ controls the width of the radial basis function kernel, and the signature height determines how much of the molecule is described by each atom signature. Determination of optimal values for these parameters is time-consuming. Good default values could therefore save considerable computational cost. The goal of this project was to investigate whether such default values could be found by using seven public QSAR data sets spanning a wide range of end points and using both a bit version and a count version of the molecular signatures. On the basis of the experiments performed, we recommend a parameter set of heights 0 to 2 for the count version of the signature fingerprints and heights 0 to 3 for the bit version. These are in combination with a support vector machine using C in the range of 1 to 100 and γ in the range of 0.001 to 0.1. When data sets are small or longer run times are not a problem, then there is reason to consider the addition of height 3 to the count fingerprint and a wider grid search. However, marked improvements should not be expected.
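    To make the roles of the kernel-width parameter (gamma) and the recommended search ranges concrete, here is a minimal sketch of the standard radial basis function kernel and of a small parameter grid spanning the ranges quoted in the abstract. The choice of grid points at each power of ten is an assumption for illustration; the abstract gives only the ranges, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def default_grid(
    c_values: Sequence[float] = (1, 10, 100),           # C in the range 1 to 100
    gamma_values: Sequence[float] = (0.001, 0.01, 0.1),  # gamma in the range 0.001 to 0.1
) -> List[Dict[str, float]]:
    """Enumerate every (C, gamma) combination to try in a grid search."""
    return [{"C": c, "gamma": g} for c, g in product(c_values, gamma_values)]


grid = default_grid()
same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.01)
far_point = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.01)  # squared distance is 25
```

    A larger gamma makes the kernel value fall off faster with distance, so each training compound influences a narrower neighbourhood; each grid entry would be passed to an SVM trainer of the reader's choice.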

    National Category
    Medical Biotechnology Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-240239 (URN); 10.1021/ci500344v (DOI); 000345551000017 (); 25318024 (PubMedID)
    Funder
    eSSENCE - An eScience Collaboration; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience
    Available from: 2015-01-07. Created: 2015-01-06. Last updated: 2015-05-12. Bibliographically approved.
    5. Large-scale ligand-based predictive modelling using support vector machines
    2016 (English). In: Journal of Cheminformatics, ISSN 1758-2946, E-ISSN 1758-2946, Vol. 8, 39. Article in journal (Refereed). Published.
    Abstract [en]

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse.

    Keyword
    Predictive modelling; Support vector machine; Bioclipse; Molecular signatures; QSAR
    National Category
    Pharmaceutical Sciences Bioinformatics (Computational Biology)
    Research subject
    Bioinformatics
    Identifiers
    urn:nbn:se:uu:diva-248959 (URN)10.1186/s13321-016-0151-5 (DOI)000381186100001 ()27516811 (PubMedID)
    Funder
    Swedish National Infrastructure for Computing (SNIC), b2013262 b2015001; Science for Life Laboratory - a national resource center for high-throughput molecular bioscience; eSSENCE - An eScience Collaboration
    Available from: 2015-04-09 Created: 2015-04-09 Last updated: 2016-11-17Bibliographically approved
  • 8.
    Andersson, Robin
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Decoding the Structural Layer of Transcriptional Regulation: Computational Analyses of Chromatin and Chromosomal Aberrations. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Gene activity is regulated at two separate layers. Through structural and chemical properties of DNA – the primary layer of encoding – local signatures may enable, or disable, the binding of proteins or complexes of them with regulatory potential to the DNA. At a higher level – the structural layer of encoding – gene activity is regulated through the properties of higher order DNA structure, chromatin, and chromosome organization. Cells with abnormal chromosome compaction or organization, e.g. cancer cells, may thus have perturbed regulatory activities resulting in abnormal gene activity.

    Hence, there is a great need to decode the transcriptional regulation encoded in both layers to further our understanding of the factors that control the activity and life of a cell and, ultimately, an organism. Modern genome-wide studies with those aims rely on data-intense experiments requiring sophisticated computational and statistical methods for data handling and analyses. This thesis describes recent advances in analyzing experimental data from quantitative biological studies to decipher the structural layer of encoding in human cells.

    Adopting an integrative approach when possible, combining multiple sources of data, allowed us to study the influences of chromatin (Papers I and II) and chromosomal aberrations (Paper IV) on transcription. Combining chromatin data with chromosomal aberration data allowed us to identify putative driver oncogenes and tumor-suppressor genes in cancer (Paper IV).

    Bayesian approaches enabling the incorporation of background information in the models and the adaptability of such models to data have been very useful. Their use yielded accurate and narrow detection of chromosomal breakpoints in cancer (Papers III and IV) and reliable positioning of nucleosomes and their dynamics during transcriptional regulation at functionally relevant regulatory elements (Paper II).

    Using massively parallel sequencing data, we explored the chromatin landscapes of human cells (Papers I and II) and concluded that there is a preferential and evolutionarily conserved nucleosome positioning at internal exons that is nearly unaffected by the transcriptional level. We also observed a strong association between certain histone modifications and the inclusion or exclusion of an exon in the mature gene transcript, suggesting a functional role in splicing.

    List of papers
    1. Nucleosomes are well positioned in exons and carry characteristic histone modifications
    2009 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no 10, p. 1732-1741. Article in journal (Refereed), Published
    Abstract [en]

    The genomes of higher organisms are packaged in nucleosomes with functional histone modifications. Until now, genome-wide nucleosome and histone modification studies have focused on transcription start sites (TSSs) where nucleosomes in RNA polymerase II (RNAPII)-occupied genes are well positioned and have histone modifications that are characteristic of expression status. Using public data, we here show that there is a higher nucleosome-positioning signal in internal human exons and that this positioning is independent of expression. We observed a similarly strong nucleosome-positioning signal in internal exons of C. elegans. Among the 38 histone modifications analyzed in man, H3K36me3, H3K79me1, H2BK5me1, H3K27me1, H3K27me2 and H3K27me3 had evidently higher signal in internal exons than in the following introns and were clearly related to exon expression. These observations are suggestive of roles in splicing. Thus, exons are not only characterized by their coding capacity but also by their nucleosome organization, which seems evolutionarily conserved since it is present in both primates and nematodes.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-107609 (URN)10.1101/gr.092353.109 (DOI)000270389700005 ()19687145 (PubMedID)
    Note

    The first three authors share first authorship.

    Available from: 2009-08-19 Created: 2009-08-19 Last updated: 2017-02-02Bibliographically approved
    2. Strand-based mixture modeling of nucleosome positioning in HepG2 cells and their regulatory dynamics in response to TGF-beta treatment
    (English)Manuscript (preprint) (Other academic)
    Identifiers
    urn:nbn:se:uu:diva-130998 (URN)
    Available from: 2010-09-20 Created: 2010-09-20 Last updated: 2010-11-11
    3. A Segmental Maximum A Posteriori Approach to Genome-wide Copy Number Profiling
    2008 (English). In: Bioinformatics, ISSN 1367-4803, Vol. 24, no 6, p. 751-758. Article in journal (Other academic), Published
    Abstract [en]

    MOTIVATION: Copy number profiling methods aim at assigning DNA copy numbers to chromosomal regions using measurements from microarray-based comparative genomic hybridizations. Among the proposed methods to this end, Hidden Markov Model (HMM)-based approaches seem promising since DNA copy number transitions are naturally captured in the model. Current discrete-index HMM-based approaches do not, however, take into account heterogeneous information regarding the genomic overlap between clones. Moreover, the majority of existing methods are restricted to chromosome-wise analysis. RESULTS: We introduce a novel Segmental Maximum A Posteriori approach, SMAP, for DNA copy number profiling. Our method is based on discrete-index Hidden Markov Modeling and incorporates genomic distance and overlap between clones. We exploit a priori information through user-controllable parameterization that enables the identification of copy number deviations of various lengths and amplitudes. The model parameters may be inferred at a genome-wide scale to avoid overfitting of model parameters often resulting from chromosome-wise model inference. We report superior performances of SMAP on synthetic data when compared with two recent methods. When applied on our new experimental data, SMAP readily recognizes already known genetic aberrations including both large-scale regions with aberrant DNA copy number and changes affecting only single features on the array. We highlight the differences between the prediction of SMAP and the compared methods and show that SMAP accurately determines copy number changes and benefits from overlap consideration.
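
    To illustrate the general idea of HMM-based copy number calling, the sketch below runs plain Viterbi decoding on a Gaussian-emission HMM with three states. It is not the SMAP method: it ignores genomic distance and clone overlap, and the state means, `sigma`, `stay_prob` and the toy log-ratios are all hypothetical values chosen for the example.

```python
import math

def viterbi(obs, states, means, sigma, stay_prob):
    """Most probable state path for a Gaussian-emission HMM (log domain)."""
    n = len(states)
    switch = (1.0 - stay_prob) / (n - 1)  # probability of moving to each other state

    def log_emit(s, x):
        # Gaussian log-density up to a constant; sigma is shared, so constants cancel.
        return -0.5 * ((x - means[s]) / sigma) ** 2

    def log_trans(a, b):
        return math.log(stay_prob if a == b else switch)

    # Uniform prior over the initial state.
    score = {s: math.log(1.0 / n) + log_emit(s, obs[0]) for s in states}
    back = []
    for x in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log_trans(p, s))
            score[s] = prev[best] + log_trans(best, s) + log_emit(s, x)
            pointers[s] = best
        back.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

states = ["loss", "normal", "gain"]
means = {"loss": -0.5, "normal": 0.0, "gain": 0.5}
log_ratios = [0.02, -0.05, 0.48, 0.55, 0.51, 0.03, -0.01]
calls = viterbi(log_ratios, states, means, sigma=0.15, stay_prob=0.9)
```

    The high `stay_prob` penalizes state changes, which is what lets the transition structure capture contiguous segments rather than isolated noisy probes.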

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-13616 (URN)10.1093/bioinformatics/btn003 (DOI)000254010400003 ()18204059 (PubMedID)
    Available from: 2008-08-21 Created: 2008-08-21 Last updated: 2010-11-11Bibliographically approved
    4. Integrative epigenomic and genomic analysis of malignant pheochromocytoma
    2010 (English). In: Experimental and Molecular Medicine, ISSN 1226-3613, Vol. 42, no 7, p. 484-502. Article in journal (Refereed), Published
    Abstract [en]

    Epigenomic and genomic changes affect gene expression and contribute to tumor development. The histone modifications trimethylated histone H3 lysine 4 (H3K4me3) and lysine 27 (H3K27me3) are epigenetic regulators associated with active and silenced genes, respectively, and alterations of these modifications have been observed in cancer. Furthermore, genomic aberrations such as DNA copy number changes are common events in tumors. Pheochromocytoma is a rare endocrine tumor of the adrenal gland that mostly occurs sporadically with unknown epigenetic/genetic cause. The majority of cases are benign. Here we aimed to combine the genome-wide profiling of H3K4me3 and H3K27me3, obtained by the ChIP-chip methodology, and DNA copy number data with global gene expression examination in a malignant pheochromocytoma sample. The integrated analysis of the tumor expression levels, in relation to normal adrenal medulla, indicated that either histone modifications or chromosomal alterations, or both, have great impact on the expression of a substantial fraction of the genes in the investigated sample. Candidate tumor suppressor genes identified with decreased expression, an H3K27me3 mark and/or in regions of deletion were, for instance, TGIF1, DSC3, TNFRSF10B, RASSF2, HOXA9, PTPRE and CDH11. More genes were found with increased expression, an H3K4me3 mark, and/or in regions of gain. Potential oncogenes detected among those were GNAS, INSM1, DOK5, ETV1, RET, NTRK1, IGF2, and the H3K27 trimethylase gene EZH2. Our approach of associating histone methylations and DNA copy number changes with gene expression revealed an apparent impact on global gene transcription, and enabled the identification of candidate tumor genes for further exploration.

    Keyword
    histone code, DNA copy number changes, gene expression, oncogenes, pheochromocytoma, tumor suppressor genes
    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-129532 (URN)10.3858/emm.2010.42.7.050 (DOI)000280558100002 ()20534969 (PubMedID)
    Available from: 2010-08-18 Created: 2010-08-18 Last updated: 2017-02-02Bibliographically approved
  • 9.
    Anlind, Alice
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Improvments and evaluation of data processing in LC-MS metabolomics: for application in in vitro systems pharmacology. 2017. Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Resistance to established medicines is rapidly increasing, while the rate of discovery of new drugs and treatments has not increased during the last decades (Spiro et al. 2008). Systems pharmacology can be used to find new combinations or concentrations of established drugs and thereby find new treatments faster (Borisy et al. 2003). A recent study aimed to use high-resolution liquid chromatography–mass spectrometry (LC-MS) for in vitro systems pharmacology, but encountered problems with unwanted variability and batch effects (Herman et al. 2017). This thesis builds on that work by improving the pipeline, comparing alternative methods, and evaluating the methods used. The evaluation indicated that data quality was often not improved substantially by complex methods and pipelines. Instead, simpler methods, such as binning for feature extraction, performed best. In fact, many of the commonly used preprocessing methods proved to have negative or negligible effects on the resulting data quality. Finally, the recently introduced Optimal Orthonormal System for Discriminant Analysis (OOS-DA) for batch removal was found to be a good alternative to the more complex ComBat method.
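
    As a rough illustration of what binning for feature extraction means, the sketch below sums peak intensities in fixed-width m/z bins. The function name `bin_spectrum`, the bin width and the toy peak list are assumptions for this example; the thesis's actual binning procedure is not described in the abstract.

```python
from collections import defaultdict

def bin_spectrum(peaks, bin_width):
    """Sum the intensities of (m/z, intensity) peaks that fall in the same m/z bin.

    Bins are half-open intervals [k * bin_width, (k + 1) * bin_width),
    identified by their integer index k.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] += intensity
    return dict(binned)

# Hypothetical peak list: (m/z, intensity) pairs.
peaks = [(100.2, 5.0), (100.7, 3.0), (101.1, 2.0)]
features = bin_spectrum(peaks, bin_width=1.0)
```

    Each bin index then serves as one feature, so every sample yields a vector on the same fixed grid without peak detection or alignment.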

  • 10.
    Anyango, Stephen Omondi Otieno
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre. Uppsala University.
    VisuNet: Visualizing Networks of feature interactions in rule-based classifiers. 2016. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 11.
    Ardell, David H.
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Andersson, Siv G. E.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    TFAM detects co-evolution of tRNA identity rules with lateral transfer of histidyl-tRNA synthetase. 2006. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 34, no 3, p. 893-904. Article in journal (Refereed)
    Abstract [en]

    We present TFAM, an automated, statistical method to classify the identity of tRNAs. TFAM, currently optimized for bacteria, classifies initiator tRNAs and predicts the charging identity of both typical and atypical tRNAs such as suppressors with high confidence. We show statistical evidence for extensive variation in tRNA identity determinants among bacterial genomes due to variation in overall tDNA base content. With TFAM we have detected the first case of eukaryotic-like tRNA identity rules in bacteria. An alpha-proteobacterial clade encompassing Rhizobiales, Caulobacter crescentus and Silicibacter pomeroyi, unlike a sister clade containing the Rickettsiales, Zymomonas mobilis and Gluconobacter oxydans, uses the eukaryotic identity element A73 instead of the highly conserved prokaryotic element C73. We confirm divergence of bacterial histidylation rules by demonstrating perfect covariation of alpha-proteobacterial tRNA(His) acceptor stems and residues in the motif IIb tRNA-binding pocket of their histidyl-tRNA synthetases (HisRS). Phylogenomic analysis supports lateral transfer of a eukaryotic-like HisRS into the alpha-proteobacteria followed by in situ adaptation of the bacterial tDNA(His) and identity rule divergence. Our results demonstrate that TFAM is an effective tool for the bioinformatics, comparative genomics and evolutionary study of tRNA identity.

  • 12.
    Attwood, Misty
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    The gene repertoire and functional characterization of membrane bound proteins: with focus on three- and four-transmembrane regions. 2015. Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 13.
    Bartoszek, Krzysztof
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Applied Mathematics and Statistics.
    Phylogenetic effective sample size. 2016. In: Journal of Theoretical Biology, ISSN 0022-5193, E-ISSN 1095-8541, Vol. 407, p. 371-386. Article in journal (Refereed)
    Abstract [en]

    In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes, the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an already present concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or effective number of species. Lastly I discuss how the concept of the phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades and deciding on the importance of phylogenetic correlations.
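
    For orientation, one standard way to attach a number to the size of a correlated sample is the effective sample size for estimating a common mean. The expression below is a textbook result for n observations with common variance and correlation matrix R, using the generalized least squares estimator of the mean. Treating it as the mean effective sample size mentioned in the abstract is an assumption, and the symbols R, \sigma^2, \hat{\mu} and n_e are introduced here for illustration only.

```latex
\operatorname{Var}(\hat{\mu}) = \frac{\sigma^{2}}{\mathbf{1}^{\top} R^{-1} \mathbf{1}}
\quad\Longrightarrow\quad
n_e = \mathbf{1}^{\top} R^{-1} \mathbf{1}
```

    With independent observations (R = I) this gives n_e = n, and with perfectly correlated observations it approaches n_e = 1, which matches the intuition that closely related species carry less independent information.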

  • 14.
    Berglund, Eva Caroline
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Evolution, Genomics and Systematics, Molecular Evolution.
    Genome Evolution and Host Adaptation in Bartonella. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Bacteria of the genus Bartonella infect the red blood cells of a wide range of wild and domestic mammals and are transmitted between hosts by blood-sucking insects. Although most Bartonella infections are asymptomatic, the genus contains several human pathogens. In this work, host adaptation and host switches in Bartonella have been studied from a genomic perspective, with special focus on the acquisition and evolution of genes involved in host interactions.

    As part of this study, the complete genome of B. grahamii isolated from a Swedish wood mouse was sequenced. A genus-wide comparison revealed that rodent-associated Bartonella species, which have rarely been associated with human disease, have the largest genomes and the largest number of host-adaptability genes. Analysis of known and putative genes for host interactions identified several families of autotransporters as horizontally transferred to the Bartonella ancestor, with a possible role both during early host adaptation and subsequent host shifts.

    In B. grahamii, the association of a gene transfer agent (GTA) and phage-derived run-off replication of a large genomic segment was demonstrated for the first time. Among all acquisitions to the Bartonella ancestor, the only well conserved gene clusters are those that encode the GTA and contain the origin of the run-off replication. This conservation, along with a high density of host-adaptability genes in the amplified region, suggests that the GTA provides a strong selective advantage, possibly by increasing recombination frequencies of host-adaptability genes, thereby facilitating evasion of the host immune system and colonization of new hosts.

    B. grahamii displays a stronger geographic pattern and higher recombination frequencies than the cat-associated B. henselae, probably caused by different lifestyles and/or population sizes of the hosts. The genomic diversity of B. grahamii is markedly lower in Europe and North America than in Asia, possibly an effect of reduced host variability in these areas following the latest ice age.

    List of papers
    1. Run-off replication of host-adaptability genes is associated with gene transfer agents in the genome of mouse-infecting Bartonella grahamii
    2009 (English). In: PLoS Genetics, ISSN 1553-7404, Vol. 5, no 7, e1000546. Article in journal (Refereed), Published
    Abstract [en]

    The genus Bartonella comprises facultative intracellular bacteria adapted to mammals, including previously recognized and emerging human pathogens. We report the 2,341,328 bp genome sequence of Bartonella grahamii, one of the most prevalent Bartonella species in wild rodents. Comparative genomics revealed that rodent-associated Bartonella species have higher copy numbers of genes for putative host-adaptability factors than the related human-specific pathogens. Many of these gene clusters are located in a highly dynamic region of 461 kb. Using hybridization to a microarray designed for the B. grahamii genome, we observed a massive, putatively phage-derived run-off replication of this region. We also identified a novel gene transfer agent, which packages the bacterial genome, with an over-representation of the amplified DNA, in 14 kb pieces. This is the first observation associating the products of run-off replication with a gene transfer agent. Because of the high concentration of gene clusters for host-adaptation proteins in the amplified region, and since the genes encoding the gene transfer agent and the phage origin are well conserved in Bartonella, we hypothesize that these systems are driven by selection. We propose that the coupling of run-off replication with gene transfer agents promotes diversification and rapid spread of host-adaptability factors, facilitating host shifts in Bartonella.

    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-108371 (URN)10.1371/journal.pgen.1000546 (DOI)000269219500042 ()19578403 (PubMedID)
    Available from: 2009-09-17 Created: 2009-09-17 Last updated: 2010-07-09Bibliographically approved
    2. Genome dynamics of Bartonella grahamii in micro-populations of woodland rodents
    2010 (English). In: BMC Genomics, ISSN 1471-2164, Vol. 11, 152. Article in journal (Refereed), Published
    Abstract [en]

    Background: Rodents represent a high-risk reservoir for the emergence of new human pathogens. The recent completion of the 2.3 Mb genome of Bartonella grahamii, one of the most prevalent blood-borne bacteria in wild rodents, revealed a higher abundance of genes for host-cell interaction systems than in the genomes of closely related human pathogens. The sequence variability within the global B. grahamii population was recently investigated by multi locus sequence typing, but no study on the variability of putative host-cell interaction systems has been performed.

    Results: To study the population dynamics of B. grahamii, we analyzed the genomic diversity on a whole-genome scale of 27 B. grahamii strains isolated from four different species of wild rodents in three geographic locations separated by less than 30 km. Even using highly variable spacer regions, only 3 sequence types were identified. This low sequence diversity contrasted with a high variability in genome content. Microarray comparative genome hybridizations identified genes for outer surface proteins, including a repeated region containing the fha gene for filamentous hemagglutinin and a plasmid that encodes a type IV secretion system, as the most variable. The estimated generation times in liquid culture medium for a subset of strains ranged from 5 to 22 hours, but did not correlate with sequence type or presence/absence patterns of the fha gene or the plasmid.

    Conclusion: Our study has revealed a geographic microstructure of B. grahamii in wild rodents. Despite near-identity in nucleotide sequence, major differences were observed in gene presence/absence patterns that did not segregate with host species. This suggests that genetically similar strains can infect a range of different hosts.

    National Category
    Biological Sciences
    Identifiers
    urn:nbn:se:uu:diva-108379 (URN)10.1186/1471-2164-11-152 (DOI)000276363100003 ()
    Available from: 2009-09-23 Created: 2009-09-17 Last updated: 2011-01-05Bibliographically approved
    3. Diversification by recombination in Bartonella grahamii from wild rodents in Asia contrasts with a clonal population structure in Northern Europe and America
    (English)Manuscript (preprint) (Other academic)
    Identifiers
    urn:nbn:se:uu:diva-108384 (URN)
    Available from: 2009-09-24 Created: 2009-09-17 Last updated: 2010-01-14
    4. Evolution of Host Adaptation Systems in the Mammalian Blood Specialist Bartonella
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    Bacteria of the genus Bartonella are facultative intracellular bacteria infecting the red blood cells of mammals. Bartonella isolates have now been reported from a wide range of mammalian host species, including humans, domestic animals such as pets and livestock, as well as many wild animals such as deer, moose, kangaroo, and whales. Here, we present the first major genus-wide investigation of host-adaptation systems in Bartonella, using 5 published and 5 draft genome sequences. The sampling includes both clinical and natural isolates, and represents well the major phylogenetic diversity of the genus. Our study reveals four distinct protein families of Type V Secretion Systems (T5SS) shared by all sequenced members of the genus. We also show that a recently identified gene transfer agent (GTA) consisting of a defective phage is, surprisingly, the most conserved gene cluster among all Bartonella-specific or imported genes, strongly emphasizing the functional importance of this system for the life-style and evolution of Bartonella.

    Keyword
    host adaptation, pathogen, secretion systems, flagella, gene transfer agent, evolution
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Evolutionary Genetics
    Identifiers
    urn:nbn:se:uu:diva-107784 (URN)
    Available from: 2009-08-26 Created: 2009-08-26 Last updated: 2010-01-14
    5. Low-coverage pyrosequencing reveals recombination and run-off replication in Bartonella henselae strains
    (English)Manuscript (preprint) (Other academic)
    Abstract [en]

    Bartonella henselae is a natural intracellular colonizer of cats, and is transferred by blood-sucking insect vectors. It is also an opportunistic human pathogen. Two strains of B. henselae, thought to be representative of the diversity of the species, were selected for low-coverage 454 sequencing. The comparison of these two strains to the published Houston-1 reveals very high nucleotide identity and low substitution and recombination, with the remarkable exception of phages and host-interaction genes such as type IV and V secretion systems. Among the few variable genes of unknown function, BH14680, an alpha-Proteobacteria-specific gene, shows faster evolution in Bartonella compared to other alpha-Proteobacteria. Its 5’ end, which is likely coding for a domain exposed extracellularly, is under positive or very relaxed selection, and might be involved in host-interaction processes. Finally, we show that a simple genome coverage analysis reveals major genomic events such as duplications and unusual replication modes, such as the run-off replication. The latter, combined with a gene transfer agent, is thought to be a novel way to increase substitution and recombination frequencies. An extensive analysis of all bacterial pyrosequencing projects showed that it is probably Bartonella-specific.

    Keyword
    pathogen, recombination, run-off replication, phage, gene transfer agent, pyrosequencing, evolution
    National Category
    Bioinformatics and Systems Biology
    Research subject
    Evolutionary Genetics
    Identifiers
    urn:nbn:se:uu:diva-107785 (URN)
    Available from: 2009-08-27 Created: 2009-08-26 Last updated: 2010-01-14
  • 15.
    Besnier, Francois
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Development of Variance Component Methods for Genetic Dissection of Complex Traits. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis presents several developments of the Variance Component (VC) approach to Quantitative Trait Locus (QTL) mapping.

    The first part consists of methodological improvements: a new fast and efficient method for estimating IBD matrices has been developed. The new method makes better use of computer resources in terms of computational power and storage memory, facilitating further improvements by resolving methodological bottlenecks in algorithms to scan multiple QTL. A new VC model has also been developed in order to consider and evaluate the correlation of the allelic effects within the parental line of origin in experimental outbred crosses. The method was tested on simulated and experimental data and showed power to detect QTL that was higher than or similar to that of linear-regression-based QTL mapping.

    The second part focuses on the prospect of analyzing multi-generational pedigrees with the VC approach. The IBD estimation algorithm was extended to include haplotype information in addition to genotype and pedigree to improve the accuracy of the IBD estimates, and a new haplotyping algorithm was developed to limit the risk of haplotyping errors in multigenerational pedigrees. These newly developed methods were subsequently applied to the analysis of a nine-generation AIL pedigree obtained by crossing two chicken lines divergently selected for body weight. Nine QTL described in an F2 population were replicated in the AIL pedigree, and our strategy of using both genotype and phenotype information from all individuals in the entire pedigree clearly made efficient use of the available genotype information provided in the AIL.

    List of papers
    1. An Improved Method for Quantitative Trait Loci Detection and Identification of Within-Line Segregation in F2 Intercross Designs
    2008 (English). In: Genetics, ISSN 0016-6731, E-ISSN 1943-2631, Vol. 178, no 4, p. 2315-2326. Article in journal (Refereed), Published
    Abstract [en]

    We present a new flexible, simple, and powerful genome-scan method (flexible intercross analysis, FIA) for detecting quantitative trait loci (QTL) in experimental line crosses. The method is based on a pure random-effects model that simultaneously models between- and within-line QTL variation for single as well as epistatic QTL. It utilizes the score statistic and thereby facilitates computationally efficient significance testing based on empirical significance thresholds obtained by means of permutations. The properties of the method are explored using simulations and analyses of experimental data. The simulations showed that the power of FIA was as good as, or better than, Haley–Knott regression and that FIA was rather insensitive to the level of allelic fixation in the founders, especially for pedigrees with few founders. A chromosome scan was conducted for a meat quality trait in an F2 intercross in pigs where a mutation in the halothane (Ryanodine receptor, RYR1) gene with a large effect on meat quality was known to segregate in one founder line. FIA obtained significant support for the halothane-associated QTL and identified the base generation allele with the mutated allele. A genome scan was also performed in a previously analyzed chicken F2 intercross. In the chicken intercross analysis, four previously detected QTL were confirmed at a 5% genomewide significance level, and FIA gave strong evidence (P < 0.01) for two of these QTL to be segregating within the founder lines. FIA was also extended to account for epistasis, and using simulations we show that the method provides good estimates of epistatic QTL variance even for segregating QTL. Extensions of FIA and its applications on other intercross populations including backcrosses, advanced intercross lines, and heterogeneous stocks are also discussed.
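
    The idea of an empirical significance threshold obtained by permutation can be sketched as follows. This is a generic illustration, not the FIA implementation: the toy statistic `mean_difference` stands in for the score statistic, and the data, function names and parameter values are hypothetical.

```python
import random

def mean_difference(phenotypes, genotypes):
    """Toy statistic: absolute difference in mean phenotype between genotype groups 0 and 1."""
    group1 = [p for p, g in zip(phenotypes, genotypes) if g == 1]
    group0 = [p for p, g in zip(phenotypes, genotypes) if g == 0]
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))

def permutation_threshold(phenotypes, loci, statistic, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold: the (1 - alpha) quantile of the
    maximum statistic over all loci, taken across phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any phenotype-genotype association
        maxima.append(max(statistic(shuffled, genotypes) for genotypes in loci))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]

# Hypothetical toy data: 8 individuals, 3 loci coded 0/1.
phenotypes = [4.1, 5.0, 3.8, 6.2, 5.5, 4.7, 6.0, 3.9]
loci = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
]
threshold = permutation_threshold(phenotypes, loci, mean_difference, n_perm=200)
observed = [mean_difference(phenotypes, g) for g in loci]
significant = [stat > threshold for stat in observed]
```

    Taking the maximum over loci in each permutation is what makes the threshold genome-wide: it controls the chance that any locus exceeds it under the null hypothesis of no association.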

    National Category
    Genetics
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101358 (URN); 10.1534/genetics.107.083162 (DOI); 000255239600039 (); 18430952 (PubMedID)
    Available from: 2009-05-06; Created: 2009-04-23; Last updated: 2016-05-25; Bibliographically approved
    2. Fine mapping and replication of QTL in outbred chicken advanced intercross lines
    2011 (English). In: Genetics Selection Evolution, ISSN 0999-193X, E-ISSN 1297-9686, Vol. 43, 3. Article in journal (Refereed), Published
    Abstract [en]

    BACKGROUND: Linkage mapping is used to identify genomic regions affecting the expression of complex traits. However, when experimental crosses such as F2 populations or backcrosses are used to map regions containing a Quantitative Trait Locus (QTL), the size of the regions identified remains quite large, i.e. 10 or more Mb. Thus, other experimental strategies are needed to refine the QTL locations. Advanced Intercross Lines (AIL) are produced by repeated intercrossing of F2 animals and successive generations, which decrease linkage disequilibrium in a controlled manner. Although this approach is seen as promising, both to replicate QTL analyses and fine-map QTL, only a few AIL datasets, all originating from inbred founders, have been reported in the literature.

    METHODS: We have produced a nine-generation AIL pedigree (n = 1529) from two outbred chicken lines divergently selected for body weight at eight weeks of age. All animals were weighed at eight weeks of age and genotyped for SNP located in nine genomic regions where significant or suggestive QTL had previously been detected in the F2 population. In parallel, we have developed a novel strategy to analyse the data that uses both genotype and pedigree information of all AIL individuals to replicate the detection of and fine-map QTL affecting juvenile body weight.

    RESULTS: Five of the nine QTL detected with the original F2 population were confirmed and fine-mapped with the AIL, while for the remaining four, only suggestive evidence of their existence was obtained. All original QTL were confirmed as a single locus, except for one, which split into two linked QTL.

    CONCLUSIONS: Our results indicate that many of the QTL, which are genome-wide significant or suggestive in the analyses of large intercross populations, are true effects that can be replicated and fine-mapped using AIL. Key factors for success are the use of large populations and powerful statistical tools. Moreover, we believe that the statistical methods we have developed to efficiently study outbred AIL populations will increase the number of organisms for which in-depth complex traits can be analyzed.

     

    National Category
    Genetics
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101398 (URN); 10.1186/1297-9686-43-3 (DOI); 000287133300001 (); 21241486 (PubMedID)
    Available from: 2009-04-24; Created: 2009-04-24; Last updated: 2016-05-18
    3.
    The record could not be found. The reason may be that the record is no longer available or you may have typed in a wrong id in the address field.
    4. A genetic algorithm based method for stringent haplotyping of family data
    2009 (English). In: BMC Genetics, ISSN 1471-2156, E-ISSN 1471-2156, Vol. 10, 57. Article in journal (Refereed), Published
    Abstract [en]

    Background: The linkage phase, or haplotype, is an extra level of information that in addition to genotype and pedigree can be useful for reconstructing the inheritance pattern of the alleles in a pedigree, and computing for example Identity By Descent probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to only use haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases.

    Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors.

    Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with a very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications as it is able to fulfil different requirements regarding the tolerance for haplotype errors, or uncertain marker-phases.

    National Category
    Bioinformatics and Systems Biology
    Research subject
    Genetics
    Identifiers
    urn:nbn:se:uu:diva-101397 (URN); 10.1186/1471-2156-10-57 (DOI); 000270360900001 (); 19761594 (PubMedID)
    Note

    Manuscript title in list of papers in thesis: A genetic algorithm based haplotyping method provides better control on haplotype error rate

    Available from: 2009-04-24; Created: 2009-04-24; Last updated: 2016-05-27; Bibliographically approved
  • 16.
    Besnier, Francois
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    Carlborg, Örjan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    A genetic algorithm based method for stringent haplotyping of family data. 2009. In: BMC Genetics, ISSN 1471-2156, E-ISSN 1471-2156, Vol. 10, 57. Article in journal (Refereed)
    Abstract [en]

    Background: The linkage phase, or haplotype, is an extra level of information that in addition to genotype and pedigree can be useful for reconstructing the inheritance pattern of the alleles in a pedigree, and computing for example Identity By Descent probabilities. If a haplotype is provided, the precision of estimated IBD probabilities increases, as long as the haplotype is estimated without errors. It is therefore important to only use haplotypes that are strongly supported by the available data for IBD estimation, to avoid introducing new errors due to erroneous linkage phases.

    Results: We propose a genetic algorithm based method for haplotype estimation in family data that includes a stringency parameter. This allows the user to decide the error tolerance level when inferring parental origin of the alleles. This is a novel feature compared to existing methods for haplotype estimation. We show that using a high stringency produces haplotype data with few errors, whereas a low stringency provides haplotype estimates in most situations, but with an increased number of errors.

    Conclusion: By including a stringency criterion in our haplotyping method, the user is able to maintain the error rate at a suitable level for the particular study; one can select anything from haplotyped data with a very small proportion of errors and a higher proportion of non-inferred haplotypes, to data with phase estimates for every marker, when haplotype errors are tolerable. Giving this choice makes the method more flexible and useful in a wide range of applications as it is able to fulfil different requirements regarding the tolerance for haplotype errors, or uncertain marker-phases.

  • 17.
    Bornelöv, Susanne
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Rule-based Models of Transcriptional Regulation and Complex Diseases: Applications and Development. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    As we gain increased understanding of genetic disorders and gene regulation, more focus has turned towards complex interactions. Combinations of genes or gene and environmental factors have been suggested to explain the missing heritability behind complex diseases. Furthermore, gene activation and splicing seem to be governed by a complex machinery of histone modification (HM), transcription factor (TF), and DNA sequence signals. This thesis aimed to apply and develop multivariate machine learning methods for use on such biological problems. Monte Carlo feature selection was combined with rule-based classification to identify interactions between HMs and to study the interplay of factors with importance for asthma and allergy.

    Firstly, publicly available ChIP-seq data (Paper I) for 38 HMs was studied. We trained a classifier for predicting exon inclusion levels based on the HM signals. We identified HMs important for splicing and illustrated that splicing could be predicted from the HM patterns. Next, we applied a similar methodology on data from two large birth cohorts describing asthma and allergy in children (Paper II). We identified genetic and environmental factors with importance for allergic diseases, which confirmed earlier results, and found candidate gene-gene and gene-environment interactions.

    In order to interpret and present the classifiers we developed Ciruvis, a web-based tool for network visualization of classification rules (Paper III). We applied Ciruvis on classifiers trained on both simulated and real data and compared our tool to another methodology for interaction detection using classification. Finally, we continued the earlier study on epigenetics by analyzing HM and TF signals in genes with or without evidence of bidirectional transcription (Paper IV). We identified several HMs and TFs with different signals between unidirectional and bidirectional genes. Among these, the CTCF TF was shown to have a well-positioned peak 60-80 bp upstream of the transcription start site in unidirectional genes.

    List of papers
    1. Combinations of histone modifications mark exon inclusion levels
    2012 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 7, no 1, e29911. Article in journal (Refereed), Published
    Abstract [en]

    Splicing is a complex process regulated by sequence at the classical splice sites and other motifs in exons and introns with an enhancing or silencing effect. In addition, specific histone modifications on nucleosomes positioned over the exons have been shown to correlate both positively and negatively with exon expression. Here, we trained a model of "IF … THEN …" rules to predict exon inclusion levels in a transcript from histone modification patterns. Furthermore, we showed that combinations of histone modifications, in particular those residing on nucleosomes preceding or succeeding the exon, are better predictors of exon inclusion levels than single modifications. The resulting model was evaluated with cross validation and had an average accuracy of 72% for 27% of the exons, which demonstrates that epigenetic signals substantially mark alternative splicing.
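
    To make the "IF … THEN …" rule format concrete, here is a minimal Python sketch of how a set of such rules could be applied to one exon. The feature names, levels, support weights, and the weighted-voting scheme are all placeholders invented for this example; they are not rules or results from the paper, nor the authors' software.

```python
from collections import Counter


def matches(conditions, sample):
    """True if every feature named in the IF part has the required value."""
    return all(sample.get(feature) == value for feature, value in conditions.items())


def classify(rules, sample, default=None):
    """Let every matching rule vote for its THEN part, weighted by its support."""
    votes = Counter()
    for conditions, decision, support in rules:
        if matches(conditions, sample):
            votes[decision] += support
    if not votes:
        return default  # no rule fired, so the sample is left unclassified
    return votes.most_common(1)[0][0]


# Hypothetical rules: (IF conditions, THEN decision, support weight).
EXAMPLE_RULES = [
    ({"hm_a_preceding": "high", "hm_b_on_exon": "low"}, "included", 30),
    ({"hm_b_on_exon": "high"}, "excluded", 12),
    ({"hm_a_preceding": "high"}, "excluded", 5),
]
```

    Leaving samples unclassified when no rule fires is one way a model can report accuracy for only a subset of exons, as the abstract does.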

    National Category
    Cell and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-175875 (URN); 10.1371/journal.pone.0029911 (DOI); 000312662100045 (); 22242188 (PubMedID)
    Funder
    Knut and Alice Wallenberg Foundation; Swedish Foundation for Strategic Research; Swedish Research Council; Swedish Cancer Society
    Available from: 2012-06-13; Created: 2012-06-13; Last updated: 2017-02-02; Bibliographically approved
    2. Rule-Based Models of the Interplay between Genetic and Environmental Factors in Childhood Allergy
    2013 (English). In: PLoS ONE, ISSN 1932-6203, Vol. 8, no 11, e80080. Article in journal (Refereed), Published
    Abstract [en]

    Both genetic and environmental factors are important for the development of allergic diseases. However, a detailed understanding of how such factors act together is lacking. To elucidate the interplay between genetic and environmental factors in allergic diseases, we used a novel bioinformatics approach that combines feature selection and machine learning. In two materials, PARSIFAL (a European cross-sectional study of 3113 children) and BAMSE (a Swedish birth-cohort including 2033 children), genetic variants as well as environmental and lifestyle factors were evaluated for their contribution to allergic phenotypes. Monte Carlo feature selection and rule-based models were used to identify and rank rules describing how combinations of genetic and environmental factors affect the risk of allergic diseases. Novel interactions between genes were suggested and replicated, such as between ORMDL3 and RORA, where certain genotype combinations gave odds ratios for current asthma of 2.1 (95% CI 1.2-3.6) and 3.2 (95% CI 2.0-5.0) in the BAMSE and PARSIFAL children, respectively. Several combinations of environmental factors appeared to be important for the development of allergic disease in children. For example, use of baby formula and antibiotics early in life was associated with an odds ratio of 7.4 (95% CI 4.5-12.0) of developing asthma. Furthermore, genetic variants together with environmental factors seemed to play a role for allergic diseases, such as the use of antibiotics early in life and COL29A1 variants for asthma, and farm living and NPSR1 variants for allergic eczema. Overall, combinations of environmental and lifestyle factors appeared more frequently in the models than combinations solely involving genes. In conclusion, a new bioinformatics approach is described for analyzing complex data, including extensive genetic and environmental information. Interactions identified with this approach could provide useful hints for further in-depth studies of etiological mechanisms and may also strengthen the basis for risk assessment and prevention.
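
    The abstract reports odds ratios with 95% confidence intervals but does not state how the intervals were computed. One common choice, assumed here purely for illustration, is the Wald interval on the log odds ratio from a 2x2 table. The function name and argument names in this Python sketch are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls,
                  unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 corresponds to a 95% interval. All four cells must be positive,
    otherwise the log odds ratio or its standard error is undefined.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

    With invented counts of 20 exposed cases, 10 exposed controls, 10 unexposed cases, and 20 unexposed controls, odds_ratio_ci(20, 10, 10, 20) gives an odds ratio of 4.0 with an interval that excludes 1.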

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-213817 (URN); 10.1371/journal.pone.0080080 (DOI); 000327311900057 ()
    Available from: 2014-01-05; Created: 2014-01-04; Last updated: 2015-01-22; Bibliographically approved
    3. Ciruvis: a web-based tool for rule networks and interaction detection using rule-based classifiers
    2014 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 15, 139. Article in journal (Refereed), Published
    Abstract [en]

    Background: The use of classification algorithms is becoming increasingly important for the field of computational biology. However, not only the quality of the classification, but also its biological interpretation is important. This interpretation may be eased if interacting elements can be identified and visualized, something that requires appropriate tools and methods. Results: We developed a new approach to detecting interactions in complex systems based on classification. Using rule-based classifiers, we previously proposed a rule network visualization strategy that may be applied as a heuristic for finding interactions. We now complement this work with Ciruvis, a web-based tool for the construction of rule networks from classifiers made of IF-THEN rules. Simulated and biological data served as an illustration of how the tool may be used to visualize and interpret classifiers. Furthermore, we used the rule networks to identify feature interactions, compared them to alternative methods, and computationally validated the findings. Conclusions: Rule networks enable a fast method for model visualization and provide an exploratory heuristic to interaction detection. The tool is made freely available on the web and may thus be used to aid and improve rule-based classification.

    Keyword
    Visualization, Rules, Interactions, Interaction detection, Classification, Rule-based classification
    National Category
    Biochemistry and Molecular Biology
    Identifiers
    urn:nbn:se:uu:diva-228027 (URN); 10.1186/1471-2105-15-139 (DOI); 000336679600001 ()
    Available from: 2014-07-02; Created: 2014-07-02; Last updated: 2015-01-22; Bibliographically approved
    4. Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription
    2015 (English). In: BMC Genomics, ISSN 1471-2164, Vol. 16, 300. Article in journal (Refereed), Published
    Abstract [en]

    Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

    Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

    Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.

    Keyword
    Antisense transcription, CTCF, RAD21, Cohesin, CAGE, Epigenetics, Transcription factor, Histone modification
    National Category
    Bioinformatics and Systems Biology
    Identifiers
    urn:nbn:se:uu:diva-230158 (URN); 10.1186/s12864-015-1485-5 (DOI); 000355166000001 (); 25881024 (PubMedID)
    Available from: 2014-08-19; Created: 2014-08-19; Last updated: 2015-06-26; Bibliographically approved
  • 18.
    Bornelöv, Susanne
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Wadelius, Claes
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology.
    Different distribution of histone modifications in genes with unidirectional and bidirectional transcription and a role of CTCF and cohesin in directing transcription. 2015. In: BMC Genomics, ISSN 1471-2164, Vol. 16, 300. Article in journal (Refereed)
    Abstract [en]

    Background: Several post-translational histone modifications are mainly found in gene promoters and are associated with the promoter activity. It has been hypothesized that histone modifications regulate the transcription, as opposed to the traditional view with transcription factors as the key regulators. Promoters of most active genes do not only initiate transcription of the coding sequence, but also a substantial amount of transcription of the antisense strand upstream of the transcription start site (TSS). This promoter feature has generally not been considered in previous studies of histone modifications and transcription factor binding.

    Results: We annotated protein-coding genes as bi- or unidirectional depending on their mode of transcription and compared histone modifications and transcription factor occurrences between them. We found that H3K4me3, H3K9ac, and H3K27ac were significantly more enriched upstream of the TSS in bidirectional genes compared with the unidirectional ones. In contrast, the downstream histone modification signals were similar, suggesting that the upstream histone modifications might be a consequence of transcription rather than a cause. Notably, we found well-positioned CTCF and RAD21 peaks approximately 60-80 bp upstream of the TSS in the unidirectional genes. The peak heights were related to the amount of antisense transcription and we hypothesized that CTCF and cohesin act as a barrier against antisense transcription.

    Conclusions: Our results provide insights into the distribution of histone modifications at promoters and suggest a novel role of CTCF and cohesin as regulators of transcriptional direction.

  • 19.
    Carlsson, Lars
    et al.
    Safety Assessment, AstraZeneca Research & Development.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Adams, Samuel
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Glen, Robert
    Unilever Centre for Molecular Sciences Informatics, University Chemical Laboratory, Cambridge.
    Boyer, Scott
    Safety Assessment, AstraZeneca Research & Development.
    Use of Historic Metabolic Biotransformation Data as a Means of Anticipating Metabolic Sites Using MetaPrint2D and Bioclipse. 2010. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 362. Article in journal (Refereed)
    Abstract [en]

    Background: Predicting metabolic sites is important in the drug discovery process to aid in rapid compound optimisation. No interactive tool exists and most of the useful tools are quite expensive.

    Results: Here a fast and reliable method to analyse ligands and visualise potential metabolic sites is presented which is based on annotated metabolic data, described by circular fingerprints. The method is available via the graphical workbench Bioclipse, which is equipped with advanced features in cheminformatics.

    Conclusions: Due to the speed of predictions (less than 50 ms per molecule), scientists can get real time decision support when editing chemical structures. Bioclipse is a rich client, which means that all calculations are performed on the local computer and do not require network connection. Bioclipse and MetaPrint2D are free for all users, released under open source licenses, and available from http://www.bioclipse.net.

  • 20.
    Carlsson, Lars
    et al.
    AstraZeneca R&D.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    Boyer, Scott
    AstraZeneca R&D.
    Model building in Bioclipse Decision Support applied to open datasets. 2012. In: Toxicology Letters, ISSN 0378-4274, E-ISSN 1879-3169, Vol. 211, no Suppl., S62. Article in journal (Refereed)
    Abstract [en]

    Bioclipse Decision Support (DS) is a system capable of building predictive models of any collection of SAR data, and making them available in a simple user interface based on Bioclipse (www.bioclipse.net).

    The method is fast and uses Faulon Signatures as chemical descriptors together with a Support Vector Machine algorithm for QSAR model building. A key feature is the capability to visualize and interpret results by highlighting the substructures which contributed most to the prediction. This, together with very fast predictions, allows for editing chemical structures with instantly updated results.

    We here present the results from applying Bioclipse Decision Support to several open QSAR data sets, including endpoints from OpenTox and PubChem. The results show how to extract data from the sources and to build models which can be integrated with user specific models.

  • 21. Claesson, Alf
    et al.
    Spjuth, Ola
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    On Mechanisms of Reactive Metabolite Formation from Drugs. 2013. In: Mini-Reviews in Medicinal Chemistry, ISSN 1389-5575, Vol. 13, no 5, 720-729 p. Article in journal (Refereed)
    Abstract [en]

    Idiosyncratic adverse drug reactions (IADRs) cause a broad range of clinically severe conditions of which drug induced liver injury (DILI) in particular is one of the most frequent causes of safety-related drug withdrawals. The underlying cause is almost invariably formation of reactive metabolites (RM) which by attacking macromolecules induce organ injuries. Attempts are being made in the pharmaceutical industry to lower the risk of selecting unfit compounds as clinical candidates. Approaches vary but do not seem to be overly successful at the initial design/synthesis stage. We review here the most frequent categories of mechanisms for RM formation and propose that many cases of RMs encountered within early ADME screening can be foreseen by applying chemical and metabolic knowledge. We also mention a web tool, SpotRM, which can be used for efficient look-up and learning about drugs that have recognized IADRs likely caused by RM formation.

  • 22. Cvijovic, Marija
    et al.
    Almquist, Joachim
    Hagmar, Jonas
    Hohmann, Stefan
    Kaltenbach, Hans-Michael
    Klipp, Edda
    Krantz, Marcus
    Mendes, Pedro
    Nelander, Sven
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Cancer and Vascular Biology.
    Nielsen, Jens
    Pagnani, Andrea
    Przulj, Natasa
    Raue, Andreas
    Stelling, Joerg
    Stoma, Szymon
    Tobin, Frank
    Wodke, Judith A. H.
    Zecchina, Riccardo
    Jirstrand, Mats
    Bridging the gaps in systems biology. 2014. In: Molecular Genetics and Genomics, ISSN 1617-4615, E-ISSN 1617-4623, Vol. 289, no 5, 727-734 p. Article, review/survey (Refereed)
    Abstract [en]

    Systems biology aims at creating mathematical models, i.e., computational reconstructions of biological systems and processes that will result in a new level of understanding: the elucidation of the basic and presumably conserved "design" and "engineering" principles of biomolecular systems. Thus, systems biology will move biology from a phenomenological to a predictive science. Mathematical modeling of biological networks and processes has already greatly improved our understanding of many cellular processes. However, given the massive amount of qualitative and quantitative data currently produced and the number of burning questions in health care and biotechnology that need to be solved, the field is still in its early phases. It requires novel approaches for abstraction, for modeling bioprocesses that follow different biochemical and biophysical rules, and for combining different modules into larger models that still allow realistic simulation with the computational power available today. We have identified and discussed the currently most prominent problems in systems biology: (1) how to bridge different scales of modeling abstraction, (2) how to bridge the gap between topological and mechanistic modeling, and (3) how to bridge the wet and dry laboratory gap. The future success of systems biology largely depends on bridging the recognized gaps.

  • 23. Dramiński, Michał
    et al.
    Dąbrowski, Michał J.
    Diamanti, Klev
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Koronacki, Jacek
    Komorowski, Jan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational Biology and Bioinformatics.
    Discovering Networks of Interdependent Features in High-Dimensional Problems. 2016. In: Big Data Analysis: New Algorithms for a New Society / [ed] Japkowicz, Nathalie; Stefanowski, Jerzy, Cham: Springer, 2016, 285-304 p. Chapter in book (Refereed)
    Abstract [en]

    The availability of very large data sets in Life Sciences provided earlier by the technological breakthroughs such as microarrays and more recently by various forms of sequencing has created both challenges in analyzing these data as well as new opportunities. A promising, yet underdeveloped approach to Big Data, not limited to Life Sciences, is the use of feature selection and classification to discover interdependent features. Traditionally, classifiers have been developed for the best quality of supervised classification. In our experience, more often than not, rather than obtaining the best possible supervised classifier, the Life Scientist needs to know which features contribute best to classifying observations (objects, samples) into distinct classes and what the interdependencies are between the features that describe the observations. Our underlying hypothesis is that the interdependent features and rule networks do not only reflect some syntactical properties of the data and classifiers but also may convey meaningful clues about true interactions in the modeled biological system. In this chapter we develop further our method of Monte Carlo Feature Selection and Interdependency Discovery (MCFS and MCFS-ID, respectively), which are particularly well suited for high-dimensional problems, i.e., those where each observation is described by very many features, often many more features than the number of observations. Such problems are abundant in Life Science applications. Specifically, we define Inter-Dependency Graphs (termed, somewhat confusingly, ID Graphs) that are directed graphs of interactions between features extracted by aggregation of information from the classification trees constructed by the MCFS algorithm. We then proceed with modeling interactions on a finer level with rule networks. We discuss some of the properties of the ID graphs and make a first attempt at validating our hypothesis on a large gene expression data set for CD4+ T-cells. The MCFS-ID and ROSETTA including the Ciruvis approach offer a new methodology for analyzing Big Data from feature selection, through identification of feature interdependencies, to classification with rules according to decision classes, to construction of rule networks. Our preliminary results confirm that MCFS-ID is applicable to the identification of interacting features that are functionally relevant while rule networks offer a complementary picture with finer resolution of the interdependencies on the level of feature-value pairs.

  • 24. Dutilh, Bas E.
    et al.
    Snel, Berend
    Ettema, Thijs J.G.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution.
    Huynen, Martijn A.
    Signature genes as a phylogenomic tool. 2008. In: Molecular biology and evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no 8, 1659-1667 p. Article in journal (Refereed)
    Abstract [en]

    Gene content has been shown to contain a strong phylogenetic signal, yet its usage for phylogenetic questions is hampered by horizontal gene transfer and parallel gene loss and until now required completely sequenced genomes. Here, we introduce an approach that allows the phylogenetic signal in gene content to be applied to any set of sequences, using signature genes for phylogenetic classification. The hundreds of publicly available genomes allow us to identify signature genes at various taxonomic depths, and we show how the presence of signature genes in an unspecified sample can be used to characterize its taxonomic composition. We identify 8,362 signature genes specific for 112 prokaryotic taxa. We show that these signature genes can be used to address phylogenetic questions on the basis of gene content in cases where classic gene content or sequence analyses provide an ambiguous answer, such as for Nanoarchaeum equitans, and even in cases where complete genomes are not available, such as for metagenomics data. Cross-validation experiments leaving out up to 30% of the species show that approximately 92% of the signature genes correctly place the species in a related clade. Analyses of metagenomics data sets with the signature gene approach are in good agreement with the previously reported species distributions based on phylogenetic analysis of marker genes. Summarizing, signature genes can complement traditional sequence-based methods in addressing taxonomic questions.

  • 25.
    Dyrhage, Karl
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Bioinformatic Analysis of Genomic and Proteomic Data from Gemmata. 2016. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Members of the bacterial phylum Planctomycetes have been claimed to have a compartmentalised cell plan, with cell walls lacking peptidoglycan despite being free-living. These theories have been challenged in recent years, and the nature of the planctomycete cell structure is currently under debate. Yet it remains clear that the planctomycete membranes have unique properties, and are thus likely localisations of evolutional innovation. In this study, proteomes and genomes of four planctomycete species from the Gemmata/Tuwongella clade were investigated with the aim to find candidate genes for functional characterisation. Analysis based on full genome sequencing and mass spectrometry revealed 21 proteins unique to the Gemmata/Tuwongella clade that were present in the proteomes of all four species. The gene coding for one of these was found to be organised in an operon, containing an additional four clade-specific genes, likely related to type II secretion. A planctomycete-specific cell surface signal peptide previously not seen in Gemmata was identified in all four species, with proteins found to have the motif indicating that their cell surface has a strong negative charge. Lastly, the study has revealed evidence suggesting that the planctomycetes have a traditional gram-negative cell wall, contradicting the previously proposed proteinaceous cell wall model.

    The full text will be freely available from 2018-10-01 10:40
  • 26.
    Eklund, Martin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
    eScience Approaches to Model Selection and Assessment: Applications in Bioinformatics. 2009. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    High-throughput experimental methods, such as DNA and protein microarrays, have become ubiquitous and indispensable tools in biology and biomedicine, and the number of high-throughput technologies is constantly increasing. They provide the power to measure thousands of properties of a biological system in a single experiment and have the potential to revolutionize our understanding of biology and medicine. However, the high expectations of high-throughput methods are challenged by the problem of statistically modeling the wealth of data in order to translate it into concrete biological knowledge, new drugs, and clinical practices. In particular, the huge number of properties measured in high-throughput experiments makes statistical model selection and assessment exigent. To use high-throughput data in critical applications, it must be warranted that the models we construct reflect the underlying biology and are not just hypotheses suggested by the data. We must furthermore have a clear picture of the risk of making incorrect decisions based on the models.

    The rapid improvements of computers and information technology have opened up new ways of approaching the problem of model selection and assessment. Specifically, eScience, i.e. computationally intensive science that is carried out in distributed network environments, provides computational power and means to efficiently access previously acquired scientific knowledge. This thesis investigates how we can use eScience to improve our chances of constructing biologically relevant models from high-throughput data. Novel methods for model selection and assessment are proposed that leverage computational power and prior scientific information to "guide" the model selection to models that a priori are likely to be relevant. In addition, a software system for deploying new methods and making them easily accessible to end users is presented.

    List of papers
    1. The C1C2: a framework for simultaneous model selection and assessment
    2008 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 9, 360- p. Article in journal (Refereed). Published
    Abstract [en]

    BACKGROUND: There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on a partitioning of the data in order to separate model choice from model assessment in terms of used data. Since the number of conceivable models in general is vast, it was also of interest to investigate the employment of two automatic search methods, a genetic algorithm and a brute-force method, for model choice. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by employing repeated K-fold cross-validation for choosing and assessing a model. RESULTS: The C1C2 framework performed well at finding the true model in terms of choosing the correct variable subset and producing reasonable choices for the penalizing parameter, even in situations when the independent variables were highly correlated and when the number of observations was less than the number of variables. The C1C2 framework was also found to give accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of generalization error estimates. 
    Using the genetic algorithm worsened the model choice but not the generalization error estimates, compared to using the brute-force method. The results obtained with repeated K-fold cross-validation were similar to those produced by the C1C2 in terms of model choice; however, a lower accuracy of the generalization error estimates was observed. CONCLUSION: The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and for accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and model assumption deviations. A complete separation of the model choice and the model assessment in terms of data used for each task improves the estimates of the generalization error.
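
    The separation of model choice from model assessment described in this abstract can be illustrated with a small hold-out sketch. This is not the C1C2 procedure itself, whose details the abstract does not give; the simulated data, the one-variable penalized model, the penalty grid, and all function names are assumptions made for illustration only.

```python
import random

def ridge_slope(data, lam):
    """Penalized least-squares slope b for the model y ~ b*x (no intercept)."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def mean_squared_error(data, b):
    """Average squared prediction error of slope b on the given (x, y) pairs."""
    return sum((y - b * x) ** 2 for x, y in data) / len(data)

# Hypothetical simulated data: true slope 2, unit-variance noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(300)]
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

# Partition once: the assessment part is never used during model choice.
choice_part, assess_part = data[:200], data[200:]
fit_part, validate_part = choice_part[:100], choice_part[100:]

# Model choice: pick the penalty with the lowest validation error.
penalties = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(
    penalties,
    key=lambda lam: mean_squared_error(validate_part, ridge_slope(fit_part, lam)),
)

# Model assessment: refit on the whole choice part, score on untouched data.
final_slope = ridge_slope(choice_part, best_lam)
generalization_error = mean_squared_error(assess_part, final_slope)
print(best_lam, round(final_slope, 3), round(generalization_error, 3))
```

    Because the assessment part plays no role in choosing the penalty, the reported error is not biased by the search over penalties, which is the point the abstract's conclusion makes about separating the two tasks.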

    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-104211 (URN); 10.1186/1471-2105-9-360 (DOI); 000259742800001 (); 18761753 (PubMedID)
    Available from: 2009-05-27 Created: 2009-05-27 Last updated: 2015-05-04 Bibliographically approved
    2. An eScience-Bayes strategy for analyzing omics data
    2010 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 11, 282- p. Article in journal (Refereed). Published
    Abstract [en]

    Background: The omics fields promise to revolutionize our understanding of biology and biomedicine. However, their potential is compromised by the challenge of analyzing the huge datasets produced. Analysis of omics data is plagued by the curse of dimensionality, resulting in imprecise estimates of model parameters and performance. Moreover, the integration of omics data with other data sources is difficult to shoehorn into classical statistical models. This has resulted in ad hoc approaches to address specific problems. Results: We present a general approach to omics data analysis that alleviates these problems. By combining eScience and Bayesian methods, we retrieve scientific information and data from multiple sources and coherently incorporate them into large models. These models improve the accuracy of predictions and offer new insights into the underlying mechanisms. This "eScience-Bayes" approach is demonstrated in two proof-of-principle applications, one for breast cancer prognosis prediction from transcriptomic data and one for protein-protein interaction studies based on proteomic data. Conclusions: Bayesian statistics provide the flexibility to tailor statistical models to the complex data structures in omics biology as well as permitting coherent integration of multiple data sources. However, Bayesian methods are in general computationally demanding and require specification of possibly thousands of prior distributions. eScience can help us overcome these difficulties. The eScience-Bayes approach thus permits us to fully leverage the advantages of Bayesian methods, resulting in models with improved predictive performance that give more information about the underlying biological system.

    Place, publisher, year, edition, pages
    BioMed Central, 2010
    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-109359 (URN); 10.1186/1471-2105-11-282 (DOI); 000279732900004 (); 20504364 (PubMedID)
    Available from: 2009-10-14 Created: 2009-10-14 Last updated: 2015-05-04 Bibliographically approved
    3. SimSel: a new simulation method for variable selection
    2012 (English). In: Journal of Statistical Computation and Simulation, ISSN 0094-9655, E-ISSN 1563-5163, Vol. 82, no 4, 515-527 p. Article in journal (Refereed). Published
    Abstract [en]

    We propose a new simulation method, SimSel, for variable selection in linear and nonlinear modelling problems. SimSel works by disturbing the input data with pseudo-errors. We then study how this disturbance affects the quality of an approximative model fitted to the data. The main idea is that disturbing unimportant variables does not affect the quality of the model fit. The use of an approximative model has the advantage that the true underlying function does not need to be known and that the method becomes insensitive to model misspecifications. We demonstrate SimSel on simulated data from linear and nonlinear models and on two real data sets. The simulation studies suggest that SimSel works well in complicated situations, such as nonlinear errors-in-variable models.
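
    The main idea stated in this abstract, that disturbing an unimportant variable leaves the quality of the fit unchanged, can be sketched as follows. This is a simplified illustration and not the published SimSel procedure: the linear approximating model, the noise level, the simulated data, and all function names are assumptions.

```python
import random

def fit_least_squares(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def residual_sum_of_squares(X, y):
    """Fit the linear model to (X, y) and return its residual sum of squares."""
    beta = fit_least_squares(X, y)
    return sum((t - beta[0] - sum(b * v for b, v in zip(beta[1:], r))) ** 2
               for r, t in zip(X, y))

def disturbance_effect(X, y, j, sigma, rng, repeats=20):
    """Mean ratio RSS(disturbed) / RSS(original) when pseudo-errors hit column j."""
    base = residual_sum_of_squares(X, y)
    total = 0.0
    for _ in range(repeats):
        disturbed = [list(r) for r in X]
        for r in disturbed:
            r[j] += rng.gauss(0.0, sigma)
        total += residual_sum_of_squares(disturbed, y) / base
    return total / repeats

# Hypothetical data: only column 0 influences the response.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [3.0 * r[0] + rng.gauss(0, 0.5) for r in X]
effects = [disturbance_effect(X, y, j, sigma=1.0, rng=rng) for j in range(2)]
print([round(e, 2) for e in effects])
```

    In this toy setting the ratio for the relevant column rises well above 1, while the ratio for the irrelevant column stays close to 1, which is the contrast a selection rule could be built on.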

    Keyword
    variable selection, simulation method, pseudo-error, pseudo-variable
    National Category
    Computer and Information Science
    Identifiers
    urn:nbn:se:uu:diva-109360 (URN); 10.1080/00949655.2010.543981 (DOI); 000303234800003 ()
    Available from: 2009-10-14 Created: 2009-10-14 Last updated: 2012-07-26 Bibliographically approved
    4. Ridge-SimSel: A generalization of the variable selection method SimSel to multicollinear data sets
    (English). Article in journal (Refereed). Submitted
    National Category
    Probability Theory and Statistics Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-109361 (URN)
    Available from: 2009-10-14 Created: 2009-10-14 Last updated: 2012-07-26 Bibliographically approved
    5. Bioclipse: an open source workbench for chemo- and bioinformatics
    2007 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 59- p. Article in journal (Refereed). Published
    Abstract [en]

    BACKGROUND: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework. RESULTS: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction. CONCLUSION: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available at http://www.bioclipse.net.

    National Category
    Pharmaceutical Sciences
    Identifiers
    urn:nbn:se:uu:diva-104257 (URN); 10.1186/1471-2105-8-59 (DOI); 000244600100001 (); 17316423 (PubMedID)
    Available from: 2009-05-28 Created: 2009-05-28 Last updated: 2015-09-11 Bibliographically approved
  • 27.
    Emanuelsson, O., von Heijne, G., Elofsson, A., Cristóbal, S.
    Uppsala University, Disciplinary Domain of Science and Technology, Faculty of Science and Technology, Biology, Department of Cell and Molecular Biology. Uppsala University, Disciplinary Domain of Science and Technology, Faculty of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Cell Biology.
    In silico prediction of the peroxisomal proteome in fungi, plants and animals. 2003. In: J. Mol. Biol., Vol. 330, 443-456 p. Article in journal (Refereed)
    Abstract [en]

    In an attempt to improve our abilities to predict peroxisomal proteins, we have combined machine-learning techniques for analyzing peroxisomal targeting signals (PTS1) with domain-based cross-species comparisons between eight eukaryotic genomes. Our resul

  • 28.
    Enroth, Stefan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
    The Nucleosome as a Signal Carrying Unit: From Experimental Data to Combinatorial Models of Transcriptional Control. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The human genome consists of over 3 billion nucleotides and would be around 2 meters long if uncoiled and laid out. Each human somatic cell contains all this in its nucleus, which is only around 5 µm across. This extreme compaction is largely achieved by wrapping the DNA around a histone octamer, the nucleosome. Still, the DNA is accessible to the transcriptional machinery, and this regulation is highly dynamic and changes rapidly with, e.g., exposure to drugs. The individual histone proteins can carry specific modifications such as methylations and acetylations. These modifications are a major part of the epigenetic status of the DNA, which contributes significantly to the transcriptional status of a gene: certain modifications repress transcription and others are necessary for transcription to occur. Specific histone methylations and acetylations have also been implicated in more detailed regulation such as inclusion/exclusion of individual exons, i.e. splicing. Thus, the nucleosome is involved in chromatin remodeling and transcriptional regulation, both directly through steric hindrance and as a signaling platform via the epigenetic modifications.

    In this work, we have developed tools for storage (Paper I) and normalization (Paper II) of next generation sequencing data in general, and analyzed nucleosome locations and histone modification in particular (Paper I, III and IV). The computational tools developed allowed us as one of the first groups to discover well positioned nucleosomes over internal exons in such wide spread organisms as worm, mouse and human. We have also provided biological insight into how the epigenetic histone modifications can control exon expression in a combinatorial way. This was achieved by applying a Monte Carlo feature selection system in combination with rule based modeling of exon expression. The constructed model was validated on data generated in three additional cell types suggesting a general mechanism.

     

    List of papers
    1. SICTIN: Rapid footprinting of massively parallel sequencing data
    2010 (English). In: BioData Mining, ISSN 1756-0381, E-ISSN 1756-0381, Vol. 3, 4. Article in journal (Refereed). Published
    Abstract [en]

    BACKGROUND: Massively parallel sequencing allows for genome-wide hypothesis-free investigation of for instance transcription factor binding sites or histone modifications. Although nucleotide resolution detailed information can easily be generated, biological insight often requires a more general view of patterns (footprints) over distinct genomic features such as transcription start sites, exons or repetitive regions. The construction of these footprints is however a time consuming task.

    METHODS: The presented software generates a binary representation of the signals enabling fast and scalable lookup. This representation allows for footprint generation in mere minutes on a desktop computer. Several different input formats are accepted, e.g. the SAM format, bed-files and the UCSC wiggle track.

    CONCLUSIONS: Hypothesis-free investigation of genome wide interactions allows for biological data mining at a scale never before seen. Until recently, the main focus of analysis of sequencing data has been targeted on signal patterns around transcriptional start sites which are in manageable numbers. Today, focus is shifting to a wider perspective and numerous genomic features are being studied. To this end, we provide a system allowing for fast querying in the order of hundreds of thousands of features.

    National Category
    Medical and Health Sciences Mathematics
    Identifiers
    urn:nbn:se:uu:diva-129177 (URN); 10.1186/1756-0381-3-4 (DOI); 000208761100004 (); 20707885 (PubMedID)
    Available from: 2010-08-10 Created: 2010-08-06 Last updated: 2017-02-02 Bibliographically approved
    2. (Record not available.)
    3. Nucleosomes are well positioned in exons and carry characteristic histone modifications
    2009 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no 10, 1732-1741 p. Article in journal (Refereed). Published
    Abstract [en]

    The genomes of higher organisms are packaged in nucleosomes with functional histone modifications. Until now, genome-wide nucleosome and histone modification studies have focused on transcription start sites (TSSs) where nucleosomes in RNA polymerase II (RNAPII) occupied genes are well positioned and have histone modifications that are characteristic of expression status. Using public data, we here show that there is a higher nucleosome-positioning signal in internal human exons and that this positioning is independent of expression. We observed a similarly strong nucleosome-positioning signal in internal exons of C. elegans. Among the 38 histone modifications analyzed in man, H3K36me3, H3K79me1, H2BK5me1, H3K27me1, H3K27me2 and H3K27me3 had evidently higher signal in internal exons than in the following introns and were clearly related to exon expression. These observations are suggestive of roles in splicing. Thus, exons are not only characterized by their coding capacity but also by their nucleosome organization, which seems evolutionary conserved since it is present in both primates and nematodes.

    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-107609 (URN); 10.1101/gr.092353.109 (DOI); 000270389700005 (); 19687145 (PubMedID)
    Note

    The first three authors share first authorship.

    Available from: 2009-08-19 Created: 2009-08-19 Last updated: 2017-02-02 Bibliographically approved
    4. Combinations of histone modifications control exon expression
    (English). Article in journal (Other academic). Submitted
    National Category
    Medical and Health Sciences
    Identifiers
    urn:nbn:se:uu:diva-129178 (URN)
    Available from: 2010-08-10 Created: 2010-08-06 Last updated: 2010-12-22 Bibliographically approved
  • 29.
    Eriksson, Markus
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing.
    Lattice-based simulations of microscopic reaction-diffusion models in a crowded environment. 2016. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Molecules inside living cells move by diffusion and can react with each other upon collision. Living cells are occupied by macromolecules, which limit the space available for the particles to diffuse in. The effect caused by these crowders has been modeled, and the accuracy of the models has been evaluated. We investigate two models that follow individual particle trajectories. The more accurate model samples Brownian Dynamics in a continuous space. The second, computationally more efficient model uses discrete space, where the particles move on a lattice. The results show that the lattice model under-estimates the crowding effect on the diffusive behavior. Compared with the model using Brownian Dynamics, the reaction rates in the lattice model were either increased or decreased, depending on the time and the amount of crowders. This also demonstrates the importance of modeling the particles with realistic sizes when simulating reaction-diffusion in a crowded environment.
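
    The lattice picture in this abstract, in which particles hop between sites and crowders occupy part of the space, can be sketched with a toy random walk. This is only an illustration under assumed settings (a periodic 2D square lattice, immobile single-site crowders, a hypothetical crowder fraction of 0.3) and not the thesis's model; it also makes no comparison with Brownian Dynamics.

```python
import random

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def make_crowders(size, fraction, start, rng):
    """Randomly occupy a fraction of lattice sites, keeping the start site free."""
    sites = [(x, y) for x in range(size) for y in range(size) if (x, y) != start]
    rng.shuffle(sites)
    return set(sites[:int(fraction * len(sites))])

def walk(size, crowders, start, steps, rng):
    """Random walk on a periodic lattice; hops onto crowder sites are rejected.

    Returns (squared displacement, fraction of accepted hops).
    """
    x, y = start          # wrapped lattice position
    ux, uy = 0, 0         # unwrapped displacement from the start
    accepted = 0
    for _ in range(steps):
        dx, dy = rng.choice(MOVES)
        nx, ny = (x + dx) % size, (y + dy) % size
        if (nx, ny) not in crowders:
            x, y = nx, ny
            ux += dx
            uy += dy
            accepted += 1
    return ux * ux + uy * uy, accepted / steps

def mean_squared_displacement(fraction, walkers=300, steps=200, size=50, seed=0):
    """Average squared displacement and hop acceptance over independent walkers."""
    rng = random.Random(seed)
    start = (size // 2, size // 2)
    crowders = make_crowders(size, fraction, start, rng)
    results = [walk(size, crowders, start, steps, rng) for _ in range(walkers)]
    msd = sum(r[0] for r in results) / walkers
    acceptance = sum(r[1] for r in results) / walkers
    return msd, acceptance

free_msd, free_acc = mean_squared_displacement(0.0)
crowded_msd, crowded_acc = mean_squared_displacement(0.3)
print(free_msd, crowded_msd)
```

    In this sketch each crowder blocks exactly one lattice site regardless of its physical size, which is the kind of simplification the abstract's closing remark about realistic particle sizes concerns.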

  • 30. Fagerberg, Linn
    et al.
    Oksvold, Per
    Skogs, Marie
    Algenäs, Cajsa
    Lundberg, Emma
    Pontén, Fredrik
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Sivertsson, Asa
    Odeberg, Jacob
    Klevebring, Daniel
    Kampf, Caroline
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Asplund, Anna
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Sjöstedt, Evelina
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Al-Khalili Szigyarto, Cristina
    Edqvist, Per-Henrik
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Olsson, IngMarie
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Rydberg, Urban
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Immunology, Genetics and Pathology, Molecular and Morphological Pathology.
    Hudson, Paul
    Ottosson Takanen, Jenny
    Berling, Holger
    Björling, Lisa
    Tegel, Hanna
    Rockberg, Johan
    Nilsson, Peter
    Navani, Sanjay
    Jirström, Karin
    Mulder, Jan
    Schwenk, Jochen M
    Zwahlen, Martin
    Hober, Sophia
    Forsberg, Mattias
    von Feilitzen, Kalle
    Uhlén, Mathias
    Contribution of Antibody-based Protein Profiling to the Human Chromosome-centric Proteome Project (C-HPP). 2013. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 12, no 6, 2439-2448 p. Article in journal (Refereed)
    Abstract [en]

    A gene-centric Human Proteome Project has been proposed to characterize the human protein-coding genes in a chromosome-centered manner to understand human biology and disease. Here, we report on the protein evidence for all genes predicted from the genome sequence based on manual annotation from literature (UniProt), antibody-based profiling in cells, tissues and organs and analysis of the transcript profiles using next generation sequencing in human cell lines of different origins. We estimate that there is good evidence for protein existence for 69% (n = 13985) of the human protein-coding genes, while 23% have only evidence on the RNA level and 7% still lack experimental evidence. Analysis of the expression patterns shows few tissue-specific proteins and approximately half of the genes expressed in all the analyzed cells. The status for each gene with regards to protein evidence is visualized in a chromosome-centric manner as part of a new version of the Human Protein Atlas ( www.proteinatlas.org ).

  • 31.
    Fange, David
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Fange, David
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology.
    MesoRD 1.0: Stochastic reaction-diffusion simulations in the microscopic limit. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811. Article in journal (Refereed)
  • 32. Field, Dawn
    et al.
    Garrity, George
    Gray, Tanya
    Morrison, Norman
    Selengut, Jeremy
    Sterk, Peter
    Tatusova, Tatiana
    Thomson, Nicholas
    Allen, Michael J
    Angiuoli, Samuel V
    Ashburner, Michael
    Axelrod, Nelson
    Baldauf, Sandra
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Organism Biology, Systematic Biology.
    Ballard, Stuart
    Boore, Jeffrey
    Cochrane, Guy
    Cole, James
    Dawyndt, Peter
    De Vos, Paul
    DePamphilis, Claude
    Edwards, Robert
    Faruque, Nadeem
    Feldman, Robert
    Gilbert, Jack
    Gilna, Paul
    Glöckner, Frank Oliver
    Goldstein, Philip
    Guralnick, Robert
    Haft, Dan
    Hancock, David
    Hermjakob, Henning
    Hertz-Fowler, Christiane
    Hugenholtz, Phil
    Joint, Ian
    Kagan, Leonid
    Kane, Matthew
    Kennedy, Jessie
    Kowalchuk, George
    Kottmann, Renzo
    Kolker, Eugene
    Kravitz, Saul
    Kyrpides, Nikos
    Leebens-Mack, Jim
    Lewis, Suzanna E
    Li, Kelvin
    Lister, Allyson L
    Lord, Phillip
    Maltsev, Natalia
    Markowitz, Victor
    Martiny, Jennifer
    Methe, Barbara
    Mizrachi, Ilene
    Moxon, Richard
    Nelson, Karen
    Parkhill, Julian
    Proctor, Lita
    White, Owen
    Sansone, Susanna-Assunta
    Spiers, Andrew
    Stevens, Robert
    Swift, Paul
    Taylor, Chris
    Tateno, Yoshio
    Tett, Adrian
    Turner, Sarah
    Ussery, David
    Vaughan, Bob
    Ward, Naomi
    Whetzel, Trish
    San Gil, Ingio
    Wilson, Gareth
    Wipat, Anil
    The minimum information about a genome sequence (MIGS) specification. 2008. In: Nature biotechnology, ISSN 1546-1696, Vol. 26, no 5, 541-7 p. Article in journal (Refereed)
    Abstract [en]

    With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the 'transparency' of the information contained in existing genomic databases.

  • 33. Flores, Samuel
    FlexOracle: predicting flexible hinges by identification of stable domains. 2007. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 215- p. Article in journal (Refereed)
  • 34. Flores, Samuel
    Hinge Atlas: relating sequence features to sites of structural flexibility. 2007. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 8, 167- p. Article in journal (Refereed)
  • 35. Flores, Samuel
    HingeMaster: Normal mode hinge prediction approach and integration of complementary predictors. 2008. In: Proteins: Structure, Function, and Bioinformatics, Vol. 73, 299-319 p. Article in journal (Refereed)
  • 36. Flores, Samuel
    Predicting RNA stucture by multiple template homology modeling2010In: Pacific Symposium on Biocomputing, 216-227 p.Article in journal (Refereed)
  • 37.
    Flores, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Bernauer, Julie
    INRIA.
    Shin, Seokmin
    Chemistry Department of Seoul National University.
    Zhou, Ruhong
    Huang, Xuhui
    Department of Chemistry, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong.
    Multiscale modeling of macromolecular biosystems. 2012. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 13, no 4, 395-405 p. Article in journal (Refereed)
    Abstract [en]

    In this article, we review the recent progress in multiresolution modeling of structure and dynamics of protein, RNA and their complexes. Many approaches using both physics-based and knowledge-based potentials have been developed at multiple granularities to model both protein and RNA. Coarse graining can be achieved not only in the length, but also in the time domain using discrete time and discrete state kinetic network models. Models with different resolutions can be combined either in a sequential or parallel fashion. Similarly, the modeling of assemblies is also often achieved using multiple granularities. The progress shows that a multiresolution approach has considerable potential to continue extending the length and time scales of macromolecular modeling.

  • 38.
    Flores, Samuel
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Gerstein, Mark
    Yale University.
    Predicting protein ligand binding motions with the Conformation Explorer. 2011. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 12, 417- p. Article in journal (Refereed)
    Abstract [en]

    Background

    Knowledge of the structure of proteins bound to known or potential ligands is crucial for biological understanding and drug design. Often the 3D structure of the protein is available in some conformation, but binding the ligand of interest may involve a large scale conformational change which is difficult to predict with existing methods.

    Results

    We describe how to generate ligand binding conformations of proteins that move by hinge bending, the largest class of motions. First, we predict the location of the hinge between domains. Second, we apply an Euler rotation to one of the domains about the hinge point. Third, we compute a short-time dynamical trajectory using Molecular Dynamics to equilibrate the protein and ligand and correct unnatural atomic positions. Fourth, we score the generated structures using a novel fitness function which favors closed or holo structures. By iterating the second through fourth steps we systematically minimize the fitness function, thus predicting the conformational change required for small ligand binding for five well studied proteins.

    Conclusions

    We demonstrate that the method in most cases successfully predicts the holo conformation given only an apo structure.
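
    The second step described above, rotating one domain about the hinge point, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Z-Y-X Euler angle convention and the toy coordinates are all assumptions chosen for illustration.

```python
import math


def euler_matrix(alpha, beta, gamma):
    """Rotation matrix R = Rz(alpha) * Ry(beta) * Rx(gamma), angles in radians.

    The Z-Y-X convention is an assumption; the abstract does not specify one.
    """
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb, ca * sb * sg - sa * cg, ca * sb * cg + sa * sg],
        [sa * cb, sa * sb * sg + ca * cg, sa * sb * cg - ca * sg],
        [-sb, cb * sg, cb * cg],
    ]


def rotate_about_hinge(coords, hinge, rotation):
    """Rotate each 3D point about the hinge: p' = hinge + R (p - hinge)."""
    rotated = []
    for point in coords:
        # Shift so the hinge is the origin, rotate, then shift back.
        v = [point[i] - hinge[i] for i in range(3)]
        rv = [sum(rotation[r][c] * v[c] for c in range(3)) for r in range(3)]
        rotated.append(tuple(hinge[i] + rv[i] for i in range(3)))
    return rotated


if __name__ == "__main__":
    hinge = (1.0, 0.0, 0.0)                        # hypothetical hinge location
    domain = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0)]    # hypothetical atom positions
    R = euler_matrix(math.pi / 2, 0.0, 0.0)        # quarter turn about the z axis
    for atom in rotate_about_hinge(domain, hinge, R):
        print(tuple(round(c, 6) for c in atom))
```

    An atom located exactly at the hinge is left in place by construction, which is why the rotation is applied to hinge-centred coordinates rather than to the raw positions.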

  • 39.
    Flores, Samuel
    et al.
    Stanford University.
    Jonikas, Magdalena
    Stanford University.
    Methods for building and refining 3D models of RNA. 2012. In: RNA 3D Structure Analysis and Prediction / [ed] Neocles Leontis and Eric Westhof, Springer London, 2012, 1. Chapter in book (Other academic)
  • 40. Flores, Samuel
    et al.
    Sherman, Michael
    Stanford University.
    Fast flexible modeling of RNA structure using internal coordinates. 2011. In: Transactions in Computational Biology and Bioinformatics. Article in journal (Refereed)
  • 41.
    Francis, Roy M.
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Ecology and Genetics, Evolutionary Biology.
    POPHELPER: an R package and web app to analyse and visualize population structure. 2017. In: Molecular Ecology Resources, ISSN 1755-098X, E-ISSN 1755-0998, Vol. 17, no 1, 27-32 p. Article in journal (Refereed)
    Abstract [en]

    The POPHELPER R package and web app are software tools to aid in population structure analyses. They can be used for the analyses and visualization of output generated from population assignment programs such as ADMIXTURE, STRUCTURE and TESS. Some of the functions include parsing output run files to tabulate data, estimating K using the Evanno method, generating files for CLUMPP and functionality to create barplots. These functions can be streamlined into standard R analysis workflows. The latest version of the package is available on GitHub (https://github.com/royfrancis/pophelper). An interactive web version of the POPHELPER package is available which covers the same functionalities as the R package version with features such as interactive plots, cluster alignment during plotting, sorting individuals and ordering of population groups. The interactive version is available at http://pophelper.com/.

  • 42.
    Frisk, Christoffer
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Bioinformatics.
    Automated protein-family classification based on hidden Markov models. 2015. Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The aim of the project presented in this paper was to investigate the possibility of automatically sub-classifying the superfamily of Short-chain Dehydrogenase/Reductases (SDR). This was done based on an algorithm previously designed to sub-classify the superfamily of Medium-chain Dehydrogenase/Reductases (MDR). While the SDR family is interesting and important to sub-classify, there was also a focus on making the process as automatic as possible so that future families can also be classified using the same methods. To validate the generated results, they were compared with previous sub-classifications of the SDR family. The results proved promising, and the work conducted here can be seen as a good initial part of a more comprehensive investigation.

  • 43. Gerardo, Nicole M
    et al.
    Altincicek, Boran
    Anselme, Caroline
    Atamian, Hagop
    Barribeau, Seth M
    de Vos, Martin
    Duncan, Elizabeth J
    Evans, Jay D
    Gabaldón, Toni
    Ghanim, Murad
    Heddi, Adelaziz
    Kaloshian, Isgouhi
    Latorre, Amparo
    Moya, Andres
    Nakabachi, Atsushi
    Parker, Benjamin J
    Pérez-Brocal, Vincente
    Pignatelli, Miguel
    Rahbé, Yvan
    Ramsey, John S
    Spragg, Chelsea J
    Tamames, Javier
    Tamarit, Daniel
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Molecular Evolution.
    Tamborindeguy, Cecilia
    Vincent-Monegat, Caroline
    Vilcinskas, Andreas
    Immunity and other defenses in pea aphids, Acyrthosiphon pisum. 2010. In: Genome Biology, ISSN 1465-6906, E-ISSN 1474-760X, Vol. 11, no 2. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Recent genomic analyses of arthropod defense mechanisms suggest conservation of key elements underlying responses to pathogens, parasites and stresses. At the center of pathogen-induced immune responses are signaling pathways triggered by the recognition of fungal, bacterial and viral signatures. These pathways result in the production of response molecules, such as antimicrobial peptides and lysozymes, which degrade or destroy invaders. Using the recently sequenced genome of the pea aphid (Acyrthosiphon pisum), we conducted the first extensive annotation of the immune and stress gene repertoire of a hemipterous insect, which is phylogenetically distantly related to previously characterized insect models.

    RESULTS: Strikingly, pea aphids appear to be missing genes present in insect genomes characterized to date and thought critical for recognition, signaling and killing of microbes. In line with results of gene annotation, experimental analyses designed to characterize immune response through the isolation of RNA transcripts and proteins from immune-challenged pea aphids uncovered few immune-related products. Gene expression studies, however, indicated some expression of immune and stress-related genes.

    CONCLUSIONS: The absence of genes suspected to be essential for the insect immune response suggests that the traditional view of insect immunity may not be as broadly applicable as once thought. The limitations of the aphid immune system may be representative of a broad range of insects, or may be aphid specific. We suggest that several aspects of the aphid life style, such as their association with microbial symbionts, could facilitate survival without strong immune protection.

  • 44.
    Glaros, Anastasios
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Biology Education Centre.
    Data-driven Definition of Cell Types Based on Single-cell Gene Expression Data. 2016. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
  • 45. Gould, Cathryn M
    et al.
    Diella, Francesca
    Via, Allegra
    Puntervoll, Pål
    Gemünd, Christine
    Chabanis-Davidson, Sophie
    Michael, Sushama
    Sayadi, Ahmed
    Department of Biochemical Sciences, ‘Sapienza Universita’ di Roma, Rome, Italy.
    Bryne, Jan Christian
    Chica, Claudia
    Seiler, Markus
    Davey, Norman E
    Haslam, Niall
    Weatheritt, Robert J
    Budd, Aidan
    Hughes, Tim
    Pas, Jakub
    Rychlewski, Leszek
    Travé, Gilles
    Aasland, Rein
    Helmer-Citterich, Manuela
    Linding, Rune
    Gibson, Toby J
    ELM: the status of the 2010 eukaryotic linear motif resource. 2010. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38. Article in journal (Refereed)
    Abstract [en]

    Linear motifs are short segments of multidomain proteins that provide regulatory functions independently of protein tertiary structure. Much of intracellular signalling passes through protein modifications at linear motifs. Many thousands of linear motif instances, most notably phosphorylation sites, have now been reported. Although clearly very abundant, linear motifs are difficult to predict de novo in protein sequences due to the difficulty of obtaining robust statistical assessments. The ELM resource at http://elm.eu.org/ provides an expanding knowledge base, currently covering 146 known motifs, with annotation that includes >1300 experimentally reported instances. ELM is also an exploratory tool for suggesting new candidates of known linear motifs in proteins of interest. Information about protein domains, protein structure and native disorder, cellular and taxonomic contexts is used to reduce or deprecate false positive matches. Results are graphically displayed in a 'Bar Code' format, which also displays known instances from homologous proteins through a novel 'Instance Mapper' protocol based on PHI-BLAST. ELM server output provides links to the ELM annotation as well as to a number of remote resources. Using the links, researchers can explore the motifs, proteins, complex structures and associated literature to evaluate whether candidate motifs might be worth experimental investigation.

  • 46. Grabherr, Manfred G
    et al.
    Russell, Pamela
    Meyer, Miriah
    Mauceli, Evan
    Alföldi, Jessica
    Di Palma, Federica
    Lindblad-Toh, Kerstin
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    Genome-wide synteny through highly sensitive sequence alignment: Satsuma. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 26, no 9, 1145-1151 p. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes).

    RESULTS: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous 'battleship'-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine.

    AVAILABILITY: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/.
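
    The idea of cross-correlation via fast Fourier transform can be illustrated with a toy sketch. It is not Satsuma's implementation, which is written in C++ and is far more elaborate: the function names are hypothetical, the FFT is a plain recursive radix-2 version kept for self-containment, and the sequences are made up. For each base, the two sequences are turned into 0/1 indicator signals; summing the per-base correlations gives the number of matching positions at every relative shift.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(query, target):
    """Return {shift: matches}, counting positions i with query[i] == target[i + shift]."""
    n = 1
    while n < len(query) + len(target):   # zero-pad so circular shifts do not overlap
        n *= 2
    totals = [0.0] * n
    for base in "ACGT":
        q = [1.0 if c == base else 0.0 for c in query] + [0.0] * (n - len(query))
        t = [1.0 if c == base else 0.0 for c in target] + [0.0] * (n - len(target))
        # Correlation theorem: corr = IFFT(conj(FFT(q)) * FFT(t)).
        product = [a.conjugate() * b for a, b in zip(fft(q), fft(t))]
        corr = fft(product, inverse=True)
        for s in range(n):
            totals[s] += corr[s].real / n   # divide by n to normalise the inverse
    counts = {}
    for s in range(n):
        shift = s if s < len(target) else s - n   # high indices are negative shifts
        counts[shift] = round(totals[s])
    return counts


if __name__ == "__main__":
    counts = match_counts("ACGT", "TTACGTTT")
    best = max(counts, key=counts.get)
    print(best, counts[best])
```

    The benefit of the transform is that all shifts are scored at once in O(n log n) time instead of comparing the sequences separately at every offset.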

  • 47.
    Grönlund, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Analysis and Applied Mathematics.
    Lötstedt, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology.
    Costs and constraints from time-delayed feedback in small gene regulatory motifs. 2010. In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 107, 8171-8176 p. Article in journal (Refereed)
  • 48.
    Grönlund, Andreas
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Analysis and Applied Mathematics.
    Lötstedt, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Delay-induced anomalous fluctuations in intracellular regulation. 2011. In: Nature Communications, ISSN 2041-1723, Vol. 2, 419:1-7 p. Article in journal (Refereed)
  • 49.
    Grönlund, Andreas
    et al.
    Uppsala University, Science for Life Laboratory, SciLifeLab.
    Lötstedt, Per
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis.
    Elf, Johan
    Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, Computational and Systems Biology. Uppsala University, Science for Life Laboratory, SciLifeLab.
    Transcription factor binding kinetics constrain noise suppression via negative feedback. 2013. In: Nature Communications, ISSN 2041-1723, Vol. 4, 1864:1-5 p. Article in journal (Refereed)
  • 50.
    Guy, Lionel
    Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Medicine, Department of Medical Biochemistry and Microbiology.
    phyloSkeleton: taxon selection, data retrieval and marker identification for phylogenomics. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 8, 1230-1232 p. Article in journal (Refereed)
    Abstract [en]

    With the wealth of available genome sequences, a difficult and tedious part of inferring phylogenomic trees is now to select genomes with an appropriate taxon density in the different parts of the tree. The package described here offers tools to easily select the most representative organisms, following a set of simple rules based on taxonomy and assembly quality, to retrieve the genomes from public databases (NCBI, JGI), to annotate them if necessary, to identify given markers in these, and to prepare files for multiple sequence alignment.

    AVAILABILITY AND IMPLEMENTATION: phyloSkeleton is a Perl module and is freely available under GPLv3 at https://bitbucket.org/lionelguy/phyloskeleton/

    CONTACT: lionel.guy@imbim.uu.se
