Uppsala University Publications (uu.se)
A Monte Carlo approach to modeling post-translational modification sites using local physicochemical properties.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. (Jan Komorowski's)
Interdisciplinary Centre for Mathematical and Computer Modeling, University of Warsaw, Poland.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. (Jan Komorowski's)
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. (Jan Komorowski's)
(English) Manuscript (preprint) (Other (popular science, discussion, etc.))
Abstract [en]

Many proteins undergo various chemical modifications during or shortly after translation. Post-translational modifications (PTMs) greatly contribute to the diversity of protein functions and play a crucial role in many cellular processes. Understanding where and why a given protein is modified is therefore an important issue in biomedical research. The mechanisms underlying some types of PTMs have been elucidated, but many remain unknown. A number of tools exist for predicting PTMs from short sequence fragments. While usually accurate at predicting modification sites, these tools are not designed to increase the understanding of modification mechanisms. Here we attempted to build easy-to-interpret models of PTMs and to identify the physicochemical properties significant for determining modification status. To this end we applied our Monte Carlo feature selection and interdependency discovery (MCFS-ID) method. Considering 9 aa-long sequence fragments represented in terms of their physicochemical properties, we analyzed 76 types of PTMs and, for each type, identified the properties that played a significant (p ≤ 0.05) role in the classification process. For 17 types of modifications no significant property was found. For the remaining 59 types, we used the significant properties to construct high-quality random forest-based predictive models. We also showed an example of how to interpret the models by analyzing interdependency networks of significant properties and how to complement the networks with decision rules inferred using rough set theory. The results showed the necessity of applying feature selection prior to constructing a model that considers short sequence fragments. Interestingly, for some types of modifications we saw that models based on insignificant features can yield accurate results. This observation deserves further investigation.
Among the examined PTMs we observed groups that share similar patterns of significant properties. We also showed how to complement our models with decision rules that can guide life scientists in their research and shed light on the actual molecular mechanisms determining modification status.
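To make the fragment representation concrete, the following is a minimal sketch of how a 9 aa-long fragment around a candidate site could be turned into per-position physicochemical features. The abstract does not name the properties used, so the two chosen here (the Kyte-Doolittle hydropathy index and a simplified side-chain charge), the padding symbol, and the names `window_around` and `encode` are illustrative assumptions, not the authors' implementation.

```python
# Kyte-Doolittle hydropathy index (a published scale; choosing it here is an assumption).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charge near neutral pH (histidine treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def window_around(sequence, site, flank=4, pad="X"):
    """Return the (2 * flank + 1)-residue fragment centred on 0-based index `site`.

    Positions that fall outside the protein are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site:site + 2 * flank + 1]


def encode(fragment):
    """Map each position of the fragment to its property values.

    Feature names carry the offset from the central residue, e.g. "charge[-4]".
    Unknown or padding residues get neutral defaults (0.0 and 0).
    """
    flank = len(fragment) // 2
    features = {}
    for offset, residue in enumerate(fragment, start=-flank):
        features[f"hydropathy[{offset:+d}]"] = HYDROPATHY.get(residue, 0.0)
        features[f"charge[{offset:+d}]"] = CHARGE.get(residue, 0)
    return features


# Hypothetical sequence with a candidate serine at index 5.
fragment = window_around("MKRSLSPEDAAK", site=5)
features = encode(fragment)
```

Each fragment thus becomes a fixed-length feature vector (here 9 positions times 2 properties), which is the kind of table a feature selection step can then operate on.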

National Category
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:uu:diva-109836
OAI: oai:DiVA.org:uu-109836
DiVA: diva2:274122
Available from: 2009-10-29 Created: 2009-10-27 Last updated: 2010-01-13 Bibliographically approved
In thesis
1. From Physicochemical Features to Interdependency Networks: A Monte Carlo Approach to Modeling HIV-1 Resistome and Post-translational Modifications
2009 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The availability of new technologies has supplied life scientists with large amounts of experimental data. The data sets are large not only in terms of the number of observations, but also in terms of the number of recorded features. One of the aims of modeling is to explain a given phenomenon in the simplest possible way, hence the need to select suitable features.

We extended a Monte Carlo-based approach to selecting statistically significant features with discovery of feature interdependencies and used it in modeling sequence-function relationships in proteins. Our approach led to compact and easy-to-interpret predictive models.

First, we represented protein sequences in terms of their physicochemical properties. This was followed by our feature selection and discovery of feature interdependencies. Finally, predictive models based on, e.g., decision trees or rough sets were constructed.
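The feature selection step can be illustrated with a deliberately simplified sketch of the general Monte Carlo idea: draw many random feature subsets, fit a small classifier on a random training split for each, and credit the feature the classifier relied on with its held-out accuracy. This is not the MCFS-ID algorithm itself. The one-level decision stump, the scoring rule, and all names and parameters (`best_stump`, `monte_carlo_scores`, `n_subsets`, `subset_size`, `test_fraction`) are assumptions made for illustration, and the significance test and interdependency discovery are omitted.

```python
import random


def best_stump(rows, labels, feature_ids):
    """Return the (feature, threshold, polarity) stump with the best training accuracy."""
    best_accuracy, best = 0.0, None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            for polarity in (True, False):
                hits = sum(((row[f] > threshold) == polarity) == label
                           for row, label in zip(rows, labels))
                accuracy = hits / len(rows)
                if accuracy > best_accuracy:
                    best_accuracy, best = accuracy, (f, threshold, polarity)
    return best


def monte_carlo_scores(rows, labels, n_subsets=200, subset_size=2,
                       test_fraction=0.3, seed=0):
    """Accumulate, per feature, the held-out accuracy of stumps that chose it."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    scores = [0.0] * n_features
    indices = list(range(len(rows)))
    for _ in range(n_subsets):
        subset = rng.sample(range(n_features), subset_size)  # random feature subset
        rng.shuffle(indices)                                 # random train/test split
        cut = int(len(indices) * test_fraction)
        test, train = indices[:cut], indices[cut:]
        f, threshold, polarity = best_stump(
            [rows[i] for i in train], [labels[i] for i in train], subset)
        correct = sum(((rows[i][f] > threshold) == polarity) == labels[i]
                      for i in test)
        scores[f] += correct / len(test)
    return scores


# Toy data: feature 0 determines the class, features 1 and 2 are noise.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(60)]
labels = [row[0] > 0.5 for row in rows]
scores = monte_carlo_scores(rows, labels)
```

On the toy data the informative feature accumulates by far the largest score. In practice a cutoff separating significant from insignificant features would still have to be established, for instance by comparing against scores obtained with shuffled class labels.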

We applied the method to model two important biological problems: 1) HIV-1 resistance to reverse transcriptase-targeted drugs and 2) post-translational modifications of proteins.

In the case of HIV resistance, we were not only able to predict whether the mutated protein is resistant to a drug or not, but we also suggested some new, previously neglected, mutations that possibly contribute to drug resistance. For all these mutations we proposed probable molecular mechanisms of action using literature and 3D structure studies.

In the case of predicting PTMs, we built high-accuracy models of modifications. In comparison to other methods, we were able to resolve whether the closest neighborhood of a residue (the nonamer) is sufficient to determine its modification status. Importantly, the application of our method yields networks of interdependent physicochemical properties of amino acids that show how these properties collaborate in establishing a given modification.

We believe that the presented methods will help researchers to analyze a large class of important biological problems and will guide them in their research.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 89 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 688
Keyword
bioinformatics, HIV-1, resistome analysis, drug resistance, predicting PTMs, molecular interdependency networks, MCFS-ID, feature selection, interactome, machine-learning, rough sets
National Category
Bioinformatics and Systems Biology
Research subject
Computer Science; Biology, with specialization in structural biology
Identifiers
urn:nbn:se:uu:diva-109873 (URN)
978-91-554-7650-2 (ISBN)
Public defence
2009-12-15, C8:305, BMC, Husargatan 3, Uppsala, 09:15 (English)
Opponent
Supervisors
Available from: 2009-11-18 Created: 2009-10-28 Last updated: 2009-11-18 Bibliographically approved

Open Access in DiVA

No full text

Authority records BETA

Kierczak, Marcin