A Monte Carlo approach to modeling post-translational modification sites using local physicochemical properties.
(English) Manuscript (preprint) (Other (popular science, discussion, etc.))
Many proteins undergo chemical modifications during or shortly after translation. Post-translational modifications (PTMs) contribute greatly to the diversity of protein functions and play a crucial role in many cellular processes. Understanding where and why a given protein is modified is therefore an important issue in biomedical research. Mechanisms underlying some types of PTMs have been elucidated, but many remain unknown. A number of tools for predicting PTMs from short sequence fragments exist. While usually accurate at predicting modification sites, these tools are not designed to increase the understanding of modification mechanisms. Here we attempted to build easy-to-interpret models of PTMs and to identify the physicochemical properties significant for determining modification status. To this end we applied our Monte Carlo feature selection and interdependency discovery (MCFS-ID) method. Considering 9-aa-long sequence fragments represented in terms of their physicochemical properties, we analyzed 76 types of PTMs, and for each type we identified the properties that played a significant (p ≤ 0.05) role in the classification process. For 17 types of modifications no significant property was found. For the remaining 59 types, we used the significant properties to construct high-quality random forest-based predictive models. We also showed an example of how to interpret the models by analyzing interdependency networks of significant properties and how to complement the networks with decision rules inferred using rough set theory. The results showed the necessity of applying feature selection prior to constructing a model that considers short sequence fragments. Interestingly, for some types of modifications we saw that models based on insignificant features can yield accurate results. This observation deserves further investigation. Among the examined PTMs we observed groups that share similar patterns of significant properties.
We also showed how to complement our models with decision rules that can guide life scientists in their research and shed light on the actual molecular mechanisms determining modification status.
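As a rough illustration of the representation described in the abstract, the sketch below encodes a 9-aa fragment as position-specific physicochemical features and estimates a Monte Carlo permutation p-value for a single feature. It is not the MCFS-ID method. The two properties (Kyte-Doolittle hydropathy and a crude side-chain charge), the function names `encode_fragment` and `permutation_p_value`, and the example fragment are all assumptions made for illustration; the abstract does not state which properties were used.

```python
import random

# Kyte-Doolittle hydropathy index. Assumption: chosen only as a familiar
# example of a physicochemical property, not taken from the manuscript.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude side-chain charge near neutral pH. Assumption: His is treated as
# neutral and every residue not listed here gets charge 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def encode_fragment(fragment):
    """Map a 9-residue fragment to 18 features (2 properties x 9 positions)."""
    if len(fragment) != 9:
        raise ValueError("expected a 9-residue fragment")
    features = []
    for residue in fragment.upper():
        features.append(HYDROPATHY[residue])
        features.append(float(CHARGE.get(residue, 0)))
    return features


def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Monte Carlo p-value for the absolute gap in class means of one feature.

    A generic label-permutation test, used here as a stand-in for the
    significance assessment; it is not the MCFS-ID procedure.
    """
    rng = random.Random(seed)

    def mean_gap(current_labels):
        pos = [v for v, l in zip(values, current_labels) if l]
        neg = [v for v, l in zip(values, current_labels) if not l]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

    observed = mean_gap(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps class sizes, breaks the feature-label link
        if mean_gap(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Made-up fragment, not a real modification site.
features = encode_fragment("AKRSDELIV")
print(len(features))
```

Under these assumptions, a feature would be kept when its estimated p-value is at most 0.05, and the retained features could then be passed to a classifier such as a random forest.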
Bioinformatics and Systems Biology
Identifiers
URN: urn:nbn:se:uu:diva-109836
OAI: oai:DiVA.org:uu-109836
DiVA: diva2:274122