Generalized modeling of enzyme-ligand interactions using proteochemometrics and local protein substructures
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics.
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences.
2006 (English). In: Proteins: Structure, Function, and Genetics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 65, no 3, 568-579 p. Article in journal (Refereed), Published
Abstract [en]

Modeling and understanding protein-ligand interactions is one of the most important goals in computational drug discovery. To this end, proteochemometrics uses structural and chemical descriptors from several proteins and several ligands to induce interaction models. Here, we present a new and generalized approach in which proteins varying greatly in sequence and structure are represented by a library of local substructures. Using linear regression and rule-based learning, we combine such local substructures with chemical descriptors from the ligands to model binding affinity for a training set of hydrolase and lyase enzymes. We evaluate the predictive performance of these models using cross-validation and sets of unseen ligands with unknown three-dimensional structure. The models are shown to generalize by outperforming models that use descriptors from only proteins or only ligands, and models that use global structure similarities rather than local similarities. Thus, we demonstrate that this approach is capable of describing dependencies between local structural properties and ligands in otherwise dissimilar protein structures. These dependencies are often, but not always, associated with local substructures that are in contact with the ligands. Finally, we show that strongly bound enzyme-ligand complexes require the presence of particular local substructures, while weakly bound complexes may be described by the absence of certain properties. The results demonstrate that the alignment-independent approach using local substructures is capable of describing protein-ligand interaction for largely different proteins, and hence opens the way for proteochemometric analysis of the interaction space of entire proteomes. Current approaches are limited to families of closely related proteins.
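The combination of protein and ligand descriptors described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the enzyme and ligand names, the descriptor values, and the affinity function are invented toy data, the optional cross-terms are a common proteochemometrics device that the abstract does not mention, and plain ridge regression stands in for the linear regression and rule-based learning used in the paper.

```python
from itertools import product

def pcm_features(protein_bits, ligand_desc, cross_terms=True):
    """Build one descriptor row for a protein-ligand complex.

    protein_bits: 0/1 flags, one per local substructure in the library.
    ligand_desc:  numeric chemical descriptors of the ligand.
    cross_terms:  if True, append every protein x ligand product
                  (an assumption here, not stated in the abstract).
    """
    row = list(protein_bits) + list(ligand_desc)
    if cross_terms:
        row += [p * l for p, l in product(protein_bits, ligand_desc)]
    return row

def ridge_fit(X, y, lam=1e-3):
    """Solve (X^T X + lam * I) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w

def predict(w, row):
    return sum(wi * xi for wi, xi in zip(w, row))

def leave_one_out_rmse(X, y, lam=1e-3):
    """Refit with each complex held out once and score its prediction."""
    errs = []
    for k in range(len(X)):
        w = ridge_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], lam)
        errs.append((predict(w, X[k]) - y[k]) ** 2)
    return (sum(errs) / len(errs)) ** 0.5

# Hypothetical toy data: three enzymes described by three substructure
# flags, three ligands described by two chemical descriptors.
proteins = {"enzA": [1, 0, 1], "enzB": [0, 1, 1], "enzC": [1, 1, 0]}
ligands = {"lig1": [1.0, 0.2], "lig2": [2.0, -0.5], "lig3": [3.0, 1.0]}

def toy_affinity(p, l):
    # Invented linear relation, used only to generate example targets.
    return 2.0 * p[0] - 1.0 * p[1] + 0.5 * l[0]

pairs = [(p, l) for p in proteins.values() for l in ligands.values()]
X = [pcm_features(p, l, cross_terms=False) for p, l in pairs]
y = [toy_affinity(p, l) for p, l in pairs]
w = ridge_fit(X, y, lam=1e-9)
print("leave-one-out RMSE:", leave_one_out_rmse(X, y, lam=1e-9))
```

Because each row holds protein and ligand descriptors side by side, a single model can be fitted across dissimilar proteins. Dropping either half of the row gives the protein-only and ligand-only baselines that the abstract compares against.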

Place, publisher, year, edition, pages
2006. Vol. 65, no 3, 568-579 p.
Keyword [en]
QSAR, partial least squares, rule-based learning, drug design, local descriptors of protein structure
National Category
Pharmaceutical Sciences
URN: urn:nbn:se:uu:diva-23995
DOI: 10.1002/prot.21163
ISI: 000241247100005
PubMedID: 16948162
OAI: oai:DiVA.org:uu-23995
DiVA: diva2:51769
Available from: 2007-02-02 Created: 2007-02-02 Last updated: 2011-05-20 Bibliographically approved
In thesis
1. Chemogenomics: Models of Protein-Ligand Interaction Space
2009 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The large majority of the currently used drugs are small molecules that interact with proteins. Understanding protein-ligand recognition is thus central to drug discovery and design. Improved experimental techniques have resulted in an immense growth of drug target information. This has stimulated the development of chemogenomics and proteochemometrics (PCM) that take target information as well as ligand information into account to study the genomic effect of potential drugs.

This thesis is concerned with modeling protein-ligand recognition, and the aim is to develop models that generalize to the entire protein-ligand space. To this end, protein-ligand interaction data has been extracted and manually curated from public databases, protein and ligand descriptors have been computed, and predictive models have been induced with machine-learning methods.

An introduction to chemogenomics, machine learning, and PCM modeling is given in the thesis summary, which is followed by five research papers. Paper I shows that it is possible to induce interpretable models with a non-linear rule-based method, and paper II demonstrates that local descriptors of protein structure may be used to induce PCM models that cover proteins differing in sequence and fold. In paper III, such local descriptors are used to induce a PCM model on a large dataset that includes all major enzyme classes. This demonstrates that the local descriptors may be used to induce generalized models that span the entire known structural enzyme-ligand space. Paper IV describes a step towards proteome-wide PCM models, and shows that it is possible to predict high- and low-affinity complexes using a set of protein and ligand descriptors that do not require knowledge of 3D structure. Finally, paper V presents a method to visualize and compare protein-ligand chemogenomic subspaces, which may be used to predict unwanted cross-interactions of drugs with other proteins in the proteome.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 54 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 608
urn:nbn:se:uu:diva-89299 (URN), 978-91-554-7430-0 (ISBN)
Public defence
2009-03-27, C8:305, Biomedical Centre, Uppsala, 13:00 (English)
Available from: 2009-03-05 Created: 2009-02-10 Last updated: 2009-06-02 Bibliographically approved
