Chemogenomics: Models of Protein-Ligand Interaction Space
Uppsala University, Disciplinary Domain of Science and Technology, Biology, Department of Cell and Molecular Biology, The Linnaeus Centre for Bioinformatics. (Gerard Kleywegt)
2009 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The large majority of the currently used drugs are small molecules that interact with proteins. Understanding protein-ligand recognition is thus central to drug discovery and design. Improved experimental techniques have resulted in an immense growth of drug target information. This has stimulated the development of chemogenomics and proteochemometrics (PCM) that take target information as well as ligand information into account to study the genomic effect of potential drugs.

This thesis is concerned with modeling protein-ligand recognition, and the aim is to develop models that generalize to the entire protein-ligand space. To this end, protein-ligand interaction data has been extracted and manually curated from public databases, protein and ligand descriptors have been computed, and predictive models have been induced with machine-learning methods.

An introduction to chemogenomics, machine learning, and PCM modeling is given in the thesis summary, which is followed by five research papers. Paper I shows that it is possible to induce interpretable models with a non-linear rule-based method, and paper II demonstrates that local descriptors of protein structure may be used to induce PCM models that cover proteins differing in sequence and fold. In paper III, such local descriptors are used to induce a PCM model on a large dataset that includes all major enzyme classes. This demonstrates that the local descriptors may be used to induce generalized models that span the entire known structural enzyme-ligand space. Paper IV describes a step towards proteome-wide PCM models, and shows that it is possible to predict high- and low-affinity complexes using a set of protein and ligand descriptors that do not require knowledge of 3D structure. Finally, paper V presents a method to visualize and compare protein-ligand chemogenomic subspaces, which may be used to predict unwanted cross-interactions of drugs with other proteins in the proteome.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009. 54 p.
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214; 608
URN: urn:nbn:se:uu:diva-89299; ISBN: 978-91-554-7430-0; OAI: oai:DiVA.org:uu-89299; DiVA: diva2:159948
Public defence
2009-03-27, C8:305, Biomedical Centre, Uppsala, 13:00 (English)
Available from: 2009-03-05. Created: 2009-02-10. Last updated: 2009-06-02. Bibliographically approved.
List of papers
2. A chemogenomics view on protein-ligand spaces
2009 (English). In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, no Suppl. 6, S13. Article in journal (Refereed). Published
Abstract [en]

BACKGROUND: Chemogenomics is an emerging interdisciplinary approach to drug discovery that combines traditional ligand-based approaches with biological information on drug targets and lies at the interface of chemistry, biology and informatics. The ultimate goal in chemogenomics is to understand molecular recognition between all possible ligands and all possible drug targets. Protein and ligand space have previously been studied as separate entities, but chemogenomics studies deal with large datasets that cover parts of the joint protein-ligand space. Since drug discovery has traditionally focused on ligand optimization, the chemical space has been studied extensively. The protein space has been studied to some extent, typically for the purpose of classification of proteins into functional and structural classes. Since chemogenomics deals not only with ligands but also with the macromolecules the ligands interact with, it is of interest to find means to explore, compare and visualize protein-ligand subspaces.
RESULTS: Two chemogenomics protein-ligand interaction datasets were prepared for this study. The first dataset covers the known structural protein-ligand space, and includes all non-redundant protein-ligand interactions found in the worldwide Protein Data Bank (PDB). The second dataset contains all approved drugs and drug targets stored in the DrugBank database, and represents the approved drug-drug target space. To capture biological and physicochemical features of the chemogenomics datasets, sequence-based descriptors were computed for the proteins, and 0-, 1- and 2-dimensional descriptors for the ligands. Principal component analysis (PCA) was used to analyze the multidimensional data and to create global models of protein-ligand space. The nearest neighbour method, computed using the principal components, was used to obtain a measure of overlap between the datasets.
CONCLUSION: In this study, we present an approach to visualize protein-ligand spaces from a chemogenomics perspective, where both ligand and protein features are taken into account. The method can be applied to any protein-ligand interaction dataset. Here, the approach is applied to analyze the structural protein-ligand space and the protein-ligand space of all approved drugs and their targets. We show that this approach can be used to visualize and compare chemogenomics datasets, and possibly to identify cross-interaction complexes in protein-ligand space.
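
The nearest-neighbour overlap idea described under RESULTS can be illustrated with a minimal sketch. It assumes that principal-component scores have already been computed for both datasets; the abstract does not specify the exact overlap measure, so the `radius` threshold, the function names, and the toy coordinates below are all hypothetical.

```python
import math


def nearest_distance(point, others):
    """Euclidean distance from `point` to its nearest neighbour in `others`."""
    return min(math.dist(point, other) for other in others)


def overlap_fraction(scores_a, scores_b, radius):
    """Fraction of points in A whose nearest neighbour in B lies within `radius`.

    This is one possible overlap measure (an assumption), not necessarily
    the one used in the paper.
    """
    hits = sum(1 for p in scores_a if nearest_distance(p, scores_b) <= radius)
    return hits / len(scores_a)


# Hypothetical (PC1, PC2) scores for two small protein-ligand datasets.
structural_space = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
drug_space = [(0.1, 0.0), (1.0, 0.2), (9.0, 9.0)]

print(overlap_fraction(structural_space, drug_space, radius=0.5))
```

In this toy case two of the three structural-space points have a drug-space neighbour within the chosen radius. The measure is not symmetric in general, so it may be worth computing it in both directions.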

National Category
Natural Sciences
urn:nbn:se:uu:diva-89297 (URN); 10.1186/1471-2105-10-S6-S13 (DOI); 000267522200013 (); 19534738 (PubMedID)
Available from: 2009-02-10. Created: 2009-02-10. Last updated: 2011-03-12. Bibliographically approved.
3. Generalized modeling of enzyme-ligand interactions using proteochemometrics and local protein substructures
2006 (English). In: Proteins: Structure, Function, and Genetics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 65, no 3, p. 568-579. Article in journal (Refereed). Published
Abstract [en]

Modeling and understanding protein-ligand interactions is one of the most important goals in computational drug discovery. To this end, proteochemometrics uses structural and chemical descriptors from several proteins and several ligands to induce interaction models. Here, we present a new and generalized approach in which proteins varying greatly in terms of sequence and structure are represented by a library of local substructures. Using linear regression and rule-based learning, we combine such local substructures with chemical descriptors from the ligands to model binding affinity for a training set of hydrolase and lyase enzymes. We evaluate the predictive performance of these models using cross-validation and sets of unseen ligands with unknown three-dimensional structure. The models are shown to generalize by outperforming models using descriptors from only proteins or only ligands, or models using global structure similarities rather than local similarities. Thus, we demonstrate that this approach is capable of describing dependencies between local structural properties and ligands in otherwise dissimilar protein structures. These dependencies are often, but not always, associated with local substructures that are in contact with the ligands. Finally, we show that strongly bound enzyme-ligand complexes require the presence of particular local substructures, while weakly bound complexes may be described by the absence of certain properties. The results demonstrate that the alignment-independent approach using local substructures is capable of describing protein-ligand interaction for largely different proteins and hence opens the way for proteochemometric analysis of the interaction space of entire proteomes. Current approaches are limited to families of closely related proteins.
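
As a rough illustration of how protein and ligand descriptors can be combined into a single input row for a proteochemometric model, the sketch below concatenates a presence/absence vector of local substructures with ligand descriptors. The optional protein-ligand cross-terms are a common device in proteochemometrics but are an assumption here, since the abstract does not say whether they were used; the function name and the numbers are hypothetical.

```python
def pcm_features(protein_desc, ligand_desc, cross_terms=True):
    """Build one feature row from protein and ligand descriptors.

    protein_desc: e.g. 1/0 flags for presence of local substructures.
    ligand_desc:  e.g. numeric chemical descriptors of the ligand.
    cross_terms:  if True, append every protein*ligand product
                  (an assumption, not confirmed by the abstract).
    """
    features = list(protein_desc) + list(ligand_desc)
    if cross_terms:
        features += [p * l for p in protein_desc for l in ligand_desc]
    return features


# Hypothetical enzyme with substructures 1 and 3 present, and a ligand
# described by two made-up descriptor values.
row = pcm_features([1, 0, 1], [0.5, 2.0])
print(row)
```

Rows built this way for every enzyme-ligand pair could then be passed, together with measured affinities, to a regression or rule-based learner.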

QSAR, partial least squares, rule-based learning, drug design, local descriptors of protein structure
National Category
Pharmaceutical Sciences
urn:nbn:se:uu:diva-23995 (URN); 10.1002/prot.21163 (DOI); 000241247100005 (); 16948162 (PubMedID)
Available from: 2007-02-02. Created: 2007-02-02. Last updated: 2011-05-20. Bibliographically approved.
4. Towards proteome-wide interaction models using the proteochemometrics approach
2010 (English). In: Molecular Informatics, ISSN 1868-1743, Vol. 29, no 6-7, p. 499-508. Article in journal (Refereed). Published
Abstract [en]

A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising in all 7078 protein-ligand complexes with representatives from all major drug target categories. Proteins were represented by alignment-independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pK(i)) values of protein-ligand complexes were discretized into "high" and "low" interaction activity. Different machine-learning techniques were used to induce models relating protein and ligand properties to the interaction activity. The best-performing technique was decision trees, which gave an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. As the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein-ligand interaction data could be utilized, thus opening a way to completely general interaction models that may span entire proteomes.
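
The discretization step and the two reported metrics (accuracy and area under the ROC curve) can be sketched as follows. The abstract gives no cut-off for "high" versus "low", so the value 7.0, the pK(i) values, and the classifier scores are hypothetical; the AUC is computed with the standard rank-based definition rather than any specific library.

```python
def discretize(pki_values, cutoff):
    """Label each pK(i) value as "high" or "low" relative to `cutoff`."""
    return ["high" if v >= cutoff else "low" for v in pki_values]


def accuracy(true_labels, predicted_labels):
    """Fraction of complexes whose predicted class equals the true class."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def roc_auc(true_labels, scores, positive="high"):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win, which is the usual rank-based AUC definition.
    """
    pos = [s for t, s in zip(true_labels, scores) if t == positive]
    neg = [s for t, s in zip(true_labels, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: measured pK(i) values and model scores for class "high".
pki = [8.2, 5.1, 7.4, 6.0, 9.0, 4.3]
labels = discretize(pki, cutoff=7.0)  # cutoff is an assumption
scores = [0.9, 0.2, 0.4, 0.6, 0.8, 0.1]
predicted = ["high" if s >= 0.5 else "low" for s in scores]

print(accuracy(labels, predicted), roc_auc(labels, scores))
```

Accuracy depends on the decision threshold applied to the scores, while the AUC summarizes ranking quality over all thresholds, which is why the two are usually reported together.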

Bioinformatics, Chemogenomics, Drug design, Protein-Ligand interactions, Proteochemometrics
National Category
Pharmaceutical Sciences Biological Sciences
urn:nbn:se:uu:diva-89298 (URN); 10.1002/minf.201000052 (DOI); 000280908200004 ()
Available from: 2009-02-10. Created: 2009-02-10. Last updated: 2013-04-12. Bibliographically approved.
5. Rough set-based proteochemometrics modeling of G-protein-coupled receptor-ligand
2006 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 1097-0134, Vol. 63, no 1, p. 24-34. Article in journal (Refereed). Published
Abstract [en]

G-Protein-coupled receptors (GPCRs) are among the most important drug targets. Because of a shortage of 3D crystal structures, most of the drug design for GPCRs has been ligand-based. We propose a novel, rough set-based proteochemometric approach to the study of receptor and ligand recognition. The approach is validated on three datasets containing GPCRs. In proteochemometrics, properties of receptors and ligands are used in conjunction and modeled to predict binding affinity. The rough set (RS) rule-based models presented herein consist of minimal decision rules that associate properties of receptors and ligands with high or low binding affinity. The information provided by the rules is then used to develop a mechanistic interpretation of interactions between the ligands and receptors included in the datasets. The first two datasets contained descriptors of melanocortin receptors and peptide ligands. The third set contained descriptors of adrenergic receptors and ligands. All the rule models induced from these datasets have a high predictive quality. An example of a decision rule is If R1_ligand(Ethyl) and TM helix 2 position 27(Methionine) then Binding(High). The easily interpretable rule sets are able to identify determinative receptor and ligand parts. For instance, all three models suggest that transmembrane helix 2 is determinative for high and low binding affinity. RS models show that it is possible to use rule-based models to predict ligand-binding affinities. The models may be used to gain a deeper biological understanding of the combinatorial nature of receptor-ligand interactions.
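
To make the form of such decision rules concrete, the sketch below encodes the example rule from the abstract as a small data structure and applies it to descriptor dictionaries. The attribute names, the first-match strategy, and the `"Unknown"` default are simplifying assumptions; rough set classifiers typically combine several matching rules, for instance by voting.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # attribute name -> required value
    decision: str     # predicted binding class

    def matches(self, descriptors):
        """True if every condition of the rule holds for this complex."""
        return all(descriptors.get(attr) == value
                   for attr, value in self.conditions.items())


def classify(rules, descriptors, default="Unknown"):
    """Return the decision of the first matching rule (a simplification)."""
    for rule in rules:
        if rule.matches(descriptors):
            return rule.decision
    return default


# The example rule quoted in the abstract, in a hypothetical encoding.
rules = [
    Rule({"R1_ligand": "Ethyl", "TM helix 2 position 27": "Methionine"}, "High"),
]

complex_a = {"R1_ligand": "Ethyl", "TM helix 2 position 27": "Methionine"}
complex_b = {"R1_ligand": "Methyl", "TM helix 2 position 27": "Methionine"}

print(classify(rules, complex_a), classify(rules, complex_b))
```

Because each rule names the receptor position and ligand group it depends on, a matching rule doubles as a readable explanation of the prediction.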

drug design, QSAR, GPCRs, machine learning, rough sets, partial least squares
urn:nbn:se:uu:diva-79854 (URN); 10.1002/prot.2077 (DOI); 16435365 (PubMedID)
Available from: 2006-04-14. Created: 2006-04-14. Last updated: 2009-10-13. Bibliographically approved.
