Towards proteome-wide interaction models using the proteochemometrics approach
2010 (English). In: Molecular Informatics, ISSN 1868-1743, Vol. 29, no 6-7, 499-508 p. Article in journal (Refereed). Published.
A proteochemometrics model was induced from all interaction data in the BindingDB database, comprising 7078 protein-ligand complexes with representatives from all major drug target categories. Proteins were represented by alignment-independent sequence descriptors holding information on properties such as hydrophobicity, charge, and secondary structure. Ligands were represented by commonly used QSAR descriptors. The inhibition constant (pKi) values of the protein-ligand complexes were discretized into "high" and "low" interaction activity. Different machine-learning techniques were used to induce models relating protein and ligand properties to the interaction activity. Decision trees performed best, giving an accuracy of 80% and an area under the ROC curve of 0.81. The tree pointed to the protein and ligand properties that are relevant for the interaction. As the approach requires neither alignments nor knowledge of protein 3D structures, virtually all available protein-ligand interaction data could be utilized, thus opening a way to completely general interaction models that may span entire proteomes.
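The workflow in the abstract (describe each complex by protein and ligand properties, discretize pKi into two classes, induce a decision tree, report accuracy and ROC AUC) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the 7.0 pKi cut-off, the plain concatenation of descriptor vectors, the small Gini-based tree, and the toy numbers are all hypothetical and chosen only to keep the example self-contained.

```python
from collections import Counter

def discretize(pki, threshold=7.0):
    """Map a pKi value to a class label; the 7.0 cut-off is an assumed placeholder."""
    return "high" if pki >= threshold else "low"

def pair_features(protein_desc, ligand_desc):
    """Describe a complex by concatenating protein and ligand descriptors (assumed scheme)."""
    return list(protein_desc) + list(ligand_desc)

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (feature index, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    """Recursively grow a binary tree; each leaf stores the fraction of 'high' labels."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"p_high": y.count("high") / len(y)}
    j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth),
    }

def predict_score(tree, x):
    """Walk from the root to a leaf and return its 'high' fraction as a score."""
    while "p_high" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["p_high"]

def accuracy(y_true, scores):
    """Fraction of correct labels when scores >= 0.5 are called 'high'."""
    preds = ["high" if s >= 0.5 else "low" for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the chance that a random 'high' outscores a random 'low' (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == "high"]
    neg = [s for s, t in zip(scores, y_true) if t == "low"]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration: (protein descriptors, ligand descriptors, pKi).
complexes = [
    ([0.9, 0.1], [1.2, 0.3], 8.5),
    ([0.8, 0.2], [1.0, 0.4], 7.9),
    ([0.2, 0.7], [0.3, 0.9], 5.1),
    ([0.1, 0.8], [0.4, 1.1], 4.8),
    ([0.7, 0.3], [1.1, 0.2], 7.2),
    ([0.3, 0.6], [0.2, 1.0], 6.0),
]
X = [pair_features(p, l) for p, l, _ in complexes]
y = [discretize(pki) for _, _, pki in complexes]
tree = build_tree(X, y)
scores = [predict_score(tree, x) for x in X]
train_accuracy = accuracy(y, scores)
train_auc = roc_auc(y, scores)
```

The toy set is trivially separable and is scored on its own training data, so the resulting metrics say nothing about the 80% accuracy and 0.81 AUC reported above; a realistic evaluation would use held-out complexes or cross-validation.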
Bioinformatics, Chemogenomics, Drug design, Protein-Ligand interactions, Proteochemometrics
Pharmaceutical Sciences, Biological Sciences
Identifiers: URN: urn:nbn:se:uu:diva-89298; DOI: 10.1002/minf.201000052; ISI: 000280908200004; OAI: oai:DiVA.org:uu-89298; DiVA: diva2:159947