Unbiased descriptor and parameter selection confirms the potential of proteochemometric modelling
Uppsala University, Disciplinary Domain of Science and Technology, Technology, Department of Engineering Sciences, Signal Processing.
2005 (English). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 6, 50- p. Article in journal (Refereed), Published
Abstract [en]

Background

Proteochemometrics is a new methodology that allows prediction of protein function directly from real interaction measurement data, without the need for 3D structure information. Several reported proteochemometric models of ligand-receptor interactions have already yielded significant insights into various forms of biomolecular interactions. Proteochemometric models are multivariate regression models that predict binding affinity for a particular combination of ligand and protein features. Although these models have offered interesting results in various studies, no detailed statistical evaluation of their average predictive power has been performed. In particular, variable subset selection performed to date has always relied on using all available examples, a situation also encountered in microarray gene expression data analysis.

Results

A methodology for unbiased evaluation of the predictive power of proteochemometric models was implemented, and results from applying it to two of the largest proteochemometric data sets yet reported are presented. A double cross-validation (CV) loop procedure is used to estimate the expected performance of a given design method. The unbiased performance estimates (P2) obtained for the data sets considered confirm that properly designed single proteochemometric models have useful predictive power, but that a standard design based on cross-validation may yield models with quite limited performance. The results also show that different commercial software packages employed for the design of proteochemometric models may yield very different and therefore misleading performance estimates. In addition, the differences between the models obtained in the double CV loop indicate that detailed chemical interpretation of a single proteochemometric model is uncertain when data sets are small.
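
The double cross-validation idea described in this paragraph can be illustrated with a minimal sketch. It is not the authors' implementation: the learner is a k-nearest-neighbour regressor used purely as a stand-in, the toy data are synthetic, and the P2 formula (one minus the outer-loop squared prediction error divided by the total sum of squares) is an assumption, since the abstract does not define it. All function and variable names are hypothetical. The essential point is that parameter selection in the inner loop only ever sees the design set, so the outer-loop predictions are made on examples that played no part in the design.

```python
import random
from statistics import mean


def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Stand-in learner: mean response of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)


def cv_error(data, k, folds):
    """Mean squared cross-validation error of parameter k over the given folds."""
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        errors += [(data[i][1] - knn_predict(train, data[i][0], k)) ** 2
                   for i in fold]
    return mean(errors)


def double_cv(data, grid, n_outer=5, n_inner=4, seed=0):
    """Return (P2 estimate, parameter chosen in each outer fold)."""
    rng = random.Random(seed)
    predictions = [None] * len(data)
    chosen = []
    for test_fold in make_folds(len(data), n_outer, rng):
        held_out = set(test_fold)
        design = [p for i, p in enumerate(data) if i not in held_out]
        # Inner loop: the parameter is selected using the design set only.
        inner_folds = make_folds(len(design), n_inner, rng)
        best = min(grid, key=lambda k: cv_error(design, k, inner_folds))
        chosen.append(best)
        # Outer loop: predict examples never used during the design.
        for i in test_fold:
            predictions[i] = knn_predict(design, data[i][0], best)
    ys = [y for _, y in data]
    y_bar = mean(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - press / total, chosen  # assumed definition of P2


# Synthetic toy data (an assumption): a noisy linear signal.
rng = random.Random(1)
data = [(float(x), 2.0 * x + rng.gauss(0, 1)) for x in range(40)]
grid = [1, 3, 5]
p2, chosen = double_cv(data, grid)
```

Here `chosen` holds one selected parameter value per outer fold. If these values differ from fold to fold, the design is sensitive to which examples were available, which is the kind of instability the paragraph above points to for small data sets.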

Conclusion

The double CV loop employed offers unbiased performance estimates for a given proteochemometric modelling procedure, making it possible to identify cases where the proteochemometric design does not result in useful predictive models. Chemical interpretations of single proteochemometric models are uncertain and should instead be based on all the models selected in the double CV loop employed here.
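
One simple way to base an interpretation on all the models from the double CV loop, rather than on a single one, is to count how often each variable is selected across the outer folds. The sketch below is an assumption about how such a summary might look, not the procedure used in the paper; the function name and the descriptor names are hypothetical.

```python
from collections import Counter


def selection_frequency(selected_per_fold):
    """Fraction of outer-fold models in which each variable was selected."""
    n_models = len(selected_per_fold)
    counts = Counter(v for subset in selected_per_fold for v in set(subset))
    return {v: c / n_models for v, c in counts.most_common()}


# Hypothetical variable subsets selected in four outer folds.
folds = [
    {"ligand_d1", "receptor_d3"},
    {"ligand_d1"},
    {"ligand_d1", "receptor_d3", "cross_d1_d3"},
    {"ligand_d1", "receptor_d3"},
]
freq = selection_frequency(folds)
```

Variables selected in nearly every fold are more defensible targets for chemical interpretation than variables that appear in only one model.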

Place, publisher, year, edition, pages
2005. Vol. 6, 50- p.
Keyword [en]
Algorithms, Animals, Computational Biology/*methods, Computer Simulation, Data Interpretation; Statistical, Humans, Ligands, Models; Biological, Models; Chemical, Models; Molecular, Models; Statistical, Models; Theoretical, Oligonucleotide Array Sequence Analysis/*methods, Predictive Value of Tests, Programming Languages, Protein Binding, Protein Conformation, Rats, Receptors; Adrenergic; alpha-1/chemistry, Receptors; G-Protein-Coupled/chemistry, Regression Analysis, Reproducibility of Results, Research Support; Non-U.S. Gov't, Selection (Genetics), Software
National Category
Pharmaceutical Sciences
Identifiers
URN: urn:nbn:se:uu:diva-75329; DOI: 10.1186/1471-2105-6-50; PubMedID: 15760465; OAI: oai:DiVA.org:uu-75329; DiVA: diva2:103239
Available from: 2007-04-17. Created: 2007-04-17. Last updated: 2017-12-14. Bibliographically approved.

Other links

Publisher's full text
PubMed: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=15760465&dopt=Citation

Authority records

Freyhult, Eva; Gustafsson, Mats G