Publications from Uppsala University
Improved estimation of intrinsic solubility of drug-like molecules through multi-task graph transformer
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmacy. ORCID iD: 0009-0004-1619-4521
Johnson & Johnson, Pharmaceut & Mat Sci, Beerse, Belgium.
Johnson & Johnson, Discovery Pharmaceut, La Jolla, CA USA.
Johnson & Johnson, Pharmaceut & Mat Sci, Beerse, Belgium.
2025 (English). In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 17, no 1, article id 153. Article in journal (Refereed), Published.
Abstract [en]

Aqueous solubility of a compound plays a crucial role throughout various stages of drug discovery and development. Despite numerous efforts using various machine learning models, accurately estimating aqueous solubility remains a challenge. One primary limitation is the absence of a large, single-source dataset of drug-like compounds for model training. Additionally, studies have highlighted the need for improvements in prediction algorithms and molecular representations. To address these challenges, in-house solubility data from Johnson & Johnson (J&J) were leveraged. Theoretical pH-solubility equations and in-house pKa prediction tools were used to calculate intrinsic solubility from the J&J data. A multi-task graph transformer model was developed and trained on the calculated intrinsic solubility data of 13,306 compounds along with seven relevant physicochemical properties, including solubility at pH 2 and pH 7, logP, and logD at three different pHs. When evaluated on high-quality test data, the developed model achieved a root mean square error (RMSE) of 0.61 and a coefficient of determination (R²) of 0.60, demonstrating state-of-the-art performance in estimating intrinsic solubility for drug-like compounds.
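The abstract states that intrinsic solubility was calculated from measured J&J data using theoretical pH-solubility equations and predicted pKa values, but this record does not give the exact equations. As a minimal sketch, assuming a compound with a single ionizable group that follows the Henderson-Hasselbalch relationship, the measured solubility S at a given pH relates to the intrinsic solubility S0 as S = S0(1 + 10^(pH - pKa)) for an acid and S = S0(1 + 10^(pKa - pH)) for a base. The helper name `intrinsic_log_solubility` and its `kind` parameter are hypothetical, and multiprotic compounds, zwitterions, and salt or aggregation effects are not handled.

```python
import math


def _log10_one_plus_pow10(x: float) -> float:
    """Numerically stable log10(1 + 10**x)."""
    if x > 0:
        # Factor out 10**x so the remaining power is small and cannot overflow.
        return x + math.log10(1.0 + 10.0 ** (-x))
    return math.log10(1.0 + 10.0 ** x)


def intrinsic_log_solubility(log_s: float, ph: float, pka: float, kind: str) -> float:
    """Convert a measured log10 solubility at a given pH into log10 S0.

    Hypothetical helper. Assumes one ionizable group and ideal
    Henderson-Hasselbalch behaviour:
      acid: S = S0 * (1 + 10**(pH - pKa))
      base: S = S0 * (1 + 10**(pKa - pH))
    For kind == "neutral", pka is ignored and S equals S0.
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    elif kind == "neutral":
        return log_s
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    # log S0 = log S - log10(1 + 10**exponent)
    return log_s - _log10_one_plus_pow10(exponent)


# Illustrative values only (not from the paper): a base with pKa 8.0
# measured at log S = -3.0 at pH 7.0.
print(round(intrinsic_log_solubility(-3.0, 7.0, 8.0, "base"), 3))  # prints -4.041
```

Far on the ionized side of the pKa, the correction term grows by about one log unit per pH unit, so an error in the predicted pKa shifts log S0 almost one-to-one. Conversions of this kind are therefore sensitive to the quality of the pKa predictions.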

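The reported performance (RMSE 0.61, R² 0.60) uses two standard regression metrics. The sketch below shows their usual definitions: RMSE as the square root of the mean squared residual, and R² as 1 - SS_res/SS_tot. It assumes the paper uses this form of R²; some studies instead report the squared Pearson correlation, which can give a different value. The function names are illustrative, and the numbers in the usage lines are made up rather than taken from the paper.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Validate that both sequences are non-empty and equally long."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one observation")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)**2))."""
    _check(y_true, y_pred)
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up log S values, for illustration only.
measured = [-1.0, -2.0, -3.0, -4.0]
predicted = [-2.0, -2.0, -3.0, -3.0]
print(rmse(measured, predicted))      # sqrt(2 / 4)
print(r2_score(measured, predicted))  # 1 - 2 / 5
```

RMSE is expressed in the same units as the target, while R² depends on the spread of the test set, so the two numbers are usually read together.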
Place, publisher, year, edition, pages
BioMed Central (BMC), 2025. Vol. 17, no 1, article id 153
Keywords [en]
Graph transformer, Multi-task learning, Quantitative structure-property relationship (QSPR), Molecular property prediction, Drug-like compounds
National Category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:uu:diva-570504
DOI: 10.1186/s13321-025-01106-0
ISI: 001592018500001
PubMedID: 41084070
Scopus ID: 2-s2.0-105018704970
OAI: oai:DiVA.org:uu-570504
DiVA, id: diva2:2009776
Available from: 2025-10-28
Created: 2025-10-28
Last updated: 2025-10-28
Bibliographically approved

Open Access in DiVA

fulltext (1864 kB), 142 downloads
File information
File name: FULLTEXT01.pdf
File size: 1864 kB
Checksum (SHA-512):
23b7c399a51bb9412a0c417ab2514785ca7fe49bb6673ef431121970b5df112541396c901726ba236e31b6069e747ed720fda03afcef33ed7ea4997b54fad579
Type: fulltext
Mimetype: application/pdf

Authority records

Zhao, Jiaxi; Bergström, Christel; Larsson, Per

Search in DiVA

By author/editor: Zhao, Jiaxi; Bergström, Christel; Ahmad, Mazen; Larsson, Per
By organisation: Department of Pharmacy
In the same journal: Journal of Cheminformatics

The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
