Publications from Uppsala University (uu.se)
Federated learning for predicting compound mechanism of action based on image-data from cell painting
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0001-9500-1791
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Numerical Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0001-7273-7923
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. ORCID iD: 0000-0002-8083-2864
2024 (English). In: Artificial Intelligence in the Life Sciences, E-ISSN 2667-3185, Vol. 5, article id 100098. Article in journal (Refereed). Published
Abstract [en]

Access to sufficient data is essential for training accurate machine learning models, but much data is not publicly available. This is particularly evident in drug discovery, where pharmaceutical companies withhold much of their data for various reasons. Federated Learning (FL) aims to train a joint model across multiple parties without disclosing data between them. In this work, we leverage FL to predict compound Mechanism of Action (MoA) using fluorescence image data from cell painting. Our study evaluates the effectiveness and efficiency of FL, comparing it with non-collaborative learning and with data-sharing collaborative learning in diverse scenarios. Specifically, we investigate the impact of data heterogeneity across participants on MoA prediction, an essential concern in real-life applications of FL, and demonstrate the benefits for all involved parties. This work highlights the potential of FL in multi-institutional collaborative machine learning for drug discovery and the assessment of chemicals, offering a promising avenue to overcome data-sharing constraints.
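As an illustration of how a joint model can be trained without exchanging data, the sketch below shows one server-side aggregation step in the style of federated averaging (FedAvg). The abstract does not state which aggregation rule the study used, so FedAvg is an assumption here, and the function name `federated_average` and its arguments are hypothetical. Each party sends only its locally trained parameter vector and its sample count, never its images.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Illustrative FedAvg-style aggregation (an assumption, not necessarily
    the rule used in the paper). Only parameters and counts are shared.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client parameter vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        share = n_samples / total  # this client's contribution to the average
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Two hypothetical parties: the second holds three times as much data.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count means a party with more data pulls the joint model further toward its local solution, which is one reason data heterogeneity across participants matters for the final model.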

Place, publisher, year, edition, pages
Elsevier, 2024. Vol. 5, article id 100098
Keywords [en]
Federated learning, Cell profiling, Cell painting, Artificial intelligence, Collaborative learning
National Category
Computer and Information Sciences
Research subject
Scientific Computing; Machine learning
Identifiers
URN: urn:nbn:se:uu:diva-544328
DOI: 10.1016/j.ailsci.2024.100098
ISI: 001333825000001
Scopus ID: 2-s2.0-85193044761
OAI: oai:DiVA.org:uu-544328
DiVA, id: diva2:1917914
Projects
eSSENCE - An eScience Collaboration
Funder
Uppsala University
eSSENCE - An eScience Collaboration
Swedish Research Council, 2020-03731
Swedish Research Council, 2020-01865
Swedish Research Council Formas, 2022-00940
Swedish Cancer Society, 22 2412
EU, Horizon Europe, 101057014
EU, Horizon Europe, 101057442
Swedish Research Council, 2022-06725
Available from: 2024-12-03 Created: 2024-12-03 Last updated: 2026-03-31 Bibliographically approved
In thesis
1. Robust Learning from Distributed and Heterogeneous Data
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern machine learning is expanding beyond centralized, mono-modal training toward systems that must also learn from data across distributed edge devices and heterogeneous data modalities. This transition breaks the independent and identically distributed (i.i.d.) assumption of traditional models, making robustness a first-class requirement for real-world applications. This thesis studies the mechanisms and methodologies necessary to achieve algorithmic robustness across three intersecting dimensions: distributed optimization, geometry-aware uncertainty quantification, and simulation-based inference.

The first dimension addresses statistical heterogeneity in Federated Learning (FL), a distributed training framework in which multiple participants collaboratively train a shared model without exchanging their local data. In FL, the non-i.i.d. nature of distributed data often induces performance degradation, convergence issues and fairness problems. Through an empirical study on drug discovery and the development of new algorithms, this work demonstrates that adaptive optimization and dynamic hyperparameter adjustment can mitigate training instabilities. These methods ensure equitable performance across diverse data silos, preventing the global model from favoring specific participants.
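To make the notion of non-i.i.d. data silos concrete, the sketch below simulates label-skewed participants by splitting each class across clients with Dirichlet-distributed proportions. This is a common way to emulate statistical heterogeneity in FL experiments; it is an assumption for illustration, not a description of the thesis's own setup, and the function name `dirichlet_label_partition` and the parameter `alpha` are hypothetical. Smaller `alpha` gives more skewed silos.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_label_partition(labels: Sequence[int], num_clients: int,
                              alpha: float, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to clients with class-wise Dirichlet proportions."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Normalised Gamma(alpha, 1) draws form a Dirichlet(alpha) sample.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        if total == 0.0:  # extreme underflow: fall back to an even split
            gammas, total = [1.0] * num_clients, float(num_clients)

        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += gammas[c] / total
            if c == num_clients - 1:
                end = len(indices)  # last client takes the remainder
            else:
                end = int(round(cumulative * len(indices)))
                end = min(max(end, start), len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients


# 60 samples over 3 classes, split across 4 hypothetical participants.
toy_labels = [i % 3 for i in range(60)]
silos = dirichlet_label_partition(toy_labels, num_clients=4, alpha=0.5, seed=1)
```

Every sample lands in exactly one silo, so the union of the silos is the original dataset while the per-silo class balance differs.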

The second dimension explores the structural challenges of multi-modal language models, which map data of heterogeneous modalities onto complex, non-Euclidean manifolds. This research models aleatoric and epistemic uncertainty with directional distributions via parametric models and Riemannian Flow Matching. This geometry-aware approach allows models to respect the intrinsic geometric structure of the embedding space, providing a mathematically grounded framework for models to quantify their ignorance when confronted with ambiguous or out-of-distribution inputs.

The final dimension addresses the robustness of a unified framework which supports both forward and inverse processes for Bayesian inference. The proposed framework utilizes a unified Flow Matching model to learn the joint distribution of parameters and observations. By employing randomized masking, this architecture robustly handles partially observed or noisy data, integrating forward and inverse processes into a single cohesive neural network without the need for specialized retraining.

Collectively, this thesis contributes theoretical analyses, novel algorithms, and empirical validations that advance the robustness of machine learning across federated optimization, multi-modal uncertainty quantification, and simulation-based inference, bridging the gap between idealized training assumptions and the demands of real-world applications.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2026. p. 62
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2665
Keywords
Machine Learning, Distributed Optimization, Federated Learning, Probabilistic Modeling, Multi-modal Learning
National Category
Artificial Intelligence
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-583490 (URN)
978-91-513-2816-4 (ISBN)
Public defence
2026-06-04, 101195, Heinz-Otto Kreiss, Regementsvägen 10, Uppsala, 09:15 (English)
Opponent
Supervisors
Available from: 2026-05-07 Created: 2026-03-31 Last updated: 2026-05-13

Open Access in DiVA

fulltext (2044 kB), 283 downloads
File information
File name: FULLTEXT01.pdf
File size: 2044 kB
Checksum (SHA-512): 46ea42027915c8cb42f6ab4f503e1da8d9f68348dd28c33f17f191b9ad9960715b2ec714cb8a7bafc74881cc8dfa81e9811cc0c989d7de2f4fee66bc505291e5
Type: fulltext
Mimetype: application/pdf
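A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha512_of_file` and `matches_checksum` are hypothetical, and the local path is assumed to be wherever the PDF was saved.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-512 digest with a published hex checksum."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("FULLTEXT01.pdf", "<checksum from the record>")` would return `True` only if the saved file is byte-for-byte identical to the deposited one.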

Authority records

Ju, Li; Hellander, Andreas; Spjuth, Ola

Total: 284 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 243 hits