Publications from Uppsala University
Data management of scientific applications in a reinforcement learning-based hierarchical storage system
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computational Science. ORCID iD: 0000-0001-9983-3755
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division Vi3. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. ORCID iD: 0000-0002-9961-1041
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. ORCID iD: 0000-0001-8745-9858
Uppsala University, Disciplinary Domain of Medicine and Pharmacy, Faculty of Pharmacy, Department of Pharmaceutical Biosciences. ORCID iD: 0000-0002-8083-2864
2024 (English). In: Expert Systems with Applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, article id 121443. Article in journal (Refereed), Published
Abstract [en]

In many areas of data-driven science, large datasets are generated in which the individual data objects are images, matrices, or otherwise have a clear structure. However, these objects can be information-sparse, and a challenge is to efficiently find and work with the most interesting data as early as possible in an analysis pipeline. We have recently proposed a new model for big data management in which the internal structure and information of the data are associated with each data object (as opposed to simple metadata). This creates an opportunity for comprehensive data management solutions to account for data-specific internal structure as well as access patterns. In this article, we explore this idea together with our recently proposed hierarchical storage management framework, which uses reinforcement learning (RL) for autonomous and dynamic data placement across the tiers of a storage hierarchy. Our case study is based on four scientific datasets: protein translocation microscopy images, airfoil angle-of-attack meshes, 1000 Genomes sequences, and phenotypic screening images. The presented results highlight that our framework is optimal and can quickly adapt to new data access requirements. Overall, it reduces data processing time, and the proposed autonomous data placement is superior to any static or semi-static data placement policy.

Place, publisher, year, edition, pages
Elsevier, 2024. Vol. 237, article id 121443
Keywords [en]
Data management, Scientific application, Hierarchical storage system, Reinforcement learning, Large scientific datasets
National Category
Computer Sciences; Computational Mathematics
Research subject
Computer Science with specialization in Database Technology; Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-513854
DOI: 10.1016/j.eswa.2023.121443
ISI: 001081909200001
OAI: oai:DiVA.org:uu-513854
DiVA, id: diva2:1804392
Funder
Swedish Foundation for Strategic Research, BD15-0008
Swedish National Infrastructure for Computing (SNIC), SNIC 2022/22-835
eSSENCE - An eScience Collaboration
Available from: 2023-10-12 Created: 2023-10-12 Last updated: 2024-12-08 Bibliographically approved
In thesis
1. Adapting Deep Learning for Microscopy: Interaction, Application, and Validation
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Microscopy is an integral technique in biology for visually studying the fundamental components of life. Digital microscopy and automation have enabled biologists to conduct faster and larger-scale experiments, with a sharp increase in the amount of data generated. Microscopy images contain rich but sparse information, as typically only small regions of an image are relevant for further study. Image analysis is a crucial tool for biologists in the objective interpretation of microscopy data and the extraction of quantitative measurements from it. Recently, deep learning techniques have shown superior performance in various image analysis tasks. The models learn feature representations from the data by optimizing for a task. However, these techniques require a significant amount of annotated data to perform well. Annotating microscopy data requires domain experts, which makes it expensive and time-consuming. Moreover, the models offer no insight into their predictions, and the learned features are not directly interpretable. This poses challenges to the reliable use of the technique in high-trust applications such as drug discovery or disease detection. High data variability in microscopy and the poor generalization of deep learning models make general use of the technique even more difficult.

The work in this thesis presents frameworks and methods that address the practical challenges of applying deep learning in microscopy. Application-specific evaluation approaches are presented to validate these methods, with the aim of increasing trust in the system. The major contributions of this work are as follows. Papers I and III present human-in-the-loop frameworks for, respectively, quick adaptation of deep learning to new data and improving model performance based on human input on the visual explanations provided by the model. Paper II proposes a template-matching approach to improve user interaction in the framework proposed in Paper I. Papers III and IV present architectural modifications of deep learning models for better visual explanation and image-to-image translation, respectively. Papers IV and V present biologically relevant evaluations, i.e., analyses of the deep learning models in relation to the biological task.
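The record does not describe how Paper II's template matching works. As a generic illustration only, the sketch below implements template matching with zero-mean normalized cross-correlation (NCC), a standard similarity measure that is insensitive to uniform brightness and contrast changes. The function names `ncc` and `match_template` and the toy image are hypothetical and not taken from the thesis.

```python
from math import sqrt


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized 2-D lists."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    den = sqrt(sum((a - mean_p) ** 2 for a in p) * sum((b - mean_t) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches get a neutral score


def match_template(image, template):
    """Slide the template over the image; return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best = (-2.0, (0, 0))  # NCC lies in [-1, 1], so any real score beats this
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, (r, c))
    return best


# Hypothetical toy image with the template embedded at row 2, column 1.
image = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 9, 1, 1],
    [1, 9, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
template = [[0, 9], [9, 0]]
score, position = match_template(image, template)
print(position, round(score, 3))  # prints (2, 1) 1.0
```

The exhaustive slide costs on the order of image area times template area; practical implementations usually compute the correlation with FFTs, but the brute-force form keeps the definition visible.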

This thesis aims at better utilization and adaptation of deep learning (DL) methods and techniques for microscopy data. We show that the annotation burden on the user can be significantly reduced through intuitive annotation frameworks and contemporary deep learning paradigms. We further propose architectural modifications to the models to meet application-specific requirements, and we demonstrate the utility of application-specific analysis in microscopy.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2023. p. 65
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2321
Keywords
Deep Learning, Microscopy, Human-in-the-Loop, Semi-Supervised Learning, Application-Specific Analysis, Image Classification, Image-to-Image Translation, Template Matching
National Category
Computer graphics and computer vision; Medical Imaging
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-513911 (URN)
978-91-513-1927-8 (ISBN)
Public defence
2023-12-15, Theatrum Visuale, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 09:00 (English)
Funder
Swedish Foundation for Strategic Research, BD15-0008, SB16-0046
EU, European Research Council, ERC-2015-CoG 683810
Available from: 2023-11-24 Created: 2023-10-13 Last updated: 2025-02-09
2. Intelligent Data Management via Machine Learning: From Storage Hierarchy to Information Hierarchy
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise of Big Data has catalyzed numerous advanced data-driven methods while simultaneously posing significant challenges in data management. This thesis addresses two fundamental aspects of data management, storage management and information extraction, by leveraging machine learning (ML) techniques. In particular, we focus on two research topics: Storage Hierarchy, which explores hierarchical storage management (HSM) in multi-tiered storage systems, and Information Hierarchy, which targets the extraction of intrinsic data hierarchies from raw data.

We begin by introducing the key stages of the data life cycle and their associated challenges in the Big Data era, alongside a review of machine learning foundations and their potential for addressing these challenges. We then present the Storage Hierarchy project, which is detailed in Papers I, II, and III. In these works, we develop automated, adaptive, and efficient HSM approaches using reinforcement learning (RL). In Paper I, we introduce the HSM-RL framework for managing file-level data migration in hierarchical storage systems (HSS). It uses RL to optimize file placement and temporal difference learning for real-time adaptability. Paper II extends this work to complex real-world scenarios using scientific datasets, exploring the framework’s flexibility, scalability, and effectiveness. Moving to a finer granularity, Paper III presents ReStore, an RL-based page-level data migration approach that incorporates the unique characteristics of modern solid-state drives (SSDs), such as read/write asymmetry and parallelism.
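The record does not specify HSM-RL's states, rewards, or update rule. The sketch below only illustrates the general idea of temporal-difference learning applied to file placement, under loudly stated assumptions: each file is its own state, the reward is the file's access count per interval, and files with the highest learned value fill the fastest tier first. `TDPlacer`, `observe`, and `place` are hypothetical names, and this is not the HSM-RL algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class TDPlacer:
    """Hypothetical TD(0) placement sketch; not the HSM-RL algorithm."""

    tier_capacities: list[int]  # files per tier, fastest tier first
    alpha: float = 0.5          # learning rate
    gamma: float = 0.9          # discount factor
    values: dict[str, float] = field(default_factory=dict)

    def observe(self, file_id: str, reward: float) -> None:
        # TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
        # Assumption: each file is its own state and s' = s, so V(s) tracks a
        # discounted estimate of the file's future accesses ("temperature").
        v = self.values.get(file_id, 0.0)
        self.values[file_id] = v + self.alpha * (reward + self.gamma * v - v)

    def place(self) -> dict[str, int]:
        # Rank files by learned value and fill tiers from the fastest (tier 0) down.
        ranked = sorted(self.values, key=self.values.get, reverse=True)
        placement: dict[str, int] = {}
        tier, used = 0, 0
        for file_id in ranked:
            while tier < len(self.tier_capacities) and used >= self.tier_capacities[tier]:
                tier, used = tier + 1, 0
            if tier == len(self.tier_capacities):
                break  # every tier is full; remaining files stay unplaced
            placement[file_id] = tier
            used += 1
        return placement


# Hypothetical access counts per time interval; the reward is the access count.
intervals = [{"a": 5, "b": 1}, {"a": 4, "c": 2}, {"c": 3}]
placer = TDPlacer(tier_capacities=[1, 2])
for counts in intervals:
    for file_id in ("a", "b", "c"):
        placer.observe(file_id, counts.get(file_id, 0))
print(placer.place())  # prints {'a': 0, 'c': 1, 'b': 1}
```

Because each observation nudges the estimate incrementally, the ranking follows changes in access patterns without retraining; a static policy, by contrast, never revisits its initial assignment.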

The Information Hierarchy project focuses on the autonomous extraction of implicit data hierarchies from raw, unlabeled data. In Paper IV, we propose InfoHier, a framework that integrates self-supervised learning (SSL) with hierarchical clustering (HC) to uncover latent data representations and hierarchical structures. By jointly training SSL and HC through a dynamic balancing loss, InfoHier ensures that the HC results align with the intrinsic data hierarchy. This method facilitates meaningful, structured information extraction and retrieval.
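The record does not specify InfoHier's HC module or its dynamic balancing loss. To show what hierarchical clustering produces over learned representations, namely a sequence of merges (a dendrogram), here is a naive agglomerative sketch over hypothetical embedding vectors. In InfoHier the representations would come from the SSL encoder; the function name `agglomerative`, the linkage choice, and the data are assumptions.

```python
from itertools import combinations
from math import dist


def agglomerative(points, linkage=min):
    """Naive bottom-up clustering; returns merges as (cluster_a, cluster_b, distance).

    linkage=min gives single linkage; linkage=max gives complete linkage.
    Original points are clusters 0..n-1; each merge creates the next integer id.
    """
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        (a, b), d = min(
            (
                ((x, y), linkage(dist(points[i], points[j])
                                 for i in clusters[x] for j in clusters[y]))
                for x, y in combinations(clusters, 2)
            ),
            key=lambda item: item[1],
        )
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges


# Hypothetical 2-D embeddings standing in for SSL encoder outputs.
embeddings = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for a, b, d in agglomerative(embeddings):
    print(f"merge {a} + {b} at distance {d:.2f}")
```

The merge list encodes the hierarchy: the two tight pairs join first, and the resulting clusters join each other at a much larger distance. This naive version recomputes every pairwise linkage at each step, so it only suits tiny inputs.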

Collectively, the Storage Hierarchy and Information Hierarchy projects advance intelligent data management by enabling efficient storage solutions and autonomous information extraction. These contributions lay the foundation for next-generation data management systems that address the challenges of Big Data with adaptive and scalable solutions.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2025. p. 93
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2483
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544718 (URN)
978-91-513-2332-9 (ISBN)
Public defence
2025-02-07, Häggsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Available from: 2025-01-16 Created: 2024-12-08 Last updated: 2025-01-16

Open Access in DiVA

Full text: FULLTEXT01.pdf (5357 kB), 208 downloads
Checksum (SHA-512):
ea1105d06a4706a7cf99c8ec7454069de847c28ccbbf5c7db04cc0a379347af8b10fbfd55de2e60d86d24e01564edd0ee2c32932d8ad6740fe588f856ed67864
Type: fulltext; MIME type: application/pdf

Authority records

Zhang, Tianru; Gupta, Ankit; Francisco Rodríguez, María Andreína; Spjuth, Ola; Hellander, Andreas; Toor, Salman
