Publications from Uppsala University
Publications (10 of 11)
Zhang, T. (2025). Intelligent Data Management via Machine Learning: From Storage Hierarchy to Information Hierarchy. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise of Big Data has catalyzed numerous advanced data-driven methods, while simultaneously posing significant challenges in data management. This thesis aims to address two fundamental aspects of data management (storage management and information extraction) by leveraging machine learning (ML) techniques. In particular, we focus on two research topics: Storage Hierarchy, which explores hierarchical storage management (HSM) in multi-tiered storage systems; and Information Hierarchy, which targets the extraction of intrinsic data hierarchies from raw data.

We begin by introducing the key stages of the data life cycle and their associated challenges in the Big Data era, alongside a review of machine learning foundations and their potential for addressing these challenges. Subsequently, we present the Storage Hierarchy project, which is detailed across Papers I, II, and III. In these works, we develop automated, adaptive, and efficient HSM approaches using reinforcement learning (RL). In Paper I we introduce the HSM-RL framework for managing file-level data migration in a hierarchical storage system (HSS). It leverages RL to optimize file placement and temporal difference learning for real-time adaptability. Paper II extends this work to complex real-world scenarios using scientific datasets, exploring the framework’s flexibility, scalability, and effectiveness. Moving to finer granularity, Paper III presents ReStore, an RL-based page-level data migration approach that incorporates the unique characteristics of modern Solid-State Drives (SSDs), such as read/write asymmetry and parallelism.

The Information Hierarchy project focuses on autonomous extraction of implicit data hierarchies from raw, unlabeled data. In Paper IV, we propose InfoHier, a framework that integrates self-supervised learning (SSL) with hierarchical clustering (HC) to uncover latent data representations and hierarchical structures. By jointly training SSL and HC through a dynamic balancing loss, InfoHier ensures that the HC results align with the intrinsic data hierarchy. This method facilitates meaningful and structured information extraction and retrieval.

Collectively, the Storage Hierarchy and Information Hierarchy projects advance intelligent data management by enabling efficient storage solutions and autonomous information extraction. These contributions lay the foundation for next-generation data management systems, addressing the challenges of Big Data with adaptive and scalable solutions.
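
The abstract above mentions temporal difference learning as the mechanism behind HSM-RL's real-time adaptability. As a rough illustration of that general idea only, the following sketch applies a tabular TD(0) update to a toy two-tier setting. The tier names, latencies, reward definition, and parameter values are all assumptions made for this example; none of them are taken from the thesis or from HSM-RL.

```python
# Toy illustration only: tiers, latencies, and parameters are hypothetical,
# not taken from HSM-RL.
LATENCY = {"ssd": 1.0, "hdd": 10.0}  # assumed per-access latency for each tier


def td0_update(values, tier, reward, next_tier, alpha=0.5, gamma=0.9):
    """Apply one TD(0) step to values[tier] and return the TD error."""
    current = values.get(tier, 0.0)
    # Bootstrapped target: immediate reward plus discounted next-state value.
    target = reward + gamma * values.get(next_tier, 0.0)
    td_error = target - current
    values[tier] = current + alpha * td_error
    return td_error


def learn_tier_values(steps=200):
    """Estimate the long-run value of keeping a file on each tier."""
    values = {}
    for _ in range(steps):
        for tier, latency in LATENCY.items():
            # Reward is the negative latency; the file stays on the same tier.
            td0_update(values, tier, -latency, tier)
    return values


def best_tier(values):
    """Pick the tier with the highest estimated value."""
    return max(values, key=values.get)


if __name__ == "__main__":
    learned = learn_tier_values()
    print(best_tier(learned))
```

Because each update uses only the most recent reward and the current estimate, the values can be adjusted after every access, which is the property that makes this family of methods suitable for online adaptation.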

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2025. p. 93
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2483
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544718 (URN); 978-91-513-2332-9 (ISBN)
Public defence
2025-02-07, Häggsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Available from: 2025-01-16 Created: 2024-12-08 Last updated: 2025-01-16
Ju, L., Zhang, T., Toor, S. & Hellander, A. (2024). Accelerating Fair Federated Learning: Adaptive Federated Adam. IEEE Transactions on Machine Learning in Communications and Networking, 2, 1017-1032
2024 (English) In: IEEE Transactions on Machine Learning in Communications and Networking, E-ISSN 2831-316X, Vol. 2, p. 1017-1032. Article in journal (Refereed) Published
Abstract [en]

Federated learning is a distributed and privacy-preserving approach to train a statistical model collaboratively from decentralized data held by different parties. However, when the datasets are not independent and identically distributed, models trained by naive federated algorithms may be biased towards certain participants, and model performance across participants is non-uniform. This is known as the fairness problem in federated learning. In this paper, we formulate fairness-controlled federated learning as a dynamical multi-objective optimization problem to ensure fairness and convergence with theoretical guarantees. To solve the problem efficiently, we study the convergence and bias of Adam as the server optimizer in federated learning, and propose Adaptive Federated Adam (AdaFedAdam) to accelerate fair federated learning with alleviated bias. We validate the effectiveness, Pareto optimality and robustness of AdaFedAdam with numerical experiments and show that AdaFedAdam outperforms existing algorithms, providing better convergence and fairness properties of the federated scheme.
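
To make the phrase "Adam as the server optimizer" in the abstract above more concrete, here is a minimal sketch of a generic server-side Adam round: the server treats the gap between the global weights and the averaged client weights as a pseudo-gradient and applies a standard bias-corrected Adam step to it. This is not AdaFedAdam, whose fairness-controlled adaptation is not described in the abstract. The function name, the uniform client averaging, and the hyperparameter values are assumptions for illustration.

```python
import math


def server_adam_round(weights, client_weights, m, v, t,
                      lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8):
    """One server round: pseudo-gradient from client averages, then an Adam step.

    weights        -- current global weights (list of floats)
    client_weights -- locally trained weight lists, one per client
    m, v           -- Adam first and second moment estimates (lists of floats)
    t              -- 1-based round counter used for bias correction
    """
    n = len(client_weights)
    # Pseudo-gradient: global weight minus the uniform mean of client weights.
    grad = [w - sum(c[i] for c in client_weights) / n
            for i, w in enumerate(weights)]
    new_w, new_m, new_v = [], [], []
    for w, g, mi, vi in zip(weights, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g        # first moment
        vi = beta2 * vi + (1 - beta2) * g * g    # second moment
        m_hat = mi / (1 - beta1 ** t)            # bias correction
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


if __name__ == "__main__":
    w, m, v = [1.0], [0.0], [0.0]
    # Two hypothetical clients whose local training moved the weight to 0.0.
    w, m, v = server_adam_round(w, [[0.0], [0.0]], m, v, t=1)
    print(w)
```

Uniform averaging treats every client equally in the pseudo-gradient; the bias towards certain participants that the abstract describes is exactly what a plain scheme like this does not address.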

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer Sciences
Research subject
Machine learning
Identifiers
urn:nbn:se:uu:diva-544327 (URN); 10.1109/tmlcn.2024.3423648 (DOI)
Projects
eSSENCE - An eScience Collaboration
Available from: 2024-12-03 Created: 2024-12-03 Last updated: 2025-01-07 Bibliographically approved
Zhang, T. (2024). Autonomous Hierarchical Storage Management via Reinforcement Learning. Paper presented at PhD Workshop, 50th International Conference on Very Large Databases (VLDB 2024), Guangzhou, China, August 26-30, 2024. VLDB, Article ID 6.
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In the present era of big data, the challenges of data management have grown significantly. One crucial aspect is the management of data storage. As data volumes continue to expand, effective storage management becomes increasingly essential. Meanwhile, evolving hardware technologies offer various storage options, ranging from HDDs to SSDs and NVRAMs. To this end, hierarchical (multi-tier) storage systems (HSS) have emerged as a solution, organizing different storage devices hierarchically to provide various storage options. However, managing multiple storage tiers and their data, while optimizing performance and cost-efficiency, is extremely complex. In this paper, we discuss the challenges in the management of hierarchical storage systems. We summarise our previous contributions on tackling these challenges, including the proposal of a reinforcement learning (RL) based data migration policy and the design of an autonomous hierarchical storage management framework, HSM-RL. We also present the applications of HSM-RL in scientific data management to demonstrate its adaptability and scalability. Finally, we conclude our work to date and outline future research plans.

Place, publisher, year, edition, pages
VLDB, 2024
Series
Proceedings of the VLDB Endowment, E-ISSN 2150-8097
Keywords
Hierarchical Storage Management, Reinforcement Learning
National Category
Computer Sciences
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-542280 (URN)
Conference
PhD Workshop, 50th International Conference on Very Large Databases (VLDB 2024), Guangzhou, China, August 26-30, 2024
Projects
eSSENCE - An eScience Collaboration
Funder
Swedish Foundation for Strategic Research, BD15-0008; Swedish National Infrastructure for Computing (SNIC)
Available from: 2024-11-09 Created: 2024-11-09 Last updated: 2025-01-07 Bibliographically approved
Li, S., Ngai, E. C. H., Ye, F., Ju, L., Zhang, T. & Voigt, T. (2024). Blades: A Unified Benchmark Suite for Byzantine Attacks and Defenses in Federated Learning. In: 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI). Paper presented at 9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, Hong Kong (pp. 158-169). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English) In: 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 158-169. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning (FL) facilitates distributed training across different IoT and edge devices, safeguarding the privacy of their data. The inherent distributed structure of FL introduces vulnerabilities, especially from adversarial devices aiming to skew local updates to their advantage. Despite the plethora of research focusing on Byzantine-resilient FL, the academic community has yet to establish a comprehensive benchmark suite, pivotal for impartial assessment and comparison of different techniques. This paper presents Blades, a scalable, extensible, and easily configurable benchmark suite that supports researchers and developers in efficiently implementing and validating novel strategies against baseline algorithms in Byzantine-resilient FL. Blades contains built-in implementations of representative attack and defense strategies and offers a user-friendly interface that seamlessly integrates new ideas. Using Blades, we re-evaluate representative attacks and defenses on wide-ranging experimental configurations (approximately 1,500 trials in total). Through our extensive experiments, we gained new insights into FL robustness and highlighted previously overlooked limitations due to the absence of thorough evaluations and comparisons of baselines under various attack settings. We maintain the source code and documents at https://github.com/lishenghui/blades.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Byzantine attacks, distributed learning, federated learning, IoT, neural networks, robustness
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-537577 (URN); 10.1109/IoTDI61053.2024.00018 (DOI); 001261370500014 (); 2-s2.0-85196568437 (Scopus ID); 979-8-3503-7025-6 (ISBN); 979-8-3503-7026-3 (ISBN)
Conference
9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, Hong Kong
Funder
Swedish Research Council, 2017-04543
Available from: 2024-09-05 Created: 2024-09-05 Last updated: 2025-02-11 Bibliographically approved
Zhang, T., Gupta, A., Francisco Rodríguez, M. A., Spjuth, O., Hellander, A. & Toor, S. (2024). Data management of scientific applications in a reinforcement learning-based hierarchical storage system. Expert systems with applications, 237, Article ID 121443.
2024 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, article id 121443. Article in journal (Refereed) Published
Abstract [en]

In many areas of data-driven science, large datasets are generated where the individual data objects are images, matrices, or otherwise have a clear structure. However, these objects can be information-sparse, and a challenge is to efficiently find and work with the most interesting data as early as possible in an analysis pipeline. We have recently proposed a new model for big data management where the internal structure and information of the data are associated with each data object (as opposed to simple metadata). There is then an opportunity for comprehensive data management solutions to account for data-specific internal structure as well as access patterns. In this article, we explore this idea together with our recently proposed hierarchical storage management framework that uses reinforcement learning (RL) for autonomous and dynamic data placement in different tiers in a storage hierarchy. Our case study is based on four scientific datasets: Protein translocation microscopy images, Airfoil angle of attack meshes, 1000 Genomes sequences, and Phenotypic screening images. The presented results highlight that our framework is optimal and can quickly adapt to new data access requirements. Overall, it reduces the data processing time, and the proposed autonomous data placement is superior to any static or semi-static data placement policy.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Data management, Scientific application, Hierarchical storage system, Reinforcement learning, Large scientific datasets
National Category
Computer Sciences Computational Mathematics
Research subject
Computer Science with specialization in Database Technology; Computer Science
Identifiers
urn:nbn:se:uu:diva-513854 (URN); 10.1016/j.eswa.2023.121443 (DOI); 001081909200001 ()
Funder
Swedish Foundation for Strategic Research, BD15-0008; Swedish National Infrastructure for Computing (SNIC), SNIC 2022/22-835; eSSENCE - An eScience Collaboration
Available from: 2023-10-12 Created: 2023-10-12 Last updated: 2024-12-08 Bibliographically approved
Zhang, T., Gupta, A., Francisco Rodríguez, M. A., Spjuth, O., Hellander, A. & Toor, S. (2024). Data management of scientific applications in a reinforcement learning-based hierarchical storage system. Expert systems with applications, 237, 121443-121443, Article ID 121443.
2024 (English) In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, p. 121443-121443, article id 121443. Article in journal (Refereed) Published
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-552596 (URN); 10.1016/j.eswa.2023.121443 (DOI)
Funder
Swedish Foundation for Strategic Research, BD15-0008; Swedish National Infrastructure for Computing (SNIC), SNIC 2022/22-835; eSSENCE - An eScience Collaboration
Available from: 2025-03-17 Created: 2025-03-17 Last updated: 2025-03-17
Li, S., Ngai, E. C. H., Ye, F., Ju, L., Zhang, T. & Voigt, T. (2024). Demo Abstract: Blades: A Unified Benchmark Suite for Byzantine-Resilient in Federated Learning. In: 9TH ACM/IEEE CONFERENCE ON INTERNET OF THINGS DESIGN AND IMPLEMENTATION, IOTDI 2024. Paper presented at 9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), MAY 13-16, 2024, Hong Kong, PEOPLES R CHINA (pp. 229-230). IEEE Computer Society
2024 (English) In: 9TH ACM/IEEE CONFERENCE ON INTERNET OF THINGS DESIGN AND IMPLEMENTATION, IOTDI 2024, IEEE Computer Society, 2024, p. 229-230. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning (FL) facilitates distributed training across different IoT and edge devices, safeguarding the privacy of their data. The inherently distributed nature of FL introduces vulnerabilities, especially from adversarial devices aiming to skew local updates to their advantage. Despite the plethora of research focusing on Byzantine-resilient FL, the academic community has yet to establish a comprehensive benchmark suite, pivotal for the assessment and comparison of different techniques. This demonstration presents Blades, a scalable, extensible, and easily configurable benchmark suite that supports researchers and developers in efficiently implementing and validating strategies against baseline algorithms in Byzantine-resilient FL.

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
Keywords
Byzantine attacks, distributed learning, federated learning, IoT, neural networks, robustness
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-537570 (URN); 10.1109/IoTDI61053.2024.00030 (DOI); 001261370500026 (); 979-8-3503-7025-6 (ISBN); 979-8-3503-7026-3 (ISBN)
Conference
9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), MAY 13-16, 2024, Hong Kong, PEOPLES R CHINA
Available from: 2024-09-05 Created: 2024-09-05 Last updated: 2024-09-05 Bibliographically approved
Zhang, T., Hellander, A. & Toor, S. (2023). Efficient Hierarchical Storage Management Empowered by Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering, 35, 5780-5793
2023 (English) In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 35, p. 5780-5793. Article in journal (Refereed) Published
Abstract [en]

With the rapid development of big data and cloud computing, data management has become increasingly challenging. Over the years, a number of frameworks for data management have become available. Most of them are highly efficient, but ultimately create data silos. It becomes difficult to move and work coherently with data as new requirements emerge. A possible solution is to use an intelligent hierarchical (multi-tier) storage system (HSS). An HSS is a meta solution that consists of different storage frameworks organized as a jointly constructed storage pool. A built-in data migration policy that determines the optimal placement of the datasets in the hierarchy is essential. Placement decisions are a non-trivial task since they should be made according to the characteristics of the dataset, the tier status in a hierarchy, and access patterns. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL). We present a mathematical model, a software architecture, and implementations based on both simulations and a live cloud-based environment. We compare the proposed RL-based strategy to a baseline of three rule-based policies, showing that the RL-based policy achieves significantly higher efficiency and optimal data distribution in different scenarios.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Data Management, Cloud Computing, Hierarchical Storage System, Data Migration, Reinforcement Learning
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-490399 (URN); 10.1109/tkde.2022.3176753 (DOI); 000981944600024 ()
Projects
eSSENCE - An eScience Collaboration
Funder
Swedish Foundation for Strategic Research, BD15-0008
Available from: 2022-12-09 Created: 2022-12-09 Last updated: 2024-12-16 Bibliographically approved
Zhang, T., Hellander, A. & Toor, S. (2023). Efficient Hierarchical Storage Management Empowered by Reinforcement Learning Extended Abstract. Paper presented at 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, California, 3-7 April, 2023 (pp. 3869-3870). IEEE
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

With the rapid development of big data and cloud computing, data management has become increasingly challenging. A possible solution is to use an intelligent hierarchical (multi-tier) storage system (HSS). An HSS is a meta solution that consists of different storage frameworks organized as a jointly constructed storage pool. A built-in data migration policy that determines the optimal placement of the datasets in the hierarchy is essential. Placement decisions are a non-trivial task since they should be made according to the characteristics of the dataset, the tier status in a hierarchy, and access patterns. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL).

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Cloud computing, Storage management, Reinforcement learning, Big Data, Data engineering
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-525454 (URN); 10.1109/ICDE55515.2023.00361 (DOI); 979-8-3503-2227-9 (ISBN); 979-8-3503-2228-6 (ISBN)
Conference
2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, California, 3-7 April, 2023
Funder
Swedish Foundation for Strategic Research, BD15-0008
Available from: 2024-03-22 Created: 2024-03-22 Last updated: 2024-03-22 Bibliographically approved
Zhang, T., Ju, L., Singh, P. & Toor, S. InfoHier: Hierarchical Information Extraction via Encoding and Embedding.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Analyzing large-scale datasets, especially those involving complex and high-dimensional data like images, is particularly challenging. While self-supervised learning (SSL) has proven effective for learning representations from unlabeled data, it typically focuses on flat, non-hierarchical structures, missing the multi-level relationships present in many real-world datasets. Hierarchical clustering (HC) can uncover these relationships by organizing data into a tree-like structure, but it often relies on rigid similarity metrics that struggle to capture the complexity of diverse data types. To address these limitations, we envision InfoHier, a framework that combines SSL with HC to jointly learn robust latent representations and hierarchical structures. This approach leverages SSL to provide adaptive representations, enhancing HC's ability to capture complex patterns. Simultaneously, it integrates HC loss to refine SSL training, resulting in representations that are more attuned to the underlying information hierarchy. InfoHier has the potential to improve the expressiveness and performance of both clustering and representation learning, offering significant benefits for data analysis, management, and information retrieval.

Keywords
Hierarchical Representation, Hierarchical Clustering, Self-Supervised Learning, Joint Learning, Information Retrieval
National Category
Computer Sciences Information Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544717 (URN)
Available from: 2024-12-08 Created: 2024-12-08 Last updated: 2024-12-18 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-9983-3755
