Publications from Uppsala University (uu.se)
Publications (10 of 11)
Zhang, T. (2025). Intelligent Data Management via Machine Learning: From Storage Hierarchy to Information Hierarchy. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise of Big Data has catalyzed numerous advanced data-driven methods, while simultaneously posing significant challenges in data management. This thesis aims to address two fundamental aspects of data management, namely storage management and information extraction, by leveraging machine learning (ML) techniques. In particular, we focus on two research topics: Storage Hierarchy, which explores hierarchical storage management (HSM) in multi-tiered storage systems; and Information Hierarchy, which targets the extraction of intrinsic data hierarchies from raw data.

We begin by introducing the key stages of the data life cycle and their associated challenges in the Big Data era, alongside a review of machine learning foundations and their potential for addressing these challenges. Subsequently, we present the Storage Hierarchy project, which is detailed across Papers I, II, and III. In these works, we develop automated, adaptive, and efficient HSM approaches using reinforcement learning (RL). In Paper I we introduce the HSM-RL framework for managing file-level data migration in hierarchical storage systems (HSS). It leverages RL to optimize file placement and temporal difference learning for real-time adaptability. Paper II extends this work to complex real-world scenarios using scientific datasets, exploring the framework’s flexibility, scalability, and effectiveness. Moving to finer granularity, Paper III presents ReStore, an RL-based page-level data migration approach that incorporates the unique characteristics of modern Solid-State Drives (SSDs), such as read/write asymmetry and parallelism.

The Information Hierarchy project focuses on autonomous extraction of implicit data hierarchies from raw, unlabeled data. In Paper IV, we propose InfoHier, a framework that integrates self-supervised learning (SSL) with hierarchical clustering (HC) to uncover latent data representations and hierarchical structures. By jointly training SSL and HC through a dynamic balancing loss, InfoHier ensures that the HC results align with the intrinsic data hierarchy. This method facilitates meaningful and structured information extraction and retrieval.

Collectively, the Storage Hierarchy and Information Hierarchy projects advance intelligent data management by enabling efficient storage solutions and autonomous information extraction. These contributions lay the foundation for next-generation data management systems, addressing the challenges of Big Data with adaptive and scalable solutions.
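
The abstract mentions temporal difference learning for adapting file placement in real time. As a rough illustration only, the following minimal sketch shows a generic tabular TD(0) value update. The state encoding `(file_id, tier)`, the reward (negative access latency), and the values of `alpha` and `gamma` are assumptions made for this example and are not taken from the thesis.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular TD(0) update to values[state] and return the TD error."""
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical state: (file_id, tier). Hypothetical reward: negative access latency.
values = defaultdict(float)
err = td0_update(values, ("file_a", "hdd"), reward=-10.0, next_state=("file_a", "ssd"))
print(err, values[("file_a", "hdd")])
```

Because each update uses only the latest observed transition, the value estimates can be refreshed after every access, which is the property that makes this family of methods suitable for online adaptation.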

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2025. p. 93
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2483
National category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544718 (URN); 978-91-513-2332-9 (ISBN)
Public defence
2025-02-07, Häggsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Available from: 2025-01-16. Created: 2024-12-08. Last updated: 2025-01-16.
Ju, L., Zhang, T., Toor, S. & Hellander, A. (2024). Accelerating Fair Federated Learning: Adaptive Federated Adam. IEEE Transactions on Machine Learning in Communications and Networking, 2, 1017-1032
2024 (English). In: IEEE Transactions on Machine Learning in Communications and Networking, E-ISSN 2831-316X, Vol. 2, p. 1017-1032. Article in journal (Refereed). Published
Abstract [en]

Federated learning is a distributed and privacy-preserving approach to train a statistical model collaboratively from decentralized data held by different parties. However, when the datasets are not independent and identically distributed, models trained by naive federated algorithms may be biased towards certain participants, and model performance across participants is non-uniform. This is known as the fairness problem in federated learning. In this paper, we formulate fairness-controlled federated learning as a dynamical multi-objective optimization problem to ensure fairness and convergence with theoretical guarantees. To solve the problem efficiently, we study the convergence and bias of Adam as the server optimizer in federated learning, and propose Adaptive Federated Adam (AdaFedAdam) to accelerate fair federated learning with alleviated bias. We validate the effectiveness, Pareto optimality and robustness of AdaFedAdam with numerical experiments and show that AdaFedAdam outperforms existing algorithms, providing better convergence and fairness properties of the federated scheme.
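
To make "Adam as the server optimizer" concrete, here is a minimal sketch of a generic server-side Adam round with uniform client averaging. It is not AdaFedAdam: the fairness-aware adaptation described in the abstract is not reproduced, and the function name, hyperparameters, and toy weights are assumptions for illustration.

```python
import math


def server_adam_step(weights, client_weights, m, v, t,
                     lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One server round: form a pseudo-gradient from client weights, then apply Adam."""
    n = len(client_weights)
    # Pseudo-gradient: global weights minus the uniform mean of the client weights.
    grad = [w - sum(c[i] for c in client_weights) / n for i, w in enumerate(weights)]
    new_w, new_m, new_v = [], [], []
    for w, g, mi, vi in zip(weights, grad, m, v):
        mi = beta1 * mi + (1 - beta1) * g        # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * g * g    # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)            # bias correction for round t >= 1
        v_hat = vi / (1 - beta2 ** t)
        new_w.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_w, new_m, new_v


# Toy round with two hypothetical clients and a two-parameter model.
global_w = [1.0, -1.0]
clients = [[0.5, -0.5], [0.7, -0.9]]
new_w, m, v = server_adam_step(global_w, clients, m=[0.0, 0.0], v=[0.0, 0.0], t=1)
print(new_w)
```

In the first round the bias-corrected step has magnitude close to `lr` in every coordinate, regardless of how large the averaged client change is; this normalisation is one reason the behaviour of Adam on the server differs from plain averaging.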

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National category
Computer Sciences
Research subject
Machine learning
Identifiers
urn:nbn:se:uu:diva-544327 (URN); 10.1109/tmlcn.2024.3423648 (DOI)
Project
eSSENCE - An eScience Collaboration
Available from: 2024-12-03. Created: 2024-12-03. Last updated: 2025-01-07. Bibliographically approved.
Zhang, T. (2024). Autonomous Hierarchical Storage Management via Reinforcement Learning. Paper presented at PhD Workshop, 50th International Conference on Very Large Databases (VLDB 2024), Guangzhou, China, August 26-30, 2024. VLDB, Article ID 6.
2024 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In the present era of big data, the challenges of data management have grown significantly. One crucial aspect is the management of data storage. As data volumes continue to expand, effective storage management becomes increasingly essential. Meanwhile, evolving hardware technologies offer various storage options, ranging from HDDs to SSDs and NVRAMs. To this end, hierarchical (multi-tier) storage systems (HSS) have emerged as a solution, organizing different storage devices hierarchically to provide various storage options. However, managing multiple storage tiers and their data, while optimizing performance and cost-efficiency, is extremely complex. In this paper, we discuss the challenges in the management of hierarchical storage systems. We summarise our previous contributions to tackling these challenges, including the proposal of a reinforcement learning (RL) based data migration policy and the design of an autonomous hierarchical storage management framework, HSM-RL. We also present the applications of HSM-RL in scientific data management to demonstrate its adaptability and scalability. Finally, we conclude our work to date and outline future research plans.

Place, publisher, year, edition, pages
VLDB, 2024
Series
Proceedings of the VLDB Endowment, E-ISSN 2150-8097
Keywords
Hierarchical Storage Management, Reinforcement Learning
National category
Computer Sciences
Research subject
Computer Science with specialization in Database Technology
Identifiers
urn:nbn:se:uu:diva-542280 (URN)
Conference
PhD Workshop, 50th International Conference on Very Large Databases (VLDB 2024), Guangzhou, China, August 26-30, 2024
Project
eSSENCE - An eScience Collaboration
Funder
Stiftelsen för strategisk forskning (SSF), BD15-0008; Swedish National Infrastructure for Computing (SNIC)
Available from: 2024-11-09. Created: 2024-11-09. Last updated: 2025-01-07. Bibliographically approved.
Li, S., Ngai, E. C. H., Ye, F., Ju, L., Zhang, T. & Voigt, T. (2024). Blades: A Unified Benchmark Suite for Byzantine Attacks and Defenses in Federated Learning. In: 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI). Paper presented at 9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, Hong Kong (pp. 158-169). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI), Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 158-169. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning (FL) facilitates distributed training across different IoT and edge devices, safeguarding the privacy of their data. The inherent distributed structure of FL introduces vulnerabilities, especially from adversarial devices aiming to skew local updates to their advantage. Despite the plethora of research focusing on Byzantine-resilient FL, the academic community has yet to establish a comprehensive benchmark suite, pivotal for impartial assessment and comparison of different techniques. This paper presents Blades, a scalable, extensible, and easily configurable benchmark suite that supports researchers and developers in efficiently implementing and validating novel strategies against baseline algorithms in Byzantine-resilient FL. Blades contains built-in implementations of representative attack and defense strategies and offers a user-friendly interface that seamlessly integrates new ideas. Using Blades, we re-evaluate representative attacks and defenses on wide-ranging experimental configurations (approximately 1,500 trials in total). Through our extensive experiments, we gained new insights into FL robustness and highlighted previously overlooked limitations due to the absence of thorough evaluations and comparisons of baselines under various attack settings. We maintain the source code and documents at https://github.com/lishenghui/blades.
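
As a small illustration of what a Byzantine-resilient defense can look like, the sketch below compares plain averaging with coordinate-wise median aggregation, one widely known robust aggregation rule. It does not use the Blades API, and the abstract does not say which defenses Blades implements; the function names and the toy client updates are assumptions for this example.

```python
from statistics import median


def coordinate_median(updates):
    """Aggregate client updates by taking the median of each coordinate."""
    return [median(column) for column in zip(*updates)]


def coordinate_mean(updates):
    """Aggregate client updates by plain averaging (no robustness)."""
    return [sum(column) / len(column) for column in zip(*updates)]


# Three hypothetical honest clients and one adversarial client sending an extreme update.
honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
byzantine = [[100.0, -100.0]]

robust = coordinate_median(honest + byzantine)
naive = coordinate_mean(honest + byzantine)
print(robust, naive)
```

With a single adversarial client out of four, the median stays near the honest updates while the mean is pulled far away, which is the kind of effect a benchmark suite for attacks and defenses is meant to measure systematically.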

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Byzantine attacks, distributed learning, federated learning, IoT, neural networks, robustness
National category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-537577 (URN); 10.1109/IoTDI61053.2024.00018 (DOI); 001261370500014 (); 2-s2.0-85196568437 (Scopus ID); 979-8-3503-7025-6 (ISBN); 979-8-3503-7026-3 (ISBN)
Conference
9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, Hong Kong
Funder
Vetenskapsrådet, 2017-04543
Available from: 2024-09-05. Created: 2024-09-05. Last updated: 2025-02-11. Bibliographically approved.
Zhang, T., Gupta, A., Francisco Rodríguez, M. A., Spjuth, O., Hellander, A. & Toor, S. (2024). Data management of scientific applications in a reinforcement learning-based hierarchical storage system. Expert systems with applications, 237, Article ID 121443.
2024 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, article id 121443. Article in journal (Refereed). Published
Abstract [en]

In many areas of data-driven science, large datasets are generated where the individual data objects are images, matrices, or otherwise have a clear structure. However, these objects can be information-sparse, and a challenge is to efficiently find and work with the most interesting data as early as possible in an analysis pipeline. We have recently proposed a new model for big data management where the internal structure and information of the data are associated with each data object (as opposed to simple metadata). There is then an opportunity for comprehensive data management solutions to account for data-specific internal structure as well as access patterns. In this article, we explore this idea together with our recently proposed hierarchical storage management framework that uses reinforcement learning (RL) for autonomous and dynamic data placement in different tiers in a storage hierarchy. Our case study is based on four scientific datasets: Protein translocation microscopy images, Airfoil angle of attack meshes, 1000 Genomes sequences, and Phenotypic screening images. The presented results highlight that our framework is optimal and can quickly adapt to new data access requirements. Overall, it reduces the data processing time, and the proposed autonomous data placement is superior to any static or semi-static data placement policy.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Data management, Scientific application, Hierarchical storage system, Reinforcement learning, Large scientific datasets
National category
Computer Sciences; Computational Mathematics
Research subject
Computer Science with specialization in Database Technology; Computer Science
Identifiers
urn:nbn:se:uu:diva-513854 (URN); 10.1016/j.eswa.2023.121443 (DOI); 001081909200001 ()
Funder
Stiftelsen för strategisk forskning (SSF), BD15-0008; Swedish National Infrastructure for Computing (SNIC), SNIC 2022/22-835; eSSENCE - An eScience Collaboration
Available from: 2023-10-12. Created: 2023-10-12. Last updated: 2024-12-08. Bibliographically approved.
Zhang, T., Gupta, A., Francisco Rodríguez, M. A., Spjuth, O., Hellander, A. & Toor, S. (2024). Data management of scientific applications in a reinforcement learning-based hierarchical storage system. Expert systems with applications, 237, 121443-121443, Article ID 121443.
2024 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 237, p. 121443-121443, article id 121443. Article in journal (Refereed). Published
National category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-552596 (URN); 10.1016/j.eswa.2023.121443 (DOI)
Funder
Stiftelsen för strategisk forskning (SSF), BD15-0008; Swedish National Infrastructure for Computing (SNIC), SNIC 2022/22-835; eSSENCE - An eScience Collaboration
Available from: 2025-03-17. Created: 2025-03-17. Last updated: 2025-03-17.
Li, S., Ngai, E. C. H., Ye, F., Ju, L., Zhang, T. & Voigt, T. (2024). Demo Abstract: Blades: A Unified Benchmark Suite for Byzantine-Resilient in Federated Learning. In: 9th ACM/IEEE Conference on Internet of Things Design and Implementation, IoTDI 2024. Paper presented at 9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, People's Republic of China (pp. 229-230). IEEE Computer Society
2024 (English). In: 9th ACM/IEEE Conference on Internet of Things Design and Implementation, IoTDI 2024, IEEE Computer Society, 2024, p. 229-230. Conference paper, Published paper (Refereed)
Abstract [en]

Federated learning (FL) facilitates distributed training across different IoT and edge devices, safeguarding the privacy of their data. The inherently distributed nature of FL introduces vulnerabilities, especially from adversarial devices aiming to skew local updates to their desire. Despite the plethora of research focusing on Byzantine-resilient FL, the academic community has yet to establish a comprehensive benchmark suite, pivotal for the assessment and comparison of different techniques. This demonstration presents Blades, a scalable, extensible, and easily configurable benchmark suite that supports researchers and developers in efficiently implementing and validating strategies against baseline algorithms in Byzantine-resilient FL.

Place, publisher, year, edition, pages
IEEE Computer Society, 2024
Keywords
Byzantine attacks, distributed learning, federated learning, IoT, neural networks, robustness
National category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-537570 (URN); 10.1109/IoTDI61053.2024.00030 (DOI); 001261370500026 (); 979-8-3503-7025-6 (ISBN); 979-8-3503-7026-3 (ISBN)
Conference
9th ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), May 13-16, 2024, Hong Kong, People's Republic of China
Available from: 2024-09-05. Created: 2024-09-05. Last updated: 2024-09-05. Bibliographically approved.
Zhang, T., Hellander, A. & Toor, S. (2023). Efficient Hierarchical Storage Management Empowered by Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering, 35, 5780-5793
2023 (English). In: IEEE Transactions on Knowledge and Data Engineering, ISSN 1041-4347, E-ISSN 1558-2191, Vol. 35, p. 5780-5793. Article in journal (Refereed). Published
Abstract [en]

With the rapid development of big data and cloud computing, data management has become increasingly challenging. Over the years, a number of frameworks for data management have become available. Most of them are highly efficient, but ultimately create data silos. It becomes difficult to move and work coherently with data as new requirements emerge. A possible solution is to use an intelligent hierarchical (multi-tier) storage system (HSS). An HSS is a meta solution that consists of different storage frameworks organized as a jointly constructed storage pool. A built-in data migration policy that determines the optimal placement of the datasets in the hierarchy is essential. Placement decisions are a non-trivial task since they should be made according to the characteristics of the dataset, the tier status in a hierarchy, and access patterns. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL). We present a mathematical model, a software architecture, and implementations based on both simulations and a live cloud-based environment. We compare the proposed RL-based strategy to a baseline of three rule-based policies, showing that the RL-based policy achieves significantly higher efficiency and optimal data distribution in different scenarios.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Data Management, Cloud Computing, Hierarchical Storage System, Data Migration, Reinforcement Learning
National category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-490399 (URN); 10.1109/tkde.2022.3176753 (DOI); 000981944600024 ()
Project
eSSENCE - An eScience Collaboration
Funder
Stiftelsen för strategisk forskning (SSF), BD15-0008
Available from: 2022-12-09. Created: 2022-12-09. Last updated: 2024-12-16. Bibliographically approved.
Zhang, T., Hellander, A. & Toor, S. (2023). Efficient Hierarchical Storage Management Empowered by Reinforcement Learning Extended Abstract. Paper presented at 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, California, 3-7 April, 2023 (pp. 3869-3870). IEEE
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

With the rapid development of big data and cloud computing, data management has become increasingly challenging. A possible solution is to use an intelligent hierarchical (multi-tier) storage system (HSS). An HSS is a meta solution that consists of different storage frameworks organized as a jointly constructed storage pool. A built-in data migration policy that determines the optimal placement of the datasets in the hierarchy is essential. Placement decisions are a non-trivial task since they should be made according to the characteristics of the dataset, the tier status in a hierarchy, and access patterns. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL).

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
Cloud computing, Storage management, Reinforcement learning, Big Data, Data engineering
National category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-525454 (URN); 10.1109/ICDE55515.2023.00361 (DOI); 979-8-3503-2227-9 (ISBN); 979-8-3503-2228-6 (ISBN)
Conference
2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, California, 3-7 April, 2023
Funder
Stiftelsen för strategisk forskning (SSF), BD15-0008
Available from: 2024-03-22. Created: 2024-03-22. Last updated: 2024-03-22. Bibliographically approved.
Zhang, T., Ju, L., Singh, P. & Toor, S. InfoHier: Hierarchical Information Extraction via Encoding and Embedding.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Analyzing large-scale datasets, especially involving complex and high-dimensional data like images, is particularly challenging. While self-supervised learning (SSL) has proven effective for learning representations from unlabeled data, it typically focuses on flat, non-hierarchical structures, missing the multi-level relationships present in many real-world datasets. Hierarchical clustering (HC) can uncover these relationships by organizing data into a tree-like structure, but it often relies on rigid similarity metrics that struggle to capture the complexity of diverse data types. To address these limitations, we envision InfoHier, a framework that combines SSL with HC to jointly learn robust latent representations and hierarchical structures. This approach leverages SSL to provide adaptive representations, enhancing HC's ability to capture complex patterns. Simultaneously, it integrates HC loss to refine SSL training, resulting in representations that are more attuned to the underlying information hierarchy. InfoHier has the potential to improve the expressiveness and performance of both clustering and representation learning, offering significant benefits for data analysis, management, and information retrieval.
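
To illustrate how hierarchical clustering organizes embeddings into a tree-like structure, here is a minimal single-linkage agglomerative sketch. It is not InfoHier: there is no encoder, no HC loss, and no joint training. The 2-D points merely stand in for hypothetical SSL embeddings, and the choice of single linkage with Euclidean distance is an assumption for this example.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns the hierarchy as nested tuples of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    trees = {i: i for i in range(len(points))}       # cluster id -> nested tuple tree
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the pair of clusters with the smallest closest-member distance.
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1]


# Hypothetical embeddings: two well-separated groups of two points each.
embeddings = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
tree = agglomerative(embeddings)
print(tree)
```

The quality of the resulting tree depends entirely on the distance between embeddings, which is why the abstract argues for learning the representation and the hierarchy together rather than clustering on a fixed similarity metric.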

Keywords
Hierarchical Representation, Hierarchical Clustering, Self-Supervised Learning, Joint Learning, Information Retrieval
National category
Computer Sciences; Information Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544717 (URN)
Available from: 2024-12-08. Created: 2024-12-08. Last updated: 2024-12-18. Bibliographically approved.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-9983-3755
