Publications from Uppsala University (uu.se)
ReStore: A Reinforcement Learning Approach for Data Migration in Multi-Tiered Storage
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Applied Scientific Computing. ORCID iD: 0000-0001-9983-3755
Boston University. (BU DiSC lab)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Applied Scientific Computing. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Scientific Computing. ORCID iD: 0000-0003-0302-6276
Boston University. (BU DiSC lab)
(English) Manuscript (preprint) (Other academic)
Abstract [en]

With the development of storage technologies, a wide variety of storage devices with differing performance characteristics and cost profiles have emerged. As a result, data systems are increasingly adopting multi-tiered storage solutions. A primary challenge in multi-tiered storage systems is data placement, as data must be dynamically stored and migrated across different storage tiers to optimize overall performance. Effective data migration policies should be able to adapt to workload variations while also considering the unique characteristics of the underlying devices (such as PCIe/SATA SSDs or HDDs), notably their read/write asymmetry and parallelism. In this paper, we introduce ReStore, a reinforcement learning (RL) approach for data migration in multi-tiered storage systems. ReStore leverages RL to capture both workload patterns and device-specific characteristics, including access frequency and recency, as well as device read/write asymmetry and parallelism. Each storage tier uses a different device and is associated with an RL agent that dynamically updates its parameters using temporal difference learning, ensuring continuous adaptability to changing workloads and system states. We experimentally show that ReStore achieves up to 2.2× lower runtime and up to 10× fewer migrations using industry-grade benchmarks such as TPC-C/E and YCSB, real-life traces such as Google Thesios, and a wide variety of synthetic workloads.
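
The abstract mentions per-tier agents updated with temporal difference learning. As a rough illustration only, the sketch below shows a generic TD(0) update for a linear value estimate. It is not ReStore's implementation: the `TierAgent` class, the two-feature state (standing in for access frequency and recency), the reward, and the `alpha`/`gamma` values are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TierAgent:
    """Hypothetical per-tier agent with a linear value estimate V(s) = w . s."""
    n_features: int
    alpha: float = 0.1   # learning rate (assumed value)
    gamma: float = 0.9   # discount factor (assumed value)
    weights: list = field(default_factory=list)

    def __post_init__(self):
        # Start from all-zero weights unless the caller supplied some.
        if not self.weights:
            self.weights = [0.0] * self.n_features

    def value(self, state):
        """Linear value estimate for a feature vector."""
        return sum(w * x for w, x in zip(self.weights, state))

    def td_update(self, state, reward, next_state):
        """Apply one TD(0) step and return the TD error."""
        td_error = reward + self.gamma * self.value(next_state) - self.value(state)
        # For a linear estimate, the gradient with respect to the weights is the state itself.
        self.weights = [w + self.alpha * td_error * x
                        for w, x in zip(self.weights, state)]
        return td_error


agent = TierAgent(n_features=2)
state = [1.0, 0.5]        # illustrative features, e.g. normalized frequency and recency
next_state = [0.8, 0.6]
delta = agent.td_update(state, reward=1.0, next_state=next_state)
```

Because each update uses only the most recent transition, the estimate can keep tracking a workload as it shifts, which is the adaptability property the abstract attributes to temporal difference learning.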

National subject category
Computer Sciences
Research subject
Computer Science with specialization in Database Technology; Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-544716
OAI: oai:DiVA.org:uu-544716
DiVA, id: diva2:1919249
Available from: 2024-12-08 Created: 2024-12-08 Last updated: 2024-12-18 Bibliographically approved
Part of thesis
1. Intelligent Data Management via Machine Learning: From Storage Hierarchy to Information Hierarchy
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise of Big Data has catalyzed numerous advanced data-driven methods, while simultaneously posing significant challenges in data management. This thesis aims to address two fundamental aspects of data management (storage management and information extraction) by leveraging machine learning (ML) techniques. In particular, we focus on two research topics: Storage Hierarchy, which explores hierarchical storage management (HSM) in multi-tiered storage systems; and Information Hierarchy, which targets the extraction of intrinsic data hierarchies from raw data.

We begin by introducing the key stages of the data life cycle and their associated challenges in the Big Data era, alongside a review of machine learning foundations and their potential for addressing these challenges. Subsequently, we present the Storage Hierarchy project, which is detailed across Papers I, II, and III. In these works, we develop automated, adaptive, and efficient HSM approaches using reinforcement learning (RL). In Paper I, we introduce the HSM-RL framework for managing file-level data migration in hierarchical storage systems (HSS). It leverages RL to optimize file placement and temporal difference learning for real-time adaptability. Paper II extends this work to complex real-world scenarios using scientific datasets, exploring the framework's flexibility, scalability, and effectiveness. Moving to finer granularity, Paper III presents ReStore, an RL-based page-level data migration approach that incorporates the unique characteristics of modern Solid-State Drives (SSDs), such as read/write asymmetry and parallelism.
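
To make the role of read/write asymmetry concrete, here is a minimal sketch of how asymmetric device latencies could enter a page-migration decision. It is a hand-written cost estimate, not the learned policy described in the thesis: the `Device` type, the `migration_gain` helper, and all latency figures are hypothetical and chosen only for illustration, and device parallelism is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    read_us: float    # per-page read latency in microseconds (illustrative, not measured)
    write_us: float   # per-page write latency in microseconds (illustrative, not measured)


def access_cost(dev, reads, writes):
    """Total time to serve the given number of page reads and writes on one device."""
    return reads * dev.read_us + writes * dev.write_us


def migration_gain(src, dst, expected_reads, expected_writes):
    """Estimated time saved by moving one page from src to dst.

    The move itself costs one read on src plus one write on dst,
    so a page with few expected accesses may not be worth migrating.
    """
    saved = (access_cost(src, expected_reads, expected_writes)
             - access_cost(dst, expected_reads, expected_writes))
    move_cost = src.read_us + dst.write_us
    return saved - move_cost


fast = Device("fast-ssd", read_us=10.0, write_us=30.0)
slow = Device("slow-ssd", read_us=50.0, write_us=100.0)

hot_gain = migration_gain(slow, fast, expected_reads=10, expected_writes=2)   # 460.0
cold_gain = migration_gain(slow, fast, expected_reads=1, expected_writes=0)   # -40.0
```

With these made-up numbers, promoting the frequently accessed page pays off, while promoting the rarely read page costs more than it saves. Since reads and writes are priced separately, the outcome depends on the read/write mix as well as on the total access count.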

The Information Hierarchy project focuses on autonomous extraction of implicit data hierarchies from raw, unlabeled data. In Paper IV, we propose InfoHier, a framework that integrates self-supervised learning (SSL) with hierarchical clustering (HC) to uncover latent data representations and hierarchical structures. By jointly training SSL and HC through a dynamic balancing loss, InfoHier ensures that the HC results align with the intrinsic data hierarchy. This method facilitates meaningful and structured information extraction and retrieval.

Collectively, the Storage Hierarchy and Information Hierarchy projects advance intelligent data management by enabling efficient storage solutions and autonomous information extraction. These contributions lay the foundation for next-generation data management systems, addressing the challenges of Big Data with adaptive and scalable solutions.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2025. p. 93
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2483
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-544718 (URN)
978-91-513-2332-9 (ISBN)
Public defence
2025-02-07, Häggsalen, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Available from: 2025-01-16 Created: 2024-12-08 Last updated: 2025-01-16

Open Access in DiVA

No full text in DiVA

People

Zhang, Tianru; Toor, Salman
