Uppsala University Publications (uu.se)
Evaluating the Importance of Disk-locality for Data Analytics Workloads
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2020 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

On-premises hardware platforms for big data analytics should be designed so that the available resources can be scaled both up and down depending on future needs.

Two of the main components of an analytics cluster are the data storage part and the computational part. Separating these two components yields great value but can come at the price of performance loss if not set up properly.

The objective of this thesis is to examine how much performance is affected when the computational and storage parts are divided into different hardware nodes. To gather data on how well this separation could be done, several tests were conducted on different hardware setups. These tests included real-world workloads run on configurations where storage and computation took place on the same nodes and on configurations where these components were separated. While those tests were done on a smaller scale with only three compute nodes in parallel, tests with similar workloads were also conducted on a larger scale with up to 32 compute nodes.

The tests revealed that separating compute from storage on a smaller scale could be done without any significant performance drawbacks. However, when the computational components grew large enough, bottlenecks in the storage cluster surfaced. While the results on a smaller scale were satisfactory, further improvements could be made for the larger-scale tests.
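
The scaling effect described in the abstract can be illustrated with a toy capacity model. This is only a sketch under stated assumptions, not the method or the measurements of the thesis: the function name `per_node_throughput`, the even split of storage bandwidth, and all numeric values are hypothetical and chosen purely for illustration.

```python
def per_node_throughput(compute_nodes, node_demand_gbps, storage_capacity_gbps):
    """Throughput each compute node receives from a shared storage cluster.

    Assumptions (hypothetical): every node requests the same bandwidth and
    the storage cluster's aggregate bandwidth is split evenly between nodes.
    """
    if compute_nodes < 1:
        raise ValueError("compute_nodes must be at least 1")
    fair_share = storage_capacity_gbps / compute_nodes
    # A node never gets more than it asks for, nor more than its fair share.
    return min(node_demand_gbps, fair_share)


# Hypothetical values, NOT taken from the thesis.
NODE_DEMAND_GBPS = 2.0        # bandwidth each compute node would like to read
STORAGE_CAPACITY_GBPS = 16.0  # aggregate bandwidth of the storage cluster

for nodes in (3, 8, 16, 32):
    achieved = per_node_throughput(nodes, NODE_DEMAND_GBPS, STORAGE_CAPACITY_GBPS)
    print(f"{nodes:>2} compute nodes: {achieved:.2f} GB/s per node")
```

In this simplified model, a small compute cluster is limited only by its own demand, while a sufficiently large one becomes limited by the shared storage cluster, which is the qualitative pattern the abstract reports.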

Place, publisher, year, edition, pages
2020. p. 50
Series
IT ; 20008
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-410212
OAI: oai:DiVA.org:uu-410212
DiVA, id: diva2:1429694
Educational program
Master Programme in Computer Science
Supervisors
Examiners
Available from: 2020-05-20 Created: 2020-05-12 Last updated: 2020-05-20 Bibliographically approved

Open Access in DiVA

fulltext (874 kB), 6 downloads
File information
File name: FULLTEXT01.pdf
File size: 874 kB
Checksum (SHA-512): 79f06adf9d5a8af0a7afc507e83721f499dc2a9936f518f80f2bed7c1d7c84f74af4d70faf11dfaab093657d63c23b2ee621d36985175191c9c94ee4d1a8ee7c
Type: fulltext
Mimetype: application/pdf
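
A downloaded copy of the full text can be checked against the SHA-512 checksum listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_record` and the local file path are assumptions for illustration and are not part of the DiVA record.

```python
import hashlib

# SHA-512 checksum published in the record for FULLTEXT01.pdf.
EXPECTED_SHA512 = (
    "79f06adf9d5a8af0a7afc507e83721f499dc2a9936f518f80f2bed7c1d7c84f7"
    "4af4d70faf11dfaab093657d63c23b2ee621d36985175191c9c94ee4d1a8ee7c"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_SHA512):
    """True if the file's SHA-512 digest equals the published checksum."""
    return sha512_of_file(path) == expected.lower()
```

Assuming the file was saved locally as `FULLTEXT01.pdf`, calling `matches_record("FULLTEXT01.pdf")` returns `True` only if the download is byte-for-byte identical to the file the checksum was computed from.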
