Optimizing Hadoop Parameters Based on the Application Resource Consumption
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The interest in analyzing growing amounts of data has encouraged the deployment of large-scale parallel computing frameworks such as Hadoop. Data analytics is a main reason behind the success of distributed systems: data might not fit on a single disk, and processing it can be very time-consuming, which makes analyzing the input in parallel very useful. Hadoop relies on the MapReduce programming paradigm to distribute work among machines, so a good load balance ultimately influences the execution time of such applications.

This thesis introduces a technique for tuning Hadoop by optimizing selected configuration parameters based on the application's CPU utilization. The theories stated and proved in this thesis rely on the premise that the CPUs should be neither over-utilized nor under-utilized; the conclusion takes the form of an equation that expresses the parameter to be optimized in terms of the cluster infrastructure.

Future research on this topic is planned to focus on tuning other Hadoop parameters and on using more accurate tools to analyze cluster performance. It would also be interesting to investigate ways to optimize Hadoop parameters based on other consumption criteria, such as input/output statistics and network traffic.
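The abstract does not reproduce the derived equation, so the following is only a minimal sketch of the underlying idea (keep the CPUs neither idle nor oversubscribed), not the author's method. It assumes a hypothetical rule: measure the average CPU demand of one task in cores, then run as many concurrent tasks per node as fit within a target fraction of that node's cores. The function names `concurrent_tasks_per_node` and `predicted_utilization` and the parameter `target_util` are illustrative assumptions, not part of Hadoop.

```python
import math


def concurrent_tasks_per_node(cores: int, cpu_per_task: float, target_util: float = 1.0) -> int:
    """Hypothetical rule: largest number of concurrent tasks whose combined
    CPU demand fits within target_util of the node's cores.

    cores        -- number of CPU cores on one worker node
    cpu_per_task -- measured average CPU use of one task, in cores
                    (e.g. 0.5 means a task keeps one core busy half the time)
    target_util  -- fraction of total CPU capacity to aim for (1.0 = 100%)
    """
    if cores <= 0 or cpu_per_task <= 0 or not 0 < target_util <= 1:
        raise ValueError("cores and cpu_per_task must be positive; 0 < target_util <= 1")
    # CPU capacity we are willing to use on this node, in cores.
    budget = cores * target_util
    # Floor keeps the aggregate demand within the budget (avoids over-utilization);
    # fall back to one task when a single task alone needs more than the budget.
    return max(1, math.floor(budget / cpu_per_task))


def predicted_utilization(cores: int, cpu_per_task: float, tasks: int) -> float:
    """Fraction of the node's CPU capacity used by `tasks` concurrent tasks, capped at 1.0."""
    return min(1.0, tasks * cpu_per_task / cores)
```

For example, if profiling suggested that a task uses about 0.5 cores on an 8-core node, this rule yields 16 concurrent tasks at full utilization, or 12 at a 75% target. In Hadoop 1.x, a per-node value like this could be applied through a slot setting such as `mapred.tasktracker.map.tasks.maximum`; the abstract does not state whether the thesis tunes that particular property.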

Place, publisher, year, edition, pages
2013.
Series
IT, 13 034
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-200144
OAI: oai:DiVA.org:uu-200144
DiVA: diva2:622285
Educational program
Master Programme in Computer Science
Uppsok
Technology
Available from: 2013-05-21. Created: 2013-05-21. Last updated: 2013-12-03. Bibliographically approved.

Open Access in DiVA

Full text (1319 kB), 2278 downloads
File name: FULLTEXT01.pdf
File size: 1319 kB
Checksum (SHA-512): f67e0e1e28f00fe2b93cc5c7cec1f33eb58c0d3799c257ef19e6b9544a39a1cfe2ce1b73270d064ec09a3dde42bc6514d1aa7bf2cbcb4e0df3550832182904ee
Type: full text. MIME type: application/pdf
