Application Task and Data Placement in Embedded Multi-core NUMA Architectures: Optimization techniques for the Samsung 16-SRP
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2013 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The evolution of microprocessors has led to a situation where more memory is integrated closer to the computational cores. This has created architectures in which memory latency varies with the location of the accessing core. Such architectures are referred to as Non-Uniform Memory Access (NUMA) architectures. This non-uniformity adds further complexity to the already complex task of developing parallel applications.

In this thesis I investigate effective task and data placement optimization techniques for a Samsung Multi-Processor System-on-Chip (MPSoC) prototype. The research was structured by first conducting a series of extreme-case micro-benchmarks to gain insight into hardware behavior. These insights were then used to optimize two applications from the imaging domain: a 2D image blurring application and a 3D Seeded Region Growing (SRG) application.

The benchmark results show that a wide range of factors matter when optimizing applications for the Samsung 16-SRP architecture. Although NUMA penalties exist, reducing congestion at the memory controllers and in the DMA channels is also important to overall execution time. I propose task and data distribution schemes that work well for benchmarks with static and dynamic workloads. Clustered hierarchical work queues with work stealing have proven to be an effective approach to optimizing applications with a dynamic workload.

For future research it would be interesting to run further micro-benchmarks of the system under congestion. To further verify the task and data distribution schemes suggested in this thesis, it would also be of interest to apply them to more applications.

Place, publisher, year, edition, pages
2013.
Series
UPTEC IT, ISSN 1401-5749 ; 13 006
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-199614
OAI: oai:DiVA.org:uu-199614
DiVA: diva2:620342
Educational program
Master of Science Programme in Information Technology Engineering
Uppsok
Technology
Supervisors
Examiners
Available from: 2013-05-08. Created: 2013-05-08. Last updated: 2013-05-08. Bibliographically approved.

Open Access in DiVA

fulltext (13668 kB), 607 downloads
File information
File name: FULLTEXT01.pdf. File size: 13668 kB. Checksum (SHA-512):
f84f8d65d4ca4e88df3db3dfba012c870ed5d6c61fb57d9109db5ebd01444b63506591f25a9ca009278203f8deefbdd74daf0999485643c30c117552f6e1d55e
Type: fulltext. Mimetype: application/pdf

Total: 607 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 451 hits