Efficient techniques for predicting cache sharing and throughput
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART) ORCID iD: 0000-0001-9349-5791
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (UART)
2012 (English). In: Proc. 21st International Conference on Parallel Architectures and Compilation Techniques, New York: ACM Press, 2012, pp. 305–314. Conference paper, published paper (refereed).
Abstract [en]

This work addresses the modeling of shared cache contention in multicore systems and its impact on throughput and bandwidth. We develop two simple and fast cache sharing models for accurately predicting shared cache allocations for random and LRU caches.

To accomplish this we use low-overhead input data that captures the behavior of applications running on real hardware as a function of their shared cache allocation. This data enables us to determine how much and how aggressively data is reused by an application depending on how much shared cache it receives. From this we can model how applications compete for cache space, their aggregate performance (throughput), and bandwidth.

We evaluate our models for two- and four-application workloads in simulation and on modern hardware. On a four-core machine, we demonstrate an average relative fetch ratio error of 6.7% for groups of four applications. We are able to predict workload bandwidth with an average relative error of less than 5.2% and throughput with an average error of less than 1.8%. The model can predict cache size with an average error of 1.3% compared to simulation.
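As a rough illustration of the kind of steady-state reasoning such a sharing model involves (this is a simplified sketch, not the paper's actual model, which is driven by measured per-application cache behavior), one can treat an LRU shared cache as reaching a fixed point where each application's share of the cache is proportional to its rate of insertions, i.e. its miss ratio at its current allocation. The miss-ratio curves below are hypothetical stand-ins:

```python
# Toy fixed-point sketch of two applications competing for a shared
# LRU cache. Assumption (not from the paper): in steady state, each
# application's cache share is proportional to its miss ratio at its
# current allocation, since misses drive insertions into the cache.

def miss_ratio_a(size_mb):
    # Hypothetical miss-ratio curve: high reuse pressure, decays with size.
    return 0.5 / (1.0 + size_mb)

def miss_ratio_b(size_mb):
    # Hypothetical curve for a less cache-sensitive application.
    return 0.2 / (1.0 + 0.25 * size_mb)

def shared_allocation(total_mb=4.0, iters=100):
    a = b = total_mb / 2  # start from an even split
    for _ in range(iters):
        ma, mb = miss_ratio_a(a), miss_ratio_b(b)
        a = total_mb * ma / (ma + mb)  # share ~ fraction of insertions
        b = total_mb - a
    return a, b

a, b = shared_allocation()
print(f"app A: {a:.2f} MB, app B: {b:.2f} MB")
```

With these example curves the iteration converges quickly: the application whose data is reused more aggressively at its current allocation ends up holding the larger share, which is the qualitative effect the paper's models quantify.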

Place, publisher, year, edition, pages
New York: ACM Press, 2012. pp. 305–314.
National Category
Computer Systems
Research subject
Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-178207
DOI: 10.1145/2370816.2370861
ISBN: 978-1-4503-1182-3 (print)
OAI: oai:DiVA.org:uu-178207
DiVA: diva2:559093
Conference
PACT 2012, September 19–23, Minneapolis, MN
Projects
CoDeR-MP, UPMARC
Available from: 2012-10-09. Created: 2012-07-30. Last updated: 2014-04-29. Bibliographically approved.
In thesis
1. Understanding Multicore Performance: Efficient Memory System Modeling and Simulation
2014 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

To increase performance, modern processors employ complex techniques such as out-of-order pipelines and deep cache hierarchies. While the increasing complexity has paid off in performance, it has become harder to accurately predict the effects of hardware/software optimizations in such systems. Traditional microarchitectural simulators typically execute code 10 000×–100 000× slower than native execution, which leads to three problems. First, the high simulation overhead makes it hard to use microarchitectural simulators for tasks such as software optimization, where rapid turnaround is required. Second, when multiple cores share the memory system, the resulting performance is sensitive to how memory accesses from the different cores interleave. This requires that applications be simulated multiple times with different interleavings to estimate their performance distribution, which is rarely feasible with today's simulators. Third, the high overhead limits the size of the applications that can be studied. This is usually addressed by simulating only a relatively small number of instructions near the start of an application, at the risk of reporting unrepresentative results.

In this thesis we demonstrate three strategies for accurately modeling multicore processors without the overhead of traditional simulation. First, we show how microarchitecture-independent memory access profiles can be used to drive automatic cache optimizations and to qualitatively classify an application's last-level cache behavior. Second, we demonstrate how high-level performance profiles, which can be measured on existing hardware, can be used to model the behavior of a shared cache. Unlike previous models, we predict the effective amount of cache available to each application and the resulting performance distribution due to different interleavings without requiring a processor model. Third, in order to model future systems, we build an efficient sampling simulator. By using native execution to fast-forward between samples, we reach new samples much faster than a single sample can be simulated. This enables us to simulate multiple samples in parallel, resulting in almost linear scalability and a maximum simulation rate close to native execution.
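The sampling strategy can be sketched as follows (an illustrative toy, not the thesis's simulator: the real system fast-forwards between sample points at near-native speed using hardware virtualization, then runs a short detailed simulation window at each point). The `detailed_window` function here is a hypothetical stand-in that fabricates a per-sample CPI instead of simulating one:

```python
# Toy sketch of sampled simulation: evenly spaced sample points,
# independent detailed windows run in parallel, results averaged
# into a whole-program estimate.
from concurrent.futures import ThreadPoolExecutor

def detailed_window(sample_start):
    # Stand-in for detailed microarchitectural simulation of one short
    # window; a hypothetical CPI is derived from the sample position
    # instead of being simulated.
    return 1.0 + (sample_start % 7) * 0.1

def sampled_cpi(total_insts=1_000_000, n_samples=10):
    # Evenly spaced sample points across the whole execution.
    starts = [i * total_insts // n_samples for i in range(n_samples)]
    # Samples are independent, so their detailed windows can run in
    # parallel (threads here stand in for parallel simulator instances).
    with ThreadPoolExecutor() as pool:
        cpis = list(pool.map(detailed_window, starts))
    return sum(cpis) / len(cpis)  # whole-program CPI estimate

print(f"estimated CPI: {sampled_cpi():.3f}")
```

Because each sample is reached independently, adding workers shortens wall-clock time almost linearly, which is the scalability property the abstract describes.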

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2014. 54 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1136
Keywords
Computer Architecture, Simulation, Modeling, Sampling, Caches, Memory Systems, gem5, Parallel Simulation, Virtualization, Multicore
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-220652
ISBN: 978-91-554-8922-9
Public defence
2014-05-22, Room 2446, Polacksbacken, Lägerhyddsvägen 2, Uppsala, 09:30 (English)
Projects
CoDeR-MP, UPMARC
Available from: 2014-04-28. Created: 2014-03-18. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

pact2012_sharing.pdf (455 kB)
File information
File name: FULLTEXT02.pdf
File size: 455 kB
Checksum (SHA-512): bb4e333591f8d566af0c6a9a5e378d6a88c6e256f74e7dcdf4a7ff4a255647ee71ba64fa5c09b309f84da01b8f8f0b2a1a799acec211c88b050a618ad2c066f5
Type: fulltext
Mimetype: application/pdf

By author/editor
Andreas Sandberg, David Black-Schaffer, Erik Hagersten
By organisation
Computer Systems