Dynamically Disabling Way-prediction to Reduce Instruction Replay
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART) ORCID iD: 0000-0002-6259-7821
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2018 (English). In: 2018 IEEE 36th International Conference on Computer Design (ICCD), IEEE, 2018, p. 140-143. Conference paper, Published paper (Refereed)
Abstract [en]

Way-predictors have long been used to reduce dynamic cache energy without the performance loss of serial caches. However, they produce variable-latency hits, as incorrect predictions increase load-to-use latency. While the performance impact of these extra cycles has been well studied, the need to replay subsequent instructions in the pipeline due to the increased load latency has been ignored. In this work we show that way-predictors pay a significant performance penalty beyond previously studied effects, due to instruction replays caused by mispredictions. To address this, we propose a solution that learns the confidence of the way prediction and dynamically disables it when it is likely to mispredict and cause replays. This allows us to reduce cache latency (when we can trust the way-prediction) while still avoiding the need to replay instructions in the pipeline (by avoiding way-mispredictions). Standard way-predictors degrade IPC by 6.9% vs. a parallel cache due to 10% of the instructions being replayed (worst case 42.3%). While our solution decreases way-prediction accuracy by turning off the way-predictor in some cases when it would have been correct, it delivers higher performance than a standard way-predictor. Our confidence-based way-predictor degrades IPC by only 4.4% by replaying just 5.6% of the instructions (worst case 16.3%). This reduces the way-predictor's cache energy overhead relative to a serial-access cache from 8.5% to 3.7% on average, and from 33.8% to 9.5% in the worst case.
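
The confidence-gating idea in the abstract can be illustrated with a small sketch. The abstract does not describe the actual mechanism, so everything below is an assumption made for illustration: an MRU-based predictor, one saturating confidence counter per cache set, and the names `ConfidenceGatedWayPredictor`, `simulate`, `threshold`, and `max_conf`. When confidence is below the threshold, the predictor declines to predict and the cache would fall back to a parallel access of all ways, which costs energy but cannot cause a replay.

```python
from dataclasses import dataclass, field


@dataclass
class ConfidenceGatedWayPredictor:
    """Hypothetical MRU way-predictor gated by per-set confidence counters.

    This is an illustrative sketch, not the mechanism from the paper.
    """
    num_sets: int
    num_ways: int
    threshold: int = 2   # assumed: minimum confidence required to predict
    max_conf: int = 3    # assumed: saturation value of the counter
    mru_way: list = field(init=False)
    conf: list = field(init=False)

    def __post_init__(self):
        self.mru_way = [0] * self.num_sets
        self.conf = [0] * self.num_sets

    def predict(self, set_index):
        """Return the predicted way, or None to access all ways in parallel."""
        if self.conf[set_index] >= self.threshold:
            return self.mru_way[set_index]
        return None

    def update(self, set_index, actual_way):
        """Train on the way that actually hit."""
        if not 0 <= actual_way < self.num_ways:
            raise ValueError("way out of range")
        if self.mru_way[set_index] == actual_way:
            # The MRU guess was (or would have been) right: gain confidence.
            self.conf[set_index] = min(self.conf[set_index] + 1, self.max_conf)
        else:
            # The MRU guess was wrong: reset confidence and retrain.
            self.conf[set_index] = 0
            self.mru_way[set_index] = actual_way


def simulate(predictor, accesses):
    """Count outcomes for a sequence of (set_index, hit_way) cache hits."""
    stats = {"predicted_hits": 0, "mispredictions": 0, "parallel": 0}
    for set_index, way in accesses:
        guess = predictor.predict(set_index)
        if guess is None:
            stats["parallel"] += 1        # no replay risk, higher energy
        elif guess == way:
            stats["predicted_hits"] += 1  # single-way access, low energy
        else:
            stats["mispredictions"] += 1  # would trigger instruction replay
        predictor.update(set_index, way)
    return stats


if __name__ == "__main__":
    stable = simulate(ConfidenceGatedWayPredictor(1, 4), [(0, 1)] * 4)
    print(stable)
    alternating = simulate(ConfidenceGatedWayPredictor(1, 4), [(0, 0), (0, 1)] * 5)
    print(alternating)
```

In this sketch, a set whose hits alternate between two ways never builds enough confidence to predict, so it produces no mispredictions at the cost of always using parallel access. That mirrors the trade-off the abstract describes: some correct predictions are given up in exchange for fewer replays.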

Place, publisher, year, edition, pages
IEEE, 2018. p. 140-143
Series
Proceedings IEEE International Conference on Computer Design, ISSN 1063-6404, E-ISSN 2576-6996
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-361215; DOI: 10.1109/ICCD.2018.00029; ISI: 000458293200018; ISBN: 978-1-5386-8477-1 (electronic); OAI: oai:DiVA.org:uu-361215; DiVA id: diva2:1250104
Conference
IEEE 36th International Conference on Computer Design (ICCD), October 7–10, 2018, Orlando, FL, USA
Available from: 2018-09-21 Created: 2018-09-21 Last updated: 2019-05-22 Bibliographically approved
In thesis
1. Leveraging Existing Microarchitectural Structures to Improve First-Level Caching Efficiency
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Low-latency data access is essential for performance. To achieve this, processors use fast first-level caches combined with out-of-order execution, to decrease and hide memory access latency, respectively. While these approaches are effective for performance, they cost significant energy, leading to the development of many techniques that require designers to trade off performance and efficiency.

Way-prediction and filter caches are two of the most common strategies for improving first-level cache energy efficiency while still minimizing latency. Both involve compromises: way-prediction trades off some latency for better energy efficiency, while filter caches trade off some energy efficiency for lower latency. However, these strategies are not mutually exclusive. By borrowing elements from both, and taking into account SRAM memory layout limitations, we propose a novel MRU-L0 cache that mitigates many of their shortcomings while preserving their benefits. Moreover, while first-level caches are tightly integrated into the CPU pipeline, existing work on these techniques largely ignores the impact they have on instruction scheduling. We show that the variable hit latency introduced by way-mispredictions causes instruction replays of load-dependent instruction chains, which hurts performance and efficiency. We study this effect and propose a variable-latency cache-hit instruction scheduler that identifies potential mis-schedulings, reduces instruction replays, reduces the negative performance impact, and further improves cache energy efficiency.

Modern pipelines also employ sophisticated execution strategies to hide memory latency and improve performance. While their primary use is for performance and correctness, they require intermediate storage that can be used as a cache as well. In this work we demonstrate how the store buffer, paired with the memory dependency predictor, can be used to efficiently cache dirty data, and how the physical register file, paired with a value predictor, can be used to efficiently cache clean data. These strategies not only improve both performance and energy, but do so with no additional storage and minimal additional complexity, since they recycle existing CPU structures to detect reuse, memory ordering violations, and mis-speculations.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019. p. 42
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1821
Keywords
Energy Efficient Caching, Memory Architecture, Single Thread Performance, First-Level Caching, Out-of-Order Pipelines, Instruction Scheduling, Filter-Cache, Way-Prediction, Value-Prediction, Register-Sharing.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:uu:diva-383811; ISBN: 978-91-513-0681-0
Public defence
2019-08-26, Sal VIII, Universitetshuset, Biskopsgatan 3, Uppsala, 09:00 (English)
Available from: 2019-06-11 Created: 2019-05-22 Last updated: 2019-08-23

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text: https://www.iccd-conf.com/Home.html

Search in DiVA

By author/editor
Alves, Ricardo; Kaxiras, Stefanos; Black-Schaffer, David