Advances Towards Data-Race-Free Cache Coherence Through Data Classification
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Architecture and Computer Communication. (UART)
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Providing a consistent view of the shared memory based on precise and well-defined semantics (the memory consistency model) has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, cache coherence protocols have been employed by the hardware to relieve programmers of the burden of dealing with the memory inconsistency that emerges in the presence of private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times.

In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to obtain the consistent data should they need it later. Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely Total Store Order (TSO), in order to optimize performance for the more common cases.

Moreover, memory models with more relaxed invariants have been proposed based on the observation that more and more software is written in compliance with the Data-Race-Free (DRF) semantics. The semantics of DRF software can be leveraged by the hardware to infer when data in the private caches might be inconsistent. The hardware then ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore removes from the hardware the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This enables several optimizations, such as reduced energy consumption and improved performance and scalability. The efficiency of detecting and discarding the inconsistent data is an important factor affecting the efficiency of such coherence protocols. For instance, discarding data that is still consistent does not affect correctness, but it results in performance loss and increased energy consumption.

In this thesis we show how data classification can be leveraged as an effective tool to simplify cache coherence based on the DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on the DRF semantics.
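
As a rough illustration of the idea in the abstract, the following Python sketch models how a private/shared classification could limit self-invalidation to data that may actually be inconsistent. It is not the hardware mechanism of the thesis: the first-touch rule, the page granularity, the 4096-byte page size, and the names `Classifier` and `self_invalidate` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Classifier:
    """Toy first-touch classifier (assumed policy, page granularity)."""
    page_size: int = 4096                        # assumed page size in bytes
    owner: dict = field(default_factory=dict)    # page -> first core that touched it
    shared: set = field(default_factory=set)     # pages touched by more than one core

    def access(self, core: int, addr: int) -> bool:
        """Record an access; return True if the page is now classified shared."""
        page = addr // self.page_size
        first = self.owner.setdefault(page, core)
        if first != core:
            self.shared.add(page)
        return page in self.shared

    def is_shared(self, addr: int) -> bool:
        return addr // self.page_size in self.shared


def self_invalidate(cache: dict, classifier: Classifier) -> dict:
    """At a synchronization point, drop only the lines on shared pages.

    In this toy model no other core has touched a private page,
    so lines on private pages can safely stay cached.
    """
    return {a: v for a, v in cache.items() if not classifier.is_shared(a)}


clf = Classifier()
clf.access(core=0, addr=0x1000)   # page 1: only core 0 touches it, stays private
clf.access(core=0, addr=0x2000)   # page 2: first touched by core 0
clf.access(core=1, addr=0x2000)   # page 2: a second core arrives, becomes shared

cache0 = {0x1000: "a", 0x2000: "b"}      # contents of core 0's private cache
cache0 = self_invalidate(cache0, clf)    # only the line on the private page survives
```

The point of the sketch is the filter in `self_invalidate`: without classification every cached line would have to be discarded at synchronization, including data that is still consistent.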

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2017. 64 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1521
Keyword [en]
Shared Memory Architectures, Multicore, Memory Hierarchy, Cache Coherence, Data Classification
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:uu:diva-320595; ISBN: 978-91-554-9925-9 (print); OAI: oai:DiVA.org:uu-320595; DiVA: diva2:1090151
Public defence
2017-06-08, 2446, Lägerhyddsvägen 2, Hus 2, Uppsala, 13:15 (English)
Available from: 2017-05-15 Created: 2017-04-22 Last updated: 2017-05-17
List of papers
1. Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies
2015 (English). In: Proc. 21st International Symposium on High Performance Computer Architecture, IEEE Computer Society Digital Library, 2015, p. 186-197. Conference paper, Published paper (Refereed)
Abstract [en]

Hierarchical clustered cache designs are becoming an appealing alternative for multicores. Grouping cores and their caches in clusters reduces network congestion by localizing traffic among several hierarchical levels, potentially enabling much higher scalability. While such architectures can be formed recursively by replicating a base design pattern, keeping the whole hierarchy coherent requires more effort and consideration. The reason is that, in hierarchical coherence, even basic operations must be recursive. As a consequence, intermediate-level caches behave both as directories and as leaf caches. This leads to an explosion of states, protocol races, and protocol complexity. While there have been previous efforts to extend directory-based coherence to hierarchical designs, their increased complexity and verification cost is a serious impediment to their adoption. We aim to address these concerns by encapsulating all hierarchical complexity in a simple function: that of determining when a data block is shared entirely within a cluster (sub-tree of the hierarchy) and is private from the outside. This allows us to eliminate complex recursive operations that span the hierarchy and instead employ simple coherence mechanisms such as self-invalidation and write-through, now restricted to operate within the cluster where a data block is shared. We examine two inclusivity options and discuss the relation of our approach to the recently proposed Hierarchical-Race-Free (HRF) memory models. Finally, comparisons to hierarchical directory-based MOESI, VIPS-M, and TokenCMP protocols show that, despite its simplicity, our approach results in competitive performance and decreased network traffic.
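
The "simple function" mentioned in this abstract can be pictured with a small Python sketch that finds the smallest cluster containing every core that accesses a block. This is only an illustration under stated assumptions, not the paper's hardware design: cores are assumed to be numbered consecutively, each cluster is assumed to group a fixed number of children (`fanout`), and the name `sharing_level` is hypothetical.

```python
def sharing_level(cores, fanout=4):
    """Return the lowest hierarchy level whose cluster contains all accessing cores.

    Level 0 means a single core (the block is private to it); level 1 is a
    cluster of `fanout` cores; level 2 is a cluster of `fanout` such clusters,
    and so on. The regular, fixed-fanout tree is an assumption of this sketch.
    """
    ids = set(cores)
    if not ids:
        raise ValueError("at least one accessing core is required")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = 0
    # Climb the tree until every accessor maps to the same cluster id.
    while len(ids) > 1:
        ids = {i // fanout for i in ids}
        level += 1
    return level


private_level = sharing_level([2])          # one core only: private, level 0
cluster_level = sharing_level([5, 6])       # cores 5 and 6 share the same 4-core cluster
global_level = sharing_level([0, 4])        # cores in different 4-core clusters
```

Under these assumptions, a block at level 1 would only need self-invalidation and write-through inside one 4-core cluster, and it would be treated as private by everything outside that cluster.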

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2015
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-265651 (URN); 10.1109/HPCA.2015.7056032 (DOI); 000380564900016 (); 9781479989300 (ISBN)
Conference
HPCA 2015, February 7–11, Burlingame, CA
Available from: 2015-02-11 Created: 2015-11-02 Last updated: 2017-04-22. Bibliographically approved
2. The effects of granularity and adaptivity on private/shared classification for coherence
2015 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 12, no 3, 26. Article in journal (Refereed) Published
Abstract [en]

Classification of data into private and shared has proven to be a catalyst for techniques to reduce coherence cost, since private data can be taken out of coherence and resources can be concentrated on providing coherence for shared data. In this article, we examine how granularity (page level versus cache-line level) and adaptivity (going from shared to private) affect the outcome of classification and its final impact on coherence. We create a classification technique, called Generational Classification, and a coherence protocol called Generational Coherence, which treats data as private or shared based on cache-line generations. We compare two coherence protocols based on self-invalidation/self-downgrade with respect to data classification. Our findings are enlightening: (i) Some programs benefit from finer granularity, some benefit further from adaptivity, but some do not benefit from either. (ii) Reducing the amount of shared data has no perceptible impact on coherence misses caused by self-invalidation of shared data, hence no impact on performance. (iii) In contrast, classifying more data as private has implications for protocols that employ write-through as a means of self-downgrade, resulting in a network traffic reduction of up to 30% by reducing write-through traffic.

National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-265580 (URN); 10.1145/2790301 (DOI); 000363004100001 ()
Available from: 2015-10-06 Created: 2015-11-02 Last updated: 2017-04-22. Bibliographically approved
3. An efficient, self-contained, on-chip directory: DIR1-SISD
2015 (English). In: Proc. 24th International Conference on Parallel Architectures and Compilation Techniques, IEEE Computer Society, 2015, p. 317-330. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2015
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-265611 (URN); 10.1109/PACT.2015.23 (DOI); 000378942700027 (); 978-1-4673-9524-3 (ISBN)
Conference
PACT 2015, October 18–21, San Francisco, CA
Available from: 2015-11-02 Created: 2015-11-02 Last updated: 2017-04-22. Bibliographically approved
4. Scope-Aware Classification: Taking the hierarchical private/shared data classification to the next level
2017 (English). Report (Other academic)
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2017-008
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-320324 (URN)
Available from: 2017-04-27 Created: 2017-04-19 Last updated: 2017-07-03. Bibliographically approved
5. The best of both works: A hybrid data-race-free cache coherence scheme
2017 (English). In: ACM Transactions on Architecture and Code Optimization (TACO), ISSN 1544-3566, E-ISSN 1544-3973, Vol. 14. Article in journal (Refereed) Accepted
National Category
Computer Systems
Identifiers
urn:nbn:se:uu:diva-320320 (URN)
Available from: 2017-04-19 Created: 2017-04-19 Last updated: 2017-07-03. Bibliographically approved


Author
Davari, Mahdad
