Digitala Vetenskapliga Arkivet

Advancements towards non-speculative concurrent execution of critical sections
University of Murcia.
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Avances hacia la ejecución concurrente y no especulativa de secciones críticas (Spanish)
Description
Abstract [en]

Parallel programs require, besides cache orchestration, another mechanism that guarantees synchronization among the threads of the same program. These synchronization mechanisms induce overheads, for example by slowing down certain operations and stalling threads, in order to comply with the requirements established by the programmer.

The objective of this thesis is the efficient execution of critical sections, that is, regions of code that must be executed atomically. The most efficient method is the concurrent and non-speculative execution of these sections. To achieve this, we present the three steps we have taken:
1) Single-address atomic instructions can be used to implement non-speculative critical sections; therefore, we develop an updated version of the well-known Splash benchmark suite that uses single-address atomic instructions to implement most of its critical sections (Splash-4).
2) A new set of multi-address atomic instructions, together with a methodology for implementing them efficiently, that can be used for small critical sections (MADs).
3) A more generic method that, without the direct intervention of the programmer, limits the retries required to execute contended critical regions (CLEAR).

For an efficient evaluation of the results, we have used the most up-to-date tools available in each case and, when possible, real machines instead of simulations. For the simulations, we have used the gem5 simulator, always performing multiple runs. The simulator has been configured to emulate, as faithfully as possible, processors based on the latest Intel generations.

In our first step, Splash-4, we reduce execution time on 64 cores by 50% while maintaining the original structure and algorithms. In the second step (MADs), the new atomic instructions reduce execution time by 80% compared to the classical lock mechanism, and by 60% compared to a transactional memory technique similar to Intel TSX, while adding only 68 bytes per core. Finally, CLEAR limits the number of re-executions of critical sections executed under speculative methods, increasing by 35% the number of sections that complete on the first retry and reducing from 37% to 15% the number of sections that need to reach the fallback path. Overall, this improves execution time by 35% against an Intel TSX implementation and by 23% against PowerTM.
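The retry and fallback behaviour mentioned for speculative methods can be illustrated with a small software analogy. The sketch below is not CLEAR and not hardware transactional memory: it only assumes an optimistic compare-and-swap update that is attempted a bounded number of times before taking a serializing fallback lock, and it counts how many updates finish on the first attempt, on a later attempt, or in the fallback. All names (`SharedCounter`, `RetryResult`, `run_bounded_updates`, `max_attempts`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration type: a counter updated optimistically.
struct SharedCounter {
    std::atomic<long> value{0};
    std::atomic<long> first_attempt{0};  // updates that succeeded immediately
    std::atomic<long> later_attempt{0};  // updates that needed a bounded retry
    std::atomic<long> fallback{0};       // updates that exhausted their attempts
    std::mutex fallback_lock;

    void add(long delta, int max_attempts) {
        long seen = value.load();
        for (int attempt = 1; attempt <= max_attempts; ++attempt) {
            // On failure, compare_exchange_strong reloads `seen` with the current value.
            if (value.compare_exchange_strong(seen, seen + delta)) {
                if (attempt == 1) first_attempt.fetch_add(1);
                else later_attempt.fetch_add(1);
                return;
            }
        }
        // Fallback: threads that ran out of attempts are serialized among
        // themselves. Optimistic threads may still interfere, so the commit
        // remains a compare-and-swap loop.
        std::lock_guard<std::mutex> guard(fallback_lock);
        fallback.fetch_add(1);
        while (!value.compare_exchange_strong(seen, seen + delta)) {}
    }
};

struct RetryResult {
    long value;
    long first_attempt;
    long later_attempt;
    long fallback;
};

// Runs `num_threads` workers, each adding 1 to the counter `ops_per_thread` times.
RetryResult run_bounded_updates(int num_threads, int ops_per_thread, int max_attempts) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < ops_per_thread; ++i) counter.add(1, max_attempts);
        });
    }
    for (auto& w : workers) w.join();
    return RetryResult{counter.value.load(), counter.first_attempt.load(),
                       counter.later_attempt.load(), counter.fallback.load()};
}
```

Calling `run_bounded_updates(4, 5000, 2)` and comparing the three counters gives a rough feel for the kind of breakdown the abstract reports; the actual proportions depend on the machine and scheduling.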

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2025, p. 74
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2520
Keywords [en]
Computer Architecture, microarchitecture, atomic instructions, benchmark suite, non-speculative execution.
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-552947
ISBN: 978-91-513-2437-1 (print)
OAI: oai:DiVA.org:uu-552947
DiVA, id: diva2:1945956
Public defence
2025-06-03, Salón de Grados, Facultad de Informatica (Building 32), University of Murcia, Murcia (Spain), 16:00 (English)
Available from: 2025-04-29 Created: 2025-03-19 Last updated: 2025-04-29
List of papers
1. Splash-4: A Modern Benchmark Suite with Lock-Free Constructs
2022 (English). In: 2022 IEEE International Symposium on Workload Characterization (IISWC), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 51-64. Conference paper, Published paper (Refereed)
Abstract [en]

The cornerstone for the performance evaluation of computer systems is the benchmark suite. Among the many benchmark suites used in high-performance computing and multicore research, Splash-2 has been instrumental in advancing knowledge for both academia and industry. Published in 1995 and with over 5276 citations and counting, this benchmark suite is still in use to evaluate novel architectural proposals. More recently, the Splash-3 suite eliminated important performance bugs, data races, and improper synchronization that plagued the Splash-2 benchmarks after the formal definition of the C memory model.

However, keeping up with architectural changes while maintaining the same workloads and algorithms (for comparative purposes) is a real challenge. Benchmark suites can misrepresent the performance characteristics of a computer system if they do not reflect the available features of the hardware and architects may end up overestimating the impact of proposed techniques or underestimating others.

In this work we introduce a revised version of Splash-3, designated Splash-4, that introduces modern programming techniques to improve scalability on contemporary hardware. We then characterize Splash-3 and Splash-4 on a state-of-the-art simulated architecture, Intel's Ice Lake with the gem5-20 simulator, as well as on a real contemporary processor (AMD's EPYC 7002 series). Our evaluation shows that for a 64-thread execution Splash-4 reduces the normalized execution time by an average of 52% and 34% for AMD's EPYC and Intel's Ice Lake, respectively.
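As a minimal sketch of the kind of lock-free construct the abstract refers to, the example below replaces a mutex-protected counter increment with a single-address atomic `fetch_add`. It is not taken from Splash-4; the function names and the counter workload are assumptions chosen only to show the transformation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Lock-based version: every increment is a critical section guarded by one mutex.
long count_with_lock(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex counter_lock;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(counter_lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// Lock-free version: the same critical section expressed as one atomic
// read-modify-write on a single address.
long count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering suffices here because only the final sum is read,
                // and it is read after all threads have been joined.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```

Both functions return the same total; the difference is that the atomic version never blocks a thread while another holds a lock.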

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
Proceedings of the IEEE International Symposium on Workload Characterization, ISSN 2835-222X, E-ISSN 2835-2238
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-498071 (URN)
10.1109/IISWC55918.2022.00015 (DOI)
000904205700005 ()
978-1-6654-8798-6 (ISBN)
978-1-6654-8799-3 (ISBN)
Conference
IEEE International Symposium on Workload Characterization (IISWC), NOV 06-08, 2022, Austin, TX
Funder
EU, Horizon 2020, 819134
Available from: 2023-03-13 Created: 2023-03-13 Last updated: 2025-03-19 Bibliographically approved
2. Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations
2021 (English). In: Proceedings of 54th Annual IEEE/ACM International Symposium on Microarchitecture, Micro 2021, Association for Computing Machinery (ACM), 2021, p. 337-349. Conference paper, Published paper (Refereed)
Abstract [en]

Critical sections that read, modify, and write (RMW) a small set of addresses are common in parallel applications and concurrent data structures. However, to escape from the intricacies of fine-grained locks, which require reasoning about all possible thread interleavings, programmers often resort to coarse-grained locks to ensure atomicity. This results in atomic protection of a much larger set of potentially conflicting addresses, and, consequently, increased lock contention and unneeded serialization. As many before us have observed, these problems would be solved if only general RMW multi-address atomic operations were available, but current proposals are impractical because of deadlock scenarios that appear due to resource limitations. Alternatively, transactional memory can detect conflicts at run-time aiming to maximize concurrency, but it has significant overheads in highly-contended critical sections. In this work, we propose multi-address atomic operations (MAD atomics). MAD atomics achieve complexity-effective, non-speculative, non-deadlocking, fine-grained locking for multiple addresses, relying solely on the coherence protocol and a predetermined locking order. Unlike prior works, MAD atomics address the challenge of enabling atomic modification over a set of cachelines with arbitrary addresses, simultaneously locking all of them while sidestepping deadlock. MAD atomics only require a small storage per core (around 68 bytes), while significantly outperforming typical lock implementations. Indeed, our evaluation using gem5-20 shows that MAD atomics can improve performance by up to 18x (3.4x, on average, for the applications and concurrent data structures evaluated in this work) over a baseline implemented with locks running on 16 cores. More importantly, the improvement still reaches 2.7x, on average, compared to an Intel hardware transactional memory implementation running on 16 cores.
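The idea of avoiding deadlock through a predetermined locking order can be sketched in software. The example below is only an analogy and does not reproduce the hardware mechanism of MAD atomics, which relies on the coherence protocol: it assumes one mutex per data cell and always acquires the two mutexes in ascending address order. The names `Cell`, `transfer`, and `run_transfers` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical illustration type: one value protected by its own fine-grained lock.
struct Cell {
    std::mutex lock;
    long value = 0;
};

// Atomically moves `amount` from one cell to another (a two-address RMW).
// Locks are always taken in ascending address order, so two threads that
// touch the same pair of cells can never wait on each other in a cycle.
void transfer(Cell& from, Cell& to, long amount) {
    if (&from == &to) return;  // same address: nothing to move, and no double lock
    Cell* first = &from;
    Cell* second = &to;
    if (std::less<Cell*>{}(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> guard_first(first->lock);
    std::lock_guard<std::mutex> guard_second(second->lock);
    from.value -= amount;
    to.value += amount;
}

// Two threads transfer in opposite directions; without a fixed order this
// pattern is the classic recipe for deadlock. Returns the final sum.
long run_transfers(int rounds) {
    Cell a, b;
    a.value = 1000;
    b.value = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.value + b.value;
}
```

Because both threads agree on the acquisition order regardless of the direction of the transfer, the run always terminates and the sum of the two cells is preserved.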

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Series
International Symposium on Microarchitecture Proceedings, ISSN 1072-4451
Keywords
Multi-core architectures, synchronization, critical sections, atomicity, multi-address atomics
National Category
Computer Sciences Computer Engineering Computer Systems
Identifiers
urn:nbn:se:uu:diva-523380 (URN)
10.1145/3466752.3480073 (DOI)
001118047400025 ()
978-1-4503-8557-2 (ISBN)
Conference
54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), OCT 18-22, 2021, ELECTR NETWORK
Funder
EU, Horizon 2020, 819134
Swedish Research Council, 2018-05254
Available from: 2024-02-16 Created: 2024-02-16 Last updated: 2025-03-19 Bibliographically approved
3. Bounding Speculative Execution of Atomic Regions to a Single Retry
(English). Manuscript (preprint) (Other academic)
National Category
Computer and Information Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-552946 (URN)
Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-03-21

Open Access in DiVA

UUThesis_E-J-Gómez-Hernández-2025 (794 kB), 27 downloads
File information
File name: FULLTEXT01.pdf
File size: 794 kB
Checksum SHA-512: db6648cca82170746f5b50cadea013ad2a3a9df74ef427c3bd1238a35acc1a16ce231949025281ec191251f0f6a30b63589825dd935be1fe92fd7b27135efbb7
Type: fulltext
Mimetype: application/pdf
