Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems
KTH, School of Information and Communication Technology (ICT), Electronic Systems.
KTH, School of Information and Communication Technology (ICT), Electronic Systems.
KTH, School of Information and Communication Technology (ICT), Electronic Systems. ORCID iD: 0000-0003-0061-3475
KTH, School of Information and Communication Technology (ICT), Electronic Systems.
2011 (English). In: Proceedings of 16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC) 2011, IEEE Press, 2011, 154-159 p. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations to ensure the expected behavior of multi-core systems. Both consistency models are realized in NoC-based multi-core systems. The performance of the two consistency models is compared for various network sizes using regular mesh topologies and a deflection routing algorithm. The results show that weak consistency improves performance by 46.17% and 33.76% on average in the code and consistency latencies, respectively, over the sequential consistency model, due to relaxation in the program order, as the system grows from a single core to 64 cores.

Place, publisher, year, edition, pages
IEEE Press, 2011. 154-159 p.
Keyword [en]
NoC, consistency scalability, distributed shared memory, lock position, mesh topology, multicore architecture, network-on-chip, synchronization latency, torus topology, transaction counter, distributed shared memory systems, multiprocessor interconnection networks, network-on-chip, parallel architectures, performance evaluation, synchronisation, transaction processing
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:kth:diva-62136
DOI: 10.1109/ASPDAC.2011.5722176
ISI: 000299427300043
Scopus ID: 2-s2.0-79952920046
OAI: oai:DiVA.org:kth-62136
DiVA: diva2:479886
Conference
16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC 2011)
Projects
Mosart
Note

© 2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Qc 20120201

Available from: 2012-02-01 Created: 2012-01-18 Last updated: 2013-02-04 Bibliographically approved
In thesis
1. Architecture Support and Scalability Analysis of Memory Consistency Models in Network-on-Chip based Systems
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Shared memory systems should support parallelization at the computation (multi-core), communication (Network-on-Chip, NoC) and memory architecture levels to exploit the potential performance benefits. Such parallel systems, which support the shared memory abstraction in both the general-purpose and application-specific domains, face the critical issue of memory consistency. The memory consistency issue arises from unconstrained memory operations, which lead to unexpected behavior of shared memory systems. Memory consistency models enforce ordering constraints on the memory operations to ensure the expected behavior of shared memory systems. The intuitive Sequential Consistency (SC) model enforces strict ordering constraints on the memory operations and does not take advantage of system optimizations in either hardware or software. Alternatively, the relaxed memory consistency models relax the ordering constraints on the memory operations and exploit these optimizations to enhance system performance at a reasonable cost. The purpose of this thesis is twofold. First, novel architecture support is provided for different memory consistency models, namely SC, Total Store Ordering (TSO), Partial Store Ordering (PSO), Weak Consistency (WC), Release Consistency (RC) and Protected Release Consistency (PRC), in NoC-based multi-core (McNoC) systems. The PRC model is proposed as an extension of the RC model which provides additional reordering and relaxation in the memory operations. Second, a scalability analysis of these memory consistency models is performed in the McNoC systems.

The architecture support for these different memory consistency models is provided in the McNoC platforms. Each configurable McNoC platform uses a packet-switched 2-D mesh NoC with a deflection routing policy, distributed shared memory (DSM), distributed locks and a customized processor interface. The memory consistency models/protocols are implemented in the customized processor interfaces, which are developed to integrate the processors with the rest of the system. The realization schemes for the memory consistency models are based on novel transaction counter and address stack-based approaches. The transaction counter is used in each node of the network to keep track of the outstanding memory operations issued by a processor in the system. The address stack is used in each node of the network to keep track of the addresses of the outstanding memory operations issued by a processor in the system. These hardware structures are used in the processor interface to enforce the required global orders under these different memory consistency models. The realization scheme of the PRC model additionally uses an acquire counter to further classify the data operations as unprotected and protected operations.
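
The role of the transaction counter can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the hardware described in the thesis: the names `TransactionCounter` and `may_issue`, and the exact issue rules for SC and WC, are hypothetical and chosen for illustration. It assumes that SC lets an operation issue only when no earlier operation is outstanding, while WC lets data operations overlap and makes only synchronization operations wait.

```python
from dataclasses import dataclass


@dataclass
class TransactionCounter:
    """Counts memory operations that were issued but not yet acknowledged."""
    outstanding: int = 0

    def issue(self) -> None:
        # A memory operation leaves the processor interface.
        self.outstanding += 1

    def acknowledge(self) -> None:
        # The acknowledgement for an earlier operation has returned.
        if self.outstanding == 0:
            raise RuntimeError("acknowledgement without an outstanding operation")
        self.outstanding -= 1


def may_issue(model: str, is_sync: bool, counter: TransactionCounter,
              sync_pending: bool = False) -> bool:
    """Hypothetical issue rule (an assumption, not taken from the thesis).

    "SC": every operation waits until no earlier operation is outstanding.
    "WC": data operations may overlap; a synchronization operation waits for
          all earlier operations, and nothing issues while one is pending.
    """
    if model == "SC":
        return counter.outstanding == 0
    if model == "WC":
        if sync_pending:
            return False
        return counter.outstanding == 0 if is_sync else True
    raise ValueError(f"unknown model: {model}")
```

With one data write in flight (`counter.issue()` called once), `may_issue("SC", False, counter)` is false while `may_issue("WC", False, counter)` is true, which is the relaxation in program order that the abstract credits for the performance gain. A synchronization operation under WC still has to wait until the counter returns to zero.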

The scalability analysis of these different memory consistency models is performed on the basis of different workloads, which are developed and mapped onto networks of various sizes. The scalability study is conducted in McNoC systems with 1 to 64 cores, with various applications using different problem sizes and traffic patterns. Performance metrics such as execution time, performance, speedup, overhead and efficiency are evaluated as a function of the network size. The experiments are conducted with both synthetic and application workloads. The experimental results under different application workloads show that the average execution time under the relaxed memory consistency models decreases relative to the SC model. The specific numbers are highly sensitive to the application and depend on how well it matches the architecture. This study shows that the performance improvement under the relaxed memory consistency models over the SC model depends on the computation-to-communication ratio, traffic patterns, data-to-synchronization ratio and the problem size. The performance improvement of the PRC and RC models over the SC model tends to be higher than 50%, as observed in the experiments, when the system is further scaled up.
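
The metrics named above can be made concrete with their common textbook definitions. This is a sketch under assumptions: the abstract does not state how the thesis defines speedup, efficiency or improvement, so the formulas below are the usual ones (speedup as single-core time over n-core time, efficiency as speedup per core, improvement as relative reduction in execution time), and the function names are hypothetical.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup S(n) = T(1) / T(n) (common definition, assumed here)."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, cores: int) -> float:
    """Efficiency E(n) = S(n) / n (common definition, assumed here)."""
    return speedup(t_single, t_parallel) / cores


def improvement_over_sc(t_sc: float, t_relaxed: float) -> float:
    """Relative reduction in execution time of a relaxed model versus SC."""
    return (t_sc - t_relaxed) / t_sc
```

For example, with made-up execution times that are not results from the thesis, `speedup(1000, 125)` gives 8.0, `efficiency(1000, 125, 16)` gives 0.5, and `improvement_over_sc(200, 100)` gives 0.5, i.e. a 50% improvement.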

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2013. xviii, 143 p.
Series
Trita-ICT-ECS AVH, ISSN 1653-6363 ; 12:11
Keyword
Memory consistency, Protected release consistency, Distributed shared memory, Network-on-Chip, Scalability
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-117700
ISBN: 978-91-7501-617-7
Public defence
2013-03-13, Sal E, Forum, Isafjordsgatan 39, Kista, 09:00 (English)
Opponent
Supervisors
Note

QC 20130204

Available from: 2013-02-04 Created: 2013-02-02 Last updated: 2013-02-04 Bibliographically approved

Open Access in DiVA

ASP-DAC Abdul Naeem (253 kB), 279 downloads
File information
File name: FULLTEXT01.pdf
File size: 253 kB
Checksum (SHA-512): 37f9acab3f7237e4b865850e8a774af9b34c3ca3580c43a99e9cb8b05144bc2ce6845a8d3bd625565fd038f0aa060020d6501c435c6a3581bfca548147cac154
Type: fulltext
Mimetype: application/pdf

By author/editor
Naeem, Abdul; Chen, Xiaowen; Lu, Zhonghai; Jantsch, Axel
By organisation
Electronic Systems
Embedded Systems
