Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Performance Modeling of Multi-core Systems: Caches and Locks
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Computer Systems. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems.
2016 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Performance is an important aspect of computer systems since it directly affects user experience. One way to analyze and predict performance is via performance modeling. In recent years, multi-core systems have made processors more powerful while keeping power consumption relatively low. However the complicated design of these systems makes it difficult to analyze performance. This thesis presents performance modeling techniques for cache performance and synchronization cost on multi-core systems.

A cache can be designed in many ways with different configuration parameters including cache size, associativity and replacement policy. Understanding cache performance under different configurations is useful to explore the design choices. We propose a general modeling framework for estimating the cache miss ratio under different cache configurations, based on the reuse distance distribution. On multi-core systems, each core usually has a private cache. Keeping shared data in private caches coherent has an extra cost. We propose three models to estimate this cost, based on information that can be gathered when running the program on a single core.

Locks are widely used as a synchronization primitive in multi-threaded programs on multi-core systems. While they are often necessary for protecting shared data, they also introduce lock contention, which causes performance issues. We present a model to predict how much contention a lock has on multi-core systems, based on information obtainable from profiling a run on a single core. If lock contention is shown to be a performance bottleneck, one of the ways to mitigate it is to use another lock implementation. However, it is costly to investigate if adopting another lock implementation would reduce lock contention since it requires reimplementation and measurement. We present a model for forecasting lock contention with another lock implementation without replacing the current lock implementation.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2016. , 55 p.
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1336
Keyword [en]
performance modeling, performance analysis, multi-core, cache, lock
National Category
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-271124ISBN: 978-91-554-9451-3 (print)OAI: oai:DiVA.org:uu-271124DiVA: diva2:891196
Public defence
2016-03-07, 2446, ITC, Lägerhyddsvägen 2, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2016-02-19 Created: 2016-01-05 Last updated: 2016-03-09Bibliographically approved
List of papers
1. A Modeling Framework for Reuse Distance-based Estimation of Cache Performance
Open this publication in new window or tab >>A Modeling Framework for Reuse Distance-based Estimation of Cache Performance
2015 (English)In: Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on, IEEE, 2015, 62-71 p.Conference paper, Published paper (Refereed)
Abstract [en]

We develop an analytical modeling framework for efficient prediction of cache miss ratios based on reuse distance distributions. The only input needed for our predictions is the reuse distance distribution of a program execution: previous work has shown that they can be obtained with very small overhead by sampling from native executions. This should be contrasted with previous approaches that base predictions on stack distance distributions, whose collection need significantly larger overhead or additional hardware support. The predictions are based on a uniform modeling framework which can be specialized for a variety of cache replacement policies, including Random, LRU, PLRU, and MRU (aka. bit-PLRU), and for arbitrary values of cache size and cache associativity. We evaluate our modeling framework with the SPEC CPU 2006 benchmark suite over a set of cache configurations with varying cache size, associativity and replacement policy. The introduced inaccuracies were generally below 1% for the model of the policy, and additionally around 2% when set-local reuse distances must be estimated from global reuse distance distributions. The inaccuracy introduced by sampling is significantly smaller.

Place, publisher, year, edition, pages
IEEE: , 2015
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-260767 (URN)10.1109/ISPASS.2015.7095785 (DOI)000380554200007 ()9781479919574 (ISBN)
External cooperation:
Conference
2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS),
Projects
UPMARC
Available from: 2015-08-24 Created: 2015-08-24 Last updated: 2016-09-15Bibliographically approved
2. Modeling cache coherence misses on multicores
Open this publication in new window or tab >>Modeling cache coherence misses on multicores
2014 (English)In: 2014 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), IEEE, 2014, 96-105 p.Conference paper, Published paper (Refereed)
Abstract [en]

While maintaining the coherency of private caches, invalidation-based cache coherence protocols introduce cache coherence misses. We address the problem of predicting the number of cache coherence misses in the private cache of a parallel application when running on a multicore system with an invalidation-based cache coherence protocol. We propose three new performance models (uniform, phased and symmetric) for estimating the number of coherence misses from information about inter-core data sharing patterns and the individual core's data reuse patterns. The inputs to the uniform and phased models are the write frequency and reuse distance distribution of shared data from different cores. This input can be obtained either from profiling the target application on a single core or by analyzing the data access pattern statically, and does not need a detailed simulation of the pattern of interleaving accesses to shared data. The output of the models is an estimated number of coherence misses of the target application. The output can be combined with the number of other kinds of misses to estimate the total number of misses in each core's private cache. This output can also be used to guide program optimization to improve cache performance. We evaluate our models with a set of benchmarks from the PARSEC benchmark suite on real hardware.

Place, publisher, year, edition, pages
IEEE: , 2014
Series
IEEE International Symposium on Performance Analysis of Systems and Software-ISPASS
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:uu:diva-238139 (URN)10.1109/ISPASS.2014.6844465 (DOI)000364102000011 ()978-1-4799-3604-5 (ISBN)
Conference
2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, CA
Projects
UPMARC
Available from: 2014-12-09 Created: 2014-12-09 Last updated: 2016-02-22Bibliographically approved
3. Predicting the Cost of Lock Contention in Parallel Applications on Multicores using Analytic Modeling
Open this publication in new window or tab >>Predicting the Cost of Lock Contention in Parallel Applications on Multicores using Analytic Modeling
2012 (English)In: Proc. 5th Swedish Workshop on Multi-Core Computing, 2012Conference paper, Published paper (Other academic)
National Category
Computer Science
Identifiers
urn:nbn:se:uu:diva-270130 (URN)
Conference
MCC12
Projects
UPMARC
Available from: 2012-11-23 Created: 2015-12-21 Last updated: 2016-02-22Bibliographically approved
4. Forecasting Lock Contention Before Adopting Another Lock Algorithm
Open this publication in new window or tab >>Forecasting Lock Contention Before Adopting Another Lock Algorithm
2015 (English)Report (Other academic)
National Category
Computer Science
Identifiers
urn:nbn:se:uu:diva-270716 (URN)
Available from: 2016-01-03 Created: 2016-01-03 Last updated: 2016-02-22Bibliographically approved

Open Access in DiVA

fulltext(485 kB)240 downloads
File information
File name FULLTEXT01.pdfFile size 485 kBChecksum SHA-512
e641a2145c9ea38bd0a9b0b0dff8607dd1a03fef7ede5f461f2926100bcbd227b34a86353954bc9d7b784072e2c5a173385b744227e0ab7e5a5a07c56bd40036
Type fulltextMimetype application/pdf
Buy this publication >>

Search in DiVA

By author/editor
Pan, Xiaoyue
By organisation
Division of Computer SystemsComputer Systems
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 240 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 20225 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf