Performance Analysis of Cache-Aware Multicore Parallelization with Application to Optimization Theory
In previous work, a cache-aware sparse matrix multiplication for linear programming interior point methods was proposed. The serial implementations achieved speedups ranging from 1.2 to 108.0 over the implementation in GLPK, an open-source linear programming solver. In this work, the same ideas and data structures are used to develop a cache-aware variant of the sparse Cholesky decomposition implemented in GLPK. The serial implementation achieves a speedup of up to 2.5 on the problem set considered. The matrix multiplication and the Cholesky decomposition are analysed using performance counters on both an AMD-based and an Intel-based system. The analysis shows that the applied blocking techniques reduce the number of floating point operations performed, and that for some problems this reduction contributes more to the speedup than the improved cache utilization.
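For orientation, the two kernels named in the abstract can be sketched in a few lines of Python. The abstract does not spell out the matrix product, so the sketch assumes the usual interior point setting, in which the normal-equations matrix S = A D Aᵀ (D diagonal and positive) is formed and then factored as S = L Lᵀ. The function names `normal_matrix` and `cholesky` and the dict-per-row storage are illustrative choices, not the data structures of the thesis or of GLPK, and the code is deliberately unblocked and not cache-aware.

```python
import math

def normal_matrix(rows, d):
    """Form S = A * diag(d) * A^T for a sparse matrix A.

    rows[i] is a dict {column index: value} with the nonzeros of row i
    (a hypothetical storage format chosen for brevity).
    The result is returned as a dense list of lists.
    """
    m = len(rows)
    S = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            # Walk the shorter row and look up matching columns in the other.
            if len(rows[i]) <= len(rows[j]):
                a, b = rows[i], rows[j]
            else:
                a, b = rows[j], rows[i]
            s = sum(v * d[k] * b[k] for k, v in a.items() if k in b)
            S[i][j] = S[j][i] = s  # S is symmetric
    return S

def cholesky(S):
    """Return lower-triangular L with L * L^T = S.

    Dense, column-by-column factorization; assumes S is symmetric
    positive definite (no pivoting, no sparsity exploitation).
    """
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            off = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = off / L[j][j]
    return L

# Small usage example: a 2 x 3 matrix A with four nonzeros.
rows = [{0: 1.0, 2: 2.0}, {1: 3.0, 2: 1.0}]
d = [1.0, 2.0, 0.5]
S = normal_matrix(rows, d)
L = cholesky(S)
```

A blocked, cache-aware variant would reorganize these loops so that the data touched together stays in cache; how the thesis does this is not stated in the abstract and is not reproduced here.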
Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2012. 46 p.
ntnudaim:7758, MTDT datateknikk, Komplekse datasystemer
Identifiers
URN: urn:nbn:no:ntnu:diva-18991
Local ID: ntnudaim:7758
OAI: oai:DiVA.org:ntnu-18991
DiVA: diva2:566384
Amundsen, Jørn Aslak, Associate Professor
Eleyat, Mujahed