Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms
2014 (English)In: Parallel Computing, ISSN 0167-8191, no 7, 271-288 p.Article in journal (Refereed) Published
The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.
Place, publisher, year, edition, pages
Elsevier, 2014. no 7, 271-288 p.
Fine-grained parallelism, Scalability, Load-balancing, Load-balance optimization, Auto-tuning
Research subject Computing Science
IdentifiersURN: urn:nbn:se:umu:diva-79742DOI: 10.1016/j.parco.2014.04.003ISI: 000339598400010OAI: oai:DiVA.org:umu-79742DiVA: diva2:644471
7th International Workshop on Parallel Matrix Algorithms and Applications, London, June 28-30, 2012
Volume: 40 Issue: 7 Pages: 271-288 Special Issue: SI2013-08-302013-08-302014-08-28Bibliographically approved