Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Fine-Grained Bulge-Chasing Kernels for Strongly Scalable Parallel QR Algorithms
Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).ORCID iD: 0000-0002-4675-7434
Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
Umeå University, Faculty of Science and Technology, Department of Computing Science. Umeå University, Faculty of Science and Technology, High Performance Computing Center North (HPC2N).
2014 (English)In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, no 7, 271-288 p.Article in journal (Refereed) Published
Abstract [en]

The bulge-chasing kernel in the small-bulge multi-shift QR algorithm for the non-symmetric dense eigenvalue problem becomes a sequential bottleneck when the QR algorithm is run in parallel on a multicore platform with shared memory. The duration of each kernel invocation is short, but the critical path of the QR algorithm contains a long sequence of calls to the bulge-chasing kernel. We study the problem of parallelizing the bulge-chasing kernel itself across a handful of processor cores in order to reduce the execution time of the critical path. We propose and evaluate a sequence of four algorithms with varying degrees of complexity and verify that a pipelined algorithm with a slowly shifting block column distribution of the Hessenberg matrix is superior. The load-balancing problem is non-trivial and computational experiments show that the load-balancing scheme has a large impact on the overall performance. We propose two heuristics for the load-balancing problem and also an effective optimization method based on local search. Numerical experiments show that speed-ups are obtained for problems as small as 40-by-40 on two different multicore architectures.

Place, publisher, year, edition, pages
Elsevier, 2014. no 7, 271-288 p.
Keyword [en]
Fine-grained parallelism, Scalability, Load-balancing, Load-balance optimization, Auto-tuning
National Category
Computer Science
Research subject
Computing Science
Identifiers
URN: urn:nbn:se:umu:diva-79742DOI: 10.1016/j.parco.2014.04.003ISI: 000339598400010OAI: oai:DiVA.org:umu-79742DiVA: diva2:644471
Conference
7th International Workshop on Parallel Matrix Algorithms and Applications, London, June 28-30, 2012
Note

Volume: 40 Issue: 7 Pages: 271-288 Special Issue: SI

Available from: 2013-08-30 Created: 2013-08-30 Last updated: 2017-12-06Bibliographically approved

Open Access in DiVA

PARCO-D-12-00193.pdf(703 kB)264 downloads
File information
File name FULLTEXT01.pdfFile size 703 kBChecksum SHA-512
75e219e892de22965b1cde4fc7605e62992118db65cd2dd2cee87a0b057ffb6776391474e8a5549019a395330d30c736ca627a468a57addc8a6a56b7f86bffff
Type fulltextMimetype application/pdf

Other links

Publisher's full text

Search in DiVA

By author/editor
Karlsson, LarsKågström, BoWadbro, Eddie
By organisation
Department of Computing ScienceHigh Performance Computing Center North (HPC2N)
In the same journal
Parallel Computing
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 264 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 208 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf