Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Identification and tuning of algorithmic parameters in parallel matrix computations: Hessenberg reduction and tensor storage format conversion
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2018 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis considers two problems in numerical linear algebra and high performance computing (HPC): (i) the parallelization of a new blocked Hessenberg reduction algorithm using Parallel Cache Assignment (PCA) and the tunability of its algorithm parameters, and (ii) storing and manipulating dense tensors on shared memory HPC systems.

The Hessenberg reduction appears in the Aggressive Early Deflation (AED) process for identifying converged eigenvalues in the distributed multishift QR algorithm (state-of-the-art algorithm for computing all eigenvalues for dense square matrices). Since the AED process becomes a parallel bottleneck it motivates a further study of AED components. We present a new Hessenberg reduction algorithm based on PCA which is NUMA-aware and targeting relatively small problem sizes on shared memory systems. The tunability of the algorithm parameters are investigated. A simple off-line tuning is presented and the performance of the new Hessenberg reduction algorithm is compared to its counterparts from LAPACK and ScaLAPACK. The new algorithm outperforms LAPACK in all tested cases and outperforms ScaLAPACK in problems smaller than order 1500, which are common problem sizes for AED in the context of the distributed multishift QR algorithm.

We also investigate automatic tuning of the algorithm parameters. The parameters span a huge search space and it is impractical to tune them using standard auto-tuning and optimization techniques. We present a modular auto-tuning framework which applies: search space decomposition, binning, and multi-stage search to enable searching the huge search space efficiently. The framework using these techniques exposes the underlying subproblems which allows using standard auto-tuning methods to tune them. In addition, the framework defines an abstract interface, which combined with its modular design, allows testing various tuning algorithms.

In the last part of the thesis, the focus is on the problem of storing and manipulating dense tensors. Developing open source tensor algorithms and applications is hard due to the lack of open source software for fundamental tensor operations. We present a software library dten, which includes tools for storing dense tensors in shared memory and converting a tensor storage format from one canonical form to another. The library provides two different ways to perform the conversion in parallel, in-place and out-of-place. The conversion involves moving blocks of contiguous data and are done to maximize the size of the blocks to move. In addition, the library supports tensor matricization for one or two tensors at the same time. The latter case is important in preparing tensors for contraction operations. The library is general purpose and highly flexible.

Place, publisher, year, edition, pages
Umeå: Umeå universitet , 2018. , p. 15
Series
Report / UMINF, ISSN 0348-0542 ; UMINF 18.22
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-145345ISBN: 978-91-7601-843-9 (print)OAI: oai:DiVA.org:umu-145345DiVA, id: diva2:1186623
Supervisors
Available from: 2018-03-01 Created: 2018-02-28 Last updated: 2018-06-09Bibliographically approved
List of papers
1. On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment
Open this publication in new window or tab >>On the Tunability of a New Hessenberg Reduction Algorithm Using Parallel Cache Assignment
2017 (English)Conference paper, Published paper (Refereed)
National Category
Computer Sciences Computational Mathematics
Identifiers
urn:nbn:se:umu:diva-145342 (URN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics (PPAM 2017)
Note

Accepted for publishing

Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09
2. An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm
Open this publication in new window or tab >>An auto-tuning framework for a NUMA-aware Hessenberg reduction algorithm
2017 (English)Report (Other academic)
Abstract [en]

The performance of a recently developed Hessenberg reduction algorithm greatly depends on the values chosen for its tunable parameters. The search space is huge combined with other complications makes the problem hard to solve effectively with generic methods and tools. We describe a modular auto-tuning framework in which the underlying optimization algorithm is easy to substitute. The framework exposes sub-problems of standard auto-tuning type for which existing generic methods can be reused. The outputs of concurrently executing sub-tuners are assembled by the framework into a solution to the original problem.

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå university, 2017. p. 14
Series
Report / UMINF, ISSN 0348-0542 ; 17.19
Keywords
Auto-tuning, Tuning framework, Binning, Search space decomposition, Multistage search, Hessenberg reduction, NUMA-aware
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-145297 (URN)
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09Bibliographically approved
3. A library for storing and manipulating dense tensors
Open this publication in new window or tab >>A library for storing and manipulating dense tensors
2016 (English)Report (Other academic)
Abstract [en]

Aiming to build a layered infrastructure for high-performance dense tensor applications, we present a library, called dten, for storing and manipulating dense tensors. The library focuses on storing dense tensors in canonical storage formats and converting between storage formats in parallel. In addition, it supports tensor matricization in different ways. The library is general-purpose and provides a high degree of flexibility.

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå university, 2016. p. 21
Series
Report / UMINF, ISSN 0348-0542 ; 17.22
Keywords
Dense tensors, canonical storage format, tensor matricization, tensor storage format conversion, out-of-place conversion, in-place conversion
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-145341 (URN)
Available from: 2018-02-28 Created: 2018-02-28 Last updated: 2018-06-09Bibliographically approved

Open Access in DiVA

fulltext(149 kB)29 downloads
File information
File name FULLTEXT01.pdfFile size 149 kBChecksum SHA-512
babcbd738e503c09f2ae79ed781b9fb6ed8a4fc85a9956097a3c68465312f67bf69afacc73aacfd9b491981c1cb70ec0636d52bb5b3ea83d63a3edb16bb84aab
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Eljammaly, Mahmoud
By organisation
Department of Computing Science
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 29 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 70 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf