Dataflow Implementation of QR Decomposition on a Manycore
2016 (English)In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, 26-30 p.Conference paper (Refereed)
While parallel computer architectures have become mainstream, application development on them is still challenging. There is a need for new tools, languages and programming models. Additionally, there is a lack of knowledge about the performance of parallel approaches of basic but important operations, such as the QR decomposition of a matrix, on current commercial manycore architectures.
This paper evaluates a high level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and hand-optimized C languages, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, scalability and development effort.
The performance of the CAL (generated C) implementations gets as good as 2\% slower than the hand-written versions. They require an average of 25\% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU scientific library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).
Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016. 26-30 p.
IdentifiersURN: urn:nbn:se:hh:diva-32371DOI: 10.1145/2934495.2934499ScopusID: 2-s2.0-84991106778ISBN: 978-1-4503-4262-9 (print)OAI: oai:DiVA.org:hh-32371DiVA: diva2:1044642
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
FunderKnowledge FoundationSwedish Foundation for Strategic Research ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications