Dataflow Implementation of QR Decomposition on a Manycore
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). (EPC Group) ORCID iD: 0000-0001-8652-0098
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES).
Halmstad University, School of Information Technology, Halmstad Embedded and Intelligent Systems Research (EIS), Centre for Research on Embedded Systems (CERES). ORCID iD: 0000-0002-4932-4036
2016 (English). In: MES '16: Proceedings of the Third ACM International Workshop on Many-core Embedded Systems, New York, NY: ACM Press, 2016, 26-30 p. Conference paper (Refereed)
Abstract [en]

While parallel computer architectures have become mainstream, developing applications for them is still challenging. There is a need for new tools, languages and programming models. There is also little published knowledge about how parallel approaches to basic but important operations, such as the QR decomposition of a matrix, perform on current commercial manycore architectures.

This paper evaluates a high-level dataflow language (CAL), a source-to-source compiler (Cal2Many) and three QR decomposition algorithms (Givens Rotations, Householder and Gram-Schmidt). The algorithms are implemented both in CAL and in hand-optimized C, executed on Adapteva's Epiphany manycore architecture and evaluated with respect to performance, scalability and development effort.

In the best case, the CAL implementations (compiled to generated C) are only 2% slower than the hand-written versions. They require on average 25% fewer lines of source code without significantly increasing the binary size. Development effort is reduced and debugging is significantly simplified. The implementations executed on Epiphany cores outperform the GNU Scientific Library on the host ARM processor of the Parallella board by up to 30x. © 2016 Copyright held by the owner/author(s).
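
For readers unfamiliar with the operation named in the abstract, the following is a minimal sequential sketch of QR decomposition by Givens rotations, one of the three algorithms the paper evaluates. It is not the authors' CAL or C code and says nothing about how the work is distributed across Epiphany cores; the function name `givens_qr` and the list-of-lists matrix layout are assumptions made for illustration only.

```python
import math


def givens_qr(a):
    """Factor an m x n matrix (list of rows) as a = q * r.

    q is m x m orthogonal, r is m x n upper triangular.
    Illustrative sequential sketch, not the paper's implementation.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    # q starts as the identity and accumulates the transposed rotations.
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(n):
        # Work upwards from the bottom row, zeroing r[i][j] by
        # rotating the adjacent rows i-1 and i.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries already zero, nothing to rotate
            c, s = x / h, y / h
            # Apply the rotation to rows i-1 and i of r.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Apply the transposed rotation to columns i-1 and i of q,
            # so that q * r stays equal to the original matrix.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r
```

Each rotation touches only two rows, which is one reason this algorithm is often considered for parallel and dataflow-style implementations; the sketch above makes no attempt to exploit that.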

Place, publisher, year, edition, pages
New York, NY: ACM Press, 2016. 26-30 p.
National Category
Embedded Systems
Identifiers
URN: urn:nbn:se:hh:diva-32371
DOI: 10.1145/2934495.2934499
ScopusID: 2-s2.0-84991106778
ISBN: 978-1-4503-4262-9
OAI: oai:DiVA.org:hh-32371
DiVA: diva2:1044642
Conference
MES '16, International Workshop on Many-core Embedded Systems, Seoul, Republic of Korea, June 19, 2016
Projects
ESCHER; HiPEC
Funder
Knowledge Foundation; Swedish Foundation for Strategic Research; ELLIIT - The Linköping‐Lund Initiative on IT and Mobile Communications
Available from: 2016-11-04. Created: 2016-11-04. Last updated: 2016-11-08. Bibliographically approved.

By author/editor
Savas, Süleyman; Raase, Sebastian; Gebrewahid, Essayas; Ul-Abdin, Zain; Nordström, Tomas