Architectural Support for Reducing Parallel Processing Overhead in an Embedded Multiprocessor
Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
2010 (English). In: EUC '10 Proceedings of the 2010 IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, Washington, DC, USA: IEEE Computer Society, 2010, 47-52 p. Conference paper (Refereed)
Abstract [en]

The host-multi-SIMD chip multiprocessor (CMP) architecture has proven to be efficient for high-performance signal processing: it exploits task-level parallelism through multi-core processing and data-level parallelism through SIMD processors. Unlike the cache-based memory subsystem in most general-purpose processors, this architecture uses on-chip scratchpad memory (SPM) as the processor-local data buffer and lets software explicitly control data movement in the memory hierarchy. This SPM-based solution is more efficient for predictable signal processing in embedded systems, where data access patterns are known at design time. Predictable performance is especially important for real-time signal processing. According to Amdahl's law, the non-parallelizable part of an algorithm has a critical impact on overall performance. Implementing an algorithm on a parallel platform usually introduces control and communication overhead that is not parallelizable. This paper presents architectural support in an embedded multiprocessor platform for reducing this parallel processing overhead. The effectiveness of these architectural features in boosting parallel performance is evaluated with an example implementation of 64x64 complex matrix multiplication. The results show that the parallel processing overhead is reduced from 369% to 28%.
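The Amdahl's law argument in the abstract can be illustrated with a minimal Python sketch. The function names `amdahl_speedup` and `useful_fraction` are hypothetical and do not come from the paper. The abstract also does not say how the 369% and 28% overhead figures are measured, so the second function assumes, purely for illustration, that overhead is expressed as non-parallelizable control and communication time relative to useful compute time.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's law: speedup when only `parallel_fraction` of the
    runtime is spread across `n_units` processing units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def useful_fraction(overhead_ratio: float) -> float:
    """Share of total time spent on useful computation.

    ASSUMPTION (not stated in the abstract): overhead_ratio is
    overhead time divided by useful compute time, so 3.69 means 369%.
    """
    if overhead_ratio < 0.0:
        raise ValueError("overhead_ratio must be non-negative")
    return 1.0 / (1.0 + overhead_ratio)


if __name__ == "__main__":
    # Even a 10% serial part caps the speedup well below the unit count.
    print(amdahl_speedup(0.9, 10))
    # Illustration only, under the assumed overhead definition above.
    print(useful_fraction(3.69), useful_fraction(0.28))
```

Under that assumed definition, an overhead of 369% would leave roughly 21% of the time for useful computation, while 28% would leave roughly 78%. These numbers only illustrate the formula and are not results reported by the paper.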

Place, publisher, year, edition, pages
Washington, DC, USA: IEEE Computer Society, 2010. 47-52 p.
Keyword [en]
Parallel DSP, Multiprocessor, Control overhead, Communication overhead, Matrix multiplication
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:liu:diva-66220
DOI: 10.1109/EUC.2010.17
ISBN: 978-0-7695-4322-2
OAI: oai:DiVA.org:liu-66220
DiVA: diva2:402562
Conference
8th International Conference on Embedded and Ubiquitous Computing (EUC), 2010 IEEE/IFIP, 11-13 December, Hong Kong, China
Projects
ePUMA
Available from: 2011-03-17. Created: 2011-03-08. Last updated: 2011-04-12. Bibliographically approved.

Open Access in DiVA

fulltext (532 kB), 628 downloads
File information
File name: FULLTEXT01.pdf
File size: 532 kB
Checksum (SHA-512): e034624c6684f9c5a39bd52fc521c1d075ef48dcd973f41fa27cca9d091f16dc4fb3f555ec100a7cec4c414e4e98c89c3d6ac24220b57cabd85a4648485d76ce
Type: fulltext
Mimetype: application/pdf

Authors
Wang, Jian; Sohl, Joar; Dake, Liu