Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor
Linköping University, Department of Electrical Engineering, Computer Engineering. Linköping University, The Institute of Technology.
2014 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Physical scaling under Moore’s law has saturated while the demand for computing keeps growing. Advances in silicon technology now yield little beyond a smaller silicon area, and speed-power scaling has almost stopped in the last two years. This calls for new parallel computing architectures and new parallel programming methods.

Traditional ASIC (Application-Specific Integrated Circuit) hardware has been used to accelerate Digital Signal Processing (DSP) subsystems on SoCs (Systems-on-Chip). As embedded systems become more complicated, more functions, applications, and features must be integrated into a single ASIC chip to keep up with market requirements. At the same time, the product lifetime of an ASIC-based SoC has shrunk considerably because of the dynamic market. The design lifetime of a typical main chip in a mobile phone based on ASIC acceleration is about half a year, and its NRE (Non-Recurring Engineering) cost can far exceed US$50 million.

The current situation calls for an alternative to ASIC. An ASIP (Application-Specific Instruction-set Processor) offers power consumption and silicon cost comparable to those of an ASIC. Its greatest advantage is functional flexibility within a predefined application domain. An ASIP-based SoC allows software upgrades without hardware changes, so its product lifetime can be 5-10 times longer than that of an ASIC-based SoC.

This dissertation presents an ASIP-based SoC, a new unified parallel DSP subsystem named ePUMA (embedded Parallel DSP Platform with Unique Memory Access), which targets embedded signal processing in communication and multimedia applications. The unified DSP subsystem can further reduce the hardware cost of embedded SoC processors, especially the memory cost, and, most importantly, provide full programmability for a wide range of DSP applications. The ePUMA processor is based on a heterogeneous master-slave multi-core architecture: one master core performs central control, and multiple Single Instruction Multiple Data (SIMD) coprocessors work in parallel to supply most of the computing power.
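
The master-slave division of labour described above can be pictured with a small software model. This is only an illustrative sketch, not the ePUMA programming interface: the class `SimdCoprocessor`, the function `master_run`, and the kernel `sum_of_squares` are hypothetical names, and the slaves run one after another here, whereas the hardware coprocessors would work in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class SimdCoprocessor:
    """Hypothetical slave core with a private local data store."""
    local: list = field(default_factory=list)

    def load(self, block):
        # Data is placed explicitly by the master, not fetched on demand.
        self.local = list(block)

    def run(self, kernel):
        # Execute one kernel on the locally held block.
        return kernel(self.local)


def master_run(data, n_slaves, kernel):
    """Master core: partition the data, give one block to each slave, gather results."""
    slaves = [SimdCoprocessor() for _ in range(n_slaves)]
    size = -(-len(data) // n_slaves)  # ceiling division: block length per slave
    results = []
    for i, slave in enumerate(slaves):
        slave.load(data[i * size:(i + 1) * size])
        results.append(slave.run(kernel))
    return results


def sum_of_squares(block):
    """Example kernel (assumed for illustration): energy of one block."""
    return sum(x * x for x in block)


partial = master_run(list(range(8)), n_slaves=4, kernel=sum_of_squares)
total = sum(partial)
```

In this model the master only partitions, dispatches, and combines, while all arithmetic happens inside the kernels run by the slaves.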

The focus and main contribution of this thesis is the memory subsystem design of ePUMA. The multi-core system uses a distributed memory architecture based on scratchpad memories and software-controlled data movement, which suits the data access properties of streaming applications and the kernel-based multi-core computing model. The essential techniques are the conflict-free access parallel memory architecture, the multi-layer interconnection network, the non-address stream data transfer, the transitioned memory buffers, and the lookup-table-based parallel memory addressing. The design goals are to minimize hardware cost, simplify the software protocol for inter-processor communication, and increase arithmetic computing efficiency.
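
To make the idea of conflict-free parallel memory access concrete, the sketch below models a memory split into four banks that a four-lane SIMD unit reads in one step. It is a generic textbook illustration and not the addressing scheme of the thesis: the bank count, the skewed mapping, and the names `linear_bank`, `skewed_bank`, `build_lut`, and `is_conflict_free` are all assumptions. The lookup table here simply holds one precomputed (bank, offset) pair per lane for a given access pattern.

```python
N_BANKS = 4  # assumed: one memory bank per SIMD lane


def linear_bank(row, col, width=N_BANKS):
    """Plain interleaving: the bank is the row-major address modulo the bank count."""
    return (row * width + col) % N_BANKS


def skewed_bank(row, col):
    """Skewed interleaving: each row is rotated by its row index."""
    return (row + col) % N_BANKS


def build_lut(coords, bank_of):
    """Precompute one (bank, offset) entry per lane for an access pattern."""
    return [(bank_of(r, c), r) for r, c in coords]


def is_conflict_free(lut):
    """An access is conflict free when every lane targets a different bank."""
    banks = [bank for bank, _ in lut]
    return len(set(banks)) == len(banks)


# Two access patterns over a 4x4 matrix: the first row and the first column.
row0 = [(0, c) for c in range(N_BANKS)]
column0 = [(r, 0) for r in range(N_BANKS)]

row_ok_linear = is_conflict_free(build_lut(row0, linear_bank))
col_ok_linear = is_conflict_free(build_lut(column0, linear_bank))
row_ok_skewed = is_conflict_free(build_lut(row0, skewed_bank))
col_ok_skewed = is_conflict_free(build_lut(column0, skewed_bank))
```

With plain interleaving, a row access spreads over all four banks but a column access hits a single bank four times. The skewed mapping makes both patterns conflict free, although other patterns such as the main diagonal still collide under this particular skew, so the mapping has to be chosen for the access patterns an application actually uses.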

We have so far shown through applications that most DSP algorithms, such as filters, vector/matrix operations, transforms, and arithmetic functions, can achieve a computing efficiency above 70% on the ePUMA platform. The non-address stream network provides communication bandwidth equivalent to that of a crossbar interconnection at less than 30% of its implementation cost.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2014. 190 p.
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1532
National Category
Engineering and Technology
URN: urn:nbn:se:liu:diva-105866
DOI: 10.3384/diss.diva-105866
ISBN: 978-91-7519-556-8 (print)
OAI: diva2:711712
Public defence
2014-05-09, Visionen, Hus B, Campus Valla, Linköpings universitet, Linköping, 13:15 (English)
Available from: 2014-04-11 Created: 2014-04-11 Last updated: 2015-02-18 Bibliographically approved

Open Access in DiVA

Low Overhead Memory Subsystem Design for a Multicore Parallel DSP Processor (1806 kB), 772 downloads
File information
File name FULLTEXT01.pdf
File size 1806 kB
Checksum SHA-512
Type fulltext
Mimetype application/pdf
Cover (2773 kB), 12 downloads
File information
File name COVER01.pdf
File size 2773 kB
Checksum SHA-512
Type cover
Mimetype application/pdf

By author/editor
Wang, Jian
Total: 772 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 665 hits