  • 201. Hemani, A.
    et al.
    Svantesson, B.
    Ellervee, P.
    Postula, A.
    Öberg, J.
    Jantsch, A.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    High-level Synthesis of Control and Memory Intensive Communication Systems. 1995. Conference paper (Refereed)
  • 202. Hemani, Ahmed
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, Shashi
    Postula, Adam
    Öberg, Johnny
    Millberg, Mikael
    Lindqvist, Dan
    Network on Chip: An architecture for billion transistor era. 2000. In: Proceeding of the IEEE NorChip Conference, 2000. Conference paper (Refereed)
  • 203.
    Hemani, Ahmed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Meincke, Thomas
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, Shashi
    Indian Institute of Technology.
    Postula, Adam
    Department of CSEE, University of Queensland.
    Olsson, Thomas
    Dept. of Applied Electronics, Univ. of Lund.
    Nilsson, Peter
    Dept. of Applied Electronics, Univ. of Lund.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ellervee, Peeter
    KTH, School of Information and Communication Technology (ICT).
    Lindqvist, Dan
    Ericsson Radio Systems AB.
    Lowering power consumption in clock by using globally asynchronous locally synchronous design style. 1999. In: Design Automation Conference, 1999. Proceedings. 36th, 1999, p. 873-878. Conference paper (Refereed)
    Abstract [en]

    Power consumption in clock of large high performance VLSIs can be reduced by adopting globally asynchronous, locally synchronous design style (GALS). GALS has small overheads for the global asynchronous communication and local clock generation. We propose methods to (a) evaluate the benefits of GALS and account for its overheads, which can be used as the basis for partitioning the system into optimal number/size of synchronous blocks, and (b) automate the synthesis of the global asynchronous communication. Three realistic ASICs, ranging in complexity from 1 to 3 million gates, were used to evaluate GALS benefits and overheads. The results show an average power saving of about 70% in clock with negligible overheads.

  • 204.
    Hemani, Ahmed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Deb, Abhijit Kumar
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lindqvist, Dan
    Ericsson Radio Systems AB.
    Fjellborg, Björn
    Ericsson Radio Systems AB.
    System level virtual prototyping of DSP ASICs using grammar based approach. 1999. In: Rapid System Prototyping, 1999. IEEE International Workshop on, 1999, p. 166-171. Conference paper (Refereed)
    Abstract [en]

    DSP systems are often modeled using functional and bit-true level simulators, where it is not possible to validate the system level timing, control and configuration (SLTCC) of the product. In this paper, we present a methodology that adds SLTCC specified in grammar to functional models to create a rate true system level virtual prototype. The methodology is illustrated and benefits are quantified using two realistic examples.

  • 205.
    Hemani, Ahmed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    Deb, Abhijit Kumar
    Lindqvist, Dan
    Fjellborg, Björn
    Virtual prototyping of DSP ASICs using grammar based approach. 1999. In: Radiovetenskap och Kommunikation (RVK 99), 1999. Conference paper (Other academic)
  • 206.
    Henriksson, Tomas
    et al.
    NXP Semiconductors Research.
    Wolf, Pieter van der
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bruce, Alistair
    ARM.
    Network Calculus Applied to Verification of Memory Access Performance in SoCs. 2007. In: Proceedings of the 2007 IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, ESTIMedia 2007, 2007, p. 21-26. Conference paper (Refereed)
    Abstract [en]

    SoCs for multimedia applications typically use only one port to off-chip DRAM for cost reasons. The sharing of interconnect and the off-chip DRAM port by several IP blocks makes the performance of a SoC under design hard to predict. Network calculus defines the concept of flow and has been successfully used to analyse the performance of communication networks. We propose to apply network calculus to the verification of memory access latencies. Two novel network elements, packet stretcher and packet compressor, are used to model the SoC interconnect and DRAM controller. We further extend the flow concept with a degree and make use of the peak characteristics of a flow to tighten the bounds in the analysis. We present a video playback case study and show that the proposed application of network calculus allows us to statically verify that all requirements on memory access latency are fulfilled.
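
    As background for the abstract above, the following is a minimal sketch of the textbook network-calculus bounds, not the paper's packet stretcher or packet compressor model. It assumes a flow constrained by a token-bucket arrival curve alpha(t) = sigma + rho*t and a server offering a rate-latency service curve beta(t) = R*max(0, t - T). The function names and the example numbers are hypothetical.

```python
def delay_bound(sigma, rho, rate, latency):
    """Worst-case delay of a (sigma, rho) token-bucket flow through a
    rate-latency server beta(t) = rate * max(0, t - latency).

    This is the maximum horizontal distance between the arrival curve
    and the service curve: latency + sigma / rate.
    """
    if rho > rate:
        raise ValueError("sustained rate exceeds service rate: delay is unbounded")
    return latency + sigma / rate


def backlog_bound(sigma, rho, rate, latency):
    """Worst-case backlog (buffer need) for the same flow and server.

    This is the maximum vertical distance between the two curves:
    sigma + rho * latency.
    """
    if rho > rate:
        raise ValueError("sustained rate exceeds service rate: backlog is unbounded")
    return sigma + rho * latency


if __name__ == "__main__":
    # Hypothetical numbers: 64-byte burst, 0.5 bytes/cycle sustained rate,
    # server with 2 bytes/cycle and 10 cycles of latency.
    print(delay_bound(64, 0.5, 2, 10))    # 42.0 cycles
    print(backlog_bound(64, 0.5, 2, 10))  # 69.0 bytes
```

    A latency requirement can then be checked statically by comparing `delay_bound(...)` against the deadline of each requesting block, which is the general style of verification the abstract describes.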

  • 207.
    Herrera, Fernando
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Attarzadeh, Hosein
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Towards a Modelling and Design Framework for Mixed-Criticality SoCs and Systems-of-Systems. 2013. In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, IEEE conference proceedings, 2013, p. 989-996. Conference paper (Refereed)
    Abstract [en]

    Mixed-criticality system (MCS) design is an emerging discipline, which has been identified as a core foundational concept in fields such as cyber-physical systems. The hard real-time design community has pioneered the contributions to MCS design, extending scheduling theory to consider mixed criticalities and the impact of on-chip and off-chip communication infrastructures. However, the development of MCS design methodologies capable of providing safe and efficient solutions for complex applications and platforms in an acceptable design time demands a more interdisciplinary approach. This paper is a first step towards such an approach in the development of MCS design methodologies. The paper first identifies the main design disciplines to be involved in MCS design, at both SoC and system-of-systems (SoS) scales. Then, the paper proposes a core ontology for modelling a mixed-criticality system at both SoC scale (MCSoC) and SoS scale (MCSoS). Finally, the paper introduces a set of aspects required for MCS design which have been identified as open and challenging in light of the surveyed state of the art.

  • 208.
    Herrera, Fernando
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    An extensible infrastructure for modeling and time analysis of predictable embedded systems. 2015. In: Forum on Specification and Design Languages, IEEE Computer Society, 2015. Conference paper (Refereed)
    Abstract [en]

    Efficient design of predictable systems on top of multiprocessor-based architectures is challenging. It demands an integration effort to support system models relying on Models-of-Computation (MoC) theory, real-time (RT) analysis, and electronic system-level (ESL) design techniques. This paper presents a SystemC-based framework for modelling and time analysis of predictable embedded systems which aims at such an integration. The framework has features for system-level design and research of predictable systems. Moreover, the framework is extensible, to enable experts from different communities to explore and assess their contributions, e.g. new schedulers, schedulability analyses, and predictable platform components, without having to rely on a physical platform. © 2014 ECSI.

  • 209.
    Herrera, Fernando
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Combining analytical and simulation-based design space exploration for efficient time-critical and mixed-criticality systems. 2015. In: Forum on Specification and Design Languages, FDL 2013, 2015, p. 167-188. Conference paper (Refereed)
    Abstract [en]

    In the context of the design of time-critical systems, analytical models with worst-case workloads are used to identify safe solutions that guarantee hard timing constraints. However, the focus on the worst case often leads to unnecessarily pessimistic and inefficient solutions, in particular for mixed-critical systems. To overcome this situation, the paper proposes a novel design flow integrating analytical and simulation-based Design Space Exploration (DSE). This combined approach is capable of finding more efficient design solutions without sacrificing timing guarantees. To this end, a first analytical DSE phase obtains a set of solutions compliant with the critical time constraints. The search for the Pareto-optimal solutions is done among this set, but it is delegated to a second, simulation-based search. The simulation-based search enables more accurate estimations and the consideration of a specific (or an average-case) scenario. The chapter shows that this can lead to different Pareto sets which reflect improved design decisions with respect to a pure analytical DSE approach, and which are found faster than through a pure simulation-based DSE approach. This is illustrated through an accompanying example and a proof-of-concept implementation of the proposed DSE flow.

  • 210.
    Herrera, Fernando
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Combining Analytical and Simulation-based Design Space Exploration for Time-Critical Systems. 2013. In: Forum on Specification & Design Languages (FDL), 2013, IEEE conference proceedings, 2013, p. 6646657. Conference paper (Refereed)
    Abstract [en]

    In the context of the design of time-critical systems, analytical models with worst-case workloads are used to identify safe solutions that guarantee hard timing constraints. However, the focus on the worst case often leads to unnecessarily pessimistic and inefficient solutions, in particular for mixed-critical systems. To overcome this situation, the paper proposes a novel design flow integrating analytical and simulation-based design space exploration (DSE). This combined approach is capable of finding more efficient design solutions without sacrificing timing guarantees. To this end, a first analytical DSE phase obtains a set of solutions compliant with the critical time constraints. The search for the optimum solution is done among this set, but it is delegated to a second, simulation-based search for fine tuning and average-case optimisation. The potential of our approach is illustrated by a proof-of-concept implementation of the proposed DSE flow and an accompanying DSE example.

  • 211.
    Hesamzadeh, Mohammad R.
    et al.
    KTH, School of Electrical Engineering (EES), Electric Power Systems.
    Rahman, A K M Zami-Ur
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Amelin, Mikael
    KTH, School of Electrical Engineering (EES), Electric Power Systems.
    The Probabilistic TC-PSI for Studying Market Power. 2011. Conference paper (Refereed)
    Abstract [en]

    It is widely recognized that wholesale electricity markets tend to be prone to the exercise of market power. The exercise of market power has antisocial impacts in liberalised electricity markets. It results in inefficient short-term dispatch outcomes and affects the efficiency of longer-term generation investment decisions. It thus results in power price rises and substantial wealth transfers between electricity customers and generators. Electricity market regulators around the world tend to be interested in mechanisms for predicting market power ex ante and detecting and controlling the exercise of market power ex post. The common ex ante market power indicators, however, mostly disregard transmission constraints, variation of wind farms' capacities, and dynamics of electric power systems. This paper carries out a probabilistic study of market power using an index termed Probabilistic Transmission-Constrained Pivotal Supplier Indicator (Probabilistic TC-PSI). Two probabilistic approaches, (a) the Monte Carlo Method (MCM) and (b) the Two-Point Estimation Method (T-PEM), are employed in the probabilistic study and then compared.
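
    To illustrate the difference between the two probabilistic approaches named in the abstract, here is a minimal sketch for a single symmetric random input. It uses Rosenblueth's basic two-point estimate (evaluate the function at mu - sigma and mu + sigma, weight each by 1/2), which may differ from the multi-variable scheme used in the paper. The function `toy_index` is a hypothetical stand-in, not the TC-PSI.

```python
import random


def two_point_mean(f, mu, sigma):
    """Two-point estimate of E[f(X)] for a symmetric X with mean mu and
    standard deviation sigma: two evaluations, equal weights."""
    return 0.5 * (f(mu + sigma) + f(mu - sigma))


def monte_carlo_mean(f, mu, sigma, samples=100_000, seed=0):
    """Monte Carlo estimate of E[f(X)] for X ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    return sum(f(rng.gauss(mu, sigma)) for _ in range(samples)) / samples


def toy_index(x):
    """Hypothetical quadratic response, used only for illustration."""
    return x * x


if __name__ == "__main__":
    # For f(x) = x^2 the exact mean is mu^2 + sigma^2 = 5.0, which the
    # two-point estimate reproduces with only two function evaluations.
    print(two_point_mean(toy_index, 2.0, 1.0))
    print(monte_carlo_mean(toy_index, 2.0, 1.0))
```

    The trade-off the sketch shows is cost versus generality: the two-point estimate needs two evaluations per uncertain input, while Monte Carlo needs many samples but makes no assumption about the shape of the response.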

  • 212.
    Hjort Blindell, Gabriel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Menne, Christian
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Synthesizing Code for GPGPUs from Abstract Formal Models. 2014. In: Forum on specification and Design Languages (FDL), Munich, Germany, October 14-16, 2014 / [ed] Dr. Adam Morawiec and Jinnie Hinderscheit, IEEE conference proceedings, 2014. Conference paper (Refereed)
    Abstract [en]

    Today multiple frameworks exist for elevating the task of writing programs for GPGPUs, which are massively data-parallel execution platforms. These are needed as writing correct and high-performing applications for GPGPUs is notoriously difficult due to the intricacies of the underlying architecture. However, the existing frameworks lack a formal foundation, which makes them difficult to use together with formal verification, testing, and design space exploration. We present in this paper a novel software synthesis tool – called f2cc – which is capable of generating efficient GPGPU code from abstract formal models based on the synchronous model of computation. These models can be built using high-level modeling methodologies that hide low-level architecture details from the developer. The correctness of the tool has been experimentally validated on models derived from two applications. The experiments also demonstrate that the synthesized GPGPU code yielded a 28× speedup when executed on a graphics card with 96 cores and compared against a sequential version that uses only the CPU.

  • 213.
    Horn, Wolfgang
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Svantesson, Bengt
    KTH, School of Information and Communication Technology (ICT).
    Kumar, Shashi
    Dept. of Computer Science & Engineering, Indian Institute of Technology.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hardware synthesis of an ATM multiplexer from SDL to VHDL: a case study. 1999. In: VLSI ’99. Proceedings IEEE Computer Society Workshop On, 1999, p. 100-105. Conference paper (Refereed)
    Abstract [en]

    Hardware synthesis of SDL models poses several problems, because SDL uses the Communicating Sequential Processes (CSP) paradigm for system specification. It allows dynamic processes, and its semantics assume an infinite FIFO buffer at the input of each process for inter-process communication. We previously presented, and later refined, a methodology for efficient hardware synthesis from SDL specifications. In this paper we describe the experience of applying this methodology to a large case study. The case study is an ATM Multiplexer which exhibits a complex control flow and uses large tables. It was modelled using multiple processes. Hardware synthesis was carried out using the methodology starting from its SDL model. The results show that the methodology leads to a correct and efficient hardware implementation. In particular, the methodology avoids the use of costly FIFO buffers for implementing inter-process communication and allows sharing of hardware resources among various instances of the same process. The final implementation also meets the 155 Mbit/sec data rate performance requirement.

  • 214. Horn, Wolfgang
    et al.
    Svantesson, Bengt
    Kumar, Shashi
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    Hardware Synthesis of an ATM Multiplexer Modelled in SDL: A Case Study. 1999. In: Proceedings of the IEEE Computer Society Annual Workshop on VLSI, 1999. Conference paper (Refereed)
  • 215. Hu, Wenmin
    et al.
    Liu, Hengzhu
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Fu, Guitao
    Self-selection pseudo-circuit: a clever crossbar pre-allocation. 2012. In: IEICE Electronics Express, ISSN 1349-2543, Vol. 9, no 6, p. 558-564. Article in journal (Refereed)
    Abstract [en]

    This paper proposes self-selection pseudo-circuit (SP), a simple and effective approach to increase the switch connection reuse rate and improve network performance. It especially suits networks in which the performance is dominated by the number of hops. In the SP scheme, multiple switch connections are allowed to be reserved for one inport, and the flit can reuse the partial switch connection(s) based on the routing information. In the evaluation with traces from Splash-2, SP reduces the interconnection latency by up to 21.6% (16.9% on average) with a 16-core CMP configuration, and by up to 22.2% (19.5% on average) with a 64-core CMP configuration. Evaluated with synthetic traffic, the proposed scheme decreases the latency by up to 19% (16% on average).

  • 216.
    Hu, Wenmin
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. School of Computer, National University of Defense Technology, Changsha, China.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, H.
    Wang, S.
    Liu, D.
    A flexible configuration approach for fault-tolerant multicast/unicast. 2011. In: IEEE Int. Conf. Commun. Softw. Networks, ICCSN, 2011, p. 393-396. Conference paper (Refereed)
    Abstract [en]

    A flexible approach for lookup table configuration is proposed. In this scheme, a predetermined path is set up in parallel by several unicast setup packets. Compared with other approaches, our scheme eliminates the overhead of a configuration bus by adding little logic to an existing multicast router based on lookup tables. This extension makes any-shaped path setup possible, which benefits fault tolerance on Network-on-Chip (NoC).

  • 217.
    Hu, Wenmin
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Hengzhu
    Power-efficient Tree-based Multicast Support for Networks-on-Chip. 2011. In: Proceedings of the Asian Pacific Design Automation Conference (ASPDAC), 2011, p. 363-368. Conference paper (Refereed)
    Abstract [en]

    In this paper, a novel hardware support for multicast on mesh Networks-on-Chip (NoC) is proposed. It supports multicast routing on any shape of tree-based paths. Two power-efficient tree-based multicast routing algorithms, Optimized tree (OPT) and Left-XY-Right-Optimized tree (LXYROPT), are also proposed. The XY tree-based (XYT) algorithm and multiple unicast copies (MUC) are also implemented on the router as baselines. As the destination size increases, OPT and LXYROPT achieve a remarkable improvement over MUC in both latency and throughput, while the average power consumption is reduced by 50% and 45%, respectively. Compared with XYT, OPT is 10% higher in latency but saves 17% in power consumption. LXYROPT is 3% lower in latency and 8% lower in power consumption. In some cases, OPT and LXYROPT consume up to 70% less power than XYT.
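
    A rough way to see why tree-based multicast can beat multiple unicast copies is to count link traversals on a mesh. The sketch below assumes plain dimension-order (X first, then Y) routing and compares the MUC baseline with an XY tree in which a shared link is traversed only once. It does not reproduce the paper's OPT or LXYROPT algorithms, and all function names are hypothetical.

```python
def xy_path(src, dst):
    """Return the list of directed links used by X-then-Y routing on a mesh."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def muc_link_traversals(src, dests):
    """Multiple unicast copies: every destination gets its own packet,
    so shared links are traversed once per destination."""
    return sum(len(xy_path(src, d)) for d in dests)


def xy_tree_link_traversals(src, dests):
    """XY tree: the packet is replicated at branch points, so each link
    in the union of the XY paths is traversed once."""
    return len({link for d in dests for link in xy_path(src, d)})


if __name__ == "__main__":
    src, dests = (0, 0), [(2, 0), (2, 1), (2, 2)]
    print(muc_link_traversals(src, dests))      # 9
    print(xy_tree_link_traversals(src, dests))  # 4
```

    Link traversals are only a proxy for dynamic power and say nothing about latency, so the numbers are not comparable to the percentages reported in the abstract.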

  • 218.
    Hu, Wenmin
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Hengzhu
    Zhang, Botao
    Liu, Dongpei
    Network-on-Chip Multicasting with Low Latency Path Setup. 2011. In: Proceedings of the VLSI-SoC Conference, 2011. Conference paper (Refereed)
    Abstract [en]

    A low-latency path setup approach with multiple setup packets for parallel setup is presented. It reduces the header overhead compared to multiaddress encoding. Further, we propose four variants of deadlock-free multicast routing algorithms using different subpath generation methods, different destination partitioning, and channel sharing strategies. Experimental results show that the quatuor partitions path-like tree outperforms other algorithms.

  • 219. Hu, Wenmin
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Hengzhu
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Multicast Path Setup Incorporating Evicting. 2012. In: Elektronika ir Elektrotechnika, ISSN 1392-1215, no 8, p. 101-104. Article in journal (Refereed)
    Abstract [en]

    In this paper, we propose a novel multicast path setup scheme, which incorporates the evicting process. Compared with the previous work, our scheme either overcomes the limitation in evicting times or reduces the setup latency.

  • 220.
    Hu, Wenmin
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Liu, Hengzhu
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    TPSS: A flexible hardware support for unicast and multicast on networks-on-chip. 2012. In: Journal of Computers, ISSN 1796-203X, E-ISSN 1796-203X, Vol. 7, no 7, p. 1743-1752. Article in journal (Refereed)
    Abstract [en]

    Multicast is an important traffic mode that runs on multi-core systems, and efficient hardware support for multicast can greatly improve the performance of the whole system. Most multicast solutions use dimension-order routing to generate the multicast trees, which are neither bandwidth nor power efficient. This article presents a synthesizable router for network-on-chip (NoC) which supports arbitrarily shaped multicast paths based on a mesh topology. In our scheme, incremental setup is adopted to simplify the process of multicast tree construction. For each sub-path setup, we present a novel scheme called two period sub-path setup (TPSS). TPSS is divided into two periods: routing to a predetermined intermediate router, and updating lookup tables from the intermediate router to the destination. This novel setup makes it feasible to support arbitrarily shaped path setup. In our case study, the Optimized tree algorithm (OPT) and the Left-XY-Right-Optimized tree algorithm (LXYROPT) are proposed for power-efficient path searching, but they need to be pre-configured because of their high computation cost. Moreover, Virtual Circuit Tree Multicasting (VCTM) is also supported in our scheme for dynamic construction of multicast paths, which needs no computation in path searching. The performance is evaluated by using a cycle-accurate simulator developed in SystemC, and the hardware overhead is estimated by using a synthesizable HDL model. Compared to VCTM (without FIFO, multicast table and network adapter), the area overhead of implementing our router is negligible (less than 0.5%).

  • 221. Huan, Yuxiang
    et al.
    Ma, Ning
    KTH, School of Information and Communication Technology (ICT).
    Blixt, Stefan
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    A 61 μA/MHz reconfigurable application-specific processor and system-on-chip for Internet-of-Things. 2016. In: International System on Chip Conference, IEEE Computer Society, 2016, p. 235-239. Conference paper (Refereed)
    Abstract [en]

    This paper presents a SoC design that combines general purpose control and application-specific acceleration within a reconfigurable ASIP core for Internet-of-Things applications. Sufficient processing capability and reconfigurability are provided by a highly customizable data path and an efficient sequence control loop. By fully utilizing the data path of the proposed architecture, the processor reduces code size by more than 4X and offers superior performance compared with MSP430 and Atmega128 in FIR and Whetstone benchmarks. More than 10X speedup can be obtained in executing encryption algorithms by optimized micro-instructions without extra hardware accelerators. Fabricated in 0.18 μm CMOS, our SoC's energy efficiency beats most microcontrollers, with a value as low as 61 μA/MHz.

  • 222.
    Huang, Jinliang
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Signell, Svante
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    On Spectral Efficiency of Low-Complexity Adaptive MIMO Systems in Rayleigh Fading Channel. 2009. In: IEEE Transactions on Wireless Communications, ISSN 1536-1276, E-ISSN 1558-2248, Vol. 8, no 9, p. 4369-4374. Article in journal (Refereed)
    Abstract [en]

    Adaptive MQAM modulation is used to maximize spectral efficiency of Multiple-Input Multiple-Output (MIMO) systems while keeping bit error rate (BER) under a target level. Closed-form expressions of the average spectral efficiency, coined as discrete-rate spectral efficiency (DRSE), are derived for adaptive modulation MIMO systems using different algorithms. To further enhance the spectral efficiency, a low complexity adaptation scheme is suggested to switch across different algorithms based on the DRSE. In the current letter, we investigate the adaptation scheme that switches between Orthogonal Space-Time Block Codes (OSTBC) and spatial multiplexing with zero-forcing (ZF) detection for MIMO systems with two transmit antennas. Two types of operating environment are considered: flat Rayleigh fading channel without spatial correlation and spatially correlated Rayleigh fading channel with transmit correlation.

  • 223. Isfahani, S. M. M.
    et al.
    Kazerouni, I. A.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Baghaei-Nejad, Majid
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An ultra-low power multi-tunable triangle wave generator with frequency and amplitude control. 2010. p. 236-239. Conference paper (Refereed)
    Abstract [en]

    An ultra-low power adjustable triangle wave generator (TWG) with multi-tunable amplitude and frequency is introduced in this paper. The proposed circuit consists of a Schmitt trigger and a current source. The overall nonlinearity of the TWG circuit is less than 2% in its current-to-frequency transfer characteristic. The tunable frequency and amplitude ranges are 10 kHz to 40 kHz and 0.1 V to 1.7 V, respectively. The topology is suitable for VLSI realization and can be used in the WINeR system.

  • 224. Isoaho, Jouni
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    DSP Development with Full-Speed Prototyping Based on HW-SW Codesign Techniques. 1994. In: Proc. of the Fourth International Workshop on Field programmable Logic and Applications, Prague, FPL’94, 1994. Conference paper (Refereed)
  • 225.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Output Process of Variable Bit-Rate Flows in On-Chip Networks Based on Aggregate Scheduling. 2011. In: Proceedings of the International Conference on Computer Design, 2011, p. 445-446. Conference paper (Refereed)
    Abstract [en]

    In NoCs, several flows are often merged into one aggregate flow due to heavy resource sharing. To strengthen formal performance analysis, we propose an improved model for an output flow of a FIFO multiplexer under aggregate scheduling. The model of the aggregate flow is formally proven and can serve as the basis for a stringent worst-case delay and buffer analysis.

  • 226.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Worst-Case Delay Analysis of Variable Bit-Rate Flows in Network-on-Chip with Aggregate Scheduling2012In: Proceedings of the Design and Test in Europe Conference (DATE), 2012, p. 538-541Conference paper (Refereed)
    Abstract [en]

    Aggregate scheduling in routers merges several flows into one aggregate flow. We propose an approach for computing the end-to-end delay bound of individual flows in a FIFO multiplexer under aggregate scheduling. A synthetic case study exhibits that the end-to-end delay bound is up to 33.6% tighter than the case without considering the traffic peak behavior.

  • 227.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Li, Shuo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Optimal Selection of Function Implementation in a Hierarchical Configware Synthesis Method for a Coarse Grain Reconfigurable Architecture2011In: Proceedings: 2011 14th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, DSD 2011, 2011, p. 73-80Conference paper (Refereed)
    Abstract [en]

    We have proposed a Dynamically Reconfigurable Resource Array (DRRA), which is a Coarse Grain Reconfigurable Architecture (CGRA). In this paper, we propose a hierarchical method for compiling DSP applications in Simulink into DRRA. In this method, each function in DRRA library can be implemented in different architecture styles and also each architectural style can be implemented in varying degrees of parallelism. Since selecting an appropriate implementation for functions of an application is very effective in performance and cost of architecture, we also formulate an optimization problem that considers implementations of functions as decision variables in order to minimize total energy consumed in the architecture under performance and cost constraints. A realistic case study exhibits up to 89% reduction of total energy consumption. It is worth mentioning that by using the proposed hierarchical compilation method, the design space is reduced dramatically while keeping the solution optimized in terms of energy consumption. Hence, the optimization algorithm has low run-time complexity, enabling quick exploration of large design spaces.

  • 228.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. Ferdowsi University of Mashhad, Iran.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Yaghmaee, Mohammad H.
    Ferdowsi Univ Mashhad, Dept Comp, Fac Engn, Mashhad, Iran.
    Optimal Regulation of Traffic Flows in Networks-on-Chip2010In: Proceedings of the Design Automation and Test Europe Conference (DATE), IEEE Computer Society, 2010, p. 1621-1624Conference paper (Refereed)
    Abstract [en]

    We have proposed (σ, ρ)-based flow regulation to reduce delay and backlog bounds in SoC architectures, where σ bounds the traffic burstiness and ρ the traffic rate. The regulation is conducted per-flow for its peak rate and traffic burstiness. In this paper, we optimize these regulation parameters in networks on chips where many flows may have conflicting regulation requirements. We formulate an optimization problem for minimizing total buffers under performance constraints. We solve the problem with the interior point method. Our case study results exhibit 48% reduction of total buffers and 16% reduction of total latency for the proposed problem. The optimization solution has low run-time complexity, enabling quick exploration of large design space.

  • 229.
    Jafari, Fahimeh
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Yaghmaee, Mohammad Hossein
    Buffer Optimization in Network-on-Chip Through Flow Regulation2010In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 29, no 12, p. 1973-1986Article in journal (Refereed)
    Abstract [en]

    For network-on-chip (NoC) designs, optimizing buffers is an essential task since buffers are a major source of cost and power consumption. This paper proposes flow regulation and has defined a regulation spectrum as a means for system-on-chip architects to control delay and backlog bounds. The regulation is performed per flow for its peak rate and burstiness. However, many flows may have conflicting regulation requirements due to interferences with each other. Based on the regulation spectrum, this paper optimizes the regulation parameters aiming for buffer optimization. Three timing-constrained buffer optimization problems are formulated, namely, buffer size minimization, buffer variance minimization, and multiobjective optimization, which has both buffer size and variance as minimization objectives. Minimizing buffer variance is also important because it affects the modularity of routers and network interfaces. A realistic case study exhibits 62.8% reduction of total buffers, 84.3% reduction of total latency, and 94.4% reduction on the sum of variances of buffers. Likewise, the experimental results demonstrate similar improvements in the case of synthetic traffic patterns. The optimization algorithm has low run-time complexity, enabling quick exploration of large design spaces. This paper concludes that optimal flow regulation can be a highly valuable instrument for buffer optimization in NoC designs.

  • 230.
    Jafri, Syed M. A. H.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bag, Ozan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware Coarse-Grained Reconfigurable Architectures using Dynamically Reconfigurable Isolation Cells2013In: Proceedings Of The Fourteenth International Symposium On Quality Electronic Design (ISQED 2013), 2013, p. 104-111Conference paper (Refereed)
    Abstract [en]

    This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.

  • 231.
    Jafri, Syed M. A. H.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Tajammul, Adeel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology.
    Ellervee, Peeter
    Plosila, Juha
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Morphable Compression Architecture for Efficient Configuration in CGRAs2014In: 2014 17th Euromicro Conference on Digital System Design (DSD), 2014, p. 42-49Conference paper (Refereed)
    Abstract [en]

    Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications. Novel CGRAs allow each application to exploit runtime parallelism and time sharing. Although these features enhance the power and silicon efficiency, they significantly increase the configuration memory overheads (up to 50% area of the overall platform). As a solution to this problem researchers have employed statistical compression, intermediate compact representation, and multicasting. Each of these techniques has different properties (i.e. compression ratio and decoding time), and is therefore best suited for a particular class of applications (and situation). However, existing research only deals with these methods separately. In this paper we propose a morphable compression architecture that interleaves these techniques in a unique platform. The proposed architecture allows each application to enjoy a separate compression/decompression hierarchy (consisting of various types and implementations of hardware/software decoders) tailored to its needs. Thereby, our solution offers minimal memory while meeting the required configuration deadlines. Simulation results, using different applications (FFT, Matrix multiplication, and WLAN), reveal that the choice of compression hierarchy has a significant impact on compression ratio (from configware replication to 52%) and configuration cycles (from 33 nsec to 1.5 secs) for the tested applications. Synthesis results reveal that introducing adaptivity incurs negligible additional overheads (1%) compared to the overall platform area.

  • 232.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bag, Ozan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware CGRAs using Dynamically Re-configurable isolation Cells2013Conference paper (Refereed)
    Abstract [en]

    This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.

  • 233.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. Turku Centre for Computer Science, Finland; University of Turku, Finland.
    Gia, T.N.
    University of Turku, Finland.
    Dytckov, Sergei
    University of Turku, Finland.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    Turku Centre for Computer Science, Finland; University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    NeuroCGRA: A CGRA with support for neural networks2014In: Proceedings of the 2014 International Conference on High Performance Computing and Simulation, HPCS 2014, IEEE , 2014, p. 506-511Conference paper (Refereed)
    Abstract [en]

    Today, Coarse Grained Reconfigurable Architectures (CGRAs) are becoming an increasingly popular implementation platform. In real world applications, the CGRAs are required to simultaneously host processing (e.g. audio/video acquisition) and estimation (e.g. audio/video/image recognition) tasks. For estimation problems, neural networks promise a higher efficiency than conventional processing. However, most of the existing CGRAs provide no support for neural networks. To realize both neural networks and conventional processing on the same platform, this paper presents NeuroCGRA. NeuroCGRA allows the processing elements and the network to dynamically morph into either conventional CGRA or a neural network, depending on the hosted application. We have chosen the DRRA as a vehicle to study the feasibility and overheads of our approach. Synthesis results reveal that the proposed enhancements incur negligible overheads (4.4% area and 9.1% power) compared to the original DRRA cell.

  • 234.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Guang, Liang
    University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware Fault-Tolerant Network-on-Chips for Addressing Multiple Traffic Classes2012In: Proceedings: 15th Euromicro Conference on Digital System Design, DSD 2012, 2012, p. 242-249Conference paper (Refereed)
    Abstract [en]

    This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.

  • 235.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Guang, Liang
    University of Turku, Finland.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-aware fault-tolerant network-on-chips for addressing multiple traffic classes2013In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 37, no 8, p. 811-822Article in journal (Refereed)
    Abstract [en]

    This paper presents an energy efficient architecture to provide on-demand fault tolerance to multiple traffic classes, running simultaneously on single network on chip (NoC) platform. Today, NoCs host multiple traffic classes with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the classes is neither optimal nor desirable. To reduce the overheads incurred by fault tolerance, various adaptive strategies have been proposed. The proposed techniques rely on individual packet fields and operating conditions to adjust the intensity and hence the overhead of fault tolerance. Presence of multiple traffic classes undermines the effectiveness of these methods. To complement the existing adaptive strategies, we propose on-demand fault tolerance, capable of providing required reliability, while significantly reducing the energy overhead. Our solution relies on a hierarchical agent based control layer and a reconfigurable fault tolerance data path. The control layer identifies the traffic class and directs the packet to the path providing the needed reliability. Simulation results using representative applications (matrix multiplication, FFT, wavefront, and HiperLAN) showed up to 95% decrease in energy consumption compared to traditional worst case methods. Synthesis results have confirmed a negligible additional overhead, for providing on-demand protection (up to 5.3% area), compared to the overall fault tolerance circuitry.

  • 236.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Guang, Liang
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Self-Adaptive NoC Power Management with Dual-Level Agents: Architecture and Implementation2012In: PECCS 2012 - Proceedings of the 2nd International Conference on Pervasive Embedded Computing and Communication Systems, 2012, p. 450-458Conference paper (Refereed)
    Abstract [en]

    Architecture and Implementation of adaptive NoC to improve performance and power consumption is presented. On platforms hosting multiple applications, hardware variations and unpredictable workloads make static design-time assignments highly sub-optimal e.g. in terms of power and performance. As a solution to this problem, adaptive NoCs are designed, which dynamically adapt towards optimal implementation. This paper addresses the architectural design of adaptive NoC, which is an essential step towards design automation. The architecture involves two levels of agents: a system level agent implemented in software on a dedicated general purpose processor and the local agents implemented as microcontrollers of each network node. The system agent issues specific instructions to perform monitoring and reconfiguration operations, while the local agents operate according to the commands from the system agent. To demonstrate the system architecture, best-effort power management with distributed voltage and frequency scaling is implemented, while meeting run-time execution requirements. Four benchmarks (matrix multiplication, FFT, wavefront, and hiperLAN transmitter) are experimented on a cycle-accurate RTL-level shared-memory NoC simulator. Power analysis with 65nm multi-Vdd library shows a significant reduction in energy consumption (from 21 % to 36 %). The synthesis also shows minimal area overhead (4 %) of the local agent compared to the original NoC switch.

  • 237.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Compact Generic Intermediate representation (CGIR) to enable late binding in Coarse Grained Reconfigurable Architectures2011Conference paper (Refereed)
  • 238.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Compression Based Efficient and Agile Configuration Mechanism for Coarse Grained Reconfigurable Architectures2011In: Proc. IEEE Int Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW) Symp, 2011, p. 290-293Conference paper (Refereed)
    Abstract [en]

    This paper considers the possibility of speeding up the configuration by reducing the size of configware in coarse-grained reconfigurable architectures (CGRAs). Our goal was to reduce the number of cycles and increase the configuration bandwidth. The proposed technique relies on multicasting and bitstream compression. The multicasting reduces the cycles by configuring the components performing identical functions simultaneously, in a single cycle, while the bitstream compression increases the configuration bandwidth. We have chosen the dynamically reconfigurable resource array (DRRA) architecture as a vehicle to study the efficiency of this approach. In our proposed method, the configuration bitstream is compressed offline and stored in a memory. If reconfiguration is required, the compressed bitstream is decompressed using an online decompressor and sent to DRRA. Simulation results using practical applications showed up to 78% and 22% decrease in configuration cycles for completely parallel and completely serial implementations, respectively. Synthesis results have confirmed negligible overhead in terms of area (1.2 %) and timing.

  • 239.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Leon, Guillermo Serrano
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Abbas, N.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    TransPar: Transformation based dynamic Parallelism for low power CGRAs2014In: Conference Digest - 24th International Conference on Field Programmable Logic and Applications, FPL 2014, 2014Conference paper (Refereed)
    Abstract [en]

    Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G, CDMA, etc.). Recently proposed CGRAs offer runtime parallelism to reduce energy consumption (by lowering voltage/frequency). To implement the runtime parallelism, CGRAs commonly store multiple compile-time generated implementations of an application (with different degree of parallelism) and select the optimal version at runtime. However, the compile-time binding incurs excessive configuration memory overheads and/or is unable to parallelize an application even when sufficient resources are available. As a solution to this problem, we propose Transformation based dynamic Parallelism (TransPar). TransPar stores only a single implementation and applies a series of transformations to generate the bitstream for the parallel version. In addition, it also allows to displace and/or rotate an application to parallelize in resource constrained scenarios. By storing only a single implementation, TransPar offers significant reductions in configuration memory requirements (up to 73% for the tested applications), compared to state of the art compaction techniques. Simulation and synthesis results, using real applications, reveal that the additional flexibility allows up to 33% energy reduction compared to static memory based parallelism techniques. Gate level analysis reveals that TransPar incurs negligible silicon (0.2% of the platform) and timing (6 additional cycles per application) penalty.

  • 240.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Leon, Guillermo Serrano
    Iqbal, J.
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    RuRot: Run-time rotatable-expandable partitions for efficient mapping in CGRAs2014In: Proceedings - International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, SAMOS 2014, 2014, p. 233-241Conference paper (Refereed)
    Abstract [en]

    Today, Coarse Grained Reconfigurable Architectures (CGRAs) host multiple applications, with arbitrary communication and computation patterns. Compile-time mapping decisions are neither optimal nor desirable to efficiently support the diverse and unpredictable application requirements. As a solution to this problem, recently proposed architectures offer run-time remapping. The run-time remappers displace or expand (parallelize/serialize) an application to optimize different parameters (such as platform utilization). However, the existing remappers support application displacement or expansion in either horizontal or vertical direction. Moreover, most of the works only address dynamic remapping in packet-switched networks and therefore are not applicable to the CGRAs that exploit circuit switching for low-power and high predictability. To enhance the optimality of the run-time remappers, this paper presents a design framework called Run-time Rotatable-expandable Partitions (RuRot). RuRot provides architectural support to dynamically remap or expand (i.e. parallelize) the hosted applications in CGRAs with circuit-switched interconnects. Compared to state of the art, the proposed design supports application rotation (in clockwise and anticlockwise directions) and displacement (in horizontal and vertical directions), at run-time. Simulation results using a few applications reveal that the additional flexibility enhances the device utilization, significantly (on average 50 % for the tested applications). Synthesis results confirm that the proposed remapper has negligible silicon (0.2 % of the platform) and timing (2 cycles per application) overheads.

  • 241.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Piestrak, S. J.
    Sentieys, O.
    Pillement, S.
    Design of a fault-tolerant coarse-grained reconfigurable architecture: A case study2010In: Proceedings of the 11th International Symposium on Quality Electronic Design, ISQED 2010, 2010, p. 845-852Conference paper (Refereed)
    Abstract [en]

    This paper considers the possibility of implementing low-cost hardware techniques which would allow to tolerate temporary faults in the datapaths of coarse-grained reconfigurable architectures (CGRAs). Our goal was to use less hardware overhead than commonly used duplication or triplication methods. The proposed technique relies on concurrent error detection by using residue code modulo 3 and re-execution of the last operation, once an error is detected. We have chosen the DART architecture as a vehicle to study the efficiency of this approach to protect its datapaths. Simulation results have confirmed hardware savings of the proposed approach over duplication.

  • 242.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Piestrak, S. J.
    Sentieys, O.
    Pillement, S.
    Design of the coarse-grained reconfigurable architecture DART with on-line error detection2014In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 2, p. 124-136Article in journal (Refereed)
    Abstract [en]

    This paper presents the implementation of the coarse-grained reconfigurable architecture (CGRA) DART with on-line error detection intended for increasing fault-tolerance. Most parts of the data paths and of the local memory of DART are protected using residue code modulo 3, whereas only the logic unit is protected using duplication with comparison. These low-cost hardware techniques would allow to tolerate temporary faults (including so called soft errors caused by radiation), provided that some technique based on re-execution of the last operation is used. Synthesis results obtained for a 90 nm CMOS technology have confirmed significant hardware and power consumption savings of the proposed approach over commonly used duplication with comparison. Introducing one extra pipeline stage in the self-checking version of the basic arithmetic blocks has allowed to significantly reduce the delay overhead compared to our previous design.

  • 243.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Piestrak, Stanislaw J.
    IJL/Université de Lorraine, France.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs. 2013. In: Digital System Design (DSD), 2013 Euromicro Conference on, IEEE conference proceedings, 2013, p. 525-534. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a polymorphic fault-tolerant architecture that can be tailored to efficiently support the reliability needs of multiple applications at run-time. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all the applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. The proposed techniques assess the reliability requirements of each application and adjust the fault-tolerance intensity (and hence overhead) accordingly. However, existing flexible reliability schemes only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g. soft errors). To complement these strategies, we propose energy-aware fault-tolerance that, in addition to modular redundancy, can also provide low-cost, sub-modular (e.g. residue mod 3) redundancy, to cater for both permanent and temporary faults. Our solution relies on an agent-based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using a few selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results have confirmed that the proposed architecture significantly reduces the area overhead for self-checking (59.1%) and fault-tolerant (7.1%) versions, compared to the state-of-the-art adaptive reliability techniques.

  • 244.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Tajammul, Muhammad Adeel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. Indian Institute of Technology.
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware-Task-Parallelism for Efficient Dynamic Voltage, and Frequency Scaling, in CGRAs. 2013. In: Proceedings - 2013 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2013, IEEE, 2013, p. 104-112. Conference paper (Refereed)
    Abstract [en]

    Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with arbitrary communication and computation patterns. Each application is itself composed of multiple tasks, spatially mapped to different parts of the platform. Providing a worst-case operating point to all applications leads to excessive energy and power consumption. To address this problem, dynamic voltage and frequency scaling (DVFS) is a frequently used technique. DVFS scales the voltage and/or frequency of the device based on runtime constraints. Recent research suggests that the efficiency of DVFS can be significantly enhanced by combining dynamic parallelism with DVFS. The proposed methods exploit the speedup induced by parallelism to allow aggressive frequency and voltage scaling. These techniques employ a greedy algorithm that blindly parallelizes a task whenever the required resources are available. Therefore, such an algorithm is likely to parallelize a task even if doing so offers no speedup to the application, thereby undermining the effectiveness of parallelism. As a solution to this problem, we present energy-aware task parallelism. Our solution relies on resource allocation graphs and an autonomous parallelism, voltage, and frequency selection algorithm. Using the resource allocation graph as a guide, the autonomous parallelism, voltage, and frequency selection algorithm parallelizes a task only if its parallel version reduces the overall application execution time. Simulation results, using representative applications (MPEG4, WLAN), show that our solution promises better resource utilization compared to the greedy algorithm. Synthesis results (using WLAN) confirm a significant reduction in energy (up to 36%), power (up to 28%), and configuration memory requirements (up to 36%), compared to the state of the art.

  • 245.
    Jafri, Syed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Piestrak, S. J.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, K.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Implementation and evaluation of configuration scrubbing on CGRAs: A case study. 2013. In: 2013 International Symposium on System-on-Chip, SoC 2013 - Proceedings, IEEE Computer Society, 2013, p. 6675262-. Conference paper (Refereed)
    Abstract [en]

    This paper investigates the overhead imposed by various configuration scrubbing techniques used in fault-tolerant Coarse Grained Reconfigurable Arrays (CGRAs). Today, reconfigurable architectures host large configuration memories. As we progress further in the nanometer regime, these configuration memories have become increasingly susceptible to single event upsets caused, e.g., by cosmic radiation. Configuration scrubbing is a frequently used technique to protect these configuration memories against single event upsets. Existing works on configuration scrubbing deal only with FPGAs, without any reference to CGRAs (in which configuration memories consume up to 50% of silicon area). Moreover, the known literature lacks a comprehensive comparison of various configuration scrubbing techniques to guide system designers about the merits/demerits of different scrubbing methods which could be applied to CGRAs. To address these problems, in this paper we classify various configuration scrubbing techniques and quantify their trade-offs when implemented on a CGRA. Synthesis results reveal that scrubbing logic incurs negligible silicon overhead (up to 3% of the area of computational units). Simulation results obtained for a few algorithms/applications (FFT, FIR, matrix multiplication, and WLAN) show that the choice of the configuration scrubbing scheme (external vs. internal) has a significant impact on both the size of configuration memory and the number of reconfiguration cycles (respectively 20-80% more and up to 38 times more for the former).

  • 246.
    Jakobsen, M. K.
    et al.
    Technical University of Denmark.
    Madsen, J.
    Technical University of Denmark.
    Attarzadeh Niaki, Seyed Hosein
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hansen, J.
    System level modelling with open source tools. 2011. In: Embedded World Conference 2011, 2011. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a system level design methodology which allows designers to model and analyze their systems from the early stages of the design process until final implementation. The design methodology targets heterogeneous embedded systems and is based on a formal modeling framework, called ForSyDe. ForSyDe is available under the open source approach, which allows small and medium enterprises (SME) to get easy access to advanced modeling capabilities and tools. We give an introduction to the design methodology through the system level modeling of a simple industrial use case, and we outline the basics of the underlying ForSyDe model.

  • 247.
    Jantsch, A.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, S.
    Sander, I.
    Svantesson, B.
    Öberg, J.
    Hemani, A.
    Evaluation of Languages for Specification of Telecom Systems. 1998. Report (Other academic)
  • 248.
    Jantsch, A.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, S.
    Sander, Ingo
    Svantesson, B.
    Öberg, J.
    Hemani, A.
    Comparison of Six Languages for System Level Descriptions of Telecom Systems. 1998. In: Proceedings of the Forum on Design Languages, 1998, Vol. 2. Conference paper (Refereed)
  • 249.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Capability Library for High Level Synthesis. 1992. In: Workshop on Control Dominated Synthesis, 1992. Conference paper (Refereed)
  • 250.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An Analysis of the relation between a dataflow graph and its implementations. 1992. Report (Other academic)