  • 51.
    Bagger, Reza
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Olsson, Håkan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    RF Power Amplifier IC with Low Memory Effect, Reduced Low Frequency Gain Peak and Isolated Temperature Tracking Circuitry. 2008. In: 2008 IEEE CSIC Symposium: GaAs ICs Celebrate 30 Years in Monterey, Technical Digest 2008, IEEE, 2008, p. 60-63. Conference paper (Refereed)
    Abstract [en]

    A highly linear wideband power amplifier IC with low memory effect for W-CDMA applications is presented, utilizing Si LDMOS process technology. The IC was optimized to reduce the low frequency gain peak often observed in LDMOS power devices. The topology of the interstage matching contributes to reducing the electrical memory effect to a specification level of at most 2 dB imbalance over power between upper and lower Adjacent Channel Power Ratio when using a II-tone wideband modulated signal. The on-chip temperature compensation circuitry tracks the active device temperature characteristic without degrading the linearity or worsening the memory effect. The measured gain of the IC was 28.5 dB, and a 3-dB bandwidth of 600 MHz around 2100 MHz was achieved. The IC attained -50 dBc ACPR at 5 W output power. At a power level of 45 W and IMD3 = -30 dBc (two-tone), the IC exhibited power densities in excess of 469 mW/mm, in which matching losses were included. The IC demonstrated state-of-the-art RF power performance in terms of good linearity, low memory effects, well-suppressed low frequency gain peak and temperature tracking without linearity and IMD balance degradation.

  • 52.
    Baghaei-Nejad, Majid
    et al.
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Radiom, S.
    Vandenbosch, G.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Gielen, G.
    MICAS-ESAT, Katholieke Univ. Leuven.
    Fully integrated 1.2 pJ/p UWB transmitter with on-chip antenna for wireless identification. 2010. In: Ultra-Wideband (ICUWB), 2010 IEEE International Conference on, IEEE Press, 2010, Vol. 1, p. 237-240. Conference paper (Refereed)
    Abstract [en]

    A fully CMOS-integrated impulse ultra-wideband transmitter with a monolithically integrated on-chip antenna (OCA) for wireless identification is presented. Both OOK and BPSK modulation schemes are supported by the module. The chip is fabricated in standard 0.18 μm CMOS technology. Direct measurement verifies the chip operation, and wireless transmission measurement shows a 7 cm operation range with 1.2 pJ/pulse consumption at 10 MPps, which is a huge improvement compared with related reported work with OCA.

  • 53.
    Baghaei-Nejad, Majid
    et al.
    Sabzevar Tarbiat Moallem University, Sabzevar, Iran.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low cost and precise localization in a remote-powered wireless sensor and identification system. 2011. Conference paper (Refereed)
    Abstract [en]

    A low cost and precise localization system based on a remote-powered UWB-RFID tag is presented for wireless identification, sensing, positioning and tracking. Our contribution is to utilize Impulse Radio Ultra-wideband (IR-UWB) communication in an RFID system. As in conventional RFIDs, a tag captures energy from the received RF signal transmitted by a reader, which also carries data and clock. However, instead of backscattering, an Impulse-UWB transmitter is used. Through a low power design, an operation distance of 13.9 meters is achieved. A network consisting of several readers provides power to the tags and retrieves data from them over a wide area. Due to the fine time resolution of the ultra-short pulse in IR-UWB, the UWB receivers in the readers are able to accurately approximate the time of arrival of the signal, and based on the time-difference-of-arrival algorithm the position of the tag can be estimated precisely. In the line-of-sight scenario, ±16.8 cm accuracy can be achieved with a two-step acquisition system. With a newly proposed communication protocol based on the slotted-ALOHA anti-collision algorithm, 2000 tags per second can be read. The tag circuitry is designed and implemented in 180 nm CMOS technology as a single-chip solution.

  • 54. Baldoni, R.
    et al.
    Di Ciccio, C.
    Mecella, M.
    Patrizi, F.
    Querzoni, L.
    Santucci, G.
    Dustdar, S.
    Li, F.
    Truong, H. -L
    Albornos, L.
    Milagro, F.
    Rafael, P. A.
    Ayani, Rassul
    KTH, School of Information and Communication Technology (ICT).
    Rasch, Katharina
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lozano, M. G.
    Aiello, M.
    Lazovik, A.
    Denaro, A.
    Lasala, G.
    Pucci, P.
    Holzner, C.
    Cincotti, F.
    Aloise, F.
    An embedded middleware platform for pervasive and immersive environments for-all. 2009. In: 2009 6th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops, SECON Workshops 2009, IEEE, 2009, p. 161-163. Conference paper (Refereed)
    Abstract [en]

    Embedded systems are specialized computers used in larger systems or machines to control equipment such as automobiles, home appliances, and communication, control and office machines. Such pervasivity is particularly evident in immersive realities, i.e., scenarios in which invisible embedded systems need to continuously interact with human users, in order to provide continuous sensed information and to react to service requests from the users themselves. The SM4All project investigates an innovative middleware platform for inter-working of smart embedded services in immersive and person-centric environments, through the use of composability and semantic techniques for dynamic service reconfiguration. This is applied to the challenging scenario of private houses and home-care assistance in the presence of users with different abilities and needs (e.g., young, able-bodied, aged and disabled). This paper presents a brief overview of the SM4All system architecture.

  • 55.
    Bengtsson, T.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Sufficient Condition for Detection of XOR-Type Logic. 2001. In: Proceedings of NORCHIP’01, 2001, p. 271-279. Conference paper (Refereed)
  • 56.
    Bengtsson, T.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Krenz, R.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Programmable Logic in Fault-Tolerant Design. 2001. In: Proceedings of 4th Military and Aerospace Applications of Programmable Devices and Technologies International Conference, 2001. Conference paper (Refereed)
  • 57.
    Beserra, G. S.
    et al.
    University of Brasilia.
    Attarzadeh Niaki, Seyed Hosein
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Integrating virtual platforms into a heterogeneous MoC-based modeling framework. 2012. In: Proceedings of Forum on Specification and Design Languages (FDL) 2012, IEEE conference proceedings, 2012, p. 143-150. Conference paper (Refereed)
    Abstract [en]

    In order to handle the increasing complexity of embedded systems, design methodologies must take into account important aspects, such as abstraction, IP-reuse and heterogeneity. System design often starts at a high abstraction level, by developing a virtual platform (VP), which is typically composed of TLM models. TLM has become very popular in the modeling of bus-based systems and currently there is an increasing availability of libraries that provide TLM IPs. Heterogeneity can be naturally captured in a framework supporting different Models of Computation (MoCs). We introduce a novel approach for integrating TLM IPs/VPs into a MoC-based modeling framework, allowing them to co-simulate heterogeneous systems. This approach makes it possible to raise the abstraction level, enabling a more careful design space exploration before selecting a proper VP. We exemplify the potential of our approach with a case study in which a VP with a processor generated by ArchC communicates with a continuous-time model.

  • 58. Bjureus, Per
    et al.
    Millberg, Mickael
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    FPGA Resource and Timing Estimation from Matlab Execution Traces. 2002. In: Proceedings of the International Workshop on Hardware/Software Codesign, 2002. Conference paper (Refereed)
  • 59. Bjuréus, Per
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Heterogenous System-level Cosimulation with SDL and Matlab. 1999. In: Proceedings of the Forum on Design Languages (FDL), 1999. Conference paper (Refereed)
  • 60. Bjuréus, Per
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Modeling of Mixed Control and Dataflow Systems in MASCOT. 2001. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 9, p. 690-704. Article in journal (Refereed)
  • 61. Bjuréus, Per
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Performance Analysis with Confidence Intervals for Embedded Software Processes. 2001. In: Proceedings of the International Symposium on System Synthesis (ISSS), 2001. Conference paper (Refereed)
  • 62. Borkar, A.
    et al.
    Hayes, M.
    Smith, Mark T.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Detecting lane markers in a complex environment using a single camera approach. 2011. In: Proceedings of the 8th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications, 2011, p. 15-22. Conference paper (Refereed)
    Abstract [en]

    Lane detection is an important application of driver assistance. In this paper, a new technique for detecting lane markers that is able to cope with many complex conditions is presented. Some of these conditions include dynamic illumination, scattered shadows, and the presence of neighboring vehicles to name a few. The input image is first pre-processed with a perspective removal transformation followed by a color space conversion. Then, the core elements of the proposed technique consisting of template matching, lane region merging, elliptical projections, and parametric tracking are explained. A formal error metric used in performance evaluation is also introduced. Finally, quantitative analyses show that the developed system performs well in real-world driving conditions with variations in illumination, traffic, and road surface quality.

  • 63. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Chabloz, Jean-Michel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    Mapping Optimisation for Scalable multi-core ARchiTecture: The MOSART approach. 2010. In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010, 2010, p. 518-523. Conference paper (Refereed)
    Abstract [en]

    The project will address two main challenges of prevailing architectures: 1) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; 2) The difficulties in programming heterogeneous, multi-core platforms, in particular in dynamically managing data structures in distributed memory. MOSART aims to overcome these through a multi-core architecture with distributed memory organisation, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimised and customised together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: 1) Providing platform support for management of abstract data structures, including middleware services and a run-time data manager for NoC based communication infrastructure; 2) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 64. Candaele, Bernard
    et al.
    Aguirre, Sylvain
    Sarlotte, Michel
    Anagnostopoulos, Iraklis
    Xydis, Sotirios
    Bartzas, Alexandros
    Bekiaris, Dimitris
    Soudris, Dimitrios
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chabloz, Jean-Michel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Vanmeerbeeck, Geert
    Kreku, Jari
    Tiensyrja, Kari
    Ieromnimon, Fragkiskos
    Kritharidis, Dimitrios
    Wiefrink, Andreas
    Vanthournout, Bart
    Martin, Philippe
    The MOSART Mapping Optimization for multi-core Architectures. 2011. In: VLSI 2010 Annual Symposium, Springer Publishing Company, 2011, p. 181-195. Conference paper (Refereed)
    Abstract [en]

    The MOSART project addresses two main challenges of prevailing architectures: (i) The global interconnect and memory bottleneck due to a single, globally shared memory with high access times and power consumption; (ii) The difficulties in programming heterogeneous, multi-core platforms. MOSART aims to overcome these through a multi-core architecture with distributed memory organization, a Network-on-Chip (NoC) communication backbone and configurable processing cores that are scaled, optimized and customized together to achieve diverse energy, performance, cost and size requirements of different classes of applications. MOSART achieves this by: (i) Providing platform support for management of abstract data structures including middleware services and a run-time data manager for NoC based communication infrastructure; (ii) Developing tool support for parallelizing and mapping applications on the multi-core target platform and customizing the processing cores for the application.

  • 65. Cevrero, Alessandro
    et al.
    Athanasopoulos, Panagiotis
    Parandeh-Afshar, Hadi
    Verma, Ajay K.
    Attarzadeh Niaki, Hosein Seyed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Nicopoulos, Chrysostomos
    Gurkaynak, Frank K.
    Brisk, Philip
    Leblebici, Yusuf
    Ienne, Paolo
    Field Programmable Compressor Trees: Acceleration of Multi-Input Addition on FPGAs. 2009. In: ACM Trans. Reconfigurable Technol. Syst., ISSN 1936-7406, Vol. 2, no 2, p. 1-36. Article in journal (Refereed)
    Abstract [en]

    Multi-input addition occurs in a variety of arithmetically intensive signal processing applications. The DSP blocks embedded in high-performance FPGAs perform fixed bitwidth parallel multiplication and Multiply-ACcumulate (MAC) operations. In theory, the compressor trees contained within the multipliers could implement multi-input addition; however, they are not exposed to the programmer. To improve FPGA performance for these applications, this article introduces the Field Programmable Compressor Tree (FPCT) as an alternative to the DSP blocks. By providing just a compressor tree, the FPCT can perform multi-input addition along with parallel multiplication and MAC in conjunction with a small amount of FPGA general logic. Furthermore, the user can configure the FPCT to precisely match the bitwidths of the operands being summed. Although an FPCT cannot beat the performance of a well-designed ASIC compressor tree of fixed bitwidth, for example, 9×9 and 18×18-bit multipliers/MACs in DSP blocks, its configurable bitwidth and ability to perform multi-input addition is ideal for reconfigurable devices that are used across a variety of applications.

  • 66.
    Chabloz, Jean-Michel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Globally-Ratiochronous, Locally-Synchronous Systems. 2012. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    It is well recognized in the literature that the fully-synchronous design style, once the best choice due especially to the simplicity of its design flow, is not suitable for present-day systems, which contain many more gates compared to their predecessors, and has to be superseded to meet the new needs of the industry. The alternative solution that has enjoyed more success in industry and the literature consists in breaking down a system into several fully-synchronous modules clocked with independent clocks. Such systems go under the name of Globally-non-Synchronous (GnS) and make no assumption on the phase alignment between the clocks in the individual modules. GnS design styles do not require a globally balanced clock tree and employ special synchronizers to achieve latency-insensitivity. The individual modules, whose sizes are relatively small, remain fully-synchronous, thus easy to design and maintain.

    Two main classes of GnS systems have been proposed: the GALS (for Globally-Asynchronous, Locally-Synchronous) design style allows each module to be clocked at its own independent clock frequency; the mesochronous design style constrains all modules to run at the same frequency. GALS systems support per-module Dynamic Voltage-Frequency Scaling (DVFS), but GALS interfaces are complex and introduce high performance penalties; mesochronous systems do not support per-module DVFS but support simpler and faster interfaces. It is well recognized that neither of the two design styles can fully satisfy all the contrasting needs of the electronic industry, and often hybrid solutions are deployed as a trade-off. We propose Globally-Ratiochronous, Locally-Synchronous (GRLS) systems, where GRLS is a design style intermediate between the mesochronous and the GALS design paradigms: local frequencies in a GRLS system do not need to be identical, but are required to be rationally-related (such as one being 3/4 or 2/5 of the other). The periodic properties of rationally-related systems allow the deployment of interfaces that do not use any form of handshake and, thanks to this, are much more performant than GALS interfaces; on the other hand, GRLS supports quantized per-module DVFS.

    In this work we deploy and analyse all the components of the GRLS design style: the frequency regulation system, the voltage regulation system, and the GRLS latency-insensitive interfaces. We perform a theoretical analysis of DVFS efficiency in different GRLS systems, and then study a GRLS NoC-based platform. We also develop a complete GRLS power management system for a GRLS Network-on-Chip (NoC)-based platform. Experimental results show that GRLS performances are close to those of mesochronous systems and GRLS flexibility is close to that of GALS systems, which results in high figures of merit for GRLS systems. As an example, the GRLS NoC-based platform we study in this work has at least ≈ 21% lower latency-power product compared to alternative mesochronous-GALS hybrid platforms, and respectively ≈ 32% and ≈ 48% better latency-power product compared to mesochronous and GALS platforms.
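
The abstract above attributes the handshake-free interfaces to the periodic properties of rationally-related clocks. The following is a minimal sketch of that periodicity only, not of the thesis's interface design: it assumes ideal clocks with aligned starting edges, and the function names `hyperperiod` and `cycles_per_hyperperiod` are hypothetical. It uses the 3/4 and 2/5 ratios mentioned in the abstract and requires Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import gcd, lcm

def hyperperiod(ratios):
    """Length, in reference-clock periods, after which the edge pattern repeats.

    Each ratio is a clock frequency expressed as a fraction of a common
    reference frequency (an assumption made for this illustration).
    """
    # A clock running at ratio p/q of the reference has period q/p.
    periods = [1 / Fraction(r) for r in ratios]
    # LCM of reduced fractions: lcm of numerators over gcd of denominators.
    num = lcm(*(p.numerator for p in periods))
    den = gcd(*(p.denominator for p in periods))
    return Fraction(num, den)

def cycles_per_hyperperiod(ratios):
    """Number of whole cycles each clock completes in one hyperperiod."""
    h = hyperperiod(ratios)
    return [int(h * Fraction(r)) for r in ratios]

ratios = [Fraction(3, 4), Fraction(2, 5)]
print(hyperperiod(ratios))             # length of the repeating window
print(cycles_per_hyperperiod(ratios))  # cycles per clock in that window
```

Because the relative edge positions repeat after a finite window, they could in principle be tabulated once per frequency pair, which is one way to read the abstract's claim that no handshake is needed.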

  • 67.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Flexible Communication Scheme for Rationally-Related Clock Frequencies. 2009. In: 2009 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, 2009, p. 109-116. Conference paper (Refereed)
    Abstract [en]

    As a replacement for the fast-fading Globally-Synchronous model, we have defined a flexible design style for SoCs, called GRLS, for Globally-Ratiochronous, Locally-Synchronous, which does not rely on global synchronization and is based on using rationally-related clock frequencies derived from the same source. In this paper, using the special periodical properties of rationally-related systems, we build a latency-insensitive, maximal-throughput, low-overhead communication method, based on the idea of using both clock edges to sample data at the Receiver. The validity of the method and its resistance to non-idealities such as jitter, misalignments and clock drifts are formally proven while experimental results including overhead are presented for 90 nm technology. Despite allowing much greater flexibility, the overhead of our method is comparable to that of state-of-the-art mesochronous communication techniques. We also show performances, complexity and overhead improvements over all other approaches that have so far been proposed for rationally-related clock frequencies.

  • 68.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A GALS Network-on-Chip based on Rationally-Related Frequencies. 2011. In: 2011 IEEE 29TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), LOS ALAMITOS: IEEE COMPUTER SOC, 2011, p. 12-18. Conference paper (Refereed)
    Abstract [en]

    GALS Networks-on-Chip (NoCs) in which the frequency of every switch can be set independently would enable per-node DVFS without requiring asynchronous switch design. However, traditional GALS interfaces introduce high latency penalties and are therefore ill-suited for inter-switch links in a NoC. In this paper we introduce and study a GALS Network-on-Chip based on the Globally-Ratiochronous, Locally-Synchronous (GRLS) paradigm. GRLS constrains all switch frequencies to be rationally-related but enables the use of efficient interfaces, which reduce the latency of the network by 60% compared to GALS solutions and obtain better throughput-per-power ratios compared to synchronous and mesochronous solutions.

  • 69.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Distributed DVFS using rationally-related frequencies and discrete voltage levels. 2010. In: Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, IEEE, 2010, p. 247-252. Conference paper (Refereed)
    Abstract [en]

    We have defined a flexible latency-insensitive design style called Globally Ratiochronous Locally Synchronous (GRLS), based on quantized voltage levels and rationally-related clock frequencies. In this paper we present the infrastructure necessary to enable Distributed DVFS in such a system and analyze its overheads, quantitatively showing how, with minimal overheads, we obtain energy benefits that are close to those of a totally ideal GALS approach. The benefits that we show, coupled with the complexity and performance benefits of GRLS, which we briefly analyze, show how this approach is a strong competitor to GALS.

  • 70.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lowering the Latency of Interfaces for Rationally-Related Frequencies. 2010. In: 2010 IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, 2010, p. 23-30. Conference paper (Refereed)
    Abstract [en]

    We have introduced the Globally-Ratiochronous, Locally-Synchronous (GRLS) design paradigm, a design style based on rationally-related frequencies, with the objective of overcoming the limitations of traditional multi-frequency systems by providing a flexibility close to that of Globally-Asynchronous, Locally-Synchronous (GALS) systems but introducing performance penalties and overheads close to those of mesochronous systems. In this paper we focus on performance and improve the latency figures of our original GRLS interfaces by introducing two new interfaces, called GRLS-F and GRLS-noF, the first suitable for blocks with long computation time and the second for blocks with short computation time. The latency figures of the original GRLS interfaces are improved by up to 50% without increasing complexity. The average latency figures of the resulting interfaces are lower than 1 Receiver clock cycle, the latency of a synchronous interface.

  • 71.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low-latency and low-overhead mesochronous and plesiochronous synchronizers. 2011. Conference paper (Refereed)
    Abstract [en]

    In this paper we present efficient Mesochronous and Plesiochronous interfaces targeting low-latency and low-overhead links. Our source-synchronous scheme can easily be integrated in traditional design flows, supports maximal throughput, has low latency and has an overhead of only three flipflops per data line. With one additional flipflop per data line, the Plesiochronous interface allows the synchronizer to cope with clock drifts. The simple synchronization scheme is validated through formal analysis and simulation.

  • 72.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low-Latency Maximal-Throughput Communication Interfaces for Rationally Related Clock Domains. 2014. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 22, no 3, p. 641-654. Article in journal (Refereed)
    Abstract [en]

    In this paper, we introduce a source-synchronous adaptive interface for the globally ratiochronous, locally synchronous design style, a subset of the globally asynchronous, locally synchronous (GALS) design style in which the clocks are not phase-aligned but their frequencies are constrained to be rationally related, i.e., they are all submultiples of the same physical or virtual frequency. The interface can be designed using only standard cells and guarantees maximal throughput in addition to an average latency four times lower compared with state-of-the-art asynchronous first-input, first-output GALS interfaces. Several properties of the interface are formally stated and proved. We also demonstrate that the interface has a low area overhead, with only four flip-flops per data line, and is robust against nonidealities such as clock jitters and propagation delay misalignments. For a realistic link in 90-nm application-specific integrated circuit technology, we derive a 1-GHz upper bound for the least common multiple among the frequencies.

  • 73.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low-latency no-handshake GALS interfaces for fast-receiver links. 2012. In: Proceedings of the IEEE International Conference on VLSI Design, IEEE, 2012, p. 191-196. Conference paper (Refereed)
    Abstract [en]

    In this paper we introduce a novel interface for Globally-Asynchronous, Locally-Synchronous systems which does not use any form of handshake to cross the gap between the clock domains. In particular, links in which the Receiver runs faster than the Transmitter are targeted. The interface works by finding an approximate ratio between the clock frequencies. Then, ratiochronous synchronizers that can tolerate clock drifts are employed to transmit data from the Transmitter to the Receiver clock domain. Thanks to the periodic properties of rationally-related systems, no handshake is employed and the average latency of the interface is decreased by ∼75% compared to state-of-the-art GALS interfaces. Additionally, the interface uses only standard cells and, save for a delay line, can be designed at Register Transfer Level.

  • 74.
    Chabloz, Jean-Michel
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sharif Mansouri, Shohreh
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    An Algorithm for Constructing a Fastest Galois NLFSR Generating a Given Sequence. 2010. In: SEQUENCES AND THEIR APPLICATIONS-SETA 2010 / [ed] Carlet C; Pott A, 2010, Vol. 6338, p. 41-54. Conference paper (Refereed)
    Abstract [en]

    The problem of efficient implementation of security mechanisms for advanced contactless technologies like RFID is gaining increasing attention. Severe constraints on resources such as area, power consumption, and production cost make the application of traditional cryptographic techniques to these technologies a challenging task. Non-Linear Feedback Shift Register (NLFSR)-based stream ciphers are promising candidates for cryptographic primitives for RFIDs because they have the smallest hardware footprint of all existing cryptographic systems. This paper presents a heuristic algorithm for constructing a fastest Galois NLFSR generating a given sequence. The algorithm takes an NLFSR in the Fibonacci configuration and transforms it to an equivalent Galois NLFSR which has the minimal delay. Our key idea is to find a best position for a given feedback connection without changing the positions of the other feedback connections. We use a technology dependent cost function which approximates the delay of an NLFSR after the technology mapping. The experimental results on 57 NLFSRs used in existing stream ciphers show that, on average, the presented algorithm allows us to decrease the delay by 25.5% as well as to reduce the area by 4.1%.
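
    For readers unfamiliar with the Fibonacci configuration mentioned in this abstract, the following is a minimal sketch of a feedback shift register in which only the last stage receives feedback. It does not implement the paper's Fibonacci-to-Galois transformation; the 4-bit feedback functions are hypothetical examples chosen for illustration.

```python
def fibonacci_shift_register(state, feedback, n_bits):
    """Run a Fibonacci-configuration feedback shift register.

    state    -- list of bits, state[0] is the output stage
    feedback -- function mapping the current state to the new input bit
    n_bits   -- number of output bits to generate
    """
    state = list(state)
    out = []
    for _ in range(n_bits):
        out.append(state[0])
        new_bit = feedback(state) & 1
        state = state[1:] + [new_bit]   # shift; only the last stage takes feedback
    return out


# Hypothetical 4-bit nonlinear feedback: XOR of two stages plus an AND term.
def nonlinear_feedback(s):
    return s[0] ^ s[3] ^ (s[1] & s[2])


# Linear special case (recurrence of x^4 + x + 1), used only as a sanity check.
def linear_feedback(s):
    return s[0] ^ s[1]


nlfsr_bits = fibonacci_shift_register([1, 0, 0, 0], nonlinear_feedback, 16)
lfsr_bits = fibonacci_shift_register([1, 0, 0, 0], linear_feedback, 30)
```

    In a Galois configuration, by contrast, feedback may be applied to several stages, which is what allows the depth of each feedback function, and hence the delay, to be reduced.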

  • 75.
    Chen, Jian
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Low Noise Oscillator in ADPLL toward Direct-to-RF All-digital Polar Transmitter2012Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    In recent years all-digital or digitally-intensive RF transmitters (TX) have attracted great attention in both literature and industry. The motivation is to implement RF circuits in a manner suiting advanced nanometer CMOS processes. To achieve that, information is encoded in the time-domain rather than voltage amplitude. This enables RF design to also benefit from CMOS process scaling. In this thesis an improved architecture of a digitally-intensive transmitter is proposed and validated experimentally. The techniques to lower oscillator phase noise and all-digital phase-locked loop (ADPLL) quantization noise are discussed and proved by both simulation and measurements.

    The impact of device sizing on 1/f^2 phase noise is analyzed and validated by measurements. Seven oscillators in 180-nm CMOS with the same LC-tank, operation frequency and power consumption but different core device widths are compared. The conclusions clarify the differing suggestions on device sizing in the literature. It is illustrated that the tail noise contribution depends strongly and positively on core device sizing, while the contribution of the core devices themselves is only weakly dependent on it. Measurements demonstrate that there is a 14-dB phase noise increase when sizing core devices from 40 µm to 280 µm in the case of noisy tail current. If the tail current is clean, the increase is only 4 dB. For 1/f^3 phase noise, the investigation reveals that capacitance modulation is the dominant factor accounting for the 1/f, or flicker, noise up-conversion, which is confirmed by measurements of 180-nm CMOS designs.

    A class-C oscillator with ensured start-up and constant amplitude is presented. It achieves a 3.9-dB phase noise reduction in theory and a 5-dB reduction in measurements, compared to a conventional LC-tank oscillator operating at the same frequency and power. With the help of a digital bias voltage and bias current control loop, a Figure-of-Merit (FoM) of 191 is achieved, showing its suitability for low-power, low-noise applications.

    The previous oscillator optimization techniques have been applied in designing a digitally controlled oscillator (DCO) for an ADPLL. A fine tuning varactor is proposed to reduce quantization noise, achieving a frequency step of only several hundred Hz. In order to measure this small frequency step when the DCO is free-running, a method based on narrow-band frequency modulation (FM) theory is proposed. The ADPLL wide-band FM is achieved by using digital two-point modulation so that the modulation bandwidth is not limited by the ADPLL loop dynamics.

    Finally, an all-digital polar TX is proposed based on an improved architecture. The ADPLL is used for FM, while a one-bit low-pass Sigma Delta modulator, using the phase modulated ADPLL output as the clock, accomplishes amplitude modulation. A simple AND gate is adopted as the mixer to increase the fundamental power. A class-D power amplifier stage delivers 6.8-dBm power to the antenna through an on-chip band-pass pre-filter. The filter also acts as the single-ended to differential conversion and matching network.

  • 76.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Carlsson, Mats
    Hedenas, Charlotta
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Low Power, Startup Ensured and Constant Amplitude Class-C VCO in 0.18 µm CMOS2011In: IEEE Microwave and Wireless Components Letters, ISSN 1531-1309, E-ISSN 1558-1764, Vol. 21, no 8, p. 427-429Article in journal (Refereed)
    Abstract [en]

    A low power and robust class-C voltage-controlled oscillator (VCO) is presented in this letter. It features 1) an automatic startup loop to achieve the optimal point and address the inherent risk of startup failure and 2) a digital amplitude control loop to stabilize amplitude and enhance the PVT (process, voltage and temperature) tolerance. The design is implemented in a 0.18 µm CMOS process. Measurement demonstrates the VCO has a 20% tuning range and phase noise of -123.0 dBc/Hz at 1 MHz offset from a 3.1 GHz carrier while consuming 1.57-mW power from a 1 V supply, yielding a Figure-of-Merit (FoM) of 191.1. While operating under the minimum power of 560 µW, it produces -111.3 dBc/Hz phase noise at 1 MHz offset from a 3.1 GHz carrier showing a 183.8 FoM.
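
    The abstract does not state how the FoM is computed. As a rough cross-check, the sketch below assumes the widely used oscillator figure of merit, FoM = -L(Δf) + 20·log10(f0/Δf) - 10·log10(P / 1 mW), and plugs in the values quoted above. The function name is hypothetical.

```python
import math


def oscillator_fom(phase_noise_dbc_hz, offset_hz, carrier_hz, power_w):
    """Assumed FoM definition: phase noise normalized to carrier, offset and power."""
    return (-phase_noise_dbc_hz
            + 20 * math.log10(carrier_hz / offset_hz)
            - 10 * math.log10(power_w / 1e-3))


# Values quoted in the abstract above.
fom_nominal = oscillator_fom(-123.0, 1e6, 3.1e9, 1.57e-3)
fom_low_power = oscillator_fom(-111.3, 1e6, 3.1e9, 560e-6)
```

    Under this assumed definition both results land within a few tenths of a dB of the reported 191.1 and 183.8; the remaining gap may come from rounding of the quoted values or from a slightly different convention in the letter.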

  • 77.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Carlsson, Mats
    Hedenäs, Charlotta
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Flicker noise conversion in CMOS LC oscillators: capacitance modulation dominance and core device sizing2011In: Analog Integrated Circuits and Signal Processing, ISSN 0925-1030, E-ISSN 1573-1979, Vol. 68, no 2, p. 145-154Article in journal (Refereed)
    Abstract [en]

    Flicker noise upconversion mechanisms in oscillators have been identified in the literature; however, their relative weights are still under investigation. It is desirable to find the dominant one, since a given noise suppression method reduces one mechanism but may increase another. In this work, we propose a systematic simulation method to distinguish their relative impacts. The outcome indicates that parasitic capacitance is the dominant factor for both tail 1/f noise and switch pair 1/f noise upconversion, implying that small core devices should be used. Design guidelines on sizing devices are presented and two suppression techniques are compared. Two voltage-controlled oscillators (VCOs) with these suppression techniques are fabricated in a 0.18 µm CMOS process, allowing us to compare their performance. The two VCOs can be Focused-Ion-Beam (FIB) trimmed to change the width of the switch pair FETs. The fair comparison of measurement results among them verifies the dominant role of parasitic capacitance in 1/f noise upconversion. The measurement results also confirm the design guidelines and demonstrate the difference between the two suppression methods.

  • 78.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jonsson, Fredrik
    Carlsson, Mats
    Hedenäs, Charlotta
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zhou, Dian
    Experimental Validation of Device Sizing on CMOS LC-VCO Phase NoiseManuscript (preprint) (Other academic)
    Abstract [en]

    This work investigates the impact of device sizing on phase noise in CMOS LC-tank oscillators, based on specific designs and careful measurements. It experimentally verifies the previously published equations and clarifies some conflicting design guidelines. The conclusions are grounded on the fair comparison of seven VCOs with the core device width varying from 40 µm to 280 µm. These VCOs originate from the same die by using Focused Ion Beam (FIB), guaranteeing the same order of process variation. With the aid of a switched capacitor bank, they are able to operate at practically the same oscillation frequency under the same bias. These conditions assure the fair comparison. It is validated that phase noise from tail devices is strongly dependent on core device size (14 dB from measurements) while phase noise from the core devices themselves shows a smaller dependence (4 dB). Design guidelines, applying to different tail noise cases, are drawn and generally advise the minimum core device width, especially when tail noise is dominant.

  • 79.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A Fast and Accurate Phase Noise Measurement of Free Running Oscillators Using a Single Spectrum Analyzer2010In: 28th Norchip Conference, NORCHIP 2010, 2010Conference paper (Refereed)
    Abstract [en]

    This paper presents a practical phase noise measurement approach, which only requires a spectrum analyzer and a computer, featuring fast setup, accurate results and low cost. Unlike conventional methods that use extra auxiliary circuits to get rid of the frequency drift problem, this approach takes advantage of the ability of modern spectrum analyzers to acquire IQ data, from which phase noise is calculated. The low quantization noise of the instrument makes this approach suitable for most CMOS integrated oscillators. The IQ data sampling time can be made small enough that the frequency drift does not harm the measurement accuracy. The experimental results clearly demonstrate the accuracy and the effectiveness of this method through measuring the phase noise of two voltage controlled oscillators (VCOs) in a 180nm CMOS process at 2.6 GHz and 3.0 GHz respectively.

  • 80.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Rong, Liang
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Yang, Geng
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Lirong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    The Design of All-Digital Polar Transmitter based on ADPLL and Phase Synchronized Delta Sigma Modulator2012In: IEEE Journal of Solid-State Circuits, ISSN 0018-9200, E-ISSN 1558-173X, Vol. 47, no 5, p. 1154-1164Article in journal (Refereed)
    Abstract [en]

    An improved architecture of polar transmitter (TX) is presented. The proposed architecture is digitally-intensive and mainly composed of an all-digital PLL (ADPLL) for phase modulation, a 1-bit low-pass delta sigma (ΔΣ) modulator for envelope modulation, and an H-bridge class-D power amplifier (PA) for differential signaling. The ΔΣ modulator is clocked using the phase modulated RF carrier to ensure phase synchronization between the amplitude and phase path, and to guarantee the PA is switching at zero crossings of the output current. An on-chip pre-filter is used to reduce the parasitic capacitance from packages at the switch stage output. The high oversampling ratio of the ΔΣ modulator moves quantization noise far away from the carrier frequency, ensuring good in-band performance and relaxing filter requirements. The on-chip filter also acts as impedance matching and differential to single-ended conversion. The measured digital transmitter consumes 58 mW from a 1 V supply at 6.8 dBm output power.

  • 81.
    Chen, Jian
    et al.
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Rong, Liang
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    All-digital transmitter based on ADPLL and phase synchronized delta sigma modulator2011In: Radio Frequency Integrated Circuits Symposium (RFIC), 2011 IEEE, IEEE , 2011, p. 1-4Conference paper (Refereed)
    Abstract [en]

    A novel architecture of all-digital polar transmitters is proposed, mainly composed of an all-digital PLL (ADPLL) for phase modulation, a 1-bit low-pass delta sigma (ΔΣ) modulator for envelope modulation and a high efficiency class-D PA. The low noise ADPLL and high-oversampling ΔΣ modulator relax filter design, enabling the use of an on-chip filter. The differential signaling scheme enhances the power of the fundamental tone and suppresses DC and high harmonics. The transmitter was fabricated in a 90nm digital CMOS process, occupying 1.4 mm². The measurement results demonstrate the effectiveness of the architecture. The digital transmitter consumes 58 mW power from a 1 V supply, delivering a 6.81-dBm output.

  • 82.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips2010In: 3rd IEEE International Conference on Computer and Electrical Engineering (ICCEE), 2010Conference paper (Refereed)
    Abstract [en]

    Barrier synchronization is commonly and widely used to synchronize the execution of parallel processor cores on multi-core Network-on-Chips (NoCs). Since its global nature may cause heavy serialization resulting in large performance penalty, barrier synchronization should be carefully designed to have low latency communication and to minimize overall completion time. Therefore, in the paper, we propose a fast barrier synchronization mechanism, targeting Multi-core NoCs. The fast barrier synchronization mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. It offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and remote nodes via the on-chip network. The salient feature of our fast barrier synchronization mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is routed to all processor nodes in a broadcast way in order to save chip area by avoiding storing source node information and to minimize completion time by avoiding serialization of barrier releasing. Synthesis results suggest that the FBS can run over 1 GHz in SMIC® 130nm technology with small area overhead. We implemented a FBS-enhanced multi-core NoC architecture on our FPGA platform using the Xilinx® Virtex 5 as the FPGA chip. FPGA utilization and simulation results show that our fast barrier synchronization demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.
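
    The broadcast-release idea described in this abstract can be sketched as a software toy model. This is not the FBS hardware design; the class and method names are hypothetical, and the model only illustrates that counting arrivals is enough when the release goes to every node.

```python
class BarrierCounter:
    """Toy model of one barrier counter with broadcast release."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.arrived = 0          # only a count is kept, no source node IDs

    def arrive(self):
        """Register one arrival; return the release targets once all nodes arrived."""
        self.arrived += 1
        if self.arrived == self.num_nodes:
            self.arrived = 0                      # reset for the next barrier episode
            return list(range(self.num_nodes))    # broadcast: release every node
        return []                                 # barrier condition not reached yet


barrier = BarrierCounter(num_nodes=4)
releases = [barrier.arrive() for _ in range(4)]
```

    A unicast scheme would instead have to remember which nodes arrived and send each of them a separate release message, which is the storage and serialization cost the abstract says the broadcast avoids.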

  • 83.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hybrid distributed shared memory space in multi-core processors2011In: Journal of Software, ISSN 1796-217X, Vol. 6, no 12 SPEC. ISSUE, p. 2369-2378Article in journal (Refereed)
    Abstract [en]

    On multi-core processors, memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.

    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 84.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Xu, Bangjian
    Luo, Heng
    Multi-FPGA Implementation of a Network-on-Chip Based Many-core Architecture with Fast Barrier Synchronization Mechanism2010In: Proceedings of the IEEE Norchip Conference, 2010Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a fast barrier synchronization mechanism, targeting Network-on-Chip based manycore architectures. Its salient feature is that, once the barrier condition is reached, the "barrier release" acknowledgement is routed to all processor nodes in a broadcast way in order to save area by avoiding storing source node information and to minimize completion time by eliminating serialization of barrier releasing. Then, we construct a multi-FPGA platform using Xilinx® Virtex 5 as FPGA chips and implement a NoC based many-core architecture on it. FPGA utilization and simulation results show that our mechanism demonstrates both area and performance advantages over the barrier synchronization counterpart with unicast barrier releasing.

  • 85.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Run-time Partitioning of Hybrid Distributed Shared Memory on Multi-core Network-on-Chips2010In: The 3rd IEEE International Symposium on Parallel Architectures, Algorithms and Programming (PAAP 2010), 2010, p. 39-46Conference paper (Refereed)
    Abstract [en]

    On multi-core Network-on-Chips (NoCs), memories are preferably distributed and supporting Distributed Shared Memory (DSM) is essential for the sake of reusing huge amount of legacy code and easy programming. However, the DSM organization imports the inherent overhead of translating virtual memory addresses into physical memory addresses, resulting in negative performance. We observe that, in parallel applications, different data have different properties (private or shared). For the private data accesses, it's unnecessary to perform Virtual-to-Physical address translations. Even for the same datum, its property may be changeable in different phases of the program execution. Therefore, this paper focuses on decreasing the overhead of Virtual-to-Physical address translation and hence improving the system performance by introducing hybrid DSM organization and supporting run-time partitioning according to the data property. The hybrid DSM organization aims at supporting fast and physical memory accesses for private data and maintaining a global and single virtual memory space for shared data. Based on the data property of parallel applications, the run-time partitioning supports changing the hybrid DSM organization during the program execution. It ensures fast physical memory addressing on private data and conventional virtual memory addressing on shared data, improving the performance of the entire system by reducing virtual-to-physical address translation overhead as much as possible. We formulate the run-time partitioning of hybrid DSM organization in order to analyze its performance. A real DSM based multi-core NoC platform is also constructed. The experimental results of real applications show that the hybrid DSM organization with run-time partitioning demonstrates performance advantage over the conventional DSM counterpart.

    The percentage of performance improvement depends on problem size, way of data partitioning and computation/communication ratio of parallel applications, network size of the system, etc. In our experiments, the maximal improvement is 34.42%, the minimal improvement 3.68%.

  • 86.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Handling Shared Variable Synchronization in Multi-core Network-on-Chips with Distributed Memory2010In: Proceedings: IEEE International SOC Conference, SOCC 2010, 2010, p. 467-472Conference paper (Refereed)
    Abstract [en]

    Parallelized shared variable applications running on multi-core Network-on-Chips (NoCs) require efficient support for synchronization, since communication is on the critical path of system performance and contended synchronization requests may cause large performance penalty. In this paper, we propose a dedicated hardware module for synchronization management. This module is called Synchronization Handler (SH), integrated with each processor-memory node on the multi-core NoCs. It uses two physical buffers to concurrently process synchronization requests issued by the local processor and remote processors via the on-chip network. One salient feature is that the two physical buffers are dynamically allocated to form multiple virtual buffers (a virtual buffer is related to a shared synchronization variable) so as to improve the buffer utilization and alleviate the head-of-line blocking. Synthesis results suggest that the SH can run over 900 MHz in 130nm technology with small area overhead. To justify the SH-enhanced multicore NoCs, we employ synthetic workloads to evaluate synchronization cost and buffer utilization, and run synchronization-intensive applications to investigate speedup. The results show that our approach is viable.

  • 87.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Supporting Distributed Shared Memory on Multi-core Network-on-Chips Using a Dual Microcoded Controller2010In: Proceedings of the conference for Design Automation and Test in Europe, 2010, p. 39-44Conference paper (Refereed)
    Abstract [en]

    Supporting Distributed Shared Memory (DSM) is essential for multi-core Network-on-Chips for the sake of reusing huge amount of legacy code and easy programmability. We propose a microcoded controller as a hardware module in each node to connect the core, the local memory and the network. The controller is programmable where the DSM functions such as virtual-to-physical address translation, memory access and synchronization etc. are realized using microcode. To enable concurrent processing of memory requests from the local and remote cores, our controller features two mini-processors, one dealing with requests from the local core and the other from remote cores. Synthesis results suggest that the controller consumes 51k gates for the logic and can run up to 455 MHz in 130 nm technology. To evaluate its performance, we use synthetic and application workloads. Results show that, when the system size is scaled up, the delay overhead incurred by the controller may become less significant when compared with the network delay. In this way, the delay efficiency of our DSM solution is close to hardware solutions on average but still has all the flexibility of software solutions.

  • 88.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Supporting Efficient Synchronization in Multi-core NoCs Using Dynamic Buffer Allocation Technique2010In: Proceedings of the IEEE Annual Symposium on VLSI, 2010, p. 462-463Conference paper (Refereed)
    Abstract [en]

    This paper explores a dynamic buffer allocation technique to guide a distributed synchronization architecture to support efficient synchronization on multi-core Network-on-Chips (NoCs). The synchronization architecture features two physical buffers to be able to concurrently queue and handle synchronization requests issued by the local processor and remote processors via the on-chip network. Using the dynamic buffer allocation technique, the two physical buffers are dynamically allocated to form multiple virtual buffers in order to improve buffers' utilization. Experiments are carried out to evaluate buffers' utilization.

  • 89.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Chen, Shenggang
    Gu, Huitao
    Reducing Virtual-to-Physical address translation overhead in Distributed Shared Memory based multi-core Network-on-Chips according to data property2013In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 39, no 2, p. 596-612Article in journal (Refereed)
    Abstract [en]

    In Network-on-Chip (NoC) based multi-core platforms, Distributed Shared Memory (DSM) preferably uses virtual addressing in order to hide the physical locations of the memories. However, this incurs performance penalty due to the Virtual-to-Physical (V2P) address translation overhead for all memory accesses. Based on the data property which can be either private or shared, this paper proposes a hybrid DSM which partitions a local memory into a private and a shared part. The private part is accessed directly using physical addressing and the shared part using virtual addressing. In particular, the partitioning boundary can be configured statically at design time and dynamically at runtime. The dynamic configuration further removes the V2P address translation overhead for those data with changeable property when they become private at runtime. In the experiments with three applications (matrix multiplication, 2D FFT, and H.264/AVC encoding), compared with the conventional DSM, our techniques show performance improvement up to 37.89%.
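
    The private/shared partitioning described in this abstract can be pictured with a toy software model. This is only a sketch under simplifying assumptions: the class, the address values and the dictionary-based translation table are hypothetical, and the shared entry for local data is chosen to map onto the same physical location so that moving the boundary does not relocate the datum.

```python
class HybridDSM:
    """Toy model of a local memory split into a private and a shared part."""

    def __init__(self, boundary, v2p_table):
        self.boundary = boundary      # addresses below the boundary are private
        self.v2p_table = v2p_table    # hypothetical virtual-to-physical map for shared data
        self.translations = 0         # counts V2P lookups, i.e. the overhead being avoided

    def set_boundary(self, boundary):
        """Reconfigure the private/shared boundary at run time."""
        self.boundary = boundary

    def resolve(self, address):
        """Return the physical address for an access."""
        if address < self.boundary:
            return address              # private: used directly as a physical address
        self.translations += 1
        return self.v2p_table[address]  # shared: go through V2P translation


dsm = HybridDSM(boundary=0x100, v2p_table={0x180: 0x180, 0x200: 0x1200})
private_hit = dsm.resolve(0x040)      # private access, no translation
shared_hit = dsm.resolve(0x180)       # shared access, one translation
dsm.set_boundary(0x1C0)               # the datum at 0x180 has become private
now_private = dsm.resolve(0x180)      # same datum, translation skipped
```

    After the boundary moves, the access to `0x180` no longer increments `translations`, which mirrors the abstract's point that data whose property changes to private at runtime stop paying the V2P overhead.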

  • 90.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. National University of Defense Technology, China.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Guo, Yang
    Liu, Hengzhu
    Cooperative communication for efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs
    2014. In: IEICE Electronics Express, ISSN 1349-2543, E-ISSN 1349-2543, Vol. 11, no 18, p. 20140542-
    Article in journal (Refereed)
    Abstract [en]

    On many-core Network-on-Chips (NoCs), communication is on the critical path of system performance, and contended synchronization requests may cause a large performance penalty. Unlike conventional algorithm-based approaches, the paper addresses the barrier synchronization problem from the angle of optimizing its communication performance and proposes cooperative communication as a means to achieve efficient and scalable all-to-all barrier synchronization on mesh-based many-core NoCs. With cooperative communication, routers collaborate with one another to accomplish a fast barrier synchronization task. Cooperative communication is implemented in our router at low cost. In comparative experiments, our approach exhibits high efficiency and good scalability.

  • 91.
    Chen, Xiaowen
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Shuming
    Liu, Hai
    Cooperative communication based barrier synchronization in on-chip mesh architectures
    2011. In: IEICE ELECTRON EXPR, ISSN 1349-2543, Vol. 8, no 22, p. 1856-1862
    Article in journal (Refereed)
    Abstract [en]

    We propose cooperative communication as a means to enable efficient and scalable barrier synchronization on mesh-based many-core architectures. Our approach is different from but orthogonal to conventional algorithm-based optimizations. It relies on collaborating routers to provide efficient gather and multicast communication. In conjunction with a master-slave algorithm, it exploits the mesh regularity to achieve efficiency. The gather and multicast functions have been implemented in our router. Synthesis results suggest marginal area overhead. With synthetic and benchmark experiments, we show that our approach significantly reduces synchronization completion time and increases speedup.

  • 92. Chen, Y.
    et al.
    Xie, L.
    Li, J.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A deadlock-free fault-tolerant routing algorithm based on pseudo-receiving mechanism for networks-on-chip of CMP
    2011. In: 2011 International Conference on Multimedia Technology, ICMT 2011, 2011, p. 2825-2828
    Conference paper (Refereed)
    Abstract [en]

    As CMOS technology scales down to the nanometer domain, fault tolerance is becoming a challenge for NoCs. The turn model provides a simple and efficient systematic approach to the development of deadlock-free routing algorithms. In this paper, we propose a pseudo-receiving mechanism, supported by the local processor's cache, that enables prohibited turns while keeping routing deadlock-free. We present a fault-tolerant routing algorithm based on the pseudo-receiving mechanism for 2D meshes. The routing algorithm is livelock-free at the cost of disabling a few non-faulty links or nodes. The algorithm is applied to a single-cycle fixed output-buffered router. Experimental results show that it achieves high performance even under a high fault rate.

  • 93. Chen, Y.
    et al.
    Xie, L.
    Li, J.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Slice router: For fine-granularity fault-tolerant Networks-on-Chip
    2011. In: 2011 International Conference on Multimedia Technology, ICMT 2011, 2011, p. 3230-3233
    Conference paper (Refereed)
    Abstract [en]

    Almost all existing Network-on-Chip (NoC) fault-tolerant schemes are based on fault-tolerant routing algorithms. In these schemes, a faulty link or router is discarded altogether. However, in most cases only a small part of the discarded link or router is faulty, so discarding the whole link or router is wasteful. In this paper, we present a slice router architecture that can be used in fine-granularity fault-tolerant NoCs. The main motivation is to handle faulty links and routers at a finer granularity. The main idea is that a router is split into several sub-link routers, called slices. Unlike several physically independent routers, slices are coupled together at the input/output ports. This coupling enables fine-granularity fault tolerance in the network. To evaluate the fault-tolerant capability of slice routers, we design a loosely coupled 4-slice router with a backup sub-link in each link. Each slice is a single-cycle output-buffered switch. Simulation results demonstrate its fault-tolerant capability in the presence of high fault rates. The critical latency increases by only 0.04 ns, because the configuration of the slice interfaces proceeds in parallel with the output arbiter of the slices. Synthesis results under 65 nm technology show that the area overhead of a slice router is only a few logic gates compared with the non-coupled slice router.

  • 94. Chen, Yancang
    et al.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Xie, Lunguo
    Li, Jinwen
    Zhang, Minxuan
    A single-cycle output buffered router with layered switching for Networks-on-Chips
    2012. In: Computers & electrical engineering, ISSN 0045-7906, E-ISSN 1879-0755, Vol. 38, no 4, p. 906-916
    Article in journal (Refereed)
    Abstract [en]

    We present a single-cycle output buffered router based on layered switching for networks on chips (NoCs). Unlike state-of-the-art NoC routers, the router has three important characteristics: (1) it employs layered switching, which implements wormhole on top of virtual cut-through (VCT) switching; (2) in contrast to input buffered architectures, it adopts an output buffered architecture; (3) it is single cycle, meaning that the router pipeline takes only one cycle for all flits. Experimental results show that the router achieves up to 80% of ideal network throughput under a uniform random traffic pattern. Compared with wormhole switching, layered switching achieves up to 36.9% latency reduction for 12-flit packets under uniform random traffic with an injection rate of 0.5 flit/cycle/node. Synthesis results under 65 nm technology show that its critical path has only 20 logic gates, and that it reduces area by 11% compared to the input virtual-channel router with the same buffer capacity.

  • 95.
    Collin, Mikael
    et al.
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastucture, Software and Computer Systems, SCS.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastucture, Software and Computer Systems, SCS.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A performance and energy exploration of dictionary code compression architectures
    2011. In: 2011 International Green Computing Conference and Workshops (IGCC), IEEE conference proceedings, 2011, p. 1-8
    Conference paper (Refereed)
    Abstract [en]

    We have made a performance and energy exploration of a previously proposed dictionary code compression mechanism where frequently executed individual instructions and/or sequences are replaced in memory with short code words. Our simulated design shows a dramatically reduced instruction memory access frequency, leading to a performance improvement for small instruction cache sizes and to significantly reduced energy consumption in the instruction fetch path. We have evaluated the performance and energy implications of three architectural parameters: branch prediction accuracy, instruction cache size and organization. To assess the complexity of the design, we have implemented the critical stages in VHDL.

  • 96. Daneshtalab, M.
    et al.
    Ebrahimi, M.
    Liljeberg, P.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    CMIT: A novel cluster-based topology for 3D stacked architectures
    2010. In: IEEE 3D System Integration Conference 2010, 3DIC 2010, 2010
    Conference paper (Refereed)
    Abstract [en]

    Combining the benefits of 3D IC and Network-on-Chip (NoC) schemes provides a significant performance gain for 3D stacked architectures. In recent years, the Through-Silicon-Via (TSV), employed for inter-layer connectivity (vertical channel), has attracted a lot of interest since it enables faster and more power-efficient inter-layer communication across multiple stacked layers. However, the area overhead of TSVs reduces wafer utilization and yield, which impacts the design of 3D architectures using a large number of TSVs. In this paper, we propose a novel stacked topology, named CMIT (Cluster Mesh Inter-layer Topology), for 3D architectures to reduce the area overhead of TSVs and the power dissipation on each layer with minimal performance penalty. Experimental results with synthetic test cases demonstrate that the presented topology can save more than 75% of the TSV area footprint and reduce power consumption by more than 10% with a negligible performance overhead.

  • 97. Daneshtalab, M.
    et al.
    Ebrahimi, M.
    Liljeberg, P.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    High-performance on-chip network platform for memory-on-processor architectures
    2011. In: 6th International Workshop on Reconfigurable Communication-Centric Systems-on-Chip, ReCoSoC 2011 - Proceedings, 2011, article id 5981509
    Conference paper (Refereed)
    Abstract [en]

    Three Dimensional Integrated Circuits (3D ICs) are emerging to improve existing Two Dimensional (2D) designs by providing smaller chip areas, higher performance and lower power consumption. Stacking memory layers on top of a multiprocessor layer (logic layer) is a potential solution to reduce wire delay and increase the bandwidth. To fully employ this capability, an efficient on-chip communication platform is required to be integrated in the logic layer. In this paper, we present an on-chip network platform for the logic layer utilizing an efficient network interface to exploit the potential bandwidth of stacked memory-on-processor architectures. Experimental results demonstrate that the platform equipped with the presented network interface increases the performance considerably.

  • 98. Daneshtalab, M.
    et al.
    Ebrahimi, M.
    Liljeberg, P.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    High-Performance TSV Architecture for 3-D ICs
    2010. In: Proceedings - IEEE Annual Symposium on VLSI, ISVLSI 2010, 2010, p. 467-468, article id 5572813
    Conference paper (Refereed)
    Abstract [en]

    Three-dimensional integrated circuits (3-D ICs) outperform traditional planar ICs in terms of performance, packaging density, interconnection power consumption, and functionality. Since the performance of 3-D ICs employing Through Silicon Vias (TSVs) depends on vertical interlayer interconnects, in this paper we present a high-performance bus architecture for TSVs.

  • 99. Daneshtalab, M.
    et al.
    Ebrahimi, M.
    Liljeberg, P.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Memory-Efficient On-Chip Network With Adaptive Interfaces
    2012. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 31, no 1, p. 146-159
    Article in journal (Refereed)
    Abstract [en]

    To achieve higher memory bandwidth in network-based multiprocessor architectures, multiple dynamic random access memories can be accessed simultaneously. In such architectures, not only are resource utilization and latency critical issues, but a reordering mechanism is also required to deliver the response transactions of concurrent memory accesses in order. In this paper, we present a memory-efficient on-chip network architecture to cope with these issues efficiently. Each node of the network is equipped with a novel network interface (NI) to deal with out-of-order delivery, and a priority-based router to decrease the network latency. The proposed NI exploits a streamlined reordering mechanism to handle in-order delivery and utilizes the Advanced eXtensible Interface (AXI) transaction-based protocol to maintain compatibility with existing intellectual property cores. To improve the memory utilization and reduce the memory latency, an optimized memory controller is integrated in the presented NI. Experimental results with synthetic test cases demonstrate that the proposed on-chip network architecture provides significant improvements in average network latency (16%), average memory access latency (19%), and average memory utilization (22%).
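
The need for a reordering mechanism mentioned in this abstract can be illustrated with a generic sequence-number reorder buffer. This is a minimal sketch of the general idea only, not the streamlined mechanism of the proposed NI; the class `ReorderBuffer`, its `receive` method, and the assumption that each response carries a consecutive sequence number starting at 0 are hypothetical.

```python
# Generic reorder buffer (hypothetical sketch, not the paper's NI design).
class ReorderBuffer:
    def __init__(self):
        self.next_seq = 0      # sequence number that must be delivered next
        self.pending = {}      # out-of-order responses held back, keyed by seq

    def receive(self, seq, payload):
        """Accept one response and return the payloads now deliverable in order."""
        self.pending[seq] = payload
        ready = []
        # Release the longest run of consecutive responses starting at next_seq.
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


rob = ReorderBuffer()
delivered = []
# Responses arrive out of order, e.g. from memories with different latencies.
for seq, payload in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
    delivered.extend(rob.receive(seq, payload))
```

A response is held until every earlier one has arrived, so the requester always sees the original request order, at the price of buffer space for the held responses.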

  • 100. Daneshtalab, M.
    et al.
    Ebrahimi, M.
    Liljeberg, P.
    Plosila, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. University of Turku, Finland.
    Pipeline-based interlayer bus structure for 3D networks-on-chip
    2010. In: Proceedings - 15th CSI International Symposium on Computer Architecture and Digital Systems, CADS 2010, 2010, p. 35-41, article id 5623524
    Conference paper (Refereed)
    Abstract [en]

    The structure of direct vertical interconnections, called Through Silicon Vias (TSVs), is an important issue in the realm of 3D ICs. Bus-based and network-based structures are the two dominant architectures for implementing TSVs as interlayer connections in 3D ICs. Both implementations have disadvantages: the former suffers from poor scalability and degraded performance at high injection rates, and the latter consumes more area and power. In this paper, we propose a novel pipeline bus structure for TSVs to improve the performance of the prior bus-based architecture. The presented structure can utilize a bi-synchronous FIFO for synchronization between stacked layers if the layers are fabricated in different technologies. Experimental results with synthetic test cases demonstrate that the proposed architecture gives significant improvements in average network latency. In addition, the hardware area and power consumption of the presented bus structure are 9% and 11% lower, respectively, than those of the typical TSV bus structure.
