  • 401.
    Mathew, J.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tomas, J.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Dubrova, Elena
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Two Fault Tolerant MIPS Processor Architectures for NOC Applications. 2003. In: Proceedings of NORCHIP’03, 2003. Conference paper (Refereed)
  • 402.
    Meganathan, D.
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A low-power, medium-resolution, high-speed CMOS pipelined ADC. 2010. In: 28th Norchip Conference, NORCHIP 2010, 2010, p. 5669438. Conference paper (Refereed)
    Abstract [en]

    This paper presents a systematic design approach for a low-power, medium-resolution, high-speed pipelined analog-to-digital converter (ADC). The ADC is implemented in a 180 nm digital CMOS technology. The converter achieves a signal-to-noise-and-distortion ratio (SNDR) of 59.8 dB, a spurious-free dynamic range (SFDR) of 89 dB and an effective number of bits (ENOB) of 9.64 at a sampling rate of 50 MHz with a 4 MHz input signal. The peak differential nonlinearity (DNL) of the converter is +0.28/-0.17 LSB and its integral nonlinearity (INL) is +0.42/-0.41 LSB. The proposed 10-bit, 50 MS/s pipelined ADC consumes 24.5 mW from a 1.8 V supply.
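
    The quoted ENOB follows from the SNDR through the standard ideal-quantizer relation ENOB = (SNDR - 1.76 dB) / 6.02 dB. The sketch below checks that relation against the numbers in the abstract and, as an illustrative extra that the abstract does not state, computes the common Walden figure of merit P / (2^ENOB * f_s). The function names are our own.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Effective number of bits from SNDR, assuming a full-scale sine input."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w: float, enob: float, fs_hz: float) -> float:
    """Walden figure of merit in joules per conversion step: P / (2**ENOB * fs)."""
    return power_w / (2 ** enob * fs_hz)


# Values quoted in the abstract: SNDR 59.8 dB, 24.5 mW, 50 MS/s.
enob = enob_from_sndr(59.8)
fom = walden_fom(24.5e-3, enob, 50e6)
print(f"ENOB = {enob:.2f} bits")  # ENOB = 9.64 bits
print(f"FOM  = {fom * 1e12:.2f} pJ/conv-step")
```

    The reported 9.64 bits is consistent with the reported 59.8 dB SNDR, so the two figures are not independent measurements.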

  • 403.
    Meincke, Thomas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    Ellervee, Peeter
    Öberg, Johnny
    Kumar, Shashi
    Lindqvist, Dan
    Ericsson AB.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Postula, Adam
    Univ. of Queensland.
    Evaluating benefits of Globally Asynchronous Locally Synchronous VLSI Architecture. 1998. Conference paper (Refereed)
  • 404.
    Meincke, Thomas
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, Shashi
    KTH, School of Information and Communication Technology (ICT).
    Ellervee, Peeter
    KTH, School of Information and Communication Technology (ICT).
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Olsson, Thomas
    Dept. of Applied Electronics, Univ. of Lund.
    Nilsson, Peter
    Dept. of Applied Electronics, Univ. of Lund.
    Lindqvist, Dan
    Dept. of Computer Science, IIT New Delhi.
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    Globally asynchronous locally synchronous architecture for large high-performance ASICs. 1999. Vol. 2, p. 512-515. Conference paper (Refereed)
    Abstract [en]

    Clock nets are the major source of power consumption in large, high-performance ASICs and a design bottleneck with respect to tolerable clock skew. One way to avoid the global clock net is to partition the design into large synchronous blocks, each with its own clock. Data are exchanged with other blocks asynchronously using handshake signals. Adopting such a strategy requires a methodology that supports: 1) a partitioning method that divides a design into a number of synchronous blocks such that the gain from removing the global clock net exceeds the communication overhead, and 2) synthesis of handshake protocols to implement the data transfer between synchronous blocks. We describe this methodology and present results of applying it to a realistic design in 0.25 µm technology, with operating frequencies ranging from 20 MHz to 1 GHz. The results show that the net power savings compared to fully synchronous designs average about 30%.
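
    The abstract does not say which handshake protocol is synthesized. Assuming, as one common choice, a four-phase (return-to-zero) bundled-data handshake, the sketch below traces the req/ack signals for each word sent from one synchronous block to another. The class and signal names are hypothetical, and synchronizers and clock-domain timing are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class FourPhaseChannel:
    """Toy model of a four-phase bundled-data handshake (req/ack wires plus data)."""
    req: int = 0
    ack: int = 0
    data: object = None
    trace: list = field(default_factory=list)

    def _log(self, event: str) -> None:
        self.trace.append((event, self.req, self.ack))

    def transfer(self, value, sink: list) -> None:
        assert self.req == 0 and self.ack == 0, "channel must be idle"
        # Phase 1: sender drives the data, then raises req (data valid before req).
        self.data = value
        self.req = 1
        self._log("req+")
        # Phase 2: receiver latches the data and raises ack.
        sink.append(self.data)
        self.ack = 1
        self._log("ack+")
        # Phase 3: sender sees ack and lowers req.
        self.req = 0
        self._log("req-")
        # Phase 4: receiver sees req low and lowers ack; the channel is idle again.
        self.ack = 0
        self._log("ack-")


received = []
ch = FourPhaseChannel()
for word in [0x12, 0x34, 0x56]:
    ch.transfer(word, received)
print(received)      # [18, 52, 86]
print(ch.trace[:4])  # [('req+', 1, 0), ('ack+', 1, 1), ('req-', 0, 1), ('ack-', 0, 0)]
```

    Each transfer costs four signal transitions, which is part of the communication overhead that the partitioning step has to weigh against the clock-net savings.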

  • 405.
    Meincke, Thomas
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ellervee, Peeter
    Hemani, Ahmed
    Tenhunen, Hannu
    KTH, Superseded Departments, Electronic Systems Design.
    A Generic Scheme for Communication Representation and Mapping. 1999. Conference paper (Refereed)
  • 406.
    Millberg, Mikael
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Architectural Techniques for Improving Performance in Networks on Chip. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The main aim of this thesis is to propose techniques for enhancing performance in Networks on Chip (NoC). In addition, a concrete proposal for a protocol stack within our NoC platform Nostrum is presented. Nostrum inherently supports both Best Effort and Guaranteed Throughput traffic delivery. It employs a deflective routing scheme for best-effort traffic, which gives the switches a small footprint combined with robustness to disturbances in the network. For traffic with hard guarantees, a TDMA-based scheme is used. The transmission process in a NoC involves several stages. In the included papers, I propose a set of strategies to enhance performance in several of these stages. The strategies are summarised as follows:

    Temporally Disjoint Networks: a physical network can potentially be viewed as containing a set of separate networks, and which of them a packet enters depends on when it enters the physical network. As a consequence, different traffic types can be carried in the different networks.

    Looped Containers provide a means to set up virtual circuits in networks using deflective routing. High-priority container packets are inserted into the network to follow a predefined, closed route between source and destination. At the sender side the containers are loaded and sent to the destination, where they are unloaded and sent back.

    Proximity Congestion Awareness reduces the load of the network by diverting packets away from congested areas. It can increase the maximum traffic load by a factor of 20.

    Dual Packet Exit increases the exit bandwidth of the network, leading to a 50 percent reduction in worst-case latency and a 30 percent reduction in average latency, as well as lower buffer usage.

    Priority Based Forced Requeue prematurely lifts low-priority packets out of the network to be requeued. Packets that have not yet entered the network compete with packets inside the network, which gives tighter bounds on admission and reduces worst-case latencies by 50 percent.

    Furthermore, Operational Efficiency is proposed as a measure of how effective a network is; it is defined as the throughput per buffer used in the system. Increasing packet injection to raise system throughput has an associated cost, which can be optimised to save energy.
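
    One way to see why temporally disjoint networks can exist, under assumptions the abstract does not spell out (a 2-D mesh, deflective routing in which every packet advances exactly one hop per clock cycle, no buffering inside switches): each hop changes x + y by one while the cycle count t increases by one, so (x + y + t) mod 2 never changes. Two packets whose values of this parity differ at injection can therefore never occupy the same switch in the same cycle. The sketch checks the invariant on random deflected paths; `subnetwork` and `deflected_path` are hypothetical helpers.

```python
import random


def subnetwork(x: int, y: int, t: int) -> int:
    """Index of the temporally disjoint network for a packet at node (x, y) in cycle t."""
    return (x + y + t) % 2


def deflected_path(x, y, t, hops, size, rng):
    """Yield positions of a packet that moves exactly one hop per cycle in a size x size mesh.

    The direction is chosen at random to mimic arbitrary deflections.
    """
    for _ in range(hops):
        moves = [(dx, dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < size and 0 <= y + dy < size]
        dx, dy = rng.choice(moves)
        x, y, t = x + dx, y + dy, t + 1
        yield x, y, t


rng = random.Random(1)
for _ in range(1000):
    x0, y0, t0 = rng.randrange(4), rng.randrange(4), rng.randrange(100)
    start = subnetwork(x0, y0, t0)
    assert all(subnetwork(*p) == start for p in deflected_path(x0, y0, t0, 50, 4, rng))
print("parity of (x + y + t) preserved on every path")
```

    Under these assumptions the mesh splits into two such networks, and the injection cycle (together with the injection node) decides which one a packet joins.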

  • 407.
    Millberg, Mikael
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Increasing NoC performance and utilisation using a Dual Packet Exit strategy. 2007. In: DSD 2007: 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools, Proceedings / [ed] Kubatova, H., Los Alamitos: IEEE Computer Soc., 2007, p. 511-518. Conference paper (Refereed)
    Abstract [en]

    When designing a network, the use of buffers is inevitable. Buffers are used at the entry points, inside, and at the exits of the network, and their usage significantly affects the performance of the system as a whole. To improve buffer utilisation, we introduce the concept of letting more than one packet exit the network at every switch in each clock cycle: Dual Packet Exit (DPE). The approach is evaluated on 4x4 and 6x6 meshes. We study the buffer usage in combination with different routing strategies for best-effort performance. Our results show a 50% reduction in worst-case latency and a 30% reduction in average latency, as well as increased throughput from both a system and a network perspective. We define the term Operational Efficiency as a measure of network efficiency and show that it increases by roughly 20% with the DPE technique.
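
    The effect of letting two packets leave a switch per cycle can be illustrated with a deliberately simplified model. The sketch assumes that packets reaching their destination switch wait in a FIFO until an exit slot is free; in a deflection-routed network such as Nostrum they would instead be deflected and return later, so only the trend carries over, not the magnitudes. `simulate_exit`, the arrival burst, and `operational_efficiency` (one plain reading of "throughput per buffer used") are hypothetical.

```python
from collections import deque


def simulate_exit(arrivals, exits_per_cycle):
    """arrivals[c] = packets reaching the destination switch in cycle c.

    Returns (worst-case wait, average wait) in cycles before each packet exits.
    """
    waiting = deque()  # arrival cycle of each packet still inside the network
    waits = []
    cycle = 0
    while cycle < len(arrivals) or waiting:
        if cycle < len(arrivals):
            waiting.extend([cycle] * arrivals[cycle])
        for _ in range(min(exits_per_cycle, len(waiting))):
            waits.append(cycle - waiting.popleft())
        cycle += 1
    return max(waits), sum(waits) / len(waits)


def operational_efficiency(packets_delivered, cycles, buffers_used):
    """Throughput (packets per cycle) divided by the number of buffers in use."""
    return packets_delivered / cycles / buffers_used


burst = [2, 2, 2, 0, 0, 0]
print(simulate_exit(burst, exits_per_cycle=1))  # (3, 1.5)
print(simulate_exit(burst, exits_per_cycle=2))  # (0, 0.0)
```

    With a single exit port the backlog grows during the burst; a second port drains it as it arrives, which is the intuition behind the latency and buffer-usage reductions reported above.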

  • 408.
    Muddukrishna, Ananya
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastructure, Software and Computer Systems, SCS.
    Vlassov, Vladimir
    KTH, School of Information and Communication Technology (ICT), Communication: Services and Infrastructure, Software and Computer Systems, SCS.
    A Locality Approach to Architecture-aware Task-scheduling in OpenMP. 2011. Conference paper (Refereed)
    Abstract [en]

    Multicore and other parallel computer systems increasingly expose architectural aspects such as memory access latencies that differ depending on the physical memory address or location. To achieve high performance, programmers need to take these non-uniformities into consideration, but doing so not only complicates the programming process but also leads to code that is not performance-portable between architectures.

    Task-centric programming models, such as OpenMP tasks, relieve the programmer from explicitly mapping computation onto threads while still enabling effective resource management. We propose a task scheduling approach that uses programmer annotations and architecture awareness to identify the location of the data regions an OpenMP task operates on. We have made an initial implementation of such a locality-aware OpenMP task scheduler for the Tilera TILEPro64 architecture and present initial results showing its effectiveness in minimizing non-uniform access latencies to data and resources.
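
    A minimal sketch of the idea, assuming (hypothetically) that each task is annotated with the byte ranges it touches and that memory pages are interleaved round-robin across nodes: the scheduler counts how many of the task's bytes live on each node and enqueues the task on the queue of the node holding the most. The paper's actual annotation syntax and the TILEPro64 memory mapping are not described in the abstract; all names here are ours.

```python
from collections import Counter


def home_node(addr: int, page_size: int, num_nodes: int) -> int:
    """Assumed mapping: pages are interleaved round-robin across memory nodes."""
    return (addr // page_size) % num_nodes


def bytes_per_node(regions, page_size, num_nodes):
    """Count how many bytes of the annotated (start, length) regions live on each node."""
    counts = Counter()
    for start, length in regions:
        addr, end = start, start + length
        while addr < end:
            page_end = (addr // page_size + 1) * page_size
            chunk = min(end, page_end) - addr  # bytes of this region on the current page
            counts[home_node(addr, page_size, num_nodes)] += chunk
            addr += chunk
    return counts


def schedule(tasks, page_size=4096, num_nodes=4):
    """Place each (name, regions) task on the queue of the node holding most of its data."""
    queues = [[] for _ in range(num_nodes)]
    for name, regions in tasks:
        node = bytes_per_node(regions, page_size, num_nodes).most_common(1)[0][0]
        queues[node].append(name)
    return queues


tasks = [("a", [(0, 100), (4096, 4096)]),  # mostly page 1 -> node 1
         ("b", [(8192, 512)]),             # page 2 -> node 2
         ("c", [(16384, 64)])]             # page 4 -> node 0
print(schedule(tasks))  # [['c'], ['a'], ['b'], []]
```

    A real runtime would also need work stealing or a fallback for load balance; this sketch shows only the locality decision.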

  • 409.
    Muddukrishna, Ananya
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jonsson, Peter A.
    SICS Swedish ICT AB.
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. SICS Swedish ICT AB.
    Locality-aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors. 2015. In: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, article id 981759. Article in journal (Refereed)
    Abstract [en]

    Performance degradation due to non-uniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and on manycore processors is necessary to reduce the impact of non-uniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load balancing at the expense of locality and ignore NUMA node access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking about NUMA architecture details by delegating data distribution to the runtime system, and it uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor, and we show that data distribution and locality-aware task scheduling improve performance by up to 69% for scientific benchmarks compared to default policies while providing an architecture-oblivious approach for programmers.
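
    The remark that default policies "ignore NUMA node access latencies" can be made concrete with a small cost model. The sketch assumes a task's dependences are already distributed over nodes (bytes per node) and that relative access costs come from a SLIT-style distance matrix in which 10 means local access; the matrix values and function names are illustrative assumptions, not the paper's runtime. Weighting bytes by distance can select a different node than simply following the largest share of data.

```python
def access_cost(node, task_bytes, distance):
    """Sum of bytes touched times relative distance from `node` to where they live."""
    return sum(nbytes * distance[node][home] for home, nbytes in task_bytes.items())


def best_node(task_bytes, distance):
    """Node minimizing the latency-weighted cost of the task's data accesses."""
    return min(range(len(distance)), key=lambda n: access_cost(n, task_bytes, distance))


# Illustrative 4-node distance matrix (10 = local access); values are made up.
DIST = [[10, 16, 16, 22],
        [16, 10, 22, 16],
        [16, 22, 10, 16],
        [22, 16, 16, 10]]

task = {0: 1000, 1: 900, 3: 1100}  # bytes of task data homed on each node
print(max(task, key=task.get))     # 3: node with the largest share of the data
print(best_node(task, DIST))       # 1: lowest distance-weighted access cost
```

    Here node 3 holds the most data, but node 1 is close to both node 0 and node 3, so its total weighted cost (42600) is lower than node 3's (47400).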

  • 410.
    Mäntysalo, Matti
    et al.
    TUT.
    Xie, Li
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Jonsson, Fredrik
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Feng, Yi
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Cabezas, Ana Lopez
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    System integration of smart packages using printed electronics. 2012. In: Electronic Components and Technology Conference (ECTC), 2012 IEEE 62nd, IEEE, 2012, p. 997-1002. Conference paper (Refereed)
    Abstract [en]

    The last decade has seen enormous interest in additive and printed electronics manufacturing technologies, especially for intelligent packaging. Scientists and engineers all over the world are developing printed organic circuits. Despite these efforts, the performance and yield of all-printed devices cannot replace silicon-based devices in smart package applications. We have therefore developed a hybrid interconnection platform to seamlessly integrate printed electronics with silicon-based electronics, to close the gap between the two technologies, and to prepare for the adoption of printed electronic technologies. We studied the suitability of a printed interconnection platform by fabricating a printed sensor box that contains printed nano-Ag interconnections on low-temperature plastic, a printable humidity sensor based on functionalized multi-walled carbon nanotubes (MWCNTs), a printed battery, conventional SMDs, and a silicon-based MCU.

  • 411.
    Naeem, Abdul
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Architecture Support and Scalability Analysis of Memory Consistency Models in Network-on-Chip based Systems. 2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Shared memory systems should support parallelization at the computation (multi-core), communication (Network-on-Chip, NoC) and memory architecture levels to exploit the potential performance benefits. Parallel systems that support the shared memory abstraction, in both the general-purpose and application-specific domains, face the critical issue of memory consistency. The memory consistency issue arises because unconstrained memory operations can lead to unexpected behavior of shared memory systems. Memory consistency models enforce ordering constraints on memory operations so that shared memory systems behave as expected. The intuitive Sequential Consistency (SC) model enforces strict ordering constraints on memory operations and does not take advantage of system optimizations in either hardware or software. Relaxed memory consistency models, by contrast, relax the ordering constraints on memory operations and exploit these optimizations to enhance system performance at a reasonable cost. The purpose of this thesis is twofold. First, novel architectural support is provided for different memory consistency models, namely SC, Total Store Ordering (TSO), Partial Store Ordering (PSO), Weak Consistency (WC), Release Consistency (RC) and Protected Release Consistency (PRC), in NoC-based multi-core (McNoC) systems. The PRC model is proposed as an extension of the RC model that provides additional reordering and relaxation of memory operations. Second, a scalability analysis of these memory consistency models is performed in McNoC systems.
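
    The difference between SC and a relaxed model such as TSO is easiest to see on the classic store-buffering litmus test, which is a textbook example rather than one taken from the thesis: thread 0 runs x = 1; r1 = y and thread 1 runs y = 1; r2 = x. The sketch enumerates every execution, modelling TSO with a per-thread FIFO store buffer that drains to memory at arbitrary times and that a thread's own loads check first. Under SC the outcome r1 = r2 = 0 is impossible; under TSO it is allowed.

```python
# Store-buffering litmus test: each instruction is (op, variable, value-or-register).
PROGRAMS = (
    (("st", "x", 1), ("ld", "y", "r1")),
    (("st", "y", 1), ("ld", "x", "r2")),
)


def outcomes(tso: bool) -> set:
    """Enumerate all final (r1, r2) values under SC (tso=False) or TSO (tso=True)."""
    results = set()

    def explore(pcs, bufs, mem, regs):
        if all(pc == len(p) for pc, p in zip(pcs, PROGRAMS)) and not any(bufs):
            results.add((regs["r1"], regs["r2"]))
            return
        for t, prog in enumerate(PROGRAMS):
            if pcs[t] < len(prog):  # thread t executes its next instruction
                op, var, arg = prog[pcs[t]]
                next_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and tso:  # store enters the thread's store buffer
                    next_bufs = bufs[:t] + (bufs[t] + ((var, arg),),) + bufs[t + 1:]
                    explore(next_pcs, next_bufs, mem, regs)
                elif op == "st":  # SC: store goes straight to memory
                    explore(next_pcs, bufs, {**mem, var: arg}, regs)
                else:  # load: newest matching store in own buffer, else memory
                    val = next((v for k, v in reversed(bufs[t]) if k == var), mem[var])
                    explore(next_pcs, bufs, mem, {**regs, arg: val})
            if bufs[t]:  # TSO only: drain the oldest buffered store of thread t
                (var, val), rest = bufs[t][0], bufs[t][1:]
                explore(pcs, bufs[:t] + (rest,) + bufs[t + 1:], {**mem, var: val}, regs)

    explore((0, 0), ((), ()), {"x": 0, "y": 0}, {})
    return results


print(sorted(outcomes(tso=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

    The extra outcome is exactly the write-to-read reordering that TSO permits and SC forbids; it is the kind of freedom the relaxed models exploit for performance.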

    Architectural support for these memory consistency models is provided in the McNoC platforms. Each configurable McNoC platform uses a packet-switched 2-D mesh NoC with a deflection routing policy, distributed shared memory (DSM), distributed locks and a customized processor interface. The memory consistency models/protocols are implemented in the customized processor interfaces, which are developed to integrate the processors with the rest of the system. The realization schemes for the memory consistency models are based on novel transaction-counter-based and address-stack-based approaches. A transaction counter is used in each node of the network to keep track of the outstanding memory operations issued by a processor. An address stack is used in each node of the network to keep track of the addresses of those outstanding memory operations. These hardware structures are used in the processor interface to enforce the global orders required by the different memory consistency models. The realization scheme of the PRC model additionally uses an acquire counter to further classify data operations as unprotected or protected.
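
    A minimal sketch of how the two structures could cooperate in a processor interface, under our own simplifying assumptions: the counter tracks how many memory operations are outstanding, the address stack records their addresses, an ordinary access may overtake older ones unless it targets an address that is still outstanding, and a synchronization operation (for example a release) may proceed only when the counter is zero. The class and method names are hypothetical, and the per-model rules in the thesis are more detailed.

```python
class ProcessorInterface:
    """Hypothetical transaction counter plus address stack for one network node."""

    def __init__(self) -> None:
        self.transaction_counter = 0  # number of outstanding memory operations
        self.address_stack = []       # addresses of those operations

    def can_issue(self, addr: int) -> bool:
        # Different addresses may be reordered; same-address order is preserved.
        return addr not in self.address_stack

    def issue(self, addr: int) -> None:
        if not self.can_issue(addr):
            raise RuntimeError(f"stall: access to {addr:#x} still outstanding")
        self.transaction_counter += 1
        self.address_stack.append(addr)

    def complete(self, addr: int) -> None:
        # Called when the memory reply (load data or store acknowledgement) arrives.
        self.address_stack.remove(addr)
        self.transaction_counter -= 1

    def can_synchronize(self) -> bool:
        # A fence or release may only proceed once every earlier operation is done.
        return self.transaction_counter == 0


pi = ProcessorInterface()
pi.issue(0x100)
pi.issue(0x200)
print(pi.can_issue(0x100), pi.can_issue(0x300), pi.can_synchronize())  # False True False
pi.complete(0x200)
pi.complete(0x100)
print(pi.can_synchronize())  # True
```

    Under SC the interface would instead allow at most one outstanding operation, which is why the relaxed models can overlap memory latency.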

    The scalability analysis of these memory consistency models is based on different workloads that are developed and mapped onto networks of various sizes. The scalability study is conducted in McNoC systems with 1 to 64 cores, using various applications with different problem sizes and traffic patterns. Performance metrics such as execution time, performance, speedup, overhead and efficiency are evaluated as functions of network size. The experiments use both synthetic and application workloads. The experimental results under different application workloads show that the average execution time under the relaxed memory consistency models decreases relative to the SC model. The specific numbers are highly sensitive to the application and depend on how well it matches the architecture. This study shows that the performance improvement of the relaxed memory consistency models over the SC model depends on the computation-to-communication ratio, traffic patterns, data-to-synchronization ratio and problem size. As observed in the experiments, the performance improvement of the PRC and RC models over the SC model tends to exceed 50% as the system is scaled up further.
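
    The abstract lists the metrics without defining them. Using the conventional definitions (speedup S(n) = T(1)/T(n), efficiency E(n) = S(n)/n, total overhead n*T(n) - T(1)), which may differ in detail from those in the thesis, a sketch of how they would be derived from measured execution times looks as follows. The timing values are placeholders for illustration, not results from the thesis.

```python
def scalability_metrics(times: dict) -> dict:
    """times maps core count n -> execution time T(n); T(1) must be present."""
    t1 = times[1]
    metrics = {}
    for n, tn in sorted(times.items()):
        speedup = t1 / tn
        metrics[n] = {
            "speedup": speedup,
            "efficiency": speedup / n,
            "overhead": n * tn - t1,  # extra core-time spent compared with one core
        }
    return metrics


# Placeholder timings in arbitrary units, not measurements.
for n, m in scalability_metrics({1: 100.0, 4: 30.0, 16: 10.0}).items():
    print(n, {k: round(v, 3) for k, v in m.items()})
```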

  • 412.
    Naeem, Abdul
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Shared Memory Consistency Models Evaluation in NoC based Multicore Systems. 2012. In: Design, Automation and Test in Europe (DATE 2012), PhD Forum, Dresden, Germany: EDAA / ACM SIGDA, 2012. Conference paper (Refereed)
    Abstract [en]

    This paper gives an overview of our study of various shared memory consistency models, namely Sequential Consistency (SC), Weak Consistency (WC), Release Consistency (RC) and Protected Release Consistency (PRC), in Network-on-Chip (NoC) based Distributed Shared Memory (DSM) multi-core systems. These memory models are implemented using a unified transaction counter (TC) based approach in the NoC-based systems. The performance gain observed for the relaxed WC, RC and PRC models under various benchmarks is between 20% and 50% compared to the strict SC model.

  • 413.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Realization and Performance Comparison of Sequential and Weak Memory Consistency Models in Network-on-Chip based Multi-core Systems. 2011. In: Proceedings of 16th ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC) 2011, IEEE Press, 2011, p. 154-159. Conference paper (Refereed)
    Abstract [en]

    This paper studies the realization and performance comparison of the sequential and weak consistency models in network-on-chip (NoC) based distributed shared memory (DSM) multi-core systems. Memory consistency constrains the order of shared memory operations so that multi-core systems behave as expected. Both consistency models are realized in NoC-based multi-core systems. The performance of the two consistency models is compared for various network sizes using regular mesh topologies and a deflection routing algorithm. The results show that, as the system grows from a single core to 64 cores, weak consistency improves performance by 46.17% and 33.76% on average in code and consistency latencies, respectively, over the sequential consistency model, due to relaxation of the program order.

  • 414.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability of Relaxed Consistency Models in NoC based Multicore Architectures. 2009. In: SIGARCH Computer Architecture News, ISSN 0163-5964, E-ISSN 1943-5851, Vol. 37, no 5, p. 8-15. Article in journal (Other academic)
    Abstract [en]

    This paper studies the realization of relaxed memory consistency models in network-on-chip based distributed shared memory (DSM) multi-core systems. Within DSM systems, memory consistency is a critical issue since it affects not only the performance but also the correctness of programs. We investigate the scalability of the relaxed consistency models (weak and release consistency) implemented using transaction counters. Our experimental results compare the average and maximum code, synchronization and data latencies of the two consistency models for various network sizes with regular mesh topologies. The observed latencies rise for both consistency models as the network size grows. However, the scaling behaviors differ: with the release consistency model these latencies grow significantly more slowly than with weak consistency, owing to greater optimization potential through overlapping, reordering and program order relaxations. For the specific application, release consistency improves performance by 15.6% and 26.5% on average in code and consistency latencies, respectively, over the weak consistency model as the system grows from a single core to 64 cores. The latency of data transactions grows 2.2 times faster on average with the weak consistency model than with the release consistency model when the system scales from a single core to 64 cores.

  • 415.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability of Weak Consistency in NoC based Multicore Architectures. 2010. In: IEEE INT SYMP CIRC SYST PROC, New York: IEEE, 2010, p. 3497-3500. Conference paper (Refereed)
    Abstract [en]

    In a multicore Network-on-Chip, it is preferable to realize distributed but shared memory (DSM) in order to reuse the large body of legacy code and to ease programming. Within DSM systems, memory consistency is a critical issue since it affects not only performance but also the correctness of programs. In this paper, we investigate the scalability of the weak consistency model, which may be implemented using a transaction counter. The experimental results compare synchronization latencies for various network sizes, topologies and lock positions in the network. Average synchronization latency rises exponentially for mesh and torus topologies as the network size grows. However, the torus improves synchronization latency compared to the mesh. For the mesh topology, average synchronization latency is also slightly affected by the lock position relative to the network center.
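
    Part of the topology and lock-position effect can be seen from hop counts alone. The sketch computes the average minimal hop distance from every node of a k x k network to the node holding a lock, assuming minimal routing and ignoring contention and deflections, so it says nothing about how fast latency grows with network size. In a mesh the distance depends on where the lock sits; in a torus the wraparound links make every position equivalent. `avg_hops` is a hypothetical helper.

```python
def avg_hops(k: int, lock: tuple, torus: bool) -> float:
    """Average minimal hop count from all k*k nodes to the lock node."""
    lx, ly = lock

    def dist(a: int, b: int) -> int:
        d = abs(a - b)
        return min(d, k - d) if torus else d  # a torus may route around the wrap link

    total = sum(dist(x, lx) + dist(y, ly) for x in range(k) for y in range(k))
    return total / (k * k)


for topo in (False, True):
    name = "torus" if topo else "mesh"
    print(name, avg_hops(8, (3, 3), topo), avg_hops(8, (0, 0), topo))
# mesh 4.0 7.0
# torus 4.0 4.0
```

    In an 8x8 mesh, moving the lock from a central node to a corner raises the average distance from 4 to 7 hops, while in the torus it stays at 4 hops wherever the lock is placed.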

  • 416.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Xiaowen
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Realization and Scalability of Release and Protected Release Consistency Models in NoC based Systems. 2011. In: Proceeding of 14th Euromicro Conference on Digital System Design, 2011, Oulu: IEEE Computer Society, 2011, p. 47-54. Conference paper (Refereed)
    Abstract [en]

    This paper studies the realization and scalability of release and protected release consistency models in Network-on-Chip (NoC) based Distributed Shared Memory (DSM) multi-core systems. The protected release consistency (PRC) model is proposed as an extension of the release consistency (RC) model and provides further relaxation of shared memory operations. The realization schemes of the RC and PRC models use a transaction counter in each node of the NoC-based multi-core (McNoC) systems. Further, we study the scalability of the RC and PRC models and evaluate their performance on the McNoC platform. A configurable NoC-based platform with a 2D mesh topology and a deflection routing algorithm is used in the tests. We experiment with both synthetic and application workloads. The performance of the RC and PRC models is compared using sequential consistency (SC) as the baseline. The experiments show that the average code execution time for the PRC model in an 8x8 network (64 cores) is reduced by 30.5% compared to SC and by 6.5% compared to RC. The average data execution time in the 8x8 network for the PRC model is reduced by almost 37% compared to SC and by 8.8% compared to RC. The area increase of PRC over RC is about 880 gates in the network interface (1.7%).

  • 417.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Architecture Support and Comparison of Three Memory Consistency Models in NoC based Syst. 2012. In: Proceedings of 15th EUROMICRO Conference on Digital System Design: Architectures, Methods and Tools (DSD 2012), IEEE Computer Society, 2012, p. 304-311. Conference paper (Refereed)
    Abstract [en]

    We propose novel hardware support for three relaxed memory models, Release Consistency (RC), Partial Store Ordering (PSO) and Total Store Ordering (TSO), in Network-on-Chip (NoC) based distributed shared memory multicore systems. The RC model is realized using a Transaction Counter and Address Stack based approach, while the PSO and TSO models are realized using a Write Transaction Counter and Write Address Stack based approach. In the experiments, we use a configurable platform based on a 2D mesh NoC with a deflection routing policy. The results show that under synthetic workloads, the average execution time for the RC, PSO and TSO models in an 8x8 network (64 cores) is reduced by 35.8%, 22.7% and 16.5%, respectively, compared to the Sequential Consistency (SC) model. Under different application workloads, the average speedup for the RC, PSO and TSO models in the 8x8 network is increased by 34.3%, 10.6% and 8.9%, respectively, over the SC model. The area cost at the processor interface for the TSO, PSO and RC models is less than 2% higher than for the SC model.
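
    The three relaxed models differ in which program-order constraints between ordinary memory accesses they drop. Following the widely used classification from the memory-consistency literature (not restated in the abstract), TSO relaxes write-to-read order, PSO additionally relaxes write-to-write order, and RC relaxes all four orders between ordinary accesses while keeping them ordered around acquire/release synchronization. The table below encodes that classification; the encoding and names are ours, and synchronization ordering is not modeled. Because only writes may be overtaken under TSO and PSO, tracking outstanding writes is enough there, which is consistent with the write-only counters and stacks described above.

```python
# Program-order pairs (earlier, later) that each model may reorder between
# ordinary accesses to different addresses. "W" = write, "R" = read.
RELAXED = {
    "SC": set(),
    "TSO": {("W", "R")},
    "PSO": {("W", "R"), ("W", "W")},
    "RC": {("W", "R"), ("W", "W"), ("R", "R"), ("R", "W")},
}


def may_reorder(model: str, earlier: str, later: str) -> bool:
    """True if `model` lets `later` complete before `earlier` (different addresses)."""
    return (earlier, later) in RELAXED[model]


for model in RELAXED:
    allowed = [f"{a}->{b}" for a in "WR" for b in "WR" if may_reorder(model, a, b)]
    print(f"{model:4}", allowed or "none")
```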

  • 418.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability Analysis of Memory Consistency Models in NoC-based Distributed Shared Memory SoCs. 2013. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, ISSN 0278-0070, E-ISSN 1937-4151, Vol. 32, no 5, p. 760-773. Article in journal (Refereed)
    Abstract [en]

    We analyze the scalability of six memory consistency models in network-on-chip (NoC)-based distributed shared memory multicore systems: 1) protected release consistency (PRC); 2) release consistency (RC); 3) weak consistency (WC); 4) partial store ordering (PSO); 5) total store ordering (TSO); and 6) sequential consistency (SC). Their realizations are based on a transaction counter and an address-stack-based approach. The scalability analysis is based on different workloads mapped onto networks of various sizes using different problem sizes. For the experiments, we use a configurable multicore platform based on the Nostrum NoC, with a 2-D mesh topology and a deflection routing algorithm. Under the synthetic workloads, the average execution time for the PRC, RC, WC, PSO, and TSO models in the 8 x 8 network (64 cores) is reduced by 32.3%, 28.3%, 20.1%, 13.8%, and 9.9%, respectively, compared to the SC model. For the application workloads, as the network size grows, the average execution time under these relaxed memory models decreases relative to the SC model, depending on the application and its match to the architecture. As observed in the experiments, the performance improvement of the PRC and RC models over the SC model tends to exceed 50% as the system is scaled up further. The area cost in the network interface of the relaxed memory models is less than 4% higher than that of the SC model.
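
    Execution-time reductions and speedups are related by S = 1 / (1 - r), where r is the fractional reduction in execution time relative to the baseline. The sketch applies that conversion to the synthetic-workload reductions quoted above, assuming each figure refers to a single average execution time relative to SC. Because the abstract reports averages, the converted values are only approximate speedups.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup over the baseline when execution time shrinks by `reduction` (0 <= r < 1)."""
    return 1.0 / (1.0 - reduction)


# Average execution-time reductions over SC in the 8 x 8 network (synthetic workloads).
reductions = {"PRC": 0.323, "RC": 0.283, "WC": 0.201, "PSO": 0.138, "TSO": 0.099}
for model, r in reductions.items():
    print(f"{model}: {speedup_from_reduction(r):.2f}x")
```

    A 32.3% shorter execution time, for instance, corresponds to roughly a 1.48x speedup. The gap between "percent reduction" and "percent speedup" widens as the reduction grows.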

  • 419.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability analysis of release and sequential consistency models in NoC based multicore systems2012In: 2012 International Symposium on System on Chip, SoC 2012, IEEE, 2012, p. 6376350-Conference paper (Refereed)
    Abstract [en]

    We analyze the scalability of the Release Consistency (RC) and Sequential Consistency (SC) models as realized in Network-on-Chip (NoC)-based distributed shared memory multicore systems. The analysis is performed on the basis of workloads mapped on networks of different sizes with different data sets. The experiments use a configurable platform based on a 2D mesh NoC using a deflection routing algorithm. The results show that under the synthetic workloads using different distributed locks, the performance of the RC model improves by 17.6% to 54.6% over the SC model in the 64-core system. For the application workloads, as the network size grows from 1 to 64 cores, the execution time under the RC model decreases relative to the SC model, to a degree that depends on the application and its match to the architecture. The experiments indicate that, as the system is scaled up further, the performance improvement of the RC model over the SC model tends to exceed 50%.

  • 420.
    Naeem, Abdul
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lu, Zhonghai
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Scalability and Performance Evaluation of Memory Consistency Models in NoC based Multicore SoCs2012Conference paper (Other academic)
  • 421.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Reinforcement Learning Based Self-Optimization of Dynamic Fault-Tolerant Schemes in Performance-Aware RecoBlock SoCs2015Report (Other academic)
    Abstract [en]

    Partial and run-time reconfiguration (RTR) technology has increased the range of opportunities and applications in the design of systems-on-chip (SoCs) based on Field-Programmable Gate Arrays (FPGAs). Nevertheless, RTR adds further complexity to the design process, particularly when embedded FPGAs have to deal with power and performance constraints in uncertain environments. Embedded systems will need to make autonomous decisions, develop cognitive properties such as self-awareness, and finally become self-adaptive to be deployed in the real world. Classic off-line modeling and programming methods are inadequate to cope with unpredictable environments. Reinforcement learning (RL) methods have been successfully explored to solve these complex optimization problems, mainly on workstation computers, yet they are rarely implemented in embedded systems. Disruptive integration technologies reaching atomic scales will increase the probability of fabrication errors and the sensitivity to electromagnetic radiation that can generate single-event upsets (SEUs) in the configuration memory of FPGAs. Dynamic fault-tolerant (FT) schemes are promising RTR hardware redundancy structures that improve dependability, but on the other hand, they increase memory system traffic. This article presents an FPGA-based SoC that is self-aware of its monitored hardware and uses an online RL method to self-optimize the decisions that maintain the desired system performance, particularly when triggering hardware acceleration and dynamic FT schemes on RTR IP-cores. Moreover, this article describes the main features of the RecoBlock SoC concept, gives an overview of RL theory, shows the Q-learning algorithm adapted for the dynamic fault-tolerance optimization problem, and presents its simulation in Matlab. Based on this investigation, the Q-learning algorithm will be implemented and verified on the RecoBlock SoC platform.

  • 422.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The RecoBlock SoC Platform: A Flexible Array of Reusable Run-Time-Reconfigurable IP-Blocks2013In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, 2013, p. 833-838Conference paper (Refereed)
    Abstract [en]

    Run-time reconfigurable (RTR) FPGAs combine the flexibility of software with the high efficiency of hardware. Still, their potential cannot be fully exploited due to the increased complexity of the design process. Consequently, to enable an efficient design flow, we devise a set of prerequisites to increase the flexibility and reusability of current FPGA-based RTR architectures. We apply these principles to design and implement the RecoBlock SoC platform, whose main characteristics are (1) an RTR plug-and-play IP-Core whose functionality is configured at run-time, (2) flexible inter-block communication configured via software, and (3) built-in buffers to support data-driven streams and inter-process communication. We illustrate the potential of our platform with a tutorial case study using an adaptive streaming application to investigate different combinations of reconfigurable arrays and schedules. The experiments underline the benefits of the platform and show its resource utilization.

  • 423.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Towards cognitive reconfigurable hardware: Self-aware learning in RTR fault-tolerant SoCs2015In: Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, article id 7238103Conference paper (Refereed)
    Abstract [en]

    Traditional embedded systems are evolving into power- and performance-domain self-aware intelligent systems in order to overcome complexity and uncertainty. Without human control, they need to remain operative in applications such as drone-based delivery or robotic space landing. Nowadays, the partial and run-time reconfiguration (RTR) of FPGA-based systems-on-chip (SoCs) can enable dynamic hardware acceleration or self-healing structures, but this also increases system-memory traffic. This paper introduces the basis of cognitive reconfigurable hardware and presents the design of an FPGA-based RTR SoC that becomes conscious of its monitored hardware and learns to make decisions that maintain a desired system performance, particularly when triggering hardware acceleration and dynamic fault-tolerant (FT) schemes on RTR cores. Self-awareness is achieved by evaluating monitored metrics in critical AXI-cores, supported by hardware performance counters. We suggest a reinforcement-learning algorithm that helps the system determine when, and which, reconfigurable FT scheme can be triggered. Executing random sequences of an embedded benchmark suite simulates unpredictability and bus traffic. The evaluation shows the effectiveness and implications of our approach.

  • 424.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. ESPE Universidad de Las Fuerzas Armadas, Ecuador.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    On providing scalable self-healing adaptive fault-tolerance to RTR SoCs2014In: Proceedings of ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on, 2014, p. 1-6Conference paper (Refereed)
    Abstract [en]

    The dependability of heterogeneous many-core FPGA-based systems is threatened by higher failure rates caused by disruptive scales of integration, increased design complexity, and radiation sensitivity. Triple-modular redundancy (TMR) and run-time reconfiguration (RTR) are traditional fault-tolerant (FT) techniques used to increase dependability. However, hardware redundancy is expensive, and most approaches have poor scalability, flexibility, and programmability. Therefore, innovative solutions are needed to reduce the cost of redundancy while still preserving acceptable levels of dependability. In this context, this paper presents the implementation of a self-healing adaptive fault-tolerant SoC that reuses RTR IP-cores in order to self-assemble different TMR schemes during run-time. The presented system demonstrates the feasibility of the Upset-Fault-Observer concept, which provides a run-time self-test and recovery strategy that delivers fault tolerance over functions accelerated in RTR cores, while reducing the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles. In addition, this paper experimentally evaluates the trade-offs of the implemented reconfigurable TMR schemes by characterizing important fault-tolerance metrics, i.e., recovery time (self-repair and self-replicate), detection latency, self-assembly latency, throughput reduction, and increase in physical resources.

  • 425.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    The Upset-Fault-Observer: A Concept for Self-healing Adaptive Fault Tolerance2014In: Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2014, IEEE Computer Society, 2014, p. 89-96Conference paper (Refereed)
    Abstract [en]

    As integration advances toward atomic scales, components become highly defective and unstable during their lifetime. This demands paradigm shifts in electronic systems design. FPGAs are particularly sensitive to cosmic and other kinds of radiation that produce single-event upsets (SEUs) in configuration and internal memories. Typical fault-tolerance (FT) techniques combine triple-modular-redundancy (TMR) schemes with run-time reconfiguration (RTR). However, even the most successful approaches disregard the low suitability of fine-grain redundancy in nano-scale design, the poor scalability and programmability of application-specific architectures, the small performance-consumption ratio of board-level designs, or the scarce optimization capability of rigid redundancy structures. In that context, we introduce an innovative solution that exploits the flexibility, reusability, and scalability of a modular RTR SoC approach and reuses existing RTR IP-cores in order to assemble different TMR schemes during run-time. Thus, the system can adaptively trigger the appropriate self-healing strategy according to execution environment metrics and user-defined goals. Specifically, the paper presents: (a) the upset-fault-observer (UFO), an innovative run-time self-test and recovery strategy that delivers FT on request over several function cores but saves the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles, (b) run-time reconfigurable TMR schemes and self-repair mechanisms, and (c) an adaptive software organization model to manage the proposed FT strategies.

  • 426.
    Navas, Byron
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Öberg, Johnny
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Towards the generic reconfigurable accelerator: Algorithm development, core design, and performance analysis2013Conference paper (Refereed)
    Abstract [en]

    Adoption of reconfigurable computing is limited in part by the lack of simplified, economical, and reusable solutions. Its significant speedups and energy savings can increase performance, but they also increase design complexity, in particular for heterogeneous SoCs blending several CPUs, GPUs, and FPGA accelerator cores. In addition, implementing complex algorithms in hardware requires modeling and verification, not only HDL generation. Most approaches are too specific and do not aim for reusability. Therefore, we present a solution based on: (1) a design methodology to develop algorithms accelerated in reconfigurable/non-reconfigurable IP-Cores, using common access tools and covering verification from the model to the embedded software stages; (2) a generic accelerator core design that enables relocation and reuse almost independently of the algorithm, and data-flow-driven execution models; and (3) a performance analysis of the acceleration mechanisms included in our system (i.e., accelerator core, burst I/O transfers, and reconfiguration pre-fetch). As a result, the implemented system accelerates algorithms (e.g., FIR and Kalman filters) with speedups of up to three orders of magnitude compared to processor implementations.

  • 427. Neto, W. L.
    et al.
    Possani, V. N.
    Marranghello, Felipe
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Matos, J. M.
    Reis, A. I.
    Ribas, R. P.
    Exact Multi-Level Benchmark Circuit Generation for Logic Synthesis Evaluation2018In: 31st Symposium on Integrated Circuits and Systems Design, SBCCI 2018, Institute of Electrical and Electronics Engineers (IEEE), 2018, article id 8533248Conference paper (Refereed)
    Abstract [en]

    Logic synthesis is a crucial step in digital integrated circuit design. There are methods for exact synthesis of two-level designs that can handle very large circuits, with hundreds of inputs, although they are of limited usefulness in VLSI circuit and system design. On the other hand, exact multi-level synthesis is a very complex task, and the majority of algorithms are heuristic. Benchmarks are of great importance for evaluating and validating new methods. In particular, exact benchmarks make it possible to evaluate the effectiveness of synthesis algorithms with respect to the optimal solution. This work proposes a novel method to generate exact multi-level circuits based on reversible logic. The proposed approach is able to build exact benchmark circuits with around 40 million nodes in a short time, implementing the identity function $f(x)=x$. This means that the most compact circuit consists only of wires, without any logic gate instantiation. The proposed work is complementary to other circuit generation approaches and can easily be combined with them to explore particular characteristics of related benchmarks.

  • 428.
    Niazi, M. F.
    et al.
    Turku Centre for Computer Science (TUCS).
    Seceleanu, T.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Towards Reuse-Based Development for the On-chip Distributed SoC Architecture2012In: Computer Software and Applications Conference Workshops (COMPSACW), 2012 IEEE 36th Annual, 2012, p. 278-283Conference paper (Refereed)
    Abstract [en]

    The development of a reusable library of components for a multi-core segmented bus platform, the SegBus, is presented. The library is based on a plug-in that we develop and deploy within a modeling tool, which is in turn used by the SegBus DSL when developing applications targeting the SegBus platform. The steps required to build the library and embed it into a plug-in are discussed, together with its use in our design methodology.

  • 429.
    Nidhi, U.
    et al.
    Indian Institute of Technology, Delhi, India.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kumar, Anshul
    Indian Institute of Technology, Delhi, India.
    High performance 3D-FFT implementation2013In: Circuits and Systems (ISCAS), 2013 IEEE International Symposium on, IEEE, 2013, p. 2227-2230Conference paper (Refereed)
    Abstract [en]

    3D FFT is a highly data- and compute-intensive kernel encountered in many applications. We report a high-performance design and implementation of 3D-FFT on a CGRA that supports partial reconfiguration. The hardware/software multi-clock design uses dynamic reconfiguration to reduce the required communication bandwidth in order to achieve a sustained throughput of 40 GOPS with a word size of 48 bits. Performance metrics, including overheads and speedup over software, are presented for implementations of up to 256-point 3D-FFT.

  • 430.
    Nigussie, E.
    et al.
    Turku Centre for Computer Science (TUCS).
    Tuuna, S.
    Turku Centre for Computer Science (TUCS).
    Plosila, J.
    Turku Centre for Computer Science (TUCS).
    Isoaho, J.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Semi-Serial On-Chip Link Implementation for Energy Efficiency and High Throughput2012In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 20, no 12, p. 2265-2277Article in journal (Refereed)
    Abstract [en]

    A high-throughput and low-energy semi-serial on-chip communication link based on novel design techniques and circuit solutions is presented. This self-timed link is designed using high-speed serialization/deserialization and pulse dual-rail encoding techniques. The link also employs wave-pipelined differential pulse current-mode signaling to sustain the high-speed data intake from the serializer. The energy efficiency of the proposed semi-serial link, which consists of bit-serial links in parallel, comes mainly from sharing the novel serializer's control circuit among the bit-serial links. In addition, the integration of pulse signaling with wave-pipelining, the use of a new low-complexity data validity detection technique, and the avoidance of data decoding logic also contribute to the power reduction. Furthermore, the formulated pulse dual-rail encoding makes it possible to implement pulse signaling at no cost. The ability to detect data validity at the bit level allows acknowledgment per word without losing the delay-insensitivity of the transmission. The proposed semi-serial link is analyzed and compared with bit-serial and fully bit-parallel links for 64-bit data and communication distances of 1 to 8 mm. The semi-serial link, which consists of eight bit-serial links, provides 72.72 Gbps throughput with 286 fJ/bit energy dissipation for 8 mm transmission. It dissipates the lowest energy per bit compared with fully bit-parallel links while achieving the same throughput. The links are designed and simulated in Cadence Analog Spectre using 65-nm technology from STMicroelectronics.

  • 431. Nigussie, E.
    et al.
    Tuuna, S.
    Plosila, J.
    Liljeberg, P.
    Isoaho, J.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Boosting performance of self-timed delay-insensitive bit parallel on-chip interconnects2011In: IET Circuits, Devices & Systems, ISSN 1751-858X, Vol. 5, no 6, p. 505-517Article in journal (Refereed)
    Abstract [en]

    The authors present a performance-boosting technique with better power efficiency for delay-insensitive on-chip interconnects. The increase in signal propagation delay uncertainty with technology scaling makes self-timed delay-insensitive on-chip interconnects the most appropriate alternative. However, achieving high-performance communication in self-timed delay-insensitive links is difficult, especially for wide bit-parallel transmission, because detecting the validity of each bit is time-consuming. The authors present a high-speed completion detection technique along with its circuit implementation, and two on-chip interconnects that use the proposed completion detection circuit. The performance, power consumption, power efficiency and area of the presented on-chip interconnects are analysed and compared with conventionally implemented delay-insensitive interconnects. For 64-bit parallel transmission, throughput improvements of 2.07 and 1.72 times, with 47% and 39% higher power efficiency, respectively, have been achieved for the two interconnects compared with their conventional counterparts. The interconnect circuits are designed and simulated using Cadence Analog Spectre and HSPICE with 65 nm complementary metal-oxide-semiconductor technology from STMicroelectronics.

  • 432. Nikander, P.
    et al.
    Kameswar Rao, V.
    Liuha, P.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    ELL-I: An inexpensive platform for fixed things2013In: Scalable Computing: Practice and Experience, ISSN 1895-1767, E-ISSN 1895-1767, Vol. 14, no 3, p. 155-167Article in journal (Refereed)
    Abstract [en]

    The Internet of Things (IoT) vision is enticing: each and every "thing" in the world is expected to eventually be connected to the Internet, thereby becoming part of the "context" within which applications live. In most IoT research, the focus has been on enabling movable things to communicate, including phones, tablets, RFID tags, watches, and jewellery, to name but a few. In such an approach, the things are expected to have their own batteries or receive temporary power over a short-distance electromagnetic field. This approach has also dominated the more fixed side of IoT research, including a large fraction of the work on stationary sensors and actuators, which likewise focuses on battery-based operation and wireless communication. In this paper, we introduce an alternative view of the world of stationary Internet-connected things. We argue that a large majority of fixed or stationary things would benefit from being permanently powered through wireline connections, and that, in doing so, it becomes natural to use the same wires for their communication and contextual needs as well. Such an approach allows appliances to become part of the wider application context. With this in mind, we introduce the ELL-i platform, a new open-source initiative to provide a low-cost, flexible prototyping and production platform for extensible, Power-over-Ethernet-based smart appliances. We describe the first ELL-i prototyping board and a number of application concepts, and discuss its business model.

  • 433. Nilsaz, A. S.
    et al.
    Parashkoh, M. K.
    Ghauomy-zadeh, H.
    Zou, Zhuo
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Baghaei-Nejad, Majid
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Zheng, Li-Rong
    KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK. KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Low power 0.18 μm CMOS ultra wideband inductor-less LNA design for UWB receiver2010In: IEEE Asia-Pacific Conference on Circuits and Systems, Proceedings, APCCAS, 2010, p. 855-858Conference paper (Refereed)
    Abstract [en]

    This paper presents an inductor-less low-noise amplifier (LNA) design for ultra-wideband (UWB) receivers and microwave access covering the frequency range from 0.4 to 5.7 GHz using 0.18-μm CMOS. Simulation results show that the voltage gain reaches a peak of 18.94 dB in-band, with an upper 3-dB frequency of 5.7 GHz. The IIP3 is about 3 dBm, and the noise figure (NF) ranges from 3.15 to 3.86 dB over the band of interest. Input matching is better than -8.79 dB, and the LNA consumes 5.77 mW at a 1.8 V supply voltage. A figure of merit is used to compare the proposed design with recently published wideband CMOS LNAs. The proposed design achieves superior voltage gain and a tolerable NF, with the additional advantage of removing the bulky inductors. It is shown that the designed LNA without on-chip inductors achieves performance comparable to inductor-based designs.

  • 434. Nilsson, Erland
    et al.
    Millberg, Mikael
    Öberg, Johnny
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Load distribution with the Proximity Congestion Awareness in a Network on Chip2003In: Proceedings of the Design Automation and Test Europe (DATE), 2003, p. 1126-1127Conference paper (Refereed)
  • 435. Oeberg, Johnny
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    Validation of Interface Protocols Using Grammar-based Models1998In: Proceedings of the IEEE International High Level Design Validation and Test Workshop, 1998Conference paper (Refereed)
  • 436.
    Olsson, Thomas
    et al.
    Dept. of Applied Electronics, Univ. of Lund.
    Torkelsson, Mats
    Dept. of Applied Electronics, Univ. of Lund.
    Nilsson, Peter
    Dept. of Applied Electronics, Univ. of Lund.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Meincke, Thomas
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A digitally controlled on-chip clock multiplier for globally asynchronous locally synchronous systems1999In: Circuits and Systems, 1999. 42nd Midwest Symposium on, 1999, Vol. 1, p. 84-87Conference paper (Refereed)
    Abstract [en]

    For large high-speed globally synchronous ASICs, designing the clock distribution net becomes a troublesome task. Besides problems caused by clock skew, the clock net is also a major source of power consumption. Partitioning the design into locally clocked blocks reduces clock skew problems and, if handled correctly, also helps reduce power consumption. However, to achieve these positive effects, the blocks need on-chip clocks with properties such as small area and low power consumption. Therefore, a low-power, small-area, digitally controlled on-chip clock generator is designed.

  • 437. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Communication in Hardware/Software Embedded Systems - A Taxonomy and Problem Formulation1997In: Proceedings of the 15th NORCHIP Conference, 1997Conference paper (Refereed)
  • 438. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    HW/SW Interface Validation in IP based System Design1998In: Proceedings of the International Workshop on IP Based Synthesis and System Design, 1998Conference paper (Refereed)
  • 439. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Multi-phase Validation of Hardware/Software Interfaces based on Generated Simulation Models1998In: Proceedings of the IEEE International High Level Design Validation and Test Workshop, 1998Conference paper (Refereed)
  • 440. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Operating System Sensitive Device Driver Synthesis from Implementation Independent Protocol Specification1999In: Proceedings of Design Automation and Test in Europe, 1999Conference paper (Refereed)
  • 441. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Refinement of HW/SW Communication Channels: Case Study and Comparison1998In: Proceedings of the 16th NORCHIP Conference, 1998Conference paper (Refereed)
  • 442. O’Nils, Mattias
    et al.
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Synthesis of DMA Controllers from Architecture Independent Descriptions of HW/SW Communication Protocols1999In: Proceedings of the Twelfth International Conference on VLSI Design, 1999Conference paper (Refereed)
  • 443. O’Nils, Mattias
    et al.
    Öberg, Johnny
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Grammar Based Modelling and Synthesis of Device Drivers and Bus Interfaces1998In: Proceedings of the 24th Euromicro Conference, short contribution, Vasteras, 1998Conference paper (Refereed)
  • 444.
    Pahikkala, T.
    et al.
    Turku Centre for Computer Science (TUCS).
    Airola, A.
    Turku Centre for Computer Science (TUCS).
    Xu, T. C.
    Turku Centre for Computer Science (TUCS).
    Liljeberg, P.
    Turku Centre for Computer Science (TUCS).
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Salakoski, T.
    Turku Centre for Computer Science (TUCS).
    Parallelized Online Regularized Least-Squares for Adaptive Embedded Systems2012In: International Journal of Embedded and Real-Time Communication Systems (IJERTCS), ISSN 1947-3176, Vol. 3, p. 73-91Article in journal (Refereed)
    Abstract [en]

    The authors introduce a machine learning approach based on a parallel online regularized least-squares learning algorithm for parallel embedded hardware platforms. The system is suitable for use in real-time adaptive systems. Firstly, the system can learn in an online fashion, a property required in real-life applications of embedded machine learning systems. Secondly, to guarantee real-time response in embedded multi-core computer architectures, the learning system is parallelized and able to operate with a limited amount of computational and memory resources. Thirdly, the system can predict several labels simultaneously. The authors evaluate the performance of the algorithm from three different perspectives. The prediction performance is evaluated on a hand-written digit recognition task. The computational speed is measured from 1 thread to 4 threads on a quad-core platform. As a promising unconventional multi-core architecture, a Network-on-Chip (NoC) platform is studied for the algorithm. The authors construct a NoC consisting of a 4×4 mesh. The machine learning algorithm is implemented on this platform with up to 16 threads. It is shown that memory consumption and cache efficiency can be considerably improved by optimizing the cache behavior of the system. The authors’ results provide a guideline for designing future embedded multi-core machine learning devices.

  • 445. Pahikkala, Tapio
    et al.
    Airola, Antti
    Xu, Thomas Canhao
    Liljeberg, Pasi
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Salakoski, Tapio
    A Parallel Online Regularized Least-squares Machine Learning Algorithm for Future Multi-core Processors. 2011. In: PECCS 2011 - Proceedings of the 1st International Conference on Pervasive and Embedded Computing and Communication Systems 2011, 2011, p. 590-599. Conference paper (Refereed)
    Abstract [en]

    In this paper we introduce a machine learning system based on a parallel online regularized least-squares learning algorithm implemented on a network-on-chip (NoC) hardware architecture. The system is particularly suitable for real-time adaptive systems because it fulfills the following properties. Firstly, the system is able to learn in an online fashion, a property required in almost all real-life applications of embedded machine learning systems. Secondly, in order to guarantee real-time response on embedded multi-core computer architectures, the learning system is parallelized and able to operate with limited computational and memory resources. Thirdly, the system can learn to predict several labels simultaneously, which is beneficial, for example, in multi-class and multi-label classification as well as in more general forms of multi-task learning. We evaluate the performance of our algorithm with 1 to 4 threads on a quad-core platform. A Network-on-Chip platform consisting of a 4×4 mesh is chosen to implement the algorithm with 16 threads. Results show that the system is able to learn with minimal computational requirements, and that parallelizing the learning process considerably reduces the required processing time.

  • 446. Pamunuwa, Dinesh
    et al.
    Grange, Matthew
    Weerasekera, Roshan
    Jantsch, Axel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    3-D Integration and the Limits of Silicon Computation. 2011. In: Proceedings of the International Conference on Very Large Scale Integration (VLSI-SoC), 2011, p. 343-348. Conference paper (Refereed)
    Abstract [en]

    The intrinsic computational efficiency (ICE) of silicon defines the upper limit of the amount of computation within a given technology and power envelope. The effective computational efficiency (ECE) and the effective computational density (ECD) of silicon, by taking computation, memory and communication into account, offer a more realistic upper bound for computation of a given technology. Among other factors, they consider how distributed the memory is, how much area is occupied by computation, memory and interconnect, and the geometric properties of 3-D stacked technology with through silicon vias (TSV) as vertical links. We use the ECE and ECD to study the limits of performance under different memory distribution, power, thermal and cost constraints for various 2-D and 3-D topologies, in current and future technology nodes.

  • 447. Pang, Z.
    et al.
    Tian, Junzhe
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ecosystem-driven design of in-home terminals based on open platform for the Internet-of-Things. 2014. In: 2014 16th International Conference on Advanced Communication Technology (ICACT), IEEE, 2014, p. 369-377. Conference paper (Refereed)
    Abstract [en]

    In-home healthcare services based on the Internet-of-Things (IoT) have great business potential. To realize this potential, a business ecosystem must first be established. Technical solutions should therefore aim for a cooperative ecosystem by meeting interoperability, security, and system integration requirements. In this paper, we propose an ecosystem-driven design strategy and apply it to the design of an open-platform-based in-home healthcare terminal. A cooperative business ecosystem is formulated by merging the traditional healthcare and mobile internet ecosystems. To support the ecosystem in practical technology and business development, ecosystem-driven standardization efforts, security mechanisms, terminal design principles, and data handling schemes are analyzed, and corresponding solutions or guidelines are presented. Finally, to verify the proposed design strategy and guidelines, an open-platform-based terminal is implemented and demonstrated with a prototyping system.

  • 448. Pang, Z.
    et al.
    Tian, Junzhe
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Qiang
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Intelligent packaging and intelligent medicine box for medication management towards the Internet-of-Things. 2014. In: 2014 16th International Conference on Advanced Communication Technology (ICACT), IEEE, 2014, p. 352-360. Conference paper (Refereed)
    Abstract [en]

    The medication noncompliance problem has posed a serious threat to public health and caused huge financial waste worldwide. The emerging pervasive healthcare enabled by the Internet-of-Things offers promising solutions. In addition, an in-home healthcare station (IHHS) is needed to meet the rapidly increasing demand for daily monitoring and on-site diagnosis and prognosis. In this paper, a pervasive and preventive medication management solution is proposed based on intelligent and interactive packaging (I2Pack) and an intelligent medicine box (iMedBox). The intelligent pharmaceutical packaging is sealed with Controlled Delamination Material (CDM) and controlled via wireless communication. Various vital parameters can also be collected by wearable biomedical sensors through the wireless link. On-site diagnosis and prognosis based on these vital parameters are supported by a high-performance architecture. Additionally, a friendly user interface is emphasized to ease operation for the elderly, the disabled, and patients. A prototype system of the I2Pack and iMedBox is implemented and verified in field trials.

  • 449.
    Pang, Zhibo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Technologies and Architectures of the Internet-of-Things (IoT) for Health and Well-being. 2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The emerging technology breakthrough of the Internet-of-Things (IoT) is expected to offer promising solutions for the food supply chain (FSC) and in-home healthcare (IHH), which may significantly contribute to human health and well-being. In this thesis, we have investigated the technologies and architectures of the IoT for these two applications, referred to as Food-IoT and Health-IoT, respectively. We intend to resolve a series of research problems concerning WSN architectures, device architectures, and system integration architectures. To reduce time-to-market and the risk of failure, business aspects are given more weight than usual in the early stages of technology development, because both the technologies and the applications of the IoT are still immature.

    The device-level challenges we have addressed include WSN mobility and wide-area deployment, efficient data compression in resource-limited wireless sensor devices, a reliable communication protocol stack architecture, and the integration of actuation capability into low-cost intelligent and interactive packaging (I2Pack). Correspondingly, a WAN-SAN coherent WSN architecture, an RTOS-based and multiprocessor-friendly stack architecture, a content-extraction-based data compression algorithm, and a CDM-based I2Pack solution are proposed and demonstrated.

    At the system level, we have addressed challenges in effectively integrating scattered devices and technologies, including EIS and information integration architectures such as shelf-life prediction and real-time supply chain re-planning for the Food-IoT, and device and service integration architectures for the Health-IoT. Additionally, we have addressed some challenges at the top business level, including the Value Chain Models and Value Proposition of the Food-IoT, and the cooperative ecosystem model of the Health-IoT. These findings are generic and do not depend on our proprietary technologies and devices.

    More generally, we have demonstrated an effective research approach, the so-called Business-Technology Co-Design (BTCD), to resolve an essential challenge in current IoT research: the lack of alignment between basic technology and practical business requirements. We have shown its effectiveness through our design practice. It could serve as an instructive example of “the change of mindset” that is essential for future IoT research.

  • 450.
    Pang, Zhibo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    Baghaei-Nejad, Majid
    KTH, School of Information and Communication Technology (ICT), Electronic, Computer and Software Systems, ECS. KTH, School of Information and Communication Technology (ICT), Centres, VinnExcellence Center for Intelligence in Paper and Packaging, iPACK.
    The TouchMe System: RFID Solution for Interactive Package with Mediated Service. 2008. Conference paper (Refereed)
    Abstract [en]

    RFID-Based Intelligent Sensing (RBIS) is a promising technology strategy for the upcoming Internet of Things (IoT), which is expected to give rise to numerous innovative applications in the coming decades. In order to introduce IoT applications into people's daily lives and fully form the value chain, we must change the application model of RBIS from object-centric to user-centric and extend the business value from enterprise-centric to consumer-centric. In this paper, a user-centric model of RBIS for IoT applications is proposed based on the generalization and abstraction of a number of possible application scenarios. By extending the traditional two-role model (Reader and Tag) into a three-role model (Mediator, Object and Visitor), a new role, the Visitor, is introduced to identify human activities. On this basis, user-centric data gathering, personalized service delivery, and privacy protection are enabled, which is where the core value of IoT applications lies. An event-driven mechanism is introduced to improve the efficiency of communication and data processing in the system, especially for long-term monitoring of sparse events. The proposed model has been verified by an implemented prototype system, the TouchMe system, including the hardware, software, interactive package boxes, and a novel passive RFID tag chip as the next-generation solution. Experimental results confirm that the proposed model clarifies the application requirements, technical architectures, and added value of IoT applications. Feasible, practical IoT services could be established based on the proposed model in the future.
