9th International Forum on Embedded MPSoC and Multicore
2-7 August 2009, Savannah, Georgia, USA




Monday August 3: MPSOC application day

SESSION 1: Keynote

  • Koichiro Yamashita, Fujitsu Labs, Japan
    A software centric system design for multicore SoC in the upstream phase

SESSION 2: Mini-Keynotes

  • Norbert Wehn, University of Kaiserslautern, Germany
    Energy Modeling and Optimization: A Critical Assessment with Two Case Studies
    Energy efficiency is a key challenge in embedded system design. Thus, many sophisticated optimization techniques were developed over the last decades. These techniques typically rely on models representing the system's energy consumption. In this talk we will present two case studies which show that commonly used energy models can lead to wrong optimization strategies. In detail, we will discuss short hop versus long hop in wireless sensor networks, and DRAM power management.

  • Torsten Kempf, RWTH Aachen University, Germany
    A design methodology for SDR (Software Defined Radios) reconciling portability with efficiency

  • Soo-Ik Chae, Seoul National University, Korea
    Flexible multi-core platform using multiple RISC clusters for multi-standard video applications
    First, I will introduce a RISC cluster that has up to four RISC cores with shared I/D caches and a hardware thread scheduler for multi-threading, as well as coprocessors for computation and communication. Then I will describe a high-performance video platform using multiple RISC clusters and its design flow, which maps a video application to the platform with architecture-level DSE. I will also show some preliminary data about a mapping result for an H.264 720p decoder.

  • Pieter Van der Wolf, NXP, Netherlands
    The Memory Bottleneck in MPSoCs for Multimedia
    MPSoCs for multimedia have high demands on their infrastructures, due to requirements for high bandwidth, low latency, and quality-of-service guarantees. We specifically focus on the bottleneck to off-chip DRAM and explain how it impacts overall system cost. We detail the requirements of different kinds of processing engines on a DRAM controller and discuss the integration challenge for heterogeneous MPSoCs for multimedia. 3D integration of logic and memory may create a paradigm shift, as it could make abundant bandwidth available. We will discuss some of the challenges for adoption of such 3D integration techniques.

  • Gerhard Fettweis, Technical University Dresden, Germany
    Designing an LTE Baseband MPSoC

SESSION 3: In-depth technical presentations

  • Kazutoshi Wakabayashi, NEC, Japan
  • Suzanne Lesecq, CEA-LETI, France
    Sensor fusion with intermittent data, application to attitude estimation
    Observers are classically used to fuse information from various sensor modalities in order to estimate other variables that are not directly measurable by sensors. The implementation of these techniques leads to the so-called "soft sensor". However, due to the introduction of networks between the sensors and the computational unit, the design of observers that can tolerate data loss must be considered. Moreover, distributed versions of these observers must be implemented. This talk reviews various techniques from the literature that account for data loss. An Extended Kalman Filter that takes into account the loss of a subset of the measurements is proposed. The technique is exemplified in the context of attitude estimation with measurements from an Inertial Measurement Unit.
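
    The idea of an observer that tolerates lost measurements can be illustrated with a minimal sketch. It is not the Extended Kalman Filter of the talk: it assumes a scalar, linear model, and the function name and the parameters a, q and r are hypothetical. The prediction step always runs; the correction step is skipped whenever a sample is missing (None), so the estimate variance grows until data arrives again.

```python
from typing import List, Optional, Sequence, Tuple


def kalman_intermittent(
    measurements: Sequence[Optional[float]],
    x0: float = 0.0,   # initial state estimate (assumed)
    p0: float = 1.0,   # initial estimate variance (assumed)
    a: float = 1.0,    # state transition coefficient (assumed)
    q: float = 0.01,   # process noise variance (assumed)
    r: float = 0.25,   # measurement noise variance (assumed)
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter that tolerates missing samples (None)."""
    x, p = x0, p0
    history: List[Tuple[float, float]] = []
    for z in measurements:
        # Prediction step: always executed.
        x = a * x
        p = a * p * a + q
        # Correction step: only when the measurement actually arrived.
        if z is not None:
            k = p / (p + r)          # Kalman gain
            x = x + k * (z - x)      # blend prediction and measurement
            p = (1.0 - k) * p        # variance shrinks after an update
        history.append((x, p))
    return history
```

    Calling kalman_intermittent([1.0, None, None, 1.0]) returns one (estimate, variance) pair per time step; the variance increases across the two lost samples and drops again once a measurement is received.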

  • Chris Rowen, Tensilica, USA
    Energy-Efficient LTE Baseband with Extensible Dataplane Processor Units
    This paper outlines a new generation of processor generation technology tuned to the demands of multiple dataplane processing units in baseband communications, video post-processing, audio and other cost- and energy-obsessed SOC applications. We highlight improved interface and computing structures and new tools for integration with MATLAB models and RTL blocks. Then we apply these methods to integration of a complete programmable 140Mbps LTE receiver architecture, built using a new DSP architecture with optimizations for OFDM, MIMO decoding and forward error correction. We conclude with some insights into future directions for application-specific processor cores, system architectures and methodology.

  • Nicolas Darbel, ST-Ericsson, France
    U8500 will fuel the growth of Web-enabled multimedia handheld devices and enable smartphones to be widely adopted by consumers
    This Single Chip Digital BaseBand and Application processor combines leading edge features for multimedia, computing and communication such as: real time capture & playback of 1080p30 video, HSPA modem release 7, high performance computing based on SMP dual ARM Cortex A9 multicore processor, supported by state-of-the-art architectural concepts such as Globally Asynchronous Locally Synchronous interconnect and SW controlled advanced power management, in advanced 45/40nm process technology and POP packaging.

  • Guy Bois, Ecole Polytechnique Montréal, Canada
    Exploration, Design and Development of Hardware & Software Multi-Processor Embedded Systems Made Easier
    Market forces are driving electronic system developers to design ever more complex products whose architectures are based on a large and growing mix of multi-processors, software, complex peripherals and intellectual property cores. This challenge has given rise to the need for new design methods that provide the ability to address both software and hardware development simultaneously, allowing full optimization in each domain. In this context, we present SpaceStudio, a system-level development environment and tool suite that allows hardware/software architectural exploration and partitioning, design, validation, simulation, performance analysis and integration of mixed HW/SW embedded systems.

  • Wen-mei Hwu, Illinois University, USA
    Many-core Parallel Computing - Can compilers and tools do the heavy lifting?
    Modern GPUs such as the NVIDIA GeForce GTX280, ATI Radeon 4860, and the upcoming Intel Larrabee are massively parallel, many-core processors. Today, application developers for these many-core chips are reporting 10X-100X speedup over sequential code on traditional microprocessors. According to the semiconductor industry roadmap, these processors could scale up to over 1,000X speedup over single cores by the end of the year 2016. Such a dramatic performance difference between parallel and sequential execution will motivate an increasing number of developers to parallelize their applications. Today, an application programmer has to understand the desirable parallel programming idioms, manually work around potential hardware performance pitfalls, and restructure their application design in order to achieve their performance objectives on many-core processors. Although many researchers have given up on parallelizing compilers, I will show evidence that by systematically incorporating high-level application design knowledge into the source code, a new generation of compilers and tools can take over the heavy lifting in developing and tuning parallel applications. I will also discuss roadblocks whose removal will require innovations from the entire research community.

Tuesday August 4: Software day

SESSION 4: Keynote

  • Osamu Nishii, Renesas Technology, Japan
    Concurrent trace hardware of multi-core
    Trace hardware is attached to processors and provides observability for software debugging. The real-time trace function is used to trace without processing speed degradation; the full trace function is used to trace without information loss. The SuperH multi-processor (SH-X3) supports both trace modes, and it improves the trace utilization ratio using an adaptive trace buffer allocation technique.

SESSION 5: Mini-Keynotes

  • Jan Madsen, Technical University of Denmark, Denmark
    Mapping bio-chemical applications onto microfluidic-based biochips
    Microfluidic biochips are replacing the conventional biochemical analyzers, and are able to integrate on-chip all the necessary functions for biochemical analysis. The "digital" microfluidic biochips manipulate liquids not as a continuous flow, but as discrete droplets, and hence are highly reconfigurable and scalable. A digital biochip is composed of a two-dimensional array of cells, together with reservoirs for storing the samples and reagents. Several adjacent cells are dynamically grouped to form a virtual device, on which operations are executed. This talk will present the possibilities and challenges of mapping bio-chemical applications onto microfluidic biochips. It will discuss the similarities and new challenges as compared to on-line dynamic reconfigurability of digital reconfigurable multicore architectures.

  • Marcello Coppola, STMicroelectronics, France
    Future Trends in communication infrastructures for MPSoC
    Current SoC architectures have to support several applications with high requirements in terms of bandwidth and latency, which implies the use of heterogeneous multi-core architectures in a single chip (MPSoCs) endowed with complex communication infrastructures, such as networks on chip (NoCs). So far, the NoC has been considered a non-programmable hardware component able to support the overall MPSoC bandwidth and latency requirements. In this talk we will show how important the introduction of programmability in NoCs is.

  • Mostapha Aboulhamid, Université de Montréal, Canada
    System Modeling and Multicore Simulation using the Transaction Paradigm
    The transaction concept is a powerful model used to describe in a simple way the coherent execution of simultaneous process activities in concurrent systems. Indeed, a transaction encapsulates a set of operations, which then behaves as a single atomic operation. In this way, H/S system functionality can be modeled as a set of transactions, where each transaction can be dealt with in isolation (from other transactions). Hence, the designer no longer needs to worry about the consistency of shared resources and the coordination of concurrent modules accessing them, since all these matters are automatically taken care of by the transaction manager implemented by the underlying simulator. In order to simulate a transaction-based model, we need to implement a Software Transactional Memory (STM). An STM model consists of a set of processes which communicate through a shared memory by executing transactions. We will give some hints on the correctness criteria used to validate the parallel execution of transactions. Then, we give an overview of the different techniques and policies used in the implementation of an STM. Finally, we show different implementation avenues using the .NET framework and a comparison with the SystemC approach. In our comparison with SystemC, we show the potential of accelerating simulation using multicore hosts and the advantage of using the transaction paradigm in modeling as well as in design space exploration at a system abstraction level.
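
    The notion of processes communicating through shared memory by executing transactions can be illustrated with a minimal sketch. It is written in Python rather than the .NET or SystemC settings named in the abstract, and it is only one of many possible STM policies (optimistic reads, buffered writes, version validation at commit time). The names TVar, Transaction, atomically and transfer are hypothetical.

```python
import threading
from typing import Any, Callable, Dict


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0


# Serializes only the short validate-and-write-back phase, not the
# transaction bodies themselves.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self) -> None:
        self.read_versions: Dict[TVar, int] = {}  # versions seen on first read
        self.writes: Dict[TVar, Any] = {}         # buffered, not yet visible

    def read(self, tvar: TVar) -> Any:
        if tvar in self.writes:                   # read-your-own-writes
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        self.read_versions.setdefault(tvar, version)
        return value

    def write(self, tvar: TVar, value: Any) -> None:
        self.writes[tvar] = value

    def commit(self) -> bool:
        with _commit_lock:
            # Validation: abort if anything we read has changed since.
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:
                    return False
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body: Callable[[Transaction], Any]) -> Any:
    """Run body as a transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


def transfer(src: TVar, dst: TVar, amount: int) -> None:
    """Example transaction: move amount from src to dst atomically."""
    def body(tx: Transaction) -> None:
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
    atomically(body)
```

    A transaction that loses a conflict is simply re-executed, so the code inside transfer needs no explicit locks. This sketch validates only at commit time, so a transaction that is doomed to abort may briefly observe an inconsistent snapshot; stricter correctness criteria would require validating on every read.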

SESSION 6: In-depth technical presentations

  • Koji Inoue, Kyushu University, Japan
    Performance Balancing: Adaptive Parallel Execution for High-Performance Multi-Cores
    This talk introduces a technique to improve the performance of multi-core processors. Unlike conventional parallel execution, our approach deliberately decreases the number of cores used for program execution even when there is sufficient thread-level parallelism. The remaining cores execute helper threads (software prefetchers) to improve memory performance. The key to this approach is to consider the balance between computation and memory performance, and to decide at run time the appropriate number of cores to be used for helper thread execution. By effectively orchestrating the on-chip resources, we can dramatically improve CMP performance. Our experimental results demonstrate that in the best case a speedup of more than 60% can be achieved over conventional parallel execution.

  • Hyunchul Shin, Hanyang University, Korea
    Simultaneous Mapping and Scheduling for Multi-core Systems, by Using Iterative Refinement
    Mapping of applications to MPSoC architectures and scheduling of tasks are key problems in system level design of embedded multi-core systems. An effective iterative-refinement-based mapping and scheduling algorithm is developed, in which the tasks in the active zone are simultaneously mapped and scheduled for heterogeneous multiple processors. A tradeoff between run time and solution quality can be made by adjusting the number of tasks in the zone. Experimental results show that this method is very effective.

  • Sungjoo Yoo, POSTECH, Korea
    On-chip Network Architecture to Tackle Parallelism Mismatch Problems in Accessing Multiple Memories
    The performance of data-intensive applications (e.g., Full HD mobile internet device) is mostly determined by the bandwidth and latency of off-chip DDR memory. In order to increase the bandwidth and reduce the latency, 3D stacking of memory dies on top of LSI dies is being actively investigated. Towards the exploitation of full potential of new 3D stacked memories, we need new architectural ideas, especially, for memory controllers and on-chip network. In this talk, we will present on-chip network architectures with the awareness of parallel memories. To be specific, we will focus on the problems caused by parallelism mismatches between master and memories and present our solutions of request parallelization and read data serialization to resolve the problems.

  • Tsuyoshi Isshiki, Tokyo Institute of Technology, Japan
    Efficient MPSoC Design Space Exploration on the Tightly-Coupled Thread Workload Simulation Framework
    The current ESL tools provide an integrated design platform for software and MPSoC architecture development; however, a thorough design space exploration is still a challenge due to the design costs of developing a correct and complete set of SW/HW simulation models and their time-consuming simulation runs. Our TCT (Tightly-Coupled Thread) MPSoC design framework consists of several key features for realizing an efficient design space exploration methodology on its MPSoC architecture platform. Our framework includes a simple programming model on sequential C programs for creating highly concurrent execution models with its TCT compiler, an efficient message-passing protocol through a high-speed crossbar interconnect, and an ultra-fast MPSoC trace-driven workload simulator whose speed measures in billions of cycles per second while keeping the cycle error below 1%. This talk will mainly focus on its workload simulator technology and its future prospects.

  • Reiner Hartenstein, TU Kaiserslautern, Germany
    Many-Core Programming and the CS Education Dilemma
    Computing, and especially parallel computing, in the von Neumann style is highly inefficient, not only because multiple layers of massive overhead phenomena often lead to code sizes of astronomic dimensions (Nathan’s Law). The unaffordability of the massive dominance of von-Neumann-based computing is looming because of growing high energy consumption and increasing cost of energy. For saving energy, partial software to configware (e.g. to FPGA) migrations are promising a higher potential than classical green computing and low power design approaches. For many application domains a homogeneous parallelization might run out of steam with a major number of cores. Often viable solutions are hardly possible by fully instruction-stream-based approaches. Hetero solutions will be required. A sufficiently large population of programmers being hetero-many-core-qualified as well as FPGA-savvy is far from being available. I do not agree with several HPC celebrities calling for a radical re-design of the entire computing discipline. The solution can be obtained by putting very old ideas into practice for a dual dichotomy approach including a twin paradigm model enabling a software / configware co-education strategy and a relativity dichotomy supporting the mapping between time and space domains. The talk will sketch a road map.

  • Jenq-Kuen Lee, National Tsing-Hua University, Taiwan
    Support of Programming Models and Tools for Embedded Multi-core Platforms

Wednesday August 5: Architecture day

SESSION 8: Keynote

  • Pierre Paulin, STMicroelectronics, Canada
    Real-Life Challenges on Mapping High-end Video to MP-SoC
    The increasing need for flexibility in multimedia SoCs for consumer applications is leading to a new class of programmable, multi-processor platforms. The high computation and data storage requirements of these applications pose new challenges in the expression of the applications, the multi-core fabric to support them and the application-to-platform mapping tools. We elaborate on these challenges, and illustrate them with our experience on mapping high-definition video image quality improvement and codec applications to MP-SoC platforms.

SESSION 9: Mini-Keynotes

  • Charlie Janac, Arteris, USA
    The Network-On-Chip as a Key to Flexible SoC Architectures
    Ever increasing SoC power, performance and area requirements are forcing on-chip interconnect technology to keep pace. In a first generation, a hierarchy of simple buses and crossbars was sufficient to meet the needs of SoCs. Subsequently, the need for IP re-use resulted in a second generation consisting of de-coupled buses and crossbars, separating the IP interface protocol from the interconnect protocol. The Network-On-Chip (NoC) represents the current generation of on-chip interconnect technology. By decoupling transaction, transport and physical layers, the NoC delivers a flexible topology architecture, resulting in lower power, faster and smaller interconnect IPs, thereby providing SoC designers with the needed flexibility to achieve performance and cost requirements.

  • Ahmed Jerraya, CEA-LETI, MINATEC, France
    SoC integration beyond Hardware and Software
    SoC integration has already made fantastic achievements in producing smaller, cheaper, more reliable, better performing devices with more and more functionalities. This presentation shows how this trend is expected to continue for the next 10 years because many applications still need a 100X improvement factor that can be achieved only through integration.

  • Drew Wingard, Sonics, USA
  • Srinivasan Murali, iNoCs, Switzerland
    NoC Tool Flow for Achieving Fast Design Closure
    The growing complexity of Systems on Chips (SoCs) is requiring communication resources that can only be provided by a highly scalable Network on Chip (NoC) based communication infrastructure. Developing NoC-based systems tailored to a particular application domain is important for achieving high-performance, energy-efficient customized solutions. To achieve early time-to-market, it is important to have a CAD tool flow that automates most of the time-intensive design steps. In this talk, I will show how a CAD flow is crucial in solving the NoC design problem efficiently and for achieving design closure.

  • John Goodacre, ARM, UK
    Targeted execution enabling increased power efficiency
    In many low power designs there is a diversity in performance requirements, both between concurrent task activities and in their temporal requirements. It has been observed over numerous generations of processor design that there exists a variance in the power efficiency for a given microarchitectural design. It is therefore proposed that, to minimize the energy consumption of a device, utilizing a processor of a specific microarchitectural structure can offer the most power-efficient execution of specific applicable workloads. As such, a unified-ISA, heterogeneous multicore design will provide an associated energy reduction. This talk will summarize some of the approaches for this and potential software techniques required to support such systems.

SESSION 10: In-depth technical presentations

  • Scott Mahlke, Michigan University, USA
    High Performance Mobile Computing Using Flexible Wide SIMD Processors
    In the past decade, the proliferation of mobile devices has increased at a spectacular rate. There are now more than 3.3 billion active cell phones in the world, a device that we now all depend on in our daily lives. The current generation of devices employs a combination of general-purpose processors, digital signal processors, and hardwired accelerators to provide giga-operations-per-second performance on milliWatt power budgets. Such heterogeneous organizations are inefficient to build and maintain, and waste silicon area and power. Looking forward to the next generation of mobile computing, computation requirements will increase by one to three orders of magnitude due to higher data rates, increased complexity algorithms, and greater computation diversity, but the power requirements will be just as stringent. Scaling of existing approaches will not suffice; instead, the inherent computational efficiency, programmability, and adaptability of the hardware must change. To overcome these challenges, this talk argues that wide SIMD processors provide the necessary computational bandwidth for the next generation wireless signal processing and high-definition video algorithms. However, simply scaling old SIMD designs is not enough. Wide SIMD processors need more inherent flexibility to adapt to different application characteristics and better power efficiency to avoid unnecessary use of large structures. This work introduces AnySP, a highly configurable SIMD datapath that is adaptable to a wide range of signal and media processing applications. AnySP maintains high utilization of a wide SIMD datapath by enabling the processing of wide vectors or multiple narrow vectors simultaneously as well as pipelining deeper computation subgraphs across neighboring lanes. Results show that AnySP is capable of sustaining 4G wireless processing and high-definition video throughput rates, and will approach the 1000 Mops/mW efficiency barrier when scaled to 45nm.

  • Tom Conte, Georgia Institute of Technology, USA
    Manycores: will we learn from the past?

  • Sanjay Patel, Illinois University, USA
    Scaling to 1000 cores on a chip: Architecture and Application
    Chip architectures such as Nvidia G80 initiated the era of massively parallel general purpose computing on the client. Fueling the economic fire for such high-performance chips are interactive, client application domains such as gaming that are hungry for performance. Emerging applications in vision, imaging, video processing, virtual immersion, and robotics also have an insatiable need for speed, and provide a future performance roadmap for such many-core chips. In the Rigel Project, we are developing a scalable architecture with 1000s of cores, and many TFLOPS of peak performance. Rigel has a well-defined and general purpose programmer interface that enables a broad class of task and data parallel applications to be mapped efficiently to the chip. In this talk I will describe the major results of the project thus far, touching on subjects such as scalable cache coherence through hardware and software, the Rigel task-based parallel programming model, area-power-performance tradeoffs for throughput-oriented architectures, and parallel programming tools.

  • Tohru Ishihara, Kyushu University, Japan
    Real-Time Dynamic Voltage Hopping on MPSoCs

  • Hiroyuki Tomiyama, Nagoya University, Japan
    Real-Time Operating Systems for MPSoC
    In multiprocessor systems, it is very difficult to guarantee and optimize the real-time responsiveness of application tasks and interrupt handlers, and the real-time responsiveness is significantly affected by the scheduling policy and internal structure of the real-time operating systems. This talk will discuss real-time issues in MPSoC, and present real-time operating systems which we have developed and released as open-source software.

  • Fredrik Dahlgren, ST-Ericsson
    Technological Trends, Design Constraints and Architectural Challenges in Mobile Phone Platforms
    A large number of capabilities are being integrated into mobile phones. The multimedia requirements are similar to those of dedicated camcorders and cameras one generation earlier. The application framework and software architectures are rapidly becoming similar to the desktop and laptop environments, and the network access performance is driven by the trends of mobile broadband. The high volumes push the employment of the very latest silicon and packaging technologies. However, there are extremely challenging and conflicting requirements with respect to performance, cost, power consumption, flexibility, and software compatibility, which greatly impact the system architecture. This presentation aims at surveying current market and technological trends. Emphasis is on the processing needs of the different functions of mobile phone platforms, and some architectural tradeoffs from a multi-core perspective.

Thursday August 6: Advanced Application day

SESSION 11: Keynote

  • Sarathy Sriprakash, Northrop Grumman, USA
    Drivers for Multi-Core and Embedded SoC from Military Unmanned Applications

SESSION 12: Mini-Keynotes

  • Frédéric Pétrot, TIMA-SLS, France
    Design and Use of Transactional Memory in MPSoCs
    Large scale coherent shared memory multiprocessor SoCs have been announced recently by semiconductor and system companies as a viable solution to the variability and cost concerns of future technologies. In this context, programming these machines becomes (again) a headache, because the programmer has to express thread level parallelism. Multi-threaded programs are considered very hard to read, and one of the reasons for that is the use of locks, which must be taken with a lot of care to avoid deadlocks or livelocks and performance degradation when competition for them is high. Lock-free approaches have been advocated to avoid this problem, but lock-free programs are really hard to write. An alternative approach proposed a decade ago is the use of transactions, relying on a specific hardware support called Transactional Memory (TM). This talk will introduce the properties of TM systems and the challenges behind their design.

  • Olivier Franza, Intel, USA
    Multi-core Reliability Challenges
    The impact of reliability in large-scale complex multi-core microprocessors will be presented in this mini-keynote.

  • Katalin Popovici, MathWorks, France
    Formal Code Verification for Embedded Software
    The increasing complexity of software in embedded systems makes the verification of code correctness difficult. Furthermore, code quality has to be assessable in the early phases of software development, when only incomplete applications are available, so the code can still be improved at minimal cost. To address these challenges, a tool is proposed to support an integrated framework for static code verification based on abstract interpretation. The tool automatically proves the code correctness of software and warns at compile time about possible run-time errors, that is, before execution on the target embedded system.

  • Vijaykrishnan Narayanan, Pennsylvania State University, USA
    A Programming Platform for Multi-FPGA MPSoCs
    This talk will present a platform for automatic generation of embedded systems for a class of imaging applications on a multi-FPGA platform. This platform includes tools for automated generation of accelerator cores as well as tools for seamless integration of these cores with the rest of the system. Finally, I will present the results from the use of this tool in generating accelerator systems for radar and medical imaging.

  • Joachim Kunkel, Synopsys, USA
  • Thierry Collette, CEA LIST, France
    MPSOC architectures for Computing for Imaging
    In the embedded domain, image processing, video coding and image/video understanding need embedded resources for computing and memory. The design of an embedded computer must strike a compromise among different criteria, such as performance, silicon area, power consumption, reliability, programmability and flexibility. In this talk, after a discussion of these criteria, two MPSoC architectures for image processing will be presented: one for mobile applications, based on a solution with high performance, low silicon area, very low power and a high level of flexibility, and another mainly dedicated to automotive applications.

SESSION 13: In-depth technical presentations

  • Kees Goossens, NXP, Netherlands
    Hardwired Networks on FPGAs and their applications
    In this talk we discuss the role networks on chip can play in FPGAs. In particular, hardwiring them in the silicon gives improved communication performance, at (limited) loss of flexibility. More importantly, unifying the interconnect for data, control, and code (bitstreams) enables new applications.

  • Yuan Xie, Pennsylvania State University, USA
    Enabling Many-Core Design via 3D Stacking
    As fabrication of 3D integrated circuits has become viable, developing CAD tools and circuit/architectural techniques for future MPSOC design is imperative. In this talk, a brief introduction to 3D IC integration technology will be given, the challenges that must be addressed to enable the adoption of 3D MPSOC will be discussed, and novel 3D architectures will be introduced.

  • Takeuchi Yoshinori, Osaka University, Japan
    Simulator Generation Method of Configurable Processors for MPSoC
    In the multiprocessor SoC (MPSoC) era, design exploration requires a vast amount of time. An MPSoC includes multiple or many processors, a lot of dedicated hardware, and communication buses. We propose a simulator generation method that combines a configurable processor development environment with a high-abstraction-level bus model.

  • Kiyoung Choi, Seoul National University, Korea
    A Reconfigurable MP-SoC Architecture and Application Mapping
    With the ever increasing requirements for more flexibility and higher performance in embedded systems design, the coarse-grained reconfigurable array has drawn much attention. But most of the existing architectures cannot be used for applications such as 3D graphics that require floating-point operations. Moreover, mapping applications on the reconfigurable array is difficult since it needs to solve multiple problems simultaneously, including compilation of the application and configuration of the architecture with limited routing resources. This talk presents how to perform floating-point operations on a coarse-grained reconfigurable array of integer processing elements. It also presents how to map applications on the array considering limited routing resources.

  • Alain Greiner, LIP6-ASIM, France
    TSAR: a scalable, shared memory, many-cores architecture with global cache coherence
    We present the TSAR many-cores architecture (TSAR stands for Tera Scale ARchitecture). It is a European Medea+ project, driven by the BULL company. This architecture supports cache-coherent shared memory and is scalable up to 4096 32-bit cores. We detail the distributed, hybrid cache coherence protocol, and some preliminary experimental results based on the virtual prototyping of the first version of the architecture.

Friday August 7: Advanced Research day

SESSION 14: Keynote

  • Janos Sztipanovits, Vanderbilt University, USA
    Compositionality and High-Confidence Design
    Cyber-Physical Systems (CPS) are inherently heterogeneous in structure, components, and types of interactions among components. Existing compositional frameworks address separate design concerns and neglect their interactions. The result is weakened or lost composability with effects cropping up during system integration. For example, when designing embedded controller dynamics, both the physical platform and safety/security related design decisions need to be considered. Physical platform properties that are relevant for controller dynamics include timing uncertainties caused by the communication network, CPU dynamic power management that affects performance, jitter caused by the schedulers, value uncertainties caused by quantization, and finite precision arithmetic inaccuracies. Safety and security related architecture properties that are relevant for controller dynamics include timing uncertainties caused by fault management, architecture adaptation, intrusion detection algorithms, encryption/decryption of data in communication channels, the use of separation kernels and virtualization. Composability of fundamental properties of controller dynamics, such as performance and stability requires that we precisely manage cross-layer interactions. In this talk we will examine three different techniques for improving composability, each based on the utilization of cross-layer abstractions: (1) Passivity-based design of controller dynamics that can be leveraged to orthogonalize controller stability from implementation uncertainties, (2) Cross-layer abstractions that can be used for making controller dynamics robust against selected implementation properties inside an operation regime and (3) Decoupling controller dynamics from implementation uncertainties by introducing a specification layer for timing properties that can be guaranteed during execution.

SESSION 15: Mini-Keynotes

  • Omar Hammami, ENSTA, France
    Automatic MPSOC Generation and Design Space Exploration from Automatic Parallelizers
    Multiprocessor systems on chip require an integrated design approach that analyzes the mutual effects of software, architecture, and implementation. In this work we present a design flow that integrates the generation of multiprocessor systems on chip based on automatic parallelization with automatic multiobjective design space exploration based on a large-scale emulator. The powerful combination of parallelization and design space exploration opens new design avenues that decoupled analysis does not reveal.

  • Lars Bauer, University of Karlsruhe, Germany
    Classifying and Evaluating Performance-relevant Parameters for Reconfigurable Processors
    Processors that deploy fine-grained reconfigurable fabrics to implement application-specific accelerators on demand have attracted significant attention within the last decade. They trade off the flexibility of general-purpose processors against the performance of application-specific circuits without tailoring the processor towards a specific application domain, as Application Specific Instruction Set Processors (ASIPs) do. A large number of reconfigurable processors have been proposed, differing in many architectural decisions. However, it has been an open question which of the proposed concepts is more efficient in certain application and/or parameter scenarios. We have developed a comprehensive design space exploration tool that allows systematic exploration of diverse reconfigurable processors and architectural parameters. In this talk I will present a classification of fine-grained reconfigurable processors with their relevant parameters and focus on a systematic design space exploration across diverse reconfigurable processor concepts, with the aim of aiding designers of reconfigurable processors.

  • Ulrich Ramacher, Infineon
  • Rolf Ernst, Technical University of Braunschweig, Germany
    MpSoC in safety related applications
    MpSoCs are on their way into safety-related applications. Examples are automotive and aerospace control. There are two usage scenarios. In the first, previously distributed control units (ECUs) and applications are merged onto one MpSoC to save cost while keeping a certain degree of isolation. Of particular interest are mixed-criticality systems that combine functions with different safety requirements on one MpSoC, leading to diverging design constraints and objectives. The second MpSoC usage is load distribution of a task system to increase performance and reduce power consumption of a single application, such as drive control. Here, the classical shared-memory communication of multirate periodic systems leads to issues of synchronization and indirect blocking that make efficient design difficult. The talk will give an overview of challenges and potential solutions.

  • David Atienza, EPFL, Switzerland
    Thermal Modeling and Active Cooling for 3D MPSoCs
    Continuous technical advances in manufacturing technologies are fueling the trend towards more powerful 3D Multi-Processor System-on-Chip (MPSoC) designs. However, 3D stacking creates additional manufacturing steps beyond the standard technology ones due to the high power density resulting from the placement of computational units on top of each other. Therefore, the power densities in 3D stacks will increase heat density, leading to degraded performance if thermal-aware design and thermal management are not handled properly. For instance, one of the novel cooling proposals is to use water flowing through liquid microchannels in addition to traditional heat sinks. During this mini-keynote, I will present new thermal modeling methods for 3D MPSoC architectures, developed in cooperation with IBM. In particular, I will describe the work conducted on the modeling of liquid microchannels for cooling in 3D stacks. Finally, I will briefly summarize the effort to assess how effective dynamic thermal management techniques developed for 2D MPSoCs are on 3D MPSoC architectures, and discuss possible ideas for new runtime approaches that cool down the 3D chips by tuning the flow rate of the coolant.

  • Gabriela Nicolescu, Ecole Polytechnique de Montréal, Canada
    System-Level Analysis for MPSoC Integrating Optical NoC