Program-In, Chip-Out: The HP Labs Automatic
Hardware Synthesis Project
Rob Schreiber, HP Labs, USA
Chip designers are called on to design an ever larger variety of low-cost, low-power, embedded ASICs able to process high-bandwidth multimedia data streams. Many of these ASICs use custom hardware accelerators for the computational "hot spots." Design time, cost, energy consumption, and performance are important in these designs. As Moore's Law and the growth of the industry outstrip our ability to create and debug such designs by hand, automatic ASIC design has become a hot topic.
In order to reduce design time and design cost, the HP Labs Program-In, Chip-Out (PICO) project focuses on automatic design from high-level specifications. Source code (in a subset of C) for a performance-critical loop nest is used as a behavioral specification. The PICO system compiles the source code into a custom hardware design in the form of a parallel, special-purpose processor array. The user, or a design exploration tool, specifies the number of processing elements and their performance. The system produces the array, its local RAM, its control logic, its interface to memory, and its interface to a host processor. PICO also modifies the user's application software to make use of the generated accelerator. In experimental comparisons, PICO designs are slightly more costly than hand-designed accelerators with the same performance.
In this talk, we give an overview of PICO, and describe practical solutions to some subproblems that play a major role -- tiling, scheduling, bitwidth analysis, data path synthesis, local memory management, and estimation of the number of array elements referenced in a loop nest.
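As an illustration of the last subproblem, the number of distinct array elements referenced in a small loop nest can be counted by brute-force enumeration. This is only a sketch under stated assumptions: the function name, the loop bounds, and the access functions are hypothetical, and exhaustive enumeration is a stand-in for whatever estimation technique PICO actually uses, which the abstract does not describe.

```python
from itertools import product


def count_referenced_elements(bounds, accesses):
    """Count distinct array elements touched by the accesses of a loop nest.

    bounds   -- list of (lo, hi) inclusive ranges, one per loop, outermost first
    accesses -- list of functions mapping the loop indices to an index tuple
    """
    touched = set()
    # Visit every iteration of the (rectangular) loop nest.
    for iteration in product(*(range(lo, hi + 1) for lo, hi in bounds)):
        for access in accesses:
            touched.add(access(*iteration))
    return len(touched)


# Hypothetical loop nest (not from the talk):
#   for i in 0..3: for j in 0..2: use A[i + j] and A[i + j + 1]
shared = count_referenced_elements(
    bounds=[(0, 3), (0, 2)],
    accesses=[lambda i, j: (i + j,), lambda i, j: (i + j + 1,)],
)

# Same loops, but a two-dimensional access B[i][j] with no reuse.
dense = count_referenced_elements(
    bounds=[(0, 3), (0, 2)],
    accesses=[lambda i, j: (i, j)],
)
```

The first nest performs 24 accesses but touches far fewer distinct elements, which is the kind of reuse that matters when sizing local memory.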
Component-based Design for Multiprocessor SoC
Ahmed Jerraya, TIMA Laboratory, France
An application-specific multiprocessor system-on-chip requires the integration of heterogeneous components, including processors (DSPs and microcontrollers), memories, peripherals (DMA, interrupt controllers, …), and a sophisticated communication network. This lecture explores a high-level component-based methodology and design environment for application-specific multiprocessor SoC architectures. The system specification is a virtual architecture annotated with configuration parameters and described in a SystemC-like model. The component-based design environment has automatic wrapper-generation tools able to synthesize hardware interfaces, device drivers, and operating systems that implement a high-level API. Experiments with this approach show a drastic reduction of design time without any significant loss of efficiency in the final design.
A methodology for Design Space Exploration
of on-chip networks
Luciano Lavagno, Politecnico di Torino, Italy
This presentation will introduce a design methodology for architectural exploration of networks on chip based on decoupling functionality from architecture, and computation from communication. A simple example of various mapping and refinement options for a single function-to-function token-based communication will be used to practically illustrate the basic concepts.
A larger realistic on-chip network will then be used to describe what sort of information can be obtained by various mapping and performance analysis experiments. Although some specific tools will be used to make the explanation more concrete, the methodology is fully tool-independent, and can be implemented on top of several publicly available design frameworks.
A methodology and architecture for Network
Faraydon Karim, STMicroelectronics Inc., USA
This talk first goes through the stages of SoC development, specifically for the network processor. It then lays down a methodology to speed up the research and development stages. Finally, it describes an architecture that best suits SoC development from an R&D perspective.
Automated Processor Generation for System-on-Chip
Chris Rowen, Tensilica, USA
New application-focused system-on-chip platforms motivate new application-specific processors. Extensible processor architectures offer the efficiency of tuned logic solutions with the flexibility of standard high-level programming methodology. Automated extension of processor function units and the associated software environment – compilers, debuggers, simulators and real-time operating systems – satisfies these needs. At the same time, designing at the level of software and instruction set architecture significantly shortens the design cycle and reduces verification effort and risk. This lecture describes the key dimensions of extensibility within the processor architecture, the instruction set extension description language and the means of automatically extending the software environment from that description. It also outlines the essential methods to combine these tiny, fast processors into effective system-on-chip solutions, displacing both traditional processors and traditional RTL-based logic blocks.
Managing dynamic real-time tasks for modern
multi-media systems on multi-processor platforms
Rudy Lauwereins, IMEC & K.U.Leuven, Belgium
Run-time task scheduling on a multiprocessor platform is a real challenge for real-time embedded systems, where costs such as energy consumption are also a major concern. This problem will first be illustrated in the context of state-of-the-art embedded multi-media systems, which are becoming more and more dynamic due to, e.g., QoS issues. These applications also require high-performance heterogeneous multi-processor platforms to meet their real-time constraints. Typically, the run-time schedules for such dynamic systems are determined by the RTOS. Experience shows, however, that this is not effective for keeping the energy or memory footprint low. This cost-sensitive problem formulation has also not been considered in traditional dynamic scheduling research. The approach proposed here intends to combine the advantages of a design-time scheduling phase, in which all the necessary information is collected and prepared, with the flexibility of a dynamic scheduling phase that exploits this information. This approach makes it possible to optimize the system energy consumption at run time based on precomputed cost-performance Pareto curves. The application-specific run-time scheduler is then integrated on top of the RTOS. Application results will demonstrate the feasibility of collecting these Pareto curves for realistic applications (based on prototype tools that have been developed). The impact of the run-time phase will also be illustrated.
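The idea of selecting operating points from precomputed Pareto curves can be sketched as follows. This is a minimal illustration, not the scheduler presented in the talk: it assumes each task comes with a list of hypothetical (execution time, energy) points, that tasks run back to back on one resource, and that an exhaustive search over combinations is affordable. All names and numbers are made up for the example.

```python
from itertools import product


def pareto_front(points):
    """Keep only the (time, energy) points not dominated by another point."""
    front = []
    best_energy = float("inf")
    # Sorted by time; a point survives only if it uses less energy
    # than every point that is at least as fast.
    for time, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((time, energy))
            best_energy = energy
    return front


def select_operating_points(curves, deadline):
    """Pick one point per task, minimizing total energy within the deadline.

    Returns (total_energy, chosen_points), or None if no combination fits.
    """
    best = None
    for choice in product(*curves):
        total_time = sum(t for t, _ in choice)
        total_energy = sum(e for _, e in choice)
        if total_time <= deadline and (best is None or total_energy < best[0]):
            best = (total_energy, list(choice))
    return best


# Design-time phase: prune dominated points (hypothetical measurements).
curve_a = pareto_front([(2, 10), (4, 6), (5, 7), (8, 3)])
curve_b = pareto_front([(1, 8), (3, 4), (6, 2)])

# Run-time phase: choose points once the actual deadline is known.
plan = select_operating_points([curve_a, curve_b], deadline=9)
```

The split mirrors the two phases described above: the pruning is done once at design time, while the selection is cheap enough to repeat whenever the run-time situation changes.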
MP-SoC modeling: A Software point of view
Marcello Coppola, STMicroelectronics, France
A Multi-Processor System-on-Chip (MP-SOC) is a set of components, embedded on the same chip, which collaborate to achieve a common purpose. The design flow for such systems is a key point for companies such as STM, because a good flow may reduce the gap between productivity and time-to-silicon. The need for several abstraction levels for the software and hardware is a key point of such design flows. Today, hardware modeling is well covered and understood, but modeling of software components is lacking. This talk first introduces the software components that can be found in an MP-SOC, then gives a comprehensive description of these components. Finally, a modeling methodology based on SystemC for such components is described. Two small examples conclude the talk.
Communication as the backbone for a well balanced
Eric Verhulst, Eonic Solutions GmbH, Germany
In a multi-processing environment, a communication primitive is the logical equivalent of a local assignment. This means that, to achieve a well-balanced system design in such an environment, the communication functions are as important as the processing functions found on the processors themselves. The underlying programming model is offered by CSP (Communicating Sequential Processes) of C.A.R. Hoare. This model also allows early modeling of the final application, which can then be mapped onto adequately designed hardware. A step further is achieved by using reprogrammable logic to build an active communication backbone. Such a model is found in the CSPA (Communicating Signal Processing Architecture) developed by Eonic for board-level DSP systems. It further maximizes the CPU cycles available for processing while putting regular dataflow-type processing in the data streams. The result is a system architecture that is well balanced, more scalable, and more efficient, as it eliminates many of the bottlenecks that bus-based architectures exhibit.
Scheduling of Multi-process System Specifications
Jan Madsen, Informatics and Mathematical Modelling, Technical University of Denmark, Denmark
A multi-process system specification exhibits significant non-determinism, as only a partial ordering is specified. The behavior in terms of produced output will be the same regardless of internal behavior or implementation details. However, the actual ordering and choice of implementation may influence a number of other parameters such as performance, power consumption, flexibility, reliability, production cost, etc. These parameters have to be captured and explored in order to produce a successful SoC design.
In this lecture we will focus on the scheduling problem. We will introduce scheduling terminology and basic concepts, based on classical real-time scheduling for uni-processor systems. These are then extended to handle shared resources, data dependencies, context switching, cache effects and power reduction, which are all relevant for SOC. Finally, we will extend it to the problem of multi-processor scheduling.
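As a concrete instance of classical uni-processor real-time scheduling, the utilization-based schedulability tests of Liu and Layland can be sketched in a few lines. The sketch assumes independent, preemptable periodic tasks whose deadlines equal their periods and ignores context-switch cost, shared resources, and cache effects, i.e. exactly the extensions the lecture goes on to discuss. The task sets and function names are hypothetical.

```python
def utilization(tasks):
    """Total processor utilization of periodic tasks given as (wcet, period)."""
    return sum(wcet / period for wcet, period in tasks)


def rm_bound(n):
    """Liu-Layland utilization bound for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def rm_schedulable_sufficient(tasks):
    """Sufficient (not necessary) rate-monotonic test.

    True means the task set is schedulable; False means this test is
    inconclusive and a finer analysis would be needed.
    """
    return utilization(tasks) <= rm_bound(len(tasks))


def edf_schedulable(tasks):
    """Earliest-deadline-first test for deadlines equal to periods."""
    return utilization(tasks) <= 1.0


# Hypothetical task sets as (worst-case execution time, period).
light = [(1, 4), (2, 6), (1, 8)]   # utilization about 0.71
heavy = [(2, 4), (2, 6), (1, 8)]   # utilization about 0.96

light_ok_rm = rm_schedulable_sufficient(light)
heavy_ok_rm = rm_schedulable_sufficient(heavy)
heavy_ok_edf = edf_schedulable(heavy)
```

The heavier task set exceeds the rate-monotonic bound yet still passes the EDF test, which shows how the choice of scheduling policy changes what a single processor can guarantee.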
Trends and Requirements for Network Processor
Pierre Paulin, SoC Platform Automation, STMicroelectronics, Central R&D, Canada
This talk first presents the specification of an 'ideal' embedded S/W and SoC tool environment for Network Processor Units (NPUs), based on our interactions with NPU users in telecom system houses. We then present a survey of the current SoC tool environments of leading NPU vendors. We highlight the technical R&D challenges related to multi-processor S/W development and analysis.
We present 'StepNP', an experimental NPU R&D platform, which serves as the basis for NPU SoC tool and methodology exploration. We conclude with a description of our R&D in NPU SoC tools.
Datapath width optimization for customizable
Hiroto Yasuura, Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan
The datapath width strongly affects the system performance, chip area, and energy consumption of an SoC. For a given set of applications, we can optimize the datapath width of each component on the SoC. In this talk, the datapath width optimization problem is defined, and several optimization techniques and tools are introduced. Bitwidth analysis gives information on the required datapath width. The Soft-core processor is a customizable processor whose datapath width is parameterized. Valen-C and its compiler provide a programming environment for the Soft-core processor. A dynamic control approach for the datapath width is also presented. Several optimization methods for area minimization and energy minimization are presented with experimental results. These optimization methods significantly reduce the area and power consumption of processors and memories in an SoC while preserving system performance. We also propose a new design paradigm, called quality-driven design, which trades computational quality (e.g., quality of video output) against performance, cost, and energy/power.
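A simple form of bitwidth analysis can be sketched with interval arithmetic: track the value range of each variable through the computation and derive the minimum width that holds that range. This is only an illustrative sketch, not the analysis or the Valen-C tooling presented in the talk. The helper names and the example computation are hypothetical, and only integer addition and multiplication are covered.

```python
def bits_needed(lo, hi):
    """Minimum width that can hold every integer in [lo, hi].

    Uses an unsigned encoding when lo >= 0 and two's complement otherwise.
    """
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    # Two's complement with n bits covers -2**(n-1) .. 2**(n-1) - 1.
    magnitude_bits = max((-lo - 1).bit_length(), max(hi, 0).bit_length())
    return magnitude_bits + 1


def add_range(a, b):
    """Value range of x + y for x in range a and y in range b."""
    return (a[0] + b[0], a[1] + b[1])


def mul_range(a, b):
    """Value range of x * y for x in range a and y in range b."""
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))


# Hypothetical filter step: two 8-bit unsigned pixels, each scaled by a
# coefficient in [-3, 3], then summed.
pixel = (0, 255)
coeff = (-3, 3)
scaled = mul_range(pixel, coeff)      # range of pixel * coeff
total = add_range(scaled, scaled)     # range of the sum of two scaled pixels

scaled_bits = bits_needed(*scaled)
total_bits = bits_needed(*total)
```

In this example the analysis shows that the intermediate values fit in far fewer bits than a fixed 32-bit datapath would provide, which is the kind of saving that width optimization targets.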
Embedded System Architecture: a multi-faceted
Philip Koopman, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Distributed embedded systems are becoming prevalent in applications ranging from vehicles to home automation. Multi-processor systems on a chip can be considered both as components of such systems and as distributed embedded systems themselves. Designing such systems to be competitive, cost-effective, and supportable over the lifecycle of both products and companies requires a multi-disciplinary approach to architecture.
An architecture is an organized collection of components that encompasses both behaviors and interfaces with respect to a specific abstraction approach. The art in creating a good architecture is in knowing where to put interfaces and identifying the right abstraction approach. Inevitably, more than one concurrent architectural representation is needed to represent all the important aspects of a system. At the highest level, an embedded system architecture must provide decoupled but coordinated views of hardware, software, communications, and control. In many applications, distinct architectures must also be provided for human interface, maintenance/upgrade, safety/security, validation/verification, component coordination frameworks, and graceful degradation. And, of course, all these facets of the system must be compatible with overarching business and industry constraints. Unfortunately, the need for designers to decouple architectural views and subdivide components to manage complexity can at times complicate the creation of design tools that must use global tradeoffs in their quest to optimize cost and performance. Thus, there is often an inherent tension between optimization and architectural cleanliness when creating highly complex systems.
This talk will discuss the different types of architectural system views, common design patterns used for each architectural type, and the benefits of using a multi-architecture approach to representing and understanding systems. A brief discussion of recent research results will include a report of experiences representing a distributed embedded real-time control system in the Unified Modeling Language as well as progress in creating a unified approach to achieving graceful degradation for distributed embedded systems.
Multiprocessor Architectures for Signal Processing
Peter Pirsch, University of Hannover, Germany
Signal processing applications, with their real-time constraints, have created a still-growing demand for processing power and data throughput in today's DSP architectures. Implementations ranging from digital video/audio equipment to portable devices such as cellular phones and personal digital assistants have to both support a variety of increasingly complex algorithms and meet low-power constraints within compact realizations. Multiprocessor architectures address these requirements by mapping processing schemes onto a set of application-specific processor cores on a single chip.
This lecture will cover the following topics, including several examples: