











#### Introducing Tensilica Fusion DSP

Ultra-low power processing for mobile and wearable applications

- Leading low power DSP
  - Lower power for always on
  - Efficient floating point support for sensor fusion
  - Quad MAC performance for narrowband performance
- Configurability: get the right core immediately
  - Multiple configuration options allow core to be configured exactly for the targeted applications
  - No waste, maximum efficiency
- Comprehensive SW and Comms ecosystem
  - SW compatible with HiFi. 70+ partners. 140+ SW packages. Ecosystem support
  - Optimized DSP library with fixed and float kernels
  - Ease of use for SW developers

© 2015 Cadence Design Systems, Inc. All rights reserved.











## **Always Alert Vision**





- Imaging and video represent virtually all IoT content by data volume
- Unique computing, power and bandwidth challenges
- Rapid evolution of new applications driving new algorithms and architectures


### **Vision Processors**

- High-bandwidth Network on Chip to DDR and CPUs
- A family of high-performance DSPs for imaging, video and vision
- Rich SIMD/VLIW architecture
  - 4-way instruction issue
  - Up to 200 separate ALU operations per cycle
- Huge pixel bandwidth
  - Integrated DMA for data streaming
  - >2000 bits per cycle data memory bandwidth
- Rich software environment
  - World's best DSP C compilers: zero assembly code
  - Full OpenCV and OpenVX support with 800 optimized functions
  - Wide third-party program and support
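The per-cycle figures on this slide only become rates once a clock frequency is chosen, and the slide does not give one. As a rough sketch, the conversion can be written as below; the 600 MHz clock is a hypothetical value chosen for illustration, and because the slide states "up to 200" operations and ">2000" bits, the results are peak estimates rather than measured numbers.

```python
def peak_rate_per_second(per_cycle: float, clock_hz: float) -> float:
    """Scale a per-cycle peak figure to a per-second peak figure."""
    return per_cycle * clock_hz


# Hypothetical clock frequency; NOT taken from the slides.
ASSUMED_CLOCK_HZ = 600e6

# "Up to 200 separate ALU operations per cycle" -> peak ALU ops per second.
alu_ops_per_s = peak_rate_per_second(200, ASSUMED_CLOCK_HZ)

# ">2000 bits per cycle" of data memory bandwidth -> bytes per second.
data_bytes_per_s = peak_rate_per_second(2000, ASSUMED_CLOCK_HZ) / 8

print(f"peak ALU rate:      {alu_ops_per_s / 1e9:.0f} G ops/s")
print(f"peak data bandwidth: {data_bytes_per_s / 1e9:.0f} GB/s")
```

Substituting the real clock for a given process node and configuration changes both results linearly.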






#### Application Diversity drives ISA flexibility

- A successful architecture maximizes the fraction of kernels that can be vectorized
- A small number of functions may still use scalar ops heavily
- On-the-fly data reorganization may be important in a few kernels
- ALU : Reorg ratio varies from 10:1 to 1:1
- Efficient data reorganization boosts benefit of vectorization

[Figure: Scalar ops vs Vector ops. Axes: Scalar Ops per Instruction, Vector Ops per Instruction.]

[Figure: ALU ops vs Data Reorganization. Axes: Data Reorg Vector Ops per Instruction, ALU Vector Ops per Instruction.]
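To make the distinction between ALU work and data reorganization concrete, here is a minimal sketch of a kernel that needs both. The kernel (interleaved RGB to luma), the function names and the fixed-point weights are illustrative assumptions, not taken from the slides. The deinterleave step only moves data into a layout where every lane holds the same channel; the weighted sum is the arithmetic that a vector ALU would then apply to all lanes at once.

```python
def deinterleave_rgb(pixels):
    """Data reorganization: split R,G,B,R,G,B,... into three planes.

    No arithmetic happens here; values are only moved.
    """
    return pixels[0::3], pixels[1::3], pixels[2::3]


def luma(r, g, b):
    """ALU work: 8-bit fixed-point weighted sum per pixel.

    The weights 77, 150 and 29 sum to 256, so the result is
    scaled back with a right shift by 8.
    """
    return [(77 * ri + 150 * gi + 29 * bi) >> 8 for ri, gi, bi in zip(r, g, b)]


# Three interleaved pixels: white, black, pure red.
interleaved = [255, 255, 255, 0, 0, 0, 255, 0, 0]
r_plane, g_plane, b_plane = deinterleave_rgb(interleaved)
y_plane = luma(r_plane, g_plane, b_plane)
print(y_plane)
```

In this sketch there is one reorganization step feeding a handful of multiply-add operations per pixel. A kernel with heavier arithmetic per element sits nearer the 10:1 end of the slide's range, while one dominated by shuffling sits nearer 1:1.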



#### Specialized Vision Processing: CNN Engine Example

- Compute-intensive specialized DSP architecture using TIE on Xtensa
- Very high throughput (256 MAC/cycle) on filter and convolution-intensive algorithms for vision and object recognition applications
- High memory bandwidth to handle huge stream of 16b weights
- Architecture: SIMD/VLIW architecture with 47 general-purpose DSP vector ops on top of 80 base ops
- Free TIE source distribution: built to be freely adapted and evolved by customers
- Total floorplan area with memory: 0.8 to 1.0 mm<sup>2</sup>
- Scalable multi-core

[Figure: CNN Engine Area vs. MHz (WC). X-axis: MHz. Series: 28hpm 12t, 28hpc 9t, 28hpc 9t lay.]
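The throughput claim is easier to interpret with a count of the multiply-accumulate (MAC) operations a convolution actually needs. The sketch below is a plain reference loop, not the CNN engine's implementation: it computes a "valid" 2D convolution in the CNN sense (no kernel flip), counts MACs, and derives a lower bound on cycles, reading the slide's throughput figure as 256 MAC/cycle. The function names are hypothetical, and the bound assumes perfect utilization with no memory stalls.

```python
def conv2d_valid(image, kernel):
    """Reference 2D convolution (CNN-style, no kernel flip).

    Returns the output feature map and the number of MACs performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out, macs = [], 0
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]  # one MAC
                    macs += 1
            row.append(acc)
        out.append(row)
    return out, macs


def min_cycles(macs, macs_per_cycle=256):
    """Lower bound on cycles, assuming every MAC slot is used every cycle."""
    return -(-macs // macs_per_cycle)  # ceiling division


image = [[1] * 4 for _ in range(4)]    # 4x4 input of ones
kernel = [[1] * 3 for _ in range(3)]   # 3x3 kernel of ones
feature_map, mac_count = conv2d_valid(image, kernel)
print(feature_map, mac_count, min_cycles(mac_count))
```

For a real layer the MAC count scales with output width × output height × kernel area × input channels × output channels, which is why both MAC throughput and weight bandwidth appear on the slide.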



# cādence®

Cadence, Tensilica, Xtensa and the Cadence logo are registered trademarks of Cadence Design Systems, Inc. All other trademarks and logos are the property of their respective holders.