
Embedded and Application-Specific Systems-On-Chip...

...in a nutshell!

Charis Theocharides, Lecturer

Department of Electrical and Computer Engineering

([email protected])

June 2009

Embedded Systems

Connecting

Consumer

Computing

ENIAC (Electronic Numerical Integrator and Computer)

U.S. Army Computer @ University of Pennsylvania

o ENIAC had around 18,000 vacuum tubes, 70,000 resistors, 10,000 capacitors, and 6,000 switches.

o It was 33 meters long, 3 meters high, and 1 meter wide. It consumed 140 kilowatts of power.

ENIAC-on-a-Chip Moore School of Electrical Engineering, University of Pennsylvania

http://www.ee.upenn.edu/~jan/eniacproj.html

Size: 7.44 mm x 5.29 mm; 174,569 transistors; 0.5 µm CMOS technology (triple metal layer).

The Ubiquitous Microchip

Sources: Sony, Philips, McLaren Mercedes, Apple, Airbus, Lexus, Toshiba

System Energy Consumption

Exponential increase in both gate and sub-threshold leakage presents one of the most challenging issues in sub-90nm technologies.

Power Density, Cooling, Performance degradation…

Thermal Issues – Leakage increases with increase in temperature

[Figure: power density (W/cm²) versus Lpoly (µm) on logarithmic axes, with curves for active power density, passive power density (25 °C) and gate leakage]

R. Puri, L. Stok, S. Bhattacharya, “Keeping Hot Chips Cool”, DAC 2005
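The thermal point above (leakage rises with temperature) can be illustrated with a first-order model. The sketch below is not taken from the slides: it assumes the textbook subthreshold relation I_off ∝ exp(−Vt / (n·kT/q)) for a transistor that is switched off, and it ignores the temperature dependence of Vt and of carrier mobility. The default values `vt=0.3` and `n=1.5` are hypothetical, chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23     # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_c):
    """Thermal voltage kT/q in volts for a temperature in degrees Celsius."""
    return K_BOLTZMANN * (temp_c + 273.15) / Q_ELECTRON


def relative_subthreshold_leakage(temp_c, vt=0.3, n=1.5, ref_temp_c=25.0):
    """Off-state leakage at temp_c, relative to the leakage at ref_temp_c.

    Assumption: I_off is proportional to exp(-vt / (n * kT/q)); vt and the
    slope factor n are treated as temperature-independent constants.
    """
    def off_current(t):
        return math.exp(-vt / (n * thermal_voltage(t)))

    return off_current(temp_c) / off_current(ref_temp_c)


if __name__ == "__main__":
    for temp in (25, 50, 75, 100):
        ratio = relative_subthreshold_leakage(temp)
        print(f"{temp:>3} C: leakage x{ratio:.2f} relative to 25 C")
```

Even this simplified model shows leakage growing several-fold between room temperature and 100 °C, which is why leakage and heating reinforce each other.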

PlayStation 3 Grill!

Performance-Power Efficiency

Source: Kerry Bernstein, IBM

Variability

• Need models for determining degree of variability
– Static: process variations
– Runtime: changes in VDD, temperature, Vt, local coupling, external noise sources, application workload
• Lack of modeling resources or information flow can transform variability to uncertainty.

[Figure: temperature (°C), scale from 40 to 110]

System Reliability

• Shrinking feature size results in decreasing Vdd and Vt
– Crosstalk, coupling noise, soft errors and process variations.
– Reliability is a critical design issue

[Figure: a reliable system turns inputs into acceptable outputs in the presence of defects, process variation, degraded transistor devices, noise (external, internal), radiation, design errors, software failures, human errors and malicious attacks. Further labels: performance, power, data integrity, availability, security.]

Source: Subhasish Mitra - Intel Corporation / Stanford University

Reliability Impact – June 18th, 2005

http://www.foxcarolina.com/

S-Class 1998 – Electronics Chart

Powertrain - Suspension - Displays - Comfort - Diagnosis - Braking - Active & Passive Safety - Telematic - Anti-Theft Systems - ...

System Complexity

• Complicated Algorithms!
• Complicated Design!
• Design Productivity Gap:
– Design vs. Algorithm Development
– Design vs. Marketing and Sales
– Design vs. System Architecture
• Need designers who understand algorithms!
• Need algorithm development that understands architecture and design!
• Need Effective System Integration!

So…

• Fact: We are heading towards a billion transistors placed on a single chip
– Intel Itanium 2 (Montecito) has 1.7 billion transistors
• Fact: Reliability, Energy, Variability and the Interconnect are critical roadblocks in designing efficient chips.
• Fact: Performance is no longer valued at the GHz scale alone.

With these facts in mind, how do we design next-generation application-specific h/w?

The SoC Design Paradigm

• Collection of Components (Cores) placed on a single chip, forming a complete “system”
– Cost – Market Capability
– Specifications
  • Performance Criteria
  • Energy
  • Reliability
• Design and Test Reuse
– Hard Cores and Soft Cores
• Plug and Play Approach
– Plug the core into a “predefined” location and expect it to work.

Systems-on-Chip

[Figure: Printed Circuit Boards versus Systems-on-Chip; blocks shown: VGA core, ADC / DAC, analog, DSP, media core]

SoC Design

• Chips composed of several processing elements
• No longer monolithic
– Homogeneous Systems (Chip multi/manycores)
– Heterogeneous Systems (MPSoCs)

© Tilera Corporation

SoC Market Growth

[Figure: SoC market size in USD (billion), axis from 0 to 45, by year from 2003 to 2009; annotated with 24.6%]

Source: http://www.mindbranch.com

Electronic Systems Design Chain

[Figure: design chain with the stages Design Science, Manufacturing, Implementation, System Design, Platforms, IP, Interfaces and Fabrics]

Design Process Today

C/C++ Environment

• System Level Design
– Hardware and Software
– Algorithm Development
– Processor Selection
– Done mainly in C/C++

EDA Environment

• IC Development
– Hardware
– Implementation Decisions
– Done mainly in Verilog/VHDL

C/C++ Environment

• Software
– Code development
– RTOS details
– Done mainly in C/C++

[Diagram labels: Refinement; Refinement; Manual Translation; $ $ Emulation / Prototyping]

The Verification Process

Productivity Gaps

EASOC Lab

• Embedded and Application Specific Processors Laboratory (EASOC)
– 2 PhD Students
– 2 MSc Students
– 1 Graduate Researcher (summer intern)
– 2 Undergraduate Students

http://www.ece.ucy.ac.cy/labs/easoc/index.html

Embedded and Application Specific Systems-On-Chip

In a nutshell…what we do.

• Embedded Intelligence
– Ambient-Assisted Living and Ambient Intelligence
– Intelligent SoC Management
• (Embedded) Image Processing, Computer Vision and Multimedia Hardware Architectures
• Hardware/Software Co-Design and System-Level Design Space Exploration of Embedded and Mobile Applications
• Biological Applications mapped on Reconfigurable Hardware and ASICs, Embedded Biometrics and Bioinformatics
• Manycore/Multicore (CMP/MPSoC) Architectures and Algorithms
– Simulation tools and methodologies for MPSoC and Manycore Evaluation

Methodology

Sub-System Requirements (e.g. ABS)

Sub-System Implementation

OEM

Tier 1 Supplier

Integration of new functions

[Figure: integration effort, Log (effort) versus Log (years), with growth curves marked ~n³, ~n² and ~n; labels: signals, interactions trans/rec, integration]

More than 50% of the development effort is in validation

Simulation and Evaluation

Programmable Hardware

• FPGA building blocks:
– Programmable logic blocks: implement combinatorial and sequential logic
– Programmable interconnect: wires to connect inputs and outputs to logic blocks
– Programmable I/O blocks: special logic blocks at the periphery of the device for external connections

[Figure: FPGA layout with I/O blocks on the periphery, logic blocks and interconnection switches]
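How a programmable logic block can implement both combinatorial and sequential logic is easiest to see in a small software model. The sketch below is a simplification, not a description of any particular FPGA family: it assumes a logic block made of one lookup table (LUT) followed by an optional D flip-flop, and the class names `LookupTable` and `LogicBlock` are hypothetical.

```python
from itertools import product


class LookupTable:
    """A k-input LUT: one configuration bit per input combination."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        self.config_bits = list(config_bits)

    @classmethod
    def from_function(cls, num_inputs, func):
        # "Programming" the LUT: tabulate func over every input combination.
        bits = [int(bool(func(*combo)))
                for combo in product((0, 1), repeat=num_inputs)]
        return cls(num_inputs, bits)

    def evaluate(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # The inputs form the table address; the first input is the MSB.
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]


class LogicBlock:
    """A LUT plus an optional output register (D flip-flop)."""

    def __init__(self, lut, registered=False):
        self.lut = lut
        self.registered = registered
        self.state = 0  # flip-flop contents, reset to 0

    def clock(self, *inputs):
        """Simulate one clock cycle and return the block output."""
        value = self.lut.evaluate(*inputs)
        if not self.registered:
            return value            # combinatorial: output follows inputs
        output, self.state = self.state, value
        return output               # sequential: output lags by one cycle


if __name__ == "__main__":
    xor3 = LookupTable.from_function(3, lambda a, b, c: a ^ b ^ c)
    print(LogicBlock(xor3).clock(1, 1, 1))
    registered = LogicBlock(xor3, registered=True)
    print(registered.clock(1, 0, 0), registered.clock(0, 0, 0))
```

The same table hardware realizes any 3-input function simply by loading different configuration bits, which is what makes the block programmable.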

Modern FPGAs

Embedded Intelligence

• SoC Monitoring and Management
– Hardware monitoring and dynamic optimization of SoC operation
– Integration of Machine Learning algorithms in h/w for forecasting and optimization of SoC functions (interconnect traffic, memory and links allocation for low power, etc.)
• Ambient Intelligence
– Utilization of ambient parameters for intelligent hardware behavior
– Machine-awareness: Read and React
• Ambient Assisted Living (AAL)
– Design of embedded processors to facilitate AAL applications

Embedded Intelligence: Developing Cognitive Systems on Chip

A cognitive system is one that
– can reason, using substantial amounts of appropriately represented knowledge
– can learn from its experience so that it performs better tomorrow than it did today
– can explain itself and be told what to do
– can be aware of its own capabilities and reflect on its own behavior
– can respond robustly to surprise
– can self-support in power and energy

Systems that know what they’re doing

Source: DARPA

Embedded Intelligence

[Figure: layered cognitive-systems diagram. Labels: Cognitive Teams; Cognitive Systems; Systems That Know What They’re Doing; Cognitive Architecture (Perception, Representation & Reasoning, Communication & Interaction, Learning); Robust Software and Hardware; Foundational Science and Mathematics (incl. Bio-inspired Computing, new approaches to Trust Management, …). Layer names: Applications, Systems, Core Cognition, Infrastructure.]

Real-Time Image Processing, Computer Vision and Multimedia Hardware Architectures

• Image Processing and Computer Vision applications
– Generic object detection and recognition
– Face Detection, Face Recognition, Gender/Race classification
  • Robotics, Security, Surveillance, Demographics, Marketing…
– Real-Time 3D Stereoscopic Reconstruction
– Real-Time Image Enhancement
– Signal Processing on FPGAs

Neural Sensors: Silicon Retina

The side view

Primary Visual Cortex

*The eyes can react before the brain analyzes the visual image.

The top view

Embedded Vision

Pattern Recognition in Silicon

[Figure: recognition neural network. Labels: input image, lighting correction, input receptive fields (10x10, 5x5 and 5x20), 1st hidden layer of neurons, 2nd hidden layer of neurons, output neuron ("FACE ?")]

Neural Network in Silicon [2005]

Can process ~40 frames per second!

VLSI’04: T. Theocharides, G. Link, N. Vijaykrishnan, M. J. Irwin, W. Wolf, “Embedded Hardware Face Detection”, Proc. of the International Conference on VLSI Design, January 2004, pp. 133-138.

Hardware/Software Co-Design and System-Level Design Space Exploration of Embedded and Mobile Applications

A design methodology supporting the cooperative and concurrent development of hardware and software (co-specification, co-development, and co-verification) in order to achieve shared functionality and performance goals for a combined system

The target is to develop a methodology for performing hardware and software development, fabrication and support cost modeling concurrent with hardware/software co-design.

JPEG2000 Case Study: Flow Steps

Courtesy of Celoxica Corp.

Biological Applications and Bioinformatics

• String Searching and Matching
• DNA Sequencing architectures
• Portable lab-on-a-chip
– Processors to perform real-time blood analysis
• Feature identification algorithms
• BLAST Algorithm - in collaboration with the Technology Institute of Crete (Y. Papaefstathiou, D. Pnevmatikatos)
– Basic Local Alignment Search Tool
  • Comparison of primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
  • HIGHLY parallel
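To show why BLAST-style search maps well onto parallel hardware, here is a much simplified seed-and-extend sketch. It is an illustration only and not the lab's architecture: it uses exact word matches as seeds and ungapped extension with an X-drop cutoff, whereas the real tool also uses neighborhood words, substitution matrices and statistics. The function names and the scoring defaults (`match=1`, `mismatch=-2`, `x_drop=3`) are assumptions.

```python
from collections import defaultdict


def build_word_index(query, word_size):
    """Map every word of length word_size in the query to its positions."""
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)
    return index


def extend(query, subject, q_pos, s_pos, step, match, mismatch, x_drop):
    """Ungapped extension in one direction; returns (best_score, best_length)."""
    score = best = 0
    length = best_len = 0
    q, s = q_pos, s_pos
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen: stop extending
        q += step
        s += step
    return best, best_len


def seed_and_extend(query, subject, word_size=4, match=1, mismatch=-2, x_drop=3):
    """Return hits as (score, query_start, subject_start, length), best first."""
    index = build_word_index(query, word_size)
    hits = set()
    # Each subject position is processed independently of all others,
    # which is the source of the parallelism exploited in hardware.
    for s in range(len(subject) - word_size + 1):
        for q in index.get(subject[s:s + word_size], ()):
            right, right_len = extend(query, subject, q + word_size,
                                      s + word_size, 1, match, mismatch, x_drop)
            left, left_len = extend(query, subject, q - 1, s - 1, -1,
                                    match, mismatch, x_drop)
            hits.add((word_size * match + left + right,
                      q - left_len, s - left_len,
                      left_len + word_size + right_len))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    for hit in seed_and_extend("ACGTACGT", "TTACGTACGTTT")[:3]:
        print(hit)
```

Because every seed lookup and extension depends only on local data, the outer loop can be split across many processing elements or FPGA pipelines.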

Multicore/Manycore Design Space Exploration

• Application Mapping issues (hw/sw co-design)
• Resource Allocation using Intelligent Algorithms
• Energy/Reliability Tradeoffs
• Dynamic Power/Energy Management
• Simulation tools and methodologies for MPSoC and Manycore Evaluation
– Simulation and evaluation framework for manycore architectures and MPSoCs
– Collaboration with M. K. Michael and M. Polycarpou

Current Projects – Pattern Recognition and Machine Learning in H/W

• Hardware Architectures for Support Vector Machines, Neural Networks and AdaBoost
– Neural Networks have been explored extensively.
– Utilize on-chip networks to alleviate interconnect issues
– AdaBoost uses Boosting for machine learning and detection
  • Most popular detection algorithm, wavelet approach
– Support Vector Machines are suitable for linearly and non-linearly separable classes but require complicated hardware resources
• Systolic Array Implementation
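As a reminder of what boosting computes, the following is a minimal software sketch of AdaBoost with threshold classifiers (decision stumps) as weak learners. It is a generic textbook formulation, not the hardware architecture of the project: the features here are plain numbers standing in for the wavelet-like features used in detection, and all function names are hypothetical.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: +polarity if x[feature] >= threshold, else -polarity."""
    return polarity if x[feature] >= threshold else -polarity


def train_stump(samples, labels, weights):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(samples[0])):
        for threshold in sorted({x[feature] for x in samples}):
            for polarity in (1, -1):
                error = sum(
                    w for x, y, w in zip(samples, labels, weights)
                    if stump_predict(x, feature, threshold, polarity) != y)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost_train(samples, labels, rounds):
    """Labels must be +1 or -1. Returns a list of (alpha, feature, threshold, polarity)."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature, threshold, polarity = train_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the others.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, feature, threshold, polarity))
            for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * stump_predict(x, feature, threshold, polarity)
                for alpha, feature, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    data = [[1], [2], [3], [4], [5], [6]]
    target = [1, 1, -1, -1, 1, 1]   # not separable by any single stump
    model = adaboost_train(data, target, rounds=3)
    print([adaboost_predict(model, x) for x in data])
```

Prediction reduces to comparisons, multiplications by fixed weights and one accumulation, which is part of why boosted detectors are attractive for hardware.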

Current Projects (Intelligent MPSoCs)

• Control-theoretic power management for on-chip interconnect / manycore systems
– Collaboration with Vassos Soteriou
– Dynamic Voltage Scaling: use forecasting and perhaps control theory to determine when to switch
• Forecasting as a reliability mechanism
– Learn traffic/application patterns and observe unusual behavior
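The idea of forecasting-driven voltage scaling can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the project's controller: it forecasts demand with an exponentially weighted moving average, picks the lowest voltage/frequency level that covers the forecast plus some headroom, and flags observations that deviate strongly from the forecast. The class name, the level table and all parameter values are hypothetical. The power comparison uses the standard relation that dynamic power scales with V²·f.

```python
class ForecastingDvsGovernor:
    """Selects a voltage/frequency level from a demand forecast (EWMA)."""

    def __init__(self, levels, smoothing=0.5, headroom=0.1):
        # levels: iterable of (frequency_ghz, voltage), any order
        self.levels = sorted(levels)
        self.smoothing = smoothing   # weight of the newest observation
        self.headroom = headroom     # safety margin on top of the forecast
        self.forecast = None

    def is_unusual(self, demand_ghz, tolerance=0.5):
        """True if demand deviates from the forecast by more than tolerance (relative)."""
        if self.forecast is None:
            return False
        return abs(demand_ghz - self.forecast) > tolerance * max(self.forecast, 1e-9)

    def observe(self, demand_ghz):
        """Fold a measured demand (in GHz-equivalent work) into the forecast."""
        if self.forecast is None:
            self.forecast = demand_ghz
        else:
            self.forecast = (self.smoothing * demand_ghz
                             + (1 - self.smoothing) * self.forecast)
        return self.forecast

    def select_level(self):
        """Lowest level whose frequency covers forecast plus headroom."""
        target = (self.forecast or 0.0) * (1 + self.headroom)
        for freq, volt in self.levels:
            if freq >= target:
                return freq, volt
        return self.levels[-1]       # saturate at the fastest level


def relative_dynamic_power(freq, volt, ref_freq, ref_volt):
    """Dynamic power relative to a reference point, using P ~ V^2 * f."""
    return (volt ** 2 * freq) / (ref_volt ** 2 * ref_freq)


if __name__ == "__main__":
    governor = ForecastingDvsGovernor([(0.5, 0.8), (1.0, 1.0), (2.0, 1.2)])
    for demand in (0.4, 0.45, 1.6, 1.7):
        unusual = governor.is_unusual(demand)
        governor.observe(demand)
        print(demand, governor.select_level(), "unusual" if unusual else "")
```

The `is_unusual` check hints at the reliability use case: the same learned pattern that drives level selection also exposes behavior that departs from it.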

Current Projects – Application Mapping in H/W

• FPGA-based implementation of QAM for integration in mobile environments (with SignalGenerix)
• BLAST algorithm
– Investigate algorithm components that can be designed using multi-FPGA boards
• Real-Time 3D Reconstruction from multi-view cameras
– FPGA prototype to be completed this fall

Sales are deteriorating. Our plan is to invent a kind of thingy that everyone wants to buy. So, I have fulfilled my job as visionary leader, how long will you need for yours?

The Key Question is…

Do KIOS members have any algorithms/ideas to design a 100-Billion-Transistor SoC?

Time Is Yours!