HPEC 2001 Options for embedded systems. Constraints, challenges, and approaches HPEC 2001 Lincoln Laboratory 25 September 2001 Gordon Bell Bay Area Research Center Microsoft Corporation


HPEC 2001 HPEC 2001

Options for embedded systems. Constraints, challenges, and approaches

HPEC 2001, Lincoln Laboratory, 25 September 2001

Gordon Bell

Bay Area Research Center

Microsoft Corporation

More architecture options: Applications, COTS (clusters, computers… chips), Custom Chips…


The architecture challenge: “One person’s system is another’s component.” (Alan Perlis)
Kurzweil: predicted hardware will be compiled and be as easy to change as software by 2010
COTS: streaming, Beowulf, and WWW relevance?
Architecture hierarchy:
– Application
– Scalable components forming the system
– Design and test
– Chips: the raw materials
Scalability: fewest, replicable components
Modularity: finding reusable components


The architecture levels & options
The apps
– Data-types: “signals”, “packets”, video, voice, RF, etc.
– Environment: parallelism, power, power, power, speed, … cost
The material: clock, transistors…
Performance… it’s about parallelism
– Program & programming environment
– Network e.g. WWW and Grid
– Clusters
– Storage, cluster, and network interconnect
– Multiprocessors
– Processor and special processing
– Multi-threading and multiple processors per chip
– Instruction Level Parallelism vs. Vector processors


Sony PlayStation export limits

A problem X-Box would like to have, … but have solved.

Will the PC prevail for the next decade as a/the dominant platform? … or 2nd to smart, mobile devices?

Moore’s Law: increases performance; Bell’s Corollary reduces prices for new classes

PC server clusters aka Beowulf with low cost OS kill proprietary switches, smPs, and DSMs

Home entertainment & control …
– Very large disks (1TB by 2005) to “store everything”
– Screens to enhance use

Mobile devices, etc. dominate WWW >2003! Voice and video become the important apps!

C = Commercial; C’ = Consumer

Where’s the action? Problems?
Constraints from the application: speech, video, mobility, RF, GPS, security…
Moore’s Law, networking, interconnects
Scalability and high performance processing
– Building them: Clusters vs DSM
– Structure: where’s the processing, memory, and switches (disk and ip/tcp processing)
– Micros: getting the most from the nodes
Not ISAs: Change can delay the Moore’s Law effect … and wipe out software investment! Please, please, just interpret my object code!
System (on a chip) alternatives… apps drivers
– Data-types (e.g. video, video, RF), performance, portability/power, and cost


COTS: Anything at the system structure level to use?

How are the system components e.g. computers, etc. going to be interconnected?

What are the components? Linux
What is the programming model?
– Is a plane, CCC, tank, fleet, ship, etc. an Internet?
– Beowulfs… the next COTS
– What happened to Ada? Visual Basic? Java?


Computing SNAP built entirely from PCs: a space, time (bandwidth), & generation scalable environment
[Figure: wide- and local-area networks for terminal, PC, workstation, & servers; centralized & departmental uni- & mP servers (UNIX & NT); legacy mainframes & minicomputer servers & terminals; wide-area global network; centralized & departmental servers built from PCs; scalable computers built from PCs; TC=TV+PC home ... (CATV or ATM or satellite); portables; person servers (PCs); mobile nets]


Five Scalabilities
Size scalable -- designed from a few components, with no bottlenecks
Generation scaling -- no rewrite/recompile or user effort to run across generations of an architecture
Reliability scaling… choose any level
Geographic scaling -- compute anywhere (e.g. multiple sites or in situ workstation sites)
Problem x machine scalability -- ability of an algorithm or program to exist at a range of sizes that run efficiently on a given, scalable computer.
Problem x machine space => run time: problem scale, machine scale (#p), run time, implies speedup and efficiency
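The last point can be made concrete with the usual definitions. Speedup is S(p) = T(1)/T(p) and efficiency is E(p) = S(p)/p, for a fixed problem size on p processors. This is a minimal sketch, and the run times in it are hypothetical.

```python
def speedup(t1: float, tp: float) -> float:
    """S(p) = T(1) / T(p): one-processor run time over p-processor run time."""
    return t1 / tp


def efficiency(t1: float, tp: float, p: int) -> float:
    """E(p) = S(p) / p: the fraction of ideal linear speedup achieved."""
    return speedup(t1, tp) / p


# Hypothetical run times (seconds) for one problem size.
t1, t16 = 800.0, 62.5
print(speedup(t1, t16))         # 12.8
print(efficiency(t1, t16, 16))  # 0.8
```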


Why I gave up on large smPs & DSMs
Economics: Perf/Cost is lower… unless a commodity
Economics: Longer design time & life. Complex. => Poorer tech tracking & end of life performance.
Economics: Higher, uncompetitive costs for processor & switching. Sole sourcing of the complete system.
DSMs … NUMA! Latency matters. Compiler, run-time, O/S locate the programs anyway.
Aren’t scalable. Reliability requires clusters. Start there.
They aren’t needed for most apps… hence, a small market unless one can find a way to lock in a user base. Important as in the case of IBM Token Rings vs Ethernet.


What is the basic structure of these scalable systems?

Overall

Disk connection, especially wrt Fibre Channel SAN, especially with fast WANs & LANs


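GB plumbing from the baroque: evolving from 2 dance-hall SMP & Storage model
[Diagram in PMS notation. Dance-hall model: processors (Pc) and primary memory (Mp) joined through a central switch (S), with secondary memory (Ms) behind a Fibre Channel switch (S.fc) and separate S.Cluster and S.WAN switches. Versus: integrated MpPcMs nodes connected by a single S.Lan/Cluster/Wan switch.]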


SNAP Architecture [figure]


ISTORE Hardware Vision

System-on-a-chip enables computer, memory, without significantly increasing size of disk
5-7 year target: MicroDrive: 1.7” x 1.4” x 0.2”
1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
Integrated IRAM processor, 2x height
Connected via crossbar switch growing like Moore’s law
16 Mbytes; 1.6 Gflops; 6.4 Gops
10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops
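The 2006 disk figures follow from compounding the 1999 MicroDrive numbers (340 MB, 5 MB/s) at 1.6X/yr and 1.4X/yr over seven years. The minimal sketch below reproduces them. It also computes the per-board totals for 100 nodes at 1.6 Gflops each. Taking each node's storage to be the projected 2006 drive is an assumption.

```python
def project(value: float, annual_factor: float, years: int) -> float:
    """Compound growth: value * annual_factor ** years."""
    return value * annual_factor ** years

YEARS = 2006 - 1999  # seven years from the 1999 MicroDrive baseline

capacity_gb = project(0.340, 1.6, YEARS)   # 340 MB growing 1.6X/yr
bandwidth_mb_s = project(5.0, 1.4, YEARS)  # 5 MB/s growing 1.4X/yr
print(f"2006 capacity  ~ {capacity_gb:.1f} GB")       # ~9.1 GB
print(f"2006 bandwidth ~ {bandwidth_mb_s:.0f} MB/s")  # ~53 MB/s

# One board of 100 nodes, using the slide's per-node figures.
NODES_PER_BOARD = 100
GFLOPS_PER_NODE = 1.6
print(f"board storage ~ {NODES_PER_BOARD * capacity_gb / 1000:.2f} TB")
print(f"board compute = {NODES_PER_BOARD * GFLOPS_PER_NODE / 1000:.2f} Tflops")
```

The results land close to the slide's rounded "9 GB, 50 MB/s" and "1 TB; 0.16 Tflops".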


The Disk Farm? or a System On a Card?
The 500 GB disc card: an array of discs
Can be used as 100 discs, 1 striped disc, 50 FT discs, ...etc.
LOTS of accesses/second of bandwidth
A few disks are replaced by 10s of Gbytes of RAM and a processor to run Apps!!
[Figure: 14" card]
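The three ways of presenting the 500 GB card trade the number of logical discs, their size, and redundancy against each other. This sketch assumes 100 identical 5 GB discs. It reads "FT" as fault tolerant through mirrored pairs, which is an assumption because the slide does not say how fault tolerance is achieved.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    count: int      # number of logical discs presented to the host
    size_gb: float  # capacity of each logical disc

CARD_GB = 500
DISCS = 100
DISC_GB = CARD_GB / DISCS  # 5 GB per physical disc (assumed uniform)

def as_jbod() -> Volume:
    """Each physical disc appears as its own logical disc."""
    return Volume(DISCS, DISC_GB)

def as_striped() -> Volume:
    """All discs striped into one large logical disc (no redundancy)."""
    return Volume(1, DISCS * DISC_GB)

def as_mirrored() -> Volume:
    """Discs mirrored in pairs: half as many logical discs, each fault tolerant."""
    return Volume(DISCS // 2, DISC_GB)

for name, vol in [("JBOD", as_jbod()), ("striped", as_striped()), ("mirrored", as_mirrored())]:
    print(f"{name:9s} {vol.count:3d} x {vol.size_gb:g} GB = {vol.count * vol.size_gb:g} GB usable")
```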


The Promise of SAN/VIA/InfiniBand http://www.ViArch.org/
[Chart: time (µs) to send 1 KB, scale 0–250 µs, broken into transmit, receiver CPU, and sender CPU time, for 100 Mbps Ethernet, Gbps Ethernet, and SAN]

Yesterday:
– 10 MBps (100 Mbps Ethernet)
– ~20 MBps tcp/ip saturates 2 cpus
– round-trip latency ~250 µs
Now:
– Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet,…
– Fast user-level communication
– tcp/ip ~100 MBps at 10% cpu
– round-trip latency is 15 µs
1.6 Gbps demoed on a WAN
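A simple latency-plus-bandwidth model shows why both numbers matter for short messages, such as the 1 KB case in the chart. This sketch assumes one-way latency is half the quoted round trip and 1 KB = 1024 bytes. It uses the 10 MBps / ~250 µs and 100 MBps / 15 µs figures above.

```python
def one_way_time_us(msg_bytes: int, bandwidth_mb_s: float, rtt_us: float) -> float:
    """One-way message time in µs: half the round trip plus serialization time.

    With bandwidth in MB/s (1 MB = 1e6 bytes), bytes / bandwidth is already in µs.
    """
    return rtt_us / 2 + msg_bytes / bandwidth_mb_s

KB = 1024
yesterday = one_way_time_us(KB, bandwidth_mb_s=10, rtt_us=250)  # 100 Mbps Ethernet
now = one_way_time_us(KB, bandwidth_mb_s=100, rtt_us=15)        # user-level SAN
print(f"yesterday ~{yesterday:.0f} µs, now ~{now:.0f} µs, {yesterday / now:.0f}x better")
```

In the "yesterday" case, latency accounts for over half of the 1 KB send time. Cutting latency from ~250 µs to 15 µs therefore matters as much as the 10x faster wires.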


Top500 taxonomy… everything is a cluster aka multicomputer
Clusters are the ONLY scalable structure
– Cluster: n interconnected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector.
MPP = miscellaneous, not massive (>1000), SIMD or something we couldn’t name
Cluster types. Implied message passing.
– Constellations = clusters of >=16 P, SMP
– Commodity clusters of uni or <=4 Ps, SMP
– DSM: NUMA (and COMA) SMPs and constellations
– DMA clusters (direct memory access) vs msg. pass
– Uni- and SMP vector clusters: Vector Clusters and Vector Constellations

Courtesy of Dr. Thomas Sterling, Caltech
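The node-size thresholds in the taxonomy can be written as a small classifier. This sketch uses only the two thresholds on the slide: >=16 processors per SMP node for a constellation and <=4 for a commodity cluster. The slide does not name the 5 to 15 range, so the function reports it as unclassified rather than guessing.

```python
def classify(procs_per_node: int) -> str:
    """Classify a cluster by processors per node, per the two slide thresholds."""
    if procs_per_node < 1:
        raise ValueError("a node needs at least one processor")
    if procs_per_node >= 16:
        return "constellation"
    if procs_per_node <= 4:
        return "commodity cluster"
    return "unclassified by these rules"

for p in (1, 4, 8, 32):
    print(p, classify(p))
```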


The Virtuous Economic Cycle drives the PC industry… & Beowulf
[Figure: cycle diagram with labels Innovation; Volume; Competition; Standards; Utility/value; DOJ; Greater availability @ lower cost; Creates apps, tools, training; Attracts users; Attracts suppliers]


BEOWULF-CLASS SYSTEMS
Cluster of PCs
– Intel x86
– DEC Alpha
– Mac PowerPC
Pure M2COTS
Unix-like O/S with source
– Linux, BSD, Solaris
Message passing programming model
– PVM, MPI, BSP, homebrew remedies
Single user environments
Large science and engineering applications
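In the message-passing model, nodes cooperate only by sending and receiving messages. This minimal sketch of the pattern uses Python threads as stand-ins for cluster nodes and a `queue.Queue` as rank 0's mailbox. A real Beowulf code would use MPI or PVM calls instead. Each worker only reads its own slice of the input, sums it, and sends one message to rank 0, which performs the reduction.

```python
import queue
import threading

def worker(rank: int, data: list[int], nprocs: int, mailbox: queue.Queue) -> None:
    """Sum this rank's slice and send the result to rank 0 as a message."""
    chunk = data[rank::nprocs]       # cyclic decomposition of the input
    mailbox.put((rank, sum(chunk)))  # explicit message: (sender, payload)

def parallel_sum(data: list[int], nprocs: int = 4) -> int:
    mailbox: queue.Queue = queue.Queue()  # rank 0's receive queue
    threads = [threading.Thread(target=worker, args=(r, data, nprocs, mailbox))
               for r in range(nprocs)]
    for t in threads:
        t.start()
    total = 0
    for _ in range(nprocs):  # rank 0 receives exactly one message per rank
        _sender, partial = mailbox.get()
        total += partial
    for t in threads:
        t.join()
    return total

print(parallel_sum(list(range(1, 101))))  # 5050
```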

Lessons from Beowulf
An experiment in parallel computing systems
Established vision: low cost high end computing
Demonstrated effectiveness of PC clusters for some (not all) classes of applications
Provided networking software
Provided cluster management tools
Conveyed findings to broad community
Tutorials and the book
Provided design standard to rally community!
Standards beget: books, trained people, software … virtuous cycle that allowed apps to form
Industry begins to form beyond a research project

Courtesy, Thomas Sterling, Caltech.


Designs at chip level…any COTS options?

Substantially more programmability versus factory compilation

As systems move onto chips and chip sets become part of larger systems, Electronic Design must move from RTL to algorithms.

Verification and design of “GigaScale systems” will be the challenge.


The Productivity Gap
[Chart, 1981–2009: logic transistors per chip (K), log scale 1 to 10,000,000, and productivity in transistors per staff-month, log scale 10 to 100,000,000]
Complexity growth rate: 58%/yr compound
Productivity growth rate: 21%/yr compound
Source: SEMATECH
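Both curves are compound growth rates, so the gap between them also compounds. This minimal sketch uses only the 58% and 21% rates from the chart. The ratio of complexity to productivity grows by 1.58/1.21 ≈ 1.31x per year. Read as staffing, this suggests a design team on a fixed schedule would have to grow by the same factor. That is an interpretation, not a figure from the chart.

```python
COMPLEXITY_GROWTH = 0.58    # logic transistors per chip, compound per year
PRODUCTIVITY_GROWTH = 0.21  # transistors per staff-month, compound per year

def gap_factor(years: float) -> float:
    """How many times faster chip complexity has grown than designer productivity."""
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years

for years in (1, 5, 10):
    print(f"after {years:2d} years the gap has widened {gap_factor(years):.1f}x")
```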


What Is GigaScale?
Extremely large gate counts
– Chips & chip sets
– Systems & multiple-systems
High complexity
– Complex data manipulation
– Complex dataflow
Intense pressure for correct, 1st time
– TTM, cost of failure, etc. impacts ability to have a silicon startup
Multiple languages and abstraction levels
– Design, verification, and software


EDA Evolution: chips to systems
[Figure: EDA evolution timeline.
1975 (Calma & CV): physical design
1985 (Daisy, Mentor): gates, 10K gates; simulation
1995 (Synopsys & Cadence): RTL, 1M gates; plus testbench automation, emulation, formal verification
2005 (e.g. Forte): GigaScale; plus hierarchical verification
Roles shown: IC Designer, ASIC Designer, SOC Designer, Chip Architect, System Architect, GigaScale Architect]
Courtesy of Forte Design Systems


If system-on-a-chip is the answer, what is the problem?
Small, high volume products
– Phones, PDAs
– Toys & games (to sell batteries)
– Cars
– Home appliances
– TV & video
Communication infrastructure
Plain old computers… and portables
Embeddable computers of all types where performance and/or power are the major constraints.


SOC Alternatives… not including C/C++ CAD Tools
– The blank sheet of paper: FPGA
– Auto design of a processor: Tensilica
– Standardized, committee designed components*, cells, and custom IP
– Standard components including more application specific processors*, IP add-ons plus custom
– One chip does it all: SMOP
*Processors, Memory, Communication & Memory Links


Tradeoffs and Reuse Model
[Figure: implementation options between System Application and Silicon Process: Structured Custom, RTL Flow, FPGA, FPGA & GPP, ASIP, DSP, GPP, with Platform Exportation and Application/Implementation layers. Trends across the options: programmability low to high; time to develop/iterate a new application high to lower; cost to develop/iterate a new application high to lower; MOPS/mW high to low. Accompanying panels show component interfaces (IUnknown, IOleObject, IDataObject, IPersistentStorage, IOleDocument, IFoo, IBar, IPGood, IOleBad) at the Architecture and Microarchitecture levels.]

System-on-a-chip alternatives
– FPGA: sea of uncommitted gate arrays (Xilinx, Altera)
– Compile a system: unique processor for every app (Tensilica)
– Systolic | array: many pipelined or parallel processors + custom
– Pc + ??: dynamic reconfiguration of the entire chip…
– Pc+DSP | VLIW: spec. purpose processor cores + custom (TI)
– Pc & Mp. ASICs: gen. purpose cores, specialized by I/O, etc. (IBM, Intel, Lucent)
– Universal Micro: multiprocessor array, programmable I/O (Cradle, Intel IXP 1200)


Xilinx: 10M gates, 500M transistors, 0.12 micron


Tensilica Approach: Compiled Processor Plus Development Tools
Describe the processor attributes from a browser-like interface
Using the processor generator, create...
– Tailored, HDL uP core (ALU, Pipe, I/O, Timer, MMU, Register File, Cache)
– Customized Compiler, Assembler, Linker, Debugger, Simulator
– Standard cell library targeted to the silicon process

Courtesy of Tensilica, Inc. http://www.tensilica.com

Richard Newton, UC/Berkeley


EEMBC Networking Benchmark
[Chart: performance relative to IDT 32334/100 (MIPS32), scale 0–14, and Netmark performance/MHz, scale 0–0.045, for IDT 32334/100, IDT79RC32364/100, NEC V832-143, AMD ElanSC520/133, Toshiba TMPR3927F-GH189/133, IDT79RC32V334-150, Toshiba TMPR3927F-GHM2000/133, NEC VR5432-167, Xtensa/200, IDT79RC64575 IDtc/250, NEC VR5000, IDT79RC64575 Algor/250, AMD K6-2/450, AMD K6-2E/400, Xtensa Optimized/200, AMD K6-2E+/500, AMD K6-IIIE+/550]
• Benchmarks: OSPF, Route Lookup, Packet Flow
• Xtensa with no optimization comparable to 64b RISCs
• Xtensa with optimization comparable to high-end desktop CPUs
• Xtensa has outstanding efficiency (performance per cycle, per watt, per mm2)
• Xtensa optimizations: custom instructions for route lookup and packet flow

Colors: Blue-Xtensa, Green-Desktop x86s, Maroon-64b RISCs, Orange-32b RISCs
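The second axis normalizes each score by clock rate. The chart labels already carry the clock after the final "/" or "-" (Xtensa/200, NEC V832-143). This sketch of that normalization assumes per-MHz performance is simply the score divided by the clock in MHz. Any scores used with it are placeholders, not EEMBC results. Labels without a clock suffix, such as NEC VR5000, raise an error rather than being guessed.

```python
import re

def clock_mhz(label: str) -> float:
    """Clock in MHz from a chart label such as 'Xtensa/200' or 'NEC VR5432-167'."""
    match = re.search(r"[/-](\d+)$", label)
    if match is None:
        raise ValueError(f"no clock suffix in label {label!r}")
    return float(match.group(1))

def per_mhz(score: float, label: str) -> float:
    """Benchmark score divided by the processor's clock frequency."""
    return score / clock_mhz(label)

def relative_to(score: float, baseline_score: float) -> float:
    """Score expressed as a multiple of a baseline processor's score."""
    return score / baseline_score

print(clock_mhz("AMD K6-2E+/500"))  # 500.0
print(per_mhz(9.0, "Xtensa/200"))   # 0.045 (placeholder score)
```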


EEMBC Consumer Benchmark
[Chart: performance relative to ST20C2/50, scale 0–200, and Consumermark performance/MHz, scale 0–1.00, for ST20C2/50, AMD ElanSC520/133, NEC V832/143, National Geode GX1/200, NEC VR5432/167, Xtensa/200, NEC VR5000/250, AMD K6-2E/400, AMD K6-2E+/500, AMD K6-III+/550, Xtensa Optimized/200]

Colors: Blue-Xtensa, Green-Desktop x86s, Maroon-64b RISCs, Orange-32b RISCs

• Benchmarks: JPEG, Grey-scale filter, Color-space conversion
• Xtensa with no optimization comparable to 64b RISCs
• Xtensa with optimization beats all processors by 6x (no JPEG optimization)
• Xtensa has exceptional efficiency (performance per cycle, per watt, per mm2)
• Xtensa optimizations: custom instructions for filters, RGB-YIQ, RGB-CMYK


Free 32-bit processor core


Complex SOC architecture

Synopsys via Richard Newton, UC/B

[Figure: arrays of MSPs with local MEMORY blocks, CLOCKS/DEBUG, DRAM CONTROL, DRAM, NVMEM, and a ring of PROG I/O blocks]

UMS Architecture
– Memory bandwidth scales with processing
– Scalable processing, software, I/O
– Each app runs on its own pool of processors
– Enables durable, portable intellectual property


Cradle UMS Design Goals
• Minimize design time for applications
• Efficient programming model
• High reusability accelerates derivative development
• Cost/Performance
• Replace ASICs, FPGAs, ASSPs, and DSPs
• Low power for battery powered appliances
• Flexibility
• Cost effective solution to address fragmenting markets
• Faster return on R&D investments

[Figure: Universal Microsystem (UMS): Quads 1, 2, 3 … “n” and an I/O Quad on a Global Bus, with SDRAM CONTROL and a PLA Ring]
Each Quad has 4 RISCs, 8 DSPs, and Memory
Unique I/O subsystem keeps interfaces soft

The Universal Micro System (UMS): an off the shelf “Platform” for Product Line Solutions

[Figure: UMS block diagram: MSPs with local MEMORY, CLOCKS, DRAM CONTROL, DRAM, Global Bus, a ring of PROG I/O blocks, and NVMEM. Detail of one Multi Stream Processor (MSP), 750 MIPS/GFLOPS: a PE and DSEs with MEM, Shared Prog Mem, Shared Data Mem, Shared DMA, and I/O Bus]

Superior Digital Signal Processing (Single Clock FP-MAC)
Scalable real time functions in software using small fast processors (QUAD)
Intelligent I/O Subsystem (Change Interfaces without changing chips)
Universal Micro System
250 MFLOPS/mm2
Local Memory that scales with additional processors


VPN Enterprise Gateway
[Figure: two gateway configurations. Five-quad version, labels in order: Quad 1 Firewall/Tunneling, Layer-2 switching; IP stack; Quad 2 3DES IPSec; IP Layer 3 Routing; Operating System; Quad 3 3DES IPSec; VoIP LAN Telephony; Quads 4 & 5 VoIP LAN Telephony; a T1/E1/J1 interface and two 10/100 E-MAC/PHY ports. Single-quad version: Quad 1 TCP/IP, IP Layer 3, IKE, 3DES IPSec; a T1/E1/J1 interface and two 10/100 E-MAC/PHY ports.]

Single quad:
• Two 10/100 Ethernet ports at wire speed; one T1/E1/J1 interface
• Handles 250 end users and 100 routes
• Does key handling for IPSec
• Delivers 50 Mbps of 3DES

Five quads:
• Two 10/100 Ethernet ports at wire speed; one T1/E1/J1 interface
• Handles 250 end users and 100 routes
• Does key handling for IPSec
• Delivers 100 Mbps of 3DES
• Firewall
• IP Telephony
• O/S for user interactions

UMS Application Performance (application: MSPs, comments)
– MPEG Video Decode: 4 MSPs, 720x480, 9 Mbits/sec; 6 MSPs, 720x480, 15 Mbits/sec
– MPEG Video Encode: 10-16 MSPs, 32²/128² search area
– AC3 Audio Decode: 1 MSP
– Modems: 0.5 MSP, V90; 3 MSPs, G.Lite; 4 MSPs, ADSL
– Ethernet Router (Level 3 + QOS): 0.5 MSP per 100 Mb channel; 4 MSPs per Gigabit channel
– Encryption: 1 MSP, 3DES 15 Mb/s; 1 MSP, MD5 425 Mb/s
– 3D geom, lite, render: 4 MSPs, 1.6M polygons/sec
– DV Encode/Decode: 8 MSPs, camcorder

• Architecture permits scalable software
• Supports two Gigabit Ethernets at wire speed; four fast Ethernets; four T-1s, USB, PCI, 1394, etc.
• MSP is a logical unit of one PE and two DSEs
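The MSP counts in the application table can be turned into a rough quad budget. An MSP is one PE plus two DSEs, and each quad has 4 RISCs and 8 DSPs. Taking the RISCs as PEs and the DSPs as DSEs, a quad holds four MSPs. This sketch also assumes that MSPs from different applications pack freely into quads. It reads "750 MIPS/GFLOPS" as 750 MFLOPS per MSP, which agrees with the 3 Gflops/quad figure for the Cradle part. The dictionary keys are hypothetical names.

```python
import math

PES_PER_QUAD, DSES_PER_QUAD = 4, 8  # "Each Quad has 4 RISCs, 8 DSPs" (RISC = PE, DSP = DSE assumed)
PES_PER_MSP, DSES_PER_MSP = 1, 2    # "MSP is a logical unit of one PE and two DSEs"
MSPS_PER_QUAD = min(PES_PER_QUAD // PES_PER_MSP, DSES_PER_QUAD // DSES_PER_MSP)
MFLOPS_PER_MSP = 750                # assumed reading of "750 MIPS/GFLOPS"

# MSP counts from the application table; the keys are hypothetical names.
APP_MSPS: dict[str, float] = {
    "mpeg_decode_9mbit": 4,
    "mpeg_decode_15mbit": 6,
    "ac3_decode": 1,
    "router_per_100mb": 0.5,
    "3des_15mbit": 1,
}

def quads_needed(apps: list[str]) -> int:
    """Quads for a mix of apps, assuming MSPs pack freely across quads."""
    total_msps = sum(APP_MSPS[app] for app in apps)
    return math.ceil(total_msps / MSPS_PER_QUAD)

print(MSPS_PER_QUAD)                                      # 4
print(MSPS_PER_QUAD * MFLOPS_PER_MSP / 1000)              # 3.0 (Gflops per quad)
print(quads_needed(["mpeg_decode_9mbit", "ac3_decode"]))  # 2
```

Real placement would also depend on I/O quads and memory locality, which this budget ignores.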

Cradle: Universal Microsystem, trading Verilog & hardware for C/C++
– Single part for all apps
– App spec’d @ run time using FPGA & ROM
– 5 quad mPs at 3 Gflops/quad = 15 Gflops
– Single shared memory space, caches
– Programmable periphery including: 1 GB/s; 2.5 Gips; PCI, 100 baseT, firewire
– $4 per Gflops; 150 mW/Gflops

UMS : VLSI = microprocessor : special systems
Software : Hardware


Silicon Landscape 200x
Increasing cost of fabrication and mask
– $7M for high-end ASSP chip design
– Over $650K for masks alone and rising
– SOC/ASIC companies require $7-10M business guarantee
Physical effects (parasitics, reliability issues, power management) are more significant design issues
– These must now be considered explicitly at the circuit level
Design complexity and “context complexity” are sufficiently high that design verification is a major limitation on time-to-market
Fewer design starts, higher design volume… implies more programmable platforms

Richard Newton, UC/Berkeley
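A break-even calculation shows why high design costs favor programmable platforms. Only the $7M design cost in this sketch comes from the slide. The platform's NRE and both unit costs are hypothetical placeholders. A custom part pays off only above the volume where its lower unit cost recovers its higher NRE.

```python
def break_even_units(nre_a: float, unit_a: float, nre_b: float, unit_b: float) -> float:
    """Volume at which design A (lower unit cost) and design B cost the same.

    Total cost = NRE + units * unit_cost; solve nre_a + n*unit_a == nre_b + n*unit_b.
    """
    if unit_b <= unit_a:
        raise ValueError("design B must have the higher unit cost for a break-even point")
    return (nre_a - nre_b) / (unit_b - unit_a)

# $7M design cost from the slide; every other number here is hypothetical.
asic_nre, asic_unit = 7_000_000, 20.0
platform_nre, platform_unit = 1_000_000, 35.0
units = break_even_units(asic_nre, asic_unit, platform_nre, platform_unit)
print(f"{units:,.0f} units")  # 400,000 units
```

Below that volume the programmable platform is cheaper overall. Rising mask and design costs push the break-even point higher.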


The End


[Figure: three design abstraction stacks. General-Purpose Computing: Application(s) over an Instruction Set Architecture (360, SPARC, 3000) over the “Physical Implementation”. Synthesizable RTL: Application(s) written in Verilog, VHDL, … and implemented as ASIC or FPGA. Platform-Based Design: Application(s) over a Platform over Microarchitecture & Software over the Physical Implementation.]


The Energy-Flexibility Gap
[Chart: energy efficiency in MOPS/mW (or MIPS/mW), log scale 0.1 to 1000, versus flexibility (coverage)]
– Embedded processors (LP ARM): 0.5-2 MIPS/mW
– ASIPs, DSPs (1 V DSP): 3 MOPS/mW
– Reconfigurable processor/logic (Pleiades): 10-50 MOPS/mW
– Dedicated HW (MUD): 100-200 MOPS/mW

Source: Prof. Jan Rabaey, UC Berkeley
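MOPS/mW figures convert directly into energy per operation. 1 MOPS/mW is 10^9 operations per joule, or 1 nJ per operation. This sketch uses the ranges on the chart and treats MIPS and MOPS alike, as the axis label does. It shows the spread from about 2 nJ/op for the low-power ARM down to about 5 pJ/op for dedicated hardware.

```python
def nanojoules_per_op(mops_per_mw: float) -> float:
    """Energy per operation in nJ for an efficiency given in MOPS/mW.

    1 MOPS/mW = 1e6 ops/s per 1e-3 W = 1e9 ops/J = 1 nJ/op, so the
    energy per op in nJ is the reciprocal of the MOPS/mW figure.
    """
    return 1.0 / mops_per_mw

# (low, high) efficiency ranges read from the chart, in MOPS/mW.
RANGES = {
    "embedded processor (LP ARM)": (0.5, 2.0),
    "1 V DSP": (3.0, 3.0),
    "reconfigurable (Pleiades)": (10.0, 50.0),
    "dedicated HW (MUD)": (100.0, 200.0),
}

for name, (lo, hi) in RANGES.items():
    # Higher efficiency means less energy, so the high end gives the low energy.
    print(f"{name:28s} {nanojoules_per_op(hi):.3f}-{nanojoules_per_op(lo):.3f} nJ/op")
```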


Approaches to Reuse

SOC as the Assembly of Components? – Alberto Sangiovanni-Vincentelli

SOC as a Programmable Platform? – Kurt Keutzer


Component-Based Programmable Platform Approach
– Assemble components from parameterized library
– Intermediate language that exposes programmability of all aspects of the microarchitecture
– Integrate using programmable approach to on-chip communication
– Assembly language for processor
Application-Specific Programmable Platforms (ASPP)
– These platforms will be highly-programmable
– They will implement highly-concurrent functionality

Richard Newton, UC/Berkeley


Compact Synthesized Processor, Including Software Development Environment
[Figure, to scale on a typical $10 IC (3-6% of 60 mm^2)]
Use virtually any standard cell library with commercial memory generators
Base implementation is less than 25K gates (~1.0 mm2 in 0.25 µm CMOS)
Power dissipation in 0.25 µm standard cell is less than 0.5 mW/MHz

Courtesy of Tensilica, Inc. http://www.tensilica.com
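The power and area figures above translate directly into a budget. This sketch assumes a 200 MHz clock, taken from the Xtensa/200 configuration in the EEMBC charts, and treats 0.5 mW/MHz as an upper bound. The base core alone comes out below the slide's 3-6% figure. That figure may include caches or other options, but the slide does not say.

```python
MW_PER_MHZ = 0.5  # upper bound from the slide (0.25 µm standard cell)
CORE_MM2 = 1.0    # ~1.0 mm^2 for the <25K-gate base core in 0.25 µm CMOS
DIE_MM2 = 60.0    # the slide's typical $10 IC

def core_power_mw(clock_mhz: float) -> float:
    """Upper-bound core power in mW: mW/MHz times clock in MHz."""
    return MW_PER_MHZ * clock_mhz

def core_area_fraction() -> float:
    """Base core area as a fraction of the 60 mm^2 die."""
    return CORE_MM2 / DIE_MM2

print(core_power_mw(200))                    # 100.0 (mW, at the assumed 200 MHz)
print(f"{100 * core_area_fraction():.1f}%")  # 1.7%
```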


Challenges of Programmability for Consumer Applications

Power, Power, Power…. Performance, Performance, Performance… Cost

Can we develop approaches to programming silicon and its integration, along with the tools and methodologies to support them, that will allow us to approach the power and performance of a dedicated solution sufficiently closely (~2-4x?) that a programmable platform is the preferred choice?

Richard Newton, UC/Berkeley


Bottom Line: Programmable Platforms

The challenge is finding the right programmer’s model and associated family of micro-architectures
– Address a wide-enough range of applications efficiently (performance, power, etc.)
Successful platform developers must “own” the software development environment and associated kernel-level run-time environment
– “It’s all about concurrency”

If you could develop a very efficient and reliable re-programmable logic technology (comparable to ASIC densities), you would eventually own the silicon industry!

Richard Newton, UC/Berkeley



A Component-Based Approach…
Simple Universal Protocol (SUP)
– Unix pipes (character streams only)
– TCP/IP (only one type of packet; limited options)
– RS232, PCI
– Streaming…
Single-Owner Protocol (SOP)
– Visual Basic
– Unibus, Massbus, Sbus, …
Simple Interfaces, Complex Application (SIC)
– When “the spec is much simpler than the code*” you aren’t tempted to rewrite it
– SQL, SAP, etc.
Implies “natural” boundaries to partition IP, and successful components will be aligned with those boundaries.

(*suggested by Butler Lampson)
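Unix pipes are the slide's first example of a Simple Universal Protocol. Every stage reads and writes the same thing, a stream of characters, so stages compose without knowing about each other. The sketch below is a minimal Python analogy, not anything from the talk. Each hypothetical stage takes and returns an iterable of text lines, and `pipeline` chains them the way a shell chains `grep | tr | head`.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[Iterable[str]], Iterator[str]]

def grep(pattern: str) -> Stage:
    """Keep only lines containing the pattern (substring match)."""
    def stage(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if pattern in line)
    return stage

def upper(lines: Iterable[str]) -> Iterator[str]:
    """Upper-case every line, like `tr a-z A-Z`."""
    return (line.upper() for line in lines)

def head(n: int) -> Stage:
    """Pass through at most n lines."""
    def stage(lines: Iterable[str]) -> Iterator[str]:
        for i, line in enumerate(lines):
            if i >= n:
                return
            yield line
    return stage

def pipeline(source: Iterable[str], *stages: Stage) -> list[str]:
    """Chain stages; each one sees only a stream of lines."""
    stream: Iterable[str] = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)

lines = ["tcp/ip", "unibus", "pci", "tcp options", "rs232"]
print(pipeline(lines, grep("tcp"), upper, head(1)))  # ['TCP/IP']
```

Because the interface is this simple, any new stage works with every existing one, the property the slide attributes to a SUP.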

The Key Elements of the SOC
[Figure: Applications, Microarchitecture, Design Technology, Distributed OS (Network), Software Development, and RF, MEMS, optical, ASIP surrounding the central question: What is the Platform aka Programmer model?]

Richard Newton, UC/Berkeley

Power as the Driver
[Chart: MIPS/mW on a log scale from 0.001 to 1000 for Pentium (0.35 µm), StrongARM (0.35 µm), TI DSP (0.25 µm), and dedicated hardware (1 µm): four orders of magnitude]

(Power is still, almost always, the driver!)

Source: R. Brodersen, UC Berkeley


Back end


Computer ops/sec x word length / $
[Chart: log scale from 1.E-06 to 1.E+09, years 1880 to 2000]
Exponential fit: $y = 10^{-248}\, e^{0.2918x}$, with $x$ the year
Fit: $y = 1.565^{(t - 1959.4)}$, with $t$ the year
Annotated doubling times: every 7.5, 2.3, and 1.0 years
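The fitted curves can be turned into doubling times. For y = A·e^(r·x) the doubling time is ln 2 / r, and for y = b^t it is ln 2 / ln b. As a check, the 0.2918 exponent gives about 2.4 years, close to the chart's "doubles every 2.3" annotation. The 1.565 factor gives about 1.5 years. The chart does not say which annotation each fit belongs to, so the sketch simply computes both.

```python
import math

def doubling_time_continuous(rate: float) -> float:
    """Doubling time for y = A * exp(rate * t): ln 2 / rate."""
    return math.log(2) / rate

def doubling_time_discrete(factor: float) -> float:
    """Doubling time for y = factor ** t: ln 2 / ln(factor)."""
    return math.log(2) / math.log(factor)

print(f"{doubling_time_continuous(0.2918):.2f}")  # 2.38
print(f"{doubling_time_discrete(1.565):.2f}")     # 1.55
```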


Microprocessor performance
[Chart: performance on a log scale from Kilo to 100 G, 1970 to 2010. Peak Advertised Performance (PAP) follows Moore’s Law; Real Applied Performance (RAP) shows 41% growth.]


GigaScale Evolution

In 1999, less than 3% of engineers were doing designs with more than 10M transistors per chip. (Dataquest)

By early 2002, 0.1 micron will allow 600M transistors per chip. (Dataquest)

In 2001 49% of engineers @ .18 micron, 5% @ .10 micron. (EE Times)

54% plan to be @ .10 micron in 2003.(EET)


Challenges of GigaScale
GigaScale systems are too big to simulate
– Hierarchical verification
– Distributed verification
Requires a higher level of abstraction
– Higher abstraction needed for verification
 - High level modeling
 - Transaction-based verification
– Higher abstraction needed for design
 - High-level synthesis required for productivity breakthrough
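Transaction-based verification checks a design against a higher-level reference model one whole transaction at a time (a push, a pop, a packet), rather than comparing signals cycle by cycle. This minimal Python sketch uses hypothetical models for both sides. A deque-based FIFO serves as the golden model, and a fixed-depth ring buffer stands in for the design under test. A random stimulus generator drives both and checks every pop. In a real flow the design under test would be RTL running in a simulator.

```python
import random
from collections import deque

class ReferenceFifo:
    """Golden model: an unbounded FIFO built on deque."""
    def __init__(self) -> None:
        self.items: deque[int] = deque()
    def push(self, value: int) -> None:
        self.items.append(value)
    def pop(self) -> int:
        return self.items.popleft()

class RingBufferFifo:
    """Hypothetical design under test: a fixed-depth circular buffer."""
    def __init__(self, depth: int) -> None:
        self.buf = [0] * depth
        self.head = self.tail = self.count = 0
    def push(self, value: int) -> None:
        if self.count == len(self.buf):
            raise OverflowError("FIFO full")
        self.buf[self.tail] = value
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
    def pop(self) -> int:
        if self.count == 0:
            raise IndexError("FIFO empty")
        value = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return value

def run_transactions(n: int, depth: int = 8, seed: int = 0) -> int:
    """Drive n random push/pop transactions into both models; return pops checked."""
    rng = random.Random(seed)
    ref, dut = ReferenceFifo(), RingBufferFifo(depth)
    checked = 0
    for _ in range(n):
        # Push when empty, pop when full, otherwise choose at random.
        if dut.count < depth and (dut.count == 0 or rng.random() < 0.5):
            value = rng.randrange(1 << 16)
            ref.push(value)
            dut.push(value)
        else:
            expected, actual = ref.pop(), dut.pop()
            assert expected == actual, f"mismatch: expected {expected}, got {actual}"
            checked += 1
    return checked

print(run_transactions(1000), "pops matched the reference model")
```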