HPEC 2001
Options for embedded systems. Constraints, challenges, and approaches
HPEC 2001, Lincoln Laboratory, 25 September 2001
Gordon Bell
Bay Area Research Center
Microsoft Corporation
The architecture challenge: “One person’s system is another’s component.” - Alan Perlis
Kurzweil: predicted hardware will be compiled and be as easy to change as software by 2010
COTS: streaming, Beowulf, and WWW relevance?
Architecture hierarchy:
– Application
– Scalable components forming the system
– Design and test
– Chips: the raw materials
Scalability: fewest, replicable components
Modularity: finding reusable components
The architecture levels & options
The apps
– Data-types: “signals”, “packets”, video, voice, RF, etc.
– Environment: parallelism, power, power, power, speed, … cost
The material: clock, transistors…
Performance… it’s about parallelism
– Program & programming environment
– Network, e.g. WWW and Grid
– Clusters
– Storage, cluster, and network interconnect
– Multiprocessors
– Processor and special processing
– Multi-threading and multiple processors per chip
– Instruction Level Parallelism vs
– Vector processors
Sony Playstation export limits
A problem X-Box would like to have, … but have solved.
Will the PC prevail for the next decade as a/the dominant platform? … or 2nd to smart, mobile devices?
Moore’s Law: increases performance; Bell’s Corollary reduces prices for new classes
PC server clusters, aka Beowulf, with low-cost OS kill proprietary switches, smPs, and DSMs
Home entertainment & control …
– Very large disks (1TB by 2005) to “store everything”
– Screens to enhance use
Mobile devices, etc. dominate WWW >2003!
Voice and video become the important apps!
C = Commercial; C’ = Consumer
Where’s the action? Problems?
Constraints from the application: speech, video, mobility, RF, GPS, security…
Moore’s Law, networking, interconnects
Scalability and high performance processing
– Building them: clusters vs DSM
– Structure: where’s the processing, memory, and switches (disk and IP/TCP processing)
– Micros: getting the most from the nodes
Not ISAs: change can delay the Moore’s Law effect … and wipe out software investment! Please, please, just interpret my object code!
System (on a chip) alternatives… apps drivers
– Data-types (e.g. video, video, RF), performance, portability/power, and cost
COTS: Anything at the system structure level to use?
How are the system components, e.g. computers, etc., going to be interconnected?
What are the components? Linux
What is the programming model?
– Is a plane, CCC, tank, fleet, ship, etc. an Internet?
– Beowulfs… the next COTS
– What happened to Ada? Visual Basic? Java?
Computing SNAP built entirely from PCs
A space, time (bandwidth), & generation scalable environment
[Figure labels: wide-area global network; wide & local area networks for terminal, PC, workstation, & servers; legacy mainframes & minicomputers, servers & terminals; centralized & departmental uni- & mP servers (UNIX & NT); centralized & departmental servers built from PCs; scalable computers built from PCs; person servers (PCs); portables; mobile nets; TC=TV+PC home ... (CATV or ATM or satellite); ???]
Five Scalabilities
Size scalable -- designed from a few components, with no bottlenecks
Generation scaling -- no rewrite/recompile or user effort to run across generations of an architecture
Reliability scaling… choose any level
Geographic scaling -- compute anywhere (e.g. multiple sites or in situ workstation sites)
Problem x machine scalability -- ability of an algorithm or program to exist at a range of sizes that run efficiently on a given, scalable computer.
Problem x machine space => run time: problem scale, machine scale (#p), run time, implies speedup and efficiency
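The last point can be made concrete with the usual definitions: speedup is the one-processor run time divided by the run time on p processors, and efficiency is speedup divided by p. The sketch below is only an illustration; the function names and the run times are hypothetical, not measurements from the talk.

```python
def speedup(t_one: float, t_p: float) -> float:
    """Speedup of a p-processor run relative to the one-processor run."""
    return t_one / t_p


def efficiency(t_one: float, t_p: float, p: int) -> float:
    """Fraction of ideal (linear) speedup achieved on p processors."""
    return speedup(t_one, t_p) / p


# Hypothetical run times in seconds for one fixed problem size, keyed by #p.
run_times = {1: 100.0, 4: 28.0, 16: 9.0}

for p, t in run_times.items():
    s = speedup(run_times[1], t)
    e = efficiency(run_times[1], t, p)
    print(f"p={p:2d}  run time={t:6.1f} s  speedup={s:5.2f}  efficiency={e:4.2f}")
```

Repeating the table for several problem sizes gives the "problem x machine space" view: one run time, speedup, and efficiency per (problem scale, #p) pair.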
Why I gave up on large smPs & DSMs
Economics: Perf/Cost is lower… unless a commodity
Economics: Longer design time & life. Complex. => Poorer tech tracking & end of life performance.
Economics: Higher, uncompetitive costs for processor & switching. Sole sourcing of the complete system.
DSMs … NUMA! Latency matters. Compiler, run-time, O/S locate the programs anyway.
Aren’t scalable. Reliability requires clusters. Start there.
They aren’t needed for most apps… hence, a small market unless one can find a way to lock in a user base. Important as in the case of IBM Token Rings vs Ethernet.
What is the basic structure of these scalable systems?
Overall
Disk connection, especially wrt fiber channel
SAN, especially with fast WANs & LANs
GB plumbing from the baroque: evolving from 2 dance-hall SMP & Storage model
Mp — S — Pc : | :
|—————— S.fc — Ms| :
|— S.Cluster |— S.WAN —
vs.MpPcMs — S.Lan/Cluster/Wan —
:
ISTORE Hardware Vision
System-on-a-chip enables computer, memory, without significantly increasing size of disk
5-7 year target:
MicroDrive: 1.7” x 1.4” x 0.2”
– 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
– 2006: 9 GB, 50 MB/s ? (1.6X/yr capacity, 1.4X/yr BW)
Integrated IRAM processor
– 2x height
Connected via crossbar switch
– growing like Moore’s law
16 Mbytes; ; 1.6 Gflops; 6.4 Gops
10,000+ nodes in one rack!
100/board = 1 TB; 0.16 Tflops
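The 2006 figures follow from compounding the 1999 MicroDrive numbers at the stated annual rates, and the board totals from multiplying the per-node figures by 100. The sketch below reproduces that arithmetic; the helper name `project` is hypothetical, and the 9 GB per node used for the board total is the slide's rounded 2006 capacity.

```python
def project(value: float, growth_per_year: float, years: int) -> float:
    """Compound a starting value at a fixed annual growth factor."""
    return value * growth_per_year ** years


YEARS = 2006 - 1999

capacity_mb = project(340, 1.6, YEARS)        # 1999: 340 MB, 1.6X/yr capacity
bandwidth_mb_per_s = project(5, 1.4, YEARS)   # 1999: 5 MB/s, 1.4X/yr bandwidth

# Board totals, taking 100 nodes per board at 9 GB and 1.6 Gflops per node.
NODES_PER_BOARD = 100
board_capacity_tb = NODES_PER_BOARD * 9 / 1000
board_tflops = NODES_PER_BOARD * 1.6 / 1000

print(f"2006 capacity  ~ {capacity_mb / 1000:.1f} GB")
print(f"2006 bandwidth ~ {bandwidth_mb_per_s:.1f} MB/s")
print(f"board: {board_capacity_tb:.1f} TB, {board_tflops:.2f} Tflops")
```

The projections land close to the slide's rounded 9 GB and 50 MB/s, and the board capacity of 0.9 TB is what the slide rounds to 1 TB.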
The Disk Farm? or a System On a Card?
The 500GB disc card
An array of discs
Can be used as
– 100 discs
– 1 striped disc
– 50 FT discs
– ....etc
LOTS of accesses/second of bandwidth
A few disks are replaced by 10s of Gbytes of RAM and a processor to run Apps!!
14"
The Promise of SAN/VIA/Infiniband
http://www.ViArch.org/
[Chart: time in µs to send 1 KB (axis 0 to 250) over 100 Mbps, Gbps, and SAN links, split into transmit, receiver cpu, and sender cpu time]
Yesterday:
– 10 MBps (100 Mbps Ethernet)
– ~20 MBps tcp/ip saturates 2 cpus
– round-trip latency ~250 µs
Now
– Wires are 10x faster: Myrinet, Gbps Ethernet, ServerNet,…
– Fast user-level communication
 - tcp/ip ~ 100 MBps 10% cpu
 - round-trip latency is 15 µs
1.6 Gbps demoed on a WAN
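As a rough companion to the "time to send 1 KB" chart, the sketch below computes only the wire (serialization) component for a given raw link rate. It assumes 1 KB = 1024 bytes and ignores protocol overhead and the sender and receiver CPU time that the chart shows separately; the function name is hypothetical.

```python
def wire_time_us(payload_bytes: int, link_bits_per_s: float) -> float:
    """Microseconds needed to clock payload_bytes onto a link at its raw bit rate."""
    return payload_bytes * 8 / link_bits_per_s * 1e6


# Raw link rates in bits per second for two of the links named on the slide.
links = {"100 Mbps Ethernet": 100e6, "Gbps Ethernet": 1e9}

for name, rate in links.items():
    print(f"{name}: {wire_time_us(1024, rate):.2f} µs on the wire for 1 KB")
```

With wires 10x faster the wire component drops by 10x, so whatever remains of the total is software and CPU cost, which is the part user-level communication targets.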
Top500 taxonomy… everything is a cluster aka multicomputer
Clusters are the ONLY scalable structure
– Cluster: n inter-connected computer nodes operating as one system. Nodes: uni- or SMP. Processor types: scalar or vector.
MPP = miscellaneous, not massive (>1000), SIMD or something we couldn’t name
Cluster types. Implied message passing.
– Constellations = clusters of >=16 P, SMP
– Commodity clusters of uni or <=4 Ps, SMP
– DSM: NUMA (and COMA) SMPs and constellations
– DMA clusters (direct memory access) vs msg. pass
– Uni- and SMP vector clusters: Vector Clusters and Vector Constellations
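The node-size thresholds in the taxonomy can be written down as a small classifier. This is a sketch under stated assumptions: it looks only at processors per node, it treats node sizes between 5 and 15 as unclassified because the slide does not name them, and the function name is hypothetical.

```python
def cluster_type(procs_per_node: int) -> str:
    """Classify a cluster by processors per node, using the slide's thresholds."""
    if procs_per_node < 1:
        raise ValueError("a node needs at least one processor")
    if procs_per_node >= 16:
        return "constellation"        # clusters of >=16 P SMP nodes
    if procs_per_node <= 4:
        return "commodity cluster"    # uni or <=4 P SMP nodes
    return "unclassified"             # 5 to 15 P: not named on the slide


for n in (1, 4, 8, 16, 64):
    print(n, cluster_type(n))
```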
The Virtuous Economic Cycle drives the PC industry… & Beowulf
[Cycle diagram labels: Innovation; Volume; Competition; Standards; Utility/value; DOJ; Greater availability @ lower cost; Creates apps, tools, training; Attracts users; Attracts suppliers]
BEOWULF-CLASS SYSTEMS
Cluster of PCs
– Intel x86
– DEC Alpha
– Mac Power PC
Pure M2COTS
Unix-like O/S with source
– Linux, BSD, Solaris
Message passing programming model
– PVM, MPI, BSP, homebrew remedies
Single user environments
Large science and engineering applications
Lessons from Beowulf
An experiment in parallel computing systems
Established vision - low cost high end computing
Demonstrated effectiveness of PC clusters for some (not all) classes of applications
Provided networking software
Provided cluster management tools
Conveyed findings to broad community
Tutorials and the book
Provided design standard to rally community!
Standards beget: books, trained people, software … virtuous cycle that allowed apps to form
Industry begins to form beyond a research project
Courtesy, Thomas Sterling, Caltech.
Designs at chip level…any COTS options?
Substantially more programmability versus factory compilation
As systems move onto chips and chip sets become part of larger systems, Electronic Design must move from RTL to algorithms.
Verification and design of “GigaScale systems” will be the challenge.
The Productivity Gap
[Chart: logic transistors per chip (K) and productivity (transistors per staff-month) on log axes, 1981 to 2009. Logic transistors/chip: 58%/yr compound complexity growth rate. Transistors/staff-month: 21%/yr compound productivity growth rate.]
Source: SEMATECH
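The two compound rates on the chart imply how fast the gap itself widens. The sketch below is a back-of-the-envelope calculation, assuming both rates stay constant; the function names are hypothetical.

```python
import math

COMPLEXITY_GROWTH = 0.58     # logic transistors per chip, per year
PRODUCTIVITY_GROWTH = 0.21   # transistors per staff-month, per year


def gap_factor(years: float) -> float:
    """Growth of the complexity/productivity ratio over the given number of years."""
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years


def gap_doubling_time() -> float:
    """Years for the complexity/productivity ratio to double."""
    return math.log(2) / math.log(gap_factor(1))


print(f"gap grows {100 * (gap_factor(1) - 1):.1f}% per year")
print(f"gap doubles every {gap_doubling_time():.1f} years")
print(f"gap after 10 years: {gap_factor(10):.1f}x")
```

Read as design effort, the ratio is the staff-months needed to fill one chip, so at constant rates the effort per chip roughly doubles every two and a half years.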
What Is GigaScale?
Extremely large gate counts
– Chips & chip sets
– Systems & multiple-systems
High complexity
– Complex data manipulation
– Complex dataflow
Intense pressure for correct, 1st time
– TTM, cost of failure, etc. impacts ability to have a silicon startup
Multiple languages and abstraction levels
– Design, verification, and software
EDA Evolution: chips to systems
[Figure labels, by era:
1975 (Calma & CV): physical design
1985 (Daisy, Mentor): gates, 10K gates; simulation
1995 (Synopsys & Cadence): RTL, 1M gates; plus testbench automation, emulation, formal verification
2005 (e.g. Forte): GigaScale; plus hierarchical verification
Designer roles shown: IC Designer, ASIC Designer, Chip Architect, SOC Designer, System Architect, GigaScale Architect]
Courtesy of Forte Design Systems
If system-on-a-chip is the answer, what is the problem?
Small, high volume products
– Phones, PDAs
– Toys & games (to sell batteries)
– Cars
– Home appliances
– TV & video
Communication infrastructure
Plain old computers… and portables
Embeddable computers of all types where performance and/or power are the major constraints.
SOC Alternatives… not including C/C++ CAD Tools
The blank sheet of paper: FPGA
Auto design of a processor: Tensilica
Standardized, committee designed components*, cells, and custom IP
Standard components including more application specific processors*, IP add-ons plus custom
One chip does it all: SMOP
*Processors, Memory, Communication & Memory Links
Tradeoffs and Reuse Model
[Figure labels: System Application; Silicon Process; Platform Exportation; Structured Custom; RTL Flow; implementation options FPGA, FPGA & GPP, ASIP, DSP, GPP; Application Implementation; Architecture; Microarchitecture]
Tradeoffs from one end of the implementation options to the other:
Programmability: Low to High
Time to Develop/Iterate New Application: High to Lower
Cost to Develop/Iterate New Application: High to Lower
MOPS/mW: High to Low
[Figure: component interface diagrams with labels IUnknown, IOleObject, IDataObject, IPersistentStorage, IOleDocument, IFoo, IBar, IPGood, IOleBad]
System-on-a-chip alternatives
FPGA: sea of uncommitted gate arrays (Xilinx, Altera)
Compile a system: unique processor for every app (Tensilica)
Systolic | array: many pipelined or parallel processors + custom
Pc + ??: dynamic reconfiguration of the entire chip…
Pc+DSP | VLIW: spec. purpose processor cores + custom (TI)
Pc & Mp. ASICs: gen. purpose cores, specialized by I/O, etc. (IBM, Intel, Lucent)
Universal Micro: multiprocessor array, programmable I/O (Cradle, Intel IXP 1200)
Tensilica Approach: Compiled Processor Plus Development Tools
Describe the processor attributes from a browser-like interface
Using the processor generator, create...
– Tailored, HDL uP core
– Customized Compiler, Assembler, Linker, Debugger, Simulator
– Standard cell library targeted to the silicon process
[Diagram blocks: ALU, Pipe, I/O, Timer, MMU, Register File, Cache]
Courtesy of Tensilica, Inc. http://www.tensilica.com
Richard Newton, UC/Berkeley
EEMBC Networking Benchmark
[Chart: performance relative to IDT 32334/100 (MIPS32), axis 0 to 14, and Netmark performance/MHz, axis 0.000 to 0.045, for: IDT 32334/100; IDT79RC32364/100; NEC V832-143; AMD ElanSC520/133; Toshiba TMPR3927F-GH189/133; IDT79RC32V334-150; Toshiba TMPR3927F-GHM2000/133; NEC VR5432-167; Xtensa/200; IDT79RC64575IDtc/250; NEC VR5000; IDT79RC64575Algor/250; AMD K6-2/450; AMD K6-2E/400; Xtensa Optimized/200; AMD K6-2E+/500; AMD K6-IIIE+/550]
Colors: Blue-Xtensa, Green-Desktop x86s, Maroon-64b RISCs, Orange-32b RISCs
• Benchmarks: OSPF, Route Lookup, Packet Flow
• Xtensa with no optimization comparable to 64b RISCs
• Xtensa with optimization comparable to high-end desktop CPUs
• Xtensa has outstanding efficiency (performance per cycle, per watt, per mm2)
• Xtensa optimizations: custom instructions for route lookup and packet flow
EEMBC Consumer Benchmark
[Chart: performance relative to ST20C2/50, axis 0 to 200, and Consumermark performance/MHz, axis 0.00 to 1.00, for: ST20C2/50; AMD ElanSC520/133; NEC V832/143; National Geode GX1/200; NEC VR5432/167; Xtensa/200; NEC VR5000/250; AMD K6-2E/400; AMD K6-2E+/500; AMD K6-III+/550; Xtensa Optimized/200]
Colors: Blue-Xtensa, Green-Desktop x86s, Maroon-64b RISCs, Orange-32b RISCs
• Benchmarks: JPEG, Grey-scale filter, Color-space conversion
• Xtensa with no optimization comparable to 64b RISCs
• Xtensa with optimization beats all processors by 6x (no JPEG optimization)
• Xtensa has exceptional efficiency (performance per cycle, per watt, per mm2)
• Xtensa optimizations: custom instructions for filters, RGB-YIQ, RGB-CMYK
UMS Architecture
[Block diagram labels: MSP; MEMORY; CLOCKS, DEBUG; DRAM CONTROL; DRAM; PROG I/O; NVMEM]
Memory bandwidth scales with processing
Scalable processing, software, I/O
Each app runs on its own pool of processors
Enables durable, portable intellectual property
Cradle UMS Design Goals
• Minimize design time for applications
• Efficient programming model
• High reusability accelerates derivative development
• Cost/Performance
• Replace ASICs, FPGAs, ASSPs, and DSPs
• Low power for battery powered appliances
• Flexibility
• Cost effective solution to address fragmenting markets
• Faster return on R&D investments
Universal Microsystem (UMS)
Each Quad has 4 RISCs, 8 DSPs, and Memory
Unique I/O subsystem keeps interfaces soft
[Block diagram labels: Quad 1; Quad 2; Quad 3; Quad “n”; I/O Quad; Global Bus; SDRAM CONTROL; PLA Ring]
The Universal Micro System (UMS)
An off the shelf “Platform” for Product Line Solutions
[Block diagram labels: MSP; MEMORY; CLOCKS; DRAM CONTROL; DRAM; Global Bus; PROG I/O; NVMEM. MSP detail: PE; DSE; MEM; Shared Prog Mem; Shared Data Mem; Shared DMA; I/O Bus]
Multi Stream Processor: 750 MIPS/GFLOPS
Superior Digital Signal Processing (Single Clock FP-MAC)
Scalable real time functions in software using small fast processors (QUAD)
Intelligent I/O Subsystem (Change Interfaces without changing chips)
Universal Micro System: 250 MFLOPS/mm2
Local Memory that scales with additional processors
VPN Enterprise Gateway
Five-quad configuration
– Quad 1: Firewall/Tunneling, Layer-2 switching, IP stack
– Quad 2: 3DES IPSec, IP Layer 3 Routing, Operating System
– Quad 3: 3DES IPSec, VoIP, LAN Telephony
– Quads 4 & 5: VoIP, LAN Telephony
– Interfaces: T1/E1/J1; two 10/100 E-MAC, each with PHY
• Five quads; Two 10/100 Ethernet ports at wire speed; one T1/E1/J1 interface
• Handles 250 end users and 100 routes
• Does key handling for IPSec
• Delivers 100Mbps of 3DES
• Firewall
• IP Telephony
• O/S for user interactions
Single-quad configuration
– Quad 1: TCP/IP, IP Layer 3, IKE, 3DES IPSec
– Interfaces: T1/E1/J1; two 10/100 E-MAC, each with PHY
• Single quad; Two 10/100 Ethernet ports at wire speed; one T1/E1/J1 interface
• Handles 250 end users and 100 routes
• Does key handling for IPSec
• Delivers 50Mbps of 3DES
UMS Application Performance
Application                        MSPs    Comments
MPEG Video Decode                  4       720x480, 9Mbits/sec
                                   6       720x480, 15Mbits/sec
MPEG Video Encode                  10-16   32^2/128^2 Search Area
AC3 Audio Decode                   1
Modems                             0.5     V90
                                   3       G.Lite
                                   4       ADSL
Ethernet Router (Level 3 + QOS)    0.5     Per 100Mb channel
                                   4       Per Gigabit channel
Encryption                         1       3DES 15Mb/s
                                   1       MD5 425Mb/s
3D geom, lite, render              4       1.6M Polygons/sec
DV Encode/Decode                   8       Camcorder
• Architecture permits scalable software
• Supports two Gigabit Ethernets at wire speed; four fast Ethernets; four T-1s, USB, PCI, 1394, etc.
• MSP is a logical unit of one PE and two DSEs
Cradle: Universal Microsystem trading Verilog & hardware for C/C++
Single part for all apps
App spec’d @ run time using FPGA & ROM
5 quad mPs at 3 Gflops/quad = 15 Gflops
Single shared memory space, caches
Programmable periphery including:
– 1 GB/s; 2.5 Gips
– PCI, 100 baseT, firewire
$4 per flops; 150 mW/Gflops
UMS : VLSI = microprocessor : special systems
Software : Hardware
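The Cradle numbers can be cross-checked with a little arithmetic. The sketch below makes two assumptions that the slides do not state outright: that the "4 RISCs, 8 DSPs" in a quad are the PEs and DSEs of the MSP definition (giving 4 MSPs per quad), and that the 150 mW/Gflops figure applies at the full 15 Gflops. The application mix is a hypothetical example drawn from the UMS Application Performance table.

```python
# Figures taken from the slides.
QUADS = 5
GFLOPS_PER_QUAD = 3
MW_PER_GFLOPS = 150
RISCS_PER_QUAD, DSPS_PER_QUAD = 4, 8
PES_PER_MSP, DSES_PER_MSP = 1, 2

# Assumption: RISC = PE and DSP = DSE, so MSPs per quad follow from the ratios.
msps_per_quad = min(RISCS_PER_QUAD // PES_PER_MSP, DSPS_PER_QUAD // DSES_PER_MSP)
total_msps = QUADS * msps_per_quad
total_gflops = QUADS * GFLOPS_PER_QUAD
power_w = total_gflops * MW_PER_GFLOPS / 1000

# Hypothetical application mix, using MSP counts from the performance table.
app_msps = {
    "MPEG video decode, 9 Mbits/sec": 4,
    "AC3 audio decode": 1,
    "Ethernet router, per 100Mb channel": 0.5,
}
needed_msps = sum(app_msps.values())

print(f"{total_msps} MSPs, {total_gflops} Gflops, about {power_w:.2f} W")
print(f"example mix needs {needed_msps} of {total_msps} MSPs")
```

Budgeting applications in MSPs rather than in gates is the point of the slide's analogy: the part is fixed, and the application is allocated processors in software.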
Silicon Landscape 200x
Increasing cost of fabrication and mask
– $7M for high-end ASSP chip design
– Over $650K for masks alone and rising
– SOC/ASIC companies require $7-10M business guarantee
Physical effects (parasitics, reliability issues, power management) are more significant design issues
– These must now be considered explicitly at the circuit level
Design complexity and “context complexity” is sufficiently high that design verification is a major limitation on time-to-market
Fewer design starts, higher-design volume… implies more programmable platforms
Richard Newton, UC/Berkeley
[Figure: three layered stacks, each topped by Application(s).
General-Purpose Computing: Application(s); Instruction Set Architecture (360, SPARC, 3000); “Physical Implementation”
Synthesizable RTL: Application(s); Verilog, VHDL, …; ASIC, FPGA
Platform-Based Design: Application(s); Platform (Microarchitecture & Software); Physical Implementation]
The Energy-Flexibility Gap
[Chart: energy efficiency in MOPS/mW (or MIPS/mW), log axis 0.1 to 1000, versus flexibility (coverage)]
Categories: Embedded Processors; ASIPs, DSPs; Reconfigurable Processor/Logic; Dedicated HW
Data points: LPArm 0.5-2 MIPS/mW; 1 V DSP 3 MOPS/mW; Pleiades 10-50 MOPS/mW; MUD 100-200 MOPS/mW
Source: Prof. Jan Rabaey, UC Berkeley
Approaches to Reuse
SOC as the Assembly of Components?
– Alberto Sangiovanni-Vincentelli
SOC as a Programmable Platform?
– Kurt Keutzer
Component-Based Programmable Platform Approach
Assemble components from parameterized library
Intermediate language that exposes programmability of all aspects of the microarchitecture
Integrate using programmable approach to on-chip communication
Assembly language for Processor
Application-Specific Programmable Platforms (ASPP)
– These platforms will be highly-programmable
– They will implement highly-concurrent functionality
Richard Newton, UC/Berkeley
Compact Synthesized Processor, Including Software Development Environment
[Figure: processor core drawn to scale on a typical $10 IC (3-6% of 60mm^2)]
Use virtually any standard cell library with commercial memory generators
Base implementation is less than 25K gates (~1.0 mm2 in 0.25 µm CMOS)
Power dissipation in 0.25 µm standard cell is less than 0.5 mW/MHz
Courtesy of Tensilica, Inc. http://www.tensilica.com
Challenges of Programmability for Consumer Applications
Power, Power, Power…
Performance, Performance, Performance…
Cost
Can we develop approaches to programming silicon and its integration, along with the tools and methodologies to support them, that will allow us to approach the power and performance of a dedicated solution sufficiently closely (~2-4x?) that a programmable platform is the preferred choice?
Richard Newton, UC/Berkeley
Bottom Line: Programmable Platforms
The challenge is finding the right programmer’s model and associated family of micro-architectures
– Address a wide-enough range of applications efficiently (performance, power, etc.)
Successful platform developers must “own” the software development environment and associated kernel-level run-time environment
– “It’s all about concurrency”
If you could develop a very efficient and reliable re-programmable logic technology (comparable to ASIC densities), you would eventually own the silicon industry!
Richard Newton, UC/Berkeley
Approaches to Reuse
SOC as the Assembly of Components?
– Alberto Sangiovanni-Vincentelli
SOC as a Programmable Platform?
– Kurt Keutzer
Richard Newton, UC/Berkeley
A Component-Based Approach…
Simple Universal Protocol (SUP)
– Unix pipes (character streams only)
– TCP/IP (only one type of packet; limited options)
– RS232, PCI
– Streaming…
Single-Owner Protocol (SOP)
– Visual Basic
– Unibus, Massbus, Sbus
Simple Interfaces, Complex Application (SIC)
– When “the spec is much simpler than the code*” you aren’t tempted to rewrite it
– SQL, SAP, etc.
Implies “natural” boundaries to partition IP and successful components will be aligned with those boundaries.
(*suggested by Butler Lampson)
The Key Elements of the SOC
[Figure labels: Applications; Microarchitecture; Design Technology; Distributed OS (Network); Software Development; RF, MEMS, optical, ASIP]
What is the Platform aka Programmer model?
Richard Newton, UC/Berkeley
Power as the Driver
(Power is still, almost always, the driver!)
[Chart: MIPS/mW on a log axis from 0.001 to 1000 for Pentium (0.35 µm), StrongARM (0.35 µm), TI DSP (0.25 µm), and Dedicated (1 µm); the spread is four orders of magnitude]
Source: R. Brodersen, UC Berkeley
Computer ops/sec x word length / $
[Chart: log axis 1.E-06 to 1.E+09, years 1880 to 2000]
Fitted trend: y = 1E-248 * e^(0.2918x)
= 1.565^(t-1959.4)
doubles every 7.5 years
doubles every 2.3 years
doubles every 1.0 years
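Doubling times and exponential fits are two views of the same growth rate. The sketch below converts between them; it is only a reading aid for the chart annotations, the function names are hypothetical, and it makes no claim about which era of the chart each fit or annotation belongs to.

```python
import math


def doubling_time_from_rate(k: float) -> float:
    """Doubling time in years for growth of the form y ~ e^(k*x)."""
    return math.log(2) / k


def doubling_time_from_base(b: float) -> float:
    """Doubling time in years for growth of the form y ~ b^t."""
    return math.log(2) / math.log(b)


def annual_growth_from_doubling(years: float) -> float:
    """Fractional growth per year implied by a given doubling time."""
    return 2 ** (1 / years) - 1


print(f"e^(0.2918x): doubles every {doubling_time_from_rate(0.2918):.2f} years")
print(f"1.565^t:     doubles every {doubling_time_from_base(1.565):.2f} years")
for d in (7.5, 2.3, 1.0):
    print(f"doubling every {d} years = {100 * annual_growth_from_doubling(d):.0f}% per year")
```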
Microprocessor performance
[Chart: performance on a log axis from Kilo to 100 G, 1970 to 2010. Labels: Peak Advertised Performance (PAP); Moore’s Law; Real Applied Performance (RAP), 41% growth]
GigaScale Evolution
In 1999 less than 3% of engineers were doing designs with more than 10M transistors per chip. (Dataquest)
By early 2002, 0.1 micron will allow 600M transistors per chip. (Dataquest)
In 2001 49% of engineers @ .18 micron, 5% @ .10 micron. (EE Times)
54% plan to be @ .10 micron in 2003. (EET)
Challenges of GigaScale
GigaScale systems are too big to simulate
– Hierarchical verification
– Distributed verification
Requires a higher level of abstraction
– Higher abstraction needed for verification
 - High level modeling
 - Transaction-based verification
– Higher abstraction needed for design
 - High-level synthesis required for productivity breakthrough