the future of microprocessors embedded in memory

36
1 The Future of Microprocessors Embedded in Memory David A. Patterson [email protected] http://cs.berkeley.edu/~patterson/talks EECS, University of California Berkeley, CA 94720-1776

Upload: others

Post on 12-Sep-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Future of Microprocessors Embedded in Memory

1

The Future of MicroprocessorsEmbedded in Memory

David A. Patterson

[email protected]://cs.berkeley.edu/~patterson/talks

EECS, University of CaliforniaBerkeley, CA 94720-1776

Page 2: The Future of Microprocessors Embedded in Memory

2

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 3: The Future of Microprocessors Embedded in Memory

3

Processor-DRAM Gap (latency)

µProc60%/yr.

DRAM7%/yr.

1

10

100

10001980

1981

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

DRAM

CPU1982

Processor-MemoryPerformance Gap:(grows 50% / year)

Per

form

ance

Time

“Moore’s Law”

Page 4: The Future of Microprocessors Embedded in Memory

4

Processor-MemoryPerformance Gap “Tax”

Processor % Area %Transistors

(≈cost)(≈power)

n Alpha 21164 37% 77%n StrongArm SA110 61% 94%n Pentium Pro 64% 88%

– 2 dies per package: Proc/I$/D$ + L2$

n Caches have no inherent value,only try to close performance gap

Page 5: The Future of Microprocessors Embedded in Memory

5

Today’s Situation:Microprocessor

MIPS MPUs R5000 R1000010k/5k

n Clock Rate 200 MHz 195 MHz1.0x

n On-Chip Caches 32K/32K 32K/32K 1.0x

n Instructions/Cycle 1(+ FP) 4 4.0x

n Pipe stages 5 5-71.2x

n Model In-order Out-of-order ---

Page 6: The Future of Microprocessors Embedded in Memory

6

Desktop/Server State of the Artn Processor performance doubling / 18 monthsn Microprocessor-DRAM performance gap & taxn Cost fixed at ≈$500/chip, power whatever can

cool– 10X cost, 10X power => 2X performance?

n Desktop apps slow at rate processors speedup?– Word struggles to keep up with typing?

n Consolidation of industry?PowerPCPowerPC

PA-RISC

PA-RISCMIPSMIPS AlphaAlpha IA-64SPARC

Page 7: The Future of Microprocessors Embedded in Memory

7

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 8: The Future of Microprocessors Embedded in Memory

8

A More RevolutionaryApproach: ProcessorEmbedded in DRAM

n Faster logic in DRAM process– DRAM vendors offer faster transistors +

same number metal layers as good logicprocess?@ ≈ 20% higher cost per wafer?

– As die cost ≈ f(die area4), 4% die shrink ⇒ equalcost

n Called Intelligent RAM (“IRAM”) since most oftransistors will be DRAM

Page 9: The Future of Microprocessors Embedded in Memory

9

IRAM Vision Statement

Microprocessor & DRAMon a single chip:– on-chip memory latency

5-10X, bandwidth 50-100X– improve energy efficiency

2X-4X (no off-chip bus)– serial I/O 5-10X v. buses– smaller board area/volume– adjustable memory size/width

DRAM

fab

Proc

Bus

D R A M

$ $Proc

L2$

Logic

fabBus

D R A M

I/OI/O

I/OI/O

Bus

Page 10: The Future of Microprocessors Embedded in Memory

10

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 11: The Future of Microprocessors Embedded in Memory

11

Intelligent PDA ( 2003?)Pilot PDA (todo,calendar,

calculator, addresses,...)+ Gameboy (Tetris, ...)+ Nikon Coolpix (camera)+ Cell Phone, Pager,

GPS, tape recorder,TV remote, am/fm radio,garage door opener, ...

+ Wireless data (WWW)+ Speech, vision recog.

– Speech control of all devices – Vision to see surroundings, scan documents, read bar

codes, measure room+ Voice output for conversations

Page 12: The Future of Microprocessors Embedded in Memory

12

New ArchitectureDirections

n “…media processing will become thedominant force in computer arch. &microprocessor design.”

n “... new media-rich applications... involvesignificant real-time processing of continuousmedia streams, and make heavy use ofvectors of packed 8-, 16-, and 32-bit integerand Fl. Pt.”

n Needs include high memory BW, highnetwork BW, continuous media data types,real-time response, fine grain parallelism– “How Multimedia Workloads Will Change

Page 13: The Future of Microprocessors Embedded in Memory

13

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 14: The Future of Microprocessors Embedded in Memory

14

Potential IRAM Architecturen “New” model: VSIW=Very Short Instruction

Word!– Compact: Describe N operations with 1 short

instruct.– Predictable (real-time) perf. vs. statistical perf.

(cache)– Multimedia ready: choose N*64b, 2N*32b, 4N*16b– Easy to get high performance; N operations:

» are independent

» use same functional unit» access disjoint registers» access registers in same order as previous instructions» access contiguous memory words or known pattern

Page 15: The Future of Microprocessors Embedded in Memory

15

Operation & Instruction Count:RISC v. “VSIW” Processor

(from F. Quintana, U. Barcelona.)

Spec92fp Operations (M) Instructions (M)Program RISC VSIW R / V RISC VSIW R /

Vswim256 115 95 1.1x 115 0.8

142xhydro2d 58 40 1.4x 58 0.8

71xnasa7 69 41 1.7x 69 2.2

31xsu2cor 51 35 1.4x 51 1.8

29xtomcatv 15 10 1.4x 15 1.3

11x

VSIW reduces ops by 1.2X, instructions by 20X!

Page 16: The Future of Microprocessors Embedded in Memory

16

Revive Vector (= VSIW)Architecture!

n Cost: ≈ $1M each?n Low latency, high

BW memorysystem?

n Code density?n Compilers?n Vector

Performance?n Power/Energy?n Scalar

performance?

n Real-time?

Limited to scientific

n Single-chip CMOS MPU/IRAMn IRAM = low latency,

high bandwidth memoryn Much smaller than VLIW/EPICn For sale, mature (>20 years)n Easy scale speed with

technologyn Parallel to save energy, keep

perfn Include modern, modest CPU

⇒ OK scalar (MIPS 5K v.10k)

n No caches, no speculation⇒ repeatable speed as varyinput

Page 17: The Future of Microprocessors Embedded in Memory

17

Software Technology TrendsAffecting New Direction?

n any CPU + vector coprocessor/memory– scalar/vector interactions are limited, simple– Example architecture based on ARM 9, MIPS IV

n Vectorizing compilers built for 25 years– can buy one for new machine from The Portland

Group

n Can change HW without changing programs– More arithmetic units, registers OK

n Lowers software effort– Media Processors/DSP, > 50% staff are

programmers

Page 18: The Future of Microprocessors Embedded in Memory

18

Software Technology TrendsAffecting New Direction?

n IRAM has limited memory => Desktop OSwrong

n Real-Time Operating Systems– “Microkernel” - exclude modules not used by

application– Small footprint even with all features ( 0.5 to 2 MB)– Examples: WRS VxWorks, QNX, Windows CE

n Applications not distributed in binary– WWW software distribution (Java, Javascript, TCL)– Open Source movement (Linix, Netscape, PERL)

Page 19: The Future of Microprocessors Embedded in Memory

19

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 20: The Future of Microprocessors Embedded in Memory

20

MHz1.6 GFLOPS(64b)/6.4

GOPS(16b)/32MB

Memory Crossbar Switch

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

M

+

Vector Registers

x

÷

Load/Store

8K I cache 8K D cache

2-waySuperscalar

VectorProcessor

4 x 64 4 x 64 4 x 64 4 x 64 4 x 64

4 x 64or

8 x 32or

16 x 16

4 x 644 x 64

QueueInstruction

I/OI/O

I/OI/O

SerialI/O

Page 21: The Future of Microprocessors Embedded in Memory

21

Ring-basedSwitch

CPU+$

Tentative VIRAM-1Floorplan

I/O

n 0.18 µm DRAM32 MB in 16 banksx 256b, 128subbanks

n 0.18 µm,5 Metal Logic

n ≈ 200 MHz MIPS, 16K I$, 16K D$

n ≈ 4 200 MHzFP/int. vector units

n die: ≈ 16x16mm

n xtors: ≈ 270Mn power: ≈2 Watts

4 Vector Pipes/Lanes

Memory (128 Mbits / 16 MBytes)

Memory (128 Mbits / 16 MBytes)

Page 22: The Future of Microprocessors Embedded in Memory

22

CPU+$

Tentative VIRAM-”0.25”Floorplan

n Demonstratescalability via2nd layout(automatic from1st)

n 8 MB in 4 banks x256b, 32 subbanks

n ≈ 200 MHz CPU,16K I$, 16K D$

n 1 ≈ 200 MHzFP/int. vector units

n die: ≈ 5 x 16mm

n xtors: ≈ 70Mn power: ≈0.5 Watts

1 VU

Memory (32 Mb /

4 MB)

Memory (32 Mb /

4 MB)

Page 23: The Future of Microprocessors Embedded in Memory

23

V-IRAM-1 Tentative Plann Phase I: Feasibility stage (≈H1’98)

– Test chip, CAD agreement, architecturedefined

n Phase 2: Design & Layout Stage (≈H2’98)– Test chip, Simulated design and layout

n Phase 3: Verification (≈H2’99)– Tape-out

n Phase 4: Fabrication,Testing, andDemonstration (≈H1’00)– Functional integrated circuit

n First microprocessor ≥ 0.15-.25Btransistors?

Page 24: The Future of Microprocessors Embedded in Memory

24

Outlinen Desktop/Server Microprocessor State of

the Artn A New Technology: Embedded DRAMn Mobile Multimedia Computing as New

Directionn A New Architecture for Mobile Multimedia

Computingn Berkeley’s Mobile Multimedia

Microprocessorn Historical Perspectiven Challenges & Potential Industrial Impact

Page 25: The Future of Microprocessors Embedded in Memory

25

SIMD on chip (DRAM)Uniprocessor (SRAM)MIMD on chip (DRAM)Uniprocessor (DRAM)MIMD component (SRAM )

10 100 1000 100000.1

1

10

100

Mbits of

Memory

Computational RAM

PIP-RAMMitsubishi M32R/D

Execube

Pentium Pro

Alpha 21164

Transputer T9

1000IRAMUNI?

PPRAM

Bits of Arithmetic Unit

Terasys

IRAMnot a new

ideaStone, ‘70 “Logic-in memory”Barron, ‘78 “Transputer”Kogge, ‘94 “Execube”Shimizu, ‘96 “M32R/D”Murakami, ‘97 “PPRAM”

Page 26: The Future of Microprocessors Embedded in Memory

26

Why IRAM now?Lower risk than before

n DRAM manufacturers now willing to listen– Before not interested, so early IRAM = SRAM

n Faster Logic + DRAM available soon?n Past efforts memory limited ⇒ multiple chips

⇒ 1st solve the unsolved (parallelprocessing)– Gigabit DRAM ⇒ ≈100 MB; OK for many apps?

n Systems headed to 2 chips: CPU + memoryn Embedded apps leverage energy efficiency,

adjustable mem. capacity, smaller board area ⇒ “Post-PC Era”: PDA >> desktop

Page 27: The Future of Microprocessors Embedded in Memory

27

IRAM Challengesn Chip

– Good performance and reasonable power?– Cost for Embedded DRAM process?

» Cost of 16 Mbit DRAM $1.78 in June 1998: Howcompete?

– Speed in Embedded DRAM? (time behind logicfab?)

– Testing time of IRAM vs DRAM vsmicroprocessor?

– BW/Latency oriented DRAM tradeoffs?

n Architecture– How to turn high memory bandwidth into

performance for real applications?

Page 28: The Future of Microprocessors Embedded in Memory

28

Commercial IRAM highway isgoverned by memory per

IRAM?

Graphics Acc.

Super PDA/PhoneVideo Games

Network ComputerLaptop

8 MB

2 MB

32 MB

Page 29: The Future of Microprocessors Embedded in Memory

29

Words to Remember

“...a strategic inflection point is a time in thelife of a business when its fundamentals areabout to change. ... Let's not mince words: Astrategic inflection point can be deadly whenunattended to. Companies that begin adecline as a result of its changes rarelyrecover their previous greatness.”– Only the Paranoid Survive, Andrew S. Grove,

1996

Page 30: The Future of Microprocessors Embedded in Memory

30

n Apps/metrics of future to design computer offuture

n Mobile Multimedia >> “Stationary” Computers?n IRAM potential in mem/IO BW, energy, board

area;challenges in cost, power/performance, testing

n 10X-100X improvements based on technologyshipping for 20 years (not JJ, photons, MEMS,...)

n Potential: shift semiconductor balance ofpower? Who ships the most memory? Most

Conclusion

Page 31: The Future of Microprocessors Embedded in Memory

31

Interested in Participating?n Looking for ideas of IRAM enabled apps,

partners to transfer technologyn Contact us if you’re interested:http://iram.cs.berkeley.edu/email: [email protected]

n Thanks for advice/support: DARPA,California MICRO, Hitachi, Intel, LGSemicon, Microsoft, Neomagic, Sandcraft,SGI/Cray, Sun Microsystems, TI, TSMC

Page 32: The Future of Microprocessors Embedded in Memory

32

Backup Slides

(The following slides are used to helpanswer questions)

Page 33: The Future of Microprocessors Embedded in Memory

33

VIRAM-1 Specs/GoalsTechnology 0.18-0.20 micron, 5-6 metal layers, fast

xtorMemory 16-32 MBDie size ≈ 200-300 mm2

Vector pipes/lanes 4 64-bit (or 8 32-bit or 16 16-bit)Serial I/O 4 lines @ 1 Gbit/sPoweruniversity ≈2 w @ 1-1.5 volt logicClockuniversity 200scalar/200vector MHzPerfuniversity 1.6 GFLOPS64 – 6.4 GOPS16

Powerindustry ≈1 w @ 1-1.5 volt logicClockindustry 400scalar/400vector MHzPerfindustry 3.2 GFLOPS64 – 12.8 GOPS16

2X

Page 34: The Future of Microprocessors Embedded in Memory

34

Mediaprocesing Functions (Dubey)Kernel Vector length

n Matrix transpose/multiply # vertices atonce

n DCT (video, comm.) image widthn FFT (audio) 256-1024n Motion estimation (video) image width,

i.w./16n Gamma correction (video) image widthn Haar transform (media mining) image widthn Median filter (image process.) image widthn Separable convolution (““) image width(from http://www.research.ibm.com/people/p/pradeep/tutor.html)

Page 35: The Future of Microprocessors Embedded in Memory

35

A More Revolutionary Approach:New Architecture Directions

n “...wires are not keeping pace with scalingof other features. … In fact, for CMOSprocesses below 0.25 micron ... anunacceptably small percentage of the diewill be reachable during a single clockcycle.”

n “Architectures that require long-distance,rapid interaction will not scale well ...”– “Will Physical Scalability Sabotage

Performance Gains?” Matzke, IEEEComputer (9/97)

Page 36: The Future of Microprocessors Embedded in Memory

36

“Vanilla” IRAM -Performance Conclusions

n IRAM systems with existing architecturesprovide moderate performance benefits

n High bandwidth / low latency used to speed upmemory accesses, not computation

n Reason: existing architectures developedunder assumption of low bandwidth memorysystem– Need something better than “build a bigger

cache”– Important to investigate alternative architectures

that better utilize high bandwidth and low latencyof IRAM