Page 1: ELEC 7770 Advanced VLSI Design, Spring 2007

Spring 07, Feb 20    ELEC 7770: Advanced VLSI Design (Agrawal)

Reducing Power through Multicore Parallelism

Vishwani D. Agrawal
James J. Danaher Professor
ECE Department, Auburn University
Auburn, AL 36849
[email protected]
http://www.eng.auburn.edu/~vagrawal/COURSE/E7770_Spr07

Page 2: Power Dissipation in CMOS Logic (0.25µ)

$P_{total}(0 \to 1) = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage}$

Percentages shown on the slide: 75%, 5%, 20%.

[Figure: circuit with supply VDD and load capacitance CL]

Page 3: Low-Power Datapath Architecture

Lower supply voltage
  This slows down circuit speed
  Use parallel computing to gain the speed back
Works well when threshold voltage is also lowered.
About 60% reduction in power obtainable.
Reference: A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.

Page 4: A Reference Datapath

[Figure: combinational logic between an input register and an output register, both clocked by CK; switched capacitance Cref]

Supply voltage = $V_{ref}$
Total capacitance switched per cycle = $C_{ref}$
Clock frequency = $f$
Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

Page 5: A Parallel Architecture

[Figure: the input feeds N copies of the combinational logic (Copy 1, Copy 2, ..., Copy N), each behind its own register clocked at f/N. An N-to-1 multiplexer selects the output, which is registered at f. A multiphase clock generator and mux control, driven by CK at frequency f, produces the f/N clocks.]

Each copy processes every Nth input and operates at reduced voltage.
Supply voltage: $V_N \le V_1 = V_{ref}$
N = degree of parallelism

Page 6: Level Converter: L to H

[Figure: level-converter circuit with input Vin_L, output Vout_H, and supplies VDDH and VDDL]

Transistors with thicker oxide and longer channels

N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Page 7: Level Converter: H to L

[Figure: level-converter circuit with input Vin_H, output Vout_L, and supply VDDL]

Transistors with thicker oxide and longer channels

N. H. E. Weste and D. Harris, CMOS VLSI Design, Third Edition, Section 12.4.3, Addison-Wesley, 2005.

Page 8: Control Signals, N = 4

[Figure: timing waveforms for CK and for Phase 1, Phase 2, Phase 3 and Phase 4]

Page 9: Power

$P_N = P_{proc} + P_{overhead}$

$P_{proc} = N (C_{inreg} + C_{comb}) V_N^2 \frac{f}{N} + C_{outreg} V_N^2 f$
$\quad = (C_{inreg} + C_{comb} + C_{outreg}) V_N^2 f$
$\quad = C_{ref} V_N^2 f$

$P_{overhead} = C_{overhead} V_N^2 f \approx \delta C_{ref} (N - 1) V_N^2 f$

$P_N = [1 + \delta (N - 1)] C_{ref} V_N^2 f$

$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{V_N^2}{V_{ref}^2}$
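The final ratio on this slide is easy to evaluate numerically. The following is a minimal sketch, not part of the original lecture: the function name `power_ratio` and the sample overhead fraction `delta = 0.1` are assumptions chosen for illustration, while the 2.9 V and 5 V figures are the N = 2 and reference supplies quoted on the Voltage vs. Speed slide.

```python
def power_ratio(n, v_n, v_ref, delta):
    """Return P_N / P_1 = [1 + delta*(N - 1)] * (V_N / V_ref)**2."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (1.0 + delta * (n - 1)) * (v_n / v_ref) ** 2


# Illustrative numbers only: two copies at 2.9 V against a 5 V reference,
# with an assumed (hypothetical) overhead fraction delta = 0.1.
ratio = power_ratio(n=2, v_n=2.9, v_ref=5.0, delta=0.1)
print(f"P_2 / P_1 = {ratio:.3f}")
```

With N = 1 and V_N = V_ref the function returns 1 regardless of delta, which matches the definition of the reference design.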

Page 10: Voltage vs. Speed

Delay of a gate, $T \approx \frac{C_L V_{ref}}{I} = \frac{C_L V_{ref}}{k (W/L) (V_{ref} - V_t)^2}$

where
  I is saturation current
  k is a technology parameter
  W/L is width to length ratio of transistor
  Vt is threshold voltage

[Figure: normalized gate delay T (0.0 to 4.0) versus supply voltage for 1.2μ CMOS, with Vt, V3, V2 = 2.9 V and Vref = 5 V marked on the voltage axis for N = 3, N = 2 and N = 1]

Voltage reduction slows down as we get closer to Vt.
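To see how the delay formula fixes the reduced supply, one can solve for the voltage at which a gate is N times slower than at Vref. The sketch below is an illustration under stated assumptions: it uses only the proportionality T ~ V / (V - Vt)^2 from the formula above (the constants CL, k and W/L cancel), and the function names and the bisection method are choices made here, not taken from the lecture. The voltages marked in the slide's figure come from the plotted curve, so this simplified model need not reproduce them exactly.

```python
def gate_delay(v, v_t):
    """Gate delay up to a constant factor: T ~ V / (V - Vt)**2, for V > Vt."""
    return v / (v - v_t) ** 2


def supply_for_slowdown(n, v_ref, v_t, tol=1e-9):
    """Find V in (Vt, Vref] whose delay is n times the delay at Vref.

    The delay decreases monotonically with V above Vt, so bisection works.
    """
    target = n * gate_delay(v_ref, v_t)
    lo, hi = v_t + 1e-9, v_ref      # delay(lo) is huge, delay(hi) <= target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gate_delay(mid, v_t) > target:
            lo = mid                # too slow at mid: need a higher voltage
        else:
            hi = mid                # fast enough: try a lower voltage
    return hi


for n in (1, 2, 3):
    print(n, round(supply_for_slowdown(n, v_ref=5.0, v_t=0.8), 3))
```

The threshold value 0.8 V in the usage loop is one of the values plotted on the following slide and is used here only as a sample input.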

Page 11: Increasing Multiprocessing

[Figure: PN/P1 (0.0 to 1.0) versus N (1 to 12) for 1.2μ CMOS, Vref = 5V, with curves for Vt = 0V (extreme case), Vt = 0.4V and Vt = 0.8V]
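Curves of this kind can be approximated by combining the power ratio PN/P1 = [1 + δ(N – 1)] VN²/Vref² with the gate-delay model T ~ V / (V – Vt)². The sketch below is an assumption-laden illustration rather than the data behind the figure: the overhead fraction `delta` is not given on the slide (0.1 is a placeholder), the function names are hypothetical, and VN is obtained from the larger root of the quadratic that results from setting the delay at VN equal to N times the delay at Vref.

```python
import math


def scaled_supply(n, v_ref, v_t):
    """Larger root of N * Vref/(Vref - Vt)^2 = V/(V - Vt)^2, solved for V."""
    k = n * v_ref / (v_ref - v_t) ** 2
    return ((2 * k * v_t + 1) + math.sqrt(4 * k * v_t + 1)) / (2 * k)


def power_ratio_curve(v_ref, v_t, delta, n_max):
    """Return a list of (N, P_N / P_1) for N = 1 .. n_max."""
    curve = []
    for n in range(1, n_max + 1):
        v_n = scaled_supply(n, v_ref, v_t)
        curve.append((n, (1 + delta * (n - 1)) * (v_n / v_ref) ** 2))
    return curve


def best_n(curve):
    """N with the smallest power ratio among the sampled points."""
    return min(curve, key=lambda point: point[1])[0]


for v_t in (0.0, 0.4, 0.8):
    curve = power_ratio_curve(v_ref=5.0, v_t=v_t, delta=0.1, n_max=12)
    print(f"Vt = {v_t} V: lowest sampled ratio at N = {best_n(curve)}")
```

Changing `delta` shifts where the lowest ratio falls, which is why the overhead estimate matters when choosing N.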

Page 12: Extreme Cases: Vt = 0

Delay, $T \propto 1/V_{ref}$

For N processing elements, delay = NT → $V_N = V_{ref}/N$

$\frac{P_N}{P_1} = [1 + \delta (N - 1)] \frac{1}{N^2} \to \frac{1}{N}$

For negligible overhead, δ→0:

$\frac{P_N}{P_1} \approx \frac{1}{N^2}$

For Vt > 0, power reduction is less and there will be an optimum value of N.
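The steps behind this extreme case can be written out as a short derivation. It uses only the delay model and the power ratio from the earlier slides. Reading the 1/N limit as the case δ = 1 is an interpretation made here, since the slide does not state which value of δ it assumes.

```latex
\begin{align*}
T \propto \frac{V}{(V - V_t)^2}
  \quad &\xrightarrow{\;V_t = 0\;} \quad T \propto \frac{1}{V} \\
T(V_N) = N\,T(V_{ref})
  \quad &\Longrightarrow \quad V_N = \frac{V_{ref}}{N} \\
\frac{P_N}{P_1} = [1 + \delta(N - 1)]\,\frac{V_N^2}{V_{ref}^2}
  &= \frac{1 + \delta(N - 1)}{N^2} \\
\delta = 0:\ \frac{P_N}{P_1} = \frac{1}{N^2},
  \qquad &\delta = 1:\ \frac{P_N}{P_1} = \frac{N}{N^2} = \frac{1}{N}
\end{align*}
```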

Page 13: Example: Multiplier Core

Specification:
  200 MHz clock
  15 W dissipation @ 5 V
  Low voltage operation, $V_{DD} \ge 1.5$ volts
  Relative clock rate = $\frac{(V_{DD} - 0.5)^2}{20.25}$

Problem:
  Integrate multiplier core on a SOC
  Power budget for multiplier ~ 5 W

Page 14: A Multicore Design

[Figure: the input feeds Multiplier Core 1, Multiplier Core 2, ..., Multiplier Core 5, each behind its own register clocked at 40 MHz. A 5-to-1 mux selects the output, which is registered at 200 MHz. A multiphase clock generator and mux control, driven by the 200 MHz CK, produces the 40 MHz clocks.]

Core clock frequency = 200/N MHz; N should divide 200.

Page 15: How Many Cores?

For N cores:
  Clock frequency = 200/N MHz
  Supply voltage, $V_{DDN} = 0.5 + (20.25/N)^{1/2}$ volts
  Assuming 10% overhead per core,
  Power dissipation = $15 [1 + 0.1 (N - 1)] \left( \frac{V_{DDN}}{5} \right)^2$ watts

Page 16: Design Tradeoffs

Number of cores N    Clock (MHz)    Core supply VDDN (Volts)    Total Power (Watts)
1                    200            5.00                        15.0
2                    100            3.68                        8.94
4                    50             2.75                        5.90
5                    40             2.51                        5.29
8                    25             2.10                        4.50
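The rows of the Design Tradeoffs table follow directly from the formulas on the How Many Cores? slide. The sketch below recomputes them; the constant and function names are hypothetical, and the numbers (200 MHz, 15 W at 5 V, 10% overhead per core, the 0.5 V offset and the 20.25 constant) are taken from the slides. Because it evaluates the formulas without intermediate rounding, its results can differ from the slide's entries in the last digit.

```python
import math

REF_CLOCK_MHZ = 200.0     # single-core clock from the specification
REF_POWER_W = 15.0        # single-core dissipation at 5 V
REF_SUPPLY_V = 5.0
OVERHEAD_PER_CORE = 0.1   # "10% overhead per core"


def core_supply(n):
    """Supply voltage that lets each core run at 1/n of the reference clock."""
    return 0.5 + math.sqrt(20.25 / n)


def total_power(n):
    """Total dissipation in watts for n cores, including overhead."""
    scale = 1 + OVERHEAD_PER_CORE * (n - 1)
    return REF_POWER_W * scale * (core_supply(n) / REF_SUPPLY_V) ** 2


def design_row(n):
    """Return (cores, clock in MHz, supply in volts, power in watts)."""
    return (n, REF_CLOCK_MHZ / n, core_supply(n), total_power(n))


for n in (1, 2, 4, 5, 8):
    cores, clock, supply, power = design_row(n)
    print(f"{cores:2d}  {clock:6.1f} MHz  {supply:5.2f} V  {power:5.2f} W")
```

The specification also requires VDD ≥ 1.5 V, which with this supply formula limits N to at most 20.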

Page 17: Power Reduction in Processors

Just about everything is used.
Hardware methods:
  Voltage reduction for dynamic power
  Dual-threshold devices for leakage reduction
  Clock gating, frequency reduction
  Sleep mode
Architecture:
  Instruction set
  Hardware organization
Software methods

Page 18: Parallel Architecture

[Figure: a single processor clocked at f between Input and Output, compared with two processors each clocked at f/2 whose results are combined into one output stream at f]

Single processor:
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = $CV^2f$

Two parallel processors:
  Capacitance = 2.2C
  Voltage = 0.6V
  Frequency = 0.5f
  Power = $0.396\,CV^2f$

Page 19: Pipeline Architecture

[Figure: a single processor with a register, clocked at f, compared with the same processor split into two ½ Proc. stages separated by registers, also clocked at f]

Single processor:
  Capacitance = C
  Voltage = V
  Frequency = f
  Power = $CV^2f$

Two-stage pipeline:
  Capacitance = 1.2C
  Voltage = 0.6V
  Frequency = f
  Power = $0.432\,CV^2f$
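Both the parallel and the pipeline figures come from scaling the three factors of CV²f. The short check below reproduces the two power numbers from the capacitance, voltage and frequency factors stated on the slides; the helper name `relative_power` is an assumption introduced for illustration.

```python
def relative_power(cap_factor, volt_factor, freq_factor):
    """Power relative to C*V^2*f when C, V and f are each scaled by a factor."""
    return cap_factor * volt_factor ** 2 * freq_factor


# Factors as given on the Parallel Architecture and Pipeline Architecture slides.
parallel = relative_power(cap_factor=2.2, volt_factor=0.6, freq_factor=0.5)
pipeline = relative_power(cap_factor=1.2, volt_factor=0.6, freq_factor=1.0)

print(f"parallel: {parallel:.3f} CV^2f")   # 0.396
print(f"pipeline: {pipeline:.3f} CV^2f")   # 0.432
```

In both cases the saving comes from the squared voltage term; the parallel version pays for it with more switched capacitance, offset by the halved clock.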

Page 20: Approximate Trend

               n-parallel proc.    n-stage pipeline proc.
Capacitance    nC                  C
Voltage        V/n                 V/n
Frequency      f/n                 f
Power          CV²f/n²             CV²f/n²
Chip area      n times             10-20% increase

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers, 1998.
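The two power entries in the trend table follow from substituting the scaled capacitance, voltage and frequency into CV²f. This is a short check of the arithmetic under the table's idealized assumptions, which leave out the overhead capacitance that appears as the 2.2C and 1.2C factors in the two-processor examples.

```latex
\begin{align*}
P_{\text{parallel}} &= (nC)\left(\frac{V}{n}\right)^{2}\frac{f}{n}
  = \frac{C V^{2} f}{n^{2}} \\
P_{\text{pipeline}} &= C\left(\frac{V}{n}\right)^{2} f
  = \frac{C V^{2} f}{n^{2}}
\end{align*}
```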

Page 21: Multicore Processors

[Figure: performance based on SPECint2000 and SPECfp2000 benchmarks versus year (2000, 2004, 2008), with one curve for multicore and one for single core]

Computer, May 2005, p. 12

Page 22: Multicore Processors

D. Geer, “Chip Makers Turn to Multicore Processors,” Computer, vol. 38, no. 5, pp. 11-13, May 2005.

A. Jerraya, H. Tenhunen and W. Wolf, “Multiprocessor Systems-on-Chips,” Computer, vol. 38, no. 7, pp. 36-40, July 2005; this special issue contains three more articles on multicore processors.

S. K. Moore, “Winner Multimedia Monster – Cell’s Nine Processors Make It a Supercomputer on a Chip,” IEEE Spectrum, vol. 43, no. 1, pp. 20-23, January 2006.

Page 23: Cell - Cell Broadband Engine Architecture

[Photo, L to R: Atsushi Kameyama, Toshiba; James Kahle, IBM; Masakazu Suzoki, Sony. © IEEE Spectrum, January 2006]

Nine-processor chip: 192 Gflops

Page 24: Cell’s Nine-Processor Chip

[Figure: Cell chip. © IEEE Spectrum, January 2006]

Eight identical processors
f = 5.6 GHz (max)
44.8 Gflops