
Page 1: 02-ThermalModelManag3DMPSoCs-Atienza [Modo de …ecofac2010.irisa.fr/cours-atienza-3.pdf · Prof. David Atienza Alonso ECOFAC 2010, Lannion, 29/03 – 2/04 2010 Embedded Systems Laboratory

12/03/2010


Thermal Modeling and Management for 3D MPSoCs with Active Cooling


© ESL/EPFL 2010

Prof. David Atienza Alonso

ECOFAC 2010, Lannion, 29/03 – 2/04 2010

Embedded Systems Laboratory (ESL) Institute of EE, Faculty of Engineering

Advantages of 3D vs. 2D Chips

Promises
• Reduce average length of on-chip global wires
• Increase number of devices reachable in given time budget
• Greatly facilitate heterogeneous integration (e.g. logic-DRAM stacks)

Samsung Wafer Stack Package (WSP) memory

[Figures: Ray Yarema, Fermilab]

Thermal-Reliability Issues in 3D Chips

• Latest chips increase power density
• Non-uniform hot-spots in 2D chips [Sun, 1.8 GHz Sparc v9 Microproc]
• In 3D chips, heat affects several layers! (even more “cool” components)
• Higher chances of thermal wear-outs and very short lifetimes!

[Figures: Sun, Niagara Broadband Processor. Courtesy: IBM and Irvine Sens.]


Run-Time Heat Spreading in 3D Chips

• 5-tier 3D stack: 10 heat sources and sensors
• Inject between 4 W and 1.5 W
• Large and non-uniform heat propagation! (up to 130 ºC on top tier)

[Figure: temperature maps of the 2nd, 3rd, 4th and 5th tiers (Layer 2 to Layer 5). Axes: length (um) vs. width (um), 0 to 10000 um. Colour scale values range from about 390 to 420.]

NanoTera CMOSAIC Project: Design of 3D MPSoCs with Advanced Cooling

3D stacked MPSoC chips: microchannels etched on back side to circulate liquid coolant

3D systems require novel electro-thermal co-design
• Academic partners: EPFL and ETHZ
• Industrial: IBM Zürich

System-Level Active Cooling Manager (3D heat flow prediction)
• Adjustment of coolant flux
• Task scheduling and execution control

Outline

• Introduction
• 3D chip thermal modeling framework
• Validation of 3D thermal model
• Liquid cooling modeling
• Liquid cooling model validation
• Closed-loop 3D MPSoCs thermal management with active cooling
• Experiments and conclusions


Compact RC-Based Tier Thermal Model

[Figure: gate-level thermal model as an RC network of Si/metal layer cells. Each cell has a heat flux q and boundary fluxes qb_left, qb_right, qb_top, qb_bottom, qb_front and qb_back; the flux (qb_i)_I crosses face i of cell I towards its neighbours I-1 and I+1. The figure includes a per-cell heat balance equation over faces j = 1..6.]

qb_top = h_top A (Ta - Ttop)

qb_bottom = h_bottom A (Ta - Tbottom)

• 2D tier modeled as heat flux moving between adjacent cells
• Convective boundary conditions between layers in tier

[Atienza et al., TODAES 2007]
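A minimal sketch of how such a tier could be stepped in time, assuming a uniform grid, adiabatic side walls and an explicit (forward Euler) update. The function name `step_tier` and all parameter values are illustrative assumptions, not taken from the slides; only the convective term follows the form qb_top = h_top A (Ta - Ttop) given above.

```python
import numpy as np

def step_tier(T, power, dt, C, g_lateral, h_top, area, T_amb):
    """One explicit time step of a 2D RC thermal grid (illustrative only).

    T, power  : 2D arrays of cell temperatures (K) and dissipated power (W)
    C         : thermal capacitance of one cell (J/K)
    g_lateral : thermal conductance between adjacent cells (W/K)
    h_top*area: convective conductance from each cell to ambient (W/K)
    """
    # Edge padding makes boundary cells see a neighbour at their own
    # temperature, i.e. zero lateral flux through the side walls.
    Tp = np.pad(T, 1, mode="edge")
    lateral = g_lateral * (
        Tp[:-2, 1:-1] + Tp[2:, 1:-1] + Tp[1:-1, :-2] + Tp[1:-1, 2:] - 4.0 * T
    )
    # Convective boundary flux: qb_top = h_top * A * (Ta - Ttop).
    qb_top = h_top * area * (T_amb - T)
    return T + dt / C * (power + lateral + qb_top)

# Usage: a 4x4 tier at ambient temperature with one 2 W heat source.
T0 = np.full((4, 4), 300.0)
P = np.zeros((4, 4))
P[1, 2] = 2.0
T1 = step_tier(T0, P, dt=1e-4, C=1e-3, g_lateral=0.5,
               h_top=0.0, area=1e-6, T_amb=300.0)
```

The explicit update is only stable for small time steps (here dt·4·g_lateral/C = 0.2), which is one reason a solver of this kind may run many iterations per thermal interval.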

Complete 3D Chip Thermal Modeling

Multi-level execution for thermal convergence in 3D
• Local (2D-tier), liquid channels and global (3D) propagation

Tier-level convergence (N iterations):
• Evaluate local temperature for each cell
• Update with neighbour temperature diffusion (feedback temperature)
• Go to next tier or microchannel

[Ayala et al., NanoNet 2009]
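The multi-level execution can be sketched as a nested loop: N local iterations per tier, followed by one global exchange between adjacent tiers. This is a simplified illustration under stated assumptions: the coefficients `k_lateral` and `k_vertical` are dimensionless and hypothetical, power is expressed directly as a temperature increment per iteration, and microchannels are left out.

```python
import numpy as np

def simulate_stack(T, power, n_iter, k_lateral, k_vertical):
    """T, power: arrays of shape (tiers, rows, cols). Returns the updated T."""
    T = T.copy()
    for tier in range(T.shape[0]):
        for _ in range(n_iter):  # tier-level convergence, N iterations
            # Evaluate each cell from its four in-plane neighbours
            # (edge padding gives zero flux through the side walls).
            Tp = np.pad(T[tier], 1, mode="edge")
            neighbours = (Tp[:-2, 1:-1] + Tp[2:, 1:-1]
                          + Tp[1:-1, :-2] + Tp[1:-1, 2:])
            T[tier] += k_lateral * (neighbours - 4.0 * T[tier]) + power[tier]
    # Global (3D) propagation: exchange heat between adjacent tiers.
    vertical = np.zeros_like(T)
    vertical[:-1] += T[1:] - T[:-1]   # flux received from the tier above
    vertical[1:] += T[:-1] - T[1:]    # flux received from the tier below
    return T + k_vertical * vertical

# Usage: 3 tiers of 4x4 cells, one heat source on the first tier.
T0 = np.full((3, 4, 4), 300.0)
P = np.zeros((3, 4, 4))
P[0, 1, 1] = 1.0
T1 = simulate_stack(T0, P, n_iter=10, k_lateral=0.2, k_vertical=0.1)
```

Both the lateral and the vertical exchanges are written as symmetric differences, so they redistribute heat without creating or destroying it; only the power term changes the total.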

3D Chip Thermal Library Validation

Extensible set of layers in 3D stack
• Up to 9 tiers and heat spreader
• Pre-defined layers: silicon, copper (10 layers), glue, overmold, interposer, bump

Configurable number of cells and iterations per tier
• Initially 10 ms thermal interval (1000 iterations/tier)

Multi-tier test chip manufactured at EPFL:
• Three types of tiers


3D Thermal Library Validation: Creating Various 3D Thermal Maps

Flexibility for thermal characterization
• 10 heat sources and sensors per layer, which can be activated simultaneously

3D Thermal Library Validation: Correlation with 5-Tier 3D Stack

[Figure: 3D chip, EPFL, layer 3 characterization. Sensor voltage (mV) vs. heater current (mA) applied to Dev 7. Blue curve: 3D current-heat model for D8. Pink curve: heater current measured in D8.]

[Figure: 3D chip, EPFL, multi-tier characterization. Sensor voltage (mV) vs. heater current (mA) applied to Dev 7. Blue/pink curves: D7 (tier 1) and D8 (tier 4). Red curve: 3D current-heat model for D8.]

Variations of less than 1.5% between 3D stack measurements and new 3D thermal model

Modeling Through Silicon Vias (TSVs) in 3D Stacks

TSVs:
• Size: 5-10 um x 10-100 um
• TSVs change resistivity of interlayer material (IM)

Modeling granularities (from 1 to 3: higher accuracy, higher complexity):
1. Homogeneous distribution, one R value for the IM
2. Different R value per unit (core, cache, etc.)
3. Exact locations of TSVs

[Figure: LSM-EPFL]

Source: IBM Zürich and Y.Heights
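To illustrate the first two granularities, the sketch below computes an interlayer resistance from a TSV area fraction, assuming the TSVs and the interlayer material conduct heat in parallel. The parallel-conduction model, the function name and the conductivity values (0.25 W/(m·K) for the IM, 400 W/(m·K) for copper TSVs) are assumptions for illustration, not figures from the slides.

```python
def interlayer_resistance(thickness_m, area_m2, tsv_fraction,
                          k_im=0.25, k_tsv=400.0):
    """Vertical thermal resistance (K/W) of an interlayer region.

    tsv_fraction is the share of the area occupied by TSVs; the effective
    conductivity is an area-weighted (parallel) mix of both materials.
    """
    k_eff = tsv_fraction * k_tsv + (1.0 - tsv_fraction) * k_im
    return thickness_m / (k_eff * area_m2)

# Granularity 2: one R value per unit, from hypothetical TSV fractions.
tsv_fraction_per_unit = {"core": 0.02, "cache": 0.005}
r_per_unit = {
    name: interlayer_resistance(20e-6, 1e-6, frac)
    for name, frac in tsv_fraction_per_unit.items()
}

# Granularity 1: one R value for the whole IM, using the average fraction.
avg_fraction = sum(tsv_fraction_per_unit.values()) / len(tsv_fraction_per_unit)
r_homogeneous = interlayer_resistance(20e-6, 2e-6, avg_fraction)
```

Under these assumptions even a small TSV fraction dominates the effective conductivity, which is why the placement of TSV groups can matter for the resulting thermal map.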

TSV Modeling Accuracy in 3D Stacks


Chosen to model TSV groups in localized positions of 3D MPSoCs

Liquid Flux Model for Laminar Flow

Local junction temperature modeled as RC network:

Rtot = Rcond + Rconv + Rheat

Rtot = 1/(Gsi/t + 1/Rb) + A/(bAt) + A/(VPcp)

[Figure labels: thermal resistance of Si, thermal resistance of wiring, Si base thickness, heating area, total area, flow rate and density, heat source, chip back-side temperature, T1, q1, q2, P1, P2]

Dependence of thermal resistance on liquid flux modeled as a quadratic form
• Variable value of coolant flux (Φ)

ΔRheat ≈ aΦ + bΦ² ; b << a

[Atienza et al., THERMINIC ’09 and DATE ‘10]
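The following sketch shows how the three resistance terms and the quadratic flux dependence could be evaluated numerically. It reads the last term of Rtot as heating area divided by (volumetric flow rate × density × specific heat), which is an interpretation of the slide's symbols; the water properties and the coefficients a and b are likewise assumed values.

```python
ML_PER_MIN = 1e-6 / 60.0  # conversion factor from ml/min to m^3/s

def r_heat(area_m2, flow_m3_s, density=998.0, cp=4182.0):
    """Coolant heating resistance (K*m^2/W): A / (V * rho * cp).

    density (kg/m^3) and cp (J/(kg*K)) default to approximate values
    for water; both are assumptions, not values from the slides.
    """
    return area_m2 / (flow_m3_s * density * cp)

def delta_r_heat(phi, a, b):
    """Quadratic model of the resistance change with coolant flux phi."""
    return a * phi + b * phi ** 2

def r_total(r_cond, r_conv, r_heat_value):
    """Rtot = Rcond + Rconv + Rheat (series combination)."""
    return r_cond + r_conv + r_heat_value

# Usage: 1 cm^2 heated area at a flow rate of 15 ml/min.
r_h = r_heat(1e-4, 15.0 * ML_PER_MIN)
```

With b << a the model is close to linear over a narrow flux range, so a controller can estimate the effect of a small flow-rate change from the linear term alone.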


3D Thermal Model with Liquid Cooling

New set of layers in 3D stack
• 3D stack (up to 9 tiers)
• 1 microchannel and coolant flow per tier

5-tier stack with microchannels and manifold cooling seal manufactured at IBM/EPFL
• Enables different multi-tier liquid flux injection

[Figure: PCB, micro-heater, liquid micro-channels. Courtesy: IBM Zürich Labs]

Manufacturing of 5-Tier 3D Test Chip with Liquid Channels in Multiple Tiers

[Figure: front side and back side of the test chip; adding multi-tier liquid cooling in-/out-lets. Courtesy: IBM Zürich Labs]

Multi-tier active cooling technology feasible for 3D-stacked chips

Correlation Results: Liquid Cooling and 3D Heat Transfer

Temperature evolution at the junction (Tj)
• Tested range: 0.015 to 0.15 L/min
• Similar accuracy results at different channels
• Avg max temp error = 0.6%

[Figure labels: T1, P1, P2, q1, q2]

Variations of less than 1% between measurements and RC-based 3D thermal model with liquid cooling


Complete 3D Chip Thermal Modeling Flow with Liquid Cooling

Scheduler (Reactive, Proactive)
Inputs:
• Workload information
• Floorplan, TSV areas, package
• Temperature (for dynamic policies)

Power Manager (DPM)
Inputs:
• Workload information
• Activity of cores

3D Thermal Simulator w. Liquid Cooling based on EPFL-IBM 3D chips (integrated within internal HotSpot tool version)
Inputs:
• Power trace for each unit
• Floorplan, package and die properties (Niagara-1), TSV area percentage/distribution
• Flow rate

Output: transient temperature response for each unit

Run-Time HW/SW Thermal Modeling Framework for 3D Chips

Exploitation of both hardware and software benefits

[Figure, 2D version: MPSoC behavior emulation on FPGA (CPUs, SRAMs and I/O, each with a sniffer) running a multi-processor OS + DVFS + task migration and Sw app 1 ... Sw app N, giving zero-delay MPSoC architecture simulation. A standard Ethernet connection and a dedicated HW monitor link the FPGA to a host PC, where a software thermal model performs detailed thermal analysis of the 2D MPSoC layout. Exchanged data: energy of 2D components, temperature (T) of 2D components.] [D. Atienza et al., TODAES 2007]

[Figure, 3D version: same FPGA emulation and connection; the host PC runs a 3D stack thermal model (1st tier to Nth tier). Exchanged data: energy of 3D components, temperature of 3D components.] [D. Atienza, THERMINIC 2009]

Thermal Management for 3D-MPSoCs with Liquid Cooling

Active-Adapt3D: Combined policy manager (Best-Paper Award at IEEE/IFIP VLSI-SoC 2009)
• Predictive, floorplan-based task assignment and DVFS
• Closed-loop variable liquid cooling control: T ≥ 80°C → increment flow rate; T < 80°C → decrement

Policy can be applied reactively or proactively
• Reactive: thermal sensors → temperature measurements → flow rate tuner
• Proactive: thermal sensors → temperature measurements → ARMA-based predictor → temperature forecast → flow rate tuner

[Figure: control loop around the system temperature dynamics]
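The threshold rule can be sketched as a small controller over discrete flow-rate settings. The four setting values below are hypothetical, and the forecast uses plain linear extrapolation as a stand-in for the ARMA-based predictor, which would be fitted to the temperature history in practice.

```python
FLOW_SETTINGS_ML_MIN = (10.0, 15.0, 20.0, 25.0)  # hypothetical values
THRESHOLD_C = 80.0

def tune_flow(level, temperature_c):
    """Return the next flow-setting index for a given temperature."""
    if temperature_c >= THRESHOLD_C:
        # Too hot: step up, saturating at the highest setting.
        return min(level + 1, len(FLOW_SETTINGS_ML_MIN) - 1)
    # Cool enough: step down, saturating at the lowest setting.
    return max(level - 1, 0)

def forecast_next(history):
    """Naive one-step forecast by linear extrapolation (not ARMA)."""
    if len(history) < 2:
        return history[-1]
    return history[-1] + (history[-1] - history[-2])

def reactive_step(level, history):
    # React to the latest sensor measurement.
    return tune_flow(level, history[-1])

def proactive_step(level, history):
    # React to the predicted temperature instead.
    return tune_flow(level, forecast_next(history))

# Usage: temperature rising towards the threshold.
history = [76.0, 79.0]
next_reactive = reactive_step(1, history)    # 79 < 80: steps down to 0
next_proactive = proactive_step(1, history)  # forecast 82 >= 80: steps up to 2
```

In this example the proactive variant raises the flow rate one interval earlier than the reactive one, which is the intended benefit of acting on a forecast.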


Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs

Cores on layers closer to the heat sink can be cooled faster in comparison to cores further away

Adapt-3D assigns a thermal index (α) to each core in order to distinguish the location of the cores
• Higher α_i → core i more prone to hot spots

[Figure: Chip-1 and Chip-2 stacked, with cores at locations 1, 2 and 3 and the ordering of their thermal indices]

[Coskun and Atienza, DATE ‘09]

Adaptive Thermal-Aware Task Assignment Policy for 3D MPSoCs

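Probability of receiving workload at time t (for each core):

P_t = P_(t-1) + W

Weight:
• Cool core, if T_avg < T_preferred: W = β_inc · (T_preferred - T_avg) · 1/α_init
• Hot core, if T_avg ≥ T_preferred: W = β_dec · (T_preferred - T_avg) · α_init

where T_avg is measured by sensors, T_preferred is a target temperature (e.g., 80 ºC), α_init is the thermal index of the core, and β_inc and β_dec are empirical constants.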
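A minimal sketch of the probability update, assuming the weight takes the form W = β_inc · (T_preferred - T_avg) / α for a cool core and W = β_dec · (T_preferred - T_avg) · α for a hot core, with α the thermal index of the core. The constants and the clamping of the result to [0, 1] are assumptions made for illustration.

```python
def update_probability(p_prev, t_avg, t_preferred, alpha, beta_inc, beta_dec):
    """Return the new probability of a core receiving workload."""
    diff = t_preferred - t_avg
    if t_avg < t_preferred:
        # Cool core: probability grows, more slowly for hot-spot-prone cores.
        weight = beta_inc * diff / alpha
    else:
        # Hot core: diff <= 0, so probability shrinks, faster for larger alpha.
        weight = beta_dec * diff * alpha
    # Clamp so the result remains a valid probability (an assumption).
    return min(1.0, max(0.0, p_prev + weight))

# Usage: same starting probability, cool core vs. hot core.
p_cool = update_probability(0.5, t_avg=70.0, t_preferred=80.0,
                            alpha=1.0, beta_inc=0.01, beta_dec=0.01)
p_hot = update_probability(0.5, t_avg=90.0, t_preferred=80.0,
                           alpha=2.0, beta_inc=0.01, beta_dec=0.01)
```

A scheduler would then draw the next core at random according to these probabilities, so cores that run cool and sit close to the heat sink receive work more often.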

Experiments 3D Thermal Management: 3D MPSoCs with Microchannels

Target 3D systems based on 3D version of Sun UltraSPARC T1
• Power values and workloads from real traces measured in Sun platforms (multimedia players, web servers, databases, etc.)
• Cores and caches in separate layers

Channels:
• Width 400 um, depth 250 um
• Four flow rate settings, default at 15 ml/min

[Figure: stack configurations (EXP1-2), (EXP3), (EXP4)]

Thermal Management for 3D Chips: Active-Adapt3D Comparisons

Predictive task scheduling, active cooling and floorplan-aware DVFS together achieve less than 5% hotspots

Promising figures for thermal control in 3D-MPSoCs


Thermal Management in 3D Chips: Active-Adapt3D Comparisons

Variable multi-tier flow control useful for 3D systems with 3+ layers.

Proactive thermal management achieves:
• 75% reduction in spatial gradients on average, for fixed flow rate
• 97% reduction in spatial gradients on average, for variable flow rate

*LC: Multi-tier variable liquid cooling

Cooling power savings of up to 67% compared to worst-case flux

Conclusions

Complexity of coming 3D MPSoC chips requires novel thermal modeling approaches
• Application of simple RC-based methods demonstrated, validated with 3D test chip
• Initial model of liquid cooling channels in 3D chips: simple RC laminar flow model, works well with variable liquid fluxes (errors of less than 2%)
• Integrated the compact model into custom HotSpot tool

New thermal management: feedback controller adjusts flow rate to allowed temperature with job assignment and DVFS
• Proactive control improves the hot spot reduction to 95% for systems with variable flow rates, and reduces thermal variations
• Dynamic flow rate adjustment is helpful in reducing the energy cost of the pump and overall system (67% power savings)

Key References and Bibliography

Thermal modeling and FPGA-based emulation
• “Emulation-Based Transient Thermal Modeling of 2D/3D Systems-on-Chip with Active Cooling”, David Atienza, Proc. of the 15th International Workshop on Thermal Investigations of ICs and Systems (THERMINIC ‘09), IEEE Press, Belgium, October 2009.

Thermal management for 3D MPSoCs
• “Energy-Efficient Variable-Flow Liquid Cooling in 3D Stacked Architectures”, Ayse K. Coskun, David Atienza, et al., Proc. of Design, Automation and Test in Europe Conference (DATE ‘10), Dresden, Germany, March 2010.
• “Modeling and Dynamic Management of 3D Multicore Systems with Liquid Cooling”, Ayse K. Coskun, et al., Proc. of 17th Annual IFIP/IEEE International Conference on Very Large Scale Integration (VLSI-SoC ‘09), Brazil, October 2009. (Best Paper Award)
• “Through Silicon Via-Based Grid for Thermal Control in 3D Chips”, Jose L. Ayala, et al., Proc. of the 4th International ICST Conference on Nano-Networks (Nano-Net ‘09), Switzerland, October 2009.
• “Dynamic Thermal Management in 3D Multicore Architectures”, Ayse K. Coskun, et al., Proc. of Design, Automation and Test in Europe (DATE ‘09), France, April 2009.

Nano-Tera.ch Swiss Engineering Programme


QUESTIONS ?

European Commission

Swiss National Science Foundation