![Page 1: Resonant Clock Design for a Power-efficient, High-volume ...ewh.ieee.org/r5/denver/sscs/Presentations/2012_05_Sathe.pdf · Resonant Clock Design for a Power-efficient, High-volume](https://reader034.vdocuments.mx/reader034/viewer/2022051802/5afe21237f8b9a994d8e8195/html5/thumbnails/1.jpg)
Resonant Clock Design for a Power-efficient, High-volume x86-64 Microprocessor
Visvesh Sathe¹, Srikanth Arekapudi¹, Alexander Ishii², Charles Ouyang¹, Marios Papaefthymiou²,³, Samuel Naffziger¹
¹Advanced Micro Devices, Inc., ²Cyclos Semiconductor, Inc.
³The University of Michigan, Ann Arbor
Outline
• Piledriver (PD) overview
• Resonant clocking
• Clock driver design
• Inductor design
• Other resonant clock components
• Putting it all together
• Measurement results
• Conclusion
Paper 3.7
Piledriver
• Extension of Bulldozer architecture: 2-core module
– Shared I-fetch, decode, branch predict, FP, L2, cache unit
– Per-core integer schedule/execute, load/store
• 32nm CMOS
– HKMG, SOI
– 11 metal layers
• 33.3 mm² w/ L2
• 216M transistors
• 0.8–1.3 V
[Die photo: I-Fetch, BP/Decode, FPU, Cache Unit, Core 0, Core 1, 2MB L2 Cache]
The Piledriver Global Clock
• Significant global clock loading
– 7ps clock grid skew target across 21 mm² core area
– Constrained clock latency from grid to timing elements
• 24% of average application power in global clock
Basic Resonant Clocking Operation
• Rely on efficient LC resonance between spiral inductors and grid capacitance near the resonant frequency
• Efficient operation around the natural frequency
• Driving the clock at much lower frequencies:
– Reduced efficiency
– Warped clock waveform
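As a rough illustration of the resonance the slide relies on, the sketch below evaluates the ideal, lossless LC natural frequency f0 = 1/(2π√(LC)) and solves the inverse problem: how much capacitance a given inductor resonates with at a target clock frequency. The 1 nH inductor (inside the 0.5–1.3 nH palette quoted later) and the 3.5 GHz target are illustrative assumptions, not Piledriver design values.

```python
import math

def resonant_frequency(l_henry: float, c_farad: float) -> float:
    """Natural frequency f0 = 1 / (2*pi*sqrt(L*C)) of an ideal, lossless LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def capacitance_for(f_hz: float, l_henry: float) -> float:
    """Capacitance that resonates with inductance L at frequency f (inverse of the above)."""
    omega = 2.0 * math.pi * f_hz
    return 1.0 / (omega * omega * l_henry)

if __name__ == "__main__":
    # Illustrative values only: a 1 nH inductor targeting a 3.5 GHz clock.
    c = capacitance_for(3.5e9, 1e-9)
    print(f"C served per inductor: {c * 1e12:.2f} pF")
    print(f"round-trip check: f0 = {resonant_frequency(1e-9, c) / 1e9:.2f} GHz")
```

Operating far below f0 means the driver, rather than the tank, supplies most of the charge, which is why efficiency drops and the waveform warps at low frequency.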
Resonant Clocking: Mutual Inductance
• Significant mutual inductance interaction from
– Signal nets under and around inductor windings
– Power and ground nets serving circuits under inductor
• Mutual inductance reduces L and Q, increasing clock power
• Maintaining keep-out regions is prohibitive (~5% area penalty)
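One textbook way to see why nearby conductors lower L and Q is the reflected-impedance model of two coupled loops: a victim loop with resistance R2 and self-inductance L2, coupled through mutual inductance M, adds (ωM)²/(R2 + jωL2) to the inductor's impedance. Its real part adds loss and its imaginary part is negative, reducing inductance. This two-loop approximation and all element values below are hypothetical; it is not the extraction flow used for Piledriver.

```python
import math

def effective_lq(f_hz, l1, r1, m, l2, r2):
    """Effective inductance and Q of inductor 1 coupled to a lossy loop 2.

    Uses the reflected impedance Z = R1 + j*w*L1 + (w*M)^2 / (R2 + j*w*L2).
    """
    w = 2.0 * math.pi * f_hz
    z = complex(r1, w * l1) + (w * m) ** 2 / complex(r2, w * l2)
    l_eff = z.imag / w          # reflected term has negative imaginary part -> L drops
    q = z.imag / z.real         # reflected term has positive real part -> R rises, Q drops
    return l_eff, q

if __name__ == "__main__":
    # Hypothetical values: 1 nH / 1 ohm inductor, 0.5 nH / 2 ohm victim loop, 3.5 GHz.
    for m in (0.0, 0.1e-9, 0.2e-9):
        l_eff, q = effective_lq(3.5e9, 1e-9, 1.0, m, 0.5e-9, 2.0)
        print(f"M = {m * 1e9:.1f} nH -> L_eff = {l_eff * 1e9:.3f} nH, Q = {q:.1f}")
```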
Resonant Clocking on PileDriver
[Figure: resonant clock circuit with clock driver (inputs preclk, drvEn, plsEn), mode switch (MSw), on-chip spiral inductor with throttle switch (TSw), clock tank capacitor (CT/2 + CT/2), and clock grid capacitance]
• Dual-mode clock system with mode switch (MSw)
– Resonant clocking (rclk) when MSw is closed
– Conventional clocking (cclk) when MSw is open
• CT ≈ 6×Cclk to serve as an effective AC ground
• Throttle switch (TSw) to address transient voltage spikes
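Why CT ≈ 6×Cclk acts as an AC ground can be checked with an idealized series-loop picture, assuming the grid capacitance, the inductor, and CT form one lossless series loop. The two capacitors carry the same charge, so the AC voltage divides inversely with capacitance: CT sees only Cclk/(Cclk + CT) of the combined capacitor swing, and the loop's effective capacitance is their series combination. ESR, driver resistance, and the split CT/2 connection are ignored here.

```python
def series_capacitance(c_a: float, c_b: float) -> float:
    """Series combination of two capacitors."""
    return c_a * c_b / (c_a + c_b)

def tank_swing_fraction(ratio: float) -> float:
    """Share of the combined AC capacitor swing that appears across CT when
    CT = ratio * Cclk (equal series charge -> voltage divides as 1/C)."""
    return 1.0 / (1.0 + ratio)

if __name__ == "__main__":
    ratio = 6.0   # CT ~ 6x Cclk, from the slide
    c_clk = 1.0   # normalized grid capacitance
    print(f"AC swing on CT: {tank_swing_fraction(ratio):.1%}")
    print(f"C_eff / Cclk:  {series_capacitance(c_clk, ratio * c_clk) / c_clk:.3f}")
```

Under this model a larger ratio pushes the tank closer to an ideal AC ground, at the cost of area for the capacitor.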
Piledriver Resonant Clocking
• 92 distributed inductors
• MSw, TankCap, TSw, inductors contained in HCK Tree Macros
• Conventional L2CLK, NBCLK
Basic Rclk Operation (1/6)
• Rclk operation can be partitioned into 6 phases
• Voltage across Ctank ≈ Vdd/2
• Nmos conducting, Pmos off
• R-L current buildup through the Nmos
• IR drop across Nmos and grid resistance → clock voltage rises above ground
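To first order, phase 1 is the step response of a series R-L loop: with about Vdd/2 across the path formed by the tank, the inductor, the grid resistance, and the conducting Nmos, the inductor current grows as i(t) = (V/R)(1 − e^(−tR/L)), and the IR drop i·R lifts the clock node off ground. Treating the Nmos as a fixed resistor, and the 1.2 V supply and 2 Ω path resistance, are simplifying assumptions for illustration.

```python
import math

def rl_current(t_s, v_loop, r_ohm, l_henry):
    """Series R-L step response: i(t) = (V/R) * (1 - exp(-t*R/L))."""
    return (v_loop / r_ohm) * (1.0 - math.exp(-t_s * r_ohm / l_henry))

if __name__ == "__main__":
    vdd = 1.2     # assumed supply, within the 0.8-1.3 V range
    r = 2.0       # hypothetical Nmos + grid resistance (ohms)
    l = 1e-9      # 1 nH, inside the 0.5-1.3 nH inductor palette
    tau = l / r   # R-L time constant
    for t in (0.25 * tau, 0.5 * tau, tau):
        i = rl_current(t, vdd / 2, r, l)
        print(f"t = {t * 1e12:5.0f} ps  i = {i * 1e3:6.1f} mA  IR drop = {i * r * 1e3:5.1f} mV")
```

Phase 4 is the mirror image through the Pmos, with the IR drop pulling the clock node below Vdd.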
Basic Rclk Operation (2/6)
• Both Nmos and Pmos are off
• LC oscillation with initial inductor current to charge clk
• Clk voltage transition a function of L, C and initial current
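The dependence on L, C, and initial current can be made concrete with the ideal lossless solution, assuming the tank node is a fixed AC ground at V_T ≈ Vdd/2. With C·dv/dt = i and L·di/dt = V_T − v, the clock node follows v(t) = V_T + (v0 − V_T)·cos ωt + I0·√(L/C)·sin ωt, where ω = 1/√(LC). Driver and grid resistance are ignored, so the swing is overestimated; the element values are hypothetical.

```python
import math

def lc_transition(t_s, l_henry, c_farad, v_tank, v0, i0):
    """Clock-node voltage during an undriven LC phase.

    Models C*dv/dt = i and L*di/dt = v_tank - v, with i the inductor current
    flowing into the clock node. Lossless: no driver or grid resistance.
    """
    w = 1.0 / math.sqrt(l_henry * c_farad)
    z0 = math.sqrt(l_henry / c_farad)  # characteristic impedance sqrt(L/C)
    return v_tank + (v0 - v_tank) * math.cos(w * t_s) + i0 * z0 * math.sin(w * t_s)

def peak_voltage(l_henry, c_farad, v_tank, v0, i0):
    """Maximum clock-node voltage reached by the same lossless oscillation."""
    z0 = math.sqrt(l_henry / c_farad)
    return v_tank + math.hypot(v0 - v_tank, i0 * z0)

if __name__ == "__main__":
    l, c, vdd = 1e-9, 2e-12, 1.2   # hypothetical inductor, grid cap, supply
    for i0 in (0.0, 0.01):         # no prebuilt current vs. 10 mA from phase 1
        print(f"I0 = {i0 * 1e3:4.1f} mA -> peak = {peak_voltage(l, c, vdd / 2, 0.0, i0):.3f} V")
```

With no initial current the ideal node swings from 0 to exactly 2·V_T = Vdd; the current built up in phase 1 adds I0·√(L/C) of headroom that, in a real circuit, offsets resistive loss. Phase 5 is the same expression starting from Vdd with the current reversed.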
Basic Rclk Operation (3/6)
• Pmos on, Nmos off
• Both LC and pull-up mechanisms in effect
Basic Rclk Operation (4/6)
• Pmos on, Nmos off
• R-L current buildup through the Pmos
• IR drop across Pmos and grid resistance → clock voltage droops below Vdd
Basic Rclk Operation (5/6)
• Both driver devices are off
• LC oscillation with initial inductor current to discharge clk
• Clk voltage transition a function of L, C and initial current
Basic Rclk Operation (6/6)
• Nmos on, Pmos off
• Both LC and pull-down mechanisms in effect
Cclk and Rclk Waveforms
• Reduced driver strength required for rclk
• Lower rclk slew → insertion delay increase (phase offset)
• Delayed onset of driver devices → rclk_pulse phase offset
[Figure: cclk, rclk_square, and rclk_pulse waveforms]
Clock Driver Design
• Clock driver palette with 24 drivers
– Up to 4 blocks; each block can contain up to 6 subBlocks
– Effective granularity and efficiency tradeoff
• Run-time programmable drive strength modulation support
– Each subBlock consists of 3 banks (2:3:2 ratio)
– drvEn[2:0] signals allow n/7 (n = 2, 3, 4, 5, 7) drive modulation
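The n/7 levels follow from the 2:3:2 bank ratio if, as assumed here, each drvEn bit independently enables one bank: the reachable strengths are the subset sums of the bank weights over a total of 7. The bit-to-bank mapping is an assumption; the slide gives only the ratio and the resulting levels.

```python
from itertools import combinations

# Bank weights within one subBlock, in units of 1/7 of full drive (2:3:2 ratio).
BANK_WEIGHTS = (2, 3, 2)

def drive_levels(weights=BANK_WEIGHTS):
    """Every nonzero drive strength (numerator over sum(weights)) reachable by
    enabling a subset of banks; assumes one enable bit per bank."""
    levels = set()
    for k in range(1, len(weights) + 1):
        for subset in combinations(weights, k):  # positions are distinct even if weights repeat
            levels.add(sum(subset))
    return sorted(levels)

if __name__ == "__main__":
    print([f"{n}/{sum(BANK_WEIGHTS)}" for n in drive_levels()])
```

A 2:3:2 split reaches five distinct levels with only three enables, whereas 1:1:1 would reach only three.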
Driver Design
• Split buffer design
– Skewed pre-drivers for reduced crossover current
– Allows insertion delay management of rclk w.r.t. cclk, NB and L2 clocks
Driver Design: Pulse Mode
• Subtractive pulse-generator scheme
– Delay chain used to delay asserting edges of nmos and pmos devices
– De-asserting transitions not delayed
– On-time is a function of input duty cycle and delay amount
• Benefits over traditional pulse generation
– Lower variation (smaller delay)
– Support for Off P-state operation
– Allows PLL duty cycle tuning
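A first-order model of the subtractive scheme, assuming the Pmos pulse is carved out of the clock's high phase, the Nmos pulse out of its low phase, and only the asserting edge is delayed by a fixed amount d: each device's on-time is its phase width minus d. The function name, picosecond units, and the 250 ps (4 GHz) example period are illustrative assumptions.

```python
def pulse_on_times(period_ps: float, duty_cycle: float, delay_ps: float):
    """(pmos_on, nmos_on) in ps for a subtractive pulse generator.

    Asserting edges are delayed by delay_ps; de-asserting edges are not, so each
    pulse is its clock phase shortened by the delay (clamped at zero).
    """
    high_ps = duty_cycle * period_ps
    low_ps = period_ps - high_ps
    pmos_on = max(0.0, high_ps - delay_ps)  # pull-up pulse inside the high phase
    nmos_on = max(0.0, low_ps - delay_ps)   # pull-down pulse inside the low phase
    return pmos_on, nmos_on

if __name__ == "__main__":
    for duty in (0.50, 0.55):
        p, n = pulse_on_times(250.0, duty, 50.0)
        print(f"duty {duty:.2f}: pmos on {p:.1f} ps, nmos on {n:.1f} ps")
```

In this model, shifting the PLL duty cycle trades on-time between the two devices without changing the delay chain.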
Inductor Design
• Clk macro, bump pitch constrain inductor size
Inductor Design (cont’d)
• Metal sharing with existing power cut-aways
• Center power straps, HCK tree through inductor for mutual inductance cancellation
Inductor Design (cont’d)
• Custom “loopless” power under inductor to avoid Q degradation due to power-grid eddy currents
• TankCap built in Si and metal to meet capacitance and ESR targets
Other rclk Components: Mode Switch
• rclk resistance ↔ cclk loading tradeoff
• Mode switch offers FET resistance
– Voltage dependent
– Lower overdrive → negative temperature coefficient
• cclk → rclk mode transition causes excessive clock grid loading
– V(n1) not always well defined in cclk mode
– Results in reduced clock amplitude and degraded slew → timing impact
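The negative temperature coefficient at low overdrive can be reproduced with a textbook long-channel triode model, R ≈ 1/(k(T)·(Vgs − Vth(T))). Mobility (and so k) falls roughly as T^−1.5, while Vth also falls with temperature, raising the overdrive. When the overdrive is small, the Vth effect wins and R drops as the device heats. Every parameter below (0.40 V threshold, 1 mV/K slope, exponent) is an assumption, not a 32nm SOI device model.

```python
def triode_resistance(vgs, temp_k, vth0=0.40, dvth_dt=1e-3, k0=1.0, t0=300.0):
    """Small-Vds triode resistance (arbitrary units) of a long-channel FET.

    Assumed dependencies: k scales as (T/T0)^-1.5 (mobility), and Vth drops
    linearly by dvth_dt volts per kelvin above T0.
    """
    k = k0 * (temp_k / t0) ** -1.5
    overdrive = vgs - (vth0 - dvth_dt * (temp_k - t0))
    if overdrive <= 0:
        raise ValueError("device is off at this bias and temperature")
    return 1.0 / (k * overdrive)

if __name__ == "__main__":
    for vgs in (0.5, 1.2):  # low vs. high overdrive
        r_cold = triode_resistance(vgs, 300.0)
        r_hot = triode_resistance(vgs, 360.0)
        trend = "negative" if r_hot < r_cold else "positive"
        print(f"Vgs = {vgs:.1f} V: R(360 K)/R(300 K) = {r_hot / r_cold:.2f} ({trend} TC)")
```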
Mode Switch (Contd.)
• Staging techniques used
– Turn on Mode Switch in stages (as in power gating)
– “Warm” up n0 before making a low-resistance connection to grid
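The benefit of staging shows up in a simple RC picture, assuming node n0 (capacitance C) starts ΔV away from the grid and each stage enables more parallel switch fingers. The inrush when a stage turns on is roughly the remaining voltage difference over that stage's resistance, and the difference then decays with the stage's RC time constant. Finger counts, unit resistance, node capacitance, and stage timing are all hypothetical.

```python
import math

def staged_peak_currents(dv0, c_node, r_unit, fingers_per_stage, stage_time):
    """Peak current drawn at the start of each switch turn-on stage.

    Assumes n0 (capacitance c_node) starts dv0 volts from the grid and is pulled
    toward it through r_unit divided by the cumulative number of enabled fingers.
    Each stage lasts stage_time seconds.
    """
    peaks = []
    dv = dv0
    enabled = 0
    for fingers in fingers_per_stage:
        enabled += fingers
        r = r_unit / enabled
        peaks.append(dv / r)                          # inrush as this stage turns on
        dv *= math.exp(-stage_time / (r * c_node))    # RC settling during the stage
    return peaks

if __name__ == "__main__":
    args = dict(dv0=0.6, c_node=1e-12, r_unit=100.0, stage_time=50e-12)
    one_shot = staged_peak_currents(fingers_per_stage=[16], **args)
    staged = staged_peak_currents(fingers_per_stage=[1, 3, 12], **args)
    print(f"one-shot peak: {max(one_shot) * 1e3:.1f} mA")
    print(f"staged peak:   {max(staged) * 1e3:.1f} mA")
```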
Voltage Overshoot on Mode Switch
• rclk → cclk transition can result in voltage overshoot on n1
• Oxide stress poses a reliability issue
Throttle Switch
• Throttle switch connected across inductor
– Low resistance helps damp overshoot
– Turns on as Mode Switch turns off
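A back-of-envelope bound on the overshoot, assuming the Mode Switch interrupts an inductor current I0 and that L, the small parasitic capacitance at n1, and (when present) the throttle resistance R all sit in parallel between n1 and AC ground. With no throttle, energy balance ½·L·I0² = ½·C·V² caps the swing at I0·√(L/C), which is large for a small node capacitance. With R, the capacitor current is zero at the first voltage peak, so the resistor carries the already reduced inductor current and the peak stays below I0·R. The 50 mA, 1 nH, 10 fF, and 5 Ω values are hypothetical.

```python
import math

def overshoot_bound(i0, l_henry, c_node, r_throttle=None):
    """Upper bound on the first voltage excursion at n1 after the mode switch
    opens while the inductor carries current i0 (parallel L, C, optional R)."""
    # Lossless case: 0.5*L*i0^2 = 0.5*C*V^2  ->  V = i0*sqrt(L/C).
    bound = i0 * math.sqrt(l_henry / c_node)
    if r_throttle is not None:
        # At the first extremum dV/dt = 0, so |V| = R * i_L with i_L < i0.
        bound = min(bound, i0 * r_throttle)
    return bound

if __name__ == "__main__":
    print(f"no TSw:   {overshoot_bound(0.05, 1e-9, 10e-15):.2f} V")
    print(f"with TSw: {overshoot_bound(0.05, 1e-9, 10e-15, 5.0):.2f} V")
```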
Other rclk Components: CTank
• CTank acts as an AC ground connection
• C, ESR sized to serve as a low-loss ground connection
– Low-resistance power, ground, inductor connection
– High-bandwidth decap
• Implemented underneath inductor using Si and metal
• Distributed CTank connection using inductor winding for low-resistance contact
• Metal cap built to meet C, ESR requirements
Clock Tuning
• 7ps grid skew target across 22 mm²
• Heavy clock grid loading requires effective strategies for:
– Grid wire tuning
– Clock driver tuning
– Inductor tuning
• Elmore delay-aware local wire routing solution
• Clock wire tuning algorithm meets target skew with
– Inductance-aware clock spine geometries
– Minimal clock spine capacitance
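The Elmore delay mentioned above is the standard first-moment delay estimate for an RC tree: the delay to a node is the sum, over every resistor on the path from the driver, of that resistance times all capacitance downstream of it. A minimal sketch, using a parent-pointer tree layout chosen here for illustration:

```python
def elmore_delays(parent, r, c):
    """Elmore delay from the root (driver) to every node of an RC tree.

    parent[i]: index of node i's parent (-1 for the root, which must be node 0)
    r[i]: resistance of the edge parent[i] -> i (ignored for the root)
    c[i]: grounded capacitance at node i
    Assumes topological numbering: parent[i] < i for every non-root node.
    """
    n = len(parent)
    downstream = list(c)
    for i in range(n - 1, 0, -1):        # bottom-up: accumulate subtree capacitance
        downstream[parent[i]] += downstream[i]
    delay = [0.0] * n
    for i in range(1, n):                # top-down: add R_edge * C_downstream
        delay[i] = delay[parent[i]] + r[i] * downstream[i]
    return delay

if __name__ == "__main__":
    # Driver (node 0) feeding two sinks through unequal wires.
    d = elmore_delays(parent=[-1, 0, 0], r=[0.0, 1.0, 2.0], c=[0.0, 1.0, 1.0])
    print("sink delays:", d[1:], "skew:", max(d[1:]) - min(d[1:]))
```

Wire tuning then amounts to adjusting r and c along each branch until the sink delays agree within the skew budget.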
Clock Tuning (contd.)
• Global clock load varies significantly across core
• Effective driver and inductor allocation key to maintaining clock skew
– Driver palette size: 24
– Inductor palette size: 5 (0.5–1.3 nH range)
• Iterative linear programming (LP)-based algorithm for driver and inductor allocation
1. Start with initial driver/inductor assignment
2. Linearize problem: obtain sensitivity matrix for each driver/inductor location
3. Set up LP, solve for optimal assignment
4. Run full-chip clock skew analysis
5. If skew budget not met, go to 2
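Step 2 of the loop, linearization, can be sketched as a forward-difference Jacobian: perturb each driver/inductor setting, re-evaluate arrival times at every grid sink, and record the change. The `arrival_times` callable is a hypothetical stand-in for full-chip clock analysis, and the LP solve of step 3 is not shown.

```python
from typing import Callable, List, Sequence

def sensitivity_matrix(
    arrival_times: Callable[[Sequence[float]], List[float]],
    settings: Sequence[float],
    step: float = 1e-3,
) -> List[List[float]]:
    """Forward-difference Jacobian: S[i][j] ~ d(arrival at sink i) / d(setting j)."""
    base = arrival_times(settings)
    columns = []
    for j in range(len(settings)):
        bumped = list(settings)
        bumped[j] += step                      # perturb one driver/inductor setting
        perturbed = arrival_times(bumped)      # re-run the (stand-in) timing analysis
        columns.append([(p - b) / step for p, b in zip(perturbed, base)])
    return [list(row) for row in zip(*columns)]  # rows indexed by sink

if __name__ == "__main__":
    # Hypothetical toy model: 3 grid sinks, 2 tunable drive strengths.
    weights = [[-2.0, 0.5], [0.3, -1.0], [0.0, -0.7]]

    def toy_arrivals(x):
        return [10.0 + sum(w * xi for w, xi in zip(row, x)) for row in weights]

    for row in sensitivity_matrix(toy_arrivals, [1.0, 2.0]):
        print(["%.3f" % s for s in row])
```

For a nonlinear timing model the matrix is only valid near the current assignment, which is why the slide's loop re-linearizes after each full-chip analysis.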
Clock Tuning (contd.)
• Clock skew control for cclk and rclk
– Wire tuning algorithm to constrain clock latency
– Iterative LP formulation for optimal driver and inductor assignment
– Interleaved driver and inductor placement
– Additional rclk skew impact < 1 ps
[Figure: clock mesh load, global clock spine, driver-MSw shorting bar]
Putting It All Together
• Clock configuration programming during P-State transitions
• Frequency-indexed fuse table to access configuration bits
– Mode selection (rclk, cclk)
– Driver strength
– Pulse_en, pulse duty cycle
• Source-synchronous transfer
• Cclk mode during P-State transition
[Figure: fuse box and program sequencer in the North Bridge broadcast clock mode parameters over the HCK trees (tree0 … tree4) to source-synchronous receivers and per-driver parameter decode, driver[0] … driver[26]]
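A frequency-indexed fuse table can be pictured as sorted frequency breakpoints, each carrying a clock configuration (mode, driver strength, pulse settings). The sketch uses `bisect` to select the entry for a P-state frequency. Only the 2.9 GHz cclk/rclk boundary comes from the conclusion slide; the field names, the other breakpoints, the drive values, and treating exactly 2.9 GHz as rclk are hypothetical.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class ClockConfig:
    mode: str          # "cclk" or "rclk"
    drive_n: int       # driver strength numerator, in units of 1/7
    pulse_en: bool
    pulse_duty: float  # hypothetical duty-cycle setting, unused when pulse_en is False

# Hypothetical table: (lower frequency bound in GHz, configuration), sorted by bound.
FUSE_TABLE = [
    (0.0, ClockConfig("cclk", 7, False, 0.5)),
    (2.9, ClockConfig("rclk", 3, False, 0.5)),
    (3.6, ClockConfig("rclk", 4, False, 0.5)),
]
_BOUNDS = [lo for lo, _ in FUSE_TABLE]

def lookup(freq_ghz: float) -> ClockConfig:
    """Return the configuration whose frequency band contains freq_ghz."""
    idx = bisect.bisect_right(_BOUNDS, freq_ghz) - 1
    if idx < 0:
        raise ValueError("frequency below table range")
    return FUSE_TABLE[idx][1]

if __name__ == "__main__":
    for f in (2.2, 3.0, 3.8):
        print(f, lookup(f))
```

Per the slide, the hardware runs in cclk mode while a P-State transition is in progress; that sequencing is not modeled here.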
Measured Efficiency (%) vs. Frequency
[Plot: measured efficiency (%) vs. frequency for rclk_pulse_3, rclk_square_2, and rclk_square_3, at 25°C and 75°C]
• Efficiency: percentage clock power savings over cclk
• rclk_square_x: clock driver strength modulation of x/7
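The efficiency metric defined above reduces to a one-line formula, shown here with illustrative power numbers (not measured data):

```python
def rclk_efficiency(p_cclk: float, p_rclk: float) -> float:
    """Percentage clock-power savings of rclk relative to cclk at the same frequency."""
    if p_cclk <= 0:
        raise ValueError("cclk power must be positive")
    return 100.0 * (p_cclk - p_rclk) / p_cclk

if __name__ == "__main__":
    # Illustrative numbers only: 100 units of cclk clock power vs. 75 units in rclk.
    print(f"efficiency: {rclk_efficiency(100.0, 75.0):.1f}%")
```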
fmax Measurements
[Plot: fmax (GHz), 3.3–4.0 GHz axis, for cclk, rclk_sq_3, and rclk_sq_2]
• Frequency-limiting patterns in HST setup
• 0 MHz median, 5 MHz mean (0.13%) frequency overhead in rclk
– rclk/cclk phase offset
– Low-slew rclk waveforms → variation in timing elements
Rclk Measurement Summary
• Successfully ran SST (System Stress Test) over 2 weeks
• Latest Fmax impact data on a larger set of parts shows an Fmax overhead of ~0.2%
• Up to 34% energy efficiency achieved in the global clock using Pulse Mode
– Not production ready due to excessive Fmax impact driven by phase offset to NB and L2 clock interface
– Phase offset issue resolved in current design
• Temperature effect: overall efficiency degradation with T
– Positive temperature coefficient due to metal resistance
– Negative temperature coefficient due to low-overdrive MSw
• Traces with higher activity provide additional efficiency
– Increased clock load dominates additional crossover current
Conclusion
• Dual-mode resonant clock design in 32nm SOI
– Conventional mode: < 2.9 GHz
– Resonant mode: > 2.9 GHz to fmax
• Power savings
– Clock power: 25%
– Average application power (core): 4.5%
– Idle power (core): 10%
• Built upon existing clock infrastructure
• No CMOS technology modification
Acknowledgments
AMD and Cyclos Semiconductor
• Manivannan Bhoopathy
• Kyle Viau
• Fred Brauchler
• Joohee Kim
• Alan Arakawa
• Tom Meneghini
• Jerry Kao
• Denny Renfrow
• Vasant Palisetti
• Kevin Hurd
• David Johnson
• Rohit Kumar¹
• Alvin Loke
• Bill McGee
¹Currently with Apple Computers