altera trcak g
Device and circuit architecture for Low power design & techniques
Shlomi Shaked – Senior FAE
ALTERA Department - Eastronics
© 2010 Altera Corporation—Public
ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S. 2
Agenda
Introduction
Power Analysis
Power Optimization
Technology for Low Power
Introduction
Static, Dynamic and I/O Power in FPGAs
Power Basics
[Figure: Stratix Family Power-Up Profile. Current versus time through the power-up, static, and total power (dynamic + static) phases, comparing the in-rush current of a typical FPGA with that of the Stratix family.]
Power Components
Power During Operation
  Standby or Static Power: power with clocks stopped
  Dynamic Power: power that increases with clock frequency
  Get this power from the Early Power Estimator or the Quartus Power Analyzer
Power During Start-up
  Temporary power-up spike / inrush current
  Configuration power (to program SRAMs)
  Get this power information from the data sheet
Standby Power
Standby or Static Power
  Power drawn by the device even when the clocks are stopped
Two Components
  Leakage power: transistors don't turn off fully
  I/O power for terminated I/O standards: I/Os continuously drive current into resistors, even with no clock
$$P_{\text{static}} = \sum_{i=1}^{n} \text{static\_current}_i \times \text{supply\_voltage}_i$$
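Read as a sum over the device's supply rails, the static power expression on this slide can be evaluated rail by rail. The following is a minimal Python sketch of that reading. The rail descriptions and current values are hypothetical and chosen only for illustration; real values come from the data sheet or the Early Power Estimator.

```python
def static_power(rails):
    """Return total static power in watts.

    rails: iterable of (static_current_amps, supply_voltage_volts) pairs,
    one pair per supply rail.
    """
    return sum(current * voltage for current, voltage in rails)


# Illustrative numbers only, not taken from any data sheet.
example_rails = [
    (0.50, 0.9),  # hypothetical core rail: 0.50 A at 0.9 V
    (0.05, 2.5),  # hypothetical auxiliary rail: 0.05 A at 2.5 V
    (0.02, 1.5),  # hypothetical I/O rail: 0.02 A at 1.5 V
]

total = static_power(example_rails)
print(f"Static power: {total:.3f} W")
```

Keeping one (current, voltage) pair per rail makes it easy to see which supply dominates the standby budget.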
Dynamic Power
Dynamic power increases linearly (or close to linearly) with clock frequency
Two Components
  Power due to charging and discharging of capacitance: routing wires, ALMs, load capacitance on I/O pins, etc.
  Short-circuit power: power dissipated when current flows in a direct path from VCC to ground during switching
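The linear dependence on clock frequency follows from the usual textbook model of capacitive switching power, P = activity x C x V^2 x f. The slide does not state this formula, so the sketch below is an assumption brought in for illustration, and the capacitance, voltage, and activity values are made up.

```python
def switching_power(activity, capacitance_f, vdd, freq_hz):
    """Textbook capacitive switching power in watts.

    activity:      average fraction of clock cycles in which the node
                   is charged (switching activity factor)
    capacitance_f: switched capacitance in farads
    vdd:           supply voltage in volts
    freq_hz:       clock frequency in hertz
    """
    return activity * capacitance_f * vdd ** 2 * freq_hz


# Illustrative values only: 1 nF of total switched capacitance at 1.0 V.
p_100mhz = switching_power(0.125, 1e-9, 1.0, 100e6)
p_200mhz = switching_power(0.125, 1e-9, 1.0, 200e6)

print(f"100 MHz: {p_100mhz * 1e3:.1f} mW")
print(f"200 MHz: {p_200mhz * 1e3:.1f} mW")
print(f"ratio:   {p_200mhz / p_100mhz:.2f}")
```

Short-circuit power is not captured by this model, which is one reason measured dynamic power is only close to linear in frequency.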
I/O Power
Dynamic power to charge capacitance
Static power
  Significant for resistively terminated standards like SSTL
  Negligible for non-terminated I/O standards like LVTTL and LVCMOS
Terminated I/O standards: some power is dissipated as heat in off-chip resistors
Power models give both values:
  1. Power dissipated as heat on the FPGA (thermal power)
  2. Power drawn from the voltage supply (larger)
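[Figure: FPGA output buffer supplied from VCCIO, sourcing current I_BUFFER into termination resistors R1 and R2, termination voltage VTT, and load capacitance CL.]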
Power Analysis
Early Power Estimation (EPE),
PowerPlay Power Analysis.
Power Analysis
Three parts to good power estimates:
  1. Accurate toggle rate data on each signal
  2. Accurate power models of FPGA circuitry
  3. Knowledge of device operating conditions
[Figure: toggle rate and signal probability, power models, and operating conditions feed the power estimation report.]
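Toggle rate and signal probability are the per-signal inputs to the estimate. As a rough illustration, the Python sketch below derives both from a list of transition times for a single signal: toggle rate as transitions per second, and static probability as the fraction of time the signal is high. The function name and the sample waveform are hypothetical; in practice these numbers come from a simulator or from the tool's own estimation.

```python
def signal_activity(initial_value, transitions, duration_s):
    """Compute (toggle_rate, static_probability) for one signal.

    initial_value: logic level (0 or 1) at time 0
    transitions:   sorted times in seconds at which the signal changes value
    duration_s:    length of the observation window in seconds
    """
    toggle_rate = len(transitions) / duration_s  # transitions per second

    time_high = 0.0
    value = initial_value
    previous = 0.0
    for t in transitions:
        if value == 1:
            time_high += t - previous
        value ^= 1  # each transition flips the logic level
        previous = t
    if value == 1:
        time_high += duration_s - previous

    return toggle_rate, time_high / duration_s


# Hypothetical waveform: starts low, toggles four times in a 100 ns window.
rate, prob = signal_activity(0, [10e-9, 30e-9, 60e-9, 80e-9], 100e-9)
print(f"toggle rate = {rate / 1e6:.0f} million transitions/s")
print(f"static probability = {prob:.2f}")
```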
PowerPlay Power Analysis Tools
[Figure: estimation accuracy, from lower to higher, against the PowerPlay analysis inputs available at each design phase.]
Design Concept: user input, analyzed with the Early Power Estimator spreadsheets
Design Implementation: Quartus II design profile, place & route results, and simulation results, analyzed with the Quartus II Power Analyzer
PowerPlay - Early Power Estimator
PowerPlay Power Analyzer
Accurately Estimate the device power consumption after the design is completed
[Figure: signal activities, the user design (after fitting), and operating conditions feed the PowerPlay Power Analyzer, which produces the power analysis report.]
PowerPlay Power Analyzer Tool
PowerPlay Power Analyzer Tool is under the Tools menu
Toggle rate input
  Signal Activity File: output by the Quartus II simulator
  VCD generated by third-party simulators
  Assignment Editor
  Unspecified toggle rates: use either the default toggle rate or vectorless estimation
Operating condition settings
Power Optimization
Synthesis & Place & Route
Core Dynamic Power Breakdown
Average power dissipation in various FPGA architecture elements:
  Routing: 38%
  ALM combinational: 19%
  ALM registers: 18%
  RAM blocks: 14%
  Clock networks: 9%
  DSP blocks: 2%*
*DSP block power: 5% of dynamic power for designs that use DSP blocks
Power-Driven Compilation Flow – Quick and Easy
Straightforward
Longer compile time
Not fully optimized for power
Design Entry Schematic/HDL
Power-Driven Synthesis (Extra effort)
Power-Driven Fitter (Extra effort)
PowerPlay Power Analyzer (Power Estimation)
Power-Driven Compilation Flow - Recommended
Use accurate toggle data from simulation results to provide the best guidance to power-driven fitting
  The SAF provides the design's signal activity information
  Reads the Power Analyzer input settings
Time consuming because of the longer flow
Very effective
Fit Design
Find Signal Toggle Rates: Gate-Level Simulation with Glitch Filtering
Signal Activity (SAF) File
Design Entry Schematic/HDL
Power-Driven Synthesis (Extra effort)
Power-Driven Fitter (Extra effort)
PowerPlay Power Analyzer (Power Estimation)
1. Power-Driven Synthesis
Under Analysis & Synthesis Settings
Power Optimization settings:
  OFF: no optimization
  Normal compilation (default): power optimizations that do not impact performance and do not increase compile time
  Extra effort: power optimizations that may impact design performance and/or increase compile time
Impact On Memory Blocks
Specify read-enable and write-enable signals on your RAMs whenever possible
  PowerPlay will convert them to clock enables
  Completely shuts down the RAM on many cycles
Leave RAM Block Type = Auto
  The power optimizer will choose the best RAM block
Memory Optimization
  Extra effort setting: power-aware memory balancing
Impact On Memory Blocks (Cont)
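[Figure: a 4K x 4 memory implemented two ways.
Default implementation (Normal Compilation setting): four 4K x 1 M4K RAMs, each addressed by Addr[0:11], together supplying Data[0:3].
Power-efficient implementation (Extra effort setting): four 1K x 4 M4K RAMs, each addressed by Addr[0:9], with an address decoder on Addr[10:11] selecting the block that supplies Data[0:3].]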
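The difference between the two mappings of the 4K x 4 memory is how many RAM blocks have to be active for one access. The Python sketch below models only that aspect: with four 4K x 1 blocks every access touches all four, while with four 1K x 4 blocks the two upper address bits select a single block. The function names are hypothetical and the model ignores the decoder and output multiplexer, so it is an illustration rather than a description of what the synthesis tool actually emits.

```python
DEPTH, WIDTH = 4096, 4  # the 4K x 4 memory from the slide


def default_active_blocks(address):
    """Default mapping: four 4K x 1 blocks, one per data bit.

    Every access needs all four data bits, so all four blocks are enabled.
    """
    assert 0 <= address < DEPTH
    return [0, 1, 2, 3]


def split_address(address):
    """Return (block_select, word_offset) for the 1K x 4 mapping.

    The two upper address bits pick the block; the lower ten bits
    address the word inside that block.
    """
    assert 0 <= address < DEPTH
    return address >> 10, address & 0x3FF


def power_efficient_active_blocks(address):
    """Extra-effort mapping: four 1K x 4 blocks, only one enabled per access."""
    block, _offset = split_address(address)
    return [block]


addr = 0x5A3
print("default mapping enables blocks:", default_active_blocks(addr))
print("power-efficient mapping enables blocks:", power_efficient_active_blocks(addr))
print("block select and word offset:", split_address(addr))
```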
Impact On Logic Elements
Power-aware logic mapping
  Normal compilation or Extra effort settings
  Re-arranges logic during synthesis to reduce the impact of high-toggling nets
Balance the area / power / speed goals
  Less logic usually means less power
  Fewer signals to toggle
2. Power-Driven Fitter
Under Fitter Settings
Power Optimization settings:
  OFF: no optimization
  Normal compilation (default): power optimizations that do not impact performance and do not increase compile time
  Extra effort: power optimizations that may impact design performance and/or increase compile time
Two Levels of Optimization
Normal Compilation setting
  Power-efficient DSP block configuration
  Swap operands to multipliers
    Swap DATAB with DATAA if DATAB is wider than DATAA
    Transparent to the designer, with no effect on performance
Extra effort setting
  Power-efficient DSP block configuration
  Localize high-toggling nets and route for minimum capacitance
  Place circuitry to minimize clock power
  Uses the Signal Activity File to guide the fitter (recommended)
Place Circuitry to Minimize Clock Power
Previously, P&R:
  Places LEs wherever is best for timing and wiring
  Doesn't try to minimize clock power
[Figure: placement of LEs and the clocks that feed them.]
Place Circuitry to Minimize Clock Power (Cont)
With Extra effort:
  Groups LEs from the same clock domain to reduce clock power
  Reduces clock power with minimal effect on routability
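One way to see why grouping helps is to count how many LABs each clock has to be routed into. The Python sketch below uses that count as a crude stand-in for clock power. The placement dictionaries, LE names, and the metric itself are hypothetical; the fitter's real cost function is not described in this presentation.

```python
def clocked_labs(placement):
    """Count distinct (clock, LAB) pairs in a placement.

    placement maps an LE name to a (lab, clock) tuple. Each distinct
    pair means one clock has to be delivered into one LAB.
    """
    return len({(clock, lab) for lab, clock in placement.values()})


# Scattered: both clock domains are spread over both LABs.
scattered = {
    "le0": ("lab0", "clk_a"),
    "le1": ("lab1", "clk_a"),
    "le2": ("lab0", "clk_b"),
    "le3": ("lab1", "clk_b"),
}

# Grouped: each LAB holds LEs from a single clock domain.
grouped = {
    "le0": ("lab0", "clk_a"),
    "le1": ("lab0", "clk_a"),
    "le2": ("lab1", "clk_b"),
    "le3": ("lab1", "clk_b"),
}

print("scattered placement, clock/LAB pairs:", clocked_labs(scattered))
print("grouped placement, clock/LAB pairs:", clocked_labs(grouped))
```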
3. Clock Power Management
Clocks represent a significant portion of dynamic power consumption
Clock routing power is automatically optimized by the QII software
Dynamic clock enable lets internal logic control the clock network
Gated clock in the LAB
Clock Control Block
Use the MegaWizard to generate these blocks
Dynamically enable or disable the clock network using an enable signal
When the clock network is powered down, none of the logic fed by that clock toggles
Reduces overall device power consumption
Global and regional clock networks
4. Architectural Optimization
Take advantage of specific architecture resources.
TriMatrix memory blocks are optimized for different specific functions.
System-level design considerations
Use Dedicated Resources
DSP blocks
  Less power than logic elements, except for small multiplies (e.g. 5x5)
  Use all the DSP logic (not just multipliers): multiplier-accumulator, complex multiplier, finite impulse response sample chaining, etc.
  Use the altmult_accum MegaFunction if synthesis is not inferring them
RAM blocks
  Usually inferred by synthesis
  Use the altsyncram MegaFunction if necessary
Shift registers
  Many toggling signals: power inefficient
  Medium to large shift registers: implement in FIFOs
  Use the altshift_taps MegaFunction if necessary
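The shift register advice can be illustrated by counting storage-bit flips. In a register chain every stage may change on every clock, whereas a memory-based delay line writes only one word per clock. The Python sketch below compares the two for the same input stream. Both functions and the sample data are hypothetical, and the model ignores the address counter and read port of the memory version, so it only indicates the trend.

```python
def register_chain_toggles(samples, depth):
    """Count storage-bit flips when every stage shifts on each clock."""
    stages = [0] * depth
    toggles = 0
    for sample in samples:
        new_stages = [sample] + stages[:-1]
        # Every stage that receives a different value flips some bits.
        toggles += sum(bin(old ^ new).count("1")
                       for old, new in zip(stages, new_stages))
        stages = new_stages
    return toggles


def ram_buffer_toggles(samples, depth):
    """Count storage-bit flips when only one memory word is written per clock."""
    memory = [0] * depth
    write_ptr = 0
    toggles = 0
    for sample in samples:
        toggles += bin(memory[write_ptr] ^ sample).count("1")
        memory[write_ptr] = sample
        write_ptr = (write_ptr + 1) % depth  # circular write pointer
    return toggles


# Hypothetical 1-bit input stream through a 4-deep delay line.
stream = [1, 0, 1, 0, 1, 0, 1, 0]
print("register chain bit flips:", register_chain_toggles(stream, 4))
print("RAM buffer bit flips:", ram_buffer_toggles(stream, 4))
```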
Technology for Low Power
Cyclone III LS / Cyclone IV
Stratix IV / Stratix V
Hardcopy™
Power / Performance / Area Compromises
[Figure: three-way trade-off between Power, Utilization, and Performance.]
Key Technologies to Reduce Power
FPGA Power Reduction (yellow highlight: 28nm techniques)
Lower Static Power
Lower Dynamic Power
Process innovations (65nm -> 40nm -> 28nm…)
Programmable Power Technology
Lower core voltage (1.1V -> 1.0V -> 0.85 V)
Extensive hardening of IP, Embedded HardCopy Blocks
Hard power-down of more functional blocks
More granular clock gating
Selective use of high-speed transistors
Partial reconfiguration
Dynamic on-chip termination
Quartus II software PowerPlay power optimization
Programmable Power Technology
Programmable Power Technology enables the core logic of Altera high-end FPGAs to be programmed at the tile level for high-speed or low-power mode
Tiles are defined as:
  MLAB/LAB pairs with routing to the pair
  DSP blocks
  Memory blocks
  I/O interfaces
Tiles with DSP blocks, memory blocks, and I/O elements that are used in the design are always set to high-speed mode
Unused DSP blocks, memory blocks, and I/O interfaces are set to low-power mode by default to reduce static and dynamic power
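The tile rules on this slide can be written down as a small decision function. The Python sketch below is a simplified illustration: the function name, the tile kind strings, and the `timing_critical` flag for LAB tiles are hypothetical. In the real flow the Quartus II software makes this choice from its own timing analysis.

```python
def tile_mode(kind, used, timing_critical=False):
    """Pick 'high-speed' or 'low-power' for one tile.

    kind:            'lab', 'dsp', 'memory', or 'io'
    used:            True if the design uses the tile
    timing_critical: True if a used LAB tile sits on a timing-critical
                     path (simplifying assumption for this sketch)
    """
    if not used:
        return "low-power"  # unused tiles default to low-power mode
    if kind in ("dsp", "memory", "io"):
        return "high-speed"  # used DSP, memory, and I/O tiles
    return "high-speed" if timing_critical else "low-power"


# Hypothetical tiles: (kind, used, timing_critical)
tiles = [
    ("dsp", True, False),
    ("memory", False, False),
    ("lab", True, True),
    ("lab", True, False),
]
for kind, used, critical in tiles:
    print(kind, used, critical, "->", tile_mode(kind, used, critical))
```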
Programmable Speed vs. Leakage
Note: A simple “model” showing Programmable Power Technology. Actual implementation varies and is patented.
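[Figure: transistor cross-section showing source, gate, drain, channel, and substrate. Bias labels: 0 V for high speed (HS), < 0 V for low power (LP). VT is automatically controlled by software. An accompanying plot shows power against threshold voltage, with the high-speed and low-power points marked.]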
Programmable Power Technology
Performance where you need it, lowest power everywhere else, automated by Quartus II software
[Figure: logic array showing high-speed logic on the timing-critical path, low-power logic elsewhere, and unused low-power logic.]
Power Reduction with DDR3 & Dynamic OCT
Save 1.9 W per 72-bit DIMM at 1067 Mbps
[Figure: Stratix IV FPGA and memory chip during a write (matching line impedance) and during a read (terminating far end).]
DDR3 consumes 30% less power than DDR2
  DDR2 requires 1.8-V VCC rails
  DDR3 requires 1.5-V VCC rails
Dynamic OCT reduces termination power by 1 W per 72 bits
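The 30% figure for DDR3 is roughly what the supply voltages alone would predict if power scaled with the square of the rail voltage. The slide does not say how the number was derived, so the Python sketch below is only a plausibility check under that V^2 assumption, ignoring differences in frequency, capacitance, and termination.

```python
def relative_power(v_new, v_old):
    """Ratio of power at v_new to power at v_old, assuming power scales with V^2."""
    return (v_new / v_old) ** 2


DDR2_VCC = 1.8  # volts, from the slide
DDR3_VCC = 1.5  # volts, from the slide

saving = 1.0 - relative_power(DDR3_VCC, DDR2_VCC)
print(f"Estimated saving from {DDR2_VCC} V to {DDR3_VCC} V: {saving:.1%}")
```

Under this assumption the estimate lands close to the 30% quoted on the slide.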
HardCopy IV Devices Designed for Low Power
[Figure: stacked bar chart of power (W), on a 0 to 6 scale, for Stratix FPGAs versus HardCopy ASICs, broken down into I/O, DSP, leakage, logic, routing and clocks, and RAM.]
Optimized architecture for power efficiency
  Unused logic and memory blocks not connected to power rail
  Unused clock trees not powered
Total core power reduction estimates: 30% to 70%
  Final results pending characterization
Thank you.