Device and circuit architecture for Low power design & techniques Shlomi Shaked – Senior FAE ALTERA Department - Eastronics

Upload: alonagradman

Post on 21-Jan-2015

Page 1: Altera  trcak g

Device and circuit architecture for Low power design & techniques

Shlomi Shaked – Senior FAE

ALTERA Department - Eastronics

Page 2: Altera  trcak g

© 2010 Altera Corporation—Public

ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS, QUARTUS & STRATIX are Reg. U.S. Pat. & Tm. Off. and Altera marks in and outside the U.S. 2

Agenda

Introduction
Power Analysis
Power Optimization
Technology for Low Power

Page 3: Altera  trcak g

Introduction

Static, Dynamic and I/O Power in FPGAs

Page 4: Altera  trcak g

Power Basics

[Figure: Stratix Family Power-Up Profile. Current versus time, from power-up to static and then total power (dynamic + static), comparing the in-rush current of a typical FPGA with that of the Stratix family.]

Page 5: Altera  trcak g

Power Components

Power During Operation
  Standby or Static Power: power with clocks stopped
  Dynamic Power: power that increases with clock frequency
  Get this power from the Early Power Estimator or the Quartus Power Analyzer
Power During Start-up
  Temporary power-up spike / inrush current
  Configuration power (to program SRAMs)
  Get this power information from the data sheet

Page 6: Altera  trcak g

Standby Power

Standby or Static Power: power drawn by the device even when the clocks are stopped

Two Components
  Leakage power: transistors don’t turn off fully
  I/O power for terminated I/O standards: I/Os continuously drive current into resistors, even with no clock

$P = \sum_{1}^{n} \text{static\_current} \times \text{supply\_voltage}$
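As a minimal sketch, static power can be totaled as static current times supply voltage over each supply. The rail list and every number below are made-up placeholders, not data-sheet values; real static currents would come from the data sheet or the Early Power Estimator.

```python
def static_power_w(rails):
    """Sum static_current * supply_voltage over all supplies.

    rails: iterable of (static_current_amps, supply_voltage_volts) pairs.
    """
    return sum(current * voltage for current, voltage in rails)


# Hypothetical supplies: (static current in A, supply voltage in V).
example_rails = [
    (0.50, 0.9),  # core supply (assumed values)
    (0.05, 2.5),  # auxiliary supply (assumed values)
    (0.02, 1.5),  # terminated I/O bank (assumed values)
]

print(f"Static power: {static_power_w(example_rails):.3f} W")
```
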

Page 7: Altera  trcak g

Dynamic Power

Dynamic power increases linearly (or close to linearly) with clock frequency

Two Components
  Power due to charging and discharging of capacitance: routing wires, ALMs, load capacitance on I/O pins, etc.
  Short-circuit power: power dissipated when current flows in a direct path from VCC to ground during switching
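The linear dependence on clock frequency can be illustrated with the textbook CMOS switching-power relation, activity x C x V^2 x f. The slide does not state this formula, so treat it as an assumed model; short-circuit power is ignored and all values are illustrative.

```python
def switching_power_w(activity, capacitance_f, vdd_v, clock_hz):
    """Textbook CMOS switching power: activity * C * Vdd^2 * f.

    activity: average number of charging (0 -> 1) transitions per clock cycle.
    """
    return activity * capacitance_f * vdd_v ** 2 * clock_hz


# Assumed example: 1 nF of switched capacitance, 1.0 V supply.
base = switching_power_w(0.125, 1e-9, 1.0, 100e6)
doubled = switching_power_w(0.125, 1e-9, 1.0, 200e6)
print(f"100 MHz: {base:.4f} W, 200 MHz: {doubled:.4f} W")
```

In this model, doubling the clock frequency doubles the switching power, while the supply voltage enters as a square.
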

Page 8: Altera  trcak g

I/O Power

Dynamic power to charge capacitance
Static power
  Significant for resistively terminated standards like SSTL
  Negligible for non-terminated I/O standards like LVTTL and LVCMOS
Terminated I/O standards: some power is dissipated as heat in off-chip resistors
Power models give both values
  1. Power dissipated as heat on the FPGA (thermal power)
  2. Power drawn from the voltage supply (larger)

[Figure: terminated output circuit. Labels: FPGA output buffer, VCCIO, I_BUFFER, R1, R2, CL, VTT.]
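The difference between thermal power and supply power for a terminated output can be sketched with a deliberately simplified model: a driver holding the pin high through one on-chip resistance, in series with one off-chip resistor tied to VTT. This is an assumption for illustration only; it is not the R1/R2/CL circuit from the slide, and the resistor and voltage values are invented.

```python
def terminated_output_power(vccio, vtt, r_driver, r_term):
    """Static power split for an output driving high into a resistor tied to VTT.

    Assumed model: on-chip driver resistance r_driver in series with an
    off-chip termination resistor r_term connected to the VTT source.
    """
    current = (vccio - vtt) / (r_driver + r_term)
    return {
        "supply_w": vccio * current,           # drawn from the VCCIO supply
        "thermal_w": current ** 2 * r_driver,  # heat dissipated on the FPGA
        "resistor_w": current ** 2 * r_term,   # heat in the off-chip resistor
        "vtt_w": vtt * current,                # absorbed by the VTT source
    }


p = terminated_output_power(vccio=1.8, vtt=0.9, r_driver=25.0, r_term=50.0)
for name, watts in p.items():
    print(f"{name}: {watts * 1000:.1f} mW")
```

The supply power equals the sum of the other three terms, which is why the power drawn from the supply is larger than the thermal power on the device.
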

Page 9: Altera  trcak g

Power Analysis

Early Power Estimation (EPE),

PowerPlay Power Analysis.

Page 10: Altera  trcak g

Power Analysis

Three parts to good power estimates
  1. Accurate toggle rate data on each signal
  2. Accurate power models of FPGA circuitry
  3. Knowledge of device operating conditions

[Figure: toggle rate and signal probability, power models, and operating conditions feed the power estimation report.]
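Toggle rate and signal probability are the per-signal inputs to the estimate. The sketch below derives both from a sampled 0/1 waveform; the function name and the sample data are hypothetical, and real flows take these values from simulation output rather than hand-written lists.

```python
def signal_activity(samples, sample_period_s):
    """Return (toggle_rate_per_second, static_probability) for a 0/1 signal.

    Toggle rate counts every transition, rising or falling. Static
    probability is the fraction of samples at logic 1.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    toggles = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    duration_s = (len(samples) - 1) * sample_period_s
    return toggles / duration_s, sum(samples) / len(samples)


wave = [0, 1, 1, 0, 1, 0, 0, 1, 1]  # made-up samples, 10 ns apart
rate, probability = signal_activity(wave, 10e-9)
print(f"toggle rate: {rate:.3e} transitions/s, static probability: {probability:.2f}")
```
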

Page 11: Altera  trcak g

PowerPlay Power Analysis Tools

[Figure: estimation accuracy (lower to higher) versus PowerPlay analysis inputs, from design concept to design implementation. Inputs: user input, Quartus II design profile, place and route results, simulation results. Tools: Early Power Estimator spreadsheets, Quartus II Power Analyzer.]

Page 12: Altera  trcak g

PowerPlay - Early Power Estimator

Page 13: Altera  trcak g

PowerPlay Power Analyzer

Accurately estimates the device power consumption after the design is completed

[Figure: signal activities, the user design (after fitting), and operating conditions feed the PowerPlay Power Analyzer, which produces the power analysis report.]

Page 14: Altera  trcak g

PowerPlay Power Analyzer Tool

The PowerPlay Power Analyzer Tool is under the Tools menu

Toggle rate input
  Signal Activity File: output by the Quartus II simulator
  VCD generated by 3rd-party simulators
  Assignment Editor
  Unspecified toggle rates: use either the default toggle rate or vectorless estimation
Operating condition setting

Page 15: Altera  trcak g

Power Optimization

Synthesis & Place & Route

Page 16: Altera  trcak g

Core Dynamic Power Breakdown

Average power dissipation in various FPGA architecture elements

Element              Share of core dynamic power
Routing              38%
ALM combinational    19%
ALM registers        18%
RAM blocks           14%
Clock networks       9%
DSP blocks           2%*

*DSP block power: 5% of dynamic power for designs that use DSP blocks

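The breakdown indicates how much an optimization of one element can save overall. The sketch below applies a fractional reduction in one element to the slide's average shares; the 25% routing reduction is an arbitrary illustration, not a measured result.

```python
# Average share of core dynamic power per element, in percent (from the slide).
BREAKDOWN = {
    "Routing": 38,
    "ALM combinational": 19,
    "ALM registers": 18,
    "RAM blocks": 14,
    "Clock networks": 9,
    "DSP blocks": 2,
}


def total_saving_percent(element, element_reduction):
    """Percent of core dynamic power saved when one element's power
    shrinks by element_reduction (a fraction between 0 and 1)."""
    return BREAKDOWN[element] * element_reduction


saving = total_saving_percent("Routing", 0.25)
print(f"Cutting routing power by 25% saves about {saving:.1f}% of core dynamic power")
```
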
Page 17: Altera  trcak g

Power-Driven Compilation Flow – Quick and Easy

Straightforward
Longer compile time
Not fully optimized for power

Flow: Design Entry (Schematic/HDL) -> Power-Driven Synthesis (Extra effort) -> Power-Driven Fitter (Extra effort) -> PowerPlay Power Analyzer (Power Estimation)

Page 18: Altera  trcak g

Power-Driven Compilation Flow - Recommended

Use accurate toggle data from simulation results; this provides the best guidance to power-driven fitting
SAF provides the design signal activity information
Reads the Power Analyzer input settings
Time consuming because of the longer flow
Very effective

[Figure: flow boxes. Fit Design; Find Signal Toggle Rates: gate-level simulation with glitch filtering; Signal Activity File (SAF); Design Entry (Schematic/HDL); Power-Driven Synthesis (Extra effort); Power-Driven Fitter (Extra effort); PowerPlay Power Analyzer (Power Estimation).]

Page 19: Altera  trcak g

1. Power-Driven Synthesis

Under Analysis & Synthesis Settings

Power Optimization settings
  Off: no optimization
  Normal compilation (default): power optimizations that do not impact performance and do not increase compile time
  Extra effort: power optimizations that may impact design performance and/or increase compile time

Page 20: Altera  trcak g

Impact On Memory Blocks

Specify read-enable and write-enable signals on your RAMs whenever possible
  PowerPlay will convert them to clock enables
  Completely shuts down the RAM on many cycles
Leave RAM Block Type = Auto
  The power optimizer will choose the best RAM block
Memory optimization with the Extra effort setting
  Power-aware memory balancing

Page 21: Altera  trcak g

Impact On Memory Blocks (Cont)

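[Figure: two implementations of a 4K x 4 memory.
Default implementation (Normal Compilation setting): four 4K x 1 M4K RAMs, addressed by Addr[0:11], together supplying Data[0:3].
Power-efficient implementation (Extra effort setting): four 1K x 4 M4K RAMs, addressed by Addr[0:9], with an address decoder on Addr[10:11], supplying Data[0:3].]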
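The two mappings of the 4K x 4 memory differ in how many M4K blocks are active per access. The sketch below models this, assuming that the default mapping reads one bit from each of four 4K x 1 blocks and that the power-efficient mapping uses the top two address bits to enable one of four 1K x 4 blocks. The function names are hypothetical.

```python
def active_blocks_default(address):
    """Default mapping: four 4K x 1 blocks, one data bit each, so every
    access touches all four blocks."""
    if not 0 <= address < 4096:
        raise ValueError("address must fit in 12 bits")
    return [0, 1, 2, 3]


def active_blocks_power_efficient(address):
    """Extra-effort mapping: four 1K x 4 blocks; the two upper address
    bits (decoder input) enable exactly one block."""
    if not 0 <= address < 4096:
        raise ValueError("address must fit in 12 bits")
    return [address >> 10]


for addr in (0, 1023, 1024, 4095):
    print(addr, active_blocks_default(addr), active_blocks_power_efficient(addr))
```

Under these assumptions only one block is clocked per access instead of four, at the cost of the extra decoder logic.
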

Page 22: Altera  trcak g

Impact On Logic Elements

Power-aware logic mapping
  Normal compilation or Extra effort settings
  Rearranges logic during synthesis to reduce the impact of high-toggling nets
Balance the area / power / speed goals
  Less logic usually means less power: fewer signals to toggle

Page 23: Altera  trcak g

2. Power-Driven Fitter

Under Fitter Settings

Power Optimization settings
  Off: no optimization
  Normal compilation (default): power optimizations that do not impact performance and do not increase compile time
  Extra effort: power optimizations that may impact design performance and/or increase compile time

Page 24: Altera  trcak g

Two Levels of Optimization

Normal Compilation setting
  Power-efficient DSP block configuration
  Swap operands to multipliers: swap DATAB with DATAA if DATAB is wider than DATAA
  Transparent to the designer, with no effect on performance
Extra Effort setting
  Power-efficient DSP block configuration
  Localize high-toggling nets, and route for minimum capacitance
  Place circuitry to minimize clock power
  Utilizes the Signal Activity File to guide the fitter (recommended)

Page 25: Altera  trcak g

Place Circuitry to Minimize Clock Power

Previously, P&R places LEs wherever is best for timing and wiring
  Doesn’t try to minimize clock power

[Figure: LEs and clocks.]

Page 26: Altera  trcak g

Place Circuitry to Minimize Clock Power (Cont)

With Extra effort: groups LEs from the same clock domain to reduce clock power
  Reduces clock power with minimal effect on routability

Page 27: Altera  trcak g

3. Clock Power Management

Clocks represent a significant portion of dynamic power consumption

Clock routing power is automatically optimized by the Quartus II software

Dynamic clock enable lets internal logic control the clock network

Gated clock in the LAB

Page 28: Altera  trcak g

Clock Control Block

Use the MegaWizard to generate these blocks
Dynamically enable or disable the clock network using an enable signal
When the clock network is powered down, all the logic fed by that clock does not toggle
  Reduces overall device power consumption

[Figure: global and regional clock network.]

Page 29: Altera  trcak g

4. Architectural Optimization

Take advantage of specific architecture resources.

TriMatrix memory is optimized for different specific functions.

Systemic design considerations

Page 30: Altera  trcak g

Use Dedicated Resources

DSP blocks
  Less power than logic elements, except for small multiplies (e.g. 5x5)
  Use all the DSP logic (not just multipliers): multiplier-accumulator, complex multiplier, finite impulse response sample chaining, etc.
  Use the altmult_accum MegaFunction if synthesis is not inferring them
RAM blocks
  Usually inferred by synthesis
  Use the altsyncram MegaFunction if necessary
Shift registers
  Many toggling signals: power inefficient
  Medium to large shift registers: implement in FIFOs
  Use the altshift_taps MegaFunction if necessary
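The claim that register-based shift registers toggle many signals can be illustrated by counting stored-bit flips. The sketch below compares a chain of registers, where every stage is rewritten on each shift, with a memory-based circular buffer, where only one slot is written per sample. It is a simplified model: it ignores the address counter and read-port activity of the memory version, and the sample pattern is an artificial worst case.

```python
def count_bit_flips(old, new):
    """Number of bit positions that differ between two integers."""
    return bin(old ^ new).count("1")


def register_chain_flips(samples, depth):
    """Stored-bit flips when samples shift through `depth` registers."""
    regs = [0] * depth
    flips = 0
    for sample in samples:
        shifted = [sample] + regs[:-1]  # every stage takes its neighbor's value
        flips += sum(count_bit_flips(o, n) for o, n in zip(regs, shifted))
        regs = shifted
    return flips


def ram_based_flips(samples, depth):
    """Stored-bit flips when each sample overwrites one circular-buffer slot."""
    memory = [0] * depth
    flips = 0
    for index, sample in enumerate(samples):
        slot = index % depth
        flips += count_bit_flips(memory[slot], sample)
        memory[slot] = sample
    return flips


samples = [0xFF, 0x00] * 8  # alternating 8-bit pattern (assumed input)
print("register chain:", register_chain_flips(samples, depth=4))
print("memory based:  ", ram_based_flips(samples, depth=4))
```
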

Page 31: Altera  trcak g

Technology for Low Power

Cyclone III LS / Cyclone IV

Stratix IV / Stratix V

Hardcopy™

Page 32: Altera  trcak g

Power / Performance / Area Compromises

[Figure: trade-off between power, utilization, and performance.]

Page 33: Altera  trcak g

Key Technologies to Reduce Power

FPGA Power Reduction (Yellow Highlight: 28nm Techniques)

Lower Static Power

Lower Dynamic Power

Process innovations (65nm -> 40nm -> 28nm…)

Programmable Power Technology

Lower core voltage (1.1V -> 1.0V -> 0.85 V)

Extensive hardening of IP, Embedded HardCopy Blocks

Hard power-down of more functional blocks

More granular clock gating

Selective use of high-speed transistors

Partial reconfiguration

Dynamic on-chip termination

Quartus II software PowerPlay power optimization

Page 34: Altera  trcak g

Programmable Power Technology

Programmable Power Technology enables the core logic of Altera high-end FPGAs to be programmed at the tile level for high-speed or low-power mode

Tiles are defined as:
  MLAB/LAB pairs with routing to the pair
  DSP blocks
  Memory blocks
  I/O interface

Tiles with DSP blocks, memory blocks, and I/O elements that are used in the design are always set to high-speed mode
Unused DSP blocks, memory blocks, and I/O interfaces are set to low-power mode by default to reduce static and dynamic power

Page 35: Altera  trcak g

Programmable Speed vs. Leakage

Note: A simple “model” showing Programmable Power Technology. Actual implementation varies and is patented.

[Figure: transistor cross-section with source, gate, drain, channel, and substrate. A bias of 0 V is labeled high speed (HS); a bias below 0 V is labeled low power (LP). The threshold voltage VT is automatically controlled by software. A second plot shows power versus threshold voltage, from high speed to low power.]

Page 36: Altera  trcak g

Programmable Power Technology

Performance where you need it, lowest power everywhere else, automated by Quartus II software

[Figure: logic array showing high-speed logic on the timing-critical path, low-power logic, and unused low-power logic.]
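The idea of high speed on the critical path and low power elsewhere can be sketched as a per-tile decision based on timing slack. This is a hypothetical rule for illustration, not the Quartus II algorithm (which the slides describe as patented); the tile names, slack values, and the 0.5 ns low-power delay penalty are all invented.

```python
def choose_tile_mode(used, slack_ns, low_power_penalty_ns):
    """Pick a mode for one logic tile (hypothetical rule)."""
    if not used:
        return "low-power"   # unused tiles need no speed
    if slack_ns >= low_power_penalty_ns:
        return "low-power"   # enough slack to absorb the slower mode
    return "high-speed"      # timing-critical tile


# name: (used in design, timing slack in ns); assumed example data.
tiles = {
    "lab_a": (True, 0.10),
    "lab_b": (True, 1.20),
    "lab_c": (False, 0.00),
}
modes = {
    name: choose_tile_mode(used, slack, low_power_penalty_ns=0.50)
    for name, (used, slack) in tiles.items()
}
print(modes)
```
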

Page 37: Altera  trcak g

Power Reduction with DDR3 & Dynamic OCT

Save 1.9 W per 72-bit DIMM at 1067 Mbps

[Figure: Stratix IV FPGA and memory chip. Write: matching line impedance. Read: terminating far end.]

DDR3 consumes 30% lower power than DDR2
  DDR2 requires 1.8-V VCC rails
  DDR3 requires 1.5-V VCC rails
Dynamic OCT reduces termination power by 1 W per 72 bits
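The 30% figure for DDR3 versus DDR2 is roughly what a voltage-squared scaling of power gives for the 1.8 V to 1.5 V change. The slide does not give this derivation, so the V^2 relation below is an assumption that ignores any other differences between the two interfaces.

```python
def relative_power(v_new, v_old):
    """Power ratio when only the supply voltage changes, assuming P ~ V^2."""
    return (v_new / v_old) ** 2


ddr3_vs_ddr2 = relative_power(1.5, 1.8)
print(f"DDR3/DDR2 power ratio under V^2 scaling: {ddr3_vs_ddr2:.3f}")
print(f"Reduction: {100 * (1 - ddr3_vs_ddr2):.1f}%")
```
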

Page 38: Altera  trcak g

HardCopy IV Devices Designed for Low Power

[Figure: bar chart of power (W), scale 0 to 6, comparing Stratix® FPGAs with HardCopy® ASICs, broken down into I/O, DSP, leakage, logic, routing and clocks, and RAM.]

Optimized architecture for power efficiency
  Unused logic and memory blocks not connected to power rail
  Unused clock trees not powered
Total core power reduction estimates: 30% to 70%
  Final results pending characterization

Page 39: Altera  trcak g

Thank you.