intel sandy ntel sandy bridge architecture

Upload: jaisson-k-simon

Post on 14-Apr-2018

249 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    1/54

    1

    IntroducingSandy Bridge

    Sandy Bridge - Intel Next Generation Microarchitecture

    Bob Valentine

    Senior Principal Engineer

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    2/54

    2

    Sandy Bridge: Overview

    Sandy Bridge - Intel Next Generation Microarchitecture

    Integrated Memory Controller2ch DDR3

    High BandwidthLast Level Cache

    Next Generation ProcessorGraphics and Media

    Next Generation Intel Turbo

    Boost Technology

    Intel Hyper-Threading

    Technology4 Cores / 8 Threads2 Cores / 4 Threads

    Integrates CPU, Graphics, MC,PCI Express* On Single Chip

    Embedded Display Port

    Substantial performanceimprovement

    IntelAdvanced VectorExtension (IntelAVX)

    High BW/low-latency modularcore/GFX interconnect

    Discrete Graphics Support :1x16 or 2x8

    2ch DDR3

    x16PCIe

    PECI Interface

    To Embedded

    Controller

    NotebookDP Port

    PCH

    En e r g y Ef f i c i e n c y

    St u n n i n g P e r f o r m a n ce St u n n i n g Pe r f o r m a n ce

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    3/54

    3

    Agenda

    Innovation in the Processor core

    System Agent, Ring Architecture andOther Innovations

    Innovation in Power Management

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    4/54

    4

    Introduction toSandy Bridge

    Processor Core

    Microarchitecture

    Sandy Bridge - Intel Next Generation Microarchitecture

    2ch DDR3

    x16

    PCIe

    PECI Interface

    To Embedded

    Controller

    NotebookDP Port

    Sandy Bridge

    Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    5/54

    5

    Outline

    Sandy Bridge Processor Core Summary

    Core Major MicroarchitectureEnhancements

    Core Architectural Enhancements

    Processor Core Summary

    Sandy Bridge - Intel Next Generation Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    6/54

    6

    Sandy Bridge Processor Core Summary

    Build upon the successful Nehalem microarchitectureprocessor core Converged building block for mobile, desktop, and server

    Add Cool microarchitecture enhancements Features that are better than linear performance/power

    Add Really Cool microarchitecture enhancements

    Features which gain performance while saving power

    Extend the architecture for important new applications Floating Point and Throughput Intel Advanced Vector Extensions (Intel AVX) - Significant boost for selected

    compute intensive applications

    Security AES (Advanced Encryption Standard) throughput enhancements

    Large Integer RSA speedups

    OS/VMM and server related features State save/restore optimizations

    Sandy Bridge - Intel Next Generation Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    7/54

    7

    ALU, SIMUL,

    DIV, FP MUL

    ALU, SIALU,

    FP ADD

    ALU, Branch,

    FP Shuffle

    Load Load

    Store Address Store Address

    Store

    DataSix Execution PortsSix Execution Ports

    Processor Core TutorialMicroarchitecture Block Diagram

    32k L1 Instruction Cache

    Scheduler

    Port 0 Port 1 Port 5 Port 2 Port 3 Port 4

    32k L1 Data Cache

    48 bytes/cycle

    Allocate/Rename/RetireZeroing IdiomsLoadBuffers

    StoreBuffers

    ReorderBuffers

    L2 Data Cache (MLC)L2 Data Cache (MLC)Fill

    Buffers

    Pre decodeInstruction

    QueueDecoders

    1.5k uOP cache

    DecodersDecodersDecoders

    Branch Pred

    In order

    Out-of-order

    Front EndFront End

    (IA instructions(IA instructions Uops)Uops)

    In Order Allocation, Rename, RetirementIn Order Allocation, Rename, Retirement

    Out of OrderOut of Order UopUop SchedulingScheduling

    DataData

    CacheCache

    UnitUnit

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    8/54

    8

    Front End Microarchitecture

    Instruction Decode in Processor Core 32 Kilo-byte 8-way Associative ICache

    4 Decoders, up to 4 instructions / cycle

    Micro-Fusion

    Bundle multiple instruction events into a single Uops

    Macro-Fusion

    Fuse instruction pairs into a complex Uop

    Decode Pipeline supports 16 bytes per cycle

    32k L1 Instruction Cache Pre decodeInstruction

    Queue DecodersDecodersDecodersDecoders

    Branch Prediction Unit

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    9/54

    9

    Decoded Uop Cache ~1.5 Kuops

    New: Decoded Uop Cache

    Add a Decoded Uop Cache An L0 Instruction Cache for Uops instead of

    Instruction Bytes

    ~80% hit rate for most applications

    Higher Instruction Bandwidth and Lower Latency

    Decoded Uop Cache can represent 32-byte / cycle

    More Cycles sustaining 4 instruction/cycle

    Able to stitch across taken branches in the control flow

    Branch Prediction Unit

    32k L1 Instruction Cache Pre decodeInstruction

    Queue DecodersDecodersDecodersDecoders

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    10/54

    10

    New Branch Prediction Unit

    Do a Ground Up Rebuild of Branch Predictor Twice as many targets

    Much more effective storage for history

    Much longer history for data dependent behaviors

    Decoded Uop Cache ~1.5 KuopsBranch Prediction Unit

    32k L1 Instruction Cache Pre decodeInstruction

    Queue DecodersDecodersDecodersDecoders

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    11/54

    11

    Really Cool features in the front end Decoded Uop Cache lets the normal front end sleep

    Decode one time instead of many times

    Branch-Mispredictions reduced substantially

    The correct path is also the most efficient pathReally Cool Features

    Save Power while Increasing PerformancePower is fungible

    give it to other units in this core, or other units on die

    Sandy Bridge - Intel Next Generation Microarchitecture

    Decoded Uop Cache ~1.5 KuopsBranch Prediction Unit

    32k L1 Instruction Cache Pre decodeInstruction

    Queue DecodersDecodersDecodersDecoders

    Zzzz

    Sandy Bridge Front End

    Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    12/54

    12

    Scheduler

    Port 0 Port 1 Port 5 Port 2 Port 3 Port 4

    Allocate/Rename/RetireZeroing IdiomsLoadBuffers

    Store

    Buffers

    Reorder

    Buffers

    In order

    Out-of-

    order

    In Order Allocation, Rename, Retirement

    Out of Order Uop Scheduling

    Out of Order Cluster

    Challenge to the OoO Architects :

    Increase the ILP while keeping the power available for Execution

    Receives Uops from the Front End

    Sends them to Execution Units when they are ready

    Retires them in Program Order

    Goal: Increase Performance by finding more Instruction LevelParallelism

    Increasing Depth and Width of machine implies larger buffers More Data Storage, More Data Movement, More Power

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    13/54

    13

    Sandy Bridge Out-of-Order (OOO) Cluster

    Method: Physical Reg File (PRF) instead of centralized Retirement Register File

    Single copy of every data

    No movement after calculation

    Allow significant increase in buffer sizes

    Dataflow window ~33% larger

    Scheduler

    Allocate/Rename/RetireZeroing IdiomsLoadBuffers

    Store

    Buffers

    Reorder

    Buffers

    In order

    Out-of-

    order

    FP/INT Vector PRF Int PRF

    PRF is a Cool featurebetter than linear

    performance/power

    Key enabler for Intel

    Advanced VectorExtensions (Intel AVX)

    Sandy Bridge - Intel Next Generation Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    14/54

    14

    Execution Cluster

    3 Execution Ports

    Maximum throughput of 8 floating pointoperations* per cycle

    Port 0 : packed SP multiply

    Port 1 : packed SP add

    Scheduler

    Port 0 Port 1 Port 5

    ALUALU ALU

    JMP

    FP Shuf

    FP Bool

    VI ADDVI MUL

    VI Shuffle

    DIV

    VI Shuffle

    FP ADD

    Blend

    BlendFP MUL

    Challenge to the Execution UnitArchitects :

    Double the Floating PointThroughput

    in a cool manner

    *FLOPS = Floating Point Operations / Second

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    15/54

    15

    Doubling the FLOPs in a Cool Manner

    Intel Advanced Vector Extensions (Intel AVX)

    Extend SSE FP instruction set to 256 bitsoperand size

    Intel AVX extends all 16 XMM registers to 256bits

    New, non-destructive source syntax

    VADDPS ymm1, ymm2, ymm3

    New Operations to enhance vectorization

    Broadcasts

    Masked load & store

    YMM0

    XMM0

    128 bits256 bits (AVX)

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    16/54

    16

    Doubling the FLOPs in a cool manner

    Extend SSE FP instruction set to 256 bits operand size

    Intel AVX extends all 16 XMM registers to 256bits

    New, non-destructive source syntax

    VADDPS ymm1, ymm2, ymm3

    New Operations to enhance vectorization

    Broadcasts

    Masked load & store

    Intel Advanced Vector

    Extensions (Intel

    AVX)

    YMM0

    XMM0

    128 bits

    256 bits (AVX)

    Intel AVX is a Cool Architecture

    Vectors are a natural data-type for many applications

    Wider vectors and non-destructive source specify morework with fewer instructions

    Extending the existing state is area and power efficient

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    17/54

    17

    Execution Cluster A Look Inside

    Scheduler seesmatrix:

    3 ports to 3stacks of

    execution unitsGeneral PurposeInteger

    SIMD (Vector)Integer

    SIMD FloatingPoint

    The challenge is todouble the outputof one of these

    stacks in a mannerthat is invisible tothe others

    Port 0

    Port 1

    Port 5

    ALU

    ALU

    ALU

    JMP

    FP Shuf

    FP Bool

    VI ADD

    VI MUL

    VI Shuffle

    DIV

    VI Shuffle

    FP ADD

    Blend

    Blend

    FP MUL

    GPR SIMD INT SIMD FP

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    18/54

    18

    Execution Cluster

    Solution:

    Repurposeexistingdatapaths to

    dual-use

    SIMD integerand legacySIMD FP use

    legacy stackstyle

    Intel AVXutilizes both128-bit

    executionstacks

    Intel Advanced Vector Extensions (Intel AVX)

    Port 0

    Port 1

    Port 5

    ALU

    ALU

    ALU

    JMP

    FP Shuf

    FP Bool

    VI ADD

    VI MUL

    VI Shuffle DIV

    VI Shuffle

    FP ADD

    Blend

    Blend

    FP MUL

    FP ADD

    FP Multiply

    FP Blend

    FP Shuffle

    FP Boolean

    FP Blend

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    19/54

    19

    Port 0

    Port 1

    Port 5

    ALU

    ALU

    ALU

    JMP

    FP Shuf

    FP Bool

    VI ADD

    VI MUL

    VI Shuffle DIV

    VI Shuffle

    FP ADD

    Blend

    Blend

    FP MUL

    FP ADD

    FP Multiply

    FP Blend

    FP Shuffle

    FP Boolean

    FP Blend

    Execution Cluster

    Solution:

    Repurposeexistingdatapaths to

    dual-use

    SIMD integerand legacySIMD FP use

    legacy stackstyle

    Intel AVXutilizes both128-bit

    executionstacks

    Cool Implementation of Intel AVX256-bit Multiply + 256-bit ADD + 256-bit Load per clock

    Double your FLOPs with great energy efficiency

    Intel Advanced Vector Extensions (Intel AVX)

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    20/54

    20

    Memory Cluster

    Memory Unit can service two memory requests per cycle 16 bytes load and 16 bytes store per cycle

    Memory Control

    32kx8-way L1 Data Cache

    32 bytes/cycle

    Load Store AddressStore

    Data

    256KB L2 Data Cache (MLC)

    Challenge to the Memory Cluster Architects

    Maintain the historic bytes/flop ratio of SSE for Intel AVX

    and do so in a cool manner

    Intel Advanced Vector Extensions (Intel AVX)

    Fill

    Buffers

    Store

    Buffers

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    21/54

    21

    Memory Cluster in Sandy Bridge

    Solution : Dual-Use the existing connections Make load/store pipes symmetric

    Memory Unit services three data accesses per cycle

    2 read requests of up to 16 bytes AND 1 store of up to 16 bytes

    Internal sequencer deals with queued requests

    Memory Control

    32kx8-way L1 Data Cache

    48 bytes/cycle

    Store

    Data

    256KB L2 Data Cache (MLC)

    Sandy Bridge - Intel Next Generation Microarchitecture

    Fill

    Buffers

    Store

    Buffers

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    22/54

    22

    Memory Cluster in Sandy Bridge

    Solution : Dual-Use the existing connections Make load/store pipes symmetric

    Memory Unit services three data accesses per cycle

    2 read requests of up to 16 bytes AND 1 store of up to 16 bytes

    Internal sequencer deals with queued requests

    Second Load Port is one of highest performance featuresRequired to keep Intel Advanced Vector Extensions (Intel AVX)

    Instruction Set fed linear power/performance means its Cool

    Sandy Bridge - Intel Next Generation Microarchitecture

    Memory Control

    32kx8-way L1 Data Cache

    48 bytes/cycle

    Store

    Data

    256KB L2 Data Cache (MLC) FillBuffers

    Store

    Buffers

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    23/54

    23

    Putting it together

    Sandy Bridge Microarchitecture32k L1 Instruction Cache

    Scheduler

    Memory Control

    Port 0 Port 1 Port 5 Port 2 Port 3 Port 4

    32k L1 Data Cache

    48 bytes/cycle

    Allocate/Rename/RetireZeroing IdiomsLoadBuffers

    Store

    Buffers

    Reorder

    Buffers

    Store

    Data

    L2 Data Cache (MLC) FillBuffers

    Pre decodeInstruction

    QueueDecodersDecoders

    DecodersDecoders

    In order

    Out-of-

    order

    ALU

    VI ADD

    VI Shuffle

    AVX FP ADD

    ALU

    JMP

    AVX/FP Shuf

    AVX/FP Bool

    AVX FP Blend

    ALU

    VI MUL

    VI Shuffle

    DIV

    AVX FP BlendAVX FP MUL

    Sandy Bridge - Intel Next Generation Microarchitecture AVX= Intel Advanced Vector Extensions (Intel AVX)

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    24/54

    24

    Other Architectural Extensions

    Cryptography Instruction ThroughputEnhancements

    throughput for AES instructions introduced in Westmere

    Large Number Arithmetic ThroughputEnhancements

    ADC (Add with Carry) throughput doubled

    Multiply (64-bit multiplicands with 128-bit product) ~25% speedup on existing RSA binaries!

    State Save/Restore Enhancements

    New state added in Intel Advanced Vector Extensions

    (Intel AVX)

    HW monitors features used by applications

    Only saves/restores state that is used

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    25/54

    25

    Sandy Bridge Processor Core

    Summary Build upon the successful Nehalem processor core Converged building block for mobile, desktop, and server

    Cool and Really Cool features

    Improve performance/power and performance/area

    Extends the architecture for important new applications

    Floating Point and Throughput Applications

    Intel Advanced Vector Extensions (Intel AVX) - Significantboost for selected compute intensive apps

    Security

    AES (Advanced Encryption Standard) Instructions speedup

    Large Integer RSA and SHA speedups

    OS/VMM and server related features

    State save/ restore optimizations

    Sandy Bridge - Intel Next Generation Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    26/54

    26

    System Agent, RingArchitecture and

    Other Innovations2ch DDR3

    x16

    PCIe

    PECI Interface

    To Embedded

    Controller

    Notebook

    DP Port

    Sandy Bridge

    Microarchitecture

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    27/54

    27

    Integration: Optimization Opportunities

    Dynamically redistribute power between Cores & Graphics

    Tight power management control of all components, providingbetter granularity and deeper idle/sleep states

    Three separate power/frequency domains: System Agent(Fixed), Cores+Ring, Graphics (Variable)

    High BW Last Level Cache, shared among Cores and Graphics

    Significant performance boost, saves memory bandwidth and power

    Integrated Memory Controller and PCI Express* ports

    Tightly integrated with Core/Graphics/LLC domain

    Provides low latency & low power remove intermediate busses

    Bandwidth is balanced across the whole machine, fromCore/Graphics all the way to Memory Controller

    Modular uArch for optimal cost/power/performance

    Derivative products done with minimal effort/time

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    28/54

    28

    Scalable Ring On-die Interconnect

    Ring-based interconnect between Cores, Graphics, LastLevel Cache (LLC) and System Agent domain

    Composed of4 rings

    32 Byte Data ring, Requestring,Acknowledge

    ring and Snoop ring Fully pipelined at core frequency/voltage:

    bandwidth, latency and power scale with cores

    Massive ring wire routing runs over the LLC

    with no area impact Access on ring always picks the shortestpath minimize latency

    Distributed arbitration, sophisticated ringprotocol to handle coherency, ordering, and

    core interface

    Scalable to servers with large number ofprocessors

    Block Diagram Illustrative only. Number of processor cores will vary with different processor models based on the Sandy BridgeMicroarchitecture. Represents client processor implementation.

    H i g h B an d w i d t h , Lo w La t e n c y , M o d u l a r

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    29/54

    29

    Cache Box

    Interface block

    Between Core/Graphics/Media and the Ring

    Between Cache controller and the Ring

    Implements the ring logic, arbitration, cache controller

    Communicates with System Agent for LLC misses, externalsnoops, non-cacheable accesses

    Full cache pipeline in each cache box Physical Addresses are hashed at the source

    to prevent hot spots and increase bandwidth Maintains coherency and ordering for the

    addresses that are mapped to it

    LLC is fully inclusive with Core Valid Bits eliminates unnecessary snoops to cores

    Runs at core voltage/frequency, scaleswith Cores

    Block Diagram Illustrative only. Number of processor cores will vary with different processor models based on the Sandy BridgeMicroarchitecture. Represents client processor implementation.

    D i s t r i b u t e d c o h e r e n c y & o r d e r i n g ; Sca l a b l e

    B an d w i d t h , La t e n c y & P ow e r

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    30/54

    30

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    31/54

    31

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    32/54

    32

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    33/54

    33

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    34/54

    34

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    35/54

    35

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    36/54

    36

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    37/54

    37

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    38/54

    38

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    39/54

    39

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    40/54

    40

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    41/54

    41

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    42/54

    42

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    43/54

    43

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    44/54

    44

    S d id C Sh i

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    45/54

    45

    Sandy Bridge LLC Sharing

    LLC shared among all Cores, Graphics and Media

    Graphics driver controls which streams are cached/coherent

    Any agent can access all data in the LLC, independent of whoallocated the line, after memory range checks

    Controlled LLC way allocation mechanism to preventthrashing between Core/graphics

    Multiple coherency domains

    IA Domain (Fully coherent via cross-snoops) Graphic domain (Graphics virtual caches,

    flushed to IA domain by graphics engine)

    Non-Coherent domain (Display data, flushed tomemory by graphics engine)

    Block Diagram Illustrative only. Number of processor cores will vary with different processor models based on the Sandy BridgeMicroarchitecture. Represents client processor implementation. Sandy Bridge - Intel Next Generation Microarchitecture

    Mu ch h i g h e r Gr a p h i c s p e r f o r m a n c e ,

    D RAM p o w e r s a v i n g s , m o r e D RAM BW

    a v a i l a b l e f o r Co r e s

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    46/54

    46

    Lean and Mean System Agent

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    47/54

    47

    Lean and Mean System Agent

    Contains PCI Express*, DMI, MemoryController, Display Engine

    Contains Power Control Unit

    Programmable uController, handles all power

    management and reset functions in the chip Smart integration with the ring

    Provides cores/Graphics /Media with high BW,low latency to DRAM/IO for best performance

    Handles IO-to-cache coherency Separate voltage and frequency from

    ring/cores, Display integration for betterbattery life

    Extensive power and thermalmanagement for PCI Express* and DDR

    Block Diagram Illustrative only. Number of processor cores will vary with different processor models based on the Sandy BridgeMicroarchitecture. Represents client processor implementation.

    Sm a r t I / O I n t e g r a t i o n

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    48/54

    48

    Power and Thermal Management

    Usage Scena io Responsi e Beha io

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    49/54

    49Intel Top SecretIntel Restricted Secret RSNDA

    Usage Scenario: Responsive Behavior

    Interactive work benefits from Next Generation Intel

    Turbo Boost Idle periods intermixed with user actions

    User interactive actions

    Image open and process

    Example Photo editing

    Open image

    View

    Process Rotate

    Balance colors

    Contrast

    Corp

    Zoom

    Etc.

    Innovative Concept:

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    50/54

    50

    pThermal Capacitance

    Cl a s s i c M od e l

    Steady-State Thermal Resistance

    Design guide for steady state

    Temperature

    Time

    Cl a s s ic m o d e l

    r e s p o n s e

    Cl a s s ic m o d e l

    r e s p o n s e

    Tem p e r a t u r e r i se s a s e n e r g y i s d e l i v e r e d t o t h e r m a l so l u t i o n

    T h e r m a l s o lu t i o n r e s p o n s e i s c a l cu l a t e d a t r e a l - t i m e

    Tempe

    rature

    Time

    Mo r e r e a l i s t i c

    r e sp o n s e t o p o w e r

    c h a n g e s

    M o r e r e a l i s t i c

    r e sp o n s e t o p o w e r

    c h a n g e s

    N ew M o d e l

    Steady-State Thermal ResistanceAND

    Dynamic Thermal Capacitance

    Next Generation Intel Turbo

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    51/54

    51

    Boost Benefit

    Time

    Power

    Sleep or

    Low power

    Next GenTurbo Boost

    TDP

    C0

    (Turbo)

    After idle periods, the system

    accumulates energy budget

    and can accommodate high

    power/performance for a few

    seconds

    In Steady State conditions the

    power stabilizes on TDP

    P>TDP:

    Responsiv

    eness

    Su

    stain

    power

    Buildup thermal budget

    during idle periods

    U se

    a cc u m u l a t e d

    e n e r g y b u d g e t

    t o e n h a n c e u s e r

    e x p e r i e n c e

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    52/54

    52

    Core and Graphic Power Budgeting

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    53/54

    53

    p g g Cores and Graphics integrated on the same die with

    separate voltage/frequency controls; tight HW control

    Full package power specifications available for sharing

    Power budget can shift between Cores and Graphics

    CorePower [W]

    Graphics Power[W]

    Total package power

    Realistic concurrentmax power

    Sum of max power

    Heavy Graphicsworkload

    Heavy CPUworkload

    SpecificationCore Power

    SpecificationGraphics Power

    Sa n d y B r i d g e

    N e x t Ge n Tu r b o

    f o r sh o r t p e r i o d s

  • 7/27/2019 Intel sandy ntel sandy bridge architecture

    54/54

    54

    Summary32nm Next Generation

    Microarchitecture

    Processor Graphics

    System Agent, Ring Architectureand Other Innovations

    Performance and PowerEfficiency