Dynamic Power Redistribution in Failure-Prone CMPs
Paula Petrica, Jonathan A. Winter* and David H. Albonesi
Cornell University
*Google, Inc.
Paula Petrica WEED2010 2
Motivation

Hardware failures are expected to become prominent in future generations.

[Diagram: core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Motivation

- Deconfiguration tolerates defects at the expense of performance
- Pipeline imbalance: units correlated with a deconfigured one might become overprovisioned
- Power inefficiencies are application-specific

[Diagram: core with Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Research Goal

Given a CMP with a set of failures and a power budget:
- Eliminate power inefficiencies
- Improve performance
Outline

- Motivation
- Architecture
- Power Harnessing
- Performance Boosting
- Power Transfer Runtime Manager
- Conclusions and future work
Architecture

Two-step approach:
- Transfer power
- Harness power

[Diagram: Core 1 and Core 2, each with a Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Power Harnessing

[Pipeline diagram: I-Cache, BPred, FQ, Decode/Rename, Dispatch, ROB, IQ, Select, RF, D-Cache, grouped into FE, BE, and LSQ]
Pipeline Imbalance

[Chart: performance loss vs. power saved]
Performance Boosting

- Distribute the accumulated power margin to boost performance
- Temporarily enable a previously dormant feature
- Requirements: small area, fast power-up, and a small PPR (Power-Performance Ratio)
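The PPR criterion suggests a simple greedy allocator: spend the harvested margin on dormant features in ascending power-performance-ratio order. A minimal sketch — the feature names, power costs, and speedups below are illustrative placeholders, not measurements from the talk:

```python
# Toy allocator: spend a harvested power margin on dormant boosting
# features in ascending PPR (power per unit of performance) order.
# Feature names and numbers are illustrative, not measured values.

def allocate_boosts(margin_watts, features):
    """Greedily enable features with the smallest power-performance ratio."""
    enabled = []
    # PPR = power cost / expected speedup; smaller is better.
    for name, power, speedup in sorted(features, key=lambda f: f[1] / f[2]):
        if power <= margin_watts:
            margin_watts -= power
            enabled.append(name)
    return enabled

features = [
    ("speculative-L2", 0.5, 0.04),   # watts, fractional speedup
    ("CLEAR",          1.0, 0.06),
    ("DVFS-step",      3.0, 0.05),   # cubic cost makes DVFS's PPR worst
]
print(allocate_boosts(2.0, features))  # ['speculative-L2', 'CLEAR']
```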
Performance Boosting Techniques

Speculative Cache Access
- Speculatively send L1 requests to the L2 cache
- Speculatively access both tag and data in the L2 cache at the same time (rather than serially)
- Can be turned on independently or in combination
- Approximately linear power-performance relationship
- Benefits applications limited by L1 cache capacity

[Diagram: load path through the L1 cache, the L2 cache (tag and data arrays), and the lower hierarchy level, comparing serial and speculative access]
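The tag/data trade-off can be made concrete with a toy latency/power model. The cycle counts below are hypothetical assumptions, not measured figures; serial access skips the data-array read on misses, while parallel access shortens hits at the cost of always reading the data array:

```python
# Serial vs. parallel L2 tag/data access (cycle counts are illustrative).
TAG, DATA = 2, 4   # hypothetical access latencies in cycles

def serial_access(hit):
    # Check the tag first; read data only on a hit (lower power, longer hits).
    cycles = TAG + (DATA if hit else 0)
    data_reads = 1 if hit else 0
    return cycles, data_reads

def parallel_access(hit):
    # Tag and data overlap (faster hits), but data is read even on a miss.
    cycles = max(TAG, DATA) if hit else TAG
    data_reads = 1
    return cycles, data_reads

print(serial_access(True), parallel_access(True))    # (6, 1) (4, 1)
print(serial_access(False), parallel_access(False))  # (2, 0) (2, 1)
```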
Performance Boosting Techniques

Boosting main memory performance: CLEAR [N. Kirman et al., HPCA 2005]
- Predict and speculatively retire long-latency loads
- Supply predicted values to destination registers
- Free processor resources for non-dependent instructions
- Linear power-performance relationship
- Benefits memory-bound applications
Performance Boosting Techniques

DVFS
- Scale up voltage and frequency
- Already built into the processor
- Cubic power cost for a linear performance benefit
- Benefits high-IPC applications
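The cubic cost follows from the dynamic power relation P ≈ C·V²·f, with voltage scaled roughly in proportion to frequency (a standard simplifying assumption). A quick check of the arithmetic:

```python
# Why DVFS boosting is power-hungry: dynamic power P = C * V^2 * f,
# and V must scale roughly with f, so P grows ~cubically with frequency.
def relative_power(freq_scale, volt_scale=None):
    v = volt_scale if volt_scale is not None else freq_scale  # V tracks f
    return v * v * freq_scale

# A 10% frequency boost (~10% speedup for high-IPC code) costs ~33% power:
print(round(relative_power(1.10), 3))  # 1.331
```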
Comparison of Boosting Techniques

[Chart: performance improvement of each boosting technique]
Architecture

Two-step approach:
- Transfer power
- Harness power

[Diagram: Core 1 and Core 2, each with a Front End (FE), Back End (BE), and Load-Store Queue (LSQ)]
Power Transfer Runtime Manager

Periodically coordinate a chip-wide effort to relocate power among cores:
- Obtain the current local hardware deconfiguration status (due to faults)
- Determine additional components to be deconfigured
- Transfer power to one or more mechanisms that make the best use of it
Power Transfer Runtime Manager

Sampling phase (local decisions):
- Sample deconfigurations
- Choose additional deconfiguration
- Sample performance boosting

Steady phase (global decisions):
- Compute global throughput with fairness
- Choose the best 4-core configuration
- Apply DVFS (greedy)
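The steady phase's global decision can be sketched as a search over per-core boost assignments under a chip-wide power budget. This is a toy model: the Core class, the boost tables, and the geometric-mean fairness metric are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of the steady-phase global decision. Core models, power numbers,
# and the fairness metric (geometric-mean speedup) are illustrative.
from itertools import product

class Core:
    def __init__(self, name, boosts):
        self.name = name
        self.boosts = boosts  # {boost_name: (power_cost, fractional_speedup)}

def fair_throughput(speedups):
    # Geometric mean: rewards total throughput, penalizes starving one core.
    prod = 1.0
    for s in speedups:
        prod *= s
    return prod ** (1.0 / len(speedups))

def steady_phase(cores, budget):
    """Pick the per-core boost combination that maximizes fair throughput
    while staying within the chip-wide power budget."""
    best, best_score = None, 0.0
    for combo in product(*(c.boosts.items() for c in cores)):
        power = sum(cost for _, (cost, _) in combo)
        if power > budget:
            continue  # infeasible: exceeds the chip-wide budget
        score = fair_throughput([1.0 + sp for _, (_, sp) in combo])
        if score > best_score:
            best, best_score = combo, score
    return {c.name: b for c, (b, _) in zip(cores, best)}

cores = [
    Core("core0", {"none": (0.0, 0.0), "spec-L2": (0.5, 0.04)}),
    Core("core1", {"none": (0.0, 0.0), "CLEAR": (1.0, 0.06)}),
]
print(steady_phase(cores, budget=1.2))  # {'core0': 'none', 'core1': 'CLEAR'}
```

With a tighter budget only the highest-payoff boost fits; with a larger one, both are enabled.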
Global vs. Local Optimization

100 4-core configurations, random errors, and random SPEC CPU2000 benchmarks

[Chart: speedups of 22.2% and 10.0%]
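Why a global search can beat independent local choices: with a shared power margin, a core asked first may spend the budget on a low-payoff boost. A toy two-core example with made-up power costs and speedups:

```python
# Global vs. local allocation of a shared power margin (numbers illustrative).
budget = 1.0
options = {  # per-core boost -> (power cost, speedup multiplier)
    "core0": {"none": (0.0, 1.00), "boost": (0.8, 1.05)},
    "core1": {"none": (0.0, 1.00), "boost": (0.8, 1.20)},
}

# Local: each core greedily takes its best affordable boost in turn.
local_speedup, remaining = 1.0, budget
for core, opts in options.items():          # core0 is asked first
    affordable = [(p, s) for p, s in opts.values() if p <= remaining]
    power, speedup = max(affordable, key=lambda o: o[1])
    remaining -= power
    local_speedup *= speedup

# Global: search all combinations and keep the best feasible one.
best = 0.0
for p0, s0 in options["core0"].values():
    for p1, s1 in options["core1"].values():
        if p0 + p1 <= budget:
            best = max(best, s0 * s1)

print(local_speedup, best)  # 1.05 1.2 -- core0's greedy grab wastes the margin
```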
Diversity of Boosting Techniques

100 4-core configurations, random errors, and random SPEC CPU2000 benchmarks

[Chart: speedups of 22.2% and 6.3%]
Power Transfer Runtime Manager

100 4-core configurations, random errors, and random SPEC CPU2000 benchmarks

[Chart: speedups of 22.2%, 15.3%, 10.0%, and 6.3%]
Conclusions

- We proposed a technique to increase performance under a given power budget in the presence of hard faults
- Exploited the deconfiguration capabilities already built into microprocessors
- Demonstrated that pipeline imbalances and additional deconfiguration are application-dependent
- Proposed several boosting techniques
- Demonstrated the potential for substantial performance gains on a 4-core CMP
Future Work

- Heuristic approaches to scale this problem to many cores: Simulated Annealing, Genetic Algorithms
- Pareto-optimal fronts to reduce the number of combinations
- Hierarchical optimization
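As a generic illustration of the simulated-annealing direction (not the authors' implementation), a minimal annealer over per-core boost assignments might look like the sketch below; the option table, power budget, and cooling schedule are arbitrary assumptions:

```python
# Minimal simulated annealing over per-core boost assignments.
# Objective: product of speedups, subject to a chip-wide power budget.
import math
import random

def anneal(num_cores, options, budget, steps=5000):
    random.seed(0)
    def evaluate(state):
        power = sum(options[i][0] for i in state)
        if power > budget:
            return 0.0                       # infeasible assignment
        speedup = 1.0
        for i in state:
            speedup *= options[i][1]
        return speedup
    state = [0] * num_cores                  # option index per core
    cur = best = evaluate(state)
    best_state, temp = state[:], 1.0
    for _ in range(steps):
        cand = state[:]                      # mutate one core's choice
        cand[random.randrange(num_cores)] = random.randrange(len(options))
        c = evaluate(cand)
        # Accept improvements always; worse moves with probability e^(dE/T).
        if c >= cur or random.random() < math.exp((c - cur) / temp):
            state, cur = cand, c
            if cur > best:
                best, best_state = cur, state[:]
        temp *= 0.999                        # geometric cooling
    return best_state, best

# options: (power cost, speedup multiplier) -- illustrative numbers
options = [(0.0, 1.00), (0.5, 1.04), (3.0, 1.10)]
state, speedup = anneal(num_cores=16, options=options, budget=4.0)
print(speedup)
```

Unlike the exhaustive 4-core search, the annealer scales to many cores because each step only re-evaluates one candidate assignment.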
Questions?