
Page 1

K. Kant, Modeling Challenges in Distributed Energy Adaptive Computing

Challenges in Distributed Energy Adaptive Computing

K. Kant

NSF and GMU

Page 2

Information & Communication Technology (ICT) has a problem

Performance centric → Energy & sustainability centric

How do we get there?

Page 3

ICT Power Growth until 2020

• Increase in spite of power-efficient designs
  – Clients: 8x in number, 3x in power
  – Data centers: > 2x increase
  – Network: 3x increase

[Figure: ICT power by segment: clients, network, data center, and transmission, conversion & distribution]

Page 4

Current State

Unsustainable Computing

Page 5

Data Center Infrastructure

• Resource intensive: water, cabling, metal, …
• ~50% of power wasted before getting to racks

Page 6

Distribution Infrastructure

[Figure: data center power distribution from the utility feed to the IT load, with voltage levels 115 kV, 13.2 kV, 480 V and 208 V, backed by a 2.5 MW generator (~180 gallons/hour)]

Stage losses shown in the figure:
• 0.3% loss (99.7% efficient)
• 0.5% loss (99.5% efficient)
• 1.0% loss (99.0% efficient)
• UPS: 6% loss (94% efficient)
• ~1% loss in switchgear and conductors

~10% distribution loss + high carbon impact
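Losses along the distribution chain compound multiplicatively. A minimal sketch, assuming all five listed stages sit in series between the utility feed and the IT load (the exact stage order is not recoverable from the figure, but it does not affect the product):

```python
from math import prod

# Per-stage efficiencies read from the distribution figure.
# Assumption: every stage is in series between the utility feed and the IT load.
STAGE_EFFICIENCY = {
    "0.3% loss stage": 0.997,
    "0.5% loss stage": 0.995,
    "1.0% loss stage": 0.990,
    "6% loss stage (UPS)": 0.94,
    "switchgear and conductors (~1% loss)": 0.99,
}

def chain_efficiency(efficiencies):
    """End-to-end efficiency of conversion stages connected in series."""
    return prod(efficiencies)

overall = chain_efficiency(STAGE_EFFICIENCY.values())
loss = 1.0 - overall
print(f"end-to-end efficiency {overall:.3f}, distribution loss {loss:.1%}")
```

This prints an end-to-end efficiency of 0.914, i.e. a loss of about 8.6%, close to the slide's "~10%" estimate.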

Page 7

~50% Rack Power Wasted

Component            Total (W)   Used (W)   Comments
CPU                       80         60     Operating at 100% utilization
Fans                      50         25     Temperature-directed fans at 100% utilization
Memory (32 GB)            88         24     2 GB DIMMs, 4 W idle, 19 W active
Hard drives               40         10     6 SATA drives, 25% busy
I/O adapters              20          4     25% disk, 15% network
Motherboard               22         12     N/S bridges & devices, VRs, …
Total DC power           300        135
Power supply loss         50          7     14% / 5% loss of AC input power
AC input power           350        142     > 50% of power is wasted
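The bottom rows of the rack power table follow from the component rows. A short arithmetic check, interpreting "Used" as the share of each component's power that does useful work at the stated load (the slide does not define that column precisely):

```python
# (total, used) power in watts per component, copied from the table.
COMPONENTS_W = {
    "CPU": (80, 60),
    "Fans": (50, 25),
    "Memory (32 GB)": (88, 24),
    "Hard drives": (40, 10),
    "I/O adapters": (20, 4),
    "Motherboard": (22, 12),
}
PSU_LOSS_W = (50, 7)  # (total, used)

dc_total = sum(total for total, _ in COMPONENTS_W.values())
dc_used = sum(used for _, used in COMPONENTS_W.values())
ac_total = dc_total + PSU_LOSS_W[0]
ac_used = dc_used + PSU_LOSS_W[1]
wasted_fraction = 1 - ac_used / ac_total

print(f"DC: {dc_total} W drawn, {dc_used} W used")
print(f"AC: {ac_total} W drawn, {ac_used} W used -> {wasted_fraction:.0%} wasted")
```

The totals reproduce the table (300/135 W DC, 350/142 W AC), and the wasted share comes out at 59%, consistent with "> 50% of power is wasted".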

Page 8

Sustainable Computing

Page 9

Renewable Energy Push

• Limit energy draw from grid
  – Less infrastructure
  – Fewer losses
  – But variable supply

Need better power adaptability

Page 10

High Temperature DCs

• Chiller-less operation
  – Less energy/materials, but space inefficient
• High temperature operation
  – Smaller T_outlet - T_inlet
  – More throttling
  – More failure prone (?)

Need smarter thermal adaptability
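One consequence of a smaller outlet-inlet temperature difference follows from the standard sensible-heat relation for air cooling, Q = ρ · V̇ · c_p · (T_outlet - T_inlet): for a fixed heat load, the required airflow grows as the temperature difference shrinks. The air properties below are approximate textbook values and the 10 kW rack is hypothetical:

```python
AIR_DENSITY_KG_M3 = 1.2      # approximate, near room temperature
AIR_CP_J_PER_KG_K = 1005.0   # approximate specific heat of air

def required_airflow_m3s(heat_w, delta_t_k):
    """Airflow (m^3/s) needed to remove heat_w watts when the air warms by delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("outlet air must be warmer than inlet air")
    return heat_w / (AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * delta_t_k)

# Hypothetical 10 kW rack: halving the temperature rise doubles the airflow.
for delta_t in (20, 10, 5):
    print(f"dT = {delta_t:2d} K -> {required_airflow_m3s(10_000, delta_t):.2f} m^3/s")
```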

Page 11

Overdesign

• Overdesign is the norm today
  – Huge power supplies, fans, heat sinks, server cases, high rack capacity, UPS capacity, …
  – Engineered for the worst case, which is rarely encountered
  – Huge power wastage; waste of materials, energy, …

Better energy adaptability to deal w/ frugal design

[Figure: Efficiency vs. Load: PSU efficiency (50 to 90%) vs. output load (0 to 100%), with regions marked low efficiency and high efficiency]

• What if we right-size everything?
• Highly energy efficient, but needs smarter control

Page 12

Energy Adaptive Computing

• EAC strives for dynamic end-to-end adjustment via
  – Workload adaptation for graceful QoS degradation under energy limitations
  – Infrastructure adaptation to cope with temporary energy deficiencies
• Requires coordinated power/thermal mgmt of computation, network & storage
• Enhances sustainability of IT infrastructure

Page 13

EAC Instances

Page 14

Client-server EAC

• Transparently adapt to client energy states
  – State = {on-AC, normal, low-battery, …}
  – Service contract C_i = {setup QoS, operational QoS}
• Adaptation challenges
  – Communicating & enforcing contracts
  – Group adaptation of clients forced by network/servers?
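A per-client service contract keyed by energy state might be represented as below. This is a hypothetical sketch: the state names come from the slide, but the QoS labels and the fallback rule are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class EnergyState(Enum):
    """Client energy states named on the slide (the slide's list is open-ended)."""
    ON_AC = "on-AC"
    NORMAL = "normal"
    LOW_BATTERY = "low-battery"

@dataclass(frozen=True)
class QoSLevel:
    setup: str        # QoS for establishing a session (hypothetical labels)
    operational: str  # QoS while the session is running

# Hypothetical contract C_i for one client: the QoS the server owes per state.
CONTRACT_I = {
    EnergyState.ON_AC: QoSLevel(setup="full", operational="full"),
    EnergyState.NORMAL: QoSLevel(setup="full", operational="reduced"),
    EnergyState.LOW_BATTERY: QoSLevel(setup="reduced", operational="minimal"),
}

def adapt(contract, reported_state):
    """Server side: QoS to enforce for the client's reported state.
    States missing from the contract fall back to its LOW_BATTERY entry,
    which is assumed to be present."""
    return contract.get(reported_state, contract[EnergyState.LOW_BATTERY])

print(adapt(CONTRACT_I, EnergyState.LOW_BATTERY))
```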

Page 15

Cluster EAC

• Adaptation to intra- & inter-DC limits
  – Multi-level: server, rack & DC levels
• Adaptation challenges
  – Estimate & collect power deficits/surplus at multiple levels
  – Coordination across a large range of devices
    • Location-based services
    • Coordination across levels
  – Simultaneously handle client-server loop
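Collecting deficits and surpluses at several levels can be pictured as a walk over the server/rack/DC tree. A minimal sketch, assuming each node has a hypothetical power budget and that an inner node's demand is the sum of the demands below it. In the example, server s1 is in deficit while its rack and the DC both have surplus, which is the kind of situation multi-level coordination has to resolve:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    budget_w: float            # hypothetical power available at this node
    demand_w: float = 0.0      # used only for leaves (servers)
    children: list = field(default_factory=list)

    def total_demand(self):
        """Leaf demand, or the sum of all demands below an inner node."""
        if not self.children:
            return self.demand_w
        return sum(child.total_demand() for child in self.children)

    def surplus(self):
        """Positive: spare power at this level; negative: deficit."""
        return self.budget_w - self.total_demand()

def collect(node, depth=0):
    """Top-down list of (depth, name, surplus) for the whole subtree."""
    rows = [(depth, node.name, node.surplus())]
    for child in node.children:
        rows.extend(collect(child, depth + 1))
    return rows

dc = Node("DC", 1000, children=[
    Node("rack1", 600, children=[Node("s1", 300, 350), Node("s2", 300, 200)]),
    Node("rack2", 500, children=[Node("s3", 250, 150), Node("s4", 250, 200)]),
])
for depth, name, s in collect(dc):
    print("  " * depth + f"{name}: {s:+} W")
```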

Page 16

P2P EAC

• Adaptation based on “available energy”
• Content: video resolution, audio coding, …
• Network: modulate wireless radio usage (?)
• Energy-proportional use of peer resources
• Energy-driven content replication & reorganization
• Adaptation challenges
  – Satisfying QoS?
  – Balancing src/dest usage vs. relay node energy usage?
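Content adaptation based on available energy could look like the following sketch, which picks the richest video resolution whose projected energy cost fits a peer's remaining energy budget. The per-resolution costs are invented for illustration, not measurements:

```python
# Hypothetical energy cost (joules per minute of playback) for each resolution.
COST_J_PER_MIN = {"1080p": 120.0, "720p": 80.0, "480p": 50.0, "240p": 30.0}

def pick_resolution(available_j, minutes_remaining):
    """Richest resolution whose projected cost fits the available energy.
    Falls back to the cheapest resolution when nothing fits."""
    by_cost_desc = sorted(COST_J_PER_MIN.items(), key=lambda kv: kv[1], reverse=True)
    for resolution, cost in by_cost_desc:
        if cost * minutes_remaining <= available_j:
            return resolution
    return by_cost_desc[-1][0]

print(pick_resolution(available_j=5_000, minutes_remaining=60))  # 720p
```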

Page 17

Challenges

Some Specific Issues

Page 18

Power Estimation Challenges

• Notion of effective power?
  – Additive relationship: workload power
  – Why is this hard? Interference
• Available power
  – Determined by power, thermal & perhaps other issues (noise)
  – Required at multiple levels: facility, enclosure, machine, …
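If available power is set by several constraints at several levels, the usable budget for a machine is the tightest limit anywhere on its path. A sketch under that assumption, with hypothetical per-machine limits already expressed in watts at each level:

```python
# Hypothetical per-machine limits (W) at each level on the path to one machine.
PATH_LIMITS_W = [
    ("facility", {"electrical": 400, "thermal": 380}),
    ("enclosure", {"electrical": 350, "thermal": 280, "noise": 340}),
    ("machine", {"electrical": 300, "thermal": 320}),
]

def machine_available(path):
    """Tightest limit anywhere on the path.
    Returns (watts, level, constraint) identifying the binding limit."""
    best = None
    for level, limits in path:
        constraint = min(limits, key=limits.get)  # tightest constraint at this level
        candidate = (limits[constraint], level, constraint)
        if best is None or candidate[0] < best[0]:
            best = candidate
    return best

print(machine_available(PATH_LIMITS_W))  # (280, 'enclosure', 'thermal')
```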

Page 19

Network Role in EAC

• Energy adaptation
  – Aggressive control of switch/router ports
    • Speed, state & width controls
  – Traffic consolidation across paths
• Adaptation-induced congestion
  – Propagation (e.g., ECN, EBCN) & response
• Computation/communication tradeoff?
• Redirection?
• Network protocol support for adaptation?
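Speed and state control of a switch port can be sketched as choosing the slowest link rate that still carries the offered traffic with some headroom, and putting idle ports to sleep. The rate set and the 70% utilization target are assumptions for illustration; real ports support only the rates their standard and hardware allow:

```python
# Hypothetical link rates (Gb/s) that a port is assumed to be able to switch between.
RATES_GBPS = (1, 10, 40, 100)

def port_setting(offered_gbps, max_util=0.7):
    """'sleep' for an idle port; otherwise the slowest rate whose utilization
    stays at or below max_util, or the fastest rate if none does."""
    if offered_gbps <= 0:
        return "sleep"
    for rate in RATES_GBPS:
        if offered_gbps <= max_util * rate:
            return rate
    return RATES_GBPS[-1]

print([port_setting(x) for x in (0, 0.5, 5, 20, 90)])  # ['sleep', 1, 10, 40, 100]
```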

Page 20

Other Issues

• EAC security
  – Attacks on power sources
  – Energy attacks on IT, e.g.,
    • Demanding too much, cyclic demands, …
• Storage adaptation
  – Storage devices, controllers & network
• Coordinated end-to-end control is hard!
• Formal models to understand impact of energy adaptation

Page 21

Energy Adaptation in Data Centers


Page 22

Adaptation Methods

• Workload adaptation
  – Coarse grain: shut down low-priority tasks
  – Fine grain: graceful QoS degradation, e.g.,
    • Batched service, poorer resolution, …
• Infrastructure adaptation
  – Operation at lower speeds (DVFS)
  – Effective use of low-power modes & “width” control
• Workload adaptation always done first
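The rule that workload adaptation is always done first can be expressed as an ordering over candidate actions. A sketch, assuming each action has a known (here hypothetical) power saving and that actions are taken greedily until the deficit is covered:

```python
from typing import NamedTuple

class Action(NamedTuple):
    kind: str          # "workload" or "infrastructure"
    name: str
    savings_w: float   # hypothetical power saved by taking the action

def plan(deficit_w, actions):
    """Choose actions until the deficit is covered, exhausting workload
    adaptations before any infrastructure adaptation."""
    # sorted() is stable, so the listed order is kept within each kind.
    ordered = sorted(actions, key=lambda a: a.kind != "workload")
    chosen, saved = [], 0.0
    for action in ordered:
        if saved >= deficit_w:
            break
        chosen.append(action.name)
        saved += action.savings_w
    return chosen, saved

ACTIONS = [
    Action("infrastructure", "DVFS (lower speed)", 40),
    Action("workload", "shut down low-priority tasks", 50),
    Action("workload", "graceful QoS degradation", 30),
    Action("infrastructure", "low-power modes & width control", 25),
]
print(plan(70, ACTIONS))
```

A 70 W deficit is covered by the two workload actions alone; DVFS is only reached for larger deficits.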

Page 23

Infrastructure Adaptation

• Need a multilevel scheme
  – Individual “assets” up to entire data center
• Need both supply- & demand-side adaptations

Page 24

Supply Side Adaptation

• Supply-side limits
  – Hard caps at higher levels (true limit) vs. “soft” (artificial) caps at lower levels
  – Limits may be a result of thermal/cooling issues
• Load consolidation
  – An essential part of energy-efficient operation
  – Load consolidation vs. soft capping
• Need to address workload adaptation changes resulting from supply increases & decreases
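One simple way to derive lower-level soft caps from a higher-level hard cap is to split the hard cap in proportion to current demand. This is a hypothetical policy for illustration, not necessarily the one used in the work described here:

```python
def soft_caps(hard_cap_w, demands_w):
    """Split a parent's hard cap into per-child soft caps proportional to demand.
    The soft caps always add up to the hard cap (equal split if demand is zero)."""
    if not demands_w:
        return {}
    total = sum(demands_w.values())
    if total == 0:
        equal = hard_cap_w / len(demands_w)
        return {child: equal for child in demands_w}
    return {child: hard_cap_w * d / total for child, d in demands_w.items()}

print(soft_caps(900, {"rack1": 600, "rack2": 400}))  # {'rack1': 540.0, 'rack2': 360.0}
```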

Page 25

Demand Side Adaptation

• Adaptation to fluctuating demand
  – Transactional workload: migrate queries or app VMs?
• Issues w/ combined supply- & demand-side adaptations
  – Imbalance: one node squeezed while another has surplus power
  – Ping-pong control: oscillatory migration of workload
  – Error accumulation down the hierarchy

Page 26

A Proposed Algorithm

• Unidirectional control
  – Load migration moves up the hierarchy, from local to global
  – Local migrations are temporary & do not trigger changes to “soft” caps on supply
• Target node selection
  – Based on bin packing (best-fit decreasing)
  – Allows for more imbalance, which can be exploited for workload consolidation
• Properties
  – Avoids ping-pong, attempts to minimize imbalance
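The slide names best-fit decreasing for target node selection but gives no further detail. A minimal sketch, assuming each migrating load is characterized only by its power demand and each candidate node by its power surplus (the hierarchical, unidirectional part of the algorithm is not modeled): loads are placed largest first, each on the feasible node that would be left with the least spare power.

```python
def best_fit_decreasing(loads_w, surplus_w):
    """Assign migrating loads {name: demand W} to nodes {name: surplus W}.
    Returns (assignment, unplaced)."""
    remaining = dict(surplus_w)
    assignment, unplaced = {}, []
    # Decreasing: consider the largest loads first.
    for load, demand in sorted(loads_w.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [n for n, s in remaining.items() if s >= demand]
        if not candidates:
            unplaced.append(load)
            continue
        # Best fit: the node with the smallest leftover surplus after placement.
        target = min(candidates, key=lambda n: remaining[n] - demand)
        assignment[load] = target
        remaining[target] -= demand
    return assignment, unplaced

loads = {"a": 50, "b": 30, "c": 20, "d": 70}
nodes = {"n1": 60, "n2": 100, "n3": 35}
print(best_fit_decreasing(loads, nodes))
```

In this example n2 ends up fully used while n1 and n3 keep some spare power, the kind of imbalance the slide says can be exploited for consolidation.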

Page 27

Experimental Results

• Scenario
  – 3 levels, 18 identical servers (4+4 + 5+5)
  – 3 applications, total of 25 app instances
  – Any app can run on any server
  – Poisson demand (active power ∝ utilization)
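The scenario's demand model (Poisson arrivals, active power proportional to utilization) can be sketched as below. The linear idle-plus-active power model and all parameter values are assumptions for illustration; the slide states only the proportionality:

```python
import random

def server_power_w(utilization, p_idle_w, p_peak_w):
    """Idle power plus an active part proportional to utilization."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return p_idle_w + (p_peak_w - p_idle_w) * utilization

def poisson_arrival_times(rate_per_s, horizon_s, rng):
    """Arrival times of a Poisson process: exponential gaps with mean 1/rate."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            return times
        times.append(t)

# Hypothetical parameters: 150 W idle, 300 W peak, 20 ms of work per request.
rng = random.Random(1)
arrivals = poisson_arrival_times(rate_per_s=25.0, horizon_s=60.0, rng=rng)
utilization = min(1.0, len(arrivals) * 0.020 / 60.0)
print(len(arrivals), round(utilization, 2), round(server_power_w(utilization, 150, 300), 1))
```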

Page 28

Migration Frequency

• Migration drivers: consolidation vs. energy deficiency
  – Low utilization → consolidation; high utilization → energy deficiency
• Other characteristics
  – Migration frequency low in all cases
  – No ping-pong observed

Page 29

Thermal Impacts

• Additional issues
  – Energy consumption limited by thermal/cooling issues, not energy availability
  – Migrations required to limit temperature
• Temperature & power have a nonlinear relationship
• Need to account for both power & thermal effects

Page 30

Results w/ Thermal Effects

• Imbalanced cooling
  – Servers 1-14: Ta = 25°C; servers 15-18: Ta = 40°C
  – Temperature limit: 65°C
• Power demand is adjusted by the algorithm to account for higher temperature
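A first-order way to see why the hotter servers get less power: with a steady-state model T = Ta + R·P, where R is a thermal resistance, the largest power that keeps a server at or below the 65°C limit is (65 - Ta)/R. This linear model is a deliberate simplification (the thermal-impacts slide notes that temperature and power are related nonlinearly), and R = 0.25 °C/W is hypothetical:

```python
T_LIMIT_C = 65.0       # temperature limit from the slide
R_TH_C_PER_W = 0.25    # hypothetical steady-state thermal resistance, degC per W

def max_power_w(t_ambient_c, t_limit_c=T_LIMIT_C, r_th=R_TH_C_PER_W):
    """Largest power keeping T = Ta + R*P at or below the limit (never negative)."""
    return max(0.0, (t_limit_c - t_ambient_c) / r_th)

# Servers 1-14 at 25 degC ambient, servers 15-18 at 40 degC, as on the slide.
power_caps = {s: max_power_w(25.0 if s <= 14 else 40.0) for s in range(1, 19)}
print(power_caps[1], power_caps[15])  # 160.0 100.0
```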

Page 31

Conclusions

• Need to go beyond energy efficiency
  – Design devices/systems to minimize life-cycle energy footprint
  – Creatively adapt to available energy to operate “at the edge”
• Ongoing/future work
  – Coordinated server, network & storage mgmt.
  – Explore tradeoffs between QoS, power savings and admission control performance

Page 32

Thank you!

Page 33

Power Inefficiencies

[Figure: power delivery path: rack supply (280 V, 95% efficient) → server PSU (70-90% efficient; ±12 V, ±5 V outputs) → voltage regulators (90-95% efficient) → CPU (wasted leakage & clock power), fans, DRAM & memory controller, adapters, storage (idle wasted power)]
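Taking only the two in-server conversion stages labeled in the figure, the PSU (70-90% efficient) feeding the voltage regulators (90-95% efficient), the sketch below estimates how much power must enter the PSU to deliver a given load to the components. The 100 W load is hypothetical:

```python
def input_power_w(load_w, *stage_efficiencies):
    """Power that must enter a series chain of converters to deliver load_w."""
    eta = 1.0
    for stage in stage_efficiencies:
        eta *= stage
    return load_w / eta

best = input_power_w(100, 0.90, 0.95)   # best-case PSU and VRs
worst = input_power_w(100, 0.70, 0.90)  # worst-case PSU and VRs
print(f"{best:.0f} W to {worst:.0f} W drawn for 100 W delivered")
```

Between roughly 17 W and 59 W of conversion loss accompany every 100 W that actually reaches the CPU, memory and other devices.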

Page 34

Operating Regimes

[Figure: operating regimes plotted against relative power requirements (0 to 4.0) and performance, with regions labeled energy plenty computing, energy efficient computing, energy adaptive computing and energy deficient computing]

Page 35

So, What’s the Problem?

• Local constraints & controls → end-to-end impacts
  – DC to DC load shift
    • Service disruption & post-shift impact
  – Client request to alter content
    • Less or more work for server
• Potential conflicting controls

[Figure: two clients connected through access networks and a core network to DC1 (Server1, storage) and DC2 (Server2, storage)]