KnightShift: Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity Daniel Wong Murali Annavaram University of Southern California MICRO-2012 Supported by NSF and DARPA


TRANSCRIPT

Page 1: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

KnightShift: Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Daniel Wong, Murali Annavaram

University of Southern California

MICRO-2012

Supported by NSF and DARPA

Page 2: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Overview

Overview | 2

[Figure: peak power (%) vs. utilization (%), Actual curve]

[Figure: energy proportionality (0 to 1) over time]

[Figure: peak power (%) vs. utilization (%), curves: Actual, Linear, Ideal, KnightShift]

| 1. Measuring EP

| 2. EP Trends

| 3. KnightShift

| 4. Effect on EP

| 5. Evaluation

Page 3: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Measuring Energy Proportionality

Measuring EP | 3

| Energy Proportionality Curve

| Actual: empirically measured power usage

| Linear: straight line drawn between idle and peak power usage

| Ideal: utilization and power are perfectly proportional

[Figure: two panels labeled Server A and Server B, each plotting peak power (%) vs. utilization (%), Actual curve]

Page 4: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| DR is a coarse first-order approximation of EP

❖ …but it is not accurate: it measures only the two extremes

❖ Ignores power consumption at intermediate utilizations

| Assuming 100W peak and Google datacenter utilization [1]

❖ Server A = 68.6W, Server B = 64.6W

[Figure: two panels of peak power (%) vs. utilization (%), Actual curve; panel labels DR=60% and DR=50%]

Dynamic Range (DR)

Measuring EP | 4

[1] L. Barroso and U. Hölzle, “The Case for Energy-Proportional Computing,” Computer, Dec. 2007.

How can we accurately quantify EP?
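A minimal Python sketch of why DR alone can mislead. The two power curves and the utilization histogram below are hypothetical stand-ins, not the Server A/B measurements or the Google utilization distribution from [1]; they are chosen only so that the server with the larger DR still draws more average power.

```python
def dynamic_range(idle_w, peak_w):
    """DR = (peak - idle) / peak, using only the two extremes."""
    return (peak_w - idle_w) / peak_w


def average_power(power_curve, time_fraction):
    """Expected power: power at each utilization level weighted by time share."""
    assert abs(sum(time_fraction) - 1.0) < 1e-9
    return sum(p * f for p, f in zip(power_curve, time_fraction))


# Hypothetical power (W) at 0%, 25%, 50%, 75%, 100% utilization, 100 W peak.
server_a = [40.0, 75.0, 90.0, 97.0, 100.0]  # DR = 60%, bowed above the linear curve
server_b = [50.0, 60.0, 72.0, 85.0, 100.0]  # DR = 50%, slightly below the linear curve

# Hypothetical share of time spent at each level (mostly low to mid utilization).
time_fraction = [0.10, 0.40, 0.35, 0.10, 0.05]

dr_a = dynamic_range(server_a[0], server_a[-1])
dr_b = dynamic_range(server_b[0], server_b[-1])
avg_a = average_power(server_a, time_fraction)
avg_b = average_power(server_b, time_fraction)

print(f"A: DR={dr_a:.0%}, average power={avg_a:.1f} W")
print(f"B: DR={dr_b:.0%}, average power={avg_b:.1f} W")
```

In this made-up case the server with the better DR consumes more on average, because DR says nothing about the shape of the curve between idle and peak.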

Page 5: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| EP is a better indicator of energy usage than DR

| Why is DR not enough?

❖ EP = DR + how linear the energy proportionality curve is

Energy Proportionality (EP)[2]

Measuring EP | 5
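A sketch of an area-based EP computation. The slides do not spell out the formula, so this assumes the definition EP = 1 − (Area_actual − Area_ideal) / Area_ideal, with power normalized to peak and areas taken under the power-versus-utilization curve; treat that definition, the helper names, and the sample measurements as assumptions.

```python
def area_under(curve):
    """Trapezoidal area of a curve sampled at evenly spaced utilizations in [0, 1]."""
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(curve, curve[1:]))


def energy_proportionality(power_w):
    """EP = 1 - (A_actual - A_ideal) / A_ideal (assumed definition).

    power_w holds measurements from 0% to 100% utilization; the last
    entry is taken as peak power.
    """
    peak = power_w[-1]
    n = len(power_w)
    actual = [p / peak for p in power_w]
    ideal = [i / (n - 1) for i in range(n)]
    a_actual, a_ideal = area_under(actual), area_under(ideal)
    return 1.0 - (a_actual - a_ideal) / a_ideal


# Hypothetical measurements (W) at 0%, 10%, ..., 100% utilization.
linear_server = [50 + 5 * i for i in range(11)]  # 50 W idle, 100 W peak, straight line
ideal_server = [10 * i for i in range(11)]       # power tracks utilization exactly

ep_linear = energy_proportionality(linear_server)
ep_ideal = energy_proportionality(ideal_server)
```

Under this definition the straight-line server gets an EP equal to its 50% dynamic range, and the perfectly proportional server gets an EP of 1.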

[Figure: two panels of peak power (%) vs. utilization (%), curves: Actual, Ideal; panel labels EP=53% and EP=57%]

[2] F. Ryckbosch, S. Polfliet, and L. Eeckhout, “Trends in Server Energy Proportionality,” Computer, 2011.

Page 6: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Linearly energy proportional (LD = 0): EP = DR

| Superlinearly energy proportional (+LD): EP < DR

| Sublinearly energy proportional (-LD): EP > DR

| LD shows how far the actual EP curve is from the linear EP curve

Linear Deviation (LD)

Measuring EP | 6
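A sketch of how linear deviation could be computed and classified. The slides give no formula, so this assumes LD = Area_actual / Area_linear − 1, where the linear curve is the straight line from idle to peak power; the function names and sample curves are hypothetical.

```python
def area_under(curve):
    """Trapezoidal area of a curve sampled at evenly spaced utilizations in [0, 1]."""
    step = 1.0 / (len(curve) - 1)
    return sum((a + b) / 2.0 * step for a, b in zip(curve, curve[1:]))


def linear_deviation(power_w):
    """LD = A_actual / A_linear - 1 (assumed definition)."""
    n = len(power_w)
    idle, peak = power_w[0], power_w[-1]
    linear = [idle + (peak - idle) * i / (n - 1) for i in range(n)]
    return area_under(power_w) / area_under(linear) - 1.0


def classify(ld, tol=1e-9):
    if ld > tol:
        return "superlinear"  # curve bows above the idle-to-peak line, EP < DR
    if ld < -tol:
        return "sublinear"    # curve sags below the line, EP > DR
    return "linear"           # EP = DR


# Hypothetical measurements (W) at 0%, 25%, 50%, 75%, 100% utilization.
bowed_up = [40.0, 75.0, 90.0, 97.0, 100.0]
sagging = [40.0, 45.0, 55.0, 72.0, 100.0]
straight = [40.0, 55.0, 70.0, 85.0, 100.0]

labels = [classify(linear_deviation(c)) for c in (bowed_up, sagging, straight)]
```

If EP is defined by areas against the ideal curve and power is normalized to peak, these assumed definitions combine algebraically into EP = 2 − (2 − DR)(LD + 1), which reduces to EP = DR when LD = 0 and matches the three cases listed on the slide.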

[Figure: two panels labeled Superlinear and Sublinear, each plotting peak power (%) vs. utilization (%), Actual curve]

Page 7: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Proportionality Gap (PG) @ utilization x%

Proportionality Gap (PG)

Measuring EP | 7
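The formula for PG did not survive in this transcript. As a sketch, the code below reads "PG @ utilization x%" as the difference between actual and ideal power at that utilization, normalized to peak power; that reading, the function name, and the sample server are assumptions.

```python
def proportionality_gap(power_w, utilization):
    """PG at each utilization: (actual - ideal) / peak, with ideal = utilization * peak.

    The last entry of power_w is taken as peak power.
    """
    peak = power_w[-1]
    return [(p - u * peak) / peak for p, u in zip(power_w, utilization)]


utilization = [i / 10 for i in range(11)]

# Hypothetical linear server: 50 W idle, 100 W peak.
power_w = [50.0 + 50.0 * u for u in utilization]

gap = proportionality_gap(power_w, utilization)
for u, g in zip(utilization, gap):
    print(f"{u:4.0%}  PG = {g:.2f}")
```

For this made-up server the gap is largest at idle and shrinks to zero at full load, which is the general shape the trends slides describe.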

[Figure: four panels; two plot peak power (%) vs. utilization (%) with the Actual curve, and two plot the corresponding proportionality gap (%) vs. utilization (%)]

Page 8: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| SPECpower_ssj2008

❖ Measures performance and power at 10% utilization intervals

| 291 servers

| November 2007 – December 2011

Energy Proportionality Trends

Trends | 8

Page 9: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| 2007-2009

❖ DR improves from 50% to 80%

| Since 2009

❖ DR stalled at 80%

| 100% DR very difficult

❖ Power supplies, voltage converters, fans, chipsets, network, etc.

Dynamic Range Trends

Trends | 9

[Figure: dynamic range (0 to 1) over time]

Page 10: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

[Figure: energy proportionality (0 to 1) over time, series: +LD, -LD]

| EP also stalled around 80%

❖ Caused by the stall in DR

| High-EP servers are -LD

Energy Proportionality Trends

Trends | 10

Since DR growth stalled, the only way to improve EP is through lowering LD

Page 11: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

[Figure: proportionality gap (0 to 0.8) vs. utilization (%), series: LOW (<50), MID (50-75)]

| Large PG at low utilization regardless of EP

| As EP improves, PG at high utilization approaches 0

Proportionality Gap Trends

Trends | 11

Energy disproportionality at low utilization will be the main obstacle to achieving ideal EP

Page 12: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

[Figure: ssj_ops/watt (0 to 7000) over time, series: Efficiency @ 100% Load, Efficiency @ 10% Load]

| Energy efficiency is defined as ssj_ops/watt

| Energy efficiency at high load has grown dramatically

| Energy efficiency at low load has grown slowly

| Most datacenter workloads spend the majority of their time at low load

Energy Efficiency Trends

Trends | 12
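A small sketch of the ssj_ops/watt metric at two load levels. The server figures and the linear power model are hypothetical; they only illustrate how a server that is efficient at full load can be far less efficient at 10% load when its idle power is high.

```python
def efficiency(ssj_ops, watts):
    """Energy efficiency as defined on the slide: ssj_ops per watt."""
    return ssj_ops / watts


# Hypothetical server: throughput scales with load, power only partly does.
peak_ops, peak_w, idle_w = 1_000_000.0, 200.0, 100.0


def power_at(load):
    """Linear power model between idle and peak (an assumption for illustration)."""
    return idle_w + (peak_w - idle_w) * load


eff_100 = efficiency(peak_ops * 1.0, power_at(1.0))
eff_10 = efficiency(peak_ops * 0.1, power_at(0.1))

print(f"100% load: {eff_100:.0f} ssj_ops/watt")
print(f" 10% load: {eff_10:.0f} ssj_ops/watt")
```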

Low utilization energy efficiency growth must be addressed to improve overall server energy efficiency

Page 13: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| EP stall primarily caused by stall in DR

❖ Main focus has been improving peak and idle power consumption

| To improve EP in the future:

❖ Improve LD

❖ Target large proportionality gap at low utilizations

| Previous server-level low-power modes are inactive modes

❖ They exploit idle periods, yielding DR improvements

| There is now a need for server-level active low-power modes

❖ These exploit low-utilization periods, yielding LD/PG improvements

Overcoming the EP Wall

Trends | 13

Page 14: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Server-level active low-power mode solution to exploit low-utilization periods

| Basic idea: front a high-power primary server with a low-power compute node, called the Knight

| Knight capability = fraction of throughput compared to primary server

| KnightShift consists of 3 components:

❖ KnightShift hardware

❖ System software

✒ Supports certain functionality (data sharing, networking, etc.)

❖ KnightShift runtime

✒ Supports KnightShift functionality

KnightShift Server Architecture

KnightShift | 14

Page 15: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Primary server and Knight contain independent CPU/memory/chipset

| Independent power domains

❖ Remote wakeup through Wake-on-LAN

| Shared disk (NFS)

| Networking through a simple router

❖ Communication between both nodes

❖ Exposes only the Knight’s IP

❖ Requires the Knight to stay on

| Implementation options:

❖ Ensemble-level (commodity parts)

❖ Board-level (motherboard integration)

❖ Server-level (add-on board)

Ensemble-level KnightShift

KnightShift | 15

Page 16: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Example KnightShift operation

KnightShift Runtime

KnightShift | 16

[Figure: timeline of power consumption (Low to High) for the Primary Server and the Knight, with the messages Sleep, Wakeup, Awake, and Sync exchanged between them]

Primary: Flush memory state

Primary: Send sleep message and enter low power state

Knight: Begin processing requests

Knight: Send wakeup message

Primary: Wake up and send awake message

Knight: Flush memory state and send sync message

Primary: Begin processing requests
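The operation sequence on this slide can be sketched as a small state machine. This is a toy model under stated assumptions: the class and method names are hypothetical, messages are recorded in a log instead of being sent over a network, and memory flushing and power states are represented only as log entries.

```python
from enum import Enum


class Mode(Enum):
    PRIMARY_ACTIVE = "primary"
    KNIGHT_ACTIVE = "knight"


class KnightShiftPair:
    """Toy model of the mode-switch message sequence; no real networking or I/O."""

    def __init__(self):
        self.mode = Mode.PRIMARY_ACTIVE
        self.log = []

    def switch_to_knight(self):
        """Primary hands off to the Knight and goes to sleep."""
        if self.mode is not Mode.PRIMARY_ACTIVE:
            return
        self.log.append("primary: flush memory state")
        self.log.append("primary -> knight: sleep")
        self.log.append("primary: enter low power state")
        self.log.append("knight: begin processing requests")
        self.mode = Mode.KNIGHT_ACTIVE

    def switch_to_primary(self):
        """Knight wakes the primary, syncs state, and hands requests back."""
        if self.mode is not Mode.KNIGHT_ACTIVE:
            return
        self.log.append("knight -> primary: wakeup")
        self.log.append("primary -> knight: awake")
        self.log.append("knight: flush memory state")
        self.log.append("knight -> primary: sync")
        self.log.append("primary: begin processing requests")
        self.mode = Mode.PRIMARY_ACTIVE


pair = KnightShiftPair()
pair.switch_to_knight()
pair.switch_to_primary()
messages = [entry.split(": ")[1] for entry in pair.log if "->" in entry]
```

The ordering matters for data consistency: each side flushes its memory state before the other side starts serving requests from the shared disk.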

Page 17: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Monitors server utilization

| Mode switching policy

❖ Aggressively switch into the Knight

❖ Conservatively switch out of the Knight

❖ A more optimized policy would improve response time at the cost of energy

| Redirect requests (using scheduler/web balancer)

❖ Forward incoming requests to the active node

| Coordinating mode switching

❖ Ensure data consistency

KnightShift Runtime

KnightShift | 17
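One way to read the asymmetric switching policy is as a hysteresis rule: hand off to the Knight on the first sample it could handle, but return to the primary only after sustained overload. The sketch below is an interpretation, not the runtime's actual policy; the function name, the 15% capability threshold, and the `patience` window are all hypothetical.

```python
def run_policy(utilization_trace, knight_capability=0.15, patience=3):
    """Return the active node ("knight" or "primary") chosen after each sample."""
    active = "primary"
    over_count = 0
    decisions = []
    for u in utilization_trace:
        if active == "primary":
            # Aggressive: a single low sample is enough to hand off to the Knight.
            if u <= knight_capability:
                active, over_count = "knight", 0
        else:
            # Conservative: leave the Knight only after `patience` consecutive
            # samples above its capability.
            over_count = over_count + 1 if u > knight_capability else 0
            if over_count >= patience:
                active, over_count = "primary", 0
        decisions.append(active)
    return decisions


trace = [0.40, 0.10, 0.20, 0.10, 0.30, 0.35, 0.50, 0.60]
decisions = run_policy(trace)
```

A smaller `patience` returns to the primary sooner, which helps response time but spends more energy; that is the trade-off the slide mentions.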

Page 18: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| KnightShift-enhanced 291 SPECpower servers

| Theoretically scale power of Knight

❖ $\mathrm{Power}_{\mathrm{Knight}} = C^{1.7} \times \mathrm{Power}_{\mathrm{Primary}}$, with Knight capability $C$

Effect of KnightShift on EP

KnightShift EP | 18
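A sketch of how the power-scaling rule could be turned into a composite KnightShift power curve. Only the scaling rule (Knight power equal to C raised to 1.7 times primary power) comes from the slide. Everything else is an assumption for illustration: the rule is applied to peak power, the sleeping primary is treated as drawing 0 W, the Knight stays on while the primary is active, the Knight's idle power is 90% of its peak, and the primary's power curve is a made-up linear one.

```python
def knight_peak_power(capability, primary_peak_w):
    """Knight power = C**1.7 * primary power (applied here to peak power)."""
    return capability ** 1.7 * primary_peak_w


def knightshift_power(u, capability, primary_curve, knight_idle_fraction=0.9):
    """Composite power (W) at utilization u in [0, 1] under the stated assumptions."""
    knight_peak = knight_peak_power(capability, primary_curve(1.0))
    knight_idle = knight_idle_fraction * knight_peak
    if u <= capability:
        # Knight serves the load alone; the sleeping primary is assumed to draw 0 W.
        return knight_idle + (knight_peak - knight_idle) * (u / capability)
    # Primary serves the load; the Knight is assumed to stay on at idle.
    return primary_curve(u) + knight_idle


def primary_curve(u):
    """Hypothetical primary server: 100 W idle, 200 W peak, linear in between."""
    return 100.0 + 100.0 * u


capability = 0.2
p_low = knightshift_power(0.1, capability, primary_curve)
p_high = knightshift_power(0.8, capability, primary_curve)
print(f"Knight peak: {knight_peak_power(capability, 200.0):.1f} W")
print(f"10% load: {p_low:.1f} W vs. {primary_curve(0.1):.1f} W for the primary alone")
print(f"80% load: {p_high:.1f} W vs. {primary_curve(0.8):.1f} W for the primary alone")
```

Under these assumptions the composite server draws far less power than the primary at low utilization and slightly more at high utilization, where the Knight's idle power is added on top.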

[Figure: peak power (%) vs. utilization (%), curves: Actual, Ideal, KnightShift]

Page 19: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Effect of KnightShift on PG

KnightShift EP | 19

[Figure: two panels labeled 20% Knight and 50% Knight, each plotting proportionality gap (-0.2 to 0.6) vs. utilization (%), series: LOW (<50), MID (50-75)]

KnightShift effectively closes the proportionality gap at low utilization

Page 20: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| KnightShift essentially shifted all servers to -LD

| All servers now have EP > 60% (up from 20%)

| Some servers with EP = 1

❖ KnightShift can achieve ideal EP!

Effect of KnightShift on EP and LD

KnightShift EP | 20

[Figure: energy proportionality (-0.2 to 1.2) vs. linear deviation (-0.3 to 0.2)]

Page 21: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Primary Server

❖ Dual 4-core Intel Xeon L5630

❖ 500GB HD, 36GB DRAM

❖ 156W-205W

❖ Sleep/wakeup time 5/20s

| Knight

❖ Intel Atom D525 (15% capable)

❖ 500GB HD, 1GB DRAM

❖ 15W-16.7W

| EP improved from 24% to 48%

Prototype Evaluation

Evaluation | 21

Page 22: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| Wikipedia-based benchmark (WikiBench) [3]

❖ Cloned Wikipedia database dump

❖ Request trace from actual Wikipedia traffic

Prototype Evaluation

Evaluation | 22

[3] WikiBench, http://www.wikibench.eu

Page 23: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Prototype Results

Evaluation | 23

High power usage during high utilization

Knight saves significant power during low utilization

| Queuing model simulation

| Sensitivity analysis

❖ Utilization patterns

❖ Knight capability

❖ Transition time

Page 24: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

| EP growth stalled by DR

| Large disproportionality at low utilization

| Key to improving EP

❖ Improve LD

❖ Target the low-utilization proportionality gap

❖ Need for a server-level active low-power mode

| KnightShift exploits low-utilization periods using a Knight

❖ Enables high efficiency at low utilization

❖ Effectively improves DR and LD, and closes the PG at low utilization

❖ In some cases, achieves ideal EP

Conclusion

Conclusion | 24

Page 25: KnightShift : Scaling the Energy Proportionality Wall Through Server-Level Heterogeneity

Thank you!

Questions?

Conclusion | 25