vishal gupta* (georgia tech) ripal nathuji (microsoft ...vishal gupta* (georgia tech) ripal nathuji...

26
Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Upload: others

Post on 31-May-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research)

* Work done during summer internship at Microsoft Research

Page 2: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Symmetric

SMP

Asymmetric multicore processor

AMP

Different types of CPU cores

P P

P P P

P

P

CPU Cores

multicore processor

Page 3: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

B

B

C A

P P

P P

P P

P

SMP

AMP

SMP

AMP

Application

time T 2T 3T

Speedup!

Page 4: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

• How good are AMPs as compared to SMPs?

•  Can datacenter applications save power using AMPs?

Page 5: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

S … S S S

S … S S S

S … S S S

. . . . . . . . . . . .

Others

Processor

Datacenter

Server

P P

P P P

P

P SMP AMP

λdatacenter (throughput)

Page 6: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

•  Constant work •  Meet latency SLA

PdatacenterAMP < Pdatacenter

SMP ?

Page 7: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

• Energy Scaling

• Parallel Speedup …

Sequential execution

Parallel execution

Page 8: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

P P

P

Sequential application

SMP AMP

Area equivalent

Page 9: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

time TSLA

TSMP

TAMP

Slack

tlarge tsmall

P

P

P

SMP

AMP

Smaller core = lesser power

Page 10: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P … SMP AMP

Parallel application

Page 11: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

SMP AMP

Sequential Phase

Small cores: Bottleneck

Run on the fast core

Speedup = Higher throughput

Page 12: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Server

Request Queue

Arrival Rate λ Service Rate

µ

Latency SLA

M/M/1 Queuing Model

E[T] =1

µ − λAvg.

Response Time

Page 13: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P … SMP AMP

Parallel application

Amdahl’s Law for Multicores

Parallel Speedup (PS) (refer to paper for ES)

Page 14: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Area = 1 Area = r Perf = perf(r)

n = Chip area

P P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P

P P

P

P

P

P

P

P

P

P

P

P P

P P

SMP n=16, r=1

AMP n=16, r=4

SMP n=16, r=4

r = Area(Big/Core)

f = fraction of computation that can be parallelized

Page 15: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

µAMP ( f ,n,r) =1

1− fperf (r)

+f

n − r

Ref: Hill and Marty, Amdahl's law in the multicore era (IEEE Computer’08)

µSMP ( f ,n,r) =1

1− fperf (r)

+f

nr* perf (r)

Page 16: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

λdatacenter = NserverSMP * λserver

SMP

λdatacenter = NserverAMP * λserver

AMP

Datacenter capacity = No. of servers * Server throughput

Constant Work

λserverpeak = µ −

1TSLA

Page 17: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

PdatacenterSMP = Nserver

SMP *PserverSMP

PdatacenterAMP = Nserver

AMP *PserverAMP

Datacenter power (P) = No. of servers * Server power

Page 18: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

CPU Utilization (U)

Serv

er P

ower

C

onsu

mpt

ion

P(U

)

Idle Power

Peak Power

Ref: The Case for Energy-Proportional Computing, Barroso & Hölzle, IEEE Computer 2007

Page 19: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

CPU Utilization (U)

Frac

tion

of ti

me Server load distribution (Wload)

Pserver = Wload (U) *Pserver (U)∑

Page 20: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

PdatacenterAMP < Pdatacenter

SMP ?

Page 21: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Upto 52% power savings n = 64

0% 10% 20% 30% 40% 50% 60%

0 0.2 0.4 0.6 0.8 1

Pow

er sa

ving

s of A

MP

over

SM

P

Fraction of work that can be parallelized (f)

r=32 r=16 r=8 r=4

Page 22: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

Upto 14% power savings

-25% -20% -15% -10% -5% 0% 5%

10% 15% 20%

5% 10% 15% 20% 25% 30% 35% 40% 45%

Pow

er sa

ving

s of A

MP

over

SM

P

Fraction of area sacrificed for small core

Small core bias Uniform bias Large core bias

Application A Application B Application C

Page 23: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

•  PS looks more promising that ES

•  Can we achieve these savings in reality?

Page 24: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

High r (realistic r = 3)

High (but not too high!) f

0% 10% 20% 30% 40% 50% 60%

0 0.5 1 Pow

er sa

ving

s of A

MP

over

SM

P

Fraction of work that can be parallelized (f)

r=32 r=16 r=8 r=4

Page 25: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

•  Scalability: Amdahl’s law assumes unbounded scalability

•  Migration overhead: zero migration overhead

•  Perfect scheduling: oracle scheduler

Actual savings are going to be lower

Page 26: Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft ...Vishal Gupta* (Georgia Tech) Ripal Nathuji (Microsoft Research) * Work done during summer internship at Microsoft Research

•  Potential for power savings in datacenters using AMPs

•  Parallel Speedup more promising than Energy Scaling

•  Practical considerations to realize full benefits

Future work: Extend our analysis to functional asymmetry