Heracles: Improving Resource Efficiency at Scale ISCA’15 Stanford University Google, Inc.

TRANSCRIPT

Page 1

Heracles: Improving Resource Efficiency at Scale
ISCA’15

Stanford University
Google, Inc.

Page 2

Outline

Introduction
Design
◦ Isolation Mechanisms
◦ Controllers
Evaluation
Conclusion

Page 3

Motivation

Average server utilization in most datacenters is low, ranging between 10% and 50%.
◦ It is difficult to consolidate latency-critical services on a subset of highly utilized servers.
Increase server utilization by launching best-effort tasks on the same server as a latency-critical job.

Page 4

Motivation (Cont.)

Previous work tends to protect LC workloads, but at the cost of reducing the opportunities for higher utilization through co-location.

Page 5

Goal

Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput of BE tasks.

Page 6

Heracles

A real-time, feedback-based controller
◦ Enables the safe co-location of best-effort (BE) tasks alongside a latency-critical (LC) service.
◦ Ensures that LC jobs meet their latency target while maximizing the resources given to BE tasks.

Page 7

Heracles (Cont.)

◦ Four hardware and software isolation mechanisms.
  Hardware: shared cache partitioning, fine-grained power/frequency settings.
  Software: core isolation, network traffic control.

Page 8

Isolation Mechanisms (Soft)

Core isolation
◦ Pin each workload to a set of cores using cpuset cgroups.
◦ Speed of (re)allocation: tens of milliseconds.
Network traffic
◦ Limit the outgoing bandwidth of BE tasks using Linux traffic control.
◦ No limit on the LC job.
◦ Takes effect in less than hundreds of milliseconds.
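As an illustration of core isolation, the sketch below splits a server's cores into disjoint LC and BE sets and formats each set in the list syntax accepted by the cpuset `cpuset.cpus` file. The function names, the choice to give BE tasks the highest-numbered cores, and the 16-core example are assumptions made for illustration; the slides do not specify how cores are chosen.

```python
def cpuset_string(cores):
    """Format core IDs as a cpuset list string, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    cores = sorted(set(cores))
    ranges = []
    start = prev = None
    for core in cores:
        if start is None:
            start = prev = core
        elif core == prev + 1:
            prev = core  # extend the current contiguous run
        else:
            ranges.append((start, prev))
            start = prev = core
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def split_cores(total_cores, be_cores):
    """Return (lc_cpus, be_cpus) strings for two non-overlapping core sets.

    Assumption: BE tasks receive the highest-numbered cores and the LC job
    always keeps at least one core.
    """
    if not 0 <= be_cores < total_cores:
        raise ValueError("be_cores must be in [0, total_cores)")
    boundary = total_cores - be_cores
    lc = range(0, boundary)
    be = range(boundary, total_cores)
    return cpuset_string(lc), cpuset_string(be)


if __name__ == "__main__":
    lc_cpus, be_cpus = split_cores(total_cores=16, be_cores=4)
    print("LC cpuset.cpus:", lc_cpus)
    print("BE cpuset.cpus:", be_cpus)
```

A controller would write these strings to the `cpuset.cpus` file of each workload's cgroup; the sketch stops at computing them so that it runs without root privileges.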

Page 9

Isolation Mechanisms (Hard)

LLC isolation
◦ Cache Allocation Technology (CAT) in recent Intel chips.
  Uses way-partitioning to define non-overlapping partitions on the LLC.
  Takes effect in a few milliseconds.
◦ Implement a software monitor to track the DRAM bandwidth usage of LC and BE jobs.
  Scale down the number of cores for BE jobs if LC jobs do not receive sufficient bandwidth.
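Way-partitioning can be pictured as assigning each workload a bitmask over the cache ways, with no way shared between the two masks. The sketch below is a minimal illustration of that idea; the function name, the 20-way cache in the example, and the decision to give BE tasks the low-order ways are assumptions, not details from the slides.

```python
def way_masks(total_ways, be_ways):
    """Return (lc_mask, be_mask): contiguous, non-overlapping LLC way bitmasks.

    Assumption: BE tasks get the be_ways low-order ways and the LC job gets
    the rest. Both partitions must contain at least one way.
    """
    if not 0 < be_ways < total_ways:
        raise ValueError("be_ways must be in (0, total_ways)")
    all_ways = (1 << total_ways) - 1
    be_mask = (1 << be_ways) - 1
    lc_mask = all_ways & ~be_mask
    return lc_mask, be_mask


if __name__ == "__main__":
    lc_mask, be_mask = way_masks(total_ways=20, be_ways=4)
    print(f"LC ways: {lc_mask:#07x}")
    print(f"BE ways: {be_mask:#07x}")
    # The partitions never share a way.
    assert lc_mask & be_mask == 0
```

Growing or shrinking the BE partition then amounts to recomputing the two masks with a different `be_ways` value.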

Page 10

Isolation Mechanisms (Hard) (Cont.)

Power isolation
◦ CPU frequency monitoring, Running Average Power Limit (RAPL), and per-core DVFS.
◦ Takes effect within a few milliseconds.
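One way these three pieces could fit together is a small control step: read package power (as RAPL reports it) and the LC cores' frequency, then use per-core DVFS to slow BE cores when the LC job is being throttled. The sketch below is hypothetical throughout. The 90% power threshold, the 100 MHz step, the frequency bounds, and all names are assumptions chosen for illustration, and the inputs are passed in as plain numbers rather than read from hardware.

```python
def adjust_be_frequency(package_power_w, tdp_w, lc_freq_mhz, guaranteed_mhz,
                        be_freq_mhz, step_mhz=100, min_mhz=1200, max_mhz=2300,
                        power_fraction=0.9):
    """Return the next frequency cap (MHz) for BE cores.

    Hypothetical policy:
    - package power is high AND the LC cores run below their guaranteed
      frequency: lower the BE cap by one step;
    - package power is low AND the LC cores run at or above their guaranteed
      frequency: raise the BE cap by one step;
    - otherwise: leave the cap unchanged.
    """
    power_high = package_power_w > power_fraction * tdp_w
    lc_throttled = lc_freq_mhz < guaranteed_mhz
    if power_high and lc_throttled:
        return max(min_mhz, be_freq_mhz - step_mhz)
    if not power_high and not lc_throttled:
        return min(max_mhz, be_freq_mhz + step_mhz)
    return be_freq_mhz


if __name__ == "__main__":
    # LC cores are throttled while the package is near its power limit.
    print(adjust_be_frequency(95, 100, lc_freq_mhz=2000, guaranteed_mhz=2300,
                              be_freq_mhz=1800))
```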

Page 11

Design Approach

An optimization problem
◦ Maximize utilization subject to the constraint that the SLO must be met.
Heracles
◦ Decomposes the high-dimensional optimization problem into many smaller, independent problems by decoupling interference sources.
◦ Monitors latency, latency slack, and load, and adjusts the BE job allocation accordingly.
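The monitoring loop described above can be sketched as a single decision step that maps measured latency and load to an action on the BE allocation. This is a minimal sketch under stated assumptions: latency slack is taken to be the fraction of the SLO target left unused, and the load limit and slack thresholds are illustrative values, not figures from the slides.

```python
from enum import Enum


class Action(Enum):
    DISABLE_BE = "disable BE tasks"
    SHRINK_BE = "take resources away from BE tasks"
    HOLD_BE = "keep the BE allocation as it is"
    GROW_BE = "allow BE tasks to receive more resources"


def latency_slack(target_ms, measured_ms):
    """Assumed definition: fraction of the SLO latency target left unused."""
    return (target_ms - measured_ms) / target_ms


def decide(target_ms, measured_ms, load,
           load_limit=0.85, hold_slack=0.10, shrink_slack=0.05):
    """Pick an action for the BE allocation (thresholds are hypothetical)."""
    slack = latency_slack(target_ms, measured_ms)
    if slack < 0:
        return Action.DISABLE_BE   # SLO already violated
    if load > load_limit:
        return Action.DISABLE_BE   # LC job is close to its peak load
    if slack < shrink_slack:
        return Action.SHRINK_BE    # very little headroom left
    if slack < hold_slack:
        return Action.HOLD_BE      # some headroom, but do not grow
    return Action.GROW_BE


if __name__ == "__main__":
    print(decide(target_ms=100, measured_ms=50, load=0.5))
    print(decide(target_ms=100, measured_ms=97, load=0.5))
```

Keeping the decision as a pure function of the measurements makes each threshold easy to test in isolation before it is wired to real latency and load probes.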

Page 12

System Diagram

Page 13

High-level Controller

Page 14

Core & Memory Sub-controller

Page 15

Max Load under SLO

Page 16

Power and Network Sub-controller

Page 17

Evaluation

Two sets of experiments
◦ Co-locate LC applications with BE tasks on a single server.
◦ Measure the end-to-end latency of websearch on tens of servers while BE tasks are also running.
Effective Machine Utilization (EMU)
◦ EMU = LC throughput + BE throughput
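A small worked example of the EMU formula follows. It assumes each throughput is expressed as a fraction of what that workload achieves when it runs alone on the machine; that normalization and the sample numbers are assumptions made for illustration.

```python
def effective_machine_utilization(lc_throughput, be_throughput):
    """EMU = LC throughput + BE throughput.

    Assumption: both inputs are normalized to the throughput the workload
    reaches when running alone, so 1.0 means a fully used machine.
    """
    if lc_throughput < 0 or be_throughput < 0:
        raise ValueError("throughputs must be non-negative")
    return lc_throughput + be_throughput


if __name__ == "__main__":
    # Hypothetical case: LC job at 40% of its peak load, BE tasks at 50% of
    # their standalone throughput.
    emu = effective_machine_utilization(0.40, 0.50)
    print(f"EMU = {emu:.0%}")
```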

Page 18

Workloads

Three Google production LC workloads:
◦ websearch
◦ ml_cluster
  Real-time text clustering using machine learning
◦ memkeyval
  In-memory key-value store
Run LC workloads alongside BE tasks.
◦ Benchmarks that each stress a single shared resource: Stream-LLC, Stream-DRAM, cpu-pwr, and iperf.
◦ Production batch workloads: brain and streetview.

Page 19

Latency of LC Applications

Page 20

EMU

Page 21

Shared Resource Utilization

Page 22

Websearch in Cluster

Page 23

Conclusion

Heracles
◦ A heuristic, feedback-based system that manages four isolation mechanisms to enable a latency-critical workload to be co-located with batch jobs without SLO violations.
◦ Evaluation on real hardware demonstrates an average utilization of 90% across all evaluated scenarios without any SLO violations for the latency-critical job.