
Heracles: Improving Resource Efficiency at Scale
ISCA’15

Stanford University, Google, Inc.

Outline
Introduction
Design
◦ Isolation Mechanisms
◦ Controllers
Evaluation
Conclusion

Motivation
Average server utilization in most datacenters is low, ranging between 10% and 50%.
◦ It is difficult to consolidate latency-critical services onto a subset of highly utilized servers.

Increase server utilization by launching best-effort tasks on the same server as a latency-critical job.

Motivation (Cont.)
Previous work tends to protect LC workloads, but at the cost of fewer opportunities for higher utilization through co-location.

Goal
Eliminate SLO violations at all levels of load for the LC job while maximizing the throughput for BE tasks.

Heracles
A real-time, feedback-based controller that:
◦ Enables the safe co-location of best-effort (BE) tasks alongside a latency-critical (LC) service.

◦ Ensures that LC jobs meet their latency targets while maximizing the resources given to BE tasks.

Heracles (Cont.)
◦ Four hardware and software isolation mechanisms:
Hardware: shared cache partitioning, fine-grained power/frequency settings.
Software: core isolation, network traffic control.

Isolation Mechanisms (Software)
Core isolation
◦ Pin each workload to a set of cores using cpuset cgroups.

◦ Speed of (re)allocation: tens of milliseconds.
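A minimal sketch of the cpuset side of core isolation, assuming a cgroup v1 cpuset hierarchy with two hypothetical groups, one for the LC job and one for the BE tasks. It only builds the `cpuset.cpus` list strings; the `write_cpus` path and the policy of giving BE tasks the highest-numbered cores are illustrative assumptions, not Heracles' actual allocator.

```python
from pathlib import Path


def cpuset_list(cores):
    """Format core IDs in cpuset list syntax, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    ranges = []
    for core in sorted(set(cores)):
        if ranges and core == ranges[-1][1] + 1:
            ranges[-1][1] = core          # extend the current contiguous run
        else:
            ranges.append([core, core])   # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def split_cores(total_cores, be_cores):
    """Return (lc_cpus, be_cpus); BE tasks get the highest-numbered cores (assumption)."""
    if not 0 <= be_cores < total_cores:
        raise ValueError("the LC job must keep at least one core")
    boundary = total_cores - be_cores
    lc = cpuset_list(range(boundary))
    be = cpuset_list(range(boundary, total_cores))
    return lc, be


def write_cpus(group_dir, cpus):
    """Write a cpuset list into a cgroup's cpuset.cpus file (group_dir is hypothetical)."""
    Path(group_dir, "cpuset.cpus").write_text(cpus)
```

For example, `split_cores(8, 3)` yields `("0-4", "5-7")`; moving a core between the jobs is then just two writes, which is why reallocation is cheap compared with restarting tasks.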

Network traffic
◦ Limit the outgoing bandwidth of BE tasks using Linux traffic control.
◦ No limit is placed on the LC job.
◦ New limits take effect within a few hundred milliseconds.
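The sketch below shows one way a BE bandwidth cap could be computed and pushed to a Linux traffic-control HTB class. The headroom rule (keep the larger of 5% of the link or 10% of the LC job's current bandwidth free) is an assumption loosely modeled on the paper's network sub-controller; the device name and class ID are hypothetical, and an HTB qdisc with handle `1:` plus a filter steering BE traffic into that class are assumed to exist already.

```python
def be_bandwidth_limit(link_mbps, lc_mbps):
    """Bandwidth (Mbit/s) BE tasks may use; the LC job itself is never limited."""
    # Assumed headroom rule: reserve the larger of 5% of the link or
    # 10% of the LC job's current usage so LC traffic can still burst.
    headroom = max(0.05 * link_mbps, 0.10 * lc_mbps)
    return max(0.0, link_mbps - lc_mbps - headroom)


def htb_class_command(dev, classid, limit_mbps):
    """Build a `tc` command that caps an existing HTB class at limit_mbps."""
    rate = max(1, int(limit_mbps))  # HTB needs a positive rate, so floor at 1 Mbit/s
    return (f"tc class change dev {dev} parent 1: classid {classid} "
            f"htb rate {rate}mbit ceil {rate}mbit")


if __name__ == "__main__":
    # Hypothetical numbers: 10 Gbit/s link, LC job currently sending 2 Gbit/s.
    limit = be_bandwidth_limit(10_000, 2_000)
    print(htb_class_command("eth0", "1:20", limit))
```

Only the BE class is rewritten; the LC job's traffic stays outside any cap, matching the bullets above.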

Isolation Mechanisms (Hardware)
LLC isolation
◦ Cache Allocation Technology (CAT) in recent Intel chips uses way-partitioning to define non-overlapping partitions of the LLC. Changes take effect in a few milliseconds.

DRAM bandwidth isolation
◦ A software monitor tracks the DRAM bandwidth usage of the LC and BE jobs. If the LC job does not receive sufficient bandwidth, the number of cores given to BE jobs is scaled down.
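Two small helpers illustrate the LLC and DRAM bandwidth bullets. `cat_masks` splits the LLC ways into contiguous, non-overlapping capacity bitmasks (CAT requires the set bits of each mask to be contiguous), and `schemata_line` renders a mask in the format of the later Linux `resctrl` interface. `be_cores_after_dram_check` is a hypothetical version of the bandwidth monitor's reaction: when measured bandwidth exceeds a chosen limit, it removes enough BE cores to cover the overage, assuming a per-core BE bandwidth estimate is available.

```python
import math


def cat_masks(total_ways, be_ways):
    """Return (lc_mask, be_mask): BE gets the low be_ways ways, LC the rest."""
    if not 0 < be_ways < total_ways:
        raise ValueError("each job needs at least one LLC way")
    be_mask = (1 << be_ways) - 1                      # e.g. 0b1111 for 4 ways
    lc_mask = ((1 << total_ways) - 1) & ~be_mask      # the remaining high ways
    return lc_mask, be_mask


def schemata_line(mask, cache_id=0):
    """Render a capacity bitmask as a resctrl schemata line, e.g. 'L3:0=ffff0'."""
    return f"L3:{cache_id}={mask:x}"


def be_cores_after_dram_check(be_cores, total_gbps, limit_gbps, be_gbps_per_core):
    """Shrink the BE core count when total DRAM bandwidth exceeds limit_gbps."""
    if total_gbps <= limit_gbps:
        return be_cores
    overage = total_gbps - limit_gbps
    # Remove enough cores to cover the overage, rounding up (per-core figure is assumed).
    return max(0, be_cores - math.ceil(overage / be_gbps_per_core))
```

Because the two masks never share a bit, LC and BE lines cannot evict each other; the DRAM check complements this, since cache partitioning alone does not cap memory bandwidth.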

Isolation Mechanisms (Hardware) (Cont.)
Power isolation
◦ CPU frequency monitoring, Running Average Power Limit (RAPL), and per-core DVFS.

◦ Changes take effect within a few milliseconds.
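A hypothetical sketch of one step of a power controller built from these mechanisms: when package power (for example as read through RAPL) is close to the TDP and the LC cores are running below their guaranteed frequency, it lowers the BE cores' per-core DVFS setting; when there is power headroom and the LC cores are healthy, it gives frequency back. The 90%-of-TDP threshold, the 100 MHz step, and the frequency bounds are assumptions chosen for illustration.

```python
def next_be_freq_mhz(power_w, tdp_w, lc_freq_mhz, lc_guaranteed_mhz, be_freq_mhz,
                     step_mhz=100, min_mhz=1200, max_mhz=2300):
    """Return the new per-core DVFS target for BE cores (all thresholds assumed)."""
    near_tdp = power_w > 0.90 * tdp_w
    if near_tdp and lc_freq_mhz < lc_guaranteed_mhz:
        # The power budget is throttling LC cores: slow the BE cores down.
        return max(min_mhz, be_freq_mhz - step_mhz)
    if not near_tdp and lc_freq_mhz >= lc_guaranteed_mhz:
        # Power headroom exists and LC runs at full speed: let BE cores speed up.
        return min(max_mhz, be_freq_mhz + step_mhz)
    return be_freq_mhz  # otherwise hold the current setting
```

Stepping in small increments keeps each adjustment within the few-millisecond response time of the hardware mechanisms while avoiding large oscillations.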

Design Approach
An optimization problem:
◦ Maximize utilization subject to the constraint that the SLO is met.

Heracles:
◦ Decomposes the high-dimensional optimization problem into smaller, independent problems by decoupling the sources of interference.

◦ Monitors latency, latency slack, and load, and adjusts the BE job allocation accordingly.
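The monitoring loop described above can be sketched as a single decision step. Latency slack is computed as (SLO - latency) / SLO; the load and slack thresholds (0.85, 0.10, 0.05) and the `Decision` fields are assumptions for illustration, in the spirit of a top-level controller rather than a copy of Heracles' own. In practice such a step would run periodically, with the per-resource sub-controllers acting on its output.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    be_enabled: bool    # may BE tasks run at all?
    allow_growth: bool  # may sub-controllers give BE tasks more resources?
    shrink_be: bool     # should BE resources be reduced immediately?


def controller_step(latency_ms, slo_ms, load):
    """Decide BE policy from tail latency, the SLO target and LC load (fraction of peak)."""
    slack = (slo_ms - latency_ms) / slo_ms
    if slack < 0 or load > 0.85:
        # SLO already violated, or LC load too high to share the server safely.
        return Decision(be_enabled=False, allow_growth=False, shrink_be=False)
    return Decision(be_enabled=True,
                    allow_growth=slack >= 0.10,
                    shrink_be=slack < 0.05)
```

Keeping the top level this coarse is what lets the resource-specific decisions be made independently, in line with the decomposition described above.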

System Diagram

High-level Controller

Core & Memory Sub-controller

Max Load under SLO

Power and Network Sub-controller

Evaluation
Two sets of experiments:
◦ Co-locating LC applications with BE tasks on a single server.

◦ Measuring the end-to-end latency of websearch across tens of servers, with BE tasks also running.

Effective Machine Utilization (EMU)
◦ LC throughput + BE throughput
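A worked example of the EMU metric. The sketch assumes each throughput is first normalized to what that workload achieves when running alone on the server, so that the two terms can be added; the numbers in the usage are made up for illustration.

```python
def effective_machine_utilization(lc_tput, lc_peak, be_tput, be_peak):
    """EMU = normalized LC throughput + normalized BE throughput."""
    return lc_tput / lc_peak + be_tput / be_peak


if __name__ == "__main__":
    # Hypothetical: LC serves 40% of its peak QPS, BE reaches half its standalone throughput.
    print(effective_machine_utilization(4_000, 10_000, 50, 100))
```

Because the two terms are normalized separately, EMU can in principle exceed 1.0 when the co-located workloads stress different resources.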

Workloads
Three Google production LC workloads:
◦ websearch
◦ ml_cluster: real-time text clustering using machine learning
◦ memkeyval: in-memory key-value store

Run the LC workloads alongside BE tasks: benchmarks that each stress a single shared resource, plus production batch workloads.
◦ Stream-LLC, Stream-DRAM, cpu-pwr, iperf, brain, and streetview.

Latency of LC Applications

Shared Resource Utilization

Websearch in Cluster

Conclusion
Heracles:
◦ A heuristic, feedback-based system that manages four isolation mechanisms so that a latency-critical workload can be co-located with batch jobs without SLO violations.

◦ Evaluation on real hardware demonstrates an average utilization of 90% across all evaluated scenarios, without any SLO violations for the latency-critical job.
