TRANSCRIPT
Retro: Targeted Resource Management in Multi-tenant Distributed Systems
Jonathan Mace, Brown University
Peter Bodik, MSR Redmond
Rodrigo Fonseca, Brown University
Madanlal Musuvathi, MSR Redmond
Resource Management in Multi-Tenant Systems

• April 2011 – Amazon EBS failure: a failure in one availability zone cascades to the shared control plane, causing thread pool starvation for all zones
• Aug. 2012 – Azure Storage outage: an aggressive background task responds to increased hardware capacity with a deluge of warnings and logging
• Nov. 2014 – Visual Studio Online outage: a code change increases database usage, shifting the bottleneck to an unmanaged application-level lock
• 2014 – Communication with Cloudera: shared storage layer bottlenecks circumvent the resource management layer

The result: degraded performance, violated SLOs, and system outages.
Containers and VMs provide isolation at the OS / hypervisor level, but tenants also contend inside shared systems: storage, databases, queueing, etc.
Retro:
• Monitors resource usage of each tenant in near real-time
• Actively schedules tenants and activities
• High-level, centralized policies encapsulate the resource management logic
• Abstractions are not specific to a resource type or system
• Policies achieve different goals: guarantee average latencies, fair-share a resource, etc.
Hadoop Distributed File System (HDFS)
• HDFS NameNode – filesystem metadata (serving requests such as Rename and Read)
• HDFS DataNodes – replicated block storage
[Figure: HDFS NameNode and DataNodes, with per-tenant request rates plotted over time]
A typical stack of shared systems on a single node:
• Local storage
• HDFS DataNode
• HBase RegionServer
• MapReduce Shuffler
• Hadoop YARN NodeManager, with YARN containers running MapReduce tasks
Goals
• Co-ordinated control across processes, machines, and services
• Handle system-level and application-level resources
• Principals: tenants and background tasks
• Real-time and reactive
• Efficient: only control what is needed
Architecture
Tenant Requests
Workflows
Purpose: identify requests from different users and background activities
• e.g., all requests from a tenant over time
• e.g., data balancing in HDFS
A workflow is the unit of resource measurement, attribution, and enforcement. It tracks a request across varying levels of granularity, and is orthogonal to threads, processes, network flows, etc.
Resources
Purpose: cope with the diversity of resources. What we need:
1. Identify overloaded resources.
   Slowdown: the ratio of how slow the resource is now compared to its baseline performance with no contention.
   Slowdown = (queue time + execute time) / execute time
   e.g., 100 ms queued + 10 ms executing ⇒ slowdown = 11
2. Identify culprit workflows.
   Load: the fraction of current utilization that we can attribute to each workflow, measured as time spent executing.
   e.g., 10 ms executing ⇒ load = 10
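A minimal sketch of how these two metrics could be computed from per-operation measurements (class and method names here are illustrative, not Retro's actual API):

    import java.util.HashMap;
    import java.util.Map;

    // Aggregates per-operation measurements for one resource into the
    // slowdown and per-workflow load metrics defined above.
    class ResourceStats {
        private long totalQueueNanos = 0;
        private long totalExecNanos = 0;
        private final Map<String, Long> execNanosByWorkflow = new HashMap<>();

        // Record one operation: its queueing time and execution time,
        // attributed to the workflow that issued it.
        void record(String workflowId, long queueNanos, long execNanos) {
            totalQueueNanos += queueNanos;
            totalExecNanos += execNanos;
            execNanosByWorkflow.merge(workflowId, execNanos, Long::sum);
        }

        // Slowdown = (queue time + execute time) / execute time.
        // e.g., 100 ms queued, 10 ms executing => slowdown 11.
        double slowdown() {
            if (totalExecNanos == 0) return 1.0;
            return (double) (totalQueueNanos + totalExecNanos) / totalExecNanos;
        }

        // Fraction of this resource's utilization attributable to a workflow.
        double loadFraction(String workflowId) {
            if (totalExecNanos == 0) return 0.0;
            return (double) execNanosByWorkflow.getOrDefault(workflowId, 0L)
                    / totalExecNanos;
        }
    }

In Retro, such locally aggregated statistics are reported centrally once per second (see the architecture below).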
Control Points
Goal: enforce resource management decisions
• Decoupled from resources
• Rate-limits workflows, agnostic to the underlying implementation, e.g., a token bucket or a priority queue
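For concreteness, a minimal token-bucket sketch (one of the possible control-point implementations named above; names and structure are illustrative):

    // A per-workflow token bucket: admit a request only when the workflow
    // has accumulated enough tokens at its policy-assigned rate.
    class TokenBucket {
        private final double capacity;  // maximum burst, in tokens
        private double ratePerSec;      // refill rate, set by the policy
        private double tokens;
        private long lastRefillNanos = System.nanoTime();

        TokenBucket(double ratePerSec, double capacity) {
            this.ratePerSec = ratePerSec;
            this.capacity = capacity;
            this.tokens = capacity;
        }

        // Called when a policy adjusts this workflow's rate.
        synchronized void setRate(double ratePerSec) {
            this.ratePerSec = ratePerSec;
        }

        // Try to admit a request costing `cost` tokens; false means throttle.
        synchronized boolean tryAcquire(double cost) {
            long now = System.nanoTime();
            tokens = Math.min(capacity,
                    tokens + ratePerSec * (now - lastRefillNanos) / 1e9);
            lastRefillNanos = now;
            if (tokens < cost) return false;
            tokens -= cost;
            return true;
        }
    }

Because the interface is just "set a rate per workflow", a policy never needs to know whether the control point sits in front of a disk, a queue, or an RPC handler.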
Retro's architecture:
1. Pervasive Measurement – measurements are aggregated locally, then reported centrally once per second
2. Centralized Controller – policies run in a continuous control loop over a global, abstracted view of the system, via the Retro Controller API
3. Distributed Enforcement – co-ordinates enforcement using a distributed token bucket

Retro is a "control plane" for resource management: the global, abstracted view of the system makes policies easier to write, and reusable.
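The controller's loop can be pictured as follows; this is a sketch under assumed interfaces (Report, Policy, and fetchReports are illustrative names, not Retro's actual API):

    import java.util.List;

    // A policy consumes the latest per-resource / per-workflow reports and
    // reacts, e.g., by pushing new rates to control points.
    interface Policy {
        void update(List<Report> reports);
    }

    record Report(String resource, String workflow,
                  double slowdown, double loadFraction) {}

    class RetroController {
        private final List<Policy> policies;

        RetroController(List<Policy> policies) {
            this.policies = policies;
        }

        // Continuous control loop: once per second, give every policy the
        // freshly aggregated global view of the system.
        void run() throws InterruptedException {
            while (true) {
                List<Report> reports = fetchReports();
                for (Policy p : policies) {
                    p.update(reports);
                }
                Thread.sleep(1000);
            }
        }

        // Stub: in a real deployment this would gather the 1-second
        // aggregates reported by the instrumented processes.
        private List<Report> fetchReports() {
            return List.of();
        }
    }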
Example: LatencySLO
• H: high-priority workflows with a latency guarantee, e.g., "200ms average request latency"
• L: low-priority workflows, which may use spare capacity
The policy monitors latencies, attributes interference, and throttles the interfering workflows.
The LatencySLO policy, in pseudocode:

    foreach candidate in H:
        miss[candidate] = latency(candidate) / guarantee[candidate]
    W = candidate in H with max miss[candidate]

    foreach rsrc in resources():        // calculate importance of each resource for W
        importance[rsrc] = latency(W, rsrc) * log(slowdown(rsrc))

    foreach lopri in L:                 // calculate low-priority workflow interference
        interference[lopri] = Σ_rsrc importance[rsrc] * load(lopri, rsrc) / load(rsrc)

    foreach lopri in L:                 // normalize interference
        interference[lopri] /= Σ_k interference[k]

    foreach lopri in L:
        if miss[W] > 1:                 // throttle
            scalefactor = 1 – α * (miss[W] – 1) * interference[lopri]
        else:                           // release
            scalefactor = 1 + β
        foreach cpoint in controlpoints():   // apply new rates
            set_rate(cpoint, lopri, scalefactor * get_rate(cpoint, lopri))

In three steps:
1. Select the high-priority workflow W with the worst performance relative to its guarantee.
2. Weight the low-priority workflows by their interference with W.
3. Throttle the low-priority workflows proportionally to their weight.
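The same loop as a runnable Java sketch, against assumed measurement and control-point interfaces (Metrics, ControlPoint, and the constants ALPHA and BETA are illustrative stand-ins, not Retro's actual API):

    import java.util.HashMap;
    import java.util.Map;

    // One iteration of the LatencySLO policy, following the pseudocode above.
    class LatencySloPolicy {
        static final double ALPHA = 0.1, BETA = 0.05; // assumed damping factors

        interface Metrics {
            double latency(String workflow);                  // average latency
            double latency(String workflow, String resource); // latency at one resource
            double slowdown(String resource);
            double load(String workflow, String resource);
            double load(String resource);
            Iterable<String> resources();
        }

        interface ControlPoint {
            double getRate(String workflow);
            void setRate(String workflow, double rate);
        }

        void update(Map<String, Double> guarantees,  // high-priority workflows H
                    Iterable<String> lowPriority,    // low-priority workflows L
                    Metrics m, Iterable<ControlPoint> cpoints) {
            // Step 1: select the workflow W missing its guarantee the most.
            String w = null;
            double worstMiss = Double.NEGATIVE_INFINITY;
            for (var e : guarantees.entrySet()) {
                double miss = m.latency(e.getKey()) / e.getValue();
                if (miss > worstMiss) { worstMiss = miss; w = e.getKey(); }
            }
            if (w == null) return;

            // Step 2: weight each low-priority workflow by its interference with W.
            Map<String, Double> interference = new HashMap<>();
            double total = 0;
            for (String lo : lowPriority) {
                double v = 0;
                for (String r : m.resources()) {
                    double importance = m.latency(w, r) * Math.log(m.slowdown(r));
                    v += importance * m.load(lo, r) / m.load(r);
                }
                interference.put(lo, v);
                total += v;
            }

            // Step 3: throttle proportionally while W misses its SLO; release otherwise.
            for (String lo : lowPriority) {
                double share = total > 0 ? interference.get(lo) / total : 0;
                double scale = (worstMiss > 1)
                        ? 1 - ALPHA * (worstMiss - 1) * share
                        : 1 + BETA;
                for (ControlPoint cp : cpoints) {
                    cp.setRate(lo, scale * cp.getRate(lo));
                }
            }
        }
    }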
Other types of policy:
• Bottleneck Fairness – detect the most overloaded resource, then fair-share it between the tenants using it
• Dominant Resource Fairness – estimate demands and capacities from measurements
These policies are concise, any resource can be the bottleneck (the policy doesn't care), and nothing is system-specific.
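As an illustration, bottleneck fairness reduces to a few lines against the same kind of assumed interfaces (the equal split below is a simplification of real fair sharing):

    import java.util.ArrayList;
    import java.util.List;

    // Simplified bottleneck fairness: find the resource with the highest
    // slowdown, then split an assumed rate budget evenly among the
    // workflows that put load on it. Interfaces are illustrative.
    class BottleneckFairnessPolicy {
        interface Metrics {
            Iterable<String> resources();
            double slowdown(String resource);
            double load(String workflow, String resource);
        }

        interface ControlPoint {
            void setRate(String workflow, double rate);
        }

        void update(List<String> workflows, Metrics m,
                    ControlPoint cpoint, double rateBudget) {
            // 1. Detect the most overloaded resource (slowdown 1.0 = no contention).
            String bottleneck = null;
            double worst = 1.0;
            for (String r : m.resources()) {
                if (m.slowdown(r) > worst) { worst = m.slowdown(r); bottleneck = r; }
            }
            if (bottleneck == null) return; // nothing is contended

            // 2. Fair-share: an equal rate for every workflow using the bottleneck.
            List<String> users = new ArrayList<>();
            for (String wf : workflows) {
                if (m.load(wf, bottleneck) > 0) users.add(wf);
            }
            for (String wf : users) {
                cpoint.setRate(wf, rateBudget / users.size());
            }
        }
    }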
Evaluation
Instrumentation
Retro is implemented in Java: an instrumentation library, plus a central controller implementation.
To enable Retro in a system:
• Propagate the workflow ID within the application (like X-Trace, Dapper)
• Instrument resources with wrapper classes
Overheads:
• Resource instrumentation is automatic, using AspectJ
• Overall, 50-200 lines per system to modify RPCs
• Retro's overhead is at most 1-2% on throughput and latency
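A minimal sketch of workflow-ID propagation in the X-Trace/Dapper style (names are illustrative; the real instrumentation must also carry the ID across RPC boundaries, which is where the 50-200 modified lines per system come in):

    // Carry the current workflow ID in a thread-local so that instrumented
    // resources can attribute their usage to the right workflow.
    final class WorkflowContext {
        private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

        static String get() {
            return CURRENT.get();
        }

        // Run a task on behalf of a workflow, restoring the previous ID after.
        static void runAs(String workflowId, Runnable task) {
            String previous = CURRENT.get();
            CURRENT.set(workflowId);
            try {
                task.run();
            } finally {
                if (previous == null) CURRENT.remove();
                else CURRENT.set(previous);
            }
        }
    }

A resource wrapper class can then call WorkflowContext.get() when it records queue and execution times.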
Experiments
• Systems: HDFS, HBase, YARN MapReduce, ZooKeeper
• Workflows: MapReduce jobs (HiBench), HBase (YCSB), HDFS clients, background data replication, background heartbeats
• Resources: CPU, disk, network (all systems); locks, queues (HDFS, HBase)
• Policies: LatencySLO, Bottleneck Fairness, Dominant Resource Fairness
We ran policies for a mixture of systems, workflows, and resources, with results on clusters of up to 200 nodes. See the paper for full experiment results; this talk shows the LatencySLO policy results.
[Figure: SLO-normalized latency over time (log scale, 0.1-1000; 30 minutes) for six workflows - HDFS read 8k, HBase read 1 cached row, HBase read 1 row, HBase cached table scan, HDFS mkdir, HBase table scan - against the SLO target]
[Figure: resource slowdown over the same period (0-15), identifying the contended resources: disk, CPU, HDFS NameNode lock, HDFS NameNode queue, HBase queue]
[Figure: the same experiment with the SLO policy enabled - SLO-normalized latency (0.1-10) converges to the SLO target once the policy kicks in]
[Figure: per-workflow SLO-normalized latency with the policy enabled, over 30 minutes; insets show HBase table scan (0.2-1), HDFS mkdir (0.5-1), and HBase cached table scan (0.5-1)]
Conclusion
Retro: resource management for shared distributed systems.
• Centralized resource management
• Comprehensive: resources, processes, tenants, background tasks
• Abstractions for writing concise, general-purpose policies: workflows, resources (slowdown, load), and control points
http://cs.brown.edu/~jcmace