Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation
Hui Wang, Peter Varman Rice University
FAST’14, Feb 2014
Tiered Storage
- Tiered storage: HDs and SSDs
  - Advantages:
    - Performance
    - Cost
  - Challenges:
    - Fair resource allocation
    - High system efficiency
      - Variable system throughput
Tiered Storage Model
- Clients: Make requests to the SSD (hit) and the HD (miss) in a certain ratio
- Scheduler: Aware of each request's target; dispatches requests to storage
- Storage: SSD and HD are independent, without frequent data migrations
Fairness and Efficiency in Tiered Storage
- How do we define fairness?
  - How to define fairness for multiple resources?
  - Fair allocation may cause low efficiency
- How to improve efficiency of both devices?
  - Only focusing on efficiency may cause unfairness
Existing Solutions for QoS Scheduling
- Proportional sharing in storage / IO scheduling
  - Extended from network and CPU scheduling
  - Additional reservation and limit controls
  - All of them are designed for a single resource!
- Dominant Resource Fairness model (DRF) [NSDI’11]
  - Designed for allocating multiple resources
  - DRF does not explicitly address system utilization
Talk Outline
- Motivation
- Bottleneck-Aware Allocation (BAA)
- Evaluation
- Conclusions and future work
Example: Single Device Type
- Configuration:
  - Single HD with capacity 100 IOPS
  - Two clients with equal weights
    - Fully backlogged, work-conserving
  - Proportional sharing
- Results:
  - Each gets 50 IOPS
  - Utilization 100%
- Device can be fully utilized for any allocation ratio
[Figure: a single HD (capacity 100 IOPS) shared by two clients at 50 IOPS each; utilization 100%.]
What if there are multiple resources?
Example: Multiple Devices (Fairness)
- Natural policy: Weighted Fair Queuing (WFQ)
- Configuration:
  - HD capacity 100 IOPS, SSD capacity 500 IOPS
  - Two clients with hit ratios h1 = 0.9, h2 = 0.5
  - Conventional WFQ with 1:1 weights
- Results:
  - Each gets 167 IOPS
  - Utilization of HD = 100%, but SSD only 47%
- Simply transferring WFQ to multiple resources causes an efficiency problem!
[Figure: capacity-normalized allocation. HD (100 IOPS): 16.7 IOPS + 83.3 IOPS, 100% utilized. SSD (500 IOPS): 150 IOPS + 83.3 IOPS, 47% utilized, remainder idle.]
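The 167 IOPS and 47% figures follow directly from the hit ratios. A minimal sketch, assuming each client's requests split between SSD and HD exactly in its hit ratio and that 1:1 WFQ means equal total throughput per client (the function name `equal_throughput` is illustrative, not from the talk):

```python
def equal_throughput(hit_ratios, cap_hd, cap_ssd):
    """Give every client the same IOPS x, as large as the devices allow."""
    hd_load_per_iops = sum(1 - h for h in hit_ratios)   # misses go to the HD
    ssd_load_per_iops = sum(hit_ratios)                 # hits go to the SSD
    limits = []
    if hd_load_per_iops > 0:
        limits.append(cap_hd / hd_load_per_iops)
    if ssd_load_per_iops > 0:
        limits.append(cap_ssd / ssd_load_per_iops)
    x = min(limits)  # the first device to saturate caps every client
    return x, x * hd_load_per_iops / cap_hd, x * ssd_load_per_iops / cap_ssd


x, hd_util, ssd_util = equal_throughput([0.9, 0.5], cap_hd=100, cap_ssd=500)
print(round(x), round(hd_util * 100), round(ssd_util * 100))  # prints 167 100 47
```

The HD saturates at 0.6x = 100, so the SSD only ever carries 1.4x ≈ 233 of its 500 IOPS.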
Example: Multiple Devices (Efficiency)
- Configuration:
  - HD capacity 100 IOPS, SSD capacity 500 IOPS
  - Two clients with hit ratios h1 = 0.9, h2 = 0.5
- Results:
  - Utilization 100%
  - Client 1 gets 500 IOPS
  - Client 2 gets 100 IOPS
- It is not possible to precisely assign both the relative allocations (fairness) and the system utilization (efficiency).
[Figure: normalized allocation. HD (100 IOPS): 50 IOPS + 50 IOPS, 100% utilized. SSD (500 IOPS): 450 IOPS + 50 IOPS, 100% utilized.]
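Requiring both devices to be fully used leaves no freedom in the allocation ratio: two capacity equations in two unknowns have a single solution. A small sketch under the same fixed hit-ratio model (the helper `full_utilization` is hypothetical, written only for this illustration):

```python
def full_utilization(h1, h2, cap_hd, cap_ssd):
    """Solve (1-h1)*x1 + (1-h2)*x2 = cap_hd and h1*x1 + h2*x2 = cap_ssd."""
    det = (1 - h1) * h2 - (1 - h2) * h1   # simplifies to h2 - h1
    if det == 0:
        raise ValueError("equal hit ratios: the two equations are not independent")
    # Cramer's rule for the 2x2 system.
    x1 = (cap_hd * h2 - (1 - h2) * cap_ssd) / det
    x2 = ((1 - h1) * cap_ssd - h1 * cap_hd) / det
    return x1, x2


x1, x2 = full_utilization(0.9, 0.5, cap_hd=100, cap_ssd=500)
print(round(x1), round(x2))  # prints 500 100
```

The 5:1 ratio is dictated by the utilization target rather than chosen by a fairness policy. For other hit ratios the solution may even be negative, meaning 100% utilization of both devices is unreachable.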
DRF (Dominant Resource Fairness)
- Configuration:
  - HD 100 IOPS
  - SSD 500 IOPS
  - Two clients
    - h1 = 0.9 (dominant resource SSD)
    - h2 = 0.5 (dominant resource HD)
- What will DRF do?
  - Equalize dominant shares
[Figure: normalized allocation. HD (100 IOPS): 36 IOPS + 64 IOPS, 100% utilized. SSD (500 IOPS): 324 IOPS + 64 IOPS, 77% utilized, remainder idle. Both dominant shares are 64%.]
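DRF
- Does not address efficiency
  - Add a third client with h3 = 0.1
  - SSD utilization further reduced to 48%
  - Worse if more clients are bottlenecked on HD
[Figure: normalized allocation with three clients. HD (100 IOPS): 22 IOPS + 39 IOPS + 39 IOPS, 100% utilized. SSD (500 IOPS): 196 IOPS + 39 IOPS + 5 IOPS, 48% utilized, remainder idle. All dominant shares are 39%.]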
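The dominant shares and SSD utilizations in the two DRF examples can be reproduced with a short calculation. This is a sketch of the equal-dominant-share rule only, assuming fully backlogged clients with fixed hit ratios; the function `drf` and its return layout are illustrative and not taken from the talk or the DRF paper:

```python
def drf(hit_ratios, cap_hd, cap_ssd):
    """Equalize dominant shares; stop when the first device saturates."""
    # Fraction of each device consumed per IOPS of client throughput.
    demands = [((1 - h) / cap_hd, h / cap_ssd) for h in hit_ratios]
    dominant = [max(d) for d in demands]
    # With a common dominant share s, client i runs at s / dominant[i] IOPS,
    # so device utilization grows linearly in s.
    hd_per_s = sum(d[0] / dom for d, dom in zip(demands, dominant))
    ssd_per_s = sum(d[1] / dom for d, dom in zip(demands, dominant))
    s = min(1 / u for u in (hd_per_s, ssd_per_s) if u > 0)
    iops = [s / dom for dom in dominant]
    return s, iops, s * hd_per_s, s * ssd_per_s


s2, iops2, hd2, ssd2 = drf([0.9, 0.5], cap_hd=100, cap_ssd=500)
s3, iops3, hd3, ssd3 = drf([0.9, 0.5, 0.1], cap_hd=100, cap_ssd=500)
print(round(s2 * 100), round(ssd2 * 100))  # prints 64 77
print(round(s3 * 100), round(ssd3 * 100))  # prints 39 48
```

Every client that is dominant on the HD claims the same HD share as the SSD-dominant client claims of the SSD, so the HD fills up sooner with each added HD-bound client and more of the SSD is left idle.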
One More HD-bound Client
[Figure: side-by-side DRF allocations. Three clients: HD 100% utilized, SSD 48% utilized. Two clients: HD 100% utilized, SSD 77% utilized.]
Fair Shares
- Fair share of a client
  - IOPS it would get if each resource were partitioned equally among the clients
- Two devices (HD 150 IOPS and SSD 300 IOPS)
  - Client 1: h1 = 4/9
  - Client 2: h2 = 4/9
  - Client 3: h3 = 5/6
[Figure: HD (150 IOPS) and SSD (300 IOPS), each partitioned into three equal 1/3 slices; the per-client IOPS are shown as "?".]
Fair Shares
- Client 1: h1 = 4/9
- Client 2: h2 = 4/9
- Client 3: h3 = 5/6
- Fair share ($f_i$):
  - Client 1: 90 IOPS
  - Client 2: 90 IOPS
  - Client 3: 120 IOPS
  - Depends only on the client’s hit ratio and the capacities of the devices
[Figure: each device partitioned into 1/3 slices. HD (150 IOPS): clients 1 to 3 use 50, 50, and 20 IOPS. SSD (300 IOPS): clients 1 to 3 use 40, 40, and 100 IOPS.]
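The fair shares of 90, 90, and 120 IOPS can be checked by giving each client a 1/3 slice of each device and finding the slice that runs out first. A minimal sketch, assuming requests split exactly in the hit ratio and taking the HD as the 150 IOPS device (the assignment that reproduces the stated fair shares); `fair_share` is an illustrative name:

```python
from fractions import Fraction


def fair_share(h, cap_hd, cap_ssd, n_clients):
    """IOPS a client with hit ratio h sustains on a 1/n slice of each device."""
    hd_slice = Fraction(cap_hd) / n_clients
    ssd_slice = Fraction(cap_ssd) / n_clients
    limits = []
    if h < 1:
        limits.append(hd_slice / (1 - h))   # misses must fit in the HD slice
    if h > 0:
        limits.append(ssd_slice / h)        # hits must fit in the SSD slice
    return min(limits)


for h in (Fraction(4, 9), Fraction(4, 9), Fraction(5, 6)):
    f = fair_share(h, cap_hd=150, cap_ssd=300, n_clients=3)
    print(f, f * (1 - h), f * h)   # fair share, HD IOPS used, SSD IOPS used
```

For the first two clients this prints `90 50 40` and for the third `120 20 100`: clients 1 and 2 exhaust their HD slice, while client 3 exhausts its SSD slice.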
Fairness Policy
- Allocate in the ratio of fair shares?
  - Fair share reflects what a client would get if running alone on its equal partition of each device
- Problem
  - Throttling across devices, similar to the DRF example
- Solution
  - Bottleneck-aware allocation
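Bottleneck-Aware Allocation
- Bottleneck sets
  - Define the load-balancing point: $h_{bal} = C_s / (C_s + C_d)$, where $C_s$ is the SSD capacity and $C_d$ is the HD capacity
  - If $h_i \le h_{bal}$: client i is in the HD-bottleneck set (D)
  - If $h_i > h_{bal}$: client i is in the SSD-bottleneck set (S)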
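A client whose hit ratio equals the load-balancing point loads the two devices in proportion to their capacities; a lower hit ratio saturates the HD first, a higher one the SSD. A minimal sketch of the classification (the function name and the index-list return format are assumptions for illustration):

```python
def bottleneck_sets(hit_ratios, cap_hd, cap_ssd):
    """Split client indices into the HD-bound set D and the SSD-bound set S."""
    h_bal = cap_ssd / (cap_ssd + cap_hd)
    d_set = [i for i, h in enumerate(hit_ratios) if h <= h_bal]
    s_set = [i for i, h in enumerate(hit_ratios) if h > h_bal]
    return h_bal, d_set, s_set


# Motivation example: HD 100 IOPS, SSD 500 IOPS, h = 0.9 and 0.5.
print(bottleneck_sets([0.9, 0.5], cap_hd=100, cap_ssd=500))
# Fair-share example: HD 150 IOPS, SSD 300 IOPS, h = 4/9, 4/9, 5/6.
print(bottleneck_sets([4 / 9, 4 / 9, 5 / 6], cap_hd=150, cap_ssd=300))
```

In the first case the balance point is 5/6, so the h = 0.9 client falls in S and the h = 0.5 client in D. In the second the balance point is 2/3, placing clients 1 and 2 in D and client 3 in S.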
Fairness Requirements of BAA
- Sharing Incentive (SI)
  - No client gets fewer IOPS than it would from equally partitioning each resource
- Envy-Freedom (EF)
  - Clients prefer their own allocation over the allocation of any other client
- Local Fair Share Ratio
  - Clients belonging to the same bottleneck set get IOPS in proportion to their fair shares
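The first two requirements can be tested mechanically for any proposed allocation. The sketch below assumes a client's utility is the throughput it can sustain from a given (HD IOPS, SSD IOPS) pair under its fixed hit ratio; the helper names are hypothetical and the check is an illustration of the definitions, not the authors' code:

```python
def throughput(h, hd_iops, ssd_iops):
    """IOPS a client with hit ratio h can sustain from a device allocation."""
    limits = []
    if h < 1:
        limits.append(hd_iops / (1 - h))
    if h > 0:
        limits.append(ssd_iops / h)
    return min(limits)


def is_envy_free(hit_ratios, allocs, tol=1e-9):
    """True if no client would do better with another client's allocation."""
    for i, h in enumerate(hit_ratios):
        own = throughput(h, *allocs[i])
        if any(throughput(h, *other) > own + tol for other in allocs):
            return False
    return True


def has_sharing_incentive(hit_ratios, allocs, cap_hd, cap_ssd, tol=1e-9):
    """True if every client does at least as well as on a 1/n partition."""
    n = len(hit_ratios)
    return all(
        throughput(h, *allocs[i]) + tol >= throughput(h, cap_hd / n, cap_ssd / n)
        for i, h in enumerate(hit_ratios)
    )


hits = [0.9, 0.5]
good = [(50, 450), (50, 50)]   # (HD IOPS, SSD IOPS) per client
bad = [(80, 480), (20, 20)]    # client 2 is starved
print(is_envy_free(hits, good), has_sharing_incentive(hits, good, 100, 500))
print(is_envy_free(hits, bad), has_sharing_incentive(hits, bad, 100, 500))
```

The first allocation is the fully utilized one from the HD 100 IOPS / SSD 500 IOPS example and passes both checks; the second fails both because client 2 would prefer client 1's allocation and gets less than its equal-partition throughput.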
Bottleneck-Aware Allocation
- Maximize system throughput
- Satisfy fairness requirements
Solution Space Satisfying All Properties
- BAA will match the SI and EF properties of DRF
- BAA achieves utilization at least as high as DRF
[Figure: solution space showing regions labeled Sharing Incentive, Envy Free, and Local Fair Share Ratio, with DRF and the BAA search area marked.]
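Fairness Constraints of BAA
- Fairness between clients in D: [equation missing in transcript]
- Fairness between clients in S: [equation missing in transcript]
- Fairness between a client in D and a client in S:
  - [equation missing in transcript]
  - [count missing in transcript] constraints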
Optimization for Allocation (2-variable LP)
[Equations (1) to (4) of the linear program are missing in the transcript.]
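Since the slide's equations did not survive, the following is a reconstruction from the stated requirements only and should not be read as the authors' formulation. Assumptions: within D all clients get the same HD IOPS `x`, and within S the same SSD IOPS `y` (which is what "in proportion to fair shares" implies under the fixed hit-ratio model); envy-freedom and sharing incentive are added as explicit linear constraints; both bottleneck sets are non-empty. The two-variable LP is solved by enumerating constraint intersections, a generic technique chosen here to avoid external solvers.

```python
from itertools import combinations


def baa_sketch(hit_ratios, cap_hd, cap_ssd, tol=1e-6):
    """Per-client IOPS from a two-variable LP (a reconstruction, see above)."""
    n = len(hit_ratios)
    h_bal = cap_ssd / (cap_ssd + cap_hd)
    d_set = [h for h in hit_ratios if h <= h_bal]  # HD-bottlenecked clients
    s_set = [h for h in hit_ratios if h > h_bal]   # SSD-bottlenecked clients
    if not d_set or not s_set:
        raise ValueError("this sketch assumes both bottleneck sets are non-empty")

    # x = HD IOPS of every client in D, y = SSD IOPS of every client in S.
    # Each constraint (a, b, c) means a*x + b*y <= c.
    constraints = [
        (len(d_set), sum((1 - h) / h for h in s_set), cap_hd),   # HD capacity
        (sum(h / (1 - h) for h in d_set), len(s_set), cap_ssd),  # SSD capacity
        (-1, 0, -cap_hd / n),                       # sharing incentive for D
        (0, -1, -cap_ssd / n),                      # sharing incentive for S
        (max(h / (1 - h) for h in d_set), -1, 0),   # S clients do not envy D
        (-1, max((1 - h) / h for h in s_set), 0),   # D clients do not envy S
    ]
    # Objective: total throughput, x/(1-h) per D client plus y/h per S client.
    obj_x = sum(1 / (1 - h) for h in d_set)
    obj_y = sum(1 / h for h in s_set)

    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel constraint lines have no single intersection
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = obj_x * x + obj_y * y
            if best is None or value > best[0]:
                best = (value, x, y)
    if best is None:
        raise ValueError("no feasible allocation found")
    _, x, y = best
    return [x / (1 - h) if h <= h_bal else y / h for h in hit_ratios]


alloc = baa_sketch([4 / 9, 4 / 9, 5 / 6], cap_hd=150, cap_ssd=300)
print([round(a) for a in alloc])   # prints [96, 96, 257]
```

For the fair-share example this uses both devices fully (450 IOPS in total) while every client stays above its fair share of 90, 90, and 120 IOPS. Under the same assumptions, `baa_sketch([0.9, 0.5], 100, 500)` returns approximately 500 and 100 IOPS, the fully utilized allocation from the earlier efficiency example.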
Evaluation
- Simulation
  - Evaluate BAA’s efficiency
  - Evaluate BAA’s dynamic behavior when the workload changes
- Linux
  - Prototype by interposing the BAA scheduler in the IO path
  - Evaluate BAA’s efficiency and fairness (SI and EF)
Simulation (Efficiency - 2 clients)
- Two clients: h1 = 0.5; h2 = 0.95
- Two devices:
  - HD = 100 IOPS; SSD = 5000 IOPS
- SSD utilization:
  - FQ: 7%
  - DRF: 65%
  - BAA: 100%
Simulation (Efficiency - 3 clients)
- A third client: h3 = 0.8
- SSD utilization:
  - FQ: 6%
  - DRF: 45%
  - BAA: 71% (bounded by fairness)
Simulation (Dynamic Behavior)
- Two clients
  - h1 = 0.45, changing to 0.2 after 510 s
  - h2 = 0.95
- Two devices:
  - HD = 200 IOPS
  - SSD = 3000 IOPS
- Utilization returns to a high level a short time after the change
Linux (Efficiency-Throughput)
- Two clients:
  - Financial workload (h1 = 0.3)
  - Exchange workload (h2 = 0.95)
- Total throughput:
  - BAA: 1396 IOPS
  - DRF: 810 IOPS
  - CFQ: 1011 IOPS
Linux (Efficiency-Utilization)
- Average utilization:
  - BAA: HD 94%, SSD 92%
  - DRF: HD 99%, SSD 78%
  - CFQ: HD 99.8%, SSD 83%
Linux (Fairness – Sharing Incentive)
- Four financial clients
  - h1 = 0.2 (D set)
  - h2 = 0.4 (D set)
  - h3 = 0.98 (S set)
  - h4 = 1.0 (S set)
- Every client receives at least its fair share.
  - Proportional to fair share
[Figure: bar chart with a log-scale IOPS axis (1 to 10000) comparing Fair Share and Throughput for Clients 1 to 4.]
Linux (Fairness – Envy freedom)
- No one envies others’ allocation
  - No one gets a higher allocation on all devices
  - D set: higher HD allocation
  - S set: higher SSD allocation
[Figure: bar chart with a log-scale IOPS axis (1 to 10000) showing the HD and SSD allocations of Clients 1 to 4.]
Conclusions and Future Work
- A new model (BAA) to balance fairness and efficiency
  - Fairness:
    - Sharing Incentive
    - Envy Freedom
    - Local Fair Share
  - Efficiency:
    - Maximize utilization subject to fairness constraints
Ongoing Work
- Apply BAA to broader multi-resource allocation
  - CPU, memory, networks
- Other fairness policies
  - Cost, reservations
- Cache model
  - SSD as a cache of HD
  - Data migration