Balancing Fairness and Efficiency in Tiered Storage Systems with Bottleneck-Aware Allocation

Hui Wang, Peter Varman
Rice University

FAST’14, Feb 2014

Tiered Storage

- Tiered storage: HDs and SSDs
  - Advantages:
    - Performance
    - Cost
  - Challenges:
    - Fair resource allocation
    - High system efficiency
      - Variable system throughput

Tiered Storage Model

- Clients: make requests to the SSD (hit) and the HD (miss) in a certain ratio
- Scheduler: aware of each request's target; dispatches requests to storage
- Storage: SSD and HD are independent, without frequent data migrations

Fairness and Efficiency in Tiered Storage

- How do we define fairness?
  - How do we define fairness for multiple resources?
  - A fair allocation may cause low efficiency
- How do we improve the efficiency of both devices?
  - Focusing only on efficiency may cause unfairness

Existing Solutions for QoS Scheduling

- Proportional sharing in storage / IO scheduling
  - Extended from network and CPU scheduling
  - Additional reservation and limit controls
  - All of them are designed for a single resource!
- Dominant Resource Fairness (DRF) model [NSDI’11]
  - Designed for allocating multiple resources
  - DRF does not explicitly address system utilization

Talk Outline

- Motivation
- Bottleneck-Aware Allocation (BAA)
- Evaluation
- Conclusions and future work

Example: Single Device Type

- Configuration:
  - Single HD with capacity 100 IOPS
  - Two clients with equal weights
    - Fully backlogged, work-conserving
  - Proportional sharing
- Results:
  - Each client gets 50 IOPS
  - Utilization is 100%
- The device can be fully utilized for any allocation ratio

[Figure: one HD with 100 IOPS capacity; each client receives 50 IOPS; utilization 100%.]

What if there are multiple resources?

Example: Multiple Devices (Fairness)

- Natural policy: Weighted Fair Queuing (WFQ)
- Configuration:
  - HD capacity 100 IOPS, SSD capacity 500 IOPS
  - Two clients with hit ratios h1 = 0.9, h2 = 0.5
  - Conventional WFQ with weights 1:1
- Results:
  - Each client gets 167 IOPS
  - HD utilization is 100%, but SSD utilization is only 47%
- Simply transferring WFQ to multiple resources causes an efficiency problem!
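The numbers above follow from a simple capacity calculation. Below is a minimal sketch, assuming fully backlogged clients with equal weights, where a client with hit ratio h sends a fraction h of its IOs to the SSD and 1 - h to the HD. The function name `equal_share` is illustrative and not from the talk.

```python
def equal_share(hit_ratios, c_hd, c_ssd):
    """Give every client the same IOPS until the first device saturates."""
    # Device load generated when every client runs at 1 IOPS.
    hd_demand = sum(1 - h for h in hit_ratios)
    ssd_demand = sum(hit_ratios)
    bounds = []
    if hd_demand > 0:
        bounds.append(c_hd / hd_demand)
    if ssd_demand > 0:
        bounds.append(c_ssd / ssd_demand)
    x = min(bounds)  # per-client IOPS at which the bottleneck device is full
    return x, x * hd_demand / c_hd, x * ssd_demand / c_ssd


iops, hd_util, ssd_util = equal_share([0.9, 0.5], c_hd=100, c_ssd=500)
print(round(iops, 1), round(hd_util, 2), round(ssd_util, 2))  # 166.7 1.0 0.47
```

The HD saturates first, so the SSD is left more than half idle even though both clients are treated identically.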

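[Figure (capacity normalized): HD (100 IOPS) serves 16.7 IOPS for client 1 and 83.3 IOPS for client 2, utilization 100%; SSD (500 IOPS) serves 150 IOPS for client 1 and 83.3 IOPS for client 2, utilization 47%, remainder idle.]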

Example: Multiple Devices (Efficiency)

- Configuration:
  - HD capacity 100 IOPS, SSD capacity 500 IOPS
  - Two clients with hit ratios h1 = 0.9, h2 = 0.5
- Results:
  - Utilization is 100% on both devices
  - Client 1 gets 500 IOPS
  - Client 2 gets 100 IOPS
- It is not possible to precisely assign both the relative allocations (fairness) and the system utilization (efficiency).
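The 100% result can be checked by asking which throughputs x_1 and x_2 saturate both devices at once. A short worked calculation, assuming each client i sends a fraction h_i of its IOs to the SSD and 1 - h_i to the HD:

```latex
\begin{aligned}
\text{HD:}\quad  & 0.1\,x_1 + 0.5\,x_2 = 100 \\
\text{SSD:}\quad & 0.9\,x_1 + 0.5\,x_2 = 500 \\
\Rightarrow\quad & 0.8\,x_1 = 400, \quad x_1 = 500, \quad x_2 = \frac{100 - 50}{0.5} = 100
\end{aligned}
```

Full utilization therefore pins the allocation ratio at 5:1, leaving no freedom to choose that ratio for fairness.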

[Figure (normalized): HD (100 IOPS) serves 50 IOPS for each client, utilization 100%; SSD (500 IOPS) serves 450 IOPS for client 1 and 50 IOPS for client 2, utilization 100%.]

DRF (Dominant Resource Fairness)

- Configuration:
  - HD 100 IOPS
  - SSD 500 IOPS
  - Two clients
    - h1 = 0.9 (dominant resource: SSD)
    - h2 = 0.5 (dominant resource: HD)
- What will DRF do?
  - Equalize dominant shares
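Equalizing dominant shares can be sketched in a few lines. This is a simplified illustration rather than a full DRF implementation: it assumes every client is fully backlogged and uses both devices (0 < h < 1), so all clients stop growing as soon as the first device saturates. The name `drf_allocation` is hypothetical.

```python
def drf_allocation(hit_ratios, c_hd, c_ssd):
    """Equalize dominant shares; return (dominant share, per-client IOPS)."""
    # Fraction of each device consumed by one IOPS of each client.
    hd_cost = [(1 - h) / c_hd for h in hit_ratios]
    ssd_cost = [h / c_ssd for h in hit_ratios]
    dominant = [max(a, b) for a, b in zip(hd_cost, ssd_cost)]
    # With a common dominant share s, client i runs at s / dominant[i] IOPS.
    hd_per_s = sum(a / d for a, d in zip(hd_cost, dominant))
    ssd_per_s = sum(b / d for b, d in zip(ssd_cost, dominant))
    # Raise s until the first device is fully used.
    s = min(1 / hd_per_s, 1 / ssd_per_s)
    return s, [s / d for d in dominant]


hits = [0.9, 0.5]
share, iops = drf_allocation(hits, c_hd=100, c_ssd=500)
ssd_util = sum(x * h for x, h in zip(iops, hits)) / 500
print(round(share, 2), [round(x) for x in iops], round(ssd_util, 2))  # 0.64 [357, 129] 0.77
```

Under these assumptions both dominant shares come out at about 64% and the SSD at about 77% utilization, in line with the slide.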

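[Figure (normalized): HD serves 36 IOPS for client 1 and 64 IOPS for client 2, utilization 100%; SSD serves 324 IOPS for client 1 and 64 IOPS for client 2, utilization 77%, remainder idle. Both dominant shares are 64%.]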

DRF

- Does not address efficiency
  - Add a third client with h3 = 0.1
  - SSD utilization is further reduced to 48%
  - Worse if more clients are bottlenecked on the HD
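The 48% figure can be reproduced by hand. A worked calculation, assuming all three clients are held to a common dominant share s (client 1 on the SSD, clients 2 and 3 on the HD) and that the HD is the device that saturates:

```latex
\begin{aligned}
x_1 &= \frac{500\,s}{0.9}, \qquad x_2 = \frac{100\,s}{0.5}, \qquad x_3 = \frac{100\,s}{0.9} \\
\text{HD:}\quad & 0.1\,x_1 + 0.5\,x_2 + 0.9\,x_3 = (55.6 + 100 + 100)\,s = 100
  \;\Rightarrow\; s \approx 0.39 \\
\text{SSD:}\quad & 0.9\,x_1 + 0.5\,x_2 + 0.1\,x_3 \approx 195.7 + 39.1 + 4.3 = 239.1
  \;\Rightarrow\; 239.1 / 500 \approx 48\%
\end{aligned}
```

Each additional HD-bound client takes HD capacity away from the SSD-bound client, which in turn drags SSD usage down.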

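[Figure: HD (100 IOPS) serves 22, 39, and 39 IOPS for clients 1, 2, and 3, utilization 100%; SSD (500 IOPS) serves 196, 39, and 5 IOPS, utilization 48%, remainder idle. All three dominant shares are 39%.]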

One More HD-bound Client

[Figure (normalized): side-by-side comparison of the two-client DRF allocation (SSD utilization 77%, dominant shares 64%) and the three-client DRF allocation (SSD utilization 48%, dominant shares 39%).]

Talk Outline

- Motivation
- Bottleneck-Aware Allocation (BAA)
- Evaluation
- Conclusions and future work

Fair Shares

- Fair share of a client
  - The IOPS it would get if each resource were partitioned equally among the clients
- Two devices (150 IOPS and 300 IOPS)
  - Client 1: h1 = 4/9
  - Client 2: h2 = 4/9
  - Client 3: h3 = 5/6

[Figure: HD (150 IOPS) and SSD (300 IOPS), each split into three equal 1/3 partitions; the IOPS each client obtains on each device is left as a question.]

Fair Shares

- Client 1: h1 = 4/9
- Client 2: h2 = 4/9
- Client 3: h3 = 5/6
- Fair share ($f_i$):
  - Client 1: 90 IOPS
  - Client 2: 90 IOPS
  - Client 3: 120 IOPS
  - Depends only on the client’s hit ratio and the capacities of the devices
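The fair shares listed above can be reproduced with a short calculation. The sketch below assumes the 150 IOPS device is the HD and the 300 IOPS device is the SSD, and that a client with hit ratio h is limited by whichever 1/n slice it exhausts first. The helper name `fair_share` is illustrative; `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def fair_share(h, c_hd, c_ssd, n_clients):
    """IOPS a client with hit ratio h reaches on a 1/n slice of each device."""
    h = Fraction(h)
    hd_slice = Fraction(c_hd) / n_clients
    ssd_slice = Fraction(c_ssd) / n_clients
    # Misses (1 - h) go to the HD slice, hits (h) go to the SSD slice.
    hd_bound = hd_slice / (1 - h) if h < 1 else float("inf")
    ssd_bound = ssd_slice / h if h > 0 else float("inf")
    return min(hd_bound, ssd_bound)


hits = [Fraction(4, 9), Fraction(4, 9), Fraction(5, 6)]
print([float(fair_share(h, 150, 300, 3)) for h in hits])  # [90.0, 90.0, 120.0]
```

Clients 1 and 2 are limited by their 50 IOPS HD slice, while client 3 is limited by its 100 IOPS SSD slice.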

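[Figure: each device split into 1/3 partitions. HD (150 IOPS): clients 1 and 2 use 50 IOPS each, client 3 uses 20 IOPS; SSD (300 IOPS): clients 1 and 2 use 40 IOPS each, client 3 uses 100 IOPS.]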

Fairness Policy

- Allocate in the ratio of fair shares?
  - The fair share reflects what a client would get if running alone on its equal partition of each device
- Problem
  - Throttling across devices, similar to the DRF example
- Solution
  - Bottleneck-aware allocation

Bottleneck-Aware Allocation

- Bottleneck sets
  - Define the load-balancing point $h_{bal} = C_s / (C_s + C_d)$
  - If $h_i \le h_{bal}$: client i is in the HD-bottleneck set (D)
  - If $h_i > h_{bal}$: client i is in the SSD-bottleneck set (S)
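The load-balancing point can be motivated as follows. As a sketch, assume C_s and C_d denote the SSD and HD capacities in IOPS. A client running alone at x IOPS keeps the HD busy a fraction (1 - h_i) x / C_d of the time and the SSD a fraction h_i x / C_s of the time; the HD is its bottleneck when the first is at least the second:

```latex
\frac{1 - h_i}{C_d} \ge \frac{h_i}{C_s}
\;\Longleftrightarrow\; C_s\,(1 - h_i) \ge C_d\,h_i
\;\Longleftrightarrow\; h_i \le \frac{C_s}{C_s + C_d} = h_{bal}
```

For the 100 IOPS HD and 500 IOPS SSD used in the earlier examples, h_bal = 500/600 ≈ 0.83, so the client with h = 0.9 is SSD-bound and the client with h = 0.5 is HD-bound, matching the dominant resources in the DRF example.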

Fairness Requirements of BAA

- Sharing Incentive (SI)
  - No client gets fewer IOPS than it would from equally partitioning each resource
- Envy-Freedom (EF)
  - Each client prefers its own allocation over the allocation of any other client
- Local Fair Share Ratio
  - Clients that belong to the same bottleneck set get IOPS in proportion to their fair shares

Bottleneck-Aware Allocation

- Maximize system throughput
- Satisfy fairness requirements

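Solution Space Satisfying All Properties

- BAA matches the SI and EF properties of DRF
- BAA achieves utilization at least as good as DRF's

[Figure: solution space diagram with regions labeled Sharing Incentive, Envy Free, and Local Fair Share Ratio; the DRF point and the BAA search area are marked.]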

Fairness Constraints of BAA

- Fairness between clients in D:
- Fairness between clients in S:
- Fairness between a client in D and a client in S:
  - constraints
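One way to write such constraints, derived here from the SI, EF, and local fair share definitions rather than quoted from the slide: let ρ_i be the IOPS allocated to client i, f_i its fair share, and m_i = 1 - h_i its miss ratio (ρ_i and m_i are notation assumed for this sketch).

```latex
\begin{aligned}
i, j \in D \ \text{or}\ i, j \in S:\quad & \frac{\rho_i}{\rho_j} = \frac{f_i}{f_j} \\
i \in D,\ j \in S:\quad & \rho_i\,m_i \ge \rho_j\,m_j
   \quad\text{and}\quad \rho_j\,h_j \ge \rho_i\,h_i \\
\text{equivalently, for } h_i > 0:\quad & \frac{m_j}{m_i} \le \frac{\rho_i}{\rho_j} \le \frac{h_j}{h_i}
\end{aligned}
```

The cross-set condition says that the HD-bound client receives at least as much HD bandwidth and the SSD-bound client at least as much SSD bandwidth, so neither would gain throughput by swapping allocations.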

Optimization for Allocation (2-variable LP)
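As a rough illustration of how such a two-variable LP could be set up (an assumed formulation, not a reproduction of the slide's equations): scale the fair shares of all clients in D by a common factor `alpha_d` and those in S by `alpha_s`, which enforces the local fair share ratio. Then maximize total IOPS subject to device capacities, sharing incentive (both factors at least 1), and envy-freedom between the two sets. The solver below simply enumerates intersections of constraint boundaries; all names are hypothetical.

```python
from itertools import combinations


def fair_share(h, c_hd, c_ssd, n):
    hd = (c_hd / n) / (1 - h) if h < 1 else float("inf")
    ssd = (c_ssd / n) / h if h > 0 else float("inf")
    return min(hd, ssd)


def baa_sketch(hits, c_hd, c_ssd):
    n = len(hits)
    h_bal = c_ssd / (c_hd + c_ssd)
    f = [fair_share(h, c_hd, c_ssd, n) for h in hits]
    D = [i for i, h in enumerate(hits) if h <= h_bal]  # HD-bottleneck set
    S = [i for i, h in enumerate(hits) if h > h_bal]   # SSD-bottleneck set
    # Each constraint (a, b, c) means a*alpha_d + b*alpha_s <= c.
    cons = [
        # HD and SSD capacity.
        (sum(f[i] * (1 - hits[i]) for i in D), sum(f[j] * (1 - hits[j]) for j in S), c_hd),
        (sum(f[i] * hits[i] for i in D), sum(f[j] * hits[j] for j in S), c_ssd),
        # Sharing incentive: alpha_d >= 1 and alpha_s >= 1.
        (-1.0, 0.0, -1.0),
        (0.0, -1.0, -1.0),
    ]
    for i in D:
        for j in S:
            # Envy-freedom: i gets at least j's HD usage, j at least i's SSD usage.
            cons.append((-f[i] * (1 - hits[i]), f[j] * (1 - hits[j]), 0.0))
            cons.append((f[i] * hits[i], -f[j] * hits[j], 0.0))
    gain = (sum(f[i] for i in D), sum(f[j] for j in S))
    best = None
    # An LP optimum lies at a vertex: test every pairwise boundary intersection.
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + 1e-7 for a, b, c in cons):
            total = gain[0] * x + gain[1] * y
            if best is None or total > best[0]:
                best = (total, x, y)
    total, alpha_d, alpha_s = best
    rho = [f[i] * (alpha_d if i in D else alpha_s) for i in range(n)]
    return rho, total


rho, total = baa_sketch([4 / 9, 4 / 9, 5 / 6], c_hd=150, c_ssd=300)
print([round(r, 1) for r in rho], round(total, 1))  # [96.4, 96.4, 257.1] 450.0
```

For the three-client fair share example (150 and 300 IOPS devices), this sketch gives the two D-set clients about 96 IOPS each and the S-set client about 257 IOPS, which saturates both devices while every client still meets its fair share.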


Evaluation

- Simulation
  - Evaluate BAA’s efficiency
  - Evaluate BAA’s dynamic behavior when the workload changes
- Linux
  - Prototype built by interposing the BAA scheduler in the IO path
  - Evaluate BAA’s efficiency and fairness (SI and EF)

Simulation (Efficiency - 2 clients)

- Two clients: h1 = 0.5, h2 = 0.95
- Two devices:
  - HD = 100 IOPS, SSD = 5000 IOPS
- SSD utilization:
  - FQ: 7%
  - DRF: 65%
  - BAA: 100%

Simulation (Efficiency - 3 clients)

- A third client: h3 = 0.8
- SSD utilization:
  - FQ: 6%
  - DRF: 45%
  - BAA: 71% (bounded by fairness)

Simulation (Dynamic Behavior)

- Two clients
  - h1 = 0.45, changing to 0.2 after 510 s
  - h2 = 0.95
- Two devices:
  - HD = 200 IOPS
  - SSD = 3000 IOPS
- Utilization returns to a high level after a short period

Linux (Efficiency-Throughput)

- Two clients:
  - Financial workload (h1 = 0.3)
  - Exchange workload (h2 = 0.95)
- Total throughput:
  - BAA: 1396 IOPS
  - DRF: 810 IOPS
  - CFQ: 1011 IOPS

Linux (Efficiency-Utilization)

- Average utilization:
  - BAA: HD 94%, SSD 92%
  - DRF: HD 99%, SSD 78%
  - CFQ: HD 99.8%, SSD 83%

Linux (Fairness – Sharing Incentive)

- Four financial clients
  - h1 = 0.2 (D set)
  - h2 = 0.4 (D set)
  - h3 = 0.98 (S set)
  - h4 = 1.0 (S set)
- Every client receives at least its fair share
  - Proportional to fair share

[Figure: bar chart of fair share versus throughput for Clients 1 to 4; y-axis IOPS, log scale from 1 to 10000.]

Linux (Fairness – Envy freedom)

[Figure: bar chart of HD and SSD allocations for Clients 1 to 4; y-axis IOPS, log scale from 1 to 10000.]

- No one envies another client's allocation
  - No client gets a higher allocation on all devices
  - D set: higher HD allocation
  - S set: higher SSD allocation

Conclusions and Future Work

- A new model (BAA) to balance fairness and efficiency
  - Fairness:
    - Sharing Incentive
    - Envy-freedom
    - Local Fair Share
  - Efficiency:
    - Maximize utilization subject to fairness constraints

Ongoing Work

- Apply BAA to broader multi-resource allocation
  - CPU, memory, networks
- Other fairness policies
  - Cost, reservations
- Cache model
  - SSD as a cache for the HD
  - Data migration
