Load Balancing Tasks with Overlapping Requirements
Milan Vojnovic, Microsoft Research
Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias, Bo Zong
2
Motivating Application Scenario: Stream Processing Platforms
3
Tasks and Requirements
4
5
Problem #1: Bi-Criteria Load Balancing
Query Assignment Problem:
• Find an assignment of tasks to machines that satisfies two criteria:
Criterion 1: minimizes the total number of distinct requirements that need to be supplied to machines
Criterion 2: balances the number of tasks assigned across machines
6
Problem #2: Min-Max Load Balancing
Query Assignment Problem:
• Find an assignment of tasks to machines that minimizes the maximum number of distinct requirements needed by a machine
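A brute-force sketch that makes the min-max objective concrete for toy instances. It assumes each task is encoded as a Python set of requirement labels (a hypothetical encoding, not from the slides), ignores any balance constraint, and enumerates every assignment, so it is exponential and purely illustrative.

```python
from itertools import product

def max_distinct_requirements(tasks, assignment, num_machines):
    """Largest number of distinct requirements needed by any single machine."""
    per_machine = [set() for _ in range(num_machines)]
    for task, machine in zip(tasks, assignment):
        per_machine[machine] |= task          # union of requirements placed on this machine
    return max(len(reqs) for reqs in per_machine)

def brute_force_min_max(tasks, num_machines):
    """Try all num_machines ** len(tasks) assignments (toy sizes only)."""
    best_cost, best_assignment = None, None
    for assignment in product(range(num_machines), repeat=len(tasks)):
        cost = max_distinct_requirements(tasks, assignment, num_machines)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

tasks = [{"a", "b"}, {"a", "c"}, {"d", "e"}, {"d", "f"}]
cost, assignment = brute_force_min_max(tasks, num_machines=2)
print(cost, assignment)   # prints: 3 (0, 0, 1, 1)
```

The two tasks sharing requirement a go to one machine and the two sharing d go to the other. Since 6 distinct requirements must be split over 2 machines, no assignment can get the maximum below 3.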
7
Other Motivating Application Scenarios
• Scheduling tasks in distributed clusters of machines with data locality
• …
• Beyond resource allocation in data centres:
• Clustering of information objects (documents, images, videos)
• Summarizing topics for collections of documents
• …
8
Related Work
Standard load balancing
• Identical machines: Graham 1996
• Related machines: Aspnes et al. 1993; Cho and Sahni 1988
• Restricted machines: Azar et al. 1992
• Unrelated machines: Aspnes et al. 1993
• Routing: Aspnes et al. 1993
Min-max multiway cut: Bansal et al. 2014; Svitkina and Tardos 2004
9
Problem #1: Bi-Criteria Load Balancing
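Minimize $\sum_{i} f(Q_i)$, where $Q_i$ is the set of tasks assigned to machine $i$
subject to the numbers of tasks $|Q_i|$ being balanced across machines
for $f(Q') = \sum_{s \in S} w(s)\, \mathbf{1}(s \text{ required by some } q \in Q')$,
with $S$ the set of requirements, $Q$ the set of tasks, and $S_q \subseteq S$ the requirements of task $q$, for every $q \in Q$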
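A direct transcription of $f(Q')$ and of the two criteria, as a sketch. It assumes requirements are labels, `requirements_of` maps each task $q$ to $S_q$, `weight` maps each requirement $s$ to $w(s)$, and an assignment is a list of task sets, one per machine. All names are illustrative.

```python
def f(task_subset, requirements_of, weight):
    """f(Q') = sum of w(s) over requirements s needed by at least one task in Q'."""
    needed = set()
    for q in task_subset:
        needed |= requirements_of[q]          # S_q, the requirements of task q
    return sum(weight[s] for s in needed)     # each requirement counted once per machine

def bi_criteria_cost(machines, requirements_of, weight):
    """Return (sum of f(Q_i) over machines, list of task counts |Q_i|)."""
    total = sum(f(q_i, requirements_of, weight) for q_i in machines)
    loads = [len(q_i) for q_i in machines]
    return total, loads

requirements_of = {"q1": {"s1", "s2"}, "q2": {"s2"}, "q3": {"s3"}, "q4": {"s1", "s3"}}
weight = {"s1": 1.0, "s2": 2.0, "s3": 0.5}
machines = [{"q1", "q2"}, {"q3", "q4"}]
print(bi_criteria_cost(machines, requirements_of, weight))   # prints: (4.5, [2, 2])
```

Requirement s1 is paid for twice because q1 and q4 sit on different machines; moving them together would lower the first criterion but may upset the balance in the second.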
10
NP-Hardness
• The Query Assignment Problem is NP-complete
Proof: reduction from the well-known bin packing problem
11
Random Query Assignment
• Maximum number of tasks per machine:
with probability
[Raab and Steger, 1998]
• The expected number of requirements needed by the machines:
= number of tasks needing requirement
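Under uniform random assignment to $k$ machines (our notation), a requirement needed by $d_s$ tasks is missing from a given machine with probability $(1 - 1/k)^{d_s}$, so by linearity of expectation it is supplied to $k\,(1 - (1 - 1/k)^{d_s})$ machines on average. The sketch below computes this closed form (our derivation, not copied from the slide) and checks it by simulation with unit weights; function names are hypothetical.

```python
import random

def expected_supplied(d, k, w=None):
    """Closed form: sum over s of w(s) * k * (1 - (1 - 1/k) ** d[s]).
    d maps each requirement s to the number of tasks needing it; k = number of machines."""
    if w is None:
        w = {s: 1.0 for s in d}               # unit weights by default
    return sum(w[s] * k * (1 - (1 - 1 / k) ** d[s]) for s in d)

def simulate_supplied(tasks, k, trials=2000, seed=0):
    """Monte Carlo estimate (unit weights): each task, a set of requirements,
    is placed on a uniformly random machine."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        per_machine = [set() for _ in range(k)]
        for reqs in tasks:
            per_machine[rng.randrange(k)] |= reqs
        total += sum(len(r) for r in per_machine)
    return total / trials

tasks = [{"s1"}, {"s1"}, {"s1", "s2"}, {"s2"}]
d = {"s1": 3, "s2": 2}                        # number of tasks needing each requirement
print(expected_supplied(d, k=2))              # 2*(1 - 1/8) + 2*(1 - 1/4) = 3.25
print(simulate_supplied(tasks, k=2))          # close to 3.25
```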
12
Deficiency of Random Query Assignment
[Figure: groups of $n/l$ tasks and groups of $m/l$ requirements]
• Expected number of needed requirements:
as
• Optimal:
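One plausible reading of the figure, stated as an assumption: $l$ machines and $l$ disjoint groups, each with $n/l$ tasks that all need the same $m/l$ requirements (unit weights). Placing each group on its own machine is balanced and supplies every requirement exactly once, for a total of $m$. Uniform random assignment supplies each requirement to $l\,(1 - (1 - 1/l)^{n/l})$ machines in expectation, which tends to $l$ as $n$ grows. The sketch below evaluates both totals.

```python
def deficiency_example(n, m, l):
    """Assumed block instance: l groups of n // l tasks, each group needing its own
    m // l requirements, and l machines. Returns (random expected total, optimal total)."""
    d = n // l                                        # tasks needing each requirement
    random_expected = m * l * (1 - (1 - 1 / l) ** d)  # expected copies under random assignment
    optimal = m                                       # one group per machine: one copy each
    return random_expected, optimal

for n in (10, 100, 1000):
    rand, opt = deficiency_example(n, m=20, l=5)
    print(f"n={n}: random ~ {rand:.2f}, optimal = {opt}, ratio ~ {rand / opt:.2f}")
```

With $l = 5$ the ratio climbs toward 5: random assignment ends up replicating almost every requirement on every machine.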
13
Special Case: Tasks with Singleton Requirements
• There exists a polynomial-time algorithm that guarantees a 2-approximation for tasks with singleton requirements and arbitrary requirement weights
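The slide does not spell out the algorithm. As an illustration only, here is a simple sequential-fill heuristic for the unit-weight special case (not necessarily the algorithm from the talk). It groups tasks by their single requirement, then fills machines in order up to $\lceil n/k \rceil$ tasks each ($n$ tasks, $k$ machines, our notation). Each machine boundary splits at most one group, so there are at most (machines used - 1) extra copies. Every requirement must be supplied at least once, and every non-empty machine needs at least one requirement, so under this balance cap the cost is at most twice the optimum. The weighted case on the slide needs more care and is not covered here.

```python
from collections import defaultdict
from math import ceil

def sequential_fill(task_requirement, k):
    """task_requirement[t] is the single requirement of task t (unit weights assumed).
    Returns (assignment list, number of (machine, requirement) copies supplied)."""
    n = len(task_requirement)
    capacity = ceil(n / k)                  # balance cap: at most ceil(n/k) tasks per machine
    groups = defaultdict(list)              # requirement -> tasks needing it
    for t, s in enumerate(task_requirement):
        groups[s].append(t)
    assignment = [None] * n
    supplied = set()                        # distinct (machine, requirement) pairs
    machine, load = 0, 0
    for s, members in groups.items():       # tasks of one group are placed consecutively
        for t in members:
            if load == capacity:            # current machine is full: open the next one
                machine, load = machine + 1, 0
            assignment[t] = machine
            supplied.add((machine, s))
            load += 1
    return assignment, len(supplied)

tasks = ["a", "a", "a", "b", "b", "c", "c", "c"]
assignment, cost = sequential_fill(tasks, k=3)
print(assignment, cost)   # prints: [0, 0, 0, 1, 1, 1, 2, 2] 4
```

Here group c is split across machines 1 and 2, giving 4 copies versus an optimum of 3 (a, b, c each on their own machine).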
14
Algorithm
15
Tasks with Arbitrary Sets of Requirements
• For unit-weight requirements, there exists a polynomial-time algorithm
with approximation ratio
where is the maximum number of requirements of a task
• For arbitrary-weight requirements, the same approximation ratio holds but with an extra factor: the ratio of the max to the min weight
16
Gadget: Minimum Task Type Packing
• Given a set of requirements $S$, a set of tasks $Q$, and a real number
• Find a subset of query types that minimizes
subject to
17
Algorithm
1. Pick an empty machine
2. Find a subset of query types that approximately solves the MQP problem with parameter
3. Let be the subset of unassigned queries of type in
4. If then apply a pruning procedure
5. If there are unassigned queries, go to 1
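A skeleton of the loop above, under stated assumptions. A query type is taken to be a set of queries sharing the same requirement set, and machines hold at most $\lceil n/k \rceil$ queries. The MQP step (step 2) is replaced by a hypothetical greedy stand-in, `greedy_pick_types`, which adds the type with the fewest new requirements per unassigned query. The pruning in step 4 is modelled simply as truncating to capacity. This shows the control flow only and does not carry the approximation guarantee of the actual MQP-based algorithm.

```python
from math import ceil

def greedy_pick_types(types, unassigned, capacity):
    """Hypothetical stand-in for step 2 (MQP): repeatedly add the query type with the
    fewest new requirements per unassigned query until capacity queries are covered."""
    chosen, reqs, count = [], set(), 0
    candidates = {u for u in types if unassigned[u] > 0}
    while candidates and count < capacity:
        best = min(candidates,
                   key=lambda u: (len(types[u] - reqs) / unassigned[u], u))
        chosen.append(best)
        reqs |= types[best]
        count += unassigned[best]
        candidates.remove(best)
    return chosen

def fill_machines(types, counts, k):
    """types: query type -> requirement set; counts: query type -> number of queries.
    Returns one dict per machine mapping query type -> queries placed there."""
    capacity = ceil(sum(counts.values()) / k)
    unassigned = dict(counts)
    machines = []
    while any(unassigned.values()):                  # step 5: loop while queries remain
        load, used = {}, 0                           # step 1: pick an empty machine
        for u in greedy_pick_types(types, unassigned, capacity):   # step 2
            take = min(unassigned[u], capacity - used)             # steps 3-4: prune to capacity
            if take > 0:
                load[u] = take
                unassigned[u] -= take
                used += take
        machines.append(load)
    return machines

types = {"A": {"s1", "s2"}, "B": {"s2", "s3"}, "C": {"s4"}, "D": {"s4", "s5"}}
counts = {"A": 2, "B": 2, "C": 2, "D": 2}
print(fill_machines(types, counts, k=2))   # prints: [{'C': 2, 'D': 2}, {'A': 2, 'B': 2}]
```

On this toy input the types sharing s4 land together and the types sharing s2 land together, so each of the 5 requirements is supplied to exactly one machine.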
18
Experimental Evaluation
• Random bipartite graph for subscriptions of tasks to requirements
• Number of tasks per requirement according to a Zipf distribution
• Number of requirements per task fixed to a constant
• Metric: replication factor = total number of needed requirements / m
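A sketch of this evaluation setup with the assumptions made explicit. Each task draws a fixed number of distinct requirements with Zipf-like popularity, weight $1/i^{\alpha}$ for the requirement of rank $i$; $\alpha$ is a free parameter because the slide's value is missing. The replication factor divides the total number of (machine, requirement) copies by the number of distinct requirements in the instance, which equals $m$ when every requirement is needed by some task. The generator only approximates the slide's random bipartite graph, and the function names are hypothetical.

```python
import random

def zipf_instance(num_tasks, num_reqs, reqs_per_task, alpha, seed=0):
    """Hypothetical generator: each task picks reqs_per_task distinct requirements,
    requirement of rank i (1-based) drawn with weight 1 / i**alpha.
    Requires reqs_per_task <= num_reqs."""
    rng = random.Random(seed)
    weights = [1 / (i ** alpha) for i in range(1, num_reqs + 1)]
    tasks = []
    for _ in range(num_tasks):
        chosen = set()
        while len(chosen) < reqs_per_task:       # redraw until the set has enough distinct items
            chosen.add(rng.choices(range(num_reqs), weights=weights, k=1)[0])
        tasks.append(chosen)
    return tasks

def replication_factor(tasks, assignment, k):
    """Total (machine, requirement) copies / number of distinct requirements needed."""
    per_machine = [set() for _ in range(k)]
    for reqs, machine in zip(tasks, assignment):
        per_machine[machine] |= reqs
    distinct = set().union(*tasks)
    return sum(len(r) for r in per_machine) / len(distinct)

tasks = zipf_instance(num_tasks=1000, num_reqs=200, reqs_per_task=3, alpha=1.0)
k = 10
rng = random.Random(1)
random_assignment = [rng.randrange(k) for _ in tasks]
print(replication_factor(tasks, random_assignment, k))
```

The replication factor always lies between 1 (everything on one machine) and $k$ (every requirement on every machine), which makes it a convenient scale for comparing the offline and online strategies.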
19
Offline Algorithms
• MQP = the algorithm defined in an earlier slide
• OffRand = uniform random assignment of a query type to a machine
• IC = Incremental cost
• MMS = Min-max traffic cost per machine
20
Performance of Offline Algorithms
[Figure: x-axis = number of requirements per task]
21
Online Task Assignment
• LeastCost
• LeastSource
• LeastQT
22
Performance of Online Algorithms
[Figure: x-axis = number of requirements per task]
23
Problem #2: Min-Max Load Balancing
Minimize
subject to
24
Online Task Assignment
• At each arrival of task
• Compute for every
• Assign task to machine in
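The formulas on this slide are missing, so what follows is one plausible instantiation rather than the exact rule. For each arriving task, compute for every machine the number of the task's requirements that machine does not yet have, then assign the task to a machine minimizing that count among machines below a load cap, breaking ties by load. The cap and the tie-breaking rule are assumptions.

```python
from math import inf

def online_greedy(task_stream, k, capacity=inf):
    """Assign tasks (sets of requirements) one at a time to k machines.
    cost_i = number of the task's requirements not yet present on machine i.
    Assumes capacity * k >= number of tasks, so some machine is always open."""
    reqs = [set() for _ in range(k)]
    loads = [0] * k
    assignment = []
    for task in task_stream:
        open_machines = [i for i in range(k) if loads[i] < capacity]
        # minimum new requirements first, then fewest tasks, then lowest index
        best = min(open_machines, key=lambda i: (len(task - reqs[i]), loads[i], i))
        reqs[best] |= task
        loads[best] += 1
        assignment.append(best)
    return assignment, max(len(r) for r in reqs)

stream = [{"a", "b"}, {"x", "y"}, {"a", "c"}, {"x", "z"}, {"b", "c"}, {"y", "z"}]
print(online_greedy(stream, k=2, capacity=3))   # prints: ([0, 1, 0, 1, 0, 1], 3)
```

The interleaved stream is separated into its two natural groups, so each machine needs only 3 distinct requirements.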
25
Hidden Co-Clustering Input
26
Recovery Theorem
• Suppose and
There exists an online assignment of tasks that guarantees asymptotic recovery of hidden clusters
Proof: coupling to a Pólya urn process
Asymptotic recovery: the fraction of tasks from the same hidden cluster that are assigned to the same bin goes to 1 as the number of tasks grows large
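The proof relies on a coupling with a Pólya urn. The urn itself (not the coupling) is easy to simulate: draw a ball uniformly at random, then return it together with one extra ball of the same colour. Intuitively, this self-reinforcement mirrors greedy assignment, where a machine that already holds many of a cluster's requirements becomes ever more attractive to that cluster's tasks. Starting from one red and one blue ball, the red fraction converges almost surely to a uniformly distributed random limit, so each run settles down, but different runs settle at different values.

```python
import random

def polya_urn(steps, red=1, blue=1, seed=0):
    """Classic Polya urn: draw a ball uniformly at random and put it back with one
    more ball of the same colour. Returns the red fraction after each step."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(steps):
        if rng.random() < red / (red + blue):   # drew a red ball
            red += 1
        else:                                   # drew a blue ball
            blue += 1
        fractions.append(red / (red + blue))
    return fractions

for seed in range(3):
    f = polya_urn(10_000, seed=seed)
    print(seed, round(f[999], 3), round(f[-1], 3))   # fraction after 1,000 vs 10,000 draws
```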
27
Experimental Evaluation
• Dataset
• Greedy
• Random = random task arrival order
• Decreasing = tasks in decreasing order of the number of requirements
• Balance big = large tasks to the least loaded machine, small tasks assigned greedily
• Prefer big = large tasks to the least loaded machine, with delayed assignment of up to a fixed number of small tasks
28
Retail dataset
29
Conclusion
• Studied two variants of non-standard load balancing problems: bi-criteria and min-max
• Approximation ratios for the offline problems
• Hidden cluster recovery conditions for a simple greedy online task assignment strategy
• Open questions:
• Tighter approximation ratios for the offline versions of both problems?
• Similar hidden cluster recovery questions (allowing for more memory)?