PDP 2013 A GPU algorithm design for the Resource Constrained Project Scheduling Problem
Přemysl Šůcha - the CTU in Prague 1/18
A GPU algorithm design for the Resource Constrained Project Scheduling Problem
Libor Bukata and Přemysl Šůcha, {bukatlib,suchap}@fel.cvut.cz
The Czech Technical University in Prague
Motivation
• Our motivation is to use the power of the GPU to solve combinatorial problems.
• Existing works:
– [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” J. Parallel Distrib. Comput., vol. 71, pp. 802–811, June 2011.
– [2] V. Boyer, D. El-Baz, and M. Elkihel, “Solving knapsack problems on GPU,” Computers & Operations Research, vol. 39, no. 1, pp. 42–47, 2012.
• We tackle a more complex combinatorial problem than [1, 2].
• We focus on the homogeneous model.
Outline
• Problem Statement (RCPSP)
• Sequential Solution (Tabu Search Algorithm)
• Parallelization
• Parallelization on the Nvidia CUDA Framework
• Experimental Results
• Conclusions
Problem Statement
• The Resource Constrained Project Scheduling Problem (RCPSP) is a general scheduling problem.
• It is one of the most important problems in project management, manufacturing and production optimization.
• The problem is NP-hard, since P2||Cmax is already NP-hard (the two-partition problem).
[Figure: example precedence graph with activities 0 to 7.]
Problem Statement
• A set of N activities V = {0, …, N-1} with durations D = (d_0, …, d_{N-1}), where d_i ∈ ℤ+. Activity 0 is the first activity of the project and N-1 is the last one.
• Precedences among activities are given by a Directed Acyclic Graph G(V, E), where E is a set of edges and (i, j) ∈ E means that activity i must finish before activity j starts.
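Because the precedences form a DAG, a precedence-feasible order of the activities always exists and can be obtained by a topological sort. The sketch below uses Kahn's algorithm; the function name and the edge list in the usage note are hypothetical and are not taken from the slides.

```python
from collections import deque

def topological_order(num_activities, edges):
    """Kahn's algorithm: return an activity order in which i precedes j
    for every edge (i, j); raise ValueError if the graph has a cycle."""
    successors = [[] for _ in range(num_activities)]
    indegree = [0] * num_activities
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1
    # Activities without unscheduled predecessors are ready to be placed.
    ready = deque(a for a in range(num_activities) if indegree[a] == 0)
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in successors[a]:
            indegree[b] -= 1
            if indegree[b] == 0:
                ready.append(b)
    if len(order) != num_activities:
        raise ValueError("precedence graph contains a cycle")
    return order
```

For a hypothetical four-activity graph, `topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)])` returns `[0, 1, 2, 3]`.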
Problem Statement
• A set of M renewable resources with capacities R = {R_0, …, R_{M-1}}, where R_k ∈ ℤ+.
• Activity i requires r_{i,k} ∈ ℤ+ units of resource k.
[Figure: the example precedence graph and a schedule drawn as two Gantt charts, Resource 1 (capacity R1) and Resource 2 (capacity R2), over time t, with the makespan Cmax marked.]
Problem Statement
• A schedule S is a vector (s_0, …, s_{N-1}) of activity start times s_i ∈ ℤ+ that satisfies the constraints of the mathematical model, which consists of:
– precedence constraints
– resource constraints
– objective function
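The equations for these three parts are not reproduced in this transcript. A standard RCPSP formulation written with the symbols defined on the previous slides is sketched here as an assumption about their form. A_t, the set of activities running at time t, is a helper symbol introduced only for this sketch.

```latex
\begin{align}
  s_j &\ge s_i + d_i
      && \forall (i,j) \in E
      && \text{(precedence constraints)} \\
  \sum_{i \in A_t} r_{i,k} &\le R_k
      && \forall t \ge 0,\ \forall k \in \{0,\dots,M-1\}
      && \text{(resource constraints)} \\
  \min\ C_{\max} &= s_{N-1} + d_{N-1}
      &&
      && \text{(objective function)}
\end{align}
% Helper set (assumption): activities in progress at time t
% A_t = \{\, i \in V : s_i \le t < s_i + d_i \,\}
```

The objective uses the fact that activity N-1 is the last activity of the project, so its completion time is the makespan.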
The Tabu Search Algorithm for the RCPSP
• The RCPSP can be solved with the Tabu Search (TS) meta-heuristic.
• l = 0; find an initial solution W_l ∈ 𝒲 (a topological order); W_best = W_l.
• While (l < L):
– Determine 𝒲(W_l), the neighborhood of W_l.
– Eliminate infeasible solutions: 𝒲(W_l) -> 𝒲'(W_l).
– Compute Cmax(W_next) of each solution W_next ∈ 𝒲'(W_l).
– Assign W_{l+1} = arg min Cmax(W_next) : W_next ∉ TL.
– TL = TL ∪ W_{l+1}.
– If Cmax(W_best) > Cmax(W_{l+1}) then W_{l+1} -> W_best.
– If the solution was not improved during the given number of iterations, perform diversification of W_{l+1}.
– l++
• Return W_best.
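The loop above can be sketched sequentially as follows. This is a minimal illustration, not the authors' implementation: `evaluate` (returns Cmax of an order) and `is_feasible` (precedence check) are caller-supplied placeholders, the Tabu List stores whole solutions as in the pseudo-code, and the diversification step is omitted.

```python
from collections import deque
from itertools import combinations

def tabu_search(initial, evaluate, is_feasible, max_iters, tabu_size=8):
    """Sequential Tabu Search over activity orders.

    evaluate(order) -> makespan and is_feasible(order) -> bool are
    assumptions of this sketch and must be provided by the caller.
    """
    current = list(initial)
    best, best_cost = list(current), evaluate(current)
    tabu = deque(maxlen=tabu_size)      # oldest entries drop out automatically

    for _ in range(max_iters):
        candidates = []
        # Neighborhood: every swap of two positions in the current order.
        for i, j in combinations(range(len(current)), 2):
            neighbor = list(current)
            neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
            key = tuple(neighbor)
            if key in tabu or not is_feasible(neighbor):
                continue                # skip tabu and infeasible neighbors
            candidates.append((evaluate(neighbor), key))
        if not candidates:
            break
        cost, key = min(candidates)     # best admissible neighbor, even if worse
        current = list(key)
        tabu.append(key)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost
```

With a toy objective such as `lambda o: sum(abs(a - p) for p, a in enumerate(o))` and no precedence restrictions, starting from `[3, 1, 0, 2]` the search reaches `[0, 1, 2, 3]` with cost 0 in two iterations.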
Representation of the Solution
• Representing the solution by the vector of start times (s_0, …, s_{N-1}) results in a huge solution space.
• For this reason we selected the order of activities W = (w_0, …, w_{N-1}) as the solution representation, e.g. (1,5,6,3,4,2).
[Figure: example schedule drawn as Gantt charts for resources R1 and R2 over time t, with the makespan Cmax marked.]
The Neighborhood of the Solution
• The neighborhood 𝒲(W_l) is the set of solutions obtained by applying all possible swap operators to W_l.
• A swap operator exchanges two activities in W_l.
• For example swap(3,7): (1,5,2,3,4,6) → (1,5,6,3,4,2)
[Figure: Gantt charts of resource R2 over time t before and after the swap, each with its makespan Cmax marked.]
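A swap neighborhood with precedence filtering could be generated as in the sketch below. The function names are hypothetical, positions are 0-based list indices (which need not match the indexing used by swap(3,7) on the slide), and the edge list in the usage note is invented for illustration.

```python
from itertools import combinations

def respects_precedence(order, edges):
    """True if every edge (i, j) has activity i placed before activity j."""
    position = {activity: p for p, activity in enumerate(order)}
    return all(position[i] < position[j] for i, j in edges)

def swap_neighborhood(order, edges):
    """Yield (p, q, neighbor) for every swap of two positions p < q that
    keeps the order precedence-feasible."""
    for p, q in combinations(range(len(order)), 2):
        neighbor = list(order)
        neighbor[p], neighbor[q] = neighbor[q], neighbor[p]
        if respects_precedence(neighbor, edges):
            yield p, q, neighbor
```

With `order = [1, 5, 2, 3, 4, 6]` and the hypothetical edges `[(1, 3), (5, 6)]`, the neighbor `[1, 5, 6, 3, 4, 2]` is produced, while `[3, 5, 2, 1, 4, 6]` is filtered out because it places activity 3 before activity 1.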
TS Parallelization on the GPU
• Parallelization was inspired by [3].
– There is a set of independent solutions.
– Each CPU thread tries to improve an assigned solution until the given number of iterations is reached.
– Each thread processes solutions one by one.
– Access is controlled via atomic operations.
• [3] T. James, C. Rego, and F. Glover, “A cooperative parallel tabu search algorithm for the quadratic assignment problem,” European Journal of Operational Research, vol. 195, no. 3, pp. 810–826, 2009.
[Figure: the shared set of solutions. Each entry holds a solution, its makespan and its Tabu List: (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), (W_2, Cmax_2, TL_2), (W_3, Cmax_3, TL_3), …, (W_B, Cmax_B, TL_B).]
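The scheme of workers taking solutions from a shared set one by one can be illustrated on the CPU as below. This is only a sketch under assumptions: a `threading.Lock` stands in for the atomic operations, each solution is visited once rather than repeatedly, and `improve` is a placeholder for a fixed number of Tabu Search iterations.

```python
import threading

def improve_all(solutions, improve, num_threads=4):
    """Each worker repeatedly claims the next unprocessed solution and
    replaces it with improve(solution)."""
    next_index = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_index
        while True:
            with lock:                  # plays the role of an atomic fetch-and-increment
                index = next_index
                next_index += 1
            if index >= len(solutions):
                return
            # No two workers ever hold the same index, so this write is safe.
            solutions[index] = improve(solutions[index])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return solutions
```

For example, `improve_all([[3, 1, 2], [2, 1], [9, 8, 7]], sorted)` returns `[[1, 2, 3], [1, 2], [7, 8, 9]]` regardless of how the threads interleave.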
CUDA Mapping
• Each CUDA block executes an independent TS algorithm.
• Each thread processes one or more solutions in the neighborhood of the current solution (elimination of infeasible solutions and computation of Cmax(W_next)).
[Figure: the set of solutions (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), (W_2, Cmax_2, TL_2), (W_3, Cmax_3, TL_3), …, (W_B, Cmax_B, TL_B) accessed by CUDA blocks Block 0, Block 1, …, Block 27.]
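One way a block's threads could share the neighborhood is a strided loop followed by a reduction, emulated sequentially below. The strided assignment and the reduction are assumptions of this sketch; the slides only state that a thread handles one or more neighbors. `costs[n]` stands for the Cmax of neighbor n, with `None` marking an infeasible neighbor.

```python
def block_best_neighbor(costs, num_threads):
    """Emulate one CUDA block: thread t evaluates neighbors t, t + T, t + 2T, ...
    and keeps its own best; a final reduction picks the block-wide minimum.
    Returns (cost, neighbor_index) or None if no neighbor is feasible."""
    per_thread_best = []
    for t in range(num_threads):                     # on a GPU these run concurrently
        best = None
        for n in range(t, len(costs), num_threads):  # strided loop over neighbors
            if costs[n] is not None and (best is None or costs[n] < best[0]):
                best = (costs[n], n)
        per_thread_best.append(best)
    feasible = [b for b in per_thread_best if b is not None]
    return min(feasible) if feasible else None       # reduction step
```

With `costs = [9, None, 7, 12, 7, None, 6, 10]` and three threads, the per-thread results are (6, 6), (7, 4) and (7, 2), and the reduction returns `(6, 6)`.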
CUDA Mapping
• Global memory: the set of solutions (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), (W_2, Cmax_2, TL_2), …, (W_B, Cmax_B, TL_B), and the Tabu Lists of the blocks (TL of Block 0, …, TL of Block 27).
• Shared memory (one per block, Block 0 to Block 27): current solution W, precedence constraints, durations of activities D.
• Registers: helper variables.
• Texture memory: required resources r_{i,k}, predecessors of activities.
• Local memory: arrays for the evaluation of resources, start time values of activities.
Implementation of the Tabu List
• The TL is stored in the global memory, so access needs to be accelerated.
• The TLC (Tabu List Cache) is a two-dimensional array of Boolean values.
• Testing whether a move is in the TL can be performed by a single read operation.
TL: swap(1,3), swap(5,7), swap(1,7)
TLC: [Figure: triangular Boolean matrix in which the entries (1,3), (1,7) and (5,7) are marked T (true); entries on and below the diagonal are marked X (unused).]
Add new move to TL:
1. (i_old, j_old) = TL[index]
2. TLC[i_old, j_old] = false
3. TL[index] = (i, j)
4. TLC[i, j] = true
5. index = (index + 1) % |TL|
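The five steps above amount to a circular buffer paired with a Boolean lookup matrix. A minimal sketch follows; the class and method names are hypothetical, moves are normalized so that i < j (matching the triangular layout), and an empty-slot check is added for the first pass through the buffer. It assumes a move is never added while it is already tabu.

```python
class TabuList:
    """Fixed-size circular Tabu List of swap moves with a Boolean cache."""

    def __init__(self, num_activities, capacity):
        self.moves = [None] * capacity      # TL: circular buffer of (i, j)
        self.cache = [[False] * num_activities
                      for _ in range(num_activities)]   # TLC
        self.index = 0                      # next slot to overwrite

    def is_tabu(self, i, j):
        # Membership test is a single read of the cache.
        return self.cache[min(i, j)][max(i, j)]

    def add(self, i, j):
        i, j = min(i, j), max(i, j)
        old = self.moves[self.index]
        if old is not None:                 # steps 1-2: evict the oldest move
            self.cache[old[0]][old[1]] = False
        self.moves[self.index] = (i, j)     # step 3
        self.cache[i][j] = True             # step 4
        self.index = (self.index + 1) % len(self.moves)  # step 5
```

With a capacity of two, adding swap(1,3), swap(5,7) and then swap(1,7) evicts swap(1,3), so `is_tabu(1, 3)` becomes false while the other two moves remain tabu.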
Computation of Cmax
• The goal is to minimize memory consumption.
• Activities are added into the schedule one by one according to W_l, taking into account precedence constraints and resource constraints.
[Figure: usage profile of resource R_k over time t. Activity i with d_i = 3 and r_{i,k} = 3 is placed at s_i, the earliest start time at which it can be executed, and occupies the interval from s_i to s_i + d_i.]
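Placing activities one by one at their earliest feasible start can be sketched as follows. For clarity this version keeps a plain per-time-unit table of free capacity, so it is not the memory-efficient structure the slide refers to. The function name and the dictionary-based inputs are assumptions, and the order is assumed to be precedence-feasible with every requirement within capacity.

```python
def evaluate_order(order, durations, requirements, capacities, predecessors):
    """Schedule activities in the given order at their earliest precedence-
    and resource-feasible start time; return (starts, makespan)."""
    horizon = sum(durations.values())        # no schedule needs to be longer
    free = [list(capacities) for _ in range(horizon)]   # free[t][k]
    starts, makespan = {}, 0
    for a in order:
        d, req = durations[a], requirements[a]
        # Earliest start allowed by the precedence constraints.
        s = max((starts[p] + durations[p] for p in predecessors.get(a, ())),
                default=0)
        # Shift right until every resource has room during [s, s + d).
        while any(free[t][k] < req[k]
                  for t in range(s, s + d) for k in range(len(req))):
            s += 1
        for t in range(s, s + d):
            for k in range(len(req)):
                free[t][k] -= req[k]
        starts[a] = s
        makespan = max(makespan, s + d)
    return starts, makespan
```

For a hypothetical instance with one resource of capacity 3, durations `{0: 2, 1: 3, 2: 2}`, requirements `{0: (2,), 1: (2,), 2: (1,)}` and activity 0 preceding activity 2, the order `[0, 1, 2]` yields start times `{0: 0, 1: 2, 2: 2}` and a makespan of 5.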
Experimental Results
• Experiments were performed on a server with a 2.66 GHz Intel Xeon CPU and an Nvidia Tesla C2050 graphics card (448 CUDA cores, 14 multiprocessors).
• The J120 benchmark instances (600 projects with 120 activities each) were used for performance measurements.
• The GPU algorithm tests 1.8 × 10^6 solutions per second on average.
• The GPU performs the same number of iterations 55 times faster than the CPU.
Conclusions
• The first known GPU algorithm solving the RCPSP.
• Compared to [1], we propose a more efficient TL (Tabu List Cache).
• The algorithm for the schedule evaluation is suitable for the GPU (low memory requirements).
• The homogeneous model reduces the required communication bandwidth between the CPU and the GPU.
• [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” J. Parallel Distrib. Comput., vol. 71, pp. 802–811, June 2011.