A GPU algorithm design for the Resource Constrained Project Scheduling Problem

PDP 2013

Libor Bukata and Přemysl Šůcha, {bukatlib,suchap}@fel.cvut.cz
The Czech Technical University in Prague




Motivation

• Our motivation is to use the power of the GPU to solve combinatorial problems.
• Existing works:
– [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” J. Parallel Distrib. Comput., vol. 71, pp. 802–811, June 2011.
– [2] V. Boyer, D. El-Baz, and M. Elkihel, “Solving knapsack problems on GPU,” Computers & Operations Research, vol. 39, no. 1, pp. 42–47, 2012.
• We tackle a more complex combinatorial problem than [1,2].
• We focus on the homogeneous model.


Outline

• Problem Statement (RCPSP)
• Sequential Solution (Tabu Search Algorithm)
• Parallelization
• Parallelization on the Nvidia CUDA Framework
• Experimental Results
• Conclusions


Problem Statement

• The Resource Constrained Project Scheduling Problem (RCPSP) is a general scheduling problem.
• It is one of the most important problems in project management, manufacturing and production optimization.
• The problem is NP-hard, since its special case P2||Cmax is already NP-hard (the two-partition problem).

[Figure: example precedence graph with activities 0 to 7]


Problem Statement

• A set of N activities V = {0, … , N-1} with durations D = (d_0, … , d_{N-1}), d_i ∈ ℤ+. Activity 0 is the first activity of the project and N-1 is the last one.
• Precedence relations among activities are given by a Directed Acyclic Graph G(V, E), where E is the set of edges; an edge (i, j) ∈ E means that activity i must be completed before activity j starts.

[Figure: example precedence graph with activities 0 to 7]


Problem Statement

• A set of M renewable resources with capacities R = {R_0, … , R_{M-1}}, where R_k ∈ ℤ+.
• Activity resource requirement r_{i,k} ∈ ℤ+.

[Figure: the example precedence graph and the corresponding schedule, drawn as usage of Resource 1 (axis R1) and Resource 2 (axis R2) over time t = 0 … 6, with the makespan Cmax marked]


Problem Statement

• Schedule S is a vector (s_0, … , s_{N-1}) of activity start times s_i ∈ ℤ+ satisfying the constraints of the mathematical model, which consists of:
– precedence constraints
– resource constraints
– objective function
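The three parts of the model can be written in the notation of the problem statement as follows. This is the usual textbook form of the RCPSP, given here as a sketch and not necessarily the exact formulation shown on the slide; the set A(t) of activities running at time t is an auxiliary symbol introduced only for this illustration.

```latex
\begin{align}
  s_i + d_i &\le s_j
    && \forall (i,j) \in E
    && \text{(precedence constraints)} \\
  \sum_{i \in A(t)} r_{i,k} &\le R_k
    && \forall t \ge 0,\ \forall k \in \{0,\dots,M-1\}
    && \text{(resource constraints)} \\
  C_{\max} &= \max_{i \in V}\,(s_i + d_i) \;\to\; \min
    &&
    && \text{(objective function)}
\end{align}
% Auxiliary definition (assumption): activities in progress at time t
% A(t) = \{\, i \in V : s_i \le t < s_i + d_i \,\}
```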


The Tabu Search Algorithm for the RCPSP

• The RCPSP can be solved via the Tabu Search (TS) meta-heuristic.
• l = 0; find an initial solution W_l ∈ W (a topological order); W_best = W_l.
• While (l < L)
– Determine the neighborhood W(W_l) of W_l.
– Eliminate infeasible solutions: W(W_l) -> W'(W_l).
– Compute Cmax(W_next) of each solution W_next ∈ W'(W_l).
– Assign W_{l+1} = arg min Cmax(W_next) : W_next ∉ TL.
– TL = TL ∪ W_{l+1}.
– If Cmax(W_best) > Cmax(W_{l+1}) then W_{l+1} -> W_best.
– If the solution was not improved during the given number of iterations, perform diversification of W_{l+1}.
– l++
• Return W_best
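The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function and parameter names (`tabu_search`, `neighbors`, `cost`, `tabu_size`) are hypothetical, the Tabu List stores moves rather than whole solutions, the diversification step is omitted, and the toy cost function stands in for the Cmax evaluation.

```python
from collections import deque
from typing import Callable, Iterable, Sequence, Tuple

Move = Tuple[int, int]
Solution = Tuple[int, ...]


def tabu_search(
    initial: Sequence[int],
    neighbors: Callable[[Solution], Iterable[Tuple[Move, Solution]]],
    cost: Callable[[Solution], int],
    max_iters: int,
    tabu_size: int,
) -> Solution:
    """Generic Tabu Search loop; `neighbors` must yield only feasible solutions."""
    current = tuple(initial)
    best, best_cost = current, cost(current)
    tabu = deque(maxlen=tabu_size)  # the oldest move is dropped automatically

    for _ in range(max_iters):
        # Evaluate every non-tabu neighbor and keep the cheapest one.
        candidates = [
            (cost(sol), move, sol)
            for move, sol in neighbors(current)
            if move not in tabu
        ]
        if not candidates:
            break  # every move is tabu or infeasible
        next_cost, move, current = min(candidates)
        tabu.append(move)
        if next_cost < best_cost:
            best, best_cost = current, next_cost
    return best


def all_swaps(sol: Solution) -> Iterable[Tuple[Move, Solution]]:
    """Toy neighborhood: every pairwise swap, no feasibility filter."""
    for i in range(len(sol)):
        for j in range(i + 1, len(sol)):
            s = list(sol)
            s[i], s[j] = s[j], s[i]
            yield (i, j), tuple(s)


def toy_cost(sol: Solution) -> int:
    """Toy objective (assumption): total displacement from the sorted order."""
    return sum(abs(value - idx) for idx, value in enumerate(sol))


result = tabu_search((3, 1, 0, 2), all_swaps, toy_cost, max_iters=20, tabu_size=3)
```

Because the best neighbor is accepted even when it is worse than the current solution, the Tabu List is what prevents the search from immediately undoing its last moves.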


Representation of the Solution

• Representing the solution by a vector of start time values (s_0, … , s_{N-1}) results in a huge solution space.
• That is why we selected the order of activities W = (w_0, … , w_{N-1}) as the solution representation, e.g. (1,5,6,3,4,2).

[Figure: the schedule corresponding to the order, drawn as usage of resources R1 and R2 over time t = 0 … 6, with Cmax marked]


The Neighborhood of the Solution

• Neighborhood W(W_l) is the set of solutions obtained by applying all possible swap operators to W_l.
• A swap operator exchanges two activities in W_l.
• For example swap(3,7): (1,5,2,3,4,6) -> (1,5,6,3,4,2)

[Figure: usage of resource R2 over time t before and after the swap, with Cmax marked in both schedules]
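A small Python sketch of a swap neighborhood with the infeasible solutions filtered out. It assumes that "infeasible" means the new order violates a precedence edge, that moves are identified by positions in the order, and that the order lists every activity; the helper names and the four-activity instance are made up for illustration.

```python
from typing import Dict, Iterator, Sequence, Set, Tuple


def is_precedence_feasible(order: Sequence[int], preds: Dict[int, Set[int]]) -> bool:
    """True if every activity appears after all of its predecessors."""
    position = {act: idx for idx, act in enumerate(order)}
    return all(
        position[p] < position[a]
        for a in order
        for p in preds.get(a, ())
    )


def swap_neighborhood(
    order: Sequence[int], preds: Dict[int, Set[int]]
) -> Iterator[Tuple[Tuple[int, int], Tuple[int, ...]]]:
    """Yield ((i, j), new_order) for every swap that keeps the order feasible."""
    n = len(order)
    for i in range(n):
        for j in range(i + 1, n):
            candidate = list(order)
            candidate[i], candidate[j] = candidate[j], candidate[i]
            if is_precedence_feasible(candidate, preds):
                yield (i, j), tuple(candidate)


# Hypothetical instance: 0 precedes 1 and 2, which both precede 3.
preds = {1: {0}, 2: {0}, 3: {1, 2}}
feasible = list(swap_neighborhood((0, 1, 2, 3), preds))
```

In this instance only the swap of positions 1 and 2 survives, since every other swap places an activity before one of its predecessors.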


TS Parallelization on the GPU

• Parallelization was inspired by [3].
– There is a set of independent solutions.
– Each CPU thread tries to improve an assigned solution until the given number of iterations is reached.
– Each thread processes solutions one by one.
– Access is controlled via atomic operations.

[Figure: the set of solutions stored as triples (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), (W_2, Cmax_2, TL_2), (W_3, Cmax_3, TL_3), … , (W_B, Cmax_B, TL_B); each triple holds a solution, its makespan and its Tabu List]

• [3] T. James, C. Rego, and F. Glover, “A cooperative parallel tabu search algorithm for the quadratic assignment problem,” European Journal of Operational Research, vol. 195, no. 3, pp. 810–826, 2009.


CUDA Mapping

• Each CUDA block executes an independent TS algorithm.
• A thread processes one or more solutions in the neighborhood of the current solution (elimination of infeasible solutions and computation of Cmax(W_next)).

[Figure: the solution set (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), … , (W_B, Cmax_B, TL_B) accessed by CUDA blocks Block 0, Block 1, … , Block 27]
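One common way to let a thread handle "one or more" neighbors is a strided assignment followed by a minimum reduction inside the block. The Python sketch below only simulates that idea on the CPU; the slides do not state which assignment the authors use, so the scheme, the function names and the cost values are all assumptions.

```python
from typing import List


def moves_for_thread(thread_id: int, num_threads: int, num_moves: int) -> List[int]:
    """Neighborhood move indices handled by one thread (strided assignment)."""
    return list(range(thread_id, num_moves, num_threads))


def block_best(costs: List[int], num_threads: int) -> int:
    """Index of the cheapest move: per-thread minima, then a final reduction."""
    local_best = []
    for tid in range(num_threads):  # each iteration stands in for one thread
        mine = moves_for_thread(tid, num_threads, len(costs))
        if mine:
            local_best.append(min(mine, key=lambda m: (costs[m], m)))
    return min(local_best, key=lambda m: (costs[m], m))


costs = [9, 7, 8, 5, 6, 5, 10, 11, 7, 12]  # made-up Cmax values of 10 neighbors
assignment = [moves_for_thread(t, 4, len(costs)) for t in range(4)]
winner = block_best(costs, num_threads=4)
```

Ties are broken by the lower move index so that the result does not depend on how many threads take part.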


CUDA Mapping

• Global memory:
– the solution set (W_best, Cmax_best, TL_best), (W_1, Cmax_1, TL_1), (W_2, Cmax_2, TL_2), … , (W_B, Cmax_B, TL_B)
– TL of Block 0, … , TL of Block 27
• Each block (Block 0 to Block 27):
– Shared memory: current solution W, precedence constraints, durations of activities D
– Registers: helper variables
• Texture memory: required resources r_{i,k}, predecessors of activities
• Local memory: arrays for evaluation of resources, start time values of activities


Implementation of the Tabu List

• The TL is stored in the global memory; access needs to be accelerated.
• The TLC (Tabu List Cache) is a 2D array of Boolean values.
• Testing whether a move is in the TL can be performed by a single read operation.

TL: swap(1,3), swap(5,7), swap(1,7)

TLC: [Figure: triangular Boolean matrix in which the entries of swap(1,3), swap(1,7) and swap(5,7) are marked T]

Add a new move to the TL:
1. (i_old, j_old) = TL[index]
2. TLC[i_old, j_old] = false
3. TL[index] = (i, j)
4. TLC[i, j] = true
5. index = (index + 1) % |TL|
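The five steps translate almost directly into code. The sketch below is a plain Python illustration of a circular Tabu List with a Boolean cache, not the GPU implementation; the class and method names are hypothetical, and the guard for still-empty slots is an addition the slide does not spell out.

```python
from typing import List, Optional, Tuple


class TabuList:
    """Circular Tabu List (TL) with a Boolean lookup cache (TLC)."""

    def __init__(self, capacity: int, num_activities: int) -> None:
        self.moves: List[Optional[Tuple[int, int]]] = [None] * capacity  # TL
        self.cache = [[False] * num_activities for _ in range(num_activities)]  # TLC
        self.index = 0  # slot that will be overwritten next

    def contains(self, i: int, j: int) -> bool:
        """Membership test with a single read of the cache."""
        return self.cache[i][j]

    def add(self, i: int, j: int) -> None:
        """Overwrite the oldest move; assumes (i, j) is not already in the list."""
        old = self.moves[self.index]
        if old is not None:                  # slot may still be empty
            self.cache[old[0]][old[1]] = False
        self.moves[self.index] = (i, j)
        self.cache[i][j] = True
        self.index = (self.index + 1) % len(self.moves)


tl = TabuList(capacity=3, num_activities=8)
for move in [(1, 3), (5, 7), (1, 7)]:
    tl.add(*move)
tl.add(2, 4)  # the list is full, so the oldest move swap(1,3) is evicted
```

The cache costs memory quadratic in the number of activities, but it replaces a scan of the whole list with one array read per tested move.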


Computation of Cmax

• The goal is to minimize memory consumption.
• Activities are added into the schedule one by one according to W_l, taking into account precedence constraints and resource constraints.

[Figure: usage of resource R_k over time t = 0 … 8; the earliest start time s_i at which activity i with d_i = 3 and r_{i,k} = 3 can be executed, occupying the interval from s_i to s_i + d_i]
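To make the evaluation concrete, the following Python sketch inserts activities one by one at their earliest precedence- and resource-feasible start time. It deliberately uses a simple time-indexed usage table, which is not the low-memory method the slide refers to; the function names and the five-activity instance are assumptions made for illustration, and every demand is assumed not to exceed the resource capacity.

```python
from typing import Dict, List, Sequence, Set, Tuple


def fits(usage: List[List[int]], t: int, d: int,
         demand: Sequence[int], capacities: Sequence[int]) -> bool:
    """True if the activity can run in [t, t + d) without exceeding a capacity."""
    return all(
        usage[u][k] + demand[k] <= capacities[k]
        for u in range(t, t + d)
        for k in range(len(capacities))
    )


def evaluate_order(order: Sequence[int], durations: Dict[int, int],
                   preds: Dict[int, Set[int]], demands: Dict[int, Sequence[int]],
                   capacities: Sequence[int]) -> Tuple[Dict[int, int], int]:
    """Return (start times, Cmax) for a precedence-feasible activity order."""
    horizon = sum(durations[a] for a in order)  # upper bound on the makespan
    usage = [[0] * len(capacities) for _ in range(horizon)]
    start: Dict[int, int] = {}
    for a in order:
        # Earliest start allowed by the predecessors.
        t = max((start[p] + durations[p] for p in preds.get(a, ())), default=0)
        # Shift right until the resources are free for the whole duration.
        while not fits(usage, t, durations[a], demands[a], capacities):
            t += 1
        for u in range(t, t + durations[a]):
            for k, r in enumerate(demands[a]):
                usage[u][k] += r
        start[a] = t
    cmax = max((start[a] + durations[a] for a in order), default=0)
    return start, cmax


# Hypothetical instance: one resource of capacity 2, activities 0 and 4 are dummies.
durations = {0: 0, 1: 2, 2: 3, 3: 1, 4: 0}
demands = {0: [0], 1: [1], 2: [2], 3: [1], 4: [0]}
preds = {1: {0}, 2: {0}, 3: {1}, 4: {2, 3}}
start, cmax = evaluate_order((0, 1, 2, 3, 4), durations, preds, demands, [2])
```

Here activity 2 needs the whole resource, so it waits until activity 1 finishes, and activity 3 in turn waits for activity 2.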


Experimental Results

• Experiments were performed on an Intel Xeon 2.66 GHz server with an Nvidia Tesla C2050 graphics card (448 CUDA cores, 14 multiprocessors).
• J120 benchmark instances (600 projects with 120 activities) were used for performance measurements.
• The GPU algorithm tests 1.8 × 10^6 solutions per second on average.
• The GPU is able to perform the same number of iterations 55 times faster than the CPU.


Conclusions

• The first known GPU algorithm solving the RCPSP.
• Compared to [1] we propose a more efficient TL (Tabu List cache).
• The algorithm for the schedule evaluation is suitable for the GPU (low memory requirements).
• The homogeneous model reduces the required communication bandwidth between the CPU and the GPU.

• [1] M. Czapinski and S. Barnes, “Tabu Search with two approaches to parallel flowshop evaluation on CUDA platform,” J. Parallel Distrib. Comput., vol. 71, pp. 802–811, June 2011.