
A Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems

Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowe

Authors

• Bert Van Vreckem, HoGent Business and Information Management, bert.vanvreckem@hogent.be

• Dmitriy Borodin, OMPartners, dborodin@ompartners.com

• Wim De Bruyn, HoGent Business and Information Management, wim.debruyn@hogent.be

• Ann Nowe, Artificial Intelligence Lab, Vrije Universiteit Brussel, ann.nowe@vub.ac.be

HFFSP MISTA2013: 29 August 2013

Contents

1 Hybrid Flexible Flowline Scheduling Problems

2 A Machine Learning Approach

3 Learning Permutations with Precedence Constraints

4 Experiments & results

5 Conclusion

Hybrid Flexible Flowline Scheduling Problems

Powerful model for complex real-life production scheduling problems. In α/β/γ notation (Urlings, 2010):

$HFFL_m, ((RM^{(i)})_{i=1}^{(m)}) / M_j, rm, prec, S_{iljk}, A_{iljk}, lag / C_{\max}$

Flowline scheduling problems: jobs are processed in consecutive stages.

[Figure: flowline with Stage 1, Stage 2, Stage 3, Stage 4]


Hybrid case: unrelated parallel machines

Flexible case: stages may be skipped

Other constraints: Machine eligibility

Other constraints: Time lag between stages

[Figure: flowline with Stage 1, Stage 2, Stage 3, Stage 4]

Other constraints: Sequence dependent setup times

[Figure: Gantt charts, time axis 1 to 12, showing jobs J1 and J2 on machines M1 and M2 in the order J1, J2 and in the order J2, J1]
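
To illustrate why the job order matters under sequence-dependent setup times, the following minimal sketch computes when the last of two jobs finishes on a single machine, for both orders. The processing and setup times are made-up values chosen only for illustration; they are not taken from the slides or from the benchmark.

```python
# Hypothetical data: processing times and sequence-dependent setup times
# on a single machine (illustrative values only).
processing = {"J1": 3, "J2": 4}

# setup[(prev, nxt)]: setup time needed before `nxt` when it follows `prev`;
# prev is None for the first job on the machine.
setup = {
    (None, "J1"): 0,
    (None, "J2"): 0,
    ("J1", "J2"): 1,
    ("J2", "J1"): 4,
}


def completion_time(sequence):
    """Time at which the last job of `sequence` finishes on the machine."""
    t = 0
    prev = None
    for job in sequence:
        t += setup[(prev, job)] + processing[job]
        prev = job
    return t


print(completion_time(["J1", "J2"]))  # 3 + 1 + 4 = 8
print(completion_time(["J2", "J1"]))  # 4 + 4 + 3 = 11
```

With these assumed numbers, the same two jobs finish three time units later when J2 is scheduled first, purely because of the setup between them.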

Other constraints: Precedence relations between jobs

[Figure: Gantt charts, time axis 1 to 12, showing jobs J1 and J2 on machines M1 and M2 in the order J1, J2 and in the order J2, J1]

Precedence relations between jobs make the problem much harder, to the point that the MILP/CPLEX approach no longer works for larger instances (Urlings, 2010).
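
A precedence relation between jobs restricts which job permutations are admissible. As a minimal sketch, assuming precedences are given as pairs (a, b) meaning "job a must come before job b" (a representation and a helper name chosen here for illustration only), a permutation can be checked as follows:

```python
def violated_precedences(permutation, precedences):
    """Return the (before, after) pairs that `permutation` violates.

    `precedences` is an iterable of pairs (a, b): job a must appear
    earlier in the permutation than job b.
    """
    position = {job: idx for idx, job in enumerate(permutation)}
    return [(a, b) for a, b in precedences if position[a] > position[b]]


# Hypothetical instance: J1 before J3, and J2 before J3.
prec = [("J1", "J3"), ("J2", "J3")]
print(violated_precedences(["J1", "J2", "J3"], prec))  # no violations
print(violated_precedences(["J3", "J1", "J2"], prec))  # both pairs violated
```

A permutation is valid exactly when the returned list is empty; the violated pairs also identify which positions are involved in a violation.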


A Machine Learning Approach

Scheduling Hybrid Flexible Flowline Scheduling Problems

Two stages:

• Job permutations → Learning Automata

• Machine assignment → Earliest Preparation Next Stage (EPNS) (Urlings, 2010)


Reinforcement learning

At every discrete time step $t$:

• Agent perceives environment state $s(t)$

• Agent chooses action $a(t) \in A = \{a_1, \ldots, a_n\}$ according to some policy

• Environment places agent in new state $s(t+1)$ and gives reinforcement $r(t)$

• Goal: learn a policy that maximizes the long-term cumulative reward $\sum_t r(t)$

[Figure: agent interacting with the Environment]
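
The interaction protocol described above can be sketched as a simple loop. The environment below is a hypothetical two-state toy invented for this illustration; it only shows how states, actions and reinforcements are exchanged at each time step, and it does no learning.

```python
import random


class ToyEnvironment:
    """Hypothetical two-state environment, used only to illustrate the loop."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Reinforcement is 1 when the action matches the current state.
        reward = 1.0 if action == self.state else 0.0
        self.state = 1 - self.state  # deterministic switch to the other state
        return self.state, reward


def run(policy, steps):
    """Run `policy` for `steps` time steps and return the cumulative reward."""
    env = ToyEnvironment()
    state = env.state
    total = 0.0
    for _ in range(steps):
        action = policy(state)            # a(t), chosen by the policy from s(t)
        state, reward = env.step(action)  # s(t+1) and r(t)
        total += reward                   # accumulates the sum over t of r(t)
    return total


print(run(lambda s: s, 10))                      # policy that always matches
print(run(lambda s: random.choice([0, 1]), 10))  # uniformly random policy
```

In this toy setting the matching policy collects the maximum cumulative reward, which is what a learning agent would have to discover from the reinforcements alone.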

Learning Automata (LA)

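Reinforcement learning agents that choose an action according to a probability distribution $p(t) = (p_1(t), \ldots, p_n(t))$, with $p_i = \mathrm{Prob}[a(t) = a_i]$ and such that $\sum_{i=1}^{n} p_i = 1$.

$$p_i(0) = \frac{1}{n} \qquad (1)$$

$$p_i(t+1) = p_i(t) + \alpha_{rew}\, r(t)\,(1 - p_i(t)) - \alpha_{pen}\,(1 - r(t))\, p_i(t) \qquad (2)$$

if $a_i$ is the action taken at instant $t$

$$p_j(t+1) = p_j(t) - \alpha_{rew}\, r(t)\, p_j(t) + \alpha_{pen}\,(1 - r(t)) \left( \frac{1}{n-1} - p_j(t) \right)$$

if $a_j \neq a_i$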
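
As a minimal sketch, the update rule above can be written as a small Python function, with the penalty term for non-chosen actions taken as $1/(n-1) - p_j(t)$, as in the standard linear reward-penalty scheme. The function name and argument layout are assumptions made for this illustration.

```python
def la_update(p, chosen, r, alpha_rew, alpha_pen):
    """Return the action probabilities after one learning automaton update.

    p         : list of current probabilities p_1(t), ..., p_n(t), summing to 1
    chosen    : index i of the action a_i taken at instant t
    r         : reinforcement r(t) in [0, 1]
    alpha_rew : reward learning rate
    alpha_pen : penalty learning rate
    """
    n = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if j == chosen:
            # Chosen action: pushed up by reward, pushed down by penalty.
            new_p.append(pj + alpha_rew * r * (1 - pj)
                         - alpha_pen * (1 - r) * pj)
        else:
            # Other actions: pushed down by reward, pulled up by penalty.
            new_p.append(pj - alpha_rew * r * pj
                         + alpha_pen * (1 - r) * (1 / (n - 1) - pj))
    return new_p


p0 = [0.25] * 4                           # p_i(0) = 1/n with n = 4
print(la_update(p0, 2, 1.0, 0.1, 0.5))    # action 3 chosen, r(t) = 1
print(la_update(p0, 2, 0.0, 0.1, 0.5))    # action 3 chosen, r(t) = 0
```

Starting from the uniform distribution over four actions, a reward of 1 raises the probability of action 3 to 0.325 and lowers the others to 0.225; a reward of 0 with $\alpha_{pen} = 0.5$ lowers it to 0.125 and raises the others to about 0.2917. In both cases the probabilities still sum to 1.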


Learning Automaton update

[Figure: bar charts of the probabilities of actions 1 to 4. E.g. action 3 was chosen; the panels show the probabilities after an update with $r(t) = 1$ and after an update with $r(t) = 0$.]


Probabilistic Basic Simple Strategy (PBSS) (Wauters, 2012)

• One LA is assigned to every position of a permutation

• LAs play a dispersion game to choose unique actions, resulting in a permutation

• Quality of the solution is evaluated

• Update probabilities according to the LA update rule Linear Reward-Inaction ($\alpha_{pen} = 0$):

  • Better result than best one so far: $r(t) = 1$

  • If not, $r(t) = 0$

• Repeat until convergence
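
The steps above can be sketched in Python as follows. This is a simplified illustration, not the implementation of (Wauters, 2012): the dispersion game is replaced by a stand-in in which positions choose in order from the jobs that are still free, a fixed iteration budget replaces the convergence test, and the names `sample_permutation`, `pbss` and `toy_cost` are hypothetical.

```python
import random


def sample_permutation(probs, rng):
    """Each position's automaton picks a job that is not yet taken.

    Simplified stand-in for the dispersion game: positions choose in
    order, sampling from their own distribution restricted to free jobs.
    """
    free = list(range(len(probs)))
    perm = []
    for pos in range(len(probs)):
        weights = [probs[pos][job] for job in free]
        job = rng.choices(free, weights=weights)[0]
        perm.append(job)
        free.remove(job)
    return perm


def pbss(cost, n, alpha_rew=0.1, iterations=2000, seed=0):
    """Search for a low-cost permutation of n jobs with one LA per position."""
    rng = random.Random(seed)
    probs = [[1.0 / n] * n for _ in range(n)]  # uniform start for every LA
    best_perm, best_cost = None, float("inf")
    for _ in range(iterations):
        perm = sample_permutation(probs, rng)
        c = cost(perm)
        if c < best_cost:
            # Better than the best so far: r(t) = 1, reward every automaton.
            best_perm, best_cost = perm, c
            for pos, job in enumerate(perm):
                for j in range(n):
                    if j == job:
                        probs[pos][j] += alpha_rew * (1 - probs[pos][j])
                    else:
                        probs[pos][j] -= alpha_rew * probs[pos][j]
        # Otherwise r(t) = 0; with alpha_pen = 0 nothing changes.
    return best_perm, best_cost


def toy_cost(perm):
    """Hypothetical cost: total distance of each job from its own index."""
    return sum(abs(job - pos) for pos, job in enumerate(perm))


best_perm, best_cost = pbss(toy_cost, n=4)
print(best_perm, best_cost)
```

Because the update is Linear Reward-Inaction, an iteration that does not improve on the best solution leaves all probabilities untouched, which is why the sketch only updates inside the improvement branch.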

Probabilistic Basic Simple Strategy (PBSS)

• PBSS: great results in several optimization problems that involve learning permutations

• but doesn't work well when precedence constraints are involved

• PBSS only learns from positive experience (i.e. improving on previous solutions)

• Doesn’t learn to avoid invalid permutations

Extending PBSS for precedence constraints

Updating probabilities:

• If the job permutation is invalid, perform an update with $r(t) = 0$ and $\alpha_{pen} > 0$ for all agents that are involved in the violation of precedence constraints.

• If the job permutation is valid, perform an $L_{R-I}$ update in all agents, depending on the resulting makespan $ms$ and the best makespan until now $ms_{best}$:

  • improved: $r(t) = 1$;

  • equally good: $r(t) = 1/2$;

  • worse: $r(t) = \frac{ms_{best}}{2\,ms}$;

  • no valid schedule found: $r(t) = 0$.
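
The four reward cases for a valid permutation can be sketched as a small function. Using `None` to signal that no valid schedule was found is an assumption made for this illustration, as is the function name.

```python
def reinforcement(ms, ms_best):
    """Reward r(t) for a valid job permutation.

    ms      : makespan of the new schedule, or None if no valid
              schedule was found (an assumed convention)
    ms_best : best makespan found so far
    """
    if ms is None:
        return 0.0                 # no valid schedule found
    if ms < ms_best:
        return 1.0                 # improved
    if ms == ms_best:
        return 0.5                 # equally good
    return ms_best / (2 * ms)      # worse: always below 1/2


print(reinforcement(90, 100))    # improved
print(reinforcement(100, 100))   # equally good
print(reinforcement(125, 100))   # worse
print(reinforcement(None, 100))  # no valid schedule
```

For a worse makespan the reward stays strictly below 1/2 and shrinks as the makespan grows, so schedules close to the best one are still reinforced more than poor ones.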


Experiments

• HFFSP benchmark problems from (Ruiz et al., 2008), available at http://soa.iti.es/problem-instances

• problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances in each set

• + other constraints that make problems harder (precedence relations!)

• $\alpha_{rew} = 0.1$; $\alpha_{pen} = 0.5$ (no tuning)

• Run until convergence, or at most 300 seconds

Results

Instance set     5        7        9        11       13       15       overall
mean RD (%)      0.0697   2.0131   1.1568   1.6565   3.7294   7.9189   2.7484
best RD (%)      -35.70   -24.71   -26.92   -21.10   -43.34   -10.46   -43.34
# improved       11       12       18       12       9        6        68
# equal          62       40       19       18       8        7        154
# worse          23       44       59       66       79       82       354


Results and Discussion

Contributions:

• Extension of PBSS for learning permutations with precedence constraints

• Simple model + RL approach can yield good quality results for challenging HFFSP instances

Discussion & future work:

• Precedence relations do make the problem harder

• Parameter tuning

• Convergence

• Larger instances (50, 100 jobs)

• Explore possibilities for improvement in machine assignment


Thank you!

Questions?

bert.vanvreckem@hogent.be

http://www.slideshare.net/bertvanvreckem/
