a reinforcement learning approach for hybrid flexible flowline scheduling problems
DESCRIPTION
Paper presented at MISTA2013, Gent. In this paper, we present a method based on Learning Automata to solve Hybrid Flexible Flowline Scheduling Problems (HFFSP) with additional constraints such as sequence-dependent setup times, precedence relations between jobs, and machine eligibility. This category of production scheduling problems is noteworthy because it involves several types of constraints that occur in complex real-life production scheduling problems, such as those in the process industry and batch production. In the proposed technique, Learning Automata play a dispersion game to determine the order in which jobs are processed, so that the makespan is minimized and precedence constraint violations are avoided. Experiments on a set of benchmark problems indicate that this method can yield better results than the ones known until now.
TRANSCRIPT
A Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems
Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowe
Authors
• Bert Van Vreckem, HoGent Business and [email protected]
• Dmitriy Borodin, [email protected]
• Wim De Bruyn, HoGent Business and [email protected]
• Ann Nowe, Artificial Intelligence Lab, Vrije Universiteit [email protected]
HFFSP MISTA2013: 29 August 2013 3/28
Contents
1 Hybrid Flexible Flowline Scheduling Problems
2 A Machine Learning Approach
3 Learning Permutations with Precedence Constraints
4 Experiments & results
5 Conclusion
Hybrid Flexible Flowline Scheduling Problems
Powerful model for complex real-life production scheduling problems. In α/β/γ notation¹:
$$HFFL_m, \left((RM^{(i)})_{i=1}^{(m)}\right) \,/\, M_j, rm, prec, S_{iljk}, A_{iljk}, lag \,/\, C_{\max}$$
Flowline scheduling problems: jobs are processed in consecutive stages.
[Figure: a flowline with four consecutive stages, Stage 1 to Stage 4]
¹ (Urlings, 2010)
Hybrid case: unrelated parallel machines
[Figure: parallel machines per stage: M11, M12, M13 | M21, M22 | M31, M32, M33, M34 | M41, M42]
Flexible case: stages may be skipped
[Figure: example in which stage 3 is skipped; machines shown: M11, M12, M13 | M21, M22 | M41, M42]
Other constraints: Machine eligibility
[Figure: example of eligible machines; machines shown: M11, M13 | M21, M22 | M31, M33 | M42]
Other constraints: Time lag between stages
[Figure: timeline of Stage 1 to Stage 4 illustrating time lags between stages]
Other constraints: Sequence dependent setup times
[Figure: Gantt charts over time units 1 to 12 on machines M1 and M2, comparing the job order J1, J2 with the job order J2, J1]
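To see why the job order matters under sequence-dependent setup times, the following minimal sketch computes the completion time of both job orders on a single machine. The processing and setup times are hypothetical values chosen for illustration; they are not read from the Gantt chart on the slide, and the helper name `completion_time` is an assumption.

```python
def completion_time(order, proc, setup):
    """Completion time of a job sequence on one machine.

    proc[j] is the processing time of job j; setup[(i, j)] is the setup
    time needed when job j directly follows job i (hypothetical data).
    """
    time = 0
    previous = None
    for job in order:
        if previous is not None:
            # Setup time depends on which job ran just before this one.
            time += setup[(previous, job)]
        time += proc[job]
        previous = job
    return time


# Hypothetical instance: switching J1 -> J2 is cheap, J2 -> J1 is costly.
proc = {"J1": 3, "J2": 2}
setup = {("J1", "J2"): 1, ("J2", "J1"): 4}

print(completion_time(["J1", "J2"], proc, setup))  # 3 + 1 + 2 = 6
print(completion_time(["J2", "J1"], proc, setup))  # 2 + 4 + 3 = 9
```

The same two jobs finish at different times depending only on their order, which is why the job permutation is the quantity being learned.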
Other constraints: Precedence relations between jobs
[Figure: Gantt charts over time units 1 to 12 on machines M1 and M2, comparing the job order J1, J2 with the job order J2, J1]
Precedence relations between jobs make the problem much harder, to the point that the MILP/CPLEX approach no longer works for larger instances (Urlings, 2010)
A Machine Learning Approach
Scheduling Hybrid Flexible Flowline Scheduling Problems
Two stages:
• Job permutations → Learning Automata
• Machine assignment → Earliest Preparation Next Stage (EPNS) (Urlings, 2010)
Reinforcement learning
At every discrete time step t:
• Agent perceives environment state s(t)
• Agent chooses action a(t) ∈ A = {a1, ..., an} according to some policy
• Environment places agent in new state s(t + 1) and gives reinforcement r(t)
• Goal: learn a policy that maximizes the long-term cumulative reward ∑_t r(t)
[Figure: agent-environment loop; the environment passes state s and reinforcement r to the agent, and the agent returns action a]
HFFSP MISTA2013: 29 August 2013 16/28
Learning Automata (LA)
Reinforcement Learning agents that choose an action according to a probability distribution p(t) = (p_1(t), ..., p_n(t)), with p_i = Prob[a(t) = a_i] and such that $\sum_{i=1}^{n} p_i = 1$.
$$p_i(0) = \frac{1}{n} \tag{1}$$
$$p_i(t+1) = p_i(t) + \alpha_{rew}\, r(t)\,(1 - p_i(t)) - \alpha_{pen}\,(1 - r(t))\, p_i(t) \tag{2}$$
if a_i is the action taken at instant t
$$p_j(t+1) = p_j(t) - \alpha_{rew}\, r(t)\, p_j(t) + \alpha_{pen}\,(1 - r(t)) \left( \frac{1}{n-1} - p_j(t) \right) \tag{3}$$
if a_j ≠ a_i
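Equations (1) to (3) translate almost line by line into code. The sketch below is a minimal illustration, not the authors' implementation; the function names `initial_probabilities` and `la_update` are assumptions introduced here.

```python
def initial_probabilities(n):
    """Equation (1): every action starts with probability 1/n."""
    return [1.0 / n] * n


def la_update(p, chosen, r, alpha_rew, alpha_pen):
    """Return the updated probability vector after one time step.

    p         -- current probabilities p(t), one entry per action
    chosen    -- index i of the action taken at instant t
    r         -- reinforcement r(t) in [0, 1]
    alpha_rew -- reward learning rate
    alpha_pen -- penalty learning rate
    """
    n = len(p)
    updated = []
    for j, pj in enumerate(p):
        if j == chosen:
            # Equation (2): the chosen action.
            updated.append(
                pj + alpha_rew * r * (1 - pj) - alpha_pen * (1 - r) * pj
            )
        else:
            # Equation (3): every other action.
            updated.append(
                pj
                - alpha_rew * r * pj
                + alpha_pen * (1 - r) * (1.0 / (n - 1) - pj)
            )
    return updated


p = initial_probabilities(4)
rewarded = la_update(p, chosen=2, r=1.0, alpha_rew=0.1, alpha_pen=0.5)
penalized = la_update(p, chosen=2, r=0.0, alpha_rew=0.1, alpha_pen=0.5)
print(rewarded, sum(rewarded))
print(penalized, sum(penalized))
```

Whatever the value of r(t), the amount added to the chosen action equals the amount removed from the others, so the probabilities keep summing to 1.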
Learning Automaton update
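[Figure: three bar charts of the action probabilities pi for actions i = 1 to 4, on a vertical axis from 0 to 1: the distribution before the update and, for the example where action 3 was chosen, the distributions after an update with r(t) = 1 and with r(t) = 0]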
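As a worked instance of update rules (2) and (3), assume n = 4 actions with the uniform start p_i = 0.25 (an assumption, not the values plotted on the slide), action 3 chosen, and the learning rates α_rew = 0.1 and α_pen = 0.5 used in the experiments:

```latex
\[
\begin{aligned}
r(t)=1:\quad p_3(t+1) &= 0.25 + 0.1 \cdot (1 - 0.25) = 0.325 \\
p_j(t+1) &= 0.25 - 0.1 \cdot 0.25 = 0.225 \qquad (j \neq 3) \\
r(t)=0:\quad p_3(t+1) &= 0.25 - 0.5 \cdot 0.25 = 0.125 \\
p_j(t+1) &= 0.25 + 0.5 \left(\tfrac{1}{3} - 0.25\right) \approx 0.2917 \qquad (j \neq 3)
\end{aligned}
\]
```

In both cases the four probabilities still sum to 1: a reward shifts mass towards action 3, a penalty spreads it over the other three actions.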
Probabilistic Basic Simple Strategy (PBSS) (Wauters, 2012)
• An LA is assigned to every position of a permutation
• LAs play a dispersion game to choose a unique action, resulting in a permutation
• Quality of the solution is evaluated
• Update probabilities according to the LA update rule Linear Reward-Inaction (αpen = 0):
  • Better result than best one so far: r(t) = 1
  • If not, r(t) = 0
• Repeat until convergence
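The loop above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of (Wauters, 2012): the dispersion game is replaced by sequential sampling in which each position picks among the jobs that are still free, a fixed iteration budget stands in for the convergence test, and the cost function is a hypothetical toy objective. All function names are introduced here for illustration.

```python
import random


def sample_permutation(probs, rng):
    """Pick one distinct job per position (stand-in for the dispersion game)."""
    free = list(range(len(probs)))
    perm = []
    for pos in range(len(probs)):
        # Tiny epsilon keeps the weights valid even if probabilities underflow.
        weights = [probs[pos][job] + 1e-12 for job in free]
        job = rng.choices(free, weights=weights, k=1)[0]
        perm.append(job)
        free.remove(job)
    return perm


def reward_inaction_update(p, chosen, r, alpha_rew):
    """Linear Reward-Inaction: the LA update rule with alpha_pen = 0."""
    return [
        pj + alpha_rew * r * (1 - pj) if j == chosen else pj - alpha_rew * r * pj
        for j, pj in enumerate(p)
    ]


def pbss(cost, n, alpha_rew=0.1, iterations=2000, seed=0):
    """Search for a low-cost permutation of n jobs with one LA per position."""
    rng = random.Random(seed)
    probs = [[1.0 / n] * n for _ in range(n)]
    best_perm, best_cost = None, float("inf")
    for _ in range(iterations):
        perm = sample_permutation(probs, rng)
        c = cost(perm)
        # r(t) = 1 only when the new solution beats the best one so far.
        r = 1.0 if c < best_cost else 0.0
        if c < best_cost:
            best_perm, best_cost = perm, c
        for pos, job in enumerate(perm):
            probs[pos] = reward_inaction_update(probs[pos], job, r, alpha_rew)
    return best_perm, best_cost


def toy_cost(perm):
    """Hypothetical objective: total displacement from the identity order."""
    return sum(abs(job - pos) for pos, job in enumerate(perm))


best_perm, best_cost = pbss(toy_cost, n=5)
print(best_perm, best_cost)
```

Because r(t) = 0 leaves every probability unchanged when αpen = 0, the automata only move when a new best solution is found.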
Probabilistic Basic Simple Strategy (PBSS)
• PBSS: great results in several optimization problems that involve learning permutations
• but doesn't work well when precedence constraints are involved
• PBSS only learns from positive experience (i.e. improving on previous solutions)
• Doesn't learn to avoid invalid permutations
Extending PBSS for precedence constraints
Updating probabilities:
• If the job permutation is invalid, perform an update with r(t) = 0 and αpen > 0 for all agents that are involved in the violation of precedence constraints.
• If the job permutation is valid, perform an L_{R−I} (Linear Reward-Inaction) update in all agents, depending on the resulting makespan ms and the best makespan until now ms_best:
  • improved: r(t) = 1;
  • equally good: r(t) = 1/2;
  • worse: r(t) = ms_best / (2 ms);
  • no valid schedule found: r(t) = 0.
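The two update cases above can be sketched as a pair of helper functions. This is an illustrative sketch only: the encoding of precedence relations as (before, after) job pairs, the use of `None` for "no valid schedule found", and the function names are assumptions made here, not details taken from the paper.

```python
def violated_positions(perm, precedences):
    """Positions whose agents are involved in a violated precedence relation.

    perm        -- job permutation, perm[pos] is the job at that position
    precedences -- list of (before, after) job pairs (assumed encoding)
    """
    position = {job: pos for pos, job in enumerate(perm)}
    involved = set()
    for before, after in precedences:
        if position[before] > position[after]:
            # Both agents took part in the violation and get penalized.
            involved.update((position[before], position[after]))
    return involved


def reinforcement(ms, ms_best):
    """Reinforcement r(t) for a valid permutation.

    ms      -- makespan of the resulting schedule, or None if no valid
               schedule was found (assumed convention)
    ms_best -- best makespan found until now
    """
    if ms is None:
        return 0.0
    if ms < ms_best:
        return 1.0  # improved
    if ms == ms_best:
        return 0.5  # equally good
    return ms_best / (2.0 * ms)  # worse: always below 1/2


print(violated_positions([2, 0, 1], [(0, 2)]))  # job 0 must precede job 2
print(reinforcement(90, 100), reinforcement(100, 100), reinforcement(200, 100))
```

For a worse schedule the reinforcement falls below 1/2 and shrinks as the makespan grows, so near misses are still rewarded more than poor schedules.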
Experiments
• HFFSP benchmark problems from (Ruiz et al., 2008)²
• Problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances in each set
• + other constraints that make problems harder (precedence relations!)
• αrew = 0.1; αpen = 0.5 (no tuning)
• Run until convergence, or at most 300 seconds
² Available at http://soa.iti.es/problem-instances
Results
Instance set   5        7        9        11       13       15       overall
mean RD (%)    0.0697   2.0131   1.1568   1.6565   3.7294   7.9189   2.7484
best RD (%)    -35.70   -24.71   -26.92   -21.10   -43.34   -10.46   -43.34
# improved     11       12       18       12       9        6        68
# equal        62       40       19       18       8        7        154
# worse        23       44       59       66       79       82       354
Results and Discussion
Contributions:
• Extension of PBSS for learning permutations with precedence constraints
• Simple model + RL approach can yield good quality results for challenging HFFSP instances
Discussion & future work:
• Precedence relations do make the problem harder
• Parameter tuning
• Convergence
• Larger instances (50, 100 jobs)
• Explore possibilities for improvement in machine assignment
Thank you!
Questions?
[email protected]
www.slideshare.net/bertvanvreckem/