a reinforcement learning approach for hybrid flexible flowline scheduling problems
Post on 27-Jun-2015
A Reinforcement Learning Approach to Solving Hybrid Flexible Flowline Scheduling Problems
Bert Van Vreckem, Dmitriy Borodin, Wim De Bruyn, Ann Nowe
Authors
• Bert Van Vreckem, HoGent Business and Information Management, bert.vanvreckem@hogent.be
• Dmitriy Borodin, OMPartners, dborodin@ompartners.com
• Wim De Bruyn, HoGent Business and Information Management, wim.debruyn@hogent.be
• Ann Nowe, Artificial Intelligence Lab, Vrije Universiteit Brussel, ann.nowe@vub.ac.be
HFFSP MISTA2013: 29 August 2013 3/28
Contents
1 Hybrid Flexible Flowline Scheduling Problems
2 A Machine Learning Approach
3 Learning Permutations with Precedence Constraints
4 Experiments & results
5 Conclusion
Hybrid Flexible Flowline Scheduling Problems
Powerful model for complex real-life production scheduling problems. In α/β/γ notation¹:
$$HFFL_m, ((RM^{(i)})_{i=1}^{(m)}) / M_j, rm, prec, S_{iljk}, A_{iljk}, lag / C_{\max}$$
Flowline scheduling problems: jobs are processed in consecutive stages.
[Figure: Stage 1, Stage 2, Stage 3, Stage 4]
¹ (Urlings, 2010)
Hybrid Flexible Flowline Scheduling Problems
Hybrid case: unrelated parallel machines
[Figure: four stages with parallel machines M11 to M13, M21 to M22, M31 to M34, and M41 to M42]
Hybrid Flexible Flowline Scheduling Problems
Flexible case: stages may be skipped
[Figure: machines M11, M12, M13, M21, M22, M41, M42; the stage 3 machines are absent]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Machine eligibility
[Figure: eligible machines M11, M13, M21, M22, M31, M33, M42]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Time lag between stages
[Figure: Stage 1, Stage 2, Stage 3, Stage 4]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Sequence dependent setup times
[Figure: Gantt chart over time units 1 to 12 showing jobs J1 and J2 on machines M1 and M2, once in the order J1, J2 and once in the order J2, J1]
Hybrid Flexible Flowline Scheduling Problems
Other constraints: Precedence relations between jobs
[Figure: Gantt chart over time units 1 to 12 showing jobs J1 and J2 on machines M1 and M2, once in the order J1, J2 and once in the order J2, J1]
Hybrid Flexible Flowline Scheduling Problems
Precedence relations between jobs make the problem much harder, to the point that the MILP/CPLEX approach no longer works for larger instances (Urlings, 2010)
A Machine Learning Approach
Scheduling Hybrid Flexible Flowline Scheduling Problems
Two stages:
• Job permutations → Learning Automata
• Machine assignment → Earliest Preparation Next Stage (EPNS) (Urlings, 2010)
Reinforcement learning
At every discrete time step t:
• Agent perceives environment state s(t)
• Agent chooses action a(t) ∈ A = {a1, . . . , an} according to some policy
• Environment places agent in new state s(t + 1) and gives reinforcement r(t)
• Goal: learn a policy that maximizes the long-term cumulative reward $\sum_t r(t)$
[Figure: agent-environment loop; the Environment passes state s and reinforcement r to the Agent, and the Agent passes action a to the Environment]
Learning Automata (LA)
Reinforcement Learning agents that choose an action according to a probability distribution $p(t) = (p_1(t), \ldots, p_n(t))$, with $p_i = \mathrm{Prob}[a(t) = a_i]$ and such that $\sum_{i=1}^{n} p_i = 1$.
$$p_i(0) = \frac{1}{n} \tag{1}$$
$$p_i(t+1) = p_i(t) + \alpha_{rew}\, r(t)\,(1 - p_i(t)) - \alpha_{pen}\,(1 - r(t))\, p_i(t) \tag{2}$$
if $a_i$ is the action taken at instant $t$
$$p_j(t+1) = p_j(t) - \alpha_{rew}\, r(t)\, p_j(t) + \alpha_{pen}\,(1 - r(t)) \left( \frac{1}{n-1} - p_j(t) \right) \tag{3}$$
if $a_j \neq a_i$
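The update rule in equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `la_update` and the list-based representation are assumptions, and the learning rates 0.1 and 0.5 are simply the values reported on the Experiments slide.

```python
def la_update(p, chosen, r, alpha_rew, alpha_pen):
    """Return the updated probability vector after taking action `chosen`.

    p         -- current action probabilities (sums to 1)
    chosen    -- index i of the action a_i that was taken
    r         -- reinforcement r(t) in [0, 1]
    """
    n = len(p)
    new_p = []
    for j, pj in enumerate(p):
        if j == chosen:
            # Equation (2): update for the chosen action
            new_p.append(pj + alpha_rew * r * (1 - pj) - alpha_pen * (1 - r) * pj)
        else:
            # Equation (3): update for every other action
            new_p.append(pj - alpha_rew * r * pj
                         + alpha_pen * (1 - r) * (1 / (n - 1) - pj))
    return new_p


# Four actions, uniform start as in equation (1); action 3 (index 2) is chosen.
p0 = [1 / 4] * 4
rewarded = la_update(p0, chosen=2, r=1.0, alpha_rew=0.1, alpha_pen=0.5)
penalised = la_update(p0, chosen=2, r=0.0, alpha_rew=0.1, alpha_pen=0.5)
```

With r(t) = 1 the chosen action gains probability mass at the expense of the others; with r(t) = 0 it loses mass, which is spread evenly over the remaining actions. In both cases the vector still sums to 1.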
Learning Automaton update
[Figure: three bar charts of the probabilities p_i (axis from 0 to 1) for actions i = 1 to 4. E.g. action 3 was chosen; panels labelled r(t) = 1 and r(t) = 0]
Probabilistic Basic Simple Strategy (PBSS) (Wauters, 2012)
• An LA is assigned to every position of a permutation
• The LAs play a dispersion game to choose unique actions, resulting in a permutation
• The quality of the solution is evaluated
• Probabilities are updated according to the LA update rule Linear Reward-Inaction (αpen = 0):
  • Better result than the best one so far: r(t) = 1
  • If not, r(t) = 0
• Repeat until convergence
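The loop above can be sketched as follows. Several parts are assumptions made only for illustration: the slides do not spell out how the dispersion game resolves conflicts, so here each position simply samples among the jobs not yet taken; the fixed iteration budget stands in for "until convergence"; and `toy_cost` is a hypothetical objective, not a scheduling makespan.

```python
import random


def sample_permutation(probs, rng):
    """One LA per position; each samples a job that is still free
    (assumed stand-in for the dispersion game)."""
    free = list(range(len(probs)))
    perm = []
    for pos in range(len(probs)):
        weights = [probs[pos][job] for job in free]
        job = rng.choices(free, weights=weights, k=1)[0]
        perm.append(job)
        free.remove(job)
    return perm


def reward_inaction(p, chosen, r, alpha_rew):
    """Linear Reward-Inaction: the LA update rule with alpha_pen = 0."""
    return [pj + alpha_rew * r * (1 - pj) if j == chosen
            else pj - alpha_rew * r * pj
            for j, pj in enumerate(p)]


def pbss(cost, n, iterations=500, alpha_rew=0.1, seed=0):
    rng = random.Random(seed)
    probs = [[1 / n] * n for _ in range(n)]  # uniform start for every LA
    best_perm, best_cost = None, float("inf")
    for _ in range(iterations):
        perm = sample_permutation(probs, rng)
        c = cost(perm)
        if c < best_cost:
            # Better than the best so far: r(t) = 1 for every position's LA
            best_perm, best_cost = perm, c
            for pos, job in enumerate(perm):
                probs[pos] = reward_inaction(probs[pos], job, 1.0, alpha_rew)
        # Otherwise r(t) = 0, which leaves the probabilities unchanged
    return best_perm, best_cost, probs


def toy_cost(perm):
    """Hypothetical objective: distance of each job from its own index."""
    return sum(abs(job - pos) for pos, job in enumerate(perm))


best_perm, best_cost, probs = pbss(toy_cost, n=5)
```

Because αpen = 0, a non-improving permutation changes nothing, so the automata only ever move toward permutations that set a new best.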
Probabilistic Basic Simple Strategy (PBSS)
• PBSS: great results in several optimization problems that involve learning permutations
• but it doesn’t work well when precedence constraints are involved
• PBSS only learns from positive experience (i.e. improving on previous solutions)
• It doesn’t learn to avoid invalid permutations
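To make "invalid permutation" concrete, here is a small sketch of a validity check. It assumes that a precedence relation is given as a pair (a, b) meaning job a must appear before job b in the permutation; the function name and this encoding are illustrative assumptions, not taken from the slides.

```python
def violated_pairs(perm, precedences):
    """Return the (a, b) pairs for which job a does not come before job b."""
    position = {job: pos for pos, job in enumerate(perm)}
    return [(a, b) for a, b in precedences if position[a] > position[b]]


precedences = [(0, 1), (1, 2)]            # job 0 before job 1, job 1 before job 2
ok = violated_pairs([0, 1, 2], precedences)
bad = violated_pairs([2, 0, 1], precedences)
```

A permutation is valid exactly when the returned list is empty. Plain PBSS never uses this information, since an invalid permutation just yields r(t) = 0 and no update.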
Extending PBSS for precedence constraints
Updating probabilities:
• If the job permutation is invalid, perform an update with r(t) = 0 and αpen > 0 for all agents that are involved in the violation of precedence constraints.
• If the job permutation is valid, perform an LR-I update in all agents, depending on the resulting makespan ms and the best makespan so far msbest:
  • improved: r(t) = 1;
  • equally good: r(t) = 1/2;
  • worse: $r(t) = \frac{ms_{best}}{2\, ms}$;
  • no valid schedule found: r(t) = 0.
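The reward for a valid permutation can be written as a small function. This is a sketch under stated assumptions: `None` is used here to encode "no valid schedule found" and "no best makespan yet", and the very first schedule is treated as an improvement; neither convention comes from the slides.

```python
def reward(ms, ms_best):
    """Reward r(t) for a valid job permutation.

    ms      -- makespan of the resulting schedule, or None if no valid
               schedule was found (assumed encoding)
    ms_best -- best makespan so far, or None before the first schedule
    """
    if ms is None:
        return 0.0                    # no valid schedule found
    if ms_best is None or ms < ms_best:
        return 1.0                    # improved
    if ms == ms_best:
        return 0.5                    # equally good
    return ms_best / (2 * ms)         # worse: always below 1/2


r_improved = reward(90, 100)
r_equal = reward(100, 100)
r_worse = reward(125, 100)
r_none = reward(None, 100)
```

The "worse" case scales smoothly: a makespan only slightly above the best gives a reward just under 1/2, and the reward shrinks toward 0 as the makespan grows.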
Experiments
• HFFSP benchmark problems from (Ruiz et al., 2008)²
• Problem sets with 5, 7, 9, 11, 13, 15 jobs, 96 instances in each set
• + other constraints that make problems harder (precedence relations!)
• αrew = 0.1; αpen = 0.5 (no tuning)
• Run until convergence, or at most 300 seconds
² Available at http://soa.iti.es/problem-instances
Results
Instance set     5        7        9        11       13       15       overall
mean RD (%)      0.0697   2.0131   1.1568   1.6565   3.7294   7.9189   2.7484
best RD (%)      -35.70   -24.71   -26.92   -21.10   -43.34   -10.46   -43.34
# improved       11       12       18       12       9        6        68
# equal          62       40       19       18       8        7        154
# worse          23       44       59       66       79       82       354
Results and Discussion
Contributions:
• Extension of PBSS for learning permutations with precedence constraints
• Simple model + RL approach can yield good quality results for challenging HFFSP instances
Discussion & future work:
• Precedence relations do make the problem harder
• Parameter tuning
• Convergence
• Larger instances (50, 100 jobs)
• Explore possibilities for improvement in machine assignment
Thank you!
Questions?
bert.vanvreckem@hogent.be
http://www.slideshare.net/bertvanvreckem/