Application of neural networks to heuristic scheduling algorithms
Derya Eren Akyol*
Department of Industrial Engineering, University of Dokuz Eylul, 35100 Bornova-Izmir, Turkey
Available online 2 July 2004
Abstract
This paper considers the use of artificial neural networks (ANNs) to model six different heuristic algorithms
applied to the n job, m machine real flowshop scheduling problem with the objective of minimizing makespan. The
objective is to obtain six ANN models to be used for the prediction of the completion times for each job processed
on each machine and to introduce the fuzziness of scheduling information into flowshop scheduling. Fuzzy
membership functions are generated for completion, job waiting and machine idle times. Different methods are
proposed to obtain the fuzzy parameters. To model the functional relation between the input and output variables,
multilayered feedforward networks (MFNs) trained with the error backpropagation learning rule are used. The trained
network is able to apply the learnt relationship to new problems. In this paper, an implementation alternative to the
existing heuristic algorithms is provided. Once the network is trained adequately, it can provide an outcome
(solution) faster than conventional iterative methods by its generalizing property. The results obtained from the
study can be extended to solve the scheduling problems in the area of manufacturing.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Artificial neural networks; Multilayered perceptron; Heuristic scheduling; Flowshop scheduling problems; Fuzzy
membership functions
1. Introduction
The flowshop scheduling problem is considered one of the general production scheduling problems
in which n different jobs must be processed by m machines in the same order. The problem can be
considered as finding a scheme of allocation of tasks to a limited number of competing resources, with an
objective of satisfying constraints and optimizing performance criteria. Much research literature
addresses methods of minimizing performance measures such as makespan. The makespan
minimization, within the general flowshop scheduling domain, provides a useful area for analysis
because it is an important model in scheduling theory and it is usually very difficult to find its optimal
solution (Jain & Meeran, 2002).
During the last 40 years, most of the research effort has been devoted to the permutation flowshop problem.
In the permutation flowshop, n different jobs have to be processed on m machines. Each job has one
operation on each machine and all jobs have the same ordering sequence on each machine. At any time,
each machine can process at most one job. Preemption is not allowed. The objective is to find a
permutation of jobs that minimizes the maximum completion time, or makespan. This problem, denoted by $n/m/P/C_{\max}$, is known to be an NP-complete combinatorial optimization problem.
Complete enumeration, integer programming, branch and bound techniques can be used to find the
optimal sequences for small-size problems but they do not provide efficient solutions for large size
problems. In view of the combinatorial complexity and time constraints, most of the large problems can
be solved only by heuristic methods (Lee & Shaw, 2000). Though efficient heuristics cannot guarantee optimal solutions, they provide approximate solutions that are often nearly as good as the optimal ones (Ho & Chang, 1991).
In recent years, the technological advancements in hardware and software have encouraged new
application tools such as neural networks to be applied to combinatorially exploding NP-hard problems
(Jain & Meeran, 1998). They have emerged as efficient approaches in a variety of engineering
applications where problems are difficult to formulate or awkwardly defined. They are computational
structures that implement simplified models of biological processes which are preferred for their
robustness, massive parallelism and ability to learn. They have proven to be more useful for complicated
problems difficult to solve with conventional methods. The advantage of them lies in their resilience
against distortions in the input data and their learning capabilities. With their learning capabilities,
they avoid having to develop a mathematical model or acquiring the appropriate knowledge to solve a
task. The ability to map and solve a wide range of problems motivated the proposal of neural networks as a highly parallel model for general-purpose computing. As a result, they have been applied to scheduling and other combinatorial optimization problems. Finding the relationship between
the data (i.e. processing times, due dates, etc.) and schedules, determining the optimal sequence for the
jobs to be processed, identifying best dispatching strategies (i.e. scheduling rules), etc. are some of
the application areas of neural networks in the scheduling literature (Sabuncuoglu & Gurgun, 1996).
Sabuncuoglu (1998) presented a detailed review of the literature in the area of scheduling, and the study of Smith (1999) reviews the research on the use of NNs in combinatorial optimization.
Neural networks as learning tools, have demonstrated their ability to capture the general relationship
between variables that are difficult or impossible to relate to each other analytically by learning, recalling
and generalizing from training patterns as data (Shiue & Su, 2002). In other words, they are universal
function approximators and are therefore attractive for automatically learning the (nonlinear) functional
relation between the input and output variables (Raaymakers & Weijters, 2003).
In this study, a scheduling problem in a real permutation flowshop environment is considered.
Using the information of the production orders for 1 month and the global operation recipe, the best sequence of five different products is found by six different heuristic algorithms. For each sequence
found by six different heuristic algorithms, the completion time of each job on each machine, job waiting
times and machine idle times are computed and read into the system. To model these six different
heuristic scheduling algorithms, one of the most popular neural network architectures, the multilayered
perceptron (MLP) neural network is used. In order to develop a neural network, the Backpack Neural
Network System Version 4.0 (by Z Solutions) is used and some necessary steps are followed. For each of
the heuristic algorithms, the neural network model is used for estimating the makespan of five jobs
processed on 43 machines. In this way, we present a neural network-based implementation alternative
to the existing heuristic algorithms. The proposed method is simple and straightforward. An MLP neural
network is trained on data from a real world problem to learn the functional relationship between the
input and output variables. After the training process is completed, the MLP can provide outputs of adequate accuracy over a limited range of input conditions, with the advantage of requiring far less computation than other modeling methods (Feng, Li, Cen, & Huang, 2003). In other words, the neural
network’s computational speed permits fast solutions to problems not seen previously by the network
(El-Bouri, Balakrishnan, & Pooplewell, 2000).
This paper is organized as follows. Section 2 presents a mathematical formulation of the permutation
flowshop scheduling problem with the makespan objective. In Section 3, we give information about the
heuristic procedures considered in this study. Steps of developing a backpropagation network are explained
in Section 4. Section 5 includes the experimental results. Finally, Section 6 provides conclusions.
2. The permutation flowshop scheduling problem with the makespan criterion
In a permutation flowshop scheduling problem, there are a set of jobs $I = \{1, 2, 3, \ldots, n\}$ and a set of machines $J = \{1, 2, 3, \ldots, m\}$. Each of the $n$ jobs has to be processed on the $m$ machines $1, 2, \ldots, m$ in the order given by the indexing of the machines. Thus job $i$, $i \in I$, consists of a sequence of $m$ operations, each of which must be processed on machine $j$ for an uninterrupted processing time $p_{ij}$. Each machine $j$, $j \in J$, can process at most one job at a time, each job can be processed on at most one machine at a time, and once an operation is started, it must be completed without interruption (Baker, 1974). Let $C_{ij}$ be the completion time of job $i$ on machine $j$. The makespan $C_{\max}$ is the maximum completion time among all jobs. In the permutation flowshop problem with the makespan objective, the goal is to find a permutation schedule that minimizes the makespan $C_{\max}$, where a permutation schedule for a flowshop instance is a schedule in which each machine processes the jobs in the same order.
In order to provide a formal mathematical model of the problem, we apply the notion of a job processing order represented by a permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ on the set $I$, where $\pi_i$ denotes the element of $I$ that is in position $i$ of $\pi$ (Nowicki, 1999). Then we calculate the completion time of the partial schedule up to $\pi_i$ on machine $j$, denoted by $C(\pi_i, j)$, as follows:

$C(\pi_1, 1) = p(\pi_1, 1)$  (1)

$C(\pi_i, 1) = C(\pi_{i-1}, 1) + p(\pi_i, 1)$ for $i = 2, \ldots, n$  (2)

$C(\pi_1, j) = C(\pi_1, j-1) + p(\pi_1, j)$ for $j = 2, \ldots, m$  (3)

$C(\pi_i, j) = \max\{C(\pi_{i-1}, j), C(\pi_i, j-1)\} + p(\pi_i, j)$ for $i = 2, \ldots, n$; $j = 2, \ldots, m$.  (4)

Finally, we define the makespan as

$C_{\max}(\pi) = C(\pi_n, m)$.  (5)

The permutation flowshop scheduling problem is then to find a permutation $\pi^*$ in the set $\Pi$ of all permutations such that (Rajendran & Chaudri, 1991)

$C_{\max}(\pi^*) \le C_{\max}(\pi) \quad \forall \pi \in \Pi$.  (6)
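To make the recursion in Eqs. (1)-(5) concrete, the following Python sketch computes the completion times of a given permutation and returns its makespan; the function and variable names are ours, not the paper's. Applied to the 4-job, 5-machine data of Table 1, it reproduces the makespan of 57 reported in Section 3.1.2 for the sequence 3-4-1-2.

def makespan(processing_times, perm):
    # Completion times C(pi_i, j) via Eqs. (1)-(4); returns Cmax = C(pi_n, m), Eq. (5).
    # processing_times[i][j]: processing time of job i on machine j; perm: 0-based job order.
    n, m = len(perm), len(processing_times[0])
    C = [[0.0] * m for _ in range(n)]
    for i, job in enumerate(perm):
        for j in range(m):
            ready_machine = C[i][j - 1] if j > 0 else 0.0   # C(pi_i, j-1)
            ready_job = C[i - 1][j] if i > 0 else 0.0       # C(pi_{i-1}, j)
            C[i][j] = max(ready_machine, ready_job) + processing_times[job][j]
    return C[n - 1][m - 1]

# Table 1 data (4 jobs, 5 machines); the sequence 3-4-1-2 (0-based: [2, 3, 0, 1]) gives Cmax = 57.
p = [[5, 9, 8, 10, 1],
     [9, 3, 10, 1, 8],
     [9, 4, 5, 8, 6],
     [4, 8, 8, 7, 2]]
print(makespan(p, [2, 3, 0, 1]))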
In this study, we consider a flowshop consisting of m machines, each with unlimited buffer space. There is no additional restriction that the processing of each job has to be continuous, so there may be waiting times between consecutive operations of a job.
The main assumptions for this problem are:
- a set of n multiple-operation jobs is available for processing at time zero (each job requires m
operations and each operation requires a different machine).
- the set up times for the operations are sequence-independent and are included in the
processing times.
- m different machines are continuously available.
- individual operations are not preemptable.
3. Heuristics
Six flowshop heuristics are considered in this study. One can find the explanations of these methods in
Aksoy (1980), Campbell, Dudek, and Smith (1970), Koulamas (1998) and Nawaz, Enscore, and Ham
(1983). The other two heuristic algorithms are new and are presented below.
3.1. Aslan’s frequency algorithm
This algorithm is developed by Aslan (1999) with the objective of minimizing makespan and works as
follows:
Step 1
Take the operation times of each job on each machine, generate an n × m dimensional problem.
Step 2
Consider all of the combinations of all jobs, produce n(n − 1) pairs (two by two).
Step 3
Calculate the partial makespan of each pair by loading the jobs on the machines, and compare pair (i, j) with pair (j, i). For the pair with the smaller completion time, the first job takes a frequency value of 1 and the other job 0.
By these comparisons, n(n − 1)/2 frequency assignments are obtained. If the completion times of the two pairs are the same, then both jobs take a frequency value of 1.
Step 4
Sum up the frequency values of all jobs and sort them in decreasing order. (This method sequences the
jobs in decreasing frequency value order).
Step 5
If jobs have equal frequency values, consider the alternative sequences; the sequence that results in the smaller total completion time is the final sequence.
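As a complement to the step-by-step description, the following Python sketch implements Steps 1-5 (the improvement phase of Section 3.1.1 is omitted). It reuses the makespan helper sketched at the end of Section 2; all other names are illustrative.

from itertools import combinations, permutations, product

def aslan_frequency(processing_times):
    # Steps 1-5 of Aslan's frequency algorithm; makespan() is the helper sketched in Section 2.
    n = len(processing_times)
    freq = [0] * n
    # Steps 2-3: compare the n(n-1)/2 unordered job pairs, loading (i, j) against (j, i).
    for i, j in combinations(range(n), 2):
        c_ij = makespan(processing_times, [i, j])
        c_ji = makespan(processing_times, [j, i])
        if c_ij <= c_ji:
            freq[i] += 1        # the leading job of the better pair gets a frequency of 1
        if c_ji <= c_ij:
            freq[j] += 1        # on ties, both jobs get a frequency of 1
    # Step 4: sort jobs in decreasing frequency order; equal frequencies form tiers.
    tiers = {}
    for job in range(n):
        tiers.setdefault(freq[job], []).append(job)
    tiers = [tiers[f] for f in sorted(tiers, reverse=True)]
    # Step 5: among sequences that only reorder tied jobs, keep the one with the smallest makespan.
    candidates = (sum(map(list, combo), [])
                  for combo in product(*(permutations(t) for t in tiers)))
    return min(candidates, key=lambda seq: makespan(processing_times, seq))

# For the Table 1 example this returns [2, 3, 0, 1], i.e. the sequence 3-4-1-2 with Cmax = 57.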
3.1.1. Improvement phase
The job pairs that have equal total completion times are evaluated and the dominant pair is found. The pair with less total machine idleness is considered dominant. A frequency value of 1 is added to the first job of the dominant pair, a frequency value of 1 is subtracted from the other job, and a new sequence is generated.
3.1.2. A numerical example for Aslan's frequency (dual sequencing) algorithm
Consider a flowshop with 5 machines. There are 4 jobs to be scheduled and their processing times are
as shown in Table 1.
We compare the calculated total completion times for pairs ij and ji for all of the combinations of
the jobs.
Pair | Completion time | Pair | Completion time
(1,2) | 41 | (2,1) | 42
(1,3) | 46 | (3,1) | 42
(1,4) | 41 | (4,1) | 40
(2,3) | 41 | (3,2) | 40
(2,4) | 39 | (4,2) | 39
(3,4) | 38 | (4,3) | 41
First, pairs (1,2) and (2,1) are compared. Since pair (1,2) results in a smaller partial makespan, job 1 takes the frequency value of 1 and job 2 takes the frequency value of 0. By executing these n(n − 1)/2 comparisons, we obtain the frequency values for all jobs.
As seen in Table 2, we assign a frequency value of 1 to both jobs 2 and 4 because pairs (2,4) and (4,2) both have a total completion time of 39. Then the frequency values of all jobs are summed. The frequency values for each job are obtained as follows:
Frequency for job 1: 1
Frequency for job 2: 1
Frequency for job 3: 3
Frequency for job 4: 2
Table 1
Processing times for 4 job 5 machine problem
Job Machine
M1 M2 M3 M4 M5
J1 5 9 8 10 1
J2 9 3 10 1 8
J3 9 4 5 8 6
J4 4 8 8 7 2
The frequency values of all jobs are sorted in decreasing order. The method yields the sequence 3-4-2-1 or 3-4-1-2. The makespans of these two sequences are compared, and it is found that the sequence 3-4-1-2 with Cmax = 57 is better than the sequence 3-4-2-1 with Cmax = 58.
3.1.3. Improvement phase
The job pairs (2,4) and (4,2) have equal total completion times, so the dominant pair is investigated. Because pairs (2,4) and (4,2) both have a total completion time of 39, we compare these two pairs to decide which one is dominant. For each pair, except for the first machine, we calculate how long each machine waits for the preceding machine. In Fig. 1, the numbers above each circle indicate the starting time of each job on each machine and the numbers below each circle indicate the processing time of each job on each machine. For the above example, for pair (2,4), in order to start operation, the fourth machine waits for the third machine for 7 min and the fifth machine waits for the fourth machine for 6 min, so for pair (2,4) the total delay is 6 + 7 = 13 min. For pair (4,2), in order to start operation, the fourth machine waits for the third machine for 3 min and the fifth machine waits for the fourth machine for 2 min, so the total delay is 2 + 3 = 5 min. As seen from these results, pair (4,2)
Table 2
Frequency values of each job at each comparison
Comparison | J1 | J2 | J3 | J4
1 | 1 | 0 | – | –
2 | 0 | – | 1 | –
3 | 0 | – | – | 1
4 | – | 0 | 1 | –
5 | – | 1 | – | 1
6 | – | – | 1 | 0
Fig. 1. Comparison of pair (2,4) and pair (4,2).
is dominant. So we add 1 to the frequency of the fourth job and subtract 1 from the frequency of the second job. The frequency of the fourth job becomes 3 and the frequency of the second job becomes 0.
Since the second job has the least frequency value, the second job takes the last place in the sequence.
When we review the frequency values of all jobs, we see that the third and the fourth jobs have equal frequencies, so we consider two alternative sequences, 3-4-1-2 and 4-3-1-2. Sequence 3-4-1-2 gives a total completion time of 57 and sequence 4-3-1-2 gives a total completion time of 54, so the final sequence is 4-3-1-2.
3.2. Aslan’s point algorithm
This algorithm is also developed by Aslan (1999) with the objective of minimizing makespan and
works as follows:
Step 1
Compare pair (i, j) and pair (j, i). If pair (i, j) results in a smaller completion time than pair (j, i), assign to job i a positive point equal to the difference between the makespans of (j, i) and (i, j), and assign to job j a negative point of the same magnitude.
Step 2
For all the jobs sum up the points and sequence the jobs in decreasing point order.
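A corresponding Python sketch of the point algorithm follows; it again reuses the makespan helper from Section 2, and the names are illustrative. Jobs with equal point totals may be ordered either way, as in the example of Section 3.2.1.

from itertools import combinations

def aslan_point(processing_times):
    # Aslan's point algorithm; makespan() is the helper sketched in Section 2.
    n = len(processing_times)
    points = [0] * n
    # Step 1: for each pair, credit the job that leads in the better ordering with the
    # makespan difference and debit the other job by the same amount.
    for i, j in combinations(range(n), 2):
        diff = makespan(processing_times, [j, i]) - makespan(processing_times, [i, j])
        points[i] += diff     # positive when (i, j) beats (j, i)
        points[j] -= diff
    # Step 2: sequence the jobs in decreasing point order.
    return sorted(range(n), key=lambda job: -points[job])

# For the Table 1 example the points are -4, -2, +8, -2, giving 3-2-4-1 or 3-4-2-1 (Cmax = 58).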
3.2.1. A numerical example for Aslan's point algorithm
When we consider the same problem in Table 1, we first compare pair (1,2) and pair (2,1). We see that pair (1,2) gives a smaller total completion time than pair (2,1), so we assign 42 − 41 = +1 point to job 1 and −1 point to job 2. Then we compare pair (1,3) and pair (3,1). Pair (3,1) gives a smaller completion time than pair (1,3), so we assign 46 − 42 = +4 points to job 3 and −4 points to job 1. Later, we compare pair (1,4) and pair (4,1), and give 41 − 40 = +1 point to job 4 and −1 point to job 1. We repeat this procedure for all the pairs, and the point values obtained are −4 for the first job, −2 for the second job, +8 for the third job and −2 for the fourth job. The final sequence is 3-4-2-1 or 3-2-4-1 with makespan 58.
The numerical examples presented for these two heuristic algorithms are given for demonstration purposes only; further experiments need to be performed to test the effectiveness of these new heuristics, and they should be compared with other heuristics in the literature. More information can be found in Aslan (1999).
3.2.2. The solution of the 5-job, 43-machine problem by heuristic methods
In this study, a scheduling problem in a real permutation flowshop environment is considered. In this environment, five different items are produced in batches by passing through 43 serial machines in the axle housing workshop of a manufacturing plant that supplies automotive axles and axle components. According to the production orders for 1 month and the global operation recipe, the best sequence of the five different products (big housing, light housing, small housing, additional axle, trailer axle) is found by each of the six heuristic algorithms, as shown in Table 3. For each sequence found by the six heuristic algorithms, the completion time of each job on each machine, the job waiting times and the machine idle times are computed.
4. Developing a neural network by using backpack neural network system
To develop a neural network for this problem, a neural network software tool which uses the backpropagation training algorithm is employed. The backpropagation network is one of the most widely used network architectures because of its ability to learn complex mappings and its strong foundation. Backpropagation is a systematic way of training a multilayer artificial neural network in a supervised manner. It involves two phases of computation: a training (feed-forward) phase and a backward (recall) phase. In the training phase, the network learns the relationship between inputs and outputs by applying input vectors to the nodes of the network. In the backward (recall) phase, the network predicts outputs when exposed to unseen examples or new inputs (Sabuncuoglu, 1998).
As a first step, the data are read into the system. The job and machine numbers, the processing times of each job on each machine, the job waiting times and the machine idle times are created in Excel and saved as a dBASE file, the native database format used by the Backpack Neural Network System. To preprocess the data, one-of-N transformations of the job and machine numbers are created, and a method is proposed to determine the fuzzy completion times, job waiting times and machine idle times. Triangular fuzzy numbers are used to represent the fuzzy completion times, while job waiting and machine idle times are represented by triangular or trapezoidal fuzzy numbers. The input variables are scaled to lie in the range 0–1 and the output variables in the range 0.2–0.8. This approach reduces the training time by eliminating the possibility of reaching the saturation regions of the sigmoid transfer function during training.
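The exact preprocessing routines of the Backpack software are not documented here, so the following Python sketch only illustrates the two transformations described above: a one-of-N encoding of the categorical job and machine numbers and min-max scaling of the inputs to 0–1 and of the target completion times to 0.2–0.8. All function names are ours.

import numpy as np

def one_of_n(index, n):
    # One-of-N (one-hot) vector for a categorical job or machine number (0-based index).
    v = np.zeros(n)
    v[index] = 1.0
    return v

def scale_input(x, lo, hi):
    # Min-max scale a numeric input feature to the range 0-1.
    return (x - lo) / (hi - lo)

def scale_output(y, lo, hi):
    # Scale the target completion time to 0.2-0.8 so the sigmoid output
    # is never pushed into its saturation regions during training.
    return 0.2 + 0.6 * (y - lo) / (hi - lo)

def unscale_output(y_scaled, lo, hi):
    # Map a network output back to the original completion-time units.
    return lo + (hi - lo) * (y_scaled - 0.2) / 0.6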
4.1. Determining the fuzzy membership function for completion times
In job sequencing for a flowshop, processing times are frequently not known exactly and only
estimated intervals are given. Fuzzy numbers are ideally suited to represent these intervals (McCahon &
Lee, 1992). Processing times are composed of operation times. In most situations, operation times are calculated with a 5 or 10% tolerance via time studies. So, in this study, we assume that there can be deviations in the completion times, which are built up from the processing times. Instead of representing the job processing times with fuzzy numbers, the completion times are represented by triangular fuzzy numbers. The graphical representation of a triangular fuzzy number is shown in Fig. 2. For each heuristic method, each job's completion time on the last machine is taken as the most likely time and, by adding and subtracting a 5% tolerance to and from this time, the pessimistic
Table 3
The best sequence and makespans for each heuristic algorithm
Algorithm Sequence Makespan
CDS 3-2-1-4-5 32,759.617
NEH 5-2-1-3-4 31,952.35
Koulamas’s 2-1-4-3-5 33,327.468
Aslan’s frequency 3-5-2-4-1 32,754.271
Aslan’s point 2-3-5-4-1 32,242.017
Aksoy’s 3-2-4-1-5 34,101.504
time and the optimistic time for carrying out the last operation of each job are found. These three values become the parameter set of the fuzzy set defined for each job. According to these parameters, the membership functions are developed.
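A small Python sketch of this fuzzification step follows, under the assumption that the 5% tolerance is applied multiplicatively to the most likely completion time; the function names are illustrative.

def fuzzy_completion_parameters(last_machine_completion_time, tolerance=0.05):
    # Parameter set (optimistic, most likely, pessimistic) of the triangular fuzzy
    # completion time, assuming the 5% tolerance is applied multiplicatively.
    b = last_machine_completion_time            # most likely value
    a = b * (1 - tolerance)                     # optimistic value
    c = b * (1 + tolerance)                     # pessimistic value
    return a, b, c

def triangular_membership(x, a, b, c):
    # Membership value mu(x) of the triangular fuzzy number (a, b, c) of Fig. 2.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)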
4.2. Determining the fuzzy membership functions for job waiting and machine idle times
In this study, to represent fuzzy job waiting and fuzzy machine idle times, trapezoidal fuzzy numbers
are employed. A trapezoidal fuzzy number is represented by (a, b, c, d). The membership function equals 1 between b and c, and becomes zero at the two end points, a and d. The graphical representation of a trapezoidal fuzzy number is illustrated in Fig. 3, where μ(x) is the membership function and x is either the job waiting or the machine idle time (McCahon & Lee, 1992).
For example, a manager may say that the job waiting or machine idle time for job A is generally 'b' to 'c' minutes. But due to other factors which cannot be controlled or predicted, the job waiting or machine idle time may occasionally be as high as 'd' minutes or as low as 'a' minutes. We propose a new method to represent the job waiting and machine idle times. For each job, the job waiting times are summed cumulatively, and for each machine, the machine idle times are summed cumulatively. We assume that 'a' is the minimum job waiting or machine idle time and 'd' is the total job waiting or total machine idle time, which is the maximum value; 'b' is the first value of the cumulative total and 'c' is the cumulative value just before the total. If only one machine waits, or only one job has a waiting time, a triangular membership function is used with parameters a = 0, b = the machine idle or job waiting time, and c = 0.
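The following Python sketch reflects our reading of this cumulative-sum rule; it assumes at least two waiting or idle observations per job or machine (the single-observation triangular case mentioned above is not handled) and the names are illustrative.

def trapezoidal_parameters(times):
    # Derive (a, b, c, d) from a job's waiting times (or a machine's idle times),
    # following the cumulative-sum rule described above; assumes len(times) >= 2.
    cumulative, total = [], 0.0
    for t in times:
        total += t
        cumulative.append(total)
    a = min(times)           # smallest single waiting/idle time
    b = cumulative[0]        # first value of the cumulative total
    c = cumulative[-2]       # cumulative value just before the total
    d = cumulative[-1]       # total waiting/idle time (the maximum)
    return a, b, c, d

def trapezoidal_membership(x, a, b, c, d):
    # Membership value mu(x) of the trapezoidal fuzzy number (a, b, c, d) of Fig. 3.
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a) if b > a else 1.0
    if x <= c:
        return 1.0
    return (d - x) / (d - c) if d > c else 1.0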
The output variable for the model is chosen as the completion time. The activity of each hidden
unit is determined by the activities of the input units and the weights on the connections between
the input and hidden units. A three layer backpropagation network is developed for each heuristic
algorithm. Three layered networks can be taught to perform a particular task or to learn a
particular mapping as follows: first, the network is presented with training examples, which
consist of patterns of activities for the input units as well as the desired activity for the output unit.
Fig. 2. The graphical representation of a triangular fuzzy number.
Fig. 3. Trapezoidal fuzzy number.
D.E. Akyol / Computers & Industrial Engineering 46 (2004) 679–696 687
Then, a determination is made of how close the actual output of the neural network is to the
desired output. The difference between the desired and the actual output is used as an error signal
to adjust the connection weights.
In this study, to perform the neural network analysis, the data are split into three datasets: the training set, the test set and the validation set. The training set contains the observations used to train the network. The test set contains the observations used to test the neural network and to determine when training should be stopped. The validation set contains observations that the neural network has not seen; it is applied to the trained neural network to assess its performance under real-world conditions. The Split Data function of the software package develops uniform distributions of observations for the train and test data sets. This allows the training (using the train and test sets) to ensure that all features of the distribution are learned. The validation set follows the same distribution as the base data set.
To specify the data sets, the output variable (completion time) is selected to divide the base data set into groups. For each heuristic, the Develop Data Sets function pulls stratified samples from the base data set, and the number of groups (the number of strata) used to stratify the data sets is chosen between 3 and 6. For each number of strata, the data set percentages, which set the share of the controlling group (the group or bin with the fewest observations; this group sets the number of observations to be randomly selected from the other groups in order to achieve the desired distributions) assigned to the train, test and validate data sets, are chosen as 60% for the train, 20% for the test and 20% for the validate data set. After specifying the data sets, the training process can begin.
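The Split Data and Develop Data Sets functions belong to the commercial software and are not reproduced here, so the following Python sketch is only a rough equivalent: it stratifies the observations into groups by the output value and then draws roughly 60/20/20 train, test and validate samples from each stratum. It does not reproduce the controlling-group mechanism exactly, and the names are ours.

import numpy as np

def stratified_split(y, n_groups=4, fractions=(0.6, 0.2, 0.2), seed=0):
    # Bin the observations by the output value (completion time) into n_groups strata,
    # then draw roughly 60/20/20 train / test / validate samples from every stratum.
    rng = np.random.default_rng(seed)
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    strata = np.digitize(y, edges)
    train, test, validate = [], [], []
    for g in np.unique(strata):
        idx = np.where(strata == g)[0]
        rng.shuffle(idx)
        n_train = int(fractions[0] * len(idx))
        n_test = int(fractions[1] * len(idx))
        train += list(idx[:n_train])
        test += list(idx[n_train:n_train + n_test])
        validate += list(idx[n_train + n_test:])
    return train, test, validate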
During the training of the neural network, the data in the training set will be thoroughly examined
and the network will learn or generalize the relationship between the dependent (each variable being
predicted) and independent variables (each input variable) so that the trained network can be used to
give us estimates or answers given new data cases. At intervals, under the control of the network
development system (program), the partially trained network will be presented with the independent
variables in the test set and will make predictions of the dependent variable, which is known.
The goodness of fit of the predictions will be measured. Based upon the fit, modifications to the
network might be made and training continued using the training set. For each heuristic and for different numbers of strata, different data sets generated by splitting the data are used. The numbers of train and test observations for each heuristic for different numbers of groups are given in Tables 4–9. These tables show the values belonging to the networks resulting in the smallest root mean square errors.
Table 4
Measurement of fit for the test set for CDS algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
CDS | 6 | 57 | 23 | 0.0966 | 0.0966 | 13,000 | 30 | 12 | 6
CDS | 5 | 103 | 27 | 0.0379 | 0.0377 | 14,200 | 30 | 12 | 10
CDS | 4 | 71 | 32 | 0.0671 | 0.0652 | 6800 | 30 | 71 | 6
CDS | 3 | 103 | 40 | 0.0681 | 0.0674 | 8400 | 30 | 103 | 6
Table 5
Measurement of fit for the test set for NEH algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
NEH | 6 | 48 | 11 | 0.0298 | 0.0282 | 8000 | 30 | 48 | 6
NEH | 5 | 68 | 18 | 0.0714 | 0.0714 | 62,000 | 30 | 68 | 6
NEH | 4 | 73 | 16 | 0.0642 | 0.063 | 7600 | 30 | 12 | 6
NEH | 3 | 86 | 29 | 0.0611 | 0.0606 | 9000 | 30 | 80 | 6
Table 6
Measurement of fit for the test set for Koulamas’ algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Koulamas | 6 | 80 | 31 | 0.0507 | 0.0502 | 15,000 | 30 | 12 | 6
Koulamas | 5 | 89 | 21 | 0.0488 | 0.0410 | 6600 | 30 | 89 | 6
Koulamas | 4 | 94 | 32 | 0.0445 | 0.0444 | 25,000 | 30 | 94 | 6
Koulamas | 3 | 114 | 37 | 0.0285 | 0.0269 | 9000 | 30 | 114 | 6
Table 7
Measurement of fit for the test set for Aslan’s frequency algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Frequency | 6 | 59 | 18 | 0.0532 | 0.0532 | 15,000 | 30 | 15 | 6
Frequency | 5 | 65 | 16 | 0.0965 | 0.0936 | 6800 | 30 | 12 | 6
Frequency | 4 | 58 | 16 | 0.0996 | 0.0738 | 6200 | 30 | 58 | 6
Frequency | 3 | 78 | 26 | 0.0761 | 0.0760 | 18,200 | 30 | 12 | 6
Table 8
Measurement of fit for the test set for Aslan’s point algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Point | 6 | 57 | 23 | 0.0835 | 0.0834 | 10,800 | 30 | 57 | 6
Point | 5 | 72 | 21 | 0.0618 | 0.0615 | 15,800 | 30 | 12 | 6
Point | 4 | 93 | 33 | 0.0651 | 0.0608 | 8200 | 30 | 93 | 6
Point | 3 | 95 | 32 | 0.0637 | 0.0619 | 7000 | 30 | 95 | 6
4.3. Network architecture
A backpropagation neural network is adopted in this study, in which signals are passed from the input
layer to the output layer through a hidden layer and learning is done by adjusting the connection weights
by a gradient descent algorithm that involves backpropagating the error to previous layers.
A difficult task with ANNs involves choosing the architecture parameters of the network. Although the
neural network has the possibility of solving various scheduling problems, how to specify the values of
many parameters and weights of these networks remains a critical issue (Raaymakers & Weijters, 2003).
At present, there is no established theoretical method to determine the optimal configuration of a
network. Most of the design parameters are application dependent and must be determined empirically.
A feedforward ANN with a single hidden layer is employed in this work. There are no constraints on the number of hidden layers; a network can have only one, or as many hidden layers as selected. However, there is no evidence that a network with more hidden layers performs better (Dagli, 1994). Patterson (1996) indicated that a single hidden layer would be sufficient for most applications (Shiue & Su, 2002). The need for more than one hidden layer is highly unlikely, because networks with additional layers require significantly longer training times, even though networks with larger hidden layers distribute the weights over the layers and may provide better performance. In the light of this information, and with the aim of not increasing the training times, a network with a single hidden layer is chosen for implementation in this study. The number of hidden layer nodes is also among the most important considerations when
solving problems using multilayered feedforward neural networks. An insufficient number of hidden
layer neurons generally results in the network’s inability to solve a particular problem, while too many
hidden layer neurons may result in a network with poor generalization performance and can lead to
over-fitting (Liu, Chang, & Zhang, 2002). The number of hidden layer nodes for our problem is
determined by a trial and error procedure, in which various architectures are constructed by changing the
number of nodes in the hidden layer. The networks used to model the different heuristic algorithms are trained with 4, 6, 10, 15, 20, 25 and 30 hidden nodes, with epoch sizes of 12 (the software default), 15, the entire set of training observations, and 50 and 80 (for models with more than 50 training observations), for different numbers of groups. All network architectures are compared and, for each heuristic algorithm and each number of groups, the architecture with the minimum root mean square error (RMSE) is selected as the best one. Tables 4–9 show the epoch sizes and numbers of hidden layer nodes that result in the smallest RMS errors. The number of output nodes corresponds to the number of outputs in the model; since this is a prediction problem, it is 1. Because we are dealing with a prediction problem,
Table 9
Measurement of fit for the test set for Aksoy’s algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Aksoy | 6 | 63 | 21 | 0.1034 | 0.1034 | 36,600 | 30 | 63 | 6
Aksoy | 5 | 104 | 31 | 0.0369 | 0.0366 | 9800 | 30 | 104 | 6
Aksoy | 4 | 69 | 23 | 0.0846 | 0.0836 | 6400 | 30 | 69 | 6
Aksoy | 3 | 106 | 41 | 0.0254 | 0.0254 | 65,000 | 3 | 12 | 10
the RMS error is selected as the training criterion. It is a quantitative measure that reflects the degree of learning taking place in a network; as a network learns, its root mean square error decreases. Every network is trained for up to 65,000 epochs, where each epoch is defined by the training observations presented to the network. Although the maximum number of epochs to execute before terminating the training session is defined as 65,000, the training may end before the maximum is reached if the performance on the test set does not improve. Another important point is the initialization of the network's weights, which is done with random values in order to break the symmetry among the hidden nodes. In this study, the initial weights of the backpropagation network are randomly set between −0.2 and +0.2, and 215 observations collected from the axle housing workshop during the previous month are used to generate the database.
One of the problems of the gradient descent rule which is used as a learning procedure for
backpropagation networks is setting an appropriate learning rate (Shiue & Su, 2002). The learning rate
determines the size of the steps in the search space to find the minimal training error (Raaymakers &
Weijters, 2003). A small learning rate results in longer learning times while a large learning rate causes
oscillations during the training of the network. One efficient and commonly used procedure that allows a
greater learning rate without causing divergent oscillations is the addition of a momentum term to the
gradient descent method. As with the learning rate, if the momentum term is too large, the network will
display a chaotic learning behavior. Because learning time is not the issue in this study, both learning
parameters are chosen relatively small at the beginning. The starting values of the learning rate and the momentum coefficient are both set to 0.2. The increase in the learning rate is defined as 0.095 and the decrease as 0.1: the increase is the amount added to the learning rate when the current weight change is in the same direction (same sign) as the prior weight change, and the decrease is the percentage by which the learning rate is reduced when the current weight change is in the opposite direction (different sign) from the prior weight change. The increase in the momentum value is defined as 0.05 and the decrease in the momentum term as 0.1.
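For readers without access to the Backpack software, the following condensed Python sketch shows the kind of training loop described above: a single hidden layer with sigmoid units, weights initialized between −0.2 and +0.2, and gradient descent with momentum on the squared error. The bias terms, the per-weight adaptive learning-rate rule and the test-set early stopping are omitted for brevity, so this is an illustration rather than the software's actual algorithm; all names are ours.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=6, lr=0.2, momentum=0.2, epochs=5000, seed=0):
    # Single-hidden-layer MLP trained by batch gradient descent with momentum on the
    # squared error (the study trained for up to 65,000 epochs with early stopping).
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.2, 0.2, (X.shape[1], n_hidden))   # initial weights in -0.2..+0.2
    W2 = rng.uniform(-0.2, 0.2, (n_hidden, 1))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    t = y.reshape(-1, 1)                                   # targets scaled to 0.2-0.8
    for _ in range(epochs):
        H = sigmoid(X @ W1)                                # forward pass: hidden layer
        out = sigmoid(H @ W2)                              # forward pass: output layer
        err = out - t
        delta_out = err * out * (1 - out)                  # backpropagated error signals
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        dW2 = momentum * dW2 - lr * (H.T @ delta_out) / len(X)   # momentum updates
        dW1 = momentum * dW1 - lr * (X.T @ delta_hid) / len(X)
        W2 += dW2
        W1 += dW1
    rmse = float(np.sqrt(np.mean(err ** 2)))               # RMSE, the training criterion
    return W1, W2, rmse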
5. Experimental results
In order to evaluate the performance of the ANN models, the validation dataset is applied to each trained model and the prediction accuracy of the model is investigated by comparing the fit statistics for each heuristic with different numbers of strata. According to the values of the correlation coefficient at different levels of significance, the fit statistics confirmed the validity of the models. After obtaining
Table 10
The summary of the results for each heuristic algorithm with the highest correlation coefficient
Heuristic | No. of groups | Corr. coef. (R²) | RMSE | MAPE | MAE
CDS | 5 | 0.9882 | 0.0250 | 10.74 | 733.9157
NEH | 3 | 0.9622 | 0.0446 | 25.15 | 1290.624
Koulamas | 4 | 0.9863 | 0.0280 | 7.70 | 762.4662
Frequency | 6 | 0.9339 | 0.0563 | 30.74 | 1705.223
Point | 3 | 0.9635 | 0.0431 | 15.60 | 1168.647
Aksoy | 3 | 0.9909 | 0.0224 | 9.00 | 609.6883
satisfactory performance results, it is decided to use the networks to predict the completion times of each job on each machine for the next production period. Since the correlation coefficient is a measure of the linear relationship between two variables, for each heuristic the number of strata that gives the highest correlation coefficient is chosen as the best result. The summary of the results for each heuristic algorithm with the highest correlation coefficient is given in Table 10.
After the determination of the best number of strata for each heuristic resulting in smallest root mean
squared error, the predicted and actual completion times are compared. The fit statistics are obtained
from the model and fit graphs are drawn showing the comparison between the actual and the predicted
completion times. The fit graphs are given in Figs. 4–9.
The last values (makespans) of the predicted and actual completion times (belonging to the last
machine) for each heuristic are illustrated in Table 11.
Although there are differences between the actual and predicted completion times, as shown in Figs. 4–9, the results indicate that the models of the different heuristic scheduling algorithms predicted the completion times with correlation coefficients between 0.9339 and 0.9909. The fact that all values are very close to unity indicates that the mapping was performed at a satisfactory level even when fuzzy
Fig. 4. Actual versus predicted completion times for CDS algorithm.
Fig. 5. Actual versus predicted completion times for NEH algorithm.
Fig. 6. Actual versus predicted completion times for Koulamas’ algorithm.
Fig. 7. Actual versus predicted completion times for Aslan’s frequency algorithm.
Fig. 8. Actual versus predicted completion times for Aslan’s point algorithm.
information was present. If we calculate the ratio between the predicted and the actual makespans, we can see that all of the algorithms are successful in predicting the makespans. The decision maker can choose any one of these algorithms, but to specify due dates in agreement with the customer, taking the average of these ratios, which is equal to 1.00062, is offered as an alternative way to estimate the makespans of the jobs at hand. In this way, a better production plan that meets the customers' due dates can be made.
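For reference, the averaged correction factor can be reproduced directly from the ratios listed in Table 11:

# Predicted/actual makespan ratios from Table 11; their mean is the suggested correction factor.
ratios = [1.01729, 0.99604, 1.02436, 0.99987, 0.97958, 0.98658]
print(round(sum(ratios) / len(ratios), 5))   # 1.00062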
It is to be noted that, identifying an appropriate ANN model has a strong impact on the performance of
the networks in predicting the makespans. For example, the effect of different values of epoch sizes is
shown in the tables below. As shown in Table 12, training the network modeling Aslan’s frequency
algorithm with epoch size of 59 gives better results than training the network with epoch size of 12,
Fig. 9. Actual versus predicted completion times for Aksoy’s algorithm.
Table 11
The actual and predicted makespans for each heuristic algorithm
Heuristic Actual Predicted Predicted/actual
CDS 32,024.998 32,578.713 1.01729
NEH 31,952.000 31,825.555 0.99604
Koulamas 33,327.000 34,139.152 1.02436
Aslan’s frequency 32,753.998 32,750 0.99987
Aslan’s point 32,242.000 31,583.658 0.97958
Aksoy 32,435.000 31,999.766 0.98658
Table 12
Effect of epoch size on the performance of the network modeling Aslan’s frequency algorithm
Epoch size No. of hidden nodes Train obs. read Test obs. read Test set RMSE Corr. coef. Predicted makespan
12 6 59 18 0.0596 0.8843 22,766.303
59 6 59 18 0.0534 0.8902 29,919.959
15 6 86 21 0.0532 0.9339 32,750
15 6 59 18 0.0597 0.8934 29,194.941
but setting the epoch size to 15 and increasing the number of training observations (by reallocating the
observations in each group) gives the best result. Table 13 also shows the effect of different values of
epoch sizes on the performance of the network modeling Aslan’s point algorithm. Training the network
with the entire training set (95 observations) gives the best result in predicting the makespan. Because the
results showed good prediction ability, no more training observations are needed to train the network.
6. Conclusions
Backpropagation networks have been successfully utilized for various complex and difficult
scheduling problems by many researchers. The main objective of this study is to introduce a new and
alternative approach of using a neural network for the estimation of the makespans of flowshops. Rather
than determining the optimal sequences directly by the network, backpropagation networks have been
applied to predict the completion times of five jobs planned to be produced for the next production
period. Once the neural network results modeling the six heuristic algorithms are obtained, they are compared with the actual results to show the feasibility of using neural networks as an implementation alternative to the existing heuristic scheduling algorithms. The trained models' predictions were in good agreement with the actual values, producing R² values between 93.39 and 99.09%. These results show that approximately 93.39–99.09% of the variation in the dependent variables
(output parameters) can be explained by the independent variables (input parameters) selected and the
data set used.
This study shows the applicability of artificial neural networks to real scheduling problems.
Manufacturing plants having a permutation flowshop environment can profit from the model obtained in
this study with their data.
A future direction of this study is to employ additional methods that will improve the accuracy of the results. In this respect, decreasing the mean percentage errors and finding a closer match between the predicted and actual values seem to be promising directions. Different neural network architectures may
be used to predict the makespan for planning the production schedules. Extensions of the proposed
method, involving different neural network architectures may be developed to solve complex scheduling
problems having different performance measures.
References
Aksoy, M. (1980). Duzgun sıralı is yerlerindeki statik sıralama problemlerinde toplam is akıs suresinin enazlanması icin yeni
bir Sezgisel Yontem. Yoneylem Aras., Bildiriler’80, 255–268.
Table 13
Effect of epoch size on the performance of the network modeling Aslan’s point algorithm
Epoch size No. of hidden nodes Train obs. read Test obs. read Test set RMSE Corr. coef. Predicted makespan
12 6 95 32 0.0752 0.9489 29,076.232
95 6 95 32 0.0637 0.9635 31,583.658
15 6 95 32 0.0713 0.9595 33,147.738
Aslan, D. (1999). Model development and application based on object oriented neural network for scheduling problem.
DEU Res. Fund. Proj., Nr. 0908.97.07.01. University of Dokuz Eylul, Izmir.
Baker, K. R. (1974). Introduction to sequencing and scheduling. New York: Wiley.
Campbell, H. G., Dudek, R. A., & Smith, M. L. (1970). A heuristic algorithm for the n-job, m-machine sequencing problem.
Management Science, 16, 630–637.
Dagli, C. H. (1994). Artificial neural networks for intelligent manufacturing. London: Chapman and Hall.
El-Bouri, A., Balakrishnan, S., & Pooplewell, N. (2000). Sequencing jobs on a single machine: A neural network approach.
European Journal of Operational Research, 126, 474–490.
Feng, S., Li, L., Cen, L., & Huang, J. (2003). Using MLP networks to design a production scheduling system. Computers and
Operations Research, 30, 821–832.
Ho, J. C., & Chang, Y. L. (1991). A new heuristic for the n-job, M-machine flow-shop problem. European Journal of
Operational Research, 52, 194–202.
Jain, A. S., & Meeran, S. (1998). Job shop scheduling using neural networks. International Journal of Production Research, 36,
1249–1272.
Jain, A. S., & Meeran, S. (2002). A multi-level hybrid framework applied to the general flow-shop scheduling problem.
Computers and Operations Research, 29, 1873–1901.
Koulamas, C. (1998). A new constructive heuristic for the flowshop scheduling problem. European Journal of Operational
Research, 105, 66–71.
Lee, I., & Shaw, M. J. (2000). A neural-net approach to real time flow-shop sequencing. Computers and Industrial Engineering,
38, 125–147.
Liu, D., Chang, T. S., & Zhang, Y. (2002). A constructive algorithm for feedforward neural networks with incremental training.
IEEE Transactions on Circuits and Systems—I: Fundamental and Applications, 49, 1876–1879.
McCahon, C. S., & Lee, E. S. (1992). Fuzzy job sequencing for a flowshop. European Journal of Operational Research, 62,
294–301.
Nawaz, M., Enscore, E., & Ham, I. (1983). A heuristic algorithm for the n-job, m-machine flowshop sequencing problem.
Omega, 11, 91–95.
Nowicki, E. (1999). The permutation flow shop with buffers: A tabu search approach. European Journal of Operational
Research, 116, 205–219.
Patterson, D. W. (1996). Artificial neural networks: Theory and applications. Singapore: Prentice-Hall.
Raaymakers, W. H. M., & Weijters, A. J. M. M. (2003). Makespan estimation in batch process industries: A comparison
between regression analysis and neural networks. European Journal of Operational Research, 145, 14–30.
Rajendran, C., & Chaudri, D. (1991). An efficient heuristic approach to the scheduling of jobs in a flowshop. European Journal
of Operational Research, 61, 318–325.
Sabuncuoglu, I. (1998). Scheduling with neural networks: A review of the literature and new research directions. Production
Planning and Control, 9, 2–12.
Sabuncuoglu, I., & Gurgun, B. (1996). A neural network model for scheduling problems. European Journal of Operational
Research, 93, 288–299.
Shiue, Y. R., & Su, C. T. (2002). Attribute selection for neural network based adaptive scheduling systems in flexible
manufacturing systems. International Journal of Advanced Manufacturing Technology, 20, 532–544.
Smith, K. (1999). Neural networks for combinatorial optimization: A review of more than a decade research. Informs Journal
on Computing, 11, 15–34.