Application of neural networks to heuristic scheduling algorithms
Derya Eren Akyol*
Department of Industrial Engineering, University of Dokuz Eylul, 35100 Bornova-Izmir, Turkey
Available online 2 July 2004
Abstract
This paper considers the use of artificial neural networks (ANNs) to model six different heuristic algorithms
applied to the n job, m machine real flowshop scheduling problem with the objective of minimizing makespan. The
objective is to obtain six ANN models to be used for the prediction of the completion times for each job processed
on each machine and to introduce the fuzziness of scheduling information into flowshop scheduling. Fuzzy
membership functions are generated for completion, job waiting and machine idle times. Different methods are
proposed to obtain the fuzzy parameters. To model the functional relation between the input and output variables,
multilayered feedforward networks (MFNs) trained with the error backpropagation learning rule are used. The trained
network is able to apply the learnt relationship to new problems. In this paper, an implementation alternative to the
existing heuristic algorithms is provided. Once the network is trained adequately, it can provide an outcome
(solution) faster than conventional iterative methods by its generalizing property. The results obtained from the
study can be extended to solve the scheduling problems in the area of manufacturing.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Artificial neural networks; Multilayered perceptron; Heuristic scheduling; Flowshop scheduling problems; Fuzzy
membership functions
1. Introduction
The flowshop scheduling problem is considered one of the general production scheduling problems
in which n different jobs must be processed by m machines in the same order. The problem can be
considered as finding a scheme of allocation of tasks to a limited number of competing resources, with an
objective of satisfying constraints and optimizing performance criteria. Much research literature
addresses methods of minimizing performance measures such as makespan. The makespan
minimization, within the general flowshop scheduling domain, provides a useful area for analysis
because it is an important model in scheduling theory and it is usually very difficult to find its optimal
solution (Jain & Meeran, 2002).
During the last 40 years, most of the research effort has been devoted to the permutation flowshop problem.
In the permutation flowshop, n different jobs have to be processed on m machines. Each job has one
operation on each machine and all jobs have the same ordering sequence on each machine. At any time,
each machine can process at most one job. Preemption is not allowed. The objective is to find a
permutation of jobs that minimizes the maximum completion time, or makespan. This problem, denoted by $n/m/P/C_{\max}$, is known to be an NP-complete combinatorial optimization problem.
Complete enumeration, integer programming, branch and bound techniques can be used to find the
optimal sequences for small-size problems but they do not provide efficient solutions for large size
problems. In view of the combinatorial complexity and time constraints, most of the large problems can
be solved only by heuristic methods (Lee & Shaw, 2000). Though efficient heuristics cannot guarantee optimal solutions, they provide approximate solutions that are often nearly as good as the optimal ones (Ho & Chang, 1991).
In recent years, the technological advancements in hardware and software have encouraged new
application tools such as neural networks to be applied to combinatorially exploding NP-hard problems
(Jain & Meeran, 1998). They have emerged as efficient approaches in a variety of engineering
applications where problems are difficult to formulate or awkwardly defined. They are computational
structures that implement simplified models of biological processes which are preferred for their
robustness, massive parallelism and ability to learn. They have proven to be more useful for complicated
problems difficult to solve with conventional methods. The advantage of them lies in their resilience
against distortions in the input data and their learning capabilities. With their learning capabilities,
they avoid having to develop a mathematical model or acquiring the appropriate knowledge to solve a
task. The ability to map and solve a wide range of problems motivated the proposal of neural networks as a highly parallel model for general-purpose computing. As a result, they have been applied to scheduling and other combinatorial optimization problems. Finding the relationship between
the data (i.e. processing times, due dates, etc.) and schedules, determining the optimal sequence for the
jobs to be processed, identifying best dispatching strategies (i.e. scheduling rules), etc. are some of
the application areas of neural networks in the scheduling literature (Sabuncuoglu & Gurgun, 1996).
Sabuncuoglu (1998) presented a detailed review of the literature in the area of scheduling, and the study of Smith (1999) reviews the research on the use of NNs in combinatorial optimization.
Neural networks as learning tools, have demonstrated their ability to capture the general relationship
between variables that are difficult or impossible to relate to each other analytically by learning, recalling
and generalizing from training patterns as data (Shiue & Su, 2002). In other words, they are universal
function approximators and are therefore attractive for automatically learning the (nonlinear) functional
relation between the input and output variables (Raaymakers & Weijters, 2003).
In this study, a scheduling problem in a real permutation flowshop environment is considered.
Using the information of the production orders for 1 month and the global operation recipe, the best sequence of five different products is found by six different heuristic algorithms. For each sequence
found by six different heuristic algorithms, the completion time of each job on each machine, job waiting
times and machine idle times are computed and read into the system. To model these six different
heuristic scheduling algorithms, one of the most popular neural network architectures, the multilayered
perceptron (MLP) neural network is used. In order to develop a neural network, the Backpack Neural
Network System Version 4.0 (by Z Solutions) is used and some necessary steps are followed. For each of
the heuristic algorithms, the neural network model is used for estimating the makespan of five jobs
processed on 43 machines. In this way, we present a neural network-based implementation alternative
to the existing heuristic algorithms. The proposed method is simple and straightforward. An MLP neural
network is trained on data from a real world problem to learn the functional relationship between the
input and output variables. After the training process is completed, the MLP can provide outputs of adequate accuracy over a limited range of input conditions, with the advantage of requiring far less computation than other modeling methods (Feng, Li, Cen, & Huang, 2003). In other words, the neural
network’s computational speed permits fast solutions to problems not seen previously by the network
(El-Bouri, Balakrishnan, & Pooplewell, 2000).
This paper is organized as follows. Section 2 presents a mathematical formulation of the permutation
flowshop scheduling problem with the makespan objective. In Section 3, we give information about the
heuristic procedures considered in this study. Steps of developing a backpropagation network are explained
in Section 4. Section 5 includes the experimental results. Finally, Section 6 provides conclusions.
2. The permutation flowshop scheduling problem with the makespan criterion
In a permutation flowshop scheduling problem, there are a set of jobs $I = \{1, 2, 3, \ldots, n\}$ and a set of machines $J = \{1, 2, 3, \ldots, m\}$. Each of the $n$ jobs has to be processed on the $m$ machines $1, 2, \ldots, m$ in the order given by the indexing of the machines. Thus job $i$, $i \in I$, consists of a sequence of $m$ operations, each of which must be processed on machine $j$ for an uninterrupted processing time $p_{ij}$. Each machine $j$, $j \in J$, can process at most one job at a time, each job can be processed on at most one machine at a time, and once an operation is started, it must be completed without interruption (Baker, 1974). Let $C_{ij}$ be the completion time of job $i$ on machine $j$. The makespan $C_{\max}$ is the maximum completion time among all jobs. In the permutation flowshop problem with the makespan objective, the goal is to find a permutation schedule that minimizes the makespan $C_{\max}$, where a permutation schedule for a flowshop instance is a schedule in which each machine processes the jobs in the same order.
In order to provide a formal mathematical model of the problem, we apply the notion of a job processing order represented by a permutation $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ on the set $I$, where $\pi_i$ denotes the element of $I$ that is in position $i$ of $\pi$ (Nowicki, 1999). Then we calculate the completion time of the partial schedule up to $\pi_i$ on machine $j$, denoted by $C(\pi_i, j)$, as follows:

$C(\pi_1, 1) = p(\pi_1, 1)$  (1)

$C(\pi_i, 1) = C(\pi_{i-1}, 1) + p(\pi_i, 1)$ for $i = 2, \ldots, n$  (2)

$C(\pi_1, j) = C(\pi_1, j-1) + p(\pi_1, j)$ for $j = 2, \ldots, m$  (3)

$C(\pi_i, j) = \max\{C(\pi_{i-1}, j), C(\pi_i, j-1)\} + p(\pi_i, j)$ for $i = 2, \ldots, n$; $j = 2, \ldots, m$.  (4)

Finally, we define the makespan as

$C_{\max}(\pi) = C(\pi_n, m)$.  (5)

The permutation flowshop scheduling problem is then to find a permutation $\pi^*$ in the set $\Pi$ of all permutations such that (Rajendran & Chaudri, 1991)

$C_{\max}(\pi^*) \le C_{\max}(\pi) \quad \forall \pi \in \Pi$.  (6)
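To make the recursion in Eqs. (1)-(5) concrete, the following Python sketch computes the completion times of a given permutation and returns its makespan; the function and variable names are ours, not the paper's. Applied to the 4-job, 5-machine data of Table 1, it reproduces the makespan of 57 reported in Section 3.1.2 for the sequence 3-4-1-2.

def makespan(processing_times, perm):
    # Completion times C(pi_i, j) via Eqs. (1)-(4); returns Cmax = C(pi_n, m), Eq. (5).
    # processing_times[i][j]: processing time of job i on machine j; perm: 0-based job order.
    n, m = len(perm), len(processing_times[0])
    C = [[0.0] * m for _ in range(n)]
    for i, job in enumerate(perm):
        for j in range(m):
            ready_machine = C[i][j - 1] if j > 0 else 0.0   # C(pi_i, j-1)
            ready_job = C[i - 1][j] if i > 0 else 0.0       # C(pi_{i-1}, j)
            C[i][j] = max(ready_machine, ready_job) + processing_times[job][j]
    return C[n - 1][m - 1]

# Table 1 data (4 jobs, 5 machines); the sequence 3-4-1-2 (0-based: [2, 3, 0, 1]) gives Cmax = 57.
p = [[5, 9, 8, 10, 1],
     [9, 3, 10, 1, 8],
     [9, 4, 5, 8, 6],
     [4, 8, 8, 7, 2]]
print(makespan(p, [2, 3, 0, 1]))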
In this study, we consider a flowshop consisting of m machines, each with unlimited buffer space. There is no additional restriction that the processing of each job has to be continuous, so there may be waiting times between consecutive operations of a job.
The main assumptions for this problem are:
- a set of n multiple-operation jobs is available for processing at time zero (each job requires m
operations and each operation requires a different machine).
- the set up times for the operations are sequence-independent and are included in the
processing times.
- m different machines are continuously available.
- individual operations are not preemptable.
3. Heuristics
Six flowshop heuristics are considered in this study. One can find the explanations of these methods in
Aksoy (1980), Campbell, Dudek, and Smith (1970), Koulamas (1998) and Nawaz, Enscore, and Ham
(1983). The other two heuristic algorithms are new and are presented below.
3.1. Aslan’s frequency algorithm
This algorithm is developed by Aslan (1999) with the objective of minimizing makespan and works as
follows:
Step 1
Take the operation times of each job on each machine, generate an n × m dimensional problem.
Step 2
Consider all of the combinations of all jobs, produce n(n − 1) pairs (two by two).
Step 3
Calculate the partial makespan of each pair by loading the jobs on the machines, and compare pair (i, j) with pair (j, i). For the pair with the smaller completion time, the first job takes a frequency value of 1 and the other job 0.
By these comparisons, n(n − 1)/2 frequency assignments are obtained. If the completion times of the two pairs are the same, then both jobs take a frequency value of 1.
Step 4
Sum up the frequency values of all jobs and sort them in decreasing order. (This method sequences the
jobs in decreasing frequency value order).
Step 5
If jobs have equal frequency values, consider the alternative sequences; the sequence that results in the smaller total completion time is the final sequence.
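As a complement to the step-by-step description, the following Python sketch implements Steps 1-5 (the improvement phase of Section 3.1.1 is omitted). It reuses the makespan helper sketched at the end of Section 2; all other names are illustrative.

from itertools import combinations, permutations, product

def aslan_frequency(processing_times):
    # Steps 1-5 of Aslan's frequency algorithm; makespan() is the helper sketched in Section 2.
    n = len(processing_times)
    freq = [0] * n
    # Steps 2-3: compare the n(n-1)/2 unordered job pairs, loading (i, j) against (j, i).
    for i, j in combinations(range(n), 2):
        c_ij = makespan(processing_times, [i, j])
        c_ji = makespan(processing_times, [j, i])
        if c_ij <= c_ji:
            freq[i] += 1        # the leading job of the better pair gets a frequency of 1
        if c_ji <= c_ij:
            freq[j] += 1        # on ties, both jobs get a frequency of 1
    # Step 4: sort jobs in decreasing frequency order; equal frequencies form tiers.
    tiers = {}
    for job in range(n):
        tiers.setdefault(freq[job], []).append(job)
    tiers = [tiers[f] for f in sorted(tiers, reverse=True)]
    # Step 5: among sequences that only reorder tied jobs, keep the one with the smallest makespan.
    candidates = (sum(map(list, combo), [])
                  for combo in product(*(permutations(t) for t in tiers)))
    return min(candidates, key=lambda seq: makespan(processing_times, seq))

# For the Table 1 example this returns [2, 3, 0, 1], i.e. the sequence 3-4-1-2 with Cmax = 57.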
3.1.1. Improvement phase
The job pairs that have equal total completion times are evaluated and the dominant pair is found. The pair with less total machine idleness is considered dominant. A frequency value of 1 is added to the first job of the dominant pair, a frequency value of 1 is subtracted from the other job, and a new sequence is generated.
3.1.2. A numerical example for Aslan's frequency (dual sequencing) algorithm
Consider a flowshop with 5 machines. There are 4 jobs to be scheduled and their processing times are
as shown in Table 1.
We compare the calculated total completion times for pairs ij and ji for all of the combinations of
the jobs.
Pair | Completion time | Pair | Completion time
(1,2) | 41 | (2,1) | 42
(1,3) | 46 | (3,1) | 42
(1,4) | 41 | (4,1) | 40
(2,3) | 41 | (3,2) | 40
(2,4) | 39 | (4,2) | 39
(3,4) | 38 | (4,3) | 41
First, pairs (1,2) and (2,1) are compared. Since pair (1,2) results in a smaller partial makespan, job 1 takes the frequency value of 1 and job 2 takes the frequency value of 0. By executing these n(n − 1)/2 comparisons, we obtain the frequency values for all jobs.
As seen in Table 2, we assign a frequency value of 1 to both jobs 2 and 4 because pairs (2,4) and (4,2) both have a total completion time of 39. Then the frequency values of all jobs are summed. The frequency values for each job are obtained as follows:
Frequency for job 1: 1
Frequency for job 2: 1
Frequency for job 3: 3
Frequency for job 4: 2
Table 1
Processing times for 4 job 5 machine problem
Job Machine
M1 M2 M3 M4 M5
J1 5 9 8 10 1
J2 9 3 10 1 8
J3 9 4 5 8 6
J4 4 8 8 7 2
The frequency values of all jobs are sorted in decreasing order. The method yields the sequence 3-4-2-1 or 3-4-1-2. The makespans of these two sequences are compared, and it is found that the sequence 3-4-1-2 with Cmax = 57 is better than the sequence 3-4-2-1 with Cmax = 58.
3.1.3. Improvement phase
The job pairs (2,4) and (4,2) have equal total completion times, so the dominant pair is investigated. Because pairs (2,4) and (4,2) both have a total completion time of 39, we compare these two pairs to decide which one is dominant. For each pair, except for the first machine, we calculate how long each machine waits for the preceding machine. In Fig. 1, the numbers above each circle indicate the starting time of each job on each machine and the numbers below each circle indicate the processing time of each job on each machine. For the above example, for pair (2,4), in order to start operation, the fourth machine waits for the third machine for 7 min and the fifth machine waits for the fourth machine for 6 min, so for pair (2,4) the total delay is 6 + 7 = 13 min. For pair (4,2), in order to start operation, the fourth machine waits for the third machine for 3 min and the fifth machine waits for the fourth machine for 2 min, so the total delay is 2 + 3 = 5 min. As seen from these results, pair (4,2)
Table 2
Frequency values of each job at each comparison
Comparison | J1 | J2 | J3 | J4
1 | 1 | 0 | – | –
2 | 0 | – | 1 | –
3 | 0 | – | – | 1
4 | – | 0 | 1 | –
5 | – | 1 | – | 1
6 | – | – | 1 | 0
Fig. 1. Comparison of pair (2,4) and pair (4,2).
is dominant. So we add 1 to the frequency of the fourth job and subtract 1 from the frequency of the second job. The frequency of the fourth job becomes 3 and the frequency of the second job becomes 0.
Since the second job has the least frequency value, the second job takes the last place in the sequence.
When we review the frequency values of all jobs, we see that the third and the fourth jobs have equal frequencies, so we consider two alternative sequences, 3-4-1-2 and 4-3-1-2. Sequence 3-4-1-2 gives a total completion time of 57 and sequence 4-3-1-2 gives a total completion time of 54, so the final sequence is 4-3-1-2.
3.2. Aslan’s point algorithm
This algorithm is also developed by Aslan (1999) with the objective of minimizing makespan and
works as follows:
Step 1
Compare pair (i, j) and pair (j, i). If pair (i, j) results in a smaller completion time than pair (j, i), assign to job i a positive point equal to the difference between the makespans of (j, i) and (i, j), and assign to job j a negative point of the same magnitude.
Step 2
For all the jobs sum up the points and sequence the jobs in decreasing point order.
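A corresponding Python sketch of the point algorithm follows; it again reuses the makespan helper from Section 2, and the names are illustrative. Jobs with equal point totals may be ordered either way, as in the example of Section 3.2.1.

from itertools import combinations

def aslan_point(processing_times):
    # Aslan's point algorithm; makespan() is the helper sketched in Section 2.
    n = len(processing_times)
    points = [0] * n
    # Step 1: for each pair, credit the job that leads in the better ordering with the
    # makespan difference and debit the other job by the same amount.
    for i, j in combinations(range(n), 2):
        diff = makespan(processing_times, [j, i]) - makespan(processing_times, [i, j])
        points[i] += diff     # positive when (i, j) beats (j, i)
        points[j] -= diff
    # Step 2: sequence the jobs in decreasing point order.
    return sorted(range(n), key=lambda job: -points[job])

# For the Table 1 example the points are -4, -2, +8, -2, giving 3-2-4-1 or 3-4-2-1 (Cmax = 58).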
3.2.1. A numerical example for Aslan's point algorithm
When we consider the same problem in Table 1, we first compare pair (1,2) and pair (2,1). We see that pair (1,2) gives a smaller total completion time than pair (2,1), so we assign 42 − 41 = +1 point to job 1 and −1 point to job 2. Then we compare pair (1,3) and pair (3,1). Pair (3,1) gives a smaller completion time than pair (1,3), so we assign 46 − 42 = +4 points to job 3 and −4 points to job 1. Later, we compare pair (1,4) and pair (4,1), and give 41 − 40 = +1 point to job 4 and −1 point to job 1. We repeat this procedure for all the pairs, and the point values obtained are −4 for the first job, −2 for the second job, +8 for the third job and −2 for the fourth job. The final sequence is 3-4-2-1 or 3-2-4-1 with makespan 58.
The numerical examples presented for these two heuristic algorithms are given for demonstration purposes only; further experiments need to be performed to test the effectiveness of these new heuristics, and they should be compared with other heuristics in the literature. More information can be found in Aslan (1999).
3.2.2. The solution of the 5-job, 43-machine problem by heuristic methods
In this study, a scheduling problem in a real permutation flowshop environment is considered. In this environment, five different items are produced in batches by passing through 43 serial machines in the axle housing workshop of a manufacturing plant that supplies automotive axles and axle components. According to the production orders for 1 month and the global operation recipe, the best sequence of the five different products (big housing, light housing, small housing, additional axle, trailer axle) is found by each of the six heuristic algorithms, as shown in Table 3. For each sequence found by the six heuristic algorithms, the completion time of each job on each machine, the job waiting times and the machine idle times are computed.
4. Developing a neural network by using backpack neural network system
To develop a neural network for this problem, a neural network software tool which uses the backpropagation training algorithm is employed. The backpropagation network is one of the most widely used network architectures because of its ability to learn complex mappings and its strong foundation. Backpropagation is a systematic way of training a multilayer artificial neural network in a supervised manner. It involves two phases of computation: a training (feed-forward) phase and a backward (recall) phase. In the training phase, the network learns the relationship between inputs and outputs by applying input vectors to the nodes of the network. In the backward (recall) phase, the network predicts outputs when exposed to unseen examples or new inputs (Sabuncuoglu, 1998).
As a first step, the data are read into the system. The job and machine numbers, the processing times of each job on each machine, the job waiting times and the machine idle times are created in Excel and saved as a dBASE file, the native database format used by the Backpack Neural Network System. To preprocess the data, one-of-N transformations of the job and machine numbers are created, and a method is proposed to determine the fuzzy completion times, job waiting times and machine idle times. Triangular fuzzy numbers are used to represent the fuzzy completion times, while job waiting and machine idle times are represented by triangular or trapezoidal fuzzy numbers. The input variables are scaled to lie in the range 0–1 and the output variables in the range 0.2–0.8. This approach reduces the training time by eliminating the possibility of reaching the saturation regions of the sigmoid transfer function during training.
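The exact preprocessing routines of the Backpack software are not documented here, so the following Python sketch only illustrates the two transformations described above: a one-of-N encoding of the categorical job and machine numbers and min-max scaling of the inputs to 0–1 and of the target completion times to 0.2–0.8. All function names are ours.

import numpy as np

def one_of_n(index, n):
    # One-of-N (one-hot) vector for a categorical job or machine number (0-based index).
    v = np.zeros(n)
    v[index] = 1.0
    return v

def scale_input(x, lo, hi):
    # Min-max scale a numeric input feature to the range 0-1.
    return (x - lo) / (hi - lo)

def scale_output(y, lo, hi):
    # Scale the target completion time to 0.2-0.8 so the sigmoid output
    # is never pushed into its saturation regions during training.
    return 0.2 + 0.6 * (y - lo) / (hi - lo)

def unscale_output(y_scaled, lo, hi):
    # Map a network output back to the original completion-time units.
    return lo + (hi - lo) * (y_scaled - 0.2) / 0.6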
4.1. Determining the fuzzy membership function for completion times
In job sequencing for a flowshop, processing times are frequently not known exactly and only
estimated intervals are given. Fuzzy numbers are ideally suited to represent these intervals (McCahon &
Lee, 1992). Processing times are composed of operation times. In most situations, operation times are calculated with a 5 or 10% tolerance via time studies. So, in this study, we assume that there can be deviations in the completion times, which are built up from the processing times. Instead of representing the job processing times with fuzzy numbers, the completion times are represented by triangular fuzzy numbers. The graphical representation of a triangular fuzzy number is shown in Fig. 2. For each heuristic method, each job's completion time on the last machine is taken as the most likely time and, by adding and subtracting a 5% tolerance to and from this time, the pessimistic
Table 3
The best sequence and makespans for each heuristic algorithm
Algorithm Sequence Makespan
CDS 3-2-1-4-5 32,759.617
NEH 5-2-1-3-4 31,952.35
Koulamas’s 2-1-4-3-5 33,327.468
Aslan’s frequency 3-5-2-4-1 32,754.271
Aslan’s point 2-3-5-4-1 32,242.017
Aksoy’s 3-2-4-1-5 34,101.504
time and the optimistic time for carrying out the last operation of each job are found. These three values become the parameter set of the fuzzy set defined for each job. According to these parameters, the membership functions are developed.
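A small Python sketch of this fuzzification step follows, under the assumption that the 5% tolerance is applied multiplicatively to the most likely completion time; the function names are illustrative.

def fuzzy_completion_parameters(last_machine_completion_time, tolerance=0.05):
    # Parameter set (optimistic, most likely, pessimistic) of the triangular fuzzy
    # completion time, assuming the 5% tolerance is applied multiplicatively.
    b = last_machine_completion_time            # most likely value
    a = b * (1 - tolerance)                     # optimistic value
    c = b * (1 + tolerance)                     # pessimistic value
    return a, b, c

def triangular_membership(x, a, b, c):
    # Membership value mu(x) of the triangular fuzzy number (a, b, c) of Fig. 2.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)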
4.2. Determining the fuzzy membership functions for job waiting and machine idle times
In this study, to represent fuzzy job waiting and fuzzy machine idle times, trapezoidal fuzzy numbers
are employed. A trapezoidal fuzzy number is represented by (a, b, c, d). The membership function equals 1 between b and c, and becomes zero at the two end points, a and d. The graphical representation of a trapezoidal fuzzy number is illustrated in Fig. 3, where μ(x) is the membership function and x is either the job waiting or the machine idle time (McCahon & Lee, 1992).
For example, a manager may say that the job waiting or machine idle time for job A is generally 'b' to 'c' minutes. But due to other factors which cannot be controlled or predicted, the job waiting or machine idle time may occasionally be as high as 'd' minutes or as low as 'a' minutes. We propose a new method to represent the job waiting and machine idle times. For each job, the job waiting times are summed cumulatively, and for each machine, the machine idle times are summed cumulatively. We assume that 'a' is the minimum job waiting or machine idle time and 'd' is the total job waiting or total machine idle time, which is the maximum value; 'b' is the first value of the cumulative total and 'c' is the cumulative value just before the total. If only one machine waits, or only one job has a waiting time, a triangular membership function is used with parameters a = 0, b = the machine idle or job waiting time, and c = 0.
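The following Python sketch reflects our reading of this cumulative-sum rule; it assumes at least two waiting or idle observations per job or machine (the single-observation triangular case mentioned above is not handled) and the names are illustrative.

def trapezoidal_parameters(times):
    # Derive (a, b, c, d) from a job's waiting times (or a machine's idle times),
    # following the cumulative-sum rule described above; assumes len(times) >= 2.
    cumulative, total = [], 0.0
    for t in times:
        total += t
        cumulative.append(total)
    a = min(times)           # smallest single waiting/idle time
    b = cumulative[0]        # first value of the cumulative total
    c = cumulative[-2]       # cumulative value just before the total
    d = cumulative[-1]       # total waiting/idle time (the maximum)
    return a, b, c, d

def trapezoidal_membership(x, a, b, c, d):
    # Membership value mu(x) of the trapezoidal fuzzy number (a, b, c, d) of Fig. 3.
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a) if b > a else 1.0
    if x <= c:
        return 1.0
    return (d - x) / (d - c) if d > c else 1.0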
The output variable for the model is chosen as the completion time. The activity of each hidden
unit is determined by the activities of the input units and the weights on the connections between
the input and hidden units. A three layer backpropagation network is developed for each heuristic
algorithm. Three layered networks can be taught to perform a particular task or to learn a
particular mapping as follows: first, the network is presented with training examples, which
consist of patterns of activities for the input units as well as the desired activity for the output unit.
Fig. 2. The graphical representation of a triangular fuzzy number.
Fig. 3. Trapezoidal fuzzy number.
D.E. Akyol / Computers & Industrial Engineering 46 (2004) 679–696 687
Then, a determination is made of how close the actual output of the neural network is to the
desired output. The difference between the desired and the actual output is used as an error signal
to adjust the connection weights.
In this study, to perform the neural network analysis, the data are split into three datasets: the training set, the test set and the validation set. The training set contains the observations used to train the network. The test set contains the observations used to test the neural network and to determine when training should be stopped. The validation set contains observations that the neural network has not seen; it is applied to the trained neural network to assess its performance under real-world conditions. The Split Data function of the software package develops uniform distributions of observations for the train and test data sets. This allows the training (using the train and test sets) to ensure that all features of the distribution are learned. The validation set follows the same distribution as the base data set.
To specify the data sets, the output variable (completion time) is selected to divide the base data set into groups. For each heuristic, the Develop Data Sets function pulls stratified samples from the base data set, and the number of groups (the number of strata) used to stratify the data sets is chosen between 3 and 6. For each number of strata, the data set percentages, which set the share of the controlling group (the group or bin with the fewest observations; this group sets the number of observations to be randomly selected from the other groups in order to achieve the desired distributions) assigned to the train, test and validate data sets, are chosen as 60% for the train, 20% for the test and 20% for the validate data set. After specifying the data sets, the training process can begin.
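The Split Data and Develop Data Sets functions belong to the commercial software and are not reproduced here, so the following Python sketch is only a rough equivalent: it stratifies the observations into groups by the output value and then draws roughly 60/20/20 train, test and validate samples from each stratum. It does not reproduce the controlling-group mechanism exactly, and the names are ours.

import numpy as np

def stratified_split(y, n_groups=4, fractions=(0.6, 0.2, 0.2), seed=0):
    # Bin the observations by the output value (completion time) into n_groups strata,
    # then draw roughly 60/20/20 train / test / validate samples from every stratum.
    rng = np.random.default_rng(seed)
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_groups + 1)[1:-1])
    strata = np.digitize(y, edges)
    train, test, validate = [], [], []
    for g in np.unique(strata):
        idx = np.where(strata == g)[0]
        rng.shuffle(idx)
        n_train = int(fractions[0] * len(idx))
        n_test = int(fractions[1] * len(idx))
        train += list(idx[:n_train])
        test += list(idx[n_train:n_train + n_test])
        validate += list(idx[n_train + n_test:])
    return train, test, validate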
During the training of the neural network, the data in the training set will be thoroughly examined
and the network will learn or generalize the relationship between the dependent (each variable being
predicted) and independent variables (each input variable) so that the trained network can be used to
give us estimates or answers given new data cases. At intervals, under the control of the network
development system (program), the partially trained network will be presented with the independent
variables in the test set and will make predictions of the dependent variable, which is known.
The goodness of fit of the predictions will be measured. Based upon the fit, modifications to the
network might be made and training continued using the training set. For each heuristic and for different numbers of strata, different data sets generated by splitting the data are used. The numbers of train and test observations for each heuristic for different numbers of groups are given in Tables 4–9. These tables show the values belonging to the networks resulting in the smallest root mean square errors.
Table 4
Measurement of fit for the test set for CDS algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
CDS | 6 | 57 | 23 | 0.0966 | 0.0966 | 13,000 | 30 | 12 | 6
CDS | 5 | 103 | 27 | 0.0379 | 0.0377 | 14,200 | 30 | 12 | 10
CDS | 4 | 71 | 32 | 0.0671 | 0.0652 | 6800 | 30 | 71 | 6
CDS | 3 | 103 | 40 | 0.0681 | 0.0674 | 8400 | 30 | 103 | 6
Table 5
Measurement of fit for the test set for NEH algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
NEH | 6 | 48 | 11 | 0.0298 | 0.0282 | 8000 | 30 | 48 | 6
NEH | 5 | 68 | 18 | 0.0714 | 0.0714 | 62,000 | 30 | 68 | 6
NEH | 4 | 73 | 16 | 0.0642 | 0.063 | 7600 | 30 | 12 | 6
NEH | 3 | 86 | 29 | 0.0611 | 0.0606 | 9000 | 30 | 80 | 6
Table 6
Measurement of fit for the test set for Koulamas’ algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Koulamas | 6 | 80 | 31 | 0.0507 | 0.0502 | 15,000 | 30 | 12 | 6
Koulamas | 5 | 89 | 21 | 0.0488 | 0.0410 | 6600 | 30 | 89 | 6
Koulamas | 4 | 94 | 32 | 0.0445 | 0.0444 | 25,000 | 30 | 94 | 6
Koulamas | 3 | 114 | 37 | 0.0285 | 0.0269 | 9000 | 30 | 114 | 6
Table 7
Measurement of fit for the test set for Aslan’s frequency algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Frequency | 6 | 59 | 18 | 0.0532 | 0.0532 | 15,000 | 30 | 15 | 6
Frequency | 5 | 65 | 16 | 0.0965 | 0.0936 | 6800 | 30 | 12 | 6
Frequency | 4 | 58 | 16 | 0.0996 | 0.0738 | 6200 | 30 | 58 | 6
Frequency | 3 | 78 | 26 | 0.0761 | 0.0760 | 18,200 | 30 | 12 | 6
Table 8
Measurement of fit for the test set for Aslan’s point algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Point | 6 | 57 | 23 | 0.0835 | 0.0834 | 10,800 | 30 | 57 | 6
Point | 5 | 72 | 21 | 0.0618 | 0.0615 | 15,800 | 30 | 12 | 6
Point | 4 | 93 | 33 | 0.0651 | 0.0608 | 8200 | 30 | 93 | 6
Point | 3 | 95 | 32 | 0.0637 | 0.0619 | 7000 | 30 | 95 | 6
4.3. Network architecture
A backpropagation neural network is adopted in this study, in which signals are passed from the input
layer to the output layer through a hidden layer and learning is done by adjusting the connection weights
by a gradient descent algorithm that involves backpropagating the error to previous layers.
A difficult task with ANNs involves choosing the architecture parameters of the network. Although the
neural network has the possibility of solving various scheduling problems, how to specify the values of
many parameters and weights of these networks remains a critical issue (Raaymakers & Weijters, 2003).
At present, there is no established theoretical method to determine the optimal configuration of a
network. Most of the design parameters are application dependent and must be determined empirically.
A feedforward ANN with a single hidden layer is employed in this work. There are no constraints on the number of hidden layers; a network can have only one, or as many hidden layers as selected. However, there is no evidence that a network with more hidden layers performs better (Dagli, 1994). Patterson (1996) indicated that a single hidden layer would be sufficient for most applications (Shiue & Su, 2002). The need for more than one hidden layer is highly unlikely, because networks with additional layers require significantly longer training times, even though networks with larger hidden layers distribute the weights over the layers and may provide better performance. In the light of this information, and with the aim of not increasing the training times, a network with a single hidden layer is chosen for implementation in this study. The number of hidden layer nodes is also among the most important considerations when
solving problems using multilayered feedforward neural networks. An insufficient number of hidden
layer neurons generally results in the network’s inability to solve a particular problem, while too many
hidden layer neurons may result in a network with poor generalization performance and can lead to
over-fitting (Liu, Chang, & Zhang, 2002). The number of hidden layer nodes for our problem is
determined by a trial and error procedure, in which various architectures are constructed by changing the
number of nodes in the hidden layer. The networks used to model the different heuristic algorithms are trained with 4, 6, 10, 15, 20, 25 and 30 hidden nodes, with epoch sizes of 12 (the software default), 15, the entire set of training observations, and 50 and 80 (for models with more than 50 training observations), for different numbers of groups. All network architectures are compared and, for each heuristic algorithm and each number of groups, the architecture with the minimum root mean square error (RMSE) is selected as the best one. Tables 4–9 show the epoch sizes and numbers of hidden layer nodes that result in the smallest RMS errors. The number of output nodes corresponds to the number of outputs in the model; since this is a prediction problem, it is 1. Because we are dealing with a prediction problem,
Table 9
Measurement of fit for the test set for Aksoy’s algorithm
Heuristic | No. of groups | Train obs. read | Test obs. read | Test set RMSE | Low test RMSE | No. of epochs | No. of failed tries | Epoch size | No. of hid. layer nodes
Aksoy | 6 | 63 | 21 | 0.1034 | 0.1034 | 36,600 | 30 | 63 | 6
Aksoy | 5 | 104 | 31 | 0.0369 | 0.0366 | 9800 | 30 | 104 | 6
Aksoy | 4 | 69 | 23 | 0.0846 | 0.0836 | 6400 | 30 | 69 | 6
Aksoy | 3 | 106 | 41 | 0.0254 | 0.0254 | 65,000 | 3 | 12 | 10
the RMS error is selected as the training criterion. It is a quantitative measure that reflects the degree of learning taking place in a network; as a network learns, its root mean square error decreases. Every network is trained for up to 65,000 epochs, where each epoch is defined by the training observations presented to the network. Although the maximum number of epochs to execute before terminating the training session is defined as 65,000, the training may end before the maximum is reached if the performance on the test set does not improve. Another important point is the initialization of the network's weights, which is done with random values in order to break the symmetry among the hidden nodes. In this study, the initial weights of the backpropagation network are randomly set between −0.2 and +0.2, and 215 observations collected from the axle housing workshop during the previous month are used to generate the database.
One of the problems of the gradient descent rule which is used as a learning procedure for
backpropagation networks is setting an appropriate learning rate (Shiue & Su, 2002). The learning rate
determines the size of the steps in the search space to find the minimal training error (Raaymakers &
Weijters, 2003). A small learning rate results in longer learning times while a large learning rate causes
oscillations during the training of the network. One efficient and commonly used procedure that allows a
greater learning rate without causing divergent oscillations is the addition of a momentum term to the
gradient descent method. As with the learning rate, if the momentum term is too large, the network will
display a chaotic learning behavior. Because learning time is not the issue in this study, both learning
parameters are chosen relatively small at the beginning. The starting values of the learning rate and the momentum coefficient are both set to 0.2. The increase in the learning rate is defined as 0.095 and the decrease as 0.1: the increase is the amount added to the learning rate when the current weight change is in the same direction (same sign) as the prior weight change, and the decrease is the percentage by which the learning rate is reduced when the current weight change is in the opposite direction (different sign) from the prior weight change. The increase in the momentum value is defined as 0.05 and the decrease in the momentum term as 0.1.
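For readers without access to the Backpack software, the following condensed Python sketch shows the kind of training loop described above: a single hidden layer with sigmoid units, weights initialized between −0.2 and +0.2, and gradient descent with momentum on the squared error. The bias terms, the per-weight adaptive learning-rate rule and the test-set early stopping are omitted for brevity, so this is an illustration rather than the software's actual algorithm; all names are ours.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=6, lr=0.2, momentum=0.2, epochs=5000, seed=0):
    # Single-hidden-layer MLP trained by batch gradient descent with momentum on the
    # squared error (the study trained for up to 65,000 epochs with early stopping).
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-0.2, 0.2, (X.shape[1], n_hidden))   # initial weights in -0.2..+0.2
    W2 = rng.uniform(-0.2, 0.2, (n_hidden, 1))
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    t = y.reshape(-1, 1)                                   # targets scaled to 0.2-0.8
    for _ in range(epochs):
        H = sigmoid(X @ W1)                                # forward pass: hidden layer
        out = sigmoid(H @ W2)                              # forward pass: output layer
        err = out - t
        delta_out = err * out * (1 - out)                  # backpropagated error signals
        delta_hid = (delta_out @ W2.T) * H * (1 - H)
        dW2 = momentum * dW2 - lr * (H.T @ delta_out) / len(X)   # momentum updates
        dW1 = momentum * dW1 - lr * (X.T @ delta_hid) / len(X)
        W2 += dW2
        W1 += dW1
    rmse = float(np.sqrt(np.mean(err ** 2)))               # RMSE, the training criterion
    return W1, W2, rmse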
5. Experimental results
In order to evaluate the performance of the ANN models, the validation dataset is applied to each trained model and the prediction accuracy of the model is investigated by comparing the fit statistics for each heuristic with different numbers of strata. According to the values of the correlation coefficient at different levels of significance, the fit statistics confirmed the validity of the models. After obtaining
Table 10
The summary of the results for each heuristic algorithm with the highest correlation coefficient
Heuristic | No. of groups | Corr. coef. (R²) | RMSE | MAPE | MAE
CDS | 5 | 0.9882 | 0.0250 | 10.74 | 733.9157
NEH | 3 | 0.9622 | 0.0446 | 25.15 | 1290.624
Koulamas | 4 | 0.9863 | 0.0280 | 7.70 | 762.4662
Frequency | 6 | 0.9339 | 0.0563 | 30.74 | 1705.223
Point | 3 | 0.9635 | 0.0431 | 15.60 | 1168.647
Aksoy | 3 | 0.9909 | 0.0224 | 9.00 | 609.6883
satisfactory performance results, it is decided to use the networks to predict the completion times of each job on each machine for the next production period. Since the correlation coefficient is a measure of the linear relationship between two variables, for each heuristic the number of strata that gives the highest correlation coefficient is chosen as the best result. The summary of the results for each heuristic algorithm with the highest correlation coefficient is given in Table 10.
After the determination of the best number of strata for each heuristic resulting in smallest root mean
squared error, the predicted and actual completion times are compared. The fit statistics are obtained
from the model and fit graphs are drawn showing the comparison between the actual and the predicted
completion times. The fit graphs are given in Figs. 4–9.
The last values (makespans) of the predicted and actual completion times (belonging to the last
machine) for each heuristic are illustrated in Table 11.
Although there are differences between the actual and predicted completion times, as shown in Figs. 4–9, the results indicate that the models of the different heuristic scheduling algorithms predicted the completion times with correlation coefficients between 0.9339 and 0.9909. The fact that all values are very close to unity indicates that the mapping was performed at a satisfactory level even when fuzzy
Fig. 4. Actual versus predicted completion times for CDS algorithm.
Fig. 5. Actual versus predicted completion times for NEH algorithm.
Fig. 6. Actual versus predicted completion times for Koulamas’ algorithm.
Fig. 7. Actual versus predicted completion times for Aslan’s frequency algorithm.
Fig. 8. Actual versus predicted completion times for Aslan’s point algorithm.
information was present. If we calculate the ratio between the predicted and the actual makespans, we can see that all of the algorithms are successful in predicting the makespans. The decision maker can choose any one of these algorithms, but to specify due dates in agreement with the customer, taking the average of these ratios, which is equal to 1.00062, is offered as an alternative way to estimate the makespans of the jobs at hand. In this way, a better production plan that meets the customers' due dates can be made.
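For reference, the averaged correction factor can be reproduced directly from the ratios listed in Table 11:

# Predicted/actual makespan ratios from Table 11; their mean is the suggested correction factor.
ratios = [1.01729, 0.99604, 1.02436, 0.99987, 0.97958, 0.98658]
print(round(sum(ratios) / len(ratios), 5))   # 1.00062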
It is to be noted that, identifying an appropriate ANN model has a strong impact on the performance of
the networks in predicting the makespans. For example, the effect of different values of epoch sizes is
shown in the tables below. As shown in Table 12, training the network modeling Aslan’s frequency
algorithm with epoch size of 59 gives better results than training the network with epoch size of 12,
Fig. 9. Actual versus predicted completion times for Aksoy’s algorithm.
Table 11
The actual and predicted makespans for each heuristic algorithm
Heuristic Actual Predicted Predicted/actual
CDS 32,024.998 32,578.713 1.01729
NEH 31,952.000 31,825.555 0.99604
Koulamas 33,327.000 34,139.152 1.02436
Aslan’s frequency 32,753.998 32,750 0.99987
Aslan’s point 32,242.000 31,583.658 0.97958
Aksoy 32,435.000 31,999.766 0.98658
Table 12
Effect of epoch size on the performance of the network modeling Aslan’s frequency algorithm
Epoch size No. of hidden nodes Train obs. read Test obs. read Test set RMSE Corr. coef. Predicted makespan
12 6 59 18 0.0596 0.8843 22,766.303
59 6 59 18 0.0534 0.8902 29,919.959
15 6 86 21 0.0532 0.9339 32,750
15 6 59 18 0.0597 0.8934 29,194.941
but setting the epoch size to 15 and increasing the number of training observations (by reallocating the
observations in each group) gives the best result. Table 13 also shows the effect of different values of
epoch sizes on the performance of the network modeling Aslan’s point algorithm. Training the network
with the entire training set (95 observations) gives the best result in predicting the makespan. Because the
results showed good prediction ability, no more training observations are needed to train the network.
6. Conclusions
Backpropagation networks have been successfully utilized for various complex and difficult
scheduling problems by many researchers. The main objective of this study is to introduce a new and
alternative approach of using a neural network for the estimation of the makespans of flowshops. Rather
than determining the optimal sequences directly by the network, backpropagation networks have been
applied to predict the completion times of five jobs planned to be produced for the next production
period. Once the neural network results modeling the six heuristic algorithms are obtained, they are compared with the actual results to show the feasibility of using neural networks as an implementation alternative to the existing heuristic scheduling algorithms. The trained models' predictions were in good agreement with the actual values, producing R² values between 93.39 and 99.09%. These results show that approximately 93.39–99.09% of the variation in the dependent variables
(output parameters) can be explained by the independent variables (input parameters) selected and the
data set used.
This study shows the applicability of artificial neural networks to real scheduling problems.
Manufacturing plants having a permutation flowshop environment can profit from the model obtained in
this study with their data.
A future direction of this study is to employ additional methods that will improve the accuracy of the results. In this respect, decreasing the mean percentage errors and finding a closer match between the predicted and actual values seem to be promising directions. Different neural network architectures may
be used to predict the makespan for planning the production schedules. Extensions of the proposed
method, involving different neural network architectures may be developed to solve complex scheduling
problems having different performance measures.
References
Aksoy, M. (1980). Duzgun sıralı is yerlerindeki statik sıralama problemlerinde toplam is akıs suresinin enazlanması icin yeni
bir Sezgisel Yontem. Yoneylem Aras., Bildiriler’80, 255–268.
Table 13
Effect of epoch size on the performance of the network modeling Aslan’s point algorithm
Epoch size No. of hidden nodes Train obs. read Test obs. read Test set RMSE Corr. coef. Predicted makespan
12 6 95 32 0.0752 0.9489 29,076.232
95 6 95 32 0.0637 0.9635 31,583.658
15 6 95 32 0.0713 0.9595 33,147.738
Aslan, D. (1999). Model development and application based on object oriented neural network for scheduling problem.
DEU Res. Fund. Proj., Nr. 0908.97.07.01. University of Dokuz Eylul, Izmir.
Baker, K. R. (1974). Introduction to sequencing and scheduling. New York: Wiley.
Campbell, H. G., Dudek, R. A., & Smith, M. L. (1970). A heuristic algorithm for the n-job, m-machine sequencing problem.
Management Science, 16, 630–637.
Dagli, C. H. (1994). Artificial neural networks for intelligent manufacturing. London: Chapman and Hall.
El-Bouri, A., Balakrishnan, S., & Pooplewell, N. (2000). Sequencing jobs on a single machine: A neural network approach.
European Journal of Operational Research, 126, 474–490.
Feng, S., Li, L., Cen, L., & Huang, J. (2003). Using MLP networks to design a production scheduling system. Computers and
Operations Research, 30, 821–832.
Ho, J. C., & Chang, Y. L. (1991). A new heuristic for the n-job, M-machine flow-shop problem. European Journal of
Operational Research, 52, 194–202.
Jain, A. S., & Meeran, S. (1998). Job shop scheduling using neural networks. International Journal of Production Research, 36,
1249–1272.
Jain, A. S., & Meeran, S. (2002). A multi-level hybrid framework applied to the general flow-shop scheduling problem.
Computers and Operations Research, 29, 1873–1901.
Koulamas, C. (1998). A new constructive heuristic for the flowshop scheduling problem. European Journal of Operational
Research, 105, 66–71.
Lee, I., & Shaw, M. J. (2000). A neural-net approach to real time flow-shop sequencing. Computers and Industrial Engineering,
38, 125–147.
Liu, D., Chang, T. S., & Zhang, Y. (2002). A constructive algorithm for feedforward neural networks with incremental training.
IEEE Transactions on Circuits and Systems—I: Fundamental and Applications, 49, 1876–1879.
McCahon, C. S., & Lee, E. S. (1992). Fuzzy job sequencing for a flowshop. European Journal of Operational Research, 62,
294–301.
Nawaz, M., Enscore, E., & Ham, I. (1983). A heuristic algorithm for the n-job, m-machine flowshop sequencing problem.
Omega, 11, 91–95.
Nowicki, E. (1999). The permutation flow shop with buffers: A tabu search approach. European Journal of Operational
Research, 116, 205–219.
Patterson, D. W. (1996). Artificial neural networks: Theory and applications. Singapore: Prentice-Hall.
Raaymakers, W. H. M., & Weijters, A. J. M. M. (2003). Makespan estimation in batch process industries: A comparison
between regression analysis and neural networks. European Journal of Operational Research, 145, 14–30.
Rajendran, C., & Chaudri, D. (1991). An efficient heuristic approach to the scheduling of jobs in a flowshop. European Journal
of Operational Research, 61, 318–325.
Sabuncuoglu, I. (1998). Scheduling with neural networks: A review of the literature and new research directions. Production
Planning and Control, 9, 2–12.
Sabuncuoglu, I., & Gurgun, B. (1996). A neural network model for scheduling problems. European Journal of Operational
Research, 93, 288–299.
Shiue, Y. R., & Su, C. T. (2002). Attribute selection for neural network based adaptive scheduling systems in flexible
manufacturing systems. International Journal of Advanced Manufacturing Technology, 20, 532–544.
Smith, K. (1999). Neural networks for combinatorial optimization: A review of more than a decade research. Informs Journal
on Computing, 11, 15–34.