

POLISH-BRITISH WORKSHOP

COMPUTER SYSTEMS

ENGINEERING

THEORY & APPLICATIONS

Editors: Keith J. BURNHAM Leszek KOSZALKA

Radoslaw RUDEK Piotr SKWORCOW

Organised jointly by:

• Control Theory and Applications Centre, Coventry University, UK • Chair of Systems and Computer Networks, Wroclaw University of

Technology, Poland with support from the IET Control and Automation Professional Network


Reviewers:

Keith J. BURNHAM Arkadiusz GRZYBOWSKI

Adam JANIAK Andrzej KASPRZAK Leszek KOSZALKA

Jens G. LINDEN Marcin MARKOWSKI

Iwona POZNIAK-KOSZALKA Przemyslaw RYBA Henry SELVARAJ Ventzeslav VALEV

Benoit VINSONNEAU Krzysztof WALKOWIAK

Cover page designer: Aleksandra de’Ville

Typesetting: Camera-ready by authors

Printed by: Drukarnia Oficyny Wydawniczej Politechniki Wrocławskiej, Wrocław 2011

Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland

ISBN 978-83-911675-9-5


POLISH-BRITISH WORKSHOP was held in

Sokołowsko, Poland, May/June 2008 and

Czarna Gora, Poland, June 2009

International Steering Committee:
Keith J. BURNHAM (the United Kingdom)

Andrzej KASPRZAK (Poland) Henry SELVARAJ (the United States)

Leszek KOSZALKA (Poland) Jens G. LINDEN (Germany)

Pawel PODSIADLO (Australia) Iwona POZNIAK-KOSZALKA (Poland)

Radu-Emil PRECUP (Romania) Radoslaw RUDEK (Poland)

Piotr SKWORCOW (the United Kingdom) Gwidon STACHOWIAK (Australia)

Jan STEFAN (Czech Republic) Ventzeslav VALEV (Bulgaria)

Local Organizing Committee 2008:

Dariusz JANKOWSKI Katarzyna WALEWSKA

Marcin BAZYLUK Krzysztof KWIECIEŃ

2009:
Pawel BOGALINSKI Wojciech KMIECIK

Daniel KOWALCZYK Urszula MOSKAL

Conference Proceedings Editors:
Keith J. BURNHAM – editor
Leszek KOSZALKA – editor
Piotr SKWORCOW – editor
Radoslaw RUDEK – editor
Jens G. LINDEN – co-editor
Iwona POZNIAK-KOSZALKA – co-editor


IET Control & Automation Professional Network

The IET Control and Automation Professional Network is a global network run by, and on behalf of, professionals in the control and automation sector with the specialist and technical backup of the IET. Primarily concerned with the design, implementation, construction, development, analysis and understanding of control and automation systems, the network provides a resource for everyone involved in this area and facilitates the exchange of information on a global scale. This is achieved by undertaking a range of activities, including a website with a range of services such as an events calendar and a searchable online library, face-to-face networking at events, and working relationships with other organisations.

For more information on the network and how to join visit http://www.theiet.org/.


Preface It is with great pleasure that we as Editors write the preface of the Proceedings for the eighth and ninth Polish-British Workshops on Computer Systems Engineering: Theory and Applications, organized jointly by the Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland and the Control Theory and Applications Centre, Coventry University, Coventry, UK. The Workshops took place in Sokolowsko (2008) and Czarna Gora (2009), with a number of papers presented by young researchers and engineers. The theme of the Workshops was focused on solving complex scientific and engineering problems in a wide area encompassing computer science, control engineering, information and communication technologies and operational research.

Due to increasing populations worldwide and being driven by the scientific and technological developments, the systems that enable and/or enhance our day-to-day activities are becoming increasingly complex. To ensure sustainability of these systems, and indeed of our modern societies, there is an urgent need to solve many scientific and engineering problems. As a result, dealing with modelling, optimisation and control of complex large-scale systems, uncertain data, computational complexity as well as the need for high-speed communication, have all become of significant importance. The problems addressed and the solutions proposed in the papers presented at the Workshops and included in the Proceedings are closely linked to the issues currently faced by our society, such as efficient utilisation of energy and resources, design and operation of communication networks, modelling and control of complex dynamical systems and handling the complexity of information. We hope that these Proceedings will be of value to those researching in the relevant areas and that the material will inspire prospective researchers to become interested in seeking solutions to complex scientific and engineering problems.


The Polish-British Workshops have now become a traditional and integral part of the long-lasting collaboration between Wroclaw University of Technology and Coventry University, with the Workshops taking place every year since 2001. The Workshops bring together young researchers from different backgrounds and at different stages of their career, including undergraduate and MSc students, PhD students and post-doctoral researchers. It is a truly fantastic and quite unique opportunity for early-stage researchers to share their ideas and learn from the experience of others, to become inspired by the work carried out by their elder colleagues and to receive valuable feedback concerning their work from accomplished researchers, all in a pleasant and friendly environment surrounded by the picturesque mountains of Lower Silesia. None of this, however, would be possible without the continued efforts and commitment of the Polish-British Workshop founders: Dr Iwona Pozniak-Koszalka, Dr Leszek Koszalka and Prof. Keith J. Burnham. On behalf of all researchers who have attended the Polish-British Workshop series, including ourselves, we would like to express our sincere gratitude for making the Workshop series such a tremendous success, for sharing with others their extensive knowledge and experience, and for providing valuable guidance related to career and life choices faced by young researchers at this crucial stage of their careers. Dr Piotr Skworcow, Water Software Systems, De Montfort University, Leicester, UK and Dr Radoslaw Rudek, Department of Information Technology, Wroclaw University of Economics, Poland. Editors of the Proceedings and Members of the International Steering Committees for the Polish-British Workshops, 2008 and 2009.


Contents

M. BAZYLUK, L. KOSZAŁKA, K. J. BURNHAM, R. RUDEK

DETERMINING THE INPUT BIAS ON EFFICIENCY OF METAHEURISTIC ALGORITHMS FOR A PARALLEL MACHINE SCHEDULING PROBLEM

9

P. BOGALINSKI, I. POŹNIAK-KOSZAŁKA, L. KOSZAŁKA, P. SKWORCOW

THE TWO DIMENSIONAL IRREGULAR-SHAPED NESTING PROBLEM

29

B. CZAJKA, I. POŹNIAK-KOSZAŁKA

SCHEDULING IN MULTI-PROCESSOR COMPUTER SYSTEMS – VARIOUS SCENARIO SIMULATIONS

36

T. DANNE, J. G. LINDEN, D. HILL, K. J. BURNHAM

MODELLING APPROACHES FOR A HEATING, VENTILATION AND AIR CONDITIONING SYSTEM

45

V. ERSANILLI, K. J. BURNHAM

A CONTINUOUS-TIME MODEL-BASED TYRE FAULT DETECTION ALGORITHM UTILISING AN UNKNOWN INPUT OBSERVER

59

J. GŁADYSZ, K. WALKOWIAK

THE HEURISTIC ALGORITHM BASED ON FLOW DEVIATION METHOD FOR SIMULTANEOUSLY UNICAST AND ANYCAST ROUTING IN CFA PROBLEM

73

T. KACPRZAK, L. KOSZAŁKA

COMPARISON OF ACTION CHOOSING SCHEMES FOR Q-LEARNING

81

T. LARKOWSKI, J. G. LINDEN, K. J. BURNHAM

RECURSIVE BIAS-ELIMINATING LEAST SQUARES ALGORITHM FOR BILINEAR SYSTEMS

90

K. LENARSKI, A. KASPRZAK, P. SKWORCOW

ADVANCED TABU SEARCH STRATEGIES FOR TWO-LAYER NETWORK DIMENSIONING PROBLEM

104

M. KUCHARZAK, L. KOSZAŁKA, A. KASPRZAK

OPTIMIZATION ALGORITHMS FOR TWO LAYER NETWORK DIMENSIONING

116


D. PICHEN, I. POŹNIAK-KOSZAŁKA

A NEW TREND IN SOFTWARE DEVELOPMENT PROCESS USING DATABASE SYSTEMS

129

A. SMUTNICKI, K. WALKOWIAK

AN ALGORITHM FOR UNRESTORABLE FLOW OPTIMISATION PROBLEM USING P-CYCLES PROTECTION SCHEME

140

M. SUMISŁAWSKA, M. GARYCKI, L. KOSZAŁKA, K. J. BURNHAM, A. KASPRZAK

EFFICIENCY OF ALLOCATION ALGORITHMS IN MESH ORIENTED STRUCTURES DEPENDING ON PROBABILITY DISTRIBUTION OF THE DIMENSIONS OF INCOMING TASKS

159

B. TOKARSKI, L. KOSZAŁKA, P. SKWORCOW

SIMULATION BASED PERFORMANCE ANALYSIS OF ETHERNET MPI CLUSTER

172

M. ŻACZEK, M. WOŹNIAK

APPLYING DATA MINING TO NETWORK INTRUSION DETECTION

182

I. ZAJIC, K. J. BURNHAM

EXTENSION OF GENERALISED PREDICTIVE CONTROL TO HANDLE SISO BILINEAR SYSTEMS

188

T. CZYŻ, R. RUDEK

SCHEDULING JOBS ON AN ADAPTIVE PROCESSOR

200

W. KMIECIK, M. WÓJCIKOWSKI, A. KASPRZAK, L. KOSZAŁKA

TASK ALLOCATION IN MESH CONNECTED PROCESSORS USING LOCAL SEARCH METAHEURISTIC ALGORITHM

208

R. ŁYSIAK, I. POŹNIAK-KOSZAŁKA, L. KOSZAŁKA

ARTIFICIAL NEURAL NETWORK FOR IMPROVEMENT OF TASK ALLOCATION IN MESH-CONNECTED PROCESSORS

220

M. SUMISLAWSKA, P.J. REEVE, K. J. BURNHAM, I. POŹNIAK-KOSZAŁKA, G. HEARNS

COMPUTER CONTROL ALGORITHM SIMULATION AND DEVELOPMENT WITH INDUSTRIAL APPLICATION

230

I. ZAJIC, K. J. BURNHAM, T. LARKOWSKI, D. HILL

DEHUMIDIFICATION UNIT CONTROL OPTIMISATION

240


Computer Systems Engineering 2008

Keywords: scheduling, parallel machines, total tardiness, simulated annealing, tabu search, genetic algorithm, ant colony optimization

Marcin BAZYLUK∗

Leszek KOSZAŁKA∗

Keith J. BURNHAM†

Radosław RUDEK‡

DETERMINING THE INPUT BIAS ON EFFICIENCY OF METAHEURISTIC ALGORITHMS FOR A PARALLEL MACHINE SCHEDULING PROBLEM

The influence of input parameters on the efficiency of metaheuristic algorithms has been considered since their emergence. Calculations show that their calibration becomes an important issue, leading to a noticeable improvement in quality. Nevertheless, this is often a long-lasting process. This paper attempts to determine that dependence and suggests a policy for optimizing the finetuning. Another aspect is the variation in the efficiency of metaheuristics for different instance shapes and sizes. By the shape we understand the upper bounds of the parameters describing a single solution, and by the size the number of jobs and the number of available machines. Such variation is measured.

1. INTRODUCTION

During the last decades, scheduling problems on parallel machines with the earliness-tardiness objective have been extensively analysed both by researchers and practitioners. In general, a scheduling problem on parallel machines focuses on the allocation of jobs to machines and on determining their starting times such that the given objective is optimized. Scheduling problems on parallel machines are usually NP-hard, and the problem under earliness-tardiness objectives is strongly NP-hard. Namely, Garey et al. [1] showed that the simplified problem on a single machine with symmetric earliness and tardiness penalties is NP-hard.

∗ Department of Systems and Computer Networks, Wrocław University of Technology, Poland.
† Control Theory and Applications Centre, Coventry University, Coventry, UK.
‡ Wrocław University of Economics, Poland.


Yano et al. [2] proved the NP-hardness for the considered objectives with job weights proportional to their processing times. Sun et al. [3] studied the problem with identical parallel machines and a common due date for all jobs, and proved it to be ordinarily NP-hard if the number of machines is given. Since the problem considered in this paper is more general, it is no less complex.

2. PROBLEM FORMULATION

There are given a set M of m machines and a set J of n jobs that have to be processed on these machines. Jobs are independent, non-preemptive and available for processing at time 0, and each job can be processed by one machine at a time.

Before we define the problem formally, let us introduce the following notation and parameters:

i, j = 1, 2, ..., n – job indices,
k = 1, 2, ..., m – machine index,
pik – processing time of job i on machine k,
wi – weight of job i,
di – due date of job i,
Ci – completion time of job i,
Ei – earliness of job i, Ei = max{0, di − Ci},
Ti – tardiness of job i, Ti = max{0, Ci − di},

xijk = 1 if job j immediately follows job i on machine k, 0 otherwise,
  for i = 0, 1, ..., n, j = 1, 2, ..., n, k = 1, 2, ..., m,

yjk = 1 if job j is to be executed on machine k, 0 otherwise,
  for j = 1, 2, ..., n, k = 1, 2, ..., m.

For i = 0, x0jk = 1 means that job j is scheduled as the first one on machine k. On this basis, the completion time of job j is defined by the recursive formula:

Cj = Σ_{i=0}^{n} Σ_{k=1}^{m} xijk (Ci + pjk).   (1)

Following the notation and parameters, we define the problem formally. The objective is to find an allocation of jobs to machines, and their starting times on the machines,


that minimizes the following:

f = Σ_{i=1}^{n} wi (Ei + Ti)   (2)

under the following constraints:

• each job is processed by one machine and preemption is not allowed:

Σ_{i=0, i≠j}^{n} Σ_{k=1}^{m} xijk = 1,   j = 1, 2, ..., n,   (3)

• there are no idle times:

Σ_{i=0, i≠j}^{n} xijk = yjk,   j = 1, 2, ..., n,   k = 1, 2, ..., m,   (4)

• each machine can process only one job at a time:

Σ_{j=1, j≠i}^{n} xijk ≤ yik,   i = 1, 2, ..., n,   k = 1, 2, ..., m.   (5)

Since idle times are not allowed, determining the starting times of the jobs reduces to determining their sequences on the machines.

3. METAHEURISTICS

To solve the problem described in the previous section, we use well-known metaheuristic algorithms, which are described in the remainder of this section.

3.1. SIMULATED ANNEALING

The simulated annealing algorithm starts from an initial solution generated by a modified Longest Processing Time algorithm, called LPT-MM (Longest Processing Time – Multi-Machine). Its mechanism is presented in Fig. 1.

Starting from a given solution, the algorithm chooses the next solution by a swap or insert of two randomly chosen jobs. The new solution replaces the current solution with the following probability

P = exp( (Cg − Cn) / (T0 (1 − δ·t / (tmax · 100%))) ),   (6)


Step 1 (initialization)
Schedule jobs j ∈ J according to the non-increasing order of wj/pj.

Step 2 (iterative scheduling)
For each job i ∈ J do:
  Find the first idle machine k. Allocate job i to machine k after the last job.

Fig. 1. Mechanism of the LPT-MM algorithm
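To make the construction concrete, the following is a minimal C# sketch of the LPT-MM heuristic of Fig. 1. The Job class, the field names, and the identical-machines simplification used for the wj/pj ordering are our own illustration, not the authors' code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Job
{
    public int Id;
    public double Weight;          // w_j
    public double ProcessingTime;  // p_j (identical machines assumed here)
}

static class LptMm
{
    // Step 1: sort jobs by non-increasing w_j / p_j.
    // Step 2: repeatedly allocate the next job to the first idle machine,
    //         i.e. the machine whose current schedule finishes earliest.
    public static List<Job>[] Build(IEnumerable<Job> jobs, int machineCount)
    {
        var schedule = new List<Job>[machineCount];
        var finish = new double[machineCount];   // completion time per machine
        for (int k = 0; k < machineCount; k++) schedule[k] = new List<Job>();

        foreach (var job in jobs.OrderByDescending(j => j.Weight / j.ProcessingTime))
        {
            int k = Array.IndexOf(finish, finish.Min());  // first idle machine
            schedule[k].Add(job);                         // append after the last job
            finish[k] += job.ProcessingTime;
        }
        return schedule;
    }
}
```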

In (6), Cg is the lowest total cost found so far (the global best), Cn is the total cost of the neighbour solution, T0 is the initial temperature, t is the current time and tmax is the fixed calculation time. The parameter δ is the temperature decrease factor expressed in [%]. On this basis, the parameter α for the linear cooling schedule is calculated:

α = δ·T0 / tmax.   (7)

The block diagram of the designed simulated annealing is presented in Fig. 3a. The algorithm running time is fixed.
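For illustration, the acceptance rule (6) with the linear cooling schedule can be sketched as follows; this is a sketch under our own naming, assuming minimisation and that improving moves are accepted outright:

```csharp
using System;

static class SaStep
{
    static readonly Random Rng = new Random();

    // Acceptance rule (6): cg = best (global) cost found so far, cn = cost of
    // the neighbour, t0 = initial temperature, delta = decrease factor in %,
    // t = elapsed time, tMax = total calculation time.
    public static bool Accept(double cg, double cn, double t0,
                              double delta, double t, double tMax)
    {
        if (cn <= cg) return true;                         // improvement: always take it
        double temperature = t0 * (1.0 - delta * t / (tMax * 100.0)); // linear schedule
        double p = Math.Exp((cg - cn) / temperature);      // eq. (6), p < 1 here
        return Rng.NextDouble() < p;
    }
}
```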

3.2. TABU SEARCH

We now propose a deterministic tabu search algorithm. It generates the complete neighbourhood of the current solution, from which it then chooses the best one. Two neighbour types are used: swap and insert. To generate the initial solution, the LPT-MM algorithm (see Fig. 1) is used.

The tabu search algorithm uses local search with a short-term memory, called the tabu list, organized as a FIFO (First In First Out) queue. The tabu list stores arcs (i, j, k, l), meaning that job j immediately follows job i on machine k, with jobs i and j in positions l − 1 and l, respectively. When i = 0, job j is in the first position on machine k and l = 0. The block diagram of the proposed tabu search is shown in Fig. 3b, and a sketch of the tabu list is given below.
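A minimal sketch of such a FIFO tabu list of arcs (i, j, k, l); the fixed capacity Λ and the tuple-based representation are our own illustration:

```csharp
using System.Collections.Generic;

// FIFO short-term memory storing arcs (i, j, k, l): job j immediately follows
// job i on machine k, with i and j in positions l - 1 and l.
class TabuList
{
    private readonly int capacity;  // tabu list size Λ
    private readonly Queue<(int, int, int, int)> fifo = new Queue<(int, int, int, int)>();
    private readonly HashSet<(int, int, int, int)> members = new HashSet<(int, int, int, int)>();

    public TabuList(int capacity) { this.capacity = capacity; }

    public bool IsTabu(int i, int j, int k, int l) => members.Contains((i, j, k, l));

    public void Add(int i, int j, int k, int l)
    {
        if (!members.Add((i, j, k, l))) return;   // already forbidden
        fifo.Enqueue((i, j, k, l));
        if (fifo.Count > capacity)
            members.Remove(fifo.Dequeue());       // FIFO eviction of the oldest arc
    }
}
```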

3.3. GENETIC ALGORITHM

For more information on genetic algorithms (GAs), we refer to [5]. Due to the many different approaches to this algorithm presented in the literature, this section states the exact parameters of the implemented one. Each iteration simulates one generation of chromosomes, where we deal with a population of ancestors and a population of descendants. For n ancestors we obtain n(n − 1) descendants, whose set is generated by mating every possible pair in the ancestor set (chromosomes i, j: 1 ≤ i, j ≤ n, i ≠ j).


START
• The given set of parents.
LOOP
• Choose one parent randomly.
• Find the first job on the chosen parent which has not been placed on the descendant yet.
• If the machine assigned to that job is the same on both parents, then make this same selection on the job, too; else choose one of the machines of the parents randomly and assign it to the job.
• Place the job-machine pair on the first empty position of the descendant.
• If all genes of the descendant chromosome are set, STOP; else proceed to the next gene on the descendant and repeat LOOP.

Fig. 2. MCUOX crossover algorithm

As was shown experimentally in [6] for the problem examined here, simplified by setting the due dates of all jobs to 0, using the popular PMX crossover in the GA would make it impossible to find satisfactory results. Therefore, we implemented the multi-component uniform order-based crossover operator (MCUOX) proposed in [7], where a single gene accommodates both an object and the associated selection for that object. Thus, in our case each gene corresponds to a job-machine pair. The construction of a descendant from two parent chromosomes is presented in Fig. 2 and sketched in code below. The crossover is executed with a fixed probability; thus, sometimes the descendant chromosome is simply a copy of its first ancestor.

Two mutation mechanisms are used: swap mutation and bit mutation. The first randomly chooses two positions on a chromosome and exchanges their contents. The second reassigns a random job to a randomly chosen machine in a chromosome.

The probability of selecting a chromosome for crossover from a population of descendants is proportional to the value of its fitness function, and for a given chromosome it is defined as

Pg = Fg / Σ_{g=1}^{G} Fg,   (8)

where G is the size of the population and Fg = C·Tc(g)^{−1} is the fitness function of chromosome g, inversely proportional to the objective function, with C a fixed constant.


The block diagram of GA is presented in Fig. 3c.

3.4. ANT COLONY OPTIMIZATION

Ant algorithms are a promising approach to solving optimization problems, based on the behaviour of ant colonies. The first ant systems were developed to attack problems with a natural similarity to ant colony behaviour, such as the TSP [8], QAP [9] or VRP [10]. They have since been extended to job-shop scheduling, but the literature on this topic is rather limited. Some metaheuristic methods applied to similar scheduling problems were analysed in [11].

In every generation, each of m ants constructs a solution. An ant iteratively chooses a random job and places it on a random machine at the first empty position. The probability that the ant makes a step, which means placing job j after job i on machine k, is defined as

Pijk = (τijk)^α (ηijk)^β / Σ_{j∈J, k∈M} (τijk)^α (ηijk)^β,   (9)

where τijk is the amount of pheromone on that choice. We also introduce the heuristic value ηijk, as proposed in [13], which in this case is the cost calculated for the execution of job j on machine k. The exponents α and β are constants determining the relative influence of the pheromone value and the heuristic, respectively, on the decision of the ant.

After all ants have finished their paths, some of the old pheromone is evaporated by multiplying it with a factor 0 < ρ < 1:

∀ i, j ∈ J, k ∈ M:   τijk ← τijk · ρ.   (10)

This prevents the old pheromone from having too strong an influence on future decisions. Finally, the mb best ants, where mb ≤ m, add some pheromone to every step they have made on their tours. The amount of pheromone applied to a single step is equal to Q/Tc, where Tc is the objective function value of the solution found by the ant and Q is the objective function value of the solution found by LPT-MM (described in Fig. 1) multiplied by a constant value λ:

∀ i, j ∈ J, k ∈ M such that xijk = 1:   τijk ← τijk + Q/Tc.   (11)

To prevent some steps from being reduced to 0 and the probability of others becoming too large, we define minimum (τmin) and maximum (τmax) pheromone values for each step. The block diagram is presented in Fig. 3d.
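The evaporation (10) and deposit (11) rules, together with the [τmin, τmax] clamping, can be sketched as follows; the 3-D pheromone array layout is our own choice:

```csharp
using System;

static class Pheromone
{
    // tau[i, j, k]: pheromone on placing job j directly after job i on machine k.
    // Eq. (10): evaporation, clamped from below by tauMin.
    public static void Evaporate(double[,,] tau, double rho, double tauMin)
    {
        for (int i = 0; i < tau.GetLength(0); i++)
            for (int j = 0; j < tau.GetLength(1); j++)
                for (int k = 0; k < tau.GetLength(2); k++)
                    tau[i, j, k] = Math.Max(tauMin, tau[i, j, k] * rho);
    }

    // Eq. (11): one of the mb best ants deposits q / tc on every step of its
    // tour, clamped from above by tauMax; q = lambda * Tc(LPT-MM).
    public static void Deposit(double[,,] tau, (int i, int j, int k)[] tour,
                               double q, double tc, double tauMax)
    {
        foreach (var (i, j, k) in tour)
            tau[i, j, k] = Math.Min(tauMax, tau[i, j, k] + q / tc);
    }
}
```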


Fig. 3. Block diagrams of the implemented metaheuristics: (a) simulated annealing, (b) tabu search, (c) genetic algorithm, (d) ant colony optimization

4. EXPERIMENTATION SYSTEM

In line with the objectives of this paper, this section presents the concept of the analysis, which includes measuring the dependence of the metaheuristic algorithms' efficiency upon the input parameters. The figures presented in this section contain block diagrams compliant with the rules defined by systems theory. The input parameters used by the objects modelled in this section are presented in Table 1. Since no benchmark problems were found in the literature for the particular shape of the problem considered here, random generation of instances seemed to be the only way of preparing the large number required for the experiments. The detailed plan of experimentation is as follows:


i). Perform the calibration of the internal input parameters for all four metaheuristics. Simultaneously, measure the sensitivity of the algorithms to the variation of these parameters. First, generate a sufficiently large set of random instances with given parameters (Fig. 4) and hand them to the algorithms. Then solve all instances repeatedly, using different values of the internal input parameters.

Table 1. Input parameters used in figures

Symbol | Description | Used by
T0 | initial temperature | simulated annealing
δ | temperature decrease factor | simulated annealing
Λ | tabu list size | tabu search
G | size of population | genetic algorithm
Ms, Mb | probability of swap / bit mutation | genetic algorithm
X | probability of crossover | genetic algorithm
τmin, τmax | minimal / maximal pheromone value | ant system
ρ | evaporation factor | ant system
m | number of ants | ant system
β | exponent used in eq. (9) (α = 1) | ant system
tC | time of calculations | all algorithms
tI | the algorithm is interrupted after this time without improvement | all algorithms
n, m | number of jobs / machines | instance generator
wmax, pmax, dmax | upper bound of job weights / processing times / due dates | instance generator

Fig. 4. Analysis of algorithms sensitivity to input parameters


Fig. 5. Analysis of algorithms sensitivity to instance size


ii). Inspect the efficiency variation of the algorithms for different instance sizes. Repeatedly generate large sets of instances with fixed upper bounds (wmax, pmax, dmax) and different numbers of jobs and machines (n, m). Then pass each set to the four metaheuristics and run them with the same calculation times (tC, tI) and the input parameters obtained from the previous experiment (Fig. 5).

iii). Measure the sensitivity of the four metaheuristics to the variation of the upper bounds of the parameters describing the instances (wmax, pmax, dmax). To this end, generate sets of instances with different values of these bounds and fixed numbers of jobs and machines (n, m). Then hand these sets to the four algorithms and run them with the finetuned internal input parameters (Fig. 6).


Fig. 6. Analysis of algorithms sensitivity to instance shape

5. RESEARCH

All tests were run on a PC with an Intel® Celeron® 2.66 GHz CPU, 1024 MB RAM, the Microsoft® Windows® Server 2003 Enterprise Edition operating system and .NET Framework 2.0.

5.1. PARAMETERS CALIBRATION

An external simulated annealing algorithm has been designed to manage the finetuning process used in the application. The finetuned parameters of all four considered algorithms for example instances are presented in Table 2 (for tC = 10 s and tI = 5 s). The calculation time tC is the amount of time after which the algorithm is stopped unconditionally. The interruption time tI is the amount of time after which the algorithm is interrupted if there is no improvement in the objective function of the best solution found. The symbols in the tables and in the following figures are the same as in Table 1.

The average efficiency evolution for varying selected execution parameters and several instance sizes is presented in Figs. 7-10; the corresponding experiments are collected in Table 3.


Table 2. Finetuning results for 10 s (5 s) time

n | m | SA: T0 | SA: δ[%] | TS: Λ | GA: G | GA: Ms | GA: Mb | GA: X | ACO: τmin | ACO: τmax | ACO: ρ | ACO: m (ants) | ACO: β
150 | 15 | 13 | 85 | 3 | 28 | 0.15 | 0.14 | 1.00 | 100 | 1200 | 0.85 | 10 | 0.5
150 | 10 | 16 | 94 | 5 | 30 | 0.09 | 0.12 | 1.00 | 100 | 1000 | 0.80 | 10 | 0.5
150 | 5 | 17 | 94 | 8 | 32 | 0.14 | 0.08 | 1.00 | 150 | 1100 | 0.85 | 12 | 0.5
100 | 15 | 23 | 95 | 8 | 32 | 0.08 | 0.10 | 1.00 | 200 | 800 | 0.90 | 24 | 0.5
100 | 10 | 18 | 96 | 12 | 38 | 0.16 | 0.13 | 0.95 | 100 | 1000 | 0.81 | 41 | 0.5
100 | 5 | 23 | 96 | 14 | 36 | 0.12 | 0.14 | 1.00 | 150 | 1200 | 0.75 | 38 | 0.5
50 | 15 | 20 | 97 | 11 | 33 | 0.10 | 0.13 | 1.00 | 150 | 1200 | 0.90 | 37 | 0.5
50 | 10 | 21 | 95 | 12 | 34 | 0.09 | 0.13 | 1.00 | 100 | 1100 | 0.80 | 36 | 0.5
50 | 5 | 20 | 96 | 12 | 39 | 0.10 | 0.10 | 0.97 | 100 | 1000 | 0.78 | 48 | 0.5

Table 3. Finetuning progress

Alg. | n | m | tC [s] | tI [s] | Finetuned parameters | Constant param. | Fig.
TS | 100 | 10 | 10 | 5 | Λ = 1, 2, 3, ..., 20 | – | 7a
TS | 50 | 5 | 10 | 5 | Λ = 1, 2, 3, ..., 20 | – | 7b
SA | 100 | 10 | 10 | 5 | T0 = 12, 14, 16, ..., 28; δ = 84%, 86%, 88%, ..., 100% | – | 8a
SA | 50 | 5 | 10 | 5 | T0 = 14, 16, 18, ..., 28; δ = 86%, 88%, 90%, ..., 100% | – | 8b
GA | 100 | 10 | 10 | 5 | G = 24, 26, 28, ..., 42; X = 0.92, 0.93, 0.94, ..., 1.00 | Ms = 0.1, Mb = 0.1 | 9a
GA | 50 | 5 | 10 | 5 | G = 31, 33, 35, ..., 43; X = 0.93, 0.94, 0.95, ..., 1 | Ms = 0.1, Mb = 0.1 | 9b
GA | 100 | 10 | 1 | 0.5 | Ms = 0.02, 0.04, 0.06, ..., 0.20; Mb = 0.02, 0.04, 0.06, ..., 0.20 | G = 30, X = 1 | 9c
GA | 50 | 5 | 1 | 0.5 | Ms = 0.02, 0.04, 0.06, ..., 0.24; Mb = 0.02, 0.04, 0.06, ..., 0.2 | G = 15, X = 0.9 | 9d
AS | 100 | 10 | 10 | 5 | ρ = 0.75, 0.77, 0.79, ..., 0.95; m = 25, 27, 29, ..., 45 | τmin = 100, τmax = 1000, β = 0.5 | 10a
AS | 50 | 5 | 10 | 5 | ρ = 0.70, 0.72, 0.74, ..., 0.90; m = 30, 32, 34, ..., 50 | τmin = 100, τmax = 1000, β = 0.5 | 10b
AS | 100 | 10 | 1 | 0.5 | τmin = 25, 50, 75, ..., 200; τmax = 850, 900, 950, ..., 1200 | ρ = 0.85, m = 18, β = 0.5 | 10c
AS | 50 | 5 | 1 | 0.5 | τmin = 25, 50, 75, ..., 200; τmax = 850, 900, 950, ..., 1200 | ρ = 0.9, m = 15, β = 0.5 | 10d

The average objective function deterioration factor dTc = Tc*/Tc · 100%, presented on the z-axis of the figures, is calculated by dividing the best objective function value in the examined argument space, Tc*, by the current objective function value. The x-axis and y-axis (x-axis only for the tabu search) show the different values of the finetuned parameters; the average deterioration of the objective function in comparison with the best setting is shown on the z-axis (y-axis for the tabu search).


Fig. 7. Finetuning of TS: average Tc deterioration [%] vs. tabu list size; (a) 100 jobs & 10 machines, (b) 50 jobs & 5 machines

Fig. 8. Finetuning of SA: average Tc deterioration [%] vs. initial temperature and temperature decrease [%]; (a) 100 jobs & 10 machines, (b) 50 jobs & 5 machines

The figures prove the strong sensitivity of all implemented algorithms to the input parameters. Going through them, we can also reach the conclusion that approximating this dependence with a mathematical function is infeasible; an external calibration algorithm appears to be the only option. The following conclusions emerge:

• The best value of the temperature decrease factor (SA) is around 95% (Fig. 8), but using too large a value (≥98%) leads to dramatic deterioration. The influence of the initial temperature is not as strong, but setting it precisely will improve the solution quality by a few percentage points.


Fig. 9. Finetuning of GA: average Tc deterioration [%]; (a) G and X, 100 jobs & 10 machines; (b) G and X, 50 jobs & 5 machines; (c) Ms and Mb, 100 jobs & 10 machines; (d) Ms and Mb, 50 jobs & 5 machines


• The influence of the tabu list size in TS upon the algorithm effectiveness sometimes resembles a normal distribution (Fig. 7a), with only one local optimum which is quite easy to locate, but it can also be monotonically decreasing (Fig. 7b).

• The crossover probability of GA should be at least 0.95, but if there is no time for finetuning, setting its value to 1 should not cause strong deterioration of effectiveness. On the other hand, the population size G is a crucial aspect. In Fig. 9a we can see three local optima arranged in a line. The area of the swap and bit mutation probabilities is strongly diversified; however, keeping 0.1 < Mb, Ms < 0.2 should guarantee that the objective function value does not deteriorate by more than 5%.


Fig. 10. Finetuning of AS: average Tc deterioration [%]; (a) ρ and m, 100 jobs & 10 machines; (b) ρ and m, 50 jobs & 5 machines; (c) τmin and τmax, 100 jobs & 10 machines; (d) τmin and τmax, 50 jobs & 5 machines

• The value of β in AS was in most cases fixed at 0.5. The minima of the objective function in the space of the evaporation factor ρ and the number of ants m in AS are often surrounded by maxima (Fig. 10ab) and are therefore hard to find for the finetuning algorithm based on SA. However, setting them improperly worsens the quality by less than 2%. Most of the local optima are located at 0.82 < ρ < 0.88 (Fig. 10a). While τmin = 50 and τmax = 950 are the best pheromone lower and upper bounds for m = 5 and n = 50 (Fig. 10d), these values indicate a local maximum for m = 10 and n = 100 (Fig. 10c).

5.2. INFLUENCE OF INSTANCE SIZE

After the finetuning phase is completed, we can proceed to measure the sensitivity of the four developed algorithms to the size of the input instances, with the calibrated internal parameters guaranteeing their highest possible effectiveness.


The size of an instance is described by two factors: the number of jobs n and the number of machines m.

Most of the conducted research involved 100 repetitions of each algorithm with the same input characteristics to calculate the mean value of the objective function. Only for some larger instances, which needed ten minutes or more to finish a single optimization experiment, was that number reduced to 50.

The average evolution of the objective function for different instance sizes and tC = 10 s, tI = 5 s is presented in Fig. 11. Decreasing the number of jobs is accompanied by the tabu search quality approaching that of simulated annealing; TS finally overtakes SA for some time for n = 50. Since in all conducted experiments the simulated annealing algorithm was predominant over its rivals for the whole time, we will use its current objective function value as a reference point for comparing the three other algorithms.

Fig. 11. Objective function evolution: Tc vs. time [s] for TS, GA, SA and AS; (a) 100 jobs & 10 machines, (b) 100 jobs & 5 machines, (c) 50 jobs & 10 machines, (d) 50 jobs & 5 machines


Fig. 12. Efficiency margin of the metaheuristics: the ratio F vs. the number of jobs and the number of machines; (a) SA-TS, (b) SA-GA, (c) SA-AS, (d) AS-GA

Hence the quality of TS, AS and GA will be expressed as the ratio of their objective function values to the objective function value of SA.

Successive research was conducted to compare the quality variation of the four algorithms for the changing size of an instance. The superiority of SA over the three other algorithms is presented in Fig. 12abc, and the superiority of AS over GA in Fig. 12d, where F is the ratio mentioned above, m is the number of machines and n is the number of jobs. The figures show that the implemented algorithms are strongly sensitive to the instance size, but this sensitivity shows some level of linearity and therefore seems predictable. Both local search algorithms prove to be more suitable for the examined range of parameters; the quality of the evolutionary ones is comparable. The following conclusions appear:

• The ratio of tabu search to simulated annealing quality for 10 s of calculations is maintained at a level of approx. 1.3 for all instances except those with 90 jobs and more (Fig. 12a). This is caused by the rapid growth of calculations for the tabu search: for example, for n = 80 jobs we have a neighbourhood size equal to (3/2)n(n + 1), which is 9720; after adding 20 more jobs (25%), this size grows to 15150, which is 56% more.

• For the genetic algorithm, effectiveness comparable to SA is notable only for m = 1. However, there is also an interesting increase in quality for m ≥ 2 (Fig. 12b). The low values of F for n = 10 are irrelevant, since 10 s of calculation is definitely enough to find an optimal or near-optimal solution in this case. The quality of GA starts to rise for m ≥ 5, and this increase is stronger for longer calculation times.

• The ant colony optimization also seems to depend on both the number of jobs and the number of machines. It differs from the GA in the location of its weakest point, which is also smoother (Fig. 12c). Explaining why a strong deterioration of the evolutionary algorithms is observed for some interval of the number of machines could be the topic of a separate paper.

• The margin between the quality of AS and GA shows some continuity (Fig. 12d) as the calculation time grows. The superiority of the ant system emerges for a decreasing number of machines and an increasing number of jobs, and this emergence shows a high level of linearity.

5.3. INFLUENCE OF INSTANCE SHAPE

This subsection contains the results of the comparative study conducted to inspect the influence of the upper bounds of the instance parameters on the quality of the four investigated algorithms. These upper bounds are: the maximal job weight wmax, the maximal job processing time on a machine pmax, and the maximal job due date dmax.

Table 4. Influence of instance shape

n | m | tC [s] | tI [s] | wmin | wmax | dmin | dmax | pmin | pmax | Fig.
100 | 10 | 10 | 5 | 0 | 5 to 50 | 0 | 50 | 0 | 10 | 13a
50 | 5 | 10 | 5 | 0 | 5 to 50 | 0 | 50 | 0 | 10 | 13b
100 | 10 | 10 | 5 | 0 | 10 | 0 | 10 to 100 | 0 | 10 | 13c
50 | 5 | 10 | 5 | 0 | 10 | 0 | 10 to 100 | 0 | 10 | 13d
100 | 10 | 10 | 5 | 0 | 10 | 0 | 500 | 0 | 10 to 100 | 13e
50 | 5 | 10 | 5 | 0 | 10 | 0 | 500 | 0 | 10 to 100 | 13f


Fig. 13. Objective function dependence upon the instance shape: Tc for SA, TS, GA and AS vs. (a) wmax, 100 jobs & 10 machines; (b) wmax, 50 jobs & 5 machines; (c) dmax, 100 jobs & 10 machines; (d) dmax, 50 jobs & 5 machines; (e) pmax, 100 jobs & 10 machines; (f) pmax, 50 jobs & 5 machines

The lower bounds were set to the default values wmin = 1, pmin = 1, dmin = 0. The variation of efficiency was monitored for different values of the upper bounds; this can also be understood as the shape of the instances.


The experiments illustrating this influence are collected in Table 4, which also contains the detailed input parameters of all conducted experiments. wmax and wmin are respectively the maximum and minimum values of the job weight, dmax and dmin the maximum and minimum job due date, and pmax and pmin the maximum and minimum job processing time on a selected machine. n and m are respectively the numbers of jobs and machines, and tC and tI the calculation and interruption times.

Fig. 13 shows that changing the instance shape by modifying the maximal values of the job weights, job due dates and job processing times on the machines does not strongly influence the effectiveness of the four investigated algorithms. The quality margin between the four algorithms remains constant in all experiments.

6. CONCLUSIONS

For the considered problem, four metaheuristic algorithms were implemented. Three of them, simulated annealing (SA), tabu search (TS) and the genetic algorithm (GA), have become very popular in recent times, as evidenced by the large body of literature covering a wide range of their implementations [4, 5, 7, 12, 14]. The fourth algorithm was ant colony optimization (ACO) [8, 9, 10, 11]. Complex computational experiments were conducted on randomly generated instances to compare the performance of the implemented metaheuristics.

Comparison of the results obtained by the four developed metaheuristics with optimal solutions was impossible, and the only option was to compare them with each other.

The four algorithms are sensitive to the values of the input parameters, and this dependence probably cannot be approximated due to its complicated distribution. Therefore, tuning is essential each time one of the algorithms is engaged. The exception is the tabu list size in tabu search, which can be predicted as proposed in [14]. No dependence upon the range of the parameters describing the jobs and machines of a single problem instance was observed. In contrast, the number of jobs and the number of machines strongly influence the effectiveness of the algorithms, and this influence showed a high level of linearity.

REFERENCES

[1] GAREY M.R., TARJAN R.E. and WILFONG G.T., One-processor scheduling with symmetric earliness and tardiness penalties. Mathematics of Operations Research, vol. 13, 1988, pp. 330–348.

[2] YANO C.A. and KIM Y., Algorithm for a class of single machine weighted tardiness and earliness problems. European Journal of Operational Research, vol. 52, 1991, pp. 167–178.


[3] SUN H. and WANG G., Parallel machine earliness and tardiness scheduling with proportional weights. Computers & Operations Research, vol. 30, 2003, pp. 801–808.

[4] CAO D., CHEN M. and WAN G., Parallel machine selection and job scheduling to minimize machine cost and job tardiness. Computers & Operations Research, vol. 32, 2005, pp. 1995–2012.

[5] DAVIS L., Handbook of genetic algorithms. Van Nostrand Reinhold: New York, NY, 1991.

[6] BAZYLUK M., KOSZALKA L. and BURNHAM K.J., Using heuristic algorithms for parallel machines job scheduling problem. Proceedings of the 6th PBW, 2006, pp. 9–29.

[7] SIVRIKAYA-SERIFOGLU F. and ULUSOY G., Parallel machine scheduling with earliness and tardiness penalties. Computers & Operations Research, vol. 26, 1999, pp. 773–787.

[8] DORIGO M. and GAMBARDELLA L.M., Ant colonies for the travelling salesman problem. BioSystems, vol. 43, 1997, pp. 73–81.

[9] MANIEZZO V. and COLORNI A., The ant system applied to the quadratic assignment problem. Knowledge and Data Engineering, vol. 11, 1999, pp. 769–778.

[10] BULLNHEIMER B., HARTL R.F. and STRAUSS C., An improved ant system algorithm for the vehicle routing problem. Annals of Operations Research, vol. 89, 1999, pp. 319–328.

[11] LIAO C. and JUAN H., An ant colony optimization for single-machine tardiness scheduling with sequence-dependent setups. Computers & Operations Research, vol. 34, 2007, pp. 1899–1909.

[12] RABADI G. and ANAGNOSTOPOULOS G.C., A heuristic algorithm for the just-in-time single machine scheduling problem with setups: a comparison with simulated annealing. International Journal of Advanced Manufacturing Technology, vol. 32, 2007, pp. 326–335.

[13] MIDDENDORF M., REISCHLE F. and SCHMECK H., Multi colony ant algorithms. Journal of Heuristics, vol. 8, 2002, pp. 305–320.

[14] BILGE U., KIRAC F., KURTULAN M. and PEKGUN P., A tabu search algorithm for parallel machine total tardiness problem. Computers & Operations Research, vol. 31, 2004, pp. 397–414.


Computer Systems Engineering 2008

Keywords: nesting, packing problem, cutting problem, metaheuristic algorithm, Tabu Search

Paweł BOGALIŃSKI* Iwona POŹNIAK-KOSZAŁKA* Leszek KOSZAŁKA* Piotr SKWORCOW†

THE TWO DIMENSIONAL IRREGULAR-SHAPED NESTING PROBLEM

In this paper, we analyse the nesting problem that is faced in manufacturing processes where cutting and packing are involved. It concerns a non-overlapping placement of two-dimensional irregular-shaped objects on a bounded area. We describe new algorithms for solving the nesting problem. The first algorithm is based on Tabu Search and the second, called the Shaking Algorithm, is proposed by the authors. Both algorithms were implemented and tested using the proposed experimentation system.

1. INTRODUCTION

The nesting problem concerns the placement of a number of irregularly shaped objects on a two-dimensional fixed area, such that the objects do not overlap. The nesting problem is also known as the strip nesting problem or the irregular strip nesting problem [1]. It occurs in a variety of industries, particularly in manufacturing processes where cutting and packing operations are involved. Examples of such industries and processes include aircraft and ship construction, textile, clothing and footwear production, furniture manufacturing etc. Often the cost of the material being cut is a significant component of the total production cost, and thus saving material is an important matter. A solution of the nesting problem should therefore attempt to minimize the amount of wasted material, which is equivalent to maximizing the utilization of the material. An example of a 2D nesting problem, related to cutting parts of clothes from a roll of fabric, is illustrated in Figure 1.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland. † Water Software Systems, De Montfort University, Leicester, UK.


In this case, to minimise the production cost, it is necessary to maximize the utilization of the fabric or, equivalently, minimize the waste of fabric, while cutting all required parts of clothes.

Fig. 1. A nesting problem example – tight packing of shirt parts

This paper concerns a variant of the nesting problem for irregular-shaped objects and describes two different methods developed to solve it. The paper is organized as follows: Section 2 contains the problem formulation. The solution to the problem and the developed algorithms are presented in Section 3. Section 4 contains a description of the experimentation system developed to test the performance of the algorithms. Section 5 provides final remarks.

2. PROBLEM DESCRIPTION

The term nesting has been used to describe a wide variety of two dimensional cutting and packing problems. All such problems involve a non-overlapping placement of a set of irregular two-dimensional shapes within a defined region of two-dimensional space. Most problems can be categorized as follows [2]:

• Decision problem. Decide whether a set of shapes fit within a given region. • Knapsack problem. Given a set of shapes and a region, find a placement of

a subset of shapes that maximizes the utilization (area covered) of the region.

• Bin packing problem. Given a set of shapes and a set of regions, minimize the number of regions needed to place all shapes.


• Strip packing problem. Given a set of shapes and a width W, minimize the length of the rectangular region with width W such that all shapes are contained in the region.

This paper concerns a strip packing (nesting) problem, i.e. the objective is to find a non-overlapping placement of shapes within the bounds of the material, such that the length L of the required material is minimal. This is equivalent to maximizing the utilization of the material. The utilization, denoted U, of the material for a given solution can be expressed as:

U = S / (W · L),   (1)

where S denotes the sum of the areas of all shapes. The terms length and width are traditionally used in the nesting problem literature. In this paper the term height, denoted h, is used as an equivalent of length; thus the aim of the optimization is minimization of the height of the required material, see Figure 2.

Fig. 2. Strip nesting problem - problem statement

The aim of this work is to formulate algorithms for solving the nesting problem, and to develop and implement an application for evaluating the performance of the implemented algorithms, leading to establishing the best approach for solving the nesting problem. The main criterion when evaluating the efficiency of the implemented algorithms is the height of the material required to place a defined set of shapes.

3. SOLUTION TO THE PROBLEM

Solving the nesting problem progresses in three stages:


1. Quantization – in this stage the shapes are represented as maps of pixels. Each shape is characterized by an array of bits.

2. Optimization – this stage is the main and most important step. During this stage an optimal or near-optimal solution is found. The nesting problem is NP-hard even for rectangular shapes (and material) with no rotation allowed [1, 2]. An exhaustive search is not practical, thus to find a good placement of shapes alternative or heuristic methods should be used. In this paper two methods are presented: a Tabu Search algorithm and a method called "Shaking".

3. Printing – this stage is the process of converting the array-of-bits representation into graphical shapes.

3.1 QUANTIZATION

Quantization is necessary to process images representing real objects (parts of clothes, metal elements etc.). Each image is translated to a raster model, i.e. polygons are represented by matrices [2]. An example of a polygon and its raster model equivalent is illustrated in Figure 3.

Fig. 3. Polygon and its equivalent raster model.

Source: Nielsen B.K., Odgaard A., Fast Neighbourhood Search for the Nesting Problem.[2]

In a raster model, each pixel of an image is represented by two coordinates; therefore each shape to be placed on the material is represented by a matrix of coordinates. The resolution of each raster model is fixed.
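With shapes stored as bit matrices, testing whether two placed shapes overlap reduces to comparing bits at offset positions; the following is a minimal sketch under this representation (our own illustration, not the authors' code):

```csharp
// Raster shapes as bit matrices: true = pixel covered by the polygon.
static class Raster
{
    // Does shape b, placed with its origin at (dx, dy) relative to shape a,
    // overlap a on at least one pixel?
    public static bool Overlaps(bool[,] a, bool[,] b, int dx, int dy)
    {
        for (int x = 0; x < b.GetLength(0); x++)
            for (int y = 0; y < b.GetLength(1); y++)
            {
                int ax = x + dx, ay = y + dy;
                if (ax < 0 || ay < 0 ||
                    ax >= a.GetLength(0) || ay >= a.GetLength(1))
                    continue;                 // outside a's bounding box: no conflict
                if (a[ax, ay] && b[x, y]) return true;
            }
        return false;
    }
}
```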

3.2 TABU SEARCH ALGORITHM

Tabu search is a mathematical optimization method that enhances the performance of local search by using memory structures: once a potential solution has been determined, it is marked as "taboo" so that the algorithm does not 'visit' that possibility again [3].


For further information about Tabu search see [4, 5].

The Tabu search algorithm has been applied to a wide range of optimisation problems, such as job-shop scheduling or the Travelling Salesman Problem (TSP). The application of the Tabu search method to the strip nesting problem is based on the idea of expressing the strip nesting problem as a TSP. Each city in the TSP is represented by a shape in the nesting problem, while each route from the starting city to the destination city (a sequence of cities) in the TSP is represented by a sequence of shapes on the material in the nesting problem. Using such a formulation, finding the best solution for the nesting problem involves finding an optimal sequence of shapes on the material.

Each shape is identified by a unique number, and each placement of shapes is represented by a sequence of numbers. The order of the numbers in the sequence corresponds to their position on the material in the following manner: the first number in the sequence means that the shape identified by this number is placed in the right bottom corner of the material, the shape corresponding to the following number in the sequence is placed to the left of the first shape, and so on. For each sequence of shapes the algorithm searches the permutation neighbourhood (of type swap), and when a minimal value is found (a local minimum of the height of the material used), the information about the move which resulted in this solution is placed in the taboo list. Moves already placed in the taboo list are not allowed, to avoid oscillations around a local optimum.

The Tabu search algorithm starts searching from a random sequence describing a placement of shapes. Subsequently, the algorithm checks all solutions in the neighbourhood of the current sequence. From this neighbourhood the algorithm chooses the best solution, i.e. the one corresponding to the minimal height of the material used. This selected solution becomes the base solution, so at the next step the neighbourhood of this solution is checked, and so on; one step of this search is sketched below.
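A sketch of one step of this neighbourhood search over shape sequences. The decode function, which places a sequence on the material and returns the height used, is assumed to exist elsewhere; all names are ours:

```csharp
using System;
using System.Collections.Generic;

static class NestingTabu
{
    // Explores the swap neighbourhood of `sequence`, skipping tabu moves, and
    // returns the best neighbour found. decode(seq) is assumed to place the
    // shapes in the given order and return the height of material used.
    public static int[] BestNeighbour(int[] sequence,
                                      HashSet<(int, int)> tabu,
                                      Func<int[], double> decode,
                                      out (int, int) bestMove)
    {
        double bestHeight = double.PositiveInfinity;
        int[] best = null;
        bestMove = (-1, -1);

        for (int a = 0; a < sequence.Length - 1; a++)
            for (int b = a + 1; b < sequence.Length; b++)
            {
                if (tabu.Contains((a, b))) continue;        // forbidden move
                var candidate = (int[])sequence.Clone();
                (candidate[a], candidate[b]) = (candidate[b], candidate[a]);
                double h = decode(candidate);
                if (h < bestHeight)
                {
                    bestHeight = h;
                    best = candidate;
                    bestMove = (a, b);
                }
            }
        return best;   // the caller pushes bestMove onto the taboo list
    }
}
```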

3.3 SHAKING ALGORITHM

The second algorithm developed and implemented by the authors is based on an analogy to our daily lives. Given a set of specific objects which we want to pack into a bag, we usually arrange them according to some scheme (algorithm), e.g. heavier objects on the bottom, lighter objects on top, with all objects packed tightly to maximize the utilization of space. We can also put everything in the bag randomly and, to make some free space, shake the bag energetically. The concept of shaking is applied to the nesting problem in the manner described below.

For each shape, physical properties like mass and momentum are considered. An area is defined where the shapes are allowed to move, and a simulated gravitation field is applied. The shapes can change their position and oscillate in relation to their initial location.


Each shape has an assigned speed vector, which changes due to the simulated gravitation field and due to interactions between the shapes. When a shape collides with another shape, the law of conservation of momentum is applied. The algorithm runs until the shapes are settled. This method can be used to improve the solution obtained from the Tabu search algorithm described in the previous section, or can be applied on its own, i.e. with a random initial location of shapes.
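One time step of such a simulation might look as follows; the explicit integration and the elastic-collision formula in the comment are our own minimal illustration of the idea, not the authors' implementation:

```csharp
// One integration step of the "shaking" simulation: gravity accelerates each
// shape downwards and positions are advanced by the current speed vector.
class Body
{
    public double X, Y;     // position
    public double Vx, Vy;   // speed vector
    public double Mass;
}

static class Shaking
{
    const double G = 9.81;  // simulated gravitation field

    public static void Step(Body[] bodies, double dt)
    {
        foreach (var b in bodies)
        {
            b.Vy -= G * dt;
            b.X += b.Vx * dt;
            b.Y += b.Vy * dt;
        }
        // On a collision between bodies 1 and 2, conservation of momentum
        // gives (1-D elastic sketch): v1' = ((m1 - m2) v1 + 2 m2 v2) / (m1 + m2),
        // and symmetrically for v2'. Settling is detected when all speeds are
        // close to zero.
    }
}
```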

4. EXPERIMENTATION SYSTEM AND EXAMPLES

An experimentation system to test the developed methods has been implemented in C# using the Microsoft® Visual Studio 2005 Professional Edition programming environment. The experimentation system allows the user to change the main parameters of both developed algorithms. The results of all simulations are presented as a graphical representation of the shape placement and can be saved to a text file. The application interface enables comparison of the results of the Tabu search and "shaking" algorithms. The main window of the application interface is shown in Figure 4.

Fig. 4. The main window of the application and an example comparison of results obtained for Tabu search and for Tabu search with "shaking"


Fig. 5. Example shapes to be placed on a material (left) and results of Tabu search algorithm (right)

5. CONCLUSIONS AND PERSPECTIVES

The nesting problem is a complex computational problem which occurs in a variety of industries, particularly in manufacturing processes where cutting and packing operations are involved. A solution of the nesting problem attempts to place a set of shapes on a material such that the amount of wasted material is minimal. In this paper two methods for solving the nesting problem were proposed, namely a Tabu search algorithm and the "shaking" algorithm. An experimentation system to assess the performance of both algorithms has been developed and tested. In this paper only example results were presented, and further work on improving the experimentation system is ongoing. It is planned to further develop the algorithms, e.g. by allowing rotation of shapes, and to consider different quality measures.

REFERENCES

[1] NIELSEN B.K., An efficient solution method for relaxed variants of the nesting problem, The Australian Theory Symposium 2007, Ballarat, Australia.

[2] NIELSEN B.K., ODGAARD A., Fast Neighbourhood Search for the Nesting Problem, Technical Report no. 03/02, DIKU, University of Copenhagen, DK-2100 Copenhagen, Denmark, February 14, 2003

[3] GENDREAU M. http://www.ifi.uio.no/infheur/Bakgrunn/Intro_to_TS_Gendreau.htm – An Introduction to Tabu Search, 15 January 2008.

[4] GLOVER F., Tabu Search — Part I, ORSA Journal on Computing, vol. 1, no. 3, 1989, pp. 190–206.

[5] GLOVER F., Tabu Search — Part II, ORSA Journal on Computing, vol. 2, no. 1, 1990, pp. 4–32.


Computer Systems Engineering 2008

Keywords: Computer simulation, algorithms, scheduling

Bartosz CZAJKA* Iwona POŹNIAK-KOSZAŁKA *

SCHEDULING IN MULTI-PROCESSOR COMPUTER SYSTEMS – VARIOUS SCENARIO SIMULATIONS

In this paper, we analyse multi-processor scheduling problems. Since such problems are NP-hard, we provide approximation algorithms based on well-known metaheuristics. Furthermore, we provide a dedicated experimentation system that allows different solution algorithms to be evaluated.

1. INTRODUCTION

Schedulers are used in almost every scientific discipline, such as mathematics, biology and economics, and especially in computer science. Recently, computer companies have proposed new hardware and software solutions for home and business use. Almost every new machine is based on more than one processor or core. It has become a standard that we use dual-core or quad-core computers; the question is how to operate on many CPUs or cores effectively. Multiprocessor scheduling [2] is a very complex problem [9]. Having many processors and many queues, we wish to answer the question of how to schedule tasks efficiently so as to make operating systems [5] more productive. Choosing the best algorithm for a given category of scheduling problem can be done on the basis of simulation.

The first version of the experimentation system considered in this paper was presented in [1]. In this paper, we focus on the developed version of that system, called MESMS2 (Multilevel Experimentation for Simulation of Multiprocessor Scheduling). It is based on the concept of virtual multiprocessor simulation. Using MESMS2, we can test algorithms on many virtual CPUs. Moreover, during simulation, the user can choose which queue is to be executed (the number of queues is also set by the user). Furthermore, this paper gives information about architectures used in multiprocessor systems and reveals some tests based on MESMS2.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


The paper gives some basic information about scheduling [3], [4]. Then, we present the MESMS2 system, in particular its capabilities and environment parameters. In the next part of the paper we show, step by step, how to construct an image of a real testing environment. The last part concerns investigations made using the MESMS2 system on different simulation scenarios, including a presentation of some results, discussion and conclusions.

2. PROBLEM DESCRIPTION

The general concept of scheduling is presented in Fig. 1, where each CPU (C) decides which process (P) is picked to be executed. The considered problem can be formulated as follows:

P = 1, 2, …, NPQ – processes in queue Q,
I = 1, 2, …, NIP – instructions in process P,
Q = 1, 2, …, NQ – queues,
A = 1, 2, …, NA – algorithms,
C = 1, 2, …, NCAQ – CPU C using algorithm A, picking process P from queue Q.

Fig. 1. The concept of scheduling

During simulation, the system determines simulation variables (some of them can be regarded as indices of performance) which need to be optimized.

First, we introduce the simulation variables:

TQ – the total simulation time for each queue Q [ms],
tP – the waiting time of process P in queue Q [ms],
tIP – the duration of instruction I in process P [ms],
tACn – the duration of the scheduling algorithm A for n iterations on CPU C [ms],
tAQ – the duration of the algorithm A for each queue Q [ms].

Next, the optimized objectives are defined. The main objective is to choose the inner parameters such that the following are minimized:


F1 = T1 + T2 + ⋯ + TNQ – the total simulation time,

F2 = Σ(i=1..NPQ) ti – the processes waiting time for one queue,

F3 = Σ(i=1..NQ) F2,i – the total processes waiting time,

F4 = Σ(i=1..n) tAC,i – the algorithm duration for one CPU,

F5 = Σ(i=1..NC) F4,i – the total algorithms duration.
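As an illustration of how these indices aggregate, the following C# fragment computes F1 and F3 from per-queue data collected during a simulation; the array layout is an assumption made for the example, not the MESMS2 data model.

using System.Linq;

static class Indices
{
    // F1: total simulation time as the sum of the per-queue times T_Q.
    public static double TotalSimulationTime(double[] tQ) => tQ.Sum();

    // F3: total processes waiting time, i.e. the per-queue sums F2
    // accumulated over all queues.
    public static double TotalWaitingTime(double[][] waitingTimesPerQueue) =>
        waitingTimesPerQueue.Sum(queue => queue.Sum());
}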

3. SIMULATION - SCENARIOS

Two typical computer architectures will be considered, each as an individual simulation scenario. SMP (Symmetric Multiprocessing) and MPP (Massively Parallel Processing) are the two most widely used architectures in computer systems; SMP is shown in Fig. 2.

Fig. 2. SMP – architecture

In this approach every CPU shares memory through a common bus, and only one operating system instance exists. This architecture also applies to multi-core processors, e.g. dual-core designs.


Fig. 3. MPP - architecture

The second scenario is less complicated and its concept is shown in Fig. 3. Each CPU has its own memory, which is not shared. Moreover, each operating system instance is dedicated to a single processor. Using this approach it is very important that the data to be processed is prepared for separable processors, e.g. in database searching each CPU searches a different letter.

4. ALGORITHMS

In this paper, we focus on four different groups of scheduling algorithms. The first is based on priority – the scheduler chooses a process by its priority (e.g. processing first a process that is more important for the system). The second idea depends on arrival time (e.g. First Come First Served). The third idea prefers processes with certain parameters or a certain character of the information they carry (e.g. SJF, where processes are handled according to their sizes). The fourth group contains algorithms which do not depend on process information and structure (e.g. Round Robin). A detailed description of these algorithms can be found in the literature, e.g. [1], [2] and [4].

The proposed system MESMS2 provides 10 scheduling algorithms, including Priority (P), First Come – First Served (FCFS), Shortest Job First (SJF) and Round-Robin (R-R). The other available algorithms are hybrid algorithms composed of elements belonging to those four groups. It is worth mentioning that, owing to the modularity of the program, it is very easy to add further algorithms to MESMS2 by implementing them appropriately [6].
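To make the four groups concrete, the sketch below expresses a pick-next-process rule for each of them in C# (the language MESMS2 is written in); the Process type and its fields are illustrative assumptions, not the actual MESMS2 interfaces.

using System;
using System.Collections.Generic;
using System.Linq;

class Process
{
    public int Priority;               // used by priority-based algorithms
    public DateTime Arrival;           // used by FCFS
    public int RemainingInstructions;  // used by SJF
}

static class Pickers
{
    // Group 1 (priority-based): pick the most important process first.
    public static Process ByPriority(List<Process> q) =>
        q.OrderByDescending(p => p.Priority).First();

    // Group 2 (arrival-time based, e.g. FCFS): earliest arrival first.
    public static Process Fcfs(List<Process> q) =>
        q.OrderBy(p => p.Arrival).First();

    // Group 3 (parameter-based, e.g. SJF): smallest remaining job first.
    public static Process Sjf(List<Process> q) =>
        q.OrderBy(p => p.RemainingInstructions).First();

    // Group 4 (independent of process information, e.g. Round-Robin):
    // rotate through the queue; 'next' is kept by the caller.
    public static Process RoundRobin(List<Process> q, ref int next) =>
        q[next++ % q.Count];
}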


5. EXPERIMENTATION ENVIRONMENT

5.1. APPLICATION

The MESMS2 system was designed and implemented in the C# language [7]. Visual Studio 2005 [8] was chosen as the implementation environment. The application can be run on any computer with a Windows family operating system, although Windows 2003, XP Professional or Windows Vista is recommended. We also considered hardware configurations and concluded that multi-core processors would increase simulation efficiency and give more trustworthy results.

MESMS2 is organized in six main modules, including the experiment design module, CPU generator, queue generator, synchronization module, results collector and presentation module. At the beginning of the simulation a user sets the inner parameters (every created queue or CPU can have its own specified parameters). CPU speed is given on a percentage scale (0%–200%). Moreover, one needs to define the queue which is the initial working object for a processor. The position "Algorithm" makes it possible to decide the way a process is picked from a queue. Considering queues, we are able to control their capacity by determining the number of processes and their parameters, which are used for generating processes in the Queue Generator Module. The obtained results are converted and plots are generated to present the behaviour of the processes.

5.2. EXPERIMENTS

We compare the two presented scenarios (SMP, MPP) by creating virtual experiments in MESMS2. The environments were created by a special application module. Experiment 1 is based on the SMP solution, experiment 2 on MPP; 135 processes were created and their parameters [1] are randomly generated. The simulation inner parameters are as follows:

• number of processes: 120,
• priority range: 1–10,
• instructions range: 2–5,
• instruction duration range: 1 s – 2 s,
• algorithms: SJF, Round-Robin, Priority, FCFS,
• homogeneous processors: 4.

Fig. 4 presents the inner parameters; the x-axis shows each process and the y-axis its size divided into instructions.


Fig. 4. Processes size - Inner Parameters

It can be seen that the processes are of regular size, around 4 s, and that there is only one queue with 130 tasks. Fig. 5 presents the priority composition for each process.

Fig. 5. Processes priorities - Inner Parameters

Next, we present a comparison of the two scenarios: in the first (SMP), only one queue exists, with 120 processes; in the second (MPP), those processes are split into 4 different queues. Each queue had the same number of processes. The simulation time graph shown in Fig. 6 presents the simulation events step by step, and Fig. 7 shows the compared scenarios.

Fig.6. SMP - Simulation time


Fig.7. MPP - Simulation time graph

Comparison of the simulation time plots gives a view of the processes during the experiments. In MPP mode the course is more regular than in SMP mode; furthermore, the plots are scalable and it is possible to enlarge selected areas. The next parameter compared is the total waiting time, presented in Fig. 8 and Fig. 9. It can be seen that in MPP mode the sequence of processes has less influence on the results than in SMP mode.

Fig.8. SMP - Total waiting time

Fig.9. MPP - Total waiting time


5.3. TASK SCHEDULE – CPU EXPLOIT

There is one more important parameter, CPU exploitation, which depends on the processor speed and on the algorithm. When homogeneous processors are considered, the individual exploitation values should be comparable. Fig. 10 presents results for the SMP scenario: CPU 1 performed 25.11% of all the tasks, and the remaining CPUs are very similar, with all values around 25%.

Fig.10. SMP - Tasks schedule

The second scenario, MPP, reveals that the total execution time is longer and equals 157546 ms, while the exploitation is about 25%. Those values tell us about system stability and functionality – when one processor finishes all the tasks from its dedicated queue, it changes the context and picks another task from a different queue.

Fig. 11. MPP - Tasks schedule


6. CONCLUSIONS AND FUTURE WORK

We implemented an application to test the properties of algorithms for a class of scheduling problems. The proposed system MESMS2 lets us build a customized environment in which many queues and CPUs can be created and all parameters can be fixed. In our future work, we will add new functions to the system and develop the generation module of MESMS2.

REFERENCES

[1] CZAJKA B., POŹNIAK-KOSZAŁKA I., Comparison of scheduling algorithms in multiprocessor systems using multilevel experimentation system, Proceedings of the 16th International Conference on System Science, Vol. II, Wroclaw 2007.

[2] KANN V., Multiprocessor Scheduling, http://www.ensta.fr/~diam/ro/online/viggo.html, 2008.

[3] AAS J., Understanding the Linux 2.6.8.1 CPU Scheduler, Silicon Graphics Inc. (SGI), (e-book), 2005, pp. 10-13.

[4] SILBERSCHATZ A., GALVIN P.B., GAGNE G., Foundation of Operating Systems, WNT, Warsaw 2006 (in Polish).

[5] TANENBAUM A.S., WOODHULL A.S., Operating Systems Design and Implementation, Third Ed., Prentice Hall, 2006.

[6] HEJLSBERG A., WILTAMUTH S., GOLDE P., The C# Programming Language, Addison Wesley Professional, 2004.

[7] FERGUSON J., PATTERSON B., BOUTQUIN P., C# Bible (Paperback), John Wiley & Sons, 2002.

[8] Visual Studio 2005 SDK, Development Environment Model, Microsoft. Retrieved on 01.05.2008.

[9] BRUCKER P., KNUST S., Complexity results for scheduling problems, http://www.mathematik.uni-osnabrueck.de/research/OR/class/, 2008.


Computer Systems Engineering 2008

Keywords: control, dehumidification, HVAC, modelling, optimisation, simulation

Thomas DANNE†

Jens G. LINDEN†

Dean HILL‡

Keith J. BURNHAM†

MODELLING APPROACHES FOR A HEATING, VENTILATION AND AIR CONDITIONING SYSTEM

In this paper an approach for modelling a heating, ventilation and air conditioning (HVAC) plant is developed. The model is the first step towards model based control optimisation. As the plant consists of different parts the model reflects this: each component of the plant is modelled separately within a black, grey, or white box approach. The components modelled are: mixing section (non-dynamic, white box), dehumidification unit consisting of a silica-gel desiccant wheel (discrete, black box), cooling coil unit (continuous, grey box) and the dynamics of a clean room which is supplied by the HVAC plant (continuous, grey box). In this paper, the models for the mixing section, dehumidification unit and the clean room are derived.

1. INTRODUCTION

Heating, Ventilation and Air Conditioning (HVAC) plants are widely used to condition the air in clean air production areas. Although these systems are commonly used, most of the HVAC plant controllers operate in a detuned manner, which often leads to inefficient operation. The consequences of poor HVAC control are rarely catastrophic [7], which can give an impression of satisfactory plant operation even though unnecessarily large actuator excursions lead to excessive wear and tear and increased energy consumption.

Abbott Diabetes Care (ADC), an industrial collaborator of the Control Theory and Applications Centre (CTAC) at Coventry University, uses over 70 separate HVAC plants to condition the air in clean production areas. The specifications for the room

† Control Theory and Applications Centre, Coventry University, Coventry, UK
‡ Abbott Diabetes Care, Witney, Oxfordshire, UK


temperature and humidity of the air are tight and must not be violated, as the quality of the product produced in the areas is highly dependent on these specifications. Consequently considerable effort has been made to install reliable HVAC systems. The HVAC is an important part of the operation of the overall process and the annual energy costs for running this part of the plant alone currently amount to some £2 m. However, rather surprisingly, even though the clean rooms have different capacities, a common control strategy is used across all plants (namely PID-control with similar gains), which indicates potential for reduction of energy costs.

In recognition of this, and based on the assumption that there could be potential for improvement in performance, there is an attendant need to analyse the system from a control point of view to search for an optimised setup. For this purpose, one such plant is chosen and modelled in order to check for optimisation potential. After describing the details of the plant, the different components are itemised, modelled and validated. The various sections are briefly mentioned and more detailed descriptions are given in the following sections.

A static white box model is established for the mixing section on the basis of energy and mass conservation laws [2]. The most complex component of the plant is the dehumidification unit, consisting of a silica-gel desiccant wheel. Such thermodynamic devices can be modelled using a white box technique, as shown by e.g. [3] and [6]. Also, an artificial neural network could be used to replicate the unit's behaviour [3]. In this investigation, an alternative approach is chosen and the unit is modelled by a linear ARX (Auto-Regressive with eXogenous inputs) model for the humidity behaviour and a bilinear ARMAX (ARX with a Moving Average noise term) representation for modelling the temperature behaviour. The room is modelled within a grey box approach. First, white box models are derived from the conservation of energy and mass laws for temperature and humidity. The model parameters are then estimated by minimising a specified cost function and introducing a bilinear term into the temperature model. Details on the bilinear approach may be found in [5].

2. PLANT DETAILS

The HVAC plant considered supplies a room of volumetric dimension 53 m² × 2.75 m with an air flow rate of 1.389 m³/s. A schematic of the plant is given in Figure 1. With reference to Figure 1, the return air is extracted from the room (a) and mixed with fresh air (f). The fresh air is pre-treated by a central fresh air plant. The amount of fresh air added to the return air is regulated by a damper at a fixed level of 12%. The mixed air (b) enters the pre-cooling coil, which is deactivated during normal operation of the plant. It


Fig. 1. Schematic of HVAC plant. Key: (a) return air, (b) mixed air, (c) pre-dehumidification air, (d) post-dehumidification air, (e) conditioned air, (g) outside air.

is therefore not a subject of this investigation. To remove moisture, the air is progressed through the dehumidification unit. The air (post-dehumidification air) leaving the dehumidification unit (d) is dry and warm. This air is required to be cooled by the cooling coil unit (CCU), which is integrated within the air handling unit (AHU). The AHU also contains a heating coil unit, which is only used for cold start-ups, as well as a driving fan; both are not considered in the modelling exercise here. The conditioned air (e) is then progressed to the room with a portion being drawn off to an air lock. The air supplying the air lock is considered as the exhaust from the plant. The plant is adjusted such that the amount of air progressed to the air lock matches the amount of air delivered by the fresh air plant.

The actuators that affect the control of the plant comprise the gas valve at the dehumidification unit and flow valves for both the cool and hot water for the cooling and heating coil units, respectively. The three valves are controlled by three separate PID-controllers. The feedback signals for temperature and humidity are measured in the return air (a) duct. The temperature controller includes an interlocking structure such that only one of the two coils is active (either heating or cooling).

Measuring the current temperature and humidity from the return air is necessary so that disturbances in the room (heat and humidity loads) are taken into account. This also leads to an unintentional but unavoidable time delay in the closed loop control system.

For a consistent nomenclature, the labels used in Figure 1 are also used as subscripts for properties, such as mass flow rate m, volumetric flow rate V, temperature ϑ, specific humidity ω, etc. For instance, the quantity ϑa denotes the temperature at the point (a).


2.1. ENERGY CONSUMPTION

Since the purpose of the model is primarily for controller tuning optimisation, it is necessary to consider how the energy is dissipated by the plant. In the case of the dehumidification unit, this is rather straightforward as the amount of gas combusted is assumed to be linearly related to the gas valve position.

Since more cross-coupling is involved in the process of cooling the air, it is considerably more complicated to estimate the energy consumption at the CCU. In fact, the cooling coil is a multiple-input-multiple-output (MIMO) system. The three inputs are the temperature of the incoming air, the temperature of the incoming cool water and its mass flow rate. Both the output water and output air temperatures are dependent on these inputs. The energy consumption can be estimated by taking into account the incoming and outgoing water temperatures and the mass flow rate.

2.2. TYPICAL OPERATING RANGE

The set points of the controllers are set to a dry bulb temperature* of ϑa = 21.5 °C and a dew point temperature of ϑDP,a = −14 °C, which corresponds to a specific humidity of ωa = 1.1 × 10⁻³ kg/kg (dry air)†. The return air (a) temperature and humidity will typically be in this region. The fresh air (f) is either pre-cooled or heated to 10 °C by an external fresh air plant. Assuming that the outside air is warmer than 10 °C, the cooling process might also dehumidify the air. However, the humidity of the air could, in general, be at any possible value. The fresh air is now required to be mixed with the return air. The properties of the mixed air are typically ϑb = 20 °C and ωb = 2 × 10⁻³ kg/kg (dry air). As the pre-cooling coil is not in use, the temperature and humidity values of (b) also apply for the pre-dehumidification air (c). The dehumidification unit dries the air to typically ωd = 0.5 × 10⁻³ kg/kg (dry air). At the same time, this leads to a temperature rise, denoted ΔT, of about ΔT = 10 K. To cool the air, the CCU is used. A typical temperature of the conditioned air is ϑe = 16 °C. The humidity is not affected by the cooling coil. The temperature of the cool water entering the cooling coil is approximately 6 °C, which is significantly higher than the dew point temperature of the air (ωd = 0.5 × 10⁻³ kg/kg (dry air) corresponds to ϑDP,d ≈ −22.5 °C). The heat exchange is hence sensible only, which means that no condensation occurs during the cooling process.

* The dry bulb temperature is the temperature measured with a thermometer whose bulb is dry. It is hence the ordinary air temperature.

† Although the unit 'kg/kg (dry air)' is mathematically 'odd', it is commonly used in thermodynamics to clarify that the specific humidity ω is related to the mass of the dry air rather than the mass of the wet air.



Fig. 2. Overall plant model structure.

3. MODELLING

The developed model of the HVAC system consists of four sub-models representing different components. Ideal sensors are assumed, i.e. infinite bandwidth, which implies that the dynamics of all sensors can be neglected. The sub-systems considered are: mixing section model, dehumidification unit model, cooling coil model and room model; each is modelled as a MIMO system. The combination of the four sub-models leads to the overall model, which is again a MIMO system. The overall structure of the model is shown in Figure 2, in which cc and cg are the signals controlling the cool water valve and gas valve respectively, Qc represents the cooling load at the CCU, whereas Qr and Ωr represent the heat and moisture load of the room.

The models are derived using appropriate tools, with different approaches and

Table 1. Details of all sub-models.

Component             | Approach  | Model Type | Parameter Estimation Method
Mixing section        | White box | Static     | From physical properties
Dehumidification unit | Black box | Discrete   | Recursive Least Squares + Extended Recursive Least Squares
Cooling coil unit     | Grey box  | Continuous | From physical properties + cost function minimisation
Room                  | Grey box  | Continuous | From physical properties + cost function minimisation



Fig. 3. Mixing Section.

different methods for the estimation of the model parameters. Table 1 provides an overview of the details of the four models.

3.1. MIXING AREA MODEL

A schematic of the mixing section is shown in Figure 3. In order to derive a white box model the following assumptions are made: ideal gas behaviour, perfect mixing of the air, constant barometric pressure process, negligible thermal and moisture storage by components, perfect insulation (hence an adiabatic process) and negligible infiltration and exfiltration effects. Furthermore, for modelling purposes, friction and transient behaviour of the mixing section are neglected, i.e. assuming a constant cross sectional speed of the air through the duct. The mixed air (b) flow speed is about 5.7 m/s. The mixing section has a volume of approximately 0.125 m³, hence it takes 0.5 m × 0.5 m / 5.7 (m/s) ≈ 44 ms for the air to progress through this section. Therefore, it is assumed that the flow dynamics due to the volume of the section are negligible.

Equations derived from mass and energy conservation laws for adiabatic mixing of air streams are given e.g. by [2]. Applying these to the given case leads to:

ωb = [ωa Va ρ(ϑa, ωa) + ωf Vf ρ(ϑf, ωf)] / [Va ρ(ϑa, ωa) + Vf ρ(ϑf, ωf)] + c1   (1)

hb = [ha(ϑa, ωa) Va ρ(ϑa, ωa) + hf(ϑf, ωf) Vf ρ(ϑf, ωf)] / [Va ρ(ϑa, ωa) + Vf ρ(ϑf, ωf)]   (2)

where, noting the subscripts a, b and f, the dry-bulb temperatures and specific humidities are denoted by ϑ and ω, respectively. The quantity V represents the different volumetric flow rates (which are fixed). The compensating offset term c1 has been added after testing the model. The specific gravity, denoted ρ, is dependent on temperature and the specific humidity. Using standard thermodynamic equations [2] an expression for ρ can be obtained as:



Fig. 4. Mixing section model: simulation error histogram for temperature and humidity.

ρ = [p − pω/(0.622 + ω)] / [Ra (ϑ + 273.15)]   (3)

where Ra denotes the gas constant of air and p the barometric pressure. Note that (3) is only applicable in SI units, so that ϑ is required to be expressed in °C.

The enthalpy of wet air is dependent on both the temperature and the humidity of the air. Using standard thermodynamic equations [2], one can derive an expression for the enthalpy, denoted h(ϑ, ω):

h(ϑ, ω) = ϑ Cp,a + ω (hv(0 °C) + ϑ Cp,v)   (4)

which may be rearranged to give the temperature for a given enthalpy and humidity, denoted ϑ(h, ω):

ϑ(h, ω) = [h − ω hv(0 °C)] / [Cp,a + ω Cp,v]   (5)

where Cp,a and Cp,v are the specific heat capacities of air and water vapour, respectively, and hv(0 °C) is the enthalpy of water vapour at 0 °C. Equations (1)-(5) define the non-dynamic model for the mixing section.

The above static equations have been implemented in the SIMULINK environment. In order to validate the model it is tested with data obtained from one full day of operation of the HVAC plant. The fixed parameters are given by:



Fig. 5. Schematic of dehumidification unit.

Ra = 287 J/(kg K),  p = 101.325 × 10³ Pa,  Va = 1.216 m³/s,  Vf = 0.173 m³/s,
Cp,a = 1005 J/(kg K),  Cp,v = 1830 J/(kg K),  hv(0 °C) = 2501.3 × 10³ J/kg

The model is simulated with a sampling interval of 5 s. The histogram of the simulation error for both models is shown in Figure 4. These results show that there are only minor simulation errors. The simulation mean squared error (MSE) is 5.79 × 10⁻³ K² for the temperature and 10.8 × 10⁻⁹ (kg/kg (dry air))² for the humidity.
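Since (1)-(5) are purely algebraic, the mixing model translates directly into code. The following C# sketch evaluates the mixed air state from the return (a) and fresh (f) air conditions using the fixed parameters listed above; it is a minimal illustration, and the default offset c1 = 0 is an assumption (the identified value is not quoted in the text).

using System;

static class MixingModel
{
    const double Ra  = 287.0;       // gas constant of air [J/(kg K)]
    const double p   = 101.325e3;   // barometric pressure [Pa]
    const double Cpa = 1005.0;      // specific heat capacity of air [J/(kg K)]
    const double Cpv = 1830.0;      // specific heat capacity of vapour [J/(kg K)]
    const double Hv0 = 2501.3e3;    // enthalpy of water vapour at 0 deg C [J/kg]

    // Eq. (3): specific gravity of moist air.
    static double Rho(double theta, double w) =>
        (p - p * w / (0.622 + w)) / (Ra * (theta + 273.15));

    // Eq. (4): enthalpy of wet air.
    static double H(double theta, double w) =>
        theta * Cpa + w * (Hv0 + theta * Cpv);

    // Eq. (5): temperature for a given enthalpy and humidity.
    static double Theta(double h, double w) => (h - w * Hv0) / (Cpa + w * Cpv);

    // Eqs. (1)-(2): adiabatic mixing of the two air streams.
    public static (double wb, double thetab) Mix(
        double thetaA, double wA, double Va,
        double thetaF, double wF, double Vf, double c1 = 0.0)
    {
        double ma = Va * Rho(thetaA, wA);  // mass flow rate of return air
        double mf = Vf * Rho(thetaF, wF);  // mass flow rate of fresh air
        double wb = (wA * ma + wF * mf) / (ma + mf) + c1;                   // (1)
        double hb = (H(thetaA, wA) * ma + H(thetaF, wF) * mf) / (ma + mf);  // (2)
        return (wb, Theta(hb, wb));
    }
}

With the typical operating values of Section 2.2, the sketch returns a mixed air temperature close to the quoted ϑb ≈ 20 °C.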

3.2. DEHUMIDIFICATION UNIT MODEL

The main component of the dehumidification unit is an adsorbent carrying wheel. It exhibits an aluminium honeycomb structure. The large aluminium surface is coated with silica-gel. The silica-gel itself has a large internal surface and is, therefore, able to adsorb large amounts of water. As the removal of moisture is an exothermal process, adsorption heat occurs [1]. Consequently, to remove the water from the silica-gel, heat is required to be applied to reverse the process. To realise this repeated process, a design as shown in Figure 5 is used. The wheel rotates at a constant angular velocity of about 0.2 rpm. The process air is blown through the main part (about 3/4) of the wheel, during which time the air is dried. Furthermore, the temperature of the air rises due to adsorption heat as well as radiated heat from the gas burner. The latter is mounted in the same housing and is not perfectly insulated. As the wheel rotates, the wet silica-gel enters the reactivation area, which is about 1/4 of the wheel. To dry the silica-gel, outside air is heated with a gas burner and passed through the wheel. The air used for drying the silica-gel is then discharged as exhaust. The valve which regulates the gas flow to the burner is the actuator of this sub-system.

The process is continually repeated until the silica-gel loses its humidity


absorbing properties. This typically takes of the order of six years of constant operation, and is noticeable by a significant reduction in efficiency.

An attempt has been made to model this complex process using the laws of physics and with a neural network [3, 6]. However, a more satisfying result has been obtained in this investigation by making use of a multiple-input-single-output (MISO) linear ARX model for the dried air humidity and a MISO bilinear ARMAX model for the dried air temperature behaviour.

In order to find suitable structures for these models, different (bi)linear ARX/ARMAX models have been investigated. Hence, the order of the system, the number of bilinear terms and the number of noise terms are systematically varied and the model performance assessed using the simulation MSE criterion. The two models presented in this paper give a sufficient performance, whilst remaining parsimonious, i.e. having limited complexity. The models are implemented and simulated in MATLAB, with a sampling interval of 5 s.

The model parameters are estimated using the Recursive Least Squares (RLS) algorithm for the ARX system, and the Extended Recursive Least Squares (ERLS) algorithm for the ARMAX system. The RLS algorithm [4] is given as:

θk = θk−1 + Lk (yk − φk^T θk−1)   (6)

Lk = Pk−1 φk / (1 + φk^T Pk−1 φk)   (7)

Pk = Pk−1 − (Pk−1 φk φk^T Pk−1) / (1 + φk^T Pk−1 φk)   (8)

where θk denotes the estimated parameter vector, yk is the measured output and φk is the observation vector at the sampling instance k.
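For readers who prefer code, a direct C# transcription of the update (6)-(8) is sketched below; the hand-written vector operations keep the example self-contained, and details such as the covariance initialisation are assumptions of the sketch rather than a description of the estimator actually used.

using System;

class Rls
{
    readonly int n;
    public double[] Theta;   // estimated parameter vector, theta_k
    double[,] P;             // covariance matrix, P_k

    public Rls(int order, double p0 = 1e6)
    {
        n = order;
        Theta = new double[n];
        P = new double[n, n];
        for (int i = 0; i < n; i++) P[i, i] = p0;  // large initial covariance
    }

    // One recursion with regressor phi_k and measured output y_k.
    public void Update(double[] phi, double y)
    {
        var Pphi = new double[n];                   // P_{k-1} phi_k
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) Pphi[i] += P[i, j] * phi[j];

        double denom = 1.0;                         // 1 + phi^T P_{k-1} phi
        for (int i = 0; i < n; i++) denom += phi[i] * Pphi[i];

        double err = y;                             // y_k - phi^T theta_{k-1}
        for (int i = 0; i < n; i++) err -= phi[i] * Theta[i];

        for (int i = 0; i < n; i++)
            Theta[i] += (Pphi[i] / denom) * err;    // eqs. (6)-(7)

        for (int i = 0; i < n; i++)                 // eq. (8)
            for (int j = 0; j < n; j++)
                P[i, j] -= Pphi[i] * Pphi[j] / denom;
    }
}

The ERLS variant used for the ARMAX temperature model applies the same recursion, with the regressor extended by the latest residual estimate, as described below.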

The inputs for both systems can be combined into a single input vector, denoted uk and given by:

uk = [cg,k  ϑg,k  ωg,k  ϑb,k  ωb,k]   (9)

For modelling the humidity behaviour, a second order, 5 input, 1 output ARX model



Fig. 6. Simulation of dehumidification unit.

structure is given by:

θω = [a1 a2 b1 b2 b3 b4 b5]^T   (10)

φω,k = [−ωd,k−1  −ωd,k−2  uk]^T   (11)

ωd,k = φω,k^T θω + ek   (12)

which is found to have a satisfactory simulation performance in terms of MSE. The parameters are estimated with the standard RLS algorithm (6)-(8). The simulation MSE for the data set used for parameter estimation is 1.93 × 10⁻⁸ (kg/kg)². For an unseen test dataset, the performance index is 3.59 × 10⁻⁸ (kg/kg)². The measured and simulated output of the test data set is shown in the lower plot of Figure 6. The estimated parameters are given in Table 2.

To model the temperature behaviour of the dehumidification unit, a more complex structure is required. A second order, 5 input ARMAX model with 1 bilinear term and a noise model order of 1 is used. The bilinear term is used to accommodate the product

Table 2. Estimated parameters of humidity model.

a1 = −0.439,  a2 = −0.412,  b1 = −1.53 × 10⁻⁴,  b2 = −6.66 × 10⁻⁶,
b3 = 7.60 × 10⁻³,  b4 = −4.04 × 10⁻⁷,  b5 = 0.133


Table 3. Estimated parameters of temperature model.

a1 = −1.96,  a2 = 0.963,  b1 = 9.66 × 10⁻⁴,  b2 = −3.26 × 10⁻⁴,  b3 = 1.07,
b4 = 1.76 × 10⁻³,  b5 = −2.38,  n1 = −5.11 × 10⁻⁵,  c1 = 0.883

of the gas valve position cg, normalised to a scale from 0 to 1, and the output temperature ϑd. Hence, the structure is given by:

θϑ = [a1 a2 b1 b2 b3 b4 b5 n1 c1]^T   (13)

φϑ,k = [−ϑd,k−1  −ϑd,k−2  uk−1  (ϑd,k−1 × cg,k−1)  ek−1]^T   (14)

ϑd,k = φϑ,k^T θϑ + ek   (15)

Due to the estimated prediction error ek−1 in the observation vector φϑ,k, the observation vector is dependent on θ. The prediction error ek is required to be estimated at every time step using the latest estimated parameter vector θk−1 to obtain unbiased results. This is termed ERLS, see [4]. The estimated values of the parameters obtained are given in Table 3. The simulation MSE obtained with this method is 0.464 K² for the estimation dataset and 0.288 K² for an unseen data set. It is surprising that the performance with an unseen data set in this particular instance is even better than for the data set used for estimation, but this is a single observation and further work would need to be carried out in order to make any conclusive statement on this. The simulated output is compared with the measured output in the upper plot of Figure 6.
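The construction of the bilinear regressor (14) can be made concrete with a few lines of C#; the method name and argument layout are illustrative assumptions.

using System;

static class TemperatureRegressor
{
    // Builds phi of eq. (14): two lagged outputs, the five lagged inputs
    // u_{k-1}, the bilinear product term and the estimated residual.
    public static double[] Build(double thetaD1, double thetaD2,
                                 double[] uPrev, double cgPrev, double ePrev)
    {
        var phi = new double[4 + uPrev.Length];     // 9 entries for 9 parameters
        phi[0] = -thetaD1;                          // -theta_{d,k-1}
        phi[1] = -thetaD2;                          // -theta_{d,k-2}
        Array.Copy(uPrev, 0, phi, 2, uPrev.Length); // u_{k-1}
        phi[2 + uPrev.Length] = thetaD1 * cgPrev;   // bilinear term
        phi[3 + uPrev.Length] = ePrev;              // e_{k-1}
        return phi;
    }
}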


Fig. 7. Schematic of room model.


3.3. ROOM MODEL

The room is also modelled using two separate yet coupled models for return temperature and humidity. Here, both temperature and humidity are modelled using a grey box approach. The bases for both models are white box models which make use of the mass and energy conservation laws. The assumptions and conditions for both models are: ideal gas behaviour, perfect mixing, constant barometric pressure process, negligible infiltration and exfiltration effects. Furthermore, the influence of humidity on the enthalpy of the air is neglected. The moisture flow rate is denoted Ω (kg/s) and Q denotes enthalpy flow (J/s, i.e. W).

Considering the schematic of the room (Figure 7), one can derive the following equation to model the humidity of the room by applying the conservation of mass law for the water vapour inside the room. The humidity in the room, denoted ωr, is assumed to be equal to the return air humidity, denoted ωa, i.e.

dωa/dt = ch,1 (ωe − ωa) + ch,1 ch,2 Ωl   (16)

with ch,1 = ṁ/mr and ch,2 = 1/ṁ. In order to optimise the model with respect to simulation performance, the value of ch,1 is judiciously varied. Furthermore, a time delay of duration Td is assumed to be present in the model output. The optimised values of ch,1 and Td are searched for by use of a cost function optimisation method. The MSE between simulation and measured data is defined as the cost. The minimum of this is searched for by utilising the inbuilt MATLAB function fminsearch. The system has been simulated with a sampling interval of 5 s within the SIMULINK environment. The estimated parameters for the humidity model are ch,1 = 8.61 × 10⁻³ (1/s) (white model value: 8.41 × 10⁻³ (1/s)), ch,2 = 6.92 × 10⁻¹ (s/kg) and Td = 60 s.
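A discrete-time simulation of (16) is straightforward. The C# sketch below integrates the humidity model with a forward Euler step at the 5 s sampling interval, using the optimised parameter values quoted above; realising the delay Td as a 12-sample shift and treating the supply humidity and moisture load as given input sequences are assumptions of the sketch.

using System;
using System.Collections.Generic;

static class RoomHumidityModel
{
    const double Ch1 = 8.61e-3;    // optimised c_h,1 [1/s]
    const double Ch2 = 6.92e-1;    // optimised c_h,2 [s/kg]
    const double Dt  = 5.0;        // sampling interval [s]
    const int DelaySamples = 12;   // T_d = 60 s at 5 s sampling

    // Simulates eq. (16); we[k] is the supply air humidity, omegaL[k] the
    // moisture load, wa0 the initial return air humidity.
    public static double[] Simulate(double[] we, double[] omegaL, double wa0)
    {
        var wa = new double[we.Length];
        var delay = new Queue<double>();
        double state = wa0;
        for (int k = 0; k < we.Length; k++)
        {
            delay.Enqueue(state);
            // The model output is the room state delayed by T_d.
            wa[k] = delay.Count > DelaySamples ? delay.Dequeue() : wa0;
            double dwa = Ch1 * (we[k] - state) + Ch1 * Ch2 * omegaL[k];
            state += Dt * dwa;   // forward Euler integration step
        }
        return wa;
    }
}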

The room temperature model is derived in a similar way. Again, the room air temperature ϑr is assumed to be equal to the return air temperature ϑa:

dϑa/dt = (ṁ/mr)(ϑe − ϑa) + Ql/(mr Cp,a)   (17)

This model is derived from the conservation of energy law. An optimal parametrisation for this model is found by introducing a bilinear term and, in a similar manner to the humidity model, the approach is applied, yielding:

dϑa/dt = ct,1 (ϑe − ϑa) + ct,2 Ql + ct,3 (ϑe × ϑa)   (18)



Fig. 8. Room model: Performance with test data.

The final model structure (18) has been implemented in SIMULINK. The optimised parameters are ct,1 = 6.97 × 10⁻⁴ (1/s), ct,2 = 6.18 × 10⁻⁷ (°C/J) and ct,3 = −4.5 × 10⁻⁵ (1/(°C s)).

The performance of both models (upper plot return temperature, lower plot return humidity) with unseen test data is displayed in Figure 8. The MSE between simulated and measured data is 9.78 × 10⁻⁹ (kg/kg (dry air))² and 0.0519 K² for the humidity and temperature models, respectively.

4. CONCLUSIONS AND FURTHER WORK

Models for the mixing section, the dehumidification unit and the clean room have been derived and validated for a heating, ventilation and air conditioning control system. After deriving a model for the cooling coil unit, the models are required to be combined to assess the overall simulation performance. On the basis of this model, different control strategies can be tested. As well as achieving an improved PID-controller tuning, which is straightforward to implement, a four-term bilinear PID-controller [5], which could deal with the discovered non-linearities, is to be tested. Finally, the new control strategies are to be applied to the plant to evaluate the performance on the actual system.


REFERENCES

[1] BRUNDERETT G.W., Handbook of dehumidification technology, Butterworths, London, 1987.

[2] CENGEL Y.A., BOLES M.A., Thermodynamics: An engineering approach, McGraw-Hill, London, 1994.

[3] CENJUDO J.M., MORENO R., CARRILLO A., Physical and neural network models of a silica-gel desiccant wheel, Energy and Buildings, vol. 34, 2002, pp. 837-844.

[4] LJUNG L., System identification: Theory for the user, Prentice Hall PTR, Upper Saddle River NJ, 1999.

[5] MARTINEAU S., BURNHAM K.J., HAAS O.C.L., ANDREWS G., HEELEY A., Four-term bilinear PID controller applied to an industrial furnace, Control Engineering Practice, vol. 12, 2004, pp. 457-464.

[6] NIA F.E., VAN PAASSEN D., SAIDI M.H., Modeling and simulation of a desiccant wheel for air conditioning, Energy and Buildings, vol. 38, 2006, pp. 1230-1239.

[7] UNDERWOOD C.P., HVAC control systems: Modelling, analysis and design, E. & F.N. Spon, London, 1999.


Computer Systems Engineering 2008

Keywords: continuous, estimation, unknown input observer, fault-detection, recursive least squares, suspension

Vincent ERSANILLI*

Keith BURNHAM*

A CONTINUOUS-TIME MODEL-BASED TYRE FAULT DETECTION ALGORITHM UTILISING AN UNKNOWN INPUT OBSERVER

This paper investigates a continuous-time model-based approach to fault diagnosis in vehicle tyres. An unknown input observer is used to overcome the problem of the unknown input to the system, namely the road disturbance. A suspension model of a quarter car is constructed from first principles and state space and transfer function models are obtained. The coefficients of the transfer function are estimated in continuous-time using a standard recursive least squares scheme, which provides the basis of the fault detection mechanism.

1. INTRODUCTION

The motivation for the work in this paper arises from an investigation into fault detection schemes for vehicle suspension systems which avoid the direct measurement of tyre pressure. Measuring tyre pressure directly from a rotating wheel whilst the vehicle is in motion is problematic and necessitates the use of radio frequency transmitters and receivers and battery operated sensors [8]. The proposed system estimates tyre pressure based on chassis mounted acceleration sensor measurements. Fault detection in suspension systems via discrete-time (DT) methods has been investigated in [9]. It was reported that under certain conditions it was not always possible to isolate particular faults. In an attempt to increase the sensitivity whilst reducing the number of false alarms a combination of recursive least squares (RLS) and cautious least squares (CLS) was proposed [2]. Studies in [5] and [3] have shown

* Control Theory and Applications Centre, Coventry, UK


that it is theoretically possible to isolate faults using continuous-time (CT) model approaches with a state variable filter and RLS for parameter estimation.

A problem with the model-based parameter estimation approach is that the input to the system is unknown, i.e. the road surface is not known to the algorithm in advance. The solution to this problem within this work is the inclusion of an unknown input observer which estimates the road surface input from the chassis acceleration, based on knowledge of the suspension system. The design of the observer is based on the work in [1], where the idea was developed from a reduced order observer perspective.

This paper is organised as follows. Section 2 deals with the vehicle suspension model and issues surrounding the selection of sampling interval. Section 3 shows how the unknown input observer is designed. Section 4 details the CT model and the estimation scheme. Section 5 outlines the simulation method. Section 6 gives detailed results and an analysis of the simulation studies. The conclusions are presented in Section 7.

2. VEHICLE SUSPENSION MODEL

Fig. 1 represents the vehicle suspension model for this work, in which a quarter car, consisting of a quarter of the chassis (sprung mass m_s), wheel assembly (un-sprung mass m_u), suspension spring, suspension damper and tyre spring is considered. The input stimulus to the system is essentially a displacement, denoted x_r, from the road surface. Using Newton's law of motion the system may be expressed as

m_s ẍ_s + b_s (ẋ_s − ẋ_u) + k_s (x_s − x_u) = 0   (1)

m_u ẍ_u − b_s (ẋ_s − ẋ_u) − k_s (x_s − x_u) + k_t (x_u − x_r) = 0   (2)

where x_s and x_u denote the displacements of the sprung and un-sprung mass, respectively (ẋ and ẍ denote the corresponding velocity and acceleration in both cases).

A convenient state space representation is given by

ẋ = Ax + Bu and y = Cx   (3)

with state vector x = [x_s  ẋ_s  x_u  ẋ_u]^T, which leads to

ẋ₁ = x₂   (4a)

ẋ₂ = −(k_s/m_s)(x₁ − x₃) − (b_s/m_s)(x₂ − x₄)   (4b)

ẋ₃ = x₄   (4c)


" = ! − " + − # − # − (4d)

Fig. 1. Vehicle suspension schematic

Having defined the state vector, the representation takes the following state space vector-matrix form:

A = [      0            1            0            0
      −k_s/m_s     −b_s/m_s      k_s/m_s      b_s/m_s
           0            0            0            1
       k_s/m_u      b_s/m_u   −(k_s+k_t)/m_u  −b_s/m_u ],
B = [ 0  0  0  k_t/m_u ]^T   (5a)

C = [      1            0           −1            0
      −k_s/m_s     −b_s/m_s      k_s/m_s      b_s/m_s ]   (5b)

The output corresponding to the first row of C represents the suspension deflection, which is the relative displacement between x_s and x_u, and the output corresponding to the second row represents the chassis acceleration. Values of the vehicle suspension components are given in Table 1.


Table 1. Vehicle suspension component values

Parameter            | Symbol | Value
Sprung mass          | m_s    | 350 kg
Un-sprung mass       | m_u    | 45 kg
Suspension stiffness | k_s    | 15000 N/m
Tyre stiffness       | k_t    | 200000 N/m
Damper value         | b_s    | 1100 Ns/m

The vertical acceleration of the chassis is the main output of interest for this system. This is the variable measured on the vehicle. In terms of the model this quantity is given by (4b). The secondary output of interest, corresponding to the fast mode, is that of the un-sprung mass, comprising the wheel, tyre, brake and axle assembly, given by (4d). Other measured outputs will not be considered in this paper, with the exception of suspension deflection as this has an impact on the sampling frequency used in the model. The outputs of the system can be expressed in terms of their transfer functions by applying

G_i(s) = C_i (sI − A)⁻¹ B   (6)

where C_i is a particular row of the output matrix C.

For the un-sprung mass this leads to an acceleration transfer function given by

ẍ_u(s)/x_r(s) = k_t s² (m_s s² + b_s s + k_s) / Δ(s)   (7)

Similarly, for the chassis this leads to an acceleration transfer function given by

ẍ_s(s)/x_r(s) = k_t s² (b_s s + k_s) / Δ(s)   (8)

where Δ(s) = (m_s s² + b_s s + k_s)(m_u s² + b_s s + k_s + k_t) − (b_s s + k_s)².

The poles of these transfer functions are identical and are given by two pairs of complex poles, namely

p₁,₂ = −12.42 ± 67.51j,   p₃,₄ = −1.38 ± 6.21j


Taking the reciprocal of the real part indicates that the time constant of the fastest mode (associated with the wheel dynamics) is 80.5 ms and the slowest (associated with the chassis) is 0.725 s. These represent typical results for a vehicle suspension configuration, such as in Fig. 1. The ratio of the two dynamic modes is typically of the order 10:1 for the un-sprung and sprung mass respectively, see for example [6].

2.1 SAMPLING INTERVAL

Measurements of the chassis acceleration are sampled at an interval Δt. This interval must be selected carefully to capture the dynamics of the dominant modes in the system. Ideally the sampling interval should be one tenth of the time constant of the fastest mode of the system [7]. This leads to a sampling interval Δt of 8.05 ms and a theoretically ideal sampling frequency of 124 Hz.

3. UNKNOWN INPUT OBSERVER

This approach to observer design divides the state vector into two parts: one part not depending on the unknown input and a second part depending on the unknown input. The system (3) is equivalent to

ẋ = Ax + Bu + Dv   (9a)

y = Cx   (9b)

T = [N  D]   (10)

where T is a non-singular matrix and N ∈ ℜ^(n×(n−m)); x ∈ ℜ^n, u ∈ ℜ^r, v ∈ ℜ^m and y ∈ ℜ^p are the state, known input, unknown input and output vectors, respectively. Since p ≥ m, rank(D) = m, rank(C) = p and the pair (C, A) is observable, one can proceed.

Suppose

x = T w = T [w₁^T  w₂^T]^T   (11)

with w₁ ∈ ℜ^(n−m), w₂ ∈ ℜ^m, and

Ā = T⁻¹AT = [Ā₁₁  Ā₁₂; Ā₂₁  Ā₂₂]   (12)

B̄ = T⁻¹B = [B̄₁; B̄₂]   (13)

D̄ = T⁻¹D = [0; I_m]   (14)

C̄ = CT = [C_N  C_D]   (15)

the relation (9) can be written

ẇ₁ = Ā₁₁w₁ + Ā₁₂w₂ + B̄₁u

ẇ₂ = Ā₂₁w₁ + Ā₂₂w₂ + B̄₂u + I_m v   (16)

y = C_N w₁ + C_D w₂

The state w₂ is dependent on the unknown input v whereas w₁ is not, which makes w₁ a superior candidate for estimation [1]. The input-free system becomes

ẇ₁ = Ā₁₁w₁ + Ā₁₂w₂ + B̄₁u   (17a)

y = C_N w₁ + C_D w₂   (17b)

Suppose we create a non-singular matrix

E = [C_D  Σ]   (18)

with Σ ∈ ℜ^(p×(p−m)), and denoting

E⁻¹ = [E₁; E₂]   (19)

with E₁ ∈ ℜ^(m×p), E₂ ∈ ℜ^((p−m)×p), verifying

E⁻¹E = [E₁C_D  E₁Σ; E₂C_D  E₂Σ] = [I_m  0; 0  I_(p−m)]   (20)

pre-multiplying both sides of the measurement equation (17b) by E⁻¹ leads to

E₁y = E₁C_N w₁ + w₂,   E₂y = E₂C_N w₁   (21)

Combining (20) and (21) gives:

E₁y = E₁C_N w₁ + w₂   (22)

E₂y = E₂C_N w₁   (23)

The state w₂ is then deduced from (22) such that

w₂ = E₁y − E₁C_N w₁   (24)

hence substituting (24) into (17a) gives

ẇ₁ = A₁w₁ + B̄₁u + Gy

ȳ = C₁w₁   (25)

where A₁ = Ā₁₁ − Ā₁₂E₁C_N, G = Ā₁₂E₁, C₁ = E₂C_N and ȳ = E₂y.

If the pair (A₁, C₁) is observable or detectable, following the conventional Luenberger observer design procedure [7], it is possible to design a reduced order observer for the unknown-input-free system (25)

ż = (A₁ − KC₁)z + B̄₁u + K̄y   (26)

where K ∈ ℜ^((n−m)×(p−m)) and K̄ = KE₂ + G. Then

x̂ = T [z; E₁y − E₁C_N z]   (27)

and x̂ → x as t → ∞. Based on the reduced order observer described by (26) and (27), an estimation of the unknown input can be obtained

v̂ = E₁ẏ + Φ₁z + Φ₂y + Φ₃u   (28)

where

Φ₁ = E₁C_N KC₁ + E₁C_N Ā₁₂E₁C_N − E₁C_N Ā₁₁ − Ā₂₁ + Ā₂₂E₁C_N   (29)

Φ₂ = −E₁C_N KE₂ − E₁C_N Ā₁₂E₁ − Ā₂₂E₁   (30)

Φ₃ = −E₁C_N B̄₁ − B̄₂   (31)

4. PARAMETER ESTIMATION

RLS is the method used here to estimate the coefficients of the transfer functions of the suspension model. RLS is a straightforward online estimation algorithm, yet it is optimal in the mean square error (MSE) sense when the assumptions on linearity of the model and Gaussian properties of the measurement noise hold. Although auto regressive moving average (ARMA) additive noise has been adopted for the noise models, the estimator is found to perform adequately, as will be demonstrated in the results presented in Section 6.

4.1 THE CONTINUOUS-TIME SYSTEM MODEL

The RLS algorithm is used to estimate the coefficients of a CT differential equation model based on sampled data measurements of the input and output variables obtained in DT. Consider the linear differential equation representation

d^n x(t)/dt^n + a₁ d^(n−1) x(t)/dt^(n−1) + ⋯ + a_n x(t) = b₀ d^m u(t)/dt^m + ⋯ + b_m u(t)   (32)

Taking Laplace transforms, and assuming zero initial conditions, the transfer function corresponding to the above differential equation takes the form

X(s) = [B(s)/A(s)] U(s)   (32a)

where X(s) and U(s) denote the Laplace transforms of the noise free system output x(t) and the available noise free input u(t), respectively. The transfer function numerator and denominator polynomials are given by

B(s) = b₀s^m + b₁s^(m−1) + ⋯ + b_m   (33)


A(s) = s^n + a₁s^(n−1) + ⋯ + a_n   (34)

where s is the Laplace variable. The CT system model input and noise free output, u(t) and x(t), respectively, are sampled at discrete instants t₁, …, t_N. In the case of uniformly sampled data (as in the vehicle suspension simulation) at each sampling interval Δt, where t_k = kΔt, the measured output is assumed to be corrupted by an additive measurement noise ξ(t_k), i.e.

y(t_k) = x(t_k) + ξ(t_k)   (35)

where x(t_k) is the sampled CT deterministic, noise free output of the CT system and, as in the DT case, ξ(t_k) is modelled as a DT ARMA process

ξ(t_k) = [C(q⁻¹)/D(q⁻¹)] e(t_k),   e(t_k) = N(0, σ²)   (36)

The problem is to estimate the parameters of the CT differential equation (or transfer function) model (32) from N sampled data pairs comprising the available noise free input and noise corrupted output, denoted Z_N = {u(t_k); y(t_k)}, k = 1, …, N. The system estimation equation at the sampling instant t_k is expressed in the following pseudo linear regression form

y_f^(n)(t_k) = φ_f^T(t_k) θ + ξ_f(t_k)   (37)

φ_f(t_k) = [−y_f^(n−1)(t_k) … −y_f^(0)(t_k)   u_f^(m)(t_k) … u_f^(0)(t_k)]^T   (38)

θ = [a₁ … a_n   b₀ … b_m]^T   (39)

where the subscript f denotes hybrid filtering which involves a CT filter. First the pre-filtered derivatives, which are sampled at the instants kΔt, are obtained as the inputs to the integrators in the CT implementation of the state variable pre-filter 1/A(s), as shown in Fig. 2.


Fig. 2. State variable filter

Ideally the coefficients of the pre-filter match those of the unknown system [11]. In practice these would be initialised with approximate values and iteratively updated with the new estimates as they become available. In this work, however, rounded values close to those of the coefficients corresponding to the nominal CT suspension system are used. Further consideration would need to be given as to updating the coefficients in an application such as fault detection.

4.2 REPLICATING FAULTS IN THE SYSTEM MODEL

There are many ways in which a suspension may degrade, but only tyre faults are considered here. In particular, a slow deflation which results in a gradual reduction in tyre stiffness of some 50% is considered. This fault scenario is replicated by creating a matrix of theoretical values of the model parameters, starting with the nominal (no fault) parameter values. The parameters are changed linearly and incrementally from the sample at which the fault starts up to the sample at which the fault ends, where they reach the faulty parameter values. From that sample to the end of the simulation the values remain fixed at the faulty values.
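As an illustration of this replication scheme, the C# sketch below ramps the tyre stiffness linearly between a fault-start and a fault-end sample; the function name and the example arguments are assumptions, and only the 50% reduction figure comes from the text.

using System;

static class TyreFault
{
    // Linear ramp of the tyre stiffness from its nominal value down to
    // faultFactor * nominal between samples kStart and kEnd; constant
    // before the fault and after it has stabilised.
    public static double[] StiffnessProfile(int nSamples, double ktNominal,
                                            double faultFactor, int kStart, int kEnd)
    {
        var kt = new double[nSamples];
        double ktFaulty = faultFactor * ktNominal;   // e.g. 0.5 * 200000 N/m
        for (int k = 0; k < nSamples; k++)
        {
            if (k < kStart) kt[k] = ktNominal;                 // nominal values
            else if (k >= kEnd) kt[k] = ktFaulty;              // fault settled
            else kt[k] = ktNominal + (ktFaulty - ktNominal)    // linear change
                         * (k - kStart) / (double)(kEnd - kStart);
        }
        return kt;
    }
}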

5. SIMULATION STUDIES

For the purposes of fault detection a robust diagnosis with no false alarms is required. To achieve this goal a matrix of tests is implemented and a majority voting


system is proposed, similar to the type that is used in aircraft [10]. The fault decision algorithm is presented with the results of three tests and a majority verdict decides the diagnosis and hence alerts the driver of a problem with tyre pressure. The tests detect changes in the system in three distinct ways. The primary approach is parameter estimation and is carried forward from the work in [4]. This technique is augmented by analysis of the input estimate: its variance and its phase portrait.
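A minimal form of the proposed voting logic is sketched below in C#; the three boolean test outcomes are assumed to be produced by the diagnostic tests described above.

static class FaultDecision
{
    // Majority vote over the three diagnostic tests: a fault is declared
    // when at least two of the three tests flag a persistent change.
    public static bool DeclareFault(bool parameterDrift,
                                    bool inputVarianceShift,
                                    bool phasePortraitChange)
    {
        int votes = (parameterDrift ? 1 : 0)
                  + (inputVarianceShift ? 1 : 0)
                  + (phasePortraitChange ? 1 : 0);
        return votes >= 2;
    }
}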

The simulation studies show that the estimated parameters no longer converge to the true model parameters. The cause of this is most likely the estimation of the input, which is only an approximation of the road surface. This behaviour is not particularly problematic, as the estimates tend to settle to a steady value when the system is in steady state and, during a fault, change in sympathy with the actual model parameters. A persistent change in the parameters can then be deemed to be a fault. For practical applications, bounds and conditions should be placed on the variation of the estimated parameters for a diagnosis to take place. With further testing work the value of the estimated parameters could be linked directly to the pressure in the tyre.

6. ESTIMATION RESULTS

Fig. 3 shows the estimation of a model parameter in a typical test run with no fault. Contrast this result with Fig. 4, which shows the fault occurring at 6 minutes and stabilising at 12 minutes. Fig. 5 shows the mean variance of the input estimate as it evolves over time, starting with the fault free condition, with the fault being introduced at 40% of the total test time and stabilising at around 70%. Fig. 6 compares the phase portrait of the input estimate before and after the fault has occurred and stabilised.

7. CONCLUSIONS

A quarter car suspension model and an unknown input observer were developed. The parameters of the transfer function model were estimated with no access to the real (road) input. Diagnostics were developed to identify changes in the system relating to tyre pressure decrease and a majority voting system was proposed.

The diagnostic tests show that it is possible to distinguish between the system in a nominal state and the faulty condition by the use of three different tests. During the course of the simulation studies it became clear that tuning the observer made a significant difference to the ability of the diagnostic algorithms to track changes in the system. The observer design is dependent on the system matrix A, and so computing the observer with modified values of the tyre spring stiffness k_t moved the poles of the observer.


Fig. 3. Parameter estimates in the fault free condition

Fig. 4. Parameter estimates as a fault occurs at 6 minutes and stabilises at 12 minutes


Fig. 5. Mean input estimation variance as a fault occurs at 40% and stabilises at 70%

Fig. 6. Phase portrait of the input estimation before and during a fault

This behaviour highlights a property of the fault detector: during a fault, the configuration of the observer is no longer theoretically optimum. The solution to this problem was to start with an observer that is optimally configured for the faulty condition, which happens to work adequately for the fault free condition and is an


improvement over the observer which is configured for the fault free case. Further work will include an investigation of the possibility of a multiple model approach with models for a variety of different system states.

With further testing work the value of the estimated parameters could be linked to the pressure in the tyre to give an estimate of the real pressure rather than merely indicating a change in the pressure.

Majority voting is the proposed method of defining a fault and this could be further developed into a pattern matching algorithm that matches test outcomes with vehicle states, i.e. changes in mass, road surface and vehicle speed, to improve the accuracy over a range of driving scenarios.

REFERENCES

[1] BOUBAKER O., Full order observer design for linear systems with unknown inputs, IEEE International Conference on Industrial Technology, 2004.

[2] BURNHAM K.J., Self-tuning Control for Bilinear Systems, PhD Thesis, Coventry Polytechnic, Coventry, UK, 1991.

[3] ERSANILLI V.E., BURNHAM K.J., KING P.J., Comparison of Continuous-Time and Discrete-Time Vehicle Models as Candidates for Suspension System Fault Detection, IAR Workshop on Advanced Control and Diagnosis, Coventry, UK, 2008.

[4] ERSANILLI V.E., Fault Detection for Vehicle Suspension, MSc Dissertation, Coventry University, UK, 2008.

[5] FRIEDRICH C., Condition Monitoring and Fault Detection, MSc Dissertation, Coventry University, UK, 2006.

[6] GILLESPIE T., Fundamentals of Vehicle Dynamics, Society of Automotive Engineers, Warrendale, USA, 1992.

[7] NISE N., Control Systems Engineering, John Wiley and Sons, Inc., USA, 2004.

[8] VELUPILLAI S., GÜVENÇ L., Tire Pressure Monitoring, IEEE Control Systems Magazine, Dec. 2007, pp. 22-25.

[9] WALKER C.J., A Cautious Fault Detection Algorithm, BEng project dissertation, Coventry University, UK, 1991.

[10] WEIZHONG Y., JAMES L., GOEBEL K.K., A multiple classifier system for aircraft engine fault diagnosis, Proc. of the 60th meeting of the Society for Machinery Failure Prevention Technology, 2006, pp. 291-300.

[11] YOUNG P.C., The Refined Instrumental Variable Method, Unified Estimation of Discrete and Continuous-Time Transfer Function Models, Journées Identification et Modélisation Expérimentale, Poitiers, France, 2006.


Computer Systems Engineering 2008

Keywords: unicast, anycast, CFA, routing, flow, capacity, Top-Down, Flow Deviation

Jakub GŁADYSZ*, Krzysztof WALKOWIAK*

THE HEURISTIC ALGORITHM BASED ON FLOW DEVIATION METHOD FOR SIMULTANEOUS UNICAST AND ANYCAST ROUTING IN CFA PROBLEM

In this paper, we present heuristic algorithms to solve the capacity and flow assignment (CFA) problem simultaneously for unicast and anycast flows. By introducing anycast (one-to-one-of-many) flows and replicas of servers we increase reliability and decrease the total flow in the network. As the criterion function we use the total average delay with a budget constraint. To obtain an initial selection we use a heuristic algorithm based on the Top-Down method. Next we try to decrease the vector of flows using the CFD_DEL algorithm for unicast and anycast connections. Finally, we present the results of computational tests.

1. INTRODUCTION

Due to the increasing number of people using the Internet and the growing flows of information, the design of computer networks is a matter of great importance.

In designing computer networks there exist several kinds of problems [2]: the flow assignment problem (FA), the capacity assignment problem (CA), the capacity and flow assignment problem (CFA), and the topology, capacity and flow assignment problem (TCFA). In the literature there are many papers addressing the FA and CFA problems for unicast (one-to-one) and multicast (one-to-many) connections [2][5]. The anycast paradigm is a new technique for delivering packets in computer networks, which was implemented in Internet Protocol version 6. It is the point-to-point flow of packets between a single client and the "nearest" destination server. The idea behind anycast is that a client wants to download or send packets to any one of several possible servers offering a particular service or application [4]. This kind of flow has become more popular since users started

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


downloading books, movies, music, etc. One technology that applies anycast traffic is the Content Delivery Network (CDN) [6].

In this paper, we consider the Capacity and Flow Assignment problem for multicommodity, non-bifurcated flows. The goal of our model is to minimize a function associated with flow and capacity. That function can be the total average delay, which was formulated by Kleinrock [3]. In this problem the network topology, the location of servers, the sets of capacities and link costs, and the sets of unicast and anycast demands are given. A unicast connection is defined as a triple: origin node, destination node and bandwidth requirement; an anycast connection is defined by a client node and the bandwidth requirements to and from the server. In the problem we know a set of candidate routes. We assume that all connections should be established and that the total flow in each arc cannot exceed its capacity. This problem was formulated in [1].

2. PROBLEM FORMULATION

To represent the problem we use the following notation:

Sets:
A – the set of arcs (channels) in the network,
ℜ – the set of selections,
X_r – the set of variables x_{kp} which are equal to one,
Y_r – the set of variables y_{la} which are equal to one,
Π_p – the index set of candidate routes (paths) for connection p,
P^{AN}, P^{UN} – the sets of anycast and unicast connections, respectively,
z(a) – the set of capacities available for channel a.

Constants:
δ_{kpa} – 1 if arc a belongs to route k realizing connection p, 0 otherwise,
c_{la} – value of the capacity corresponding to y_{la},
k_{la} – value of the link cost corresponding to y_{la},
Q_p – bandwidth requirement of connection p,
τ(p) – index of the connection associated with connection p,
o(π_{kp}) – the origin node of route k for connection p,
d(π_{kp}) – the destination node of route k for connection p,
B – budget,
κ – the total message arrival rate from external sources.

Variables:
f_a – the total flow in arc a,
x_{kp} – binary variable, 1 if route k is used to realize connection p, 0 otherwise,
y_a – capacity of arc a,
y_{la} – binary variable, 1 if the capacity of arc a is c_{la}.

The problem can be formulated as follows:

$$\min_{f,y} \; T(Z) = \frac{1}{\kappa} \sum_{a \in A} \frac{f_a}{y_a - f_a} \quad (1)$$

Subject to:

$$\sum_{k \in \Pi_p} x_{kp} = 1, \qquad x_{kp} \in \{0,1\}, \qquad \forall p \in P, \; k \in \Pi_p \quad (2)$$

$$\sum_{k \in \Pi_p} x_{kp}\, d(\pi_{kp}) = \sum_{k \in \Pi_{\tau(p)}} x_{k\tau(p)}\, o(\pi_{k\tau(p)}), \qquad \forall p \in P^{AN} \quad (3)$$

$$f_a = \sum_{p \in P} \sum_{k \in \Pi_p} \delta_{kpa} Q_p x_{kp} \le y_a, \qquad \forall a \in A \quad (4)$$

$$y_{la} \in \{0,1\}, \qquad \sum_{l \in z(a)} y_{la} = 1, \qquad \forall a \in A \quad (5)$$

$$y_a = \sum_{l \in z(a)} c_{la} y_{la} \quad (6)$$

$$D(Z) = \sum_{a \in A} \sum_{l \in z(a)} k_{la} y_{la} \le B \quad (7)$$

$$Z = (X, Y), \qquad X = \bigcup_{k,p:\, x_{kp}=1} \{x_{kp}\}, \qquad Y = \bigcup_{l,a:\, y_{la}=1} \{y_{la}\} \quad (8)$$

Constraint (2) guarantees that only one route can be chosen for each connection. Equation (3) guarantees that the two routes associated with the same anycast demand connect the same pair of nodes. Condition (4) states that the flow in each arc cannot exceed its capacity. Condition (5) ensures that exactly one capacity value is chosen for each arc. Equation (6) gives the value of the capacity of arc a. Constraint (7) states that the total link cost cannot exceed the budget. Point (8) is the definition of a selection Z consisting of the sets X and Y, which contain all variables x and y, respectively, that are equal to one.
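For illustration, the sketch below evaluates the delay criterion (1) and the budget constraint (7) for a fixed selection; the data structures and numeric values are assumptions made for the example, not part of the model.

    # Sketch: evaluating T(Z) (eq. (1)) and D(Z) (eq. (7)) for a fixed selection.
    # Flows, capacities and costs below are illustrative assumptions.
    def total_average_delay(flows, capacities, kappa):
        """T(Z) = (1/kappa) * sum_a f_a / (y_a - f_a); requires f_a < y_a."""
        assert all(f < y for f, y in zip(flows, capacities)), "constraint (4) violated"
        return sum(f / (y - f) for f, y in zip(flows, capacities)) / kappa

    def within_budget(chosen_link_costs, budget):
        """D(Z) = sum of the costs k_la of the selected capacities, eq. (7)."""
        return sum(chosen_link_costs) <= budget

    flows, capacities = [3.0, 1.5], [5.0, 4.0]      # f_a and y_a per arc
    print(total_average_delay(flows, capacities, kappa=4.5))
    print(within_budget([10.0, 7.0], budget=25.0))  # True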

3. ALGORITHM

We set the initial selection using a heuristic algorithm based on the Top-Down method. The initial selection is found according to the following steps:

Step 1. For each channel a in the selection Z_1 set the capacity to its maximum value. For each unicast connection find the shortest path using the metric l_a = ∂T/∂f_a for zero flow in the network. Then, for each anycast demand, find the shortest paths to the server and from the server so as to satisfy the anycast condition d(π_{kp}) = o(π_{kτ(p)}). Then calculate the flow in each channel.

Step 2. Calculate T(Z_1^i). If the capacity condition (4) is not satisfied, then the algorithm stops: the problem has no solution. Otherwise go to Step 3.

Step 3. Check the budget constraint. If D(Z_1^i) > B then go to Step 4; otherwise (D(Z_1^i) ≤ B) go to Step 5.

Step 4. In some channel a, take the capacity value for which

$$\Delta T_a = w_a^T \, \frac{c_a^m - c_a^l}{k_a^m - k_a^l}$$

is the least possible, where the variable w_a^T is the first partial derivative of the function T(Z) with respect to the arc capacity, c_a^m, k_a^m denote the current capacity and cost of channel a, and c_a^l, k_a^l the next lower ones. Set i = i + 1 and go to Step 2.

Step 5. The initial selection is found. Stop the algorithm and set Z_1 = Z_1^i.

Let f and c be the vectors of flows and capacities for the feasible initial selection Z_1. Then the CFD_DEL algorithm is used for the fixed vector of capacities. This algorithm was proposed in [6] and jointly optimizes unicast and anycast flows in a network. It decreases the criterion function T(Z) by minimizing the vector of flows.
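The capacity-reduction choice in Step 4 can be sketched as follows; the per-arc module lists and the numerically supplied derivatives w_a^T are illustrative assumptions, not the authors' implementation.

    # Sketch of Step 4: pick the arc whose capacity reduction degrades the
    # delay least per unit of cost saved. All data below are illustrative.
    def best_reduction(arcs, w):
        """arcs: {a: (level, [(c, k), ...])} with modules sorted by capacity;
        w[a] approximates dT/dy_a. Returns (delta_T, arc, new_level)."""
        best = None
        for a, (lvl, modules) in arcs.items():
            if lvl == 0:
                continue  # already at the smallest module
            c_m, k_m = modules[lvl]       # current capacity and cost
            c_l, k_l = modules[lvl - 1]   # next smaller capacity and cost
            delta_T = w[a] * (c_m - c_l) / (k_m - k_l)
            if best is None or delta_T < best[0]:
                best = (delta_T, a, lvl - 1)
        return best

    arcs = {0: (2, [(2, 5.0), (4, 9.0), (8, 16.0)]),
            1: (1, [(2, 4.0), (4, 8.0)])}
    print(best_reduction(arcs, w={0: 0.12, 1: 0.30}))  # reduces arc 0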

4. COMPUTATIONAL RESULTS

To solve the CFA problem we split it into the CA and FA problems. Using the Top-Down algorithm we obtain feasible values of capacity for zero flow. Using the CFD_DEL algorithm we then try to minimize the vector of flows in the network for the fixed values of capacity.


The experiments were conducted with two main purposes in mind. First, we compare the values of the criterion function for a feasible solution obtained by the Top-Down algorithm with the results of CFD_DEL. Next, we examine the value of the criterion function as a function of the number of replica servers in the network. In the computational experiments we used a network topology with 14 nodes and 56 links and assumed that 30% of the traffic in the network is one-to-one-of-many, while the remaining traffic is one-to-one. Fig. 1 shows the topology of the test network.

Fig. 1. Topology of the test network (14 nodes, 56 links)

The results of the two algorithms for one, two and three servers are presented in Figs. 2-4.

Fig. 2. Values of T(Z) for one server (x-axis: localization of the server in the network; y-axis: T(Z); series: Top-Down, CFD_DEL)


Fig. 3. Values of T(Z) for different localizations of two servers (x-axis: localization variant; y-axis: T(Z); series: Top-Down, CFD_DEL)

Fig. 4. Values of T(Z) for different localizations of three servers (x-axis: localization variant; y-axis: T(Z); series: Top-Down, CFD_DEL)

In Figs. 2-4 we can notice that the CFD_DEL algorithm improves the feasible initial selection obtained by the Top-Down algorithm. Comparing the results of the experiments, the CFD_DEL algorithm reduces the criterion function by about 34% for one replica server located in the network, by about 37% for two replica servers, and by about 43% for three replica servers.


Table 1. Average values of T(Z) with respect to the number of servers, obtained by the Top-Down and CFD_DEL algorithms

Number of servers      1        2        3
T(Z) for Top-Down      0.3078   0.2931   0.2768
T(Z) for CFD_DEL       0.2016   0.1828   0.1571

Table 2. Minimal values of T(Z) and the nodes for which the minimal values are obtained by the Top-Down algorithm

Number of servers                     1        2        3
Min T(Z)                              0.2307   0.1932   0.1900
Localization of servers in network    5        5,10     4,5,11

Table 3. Minimal values of T(Z) and the nodes for which the minimal values are obtained by the CFD_DEL algorithm

Number of servers                     1        2        3
Min T(Z)                              0.1470   0.1265   0.1221
Localization of servers in network    5        3,12     2,5,11

5. FINAL REMARKS

In this paper, we presented a model and solution methods for the CFA problem with unicast and anycast flows. We used two heuristic algorithms: the first to find an initial feasible solution, the second to minimize the criterion function with respect to the flows in the network. The results of the experiments were presented in Figs. 2-4.

In Tables 1-3 we can notice that each additional server in the network decreases the criterion function and the minimal value of T(Z) for both algorithms. We must remember that every additional server decreases the total average delay and the flow in the network, but increases the cost of building it. Tables 1-3 also show that the best localizations of the replica servers differ between the two algorithms. For example, the best localizations of replica servers for the Top-Down algorithm are in nodes 5, 10 and 4, 5, 11, but for the CFD_DEL algorithm in nodes 3, 12 and 2, 5, 11. To assess the quality of the results obtained by the CFD_DEL algorithm, an optimal solution should be found by an exact algorithm (e.g. the branch-and-bound method [1]).


REFERENCES

[1] GŁADYSZ J., WALKOWIAK K., Branch-and-Bound Algorithm for Simultaneously Unicast and Anycast Routing in CFA Problem, 42nd Spring International Conference Modelling and Simulation of Systems 2008, Czech Republic, MARQ Ostrava, pp. 108-115.

[2] KASPRZAK A., Projektowanie struktur rozległych sieci komputerowych, Monograph, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław, Poland, 2001 (in Polish).

[3] KLEINROCK L., GERLA M., FRATTA L., The Flow Deviation Method: An Approach to Store-and-Forward Communication Network Design, Networks - An International Journal, John Wiley & Sons, 1973, pp. 97-132.

[4] METZ CH., IP Anycast Point-to-(Any) Point Communication, IEEE Internet Computing, March-April 2002, pp. 94-98.

[5] PIORO M., MEDHI D., Routing, Flow and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.

[6] WALKOWIAK K., Algorytmy wyznaczania przepływów typu unicast and anycast w przeżywalnych sieciach zorientowanych połączeniowo, Monograph, Oficyna Wydawnicza Politechniki Wrocławskiej, Wrocław 2007 (in Polish).


Computer Systems Engineering 2008

Keywords: Reinforcement Learning, Q-learning, Boltzmann Scheme, Counter Scheme, Adaptive Scheme

Tomasz KACPRZAK*, Leszek KOSZAŁKA*

COMPARISON OF ACTION CHOOSING SCHEMES FOR Q-LEARNING

A very important issue affecting the performance of an implementation of the Q-learning algorithm is the action choosing scheme. The Q-learning algorithm does not define how to choose an action to perform during a single step; it is the designer's decision to implement the most effective scheme for a particular system. In order to obtain the maximum result in a specified time, the designer has to balance exploration and exploitation in the action choosing scheme. Well-described methods, such as greedy, probability-based or temperature-based schemes, can be applied for this purpose. This paper studies and compares the effectiveness of different action choosing methods in various environments and introduces a new method, the Adaptive Scheme, which evaluates how profitable exploration is and uses this information in further decision making.

1. REINFORCEMENT LEARNING

Reinforcement Learning (RL) is a method of solving decision making problems in an unknown environment. Almost every environment can be represented by a state graph or matrix and a set of actions available in each state [1][2][3]. The decision module chooses and performs an action in a particular state. The aim of the module is to make decisions which will move, or maintain, the system in a certain state, or prevent the system from moving to a particular state. In RL this matrix is not known to the decision module at the beginning of operation. The decision system performs actions and obtains reinforcement from the environment. This reinforcement is called a "reward", and can be positive or negative in terms of reaching the goal of the decision system. There must be a non-empty set of states which are considered "absorbing"; after reaching one of them the trial is finished and the next trial
* Department of Systems and Computer Networks, Wrocław University of Technology, Poland.


begins [2]. RL can be considered as learning through "trial and error". RL algorithms can be divided into strategy-based and action-based ones. The first kind learns by assuming a strategy at the beginning and updating and improving it during learning. The second kind learns the best action to perform in each state.

This work covers action-based RL, the most popular implementation of which is Q-learning.

2. Q-LEARNING

The Q-learning algorithm learns to take, in a given state, the action that optimizes the objectives [1]. The algorithm uses the Q-matrix, in which the relations between states and actions are stored. These values are calculated according to the current value, the reinforcement (reward) and the discounted value resulting from previous actions. The Q-matrix is updated every time an action is performed. The general idea of the algorithm can be presented as follows [1]:

1: for each time step t do
2:   observe current state x_t;
3:   a_t := choose_action(x_t, Q_t);
4:   perform action a_t;
5:   observe reinforcement r_t and next state x_{t+1};
6:   Δ := r_t + γ max_a Q_t(x_{t+1}, a) − Q(x_t, a_t);
7:   update_β(Q(x_t, a_t), Δ);
8: end for each.

where γ is a discount factor. If γ is close to 1, the algorithm tends to choose actions which lead to larger but more distant profits; when it is close to 0, it prefers instant rewards.

The Q-learning algorithm does not specify how the action is chosen (step 3). This paper compares and evaluates popular action choosing methods.
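A minimal Python sketch of steps 5-7 is given below; the learning-rate form of the update_β operator and the toy states and actions are assumptions made for the example.

    # Sketch of the Q-learning update (steps 5-7 above); the learning-rate
    # form of update_beta and the toy problem are illustrative assumptions.
    def q_update(Q, x, a, r, x_next, actions, gamma=0.8, beta=0.1):
        """Q(x,a) += beta * (r + gamma * max_a' Q(x',a') - Q(x,a))."""
        delta = r + gamma * max(Q[(x_next, a2)] for a2 in actions) - Q[(x, a)]
        Q[(x, a)] += beta * delta

    states, actions = [0, 1], ["left", "right"]
    Q = {(x, a): 0.0 for x in states for a in actions}
    q_update(Q, x=0, a="right", r=1.0, x_next=1, actions=actions)
    print(Q[(0, "right")])  # 0.1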

3. ACTION CHOOSING SCHEMES

The problem of exploration is very important in the Q-learning algorithm: it is essential for the learning system to obtain sufficient knowledge of the environment in which it operates, which is realized by proper action choosing [1]. Of course, the main goal of the system is to obtain results (to exploit), and exploration is most frequently an action which does not aim at immediate profit. Exploration can be considered as experimental behaviour of the system, in which more time is spent on getting to know the environment better, in order to obtain greater rewards in the future. A proper balance between exploration and exploitation has to be achieved during the operation of the system. Action choosing

82

Page 83: Polish-British Workshops Computer Systems Engineering Theory & Applications

schemes aim to achieve this balance. The most popular action choosing schemes are discussed in the remainder of this section.

• Greedy – the system does not experiment at all. In every step, it uses the first action it has performed that is known to lead to the goal. In this work the Greedy Scheme is treated as a reference for comparison purposes.

• ε-Greedy – a probability-based scheme, in which a random action is chosen with probability ε > 0 and the greedy action with probability (1 − ε) [1].

• Boltzmann Scheme – a scheme which uses the Boltzmann distribution, described by:

$$\pi(x, a) = \frac{\exp(Q(x, a)/T)}{\sum_{a^*} \exp(Q(x, a^*)/T)} \quad (1)$$

where T > 0 is called the temperature, and it regulates the degree of randomness in action choosing. Values close to 0 cause almost deterministic action selection and values close to 1 almost random selection. In the Boltzmann Scheme, the greater the Q-value of the greedy action is compared with the other actions, the smaller the probability of selecting a non-greedy action. During learning, the temperature is "cooled" by decreasing T. (A code sketch of this scheme is given after this list.)

• Counter Scheme – this scheme is based on assigning one or more parameters, next to the Q-value, to each action in each state. The parameters are used in the process of action selection and can contain such information as the last selection time or the number of times the action has been selected in the current trial. In this work the Counter Scheme chooses, with probability ε > 0, the action which has not been performed for the longest time from the set of non-greedy actions, and the greedy action otherwise. ε is in fact a temperature indicator; during the trials it is decreased according to the parameters.

• Adaptive Scheme – this new method is introduced in this paper and is based on the Counter Scheme. The additional feature of this scheme is that it monitors how efficient the exploration is by measuring the average number of changes of Q-values when the algorithm chooses a random action to perform (the Efficiency Factor). Next, it sets the temperature according to this exploration efficiency factor. This mechanism resembles the way humans learn: if learning new solutions does not bring any profits, they abandon learning and focus on results. The size of the averaging window affects the sensitivity of the scheme.
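A minimal sketch of the Boltzmann selection rule (1) follows; the Q-values and temperatures used here are illustrative assumptions.

    # Sketch of Boltzmann (softmax) action selection, eq. (1); values are
    # illustrative only.
    import math
    import random

    def boltzmann_action(q_values, T):
        """Choose a with probability exp(Q(x,a)/T) / sum_a* exp(Q(x,a*)/T)."""
        actions = list(q_values)
        weights = [math.exp(q_values[a] / T) for a in actions]
        r = random.random() * sum(weights)
        acc = 0.0
        for a, w in zip(actions, weights):
            acc += w
            if r <= acc:
                return a
        return actions[-1]  # guard against floating-point fall-through

    q = {"up": 0.9, "down": 0.1, "left": 0.0}
    print(boltzmann_action(q, T=0.05))  # almost always "up" (low temperature)
    print(boltzmann_action(q, T=1.0))   # much more random (high temperature)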


4. EXPERIMENT SETTINGS

The aim of the algorithms is to obtain the maximum result in a time specified by the designer. To do this, the schemes responsible for action choosing have to maintain a proper balance between exploration and exploitation. During an initial analysis of the problem it can easily be recognized that for very long trial times (considerably longer than needed for the algorithm to fully learn the environment) the exploration method is not very significant for the results of the algorithm, so no further study is needed for those "long run" cases. More interesting is the case when the learning time is comparable to, equal to, or a bit longer than the experiment time; this is the scope of this paper. No ending conditions were specified for the algorithm; each trial was performed until further results became highly predictable.

The results are measured by summing the reward values that the schemes obtained in the given time. The experiments were performed in grid environments (see Fig. 1), which contain absorbing states (grey) that also carry rewards, and forbidden states (black).

Fig. 1. Grid environments used for the experiments: a) simple case; b) middle-sized complex environment, with significantly larger rewards hidden in corners, which makes them very difficult to reach.

5. EXPERIMENTATION SYSTEM

To perform the experiments, simulation software was created in the Visual C++ environment (Fig. 2). The simulator allows one to create 2D environments, perform tests of various time lengths, and compare action choosing schemes. It operates in two modes: a demonstration mode and a comparison mode, in which it performs the simulation


for all schemes in the same environment. The reliability of the comparison is assured by averaging the score over a user-defined number of simulations. The program can also present how Q-learning works and serve demonstration purposes. In this environment the designer can specify the following input:

• the rectangular board of various size, with rewards and obstacles,
• the discount factor γ (for all schemes),
• the random action probability (for the ε-greedy algorithm),
• the initial temperature T_i and the temperature decreasing step dT (for the Boltzmann and Counter schemes),
• the averaging window size – the number of past results used to calculate the average score in time,
• the averaging window size for the Adaptive Scheme – the number of past profitable random actions taken into consideration in computing the Efficiency Factor,
• the time period of the simulation (for comparison mode only),
• the trial count – the number of simulations (to assure reliability).

Fig. 2. Screenshot of the proposed experimentation system.


6. EXPERIMENTS

The comparison was based on many simulations in different environments. The results for many types of environments were similar; nevertheless, two different cases were defined: the simple and the complex environment case, based on Environments a) and b) (see Fig. 1). The comparison was performed for these cases, varying the exploration scheme, the environment (presented above) and the scheme cooling time. The discount factor was also tested, but the results did not bring significant conclusions.

6.1. “COOLING” TIME ADJUSTMENT

Fig. 3 presents the average score in time when the cooling time was 1000 time ticks. Fig. 4 presents the average for a cooling time of 300. In both cases the discount factor was set to γ = 0.8. The averaging window for the Adaptive Scheme was set to 5, which gave the best results in simulations. For environment (a) and γ = 0.3 the average results in time were very similar to Fig. 3 and Fig. 4.

Fig. 3. Simulation results for environment (a), γ = 0.8, “cooling” time set to 1000.

Fig. 4. Simulation results for environment (a), γ = 0.8, "cooling" time set to 300.


Points marked with squares show the time when the total score of a particular scheme exceeded the score of the Greedy Scheme. It can easily be noticed that in both cases the Counter, Boltzmann and Adaptive Schemes obtained a better average score than the Greedy Scheme, which indicates that spending more time on learning in this environment is profitable. In fact, learning is profitable in the majority of tested environments. However, it can also be noticed that for the Boltzmann and Counter Schemes the "cooling" time is a crucial factor: setting it too long causes more time to be spent on exploration when the environment is already explored (as seen when comparing Fig. 3 and Fig. 4). Too short a "cooling" time causes the exploration to be incomplete; on the other hand, the system gains scores in the early stage of learning (Fig. 4).

The Adaptive Scheme, which adjusts the temperature on its own, always reached the highest possible average; however, it reaches the maximum in the long term. In the mid term it obtains an average score comparable to the Counter Scheme.

6.2. COMPLEX ENVIRONMENT

Environment b) has to be considered "complex", because it contains rewards which require much exploration: they are hidden in the corners of the board. Exploring them brings significant profits in terms of the average score. Simulation in this environment was performed only using γ = 0.8, because the use of a small discount factor would make these hidden rewards insignificant. Fig. 5 presents the average score for Environment b).

Fig. 5. Simulation results for environment (b), γ = 0.8, “cooling” time set to 5000.

It can be seen that the best average score was obtained by the Boltzmann Scheme; this observation was also made in other "tricky" environments. The Counter Scheme average score was significantly lower; on the other hand, it performed well at the beginning of the experiment. The Adaptive Scheme also performed at a satisfying level at


the beginning, and kept a constant improvement rate later on. During longer trials it proved to reach the maximum average.

6.3. CONSIDERING DISCOUNT FACTOR

The situation changed slightly when a smaller discount factor was considered at the same "cooling" time. First, it can be noticed that the overall average is higher when the discount factor is γ = 0.3. That is because of the internal characteristics of the environment: it is rather simple, the rewards are regularly distributed within the environment, and the values of the rewards are comparable. Adaptation of γ is not a subject of this work, but seems to be an interesting subject for research.

Observation of the average scores in case 2 (Fig. 4) leads to the conclusion that for these conditions the Counter Scheme outperformed the Boltzmann Scheme: it had a significantly better average score and also a better total score at all moments of time. Observation of the simulation indicated that the Counter Scheme tends to "travel long distances" in the environment, by following one direction and rather not going back. That seems to be the reason why the Counter Scheme is suited to high discount factors and not low ones, as will be shown in the analysis of the next plot.

Fig. 6 presents the case of γ = 0.3 and a "cooling" time of 300. In this case the Boltzmann Scheme occurred to be the best in the long term, since it reached the maximum in most cases. Furthermore, it did not stop learning, despite zero temperature. That is because the Boltzmann Scheme always uses probability when choosing the action to perform, and decreasing the temperature only diminishes the probability of choosing a non-greedy action. In a situation of a low discount factor and a regular environment it will still sometimes choose a non-greedy action and therefore learn. This phenomenon is significant for a very limited range of specific environments.

Fig. 6. Simulation results for Environment (a), γ = 0.3, “cooling” time set to 300.


7. CONCLUSIONS

In this paper, different exploration schemes were analysed. The experiments showed that the simple Greedy Scheme provided very good results for a short running time. The ε-greedy solution did not obtain results or averages comparable to the other schemes for any of the tested ε values. For medium running time applications the Counter Scheme obtained the best results among all schemes (Fig. 4). For long running times the Boltzmann Scheme seems to outperform the other schemes. The adjustment of the temperature can be a problem for the Boltzmann and Counter Schemes when the designer does not know the length of a trial or the characteristics of the environment. In such cases the Adaptive Scheme can be successfully applied: it performed well for medium running times and keeps learning until it reaches the maximum average score.

REFERENCES

[1] BOLC L., ZAREMBA P., Wprowadzenie do uczenia się maszyn, Akademicka Oficyna Wydawnicza, 1993 (in Polish).

[2] CICHOSZ P., Systemy uczące się, Warszawa, WNT, 2007 (in Polish).

[3] DE FARIAS D. P., MEGIDDO N., Exploration-Exploitation Tradeoffs for Experts Algorithms in Reactive Environments, http://books.nips.cc/papers/files/nips17/NIPS2004_0071.pdf.

[4] TEKNOMO K., http://people.revoledu.com/kardi/tutorial/ReinforcementLearning, 7 March 2008.

[5] MITCHELL T., Machine Learning, McGraw-Hill Companies, Inc., 1997.

[6] http://wazniak.mimuw.edu.pl/index.php?title=Sztuczna_inteligencja/SI_Modu%C5%82_13_-_Uczenie_si%C4%99_ze_wzmocnieniem, 2008-03-07 (in Polish).


Computer Systems Engineering 2008

Keywords: identification, parameter estimation, errors-in-variables

Tomasz LARKOWSKI*, Jens G. LINDEN*, Keith J. BURNHAM*

RECURSIVE BIAS-ELIMINATING LEAST SQUARES ALGORITHM FOR BILINEAR SYSTEMS

The paper presents a recursive approach for the identification of single-input single-output discrete-time time-invariant errors-in-variables bilinear system models. The technique is based on the extension of the bias compensated least squares and bias-eliminating least squares methods for bilinear systems. Consequently, since the constituent algorithms are constructed within the least squares framework, the required computational burden is relatively low. A numerical simulation study compares the proposed algorithm to other EIV methods.

1. INTRODUCTION

The errors-in-variables (EIV) framework addresses the identification of systems where all the measured variables are corrupted by noise. This formulation extends the standard approach, which postulates that only the output signals are uncertain and the input is known exactly. The EIV class of problems is found to be most important when the determination of the internal physical laws describing the system is of prime interest, as opposed to the prediction of the system output [17]. The EIV approach has gained increased attention during the last decade in various wide-ranging engineering, scientific and socio-economic fields. A detailed description of the existing approaches can be found in [13, 17, 18, 19].

A diagrammatic illustration of the standard EIV system setup is presented in Figure 1. The variables $u_{0_k}$, $y_{0_k}$ denote the noise-free input/output, $u_k$, $y_k$ the measured input/output signals and $\tilde{u}_k$, $\tilde{y}_k$ the additive noise sequences corrupting $u_{0_k}$, $y_{0_k}$, respectively. Consequently, the true input and output signals, i.e. $u_{0_k}$ and $y_{0_k}$, respectively,

∗Control Theory and Applications Centre, Coventry, UK


Fig. 1. Typical EIV system setup.

are only available via the noisy measurements

$$u_k = u_{0_k} + \tilde{u}_k, \qquad y_k = y_{0_k} + \tilde{y}_k. \quad (1)$$

In the field of modelling for nonlinear systems, bilinear system (BS) models have been exploited to advantage in various practical applications, e.g. control plants, biological and chemical phenomena, earth and sun science, nuclear fission, fault diagnosis and supervision, see [5, 14] or [15]. The fact that BS models are so widely applicable gives rise to the need to extend the EIV approaches developed for linear systems to encompass the BS case.

One interesting approach for linear EIV systems is the so-called bias-eliminating least squares (BELS), first proposed in [21, 22, 23]. The BELS method was subsequently analysed in [6] and [7], whereas its extension to handle BS was proposed in [12]. This paper addresses the recursive realisation of the bilinear bias-eliminating least squares (BBELS) technique.

2. NOTATION AND PROBLEM STATEMENT

BS models are characterised by a nonlinearity in the form of a product between the input and the state. In general, regarding the state-space representation, a discrete time-invariant single-input single-output (SISO) BS can be described by, see [2]:

$$x_{k+1} = A x_k + B u_{0_k} + G u_{0_k} x_k, \qquad x_0 = x^0, \quad (2a)$$
$$y_{0_k} = C x_k + D u_{0_k}, \quad (2b)$$

where $x_k \in \mathbb{R}^{n_x}$ denotes the state vector and $x^0$ its initial value, with $u_{0_k} \in \mathbb{R}$ and $y_{0_k} \in \mathbb{R}$ being the noise-free input and output sequences, respectively. The time-invariant matrices $A$, $B$, $C$, $D$ and $G$ are of appropriate dimension and characterise the


dynamical behaviour of the system. It is to be noted that an input-dependent system matrix can be expressed as $A(u_{0_k}) = [A + u_{0_k} G]$, yielding input-dependent steady-state and dynamic characteristics. Different methods for the discretisation of continuous BS can be found, see [3] or [4]. In this paper, attention is placed on a particular class of discrete time-invariant SISO BS that can be represented by the following nonlinear autoregressive with exogenous input process, i.e.

$$A(q^{-1}) y_{0_k} = B(q^{-1}) u_{0_k} + \sum_{i=1}^{n_\eta} \eta_{ii} u_{0_{k-i}} y_{0_{k-i}}, \quad (3)$$

where $n_\eta \le n_b \le n_a$ and $q^{-1}$ is the backward shift operator, defined by $x_k q^{-1} \triangleq x_{k-1}$. The polynomials $A(q^{-1})$ and $B(q^{-1})$ are given as follows

$$A(q^{-1}) \triangleq 1 + a_1 q^{-1} + \ldots + a_{n_a} q^{-n_a}, \quad (4a)$$
$$B(q^{-1}) \triangleq b_1 q^{-1} + \ldots + b_{n_b} q^{-n_b}. \quad (4b)$$

A discussion regarding the state-space realisability of the input/output description of BS can be found in [8, 9]. The BS given by (3) belongs to the class of diagonal BS (see [16] for more details). Diagonal BS models are possibly the most commonly utilised class of BS for the purpose of industrial applications, see [1, 5, 20]. Furthermore, diagonal BS exhibit a crucial property of interest, namely, that there exists no correlation in the bilinear terms between the coupled input and output signals, i.e. $E[u_{0_{k-i}} y_{0_{k-i}}] = 0$, where $E[\cdot]$ denotes the expected value operator, see [16]. Additionally, although not exploited here, the state-space realisability of diagonal BS is also guaranteed, see [9]. In the remainder of this paper reference will be made to diagonal BS exclusively.

The following assumptions are postulated:

A1. The diagonal BS is time-invariant, asymptotically stable, observable and controllable.

A2. The system structure, i.e. $n_a$, $n_b$ and $n_\eta$, is known a priori.

A3. The true input is white, zero mean, bounded and persistently exciting of sufficiently high order.

A4. The corrupting input/output noise sequences are zero mean, ergodic, white signals with unknown variances $\sigma_{\tilde{u}}$ and $\sigma_{\tilde{y}}$, respectively, mutually uncorrelated and uncorrelated with the noise-free signals $u_{0_k}$ and $y_{0_k}$, respectively.


With reference to the linear case, the assumptions postulated here are typical in the EIV framework, see [17]. Whilst this property is not true for the general class of BS, A3 implies that $E[y_{0_k}] = 0$, see [16].

The system parameter vector is defined as

$$\theta^T \triangleq \begin{bmatrix} a^T & b^T & \eta^T \end{bmatrix} \in \mathbb{R}^{n_\theta}, \quad (5)$$

where

$$a^T \triangleq \begin{bmatrix} a_1 & \ldots & a_{n_a} \end{bmatrix} \in \mathbb{R}^{n_a}, \quad (6a)$$
$$b^T \triangleq \begin{bmatrix} b_1 & \ldots & b_{n_b} \end{bmatrix} \in \mathbb{R}^{n_b}, \quad (6b)$$
$$\eta^T \triangleq \begin{bmatrix} \eta_{11} & \ldots & \eta_{n_\eta n_\eta} \end{bmatrix} \in \mathbb{R}^{n_\eta} \quad (6c)$$

with $n_\theta = n_a + n_b + n_\eta$. The regressor vectors for the measured data are given by

$$\varphi_k^T \triangleq \begin{bmatrix} \varphi_{y_k}^T & \varphi_{u_k}^T & \varphi_{\rho_k}^T \end{bmatrix} \in \mathbb{R}^{n_\theta} \quad (7)$$

with

$$\varphi_{y_k} \triangleq \begin{bmatrix} -y_{k-1} & \ldots & -y_{k-n_a} \end{bmatrix} \in \mathbb{R}^{n_a}, \quad (8a)$$
$$\varphi_{u_k} \triangleq \begin{bmatrix} u_{k-1} & \ldots & u_{k-n_b} \end{bmatrix} \in \mathbb{R}^{n_b}, \quad (8b)$$
$$\varphi_{\rho_k} \triangleq \begin{bmatrix} \rho_{k-1} & \ldots & \rho_{k-n_\eta} \end{bmatrix} \in \mathbb{R}^{n_\eta}, \quad (8c)$$

where the notation

$$\rho_{k-i} \triangleq u_{k-i} y_{k-i} \quad (9)$$

is used to denote the bilinear product terms. The corresponding noise contributions in the regressor vectors are denoted with a tilde, e.g. $\tilde{\varphi}_k$, whereas the noise-free signals are denoted with a zero subscript, e.g. $\varphi_{0_k}$.

The notation $\Sigma_{cd}$ is used as a general notion for the covariance matrix of the vectors $c_k$ and $d_k$, whereas $\xi_{cf}$ is utilised for a covariance vector with $f_k$ being a scalar, i.e.

$$\Sigma_{cd} \triangleq E[c_k d_k^T], \qquad \Sigma_c \triangleq E[c_k c_k^T], \qquad \xi_{cf} \triangleq E[c_k f_k]. \quad (10)$$

The corresponding estimates, denoted by $\hat{[\cdot]}$, are given as

$$\hat{\Sigma}_{cd} \triangleq \frac{1}{N} \sum_{k=1}^{N} c_k d_k^T, \qquad \hat{\Sigma}_c \triangleq \frac{1}{N} \sum_{k=1}^{N} c_k c_k^T, \qquad \hat{\xi}_{cf} \triangleq \frac{1}{N} \sum_{k=1}^{N} c_k f_k, \quad (11)$$


where $N$ denotes the number of data samples. In addition, $0_{g \times h}$ and $I_g$ denote the null matrix of arbitrary dimension $g \times h$ and the identity matrix of arbitrary dimension $g$, respectively.

The dynamic identification problem for diagonal BS in the EIV framework considered here is formulated as follows:

Problem 1 [Dynamic diagonal BS EIV identification problem]
Given $N$ samples of the measured signals $\{u_k\}_{k=1}^N$ and $\{y_k\}_{k=1}^N$, determine the vector

$$\vartheta^T \triangleq \begin{bmatrix} \theta^T & \sigma_{\tilde{u}} & \sigma_{\tilde{y}} \end{bmatrix} \in \mathbb{R}^{n_\theta + 2}. \quad (12)$$

3. BIAS COMPENSATED LEAST SQUARES

This section provides a brief review of the bias compensated least squares technique for diagonal BS, see [10, 12] for further details. The bilinear bias compensated least squares (BBCLS) algorithm for the class of diagonal BS comprises equations (13a), (13b) and (13c), see [12]. These correspond to the bilinear bias compensation rule, the noise covariance matrix and the auto-correlation of the noise on the bilinear terms, respectively, i.e.

$$\hat{\theta}_{\mathrm{BBCLS}} \triangleq (\hat{\Sigma}_{\varphi} - \Sigma_{\tilde{\varphi}})^{-1} \hat{\xi}_{\varphi y}, \quad (13a)$$

$$\Sigma_{\tilde{\varphi}} \triangleq \begin{bmatrix} \sigma_{\tilde{y}} I_{n_a} & 0 & 0 \\ 0 & \sigma_{\tilde{u}} I_{n_b} & 0 \\ 0 & 0 & \sigma_{\tilde{\rho}} I_{n_\eta} \end{bmatrix}, \quad (13b)$$

$$\sigma_{\tilde{\rho}} \triangleq \sigma_u \sigma_{\tilde{y}} + \sigma_y \sigma_{\tilde{u}} - \sigma_{\tilde{u}} \sigma_{\tilde{y}}, \quad (13c)$$

where $\sigma_u \triangleq E[u_k^2]$ and $\sigma_y \triangleq E[y_k^2]$ are the variances of the measured system input and output signals, respectively. Equation (13a) can alternatively be re-expressed as

$$\hat{\theta}_{\mathrm{BBCLS}} = \hat{\theta}_{\mathrm{LS}} + \hat{\Sigma}_{\varphi}^{-1} \Sigma_{\tilde{\varphi}} \hat{\theta}_{\mathrm{BBCLS}}, \quad (14)$$

where $\hat{\theta}_{\mathrm{LS}}$ denotes the least squares (LS) estimate. A recursive realisation of the BBCLS approach was presented in [11]. It is implied by the BBCLS algorithm that knowledge of the noise variances corrupting the input/output of a system, together with the variances of the measured input/output signals, is sufficient to obtain unbiased estimates of the true system parameters. Whilst the variances of the input/output signals can be estimated directly from the available measurements, two more equations are required to determine the variances of the input/output noise sequences. This issue is addressed in the subsequent section.


4. OFFLINE BILINEAR BIAS-ELIMINATING LEAST SQUARES

In this section the offline BBELS algorithm is briefly reviewed. In general, for the BELS-based approaches the two additional equations needed to determine the input and output noise variances are formed by considering an overparametrised system, i.e. a system with an augmented parameter in the $A(q^{-1})$ or $B(q^{-1})$ polynomial. Since in both cases the considerations are analogous, the first option is considered here. The overparametrised system (3) is given by

$$\bar{A}(q^{-1}) y_{0_k} = B(q^{-1}) u_{0_k} + \sum_{i=1}^{n_\eta} \eta_{ii} u_{0_{k-i}} y_{0_{k-i}}, \quad (15)$$

where

$$\bar{A}(q^{-1}) \triangleq 1 + a_1 q^{-1} + \ldots + a_{n_a} q^{-n_a} + \breve{a}_{n_a+1} q^{-n_a-1}. \quad (16)$$

The additional parameter, denoted with a breve, is null by definition, i.e.

$$\breve{a}_{n_a+1} \triangleq 0 \quad (17)$$

such that the augmented system (15) is formally equivalent to the system (3). The parameter vector corresponding to $\bar{A}(q^{-1})$ is

$$\bar{a}^T \triangleq \begin{bmatrix} a^T & \breve{a}_{n_a+1} \end{bmatrix} \in \mathbb{R}^{n_a+1}. \quad (18)$$

The augmented parameter vector, denoted $\bar{\theta}$, is given by

$$\bar{\theta}^T \triangleq \begin{bmatrix} \bar{a}^T & b^T & \eta^T \end{bmatrix} \in \mathbb{R}^{n_\theta+1}. \quad (19)$$

The augmented regressor vector for the measurements is defined as

$$\bar{\varphi}_k^T \triangleq \begin{bmatrix} \bar{\varphi}_{y_k}^T & \varphi_{u_k}^T & \varphi_{\rho_k}^T \end{bmatrix}, \quad (20)$$

where

$$\bar{\varphi}_{y_k}^T \triangleq \begin{bmatrix} \varphi_{y_k}^T & -y_{k-n_a-1} \end{bmatrix}. \quad (21)$$

The BBCLS scheme for the augmented system (15), in accordance with (14), is given by

$$\hat{\bar{\theta}}_{\mathrm{BBCLS}} = \hat{\bar{\theta}}_{\mathrm{LS}} + \hat{\Sigma}_{\bar{\varphi}}^{-1} \Sigma_{\tilde{\bar{\varphi}}} \hat{\bar{\theta}}_{\mathrm{BBCLS}} \quad (22a)$$


with

$$\Sigma_{\tilde{\bar{\varphi}}} \triangleq \begin{bmatrix} \sigma_{\tilde{y}} I_{n_a+1} & 0 & 0 \\ 0 & \sigma_{\tilde{u}} I_{n_b} & 0 \\ 0 & 0 & \sigma_{\tilde{\rho}} I_{n_\eta} \end{bmatrix}. \quad (22b)$$

The utilisation of (17) implies that the following linear constraint must be satisfied

$$H^T \bar{\theta} = 0, \quad (23)$$

where

$$H^T \triangleq \begin{bmatrix} h^T & 0 & \ldots & 0 \end{bmatrix} \in \mathbb{R}^{n_\theta+1}, \quad (24a)$$
$$h^T \triangleq \begin{bmatrix} 0 & \ldots & 0 & 1 \end{bmatrix} \in \mathbb{R}^{n_a+1}. \quad (24b)$$

The following notation for the inverse of the matrix $\hat{\Sigma}_{\bar{\varphi}}$ is introduced

$$\hat{\Sigma}_{\bar{\varphi}}^{-1} \triangleq \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{12}^T & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{13}^T & \Sigma_{23}^T & \Sigma_{33} \end{bmatrix} \in \mathbb{R}^{(n_\theta+1) \times (n_\theta+1)}, \quad (25)$$

where $\Sigma_{11} \in \mathbb{R}^{(n_a+1) \times (n_a+1)}$, $\Sigma_{12} \in \mathbb{R}^{(n_a+1) \times n_b}$, $\Sigma_{13} \in \mathbb{R}^{(n_a+1) \times n_\eta}$.

Lemma 1. Considering the overparametrised system (15), the following equality holds

$$-h^T \hat{\bar{a}}_{\mathrm{LS}} = \sigma_{\tilde{y}} h^T \Sigma_{11} \bar{a} + \sigma_{\tilde{u}} h^T \Sigma_{12} b + \sigma_{\tilde{\rho}} h^T \Sigma_{13} \eta. \quad (26)$$

Proof 1. See [12] for details.

Lemma 2. The asymptotic expression for the expected error of the LS method, denoted $\frac{1}{L} V(\hat{\bar{\theta}}_{\mathrm{LS}})$, where $L = N - n_a$, with respect to the overparametrised system (15) is given by

$$\lim_{N \to \infty} \frac{1}{L} V(\hat{\bar{\theta}}_{\mathrm{LS}}) \triangleq \sigma_{\tilde{y}} \left(1 + \hat{\bar{a}}_{\mathrm{LS}}^T \bar{a}\right) + \sigma_{\tilde{u}} \hat{b}_{\mathrm{LS}}^T b + \sigma_{\tilde{\rho}} \hat{\eta}_{\mathrm{LS}}^T \eta. \quad (27)$$

Proof 2. See [12] for details.

Merging the BBCLS rule with Lemmas 1 and 2 allows the identification problem specified by Problem 1 to be solved. Denoting an iteration index by $i$ and the maximum number of iterations by $I_{\max}$, the offline BBELS algorithm is summarised as:


Algorithm 1 (BBELS algorithm)

1. Compute $\hat{\bar{\theta}}_{\mathrm{LS}}$, $\hat{\sigma}_u$, $\hat{\sigma}_y$ and set $i = 0$, $\hat{\bar{\theta}}^i_{\mathrm{BBELS}} = \hat{\bar{\theta}}_{\mathrm{LS}}$, $\hat{\sigma}^i_{\tilde{\rho}} = 0$

while $i < I_{\max}$ do

2. $i = i + 1$

3. Solve

$$\begin{bmatrix} \hat{\sigma}^i_{\tilde{y}} \\ \hat{\sigma}^i_{\tilde{u}} \end{bmatrix} = \begin{bmatrix} h^T \Sigma_{11} \hat{\bar{a}}^{i-1}_{\mathrm{BBELS}} & h^T \Sigma_{12} \hat{b}^{i-1}_{\mathrm{BBELS}} \\ 1 + \hat{\bar{a}}^T_{\mathrm{LS}} \hat{\bar{a}}^{i-1}_{\mathrm{BBELS}} & \hat{b}^T_{\mathrm{LS}} \hat{b}^{i-1}_{\mathrm{BBELS}} \end{bmatrix}^{-1} \begin{bmatrix} -h^T \hat{\bar{a}}_{\mathrm{LS}} - \hat{\sigma}^{i-1}_{\tilde{\rho}} h^T \Sigma_{13} \hat{\eta}^{i-1}_{\mathrm{BBELS}} \\ \frac{1}{L} V(\hat{\bar{\theta}}_{\mathrm{LS}}) - \hat{\sigma}^{i-1}_{\tilde{\rho}} \hat{\eta}^T_{\mathrm{LS}} \hat{\eta}^{i-1}_{\mathrm{BBELS}} \end{bmatrix}$$

4. Calculate: $\hat{\sigma}^i_{\tilde{\rho}} = \hat{\sigma}_u \hat{\sigma}^i_{\tilde{y}} + \hat{\sigma}_y \hat{\sigma}^i_{\tilde{u}} - \hat{\sigma}^i_{\tilde{u}} \hat{\sigma}^i_{\tilde{y}}$

5. Compute: $\hat{\bar{\theta}}^i_{\mathrm{BBELS}} = \hat{\bar{\theta}}_{\mathrm{LS}} + \hat{\Sigma}_{\bar{\varphi}}^{-1} \Sigma^i_{\tilde{\bar{\varphi}}} \hat{\bar{\theta}}^{i-1}_{\mathrm{BBELS}}$

end

5. ONLINE BILINEAR BIAS-ELIMINATING LEAST SQUARES

Since Algorithm 1 is iterative, it can easily be transformed into a recursive scheme. This requires online updates of the expressions $\hat{\Sigma}_{\bar{\varphi}}^{-1}$, $V(\hat{\bar{\theta}}_{\mathrm{LS}})$, $\hat{\sigma}_u$ and $\hat{\sigma}_y$. The update of the matrix $\hat{\Sigma}_{\bar{\varphi}}^{-1}$ is carried out utilising the recursive LS (RLS) algorithm [24]. Denoting $P_k = (\hat{\Sigma}_{\bar{\varphi}})^{-1}$ and introducing the user-chosen $k_0 > n_a$, the recursive BBELS (RBBELS) algorithm is given by:

Algorithm 2 (RBBELS algorithm)

1. Set $k = k_0$, $P_k = 10^3 I_{n_\theta+1}$, $\hat{\sigma}^k_u = \frac{1}{k} \sum_{i=1}^{k} u_i^2$, $\hat{\sigma}^k_y = \frac{1}{k} \sum_{i=1}^{k} y_i^2$.
Compute $\hat{\bar{\theta}}^k_{\mathrm{LS}}$, and set $\hat{\bar{\theta}}^k_{\mathrm{BBELS}} = \hat{\bar{\theta}}^k_{\mathrm{LS}}$, $\hat{\sigma}^k_{\tilde{\rho}} = 0$

for $k = k_0 + 1, \ldots, N$

2. Calculate


$$L_k = P_{k-1} \bar{\varphi}_k \left(1 + \bar{\varphi}_k^T P_{k-1} \bar{\varphi}_k\right)^{-1}$$
$$\hat{\bar{\theta}}^k_{\mathrm{LS}} = \hat{\bar{\theta}}^{k-1}_{\mathrm{LS}} + L_k \left(y_k - \bar{\varphi}_k^T \hat{\bar{\theta}}^{k-1}_{\mathrm{LS}}\right)$$
$$P_k = P_{k-1} - L_k \bar{\varphi}_k^T P_{k-1}$$
$$V(\hat{\bar{\theta}}^k_{\mathrm{LS}}) = \left(y_k - \bar{\varphi}_k^T \hat{\bar{\theta}}^k_{\mathrm{LS}}\right)^2 + (k - n_a - 2) V(\hat{\bar{\theta}}^{k-1}_{\mathrm{LS}})$$

3. Solve

$$\begin{bmatrix} \hat{\sigma}^k_{\tilde{y}} \\ \hat{\sigma}^k_{\tilde{u}} \end{bmatrix} = \begin{bmatrix} h^T \Sigma^k_{11} \hat{\bar{a}}^{k-1}_{\mathrm{BBELS}} & h^T \Sigma^k_{12} \hat{b}^{k-1}_{\mathrm{BBELS}} \\ 1 + (\hat{\bar{a}}^k_{\mathrm{LS}})^T \hat{\bar{a}}^{k-1}_{\mathrm{BBELS}} & (\hat{b}^k_{\mathrm{LS}})^T \hat{b}^{k-1}_{\mathrm{BBELS}} \end{bmatrix}^{-1} \begin{bmatrix} -h^T \hat{\bar{a}}^k_{\mathrm{LS}} - \hat{\sigma}^{k-1}_{\tilde{\rho}} h^T \Sigma^k_{13} \hat{\eta}^{k-1}_{\mathrm{BBELS}} \\ \frac{1}{k - n_a - 1} V(\hat{\bar{\theta}}^k_{\mathrm{LS}}) - \hat{\sigma}^{k-1}_{\tilde{\rho}} (\hat{\eta}^k_{\mathrm{LS}})^T \hat{\eta}^{k-1}_{\mathrm{BBELS}} \end{bmatrix}$$

4. Calculate

$$\hat{\sigma}^k_u = \frac{k-1}{k} \hat{\sigma}^{k-1}_u + \frac{1}{k} u_k^2$$
$$\hat{\sigma}^k_y = \frac{k-1}{k} \hat{\sigma}^{k-1}_y + \frac{1}{k} y_k^2$$
$$\hat{\sigma}^k_{\tilde{\rho}} = \hat{\sigma}^k_u \hat{\sigma}^k_{\tilde{y}} + \hat{\sigma}^k_y \hat{\sigma}^k_{\tilde{u}} - \hat{\sigma}^k_{\tilde{u}} \hat{\sigma}^k_{\tilde{y}}$$

5. Compute: $\hat{\bar{\theta}}^k_{\mathrm{BBELS}} = \hat{\bar{\theta}}^k_{\mathrm{LS}} + P_k \Sigma^k_{\tilde{\bar{\varphi}}} \hat{\bar{\theta}}^{k-1}_{\mathrm{BBELS}}$

end
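For illustration, the RLS recursion of step 2 may be sketched in NumPy as follows; this is a generic textbook RLS update under assumed shapes, not the authors' implementation.

    # Sketch of the RLS update in step 2 of Algorithm 2 (NumPy, illustrative).
    import numpy as np

    def rls_step(theta, P, phi, y):
        """One recursion: gain L_k, parameter update and covariance update."""
        L = P @ phi / (1.0 + float(phi.T @ P @ phi))  # gain vector L_k
        e = y - float(phi.T @ theta)                  # prediction error
        theta = theta + L * e
        P = P - L @ (phi.T @ P)
        return theta, P

    n = 4
    theta, P = np.zeros((n, 1)), 1e3 * np.eye(n)      # P_{k0} = 10^3 I, step 1
    theta, P = rls_step(theta, P, phi=np.ones((n, 1)), y=0.7)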

Note that the recursive computation of the expression $V(\hat{\bar{\theta}}^k_{\mathrm{LS}})$ is based on the present estimate of the parameter vector $\hat{\bar{\theta}}^k_{\mathrm{LS}}$, i.e. at the time instance $k$. This introduces a systematic error to the expression $V(\hat{\bar{\theta}}^k_{\mathrm{LS}})$, which is propagated subsequently through the entire identification procedure. Therefore, the value of $V(\hat{\bar{\theta}}^N_{\mathrm{LS}})$, computed at the last time instant, will not in general be equal to the corresponding value obtained from the offline BBELS algorithm. Consequently, the estimates of the vector $\vartheta$ resulting from the online and offline algorithms will also differ. This issue is crucial in the initial period of the identification, as the estimates of $\hat{\bar{\theta}}_{\mathrm{LS}}$ are of very low quality and hence can have a significant effect on the accuracy as well as on the convergence of the entire recursive algorithm. In order to alleviate this problem, the offline expression can be used for the calculation of $V(\hat{\bar{\theta}}^k_{\mathrm{LS}})$ during the first $M$ recursions. Although this will not eliminate the total mismatch, the effect of the first $M - 1$ imprecise estimates of $\hat{\bar{\theta}}_{\mathrm{LS}}$ is removed and the accuracy of the estimated $\vartheta$ is improved. Another possibility could


be to utilise the online update of $V(\hat{\bar{\theta}}^k_{\mathrm{LS}})$ with the offline expression being used every $m$ recursions over a data window of a fixed length $l \ll N$. Moreover, it is also noted that, according to Lemmas 1 and 2, the computation of $\hat{\sigma}^k_{\tilde{u}}$ and $\hat{\sigma}^k_{\tilde{y}}$ (also in the case of the offline algorithm) involves the term $\hat{\sigma}^k_{\tilde{\rho}}$, which is approximated by its previous value, i.e. the value at the time instance $k - 1$. This procedure introduces an additional degree of approximation and hence also uncertainty to the overall RBBELS algorithm.

It must be stated that the BBELS algorithm, and hence also the RBBELS approach, is not always convergent, especially for low signal-to-noise ratios. An extensive study regarding the convergence of the BELS method for linear systems was presented in [7]. It was shown that the convergence of the BELS-based techniques depends not only on the values of the signal-to-noise ratios on the input and output but also on the particular model structure. Consequently, it is conjectured that the BBELS and thus the RBBELS approach are also subject to this property.

6. SIMULATION STUDIES

This section provides a numerical evaluation and comparison of the proposed RBBELS approach with the RLS and the offline BBELS algorithms. A SISO diagonal BS with $n_a = 2$, $n_b = n_\eta = 1$ is simulated for $N = 10000$ samples. The parameter vector to be identified is given by

$$\vartheta^T = \begin{bmatrix} 1.200 & 0.900 & 0.600 & 0.100 & 0.016 & 0.005 \end{bmatrix}. \quad (28)$$

The input sequence is white and uniformly distributed with $|u_{0_k}| < 0.354$. The selected values of the input and output noise variances correspond to signal-to-noise ratios of approximately 9 dB on both the input and the output. In the case of the BBELS algorithm the iterations are restricted to 10, i.e. $I_{\max} = 10$. The RBBELS approach is initialised with $k_0 = 100$. The results of the estimation procedure for a single particular realisation of the simulation are depicted in Figure 2 and Figure 3.

Considering Figure 2, a significant bias is seen on the estimates corresponding to the RLS. The BBELS approach obtained estimates of the model parameter vector relatively close to their true values. It is noted that the estimates of $\theta$ produced by the RBBELS converge to their offline counterparts over the successive recursions. Moreover, the RBBELS algorithm was able to achieve estimates virtually indistinguishable from the BBELS approach at the last recursion step, i.e. for $k = N$. Analogous findings are noted when considering Figure 3. These observations can be seen as an indication


Fig. 2. The results of the identification procedure using the RBBELS, BBELS and RLS algorithms (estimates of $a_1$, $a_2$, $b_1$ and $\eta_{11}$ versus $N$; series: true, BBELS, RBBELS, RLS).

supporting the appropriateness of the recursive scheme. Furthermore, the scattered values of the estimates of the vector $\vartheta$ in the initial part of the identification procedure can be related to the relatively low precision of the estimated variances of the input and output signals, i.e. $\hat{\sigma}_u$ and $\hat{\sigma}_y$, respectively.

7. CONCLUSIONS

An approach for the online identification of discrete time-invariant single-input single-output errors-in-variables diagonal bilinear systems has been presented. The technique is based on the extension of the bias compensated least squares and bias-eliminating least squares methods for bilinear systems. Since it is based on the least squares principle, the required computational burden is relatively low. The proposed method has been demonstrated via a numerical study. Further work could aim to relax the assumption referring to the whiteness of the input signal.


Fig. 3. The results of the identification procedure using the RBBELS, BBELS and RLS algorithms (estimates of the input/output signal and noise variances versus $N$; series: true, BBELS, RBBELS).

REFERENCES

[1] BURNHAM K. J., Self-tuning Control for Bilinear Systems, PhD thesis, Coventry Polytechnic, 1991.

[2] DUNOYER A., Bilinear Self-tuning Control and Bilinearisation of Nonlinear Industrial Systems, PhD thesis, Coventry University, 1996.

[3] DUNOYER A., BALMER L., BURNHAM K. J. and JAMES D. J. G., On the discretisation of single-input single-output bilinear systems, Int. J. of Control, vol. 68(2), pp. 361-372, 1997.

[4] EKMAN M., Identification of linear systems with errors in variables using separable nonlinear least squares, in Proc. of 16th IFAC World Congress, Prague, Czech Republic, 2005.

[5] EKMAN M., Modeling and Control of Bilinear Systems: Applications to the Activated Sludge Process, PhD thesis, Uppsala University, 2005.


[6] HONG M., SODERSTROM T. and ZHENG W. X., A simplified form of the bias-eliminating least squares method for errors-in-variables identification, IEEE Trans. on Automatic Control, vol. 52(9), pp. 1754-1756, 2007.

[7] HONG M., SODERSTROM T. and ZHENG W. X., Convergence properties of bias-eliminating algorithms for errors-in-variables identification, Int. J. of Adaptive Control and Signal Proc., vol. 19(9), pp. 703-722, 2005.

[8] KOTTA U. and MULLARI T., Equivalence of realizability conditions for nonlinear control systems, in Proc. of the Estonian Academy of Sciences. Physics. Mathematics, vol. 55(1), pp. 24-42, 2006.

[9] KOTTA U., NOMM S. and ZINOBER A. S. I., On state space realizability of bilinear systems described by higher order difference equations, in Proc. of 42nd IEEE Conf. on Decision and Control, vol. 6, pp. 5685-5690, 2003.

[10] LARKOWSKI T., LINDEN J. G., VINSONNEAU B. and BURNHAM K. J., Identification of dynamic errors-in-variables models for bilinear systems, in Proc. of 7th Int. Conf. on Technical Informatics, Timisoara, Romania, 2008.

[11] LARKOWSKI T., LINDEN J. G., VINSONNEAU B. and BURNHAM K. J., Recursive bias-compensating algorithm for the identification of dynamical bilinear systems in the errors-in-variables framework, in Proc. of 5th Int. Conf. on Informatics in Control, Automation and Robotics, Funchal, Madeira, Portugal, 2008.

[12] LARKOWSKI T., VINSONNEAU B. and BURNHAM K. J., Bilinear model identification in the errors-in-variables framework via the bias-compensating least squares, in IAR and ACD Int. Conf., Grenoble, France, 2007.

[13] MARKOVSKY I., WILLEMS J. C., VAN HUFFEL S. and DE MOOR B., Exact and Approximate Modeling of Linear Systems: A Behavioral Approach, Monographs on Mathematical Modeling and Computation, SIAM, 2006.

[14] MOHLER R. R., Nonlinear Systems: Applications to Bilinear Control, volume 2, Prentice Hall, Englewood Cliffs, NJ, 1991.

[15] MOHLER R. R. and KHAPALOV A. Y., Bilinear control and application to flexible a.c. transmission systems, J. of Optimization Theory and Applications, vol. 105(3), pp. 621-637, 2000.

[16] PEARSON R. K., Discrete-Time Dynamic Models, Oxford University Press, New York, USA, 1999.

[17] SODERSTROM T., Errors-in-variables methods in system identification, Automatica, vol. 43, pp. 939-958, 2007.

[18] SODERSTROM T., SOVERINI U. and MAHATA K., Perspectives on errors-in-variables estimation for dynamic systems, Signal Proc., vol. 82(8), pp. 1139-1154, 2002.

[19] VAN HUFFEL S. and LEMMERLING P., Total Least Squares and Errors-in-variables Modeling: Analysis, Algorithms and Applications, Kluwer Academic Publishers, The Netherlands, 2002.

[20] YU D., GOMM J. B., SHIELDS D. N., WILLIAMS D. and DISDELL K., Fault diagnosis for a gas-fired furnace using a bilinear observer method, in Proc. of American Control Conf., vol. 2, pp. 1127-1131, 1995.


[21] ZHENG W. X., On a least-squares-based algorithm for identification of stochastic linear systems, IEEE Trans. on Signal Proc., vol. 46, pp. 1631-1638, 1998.

[22] ZHENG W. X., Transfer function estimation from noisy input and output data, Int. J. of Adaptive Control and Signal Proc., vol. 12, pp. 365-380, 1998.

[23] ZHENG W. X., A bias correction method for identification of linear dynamic errors-in-variables models, IEEE Trans. on Automatic Control, vol. 47, pp. 1142-1147, 2002.

[24] LJUNG L., System Identification - Theory for the User, Prentice Hall PTR, New Jersey, USA, 1999.


Computer Systems Engineering 2008

Keywords: network optimization, heuristic

Krzysztof LENARSKI*, Andrzej KASPRZAK*, Piotr SKWORCOW†

ADVANCED TABU SEARCH STRATEGIES FOR TWO-LAYER NETWORK DIMENSIONING PROBLEM

This paper concerns the use of tabu search-based strategies to solve a two-layer network dimensioning problem. A modular case of the problem in non-bifurcated networks is presented. Since this problem is NP-hard (its decision version is NP-complete), it is highly unlikely that it can be solved in a reasonable time (for large networks) using exact methods. The main goal of this paper is to examine advanced tabu search strategies such as long term memory and return jump methods. A computer experimentation system has been developed to carry out simulations and complex experiments. Example results of experiments are demonstrated and discussed.

1. INTRODUCTION

Computer networks are becoming more and more utilized due to the increasing popularity of applications that require high bandwidth. Because of the growing number of network users it is important for a network to be designed properly and in a reliable manner [1]. Significant attention is given to network design issues. Improving the functional quality of a network by just a few percent may reduce the costs of leasing by even thousands of dollars per month [2].

The resources (links and nodes) of communication and computer networks are configured in a multi-layered fashion, forming a hierarchical structure with each layer being a proper network on its own. The links of an upper layer are formed using paths of the lower layer, and this pattern repeats as one goes down the resources hierarchy. This may result in a network providers hierarchy, i.e. some providers may own resources only at one or two neighbouring layers of a network [3].

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.
† Water Software Systems, De Montfort University, Leicester, UK.


Emerging architectures and technologies for Next Generation Internet (NGI) core networks introduce a whole spectrum of new functions and features. The Internet Protocol (IP), enhanced with MPLS traffic engineering capabilities, is being used today and is foreseen to be implemented in NGI networks. Furthermore, recent advances in wavelength division multiplexing (WDM) technology make it a strong candidate for the basic technology in the next generation optical transport network (OTN). Thus one of the possible architectures for NGI is IP/MPLS-over-(D)WDM [4].

In this work a modular case of two-layer dimensioning of networks with non-bifurcated flows is considered. This is an NP-complete problem, and the path-link formulation results in a large routing list that has to be predefined [5]. This paper concerns a tabu search (TS) approach to solving the design problem of two-layer connection-oriented networks. The proposed algorithm is a meta-heuristic which guides a local heuristic search procedure to explore the solution space beyond local optimality. Tabu search is based on the concept that problem solving, in order to qualify as intelligent, must incorporate adaptive memory and responsive exploration. Further information about this topic can be found in e.g. [6-9]. The main goal of this work is to examine tabu search in order to determine the most efficient parameters.

The rest of this paper is organized as follows. Section 2 contains the problem formulation. Section 3 is an overview of the proposed solution. Section 4 presents the developed experimentation system. In Section 5 the results of the investigations are presented and discussed. Finally, Section 6 contains final remarks.

Fig. 1. Two-layer network example


2. PROBLEM STATEMENT

A network model is represented as two undirected finite graphs G = (N, L), where N is a set of nodes and L is a set of graph edges. An example of a two-resource-layer network is shown in Fig. 1. The demands are directed from one node to another and the sum of demands in both directions needs to be lower than or equal to the link capacity. The network is also characterised by constants such as the volume of demands, the link capacities in the upper and lower layer, and also their costs. The problem can be formulated as follows:

Two-layer dimensioning – Link-Path formulation

indices
d = 1, 2, ..., D    demands,
p = 1, 2, ..., P_d  candidate paths in the upper layer for flows realizing demand d,
e = 1, 2, ..., E    links of the upper layer,
q = 1, 2, ..., Q_e  candidate paths in the lower layer for flows realizing link e,
g = 1, 2, ..., G    links of the lower layer.

constants
h_d       volume of demand d,
δ_{edp}   = 1 if link e of the upper layer belongs to path p realizing demand d; 0 otherwise,
M         size of the link capacity module in the upper layer,
ξ_e       cost of one (M-module) capacity unit of link e of the upper layer,
γ_{geq}   = 1 if link g of the lower layer belongs to path q realizing link e of the upper layer; 0 otherwise,
N         size of the link capacity module in the lower layer,
κ_g       cost of one (N-module) capacity unit of link g of the lower layer.


variables
xdp     flow allocated to path p realizing the volume of demand d
udp     binary variable associated with path p realizing demand d
ye      number of M-module capacity units of link e
zeq     flow allocated to path q realizing the capacity of link e
req     binary variable associated with path q realizing link e
ug      number of N-module capacity units of lower layer link g

objective

minimize F = Σe ξe ye + Σg κg ug    (1)

constraints

Σp udp = 1,   d = 1, 2, ..., D    (2)

Σd Σp δedp xdp ≤ M ye,   e = 1, 2, ..., E    (3)

Σq zeq = ye,   e = 1, 2, ..., E    (4)

Σq req = 1,   e = 1, 2, ..., E    (5)

M Σe Σq γgeq zeq ≤ N ug,   g = 1, 2, ..., G    (6)

Equation (1) is the objective function, defined as the sum of the costs of lower and upper layer capacity modules. Equations (2)-(6) represent the constraints: equations (2) and (5) state that each demand must be realized on a single path (the flows are non-bifurcated), while equations (3) and (6) state that the flow in each link of the upper and lower layer cannot exceed the link capacity. Equation (4) specifies that all upper layer capacities have to be realized by the lower layer flows.
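To make the modular dimensioning concrete, the following sketch (a minimal Python illustration with hypothetical data, not the authors' implementation) evaluates objective (1) for given link loads, rounding each load up to whole M- and N-modules as constraints (3) and (6) require:

import math

# Hypothetical instance: two upper-layer links, two lower-layer links.
M, N = 10, 25             # module sizes in the upper and lower layer
xi = {1: 4.0, 2: 6.0}     # ξe: cost of one M-module on upper-layer link e
kappa = {1: 9.0, 2: 5.0}  # κg: cost of one N-module on lower-layer link g

def modules(load, size):
    """Smallest number of capacity modules covering a given load."""
    return math.ceil(load / size)

def objective(upper_load, lower_load):
    """Objective (1): F = Σe ξe ye + Σg κg ug, with the module counts
    ye, ug implied by the link loads via constraints (3) and (6)."""
    y = {e: modules(l, M) for e, l in upper_load.items()}
    u = {g: modules(l, N) for g, l in lower_load.items()}
    return sum(xi[e] * y[e] for e in y) + sum(kappa[g] * u[g] for g in u)

# Loads produced by some non-bifurcated flow allocation (hypothetical):
print(objective({1: 17.0, 2: 8.0}, {1: 30.0, 2: 17.0}))
# y = {1: 2, 2: 1}, u = {1: 2, 2: 1}, so F = 2*4 + 1*6 + 2*9 + 1*5 = 37.0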


3. PROPOSED SOLUTION METHODS

Having determined paths for all demands, we can try to find a solution of the stated problem using the tabu search algorithm. The following subsections describe the neighborhood structure and moves, the tabu search memories, and the tabu structure.

3.1. NEIGHBORHOOD STRUCTURE AND MOVES

The neighborhood in our case is a list of objects (see the example in Table 1) that contain information such as:
• start node of the demand (Start N),
• end node of the demand (End N),
• path number (P.N.),
• list of specified paths connecting the above nodes (Path),
• tabu tenure parameter (TT).

The paths are divided into two sets: active (A) and inactive (I). The active paths are considered in the current solution, and the inactive ones are waiting in a queue to be considered. The moves that define the neighborhood structure consist of transferring a chosen path from one subset to the other.
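A minimal sketch of such a neighborhood object and move (a hypothetical Python rendering; the original system was implemented in C#, see Section 4):

from dataclasses import dataclass, field

@dataclass
class NeighborhoodObject:
    # One demand with its candidate paths split into the active (A)
    # and inactive (I) subsets; a tabu tenure value is kept per path.
    start_node: int
    end_node: int
    active: set = field(default_factory=set)    # paths in the current solution
    inactive: set = field(default_factory=set)  # paths waiting in the queue
    tenure: dict = field(default_factory=dict)  # path number -> remaining tabu iterations

    def move(self, path_no, tabu_tenure):
        # Transfer the chosen inactive path into A; since flows are
        # non-bifurcated, the previously active path goes back to I and
        # re-activating it is forbidden for tabu_tenure iterations.
        old = self.active.pop()
        self.inactive.add(old)
        self.inactive.remove(path_no)
        self.active.add(path_no)
        self.tenure[old] = tabu_tenure

# Demand between nodes 1 and 3 with the six candidate paths of Table 1:
d = NeighborhoodObject(1, 3, active={1}, inactive={2, 3, 4, 5, 6})
d.move(5, tabu_tenure=3)   # path 5 becomes active; path 1 is tabu for 3 iterations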

3.2. TABU SEARCH MEMORIES

The core of the algorithm is the use of TS memory structures to guide the search process. The short term memory is employed principally to prevent the search from being trapped in a local optimum, and also to introduce vigor into the search process. The long term memory is used to handle more advanced issues, such as intensification and diversification.

Table 1. Neighborhood object

Start N.   End N.   P.N.   Path               TT
1          3        1      <1, 2, 4>          3
                    2      <1, 5, 4>          0
                    3      <1, 2, 3, 4>       1
                    4      <1, 2, 5, 4>       0
                    5      <1, 5, 2, 4>       0
                    6      <1, 5, 2, 3, 4>    5


Short term memory. The short term memory operates by imposing restrictions on the composition of a newly generated solution. For elementary moves, we impose restrictions which ensure that a move cannot be "reversed". For example, if one of the paths is moved from set I to set A, it is noted in the tabu structure that this move cannot be performed for some period of time. This prevents the algorithm from oscillating around a local optimum. A parameter called tabu tenure identifies the number of iterations for which a particular tabu restriction remains in force. The tabu tenure parameter is examined in Section 5.
Long term memory. The long term memory is used to store the frequency of moves in a specified direction. Changing one route of the demand between nodes x and y increases a parameter called "frequency", connected with this demand, by 1. The long term memory can be used in two ways: either to intensify frequent moves or to diversify them. In our implementation it is used for diversification.

3.3. TABU STRUCTURE

Short and long term memory are implemented as a special structure called the Tabu Structure. It stores two parameters: tabu tenure and frequency. Tabu tenure is stored in a field of every neighborhood object (see Table 1). Frequency is stored for every demand, as shown in Fig. 2.
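A compact sketch of how the two memories could be stored together (hypothetical Python field names; the original Tabu Structure keeps tenure in the neighborhood objects of Table 1 and frequency per demand as in Fig. 2):

class TabuStructure:
    # Short term memory: per-(demand, path) tabu tenure counters.
    # Long term memory: per-demand move frequency, used for diversification.
    def __init__(self):
        self.tenure = {}     # (demand, path) -> iterations the move stays tabu
        self.frequency = {}  # demand -> number of route changes so far

    def forbid(self, demand, path, tt):
        self.tenure[(demand, path)] = tt

    def is_tabu(self, demand, path):
        return self.tenure.get((demand, path), 0) > 0

    def record_move(self, demand):
        # long term memory: one more route change for this demand
        self.frequency[demand] = self.frequency.get(demand, 0) + 1

    def next_iteration(self):
        # age the short term memory: decrease all active tenures
        for key in list(self.tenure):
            self.tenure[key] -= 1
            if self.tenure[key] <= 0:
                del self.tenure[key]

Diversification can then, for example, penalize moves on demands whose frequency counters are already high.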

Fig. 2. Example of tabu structure


4. EXPERIMENTATION SYSTEM

In order to carry out investigations for the stated problem, a computer experimentation system (CES) was developed. The CES was implemented in the C# language using the Visual Studio 2005 environment with .NET technology [11, 12]. The system consists of several modules, which are responsible for different functionality. The application allows the user to design complex experiments using the experiment design module, to find paths and solutions for the stated problem, including time measurements, using the computational module, and finally to compare the results on graphs using the presentation module.
Experiment design module. In this module the user can enter the network data and algorithm parameters. The data describing the network structure and the intensity matrix [13] can be entered manually or loaded from a file. The algorithm parameters can only be entered manually.
Computational module. This module is an implementation of the path finding and tabu search algorithms.
Presentation module. This module is responsible for the presentation of all results obtained during experiments. One can summarize the results on a chart and save them to either a txt or xls file.

5. INVESTIGATIONS

Four two-layer networks were considered during the investigations; their parameters are summarised in Table 2. All of these networks bear some similarity to real, existing networks. The lower layers are problem instances from SNDlib 1.0, the library of test instances for fixed telecommunication network design [14]. The upper layers are composed from the lower layers, with some nodes randomly removed. The demands and the sizes of modules M used in the investigations were also taken from SNDlib. For the fourth network there was no problem instance with modular capacity, hence some arbitrary values, given in Table 3, were used. It was assumed that each link has the same module size and cost, both in the upper and in the lower layer.
In this paper the results of three experiments are presented. In the first experiment the number of iterations was investigated. The second part considered how the results were affected by different tabu tenure values. Finally, the third one investigated the use of advanced strategies. All the investigations were conducted on each network and repeated 10 times to make the obtained results more representative. The experiments were conducted on a PC with an AMD Sempron 2500+ (1.8 GHz) processor and 512 MB of RAM.


Table 2. Network parameters

Network     Nodes (upper / lower / demand layer)   Links (upper / lower)   Demands
atlanta     13 / 15 / 13                           17 / 22                 210
poland      11 / 12 / 11                           15 / 17                 66
dfn-bwin     9 / 10 /  9                           36 / 45                 90
di-yuan     10 / 11 / 10                           28 / 42                 22

5.1. EXPERIMENT 1. NUMBER OF ITERATIONS

Experiments 1 and 2 investigated how the results depend on different tabu search parameters. The number of algorithm iterations was increased while observing its influence on the results and on the computation time. Results for the "atlanta" network are shown in Fig. 3, and for the other networks in Table 4.

Table 3. Module capacities and costs

Name        M      Cost M    N      Cost N
atlanta     1000   950000    4000   1090000
poland      155    272       622    816
dfn-bwin    1000   44400     4000   54400
di-yuan     2      4         100    150


Fig. 3. Experiment 1: Cost and computation times for different numbers of iterations for the "atlanta" network.

As expected, a lower cost was obtained when more iterations were performed. In the first part of the chart in Fig. 3 it can be noticed that the cost decreased steeply, but after exceeding 300 iterations further improvements became insignificant.

5.2. EXPERIMENT 2. TABU TENURE

The results of the investigation of how different tabu tenure values affect the outcome are summarised in Table 5.

Table 4. Experiment 1. Results

NoI    poland      dfn-bwin    di-yuan
10     8,25E+06    1,86E+08    1,33E+05
20     7,36E+06    1,55E+08    1,10E+05
30     6,13E+06    1,51E+08    9,52E+04
40     5,22E+06    1,33E+08    9,05E+04
50     4,89E+06    1,16E+08    8,08E+04
60     4,44E+06    1,08E+08    7,72E+04
70     3,73E+06    9,87E+07    6,98E+04
80     3,09E+06    8,97E+07    6,64E+04
90     2,88E+06    8,78E+07    6,27E+04
100    2,69E+06    8,50E+07    6,22E+04


The tabu tenure parameter had no influence on the algorithm computation time. It can be noticed in Table 5 that for the considered networks the best results were achieved for a tabu tenure equal to 4 or 5. Tabu tenure is a crucial parameter for tabu search methods; it was observed that in the worst cases a badly chosen tabu tenure value increased the cost by 13%.

5.3. EXPERIMENT 3. ADVANCED STRATEGIES

This experiment investigated how the use of the advanced strategies influences the results. As can be seen in Fig. 4, using diversification did not improve the results, and the cost was higher than in the other cases by approximately 30%. The best results were obtained using intensification, with or without back jump, or when the long term memory was not used at all.

6. CONCLUSIONS

This paper investigated the use of tabu search-based strategies to solve a two-layer network dimensioning problem. The main goal of this paper was to examine the impact of advanced tabu search strategies, and a computer experimentation system has been developed for this purpose.

Table 5. Experiment 2. Results

TT    atlanta     poland      dfn-bwin    di-yuan
1     5,09E+08    8,41E+07    2,13E+08    5,80E+04
2     4,88E+08    8,45E+07    2,04E+08    6,16E+04
3     5,12E+08    8,23E+07    2,06E+08    6,08E+04
4     4,82E+08    7,96E+07    2,14E+08    5,63E+04
5     4,90E+08    7,67E+07    2,03E+08    5,80E+04
6     5,44E+08    8,13E+07    2,17E+08    5,72E+04
7     5,26E+08    7,81E+07    2,14E+08    5,64E+04
8     5,36E+08    7,41E+07    2,13E+08    5,78E+04
9     4,97E+08    7,48E+07    2,19E+08    6,04E+04
10    4,88E+08    7,87E+07    2,12E+08    6,06E+04


Fig. 4. Experiment 3. Results

Based on the simulation results it may be observed that: 1) the number of iterations (NoI) has a significant influence on results; the more iterations, the better the result (i.e. the lower the cost), however after a certain NoI, depending on network size, a further increase of NoI does not reduce the cost significantly but increases the computation time; 2) the best value of the tabu tenure parameter depends on the network structure and has no effect on computation time; 3) when using the long term memory, the best results were achieved with intensification, and the back jump had insignificant impact on the results in the cases considered.

Further work will include extending the capabilities of the computer experimentation system by adding new problem instances and implementing other metaheuristic methods.

REFERENCES

[1] LENARSKI K. and KOSZALKA L., Comparison of heuristic methods applied to optimization of computer networks, XVI International Conference on Systems Science, Wroclaw, Poland, 4-6 September, 2007.

[2] TANENBAUM A.S., Computer Networks (in Polish), Warszawa, 1988.

[3] PIORO M. and MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, San Francisco: Morgan-Kaufmann, 2004.

[4] KUBILINSKAS E. and PIORO M., An IP/MPLS over WDM network design problem, International Network Optimization Conference (INOC) 2005, Lisbon, Portugal, 20-23 March, 2005.

[5] KUBILINSKAS E., Notes on application of Path Generation to Multi-layer network design problems with PF flow allocation, 17th Nordic Teletraffic Seminar (NTS 17), Fornebu, Norway, 25-27 August, 2004.

[6] GLOVER F., Tabu search fundamentals and uses, Colorado, 1995.

[7] GLOVER F., Tabu Search - Part I, ORSA Journal on Computing, Vol. 1, No. 3, pp. 190-206, 1989.

[8] GLOVER F., Tabu Search - Part II, ORSA Journal on Computing, Vol. 2, No. 1, pp. 4-32, 1990.

[9] GLOVER F., XU J. and CHIU S.Y., Probabilistic Tabu Search for Telecommunications Network Design, Combinatorial Optimization: Theory and Practice, Vol. 1, No. 1, 1997, pp. 69-94.

[10] DROZDEK A., C++. Algorithms and data structures (in Polish), Gliwice, 2004.

[11] STEFACZYK A., Secrets of C sharp (in Polish), Gliwice, 2005.

[12] PERRY S. C., Core C sharp and .NET, Gliwice, 2006.

[13] KASPRZAK A., Wide area networks with packet switching (in Polish), Wroclaw, 1999, pp. 173-216.

[14] Survivable Network Design Library [Online, 2008]. http://sndlib.zib.de/home.action


Computer Systems Engineering 2008

Keywords: optimization, flow allocation, dimensioning problem, multilayer network

Michał KUCHARZAK* Leszek KOSZAŁKA* Andrzej KASPRZAK*

OPTIMIZATION ALGORITHMS FOR TWO LAYER NETWORK DIMENSIONING

Modern computer networks utilise the integration of different technologies. This has led to a multilayered network model, raised new questions and challenged most of the existing optimization algorithms developed for a single layer. This paper considers flow allocation in multilayered networks and investigates the dimensioning of networks involving two layers of resources. Two different strategies for dimensioning network link capacities, based on Dijkstra's shortest path algorithm, are compared. The first strategy performs the dimensioning directly from the mean traffic load in the individual layers, from the top layer to the bottom layer of the multilayered network. The second approach considers the actual flow and the multilayered path, taking into account the coordination of the individual network layers. Experimentation results and the comparison of the strategies are illustrated and discussed.

1. INTRODUCTION

Modern communication networks can be composed of more than one layer of resources. Multi-layer technology has evolved with many different multi-layering possibilities; e.g., IP networks can be provided over ATM, MPLS, SONET or WDM. In fact, it is possible to have more than two layers, e.g. an IP over ATM over SONET network [8]. Unlike regular networks, these multilayer networks allow users and other networks to interface on different technology layers.

The introduction of multilayered network models has made most network optimization problems more computationally demanding. Furthermore, many existing, well-understood optimisation problems on single layer networks turned out to be far from trivial on multilayered networks [3].

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


Fig. 1. Two-layer network example

Problems defined on multilayered networks are more difficult to solve than those stated for flat networks [4]. The integration of different technologies such as ATM, SDH and WDM in multilayer transport networks raises many questions regarding the coordination of the individual network layers [1]. In this work the authors compare two deterministic methodologies for off-line solving of dimensioning problems for multilayered networks involving two layers of resources. The first approach performs flow allocation directly from the mean traffic load in the individual layers, from the top layer to the bottom of the multilayered network; the second approach considers the actual flow and the multilayered path, taking into account the coordination of the individual network layers.

This paper is divided into six main sections. The main idea and a model of multilayered networks are discussed in Section 2, supported by examples and related illustrations. In the same section a logical model of multilayer networks involving two layers of resources is introduced. Section 3 presents the dimensioning optimization problem statement with particular assumptions and constraints. Section 4 describes two deterministic approaches for solving dimensioning problems. The investigations are described in Section 5, which illustrates the basic principles and the comparison of the algorithms with evaluation results. Final remarks appear in Section 6.

2. MULTILAYER NETWORK

2.1 CONCEPT OF MULTILAYER NETWORKS

Multilayer networks are networks composed of more than one layer of resources [5, 8]. In Fig. 1 the idea of a two-layer network is illustrated as an example. The example shows a network which consists of two layers: a traffic network and a transport network.


The traffic network represents a logical capacity to carry the traffic (the tagged links d are logical). To route these logical links and the associated capacity it is necessary to introduce and set up a transport network. For example, the logical link d=1, between nodes A and C, in the traffic network illustrated in Fig. 1 can be connected with the transport network route A-B-C (links e=1, e=2). Similarly, a data unit for the logical link d=6, between nodes A and D, can be connected via the transport route A-B-F-D (links e=1, e=5, e=4). The logical capacities in the traffic layer are realized by means of flows in the physical transport layer.

2.2 MULTILAYER NETWORKS MODELLING

Consider the network example illustrated in Fig. 2 and the model presented in [8]. The network consists of two layers of resources (layer 1: equipment, transport layer; layer 2: Virtual Capacity layer) and an additional auxiliary layer (demand layer) used merely to specify the demands (logical capacity to carry the traffic).

Fig. 2. Three-layer network model

For each demand d its demand volume hd is realized by means of flows xdp assigned to paths Pd of Layer 2 (Fig. 3). In accordance with [8], Pd = (Pd1, Pd2, ..., PdPd) denotes the Layer 2 candidate link-path list for demand d, while Qe = (Qe1, Qe2, ..., QeQe) denotes the Layer 1 candidate link-path list for link e. Examples of candidate paths Pd and Qe for the network presented in Fig. 2 are shown in Table 1¹.

Fig. 3. Demand d=6 of volume h6 realized by means of bifurcated flow x6 on three paths from path list P6 in the VC layer

The resulting loads of flows xdp on each link e of Layer 2 determine the link capacity vector y of the layer (as in the example in Fig. 3). The next step is analogous. The capacity of each link e in Layer 2 is realized by means of flows in Layer 1, and the resulting Layer 1 flows zeq determine the load of each link g of Layer 1, and hence its capacity ug. Unit costs (ξe, κg) define the cost of transporting data on each link in each layer. The considered multilayer network model is based on the following assumption: if a node appears in an upper layer, then it automatically appears in all layers below.

In a nutshell, the resources (links and nodes) of communication and computer networks are configured in a multi-layered fashion, forming a hierarchical structure with each layer being a proper network on its own. The links of an upper layer are formed using paths of the lower layer, and this pattern repeats as one goes down the resources hierarchy [8].

Actual flow in a multilayer network is assigned to a multilayered path that describes a path throughout the entire network.

¹ The routing lists do not necessarily contain all possible paths.

Table 1. Candidate Path Lists

Layer 2 candidate path lists:
d = 1:  P11 = 1;  P12 = 3, 4, 2;  P13 = 5, 2
d = 2:  P21 = 1;  P22 = 1, 3, 4;  P23 = 1, 5
d = 3:  P31 = 3;  P32 = 5, 4
d = 4:  P41 = 3;  P42 = 3, 5;  P43 = 3, 1, 2
d = 5:  P51 = 3, 1;  P52 = 4, 2;  P53 = 3, 5, 2
d = 6:  P61 = 5;  P62 = 3, 4;  P63 = 1, 2

Layer 1 candidate path lists:
e = 1:  Q11 = 1;  Q12 = 2, 6, 5, 4;  Q13 = 2, 6, 7, 3
e = 2:  Q21 = 4, 5;  Q22 = 3, 7;  Q23 = 1, 2, 6
e = 3:  Q31 = 2;  Q32 = 1, 3, 7, 6;  Q33 = 1, 4, 5, 6
e = 4:  Q41 = 6;  Q42 = 2, 1, 3, 7
e = 5:  Q51 = 2, 6;  Q52 = 1, 4, 5


3. NETWORK DIMENSIONING PROBLEM

The network dimensioning problem concerns finding flow allocations in the upper layer (VC layer) and in the lower layer (equipment), as well as the capacities of links in both layers, given the costs of a capacity unit of a link for data transfer in both layers. According to the multilayer network model described in Section 2 and the considerations in [8], the dimensioning problem formulation for a network containing two layers of resources can be stated as follows:

A two-layer network dimensioning problem

indices
d = 1, 2, ..., D     demands
p = 1, 2, ..., Pd    candidate paths in upper layer for flows realizing demand d
e = 1, 2, ..., E     links of upper layer
q = 1, 2, ..., Qe    candidate paths in lower layer for flows realizing link e
g = 1, 2, ..., G     links of lower layer

constants
hd      volume of demand d
δedp    = 1 if link e belongs to path p realizing demand d, 0 otherwise
ξe      cost of one capacity unit of link e
γgeq    = 1 if link g belongs to path q realizing capacity of link e, 0 otherwise
κg      cost of one capacity unit of link g

variables
xdp     flow allocated to path p of demand d
ye      capacity of link e in the VC layer
zeq     flow allocated to path q realizing capacity of link e
ug      capacity of (physical, transport) link g in the lower layer

objective

minimize F = Σe ξe ye + Σg κg ug    (1)

constraints

Σp xdp = hd,   d = 1, 2, ..., D    (2)

Σd Σp δedp xdp ≤ ye,   e = 1, 2, ..., E    (3)

Σq zeq = ye,   e = 1, 2, ..., E    (4)

Σe Σq γgeq zeq ≤ ug,   g = 1, 2, ..., G    (5)

Constraint (2) states that the flows in the VC layer realize the assumed demand volumes, and (3) defines the required capacity of each link e. By analogy, ye determines the demand for the lower layer that must be realized by means of the flows zeq (4), and formula (5) specifies the lower layer capacity constraint.

4. ALGORITHMS

Both presented approaches, the Top-to-Bottom and the Flat Cost strategies, are based on the well-understood Dijkstra shortest path algorithm [2].

4.1 TOP-TO-BOTTOM OPTIMIZATION

The Top-to-Bottom (TtB) optimization approach is a simple algorithm based on the assumption that all resource layers (VC and lower) are networks in their own right, with their own individual demands. Flow allocation is performed separately for each layer, from the uppermost layer to the bottom one.

This approach can be considered as separate flow problems for each layer, combined in such a way that the upper layer imposes demands on the neighbouring lower layer (Fig. 4a and Fig. 4b). An exact formulation of the TtB algorithm can be stated as follows:

Fig. 4a. The defined volume of demand hd is the demand to be routed directly in the VC layer.

Fig. 4b. The capacity ye of each link e determines the demand to be routed in the transport layer


TtB pseudo-code:

FOR EACH d:
    Pd := Pd1, where Pd1 is derived from Dijkstra's shortest path
    xd1 := hd   (demand realized as a non-bifurcated flow)
    FOR EACH e:
        IF δed1 = 1 THEN ye := ye + xd1
FOR EACH e:
    Qe := Qe1, where Qe1 is derived from Dijkstra's shortest path
    ze1 := ye
    FOR EACH g:
        IF γge1 = 1 THEN ug := ug + ze1
RETURN F
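The pseudo-code above translates almost directly into executable form. The sketch below is a minimal illustration, not the authors' code; it uses the third-party networkx library for Dijkstra's algorithm, represents each layer as a graph with a 'cost' edge attribute, and ends with a hypothetical toy instance:

import networkx as nx
from collections import defaultdict

def top_to_bottom(upper, lower, demands):
    # Route every demand on its shortest upper-layer path, then route
    # the accumulated link loads y_e on shortest lower-layer paths.
    # demands: dict {(s, t): volume h_d}.
    y = defaultdict(float)                     # load of each upper-layer link e
    for (s, t), h in demands.items():
        path = nx.dijkstra_path(upper, s, t, weight="cost")
        for e in zip(path, path[1:]):          # non-bifurcated: whole h_d on one path
            y[frozenset(e)] += h
    u = defaultdict(float)                     # load of each lower-layer link g
    for e, load in y.items():
        a, b = tuple(e)
        q = nx.dijkstra_path(lower, a, b, weight="cost")   # upper nodes reappear below
        for g in zip(q, q[1:]):
            u[frozenset(g)] += load
    return dict(y), dict(u)

upper = nx.Graph()
upper.add_weighted_edges_from([("A", "B", 1), ("B", "C", 1), ("A", "C", 3)], weight="cost")
lower = nx.Graph()
lower.add_weighted_edges_from([("A", "B", 2), ("B", "C", 2), ("A", "C", 2)], weight="cost")
y, u = top_to_bottom(upper, lower, {("A", "C"): 1.0})   # routes A-B-C in the VC layer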

4.2 FLAT COST OPTIMIZATION

The Flat Cost (FC) optimization approach considers all layers (VC and lower) as one integrated network. Flow allocation is performed not on the cheapest path in each layer separately, but on the cheapest multilayered path throughout the entire multilayered network.

Dijkstra's algorithm is well defined for a network composed of a single layer of resources. The FC strategy treats a two-layer network as a flat (single-layer) artificial network composed of the nodes of the VC layer and links e represented by the lower layer topology (Fig. 5), with a new cost assigned to each link e.

The Flat Cost methodology introduces a Flat Cost Coefficient (FCC), ξe + βe. The FCC describes the total cost of realizing a data unit on link e in the VC layer together with its shortest realizing path in the lower layer.

Let q = 1, so that the path list reduces to Qe = Qe1, where Qe1 is Dijkstra's shortest path. The FCC is stated as ξe + βe, where ξe is the cost of one capacity unit of link e and:

βe = Σg γge1 κg    (6)


Fig. 5. Each link e is represented as the network topology of the lower layer (the new artificial network is flat, with new link costs ξe + βe, and Dijkstra's algorithm can be performed easily). In the example links e=1 and e=7 are depicted; other links are prepared in the same way.

FC pseudo-code:

FOR EACH e:
    Qe := Qe1, where Qe1 is derived from Dijkstra's shortest path
    FOR EACH g:
        IF γge1 = 1 THEN βe := βe + κg   (realization cost of load ye on path Qe1)
FOR EACH d:
    Pd := Pd1, where Pd1 is derived from Dijkstra's shortest path
          with respect to the link costs ξe + βe
    xd1 := hd
    FOR EACH e:
        IF δed1 = 1 THEN ye := ye + xd1
FOR EACH e:
    ze1 := ye
    FOR EACH g:
        IF γge1 = 1 THEN ug := ug + ze1
RETURN F
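Analogously, the FCC computation of formula (6) can be sketched as follows (again a minimal networkx-based illustration with the hypothetical toy graphs of the previous sketch; demands are then routed with Dijkstra's algorithm on the combined weight ξe + βe):

import networkx as nx

def with_flat_cost(upper, lower):
    # Annotate every upper-layer link e with the Flat Cost Coefficient
    # xi_e + beta_e, where beta_e is the cost of the shortest lower-layer
    # path realizing e (formula (6)).
    for a, b, data in upper.edges(data=True):
        beta = nx.dijkstra_path_length(lower, a, b, weight="cost")
        data["fcc"] = data["cost"] + beta
    return upper

upper = nx.Graph()
upper.add_weighted_edges_from([("A", "B", 1), ("B", "C", 1), ("A", "C", 3)], weight="cost")
lower = nx.Graph()
lower.add_weighted_edges_from([("A", "B", 2), ("B", "C", 2), ("A", "C", 2)], weight="cost")
path = nx.dijkstra_path(with_flat_cost(upper, lower), "A", "C", weight="fcc")
# FCCs: A-B = 3, B-C = 3, A-C = 5, so the direct VC link A-C (total cost 5)
# wins over A-B-C (total cost 6), unlike in the TtB sketch above.

On this toy instance FC deliberately picks the locally more expensive VC link because it is globally cheaper, which is exactly the effect observed in Section 5.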


5. INVESTIGATIONS

5.1 PROBLEM INSTANCES

The examination of the algorithms is performed on three hypothetical multilayered networks. Each of them contains a demand layer with the same topology (Fig. 6).

Fig. 6. Layer 3: demand layer; the transport layer is shown in the background

Fig. 7. Layer 1: Transport layer scenario

Transport layer 1, with the costs of links g (Fig. 7), is adapted from the core telecommunication network in Poland originated by Polish Telecom [7]. Fig. 7 also illustrates three different topologies of Virtual Capacity layers with the unit costs of links e.

Fig. 7. Layer 2: different Virtual Capacity layer scenarios; the transport layer is shown in the background

In the problem defined in Section 3 the modularization of link capacities ye and ug is not taken into consideration, thus a flow allocation assigned to the shortest Dijkstra path (in a non-bifurcated way) ensures that the obtained transportation cost is minimized [6].

The volume of demand hd equals 1 for each d in all instances. In fact, for non-bifurcated flows and a case without modularization of link capacities, it is sufficient to determine the shortest paths for a single unit of data.

5.2 OPTIMIZATION RESULTS

The objective function F, according to formula (1), for the mixed-integral case of the dimensioning problem is composed of two sub-functions: the summation of all flow costs in the VC layer, Σe ξe ye, and the summation of all flow costs in the lower layer, Σg κg ug. The objective of the optimization is to find flow allocations xdp in the VC layer and flows zeq in the transport layer such that the total cost F is minimized.

The optimised values of the cost function F are shown in Fig. 8a. The best optimization results were obtained using the Flat Cost algorithm for all defined instances. However, the TtB approach achieves lower costs in its first step: the volume of demand hd is locally allocated in the VC layer more effectively than in the FC strategy (Fig. 8b). The FC strategy, which uses the FCC, finds the globally cheapest routes throughout all layers thanks to a high level of cost minimization in the transport layer (Fig. 8c).

Fig. 8a. Dimensioning problem optimization results for three multilayer networks A, B and C


Fig. 8b. Costs in VC layers

Fig. 8c. Costs in transport layers


Fig. 9a. Traffic load visualization in VC and transport layers for instance A


Fig. 9b. Traffic load visualization in VC and transport layers for instance B


Fig. 9c. Traffic load visualization in VC and transport layers for instance C

The dimensioning optimization causes flow to be allocated in each of the two layers: the VC and the transport layer. In Figs. 9a - 9c the traffic load on each link of these layers is visualized. Links with non-zero capacity are marked proportionally to the volume of load, and dashed lines indicate unused links in each layer of resources.

6. FINAL REMARKS

This paper compared two deterministic approaches to the multilayer network dimensioning problem. The main statements can be formulated as follows:


1. The Flat Cost strategy is a better approach than the Top-to-Bottom optimization for the dimensioning problem on the considered networks;

2. A multilayer network should not be considered as a set of separate layers but rather as an integrated model;

3. Sometimes it is worth allocating flow in a more expensive way locally, in the VC layer, to obtain better results globally.

The main goals of computer network optimization are to ensure network reliability, efficiency and security, to reduce the utilization of devices, to minimize the use of resources, etc. In general, all of these goals lead to improved financial benefits and reduced expenses. Hence, the following general statement can be made: the dimensioning optimization result depends on the topology of each layer of resources, and the multilayer network should not be considered as a set of separate layers but as an integrated model. Sometimes it may be worth installing more links, preparing more interfaces and adapting more nodes in order to minimize the total cost of the dimensioning problem in multilayered networks.

REFERENCES

[1] DEMEESTER P., GRYSEELS M., et al., Resilience in Multilayer Networks, IEEE Communications Magazine, Vol. 37, No. 8, 1999, pp. 70-76.

[2] DIJKSTRA E. W., A note on two problems in connexion with graphs, Numerische Mathematik, No. 1, 1959, pp. 269-271.

[3] DIJKSTRA F., ANDREE B., KOYMANS K., HAM J. and LAAT C., A Multi-Layer Network Model Based on ITU-T G.805, In Press, May 2007.

[4] KOHN M., Improving Fairness in Multi Service Multi Layer Networks, Transparent Optical Networks, No. 1, 2005, pp. 53-56.

[5] KUBILINSKAS E. and PIÓRO M., An IP/MPLS over WDM network design problem, International Network Optimization Conference (INOC), 2005, pp. 20-23.

[6] KUCHARZAK M., Bifurcated and non-bifurcated flows in multilayered networks dimensioning issues, 6th Student Scientific Conference (KNS), Wroclaw, Poland, 2008 (in Polish).

[7] ONLINE. The library of test instances for survivable fixed telecommunication network design, http://sndlib.zib.de, 2006.

[8] PIÓRO M. and MEDHI D., Routing, Flow, and Capacity Design in Communication and Computer Networks, Morgan Kaufmann Publishers, 2004.


Computer Systems Engineering 2008

Keywords: Database, embedded system, experimentation, SQLite, PostgreSQL, MySQL

Dawid PICHEN* Iwona POŹNIAK-KOSZAŁKA*

A NEW TREND IN SOFTWARE DEVELOPMENT PROCESS USING DATABASE SYSTEMS

In this paper, we present a new trend in software development processes, which uses database systems instead of ordinary files for storing application data. We also present an implemented experimentation system which gives an opportunity to investigate problems occurring in databases. The objective of the experiments is to evaluate which database management system may be used as data storage in developed programs. We present and discuss some results of the investigations.

1. INTRODUCTION

Most current computer programs need to store non-volatile data in some kind of memory. There are two particularly popular methods of non-volatile data storage. The first one uses regular files as the data carrier. This method is recommended when the data file has binary content or a large size. Most UNIX applications use this method to store data. For example, program configuration files are kept in the /etc directory. Usually they are just regular text files, and each of them contains one or more data fields with zero, one or more assigned values. The structure of such files varies strongly between programs, so this method is not compact. It also has a noticeable disadvantage: program developers have to write a data file parser or use one of the many already written libraries.
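Such parsing is exactly what ready-made libraries automate. As a minimal illustration, Python's standard configparser module reads an INI-style file in a few lines (the file path, section and key below are hypothetical):

import configparser

# Suppose /etc/myapp.conf (a hypothetical file) contains:
#   [server]
#   port = 8080
config = configparser.ConfigParser()
config.read("/etc/myapp.conf")
port = config.getint("server", "port", fallback=8080)   # typed access, no hand-written parser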

The second method is more sophisticated, but it can be used only in Microsoft Windows operating systems (OS). Beginning with Windows NT 3.51, there is a built-in centralized database, commonly called the registry.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


The purpose of the registry is to store data from different applications in a compact way. The registry is divided into keys and values. There are 12 different types of values. The great advantage of this method is that almost all data (except huge values) are kept in one place and there is no need to write a parser, since the OS handles all operations on the registry through API functions. Usage of the registry is possible only in Windows systems, so the method is not portable.
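Such API access can be illustrated with Python's standard winreg module (available on Windows only; the key shown is a standard system key and the read is minimal):

import winreg

# Open a key and read one value; the OS handles all parsing and storage.
with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE,
                    r"SOFTWARE\Microsoft\Windows NT\CurrentVersion") as key:
    product, value_type = winreg.QueryValueEx(key, "ProductName")
    print(product, value_type)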

Another method is to use an external database to store data. In that case, the solution can be portable, provided the database is. As with the Windows registry, there is no need to write a parser; it is only necessary to use the proper API functions. That makes the program development process faster and easier. There are many database systems available on the market, but not all of them are recommended for the presented purpose. The first requirement for a database system is to have the smallest possible size, so that the size of the main application does not increase too much. The second aspect is the cost of a database system. For that reason we concentrate on free or open source software, to choose database systems that can be used in developed software without paying a single fee. The third requirement is portability: both the database system and its API libraries should be available on different operating systems and/or hardware architectures. An important requirement is also the performance of a database system. In most cases the requirements presented in [1] apply.

The rest of the paper is organized as follows. Section 2 contains a short description of the considered database systems. In Section 3 the main idea and capabilities of the experimentation system are presented. Section 4 is related to investigations, including the design of experiments, results of selected experiments and an analysis of database system effectiveness. Conclusions and final remarks appear in Section 5.

2. SELECTED DATABASE SYSTEMS

In this paper three well-known database systems are presented and compared in a simulated environment: MySQL, PostgreSQL and SQLite. Each of them is ported to various operating systems and all are open source projects. Each of them also recognizes the SQL language, but particular statements and supported features may vary depending on the system.


2.1. MYSQL

MySQL is a relational database management system written as an open source project. Its developers established the MySQL AB company, which develops, distributes, sells and supports the commercial version of the MySQL database. Because of that, there are two license models available: the GNU General Public License and the commercial license. The first license type allows developers to use MySQL for free, as long as the main application is under the GPL license. This means that the MySQL database needs to be purchased if it is going to be used in a commercial product. That can exclude MySQL if it is planned to build a commercial application without paying for a database system. In 2008 Sun Microsystems acquired MySQL AB.

The MySQL database management system has been actively developed since 1995, when the first version emerged. Beginning with version 5.0, it is an advanced database system which supports, e.g., unions, subqueries, cursors, stored procedures, triggers, views, query caching and UTF support. Its SQL language conforms to the ANSI SQL 99 standard. MySQL, unlike other database systems, is provided with multiple database storage engines. Although the system is advanced, it is considered to be very fast too.

MySQL is a very popular product, mainly among dynamic website developers. It has gained a strong position on the market because it has been used as one of the components of the LAMP (Linux, Apache, MySQL, PHP) software package [2]. The LAMP software package is known as an easy-to-use and easy-to-maintain, powerful system for delivering dynamic Web content.

Accessing MySQL databases from programs written in major programming languages is not a problem, because there are many client libraries available with well documented APIs [3].

2.2. POSTGRESQL

PostgreSQL is an open source relational database management system. Unlike MySQL, PostgreSQL is not owned by any company. Its developers belong to the PostgreSQL Global Development Group, which is a community of people and companies supporting the project. A great advantage of this project compared to MySQL is its license type. PostgreSQL is available under the BSD license, which is very liberal. This license type allows the use of PostgreSQL for free, even in commercial, closed source applications.

The history of PostgreSQL started in 1986 at the University of California at Berkeley, where prof. Michael Stonebraker started the Postgres database project. In 1995 two of Stonebraker's students replaced the query language used by Postgres with SQL. They also changed the database name to Postgres95. In 1996 Postgres95 was published to the open source community. Developers from around the world started to modify the database. They improved stability and speed, added new features and wrote documentation. That made the database a powerful and advanced product. Due to those changes, the project changed its name to PostgreSQL, and the first version of PostgreSQL was 6.0.

The PostgreSQL Global Development Group advertises its product as "the world's most advanced open source database". Indeed, PostgreSQL is an advanced system which supports, e.g., foreign keys, joins, views, triggers, stored procedures in multiple languages, subqueries, asynchronous replication, nested transactions, hot backups and UTF support. Its SQL language conforms to the ANSI SQL 92/99 standards.

PostgreSQL is often used as a replacement for MySQL in the LAMP package. As with MySQL, there are many client libraries available for different languages. More information about PostgreSQL can be found in [4].

2.3. SQLITE

SQLite is also an open source database management system, but it is much different from the two systems mentioned before. The main difference is the model of the system. MySQL and PostgreSQL are based on the client-server model, in which at least one server instance is needed to serve queries from clients. Even if the database needs to be installed on only one computer, running the server is necessary to handle the user queries, and these queries need to be sent by using a client library. SQLite is an embedded database system contained in a single library. To use it, it is only necessary to link the main program with its library file; communication with the database is then achieved by calling library functions described in the API documentation [5]. Because of that, there is no separate database process in the system. Instead, the process of the main program changes the database file itself by calling the proper database system functions. SQLite uses just a single file to store the whole database. The system was designed to have the smallest possible size while still being usable. The size of its library is less than 0.5 MB, so it can be used not only in computer programs, but also in stand-alone devices such as mobile phones, set-top boxes, MP3/4 players, etc. [6]

SQLite is published under the most liberal license type: public domain. Everyone can use its source code for any purpose, without any notice that it was used. The SQLite source code does not contain any information about its author, because the author wanted to emphasize the license.

SQLite is not as advanced as the previous database systems, mainly because of its size. It is addressed to other segments of the market, in which the program size matters. It supports most of the ANSI SQL 92 standard, but some of its features are not implemented. Features that are implemented include, e.g., transactions, triggers, complex queries and UTF support. Features that are not implemented include, e.g., foreign key support, right and full outer joins, and table permission statements (since SQLite uses ordinary files and thus file permissions). A large difference between SQLite and other DBMSs is the handling of column data types. SQLite recognizes different data types in the CREATE TABLE statement, but it actually stores all data in only 5 different storage classes: NULL, INTEGER, REAL (floating point values), TEXT and BLOB (a blob of data). The most important fact is that the column data types are not static: records can hold values of different data types in the same column. For example, a text string can be inserted into a column declared as integer. The declaration of a column data type is only a recommendation to the SQLite engine. This is unusual behavior, which is sometimes criticized.
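Both properties, the serverless single-file model and the dynamic typing, can be demonstrated in a few lines. The sketch below uses Python's standard sqlite3 module rather than the C API, purely for brevity; the table and values are hypothetical:

import sqlite3

# No server process: opening a connection just opens (or creates) a file;
# ":memory:" gives a purely in-RAM database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, size INTEGER)")

# Dynamic typing: the declared column type is only a recommendation,
# so a text string can be stored in a column declared as INTEGER.
con.execute("INSERT INTO t (size) VALUES (?)", (123,))
con.execute("INSERT INTO t (size) VALUES (?)", ("not a number",))
con.commit()

for row in con.execute("SELECT size, typeof(size) FROM t"):
    print(row)   # (123, 'integer') and then ('not a number', 'text')
con.close()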

3. COMPUTER EXPERIMENTATION SYSTEM

In every experimentation system it is necessary to define the inputs and the outputs. Our experimentation system allows setting up four different inputs (denoted by I1, ..., I4) and observing two different outputs (denoted by O1 and O2). The block scheme of the experiment process as an input-output plant is shown in Fig. 1.

Fig. 1. Object of the experiment

Inputs. Table 1 specifies the inputs used in all complex experiments: the kind of database management system (I1), the number of records (I2), the type of query (I3) and the number of queries in a test (I4).


Outputs. Table 2 contains the outputs, which may be regarded as "local" measures of effectiveness: the query execution time (O1) and the processor utilization time (O2).

Table 1. Input data for each experiment

Input   Input name                                   Possible values of input
I1      Database Management System                   MySQL; PostgreSQL; SQLite
I2      Number of records (added, updated, deleted   10, 20, 50, 75, 100, 200, 500, 750,
        or used to select data from)                 1000, 3000, 5000, 7500, 10000
I3      Type of query                                acquiring data (SELECT) from 1 table;
                                                     acquiring data (SELECT) from 2 tables (joining);
                                                     adding data (INSERT);
                                                     modifying data (UPDATE);
                                                     deleting data (DELETE)
I4      Number of queries for a test                 10; 100; 1000

Table 2. Outputs for all experiments

Output symbol   Output name
O1              Query execution time
O2              Processor utilization time

The experimentation system consists of several modules. The main module is a special program called DBsim, which performs simulations of different database systems. The application was written in the C language and designed to work on UNIX and UNIX-like operating systems. It requires the MySQL, PostgreSQL and SQLite developer libraries, as well as running instances of the MySQL and PostgreSQL servers and suitably prepared users and databases. Each database consists of two tables: files and file_types. The simulator feeds the files table with information about files stored in a real file system of the simulation environment. The file_types table stores MIME types and text descriptions of popular files. The SQL statements used to create the files table are presented in Fig. 2. It can be noticed that these statements are not identical; there are a few differences between the database systems. This is because of the auto-increment feature, which is supported by every database used in the experiment, but in a different way. The statement used to create the file_types table is common to every database and is presented in Fig. 3.

MySQL:

CREATE TABLE files (
    id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name TEXT,
    path TEXT,
    filedate TIMESTAMP,
    chksum_md5 TEXT,
    size INTEGER,
    file_type INTEGER NOT NULL
);

PostgreSQL:

CREATE TABLE files (
    id SERIAL NOT NULL PRIMARY KEY,
    name TEXT,
    path TEXT,
    filedate TIMESTAMP,
    chksum_md5 TEXT,
    size INTEGER,
    file_type INTEGER NOT NULL
);

SQLite:

CREATE TABLE files (
    id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    path TEXT,
    filedate TIMESTAMP,
    chksum_md5 TEXT,
    size INTEGER,
    file_type INTEGER NOT NULL
);

Fig. 2. Statements used to create files table

MySQL / PostgreSQL / SQLite:

CREATE TABLE file_types (
    id INTEGER NOT NULL PRIMARY KEY,
    mime_type TEXT,
    description TEXT
);

Fig. 3. Statements used to create file_types table

4. INVESTIGATIONS

All experiments were executed on the same computer, thus the comparison of the obtained results is reliable. All experiments were performed on a notebook computer with an Intel Core 2 Duo T5500 processor and 1 GB of RAM. The openSUSE 10.2 operating system was used with version 2.6.18.8 of the Linux kernel. The main program was built with the gcc 4.1.3 compiler. The following versions of the database management systems were used in the experiment: PostgreSQL 8.1.11, MySQL 5.0.26, SQLite 3.3.8.

4.1. EXPERIMENT FOR DATA INSERTING

For this research, from 10 to 10000 simulated records were successively added to each database system. The total execution time of this operation was measured. The results are presented in Fig. 4. It can be seen that SQLite was the slowest database system; it needed almost 50 s to add 750 records, while its rivals did it in less than 1 s. The fastest database was MySQL; it added 10000 records in less than 1 s. PostgreSQL did it in 6 s, which is still an acceptable time for most situations.


Fig. 4. Query execution time for records adding (time is on the logarithmic scale)
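The measurement procedure itself can be illustrated with a small sketch; here Python's standard sqlite3 module stands in for the original C simulator, the table mirrors the files table of Fig. 2, and all values are hypothetical:

import sqlite3, time

def time_inserts(n):
    # Measure the total time of n single-row INSERTs (SQLite case).
    con = sqlite3.connect(":memory:")   # the original experiment used an on-disk database
    con.execute("""CREATE TABLE files (
        id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
        name TEXT, path TEXT, filedate TIMESTAMP,
        chksum_md5 TEXT, size INTEGER, file_type INTEGER NOT NULL)""")
    start = time.perf_counter()
    for i in range(n):
        con.execute("INSERT INTO files (name, path, size, file_type) "
                    "VALUES (?, ?, ?, ?)", ("file%d" % i, "/tmp/file%d" % i, i, 1))
    con.commit()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

for n in (10, 100, 1000):
    print(n, "records:", round(time_inserts(n), 4), "s")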

4.2. EXPERIMENT FOR DATA FETCHING

For this research, two different SELECT queries were passed to the databases. The total execution time of these queries was measured. The results are presented in Fig. 5. The first query selected data from the files table only. The second query was a complex one: it performed a left outer join of the files table with the file_types table. It can be noticed that MySQL was the fastest database when the simple query was used. For the complex query, SQLite was the fastest database, although it was the slowest system for the simple query. PostgreSQL was about 40% faster than SQLite when the simple query was used and the dataset had more than 4000 records. In the case of the complex query, PostgreSQL was the slowest database, with results much worse than its competitors (as much as 5 times slower than MySQL).


(Left plot: fetching data from a single table; right plot: joining data from two tables.)

Fig. 5. Query execution time for data selecting

4.3. EXPERIMENT FOR DATA MODIFYING

In this experiment the values in the chksum_md5 column were changed in all table rows. The UPDATE statement execution time was measured. The results are presented in Fig. 6. It can be seen that MySQL was the fastest database. When there were more than 3500 records in the table, PostgreSQL was the slowest system; with fewer records, SQLite was. But in all cases, even for a table with a large number of rows, the data modification time was still tolerable.


Fig. 6. Query execution time for data modifying

4.4. EXPERIMENT FOR DATA DELETING

In the last experiment presented in this paper, the whole content of the files table was deleted. The total execution time of the DELETE query was measured. The results of this experiment are shown in Fig. 7. It can be noticed that the fastest system was MySQL; it beat its rivals outright, with a time below 10 ms even for deleting as many as 10000 rows. When there were more than 1000 records in a table, the delete query time of SQLite was almost constant (about 0.25 s), probably because it changed the database file only at fixed positions. When there were more than 3500 records, PostgreSQL was the slowest database system, and it can be seen that its delete query execution time depends on the number of rows in a linear way.


Fig. 7. Query execution time for data deleting

5. CONCLUSIONS

On the basis of the simulations performed, the following conclusions may be considered in a program development process. Using a database system in a computer program instead of regular files strongly reduces the total development time, because there is no need to write special functions to fetch, add, find, modify or delete data. The algorithms used in database systems make those operations usually faster than if they were performed by one's own functions. If it is planned to store huge amounts of data, MySQL or PostgreSQL should be chosen, but they also need a database server delivered separately with the application. If that application is going to be a commercial product with a free database, then it is better to choose PostgreSQL. If the database has a constant, huge table that is delivered with the application and does not grow by many rows, SQLite is preferred. SQLite is also the right choice if the environment in which the desired application will run is small, because it does not need a running server and its size is smaller than 0.5 MB.


Although SQLite is a very small system, it has very good overall performance, except for data inserting. Most programs do not store a lot of data; thus, SQLite is a good solution. It also requires no configuration, so its integration with the developed program is the easiest. Finding data by using a database system is very fast. In all examined database systems it took less than 200 ms to fetch data from a table with 10000 records, whether the query was simple or complex. That processing time is acceptable in most cases. The processor utilization time for all compared databases was relatively low, even for a large number of rows in the table. However, these results are not completely reliable for MySQL and PostgreSQL, because they work as separate processes and their client libraries communicate with them through a socket. This causes the simulator to measure only the client library's processor time, which is always low, because the client only passes queries to the server and processes the obtained data.

REFERENCES

[1] POZNIAK-KOSZALKA I., PICHEN D., Simulation Based Evaluation of Algorithms for Improving Efficiency of Database Systems, Proc. of the MOSIS '07 Conference, Rožnov pod Radhoštěm, Czech Republic, 2007, pp. 211-218.

[2] LANE D., WILLIAMS H. E., Web Database Application with PHP and MySQL, 2nd Edition, O'Reilly, 2004.

[3] MySQL 5.0 Reference Manual, http://dev.mysql.com/doc/refman/5.0/en/, last modified: 2008-03-11 (revision: 10190).

[4] PostgreSQL 8.3.0 Documentation, http://www.postgresql.org/docs/8.3/interactive/index.html, last modified: 2008-03-06.

[5] SQLite Version 3 C/C++ API Reference, http://www.sqlite.org/c3ref/intro.html, last modified: 2008-02-21.

[6] HUDSON P., Interview: Richard Hipp, Linux Format magazine, Issue 73, United Kingdom, 2005.

[7] POZNIAK-KOSZALKA I., Relational Data Bases in Sybase Environment - Modeling, Designing, Applications, WPWR, Wroclaw, 2004 (in Polish).

[8] KLINE K. E., SQL in a Nutshell, 2nd Edition, O'Reilly, 2004.

[9] POZNIAK-KOSZALKA I., PICHEN D., Simulation Based Evaluation of Hardware Methods for Improving Efficiency of Database Systems, Proc. of the ASIS 2007 Colloquium, Hostýn, Czech Republic, 2007, pp. 131-136.

[10] KING K., JAMSA K., SQL Tips and Techniques, Premier Press, 2002.


Computer Systems Engineering 2008

Keywords: network, protection, p-cycles

Adam SMUTNICKI∗

Krzysztof WALKOWIAK∗

AN ALGORITHM FOR UNRESTORABLE FLOW OPTIMISATION PROBLEM USING P-CYCLES PROTECTION SCHEME

This paper deals with the Unrestorable Flow Optimisation (UFO) problem in networks protected by p-cycles. This novel protection technique is used as an efficient tool for ensuring survivability of computer networks. In this paper, the mathematical model of the UFO problem and an original solution algorithm based on metaheuristics are formulated. The proposed algorithm combines the k-shortest paths method, the multi-knapsack problem, a p-cycles generator, linear programming and a tabu search approach.

1. INTRODUCTION

Survivability of computer networks and systems is among the most important subjects in modern computer engineering and science. This research topic embraces a wide spectrum of particular technological and theoretical problems derived from the computer architecture area, network topology, communication protocols, transmission, coding, cryptography, etc. The topology of the computer network has crucial meaning for its survivability, since the physical creation of network links is much more time-consuming and troublesome than producing a new (or spare) device; furthermore, faults of network links and nodes are still a common problem. Traditionally, for years, ring and mesh topologies have been used to increase network survivability, with all their advantages and disadvantages. Quite recently, p-cycles appeared as a competitive and useful tool for providing survivability of real computer networks. The approach was born less than a decade ago and has made a great career in a short time.

∗ Department of Systems and Computer Networks, Wrocław University of Technology, Poland.


The fundamental usage of p-cycles assumes that the network configuration can be updated, namely that the capacity of some links can be increased, to achieve the desired level of protection. The cost of such capacity modification constitutes the goal function of the suitable optimization problem. This case has been commonly considered in the scientific literature. Nevertheless, p-cycles also offer the possibility of achieving higher reliability of the network without any additional cost of increasing link capacities. This case ensures clear benefits in comparison to the classical cycle approach, but has only been mentioned in the literature and practically not researched; such a problem is considered in the presented paper.

The remainder of the paper is organized as follows. In Section 2 we provide a brief introduction to the p-cycles idea. Section 3 presents the mathematical model of the originally formulated UFO (Unrestorable Flow Optimization) problem, whereas Section 4 describes the solution method and its component algorithms.

2. FUNDAMENTALS OF P-CYCLES

Traditionally, either ring or mesh topologies have been used in the construction of survivable computer networks. A ring offers short restoration time as well as a simple restoration scheme; however, its design and operation are rather complex and the usage of the total transport bandwidth is inefficient. A mesh is easy to design, optimize and operate, but has a longer restoration time than a ring. Mesh networks do not require as much spare capacity as rings, because in the restoration process the capacity demand can be split between different links. On the other hand, rings are so efficient in the restoration process because there is no need to search for a restoration path. Obviously, there is a great need to find a topology which aggregates all the advantageous properties of mesh and ring networks. This idea was fully realized in the concept of p-cycles: "fast as ring", "efficient as mesh", preconfigurable and protected.

2.1. BASIC NOTIONS AND PROPERTIES

We perceive a computer network protected by p-cycles as a mesh with working paths realizing demands of flows between specified nodes, using one of the shortest direct routes (k-shortest paths). A collection of p-cycles is formed in advance, while configuring the network, to be ready to use in case of any failure and to perform real-time recovery. p-Cycles are not ordinary cycles. Let us consider a mesh network and choose some cycle (Fig. 1 a and b). In the classical cycle protection


approach, this cycle protects all spans being “on-cycle”. In the paper [9] it is shown that a cycle established on a mesh network also protects “straddling spans”, i.e. spans between cycle nodes which do not belong to the cycle (Fig. 1 c). Observe that in case of failure of a “straddling span” an arc of the cycle can be used to transfer the whole flow from this failed span. This property allows one to extend the protection provided by p-cycles to straddling spans as well.

Fig. 1. p-Cycles in the mesh network.

In case of failure of an “on-cycle” span, there is one path which can be used to transfer the flow (Fig. 2 b). But for the failure of a “straddling span” there are two different paths which can be used in the recovery process. One arc of the cycle can be used as a path, or both arcs can be used to achieve a lower load on links (Fig. 2 c and d). Without using any additional links and spare capacity we thus achieve a much


higher level of protection: protected are not only cycle spans but also “straddling spans”. “Straddling spans” have twice the leverage of an on-cycle span in terms of efficiency, because when they fail, the cycle itself remains intact and can thereby offer two protection paths for each unit of protection capacity. These spans are not limited to being inside a cycle; each span between two nodes of the cycle is protected by this idea.

Fig. 2. p-Cycles protection schemes for various types of failure.

Notice that we do not need any additional spare capacity to protect “straddling spans”, because the spare capacity from ring spans is used to protect those spans. This means that we can protect many more spans and much more link capacity using the same amount of spare capacity as in the ring model. Thus, under the same costs we can achieve a higher level of network survivability.


2.2. OPTIMISATION PROBLEMS AND EXTENSIONS

Traditional p-cycles approaches consider two optimization tasks [8], each of them formulated for a pre-generated set of p-cycles. The former tends to achieve maximum restorability using the existing spare capacity on links; the overall aim is to find a configuration of p-cycles in the network, on which we have a specified amount of spare capacity. The problem can be defined as:

$$\min \sum_{i \in S} u_i \qquad (1)$$

where S is the set of network spans and u_i is the number of unrestorable working channels on span i. Formula (1) minimizes the number of unprotected channels, and thus maximizes restorability. In the latter problem we look for the minimum spare capacity which has to be provided to the network in order to achieve 100% restorability. The optimization task can be defined as:

$$\min \sum_{i \in S} c_i s_i \qquad (2)$$

where c_i is the cost or length of span i and s_i is the number of spare channels on span i. Both optimization tasks rely on the assumption that a candidate set of p-cycles has already been generated, and we only have to choose the best one (or a subset) among them to achieve the desired reliability. In practice, the number of such candidates grows exponentially with increasing network size. Therefore, efficient algorithms for p-cycles generation, and a philosophy of performing this generation, are especially desirable, see e.g. [8, 13, 23].
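For illustration only (none of the cited generators is reproduced here), a trivial starting point is to take a cycle basis of the network graph and determine, for each candidate cycle, which spans it can protect as straddling spans; networkx is an assumed tool choice and the toy graph below is made up:

import networkx as nx

G = nx.Graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")])

def straddling_spans(G, cycle):
    # spans with both endpoints on the cycle that are not on-cycle spans
    on_cycle = {frozenset(e) for e in zip(cycle, cycle[1:] + cycle[:1])}
    return [e for e in G.edges()
            if set(e) <= set(cycle) and frozenset(e) not in on_cycle]

for cycle in nx.cycle_basis(G):            # independent cycles as crude candidates
    print(cycle, "straddling:", straddling_spans(G, cycle))
print(straddling_spans(G, ["a", "b", "c", "d"]))   # the outer ring protects span a-c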

Since 1998 many interesting ideas improving p-cycles reliability and functionality have been published. The first, actually one of the most popular, extends the scope of p-cycles protection by considering not only links, but also nodes, or whole paths. In [6] a new interpretation of a straddling span is presented, not as a physical connection but as a logical link (then more than one physical span can be protected). This approach is called path protection because it protects a path from one node lying on a cycle to another node on the cycle, including the links and nodes on the path. Using this method not only links but also nodes can be protected. The idea of protection of a straddling path consisting of more than one physical link has been proposed in [8].

In the basic p-cycles model we have a guarantee that 100% recoverability is obtainable for any single failure. However, the assumption of a lone failure cannot be sustained in networks of current size and scope. Networks must be prepared to


deal with multiple failures at the same time; consider e.g. a node failure (all links to this node become disconnected simultaneously). In [12] a dual-failure protection model is discussed, and an integer linear programming model for designing a minimum-cost p-cycles network with a specified minimum dual-failure restorability is proposed; some level of dual-failure restorability can be achieved without any additional spare capacity. An alternative approach, [16], is based on the assumption that it is unlikely to observe a dual (multiple) failure at the same time moment. Then, some action (dynamic reconfiguration of p-cycles) can be taken in the time period between failures.

3. MATHEMATICAL MODEL AND UFO PROBLEM

In this section, we present an optimization model of flow allocation (as well as of the routing pre-selection) in a network protected by a fixed configuration of p-cycles. An existing backbone network is considered: in many cases the network is in an operational phase and augmenting of its resources (links, capacity, replica servers) is not an option. The network survivability — indispensable in modern computer networks — is provided by the concept of p-cycles described in the previous section. In the problem we are given the network topology, link capacities, the traffic demand matrix, candidate paths for demands, and the p-cycles configuration. We optimize over working flows in the normal non-failure state of the network for protection in the case of a single link failure. The objective is to minimize the unrestored flow, i.e. the flow that due to limited link capacity resources cannot be restored using the concept of p-cycles. This problem was first described in [19]. Notation based on [17] and [2] will be used.

Indices
e, l = 1, 2, . . . , E — network links (spans)
d = 1, 2, . . . , D — demands
p = 1, 2, . . . , P_d — candidate paths for flows realizing demand d
q = 1, 2, . . . , Q — p-cycles
s = 1, 2, . . . , S — failure states

Constants
δ_edp = 1, if link e belongs to path p realizing demand d; 0 otherwise
h_d — volume of demand d
c_e — capacity of link e


β_eq = 1, if link e belongs to p-cycle q; 0 otherwise
ε_eq = 1, if p-cycle q can be used for restoration of link e, i.e. link e either belongs to p-cycle q or is a straddling span of q; 0 otherwise
γ_eq — coefficient of restoration paths provided for failed link e by an instance of p-cycle q (= 1 for an on-cycle link; = 0.5 for a straddling span; = 0 otherwise)

Variables
x_dp = 1 if demand d uses path p; 0 otherwise (binary)
f_e — load of link e associated with working demands
y_deq = 1 if demand d uses p-cycle q for restoration in the case of failure of link e; 0 otherwise (binary)
z_de = 1 if demand d is not restored in the case of failure of link e; 0 otherwise (binary)
g_el — load of link e associated with p-cycles in the case of failure of link l

Objective
$$\min U = \sum_{e} \sum_{d} z_{de} h_d \qquad (3)$$

Constraints
$$\sum_{p} x_{dp} = 1, \quad d = 1, 2, \ldots, D \qquad (4)$$
$$f_e = \sum_{d} \sum_{p} \delta_{edp} x_{dp} h_d, \quad e = 1, 2, \ldots, E \qquad (5)$$
$$z_{de} + \sum_{q} \epsilon_{eq} y_{deq} = \sum_{p} \delta_{edp} x_{dp}, \quad e = 1, 2, \ldots, E, \ d = 1, 2, \ldots, D \qquad (6)$$
$$g_{el} = \sum_{d} \sum_{p} \delta_{ldp} x_{dp} \sum_{q} \beta_{eq} y_{dlq} \gamma_{lq} h_d, \quad e = 1, 2, \ldots, E, \ l = 1, 2, \ldots, E \qquad (7)$$
$$f_e + g_{el} \le c_e, \quad e = 1, 2, \ldots, E, \ l = 1, 2, \ldots, E \qquad (8)$$

Constraint (4) imposes the non-bifurcated flow, i.e. one of the candidate paths must be selected for each demand. (5) is the definition of the link load, calculated as a sum over all demands and candidate paths. Constraint (6) assures that if demand d uses link e, then in the case of failure of link e either the demand d is not restored, or one of the p-cycles is selected for its restoration. The right-hand side of


(6) is 1 only if link e belongs to the path p selected for demand d. Consequently, if the path p used by demand d does not use link e, there is no need to decide on the restoration of d in the case of link e failure. Since we include the constant ε_eq (= 1, if p-cycle q protects link e) in the sum over all p-cycles on the left-hand side, (6) guarantees that the p-cycle q selected for restoration of demand d in the case of failure of link e (y_deq = 1) can indeed restore the flow of d. In other words, in the sum Σ_q ε_eq y_deq we take into account only those p-cycles q which can protect link e. Let us have a closer look at definition (7), which enables us to calculate the overall flow of p-cycles allocated to link e in the case of failure of link l. Notice that γ_lq y_dlq h_d denotes how much flow of demand d is allocated to p-cycle q for restoration in the case of failure of link l. Recall that γ_lq is the restoration coefficient of paths provided for failed link l by p-cycle q. Due to the construction of a p-cycle, γ_lq = 1 if l is an on-cycle link; γ_lq = 0.5 if l is a straddling link of p-cycle q (restoration is run on both arcs of the p-cycle, therefore each arc carries half of the demand); and finally γ_lq = 0 otherwise. Moreover, the term δ_ldp x_dp enables us to check whether or not the path selected for demand d uses link l. Combining both terms (δ_ldp x_dp γ_lq y_dlq h_d) we obtain the flow (if any) of demand d carried on link e in the case of link l failure. Finally, to compute g_el we must check if p-cycle q uses link e (β_eq = 1). The next constraint (8) is a capacity constraint and assures that the flow allocated to link e in the normal failure-free state of the network, plus the flow associated with p-cycles using link e in the restoration process, does not exceed the link capacity.

The UFO problem is the special case of (3)–(8) obtained by fixing the paths realizing demands (i.e. by fixing variables x_dp).
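To make this special case concrete, the following is a minimal sketch in PuLP (an assumed tool choice, not the authors' implementation). Because the paths x_dp are fixed, f_e becomes a constant and the product in constraint (7) becomes linear. All network data below are illustrative placeholders: a three-link triangle, two demands, and one p-cycle covering all links.

import pulp

E, D, Q = range(3), range(2), range(1)     # links, demands, p-cycles
h = {0: 3, 1: 2}                           # demand volumes h_d
c = {e: 10 for e in E}                     # link capacities c_e
delta = {0: {0: 1, 1: 1, 2: 0},            # delta[d][e]: fixed path of d uses link e
         1: {0: 0, 1: 1, 2: 0}}
beta = {(e, 0): 1 for e in E}              # the single p-cycle traverses every link
eps = {(e, 0): 1 for e in E}               # ... and can restore each of them
gamma = {(e, 0): 1.0 for e in E}           # every link is on-cycle in this toy case

prob = pulp.LpProblem("UFO", pulp.LpMinimize)
z = pulp.LpVariable.dicts("z", (D, E), cat="Binary")
y = pulp.LpVariable.dicts("y", (D, E, Q), cat="Binary")

prob += pulp.lpSum(z[d][e] * h[d] for d in D for e in E)            # objective (3)
f = {e: sum(delta[d][e] * h[d] for d in D) for e in E}              # (5): constant here
for d in D:
    for e in E:                                                     # constraint (6)
        prob += z[d][e] + pulp.lpSum(eps[e, q] * y[d][e][q] for q in Q) == delta[d][e]
for e in E:
    for l in E:                                                     # (7) folded into (8)
        g_el = pulp.lpSum(delta[d][l] * beta[e, q] * gamma[l, q] * h[d] * y[d][l][q]
                          for d in D for q in Q)
        prob += f[e] + g_el <= c[e]

prob.solve()
print("unrestored flow U =", pulp.value(prob.objective))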

4. SOLUTION METHOD

In the context of the optimization problem formulated in Section 3 one can propose a quite natural decomposition into several sub-problems, as follows. At the beginning we have the network with a detailed description of its topology, span capacities and costs. Then, for this network a set of demands is given. Each demand determines a flow transfer between a pair of nodes (from source to destination). In order to satisfy these demands, routing paths have to be found, taking into account the cost of each path. This problem is known in the literature as multicommodity flow (MCF), and will be described briefly in Section 4.1. The MCF problem requires a set of shortest paths (k-SP) between specified pairs of nodes, among which MCF selects the set of the best ones (see [17] for details). Observe that the configuration of paths


satisfying demands needs to be advantageous for the p-cycles configuration. Our overall aim is to find the optimal set of p-cycles; this will be done by a tabu search (TS) algorithm, which goes through the solution space along a certain search trajectory, verifying p-cycle candidates. Successive p-cycles in this space can be generated on demand by using a reasonable generator; any of the generators described in the literature can be used for this purpose [1, 4, 10, 13, 14, 18, 21–24]. For a fixed set of p-cycles and fixed routing paths realizing demands we have to solve the UFO problem. Its optimal solution is created by independently checking restorability for each span with nonzero flow. If a span transfers a single-commodity flow, its restorability can be evaluated simply. Otherwise, if the span transfers a multicommodity flow, the multi-knapsack (MK) problem is used to find the optimal restoration scheme. Both cases require the evaluation of the so-called spare capacity of the cycle. For disjoint cycles inside the current p-cycles configuration such an evaluation can be made independently and the problem is not troublesome. If cycles inside the current p-cycles configuration are not disjoint, the problem of finding the real spare capacity of cycles will be solved by a special linear programming (LP) problem.
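As an aside, the k-SP component mentioned above is readily available off the shelf; a sketch with networkx (an assumed tool choice, with a made-up example graph) could look as follows:

from itertools import islice
import networkx as nx

def k_shortest_paths(G, source, target, k, weight="weight"):
    # loopless paths in order of increasing total weight
    return list(islice(nx.shortest_simple_paths(G, source, target, weight=weight), k))

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1), ("b", "c", 1), ("a", "c", 3),
                           ("b", "d", 2), ("c", "d", 1)])
print(k_shortest_paths(G, "a", "d", 3))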

Before the construction of the final (aggregated) solution algorithm one should select the optimization strategy: either non-joint or the more complex joint optimization. A more elaborate discussion of this subject is presented in Section 4.2, where examples and a description of efficiency are shown. The solution method for the UFO problem is described in Section 4.3. Section 4.4 introduces the MK problem and its solution methods. Section 4.5 provides the LP problem used for the distribution of spare capacities in cycles of the current p-cycles configuration.

4.1. MULTICOMMODITY FLOW PROBLEM

In computer networks, the term commodity means a set of packets having the same source and target nodes. Generally, spans transfer multicommodity flow. The multicommodity flow problem can model the average flow of computer data packets in networks in a chosen time unit.

Following [20], a multicommodity flow problem I = (G, c, K) is defined for an undirected graph G = (V, E), where V is a set of vertices, E is a set of graph edges and c is a vector of span capacities. For G there can be defined a specification K with k commodities, numbered 1, . . . , k, where the specification of commodity i consists of a source–target pair s_i, t_i ∈ V and a non-negative value of demand d_i. The number of different sources is represented by k*, the number of nodes is n, and the number of edges is m, m ≥ n. Graph G is connected and does not have parallel edges. Each edge is unique; the symbol uw denotes the edge between u and w, oriented from u to w.


A multicommodity flow f consists of k vectors f_i, i = 1, . . . , k. The value f_i(uw) stands for the flow of commodity i through edge uw. If the flow of commodity i has the same direction as edge uw, then it has a positive sign; otherwise it has the opposite sign. This convention is only needed to determine flow directions. For each commodity i the following constraints are defined:

$$\sum_{wu \in E} f_i(wu) - \sum_{uw \in E} f_i(uw) = 0, \quad \forall u \notin \{s_i, t_i\} \qquad (9)$$
$$\sum_{uw \in E} f_i(uw) = d_i \quad \text{for } u = s_i \qquad (10)$$
$$\sum_{wu \in E} f_i(wu) = d_i \quad \text{for } u = t_i \qquad (11)$$

The total value of flow on span uw is defined as:
$$f(uw) = \sum_{i} |f_i(uw)| \qquad (12)$$

The flow can be realised when:
$$f(uw) \le c(uw) \qquad (13)$$

According to [5], the multicommodity flow problem with integer values of flow is an NP-complete task even for two commodities. Since the MCF problem has been known for years, any algorithm from the literature can be used to solve it.
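For a concrete reading of constraints (9)–(13), the sketch below solves a tiny two-commodity instance with PuLP (an assumed tool choice; the network and demands are made up). To keep all flow variables non-negative, each undirected span is expanded into two directed arcs, a reformulation of our own choosing rather than the signed formulation above:

import pulp

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]          # undirected spans
cap = {("a", "b"): 8, ("b", "c"): 8, ("a", "c"): 5}
commodities = [("a", "c", 6), ("b", "c", 3)]          # (s_i, t_i, d_i)

arcs = edges + [(w, u) for (u, w) in edges]           # two directed arcs per span
f = {(i, a): pulp.LpVariable("f_%d_%s%s" % (i, a[0], a[1]), lowBound=0)
     for i in range(len(commodities)) for a in arcs}

prob = pulp.LpProblem("MCF", pulp.LpMinimize)
prob += pulp.lpSum(f.values())                        # discourage useless circulation
for i, (s, t, d) in enumerate(commodities):
    for u in nodes:
        inflow = pulp.lpSum(f[i, a] for a in arcs if a[1] == u)
        outflow = pulp.lpSum(f[i, a] for a in arcs if a[0] == u)
        if u == s:
            prob += outflow - inflow == d             # (10)
        elif u == t:
            prob += inflow - outflow == d             # (11)
        else:
            prob += inflow - outflow == 0             # (9)
for (u, w) in edges:                                  # (12)-(13)
    prob += pulp.lpSum(f[i, (u, w)] + f[i, (w, u)]
                       for i in range(len(commodities))) <= cap[(u, w)]

prob.solve()
print(pulp.LpStatus[prob.status])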

4.2. OPTIMISATION OF P-CYCLES

In Section 2.2 (following [19]) the optimisation tasks associated with p-cycles have been presented. Most solutions rely on the assumption that routing (Multicommodity Flow Optimisation, described in Section 4.1) has already been performed. Then the set of p-cycles is generated and searched to find the optimal p-cycles configuration, see Fig. 3 a. Notice that in our problem the spare capacity and the p-cycles configuration depend on each other. Hence one can propose the alternative approach, see Fig. 3 b, called joint optimisation — flow allocation and p-cycles optimization are made with a feedback (by using a two-level algorithm).

While analyzing the literature one can observe that most authors avoid joint optimization, solving problems with the non-joint attitude. Frequently the argument “too complex” justifies the abandonment. True, joint optimisation is much more complex, but as has been shown it may provide much better results.


Fig. 3. Comparison of two p-cycles optimisation ideas. MCF — Multicommodity Flow, CS — Cycle Selection, SCO — Spare Capacity Optimisation.

So the main idea is to design an appropriate algorithm which can give, maybe not optimal, but approximate solutions within a reasonable time. This can be achieved using metaheuristics (Genetic Algorithm, Ant Colony Optimization, etc.) or local search algorithms (Tabu Search, Simulated Annealing, Simulated Allocation).

In [8] a comparison of results received by using joint and non-joint optimisation has been presented. Tests have been done for four networks: three with 32 nodes and 51, 47, 44 spans, and the COST-239 network. For the networks with 32 nodes 10 000 p-cycle candidates have been generated; for COST-239, 500 p-cycle candidates have been generated. For joint optimisation 5 shortest paths have been chosen (for the realisation of each demand), and for COST-239 the number of chosen paths was 10. Table 1 presents the results from [8].

According to the results presented in Tab. 1 it is obvious that joint optimisation produces significantly better solutions than non-joint optimisation — the difference is about 20–25%. The quite low redundancy (39%) for the COST-239 network is a very good result. There is also a relationship between the achieved redundancy and the network degree: lower values of network degree mean a bigger redundancy value. For this reason COST-239 has the best redundancy value (because of the highest network degree).

For the described problem the joint-optimisation approach has been chosen, with a Tabu Search [7] algorithm as the direct solution method. The Tabu


Table 1. Comparison of results of joint and non-joint optimisation (from [8]). 32n51s means a network with 32 nodes and 51 spans.

Network  | network degree | total load | total spare | redundancy | number of cycles

Non-joint
32n 51s  | 1.18 | 394 877 | 352 559 | 89.3% | 16
32n 47s  | 2.94 | 414 984 | 347 285 | 90.2% | 36
32n 44s  | 2.75 | 423 596 | 403 982 | 95.4% | 30
COST-239 | 4.72 | 137 170 |  81 790 | 59.6% |  7

Joint
32n 51s  | 1.18 | 405 539 | 246 943 | 65.0% | 28
32n 47s  | 2.94 | 424 267 | 300 400 | 74.6% | 33
32n 44s  | 2.75 | 413 853 | 341 269 | 82.5% | 23
COST-239 | 4.72 | 143 745 |  46 910 | 39.0% |  4

Search algorithm will not be described in this paper, because the standard variant will be used and the implementation is in progress. The main aim of this section is to describe the non-typical problems associated with the analysed optimisation problem.

4.3. ALGORITHM FOR UFO PROBLEM

In Section 3 the Unrestorable Flow Optimisation (UFO) problem has been formulated. We would like to find the criterion value described in equation (3) and to calculate the matrix z_de; the demand values are given as input. At this stage of consideration we assume that all paths realizing demands are known and all p-cycles are configured with their capacities. The algorithm is presented in Listing 1; all terms are compatible with those used in Section 3, and additionally:

E — set of edges in the network;

D_e — set of flow demands realised by span e;

Q_e — set of p-cycles protecting span e (e can belong to the cycle or be a straddling span);

R_e — set of demands which can be restored in case of failure of span e;

U_e — set of demands which cannot be restored in case of failure of span e.

Listing 1. Algorithm for evaluation of the z_de values, needed for the calculation of the final value of the criterion function.

for e ∈ E do
begin
  (R_e, U_e) = checkRestorationPossibility(e, D_e, Q_e);
  for d ∈ R_e do z_de = 0;
  for d ∈ U_e do z_de = 1;
end;

The function checkRestorationPossibility(e, D_e, Q_e) is crucial to this algorithm — its aim is to decide whether all demands realized using span e can be restored in case of failure of span e. If not all demands can be restored, this function should optimize the decision process so that the amount of unrestorable flow is minimal. The function returns two sets — the restorable (R_e) and unrestorable (U_e) demands for span e. It looks quite simple, but it is not, as will be shown below. There are several scenarios which can appear:

i) span e is protected by a single p-cycle with capacity q and only one demand d (with the value of flow h) is realised using span e;

ii) span e is protected by a single p-cycle with capacity q and more than one demand (each with its own value of flow h_d) is realised using span e;

iii) span e is protected by more than one p-cycle, each with its own capacity q_i, and more than one demand d (each with value of flow h_d) is realised using span e.

Case 1 is very simple — the demand can be restored if h ≤ q. Case 2 is more complicated if $\sum_{d \in D_e} h_d > q$. If this situation appears, a decision has to be made as to which demands to restore and which not. This is a kind of optimisation problem: to minimise the unrestorable flow for capacity q. This task can be modeled using the well-known classical knapsack problem [3]. Case 3 is the most complicated situation. There is a decision which demands can be restored, as well as using which p-cycle. This can also be modeled using a knapsack problem, but formulated a little differently — the so-called multiple knapsack (MK) problem. A detailed description of the MK problem as well as possible solution algorithms are given in Section 4.4, see [11] for details.

4.4. MULTIPLE KNAPSACK PROBLEM

The Multiple Knapsack problem (MK) is a generalization of the well-known Knapsack Problem. In MK there exist several knapsacks with, in general, different capacities. According to [11] MK can be formulated as follows:


• there exists a set of items N := {1, . . . , n};

• each item has its value p_j and weight w_j, j = 1, . . . , n;

• there exists a set of knapsacks M := {1, . . . , m};

• each knapsack has a positive capacity c_i, i = 1, . . . , m.

A subset N′ ⊆ N is called feasible if the elements of N′ can be assigned to the knapsacks without exceeding their capacities. In other words, there has to exist a partition of N′ into m disjoint sets N_i such that w(N_i) ≤ c_i, i = 1, . . . , m. The main aim is to choose such a subset N′ whose value is maximised:

$$\max \sum_{i=1}^{m} \sum_{j=1}^{n} p_j x_{ij} \qquad (14)$$

taking into account the constraints:
$$\sum_{j=1}^{n} w_j x_{ij} \le c_i, \quad i = 1, \ldots, m \qquad (15)$$
$$\sum_{i=1}^{m} x_{ij} \le 1, \quad j = 1, \ldots, n \qquad (16)$$

where x_ij ∈ {0, 1}, i = 1, . . . , m, j = 1, . . . , n; x_ij = 1 when element j is assigned to knapsack i, otherwise zero.

In the case when all knapsacks have the same capacity c, the problem is called the multiple knapsack problem with identical capacities (MK-I). When p_j = w_j, j = 1, . . . , n, the problem is called the multiple subset sum problem (MSS). MSS with identical capacities of knapsacks is called the multiple subset sum problem with identical capacities (MSS-I). MK is a particular case of the generalized assignment problem (GA). In the GA problem each item has a value p_ij (instead of p_j) when assigned to knapsack i — the value of an item depends on the chosen knapsack.

According to [11], even the simplified MSS-I problem is strongly NP-hard, because it is an optimisation version of the 3-partitioning problem, which is strongly NP-complete. In [11, 15] several algorithms for MK are presented. For our needs the greedy approach has been chosen. In both works the presented algorithms are based on the assignment of the sequence of elements s_{i−1} + 1, . . . , s_i to knapsack i (i = 1, . . . , m) with the assumption that

$$\sum_{j=s_{i-1}+1}^{s_i} w_j \le c_i$$


This means that items are packed into the knapsacks one after another. If an item does not fit into the currently packed knapsack, the item has to be rejected or, if the current knapsack is full, one should start packing the next knapsack. Of course, the elements in the packing queue are sorted in descending order by the value of a weight unit (p_j/w_j). This order assures that the most valuable items will be packed first. But an algorithm like this has some disadvantages:

• when an item does not fit into the current knapsack, moving on to packing another knapsack wastes some space in the previous knapsack, into which smaller items would probably still fit;

• in case of rejection of an item which does not fit into the current knapsack, it can happen that this item could have been packed into another knapsack, so an item with a higher value per weight unit than other items is rejected.

These problems come from the fact that these algorithms were designed under the assumption that the complexity has to be linear, O(n). This has great meaning for bigger n and m values.

For small n and m values another greedy algorithm can be proposed. The items for packing are sorted in descending order by the value of a weight unit (p_j/w_j). Each element is packed into a knapsack according to the “best fit” rule — the item is packed into the knapsack for which the remaining capacity (c_i − w_j) is the biggest. Packing like this ensures that the most valuable elements are packed first, and when rejecting an element we are sure that we cannot pack it into any knapsack. This algorithm has complexity O(nm), which is acceptable for small problem sizes.

Of course, the presented algorithm, as well as the (purely greedy) algorithms from [11, 15], are approximate algorithms and will not always give an optimal solution.
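The O(nm) "best fit" greedy described above can be sketched as follows (tie-breaking and data layout are our own assumptions, not fixed by the paper):

def greedy_multi_knapsack(items, capacities):
    """items: list of (value p_j, weight w_j); capacities: list of c_i.
    Returns assignment[j] = knapsack index, or None if item j is rejected."""
    remaining = list(capacities)
    # sort item indices by value per weight unit, descending
    order = sorted(range(len(items)),
                   key=lambda j: items[j][0] / items[j][1], reverse=True)
    assignment = [None] * len(items)
    for j in order:
        w = items[j][1]
        # "best fit": the knapsack with the largest remaining capacity c_i - w_j
        best = max(range(len(remaining)), key=lambda i: remaining[i] - w)
        if remaining[best] >= w:          # if even this one cannot fit it, none can
            assignment[j] = best
            remaining[best] -= w
    return assignment

print(greedy_multi_knapsack([(10, 5), (9, 4), (4, 4)], [8, 6]))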

4.5. P-CYCLES CAPACITY CALCULATION

In most analyses of optimization problems connected with p-cycles, the authors concentrate on the minimization of the total needed amount of spare capacity (Section 2.2). In those problems an additional amount of spare capacity is simply added to a span, which allows one to build the desired protection scheme using p-cycles. In the problem described in Section 3 there is no possibility to add any spare capacity; only the existing spare capacity of working spans can be used. This assumption generates several additional constraints. The main one is the decision problem of how much spare capacity has to be assigned to each of the used p-cycles in a situation when at least two cycles have a common span — Fig. 4. In this situation the sum of the p-cycles capacities cannot exceed the span spare capacity. Additionally, maximization


of the sum of the whole p-cycles spare capacity is desired. The mathematical formulation of this problem is described below. The following symbols have been used:

Fig. 4. Example of a configuration where two p-cycles have a common span.

r_q — capacity of p-cycle q;

s_e — available spare capacity on span e;

r_q^max — maximum potential capacity of p-cycle q (defined in equation (17));

β_eq = 1 if span e belongs to p-cycle q; 0 otherwise.

The maximum potential capacity of each p-cycle is bounded by the value:
$$r_q^{\max} = \min\{s_e : e = 1, \ldots, E, \ e \in q\} \qquad (17)$$

For each span we have the constraint:
$$\sum_{q} r_q \beta_{eq} \le s_e, \quad e = 1, \ldots, E \qquad (18)$$

The total amount of p-cycles capacity should be maximised:
$$\max \sum_{q=1}^{Q} r_q \qquad (19)$$

taking into account the constraints:
$$0 \le r_q \le r_q^{\max}, \quad q = 1, \ldots, Q \qquad (20)$$

The problem (19)–(20) is a typical linear programming task, so the simplex method can be recommended here to solve it.
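For illustration, problem (17)–(20) can be handed directly to any LP solver; below is a sketch with scipy's linprog (an assumed tool choice) for a made-up configuration in which two cycles share one span:

import numpy as np
from scipy.optimize import linprog

s = np.array([4.0, 6.0, 5.0, 7.0])             # spare capacities s_e per span
beta = np.array([[1, 0],                       # beta[e][q]: span e belongs to cycle q
                 [1, 1],                       # span 1 is shared by both cycles
                 [0, 1],
                 [1, 1]])
r_max = [min(s[e] for e in range(len(s)) if beta[e][q])
         for q in range(beta.shape[1])]        # bound (17)

# maximise sum r_q  ==  minimise -sum r_q, subject to beta @ r <= s  (18)-(20)
res = linprog(c=[-1.0] * beta.shape[1], A_ub=beta, b_ub=s,
              bounds=[(0.0, rm) for rm in r_max], method="highs")
print("p-cycle capacities r_q:", res.x)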


5. CONCLUSIONS

In this paper we have formulated a new optimization problem of flow allocation in a network protected by p-cycles. The presented model can be employed to develop exact as well as heuristic algorithms, depending on their destination and numerical properties.

The optimization problem (3)–(8), considered for each fixed set of p-cycles, is a mathematical programming case with a linear goal function and linear as well as non-linear constraints, with 0/1 and continuous decision variables (so-called non-linear mixed mathematical programming). The possibility of solving this problem (e.g. through linearization, relaxation and/or a dual approach) is an open task at this stage of the research. In this paper we propose the following intuitive two-level solution algorithm: (1) the upper level searches for the optimal set of p-cycles; (2) on the lower level, for each fixed set of p-cycles we need to solve the optimization problem (3)–(8). We propose the algorithmic components used on the lower level, namely multi-knapsack, linear programming and multicommodity flow. Our future research will be concentrated on the implementation and testing of the proposed algorithm.

REFERENCES

[1] CHANG L. and LU R. Finding good candidate cycles for efficient p-cycle network design. In Proc. of 13th International Conference on Computer Communications and Networks, pp. 321–326, October 2004.

[2] CHOLDA P., JAJSZCZYK A. and WAJDA K. The evolution of the recovery based on p-cycles. In Proc. of the 12th Polish Teletraffic Symposium PSRT2005, pp. 49–58, Poznan, Poland, September 2005.

[3] CORMEN T.H., LEISERSON C.E., RIVEST R.L. and STEIN C. Introduction to Algorithms, Second Edition. The MIT Press, 2001.

[4] DOUCETTE J., HE D., GROVER W.D. and YANG O. Algorithmic approaches for efficient enumeration of candidate p-cycles and capacitated p-cycle network design. In Proc. of Fourth International Workshop on Design of Reliable Communication Networks, pp. 212–220, October 2003.


[5] EVEN S., ITAI A. and SHAMIR A. On the complexity of timetable and multicommodity flow problems. SIAM Journal on Computing, vol. 5(4), pp. 691–703, 1976.

[6] GANGXIANG S. and GROVER W.D. Extending the p-cycle concept to path segment protection for span and node failure recovery. IEEE Journal on Selected Areas in Communications, vol. 21(8), pp. 1306–1319, October 2003.

[7] GLOVER F. and LAGUNA M. Tabu search. Kluwer Academic Publishers, Massachusetts, USA, 1997.

[8] GROVER W.D. Mesh-Based Survivable Networks: Options and Strategies for Optical, MPLS, SONET, and ATM Networking. Prentice Hall PTR, New Jersey, USA, August 2003.

[9] GROVER W.D. and STAMATELAKIS D. Cycle-oriented distributed preconfiguration: Ring-like speed with mesh-like capacity for self-planning network restoration. In Proc. of ICC 1998 IEEE International Conference on Communications, pp. 537–543, Atlanta, Georgia, USA, June 1998.

[10] KANG B., HABIBI D., LO K., PHUNG Q.V., NGUYEN H.N. and RASSAU A. An approach to generate an efficient set of candidate p-cycles in WDM mesh networks. In Proc. of APCC ’06, Asia-Pacific Conference on Communications, pp. 1–5, Busan, South Korea, August 2006.

[11] KELLERER H., PFERSCHY U. and PISINGER D. Knapsack Problems. Springer Verlag, 2004.

[12] LI W., DOUCETTE J. and ZUO M. p-Cycle network design for specified minimum dual-failure restorability. In Proc. of ICC 2007, IEEE International Conference on Communications, pp. 2204–2210, Glasgow, Scotland, June 2007.

[13] LO K., HABIBI D., RASSAU A., PHUNG Q. and NGUYEN H.N. Heuristic p-cycle selection design in survivable WDM mesh networks. In Proc. of ICON ’06, 14th IEEE International Conference on Networks, pp. 1–6, Singapore, September 2006.

[14] LO K., HABIBI D., RASSAU A., PHUNG Q., NGUYEN H.N. and KANG B. A hybrid p-cycle search algorithm for protection in WDM mesh networks. In Proc. of ICON ’06, 14th IEEE International Conference on Networks, pp. 1–6, September 2006.

[15] MARTELLO S. and TOTH P. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, West Sussex, England, 1990.

[16] MUKHERJEE D.S., ASSI C. and AGARWAL A. Alternate strategies for dual failure restoration using p-cycles. In Proc. of IEEE International Conference on Communications, 2006, pp. 2477–2482, June 2006.

[17] PIORO M. and MEDHI D. Routing, Flow, and Capacity Design in Communication and Computer Networks. Elsevier Inc., San Francisco, USA, 2004.

[18] SCHUPKE D.A. An ILP for optimal p-cycle selection without cycle enumeration. In Proc. of Eighth IFIP Working Conference on Optical Network Design and Modelling (ONDM 2004), Ghent, Belgium, February 2004.

[19] SMUTNICKI A. and WALKOWIAK K. Modeling flow allocation problem in MPLS network protected by p-cycles. In Proc. of 42nd Spring International Conference on Modelling and Simulation of Systems, pp. 35–42, Hradec nad Moravici, Czech Republic, April 2008.

[20] STEIN C. Approximation Algorithms for Multicommodity Flow and Shop Scheduling Problems. PhD thesis, Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1992.

[21] WU B., YEUNG K.L., LUI K.S. and XU S. A new ILP-based p-cycle construction algorithm without candidate cycle enumeration. In Proc. of ICC 2007, IEEE International Conference on Communications, pp. 2236–2241, Glasgow, Scotland, 2007.

[22] WU B., YEUNG K.L. and XU S. ILP formulation for p-cycle construction based on flow conservation. In Proc. of Global Telecommunications Conference, GLOBECOM ’07, pp. 2310–2314, Washington, DC, USA, 2007.

[23] XUE G. and GOTTAPU R. Efficient construction of virtual p-cycles protecting all cycle-protectible working links. In Proc. of 2003 Workshop on High Performance Switching and Routing, pp. 305–309, Torino, Italy, 2003.

[24] ZHANG H. and YANG O. Finding protection cycles in DWDM networks. In Proc. of IEEE International Conference on Communications, pp. 2756–2760, 2002.


Computer Systems Engineering 2008

Keywords: Mesh, task allocation, simulation

Małgorzata SUMISŁAWSKA* Maciej GARYCKI*

Leszek KOSZAŁKA*

Keith J. BURNHAM** Andrzej KASPRZAK*

EFFICIENCY OF ALLOCATION ALGORITHMS IN MESH ORIENTED STRUCTURES DEPENDING ON PROBABILITY

DISTRIBUTION OF THE DIMENSIONS OF INCOMING TASKS

Efficient utilization of multi-processing units and computers connected in clusters requires proper task allocation. In this paper, four algorithms for task allocation are compared: First Fit (FF), Stack Based Algorithm (SBA), Better Fit Stack Based Algorithm (BFSBA) and a new one, the Probability Distribution Stack Based Algorithm (PDSBA), which is based on SBA. The comparison of the algorithms is carried out using a dedicated experimentation environment, designed and implemented by the authors.

1. INTRODUCTION

Nowadays the computational strength of a single processor unit is insufficient for advanced calculations or data management in corporations. Thus, multi-computers and computer networks are utilized. Advanced processing requires CPUs in multi-processing units or clusters of computers connected using a mesh topology [5]. The mesh topology in this context is commonly used in data processing centres and research facilities of Internet Service Providers.

The objective of this paper is to compare several mechanisms of the allocation process. The second section explains basic terms and issues related to the subject. In Section 3 we discuss commonly used allocation algorithms. The new proposed

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland. ** Control Theory and Applications Centre, Coventry University, UK.


algorithm is presented subsequently. Section 5 describes the designed and implemented experimentation system. Section 6 reports the results of the investigation. The final remarks are given in the last section.

2. PROBLEM FORMULATION

In this section, the discussed problem is formulated. At first, we define a two-dimensional mesh (Fig. 1), that is, a structure of nodes connected in the following way:
(i) most of the nodes are connected with four neighbours,
(ii) border nodes have three adjacent executors,
(iii) four nodes are situated in the corners, having two neighbours.
The mesh M(W, H) is a rectangular structure containing W×H nodes, where W denotes the width and H describes the height of the grid.

A submesh S(i, j, k, l) is a rectangular subgrid of the mesh, where (i, j) refers to the top left corner of the submesh and k and l are its width and height, respectively (Fig. 1).

Fig. 1. Submesh S(3,2,4,3) of a mesh M(7,5)

Each task is represented by a rectangular grid (wj, hj), which is allocated on the mesh. The newly allocated task cannot overlap any other task in the mesh. Each task has its own Time To Live (duration time), after which it is removed from the grid and the released nodes can be reused.

Allocation is a process of placing an incoming task into a mesh. Several efficiency indexes of allocation algorithms are defined: the runtime of an algorithm, the fragmentation factor of the mesh, or the percentage of allocated tasks. In the case of tasks with a short duration time (comparable to the runtime of an algorithm) it is essential to minimize the runtime of the algorithm, in contrast to tasks whose Time To Live (TTL) is much longer than the runtime of an algorithm. In this paper, we focus on the second case. Further, we analyse some well-known methods that are based on Stack Based Algorithms (SBA) (see [1], [2], [3]) and the newly proposed algorithm called Probability Distribution Stack Based Algorithm (PDSBA).


The terms used in the description of SBA-family algorithms are as follows. The Base Node (BN) of each task is its top left node. For each task in the mesh the Forbidden Area (FA) is created. It is a submesh whose nodes cannot be used as Base Nodes, because the newly allocated task would overlap another task in the grid. A Rejected Area (RA) refers to an L-shaped area covering the right and bottom nodes of the mesh; if a Base Node were placed in the RA, the task would cross the boundary of the mesh. A Base Block (BB) refers to a submesh where the Base Node of an incoming task can be placed such that the task does not overlap any other task allocated in the mesh. The BB is selected from the group of Candidate Blocks (CBs). The initial CB is obtained by subtracting the RA from the whole mesh. Then, by subtracting a Forbidden Area from one Candidate Block, we create one to four new CBs.

3. ALGORITHMS

The allocation process of the SBA algorithm begins when a new task arrives. First the RA is subtracted from the mesh, creating the Initial Candidate Block (ICB), and the remaining area of the mesh is analysed. The ICB is placed on the stack together with the first FA from the list. If there are no FAs, the whole ICB becomes the BB. Otherwise the algorithm starts to run in a loop. An FA and a CB are taken from the stack and one to four new CBs are created by subtracting the FA from the CB. The new CBs are placed on the top of the stack together with the next FA from the list. The whole process recurs until the end of the list of FAs. If there are no more FAs in the list to put on the stack, one of the CBs created with the last FA becomes the BB.
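The core stack operation — subtracting an FA from a CB, which yields up to four new CBs — could be realised as below; the rectangle representation (x, y, w, h) is our assumption, not the authors' code:

def subtract(cb, fa):
    """Parts of candidate block cb not covered by forbidden area fa.
    Rectangles are (x, y, w, h); the returned strips may overlap, which is
    harmless when searching for a Base Node."""
    cx, cy, cw, ch = cb
    fx, fy, fw, fh = fa
    # no overlap: the whole CB survives unchanged
    if fx >= cx + cw or fx + fw <= cx or fy >= cy + ch or fy + fh <= cy:
        return [cb]
    out = []
    if fx > cx:                          # strip to the left of the FA
        out.append((cx, cy, fx - cx, ch))
    if fx + fw < cx + cw:                # strip to the right of the FA
        out.append((fx + fw, cy, cx + cw - (fx + fw), ch))
    if fy > cy:                          # strip above the FA
        out.append((cx, cy, cw, fy - cy))
    if fy + fh < cy + ch:                # strip below the FA
        out.append((cx, fy + fh, cw, cy + ch - (fy + fh)))
    return out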

The Better Fit Stack Based Algorithm (BFSBA), which is a modification of SBA, selects the CB with minimal height and minimal horizontal position. In comparison to SBA, BFSBA processes all CBs that are on the stack to choose the appropriate CB. The proposed PDSBA is described in more detail in the next section.

The discussed algorithms are compared with the First Fit (FF) algorithm, which allocates a task in the first discovered submesh which does not overlap any allocated process. FF starts from the top left corner node and continues to check the executors one after another, row by row until it discovers the one which might become BN.

4. PDSBA ALGORITHM

The proposed algorithm PDSBA is similar to BFSBA, but the probability distribution of incoming task sizes is being recognized as new tasks come. The difference between BFSBA and PDSBA is that, while in BFSBA an incoming task


is allocated in the CB with the minimal height, the proposed algorithm selects a CB whose height is a multiple of the size of the task which is most probable to come. The scheme of PDSBA is illustrated in Fig. 2.

Fig. 2. The scheme of PDSBA
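One possible reading of this selection rule is sketched below; the paper does not fix the data structures, so the online histogram update and the fallback to minimal height are our own assumptions:

from collections import Counter

class SizeTracker:
    """Empirical distribution of incoming task heights, updated online."""
    def __init__(self):
        self.counts = Counter()
    def observe(self, height):
        self.counts[height] += 1
    def most_probable(self):
        return self.counts.most_common(1)[0][0]

def pick_candidate_block(cbs, tracker, task_h):
    """cbs: candidate blocks (x, y, w, h). Prefer CBs whose height is a
    multiple of the most probable task height; fall back to minimal height."""
    mode = tracker.most_probable()
    fitting = [cb for cb in cbs if cb[3] >= task_h]
    preferred = [cb for cb in fitting if cb[3] % mode == 0]
    pool = preferred or fitting
    return min(pool, key=lambda cb: cb[3]) if pool else None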


5. THE EXPERIMENTATION SYSTEM

The algorithms are evaluated using the experimentation system developed in Borland C++ Builder 6.0 Personal Edition. The simulation environment works under Windows XP. In the proposed system, a new task appears in each iteration and it has to be allocated. The main idea of the system is presented in Fig. 3.

Fig. 3. Scheme of the application

Now, we describe the steps of a single iteration of the simulator (the experimentation system) (Fig. 3):

Step 1. Arrival of a new task.
Step 2. Searching for a BB.
Step 3. If a BB is discovered, the structure is allocated in the upper left corner of the BB.
Step 4. If allocation is impossible, it is examined whether the queue is full.
Step 5. If the queue is not full, the task is placed at the end of the queue.
Step 6. If the queue is full, the task is rejected.
Step 7. The TTL of each allocated task is decreased.


Step 8. If any task is completed (its TTL equals zero), it is removed from the structure. If not, go to Step 11.
Step 9. The queue is explored with the aim of discovering a task which fits into the space left by the executed task.
Step 10. The task selected from the queue is allocated in the mesh using the algorithm chosen before the simulation started.
Step 11. When the new task arrives, go to Step 2.
Step 12. The simulation stops when all planned tasks are processed.

The system has the possibility to simulate the situation where several executors are inactive, thus, there are discontinuities in the mesh structure.

The input parameters of the simulator are (Fig. 4):
• the selected algorithm,
• the mean and variance of the task size,
• the width and height of the mesh structure,
• the number of incoming tasks.

The size of the queue is one fifth of the total number of planned tasks. The performance indexes of the algorithms are:

• the number of tasks allocated as soon as they appear,
• the number of all allocated tasks,
• the number of rejected tasks,
• the average time a task spends in the queue,
• the fragmentation index as a function of time.

If there are no tasks in the queue, the fragmentation index equals zero. If the queue is not empty, the fragmentation index equals the percentage of idle executors within the entire mesh.
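The index just defined amounts to a few lines of code (a sketch; the boolean occupancy grid and the queue representation are our assumptions):

def fragmentation_index(mesh, queue):
    """mesh: 2D list of booleans (True = node busy); queue: waiting tasks.
    Returns 0 for an empty queue, otherwise the percentage of idle nodes."""
    if not queue:
        return 0.0
    total = sum(len(row) for row in mesh)
    idle = sum(1 for row in mesh for busy in row if not busy)
    return 100.0 * idle / total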

Fig. 4. Input and output parameters of the system


6. EXPERIMENTS

The purpose of the experiments is to determine the utility of each algorithm for various settings. Two stages of experiments were performed. In the first one, we examine a full mesh (all the nodes active). The investigations in Stage #2 checked the efficiency of the algorithms in the case when several nodes are inactive, which simulated shut-down or crashed executors. We provide 4 experiments in Stage #1 and 2 in Stage #2. Each simulation was repeated 25 times for each allocation algorithm, that is, 100 simulations in total.

Stage #1, Full mesh:

Experiment #1. Full mesh (all the nodes active). Grid size: 20×20. 100 tasks to appear. The mean size of a task equals 7; the standard deviation equals 2 in the first case and 5 in the second case. We examine the total number of allocated and rejected tasks, and the number of tasks allocated as soon as they arrived.

Experiment #2. The same input parameters as in Experiment #1 (full mesh, grid size: 20×20, 100 tasks to appear, mean dimension of a task = 7, standard deviation = 2 in the first case and 5 in the second case). We examine the average time a task spends in the queue before it is allocated.

Experiment #3. Full mesh. Grid size: 20×20. 100 tasks to appear. The mean size of a task equals 7, the standard deviation equals 2. The dependence between the number of queued tasks and the fragmentation index of the mesh as a function of the iteration number is examined.

Experiment #4. The same environment parameters as in Experiment #1, case 1 (grid size: 20×20, 100 tasks to appear, mean task size = 7, standard deviation = 2). The mesh fragmentation indexes as a function of the iteration number are compared for each algorithm.

Stage #2, Partial mesh:

Experiment #5. Partial mesh (several nodes inactive). Grid size: 20×20. 100 tasks to appear. The mean size of a task equals 7; the standard deviation equals 2 in the first case and 7 in the second case. We examine the total number of allocated and rejected tasks, and the number of tasks allocated as soon as they arrived.

Experiment #6. The same parameters as in Experiment #5 (partial mesh, grid size: 20×20, 100 tasks to appear, mean dimension of a task = 7, standard deviation = 2


in the first case and 7 in the second case). We examine the average time a task spends in the queue before it is allocated.

Next, we provide the results of the described experiments.

Experiment #1. The aim of the first simulation was to show that PDSBA, in comparison to the other algorithms, provides the best results when the task sizes are more predictable. In relation to the mean task size of 7, the standard deviation of 2 is low, thus the probability density function is concentrated around the mean value.


Fig. 5. Efficiency of the algorithms according to the standard deviation of the incoming task sizes. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

When the task dimensions were more predictable, PDSBA provided the best results. Note that although in the second case the percentage of tasks allocated by PDSBA is lower than by SBA, PDSBA allocated more tasks as soon as they came (without queuing them).

It is worth noticing that although FF is the least efficient in comparison to all the SBA variants, it provides relatively better results when the scatter of task dimensions is high.

Experiment #2. This experiment was performed in order to investigate the mean queuing time. The shorter the queuing time, the higher the probability that the queuing cache will not become full and reject incoming processes; thus, more tasks will be allocated.



Fig. 6. Mean queuing time for each of the examined algorithms according to the standard deviation of the incoming task sizes. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

This experiment exhibits the differences between PDSBA, BFSBA and SBA. The chart on the left depicts the situation when the standard deviation of incoming task sizes is relatively small, thus the dimensions of incoming processes are more predictable. In this case PDSBA returns the best results. On the second chart (relatively high standard deviation of task sizes), one can notice that although all stack-based algorithms behave very similarly, the classic SBA is the most efficient.

The behaviour of the FF algorithm shows that its efficiency is far lower than that of the other mentioned algorithms. In both cases almost the same queuing time of FF can be observed, so conditions in which the task sizes vary significantly do not influence the behaviour of the FF algorithm.

Experiment #3. The goal of the experiment is to examine the dependence between the number of queued tasks and the fragmentation index of the mesh as a function of time.


Fig. 7. The number of queued tasks and the fragmentation index for SBA (left) and FF (right)



Fig. 8. The number of queued tasks and the fragmentation index for BFSBA (left) and PDSBA (right)

Due to the definition of the mesh fragmentation used in this paper, the value of that indicator reliably determines the efficiency of the algorithm only after a certain number of tasks have appeared. After the first rejection of a task, the percentage of idle executors is high, which increases the fragmentation index, although numerous tasks may still be allocated in the free space. The examined performance indicator became stable when about 60–70 tasks had come. The dependence between the fragmentation index and the number of queued tasks may be derived from Fig. 7 and Fig. 8: as long as an incoming task is not allocated, the percentage of idle nodes does not decrease.

Experiment #4. The goal was to compare the mesh fragmentation as a function of the iteration number for each examined algorithm. The environment has been chosen with the same parameters as in Experiment #1, case 1 (grid size: 20×20, 100 tasks to appear, mean task size = 7, standard deviation = 2).


Fig. 9. Mesh fragmentation for each of the examined algorithms as a function of the iteration number


The shape of the fragmentation index function is similar for all examined algorithms. After the arrival of about 60 tasks, the fragmentation index became stable. The percentage of idle nodes was lowest for BFSBA, although in the first phase of the simulation (before 30 tasks have appeared) PDSBA provided the lowest fragmentation index.

Experiment #5. The purpose of this experiment was to compare the efficiency of PDSBA with the other mentioned algorithms (SBA and its derivatives, and FF) in the environment of a mesh structure with randomly removed nodes. The other conditions (the mean size of incoming tasks and their dimensions) remain the same as in the first experiment.

As one can notice, the efficiency of each algorithm in this kind of environment is rather different from that observed in Experiment #1, where the mesh was continuous (without inactive nodes). In this case PDSBA and SBA are almost equally efficient. When the variance of the task dimensions is low in comparison to their mean size, BFSBA gives worse results than the other stack-based algorithms, but its efficiency increases with the variance of the task sizes, and in the second case BFSBA appears to be very efficient.


Fig. 10. Efficiency of all investigated algorithms according to the standard deviation of the incoming task size for the environment of a mesh structure with randomly removed nodes. The chart on the left illustrates a situation when the standard deviation of the task dimensions is relatively small in comparison with their mean value. The chart on the right depicts a case when the incoming task sizes differ significantly from their mean size.

The FF algorithm gives unacceptable results in both cases. Although in the environment with a continuous mesh its efficiency was rather small compared to the performance of the rest of the investigated algorithms, it is totally inapplicable to the environment of a partial mesh topology (most of the tasks were rejected).


Experiment #6. The aim of this experiment was to examine the average time a process spends in the queue before it is allocated for the environment with a partial mesh (some nodes removed or inactive), and to compare the results for relatively predictable task sizes and significantly varying task sizes.

The results of this experiment show that the queuing times of tasks waiting to be allocated are comparable for all algorithms when the mesh is not full and the sizes of incoming tasks are predictable (small variance of the task dimensions). In that case SBA exhibited the longest queuing time.


Fig. 11. Mean queuing time for each of the examined algorithms according to the standard deviation of the incoming task sizes for the environment with partial mesh. The left chart illustrates the situation when the standard deviation equals 2 (rather predictable task sizes). The chart on the right depicts the case when the task size is less predictable (standard deviation equals 5).

When the sizes of incoming processes were less predictable (higher variance of the task sizes), the results were significantly different. The shortest queuing time is a feature of the PDSBA algorithm, while the FF algorithm exhibits the least satisfactory queuing time.

7. CONCLUSIONS AND PERSPECTIVES

Among the analysed algorithms there is no single allocation algorithm that is the best in all cases. This research resulted in specifying the fields of usage in which the examined algorithms give the best results. Table 1 summarises the results of the experiments. The conditions for which a specific algorithm is most effective have been defined roughly. A formula to select the best algorithm for given input parameters is to be set up in the future.


Table 1. Application of the examined algorithms

Full mesh | Variance of task sizes | Recommended algorithm
yes       | low                    | PDSBA
yes       | high                   | SBA
no        | low                    | BFSBA
no        | high                   | PDSBA, SBA

REFERENCES

[1] LISOWSKI D., Analiza efektywności wykorzystania algorytmów alokacji zadań w sieciach o topologii siatki, Wrocław, Politechnika Wrocławska, 2004.

[2] KUBIAK M., Analiza wydajności dynamicznych algorytmów alokacji zadań w systemach zorientowanych siatkowo, Wrocław, Politechnika Wrocławska, 2006.

[3] KOSZAŁKA L., LISOWSKI D., PÓŹNIAK-KOSZAŁKA I., Comparison of allocation algorithms for mesh structured networks with using multistage simulation.

[4] BANI-MOHAMMAD S., OULD-KHAOUA M., ABABNEH I., MACKENZIE L., Non-contiguous processor allocation strategy for 2D mesh connected multicomputers based on sub-meshes available for allocation, In Proceedings of the 12th International Conference on Parallel and Distributed Systems, 12–15 July 2006.

[5] Blue Gene Project, http://www.research.ibm.com/bluegene/index.html, 2005.

[6] YOO B.-S., DAS C.-R., A fast and efficient processor allocation scheme for mesh-connected multicomputers, IEEE Transactions on Parallel & Distributed Systems.

[7] ABABNEH I., An efficient free-list submesh allocation scheme for two-dimensional mesh-connected multicomputers, Journal of Systems and Software.


Computer Systems Engineering 2008

Keywords: MPI, parallel processing, benchmark, implementation

Bolesław TOKARSKI∗

Leszek KOSZAŁKA∗

Piotr SKWORCOW†

SIMULATION BASED PERFORMANCE ANALYSIS OF ETHERNET MPI CLUSTER

This paper considers the influence of network aspects and the MPI implementation on the performance of an Ethernet-based computer cluster. The following factors are considered: message size, number of nodes, node heterogeneity, network speed, switching technology and MPI implementation. The primary index of performance is the throughput measured with the Intel MPI Benchmark. It was found that there is a specific message size that is optimal for each cluster configuration. The most important factors were found to be the network speed and the MPI implementation.

1. INTRODUCTION

Nowadays CPUs use a multi-core architecture. This means that single-threaded, single-process applications cannot make use of more than one core at a time. As a result, multithreaded applications are commonly used in desktop PCs. However, large data computing centres use multiple server computers interconnected typically with Ethernet. More information about Ethernet, its design and protocols can be found in [1]. Multithreading alone cannot overcome the obstacle of network interconnection. This is the field of the Message Passing Interface (MPI).

MPI is the de facto standard library used for large scale computation. It is used in supercomputers to provide fast and reliable data exchange between nodes. All supercomputers listed on the Top500 Supercomputing Sites [2] use MPI as their communication library. Sun Microsystems in its book about multithreading for Solaris [3] mentions a combination of multithreading and RPC (Remote Procedure Call – see [4]) as a method of extending

∗ Department of Systems and Computer Networks, Wrocław University of Technology, Poland.
† Water Software Systems, De Montfort University, Leicester, UK.


a multithreaded program to run over the network. The book also points to Sun's ClusterTools, based on OpenMPI, as a more effective approach for distributed systems. The MPI standard is defined by the MPI Forum in [5] and [6]. It consists of a set of function calls needed to ensure consistent communication between nodes in a multi-computer cluster. The MPI library was designed to address supercomputers consisting of thousands of CPUs.

The objective of this paper is to identify the factors influencing the performance of an Ethernet-based MPI network cluster. Performance tests were conducted on low-cost PCs. The tests were focused mainly on a comparison of MPI implementations from Sun, LAM and MPIch. More investigations on tuning a network-based cluster can be found in e.g. [8]; along with optimisation methods, a comparison of MPIch2, GridMPI, MPIch-Madeleine and OpenMPI can be found in that work. Other MPI tests were conducted by Brian VanVoorst and Steven Seidel [7], who compared MPI implementations on a shared memory machine.

Efficiency is measured using the SendRecv test from Intel's MPI Benchmarks [9]. The following factors are evaluated:

• cluster heterogeneity and the presence of a master node,

• number of nodes in the cluster,

• interconnection speed,

• switching algorithm,

• MPI implementation.

The paper is organised as follows. In Section 2 the system used for the experiments is discussed, including hardware and software details. Section 3 contains the results of the tests mentioned above. Section 4 contains conclusions from the experiments and perspectives on future work with MPI.

2. EXPERIMENTATION SYSTEM

2.1. HARDWARE

The computer cluster used in this paper consists of a set of 10 computers, 8 of which are IBM NetVista 6579-PCG – Pentium III-based, 866 MHz, with 128 MB of RAM and


1 Fast Ethernet card. The detailed specs of the nodes are available from IBM [10]. The master node is a brandless Pentium IV 3 GHz with 1.5 GB of RAM and 3 Fast Ethernet cards (bonded). The slow node is a brandless Pentium 166 MHz with 64 MB of RAM and 1 Fast Ethernet card. The slow node is used to test the efficiency of a heterogeneous cluster.

Two different devices were used for the interconnection. The first one is a UB Networks 10 Mbps half-duplex hub; the second is a 3Com SuperStack II 3300 – a 100 Mbps full-duplex, intelligent switch, also used in 10 Mbps mode.

2.2. SOFTWARE

The following software is used in the cluster: Debian Linux 4.0; MPI libraries – OpenMPI 1.2.4 and Intel MPI 3.1. Intel MPI Benchmarks 3.1 was used as the test suite; graphs in the further part of the paper illustrate the results of the SendRecv test from the suite. The mechanism of this test is thoroughly discussed in the documentation [9]. In summary, the test uses the MPI function call MPI_Sendrecv. Each node sends a message of a given size to its "right" neighbour and receives a message from its "left" neighbour. The time measured is the time between the start of sending the message and receiving it by the last node in the cluster. Throughput is expressed by Equation (1).

$$\text{throughput} = \frac{2 \cdot \text{message size}}{\text{time}} \qquad (1)$$

To obtain meaningful results each test was repeated 1000 times; the charts contain average results. It was assumed that during the tests no unnecessary services were running on the nodes.
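To make the measurement procedure concrete, a minimal sketch of a comparable SendRecv-style ring test is given below. It is written against the standard MPI C API and mirrors Equation (1), but it is an illustrative reconstruction, not the Intel MPI Benchmarks source; the message size and repetition count are example values.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int msg_size = 128 * 1024;       // message size in bytes (example value)
    const int reps = 1000;                 // each test repeated 1000 times, as above
    std::vector<char> sendbuf(msg_size), recvbuf(msg_size);
    const int right = (rank + 1) % size, left = (rank + size - 1) % size;

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; ++i)
        // send to the "right" neighbour, receive from the "left" neighbour
        MPI_Sendrecv(sendbuf.data(), msg_size, MPI_CHAR, right, 0,
                     recvbuf.data(), msg_size, MPI_CHAR, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    double t = (MPI_Wtime() - t0) / reps;

    if (rank == 0)                         // 2 * message size crosses each node per step, cf. (1)
        std::printf("throughput: %.2f MB/s\n", 2.0 * msg_size / t / 1e6);
    MPI_Finalize();
    return 0;
}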

3. TEST RESULTS

3.1. EXPERIMENT #1 – CLUSTER HETEROGENEITY

Design of the experiment

The following hardware configurations were used for this test:

• 8 IBM NetVista computers (Pentium III 866 MHz),

• 8 IBM NetVista computers + master node (Pentium IV 3 GHz),

• 8 IBM NetVista computers + master node + slow node (Pentium 166 MHz).


Other conditions: Intel MPI library, interconnected with the 100 Mbps switch in intelligent mode, master node connected with 3 bonded links.

Experiment results

[Figure: throughput in Mbytes/sec vs. number of bytes transferred (1 B to 4 MB) for three configurations: 8 standard nodes; 8 standard + master node; 8 standard + master + slow node.]

Fig. 1. Measured throughput of clusters consisting of: a) 8 standard nodes, b) 8 standard + 1 master node, c) 8 standard + 1 master + 1 slow node

Figure 1 illustrates the throughput of the tested set. It shows that for message sizes up to 32 KB the best results are obtained for a homogeneous cluster. However, it seems that there is an additional burden of synchronising the nodes when the message size exceeds 32 KB: without the master node, the performance of the cluster drops from 21 MB/s to 13 MB/s when the message size is 128 KB.

Adding a slow node results in half the throughput of a homogeneous cluster with a master node. This would cause a huge slowdown in applications that require node synchronisation or assume equal computing power of nodes. However, one needs to note that the slowdown seen here does not affect certain types of computation, i.e. so-called "embarrassingly parallel" tasks.


3.2. EXPERIMENT #2 – NUMBER OF CLUSTER NODES

Design of the experiment

In this test we used a master node plus 1, 4 or 8 standard nodes interconnected with one of the following:

• UB Networks hub in 10 Mbps half-duplex mode

• 3Com switch in 100 Mbps full-duplex mode

• 3Com switch in 10 Mbps half-duplex mode

Other conditions: Intel MPI library, switch used in intelligent mode, master node connected with 1 link.

Experiment results

Figure 2 illustrates how the cluster behaves for different numbers of nodes and different network interconnections. As one can see, the 10 Mbps hub easily gets overloaded with data: while throughput can reach 1 MB/s with 2 nodes, it drops to 0.4 MB/s with 5 nodes and to 0.2 MB/s with 9 nodes. The 100 Mbps switch used in full-duplex mode serves the cluster well; the throughput is nearly the same for any number of nodes. On the 5 and 9 node graphs one can observe a drop of throughput when the message size exceeds 128 KB. One can notice such a drop also in Figure 1. The reason for this has not been determined; it may be related to the network stack or the library implementation.

The 3Com switch used in 10 Mbps half-duplex mode resulted in improved throughput compared to the hub, and cluster throughput did not depend on the number of nodes. However, two anomalies have been observed. One is in part a) of the graph – the hub seems to give better results than the switch. The second can be seen in parts b) and c) – there is a drop of throughput when the message size equals 8 KB or 16 KB. Comparing the results seen in part c) of Figure 2, one can notice that using a 10 Mbps switch resulted in four times higher throughput compared to a 10 Mbps hub, while using a 100 Mbps full-duplex switch gives over 20 times better throughput than a 10 Mbps half-duplex switch.


[Figure: three panels – a) throughput of 2 node cluster, b) throughput of 5 node cluster, c) throughput of 9 node cluster – showing throughput in Mbytes/sec (log scale, 0.01–100) vs. number of bytes transferred (1 B to 4 MB), for the 100 Mbps switch, the 10 Mbps switch and the 10 Mbps hub.]

Fig. 2. Measured throughput of cluster with 10 Mbps hub/switch and 100 Mbps switch consisting of: a) 2 nodes, b) 5 nodes, c) 9 nodes

3.3. EXPERIMENT #3 – SWITCHING ALGORITHMS

Design of the experiment

In this test the switch interconnecting the nodes used one of the following switching algorithms:

• Store-and-forward

• Fragment-free

• Fast-forward

• Fast-forward with 3 bonded links on the master node

Conditions: 8 standard nodes and 1 master node, OpenMPI library, interconnected with the 100 Mbps switch, master node connected with 1 link.


Experiment results

[Figure: throughput in Mbytes/sec vs. number of bytes transferred for Store-and-forward, Fragment-free, Fast-forward, and Fast-forward with 3 bonded links on the master node.]

Fig. 3. Comparison of clusters interconnected with a switch using different switching algorithms

In this experiment the OpenMPI library was used, hence the curve shown in Figure 3 differs from the one in Figure 1. There is, however, a similar drop of throughput, with the worst value at a 64 KB message size. Figure 3 also shows that it is better to have a faster connection for the master node.

One would expect the Fast-forward algorithm to give the best results. This, however, was found not to be true for the considered cluster, as shown in Figure 3. Moreover, Store-and-forward shows better results than the other switching techniques considered.

3.4. EXPERIMENT #4 – MPI IMPLEMENTATION

Design of the experiment

In this test the MPI implementation used was one of the following:


• Intel MPI,

• OpenMPI,

• MPIch,

• LAM-MPI.

Other conditions: a constant number of 8 standard nodes and 1 master node interconnected with the 100 Mbps switch in intelligent mode, master node connected with 3 bonded links.

[Figure: throughput in Mbytes/sec vs. number of bytes transferred for MPIch, LAM MPI, OpenMPI and Intel MPI.]

Fig. 4. Comparison of clusters based on different MPI implementations

Experiment results

This final experiment investigated how different MPI implementations impact the performance of the cluster. Figure 4 shows that LAM-MPI and Intel MPI achieved the highest throughput, with LAM-MPI being marginally better than Intel MPI for all


message sizes considered except 256 KB. OpenMPI produced average results, with a 25% loss compared to its rivals at its weakest point, i.e. for a message size of 64 KB. On the other hand, it was the best performer when used with a 512 KB message size. MPIch produced the worst results for all message sizes considered.

It is worth mentioning that during the tests none of Intel MPI, OpenMPI or MPIch needed to use the swap file or other HDD data. Only LAM-MPI, when testing the 512 KB message size, needed to use the PC's HDD; this did not seem to affect the test result, though.

4. CONCLUSIONS

In this paper, the most important factors influencing the performance of an Ethernet-based network cluster were evaluated experimentally and discussed. The findings can be summarised as follows:

• The most important factor was found to be the linking device used for the interconnection. Upgrading to a better switch can result in 20 times higher throughput. If a high-end cluster is needed, one should consider InfiniBand, Myrinet or another proprietary solution.

• The second factor is the heterogeneity of the cluster. One needs to assume that the speed of the cluster depends on the slowest machine. A heterogeneous cluster is ineffective, hence it is better to upgrade all CPUs or memory in the cluster rather than upgrade just a part of it.

• It was observed that using an efficient MPI implementation speeds up calculations. MPI implementations are targeted and optimised for specific areas. For example, MPIch performed poorly in the tests considered in this paper, but performed much better on the shared-memory system considered in [7].

• When writing an application using MPI, it is essential to choose the right message size for the cluster. For the considered cluster, this was found to be 16 KB for MPIch, 1 MB–4 MB for OpenMPI and 128 KB for LAM-MPI and Intel MPI. When the application's target MPI implementation is not known, it is recommended not to use message sizes over 16 KB.


REFERENCES

[1] SPURGEON C.E., Ethernet: The Definitive Guide, O’Reilly Media, Inc., Sebastopol, CA, first edition, 2000.

[2] TOP500.Org, Top 500 supercomputing sites, Technical report, http://www.top500.org/, November 2007.

[3] Sun Microsystems, Multithreaded Programming Guide, Sun Microsystems, Inc., Santa Clara, CA, first edition, 2008.

[4] BLOOMER J., Power Programming with RPC, O’Reilly and Associates, Inc., Sebastopol, CA, first edition, 1992.

[5] Message Passing Interface Forum, MPI: A message-passing interface standard (version 1.1), Technical report, http://www.mpi-forum.org/, 1995.

[6] Message Passing Interface Forum, MPI-2: Extensions to the message-passing interface, Technical report, http://www.mpi-forum.org/, 2003.

[7] VanVOORST B. and SEIDEL S., Comparison of MPI implementations on a shared memory machine, Technical report, 2000.

[8] MIGNOT J.C., PASCALE S.G., PRIMET V.B., GLUCK L.H.O., Comparison and tuning of MPI implementations in a grid context, Technical report, Universite de Lyon, INRIA, LIP (Laboratoire de l’Informatique du Parallelisme), France, 2007.

[9] Intel GmbH, Bruehl, Germany, Intel MPI Benchmarks, Users Guide and Methodology Description, 3.1 edition, 2007.

[10] International Business Machines Corporation, Technical Information Manual, A20 Type 6269, A40 Types 6568, 6578, 6648, A40p Types 6569, 6579, 6649, 1st edition, 2000.


Computer Systems Engineering 2008

Keywords: Intrusion Detection Systems, Data Mining, KDD-CUP 99

Michał ŻACZEK∗

Michał WOŹNIAK∗

APPLYING DATA MINING TO NETWORK INTRUSION DETECTION

This paper presents the results of a software implementation of a classifying data mining algorithm – Naive Bayes – for the detection of unwanted network traffic. A dataset created for the KDD-CUP 99 competition was used as a substitute for real network traffic. A series of tests showed significant dependencies between the size and contents of the training dataset and the efficiency of the algorithm. The paper points out difficulties in implementation and the potential of data mining for network intrusion detection applications.

1. INTRODUCTION

Safety is a key issue for the growth of computer networks and the Internet. Every application that uses a network connection may constitute a weakness in the safety of a computer system. Comprehensive analysis of all ISO/OSI model layers [1] can be an effective protection against the most recent attacks. This task is carried out by special Intrusion Detection Systems (IDS). The use of data mining algorithms [2] is a relatively new approach to the detection of intrusions. Data mining is a process of automated unveiling of previously unknown and potentially useful patterns or schemes from large data repositories [4]. Such a repository can be network traffic, e.g. IP packets, analysed in real time or intercepted and recorded in logs. Classification is one of the data mining methods commonly used for intrusion detection. In classification, a mapping of data into a set of predefined classes is found. The classification process contains three phases – building a model, testing, and prediction of unknown values. In the first phase a training set of examples is used as input data. An example is a list of descriptive attributes, i.e. descriptors, and a decisive attribute selected to define a class. The classifier

∗ Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


is the result; based on the values of the descriptors, it assigns a value of the decisive attribute to each new example. Testing means using the classifier to classify known data and evaluating its performance, whereas prediction involves using the classifier in practice. The system can be used in practice when the percentage of correctly classified packets, i.e. the classifying accuracy, is satisfactory. When the classifying accuracy is too low, an attempt can be made to create the classifier with different training datasets.

The purpose of this work is a software implementation of a classifying algorithm and an evaluation of its accuracy for the detection of network intrusions.

2. TRAINING DATA AND TEST DATA

In the paper, we use an existing dataset prepared for the KDD-CUP 99 data mining competition [3]. The set was intended especially for training and testing IDS systems that utilize data mining. The sets were divided into a training portion with decisive attributes defined, and a test portion with decisive attributes used to evaluate the accuracy of the classifier. The data contain both common network traffic and typical attacks. The convention accepted by KDD-CUP assumes that a single text line (an example) is a set of attributes that describe one packet. Each packet has a decisive attribute, whose default value is “normal”, while in the case of an attack it holds the name of the attack. An example of a single packet from the dataset is shown below:

0,tcp,http,SF,334,1684,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,9,0.00,0.00,0.00,0.00,1.00,0.00,0.33,0,0,0.00,0.00,0.00,0.00,0.00,0.00, normal.

Each comma-separated number (or character string) is a single descriptor – a descriptive attribute. The attributes relate to the protocol, ports, services, amount of transferred data etc. The specific meaning of each descriptor is not relevant at this point. To simplify the analysis, and to avoid problems with too small a number of examples, we decided to remove classes with fewer than 3000 examples from the training set. In the end we obtained a training set that contained 682 462 examples in 5 classes, and a test set containing 284 614 examples of the same classes. Table 1 and Table 2 show the configuration of the sets.
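As an illustration, a minimal sketch of reading such a record is given below; the splitting into descriptors and the decisive attribute follows the format above, while the function name and data layout are assumptions of this sketch, not the original implementation.

#include <sstream>
#include <string>
#include <vector>

// Split one comma-separated KDD-CUP 99 record into its fields; the last field
// is the decisive attribute, e.g. "normal." or the name of an attack.
std::vector<std::string> parseRecord(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string item;
    while (std::getline(ss, item, ',')) fields.push_back(item);
    return fields;
}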

3. CLASSIFICATION ALGORITHM

Having certain knowledge of the capabilities of classification algorithms [2][3][4], we decided to implement Naive Bayes, as potentially accurate for IDS applications. Mathematically, the algorithm computes the probability that a certain example belongs to a specified class. Once the probability is known for all classes, the class with


the greatest probability is selected and the example is assigned to that class. To calculate the probability of an example belonging to a certain class, Naive Bayes:

• analyses the training data to calculate the probability of occurrence of each descriptor value in the class,

• assumes (“naively”) that occurrences of the given descriptor values are independent events,

• uses the independence of events to take the probability that the example belongs to the class as the product of the occurrence probabilities of all descriptor values in the example.

Table 1. Configuration of training set

CLASS        NUMBER OF EXAMPLES
“ipsweep”    7579
“neptune”    204815
“normal”     250000
“satan”      5389
“smurf”      212363

If a given descriptor takes nominal values, the probability is defined by the frequency of occurrence of the given descriptor value in the class. For example, if the training data contains 100 examples of the “normal” class, 50 of which have the value “tcp” for a given descriptor, then the occurrence probability for this descriptor in the “normal” class is 50/100 = ½. However, if a given descriptor takes values from a continuous domain, the probability is calculated with a normal distribution (a value derived from the Gauss function). The parameters of the normal distribution – mean value and standard deviation – must be calculated earlier, based on the training data, for each numerical descriptor of a given class.
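For illustration, the scoring step described above can be sketched as follows; the structures and names are assumptions of this sketch (the original implementation is not reproduced here). Nominal descriptors use the stored frequency estimates, numerical ones the fitted normal density, and the class with the largest resulting product wins.

#include <cmath>
#include <map>
#include <string>
#include <vector>

struct GaussianParams { double mean, stddev; };   // fitted per numerical descriptor and class

struct ClassModel {
    double prior;                                        // P(class)
    std::vector<std::map<std::string, double>> nominal;  // P(value | class) per nominal descriptor
    std::vector<GaussianParams> numerical;               // normal-distribution parameters
};

// Normal density value for a numerical descriptor.
double gaussian(double x, const GaussianParams& g) {
    const double pi = std::acos(-1.0);
    double z = (x - g.mean) / g.stddev;
    return std::exp(-0.5 * z * z) / (g.stddev * std::sqrt(2.0 * pi));
}

// "Naive" score of one example for one class: the prior times the product of
// the per-descriptor probabilities (independence assumption).
double score(const ClassModel& m,
             const std::vector<std::string>& nomVals,
             const std::vector<double>& numVals) {
    double p = m.prior;
    for (size_t i = 0; i < nomVals.size(); ++i) {
        auto it = m.nominal[i].find(nomVals[i]);
        p *= (it != m.nominal[i].end()) ? it->second : 1e-9;  // unseen value: tiny probability
    }
    for (size_t i = 0; i < numVals.size(); ++i)
        p *= gaussian(numVals[i], m.numerical[i]);
    return p;
}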

4. IMPLEMENTATION

The Naive Bayes algorithm is implemented in C++ and operates in the Windows Vista/XP operating system environment. Classification throughput drops as the training set grows – for the biggest set (682 462 examples), approximately 4000 examples per hour can be processed.

Table 2. Configuration of test set

CLASS        NUMBER OF EXAMPLES
“ipsweep”    300
“neptune”    58001
“normal”     60593
“satan”      1630
“smurf”      164090


5. RESEARCH

At the start, we had the whole training and test sets. For such a configuration a single test would last over 71 hours. This was unacceptable, because many tests were to be carried out. Moreover, such a big training set does not necessarily give the best results [4]. The most important purpose of these experiments was to find an optimal training set, i.e. a set of the smallest possible size that produces the greatest classification accuracy.

Experiment #1

For the first test series we prepared 5 training sets of different sizes, and the whole set, max. Tables 3 and 4 show the configuration of the sets.

Table 3. Configuration of the training sets – Experiment #1

SET NAME             tren_100   tren_1000   tren_2000   tren_5000   tren_10000   max
NUMBER OF EXAMPLES   500        5000        10000       25000       50000        682462

Table 4. Configuration of the test file – Experiment #1

CLASS                “ipsweep”   “neptune”   “normal”   “satan”   “smurf”
NUMBER OF EXAMPLES   300         1000        1000       1000      1000

Table 5. Test results – Experiment #1

TRAINING FILE                     tren_100   tren_1000   tren_2000   tren_5000   tren_10000   max
OVERALL CLASSIFICATION ACCURACY   45%        67%         76%         76.23%      76.28%       68.5%

The classification accuracy increases with the amount of training data, but only up to a certain size; for the whole set it is lower than the maximum. It is necessary to notice, however, that the quantitative ratios of the specific examples in the “max” set are different, which probably affects the accuracy. Accuracy for the “ipsweep”, “normal”, “satan” and “smurf” classes was above 90%, and improves only slightly when bigger training files are used. The classification accuracy for the “neptune” class reached 3% for the smallest set and remained unchanged. Analysis of the results file showed that the classifier mistakes “neptune” examples for the “satan” class.


Experiment #2

We made an attempt to improve the “neptune” class results by modifying the tren_5000 set: we changed the number of examples for the “neptune” and “satan” classes.

Table 6. Configuration of training sets – Experiment #2

TRAINING FILE                   tren_A   tren_B   tren_C   tren_D   tren_E
NUMBER OF EXAMPLES, “neptune”   10000    5000     10000    10000    2500
NUMBER OF EXAMPLES, “satan”     5000     10000    10000    2500     10000

Using the abovementioned sets does not bring any improvement – the results are practically the same as those for the unchanged set. The last series of tests investigated the accuracy of the classifier on the whole test file (Table 7).

Table 7. Results for the tren_5000 training file and the whole test set

OVERALL RESULT   “ipsweep”   “neptune”   “normal”   “satan”   “smurf”
85.4%            98%         29.8%       99.26%     73.8%     100%

Even though the numbers are greater, this should not be taken as an improvement, but rather as a more precise examination; these results are the most conclusive classifier evaluation that was obtained. Accuracy tests for modifications of the tren_5000 file do not introduce any significant change. Difficulties in correctly distinguishing the “neptune” and “satan” classes may result from a high correlation of descriptor values in examples of those classes.

6. CONCLUSIONS AND PERSPECTIVES

Classification using our own Naive Bayes implementation gives very promising results: three out of five examined classes obtained results close to 100%. The excellent results for the “normal” class deserve particular emphasis. This is especially important, since effectiveness in recognising normal network traffic is a significant determinant of IDS system quality; weaknesses in this field may cause the system to generate false alarms and heavily reduce the practical importance of the solution. Additional tests with large datasets and other classes (types of attacks) are necessary before the suitability of the algorithm for intrusion detection can be finally evaluated. In the future, Naive Bayes can be utilized for operation in a computer network.


REFERENCES

[1] BABBIN J., GRAHAM C., OREBAUGH A., PINKARD B., RASH M., IPS. Zapobieganie i aktywne przeciwdziałanie intruzom, Mikom, 2003.

[2] FRANK E., WITTEN I. H., Data Mining. Practical Machine Learning Tools and Techniques (second edition), Elsevier, 2005.

[3] http://kdd.ics.uci.edu/databases/kddcup99/task.html, October 2007.

[4] http://wazniak.mimuw.edu.pl/index.php?title=Eksploracja_danych, October 2007.


Computer Systems Engineering 2008

Keywords: linear systems, bilinear systems, generalized predictive control

Ivan ZAJÍC†

Keith J. BURNHAM†

EXTENSION OF GENERALISED PREDICTIVE CONTROL TO HANDLE SISO BILINEAR SYSTEMS

The paper reviews the existing generalized predictive controller and its extension to handle a class of bilinear systems. The investigation also makes a case for the need for bilinear controllers in contrast to the alternative adaptive linear model-based control approach for nonlinear systems. The paper highlights this justification and demonstrates the potential of adopting the bilinear control approach on a particular example. A Monte Carlo simulation is carried out to evaluate and compare the differences in performance between the investigated approaches.

1. INTRODUCTION

The generalized predictive control (GPC) algorithm has had a significant impact on recent developments in control, comparable to that of the currently widely adopted three-term PID controller when it became a popular choice as an industry standard. It is a popular model-based predictive control (MBPC) method and is being used in industry. The approach was proposed and developed by Clarke et al. during the 1980s, see [2].

The current well-known GPC utilises a linearised model of the real-world nonlinear system such that it allows for predicting the future values of the system output over a prediction horizon. Linear systems represent a small but important subset of bilinear systems, and bilinear systems represent an important subset of the wider class of nonlinear systems. Many real-world processes can be described using bilinear models. Bilinear systems are characterised by linear behaviour in both state and control when considered separately, with the nonlinearity arising as a product of system state and control [9]. These processes may be found in areas such as engineering, ecology, medicine and socioeconomics. Thus the adoption of bilinear models represents a significant step

†Control Theory and Applications Centre, Coventry University, Coventry, UK


towards dealing with practical real-world systems. Prompted by the fact that many processes may be more appropriately modelled as bilinear systems, as opposed to linear systems, the development of bilinear MBPC strategies is justified. The use of bilinear GPC (BGPC) should yield better performance over the use of linear GPC when applied to systems for which a bilinear model is more appropriate.

The paper considers the extension of linear GPC to handle a class of single-input single-output (SISO) time invariant bilinear systems, and the development of the BGPC algorithm. Several approaches for accommodation of the bilinearity within the BGPC scheme are considered [4, 11]. The focus of the paper is also directed to the comparison of the GPC within the self-tuning framework, i.e. self-tuning GPC (STGPC), where the linear model is updated at each time step, and BGPC. Monte Carlo simulation studies are presented and comparisons are made.

2. GENERALIZED PREDICTIVE CONTROLLER

The idea of GPC is to minimise the variance of the future error between the output and set point by predicting the long range output of the system and separating the known contributions to future output from the unknown contributions. In this way a vector of future predicted errors can be used to generate a vector of future incremental controls, where only the current control input is applied to the plant. The aim is to minimise the multi-stage quadratic cost function defined as, see e.g. [1], i.e.

$$J = \sum_{j=H_m}^{H_p} \left[ y(t+j) - r(t+j) \right]^2 + \sum_{j=1}^{H_c} \lambda \left[ \Delta u(t+j-1) \right]^2 \qquad (1)$$

with respect to current and future values of the differenced control action ∆u(t+j−1) over a maximum prediction horizon Hp. The differencing operator is defined such that ∆ = 1 − q^{-1}, where q^{-1} denotes the backward shift operator defined by q^{-1}x(t) = x(t−1) and t is the discrete time index. The minimum prediction horizon is denoted by Hm and is defined such that d ≤ Hm ≤ Hp, where d is a normalised integer valued time delay which relates to the system time delay expressed as an integer multiple of the sampling interval. Hc denotes the maximum control horizon and in general Hc ≤ Hp. During the derivation of the GPC predictive control law Hc = Hp, and the rather practical case of Hc ≤ Hp will be considered at the end. y(t+j) denotes the future values of the system output and r(t+j) is a sequence of future reference trajectory, where in [1] a weighted sequence of future reference trajectory is assumed instead. The cost weighting parameter λ is the energy constraint.


The first term in the quadratic cost function (1) corresponds to the requirement of tracking the reference signal, hence minimising the sum of future squared tracking errors. The second term in (1) is the weighted sum of the squared differenced control actions, i.e. the energy constraint, in order to achieve a realizable control. Note that the future values of the system output are required in (1). The exact values of the future outputs are unknown and only an optimal prediction of these can be obtained utilising the predictor design. In the case of the GPC of Clarke [2], the predictor is based on a locally linearised CARIMA (Controlled Auto-Regressive Integrated Moving-Average) model of the system having the following form

$$y(t) = \frac{B(q^{-1})}{A(q^{-1})}\, u(t-d) + \frac{C(q^{-1})}{\Delta A(q^{-1})}\, e(t), \qquad (2)$$

where e(t) denotes white normally distributed measurement noise and the polynomials A(q^{-1}), B(q^{-1}) and C(q^{-1}) are defined as follows

$$A(q^{-1}) = 1 + a_1 q^{-1} + \ldots + a_{n_a} q^{-n_a}, \qquad (3a)$$

$$B(q^{-1}) = b_0 + b_1 q^{-1} + \ldots + b_{n_b} q^{-n_b}, \qquad (3b)$$

$$C(q^{-1}) = 1 + c_1 q^{-1} + \ldots + c_{n_c} q^{-n_c}. \qquad (3c)$$

For simplicity only the case C(q^{-1}) = 1 is considered; the case C(q^{-1}) > 1 is investigated in [3], where the inclusion of the C(q^{-1}) polynomial increases the robustness of the controller through better filtering of the measured system output. Note that the CARIMA model structure is chosen for convenience since it allows for elegant inclusion of the integral action in the GPC, so that zero steady state error is guaranteed.

In order to minimise the multi-stage cost function (1) the future values of the system output need to be computed. The model (2) cannot be used directly for computing the future system output since the future values of the noise are unknown, hence a predictor is required. Consider the Diophantine equation

$$1 = \tilde{A}(q^{-1}) E_j(q^{-1}) + q^{-j} F_j(q^{-1}), \qquad (4)$$

where j = 1, …, Hp denotes the prediction, the polynomial \tilde{A}(q^{-1}) = ∆A(q^{-1}) and the polynomials E_j(q^{-1}) and F_j(q^{-1}) are defined as

$$E_j(q^{-1}) = e_{j,0} + e_{j,1} q^{-1} + \ldots + e_{j,n_{E_j}} q^{-n_{E_j}}, \qquad (5)$$

$$F_j(q^{-1}) = f_{j,0} + f_{j,1} q^{-1} + \ldots + f_{j,n_{F_j}} q^{-n_{F_j}}, \qquad (6)$$

where n_{E_j} = j − 1 and n_{F_j} = n_a − 1. The prediction j = 1, …, Hm − 1, where Hm = d here, corresponds to a so-called simple prediction and is not considered in the


design of the GPC control law. The Diophantine equation can be computed recursively, see e.g. [2, 10]. Multiplying both sides of (2) by \tilde{A}(q^{-1}) E_j(q^{-1}) q^{j} leads to

$$\tilde{A}(q^{-1}) E_j(q^{-1})\, y(t+j) = E_j(q^{-1}) B(q^{-1})\, \Delta u(t+j-d) + E_j(q^{-1})\, e(t+j). \qquad (7)$$

Making use of (4), the known (current and past) and unknown (future) values of the system output can be separated, so that

$$\left(1 - q^{-j} F_j(q^{-1})\right) y(t+j) = E_j(q^{-1}) B(q^{-1})\, \Delta u(t+j-d) + E_j(q^{-1})\, e(t+j). \qquad (8)$$

Since the best prediction (in the sense of minimising the squared prediction error) of the value of the white noise e(t+j) is null, the predictor of the system output is then given by

$$y(t+j|t) = F_j(q^{-1})\, y(t) + E_j(q^{-1}) B(q^{-1})\, \Delta u(t+j-d), \qquad (9)$$

where y(t+j|t) denotes the predicted system output based on the information available up to and including time t. Note that the second term in predictor (9) consists of known (past) values, i.e. j = 1, …, Hm − 1, and unknown (current and future) values, i.e. j = Hm − 1, …, Hp, of the differenced control actions which are yet to be determined. These can be separated by utilising a Diophantine equation having the following form

$$E_j(q^{-1}) B(q^{-1}) = G_j(q^{-1}) + q^{-j} \bar{G}_j(q^{-1}), \qquad (10)$$

where the polynomial orders are n_g = j − d and n_{\bar{g}} = d − 2 + n_b, respectively, and j = Hm, …, Hp so that the simple prediction is not assumed. The predictor (9) then takes the form

$$y(t+j|t) = F_j(q^{-1})\, y(t) + \bar{G}_j(q^{-1})\, \Delta u(t-d) + G_j(q^{-1})\, \Delta u(t+j-d). \qquad (11)$$

The predictor (11) can be rewritten in the more convenient matrix form

$$\mathbf{y} = \mathbf{f} + G \mathbf{u}, \qquad (12)$$

where the vectors of predicted outputs and control actions are given by

$$\mathbf{y} = [\,y(t+d|t),\ y(t+d+1|t),\ \ldots,\ y(t+H_p|t)\,]^T \qquad (13)$$

and

$$\mathbf{u} = [\,\Delta u(t),\ \Delta u(t+1),\ \ldots,\ \Delta u(t+H_p-d)\,]^T, \qquad (14)$$


respectively. The vector of known contributions to y, which forms the free response of the system [8], is given by

$$\mathbf{f} = \begin{bmatrix} F_j(q^{-1}) & \bar{G}_j(q^{-1}) \end{bmatrix} \begin{bmatrix} y(t) \\ \Delta u(t-d) \end{bmatrix} \qquad (15)$$

and the Toeplitz lower triangular matrix G is defined as

$$G = \begin{bmatrix} g_0 & 0 & \cdots & 0 \\ g_1 & g_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{H_p-d} & g_{(H_p-d)-1} & \cdots & g_0 \end{bmatrix}, \qquad (16)$$

where the leading j subscripts on the elements in G are omitted, since the diagonal (main and minor) elements are the same and do not depend on j. The cost function (1) can be written in matrix form as

$$J_{GPC} = (\mathbf{y} - \mathbf{r})^T (\mathbf{y} - \mathbf{r}) + \mathbf{u}^T \lambda \mathbf{u}, \qquad (17)$$

where the vector of future set points (or reference signal) is defined as

$$\mathbf{r} = [\,r(t+d),\ r(t+d+1),\ \ldots,\ r(t+H_p)\,]^T. \qquad (18)$$

The next step in deriving the GPC algorithm is to differentiate the cost function (17) with respect to u, i.e.

$$\begin{aligned}
\frac{\partial J}{\partial \mathbf{u}}
&= \left[ (\mathbf{y}-\mathbf{r})^T \frac{\partial}{\partial \mathbf{u}} (\mathbf{y}-\mathbf{r}) \right]^T
 + \left[ \frac{\partial}{\partial \mathbf{u}} (\mathbf{y}-\mathbf{r}) \right]^T (\mathbf{y}-\mathbf{r})
 + \left[ \mathbf{u}^T \frac{\partial}{\partial \mathbf{u}} \lambda \mathbf{u} \right]^T
 + \left[ \frac{\partial}{\partial \mathbf{u}} \mathbf{u}^T \right]^T \lambda \mathbf{u} \\
&= \left[ (\mathbf{y}-\mathbf{r})^T G \right]^T + G^T (\mathbf{y}-\mathbf{r}) + \left[ \mathbf{u}^T \lambda \right]^T + I^T \lambda \mathbf{u} \\
&= 2 G^T (\mathbf{y}-\mathbf{r}) + 2 \lambda \mathbf{u}. \qquad (19)
\end{aligned}$$

Substituting (12) for the vector of predicted outputs in (19) leads to

$$\frac{\partial J}{\partial \mathbf{u}} = 2 G^T (\mathbf{f} + G\mathbf{u} - \mathbf{r}) + 2\lambda\mathbf{u} = 2 G^T (\mathbf{f} - \mathbf{r}) + 2 (G^T G + \lambda I)\,\mathbf{u}, \qquad (20)$$


where I denotes an identity matrix of appropriate dimension. The analytical solution of the cost function minimisation is obtained by setting ∂J/∂u = 0, hence

$$G^T (\mathbf{f} - \mathbf{r}) + (G^T G + \lambda I)\,\mathbf{u} = 0. \qquad (21)$$

Rearranging the expression (21) to solve for the vector u leads to the GPC algorithm

$$\mathbf{u} = \left[ G^T G + \lambda I \right]^{-1} G^T \left[ \mathbf{r} - \mathbf{f} \right], \qquad (22)$$

where only the first element of the vector u is applied to the plant, so that

$$u(t) = u(t-1) + \Delta u(t). \qquad (23)$$

Throughout the derivation of the GPC algorithm the control horizon has been set such that Hc = Hp. However, the use of Hc ≤ Hp is common in practice, as it decreases the computational load. The control horizon is implemented relatively simply by reducing the dimension of the lower triangular matrix G, considering only the first Hc columns of G; the dimension of u is then Hc × 1. The corresponding weighting matrix λI is also required to be suitably truncated. The matrix inversion in (22), for the special case of Hc = 1, reduces to division by a scalar, which is often used in practice due to ease of computation.
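To illustrate the receding-horizon computation, a compact sketch of the control law (22) is given below. It assumes the matrix G and free response f have already been formed from the model polynomials via (4)-(16), and uses a naive Gaussian elimination in place of a linear-algebra library; all names are illustrative.

#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Solve A x = b by naive Gaussian elimination (no pivoting; sketch only).
static Vec solve(Mat A, Vec b) {
    const int n = static_cast<int>(b.size());
    for (int k = 0; k < n; ++k)
        for (int i = k + 1; i < n; ++i) {
            double m = A[i][k] / A[k][k];
            for (int j = k; j < n; ++j) A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    Vec x(n);
    for (int i = n - 1; i >= 0; --i) {
        double s = b[i];
        for (int j = i + 1; j < n; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

// Equation (22): du = (G'G + lambda I)^{-1} G'(r - f); only du(0) is applied, cf. (23).
double gpcIncrement(const Mat& G, const Vec& f, const Vec& r, double lambda) {
    const int rows = static_cast<int>(G.size());      // number of predicted outputs
    const int cols = static_cast<int>(G[0].size());   // first Hc columns of G
    Mat A(cols, Vec(cols, 0.0));
    Vec b(cols, 0.0);
    for (int i = 0; i < cols; ++i) {
        for (int j = 0; j < cols; ++j)
            for (int k = 0; k < rows; ++k) A[i][j] += G[k][i] * G[k][j];
        A[i][i] += lambda;                            // G'G + lambda I
        for (int k = 0; k < rows; ++k) b[i] += G[k][i] * (r[k] - f[k]);
    }
    return solve(A, b)[0];                            // receding horizon: first increment only
}

Note that for Hc = 1 the solve collapses to a division by a scalar, as remarked above.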

3. SELF-TUNING GPC CONTROLLER

Recognition that real-world nonlinear systems exhibit different behaviour over the operating range, and that locally linearised models are valid only for small regions about a single operating point, has prompted the desire to extend MBPC utilising a fixed linearised model to a self-tuning framework. In the self-tuning concept the linear models upon which the MBPC is based are required to be repeatedly updated as the system is driven over the operational range of interest.

The STGPC algorithm has virtually the same structure as the GPC algorithm given in (22). However, note that the matrix G and vector f in (22) comprise the model coefficients ai and bi, which are repeatedly updated, hence these must also be repeatedly updated at each time step. The linear model of the system on which the STGPC is based is commonly estimated utilising the recursive least squares (RLS) estimation technique, see e.g. [7], or any other appropriate on-line estimation technique might be used instead.
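A minimal sketch of such an RLS update with a forgetting factor is shown below; the parameter layout and the forgetting factor value are assumptions of this sketch, not taken from the paper. Here theta holds the ARX coefficients [a1, a2, b1, b2] and phi the regressor [-y(t-1), -y(t-2), u(t-1), u(t-2)].

#include <array>

constexpr int NP = 4;                       // number of estimated parameters
using VecP = std::array<double, NP>;
using MatP = std::array<VecP, NP>;

void rlsUpdate(VecP& theta, MatP& P, const VecP& phi, double y,
               double lambda = 0.98) {      // forgetting factor (assumed value)
    VecP Pphi{};                            // P * phi
    double denom = lambda;                  // lambda + phi' P phi
    for (int i = 0; i < NP; ++i) {
        for (int j = 0; j < NP; ++j) Pphi[i] += P[i][j] * phi[j];
        denom += phi[i] * Pphi[i];
    }
    double err = y;                         // prediction error: y - phi' theta
    for (int i = 0; i < NP; ++i) err -= phi[i] * theta[i];
    for (int i = 0; i < NP; ++i)            // theta += gain * error, gain = P phi / denom
        theta[i] += Pphi[i] / denom * err;
    for (int i = 0; i < NP; ++i)            // P := (P - gain phi' P) / lambda
        for (int j = 0; j < NP; ++j)
            P[i][j] = (P[i][j] - Pphi[i] * Pphi[j] / denom) / lambda;
}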


4. BILINEAR GPC CONTROLLER

The use of bilinear GPC increases the operational range of the controller over the use of the linear model-based GPC when applied to systems for which a bilinear model is more appropriate. It is conjectured that the bilinear controller makes better use of the dynamics of the system itself compared to its linear counterpart. A general single-input single-output bilinear system can be modelled using a nonlinear auto-regressive moving average with exogenous inputs (NARMAX) model representation, i.e.

$$y(t) = \sum_{i=1}^{n_a} -a_i\, y(t-i) + \sum_{i=0}^{n_b} b_i\, u(t-d-i) + \sum_{i=0}^{n_b} \sum_{l=1}^{n_a} \eta_{i,l}\, y(t-d-i)\, u(t-d-i-l+1) + \sum_{i=1}^{n_c} c_i\, e(t-i), \qquad (24)$$

where the ai and bi are assumed to correspond to the linear CARIMA model (2), with the ηi,l being the discrete bilinear coefficients, which are required to be identified either on-line or off-line along with the ai and bi [4].

The predictive control law is based on the bilinear model (24), which for the purpose of obtaining an explicit solution to the multi-stage quadratic cost function (1) is interpreted as a time-step quasi-linear model such that the bilinear coefficients are combined with either the ai or bi parameters. The combined parameters are given either by

$$a_i(t) = a_i - u(t-d-i)\, \eta(i-1) \qquad (25)$$

or by

$$b_i(t) = b_i + y(t-i)\, \eta(i). \qquad (26)$$

The decision to accommodate the bilinearity within the ai or bi coefficients depends on the particular control situation and, to some extent, user choice. As a consequence of utilising the bilinear (bilinearised) model for the purpose of predicting the system output, the prediction error decreases, hence BGPC is more effective than standard GPC. The BGPC algorithm retains the same structure as in the case of GPC (22). However, since the ai(t) or bi(t) coefficients are time varying and input or output dependent, respectively, the Toeplitz lower triangular matrix G and vector f, which comprise these coefficients, are required to be updated at each time step.
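As an illustration of this quasi-linearisation step, a small sketch following variant (25) is given below; the vector layouts and the function name are assumptions of this sketch.

#include <vector>

// Fold the bilinear coefficients into the a_i parameters before each control
// computation: a_i(t) = a_i - u(t-d-i) * eta(i-1), cf. (25).
std::vector<double> combineWithA(const std::vector<double>& a,      // a_1, ..., a_na
                                 const std::vector<double>& eta,    // eta(0), ..., eta(na-1)
                                 const std::vector<double>& uPast)  // u(t-d-1), u(t-d-2), ...
{
    std::vector<double> at(a);
    for (size_t i = 0; i < a.size(); ++i)
        at[i] = a[i] - uPast[i] * eta[i];
    return at;
}

The resulting ai(t) are then used to rebuild G and f for the current time step.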

Naturally, the BGPC can also be implemented in the self-tuning framework, where linearisation at a point is replaced by bilinearisation over a range. This gives rise to benefits such that the estimated model parameters vary less over time. In some cases


the BGPC utilising a fixed bilinearised model is sufficient for describing the system, leading to a less complex controller compared to the standard STGPC. In fact, since the ai(t) or bi(t) coefficients are time varying, the BGPC can also be interpreted in the self-tuning framework to some extent.

5. SIMULATION STUDIES

The system (plant) is represented by a second order single-input single-output auto-regressive with external input (ARX) model having additional Hammerstein and bilinear nonlinearities, which takes the form

$$y(t) = -1.56\, y(t-1) + 0.607\, y(t-2) + 0.042\, u(t-1) + 0.036\, u(t-2) - 0.01\, y(t-1) u(t-1) + 0.01\, u^2(t-1) + e(t), \qquad (27)$$

where the noise variance is σ²_e = 0.002. The coefficient of the bilinear term is η₀ = −0.01 and the coefficient of the Hammerstein term is 0.01. The negative bilinear term is indicative of a system with saturation.
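For reference, one noise-free step of the plant (27) can be written directly from the model (a sketch for illustration only; the noise e(t) with variance 0.002 would be added on top):

// One simulation step of the benchmark plant (27); arguments are past values.
double plantStep(double y1, double y2, double u1, double u2) {
    // y1 = y(t-1), y2 = y(t-2), u1 = u(t-1), u2 = u(t-2)
    return -1.56 * y1 + 0.607 * y2 + 0.042 * u1 + 0.036 * u2
           - 0.01 * y1 * u1     // bilinear term, eta_0 = -0.01
           + 0.01 * u1 * u1;    // Hammerstein term
}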

Similarly structured nonlinear models have been assumed previously for replicating the characteristics of high temperature industrial furnaces, see [6, 5]. The nonlinear system has been chosen to show that bilinear controllers can be used to control nonlinear systems without using the adaptive control approach, leading to the use of less complex and more robust controllers.

5.1. PERFORMANCE CRITERIA

In order to evaluate the control performance of the benchmark GPC and the investigated control schemes, several quality performance criteria are introduced. Firstly, the mean square error (MSE) is defined as

$$MSE = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{\left( r(t) - y(t) \right)^2}{N - t_0}, \qquad (28)$$

where M represents the number of Monte-Carlo runs, N denotes the number of data samples and t0 denotes the start time of the evaluation. Secondly, the mean square control (MSC) performance criterion, representing the control effort, is defined as

$$MSC = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{u^2(t)}{N - t_0}. \qquad (29)$$


The third performance criterion is the activity of the control action, denoted AC, given by

$$AC = \frac{1}{M} \sum_{m=1}^{M} \sum_{t=t_0}^{N} \frac{|u(t) - u(t-1)|}{N - t_0} \times 100. \qquad (30)$$

5.2. SYSTEM IDENTIFICATION

Three controllers are investigated and compared, namely: GPC, STGPC and BGPC. The GPC is based on a second order linearised ARX model of the system (27), given by

$$y(t) = -1.552\, y(t-1) + 0.600\, y(t-2) + 0.043\, u(t-1) + 0.037\, u(t-2). \qquad (31)$$

The linearised model (31) of the system has been estimated off-line using the linear least squares (LLS) estimation technique, see e.g. [7], applied to recorded data obtained when the system was simulated in an open-loop setting spanning the expected working points in the operational range. Note the difference between the identified linearised ARX model (31) and the CARIMA model (2) used for the GPC derivation. This incongruity arises from the estimation technique used and may worsen controller performance. The BGPC is based on the bilinearised model of the system, which is given by

$$y(t) = -1.552\, y(t-1) + 0.600\, y(t-2) + 0.043\, u(t-1) + 0.037\, u(t-2) - 0.006\, y(t-1) u(t-1). \qquad (32)$$

This has been similarly obtained using LLS, as described for the linearised model (31). The STGPC is based on a linear second order ARX model having na = 2, nb = 2 and unity time delay, where the model parameters are estimated on-line utilising the RLS method.

5.3. MONTE CARLO SIMULATION STUDY

A Monte-Carlo simulation study with M = 100 runs, N = 200 samples and t0 = 30 is performed. For all three controllers the tuning parameters are the same: Hp = 5, Hc = 1 and the cost weighting parameter λ = 0.1. The BGPC is based on the model (32), where the bilinearity is combined with the ai parameters (25). The system is subjected to a reference signal switching between ±1 with a period of 50 samples. The results are given in Table 1, where the mean values of MSE, MSC and AC for each controller are presented along with the benchmark comparison expressed in normalised form with respect to the GPC (whose indices are all normalised to 100%). The results of a single simulation for a particular noise realisation corresponding to the benchmark


GPC and the STGPC are shown in Figure 1, and the benchmark GPC and the BGPC are shown in Figure 2.

Table 1. Results of a numerical simulation along with a benchmark comparison between the GPC controller (whose values represent 100%) and the two investigated control schemes.

Controller   MSE      MSC      AC       MSE [%]   MSC [%]   AC [%]
GPC          0.0586   1.0021   0.1262   100.00    100.00    100.00
STGPC        0.0564   0.9361   0.1276   96.136    93.415    101.09
BGPC         0.0523   0.8134   0.1161   89.271    81.168    92.025

[Figure: two panels – output signal y(t) with reference r(t), and control action u(t) – over time samples 40–200 for the GPC and STGPC controllers.]

Fig. 1. Simulation of GPC and STGPC controller

5.4. OBSERVATIONS

The results given in Table 1 show the superior performance of the BGPC over the standard GPC for this particular case. The tracking ability improves by 11% and the


[Figure: two panels – output signal y(t) with reference r(t), and control action u(t) – over time samples 40–200 for the GPC and BGPC controllers.]

Fig. 2. Simulation of GPC and BGPC controller

control effort decreases by 19%. The STGPC provides a moderate improvement over the GPC. It is noted, however, that for a lower demand on tracking accuracy (slow control), e.g. Hp = 10, Hc = 1 and λ = 0.2, the three investigated controllers perform in an almost indistinguishable manner. It is anticipated here that the high control activity of the STGPC is caused by the interaction of the parameter estimation and control parts of the controller algorithm, as well as by the noise sensitivity of the parameter estimator.

6. CONCLUSIONS

The results obtained highlight the benefits of adopting a bilinear MBPC approach over standard linear MBPC approaches. The BGPC is able to achieve its objective through effective automatic gain scheduling via the nonlinear (bilinear) controller model structure, so that the complexity of the controller decreases compared to the self-tuning schemes. It is postulated here that in cases where fast and tight control


is required, the bilinear controller makes better use of the system dynamics compared to its linear counterpart. It is conjectured that, in the case when the set point is required to change over a wide operational range, and/or where the system may change over time, a self-tuning form of the BGPC should be beneficial.

REFERENCES

[1] CAMACHO E. F. and BORDONS C., Model Predictive Control, Springer-Verlag, London, 1998.

[2] CLARKE D. W., MOHTADI C. and TUFFS P. S., Generalized Predictive Control: Parts I and II, Automatica, vol. 23, 1987, pp. 137–160.

[3] CLARKE D. W. and MOHTADI C., Properties of generalized predictive control, Automatica, vol. 25, 1989, pp. 859–875.

[4] DUNOYER A., Bilinear self-tuning control and bilinearisation with application to nonlinear industrial systems, PhD thesis, Coventry University, UK, 1996.

[5] DUNOYER A., BURNHAM K. J. and McALPINE T. S., Self-tuning control of an industrial pilot-scale reheating furnace: Design principles and application of a bilinear approach, Proceedings of the IEE Control Theory and Applications, vol. 144(1), 1997, pp. 25–31.

[6] GOODHART S. G., BURNHAM K. J. and JAMES D. J. G., Bilinear self-tuning control of a high temperature heat treatment plant, Proceedings of the IEE Control Theory and Applications, vol. 141(1), 1994, pp. 12–18.

[7] LJUNG L., System Identification – Theory for the User, Prentice Hall PTR, New Jersey, 1999.

[8] MACIEJOWSKI J. M., Predictive Control with Constraints, Pearson Education Limited, Edinburgh Gate, 2002.

[9] MOHLER R. R., Bilinear Control Processes: with Applications to Engineering, Ecology and Medicine, Academic Press Inc., Orlando, FL, USA, 1974.

[10] NAJIM K. and IKONEN E., Advanced Process Identification and Control, Marcel Dekker, Inc., 2002.

[11] VINSONNEAU B., Development of errors-in-variables filtering and identification techniques: towards nonlinear models for real-world systems incorporating a priori knowledge, PhD thesis, Coventry University, UK, 2007.


Computer Systems Engineering 2009

Keywords: scheduling, learning effect, heuristic

Tomasz CZYZ∗

Radosław RUDEK†

SCHEDULING JOBS ON AN ADAPTIVE PROCESSOR

This paper is devoted to a scheduling problem where the efficiency of a processor increases due to its learning. Such problems model real-life settings that occur in the presence of human learning (industry, manufacturing, management). However, the growing number of significant achievements in the field of artificial intelligence and machine learning is a premise that human-like learning will be present in mechanized industrial processes that are controlled or performed by machines, as well as in multi-agent computer systems. Therefore, the optimization algorithms dedicated in this paper to scheduling problems with learning are not only an answer to present day scheduling problems (where humans play an important role), but also a step towards the improvement of self-learning and adapting systems that will undeniably occur in the near future.

1. INTRODUCTION

The classical scheduling problems assume that job parameters such as processing times are constant. However, in many industrial and even multi-agent systems, the efficiency of a machine or a processor increases due to learning. Therefore, to efficiently solve scheduling problems that occur in such environments, it is required to model this phenomenon and design algorithms on this basis. In the scientific literature there are two main approaches to modelling the learning effect in the context of scheduling theory. The first one assumes that the processing time of each job is described by a non-increasing function of the number of performed products [1]. It follows from many observations and analyses carried out in economy and industry during the last few decades (see [5], [6], [10], [11]). The second approach to modelling the learning phenomenon in scheduling problems is based on the observation that the more time a human spends on performing a job, the more he learns. Therefore, the job processing time is a non-increasing function of the sum of the normal processing times of previously

∗ Wrocław University of Technology, Poland.
† Wrocław University of Economics, Poland.


performed jobs (see [8]). Whereas the first approach better models problems where production is dominated by machines and human activity is limited (e.g., to setting up a machine), the second approach better models human learning. For a survey see [2].

In this paper, we analyse the total weighted completion time scheduling problem, where the learning effect (i.e., the job processing time) is modelled according to the second approach. Since the computational complexity of the considered problem has not been determined, we propose approximation algorithms based on metaheuristic methods such as simulated annealing [7] and tabu search [3].

The remainder of this paper is organized as follows. The problem formulation is presented in the next section. The description of the proposed algorithms and the numerical verification of their efficiency are given subsequently. The last section concludes the paper.

2. PROBLEM FORMULATION

There is given a single processor and a set J = {1, …, n} of n independent and non-preemptive jobs (e.g., products, packets, calculations) that are available for processing at time 0. The processor can perform one job at a time, and there are no precedence constraints between jobs. Each job j is described by its weight wj and the processing time pj(v) of job j if it is scheduled as the vth in a sequence; this parameter models the learning effect.

If π = ⟨π(1), …, π(i), …, π(n)⟩ denotes the sequence of jobs (a permutation of the elements of the set J), where π(i) is the job processed in position i in this sequence, then the processing time p_{π(i)}(i) of a job scheduled in the ith position in the sequence π is described by the following non-increasing function:

$$p_{\pi(i)}(i) = \begin{cases} a_{\pi(i),1}, & g_{\pi(i),0} = 0 \le e(i) < g_{\pi(i),1} \\ a_{\pi(i),2}, & g_{\pi(i),1} \le e(i) < g_{\pi(i),2} \\ \ \vdots \\ a_{\pi(i),k}, & g_{\pi(i),k-1} \le e(i) < g_{\pi(i),k} \end{cases}$$

where a_{π(i),l} is the processing time of job π(i) if the sum of the normal processing times of the previous jobs, e(i) = Σ_{l=1}^{i−1} a_{π(l),1}, satisfies g_{π(i),l−1} ≤ e(i) < g_{π(i),l}, where g_{π(i),l} is the lth threshold of job π(i).

For the given schedule π, we can determine the completion time C_{π(i)} of a job placed


in the ith position in π as follows:

$$C_{\pi(i)} = \sum_{l=1}^{i} p_{\pi(l)}(l), \qquad (1)$$

where C_{π(0)} = 0. The objective is to find a schedule π of jobs on the adaptive processor which minimizes the total weighted completion time criterion:

$$TWC(\pi) = \sum_{i=1}^{n} w_{\pi(i)} C_{\pi(i)}. \qquad (2)$$

To denote the problem, we use the three-field notation scheme X | Y | Z (see [4]), where X describes the processors, Y contains the job characteristics and Z is the criterion. The considered problem is denoted 1 | LE | Σ wjCj.
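To make the criterion concrete, a small sketch evaluating (1) and (2) for a given schedule is shown below; the data layout (per-job stepwise times a and thresholds g) is an assumption of this sketch.

#include <vector>

// Stepwise processing time: a_{j,l} for g_{j,l-1} <= e < g_{j,l}.
// aj = {a_{j,1}, ..., a_{j,k}}; gj = {g_{j,1}, ..., g_{j,k-1}}, ascending.
int procTime(const std::vector<int>& aj, const std::vector<int>& gj, int e) {
    size_t l = 0;
    while (l < gj.size() && e >= gj[l]) ++l;
    return aj[l];
}

// Completion times (1) accumulated along pi, weighted into TWC (2).
long long twc(const std::vector<int>& pi, const std::vector<int>& w,
              const std::vector<std::vector<int>>& a,
              const std::vector<std::vector<int>>& g) {
    long long C = 0, total = 0;
    int e = 0;                                  // e(i): sum of normal times of previous jobs
    for (int j : pi) {
        C += procTime(a[j], g[j], e);
        total += static_cast<long long>(w[j]) * C;
        e += a[j][0];                           // normal processing time a_{j,1}
    }
    return total;
}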

3. ALGORITHMS

The considered problem seems to be NP-hard. Based on this observation, we use metaheuristic algorithms to solve it. Before we describe the algorithms, we present an approach to calculating job processing times in O(1). Note that for each possible value of e(i) ∈ [0, Σ_{j=1}^{n} a_{j,1}] we can calculate the corresponding job processing times a_{j,l} only once for the given instance, and store them in the following array (Table 1).

Table 1. Array of job processing time values

job \ e(i)   0   1   ⋯   Σ_{j=1}^{n} a_{j,1}
1
2
⋮
n

Hence, the job processing times are calculated in O(1). On this basis, we propose two metaheuristic algorithms: simulated annealing (SA) and tabu search (TS).
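The precomputation itself can be sketched as below (the data layout is an assumption of this sketch); after it, every processing-time query inside SA or TS is a plain array access.

#include <vector>

// p[j][e] = processing time of job j when the sum of normal times equals e.
std::vector<std::vector<int>> buildLookup(
        const std::vector<std::vector<int>>& a,  // a[j] = {a_{j,1}, ..., a_{j,k}}, non-increasing
        const std::vector<std::vector<int>>& g,  // g[j] = {g_{j,1}, ..., g_{j,k-1}}, ascending
        int eMax)                                // eMax = sum of all a_{j,1}
{
    std::vector<std::vector<int>> p(a.size(), std::vector<int>(eMax + 1));
    for (size_t j = 0; j < a.size(); ++j) {
        size_t level = 0;                        // current step of the piecewise function
        for (int e = 0; e <= eMax; ++e) {
            while (level < g[j].size() && e >= g[j][level]) ++level;
            p[j][e] = a[j][level];               // a_{j,l} for g_{j,l-1} <= e < g_{j,l}
        }
    }
    return p;
}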

3.1. SIMULATED ANNEALING

The general idea of the implemented simulated annealing algorithm (SA), following [7], is as follows. The algorithm starts with an initial solution and, based on the current solution π, it chooses (in each iteration) the next solution π′ by swapping two randomly chosen


jobs. The new solution replaces the current solution with the probability

$$P(\pi, \pi', T) = \min\left\{1,\ \exp\left(-\left(TWC(\pi') - TWC(\pi)\right)/T\right)\right\},$$

where T is the current temperature. The initial value of the temperature is T_0 and it decreases until it reaches the stop temperature T_N; when the temperature reaches T_N it is reset to T_0 and the temperature decreasing process starts again. Two temperature decreasing models were considered: geometrical T = T/(1 + λT) and logarithmic T = λT, where λ is the temperature step. During the initial experiments the geometrical model was rejected because of large relative errors compared to the logarithmic decreasing. The algorithm stops after the given number of iterations N; therefore, the complexity of SA is O(nN). The formal description of SA is given below.

Algorithm 1 SA
1: T = T_0, π = π* = π_initial, TWC* = TWC(π_initial)
2: FOR i = 1 TO N
3:    CHOOSE π′ BY A RANDOM SWAP OF TWO JOBS IN π
4:    CALCULATE TWC(π) AND TWC(π′) ACCORDING TO (2)
5:    ASSIGN π = π′ WITH PROBABILITY P(T, π′, π) = min{1, exp(−(TWC(π′) − TWC(π))/T)}
6:    IF TWC(π) < TWC* THEN π* = π AND TWC* = TWC(π)
7:    T = T/(1 + λT)
8:    IF T ≤ T_N THEN T = T_0
9: THE PERMUTATION π* IS THE GIVEN SOLUTION

3.2. TABU SEARCH

The proposed algorithm is based on tabu search [3]. Its computational complexity is O(n³N), where N is the number of iterations. The algorithm uses local search with a short term memory, called the tabu list, that stores forbidden moves. In the implemented algorithm a move is defined as the swap of two jobs. The applied tabu list stores pairs of forbidden moves or permutations. If a move or permutation is in the tabu list then it is forbidden and not considered further. The tabu list is organized as FIFO (First In First Out); thereby, if the list is full then the new move or permutation is added at its beginning. The size of the tabu list is denoted |TabuList|. We also use a random diversification that


chooses a random solution when a counter reaches the value of a diversification parameter D. The counter is increased when the next move gives a worse criterion value than the current one, and decreased when the new one is better. The formal description of TS is given below.

Algorithm 2 TS
1: TabuList = ∅, π = πbest = π* = πinitial, counter = 0, TWCprevious = TWCbest = TWC* = TWC(π*)
2: FOR i = 1 TO N
3:   FOR j = n TO 1
4:     FOR v = n TO 1
5:       π′ = π, SWAP π′(j) AND π′(v) IN π′
6:       IF j ≠ v AND TWC(π′) < TWCbest
7:         IF (j, v) IS NOT IN TabuList
8:           πbest = π′, TWCbest = TWC(π′), jbest = j, vbest = v
9:   ASSIGN π = πbest
10:  ADD (jbest, vbest) TO TabuList
11:  IF TWCbest < TWC* THEN π* = πbest, TWC* = TWCbest
12:  IF TWCbest < TWCprevious THEN counter = counter − 1
13:  ELSE counter = counter + 1
14:  IF counter == D THEN CHOOSE π RANDOMLY
15:  TWCprevious = TWCbest
16: THE PERMUTATION π* IS THE BEST FOUND SOLUTION
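The corresponding C++ sketch of Algorithm 2 follows; it implements the move variant of the tabu list (pairs of swapped positions). The helper randomPermutation() and the counter reset after diversification are our own additions, not part of the formal description above.

    #include <algorithm>
    #include <cstddef>
    #include <deque>
    #include <utility>
    #include <vector>

    double TWC(const std::vector<int>& pi);     // criterion (2), defined elsewhere
    std::vector<int> randomPermutation(int n);  // hypothetical helper

    // Sketch of Algorithm 2 (TS): full neighbourhood scan over all swaps, a FIFO
    // tabu list of swapped position pairs and counter-based diversification.
    std::vector<int> tabuSearch(std::vector<int> pi, int N,
                                std::size_t tabuLen, int D) {
        const int n = static_cast<int>(pi.size());
        std::deque<std::pair<int, int>> tabu;               // FIFO tabu list
        std::vector<int> piStar = pi, piBest = pi;
        double twcStar = TWC(pi), twcBest = twcStar, twcPrev = twcStar;
        int counter = 0;

        for (int i = 0; i < N; ++i) {
            std::pair<int, int> move{-1, -1};
            for (int j = n - 1; j >= 0; --j)                // steps 3-8
                for (int v = n - 1; v >= 0; --v) {
                    if (j == v) continue;
                    std::vector<int> cand = pi;
                    std::swap(cand[j], cand[v]);
                    const bool isTabu = std::find(tabu.begin(), tabu.end(),
                                                  std::make_pair(j, v)) != tabu.end();
                    if (!isTabu && TWC(cand) < twcBest) {
                        piBest = cand; twcBest = TWC(cand); move = {j, v};
                    }
                }
            pi = piBest;                                    // step 9
            if (move.first >= 0) {
                tabu.push_back(move);                       // step 10
                if (tabu.size() > tabuLen) tabu.pop_front();// FIFO eviction
            }
            if (twcBest < twcStar) { piStar = piBest; twcStar = twcBest; } // 11
            counter += (twcBest < twcPrev) ? -1 : 1;        // steps 12-13
            if (counter == D) { pi = randomPermutation(n); counter = 0; } // 14
            twcPrev = twcBest;                              // step 15
        }
        return piStar;
    }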

3.3. EXPERIMENT

In this section, we verify numerically the efficiency of the proposed algorithms. For each n ∈ {10, 25, 50, 75, 100}, the parameters of jobs were generated from the uniform distribution in the following ranges: w, k ∈ [1, . . . , 10], a_{π(i),1} ∈ [1, . . . , 500], a_{π(i),k} ∈ [1, . . . , 10] with a_{π(i),l} > a_{π(i),j} for l < j, and g_{π(i),k} ∈ [1, . . . , ∑_{x=1}^{n} a_{π(i),1}] with g_{π(i),0} = 0 and g_{π(i),l} > g_{π(i),j} for l < j. The algorithms were tested on 100 instances for each n.

After testing several combinations of parameters, the following parameters of the algorithms were chosen (see Table 2 and Table 3). The initial solution for each algorithm was a random permutation.

Table 2. Simulated Annealing variants

                         SA1           SA2
start temperature T0     50 000 000    1 000 000 000
stop temperature TN      0.001         0.001
temperature step λ       0.999         0.999
iterations               10 000        10 000

Table 3. Tabu Search variants

                              TS1     TS2
tabu list block type          move    permutation
tabu list length |TabuList|   20      20
diversification parameter D   6       10
iterations N                  100     100

Each algorithm was evaluated according to the relative error

δ = (TWC_A − TWC_min)/TWC_min,

where TWC_A is the criterion value provided by the algorithm A ∈ {SA1, SA2, TS1, TS2} and TWC_min is the best found criterion value for the given instance among the tested algorithms (for n = 10 it is the optimal solution provided by the exhaustive search algorithm). The minimum δmin, mean δ and maximum δmax relative errors and the mean running times t are presented in Table 4.
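In code, the evaluation measure is a one-liner (a trivial sketch; the name is ours):

    // Relative error delta = (TWC_A - TWC_min) / TWC_min, as defined above.
    double relativeError(double twcA, double twcMin) {
        return (twcA - twcMin) / twcMin;
    }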

The proposed algorithms have short running times and provide solutions with low relative errors. However, SA is more efficient for the considered problem: it provides results with lower mean and maximum relative errors in a shorter time than TS. On the other hand, TS can give better results, but the computational effort would be much greater than for SA.

4. CONCLUSIONS

In this paper, the single processor scheduling problem with the learning effect was considered. Since the problem seems to be NP-hard, we proposed two metaheuristic algorithms based on the simulated annealing and tabu search methods. The experiments showed that they provide results with low relative errors in a short time. Thus, they can be applied to the considered problem. Our future work will be devoted to


Table 4. The minimum δmin, mean δ and maximum δmax relative errors and mean running times t of the algorithms

      n           10      25      50      75      100
SA1   t [s]       0.14    0.51    1.90    4.90    8.70
      δ [%]       0.00    0.07    0.25    0.64    0.64
      δmax [%]    0.00    1.31    1.37    3.75    2.64
      δmin [%]    0.00    0.00    0.00    0.00    0.00
SA2   t [s]       0.14    0.50    1.87    4.87    8.47
      δ [%]       0.00    0.04    0.20    0.70    0.92
      δmax [%]    0.00    1.26    2.13    4.16    3.08
      δmin [%]    0.00    0.00    0.00    0.00    0.00
TS1   t [s]       0.01    0.28    4.43    26.43   83.35
      δ [%]       0.45    0.60    1.68    1.83    3.43
      δmax [%]    16.87   11.37   13.02   15.08   13.82
      δmin [%]    0.00    0.00    0.00    0.00    0.00
TS2   t [s]       0.01    0.29    4.50    26.72   83.51
      δ [%]       0.22    0.82    1.61    1.84    2.38
      δmax [%]    8.42    7.48    13.64   9.85    18.72
      δmin [%]    0.00    0.00    0.00    0.00    0.00

minimizing the computational complexity of tabu search by decreasing the complexity of the neighborhood search.

REFERENCES

[1] BISKUP D., Single-machine scheduling with learning considerations. European Journal of Operational Research, vol. 115, pp. 173–178, 1999.

[2] BISKUP D., A state-of-the-art review on scheduling with learning effects. European Journal of Operational Research, vol. 188, pp. 315–329, 2008.

[3] GLOVER F., Tabu Search - Part I. ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[4] GRAHAM R. L., LAWLER E. L., LENSTRA J. K. and RINNOOY KAN A. H. G., Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics, vol. 5, pp. 287–326, 1979.

[5] JABER Y. M. and BONNEY M., The economic manufacture/order quantity (EMQ/EOQ) and the learning curve: Past, present, and future. International Journal of Production Economics, vol. 59, pp. 93–102, 1999.

[6] KERZNER H., Project management: a systems approach to planning, scheduling, and controlling. John Wiley & Sons, Inc., New York, 1998.

[7] KIRKPATRICK S., GELATT C. D., and VECCHI M. P., Optimization by simulated annealing. Science, vol. 220, pp. 671–680, 1983.

[8] KUO W.H. and YANG D.L., Single-machine group scheduling with a time-dependent learning effect. Computers & Operations Research, vol. 33, pp. 2099–2112, 2006.

[9] NAWAZ M., ENSCORE JR E. E., and HAM I. A., A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. OMEGA International Journal of Management Science, vol. 11, pp. 91–95, 1983.

[10] WRIGHT T. P., Factors affecting the cost of airplanes. Journal of Aeronautical Sciences, vol. 3, pp. 122–128, 1936.

[11] YELLE L. E., The learning curve: historical review and comprehensive study. Decision Sciences, vol. 10, pp. 302–328, 1979.


Computer Systems Engineering 2009

Keywords: metaheuristics, task allocation, simulation, efficiency, Tabu Search, Simulated Annealing

Wojciech KMIECIK* Marek WÓJCIKOWSKI* Andrzej KASPRZAK* Leszek KOSZAŁKA*

TASK ALLOCATION IN MESH CONNECTED PROCESSORS USING LOCAL SEARCH METAHEURISTIC ALGORITHM

This article contains a short analysis of applying three metaheuristic local search algorithms to solve the problem of allocating two-dimensional tasks on a two-dimensional processor mesh. The primary goal is to maximize the level of mesh utilization. To achieve this, we adapted three algorithms: Tabu Search, Simulated Annealing and Random Search, created an auxiliary algorithm called Dumb Fit and adapted another auxiliary algorithm, First Fit. To measure the algorithms' efficiency we introduced two evaluating criteria called Cumulative Effectiveness and Utilization Factor. Finally, we implemented an experimentation system to test these algorithms on different sets of tasks to allocate.

1. INTRODUCTION

Recently, processing with many parallel units has gained in popularity. Parallel processing is applied in various environments, ranging from multimedia home devices to very complex machine clusters used in research institutions. In all these cases, efficiency depends on wise task allocation [1], enabling the user to utilize the power of a highly parallel system. Research has shown that, in most cases, parallel processing units deliver only a fraction of their theoretical computing power [2] (which is a multiple of the potential of a single unit used in the system). One of the reasons for this is the high complexity of task allocation on parallel units.

Metaheuristic algorithms have been invented to solve a subset of problems for which finding an optimal solution is impossible or far too complex for contemporary computers. Algorithms like Tabu Search [3], invented by Fred Glover, or Simulated Annealing [4], [5], [6] by S. Kirkpatrick are among the most popular. They are capable of finding near-optimum solutions for a very wide range of problems in a time incomparably shorter than the time it would take to find the best solution [7].

It was decided to adapt three algorithms for solving the allocation problem: Tabu Search, Simulated Annealing and a simplified local search metaheuristic, Random Search, used for comparison. In our approach, for comparison purposes, we also use an existing solution for task allocation on processor meshes, the First Fit algorithm. We also designed a modification of First Fit, called Dumb Fit, which fits this role better. First Fit is also used to generate reference results when examining the efficiency of the three main algorithms. We propose a new evaluating function called Cumulative Effectiveness. The function and the factor derived from it, called Utilization Factor, are explained in the following sections of the article. To examine our solutions' efficiency under different conditions (mesh sizes, task sizes, task processing times etc.) we implemented an experimentation system.

The remaining sections of the article are organised as follows: Section II specifies the problem to be solved, Section III describes the algorithms used and their roles, and Section IV describes the experimentation system. Section V contains an analysis of the results of series of experiments on three task classes: small tasks, mixed tasks and large tasks. Finally, Section VI contains conclusions and sums up the article.

2. PROBLEM STATEMENT

A. Definitions

1. A node is the most basic element which represents a processor in a processor mesh. It is a unit of the dimensions of a mesh, submesh or task. A node can be busy or free.

2. A processor mesh, which thereafter will be simply referred to as ‘mesh’, is a 2-D rectangular structure of nodes distributed regularly on a grid. It can be denoted as M(w, h, t), where w and h are the width and height of the mesh and t is the time of the mesh's life. The value of t may be zero or non-zero. A zero value means that the mesh will be active until the last task from the queue is processed. This value also determines the choice of the evaluating function, which is explained later in this article.

3. A position (x, y) within a mesh M refers to the node positioned in the column x and row y of the mesh, counting from left to right and top to bottom, starting with 1.

4. A submesh S is a rectangular segment of a mesh M, i.e. a group of nodes, defined in a certain moment of time, denoted as S(a, b, e, j), with its top left node in the position (a, b) in the mesh M, and of width e and height j. This entity has only symbolic value; it is used in this article to describe various conditions but is not separately implemented in the software. If a submesh is occupied, it means that all its nodes are busy.

A mesh in a certain moment of time, M(w, h, t1), can be depicted as a matrix of integers, where each number corresponds to a node. Zero can be denoted as a dot (.) and means a free node. Non-zero numbers (the same for a submesh processing one allocated task) indicate a busy node; their value is the time left to process the task. Such a depiction is portrayed in fig. 1, where we can see four various tasks allocated on a small mesh.

5. Tasks, denoted T(p, q, s), are stored in a list. The entire content of the list is known before the allocation. Tasks are taken from the list and allocated on a mesh, where they occupy a submesh S of width p and height q for s units of time (thus s is their processing time).

B. Evaluating functions and lifetime of mesh

The main evaluating function proposed in this paper is Cumulative Effectiveness (CE), given in Equation (1). It is used when a non-zero time of life is defined for a mesh. Knowing it and the parameters of the used mesh, we can compute a more self-descriptive factor, i.e. the Usage Factor given in Equation (2). In (1) pi, qi and si denote the width, height and processing time, respectively, of the i-th of n processed tasks. In (2) w, h and t denote the width, height and time of life, respectively, of the used mesh.

CE = ∑_{i=1}^{n} (p_i · q_i · s_i)    (1)

U = CE / (w · h · t) · 100%    (2)

A task, as well as a mesh, can be treated as a 3-D entity when we assume that time is the third dimension. Then the CE function is the cumulative volume of all allocated tasks and U is the percentage of the mesh's volume used by the processed tasks. This allows us to easily determine how much of the mesh's potential was “wasted” and how much was utilized.

Fig. 1. A sample depiction of a mesh with 4 allocated tasks in a moment of time
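A minimal C++ sketch of both measures (the Task structure is our own container for the p, q, s triple):

    #include <vector>

    struct Task { int p, q, s; };  // width, height, processing time

    // Cumulative Effectiveness (1): total 3-D "volume" p*q*s of processed tasks.
    double cumulativeEffectiveness(const std::vector<Task>& processed) {
        double ce = 0.0;
        for (const Task& t : processed) ce += static_cast<double>(t.p) * t.q * t.s;
        return ce;
    }

    // Usage Factor (2): percentage of the mesh volume w*h*t used by the tasks.
    double usageFactor(double ce, int w, int h, int t) {
        return ce / (static_cast<double>(w) * h * t) * 100.0;
    }

For example, a mesh M(12, 12, 1000) that is fully busy over its whole lifetime yields CE = 144 000 and U = 100%.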

The creation of the CE function and the derived U factor is based on the assumption that a company using a processor mesh has a set of tasks to process on their equipment which exceeds the number of tasks that can be processed in one atomic period of time (the mesh's lifetime, e.g. a day), at the beginning of which a single allocation process is conducted. In that case it is essential to utilize as much of the mesh's power as possible. However, there is also another approach, in which it is assumed that the time of life of the mesh is unlimited and it is desired to process all tasks in the list as soon as possible. The mesh's lifetime is then set to zero (which here means infinity) and the evaluating function is the Time of Completion given in Equation (3). In (3) tfin denotes the moment of time, since the start of processing, when the last of all tasks has been processed.

T = tfin    (3)

This factor can only be used for comparing algorithms, not for objectively evaluating their efficiency. The main advantage of the T factor is a shorter simulation time. Nevertheless, the Usage Factor is recommended because of its objectiveness, and it is mainly used in our research, described in the further sections of this article.

3. ALGORITHMS

A. General information

In our simulation software, we implemented 3 main metaheuristic local search algorithms: SA – Simulated Annealing, explained in [4], [5], [6], [7]; TS – Tabu Search, explained in [3], [7]; and RS – Random Search (not to be confused with simply evaluating a random solution), explained in [7]. All of them work for a number of iterations. In each iteration they operate on a single solution and its neighbourhood and evaluate the results. A solution is defined here as a permutation of the tasks to be allocated, stored in a list. Such a permutation is evaluated by performing a simulation using one of two atomic algorithms (First Fit and Dumb Fit) and, based on the simulation result, computing one of the evaluation functions explained above.

There are also various kinds of neighbourhood to be explored by the algorithms. We implemented two of them: insert and swap. In the case of the first one, a neighbouring solution is found by taking one element of the permutation and putting it in some other position. In the case of swap, two elements are taken and their positions are swapped (hence the name); both moves are sketched below. The performance of each of the three main algorithms highly depends on the instance of the problem (the mesh's and tasks' dimensions and life/processing times) as well as on the algorithms' specific parameters and the atomic algorithms used.
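The two moves can be sketched in C++ as follows (permutations are stored as index vectors; the function names are ours):

    #include <utility>
    #include <vector>

    // Swap neighbour: exchange the tasks at positions i and j of the permutation.
    std::vector<int> swapMove(std::vector<int> perm, int i, int j) {
        std::swap(perm[i], perm[j]);
        return perm;
    }

    // Insert neighbour: remove the task at position i and re-insert it at
    // position j (j refers to the permutation after the removal).
    std::vector<int> insertMove(std::vector<int> perm, int i, int j) {
        const int task = perm[i];
        perm.erase(perm.begin() + i);
        perm.insert(perm.begin() + j, task);
        return perm;
    }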

Fig. 2. Block diagrams for the Dumb Fit and First Fit algorithms respectively


B. Random Search (RS)

RS is the simplest local search algorithm. It starts from a solution and, in each iteration, it finds and evaluates a new solution from the neighbourhood of the current one. In the next iteration the new solution becomes the current one and the process continues. In RS there are no additional parameters except for the number of iterations. This algorithm is highly resistant to local minima.

C. Simulated Annealing (SA)

The main parameters of SA are the starting and ending temperatures. During the course of its operation the temperature drops (logarithmically or geometrically). In each iteration, a random solution from the neighbourhood of the current one is found and evaluated. When the temperature is high, there is a high probability of accepting the new solution as the current one, even if it is worse. When the temperature is low, only those solutions are accepted which are better than the current one. Such an approach makes this algorithm resistant to local minima at the beginning, while at the end it improves the current solution, descending to the nearest minimum.

D. Tabu Search

Our implementation of the TS algorithm is similar to the SA algorithm with low temperatures, except for the fact that it does not accept a new solution as the current one if the same solution is found in the taboo list. Whenever a new current solution is set, it is added to the taboo list. The taboo list has a limited length, which is the main parameter of the algorithm. This algorithm is forced to leave the vicinity of a local minimum, where the vicinity is limited by the length of the taboo list. At the same time TS tries to precisely improve the current solution. It also returns the best solution found during its operation.

E. Atomic functions

The atomic algorithms that we use are Dumb Fit (DF) and First Fit (FF); their block diagrams are shown in fig. 2. During the operation of the DF or FF algorithm, the appropriate evaluating function value is calculated and passed to the currently working main algorithm, allowing it to proceed to the next iteration.

4. EXPERIMENTATION SYSTEM

We aimed to design a simulation environment as versatile as possible, to be able to evaluate all combinations of parameters for various problem instances. As a result, we have developed a console application, written in the C++ language, with various functionalities to read experiment parameters and to write the results. The program has been developed for the Microsoft Windows OS and has two main modes of operation: command line mode and menu mode.

A. Input

Generally, in all modes of operation, the software allows the user to set certain input parameters. The first group of parameters defines the problem. It allows the user to choose the ranges of dimensions (p, q) and processing times (s) for the tasks and the task-list length. The user can also define the size and lifetime of the mesh: w, h, t. The parameters from the first group allow the program to randomly create a task-list and define a mesh, which together form a problem instance.

The other group of parameters varies and consists of the specific parameters of the chosen algorithm, like the number of iterations, the starting and ending temperatures for SA, the temperature profile for SA, the tabu-list length for TS etc. Specifying both groups of parameters makes it possible to solve a predefined problem with a chosen, custom configured algorithm.

B. Menu mode

When the software is run without any parameters in the command line, it goes into the menu mode. It features a main menu consisting of options giving three main modes of operation. The first mode involves tracing a single simulation for a given task-list and a given mesh, using a chosen atomic algorithm. It shows the simulation graphically, step by step, together with the value of the evaluation function, which helps to understand the way a solution is evaluated with a given algorithm. The second way of using the program in menu mode is providing a file with predefined test series, a method which is described in the command line mode subsection. The third way is performing a single experiment with a chosen algorithm. It allows the user to specify all the parameters without using an input file and to watch the algorithm's operation, i.e. its progress and the current evaluation function value are shown. The menu mode would not be suitable for performing a series of experiments for a major research effort, but it is quite convenient for calibrating the parameters and designing a test series.

C. Command line mode

This mode is the preferred one for running a series of experiments for a certain research task. It allows the user to specify, as a command line parameter, a file with a predesigned test series. Such a file begins with a set of parameters defining the problem instance. The task-list is generated only once and the same problem instance is used for the whole test series defined in the file. The number of repetitions for each test can also be specified. Any number of any kind of tests for a certain problem instance can be defined in an input file. When using the command line mode, the user can create a batch file (.bat) for running a series of series of tests (a series of program executions for more than one input file).

D. Output files

For each experiment (a single execution of a main algorithm), in both execution modes, an output file is generated. It contains the specific input parameters set for the chosen algorithm and, most importantly, lines showing the current and best evaluation function values for each iteration. Such data can subsequently be analysed with appropriate software.

5. INVESTIGATIONS AND DISCUSSION

Using our simulation software, we conducted a series of experiments and, in the course of the process, noticed that the analysed problem instances should be categorized into three groups: tasks relatively small (compared with the mesh size), tasks relatively large and mixed tasks (small and large). For each of the groups the algorithms behaved differently, so we designed three corresponding test series that are analysed in the following subsections. In each test, FF's result is used as a reference. The evaluating algorithm means the algorithm used for each solution evaluation by the main algorithms. The initiating algorithm means the algorithm used to generate the initial solution. Table 1 summarises the input values for all three test series.

A. Mixed tasks and general observations

In this case almost all tests were performed for 20000 iterations (except for a few with 5000 iterations) for all main algorithms on the same task set. For tests in which FF was the evaluating algorithm, which significantly increases the evaluation time, 5000 iterations were used. The aim was to keep all algorithms running for about 100 seconds. Each test was repeated 3 times and mean values are used below, unless specified otherwise. The evaluating function used was CE, which allowed the Usage Factor to be calculated. Several observations emerged from the analysis of the results:

1. On average, the SA algorithm was the best performer (fig. 3).

2. The main factor affecting the effectiveness of SA was the starting temperature. This is illustrated in fig. 3.

3. It is a good idea to use FF as an initialisation algorithm for the main algorithms, since the increase of the optimization time is marginal. This, however, does not apply to SA: the mean U value for SA (T0=300, swap neighbourhood) starting from a random permutation was 83.68%, and for the same settings but starting from the permutation generated by FF it was 82.61%. This is probably due to the fact that the FF algorithm can put the SA starting position in a wide local minimum that the algorithm is unable to leave.

4. For all algorithms, the swap neighbourhood gave better results than the insert neighbourhood (about a 3% difference on average in terms of the U factor).

5. For the TS algorithm, taboo lists with lengths around 1000 performed marginally better than lists with lengths around 100 (U was 79.8% and 78.8%, respectively).

6. If the FF algorithm is used for evaluation during the work of the main algorithms, it performs well but increases the processing time significantly. After lowering the iteration count it gives results comparable to the configurations with 20000 iterations and the DF algorithm.

Table 1. Input for all test series

Parameter                       Mixed tasks     Small tasks       Large tasks
p                               2÷12            2÷10              4÷12
q                               2÷12            2÷10              4÷12
s                               2÷12            2÷10              4÷12
w                               12              50                12
h                               12              50                12
t (task-list length when t=0)   1000            0 (1000 tasks)    1000
tested algorithms               SA, TS, RS      SA, TS, RS        SA, TS, RS
evaluating algorithms           DF, FF          DF, FF            DF, FF
initiating algorithms           DF, FF          DF, FF            DF, FF
evaluating function             CE              T                 CE
neighbourhood                   swap, insert    swap, insert      swap, insert

Fig. 3 shows the SA results for various starting temperatures for the best found configuration, i.e. a random starting permutation, DF as the evaluating algorithm, swap neighbourhood, a final temperature of 0.01 and the geometrical temperature profile. The chart in Fig. 3 shows that, for this configuration and problem instance, the best starting temperature of SA is around 300. The best value of the U factor achieved by the SA algorithm for 20000 iterations was 83.68%.


Fig. 3. Performance of the SA algorithm for different starting temperatures vs. other algorithms (Usage Factor U [%]; series: SA, FF, RS, TS)

Fig. 4. Values of current and best results for SA through iterations (small tasks)

The best algorithm outperformed the FF result in its 4522nd iteration. The values of the current and the best results for each iteration, taken from the result file of the best SA pass, are shown in fig. 4. The chart shows that in the beginning the current result tends to be lower than the best one, but then it starts to “stick” to the best result as, with the temperature fall, the algorithm acts more like Descending Search and less like Random Search. The point where the best and current results surpass the value found by FF is also visible.

Results obtained:

• best result: SA, swap, evaluation function DF, starting temp. 300, geometrical profile: U=83.68%,

• difference between the best result and FF: 10% of the FF’s result.

B. Small tasks

In this case we decided to use the second evaluating factor, T. It is less objective than the first one, but it still allows the algorithms to be compared and saves a considerable amount of experimentation time. This is because, for a large mesh and small tasks, a long list of tasks would have to be processed, which would make a series of experiments unreasonably long to conduct.

These experiments led to the conclusion that using metaheuristic algorithms for allocating small tasks is not needed and does not improve the system's efficiency. The tasks are small enough, in comparison to the size of the mesh, that the FF algorithm manages to fit a task from the list into almost every free submesh. Therefore, even metaheuristics based on FF's result cannot achieve any better outcome; an example plot is shown in fig. 5. Furthermore, due to the semi-random characteristics of the tested metaheuristics, those that started from a random solution and did not use FF for evaluation gave even worse results, e.g. SA, in such a case, gave a result of T=124.

Results obtained:

• best result: SA/TS/RS: T=98,

• difference between the best result and FF: 0.

Fig. 5. Values of current and best results for SA through iterations, for small tasks

C. Large tasks

In this case, the achieved results and the algorithms' behaviour were very similar to the general case of mixed tasks (we also used the same testing scheme). The result achieved by the best algorithm was even better, but only by a small margin. Also, as in the case of mixed tasks, the SA algorithm was the best performer and the same parameters as previously led to maximum performance.

Results obtained:

• best result: SA, swap, evaluation function DF, starting temp. 300, geometrical profile: U=83.98%,

• difference between the best result and FF: 13.3% of the FF’s result.


6. CONCLUSIONS AND PERSPECTIVES

This paper considered three metaheuristic local search algorithms adapted for the problem of task allocation on a processor mesh. An experimentation system has been developed. The experiments showed that, in general, local search metaheuristic algorithms perform well in solving the considered problem. Only for allocating small tasks on a large mesh is it needless to use these algorithms, since they do not achieve better results than the basic FF algorithm, which itself performs well due to the ease of fitting small tasks into free submeshes. On average, the best performer in all tests was the SA algorithm. It outperformed all other algorithms for mixed and large sized tasks. It also achieved reasonable results of over 83% of mesh usage. In the course of our work, it was noticed that it is very important to design the experimentation environment well. It should give as many options as possible and, at the same time, should allow the user to easily design whole series of experiments. Despite our conclusions, it cannot be said that the problem has been fully explored and researched. There are other possible combinations of problem instances and testing parameters to be tested with our software. What is more, it is still possible to construct a far more thorough and versatile testing environment and to implement more algorithms, e.g. a Genetic Algorithm.

REFERENCES

[1] GOH L. K. and VEERAVALLI B., Design and performance evaluation of combined first-fit task allocation and migration strategies in mesh multiprocessor systems. Parallel Computing, vol. 34, issue 9, September 2008, pp. 508-520.

[2] BUZBEE B. L., The efficiency of parallel processing. Frontiers of Supercomputing, Los Alamos 1983.

[3] GLOVER F., Tabu Search – part I. ORSA Journal on Computing, vol. 1, no. 3, Summer 1989.

[4] KIRKPATRICK S., GELATT C.D., VECCHI M.P., Optimization by Simulated Annealing. Science, New Series, vol. 220, no. 4598, May 13 1983, pp. 671-680.

[5] GRANVILLE V., KRIVANEK M., RASSON J.P., Simulated Annealing: a proof of convergence. Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, issue 6, June 1994, pp.652-656.

[6] VAN LAARHOVEN P.J.M., AARTS E.H.L., Simulated Annealing: theory and applications. Springer, 1987.

[7] GLOVER F., KOCHENBERGER G.A., Handbook of metaheuristics. Springer, 2002.


Computer Systems Engineering 2009

Keywords: mesh network, task allocation, artificial neural network

Rafał ŁYSIAK* Iwona POŹNIAK-KOSZAŁKA* Leszek KOSZAŁKA*

ARTIFICIAL NEURAL NETWORK FOR IMPROVEMENT OF TASK ALLOCATION IN MESH-CONNECTED PROCESSORS

An efficient allocation of processors to incoming tasks is especially important to achieve a high level of performance. A good allocation algorithm should identify a task and find the best position for it in the mesh. Depending on the type of the system, the algorithm should minimize the time necessary to find a place and keep the fragmentation at the lowest possible level. This paper shows the results of the Modified Best Fit Algorithm and the idea of using an Artificial Neural Network to optimize the task allocation time.

1. INTRODUCTION

Nowadays, multicomputer systems with many processors connected by high speed networks are powerful instruments for scientists all over the world. Among the many problems related to sharing them, task allocation in a two-dimensional (2D) mesh is one of the most popular. Efficient management of the system can increase the computational throughput and general performance. The requirement here is to allocate incoming tasks to submeshes of appropriate size in the 2D mesh-based system. The size of the submesh can range from one node to the entire mesh. The allocation scheme needs to maximize the processor utilization while minimizing the allocation time.

This paper presents the results of the Modified Best Fit Algorithm (MBF) and the general idea of using an Artificial Neural Network (ANN) to decrease the allocation time. The primary target of MBF was to achieve the highest possible level of allocation; the allocation time was a secondary target, to be addressed by the ANN. The MBF algorithm was compared with the First Fit algorithm.

2. DEFINITIONS AND NOTATIONS

In this section, we present definitions and useful notation that are used throughout the paper.

* Department of Systems and Computer Networks, Wroclaw University of Technology, Poland.


Definition 1. A mesh is a set of nodes representing the CPUs. A two-dimensional mesh network is denoted as M(W,H), where W is the width and H is the height of the mesh network, and where Wmax and Hmax are the maximum size of the mesh. An example of M(8,7) is shown in Fig. 1.

Definition 2. A node can be described by N(x,y), where x denotes the column number and y denotes the row number.

Fig. 1. The example mesh network

Definition 3. A submesh is a rectangular subset of the main mesh M. The submesh can be denoted as SM(x1,y1,x2,y2), where the pair x1,y1 defines the top left corner of the submesh and x2,y2 defines the bottom right corner. A busy submesh is a submesh with all nodes busy. A free submesh is a submesh where all nodes are free to use.

Definition 4. A task can be denoted as T(w,h,t), where w is the width, h is the height and t is the execution time. The knowledge of these 3 parameters can be used to allocate the task T in the mesh M.

Definition 5. Fragmentation is an index that expresses in percent [%] the efficiency of the allocation process. The fragmentation fM of mesh M is defined by

(1)

where W and H represent the width and height of the mesh M; P represents the number of nodes of the biggest free submesh (explained in Fig. 2); and the sum represents the number of all busy nodes.

221

Page 222: Polish-British Workshops Computer Systems Engineering Theory & Applications

Fig. 2. Mesh M(8,7) with fM=12.5%

Definition 6. Allocation time is the period between the appearance of a task in the system and its final allocation in the mesh M. It includes the time necessary to find the proper coordinates for the task.

Definition 7. Execution time is the time a task has to spend in the mesh M to be fully executed. After this period, the task releases the submesh, converting it into a free submesh ready to use.

Definition 8. Time horizon is the length of the simulation in the dynamic model, expressed in seconds or minutes.

3. PROBLEM DESCRIPTION

The mesh M(W,H) is given, where W and H are known. There is also a set of tasks T1(w1,h1), T2(w2,h2), …, Tn(wn,hn), where wn and hn are known. The main purpose of the allocation algorithm is to return the location for an incoming task. The location is represented by <xl,yl>, where xl (column number) and yl (row number) are the coordinates. The top left corner of the mesh is denoted as <0,0>.

The main criteria of the allocation process are: • minimize the fragmentation f,

• minimize the task allocation time.

The constraints are:
• a task, once allocated, cannot be moved,
• a task, once allocated, stays in its location until it is fully executed,
• tasks come into the system every second,
• if it is impossible to allocate a task, it is rejected and a failure is reported,
• tasks are allocated in the order of appearance.

A dynamic allocation model was used. A general overview of the allocation process is shown in Fig. 3.

Fig. 3. Dynamic allocation process scheme

4. MBF ALGORITHM

MBF is the Modified Best Fit Algorithm. The main target of the algorithm is to keep the fragmentation (1) as low as possible. MBF uses 3 indexes to find the best possible location.

Index #1, an, represents the number of neighbouring busy nodes. This is the same functionality as in the original Best Fit (BF) [1], but in MBF there is an additional refinement: MBF also checks the neighbours which are 2 nodes away, which brings better fragmentation results. This was introduced because it was noticed that submeshes with width equal to 1 or height equal to 1 are the main reason for higher fragmentation. An example is shown in Fig. 4a. In the original BF the “quality” of locations 1 and 2 is the same for the task T(2,3). In MBF location 1 is better than location 2, because of the free submesh SM(7,1,7,6) which remains in the case of location 2.

Index #2, bn, represents the ratio between the perimeter and the area of the busy submesh which would result from the allocation. It is better to keep this ratio at the lowest possible level. An example of the situation is shown in Fig. 4b, which compares the ratio obtained in location 1 with the one obtained in location 2.

223

Page 224: Polish-British Workshops Computer Systems Engineering Theory & Applications

Fig. 4a. Index #1 example

Fig. 4b. Index #2 example

It was noticed that a good strategy is to leave a free submesh whose size would be enough to allocate the most probable incoming task. The size of the most probable task was estimated from the tasks that had already appeared in the system.

Index #3, cn, represents the number of nodes which will be used from the submesh SMl, where SMl is the submesh that should stay free for use by the most probable incoming task. The next step is the normalization of the sets A, B and C. The process is given by

(2)

Now, it is possible to represent every possible location by

(3)

The final step is to find the minimal value of wi, defined by

(4)

where L represents the best possible coordinates according to indexes #1, #2 and #3. A hedged code sketch of this combination is given below.
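As an illustration of the scheme, the C++ sketch below assumes a simple min-max normalisation of the index sets A, B and C (one possible realisation of (2)) and an unweighted sum for wi in (3); it should be read as a plausible reading of the combination step, not as the authors' exact formulas.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Combine the three MBF indexes: normalise each index set to [0,1]
    // (ASSUMED min-max normalisation standing in for (2)), sum them per
    // candidate location (w_i, (3)) and return the argmin (L, (4)).
    std::size_t bestLocation(const std::vector<double>& a,
                             const std::vector<double>& b,
                             const std::vector<double>& c) {
        auto normalise = [](std::vector<double> v) {
            const auto mm = std::minmax_element(v.begin(), v.end());
            const double lo = *mm.first, hi = *mm.second;
            const double span = (hi > lo) ? (hi - lo) : 1.0;
            for (double& x : v) x = (x - lo) / span;
            return v;
        };
        const std::vector<double> an = normalise(a);
        const std::vector<double> bn = normalise(b);
        const std::vector<double> cn = normalise(c);

        std::size_t best = 0;
        double wBest = an[0] + bn[0] + cn[0];
        for (std::size_t i = 1; i < a.size(); ++i) {
            const double w = an[i] + bn[i] + cn[i];  // combined index w_i
            if (w < wBest) { wBest = w; best = i; }  // L = argmin w_i
        }
        return best;
    }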

5. EXPERIMENTATION ENVIRONMENT

The MBF algorithm was compared with the First Fit algorithm [2] using a Java application developed for this experiment. The application gives the opportunity to run the simulation using a predefined time horizon, average task size, average execution time and mesh size. Both algorithms ran under the same conditions, i.e. the time horizon and the task sets were the same.

Furthermore, the application gathers information about the allocation process and, when the simulation is finished, saves this data into a CSV file. CSV files are easy to use in MS Excel for the final analysis of the algorithms' performance.

The application was written in Eclipse using Java. An example screenshot is shown in Fig. 5.

Fig. 5. Screenshot from the application

6. COMPARISON OF MBF AND FF

Two experiments were run with different input values. In experiment #1 tasks were small and the execution time was long. In experiment #2 tasks were big and the execution time was short.

6.1. EXPERIMENT #1

The input values for experiment #1 are shown in Table 1.

Table 1. The input values for experiment #1

Parameter                       Value
Average task width              10 points
Average task height             10 points
Average task execution time     1000 ms

The results for experiment #1 are shown in graphs below.

225

Page 226: Polish-British Workshops Computer Systems Engineering Theory & Applications

Fig. 6a. Mesh fragmentation in experiment #1 for MBF

Fig. 6b. Mesh fragmentation in experiment #1 for FF

Fig. 6c. Allocation time in experiment #1 for MBF

Fig. 6d. Allocation time in experiment #1 for FF

In Fig. 6a and Fig. 6b the mesh fragmentation is shown for the MBF and FF algorithms. The values are expressed in percent [%]. The simulator checked the fragmentation every 1 second. In Fig. 6c and Fig. 6d the allocation time is shown. The values are expressed in milliseconds.

6.2. EXPERIMENT #2

The input values for experiment #2 are shown in Table 2.

Table 2. The input values for experiment #2

Parameter                       Value
Average task width              20 points
Average task height             20 points
Average task execution time     100 ms

226

Page 227: Polish-British Workshops Computer Systems Engineering Theory & Applications

The results for experiment #2 are shown in graphs below.

Fig. 7a. Mesh fragmentation in experiment #2 for MBF

Fig. 7b. Mesh fragmentation in experiment #2 for FF

Fig. 7c. Allocation time in experiment #2 for MBF

Fig. 7d. Allocation time in experiment #2 for FF

In Fig. 7a and Fig. 7b the mesh fragmentation is shown for the MBF and FF algorithms. The values are expressed in percent [%]. The simulator checked the fragmentation every 1 second. In Fig. 7c and Fig. 7d the allocation time is shown. The values are expressed in milliseconds.

7. ARTIFICIAL NEURAL NETWORK

As shown in Section 6, the MBF algorithm gave better results than FF in terms of fragmentation. The only problem was the allocation time: MBF was much slower than FF.

227

Page 228: Polish-British Workshops Computer Systems Engineering Theory & Applications

This paper presents the idea of using an Artificial Neural Network (ANN) [3] to decrease the allocation time of the MBF algorithm. To achieve this target, an ANN was created in a Matlab application. The topology and main configuration of the chosen ANN are shown in Table 3.

Table 3. Artificial Neural Network configuration

Parameter                          Value
Hidden layers number               1
Number of input layer neurons      27
Number of output layer neurons     25
Number of hidden layer neurons     10
Training function name             Trainlm* [4]
Activation function                Sigmoid

A 200-element set was used to train the ANN. The whole training process took about 2 minutes (for a 200-element training dataset; 25 inputs; 27 outputs). During training it is possible to watch the progress in a real-time graph, shown in Fig. 8.

Fig. 8. Training process graph

After the training, the ANN performance was 10^-3 on the training data and 10^-2 on other data, i.e. datasets that were not used during the training process.

8. CONCLUSIONS AND PERSPECTIVES

Results show that MBF is better than FF in both experiments in terms of the fragmentation level. In experiment #1, where tasks were small and the execution time was long, the fragmentation of the mesh was 10% lower using MBF. In experiment #2, where tasks were bigger and the execution time was short, the difference was even bigger, at about 20%. This shows that the performance of MBF is better than that of FF.

* trainlm is a training function from Matlab. It is very fast but needs a lot of memory.

The problem of MBF was the time necessary to find a location for an incoming task. In experiment #1, MBF was much slower than FF at the beginning of the simulation (when the mesh was empty). When the mesh was mostly busy, the allocation times of both algorithms were quite similar, but MBF was still slower. In experiment #2 MBF was much worse; the difference in allocation time was about 500 ms.

It was also shown that an ANN can be used to learn how to allocate tasks in a mesh-connected processor network using the MBF algorithm. It is also possible to try teaching the ANN to allocate tasks using other allocation algorithms. The main advantage of an ANN is its speed of operation. There is a chance that an ANN can be used to decrease the allocation time of the MBF algorithm remarkably.

In this paper, it was shown that an ANN can be taught how to allocate tasks in a mesh network. No simulations were run to check the real performance of such a solution. In our future work, we will try to use an ANN as an effective and fast way to allocate tasks in a mesh.

REFERENCES

[1] RUSSELL J. J., A Simulation of First and Best Fit Allocation Algorithms in a Modern Simulation Environment. Sixth Annual CCEC Symposium, 2008.

[2] TENENBAUM A., WIDDER E., A comparison of first-fit allocation strategies. ACM Annual Conference/Annual Meeting, 2000, pp. 875-883.

[3] HAYKIN S., Neural Networks: A Comprehensive Foundation, Prentice Hall, 1998.

[4] MATLAB Help.


Computer Systems Engineering 2009

Keywords: friction, limit cycles, rolling mill gauge control

Malgorzata SUMISLAWSKA∗ †

Peter J. REEVE∗

Keith J. BURNHAM∗

Iwona POZNIAK-KOSZALKA†

Gerald HEARNS‡

COMPUTER CONTROL ALGORITHM SIMULATION AND DEVELOPMENT WITH INDUSTRIAL APPLICATION

The paper addresses the prediction of limit cycles in a hot strip mill gauge control system. The effects of a strongly nonlinear friction force in the control loop are investigated. Use is made of the sinusoidal input describing function method in order to determine the limit cycle frequency and amplitude. The oscillations measured on the plant are reproduced by the model. Two methods are proposed in order to avoid the effects of the nonlinearity in the plant, namely, dither and friction compensation.

1. INTRODUCTION

Rolling is a process of shaping a metal piece by a reduction of its thickness. The metal is compressed by passing it between rollers rotating with the same velocity and in opposite directions. The final stage of the rolling process is a finishing mill, where the main goal is to maintain the exit gauge, i.e. the thickness of the steel strip emerging from the mill, within the tight specifications and to control it to a tolerance of 20 µm. The finishing mill consists of several stands, which consecutively reduce the thickness of the steel strip. Each of the finishing mill stands is controlled separately.

It has been observed that the exit gauge oscillates with a frequency of 0.36 Hz. It is believed that such a behaviour is due to limit cycles which are caused by a strongly nonlinear friction. Thus a model of a single finishing mill stand is developed. Then, limit cycles are predicted making use of a sinusoidal input describing function (SIDF)

∗Control Theory and Applications Centre, Coventry University, Coventry, UK
†Wroclaw University of Technology, Wroclaw, PL
‡Converteam, Rugby, UK


Fig. 1. Details of controlled plant

and the frequency domain results are compared with the outcomes of a simulation. In order to eliminate the oscillations, two methods are proposed and a simulation study is performed.

2. PLANT DETAILS

A schematic view of the plant is presented in Fig. 1. The steel strip remains constantly in contact with a pair of working rolls, which are supported by the backup rolls. The entry gauge of the strip is denoted as H, whilst h corresponds to the exit gauge. The hydraulic actuator at the top of the stack changes the position of the backup and work rolls, controlling the exit gauge [2, 4, 3].

The change of the exit thickness of the strip is dependent on the force acting on the metal piece Proll, further named the roll force, and the position of the hydraulic piston z [2, 3]:

Δh = Δz + Proll/M    (1)

where M is the mill sensitivity to force (mill modulus) [2]. A change of the roll force in the point of operation can be described by the following linear equation:

ΔProll = −QΔh + RΔH    (2)

where Q and R are the moduli of exit and entry gauge, respectively.


Harsh temperature conditions close to the rolling mill stand make a direct measurement of the exit gauge impossible [2, 4, 3], hence a need arises to estimate the exit gauge change from the measured value of the roll force and the mill modulus:

Δhe = Δz + C Pmeas/M    (3)

where he refers to the estimated exit gauge, M is the estimated value of the mill modulus and Pmeas corresponds to the measured value of the roll force. In order to improve the robustness of the control loop, a compensation variable C < 1 is introduced [2].
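As a small illustration, the gaugemeter estimate (3) translates directly into code (a trivial C++ sketch; the names are ours):

    // Gaugemeter estimate (3): exit gauge change from the piston position
    // change dz, the measured force pMeas, the estimated mill modulus M and
    // the compensation variable C < 1.
    double exitGaugeEstimate(double dz, double pMeas, double M, double C) {
        return dz + C * pMeas / M;
    }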

There are two possibilities to measure the roll force: making use of the load cell sensor at the bottom of the stack, or from the pressure in the hydraulic chamber (cf. Fig. 1). Due to economic reasons the latter method is utilised. However, the force measured from the hydraulic pressure is affected by the mill housing friction (Pfric):

Pmeas = Proll + Pfric    (4)

It is believed that the friction force, due to its nonlinear character, leads to limit cycles, and hence to the oscillatory behaviour of the plant.

3. PLANT MODELLING

3.1. STACK MODEL

The stack is modelled making use of a classical mass-spring-damper model (cf. Fig. 2). Due to the symmetrical construction of the stack, only the upper backup and work rolls are taken into consideration. In the further analysis the damper d1 is replaced by a friction model, which introduces a nonlinear dependency between the piston velocity and the friction force.

3.2. ACTUATOR MODEL

The hydraulic actuator model is presented in Fig. 3. The term Kp refers to the proportional position controller gain and defines the relation between the position error and the fluid flow q into the capsule; p corresponds to the capsule pressure acting on the piston of area Ap.

The dependency between the fluid flow into the capsule and the pressure acting on the piston area is represented by the following linear relation [5, 7, 9]:

p = Kc (∫ q dt − Ap l) / (Ap l)    (5)


Fig. 2. Mass-spring-damper representation of stack

Fig. 3. Model of hydraulic servo system

where l refers to the stroke length and the term Kc corresponds to the hydraulic oil compressibility coefficient. The force acting on the hydraulic piston is given by:

Fh = Ap p    (6)

The overall model of the rolling mill stand, containing the stack and actuator models, is presented in Fig. 4.

3.3. FRICTION MODEL

The friction is modelled as a sum of three components: a Coulomb friction, a viscous friction and a Stribeck friction (also named stiction):

Pfric = PC + PS + PV    (7)

where the term Pfric denotes the total frictional force, whilst PV, PC and PS refer to the viscous, Coulomb and Stribeck friction, respectively.


Fig. 4. Overall plant model

Coulomb friction is modelled as follows:

PC = −FC sign(ż)(1 − e^(−|ż/VC|))    (8)

The term FC denotes the Coulomb friction level, whilst the exponential term is introduced in order to avoid a zero-crossing discontinuity.

The element related to the viscous friction is modelled as a linear function of velocity:

PV = −mV ż    (9)

where the term mV is the viscous damping coefficient of the frictional force. The Stribeck friction (stiction) model is given by:

PS = −FS sign(ż)(1 − e^(−|ż/V1|)) e^(−|ż/V2|)    (10)

The term FS determines the magnitude of the static friction, whilst V1 and V2 are utilised to shape the stiction model.
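For illustration, the complete friction model (7)-(10), as reconstructed above, can be evaluated at a given piston velocity as in the following C++ sketch (the parameter grouping and names are ours; the negative exponents follow the reconstruction of (8) and (10)):

    #include <cmath>

    // Friction model (7)-(10): total friction as the sum of the Coulomb (8),
    // viscous (9) and Stribeck (10) components, evaluated at piston velocity v.
    struct FrictionModel {
        double FC, VC;      // Coulomb friction level and shaping velocity
        double FS, V1, V2;  // stiction magnitude and shaping velocities
        double mV;          // viscous damping coefficient

        double operator()(double v) const {
            const double sgn = (v > 0.0) - (v < 0.0);
            const double pC = -FC * sgn * (1.0 - std::exp(-std::fabs(v / VC)));
            const double pS = -FS * sgn * (1.0 - std::exp(-std::fabs(v / V1)))
                                        * std::exp(-std::fabs(v / V2));
            const double pV = -mV * v;
            return pC + pS + pV;  // P_fric = P_C + P_S + P_V, equation (7)
        }
    };

In the compensation scheme of Section 5.2, the output of such a model, evaluated at the measured piston velocity, is subtracted from the measured hydraulic force.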

4. INVESTIGATIONS

4.1. LIMIT CYCLE PREDICTION

The amplitude and frequency of the limit cycles are predicted using the sinusoidal input describing function (SIDF) method [1, 6]. The results of the frequency domain analysis are presented in Fig. 5. One can notice a strong dependence of the amplitude of the limit cycles on the strip modulus. The amplitude of the oscillations increases with an increase of


Fig. 5. Prediction of limit cycles; left: impact of compensation variable (C = 1, 0.9, 0.8), right: impact of strip modulus (Q = 300, 900, 2000 T/mm)

the strip sensitivity to force. The influence of the strip modulus on the frequency of the limit cycles is negligible. The compensation variable also has an impact on the amplitude of the limit cycles, whilst having no influence on the oscillation frequency.

4.2. SIMULATION STUDY

The results obtained in the frequency domain are confronted with the outcomes of the simulation study. Fig. 6(a) shows a strong dependence of the limit cycle amplitude on the compensation variable and the strip modulus. All the same, the influence of the above-mentioned parameters on the frequency of the oscillations is negligible.

The basic assumption of the SIDF technique is a sinusoidal input to the nonlinearity [1, 6, 8]. The simulation study shows that if the input to the nonlinear element strongly deviates from a sinusoid, the frequency and amplitude of the oscillations significantly differ from those obtained making use of the SIDF method (cf. Fig. 6(b)). Furthermore, a dependency between the friction model coefficients and the shape of the input to the nonlinearity is observed. This fact is utilised for the reproduction of the oscillations measured on the plant (cf. Fig. 7).

5. PROPOSED CONTROL

In order to suppress the limit cycles two solutions are proposed, namely, dither and a model based friction compensation.


(a) Influence of C and Q on limit cycle frequency and amplitude (simulated exit gauge [mm]; Q = 500, 900 T/mm; C = 0.9, 1)

(b) Dependence of limit cycle frequency and amplitude on shape of input to nonlinearity (non-sinusoidal vs. sinusoidal input to the nonlinearity)

Fig. 6. Time domain simulation of limit cycles

5.1. DITHER

The simulation study shows that adding a high frequency signal to the piston position reference signal results in the elimination of the limit cycles (cf. Fig. 8). The dither is in the form of a square wave with a frequency of 25 Hz (half of the servo-system controller sampling frequency) and an amplitude of 15 µm. Although the dither gives very promising results in simulation, it may be difficult to apply it on the real plant. Long-term vibrations of the hydraulic chamber may lead to increased wear and tear of mechanical parts. This would shorten the lifetime of the actuator and, consequently, raise the maintenance costs of the plant.
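A sketch of the dither injection follows; the 25 Hz frequency and 15 µm amplitude are taken from the text, whilst the sample-based generation and the function name are our own framing.

    #include <cmath>

    // Square-wave dither: frequency f = 25 Hz, amplitude A = 15e-6 m, added
    // to the piston position reference at time t (sketch only).
    double ditheredReference(double reference, double t,
                             double f = 25.0, double A = 15e-6) {
        const double phase = std::fmod(t * f, 1.0);    // position in period
        return reference + ((phase < 0.5) ? A : -A);   // square wave
    }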


Fig. 7. Reproduction of measured data (measured vs. simulated over time [s]): force [T], gap position [mm] and gaugemeter exit gauge [mm]

5.2. FRICTION COMPENSATION

Based on a friction model, the friction force is estimated making use of the piston velocity. The estimated friction force is then subtracted from the measured hydraulic force. Thus the approximated roll force, which is the difference between the measured force and the estimated friction, is the input to the gaugemeter.

Variable environmental conditions of the plant, such as the temperature, the properties of the lubricating agent and the wear of the contacting surfaces, make the friction difficult to estimate. Hence, the efficiency of a friction compensator in the presence of a model mismatch is investigated. The dependencies of the ‘true’ friction model and of the models used by the compensator on the piston velocity are presented in Fig. 9. One can notice a significant discrepancy between the friction acting on the piston and the models used for compensation.

Simulation results of the friction compensation are presented in Fig. 9. Although the friction model mismatch is significant, the compensator performs very well. Making use of a linear viscous friction model, the amplitude of the limit cycles is reduced ca. five times, whilst the second model (viscous plus Coulomb) virtually eliminates the unwanted oscillations.


Fig. 8. Elimination of limit cycles by applying dither (exit gauge [mm] and gap [mm]: transient response and steady state). Grey dashed line: before application of dither, black solid line: with dither

6. CONCLUSIONS

The rolling mill is modelled making use of the well-known mass-spring-damper representation. A nonlinear model of the friction is developed.

The sinusoidal input describing function (SIDF) technique is utilised in order to predict limit cycles. The investigation in the frequency domain shows a strong influence of the compensation variable and the strip modulus on the amplitude of the oscillations, but no impact on the limit cycle frequency. The predicted frequency of the oscillations is 0.9 Hz, whilst the frequency of the registered signals is 0.36 Hz.

The simulation study shows a significant influence of the shape of the input to the nonlinearity on the frequency of the limit cycles, which is not a surprise since the basic assumption of the SIDF method is a sinusoidal input to the nonlinear element [1, 6, 8]. The developed model is capable of reproducing the measured data if the input to the nonlinearity deviates from a sinusoid.

Two methods of limit cycle elimination are proposed: dither and model based friction compensation. A simulation of the former gives promising results; however, there is a difficulty in applying a dither in practice. Thus a need for a more sophisticated limit cycle suppression method arises. The friction compensation virtually eliminates the oscillations, even in the presence of a significant discrepancy between the friction acting on the hydraulic piston and the friction model used for compensation.


[Figure: left panel, friction [N] versus velocity [m/s] for the ‘true’ friction and the two compensation models; right panel, exit gauge [mm] versus time [s] with no compensation, model 1 and model 2.]

Fig. 9. Left: friction acting on the piston (denoted as ‘true’) and models used for compensation. Right: simulation of friction compensation

REFERENCES

[1] ATHERTON D. P., Nonlinear control engineering. Van Nostrand Reinhold Company Ltd., Wokingham, 1975.

[2] ALSTOM Power Conversion, Gaugemeter control, 2003.

[3] YILDIZ S. K. et al., Dynamic modelling and simulation of a hot strip finishing mill. Applied Mathematical Modelling, vol. 33, pp. 3208–3225, 2009.

[4] HEARNS G., Hot Strip Mill Gauge Control: Part 1. Converteam, 2009.

[5] JELALI M. and KROLL A., Hydraulic servo-systems: modelling, identification and control. Springer-Verlag, London, 2003.

[6] KHALIL H. K., Nonlinear systems. Prentice Hall, Upper Saddle River, New Jersey, 3rd edition, 2002.

[7] MERRITT H., Hydraulic Control Systems. John Wiley and Sons Ltd., New York, 1967.

[8] VAN DE VEGTE J., Feedback control systems. Prentice Hall Inc., Englewood Cliffs, New Jersey, 3rd edition, 1994.

[9] VIERSMA T. J., Analysis, Synthesis and Design of Hydraulic Servosystems and Pipelines. Elsevier Scientific Publishing Company, New York, 1980.


Computer Systems Engineering 2009

Keywords: nonlinear systems, model based predictive control, PI controller, HVAC, reduced energy consumption

Ivan ZAJIC†

Keith J. BURNHAM†

Tomasz LARKOWSKI†

Dean HILL‡

DEHUMIDIFICATION UNIT CONTROL OPTIMISATION

The paper focuses on heating ventilation and air conditioning (HVAC) systems dedicated to clean room production plants. The aim is to increase the energy efficiency of HVAC systems via control parameter optimisation. There is much scope for improvement within the humidity control, where the dehumidification units (DU) are employed. The current control of the DU utilises a proportional plus integral (PI) controller, which is sufficient for maintaining the specified levels of humidity. However, since the dehumidification process exhibits non-linear characteristics and large transport delays are also present, the tuning of the PI controller is a non-trivial task. The research focus is on applying a control optimisation technique based on a model predictive controller in order to achieve tight specifications and energy efficient control performance.

1. INTRODUCTION

Abbott Diabetes Care (ADC) UK, an industrial collaborator of the Control Theory and Applications Centre, develops and manufactures glucose and ketone test strips, which are designed to help people with diabetes. One of the production quality requirements is that the environmental conditions during production are stable, where the air relative humidity has to be lower than 20% and the corresponding temperature is 20.5 ± 2 °C.

Heating ventilation and air conditioning (HVAC) systems are commonly used to maintain environmental conditions in industrial (and office) buildings. The HVAC system provides the manufacturing areas with conditioned fresh air such that the air temperature and the relative humidity are regulated within specified limits. In some cases the air CO2 level is also required to be regulated; however, this is not the case here. HVAC systems are highly energy demanding: in ADC UK alone the estimated annual energy expenditure for 2009 is £3 million. Increased energy costs and awareness of environmental issues, such as CO2 emissions, prompt the need for increased energy efficiency of these systems. It is estimated in [4] that 15% of the overall energy usage of HVAC can be saved by good control.

† Control Theory and Applications Centre, Coventry University, Coventry, UK
‡ Abbott Diabetes Care, Witney, Oxfordshire, UK

The paper is focused on increasing the energy efficiency of the HVAC systems located in ADC UK via control parameter optimisation. There are around 70 HVAC systems in ADC UK, of which only one is chosen for testing and optimisation purposes. The typical HVAC plant comprises two basic components, i.e. the dehumidification unit (DU) and the air handling unit (AHU). Both units utilise a proportional plus integral (PI) controller, for adjusting the gas valve and the cooling/heating valve, respectively. To date, the largest scope for improvement has been found within the humidity control, where the DU is employed, hence the research focus is narrowed to control optimisation of the DU. The optimisation of the temperature control is treated here as further work. Some work on PI gain tuning and PI controller enhancement has already been done in [8], where a cost function minimisation technique based on the derived dehumidification process model is employed.

A tuning method which utilises an unconstrained model predictive controller (MPC) is applied here and the results are compared with those from [8]. This method provides a more intuitive way of PI gain tuning, whilst the problem of cost function selection is avoided. Since the MPC is based on quadratic cost function minimisation and can guarantee an optimal performance at each successive time instance, the appropriate PI gains can be assigned online, i.e. gain scheduling. Moreover, physical constraints such as the valve modulation rate, the valve operational limits and the production environment limits can be inherently implemented within a constrained MPC.

2. PLANT DESCRIPTION

The chosen production area is a room with the designation CCSS2. The schematic of the HVAC plant is given in Figure 1. Starting at the point where the return air is extracted by suction from the manufacturing area (environmentally controlled room), denoted 1, the air is passed through the main duct to the mixing section, denoted 2. The return air is mixed with the fresh air from the fresh air plant and progresses to the DU, where the mixed air is dehumidified. The dehumidified air then progresses from section 3 to the AHU, where the air is heated or cooled depending on the operating requirements. The conditioned air then continues to section 4, where a pre-configured amount of air is conducted to the air lock and back to the manufacturing area. A detailed description of the DU functionality follows in the subsequent section and a description of the other HVAC components can be found in e.g. [1].

[Figure: schematic of the HVAC plant showing the room, mixing section, dehumidification unit, air handling unit, air lock, damper, fresh air and exhaust paths, the two PI controllers and the measured temperature and humidity signals, with duct sections denoted 1–4.]

Fig. 1. Schematic diagram of the HVAC system.

2.1. DEHUMIDIFICATION UNIT DESCRIPTION

The DU comprises a large wheel with a honeycomb structure coated with a moisture absorbent desiccant, in this case silica gel. The wheel rotates with a constant angular velocity of 0.17 rpm. The process air is driven through the lower part of the wheel, which is approximately 3/4 of its overall surface. The silica gel absorbs the moisture from the air; however, this process is exothermal and the air is warmed as well. Consequently, to remove the absorbed water from the silica gel, the inverse process has to be applied, hence heat needs to be provided. The hot outdoor air is blown through the upper part of the dehumidification wheel, approximately 1/4 of its surface. The outdoor air is heated with a gas burner; as the hot outside air is driven through the dehumidification wheel it dries the silica gel and then goes to the exhaust.

The more heated the outdoor air is, the more absorbed water is removed from the silica gel. The gas burner is coupled with the gas valve, so that by adjusting the gas valve position the temperature of the outdoor air blown through the dehumidification wheel can be regulated. Consequently, the capability of the DU to absorb the water vapour from the process air is modulated as well. Due to the cross-coupling between air temperature (dry bulb temperature) and relative humidity, the humidity level within the manufacturing area is measured in terms of the dew point temperature.

3. DEHUMIDIFICATION PROCESS IDENTIFICATION

The modelling of the relatively complex MIMO HVAC system has been reduced here to consideration of the dehumidification process only, since this process has the greatest influence on the energy consumption. The humidity control system in the closed-loop (CL) setup is depicted in Figure 2, where P denotes the dehumidification process and D denotes the dynamics of disturbances caused by personnel within the clean room production area. The signal ȳ(t) = y(t) + h(t) denotes the measured system output (humidity measured in terms of the dew point temperature).

[Figure: closed-loop block diagram: the error e(t) = r(t) − ȳ(t) drives the PI controller, whose output u1(t) enters the process P together with the second input u2(t); the load disturbance h(t), generated through the disturbance dynamics D, is added to the process output y(t) to give the measured output ȳ(t).]

Fig. 2. Schematic of the closed loop system with existing PI controller.

The signal u1(t) denotes the control action, u2(t) is the (measurable) second input, being the outdoor air relative humidity, r(t) is the set-point, e(t) denotes the error signal and h(t) denotes an assumed (not measured) load disturbance. Since u2(t) has a direct undesirable impact on the humidity level within the production area, the relative humidity of the outdoor air can also be interpreted as a disturbance.

For system identification purposes, assume a single-input single-output discrete-time system expressed in difference equation form, i.e.

y(t) + a_1(t)y(t-1) + \dots + a_n(t)y(t-n) = b_1(t)u(t-1) + \dots + b_m(t)u(t-m), \quad (1)

where t denotes the discrete time index. The state dependent parameters are denoted a_i(t), i = 1, …, n, and b_i(t), i = 1, …, m, respectively. It is assumed that the individual state dependent parameters are functions of one of the variables in a state variable vector x(t), see e.g. [7]. The pure time delay, denoted d and given in sampling intervals, is introduced such that the appropriate number of leading b_i(t) parameters are zero, i.e. b_i(t) = 0 for i = 1, …, d−1.

The dehumidification process P is modelled as a first order system with d = 5 samples, i.e.

y(t) = -a_1(t)y(t-1) + b_5(t)u_1(t-d) + o + \xi(t). \quad (2)

The individual state dependent parameters are expressed as

a_1(t) = \alpha_1 - \eta_1 u_1(t-d), \quad (3a)
b_5(t) = \eta_2 u_2(t-d) + \alpha_2 \quad (3b)

and ξ(t) denotes coloured noise given by

\xi(t) = e_1(t) + c_1 e_1(t-1). \quad (4)

The constant offset is denoted o, and e_1(t) is a white zero-mean process noise with variance \sigma_1^2. Note that the coefficients η1 and η2 can be interpreted as bilinear terms.

The load disturbance h(t) is assumed to be caused by personnel in the production area. However, there are also other disturbances which are not modelled here, e.g. moisture from cleaning floors and work surfaces and moisture produced by machinery. For tuning purposes the signal h(t) is assumed to have a staircase shape. A single stair represents a single person, who generates moisture corresponding to a 2 °C increase of the dew point temperature. Two stairs represent two persons within the room; the maximum number of people within the room at the same time is 2. The width of a stair represents the time of occupancy (time duration). It is expected that D can be described by a first order process

D(s) = \frac{K}{s + a}, \quad (5)

where the constant a (pole) is inversely proportional to the settling time, denoted T_set, i.e. a = 4/T_set. The settling time is chosen to be T_set = 5 min. The gain, denoted K, is chosen to be unity.
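A small Python sketch of this disturbance model, under the stated choices T_set = 5 min and K = 1 and an assumed sampling time of 32 s (the value used later for the re-sampled data), could be as follows; the helper names and occupancy intervals are illustrative.

```python
import numpy as np

# Disturbance filter D(s) = K / (s + a), with a = 4 / Tset and K = 1
Tset = 5 * 60.0          # settling time [s]
a = 4.0 / Tset           # pole [1/s]
K = 1.0                  # gain
Ts = 32.0                # assumed sampling time [s]

# Zero-order-hold discretisation of the first-order lag
phi = np.exp(-a * Ts)               # discrete pole
gamma = (K / a) * (1.0 - phi)       # discrete input gain

def staircase(n, entries):
    """Staircase occupancy signal: each person adds a 2 degC dew-point step.

    entries : list of (start_sample, stop_sample) occupancy intervals
    """
    h = np.zeros(n)
    for start, stop in entries:
        h[start:stop] += 2.0        # one person ~ 2 degC dew-point increase
    return h

def filter_disturbance(h):
    """Pass the staircase through the discretised D(s)."""
    x, out = 0.0, np.empty_like(h)
    for k, hk in enumerate(h):
        x = phi * x + gamma * hk
        out[k] = x
    return out

# e.g. one person present for samples 100-400, a second for samples 200-300
h_filtered = filter_disturbance(staircase(1000, [(100, 400), (200, 300)]))
```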

3.1. PARAMETER ESTIMATION

The chosen production area had no personnel (h(t) = 0) nor operating machinery inside at the time of data acquisition. Consequently, the measured output is the case ȳ(t) ≈ y(t) and the dehumidification process can be directly estimated. The HVAC system was operating in an open-loop (OL) setup, which allows OL estimation techniques to be used, such as the considered method of extended least squares (ELS), see [3]. In normal practice, however, the production process requires that the specified humidity level is maintained at all times, hence only CL estimation techniques can then be considered.

Two data sets are collected from the clean room production area. Whilst the first data set, collected on 27/05/2009, is used for model estimation, the second data set, collected on 28/05/2009, is used for model validation. Both data sets contain the signals y(t), u1(t) and u2(t), which were acquired with a sampling time of 1 s. The estimation data set, comprising 35,000 data samples, is plotted in Figure 3. The measured signals were re-sampled with a new sampling time Ts = 32 s and the corresponding time delay of 5 samples has been estimated, i.e. d = 5. The coefficients of model (2) have been estimated by adopting the method of ELS. The results are given in Table 1.

Table 1. Estimated coefficients of dehumidification process model.

α1: −1.018   α2: −0.034   η1: 6.728 × 10−6   η2: −0.001   o: 0.838   c1: −4.930 × 10−6
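For illustration, a simplified ELS estimator for this model can be sketched as below; substituting (3a)-(3b) into (2) gives a regression that is linear in (α1, η1, α2, η2, o, c1), and the lagged residuals used for c1 are refined iteratively. The function name and iteration count are assumptions, and y, u1, u2 are taken to be the re-sampled records with Ts = 32 s as NumPy arrays.

```python
import numpy as np

def els_estimate(y, u1, u2, d=5, n_iter=10):
    """Extended least squares for the state-dependent model (2)-(4).

    Regression form obtained by substituting (3a)-(3b) into (2):
      y(t) = -alpha1*y(t-1) + eta1*u1(t-d)*y(t-1)
             + alpha2*u1(t-d) + eta2*u2(t-d)*u1(t-d) + o
             + e(t) + c1*e(t-1)
    Returns (alpha1, alpha2, eta1, eta2, o, c1).
    """
    t = np.arange(d, len(y))
    e_hat = np.zeros(len(y))                  # residuals, refined iteratively
    for _ in range(n_iter):
        phi = np.column_stack([
            -y[t - 1],                        # alpha1
            u1[t - d] * y[t - 1],             # eta1 (bilinear term)
            u1[t - d],                        # alpha2
            u2[t - d] * u1[t - d],            # eta2 (bilinear term)
            np.ones(len(t)),                  # offset o
            e_hat[t - 1],                     # c1 (noise model)
        ])
        theta, *_ = np.linalg.lstsq(phi, y[t], rcond=None)
        e_hat[t] = y[t] - phi @ theta         # update residual estimates
    alpha1, eta1, alpha2, eta2, o, c1 = theta
    return alpha1, alpha2, eta1, eta2, o, c1
```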

The dehumidification process model (2) is simulated in the OL setting using the inputs u1(t) and u2(t) acquired on 27/05/2009 and 28/05/2009, respectively. The fit (given in [%]) between the simulated system outputs and the measured outputs is assessed by

\mathrm{fit} = \left( 1 - \frac{\| y_s(t) - y(t) \|_2}{\| y(t) - E[y(t)] \|_2} \right) \times 100, \quad (6)



where E[·] denotes the mathematical expectation and y_s(t) is the simulated system output. The model fit is then 97.22% for the estimation data set and 85.82% for the validation data set. Note that the parsimonious model structure (2) assures a relatively high model fit and is suitable for the assumed MPC.

[Figure: three stacked plots of y(t) [°C], u1(t) [%] and u2(t) [%] against samples.]

Fig. 3. The estimation data set acquired on 27/05/2009 in OL setting with Ts = 1 s.
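The fit measure (6) translates directly into a short helper; a minimal sketch (function name assumed) is:

```python
import numpy as np

def model_fit(y_sim, y_meas):
    """Percentage fit (6) between simulated and measured outputs (2-norm)."""
    return (1.0 - np.linalg.norm(y_sim - y_meas)
            / np.linalg.norm(y_meas - np.mean(y_meas))) * 100.0
```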

4. CONTROL OPTIMISATION

The optimisation scheme is based on the unconstrained non-minimal state space MPC, see [6], and consists of two stages. Firstly, the modelled process P is simulated in the CL setting with the MPC controller. During such a simulation the load disturbances caused by personnel are imposed. The tracking error, i.e. e(t) = r(t) − ȳ(t), is acquired together with the corresponding optimal control action computed by the MPC controller. Secondly, the optimal PI gains are found such that the squared error between the control action obtained by the MPC controller in the first stage and the control action obtained by the considered PI controller is minimised. Thus the tuning task reduces from tuning of the PI controller to tuning of the MPC controller.

4.1. MODEL PREDICTIVE CONTROLLER

The MPC design is based on the mathematical model of the plant, which is assumed to be a non-minimal state space model. Defining the state variable vector as

x(t) = \begin{bmatrix} \Delta y(t) & \Delta y(t-1) & \cdots & \Delta y(t-n+1) & \Delta u(t-1) & \cdots & \Delta u(t-m+1) & y(t) \end{bmatrix}^T, \quad (7)

where ∆ is the differencing operator defined as ∆ = 1 − q⁻¹ and q⁻¹ is the backward shift operator defined as q⁻¹y(t) = y(t − 1), and u(t) denotes the controllable input, i.e. u(t) = u1(t). Note that the states in such a state variable vector are current and past measurements of the system output and past measurements of the system input, hence state estimation is avoided here. The corresponding non-minimal state space model is then defined as follows

x(t+1) = Ax(t) + B\Delta u(t) \quad (8)

and the output equation is given by

y(t) = Cx(t), \quad (9)

where the state transition matrix A, input vector B and output vector C are defined as

A = \begin{bmatrix}
-a_1 & -a_2 & \cdots & -a_n & b_2 & b_3 & \cdots & b_m & 0 \\
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \ddots & & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & \ddots & & 0 & 0 \\
0 & 0 & \cdots & 0 & 0 & 0 & 1 & 0 & 0 \\
-a_1 & -a_2 & \cdots & -a_n & b_2 & b_3 & \cdots & b_m & 1
\end{bmatrix}, \quad (10)

B = \begin{bmatrix} b_1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 \end{bmatrix}^T, \quad (11)

C = \begin{bmatrix} 0 & \cdots & 0 & 0 & 0 & 0 & \cdots & 0 & 1 \end{bmatrix}. \quad (12)

Note that the time indices of the state variable parameters are relaxed here, i.e. a_1(t+1) = a_1. Since the MPC controller predicts the future values of the system output, y(t+1), …, y(t+Hp), at time t based on the mathematical model of the plant, the future values of the state variable parameters, a_1(t+1), …, a_1(t+Hp), are also required. However, these are in general unavailable and for simplicity are considered here to be constant over the prediction horizon, hence a_1(t) ≈ a_1(t+1), …, a_1(t+Hp).

The aim of the MPC is to minimise the variance of the future error between the system output and the set-point by predicting the output and knowing the set-point in advance. Hence, minimising the cost function, see [5], i.e.

J_{MPC} = (Y - R)^T (Y - R) + \Delta U^T \lambda I \Delta U, \quad (13)

with respect to current and future values of the differenced control action yields the desired control action. The vector of future set-points (or reference signal) is defined as

R^T = \bar{R} r(t) \quad (14)

where \bar{R} is the 1 × Hp unit vector given by

\bar{R} = \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}. \quad (15)

The vector of the predicted system outputs is given by

Y = \begin{bmatrix} y(t+1|t) & y(t+2|t) & \cdots & y(t+H_p|t) \end{bmatrix}^T, \quad (16)

where y(t+j|t), j = 1, …, Hp, denotes the predicted system output based on the information up to and including time t, and the vector of incremental control actions is defined as

\Delta U = \begin{bmatrix} \Delta u(t) & \Delta u(t+1) & \cdots & \Delta u(t+H_c-1) \end{bmatrix}^T. \quad (17)

The user specified tuning parameters are the prediction horizon Hp, the control horizon Hc ≤ Hp and the scalar cost weighting parameter λ. The analytical solution of the cost function minimisation is obtained by setting

\frac{\partial J_{MPC}}{\partial \Delta U} = 0 \quad (18)


and rearranging the solution with respect to ∆U leads to the (unconstrained) MPC algorithm

\Delta U = \left[ \Phi^T \Phi + \lambda I \right]^{-1} \Phi^T \left[ R - F x(t) \right], \quad (19)

where only the first element of ∆U is applied to the plant, hence

u_{MPC}(t) = u_{MPC}(t-1) + \Delta u(t). \quad (20)

The matrices F and Φ are defined as

F = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^{H_p} \end{bmatrix} \quad (21)

and

\Phi = \begin{bmatrix} CB & 0 & \cdots & 0 \\ CAB & CB & \cdots & 0 \\ \vdots & & \ddots & \\ CA^{H_p-1}B & CA^{H_p-2}B & \cdots & CA^{H_p-H_c}B \end{bmatrix}. \quad (22)
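The construction of F (21), Φ (22) and the unconstrained solution (19)-(20) can be sketched compactly in Python as below; the helper name is an assumption, and C and B are taken as 1-D NumPy arrays so that the entries C A^(j−i) B evaluate to scalars.

```python
import numpy as np

def mpc_gain(A, B, C, Hp, Hc, lam):
    """Build F (21) and Phi (22), and return a function evaluating the
    unconstrained MPC law (19) for a given state x and set-point r."""
    F = np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(1, Hp + 1)])
    Phi = np.zeros((Hp, Hc))
    for j in range(Hp):
        for i in range(min(j + 1, Hc)):
            # lower-triangular Toeplitz-like entries C A^(j-i) B
            Phi[j, i] = C @ np.linalg.matrix_power(A, j - i) @ B
    # K = (Phi' Phi + lam I)^-1 Phi'
    K = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Hc), Phi.T)

    def delta_u(x, r):
        """First element of Delta-U from (19); applied via (20)."""
        R = np.full(Hp, r)
        return (K @ (R - F @ x))[0]

    return delta_u
```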

4.2. PI GAIN OPTIMISATION

The optimisation scheme is based on a so-called control signal matching method [2]. In this case the PI gains are chosen to yield a close match between the PI and MPC control signals, so that the tuning task of finding a pair of PI gains moves towards tuning of the MPC controller instead, which, in this case, is more straightforward.

The PI controller algorithm is given by

u_{PI}(t) = K_p e(t) + \frac{K_p T_s}{T_I} \sum_{i=1}^{t} e(i), \quad (23)

where the only unknown parameters in equation (23) are the proportional gain K_p and the integral time T_I, hence the PI control gains. In order to obtain these, the following cost function is minimised

J_{PI} = \sum_{t=1}^{N} \left[ u_{PI}(t \,|\, K_p, T_I) - u_{MPC}(t) \right]^2, \quad (24)

with respect to K_p and T_I.
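A minimal sketch of this matching step, assuming the Nelder-Mead simplex routine from SciPy and hypothetical function names and starting values, is given below; e and u_mpc are the records acquired from the CL simulation with the MPC.

```python
import numpy as np
from scipy.optimize import minimize

def match_pi_gains(e, u_mpc, Ts, x0=(-1.0, 20.0)):
    """Fit (Kp, TI) so that the PI signal (23) matches the MPC signal, cf. (24).

    e     : tracking error from the CL simulation with the MPC
    u_mpc : corresponding optimal MPC control action
    Ts    : sampling time [s]; TI is optimised in minutes here
    """
    cum_e = np.cumsum(e)                       # running sum of errors in (23)

    def u_pi(params):
        Kp, TI_min = params
        return Kp * e + Kp * Ts / (TI_min * 60.0) * cum_e

    def J(params):                             # cost function (24)
        return np.sum((u_pi(params) - u_mpc) ** 2)

    res = minimize(J, x0, method="Nelder-Mead")
    return res.x                               # (Kp, TI in minutes)
```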


5. SIMULATION STUDY

The environmental conditions are such that the air relative humidity has to be lower than 20% and the corresponding temperature is 20.5 ± 2 °C. Considering the safety margins on the relative humidity, the targeted relative humidity in the manufacturing area is 10% and the corresponding set-point is then r(t) = −11 °C (measured in terms of the dew point temperature).

The main goals of the PI controller, in the considered case, are, firstly, to maintain the humidity level at the required set-point and, secondly, to reject the load disturbances. Since the load disturbances, i.e. personnel in the production area, may cause violation of the relative humidity limit (20%), the PI controller is tuned such that fast disturbance rejection is achieved.

5.1. SIMULATION SETUP

The MPC controller (19) is simulated in the CL setting with the humidity process model (2). During such a simulation the disturbances are introduced. The disturbance signal h is shown in the bottom part of Figure 4 and the filter D (5) has been discretised using a zero order hold with sampling time Ts = 32 s. The tracking error e(t) and the corresponding optimal control action u_MPC(t), i.e. u1(t), are acquired during such a simulation run. The setting of the MPC controller is chosen such that Hp = 100, Hc = 1 and λ = 2, which corresponds to a rather active setting. For completeness, the state variable vector and the corresponding triplet A, B, C are shown, hence

x(t) = \begin{bmatrix} \Delta y(t) & \Delta u_1(t-1) & \Delta u_1(t-2) & \Delta u_1(t-3) & \Delta u_1(t-4) & y(t) \end{bmatrix}^T \quad (25)

and the matrices A, B and C are

A = \begin{bmatrix}
-a_1 & 0 & 0 & 0 & b_5 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
-a_1 & 0 & 0 & 0 & b_5 & 1
\end{bmatrix}, \quad (26)

B = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}^T, \quad (27)

C = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \quad (28)
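The closed-loop acquisition run of Section 5.1 can then be sketched as below, reusing the mpc_gain helper sketched in Section 4.1; the a1 and b5 values here are illustrative stable constants rather than the identified state-dependent parameters, and the disturbance injection through D is omitted for brevity.

```python
import numpy as np

# Illustrative frozen parameters (NOT the identified values): a stable pole
# and a small input gain, purely to exercise the machinery.
a1, b5 = -0.9, 0.05
A = np.array([[-a1, 0, 0, 0, b5, 0],
              [  0, 0, 0, 0,  0, 0],
              [  0, 1, 0, 0,  0, 0],
              [  0, 0, 1, 0,  0, 0],
              [  0, 0, 0, 1,  0, 0],
              [-a1, 0, 0, 0, b5, 1]], dtype=float)
B = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])
C = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

delta_u = mpc_gain(A, B, C, Hp=100, Hc=1, lam=2.0)  # sketch from Section 4.1

r = -11.0                       # dew-point set-point [degC]
x = np.zeros(6)                 # NMSS state (25)
u, e_log, u_log = 0.0, [], []
for k in range(2500):
    du = delta_u(x, r)          # optimal increment from (19)
    u += du                     # receding-horizon update (20)
    x = A @ x + B * du          # NMSS propagation (8)
    e_log.append(r - C @ x)     # tracking error for the PI matching stage
    u_log.append(u)
```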


5.2. SIMULATION RESULTS

The optimised PI gains based on the proposed optimisation procedure are

K_p = -3.03 \; (-4.69), \qquad T_I = 23.52 \; (27.89) \,\mathrm{min}, \quad (29)

where the results given in brackets are those from [8]. Consequently, both pairs of PI gains (29) were implemented in ADC UK, yielding similar responses. It is assumed that this is due to plant insensitivity to the precise values of the PI gains, caused by e.g. gas valve stiction, sensor resolution, etc. The graphical results of the optimisation procedure are shown in Figure 4, where u_MPC(t), obtained from the MPC, and u_PI(t), obtained via the optimisation procedure, are shown. It is evident that the PI controller cannot achieve the same performance as the MPC.

[Figure: top panel, control actions u1(t) [%] (u_MPC and u_PI) against samples; bottom panel, disturbance h(t) [°C] against samples.]

Fig. 4. Comparison of the control action computed utilising the unconstrained MPC, denoted u_MPC, and the estimated control action computed by the PI, denoted u_PI, where the PI gains are optimised based on the u_MPC signal (Ts = 32 s).

6. CONCLUSIONS

The PI controller which maintains the humidity level within the production area has been optimised. The optimisation procedure is based on a control signal matching method, where the PI gains are chosen such that the control signal from the PI controller matches the control signal from the model predictive controller as closely as possible. Since no measurement noise is present during such an optimisation procedure, a non-minimal state space model based controller has been chosen, where the states are current and past measurements of the system output and past measurements of the system input. The optimised PI gains were compared to those obtained by a cost function minimisation technique based on the identified state dependent model of the dehumidification process. Consequently, the two pairs of PI gains were applied in Abbott Diabetes Care, yielding indistinguishable plant behaviour. It is anticipated that this is caused by plant insensitivity to the exact PI gains.

REFERENCES

[1] HILL D., DANNE T., BURNHAM K. J., Modelling and control optimisation of desiccant rotor dehumidification plant within the heating ventilation and air conditioning systems of a medical device manufacturer. Proceedings of the International Conference on Systems Engineering ICSE, 2009, pp. 207–212.

[2] JOHNSON M. A., MORADI M. H., PID control: New identification and design methods. Springer-Verlag, London, 2005.



[3] LJUNG L., System Identification - Theory for the user. Prentice Hall PTR, New Jersey, 1999.

[4] LEVENMORE G., Building control systems - CIBSE Guide H. Oxford, UK, 2000.

[5] WANG L., Model predictive control system design and implementation using Matlab. Springer, 2009.

[6] WANG L., YOUNG P. C., An improved structure for model predictive control using non-minimal state space realisation. Journal of Process Control, vol. 16, 2006, pp. 355–371.

[7] YOUNG P. C., MCKENNA P., BRUUN J., Identification of non-linear stochastic systems by state dependent parameter estimation. Int. J. Control, vol. 74(18), 2001, pp. 1837–1857.

[8] ZAJIC I., LARKOWSKI T., HILL D., BURNHAM K. J., Nonlinear compensator design for HVAC systems: PI control strategy. Proceedings of the International Conference on Systems Engineering ICSE, 2009, pp. 580–584.
