
Parallel tabu search algorithm for constrained economic dispatch

W. Ongsakul, S. Dechanupaprittha and I. Ngamroo

Abstract: A parallel tabu search (PTS) algorithm is proposed for solving ramp rate constrained economic dispatch (CED) problems for generating units with non-monotonically and monotonically increasing incremental cost (IC) functions. To parallelise the tabu search (TS) algorithm efficiently, neighbourhood decomposition is used to balance the computing load, whereas competitive selection is used to update the best solution reached among subneighbourhoods. The proposed PTS is implemented on a 32-processor Beowulf cluster with an Ethernet switching network for generating unit systems of 10–80 units over the entire dispatch period. By varying the subneighbourhood size, the proposed PTS trades off experimental speedup against solution quality to obtain the best performance. PTS is potentially viable for online implementation of CED because of its substantial generator fuel cost savings and high speedup upper bounds.

1 Introduction

Recently, many parallel tabu search (PTS) algorithms for power system applications have been implemented on parallel machines. For example, Mori et al. proposed PTS for solving several combinatorial power system optimisation problems, including unit commitment, capacitor placement in distribution systems, reconfiguration of distribution systems and optimal allocation of FACTS devices [1–4]. To parallelise TS, two strategies were used: neighbourhood decomposition into two or three species, and multiple tabu lengths. These PTS algorithms were implemented on Fujitsu S-7/7000U parallel computers with up to nine processors. However, there was no solution exchange among processors during the parallel process.

Constrained economic dispatch (CED) determines the optimal schedule of online generating units so as to meet the load demand at minimum operating cost under various system and generator operational constraints. When the incremental cost (IC) functions of the generating units are non-monotonically increasing, the conventional ‘equal lambda’ method cannot be used to obtain the exact optimal solution [5]. Accordingly, with the advent of heuristic techniques, evolutionary programming (EP) [6], simulated annealing (SA) [7] and the genetic algorithm (GA) [8] have been used to solve ED problems and obtain optimal or near-optimal solutions.

In this paper, PTS is proposed to solve CED problems for generating units with non-monotonically and monotonically increasing IC functions. To speed up the computation, PTS was implemented on a Beowulf cluster with 32 Intel Pentium II 266 MHz processors [9]. The algorithm was tested with different subneighbourhood sizes on systems of 10 to 80 generating units over the entire dispatch period. PTS trades off computing time against solution quality to obtain the best performance. In addition, the speedup upper bounds of PTS with 32 species are given for the different generating unit systems, and the synchronisation overheads are analysed in terms of system size and number of processors.

2 Constrained economic dispatch problem formulation

The objective of CED is to schedule online generating units optimally so as to meet the load demand at the minimum operating cost under the power balance constraint and generator operational constraints. The ramp rates of the generating units are also considered to ensure the viability of outputs at the next time period. The CED problem is formulated as:

$$\text{minimise} \quad C_T(t) = \sum_{i=1}^{N} C_i\big(P_i(t)\big) \qquad (1)$$

subject to a power balance constraint:

$$\sum_{i=1}^{N} P_i(t) = P_D(t) + P_L(t) \qquad (2)$$

and inequality operating constraints of generating units, including their ramp rate limits at time period t:

$$P_{i,low}(t) \le P_i(t) \le P_{i,high}(t), \quad i = 1, \ldots, N \qquad (3)$$

where

$C_T(t)$ = total generator fuel cost at time period t (baht/time period)

$C_i(P_i(t))$ = generator fuel cost of the ith generating unit at time period t (baht/time period)

W. Ongsakul is with the Energy Field of Study, School of Environment, Resources and Development, Asian Institute of Technology, Pathumthani 12120, Thailand

S. Dechanupaprittha and I. Ngamroo are with the Electrical Power Engineering Program, Sirindhorn International Institute of Technology, Thammasat University, Pathumthani 12121, Thailand

© IEE, 2004

IEE Proceedings online no. 20040460

doi:10.1049/ip-gtd:20040460

Paper first received 30th August 2002 and in revised form 11th September 2003

IEE Proc.-Gener. Transm. Distrib., Vol. 151, No. 2, March 2004 157

$P_i(t)$ = real power output of the ith generating unit at time period t (MW)

$P_D(t)$ = total real power load demand at time period t (MW)

$P_L(t)$ = total transmission loss at time period t (MW)

$UR_i$ = up ramp limit of the ith generating unit (MW/time period)

$DR_i$ = down ramp limit of the ith generating unit (MW/time period)

$P_{i,min}$ = minimum real power output of the ith generating unit (MW)

$P_{i,max}$ = maximum real power output of the ith generating unit (MW)

$P_{i,low}(t)$ = lowest possible real power output of the ith generating unit at time period t (MW): $\max\{P_{i,min},\ P_i(t-1) - DR_i\}$

$P_{i,high}(t)$ = highest possible real power output of the ith generating unit at time period t (MW): $\min\{P_{i,max},\ P_i(t-1) + UR_i\}$

N = total number of online generating units to be dispatched.

To achieve the true CED, transmission losses must be taken into account. In this paper, the traditional B matrix loss formula is used to calculate transmission losses as [10]:

$$P_L(t) = \sum_{i=1}^{N}\sum_{j=1}^{N} P_i(t)\,B_{ij}\,P_j(t) + \sum_{i=1}^{N} B_{i0}\,P_i(t) + B_{00} \qquad (4)$$

where $B_{ij}$ = the ijth element of the loss coefficient square matrix, $B_{i0}$ = the ith element of the loss coefficient vector and $B_{00}$ = the loss coefficient constant.
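As an illustration only, (4) can be evaluated directly. The sketch below is in Python (the paper's own implementation was written in C with MPI); the function name and the two-unit coefficients in the usage line are hypothetical, and a symmetric B matrix is assumed, as is usual for the B-coefficient formula.

```python
def transmission_loss(p, b_matrix, b0, b00):
    """Eq. (4): P_L = sum_i sum_j P_i B_ij P_j + sum_i B_i0 P_i + B_00.

    p        -- unit outputs P_i(t) in MW
    b_matrix -- loss coefficient square matrix B_ij
    b0       -- loss coefficient vector B_i0
    b00      -- loss coefficient constant B_00
    """
    n = len(p)
    quadratic = sum(p[i] * b_matrix[i][j] * p[j]
                    for i in range(n) for j in range(n))
    linear = sum(b0[i] * p[i] for i in range(n))
    return quadratic + linear + b00


# Hypothetical two-unit example; the losses come to about 10 MW
print(transmission_loss([100.0, 200.0],
                        [[1e-4, 0.0], [0.0, 2e-4]],
                        [0.001, 0.002], 0.5))
```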

3 Parallel tabu search for CED problems

Parallel tabu search for CED constructs the neighbourhood solution space by encoding an initial feasible solution into a normalised binary string structure. The neighbourhood solution space is decomposed into several equal-size subneighbourhoods to balance the computing load. Each subneighbourhood is assigned to one processor (CPU), which carries out the tabu search process, a search that is able to escape from local optima [11]. PTS employs a tabu list restriction to prevent cycling of the solution and an aspiration level criterion to improve solution accuracy. During the parallel search, the competitive selection strategy updates the best solution reached among processors at every specified epoch of iterations.

3.1 Initialisation

Based on the generator operating range ratio, (5) and (6) are used to calculate the initial feasible power outputs of N generating units:

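$$P_{G,req}(t) = P_D(t) + P_L\big(P_1(t), P_2(t), \ldots, P_N(t)\big) - \sum_{i=1}^{N} P_{i,low}(t) \qquad (5)$$

$$P_i(t) = P_{i,low}(t) + \frac{\big(P_{i,high}(t) - P_{i,low}(t)\big)\,P_{G,req}(t)}{\sum_{j=1}^{N}\big(P_{j,high}(t) - P_{j,low}(t)\big)}, \quad i = 1, \ldots, N \qquad (6)$$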

The $P_i(t)$ in (5) are initially set to $P_{i,low}(t)$, i = 1, …, N. Equations (5) and (6) are then computed iteratively until initial feasible power outputs are obtained that satisfy both the power balance constraint and the generator operational constraints.
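The iteration of (5) and (6) can be sketched as follows. This is a minimal Python sketch under stated assumptions: the loss model is passed in as a callback (for example the B-matrix formula (4)), and the convergence tolerance, iteration cap and function name are illustrative choices, not values from the paper. Because (6) shares $P_{G,req}(t)$ in proportion to each unit's range, every output stays within its limits whenever $0 \le P_{G,req}(t) \le \sum_j (P_{j,high}(t) - P_{j,low}(t))$.

```python
def initial_dispatch(p_low, p_high, demand, loss_fn, tol=1e-6, max_iter=100):
    """Iterate (5)-(6) until the unit outputs stop changing."""
    span = [hi - lo for lo, hi in zip(p_low, p_high)]
    total_span = sum(span)
    p = list(p_low)                      # (5) starts from P_i(t) = P_i,low(t)
    for _ in range(max_iter):
        p_req = demand + loss_fn(p) - sum(p_low)        # eq. (5)
        if not 0.0 <= p_req <= total_span:
            raise ValueError("demand plus losses cannot be met within the limits")
        new_p = [lo + s * p_req / total_span             # eq. (6)
                 for lo, s in zip(p_low, span)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    raise RuntimeError("initialisation did not converge")


# Hypothetical three-unit case with losses equal to 1% of total generation
outputs = initial_dispatch([100.0, 50.0, 20.0], [300.0, 150.0, 80.0], 300.0,
                           lambda p: 0.01 * sum(p))
print(outputs, sum(outputs))
```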

3.2 Power balance constraint

The power outputs of the N generating units at a particular time period have to satisfy the power balance constraint, unit operating limits and ramp rate constraints. For arbitrary free unit power outputs $P_i(t)$ with $P_{i,low}(t) \le P_i(t) \le P_{i,high}(t)$, $i = 1, \ldots, R-1, R+1, \ldots, N$, the output of the Rth dependent reference unit is constrained by the power balance equation as:

$$P_R(t) = P_D(t) + P_L(t) - \sum_{i=1,\, i \ne R}^{N} P_i(t) \qquad (7)$$

Using (4), the transmission loss $P_L(t)$ can be written in terms of $P_R(t)$ as:

$$P_L(t) = A\,P_R^2(t) + B\,P_R(t) + C \qquad (8)$$

where $A = B_{RR}$, $B = 2\sum_{j=1,\, j \ne R}^{N} B_{Rj}\,P_j(t) + B_{R0}$ and

$$C = \sum_{i=1,\, i \ne R}^{N}\sum_{j=1,\, j \ne R}^{N} P_i(t)\,B_{ij}\,P_j(t) + \sum_{i=1,\, i \ne R}^{N} B_{i0}\,P_i(t) + B_{00}.$$

Substituting (8) for $P_L(t)$ in (7) gives

$$A\,P_R^2(t) + (B - 1)\,P_R(t) + C + P_D(t) - \sum_{i=1,\, i \ne R}^{N} P_i(t) = 0 \qquad (9)$$

The Rth reference unit power output, $P_R(t)$, is therefore a root of the quadratic equation (9). $P_R(t)$ is regarded as a feasible solution if it satisfies the ramp rate operational constraint in (3).
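Solving (9) for $P_R(t)$ can be sketched as below. The paper does not say which root of the quadratic is taken; this Python sketch assumes the wanted root is the one lying within $[P_{R,low}(t), P_{R,high}(t)]$ (with small loss coefficients the other root is far outside the operating range), and it uses the numerically stable form of the quadratic formula because A is very small. Function and argument names are illustrative.

```python
import math

def reference_unit_output(A, B, C, demand, free_sum, p_low, p_high):
    """Solve eq. (9): A*P_R^2 + (B - 1)*P_R + (C + P_D - sum_{i != R} P_i) = 0.

    Returns the root within [p_low, p_high] (the limits of (3) for unit R),
    or None when no feasible root exists.
    """
    b = B - 1.0
    c = C + demand - free_sum
    if A == 0.0:                       # no quadratic loss term: (9) is linear
        if b == 0.0:
            return None
        roots = [-c / b]
    else:
        disc = b * b - 4.0 * A * c
        if disc < 0.0:
            return None
        # Stable form of the quadratic formula, avoiding cancellation for tiny A
        q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
        roots = [q / A, c / q] if q != 0.0 else [0.0]
    feasible = [r for r in roots if p_low <= r <= p_high]
    return min(feasible) if feasible else None


# Hypothetical coefficients: P_D = 300 MW, free units supply 200 MW
print(reference_unit_output(1e-4, 0.01, 0.5, 300.0, 200.0, 50.0, 400.0))
```

In the PTS loop, `free_sum` would be the sum of the N − 1 decoded free outputs and `A`, `B`, `C` the coefficients defined after (8).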

3.3 Encoding and decoding scheme

The concatenated encoding method shown in Fig. 1 is employed in this paper. Each unit output is encoded as a binary string normalised over its operating range, and the normalised strings of all units are stacked in series to construct the string individual. Each unit string is assigned the same number of bits, n.

To obtain the actual generating power output of each unit for objective function evaluation, each string individual is decoded to the decimal value by:

$$P_i(t) = P_{i,low}(t) + \frac{B_i\,\big(P_{i,high}(t) - P_{i,low}(t)\big)}{2^n - 1}, \quad i = 1, \ldots, N \qquad (10)$$

where $B_i$ = decimal integer value of the binary string of the ith unit and n = number of bits representing each unit power output.

In this paper, 16 bits represent each unit power output; the more bits per unit power output, the higher the resolution. The N − 1 free unit outputs are decoded by (10), whereas the Rth reference unit power output is calculated from (9) to satisfy the power balance constraint.

Fig. 1 16 × N bit concatenated encoding scheme (unit 1, …, unit i, …, unit N, each a 16-bit field with bit weights $2^{15}$ down to $2^0$)
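A minimal Python sketch of the 16-bit concatenated scheme and of decoding by (10) follows. The `encode` helper (quantising an output to the nearest level) is an assumption for illustration; the paper only states that the initial solution is encoded. The usage lines decode the three-unit string of Fig. 2 with hypothetical unit limits. In PTS, the field of the Rth unit would be overridden by the output obtained from (9).

```python
N_BITS = 16  # bits per unit output, as used in the paper

def decode(bits, p_low, p_high, n_bits=N_BITS):
    """Eq. (10): P_i = P_i,low + B_i * (P_i,high - P_i,low) / (2^n - 1)."""
    levels = (1 << n_bits) - 1
    outputs = []
    for i, (lo, hi) in enumerate(zip(p_low, p_high)):
        b_i = int(bits[i * n_bits:(i + 1) * n_bits], 2)  # decimal value of field i
        outputs.append(lo + b_i * (hi - lo) / levels)
    return outputs

def encode(outputs, p_low, p_high, n_bits=N_BITS):
    """Inverse of (10): round each normalised output to the nearest n-bit level."""
    levels = (1 << n_bits) - 1
    fields = []
    for p, lo, hi in zip(outputs, p_low, p_high):
        b_i = round((p - lo) / (hi - lo) * levels) if hi > lo else 0
        fields.append(format(b_i, f"0{n_bits}b"))
    return "".join(fields)


fig2 = "1001011010110010" "1100011010110011" "1110010011001100"
limits_low, limits_high = [100.0, 50.0, 20.0], [300.0, 150.0, 80.0]  # hypothetical
print(decode(fig2, limits_low, limits_high))
```

With 16 bits the resolution is $(P_{i,high}(t) - P_{i,low}(t))/65535$, which is why a decoded output can be re-encoded to exactly the same string.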

3.4 Neighbourhood decomposition

To parallelise TS efficiently, load balancing is achieved by decomposing the neighbourhood solution space into NSN equal-size subneighbourhoods, where NSN is set to the given number of processors (p). The changeable region of each processor is referred to as its subneighbourhood solution space (NS). Each NS then has a small restricted neighbourhood size of $\lceil n \times N / NSN \rceil$ bits. Fig. 2 illustrates the neighbourhood decomposition strategy.
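The decomposition can be sketched as splitting the $n \times N$ bit positions into contiguous changeable regions of $\lceil n \times N/NSN \rceil$ bits. Contiguous blocks are an assumption consistent with Fig. 2, where each CPU changes one 16-bit field of a three-unit string; the function name is illustrative.

```python
import math

def decompose(n_units, n_species, n_bits=16):
    """Split bit positions 0 .. n_bits*n_units - 1 into contiguous changeable
    regions of ceil(n_bits*N/NSN) bits, one per species (processor)."""
    total = n_bits * n_units
    size = math.ceil(total / n_species)
    return [range(start, min(start + size, total))
            for start in range(0, total, size)]


# Fig. 2: three units on three CPUs, one 16-bit field per CPU
print([(r.start, r.stop) for r in decompose(3, 3)])  # [(0, 16), (16, 32), (32, 48)]
```

For the ten-unit system with 32 species, for instance, each NS contains only five bits.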

3.5 PTS operators

3.5.1 Trial solution generation: To generate a trial solution of a subneighbourhood, bits of the binary string in the changeable region are flipped one at a time. Starting from an initial feasible solution, PTS performs a deterministic advanced local search [12].

In this paper, the maximum number of trial solutions in each iteration is set to the size of NS. For example, the first, second and third bits of an initial feasible solution are flipped one at a time to yield three trial solutions, as shown in Fig. 3. Subsequently, the Rth reference unit power output is obtained from (9) to satisfy the power balance constraint.
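Trial-solution generation within a changeable region reduces to single-bit flips. The Python sketch below (function name illustrative) reproduces the three trial solutions of Fig. 3 when the changeable region is taken, for illustration, to be the first three bits of the first unit's field; in PTS each trial would then be decoded by (10) and completed with the reference unit output from (9).

```python
def trial_solutions(bits, changeable):
    """Yield (bit position, trial string), flipping one changeable bit at a time."""
    for pos in changeable:
        flipped = "0" if bits[pos] == "1" else "1"
        yield pos, bits[:pos] + flipped + bits[pos + 1:]


initial = "1001011010110010"  # first unit field of the initial solution in Fig. 3
for pos, trial in trial_solutions(initial, range(3)):
    print(pos + 1, trial)
```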

3.5.2 Tabu list restriction: The tabu list (TL) acts as an adaptive memory. It keeps the attributes (bit positions) of the best solutions of past iterations; each of these best solutions serves as the initial solution for generating the trial solutions in the next iteration. The attributes in TL are temporarily fixed and cannot be flipped to generate new trial solution candidates unless the aspiration criterion (Section 3.5.3) is met. As the iterations proceed, a new attribute enters TL as a fixed attribute and the oldest attribute is released from TL and becomes a free attribute, as shown in Fig. 4. TL affects solution quality by controlling the search direction so that the solution is not trapped in local optima. The length of TL is also called the tabu length. The tabu length that provides good solutions usually grows with problem size, and an appropriate value can be identified by observing solution quality. If TL is too small, the solution cycles during the search; if it is too large, the search is overly restricted, which may deteriorate the solution. In our applications, the size of TL is set to $\lfloor \sqrt{\text{size of NS}} \rfloor$ [12].
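Reading the tabu-length rule as $\lfloor \sqrt{\text{size of NS}} \rfloor$ with size of NS $= \lceil 16N/NSN \rceil$ reproduces every entry of Table 1 (for example, 80 units with NSN = 8 gives $\lfloor\sqrt{160}\rfloor = 12$). The Python sketch pairs that rule with a fixed-length first-in, first-out list; using `collections.deque` with `maxlen` is an implementation choice for illustration, not the authors' code.

```python
import math
from collections import deque

def tabu_length(n_units, n_species, n_bits=16):
    """Tabu length = floor(sqrt(size of NS)), size of NS = ceil(n*N/NSN)."""
    ns_size = math.ceil(n_bits * n_units / n_species)
    return math.isqrt(ns_size)

class TabuList:
    """Fixed-length FIFO of tabued attributes (bit positions)."""

    def __init__(self, length):
        # With maxlen set, appendleft() on a full deque discards the rightmost
        # (oldest) attribute, which then becomes free again.
        self.items = deque(maxlen=length)

    def is_tabu(self, pos):
        return pos in self.items

    def add(self, pos):
        """Enter a new attribute at the front of TL."""
        if pos in self.items:
            self.items.remove(pos)
        self.items.appendleft(pos)


# Matches Table 1: [[4, 3, 2], [6, 4, 3], [8, 6, 4], [12, 8, 6]]
print([[tabu_length(n, s) for s in (8, 16, 32)] for n in (10, 20, 40, 80)])
```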

3.5.3 Aspiration level criterion: To improve solution quality, an aspiration level (AL) criterion is included. The main role of AL is to allow a fixed (tabued) attribute in TL to produce a solution candidate if that candidate yields a more economical solution. Many different aspiration level criteria are available [12]. The AL used in this paper overrides the tabu status if the solution candidate of the tabued attribute is cheaper than the best solution reached. After AL is satisfied, TL is updated by moving the tabued attribute back to the first position of TL.
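The aspiration rule amounts to two small operations, sketched below in Python with illustrative names: a tabu move is admitted only if it beats the cost of the best solution reached ($X_b$), and an admitted tabu attribute is moved back to the first position of TL.

```python
from collections import deque

def admissible(pos, cost, tabu, best_cost):
    """Tabu test with the AL override: a tabued bit position is allowed only
    when its trial solution is cheaper than the best solution reached."""
    return pos not in tabu or cost < best_cost

def refresh_tabu_position(tabu, pos):
    """After AL is satisfied, move the tabued attribute to the front of TL."""
    tabu.remove(pos)
    tabu.appendleft(pos)


tl = deque([3, 5, 7], maxlen=3)          # most recent attribute first
print(admissible(5, 100.0, tl, 99.0))    # tabued and not cheaper: False
print(admissible(5, 98.5, tl, 99.0))     # tabued but cheaper: True (aspiration)
refresh_tabu_position(tl, 5)
print(list(tl))                          # [5, 3, 7]
```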

3.5.4 Reference unit rotation strategy: A reference unit rotation strategy is also used to reduce the search effort towards the optimal region. The Rth reference unit is not fixed at a particular unit; it is taken from the fixed region of each subneighbourhood. More specifically, the Rth reference unit of each subneighbourhood is initially set to the first unit in its fixed region. In the second iteration, it is moved to the second unit in the fixed region, and so on, as shown in Fig. 5.
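A minimal Python sketch of the rotation follows. The candidates for the Rth unit are taken to be the units whose whole 16-bit field lies outside the processor's changeable region, and the reference moves to the next of them each iteration. Wrapping back to the first fixed unit after the last one is an assumption, and units are numbered from 0 in the code; names are illustrative.

```python
def fixed_units(changeable, n_units, n_bits=16):
    """Units with no bit in the changeable region (candidates for the Rth unit)."""
    touched = {pos // n_bits for pos in changeable}
    return [u for u in range(n_units) if u not in touched]

def reference_unit(iteration, fixed):
    """Iteration 1 -> first fixed unit, iteration 2 -> second, ... (wraps around)."""
    return fixed[(iteration - 1) % len(fixed)]


# CPU 2 of Fig. 2 changes the second unit's field (bits 16..31) of a 3-unit string
fixed = fixed_units(range(16, 32), 3)
print(fixed, [reference_unit(k, fixed) for k in (1, 2, 3)])  # [0, 2] [0, 2, 0]
```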

3.5.5 Competitive selection strategy: To obtain a fast convergence rate towards the optimal solution region, PTS employs the competitive selection strategy, setting the best solution reached among all subneighbourhoods as the initial solution for all subneighbourhoods at every specified epoch (G) of 20 iterations.
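Competitive selection and its schedule can be sketched as follows, in Python with illustrative names and hypothetical costs. With G = 20 and the 500-iteration limit used in Section 5.1, an exchange at every G-th iteration gives 25 exchanges per dispatch period; Step 9 of the PTS procedure in Section 3.6 also forces an exchange at $k = k_{max}$, which here coincides with the 25th epoch.

```python
def competitive_selection(species_best, cost):
    """Return the cheapest best-reached solution among all species; every
    species then restarts its search from this solution."""
    return min(species_best, key=cost)

def is_exchange_iteration(k, G=20, k_max=500):
    """Exchange at every G-th iteration and at the last iteration (Step 9)."""
    return k % G == 0 or k == k_max


costs = {"cpu1": 1.375e8, "cpu2": 1.374e8, "cpu3": 1.376e8}   # hypothetical
print(competitive_selection(costs, costs.get))                # cpu2
print(sum(is_exchange_iteration(k) for k in range(1, 501)))   # 25
```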

3.6 PTS procedure for CED

The following notation is used for the PTS procedure:

TL: tabu list

AL: aspiration level criterion

NS: subneighbourhood solution space

C(X): objective value of solution X

X(k,0): initial feasible solution at iteration k

X(k,m): trial solution m at iteration k

$X_{cb}^{k}$: current best trial solution at iteration k

$X_b$: best solution reached

$k_{max}$: maximum allowable number of iterations

T: total number of time periods in the time horizon

Fig. 2 Example of neighbourhood decomposition (entire neighbourhood 1001011010110010 1100011010110011 1110010011001100 divided into subneighbourhoods 1–3 for CPUs 1–3, each with a changeable region, a fixed region and an Rth reference unit)

Fig. 3 Example of generating trial solutions of a subneighbourhood (initial solution 1001011010110010 …; trial solutions 1, 2 and 3 are 0001011010110010 …, 1101011010110010 … and 1011011010110010 …, obtained by flipping the first, second and third changeable bits)

Fig. 4 Mechanism of tabu list (a new attribute enters the tabu list as the oldest attribute leaves it)

Fig. 5 Example of reference unit rotation strategy (the Rth reference unit moves to the next unit of the fixed region at iterations 1, 2, 3, …, N, N + 1)

Using NSN processors, the PTS procedure with NSN species can be described as follows:

Step 1: Each processor reads the unit operating limits, heat input–output characteristics, ramp rate constraint limits, fuel cost of each unit, a forecast load demand at t = 1, the transmission loss B-matrix, and the initial power outputs at t = 0.

Step 2: Each processor specifies the length of TL, $k_{max}$, the size of NS, and the epoch iteration (G).

Step 3: Each processor initialises the time period t to one.

Step 4: Each processor initialises the iteration counter (k) to one and empties TL.

Step 5: Each processor generates X(k,0) by (5) and (6).

Step 6: Each processor initialises AL by setting $X_b$ equal to X(k,0).

Step 7: Each processor initialises the Rth reference unit to the first unit in its fixed region.

Step 8: Each processor executes the TS process on its own species as follows:

Step 8.1: Initialise the trial counter m to one.

Step 8.2: Generate a trial solution X(k,m) from X(k,0).

Step 8.3: If X(k,m) is not feasible, go to Step 8.9.

Step 8.4: If X(k,m) is the first feasible solution, set $X_{cb}^{k} = X(k,m)$.

Step 8.5: Perform the tabu test. If X(k,m) is tabued, go to Step 8.8.

Step 8.6: If $C(X(k,m)) < C(X_{cb}^{k})$, set $X_{cb}^{k} = X(k,m)$. Otherwise, go to Step 8.9.

Step 8.7: If $C(X(k,m)) < C(X_b)$, update AL by setting $X_b = X(k,m)$. Go to Step 8.9.

Step 8.8: Perform the AL test. If $C(X(k,m)) < C(X_b)$, set $X_{cb}^{k} = X(k,m)$ and update AL by setting $X_b = X(k,m)$.

Step 8.9: If m is less than the size of NS, set m = m + 1 and return to Step 8.2.

Step 8.10: If there is a feasible solution, set $X(k+1,0) = X_{cb}^{k}$ and update TL. Otherwise, set $X(k+1,0) = X_b$.

Step 9: Each processor checks whether k is not divisible by G and $k < k_{max}$. If so, go to Step 15.

Step 10: Each client processor sends its best solution reached ($X_b$) to the host processor.

Step 11: The host processor receives the solution candidates from all client processors and determines the best solution reached among all received solutions.

Step 12: The host processor broadcasts the best solution reached to all client processors.

Step 13: Each client processor sets the received solution as its initial solution and updates AL.

Step 14: Each processor checks whether $k = k_{max}$. If so, go to Step 16.

Step 15: Each processor moves the Rth reference unit to the next unit in its fixed region, sets k = k + 1, and returns to Step 8.

Step 16: The $X_b$ of all subneighbourhoods is the solution at time period t.

Step 17: Each processor checks whether t < T. If so, let t = t + 1, read the forecast load at time period t, and return to Step 4. Otherwise, PTS terminates.
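Step 8 for a single species can be sketched in Python as below. This is a simplified reading under stated assumptions: `evaluate` is a hypothetical callback that would decode a trial string, solve (9) for the Rth unit and return the fuel cost (or None when the trial is infeasible), and a tabued trial that fails the AL test is simply skipped, rather than being allowed to become the first feasible $X_{cb}^{k}$ as a literal reading of Step 8.4 would permit. The usage line replaces the fuel cost with a toy cost (the number of 1 bits).

```python
from collections import deque

def ts_iteration(x0, changeable, evaluate, tabu, best):
    """One pass of Step 8 for one species (simplified sketch).

    x0         -- initial solution X(k,0) as a bit string
    changeable -- bit positions of this species' NS
    evaluate   -- hypothetical callback: cost of a trial string, or None if infeasible
    tabu       -- deque of tabued bit positions (TL), most recent first
    best       -- (cost, bits) of the best solution reached, X_b (the AL)
    Returns (X(k+1,0), updated best).
    """
    current = None                               # (cost, pos, bits) of X_cb^k
    for pos in changeable:                       # Steps 8.1, 8.2 and 8.9
        trial = x0[:pos] + ("0" if x0[pos] == "1" else "1") + x0[pos + 1:]
        cost = evaluate(trial)
        if cost is None:                         # Step 8.3: infeasible trial
            continue
        if pos in tabu and cost >= best[0]:      # Steps 8.5/8.8: tabu, AL not met
            continue
        if current is None or cost < current[0]:  # Steps 8.4/8.6
            current = (cost, pos, trial)
        if cost < best[0]:                       # Steps 8.7/8.8: update AL
            best = (cost, trial)
            if pos in tabu:                      # aspirated attribute to the front
                tabu.remove(pos)
                tabu.appendleft(pos)
    if current is None:                          # Step 8.10: fall back to X_b
        return best[1], best
    if current[1] not in tabu:                   # Step 8.10: update TL
        tabu.appendleft(current[1])
    return current[2], best


tl = deque(maxlen=2)
x_next, best = ts_iteration("1010", range(4), lambda s: s.count("1"), tl, (2, "1010"))
print(x_next, best, list(tl))                    # 0010 (1, '0010') [0]
```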

4 Overhead management and speedup upper bounds of Beowulf cluster

4.1 Configuration of Beowulf cluster

The Beowulf cluster consists of one front-end and 32 Intel Pentium II 266 MHz PC compute nodes, as shown in Fig. 6. It is a high-performance massively parallel computer built primarily out of conventional hardware components and runs the Slackware Linux operating system. All computers are interconnected as a private network [9].

4.2 Synchronisation overhead

The one-to-one communication time for a message transferred between any two nodes of the Beowulf cluster used in this paper is given by:

$$T_{one} = T_{setup} + N_b \times 8.309063 \times 10^{-7} \ \text{seconds} \qquad (11)$$

where $T_{one}$ is the one-to-one communication time (s), $T_{setup}$ is the setup time (s) and $N_b$ is the size of the transferred message (bytes).

By experiment, $T_{setup} = 3.602049 \times 10^{-4}$ s, and $N_b$ is the size of the encoded best solution reached. The communication time per epoch, $T_{epoch}$, is upper bounded by $2 \times (p-1) \times T_{one}$, since the host processor sequentially gathers the messages from all client processors and then broadcasts the best solution to all client processors. In addition, the host processor performs the competitive selection at the last iteration. Therefore, the total communication time is upper bounded by $T_{epoch} \times \lceil k_{max}/G \rceil$.
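The communication model can be checked numerically. The Python sketch below uses the measured constants of (11); taking $N_b = 20$ bytes for the ten-unit system (its 160-bit string packed into bytes) is our assumption, and with $k_{max} = 500$ and G = 20 it reproduces the p = 2 entries of Table 4 (about 0.000754 s per epoch and 0.018841 s in total). Function names are illustrative.

```python
import math

T_SETUP = 3.602049e-4      # s, measured setup time
T_PER_BYTE = 8.309063e-7   # s per byte, slope of eq. (11)

def t_one(n_bytes):
    """Eq. (11): one-to-one message time between two nodes."""
    return T_SETUP + n_bytes * T_PER_BYTE

def t_epoch(p, n_bytes):
    """Upper bound per epoch: sequential gather plus broadcast, 2(p - 1) messages."""
    return 2 * (p - 1) * t_one(n_bytes)

def t_comm_total(p, n_bytes, k_max=500, G=20):
    """Upper bound on total communication time: T_epoch * ceil(k_max / G)."""
    return t_epoch(p, n_bytes) * math.ceil(k_max / G)


n_bytes = 16 * 10 // 8                   # ten units, 16 bits each
print(t_epoch(2, n_bytes), t_comm_total(2, n_bytes))
```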

To determine the total waiting time, the total synchronisation time has to be determined first. The total synchronisation time is the difference between the parallel execution time and the ideal parallel execution time, where the ideal parallel execution time is the serial execution time divided by the number of processors used [13]. The total waiting time is then the difference between the total synchronisation time and the total communication time.

4.3 Speedup upper bounds of PTS

The ideal speedup is equal to the number of processors used in the parallel computation. The experimental speedup ratio (Z) is defined as the ratio of the serial (one-processor) execution time to the parallel execution time. The efficiency is defined as the ratio of the experimental speedup ratio to the ideal speedup, i.e. the number of processors.
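The definitions of Sections 4.2 and 4.3 translate directly into a few lines of Python (function and key names are illustrative). With the ten-unit, two-processor row of Table 4 as input, they recover the tabulated $T_{oh}$, $T_{wait}$, speedup and efficiency.

```python
def overhead_breakdown(t_serial, t_parallel, p, t_comm):
    """T_oh = parallel time - serial/p; T_wait = T_oh - T_comm;
    Z = serial/parallel; efficiency = Z/p (in %)."""
    t_oh = t_parallel - t_serial / p
    z = t_serial / t_parallel
    return {"T_oh": t_oh, "T_wait": t_oh - t_comm, "Z": z,
            "efficiency": 100.0 * z / p}


# Table 4, ten units, two processors
print(overhead_breakdown(4.037806, 2.061033, 2, 0.018841))
```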

Based on Amdahl's law, the speedup can be written as:

$$Z = \frac{T^s_{comp}}{T^s_{comp}/p + T_{oh}} = \frac{T^s_{comp}}{T^s_{comp}/p + T_{comm} + T_{wait}} \qquad (12)$$

where $T^s_{comp}$ is the serial execution time (s), which is completely parallelised; $T_{oh}$ is the total synchronisation time (s); $T_{comm}$ is the total communication time (s); and $T_{wait}$ is the total waiting time (s).

Fig. 6 Configuration of Beowulf cluster (front end and compute nodes Proc. 1–32 connected through Ethernet switches, with 10 Mbit/s and 100 Mbit/s links)

For a given generating unit system, $T_{oh}$ tends to increase linearly as the number of processors grows (discussed in Section 5.2). $T_{oh}$ can therefore be expressed as a function of the number of processors, $T_{oh} = ap + b$, where a and b are constants for $p \ge 2$. The constants a and b are obtained by using a least squares criterion (LSC) to curve-fit the relationship between $T_{oh}$ and the number of processors, based on the experimental results for each generating unit system. To find the speedup upper bound ($Z_{upper}$), we have the following relation:

$$Z = \frac{T^s_{comp}}{T^s_{comp}/p + T_{oh}} = \frac{T^s_{comp}}{T^s_{comp}/p + ap + b} \le \frac{T^s_{comp}}{T^s_{comp}/p_{opt} + a\,p_{opt} + b} \triangleq Z_{upper} \qquad (13)$$

where $p_{opt}$ is the optimal number of processors that maximises the speedup. By the optimality condition, the maximum Z occurs at $\partial Z/\partial p = 0$, which gives

$$p_{opt} = \sqrt{\frac{T^s_{comp}}{a}} \qquad (14)$$

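Note that $Z_{upper}$ is achieved at $p_{opt}$, which may be lower than the maximum allowable number of processors ($p_{max}$) or the total number of species (NSN); when $p_{opt}$ exceeds $p_{max}$, the highest attainable speedup is obtained at $p_{max}$ instead.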
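Equations (13) and (14) can be sketched as follows, assuming $a > 0$ and the linear overhead model $T_{oh} = ap + b$ for $p \ge 2$. Restricting $p_{opt}$ to an integer in $[2, p_{max}]$ reflects the note above; the values of $T^s_{comp}$, a and b in the usage lines are hypothetical, not fitted values from the paper.

```python
import math

def speedup(p, t_serial, a, b):
    """Eq. (13) with the fitted overhead model T_oh = a*p + b."""
    return t_serial / (t_serial / p + a * p + b)

def speedup_upper_bound(t_serial, a, b, p_max):
    """Eq. (14): p_opt = sqrt(T_comp^s / a), restricted to integers in [2, p_max]."""
    p_opt = math.sqrt(t_serial / a)
    candidates = {min(p_max, max(2, math.floor(p_opt))),
                  min(p_max, max(2, math.ceil(p_opt)))}
    p_best = max(candidates, key=lambda p: speedup(p, t_serial, a, b))
    return p_best, speedup(p_best, t_serial, a, b)


print(speedup_upper_bound(100.0, 1.0, 0.5, 32))    # p_opt = 10 lies inside [2, 32]
print(speedup_upper_bound(100.0, 0.01, 0.0, 32))   # p_opt = 100 is limited to 32
```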

For a fixed number of processors, a larger system assigns more computational load to each processor, while the synchronisation overhead increases only slightly (discussed in Section 5.2). Therefore, the ratio of the serial execution time to the synchronisation time, $T^s_{comp}/T_{oh}$, becomes larger. From (12), $Z = p/(1 + p\,T_{oh}/T^s_{comp})$, so if $T^s_{comp}/T_{oh} \gg p$, then $Z \approx p$.

5 PTS implementation results and their speedup upper bounds

The proposed PTS was developed using the message passing interface (MPI) in the C language [14] on the Beowulf cluster. The proposed algorithms are tested on a system consisting of four units of Rayong combined cycle (RY_CC 1–4) with linear decreasing IC, four units of Bangpakong thermal (BPK_T 1–4) with linear increasing IC, one unit of Khanom combined cycle (KN_CC) with decreasing staircase IC, and one unit of independent power combined cycle (IPT_CC) with increasing staircase IC functions, as shown in Figs. 7 and 8.

In this paper, two types of load demand are used: a monotonically increasing load demand and a daily load demand. For the monotonically increasing load demand, the load increases from the minimum load to the maximum load in steps every 15 minutes. For the 20, 40 and 80 generating unit systems, multiples of the RY_CC 1–4, BPK_T 1–4, KN_CC and IPT_CC input–output data are used. Details of the system data are given in [15]. Scaled Electricity Generating Authority of Thailand (EGAT) daily load demands are also used for the different generating unit systems. Fig. 9 shows an example of the scaled EGAT daily load curve for the 40 generating unit system. The B-matrices for each system size are randomly generated so that the transmission losses are in the range 1–2.1% of the total load demand.

5.1 PTS experimental results

The proposed PTS decomposes its entire neighbourhood into several equal-size subneighbourhoods, or species, each assigned to one processor, and competitive selection is performed every epoch G of 20 iterations to speed up the convergence of PTS. More specifically, the entire neighbourhood is decomposed into eight species for eight processors (PTS-8), 16 species for 16 processors (PTS-16) and 32 species for 32 processors (PTS-32). The tabu lengths used for the 10, 20, 40 and 80 generating unit systems with different NSN are given in Table 1. These parameters were determined by experiment to yield the best results. The proposed PTS stops after 500 iterations.

For both the monotonically increasing and the daily load demands, the PTS results are compared with CGSA [15], GA-SA [16], SA [16], GA [16] and MOL based on the full load average production cost, as shown in Tables 2 and 3. PTS-8, PTS-16 and PTS-32 trade elapsed time against solution quality. The PTS variants are much faster than TS without neighbourhood decomposition [17], and their solutions are generally less expensive than those obtained from the CGSA, GA-SA, SA, GA and MOL approaches, especially for the larger generating unit systems (for the ten-unit system, CGSA gives a slightly lower cost than PTS in some cases). As the problem size grows, PTS-8 becomes the best choice because its total generator fuel costs are lower than those of PTS-16 and PTS-32. Although PTS-8 has higher elapsed times than PTS-16 and PTS-32, its elapsed times can be further reduced with faster processors and a faster Ethernet switch.

Note that unconstrained tabu search (UTS), with a much higher maximum iteration limit than PTS, is used to solve the ED problem without the ramp rate constraints in order to obtain good-quality solutions giving the lowest possible total generator fuel cost. Even though the UTS solutions are less expensive than those of TS and PTS, they are not applicable to CED; they are given for benchmarking purposes only.

Fig. 7 Generator fuel cost curves of RY_CC 1–4, BPK_T 1–4, KN_CC and IPT_CC units (generator fuel cost, baht/h, versus output, MW)

Fig. 8 Incremental cost curves of RY_CC 1–4, BPK_T 1–4, KN_CC and IPT_CC units (incremental cost, baht/MWh, versus output, MW)

Fig. 9 EGAT daily load curve scaled for the 40 unit system (power load demand, MW, versus time, h, over 0–24 h)

5.2 Synchronisation overhead analysis

Table 4 shows the total synchronisation times, including the total communication times and total waiting times, based on the experimental results of PTS-32 with an epoch of 20 iterations on generating unit systems of 10 to 80 units. The total communication times are based on the communication time model obtained from actual experiments on the Beowulf cluster, whereas the total waiting times are based on implementation data.

As shown in Table 4, for the same number of generating units, the communication time per epoch increases linearly as the number of processors increases. This is because more time is spent receiving and sending the best solution reached among processors when more processors are used. On the other hand, with a fixed number of processors, the communication time per epoch increases only slightly when the system size is larger, which shows that transferring a longer binary string has only a small effect on the communication time.

Table 1: Tabu lengths of PTSs with different NSN

Number of units | NSN = 8 | NSN = 16 | NSN = 32

10 | 4 | 3 | 2

20 | 6 | 4 | 3

40 | 8 | 6 | 4

80 | 12 | 8 | 6

Table 2: Comparison of total generator fuel costs of PTSs with other methods for monotonically increasing load demands

Number of units | Method | Total fuel cost, Baht | Total cost difference, Baht | % total cost difference | CPU time or elapsed time, s

10 | UTS | 137571020 | 0 | 0.0000 | N/A

10 | CGSA | 137598948 | 27928 | 0.0203 | 11.338195

10 | TS | 137604060 | 33040 | 0.0240 | 2.122000

10 | PTS-16 | 137604827 | 33807 | 0.0246 | {0.578458}

10 | PTS-8 | 137621443 | 50423 | 0.0367 | {0.677890}

10 | PTS-32 | 137624790 | 53770 | 0.0391 | {0.755154}

10 | GA-SA | 137642287 | 71267 | 0.0518 | 10.876500

10 | SA | 137702765 | 131745 | 0.0958 | 2.922000

10 | GA | 137852686 | 281666 | 0.2047 | 7.534000

10 | MOL | 138026055 | 455035 | 0.3308 | 0.000200

20 | UTS | 562685577 | 0 | 0.0000 | N/A

20 | TS | 562695976 | 10399 | 0.0018 | 14.684000

20 | PTS-16 | 562712924 | 27347 | 0.0049 | {1.088549}

20 | PTS-8 | 562732675 | 47098 | 0.0084 | {1.660864}

20 | PTS-32 | 562856322 | 170745 | 0.0303 | {1.040572}

20 | CGSA | 562878657 | 193080 | 0.0343 | 29.214294

20 | GA-SA | 563210950 | 525373 | 0.0934 | 26.050000

20 | SA | 563586592 | 901015 | 0.1601 | 7.642000

20 | GA | 564279779 | 1594202 | 0.2833 | 17.150000

20 | MOL | 565001762 | 2316185 | 0.4116 | 0.000700

40 | UTS | 1130503496 | 0 | 0.0000 | N/A

40 | TS | 1130537803 | 34307 | 0.0030 | 111.081000

40 | PTS-8 | 1130868753 | 365257 | 0.0323 | {10.507727}

40 | PTS-16 | 1130979753 | 476257 | 0.0421 | {5.537323}

40 | PTS-32 | 1131154161 | 650665 | 0.0576 | {3.308728}

40 | CGSA | 1131791215 | 1287719 | 0.1139 | 96.388915

40 | GA-SA | 1133302246 | 2798750 | 0.2476 | 72.555000

40 | SA | 1134667957 | 4164461 | 0.3684 | 25.205000

40 | MOL | 1135185542 | 4682046 | 0.4142 | 0.002500

40 | GA | 1136092285 | 5588789 | 0.4944 | 43.335000

80 | UTS | 2260860347 | 0 | 0.0000 | N/A

80 | TS | 2260875096 | 14749 | 0.0007 | 855.637000

80 | PTS-8 | 2262406428 | 1546081 | 0.0684 | {75.908860}

80 | PTS-16 | 2262894842 | 2034495 | 0.0900 | {38.284449}

80 | PTS-32 | 2263740137 | 2879790 | 0.1274 | {19.768629}

80 | CGSA | 2266880289 | 6019942 | 0.2663 | 356.689009

80 | GA-SA | 2269031539 | 8171192 | 0.3614 | 228.364000

80 | MOL | 2269852713 | 8992366 | 0.3977 | 0.010100

80 | SA | 2270572225 | 9711878 | 0.4296 | 89.172000

80 | GA | 2278352718 | 17492371 | 0.7737 | 121.609000

For the same generating unit system, the waiting time per epoch remains almost constant as more processors are involved. This is because the load imbalance among processors is reduced as the computational load per processor decreases (smaller subneighbourhood), even though more processors require more coordination. For a fixed number of processors, the waiting time per epoch increases slightly with system size because of the larger computational load (larger subneighbourhood).

For the same generating unit system, the total synchronisation time, comprising the total communication time and the total waiting time, increases linearly as the number of processors grows, as shown in Fig. 10. Similarly, for a fixed number of processors, the total synchronisation time increases slightly as the number of generating units increases, owing to the slight increase in both the total waiting time and the total communication time.

Table 3: Comparison of total generator fuel costs of PTSs with other methods for daily load demands

Number of units | Method | Total fuel cost, Baht | Total cost difference, Baht | % total cost difference | CPU time or elapsed time, s

10 | UTS | 58047733 | 0 | 0.0000 | N/A

10 | TS | 58067292 | 19559 | 0.0337 | 2.122000

10 | PTS-32 | 58072982 | 25249 | 0.0435 | {0.755154}

10 | PTS-16 | 58074746 | 27013 | 0.0465 | {0.578458}

10 | PTS-8 | 58086559 | 38826 | 0.0669 | {0.677890}

10 | CGSA | 58082616 | 34883 | 0.0601 | 11.338195

10 | GA-SA | 58089485 | 41752 | 0.0719 | 10.876500

10 | SA | 58132810 | 85077 | 0.1466 | 2.922000

10 | GA | 58193757 | 146024 | 0.2516 | 7.534000

10 | MOL | 58269171 | 221438 | 0.3815 | 0.000200

20 | UTS | 113793032 | 0 | 0.0000 | N/A

20 | PTS-16 | 113839323 | 46291 | 0.0407 | {1.088549}

20 | TS | 113840601 | 47569 | 0.0418 | 14.684000

20 | PTS-8 | 113846388 | 53356 | 0.0469 | {1.660864}

20 | PTS-32 | 113869334 | 76302 | 0.0671 | {1.040572}

20 | CGSA | 113883566 | 90534 | 0.0796 | 29.214294

20 | GA-SA | 113995162 | 202130 | 0.1776 | 26.050000

20 | SA | 114047687 | 254655 | 0.2238 | 7.642000

20 | GA | 114156979 | 363947 | 0.3198 | 17.150000

20 | MOL | 114332935 | 539903 | 0.4745 | 0.000700

40 | UTS | 227377731 | 0 | 0.0000 | N/A

40 | TS | 227472893 | 95162 | 0.0419 | 111.081000

40 | PTS-8 | 227515051 | 137320 | 0.0604 | {10.507727}

40 | PTS-16 | 227533911 | 156180 | 0.0687 | {5.537323}

40 | PTS-32 | 227574554 | 196823 | 0.0866 | {3.308728}

40 | CGSA | 227715435 | 337704 | 0.1485 | 96.388915

40 | GA-SA | 228049922 | 672191 | 0.2956 | 72.555000

40 | SA | 228414559 | 1036828 | 0.4560 | 25.205000

40 | MOL | 228450426 | 1072695 | 0.4718 | 0.002500

40 | GA | 228608830 | 1231099 | 0.5414 | 43.335000

80 | UTS | 455053117 | 0 | 0.0000 | N/A

80 | TS | 455274003 | 220886 | 0.0485 | 855.637000

80 | PTS-8 | 455543579 | 490462 | 0.1078 | {75.908860}

80 | PTS-16 | 455670278 | 617161 | 0.1356 | {38.284449}

80 | PTS-32 | 455800398 | 747281 | 0.1642 | {19.768629}

80 | CGSA | 456445429 | 1392312 | 0.3060 | 356.689009

80 | GA-SA | 456977120 | 1924003 | 0.4228 | 228.364000

80 | MOL | 457204202 | 2151085 | 0.4727 | 0.010100

80 | SA | 457303849 | 2250732 | 0.4946 | 89.172000

80 | GA | 459146978 | 4093861 | 0.8996 | 121.609000


5.3 Speedup upper bounds and some observations

As shown in Table 4, the experimental speedups are clearly lower than the ideal speedups because of the total synchronisation times, which comprise the total communication times and the total waiting times. At p_max = 32, PTS-32 reaches its speedup upper bounds (Z_upper) for generating unit systems of 20–80 units. For the ten generating unit system, the speedup upper bound is reached at p_opt = 16 processors. The speedup upper bound of PTS-32 on 32 processors rises from 11.2569 to 30.6133 as the number of generating units rises from 20 to 80. This is due to the increase in the ratio of the serial execution time to the total synchronisation time, from 17.37 to 706.43, as shown in Table 4. Note that, as the system size increases, the speedup upper bound tends towards 32, the maximum number of processors available to PTS-32. Accordingly, PTS is very favourable for large-scale implementation.
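Table 4 is consistent with a simple timing model in which the parallel elapsed time on p processors is T_s/p + T_oh. Here T_s is the serial execution time and T_oh is the total synchronisation time. For example, for the 80-unit system, 605.182388/32 + 0.856679 = 19.768629 s. Treating this model as an assumption, the speedup can be rewritten as Z = p/(1 + p·T_oh/T_s). This form makes it explicit why Z approaches p as the ratio T_s/T_oh grows. The sketch below evaluates the expression from the ratios quoted above. The function name is hypothetical.

```python
def speedup_from_ratio(p: int, serial_to_sync: float) -> float:
    """Speedup Z = Ts / (Ts/p + Toh), rewritten as p / (1 + p * Toh/Ts).

    `serial_to_sync` is the ratio Ts/Toh (serial execution time divided by the
    total synchronisation time) for a run on p processors. Assumed timing model.
    """
    return p / (1.0 + p / serial_to_sync)


# Ts/Toh ratios on 32 processors, as quoted from Table 4 for 20 and 80 units.
for units, ratio in [(20, 17.37), (80, 706.43)]:
    print(units, round(speedup_from_ratio(32, ratio), 4))
```

The results are about 11.26 and 30.61, in line with the measured upper bounds of 11.2569 and 30.6133. The small gap for 20 units comes from rounding the ratio to two decimals.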

In Table 4, for a given generating unit system, the speedup ratio increases with the number of processors until the number of processors reaches p_opt. For example, for the 80 generating unit system, the speedup ratio increases from 1.9993 to 30.6133 as the number of processors increases from two to 32, while the efficiency drops only slightly, from 99.97% to 95.67%. This is because the reduction in parallel execution time is much larger than the linear increase in the total synchronisation time. Note that the parallel execution times (elapsed times) shown in Table 4 are the elapsed times of the proposed PTS per load demand.
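The speedup and efficiency figures quoted here follow from the elapsed times in Table 4. This assumes the usual definitions, speedup Z = T(1)/T(p) and efficiency Z/p, which are not restated in this section. A short sketch for the 80-unit system:

```python
# Elapsed times (s) of PTS for the 80-unit system, copied from Table 4.
elapsed_80 = {1: 605.182388, 2: 302.697061, 4: 151.454400,
              8: 75.908860, 16: 38.284449, 32: 19.768629}


def speedup_and_efficiency(elapsed: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map each processor count p to (speedup Z, efficiency in %),
    using Z = T(1) / T(p) and efficiency = 100 * Z / p."""
    t_serial = elapsed[1]
    return {p: (t_serial / t, 100.0 * t_serial / (t * p)) for p, t in elapsed.items()}


for p, (z, eff) in speedup_and_efficiency(elapsed_80).items():
    print(f"p = {p:2d}  Z = {z:8.4f}  efficiency = {eff:6.2f} %")
```

For p = 2, this gives Z ≈ 1.9993 and an efficiency of about 99.97 %. For p = 32, it gives Z ≈ 30.6133 and about 95.67 %. Both match the figures quoted above.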

The experimental data show a linear relationship between T_oh and the number of processors for each generating unit system. The coefficients a and b, which relate T_oh to the number of processors for each system, are determined by using LSC to curve-fit the experimental results in Table 4. These linear relationships can be used to estimate the total synchronisation time for different numbers of processors. From those estimates, the speedup upper bound for each generating unit system can in turn be estimated. Table 5 shows the resulting estimated speedup upper bounds of PTS with p_opt species (PTS-p_opt), using the estimated p_opt for each generating unit system.
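A minimal sketch of this estimation follows. It rests on three assumptions:

(i) the fitted line has the form T_oh ≈ a·p + b over the measured runs with p = 2 to 32;
(ii) ordinary least squares is used, although LSC may denote a different curve-fitting procedure;
(iii) the parallel time is modelled as T_s/p + T_oh.

Minimising T_s/p + a·p + b over p gives the continuous optimum √(T_s/a), which is rounded to an integer here. The function names are hypothetical.

```python
import math

# Total synchronisation time T_oh (s) for p = 2..32 processors, copied from Table 4.
PROCESSORS = [2, 4, 8, 16, 32]
T_OH = {
    10: [0.042130, 0.092660, 0.173164, 0.326095, 0.628973],
    80: [0.105867, 0.158803, 0.261062, 0.460550, 0.856679],
}
T_SERIAL = {10: 4.037806, 80: 605.182388}  # elapsed time on one processor, s


def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ a*x + b; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sxy / sxx
    return a, my - a * mx


def estimate_upper_bound(t_serial, a, b):
    """Approximate the p that minimises t_serial/p + a*p + b by rounding the
    continuous optimum sqrt(t_serial/a); return (p_opt, estimated Z_upper)."""
    p_opt = round(math.sqrt(t_serial / a))
    t_parallel = t_serial / p_opt + a * p_opt + b  # assumed timing model
    return p_opt, t_serial / t_parallel


for units in (10, 80):
    a, b = fit_line(PROCESSORS, T_OH[units])
    p_opt, z_upper = estimate_upper_bound(T_SERIAL[units], a, b)
    print(f"{units} units: a = {a:.4f}, b = {b:.4f}, p_opt = {p_opt}, Z_upper = {z_upper:.4f}")
```

With these data, the fit gives a ≈ 0.0194, b ≈ 0.0125 and p_opt = 14 for the ten-unit system, and a ≈ 0.0250, b ≈ 0.0589 and p_opt = 156 for the 80-unit system. Both sets agree with Table 5, and the speedup estimates agree with Table 5 to within about 0.001.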

Table 4: Experimental data of PTS-32 on the Beowulf cluster

Number    Number of   Elapsed      Z or          Efficiency,  T_oh, s   T_s,comp/T_oh  T_comm, s  T_comm per  T_wait, s  T_wait per
of units  processors  time, s      (Z_upper)     %                                                epoch, s               epoch, s
10        1           4.037806     1.000000      100.00       0.000000  ∞              0.000000   0.000000    0.000000   0.000000
          2           2.061033     1.959118      97.96        0.042130  95.84          0.018841   0.000754    0.023289   0.000932
          4           1.102111     3.663702      91.59        0.092660  43.58          0.056523   0.002261    0.036136   0.001445
          8           0.677890     5.956432      74.46        0.173164  23.32          0.131888   0.005276    0.041276   0.001651
          (16)        0.578458     (6.980292)    43.63        0.326095  12.38          0.282617   0.011305    0.043478   0.001739
          32          0.755154     5.346997      16.71        0.628973  6.42           0.584076   0.023363    0.044897   0.001796
20        1           11.713660    1.000000      100.00       0.000000  ∞              0.000000   0.000000    0.000000   0.000000
          2           5.913614     1.980796      99.04        0.056784  206.28         0.019672   0.000787    0.037112   0.001484
          4           3.040637     3.852370      96.31        0.112222  104.38         0.059016   0.002361    0.053206   0.002128
          8           1.660864     7.052751      88.16        0.196657  59.56          0.137704   0.005508    0.058952   0.002358
          16          1.088549     10.760802     67.26        0.356445  32.86          0.295081   0.011803    0.061364   0.002455
          (32)        1.040572     (11.256943)   35.18        0.674520  17.37          0.609834   0.024393    0.064686   0.002587
40        1           82.284286    1.000000      100.00       0.000000  ∞              0.000000   0.000000    0.000000   0.000000
          2           41.229330    1.995771      99.79        0.087187  943.77         0.021334   0.000853    0.065853   0.002634
          4           20.706252    3.973886      99.35        0.135181  608.70         0.064002   0.002560    0.071179   0.002847
          8           10.507727    7.830836      97.89        0.222191  370.33         0.149337   0.005973    0.072854   0.002914
          16          5.537323     14.859940     92.87        0.394555  208.55         0.320008   0.012800    0.074547   0.002982
          (32)        3.308728     (24.868858)   77.72        0.737344  111.60         0.661350   0.026454    0.075994   0.003040
80        1           605.182388   1.000000      100.00       0.000000  ∞              0.000000   0.000000    0.000000   0.000000
          2           302.697061   1.999301      99.97        0.105867  5716.44        0.024657   0.000986    0.081210   0.003248
          4           151.454400   3.995806      99.90        0.158803  3810.90        0.073972   0.002959    0.084831   0.003393
          8           75.908860    7.972487      99.66        0.261062  2318.16        0.172602   0.006904    0.088459   0.003538
          16          38.284449    15.807525     98.80        0.460550  1314.04        0.369862   0.014794    0.090687   0.003627
          (32)        19.768629    (30.613271)   95.67        0.856679  706.43         0.764382   0.030575    0.092297   0.003692

Fig. 10 Total synchronisation time data of PTS-32 on the Beowulf cluster
(Bar chart: total synchronisation time, s, against number of generating units (10, 20, 40, 80) on 2, 4, 8, 16 and 32 processors.)

Table 6 shows why PTS-8 is the best compromise, by tracking how solution quality and computing time change in sequential runs of PTS as NSN increases. Note that each subneighbourhood has a tabu list containing tabued feasible solutions. As the number of subneighbourhoods increases, the total number of non-tabued feasible solutions decreases. This is because the total number of tabued feasible solutions across all subneighbourhoods of PTS is higher than that of TS:

$$\mathrm{NSN} \times \left\lfloor \sqrt{\left\lfloor n \times N / \mathrm{NSN} \right\rfloor} \right\rfloor > \left\lfloor \sqrt{n \times N} \right\rfloor, \quad \text{where } \mathrm{NSN} \ge 2$$

Consequently, the frequency of updating the current best solution from the non-tabued feasible solutions (Step 8.6 in Section 3.6) decreases at a higher rate. Meanwhile, the frequency of updating the current best solution from the tabued feasible solutions (Step 8.8 in Section 3.6) increases at a lower rate. The net effect is a shorter CPU time for the sequential run of PTS. This is because the probability that a tabued solution passes the aspiration level (AL) test in Step 8.8 is lower than the probability that a non-tabued solution updates the current best solution in Step 8.6. The reduction in CPU time of the sequential run of PTS tends to saturate when NSN reaches 8, particularly in the 40 and 80 generating unit systems.

On the other hand, total generator fuel cost of PTS rises slightly as NSN increases. This is also explained by two factors: the smaller feasible solution space searched across all subneighbourhoods (caused by the neighbourhood decomposition), and the smaller total number of non-tabued feasible solutions (equivalently, the higher total number of tabued feasible solutions). The more subneighbourhoods there are, the higher the total generator fuel cost. However, the rate of increase as NSN grows from 8 to 32 tends to be smaller for bigger systems. Accordingly, the parallel implementation of PTS-8 could further reduce the computing time with the least sacrifice in solution quality. Note that the CPU times shown in Table 6 were obtained from sequential runs of PTS on a Pentium III 733 MHz machine running the Linux operating system.
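Read literally, the tabu-list inequality in Section 5.3 compares the total number of tabu entries in PTS with the number in TS. This reading holds if each tabu list holds ⌊√m⌋ entries for a (sub)neighbourhood of m candidate solutions, and if n × N is the size of the full TS neighbourhood. That sizing rule is an inference from the formula, not something stated in this section. The neighbourhood size used below is a hypothetical value. The sketch uses Python's `math.isqrt`, which returns ⌊√m⌋ exactly for a non-negative integer m.

```python
import math


def tabu_entries_ts(neighbourhood_size: int) -> int:
    """Tabu list length for sequential TS, assuming floor(sqrt(size)) entries."""
    return math.isqrt(neighbourhood_size)


def tabu_entries_pts(neighbourhood_size: int, nsn: int) -> int:
    """Total tabu entries over NSN subneighbourhoods of floor(size / NSN)
    candidates each, using the same floor(sqrt(.)) rule per subneighbourhood."""
    return nsn * math.isqrt(neighbourhood_size // nsn)


size = 1000  # hypothetical value of n x N, for illustration only
print("TS:", tabu_entries_ts(size))
for nsn in (2, 4, 8, 16, 32):
    print(f"PTS, NSN = {nsn:2d}:", tabu_entries_pts(size, nsn))
```

With 1000 candidates, TS keeps 31 tabu entries. PTS keeps 44, 60, 88, 112 and 160 for NSN = 2, 4, 8, 16 and 32 respectively. For very small neighbourhoods the two sides can coincide: n × N = 4 with NSN = 2 gives 2 on both sides. The strict inequality therefore presumes subneighbourhoods of reasonable size.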

6 Conclusions

A parallel tabu search (PTS) algorithm has been proposed for solving constrained economic dispatch (CED) problems for generating units with non-monotonically and monotonically increasing incremental cost (IC) functions. Using load balancing and competitive selection strategies, the proposed PTS has been successfully and effectively implemented on the 32-processor Beowulf cluster. This is demonstrated by the high speedup upper bounds on large generating unit systems. The proposed PTS trades off solution quality against experimental speedup to obtain the best performance. Accordingly, the proposed PTS is potentially viable for online control of CED owing to substantial generator fuel cost savings and fast computing time.

7 Acknowledgment

The authors would like to thank the Electricity Generating Authority of Thailand (EGAT) for providing the test data.

Table 5: Estimated speedup upper bounds of PTS based on experimental results on the Beowulf cluster

Number of units  Elapsed time, s  PTS      Estimated p_opt  Estimated Z_upper  a       b
10               4.037806         PTS-14   14               7.0592             0.0194  0.0125
20               11.713660        PTS-24   24               11.6750            0.0203  0.0271
40               82.284286        PTS-62   62               30.3240            0.0216  0.0475
80               605.182388       PTS-156  156              77.2470            0.0250  0.0589

Table 6: Comparison of total generator fuel costs of sequential runs of PTS for daily load demands

Number of units  NSN  Total generator fuel cost, Baht  Cost difference, Baht  % total cost difference  CPU time, s
10               1    58067292                         0                      0.000000                 0.672983
                 32   58072982                         5690                   0.009799                 0.734154
                 16   58074746                         7454                   0.012836                 0.729543
                 8    58086559                         19267                  0.033181                 0.717761
20               16   113839323                        0                      0.000000                 4.146625
                 1    113840601                        1278                   0.001122                 5.341601
                 8    113846388                        7065                   0.006206                 4.097291
                 32   113869334                        30011                  0.026362                 4.187341
40               1    227472893                        0                      0.000000                 40.072732
                 8    227515015                        42122                  0.018517                 27.614079
                 16   227533911                        61018                  0.026824                 27.796970
                 32   227574554                        101661                 0.044691                 27.882971
80               1    455274003                        0                      0.000000                 438.030000
                 8    455543579                        269576                 0.059212                 199.027500
                 16   455670279                        396276                 0.087041                 199.470300
                 32   455800398                        526395                 0.115622                 199.884700

8 References

1 Mori, H., and Matsuzaki, O.: ‘A parallel tabu search approach to unit commitment in power systems’. IEEE Proc. Int. Conf. on Systems, Man and Cybernetics (ICSMC), Tokyo, Japan, Oct. 1999, 6, pp. 509–514

2 Mori, H., and Ogita, Y.: ‘A parallel tabu search based method for reconfigurations of distribution systems’. Proc., IEEE Power Engineering Society Summer Meeting, 2000, 1, pp. 73–78

3 Mori, H., and Ogita, Y.: ‘Parallel tabu search for capacitor placement in radial distribution systems’. Proc., IEEE Power Engineering Society Winter Meeting, 2000, 4, pp. 2334–2339

4 Mori, H., and Goto, Y.: ‘A parallel tabu search based method for determining optimal allocation of FACTS in power systems’. Proc., PowerCon on Power system technology, 2000, pp. 1077–1082

5 Ongsakul, W.: ‘Real-time economic dispatch using merit order loading for linear decreasing and staircase incremental cost functions’, Electr. Power Syst. Res., 1999, 15, (3), pp. 167–173

6 Yang, H.T., Yang, P.C., and Huang, C.L.: ‘Evolutionary programming based economic dispatch for units with non-smooth fuel cost functions’, IEEE Trans. Power Syst., 1996, 11, (1), pp. 112–118

7 Wong, K.P., and Fung, C.C.: ‘Simulated annealing based economic dispatch algorithm’, IEE Proc. C, Gener. Transm. Distrib., 1993, 140, (6), pp. 509–515

8 Walters, D.C., and Sheble, G.B.: ‘Genetic algorithm solution of economic dispatch with valve point loading’, IEEE Trans. Power Syst., 1993, 8, (3), pp. 1325–1332

9 Uthayopas, P., Angskun, T., and Maneesilp, J.: ‘Building a parallel computer from cheap PCs: SMILE cluster experiences’. Proc., 2nd Annual Nat. Symp. on Computational Science and Engineering, Bangkok, Thailand, 1998

10 Wood, A.J., and Wollenberg, B.F.: ‘Power generation, operation and control’ (John Wiley & Sons, New York, USA, 1996, 2nd Edn.)

11 Glover, F., and Laguna, M.: ‘Tabu search’ (Kluwer Academic Publishers, London, 2001)

12 Rayward-Smith, V.J., Osman, I.H., Reeves, C.R., and Smith, G.D.: ‘Modern heuristic search methods’ (John Wiley & Sons, UK, 1996)

13 Huang, G., and Ongsakul, W.: ‘Managing the bottlenecks in parallel Gauss–Seidel type algorithms for power flow analysis’, IEEE Trans. Power Syst., 1994, 9, (2), pp. 667–684

14 Snir, M., Otto, S., Lederman, S.H., Walker, D., and Dongarra, J.: ‘MPI: the complete reference’ (MIT Press, Cambridge, 1996)

15 Ruangpayoonsak, N., Ongsakul, W., and Runggeratikul, S.: ‘Constrained economic dispatch by combined genetic and simulated annealing algorithm’, Int. J. Electr. Power Comp. Syst., 2002, 30, (9), pp. 917–931

16 Ongsakul, W., and Ruangpayoongsak, N.: ‘Constrained dynamic economic dispatch by simulated annealing/genetic algorithm’. Proc., 22nd Int. Conf. Power Industry Computer Application (PICA), Sydney, Australia, May 2001, pp. 207–212

17 Ongsakul, W., Dechanupaprittha, S., and Ngamroo, I.: ‘Tabu search algorithm for constrained economic dispatch’. Proc., Int. Conf. Power Systems, Wuhan, China, Sept. 2001, pp. 428–433

