Progress in Nuclear Energy 53 (2011) 600–606


BWR in-core fuel management optimization using parallel simulated annealing in FORMOSA-B

Ross Hays*, Paul Turinsky
Department of Nuclear Engineering, North Carolina State University, Campus Box 7909, Raleigh, NC 27695-7909, USA

Article info

Article history: Received 15 March 2010; Accepted 13 September 2010

Keywords: In-core fuel management; Boiling water reactor; Parallel simulated annealing

* Corresponding author. E-mail addresses: [email protected] (R. Hays), [email protected] (P. Turinsky).


Abstract

The process of finding optimized fuel reload patterns for boiling water reactors is complicated by a number of factors including the large number of fuel assemblies involved, the three-dimensional neutronic and thermal-hydraulic variations, and the interplay of coolant flow rate with control rod programming. The FORMOSA-B code was developed to provide an automated method for finding fuel loading patterns, control rod programs and coolant flow rate schedules to minimize certain quantitative metrics of core performance while satisfying given operational constraints. One drawback of this code has been the long runtimes required for a complete cycle optimization on a desktop workstation (oftentimes several days or more). To address this shortcoming, a parallel simulated annealing algorithm has been added to the FORMOSA-B code, so that the runtimes may be greatly reduced by using a multiprocessor computer cluster. Tests of the algorithm on a sample problem indicate that it is capable of parallel efficiencies exceeding 80% when using four processors.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

The process of nuclear fuel management for commercial power reactors incorporates the many diverse and varied aspects involved in the procurement, manufacture, use and eventual disposal or reuse of the nuclear fuel materials. This complicated and interrelated set of tasks is key to a nuclear power plant's continued economic and operational viability. One area of this process that can be particularly challenging from a computational point of view is the in-core fuel management problem. This problem is composed of decisions on the design, placement and operation of fuel assemblies within the reactor. Although different utilities may have different priorities for which features of the reload core are to be optimized, the goal of the optimization is generally to obtain the most energy production from a given set of fuel assemblies while staying within required operational and safety margins. Within the FORMOSA-B code are a number of mathematical objective functions over which the core may be optimized. These include the minimization of end-of-cycle coolant flow, minimization of reload cost, minimization of local power peaking, and several others. In addition to these objective functions, the optimization algorithm seeks to find loading patterns that do not violate a set of thermal constraints. This is accomplished by the addition of an adjustable penalty-factor multiplier to the objective function value for each constraint limit that is violated in a given pattern. In this way, the search algorithm can accept loading patterns with constraint violations early in the optimization (thus allowing it to escape from local solution minima) while gradually eliminating the violations as the optimization progresses. This task of selecting a loading pattern is particularly difficult in boiling water reactors, as they have a higher number of fuel assemblies than a comparably powered pressurized water reactor and they require periodic movement of the control blades and changes in coolant flow rate during operation. BWR analysis is further complicated by the interplay between coolant flow rate (and void formation) and core neutronics, necessitating a full three-dimensional simulation incorporating feedback effects.
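To make the penalty treatment described above concrete, the sketch below shows one way such an augmented objective could be assembled. It is illustrative Python, not FORMOSA-B source; the function and variable names are assumptions.

```python
def augmented_objective(objective, violations, multipliers):
    """Add a penalty term for each violated constraint (illustrative sketch).

    objective   -- base objective value (e.g., an end-of-cycle flow metric)
    violations  -- dict mapping constraint name -> violation magnitude (0 if satisfied)
    multipliers -- dict mapping constraint name -> current adjustable penalty multiplier
    """
    penalty = sum(multipliers[name] * value
                  for name, value in violations.items() if value > 0.0)
    return objective + penalty
```

Because the multipliers start small and grow as the search proceeds, constraint-violating patterns are cheap to accept early in the optimization and become progressively more expensive later on.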

The process of BWR in-core fuel management optimization is ideally suited to the use of stochastic algorithms, such as simulated annealing and genetic algorithms. These algorithms offer the ability to optimize over large combinatorial search spaces, they do not require derivative information, and they can be conceptually simple to implement. The downside, however, is that they often require a long and indeterminate amount of time to converge on the optimum set of solutions.

The simulated annealing algorithm was originally proposed by Metropolis et al. (1953) as a way to generate equations of state for materials under high pressures and temperatures. It uses a mathematical analogue to the physical annealing process to compute the minimum value of an objective function. It does so by repeatedly perturbing the configuration of the system being examined and then accepting or rejecting the perturbed solution such that the distribution of the objective function value of accepted solutions is described by a Boltzmann distribution (Eqs. (1) and (2), below):

P(E) = \frac{1}{Z(T)} e^{-E / k_B T}    (1)

where

Z(T) = \sum_{i \in S} e^{-E_i / k_B T},    (2)

S denotes the set of all possible configurations, and k_B is the Boltzmann constant.

This distribution is obtained by using the Metropolis Criterion (Eq. (3)) to decide the probability with which a given solution should be accepted. Here \Delta E is the difference in objective function value between the proposed configuration and an initial reference configuration.

p = e^{-\Delta E / T}    (3)

Once an equilibrium distribution has been obtained for a given temperature parameter, T, the temperature is reduced and the process repeats. This cycle of so-called cooling steps is repeated until the solution distribution converges to a minimum objective value. The rate at which T is reduced, known as a cooling schedule, is a key determiner of program performance. If it is reduced too quickly, the solution may quench, i.e. become trapped in a local minimum, while if it is reduced too slowly, computational efficiency suffers. The FORMOSA-B code utilizes an adaptive temperature decrement scheme to accelerate the cooling schedule while preventing quenching.
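The serial algorithm can be summarized in a few lines. The sketch below is illustrative Python rather than FORMOSA-B source: the perturb and evaluate callables, the parameter names, and the simple geometric cooling factor (standing in for FORMOSA-B's adaptive temperature decrement scheme) are all assumptions made for the example.

```python
import math
import random

def simulated_annealing(initial, perturb, evaluate, t0=1.0, cooling=0.9,
                        samples_per_step=800, n_steps=20):
    """Schematic serial simulated annealing loop (illustrative sketch).

    perturb(x)  -- returns a randomly modified copy of configuration x
    evaluate(x) -- returns the (augmented) objective value of x
    The geometric factor `cooling` stands in for FORMOSA-B's adaptive
    temperature decrement scheme, which is not reproduced here.
    """
    current, current_val = initial, evaluate(initial)
    best, best_val = current, current_val
    temperature = t0
    for _ in range(n_steps):
        for _ in range(samples_per_step):
            candidate = perturb(current)
            candidate_val = evaluate(candidate)
            delta = candidate_val - current_val
            # Metropolis criterion: always accept improvements, accept
            # degradations with probability exp(-delta / T).
            if delta <= 0.0 or random.random() < math.exp(-delta / temperature):
                current, current_val = candidate, candidate_val
                if current_val < best_val:
                    best, best_val = current, current_val
        temperature *= cooling  # cooling step
    return best, best_val
```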

The BWR in-core fuel management problem, when including all possible options, has a search space approaching 10^100 possible solutions. Although the set of feasible solutions is only a small fraction of that number, a large number of samples (on the order of 10^4 to 10^5) must still be examined if the SA algorithm is to have a reasonable probability of finding optimum solutions. With a highly optimized core simulator code and a modern computer workstation, this optimization problem is tractable, but lengthy. In previously published results using the FORMOSA-B code (Karve and Turinsky, 1999), runtimes of over 130 h were required for optimization runs of 4000 samples, equating to about 2 min per sample, on a now decade-old high-performance workstation. While the computational power available in a standard workstation has greatly increased in the ten years since these results were obtained, even greater performance can be obtained through the use of parallel computation.

Over the past decade the growth of Internet commerce and enterprise computing has driven the development of scalable, low-cost, high-power server networks. As a side benefit, these technologies provide scientists and engineers with an affordable alternative to the costly mainframe supercomputers previously required for large-scale computational tasks. By coupling these commodity-level servers together in a coordinated network, one can achieve very large computational throughput with only modest investments in time and money. Coincident with this development has been the development and deployment of low-cost multi-core processors.

The availability of powerful, low-cost hardware would be of little benefit without standardized, portable parallel communication software packages to enable the computers to work together. One such software standard is the Message Passing Interface (MPI) standard (MPI Forum, 2008). The MPI standard was developed in the early 1990s to provide a standard set of interface routines and procedures for programming parallel applications in the C and FORTRAN languages. By following these standardized interface specifications, application developers and systems vendors can be assured of program compatibility. Thus the effort expended on software development would not be lost when moving from one computer to another. The MPI standard provides several sets of subroutines that allow the programmer to control the parallel environment and to send data back and forth between the individual processes that make up the parallel environment.
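FORMOSA-B itself uses the FORTRAN MPI bindings; the fragment below uses the Python mpi4py bindings purely to illustrate the kinds of collective operations the standard provides. The variable names and values are placeholders, not anything from the code described in this paper.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process contributes a locally computed tally; a collective reduction
# returns the combined total to every process.
local_samples = 100 + rank                         # placeholder per-process value
total_samples = comm.allreduce(local_samples, op=MPI.SUM)

# A configuration chosen on one process can be broadcast to all of the others.
chosen = {"pattern_id": 0} if rank == 0 else None  # placeholder payload
chosen = comm.bcast(chosen, root=0)
```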

The key to utilizing these computational clusters effectively is the ability to separate a given algorithm into distinct tasks that require a minimum amount of communication and coordination. Simulated annealing, though originally devised as a serial algorithm, can be adapted to such a parallel computation model. It possesses the very desirable attribute of a high computation-to-communication burden ratio for the in-core fuel management application. The following section describes the parallel simulated annealing algorithm that has been incorporated into the FORMOSA-B code and the results of preliminary testing.

2. Computational algorithm

In the simulated annealing algorithm, the number of histories to be sampled during each cooling step is determined by two factors: 1) the sample size must be sufficiently large that the objective function may reach an equilibrium distribution, and 2) a sufficient number of permutations of the configuration must be performed so that any feasible areas in the configuration space may be reached. For the parallel simulated annealing algorithm, a method similar to the Synchronous Multiple Markov-Chain Parallel Simulated Annealing (MMCPSA) of Lee and Lee (1996) is utilized. In this method, each process in a parallel computation generates, evaluates and accepts or rejects a separate Markov chain of solutions at a given annealing temperature. All processes continue sampling new loading patterns at the annealing temperature until all processes in the parallel environment have either a) accepted at least ltran solutions each or b) sampled at least lchain solutions each (where ltran and lchain are user-defined constants). Once these conditions are met on each process, an annealing temperature update is initiated. During this update step, the statistics of the accepted histories are combined across all processes. The combined cooling statistics are then used to determine the new annealing temperature and to update the constraint violation penalty multipliers. Next, a solution from amongst the various processes is chosen as the starting point for the next Markov chain, and the optimization continues until the next update step (see Fig. 1, below). This process continues until more than a specified total number of samples, llngth, have been evaluated by the end of an annealing temperature update step, or until the fraction of accepted histories drops below 5% over a Markov chain. Two different methods are available for choosing the starting solution for each subsequent Markov chain; in the sync mode of operation, the solution with the lowest objective function value among all processes is chosen. In the binary branching mode, the best solution from each process is randomly paired against those on the other processes, with a survivor from each pairing chosen using the Metropolis criterion. This process of pairing and selection is continued until there is only one survivor, which becomes the current solution for the next parallel Markov chains. This method is used to introduce more diversity into the solution sequence, thereby allowing the algorithm to cover more of the configuration space and avoid trapping in local minima.
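The following skeleton sketches one cooling step of such a synchronous multiple-Markov-chain scheme in the sync mode. It is illustrative Python using mpi4py, not the FORMOSA-B implementation; the helper callables (perturb, evaluate) and the bookkeeping details are assumptions, and only the parameter names ltran and lchain are taken from the text.

```python
from mpi4py import MPI
import math
import random

def mmcpsa_cooling_step(comm, current, current_val, perturb, evaluate,
                        temperature, ltran, lchain):
    """One synchronous cooling step of a multiple-Markov-chain PSA (sketch).

    Each rank extends its own Markov chain until it has accepted ltran
    solutions or sampled lchain solutions; the ranks then synchronize,
    pool their acceptance statistics, and (in the sync mode) all restart
    from the best accepted solution found on any process.
    """
    best, best_val = current, current_val
    accepted_vals = []
    sampled = 0
    while len(accepted_vals) < ltran and sampled < lchain:
        candidate = perturb(current)
        candidate_val = evaluate(candidate)
        sampled += 1
        delta = candidate_val - current_val
        # Metropolis acceptance test at the current annealing temperature.
        if delta <= 0.0 or random.random() < math.exp(-delta / temperature):
            current, current_val = candidate, candidate_val
            accepted_vals.append(current_val)
            if current_val < best_val:
                best, best_val = current, current_val

    # Synchronize: pool the cooling statistics across all processes.
    all_accepted = [v for vals in comm.allgather(accepted_vals) for v in vals]
    total_sampled = comm.allreduce(sampled, op=MPI.SUM)

    # Sync restart mode: every rank continues from the globally best solution.
    global_best_val, best_rank = min(comm.allgather((best_val, comm.Get_rank())))
    restart = comm.bcast(best, root=best_rank)
    return restart, global_best_val, all_accepted, total_sampled
```

In the binary branching mode, the restart solution would instead be chosen by repeatedly pairing the per-process best solutions and keeping a survivor of each pair via the Metropolis criterion, as described above.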

One important factor affecting the thermal margins and required coolant flow rates within the core is the sequencing pattern through which the control blades are withdrawn (the Control Rod Program, or CRP). As the fuel loading pattern changes during the optimization run, the appropriate CRP also changes.


Fig. 1. Synchronous MMCPSA program flow.


While the FORMOSA-B program utilizes heuristic rules to compute the CRP for a given loading pattern, this process is computationally expensive (often requiring a factor of 8–10 longer to evaluate than a simple loading pattern perturbation). Whether the updated CRP is accepted over the current CRP is also determined by the Metropolis Criterion. However, a CRP chosen by the heuristic rules will usually work quite well for many nearby loading pattern perturbations. Therefore, for maximum computational efficiency, the control rod program is only updated during a cooling step after a specified number (ilimx) of loading patterns have been evaluated for each parallel Markov chain, and immediately before a cooling update step if a certain number (the CRP Update Threshold) of loading patterns have been evaluated across all processes during the previous cooling step.
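The update rule just described can be expressed as a small predicate. The sketch below is illustrative Python, not FORMOSA-B source; only the name ilimx and the CRP Update Threshold come from the text, and the argument names are assumptions.

```python
def should_update_crp(lp_evals_this_chain, ilimx,
                      before_cooling_update=False,
                      lp_evals_all_processes=0, crp_update_threshold=0):
    """Decide whether to rerun the (expensive) control rod program heuristics.

    Sketch of the scheduling rule described in the text:
      * within a Markov chain, update after every `ilimx` loading-pattern
        evaluations on that process;
      * immediately before a cooling (temperature) update, update if at least
        `crp_update_threshold` loading patterns were evaluated across all
        processes during the previous cooling step.
    """
    if before_cooling_update:
        return lp_evals_all_processes >= crp_update_threshold
    return lp_evals_this_chain > 0 and lp_evals_this_chain % ilimx == 0
```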

3. Parallel benchmarks and test cases

In order to test the performance of the parallel simulated annealing algorithm, a number of optimization runs have been performed using the Henry2 Linux blade cluster at North Carolina State University. This cluster consists of a mix of single-, dual- and quad-core processors of varying speeds. Due to the stochastic nature of the simulated annealing algorithm and the first-come, first-serve method by which processors are assigned within the cluster, it is to be expected that optimization results will vary from run to run. To get a measure of this variability, a number of runs must be performed for each test case.


Table 1. Optimization parameters.

            llngth    lchain   ltran   ilimx
Serial      16,000    800      320     100
Parallel    24,000    800*     400*    250


3.1. Sample problem

To test the PSA algorithm, optimization calculations were performed for a 3-region reload core of a small GE BWR/4 reactor. The fuel load consists of 368 fuel assemblies of two mechanical designs arranged with quarter-core reflective symmetry. The requested cycle burnup is 12,336 MWD/MTU at a power level of 1911.6 MW. The objective of the optimization was to minimize the end-of-cycle core flow rate (i.e. maximize cycle energy production).

To provide a benchmark for comparison, three serial test runs were performed using similar settings to those used for the parallel test cases. Specific optimization parameters are listed below in Table 1.

Note that the lchain and ltran values used for the parallel runs (indicated by the asterisk) are divided equally among the four processors utilized for the runs. Thus each process will sample 200 histories or accept 100 histories before signaling for an annealing temperature update.

For this sample problem, there were five constraints that actively influenced the optimization search (other constraints were enabled, but not so limiting as to exhibit violations). The numerical values of these constraints are computed by taking a cycle burnup-weighted (and, if appropriate, volume-weighted) root-mean-square average of the value by which a given core property exceeds a specified threshold value. The Maximum Fraction of Limiting Power Density (MFLPD) is determined by the ratio of peak pin power to the limiting pin power (threshold value of 0.94). The Maximum Average Planar Linear Heat Generation Rate ratio (MAPRAT) is similar to MFLPD, but determined by nodal power ratios (threshold value of 0.94). The Maximum Fraction of Limiting Critical Power Ratio (MFLCPR) is computed from the bundle-average critical power ratio (this relates to dry-out and boiling transition failure, and has a threshold value of 0.96). The maximum constraint values reflect conservatism in view of modeling uncertainties and are similar to those used in previous studies (Moore et al., 1998). The constraint on Cold Shutdown Margin (CSDM) is the burnup-weighted, root-mean-square (RMS) average of the amount by which the sub-criticality of the cold core violates the minimum specified cold shutdown margin for each of the tested stuck-rod configurations (minimum value of 0.025 Δk/k).

Table 2. Optimization run results.

CRP Threshold   # of Runs   Avg. # of Samples   Average Runtime [hr]   EOC % Flow Change (Avg.)   (Std. Dev.)   Speedup   Efficiency
Serial          3           16391               44.8                   -10.60%                    0.74%         –         –
No Updates      5           24349               17.6                   -9.21%                     3.12%         3.43      85.7%
1000            5           24518               21.2                   -10.18%                    0.18%         3.21      80.2%
700             5           24474               18.2                   -10.35%                    0.40%         3.22      80.6%
400             5           24561               21.9                   -8.57%                     4.23%         3.21      80.3%
200             5           24451               20.9                   -10.33%                    0.61%         3.20      80.0%
140             4           25155               22.5                   -10.16%                    0.10%         3.22      80.4%
100             4           24571               20.4                   -10.16%                    0.10%         3.21      80.1%
80              4           24598               21.2                   -10.39%                    0.39%         3.23      80.7%
60              5           24341               21.8                   -10.09%                    0.09%         3.21      80.4%
40              3           24444               21.8                   -10.13%                    0.21%         3.21      80.4%
20              4           24397               21.1                   -10.57%                    0.64%         3.23      80.8%

The Excess Reactivity Upper Limit (MAXHX) constraint is given by the burnup-weighted, RMS average amount by which the core reactivity exceeds a specified upper threshold at 100% rated flow with all control rods out (1820 pcm at BOC, 3000 pcm otherwise). The final active constraint is the Criticality Constraint (CRIT FLOW). This model uses the coolant flow rate to control core criticality; therefore this constraint takes the form of a burnup-weighted RMS average of the amount by which the coolant flow rate falls outside a specified window (95–110% rated flow for the final two burnup steps, 95–101% otherwise).
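As an illustration of the constraint metric just described, the sketch below computes a burnup-weighted RMS exceedance for one core property over the cycle. It is illustrative Python, not FORMOSA-B source, and the exact weighting details are assumptions.

```python
import math

def weighted_rms_violation(values, limits, weights):
    """Burnup-weighted RMS of the amount by which a property exceeds its limit.

    values  -- property value at each burnup step (e.g., MFLPD)
    limits  -- threshold value at each burnup step (e.g., 0.94)
    weights -- burnup (or burnup-and-volume) weights for each step
    Illustrative of the constraint metric described above; the weighting
    actually used in FORMOSA-B is not reproduced here.
    """
    excess_sq = [w * max(v - lim, 0.0) ** 2
                 for v, lim, w in zip(values, limits, weights)]
    return math.sqrt(sum(excess_sq) / sum(weights))
```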

3.2. Computational benchmarks

Two important factors must be considered when evaluating program results. First, does it produce optimized solutions that are equal to or better than those produced by the original serial code? Secondly, are the runtime reductions commensurate with the additional computational resources required? Because the number of possible loading patterns is much greater than the number that may feasibly be evaluated during an optimization run, it is highly unlikely that separate runs will each produce the same best loading pattern. Although the best loading pattern itself cannot be directly compared between optimization runs, one can still compare the objective function and constraint violation values for the best solution across multiple runs.

Unlike many numerical computation schemes, where one can compute residuals and predict convergence rates, there is no way to forecast the remaining benefit to be gained by extending the simulated annealing search by a given number of samples. Furthermore, because it is a stochastic search, there is no guarantee that two runs, differing only by the initial random number seed, will move along similar search paths to similar solutions. This complication makes it difficult to compare the efficiency of the parallel algorithm on the basis of the generated solutions. Instead, the total number of histories sampled was chosen as a baseline in order to generate a meaningful comparison between the speed of the parallel and the serial SA algorithms. Parallel speedup and parallel efficiency are the main measures of algorithm computational performance. For these tests, parallel speedup is defined to be the ratio of the runtime-per-sampled-history in the serial algorithm to the runtime-per-sampled-history in the parallel algorithm. The parallel efficiency is then computed by normalizing the parallel speedup by the number of processors required to achieve those results. Therefore a speedup of 3 on 4 processors would be considered 75% efficient. Ideally the algorithm would be 100% efficient; however, several factors, such as communications overhead, I/O wait time, process imbalance and inherent differences in the algorithm, can cause this value to be somewhat lower.
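These two measures follow directly from the definitions above; the short Python sketch below simply restates them (function and argument names are illustrative).

```python
def parallel_speedup_and_efficiency(serial_time_per_sample,
                                    parallel_time_per_sample,
                                    n_processors):
    """Speedup and efficiency as defined in the text.

    Speedup is the ratio of runtime-per-sampled-history of the serial
    algorithm to that of the parallel algorithm; efficiency normalizes
    the speedup by the number of processors used.
    """
    speedup = serial_time_per_sample / parallel_time_per_sample
    efficiency = speedup / n_processors
    return speedup, efficiency

# Example from the text: a speedup of 3 on 4 processors is 75% efficient.
# parallel_speedup_and_efficiency(1.0, 1.0 / 3.0, 4) -> (3.0, 0.75)
```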

3.3. Testing environment

The parallel simulated annealing algorithm in FORMOSA-B was developed and tested on the Henry2 Linux Blade Cluster at the High Performance Computing Center at North Carolina State University.



Table 3. Best solution constraint violation averages (constraint violation values x 1,000,000).

CRP Threshold   MFLPD    Std. Dev.   MAPRAT   Std. Dev.   MFLCPR   Std. Dev.
Reference       0.00     –           224.93   –           0.00     –
Serial          83.56    27.75       3.45     5.98        0.00     0.00
No Updates      301.21   178.04      77.41    72.60       820.87   627.07
1000            129.65   156.45      11.25    13.61       34.15    59.24
700             119.78   159.46      12.91    18.30       148.22   293.40
400             174.14   77.79       19.20    42.92       83.11    185.85
200             188.63   116.09      22.00    49.19       454.28   762.19
140             159.41   191.17      56.53    113.07      417.57   497.16
100             122.22   83.64       17.06    23.34       161.11   322.22
80              151.11   127.38      31.50    63.01       47.76    59.87
60              106.27   121.02      0.24     0.55        0.00     0.00
40              153.21   164.60      0.00     0.00        0.00     0.00
20              70.76    141.53      4.44     5.38        137.02   272.36

CRP Threshold   CSDM     Std. Dev.   MAXHX    Std. Dev.   CRIT FLOW   Std. Dev.
Reference       0.00     –           0.00     –           0.34        –
Serial          37.72    65.34       0.00     0.00        5114.26     5783.27
No Updates      146.24   236.34      52.66    117.75      21297.52    4326.00
1000            426.70   889.00      19.71    44.07       2253.01     476.36
700             213.78   314.65      0.00     0.00        5516.08     4840.76
400             118.68   174.31      160.98   324.52      6850.61     3359.02
200             357.71   724.19      109.09   236.20      8682.35     7209.18
140             621.98   1234.86     0.00     0.00        12164.24    12566.13
100             30.18    60.37       265.80   531.61      3738.07     3077.43
80              44.78    89.57       0.00     0.00        6980.28     7055.19
60              122.90   274.82      0.00     0.00        3586.27     2254.69
40              0.00     0.00        0.00     0.00        2357.21     664.53
20              0.66     1.32        0.00     0.00        4615.66     3225.62

Fig. 2. Augmented Objective Function Value for Accepted Solutions.

Fig. 3. Parallel Efficiency for sync and binbranch algorithms.


This cluster consists of over 600 dual-processor compute nodes of various ages and processor types. The processors are a mixture of single-, dual- and quad-core Intel Xeon processors, with processor speeds ranging from 2 to 3 GHz and 2 GB of RAM available for each core. Load management software is responsible for dispatching and managing parallel jobs. It will assign one active process per processor based on a pre-established queue priority to provide the maximum utilization of the available resources. The cluster uses the Linux operating system and features the Intel FORTRAN Compiler and MPI implementation.

The Henry2 cluster is constantly growing and changing as newer, faster nodes are added to the system. Therefore any benchmark of algorithm performance cannot be based purely on the measured computation times (which vary by as much as a factor of two, depending on the nodes to which the job is assigned). In order to correct for these variations, the parallel speedup and efficiency calculations for each run are normalized by the ratio of the mean time required to evaluate one loading pattern during the given run to the mean time required across all parallel runs. It should be noted, however, that this correction does not account for inefficiencies that may be caused when a parallel job is divided between both fast and slow processors. In those cases, the faster processors will sample a significantly higher number of histories while waiting for the slow processors, leading to a cooling cycle with fewer, but longer, cooling steps.
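A minimal sketch of the stated correction follows. It is illustrative Python, not FORMOSA-B post-processing, and the direction in which the ratio is applied is an interpretation of the text.

```python
def normalized_speedup(raw_speedup, mean_lp_time_this_run, mean_lp_time_all_runs):
    """Scale a measured speedup by the run's relative node speed.

    A run that landed on slower-than-average nodes has an inflated time per
    loading-pattern evaluation; scaling by (this run's mean LP evaluation time /
    mean LP evaluation time across all parallel runs) compensates for that
    hardware variation. This is an interpretation of the normalization
    described in the text, not the exact procedure used for the paper.
    """
    return raw_speedup * (mean_lp_time_this_run / mean_lp_time_all_runs)
```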

4. Numerical results

The synchronous parallel simulated annealing algorithm was tested using a total of 49 individual runs. For five of the runs, CRP updates were disabled, while for the other 44 the CRPs were updated once for every 250 loading pattern shuffles within a Markov chain segment and once before each annealing temperature update if a threshold number of loading patterns had been evaluated across all processes during the previous cooling step. For comparison, as noted earlier, three trials were run using the original serial FORMOSA-B code. For these serial runs, a nominal run of 16,000 samples was requested, with CRP updates performed once for every 100 loading pattern shuffles. The aggregated results for all runs are presented below in Tables 2 and 3. Fig. 2 shows how the objective function value varies from solution to solution while trending downwards as the optimization progresses. The first 400 accepted histories in Fig. 2 occur during the temperature initialization phase of the optimization. During this phase, all histories are accepted, and constraint violation multiplier values are kept at low initial levels. As the optimization progresses, the rising constraint multiplier levels combine with the remaining constraint violation values to increase the augmented objective function value above the initial unaugmented minimum level.

The levels of end-of-cycle flow reduction and remaining constraint violations found are noted to be similar for the serial and parallel test cases if the CRP threshold value is selected appropriately. This indicates that the parallel algorithm is able to match the serial algorithm in terms of solution quality.

As expected, the runs in which CRP updates were disabled ran much more quickly than those runs in which CRP updates were allowed.


Table A.1. EOC flow and constraint violations by run (best solution constraint violation values x 1,000,000).

CRP Threshold   EOC Flow Change   MFLPD    MAPRAT   MFLCPR   CSDM     MAXHX    CRIT FLOW
Serial          -11.5%            72.8     10.4     0.0      0.0      0.0      2166.5
                -10.2%            115.1    0.0      0.0      113.2    0.0      11777.5
                -10.1%            62.8     0.0      0.0      0.0      0.0      1398.8
No updates      -11.1%            304.8    61.2     1396.8   0.0      0.0      19054.2
                -10.1%            139.4    7.9      662.1    0.0      0.0      15277.8
                -10.1%            146.0    83.4     1502.7   0.0      0.0      25614.1
                -11.0%            340.8    197.2    542.8    188.1    0.0      25163.8
                -3.7%             575.0    37.3     0.0      543.1    263.3    21377.7
1000            -10.1%            41.1     0.0      0.0      0.0      0.0      2960.9
                -10.0%            0.0      0.0      0.0      119.2    0.0      2435.0
                -10.3%            50.9     4.2      34.0     0.0      0.0      2220.8
                -10.1%            385.3    25.0     136.8    2014.3   98.5     1872.0
                -10.4%            171.0    27.0     0.0      0.0      0.0      1776.4
700             -10.1%            0.0      0.0      0.0      0.0      0.0      1219.2
                -10.0%            0.0      0.0      0.0      0.0      0.0      1365.5
                -10.9%            53.5     4.3      0.0      0.0      0.0      3650.1
                -10.7%            168.1    17.0     70.9     371.3    0.0      9843.5
                -10.1%            377.2    43.2     670.2    697.6    0.0      11502.1
400             -10.7%            75.2     0.0      0.0      0.0      0.0      1942.0
                -1.0%             278.4    96.0     0.0      415.3    739.3    8208.0
                -10.2%            140.6    0.0      0.0      131.8    0.0      6814.5
                -10.6%            220.1    0.0      415.6    46.3     65.6     11170.0
                -10.4%            156.4    0.0      0.0      0.0      0.0      6118.6
200             -10.1%            0.0      0.0      0.0      0.0      0.0      1259.8
                -10.0%            155.9    0.0      0.0      0.0      14.0     3348.7
                -11.4%            259.8    110.0    1758.6   1648.6   531.5    17095.6
                -10.0%            285.0    0.0      512.9    139.9    0.0      15551.4
                -10.0%            242.4    0.0      0.0      0.0      0.0      6156.3
140             -10.0%            390.9    226.1    983.6    2474.2   0.0      22095.5
                -10.3%            0.0      0.0      0.0      0.0      0.0      1476.5
                -10.2%            241.7    0.0      686.7    0.0      0.0      23956.9
                -10.1%            5.0      0.0      0.0      13.7     0.0      1128.0
100             -10.2%            180.1    14.2     0.0      120.7    0.0      4549.3
                -10.1%            0.0      0.0      0.0      0.0      0.0      1196.1
                -10.2%            172.2    3.2      0.0      0.0      0.0      1457.5
                -10.1%            136.6    50.9     644.4    0.0      1063.2   7749.4
80              -10.1%            182.5    0.0      124.1    179.1    0.0      5477.9
                -10.9%            318.3    126.0    67.0     0.0      0.0      17285.9
                -10.1%            53.6     0.0      0.0      0.0      0.0      3607.1
                -10.4%            50.0     0.0      0.0      0.0      0.0      1550.2
60              -10.0%            50.0     0.0      0.0      0.0      0.0      3009.0
                -10.1%            0.0      0.0      0.0      614.5    0.0      2948.0
                -10.0%            219.5    0.0      0.0      0.0      0.0      7447.4
                -10.3%            8.2      0.0      0.0      0.0      0.0      1493.5
                -10.1%            253.5    1.2      0.0      0.0      0.0      3033.4
40              -10.0%            132.4    0.0      0.0      0.0      0.0      2390.9
                -10.0%            0.0      0.0      0.0      0.0      0.0      1676.5
                -10.4%            327.2    0.0      0.0      0.0      0.0      3004.3
20              -10.4%            0.0      6.9      0.0      0.0      0.0      2301.1
                -11.5%            0.0      0.0      2.5      2.6      0.0      8909.2
                -10.3%            0.0      10.9     0.0      0.0      0.0      1973.4
                -10.0%            283.1    0.0      545.6    0.0      0.0      5279.0


However, the inability to update the CRP resulted in much poorer solutions, both in terms of the final minimization of end-of-cycle flow and in terms of remaining constraint violations. Table 3 lists the constraint violations in the reference pattern and the average of the constraint violations that remained in the best solution for the runs at each CRP threshold setting. The heuristically determined CRPs used the same blade groups as the initial reference CRP; however, the number and depth of insertions varied as the loading patterns changed. To interpret the magnitude of the constraint violations, a violation value of 100 x 10^-6 corresponds to the following weighted root-mean-square violation of the limit: 0.0001 fraction for MFLPD, MAPRAT, and MFLCPR; 10 pcm for CSDM and MAXHX; and 0.0001% for CRIT FLOW. With this interpretation, Table 3 indicates that all constraint violations are negligible.

The stochastic nature of the algorithm is evident in the run-to-run variability in the constraint violations (seen in the standard deviations in Table 3, or the individual results in Appendix A, Table A.1). However, beyond noting that the ability to update the control rod program plays a large part in finding optimal solutions, there appears to be no significant correlation between the CRP update threshold and the quality of the solutions given the current optimization settings. This finding is not unexpected, as there are expected to be 400 or more histories sampled across all processors in a given Markov chain, so there should be no difference between runs where the CRP threshold is set below this value.

The parallel efficiency of these methods provides an important measure of their ultimate scalability. Because the SA algorithm is inherently serial in nature, the extent of the parallel speedup attainable is somewhat limited. Fig. 3 shows that the calculated parallel efficiencies drop below 80% when more than 8 processors are used, and fall below 70% for more than 40 processors. Three sets of runs are depicted in Fig. 3: one using the synchronous best solution algorithm, one using the binary branching algorithm (both with 48,000 total histories sampled), and one synchronous case with 16,000 samples on 4 processors (used as a benchmark against previously run cases).

5. Conclusions

Test results so far indicate that the synchronous parallel simulated annealing algorithm is capable of finding suitably optimized loading patterns for the given BWR reload scenario. Further, reasonably high parallel efficiencies of 80% were obtained in most cases. There remain, however, a number of further tests yet to be performed in order to fully characterize this algorithm. Additional testing is required to investigate the effects of varying the lchain, ltran, and llngth cooling parameters. Tests reported thus far also do nothing to indicate the scalability of the algorithm to larger numbers of processors. It is expected that efficiency will degrade as the Markov chain segments on each process become progressively shorter. This may provide the ultimate limitation to the amount of runtime reduction that may be obtained with this algorithm. Finally, this algorithm should be tested on a larger BWR system to better inform the way in which the size of the problem affects the scaling and performance of the algorithm.

Moving further beyond the synchronous algorithm presented here, there are several other variations on multiple-Markov-chain PSA that warrant investigation. First among these are Binary Branching and Mixing-of-States type algorithms (Kropaczek, 2008), where the starting configuration after each annealing temperature update is determined stochastically. One perceived benefit of these methods is that they introduce further variability in the search process, leading to a better coverage of the search space and faster convergence to optimum solutions with a lower likelihood of quenching. Finally, there are several asynchronous parallel simulated annealing algorithms that offer the promise of reduced communications overhead and accelerated cooling cycles by allowing the individual processes to control their annealing temperature update schedules.

Appendix A

The end-of-cycle flow change and best-solution constraint violation values for each individual run are given in Table A.1, shown earlier in the text.

References

Karve, A.A., Turinsky, P.J., 1999. FORMOSA-B: a boiling water reactor in-core fuel management optimization package II. Nuclear Technology 131 (1), 48–68.

Kropaczek, D.J., 2008. Concept for multi-cycle nuclear fuel optimization based on parallel simulated annealing with mixing of states. In: International Conference on the Physics of Reactors.



Lee, S.Y., Lee, K.G., 1996. Synchronous and asynchronous parallel simulated annealing with multiple Markov chains. IEEE Transactions on Parallel and Distributed Systems 7 (10), 993–1008.

Message Passing Interface Forum, 2008. MPI: A Message-Passing Interface Standard, Version 1.3. University of Tennessee, Knoxville.

Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21 (6), 1087–1092.

Moore, B.R., Turinsky, P.J., Karve, A.A., 1998. FORMOSA-B: a boiling water reactor in-core fuel management optimization package. Nuclear Technology 126 (2), 153–169.