Solving Large Knapsack Problems with a Genetic Algorithm

Dr. Richard Spillman
Department of Computer Science
Pacific Lutheran University
Tacoma, WA 98447
SPILLMRJ@PLU.edu

ABSTRACT

This paper develops a new approach to finding solutions to the subset-sum problem. The subset-sum problem is an important NP-complete problem in computer science which has applications in operations research, cryptography, and bin packing. A genetic algorithm is developed which easily solves this problem. The genetic algorithm begins with a randomly generated population of solutions and breeds a new population using the best elements of the previous population. Each generation of solutions produces better solutions to the subset-sum problem than the previous generation. It is shown that this approach will efficiently produce solutions to large (10,000 elements or more) subset-sum problems. Various parameters of the algorithm are varied in order to improve its performance.

1.0 Introduction

Given n positive integers w_1, ..., w_n and a positive integer W, the subset-sum problem (SSP) asks for the subset of the w_i whose sum is W, or as close as possible to W without exceeding it. Formally, the SSP seeks to

maximize   \sum_{i=1}^{n} w_i x_i
subject to \sum_{i=1}^{n} w_i x_i \le W

where each x_i is 0 or 1. This problem is known to be NP-complete [1] and hence, in its most general form, it is a difficult problem to solve. Yet it is also a problem which offers many practical applications in computer science, operations research, and management science. It also serves as the basis for several public key cryptosystems. Because of the importance of the potential applications and the general difficulty of solving large SSPs, several algorithms have been developed which solve the problem directly, and several

fast algorithms have been produced which provide good approximate solutions. For example, Ibarra and Kim [2] developed a fully polynomial approximation scheme for the SSP in 1975. It was improved upon by Lawler [3] and later by Martello and Toth [4]. Fischetti [5] confirmed the performance of Martello and Toth's scheme, although he found that the worst-case performance was not quite as good as suggested in [4]. Martello and Toth reported very good results for several approximation schemes in their survey and experimental analysis [6].

In 1985, Lagarias and Odlyzko [7] reported the development of an algorithm for the solution of low-density knapsacks. Their algorithm could solve low-density knapsacks of size 50 using about 14 minutes of CRAY-1 time. Coster et al. [8] reported an improvement of the Lagarias and Odlyzko algorithm which both speeded it up and found more solutions. However, their experimental analysis only considered knapsacks up to size 66. Both of these sizes are, of course, considerably less than the 10,000-element knapsack generated by DES. Balas and Zemel [9], on the other hand, produced an algorithm for large knapsacks which performs quite well. However, as they noted in their report, their results apply only to knapsacks with bounded coefficients. When the coefficients are allowed to grow with the size of the problem (as seems to be the case with the knapsack embedding of S-boxes), they concluded that the solution time grows exponentially with knapsack size. Martello and Toth [10] have also suggested a solution to the knapsack problem which works well on large knapsacks. They also noted a limitation to their approach: their best results for large knapsacks occurred when a large number of optimal solutions to the knapsack exist, so that their branch and bound procedure will terminate early.

The Martello and Toth algorithm which performed the best in their survey is based on the use of a greedy algorithm for solution of the SSP. This paper presents an entirely different approach to the SSP, one based on the use of a genetic algorithm which provides a directed random search of the SSP solution space. It turns out that the algorithm is quite simple to implement, and the run times are short for even large problems.
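Before turning to the genetic algorithm, a small sketch makes the scale of the difficulty concrete. The exhaustive solver below (illustrative only; the weights, target, and function name are invented) enumerates all 2^n subsets, which is exactly the exponential cost that the approximation schemes above and the genetic algorithm developed here are meant to avoid:

```python
from itertools import product

def best_subset(weights, target):
    """Exhaustive SSP solver: try every 0/1 assignment and keep the best
    feasible sum (at most the target). Only practical for very small n."""
    best_x, best_sum = None, -1
    for x in product((0, 1), repeat=len(weights)):
        s = sum(w for w, bit in zip(weights, x) if bit)
        if best_sum < s <= target:
            best_x, best_sum = x, s
    return best_x, best_sum

# Invented toy instance: the subset {7, 3, 30} hits the target exactly.
print(best_subset([12, 7, 19, 3, 30], 40))   # -> ((0, 1, 0, 1, 1), 40)
```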


2.0 Genetic Algorithms

Genetic algorithms (GAs) were first suggested by John Holland in the early seventies [11]. Over the last 20 years they have been used to solve a wide range of search, optimization, and machine learning problems. As the name indicates, these algorithms attempt to solve problems by modeling the way in which human genetic processes seem to operate (at least at a simple level). A good survey of the nature and use of genetic algorithms can be found in the book by Goldberg [12]. Holland's idea was to construct a search algorithm modeled on the concepts of natural selection in the biological sciences. The result is a directed random search procedure. The process begins by constructing a random population of possible solutions. This population is used to create a new generation of possible solutions, which is then used to create another generation of solutions, and so on. The best elements of the current generation are used to create the next generation. It is hoped that the new generation will contain "better" solutions than the previous generation. Remarkably, this turns out to be the case in many applications.

2.1 A Genetic Algorithm for the SSP

Many applications of genetic algorithms have been suggested, including the development of genetic algorithms to attack NP-complete problems. Yet most of that effort has been directed to the Traveling Salesman Problem (TSP) [13,14,15]. The TSP has posed several problems for genetic algorithms, most of which fall in the area of representation. On the other hand, little consideration has been given to the class of subset-sum problems, for which the representation issue is easy to solve. Recently, Falkenauer and Delchambre [16] did report some success using a genetic algorithm to solve the bin packing problem, which is related to the subset-sum problem. Spillman has also suggested using genetic algorithms to solve the knapsack cipher [17].

When constructing a genetic algorithm for a specific problem area, there are three systems which need to be defined. The first is the representation scheme. The second is a mating process consistent with the representation. The third is a mutation process. All three systems for the Subset Sum Problem are defined in this section.

2.2 Key Representation

The representation structure for the SSP is easy to generate because the problem naturally suggests a scheme. The binary bit pattern which represents the summation terms is perhaps the best structure. Here a typical population for an 8-element SSP would be:


10101010 - add terms 1, 3, 5, and 7
01101100 - add terms 2, 3, 5, and 6
00001001 - add terms 5 and 8
etc.
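A minimal sketch of this representation, assuming chromosomes are stored as lists of 0/1 values (Python is used purely for illustration; the weight values and the helper name are invented):

```python
def decode(chromosome, weights):
    """Return the selected term indices (1-based, as in the example above) and their sum."""
    terms = [i + 1 for i, bit in enumerate(chromosome) if bit]
    total = sum(weights[i - 1] for i in terms)
    return terms, total

# Invented 8-element weight set for illustration.
weights8 = [5, 9, 2, 14, 8, 3, 11, 6]
terms, total = decode([1, 0, 1, 0, 1, 0, 1, 0], weights8)
# terms == [1, 3, 5, 7], total == 5 + 2 + 8 + 11 == 26
```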

Once a representation scheme is selected for a genetic algorithm, it is also necessary to supply an evaluation function. This function is used to determine the "best" representations. Again, for the SSP the basic structure of the evaluation function is easy to determine: it should measure how close a given sum of terms is to the target sum. Within this general guideline there is a wide range of possible variations. For this paper, the actual evaluation function should have three other properties. First, the function should range between 0 and 1, with 1 indicating an exact match with the target sum for the knapsack. Requiring the evaluation function to fall within a set range gives a better picture of global performance. Second, chromosomes which produce a sum greater than the target should, in general, have a lower fitness than chromosomes which produce an equivalent sum less than the target. In this way, infeasible solutions (solutions which produce a sum greater than the sought-after target) are penalized, while feasible solutions have a greater chance of being followed by the algorithm. Third, it should be difficult to produce a high fitness value: small differences between the current chromosome and the target sum should be amplified. This is accomplished by the use of the square root in the evaluation function chosen for this research. It should be noted that none of these three conditions are required by the nature of the genetic algorithm. In fact, research in genetic algorithm design has shown that any reasonable choice of an evaluation function will work [18].

The actual evaluation function used in this research effort is determined as follows:

(1) Calculate the maximum difference that could occur between a chromosome and the target sum:

    MaxDiff = max(Target, Full_Sum - Target)

    where Full_Sum is the sum of all the components in the knapsack.

(2) Determine the value of the current chromosome; call it Sum.

(3) If Sum <= Target, then the fitness of the chromosome is given by:

    Fit = 1 - \sqrt[6]{|Sum - Target| / MaxDiff}
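A minimal sketch of this fitness computation follows (Python, for illustration; the paper's program was written in Pascal). It assumes the sixth-root form shown in the displayed formula but leaves the exponent as a parameter, since the surrounding text speaks of a square root, and it applies the same formula when Sum exceeds Target, which is a simplification of the paper's stricter penalty for infeasible solutions:

```python
def fitness(chromosome, weights, target, root=6.0):
    """Fitness in [0, 1]; 1.0 means the chromosome's sum equals the target exactly.

    MaxDiff bounds the largest possible deviation from the target, Sum is the
    value of the chromosome, and the root amplifies small differences near the
    target. The over-target case is treated the same way here, which is a
    simplification of the penalty described in the text.
    """
    full_sum = sum(weights)
    max_diff = max(target, full_sum - target)
    s = sum(w for w, bit in zip(weights, chromosome) if bit)
    return 1.0 - (abs(s - target) / max_diff) ** (1.0 / root)

# Example: with the invented 8-element weights from the previous sketch and
# target 26, the chromosome 10101010 is an exact match.
weights8 = [5, 9, 2, 14, 8, 3, 11, 6]
print(fitness([1, 0, 1, 0, 1, 0, 1, 0], weights8, 26))   # -> 1.0
```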


2.3 The Mating Process

Given a population of chromosomes, each one with a fitness value, the algorithm progresses by randomly selecting two for mating. The selection is weighted in favor of chromosomes with a high fitness value. That is, chromosomes with a high fitness value have a greater chance of being selected to generate children for the next generation. The two parents generate two children using the standard crossover operation.

2.4 The Mutation Process

After the new generation has been determined, the chromosomes are subjected to a low-rate mutation function which involves three different processes in an attempt to vary the genes. Half of the time, bits are randomly mutated. The other half of the time, bits have a low probability of being swapped with their neighbor. The final mutation process is one of inverting the order of a set of bits between two random points. In this inversion process, which also has a small probability of occurring, two random points in the chromosome are selected and the order of the bits between those points is reversed. For example, using ( ) to note the two random points, chromosome A = 011(011011)001 becomes 011(110110)001. These are all low-probability mutation processes, but between them they help to prevent the algorithm from becoming stuck in a local optimum point.

2.5 The Complete Algorithm

These processes are combined to create the complete genetic algorithm:

1. A random population of chromosomes (binary strings of 0's and 1's) is generated.
2. A fitness value for each chromosome in the population is determined.
3. A biased (based on fitness) random selection of two parents is conducted.
4. The crossover operation is applied to the selected parents.
5. The mutation process is applied to the children.
6. The new population is scanned and used to update the "best" chromosome across the generations.

This process will stop after a fixed number of generations or when the best chromosome has a fitness which exceeds the approximation level (one generation of this loop is sketched below).

3.0 Experimental Results

A SUN Pascal program was written for a SPARCstation 1+ which implemented the algorithm described in Section 2.0. The fundamental problem inputs to the program included n, the size of the problem, the set of n weights, and the target sum. The inputs to the program which describe the genetic algorithm included the population size, the maximum number of generations, the probability of mutation, the probability of inversion, the probability of a swap operation, and the approximation level. A typical run was set for a population of 40; a maximum of 250 generations; a probability of mutation of 0.0001; a probability of inversion of 0.001; and a probability of a swap of 0.001.
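As a concrete illustration of steps 3 through 6 together with the mutation processes of Section 2.4, here is a compact sketch of one generation. It is illustrative only: the paper's implementation was a SUN Pascal program, while this sketch is Python, and the helper names, the use of random.choices for the fitness-weighted selection, and the single-point form of the crossover are my assumptions. The mutation, swap, and inversion probabilities are the values quoted in the text.

```python
import random

MUTATION_P = 0.0001   # per-bit flip probability (value quoted in the text)
SWAP_P     = 0.001    # neighbour-swap probability (value quoted in the text)
INVERT_P   = 0.001    # inversion probability (value quoted in the text)

def mutate(child):
    """The three low-probability mutation processes of Section 2.4."""
    n = len(child)
    if random.random() < 0.5:
        # Half of the time: random bit flips.
        child = [b ^ 1 if random.random() < MUTATION_P else b for b in child]
    else:
        # Other half: each bit has a low probability of swapping with its neighbour.
        child = child[:]
        for i in range(n - 1):
            if random.random() < SWAP_P:
                child[i], child[i + 1] = child[i + 1], child[i]
    if random.random() < INVERT_P:
        # Occasionally reverse the bits between two random points.
        i, j = sorted(random.sample(range(n), 2))
        child[i:j] = reversed(child[i:j])
    return child

def crossover(p1, p2):
    """Single-point crossover producing two children (assumed form of the
    'standard crossover operation')."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def next_generation(population, fitness_fn):
    """Steps 3-5: fitness-weighted parent selection, crossover, mutation."""
    # Small floor keeps the weighted selection valid even if every fitness is zero.
    scores = [max(fitness_fn(c), 1e-9) for c in population]
    new_pop = []
    while len(new_pop) < len(population):
        p1, p2 = random.choices(population, weights=scores, k=2)
        for child in crossover(p1, p2):
            new_pop.append(mutate(child))
    return new_pop[:len(population)]
```

A full run would repeat next_generation until the best fitness exceeds the approximation level or the generation cap (250 in the typical runs above) is reached, keeping track of the best chromosome seen so far (step 6).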

In general, the genetic algorithm was able to solve large SSPs to a high degree of accuracy in a short time. This experimental analysis will present the results in several ways. First, the general performance of the algorithm across a range of large SSPs will be presented. Second, the effect of the initial population size on the performance of the algorithm will be considered.

3.1 General Performance

Overall, the algorithm efficiently solved large SSPs in a short time. On more than 1000 runs of the algorithm on problems of 100 to 17,000 variables, it rarely failed to find an acceptable solution. Even in those cases that failed, it found a very good approximate solution but ran into the maximum number of generations (which was set to a low 250 for those runs). Clearly, given more generations, every case would have been solved.

It was found that the genetic algorithm could easily and quickly find a solution that was close to the optimal. However, it was slow to move from a close solution to the optimal. As a result, a local search routine was added to the basic algorithm. Whenever a population element was within 0.95 of the solution, the system would examine the element one bit at a time to determine if complementing that bit would improve the solution (this local improvement step is sketched at the end of this section). This local search method did not add much overhead to the algorithm, yet it greatly improved the time to solution. In fact, the algorithm with the local search routine routinely solved SSPs in the range of 10,000 elements in usually less than 20 minutes on a SUN SPARCstation 1+. Figure 1 is a graph of the fitness of the best element in a population of 40 for a typical run on a 10,000-element SSP. The exact solution was found in 12.2 minutes. This run required only 30 generations, which means that the algorithm examined only 1200 elements of the solution space. Figure 2 shows the results for single runs on three different-sized SSPs. The similarity in the three solutions does not seem to be unusual; in fact, the average time to solution across a set of 50 random SSPs for each of 21 different-sized problems did not vary significantly. These results are shown in Figure 3.

3.2 Population Size Effects

While the main purpose of this report is to establish the genetic algorithm as a viable approach to solving the SSP, it is also interesting to consider the effects of various genetic parameters on the performance of the algorithm. As an illustration, this section will briefly look at the impact of population size on the solution of the SSP.

As expected, the number of generations to a solution decreases as the population size increases, which is clearly shown in Figure 4 (for a 10,000-element SSP). This occurs because more elements are examined within each generation. Of course, this also implies that as the population size increases, the time to process each generation also increases. The result seems to be a relatively stable processing time (within the range of 14 to 20 minutes), as shown in Figure 4.
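Returning to the local improvement step described in Section 3.1, here is a minimal sketch of the bit-complementing search, assuming a fitness function like the one sketched earlier and the 0.95 trigger mentioned in the text; the function and parameter names are mine:

```python
def local_search(chromosome, fitness_fn, threshold=0.95):
    """Greedy bit-flip improvement, applied once a chromosome's fitness reaches
    the trigger threshold. Each bit is complemented in turn and the change is
    kept only if it improves the fitness."""
    best = chromosome[:]
    best_fit = fitness_fn(best)
    if best_fit < threshold:
        return best                      # not close enough yet; leave it alone
    for i in range(len(best)):
        candidate = best[:]
        candidate[i] ^= 1                # complement one bit
        cand_fit = fitness_fn(candidate)
        if cand_fit > best_fit:
            best, best_fit = candidate, cand_fit
    return best
```

In a run, this step would be applied to population elements whose fitness exceeds the threshold before the next generation is bred.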

4.0 Conclusions

While it was found that a genetic algorithm could efficiently solve a large SSP, many questions remain open. For example, what is the effect of changing other genetic parameters, such as mutation or crossover rates, on the performance of the algorithm? This paper only considered the basic form of a genetic algorithm. Other approaches to genetic algorithms exist, so the question of which genetic structure is best for the solution of SSPs needs to be investigated.


Finally, the application of genetic algorithms to other versions and modifications of the standard sum-of-subsets problem should be considered.

References

1. M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman & Company, New York (1979).
2. O.H. Ibarra and C.E. Kim, "Fast approximation algorithms for the knapsack and sum of subset problems," Journal of the ACM, 22, 463-468 (1975).
3. E.L. Lawler, "Fast approximation algorithms for knapsack problems," Mathematics of Operations Research, 4, 339-356 (1979).
4. S. Martello and P. Toth, "Worst-case analysis of greedy algorithms for the subset-sum problem," Mathematical Programming, 28, 198-205 (1984).
5. M. Fischetti, "Worst-case analysis of an approximation scheme for the subset-sum problem," Operations Research Letters, 5, 283-284 (1986).
6. S. Martello and P. Toth, "Approximation schemes for the subset-sum problem: Survey and experimental analysis," European Journal of Operational Research, 22, 56-69 (1985).
7. J.C. Lagarias and A.M. Odlyzko, "Solving low-density subset sum problems," Journal of the ACM, 32, 229-246 (1985).
8. M. Coster, A. Joux, B.A. LaMacchia, A.M. Odlyzko, C.P. Schnorr, and J. Stern, "Improved low-density subset sum algorithms," Computational Complexity, 2, 111-128 (1992).
9. E. Balas and E. Zemel, "An algorithm for large zero-one knapsack problems," Operations Research, 28, 1130-1154 (1980).
10. S. Martello and P. Toth, "A new algorithm for the 0-1 knapsack problem," Management Science, 34, 633-644 (1988).
11. J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press (1975).
12. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading (1989).
13. H. Braun, "On solving traveling salesman problems by genetic algorithms," in Parallel Problem Solving from Nature, Lecture Notes in Computer Science 496, 129-133 (1990).
14. A. Homaifar, S. Guan, and G. Liepins, "A new approach on the traveling salesman problem by genetic algorithms," Proceedings of the 5th International Conference on Genetic Algorithms, pp. 460-466 (1993).
15. J. Grefenstette, "Incorporating problem specific knowledge into genetic algorithms," in Genetic Algorithms and Simulated Annealing, ed. L. Davis, Morgan Kaufmann, pp. 42-60 (1987).
16. E. Falkenauer and A. Delchambre, "A genetic algorithm for bin packing and line balancing," Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1186-1192 (1992).
17. R. Spillman, "Cryptanalysis of knapsack ciphers using genetic algorithms," Cryptologia, 17, 367-377 (1993).
18. G. Rawlins, Foundations of Genetic Algorithms, Morgan Kaufmann Publishers, San Mateo (1991).


[Figure 1: Typical 10,000-element knapsack run; fitness of the best population element vs. time (minutes).]

[Figure 2: Knapsack solutions; fitness vs. time (minutes) for three different problem sizes.]


[Figure 3: Average time to solution across knapsack sizes.]

[Figure 4: Population size effects; time to solution vs. population size.]
