Index Fund Optimization Using a Genetic Algorithm and a Heuristic Local Search

YUKIKO ORITO,1 MANABU INOGUCHI,2 and HISASHI YAMAMOTO2

1Ashikaga Institute of Technology, Japan

2Tokyo Metropolitan University, Japan

SUMMARY

Index funds are popular passively managed portfolios that are used extensively in hedge trading. An index fund consists of a certain number of stocks of companies listed on a stock market, chosen such that the fund’s return rates follow a path similar to the changing rates of the market index. Index fund optimization can therefore be viewed as a combinatorial optimization problem in portfolio management. In this paper, we propose an optimization method that combines a genetic algorithm with a heuristic local search algorithm to establish a strong linear association between the fund’s return rates and the changing rates of the market index. We apply our method to the Tokyo Stock Exchange and create index funds whose return rates follow a path similar to the changing rates of the Tokyo Stock Price Index (TOPIX). The results show that the proposed method creates index funds with a strong linear association to the market index in a short computing time. © 2010 Wiley Periodicals, Inc. Electron Comm Jpn, 93(10): 42–52, 2010; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ecj.10099

Key words: index funds; combinatorial optimization; genetic algorithm; heuristic local search.

1. Introduction

An index fund is one form of passively managed portfolio. Index funds are widely used in hedge trading, in which price risk is canceled by taking a position in the spot market opposite to a position in stock index futures on a futures market [1]. In addition, several studies [2–4] report that their operational performance is good compared with that of other investment trusts.

In general, an index fund consists of a combination of several stocks chosen so that its price tracks the movement of a target stock index (for instance, the S&P 500 on the New York Stock Exchange, the FTSE 100 on the London Stock Exchange, or TOPIX on the Tokyo Stock Exchange). A perfect index fund would therefore have to include all of the stocks composing the stock index (for instance, the S&P 500 has 500 stocks, the FTSE 100 has 100 stocks, and TOPIX has all the stocks listed on the first section of the Tokyo Stock Exchange, that is, about 1700 stocks). However, in order to maintain continuity with the stock index in a future period, an index fund must be rebalanced in response to changes in the constituent issues of the index and in the proportions with which individual issues are included in it. For this reason, index funds composed of many issues generate higher costs as a result of mismatches between the planned purchase price and the actual sale price during rebalancing (see Refs. 5 and 6 regarding the costs incurred in rebalancing). Consequently, creating an index fund with a small number of issues is preferable.

The optimization problem for index funds can be treated as a combinatorial optimization problem in which the issues are selected from the stock market and the investment allocation ratios of the selected issues are then determined, in the same way as in the issue selection problem for other portfolios. This problem is NP-hard, and deriving an optimal solution in a practical amount of computational time is difficult even when the number of issues is not particularly large. Various efforts to find an optimal solution by using evolutionary computing have been reported.

Markowitz [7] proposed the mean-variance model, which quantifies the tradeoff between expected return and risk in the portfolio optimization problem. Other researchers have extended the Markowitz model [7] and applied it to practical problems. Xia and colleagues [8] proposed a method for optimizing a portfolio by using a genetic algorithm (GA) with maximum return and minimum risk as its criteria. Chang and colleagues [9] optimized an active portfolio by using various forms of evolutionary computing. In research in which the target portfolio was an index fund, Oh and colleagues [10] showed the utility of an index fund on the Korean Stock Exchange optimized by a genetic algorithm. Takabayashi [11] proposed a method of selecting and rebalancing issues on the Tokyo Stock Exchange by using a genetic algorithm. Orito and colleagues [12] proposed a method of optimizing the investment allocation ratios of the issues composing a fund by using a genetic algorithm. The method of Orito and colleagues [12] creates an effective index fund over a period in which the stock index shows a downward or flat trend. However, its effectiveness depends on the trend of the stock index: it cannot create a good index fund in a period when the trend is upward. Thus, in this paper we propose a new method for index fund optimization that uses a genetic algorithm and a heuristic local search, based on the method of Orito and colleagues [12], in order to create a good index fund during a period with any kind of trend. This paper contains no discussion of rebalancing. Rebalancing is the repeated swapping of the issues composing the fund, and the proposed method is a fundamental technique that remains applicable even when rebalancing is taken into account. Numerical experiments are presented to show that the proposed method can create a good index fund in a short simulation time for any trend in any period.

Electronics and Communications in Japan, Vol. 93, No. 10, 2010. Translated from Denki Gakkai Ronbunshi, Vol. 128-C, No. 3, March 2008, pp. 407–415.

This paper is organized as follows. In Section 2, the optimization problem for index funds is formalized. In Section 3, the proposed method using a genetic algorithm and a heuristic local search is described. Section 4 presents the results of numerical experiments using the proposed method and comparison methods, and discusses the results. Section 5 summarizes the paper.

2. Formalization of the Index Fund Optimization Problem

The following symbols are used to formalize the index fund optimization problem:

N: the number of issues composing the fund
i: issue i (i = 1, 2, . . . , N)
t: time t (stock market business day, t = 1, 2, . . . , T)
C: the amount invested in the fund at time t = 1
Q(t): the stock index at time t
x(t): the fluctuation rate (rate of variation) of the stock index, x(t) = [Q(t + 1) − Q(t)] / Q(t)
$p_i(t)$: the price of issue i at time t
$v_i(t)$: the volume of issue i at time t
$g_i$: the investment allocation ratio for issue i. The ratios for the N issues satisfy $g_i \in [0, 1]$ and $\sum_{i=1}^{N} g_i = 1$.
$\underline{g}$: the fund, $\underline{g} = (g_1, g_2, \ldots, g_N)$
$y(\underline{g}; t)$: the return rate of the fund at time t (also called the price-earnings ratio of the fund in this paper)

An investment of an amount C in a fund composed of N issues is assumed. The current price of issue i at time t, denoted $F_i(t)$, is then given by Eq. (1).

Based on Eq. (1), the fund price at time t is $\sum_{i=1}^{N} F_i(t)$, and the return rate (price-earnings ratio) of the fund is expressed in terms of the investment allocation ratios $\underline{g}$ as

$$y(\underline{g}; t) = \frac{\sum_{i=1}^{N} F_i(t + 1) - \sum_{i=1}^{N} F_i(t)}{\sum_{i=1}^{N} F_i(t)}.$$
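The quantities defined in this section can be illustrated with a minimal Python sketch. It assumes that the current price of issue i has the form F_i(t) = C g_i p_i(t) / p_i(1), that is, the amount C g_i invested at t = 1 grows in proportion to the price of the issue. This form is an assumption made for illustration only, and the function names and toy numbers are hypothetical.

```python
def fluctuation_rates(series):
    """Rate of change [s(t+1) - s(t)] / s(t) between consecutive values."""
    return [(series[t + 1] - series[t]) / series[t] for t in range(len(series) - 1)]


def fund_prices(prices, g, capital=1.0):
    """Fund price sum_i F_i(t) for every day t.

    prices[i][t] is the price of issue i on day t (index 0 is t = 1 in the text).
    Assumption: F_i(t) = capital * g[i] * prices[i][t] / prices[i][0].
    """
    days = len(prices[0])
    return [
        sum(capital * g_i * p[t] / p[0] for g_i, p in zip(g, prices))
        for t in range(days)
    ]


def fund_return_rates(prices, g, capital=1.0):
    """y(g; t) = [fund(t+1) - fund(t)] / fund(t)."""
    return fluctuation_rates(fund_prices(prices, g, capital))


# Toy data (not market data): two issues observed on three days.
prices = [[100.0, 110.0, 121.0], [50.0, 50.0, 55.0]]
g = [0.5, 0.5]
y = fund_return_rates(prices, g)
```

Under this assumed form, the amount C cancels in the return rate, so only the allocation ratios and the price paths matter.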

The purpose of the present investigation is to create a fund whose price tracks the behavior of the stock index. An objective function that minimizes the difference between the stock index and the fund price might therefore be considered. However, these two quantities have different units, so the multiple relating the stock index to the fund price would have to be determined beforehand in order to calculate the difference. To avoid this problem, in this paper we compare the fluctuation rate of the stock index with the return rate (price-earnings ratio) of the fund. In regression analysis, the coefficient of correlation and the coefficient of determination are used as indices of the strength of the linkage between two quantities. The coefficient of determination is the square of the coefficient of correlation. The closer the value of the coefficient of determination or the coefficient of correlation is to 1, the more closely the two variables are linked.

The index fund considered in this paper has N issues, as described above. These N issues are selected by a heuristic rule (see Section 3.1 for details) before the optimization by the genetic algorithm, which serves as the global search in the proposed method. The heuristic rule selects N issues that have a positive linear relationship with the behavior of the stock index [the positive linear relationship between the fluctuation rate of the stock index and the return rate of the fund of N issues selected by the heuristic rule is shown in Fig. 8(a), which gives the results of the numerical experiments described in Section 4.4]. Because the relationship is positive, maximizing the squared correlation does not reward a fund that moves against the index. Consequently, the coefficient of determination can be considered a satisfactory objective function for the index fund optimization problem considered in this paper. The strength of the link between the fluctuation rates $x = (x(1), x(2), \ldots, x(T))$ of the stock index at times t = 1, 2, . . . , T and the return rates $y = (y(\underline{g}; 1), y(\underline{g}; 2), \ldots, y(\underline{g}; T))$ of the fund is represented by the coefficient of determination, expressed as follows:

$$R^2 = \left( \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} \right)^2, \tag{2}$$

where Cov(x, y) is the covariance of x and y, and Var(x) and Var(y) are the variances of x and y, respectively.

Thus, the index fund optimization problem discussed in this paper can be formalized as follows: maximize the coefficient of determination in Eq. (2) with respect to the investment allocation ratios $(g_1, g_2, \ldots, g_N)$, subject to $g_i \in [0, 1]$ and $\sum_{i=1}^{N} g_i = 1$.
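As a minimal sketch, the coefficient of determination used as the objective can be computed as the squared correlation between two series of rates. The function names are hypothetical, and returning 0 for a constant series is a convention chosen here, not one stated in the paper.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    """Population covariance of two equally long sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def coefficient_of_determination(x, y):
    """Square of the correlation coefficient between x and y."""
    var_x = covariance(x, x)
    var_y = covariance(y, y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # convention chosen here for a constant series
    return covariance(x, y) ** 2 / (var_x * var_y)


# Toy rates (not market data): y is an exact linear function of x.
x = [0.01, 0.02, -0.01, 0.0]
y = [2.0 * value + 0.001 for value in x]
r2 = coefficient_of_determination(x, y)
```

Because the covariance is squared, a perfectly negative linear relationship would also score 1, which is why the positive relationship of the selected issues matters for this objective.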

3. Application of GA and HLS to the Index Fund Optimization Problem

The method proposed in this paper consists of the following two-step procedure.

(1) Genetic algorithm

The investment allocation ratios $(g_1, g_2, \ldots, g_N)$ of the fund $\underline{g}$ are optimized by a genetic algorithm that uses maximization of the coefficient of determination in Eq. (2) as its criterion.

(2) Heuristic local search

The investment allocation ratios $(g_1, g_2, \ldots, g_N)$ of the fund created in Step (1) are reoptimized by a heuristic local search on the scatter diagram of x(t) and $y(\underline{g}; t)$.

The algorithms in Steps (1) and (2) are described in Sections 3.1 and 3.2.

3.1 Step (1): Optimization using a genetic algorithm

The algorithm of Step (1) in the proposed method is a revision of the selection step in the genetic algorithm of Orito and colleagues [12].

Suppose that the stock market is composed of K issues, issue 1 through issue K. The average sales volume of issue i over the period of T days is given by the equation

$$V_i = \frac{1}{T} \sum_{t=1}^{T} v_i(t).$$

Here the K issues are renumbered so that

$$V_1 \geq V_2 \geq \cdots \geq V_i \geq \cdots \geq V_K.$$

Hence, issue i is the issue with the i-th highest average sales volume among all K issues on the market.

The genetic algorithm described below is applied to the group of N issues from issue 1 through issue N.
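The renumbering by average sales volume amounts to a sort followed by a cut at N. A minimal sketch, with hypothetical issue identifiers and toy volumes rather than market data:

```python
def select_by_average_volume(volumes, n_issues):
    """Return the ids of the n_issues issues with the highest average volume.

    volumes maps an issue id to its list of daily volumes over the T days.
    """
    average = {issue: sum(v) / len(v) for issue, v in volumes.items()}
    ranked = sorted(average, key=average.get, reverse=True)
    return ranked[:n_issues]


# Toy volumes for four hypothetical issues over two days.
volumes = {"A": [10, 30], "B": [50, 70], "C": [5, 5], "D": [40, 20]}
top = select_by_average_volume(volumes, 2)
```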

• Genetic representation

The length of a chromosome is the number of issues composing the fund (N), and each gene corresponds to the investment allocation ratio $g_i$ (i = 1, 2, . . . , N) of one issue, arranged in order from issue 1 through issue N as shown in Fig. 1.

• Gene manipulation

After creating the initial group of chromosomes by using the genetic representation shown in Fig. 1, we produce new chromosomes by crossover and mutation. The fitness of each chromosome is the coefficient of determination [Eq. (2)] of the fund defined by its genes (investment allocation ratios). The same number of chromosomes as in the initial group is then selected to become the next generation. These operations are described below.

• Creation of the initial chromosome group

For each chromosome in the initial group of size M, the genes $g_i$ are generated at random so as to satisfy $\sum_{i=1}^{N} g_i = 1$.
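One simple way to generate such an initial group is to draw positive random values and divide by their sum. The paper does not state which random distribution is used, so the uniform draw below is an assumption, and the function names are hypothetical.

```python
import random


def random_chromosome(n_genes, rng):
    """Draw n_genes positive values and scale them so that they sum to 1."""
    raw = [rng.random() + 1e-12 for _ in range(n_genes)]  # avoid an all-zero draw
    total = sum(raw)
    return [value / total for value in raw]


def initial_population(size, n_genes, seed=0):
    """Create `size` chromosomes, each a valid vector of allocation ratios."""
    rng = random.Random(seed)
    return [random_chromosome(n_genes, rng) for _ in range(size)]


# Group size M = 100 and N = 300 issues, as in the numerical experiments.
population = initial_population(size=100, n_genes=300)
```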

• Crossover

Two points on a pair of chromosomes are selected at random, and the gene segments between these two points are swapped, creating two new chromosomes. Each gene in the new chromosomes is normalized as $g_i / \sum_{j=1}^{N} g_j$ so as to satisfy $\sum_{i=1}^{N} g_i = 1$.
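The crossover described above can be sketched as follows. The helper names are hypothetical, and the uniform fallback for a child whose genes are all zero is a choice made here, since the paper does not discuss that case.

```python
import random


def normalize(chromosome):
    """Scale the genes so that they sum to 1."""
    total = sum(chromosome)
    if total == 0.0:
        # Fallback chosen for this sketch: equal ratios for an all-zero child.
        return [1.0 / len(chromosome)] * len(chromosome)
    return [gene / total for gene in chromosome]


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the segment between two random cut points, then renormalize."""
    n = len(parent_a)
    left, right = sorted(rng.sample(range(n + 1), 2))
    child_a = parent_a[:left] + parent_b[left:right] + parent_a[right:]
    child_b = parent_b[:left] + parent_a[left:right] + parent_b[right:]
    return normalize(child_a), normalize(child_b)


rng = random.Random(1)
a = [0.1, 0.2, 0.3, 0.4]
b = [0.4, 0.3, 0.2, 0.1]
child_a, child_b = two_point_crossover(a, b, rng)
```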

• Mutation

Two points on the chromosome are selected at random, and the genes between these two points are regenerated at random. Each gene in the new chromosome is then normalized so as to satisfy $\sum_{i=1}^{N} g_i = 1$. Similarly, a second new chromosome is created by regenerating at random the genes outside the two points. Figure 2 shows the process of mutation with N = 5, as an example of generating a new chromosome by the first of these two operations.
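Both mutation variants can be sketched with a single function. The names are hypothetical, and drawing the regenerated genes uniformly from [0, 1) before normalization is an assumption, since the paper does not give the distribution.

```python
import random


def normalize(chromosome):
    """Scale the genes so that they sum to 1."""
    total = sum(chromosome)
    if total == 0.0:
        # Fallback chosen for this sketch: equal ratios for an all-zero result.
        return [1.0 / len(chromosome)] * len(chromosome)
    return [gene / total for gene in chromosome]


def mutate(chromosome, rng, inside=True):
    """Regenerate the genes inside (or outside) two random cut points."""
    n = len(chromosome)
    left, right = sorted(rng.sample(range(n + 1), 2))
    mutant = [
        rng.random() if (left <= i < right) == inside else gene
        for i, gene in enumerate(chromosome)
    ]
    return normalize(mutant)


rng = random.Random(2)
parent = [0.2, 0.2, 0.2, 0.2, 0.2]  # N = 5, as in the example of Fig. 2
mutant_inside = mutate(parent, rng, inside=True)
mutant_outside = mutate(parent, rng, inside=False)
```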

Fig. 1. Genetic representation.

• Selection

The fitness of each chromosome is calculated. Chromosomes amounting to β% of the group size M are selected by the elite method, and the remaining (100 − β)% are selected by the roulette method; together they become the next generation of chromosomes. In Ref. 12, elite selection is applied to only one chromosome in each chromosome group. In the method presented here, selection is performed as described above because it gave better experimental results.
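A minimal sketch of this selection step follows. It assumes that the roulette draws are made with replacement from the whole candidate pool with probability proportional to fitness; the paper does not specify these details, and the function name is hypothetical.

```python
import random


def select_next_generation(pool, fitness, group_size, beta_percent, rng):
    """Pick group_size chromosomes: the top beta% as elites, the rest by roulette."""
    order = sorted(range(len(pool)), key=lambda k: fitness[k], reverse=True)
    n_elite = max(1, round(group_size * beta_percent / 100))
    elite = [pool[k] for k in order[:n_elite]]
    # Roulette wheel: draw with probability proportional to fitness. The
    # fitness here is a coefficient of determination, so it is never negative.
    weights = fitness if sum(fitness) > 0 else None  # uniform draw if all zero
    drawn = rng.choices(pool, weights=weights, k=group_size - n_elite)
    return elite + drawn


rng = random.Random(0)
pool = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.3, 0.7]]
fitness = [0.9, 0.2, 0.5, 0.1]
next_generation = select_next_generation(pool, fitness, 2, 50, rng)
```

With M = 100 and β = 5, the same function would keep the 5 fittest chromosomes and fill the remaining 95 places by roulette draws.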

• Determining the end of genetic algorithm operation

When the chromosome group reaches the preset final generation, the genetic algorithm is halted. The chromosome with the highest fitness in the final group of chromosomes gives the combination of investment allocation ratios that is taken as the optimal solution of Step (1) in the proposed method.

3.2 Step (2): Reoptimization using a heuristic local search

The number of solutions to be searched is extremely large because of the many possible combinations of investment allocation ratios $\underline{g} = (g_1, g_2, \ldots, g_N)$ for the N issues composing the fund. Finding the combination of investment allocation ratios that yields an optimal solution is therefore difficult. We believe that this difficulty is one reason why the effectiveness of the method proposed by Orito and colleagues [12] depends on the trend of the stock index in the period to which it is applied.

In order to resolve this problem, we reoptimize the fund obtained in Step (1) by using the proposed Step (2). To explain the background of Step (2), Fig. 3 gives an example of a scatter diagram of the return rate of the fund obtained in Step (1) and the fluctuation rate of the stock index.

Each score (point) in Fig. 3 corresponds to a time t. In Step (2), we attempt to increase the coefficient of determination obtained from Eq. (2) by moving all of the scores in the scatter diagram closer to the estimation line. In other words, we reset the investment allocation ratios of the fund so as to reduce the error between the scores and the estimation line on the scatter diagram. However, finding combinations of investment allocation ratios that reduce the error between all of the scores and the estimation line is extremely difficult because of the very large number of combinations. Thus, in the proposed Step (2), we use a heuristic local search to select several issues thought to contribute significantly to the error, and then attempt to reoptimize only the investment allocation ratios of those issues. Parts (a) to (e) of Step (2) are described below.

Fig. 2. Process of mutation.

Fig. 3. Scatter diagram between the fund’s return rates and the changing rates of the market index.

(a) Derive the initial error on the scatter diagram

A scatter diagram is created by plotting the scores on a graph whose horizontal axis is the fluctuation rate x(t) of the stock index at times t = 1, 2, . . . , T, and whose vertical axis is the return rate $y(\underline{g}; t)$ of the fund obtained in Step (1). The estimation line for the scores, whose coefficients a and b are obtained by the method of least squares, is given by Eq. (3). The error E(t) between the real value $y(\underline{g}; t)$ of the return rate of the fund at time t and the estimated value $\hat{y}(\underline{g}; t)$ obtained from Eq. (3) is represented by Eq. (4). The total error E for T days is given by Eq. (5) and is referred to as the “initial error.”

In order to minimize the error E obtained from Eq. (5), we attempt to reset the investment allocation ratios $\underline{g} = (g_1, g_2, \ldots, g_N)$ of the fund in the next step (b).
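The estimation line and the errors on the scatter diagram can be sketched as below. Two points are assumptions of this sketch rather than statements of the paper: the line is written as y = a x + b with a as the slope, and the error at each time is taken to be the absolute difference between the real and estimated values. The function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b of the line y = a * x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx  # requires that x is not constant
    b = mean_y - a * mean_x
    return a, b


def daily_errors(x, y):
    """Assumed form of E(t): |y(t) - (a * x(t) + b)| for each time t."""
    a, b = fit_line(x, y)
    return [abs(yi - (a * xi + b)) for xi, yi in zip(x, y)]


def total_error(x, y):
    """E: the sum of the daily errors over all T days."""
    return sum(daily_errors(x, y))


# Toy scores (not market data).
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 0.0]
initial_error = total_error(x, y)
```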

(b) Extract the issues with a high present value

Finding the combination of investment allocation ratios that minimizes the error E over all of the scores in the scatter diagram is extremely difficult because the number of combinations is large. Thus, we select the n issues thought to have a substantial effect on the error E.

Each score on the scatter diagram corresponds to a time t, for which the error E(t) is obtained from Eq. (4). We first sort the times t in descending order of error so that $E(t_1) \geq E(t_2) \geq \cdots \geq E(t_T)$, and then define the average present value $\overline{F}_i$ of issue i over the times $t_j$ (j = 1, 2, . . . , J) by Eq. (6), which is derived from Eq. (1).

Based on Eq. (6), the greater the present value of an issue at the scores with a larger error E(t) on the scatter diagram, the greater its average present value $\overline{F}_i$. An issue with a high present value is an issue whose price accounts for a high ratio of the fund price at that time. Thus, we sort the issues i in descending order of the average present value so that $\overline{F}_{i_1} \geq \overline{F}_{i_2} \geq \cdots \geq \overline{F}_{i_N}$, and then select the n issues $i_1, i_2, \ldots, i_n$ with the highest average present values as the target issues for the local search.
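The extraction of target issues can be sketched as two sorts. The sketch assumes that the average present value of an issue is the plain mean of its present values over the J days with the largest errors; this reading of Eq. (6) is an assumption, and the function name and toy inputs are hypothetical.

```python
def select_target_issues(present_values, errors, n_days, n_targets):
    """Pick the issues with the highest average present value on the worst days.

    present_values[i][t] is the present value of issue i on day t, and
    errors[t] is the error of the score for day t on the scatter diagram.
    Assumption: the average is the mean over the n_days largest-error days.
    """
    by_error = sorted(range(len(errors)), key=lambda t: errors[t], reverse=True)
    worst_days = by_error[:n_days]
    average = [
        sum(values[t] for t in worst_days) / len(worst_days)
        for values in present_values
    ]
    ranked = sorted(range(len(average)), key=lambda i: average[i], reverse=True)
    return ranked[:n_targets]


# Toy inputs (not market data): three issues observed on four days.
present_values = [
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 3.0, 0.5, 3.0],
    [2.0, 0.1, 2.0, 0.1],
]
errors = [0.01, 0.30, 0.02, 0.25]
targets = select_target_issues(present_values, errors, n_days=2, n_targets=2)
```

In the numerical experiments the corresponding settings are J = 10 days and n = 100 issues.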

(c) Reoptimization of the investment allocation ratios

We change the investment allocation ratios of the n selected issues $i_1, i_2, \ldots, i_n$ and attempt to create a new fund with a high coefficient of determination. However, even if a combination of optimal investment allocation ratios for the n issues could be found, this would not optimize the N issues composing the entire fund. Thus, in our method, the investment allocation ratios of the fund $\underline{g} = (g_1, g_2, \ldots, g_N)$ obtained in Step (1) are provisionally optimized one issue at a time, in order from issue $i_1$ to issue $i_n$, by using Eq. (7) to obtain the re-created fund. In Eq. (7), w denotes the weighting parameter for the issue being reoptimized. The new ratio for issue $i_1$ is found first from Eq. (7) and is then used to obtain the new ratio for issue $i_2$. Proceeding similarly through issue $i_n$ ultimately yields a new fund. Normalization is performed on the resulting fund, which is represented as $\underline{g}^{(l)} = (g_1^{(l)}, g_2^{(l)}, \ldots, g_N^{(l)})$.

As a result of the above operations, the fund with the new investment allocation ratios can be expected to have a higher coefficient of determination than the fund obtained in Step (1) of the proposed method.
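The sequential reoptimization can be illustrated by a greedy pass over the target issues. This is only a sketch under strong assumptions, not the update rule of Eq. (7): it assumes that the ratio of each target issue is multiplied by a weight chosen from a finite list of positive candidates, and that the weight giving the best score is kept before moving to the next issue. The function names and the `score` callable are hypothetical.

```python
def normalize(fund):
    """Scale the ratios so that they sum to 1 (weights are assumed positive)."""
    total = sum(fund)
    return [value / total for value in fund]


def reweight_targets(fund, targets, weights, score):
    """Greedy pass: for each target issue in order, keep the best weight.

    score(fund) returns the value to maximize, for example the coefficient
    of determination of the fund. The multiplicative update and the finite
    candidate list `weights` are assumptions of this sketch.
    """
    fund = list(fund)
    for i in targets:
        base = fund[i]

        def candidate(w):
            return normalize(fund[:i] + [w * base] + fund[i + 1:])

        best_w = max(weights, key=lambda w: score(candidate(w)))
        fund[i] = best_w * base  # later issues see the updated ratio
    return normalize(fund)


# Toy example: the score simply rewards a larger share for issue 0.
g = [0.25, 0.25, 0.5]
new_fund = reweight_targets(g, targets=[0], weights=[0.5, 1.0, 2.0],
                            score=lambda fund: fund[0])
```

Including the weight 1.0 among the candidates lets an issue keep its current ratio when no other candidate improves the score.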

(d) Derive the present error on the scatter diagram

As in step (a), a scatter diagram is created for the return rate $y(\underline{g}; t)$ of the new fund obtained in step (c) and the fluctuation rate x(t) of the stock index. The total error for T days is then obtained from Eq. (5). This total error is referred to as the “present error.”

(e) Determine the end of the heuristic local search

When the investment allocation ratios of the n issues are changed in steps (a) through (d), all of the scores in the scatter diagram move, and the estimation line in Eq. (3) is re-estimated to reflect those changes. As a result, the error E on the scatter diagram obtained from Eq. (5) is not guaranteed to become smaller. To address this issue, steps (b) through (d) are repeated up to Z times, and the local search is terminated when the present error becomes smaller than U% of the initial error. When this termination condition is satisfied, the fund $\underline{g}^{(m)} = (g_1^{(m)}, g_2^{(m)}, \ldots, g_N^{(m)})$ obtained after m (≤ Z) iterations of Step (2) is used as the combination of investment allocation ratios that gives the optimal solution of Step (2) in the proposed method. This fund is the index fund obtained by the proposed method.

Figure 4 summarizes the operations in Step (2) as a flowchart.

Fig. 4. Flowchart of the second step in the proposed method.
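The termination rule can be sketched as a bounded loop. The `improve` and `total_error` callables are placeholders for steps (b) to (c) and step (d); returning the last fund when the limit Z is reached without meeting the error target is an assumption of this sketch.

```python
def heuristic_local_search(fund, improve, total_error, max_iterations, target_percent):
    """Repeat the improvement step until the error target or the cap is hit.

    improve(fund) returns a re-created fund (steps (b) and (c));
    total_error(fund) returns the error on the scatter diagram (step (d)).
    Both callables are placeholders supplied by the caller.
    """
    initial_error = total_error(fund)
    threshold = initial_error * target_percent / 100.0
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        fund = improve(fund)
        if total_error(fund) < threshold:
            break
    return fund, iterations


# Toy stand-ins: the "fund" is one number that is also its own error,
# and each improvement step reduces it by 3%.
result, m = heuristic_local_search(
    10.0,
    improve=lambda value: value * 0.97,
    total_error=lambda value: value,
    max_iterations=10,   # Z = 10
    target_percent=90,   # U = 90
)
```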

4. Numerical Experiments

4.1 Experimental data

We divided the data for the First Section of the Tokyo Stock Exchange from January 6, 1997 through July 15, 2005 (2100 days of stock trading) into 21 experimental periods of 100 days each, and applied the proposed method to each experimental period. The experimental periods are referred to as period 1 (January 6, 1997 through May 30, 1997), period 2 (June 2, 1997 through October 23, 1997), and so on to period 21 (February 21, 2005 through July 15, 2005). The stock index tracked by the index fund was the TOPIX, and all issues without data defects due to new listings or removal from the market during each period were used in the experiment. The price and volume figures used for each issue were the final TOPIX figures [3] for each day. Thus, time t refers to a single business day on the stock market, and each period is composed of T = 100 days.
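The division into experimental periods is simple index arithmetic, sketched here with a hypothetical helper that returns 1-based day numbers:

```python
def split_into_periods(n_days, period_length):
    """Return (first, last) 1-based day numbers of consecutive full periods."""
    return [
        (start, start + period_length - 1)
        for start in range(1, n_days - period_length + 2, period_length)
    ]


# 2100 trading days in periods of T = 100 days.
periods = split_into_periods(n_days=2100, period_length=100)
```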

Figure 5 shows the trend of TOPIX during the entire experimental period.

Based on Fig. 5, the entire experimental period is subdivided into periods characterized by rising trends (periods 1, 6, 7, 8, 16, 17, 18), periods characterized by declining trends (periods 2, 5, 9, 10, 12, 14, 15), and periods characterized by flat trends (periods 3, 4, 11, 13, 19, 20, 21).

4.2 Experimental methods

In order to evaluate the effectiveness of the proposed method, we performed a comparative experiment using the three methods described below. The computer used in the experiment had a 2.4 GHz Pentium IV processor and 512 MB of RAM.

• GAM1 (Genetic Algorithm Method 1)

This is a comparison method for the numerical experiments. It consists of only Step (1) of the proposed method and is referred to as GAM1. Step (1) of the proposed method, as described in Section 3.1, differs from the method of Orito and colleagues [12] in the selection used in the genetic algorithm. In experiments using elite selection percentages of β = 5, 10, 15, or 20%, the coefficient of determination of the fund resulting from Step (1) of the proposed method was confirmed to be greater than that of the index fund obtained by the method of Orito and colleagues [12]. Thus GAM1 was used as a comparison method that is more effective than the method of Orito and colleagues [12].

In GAM1, maximizing the coefficient of determination obtained from Eq. (2) constitutes maximizing the fitness of the genetic algorithm. The parameter values for the numerical experiments were set as follows.

Number of issues composing the fund: N = 300 (set on the basis of the preliminary experiments described in Section 4.3)
Size of the chromosome group: M = 100
Final generation number for the genetic algorithm: 300
Crossover probability: 0.9 (set on the basis of the preliminary experiments described in Section 4.3)
Mutation probability: 0.1 (set on the basis of the preliminary experiments described in Section 4.3)
Elite selection percentage (%): β = 5

For N = 300 issues, the genetic algorithm was run 20 times.

Fig. 5. TOPIX.

• GAM2 (Genetic Algorithm Method 2)

In Step (2) of the method proposed in this paper, in order to have the return rate of the fund track the fluctuation rate of the stock index, we attempt to increase the coefficient of determination by minimizing the error E on the scatter diagram obtained from Eq. (5). As a second comparison method in the numerical experiments, we used GAM2, a genetic algorithm in which the fitness is based on the error E.

In GAM2, minimizing the error obtained from Eq. (5) constitutes maximizing the fitness of the genetic algorithm. The parameter values for the numerical experiments were the same as in GAM1. For N = 300 issues, the genetic algorithm was run 20 times.

• GAHLSM (Genetic Algorithm and Heuristic Local Search Method)

This is the method proposed in this paper. GAHLSM applies Step (2), which uses the heuristic local search, to the fund obtained from Step (1), which uses the genetic algorithm. The parameter values for Steps (1) and (2) were set as follows.

Number of issues composing the fund: N = 300
Size of the chromosome group: M = 100
Final generation number for the genetic algorithm: 100 (this figure, smaller than the 300 generations used in GAM1 and GAM2, was selected with consideration of the simulation time; see Section 4.4 for the results of the numerical experiments)
Crossover probability: 0.9
Mutation probability: 0.1
Elite selection percentage (%): β = 5
Number of days for calculation of the average present value: J = 10
Number of issues to be reoptimized: n = 100
Optimal weighting parameter for reoptimization in Eq. (7): $w_{\max}$ = 2
Revised error rate (%): U = 90
Number of iterations of the algorithm: Z = 10

As the number of iterations Z is increased, the error E on the scatter diagram can be expected to converge to a smaller value. In this investigation, however, the revised error rate was used as a parameter of the proposed method, and Z = 10 was set as an upper limit on the number of iterations that is sufficient to satisfy the revised error rate of U = 90%. GAHLSM was applied to the results of each of the 20 runs of the genetic algorithm.

4.3 Preliminary experiments

We performed preliminary experiments in order to set the number of issues N composing the fund, as well as the crossover probability and the mutation probability of the genetic algorithm.

First, for Step (1) of the proposed method, we set the number of issues to N = 200, the size of the chromosome group to M = 100, the final generation number of the genetic algorithm to 100, and the elite selection percentage to β = 5%, and tested all combinations of the crossover probabilities {0.7, 0.8, 0.9} and the mutation probabilities {0.1, 0.2, 0.3}. Based on these experiments, we set the crossover probability to 0.9 and the mutation probability to 0.1 in GAM1, GAM2, and GAHLSM, since this combination gave the greatest coefficient of determination of the fund in the majority of the experimental periods.

Next, we performed experiments for each of the numbers of issues N = {200, 300, 400}, with the chromosome group size set to M = 100, the final generation number of the genetic algorithm to 100, the elite selection percentage to β = 5%, the crossover probability to 0.9, and the mutation probability to 0.1. Based on these experiments, we set the number of issues composing the fund to N = 300, which gave the greatest coefficient of determination in the majority of the periods.

4.4 Experimental results and discussion

Figure 6 shows, for one period, the behavior of the coefficient of determination [Eq. (2)] in each generation of the genetic algorithm, as an example of the index funds obtained by using GAM1 and GAM2. Figure 7 shows the simulation time (seconds) for one run of the genetic algorithm. In Figs. 6 and 7, the black diamond at the position of the 100th generation represents the coefficient of determination and the simulation time of the fund obtained by GAHLSM.

Fig. 6. Coefficients of determination.

As indicated by Fig. 6, the coefficient of determination of the fund obtained by GAHLSM in period 1 is higher than the coefficients of determination at the 300th generation in GAM1 and GAM2. In addition, as indicated by Fig. 7, the simulation time of GAHLSM is shorter than the time needed to reach the 300th generation in GAM1 and GAM2. Thus, the proposed GAHLSM can create an index fund with a higher coefficient of determination while using slightly less simulation time than GAM1 or GAM2.

The best and worst values, the mean, and the standard deviation of the coefficient of determination for the 20 funds obtained using GAM1, GAM2, and GAHLSM in all periods are shown in (a), (b), and (c) of Table 1. Table 2 summarizes the best coefficients of determination for the funds obtained from GAM1, GAM2, and GAHLSM, as well as for the initial fund before the experiment. Table 3 summarizes the errors with respect to the estimation line [Eq. (5)] on the scatter diagram for these funds. Figures 8(b) to 8(d) show the scatter diagrams for the funds having the best coefficient of determination obtained from GAM1, GAM2, and GAHLSM, and Fig. 8(a) shows the scatter diagram obtained from the initial fund for period 1.

Tables 1(a), (b), and (c) show that the coefficients of determination of the 20 funds obtained by each of GAM1, GAM2, and GAHLSM have a small standard deviation, so the results are stable. In each period, we applied the Wilcoxon rank-sum test (a nonparametric test of whether two observed samples follow the same distribution) to compare the 20 coefficients of determination obtained from GAHLSM with the 20 obtained from GAM1 and with the 20 obtained from GAM2. The differences between the distributions were significant at the 99% confidence level in all periods. This result, along with the data in Table 2, shows that the fund constructed by using GAHLSM has a greater coefficient of determination than the funds obtained using GAM1 or GAM2 in all periods. Furthermore, as indicated by Table 3, the fund obtained by GAHLSM has a smaller error on the scatter diagram than the funds obtained using GAM1 or GAM2 in all periods. It is also clear that the coefficient of determination is an effective representation of fitness in the optimization problem under consideration, because GAM2, for which the measure of fitness is the error of the scatter diagram, has a smaller error than GAM1, in which the coefficient of determination is used as the fitness measure. A comparison of the scatter diagrams in Fig. 8 shows that GAHLSM can create an index fund that closely tracks the fluctuation of the stock index.

Table 1. The coefficients of determination of index funds

Fig. 7. Computing time.

Thus, the proposed GAHLSM method can create an index fund that closely tracks the fluctuation of a stock index in a simulation time that is slightly shorter than that of GAM1 and GAM2, which are based on the method of Orito and colleagues [12], whether the trend of the period is rising, falling, or flat.

4.5 Gene structure for the GAHLSM fund

Table 2. A comparison of the best coefficients of determination obtained by GAM1, GAM2, and GAHLSM

Table 3. A comparison of total errors obtained by GAM1, GAM2, and GAHLSM

Fig. 8. Scatter diagram between the return rates of the index fund and the changing rates of TOPIX.

The results presented in the previous section suggest that Step (2) of GAHLSM performs operations different from the genetic operations applied over successive generations of the genetic algorithm. In order to clarify the genetic structure in GAHLSM, Fig. 9(a) shows the changes in the genetic structure after 1 generation, 100 generations, 200 generations, and 300 generations for GAM1 in period 1, and Fig. 9(b) shows the genetic structure of the fund obtained using GAHLSM. The horizontal axis in each figure represents the issues from 1 to N. In Fig. 9(a), the vertical axis corresponds to the generation of the genetic algorithm, and the value of each gene (the investment allocation ratio g of each issue) is represented by grayscale shading. Figures 9(a) and 9(b) both provide a color bar at the lower right to indicate the relationship between the shading and the investment allocation ratio: issues with higher investment allocation ratios are shown in a darker shade.

Figure 9(a) shows that after the 100th generation of the genetic algorithm, the genetic structure of the fund obtained from GAM1 does not change much even as further generations are accumulated. In contrast, Fig. 9(b) shows that the distribution of gene values in the fund obtained from GAHLSM differs from that in Fig. 9(a), and that its genetic structure is substantially different from that of the fund obtained using GAM1. Thus, the proposed method is able to derive a solution that is difficult to obtain by a genetic algorithm alone, and in a slightly shorter simulation time. Detailed analysis of the difference between the optimal solution obtained by a genetic algorithm and the optimal solution obtained by the proposed method is a topic for the future.

5. Conclusions

We have proposed the GAHLSM optimization method, which uses a genetic algorithm and heuristic local search, to solve the problem of optimizing an index fund by using an objective function based on the coefficient of determination of the fluctuation rate of the stock index and the yield of the fund. Numerical experiments show that within the scope of this paper, the proposed method can create, in a limited simulation time, an index fund that is strongly linked to high values of the objective function, and is independent of the trends in the stock index. Furthermore, the genetic structure of the fund using the proposed method was shown to be significantly different from the genetic structure obtained when using a genetic algorithm. Clarifying whether the characteristics of the genetic structure of the fund obtained from the proposed method depend on the characteristics of the genetic algorithm problem, or on the characteristics of the index fund problem, and demonstrating the effectiveness of this genetic structure for rebalancing of the created fund in a future period, are topics for future investigation.
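To make the objective concrete, the following minimal Python sketch computes a coefficient of determination between the changing rates of an index and the return rates of a fund. It rests on two assumptions that may differ from the definitions used in the paper: the fund's return rate is taken as the allocation-weighted sum of the issues' return rates, and the coefficient of determination is taken as the squared Pearson correlation of a simple linear regression. The function names and the toy data are hypothetical.

```python
from math import fsum


def fund_return_rates(weights, issue_returns):
    """Allocation-weighted sum of issue return rates per time step (assumed definition)."""
    return [fsum(w * r for w, r in zip(weights, row)) for row in issue_returns]


def coefficient_of_determination(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series has no linear association to measure
    return (sxy * sxy) / (sxx * syy)


# Toy data (made up): return rates of 3 issues over 4 time steps.
issue_returns = [
    [0.010, -0.020, 0.005],
    [-0.004, 0.012, 0.001],
    [0.007, 0.003, -0.006],
    [-0.011, 0.008, 0.009],
]
index_changes = [0.002, 0.001, 0.003, -0.001]  # changing rates of the index
weights = [0.5, 0.3, 0.2]  # investment allocation ratios, summing to 1

fund = fund_return_rates(weights, issue_returns)
r2 = coefficient_of_determination(index_changes, fund)
print(f"coefficient of determination: {r2:.4f}")
```

A search method would evaluate a candidate set of allocation ratios by a value of this kind, with values nearer 1 indicating a stronger linear association. Note that a squared correlation by itself does not distinguish a positive association from a negative one.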

Fig. 9. Genetic structure.

REFERENCES

1. Laws J, Thompson J. Hedging effectiveness of stock index futures. European Journal of Operational Research 2000;163:177–191.

2. Elton E, Gruber G, Blake C. Survivorship bias and mutual fund performance. Review of Financial Studies 1996;9:1097–1120.

3. Gruber MJ. Another puzzle: the growth of actively managed mutual funds. Journal of Finance 1996;51:783–810.

4. Malkiel B. Return from investing in equity mutual funds 1971 to 1991. Journal of Finance 1995;50:549–572.

5. Aiello S, Chieffe N. International index funds and the investment portfolio. Financial Services Review 1999;8:27–35.

6. Chang KP. Evaluating mutual fund performance: an application of minimum convex input requirement set approach. Computers & Operations Research 2004;31:929–940.

7. Markowitz H. Portfolio selection. Journal of Finance 1952;7:77–91.

8. Xia Y, Liu B, Wang S, Lai KK. A model for portfolio selection with order of expected returns. Computers & Operations Research 2000;27:409–422.

9. Chang TJ, Meade N, Beasley JE, Sharaiha YM. Heuristics for cardinality constrained portfolio optimization. Computers & Operations Research 2000;27:1271–1302.

10. Oh KJ, Kim TY, Min S. Using genetic algorithm to support portfolio optimization for index fund management. Expert Systems with Applications 2005;38:371–379.

11. Takabayashi S. Fund creation and rebalancing using genetic algorithms. Japan Finance, Stock Metrics, and Engineering Society, Preliminary Papers of the 1995 Winter Research Committee, p 94–103.

12. Orito Y, Takeda M, Yamazaki G. Evaluating the efficiency of index fund selection over the fund’s future period. Computational Intelligence in Economics and Finance 2007;2:157–168.

13. Stock CD-ROM. Toyo Keizai Shinbun Corp., 2003–2006.

AUTHORS (from left to right)

Yukiko Orito (member) studied in the graduate program of Tokyo Metropolitan Institute of Technology (now Tokyo Metropolitan University) until 2002 and became an instructor in the Department of Systems Engineering of Ashikaga Institute of Technology in 2003. She holds a D.Eng. degree, and is a member of the Japan Industrial Management Association.

Manabu Inoguchi (nonmember) received a bachelor’s degree from Tokyo Metropolitan Institute of Technology in 2007 and entered the Graduate School of Tokyo Metropolitan University.

Hisashi Yamamoto (nonmember) completed his studies at the Tokyo Institute of Technology in 1983. After employment at Toshiba Corporation and the West Tokyo University of Science (now Teikyo University of Science and Technology), he became a professor at Tokyo Metropolitan University in 1998. He holds a D.Eng. degree, and is a member of the Japan Industrial Management Association and the Society of Plant Engineers of Japan.
