Post on 23-Feb-2016

Tabu Search for Model Selection in Multiple Regression

Zvi Drezner

California State University, Fullerton


Optimization

Consider a “convex” optimization problem.

Wherever you start, you “slide” into the optimal solution.


Non-convexity

Now consider the “non-convex” case.

You may end up with a local optimum.


The Descent Approach

Sliding downhill is a descent approach.

For maximization it is termed the ascent approach.

For continuous functions the descent is done by a gradient search.

When the function is non-convex one may end up with a local optimum which is not the best solution.
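
As a small illustration of the point above, the following sketch runs a plain gradient search on a one-dimensional non-convex function. The function, step size, and iteration count are hypothetical choices made for this example only. Two different starting points slide into two different minima, and only one of them is the global one.

```python
def f(x):
    # Hypothetical non-convex test function with two local minima.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, step=0.01, iterations=5000):
    # Repeatedly move a small step against the gradient ("slide downhill").
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x

left = gradient_descent(-2.0)   # slides into the minimum on the left
right = gradient_descent(2.0)   # slides into the minimum on the right
print(left, f(left))
print(right, f(right))
```

The left minimum has the lower function value, so a search started at 2.0 ends at a local optimum that is not the best solution.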


Tabu Search

Tabu search (Glover, 1986) was designed to escape local optima and hopefully reach the global optimum which is the best solution.

It starts as a descent approach but once a local optimum is found the search continues.

The idea in a nutshell is to allow upward moves but not to allow “sliding back”.


Tabu (Contd.)

The search tries to make the best downward move, but right after an upward move (made because no downward move was possible), the best next move may be straight back down.

If we do not forbid back moves, the search will enter an infinite loop.

The mechanism will be described in detail later.


Multiple Regression

Data exist for n variables and m observations.

A partial set of the variables may provide a better description of the data.

The common criterion used for model fit is the p-value (significance F).

Other criteria such as adjusted R-square can be used.
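
For reference, both criteria have standard textbook definitions. The notation below is an assumption of this note rather than of the slides: k is the number of variables in the selected subset, R^2 is the usual coefficient of determination of the fitted model, and the model includes an intercept.

```latex
\[
F = \frac{R^2 / k}{(1 - R^2)/(m - k - 1)},
\qquad
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{m - 1}{m - k - 1}
\]
```

The significance F is the upper-tail probability of this F statistic under an F distribution with k and m-k-1 degrees of freedom, so smaller values are better, while for adjusted R-square larger values are better.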


The Issue

There are 2^n possible subsets.

For n=30 there are more than one billion possible subsets.

For n=100 there are more than 10^30 possible subsets.


Why Tabu?

Suppose that you can check a million subsets per second (quite optimistic).

Checking all possible subsets for n=30 is still manageable in about 20 minutes.

For n=100 it will take more than 10^16 (ten million billion) years.
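
The arithmetic behind these estimates can be checked with a few lines. The rate of one million subsets per second comes from the slide; the constant and function names are introduced here for illustration.

```python
SUBSETS_PER_SECOND = 1_000_000          # rate assumed on the slide
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def enumeration_seconds(n):
    # Time to check all 2^n subsets of n variables at the assumed rate.
    return 2**n / SUBSETS_PER_SECOND

minutes_30 = enumeration_seconds(30) / 60
years_100 = enumeration_seconds(100) / SECONDS_PER_YEAR
print(f"n=30:  {2**30:,} subsets, about {minutes_30:.0f} minutes")
print(f"n=100: about {2**100:.2e} subsets, about {years_100:.1e} years")
```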


Stepwise Regression

Stepwise regression is “a sort of” descent algorithm.

Suppose that this is the “graph” of the significance F.


The Search

The search starts on the left, and each addition or removal of a variable that reduces the significance F moves down the graph.

It will end at a local minimum.


A Descent Algorithm

Suppose that a subset of variables is selected.

We would like to check whether a “move” to another subset can improve the criterion (significance F, adjusted R-square, etc.).

A neighborhood of subsets needs to be defined.


We cannot include “all possible subsets” in the neighborhood because that would bring us back to the original issue of checking all possible subsets.

Stepwise regression considers adding or removing one variable from the subset.

This is a possible neighborhood.


It may be more effective to also include in the neighborhood the exchange of two variables, i.e., removing a variable and adding another in one move.

It is possible that removing a variable or adding a variable is not beneficial but replacing a variable with a more suitable one is beneficial.
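
A minimal sketch of such a neighborhood, assuming the variables are numbered 0 to n-1 and a subset is stored as a set of these numbers. The function name and the rule that the empty subset is never generated are assumptions of this sketch.

```python
def neighborhood(selected, n):
    """Return all subsets reachable by one add, remove, or exchange move."""
    selected = frozenset(selected)
    outside = [v for v in range(n) if v not in selected]
    moves = []
    for v in outside:                      # add one variable
        moves.append(selected | {v})
    if len(selected) > 1:                  # assumption: never empty the model
        for v in selected:                 # remove one variable
            moves.append(selected - {v})
    for out_v in selected:                 # exchange: remove one, add another
        for in_v in outside:
            moves.append((selected - {out_v}) | {in_v})
    return moves

moves = neighborhood({0, 1}, 5)
print(len(moves))  # 3 additions + 2 removals + 6 exchanges = 11
```

With k selected variables the neighborhood has at most (n-k) + k + k(n-k) members, which is far smaller than 2^n.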


In the descent algorithm, the best move in the neighborhood is executed repeatedly until no move improves the criterion.

The last subset is the result of this approach.


The descent algorithm terminates at a local minimum that may or may not be the global one.

The final outcome depends on the starting point.
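
The descent procedure can be sketched as follows. This is an illustration under stated assumptions, not the code of the VBA macro: `toy_cost` is a hypothetical stand-in for the significance F (smaller is better), built so that the subset {0, 1, 2} is a local minimum and {3, 4, 5} is the global one.

```python
def neighbors(selected, n):
    """Subsets reachable by adding, removing, or exchanging one variable."""
    outside = [v for v in range(n) if v not in selected]
    result = [selected | {v} for v in outside]
    if len(selected) > 1:  # assumption: never move to the empty model
        result += [selected - {v} for v in selected]
    result += [(selected - {o}) | {i} for o in selected for i in outside]
    return result

def descent(start, n, cost):
    current = frozenset(start)
    while True:
        # Execute the best move in the neighborhood if it improves the cost.
        best = min(neighbors(current, n), key=cost, default=current)
        if cost(best) >= cost(current):
            return current  # no improving move: a local minimum
        current = best

A, B = frozenset({0, 1, 2}), frozenset({3, 4, 5})

def toy_cost(subset):
    # Hypothetical criterion: distance to A plus 1, or distance to B.
    return min(len(subset ^ A) + 1, len(subset ^ B))

print(sorted(descent({0, 1}, 6, toy_cost)))  # [0, 1, 2], a local minimum
print(sorted(descent({3, 4}, 6, toy_cost)))  # [3, 4, 5], the global minimum
```

The two runs differ only in the starting subset, which is exactly the dependence on the starting point noted above.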


Intuition

Consider a plane full of craters.

The bottom of one of the craters, the deepest one, is the optimal solution we are looking for.

The descent algorithm starts at a random point in the plane and “slides” into the nearest crater, not necessarily the deepest one.


So, how can you get out of a crater and possibly land in a deeper one?

One must perform a sequence of upward moves to get out of the crater and hopefully slide into a deeper one.

In tabu search the best possible move (whether improving or not) is performed with one stipulation: recent inverse moves are not allowed.


A list of tabu moves is created.

The best move which is not in the tabu list is performed.

There is one exception to the rule: a move leading to a solution better than the best one found so far in the search is permitted.


The tabu tenure is the number of iterations a tabu move stays in the tabu list.

Once a move is performed, the reverse move is entered into the tabu list and, if the length of the tabu list exceeds the tabu tenure, the oldest move (the one that has been in the list longer than the tabu tenure) is removed from the list.


Other rules for tabu moves can be devised.

For example, once a variable is removed from the selected set (either as a removed variable or one of the exchanged variables) it is entered into the tabu list.

Entering such a variable back into the selected set is forbidden for the next tabu tenure iterations.


Such a rule for the tabu list prevents short cycles. A cycle longer than the tabu tenure is possible, but in our experience it has never been observed.

Determining the value of the tabu tenure can impact the effectiveness of the tabu search procedure.


Common values are between 10% and 20% of the number of possible moves.

When the strategy of listing variables recently removed from the selected set is used, there are at most n possible elements to be included in the tabu list.

So, between 10% and 20% of n (but at least 5) is a reasonable choice.

If we have fewer than 5 members in the tabu list we run the danger of cycling.


There are many variations researched in the literature about selecting the length of the tabu list (Tabu tenure).

One successful strategy is to select the tabu tenure randomly every iteration.

This reduces the probability of cycling to almost zero.


Suppose we randomly select the tabu tenure between 10% and 50% of n.

For example, for n=50 we randomly select the tabu tenure between 5 and 25 every iteration.

In one iteration the tabu list may be 7 variables long and in the next one 15 variables long, and so on.
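
Drawing the tenure can be sketched in a few lines. The function name, the uniform integer draw, and the floor of 1 for very small n are assumptions of this sketch; the 10% and 50% bounds are the ones given above.

```python
import random

def random_tenure(n, rng, low=0.10, high=0.50):
    # Tenure drawn uniformly between 10% and 50% of n (at least 1).
    lo = max(1, round(low * n))
    hi = max(lo, round(high * n))
    return rng.randint(lo, hi)

rng = random.Random(0)
tenures = [random_tenure(50, rng) for _ in range(10)]
print(tenures)  # ten values, each between 5 and 25
```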


Handling the tabu list is done efficiently as follows.

For each variable we record the iteration number at which it was taken out from the selected set (or a large negative number when it was never taken out).

A variable considered for entering the selected set is tabu if the difference between the current iteration number and this record does not exceed the tabu tenure.
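
The bookkeeping described above can be put together into a compact tabu search sketch. It is an illustration under stated assumptions, not the VBA macro: `toy_cost` is a hypothetical criterion to minimize, the iteration count is arbitrary, and the empty subset is never visited. The removal record, the random tenure between 10% and 50% of n, and the exception for moves that beat the best solution found so far follow the slides.

```python
import random

NEVER = -10**9  # "large negative number": the variable was never removed

def moves(current, n):
    """Yield (new_subset, entering, leaving) for add, remove, and exchange moves."""
    outside = [v for v in range(n) if v not in current]
    for v in outside:
        yield current | {v}, v, None
    if len(current) > 1:  # assumption: never move to the empty model
        for v in current:
            yield current - {v}, None, v
    for leaving in current:
        for entering in outside:
            yield (current - {leaving}) | {entering}, entering, leaving

def tabu_search(start, n, cost, iterations=200, seed=0):
    rng = random.Random(seed)
    current = frozenset(start)
    best, best_cost = current, cost(current)
    removed_at = [NEVER] * n  # iteration at which each variable was last removed
    low, high = max(1, round(0.1 * n)), max(1, round(0.5 * n))
    for it in range(iterations):
        tenure = rng.randint(low, high)  # new random tenure every iteration
        allowed = []
        for subset, entering, leaving in moves(current, n):
            c = cost(subset)
            tabu = entering is not None and it - removed_at[entering] <= tenure
            if not tabu or c < best_cost:  # exception: new best solution
                allowed.append((c, subset, leaving))
        if not allowed:
            continue  # every move is tabu; wait for tenures to expire
        # Perform the best allowed move, whether improving or not.
        c, current, leaving = min(allowed, key=lambda move: move[0])
        if leaving is not None:
            removed_at[leaving] = it
        if c < best_cost:
            best, best_cost = current, c
    return best, best_cost

A, B = frozenset({0, 1, 2}), frozenset({3, 4, 5})

def toy_cost(subset):
    # Hypothetical criterion with a local minimum at A and the global one at B.
    return min(len(subset ^ A) + 1, len(subset ^ B))

best, best_cost = tabu_search({0, 1, 2}, 6, toy_cost)
print(sorted(best), best_cost)
```

Checking whether a variable is tabu costs one subtraction and one comparison, and no explicit list has to be maintained or trimmed. Whether a given run escapes the local minimum depends on the tenure draws and the number of iterations.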


A Word of Caution

In most cases the tabu search finds the best (optimal) solution.

However, the tabu search does not guarantee that the best solution is obtained.

It is therefore recommended to run the VBA macro several times and select the best solution.

Because of the random nature of the procedure you may get different solutions each time the code is run.


The Excel File

An Excel file with a program coded in VBA (Visual Basic for Applications) is available.



Metaheuristic Algorithms

Ascent/Descent

Simulated Annealing

Tabu Search

Genetic Algorithms


Simulated Annealing

Simulated annealing (Kirkpatrick et al., 1983) simulates the cooling of melted metals.

The algorithm in its simplest form depends on 3 parameters: the starting temperature, the factor by which the temperature is reduced every iteration, and the number of iterations.


Simulated Annealing Algorithm

The temperature T is set to the starting temperature.

A starting solution P is randomly generated.

One move (as in ascent/descent) is randomly selected.

The move is accepted if it is an improving move.

If the move deteriorates the value of the objective by D, it is accepted with a probability of exp(-D/T).

The temperature is lowered by the factor.

Stop when the number of iterations is reached.
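
The steps above can be sketched as follows for the subset-selection setting. The parameter values, the add/remove/exchange move set, and `toy_cost` (a hypothetical criterion to minimize) are assumptions of this sketch.

```python
import math
import random

def neighbors(selected, n):
    """Subsets reachable by adding, removing, or exchanging one variable."""
    outside = [v for v in range(n) if v not in selected]
    result = [selected | {v} for v in outside]
    if len(selected) > 1:  # assumption: never move to the empty model
        result += [selected - {v} for v in selected]
    result += [(selected - {o}) | {i} for o in selected for i in outside]
    return result

def simulated_annealing(n, cost, start_temp=1.0, factor=0.99,
                        iterations=2000, seed=0):
    rng = random.Random(seed)
    temp = start_temp
    # Random non-empty starting solution.
    current = frozenset(v for v in range(n) if rng.random() < 0.5)
    current = current or frozenset({rng.randrange(n)})
    best = current
    for _ in range(iterations):
        candidate = rng.choice(neighbors(current, n))  # one random move
        d = cost(candidate) - cost(current)            # deterioration D
        if d <= 0 or rng.random() < math.exp(-d / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= factor  # lower the temperature by the factor
    return best

A, B = frozenset({0, 1, 2}), frozenset({3, 4, 5})

def toy_cost(subset):
    # Hypothetical criterion with a local minimum at A and the global one at B.
    return min(len(subset ^ A) + 1, len(subset ^ B))

best = simulated_annealing(6, toy_cost)
print(sorted(best), toy_cost(best))
```

The sketch keeps the best solution seen during the run, a common addition that the steps above do not require. The temperature here stays positive for the chosen parameters; other settings that drive it to zero would need a guard before the division.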


Intuition

Simulated annealing is like a bouncing rubber ball.

When the temperature is high, the bounce is high.

The bounces become shorter and shorter.

It is easier to get out of a shallow crater and more difficult to get out of a deep one.

We hope that the ball settles at the bottom of the deepest crater.


Genetic Algorithms

Genetic algorithms (Holland, 1975) are borrowed from the natural sciences and Darwin's law of natural selection and survival of the fittest.

Genetic algorithms are based on the premise that, like in nature, successful matching of parents will tend to produce better, improved offspring.


Genetic Algorithms (Cont’d)

A population of solutions is randomly generated.

The following is repeated for G generations.

Two parents are selected and merged to produce an offspring.

If the offspring is better than the worst population member, it replaces it.

At the end of the process, the best population member is the solution.
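
The generation loop can be sketched as follows. The population size, the uniform random choice of parents, and `toy_cost` are assumptions of this sketch. In particular, `simple_merge` is only a placeholder that keeps the common variables and a random half of the others; it is not the merging process the slides describe.

```python
import random

def random_subset(n, rng):
    # Random non-empty subset of the variables 0..n-1.
    subset = frozenset(v for v in range(n) if rng.random() < 0.5)
    return subset or frozenset({rng.randrange(n)})

def simple_merge(p1, p2, rng):
    # Placeholder merge: common variables plus a random half of the rest.
    child = (p1 & p2) | {v for v in (p1 ^ p2) if rng.random() < 0.5}
    return child or p1

def genetic_algorithm(population, cost, generations, rng):
    population = list(population)
    for _ in range(generations):
        p1, p2 = rng.sample(population, 2)      # select two parents
        child = simple_merge(p1, p2, rng)       # merge into an offspring
        worst = max(range(len(population)), key=lambda i: cost(population[i]))
        if cost(child) < cost(population[worst]):
            population[worst] = child           # replace the worst member
    return min(population, key=cost)            # best member is the solution

A, B = frozenset({0, 1, 2}), frozenset({3, 4, 5})

def toy_cost(subset):
    # Hypothetical criterion to minimize.
    return min(len(subset ^ A) + 1, len(subset ^ B))

rng = random.Random(0)
initial = [random_subset(6, rng) for _ in range(10)]
best = genetic_algorithm(initial, toy_cost, generations=200, rng=rng)
print(sorted(best), toy_cost(best))
```

Because a member is replaced only by a strictly better offspring, the best value in the population never gets worse from one generation to the next.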


Discussion

There are many fine-tuning techniques for the process.

The most important part of genetic algorithms is the merging process to produce the offspring.


The Merging Process

The following merging process was found to be very effective.

The two selected parents consist of two lists of variables of lengths p1 and p2.

The two lists have c variables in common.

The union of the two lists has p1+p2-c variables.


Merging process (Cont’d)

Several randomly selected variables (we used 4 variables) not in the union are added to the union.

The c common variables must be included in the offspring.

To create the offspring we select additional variables to add to these common ones.

These are selected by applying a descent/ascent approach to select out of the p1+p2-c+4 variables in the extended union.
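
A minimal sketch of this merging process follows. Several details are assumptions of the sketch, since the slides do not specify them: the descent starts from the first parent, it uses add, remove, and exchange moves restricted to the non-common variables of the extended union, and `toy_cost` is a hypothetical criterion to minimize.

```python
import random

def merge(parent1, parent2, n, cost, rng, extra=4):
    p1, p2 = frozenset(parent1), frozenset(parent2)
    common = p1 & p2                      # must stay in the offspring
    union = p1 | p2
    outside = [v for v in range(n) if v not in union]
    # Extend the union by a few random variables that are in neither parent.
    extended = union | set(rng.sample(outside, min(extra, len(outside))))
    free = extended - common              # variables the descent may add or drop
    current = p1                          # assumption: start from the first parent
    while True:
        inside = [v for v in free if v in current]
        absent = [v for v in free if v not in current]
        candidates = [current | {v} for v in absent]
        candidates += [current - {v} for v in inside if len(current) > 1]
        candidates += [(current - {o}) | {i} for o in inside for i in absent]
        best = min(candidates, key=cost, default=current)
        if cost(best) >= cost(current):
            return current                # no improving move is left
        current = best

TARGET = frozenset({0, 1, 2, 3})

def toy_cost(subset):
    # Hypothetical criterion: number of variables that differ from TARGET.
    return len(subset ^ TARGET)

child = merge({0, 1, 5}, {1, 2, 6}, 10, toy_cost, random.Random(0))
print(sorted(child))
```

In this example the common variable 1 is always kept, and variable 3 can enter the offspring only if it happens to be among the randomly added variables.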


Questions?
