CpSc 881: Machine Learning Genetic Algorithm


Page 1

CpSc 881: Machine Learning

Genetic Algorithm

Page 2

Copy Right Notice

Most slides in this presentation are adapted from slides of the textbook and various sources. The copyright belongs to the original authors. Thanks!

Page 3

Overview of Genetic Algorithm (GA)

GA is a learning method motivated by analogy to biological evolution.

GAs search the hypothesis space by generating successor hypotheses which repeatedly mutate and recombine parts of the best currently known hypotheses.

In Genetic Programming (GP), entire computer programs are evolved to meet certain fitness criteria.

Page 4

Biological Evolution

Lamarck and others: species "transmute" over time.

Darwin and Wallace: consistent, heritable variation among individuals in a population; natural selection of the fittest.

Mendel and genetics: a mechanism for inheriting traits; genotype -> phenotype mapping.

Page 5

Genetic Algorithms

The algorithm operates by iteratively updating a pool of hypotheses, called the population.

On each iteration, all members of the population are evaluated according to the fitness function.

A new population is then generated by probabilistically selecting the most fit individuals from the current population.

Some of these selected individuals are carried forward into the next-generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operators.

Page 6

GA

GA(Fitness, Threshold, p, r, m)

Initialize population: generate p hypotheses at random -> P.

Evaluate: for each h in P, compute Fitness(h).

While max_h Fitness(h) < Threshold do:

Select: probabilistically select (1-r)p members of P. Call this new generation Pnew.

Crossover: probabilistically select (r*p)/2 pairs of hypotheses from P. For each pair <h1, h2>, produce two offspring by applying the crossover operator. Add all offspring to Pnew.

Mutate: choose m% of Pnew with uniform probability. For each, invert one randomly selected bit in its representation.

Update: P <- Pnew.

Evaluate: for each h in P, compute Fitness(h).

Return the hypothesis from P that has the highest fitness.

Selection probability (fitness-proportionate):

Pr(h_i) = Fitness(h_i) / Σ_{j=1}^{p} Fitness(h_j)
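The loop above can be sketched in Python. This is a minimal illustration, not the textbook's code: the bit-list representation, single-point crossover, and parameter handling below are assumptions consistent with the slide.

```python
import random

def ga(fitness, threshold, p, r, m, length, max_gens=200):
    """GA(Fitness, Threshold, p, r, m) from the slide: p = population size,
    r = fraction replaced by crossover each generation, m = fraction of
    individuals mutated. Hypotheses are bit strings stored as lists of 0/1."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(p)]
    for _ in range(max_gens):
        if max(fitness(h) for h in pop) >= threshold:
            break
        weights = [fitness(h) for h in pop]  # fitness must be positive here
        # Select: probabilistically keep (1-r)p members (fitness-proportionate).
        survivors = [h[:] for h in
                     random.choices(pop, weights=weights, k=int((1 - r) * p))]
        # Crossover: (r*p)/2 pairs, two offspring each (single-point crossover).
        offspring = []
        for _ in range(int(r * p / 2)):
            h1, h2 = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, length)
            offspring += [h1[:cut] + h2[cut:], h2[:cut] + h1[cut:]]
        new_pop = survivors + offspring
        # Mutate: invert one randomly chosen bit in m% of the new population.
        for h in random.sample(new_pop, int(m * len(new_pop))):
            i = random.randrange(length)
            h[i] = 1 - h[i]
        pop = new_pop
    return max(pop, key=fitness)
```

Note that fitness-proportionate selection requires positive fitness values, which is why a toy objective such as sum-of-bits is usually shifted by +1.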

Page 7

Representing Hypotheses

In GAs, hypotheses are often represented by bit strings so that they can be easily manipulated by genetic operators such as mutation and crossover.

Examples:

Represent (Outlook = Overcast v Rain) ^ (Wind = Strong) by

Outlook Wind
011     10

Represent IF Wind = Strong THEN PlayTennis = yes by

Outlook Wind PlayTennis
111     10   10
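A minimal sketch of this encoding in Python; the attribute orderings and the encode helper below are illustrative assumptions.

```python
# One bit per attribute value: bit i is 1 iff values[i] is allowed.
# These attribute orderings are assumed for illustration.
OUTLOOK = ["Sunny", "Overcast", "Rain"]
WIND = ["Strong", "Weak"]

def encode(allowed, values):
    """Bit string for one attribute constraint."""
    return "".join("1" if v in allowed else "0" for v in values)

# (Outlook = Overcast v Rain) ^ (Wind = Strong)  ->  "011 10"
precondition = encode({"Overcast", "Rain"}, OUTLOOK) + " " + encode({"Strong"}, WIND)
```

An all-ones substring (e.g., "11" for Wind) places no constraint on that attribute, which is how the rule with no Outlook condition is encoded as 111.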

Page 8

Genetic Operators

Crossover techniques produce two new offspring from two parent strings by copying selected bits from each parent. The choice of which parent contributes the bit for position i is determined by an additional string called the crossover mask.

Single-point Crossover: Mask example: 11111000000

Two-point Crossover. Mask example: 00111110000

Uniform Crossover. Mask example: 10011010011
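The mask mechanism can be sketched as follows; the crossover function is a hypothetical helper, and the masks are the slide's examples.

```python
# Offspring 1 takes its bit from parent 1 where the mask is 1 and from
# parent 2 where it is 0; offspring 2 is the mirror image.
def crossover(parent1, parent2, mask):
    child1 = "".join(a if m == "1" else b for a, b, m in zip(parent1, parent2, mask))
    child2 = "".join(b if m == "1" else a for a, b, m in zip(parent1, parent2, mask))
    return child1, child2

single_point = "11111000000"   # all 1s then all 0s: one crossover point
two_point    = "00111110000"   # a contiguous block of 1s: two crossover points
uniform      = "10011010011"   # each mask bit chosen independently at random
```

With an all-ones parent and an all-zeros parent, offspring 1 simply equals the mask, which makes the three techniques easy to compare.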

Page 9

Genetic Operators

Mutation techniques produce small random changes to the bit string.

Point Mutation

Page 10

Select Most Fit Hypothesis

A simple measure for modeling the probability that a hypothesis will be selected is given by the fitness proportionate selection (or roulette wheel selection):

Pr(h_i) = Fitness(h_i) / Σ_{j=1}^{p} Fitness(h_j)

This simple measure can lead to crowding

Tournament selection: pick h1, h2 at random with uniform probability; with probability p, select the more fit.

Rank selection: sort all hypotheses by fitness; the probability of selection is proportional to rank.

In classification tasks, the fitness function typically has a component that scores classification accuracy over a set of provided training examples. Other criteria can be added (e.g., complexity or generality of the rule).
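The three selection schemes above can be sketched in Python; the function names and the tournament probability default are assumptions.

```python
import random

def roulette(pop, fitness):
    """Fitness-proportionate: Pr(h_i) = Fitness(h_i) / sum_j Fitness(h_j)."""
    return random.choices(pop, weights=[fitness(h) for h in pop], k=1)[0]

def tournament(pop, fitness, p=0.8):
    """Pick two at random with uniform probability; with probability p
    return the fitter one, otherwise the weaker one."""
    h1, h2 = random.sample(pop, 2)
    fitter, weaker = (h1, h2) if fitness(h1) >= fitness(h2) else (h2, h1)
    return fitter if random.random() < p else weaker

def rank(pop, fitness):
    """Sort by fitness; selection probability proportional to rank."""
    ranked = sorted(pop, key=fitness)          # worst first: ranks 1..n
    weights = list(range(1, len(ranked) + 1))
    return random.choices(ranked, weights=weights, k=1)[0]
```

Tournament and rank selection look only at fitness *order*, not magnitude, which is why they resist the crowding effect that pure roulette selection can cause.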

Page 11

GABIL (DeJong et al. 1993)

Learns disjunctive sets of propositional rules; competitive with C4.5.

Fitness: Fitness(h) = (correct(h))^2

Representation:

IF a1=T ∧ a2=F THEN c=T; IF a2=T THEN c=F

represented by:

a1 a2 c  a1 a2 c
10 01 1  11 10 0

Genetic operators: standard mutation operators; extended two-point crossover.

Want variable-length rule sets

Want only well-formed bitstring hypotheses

Page 12

Crossover with Variable-Length Bit-strings

Start with:

   a1 a2 c  a1 a2 c
h1 10 01 1  11 10 0
h2 01 11 0  10 01 0

1. choose crossover points for h1, e.g., after bit 1, 8

2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1,3>, <1,8>, <6, 8>

Let d1 and d2 denote the distances from the leftmost and rightmost of the two crossover points in h1 to the rule boundary immediately to its left. Then the crossover points in h2 must have the same d1 and d2 values.

If we choose <1,3>, the result is:

   a1 a2 c
h3 11 10 0

   a1 a2 c  a1 a2 c  a1 a2 c
h4 00 01 1  11 11 0  10 01 0

Page 13

GABIL Extensions

Add new genetic operators, also applied probabilistically

1. AddAlternative: generalize the constraint on a_i by changing a 0 to a 1.
2. DropCondition: generalize the constraint on a_i by changing every 0 to a 1.

And add new fields to the bitstring to determine whether to allow these operators:

a1 a2 c  a1 a2 c  AA DC
01 11 0  10 01 0  1  0

So now the learning strategy also evolves.

Page 14

GABIL Results

Performance of GABIL is comparable to symbolic rule/tree learning methods such as C4.5, ID5R, and AQ14.

Average performance on a set of 12 synthetic problems:

GABIL without the AA and DC operators: 92.1%
GABIL with the AA and DC operators: 95.2%
Symbolic learning methods ranged from 91.2% to 96.6%.

Page 15

Hypothesis Space Search

GA search can move very abruptly (compared with Backpropagation, for example), replacing a parent hypothesis with an offspring that may be radically different from the parent.

The problem of crowding: when one individual is more fit than others, this individual and closely related ones will take up a large fraction of the population.

Solutions:

Use tournament or rank selection instead of roulette selection.

Fitness sharing: the measured fitness of an individual is reduced by the presence of other, similar individuals in the population.

Restrict the kinds of individuals allowed to recombine to form offspring.

Page 16

The Schema Theorem [Holland, 75]

Definition: A schema is any string composed of 0s, 1s and *s where * means ‘don’t care’.

Example: schema 0*10 represents strings 0010 and 0110.

Characterize the population by the number of instances representing each possible schema.

Page 17

Consider Just Selection

f̄(t) = average fitness of the population at time t

m(s, t) = number of instances of schema s in the population at time t

û(s, t) = average fitness of instances of s at time t

Probability of selecting h in one selection step:

Pr(h) = f(h) / Σ_{i=1}^{n} f(h_i) = f(h) / (n·f̄(t))

Probability of selecting an instance of s in one step:

Pr(h ∈ s) = Σ_{h ∈ s∩p_t} f(h) / (n·f̄(t)) = (û(s,t) / (n·f̄(t))) · m(s,t)

Expected number of instances of s after n selections:

E[m(s, t+1)] = (û(s,t) / f̄(t)) · m(s,t)

Page 18

Schema Theorem

The full schema theorem provides a lower bound on the expected frequency of schema s, as follows:

p_c = probability of the single-point crossover operator
p_m = probability of the mutation operator
l = length of a single bit string
o(s) = number of defined (non-"*") bits in s
d(s) = distance between the leftmost and rightmost defined bits in s

The Schema Theorem: More fit schemas will tend to grow in influence, especially schemas containing a small number of defined bits (i.e., containing a large number of *s), and especially when these defined bits are near one another within the bit string.

E[m(s, t+1)] ≥ (û(s,t) / f̄(t)) · m(s,t) · (1 − p_c · d(s)/(l−1)) · (1 − p_m)^o(s)
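As a numeric illustration of the bound (all values below are invented for this example, not from the slides):

```latex
% Illustrative values (assumptions): l = 11, o(s) = 2, d(s) = 3,
% p_c = 0.7, p_m = 0.01, m(s,t) = 20, \hat{u}(s,t)/\bar{f}(t) = 1.2
E[m(s,t+1)] \;\ge\; 1.2 \cdot 20 \cdot
  \left(1 - 0.7 \cdot \tfrac{3}{10}\right) \cdot (1 - 0.01)^{2}
  \;=\; 24 \cdot 0.79 \cdot 0.9801 \;\approx\; 18.6
```

Despite the 1.2 relative-fitness advantage, the crossover and mutation disruption terms pull the guaranteed lower bound (≈18.6) below the current count of 20, showing how a long defining length d(s) penalizes a schema.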

Page 19

Genetic Programming

Genetic programming is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings.

Population of programs represented by trees

Page 20

Genetic Programming

On each iteration, GP produces a new generation of individuals using selection, crossover, and mutation.

The fitness of a given individual program in the population is typically determined by executing the program on a set of training data.

Page 21

Crossover
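In GP, crossover swaps subtrees between two parent programs. A minimal sketch, assuming programs are represented as nested tuples; the helper names are hypothetical.

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):                # ('op', child, child, ...)
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def gp_crossover(t1, t2):
    """Swap a randomly chosen subtree of t1 with one of t2."""
    p1, s1 = random.choice(list(subtrees(t1)))
    p2, s2 = random.choice(list(subtrees(t2)))
    return replace(t1, p1, s2), replace(t2, p2, s1)
```

Because whole subtrees are exchanged, offspring are always syntactically valid programs, unlike arbitrary string splices.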

Page 22

Models of Evolution and Learning I: Lamarckian Evolution [Late 19th C]

Proposition: experiences of a single organism directly affect the genetic makeup of their offspring.

Assessment: this proposition is wrong: the genetic makeup of an individual is unaffected by the lifetime experience of its biological parents.

However: Lamarckian processes can sometimes improve the effectiveness of computerized genetic algorithms.

Page 23

Models of Evolution and Learning II: Baldwin Effect

Assume:

Individual learning has no direct influence on individual DNA.
But the ability to learn reduces the need to "hard wire" traits in DNA.

Then:

The ability of individuals to learn will support a more diverse gene pool, because learning allows individuals with various "hard wired" traits to be successful.

A more diverse gene pool will support faster evolution of the gene pool.

Individual learning (indirectly) increases the rate of evolution.

Page 24

Models of Evolution and Learning II: Baldwin Effect

Plausible example:

A new predator appears in the environment.
Individuals who can learn (to avoid it) will be selected.
The increase in learning individuals will support a more diverse gene pool,
resulting in faster evolution,
possibly resulting in new non-learned traits such as an instinctive fear of the predator.

Page 25

Computer Experiments on Baldwin Effect

Evolve simple neural networks:

Some network weights are fixed during lifetime; others are trainable.
Genetic makeup determines which are fixed, and their weight values.

Results:

With no individual learning, the population failed to improve over time.

When individual learning was allowed:

Early generations: the population contained many individuals with many trainable weights.

Later generations: higher fitness, while the number of trainable weights decreased.

Page 26

Summary: Evolutionary programming

Conduct randomized, parallel, hill-climbing search through H

Approach learning as optimization problem (optimize fitness)

Nice feature: evaluation of Fitness can be very indirect

Consider learning rule sets for multistep decision making: there is no issue of assigning credit/blame to individual steps.

Page 27

Parallelizing Genetic Algorithms

GAs are naturally suited to parallel implementation. Different approaches have been tried:

Coarse grain: subdivides the population into distinct groups of individuals (demes) and conducts a GA search in each deme. Transfer between demes occurs (though infrequently) by a migration process in which individuals from one deme are copied or transferred to other demes.

Fine grain: one processor is assigned per individual in the population, and recombination takes place among neighboring individuals.
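The coarse-grain migration step can be sketched as follows. The ring topology, sum-of-bits fitness, and the migrate helper are illustrative assumptions; any GA iteration can run inside each deme between migrations.

```python
# Island (deme) model: each deme is a list of bit-string individuals.
# Periodically, the best individual of each deme is copied into the next
# deme on a ring, replacing that deme's worst individual.
def migrate(demes):
    bests = [max(d, key=sum) for d in demes]     # sum-of-bits fitness, illustrative
    for i, deme in enumerate(demes):
        incoming = bests[(i - 1) % len(demes)]   # ring topology
        deme[deme.index(min(deme, key=sum))] = incoming[:]  # replace the worst
    return demes
```

Infrequent migration keeps the demes semi-isolated, which preserves diversity across subpopulations while still spreading good genetic material.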