Modeling with IRENE
Integrated R-code for Engineered Neural Evolution
Trevor Grant and Olcay Akman
Department of Mathematics, Illinois State University

Overview

Neural Evolution
  What is a Neural Network?
  Using genetic algorithms to find optimal parameters to nonlinear functions
  Neural evolution

Special Population Attributes
  Jump Connections
  User defined libraries and learning functions
  Mutating learning functions

Engineered Genetic Algorithms

What is a Neural Network?

Starting out simple

We begin by modeling the data with a simple linear model. We then look at the sum of the squared residuals (SSR). A value is assigned to the model based on this SSR.

[Figure: linear model diagram with the Inputs (X1, X2, …, Xn) connected to the Output (Y) through the coefficients β0, β1, β2, β3]

Example

1974 Statistics regarding Income

Income: per capita income (1974)
Life Exp: life expectancy in years (1969–71)
Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
HS Grad: percent high-school graduates (1970)
Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city

[Figure: the linear model diagram labeled with the example variables: Income, Life Expectancy, Murder Rate, HS Grad %, Frost]

Residuals

The difference between the observed value and the fitted value is known as the residual.

Sum of Squared Residuals

[Figure: scatter plot of Height against Age with a fitted straight line]

A linear model is estimated which minimizes the sum of squared residuals (SSR), that is, the sum of the squared distances between the estimates and the actual data points.
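As a minimal sketch of this step, the snippet below fits a straight line by ordinary least squares and computes the SSR. The age and height numbers are made up for illustration; they are not data from the slides.

```python
import numpy as np

# Hypothetical age (years) and height (cm) observations, for illustration only.
age = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
height = np.array([86.0, 102.0, 115.0, 128.0, 138.0, 149.0])

# Design matrix with an intercept column: height ~ b0 + b1 * age.
X = np.column_stack([np.ones_like(age), age])

# Ordinary least squares picks the coefficients that minimize the SSR.
beta, *_ = np.linalg.lstsq(X, height, rcond=None)

fitted = X @ beta
residuals = height - fitted          # observed value minus fitted value
ssr = float(np.sum(residuals ** 2))  # the value assigned to the model

print("coefficients:", beta)
print("SSR:", ssr)
```

Any other choice of intercept and slope gives an SSR at least as large as this one.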

Relationship

Linear
  Traditionally we estimate linear relationships.

Nonlinear
  The true relationship may be (often is) non-linear.
  Sometimes we know the relationship and can use nonlinear regression methods such as
    Neural Networks
    Nonlinear least squares
  Sometimes we don’t know the functional form of the relationship.
  IRENE explores functional forms while estimating parameters.

Sum of Squared Residuals

[Figure: scatter plot of Height against Age with a fitted nonlinear curve]

A nonlinear model reduces the sum of squared residuals and better models the actual data.

Anatomy of a neural network

[Figure: neural network diagram with its Layers and Nodes labeled]

What’s in a node?

A node contains a learning function. The learning function takes input and parameters and converts them to output.

A model has parameter values α11, α12, α13, α14.

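Let’s pretend the first observation contains these values.

[Figure: node diagram showing the input values of the first observation; extracted labels: 22, 2, -14, .1]

Now say a model has these parameters:

[Figure: the same diagram with parameter values on the connections; extracted labels: 5-4.12, 2]

And the learning function on this node is exponential.

So the value for node h1 for the first observation is .1108.

[Figure: the same diagram with node h1 labeled .1108; extracted input labels: 1, 2, -14, .1]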

This is repeated for each observation
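A minimal sketch of how a node value could be computed for every observation. It assumes the node applies its learning function to a parameter-weighted sum of the inputs, which the slides do not state explicitly. The observations and parameters below are hypothetical, not the ones in the figures.

```python
import numpy as np

def exponential(z):
    """Exponential learning function."""
    return np.exp(z)

def node_value(x, alpha, learning_function):
    """Apply the learning function to the parameter-weighted inputs.

    Assumption: inputs are combined as a weighted sum before the
    learning function is applied.
    """
    return learning_function(np.dot(x, alpha))

# Hypothetical observations (one per row) and node parameters.
X = np.array([[1.0, 2.0, -1.0, 0.1],
              [0.5, 1.0,  0.0, 0.3],
              [2.0, 0.0,  1.0, 0.2]])
alpha = np.array([0.5, -0.4, 0.12, 2.0])

# One node value per observation.
h1 = node_value(X, alpha, exponential)
print(h1)

# For reference: an exponential node returns about .1108
# when its weighted input is -2.2.
print(exponential(-2.2))
```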


Each model has its own unique set of α. The fitted values of the output are functions of

Observation   h1
1             .1108
2             1.524
3             .529
4             1.011
…             …
n             1.752

After this is complete a linear model is estimated. The output is regressed on the values of the nodes in the last layer. The sum of the squared residuals is assigned as the model’s value.

The linear model estimated

The sum of the squared residuals of the model (SSR) is referred to as the value of the model. We want a model that minimizes sum of squared residuals (or value).
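A sketch of how a model’s value could be computed: the output is regressed on the last-layer node values and the SSR of that regression is returned. The data, the single exponential node and both parameter sets are assumptions made for the example.

```python
import numpy as np

def model_value(last_layer, y):
    """Regress y on the last-layer node values (plus an intercept)
    and return the sum of squared residuals."""
    H = np.column_stack([np.ones(len(y)), last_layer])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    residuals = y - H @ beta
    return float(residuals @ residuals)

# Hypothetical data generated from an exponential relationship plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1]) + 0.05 * rng.normal(size=50)

# Two hypothetical parameter sets for the same single-node topology.
alpha_good = np.array([0.3, -0.2, 0.0])
alpha_bad = np.array([-1.0, 1.0, 1.0])

value_good = model_value(np.exp(X @ alpha_good), y)
value_bad = model_value(np.exp(X @ alpha_bad), y)
print(value_good, value_bad)
```

The parameter set closer to the relationship that generated the data gets the lower value, which is the ordering the genetic algorithm exploits.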

Linear model estimated in a more complex neural network

[Figure: a network with nodes h11, h12, h13 feeding nodes h21, h22 in the final layer]

NOTE: h11, h12, h13 are not included in the final linear model. Only the nodes in the final layer are included in the linear model.

Optimizing Parameters with Genetic Algorithms

Step 1: A population of models is created, each with randomly assigned parameters.
Step 2: Models ‘mate’ in the hope of creating ‘children’ models with better value (lower SSR).

From now on we will refer to each unique set of parameters in a model as a creature. A collection of creatures, models with identical topology but different parameters, is referred to as a species.

Copy this model 200 times; each copy has randomly assigned parameter values.

Each individual collection of parameters is referred to as a creature. The collection of creatures for a given topology (arrangement of layers and nodes) is referred to as a species.

[Figure: diagram labeling a single Creature and the Species it belongs to]

Species

A species has a unique arrangement of nodes, layers and learning functions. Even though these creatures have the same arrangement of layers and nodes, they have a different learning function and so they are different species.

[Figure: Sigmoid Learning Function ≠ Exponential Learning Function]

Each creature then has its own computed value (SSR) and an assigned ID number; these are saved in a table.

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        215,635
003        3,612

Two creatures are selected with probability weighted according to model fitness.
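A sketch of fitness-weighted selection using the three SSR values from the table. The slides do not say how fitness is derived from SSR; the weight 1/SSR used here is an assumption chosen so that a lower SSR means a higher chance of being selected.

```python
import numpy as np

# SSR by model ID, as in the table.
ssr_by_id = {"001": 41240.0, "002": 215635.0, "003": 3612.0}

def select_parents(ssr_by_id, rng):
    """Draw two distinct creatures with probability proportional to 1/SSR."""
    ids = list(ssr_by_id)
    fitness = np.array([1.0 / ssr_by_id[i] for i in ids])  # assumption: fitness = 1/SSR
    probs = fitness / fitness.sum()
    mother, father = rng.choice(ids, size=2, replace=False, p=probs)
    return str(mother), str(father), dict(zip(ids, probs))

rng = np.random.default_rng(42)
mother, father, probs = select_parents(ssr_by_id, rng)
print(mother, father)
print(probs)
```

With these weights creature 003 is by far the most likely to be chosen and creature 002 the least likely.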



Each creature can be represented by DNA

Model Structure   α11     α12    α13     α14
                  2.512   .105   51.25   -15.2

Two methods of mating

Average
  Each parameter in the child’s DNA is the average of the corresponding parameters in the mother’s and father’s DNA.

Crossover
  A ‘cut point’ is randomly determined. Every parameter before the cut point is inherited from the father; every parameter after the cut point is inherited from the mother.

DNA is selected from the two creatures chosen to mate.



Mother: α11=2.512, α12=.105, α13=51.25, α14=-15.2

Father: α11=3.613, α12=26.252, α13=-25.12, α14=104.4

Average Method

Father: α11=3.613, α12=26.252, α13=-25.12, α14=104.4
Mother: α11=2.512, α12=.105, α13=51.25, α14=-15.2

Child:
  α11 = (3.613 + 2.512)/2 = 3.0625
  α12 = (26.252 + .105)/2 = 13.1785
  α13 = (-25.12 + 51.25)/2 = 13.065
  α14 = (104.4 - 15.2)/2 = 44.6
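The average method can be sketched in a few lines. The parameter values are the father and mother from the example; the function name is ours.

```python
# Parameter sequences (DNA) of the two parents from the example.
father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]

def mate_average(father, mother):
    """Child DNA: the element-wise average of the parents' parameters."""
    return [(f + m) / 2 for f, m in zip(father, mother)]

child = mate_average(father, mother)
print([round(c, 4) for c in child])
```

Up to floating-point rounding, the result matches the child computed on the slide.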

Crossover Method

A random number between one and the length of the parameter sequence is chosen. This is the ‘cut point’. The child inherits parameters from the father before this point and from the mother after it.

Crossover Method: Cut point at position two

Father: α11=3.613, α12=26.252, α13=-25.12, α14=104.4
Mother: α11=2.512, α12=.105, α13=51.25, α14=-15.2

Child: α11=3.613, α12=26.252 (from the father), α13=51.25, α14=-15.2 (from the mother)
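A sketch of the crossover method with the same parents. As in the example with the cut point at position two, the first `cut_point` parameters come from the father and the rest from the mother. The function name and the optional `cut_point` argument are ours.

```python
import random

father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]

def mate_crossover(father, mother, cut_point=None, rng=random):
    """Child DNA: father's parameters up to the cut point, mother's after it."""
    if cut_point is None:
        # Random integer between one and the length of the parameter sequence.
        cut_point = rng.randint(1, len(father))
    return father[:cut_point] + mother[cut_point:]

child = mate_crossover(father, mother, cut_point=2)
print(child)

# With no cut point given, one is drawn at random.
random_child = mate_crossover(father, mother)
print(random_child)
```

Note that a cut point equal to the sequence length returns a copy of the father.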

The least fit creatures are killed to make room for the new children

Model ID   SSR
001        41,240
002        3,289
003        215,635

After the least fit creature is removed:

Model ID   SSR
001        41,240
002        3,289

Child DNA:

Model Structure   α11=3.0625   α12=13.1785   α13=13.065   α14=44.6

The children are assigned new ID numbers and their value (SSR) is computed

Model ID   SSR
001        41,240
002        3,289
004        6,755

This process repeats several times

Model ID   SSR
001        41,240
002        3,289
004        6,755

Model ID   SSR
005        4,242
002        3,289
004        6,755

Model ID   SSR
005        4,242
002        3,289
007        3,111

Model ID   SSR
008        4,841
002        3,289
007        3,111

Eventually there is convergence at an optimum (either local or global)

Model ID   SSR
239        2,015
159        2,015
412        2,015
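The whole loop for one species can be sketched as follows. This is a toy version under several assumptions that are not in the slides: a single exponential node, 1/SSR selection weights, average mating, a small random mutation added to each child, and a fixed number of generations instead of a convergence test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training data generated from an exponential relationship.
x = rng.uniform(-1, 1, size=(40, 2))
y = np.exp(x @ np.array([0.8, -0.5]))

def value(alpha):
    """SSR of the final linear model y ~ 1 + exp(x . alpha)."""
    H = np.column_stack([np.ones(len(y)), np.exp(x @ alpha)])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    r = y - H @ beta
    return float(r @ r)

def evolve(n_creatures=30, n_generations=200):
    # Step 1: creatures with randomly assigned parameters.
    species = [rng.normal(size=2) for _ in range(n_creatures)]
    values = [value(a) for a in species]
    best_history = [min(values)]
    for _ in range(n_generations):
        # Step 2: select two parents, weighted by fitness (assumed 1/SSR).
        fitness = 1.0 / (np.array(values) + 1e-12)
        i, j = rng.choice(n_creatures, size=2, replace=False,
                          p=fitness / fitness.sum())
        # Average mating plus a small mutation (mutation scale is an assumption).
        child = (species[i] + species[j]) / 2 + rng.normal(scale=0.05, size=2)
        # The least fit creature is killed to make room for the child.
        worst = int(np.argmax(values))
        species[worst], values[worst] = child, value(child)
        best_history.append(min(values))
    return species, values, best_history

species, values, best_history = evolve()
print("best SSR at start:", best_history[0])
print("best SSR at end:  ", best_history[-1])
```

Because only the least fit creature is ever replaced, the best SSR in the species cannot get worse from one generation to the next.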

At convergence we kill all the extra creatures in the species (to free up memory)


What is neural evolution?

Neural evolution: simultaneously explore new topologies while optimizing existing topologies. New species are born out of old species.

‘Growing’ new nodes

(We don’t always wait for convergence to add new layers and nodes…)

We call each arrangement of layers, nodes and learning functions a species.

Who lives? Who dies?

After each generation a roster of all creatures is created and ordered according to value.

Species ID   Creature ID   Value (SSR)
003          043           12123
003          021           12552
002          231           13241
003          054           15125
001          152           20150
005          024           25124
003          122           35102
002          105           53039
…            …             …
001          412           124310151

Who lives? Who dies?

If at least one creature of a species is in the top 60%* of the list of all creatures, the species survives. Otherwise the entire species is eradicated.

*60% is arbitrary. We can set it to other proportions. We’ll talk about this more in engineered genetic algorithms.


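Example:

Species of each creature on the roster, ordered by value: 2, 2, 3, 3, 2, 2, 3, 1, 1, 3

[Figure: the roster with creatures of Species 1, Species 2 and Species 3; the top 60% is marked]

Survivors: No creature of Species 1 is among them.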
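The survival rule can be sketched with the roster from this example. The function name and the rounding of the cut-off position are our assumptions.

```python
def surviving_species(roster, cutoff=0.6):
    """Return the species with at least one creature in the top `cutoff` share.

    `roster` lists the species ID of every creature, ordered from the
    best value (lowest SSR) to the worst.
    """
    n_top = int(round(len(roster) * cutoff))  # assumption: round to nearest
    return set(roster[:n_top])

# Roster from the example, best creature first.
roster = [2, 2, 3, 3, 2, 2, 3, 1, 1, 3]

survivors = surviving_species(roster)
extinct = set(roster) - survivors
print("survivors:", survivors)
print("extinct:", extinct)
```

Species 2 and 3 survive, and Species 1 is eradicated, as on the slide.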

While each species searches for optimums, new ones are born and others die out.

We could search forever, but we stop our search based on time or generations elapsed.

Special Population Attributes

Jump connections, user defined libraries and learning functions, and mutating functional forms.

Jump Connections

With jump connections, the output is regressed on all nodes and inputs.

In a standard neural network, the output is regressed only on the nodes in the final layer.

[Figure: Jump Connections; network diagram with inputs x1, x2, x3, x4 and nodes h11, h12, h21, h22, h23]

Collinearity

If jump connections are used and the learning function is linear, then the final linear model will have perfect collinearity. (The computer won’t be able to estimate the final model, so a failsafe is built in to prevent this from happening.)
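A small numerical illustration of the problem, using made-up inputs and parameters. With a linear learning function the node is an exact linear combination of the inputs, so once jump connections put both the inputs and the node in the final regression, the design matrix loses a rank. An exponential node does not have this problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))      # hypothetical inputs x1, x2
alpha = np.array([1.5, -0.7])     # hypothetical node parameters

h_linear = X @ alpha              # node with a linear learning function
h_exp = np.exp(X @ alpha)         # node with an exponential learning function

def design(X, h):
    """Final-model design matrix with jump connections: intercept, inputs, node."""
    return np.column_stack([np.ones(len(X)), X, h])

rank_linear = np.linalg.matrix_rank(design(X, h_linear))
rank_exp = np.linalg.matrix_rank(design(X, h_exp))

print("columns:", design(X, h_linear).shape[1])
print("rank with linear node:", rank_linear)
print("rank with exponential node:", rank_exp)
```

A rank below the number of columns means the coefficients of the final linear model are not uniquely determined.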

Libraries of Learning Functions

Each time a node is created a learning function is randomly selected from the library.

Function 1: Exponential
Function 2: Sigmoid
Function 3: Logit
Function 4: Step Function
…

User Defined Functions

Suppose theory dictates that a particular nonlinear relationship possibly exists. For example, consider the Michaelis-Menten model of enzyme kinetics.

The researcher can add this functional form to the library to be selected as a possible learning function for nodes.

The standard library contains common functional forms; however, certain cases may require special functional forms, which can be added by the researcher as needed.
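A sketch of a library as a plain dictionary of functions, with a user-defined Michaelis-Menten form added to it. The Michaelis-Menten rate is v = Vmax·s / (Km + s). The names `standard_library`, `user_library` and `new_node`, and the default parameter values, are hypothetical and do not describe IRENE’s actual interface.

```python
import math
import random

def exponential(z):
    return math.exp(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    return 1.0 if z >= 0 else 0.0

# A few of the common functional forms (illustrative subset).
standard_library = {"exponential": exponential, "sigmoid": sigmoid, "step": step}

def michaelis_menten(s, v_max=1.0, k_m=0.5):
    """Michaelis-Menten rate: v = v_max * s / (k_m + s)."""
    return v_max * s / (k_m + s)

# The researcher extends the standard library with the special form.
user_library = dict(standard_library, michaelis_menten=michaelis_menten)

def new_node(library, rng=random):
    """Pick a learning function at random when a node is created."""
    name = rng.choice(sorted(library))
    return name, library[name]

name, learning_function = new_node(user_library)
print(name)
print(michaelis_menten(0.5))  # at s = k_m the rate is half of v_max
```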

Mutating learning functions

Function 3: Sigmoid
Function 5: Exponential
New Function: Composite

The researcher can also choose to allow for mutating learning functions.

[Figure: the new composite learning function]
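The formula of the composite did not survive in the slide text, so the sketch below makes an assumption: the mutation builds a new learning function by composing two existing ones, applying one to the output of the other. Other combinations, such as a sum or a product, would be equally easy to write.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def exponential(z):
    return math.exp(z)

def compose(outer, inner):
    """Build a composite learning function z -> outer(inner(z)).

    Assumption: 'composite' means function composition.
    """
    def composite(z):
        return outer(inner(z))
    composite.__name__ = f"{outer.__name__}_of_{inner.__name__}"
    return composite

sigmoid_of_exponential = compose(sigmoid, exponential)
print(sigmoid_of_exponential.__name__)
print(sigmoid_of_exponential(0.0))  # sigmoid(exp(0)) = sigmoid(1)
```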

Populations

Each collection of species is called a population.

Population attributes
  Max creatures in species
  Library
  Allow functional mutations
  Maximum layers
  Maximum nodes
  Mutation rates
  Allow jump connections
  etc.

Determining population attributes

How many generations should a population be allowed to run?
Should jump connections be allowed?
What portion of the roster should be the cut-off point for determining species survival?
Should function mutations be allowed?
And other settable attributes…

Populations can be represented with DNA too

Attribute                Population 1   Population 2
Max creatures            200            150
Library                  StdLib         UsrDef
Maximum Layers           3              4
Maximum Generations      5000           7000
Allow Jump Connections   YES            NO

Engineered Genetic Algorithms

Engineered Genetic Algorithms refers to using genetic algorithms to find the optimal population settings for a neural evolution algorithm.

Parsing the data set

[Figure: the data set of observations is split into a Training Data Set, a Validation Data Set and a Second Validation Data Set]
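A sketch of the three-way split. The slides do not give the proportions; the 60/20/20 split and the random shuffling below are assumptions.

```python
import numpy as np

def parse_data_set(n_obs, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return row indices for the training, validation and second
    validation sets. The fractions are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    n_train = int(round(fractions[0] * n_obs))
    n_valid = int(round(fractions[1] * n_obs))
    train = idx[:n_train]
    valid = idx[n_train:n_train + n_valid]
    valid2 = idx[n_train + n_valid:]   # whatever remains
    return train, valid, valid2

train, valid, valid2 = parse_data_set(100)
print(len(train), len(valid), len(valid2))
```

Every observation lands in exactly one of the three sets, so no creature is ever validated on data it was trained on.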

Evaluating Populations

Creatures are evaluated on how well they fit the training data. Creatures that minimize SSE in the training data set are considered most fit.

Populations are evaluated on how well they predict out of sample. The best creature the population is able to produce is evaluated on the validation data set and its SSE is computed. The population that produces the creature that minimizes SSE in the validation data set is considered most fit.

Example with 3 populations

[Figure: three populations, Pop1, Pop2 and Pop3]

Each population chooses its champion.

The Champion

Recall: each species is comprised of several creatures. The champion is the optimal creature of the optimal species in the population, i.e. the creature that best minimizes SSE in the entire population.

The Champions Compete

[Figure: the champions of Pop1, Pop2 and Pop3 are evaluated on the Validation Data Set]

Out of Sample Evaluation

[Figure: plot of the Real Data and the Pop1, Pop2 and Pop3 predictions on the Validation Data Set]

In this example, Pop3 performs best and Pop1 is worst. Pop3 and Pop2 are most likely to be selected for mating.

Population parameters come in two varieties

Numerical Continuous
  Examples:
    Max Layers (3)
    Initial Species Population (300)
  Mating may be either crossover or averaging.
  Need to round if averaging.

Switches
  Examples:
    Allow Jump Connections (TRUE/FALSE)
    Mating Rule (Average / Crossover / Both)
  Mating must be crossover, with a higher probability of selecting the father’s (the fitter population’s) traits.

Population mating

Attribute                    Father (Pop3)   Mother (Pop2)   Child (Pop4)
Jump Connections             YES             NO
Initial Species Population   300             150             225
Max Layers                   3               2
Max nodes per layer          7               4
Mutation Rate                .15             .05             .10

[Figure: intermediate step showing the parent values that feed each child attribute; extracted labels: YES; 300, 150; 3; 4; .15, .05]
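A sketch of population mating with the father and mother from this example. It averages the numerical attributes (rounding the integer ones) and passes switches on by crossover with a higher probability for the father. The probability 0.7, the dictionary keys and the function name are assumptions; the slides also allow crossover for numerical attributes, which is not shown here.

```python
import random

father = {"jump_connections": True, "initial_species_population": 300,
          "max_layers": 3, "max_nodes_per_layer": 7, "mutation_rate": 0.15}
mother = {"jump_connections": False, "initial_species_population": 150,
          "max_layers": 2, "max_nodes_per_layer": 4, "mutation_rate": 0.05}

def mate_populations(father, mother, p_father=0.7, rng=random):
    """Average numerical attributes; inherit switches from one parent."""
    child = {}
    for key in father:
        f, m = father[key], mother[key]
        if isinstance(f, bool):
            # Switch: crossover, with the father's trait more likely (assumed 0.7).
            child[key] = f if rng.random() < p_father else m
        else:
            avg = (f + m) / 2
            # Integer attributes must be rounded after averaging.
            # Note: Python's round() sends exact halves to the nearest even integer.
            child[key] = round(avg) if isinstance(f, int) else avg
    return child

child = mate_populations(father, mother)
print(child)
```

The averaged attributes reproduce the two child values that are legible on the slide: an initial species population of 225 and a mutation rate of .10.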

And then the new population (Pop4) searches for its champion, who will then go on to compete.

Museums

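Recall Previous Example:

Species of each creature on the roster, ordered by value: 4, 2, 4, 2, 4, 4, 2, 3, 2, 3

[Figure: the roster with creatures of Species 2, Species 3 and Species 4; the top 60% is marked]

Survivors: No creature of Species 3 is among them.

The optimal creature of the now extinct Species 3 is saved.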

Recall Previous Example:

Population 1: Museum of “Natural” History

This creature is saved in the Museum. The optimal creature of each species is also saved as it goes extinct. When the population has completed its specified number of generations, the optimal creature from each remaining species is also saved to the museum.

Why have a Museum?

Neural Networks may ‘over fit’ training data. A good predictive model may go extinct.

Validation Data Set

Evaluate models in the museum to make sure we didn’t miss a good predictive model.

The End… (not of the slide show, don’t get up yet)

At a specified ‘end’ of the algorithm all creatures from all museums are collected into a master list.

Each creature in the list is evaluated on the validation data set.

[Table: master list with the columns Creature and Value, evaluated on the Validation Data Set]

The model with the best value on the validation data set is chosen as the candidate. If it passes a second round of validation, it is selected. If it doesn’t pass the second round of validation, the next best model is tried.

[Figure: the master list is checked against the Second Validation Data Set]

SUCCESS!

And so the final Model is selected.

This is the predictive model the algorithm returns.
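The final selection can be sketched as a walk down the master list. The slides do not define what it means to ‘pass’ the second round of validation; the SSE threshold used here is an assumption, and the creatures and their values are made up.

```python
def select_final_model(master_list, passes_second_validation):
    """Try creatures from best to worst validation SSE and return the
    first one that passes the second round of validation."""
    for creature in sorted(master_list, key=lambda c: c["validation_sse"]):
        if passes_second_validation(creature):
            return creature
    return None  # no creature passed

# Hypothetical master list collected from all museums.
master_list = [
    {"id": "A", "validation_sse": 12.0, "second_sse": 90.0},
    {"id": "B", "validation_sse": 16.0, "second_sse": 18.0},
    {"id": "C", "validation_sse": 30.0, "second_sse": 25.0},
]

# Assumption: a creature passes if its SSE on the second validation
# data set is at or below a chosen threshold.
final = select_final_model(master_list, lambda c: c["second_sse"] <= 25.0)
print(final["id"])
```

Here creature A has the best validation value but fails the second round, so the next best creature, B, is returned.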

