Modeling with IRENE
Integrated R-code for Engineered Neural Evolution
Trevor Grant and Olcay Akman
Department of Mathematics, Illinois State University
Overview
- Neural Evolution
  - What is a Neural Network?
  - Using genetic algorithms to find optimal parameters to nonlinear functions
  - Neural evolution
- Special Population Attributes
  - Jump Connections
  - User defined libraries and learning functions
  - Mutating learning functions
- Engineered Genetic Algorithms
What is a Neural Network?

Starting out simple
We begin by modeling the data with a simple linear model. We then look at the sum of the squared residuals (SSR). A value is assigned to the model based on this SSR.
[Figure: linear model diagram with Inputs (X1, X2, …, Xn), coefficients β0, β1, β2, β3, and Output (Y)]
Example
1974 Statistics regarding Income
- Income: per capita income (1974)
- Life Exp: life expectancy in years (1969–71)
- Murder: murder and non-negligent manslaughter rate per 100,000 population (1976)
- HS Grad: percent high-school graduates (1970)
- Frost: mean number of days with minimum temperature below freezing (1931–1960) in capital or large city
[Figure: the linear model diagram labeled with Income, Life Expectancy, Murder Rate, HS Grad %, and Frost]
Residuals
The difference between the observed value and the fitted value is known as the residual.

Sum of Squared Residuals
[Figure: plot with axes Height and Age and a fitted line]
A linear model is estimated which minimizes the sum of squared residuals (SSR), that is, the squared distances between the estimates and the actual data points.
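As a minimal sketch of this step, the snippet below fits a linear model by ordinary least squares and reports its SSR. The function name `fit_linear_ssr` and the age/height numbers are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import numpy as np

def fit_linear_ssr(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; return (coefficients, SSR)."""
    design = np.column_stack([np.ones(len(y)), X])    # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta                     # observed minus fitted
    return beta, float(residuals @ residuals)

# Hypothetical data: height is an exact linear function of age.
age = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
height = 80.0 + 6.0 * age
beta, ssr = fit_linear_ssr(age.reshape(-1, 1), height)
```

Because the toy data are exactly linear, the fit recovers the intercept and slope and the SSR is essentially zero; real data would leave a positive SSR.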
Relationship

Linear
- Traditionally we estimate linear relationships.

Nonlinear
- The true relationship may be (often is) non-linear.
- Sometimes we know the relationship and can use nonlinear regression methods such as
  - Neural Networks
  - Nonlinear least squares
- Sometimes we don’t know the functional form of the relationship. IRENE explores functional forms while estimating parameters.

Sum of Squared Residuals
[Figure: plot with axes Height and Age and a nonlinear fitted curve]
A nonlinear model reduces the sum of squared residuals and better models the actual data.
Anatomy of a neural network
[Figure: network diagram labeling Layers and Nodes]

What’s in a node?
A node contains a learning function. The learning function takes input and parameters and converts them to output.
A model has parameter values α11, α12, α13, α14.
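Let’s pretend the first observation contains these values:
[Figure: node diagram with the observation's input values; extracted labels: 22, 2, -14, .1]
Now say a model has these parameters:
[Figure: the same diagram with parameter values on the connections; extracted labels: 5-4.12, 2]
And the learning function on this node is exponential.
[Figure: the same diagram with node h1; input labels extracted as 1, 2, -14, .1]
So the value for node h1 for the first observation is .1108.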
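A node evaluation of this kind can be sketched as follows. The functional form (the exponential of a bias plus a weighted sum of the inputs) is an assumption, and the input, weight, and bias values are illustrative choices, not a confirmed reading of the slide; they are picked so that the result matches the .1108 quoted above.

```python
import math

def exponential_node(inputs, weights, bias):
    """Assumed exponential learning function: exp(bias + sum of weight * input)."""
    return math.exp(bias + sum(w * x for w, x in zip(weights, inputs)))

inputs = [1, 2, -14, 0.1]    # illustrative observation (assumed values)
weights = [5, -4, 0.1, 2]    # illustrative parameters (assumed values)
h1 = exponential_node(inputs, weights, bias=2)
print(round(h1, 4))          # 0.1108, i.e. exp(-2.2)
```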
This is repeated for each observation.
Each model has its own unique set of α. The fitted values of the output are functions of α.

Observation   h1
1             .1108
2             1.524
3             .529
4             1.011
…             …
n             1.752
After this is complete a linear model is estimated. The output is regressed on the values of the nodes in the last layer. The sum of the squared residuals is assigned as the model’s value.
The linear model estimated
The sum of the squared residuals (SSR) of the model is referred to as the value of the model. We want the model that minimizes the sum of squared residuals (its value).
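The two-stage evaluation described here (compute node values, then regress the output on them) might look like the sketch below for a network with a single exponential node. The node form `exp(X @ alpha)`, the function name `model_value`, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def model_value(X, y, alpha):
    """SSR from regressing y on the last-layer node values (one exponential node)."""
    h = np.exp(X @ alpha)                            # node value for each observation
    design = np.column_stack([np.ones(len(y)), h])   # intercept plus the node
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
true_alpha = np.array([0.5, -0.3])
y = 3.0 + 2.0 * np.exp(X @ true_alpha)               # synthetic, noise-free output
good = model_value(X, y, true_alpha)                 # parameters that match the data
bad = model_value(X, y, np.array([-1.0, 1.0]))       # parameters that do not
```

A parameter set that reproduces the data-generating node gets a value near zero, while a poor parameter set gets a larger value, which is the quantity the genetic algorithm later compares.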
Linear model estimated in a more complex neural network
[Figure: two-layer network with nodes h11, h12, h13 in the first layer and h21, h22 in the final layer]
NOTE: h11, h12, h13 are not included in the final linear model. Only the nodes in the final layer are included in the linear model.
Optimizing Parameters with Genetic Algorithms
Step 1: A population of models is created, each with randomly assigned parameters.
Step 2: Models ‘mate’ in the hope of creating ‘children’ models with better value (lower SSR).
From now on we will refer to each unique set of parameters in a model as a creature. A collection of creatures, models with identical topology but different parameters, is referred to as a species.
Copy this model 200 times; each copy has randomly assigned parameter values.
Each individual collection of parameters is referred to as a creature. The collection of creatures for a given topology (arrangement of layers and nodes) is referred to as a species.
[Figure: diagram labeling a single Creature and the Species it belongs to]

Species
A species has a unique arrangement of nodes, layers and learning functions. Even though these creatures have the same arrangement of layers and nodes, they have different learning functions, and so they are different species.
[Figure: two networks with the same layout; Sigmoid Learning Function ≠ Exponential Learning Function]
Each creature then has its own computed value (SSR) and an assigned ID #. These are saved in a table.

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        215,635
003        3,612
Two creatures are selected with probability weighted according to model fitness.
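A minimal sketch of fitness-weighted selection is shown below. The slides do not state the weighting scheme, so weighting each creature by 1/SSR is an assumption, as is the function name `select_parents`; the roster reuses the three creatures from the table above.

```python
import random

def select_parents(ssr_by_id, rng):
    """Pick two distinct creatures, weighting each by 1/SSR (assumed weighting)."""
    ids = list(ssr_by_id)
    weights = [1.0 / ssr_by_id[i] for i in ids]
    mother = rng.choices(ids, weights=weights, k=1)[0]
    rest = [i for i in ids if i != mother]            # the father must differ
    rest_weights = [1.0 / ssr_by_id[i] for i in rest]
    father = rng.choices(rest, weights=rest_weights, k=1)[0]
    return mother, father

roster = {"001": 41240, "002": 215635, "003": 3612}
rng = random.Random(0)
picks = [select_parents(roster, rng) for _ in range(1000)]
count_003 = sum("003" in pair for pair in picks)      # the fittest creature
```

Under this weighting the creature with the lowest SSR (ID # 003) appears in almost every mating pair, while the least fit creature is rarely chosen.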
Each creature can be represented by DNA

Model Structure | α11 = 2.512 | α12 = .105 | α13 = 51.25 | α14 = -15.2

Two methods of mating
- Average: each parameter in the child’s DNA is the average of the corresponding parameters in the mother’s and father’s DNA.
- Crossover: a ‘cut point’ is randomly determined. Every parameter before the cut point is inherited from the father; every parameter after the cut point is inherited from the mother.
DNA is selected from the two creatures chosen to mate.
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Average Method
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child:
α11 = (3.613 + 2.512)/2 = 3.0625
α12 = (26.252 + .105)/2 = 13.1785
α13 = (-25.12 + 51.25)/2 = 13.065
α14 = (104.4 - 15.2)/2 = 44.6
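The averaging rule can be sketched in a few lines. The function name `mate_average` is hypothetical; the parameter values are the father's and mother's DNA from the example above.

```python
def mate_average(father, mother):
    """Child DNA: element-wise average of the two parents' parameters."""
    return [(f + m) / 2 for f, m in zip(father, mother)]

father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]
child = mate_average(father, mother)
```

Up to floating-point rounding, `child` holds 3.0625, 13.1785, 13.065 and 44.6, matching the worked arithmetic.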
Crossover Method
A random number between one and the length of the parameter sequence is chosen. This is the ‘cut point’. The child inherits parameters from the father before this point and from the mother after it.

Crossover Method: Cut point at position two
Father: α11 = 3.613, α12 = 26.252, α13 = -25.12, α14 = 104.4
Mother: α11 = 2.512, α12 = .105, α13 = 51.25, α14 = -15.2
Child: α11 = 3.613, α12 = 26.252 (from the father), α13 = 51.25, α14 = -15.2 (from the mother)
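A sketch of the crossover rule follows. The function name `mate_crossover` and its optional `cut` argument are hypothetical conveniences; when no cut point is given, one is drawn uniformly between one and the length of the parameter sequence, as the slide describes.

```python
import random

def mate_crossover(father, mother, cut=None, rng=random):
    """Child takes the first `cut` parameters from the father, the rest from the mother."""
    if cut is None:
        cut = rng.randint(1, len(father))   # inclusive on both ends
    return father[:cut] + mother[cut:]

father = [3.613, 26.252, -25.12, 104.4]
mother = [2.512, 0.105, 51.25, -15.2]
child = mate_crossover(father, mother, cut=2)
```

With the cut point at position two, `child` is `[3.613, 26.252, 51.25, -15.2]`, the same child as in the example. Note that a cut point equal to the sequence length yields a copy of the father.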
The least fit creatures are killed to make room for the new children.

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289
003        215,635

After the least fit creature (ID # 003) is removed:

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289

New child: Model Structure | α11 = 3.0625 | α12 = 13.1785 | α13 = 13.065 | α14 = 44.6
The children are assigned new ID numbers and their value (SSR) is computed.

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289
004        6,755
This process repeats several times. Successive rosters:

Model ID   Sum Squared Resid. (SSR)
001        41,240
002        3,289
004        6,755

Model ID   Sum Squared Resid. (SSR)
005        4,242
002        3,289
004        6,755

Model ID   Sum Squared Resid. (SSR)
005        4,242
002        3,289
007        3,111

Model ID   Sum Squared Resid. (SSR)
008        4,841
002        3,289
007        3,111
Eventually there is convergence at an optimum (either local or global).

Model ID   Sum Squared Resid. (SSR)
239        2,015
159        2,015
412        2,015

At convergence we kill all the extra creatures in the species (to free up memory).
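Putting the steps together, the loop for a single species might be sketched as below. This is a toy illustration under several assumptions: creatures are plain parameter lists, parents are drawn with weights 1/value (with replacement, so a creature may occasionally mate with itself), each mating uses averaging or crossover with equal probability, and one least-fit creature is replaced per generation. The names `evolve` and `toy_value` and the quadratic objective are hypothetical.

```python
import random

def evolve(value_fn, n_params, pop_size=200, generations=300, seed=0):
    """Toy genetic algorithm for one species; lower value means more fit."""
    rng = random.Random(seed)
    creatures = [[rng.uniform(-10, 10) for _ in range(n_params)]
                 for _ in range(pop_size)]
    for _ in range(generations):
        creatures.sort(key=value_fn)                     # most fit first
        weights = [1.0 / (1e-12 + value_fn(c)) for c in creatures]
        father, mother = rng.choices(creatures, weights=weights, k=2)
        if rng.random() < 0.5:                           # average method
            child = [(f + m) / 2 for f, m in zip(father, mother)]
        else:                                            # crossover method
            cut = rng.randint(1, n_params)
            child = father[:cut] + mother[cut:]
        creatures[-1] = child                            # replace the least fit
    return min(creatures, key=value_fn)

target = [3.0, -2.0]

def toy_value(params):
    """Stand-in for SSR: squared distance from a known optimum."""
    return sum((p - t) ** 2 for p, t in zip(params, target))

best = evolve(toy_value, n_params=2)
```

Because only the least fit creature is ever replaced, the best creature found so far is never lost, so the best value can only stay the same or improve from one generation to the next.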
What is neural evolution?
Neural evolution: simultaneously explore new topologies while optimizing existing topologies. New species are born out of old species.
‘Growing’ new nodes
(We don’t always wait for convergence to add new layers and nodes…)
We call each arrangement of layers, nodes and learning functions a species.
Who lives? Who dies?
After each generation a roster of all creatures is created and ordered according to value.

Species ID   Creature ID   Value (SSR)
003          043           12123
003          021           12552
002          231           13241
003          054           15125
001          152           20150
005          024           25124
003          122           35102
002          105           53039
…            …             …
001          412           124310151

Who lives? Who dies?
If at least one creature of a species is in the top 60%* of the list of all creatures, the species survives. Otherwise the entire species is eradicated.
*60% is arbitrary. We can set that to other proportions. We’ll talk about this more in engineered genetic algorithms.
Example
Roster by species, ordered by value: 2, 2, 3, 3, 2, 2, 3, 1, 1, 3
[Figure: the roster with the top 60% marked; Species 1, Species 2, and Species 3 labeled]
Survivors: No creature of Species 1 is among them.
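The survival rule can be sketched as follows, using the ten-creature roster from the example. The function name `surviving_species` is hypothetical, and rounding the cutoff up to a whole number of creatures is an assumption that does not matter for this roster.

```python
def surviving_species(roster_species, cutoff_percent=60):
    """roster_species: species IDs of all creatures, ordered best to worst by value."""
    n = len(roster_species)
    n_top = (cutoff_percent * n + 99) // 100   # ceiling of cutoff_percent% of n
    return set(roster_species[:n_top])

roster = [2, 2, 3, 3, 2, 2, 3, 1, 1, 3]
survivors = surviving_species(roster)
print(survivors)   # {2, 3}
```

Species 1 has no creature among the top six of ten, so it is the only species eradicated.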
While each species searches for optimums, new ones are born and others die out.
We could search forever, but we stop our search based on time or generations elapsed.
Special Population Attributes
Jump connections, user defined libraries and learning functions, and mutating functional forms.
Jump Connections
With jump connections, the output is regressed on all nodes and inputs.
In a standard neural network, the output is regressed only on the nodes in the final layer.
[Figure: Jump Connections network with inputs x1, x2, x3, x4 and nodes h11, h12, h21, h22, h23]
Collinearity
If jump connections are used and the learning function is linear, then the final linear model will have perfect collinearity. (The computer won’t be able to estimate the final model. This is bad, and a failsafe is built in to prevent it from happening.)
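The collinearity problem can be seen numerically in the sketch below. A node with a linear learning function is a linear combination of the inputs, so when the inputs also enter the final regression through jump connections the design matrix loses rank. The weights and the random inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))                      # three inputs, 30 observations
h = X @ np.array([1.5, -2.0, 0.5])                # node with a linear learning function
design = np.column_stack([np.ones(30), X, h])     # jump connections: inputs plus node
rank = np.linalg.matrix_rank(design)
print(design.shape[1], rank)                      # 5 columns, rank 4
```

With five columns but rank four, the coefficients of the final linear model are not uniquely determined.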
Libraries of Learning Functions
Each time a node is created, a learning function is randomly selected from the library.
- Function 1: Exponential
- Function 2: Sigmoid
- Function 3: Logit
- Function 4: Step Function
- …
User Defined Functions
Suppose theory dictates that a particular nonlinear relationship possibly exists. For example, consider the Michaelis-Menten model of enzyme kinetics.
The researcher can add this functional form to the library to be selected as a possible learning function for nodes.
The standard library contains common functional forms; however, certain cases may require special functional forms, which can be added by the researcher as needed.
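As a sketch of how a user-defined function could join a library, the snippet below adds the Michaelis-Menten rate law, v = Vmax·S / (Km + S), to a small dictionary of learning functions and draws one at random for a new node. The dictionary layout and the names `std_lib`, `user_lib`, and `new_node_function` are hypothetical and do not describe IRENE's actual interface.

```python
import math
import random

def exponential(z):
    return math.exp(z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def michaelis_menten(s, v_max, k_m):
    """Michaelis-Menten rate: v = v_max * s / (k_m + s)."""
    return v_max * s / (k_m + s)

# Hypothetical libraries: learning-function name -> callable.
std_lib = {"exponential": exponential, "sigmoid": sigmoid}
user_lib = dict(std_lib, michaelis_menten=michaelis_menten)

def new_node_function(library, rng):
    """Randomly select a learning function for a newly created node."""
    name = rng.choice(sorted(library))
    return name, library[name]

name, fn = new_node_function(user_lib, random.Random(0))
half_rate = michaelis_menten(0.5, v_max=2.0, k_m=0.5)   # s == k_m gives v_max / 2
```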
Mutating learning functions
- Function 3: Sigmoid
- Function 5: Exponential
- New Function: Composite
The researcher can also choose to allow for mutating learning functions.
[Figure: new composite learning function]
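The slides do not show how the composite is formed, so the sketch below assumes one plausible reading: the new learning function is the composition of two library functions, here the sigmoid applied to the exponential. The helper name `compose` is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def exponential(z):
    return math.exp(z)

def compose(outer, inner):
    """Assumed mutation: build a new learning function outer(inner(z))."""
    return lambda z: outer(inner(z))

composite = compose(sigmoid, exponential)
value_at_zero = composite(0.0)   # sigmoid(exp(0)) = sigmoid(1)
```

Other readings, such as a weighted sum of the two functions, would fit the slide equally well.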
Populations
Each collection of species is called a population.
Population attributes
- Max creatures in species
- Library
- Allow functional mutations
- Maximum layers
- Maximum nodes
- Mutation rates
- Allow jump connections
- etc.

Determining population attributes
- How many generations should a population be allowed to run?
- Should jump connections be allowed?
- What portion of the roster should be the cut off point for determining species survival?
- Should function mutations be allowed?
- And other settable attributes…
Populations can be represented with DNA too

Attribute                Population 1   Population 2
Max creatures            200            150
Library                  StdLib         UsrDef
Maximum Layers           3              4
Maximum Generations      5000           7000
Allow Jump Connections   YES            NO
Engineered Genetic Algorithms
Engineered Genetic Algorithms refers to using genetic algorithms to find the optimal population settings for a neural evolution algorithm.
Parsing the data set
[Figure: the data set of observations is split into a Training Data Set, a Validation Data Set, and a Second Validation Data Set]
Evaluating Populations
Creatures are evaluated on how well they fit the training data. Creatures that minimize SSE in the training data set are considered most fit.
Populations are evaluated on how well they predict out of sample. The best creature the population is able to produce is evaluated in the validation data set and its SSE is computed. The population that produces the creature that minimizes SSE in the validation data set is considered most fit.
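The two levels of evaluation can be sketched as below: observations are split three ways, creatures are ranked on training SSE, and the population is scored by its best creature's validation SSE. The split sizes, the function names, and the toy SSE numbers are all hypothetical.

```python
import random

def split_three_ways(n_obs, n_train, n_valid, seed=0):
    """Shuffle observation indices into training, validation, second validation."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

def champion(creatures, train_sse):
    """The creature with the lowest SSE on the training data."""
    return min(creatures, key=train_sse)

def population_fitness(creatures, train_sse, valid_sse):
    """A population is scored by its champion's SSE on the validation data."""
    return valid_sse(champion(creatures, train_sse))

train, valid, second = split_three_ways(100, 60, 20)

# Toy SSE tables for three creatures (hypothetical numbers).
train_sse = {"a": 3.0, "b": 1.0, "c": 2.0}
valid_sse = {"a": 5.0, "b": 9.0, "c": 1.0}
fitness = population_fitness(list(train_sse), train_sse.get, valid_sse.get)
```

In the toy tables, creature "b" fits the training data best but predicts poorly out of sample, so the population's fitness is 9.0 even though creature "c" would have validated better.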
Example with 3 populations
[Figure: three populations labeled Pop1, Pop2, and Pop3]
Each population chooses its champion.
The Champion
Recall: each species is comprised of several creatures. The champion is the optimal creature of the optimal species in the population, i.e. the creature that best minimizes SSE in the entire population.
The Champions Compete
[Figure: the champions of Pop1, Pop2, and Pop3 are evaluated on the Validation Data Set]

Out of Sample Evaluation
[Figure: Validation Data Set plot with series Real Data, Pop1 Prediction, Pop2 Prediction, and Pop3 Prediction]
In this example, Pop3 performs best and Pop1 is worst. Pop3 and Pop2 are most likely to be selected for mating.
Population parameters come in two varieties

Numerical Continuous
Examples:
- Max Layers (3)
- Initial Species Population (300)
Mating may be either crossover or averaging. Need to round if averaging.

Switches
Examples:
- Allow Jump Connections (TRUE/FALSE)
- Mating Rule (Average / Crossover / Both)
Mating must be crossover, with a higher probability of selecting the father’s (higher value model’s) traits.
Population mating

Attribute                    Father (Pop3)   Mother (Pop2)   Child (Pop4)
Jump Connections             YES             NO              YES
Initial Species Population   300             150             225
Max Layers                   3               2               3
Max nodes per layer          7               4               4
Mutation Rate                .15             .05             .10
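Mating two populations' settings might be sketched as follows. Switches are inherited by crossover with a bias toward the father, and numerical settings are averaged (integers are rounded). The dictionary keys, the 0.7 father probability, and the choice to always average numerical settings are assumptions; the slide also allows crossover for numerical settings, which is what the Max Layers and Max nodes per layer values in the table above suggest.

```python
import random

def mate_populations(father, mother, rng, p_father=0.7):
    """Combine two population-setting dictionaries into a child's settings."""
    child = {}
    for key, f_val in father.items():
        m_val = mother[key]
        if isinstance(f_val, bool):              # switch: crossover, father favoured
            child[key] = f_val if rng.random() < p_father else m_val
        else:                                    # numerical: average the parents
            avg = (f_val + m_val) / 2
            # Integer settings are rounded (Python rounds exact halves to even).
            child[key] = round(avg) if isinstance(f_val, int) else avg
    return child

pop3 = {"jump_connections": True, "initial_species_population": 300,
        "max_layers": 3, "max_nodes_per_layer": 7, "mutation_rate": 0.15}
pop2 = {"jump_connections": False, "initial_species_population": 150,
        "max_layers": 2, "max_nodes_per_layer": 4, "mutation_rate": 0.05}
pop4 = mate_populations(pop3, pop2, random.Random(0))
```

The averaged settings reproduce the child's Initial Species Population of 225 and Mutation Rate of .10.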
The new population (Pop4) then searches for its own champion, which will go on to compete.
Museums
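Recall Previous Example:
Roster by species, ordered by value: 4, 2, 4, 2, 4, 4, 2, 3, 2, 3
[Figure: the roster with the top 60% marked; Species 2, Species 3, and Species 4 labeled]
Survivors: No creature of Species 3 is among them.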
The optimal creature of the now extinct Species 3 is saved
Recall Previous Example:
Population 1: Museum of “Natural” History
This creature is saved in the Museum. The optimal creature of each species is also saved as it goes extinct. When the population has completed its specified number of generations, the optimal creature from each remaining species is also saved to the museum.

Why have a Museum?
Neural networks may ‘over fit’ the training data. A good predictive model may go extinct.
Evaluate models in the museum on the Validation Data Set to make sure we didn’t miss a good predictive model.
The End… (not of the slide show, don’t get up yet)
At a specified ‘end’ of the algorithm all creatures from all museums are collected into a master list.
Each creature in the list is evaluated on the Validation Data Set.
[Table: master list with columns Creature and Value]

The best model is selected. If it passes a second round of validation on the Second Validation Data Set, it is selected. If it doesn’t pass the second round of validation, the next best model is selected.

SUCCESS!
And so the final model is selected.
This is the predictive model the algorithm returns.
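The final selection can be sketched as a walk down the master list. What it means to pass the second round is not specified in the slides, so the threshold `max_second_sse` is an assumption, as are the function name and the toy values.

```python
def select_final_model(master_list, first_sse, second_sse, max_second_sse):
    """Return the best validated creature that also passes a second validation."""
    for creature in sorted(master_list, key=first_sse):   # best first
        if second_sse(creature) <= max_second_sse:        # assumed pass criterion
            return creature
    return None                                           # no creature passed

# Toy museum creatures with hypothetical validation results.
first = {"m1": 10.0, "m2": 20.0, "m3": 30.0}
second = {"m1": 500.0, "m2": 25.0, "m3": 40.0}
final = select_final_model(list(first), first.get, second.get, max_second_sse=100.0)
```

Here "m1" is best on the first validation set but fails the second round, so the next best model, "m2", is returned.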