JavaGenes
Evolving Molecules and Molecular Force Fields
Al Globus, Deepak Srivastava, Sandy Johan
A Work In Progress
Molecules to Evolve
[Figure: molecular structures containing Cl, N, and O atoms]
Graph Crossover Problem
• Any edge may be a member of one or more cycles.
• Graph fragments produced by division may have more than one crossover point ("broken edges").
• When two fragments are combined, they may have different numbers of broken edges to be merged.
Our crossover operator
• Operates on any connected graph.
• Divides graphs at randomly generated cut sets.
• Can evolve arbitrary cyclic structures given at least some cycles in the initial population.
• Always produces connected undirected graphs.
• Almost always produces connected directed graphs.
Crossover
• Strings: parents abcd and wxyz yield children abyz and wxcd
[Figure: crossover illustrated for strings, trees, and graphs]
Graph Crossover
• Rip two parents apart
• Combine into a child
Molecule Division
Choose an initial random bond
Repeat
• Find the shortest path between the initial bond's atoms.
• Remove and remember a random bond from this path. These bonds are called "broken edges."
Until a cut set is found, i.e., no path exists between the initial bond's vertices.
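The division loop can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JavaGenes implementation: the molecule is a plain adjacency dictionary with integer atom ids, bond order is ignored, and the names `shortest_path` and `divide` are hypothetical.

```python
import random
from collections import deque

def shortest_path(adj, start, goal):
    """Breadth-first search. Returns the edges of one shortest path, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while parent[node] is not None:
                path.append((parent[node], node))
                node = parent[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path: the two atoms are in different fragments

def divide(adj, rng):
    """Cut the graph in two. adj maps atom -> set of neighbours (mutated in place).

    Returns the list of broken edges, which together form a cut set.
    """
    edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
    a, b = rng.choice(edges)           # initial random bond
    broken = []
    while True:
        path = shortest_path(adj, a, b)
        if path is None:               # cut set found
            return broken
        u, v = rng.choice(path)        # random bond on the shortest path
        adj[u].discard(v)
        adj[v].discard(u)
        broken.append((u, v))
```

On the first pass the shortest path is the initial bond itself, so that bond is always the first broken edge. A six-membered ring therefore ends up with exactly two broken edges.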
Fragment Recombination
Repeat
• Select a random broken edge. Determine which fragment it is associated with.
• If at least one broken edge in the other fragment exists
– choose one at random
– merge the broken edges into one bond, respecting valence by reducing the order of the bond if necessary
• Else flip a coin
– heads: attach the broken edge to a random atom in the other fragment (respecting valence)
– tails: discard the broken edge
Until each broken edge has been processed exactly once
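A rough Python sketch of the recombination loop follows. Several details are assumptions made only for illustration: a broken edge is a `(fragment_id, atom, bond_order)` tuple, `free_valence` holds each atom's unused valence, a merged bond takes the smaller of the two orders, and an edge that cannot be placed without violating valence is dropped. The function name `recombine` is hypothetical, and repeated bonds between the same atom pair are not merged.

```python
import random

def recombine(broken_edges, free_valence, fragment_atoms, rng):
    """Join two fragments (ids 0 and 1) by processing each broken edge once.

    broken_edges:   list of (fragment_id, atom, bond_order)
    free_valence:   dict atom -> unused valence (mutated in place)
    fragment_atoms: dict fragment_id -> list of atoms in that fragment
    Returns the new bonds as (atom_a, atom_b, order) tuples.
    """
    pending = list(broken_edges)
    new_bonds = []
    while pending:
        # Select a random broken edge and note its fragment.
        frag, atom, order = pending.pop(rng.randrange(len(pending)))
        partners = [e for e in pending if e[0] != frag]
        if partners:
            # Merge with a random broken edge from the other fragment.
            partner = rng.choice(partners)
            pending.remove(partner)
            target = partner[1]
            order = min(order, partner[2])
        elif rng.random() < 0.5:
            # Heads: attach to a random atom in the other fragment.
            open_atoms = [a for a in fragment_atoms[1 - frag]
                          if free_valence[a] > 0]
            if not open_atoms:
                continue
            target = rng.choice(open_atoms)
        else:
            # Tails: discard the broken edge.
            continue
        # Respect valence by reducing the bond order if necessary.
        order = min(order, free_valence[atom], free_valence[target])
        if order > 0:
            new_bonds.append((atom, target, order))
            free_valence[atom] -= order
            free_valence[target] -= order
    return new_bonds
```

Because a merge consumes two broken edges at once, the loop ends after every edge has been handled exactly once, even when the fragments contribute different numbers of broken edges.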
Molecule Fitness Function
All-pairs-shortest-path distance
• Assign extended types to each atom
– Extended type = (element, |single bonds|, |double bonds|, |triple bonds|)
• Find the shortest bond path between each pair of atoms
• Create bag: one item per atom pair
– item = (type1, type2, path length)
– bag = set with repeated items
• distance = 1 - |intersection| / |union|
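The fitness steps above map naturally onto Python's `collections.Counter`, which behaves as a bag. The sketch below is illustrative only and makes a few assumptions: atoms have integer ids, bonds are `(atom, atom, order)` triples with order 1 to 3, the two types in an item are sorted so that pair order does not matter, and bag intersection and union take the minimum and maximum count of each item. The names `path_bag` and `tanimoto_distance` are hypothetical.

```python
from collections import Counter, deque

def path_bag(elements, bonds):
    """Build the bag of (type1, type2, path length) items for one molecule.

    elements: dict atom_id -> element symbol
    bonds:    list of (atom_id, atom_id, order) with order in {1, 2, 3}
    """
    counts = {a: [0, 0, 0] for a in elements}
    adj = {a: [] for a in elements}
    for u, v, order in bonds:
        counts[u][order - 1] += 1
        counts[v][order - 1] += 1
        adj[u].append(v)
        adj[v].append(u)
    # Extended type = (element, |single|, |double|, |triple|)
    types = {a: (elements[a], *counts[a]) for a in elements}
    bag = Counter()
    for src in elements:
        # Breadth-first search gives shortest bond-path lengths from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for dst, d in dist.items():
            if src < dst:  # count each atom pair once
                t1, t2 = sorted((types[src], types[dst]))
                bag[(t1, t2, d)] += 1
    return bag

def tanimoto_distance(bag_a, bag_b):
    """distance = 1 - |intersection| / |union|; 0 means identical bags."""
    inter = sum((bag_a & bag_b).values())   # minimum count per item
    union = sum((bag_a | bag_b).values())   # maximum count per item
    return 1.0 - inter / union if union else 0.0
```

For example, comparing a three-atom nitrogen chain with a four-atom one gives an intersection of 2 items and a union of 7, so the distance is 5/7.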
Finding Small Molecules
[Figure: molecular structure with N atoms]
Finding Larger Molecules
[Figure: molecular structure with Cl, N, and O atoms]
JavaGenes in Action
Finding [figure: molecule] with all-pairs-shortest-path and Tanimoto index fitness function (0 is perfect)
Molecular Dynamics and Mechanics
• Newton's laws of motion in a potential field
• Discover common conformations during dynamics
• Discover minimum energy conformations (e.g., protein folding problem)
• Began in the 1960s with two-body potentials for inert gas modeling
• In the 1980s, extended to metals and bonded systems (upper-right corner of periodic table)
• Our studies focus on evolving potentials for reactive systems (bonds break and form)
Molecular Potentials
Energy = sum of 2-body terms + sum of 3-body terms + …
Stillinger-Weber SiF potential function
• 2-body(r) = A(B r^(-p) - r^(-q)) * cutoff
– cutoff = exp(C/(r - a)) for r < a, 0 otherwise
• 3-body(r_ij, r_jk, theta) = (alpha + lambda (cos(theta) - cos(theta_0))^2) * cutoff
– cutoff = exp(gamma(1/(r_ij - a1) + 1/(r_jk - a1)))
• FFF additional term = delta (r_ij r_jk)^(-m) * cutoff
– cutoff = exp(beta(1/(r_ij - a2) + 1/(r_jk - a2)))
Discovering parameters can require months or years
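As a concrete reading of the 2-body and 3-body terms, here is a small Python sketch. Only the functional forms come from the slide. The values C = 1 and cos(theta_0) = -1/3 are the usual choices in the original Stillinger-Weber silicon potential and are not stated in these slides, and the 3-body cutoff is assumed to vanish once either distance reaches a1, by analogy with the 2-body cutoff. The function names are hypothetical.

```python
import math

def two_body(r, A, B, p, q, a, C=1.0):
    """A(B r^-p - r^-q) * exp(C/(r - a)) for r < a, 0 otherwise."""
    if r >= a:
        return 0.0
    return A * (B * r ** -p - r ** -q) * math.exp(C / (r - a))

def three_body(r_ij, r_jk, theta, alpha, lam, gamma, a1, cos_theta0=-1.0 / 3.0):
    """(alpha + lambda (cos(theta) - cos(theta_0))^2) * cutoff."""
    if r_ij >= a1 or r_jk >= a1:   # assumed cutoff range, see lead-in
        return 0.0
    cutoff = math.exp(gamma * (1.0 / (r_ij - a1) + 1.0 / (r_jk - a1)))
    return (alpha + lam * (math.cos(theta) - cos_theta0) ** 2) * cutoff

# Published Si parameters (reduced units) as listed on the results slide.
SI_TWO_BODY = dict(A=7.049556277, B=0.6022245584, p=4.0, q=0.0, a=1.8)
SI_THREE_BODY = dict(alpha=0.0, lam=21.0, gamma=1.2, a1=1.8)

pair_energy = two_body(2.0 ** (1.0 / 6.0), **SI_TWO_BODY)
angle_energy = three_body(1.1, 1.1, math.radians(90.0), **SI_THREE_BODY)
```

With these parameters the pair term should be close to -1 at r = 2^(1/6), the well minimum in reduced units, and the angular term should vanish at the tetrahedral angle.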
Evolving Molecular Force Fields
Chromosome
• 2D ragged array of floating point numbers
– SiSi, SiF, FF, SiSiSi, SiSiF, SiFSi, FSiF, FFSi, FFF
• 5-63 parameters
Transmission operators
• Interval crossover
• Mutation
Fitness Function
• RMS difference between an individual's energies and the "correct" energies for n molecules
• "Correct" energies
– Currently: energies generated with the force field with published parameters
– Next step: energies generated by higher quality quantum codes
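The RMS fitness measure can be written as a short Python function. This is a minimal sketch: it assumes the energies of the n molecules have already been computed, once with the individual's parameters and once with the reference force field, and the name `rms_fitness` is hypothetical.

```python
import math

def rms_fitness(individual_energies, reference_energies):
    """Root-mean-square difference between two energy lists; 0 is perfect."""
    if len(individual_energies) != len(reference_energies):
        raise ValueError("energy lists must have the same length")
    n = len(reference_energies)
    total = sum((e - r) ** 2
                for e, r in zip(individual_energies, reference_energies))
    return math.sqrt(total / n)
```

Lower values are better, so a minimizing evolutionary loop can use the result directly.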
Interval Crossover
For each allele:
1. Construct an interval from the parental values: lower parental value (1.1), higher parental value (2.1)
2. Construct a larger interval (100% larger): (0.6) to (2.6)
3. Choose a random number in the larger interval (1.3)
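The three steps can be sketched in Python as follows. The sketch assumes a flat list of alleles rather than the 2D ragged array, a uniform random draw, and an enlargement that is split evenly on both sides of the parental interval, which matches the slide's example of (1.1, 2.1) growing to (0.6, 2.6). The name `interval_crossover` and the `enlargement` parameter are hypothetical.

```python
import random

def interval_crossover(parent_a, parent_b, rng, enlargement=1.0):
    """Produce one child; enlargement=1.0 makes each interval 100% larger."""
    child = []
    for x, y in zip(parent_a, parent_b):
        low, high = min(x, y), max(x, y)          # step 1: parental interval
        pad = enlargement * (high - low) / 2.0    # step 2: widen symmetrically
        child.append(rng.uniform(low - pad, high + pad))  # step 3: random pick
    return child

child = interval_crossover([1.1, 4.0], [2.1, 4.0], random.Random(0))
```

When both parents carry the same value for an allele, the interval has zero width and the child inherits that value unchanged, so variation at that position has to come from mutation.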
Si potential results
• population = 1000
• generations = 3000
• fitness function: 100 random 5-body Si tetrahedra
• 31 runs
Best run results, evolved value (published value):
• A = 7.151346144801161 (7.049556277)
• B = 0.6007865398735448 (0.6022245584)
• p = 3.9825158463763977 (4)
• q = 0.014970062068368135 (0)
• a = 1.797123919332413 (1.8)
• alpha = 0.1442970771852687 (0)
• lambda = 27.783092740584205 (21)
• gamma = 1.328091763076223 (1.2)
• a1 = 1.8173559091012945 (1.8)
Future Plans
• Hill climbing
• Use experimental data for new fitness functions
• Feed results from easy to hard evolution
[Figure: easy-to-hard progression, parameter counts in parentheses]
– SiSi (5), SiF (6), FF (6)
– SiSiSi (9), SiSiF (10), SiFSi (10), FSiF (10), FFSi (10), FFF (14)
– Full SiF (63)
Condor
Cycle-scavenging batch system for single workstation jobs
• Desktop machines, nights, weekends, etc.
• University of Wisconsin
• In production since 1986
• Unix workstations
250 SGI and 50 Sun workstations at code IN
Good for
• parameter studies
• stochastic algorithms (e.g., GA)
One JavaGenes job per Condor job