global optimization algorithms for chemical applications
TRANSCRIPT
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 1/ 28
Global Optimization Algorithms for Chemical ApplicationsAlgorithms, Showcases and Other Projects
Johannes M. Dieterich
Institute for Physical ChemistryGeorg-August-University Gottingen
Germany
2012
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 2/ 28
Disclaimer
THIS IS A STRIPPED-DOWN VERSION.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 3/ 28
The search space
General problem
The search space typically has various minima of different fitness. Only fewest (normally just one) are of optimal fitness. In many cases, the number of minima increases exponentially with
search space dimensionality. NPOs (NP optimization problems):
knapsack (packing problem) travelling salesman (shortest path problem)
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 4/ 28
Exponential scaling of search spaces: Should a chemist care?
The Lennard-Jones example
LJ clusters seem to exhibit exponential (exp(x2)) scaling of the numberof minima w.r.t. cluster size12
Clusters described by two-body forces are NP-hard optimizationproblems.3
1M. R. Hoare and J. McInnes, Faraday Disc. Chem. Soc., 32, 791 (1983).2LJ100 is considered to have 1 10
40 local minima.3L. T. Wille and J. Vennik, J. Phys. A: Math. Gen., 18, L419 (1985).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 5/ 28
(Efficient) Exploration of the search space: Techniques
MC/MD-based methods Simulated annealing (SA)4: Crystal growth. Monte Carlo with minimizations (MCM)5: Basin hopping. Minimum hopping (MH)6: New MD-based algorithm.
Nature inspired algorithms
Genetic algorithms (GA)7: Darwinian/Lamarckian evolution. Evolutionary strategies (ES)8: Mutation-focused evolution. Particle-Swarm optimization (PSO)9: Model movements in a group.
4S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, Science, 220, 671 (1983).5Z. Li and H A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987); D. J. Wales and H. A. Scheraga, Science, 285, 1368 (1999).6S. Goedecker, J. Chem. Phys., 120, 9911 (2004).7D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1989.8I. Rechenberg, Evolutionsstrategien: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution., 1973.9J. Kennedy and R. Eberhart, Proc. IEEE Int. Conf. Neural Networks, 1942 (1995).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 6/ 28
Genetic Algorithms for Cluster Optimization
Status Highly optimized, system-specific implementations10
Typically only support for a few levels of theory Maximal support for binary systems11
Remaining Challenges
More efficient and general algorithms Extensible, futureproof framework Efficient and stable parallelization schemes Support for arbitrary levels of theory Support for arbitrary systems
arbitrary systems arbitrary degrees of freedom
10e.g. Hartke working group: similar yet different versions for TIP4P clusters, doped TIP4P clusters, TTM2F clusters, clathrates,...11see e.g. R. L. Johnston et al and R. Ferrando et al
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(a) selecting two parents
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(b) crossover
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(c) mutation
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(d) collision (i.a.)
(e) dissociation (i.a.)
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(g) opt. magic
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(f) local opt.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(h) diversity check
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28
Genetic algorithms in a nutshell
(Genetic) Operations
selection/mating crossover mutation (sanity check) (optional magic) local optimization
Maintaining/updating the population
diversity check add to the population
(i) adding an individual
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 8/ 28
So, what is the magic?
Options
directed mutation: move a building block to a better position (requiressystem knowledge)12
explicit exchange: (randomly) exchange building blocks for a bettermixing in highly-mixed systems13
hybrid algorithms: couple GA/MCM for better search space explorationand exploitation14
GA/MCM
12Applications to water clusters by B. Hartke et al., currently in development for OGOLEM13See e.g. R. L. Johnston, Dalton Trans. 4193 (2003).14Some recent (mostly EA/SA based) implementations found in protein folding. DEM/SA/EP: M. Goldstein, E. Fredj and R. B. Gerber, J.
Comput. Chem., 32, 1785 (2011), SA/EA: Y. Sakae, T. Hiroyusa, M. Miki and Y. Okamoto, J. Comput. Chem., 32, 1353 (2011).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 9/ 28
Differences in search space exploration
Exploration and exploitation differences
GA-based methods: good long-range exploration, good short-rangeexploitation (with minimizations)
MC/MD-based methods: good mid-range exploration and exploitation,good short-range exploitation (with minimizations)
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 10/ 28
Benchmarking genetic algorithms: Ackleys function
(a) full search space (2D) (b) fine structure (2D)
Function definition
f (~x) = 20 exp
0.21
n
ni=1
x2i
exp[1n
ni=1
cos(2xi)
]+20+e1 (1)
with the global minimum xi = 0.0 in the n-dimensional space.15
15D. H. Ackley, A connectionist machine for genetic hillclimbing, 1987; T. Back, Evolutionary algorithms in theory and practice, 1996.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 11/ 28
Benchmarking genetic algorithms: Ackleys function II
(a) w/ local opt. (b) w/o local opt.
Conclusions The scaling is both w/ and w/o local optimization excellent for all tested
crossover operators (O(N1.1) and O(N1.0)). The benchmark function can be easily solved even for 1000 dimensions. Standard benchmark functions are too simple for GA-based global
optimization techniques.16
16J. M. Dieterich and B. Hartke, in preparation.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 12/ 28
Hypersurfaces for cluster optimization
Requirements
exact description (to the sub-kcal regime) fast energy and gradient evaluations (since typically multiple hundreds of
thousands are needed) well-behaving (no discontinuities, distinct gradients)
(Partial) Solutions
specialized force fields: fast but potentially inaccurate ab initio methods: exact and (very) slow combinations of the two any other methods (semiempirics,...) depending on the system
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 13/ 28
OGOLEM methods and interfaces
Interfaces Tinker and OpenBabel:
standard force fields Amber, Gromacs and NAMD:
(custom) force fields for largerbiological systems
DFTB+:tight-binding methods
MNDO and MOPAC:(special) semiempirical methods
Orca:semiempirics, DFT and ab initio
CPMD and CP2K:plane wave DFT with periodicboundaries
MOLPRO:high(er)-level ab initio methods
Internal implementations
Standard LJ FF:employingLorentz-Berthelot mixingrules
High-level LJ FF:dedicated parameters foreach pair, arbitrary powerexpression
Gupta FF:two-body FF for metals
Amber FF:reparametrizable
UFF GGFF:
general Gaussian-basedtwo- and three-body FF
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 14/ 28
OGOLEM: Global optimization of chemical problems
Cluster structures arbitrary levels of theory arbitrary systems with arbitrary DoF first implementation of a N-phenotype X-over for N > 2 support for systems in environments (on surfaces, in pockets)
Parametrization (system-specific) potential parametrization against reference data currently focussed on energy-structure dependence for fitness
Molecular design
discrete optimization problem design of photo-switchable molecules17
tuning of substuents for tuning of excitation energies excitation energies using FOCI-AM1 (Mopac) or TDDFT (Orca)
17N. O. Carstensen, J. M. Dieterich and B. Hartke, Phys. Chem. Chem. Phys., 13 290 (2011).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 15/ 28
Kanamycin A dimers
(a) (KA)2 (b) (KA)2Na+ (c) (KA)2K+
Methods Global optimization based on AMBER:GAFFv10 employing genotype
operators. Reoptimization using MOPAC:PM3. Final structure selection based on MOLPRO:DF-LMP2/cc-pVDZ. Structures seem to support the preference of sodium over potassium
also seen in experiment.18
18J. M. Dieterich, U. Gerstel, J.-M. Schroder, and B. Hartke, J. Mol. Model. doi: 10.1007/s00894-011-0983-x (2011).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 16/ 28
Sodium-doped water clusters I: Techniques
Experimental requests
relatively exact description of (H2O)nNa (n
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 17/ 28
Sodium-doped water clusters II
(H2O)3Na
-200
0
200
400
600
800
1000
2800 3000 3200 3400 3600 3800 4000
rela
tive
IR tr
ansm
issi
on
wavenumber [cm-1]
(a) Type A
-600
-400
-200
0
200
400
600
800
1000
2800 3000 3200 3400 3600 3800 4000
rela
tive
IR tr
ansm
issi
on
wavenumber [cm-1]
(b) Type B
200
300
400
500
600
700
800
900
1000
2800 3000 3200 3400 3600 3800 4000
rela
tive
IR tr
ansm
issi
on
wavenumber [cm-1]
(c) Type C
allows for a reliable assignment of structural patterns may explain temperature dependence of experimental spectra19
level of theory becomes unfeasible for (H2O)X Na with X > 7
19R. Forck, J. M. Dieterich, R. A. Mata, T. Zeuch, Phys. Chem. Chem. Phys. accepted.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 18/ 28
scaTTM3F: A modern TTM3-F
Project
global optimization of water clusters (bigger than 34 units) using TTM3-F as a backend Java2JNI2Fortran is nasty (SMP impossible)
Status purely Scala-based implementation (scaTTM3F) Aland (energy decomposition based globopt) niching directed mutation TBD
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 19/ 28
Water clusters: Some common TTM2-F/TTM3-F structures
(a) 3 (b) 4 (c) 5 (d) 6 (e) 7
(f) 8 (g) 9 (h) 10 (i) 11 (j) 12
(k) 13 (l) 14 (m) 15 (n) 16
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 20/ 28
Clever opt: Simplification rules!
V1
V2
V3
r1
r2
r3
R
R
The workflow
Vtot(r2) = V1(R r2) + V2(r2) + V3(R r2), R = 16.34A (2)
constrained BP86/def2-SVP optimizations (rotated planes for BF4) DF-MP2/cc-pVTZ single points GGFF[5] fitting Is this really reasonable?
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 21/ 28
Clever opt: Simplification rules! II
0.2
0.15
0.1
0.05
0
0.05
13 13.5 14 14.5 15 15.5 16 16.5 17 17.5
E [E
h]
r [a0]
GGFF[5] fit
(a) BF4 pot.
0.25
0.2
0.15
0.1
0.05
0
0.05
10 11 12 13 14 15 16 17 18
E [E
h]
r [a0]
GGFF[5] fit
(b) Cl pot.
E [k
Jm
ol
1 ]
r2 []
1500
1450
1400
1350
7 7.5 8 8.5 9 9.5 10
(c) (BF4)3
E [k
Jm
ol
1 ]
r2 []
1550
1500
1450
1400
1350
7 7.5 8 8.5 9 9.5 10
(d) (BF4)2ClE
[kJ
mol
1 ]
r2 []
1600
1550
1500
1450
1400
7 7.5 8 8.5 9 9.5 10
(e) (BF4)Cl2
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 22/ 28
Sn(III)/Ge(III) Hydrides
(a) GeH, BP (b) SnH, BP
Results20
BP86/def2-SVP Sn: SD-ECP optimizations and frequencies NBO analysis: Sn-H: 89.3% p-character, Ge-H: 84.8% lone pair: Sn: 79.4% s-character, Ge: 70.9% 2nd order perturbation energy NBO analysis: N lone pair donation into
metal p orbital (50 kcal/mol Sn, 35 kcal/mol Ge)
20S. Khan, P. P. Samuel, R. Michel, J. M. Dieterich, R. A. Mata, J.-P. Demers, A. Lange, H. Roesky, D. Stalke, Chem. Commun., in press.
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 23/ 28
Sn(III)/Ge(III) Hydrides: DFT-D?
(a) GeH, BP-D (b) SnH, BP-D3
What about DFT-D? local geometry around the metal similar to BP86 isopropyl geometry changes (methyl moiety points towards metal),
difference to experiment DF-MP2/cc-pVTZ favours the BP results (by 2.9 for Sn and 0.2 kcal/mol
for Ge)
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 24/ 28
Atomdroid: Mobile computational chemistry
User interface Molecular viewer Molecular builder Molecule manipulation/analysis PDB downloader Bluetooth sharing
Computational methods
UFF energy and gradient21
L-BFGS22, NEWUOA and Powell minimizations Monte Carlo Monte Carlo with Minimizations (MCM)23
21A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff, J. Am. Chem. Soc., 114, 10024 (1992).22J. Nocedal, Math. of Comput., 35, 773 (1980) via RISO: an implementation of distributed belief networks in Java.23Z. Li and H A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987); D. J. Wales and H. A. Scheraga, Science, 285, 1368 (1999).
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 25/ 28
Benchmarks: Is it more than just a toy?
Prejudice
Mobile platforms are (and will always be) just for toys not for realapplications.
Hardware is not (and will never be) fast enough for real computations.
Benchmark systems
(c) MeOH (d) Coffein (e) Cryptand 222 (f) Neurokinin A (g) alpha RgIA
Hardware: PoV Mobii Tablet Tegra2 w/ Android 3.0 ROM
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 26/ 28
Benchmarks: Results
0
5
10
15
20
25
30
35
40
45
0 20 40 60 80 100 120 140 160 180 200 220
time
[ms]
size [atoms]
energyenergy+gradient
System No. of atoms energy [ms] energy+gradient [ms]A 6 0.100.01 0.180.02B 24 0.860.01 1.480.02C 63 3.540.02 5.480.02D 155 17.400.09 26.470.07E 201 28.740.03 43.560.02
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 27/ 28
Summary and Outlook
general framework for global optimization exploitation of new hardware platforms: Android, CUDA (not in this talk) development of new parallelization schemes implementation and parametrization of classical force field expressions
as well as novel, flexible ones cooperations with experiment (Kanamycin A, doped water clusters) development of novel GA-based algorithms ...
-
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 28/ 28
AcknowledgementsThe group:Ricardo Mata group-leaderJoao Oliveira biosystemsJonas Feldt mobile computingThorsten Stolper opts and regionsMilica Andrejic QM/MM
Collaborations:Bernd Hartke water clustersSascha Frick hydration shells in reactionsThomas Zeuch cluster experimentsSvenja Fischmann internshipDietmar Stalke inorganic compounds IGuido Clever inorganic compounds IIMark Feyand inorganic compounds III
Bachelors:Sebastian Gerke (Gottingen), Maximilian Konrad (Gottingen), Sebastian Wille(Gottingen), Marvin Kammler (Gottingen)
IntroductionAlgorithmic DetailsApplications and PitfallsRelated Other ProjectsSummary and Outlook
0.0: 0.1: 0.2: 0.3: 0.4: 0.5: 0.6: 0.7: 0.8: 0.9: 0.10: 0.11: 0.12: 0.13: 0.14: 0.15: 0.16: 0.17: 0.18: 0.19: 0.20: 0.21: 0.22: 0.23: 0.24: 0.25: 0.26: 0.27: 0.28: 0.29: 0.30: 0.31: 0.32: 0.33: 0.34: 0.35: 0.36: 0.37: 0.38: 0.39: 0.40: 0.41: 0.42: 0.43: 0.44: 0.45: 0.46: 0.47: 0.48: 0.49: 0.50: 0.51: 0.52: 0.53: 0.54: 0.55: 0.56: 0.57: 0.58: 0.59: 0.60: 0.61: 0.62: anm0: