global optimization algorithms for chemical applications

35
Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 1/ 28 Global Optimization Algorithms for Chemical Applications Algorithms, Showcases and Other Projects Johannes M. Dieterich Institute for Physical Chemistry Georg-August-University G¨ ottingen Germany 2012

Upload: doanngoc

Post on 04-Jan-2017

229 views

Category:

Documents


0 download

TRANSCRIPT

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 1/ 28

    Global Optimization Algorithms for Chemical ApplicationsAlgorithms, Showcases and Other Projects

    Johannes M. Dieterich

    Institute for Physical ChemistryGeorg-August-University Gottingen

    Germany

    2012

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 2/ 28

    Disclaimer

    THIS IS A STRIPPED-DOWN VERSION.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 3/ 28

    The search space

    General problem

    The search space typically has various minima of different fitness. Only fewest (normally just one) are of optimal fitness. In many cases, the number of minima increases exponentially with

    search space dimensionality. NPOs (NP optimization problems):

    knapsack (packing problem) travelling salesman (shortest path problem)

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 4/ 28

    Exponential scaling of search spaces: Should a chemist care?

    The Lennard-Jones example

    LJ clusters seem to exhibit exponential (exp(x2)) scaling of the numberof minima w.r.t. cluster size12

    Clusters described by two-body forces are NP-hard optimizationproblems.3

    1M. R. Hoare and J. McInnes, Faraday Disc. Chem. Soc., 32, 791 (1983).2LJ100 is considered to have 1 10

    40 local minima.3L. T. Wille and J. Vennik, J. Phys. A: Math. Gen., 18, L419 (1985).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 5/ 28

    (Efficient) Exploration of the search space: Techniques

    MC/MD-based methods Simulated annealing (SA)4: Crystal growth. Monte Carlo with minimizations (MCM)5: Basin hopping. Minimum hopping (MH)6: New MD-based algorithm.

    Nature inspired algorithms

    Genetic algorithms (GA)7: Darwinian/Lamarckian evolution. Evolutionary strategies (ES)8: Mutation-focused evolution. Particle-Swarm optimization (PSO)9: Model movements in a group.

    4S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, Science, 220, 671 (1983).5Z. Li and H A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987); D. J. Wales and H. A. Scheraga, Science, 285, 1368 (1999).6S. Goedecker, J. Chem. Phys., 120, 9911 (2004).7D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1989.8I. Rechenberg, Evolutionsstrategien: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution., 1973.9J. Kennedy and R. Eberhart, Proc. IEEE Int. Conf. Neural Networks, 1942 (1995).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 6/ 28

    Genetic Algorithms for Cluster Optimization

    Status Highly optimized, system-specific implementations10

    Typically only support for a few levels of theory Maximal support for binary systems11

    Remaining Challenges

    More efficient and general algorithms Extensible, futureproof framework Efficient and stable parallelization schemes Support for arbitrary levels of theory Support for arbitrary systems

    arbitrary systems arbitrary degrees of freedom

    10e.g. Hartke working group: similar yet different versions for TIP4P clusters, doped TIP4P clusters, TTM2F clusters, clathrates,...11see e.g. R. L. Johnston et al and R. Ferrando et al

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (a) selecting two parents

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (b) crossover

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (c) mutation

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (d) collision (i.a.)

    (e) dissociation (i.a.)

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (g) opt. magic

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (f) local opt.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (h) diversity check

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 7/ 28

    Genetic algorithms in a nutshell

    (Genetic) Operations

    selection/mating crossover mutation (sanity check) (optional magic) local optimization

    Maintaining/updating the population

    diversity check add to the population

    (i) adding an individual

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 8/ 28

    So, what is the magic?

    Options

    directed mutation: move a building block to a better position (requiressystem knowledge)12

    explicit exchange: (randomly) exchange building blocks for a bettermixing in highly-mixed systems13

    hybrid algorithms: couple GA/MCM for better search space explorationand exploitation14

    GA/MCM

    12Applications to water clusters by B. Hartke et al., currently in development for OGOLEM13See e.g. R. L. Johnston, Dalton Trans. 4193 (2003).14Some recent (mostly EA/SA based) implementations found in protein folding. DEM/SA/EP: M. Goldstein, E. Fredj and R. B. Gerber, J.

    Comput. Chem., 32, 1785 (2011), SA/EA: Y. Sakae, T. Hiroyusa, M. Miki and Y. Okamoto, J. Comput. Chem., 32, 1353 (2011).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 9/ 28

    Differences in search space exploration

    Exploration and exploitation differences

    GA-based methods: good long-range exploration, good short-rangeexploitation (with minimizations)

    MC/MD-based methods: good mid-range exploration and exploitation,good short-range exploitation (with minimizations)

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 10/ 28

    Benchmarking genetic algorithms: Ackleys function

    (a) full search space (2D) (b) fine structure (2D)

    Function definition

    f (~x) = 20 exp

    0.21

    n

    ni=1

    x2i

    exp[1n

    ni=1

    cos(2xi)

    ]+20+e1 (1)

    with the global minimum xi = 0.0 in the n-dimensional space.15

    15D. H. Ackley, A connectionist machine for genetic hillclimbing, 1987; T. Back, Evolutionary algorithms in theory and practice, 1996.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 11/ 28

    Benchmarking genetic algorithms: Ackleys function II

    (a) w/ local opt. (b) w/o local opt.

    Conclusions The scaling is both w/ and w/o local optimization excellent for all tested

    crossover operators (O(N1.1) and O(N1.0)). The benchmark function can be easily solved even for 1000 dimensions. Standard benchmark functions are too simple for GA-based global

    optimization techniques.16

    16J. M. Dieterich and B. Hartke, in preparation.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 12/ 28

    Hypersurfaces for cluster optimization

    Requirements

    exact description (to the sub-kcal regime) fast energy and gradient evaluations (since typically multiple hundreds of

    thousands are needed) well-behaving (no discontinuities, distinct gradients)

    (Partial) Solutions

    specialized force fields: fast but potentially inaccurate ab initio methods: exact and (very) slow combinations of the two any other methods (semiempirics,...) depending on the system

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 13/ 28

    OGOLEM methods and interfaces

    Interfaces Tinker and OpenBabel:

    standard force fields Amber, Gromacs and NAMD:

    (custom) force fields for largerbiological systems

    DFTB+:tight-binding methods

    MNDO and MOPAC:(special) semiempirical methods

    Orca:semiempirics, DFT and ab initio

    CPMD and CP2K:plane wave DFT with periodicboundaries

    MOLPRO:high(er)-level ab initio methods

    Internal implementations

    Standard LJ FF:employingLorentz-Berthelot mixingrules

    High-level LJ FF:dedicated parameters foreach pair, arbitrary powerexpression

    Gupta FF:two-body FF for metals

    Amber FF:reparametrizable

    UFF GGFF:

    general Gaussian-basedtwo- and three-body FF

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 14/ 28

    OGOLEM: Global optimization of chemical problems

    Cluster structures arbitrary levels of theory arbitrary systems with arbitrary DoF first implementation of a N-phenotype X-over for N > 2 support for systems in environments (on surfaces, in pockets)

    Parametrization (system-specific) potential parametrization against reference data currently focussed on energy-structure dependence for fitness

    Molecular design

    discrete optimization problem design of photo-switchable molecules17

    tuning of substuents for tuning of excitation energies excitation energies using FOCI-AM1 (Mopac) or TDDFT (Orca)

    17N. O. Carstensen, J. M. Dieterich and B. Hartke, Phys. Chem. Chem. Phys., 13 290 (2011).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 15/ 28

    Kanamycin A dimers

    (a) (KA)2 (b) (KA)2Na+ (c) (KA)2K+

    Methods Global optimization based on AMBER:GAFFv10 employing genotype

    operators. Reoptimization using MOPAC:PM3. Final structure selection based on MOLPRO:DF-LMP2/cc-pVDZ. Structures seem to support the preference of sodium over potassium

    also seen in experiment.18

    18J. M. Dieterich, U. Gerstel, J.-M. Schroder, and B. Hartke, J. Mol. Model. doi: 10.1007/s00894-011-0983-x (2011).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 16/ 28

    Sodium-doped water clusters I: Techniques

    Experimental requests

    relatively exact description of (H2O)nNa (n

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 17/ 28

    Sodium-doped water clusters II

    (H2O)3Na

    -200

    0

    200

    400

    600

    800

    1000

    2800 3000 3200 3400 3600 3800 4000

    rela

    tive

    IR tr

    ansm

    issi

    on

    wavenumber [cm-1]

    (a) Type A

    -600

    -400

    -200

    0

    200

    400

    600

    800

    1000

    2800 3000 3200 3400 3600 3800 4000

    rela

    tive

    IR tr

    ansm

    issi

    on

    wavenumber [cm-1]

    (b) Type B

    200

    300

    400

    500

    600

    700

    800

    900

    1000

    2800 3000 3200 3400 3600 3800 4000

    rela

    tive

    IR tr

    ansm

    issi

    on

    wavenumber [cm-1]

    (c) Type C

    allows for a reliable assignment of structural patterns may explain temperature dependence of experimental spectra19

    level of theory becomes unfeasible for (H2O)X Na with X > 7

    19R. Forck, J. M. Dieterich, R. A. Mata, T. Zeuch, Phys. Chem. Chem. Phys. accepted.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 18/ 28

    scaTTM3F: A modern TTM3-F

    Project

    global optimization of water clusters (bigger than 34 units) using TTM3-F as a backend Java2JNI2Fortran is nasty (SMP impossible)

    Status purely Scala-based implementation (scaTTM3F) Aland (energy decomposition based globopt) niching directed mutation TBD

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 19/ 28

    Water clusters: Some common TTM2-F/TTM3-F structures

    (a) 3 (b) 4 (c) 5 (d) 6 (e) 7

    (f) 8 (g) 9 (h) 10 (i) 11 (j) 12

    (k) 13 (l) 14 (m) 15 (n) 16

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 20/ 28

    Clever opt: Simplification rules!

    V1

    V2

    V3

    r1

    r2

    r3

    R

    R

    The workflow

    Vtot(r2) = V1(R r2) + V2(r2) + V3(R r2), R = 16.34A (2)

    constrained BP86/def2-SVP optimizations (rotated planes for BF4) DF-MP2/cc-pVTZ single points GGFF[5] fitting Is this really reasonable?

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 21/ 28

    Clever opt: Simplification rules! II

    0.2

    0.15

    0.1

    0.05

    0

    0.05

    13 13.5 14 14.5 15 15.5 16 16.5 17 17.5

    E [E

    h]

    r [a0]

    GGFF[5] fit

    (a) BF4 pot.

    0.25

    0.2

    0.15

    0.1

    0.05

    0

    0.05

    10 11 12 13 14 15 16 17 18

    E [E

    h]

    r [a0]

    GGFF[5] fit

    (b) Cl pot.

    E [k

    Jm

    ol

    1 ]

    r2 []

    1500

    1450

    1400

    1350

    7 7.5 8 8.5 9 9.5 10

    (c) (BF4)3

    E [k

    Jm

    ol

    1 ]

    r2 []

    1550

    1500

    1450

    1400

    1350

    7 7.5 8 8.5 9 9.5 10

    (d) (BF4)2ClE

    [kJ

    mol

    1 ]

    r2 []

    1600

    1550

    1500

    1450

    1400

    7 7.5 8 8.5 9 9.5 10

    (e) (BF4)Cl2

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 22/ 28

    Sn(III)/Ge(III) Hydrides

    (a) GeH, BP (b) SnH, BP

    Results20

    BP86/def2-SVP Sn: SD-ECP optimizations and frequencies NBO analysis: Sn-H: 89.3% p-character, Ge-H: 84.8% lone pair: Sn: 79.4% s-character, Ge: 70.9% 2nd order perturbation energy NBO analysis: N lone pair donation into

    metal p orbital (50 kcal/mol Sn, 35 kcal/mol Ge)

    20S. Khan, P. P. Samuel, R. Michel, J. M. Dieterich, R. A. Mata, J.-P. Demers, A. Lange, H. Roesky, D. Stalke, Chem. Commun., in press.

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 23/ 28

    Sn(III)/Ge(III) Hydrides: DFT-D?

    (a) GeH, BP-D (b) SnH, BP-D3

    What about DFT-D? local geometry around the metal similar to BP86 isopropyl geometry changes (methyl moiety points towards metal),

    difference to experiment DF-MP2/cc-pVTZ favours the BP results (by 2.9 for Sn and 0.2 kcal/mol

    for Ge)

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 24/ 28

    Atomdroid: Mobile computational chemistry

    User interface Molecular viewer Molecular builder Molecule manipulation/analysis PDB downloader Bluetooth sharing

    Computational methods

    UFF energy and gradient21

    L-BFGS22, NEWUOA and Powell minimizations Monte Carlo Monte Carlo with Minimizations (MCM)23

    21A. K. Rappe, C. J. Casewit, K. S. Colwell, W. A. Goddard, and W. M. Skiff, J. Am. Chem. Soc., 114, 10024 (1992).22J. Nocedal, Math. of Comput., 35, 773 (1980) via RISO: an implementation of distributed belief networks in Java.23Z. Li and H A. Scheraga, Proc. Natl. Acad. Sci. USA, 84, 6611 (1987); D. J. Wales and H. A. Scheraga, Science, 285, 1368 (1999).

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 25/ 28

    Benchmarks: Is it more than just a toy?

    Prejudice

    Mobile platforms are (and will always be) just for toys not for realapplications.

    Hardware is not (and will never be) fast enough for real computations.

    Benchmark systems

    (c) MeOH (d) Coffein (e) Cryptand 222 (f) Neurokinin A (g) alpha RgIA

    Hardware: PoV Mobii Tablet Tegra2 w/ Android 3.0 ROM

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 26/ 28

    Benchmarks: Results

    0

    5

    10

    15

    20

    25

    30

    35

    40

    45

    0 20 40 60 80 100 120 140 160 180 200 220

    time

    [ms]

    size [atoms]

    energyenergy+gradient

    System No. of atoms energy [ms] energy+gradient [ms]A 6 0.100.01 0.180.02B 24 0.860.01 1.480.02C 63 3.540.02 5.480.02D 155 17.400.09 26.470.07E 201 28.740.03 43.560.02

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 27/ 28

    Summary and Outlook

    general framework for global optimization exploitation of new hardware platforms: Android, CUDA (not in this talk) development of new parallelization schemes implementation and parametrization of classical force field expressions

    as well as novel, flexible ones cooperations with experiment (Kanamycin A, doped water clusters) development of novel GA-based algorithms ...

  • Introduction Algorithmic Details Applications and Pitfalls Related Other Projects Summary and Outlook 28/ 28

    AcknowledgementsThe group:Ricardo Mata group-leaderJoao Oliveira biosystemsJonas Feldt mobile computingThorsten Stolper opts and regionsMilica Andrejic QM/MM

    Collaborations:Bernd Hartke water clustersSascha Frick hydration shells in reactionsThomas Zeuch cluster experimentsSvenja Fischmann internshipDietmar Stalke inorganic compounds IGuido Clever inorganic compounds IIMark Feyand inorganic compounds III

    Bachelors:Sebastian Gerke (Gottingen), Maximilian Konrad (Gottingen), Sebastian Wille(Gottingen), Marvin Kammler (Gottingen)

    IntroductionAlgorithmic DetailsApplications and PitfallsRelated Other ProjectsSummary and Outlook

    0.0: 0.1: 0.2: 0.3: 0.4: 0.5: 0.6: 0.7: 0.8: 0.9: 0.10: 0.11: 0.12: 0.13: 0.14: 0.15: 0.16: 0.17: 0.18: 0.19: 0.20: 0.21: 0.22: 0.23: 0.24: 0.25: 0.26: 0.27: 0.28: 0.29: 0.30: 0.31: 0.32: 0.33: 0.34: 0.35: 0.36: 0.37: 0.38: 0.39: 0.40: 0.41: 0.42: 0.43: 0.44: 0.45: 0.46: 0.47: 0.48: 0.49: 0.50: 0.51: 0.52: 0.53: 0.54: 0.55: 0.56: 0.57: 0.58: 0.59: 0.60: 0.61: 0.62: anm0: