data pipelining and workflow management for materials science applications
DESCRIPTION
Workshop in computational methods for materials science, presented at the Spring 2010 ACS conference. This workshop illustrates how high-throughput computation and automation can be used with quantum chemistry calculations to solve problems in materials discovery. Examples include catalysts, fuel cells, and OLEDs.
TRANSCRIPT
Data Pipelining and Workflow Management for Materials
Science Applications
Dr George Fitzgerald
Dr Mathew Halls
Dr Jacob Gavartin
Dr Gerhard Goldbeck-Wood
Accelrys, Inc.
Overview
• Modeling overview
• Workflow automation
• Examples
– PEM Fuel Cell Catalysts
– Lithium Ion Battery Additives
– OLEDs
– Metallocenes
© 2008 Accelrys, Inc. 2
• Evolutionary optimization algorithms
• Summary
The Concept of Modeling: Computational Physics and Chemistry
• Computational Physics and Chemistry simulate structures, processes and properties
numerically, based fully or in part on fundamental principles of physics
• Some methods may be used to model not only stable molecules but also short-lived,
unstable intermediates and even transition states.
• Computational Physics and Chemistry are vital adjuncts to experimental studies
• Roles of modeling today
– Run through many scenarios quickly and easily
– Visualize results and share information
– A common platform for expert and non-expert
Virtual Experiments
Issues that simulation can address…
• Reactions, bond formation and breaking
• Miscibility, solubility…
• Diffusion, permeation, membrane transport…
• Adhesion (i.e., interactions with surfaces)
• Crystallization and polymorphism
• Micelle or vesicle formation and properties
• Emulsions, kinetics and properties
• Polymeric microspheres, release profiles
[Figure: simulation methods ordered by increasing size and complexity: Quantum Mechanics, Classical, Mesoscale]
High-Throughput Computation
• Goal:
– Use computation to assist in the rapid discovery of new materials
• Why High-Throughput Computation (HTC)?
– Brute force: screen more materials
– Make life easier: reduce human effort and human error
– Be clever: with enough results you can start to see trends, make broad predictions
• We want to do these calculations as rapidly as possible
• Available tools
– Predict properties from first principles (or derived from first principles)
– Create phenomenological models based on modeling + experiment (QSAR)
– Statistical analysis of experimental and/or computational results: predictive analytics
Components of an HTC System
• Good hardware
– Fast chips = less time per calculation
– Many cores = more simultaneous calculations
• Good predictive methods
– Accurate methods like DFT, molecular mechanics, or mesoscale models
– Rapid methods like QSAR: GFA, NN, Recursive partitioning
• Workflow automation tools
– Create complex, multistep calculations
– Manage job submission and analysis
– Create summary of results
– Compare to experiment
Automated Chemical Modeling
• Workflow management tools capture complex modeling procedures in an automated
workflow for calculation and analysis of materials systems
• Essential tasks include
– Running simulations (MM, Semiempirical, QM, etc.)
– Manipulation of chemical structures
– Arithmetical manipulation of results
– Integration of multiple data sources (analytical instruments, modeling, publications)
– Statistical analysis of results (QSAR, clustering)
– Reports & graphs
– Pipelining, i.e., using output from one component as input to the next
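The pipelining idea can be sketched in a few lines of Python. This is only an illustration of chaining components, not how Pipeline Pilot is implemented; the stage names, the placeholder "energy" formula and the cutoff are all hypothetical.

```python
from functools import reduce

def add_energy(records):
    """Hypothetical 'simulation' stage: attach a placeholder energy to each record."""
    for rec in records:
        yield {**rec, "energy": -1.5 * len(rec["name"])}

def keep_low_energy(records, cutoff=-6.0):
    """Filter stage: pass on only records at or below the cutoff."""
    for rec in records:
        if rec["energy"] <= cutoff:
            yield rec

def report(records):
    """Reporting stage: turn each record into a summary line."""
    for rec in records:
        yield f'{rec["name"]}: {rec["energy"]:.1f}'

def run_pipeline(source, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda data, stage: stage(data), stages, source)

library = [{"name": "EC"}, {"name": "F-EC"}, {"name": "Me2-EC"}]
lines = list(run_pipeline(library, [add_energy, keep_low_energy, report]))
print(lines)
```

Because each stage is a generator, records flow through one at a time, so a new stage can be added without touching the others.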
Materials Discovery and Optimization using Virtual Screening
[Workflow diagram; stages shown: Chemical Motif Design, Virtual Library Enumeration, Automated QC Calculation, Virtual Materials Library / Database, Analysis, Identification of optimum leads, Experimental screening]
QSAR in the Design of Materials
• Some properties are easy to calculate, e.g.,
– Structure
– Heat of formation
– HOMO-LUMO gap
• But the properties that are easy to calculate are not always the ones we want
– Corrosion resistance
– Catalyst lifetime
– Tg
• QSAR gives us a way to estimate the difficult properties based on the ones that we
can calculate easily and quickly
• QSAR procedure
– Get experimental results (or accurate computation)
– Compute “descriptors”
– Create a statistical model that can predict the target properties
– Use the model to predict the results for “virtual samples”
• Examples
– Cytotoxic activities of platinum complexes, J. Comput. Aided Mol. Des. 23 (2009) 343.
– Corrosion Inhibitors, Progress in Organic Coatings 61 (2008) 11.
– Metal-organic frameworks for hydrogen storage, Cat. Today 120 (2007) 317.
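The four-step QSAR procedure can be sketched as follows, assuming the simplest possible statistical model (a one-descriptor least-squares line). The descriptor values, measured values and sample names are invented for illustration; real QSAR models typically use several descriptors and more elaborate fitting.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Steps 1-2: hypothetical training set (measured property and one computed descriptor).
descriptor = [1.0, 2.0, 3.0, 4.0]
measured = [2.1, 3.9, 6.2, 7.8]

# Step 3: build the statistical model.
slope, intercept = fit_line(descriptor, measured)

def predict(x):
    return slope * x + intercept

# Step 4: apply the model to "virtual samples" that only have computed descriptors.
virtual_samples = {"candidate_A": 2.5, "candidate_B": 5.0}
predictions = {name: predict(x) for name, x in virtual_samples.items()}
print(slope, intercept, predictions)
```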
Uses of Workflow Automation
• Programs like Pipeline Pilot provide a drag-and-drop method for building workflows
• Some calculations require multiple steps
– IP: ground state optimization + single-point cation energy
– pKa: vacuum and solvated calculations of protonated and de-protonated species
• Generation of starting structures
– Combinatorial libraries
– Defects
– Surfaces
• Summary and reporting
[Figure: combinatorial library scaffold with R-group sites R1-R4; substituent fragments with X = F or H]
Simple Workflow Example: Adiabatic & Vertical IP
• Calculating Vertical IP:
– Geometry optimize neutral
– Single-point energy of cation
• Calculating Adiabatic IP:
– Geometry optimize neutral
– Geometry optimize cation
• Workflow simplifies and automates these 3 calculations and presents results in table, spreadsheet, database…
[Figure: energy vs. reaction coordinate for neutral and charged species (Ma, Mb, Ma+/-), showing the reorganization energy λ+/-]
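As a minimal sketch of the arithmetic behind this workflow: the three calculations (optimized neutral, cation single point at the neutral geometry, optimized cation) give both ionization potentials as energy differences. The function name and the example energies are hypothetical; energies are assumed to be total energies in hartree.

```python
HARTREE_TO_EV = 27.211386

def ionization_potentials(e_neutral_opt, e_cation_at_neutral_geom, e_cation_opt):
    """Return (vertical IP, adiabatic IP) in eV from three total energies in hartree."""
    vertical = e_cation_at_neutral_geom - e_neutral_opt
    adiabatic = e_cation_opt - e_neutral_opt
    return vertical * HARTREE_TO_EV, adiabatic * HARTREE_TO_EV

# Hypothetical energies for illustration only.
vertical_ip, adiabatic_ip = ionization_potentials(-100.000, -99.700, -99.710)
print(f"vertical IP = {vertical_ip:.3f} eV, adiabatic IP = {adiabatic_ip:.3f} eV")
```

The adiabatic value is never larger than the vertical one, because relaxing the cation geometry can only lower the cation energy.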
PEM Fuel Cells Challenges
Overall: 2 H2 + O2 → 2 H2O + electricity
Anode: 2 H2 → 4 H+ + 4 e−
Cathode: O2 + 4 H+ + 4 e− → 2 H2O
• iCatDesign project used combined theory and experiment to find new catalysts for oxygen activation in fuel cells
– Johnson Matthey
– CMR Fuel Cells
– Accelrys
– Co-funded by the UK Technology Strategy Board's Collaborative Research and Development programme
• One challenging step is the Oxygen Reduction Reaction (ORR)
• Pt is an effective catalyst for activating O2 but is too
expensive for large-scale application
– How can we find catalysts that are just as effective
but less expensive?
– High-throughput DFT calculations with CASTEP
• Recently published:
– Gavartin, et al., ECS Transactions 25, 1335-1344
(2009)
Adsorption and activation energies: ORR
[Figure: energy E vs. reaction coordinate for O2 adsorption and dissociation on the catalyst surface (iCatDesign)]
E0 = E(O2 + *)
E1 = E(O2*)
ETS = E(O*-O*)
E2 = 2E(O*)
• ORR activity needs the adsorption energy to be just right
– Too loose → no activation
– Too tight → no desorption
• Activity would improve if Eads were a bit less
than in pure Pt
• Expansion and contraction of Pt lattice leads to
changes in Eads
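The quantities E0, E1, ETS and E2 defined on this slide combine into an adsorption energy and an activation barrier. The sketch below assumes the common sign convention in which a negative Eads means bound; the slide does not state its convention, and the numbers are invented.

```python
from typing import NamedTuple

class OrrProfile(NamedTuple):
    e0: float   # E(O2 + *): separated O2 and clean surface
    e1: float   # E(O2*): adsorbed O2
    ets: float  # E(O*-O*): transition state for O-O bond breaking
    e2: float   # 2E(O*): two adsorbed O atoms

def adsorption_energy(p):
    """Eads = E1 - E0 (negative means O2 binds to the surface)."""
    return p.e1 - p.e0

def activation_barrier(p):
    """Barrier for dissociating adsorbed O2: ETS - E1."""
    return p.ets - p.e1

def dissociation_energy(p):
    """Energy change from adsorbed O2 to two adsorbed O atoms: E2 - E1."""
    return p.e2 - p.e1

# Hypothetical energies in eV, for illustration only.
profile = OrrProfile(e0=0.0, e1=-0.6, ets=-0.1, e2=-2.0)
print(adsorption_energy(profile), activation_barrier(profile), dissociation_energy(profile))
```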
Reducing Computational Cost
• This work examined alloys of the form A3B, e.g., Pt3Co
• Use 5 layer model with lowest layers fixed
– In 3 layer model, there are 220 unique structures
– For 2xA and 10xB elements > 2,000 calculations
• Need ORR activation for each
– How can we avoid 2,000 DFT TS searches?
• We can estimate activity with Eads
• Observation: d-band center is roughly linear with Eads
• Reduction in computational cost:
– ORR barrier (TS optimization)
– Eads (constrained geometry optimization)
– d-band center
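The cost-reduction idea is a tiered screen: a cheap descriptor (here the d-band center) ranks every candidate, and only those that look promising move on to the more expensive Eads and transition-state calculations. The sketch below assumes a linear map from d-band center to Eads; its coefficients, the target value, the window and the alloy labels are all hypothetical.

```python
def estimate_eads(d_band_center, slope=0.8, intercept=1.0):
    """Hypothetical linear estimate of Eads (eV) from the d-band center (eV)."""
    return slope * d_band_center + intercept

def select_for_next_tier(d_band_centers, target_eads, window, max_selected):
    """Keep candidates whose estimated Eads lies within `window` of the target,
    ordered from closest to farthest, capped at `max_selected`."""
    scored = [
        (abs(estimate_eads(center) - target_eads), name)
        for name, center in d_band_centers.items()
    ]
    close_enough = sorted(pair for pair in scored if pair[0] <= window)
    return [name for _, name in close_enough[:max_selected]]

# Invented d-band centers (eV) for placeholder alloys.
candidates = {"alloy_1": -2.0, "alloy_2": -2.5, "alloy_3": -1.5, "alloy_4": -2.2}
shortlist = select_for_next_tier(candidates, target_eads=-0.7, window=0.15, max_selected=10)
print(shortlist)
```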
Summary of HTC for CASTEP Calculations
• Many low-lying structures for each A3B
– Computation of Eads requires ensemble average
– Automation provides tremendous simplification to this process
• CASTEP Component simplifies and automates setup & analysis of multiple jobs
• Pt3Co identified as lead alloy
• Next steps:
– Submit lead compounds to calculations of Eads
– Submit best results to TS calculations
– Submit best results for experimental screening
– Use computation to validate experimental results
• E.g., confirm experimental structures via Raman
– Use experimental results to refine the QSAR model
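One way to form the ensemble average over the low-lying structures of an A3B alloy is Boltzmann weighting by total energy. The slides do not say which weighting was used, so the scheme, the temperature and the function name below are assumptions.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def boltzmann_average(total_energies, values, temperature=300.0):
    """Average `values` with weights exp(-(E - Emin) / kT); energies in eV."""
    e_min = min(total_energies)
    weights = [math.exp(-(e - e_min) / (K_B_EV * temperature)) for e in total_energies]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Two degenerate structures contribute equally.
print(boltzmann_average([0.0, 0.0], [-0.5, -0.7]))
# A structure 1 eV higher in energy contributes almost nothing at 300 K.
print(boltzmann_average([0.0, 1.0], [-0.5, -0.7]))
```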
Lithium Ion Batteries and SEI Film Formation
• The electrolyte typically consists of one or more lithium salts dissolved in
an aprotic solvent with at least one additional functional additive
• Additives are included in electrolyte formulations to increase the
dielectric strength and enhance electrode stability by facilitating the
formation of the solid/electrolyte interface (SEI) layer
• The initiation step leading to anode SEI formation is electron transfer to the SEI-forming species
– Results in decomposition reaction
– Produces the passivating SEI layer
[Figure: 1 e- decomposition scheme]
• Important requirements for electrolyte additives selected to facilitate good SEI formation are:
– Higher reduction potential than the base solvent (low LUMO)
– Maximal reactivity for a given chemical design space (low hardness η)
– Large dipole moment for interaction with Li (high µ)
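These requirements can be turned into simple numerical descriptors. The sketch below assumes the frontier-orbital approximation η ≈ (E_LUMO − E_HOMO)/2 for the chemical hardness; the slides do not state how η was evaluated, and the orbital energies are invented.

```python
def chemical_hardness(e_homo, e_lumo):
    """Approximate hardness (eV) as half the HOMO-LUMO gap (assumed definition)."""
    return (e_lumo - e_homo) / 2.0

def passes_lumo_check(additive_lumo, solvent_lumo):
    """A LUMO below that of the base solvent is used as a proxy for a higher reduction potential."""
    return additive_lumo < solvent_lumo

# Hypothetical orbital energies (eV) and dipole moment (D), for illustration only.
solvent = {"homo": -8.0, "lumo": 1.0}
additive = {"homo": -8.5, "lumo": 0.2, "dipole": 5.5}

eta = chemical_hardness(additive["homo"], additive["lumo"])
reduced_first = passes_lumo_check(additive["lumo"], solvent["lumo"])
print(eta, reduced_first)
```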
Anode SEI Additive Structure Library
[Figure: ethylene carbonate (EC) scaffold with substituent sites R1-R4; substituent fragments with X = F or H]
• Cyclic carbonates, related to ethylene carbonate (EC), are often used as anode SEI additives with graphite anodes
• To explore the effect of alkylation or fluorination on EC-based additive properties, an R-group-based enumeration scheme was used to generate an EC-based additive structure library (7381 stereochemically unique structures)
Anode SEI Additive Results
• Optimal materials must satisfy a number of objectives
• Multi-objective solutions represent a trade-off between objectives
• One approach is to adopt the “Pareto-optimal” solution
– Set of solutions such that it is not possible to improve one property without making any other property worse
– This case:
• Minimize the chemical hardness
• Maximize the dipole moment and electron affinity
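A Pareto filter for these three objectives can be sketched as below. Maximized properties are negated so that every objective is minimized; the candidate names and property values are hypothetical.

```python
def dominates(a, b):
    """True if objective vector `a` is no worse than `b` everywhere and better somewhere
    (all objectives are to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(items, objectives):
    """Return the items that no other item dominates."""
    scored = [(item, objectives(item)) for item in items]
    return [item for item, s in scored if not any(dominates(t, s) for _, t in scored)]

def additive_objectives(c):
    # Minimize hardness; maximize dipole and electron affinity (hence the minus signs).
    return (c["hardness"], -c["dipole"], -c["ea"])

# Invented property values, for illustration only.
candidates = [
    {"name": "c1", "hardness": 3.0, "dipole": 5.0, "ea": 1.0},
    {"name": "c2", "hardness": 3.5, "dipole": 6.0, "ea": 0.8},
    {"name": "c3", "hardness": 3.6, "dipole": 4.0, "ea": 0.7},
    {"name": "c4", "hardness": 2.8, "dipole": 4.5, "ea": 1.2},
]
front = [c["name"] for c in pareto_front(candidates, additive_objectives)]
print(front)
```

Here c3 is dropped because c1 is better on all three properties, while c1, c2 and c4 each win on at least one property and so remain as trade-offs.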
3D View of Pareto Surface
Anode SEI Additive Pareto Optimal Candidate
• Optimal materials solutions are systems that simultaneously satisfy a number of target objectives
• Multiobjective solutions represent a trade-off between objectives, with one class being Pareto-optimal solutions
• Pareto-optimal solutions are defined as a set of solutions which are non-dominated, such that it is not possible to improve one property without making any other property worse
• For anode SEI additives, optimal solutions seek to minimize the LUMO energy, maximize the dipole moment and minimize the chemical hardness
• Screening the EC-based additive library gives structure 1573 as a Pareto-optimal solution (R1=R2=CH3 and R3=R4=c-C3F5)
[Figure: structure 1573]
Organic Light Emitting Diode (OLED) Basics
[Figure: simple 2-layer OLED device structure: ITO glass substrate (anode), hole-transport layer (HTL, NPB), electron-transport layer (ETL, AlQ3), cathode]
AlQ3 Electron Transport and Emitting Material
• Following Tang and VanSlyke’s pioneering work [1], AlQ3 has become the archetype OLED material
• Optoelectronic properties can be tuned by derivatizing AlQ3 with electron-withdrawing or electron-donating substituents
• Al(QX)3 materials have demonstrated experimentally that R1/R2 substituents affect the electronic and optical properties [2]
[Figure: AlQ3 HOMO and LUMO]
Experimental λmax for Derivatized AlQ3 Materials (Al(QX)3)
Group     1-CH3     2-CH3     2-F       2-Cl      2-CN
∆λmax     -10 nm    +31 nm    +15 nm    +10 nm    -3 nm
[1] Tang, C. W.; VanSlyke, S. A. Appl. Phys. Lett. 1987, 51, 913.
[2] Chen, C. H.; Shi, J. Coord. Chem. Rev. 1998, 171, 161.
Virtual Library Enumeration in SES
• Virtual library enumeration has played a major role in computational drug design
• Similar approaches, using R-group-based or reaction-based enumeration schemes, can be used to generate virtual libraries of materials which can be analysed, screened and filtered to identify and explore:
– Lead material candidates
– Material property trends and SPRs
• The enumeration components in the ‘Chemistry Component Collection’ on the SES platform enable automated library generation, which can be stored as a file or directly pipelined into an analysis workflow
• A virtual library of 8436 Al(QX2)3 structures was generated by combining the 6 substituents studied experimentally over the 2 reaction sites per ligand on the AlQ3 core
[Figure: Al(QX2)3 library, 8436 structures]
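R-group enumeration of this kind can be sketched with the standard library. One counting scheme that happens to reproduce the figure of 8436 treats each ligand as an ordered pair of substituents (6 × 6 = 36 ligands) and each complex as an unordered set of three ligands, with repeats allowed. This is an observation about the arithmetic, not a description of how the SES enumeration components work, and it ignores isomerism around the metal center. The substituent labels are placeholders.

```python
from itertools import combinations_with_replacement, product

# Placeholder labels for the 6 substituents (the slide does not list them all).
substituents = ["S1", "S2", "S3", "S4", "S5", "S6"]

# Each ligand carries one substituent on each of its 2 reaction sites.
ligands = list(product(substituents, repeat=2))

# Each complex has three ligands; the order of the ligands is taken not to matter.
complexes = list(combinations_with_replacement(ligands, 3))

print(len(ligands), len(complexes))
```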
OLED Pipelined QC Workflow
• A pipeline employing the PM3 Hamiltonian through the VAMP component was constructed to compute:
– Total Energies
– HOMO and LUMO Energies
– Vertical & Adiabatic Ionization Potential (IP)
– Vertical & Adiabatic Electron Affinity (EA)
• Charge transport through weakly interacting monomeric materials is outer-sphere electron transfer and is described by Marcus theory
• Characteristic energies were also computed:
– Hole Reorganization Energy (λ+)
– Electron Reorganization Energy (λ-)
[Figure: energy vs. reaction coordinate for charge transfer between molecules Ma and Mb, showing the reorganization energy λ+/-]
• A random percent filter was used to sample the Al(QX2)3 structure library, and >1000 structures were analyzed through the OLED QC protocol
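Two steps of this protocol lend themselves to a short sketch. The reorganization energy is written here in the widely used four-point form (two geometries, two charge states); the slides do not spell out the formula, so treat it as an assumption. The random percent filter is a simple stand-in for the Pipeline Pilot component of the same name, and all numbers are illustrative.

```python
import random

def reorganization_energy(e_neutral_at_neutral, e_neutral_at_ion,
                          e_ion_at_ion, e_ion_at_neutral):
    """Four-point estimate: relaxation of the ion plus relaxation of the neutral.
    Use cation energies for the hole value and anion energies for the electron value."""
    ion_relaxation = e_ion_at_neutral - e_ion_at_ion
    neutral_relaxation = e_neutral_at_ion - e_neutral_at_neutral
    return ion_relaxation + neutral_relaxation

def random_percent_filter(items, percent, seed=0):
    """Pass each item with probability percent/100 (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [item for item in items if rng.random() < percent / 100.0]

# Hypothetical energies in eV.
lam_hole = reorganization_energy(0.0, 0.12, 6.0, 6.15)

# Sampling roughly 12% of an 8436-member library gives on the order of 1000 structures.
sample = random_percent_filter(range(8436), percent=12)
print(lam_hole, len(sample))
```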
OLED Pipelined QC Workflow Results
• Al(QX2)3 properties can be tailored through changes in molecular structure
– LUMO energy and Electron Reorganization Energy vary over ranges of ca. 1.25 and 2.25 eV, respectively
• Analysis of the Reorg E Difference (Elec Reorg E - Hole Reorg E) shows that changes
in structure can switch the preferred transport from electron to hole
• Al(QX2)3 library with QC computed properties can be screened for optimal candidates
• Superior ETL OLED materials should be stable and preferentially conduct electrons
• Library can be Pareto sorted to simultaneously minimize the ‘Heat of Formation’ and ‘Electron Reorg E’ to identify lead structures
Modeling the Activity of Polymerization Catalysts
• Metallocenes are known as effective catalysts for
polymerization
• Alter ligands for control of
– Activity
– Molecular weight of polymer
– Tacticity of polymer
• QM can predict reliable reaction rates, but…
– Time consuming
– TS difficult to automate
• How do we make modeling more efficient and
more amenable to automation?
– Develop QSAR models
– Screen many, many structures with QSAR
– Perform time-consuming QM on only the most
promising leads
– Perform experiments on only the best QM results
Metallocene data from Albert J van Reenen,
http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html
Details of QSAR & GFA
• Choice of descriptors:
– “Fast descriptors”
• Topological descriptors
• Information content descriptors
– QM descriptors with VAMP (PM6 or AM1-d)
• Charge on metal atoms
• Fukui index on metal atoms
– Structural
• “Bite angle”
• Choice of compounds
– 31 structures with experimental data
• Model
– GFA with linear splines
– 6 term equation
[Figure: metallocene bite angle]
Metallocene images and data from Albert J van Reenen,
http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html
Genetic Function Algorithm (GFA)
• Genetic function algorithm (GFA) yields analytical models
• GFA finds the best function and fewest descriptors
– It is possible to identify the importance of each descriptor
– Produces a family of results, not just a single equation
• Analytical expression can include:
– Linear terms a * xi
– Quadratic terms a * xi^2
– Cross terms a * xi * xj
– Splines <xi – a>
• Example:
– Catalyst Activity = -23.4
+ 2.04 * [Treatment Time]
– 0.016 * [Fe2O3%]
+ 0.256 * [PtO %]
– 0.0224 * [Al2O3%] * [Cr2O3 %]
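Once a GFA equation is in hand, evaluating it is trivial, which is what makes it useful for screening. The sketch below codes the example equation above directly; the input values are invented. The `spline` helper follows the usual truncated-spline convention, in which <x – a> is x – a when positive and zero otherwise. That convention is an assumption here, since the slide does not define the bracket notation.

```python
def spline(x, knot):
    """Truncated linear spline <x - knot>: zero below the knot, linear above it."""
    return max(x - knot, 0.0)

def catalyst_activity(treatment_time, fe2o3_pct, pto_pct, al2o3_pct, cr2o3_pct):
    """Example GFA model from the slide: linear terms plus one cross term."""
    return (-23.4
            + 2.04 * treatment_time
            - 0.016 * fe2o3_pct
            + 0.256 * pto_pct
            - 0.0224 * al2o3_pct * cr2o3_pct)

# Hypothetical formulation, for illustration only.
activity = catalyst_activity(treatment_time=20, fe2o3_pct=10, pto_pct=5,
                             al2o3_pct=10, cr2o3_pct=5)
print(round(activity, 2), spline(3.0, 5.0), spline(7.0, 5.0))
```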
GFA Results
• Summary of GFA equations
• Display of predicted vs. actual
Using the GFA for Combinatorial Catalysis
• Framework: 4 choices
• Metal: 3 choices
• R1, R2, R3: 6 choices
• Approx 1,300 calculations
• Procedure
– Generate combinatorial library
– Compute descriptors (charges, bite angle, etc.)
– Use GFA model to predict catalyst performance
– Take best leads and use QM to predict more accurately
• Advantages
– Easier than manual approach
– Faster than doing exact QM TS on everything
– Find trends in the performance of different R groups
Evolutionary Optimization
• Genetic Function Algorithm (GFA) produces an analytical expression but
how do we find the extrema?
• Approach 1: Brute force
– Generate the combinatorial grid of data and look for maximum and minimum
– For each molecule compute descriptors then evaluate activity with GFA
– Not a bad approach if you have the CPU resources
• Approach 2: Genetic Algorithm (GA)
– GA can be compared to the evolution of DNA
– An initial population is randomly constructed
– The “best” individuals are allowed to propagate
– Positive traits passed to next generation
Applications of GA to Materials Discovery
• Metallocene catalysts
– Located optimum in ~400 calculations (1,300 possible)
• Battery additives
– Located optimum in ~500 calculations (7,300 possible)
• H2 storage nanoclusters
– Dope Mg13 with Li and B
– Total 1,590,000 structures
– Work in progress: predict most stable nanocluster by GA
Metallocene Optimization by GA
• Framework: 4 choices
• Metal: 3 choices
• R1, R2, R3: 6 choices
• Generate random population of 20 individuals
• Compute descriptors (charges, bite angle, etc.)
• Use GFA model to predict catalyst performance
• Take best results and allow them to evolve
• Advantages
– Automated
– Faster (usually) than exhaustive search
• Disadvantages
– In danger of becoming a ‘black box’
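The GA loop described here can be sketched as follows. Each individual is a tuple (framework, metal, R1, R2, R3) of choice indices. The fitness function is a made-up stand-in for the GFA model, and the selection, crossover and mutation settings are illustrative choices rather than those used in the original work.

```python
import random

CHOICES = [4, 3, 6, 6, 6]  # framework, metal, R1, R2, R3

def random_individual(rng):
    return tuple(rng.randrange(n) for n in CHOICES)

def predicted_activity(individual):
    """Hypothetical stand-in for the GFA model: best at an arbitrary target design."""
    target = (2, 1, 5, 0, 3)
    return -sum((a - b) ** 2 for a, b in zip(individual, target))

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(individual, rng, rate=0.2):
    return tuple(rng.randrange(n) if rng.random() < rate else gene
                 for gene, n in zip(individual, CHOICES))

def run_ga(generations=30, pop_size=20, seed=1):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    evaluated = set(population)
    for _ in range(generations):
        # The "best" individuals survive and propagate.
        population.sort(key=predicted_activity, reverse=True)
        parents = population[:pop_size // 2]
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
        evaluated.update(children)
    return max(population, key=predicted_activity), len(evaluated)

best, n_evaluated = run_ga()
print(best, predicted_activity(best), n_evaluated)
```

Keeping the parents in each new generation means the best design found so far is never lost, and counting distinct evaluated designs shows how much of the library was actually visited.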
Summary
• The generation of virtual structure libraries can be used to explore
materials design space
• Automation and data pipelining are key to HTC
– Eliminate tedium
– Reduce human error
– Allow a greater number of samples to be screened
• Larger number of results brings into play statistical methods for
finding trends
• Approximate methods like QSAR are valuable for reducing the
number of expensive calculations
• Evolutionary algorithms like GA make it possible to automate the
discovery process, not just the computational process
Acknowledgements
• Collaborator for Li additive project: Ken Tasaki,
– Technology Research Division, Mitsubishi Chemical Inc., Redondo
Beach, CA 90277
• Computational resources for HTC: Hewlett-Packard
• iCatDesign project sponsored by Technology Strategy Board
Project Number: /5/MAT/6/I/H0379C