data pipelining and workflow management for materials science applications

Data Pipelining and Workflow Management for Materials Science Applications. Dr George Fitzgerald, Dr Mathew Halls, Dr Jacob Gavartin, Dr Gerhard Goldbeck-Wood. Accelrys, Inc.

Upload: biovia

Post on 25-May-2015


DESCRIPTION

Workshop in computational methods for materials science, presented at Spring 2010 ACS conference. This workshop illustrates how high-throughput computation and automation can be used with quantum chemistry calculations to solve problems in materials discovery. Examples include catalysts, fuel cells, OLEDs.

TRANSCRIPT

Page 1: Data Pipelining and Workflow Management for Materials Science Applications

Data Pipelining and Workflow Management for Materials Science Applications

Dr George Fitzgerald

Dr Mathew Halls

Dr Jacob Gavartin

Dr Gerhard Goldbeck-Wood

Accelrys, Inc.

Page 2: Data Pipelining and Workflow Management for Materials Science Applications

Overview

• Modeling overview

• Workflow automation

• Examples

– PEM Fuel Cell Catalysts

– Lithium Ion Battery Additives

– OLEDs

– Metallocenes

© 2008 Accelrys, Inc. 2

• Evolutionary optimization algorithms

• Summary

Page 3: Data Pipelining and Workflow Management for Materials Science Applications

The Concept of Modeling: Computational Physics and Chemistry

• Computational Physics and Chemistry simulate structures, processes and properties

numerically, based fully or in part on fundamental principles of physics

• Some methods may be used to model not only stable molecules but also short-lived,

unstable intermediates and even transition states.

• Computational Physics and Chemistry are vital adjuncts to experimental studies

• Roles of modeling today

– Run through many scenarios quickly and easily

– Visualize results and share information

– A common platform for expert and non-expert

Virtual Experiments

Page 4: Data Pipelining and Workflow Management for Materials Science Applications

Issues that simulation can address…

• Reactions, bond formation and breaking

• Miscibility, solubility…

• Diffusion, permeation, membrane transport…

• Adhesion (i.e., interactions with surfaces)

• Crystallization and polymorphism

• Micelle or vesicle formation and properties

• Emulsions, kinetics and properties

• Polymeric microspheres, release profiles

[Figure: modeling methods arranged by increasing size and complexity: Quantum Mechanics, Classical, Mesoscale]

Page 5: Data Pipelining and Workflow Management for Materials Science Applications

High-Throughput Computation

• Goal:

– Use computation to assist in the rapid discovery of new materials

• Why High-Throughput Computation (HTC)?

– Brute force: screen more materials

– Make life easier: reduce human effort and human error

– Be clever: with enough results you can start to see trends, make broad predictions

• We want to do these calculations as rapidly as possible

• Available tools

– Predict properties from first principles (or derived from first principles)

– Create phenomenological models based on modeling + experiment (QSAR)

– Statistical analysis of experimental and/or computational results: predictive analytics

Page 6: Data Pipelining and Workflow Management for Materials Science Applications

Components of an HTC System

• Good hardware

– Fast chips = less time per calculation

– Many cores = more simultaneous calculations

• Good predictive methods

– Accurate methods like DFT, molecular mechanics, or mesoscale models

– Rapid methods like QSAR: GFA, NN, Recursive partitioning

• Workflow automation tools

– Create complex, multistep calculations

– Manage job submission and analysis

– Create summary of results

– Compare to experiment

Page 7: Data Pipelining and Workflow Management for Materials Science Applications

Automated Chemical Modeling

• Workflow management tools capture complex modeling procedures as an automated workflow for the calculation and analysis of materials systems

• Essential tasks include

– Running simulations (MM, Semiempirical, QM, etc.)

– Manipulation of chemical structures

– Arithmetical manipulation of results

– Integration of multiple data sources (analytical instruments, modeling, publications)

– Statistical analysis of results (QSAR, clustering)

– Reports & graphs

– Pipelining, i.e., using output from one component as input to the next
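
The pipelining idea in the last bullet can be sketched in a few lines of Python. This is a generic illustration, not Pipeline Pilot code: the component names (`enumerate_structures`, `compute_energy`, `summarize`) and the placeholder "energy" are hypothetical and stand in for real modeling steps.

```python
from functools import reduce


def pipeline(*components):
    """Chain components so that each one's output is the next one's input."""
    return lambda data: reduce(lambda value, step: step(value), components, data)


# Hypothetical components standing in for real modeling steps.
def enumerate_structures(core):
    return [f"{core}-{group}" for group in ("H", "F", "CH3")]


def compute_energy(structures):
    # Placeholder "calculation": a real workflow would call a QM or MM code here.
    return {name: -1.0 * len(name) for name in structures}


def summarize(results):
    best = min(results, key=results.get)
    return {"count": len(results), "lowest_energy_structure": best}


workflow = pipeline(enumerate_structures, compute_energy, summarize)
report = workflow("EC")
```

Because every component only sees the previous component's output, steps can be swapped or inserted without touching the rest of the chain.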

Page 8: Data Pipelining and Workflow Management for Materials Science Applications

Materials Discovery and Optimization using Virtual Screening

[Flow diagram with stages: Chemical Motif Design; Virtual Library Enumeration; Automated QC Calculation; Virtual Materials Library / Database; Analysis; Identification of optimum leads; Experimental screening]

Page 9: Data Pipelining and Workflow Management for Materials Science Applications

QSAR in the Design of Materials

• Some properties are easy to calculate, e.g.,

– Structure

– Heat of formation

– HOMO-LUMO gap

• But the properties that are easy to calculate are not always the ones we want

– Corrosion resistance

– Catalyst lifetime

– Glass transition temperature (Tg)

• QSAR gives us a way to estimate the difficult properties based on the ones that we can calculate easily and quickly

• QSAR procedure

– Get experimental results (or accurate computation)

– Compute “descriptors”

– Create a statistical model that can predict the target properties

– Use the model to predict the results for “virtual samples”

• Examples

– Cytotoxic activities of platinum complexes, J. Comput. Aided Mol. Des. 23 (2009) 343.

– Corrosion Inhibitors, Progress in Organic Coatings 61 (2008) 11.

– Metal-organic frameworks for hydrogen storage, Cat. Today 120 (2007) 317.
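
The QSAR procedure listed above can be sketched as follows. This is a minimal illustration that assumes a single descriptor and an ordinary least-squares line; real QSAR models use many descriptors and richer statistics. All numbers and sample names are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, descriptor):
    slope, intercept = model
    return slope * descriptor + intercept


# Steps 1 and 2: measured property and computed descriptor (made-up values).
training_descriptors = [1.0, 2.0, 3.0, 4.0]
training_property = [2.1, 3.9, 6.2, 7.8]

# Step 3: build the statistical model.
model = fit_line(training_descriptors, training_property)

# Step 4: predict the property for "virtual samples" that were never measured.
virtual_descriptors = {"virtual_A": 2.5, "virtual_B": 5.0}
predictions = {name: predict(model, d) for name, d in virtual_descriptors.items()}
```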

Page 10: Data Pipelining and Workflow Management for Materials Science Applications

Uses of Workflow Automation

• Programs like Pipeline Pilot provide a drag-and-drop method for building workflows

• Some calculations require multiple steps

– IP: ground state optimization + single cation energy

– pKa: vacuum and solvated calculations of protonated and de-protonated species

• Generation of starting structures

– Combinatorial libraries

– Defects

– Surfaces

• Summary and reporting

[Figure: R-group enumeration scaffold with substituent positions R1–R4; X = F or H]

Page 11: Data Pipelining and Workflow Management for Materials Science Applications

Simple Workflow Example: Adiabatic & Vertical IP

• Calculating Vertical IP:

– Geometry optimize neutral

– Single-point energy of cation

• Calculating Adiabatic IP:

– Geometry optimize neutral

– Geometry optimize cation

• Workflow simplifies and automates these 3 calculations and presents results in a table, spreadsheet, database…

[Figure: energy vs. reaction coordinate curves for the charge transfer Ma+/- Mb → Ma Mb+/-, indicating the reorganization energy λ+/-]
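
As a sketch, the two ionization potentials follow from the three calculations by simple energy differences. The function and argument names are hypothetical, and the energies in the usage lines are illustrative numbers, not results from the talk.

```python
def ionization_potentials(e_neutral_opt, e_cation_at_neutral_geom, e_cation_opt):
    """Return (vertical IP, adiabatic IP) from three total energies in the same units.

    e_neutral_opt            -- energy of the optimized neutral molecule
    e_cation_at_neutral_geom -- single-point energy of the cation at the neutral geometry
    e_cation_opt             -- energy of the optimized cation
    """
    vertical = e_cation_at_neutral_geom - e_neutral_opt
    adiabatic = e_cation_opt - e_neutral_opt
    return vertical, adiabatic


# Illustrative energies only.
vertical_ip, adiabatic_ip = ionization_potentials(-100.0, -99.70, -99.75)
```

Relaxing the cation can only lower its energy, so the adiabatic value should never exceed the vertical one; that makes a cheap sanity check for an automated workflow.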

Page 12: Data Pipelining and Workflow Management for Materials Science Applications

PEM Fuel Cells Challenges

2 H2 + O2 → 2 H2O + electricity

• iCatDesign project used combined theory and experiment to find new catalysts for oxygen activation in fuel cells

– Johnson Matthey

– CMR Fuel Cells

– Accelrys

– Co-funded by the UK Technology Strategy Board's

Collaborative Research and Development

programme

• One challenging step is the Oxygen Reduction Reaction (ORR)

Anode: 2 H2 → 4 H+ + 4 e−

Cathode: O2 + 4 H+ + 4 e− → 2 H2O

• Pt is an effective catalyst for activating O2 but too expensive for large-scale application

– How can we find catalysts that are just as effective

but less expensive?

– High-throughput DFT calculations with CASTEP

• Recently published:

– Gavartin, et al., ECS Transactions 25, 1335-1344 (2009)

Page 13: Data Pipelining and Workflow Management for Materials Science Applications

Adsorption and activation energies: ORR

[Figure (iCatDesign): energy E vs. reaction coordinate for O2 on a catalyst surface, with E0 = E(O2 + *), E1 = E(O2*), ETS = E(O*-O*), E2 = 2E(O*)]

• ORR activity needs the adsorption energy to be just right

– Too loose → no activation

– Too tight → no desorption

• Activity would improve if Eads were a bit less than in pure Pt

• Expansion and contraction of the Pt lattice leads to changes in Eads
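
Reading the energy labels on this slide in the conventional way, the quantities of interest are differences between the four energies. The slide itself only labels the points on the curve, so the definitions below are an assumption about how they are combined:

```latex
\begin{align}
E_\mathrm{ads} &= E_1 - E_0 = E(\mathrm{O_2^*}) - E(\mathrm{O_2} + {*}) \\
E_\mathrm{a} &= E_\mathrm{TS} - E_1 \\
\Delta E_\mathrm{diss} &= E_2 - E_1 = 2E(\mathrm{O^*}) - E(\mathrm{O_2^*})
\end{align}
```

Here the first line is the adsorption energy of O2, the second is the activation barrier for O-O bond breaking, and the third is the energy change of dissociation on the surface.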

Page 14: Data Pipelining and Workflow Management for Materials Science Applications

Reducing Computational Cost

• This work examined alloys of the form A3B, e.g., Pt3Co

• Use 5 layer model with lowest layers fixed

– In 3 layer model, there are 220 unique structures

– For 2xA and 10xB elements > 2,000 calculations

• Need ORR activation for each

– How can we avoid 2,000 DFT TS searches?

• We can estimate activity with Eads

• Observation: d-band center is roughly linear with Eads

• Reduction in computational cost:

– ORR barrier (TS optimization)

– Eads (constrained geometry optimization)

– d-band center

Page 15: Data Pipelining and Workflow Management for Materials Science Applications

Summary of HTC for CASTEP Calculations

• Many low-lying structures for each A3B

– Computation of Eads requires ensemble average

– Automation provides tremendous simplification to this process

• CASTEP Component simplifies and automates setup & analysis of multiple jobs

• Pt3Co identified as lead alloy

• Next steps:

– Submit lead compounds to calculations of Eads

– Submit best results to TS calculations

– Submit best results for experimental screening

– Use computation to validate experimental results

• E.g., confirm experimental structures via Raman

– Use experimental results to refine the QSAR model
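
The staged screening in these next steps (cheap estimate first, expensive calculation only for survivors) can be sketched as a generic funnel. The function and the scoring callables are hypothetical; in the talk the stages would correspond to d-band center, Eads and TS calculations, which are not implemented here.

```python
def funnel(candidates, stages):
    """Pass candidates through successive stages, cheapest first.

    Each stage is a (score_fn, keep) pair: candidates are ranked by score_fn
    (lower is better) and only the best `keep` move on to the next stage.
    """
    survivors = list(candidates)
    for score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn)[:keep]
    return survivors


# Toy usage: integers stand in for alloys, lambdas stand in for calculations.
leads = funnel(
    range(10),
    [
        (lambda x: abs(x - 5), 4),  # cheap descriptor narrows 10 candidates to 4
        (lambda x: x, 2),           # costlier score narrows 4 candidates to 2
    ],
)
```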

Page 16: Data Pipelining and Workflow Management for Materials Science Applications

Lithium Ion Batteries and SEI Film Formation

• The electrolyte typically consists of one or more lithium salts dissolved in

an aprotic solvent with at least one additional functional additive

Page 17: Data Pipelining and Workflow Management for Materials Science Applications

Lithium Ion Batteries and SEI Film Formation

• Additives are included in electrolyte formulations to increase the

dielectric strength and enhance electrode stability by facilitating the

formation of the solid/electrolyte interface (SEI) layer

Page 18: Data Pipelining and Workflow Management for Materials Science Applications

Lithium Ion Batteries and SEI Film Formation

• Initiation step leading to anode SEI formation is electron transfer to the SEI-forming species

– Results in decomposition reaction

– Produces the passivating SEI layer

[Figure: 1 e- decomposition scheme]

• Important requirements for electrolyte additives selected to facilitate good SEI formation are:

– Higher reduction potential than the base solvent (low LUMO)

– Maximal reactivity for a given chemical design space (low hardness η)

– Large dipole moment for interaction with Li (high µ)
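
For reference, the chemical hardness η is commonly defined from the ionization potential and electron affinity, and is often approximated from frontier orbital energies. The slides do not say which form was used in this study, so the expression below is the standard definition rather than the authors' stated method:

```latex
\eta = \frac{IP - EA}{2} \approx \frac{\varepsilon_\mathrm{LUMO} - \varepsilon_\mathrm{HOMO}}{2}
```

A small HOMO-LUMO gap therefore corresponds to a soft (low η), more reactive molecule.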

Page 19: Data Pipelining and Workflow Management for Materials Science Applications

Anode SEI Additive Structure Library

[Figure: ethylene carbonate (EC) scaffold with substituent positions R1–R4 and the enumerated R-groups; X = F or H]

• Cyclic carbonates, related to ethylene carbonate (EC), are often used as anode SEI additives for use with graphite anodes

• To explore the effect of alkylation or fluorination on EC-based additive properties, an R-Group based enumeration scheme was used to generate an EC-based additive structure library (7381 stereochemically unique structures)

Page 20: Data Pipelining and Workflow Management for Materials Science Applications

Anode SEI Additive Results

• Optimal materials must satisfy a number of objectives

• Multi-objective solutions represent a trade-off between objectives

• One approach is to adopt the “Pareto-optimal” solution

– Set of solutions such that it is not possible to improve one property without making any other property worse

– This case:

• Minimize the chemical hardness

• Maximize the dipole moment and electron affinity
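
A minimal sketch of Pareto filtering, assuming every objective is expressed as "smaller is better" (properties to be maximized are negated first). The helper names are hypothetical and the brute-force comparison is chosen for clarity, not speed.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def as_minimization(hardness, dipole, electron_affinity):
    """Map the objectives on this slide to an all-minimize tuple:
    minimize hardness, maximize dipole moment and electron affinity."""
    return (hardness, -dipole, -electron_affinity)


# Toy two-objective example.
front = pareto_front([(1, 1), (2, 2), (1, 2), (0, 3)])
```

The pairwise comparison costs O(n²) in the number of candidates, which is still manageable for a library of a few thousand structures.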

Page 21: Data Pipelining and Workflow Management for Materials Science Applications

3D View of Pareto Surface

Page 22: Data Pipelining and Workflow Management for Materials Science Applications

Anode SEI Additive Pareto Optimal Candidate

• Optimal materials solutions are systems that simultaneously satisfy a number of target objectives

• Multiobjective solutions represent a trade-off between objectives, with one class being Pareto-optimal solutions

• Pareto-optimal solutions are defined as a set of solutions which are non-dominated, such that it is not possible to improve one property without making any other property worse

• For anode SEI additives, optimal solutions seek to minimize the LUMO energy, maximize the dipole moment and minimize the chemical hardness

• Screening the EC-based additive library gives structure 1573 as a Pareto-optimal solution (R1=R2=CH3 and R3=R4=c-C3F5)

[Figure: structure 1573]

Page 23: Data Pipelining and Workflow Management for Materials Science Applications

Organic Light Emitting Diode (OLED) Basics

[Figure: simple 2-layer OLED device structure: ITO glass substrate (anode), hole-transport layer (HTL, NPB), electron-transport layer (ETL, AlQ3), cathode]

Page 24: Data Pipelining and Workflow Management for Materials Science Applications

AlQ3 Electron Transport and Emitting Material

• Following Tang and Van Slyke’s pioneering work [1], AlQ3 has become the archetype OLED material

• Optoelectronic properties can be tuned by derivatizing AlQ3 with electron-withdrawing or electron-donating substituents

• Experiments on Al(QX)3 materials have demonstrated that R1/R2 substituents affect the electronic and optical properties [2]

Experimental λmax for Derivatized AlQ3 Materials (Al(QX)3)

Group    1-CH3    2-CH3    2-F      2-Cl     2-CN

∆λmax    -10 nm   +31 nm   +15 nm   +10 nm   -3 nm

[Figure: HOMO and LUMO of AlQ3; structure of Al(QX)3]

[1] Tang, C. W.; VanSlyke, S. A. Appl. Phys. Lett. 1987, 51, 913.

[2] Chen, C. H.; Shi, J. Coord. Chem. Rev. 1998, 171, 161.

Page 25: Data Pipelining and Workflow Management for Materials Science Applications

Virtual Library Enumeration in SES

• Virtual library enumeration has played a major role in computational drug design

• Similar approaches, using RGroup-based or Reaction-based enumeration schemes, can be used to generate virtual libraries of materials which can be analysed, screened and filtered to identify and explore:

– Lead material candidates

– Material property trends and SPRs

• The enumeration components in the ‘Chemistry Component Collection’ on the SES platform enable automated library generation; the library can be stored as a file or directly pipelined into an analysis workflow

• A virtual library of 8436 Al(QX2)3 structures was generated by combining the 6 substituents studied experimentally over the 2 reaction sites per ligand on the AlQ3 core
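
One way to arrive at the library size is sketched below. It assumes that each ligand carries one of 6 substituents at each of its 2 sites (36 ligand variants) and that the three ligands on the metal are treated as an unordered set. The slides do not state the counting rule, so this is an assumption that happens to give the quoted number; the substituent labels are placeholders.

```python
from itertools import combinations_with_replacement, product
from math import comb

# Placeholder labels for the 6 substituents (their identities are not needed for counting).
substituents = ["S1", "S2", "S3", "S4", "S5", "S6"]

# Every ligand variant: one substituent choice at each of the 2 sites.
ligands = list(product(substituents, repeat=2))

# Every complex: an unordered selection of 3 ligands, repeats allowed.
library = list(combinations_with_replacement(ligands, 3))

# Closed form for multisets of size 3 drawn from 36 ligand variants.
expected = comb(len(ligands) + 3 - 1, 3)
```

With 36 ligand variants both the explicit enumeration and the closed form give 8436.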

Page 26: Data Pipelining and Workflow Management for Materials Science Applications

Al(QX2)3 Library

8436 Structures

Page 27: Data Pipelining and Workflow Management for Materials Science Applications

OLED Pipelined QC Workflow

• A pipeline using the PM3 Hamiltonian through the VAMP component was constructed to compute:

– Total Energies

– HOMO and LUMO Energies

– Vertical & Adiabatic Ionization Potential (IP)

– Vertical & Adiabatic Electron Affinity (EA)

• Charge transport through weakly interacting monomeric materials is outer sphere electron transfer and is described by Marcus theory

• Characteristic Energies were also computed:

– Hole Reorganization Energy (λ+)

– Electron Reorganization Energy (λ-)

[Figure: energy vs. reaction coordinate curves for the charge transfer Ma+/- Mb → Ma Mb+/-, indicating the reorganization energy λ+/-]

• A random percent filter was used to sample the Al(QX2)3 structure library and >1000 structures were analyzed through the OLED QC protocol
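
The slides do not give the formula used for the reorganization energies. A common choice for isolated molecules is the four-point scheme sketched below, which adds the relaxation energy of the ion to that of the neutral molecule. The function name, argument names and numbers are hypothetical.

```python
def reorganization_energy(e_neutral_at_neutral, e_neutral_at_ion,
                          e_ion_at_ion, e_ion_at_neutral):
    """Four-point internal reorganization energy (same units as the inputs).

    e_X_at_Y is the total energy of charge state X at the optimized geometry
    of charge state Y. Use the cation for the hole value (lambda+) and the
    anion for the electron value (lambda-).
    """
    ion_relaxation = e_ion_at_neutral - e_ion_at_ion
    neutral_relaxation = e_neutral_at_ion - e_neutral_at_neutral
    return ion_relaxation + neutral_relaxation


# Illustrative energies only.
lambda_hole = reorganization_energy(-100.0, -99.9, -99.75, -99.70)
```

The ion relaxation term equals the difference between the vertical and adiabatic ionization potential (or electron affinity), so it reuses calculations the pipeline already performs.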

Page 28: Data Pipelining and Workflow Management for Materials Science Applications

OLED Pipelined QC Workflow Results

Page 29: Data Pipelining and Workflow Management for Materials Science Applications

OLED Pipelined QC Workflow Results

• Al(QX2)3 properties can be tailored through changes in molecular structure

– LUMO energy and Electron Reorganization Energy vary over ranges of ca. 1.25 and 2.25 eV

• Analysis of the Reorg E Difference (Elec Reorg E - Hole Reorg E) shows that changes in structure can switch the preferred transport from electron to hole

Page 30: Data Pipelining and Workflow Management for Materials Science Applications

OLED Pipelined QC Workflow Results

• Al(QX2)3 library with QC computed properties can be screened for optimal candidates

• Superior ETL OLED materials should be stable and preferentially conduct electrons

• Library can be Pareto sorted to simultaneously minimize the ‘Heat of Formation’ and ‘Electron Reorg E’ to identify lead structures

Page 31: Data Pipelining and Workflow Management for Materials Science Applications

Modeling the Activity of Polymerization Catalysts

• Metallocenes are known as effective catalysts for

polymerization

• Alter ligands for control of

– Activity

– Molecular weight of polymer

– Tacticity of polymer

• QM can predict reliable reaction rates, but…

– Time consuming

– TS difficult to automate

• How do we make modeling more efficient and

more amenable to automation?

– Develop QSAR models

– Screen many, many structures with QSAR

– Perform time-consuming QM on only the most

promising leads

– Perform experiments on only the best QM results

Metallocene data from Albert J van Reenen,

http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html

Page 32: Data Pipelining and Workflow Management for Materials Science Applications

Details of QSAR & GFA

• Choice of descriptors:

– “Fast descriptors”

• Topological descriptors

• Information content descriptors

– QM descriptors with VAMP (PM6 or AM1-d)

• Charge on metal atoms

• Fukui index on metal atoms

– Structural

• “Bite angle”

• Choice of compounds

– 31 structures with experimental data

• Model

– GFA with linear splines

– 6 term equation

[Figure: definition of the bite angle]

Metallocene images and data from Albert J van Reenen,

http://academic.sun.ac.za/UNESCO/Conferences/Conference1999/Lectures1999/VanReenen99/VAN%20REENEN.html

Page 33: Data Pipelining and Workflow Management for Materials Science Applications

Genetic Function Algorithm (GFA)

• Genetic function algorithm (GFA) yields analytical models

• GFA finds the best function and fewest descriptors

– It is possible to identify the importance of each descriptor

– Produces a family of results, not just a single equation

• Analytical expression can include:

– Linear terms: a * x_i

– Quadratic terms: a * x_i^2

– Cross terms: a * x_i * x_j

– Splines: <x_i – a>

• Example:

– Catalyst Activity = -23.4 + 2.04 * [Treatment Time] - 0.016 * [Fe2O3 %] + 0.256 * [PtO %] - 0.0224 * [Al2O3 %] * [Cr2O3 %]
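
The example equation is an ordinary closed-form expression, so it can be evaluated directly once the descriptors are known. The sketch below transcribes it into Python; the function and argument names are hypothetical and the units of the inputs are not given in the slides. The `spline` helper shows the truncated form that the `<x – a>` notation usually denotes in GFA-style models (zero below the knot, linear above), which is an assumption about the notation here.

```python
def spline(x, knot):
    """Truncated linear spline <x - knot>: zero below the knot, linear above it."""
    return max(0.0, x - knot)


def catalyst_activity(treatment_time, fe2o3_pct, pto_pct, al2o3_pct, cr2o3_pct):
    """Evaluate the example GFA equation from the slide."""
    return (
        -23.4
        + 2.04 * treatment_time
        - 0.016 * fe2o3_pct
        + 0.256 * pto_pct
        - 0.0224 * al2o3_pct * cr2o3_pct
    )


# With every descriptor at zero only the constant term remains.
baseline = catalyst_activity(0.0, 0.0, 0.0, 0.0, 0.0)
```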

Page 34: Data Pipelining and Workflow Management for Materials Science Applications

GFA Results

• Summary of GFA equations

• Display of predicted vs. actual

Page 35: Data Pipelining and Workflow Management for Materials Science Applications

Using the GFA for Combinatorial Catalysis

• Framework: 4 choices

• Metal: 3 choices

• R1, R2, R3: 6 choices

• Approx 1,300 calculations

• Procedure

– Generate combinatorial library

– Compute descriptors (charges, bite angle, etc.)

– Use GFA model to predict catalyst performance

– Take best leads and use QM to predict more accurately

• Advantages

– Easier than manual approach

– Faster than doing exact QM TS on everything

– Find trends in the performance of different R groups
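
A sketch of the library generation step, assuming the R1, R2 and R3 positions are each filled independently from the same 6 groups. All labels are placeholders. Under that reading the raw product contains 2,592 combinations, roughly twice the "approx 1,300" quoted above; the slides do not explain the difference (removal of symmetry-equivalent structures would be one possible reason, but that is a guess).

```python
from itertools import product

# Placeholder labels; the actual frameworks, metals and R groups are not listed in the slides.
frameworks = [f"framework_{i}" for i in range(4)]
metals = [f"metal_{i}" for i in range(3)]
r_groups = [f"R_{i}" for i in range(6)]

# One tuple per candidate: (framework, metal, R1, R2, R3).
library = list(product(frameworks, metals, r_groups, r_groups, r_groups))
raw_count = len(library)
```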

Page 36: Data Pipelining and Workflow Management for Materials Science Applications

Evolutionary Optimization

• Genetic Function Algorithm (GFA) produces an analytical expression but

how do we find the extrema?

• Approach 1: Brute force

– Generate the combinatorial grid of data and look for maximum and minimum

– For each molecule compute descriptors then evaluate activity with GFA

– Not a bad approach if you have the CPU resources

• Approach 2: Genetic Algorithm (GA)

– GA can be compared to the evolution of DNA

– An initial population is randomly constructed

– The “best” individuals are allowed to propagate

– Positive traits passed to next generation

Page 37: Data Pipelining and Workflow Management for Materials Science Applications

Applications of GA to Materials Discovery

• Metallocene catalysts

– Located optimum in ~400 calculations (1,300 possible)

• Battery additives

– Located optimum in ~500 calculations (7,300 possible)

• H2 storage nanoclusters

– Dope Mg13 with Li and B

– Total 1,590,000 structures

– Work in progress: predict most stable nanocluster by GA

Page 38: Data Pipelining and Workflow Management for Materials Science Applications

Metallocene Optimization by GA

• Framework: 4 choices

• Metal: 3 choices

• R1, R2, R3: 6 choices

• Generate random population of 20 individuals

• Compute descriptors (charges, bite angle, etc.)

• Use GFA model to predict catalyst performance

• Take best results and allow them to evolve

• Advantages

– Automated

– Faster (usually) than exhaustive search

• Disadvantages

– In danger of becoming a ‘black box’
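
The procedure on this slide can be sketched as a small genetic algorithm. This is an illustration under stated assumptions, not the implementation used in the talk: `predicted_activity` is a made-up stand-in for the GFA model, and the selection, crossover and mutation settings are arbitrary choices. Only the gene sizes (4 frameworks, 3 metals, 6 choices for each R group) and the population of 20 come from the slide.

```python
import random

GENE_SIZES = (4, 3, 6, 6, 6)  # framework, metal, R1, R2, R3


def predicted_activity(individual):
    """Hypothetical stand-in for the GFA model: best (zero) at an arbitrary target."""
    target = (2, 1, 5, 0, 3)
    return -sum((gene - t) ** 2 for gene, t in zip(individual, target))


def random_individual(rng):
    return tuple(rng.randrange(n) for n in GENE_SIZES)


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]


def mutate(individual, rng, rate=0.2):
    return tuple(
        rng.randrange(n) if rng.random() < rate else gene
        for gene, n in zip(individual, GENE_SIZES)
    )


def evolve(generations=30, pop_size=20, n_parents=5, seed=0):
    """Return the best individual found and the number of distinct structures scored."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    evaluated = set(population)
    for _ in range(generations):
        # The "best" individuals are allowed to propagate.
        parents = sorted(population, key=predicted_activity, reverse=True)[:n_parents]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - n_parents)
        ]
        population = parents + children  # parents survive unchanged (elitism)
        evaluated.update(population)
    return max(population, key=predicted_activity), len(evaluated)


best, n_evaluated = evolve()
```

Tracking the set of distinct structures scored gives the figure of merit used in the talk: how many calculations were needed compared with the size of the full combinatorial space.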

Page 39: Data Pipelining and Workflow Management for Materials Science Applications

Summary

• The generation of virtual structure libraries can be used to explore

materials design space

• Automation and data pipelining are key to HTC

– Eliminate tedium

– Reduce human error

– Allow a greater number of samples to be screened

• Larger number of results brings into play statistical methods for

finding trends

• Approximate methods like QSAR are valuable for reducing the

number of expensive calculations

• Evolutionary algorithms like GA make it possible to automate the discovery process, not just the computational process

Page 40: Data Pipelining and Workflow Management for Materials Science Applications

Acknowledgements

• Collaborator for Li additive project: Ken Tasaki,

– Technology Research Division, Mitsubishi Chemical Inc., Redondo

Beach, CA 90277

• Computational resources for HTC: Hewlett-Packard

• iCatDesign project sponsored by Technology Strategy Board

Project Number: /5/MAT/6/I/H0379C
