e-Science Technologies in the Simulation of Complex Materials
L. Blanshard, R. Tyer, K. Kleese
S. A. French, D. S. Coombes, C. R. A. Catlow
B. Butchart, W. Emmerich – CS
H. Nowell, S. L. Price – Chem
eMaterials
Combinatorial Computational Catalysis
Polymorphism
Prediction of polymorphs: a drug substance may exist as two or more crystalline phases in which the molecules are packed differently.
Acid Sites in Zeolites
Explore which sites are involved in catalysis: used in diverse industries including petroleum, chemical, polymers, agrochemicals, and environmental.
Polymorph Prediction
Different crystal structures of a molecule are called polymorphs.
Polymorphs may have considerably different properties (e.g. bioavailability, solubility, morphology).
Polymorph prediction is of great importance to the pharmaceutical industry, where the discovery of a new polymorph during production or storage of a drug may be disastrous.
Drug molecules are often flexible and this makes the polymorph prediction process more challenging…
Polymorph Prediction Workflow
1. For flexible molecules: conformational optimisation, giving n feasible rigid molecular probes representing energetically plausible conformers (n = number of conformers).
2. MOLPAK: generation of ~6000 densely packed crystal structures using a rigid molecular probe.
3. DMAREL: lattice energy optimisation.
4. Steps 2 and 3 are run n times, once per conformer.
5. Data: unit cell volume, density, lattice energy.
6. A restricted number of structures is selected; crystal structures and properties are stored in a database.
7. Morphology.
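The workflow can be sketched as a loop over conformers. This is a minimal illustration only, not the project's implementation: `generate_packings` and `minimise` are hypothetical callables standing in for the MOLPAK and DMAREL steps, the `CrystalStructure` fields follow the data listed on the slide, and the 10 kJ mol^-1 selection window is an assumed criterion.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CrystalStructure:
    conformer: str        # label of the rigid probe used
    volume: float         # unit cell volume per molecule
    density: float
    total_energy: float   # lattice (+ intramolecular) energy, kJ mol^-1


def predict_polymorphs(
    conformers: Iterable[str],
    generate_packings: Callable[[str], Iterable[CrystalStructure]],  # stand-in for MOLPAK
    minimise: Callable[[CrystalStructure], CrystalStructure],        # stand-in for DMAREL
    energy_window: float = 10.0,  # assumed selection window, kJ mol^-1
) -> List[CrystalStructure]:
    """Run the packing and minimisation steps once per conformer, then
    keep only structures within `energy_window` of the lowest energy."""
    minimised: List[CrystalStructure] = []
    for conformer in conformers:                     # "n times"
        for trial in generate_packings(conformer):
            minimised.append(minimise(trial))
    if not minimised:
        return []
    best = min(s.total_energy for s in minimised)
    selected = [s for s in minimised if s.total_energy - best <= energy_window]
    return sorted(selected, key=lambda s: s.total_energy)
```

Because each conformer is searched independently, the outer loop is the natural unit to distribute across machines.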
Blind Test 2004
The Challenge:
Predict the crystal structure of 2-methyl-4,5-dinitro-phenyl-acetamide
[Figure: molecular structure, with flexibility indicated with arrows]
Wide range of conformers within plausible energy range
8 conformers chosen and used in subsequent searches
[Figure: Potential energy surface scan about the CCNC torsion angle. x-axis: CCNC torsion angle / °; y-axis: energy difference / kJ mol^-1]
[Figure (Blind Test 2004): Minima in the Lattice Energy for Different Conformations. x-axis: volume / Z (Å^3 per molecule); y-axis: lattice energy + intramolecular energy / kJ mol^-1; one series per conformer: a, b, c, 10, 20, -10, -20, -5]
[Figure (Blind Test 2004): Minima in the Lattice Energy for Different Conformations, showing the best 10 kJ mol^-1. x-axis: volume / Z (Å^3 per molecule); y-axis: lattice energy + intramolecular energy / kJ mol^-1; one series per conformer: a, b, c, 10, 20, -10, -20, -5]
Necessary to consider properties of best crystal structures, such as growth rates, to decide which are more likely to be observed.
Results
Observed crystal structure (revealed upon completion of blind test): a higher energy conformer than those considered!
[Figure: observed and predicted structures]
When just the observed conformer is used as the rigid probe in the search, the observed structure is found as the global minimum in lattice energy.
Summary
High energy gas phase conformers may be stabilised by packing within a lattice in the solid state.
As many conformers as possible need to be considered to maximise the chance of predicting crystal structures correctly and exploring the range of structures that are energetically feasible as polymorphs.
A fast, distributed e-Science application is being developed to enable routine crystal structure prediction for large numbers of conformers. This is essential for developing computational methods of predicting possible polymorphs of pharmaceutical molecules.
Predicting Morphologies
The shape, or morphology, of a crystal plays an important role in the manufacturing process: considerable problems arise if the morphology changes due to impurities, changes of solvent, or scale-up for high volume manufacture.
An understanding of the factors influencing crystal morphology will help us to understand how the crystallisation process can be controlled through, for example, the use of solvents or additives.
• BFDH Theory – based on geometrical factors
• AE Model – based on energetic factors
Scheme for Morphology Calculations
1. Minimised structure (from DMAREL minimised structure)
2. Choose faces to study, ~15-20 (BFDH calculation in GDIS)
3. For each face calculate AE (calculate valid shifts; converge regions; exclude polar)
4. Draw morphology for each crystal's set of faces (Wulff plot)
5. Calculate relative volume growth rates (new property)
Morphologies
The calculated morphology can be visualised using a Wulff plot, in which the ratios of the surface-normal distances of all planes from the centre of the crystal are determined by the interplanar spacings, the attachment energies or the surface energies.
[Figure: Observed and predicted morphology of form 1 of paracetamol]
Growth Volume
New property 'growth volume': obtained by numerical integration to find the volume within the Wulff shape. It gives an indication of whether one face dominates.
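One way to picture the numerical integration is a midpoint-rule sum over a grid. The sketch below is an assumption about how such a volume could be computed, not the method used in the study. Each face is given as an outward normal and a centre-to-plane distance, and the Wulff shape is the region lying inside every plane. `half_width` is a caller-supplied bounding box that must enclose the whole shape.

```python
import math
from typing import Sequence, Tuple

# A face is (outward normal vector, distance of the plane from the crystal centre).
Face = Tuple[Tuple[float, float, float], float]


def wulff_volume(faces: Sequence[Face], half_width: float, n: int = 40) -> float:
    """Midpoint-rule estimate of the volume enclosed by all face planes.

    The box [-half_width, half_width]^3 is split into n^3 cells; a cell
    counts if its midpoint satisfies normal . x <= distance for every face.
    """
    planes = []
    for (nx, ny, nz), dist in faces:
        norm = math.sqrt(nx * nx + ny * ny + nz * nz)  # normalise the normal
        planes.append((nx / norm, ny / norm, nz / norm, dist))

    cell = 2.0 * half_width / n
    mids = [-half_width + (i + 0.5) * cell for i in range(n)]
    inside = 0
    for x in mids:
        for y in mids:
            for z in mids:
                if all(a * x + b * y + c * z <= d for a, b, c, d in planes):
                    inside += 1
    return inside * cell ** 3
```

A face with a small distance from the centre cuts deep into the shape and so reduces the enclosed volume, which is how a single dominant face shows up in this number.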
[Figure: relative volume (left axis) and AE / kJ mol^-1 per molecule (right axis) for predicted structures and observed forms I and II. x-axis: polymorph, in order of decreasing stability; series: Volume, AE]
Pyridine
Form 1 Z’=4
Many low energy structures; new observed form 2 predicted to grow fast.
Prompted experimental search for more polymorphs.
e-Science Issues to Address
• simulations take too long to run
• data are distributed across many sites and systems
• no catalogue system
• output in legacy text files, different for each program
• few tools to access, manage and transfer data
• workflow management is manual
• licensing within distributed environment
Fortran Web Services
1. Expose Fortran binary as distributed Web Service
[Diagram labels: Fortran input, Fortran output, Fortran binary, XML, XSL, FO, WSDL]
Define an XML interface to the computation using WSDL (Web Service Description Language).
To get the binary to “talk” in XML: either change the Fortran code so that input and output use XML, or use parsers and XSLT conversion documents to map from fixed-format input/output files to and from XML.
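The parser route can be illustrated with a small round trip between a fixed-format file and XML. Everything specific here is hypothetical: the 16-column label / 16-column value layout and the `result` and `property` element names are invented for illustration and are not the format of any program in the project. A plain Python parser stands in for the XSLT step.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed-format layout: label in columns 1-16, value in columns 17-32.
LABEL_WIDTH = 16
VALUE_WIDTH = 16


def fixed_to_xml(text: str) -> str:
    """Parse fixed-format lines into a small XML document."""
    root = ET.Element("result")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        label = line[:LABEL_WIDTH].strip()
        value = float(line[LABEL_WIDTH:LABEL_WIDTH + VALUE_WIDTH])
        prop = ET.SubElement(root, "property", name=label)
        prop.text = repr(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_fixed(xml_text: str) -> str:
    """Write the XML document back out in the same fixed-format layout."""
    root = ET.fromstring(xml_text)
    lines = [
        f"{p.get('name'):<{LABEL_WIDTH}}{float(p.text):>{VALUE_WIDTH}.4f}"
        for p in root.iter("property")
    ]
    return "\n".join(lines)
```

With a mapping like this on each side of the binary, the Fortran code itself does not need to change.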
Distributed Workflow
2. Orchestrate Web Services with workflow service
[Diagram labels: BPEL (Business Process Execution Language) script; WS wrapped Fortran binary]
Workflow service is exposed to outside world as a web service.
Data Representation
Fortran programs use lots of different formats to represent the same thing.
[Diagram labels: CH4 (shown four times); CML <CH4…/>]
Since we provide new WSDL interfaces for each application, we have a perfect opportunity to employ a standard representation for chemical structures. The XML standard in chemistry is CML (Chemical Markup Language).
Development of chemical markup language (CML) as a system for handling complex chemical content. P. Murray-Rust, New Journal of Chemistry, 2001, 25, 618-634.
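As a rough illustration of what a CML representation of CH4 looks like, the sketch below builds a minimal molecule document with the Python standard library. The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`, `order`) follow CML conventions. The helper `molecule_to_cml` is hypothetical, and the namespace declaration and coordinates that a full CML document would carry are left out for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Sequence, Tuple


def molecule_to_cml(mol_id: str,
                    elements: Sequence[str],
                    bonds: Sequence[Tuple[int, int]]) -> str:
    """Serialise atoms and single bonds as a minimal CML-style molecule.

    `bonds` holds 1-based atom index pairs; atoms get ids a1, a2, ...
    """
    mol = ET.Element("molecule", id=mol_id)
    atom_array = ET.SubElement(mol, "atomArray")
    for i, element in enumerate(elements, start=1):
        ET.SubElement(atom_array, "atom", id=f"a{i}", elementType=element)
    bond_array = ET.SubElement(mol, "bondArray")
    for i, j in bonds:
        ET.SubElement(bond_array, "bond", atomRefs2=f"a{i} a{j}", order="1")
    return ET.tostring(mol, encoding="unicode")


# Methane: one carbon bonded to four hydrogens.
methane = molecule_to_cml("CH4", ["C", "H", "H", "H", "H"],
                          [(1, 2), (1, 3), (1, 4), (1, 5)])
```

Each wrapped application can then map its own input and output format to and from this one representation.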
Integration with Existing Infrastructure
[Diagram labels: (BPEL) workflow; Sun Grid Engine]
Prototype has been successfully deployed.
Existing grid infrastructure does not integrate easily with web services.
Policy on compute clusters enforced by Sun Grid Engine batch system.
Other users of clusters submit jobs via this control software.
Building a WSDL binding over the Sun Grid Engine protocol is difficult.
Smooth transition from existing infrastructure to WS riskier than thought.
Data Management at CCLRC
• file storage at CCLRC
• distributed file access via Storage Resource Broker (SDSC)
• catalogue of files using metadata in relational database
• web interface to metadata and files via Data Portal
• metadata editor through browser
Storage Resource Broker
Store data files from simulations in the Storage Resource Broker.
Data Portal
Search for studies in material sciences and download associated data using the CCLRC Data Portal.
Ongoing and Future Work
• upload files as part of workflow to SRB
• generate metadata
• upload extracted data from files
Acid Sites in Zeolites
• Determine the extra-framework cation position within the zeolite framework.
• Explore which proton sites are involved in catalysis and then characterise the active sites.
• Produce a database with structural models and associated vibrational modes for Si/Al ratios.
• Improve understanding of the role of the Si/Al ratio in zeolite chemistry.
MC/EM
A combined Monte Carlo (MC) and energy minimisation (EM) approach has been developed to model zeolitic materials with low and medium Si/Al ratios. First, Al is inserted into a siliceous unit cell, followed by a charge-compensating cation.
The zeolite Mordenite, which has a one-dimensional channel system, has been studied with a simulation cell containing two unit cells, i.e. 296 atoms with 96 Si centres (referred to as T sites).
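The sampling idea can be sketched as follows. This is a loose illustration under stated assumptions, not the scheme used in the study: T sites and candidate cation sites are reduced to integer indices, Al and cation positions are drawn uniformly at random, and `minimise` is a hypothetical callable standing in for the energy-minimisation code, returning a (total energy, cell volume) pair.

```python
import random
from typing import Callable, List, NamedTuple, Tuple


class Sampled(NamedTuple):
    energy: float                   # total energy after minimisation
    volume: float                   # cell volume after minimisation
    al_sites: Tuple[int, ...]       # T sites substituted by Al
    cation_sites: Tuple[int, ...]   # sites of the charge-compensating cations


def mc_em_search(
    n_t_sites: int,
    n_cation_sites: int,
    n_al: int,
    minimise: Callable[[Tuple[int, ...], Tuple[int, ...]], Tuple[float, float]],
    n_configs: int,
    seed: int = 0,
) -> List[Sampled]:
    """Draw random Al/cation placements, minimise each, sort by energy."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_configs):
        # Monte Carlo step: pick distinct T sites for Al, then one
        # distinct cation site per Al to compensate the charge.
        al_sites = tuple(rng.sample(range(n_t_sites), n_al))
        cation_sites = tuple(rng.sample(range(n_cation_sites), n_al))
        # Energy minimisation step (hypothetical external code).
        energy, volume = minimise(al_sites, cation_sites)
        results.append(Sampled(energy, volume, al_sites, cation_sites))
    return sorted(results, key=lambda r: r.energy)
```

Returning energy and volume together makes it straightforward to look for a relationship between the two across many configurations.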
100 Configurations
[Figure: total energy (eV) and cell volume for 100 configurations. Series: full_TE, full_Vol, each with a 5-period moving average]
It can be seen that there are two distinct regions, -12079 eV to -12076 eV and -12075 eV to -12073 eV, but there is no obvious correlation between total energy and cell volume.
10000 Configurations
[Figure: total energy (TE) and cell volume (VOL) for 10,000 configurations, each with a 200-period moving average]
However, when 10,000 structures are considered it is clear that the most stable structures correspond to cation placements that do not cause the cell to expand. This requires that the cations sit in the large channel.
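The moving averages in the two plots smooth the per-configuration scatter so that any trend between energy and volume becomes visible. A minimal sketch of a trailing moving average and a Pearson correlation coefficient is given below. Both function names are illustrative, and the plots themselves were presumably produced with a spreadsheet rather than with code like this.

```python
import math
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average: each output is the mean of the last
    `window` values, starting once `window` values are available."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Applied to energy-sorted configurations, a window of 5 suits 100 points and a window of 200 suits 10,000 points, matching the windows named in the plot legends.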
Comparison of Regions
[Figure labels: -12079.5 eV, -12075.04 eV]
What Next
Once the lowest energy positions of Al have been confirmed, the cation is exchanged for a proton and the structure is again energy minimised.
This method will allow us to construct realistic models of low and medium Si/Al zeolites. Such structures can be used for further simulations and aid the interpretation of experimental data.
Condor
Extensive use of Condor pools (UCL – 950 nodes in teaching pools). 48 cpu-years of previously unused compute resource have been utilised in this study. Close collaboration with the NERC e-minerals project has allowed access to this resource.
50,000 calculations have been performed, each with 488 particles per simulation box, which means a total of roughly 24,000,000 particles (50,000 × 488 = 24,400,000) have been included in our simulations to date.
Achievements To Date
1. First use of CML schema for defining Web Service port types.
2. Calculation of 50,000 configurations of zeolite Mordenite (24,000,000 particles) to gain insight into structure when a realistic ratio of Al substitution is included in model.
3. Successfully exposed Fortran codes as OGSI Web Services - prototype application deployed on 80 nodes. The prototype computational polymorph application is being ported to a larger production machine.
4. First use of BPEL standard for orchestrating web services in a Grid application.
5. Open Source BPEL implementation in development enabling late binding and dynamic deployment of large computational processes.
6. Integration of OGSI and BPEL with Sun Grid Engine.
7. Development of Graphical User Interface for polymorph application - connects to relational database via EJB interface.
8. Infrastructure for metadata and data management.
9. SRB and dataportal are already being used to hold datasets and to transfer the data between different scientists and computer applications.
10. Implementation of Condor pool at Ri.
Key Achievement
We are now doing science that was not possible before the advancements made within e-Science.