Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project

Dan Gunter, Shreyas Cholia, Anubhav Jain, Michael Kocher, Kristin Persson, Lavanya Ramakrishnan

Shyue Ping Ong, Gerbrand Ceder

BACKGROUND

November 12, 2012, Slide 1


Our energy future relies on the rapid development of novel functional materials.

But it takes almost twenty years to develop new materials. How can we do it faster?

Solar cells, advanced batteries, transparent conducting oxides (TCOs), and fuel cells will all play a role in our energy future.

Materials Genome Initiative


June 2011: Materials Genome Initiative, which aims to “fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable”

Source: "Materials Genome Initiative for Global Competitiveness", http://www.whitehouse.gov/sites/default/files/microsites/ostp/materials_genome_initiative-final.pdf

It's the data, stupid!


Figure: Really hard work on some computations leads to a fantastic paper in a journal, and the data falls into a black hole (drink margaritas). Alternatively, the data goes into a DB, where it feeds brilliant analysis after brilliant analysis and more fantastic papers. Escape velocity?

Very specialized skill-set


Figure: Really hard work on some computations requires physics, computer science, and a deep dive on specific software.

Example


Predicted and measured performance of Li9V3(P2O7)3(PO4)2 during cell cycling.

The Materials Project used quantum chemistry calculations to screen over 20,000 materials as potential cathodes for Li ion batteries. From the results, three new materials were identified, tested, and currently have patents pending.

COMPONENTS


Figure: system components: parallel computation; parallel HPC resources; workflow; HPC storage; data; data V&V; datastore; data analytics; midrange compute resources; analysis library; data dissemination; web server; science apps; collaborative tools.

NoSQL Datastore


•  Powerful but simple query language
•  Ease of administration
•  Good performance on read-heavy workloads where most of the data fits in memory
•  Poor performance at huge scale
•  Bad for write-heavy workloads
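To make the "powerful but simple query language" point concrete, the sketch below shows how a MongoDB-style filter document selects records. The slide says only "NoSQL", so the MongoDB-style syntax is an assumption, as are the field names (`formula`, `elements`, `nelements`, `e_above_hull`) and the toy records. The in-memory matcher supports just equality, `$lt`, `$gt`, and `$all`, and is not how any database actually evaluates queries.

```python
# Hypothetical calculation records shaped like documents in a document store.
DOCS = [
    {"task_id": 1, "formula": "LiFePO4", "elements": ["Fe", "Li", "O", "P"], "nelements": 4, "e_above_hull": 0.0},
    {"task_id": 2, "formula": "Fe2O3", "elements": ["Fe", "O"], "nelements": 2, "e_above_hull": 0.0},
    {"task_id": 3, "formula": "LiCoO2", "elements": ["Co", "Li", "O"], "nelements": 3, "e_above_hull": 0.02},
]


def matches(doc, query):
    """Return True if doc satisfies every clause of a MongoDB-style filter.

    Supports plain equality plus the $lt, $gt and $all operators only.
    """
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$lt":
                    if value is None or not value < arg:
                        return False
                elif op == "$gt":
                    if value is None or not value > arg:
                        return False
                elif op == "$all":
                    if not isinstance(value, list) or not all(a in value for a in arg):
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != cond:
            return False
    return True


def find(docs, query):
    """Linear scan returning every matching document."""
    return [d for d in docs if matches(d, query)]


# Compounds containing both Li and O with (hypothetical) e_above_hull below 0.05.
hits = find(DOCS, {"elements": {"$all": ["Li", "O"]}, "e_above_hull": {"$lt": 0.05}})
print([d["formula"] for d in hits])  # ['LiFePO4', 'LiCoO2']
```

Against a real MongoDB server, the same filter dictionary would be passed to pymongo's `Collection.find()`.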

FireWorks workflow engine


•  Programmability: scripting, not GUIs and DSLs
•  Administration overhead: no extra servers
•  Flexibility: DB support, reconfiguring running workflows

Re-runs, detours, duplicates, iteration

Why?!
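The re-runs, detours, and duplicates listed above can be illustrated with a toy workflow. This is a hypothetical sketch, not the FireWorks API: `Task`, `Workflow`, and their methods are invented for illustration. Duplicates are detected by fingerprinting each task's spec, a detour splices a new task between a parent and its children, and a re-run resets a task's state. Iteration is not modeled.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    spec: dict
    children: list = field(default_factory=list)  # names of downstream tasks
    state: str = "READY"


def spec_key(spec):
    """Key-order-independent fingerprint of a task spec, used to spot duplicate work."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()


class Workflow:
    """Hypothetical toy workflow; not the FireWorks API."""

    def __init__(self):
        self.tasks = {}        # task name -> Task
        self.done_by_key = {}  # spec fingerprint -> name of the task that computed it

    def add(self, task):
        self.tasks[task.name] = task

    def run(self, name):
        """'Run' a task, or mark it DUPLICATE if an identical spec already completed."""
        task = self.tasks[name]
        key = spec_key(task.spec)
        if key in self.done_by_key:
            task.state = "DUPLICATE"  # reuse the earlier result instead of recomputing
        else:
            task.state = "COMPLETED"  # a real engine would launch the calculation here
            self.done_by_key[key] = name
        return task.state

    def detour(self, parent, extra):
        """Splice `extra` between `parent` and the parent's former children."""
        extra.children = list(self.tasks[parent].children)
        self.tasks[parent].children = [extra.name]
        self.add(extra)

    def rerun(self, name):
        """Reset a task so it will run again, forgetting its recorded result."""
        task = self.tasks[name]
        key = spec_key(task.spec)
        if self.done_by_key.get(key) == name:
            del self.done_by_key[key]
        task.state = "READY"


wf = Workflow()
wf.add(Task("relax", {"formula": "Fe2O3", "step": "relax"}, children=["static"]))
wf.add(Task("static", {"formula": "Fe2O3", "step": "static"}))
wf.add(Task("relax_again", {"step": "relax", "formula": "Fe2O3"}))  # same spec, keys reordered
print(wf.run("relax"), wf.run("relax_again"))  # COMPLETED DUPLICATE
wf.detour("relax", Task("fix_up", {"formula": "Fe2O3", "step": "fix_up"}))
print(wf.tasks["relax"].children, wf.tasks["fix_up"].children)  # ['fix_up'] ['static']
wf.rerun("relax")
print(wf.run("relax"))  # COMPLETED
```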

Dissemination with REST


https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy

Parts: preamble (https://www.materialsproject.org/rest), version (v1), application (materials), I.D. (Fe2O3), datatype (vasp), property (energy).
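The URL above breaks into six labeled parts. A short parser makes the mapping explicit; it only manipulates strings and does not contact the server. The helper names, and the assumption that the preamble always ends in `/rest` followed by exactly five segments, are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class RestRequest(NamedTuple):
    """The six labeled parts of the example URL (field names are hypothetical)."""
    preamble: str     # e.g. https://www.materialsproject.org/rest
    version: str      # e.g. v1
    application: str  # e.g. materials
    identifier: str   # the I.D., e.g. Fe2O3
    datatype: str     # e.g. vasp
    prop: str         # the property, e.g. energy


def parse_rest_url(url):
    """Split <preamble>/<version>/<application>/<id>/<datatype>/<property>."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Assumption: the preamble is scheme://host/rest, followed by exactly five segments.
    if len(parts) != 6 or parts[0] != "rest":
        raise ValueError(f"unexpected path: {parsed.path}")
    preamble = f"{parsed.scheme}://{parsed.netloc}/{parts[0]}"
    return RestRequest(preamble, *parts[1:])


def build_rest_url(req):
    """Inverse of parse_rest_url: join the fields in order."""
    return "/".join(req)


url = "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy"
req = parse_rest_url(url)
print(req.identifier, req.datatype, req.prop)  # Fe2O3 vasp energy
```

Keeping each part in its own path segment means a client can vary one part, such as the property, while leaving the rest of the request unchanged.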

Web UI


Screenshot callouts: 3-D model of unit cell; Disqus comment button; detailed structure; X-ray diffraction pattern (interactive); band structure and density of states (interactive); calculation iterations.

Comments


WE'RE DOING IT WRONG?

Running on HPC

•  Batch queues and large numbers of jobs with unpredictable runtimes

•  Talking to the database
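One common way to live with unpredictable runtimes inside fixed-walltime batch jobs is a pull model, in which each batch job keeps claiming tasks from a shared queue while enough walltime remains. The slides do not say how the Materials Project handles this. The simulation below assumes per-task runtime estimates, a safety margin, and that a job killed at its walltime puts its current task back on the queue for a re-run; all names and numbers are hypothetical.

```python
from collections import deque


def run_batch_job(queue, walltime, margin, est, actual):
    """Simulate one batch job that pulls tasks until its walltime is nearly used.

    A task is started only if its *estimated* runtime plus a safety margin fits in
    the remaining walltime. Its *actual* runtime, unknown in advance, is then
    charged; if that overruns the walltime, the scheduler kills the job and the
    task goes back to the front of the queue to be re-run later.
    Returns the names of the tasks this job completed.
    """
    elapsed = 0.0
    completed = []
    while queue:
        name = queue[0]
        if elapsed + est[name] + margin > walltime:
            break  # not enough time left; leave the task for the next batch job
        queue.popleft()
        elapsed += actual[name]
        if elapsed > walltime:
            queue.appendleft(name)  # killed at the walltime limit: requeue for a re-run
            break
        completed.append(name)
    return completed


# Hypothetical tasks: every estimate is 1 hour, but "b" actually takes 4.5 hours.
queue = deque(["a", "b", "c", "d"])
est = {name: 1.0 for name in queue}
actual = {"a": 1.0, "b": 4.5, "c": 1.0, "d": 0.5}
jobs = [run_batch_job(queue, walltime=5.0, margin=0.5, est=est, actual=actual)
        for _ in range(3)]
print(jobs)  # [['a'], ['b'], ['c', 'd']]
```

The first job loses the time spent on "b" when it overruns, which is exactly the cost that unpredictable runtimes impose under batch queues.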


Data analytics

•  Scaling community contributions to code
•  Scaling analytic functions
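On scaling analytic functions: one standard pattern is to split an analysis into a map step that runs independently on each shard of the data and an associative reduce step that merges partial results, so that shards can be processed in parallel. The sketch applies that pattern to a hypothetical analysis (counting calculations per chemical system) over toy records with a hypothetical `elements` field; it is not the project's analysis library.

```python
from collections import Counter
from functools import reduce


def chemsys(elements):
    """Canonical chemical-system label, e.g. ['O', 'Fe'] -> 'Fe-O'."""
    return "-".join(sorted(elements))


def map_shard(docs):
    """Map step: count calculations per chemical system within one shard."""
    return Counter(chemsys(d["elements"]) for d in docs)


def reduce_counts(a, b):
    """Reduce step: adding Counters is associative, so shards can merge in any order."""
    return a + b


# Toy shards; in practice each could live on a different node or process.
shards = [
    [{"elements": ["Fe", "O"]}, {"elements": ["Li", "Fe", "P", "O"]}],
    [{"elements": ["O", "Fe"]}, {"elements": ["Li", "Co", "O"]}],
]
totals = reduce(reduce_counts, map(map_shard, shards), Counter())
print(totals["Fe-O"])  # 2
```

Because `map_shard` touches only its own shard, the built-in `map` here could be swapped for a parallel map without changing the result.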


Data V&V

•  Loading new data into a production resource
•  Constant validation and verification
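For loading new data into a production resource, one conservative approach is to validate every incoming record in a staging area and copy only the records that pass. The checks, the field names (`task_id`, `formula`, `elements`, `nsites`, `volume`), and the toy records below are hypothetical examples of such rules, not the project's actual V&V criteria.

```python
REQUIRED = ("task_id", "formula", "elements", "nsites", "volume")  # hypothetical schema


def validate(doc):
    """Return a list of problems; an empty list means the record may be loaded."""
    errors = [f"missing field: {key}" for key in REQUIRED if key not in doc]
    if errors:
        return errors  # value checks below assume every field is present
    if doc["nsites"] <= 0:
        errors.append("nsites must be positive")
    if doc["volume"] <= 0:
        errors.append("volume must be positive")
    if len(set(doc["elements"])) != len(doc["elements"]):
        errors.append("duplicate entries in elements")
    return errors


def load(staging, production):
    """Copy valid records into production; return (task_id, errors) for each reject."""
    rejected = []
    for doc in staging:
        errs = validate(doc)
        if errs:
            rejected.append((doc.get("task_id"), errs))
        else:
            production[doc["task_id"]] = doc
    return rejected


staging = [
    {"task_id": 1, "formula": "Fe2O3", "elements": ["Fe", "O"], "nsites": 10, "volume": 100.0},
    {"task_id": 2, "formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nsites": 4, "volume": -1.0},
    {"task_id": 3, "formula": "LiFePO4"},
]
production = {}
rejected = load(staging, production)
print(sorted(production))  # [1]
```

Returning the reasons for each rejection, rather than silently dropping records, keeps the rejects available for follow-up.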


Data dissemination

•  Security and privacy
•  Query performance
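On query performance: the usual first remedy for slow lookups on a frequently filtered field is an index. The toy collection below contrasts a full scan with a hash-index lookup on a hypothetical `formula` field. Real databases commonly use B-tree indexes, which also support range queries; a hash map just keeps the sketch short.

```python
from collections import defaultdict


class IndexedCollection:
    """Toy document collection with a hash index on one field (hypothetical)."""

    def __init__(self, field):
        self.field = field
        self.docs = []                  # documents in insertion order
        self.index = defaultdict(list)  # field value -> positions in self.docs

    def insert(self, doc):
        self.index[doc.get(self.field)].append(len(self.docs))
        self.docs.append(doc)

    def find_scan(self, value):
        """Full scan: touches every document, O(n)."""
        return [d for d in self.docs if d.get(self.field) == value]

    def find_indexed(self, value):
        """Index lookup: touches only the matching documents."""
        return [self.docs[i] for i in self.index.get(value, [])]


coll = IndexedCollection("formula")
for task_id, formula in enumerate(["Fe2O3", "LiCoO2", "Fe2O3", "LiFePO4"]):
    coll.insert({"task_id": task_id, "formula": formula})
print([d["task_id"] for d in coll.find_indexed("Fe2O3")])  # [0, 2]
```

The trade-off mirrors the datastore notes above: every insert now also updates the index, which favors read-heavy workloads over write-heavy ones.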


FUTURE WORK


Opening up data access


Figure (panels a to f) with the labels: Materials Project; source ideas; user sandboxes; pymatgen; MP workflow; MP datastore; compute properties; stability and synthesis.

Towards materials design

Questions?
