LLNL-PRES-679957
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Web Services Processing, Application Programming Interface
Charles Doutriaux, Sasha Ames, Tom Maxwell, Dan Duffy, Dean Williams
December 9th, 2015
Overview
As computer power goes up, so does data volume
Scientists generating and analyzing these data are many and dispersed
BUT the scientific need is greater than ever
ESGF solved the first part of the equation: universal, distributed access
Bringing all needed data to your computer or even to your facility is no longer feasible
Now we need to solve the analysis part.
The ESGF-CWT is putting together an infrastructure for WPS
This talk is about the API part
The API is twofold:
- Developers: common ground for creating new tools
- Users: standard way of querying and using resources
Goal: make things as easy as possible for the user, i.e.
- What services are here?
- Can I get their doc?
- Let’s use it
As many decisions as possible are made for the user (but the user can still see and override them)
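The three user questions above line up with the three standard WPS 1.0.0 operations: GetCapabilities, DescribeProcess and Execute. A minimal sketch of the matching request URLs follows, assuming the demo server address given in this talk and the `averager` process; the helper name `wps_url` is made up for illustration and is not part of the ESGF-CWT API.

```python
from urllib.parse import urlencode

# Assumption: the demo server from this talk; any WPS 1.0.0 endpoint has the same layout.
BASE_URL = "http://aims2.llnl.gov:8000/wps/"


def wps_url(request, identifier=None, base_url=BASE_URL):
    """Build a WPS 1.0.0 key-value-pair request URL (hypothetical helper)."""
    params = [("version", "1.0.0"), ("service", "wps"), ("request", request)]
    if identifier is not None:
        params.append(("identifier", identifier))
    return base_url + "?" + urlencode(params)


# What services are here?
capabilities_url = wps_url("GetCapabilities")
# Can I get their doc?
describe_url = wps_url("DescribeProcess", identifier="averager")
# Let's use it (the data inputs are left out of this sketch)
execute_url = wps_url("Execute", identifier="averager")
```

Each URL can be fetched with any HTTP client; the first two return XML documents describing the server and the process.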
ESGF-CWT Solution
Basic Architecture
[Diagram labels: Server Side, Services, Client Side, ESGF API]
Documented at: https://acme-climate.atlassian.net/wiki/display/ESGF/API+Standards+and+Requirements
First pass, will likely be tweaked/enhanced as more developers and users get involved
Focusing on JSON input data. First problems we’re trying to solve:
- Model Average
- Model Ensemble
- Multi-models Ensemble
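As a rough illustration of what an ensemble-type request computes, here is a pure-Python sketch of an element-wise mean across ensemble members. This is only one common reading of "ensemble"; the function name and the numbers are invented for the example and do not come from the API document.

```python
def ensemble_mean(members):
    """Average equally sized series element by element across members."""
    if not members:
        raise ValueError("need at least one member")
    length = len(members[0])
    if any(len(member) != length for member in members):
        raise ValueError("all members must share the same length")
    # zip(*members) groups the values that belong to the same position
    return [sum(values) / len(members) for values in zip(*members)]


# Toy numbers for illustration only, not real model output.
run_a = [1.0, 2.0, 3.0]
run_b = [3.0, 4.0, 5.0]
mean = ensemble_mean([run_a, run_b])
```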
Caters to very basic needs so far; needs to grow as more features are required. Hint: that’s where YOU come in.
API?
API (excerpts)
http://aims2.llnl.gov:8000/wps/?version=1.0.0&service=wps&request=Execute&identifier=averager&datainputs=[domain={'id':'glbl','longitude':{'start':%20-180.0,%20'end':%20180.0},'time':{'start':'1980','end':'1982'}};variable={'uri':'file://opt/nfs/cwt/uvcdat/latest/share/uvcdat/sample_data/tas_dnm-95a.xml','id':'tas','domain':'glbl'}]
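A request like the one above can be assembled from plain dictionaries. The sketch below mirrors the layout of the excerpt (`datainputs=[name=value;name=value]` with single-quoted keys); the helper names `to_datainput` and `build_execute_url` are hypothetical, and whether the server also accepts other quoting styles is not stated in this talk.

```python
import json


def to_datainput(value):
    """Serialize a dict compactly with single quotes, mirroring the excerpt."""
    # Assumption: values contain no quote characters, so a plain swap is safe.
    return json.dumps(value, separators=(",", ":")).replace('"', "'")


def build_execute_url(base_url, identifier, inputs):
    """Append datainputs=[name=value;name=value] to an Execute request."""
    datainputs = ";".join(
        "%s=%s" % (name, to_datainput(value)) for name, value in inputs.items()
    )
    return (
        "%s?version=1.0.0&service=wps&request=Execute&identifier=%s&datainputs=[%s]"
        % (base_url, identifier, datainputs)
    )


domain = {
    "id": "glbl",
    "longitude": {"start": -180.0, "end": 180.0},
    "time": {"start": "1980", "end": "1982"},
}
variable = {
    "uri": "file://opt/nfs/cwt/uvcdat/latest/share/uvcdat/sample_data/tas_dnm-95a.xml",
    "id": "tas",
    "domain": "glbl",
}
url = build_execute_url(
    "http://aims2.llnl.gov:8000/wps/",
    "averager",
    {"domain": domain, "variable": variable},
)
```

The string is returned unencoded for readability; percent-encode it as needed before sending it with an HTTP client.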
http://aims2.llnl.gov:8000
VERY BASIC demo server
- Django-based
- Uses UV-CDAT for computation
Will probably grow into a real, full-blown, pretty server
Code is at: https://github.com/ESGF/wps_cwt
Please fork and submit as many PRs as possible, and/or use the issue tracker to give us feedback.
Also take a look at what others presenting here have already done. Let’s try to leverage each other’s work.
Where do I start?
Example? (stick this in “process” directory of server)
import os

import cdutil
from pywps.Process import WPSProcess
# Assumption: the base class lives in a module of the same name on the server.
from esgfcwtProcess import esgfcwtProcess


class Process(esgfcwtProcess):
    def __init__(self):
        """Process initialization"""
        # The identifier is taken from this file's name (without extension)
        WPSProcess.__init__(
            self,
            identifier=os.path.split(__file__)[-1].split('.')[0],
            title='averager',
            version=0.1,
            abstract='Average a variable over a (many) dimension',
            storeSupported='true',
            statusSupported='true')
        # Inputs: a JSON domain, a JSON variable description, a download flag
        self.domain = self.addComplexInput(
            identifier='domain',
            title='domain over which to average',
            formats=[{'mimeType': 'text/json', 'encoding': 'utf-8', 'schema': None}])
        self.dataIn = self.addComplexInput(
            identifier='variable',
            title='variable to average',
            formats=[{'mimeType': 'text/json'}],
            minOccurs=1,
            maxOccurs=1)
        self.download = self.addLiteralInput(
            identifier='download',
            type=bool,
            title='download output',
            default=False)
        # Output: the averaged variable, returned as JSON
        self.average = self.addComplexOutput(
            identifier='average',
            title='averaged variable',
            formats=[{'mimeType': 'text/json'}])

    def execute(self):
        # loadData, loadVariable, getVariableName and saveVariable are
        # helpers inherited from esgfcwtProcess
        dataIn = self.loadData()[0]
        data, cdms2keyargs = self.loadVariable(dataIn)
        # Build an axis spec such as "(longitude)(time)" from the selection keys
        dims = "".join(["(%s)" % x for x in cdms2keyargs.keys()])
        data = cdutil.averager(data, axis=dims)
        data.id = self.getVariableName(dataIn)
        self.saveVariable(data, self.average, "json")
        return
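The `dims` line in `execute` is the least obvious step: it turns dimension names into the parenthesized axis string that `cdutil.averager` accepts. A standalone sketch of that string-building step follows, assuming (this is not confirmed by the slides) that every domain key other than `id` names a dimension; `axis_spec` is a hypothetical helper, not part of the server code.

```python
def axis_spec(domain):
    """Return a parenthesized axis string such as "(longitude)(time)"."""
    # Assumption: every key except 'id' names a dimension to average over.
    names = [key for key in domain if key != "id"]
    return "".join("(%s)" % name for name in names)


domain = {
    "id": "glbl",
    "longitude": {"start": -180.0, "end": 180.0},
    "time": {"start": "1980", "end": "1982"},
}
spec = axis_spec(domain)  # "(longitude)(time)"
```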
No. The API is designed to be backend agnostic
But:
- ESGF-CWT will use UV-CDAT where appropriate
- UV-CDAT will be officially supported and will be part of the “compute node stack”
- No, your preferred application is not guaranteed to be fully supported and/or part of the ESGF stack
Yes, the API team will listen to you even if you do not use UV-CDAT
But really… you “should” be using it ;)
It’s so much simpler, and it makes sense to have everybody using the same tools
Do I have to use UV-CDAT?
Tom Maxwell -> NASA
Maarten Plieger -> sort of ACME
Is anybody using this?
LOTS!
Tighter integration with ESGF
- result search as URI?
- esgf:// new URI type?
Testing!
- We need some basic dataset to run tests on
- We need a mechanism to document the “correct” solution to a problem
Once this is in place we can move to distributed analysis
- Which nodes carry my diagnostic?
- Which one should I use? (is it close to my data, is it overloaded, etc…)
- Resource management
Multiple implementations of the same diagnostics:
- MPI vs SLURM vs MPI+SLURM vs HADOOP vs SPARK vs combinations, etc…
  • Which one to trust?
  • Which one is faster for me?
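The node-selection questions in the distributed-analysis item (which nodes carry my diagnostic, is one close to my data, is it overloaded) could be sketched as a simple ranking. Everything here is hypothetical: the `Node` record, its fields and the ranking rule are illustrative assumptions, not a design described in this talk.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a compute node."""
    name: str
    diagnostics: set  # identifiers of the diagnostics the node offers
    datasets: set     # identifiers of the datasets stored locally
    load: float       # 0.0 (idle) to 1.0 (saturated)


def pick_node(nodes, diagnostic, dataset):
    """Pick a node offering the diagnostic, preferring local data, then low load."""
    candidates = [node for node in nodes if diagnostic in node.diagnostics]
    if not candidates:
        return None
    # False sorts before True, so nodes that hold the dataset come first.
    return min(candidates, key=lambda node: (dataset not in node.datasets, node.load))


nodes = [
    Node("a", {"averager"}, {"tas"}, 0.9),
    Node("b", {"averager"}, set(), 0.1),
    Node("c", set(), {"tas"}, 0.0),
]
best = pick_node(nodes, "averager", "tas")
```

In this toy setup node "a" wins despite its load because it holds the data locally; a real scheduler would have to weigh transfer cost against queue time.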
So… What’s next?
Still in its infancy but crystallizing
The time is NOW: the longer you wait, the harder it will be to get your voice heard.
Summary