CETull@lbl.gov - Python & Grid (19dec02 - Trillium @ Caltech) Python Scripting and Grid User Environments Craig E. Tull Trillium Analysis Environment for the Grid December 19, 2003 Caltech - Pasadena, CA


Python Scripting and Grid User Environments

Craig E. Tull

Trillium Analysis Environment for the Grid

December 19, 2003

Caltech - Pasadena, CA


Scripting Language Features

• typeless/not strongly typed
  — simplify connections
  — simpler syntax
  — easier/faster to learn (enough)
• error checking at the last possible moment
  — user interactivity
• interpreted
  — immediate feedback (e.g. data exploration)
• many instructions per line (higher level)
  — complex tasks in few lines of code
• programmers write same # of LOC per year
  — many details are handled automatically
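The points about loose typing and late error checking can be seen in a few lines of Python. This is only a minimal sketch; the function name `total_length` is made up for illustration.

```python
def total_length(items):
    """Sum len() over the elements; no type declarations are needed."""
    return sum(len(item) for item in items)


# The same function accepts strings, lists, and tuples without changes.
mixed = ["grid", [1, 2, 3], (4, 5)]
print(total_length(mixed))

# A type problem only surfaces when the offending element is reached.
try:
    total_length(["ok", 42])
except TypeError as err:
    print("caught late:", err)
```

The failure appears at run time, at the moment `len(42)` is attempted, which is what makes interactive exploration cheap and also why such errors are found late.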


Why Python?

• Easy to learn/read high-level scripting language
  — Very little syntax
• A large collection of modules to support common operations, e.g., networking, http, smtp, ldap, XML, Web Services, etc.
• Excellent for “gluing” together existing codes
  — Many automated tools for interfacing with C/C++/Fortran
• Support for platform independent GUI components
• Runs on all popular OS’s, e.g., UNIX, Win32, MacOS, etc.
• Support for Grid programming with pyGlobus, PyNWS, etc.


pyGMA

• LBNL - Data Intensive Distributed Computing Research Group (DIDC) - Dan Gunter

• an implementation of the Grid Monitoring Architecture (GMA) Producer, Consumer, and Directory Service Web-Services SOAP interfaces in Python

• uses the ZSI SOAP library to aid with serialization and deserialization of messages

• a framework that handles the SOAP communications between the monitoring components defined by the GGF

• the target "user" of pyGMA is a developer that wants to connect existing or newly created monitoring components into a GMA-compatible framework
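To make the three GMA roles concrete, here is a rough in-process sketch of a producer, a consumer, and a directory service. It does not use pyGMA, ZSI, or SOAP; every class and method name below is hypothetical and stands in for what would be a web-service interface in the real framework.

```python
class DirectoryService:
    """Keeps track of which producers publish which event type."""

    def __init__(self):
        self._producers = {}

    def register(self, event_type, producer):
        self._producers.setdefault(event_type, []).append(producer)

    def lookup(self, event_type):
        return list(self._producers.get(event_type, []))


class Producer:
    """Publishes monitoring events to subscribed consumers."""

    def __init__(self, name, event_type, directory):
        self.name = name
        self.event_type = event_type
        self._subscribers = []
        directory.register(event_type, self)

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, value):
        event = {"source": self.name, "type": self.event_type, "value": value}
        for callback in self._subscribers:
            callback(event)


class Consumer:
    """Finds producers through the directory and collects their events."""

    def __init__(self, directory):
        self._directory = directory
        self.received = []

    def follow(self, event_type):
        producers = self._directory.lookup(event_type)
        for producer in producers:
            producer.subscribe(self.received.append)
        return len(producers)


directory = DirectoryService()
cpu = Producer("node01", "cpu.load", directory)
consumer = Consumer(directory)
consumer.follow("cpu.load")
cpu.publish(0.75)
print(consumer.received)
```

The consumer never holds a hard-coded reference to a producer; it discovers producers through the directory, which is the decoupling the architecture is after.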


PEG - Python Extensions for the Grid

• UCSD GRAIL (Grid Research and Innovation Lab)
• PyNWS
  — interface to the Network Weather Service (NWS) API library.
    • routines for accessing the resource monitoring and forecasting services provided by NWS nameserver, memory, sensor, and forecaster processes.
  — contains an extension module (nws) which provides functions that closely correspond to the defined NWS API for C/C++, as well as a Python module (nws_class) that provides higher-level support for performing some common NWS activities


Athena/Gaudi & Python

• Python version of Athena/Gaudi JobOptions.txt has existed for some time.

• The Athena Startup Kit (ASK)

• Integrate Athena, Atlas releases, and CMT for an improved end-user experience

• Built on top of existing tools, not a replacement, it automates tasks otherwise left to the user

• Both GUI (simple) and CLI (powerful)

• Contains workarounds for broken releases (ugly!)

In addition: make Athena more interactive

=> GaudiPython / GANGA


pyRoot Motivation

• Pere Mato - ROOT 2002 Workshop
• Be able to use any ROOT class from Python in a generic way.
  — Without the need of wrapping each class
  — Using the ROOT object dictionary information
• Facilitate access of ROOT files and other facilities from non-ROOT applications
• Proof-of-concept that Python can be viewed as Software Bus
  — In analogy to a “hardware bus” where you can plug a variety of modules and interface adaptors to other buses.
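The idea of reaching any class through dictionary information, without a hand-written wrapper per class, can be imitated in pure Python. The sketch below is an analogy only, not how pyRoot is implemented: `OBJECT_DICTIONARY`, `GenericProxy`, and the `Histogram` class are all hypothetical stand-ins.

```python
class Histogram:
    """Hypothetical stand-in for a class described by an object dictionary."""

    def __init__(self):
        self.entries = []

    def Fill(self, value):
        self.entries.append(value)

    def GetEntries(self):
        return len(self.entries)


# Stand-in for dictionary information: class name -> factory and method names.
OBJECT_DICTIONARY = {
    "Histogram": {"factory": Histogram, "methods": {"Fill", "GetEntries"}},
}


class GenericProxy:
    """One wrapper for every class; methods are resolved by name at call time."""

    def __init__(self, class_name, *args):
        info = OBJECT_DICTIONARY[class_name]
        self._methods = info["methods"]
        self._target = info["factory"](*args)

    def __getattr__(self, name):
        # Only names listed in the dictionary are forwarded to the target.
        if name.startswith("_") or name not in self._methods:
            raise AttributeError(name)
        return getattr(self._target, name)


h = GenericProxy("Histogram")
h.Fill(1.5)
h.Fill(2.5)
print(h.GetEntries())
```

Adding another class means adding a dictionary entry, not writing another wrapper, which is the property the slide is asking for.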


pyRoot Design

[Diagram: the Python interpreter, Boost.Python, and RootModule; the wrapper classes TROOTWrap, TClassWrap, TMethodWrap, TObjectWrap, and TCArrayWrap; and on the ROOT side TClass, TMethod, TObject, CINT, and other ROOT libraries.]


Example - Trivial Root

C:\> python
...
>>> from rootmodule import *
>>> f1 = TF1('func1','sin(x)/x',0,10)
>>> f1.Eval(3)
0.047040002686622402
>>> f1.Derivative(3)
-0.34567505667199266
>>> f1.Integral(0,3)
1.8486525279994681
>>> f1.Draw()
<TCanvas::MakeDefCanvas>: created default TCanvas with name c1

• Not much difference between CINT and Python!
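The numbers in this session can be cross-checked without ROOT. The sketch below uses only the standard library; the helper names and the choice of a central difference and Simpson's rule are mine, not what ROOT does internally.

```python
import math


def f(x):
    """sin(x)/x, with the removable singularity at x = 0 filled in."""
    return math.sin(x) / x if x != 0.0 else 1.0


def derivative(func, x, h=1e-5):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def integral(func, a, b, n=1000):
    """Composite Simpson's rule on n subintervals (n is made even)."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0


value = f(3.0)
slope = derivative(f, 3.0)
area = integral(f, 0.0, 3.0)
print(value, slope, area)
```

The value and the integral should agree with the session to many digits. The session's Derivative result matches the analytic derivative (x cos x - sin x)/x^2 at x = 3 only to about five digits, which is consistent with it being a numerical estimate.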


Example - ROOT + Excel

Filling an Excel spreadsheet from a ROOT ntuple

# Get the ntuple from the ROOT file
import rootmodule
hfile = rootmodule.TFile('hsimple.root')
ntuple = rootmodule.gROOT.FindObject('ntuple')
entries = ntuple.GetEntries()
nvar = ntuple.GetNvar()
tuple = ntuple.GetArgs()
# Initialize Excel
import win32com.client
excel = win32com.client.Dispatch('Excel.Application')
wbook = excel.Workbooks.Add()
wsheet = wbook.WorkSheets.Add()
wsheet.Name = ntuple.GetTitle()
# Fill Excel sheet
for i in xrange(500) :
    ntuple.GetEntry(i)
    for j in range(nvar) :
        wsheet.Cells(i+1,j+1).value = tuple[j]
# Make Excel sheet visible
excel.Visible = 1


pyGlobus Overview

• The Python CoG Kit provides a mapping between Python and the Globus Toolkit™. It extends the use of Globus by enabling access to advanced Python features such as events and objects for Grid programming.

• Hides much of the complexity of Grid programming behind simple object-oriented interfaces.

• The Python CoG Kit is implemented as a series of Python extension modules that wrap the Globus C code.

• Provides a complete interface to GT2.0.
• Uses SWIG (http://www.swig.org) to help generate the interfaces.


Scripting/Adaptation Layers

[Diagram: a stack of layers, from the native component up to the application, with the upper layers labelled "Python".]

• Native Lang Component: component written in native programming language (C, C++, etc.), e.g. globus_ftp_client, gram_client, …
• Shadow Class: 1 to 1 mapping (e.g. via SWIG)
• Presentation: map onto Python concepts/constructs
• Usability: apply the 80/20 rule for defaults to narrow interface
• Task-based: aggregate components for a common task
• Application: combine tasks in an application

• Adaptation layering for pyGlobus shows excellent decomposition.
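A toy version of the layer stack may help show what each layer adds. This is a sketch under stated assumptions: nothing here is pyGlobus API, `native_transfer` merely imitates a C routine, and all names and default values are invented for illustration.

```python
# Native component: stand-in for a routine that would really be C code.
def native_transfer(src, dst, parallelism, buffer_size, timeout_s):
    return {"src": src, "dst": dst, "parallelism": parallelism,
            "buffer_size": buffer_size, "timeout_s": timeout_s}


# Shadow class: a 1-to-1 mapping of the native call, nothing more.
class ShadowTransfer:
    def transfer(self, src, dst, parallelism, buffer_size, timeout_s):
        return native_transfer(src, dst, parallelism, buffer_size, timeout_s)


# Presentation: mapped onto Python constructs (keyword arguments, exceptions).
class TransferClient:
    def __init__(self):
        self._shadow = ShadowTransfer()

    def transfer(self, src, dst, *, parallelism, buffer_size, timeout_s):
        if parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        return self._shadow.transfer(src, dst, parallelism, buffer_size, timeout_s)


# Usability: defaults cover the common case, narrowing the interface.
def copy_file(src, dst, parallelism=4, buffer_size=65536, timeout_s=60):
    return TransferClient().transfer(src, dst, parallelism=parallelism,
                                     buffer_size=buffer_size, timeout_s=timeout_s)


# Task-based: aggregate components for a common task.
def stage_inputs(files, destination):
    return [copy_file(path, destination + "/" + path.rsplit("/", 1)[-1])
            for path in files]


# Application: combine tasks.
for record in stage_inputs(["/data/run1.root", "/data/run2.root"], "/scratch"):
    print(record["src"], "->", record["dst"])
```

Each layer can be replaced or bypassed independently: an expert can still call `TransferClient.transfer` with every parameter, while most users only ever see `stage_inputs`.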


OGSI Plans for pyGlobus

• Develop a full OGSI implementation in Python
  — Planned alpha release of an OGSI client by the end of August
  — OGSI hosting environment based on WebWare (http://webware.sourceforge.net/)
• Dynamic web service invocation framework
  — Similar to WSIF (Web Services Invocation Framework) from IBM for Java
    • http://www.alphaworks.ibm.com/tech/wsif
  — Download and parse WSDL document, create request on the fly
  — Support for multiple protocol bindings to WSDL portTypes
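The "parse WSDL, create request on the fly" step could look roughly like the following. This is a standard-library sketch, not the planned pyGlobus framework: the WSDL fragment, the `urn:example:counter` namespace, and the `DynamicProxy` class are hypothetical, and the request is only built, never sent.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Hypothetical WSDL fragment, inlined so the sketch runs without a network.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    name="CounterService" targetNamespace="urn:example:counter">
  <portType name="CounterPortType">
    <operation name="add"/>
    <operation name="getValue"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return the target namespace and the operations of each portType."""
    root = ET.fromstring(wsdl_text)
    operations = {}
    for port_type in root.findall("{%s}portType" % WSDL_NS):
        names = [op.get("name")
                 for op in port_type.findall("{%s}operation" % WSDL_NS)]
        operations[port_type.get("name")] = names
    return root.get("targetNamespace"), operations


def build_request(target_ns, operation, **params):
    """Build a SOAP envelope for one operation call as a string."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV_NS)
    call = ET.SubElement(body, "{%s}%s" % (target_ns, operation))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


class DynamicProxy:
    """Exposes each WSDL operation as a method that builds its request."""

    def __init__(self, wsdl_text):
        self._ns, port_types = list_operations(wsdl_text)
        self._operations = {n for names in port_types.values() for n in names}

    def __getattr__(self, name):
        if name.startswith("_") or name not in self._operations:
            raise AttributeError(name)
        return lambda **params: build_request(self._ns, name, **params)


proxy = DynamicProxy(SAMPLE_WSDL)
request = proxy.add(amount=5)
print(request)
```

No stub code is generated ahead of time; the proxy learns its methods from the document it was given, which is the property shared with WSIF.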


PyGlobus Status and Plans

• Users:
  — AccessGrid - Being rewritten using pyGlobus.
  — LIGO - Laser Interferometer Gravitational Wave Observatory
  — CAS - Community Authorization Service
  — NCAR - National Center for Atmospheric Research
• Current Work - Keith Jackson
  — ~5-6 developers actively involved
  — GT3/OGSI port underway
  — pyGlobus will be part of GT3 distribution
  — Looking for feedback on "Usability" and "Task" layers and on Framework.


GANGA Motivation

• ATLAS and LHCb develop applications within a common framework: Gaudi/Athena

• Both collaborations aim to exploit potential of Grid for large-scale, data-intensive distributed computing


Simplify management of analysis and production jobs for end-user physicists by developing tool for accessing Grid services with built-in knowledge of how Gaudi/Athena works:

Gaudi/Athena and Grid Alliance (GANGA)



Athena/GAUDI Architecture

[Diagram: Athena/GAUDI components. Application Manager; Algorithms; Converters; Event Data Service with the Transient Event Store; Detec. Data Service with the Transient Detector Store; Histogram Service with the Transient Histogram Store; a Persistency Service and Data Files behind each store; Message Service; JobOptions Service; Particle Prop. Service; Other Services.]


Interfacing to the GRID

• GANGA: Gaudi/Athena and Grid Alliance
  — First ideas for GANGA were presented by P.Mato and C.Tull in summer 2001
  — Joint ATLAS/LHCb GridPP proposal
  — 2 funded FTEs
    • Karl Harrison
    • Alexander Soroko
  — May 2002 - Cosener's House GridPP Meeting
  — Technology Survey
    • Grappa, Genius, AliEn, Slice
  — Atlas/LHCb design team, including US representatives

[Diagram: "Interfacing GAUDI with GRID" (P.Mato). Labelled elements: GUI; GANGA; GAUDI / Athena; GRID Services; JobOptions and Algorithms; Histograms, Monitoring, and Results; two API links.]


Rule #1: Protect the User

  — Real Data vs. Virtual Data
  — LFN vs. PFN/TFN/SFN
  — Grid Enabled vs. Standalone
  — LSF/PBS/Condor
• We do not want the user of the Framework to know or care about details like this.
  — Implies: Uniform, abstract access to/specification of data sets (i.e. if Real and Virtual Data are to be used).
  — Non-Grid implementations of Grid-enabled Services?
  — Grid & Non-grid concepts must merge at UI.
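One way to read this rule in code: user-facing calls take logical names and never mention the batch system. The sketch below is illustrative only; `ReplicaCatalog`, `Backend`, `Framework`, and the file paths are hypothetical names, not part of Athena/Gaudi or GANGA.

```python
from abc import ABC, abstractmethod


class ReplicaCatalog:
    """Maps logical file names (LFNs) to physical file names (PFNs)."""

    def __init__(self, replicas):
        self._replicas = dict(replicas)

    def resolve(self, lfn):
        if lfn not in self._replicas:
            raise LookupError("no replica registered for " + lfn)
        return self._replicas[lfn]


class Backend(ABC):
    """What every execution backend must offer, Grid-enabled or not."""

    @abstractmethod
    def submit(self, executable, input_files):
        """Run or queue the job and return a short description."""


class StandaloneBackend(Backend):
    def submit(self, executable, input_files):
        return "standalone: " + " ".join([executable] + input_files)


class BatchBackend(Backend):
    def __init__(self, system):
        self.system = system  # e.g. "LSF", "PBS", or "Condor"

    def submit(self, executable, input_files):
        return "%s: queued %s with %d input file(s)" % (
            self.system, executable, len(input_files))


class Framework:
    """The only object the user talks to; details stay behind it."""

    def __init__(self, catalog, backend):
        self._catalog = catalog
        self._backend = backend

    def run(self, executable, lfns):
        pfns = [self._catalog.resolve(lfn) for lfn in lfns]
        return self._backend.submit(executable, pfns)


catalog = ReplicaCatalog({"lfn:run1.root": "/data/site_a/run1.root"})
results = [Framework(catalog, backend).run("athena", ["lfn:run1.root"])
           for backend in (StandaloneBackend(), BatchBackend("PBS"))]
print(results)
```

The user's call, `run("athena", ["lfn:run1.root"])`, is identical in both cases; only the configuration behind the framework changes.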


Interfacing to the Grid

[Diagram: the GANGA Core module (Job class, Job Handler class, XML RPC) on top of the EDG UI, whose commands are grouped by service.]

• Security service: grid-proxy-init, MyProxy ?, GSI ?
• Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
• Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
• Data management service: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP?
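A Job Handler in this picture could be little more than a translator from abstract actions to EDG UI command lines. The sketch below takes the command names from the slide; how arguments are passed to them (a JDL file for submission, a job identifier otherwise) is an assumption, as are the class layouts and the fake runner used for the dry run.

```python
import subprocess

# Command names are taken from the slide. The argument layout used below
# (a JDL file for submit, a job identifier for status) is an assumption.
EDG_COMMANDS = {
    "match": "dg-job-list-match",
    "submit": "dg-job-submit",
    "cancel": "dg-job-cancel",
    "status": "dg-job-status",
    "logging": "dg-job-get-logging-info",
    "output": "dg-job-get-output",
}


def run_command(argv):
    """Default runner: execute the command and return its standard output."""
    finished = subprocess.run(argv, capture_output=True, text=True, check=True)
    return finished.stdout


class JobHandler:
    """Translates abstract job actions into EDG UI command lines."""

    def __init__(self, runner=run_command):
        self._runner = runner

    def call(self, action, *args):
        argv = [EDG_COMMANDS[action]] + list(args)
        return self._runner(argv)


class Job:
    """Hypothetical job object that delegates all Grid work to a handler."""

    def __init__(self, jdl_path, handler):
        self.jdl_path = jdl_path
        self.handler = handler
        self.job_id = None

    def submit(self):
        self.job_id = self.handler.call("submit", self.jdl_path).strip()
        return self.job_id

    def status(self):
        return self.handler.call("status", self.job_id).strip()


# Dry run with a fake runner, so the sketch works without an EDG installation.
issued = []


def fake_runner(argv):
    issued.append(argv)
    return "job-0001\n" if argv[0] == "dg-job-submit" else "Scheduled\n"


job = Job("analysis.jdl", JobHandler(runner=fake_runner))
job.submit()
state = job.status()
print(issued, state)
```

Because the runner is injected, the same Job class could later sit on top of a different middleware without changing user code.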


Ganga Prototyping

[Screenshot of the GANGA GUI prototype, with callouts: embedded Python interpreter; tree of user jobs; job options for selected job.]


Ganga Prototyping (current state)

• GUI is created using wxPython extension module
• Access to the Gaudi Job Configuration DB is implemented with the xmlrpclib module
• User can browse and create Job Options files using this DB
• Serialization of objects (user jobs) is implemented with the Python pickle module
• Python interpreter is embedded into the GUI and allows user to configure interface from the command line
• GRID stuff is under development at the moment and is oriented on EDG testbed 1.2
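Two of these building blocks, pickle for saving user jobs and an embedded interpreter for driving the application, fit in a short standard-library sketch. The `UserJob` class and its fields are hypothetical; the real GANGA job objects are not shown on the slides.

```python
import code
import pickle


class UserJob:
    """Hypothetical stand-in for a GANGA user job."""

    def __init__(self, name, options, status="new"):
        self.name = name
        self.options = dict(options)
        self.status = status


job = UserJob("reco-test", {"EvtMax": 10})

# Serialize the job state with pickle and rebuild the job from it.
# (Pickling the instance itself also works when its class is importable.)
saved = pickle.dumps(vars(job))
restored = UserJob(**pickle.loads(saved))

# An embedded interpreter that shares a namespace with the application,
# so commands typed by the user act on live objects.
namespace = {"job": restored}
console = code.InteractiveInterpreter(locals=namespace)
console.runsource("job.options['EvtMax'] = 500")
console.runsource("job.status = 'configured'")

print(restored.name, restored.options, restored.status)
```

In a GUI the strings passed to `runsource` would come from a text widget; the mechanism is the same.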


Conclusion

• Python Common themes
  — Native Lang for main functionality/performance
  — Scripting as Glue (a la Stallman)
  — Fast Prototyping
  — Easily layered GUI
    • wxWindows, tkinter
  — Adaptation Layering
  — Ease of adapting "Legacy" code
• Python is proving its promise as a fast, effective, object-oriented scripting language.


Conclusion

• Other languages (Perl, Ruby, etc.) and approaches (Grid Portals) exist and can/do work.
• But Python is 1st choice for many in our field:
  — Middleware: pyGlobus, pyGMA, pyNWS
  — Physics: Athena/Gaudi, LCG, GANGA, LIGO
• Layering is crucial for coherent application
• Wrapping, gluing, and building from components are a natural use of Python and a real boon in code reuse.
