EGEE Project and Middleware Overview
Marco Verlato
CYCLOPS Second Training Workshop
5-7 May 2008
Chania, Greece
Outline
Introduction
The EGEE project
– Infrastructure
– Applications
– Operations and Support
The EGEE Middleware: gLite
– Grid access services
– Security services
– Information & Monitoring services
– Data Management services
– Job Management services
Further information
What is a Grid?
“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities”
Ian Foster -- Carl Kesselman, 1998
“A grid is a combination of networked resources and the corresponding middleware, which provides services for the user”
Erwin Laure, EGEE T.D., ISSGC2007
The users of a Grid are divided into Virtual Organisations (VOs): abstract entities grouping users, institutions and resources, e.g. the 4 LHC experiments, the community of biomedical researchers, etc.
What is a Grid?
It relies on advanced software, called middleware
Middleware automatically finds the data the scientist needs, and the computing power to analyse it
Middleware balances the load on different resources. It also handles security, accounting, monitoring and much more
Enabling Grids for E-sciencE (EGEE) project
Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
>250 sites; 48 countries; >50,000 CPUs; >20 PetaBytes; >10,000 users; >150 VOs; >150,000 jobs/day
Flagship Grid infrastructure project co-funded by the European Commission, started in April 2004
Now entering its 3rd phase
Disciplines and users
[Figure: registered VOs grouped by discipline]
Discipline groups shown: Astrophysics and astroparticle physics; Biomedical and bioinformatics; Computational chemistry; Earth sciences; Finance; Fusion; Geophysics; High Energy Physics; Infrastructure; Others
VO names shown: argo, libi, enmr.eu, aegis, inaf, bio, trgrida, apesci, pamela, biomed, compchem, astron, astro.vo.eu-egee.org, embrace, gaussian, cesga, planck, enea, virgo, grid-it, magic, calice, edteam, gridmosi.ici.ro, auger, hone, euindia, lights.infn.it, ific, ops, ncf, ildg, pvier, vo.agata.org, trgridc, minos.vo.gridpp.ac.uk, rdteam, vo.ipno.in2p3.fr, esr, pheno, rgstest, vo.northgrid.ac.uk, supernemo.vo.eu-egee.org, swetest, webcom, vo.lal.in2p3.fr, vo.deploymenttest.cea.fr, geant4, egeode, vo.llr.in2p3.fr, vo.e-ca.es, imath.cesga.es, vo.lpnhe.in2p3.fr, vo.grif.fr, proactive, vo.sbg.in2p3.fr, infngrid, cosmo, egrid, hermes, eela, crypto.swing-grid.ch, vo.dapnia.cea.fr, eumed, diligent, alice, dteam, cyclops, fusion, atlas, vo.plgrid.pl, geclipse, babar, balticgrid, gridcc, belle, dech, cdf, see, cms, seegrid, dzero, twgrid, gridpp, trgrida/b/c/d/e, ilc, voce, lhcb, na48, zeus, ghep, desy
~8000 users listed in registered VOs: http://cic.gridops.org/index.php?section=home&page=volist
Digital libraries, disaster recovery, computational sciences, etc.
Types of applications
Simulation
– LHC Monte Carlo simulations; Fusion; WISDOM
– Jobs needing significant processing power; large number of independent jobs; limited input data; significant output data
Bulk Processing
– HEP; processing of satellite data
– Distributed input data; large amount of input and output data; job management (WMS); metadata services; complex data structures
Parallel Jobs
– Climate models, computational chemistry
– Large number of independent but communicating jobs; need for simultaneous access to a large number of CPUs; MPI libraries
Short-response delays
– Prototyping new applications; grid monitoring; interactivity
– Limited input and output data and processing needs, but fast response and quality of service required
Workflow
– Medical imaging; flood analysis
– Complex analysis algorithms; complex dependencies between jobs
Commercial Applications
– Non-open source software; Geocluster (seismic platform); FlexX (molecular docking); Matlab, Mathematics; IDL, …
– License server associated to an application deployment model
High Energy Physics Applications
[Detector figure labels: muon chambers, calorimeter, tracker]
pp @ √s = 14 TeV
L: 10^34 /cm^2/s
L: 2·10^32 /cm^2/s
2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50-100 Hz; 25 MB/sec digitized recording
40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/sec digitized recording
In silico drug discovery
Diseases such as HIV/AIDS, SARS, bird flu, etc. are a threat to public health because of worldwide exchanges and circulation of persons
Grids open new perspectives for in silico drug discovery
– Reduced cost and an accelerating factor in the search for new drugs
[Figure: avian influenza, bird casualties]
International collaboration is required for:
• Early detection
• Epidemiological watch
• Prevention
• Search for new drugs
• Search for vaccines
Wide In Silico Docking On Malaria
http://wisdom.healthgrid.org/
Earth Sciences Applications
ESA, UTV(IT), KNMI(NL), IPSL(FR): production and validation of 7 years of ozone profiles from GOME
Rapid earthquake analysis (mechanism and epicenter), 50-100 CPUs: IPGP(FR)
Modelling seawater intrusion in coastal aquifer (SWIMED): CRS4(IT), INAT(TU), Univ. Neuchâtel(CH)
Geocluster for academy and industry: CGG(FR)
Flood of a Danube river, cascade of models (meteorology, hydraulic, hydrodynamic…): UISAV(SK)
Specfem3D: seismic application, benchmark for MPI (2 to 2000 CPUs): IPGP(FR)
DKRZ(DE): data access studies, climate impacts on agriculture
Data mining, meteorology & space weather: GCRAS(RU)
Air pollution model: BAS(BG)
Mars atmosphere: CETP(FR)
EGEE workload in 2007
CPU: 114 million hours
Data: 25 PB stored, 11 PB transferred
Estimated cost if performed with Amazon’s EC2 and S3: € 47,486,548
http://gridview.cern.ch/GRIDVIEW/same_index.php
http://calculator.s3.amazonaws.com/calc5.html?
[Pie chart: cost shares of 16%, 82% and 2%; legend: storage, CPU, Xfer]
EGEE-II to EGEE-III
EGEE-III
– To be co-funded under European Commission call INFRA-2007-1.2.3
– 32M€ EC funds compared to 36M€ for EGEE-II
Key objectives
– Expand/optimise existing EGEE infrastructure, include more resources and user communities
– Prepare migration from a project-based model to a sustainable federated infrastructure based on National Grid Initiatives
2 year period: May 2008 to April 2010
– No gap between EGEE-II and EGEE-III (1 month extension to EGEE-II)
Similar consortium
– Now structured on a national basis (National Grid Initiatives/Joint Research Units)
Networking activities
– NA1: Management
– NA2: Dissemination
– NA3: Training
– NA4: Applications
– NA5: Inter. Coop. & Policy
Specific Service Activities
– SA1: Operations
– SA2: Networking Support
– SA3: Integ., testing & Cert.
Joint Research Activities
– JRA1: Middleware engineering
European Grid Initiative (EGI)
Need to prepare a permanent, common Grid infrastructure
Ensure the long-term sustainability of the European e-Infrastructure independent of short project funding cycles
Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines
Must be no gap in the support of the production grid
EGEE operations
Operations Coord. Centre (OCC)
- management, oversight of all operational and support activities
Regional Operations Centres (ROC)
- providing the core of the support infrastructure, each supporting a number of resource centres within its region
Resource Centres (RC)
- providing resources (computing, storage, network…)
Global Grid User Support (GGUS)
- At FZK, coordination and management of user support, single point of contact for users
Monitoring Visualization
The EGEE support infrastructure
[Diagram: the GGUS Central System at the centre, connected to the TPM, the VO TPMs (A, B, C), the ROCs (B, C, … N) with their Resource Centres (RC A, RC B, RC C), the VO Support units (A, B, C), Middleware support, Deployment support, Network Support, Other Grids, COD and the CIC Portal]
The GILDA t-Infrastructure (https://gilda.ct.infn.it)
• 20 sites in 3 continents
• > 11000 certificates issued, >20% renewed at least once
• > 250 courses, training events, official university curricula
• > 2,000,000 hits on the web site from >100 different countries
• > 4.5 TB of training material downloaded from the web site
e-Infrastructure projects & other Grids
e-Infrastructures adopting gLite
~80 countries “linked” together!
e-Infrastructures interoperable, or in the process of being made interoperable, with gLite
EGEE Middleware Distribution
Combines components from different providers
– Condor and Globus (via VDT)
– LCG (LHC Computing Grid)
– EDG (European Data Grid)
– Others
After prototyping phases in 2004 and 2005, convergence with the LCG-2 distribution was reached in May 2006
– gLite 3.0 released in May 2006, current release is 3.1
Develop a lightweight stack of generic middleware useful to EGEE applications
– Pluggable components: cater for different implementations
– Follow SOA approach, WS-I compliant where possible
Focus now is on re-engineering and hardening
Business friendly open source license: Apache 2.0
[Timeline figure: LCG-2 and gLite, prototyping and product phases in 2004 and 2005, gLite 3.0 in 2006]
The middleware structure
Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
Foundation Grid Middleware will be deployed on the EGEE infrastructure
– Must be complete and robust
– Should allow interoperation with other major grid infrastructures
– Should not assume the use of Higher-Level Grid Services
gLite services orchestration
[Diagram: interactions among the User Interface, Workload Management, Logging & Bookkeeping, the Information System, the File and Replica Catalogs, the Authorization Service, and the Computing Element and Storage Element at Site X. Arrow labels: submit, query, retrieve, publish state, update credential, discover services]
gLite services decomposition
Access
– CLI
– API
Security Services
– Authentication
– Authorization
– Auditing
Information & Monitoring Services
– Information & Monitoring
– Job Monitoring
Job Mgmt. Services
– Computing Element
– Workload Management
– Accounting
– Job Provenance
– Package Manager
Data Services
– Storage Element
– Data Movement
– File & Replica Catalog
– Metadata Catalog
Overview paper http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Grid Access
The access point to the EGEE Grid is the User Interface (UI)
It provides the CLI tools to access the functionalities offered by the gLite Services
These tools allow users to perform basic Grid operations:
– create the user proxy needed for authentication/authorization
– retrieve the status of different resources from the Information System
– copy, replicate and delete files from the Grid
– list all the resources suitable to execute a given job
– submit jobs for execution
– cancel jobs
– retrieve the output of finished jobs
– show the status of submitted jobs
– retrieve the logging and bookkeeping information of jobs
It provides the APIs to allow the development of Grid-enabled applications
Security Services
GSI authentication, based on a PKI X.509 SSL infrastructure
• Certificate Authorities (CA) issue (long lived) certificates identifying individuals (much like a passport)
• to reduce vulnerability, user identification on the Grid is done using (short lived) proxies of their certificates (they can be stored on MyProxy servers)
• users belong to VOs, to groups inside a VO, and may have special roles
VOMS provides a way to add attributes to a certificate proxy
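VOMS attributes are commonly written as a path-like string naming the VO, optional groups and an optional role. The sketch below is only an illustration of that VO / group / role structure: it parses such a string into its parts. The group name "training" and the role "student" are hypothetical, and the Capability field is simply skipped.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Fqan:
    vo: str                   # Virtual Organisation name
    groups: Tuple[str, ...]   # groups inside the VO, outermost first
    role: Optional[str]       # None when no role (or Role=NULL) is given


def parse_fqan(fqan: str) -> Fqan:
    """Split a VOMS-style attribute string into VO, groups and role."""
    if not fqan.startswith("/"):
        raise ValueError("the attribute string must start with '/'")
    parts = [p for p in fqan.split("/") if p]
    if not parts or "=" in parts[0]:
        raise ValueError("the attribute string must begin with the VO name")
    groups = []
    role = None
    for part in parts[1:]:
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # capability is ignored in this sketch
        else:
            groups.append(part)
    return Fqan(vo=parts[0], groups=tuple(groups), role=role)


# Hypothetical attribute: VO "cyclops", group "training", role "student".
print(parse_fqan("/cyclops/training/Role=student"))
```

A service could use the parsed VO, groups and role to decide which local policy applies to the holder of the proxy.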
Information & Monitoring Services / 1
Berkeley Database Information Index (BDII)
- Based on LDAP
- Standardized information provider (GIP)
- GLUE-1.3 schema
- Top level used with 230+ sites
- Roughly 60 instances in EGEE
[Diagram: providers → resource BDII / MDS GRIS → site-level BDII → top-level BDII; queries come from WMS, WN, UI and FTS; label: 2 minutes]
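Since the BDII is LDAP-based and publishes GLUE-1.3 attributes, its answers can be handled as plain attribute/value records. The following is a minimal sketch, assuming a simplified LDIF-like dump (no line folding, no base64 values); the host names and numbers in SAMPLE are hypothetical, and a real client would obtain the records through an LDAP query rather than from a string.

```python
def parse_ldif(text):
    """Parse a simplified LDIF dump into a list of {attribute: [values]} dicts."""
    entries, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:                  # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):      # skip comments
            continue
        key, _, value = line.partition(":")
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def ces_with_free_cpus(entries):
    """Return the IDs of Computing Elements that publish at least one free CPU."""
    result = []
    for entry in entries:
        if "GlueCEUniqueID" not in entry:
            continue
        free = int(entry.get("GlueCEStateFreeCPUs", ["0"])[0])
        if free > 0:
            result.append(entry["GlueCEUniqueID"][0])
    return result


# Hypothetical records for two Computing Elements.
SAMPLE = """\
dn: GlueCEUniqueID=ce01.example.org:2119/jobmanager-pbs-short,mds-vo-name=SITE-A,o=grid
GlueCEUniqueID: ce01.example.org:2119/jobmanager-pbs-short
GlueCEStateFreeCPUs: 12
GlueCEStateWaitingJobs: 3

dn: GlueCEUniqueID=ce02.example.org:2119/jobmanager-lsf-long,mds-vo-name=SITE-B,o=grid
GlueCEUniqueID: ce02.example.org:2119/jobmanager-lsf-long
GlueCEStateFreeCPUs: 0
GlueCEStateWaitingJobs: 40
"""

print(ces_with_free_cpus(parse_ldif(SAMPLE)))
```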
Information & Monitoring Services / 2
For users, R-GMA appears similar to a single relational database
Implementation of OGF’s Grid Monitoring Architecture (GMA)
Rich set of APIs (Web browsers, Java, C/C++, Python)
Typical deployment consists of Producer and Consumer Services on a one-per-site basis (MON box), and a centralized Registry and Schema
[Diagram: a Producer application uses the API to publish tuples (SQL “INSERT”) through the Producer Service, which registers with the Registry Service; a Consumer application sends a query (SQL “SELECT”) through the Consumer Service, which locates producers via the Registry, queries them and receives tuples; the Schema Service holds the table definitions (SQL “CREATE TABLE”)]
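The "single relational database" view can be illustrated with ordinary SQL. The sketch below is not the R-GMA API: it uses Python's built-in sqlite3 module as a stand-in for the virtual database, and the table name, columns and values are all hypothetical. It only shows the three roles of the SQL statements: CREATE TABLE for the schema, INSERT for producers, SELECT for consumers.

```python
import sqlite3

# An in-memory SQLite database stands in for the R-GMA virtual database.
db = sqlite3.connect(":memory:")

# Schema step: declare the table that producers may publish into.
db.execute(
    "CREATE TABLE cpu_load (site TEXT, host TEXT, load_avg REAL, measured_at INTEGER)"
)


def publish(site, host, load_avg, measured_at):
    """Producer side: publish one tuple with SQL INSERT."""
    db.execute(
        "INSERT INTO cpu_load VALUES (?, ?, ?, ?)",
        (site, host, load_avg, measured_at),
    )


def query_busy_hosts(threshold):
    """Consumer side: retrieve matching tuples with SQL SELECT."""
    cursor = db.execute(
        "SELECT site, host, load_avg FROM cpu_load "
        "WHERE load_avg > ? ORDER BY load_avg DESC",
        (threshold,),
    )
    return cursor.fetchall()


# Hypothetical measurements published by producers at two sites.
publish("SITE-A", "wn01", 0.35, 1000)
publish("SITE-A", "wn02", 0.92, 1000)
publish("SITE-B", "wn07", 0.81, 1005)

print(query_busy_hosts(0.8))
```

In R-GMA the tuples stay with the producers at each site and the Registry is used to find them; here everything lives in one local table purely for illustration.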
GridICE monitoring tool
Data Services /1
Heterogeneity
– Data is stored on different storage systems using different access technologies
Distribution
– Data is stored in different locations; in most cases there is no shared file system or common namespace
– Data needs to be moved between different locations
Data description
– Data are stored as files: need a way to describe files and locate them according to their contents
Needs and the services that address them
– Need common interface to storage resources: Storage Resource Manager (SRM)
– Need to keep track where data is stored: File and Replica Catalogs
– Need scheduled, reliable file transfer: File transfer services
– Need a way to describe files’ content and query them: Metadata catalog
Data Services /2
The Storage Resource Manager (SRM) interface is the basis for the gLite Storage Elements (SE)
– hides the storage system implementation
– handles the authorization based on VOMS credentials
– POSIX-like access to SRM via GFAL (Grid File Access Layer)
The LCG File Catalogue (LFC) keeps track of file replicas on the grid
Logical File Name (LFN)
– An alias created by a user to refer to some item of data
Global Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data
Site URL (SURL)
– Indicates the place (Storage Element) where the file is actually found; understood by the SRM interface
Transport URL (TURL)
– Temporary locator of a replica plus access protocol, understood by the backend MSS
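The relation between these names can be sketched as a small data structure: several LFNs may point to one GUID, and one GUID may have several SURLs. This is a toy in-memory model, not the LFC interface; the class, method names, LFNs and SURLs are all hypothetical.

```python
import uuid


class ReplicaCatalog:
    """Toy in-memory catalogue: LFN (alias) -> GUID -> list of SURLs."""

    def __init__(self):
        self._guid_by_lfn = {}
        self._surls_by_guid = {}

    def register(self, lfn, surl):
        """Register a new file under a logical name with its first replica."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = str(uuid.uuid4())
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_alias(self, existing_lfn, new_lfn):
        """Give the same item of data a second logical name."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn, surl):
        """Record one more physical copy of the file."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def replicas(self, lfn):
        """List every SURL known for the file behind this logical name."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])


catalog = ReplicaCatalog()
catalog.register("lfn:/grid/cyclops/demo/input.dat",
                 "srm://se01.example.org/cyclops/file-0001")
catalog.add_replica("lfn:/grid/cyclops/demo/input.dat",
                    "srm://se02.example.org/cyclops/file-0001")
catalog.add_alias("lfn:/grid/cyclops/demo/input.dat", "lfn:/grid/cyclops/latest.dat")

print(catalog.replicas("lfn:/grid/cyclops/latest.dat"))
```

The TURL is left out on purpose: it is a temporary locator obtained from the storage side for one chosen replica, not something a catalogue stores.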
Job Management Services /1
The Computing Element (CE) is the front-end to the local farm (cluster, batch system)
– several batch system implementations: Torque/Maui, PBS, LSF, Condor, SGE
– the CE is usually installed on the master node of the farm; slave nodes act as Worker Nodes (WN)
– typically the CE also runs the site BDII, which provides information to the top BDII
– application software is installed on the CE in a shared area
The CE receives users’ jobs from the WMS
– there are different queues with different priorities
– jobs are sent to the batch system, which executes them on the WNs
Output is then copied back to the WMS
Job Management Services /2
CREAM: Web Service Computing Element
– CREAM WSDL allows defining custom user interface
– C++ CLI interface allows direct submission
Lightweight
Fast notification of job status changes
– via CEMon
Improved security
– no “fork-scheduler”
Will support bulk jobs on the CE
– optimization of staging of input sandboxes for jobs with shared files
ICE: Interface to CREAM Environment
– being integrated in WMS for submissions to CREAM
ENEA-Grid approach to provide access to AIX
A solution to currently known limitations:
1) gLite must be installed on each WN: only Intel/SL machines
2) gLite WN must communicate with RB: security/firewall management issues
It also works with NFS or GPFS
It also works with rsh or ssh
Invasiveness of the grid middleware and firewall requirements are minimized!
Job Management Services /3
WMS: Resource brokering, workflow management, I/O data management
– Web Service interface: WMProxy
– Task Queue: keeps non-matched jobs
– Information SuperMarket: optimized cache of information system
– Match Maker: assigns jobs to resources according to user requirements (possibly including data location)
– Job submission & monitoring: Condor-G, ICE (to CREAM)
– External interactions: Information System, Data Catalogs, Logging & Bookkeeping, Policy Management systems
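The Match Maker and Task Queue roles can be illustrated with a filter-then-rank step. This is a minimal sketch under stated assumptions, not the WMS implementation: resources are plain dictionaries standing in for cached information system records, and the requirement and rank are Python functions standing in for the user's expressions. All resource names and values are hypothetical.

```python
def match(resources, requirements, rank):
    """Return the resources that satisfy `requirements`, best rank first."""
    suitable = [r for r in resources if requirements(r)]
    return sorted(suitable, key=rank, reverse=True)


def dispatch(job, resources, task_queue):
    """Pick the best resource for a job, or park the job in the task queue."""
    ranked = match(resources, job["requirements"], job["rank"])
    if not ranked:
        task_queue.append(job)   # non-matched jobs wait for a later attempt
        return None
    return ranked[0]["name"]


# Hypothetical snapshot of three Computing Elements.
resources = [
    {"name": "ce-a", "Architecture": "INTEL", "OpSys": "LINUX", "Benchmark": 1200},
    {"name": "ce-b", "Architecture": "INTEL", "OpSys": "LINUX", "Benchmark": 1800},
    {"name": "ce-c", "Architecture": "POWER", "OpSys": "AIX", "Benchmark": 2500},
]

job = {
    "name": "gridTest",
    "requirements": lambda r: r["Architecture"] == "INTEL" and r["OpSys"] == "LINUX",
    "rank": lambda r: r["Benchmark"],
}

task_queue = []
print(dispatch(job, resources, task_queue))
```

Here ce-c has the highest benchmark value but fails the requirement, so it is never ranked; requirements filter first and rank only orders what remains.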
Advanced scheduling
A Directed Acyclic Graph (DAG) is a set of jobs where the input, output, or execution of one or more jobs depends on one or more other jobs
A Collection is a group of jobs with no dependencies
– basically a collection of JDLs
A Parametric job is a job having one or more attributes in the JDL that vary their values according to parameters
Using compound jobs it is possible to have one-shot submission of a (possibly very large, up to thousands) group of jobs
– Submission time reduction
  Single call to WMProxy server
  Single Authentication and Authorization process
  Sharing of files between jobs
– Availability of both a single Job Id to manage the group as a whole and an Id for each single job in the group
[Diagram: example DAG with nodes nodeA, nodeB, nodeC, nodeD and nodeE]
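How dependencies constrain execution order in a DAG job can be sketched with Python's standard graphlib module (Python 3.9 or later). The node names come from the slide, but the edges between them are not recoverable from the transcript, so the dependencies below are purely hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# job -> set of jobs it depends on (hypothetical edges, for illustration only)
dependencies = {
    "nodeA": set(),
    "nodeB": {"nodeA"},
    "nodeC": {"nodeA"},
    "nodeD": {"nodeB", "nodeC"},
    "nodeE": {"nodeD"},
}


def execution_waves(deps):
    """Group jobs into waves; every job in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()              # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)       # mark the whole wave as finished
    return waves


def is_valid_dag(deps):
    """Return True when the dependency graph contains no cycle."""
    try:
        execution_waves(deps)
        return True
    except CycleError:
        return False


print(execution_waves(dependencies))
```

Jobs within one wave have no dependencies on each other and could run concurrently, which is also the situation of every job in a Collection.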
Logging & Bookkeeping (LB)
Tracks jobs in terms of events gathered from various gLite components
Processes them to give a higher-level view of the job states
Provides interfaces for querying L&B and registering for notifications
Often deployed on the same machine as the WMS, but can be remote
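Turning low-level events into a higher-level job state can be sketched as a replay of time-ordered events through a lookup table. This is an assumption-laden illustration, not the L&B implementation: the event names are hypothetical, and the state names are used only as plausible labels.

```python
# Hypothetical event names mapped to the job state they lead to.
STATE_AFTER_EVENT = {
    "registered": "SUBMITTED",
    "accepted_by_wms": "WAITING",
    "matched": "READY",
    "queued_at_ce": "SCHEDULED",
    "started_on_wn": "RUNNING",
    "finished": "DONE",
    "output_retrieved": "CLEARED",
}


def job_state(events):
    """Derive the current state from (timestamp, event_name) pairs.

    Events may arrive out of order because they come from different
    components, so they are sorted by timestamp before being replayed.
    """
    state = "UNKNOWN"
    for _, name in sorted(events):
        state = STATE_AFTER_EVENT.get(name, state)  # unknown events keep the state
    return state


# Events as they might arrive from different components, out of order.
events = [(3, "matched"), (1, "registered"), (2, "accepted_by_wms")]
print(job_state(events))
```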
Job submission example
[Diagram: job flow among the User Interface, Resource Broker, Job Submission Service, Computing Element, Storage Element, Information Service, Replica Catalog, Logging & Book-keeping and Author. Service. Labels: JDL, Job Submit Event, Input Sandbox, Job, Job Status, Output Sandbox, GSI data acc/transf]
Commands on the User Interface:
voms-proxy-init
glite-wms-job-submit myjob.jdl
myjob.jdl:
Executable = "gridTest";
StdError = "stderr.log";
StdOutput = "stdout.log";
InputSandbox = {"/home/joda/test/gridTest"};
OutputSandbox = {"stderr.log", "stdout.log"};
InputData = "lfn:testbed0-00019";
DataAccessProtocol = "gridftp";
Requirements = other.Architecture=="INTEL" && other.OpSys=="LINUX";
Rank = other.GlueHostBenchmarkSF00;
Further information
2nd Iberian Grid Infrastructure Conference: 12-14 May 2008, Porto (Portugal), joint with CYCLOPS Project Conference, www.ibergrid.eu/2008
EGEE’08 Conference: 22-26 September 2008, Istanbul (Turkey), www.eu-egee.org/egee08
EGEE digital library: egee.lib.ed.ac.uk
– Needs certificate (GILDA or national CA in browser)
EGEE: www.eu-egee.org
gLite: www.glite.org
GILDA: https://gilda.ct.infn.it/
LCG: lcg.web.cern.ch/LCG
Open Grid Forum: www.gridforum.org
Globus Alliance: www.globus.org
VDT: www.cs.wisc.edu/vdt/