SA1 and JRA1: Operations and Operational Tools
D. Cesini, JRA1 Activity Manager - INFN
T. Ferrari, Chief Operations Officer - EGI.eu
SA1 & JRA1 - EGI-InSPIRE Review 2011, 30/05/2011
www.egi.eu, EGI-InSPIRE RI-261323


Page 2

Outline

• PART I – Objectives, tasks, effort, partners
• PART II – Resource Infrastructure
• PART III – Service infrastructure: status and achievements
• PART IV – Issues, use of resources, impact and plans

Page 3

SA1 Overview

SA1 Effort (chart, share by activity): JRA1 3%, NA1 3%, NA2 9%, NA3 15%, SA1 56%, SA2 5%, SA3 8%

43 Countries, 45 Beneficiaries, 5238 PMs, 109.14 FTEs

WP Beneficiary Total PM
WP4-E EGI.eu 36
WP4-E CERN 59
WP4-E CNRS 12
WP4-E CSC 23
WP4-E CSIC 29
WP4-E CYFRONET 23
WP4-E GRNET 70
WP4-E INFN 48
WP4-E KIT-G 70
WP4-E LIP 17
WP4-E NCF 40
WP4-E SRCE 11
WP4-E STFC 75
WP4-E VR-SNIC 23
WP4-N ARNES 94
WP4-N CESNET 128
WP4-N CNRS 316
WP4-N CSC 67
WP4-N CSIC 372
WP4-N CYFRONET 156.1
WP4-N E-ARENA 71
WP4-N GRENA 19
WP4-N GRNET 180
WP4-N ICI 58
WP4-N ICT-BAS 124
WP4-N IIAP NAS RA 19
WP4-N IMCS-UL 52
WP4-N INFN 378
WP4-N IPB 118
WP4-N IUCC 25
WP4-N KIT-G 278
WP4-N LIP 107
WP4-N MTA KFKI 118
WP4-N NCF 159
WP4-N RENAM 20
WP4-N SIGMA 82
WP4-N SRCE 72
WP4-N STFC 277
WP4-N SWITCH 86
WP4-N TCD 94
WP4-N TUBITAK 130
WP4-N UCPH 81
WP4-N UCY 48
WP4-N UI SAV 96
WP4-N UIIP NASB 30
WP4-N UKIM 71
WP4-N UOBL ETF 75
WP4-N UOM 71
WP4-N UPT 32
WP4-N VR-SNIC 84
WP4-N VU 22
WP4-N ASGC 193
WP4-N ASTI 156
WP4-N KEK 1
WP4-N KISTI 92
WP4-N UNIMELB 36
WP4-N NUS 14

Countries: France, Finland, Spain, Poland, Greece, Italy, Germany, Portugal, Netherlands, Croatia, UK, Sweden, Slovenia, Czech Republic, Russia, Georgia, Romania, Bulgaria, Armenia, Latvia, Serbia, Israel, Hungary, Moldova, Norway, Switzerland, Ireland, Turkey, Denmark, Cyprus, Slovakia, Belarus, FYR Macedonia, Bosnia & Herzegovina, Montenegro, Albania, Lithuania, Taiwan, Philippines, Japan, Korea, Australia, Singapore

Page 4

Objectives
Operate a secure, reliable European-wide federated production grid infrastructure that is integrated and interoperates with other grids worldwide

Tasks (objective, task, description)
O1 (TSA1.2): Maintain a secure infrastructure
O2 (TSA1.3): Validate new technology releases (tools and middleware)
O3 (TSA1.7): Support end-users and Resource Centre administrators
O4 (TSA1.8): Service Level Management, grid oversight, documentation and procedures
O5 (TSA1.4, TSA1.5, TSA1.6): Operate tools, the accounting infrastructure and the EGI Helpdesk
O6 (JRA1.2, JRA1.3, JRA1.4, JRA1.5): Evolve the operational tools used by the production infrastructure
- Maintenance, development and support of national deployment
- Accounting for the use of new resources (desktop, virtualization, storage, data, application and billing)

Page 5

SA1 tasks and resource distribution

Task: Leader/Partner, Task effort distribution

TSA1.1 Activity Management: T. Ferrari/EGI.eu, 0.70%
TSA1.2 Secure Infrastructure: M. Ma/STFC, 8.60%
TSA1.3 Service Deployment Validation: M. David/LIP, 11.00%
TSA1.4 Infrastructure for Grid Management: E. Imamagic/SRCE, 20.66%
TSA1.5 Accounting: J. Gordon/STFC, 5.81%
TSA1.6 Helpdesk Infrastructure: T. Antoni/KIT, 8.76%
TSA1.7 Support Teams: R. Trompert/SARA, 28.16%
TSA1.8 Providing a Reliable Grid Infrastructure and core services: C. Kanellopoulos/AUTH, 16.31%

Page 6

JRA1 Overview

JRA1 Effort (chart, share by activity): JRA1 3%, NA1 3%, NA2 9%, NA3 15%, SA1 56%, SA2 5%, SA3 8%

7 Countries, 8 Beneficiaries, 315 PMs, 8.67 FTE

WP Task Beneficiary Total PMs
WP7-E TJRA1.1 INFN 24
WP7-E TJRA1.2 KIT-G 47
WP7-E TJRA1.2 CSIC 12
WP7-E TJRA1.2 CNRS 12
WP7-E TJRA1.2 GRNET 12
WP7-E TJRA1.2 SRCE 12
WP7-E TJRA1.2 STFC 24
WP7-E TJRA1.2 CERN 12
WP7-G TJRA1.3 CSIC 3
WP7-G TJRA1.3 CNRS 3
WP7-G TJRA1.3 SRCE 3
WP7-G TJRA1.3 STFC 3
WP7-G TJRA1.3 CERN 6
WP7-G TJRA1.4 KIT-G 18
WP7-G TJRA1.4 CSIC 18
WP7-G TJRA1.4 INFN 26
WP7-G TJRA1.4 STFC 27
WP7-G TJRA1.5 CNRS 53

Countries and organisations: Italy, Germany, Spain, Greece, Croatia, CERN, France, UK

Page 7

JRA1 tasks and resource distribution

Task: Leader/Partner, Task effort distribution

TJRA1.1 Activity Management: D. Cesini/INFN, 7.6%
TJRA1.2 Maintenance and development of the deployed operational tools: T. Antoni/KIT, 41.6%
TJRA1.3 Supporting National Deployment models: P. Solagna/EGI.eu, 5.7% (PY1 only)
TJRA1.4 Accounting for usage of different resource types: J. Gordon/STFC, 28.5% (PY2-PY4 only)
• Cloud, HPC, Desktop Grid
• Storage/Data Usage
• Application Usage
• Billing system
TJRA1.5 Integrated Operations Portal: C. L’Orphelin/CNRS, 16.6% (PY1-PY3 only)
• Service Oriented model
• Harmonization with GOCDB
• Porting to Symfony
• New DCI integration

Page 8

PART II

• PART I – Objectives, tasks, effort, partners
• PART II – Resource Infrastructure
  • Architecture
  • Resource capacity and utilization
• PART III – Service infrastructure: status and achievements
• PART IV – Issues, use of resources, impact and plans

Page 9

EGI Resource Infrastructure

[Diagram labels: Resource Centres, Resource Infrastructure, Network, Resource Provider, NGI/EIRO Resource Provider, MoUs, EGI.eu]

Layer I. Resource Centre (RC)
A localised or geographically distributed administration domain, where EGI resources (CPUs, data storage, instruments and digital libraries) are managed and operated to be accessed by end-users.

Layer II. Resource Infrastructure
The federation of Resource Centres, which are interconnected by the National Research and Education Networks (NRENs) and GÉANT.

Integrated infrastructures: operated by a non-EGI-InSPIRE partner but relying on EGI operational services, e.g. Latin America and Caribbean.

Peer infrastructures: accessible to EGI users, but relying on their own operational services, e.g. Open Science Grid (USA).

Resource infrastructure Provider (RP)
The legal organisation responsible for any matter that concerns the respective Resource Infrastructure.

EGI Participant: National Grid Initiatives (NGIs), European Intergovernmental Research Organisations (EIROs)

Layer III. EGI Resource Infrastructure

Page 10

Status and yearly increase

Resource Centres: 338 (+6.8%); 96 supporting MPI (+31.5%); Europe, Asia Pacific, North and South America
Countries: 51 (57 with integrated RPs) (+18.75%)
Capacity: 240,000 CPU cores (339,000 with integrated and peer RPs) (+24.9%); 1.89 Million HEP-SPEC 06*; 102 PB disk, 89 PB tape

* HEP-SPEC 06: Computing benchmark based on SPECCPU2006, 10 HEP-SPEC = 4 kSI2k

Page 11

From EGEE federations to NGIs

• April 2010: 12 EGEE federated regional infrastructures
• April 2011
  • 40 European NGIs and 1 EIRO (CERN)
  • Integrated resource infrastructures
    – Asia Pacific federation
    – Canada federation
    – Latin American and Caribbean Grid Initiative
    – Latin American federation
• Transition completed in January 2011

Page 12

Service/Resource Centre (RC) Availability and Reliability

• Availability
  – the percentage of time that the service/RC was up and running: (uptime / total_time) x 100
  – Minimum RC availability: 70%
• Reliability
  – the percentage of time that the service/RC was up and running, excluding periods of scheduled interventions: [uptime / (total_time - scheduled_time)] x 100
  – Minimum RC reliability: 75%
• Suspension policy
  – RC availability < 50% for 3 consecutive months
  – 6 RCs suspended
  – Stricter policy from PY2: from 50 to 70%
• Monthly performance reports per RC
• New ticket-based procedure for monitoring of underperforming RCs

[Chart: EGI average monthly availability and reliability (%), PQ1 to PQ4, axis from 88% to 97%]

Overall PY1 EGI availability: 92.73%
Overall PY1 EGI reliability: 93.85%
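The availability and reliability definitions and the suspension rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names and the sample hours are hypothetical and do not come from the EGI tools.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MonthlyRecord:
    """Hours observed for one Resource Centre in one month (hypothetical layout)."""
    uptime_h: float
    total_h: float
    scheduled_h: float  # hours of scheduled interventions


def availability(rec: MonthlyRecord) -> float:
    # (uptime / total_time) x 100
    return rec.uptime_h / rec.total_h * 100.0


def reliability(rec: MonthlyRecord) -> float:
    # [uptime / (total_time - scheduled_time)] x 100
    return rec.uptime_h / (rec.total_h - rec.scheduled_h) * 100.0


def should_suspend(monthly_availability: Sequence[float],
                   threshold: float = 50.0,
                   months: int = 3) -> bool:
    """True if availability stayed below `threshold` for `months` consecutive months."""
    run = 0
    for value in monthly_availability:
        run = run + 1 if value < threshold else 0
        if run >= months:
            return True
    return False
```

For a 720-hour month with 648 hours of uptime and 36 hours of scheduled intervention, availability is 90% while reliability is higher, because the scheduled hours are removed from the denominator. Passing `threshold=70.0` to `should_suspend` would model the stricter policy announced for PY2.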

Page 13

Usage statistics

Metric (unit): per month; per day (yearly increase)
Average number of jobs, all VOs (number): 27.8 Million; 914,000 (+82%)
Average number of jobs, non-HEP VOs (number): 2.8 Million (10% of total); 100,000 (+47%)
CPU wall clock, all VOs (hours): 74.8 Million; 2.5 Million
Normalized CPU wall clock, all VOs (HEP-SPEC 06 hours): 563.2 Million; 18.5 Million
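The relation between the monthly and daily columns, and between raw and normalized wall clock time, can be checked with a little arithmetic. The sketch below assumes an average month of 365/12 days and reads "normalized" as wall clock hours weighted by the HEP-SPEC 06 rating of the cores used; both are assumptions for illustration, since the slide does not state how the figures were derived.

```python
DAYS_PER_MONTH = 365.0 / 12.0  # assumed average month length


def daily_average(monthly_total: float) -> float:
    """Convert a per-month total into a per-day average."""
    return monthly_total / DAYS_PER_MONTH


def implied_hepspec_per_core(normalized_hours: float, wall_hours: float) -> float:
    """Average HEP-SPEC 06 rating per core implied by the two wall clock totals."""
    return normalized_hours / wall_hours


jobs_per_day = daily_average(27.8e6)           # close to the 914,000 on the slide
wall_hours_per_day = daily_average(74.8e6)     # close to 2.5 Million
rating = implied_hepspec_per_core(563.2e6, 74.8e6)  # roughly 7.5 HEP-SPEC 06 per core
```

Under these assumptions the all-VO rows reproduce the slide's daily values after rounding; the non-HEP job figure on the slide (100,000 per day) appears to be rounded more coarsely.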

Page 14

PART III

• PART I – Objectives, tasks, effort, partners
• PART II – Resource Infrastructure
• PART III – Service infrastructure: status and achievements
  • Infrastructure Services
  • Technical Services
  • Support Services
  • Human Services
• PART IV – Issues, use of resources, impact and plans

Page 15

EGI Service Infrastructure
The service infrastructure enables secure, interoperable and reliable access to distributed resources. EGI services are provided locally by Operations Centres and globally by EGI.eu.

Service categories:
I. Infrastructure Services: tools
II. Technical Services: Grid middleware
III. Support Services: Helpdesk
IV. Human Services: Service Level Management, security, documentation, coordination

[Diagram labels: Operations Centres, Resource Infrastructure, and partners, Local Services, Global Services]

Page 16

I. Infrastructure Services

Infrastructure Service (SA1/JRA1 tasks): central components; local components

Broker network (TSA1.4, JRA1.2)
  Central: ActiveMQ brokers
  Local: -
Service Availability Monitoring (TSA1.4, JRA1.2, JRA1.3)
  Central: MyEGI portal, Aggregated Topology Provider, Metrics Description Database (rw), Metrics Results Store
  Local: MyEGI portal, Aggregated Topology Provider, Metrics Description Database (r), Metrics Results Store, Nagios Configuration Generator
Operations portal and dashboard (TSA1.4, JRA1.2, JRA1.5)
  Central: Central instance
  Local: Local instance
Accounting (TSA1.5, JRA1.4)
  Central: APEL central databases, portal
  Local: Sensors, national/regional repositories and portal; APEL local database under development
Helpdesk (TSA1.6, JRA1.2)
  Central: Global Grid User Support
  Local: Support Unit/Local helpdesk
Central Tools (TSA1.4, JRA1.2)
  Central: GOCDB
  Local: Under development

Page 17

I. Infrastructure Services
Service Availability Monitoring (SAM) 1/2

SAM (CERN, AUTH, SRCE): monitoring framework for RCs and services
• one of the main data sources for the Operations Dashboard
• data source to create Availability/Reliability statistics
• composed of various components:
  1. The test submission framework: based on the Nagios system, set up and customized by the Nagios Configuration Generator (NCG)
  2. The database components for storage of information about topology, metrics and results
  3. A message bus to publish the monitoring results (load-balanced ActiveMQ broker network)
  4. A visualization tool (GUI): MyEGI
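The flow through the four components listed above (test execution, message bus, results store, visualization) can be pictured with a small in-memory sketch. It is a toy model only: an in-process `queue.Queue` stands in for the ActiveMQ broker network, plain callables stand in for Nagios probes, and every function and field name is hypothetical rather than part of SAM.

```python
import queue
from collections import defaultdict

# Stand-in for the message bus; the real system uses an ActiveMQ broker network.
bus = queue.Queue()


def run_probe(site, service, check):
    """Run one test (stand-in for a Nagios probe) and publish its result on the bus."""
    status = "OK" if check() else "CRITICAL"
    bus.put({"site": site, "service": service, "status": status})


def drain_into_store(store):
    """Consume every published result into a results store keyed by site."""
    while not bus.empty():
        msg = bus.get()
        store[msg["site"]].append(msg)


def site_summary(store, site):
    """Fraction of OK results for a site, as a visualization layer might display it."""
    results = store[site]
    ok = sum(1 for r in results if r["status"] == "OK")
    return ok / len(results)
```

A caller would create `store = defaultdict(list)`, run a few probes, call `drain_into_store(store)` and then query `site_summary(store, "SITE-A")`. Decoupling probe execution from storage through a bus is what lets central and local instances consume the same results.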

Page 18

I. Infrastructure Services
Service Availability Monitoring (SAM) 2/2

Achievements
• 8 releases (new EGI release procedure as distributed software)
• MyEGI visualization portal in production (central and local instances)
  – New look and feel
  – MyEGI Web Service available
  – GridMap style plots added
• Database components re-engineering
  – ATP as new topology provider (replacing the old SAM database)
• Probes
  – Integration of ARC and GLOBUS5 probes (UNICORE in progress)
  – New probe for testing of the Certification Authority certificate distribution with automatic discovery of the latest version
• Support for
  – robot certificates
  – monitoring of uncertified sites
  – authorization plugin (messaging infrastructure) for denial of all broker-to-broker communications (for accounting)
• Other: creation of the 2nd level support and handover of probes development to EMI and IGE (in progress)

Page 19

I. Infrastructure Services
Operations Portal and Dashboard

Operations Portal (CNRS)
• Broadcast tool
• Operational Dashboard
• VO Identity Cards

Achievements
• 8 releases
• Package for local deployment released and updated (deployed in 4 NGIs)
• Porting to a new web framework almost completed
• Improvements to all the modules
  – VO ID Cards module implementation driven by NA3 requirements
• Integration with security dashboard (in progress)
• New “Central Operator on Duty” view released

Page 20

I. Infrastructure Services
Accounting Repository and Portal

Accounting Repository (STFC)
- usage of compute resources within the production infrastructure
- based on gLite-APEL

Accounting Portal (FCTSG)
GUI for access to data from the Accounting Repository

Achievements
• New: complete integration of the APEL accounting system with the message broker network
• Porting of APEL tests to Nagios
• Design and implementation of a distributable Regional Accounting Server (in progress)
• Portal modified to support new GOCDB4 PI and Ops Portal XML feeds
• NGI View added in the portal
• Decommissioning of central R-GMA accounting services (Feb 2011)

Page 21

I. Infrastructure Services
EGI Helpdesk

• EGI Helpdesk (KIT)
  – distributed helpdesk with central coordination: Global Grid User Support (GGUS)
• Achievements
  – 9 releases
  – Update of support teams and units
  – Integration of the new NGIs (31 NGIs interfaced, 22 as support units, 6 with a local helpdesk)
  – Definition and implementation of new workflows for
    • technology support (1st line, 2nd line and 3rd line provided by the Technology Providers – EMI, IGE etc.) and the respective access privileges
    • support of software provisioning and bug reporting processes that involve EGI and its external technology providers
  – Local view (xGUS) available and deployed by various NGIs/projects

Page 22

I. Infrastructure Services
Central configuration repository

GOCDB (STFC)
EGI relies on a central configuration database to record static information contributed by the resource providers as to the service instances that they are running and the individual contact, role and status information for those responsible for particular services.

Achievements
- Decommissioning of GOCDB3, release and deployment of new GOCDB4
- Prototype for local deployment available but without synchronization system
- Naming schema modification to integrate UNICORE services
- GLUE2.0 compatibility for service names (ongoing)

Page 23

Project Management Tool

Metrics Portal
Metrics Portal (FCTSG): prototype tool being developed for manual/automatic collection of EGI-InSPIRE metrics from different information sources to track project and partner performance

Page 24

II. Technical Services

Purpose: to improve the usage of the production infrastructure and generally of the technology that makes up the production infrastructure

Technical Service (SA1 task): central components; local components

Requirements (TSA1.1)
  Central: Gathering and prioritization at the Operations Management Board
  Local: Gathering from Resource Centres and prioritization
Technology Staged Rollout (TSA1.3)
  Central: Coordination
  Local: Deployment validation by a restricted list of Resource Centres
Interoperability (TSA1.4)
  Central: Coordination
  Local: Collection of local requirements, GLOBUS and UNICORE integration task forces
Core services (TSA1.8)
  Central: Authentication services for infrastructure VOs (DTEAM), WMS and top-BDII for monitoring of uncertified sites, core services for small user communities, catch-all CA
  Local: File catalogues, workload managers, authentication and authorization services, data transfer schedulers

Page 25

II. Technical Services
Requirements gathering

• New process for requirements gathering (tools and deployed software) every 3 months

[Diagram stages: Requirements gathering, Prioritisation, Discussion with Technology Providers. Labels: Virtual Organisation, Virtual research communities, User Community Board, EGI Request Tracker, Resource Centres, Resource infrastructure Provider, Operations Tools Advisory Group, Operations Management Board, Technology Coordination Board, EGI-JRA1]

Page 26

II. Technical Services
Staged Rollout

• New software updates (grid middleware and tools) are deployed into the production infrastructure incrementally through a staged rollout to ensure that they are reliable in actual use, following successful verification of the software component against published criteria
• Early Adopters (EA) are the production Resource Centres willing to deploy one or more new releases
  – Automation of the process based on RT
  – Process tested with the validation of gLite 3.1/3.2 releases and SAM

Achievements: Value
Max number of components tested/rejected in staged rollout per PQ: 29/3
Max number of staged rollout tests undertaken: 40 (PQ4)
Number of EA teams: 45
Middleware stacks/components: ARC, gLite, UNICORE, SAM, CA trust chain, GLOBUS (in progress)

Page 27

II. Technical Services
Interoperability

Deployed middleware
- ARC (2.38%), gLite (97.62%), UNICORE
- More ARC and UNICORE installations expected in 2011
- Croatia, Germany, Poland, Romania, The Netherlands, UK integrating GLOBUS and/or UNICORE

GLOBUS and UNICORE task forces

Accomplishments
• ARC fully integrated into GOCDB, accounting and SAM
• Integration of UNICORE and GLOBUS in progress
• Open Grid Forum
• Production Grid Infrastructure WG
• Grid Interoperability Now WG
• Infrastructure Policy Group

Page 28

II. Technical Services
Core services

Achievements
• Core Grid services for new/small VOs
• New infrastructure for the DTEAM VO membership management (troubleshooting)
• Membership management for OPS VO (monitoring)
• New infrastructure for monitoring of uncertified sites
• Catch-all CA
• Local core Grid service instances
  – 135 workload management services (WMS)
  – 45 file catalogues (LFC)
  – 118 information discovery services (top-BDII)
  – 41 VO membership services (VOMS)

Page 29

III. Support Services 1/2

Support Service (SA1 task): central components; local components

1st line support (TSA1.7)
  Central: Triage of tickets in GGUS
  Local: 1st line support for tickets opened locally
Grid oversight (TSA1.7)
  Central: Central operations support and escalation of tickets not managed locally
  Local: Local operations support
Network Support (TSA1.7)
  Support to connectivity and performance problems (contact point to the NREN PERT teams)

2nd line support: Deployment Middleware Support Unit (SA2)
3rd line support: Technology providers

Page 30

III. Support Services 2/2

Accomplishments
• New training and dissemination channels for new NGI support teams, monthly newsletter
• Most of the new NGIs successfully established their own local support structures
• Support for network performance issues in place (relying on tools for monitoring and troubleshooting) – contact point with NREN PERT teams

But
• Grid oversight workload affected by new Operations Centres starting operations, now progressively reducing
• Support problems faced in some NGIs now under resolution

Metric: Value
Average number of EGI tickets created per month: 965 tickets (~constant)
Average monthly response time: 2.7 operating hours
Average median of monthly solution time: 5.8 operating hours
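One plausible reading of "average median of monthly solution time" is: take the median ticket solution time within each month, then average those monthly medians over the reporting period. The sketch below follows that reading; the function name and the sample ticket durations are invented for illustration and are not GGUS data.

```python
from statistics import mean, median
from typing import Dict, List


def average_monthly_median(solution_hours_by_month: Dict[str, List[float]]) -> float:
    """Average, over months, of the per-month median solution time (operating hours)."""
    monthly_medians = [median(hours) for hours in solution_hours_by_month.values()]
    return mean(monthly_medians)


# Hypothetical ticket solution times in operating hours, grouped by month.
sample = {
    "2011-01": [1.0, 4.0, 30.0],
    "2011-02": [2.0, 6.0, 8.0, 100.0],
}
result = average_monthly_median(sample)  # medians are 4.0 and 7.0
```

Using the median inside each month keeps a handful of very long-running tickets (30 and 100 hours in the sample) from dominating the figure, which a plain mean over all tickets would not.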

Page 31

IV. Human Services

Human Service (SA1 task): central components; local components

Service Level Management (TSA1.8)
  Central: Coordination
  Local: Monitoring of local performance and support to Resource Centre administrators
Operational security (TSA1.2)
  Central: Coordination
  Local: Incident response (EGI CSIRT), security monitoring, security drills, software vulnerability assessment
Documentation (TSA1.8)
  Central: Coordination
  Local: Contribution to manuals, procedures and best practices
Operations Management (TSA1.1)
  Central: Coordination of the Operations Management Board
  Local: Local operations management

Page 32

IV. Human Services
Service Level Management

• Purpose
  1. to provide the metrics for conformance of the achieved level of service to the agreed one
  2. to ensure that the agreed level of service is provided (monitoring and reporting on Service Levels)
• Achievements
  – Definition of the EGI Resource Centre Operational Level Agreement [ITIL v3]: duties, services and the related quality parameters
    • agreement between an IT Service Provider (EGI) and another part of the same Organization (Resource Centre)
    • an OLA supports the IT Service Provider's (EGI) delivery of IT Services (Grid) to Customers (end-users)
  – Resource Provider OLA in progress
  – Definition of new GGUS-based process for Service Level Management (involving the central operators on duty – COD)
  – New suspension policy

Page 33

IV. Human Services
Operational security

Security Coordination Group: coordinate overall EGI security activities

Security Policy Group: develop and maintain security policies

EUGridPMA (European Policy Management Authority for Grid Authentication): coordinate the trust fabric for e-Science authentication in Europe

Software Vulnerability Group
• Handling potential vulnerabilities reported
• Vulnerability assessment
• Secure coding education

Computer Security Incident Response Team
• Security incident response
• Security monitoring of EGI infrastructure
• Security drills
• Security training and dissemination

Achievements
EGI CSIRT: Security Service Challenge 4, 13 RCs tested (including WLCG Tier1 sites); 9 security incidents handled; 12 advisories issued (3 critical); 3 critical vulnerabilities mitigated within 7 days; 1 security training session (EGI TF)
SVG: 29 software vulnerabilities reported; 15 concerning Grid middleware; 4 fixed (others have not passed their Target Date yet)
Procedures: 3 new procedures (Software vulnerability handling; Critical vulnerability handling; Security incident (exploited vulnerability) handling)
Resource Centres suspended: 0

Page 34

IV. Human Services
Documentation

• Documentation collected at the EGI wiki (160 operations pages)
• 9 new procedures defined and approved
• 3 new manuals and 4 how-tos (in progress)
• Migration and update of existing legacy technical documentation (in progress)
• Mirroring of EGI wiki at ASGC

Page 35

PART IV

• PART I – Objectives, tasks, effort, partners
• PART II – Resource Infrastructure
• PART III – Service infrastructure: status and achievements
• PART IV – Issues, use of resources, impact and plans

Page 36

Issues

• SA1
  – Pending integration of two NGIs
  – Establishment of the NGI as reference provider in the country
• JRA1
  – Development for local deployment tools delayed
  – No funded effort for 2nd level support of distributed tools
    • SAM
    • Operations Portal

Page 37

Use of Resources 1/2

• SA1
  – 98% PMs achieved (aggregated)
  – EGI.eu Global Services
    • Some marginal cases of overspending due to transition
    • TSA1.8E: 59% achieved due to issues in claiming effort within the JRU (but activities successfully delivered)
  – NGI Local Services
    • Few cases of under/overspending that will be compensated over the duration of the project
• JRA1
  – 80% PMs achieved (aggregated on all tasks)


Use of Resources 2/2

• TJRA1.2 – Maintenance
  – Total spent 86%
  – Unspent effort can be compensated during the coming years (4-year task)
• TJRA1.3 – Development for the Regionalisation of Ops tools
  – Total spent 63%
  – Underspending by almost all the partners and development not completed
    • Hiring issues for some partners
    • Consolidation of use cases
    • Dependencies among tool development roadmaps
  – Propose extension of TJRA1.3 into PY2
• TJRA1.5 (CNRS)
  – Total spent 76%
  – Harmonization of operations portal with GOCDB postponed


Plans for next year

• SA1
  – Extending participation in Staged Rollout activities
  – Integration
    • New NGIs and MoUs with new integrated RPs
    • Finish UNICORE and GLOBUS integration
    • Desktop grids and PRACE (pilots)
  – Operational tools availability reports (Global and Local)
  – Automation of service level management processes
  – Day-by-day operations (security, support, oversight)
• JRA1
  – Accounting
    • New APEL Publisher: September 2011
    • Regional Accounting Server packaged and released to NGIs: December 2011
    • New resources and billing (roadmap under discussion): careful prioritization needed
  – Local deployment models to be completed (synchronisation system for regional GOCDB)
  – Operations Portal: integration of security dashboard; creation of VO dashboards under discussion


Project targets

Project Objective | Metric | Target Y1 | Achieved by PQ4
PO1: Expansion of a nationally based production infrastructure | Number of production Resource Centres in EGI (M.SA1.Size.1) | 300 | 347
PO1 | Number of CPU cores available in EGI (M.SA1.Size.2) – Integrated | 300,000 | 338,895
PO1 | Number of CPU cores available in EGI (M.SA1.Size.2) – Project | 200,000 | 239,840
PO1 | EGI Reliability (M.SA1.Operation.5) | 90% | 94.6%
PO2: Support of European researchers and international collaborators through VRCs | Number of jobs done a day (M.SA1.Usage.1) | 500,000 | 914,000
PO3: Sustainable support for Heavy User Communities | Number of Resource Centres with MPI (M.SA1.Integration.2) | 50 | 96
PO6: Integration of new technologies and resources | Number of HPC clusters (M.SA1.Integration.1) | 1 | 49
PO6 | Number of virtualised resources (M.SA1.Integration.4) | 0 | 16,108
PO6 | Number of desktop resources (M.SA1.Integration.3) | 0 | 1,562
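As a quick cross-check of the figures above, the following minimal Python sketch compares each Year 1 target with the value achieved by PQ4. The numbers are copied from the targets slide; the dictionary layout and the helper names (`target_met`, `all_targets_met`, `surplus_percent`) are illustrative assumptions and not part of any EGI metrics tool.

```python
# Metric -> (Target Y1, Achieved by PQ4), copied from the project targets slide.
# The data layout and helper names are hypothetical, chosen only for illustration.
TARGETS = {
    "M.SA1.Size.1 (production Resource Centres)": (300, 347),
    "M.SA1.Size.2 (CPU cores, integrated)": (300_000, 338_895),
    "M.SA1.Size.2 (CPU cores, project)": (200_000, 239_840),
    "M.SA1.Operation.5 (EGI reliability, %)": (90.0, 94.6),
    "M.SA1.Usage.1 (jobs per day)": (500_000, 914_000),
    "M.SA1.Integration.2 (Resource Centres with MPI)": (50, 96),
    "M.SA1.Integration.1 (HPC clusters)": (1, 49),
    "M.SA1.Integration.4 (virtualised resources)": (0, 16_108),
    "M.SA1.Integration.3 (desktop resources)": (0, 1_562),
}


def target_met(target, achieved):
    """A target counts as met when the achieved value is at least the target."""
    return achieved >= target


def all_targets_met(targets):
    """True only if every metric in the mapping meets its target."""
    return all(target_met(t, a) for t, a in targets.values())


def surplus_percent(target, achieved):
    """Percentage by which the achieved value exceeds the target.

    Returns None when the target is 0, since a relative surplus is undefined.
    """
    if target == 0:
        return None
    return 100.0 * (achieved - target) / target


if __name__ == "__main__":
    for name, (target, achieved) in TARGETS.items():
        surplus = surplus_percent(target, achieved)
        note = "n/a (target 0)" if surplus is None else f"{surplus:+.1f}%"
        status = "met" if target_met(target, achieved) else "NOT met"
        print(f"{name}: {status}, surplus {note}")
    print("All targets met:", all_targets_met(TARGETS))
```

With these figures every metric meets or exceeds its Year 1 target. Metrics with a target of 0 are reported without a relative surplus rather than dividing by zero.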


Activity impact and value

Project objectives and SA1/JRA1 achievements

O1: The continued operation and expansion of today’s production Infrastructure
• SA1 and JRA1 provided continued, open and available services to all disciplines
• Radical transition to an NGI-based model (>20 NGIs)
• NGIs at different levels of maturity but active, increasingly sustainable and improving their performance
• OMB and OTAG established (>40 members)
• Installed capacity and Resource Centres integrated continued to grow (+25% CPU cores, +85% jobs run)
• 28 operational tool releases
• 6 task forces

O4: Interfaces that expand access to new user communities
• Support of MPI expanding (+31.5%)
• Integration of UNICORE HPC

O5: Mechanisms to integrate existing infrastructure providers in Europe and around the world
• New procedures and processes (+9)
• Collaboration with integrated RPs through MoUs

O6: Establish processes and procedures to allow the integration of new DCI technologies
• Accounting infrastructure migrated to messaging
• ARC fully integrated, GLOBUS and UNICORE in progress
• Integration of virtual Grid sites (StratusLab)


Summary

• All project metric targets met
• SA1 and JRA1 effectively contributed to the accomplishment of the project objectives
  – continued operation with improving performance and increasing integration
  – new operational structure: from 12 federations to NGIs and a framework for collaboration with integrated infrastructures
  – expansion of the resource infrastructure and utilization: +25% sites, +84% jobs run
