EGI-InSPIRE SA3 “Heavy User Communities”
Past, Present & Future
[email protected]

TRANSCRIPT

Page 1: EGI-InSPIRE SA3 “Heavy User Communities”

www.egi.eu EGI-InSPIRE RI-261323

EGI-InSPIRE

EGI-InSPIRE SA3 “Heavy User Communities”

Past, Present & Future
[email protected]

Page 2: EGI-InSPIRE SA3 “Heavy User Communities”

EGI InSPIRE SA3: Status & Plans

[email protected]

WLCG Grid Deployment Board, June 2010

Page 3: EGI-InSPIRE SA3 “Heavy User Communities”

The EGI-InSPIRE Project: Integrated Sustainable Pan-European Infrastructure for Researchers in Europe

• A proposal for an FP7 project – work in progress..., i.e. this may all change!

• Targeting call objectives:
  – 1.2.1.1: European Grid Initiative
  – 1.2.1.2: Service deployment for Heavy Users

• Targeting a 3 year project (this did change!)
• Seeking a total 25M€ EC contribution

Slides from S. Newhouse

Page 4: EGI-InSPIRE SA3 “Heavy User Communities”


PY1-PY2 Trend

[Chart: normalized CPU wall clock hours, PY1 vs PY2; II. Resource infrastructure]

SA1 and JRA1 - June 2012

Page 5: EGI-InSPIRE SA3 “Heavy User Communities”


CPU Usage: PY2 Metrics, Value (yearly increase)

CPU wall clock time: total normalized CPU wall clock time consumed (billion HEP-SPEC06 hours): 10.5 (+52.91%)

Jobs: jobs/year (million): 492.5 (+46.42%); PY2 target: 334.8 (+47.10%)
Average jobs/day (million): 1.35

% of total normalized CPU wall time consumed:
High-Energy Physics: 93.60% (+48.82%)
Astronomy and Astrophysics: 2.25% (+117.79%)
Life Sciences: 1.30% (+1.97%) (HEP+AA+LS = 97.14%)
Various disciplines: 1.23% (+20.86%)
Remaining disciplines: 1.62%

II. Resource infrastructure

SA1 and JRA1 - June 2012
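As a sanity check on the figures above, the average jobs/day follows directly from the jobs/year value, and the quoted yearly increase implies a PY1 baseline. This is a back-of-the-envelope verification, not data from the slides:

```python
# Values quoted on the slide.
jobs_per_year_million = 492.5
yearly_increase = 0.4642  # +46.42%

# Average jobs per day implied by the yearly total.
avg_jobs_per_day_million = jobs_per_year_million / 365
print(round(avg_jobs_per_day_million, 2))  # 1.35, matching the slide

# PY1 baseline implied by the +46.42% increase (not stated on the slide).
py1_jobs_million = jobs_per_year_million / (1 + yearly_increase)
print(round(py1_jobs_million, 1))
```

The 1.35 million jobs/day figure is consistent with 492.5 million jobs/year, so the two slide numbers agree.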

Page 6: EGI-InSPIRE SA3 “Heavy User Communities”


Communities & Activities

High Energy Physics (TSA3.3)
The LHC experiments use grid computing for data distribution, processing and analysis. Strong focus on common tools and solutions. Areas supported include Data Management, Data Analysis and Monitoring. Main VOs: ALICE, ATLAS, CMS, LHCb, but covers many other HEP experiments and related projects.

Life Sciences (TSA3.4)
Focuses on the medical, biomedical and bioinformatics sectors to connect worldwide laboratories, share resources and ease access to data in a secure and confidential way. Supports 5 VOs (biomed, lsgri, vlemed, pneumogrid, medigrid) across 6 NGIs via the Life Science Grid Community.

Astronomy & Astrophysics (TSA3.5)
Covers the European Extremely Large Telescope (E-ELT), the Square Kilometre Array (SKA), the Cherenkov Telescope Array (CTA) and others. Activities focus on visualisation tools and database/catalogue access from the grid. Main VOs: Argo, Auger, Glast, Magic, Planck, CTA, plus others (23 in total) across 7 NGIs.

Earth Sciences (TSA3.6)
A large variety of ES disciplines. Also provides grid access to resources within the Ground European Network for Earth Science Interoperations - Digital Earth Community (GENESI-DEC), and assists scientists working on climate change via the Climate-G testbed. Main VOs: esr, egeode, climate-g, env.see-grid-sci.eu, meteo.see-grid-sci.eu, seismo.see-grid-sci.eu; supported by ~20 NGIs.

These and other communities are supported by shared tools & services.

EGI-InSPIRE Review 2012


Page 8: EGI-InSPIRE SA3 “Heavy User Communities”


SA3 Overview


WP     Task     Beneficiary  Total PMs
WP6-G  TSA3.1   CERN          18
WP6-G  TSA3.2   ARNES          3
WP6-G  TSA3.2   CERN         120
WP6-G  TSA3.2   CNRS          30
WP6-G  TSA3.2   CSC           18
WP6-G  TSA3.2   CSIC          45
WP6-G  TSA3.2   CYFRONET       6
WP6-G  TSA3.2   EMBL          15
WP6-G  TSA3.2   INFN          36
WP6-G  TSA3.2   TCD           21
WP6-G  TSA3.2   UI SAV        18
       Sub-total (TSA3.2)    312
WP6-G  TSA3.3   INFN          60
WP6-G  TSA3.3   CERN         203
       Sub-total (TSA3.3)    263
WP6-G  TSA3.4   CNRS          53
WP6-G  TSA3.4   EMBL          22
       Sub-total (TSA3.4)     75
WP6-G  TSA3.5   INFN          30
WP6-G  TSA3.6   KIT-G         27
       Sub-total (TSA3.5+6)   57
       CERN total            341
       TOTAL                 725

[Chart legend: CERN, France, Slovenia, Slovakia, Italy, Spain, Finland, Poland, EMBL, Ireland, Germany]

[Effort by activity: NA1 4%, NA2 23%, SA1 56%, SA2 5%, SA3 8%, JRA1 3%]

SA3 Effort

9 Countries, 11 Beneficiaries, 725 PMs, 20.1 FTEs

EGI-InSPIRE Review 2012

Page 9: EGI-InSPIRE SA3 “Heavy User Communities”


SA3 Objectives

Transition to sustainable support:

• Identify tools of benefit to multiple communities
  – Migrate these as part of the core infrastructure
• Establish support models for those relevant to individual communities

EGI-InSPIRE Review 2012

Page 10: EGI-InSPIRE SA3 “Heavy User Communities”


Achievements in Context

• As an explicit example, we use the case of HEP / support for WLCG

• The 3 phases of EGEE (I/II/III) overlapped almost exactly with the final preparations for LHC data taking:
  – WLCG Service Challenges 1-4, CCRC’08, STEP’09

• EGI-InSPIRE SA3 covered virtually all of the initial data-taking run (3.5 TeV/beam) of the LHC: first data taking and discoveries!

• The transition from EGEE to EGI was non-disruptive
• Continuous service improvement has been demonstrated
• Problems encountered during initial data taking were rapidly solved
• Significant progress in the identification and delivery of common solutions
• Active participation in the definition of the future evolution of WLCG

EGI-InSPIRE Review 2012

Page 11: EGI-InSPIRE SA3 “Heavy User Communities”


WLCG Service Incidents

Scale Test

EGI-InSPIRE Review 2012

These are significant service incidents with respect to targets defined in the WLCG MoU. They basically mean major disruption to data taking, distribution, processing or analysis. A Service Incident Report is required.

Page 12: EGI-InSPIRE SA3 “Heavy User Communities”


WLCG Service Incidents

Scale Test

Start of Data Taking

EGI-InSPIRE Review 2012

Page 13: EGI-InSPIRE SA3 “Heavy User Communities”


Resolution of Incidents

Data taking

Incidents

EGI-InSPIRE Review 2012

Page 14: EGI-InSPIRE SA3 “Heavy User Communities”


Services for HEP: Activity and PY2 Results

Distributed Analysis: Common Analysis Framework study for ATLAS and CMS initiated; first stage successfully completed (May 2012); next phase launched (Sep 2012)

Data Management: Dynamic caching / data popularity, a move away from static data placement: common solutions deployed; others under development

Persistency Framework: Handles the event and detector conditions data from the experiments

Monitoring / Dashboards: All aspects of production and analysis: additional common solutions deployed

Task Leader: Maria Girone

EGI-InSPIRE Review 2012

Focus on Common Solutions Across (all) VOs

Page 15: EGI-InSPIRE SA3 “Heavy User Communities”

Experiment Support

CERN IT Department
CH-1211 Geneva 23, Switzerland
www.cern.ch/it

DBES

The Common Solutions Strategy of the Experiment Support group at CERN for

the LHC Experiments

Maria Girone, CERN
On behalf of the CERN IT-ES Group

CHEP, New York City, May 2012

Page 16: EGI-InSPIRE SA3 “Heavy User Communities”

Motivation

Maria Girone, CERN

• Despite their differences as experiments at the LHC, from a computing perspective a lot of the workflows are similar and can be done with common services

• While the collaborations are huge and highly distributed, the effort available for ICT development is limited and decreasing – effort is focused on analysis and physics

• Common solutions are a more efficient use of effort and more sustainable in the long run

Page 17: EGI-InSPIRE SA3 “Heavy User Communities”


Anatomy of a Common Solution

• Most common solutions can be diagrammed as the interface layer between common infrastructure elements and the truly experiment-specific components
  – One of the successes of the grid deployment has been the use of common grid interfaces and local site service interfaces
  – The experiments have environments and techniques that are unique
  – In common solutions we target the box in between. A lot of effort is spent in these layers, and there are big savings of effort in commonality
    • Not necessarily in implementation, but in approach & architecture
  – The LHC schedule presents a good opportunity for technology changes

Maria Girone, CERN
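The layering described above can be sketched in code: a common service is written once against an abstract adapter interface, and each experiment supplies only its own mapping. All class names, site names and the mapping rule below are hypothetical, for illustration only:

```python
from abc import ABC, abstractmethod

class ExperimentAdapter(ABC):
    """Experiment-specific piece: translates experiment concepts
    into the common vocabulary the shared layer understands."""
    @abstractmethod
    def site_name(self, native_name: str) -> str: ...

class CommonService:
    """Common layer: written once, reused by every experiment
    that provides an adapter."""
    def __init__(self, adapter: ExperimentAdapter):
        self.adapter = adapter

    def report(self, native_site: str) -> str:
        return f"status check for {self.adapter.site_name(native_site)}"

class AtlasAdapter(ExperimentAdapter):
    def site_name(self, native_name: str) -> str:
        # Hypothetical mapping: strip an experiment-specific prefix.
        return native_name.replace("ATLAS_", "")

svc = CommonService(AtlasAdapter())
print(svc.report("ATLAS_CERN-PROD"))  # status check for CERN-PROD
```

The design choice mirrors the slide: the approach and architecture are shared even where the per-experiment implementation behind the adapter differs.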

[Diagram: Experiment Specific Elements → higher-level services that translate between → Common Infrastructure Components and Interfaces]

Page 18: EGI-InSPIRE SA3 “Heavy User Communities”

The Group

• IT-ES is a unique resource in WLCG
  – The group is currently supported with substantial EGI-InSPIRE project effort
  – Careful balance of effort embedded in the experiments & on common solutions
  – Development of expertise in experiment systems & across experiment boundaries
  – People uniquely qualified to identify and implement common solutions
• Matches well with the EGI-InSPIRE mandate of developing sustainable solutions
• A strong and enthusiastic team

Maria Girone, CERN

EGI-InSPIRE INFSO-RI-261323

Page 19: EGI-InSPIRE SA3 “Heavy User Communities”

Activities

• Monitoring and Experiment Dashboards
  – Allows experiments and sites to monitor and track their production and analysis activities across the grid
  – Including services for data popularity, data cleaning, data integrity and site stress testing
• Distributed Production and Analysis
  – Design and development for experiment workload management and analysis components
• Data Management support
  – Covers development and integration of the experiment-specific and shared grid middleware
• The LCG Persistency Framework
  – Handles the event and detector conditions data from the experiments

Maria Girone, CERN

Page 20: EGI-InSPIRE SA3 “Heavy User Communities”


Achievements in Context

SA3 has fostered and developed cross-VO and cross-community solutions beyond what had previously been achieved: a benefit of a multi-community work package.

The production use of the grid at the petascale and “Terra”scale has been fully and smoothly achieved: a benefit of many years of grid funding.

EGI-InSPIRE Review 2012

Page 21: EGI-InSPIRE SA3 “Heavy User Communities”


Reviewers’ Comments

• “In view of the recent news from CERN, it can easily be seen that the objectives of WP6 (=SA3) for the current period have not only been achieved but exceeded. Technically, the work carried out in WP6 is well managed and is of a consistently high quality, meeting the goals, milestones and objectives described in the DoW.” [ etc. ]

EGI-InSPIRE Review 2012

Page 22: EGI-InSPIRE SA3 “Heavy User Communities”


LHC Timeline

EGI-InSPIRE Review 2012

Page 23: EGI-InSPIRE SA3 “Heavy User Communities”


EGI-InSPIRE


FUTURE OUTLOOK

Page 24: EGI-InSPIRE SA3 “Heavy User Communities”


Sustainability Statements

EGI-InSPIRE D6.8 Draft

Tool / Package: Implementation of Sustainable Support

Persistency Framework: POOL component maintained by the experiments; COOL and CORAL by CERN-IT and the experiments.

Data Analysis Tools: Proof-of-concept and prototype developed partially using EGI-InSPIRE resources. The production system, if approved, is to be resourced by key sites (e.g. CERN, FNAL, ...) plus the experiments. The development of this system is in any case outside the scope of EGI-InSPIRE SA3.

Data Management Tools: Released to production early in PY3. Long-term support taken over by the PH department at CERN (outside SA3 scope).

Ganga: CERN’s involvement in Ganga-core will cease some months after EGI-InSPIRE SA3 terminates and will be picked up by the remainder of the Ganga project (various universities and experiments). [Ganga allowed us to get other project effort at low cost.]

Experiment Dashboard: All key functionality has been delivered to production before or during PY3. Long-term support guaranteed through CERN-Russia and CERN-India agreements, in conjunction with other monitoring efforts within CERN-IT.

Page 25: EGI-InSPIRE SA3 “Heavy User Communities”


SA3 – Departures


(Each line: departure date, name, group, start date)

30.04.2013 MASCHERONI Marco IT-ES-VOS 01.07.2011

30.04.2013 TRENTADUE Raffaello IT-ES-VOS 01.07.2010

30.04.2013 GIORDANO Domenico IT-ES-VOS 01.05.2011

30.04.2013 CINQUILLI Mattia IT-ES-VOS 01.09.2010

30.04.2013 NEGRI Guido IT-ES-VOS 01.07.2011

30.04.2013 LANCIOTTI Elisa IT-ES-VOS 01.10.2010

30.04.2013 KARAVAKIS Edouardos IT-ES-DNG 01.07.2010

30.04.2013 KENYON Michael John IT-ES-DNG 01.11.2010

30.04.2013 BARREIRO MEGINO Fernando Harald IT-ES-VOS 01.06.2010

30.04.2013 DENIS Marek Kamil IT-ES-VOS 01.09.2012

30.04.2013 KUCHARCZYK Katarzyna IT-ES-VOS 01.10.2012

Page 26: EGI-InSPIRE SA3 “Heavy User Communities”


FP8 / Horizon 2020

• Expect first calls in 2013, with funding from late 2013 / early 2014

• IMHO, calls relating to data management and/or data preservation plus specific disciplines (e.g. LS) are likely

• Will we be part of these projects?
  – Actively pursuing leads now with this objective
  – This will not solve the problems directly related to experiment support, nor address “the gap”

• EU projects need not have a high overhead!


Page 27: EGI-InSPIRE SA3 “Heavy User Communities”


Summary

• EGI-InSPIRE SA3 has provided support for many disciplines – the key grid communities at the end of EGEE III

• It has played a key role in the overall support provided to experiments by IT-ES

• All of the main grid communities will be affected by the end of the work package

• The “sustainability plans” are documented in D6.8, due January 2013

• Expect no miracles


Page 28: EGI-InSPIRE SA3 “Heavy User Communities”


EGI-InSPIRE


BACKUP

Page 29: EGI-InSPIRE SA3 “Heavy User Communities”

Examples: Data Popularity

• Experiments want to know which datasets are used, how much, and by whom
  – Good chance of a common solution
• Data popularity uses the fact that all experiments open files and access storage
• The monitoring information can be accessed in a common way using generic and common plug-ins
• The experiments have systems that identify how those files are mapped onto logical objects like datasets, reprocessing and simulation campaigns

Maria Girone, CERN

[Diagram: file opens and reads feed generic monitoring of files accessed, users and CPU used; experiment booking systems provide the mapping of files to datasets]
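A minimal sketch of the idea, with invented file names and an invented file-to-dataset mapping: generic monitoring yields per-file open events, and the experiment-specific mapping rolls them up into dataset-level popularity.

```python
from collections import Counter

# Raw monitoring events: (file, user, cpu_hours). Illustrative data only.
file_opens = [
    ("f1.root", "alice", 2.0),
    ("f2.root", "bob",   1.5),
    ("f1.root", "carol", 0.5),
]

# Experiment-specific knowledge: which dataset each file belongs to.
file_to_dataset = {"f1.root": "dsA", "f2.root": "dsB"}

def dataset_popularity(events, mapping):
    """Count accesses per dataset from per-file open events."""
    pop = Counter()
    for fname, _user, _cpu in events:
        pop[mapping[fname]] += 1
    return pop

print(dataset_popularity(file_opens, file_to_dataset))
```

The monitoring side (events) is common to all experiments; only the mapping dictionary would differ per experiment, which is exactly the split the slide describes.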

Page 30: EGI-InSPIRE SA3 “Heavy User Communities”


Popularity Service

• Used by the experiments to assess the importance of computing processing work, and to decide when the number of replicas of a sample needs to be adjusted either up or down

Maria Girone, CERN

See D. Giordano et al., [176] Implementing data placement strategies for the CMS experiment based on a popularity model
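The kind of decision described here (adjusting the number of replicas of a sample up or down based on popularity) can be sketched as a toy policy; the thresholds and function are invented for illustration and are not the actual CMS placement model:

```python
def target_replicas(accesses_last_90d: int, current: int) -> int:
    """Hypothetical placement policy: scale the replica count of a
    sample with its recent popularity, never dropping below one copy."""
    if accesses_last_90d == 0:
        return max(1, current - 1)   # cold sample: shed a replica
    if accesses_last_90d > 1000:
        return current + 1           # hot sample: add a replica
    return current                   # otherwise leave placement alone

print(target_replicas(0, 3))     # 2: unused sample loses a replica
print(target_replicas(5000, 3))  # 4: popular sample gains one
print(target_replicas(100, 3))   # 3: unchanged
```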

Page 31: EGI-InSPIRE SA3 “Heavy User Communities”


Cleaning Service

• The Site Cleaning Agent is used to suggest obsolete or unused data that can be safely deleted without affecting analysis
• The information about space usage is taken from the experiment’s dedicated data management and transfer system

Maria Girone, CERN
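A toy sketch of what such an agent computes, assuming an invented last-access catalogue and an invented idle-time threshold (the real agent's policy is not described in this detail here):

```python
import datetime

def cleaning_candidates(datasets, now, idle_days=180):
    """Suggest datasets whose last access is older than idle_days.
    `datasets` maps dataset name -> last-access datetime."""
    cutoff = now - datetime.timedelta(days=idle_days)
    return sorted(name for name, last_access in datasets.items()
                  if last_access < cutoff)

now = datetime.datetime(2012, 6, 1)
datasets = {
    "dsA": datetime.datetime(2012, 5, 20),  # recently used: keep
    "dsB": datetime.datetime(2011, 1, 1),   # stale: candidate
}
print(cleaning_candidates(datasets, now))  # ['dsB']
```

The agent only *suggests* deletions; as the slide notes, nothing is removed without confirming that analysis is unaffected.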

Page 32: EGI-InSPIRE SA3 “Heavy User Communities”


D. Tuckett et al., [300], Designing and developing portable large-scale JavaScript web applications within the Experiment Dashboard framework

Dashboard Framework and Applications

• Dashboard is one of the original common services
  – All experiments execute jobs and transfer data
  – Dashboard services rely on experiment-specific information for site names, activity mapping and error codes
  – The job monitoring system centrally collects information from workflows about job status and success
• Database, framework and visualization are common

Maria Girone, CERN

[Diagram: job submission & data transfers, plus experiment-specific site and activity information, feed the common framework & visualization]
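The split can be sketched as a common monitoring core that takes experiment-specific configuration as input; the error-code mapping and job ids below are invented for illustration:

```python
from collections import Counter

class Dashboard:
    """Common core: stores job reports and summarizes statuses.
    The error-code mapping is the experiment-specific part."""
    def __init__(self, error_codes):
        self.error_codes = error_codes
        self.jobs = []

    def record(self, job_id, code):
        status = self.error_codes.get(code, "unknown")
        self.jobs.append((job_id, status))

    def summary(self):
        return Counter(status for _jid, status in self.jobs)

# Hypothetical experiment-specific configuration.
cms = Dashboard({0: "success", 8001: "stage-out failure"})
cms.record("j1", 0)
cms.record("j2", 8001)
cms.record("j3", 0)
print(cms.summary())
```

The database, framework and visualization correspond to the shared class; only the injected mappings differ per experiment.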

Page 33: EGI-InSPIRE SA3 “Heavy User Communities”


Site Status Board

• Another example of a good common service
  – Takes specific lower-level checks on the health of common services
  – Combines these with some experiment-specific workflow probes
  – Includes links into the ticketing system
  – Combines everything into a common view

Maria Girone, CERN
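A toy illustration of how such a combined view might be computed; the check names, probe and ticket id below are invented:

```python
def site_view(common_checks, experiment_probes, open_tickets):
    """Combine generic service checks, experiment-specific probes
    and open tickets into one site status."""
    all_ok = all(common_checks.values()) and all(experiment_probes.values())
    status = "OK" if all_ok and not open_tickets else "DEGRADED"
    return {"status": status, "tickets": list(open_tickets)}

view = site_view(
    {"CE": True, "SE": True},        # lower-level service checks
    {"analysis_workflow": False},    # experiment-specific probe fails
    ["GGUS-12345"],                  # hypothetical ticket reference
)
print(view["status"])  # DEGRADED
```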

Page 34: EGI-InSPIRE SA3 “Heavy User Communities”

HammerCloud

• HammerCloud is a common testing framework for ATLAS (PanDA), CMS (CRAB) and LHCb (Dirac)
• Common layer for functional testing of CEs and SEs from a user perspective
• Continuous testing and monitoring of site status and readiness; automatic site exclusion based on defined policies
• Same development, same interface, same infrastructure: less workforce

Maria Girone, CERN

[Diagram: the testing and monitoring framework sits between the distributed analysis frameworks and the computing & storage elements]
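The automatic exclusion policy can be illustrated with a toy function; the efficiency threshold and test window are invented here and are not HammerCloud's actual policy:

```python
def site_usable(recent_results, min_efficiency=0.8, window=20):
    """Hypothetical exclusion policy: blacklist a site whose recent
    functional-test success rate falls below min_efficiency over the
    last `window` tests. Results are 1 (pass) or 0 (fail)."""
    sample = recent_results[-window:]
    if not sample:
        return True  # no data yet: do not exclude
    return sum(sample) / len(sample) >= min_efficiency

print(site_usable([1, 1, 1, 0, 1]))  # True  (80% efficiency)
print(site_usable([1, 0, 0, 0, 1]))  # False (40% efficiency)
```

Running the same policy over every site, for all three experiments, is where the "same development, same infrastructure" saving comes from.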

Page 35: EGI-InSPIRE SA3 “Heavy User Communities”

HammerCloud

D. van der Ster et al. [283], Experience in Grid Site Testing for ATLAS, CMS and LHCb with HammerCloud

Page 36: EGI-InSPIRE SA3 “Heavy User Communities”

New Activities – Analysis Workflow

• Up to now, services have generally focused on monitoring activities
  – All of these are important, and commonality saves effort
  – But they are not normally in the core workflows of the experiment

• Success with the self-contained services has provided confidence to move into core functionality
  – Looking at the analysis workflow

• Feasibility Study for a Common Analysis Framework between ATLAS and CMS

Maria Girone, CERN

[Diagram: experiment-specific data discovery, environment configuration and job splitting feed common job tracking, resubmission and scheduling, then job submission and pilots]

Page 37: EGI-InSPIRE SA3 “Heavy User Communities”

Analysis Workflow Progress

• Looking at ways to make the workflow engine common between the two experiments
  – Improving the sustainability of the central components that interface to low-level services
    • A thick layer that handles prioritization, job tracking and resubmission
  – Maintaining experiment-specific interfaces
    • Job splitting, environment, and data discovery would continue to be experiment specific

Maria Girone, CERN

[Diagram: experiment-specific data discovery, job splitting and packaging of the user environment feed a common layer for job tracking, resubmission and scheduling, then job submission and pilots]
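The division of labour described above can be sketched as a common "thick layer" that owns tracking and resubmission, with the experiment-specific splitting passed in as a callable. Everything here is illustrative, not the actual framework design:

```python
def run_tasks(split, submit, max_retries=2):
    """Common thick layer: iterates jobs from the experiment-specific
    splitter, retries failures, and returns final statuses."""
    statuses = {}
    for job in split():
        attempts = 0
        while True:
            ok = submit(job)
            attempts += 1
            if ok or attempts > max_retries:
                statuses[job] = "done" if ok else "failed"
                break
    return statuses

# Experiment-specific pieces (hypothetical):
def atlas_split():
    return ["job-0", "job-1"]

flaky = {"job-1": 1}  # job-1 fails once, then succeeds
def submit(job):
    if flaky.get(job, 0) > 0:
        flaky[job] -= 1
        return False
    return True

print(run_tasks(atlas_split, submit))
```

Swapping `atlas_split` for a CMS-style splitter would exercise the same common engine, which is the sustainability argument of the slide.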

Page 38: EGI-InSPIRE SA3 “Heavy User Communities”

Proof of Concept Diagram

Maria Girone, CERN

• The Feasibility Study proved that there are no show-stoppers to designing a common analysis framework
• The next step is a proof of concept

Page 39: EGI-InSPIRE SA3 “Heavy User Communities”

Even Further Ahead

• As we move forward, we would also like to assess and document the process
  – This should not be the only common project

• The diagram for data management would look similar
  – A thick layer between the experiment’s logical definitions of datasets and the service that moves files
    • Deals with persistent location information, tracks files in transfer and validates file consistency

• Currently no plans for common services here, but it has the right properties

Maria Girone, CERN

[Diagram: experiment-specific dataset-to-file mapping, a common layer tracking file locations and files in transfer, and the File Transfer Service (FTS)]
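A toy sketch of that data-management thick layer, assuming an invented dataset-to-file catalogue: expand a dataset via the experiment-specific mapping and report which files are not yet confirmed at the destination. This is illustrative only; it is not FTS or any experiment's actual system:

```python
# Hypothetical experiment catalogue: dataset -> list of files.
dataset_files = {"dsA": ["f1", "f2", "f3"]}

def pending_transfers(dataset, completed):
    """Common layer: files of `dataset` not yet confirmed transferred.
    `completed` is the set of files the transfer service reports done."""
    return [f for f in dataset_files[dataset] if f not in completed]

print(pending_transfers("dsA", {"f1"}))  # ['f2', 'f3']
```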

Page 40: EGI-InSPIRE SA3 “Heavy User Communities”

Outlook

IT-ES has a good record of identifying and developing common solutions between the LHC experiments; the setup and expertise of the group have helped.

Several services, focused primarily on monitoring, have been developed and are in production use.

As a result, more ambitious services that would be closer to the experiments’ core workflows are under investigation. The first is a feasibility study and proof of concept of a common analysis framework between ATLAS and CMS.

Both better and more sustainable solutions could result, with lower operational and maintenance costs.

Maria Girone, CERN