
Slide 1

GridPP Presentation to PPARC e-Science Committee
31 May 2001

Steve Lloyd
Tony Doyle

Slide 2

GridPP History

Collaboration formed by all UK PP Experimental Groups in 1999 to submit £5.9M JIF bid for Prototype Tier-1 centre at RAL (later withdrawn)

Added some Tier-2 support to become part of the PPARC LTSR ("The LHC Computing Challenge") input to SR2000

Formed GridPP in Dec 2000, which included CERN, CLRC and the UK PP Theory Groups

From Jan 2001 handling PPARC’s commitment to EU DataGrid


Slide 3

Physics Drivers

Addresses one of PPARC's highest-priority science programmes. The LHC experiments are the key to understanding the origin of mass (the Higgs?), supersymmetry, CP violation, the quark-gluon plasma and possible new phenomena, e.g. extra dimensions

Maximise return from substantial UK investment in LHC Detectors

Slide 4

The LHC Computing Challenge

The problem is Huge:
• Total data/year from one experiment: ~1 to 8 PB (petabytes = 10^15 bytes)
• Estimated total requirements:
  - ~8M SI-95 of CPU power (~200,000 1 GHz PCs)
  - ~28 PB of 'tape' storage
  - ~10 PB of disk storage

The problem is Complex:
• >10^8 electronic channels are read out for each event
• The LHC will produce 8x10^8 pp interactions per second
• The Higgs rate, for example, is expected to be 2x10^-4 per second
  - a 2x10^-4 needle in an 8x10^8 haystack

A distributed solution is needed to maximise the use of facilities and resources worldwide
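As a rough consistency check, the scales above can be reproduced with a few lines of Python; the recorded event rate, event size and live time are illustrative assumptions (not figures from this slide), chosen to show how a volume of order 1 PB/year arises:

    # Signal-to-background scale (figures from this slide)
    interaction_rate = 8e8   # pp interactions per second
    higgs_rate = 2e-4        # expected Higgs events per second
    print(f"signal fraction: {higgs_rate / interaction_rate:.1e}")  # ~2.5e-13

    # Rough annual data volume for one experiment (assumed values)
    recorded_rate_hz = 100   # assumption: events written to storage per second
    event_size_bytes = 1e6   # assumption: ~1 MB per event
    seconds_per_year = 1e7   # assumption: a typical accelerator year
    volume_pb = recorded_rate_hz * event_size_bytes * seconds_per_year / 1e15
    print(f"annual volume: ~{volume_pb:.0f} PB")  # ~1 PB, within the quoted 1-8 PB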

Slide 5

LHC Computing Model

[Diagram: the hierarchical tier model - CERN (Tier 0) at the centre; national Tier-1 centres (UK, France, Italy, NL, Germany, USA FermiLab, USA Brookhaven, ...); Tier-2 regional centres; below them laboratories (Lab a, Lab b, ...) and universities (Uni a, Uni b, ...); physics departments and desktops at the edge.]
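The tier structure in the diagram can be summarised as a simple nested data structure; a minimal sketch, with the Tier-1 list taken from the diagram and the Tier-2 entries as placeholders:

    # Hierarchical LHC computing model: data flows out from CERN (Tier 0)
    # through national Tier-1 centres to regional Tier-2s, labs and desktops.
    lhc_computing_model = {
        "Tier-0": ["CERN"],
        "Tier-1": ["UK", "France", "Italy", "NL", "Germany",
                   "USA (FermiLab)", "USA (Brookhaven)"],
        "Tier-2": ["Lab a", "Uni a", "..."],           # regional centres
        "Edge":   ["Physics Department", "Desktop"],   # end-user resources
    }

    for tier, sites in lhc_computing_model.items():
        print(f"{tier}: {', '.join(sites)}")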

Slide 6

Proposal Executive Summary

• £40M 3-Year Programme
• LHC Computing Challenge = Grid Technology
• Five Components: Foundation, Production, Middleware, Exploitation, Value-added
• Emphasis on Grid Services and Core Middleware
• UK computing strength within CERN
• Integrated with EU DataGrid, PPDG and GriPhyN
• Facilities at CERN, RAL and up to four UK Tier-2 sites
• Centres = Dissemination
• LHC developments integrated into current programme (BaBar, CDF, D0, ...)
• Robust Management Structure
• Deliverables in March 2002, 2003, 2004

Slide 7

GridPP Component Model

Component 1: Foundation
The key infrastructure at CERN and within the UK

Component 2: Production
Built on Foundation to provide an environment for experiments to use

Component 3: Middleware
Connecting the Foundation and Production environments to create a functional Grid

Component 4: Exploitation
The applications necessary to deliver Grid-based Particle Physics

Component 5: Value-Added
Full exploitation of the Grid's potential for Particle Physics

Cumulative cost by component: Foundation £8.5M; + Production £12.7M; + Middleware £17.0M; + Exploitation £21.0M; + Value-Added £25.9M

Slide 8

Major Deliverables

Prototype I - March 2002
• Performance and scalability testing of components
• Testing of the job scheduling and data replication software from the first DataGrid release

Prototype II - March 2003
• Prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors
• Performance testing of LHC applications; distributed HEP and other science application models using the second DataGrid release

Prototype III - March 2004
• Full-scale testing of the LHC computing model with fabric management and Grid management software for Tier-0 and Tier-1 centres, with some Tier-2 components

Slide 9

Financial Summary

Components 1-4           PPARC      External Funds
UK Staff                 £10.7M     £5.9M (EPSRC?)
UK Capital               £3.2M      £4.5M (SRIF?)
CERN Staff               £5.7M
CERN Capital             £1.4M
Total                    £21.0M     £10.3M

[Diagram: LHC Tier-0; LHC Tier-1/BaBar Tier-A; up to 4 Tier-2 centres; Computing Science.]

Slide 10

GridPP Organisation

Software development organised around a number of Workgroups

Hardware development organised around a number of Regional Centres

• Likely Tier-2 Regional Centres

• Focus for Dissemination and Collaboration with other disciplines and Industry

• Clear mapping onto Core Regional e-Science Centres

Slide 11

GridPP Workgroups

A - Workload Management

Provision of software that schedules application processing requests amongst resources (a toy sketch appears below)

B - Information Services and Data Management

Provision of software tools to provide flexible, transparent and reliable access to the data

C - Monitoring Services

All aspects of monitoring Grid services

D - Fabric Management and Mass Storage

Integration of heterogeneous resources into a common Grid framework

E - Security

Security mechanisms from Certification Authorities to low level components

F - Networking

Network fabric provision through to integration of network services into middleware

G - Prototype Grid

Implementation of a UK Grid prototype tying together new and existing facilities

H - Software Support

Provide services to enable the development, testing and deployment of middleware and applications at institutes

I - Experimental Objectives

Responsible for ensuring development of GridPP is driven by needs of UK PP experiments

J - Dissemination

Ensure good dissemination of developments arising from GridPP into other communities and vice versa

Technical work broken down into several workgroups - broad overlap with EU DataGrid
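As a toy illustration of Workgroup A's task (scheduling processing requests amongst resources), a greedy matcher in Python; the site names, capacities and the policy are hypothetical, and real Grid resource brokers are far more elaborate:

    from dataclasses import dataclass

    @dataclass
    class Resource:
        name: str
        cpus_free: int

    @dataclass
    class Job:
        name: str
        cpus_needed: int

    def schedule(jobs, resources):
        """Greedily assign each job to the resource with the most free CPUs."""
        plan = {}
        for job in jobs:
            best = max(resources, key=lambda r: r.cpus_free)
            if best.cpus_free >= job.cpus_needed:
                best.cpus_free -= job.cpus_needed
                plan[job.name] = best.name
            else:
                plan[job.name] = None   # no capacity anywhere: job stays queued
        return plan

    print(schedule([Job("reconstruction", 4), Job("monte-carlo", 8)],
                   [Resource("RAL", 16), Resource("CERN", 8)]))
    # {'reconstruction': 'RAL', 'monte-carlo': 'RAL'}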

Slide 12

GridPP and CERN

UK involvement through GridPP will boost CERN investment in key areas:
- Fabric management software
- Grid security
- Grid data management
- Networking
- Adaptation of physics applications
- Computer Centre fabric (Tier-0)

For the UK to exploit the LHC to the full, substantial investment at CERN is required to support LHC computing.

Slide 13

GridPP and CERN

This investment will:

• Allow operation of a production-quality prototype of the distributed model prior to acquisition of the final LHC configuration

• Train staff for the management and operation of distributed computing centres

• Provide an excellent training ground for young people

• Enable the technology to be re-used by other sciences and industry

Slide 14

Staff at CERN

• An integrated part of the CERN LHC activity
• Staff assigned to CERN teams with operational and development responsibilities
  - Each team responsible for LHC development and prototyping as well as for operating current services
  - Developers need hands-on operational experience
  - Ensures that CERN experience is fully utilised
• Formal LHC Computing Project structure being defined, to ensure an overseeing role by the funding bodies - to be agreed with CERN Council

The proposal is that staff are hired by UK Universities or Laboratories and sent on long-term mission to CERN, employed as CERN Associates in the IT Division.

Slide 15

Hardware at CERN

CERN Prototype - Planned Capacity

                                        2001     2002     2003     2004
Processor farm
  No. of 2-CPU systems installed        200      400      800      1,200
  Estimated total capacity (SI95)       13,000   33,000   85,000   158,600
  GridPP contribution (SI95)            -        5,000    15,000   33,000
Disk storage
  No. of disks installed                200      400      800      1,600
  Estimated total capacity (TB)         16       44       120      320
  GridPP contribution (TB)              -        12       36       86
Tape drives
  Total capacity (achievable MB/sec)    200      350      500      800
Automated media
  Total capacity (TB)                   30       120      350      600
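GridPP's share of the prototype follows directly from the table; a small Python check using the values above:

    # GridPP's fractional contribution to the CERN prototype by year
    cpu_total   = {2002: 33_000, 2003: 85_000, 2004: 158_600}   # SI95
    cpu_gridpp  = {2002: 5_000,  2003: 15_000, 2004: 33_000}
    disk_total  = {2002: 44, 2003: 120, 2004: 320}              # TB
    disk_gridpp = {2002: 12, 2003: 36,  2004: 86}

    for year in (2002, 2003, 2004):
        print(year,
              f"CPU {cpu_gridpp[year] / cpu_total[year]:.0%},",
              f"disk {disk_gridpp[year] / disk_total[year]:.0%}")
    # GridPP supplies roughly 15-21% of the CPU and ~27-30% of the disk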

Slide 16

GridPP Management Structure

Slide 17

Management Status

The Project Management Board (PMB)

The executive board, chaired by the Project Leader - a Project Leader is being appointed; a Shadow Board is in operation

The Collaboration Board (CB)

The governing body of the project - consists of the Group Leaders of all Institutes - established, and the Collaboration Board Chair elected

The Technical Board (TB)

The main working forum chaired by the Technical Board Chair - interim task force in place

The Experiments Board (EB)

The forum for experimental input into the project - nominations from experiments underway

The Peer Review Selection Committee (PRSC)

Pending approval of Project

The Dissemination Board (DB)

Pending approval of Project

Slide 18

Information Flow

Slide 19

Meetings Schedule

Open Meetings
  Technical: monthly (12 per year)
  Collaboration: 2-day meetings, three times per year

Board Meetings
  Project (PMB): monthly
  Technical (TB): quarterly
  Experiments (EB): three times per year
  Collaboration (CB): twice per year
  Dissemination (DB): twice per year
  Peer Review (PRSC): once per year
  PPARC e-Science?: once per year

Quarterly reporting to the EU; yearly reporting to PPARC?

Slide 20

GridPP Collaboration Meeting

1st GridPP Collaboration Meeting - Coseners House - May 24/25 2001

Slide 21

UK Strengths

Wish to build on UK strengths:
• Information Services
• Networking - world leaders in monitoring
• Security
• Mass storage

Major UK Grid leadership roles:
• Lead of the DataGrid Architecture Task Force (Steve Fisher)
• Lead of DataGrid WP3 Information Services (Robin Middleton)
• Lead of DataGrid WP5 Mass Storage (John Gordon)

• ATLAS Software Coordinator (Norman McCubbin)• LHCb Grid Coordinator (Frank Harris)

Strong UK collaboration with Globus:
• Globus people gave a 2-day tutorial at RAL to the PP community
• Carl Kesselman attended a UK Grid technical meeting
• 3 UK people visited Globus at Argonne

Natural UK Collaboration with US PPDG and GriPhyN

Slide 22

Funding Requirements

Full exploitation requires £25.9M from PPARC plus £11.6M external funds

Minimum programme requires £21.0M from PPARC plus £10.3M external funds

Our proposal profiled this as:

                 2001/2    2002/3    2003/4
Proposal         £3.91M    £8.43M    £8.64M
PPARC profile    £3.0M     £8.0M     £15M

Profiling is driven by:

Hardware: immediate need (buy now) vs Moore's law (buy later) - UK flat, CERN rising

Manpower: immediate need, plus the requirement for 3-year positions (hire now), vs availability (spread out) - both the UK and CERN want to front-load

This does not match PPARC's funding profile.
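The hardware side of this trade-off can be illustrated numerically; a sketch assuming price/performance halves every 18 months (one common reading of Moore's law) and a hypothetical 2001 unit price:

    # Cost of buying a fixed capacity now versus later under Moore's law
    target_si95 = 33_000    # e.g. GridPP's 2004 CPU contribution at CERN
    price_2001 = 30.0       # hypothetical £ per SI95 unit in 2001

    def unit_price(year):
        """Unit price (year - 2001) years on, halving every 1.5 years."""
        return price_2001 * 0.5 ** ((year - 2001) / 1.5)

    for year in (2001, 2002, 2003, 2004):
        print(year, f"£{target_si95 * unit_price(year) / 1e6:.2f}M")
    # The same capacity costs ~4x less in 2004 than in 2001 - but immediate
    # commitments (EU DataGrid testbed, BaBar) argue for buying some now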

Slide 23

Proposal Profile

£M                                      Oct-01    Apr-02    Apr-03    TOTAL
                                        -Apr-02   -Apr-03   -Apr-04
Assumed funding profile                 3.91      8.43      8.64      20.98
UK operational costs + misc             0.12      0.24      0.24      0.60
UK hardware                             0.98      0.76      0.86      2.60
UK staff for EU DataGrid commitment     0.80      0.80      0.80      2.40
UK staff for GridPP development         0.83      3.53      3.95      8.30
CERN staff                              0.80      2.63      2.23      5.67
CERN hardware                           0.38      0.47      0.56      1.42
Total                                   3.91      8.43      8.64      20.98

FTEs
EU-funded DataGrid staff                5.00      5.00      5.00      15.0
PPARC-funded DataGrid staff             14.81     14.81     14.81     44.4
PPARC-funded GridPP staff               15.30     65.28     73.06     153.6
PPARC-funded CERN staff                 12.00     39.50     33.50     85.0

Unit costs: UK staff £0.054M per FTE-year (including overheads and PC); CERN staff £0.067M per FTE-year (including travel). CERN subtotal (staff + hardware): £7.08M.
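The staff lines in the table are consistent with these unit costs (total FTE-years multiplied by the cost per FTE-year); a quick Python check:

    # Unit costs: UK £0.054M/FTE-year, CERN £0.067M/FTE-year (from this slide)
    uk_rate, cern_rate = 0.054, 0.067
    print(f"UK EU DataGrid staff: {44.4 * uk_rate:.2f}  (table: 2.40)")
    print(f"UK GridPP staff:      {153.6 * uk_rate:.2f}  (table: 8.30)")
    print(f"CERN staff:           {85.0 * cern_rate:.2f}  (table: 5.67)")
    # 2.40, 8.29 and 5.70 - matching the table to within rounding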

Slide 24

Reprofiling

£M                                      Oct-01    Apr-02    Apr-03    TOTAL
                                        -Apr-02   -Apr-03   -Apr-04
Assumed funding profile                 2.50      7.00      11.48     20.98
UK operational costs + misc             0.15      0.22      0.23      0.60
UK hardware                             0.75      0.75      1.10      2.60
UK staff for EU DataGrid commitment     0.40      1.00      1.00      2.40
UK staff for GridPP development         0.40      1.85      6.06      8.30
CERN staff                              0.50      2.63      2.53      5.67
CERN hardware                           0.30      0.55      0.56      1.41
Total                                   2.50      7.00      11.48     20.98

FTEs
EU-funded DataGrid staff                5.00      5.00      5.00      15.0
PPARC-funded DataGrid staff             7.41      18.52     18.52     44.4
PPARC-funded GridPP staff               7.41      34.20     112.13    153.7
PPARC-funded CERN staff                 7.50      39.50     38.00     85.0

Unit costs: UK staff £0.054M per FTE-year (including overheads and PC); CERN staff £0.067M per FTE-year (including travel). CERN subtotal (staff + hardware): £7.08M.

One attempt to match the PPARC profile - but too many staff would be hired in the 3rd year (for just 1 year!)

Slide 25

First Year Deliverables

Each Workgroup has detailed deliverables. These will be refined each year and will build on the successes of the previous year.

The Global Objectives for the first year are:
• Deliver EU DataGrid middleware (first prototype [M9])
• Running experiments integrate their data management systems into existing facilities (e.g. mass storage)
• Assess technological and sociological Grid analysis needs
• Experiments refine data models for analyses
• Develop tools to allow bulk data transfer
• Assess and implement metadata definitions
• Develop relationships across multi-Tier structures and countries
• Integrate Monte Carlo production tools
• Provide experimental software installation kits
• LHC experiments start Data Challenges
• Feed back assessment of middleware tools

Slide 26

External Resources

External funds (additional to PPARC Grants and central facilities) have provided computing equipment for several experiments and institutes

All these Resources will contribute directly to GridPP

• BaBar (Birmingham, Bristol, Brunel, Edinburgh, Imperial, Liverpool, Manchester, QMUL, RAL, RHUL): £0.8M (JREI) + £1.0M (JIF)
• MAP (Liverpool): £0.3M (JREI)
• ScotGrid (Edinburgh, Glasgow): £0.8M (JREI)
• D0 (Lancaster): £0.4M (JREI) + £0.1M (Univ)
• Dark Matter (Sheffield): £0.03M (JIF)
• CDF/MINOS (Glasgow, Liverpool, Oxford, UCL): £1.7M (JIF)
• CMS (Imperial): £0.15M (JREI)
• ALICE (Birmingham): £0.15M (JREI)

Total: £5.4M

Many Particle Physics groups are involved in large SRIF bids in collaboration with other disciplines, mostly to form e-Science centres. The amount of resource available to GridPP from this SRIF round could be several £M

Slide 27

First Year Priorities

• Funding of PPARC's EU DataGrid staff commitment
• Staff to implement the initial Grid testbed in the UK
• Hardware to satisfy BaBar requirements and the EU DataGrid testbed commitment
• Staff for CERN LHC computing
• Contribution to CERN Tier-0

Costs:
• 15 EU DataGrid 3-year posts (already committed): £0.4M
• 15 GridPP 3-year posts: £0.4M
• Hardware for BaBar Tier-A/LHC Tier-1: £0.9M
• 15 CERN 3-year posts: £0.5M
• Hardware for CERN Tier-0: £0.3M

Total: £2.5M

(Staff costs assume only 6 months of salary in first year)
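Those per-post figures follow from the unit costs quoted on the Proposal Profile slide, with half a year of salary; a quick check:

    # 15 posts x cost per FTE-year x 0.5 years of salary
    print(f"15 UK posts:   £{15 * 0.054 * 0.5:.1f}M")   # ~£0.4M
    print(f"15 CERN posts: £{15 * 0.067 * 0.5:.1f}M")   # ~£0.5M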

Minimum viable programme to meet commitments

Slide 28

Summary

• Have been working towards this project for ~2 years, building up hardware
• Balanced exploitation programme costing £21M
• Will put PPARC and the UK at the forefront of Grid development in Europe
• Funds installation and operation of experimental testbeds, key infrastructure, generic middleware, and making application code Grid-aware
  - Does NOT fund physics analysis or experiment-specific algorithms
• Does not match well with PPARC's funding profile
  - A mechanism for moving some money forward from the third year needs to be found
• Requires substantial investment NOW
• £2.5M required this year to kick-start the programme