
Page 1: Towards a US (and LHC) Grid Environment for HENP Experiments

CHEP 2000 Grid Workshop
Harvey B. Newman, Caltech
Padova, February 12, 2000

Page 2: Data Grid Hierarchy: Integration, Collaboration, Marshal Resources

[Diagram: the LHC Data Grid hierarchy]
Tier 0: Online System; Offline Farm (~20 TIPS); CERN Computer Center
Tier 1: Regional Centers -- Fermilab (~4 TIPS); France, Italy and Germany Regional Centers
Tier 2: Tier2 Centers, ~1 TIPS each
Tier 3: Institute servers, ~0.25 TIPS each, with a physics data cache
Tier 4: Workstations

Link bandwidths shown: ~PBytes/sec at the online system; ~100 MBytes/sec into the offline farm and CERN computer center; ~2.4 Gbits/sec and ~622 Mbits/sec (or air freight) to the regional centers; ~622 Mbits/sec to the Tier2 centers; 100 - 1000 Mbits/sec to the institutes

Bunch crossing every 25 nsec; 100 triggers per second; each event is ~1 MByte in size
Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels; data for these channels should be cached by the institute server

1 TIPS = 25,000 SpecInt95; PC (today) = 10-15 SpecInt95
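A quick arithmetic check on the numbers quoted on this slide (my own back-of-the-envelope figures, not part of the original):

```latex
% Back-of-the-envelope checks on the slide's quoted numbers (illustrative only):
\[
  100\ \mathrm{events/sec} \times 1\ \mathrm{MByte/event} \approx 100\ \mathrm{MBytes/sec}
\]
% which matches the ~100 MBytes/sec link shown into the offline farm, and
\[
  \frac{1\ \mathrm{TIPS}}{1\ \mathrm{PC}} = \frac{25{,}000\ \mathrm{SpecInt95}}{10\text{--}15\ \mathrm{SpecInt95}}
  \approx 1{,}700\text{--}2{,}500\ \mathrm{PCs\ per\ Tier2\ center}
\]
```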

Page 3: To Solve: the LHC "Data Problem"

The proposed LHC computing and data handling will not support FREE access, transport or processing for more than a small part of the data
Balance between proximity to large computational and data handling facilities, and proximity to end users and more local resources for frequently-accessed datasets
Strategies must be studied and prototyped, to ensure both acceptable turnaround times and efficient resource utilisation

Problems to be Explored
How to meet demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
Prioritise hundreds of requests of local and remote communities, consistent with local and regional policies
Ensure that the system is dimensioned/used/managed optimally, for the mixed workload

Page 4: Regional Center Architecture -- Example by I. Gaines (MONARC)

[Diagram: a Tier1 Regional Center]
Inputs/outputs: network from CERN; network from Tier 2 and simulation centers; tapes; connections out to Tier 2 centers, local institutes, and CERN
Core facilities: tape mass storage and disk servers; database servers; physics software development; R&D systems and testbeds; info servers and code servers; web servers and telepresence servers; training, consulting and help desk
Workflows:
Production Reconstruction (Raw/Sim -> ESD): scheduled, predictable; experiment/physics groups
Production Analysis (ESD -> AOD, AOD -> DPD): scheduled; physics groups
Individual Analysis (AOD -> DPD and plots): chaotic; physicists' desktops

Page 5: Grid Services Architecture [*]

[Diagram: four-layer architecture (a minimal sketch of this layering follows below)]
Applications:         HEP data-analysis related applications
Appln Toolkits:       remote viz toolkit; remote comp. toolkit; remote data toolkit; remote sensors toolkit; remote collab. toolkit; ...
Grid Services:        protocols, authentication, policy, resource management, instrumentation, data discovery, etc.
Grid Fabric:          networks, data stores, computers, display devices, etc.; associated local services (local implementations)

[*] Adapted from Ian Foster
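To make the layering concrete, here is a minimal sketch of how an application toolkit might sit on a grid service which in turn sits on fabric resources. The interface and method names are hypothetical illustrations, not Globus or any real grid API.

```java
// Illustrative only: a minimal sketch of the four-layer structure on this slide,
// with hypothetical interface names (not the API of any real toolkit).
import java.util.List;

// Grid Fabric: a local resource with its local service implementation.
interface FabricResource {
    String name();                       // e.g. a data store or compute node
    long availableCapacity();            // local instrumentation hook
}

// Grid Services: common services (discovery, authentication, ...) over the fabric.
interface ResourceDiscoveryService {
    List<FabricResource> discover(String requirement);
}

// Application Toolkit: e.g. a "remote data" toolkit built on the grid services.
final class RemoteDataToolkit {
    private final ResourceDiscoveryService discovery;
    RemoteDataToolkit(ResourceDiscoveryService discovery) { this.discovery = discovery; }

    FabricResource pickStore(long bytesNeeded) {
        // The toolkit hides service-level details from the application layer above it.
        return discovery.discover("data-store").stream()
                .filter(r -> r.availableCapacity() >= bytesNeeded)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no suitable store"));
    }
}

// Applications: HEP data-analysis code sees only the toolkit layer.
```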

Page 6: Grid Hierarchy Goals: Better Resource Use and Faster Turnaround

"Grid" integration and (de facto standard) common services to ease development, operation, management and security

Efficient resource use and improved responsiveness through:
Treatment of the ensemble of site and network resources as an integrated (loosely coupled) system
Resource discovery, query estimation (redirection), co-scheduling, prioritization, local and global allocations
Network and site "instrumentation": performance tracking, monitoring, forward-prediction, problem trapping and handling

Page 7: GriPhyN: First Production Scale "Grid Physics Network"

Develop a New Integrated Distributed System, while Meeting Primary Goals of the US LIGO, SDSS and LHC Programs
Unified GRID System Concept; Hierarchical Structure
~Twenty Centers, with Three Sub-Implementations
5-6 Each in US for LIGO, CMS, ATLAS; 2-3 for SDSS
Emphasis on Training, Mentoring and Remote Collaboration
Focus on LIGO, SDSS (+ BaBar and Run2) handling of real data, and LHC Mock Data Challenges with simulated data
Making the Process of Discovery Accessible to Students Worldwide

GriPhyN Web Site: http://www.phys.ufl.edu/~avery/mre/
White Paper: http://www.phys.ufl.edu/~avery/mre/white_paper.html

Page 8: Grid Development Issues

Integration of applications with Grid Middleware
Performance-oriented user application software architecture is required, to deal with the realities of data access and delivery
Application frameworks must work with system state and policy information ("instructions") from the Grid
O(R)DBMSs must be extended to work across networks
E.g. "invisible" (to the DBMS) data transport, and catalog update

Interfacility cooperation at a new level, across world regions
Agreement on choice and implementation of standard Grid components, services, security and authentication
Interface the common services locally to match with heterogeneous resources, performance levels, and local operational requirements
Accounting and "exchange of value" software to enable cooperation

Page 9: Roles of Projects for HENP Distributed Analysis

RD45, GIOD:   Networked object databases
Clipper/GC:   High speed access to object or file data
FNAL/SAM:     For processing and analysis
SLAC/OOFS:    Distributed file system + Objectivity interface
NILE, Condor: Fault-tolerant distributed computing with heterogeneous CPU resources
MONARC:       LHC computing models: architecture, simulation, strategy, politics
PPDG:         First distributed data services and Data Grid system prototype
ALDAP:        OO database structures and access methods for astrophysics and HENP data
GriPhyN:      Production-scale Data Grid
APOGEE:       Simulation/modeling, application + network instrumentation, system optimization/evaluation

Page 10: ODBMS Tests

DRO WAN tests with CERN: production on CERN's PCSF and file movement to Caltech
Objectivity/DB: creation of a 32,000-database federation
Other ODBMS tests: tests with Versant (fallback ODBMS)

[Plot: create and commit times in milliseconds, LAN and WAN, versus update number (time of day); saturated hours ~10 kbits/second, unsaturated ~1 Mbits/second]

Page 11: The China Clipper Project: A Data Intensive Grid (ANL-SLAC-Berkeley)

China Clipper Goal: develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks.

Builds on expertise and assets at ANL, LBNL & SLAC; NERSC, ESnet
Builds on Globus middleware and a high-performance distributed storage system (DPSS from LBNL)
Initial focus on large DOE HENP applications: RHIC/STAR, BaBar
Demonstrated data rates to 57 Mbytes/sec.

Page 12: Grand Challenge Architecture

An order-optimized prefetch architecture for data retrieval from multilevel storage in a multiuser environment
Queries select events and specific event components based upon tag attribute ranges
Query estimates are provided prior to execution
Queries are monitored for progress, multi-use
Because event components are distributed over several files, processing an event requires delivery of a "bundle" of files
Events are delivered in an order that takes advantage of what is already on disk, and multiuser policy-based prefetching of further data from tertiary storage (a sketch of this ordering follows below)
GCA intercomponent communication is CORBA-based, but physicists are shielded from this layer
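As a rough illustration of the "order-optimized" delivery idea described above, here is a small sketch that delivers event bundles with the fewest missing files first and lists what would have to be prefetched from tertiary storage. The class and method names are hypothetical, not the real GCA/STACS API.

```java
// Illustrative only: order event "bundles" so those already (mostly) on disk are delivered
// first; the rest are candidates for policy-based prefetching from tertiary storage.
import java.util.*;

final class Bundle {
    final long eventId;
    final Set<String> files;          // the files needed to process this event
    Bundle(long eventId, Set<String> files) { this.eventId = eventId; this.files = files; }
}

final class OrderOptimizedDelivery {
    private final Set<String> diskCache;                 // files currently staged on disk

    OrderOptimizedDelivery(Set<String> diskCache) { this.diskCache = diskCache; }

    /** Order bundles so that those with the fewest missing files are delivered first. */
    List<Bundle> deliveryOrder(List<Bundle> requested) {
        List<Bundle> ordered = new ArrayList<>(requested);
        ordered.sort(Comparator.comparingInt(this::missingFiles));
        return ordered;
    }

    /** Files that would have to be prefetched from tertiary storage (e.g. via HPSS pftp). */
    Set<String> toPrefetch(Bundle b) {
        Set<String> missing = new HashSet<>(b.files);
        missing.removeAll(diskCache);
        return missing;
    }

    private int missingFiles(Bundle b) { return toPrefetch(b).size(); }
}
```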

Page 13: GCA System Overview

[Diagram: clients submit queries to the GCA/STACS layer, which uses an index, event tags and a file catalog to stage event files (via HPSS pftp) alongside other disk-resident event data]
Components shown: Clients; GCA; STACS; staged event files; event tags; (other) disk-resident event data; index; HPSS pftp; file catalog

Page 14: STorage Access Coordination System (STACS)

[Diagram: STACS components and their interactions]
Components: Query Estimator; Query Monitor; Cache Manager; Policy Module; Bit-Sliced Index; File Catalog
Flows shown: queries and estimates; query status and cache map; file bundles and event lists; lists of file bundles and events; requests for file caching and purging; pftp and file purge commands

Page 15: The Particle Physics Data Grid (PPDG)

First Year Goal: optimized cached read access to 1-10 Gbytes, drawn from a total data set of order one Petabyte

Participants: ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS

[Diagram: Site-to-Site Data Replication Service at 100 Mbytes/sec between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot)]

[Diagram: Multi-Site Cached File Access Service linking a primary site (DAQ, tape, CPU, disk, robot), satellite sites (tape, CPU, disk, robot) and universities (CPU, disk, users)]

Page 16: The Particle Physics Data Grid (PPDG)

The ability to query and partially retrieve hundreds of terabytes across Wide Area Networks within seconds

PPDG uses advanced services in three areas:
Distributed caching: to allow for rapid data delivery in response to multiple requests
Matchmaking and Request/Resource co-scheduling: to manage workflow and use computing and net resources efficiently; to achieve high throughput
Differentiated Services: to allow particle-physics bulk data transport to coexist with interactive and real-time remote collaboration sessions, and other network traffic.

Page 17: PPDG: Architecture for Reliable High Speed Data Delivery

[Diagram: PPDG service architecture]
Components: object-based and file-based application services; cache manager; file access service; matchmaking service; cost estimation; file fetching service; file replication index; end-to-end network services; mass storage manager; resource management; file movers on either side of the site boundary / security domain

+ Future: file and object export; cache & state tracking; forward prediction

Page 18: First Year PPDG "System" Components

Middleware Components (Initial Choice): see the PPDG Proposal
Object and File-Based Store            Objectivity/DB (SLAC enhanced)
Application Services                   GC Query Object, Event Iterator, Query Monitor; FNAL SAM system
Resource Management                    Start with human intervention (but begin to deploy resource discovery & management tools: Condor, SRB)
File Access Service                    Components of OOFS (SLAC)
Cache Manager                          GC Cache Manager (LBNL)
Mass Storage Manager                   HPSS, Enstore, OSM (site-dependent)
Matchmaking Service                    Condor (U. Wisconsin)
File Replication Index                 MCAT (SDSC)
Transfer Cost Estimation Service       Globus (ANL)
File Fetching Service                  Components of OOFS
File Mover(s)                          SRB (SDSC); site specific
End-to-End Network Services            Globus tools for QoS reservation
Security and Authentication            Globus (ANL)

Page 19: CONDOR Matchmaking: A Resource Allocation Paradigm

Parties use ClassAds to advertise properties, requirements and ranking to a matchmaker (a sketch of the idea follows below)
ClassAds are self-describing (no separate schema)
ClassAds combine query and data

http://www.cs.wisc.edu/condor

[Diagram: a Resource with Local Resource Management, an Owner Agent and an Environment Agent; an Application with an Application Agent and a Customer Agent; both sides advertise to the matchmaker]
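A toy illustration of the matchmaking idea: each party publishes a self-describing set of attributes plus a requirement and a rank over the other party's attributes. This is a sketch of the concept only, not the real ClassAd language or Condor API; all names are hypothetical.

```java
// Illustrative only: matchmaking in the spirit of Condor ClassAds.
import java.util.*;
import java.util.function.*;

final class Ad {
    final Map<String, Object> attrs;                       // self-describing data
    final Predicate<Map<String, Object>> requirements;     // query over the other party's ad
    final ToDoubleFunction<Map<String, Object>> rank;      // preference among matches

    Ad(Map<String, Object> attrs,
       Predicate<Map<String, Object>> requirements,
       ToDoubleFunction<Map<String, Object>> rank) {
        this.attrs = attrs; this.requirements = requirements; this.rank = rank;
    }
}

final class Matchmaker {
    /** Return the resource ad that satisfies mutual requirements and best ranks for the job. */
    Optional<Ad> match(Ad job, List<Ad> resources) {
        return resources.stream()
                .filter(r -> job.requirements.test(r.attrs) && r.requirements.test(job.attrs))
                .max(Comparator.comparingDouble(r -> job.rank.applyAsDouble(r.attrs)));
    }

    public static void main(String[] args) {
        Ad job = new Ad(
                Map.of("Owner", "physicist", "ImageSize", 256),
                res -> ((Number) res.get("Memory")).intValue() >= 256,   // job's requirement
                res -> ((Number) res.get("Mips")).doubleValue());        // prefer faster CPUs
        Ad node = new Ad(
                Map.of("Memory", 512, "Mips", 40),
                j -> true,                                               // node accepts any job
                j -> 0.0);
        System.out.println(new Matchmaker().match(job, List.of(node)).isPresent());
    }
}
```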

Page 20: Agents for Remote Execution in CONDOR

[Diagram: remote execution in Condor]
Submission side: request queue; Owner Agent; Customer Agent; Application Agent; Application Process; data & object files; checkpoint files
Execution side: Execution Agent; Application Process; object files; remote I/O & checkpoint

Page 21: Beyond Traditional Architectures: Mobile Agents (Java Aglets)

"Agents are objects with rules and legs" -- D. Taylor

Mobile Agents:
Execute asynchronously
Reduce network load: local conversations
Overcome network latency; some outages
Adaptive; robust, fault tolerant
Naturally heterogeneous
Extensible concept: agent hierarchies
(a minimal sketch of the idea follows below)

[Diagram: an application surrounded by a hierarchy of agents providing a service]
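A minimal, generic sketch of the mobile-agent idea on this slide: an agent carries its own behaviour, runs asynchronously, and converses locally at whatever host it is dispatched to. This is not the IBM Aglets API; all names here are hypothetical.

```java
// Illustrative only: a generic mobile-agent-style sketch (not the Aglets API).
import java.util.concurrent.*;

interface Host {
    String name();
    String queryLocal(String request);   // local conversation, no WAN round trips
}

abstract class MobileAgent implements Runnable {
    protected Host host;                 // the host the agent is currently running at

    /** Dispatch the agent to a host; it executes asynchronously there. */
    Future<?> dispatchTo(Host destination, ExecutorService executor) {
        this.host = destination;
        return executor.submit(this);    // asynchronous execution
    }
}

final class CacheStatusAgent extends MobileAgent {
    @Override public void run() {
        // All conversations are local to the host, reducing network load and hiding latency.
        String status = host.queryLocal("cache-status");
        System.out.println("Report from " + host.name() + ": " + status);
    }
}
```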

Page 22: Using the Globus Tools

Tests with "gsiftp", a modified ftp server/client that allows control of the TCP buffer size
Transfers of Objy database files from the Exemplar to:
Itself
An O2K at Argonne (via CalREN2 and Abilene)
A Linux machine at INFN (via the US-CERN transatlantic link)
Target /dev/null in multiple streams (1 to 16 parallel gsiftp sessions)
Aggregate throughput measured as a function of the number of streams and the send/receive buffer sizes (a sketch of the tuning idea follows below)

[Plots: gsiftp rate (kBytes/second) vs. buffer size (kBytes), single stream over HiPPI and single stream to Argonne; aggregate gsiftp rate to Argonne vs. number of parallel streams]

Results: ~25 MB/sec on HiPPI loop-back; ~4 MB/sec to Argonne by tuning the TCP window size, saturating the available bandwidth to Argonne
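The two knobs the tests above vary are the TCP send/receive buffer (window) size and the number of parallel streams. The sketch below shows those knobs with plain java.net sockets; it is a generic illustration, not gsiftp or the Globus API, and the host, port and sizes are hypothetical.

```java
// Illustrative only: larger socket buffers and several parallel streams, as in the tests.
import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.*;

public class ParallelTransferSketch {
    static final String HOST = "storage.example.org";   // hypothetical target
    static final int PORT = 5000;                        // hypothetical data port
    static final int BUFFER_BYTES = 1 << 20;             // ~1 MByte TCP buffer, cf. the plots
    static final int STREAMS = 8;                        // 1-16 parallel sessions in the tests

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(STREAMS);
        for (int i = 0; i < STREAMS; i++) {
            pool.submit(() -> {
                try (Socket s = new Socket(HOST, PORT)) {
                    // Larger socket buffers allow a larger TCP window on a long fat pipe.
                    s.setSendBufferSize(BUFFER_BYTES);
                    s.setReceiveBufferSize(BUFFER_BYTES);
                    OutputStream out = s.getOutputStream();
                    byte[] chunk = new byte[64 * 1024];
                    for (int k = 0; k < 1024; k++) {     // send ~64 MBytes per stream
                        out.write(chunk);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```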

Page 23: Distributed Data Delivery and LHC Software Architecture

Software Architectural Choices
Traditional, single-threaded applications: wait for data location, arrival and reassembly
OR
Performance-Oriented (Complex): I/O requests up-front; multi-threaded; data driven; respond to the ensemble of (changing) cost estimates; possible code movement as well as data movement; loosely coupled, dynamic (a sketch of the contrast follows below)
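The contrast between the two choices, sketched with a generic "fetch" call. The DataSource interface and its methods are hypothetical, not an LHC framework API.

```java
// Illustrative only: blocking single-threaded access vs. issuing I/O requests up-front
// and processing data as it arrives.
import java.util.*;
import java.util.concurrent.*;

interface DataSource {
    byte[] fetch(String objectId);       // may involve WAN transfer and reassembly
}

final class AnalysisSketch {
    // Traditional: single-threaded, wait for each object before touching the next one.
    static void singleThreaded(DataSource src, List<String> ids) {
        for (String id : ids) {
            byte[] data = src.fetch(id);  // blocks on location, arrival and reassembly
            process(data);
        }
    }

    // Performance-oriented: issue all I/O requests up-front, process whichever arrives first.
    static void dataDriven(DataSource src, List<String> ids) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        CompletionService<byte[]> done = new ExecutorCompletionService<>(pool);
        for (String id : ids) {
            done.submit(() -> src.fetch(id));      // requests issued up-front
        }
        for (int i = 0; i < ids.size(); i++) {
            process(done.take().get());            // data-driven: handle results as they land
        }
        pool.shutdown();
    }

    static void process(byte[] data) { /* analysis of one object */ }
}
```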

Page 24: GriPhyN Foundation

Build on the Distributed System Results of the GIOD, MONARC, NILE, Clipper/GC and PPDG Projects

Long Term Vision in Three Phases:
1. Read/write access to high volume data and processing power
   Condor/Globus/SRB + NetLogger components to manage jobs and resources
2. WAN-distributed data-intensive Grid computing system
   Tasks move automatically to the "most effective" Node in the Grid
   Scalable implementation using mobile agent technology
3. "Virtual Data" concept for multi-PB distributed data management, with large-scale Agent Hierarchies
   Transparently match data to sites, manage data replication or transport, co-schedule data & compute resources

Build on VRVS Developments for Remote Collaboration

Page 25: GriPhyN/APOGEE: Production-Design of a Data Analysis Grid

INSTRUMENTATION, SIMULATION, OPTIMIZATION, COORDINATION

SIMULATION of a Production-Scale Grid Hierarchy:
Provide a toolset for HENP experiments to test and optimize their data analysis and resource usage strategies

INSTRUMENTATION of Grid Prototypes:
Characterize the Grid components' performance under load
Validate the simulation
Monitor, track and report system state, trends and "events"

OPTIMIZATION of the Data Grid:
Genetic algorithms, or other evolutionary methods
Deliver an optimization package for HENP distributed systems
Applications to other experiments; accelerator and other control systems; other fields

COORDINATE with Experiment-Specific Projects: CMS, ATLAS, BaBar, Run2

Page 26: Grid (IT) Issues to be Addressed

Dataset compaction; data caching and mirroring strategies
Using large time-quanta or very high bandwidth bursts, for large data transactions
Query estimators, query monitors (cf. GCA work)
Enable flexible, resilient prioritisation schemes (marginal utility)
Query redirection, fragmentation, priority alteration, etc.
Pre-emptive and realtime data/resource matchmaking
Resource discovery
Data and CPU location brokers
Co-scheduling and queueing processes
State, workflow, & performance-monitoring instrumentation; tracking and forward prediction
Security: authentication (for resource allocation/usage and priority); running a certificate authority

Page 27: CMS Example: Data Grid Program of Work (I)

FY 2000
Build basic services; "1 Million event" samples on proto-Tier2s
For HLT milestones and detector/physics studies with ORCA
MONARC Phase 3 simulations for study/optimization

FY 2001
Set up initial Grid system based on PPDG deliverables at the first Tier2 centers and Tier1-prototype centers
High speed site-to-site file replication service
Multi-site cached file access
CMS Data Challenges in support of the DAQ TDR
Shakedown of preliminary PPDG (+ MONARC and GIOD) system strategies and tools

FY 2002
Deploy Grid system at the second set of Tier2 centers
CMS Data Challenges for the Software and Computing TDR and the Physics TDR

Page 28: Data Analysis Grid Program of Work (II)

FY 2003
Deploy Tier2 centers at the last set of sites
5%-Scale Data Challenge in support of the Physics TDR
Production-prototype test of the Grid Hierarchy System, with the first elements of the production Tier1 Center

FY 2004
20% Production (Online and Offline) CMS Mock Data Challenge, with all Tier2 Centers and a partly completed Tier1 Center
Build Production-quality Grid System

FY 2005 (Q1 - Q2)
Final Production CMS (Online and Offline) Shakedown
Full distributed system software and instrumentation
Using full capabilities of the Tier2 and Tier1 Centers

Page 29: Summary

The HENP/LHC data handling problem:
Multi-Petabyte scale, binary pre-filtered data, resources distributed worldwide
Has no analog now, but will be increasingly prevalent in research and industry by ~2005
Development of a robust PB-scale networked data access and analysis system is mission-critical

An effective partnership exists, HENP-wide, through many R&D projects:
RD45, GIOD, MONARC, Clipper, GLOBUS, CONDOR, ALDAP, PPDG, ...

An aggressive R&D program is required to develop:
Resilient "self-aware" systems, for data access, processing and analysis across a hierarchy of networks
Solutions that could be widely applicable to data problems in other scientific fields and industry, by LHC startup

Focus on Data Grids for Next Generation Physics

Page 30: LHC Data Models: 1994-2000

HEP data models are complex!
Rich hierarchy of hundreds of complex data types (classes)
Many relations between them
Different access patterns (multiple viewpoints)

OO technology:
OO applications deal with networks of objects (and containers)
Pointers (or references) are used to describe relations

Existing solutions do not scale
Solution suggested by RD45: an ODBMS coupled to a Mass Storage System

Construction of "compact" datasets for analysis: rapid access/navigation/transport

[Diagram: an example object network -- an Event referring to Tracker and Calorimeter objects; a TrackList holding Tracks; a HitList holding Hits (a sketch in code follows below)]
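The kind of object network in the diagram, written out as plain Java classes. The class shapes are assumptions for illustration; the real LHC/RD45 models are far richer and are stored in an ODBMS (e.g. Objectivity/DB), not on the heap.

```java
// Illustrative only: a toy version of the Event -> Tracker/Calorimeter -> TrackList ->
// Track -> HitList -> Hit network shown on the slide.
import java.util.ArrayList;
import java.util.List;

class Hit { double x, y, z; }

class HitList { final List<Hit> hits = new ArrayList<>(); }

class Track {
    HitList hitList;                  // relation expressed as a reference
    double momentum;
}

class TrackList { final List<Track> tracks = new ArrayList<>(); }

class Tracker { TrackList trackList; }

class Calorimeter { /* clusters, cells, ... */ }

class Event {
    long eventNumber;
    Tracker tracker;                  // the Event is the root of a network of objects
    Calorimeter calorimeter;
}
```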

Page 31: Web-Based Server-Farm Networks Circa 2000: Dynamic (Grid-Like) Content Delivery Engines

Content Delivery Networks (CDN): Akamai, Adero, Sandpiper
1200 -> thousands of network-resident servers
25 -> 60 ISP networks; 25 -> 30 countries
40+ corporate customers; ~$25 B capitalization

Resource discovery
Build a "weathermap" of the server network (state tracking)
Query estimation; matchmaking/optimization; request rerouting
Virtual IP addressing: one address per server farm
Mirroring, caching
(1200) autonomous-agent implementation

Page 32: Strawman Tier 2 Evolution

                               2000                       2005
Linux Farm:                    1,200 SI95                 20,000 SI95 [*]
Disks on CPUs:                 4 TB                       50 TB
RAID Array:                    1 TB                       30 TB
Tape Library:                  1-2 TB                     50-100 TB
LAN Speed:                     0.1 - 1 Gbps               10 - 100 Gbps
WAN Speed:                     155 - 622 Mbps             2.5 - 10 Gbps
Collaborative Infrastructure:  MPEG2 VGA (1.5 - 3 Mbps)   Realtime HDTV (10 - 20 Mbps)

[*] Reflects lower Tier 2 component costs due to less demanding usage. Some of the CPU will be used for simulation.

Page 33: US CMS S&C Spending Profile

[Chart: US CMS Software and Computing Project spending in $[M] per fiscal year, FY1999-FY2005, broken down into CAS: People; UF: People; UF: Hardware; Tier2: People; Tier2: Hardware; plus Operations / Year]

2006 is a model year for the operations phase of CMS

Page 34: GriPhyN Cost

System support      $  8.0 M
R&D                 $ 15.0 M
Software            $  2.0 M
Tier 2 networking   $ 10.0 M
Tier 2 hardware     $ 50.0 M
Total               $ 85.0 M

Page 35: Grid Hierarchy Concept: Broader Advantages

Partitioning of users into "proximate" communities for support, troubleshooting, mentoring
Partitioning of facility tasks, to manage and focus resources
Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
Lower tiers of the hierarchy -> more local control

Page 36: Storage Request Brokers (SRB)

Name Transparency: access to data by attributes stored in an RDBMS (MCAT).
Location Transparency: logical collections (by attributes) spanning multiple physical resources.
Combined Location and Name Transparency means that datasets can be replicated across multiple caches and data archives (PPDG).
Data Management Protocol Transparency: SRB with custom-built drivers in front of each storage system (a sketch of the driver idea follows below)
The user does not need to know how the data is accessed; SRB deals with local file system managers
SRBs (agents) authenticate themselves and users, using the Grid Security Infrastructure (GSI)
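A minimal sketch of the "protocol transparency" idea: a broker puts a custom driver in front of each storage system so the user only ever names data logically. The interfaces below are hypothetical, not the real SRB or MCAT APIs.

```java
// Illustrative only: a driver per storage system behind a broker that resolves logical names.
import java.util.HashMap;
import java.util.Map;

interface StorageDriver {
    byte[] read(String physicalPath);        // each driver speaks one system's protocol
}

final class HpssDriver implements StorageDriver {
    public byte[] read(String physicalPath) { /* e.g. talk pftp to HPSS */ return new byte[0]; }
}

final class UnixFsDriver implements StorageDriver {
    public byte[] read(String physicalPath) { /* local file system manager */ return new byte[0]; }
}

final class RequestBrokerSketch {
    // A stand-in for the attribute catalog (cf. MCAT): logical name -> (system, physical path).
    private final Map<String, String[]> catalog = new HashMap<>();
    private final Map<String, StorageDriver> drivers = Map.of(
            "hpss", new HpssDriver(),
            "unix", new UnixFsDriver());

    void register(String logicalName, String system, String physicalPath) {
        catalog.put(logicalName, new String[] { system, physicalPath });
    }

    /** The user asks by logical name only; the broker picks the driver and location. */
    byte[] read(String logicalName) {
        String[] entry = catalog.get(logicalName);
        return drivers.get(entry[0]).read(entry[1]);
    }
}
```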

Page 37: Role of Simulation for Distributed Systems

Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
From battlefields to agriculture; from the factory floor to telecommunications systems
Discrete event simulations with an appropriate and high level of abstraction
Just beginning to be part of the HEP culture
Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models (event-oriented)
MONARC (process-oriented; Java 2 threads + class library)
These simulations are very different from HEP "Monte Carlos": "time" intervals and interrupts are the essentials (a minimal discrete-event sketch follows below)

Simulation is a vital part of the study of site architectures, network behavior, data access/processing/delivery strategies, for HENP Grid Design and Optimization
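A minimal discrete-event simulation loop of the kind the slide contrasts with HEP "Monte Carlos": simulated time jumps from event to event rather than advancing in fixed steps. This is a generic sketch, not the MONARC simulation framework; the toy model and its numbers are chosen only for illustration.

```java
// Illustrative only: a time-ordered event queue driving simulated time forward.
import java.util.PriorityQueue;

public class DiscreteEventSketch {
    record Event(double time, String description, Runnable action) {}

    private final PriorityQueue<Event> queue =
            new PriorityQueue<>((a, b) -> Double.compare(a.time(), b.time()));
    private double now = 0.0;                          // simulated time, not wall-clock time

    void schedule(double at, String what, Runnable action) {
        queue.add(new Event(at, what, action));
    }

    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            now = e.time();                            // jump straight to the next event
            System.out.printf("t=%.1f  %s%n", now, e.description());
            e.action().run();
        }
    }

    public static void main(String[] args) {
        DiscreteEventSketch sim = new DiscreteEventSketch();
        // Toy model: a 2 GB transfer over a ~100 MBytes/sec link finishes ~20 s later,
        // then a processing job runs for 300 s (hypothetical numbers).
        sim.schedule(0.0, "start transfer", () -> sim.schedule(20.0, "transfer done", () ->
                sim.schedule(320.0, "processing done", () -> {})));
        sim.run();
    }
}
```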

Page 38: Monitoring Architecture: Use of NetLogger in CLIPPER

End-to-end monitoring of grid assets is necessary to:
Resolve network throughput problems
Dynamically schedule resources

Add precision-timed event monitor agents to:
ATM switches
Storage servers
Testbed computational resources

Produce trend analysis modules for the monitor agents
Make results available to applications (a sketch of such an event record follows below)
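The kind of precision-timestamped monitoring event such an agent would emit so that measurements from switches, storage servers and compute nodes can be correlated end to end. The field layout below is a generic sketch, not the actual NetLogger record format.

```java
// Illustrative only: a timestamped monitoring event emitted by a hypothetical agent.
import java.time.Instant;

public class MonitorEventSketch {
    record MonitorEvent(Instant timestamp, String host, String component, String event, double value) {
        String asLogLine() {
            // One line per event; precise timestamps allow end-to-end correlation and
            // feed trend-analysis modules.
            return String.format("%s host=%s comp=%s event=%s value=%.3f",
                    timestamp, host, component, event, value);
        }
    }

    public static void main(String[] args) {
        MonitorEvent e = new MonitorEvent(
                Instant.now(), "tier2.example.edu", "storage-server", "read.throughput.MBps", 42.0);
        System.out.println(e.asLogLine());
    }
}
```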