Open Science Grid @JointTechs
Ruth Pordes, Fermilab, July 17th 2006
Outline
What is OSG
Where Networking fits
Middleware
Security
Networking & OSG
2
OSG - goals and scope
OSG is a Consortium joining efforts from disparate projects.
OSG is a national distributed facility for many sciences: providing access to heterogeneous computing, network and storage resources and common operational services, software stack, policies and procedures.
Stakeholders include US HEP and NP experiments and Fermilab users, LIGO, Astrophysics… and core Grid technology groups.
OSG Infrastructure is a core piece of the WLCG and delivers accountable resources and cycles for LHC experiment production and analysis.
OSG relies on many external projects for development and support, e.g. for network infrastructures.
OSG federates with other grids - especially TeraGrid, Campus Grids and EGEE.
OSG must deliver to LHC and LIGO milestones and schedules -- i.e. 2007, 2008 throughput and capacity needs.
3
OSG Consortium
US LHC, LIGO, Tevatron Run II, STAR, SDSS,
DOE Labs - BNL, Fermilab, JLab, LBNL, SLAC
Multidisciplinary Campus Grids - GLOW, GROW, Crimson Grid
Bio-Informatics - GADU, FMRI
Condor, Globus, Storage Resource Manager
4
OSG’s world is flat - a Grid of Grids - from Local to Global
Global Science Community Systems, e.g. CMS, D0
National CyberInfrastructures for Science, e.g. OSG-TeraGrid
Local Campus and Regional Grids, e.g. FermiGrid, NWIC
5
OSG’s world is flat - a Grid of Grids - from Local to Global
People Working Together!
6
Monitored “OSG jobs” -- July 2005-2006
[Figure: monitored OSG jobs over time, y-axis tick at 3000, with the OSG 0.4.0 deployment marked]
Off to a running start … but lots more to do.
- Routinely exceeding 1 Gbps at 3 sites:
  - Need to scale by x4 by 2008 with many more sites
- Routinely exceeding 1000 running jobs per client
  - Need to scale by at least x10 by 2008
- Have reached 99% success rate for 10,000 jobs per day submission
  - Need to reach this routinely, even under heavy load
7
Operations Model
In practice, support organizations often play multiple roles.
Lines represent communication paths and, in our model, agreements.
We have not progressed very far with agreements yet.
Gray shading indicates that OSG Operations is composed of effort from all the support centers.
8
Outline, OSG Network People
Where Networking fits
TG-Networks - Shawn McKee and Don Petravick.
TG-Storage currently covers data distribution and management issues also.
LHC Tier-1s network groups.
MonaLisa monitoring & accounting service.
9
Network Connectivity - purview of the Sites & VOs: Labs and Universities
Uses commodity networks - ESNet, Campus LANs
Some sites are well network provisioned -- connected to Starlight etc.
Some OSG sites are also on TeraGrid.
Connectivity of resources ranges from full-duplex, to outgoing only, to fully behind firewalls.
10
CMS Experiment - example of a global community grid
Data & jobs moving locally, regionally & globally within CMS grid.
Transparently across grid boundaries from campus to the world.
[Figure: CMS sites spanning OSG and EGEE: Germany, Taiwan, UK, Italy, France, CERN, USA@FNAL, Florida, Caltech, Wisconsin, UCSD, Purdue, MIT, UNL]
11
CMS Global Operations
Job submission:
16,000 jobs per day submitted across EGEE & OSG via INFN RB.
Data movement to 39 sites worldwide for CMS data transfer challenge.
Peak transfer rates of ~5Gbps are reached.
All 7 CMS OSG sites have reached 5TB/day goal. Caltech, Florida, UCSD exceed 10TB/day.
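To relate the daily-volume goals above to link rates, a rough conversion can help. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a perfectly steady transfer spread over 24 hours; the function name is illustrative and is not part of any OSG or CMS tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def tb_per_day_to_gbps(tb_per_day: float) -> float:
    """Average rate in Gbps needed to move tb_per_day terabytes in one day.

    Assumes decimal terabytes (1e12 bytes) and a constant transfer rate.
    """
    bits = tb_per_day * 1e12 * 8
    return bits / SECONDS_PER_DAY / 1e9

for goal in (5, 10):
    print(f"{goal} TB/day ~ {tb_per_day_to_gbps(goal):.2f} Gbps sustained")
# 5 TB/day ~ 0.46 Gbps sustained
# 10 TB/day ~ 0.93 Gbps sustained
```

Under these assumptions, the 5 TB/day goal corresponds to a bit under half a gigabit per second averaged over the day, and 10 TB/day to a bit under one gigabit per second.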
12
ATLAS Global Data Distribution
Deploys “DQ” data management on “Edge Service”. VO specific hardware at Edge of each site.
OSG plans to deploy such Edge Services based on XEN VMs to enable better site control and auditing.
13
Middleware
Middleware & Service Principles: Me -- My friends -- The grid
Me: thin user layer
My friends: VO services, VO infrastructure, VO admins
The Grid: anonymous sites & admins. Common to all.
Me & My friends are usually domain science specific.
14
OSG Middleware Layering
Infrastructure:
NSF Middleware Initiative (NMI): Condor, Globus, Myproxy
Virtual Data Toolkit (VDT) Common Services: NMI + VOMS, CEMon (common EGEE components), MonaLisa, Clarens, AuthZ, Squid
OSG Release Cache: VDT + Configuration, Validation, VO management
Applications:
LHC Services & Framework
LIGO Data Grid
CDF, D0 SamGrid & Framework
… Bio Services & Framework
15
OSG Middleware Deployment
Domain science requirements.
OSG stakeholders and middleware developer (joint) projects.
Integrate into VDT Release. Deploy on OSG integration grid
Provision in OSG release & deploy to OSG production.
Condor, Globus, EGEE etc.
Test on “VO specific grid”
Test Interoperability with EGEE and TeraGrid
16
Integration Testbed - a Grid for System Testing & Component Validation
Software Providers deliver components and/or services to OSG Applications or Facility. Activities have a responsible technical lead.
Provide Readiness Plan to Integration. Identify Support Center.
Submit request for inclusion in VDT - agree to support model, supply test programs etc.
Validate on VO-specific Grid then Integration Testbed. Integration Coordinator recommends when new component is ready for Provisioning to production.
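The path a component takes toward production, as listed above, can be pictured as a sequence of ordered stages. The following is only a sketch: the stage names paraphrase the slide, and the `Stage` enum and `advance` function are hypothetical, not part of any OSG or VDT tooling.

```python
from enum import IntEnum

class Stage(IntEnum):
    READINESS_PLAN = 1      # readiness plan provided, support center identified
    VDT_REQUEST = 2         # request for inclusion in VDT submitted
    VO_GRID_VALIDATED = 3   # validated on a VO-specific grid
    ITB_VALIDATED = 4       # validated on the Integration Testbed
    PROVISIONED = 5         # recommended for provisioning to production

def advance(current: Stage) -> Stage:
    """Move a component to the next stage; stages cannot be skipped."""
    if current is Stage.PROVISIONED:
        raise ValueError("component is already provisioned")
    return Stage(current + 1)

# Walk one component through every stage in order.
stage = Stage.READINESS_PLAN
while stage is not Stage.PROVISIONED:
    stage = advance(stage)
    print(stage.name)
```

Modeling the stages as an ordered enum makes the "VO-specific grid first, then Integration Testbed" rule explicit, since a component can only move one step at a time.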
17
Security - Integrated & End-To-End
OSG Security Program is risk based; it covers OSG assets and agreements with & between VOs, sites, and grids; we are looking at Residual Risks to identify priorities for work.
Incident Response Plan; AUPs, Risk Assessment, Security Plan.
Scope includes Operational Security, Auditing, Quality Assurance, Documented Policies, Collaboration with peers, Trust Controls.
We will provide common references for sites + VOs + Grids (otherwise the program does not scale).
Scope includes Software: baseline, auditing, patching.
Good collaboration with IGTF, EGEE -- JSPG, MSWG -- TeraGrid, Security for Open Science CET etc.
18
e.g. User and VO Management
VO registers with the Operations Center
- Provides URL for VOMS service to be propagated to the sites.
- Several VOMS are shared with EGEE as part of WLCG.
User registers through VOMRS or VO administrator
- User added to VOMS of one or more VOs.
- VO responsible for users to sign AUP.
- VO responsible for VOMS service support.
Site registers with the Operations Center
- Signs the Service Agreement.
- Decides which VOs to support (striving for default admit).
- Populates GUMS from VOMSes of all VOs.
- Chooses account UID policy for each VO & role.
VOs and Sites provide Support Center Contact and joint Operations.
For WLCG: US ATLAS and US CMS Tier-1s directly registered to WLCG. Other support centers propagated through OSG GOC to WLCG.
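The site-side step of choosing an account policy per VO and role can be pictured as a lookup from a user's VO membership to a local account. The sketch below is purely illustrative: it does not use the real GUMS or VOMS interfaces, and every name, VO entry, DN, and account in it is a made-up assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Membership:
    dn: str    # certificate subject, as a VO membership service might list it
    vo: str
    role: str

# Hypothetical per-site policy: (VO, role) -> local account.
# In this sketch a site supports a VO only if it has an entry here.
SITE_POLICY = {
    ("cms", "production"): "cmsprod",
    ("cms", "user"): "cmsuser",
    ("ligo", "user"): "ligo",
}

def map_to_account(m: Membership) -> Optional[str]:
    """Return the local account for a member, or None if the site does not admit them."""
    return SITE_POLICY.get((m.vo, m.role))

alice = Membership("/DC=org/DC=example/CN=Alice", "cms", "production")
bob = Membership("/DC=org/DC=example/CN=Bob", "atlas", "user")
print(map_to_account(alice))  # cmsprod
print(map_to_account(bob))    # None
```

Keying the policy on the (VO, role) pair rather than on individual users mirrors the division of responsibility on the slide: the VO manages its members, and the site only decides which VOs and roles it admits.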
19
e.g. Software Fast & Critical Updates -- already exercised for mysql vulnerabilities
20
e.g. Risk Assessment
Incident Response Plan
21
OSG & Networking needs
OSG needs end-to-end performance and reliability of the networks it uses.
Thus OSG needs access to comprehensive and usable network monitoring, capacity and performance data.
OSG needs both “real time” and historical information including:
- Network Weather Maps
- Troubleshooting Tools
- Performance analysis and diagnosis
Network characteristics are part of the Information Service that aggregates static and dynamic resource information and uses an LDAP-based collection service.
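As a rough picture of what aggregating static and dynamic resource information could look like for network characteristics, the sketch below merges two dictionaries per site. It does not talk to an LDAP server or to any real OSG information service; the site names, attribute names, and values are all invented for illustration.

```python
from typing import Dict

# Static attributes change rarely (hypothetical names and values).
STATIC = {
    "site-a": {"link_capacity_gbps": 10.0, "behind_firewall": False},
    "site-b": {"link_capacity_gbps": 1.0, "behind_firewall": True},
}

def aggregate(static: Dict[str, dict], dynamic: Dict[str, dict]) -> Dict[str, dict]:
    """Merge static and dynamic attributes per site; dynamic values may be missing."""
    merged = {}
    for site, attrs in static.items():
        record = dict(attrs)                   # copy so the inputs stay untouched
        record.update(dynamic.get(site, {}))   # overlay the latest measurements
        merged[site] = record
    return merged

# Dynamic attributes would come from monitoring (again, made-up numbers).
dynamic = {"site-a": {"measured_throughput_gbps": 3.2}}
info = aggregate(STATIC, dynamic)
for site, record in info.items():
    print(site, record)
```

A site with no recent measurements still appears with its static attributes, which is one simple way to keep historical or configured data visible when real-time data is unavailable.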
22
OSG & Networking needs cont.
OSG does not include development activities nor specific effort for networking. Thus we rely on External Projects for this development and support --- Help!
OSG will provide common reference configurations for end points e.g. for storage services.
We need to document reference and performant Site and Campus Network configurations e.g. Firewalls affect throughput and capabilities of Sites.
Resource Management at multiple interfaces is becoming more important as the OSG expands. We need to provide middleware and applications with information on the availability and allocation of network bandwidth for these services.
23
The End