U.S. ATLAS Grid Testbed Status and Plans
Kaushik De
University of Texas at Arlington
DoE/NSF Mid-term Review
NSF Headquarters, June 2002
June 20, 2002, Kaushik De, DoE/NSF Review
Outline
Testbed Phase 2 launched: UTA Workshop
http://heppc1.uta.edu/atlas/workshop_april_2002/index.html
New focus on rapid software deployment and grid based data production leading to demonstrations at Supercomputing 2002
Kaushik De coordinating U.S. Testbed and SC2002 planning since mid-April 2002
This talk based on new & evolving plans
Testbed status
Software distribution
Application toolkit
MC production plans
Monitoring
Grid tools
Integration
SC2002 demos
Testbed Goals
Demonstrate success of grid computing model for High Energy Physics
in data production
in data access
in data analysis
Develop, deploy and test grid middleware and applications
integrate middleware with applications
simplify deployment - robust, rapid & scalable
inter-operate with other testbeds & grid organizations (iVDGL, DataTag…)
provide single point-of-service for grid users
Evolve into fully functioning scalable distributed tiered grid
Testbed Website
http://heppc1.uta.edu/atlas/grid-testbed/index.htm
Grid Testbed Sites
Lawrence Berkeley National Laboratory
Brookhaven National Laboratory
Indiana University
Boston University
Argonne National Laboratory
U Michigan
University of Texas at Arlington
Oklahoma University
U.S. ATLAS testbed launched February 2001
Testbed Fabric
8 production gatekeepers - ANL, BNL, LBNL, BU, IU, UM, OU, UTA
http://heppc1.uta.edu/atlas/grid-testbed/testbed-sites.htm
Large clusters at BNL, LBNL, IU, UTA, BU
BNL: RCF, LBNL: PDSF, IU/BU: prototype Tier 2
UTA awarded NSF MRI for acquisition of D0 & ATLAS grid facility ($950k+$400k) - Thanks!
+ Multiple R&D gatekeepers
gremlin@bnl - iVDGL GIIS
heppc5@uta - ATLAS hierarchical GIIS
atlas10/14@anl - EDG testing
heppc6@uta+gremlin@bnl - glue schema
heppc17/19@uta - GRAT development
few sites - Grappa portal
bnl - VO server
few sites - iVDGL testbed
Software Distribution
Jason Smith, Kaushik De, Saul Youssef, Wensheng Deng, Shava Smallen
Goals:
Easy installation by System Administrators
Uniform software versions
Pacman perfect for this task
First stage deployment
Done - May, 2002
Pacman, Globus 2.0b, cernlib
GRAT application/production package
Second stage deployment
Magda, Grappa - June, 2002
Tools for distributed production
Third stage
VDT 1.1.1, Chimera, … - July/August, 2002
Available Packages
Applications Team
Horst Severini, Kaushik De, Dan Engh, Wensheng Deng, Ed May
Goal: enable physicists to use testbed without worrying about underlying middleware or ATLAS software
Athena-Atlfast for grid testbed
Tool 1: runs on any globus enabled node (requires transfer of ~17MB executable package)
Tool 2: runs on grid site where executable package has been preinstalled
Tool 3: runs on afs enabled sites (the latest version of software is built and used)
GRid Applications Toolkit: GRAT
Above plus grid tools - ver 0.1 released 4/12/02
tested successfully on 17 U.S. ATLAS gatekeepers, CMS gatekeeper, D0 gatekeeper, EDG CE node (RH 6.x and RH 7.x), ...
Version 0.3 of GRAT released May 8, 2002
Next, add Magda+ & merge with Grappa
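The three Athena-Atlfast tools differ only in what they expect from a site, so the choice can be written as a small decision function. The following is a minimal sketch, not GRAT code: the `Site` record, its field names, and the preference order (AFS build first, then a preinstalled package, then the ~17MB transfer) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    """Hypothetical description of a testbed node (not a GRAT data structure)."""
    name: str
    globus_enabled: bool
    afs_enabled: bool = False
    package_preinstalled: bool = False


def choose_tool(site: Site) -> Optional[str]:
    """Pick one of the three tools; the preference order is an assumption."""
    if not site.globus_enabled:
        return None  # assumed: every tool is reached through a Globus gatekeeper
    if site.afs_enabled:
        return "Tool 3"  # latest software is built and used from AFS
    if site.package_preinstalled:
        return "Tool 2"  # executable package already on the site
    return "Tool 1"  # fall back to shipping the ~17MB package with the job


if __name__ == "__main__":
    # Site names below are placeholders, not real gatekeepers.
    for site in (
        Site("site-a", globus_enabled=True, afs_enabled=True),
        Site("site-b", globus_enabled=True, package_preinstalled=True),
        Site("site-c", globus_enabled=True),
    ):
        print(site.name, "->", choose_tool(site))
```

Tool 1 is kept as the fallback here because it is the only one of the three that places no requirement on the site beyond Globus.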
GRAT v 0.3
Script based toolkit. Merging now with Grappa visual GUI tool (see Gardner talk)
Testbed Production
Goals:
Demonstrate distributed ATLAS data production, access and analysis using grid middleware and tools developed by the testbed group
Plans:
Atlfast production to test middleware and tools, and produce physics data for summer students, based on athena-atlfast, using VDT+Magda+Chimera and both GRAT and Grappa
2 weeks to regenerate data, once a month
deploy new tools and middleware each cycle
move away from farm paradigm to grid model
very aggressive schedule - people limited!
DC1 production to test fabric capabilities and produce and access data, using old Fortran code atlsim, atrig and atrecon (see previous talks)
not repeatable - hard to actively test grid software
increase U.S. participation - involve grid testbed
Atlfast Production
Application: Athena-atlfast
Current version 3.0.1. Next release will be 3.2.0 (official DC1 release)
Middleware: VDT+Magda+Chimera
Interface: GRAT, Grappa
Sites: 8 ATLAS testbed sites, 2 CMS testbed sites, 2 D0 MC farms, EDG sites? TeraGrid sites?
June, 2002: Phase Alpha
Demonstrate software deployment and simple production system - done
Summer Schedule
July 1-15: Phase 0, 10^7 events
Globus 2.0 beta, Athena 3.0.1, Grappa, common disk model, Magda, 5 physics processes, BNL VO manager, minimal job scheduler, GridView monitoring
August 5-19: Phase 1, 10^8 events
VDT 1.1.1, Hierarchical GIIS server, Athena-atlfast 3.2.0, Grappa, Magda - data & replica management with metadata catalogue, 10 physics processes, static MDS based job scheduler, new visualization
September 2-16: Phase 2, 10^9 events, 1 TB storage, 40k files
Athena-atlfast 3.2.0 instrumented, 20 physics processes, upgraded BNL VO manager, dynamic job scheduler, fancy monitoring
Need some planning of analysis tools
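The Phase 2 targets (10^9 events, 1 TB storage, 40k files) imply some per-event and per-file sizes that are easy to check. The sketch below works them out under two assumptions that are not stated in the talk: 1 TB is taken as 10^12 bytes, and the September 2-16 window is counted as 14 days of continuous running.

```python
SECONDS_PER_DAY = 86_400


def phase2_estimates(events=10**9, storage_bytes=10**12, files=40_000, days=14):
    """Derive average sizes and rates from the Phase 2 targets.

    storage_bytes=10**12 (decimal TB) and days=14 are assumptions.
    """
    return {
        "bytes_per_event": storage_bytes / events,
        "events_per_file": events / files,
        "megabytes_per_file": storage_bytes / files / 1e6,
        "events_per_second": events / (days * SECONDS_PER_DAY),
    }


if __name__ == "__main__":
    for key, value in phase2_estimates().items():
        print(f"{key}: {value:,.1f}")
```

Under these assumptions the targets correspond to about 1 kB per event, 25,000 events and 25 MB per file, and a sustained rate of roughly 830 events per second summed over all sites.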
Atlfast Production Architecture
[Architecture diagram; labels recovered from the figure:]
Boxed Athena-Atlfast
JobOptions: Higgs, SUSY, QCD, Top, W/Z
Compute Sites
Grappa Portal or GRAT script
User
Resource Broker
Magda, VDC
MDS, Globus
Storage Elements
Monitoring Team
Dantong Yu, Patrick McGuigan, Craig Tull, Kaushik De, Shawn McKee, Dan Engh, Jason Smith
Monitoring is critically important in distributed Grid computing
check system health, debug problems
discover resources using static data
job scheduling and resource allocation decisions using dynamic data from MDS and other monitors
Testbed monitoring priorities
Discover site configuration
Discover software installation
Application monitoring
Grid status/operations monitoring
Also need
Well defined data for job scheduling
Visualization
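To make concrete why well defined monitoring data matters for job scheduling, here is a minimal sketch of a scheduling decision driven by published site information. It does not query MDS: the dictionary keys (`up`, `software`, `free_cpus`), the site names and the "most free CPUs wins" policy are all hypothetical stand-ins for whatever the monitors would publish.

```python
def pick_site(sites, required_software):
    """Return the name of the best candidate site, or None if none qualifies.

    `sites` is a list of dicts with hypothetical keys: "name", "up",
    "software" (static data) and "free_cpus" (dynamic data).
    """
    candidates = [
        s for s in sites
        if s["up"]                              # system health
        and required_software in s["software"]  # software installation
        and s["free_cpus"] > 0                  # dynamic load information
    ]
    if not candidates:
        return None
    # Assumed policy: send the job where the most CPUs are idle.
    return max(candidates, key=lambda s: s["free_cpus"])["name"]


if __name__ == "__main__":
    # Illustrative values only; these are not measurements from the testbed.
    snapshot = [
        {"name": "site-a", "up": True, "software": {"athena-atlfast"}, "free_cpus": 4},
        {"name": "site-b", "up": True, "software": {"athena-atlfast"}, "free_cpus": 12},
        {"name": "site-c", "up": False, "software": {"athena-atlfast"}, "free_cpus": 30},
        {"name": "site-d", "up": True, "software": set(), "free_cpus": 50},
    ]
    print(pick_site(snapshot, "athena-atlfast"))
```

Each filter in the sketch maps onto one of the monitoring priorities listed above: site health, software installation, and dynamic load.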
Monitoring - Back End
Publishing MDS information
Glue schema - BNL & UTA
Pippy - Pacman information service provider
BNL ACAS schema
Hierarchical GIIS server
Non-MDS back ends
iPerf, Netlogger, Prophesy, Ganglia
Archiving
MySQL
GridView, BNL ACAS
RRD Network
Work needed
What to store?
Replication of archived information
Good progress on back end!
Monitoring - Front End
MDS based
GridView, Gridsearcher
Converting TeraGrid and other toolkits
Non-MDS
Cricket, Ganglia
Work needed
Urgent for SC2002! Graphs, maps, drill-down…
New visualization team: Dantong Yu (evaluation of existing tools), Patrick McGuigan (Java CoG, Python), Jason Smith (PHP)
GridView 2.2
Simple visualization tool using Globus Toolkit
First native Globus application for ATLAS grid (March 2001)
Collects information using Globus tools. Archival information is stored in MySQL server on a different machine. Data published through web server on a third machine.
http://heppc1.uta.edu/atlas/grid-status/index.html
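The collect, archive, publish split described for GridView can be illustrated with a small sketch. This is not GridView's code: it uses Python's built-in `sqlite3` as a stand-in for the MySQL server, the table layout and function names are hypothetical, and the collection step (done with Globus tools in GridView) is replaced by a plain dictionary of site name to up/down status.

```python
import sqlite3


def archive_status(conn, samples, timestamp):
    """Archive one round of collected status; `samples` maps site name -> bool."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site_status (ts REAL, site TEXT, up INTEGER)"
    )
    conn.executemany(
        "INSERT INTO site_status VALUES (?, ?, ?)",
        [(timestamp, site, int(up)) for site, up in samples.items()],
    )
    conn.commit()


def latest_status(conn):
    """Return the most recent archived round, as a publishing layer would read it."""
    rows = conn.execute(
        "SELECT site, up FROM site_status "
        "WHERE ts = (SELECT MAX(ts) FROM site_status) ORDER BY site"
    ).fetchall()
    return {site: bool(up) for site, up in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the separate archive server
    # Placeholder site names and values, for illustration only.
    archive_status(conn, {"site-a": True, "site-b": True}, timestamp=1.0)
    archive_status(conn, {"site-a": True, "site-b": False}, timestamp=2.0)
    print(latest_status(conn))
```

Keeping the archive behind its own interface is what lets collection, storage and the web front end live on three different machines, as the slide describes.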
Testbed Tools
Many tools developed by the U.S. ATLAS testbed group during past year
GridView - simple tool to monitor status of testbed
Kaushik De, Patrick McGuigan
Gripe - unified user accounts
Rob Gardner
Magda - MAnager for Grid DAta
Torre Wenaus, Wensheng Deng (see Gardner & Wenaus talks)
Pacman - package management and distribution tool
Saul Youssef
Being widely used or adopted by iVDGL VDT, Ganga, and others (see Gardner talk)
Grappa - web portal using active notebook technology
Shava Smallen (see Gardner talk)
GRAT - GRid Application Toolkit
Gridsearcher - MDS browser
Jennifer Schopf
GridExpert - Knowledge Database
Mark Sosebee
VO Toolkit - Site AA
Rich Baker (see Baker talk)
......
Integration!!
Coordination with other grid efforts and software developers - very difficult task!
Project centric:
GriPhyN/iVDGL - Rob Gardner
PPDG - Torre Wenaus
EDG - Ed May, Jerry Gieraltowski
ATLAS/LHCb - Rich Baker
ATLAS/CMS - Kaushik De
ATLAS/D0 - Jae Yu
Fabric/Middleware centric:
AFS Software installations - Alex Undrus, Shane Canon, Iwona Sakrejda
Networking - Shawn McKee, Rob Gardner
Virtual and Real Data Management - Wensheng Deng, Sasha Vaniachin, Pavel Nevski, David Malon, Rob Gardner, Dan Engh, Mike Wilde, Yong Zhao, Shava Smallen
Security/Site AA/VO - Rich Baker, Dantong Yu
SC2002 Plans
SC2002 in Maryland, mid-November
Testbed Production demo (BNL) - Kaushik De
Monitor/interact with grid production
ATLAS/CMS demo (FNAL/SLAC) - Kaushik De
preliminary discussions with CMS
may become iVDGL demo (see Gardner talk)
ATLAS GRAT already running at CMS sites
GridView is monitoring two CMS sites
Application monitoring (LBNL) - Craig Tull
Athena + Netlogger + Prophesy
Virtual data demo (ANL/UC/IU) - Rob Gardner
Common areas
Brochure - Rob Gardner
Posters - Craig Tull
Common script - Jennifer Schopf
Testbed Production Demo. (in BNL booth)
ATLAS physics story
ATLAS computing story
Visualize production:
Monitor site status
static - glue, pippy
dynamic - jobs, cpu usage
Monitor data status
magda - visual?
VDC (same as IU booth)
Monitor applications
Athena instrumented (same as LBNL booth)
Event display?
First version at LBNL US Computing meeting July 29-31
SC2002 Demo
ATLAS-CMS Demo. Architecture
ATLAS-CMS User Job
Scheduling Policy
ATLAS-CMS Testbed
Visualization (status, physics)
Production Jobs
MOP, GRAT, Grappa
Condor, Python?
Globus, Condor-G?
MDS, Ganglia, Paw/Root
??
Summary
Testbed -> SC2002
Recently refocused testbed activities and plans
Important grid-based production milestone this summer to test middleware using light-weight layered approach to software deployment
Testbed production should naturally lead to Supercomputing 2002 demos
Exploring various integration and cooperation issues - no need to reinvent the wheel
The testbed can provide a lot of resources, hardware and people, when fully grid-enabled
In summary - hardware not limiting problem yet! Middleware coming along. Need serious work on integration and deployment and testing. Shortage of people critical here - lab and university base funding shortages are the limiting factors!!