Tony Doyle
“Executive Summary”, GridPP CB Meeting, IoP, 29 October 2002
Tony Doyle - University of Glasgow
Executive Summary
• Introduction – the Grid is starting to work
• Project Management – Major: via Project Map, now under control
• Resources – Small modifications
• CERN – Manpower OK. Hardware
• DataGrid – Making impact
• Applications – Engaged (posts filled + value added)
• Tier-1/A – Tier-A production mode
• Tier-2 – Latent resources
• Dissemination – UK flagship project
• Future Funding – Preliminary planning
Update (to 10/10/02)
DataGrid Q7 Update
• ATLAS taskforce
– very useful, ongoing; other experiments
• Releases
– avoid a “Big Bang” approach
– separate “Prod” & “Dev”
• 2nd EU Review
– 1 month earlier
• Active in FP6 planning
• Domestic
– Q7 delivered ~150% of contract
– no major issues
Applications Q7 Update
• Review of Project Map Milestones (DB, PC, NB – 9/10/02)
– No major issues
– Feedback to experiments initiated
• GridPP-LCG-DataGrid applications overlap
– e.g. persistency framework
– e.g. future milestones (via change notice)

Effort by work package (Q7 and cumulative)
WP | Title | Funded Effort | Unfunded Effort | Total Effort | Effort Expectation | Effort Shortfall | Q7 Delivered % | Cumulative Delivered %
1 | Workload Management | 0.0 | 0.0 | 0.0 | 1.5 | 1.5 | 0.0% | 55.6%
2 | Data Management | 0.0 | 11.2 | 11.2 | 4.5 | -6.7 | 248.9% | 99.3%
3 | Monitoring Services | 9.0 | 16.1 | 25.1 | 14.4 | -10.7 | 174.3% | 108.2%
4 | Fabric Management | 0.0 | 3.0 | 3.0 | 1.5 | -1.5 | 196.7% | 206.7%
5 | Mass Storage Management | 4.0 | 11.4 | 15.4 | 7.5 | -7.9 | 205.3% | 92.6%
6 | Testbed | 0.0 | 11.0 | 11.0 | 9.0 | -2.0 | 122.2% | 125.0%
7 | Network | 0.0 | 0.0 | 0.0 | 6.0 | 6.0 | 0.0% | 105.8%
TOTALS | | 13.0 | 52.7 | 65.7 | 44.4 | -21.3 | 147.9% | 123.0%
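The rows above are consistent with two simple relations: shortfall = expectation - total effort, and Q7 delivered % = total effort / expectation. The Python sketch below recomputes both from the transcribed rows; the function names are illustrative and not taken from any GridPP tool. Small differences from the slide (WP4 comes out at 200.0% rather than 196.7%, and the total at 148.0% rather than 147.9%) presumably arise because the slide worked from unrounded effort figures.

```python
# Rows transcribed from the effort table above:
# (work package, title, total effort, effort expectation).
ROWS = [
    (1, "Workload Management", 0.0, 1.5),
    (2, "Data Management", 11.2, 4.5),
    (3, "Monitoring Services", 25.1, 14.4),
    (4, "Fabric Management", 3.0, 1.5),
    (5, "Mass Storage Management", 15.4, 7.5),
    (6, "Testbed", 11.0, 9.0),
    (7, "Network", 0.0, 6.0),
]


def shortfall(total: float, expected: float) -> float:
    """Expectation minus delivered effort; negative means over-delivery."""
    return round(expected - total, 1)


def delivered_pct(total: float, expected: float) -> float:
    """Delivered effort as a percentage of the expectation."""
    return round(100.0 * total / expected, 1)


if __name__ == "__main__":
    for wp, title, total, expected in ROWS:
        print(f"WP{wp} {title:<25} shortfall {shortfall(total, expected):6.1f}"
              f"  delivered {delivered_pct(total, expected):6.1f}%")
    tot = sum(r[2] for r in ROWS)
    exp = sum(r[3] for r in ROWS)
    print(f"TOTALS shortfall {shortfall(tot, exp):.1f}"
          f"  delivered {delivered_pct(tot, exp):.1f}%")
```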
GridPP Overview
EDG - UK Contributions
Architecture, Testbed-1, Network Monitoring, Certificates & Security, Storage Element, R-GMA, LCFG, MDS deployment, GridSite, SlashGrid, Spitfire…
Applications (start-up phase)
BaBar, CDF/D0 (SAM), ATLAS/LHCb, CMS, (ALICE), UKQCD
£17m 3-year project funded by PPARC through the e-Science Programme
CERN - LCG (start-up phase)
funding for staff and hardware...
(Figure: GridPP components: CERN, DataGrid, Tier-1/A, Applications, Operations)
http://www.gridpp.ac.uk
£17m++ 3-Year Project
• Five components
– Tier-1/A = Hardware + CLRC ITD Staff
– DataGrid = 15 DataGrid Posts + CLRC PPD Staff
– Applications = 13 Experiments Posts (to interface middleware)
– Operations = Travel (~100 people) + Management + Early Investment
– CERN = 25 LCG posts + Tier-0 + LTA
(Figure, 6/Oct/2002: CERN £5.67m; DataGrid £3.79m; Tier-1/A £3.67m; Applications £2.08m; Operations £1.79m)
Financial Profile
TOTAL FY2001 FY2002 FY2003 FY2004
CERN £5,666,600 £21,625 £1,813,784 £1,923,647 £1,907,544
DataGrid £3,786,302 £591,177 £1,209,807 £1,278,227 £707,092
Tier-1/A £3,651,371 £888,000 £1,117,987 £1,391,399 £253,986
Applications £2,066,018 £62,817 £659,911 £749,032 £594,257
Operations £1,829,647 £473,060 £519,346 £539,566 £297,675
TOTAL £16,999,938 £2,036,679 £5,320,835 £5,881,871 £3,760,553
Project Summary: estimated breakdown by financial year
Financial Breakdown
CERN TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £1,245,800 £0 £300,000 £425,000 £520,800
Staff £4,420,800 £21,625 £1,513,784 £1,498,647 £1,386,744
TOTAL £5,666,600 £21,625 £1,813,784 £1,923,647 £1,907,544
DataGrid (to end 2002Q2)
WP | Cost to GridPP | GridPP Effort Planned (SY) | GridPP Effort Delivered (SY) | Total UK Effort Planned (SY) | Total UK Effort Delivered (SY)
WP1 | £12,900 | 0.25 | 0.42 | 0.50 | 0.42
WP2 | £41,900 | 1.00 | 1.26 | 1.50 | 1.26
WP3 | £161,031 | 2.66 | 3.91 | 4.80 | 5.45
WP4 | £11,450 | 0.25 | 1.30 | 0.50 | 1.30
WP5 | £66,283 | 1.08 | 1.34 | 2.50 | 2.02
WP6 | £151,443 | 2.51 | 4.07 | 3.00 | 4.07
WP7 | £90,040 | 1.55 | 3.00 | 2.00 | 3.00
WP8 | £56,130 | 1.05 | | |
TOTAL | £591,177 | 9.30 | 15.30 | 14.80 | 17.52
Tier-1/A TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £2,438,000 £888,000 £775,000 £775,000 £0
Staff £1,213,371 £0 £342,987 £616,399 £253,986
TOTAL £3,651,371 £888,000 £1,117,987 £1,391,399 £253,986
Experiments
Experiment | DataGrid Posts (SY) | DataGrid Posts Cost | Application Posts (SY) | Application Posts Cost | TOTAL (SY) | TOTAL Value
ATLAS | 4.50 | £234,400 | 5.50 | £306,800 | 10.00 | £541,200
LHCb | 3.00 | £150,700 | 5.42 | £342,318 | 8.42 | £493,018
CMS | 4.50 | £237,100 | 4.21 | £234,060 | 8.71 | £471,160
LHC Total | 12.00 | £622,200 | 15.13 | £883,178 | 27.13 | £1,505,378
BaBar | 0.00 | £0 | 6.88 | £395,120 | 6.88 | £395,120
CDF | 0.00 | £0 | 5.50 | £307,300 | 5.50 | £307,300
D0 | 0.00 | £0 | 5.58 | £323,820 | 5.58 | £323,820
UKQCD | 0.00 | £0 | 3.00 | £156,600 | 3.00 | £156,600
Non-LHC Total | 0.00 | £0 | 20.96 | £1,182,840 | 20.96 | £1,182,840
GRAND TOTAL | 12.00 | £622,200 | 36.08 | £2,066,018 | 48.08 | £2,688,218
Operations TOTAL FY2001 FY2002 FY2003 FY2004
Managers £772,147 £124,060 £248,346 £274,566 £125,175
Travel £790,000 £200,000 £220,000 £220,000 £150,000
Miscellaneous £267,500 £149,000 £51,000 £45,000 £22,500
TOTAL £1,829,647 £473,060 £519,346 £539,566 £297,675
Financial Profile
TOTAL FY2001 FY2002 FY2003 FY2004
CERN £5,666,600 £21,625 £1,813,784 £1,923,647 £1,907,544
DataGrid £3,786,302 £591,177 £1,209,807 £1,278,227 £707,092
Tier-1/A £3,651,371 £888,000 £1,117,987 £1,391,399 £253,986
Applications £2,066,018 £62,817 £659,911 £749,032 £594,257
Operations £1,829,647 £473,060 £519,346 £539,566 £297,675
OC2 TOTAL £16,999,938 £2,036,679 £5,320,835 £5,881,871 £3,760,553
Project Summary: estimated breakdown by financial year
TOTAL FY2001 FY2002 FY2003 FY2004
CERN £5,666,600 £0 £1,239,021 £2,282,111 £2,145,468
DataGrid £3,786,302 £591,177 £1,209,807 £1,278,227 £707,092
Tier-1/A £3,674,009 £888,000 £742,987 £1,766,399 £276,624
Applications £2,077,645 £91,846 £661,304 £738,557 £585,938
Operations £1,794,159 £443,060 £507,695 £523,688 £319,716
OC3 TOTAL £16,998,715 £2,014,083 £4,360,813 £6,588,981 £4,034,838
Project Summary: estimated breakdown by financial year
Re-profile of CERN contribution and Tier-1/A increase in FY2002
Financial Breakdown
OC2
CERN TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £1,245,800 £0 £300,000 £425,000 £520,800
Staff £4,420,800 £21,625 £1,513,784 £1,498,647 £1,386,744
TOTAL £5,666,600 £21,625 £1,813,784 £1,923,647 £1,907,544
Tier-1/A TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £2,438,000 £888,000 £775,000 £775,000 £0
Staff £1,213,371 £0 £342,987 £616,399 £253,986
TOTAL £3,651,371 £888,000 £1,117,987 £1,391,399 £253,986
OC3
CERN TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £1,245,800 £0 £0 £725,000 £520,800
Staff £4,420,800 £0 £1,239,021 £1,557,111 £1,624,668
TOTAL £5,666,600 £0 £1,239,021 £2,282,111 £2,145,468
Tier-1/A TOTAL FY2001 FY2002 FY2003 FY2004
Hardware £2,438,000 £888,000 £400,000 £1,150,000 £0
Staff £1,236,009 £0 £342,987 £616,399 £276,624
TOTAL £3,674,009 £888,000 £742,987 £1,766,399 £276,624
Reduced spend in FY2002 (increase in FY2003-04)
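The size of the re-profile can be read off by differencing the yearly project totals of the two profiles. In this short Python sketch (variable and function names are illustrative), the FY2002 total drops by £960,022 while FY2003 and FY2004 rise by £707,110 and £274,285; FY2001 falls by £22,596, and the net change over the whole project is only -£1,223.

```python
# Yearly project totals (pounds) from the OC2 and OC3 profiles above.
YEARS = ("FY2001", "FY2002", "FY2003", "FY2004")
OC2 = (2_036_679, 5_320_835, 5_881_871, 3_760_553)
OC3 = (2_014_083, 4_360_813, 6_588_981, 4_034_838)


def reprofile(before, after):
    """Change in spend per financial year (after minus before)."""
    return {year: a - b for year, b, a in zip(YEARS, before, after)}


if __name__ == "__main__":
    for year, delta in reprofile(OC2, OC3).items():
        print(f"{year}: {delta:+,d}")
    print(f"net change: {sum(OC3) - sum(OC2):+,d}")
```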
Project Management - 7 Elements
Summary (by Project Map areas)
• Grid success is fundamental for PP
1. CERN = LCG, Grid as a Service.
2. DataGrid = Middleware built upon Globus and Condor-G. Testbed 1 deployed.
3. Applications – complex, need to interface to middleware.
LHC Analyses – ongoing feedback/development.
Other Analyses have immediate requirements. Integrated using Globus, Condor, EDG/SAM tools
4. Infrastructure = Tiered computing to the physicist desktop:
Scale in UK? 1 PByte and 2,000 distributed CPUs (GridPP in Sept 2004)
5. Integration = ongoing with UK e-Science…
6. Dissemination
• Co-operation required with other disciplines/industry
7. Finances – under control, but need to start looking to Year 4…
• Year 1 was a good starting point. First Grid jobs have been submitted…
• Looking forward to Year 2. Web services ahead… but
• Experiments will define whether this experiment is successful (or not)
LCG Creation
(Figure: LCG Phase 1 personnel funded from the Special Contribution, in effective FTEs (weighted by experience), 2001-2005, by source: EU, USA, CERN Mat, Sweden, Israel, Hungary, Portugal, Switzerland, Spain, France, Germany, Italy, UK; compared with Requested. Only identified personnel shown.)
LCG Level 1 Milestones
(Figure: timeline of LCG Level 1 milestones by quarter, 2002-2005, grouped into applications and grid milestones)
• Hybrid Event Store available for general users
• Distributed production using grid services
• First Global Grid Service (LCG-1) available
• Distributed end-user interactive analysis
• Full Persistency Framework
• LCG-1 reliability and performance targets
• “50% prototype” (LCG-3) available
• LHC Global Grid TDR
Robust? Development Infrastructure
• CVS Repository
– management of DataGrid source code
– all code available (some mirrored)
• Bugzilla
• Package Repository
– public access to packaged DataGrid code
• Development of Management Tools
– statistics concerning DataGrid code
– auto-building of DataGrid RPMs
– publishing of generated API documentation
– latest build = Release 1.2 (August 2002)
(Figure: Testbed 1 source code lines by language: java, cpp, ansic, python, perl, sh, csh, sed, sql, makefile)
140,506 Lines of Code, 10 Languages (Release 1.0)
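Code statistics of this kind can be gathered by walking a source tree and tallying lines per language. The Python sketch below is a minimal version of that idea, not the tool actually used for DataGrid: the extension-to-language rules are assumptions (only the language labels follow the chart), and it counts non-blank lines, including comments, which is a cruder measure than a true source-lines-of-code count.

```python
import os
from collections import Counter

# Hypothetical extension-to-language rules; the labels follow the chart
# above, but which extensions map to which label is an assumption.
EXT_TO_LANG = {
    ".java": "java",
    ".cpp": "cpp", ".cc": "cpp", ".cxx": "cpp",
    ".c": "ansic",
    ".py": "python",
    ".pl": "perl", ".pm": "perl",
    ".sh": "sh",
    ".csh": "csh",
    ".sed": "sed",
    ".sql": "sql",
}


def language_of(filename):
    """Return the language label for a file name, or None if not counted."""
    if filename.lower() in ("makefile", "gnumakefile"):
        return "makefile"
    return EXT_TO_LANG.get(os.path.splitext(filename)[1])


def count_lines(root):
    """Count non-blank lines per language for every file under root."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lang = language_of(name)
            if lang is None:
                continue  # not a source file we classify
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                totals[lang] += sum(1 for line in fh if line.strip())
    return totals
```

For example, `count_lines("path/to/checkout")` (a hypothetical path) returns a `Counter` keyed by language label; `sum(...values())` gives the overall line count and `len(...)` the number of languages.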
Robust? Software Evaluation
(Table: evaluation status of middleware components against the key below)
• Resource Broker, Job Desc. Lang., Info. Index, User Interface, Log. & Book. Svc., Job Sub. Svc., Broker Info. API, SpitFire, GDMP, Rep. Cat. API, Globus Rep. Cat.
• Schema, FTree, R-GMA, Archiver Module, GRM/PROVE, LCFG, CCM, Image Install., PBS Info. Prov., LSF Info. Prov.
• SE Info. Prov., File Elem. Script, Info. Prov. Config., RFIO, MSS Staging, Mkgridmap & daemon, CRL update & daemon, Security RPMs, EDG Globus Config.
• PingER, UDPMon, IPerf, Globus2 Toolkit
Key:
ETT Extensively Tested in Testbed
UT Unit Testing
IT Integrated Testing
NI Not Installed
NFF Some Non-Functioning Features
MB Some Minor Bugs
SD Successfully Deployed
Robust? Middleware Testbed(s)
(B. Jones, July 2002)
Testing Activities
1. WPs add unit-tested code to the CVS repository.
2. Run nightly build & automatic tests. Any errors? Yes: fix problems. No: continue.
3. Install on the certification testbed & run backward-compatibility tests. Any errors? Yes: fix problems. No: continue.
4. Candidate beta release, for testing by apps. Any errors? If no, continue.
5. Candidate public release, for use by apps.
Testbeds: “Development” testbed, “Certification” testbed, “Application” testbed, “WP specific” testbeds
Teams: I Team, TSTG, ATG, Apps, WPs
Cover: 24x7; office hours
Validation/Maintenance => Testbed(s)
EU-wide development
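The testing flow above is essentially a chain of gates: a build moves to the next stage only if the current stage reports no errors, and any error sends it back for fixing. The Python sketch below models that gating logic under simple assumptions; `Stage`, `Outcome` and `promote` are hypothetical names, and the checks are stand-ins for the real test suites run on each testbed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A stage's check returns a list of problems; an empty list means "no errors".
Check = Callable[[str], List[str]]


@dataclass
class Stage:
    name: str
    check: Check


@dataclass
class Outcome:
    last_passed: str = "(none)"
    problems: List[str] = field(default_factory=list)

    @property
    def releasable(self) -> bool:
        return not self.problems


def promote(build: str, stages: List[Stage]) -> Outcome:
    """Run a build through the stages in order, stopping at the first failure.

    A failure corresponds to the "Any errors? Yes: fix problems" branch:
    the build goes no further until it is fixed and resubmitted.
    """
    outcome = Outcome()
    for stage in stages:
        problems = stage.check(build)
        if problems:
            outcome.problems = [f"{stage.name}: {p}" for p in problems]
            return outcome
        outcome.last_passed = stage.name
    return outcome


if __name__ == "__main__":
    # Hypothetical checks standing in for real test suites.
    stages = [
        Stage("nightly build and automatic tests", lambda b: []),
        Stage("certification testbed (backward compatibility)", lambda b: []),
        Stage("application testbed (candidate beta)",
              lambda b: ["job submission fails"] if b.endswith("rc1") else []),
    ]
    for build in ("1.2-rc1", "1.2-rc2"):
        result = promote(build, stages)
        verdict = "candidate public release" if result.releasable else "fix problems"
        print(build, "->", verdict, result.problems)
```

Stopping at the first failing stage mirrors the flow on the slide, where later testbeds never see a build that failed an earlier gate.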
Robust? Code Development Issues
• Reverse Engineering (C++ code analysis and restructuring; coding standards) => abstraction of existing code to UML architecture diagrams
• Language choice (currently 10 used in DataGrid)
– Java = C++ minus “features” (global variables, pointer manipulation, goto statements, etc.)
– Constraints (performance, libraries, legacy code)
• Testing (automation, object oriented testing)
• Industrial strength?
• OGSA-compliant?
• O(20 year) Future proof??
(Figure: Grid production (simulation) workflow, with swimlanes for the Production Team, Experiment-Specific Modules, Grid and Physics Application)
• Login; if the actor is proxy-certified: experiment-wide database selection; output-file storage options/preferences (SE, MSS, closest...); define execution criteria (CE, priority...); submit physics application
• Get LFNs for database access; allocate output LFNs; write submission job (JDL?); submit job to Grid
• Job resource match; allocate job ID; record job parameters (JDL, input, ...); optimize CE choice/VO; submit job to CE; submit job to Working Node
• Prepare execution environment (associate PFN-LFN); execute physics application; manage output files and update file catalog (LFN-PFN); record execution info
• Supporting services: VO metadata data description catalog; VO job submission bookkeeping service; VO metadata configuration catalog; VO replica catalog; job execution accounting service
• File management & PFN selection; register/update attributes (LFN) in VO metadata catalog; publish job-related information; management of job-related information; display available resources/JDL; e.g. automatic file replication or file transfer & file catalog update
• Data access: POSIX calls (Open(LFN), Read/Write, Close) or a grid wrapper to POSIX calls; VO database access; Grid access via API
• Application is never recompiled or relinked to run on Grid; access to data is done via standard POSIX calls (?)
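The data-access step above relies on resolving a logical file name (LFN) to a physical file name (PFN) through a replica catalog, then letting the application use ordinary file calls. The Python sketch below illustrates that idea with an in-memory dictionary standing in for the VO replica catalog and a made-up cost table for choosing the "closest" replica. It is not the EDG replica catalog API; all names, LFNs and URLs are hypothetical, and only local `file://` replicas are opened.

```python
from typing import Dict, List

# Hypothetical stand-in for a VO replica catalog: each logical file name
# (LFN) maps to the physical file names (PFNs) of its replicas. All names
# and URLs here are made up for illustration.
REPLICAS: Dict[str, List[str]] = {
    "lfn:/myvo/sim/run042.dat": [
        "gsiftp://se.site-a.example/data/run042.dat",
        "file:///scratch/replicas/run042.dat",
    ],
}

# Hypothetical cost per storage location; lower means "closer".
SITE_COST = {"local": 0, "se.site-a.example": 10}


def site_of(pfn: str) -> str:
    """Return the host of a PFN, or 'local' for file:// URLs."""
    scheme, rest = pfn.split("://", 1)
    return "local" if scheme == "file" else rest.split("/", 1)[0]


def resolve(lfn: str) -> str:
    """Pick the PFN of the closest registered replica of an LFN."""
    pfns = REPLICAS.get(lfn)
    if not pfns:
        raise FileNotFoundError(f"no replica registered for {lfn}")
    return min(pfns, key=lambda p: SITE_COST.get(site_of(p), float("inf")))


def grid_open(lfn: str, mode: str = "r"):
    """Open an LFN with ordinary file semantics via its closest replica.

    Only local (file://) replicas are handled in this sketch; a real wrapper
    would first stage a remote replica in, e.g. over GridFTP.
    """
    pfn = resolve(lfn)
    if site_of(pfn) != "local":
        raise NotImplementedError(f"remote replica not staged: {pfn}")
    return open(pfn[len("file://"):], mode)
```

Because `grid_open` returns an ordinary file object, application code written against standard open/read/write/close calls would not need to change, which is the property the slide questions.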
Dissemination (to other e-scientists)
e-Science ‘All Hands’ Meeting held at Sheffield, 2-4 September 2002
– ~300 people in total
– ~19 GridPP people attended
– ~13 GridPP ‘Abstracts’ accepted (total ~100)
– ~10 GridPP Posters displayed
– 4 GridPP Invited talks
– 3 GridPP Demonstrations
(Figure: GridPP Web Page Requests)
GridPP Posters
Posters: ATLAS, SAM, OptorSim, GridPP, Tier-1/A, ScotGrid, BaBar, LHCb, CMS, Storage
10 Posters for NeSC Opening and e-Science All Hands Meeting
Web Pages using Certificates and GridSite
TestBed Status, 28 Oct 2002 15:22
Tier-1/A: Year of Growth
(Figure: CSF Linux weekly CPU use, 01/01/2001 to 23/09/2002, in normalised P450 CPU hours)
(Figure: CSF Linux accounts since April 2002: delphiuk 4%, lhcb 1%, atlas 13%, antaresw 5%, theory 14%, cms 2%, h1 8%, bfactory 46%, sno 3%, zeus 4%)
(Figure: GridPP Certificates issued, Personal and Server, Oct-01 to Aug-02)
(Figure: BaBar Use)
Tier-1/A development
In the next six months the main targets will be:
• Integration within the Grid (or Grids).
• Improving monitoring and error-handling.
• Developing support structure including longer hours of cover.
• Optimising the service for LHC Data Challenges.
• Completing next year’s hardware procurement.
• Migration to UK e-Science CA (start October 30th)
• Beyond this, we recognise support of the Tier-1/A centre as our highest priority for future funding
GridPP – Achievements and Issues
• 1st Year Achievements
– Complete Project Map (Applications : Middleware : Hardware)
– Fully integrated with EU DataGrid, LCG and SAM Projects
– Rapid middleware deployment/testing
– Integrated US-EU applications development, e.g. BaBar+EDG
– Roll-out document for all sites in the UK (Core Sites, Friendly Testers, User Only)
– Testbed up and running at 16 sites in the UK
– Tier-1 Deployment
– 250 GridPP Certificates issued (personal + server)
– First significant use of Grid by an external user (LISA simulations) in May 2002
– Web page development (GridSite)
• Status: 28 Oct 2002 15:22 GMT
– monitor and improve testbed deployment efficiency, short-term (10 min) and long-term (monthly)…
• Issues for Year 2
– Importance of EU-wide development of middleware and integration with US-led approach
– Integrated Testbed for use/testing by all applications
– Common “integration” layer between middleware and application software
– Integrated US-EU applications development
– Tier-1 Grid Production Mode
– Tier-2 Definitions and Deployment
– Integrated Tier-1 + Tier-2 Testbed
– Transfer to UK e-Science CA
– Integration with other UK projects, e.g. AstroGrid, MyGrid…
– Publication of YOUR work
GridPP User Interface: Pulling it all together…
GridPP Web Links 2003?
Demonstration elements: GridSite, GUIDO, R-GMA,… Short- and Long-Term Monitoring, Local and WAN Monitoring.
SWOT Analysis
• Strengths
– Excellent people, sense of community, well resourced, well respected, integrated into EU development, significant role in US development, good website
• Weaknesses
– Slow to get going, have not yet convinced the UK PP community, limited industry involvement
• Opportunities
– Further funding, potential for wide impact (talks, demos etc.), sharing of resources by all experiments
• Threats
– EDG software delays, UK PP community no longer supporting us
Reporting to Oversight (and Steering) Committee
• Exceeding 1st year targets
• Delivering.. and seen to be delivering via the Project Map
• 1st year Grid and PP success
The Project has now completed one year of operations, with almost all posts filled in the UK and interviews completed at CERN. There is active engagement across 16 UK sites, with more than 120 Grid users on 130 servers, underpinned by a strong testbed team and a successful website. Valuable contributions are being made to Grid middleware, and interfaces to each of the applications are being developed. DataGrid release 1.2 is currently being deployed. The project is widely recognised, is engaged with the UK e-Science programme and is starting to make an impact on developments within the international Grid community. Project and resource management is under control, the Tier-1/A centre is recognised as making a significant international impact, and a plan for Tier-2 site development has been prepared. We approach year 2 expecting to build upon these initial foundations and extend the Grid in terms of functionality (via middleware development), robustness (through testbed development), accountability (necessary to ensure sharing of distributed resources), accessibility (via integration of farms for all experiments and a wider user base) and risk management (via the establishment of a risk register). This summary highlights issues with respect to project management, resource monitoring, LHC Computing Grid (LCG) development at CERN, DataGrid and application developments, Tier-1/A and UK central support, the Tier-2 development plan, and dissemination.