ATLAS on Grid3/OSG, R. Gardner, December 16, 2004


Page 1

ATLAS on Grid3/OSG

R. Gardner

December 16, 2004

Page 2

ATLAS Applications

• Pythia Generation

• Geant4 simulation

• Pileup

• Digitization

• Reconstruction

Page 3

ATLAS Users

• DC2 production team
  – Managed production
  – High priority
  – 7 users

• User production
  – Opportunistic production and reconstruction
  – 3 users
  – Growing

Page 4

#  Job status  Capone total
1  failed      33165
2  finished    90534
3  running     101
4  submitted   42

ATLAS DC2 on Grid3

• Production statistics on Grid3 (end of November 2004)

• Overall “success” rate: 74%
  – Through September: 66%
  – During the last 2 months: finished 53163, failed 14353, success rate 78%

• Our results have improved since September

• Only 2-3 submit clients now (10-20 in September)
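The rates above appear to be finished jobs as a share of all jobs that reached a final state (finished + failed). A minimal Python sketch, assuming that definition and assuming the "through September" counts equal the cumulative Capone totals minus the last-two-months counts, recomputes them from the numbers on this slide. The `Counts` class is a hypothetical helper, not part of Capone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Finished and failed job counts for one period (hypothetical helper)."""
    finished: int
    failed: int

    @property
    def success_rate(self) -> float:
        """Finished jobs as a percentage of finished + failed jobs."""
        done = self.finished + self.failed
        return 100.0 * self.finished / done if done else 0.0

    def __sub__(self, other: "Counts") -> "Counts":
        return Counts(self.finished - other.finished, self.failed - other.failed)


# Cumulative Capone totals at the end of November 2004 (job-status table).
cumulative = Counts(finished=90534, failed=33165)
# Counts quoted for the last two months.
last_two_months = Counts(finished=53163, failed=14353)
# Assumption: "through September" = cumulative totals minus the last two months.
through_september = cumulative - last_two_months

for label, counts in [
    ("Overall", cumulative),
    ("Through September", through_september),
    ("Last 2 months", last_two_months),
]:
    print(f"{label:<18} finished={counts.finished:>6} failed={counts.failed:>6} "
          f"success={counts.success_rate:.1f}%")
```

Under this definition the recomputed rates are about 73.2%, 66.5% and 78.7%, within roughly a point of the 74%, 66% and 78% quoted on the slide.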

Page 5

Job Success Rate on GRID3

Month       Passed   Failed   Success rate
July        8799     6676     57%
August      17083    9448     64%
September   17283    7717     69%
October     26600    5186     84%
November    21869    5038     81%

Key factors in improved success rate:
• Experienced team using common submit hosts
• Quicker response to large-scale site/network/hardware failures

Can we improve more?
• Some shifts reach >95% success, others <50%
• Automatic throttle for failures? But we would still lose all running jobs
• Do we care?

K. De

+ improvements to Capone/GCE
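The slide asks whether an automatic throttle for failures would help. One possible policy, sketched here purely as a hypothetical illustration (the class name, window size, threshold and minimum sample count are all assumptions, not a Capone or GCE feature): keep the outcomes of the most recent jobs per site and stop sending new jobs to a site whose recent failure fraction exceeds a threshold.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailureThrottle:
    """Hypothetical per-site failure throttle (illustration only, not a Capone feature).

    Keeps the outcomes of the most recent `window` jobs per site and refuses new
    submissions to a site once its recent failure fraction exceeds a threshold.
    """

    def __init__(self, window: int = 50, max_failure_fraction: float = 0.5,
                 min_samples: int = 10) -> None:
        self.max_failure_fraction = max_failure_fraction
        self.min_samples = min_samples
        # A bounded deque drops the oldest outcome automatically when it is full.
        self._history: Dict[str, Deque[bool]] = defaultdict(lambda: deque(maxlen=window))

    def record(self, site: str, succeeded: bool) -> None:
        """Record the final state of one job at `site`."""
        self._history[site].append(succeeded)

    def failure_fraction(self, site: str) -> float:
        """Fraction of the recorded recent jobs at `site` that failed."""
        outcomes = self._history.get(site)
        if not outcomes:
            return 0.0
        return sum(not ok for ok in outcomes) / len(outcomes)

    def allow_submission(self, site: str) -> bool:
        """True unless the site has enough recent history and too many failures."""
        outcomes = self._history.get(site, ())
        if len(outcomes) < self.min_samples:
            return True  # not enough evidence to throttle yet
        return self.failure_fraction(site) <= self.max_failure_fraction


throttle = FailureThrottle(window=20, max_failure_fraction=0.5, min_samples=10)
for ok in [False] * 8 + [True] * 2:
    throttle.record("UWMadison", ok)
print(throttle.allow_submission("UWMadison"))
```

In this sketch the throttle only gates new submissions; jobs already queued or running at a failing site are not protected, which is exactly the limitation the slide raises. Because the history is a sliding window, a site is allowed again once enough recent jobs succeed.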

Page 6

#   CE gatekeeper         Finished+failed  Finished  Failed  Success rate (%)
1   BU_ATLAS_Tier2        19395            16349     3046    84.29
2   UTA_dpcc              19214            14634     4580    76.16
3   UC_ATLAS_Tier2        13285            11196     2089    84.28
4   BNL_ATLAS             11261            8993      2268    79.86
5   IU_ATLAS_Tier2        10528            8403      2125    79.82
6   UM_ATLAS              9434             6054      3380    64.17
7   BNL_ATLAS_BAK         6061             4578      1483    75.53
8   UBuffalo_CCR          4654             3992      662     85.78
9   PDSF                  5075             3590      1485    70.74
10  FNAL_CMS              3857             2222      1635    57.61
11  CalTech_PG            3136             2178      958     69.45
12  UCSanDiego_PG         2828             2101      727     74.29
13  FNAL_CMS2             2157             1506      651     69.82
14  SMU_Physics_Cluster   1462             969       493     66.28
15  BU_AGT_Tier2          975              820       155     84.10
16  PSU_Grid3             769              583       186     75.81
17  OU_OSCER              843              575       268     68.21
18  UFlorida_PG           946              451       495     47.67
19  Rice_Grid3            569              370       199     65.03
20  UWMadison             803              363       440     45.21
21  UNM_HPC               502              347       155     69.12
22  OU_OSCER_LSF          412              251       161     60.92

ATLAS ProdDB
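The success-rate column is finished / (finished + failed) per CE gatekeeper. A short sketch recomputes it from the transcribed counts (it reproduces the quoted percentages to two decimals) and lists the sites below a chosen cut. The 60% threshold and the function names are arbitrary illustrations, not values or tools from the slide.

```python
# (CE gatekeeper, finished jobs, failed jobs), transcribed from the ATLAS ProdDB table.
SITE_COUNTS = [
    ("BU_ATLAS_Tier2", 16349, 3046),
    ("UTA_dpcc", 14634, 4580),
    ("UC_ATLAS_Tier2", 11196, 2089),
    ("BNL_ATLAS", 8993, 2268),
    ("IU_ATLAS_Tier2", 8403, 2125),
    ("UM_ATLAS", 6054, 3380),
    ("BNL_ATLAS_BAK", 4578, 1483),
    ("UBuffalo_CCR", 3992, 662),
    ("PDSF", 3590, 1485),
    ("FNAL_CMS", 2222, 1635),
    ("CalTech_PG", 2178, 958),
    ("UCSanDiego_PG", 2101, 727),
    ("FNAL_CMS2", 1506, 651),
    ("SMU_Physics_Cluster", 969, 493),
    ("BU_AGT_Tier2", 820, 155),
    ("PSU_Grid3", 583, 186),
    ("OU_OSCER", 575, 268),
    ("UFlorida_PG", 451, 495),
    ("Rice_Grid3", 370, 199),
    ("UWMadison", 363, 440),
    ("UNM_HPC", 347, 155),
    ("OU_OSCER_LSF", 251, 161),
]


def success_rate(finished: int, failed: int) -> float:
    """Finished jobs as a percentage of finished + failed jobs."""
    total = finished + failed
    return 100.0 * finished / total if total else 0.0


def sites_below(threshold_pct: float):
    """(site, success rate) pairs below `threshold_pct`, worst site first."""
    rates = [(site, success_rate(done, bad)) for site, done, bad in SITE_COUNTS]
    return sorted((pair for pair in rates if pair[1] < threshold_pct),
                  key=lambda pair: pair[1])


for site, rate in sites_below(60.0):
    print(f"{site:<20} {rate:6.2f}%")
```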

Page 7

Detailed Job Failures (un-normalized)

Failure type   Total through Nov.   Total through Sep.   Last 2 months
Submission     894                  472                  422
Execution      428                  428                  0
Post Run       10131                1147                 8984
Stage-Out      10833                8037                 2796
RLS            1065                 989                  76
Capone         3975                 2725                 1250
Windmill       564                  57                   507
Other          5225                 5139                 86
TOTAL          33165                19303                13862
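Each category splits into an earlier period (through September) and a recent one (the last two months), so a quick consistency check plus a ranking of the recent column shows where recent failures concentrate. A minimal sketch over the transcribed per-category counts; the TOTAL row is not used, and the function names are illustrative.

```python
# Failure counts by category, transcribed from the table:
# (total through Nov., total through Sep., last 2 months)
FAILURES = {
    "Submission": (894, 472, 422),
    "Execution": (428, 428, 0),
    "Post Run": (10131, 1147, 8984),
    "Stage-Out": (10833, 8037, 2796),
    "RLS": (1065, 989, 76),
    "Capone": (3975, 2725, 1250),
    "Windmill": (564, 57, 507),
    "Other": (5225, 5139, 86),
}


def inconsistent_categories(table):
    """Categories whose total differs from (through Sep.) + (last 2 months)."""
    return [name for name, (total, until_sep, recent) in table.items()
            if total != until_sep + recent]


def top_recent(table, n):
    """The n categories with the most failures in the last two months."""
    return sorted(table, key=lambda name: table[name][2], reverse=True)[:n]


print(inconsistent_categories(FAILURES))
print(top_recent(FAILURES, 3))
```

With the transcribed data every category row is self-consistent, and post-run and stage-out failures dominate the recent period.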

Page 8

Status of GRID3 Jobs

Dataset                     evgen         simul         digi          pile-up
                            Done   %      Done   %      Done   %      Done   %
dc2.003003.B1_jets_180      100    100%   19998  100%   11899  60%    14833  74%
dc2.003028.A9_susy          400    100%   11409  71%    7992   50%
dc2.003034.J1_Pt_17_35      2      100%   400    100%   400    100%
dc2.003035.J2_Pt_35_70      2      100%   400    100%   400    100%
dc2.003036.J3_Pt_70_140     2      100%   400    100%   400    100%
dc2.003037.J4_Pt_140_280    2      100%   400    100%   400    100%
dc2.003038.J5_Pt_280_560    2      100%   400    100%   400    100%
dc2.003039.J6_Pt_560_1120   2      100%   400    100%   400    100%
dc2.003040.J7_Pt_1120_2240  1      100%   200    100%   200    100%
dc2.003041.J8_Pt_2240       1      100%   200    100%   200    100%
dc2.003043.B2_gamjet                      4000   100%   3990   100%
dc2.003054.B3_Bmumu                       4300   86%           0%
dc2.003080.B4_jets17                      9606   96%           0%

To Do – extra A9 simulation, some digitization and some B1 pile-up

Note – also waiting for some B3 and B4 input evgen files from LCG

K. De
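The "To Do" note follows from reading the status table stage by stage. A minimal sketch, assuming the chain evgen → simul → digi, with pile-up also taking simul output as its input (an assumption about the DC2 chain, not stated on the slide), lists each dataset's incomplete stages and flags those still waiting on an incomplete input stage. Only three rows whose column placement is unambiguous are transcribed; the dictionaries and function names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed input stage for each processing stage (None = no input within this chain).
STAGE_INPUT: Dict[str, Optional[str]] = {
    "evgen": None,
    "simul": "evgen",
    "digi": "simul",
    "pile-up": "simul",
}

# Percent complete per stage, transcribed from the Grid3 status table.
STATUS: Dict[str, Dict[str, int]] = {
    "dc2.003003.B1_jets_180": {"evgen": 100, "simul": 100, "digi": 60, "pile-up": 74},
    "dc2.003028.A9_susy": {"evgen": 100, "simul": 71, "digi": 50},
    "dc2.003034.J1_Pt_17_35": {"evgen": 100, "simul": 100, "digi": 100},
}


def outstanding(stages: Dict[str, int]) -> List[str]:
    """Stages listed for a dataset that are not yet 100% done, in chain order."""
    return [stage for stage in STAGE_INPUT if stages.get(stage, 100) < 100]


def waiting_on_input(stages: Dict[str, int]) -> List[str]:
    """Outstanding stages whose input stage is itself still incomplete."""
    return [
        stage for stage in outstanding(stages)
        if STAGE_INPUT[stage] is not None and stages.get(STAGE_INPUT[stage], 100) < 100
    ]


for name, stages in STATUS.items():
    print(name, "outstanding:", outstanding(stages),
          "waiting on input:", waiting_on_input(stages))
```

For these rows the sketch reports remaining A9 simulation (with A9 digitization held back by it) and remaining B1 digitization and pile-up, consistent with the To Do note.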

Page 9

ATLAS historical use

ACDC archive

Page 10

ATLAS Jobs by site

ACDC archive

Page 11

ATLAS Production - Number of Jobs - 30 November

(Plot: number of jobs, 0 to 300,000, versus days, with series for LCG, NorduGrid, Grid3 and Total.)

Page 12

Grid3/OSG Resource Availability

• ATLAS expects to run continuous production from now through 2005

• This activity consists of:
  – Completion of DC2
  – Production for the Rome physics workshop in June
  – User production via Capone clients
  – Distributed analysis via ADA

• We expect the trend toward resource saturation to continue as more users are equipped with job submission tools

Page 13

Some OSG Issues

• Managed storage is now the biggest problem facing continued DC2 production
  – For both access and space management

• Authorization
  – Role-based access rights, queue priorities
  – Policy infrastructure, publication

• Accounting service
  – User level: which resources have been used
  – CPU and storage over an arbitrary time period

• Operations
  – Extend the operations protocol between the BNL Tier1 and the iGOC/OSG operations activity
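The accounting item asks for per-user usage of CPU and storage over an arbitrary time period. A minimal sketch of that query, assuming a hypothetical per-job usage record (user, start time, CPU hours, storage in GB); it does not describe any existing OSG accounting service, and the users "alice" and "bob" are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    """Hypothetical per-job accounting record; all field names are illustrative."""
    user: str
    start: datetime
    cpu_hours: float
    storage_gb: float


def usage_by_user(records: Iterable[UsageRecord], begin: datetime,
                  end: datetime) -> Dict[str, Dict[str, float]]:
    """Total CPU hours and storage per user for jobs that started in [begin, end).

    Simplification: each job is charged entirely to the period in which it started.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"cpu_hours": 0.0, "storage_gb": 0.0})
    for rec in records:
        if begin <= rec.start < end:
            totals[rec.user]["cpu_hours"] += rec.cpu_hours
            totals[rec.user]["storage_gb"] += rec.storage_gb
    return dict(totals)


records = [
    UsageRecord("alice", datetime(2004, 11, 1), cpu_hours=10.0, storage_gb=2.0),
    UsageRecord("alice", datetime(2004, 11, 20), cpu_hours=5.0, storage_gb=1.0),
    UsageRecord("bob", datetime(2004, 12, 5), cpu_hours=7.0, storage_gb=3.0),
]
november = usage_by_user(records, datetime(2004, 11, 1), datetime(2004, 12, 1))
print(november)
```

A real service would also need to split long-running jobs across period boundaries and account for storage held over time rather than per job; the sketch leaves both out.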