LHCOPN operational working group report
Guillaume Cessieux (FR-CCIN2P3 / EGEE networking support) on behalf of the Ops WG
LHCOPN meeting, 2009-01-15, Berlin
Background
• LHCOPN meeting in Copenhagen, 2008-10-16/17
– Test procedures for backup paths agreed
– Feedback requested
– Roadmap for implementation needed
• One LHCOPN Ops meeting in mid-December
– http://indico.cern.ch/conferenceDisplay.py?confId=44050
– Very productive (20 actions + 15 actions for GGUS)
GCX - LHCOPN meeting - 2009-01-15 2
Agenda
• The operational model itself
– Main feedback reviewed
– Main changes to the model
– Areas of weakness
• Implementation status and updates
– Tools
– Roadmap
– Pending & next steps
1- THE OPERATIONAL MODEL
https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
Ops model in one slide
• Federated with key responsibilities on T1s
– On top of what currently exists
• Information centralised (twiki & GGUS)
[Diagram: Site A, Site B, NREN A, NREN B, NREN C, LHCOPN TTS (GGUS), all sites and users, with interactions numbered 1 to 4]
Overview of site feedback
Site            Remark
CA-TRIUMF       No major issue
CH-CERN         Ops wg member
DE-KIT          Ops wg member
ES-PIC          Ops wg member
FR-CCIN2P3      Ops wg member
IT-INFN-CNAF    No answer
NDGF
NL-T1
TW-ASGC         No clear agreement
UK-T1-RAL       Ops wg member & confirmed
US-FNAL-CMS     No answer
US-T1-BNL       No answer
Site feedback
• Fear of additional load for small events
– Wise thresholds [> 1 hour || > 5 times an hour]
• Lack of accuracy of the Ops model on twiki
– Initially a high level view
– Point out to us what is still not detailed enough
• Many details
– Open tickets and then investigate, or the contrary
– Flexible model
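The thresholds quoted for small events, [> 1 hour || > 5 times an hour], can be read as a simple rule for deciding when an event deserves a ticket. The sketch below is one possible reading, not the working group's implementation: the function name `needs_ticket`, the representation of outages as (start, end) pairs, and the use of a sliding one-hour window are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

MAX_DURATION = timedelta(hours=1)   # "> 1 hour"
MAX_EVENTS_PER_HOUR = 5             # "> 5 times an hour"


def needs_ticket(outages, now):
    """Return True if the outages seen on one link cross either threshold.

    `outages` is a list of (start, end) datetime pairs (hypothetical format);
    `end` is None while the outage is still ongoing.
    """
    window_start = now - timedelta(hours=1)
    recent = 0
    for start, end in outages:
        # An ongoing outage is measured up to `now`.
        duration = (end or now) - start
        if duration > MAX_DURATION:
            return True
        if start >= window_start:
            recent += 1
    # Strictly more than 5 events within the last hour.
    return recent > MAX_EVENTS_PER_HOUR
```

With this reading, a handful of short flaps stays below the thresholds, while a sixth flap within the hour or a single outage longer than one hour triggers a ticket.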
Network provider feedback
• Where is the E2ECU?
• Hard to understand the twiki
– Balance between complexity and accuracy
• Low robustness
• Federated model cannot work seriously in a stable mode
• Inappropriate way to operate such a network
• Hot potatoes, cost, distributed ownership of trouble
• “You are not prepared for the worst”
– Responsibilities will be highlighted based on cost model
Grid feedback
• Communication channel to the Grid to be studied
– Different user communities to be targeted
– Grid data contacts to be nominated
• Performance issues are very important
Changes on the model (1/2)
• Much vagueness removed
– Reasonable, major, suitable...
• Notification: no longer all sites but affected ones
• Sample common use cases provided
– https://twiki.cern.ch/twiki/bin/view/LHCOPN/OpsModelUseCases
• Quality assessment by CH-CERN
– When?
– Infrastructure and operation
• Suitable data to be available
Changes on the model (2/2)
• Responsibilities highlighted
– Outages on links between T0 and T1s are the responsibility of the T1s (which ordered the link)
– Responsibility for outages on T1-T1 links is being studied (should be mapped from existing contracts by studying the cost model: who pays what, where)
– Responsibility for a GGUS ticket lies with the single site to which the ticket is assigned
• « You take responsibility for what you ordered »
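The responsibility rule for link outages can be written down in a few lines. This is only a sketch of the rule as stated on the slide: the function name `outage_owner` is hypothetical, and identifying the T0 by the site name CH-CERN is an assumption made for illustration. T1-T1 links return None because their responsibility was still being studied.

```python
T0 = "CH-CERN"  # assumption: the T0 is identified by its site name


def outage_owner(site_a, site_b):
    """Return the site responsible for an outage on the link site_a <-> site_b.

    T0-T1 link: the T1, which ordered the link, is responsible.
    T1-T1 link: None, since responsibility was still under study.
    """
    if site_a == site_b:
        raise ValueError("a link needs two distinct sites")
    if site_a == T0:
        return site_b
    if site_b == T0:
        return site_a
    return None
```

For example, `outage_owner("CH-CERN", "DE-KIT")` gives "DE-KIT", while a link between two T1s gives None until the cost model study settles the question.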
Areas of weakness
• Robustness to be really ensured
– Will sites play the game?
– Is quality assessment sufficient protection against passivity of sites?
• Grid interactions
– They have to provide us with clear communication channels
2- IMPLEMENTATION
Tools status (1/4)
• Global information repository = CERN twiki
https://twiki.cern.ch/twiki/bin/view/LHCOPN/WebHome
– Deeply reorganised
– With private part
• TTS access details, statistics reports…
• Change management database will be in the twiki
– https://twiki.cern.ch/twiki/bin/view/LHCOPN/ChangeManagementDatabase
– Acts as LHCOPN’s technical logbook
[Diagram: Global web repository (Twiki) containing operational procedures, operational contacts, technical information, change management DB and statistics reports]
Tools status (2/4)
• LHCOPN trouble ticket system = GGUS dedicated helpdesk
– Access previously opened to the ops working group
• First review done and requests sent 2008-12-15
• Group certificate?
– Really taking shape
– Next release = first production usable release
• 2009-02-01
[Diagram: LHCOPN TTS (GGUS)]
Tools status (3/4)
• Around GGUS
– 15 pending actions
• Details but also key things for production use
– E-mail reminders
• A weekly reminder of GGUS tickets assigned to a site and still open
• A weekly reminder of GGUS tickets submitted by a site and still open
– E-mail notifications
• By default only to impacted sites and to the site to which the ticket is assigned (if different)
• More notification options: no notification, or to all
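The reminder and notification rules around GGUS can be illustrated with a small sketch. It does not use any GGUS API: the `Ticket` class, its field names and the option values "impacted", "none" and "all" are hypothetical, chosen only to mirror the three notification options and the two weekly reminders described on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical, minimal view of a ticket (not the GGUS data model)."""
    ticket_id: int
    submitter: str
    assigned_to: str
    impacted: list = field(default_factory=list)
    is_open: bool = True
    notify: str = "impacted"  # "impacted" (default), "none" or "all"


def recipients(ticket, all_sites):
    """Sites to notify by e-mail for one ticket."""
    if ticket.notify == "none":
        return set()
    if ticket.notify == "all":
        return set(all_sites)
    # Default: impacted sites plus the assigned site (if different).
    return set(ticket.impacted) | {ticket.assigned_to}


def weekly_reminders(tickets):
    """Map each site to the open tickets assigned to it and submitted by it."""
    digest = defaultdict(lambda: {"assigned": [], "submitted": []})
    for t in tickets:
        if not t.is_open:
            continue  # closed tickets never appear in a reminder
        digest[t.assigned_to]["assigned"].append(t.ticket_id)
        digest[t.submitter]["submitted"].append(t.ticket_id)
    return dict(digest)
```

Using a set for the recipients means the assigned site is not notified twice when it is also one of the impacted sites.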
Tools status (4/4)
• LHCOPN Planning/Calendar - Ongoing
– Automatic export of GGUS tickets in open iCalendar standard format (.ics)
– And a web instance of the calendar
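To give an idea of what an export of tickets to the iCalendar format (.ics) involves, here is a minimal sketch using only the Python standard library. The ticket dictionaries (keys `id`, `start`, `end`, `summary`), the PRODID string and the UID domain are assumptions for illustration; the actual GGUS export is not described in these slides. The sketch emits one VEVENT per ticket and skips optional details of the standard such as folding of long lines.

```python
from datetime import datetime, timezone


def _fmt(dt):
    """Format a timezone-aware datetime as an iCalendar UTC date-time."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape(text):
    """Escape characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def tickets_to_ics(tickets, now):
    """Render hypothetical ticket dicts as the text of an .ics file."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//LHCOPN calendar sketch//EN",  # placeholder
    ]
    for t in tickets:
        lines += [
            "BEGIN:VEVENT",
            f"UID:ticket-{t['id']}@example.invalid",  # placeholder domain
            f"DTSTAMP:{_fmt(now)}",
            f"DTSTART:{_fmt(t['start'])}",
            f"DTEND:{_fmt(t['end'])}",
            f"SUMMARY:{_escape(t['summary'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

The returned string can be written to a file with an .ics extension and imported into a calendar client, which is also a convenient way to feed a web instance of the calendar.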
Other
• New link IDs for “hidden” links that can deeply affect the LHCOPN
– DE-KIT-I-II-LHCOPN-001, CH-CERN-I-II-LHCOPN-001, IT-INFN-CNAF-MIL-BOL-LHCOPN-001, TW-ASGC-AMS-TPE-LHCOPN-001, TW-ASGC-AMS-CHI-LHCOPN-001, TW-ASGC-CHI-TPE-LHCOPN-001
• Key dependencies: Monitoring
– Soon trustworthy?
– ASPDrawer
– BGP monitoring
• Deploy it fully, hosted by CERN, integrated within MDM?
– cf. tomorrow’s talk
– DownCollector’s LHCOPN flavour
https://ccenoc.in2p3.fr/DownCollector/
Proposed roadmap for implementation
[Timeline figure: 2009, months 1 to 11]
• Milestones: January’s LHCOPN meeting; first public release of LHCOPN TTS; April’s LHCOPN meeting; key improvements and adjustments of the model; July’s LHCOPN meeting; first complete assessment and final adjustments; LHC startup
• Phases: trial version (model optional); production version (model compulsory); final production version (model compulsory)
Next steps
• Gather GGUS access details
– Table to be filled in on twiki
https://twiki.cern.ch/twiki/bin/view/LHCOPN/TTSdetails
• “Test” tickets, notifications and twiki accesses
• Dissemination around the ops model?
– Presentation and “training”?
– Target: 12 router operators?
– Define KPI
Pending
Ops model:
• Finalise implementation, test, disseminate, assess, improve
Tools:
• GGUS production usable release (2009-02-01)
– And accesses
• Calendar
Others:
• Monitoring, quality assessment, unified authentication
Conclusion
• Model itself
– Complex high level view, but flexible
– Robustness to be ensured
– Need commitment from sites
• Can drive improvement of the model
• Implementation
– Tools taking shape
– Tighten schedule to match potential LHC start-up
Questions & discussion