LHCC Comprehensive Review, November 2007

LHCOPN Networking Status
David Foster, Head, Network and Communications Systems Group, CERN IT-CS

Page 1: LHCC Comprehensive Review November 2007

David Foster, CERN

LHCC Comprehensive Review, November 2007

LHCOPN Networking Status

David Foster
Head, Network and Communications Systems Group

CERN IT-CS

Page 2: LHCC Comprehensive Review November 2007

Information

• All technical content is on the LHCOPN Twiki: http://lhcopn.cern.ch
• Coordination Process
  – LHCOPN Meetings (every 3 months)
• Active Working Groups
  – Routing
  – Monitoring
  – Operations
  – Active Interfaces to External Networking Activities
    • European Network Policy Groups
    • US Research Networking
    • Grid Deployment Board
    • LCG Management Board
    • EGEE

Page 3: LHCC Comprehensive Review November 2007

Overview

• LHC Wide Area Networking
  – LHCOPN Mission
  – Current Status
  – Production
  – Issues and Risks
• Not Covered
  – CERN General Purpose Networking
  – Accelerator and Experiment Networks
  – Other Communications Systems

Page 4: LHCC Comprehensive Review November 2007

Mission

• To assure the T0-T1 transfer capability.
  – Essential for the Grid to distribute data out to the T1s.
  – Capacity must be large enough to deal with most situations, including “catch up”.
  – The excess capacity can be used for T1-T1 transfers.
    • Lower priority than T0-T1
    • May not be sufficient for all T1-T1 requirements
• Resiliency Objective
  – No single failure should cause a T1 to be isolated.
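The “catch up” requirement can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a backlog builds up at the nominal T0-T1 rate during an outage and is then drained using only the spare capacity of the link. The 4 Gbit/s nominal rate and the 24-hour outage are illustrative values, not figures from this review, and both function names are hypothetical.

```python
def backlog_after_outage_tb(nominal_gbps, outage_hours):
    """Data (in TB, decimal units) that accumulates while the link is down."""
    gbit = nominal_gbps * outage_hours * 3600
    return gbit / 8 / 1000  # Gbit -> GB -> TB


def catch_up_hours(backlog_tb, link_gbps, nominal_gbps):
    """Hours needed to drain a backlog while nominal traffic keeps flowing."""
    spare_gbps = link_gbps - nominal_gbps
    if spare_gbps <= 0:
        raise ValueError("no spare capacity: the backlog can never be cleared")
    backlog_gbit = backlog_tb * 8 * 1000  # TB -> Gbit
    return backlog_gbit / spare_gbps / 3600


# Illustrative assumption: 10G link, 4 Gbit/s nominal rate, 24 h outage.
backlog = backlog_after_outage_tb(nominal_gbps=4, outage_hours=24)
hours = catch_up_hours(backlog, link_gbps=10, nominal_gbps=4)
print(f"backlog {backlog:.1f} TB, catch-up {hours:.1f} h")
```

The closer the nominal rate is to the link capacity, the longer the catch-up period becomes, which is why headroom above the nominal rate matters.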

Page 5: LHCC Comprehensive Review November 2007

GÉANT2: Consortium of 34 NRENs

Multi-Wavelength Core (to 40) + 0.6-10G Loops

Dark Fiber Core Among 16 Countries: Austria, Belgium, Bosnia-Herzegovina, Czech Republic, Denmark, France, Germany, Hungary, Ireland, Italy, Netherlands, Slovakia, Slovenia, Spain, Switzerland, United Kingdom

22 PoPs, ~200 Sites
38k km Leased Services, 12k km Dark Fiber
Supporting Light Paths for LHC, eVLBI, et al.

H. Doebbeling

Page 6: LHCC Comprehensive Review November 2007

OPN Status Summary, November 2007

Link    Status          Nominal E2E Capacity     Changes Expected
BNL     OPN Production  10G (Colt)               -
FNAL    OPN Production  10G (Qwest)              -
TRIUMF  OPN Production  5G + 1G backup link      -
ASGC    OPN Production  2x1G (2.5G+10G to AMS)   10G via GN2 + IP peering with GN2, Q4 07
NDGF    OPN Production  10G                      Connected to Nordunet. Connect to NDGF OPN, Q1 08
SARA    OPN Production  10G                      Still using GEANT/IP, Q4 07
RAL     OPN Production  10G                      -
FZK     OPN Production  10G                      -
CNAF    OPN Production  10G                      -
IN2P3   OPN Production  10G                      -
PIC     OPN Production  10G                      -

Page 7: LHCC Comprehensive Review November 2007

USLHCNet, November 2007

Link                   Status          Nominal E2E Capacity   Changes Expected
CERN - MANLan          OPN Production  10G (Colt)             -
CERN - StarLight       OPN Production  10G (Qwest)            -
CERN - NetherLight     Backup          10G (GN2)              -
NetherLight - MANLan   Backup          10G (Global Crossing)  -
MANLan - StarLight     Backup          10G (Global Crossing)  -
MANLan - StarLight     Backup          10G (Qwest)            -
MANLan - London        Backup          10G (GC)               New Q1 08
London - CERN          Backup          10G                    New Q1 08

Page 8: LHCC Comprehensive Review November 2007

USLHCNet

• A number of links providing alternate routing for primary traffic.
• Relationship with ESnet (and DOE approval) to provide capacity (O(5G)) on the MANLan – AMS link for additional ESnet-GEANT peering
  – This helps for US Tier-1 to EU Tier-2 connectivity.
  – US Tier-2 to EU Tier-1 will require additional peering I2-GEANT. Discussions are ongoing.

Page 9: LHCC Comprehensive Review November 2007

CBF Status Summary, November 2007

Link           Status            Nominal E2E Capacity  Provider  Changes Expected
SARA - NDGF    -                 10G                   -         Q4 ‘07
SARA - FZK     In Place. Unused  10G                   -         Q4 ‘07 In Production
FZK - CNAF     In Production     10G                   -         -
FZK - IN2P3    In Production     10G                   -         -
TRIUMF - BNL   In Place.         1G                    -         Q4 ‘07 In Production

Page 13: LHCC Comprehensive Review November 2007

T0-T1 Lambda routing (schematic)

[Figure: schematic map of the T0-T1 lambda routes. Labels in the figure: T0 (Geneva, CH); T1 sites GRIDKA, CNAF, SARA, NDGF, RAL, IN2P3, PIC, BNL, FNAL, TRIUMF, ASGC; locations Basel, Zurich, Milan, Frankfurt, Stuttgart, Strasbourg/Kehl, Hamburg, Copenhagen, Amsterdam, London, Paris, Lyon, Madrid, Barcelona, NY (MAN LAN), Starlight; transatlantic circuits VSNL N, VSNL S, AC-2/Yellow; SURFnet. The ASGC path is marked “???” and “Via SMW-3 or 4 (?)”.]

T0-T1s:
• CERN-RAL
• CERN-PIC
• CERN-IN2P3
• CERN-CNAF
• CERN-GRIDKA
• CERN-NDGF
• CERN-SARA
• CERN-TRIUMF
• CERN-ASGC
• USLHCNET NY (AC-2)
• USLHCNET NY (VSNL N)
• USLHCNET Chicago (VSNL S)

From Michael Enrico, DANTE

Page 14: LHCC Comprehensive Review November 2007

T1-T1 Lambda routing (schematic)

[Figure: schematic map of the T1-T1 lambda routes listed below, with the same site and location labels as the T0-T1 schematic. The ASGC path is marked “???” and “Via SMW-3 or 4 (?)”.]

T1-T1s:
• GRIDKA-CNAF
• GRIDKA-IN2P3
• GRIDKA-SARA
• SARA-NDGF

From Michael Enrico, DANTE

Page 15: LHCC Comprehensive Review November 2007

Some Initial Observations

[Figure: schematic lambda routing map annotated with the observations below. Key: GEANT2; NREN; USLHCNET; Via SURFnet; T1-T1 (CBF).]

(Between CERN and Basel)
• Following lambdas run in same fibre pair:
  – CERN-GRIDKA
  – CERN-NDGF
  – CERN-SARA
  – CERN-SURFnet-TRIUMF/ASGC (x2)
  – USLHCNET NY (AC-2)
• Following lambdas run in same (sub-)duct/trench (all above +):
  – CERN-CNAF
  – USLHCNET NY (VSNL N) [supplier is COLT]
• Following lambda MAY run in same (sub-)duct/trench as all above:
  – USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]

(Between Basel and Zurich)
• Following lambdas run in same trench:
  – CERN-CNAF
  – GRIDKA-CNAF (T1-T1)
• Following lambda MAY run in same trench as all above:
  – USLHCNET Chicago (VSNL S) [awaiting info from Qwest…]

From Michael Enrico, DANTE
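The observations above amount to a shared-risk analysis: lambdas in the same fibre pair or trench fail together. The sketch below illustrates that reasoning under loud assumptions. It models only the European T0-T1 and T1-T1 lambdas named in this deck (USLHCNet, TRIUMF and ASGC paths are omitted), treats every lambda as a simple point-to-point edge, and uses hypothetical names (`LAMBDAS`, `RISK_GROUPS`, `isolated_sites`). It is not DANTE's method.

```python
# Simplified model: lambda name -> (endpoint, endpoint).
LAMBDAS = {
    "CERN-RAL": ("CERN", "RAL"),
    "CERN-PIC": ("CERN", "PIC"),
    "CERN-IN2P3": ("CERN", "IN2P3"),
    "CERN-CNAF": ("CERN", "CNAF"),
    "CERN-GRIDKA": ("CERN", "GRIDKA"),
    "CERN-NDGF": ("CERN", "NDGF"),
    "CERN-SARA": ("CERN", "SARA"),
    "GRIDKA-CNAF": ("GRIDKA", "CNAF"),
    "GRIDKA-IN2P3": ("GRIDKA", "IN2P3"),
    "GRIDKA-SARA": ("GRIDKA", "SARA"),
    "SARA-NDGF": ("SARA", "NDGF"),
}

# Shared-risk groups from the slide, restricted to the lambdas modelled above.
RISK_GROUPS = {
    "CERN-Basel fibre pair": {"CERN-GRIDKA", "CERN-NDGF", "CERN-SARA"},
    "CERN-Basel trench": {"CERN-GRIDKA", "CERN-NDGF", "CERN-SARA", "CERN-CNAF"},
    "Basel-Zurich trench": {"CERN-CNAF", "GRIDKA-CNAF"},
}


def isolated_sites(failed, source="CERN"):
    """Sites that cannot reach `source` once every lambda in `failed` is down."""
    up = [ends for name, ends in LAMBDAS.items() if name not in failed]
    seen, frontier = {source}, [source]
    while frontier:
        here = frontier.pop()
        for a, b in up:
            for this, other in ((a, b), (b, a)):
                if this == here and other not in seen:
                    seen.add(other)
                    frontier.append(other)
    all_sites = {site for ends in LAMBDAS.values() for site in ends}
    return sorted(all_sites - seen)


for group, members in RISK_GROUPS.items():
    print(group, "->", isolated_sites(members) or "no site isolated")
```

In this reduced model only the Basel-Zurich trench isolates a site (CNAF), because both of its lambdas share that trench.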

Page 16: LHCC Comprehensive Review November 2007

Result

• SARA-CERN lambda has been rerouted
• 4th diverse USLHCNET lambda will be added
• RAL & PIC still need backups
• CNAF needs a 3rd route into CERN
  – Long route around “eastern ring” OR
  – New CBF solution(s)…
• Further investigations required, in particular concerning:
  – Physical routing of GRIDKA-IN2P3 in Paris area
  – Leased lambdas passing through UK
• Further analysis is on-going
  – May be some layer-1 switching solutions (LCAS) that could help on the GEANT footprint.
    • Can do “LCAS protected 10GE” for ASGC
  – Tests are on-going on the USLHCNet footprint

Page 17: LHCC Comprehensive Review November 2007

Link Layer Monitoring

• perfSONAR deployment is very well advanced (but not yet complete). It monitors the “up/down” status of the links.
• Integrated into the “End to End Coordination Unit” (E2ECU) run by DANTE
• Provides simple indications of “hard” faults.
• Insufficient to understand the quality of the connectivity
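An end-to-end link usually crosses several network domains, so an “up/down” view has to combine per-segment states. The deck does not describe how this is done. The sketch below assumes a simple rule (a link is up only if every segment is up, and any unknown segment makes the result unknown), and the function name and domain names are hypothetical.

```python
def end_to_end_status(segment_states):
    """Combine per-domain segment states into a single link state.

    `segment_states` maps a domain name to "up", "down" or "unknown".
    Assumed rule: any "down" wins, then any "unknown", otherwise "up".
    """
    if not segment_states:
        return "unknown"
    states = set(segment_states.values())
    if "down" in states:
        return "down"
    if "unknown" in states:
        return "unknown"
    return "up"


# Hypothetical three-domain path between the T0 and a T1.
print(end_to_end_status({"CERN": "up", "GEANT2": "up", "NREN": "down"}))
```

A binary state like this flags hard faults only; it says nothing about loss or latency on a link that is nominally up.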

Page 20: LHCC Comprehensive Review November 2007

Initial Active Measurements

• One Way Latency
  – To measure network Reliability & detect Congestion
  – Between
    • Tier0 to Tier1
    • Tier1 to Tier1
• Bandwidth
  – To detect & quantify service degradation
  – Between
    • Tier0 and Tier1
    • Tier1 to Tier1
• ICMP based Latency
  – To measure Reliability & Congestion
  – Between
    • LHCOPN Edge into Tier1 facility
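To illustrate how one-way latency probes can indicate both reliability and congestion, here is a minimal sketch. It assumes synchronised clocks at both ends, treats a probe with no receive timestamp as lost, and flags congestion when the median delay exceeds a baseline by a chosen factor. The function name, the threshold factor and the sample values are all hypothetical, not taken from the deployed measurement tools.

```python
from statistics import median


def summarize_probes(probes, baseline_ms, congestion_factor=1.5):
    """Summarise one-way probes given as (send_ts, recv_ts or None) in seconds."""
    if not probes:
        raise ValueError("at least one probe is required")
    delays_ms = [(recv - send) * 1000 for send, recv in probes if recv is not None]
    loss = 1 - len(delays_ms) / len(probes)  # reliability indicator
    median_ms = median(delays_ms) if delays_ms else None
    congested = median_ms is not None and median_ms > baseline_ms * congestion_factor
    return {"loss": loss, "median_ms": median_ms, "congested": congested}


# Four probes, one of them lost; delays are roughly 10, 12 and 11 ms.
sample = [(0.0, 0.010), (1.0, 1.012), (2.0, None), (3.0, 3.011)]
print(summarize_probes(sample, baseline_ms=10))
```

The median is used rather than the mean so that a single delayed probe does not trigger the congestion flag.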

Page 21: LHCC Comprehensive Review November 2007

Active Monitoring Deployment

• Involves a small number of servers at each Tier-1.
• DANTE proposes to deploy this as a “service”, managed and maintained by them.
  – Mainly funded by the GEANT project as part of the “transition to service” activity.
  – Major advantages in terms of measurement quality and consistency.
• Will be presented at the next OB
  – Documents in preparation to cover requirements from the T1s and a “security plan”.

Page 22: LHCC Comprehensive Review November 2007

Operational Procedures

• Have to be finalised but need to deal with change and incident management.
  – Many parties involved.
  – Have to agree on the real processes involved (activity being led by Mathieu Goutelle)
• Recent Operations workshop made some progress
  – Try to avoid, wherever possible, too many “coordination units”.
  – All parties agreed we need some centralised information to have a global view of the network and incidents.
  – Further workshop planned to quantify this.
  – We also need to understand existing processes used by T1s.

Page 23: LHCC Comprehensive Review November 2007

Resiliency Issues

• The physical fiber path considerations continue
  – Some lambdas have been re-routed. Others still may be.
• Layer3 backup paths for RAL and PIC are still an issue.
  – In the case of RAL, excessive costs seem to be a problem.
  – For PIC, still some hope of a CBF between RedIRIS and RENATER
• Overall the situation is quite good with the CBF links, but can still be improved.
  – Most major “single” failures are protected against.
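The statement that most single failures are protected against can be checked mechanically: remove each link in turn and see whether any site loses its path to the T0. The following is a minimal sketch under stated assumptions. The topology is a simplification built from the European T0-T1 and T1-T1 (CBF) links named in this deck, with USLHCNet, TRIUMF and ASGC left out, and the function names are hypothetical.

```python
from collections import deque

# Simplified topology: European T0-T1 lambdas plus the T1-T1 CBF links.
LINKS = [
    ("CERN", "RAL"), ("CERN", "PIC"), ("CERN", "IN2P3"), ("CERN", "CNAF"),
    ("CERN", "GRIDKA"), ("CERN", "NDGF"), ("CERN", "SARA"),
    ("GRIDKA", "CNAF"), ("GRIDKA", "IN2P3"), ("GRIDKA", "SARA"), ("SARA", "NDGF"),
]


def reachable(links, source="CERN"):
    """Return the set of sites reachable from `source` over the given links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_failure_report(links, source="CERN"):
    """Map each link to the sites isolated from `source` if that link alone fails."""
    sites = {site for link in links for site in link}
    report = {}
    for failed in links:
        remaining = [link for link in links if link != failed]
        isolated = sites - reachable(remaining, source)
        if isolated:
            report[failed] = sorted(isolated)
    return report


print(single_failure_report(LINKS))
```

In this reduced model only the CERN-RAL and CERN-PIC links appear in the report, which is consistent with RAL and PIC being the sites that still lack backup paths.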

Page 24: LHCC Comprehensive Review November 2007

Bigger Issues

• Will be important to get some agreements from the T1s
  – Active Monitoring
  – Operational Management – in progress
• GEANT-2 will end (March 2009), GEANT-3 is being planned. GN-4 and beyond?
  – Assumption is that GEANT will continue ad infinitum
• What will follow from EGEE-III in terms of network management resources?
  – DANTE may be able to take over most of the responsibility
• Funding for USLHCNet assumed to continue.