
Connect. Communicate. Collaborate

E2Emon (E2E Link Monitoring)

a PerfSONAR-based monitoring system for multi-domain, point-to-point managed bandwidth services

Michael Enrico, DANTE (representing many others!)

TNC 2008, Bruges, Belgium, 22 May 2008

Outline

• The motivation for E2Emon
• What is E2Emon?
• The gory details (OK, not all of them)
• How does it look? (some screenshots)
• Developments (and a few more screenshots)
• Who participates today?
• Future development
• Summary & credits

(plus some extra material to read off-line)

Motivation

• Original motivation was to aid in monitoring of “Cross-Border Fibre” (CBF) wavelength services

• Quickly realised that it would be useful for all wavelength services traversing multiple “provider domains” (e.g. including those transiting GÉANT2)

• Reason why this work landed in GN2’s JRA4 activity (rather than JRA1/perfSONAR)

[Figure: map with the labels Karlsruhe, Basel, Manno, Milano and Bologna, and span lengths of 320 km and 100 km]

E2E (Link) Monitoring (The problem space)

[Figure: E2ELink A-B running from Point A to Point B across Domain 1, Domain 2 and Domain 3]

GOAL: to realise (near) real-time monitoring (link status & in-service PM) of the constituent parts and of the whole E2ELink A-B

(where E2ELink = discrete layer 1 or 2 service)

What E2Emon is not…

[Figure: “E2Emon” shown against “Everything else”]

• …a panacea (when it comes to providing quality P2P services in a multi-domain environment)

• …a substitute for sound multi-domain operational processes

NOTE: for more on this topic leave now (!!!) and see Marian Garcia-Vidondo’s talk on Multi-domain operations in Room D - Erasmus

E2Emon data model: divide & conquer

REMEMBER: Initial focus was on wavelength services

[Figure: an E2E_Link from End Point E1 to End Point E2 across Domain A, Domain B and Domain C. Labels: Demarc A,2; Demarc B,1; Demarc B,2; Demarc C,3; a DomainLink inside each domain (the responsibility of that domain); an IDLink between adjacent domains, described by ID_PartialLink (Domain A) and ID_PartialLink (Domain B); legend: TopologyPoint]

E2Emon(itoring) (Method)

[Figure: E2ELink A-B from Point A to Point B across Domain 1, Domain 2 and Domain 3. A perfSONAR MP or MA in each domain passes DomainLink and (partial) ID_Link info over SOAP/XML to the E2Emon correlator. The correlator is connected to the E2ECU operators via SNMP and to a “Weathermap” view for users via HTTP.]

Basic characteristics (of E2E monitoring)

• Status information corresponds to network layer 1 and 2
• Status information is logical abstraction
• No information about physical devices necessary
• Domain and Interdomain (ID) link status provided by constituent domains using perfSONAR
  – Abstraction process within domain may be non-trivial
  – Some examples given later

• E2E link status: aggregation of NREN and ID links

Information available in E2Emon

• Operational States:
  – Up – link is available
  – Degraded – link is up, but has reduced performance (future)
  – Down – unavailable
  – Unknown – state is unknown

• Administrative States:
  – NormalOperational
  – Maintenance
  – TroubleShooting
  – UnderRepair
  – Unknown

Not yet any “in-service” PM data
• Still for further study
• Difficult in heterogeneous environment
• See MCF from PerfSONAR

DJ1.2.5: MCF: Experimental Results & Sub-layer 3 Monitoring
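The deck describes the E2E link status as an aggregation of the domain and ID link statuses but does not spell out the rule. A minimal Python sketch of one plausible rule follows, assuming the E2E link takes the worst state reported by any of its segments. The ordering (Down worse than Unknown, worse than Degraded, worse than Up) and the names `OperState` and `aggregate` are assumptions made for illustration, not taken from E2Emon itself.

```python
from enum import IntEnum


class OperState(IntEnum):
    """Operational states from the slide; a higher value is treated as 'worse'.

    The ordering is an assumption made for this sketch.
    """
    UP = 0
    DEGRADED = 1
    UNKNOWN = 2
    DOWN = 3


def aggregate(segment_states):
    """Return the worst state among the Domain Links and ID Links of one E2E Link."""
    states = list(segment_states)
    if not states:
        # No segment reported anything, so the link cannot be claimed to be up.
        return OperState.UNKNOWN
    return max(states)


if __name__ == "__main__":
    # Hypothetical segments of an E2E link crossing three domains.
    segments = {
        "DomainLink A": OperState.UP,
        "IDLink A-B": OperState.UP,
        "DomainLink B": OperState.DEGRADED,
        "IDLink B-C": OperState.UP,
        "DomainLink C": OperState.UP,
    }
    print(aggregate(segments.values()).name)
```

A worst-state rule keeps the correlator simple: a single Down segment marks the whole E2E link as Down regardless of what the other domains report.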

Gory detail (data model)

Gory detail (raw XML)

<nmwg:metadata id="md3">
  <nmwg:subject id="sub1">
    <nmtl2:link>
      <nmtl2:name type="logical">ams-gen_LHC-06002A</nmtl2:name>
      <nmtl2:globalName type="logical">CERN-TRIUMF-LHCOPN-001</nmtl2:globalName>
      <nmtl2:type>DOMAIN_Link</nmtl2:type>
      <nmwgtopo3:node nodeIdRef="GEANT2-GEN">
        <nmwgtopo3:role>DemarcPoint</nmwgtopo3:role>
      </nmwgtopo3:node>
      <nmwgtopo3:node nodeIdRef="GEANT2-AMS">
        <nmwgtopo3:role>DemarcPoint</nmwgtopo3:role>
      </nmwgtopo3:node>
    </nmtl2:link>
  </nmwg:subject>
  <nmwg:parameters>
    <nmwg:parameter name="supportedEventType">UP</nmwg:parameter>
  </nmwg:parameters>
</nmwg:metadata>

<nmwg:data id="d1" metadataIdRef="md3">
  <ifevt:datum timeType="ISO" timeValue="2008-05-14T16:42:36.0+0100">
    <ifevt:stateOper>UP</ifevt:stateOper>
    <ifevt:stateAdmin>NORMALOPERATION</ifevt:stateAdmin>
  </ifevt:datum>
</nmwg:data>
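To show how a consumer might read such a message, here is a small Python sketch that pairs each `nmwg:data` element with the `nmwg:metadata` it references and pulls out the link name and states. The slide shows the fragment without namespace declarations, so the namespace URIs below are placeholders invented for this example, as is the wrapping `root` element; the embedded fragment is an abbreviated copy of the one above.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the slide does not show the real ones.
NS = {
    "nmwg": "urn:example:nmwg",
    "nmtl2": "urn:example:nmtl2",
    "nmwgtopo3": "urn:example:nmwgtopo3",
    "ifevt": "urn:example:ifevt",
}

# Abbreviated copy of the fragment shown on the slide.
FRAGMENT = """
<nmwg:metadata id="md3">
  <nmwg:subject id="sub1">
    <nmtl2:link>
      <nmtl2:name type="logical">ams-gen_LHC-06002A</nmtl2:name>
      <nmtl2:globalName type="logical">CERN-TRIUMF-LHCOPN-001</nmtl2:globalName>
      <nmtl2:type>DOMAIN_Link</nmtl2:type>
    </nmtl2:link>
  </nmwg:subject>
</nmwg:metadata>
<nmwg:data id="d1" metadataIdRef="md3">
  <ifevt:datum timeType="ISO" timeValue="2008-05-14T16:42:36.0+0100">
    <ifevt:stateOper>UP</ifevt:stateOper>
    <ifevt:stateAdmin>NORMALOPERATION</ifevt:stateAdmin>
  </ifevt:datum>
</nmwg:data>
"""


def link_statuses(fragment):
    """Return one dict per data element, joined to its metadata via metadataIdRef."""
    declarations = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    # Wrap the fragment so that it has a single root and declared prefixes.
    root = ET.fromstring(f"<root {declarations}>{fragment}</root>")
    metadata = {m.get("id"): m for m in root.findall("nmwg:metadata", NS)}
    statuses = []
    for data in root.findall("nmwg:data", NS):
        meta = metadata.get(data.get("metadataIdRef"))
        link = meta.find("nmwg:subject/nmtl2:link", NS) if meta is not None else None
        datum = data.find("ifevt:datum", NS)
        if link is None or datum is None:
            continue  # skip data that cannot be tied to a link
        statuses.append({
            "name": link.findtext("nmtl2:name", namespaces=NS),
            "global_name": link.findtext("nmtl2:globalName", namespaces=NS),
            "type": link.findtext("nmtl2:type", namespaces=NS),
            "time": datum.get("timeValue"),
            "oper": datum.findtext("ifevt:stateOper", namespaces=NS),
            "admin": datum.findtext("ifevt:stateAdmin", namespaces=NS),
        })
    return statuses


if __name__ == "__main__":
    for status in link_statuses(FRAGMENT):
        print(status)
```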

How does it look?

(some screenshots follow…)

[Screenshot: Typical E2E link (working normally)]

[Screenshot: Typical E2E link (failure condition)]


The “magic” within the domains

• Refers to the process of synthesizing E2Emon-compliant abstract information from whatever raw data is available

• May need to synthesize from atomic MIB objects like LOS or LOW on a certain set of interfaces/boards
  – these, in turn, may need to be retrieved directly from NEs on the data plane OR
  – from an NMS via a “northbound” interface

• If an NMS is present then it may perform some of the necessary synthesis (but maybe not all!)

• Transmission equipment – may be an SNMP-free zone!
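As a rough illustration of the synthesis step, the Python sketch below derives an abstract DomainLink state from LOS alarms on the interfaces that carry the link. The rule used (any LOS on a constituent interface means Down, an unreachable alarm source means Unknown) and all interface names are assumptions made for this example; each domain's real “magic” depends on its equipment and NMS.

```python
def domain_link_state(interfaces, active_los):
    """Synthesize an abstract state for one DomainLink.

    interfaces: names of the interfaces/boards that carry the link.
    active_los: set of interfaces currently reporting LOS, or None when the
                alarm source (NE or NMS) could not be queried.
    """
    if active_los is None:
        return "Unknown"
    if any(interface in active_los for interface in interfaces):
        return "Down"
    return "Up"


if __name__ == "__main__":
    # Hypothetical interface names, for illustration only.
    link_interfaces = ["gen-r1/1", "gen-r1/2", "ams-r1/1"]
    print(domain_link_state(link_interfaces, set()))
    print(domain_link_state(link_interfaces, {"gen-r1/2"}))
    print(domain_link_state(link_interfaces, None))
```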

Example 1: GARR

What goes into synthesizing this?

[Figure: “GINS e2e Service” spanning GARR, SWITCH and DFN. Labels include IP, MPLS, lambda, WDM, CNAF, BO, MI, PD, Manno and Karlsruhe. Each E2E Monitoring System is annotated “check the status of segments” and “status aggregation”.]

Detail within DomainLink (more detail in slides at end of presentation)

Example 2: GÉANT2 (more detail in slides at end of presentation)

[Figure: a DomainLink with a partial IDL at each end. Labels: OSI & IP DCN; ALU NMS with 1353NM (EML), 1354RM (NML) and 1359 IOO; SNMP traps; TRAP Handler & other stuff & MP/MA]

Developments

• Introduced in R2.0…

• Synthesized management object alarm handling
• Export of synthesized alarms and defect conditions (via SNMP traps) to umbrella management systems
• Availability statistics
• Production/non-production flags

Synthesized alarm handling

[Screenshots with callouts: “ALARM!!!”, “RETURN”, “ALARM CLEARED”]

ALSO… SNMP trap sent to NOC operators’ dashboard

Export to Nagios (E2ECU) (via SNMP)

Who is participating?

GN2 partner    Hardware            Status info available?   perfSONAR installation?   Expected RFS
GÉANT2         Alcatel             yes                      done                      in service now
DFN            Huawei              yes                      done                      in service now
RENATER        Alcatel             yes                      done                      in service now
RedIRIS        Nortel 8010         yes                      done                      in service now
NORDUnet       Alcatel             yes                      not yet                   forthcoming
GARR           Juniper/ADVA        yes                      done                      in service now
SURFnet [NL]   Nortel              yes                      done                      in service now
ja.net         Nortel+Ciena        TBC                      not yet                   forthcoming
SWITCH         Sorrento            yes                      done                      in service now
CESNET         Cisco               yes                      done                      in service now
PSNC           Adva                yes                      done                      in service now
Internet2      Ciena/Infinera      yes                      not yet                   forthcoming
CANARIE        Nortel              yes                      not yet                   nearly ready
ESNET          Ciena/Infinera      yes                      done                      in service now
USLHCNET       Ciena               yes                      done                      in service now
Fermilab       various             yes                      done                      in service now
CERN           Force 10 + others   yes                      done                      in service now
IN2P3          ?                   yes                      done                      in service now
DEISA          Cisco               yes                      done                      in service now

The future?

• Minor release (R2.1) on the way
• Big omission is still in-service PM stats
  – Do we invest the effort to rectify this? Do we need it?
• Making it more “production quality”:
  – Need to encourage a more thorough approach to feeding E2Emon (improve on quality of MP/MA data, availability of MP/MA, etc)
  – Add controls to better control front-end view and manage synthesized alarms
  – Add HA?
• Adding proper AAI support

Summary

• E2Emon came about as a “quick fix” to an immediate problem
  – (monitoring wavelength services in a multi-domain environment)
• Now adopted in a production environment (E2ECU)
• Wider applicability (within R&E net community)
  – Sub-wavelength services (e.g. GE EPLs)
  – Will be adapted to monitor short-lived services (e.g. created using AutoBAHN, DICE CP, etc)
• Wider applicability (outside R&E net community)?
  – do we try to take this to the standards bodies?

URLs

• Most material (documentation, downloads, etc) can be found on the PerfSONAR wiki at:

http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_support_for_E2E_Link_Monitoring

• E2E Monitoring System (Sandbox)

http://cnmdev.lrz-muenchen.de/e2e/lhc/mon/G2_E2E_index_ALL.html

Credits

M&M: Matthias Hamm & Mark Yampolskiy, DFN/LRZ/MNMT (München) (Developers)

Other authors: Otto Kreiter & Loukik Kudarimoti (DANTE), Giovanni Cesaroni (GARR) (Developers/contributors)

GÉANT2 JRA-4 (WI-03) and PerfSONAR folks

Emma Apted, DANTE Operations (Coordinator)

Numerous others in participating domains (Implementers/maintainers of domain-specific “magic”)

That was…

E2E (Link) monitoring

Questions?

Extra info

• On the “magic” within the domains (GARR and GÉANT2) …

Example 1: GARR

What goes into synthesizing this?

[Figure: “GINS e2e Service” spanning GARR, SWITCH and DFN. Labels include IP, MPLS, lambda, WDM, CNAF, BO, MI, PD, Manno and Karlsruhe. Each E2E Monitoring System is annotated “check the status of segments” and “status aggregation”.]

Detail within DomainLink 1

Detail within DomainLink 2

GARR UI (MPLS monitoring)

[Screenshot callouts: Domain information, LSP status, traffic; Interdomain information, e2e L2 circuit status]

MPLS Monitor (using Juniper MIBs)

Information:

mplsLspName               1.3.6.1.4.1.2636.3.2.3.1.1
mplsLspState              1.3.6.1.4.1.2636.3.2.3.1.2
mplsLspOctets             1.3.6.1.4.1.2636.3.2.3.1.3
mplsLspPackets            1.3.6.1.4.1.2636.3.2.3.1.4
mplsLspAge                1.3.6.1.4.1.2636.3.2.3.1.5
mplsLspTimeUp             1.3.6.1.4.1.2636.3.2.3.1.6
mplsLspPrimaryTimeUp      1.3.6.1.4.1.2636.3.2.3.1.7
mplsLspTransitions        1.3.6.1.4.1.2636.3.2.3.1.8
mplsLspLastTransition     1.3.6.1.4.1.2636.3.2.3.1.9
mplsLspPathChanges        1.3.6.1.4.1.2636.3.2.3.1.10
mplsLspLastPathChange     1.3.6.1.4.1.2636.3.2.3.1.11
mplsLspConfiguredPaths    1.3.6.1.4.1.2636.3.2.3.1.12
mplsLspStandbyPaths       1.3.6.1.4.1.2636.3.2.3.1.13
mplsLspOperationalPaths   1.3.6.1.4.1.2636.3.2.3.1.14
mplsLspFrom               1.3.6.1.4.1.2636.3.2.3.1.15
mplsLspTo                 1.3.6.1.4.1.2636.3.2.3.1.16
mplsPathName              1.3.6.1.4.1.2636.3.2.3.1.17
mplsPathType              1.3.6.1.4.1.2636.3.2.3.1.18
mplsPathExplicitRoute     1.3.6.1.4.1.2636.3.2.3.1.19
mplsPathRecordRoute       1.3.6.1.4.1.2636.3.2.3.1.20
mplsPathBandwidth         1.3.6.1.4.1.2636.3.2.3.1.21
mplsPathCOS               1.3.6.1.4.1.2636.3.2.3.1.22
mplsPathInclude           1.3.6.1.4.1.2636.3.2.3.1.23
mplsPathExclude           1.3.6.1.4.1.2636.3.2.3.1.24
mplsPathSetupPriority     1.3.6.1.4.1.2636.3.2.3.1.25
mplsPathHoldPriority      1.3.6.1.4.1.2636.3.2.3.1.26
mplsPathProperties        1.3.6.1.4.1.2636.3.2.3.1.27

How to get information on an MPLS LSP

1 - Get the snmp index (see next slide)

BO1-MI1-VPN: .66.79.49.45.77.73.49.45.86.80.78.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0

2 - Query

snmpget -v2c -c <community> <router> <mplsLspState>.<LSP index>

3 - Parse the output

1 = unknown
2 = up
3 = down

MPLS Monitor (using Juniper MIBs)

Finding the index of the LSP

The index is the LSP name written as one decimal ASCII code per character, padded with zeros to 32 entries:

B O 1 - M I .....

.66.79.49.45.77.73.49.45.86.......

<?php
// name2oid.php: print the SNMP index for the LSP name given as first argument
$name = $argv[1];
$oid = name2oid($name);
print $name . ": " . $oid . "\n";

// One ".<ASCII code>" per character, then ".0" padding up to 32 entries
function name2oid($string) {
    $hex = '';
    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        $hex .= "." . str_pad(ord($string[$i]), 2, "0", STR_PAD_LEFT);
    }
    $npoints = 32 - $len;
    for ($i = 0; $i < $npoints; $i++) {
        $hex .= ".0";
    }
    return $hex;
}
?>

$ php name2oid.php BO1-MI1-VPN

BO1-MI1-VPN: .66.79.49.45.77.73.49.45.86.80.78.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
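The three steps on the previous slide (derive the index, build the OID to query, decode the returned value) can also be sketched in Python. This is a port written for illustration, not GARR's tooling: the function names are invented here, the sketch assumes ASCII LSP names of at most 32 characters, and it stops short of running `snmpget`, so only the integer-to-state mapping from the slide is used for decoding.

```python
# mplsLspState OID, taken from the list on the previous slide.
MPLS_LSP_STATE_OID = "1.3.6.1.4.1.2636.3.2.3.1.2"

# Value mapping from step 3 of the slide.
LSP_STATES = {1: "unknown", 2: "up", 3: "down"}


def name_to_index(name, width=32):
    """Return the SNMP index: one '.<code>' per character, zero-padded to `width` entries."""
    codes = list(name.encode("ascii"))  # raises UnicodeEncodeError on non-ASCII names
    if len(codes) > width:
        raise ValueError(f"LSP name longer than {width} characters")
    codes += [0] * (width - len(codes))
    return "".join(f".{code}" for code in codes)


def lsp_state_oid(name):
    """Return the full OID to pass to snmpget for the state of the named LSP."""
    return MPLS_LSP_STATE_OID + name_to_index(name)


def decode_state(value):
    """Map the integer returned for mplsLspState to the state names on the slide."""
    return LSP_STATES.get(int(value), "unknown")


if __name__ == "__main__":
    print(lsp_state_oid("BO1-MI1-VPN"))
    print(decode_state(2))
```

Unlike the PHP version, the sketch does not left-pad single-digit codes to two digits; printable characters never produce such codes, so the result is the same for real LSP names.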

Example 2: GÉANT2

[Figure: a DomainLink with a partial IDL at each end. Labels: OSI & IP DCN; ALU NMS with 1353NM (EML), 1354RM (NML) and 1359 IOO; SNMP traps; TRAP Handler & other stuff & MP/MA]

Example 2: GÉANT2

[Figure: Path – gen_mil_CERN, for CERN-SARA-LHC-001. Labels: Domain link, P. IDLink (at each end), OCH trail, Phys-link]


Monitoring data processing “e2e path”

GÉANT2 Alarm analyzer

• Called every time a trap is received
• Written in bash
• Each trap is analyzed separately
  – if in the meantime a new trap arrives it waits in the queue (snmptrapd)
• Must maintain state
• After analysing the trap, action is taken: call the data transformation script
• Had several problems:
  – snmptrapd version
  – Alcatel snmp problems
• After one year of testing and modification currently stable – awaiting a new NMS upgrade – or an alarm churn
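The production analyzer is a bash script; the Python sketch below only illustrates the pattern the slide describes: traps are handled one at a time, state is kept between traps, and the transformation step is called when a trail's status changes. The class name, the trap fields (`trail_id`, `alarm_id`, `raised`) and the Up/Down rule are assumptions made for this example.

```python
class AlarmAnalyzer:
    """Keeps per-trail alarm state and calls `transform` when a trail changes status."""

    def __init__(self, transform):
        self._transform = transform  # callable(trail_id, status), the transformation step
        self._active = {}            # trail id -> set of alarm ids currently raised

    def handle_trap(self, trail_id, alarm_id, raised):
        """Process one trap; the caller is expected to deliver traps one at a time."""
        alarms = self._active.setdefault(trail_id, set())
        before = "Down" if alarms else "Up"
        if raised:
            alarms.add(alarm_id)
        else:
            alarms.discard(alarm_id)
        after = "Down" if alarms else "Up"
        if after != before:
            # Only a status change triggers the data transformation.
            self._transform(trail_id, after)
        return after


if __name__ == "__main__":
    analyzer = AlarmAnalyzer(lambda trail, status: print(f"{trail} -> {status}"))
    analyzer.handle_trap("gen_mil_CERN", "LOS-1", raised=True)
    analyzer.handle_trap("gen_mil_CERN", "LOS-2", raised=True)   # still Down, no call
    analyzer.handle_trap("gen_mil_CERN", "LOS-1", raised=False)  # one alarm remains
    analyzer.handle_trap("gen_mil_CERN", "LOS-2", raised=False)
```

Keeping the set of active alarms per trail is what makes the handler stateful: a clear trap restores the trail to Up only once every outstanding alarm on it has cleared.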

E2E Data transformation

• Applications developed in Java:
  – E2EXMLWriter
  – XMLGenerator
• E2EXMLWriter takes in a template XML and produces an XML file containing live e2e path status information conforming to the JRA4 e2e data model
  – Triggered by the bash script listening to SNMP alarms
  – Parameters passed
    • Trail ID
    • Status
• E2EXMLWriter – updates the perfSONAR MA
• XMLGenerator produces the template XML that E2EXMLWriter uses to export the domain’s e2e information
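The real applications are written in Java and their template format is not shown in the deck. As a hedged illustration of the idea only, the Python sketch below takes a template, a trail ID and a status (the two parameters listed above) and returns the XML with that trail's status and timestamp filled in. The template layout, element names and the `write_status` function are all hypothetical and far simpler than the JRA4 e2e data model.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical, simplified template; the real one follows the JRA4 e2e data model.
TEMPLATE = """<links>
  <link trailId="gen_mil_CERN"><stateOper>UNKNOWN</stateOper><time/></link>
  <link trailId="ams_gen_LHC"><stateOper>UNKNOWN</stateOper><time/></link>
</links>"""


def write_status(template_xml, trail_id, status, when=None):
    """Return the template with the given trail's status and timestamp filled in."""
    when = when or datetime.now(timezone.utc)
    root = ET.fromstring(template_xml)
    for link in root.iter("link"):
        if link.get("trailId") == trail_id:
            link.find("stateOper").text = status
            link.find("time").text = when.isoformat()
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"trail {trail_id!r} not found in template")


if __name__ == "__main__":
    print(write_status(TEMPLATE, "gen_mil_CERN", "DOWN"))
```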