Connect. Communicate. Collaborate
E2Emon (E2E Link Monitoring)
a PerfSONAR-based monitoring system for multi-domain, point-to-point managed bandwidth services
Michael Enrico, DANTE (representing many others!)
TNC 2008, Bruges, Belgium
22 May 2008
Outline
• The motivation for E2Emon
• What is E2Emon?
• The gory details (OK, not all of them)
• How does it look? (some screenshots)
• Developments (and a few more screenshots)
• Who participates today?
• Future development
• Summary & credits
(plus some extra material to read off-line)
Motivation
• Original motivation was to aid in monitoring of “Cross-Border Fibre” (CBF) wavelength services
• Quickly realised that it would be useful for all wavelength services traversing multiple “provider domains” (e.g. including those transiting GÉANT2)
• Reason why this work landed in GN2’s JRA4 activity (rather than JRA1/perfSONAR)
[Figure: map of a cross-border fibre route; place labels: Karlsruhe, Basel, Manno, Milano, Bologna; distance labels: 320 km, 100 km]
E2E (Link) Monitoring (The problem space)
[Diagram: E2ELink A-B runs from Point A to Point B across Domain 1, Domain 2 and Domain 3]
GOAL: to realise (near) real-time monitoring (link status & in-service PM) of the constituent parts and of the whole E2ELink A-B
(where E2ELink = discrete layer 1 or 2 service)
What E2Emon is not…
[Diagram labels: E2Emon; Everything else]
• …a panacea (when it comes to providing quality P2P services in a multi-domain environment)
• …a substitute for sound multi-domain operational processes
NOTE: for more on this topic leave now (!!!) and see Marian Garcia-Vidondo’s talk on Multi-domain operations in Room D - Erasmus
E2Emon data model: divide & conquer
REMEMBER: Initial focus was on wavelength services
[Diagram: an E2E_Link from End Point E1 to End Point E2 across Domain A, Domain B and Domain C. Each domain is responsible for its own DomainLink. ID Links sit between the domains at the demarcation points (Demarc A,2; Demarc B,1; Demarc B,2; Demarc C,3) and are split into ID_PartialLinks, e.g. ID_PartialLink (Domain A) and ID_PartialLink (Domain B). Legend: marker symbol = TopologyPoint]
E2Emon(itoring) (Method)
[Diagram: E2ELink A-B runs from Point A to Point B across Domain 1, Domain 2 and Domain 3. Each domain has a perfSONAR MP or MA that supplies DomainLink and (partial) ID_Link info to the E2Emon correlator over SOAP/XML. The correlator serves E2ECU operators via SNMP and a “Weathermap” view for users via HTTP.]
Basic characteristics (of E2Emonitoring)
• Status information corresponds to network layer 1 and 2
• Status information is logical abstraction
• No information about physical devices necessary
• Domain and Interdomain (ID) link status provided by constituent domains using perfSONAR
  – Abstraction process within domain may be non-trivial
  – Some examples given later
• E2E link status: aggregation of NREN and ID links
Information available in E2Emon
• Operational States:
  – Up – link is available
  – Degraded – link is up, but has reduced performance (future)
  – Down – unavailable
  – Unknown – state is unknown
• Administrative States:
  – NormalOperational
  – Maintenance
  – TroubleShooting
  – UnderRepair
  – Unknown
Not yet any “in-service” PM data
• Still for further study
• Difficult in heterogeneous environment
• See MCF from PerfSONAR (DJ1.2.5: MCF: Experimental Results & Sub-layer 3 Monitoring)
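The slides say that the E2E link status is an aggregation of the domain and ID link states but do not spell out the rule. A minimal sketch, assuming a simple "worst state wins" rule; the severity ordering and the function name `aggregate` are assumptions made for illustration, not necessarily what the E2Emon correlator does:

```python
from enum import IntEnum


class OperState(IntEnum):
    """Operational states from the slide.

    The numeric ordering (higher = worse) is an assumption of this sketch.
    """
    UP = 0
    DEGRADED = 1
    UNKNOWN = 2
    DOWN = 3


def aggregate(segment_states):
    """Return the E2E state as the worst state among all segments.

    An E2E link with no reported segments is treated as UNKNOWN.
    """
    states = list(segment_states)
    if not states:
        return OperState.UNKNOWN
    return max(states)


# Hypothetical E2E link made of three DomainLinks and two ID links.
segments = [OperState.UP, OperState.UP, OperState.DOWN,
            OperState.UP, OperState.UP]
print(aggregate(segments).name)
```

With this rule a single failed segment marks the whole E2E link as down, which matches the intent of monitoring "the constituent parts and the whole".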
Gory detail (raw XML)
<nmwg:metadata id="md3">
  <nmwg:subject id="sub1">
    <nmtl2:link>
      <nmtl2:name type="logical">ams-gen_LHC-06002A</nmtl2:name>
      <nmtl2:globalName type="logical">CERN-TRIUMF-LHCOPN-001</nmtl2:globalName>
      <nmtl2:type>DOMAIN_Link</nmtl2:type>
      <nmwgtopo3:node nodeIdRef="GEANT2-GEN">
        <nmwgtopo3:role>DemarcPoint</nmwgtopo3:role>
      </nmwgtopo3:node>
      <nmwgtopo3:node nodeIdRef="GEANT2-AMS">
        <nmwgtopo3:role>DemarcPoint</nmwgtopo3:role>
      </nmwgtopo3:node>
    </nmtl2:link>
  </nmwg:subject>
  <nmwg:parameters>
    <nmwg:parameter name="supportedEventType">UP</nmwg:parameter>
  </nmwg:parameters>
</nmwg:metadata>
<nmwg:data id="d1" metadataIdRef="md3">
  <ifevt:datum timeType="ISO" timeValue="2008-05-14T16:42:36.0+0100">
    <ifevt:stateOper>UP</ifevt:stateOper>
    <ifevt:stateAdmin>NORMALOPERATION</ifevt:stateAdmin>
  </ifevt:datum>
</nmwg:data>
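To show how a consumer might read such a record, here is a rough sketch that pairs each `nmwg:data` element with its `nmwg:metadata` via `metadataIdRef`. The slide omits the namespace declarations and the enclosing element, so the `urn:example:*` namespace URIs and the `nmwg:message` wrapper below are placeholders assumed for illustration, and the sample is trimmed to the fields being read:

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions): the slide does not show the
# real xmlns declarations, so these only make the sample parseable.
NS = {
    "nmwg": "urn:example:nmwg",
    "nmtl2": "urn:example:nmtl2",
    "ifevt": "urn:example:ifevt",
}

# Trimmed copy of the slide's XML inside an assumed wrapper element.
SAMPLE = """
<nmwg:message xmlns:nmwg="urn:example:nmwg"
              xmlns:nmtl2="urn:example:nmtl2"
              xmlns:ifevt="urn:example:ifevt">
  <nmwg:metadata id="md3">
    <nmwg:subject id="sub1">
      <nmtl2:link>
        <nmtl2:name type="logical">ams-gen_LHC-06002A</nmtl2:name>
        <nmtl2:globalName type="logical">CERN-TRIUMF-LHCOPN-001</nmtl2:globalName>
        <nmtl2:type>DOMAIN_Link</nmtl2:type>
      </nmtl2:link>
    </nmwg:subject>
  </nmwg:metadata>
  <nmwg:data id="d1" metadataIdRef="md3">
    <ifevt:datum timeType="ISO" timeValue="2008-05-14T16:42:36.0+0100">
      <ifevt:stateOper>UP</ifevt:stateOper>
      <ifevt:stateAdmin>NORMALOPERATION</ifevt:stateAdmin>
    </ifevt:datum>
  </nmwg:data>
</nmwg:message>
"""


def link_states(xml_text):
    """Return one dict per link, combining metadata with its status datum."""
    root = ET.fromstring(xml_text)
    links = {}
    # First pass: collect link descriptions keyed by metadata id.
    for md in root.findall("nmwg:metadata", NS):
        link = md.find("nmwg:subject/nmtl2:link", NS)
        if link is not None:
            links[md.get("id")] = {
                "name": link.findtext("nmtl2:name", namespaces=NS),
                "globalName": link.findtext("nmtl2:globalName", namespaces=NS),
                "type": link.findtext("nmtl2:type", namespaces=NS),
            }
    # Second pass: attach the status datum that references each metadata id.
    for data in root.findall("nmwg:data", NS):
        record = links.get(data.get("metadataIdRef"))
        datum = data.find("ifevt:datum", NS)
        if record is not None and datum is not None:
            record["time"] = datum.get("timeValue")
            record["stateOper"] = datum.findtext("ifevt:stateOper", namespaces=NS)
            record["stateAdmin"] = datum.findtext("ifevt:stateAdmin", namespaces=NS)
    return list(links.values())


for entry in link_states(SAMPLE):
    print(entry["globalName"], entry["name"], entry["stateOper"], entry["stateAdmin"])
```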
Typical E2E link (working normally)
Typical E2E link (failure condition)
The “magic” within the domains
• Refers to the process of synthesizing E2Emon-compliant abstract information from whatever raw data is available
• May need to synthesize from atomic MIB objects like LOS or LOW on a certain set of interfaces/boards
  – these, in turn, may need to be retrieved directly from NEs on the data plane OR
  – from an NMS via a “northbound” interface
• If an NMS is present then it may perform some of the necessary synthesis (but maybe not all!)
• Transmission equipment – may be an SNMP-free zone!
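As a rough illustration of this kind of synthesis, the sketch below derives one abstract DomainLink state from per-interface alarm sets. The interface names, the data layout and the rule (any LOS means DOWN, missing data means UNKNOWN) are all assumptions; each domain's real "magic" depends on its own equipment and NMS:

```python
def synthesize_domain_link_state(interface_alarms):
    """Derive an abstract operational state for one DomainLink.

    interface_alarms maps an interface name to the set of alarm names
    currently active on it, or to None when no data could be retrieved.
    """
    known = [alarms for alarms in interface_alarms.values() if alarms is not None]
    # Assumed rule: a loss-of-signal alarm on any interface takes the link down.
    if any("LOS" in alarms for alarms in known):
        return "DOWN"
    # No interfaces at all, or at least one without data: state is not knowable.
    if not interface_alarms or len(known) < len(interface_alarms):
        return "UNKNOWN"
    return "UP"


# Hypothetical interfaces making up one DomainLink.
print(synthesize_domain_link_state({"ams-1/1": set(), "gen-3/2": {"LOS"}}))
```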
Detail within DomainLink (more detail in slides at end of presentation)
[Diagram labels: IP MPLS, lambda; GARR, SWITCH, DFN; CNAF, BO, MI, PD, Manno, KARLSRUHE; WDM; GINS e2e Service; E2E Monitoring System: check the status of segments, status aggregation]
Example 2: GÉANT2 (more detail in slides at end of presentation)
[Diagram labels: OSI & IP DCN; DomainLink partial; IDL partial; IDL; ALU NMS: 1353NM (EML), 1354RM (NML), 1359 IOO; SNMP traps; TRAP Handler & other stuff & MP/MA]
Developments
• Introduced in R2.0…
• Synthesized management object alarm handling
• Export of synthesized alarms and defect conditions (via SNMP traps) to umbrella management systems
• Availability statistics
• Production/non-production flags
Synthesized alarm handling
[Screenshot callouts: ALARM!!!; RETURN; ALARM CLEARED]
ALSO… SNMP trap sent to NOC operators’ dashboard
Who is participating?
GN2 partner    Hardware            Status info available?   perfSONAR installation?   Expected RFS
GÉANT2         Alcatel             yes                      done                      in service now
DFN            Huawei              yes                      done                      in service now
RENATER        Alcatel             yes                      done                      in service now
RedIRIS        Nortel 8010         yes                      done                      in service now
NORDUnet       Alcatel             yes                      not yet                   forthcoming
GARR           Juniper/ADVA        yes                      done                      in service now
SURFnet [NL]   Nortel              yes                      done                      in service now
ja.net         Nortel+Ciena        TBC                      not yet                   forthcoming
SWITCH         Sorrento            yes                      done                      in service now
CESNET         Cisco               yes                      done                      in service now
PSNC           Adva                yes                      done                      in service now
Internet2      Ciena/Infinera      yes                      not yet                   forthcoming
CANARIE        Nortel              yes                      not yet                   nearly ready
ESNET          Ciena/Infinera      yes                      done                      in service now
USLHCNET       Ciena               yes                      done                      in service now
Fermilab       various             yes                      done                      in service now
CERN           Force 10 + others   yes                      done                      in service now
IN2P3          ?                   yes                      done                      in service now
DEISA          Cisco               yes                      done                      in service now
The future?
• Minor release (R2.1) on the way
• Big omission is still in-service PM stats
  – Do we invest the effort to rectify this? Do we need it?
• Making it more “production quality”:
  – Need to encourage a more thorough approach to feeding E2Emon (improve on quality of MP/MA data, availability of MP/MA, etc)
  – Add controls to better manage the front-end view and synthesized alarms
  – Add HA?
• Adding proper AAI support
Summary
• E2Emon came about as a “quick fix” to an immediate problem
  – (monitoring wavelength services in a multi-domain environment)
• Now adopted in a production environment (E2ECU)
• Wider applicability (within R&E net community)
  – Sub-wavelength services (e.g. GE EPLs)
  – Will be adapted to monitor short-lived services (e.g. created using AutoBAHN, DICE CP, etc)
• Wider applicability (outside R&E net community)?
  – do we try to take this to the standards bodies?
URLs
• Most material (documentation, downloads, etc) can be found on the PerfSONAR wiki at:
http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_support_for_E2E_Link_Monitoring
• E2E Monitoring System (Sandbox)
http://cnmdev.lrz-muenchen.de/e2e/lhc/mon/G2_E2E_index_ALL.html
Credits
• M&M: Matthias Hamm & Mark Yampolskiy, DFN/LRZ/MNMT (München) (Developers)
• Other authors: Otto Kreiter & Loukik Kudarimoti (DANTE), Giovanni Cesaroni (GARR) (Developers/contributors)
• GÉANT2 JRA-4 (WI-03) and PerfSONAR folks
• Emma Apted, DANTE Operations (Coordinator)
• Numerous others in participating domains (Implementers/maintainers of domain-specific “magic”)
Detail within DomainLink 1
[Diagram labels: IP MPLS, lambda; GARR, SWITCH, DFN; CNAF, BO, MI, PD, Manno, KARLSRUHE; WDM; GINS e2e Service; E2E Monitoring System: check the status of segments, status aggregation]
GARR UI (MPLS monitoring)
[Screenshot callouts: Domain information, LSP status, traffic; Interdomain information, e2e L2 circuit status]
MPLS Monitor (using Juniper MIBs)
Information:
mplsLspName               1.3.6.1.4.1.2636.3.2.3.1.1
mplsLspState              1.3.6.1.4.1.2636.3.2.3.1.2
mplsLspOctets             1.3.6.1.4.1.2636.3.2.3.1.3
mplsLspPackets            1.3.6.1.4.1.2636.3.2.3.1.4
mplsLspAge                1.3.6.1.4.1.2636.3.2.3.1.5
mplsLspTimeUp             1.3.6.1.4.1.2636.3.2.3.1.6
mplsLspPrimaryTimeUp      1.3.6.1.4.1.2636.3.2.3.1.7
mplsLspTransitions        1.3.6.1.4.1.2636.3.2.3.1.8
mplsLspLastTransition     1.3.6.1.4.1.2636.3.2.3.1.9
mplsLspPathChanges        1.3.6.1.4.1.2636.3.2.3.1.10
mplsLspLastPathChange     1.3.6.1.4.1.2636.3.2.3.1.11
mplsLspConfiguredPaths    1.3.6.1.4.1.2636.3.2.3.1.12
mplsLspStandbyPaths       1.3.6.1.4.1.2636.3.2.3.1.13
mplsLspOperationalPaths   1.3.6.1.4.1.2636.3.2.3.1.14
mplsLspFrom               1.3.6.1.4.1.2636.3.2.3.1.15
mplsLspTo                 1.3.6.1.4.1.2636.3.2.3.1.16
mplsPathName              1.3.6.1.4.1.2636.3.2.3.1.17
mplsPathType              1.3.6.1.4.1.2636.3.2.3.1.18
mplsPathExplicitRoute     1.3.6.1.4.1.2636.3.2.3.1.19
mplsPathRecordRoute       1.3.6.1.4.1.2636.3.2.3.1.20
mplsPathBandwidth         1.3.6.1.4.1.2636.3.2.3.1.21
mplsPathCOS               1.3.6.1.4.1.2636.3.2.3.1.22
mplsPathInclude           1.3.6.1.4.1.2636.3.2.3.1.23
mplsPathExclude           1.3.6.1.4.1.2636.3.2.3.1.24
mplsPathSetupPriority     1.3.6.1.4.1.2636.3.2.3.1.25
mplsPathHoldPriority      1.3.6.1.4.1.2636.3.2.3.1.26
mplsPathProperties        1.3.6.1.4.1.2636.3.2.3.1.27
How to get information on an MPLS LSP
1 - Get the snmp index (see next slide)
BO1-MI1-VPN: .66.79.49.45.77.73.49.45.86.80.78.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
2 - Query
snmpget -v2c -c <community> <router> <mplsLspState>.<LSP index>
3 - Parse the output
1 = unknown
2 = up
3 = down
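Step 3 can be sketched as a small parser that maps the integer returned for mplsLspState to the three states listed on the slide. The sample line below is illustrative only (the exact text printed by snmpget depends on the installed MIBs and output options), and the helper name `parse_lsp_state` is hypothetical:

```python
import re

# Value-to-state mapping taken from the slide.
LSP_STATES = {1: "unknown", 2: "up", 3: "down"}

# Accepts both "INTEGER: 2" and the enumerated form "INTEGER: up(2)".
_VALUE = re.compile(r"=\s*INTEGER:\s*(?:\w+\()?(\d+)\)?\s*$")


def parse_lsp_state(snmpget_line):
    """Map one line of snmpget output to 'unknown', 'up' or 'down'.

    Anything that does not look like an integer reply is reported as
    'unknown' rather than raising an error.
    """
    match = _VALUE.search(snmpget_line)
    if match is None:
        return "unknown"
    return LSP_STATES.get(int(match.group(1)), "unknown")


# Illustrative output line (index truncated for brevity), not captured from a real router.
sample = "SNMPv2-SMI::enterprises.2636.3.2.3.1.2.66.79.49 = INTEGER: 2"
print(parse_lsp_state(sample))
```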
MPLS Monitor (using Juniper MIBs)
Finding the index of the LSP
B O 1 - M I .....
.66.79.49.45.77.73.49.45.86.......
<?php
// Convert an LSP name into its SNMP index: one decimal ASCII code
// per character, padded with zeros to 32 components.
function name2oid($string) {
    $oid = '';
    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        // No zero-padding here: OID components must not carry leading zeros.
        $oid .= "." . ord($string[$i]);
    }
    // Pad with ".0" up to a total of 32 components.
    for ($i = $len; $i < 32; $i++) {
        $oid .= ".0";
    }
    return $oid;
}

$name = $argv[1];
print $name . ": " . name2oid($name) . "\n";
?>
$ php name2oid.php BO1-MI1-VPN
BO1-MI1-VPN: .66.79.49.45.77.73.49.45.86.80.78.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
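For comparison, the same name-to-index conversion can be sketched in Python and combined with the mplsLspState OID from the previous slide to form the full OID to query. The function names are hypothetical; the padding length of 32 and the OID prefix come from the slides:

```python
MPLS_LSP_STATE_OID = "1.3.6.1.4.1.2636.3.2.3.1.2"  # mplsLspState, from the slide
INDEX_LENGTH = 32  # the PHP script pads the index to 32 components


def lsp_index(name):
    """Return the dotted index: ASCII codes of the name, zero-padded to 32."""
    codes = [ord(ch) for ch in name]
    codes += [0] * (INDEX_LENGTH - len(codes))
    return "." + ".".join(str(code) for code in codes)


def lsp_state_oid(name):
    """Return the full OID to pass to snmpget for this LSP's state."""
    return MPLS_LSP_STATE_OID + lsp_index(name)


print(lsp_state_oid("BO1-MI1-VPN"))
```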
Example 2: GÉANT2
[Diagram labels: OSI & IP DCN; DomainLink partial; IDL partial; IDL; ALU NMS: 1353NM (EML), 1354RM (NML), 1359 IOO; SNMP traps; TRAP Handler & other stuff & MP/MA]
Example 2: GÉANT2
[Diagram labels: Path – gen_mil_CERN; CERN-SARA-LHC-001; OCH trail; Phys-link; Domain link; P. IDLink]
GÉANT2 Alarm analyzer
• Called every time a trap is received
• Written in bash
• Each trap is analyzed separately
  – if a new trap arrives in the meantime, it waits in the queue (snmptrapd)
• Must maintain state
• After analysing the trap, action is taken: call the data transformation script
• Had several problems:
  – snmptrapd version
  – Alcatel snmp problems
• After one year of testing and modification, currently stable – awaiting a new NMS upgrade – or an alarm churn
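The "must maintain state" point can be illustrated with a small sketch. The real analyzer is a bash script whose internals are not shown on the slides; the class below, its trap fields (trail ID, alarm name, raised/cleared flag) and the rule that any active alarm means DOWN are assumptions made for illustration:

```python
from collections import defaultdict


class TrapAnalyzer:
    """Track active alarms per trail and report status changes.

    Traps are handled one at a time, mirroring the queued processing
    described on the slide.
    """

    def __init__(self, on_change):
        self._active = defaultdict(set)  # trail ID -> alarms currently raised
        self._on_change = on_change      # stands in for the transformation script

    def status(self, trail_id):
        # Assumed rule: any active alarm on the trail means DOWN.
        return "DOWN" if self._active[trail_id] else "UP"

    def handle_trap(self, trail_id, alarm, raised):
        before = self.status(trail_id)
        if raised:
            self._active[trail_id].add(alarm)
        else:
            self._active[trail_id].discard(alarm)
        after = self.status(trail_id)
        # Only trigger the transformation step when the status really changes.
        if after != before:
            self._on_change(trail_id, after)


events = []
analyzer = TrapAnalyzer(lambda trail, status: events.append((trail, status)))
analyzer.handle_trap("trail-1", "LOS", raised=True)
analyzer.handle_trap("trail-1", "LOS", raised=True)   # duplicate, no change
analyzer.handle_trap("trail-1", "LOS", raised=False)
print(events)
```

Keeping the set of active alarms per trail is what lets a clearing trap return the trail to UP only once every alarm on it has cleared.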
E2E Data transformation
• Applications developed in Java:
  – E2EXMLWriter
  – XMLGenerator
• E2EXMLWriter takes in a template XML and produces an XML file containing live e2e path status information conforming to the JRA4 e2e data model
  – Triggered by the bash script listening to SNMP alarms
  – Parameters passed
    • Trail ID
    • Status
• E2EXMLWriter – updates the perfSONAR MA
• XMLGenerator produces this template XML that E2EXMLWriter uses to export domain’s e2e information
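The template-filling step can be sketched as follows. The real tools are Java applications and the real template follows the JRA4 e2e data model; the simplified template layout, the `trailId` attribute and the function name `write_status` below are stand-ins assumed purely for illustration:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical, simplified template: one entry per trail with a default state.
TEMPLATE = """<links>
  <link trailId="trail-1"><stateOper>UNKNOWN</stateOper><time/></link>
  <link trailId="trail-2"><stateOper>UNKNOWN</stateOper><time/></link>
</links>"""


def write_status(template_xml, trail_id, status, now=None):
    """Return the template with the given trail's status and timestamp set.

    Takes the same two parameters as the slide's writer: trail ID and status.
    """
    root = ET.fromstring(template_xml)
    now = now or datetime.now(timezone.utc)
    for link in root.iter("link"):
        if link.get("trailId") == trail_id:
            link.find("stateOper").text = status
            link.find("time").text = now.isoformat()
    return ET.tostring(root, encoding="unicode")


print(write_status(TEMPLATE, "trail-1", "DOWN"))
```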