perfSONAR in ATLAS/WLCG
Shawn McKee, Marian Babik
ATLAS Jamboree / Network Section, 3rd December 2014



Page 2: Introduction

● perfSONAR has been deployed to monitor the network
● The WLCG Networks and Transfer Metrics working group is in the middle of a campaign to get perfSONAR upgraded and properly operating at ALL WLCG Tier-2 (and above) sites
  – Info on the working group is at: https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics
  – perfSONAR install details at: https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
  – Primary challenges: newest version 3.4.1 installed, properly configured, firewalls not blocking operation

Network Monitoring and Metrics WG Meeting

Page 3: perfSONAR ops

• perfSONAR 3.4 released Oct 14th
• Restructuring support and operations
  – Introduced site-level support via GGUS
• Rewritten documentation
  – https://twiki.opensciencegrid.org/bin/view/Documentation/DeployperfSONAR
• Responded to ShellShock and Poodle
  – Sites advised to terminate their instances
  – Performed security audit and established security procedures
• Testing and validation of the new perfSONAR central configuration is in progress
• Ongoing perfSONAR 3.4 update campaign
  – Includes migration to the new configuration system
  – Security considerations documented
  – Progressing well (119 sonars updated out of 214)
  – See http://grid-monitoring.cern.ch/perfsonar_coverage.txt
  – Deadline 8th January (we start ticketing sites after…)

Page 4: perfSONAR config and store

• Mesh-configuration tool deployed in OSG production
  – Extra slides at end cover this tool
• Provides central interface to reconfigure the entire network
  – All aspects: test parameters, mesh participation
  – List of available sonars taken from GOCDB and OIM
  – Supports hierarchical support model (per-mesh admins)
  – Web interface
  – Connected to perfSONAR infrastructure monitoring
• Site reconfiguration needed to adopt
  – Run as part of 3.4 campaign
• perfSONAR data store status and plans
  – Deployed in OSG ITB; several major issues fixed
  – Scale tests on-going this week; operationally ready for production
  – Will be the “source” of network metrics for OSG/WLCG
  – Plan is to feed data store -> SSB -> AGIS -> SchedConfigDB continuously

Page 5: What?: Metrics and Their Use

• Via perfSONAR we gather a number of metrics:
  – Topology/path information via traceroute
  – One-way delay via OWAMP
  – Packet loss via OWAMP
  – Usable bandwidth via BWCTL
• ESnet has some nice pages on using perfSONAR to identify problems
  – http://fasterdata.es.net/performance-testing/evaluating-network-performance/
• Some specific examples are discussed in the presentation from last week’s WG meeting: https://indico.cern.ch/event/354593/
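To make the OWAMP-derived metrics concrete, here is a minimal sketch of how packet loss and one-way delay can be summarized from timestamped probe packets. It is not perfSONAR or OWAMP code: the function name `summarize_one_way` and the sample layout (send time, receive time or `None` for a lost packet) are assumptions chosen for illustration. One-way delay is only meaningful when the sender and receiver clocks are synchronized.

```python
from statistics import median
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical sample layout: (send_time_s, recv_time_s), recv is None if lost.
Sample = Tuple[float, Optional[float]]


def summarize_one_way(samples: Sequence[Sample]) -> Dict[str, Optional[float]]:
    """Summarize loss and one-way delay for a batch of probe packets."""
    if not samples:
        raise ValueError("at least one sample is required")
    # Delay is defined only for packets that actually arrived.
    delays = [recv - sent for sent, recv in samples if recv is not None]
    lost = len(samples) - len(delays)
    return {
        "sent": len(samples),
        "lost": lost,
        "loss_fraction": lost / len(samples),
        "min_delay_s": min(delays) if delays else None,
        "median_delay_s": median(delays) if delays else None,
    }


if __name__ == "__main__":
    probes = [(0.0, 0.010), (1.0, 1.012), (2.0, None), (3.0, 3.011)]
    print(summarize_one_way(probes))
```

Reporting the minimum alongside the median is a common choice: the minimum approximates the uncongested path delay, while the median reflects queuing along the way.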

Page 6: Saul’s Network Study

• Saul Youssef has made a study of FTS transfers (see http://egg.bu.edu/atlas/studies%7btype:egg.Hatch%7d/fts-wan-study-2-for-adc/plot_latest/ )
  – Uses FTS transfer data + traceroute; assumes avg rate/file
  – Identifies problematic FTS channels
  – Can identify problematic hops in the network
  – Update at http://egg.bu.edu/atlas/studies%7btype:egg.Hatch%7d/FTS_November_2014_bonus/
• This technique can be extended to other network metrics (perfSONAR, FAX, etc.)
• It is very similar to the PuNDIT network tomography algorithm
• We should plan to incorporate this technique in ops:
  – Automate this analysis on specific datasets (perfSONAR, FTS, FAX)
  – Plan to use it to identify problematic paths/links and FIX them!
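The idea of combining per-transfer rates with traceroute paths to locate bad hops can be sketched as follows. This is not the method used in the study, nor the PuNDIT algorithm; it is a deliberately simple stand-in that groups average transfer rates by the hops they traverse and flags hops whose median rate is far below the overall median. The function name `rank_suspect_hops`, the `slow_fraction` threshold, and the input layout are all assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Iterable, List, Sequence, Tuple

# Hypothetical input: (hops along the traceroute path, average rate in MB/s).
Transfer = Tuple[Sequence[str], float]


def rank_suspect_hops(
    transfers: Iterable[Transfer], slow_fraction: float = 0.5
) -> List[Tuple[str, float, int]]:
    """Return (hop, median_rate, n_transfers) for hops that look slow.

    A hop is flagged when the median rate of the transfers crossing it is
    below slow_fraction times the median rate of all transfers.
    """
    transfers = list(transfers)
    if not transfers:
        return []
    overall = median(rate for _, rate in transfers)
    per_hop = defaultdict(list)
    for hops, rate in transfers:
        for hop in set(hops):  # count each hop once per transfer
            per_hop[hop].append(rate)
    suspects = [
        (hop, median(rates), len(rates))
        for hop, rates in per_hop.items()
        if median(rates) < slow_fraction * overall
    ]
    # Slowest first; the hop name breaks ties so the order is deterministic.
    return sorted(suspects, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sample = [
        (("A", "X", "B"), 100.0),
        (("A", "X", "C"), 90.0),
        (("A", "Y", "D"), 5.0),
        (("E", "Y", "D"), 8.0),
        (("E", "X", "B"), 110.0),
    ]
    for hop, rate, count in rank_suspect_hops(sample):
        print(hop, rate, count)
```

In the sample data, hops Y and D are both flagged because they only ever appear together. Separating them would need transfers that cross one but not the other, which is why broad path coverage matters for this kind of analysis.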

Page 7: Monitoring/Management Tools

• We have a number of tools to track, monitor and manage the perfSONAR deployment
  – OMD (Nagios “bundle”) to track service status, versions, configuration
    • https://maddash.aglt2.org/WLCGperfSONAR/check_mk (prototype)
    • Credentials WLCGps/WLCG to “read”
    • A new version respecting x509 certs is to be put into production
  – MaDDash to visualize metrics
    • http://maddash.aglt2.org/maddash-webui/ (prototype)
  – Summary coverage: http://grid-monitoring.cern.ch/perfsonar_coverage.txt
  – Mesh config/management (see slides at end)
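As an illustration of the kind of version tracking a Nagios-style check can do, here is a minimal sketch that follows the standard Nagios plugin convention of returning a status code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) together with one line of status text. The actual checks used in the OMD deployment are not described in these slides; the function names are hypothetical, and how the reported version string is obtained from a host is left out. The default required version of 3.4.1 is the one named in the introduction.

```python
from typing import Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.4.1' into (3, 4, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def check_version(reported: str, required: str = "3.4.1") -> Tuple[int, str]:
    """Compare a reported toolkit version against the required one."""
    try:
        have = parse_version(reported)
        want = parse_version(required)
    except (ValueError, AttributeError):
        return UNKNOWN, f"UNKNOWN - cannot parse version {reported!r}"
    if have >= want:
        return OK, f"OK - perfSONAR {reported} (>= {required})"
    return CRITICAL, f"CRITICAL - perfSONAR {reported} is older than {required}"


if __name__ == "__main__":
    import sys

    # Hypothetical usage: the version string is passed on the command line.
    code, message = check_version(sys.argv[1] if len(sys.argv) > 1 else "")
    print(message)
    sys.exit(code)
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "3.10" would sort before "3.4".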

Page 8: perfSONAR Related Items

• perfSONAR instances must be upgraded and properly configured.
• WG waiting on input from ATLAS (and others) on use-cases/requirements for network metrics
  – Strawman document ready early next year
• Discussion topics
  – How best to correlate perfSONAR instances with storage?
  – Tuning perfSONAR parameters and coverage
  – Requirements for “user” API for datastore
  – Using perfSONAR data (network tomography; problem location; problem identification)

Page 9: Conclusion

• We need to get perfSONAR data consistently available from all our sites, covering all our paths. Get sites upgraded/configured!

Questions? Discussion, Comments?

Page 10: Mesh-Config GUI Host Groups

OSG (Soichi) has developed a nice web interface for mesh creation and configuration.
It currently implements access based upon x509 credentials. No fine-grained authorization: either ‘admin’ or ‘no access’.
Instances are found from perfSONAR registration information in OIM (OSG) or GOCDB (WLCG).

Page 11: Mesh-Config Parameters

Parameters for perfSONAR tests are controlled centrally. Easy to modify as required.

Page 12: Mesh-Config Meshes

Meshes can be created using this tab. This is the “metadata” needed to organize sets of perfSONAR instances.

Page 13: Mesh-Config Test Definitions

What tests get run for a mesh? That is controlled by this section.

Page 14: Mesh Config URL

• Once meshes are defined they are exposed via a URL like: http://myosg.grid.iu.edu/pfmesh/json/name/<mesh-name>?new
• Example for us-atlas: http://myosg.grid.iu.edu/pfmesh/json/name/us-atlas?new
• Status: In production, but without the ?new the URL will return the old “static” values hosted on CERN AFS
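The URL pattern above can be wrapped in a small helper. This is a sketch under stated assumptions: the base URL and the `?new` switch come from the slide, while the function names are hypothetical. The slides do not describe the structure of the returned JSON, so the fetch helper returns the parsed document without interpreting it, and the endpoint may not be reachable from every network.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the slide; everything else here is illustrative.
MESH_BASE_URL = "http://myosg.grid.iu.edu/pfmesh/json/name/"


def mesh_config_url(mesh_name: str, use_new: bool = True) -> str:
    """Build the mesh configuration URL for a named mesh.

    With use_new=False the '?new' switch is omitted, which per the slide
    returns the old static values.
    """
    url = MESH_BASE_URL + quote(mesh_name, safe="")
    return url + "?new" if use_new else url


def fetch_mesh_config(mesh_name: str, use_new: bool = True, timeout: float = 10.0):
    """Download and parse the mesh JSON (needs network access)."""
    with urlopen(mesh_config_url(mesh_name, use_new), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(mesh_config_url("us-atlas"))
```

Keeping URL construction separate from the network call makes the naming rule easy to check without contacting the server.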