PERT – EVN-NREN –Amsterdam 28/1/05 – Toby Rodwell ([email protected])
The PERT and Network Performance Monitoring
EVN-NREN, Amsterdam 28/01/05
Toby Rodwell, Network Engineer
DANTE
Network Performance Problems
• Historically, long-distance circuits (the “wide area”) have been the bottleneck in a network
• In recent years, the capacity of long-distance circuits has increased significantly
• End-to-end performance bottlenecks may now occur at any point in a system: end system (application, OS, hardware), LAN or WAN
• As a result, it is becoming increasingly difficult for a non-expert end user to diagnose their network performance problems
Origins of the PERT
• Conception of the PERT … Jan 01 Internet2 Meeting
– Performance Enhancement and Response Team
– To provide a support structure to investigate and resolve problems in the performance of applications over computer networks
– Comparable to CERT structure
• Realization of the PERT … Dec 2002 TERENA meeting
– GARR, TERENA, DANTE, SWITCH, CESnet, HEAnet and UKERNA committed to a practical trial of a basic PERT
The GEANT PERT
• PERT 2002-2004
– Informal, unregulated access to PERT; anybody could request PERT’s help
– PERT communicated via an e-mail list
– Primary purpose of investigations was to improve PERT’s knowledge and experience
– Problems were addressed on a best-efforts basis
– No dedicated monitoring tools
– Off-the-shelf Roundup tracking system used
GEANT2 PERT
• A development of the existing PERT
• Pilot phase Nov 04 – Feb 05
• Fully operational from Mar 05
• A virtual team consisting of
– Case Managers, who receive new requests and manage unresolved issues
– Subject specialists, who can be called upon to help resolve complex issues
• Monitoring tools
– During the course of the GEANT2 project, a monitoring infrastructure will be developed and deployed, which should be of particular help with performance troubleshooting
PERT Staff
• Case Managers
– Part-time staff provided by GEANT2 project participants
– On a roster to ensure continuous cover during normal working hours (once the PERT is fully operational)
– Cross-discipline experts who are capable of identifying the locations of performance bottlenecks
• Subject Matter Experts
– Unfunded volunteers from a potentially wide variety of organizations, who provide help on a best-efforts basis
– Have specialist knowledge in one or more subjects, and so can precisely diagnose the cause of a given problem and help end users resolve it
Pilot PERT Systems
• Issue Tracker
– Record of PERT issues (cases) and their investigation
– Uses open-source “Roundup” software
– Publicly accessible at http://roundup.geant2.net:8080/pert (eVLBI performance case: issue4)
• PERT Diary
– For assessing the performance of the PERT and highlighting issues
– Uses TWiki open-source software (user-editable website)
– Publicly accessible at http://cemp1.switch.ch/cgi-bin/twiki/view/PERTDiary/WebHome
PERT Systems
• PERT Ticket System
– Similar to the trouble ticket systems used by NOCs
– Optimised for the collaborative nature of PERT investigations (will collect and record e-mails and instant messaging threads)
– May directly contact SMEs who have expressed interest in a particular subject
• Knowledge Base
– Known performance issues, with possible ways to address them
– Successful diagnostic strategies
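The two features that set the ticket system apart, keeping all correspondence with the case and reaching SMEs by declared interest, can be sketched as a small data structure. This is a minimal illustration under assumptions: the `Ticket` class, the single subject label per case and the `smes_to_notify` helper are hypothetical and do not describe how the actual PERT Ticket System is built.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical PERT case record that keeps all correspondence together."""
    ticket_id: int
    subject: str  # assumed: one subject label per case, e.g. "tcp-tuning"
    summary: str
    messages: list[tuple[str, str]] = field(default_factory=list)  # (channel, text)

    def add_message(self, channel: str, text: str) -> None:
        """Attach an e-mail or instant-messaging excerpt to the case."""
        self.messages.append((channel, text))


def smes_to_notify(ticket: Ticket, interests: dict[str, set[str]]) -> list[str]:
    """Return, sorted by name, the SMEs who registered interest in the ticket's subject."""
    return sorted(name for name, subjects in interests.items() if ticket.subject in subjects)
```

With `interests = {"alice": {"tcp-tuning"}, "bob": {"routing"}}` (hypothetical volunteers), a ticket on `"tcp-tuning"` would be routed to `["alice"]` only.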
Lessons Learned to Date
• Identify a technical contact at each end
• Determine the scope of testing possible
– If production machines are involved, some configuration changes may not be acceptable for testing purposes
• Wherever possible, use methods that minimise the number of variables
– e.g. sink data to /dev/null, so the transfer is memory to memory rather than to disk
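The memory-to-memory idea can be sketched as a small TCP throughput test in which the sender generates data in memory and the receiver discards everything it reads, so disk speed never enters the result. This is a minimal sketch assuming Python 3.9+; `memory_to_memory_test` and the 64 KiB chunk size are illustrative choices, not a PERT tool. It runs both ends over loopback to stay self-contained, so on its own it only exercises the local host stack; in a real investigation the sender and receiver would run on the two end hosts.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary illustrative choice)


def _sink(server: socket.socket, received: list) -> None:
    """Receiver: read everything and throw it away, like writing to /dev/null."""
    conn, _ = server.accept()
    with conn:
        total = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender closed the connection
                break
            total += len(data)
    received.append(total)


def memory_to_memory_test(total_bytes: int, host: str = "127.0.0.1") -> tuple[int, float]:
    """Send total_bytes from an in-memory buffer; return (bytes received, Mbit/s)."""
    server = socket.create_server((host, 0))  # port 0: let the OS choose a free port
    port = server.getsockname()[1]
    received: list = []
    receiver = threading.Thread(target=_sink, args=(server, received))
    receiver.start()

    payload = bytes(CHUNK)  # zero-filled buffer created in memory, never read from disk
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # include the time the receiver needs to drain the data
    elapsed = time.perf_counter() - start
    server.close()
    return received[0], total_bytes * 8 / elapsed / 1e6
```

For example, `memory_to_memory_test(50_000_000)` returns the byte count seen by the receiver together with the achieved rate; if the byte count does not match what was sent, the test itself is suspect before the network is.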
Contacting the PERT
• Normally via the NREN
• Selected pan-European projects (including EVN) may contact PERT directly
– Because the PERT is not a 24x7 quick-response service, suspected network failures are best reported to the NREN/GEANT NOCs
• E-mail address – [email protected]
GEANT Network Monitoring
Monitoring Tools
• GEANT status monitoring
– 5-minute polling of the state of equipment, circuits and services
– Failed hardware or circuits detected within 10 minutes and action taken by the GEANT NOC, 24x7
• GEANT traffic statistics collection
– 5-minute polling of router interface counters (default and customised)
– Collected data stored in a Round Robin Database (RRD), which is kept at a constant size by aggregating data as it ages
• GEANT traffic statistics display
– For a quick, real-time view – Weathermap
– For historical data and specialist counters – Taksometro
http://stats.geant.net/
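The traffic-statistics pipeline above rests on two simple steps: turning successive 5-minute counter readings into an average rate, and storing the results in a fixed-size archive that consolidates older data. The sketch below is a toy model under stated assumptions: 64-bit octet counters (such as the SNMP `ifHCInOctets` object) that wrap at most once per interval, and a single averaging step for consolidation. `rate_bps` and `RoundRobinArchive` are hypothetical names; this is neither rrdtool's file format nor its API.

```python
from collections import deque

COUNTER_MAX = 2**64  # assumption: 64-bit interface octet counters


def rate_bps(prev_octets: int, curr_octets: int, interval_s: float = 300.0) -> float:
    """Average bit rate between two counter polls, allowing for one counter wrap."""
    delta = (curr_octets - prev_octets) % COUNTER_MAX  # modulo absorbs a single wrap
    return delta * 8 / interval_s


class RoundRobinArchive:
    """Fixed-size store: recent samples at full resolution, older ones averaged.

    A toy model of the RRD idea; total storage never grows.
    """

    def __init__(self, fine_slots: int, coarse_slots: int, steps_per_coarse: int):
        self.fine = deque(maxlen=fine_slots)      # e.g. 5-minute averages
        self.coarse = deque(maxlen=coarse_slots)  # averages of steps_per_coarse samples
        self.steps = steps_per_coarse
        self._pending = []

    def add(self, value: float) -> None:
        self.fine.append(value)  # the oldest fine sample drops off automatically
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # Consolidate: several fine-grained points become one averaged point.
            self.coarse.append(sum(self._pending) / self.steps)
            self._pending.clear()
```

For example, two readings 375,000,000 octets apart over 300 s give `rate_bps(1_000_000, 376_000_000) == 10_000_000.0` bit/s. With `fine_slots=288` the fine archive would hold one day of 5-minute samples; the slot counts here are illustrative, not GEANT's configuration.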
Monitoring Tools – Taksometro
Monitoring Tools – ‘Weathermap’ Kairos
Hyperlinked traffic chart
Monitoring Tools – Synagon
(GEANT Ops only)
Before / After
Any Questions?
Thank you.
Example Case
• … from last year
• Project has since moved on, but the sequence of events is still instructive
• EVN throughput test
– Test the download of a 430MB file from the JIVE website in Dwingeloo to the University of Oxford
– Because of problems with the systems in Oxford, the test was done between JIVE and a GÉANT workstation
Example Case
• Initial transfer test:
– Via http, using wget
– Took 5 minutes to complete the 430MB transfer (approximately 10Mbps throughput)
• PERT case opened
• Potential causes
– Ethernet interfaces not in full duplex mode
– Insufficiently large TCP buffers
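On a Linux end system the negotiated duplex mode can be read from sysfs (the same information `ethtool <iface>` reports), which makes the first potential cause quick to rule in or out. A minimal sketch, assuming a Linux host and Python 3.9+; the helper names are hypothetical, and the `sysfs` parameter exists only so the functions can be pointed at another directory for testing. A host-side reading covers only one end of each link, so the switch or router port at the other end still needs checking: the classic duplex mismatch arises when one side is forced to full duplex while the other autonegotiates and falls back to half.

```python
from pathlib import Path
from typing import Optional

SYSFS_NET = Path("/sys/class/net")


def interface_duplex(iface: str, sysfs: Path = SYSFS_NET) -> Optional[str]:
    """Return the kernel's duplex value ('full', 'half', 'unknown') or None if unreadable."""
    try:
        return (sysfs / iface / "duplex").read_text().strip()
    except OSError:
        # No such file (non-Linux host, virtual device) or the driver
        # refuses the read, e.g. while the link is down.
        return None


def half_duplex_interfaces(sysfs: Path = SYSFS_NET) -> list[str]:
    """List interfaces currently reporting half duplex, a suspect for poor TCP throughput."""
    if not sysfs.is_dir():
        return []
    return sorted(p.name for p in sysfs.iterdir() if interface_duplex(p.name, sysfs) == "half")
```

Running `half_duplex_interfaces()` on each end host gives a quick first answer; an empty list rules out a host-side half-duplex setting but says nothing about the far end of the link.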
Example Case
– The maximum TCP receive buffer size on the GEANT workstations was of reasonable size
– However, wget uses the default TCP buffer size, so the default TCP buffer size was increased on the two receivers (ws4.uk: Linux -> 8MB, ws1.de: Unix -> 196kB)
– Dramatic improvement: 40Mbps
– Could not access the JIVE webserver to increase the Tx buffer (critical production machine)
– Access was granted to the JIVE FTP server, where the Tx buffer was increased to 2MB
– Improvement: 90Mbps
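The figures in this case follow from two relations. Average throughput is bytes × 8 / time. A single TCP stream cannot move more than one window (bounded by the socket buffers at both ends) per round trip, so the buffer needed for a target rate is the bandwidth-delay product. The sketch below assumes Python 3.9+; the helper names are illustrative, and the 20 ms RTT in the usage note is a hypothetical value because the path RTT is not given here. On Linux, the system-wide default receive buffer that an application like wget inherits is the middle value of the `net.ipv4.tcp_rmem` sysctl; a program can also request larger buffers per socket, as `socket_with_buffers` does.

```python
import socket


def throughput_mbps(n_bytes: float, seconds: float) -> float:
    """Average throughput in Mbit/s (decimal units: 1 MB = 10**6 bytes)."""
    return n_bytes * 8 / seconds / 1e6


def window_limited_mbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound for one TCP stream: at most one window of data per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def required_buffer_bytes(target_mbps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    return target_mbps * 1e6 / 8 * rtt_s


def socket_with_buffers(size: int) -> socket.socket:
    """Create a TCP socket and request larger buffers before connecting.

    The kernel may clamp the request to its configured maximum, and Linux
    reports back double the requested value to account for bookkeeping overhead.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    return s
```

For the initial test, `throughput_mbps(430e6, 300)` gives about 11.5 Mbit/s, in line with the “approximately 10Mbps” above. With the hypothetical 20 ms RTT, `required_buffer_bytes(90, 0.02)` gives about 225,000 bytes, which illustrates why both the receive and the transmit buffers had to grow before the transfer could approach 90Mbps.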