A Day in the Life of a Network Administrator
THE CALM AFTER THE STORM…
How did a day of crisis prompt my
company to rethink the way we managed
our network? Pretty easily. We knew we needed to change – we'd already had our fair share of problems – but it took a real crisis to spur us into action. With our new network management solution in place, we were shocked to discover what had been happening on our network for who knows how long! Now we know exactly what's working and what's not – which is a great position to be in. As a network administrator or manager, I'm sure you've lived through a few crises of your own. If you haven't already learned from them, perhaps our story will drive you to action too.
HOW IT ALL STARTED
It started with an unforeseen network
outage on a weekday, right in the
middle of our peak sales season. Sure,
our network had gone down before – even for several hours – but never at
such a critical time. We basically lost
access to our customers on one of
our busiest sales days of the year. We
do nearly 25% of our annual business
during our peak season, so any kind
of network availability issue costs the
company big dollars.
In hindsight, we know our crisis that
day was complex. Multiple things were
going wrong but we had no clear way
of seeing if and how they were related.
We could only guess at the reasons and
underlying causes, because we had
no unified way of troubleshooting or
visualizing the network dependencies.
It turned out to be a very educational day for us: we learned from the crisis and changed the way we manage the network. As a result, we're much better equipped to minimize downtime and outages, and in some cases even prevent them.
FROM NETWORK SLOWDOWN
TO A CRISIS
The first hint of a problem showed up
early that morning. One of our company
sales reps called to say that her Webex
session was slow. These slowdowns
were usually temporary and resulted
from occasional spikes in traffic. We
were all set to add a new T1 line (next
month), but that wouldn’t help us
today.
In the next ten minutes we got two more
calls – one from a sales rep saying that
his (VoIP) phone conversations had
become hard to follow and the other
call was from our Webmaster, who had
noticed that our order entry application
web pages were taking a long time to
load. When it rains it pours. Next, we
got a call from our telesales manager
telling us that all sales reps were
experiencing noticeable problems on
the phone. With this news, we knew it
was not an isolated user problem at all
– it was now a network-wide issue. We
had to act fast.
ABOUT ME
My name is Mark Brown and I’m
a Network Administrator. I have a
degree in Information Technology
and have been in my job for almost
four years.
MY COMPANY
I work for a medical device and
technology reseller. My boss
(Director of IT) and I are responsible
for supporting 80 people at our office. We do nearly half of our
business online and the rest via
our telesales team. For a relatively
small company, we have a pretty
sophisticated infrastructure and key
business apps which need to be
available 24x7.
TECHNOLOGY ENVIRONMENT
Our web site and app servers are
located in a datacenter upstate
but our email servers, file servers,
VoIP servers and our demo machines are in-house. Our sales
team use Webex regularly, and we
migrated to a VoIP system about
two years ago. Altogether, we
have approximately 20 servers, 90
workstations and phones and 40
network devices.
BEFORE AND AFTER
For the last six months, we’ve
been using a network and systems
management solution called
WhatsUp Gold. It basically runs our network infrastructure, so I can
focus on what I need to get done. I
used to be forever behind schedule,
even coming in on weekends.
Now, all that has changed and
it’s a great feeling personally
and professionally to be ahead of
what’s going on rather than being
behind it.
TROUBLESHOOTING THE OLD-FASHIONED WAY
We checked the stats on the VoIP systems management
portal – and sure enough latency was high and call quality
was down. It looked like network congestion. We’d
used a free tool called Wireshark to troubleshoot network
traffic issues before, so we set it up to monitor the
current problem. Yet when we looked at the results from
Wireshark, the traffic seemed within range. We began to think the problem could be our external link.
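At the time we just eyeballed Wireshark's live view. For anyone who wants to run the same check from a script, here is a minimal sketch using Python with the scapy library and a hypothetical capture file name ("peak_season.pcap"); it tallies bytes by protocol and by source address, which is roughly the summary we were scanning for.

```python
# A rough protocol/top-talker summary of a saved Wireshark capture.
# Assumes the scapy package; "peak_season.pcap" is a hypothetical file
# exported from Wireshark, not something from our actual network.
from collections import Counter
from scapy.all import rdpcap, IP, TCP, UDP

packets = rdpcap("peak_season.pcap")

proto_bytes = Counter()   # bytes per rough protocol bucket
talker_bytes = Counter()  # bytes per source IP address

for pkt in packets:
    if not pkt.haslayer(IP):
        continue
    size = len(pkt)
    if pkt.haslayer(TCP):
        proto_bytes["TCP"] += size
    elif pkt.haslayer(UDP):
        proto_bytes["UDP"] += size
    else:
        proto_bytes["other IP"] += size
    talker_bytes[pkt[IP].src] += size

print("Bytes by protocol:", dict(proto_bytes))
print("Top talkers:")
for ip, nbytes in talker_bytes.most_common(5):
    print(f"  {ip}: {nbytes} bytes")
```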
Next step – call the service provider. We spent 30 minutes
on the phone troubleshooting our gateway router and
external link. Our service provider told us both tested fine,
so it wasn’t a link (or internet connectivity) problem after
all. During the call they gave us some interesting news;
they told us they were seeing occasional bursts of traffic
on our external link. Perhaps it was a congestion problem
after all.
We checked the traffic again via Wireshark and there it was! We had missed it before, because the traffic was
fluctuating wildly. We were most certainly congested and
the traffic was coming from inside our network. There was an extraordinary number of HTTP and RTP packets, and it looked like a lot of unnecessary traffic. Everyone knew how critical the peak season was to our revenues and was especially careful not to load the network.
So, what would explain this burst of traffic?
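Part of the reason we missed the bursts the first time is that we were looking at totals, and the spikes averaged out. A small sketch along these lines (same scapy and hypothetical capture-file assumptions as above) buckets the capture into one-second intervals, so a wildly fluctuating load stands out against the average instead of hiding in it.

```python
# Bucket a capture into one-second intervals to expose intermittent
# traffic bursts that an overall average can hide.
# Same hypothetical "peak_season.pcap" as before.
from collections import defaultdict
from scapy.all import rdpcap

packets = rdpcap("peak_season.pcap")

bytes_per_second = defaultdict(int)
for pkt in packets:
    bucket = int(pkt.time)            # epoch second the packet arrived in
    bytes_per_second[bucket] += len(pkt)

rates = sorted(bytes_per_second.items())
average = sum(b for _, b in rates) / len(rates)
peak = max(b for _, b in rates)

print(f"Average: {average / 1e6:.2f} MB/s, peak second: {peak / 1e6:.2f} MB/s")
for second, nbytes in rates:
    if nbytes > 3 * average:          # flag seconds well above the average
        print(f"  burst at t={second}: {nbytes} bytes")
```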
DIGGING DEEPER
We checked the stats on the VoIP server. Sure enough,
the retransmission of the failed packets was overloading
network I/O and CPU utilization was high. In fact, some
of the calls were being routed to our backup VoIP
server which shared the same system as our order entry
app server. We checked the backup server and it was overloaded too. Now we knew why the order entry app was so slow: it faced a double whammy of network congestion and server performance issues.
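We read those numbers off the VoIP vendor's management portal. On a plain server you can get the same symptoms from a quick spot-check; the sketch below assumes the Python psutil package and uses thresholds that are purely illustrative.

```python
# Spot-check CPU and network I/O on a suspect server.
# Assumes the psutil package; run directly on the host being checked.
import time
import psutil

# Sample CPU over one second.
cpu = psutil.cpu_percent(interval=1)

# Sample network throughput over one second by diffing interface counters.
before = psutil.net_io_counters()
time.sleep(1)
after = psutil.net_io_counters()
sent = after.bytes_sent - before.bytes_sent
recv = after.bytes_recv - before.bytes_recv

print(f"CPU: {cpu:.0f}%")
print(f"Network: {sent / 1e6:.2f} MB/s out, {recv / 1e6:.2f} MB/s in")
if cpu > 90 or (sent + recv) > 100e6:   # crude illustrative thresholds
    print("WARNING: host looks overloaded")
```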
The crisis was now full-blown. We needed to fix the situation fast. We had eliminated any issue from the service provider
and from our own end-users. The network was congested
off and on and the bursts were being seen on the external link. Clearly, this meant that one or more of our internal
devices were communicating with an external site. And
then it dawned on us – it was a virus. That would explain
the traffic and the connection to the external site. Now, it
was not just an issue of performance – but could possibly
be a security breach as well.
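The capture we had already taken held the answer to which device was phoning home. A sketch like the following (scapy plus the standard ipaddress module, the same hypothetical capture file, and assuming RFC 1918 addressing inside the office) ranks internal source addresses by how much they send to external destinations.

```python
# Rank internal hosts by traffic sent to external (non-private) addresses,
# a quick way to spot a machine phoning home.
# Same hypothetical "peak_season.pcap"; assumes RFC 1918 addressing inside.
import ipaddress
from collections import Counter
from scapy.all import rdpcap, IP

packets = rdpcap("peak_season.pcap")

outbound = Counter()
for pkt in packets:
    if not pkt.haslayer(IP):
        continue
    src = ipaddress.ip_address(pkt[IP].src)
    dst = ipaddress.ip_address(pkt[IP].dst)
    if src.is_private and not dst.is_private:
        outbound[str(src)] += len(pkt)

print("Internal hosts sending the most traffic to external addresses:")
for ip, nbytes in outbound.most_common(10):
    print(f"  {ip}: {nbytes} bytes")
```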
INOCULATING THE NETWORK
Eliminating the virus by shutting down the external link
was not an option. The network was too critical to the
business in peak season. The next few hours were a mad
rush to find the infected machines and quarantine them.
We didn’t have a topology diagram, and we didn’t know
what was on each subnet, which made it harder. In the
end, we found four infected workstations and one server. After we shut the last one off, traffic returned to normal. We rebooted the primary VoIP server too, and after nearly
six hectic hours, we were back to business as usual.
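Not having a topology diagram cost us real time during that hunt. Even a crude sweep like the sketch below (Python standard library only; the subnet is a made-up example and the ping flags assume a Linux host) at least tells you which addresses on a subnet are alive, which is a start when you have no map at all.

```python
# Crude sweep of a subnet to list responsive hosts when no topology
# diagram exists. Standard library only; 192.168.1.0/24 is a hypothetical
# example subnet, and the ping flags assume a Linux-style ping.
import ipaddress
import subprocess
from concurrent.futures import ThreadPoolExecutor

SUBNET = ipaddress.ip_network("192.168.1.0/24")

def is_alive(ip):
    # One ICMP echo with a one-second timeout; return code 0 means a reply.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", str(ip)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

with ThreadPoolExecutor(max_workers=64) as pool:
    hosts = list(SUBNET.hosts())
    for ip, alive in zip(hosts, pool.map(is_alive, hosts)):
        if alive:
            print(f"{ip} is up")
```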
PUTTING A LONG TERM SOLUTION IN PLACE
This was a wake-up call for my boss, the CEO, and for me.
We had discussed purchasing a network management
solution before, but we were never able to carve out the
budget for it. Now we knew that if we had a solution in
place – we could have saved valuable hours trying to find
and solve the problem. The right performance monitoring
would have alerted us to network traffic congestion and
persistent high utilization on the servers. We would have
known the network topology and been able to visualize
the affected subnets and nodes. And active monitoring
for instances of failure, like the high number of dropped
packets, would have alerted us to impending faults long before they impacted our end users and our business. It
didn’t take us long to make the decision to put WhatsUp
Gold in charge of our network management. Thankfully, life has been quieter ever since.
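For anyone still relying on user phone calls as their alerting system, the core idea behind active monitoring is easy to illustrate. The sketch below is not how WhatsUp Gold works internally; it is just a minimal Python watchdog with made-up hostnames, ports, and thresholds that polls a couple of services, measures TCP connect latency, and raises an alert when something fails or slows down, before a user has to report it.

```python
# Minimal active-monitoring loop: poll a few critical services, measure
# TCP connect latency, and alert when a check fails or gets slow.
# Hosts, ports, and thresholds are hypothetical; a real deployment would
# use a dedicated monitoring tool rather than this loop.
import socket
import time

CHECKS = [
    ("voip-primary.example.local", 5060),   # SIP signaling port
    ("orders.example.local", 443),          # order entry web app
]
LATENCY_THRESHOLD = 0.25  # seconds

def check(host, port):
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=2):
            return time.monotonic() - start
    except OSError:
        return None

while True:
    for host, port in CHECKS:
        latency = check(host, port)
        if latency is None:
            print(f"ALERT: {host}:{port} is unreachable")
        elif latency > LATENCY_THRESHOLD:
            print(f"WARNING: {host}:{port} is slow ({latency:.3f}s)")
    time.sleep(60)  # poll once a minute
```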