disaster porn and the value of a generalist
![Page 1: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/1.jpg)
Disaster Porn... and the importance of being a generalist
![Page 3: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/3.jpg)
Surge Conference 2011
● Ben Fried's (Google CIO) keynote speech talks about the importance of being a generalist
● I think specializing is fine (and normal as your career advances), but it's VITAL to keep a generalist perspective
● Disaster porn!
● I have no affiliation with OmniTI or Surge, but I highly recommend you attend the conference in Baltimore on Sept. 27th - 28th
![Page 4: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/4.jpg)
Background
● Taxi Magic
  ○ Mobile applications to book/track/pay for taxis
  ○ Web booking integration for taxi fleets
  ○ In-car payment hardware (PIM)
● What's a PIM?
  ○ Passenger Information Monitor
  ○ 7" HD touchscreen
  ○ Credit card swipe
  ○ Wired into cab hardware and dispatch system
  ○ Uses cellular communication to talk to TM
  ○ Regular GPS events over UDP
  ○ Payment transactions over HTTPS
![Page 5: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/5.jpg)
The problems begin... (June 5th)
● A handful of cab drivers in Los Angeles begin reporting failures when swiping CCs
● Embedded hardware team recalls a few cabs and investigates local log files
● The team reports problems during the SSL handshake with RideCharge servers
● Tech Ops team remaps httpd to the same libcrypto.so and libssl.so version as the PIM using libmap.conf(5)
● Problem vanishes! HOORAY!!! Beer!
![Page 6: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/6.jpg)
Fast forward to June 12th...
● SHTF
● Widespread reports of failing CC swipes across the entire SoCal region
● Hardware team pulls more vehicles and notices the same SSL handshake problem
● Tech Ops team is unable to correlate this to a drop in traffic
● Furthermore, Tech Ops is still seeing regular GPS updates from ALL active cabs!
![Page 7: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/7.jpg)
WTF?
![Page 8: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/8.jpg)
Diving in...
● Our cellular ISP insists they aren't having any problems
● (Sound familiar to anyone?)
● I start running the standard toolkit looking for patterns
  ○ tcpdump
  ○ traceroute
  ○ NMAP
● NMAP is giving me some inconsistent results
![Page 9: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/9.jpg)
Understanding how TCP/IP works
● How do you establish a TCP connection?
  ○ SYN (Hey, you there?)
  ○ SYN/ACK (Yeah, what's up?)
  ○ ACK (Cool, let's talk!)
● What happens if you connect to a port that doesn't have a service bound to it?
  ○ SYN (Hey, you there?)
  ○ RST (Leave me alone!)
● So why am I only getting a RST every now and then? Why do I see timeouts instead?
● This is starting to smell like a routing problem
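The three outcomes above (handshake completes, RST comes back, or nothing comes back) can be told apart from a shell. What follows is a minimal sketch, not the tooling used in the talk: it assumes a bash built with `/dev/tcp` support and GNU `timeout`, and the names `classify_status` and `probe` are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the talk.

# Map the exit status of a timed connect attempt to a TCP outcome.
#   0   -> handshake completed (SYN, SYN/ACK, ACK)
#   124 -> GNU timeout gave up waiting: no reply at all
#   *   -> connect failed quickly, most commonly because an RST came back
classify_status () {
    case "$1" in
        0)   echo "open" ;;
        124) echo "no-response" ;;
        *)   echo "conn-refused" ;;
    esac
}

# Try to connect to host $1 on port $2, waiting at most 5 seconds.
probe () {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
    classify_status "$?"
}
```

Calling `probe some-host 22` prints one of the three labels. Other fast failures, such as a name that does not resolve, also land in the last branch, so treat `conn-refused` here as "failed quickly" rather than proof of an RST.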
![Page 10: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/10.jpg)
Proving the problem exists
● Since I am receiving GPS updates over UDP from all the cabs, I can use this to identify the IP of a cab and its location at a point in time
● We know the expected behavior when attempting a connection to a closed port
● Let's run some tests and gather some data
![Page 11: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/11.jpg)
comm_test.sh
#!/usr/bin/env bash
test_connection () {
    # fork a subshell to handle the tcp connect test
    (
        # the result is either no-response or conn-refused
        result=$(nmap -P0 -T1 -sT -p22 --reason -q "$4" | awk '/^22/{print $4}')
        echo "$1 $2 $3 $4 $result $8 $9" >> results.txt
    ) &
}
# connect to the gps receiver host and monitor real-time UDP gps updates
ssh -t gps001.iad1.prod.rws 'tail -F gps_updates.csv' | while read -r line ; do
    # line format: Jun 16 15:14:45, 184.251.233.91, 0, 20, 2577, \
    #              33.9822566666667, -118.4593
    # drop the commas, plus any carriage returns added by the ssh pty
    line=$(echo "$line" | tr -d ',\r')
    # deliberately unquoted: each field becomes a positional parameter
    test_connection $line
done
![Page 12: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/12.jpg)
Results
% comm_test.sh
Jun 16 15:28:00 102.122.93.194 conn-refused 33.8221321105957 -116.548851013184
Jun 16 15:27:57 176.135.73.0 conn-refused 32.8885866666667 -97.0376933333333
Jun 16 15:27:59 181.251.163.200 conn-refused 33.9004183333333 -118.387591666667
Jun 16 15:27:53 178.156.201.182 conn-refused 44.9484977722168 -93.2568588256836
Jun 16 15:27:28 180.229.138.141 no-response 39.766675 -104.940496666667
Jun 16 15:27:28 187.231.74.250 no-response 33.80945 -118.206921666667
Jun 16 15:28:00 181.255.84.59 conn-refused 34.0593466666667 -118.24536
Jun 16 15:27:55 78.6.67.236 conn-refused 34.0581833333333 -118.415878333333
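A quick way to summarize rows like these is to count the outcomes. This is a small sketch, assuming the seven whitespace-separated columns shown above (month, day, time, IP, result, latitude, longitude); `tally_results` is an illustrative name, not part of the original script.

```bash
# Count probe outcomes from rows in the format shown above.
# Column 5 holds the result (conn-refused or no-response).
# Reads the named files, or stdin when no file is given.
tally_results () {
    awk '{ count[$5]++ } END { for (r in count) print r, count[r] }' "$@" | sort
}

# Usage (results.txt is the file written by comm_test.sh):
#   tally_results results.txt
```

For the eight rows above this reports six `conn-refused` and two `no-response`.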
![Page 13: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/13.jpg)
Visualize the problem
Awesome way to get non-techies on your side and impress some management :-)
![Page 14: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/14.jpg)
![Page 15: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/15.jpg)
![Page 16: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/16.jpg)
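The slides do not say which mapping tool produced these images. As one possible approach, the rows in results.txt could be converted to a CSV file with a header, which most mapping tools can import. The function name `results_to_csv` and the column names are assumptions made for this sketch.

```bash
# Convert probe results to CSV for import into a mapping tool.
# Input columns: month day time ip result latitude longitude
# Reads the named files, or stdin when no file is given.
results_to_csv () {
    echo "latitude,longitude,result,ip,time"
    awk '{ printf "%s,%s,%s,%s,%s %s %s\n", $6, $7, $5, $4, $1, $2, $3 }' "$@"
}

# Usage:
#   results_to_csv results.txt > results.csv
```

Coloring the points by the `result` column is what makes a regional pattern visible at a glance.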
Beating up your ISP (figuratively)
● After more than a dozen calls to the ISP and as many "escalations", we landed on a conference call with some lead network engineers
● After 6 hours on this conference call reiterating the problem and showing the data, one engineer asks us to "hold tight"
● Things get very quiet...
● Like magic, all of my tests start succeeding!
![Page 17: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/17.jpg)
WTF!?!
![Page 18: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/18.jpg)
The backstory
● On June 5th, the ISP migrated the SoCal region to a new datacenter in Anaheim. This was an epic failure and they rolled back
● On June 12th, the ISP migrated again to Anaheim "successfully"
● Cell traffic is pooled by connection, and one of the pools was routing asymmetrically
● Asymmetric routing + stateful firewalls = BAD
● Updating the routing tables fixed everything
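Why asymmetric routing and stateful firewalls mix badly can be shown with a toy model. This is only an illustration, not the ISP's actual topology, which the slides do not describe: the firewall names `fwA` and `fwB`, the flow name `cab1`, and the function `simulate_firewalls` are all invented for the sketch.

```bash
# Toy model of stateful filtering. Each input line is: firewall flow flags.
# A firewall passes a SYN and remembers the flow; it passes later packets
# only for flows it has already seen, and drops everything else.
simulate_firewalls () {
    awk '{
        key = $1 ":" $2
        if ($3 == "SYN")      { seen[key] = 1; verdict = "pass" }
        else if (key in seen) { verdict = "pass" }
        else                  { verdict = "drop" }
        print $1, $2, $3, verdict
    }' "$@"
}

# Symmetric path: both packets cross fwA.
printf 'fwA cab1 SYN\nfwA cab1 SYN/ACK\n' | simulate_firewalls
# Asymmetric path: the reply crosses fwB, which never saw the SYN.
printf 'fwA cab1 SYN\nfwB cab1 SYN/ACK\n' | simulate_firewalls
```

In the second run the reply is marked `drop`, so the sender sees a timeout instead of a SYN/ACK or RST. One-way UDP traffic such as the GPS updates needs no reply, which fits the observation that those kept arriving.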
![Page 19: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/19.jpg)
Being a generalist
● A DevOps culture requires generalists
● Understanding the full stack means being able to troubleshoot problems at all layers
● Fluid communication between sysadmins, developers, hardware engineers, and network engineers requires generalists
● Fewer people in the war room results in faster problem solving
● This saves time and money and makes your team more valuable to the business
![Page 20: Disaster porn and the value of a generalist](https://reader034.vdocuments.mx/reader034/viewer/2022052509/55a14db51a28abc2488b4601/html5/thumbnails/20.jpg)
We're hiring!
Thank you!