dyn: active/active failover with cory von wallenstein & eric rosenberry
DESCRIPTION
Dyn's Cory von Wallenstein & Iovation's Eric Rosenberry recently gave a webinar on active/active failover setup with managed DNS. Here are the official slides.
TRANSCRIPT
![Page 1: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/1.jpg)
Going Active/Active
Cory von Wallenstein Chief Technology Officer,
Dyn Inc. @cvonwallenstein
Eric Rosenberry Principal Infrastructure Architect,
iovation Inc. eric.rosenberry@iovation.com
@eprosenx
![Page 2: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/2.jpg)
Introductions
Cory von Wallenstein Chief Technology Officer,
Dyn Inc. [email protected]
@cvonwallenstein
Eric Rosenberry Principal Infrastructure Architect,
iovation Inc. eric.rosenberry@iovation.com
@eprosenx
![Page 3: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/3.jpg)
What Do We Mean By Active/Active?
• Active
• Passive
• Active/Passive
• Active/Active
![Page 4: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/4.jpg)
What Are We Looking to Gain?
• High(er) availability
• Flexibility to change infrastructure without downtime
• Flexibility to expand infrastructure without four-walled limitations
• Disaster resilience
![Page 5: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/5.jpg)
Active/Active FUD
• “It’s impossible!”
  – CAP theorem
  – WAN latency
• “It’s built in to my database!”
  – NoSQL and WAN replication
• Reality is it’s somewhere in the middle, depending on what problem you’re trying to solve
![Page 6: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/6.jpg)
http://www.flickr.com/photos/notaperfectpilot/8119088205/
“Wired people should know something about wires” - Neal Stephenson, quoted in Andrew Blum’s TED Talk “What is the Internet, Really?”
![Page 7: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/7.jpg)
http://www.ted.com/talks/andrew_blum_what_is_the_internet_really.html
![Page 8: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/8.jpg)
Paradigm Shift
• All system maintenance is done during business hours without impact
• All software upgrades are done during business hours
• Software upgrades do not require downtime, so code can be pushed to production more rapidly (more frequent, smaller iterations)
• Enable commodity hardware usage
![Page 9: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/9.jpg)
The Four Questions You Need To Ask Before Embarking
1. What problem(s) am I attempting to solve?
2. How will I segment?
3. Where will I deploy?
4. How will this affect each part of my app?
![Page 10: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/10.jpg)
Step One: Scope the Problem
• What are we replicating and why?
• How close to real time does it need to be?
  – Synchronous vs. Asynchronous
• Think about this for each application tier, and set availability/distribution goals
![Page 11: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/11.jpg)
Step One: Scope the Problem
• Example:
  • iovation end-user facing content services must be served using the closest GSLB-selected node, and each node must have N capacity (where N = our full overall global load), so overall we have more than 4N total capacity with all nodes online
  • iovation real-time API services require N+1 redundancy in each of our two Active/Active facilities, i.e. 2 * (N+1). This allows us to lose any server plus a datacenter and continue to function
  • Non-real-time API services (e.g. Admin Console) require 2N+ resiliency (i.e. one instance in each of our two Active/Active datacenters, with that instance running on an N+1 virtual cluster)
  • Some internal processes (e.g. Research Analytics) only require placement in one datacenter
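The 2 * (N+1) arithmetic can be checked with a small sketch. It assumes capacity is counted in identical servers, that N is the number of servers needed to carry the full load, and that the worst case is losing the largest datacenter. The function names and the sample value N = 4 are hypothetical and are not taken from iovation's deployment.

```python
def remaining_capacity(servers_per_dc, dcs_lost=0, servers_lost=0):
    """Servers still online after losing whole datacenters plus single servers.

    servers_per_dc: list with the server count of each datacenter.
    Worst case is assumed: the largest datacenters fail first, and the
    individually lost servers are all in surviving datacenters.
    """
    keep = max(len(servers_per_dc) - dcs_lost, 0)
    surviving = sorted(servers_per_dc)[:keep]
    return max(sum(surviving) - servers_lost, 0)


def survives(servers_per_dc, n_required, dcs_lost=0, servers_lost=0):
    """True if the remaining servers can still carry the full load N."""
    return remaining_capacity(servers_per_dc, dcs_lost, servers_lost) >= n_required


N = 4                        # hypothetical: servers needed for the full load
two_sites = [N + 1, N + 1]   # the 2 * (N+1) layout

print(survives(two_sites, N, dcs_lost=1, servers_lost=1))  # one DC plus one server
print(survives(two_sites, N, dcs_lost=1, servers_lost=2))  # one server too many
```

Under these assumptions the layout tolerates a datacenter outage plus one further server failure, and nothing beyond that.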
![Page 12: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/12.jpg)
Step Two: How Will You Segment?
• Global Server Load Balancing with DNS
  – Round robin
  – Advanced load balancing
  – Active failover
  – Geographic
• Other strategies (out of scope for today):
  – Anycast – Challenges with TCP
  – HTTP Redirection – Challenges with performance
  – BGP netblock-based failover
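As a rough illustration of how round robin and active failover combine in DNS-based GSLB, here is a minimal sketch. It is not Dyn's implementation or API: the `Endpoint` and `RoundRobinPool` names are hypothetical, the IP addresses come from the reserved documentation ranges, and health is a flag set by hand rather than by a real health check.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Endpoint:
    name: str
    ip: str
    healthy: bool = True


class RoundRobinPool:
    """Round-robin answers over healthy endpoints only (active failover)."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self._counter = count()

    def answer(self):
        # Unhealthy endpoints are skipped, so traffic fails over to the rest.
        healthy = [e for e in self.endpoints if e.healthy]
        if not healthy:
            raise RuntimeError("no healthy endpoints to return")
        return healthy[next(self._counter) % len(healthy)].ip


pool = RoundRobinPool([
    Endpoint("pdx", "192.0.2.10"),     # hypothetical addresses
    Endpoint("sea", "198.51.100.10"),
])
print([pool.answer() for _ in range(4)])  # alternates between the two sites
pool.endpoints[0].healthy = False         # a health check marks PDX down
print([pool.answer() for _ in range(2)])  # only SEA is returned
```

In real DNS, resolvers cache answers for the record's TTL, so a failover like this only takes effect for a client once its cached answer expires.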
![Page 13: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/13.jpg)
Step Three: Where Will You Deploy?
• Going from 1 to N
• Where are you thinking?
  – What are your current datacenter assets and how can they be leveraged?
• And for what reasons?
  – Disaster resilience
  – Get closer to users
  – Room to grow
![Page 14: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/14.jpg)
Disaster Resilience
http://maps.google.com
![Page 15: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/15.jpg)
Speed of Light
http://www.cogentco.com/files/images/network/network_map/networkmap_global_large.png
Speed of light: 299,792.458 km/second (in a vacuum)
Theoretical RTT ~40ms
Real RTT ~90ms
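A theoretical round-trip time follows directly from path length and propagation speed. The sketch below is an illustration only: the slide does not say which route or distance lies behind its ~40ms figure, so the 4,000 km path is hypothetical, and the fiber refractive index of about 1.47 is a common approximation rather than a number from the talk.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in a vacuum, as quoted on the slide


def theoretical_rtt_ms(path_km, refractive_index=1.0):
    """Round-trip time in milliseconds for a one-way path of path_km.

    refractive_index=1.0 models a vacuum; a value near 1.47 is an assumed
    approximation for light travelling through optical fiber.
    """
    one_way_s = path_km * refractive_index / SPEED_OF_LIGHT_KM_S
    return 2 * one_way_s * 1000


path_km = 4000  # hypothetical one-way path length
print(f"vacuum: {theoretical_rtt_ms(path_km):.1f} ms")
print(f"fiber:  {theoretical_rtt_ms(path_km, 1.47):.1f} ms")
```

Measured RTT is typically higher than either figure because fiber routes are not straight lines and network equipment adds delay along the way.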
![Page 16: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/16.jpg)
Implications on Selection
• Things don’t work as well at 90ms RTT latency as they do at 9ms RTT latency
• Where can you go to get out of the way of a disaster but not create latency headaches?
http://www.globaldatavault.com/natural-disaster-threat-maps.htm
![Page 17: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/17.jpg)
Where The Fiber Actually Goes
http://soladrive.com/images/level3-map-large.png
![Page 18: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/18.jpg)
Disaster Resilience: Local Failures
http://www.datacenterknowledge.com/archives/2012/07/09/outages-surviving-electric-squirrels-ups-failures/
“A frying squirrel took out half of our Santa Clara data center two years back,” - Mike Christian, Yahoo
![Page 19: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/19.jpg)
Local Failures
http://blog.level3.com/level-3-network/the-10-most-bizarre-and-annoying-causes-of-fiber-cuts/
“Squirrel chews account for a whopping 17% of our damages so far this year! But let me add that it is down from 28% just last year and it continues to decrease since we added cable guards to our plant.”, Fred Lawler, Level(3)
![Page 20: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/20.jpg)
Get closer to users
http://www.akamai.com/html/technology/dataviz1.html
![Page 21: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/21.jpg)
Get closer to users
http://www.akamai.com/html/technology/dataviz1.html
![Page 22: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/22.jpg)
“Sorry, we’re full”
http://www.theregister.co.uk/2010/10/12/capgemini_merlin_data_center/
![Page 23: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/23.jpg)
Step Three: Where Will You Deploy?
• Don’t just assume vastly different geographic areas
• How far do you need to go to get out of the same disaster zone?
  – What kind of disasters happen in your area?
  – What geographic barriers are there?
  – Can you drive it in an emergency?
![Page 24: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/24.jpg)
Portland to Seattle
http://www.zayo.com/sites/default/files/images/Zayo-US-Network-EXTERNAL-11-1-2012.kmz
![Page 25: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/25.jpg)
Step Four: Think Through Your Apps
• How will these different pieces of the architecture behave with increased latency between them?
• Can you avoid real-time calls across the WAN?
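To see why real-time calls across the WAN matter, consider a request that makes several sequential calls to the other site, each waiting for the previous one to finish. The sketch below uses the 9ms and 90ms RTT figures quoted earlier in the deck; the local processing time and the number of round trips are hypothetical values chosen only for illustration.

```python
def request_time_ms(local_work_ms, wan_round_trips, rtt_ms):
    """Total time when each cross-WAN call waits for the previous one."""
    return local_work_ms + wan_round_trips * rtt_ms


local_work_ms = 20     # hypothetical processing time per request
wan_round_trips = 10   # hypothetical sequential calls to the other site

for rtt_ms in (9, 90):  # the two RTT figures quoted in the slides
    total = request_time_ms(local_work_ms, wan_round_trips, rtt_ms)
    print(f"RTT {rtt_ms} ms -> {total} ms per request")
```

Because the WAN term scales with the number of sequential round trips, removing those calls from the request path helps far more than tuning the local work.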
![Page 26: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/26.jpg)
Step Four: Think Through Your Apps
• Examples from Iovation:
  – Web Device Print code is served from four global nodes using GSLB
    • via Dyn Traffic Management
    • Was our first Active/Active application
  – Real-time API responses are served Active/Active between Portland and Seattle
    • 50% of the time our API URL returns the PDX IP, and 50% of the time it returns the SEA IP
    • Real-time queries are handled locally within a single DC
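The 50/50 split between PDX and SEA can be modelled as a weighted choice per DNS answer. This is a sketch of the idea only, not how Dyn Traffic Management is implemented: the `pick_site` helper is hypothetical, and a seeded random generator stands in for whatever selection logic the managed DNS service really uses.

```python
import random
from collections import Counter


def pick_site(rng, weights):
    """Return one site name, chosen with probability proportional to its weight."""
    sites = list(weights)
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


weights = {"PDX": 50, "SEA": 50}   # the 50/50 split described in the slide
rng = random.Random(0)             # fixed seed so runs are repeatable
counts = Counter(pick_site(rng, weights) for _ in range(10_000))
print(counts)
```

Changing the weights, for example to drain one site before maintenance, shifts the share of answers without touching the application itself.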
![Page 27: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/27.jpg)
Summary
• Top takeaways
  – Active/Active is a Paradigm Shift
  – It is achievable
  – Choose your locations carefully
    • Network is a primary selection criterion
    • How far do you really need to go?
  – Analyze each application tier’s constraints carefully
  – Start with low-hanging fruit
![Page 28: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/28.jpg)
What iovation Does
Identify and re-recognize devices connecting to your business sites
Associate groups of devices that would otherwise appear unrelated
Assess real-time risk through business rules including velocity, anomaly, proxy use, etc.
![Page 29: Dyn: Active/Active Failover with Cory von Wallenstein & Eric Rosenberry](https://reader035.vdocuments.mx/reader035/viewer/2022062701/55381d7c4a7959c36e8b4694/html5/thumbnails/29.jpg)
Questions?
Cory von Wallenstein Chief Technology Officer,
Dyn Inc. [email protected]
@cvonwallenstein
Eric Rosenberry Principal Infrastructure Architect,
iovation Inc. eric.rosenberry@iovation.com
@eprosenx