Fail the Right Way - Node.js in Production
![Page 1: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/1.jpg)
FAIL... THE RIGHT WAY
NODE.JS IN PRODUCTION
ssw2014.formidablelabs.com
@ryan_roemer formidablelabs.com
![Page 2: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/2.jpg)
WELCOME TO PRODUCTION
Production can be a rough place for your Node.js apps. Things can go very wrong out in the wild.
![Page 3: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/3.jpg)
FORMIDABLE LABS
![Page 4: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/4.jpg)
3:00 AM
![Page 5: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/5.jpg)
OUR FOCUS
Whether on PaaS, IaaS, or bare metal.
Design for Failure: Keep your Node.js apps up
Avoidance: Get yourself out of the failover business
Isolate: One failure at a time
Analyze: Debug and diagnose problems quickly
![Page 6: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/6.jpg)
1. DESIGN FOR FAILURE
Fail and recover at multiple levels.
Let's look at failure from a system perspective.
![Page 7: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/7.jpg)
SINGLE NODE.JS WORKER
Never ignore errors.
Have a strong bias for killing the worker.
Handle: uncaughtException
Listen: foo.on("error")
Domains
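A minimal sketch of the "bias for killing the worker" idea, assuming some supervisor (for example a cluster master) restarts the process after it exits. The helper names `installFatalHandler` and `failFastOnError` are hypothetical and not from the talk. Node's `domain` module has since been deprecated, so the sketch does not use it.

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: treat any uncaught exception as fatal.
// `proc` is normally the global `process`; it is a parameter so the
// handler can be exercised without exiting the real process.
function installFatalHandler(proc, log) {
  proc.on("uncaughtException", function (err) {
    log("Uncaught exception, killing worker:", err && err.stack ? err.stack : err);
    // Application state may be corrupt here, so exit instead of continuing.
    proc.exit(1);
  });
}

// Hypothetical helper: never leave an emitter without an "error" listener.
// An "error" event with no listener is thrown by Node as an exception.
function failFastOnError(emitter, name, proc, log) {
  emitter.on("error", function (err) {
    log("Error from " + name + ":", err && err.message ? err.message : err);
    proc.exit(1);
  });
}
```

In a worker entry point this might be wired up as `installFatalHandler(process, console.error)` and `failFastOnError(server, "http server", process, console.error)`, where `server` stands for whatever emitter the app owns.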
![Page 8: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/8.jpg)
MULTIPLE NODE.JS WORKERS
Use cluster or recluster to multiplex CPUs and isolate errors.
Workers: die early on errors
Master: monitor and kill workers
![Page 9: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/9.jpg)
MULTIPLE NODE.JS WORKERS
var recluster = require("recluster");
var cluster = recluster("./server.js");
cluster.run();

// Hot reload: kill -s SIGUSR2 CLUSTER_PID
process.on("SIGUSR2", function() {
  console.log("Got SIGUSR2, reloading cluster...");
  cluster.reload();
});
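For comparison, the master/worker split can be sketched with only Node's built-in `cluster` module. This is an illustrative sketch, not the talk's code: `superviseWorkers` is a hypothetical helper, and it restarts workers immediately with no backoff.

```javascript
const os = require("os");

// Hypothetical helper: fork `count` workers and replace any that die.
// `cluster` is Node's built-in cluster module, passed in as a parameter.
function superviseWorkers(cluster, count, log) {
  const total = count || os.cpus().length; // default: one worker per CPU
  cluster.on("exit", function (worker, code, signal) {
    log("Worker " + worker.process.pid + " died (" + (signal || code) +
        "), forking a replacement");
    cluster.fork();
  });
  for (let i = 0; i < total; i++) {
    cluster.fork();
  }
}

// Typical wiring (left as a comment so this block only defines the helper):
//   const cluster = require("cluster");
//   if (cluster.isMaster) {      // named `isPrimary` in newer Node releases
//     superviseWorkers(cluster, 0, console.error);
//   } else {
//     require("./server.js");    // worker: serve traffic, exit on fatal errors
//   }
```

The master does no request handling itself, which keeps it simple enough to stay up while workers come and go.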
![Page 11: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/11.jpg)
SERVICE
Load-balancers
Heartbeat / ping monitors
Availability zones, etc.
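Load balancers and ping monitors need something to probe. A minimal sketch of a health-check endpoint follows, assuming a `/ping` path and a plain-text response; both are illustrative choices, not details from the talk.

```javascript
const http = require("http");

// Hypothetical health-check handler: answers GET /ping with 200 "pong".
function pingHandler(req, res) {
  if (req.method === "GET" && req.url === "/ping") {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("pong");
    return;
  }
  res.writeHead(404, { "Content-Type": "text/plain" });
  res.end("not found");
}

// Build the server without listening, so the caller chooses the port.
function createPingServer() {
  return http.createServer(pingHandler);
}
```

A deployment might start it with `createPingServer().listen(3000)` (port assumed) and point the monitor at `/ping`; a worker that has exited simply stops answering.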
![Page 12: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/12.jpg)
MAKE IT HOT
Everything up to this point should have hot failover.
![Page 13: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/13.jpg)
DATACENTER
Hot failover across datacenters?
Typically very costly
But, the real deal if you're serious
![Page 14: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/14.jpg)
DISASTER RECOVERY
"Business Continuity"
Don't let a technological problem end your business
Have a worst case, "lose some data" recovery plan
![Page 15: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/15.jpg)
2. AVOID FAILURES
Get out of the business of failover when you don't have to do it yourself.
![Page 16: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/16.jpg)
RESOURCES TO NOT SUPPORT
Don't rely on system / service resources you don't need to.
Disk: NAS, disks, SSDs.
Datastores: DB, cloud services.
... Load Balancers, DNS, etc.
![Page 17: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/17.jpg)
HOW TO AVOID
Use SaaS wherever possible! (DB, LBs, storage).
Or PaaS for some Node.js apps.
Design stateless, fungible servers (no disk risks).
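One small ingredient of a stateless, fungible server is that it reads everything it needs from its environment rather than from local disk. A hedged sketch follows; the function `loadConfig` and the variable names `PORT` and `DATABASE_URL` are assumptions for illustration.

```javascript
// Hypothetical config loader: all settings come from the environment,
// so any instance can be thrown away and replaced by a fresh one.
function loadConfig(env) {
  const port = parseInt(env.PORT || "3000", 10);
  if (!env.DATABASE_URL) {
    // Fail at startup rather than limping along half-configured.
    throw new Error("DATABASE_URL is required");
  }
  return { port: port, databaseUrl: env.DATABASE_URL };
}
```

At startup this would be called as `loadConfig(process.env)`.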
![Page 18: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/18.jpg)
3. ISOLATE FAILURES
Isolate failures you can't avoid.
![Page 19: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/19.jpg)
RESOURCES TO SUPPORT
Look to resources you must depend on:
CPU/Load: Run out of this and it's over.
HTTP: Each different host you hit.
Datastores: Connections? Different Hosts?
... also, memory, I/O, etc. and combinations thereof
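For the "each different host you hit" item, one possible approach is to give every upstream host its own `http.Agent` with an explicit socket cap, so each dependency's concurrency limit is visible and tunable on its own. This is a sketch under assumptions: `agentFor` and `requestOptions` are hypothetical helpers, and the cap of 10 is an arbitrary example.

```javascript
const http = require("http");

const agents = new Map();

// Hypothetical helper: one connection pool per upstream host, each capped.
// The cap passed on the first call for a host is the one that sticks.
function agentFor(host, maxSockets) {
  if (!agents.has(host)) {
    agents.set(host, new http.Agent({ keepAlive: true, maxSockets: maxSockets }));
  }
  return agents.get(host);
}

// Hypothetical helper: request options pinned to that host's own pool.
function requestOptions(host, path) {
  return { host: host, path: path, agent: agentFor(host, 10) };
}
```

A call such as `http.get(requestOptions("translations.example.com", "/v1/translate"), callback)` (hostname and path made up) would then queue behind that host's cap only.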
![Page 20: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/20.jpg)
SOME ANECDOTES
Node.js apps can be bad neighbors.
DB (auto-suggest) vs. HTTP (vendor translations)
DB (CRUD app) vs. CPU/Load (co-located PHP app)
Read vs. Write DB operations.
![Page 21: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/21.jpg)
HOW TO ISOLATE
Create "micro-services" that stand on their own.
Monitor for cross-pressure and respond. (Next section!)
![Page 22: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/22.jpg)
4. ANALYZE EVERYTHING
Data drives problem discovery and action.
![Page 23: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/23.jpg)
LOG, MONITOR, MINE
![Page 24: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/24.jpg)
DECISIONS, GOALS
Things to look for in Node.js apps...
Identify
Resource pressure: CPU, I/O, memory, network
Performance: Throughput, latency
Errors/Bugs: Quantitative, qualitative
Decide
Scale up, scale down?
Separate services?
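The "identify resource pressure" step can start from numbers Node already exposes. The sketch below is illustrative only: `resourceSnapshot` and `isCpuPressured` are hypothetical helpers, and the 0.8 load-per-core threshold is an arbitrary example value.

```javascript
const os = require("os");

// Hypothetical helper: one point-in-time sample of resource pressure.
function resourceSnapshot() {
  const mem = process.memoryUsage();
  return {
    timestamp: Date.now(),
    rssBytes: mem.rss,
    heapUsedBytes: mem.heapUsed,
    loadAvg1m: os.loadavg()[0],               // always 0 on Windows
    cpuCount: Math.max(1, os.cpus().length)   // guard against a zero count
  };
}

// Hypothetical helper: flag CPU pressure when the 1-minute load per core
// exceeds a threshold (defaults to the example value 0.8).
function isCpuPressured(snapshot, threshold) {
  return snapshot.loadAvg1m / snapshot.cpuCount > (threshold || 0.8);
}
```

A worker could log `JSON.stringify(resourceSnapshot())` on a timer and feed the lines into whatever log pipeline is in use; the scale up or down decision then has data behind it.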
![Page 25: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/25.jpg)
RECAP
Design for failure
Avoid
Isolate
Analyze
![Page 26: Fail the Right Way - Node.js in Production](https://reader033.vdocuments.mx/reader033/viewer/2022060120/5592a4c21a28ab5c798b463f/html5/thumbnails/26.jpg)
THANKS!
ssw2014.formidablelabs.com
@ryan_roemer formidablelabs.com