scalable distributed system architecture
Scalable Raspberry Pi Architecture
(distributed systems)
[Diagram labels: LB, a node, app, HDFS]
-Aulëkin
Goal: "Painless" Development and Maintenance
Specifically, SaaS ecosystems (Google, Facebook, Twitter)
Mindset Primer:
- Simple vs Easy, avoid complection: easy hides complexity, simple eliminates it
- UNIX Philosophy: do 1 thing well, compose through universal interfaces
- Power of Functional Programming: focus on inputs and outputs, referential transparency
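The ideas in the mindset primer (small single-purpose steps, one shared interface, outputs that depend only on inputs) can be sketched in a few lines of Python. This is only an illustration: the function names, the "list of strings" interface, and the sample log lines are all made up for the example.

```python
from functools import reduce

def only_errors(lines):
    """Keep lines that mention ERROR. Pure: the result depends only on the input."""
    return [line for line in lines if "ERROR" in line]

def strip_timestamps(lines):
    """Drop the first space-separated field (assumed here to be a timestamp)."""
    return [line.split(" ", 1)[1] for line in lines]

def pipeline(*stages):
    """Compose stages left to right; every stage takes and returns a list of strings."""
    def run(lines):
        return reduce(lambda acc, stage: stage(acc), stages, list(lines))
    return run

# Hypothetical sample input, in the spirit of `grep ERROR | cut ...`.
sample = ["12:00 INFO started", "12:01 ERROR disk full", "12:02 ERROR timeout"]
errors_without_time = pipeline(only_errors, strip_timestamps)
print(errors_without_time(sample))  # ['ERROR disk full', 'ERROR timeout']
```

Because each stage is referentially transparent, calling the pipeline twice on the same input gives the same result, and stages can be reordered or replaced without touching the others.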
Web Developer = Internet Plumber
Our Tests
Starting a New Service
Growing Existing Service
Performance Monitoring
Logging
Host Management
Decoupled consumption
Deployment
Starting a New Service
- How much work does it take to get a new service ready for users?
- How do you communicate with your team about increased cluster complexity?
Growing Existing Service
- How hard is it to add features to an existing service?
- Can you increase robustness / redundancy when unforeseen edge cases are stumbled upon?
Performance Monitoring
- How much visibility do you have into the runtime performance characteristics of your programs?
- If something is breaking, how do you diagnose + fix?
Logging
- How do you monitor what your application is actually doing?
- How do you manage logs of all user + server interactions?
Host Management
- How do you set up the program's environment?
- Configuration (DB access), env vars, libs / packages
Deployment
- How do you get your code (as a binary or as scripts) onto host machines?
Decoupled consumption
How easy is it to replace a service?
How easy is consuming a new service?
Are your services just "big objects?"
app
Growing New Service
Easy because there's 1
Growing Old Service
Monitoring
HTOP
Logging
Host Management
Deployment
Decoupled Consumption: Hardcoded
What's deployment?
apt-get install
node server.js > $(date +"%d-%m-%y-%s").log
Not there yet
N = 1
Not even there yet
Upstart + scp
Bash scripts
Growing New Service
Growing Old Service
Monitoring
Logging
Host Management
Deployment
Decoupled Consumption: Hardcoded
[Diagram: a load balancer (LB) in front of two app instances]
Git push!
HTOP always
Growing steadily
N = 4
Bash Scripts
Still a monolith
10k lines, little structure
Growing New Service
Growing Old Service
Monitoring
Logging
Host Management
Deployment
Decoupled Consumption
[Diagram: a load balancer (LB) in front of three app instances, plus HDFS]
StatsD, New Relic
Dump to Loggly + HDFS
Bash script -> Git push's
Hardcoded
N < 10
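For readers who have not used StatsD, the monitoring entry above comes down to sending tiny plain-text datagrams. The sketch below formats metrics in the StatsD line format (`bucket:value|type`) and sends them over UDP. The metric names, host, and helper names are hypothetical; port 8125 is the conventional StatsD default and is assumed here.

```python
import socket

def statsd_line(bucket, value, kind):
    """Format one metric. kind: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    if kind not in ("c", "ms", "g"):
        raise ValueError(f"unsupported metric type: {kind}")
    return f"{bucket}:{value}|{kind}"

def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP, so a missing collector never blocks the app."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))

# Hypothetical metric names for a web app.
print(statsd_line("web.requests", 1, "c"))       # web.requests:1|c
print(statsd_line("web.response_time", 320, "ms"))  # web.response_time:320|ms
```

Because the transport is UDP, instrumenting a hot code path costs very little, at the price of possibly dropping metrics under load.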
Turn an app into an image
Compose images into 'pods'
Unique & Reproducible
Generic Daemons (enables reuse)
Handle logs, monitoring
Programmatic Host Config
Idempotent (and fast)
Can also deploy
Your app is just an app
Pools servers together
Config rules are mixable
Loggly ($$$) or HDFS
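"Idempotent" host configuration, mentioned above, means a step can be run any number of times and only changes the host when something is actually missing. A minimal sketch, assuming a hypothetical helper `ensure_line` and a made-up `app.env` file (real configuration tools ship their own equivalents):

```python
import tempfile
from pathlib import Path

def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed, False if it was already correct.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already configured: do nothing (this is what makes reruns fast)
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True

# Demo in a throwaway directory; the file name and setting are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / "app.env"
    first = ensure_line(env_file, "DB_HOST=db.internal")
    second = ensure_line(env_file, "DB_HOST=db.internal")
    final_contents = env_file.read_text()

print(first, second)  # True False
```

The second call is a no-op, which is why the same configuration run can be applied safely to a fresh host and to one that was configured last week.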
Growing New Service
Growing Old Service
Monitoring
Logging
Host Management
Deployment
Decoupled Consumption
[Diagram: several load balancers (LB), each in front of three app instances, plus HDFS]
Bunch of work
Tight coupling, hard
StatsD (New Relic $$$)
Ansible / Salt
Above + Docker
DNS rules
N < 50
DNS management decouples services
Key-Value store handles configuration management
API driven private DNS rules are like a programmable service phone book
Also feature flagging, distributed coordination (kv-store semaphores)
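The three uses listed above (service phone book, feature flags, semaphores) can be illustrated with a toy in-memory key-value store. This is a single-process sketch, not the API of any real store: the class `ToyKV`, the key names, and the addresses are invented, and a real distributed store would perform the semaphore check atomically across hosts.

```python
class ToyKV:
    """In-memory stand-in for a cluster key-value store (illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def acquire(self, key, holder, limit):
        """Counting semaphore: at most `limit` holders may hold `key` at once."""
        holders = self._data.setdefault(key, set())
        if holder in holders:
            return True
        if len(holders) >= limit:
            return False
        holders.add(holder)
        return True

    def release(self, key, holder):
        self._data.get(key, set()).discard(holder)

kv = ToyKV()
# Service phone book: consumers look up a name instead of hardcoding hosts.
kv.put("service/billing", ["10.0.0.5:8080", "10.0.0.6:8080"])
# Feature flag: flipped in one place, read by every instance.
kv.put("flags/new-checkout", True)
# Semaphore: only one worker may run the migration at a time.
got_a = kv.acquire("locks/migration", "worker-a", limit=1)
got_b = kv.acquire("locks/migration", "worker-b", limit=1)
kv.release("locks/migration", "worker-a")
got_b_retry = kv.acquire("locks/migration", "worker-b", limit=1)
print(kv.get("service/billing"), got_a, got_b, got_b_retry)
```

Replacing a service then means updating one key, rather than redeploying every consumer with a new hardcoded address.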
Growing New Service
Growing Old Service
Monitoring
Logging
Host Management
Deployment
Decoupled Consumption
[Diagram: many load balancers (LB), each in front of three app instances, plus HDFS]
Balls of mud
Time consuming
StatsD
Kafka / Flume
Ansible / Salt
Above + Docker
Consul + DNS
N < 500
What were we doing again?
Plumbing the internet, moving data around
We make programs which run in parallel but share resources (global state, hosts, pod characteristics)
Manage inter-service HTTP communication, wire our cluster together
- Running code
- Managing services and their resources
- Managing communication
What does that sound like?
Need a Cluster OS
Give services resources
Run those services only when needed (upstart for apps, cron for jobs)
Orchestrate service connections
Properly isolate service processes
A Data Center's Kernel
Distributed Scheduling and resource management
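To make "scheduling and resource management" concrete, here is a deliberately simple first-fit scheduler: each task goes to the first node with enough free CPU and memory. It is a sketch of the general idea only, not how any particular cluster manager decides placement; the node sizes and task names are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpus: int
    mem_mb: int
    tasks: list = field(default_factory=list)

@dataclass
class Task:
    name: str
    cpus: int
    mem_mb: int

def schedule(tasks, nodes):
    """First-fit placement. Returns {task name: node name, or None if nothing fits}."""
    placement = {}
    for task in tasks:
        for node in nodes:
            if node.cpus >= task.cpus and node.mem_mb >= task.mem_mb:
                # Reserve the resources so later tasks see the reduced capacity.
                node.cpus -= task.cpus
                node.mem_mb -= task.mem_mb
                node.tasks.append(task.name)
                placement[task.name] = node.name
                break
        else:
            placement[task.name] = None  # no node has room; would wait for capacity
    return placement

nodes = [Node("a", cpus=2, mem_mb=2048), Node("b", cpus=4, mem_mb=8192)]
tasks = [Task("web", 1, 512), Task("db", 2, 4096),
         Task("cache", 1, 1024), Task("batch", 4, 4096)]
placement = schedule(tasks, nodes)
print(placement)  # {'web': 'a', 'db': 'b', 'cache': 'a', 'batch': None}
```

The point of the sketch is the shift in viewpoint: the developer declares what a service needs, and the cluster decides which machine provides it.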
Enter Mesos
Modeled after Google's Borg
Manages 10,000+ nodes in Twitter's data centers
+ Aurora
[Diagram: a grid of nodes, with three zk (ZooKeeper) instances and three masters]
Mesos
[Diagram: a grid of nodes with three zk (ZooKeeper) instances and three masters, running many LB-fronted groups of app instances and HDFS]
What have we gotten?
Higher level of abstraction
Deployments are tagged and reproducible
Lots of redundancy, 'Startup DevOps' is baked in
Stop worrying about individual machines!
Automatic High Availability
Constraints and Labels control what runs where
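"Constraints and labels" can be read as a simple matching rule: nodes carry key/value labels, a service states the labels it requires, and only matching nodes are eligible. A minimal sketch with invented node names and labels (real schedulers support richer operators than exact equality):

```python
def eligible_nodes(constraints, nodes):
    """Return the names of nodes whose labels satisfy every constraint."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in constraints.items())
    )

# Hypothetical cluster: labels describe hardware and location.
cluster = {
    "node-1": {"disk": "ssd", "zone": "a"},
    "node-2": {"disk": "hdd", "zone": "a"},
    "node-3": {"disk": "ssd", "zone": "b"},
}

print(eligible_nodes({"disk": "ssd"}, cluster))               # ['node-1', 'node-3']
print(eligible_nodes({"disk": "ssd", "zone": "b"}, cluster))  # ['node-3']
```

An empty constraint set matches every node, which is the default the slide argues for: only pin a service to specific machines when it genuinely needs them.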
Why not start with Mesos?
- Unproven? Except that Mesos runs Twitter, Airbnb, Netflix, TWC, PayPal, OpenTable, Groupon, Foursquare, eBay... (at least for data processing needs)
- Additional setup complexity? Compared to N < 10, perhaps. Starting with Mesos leads to less code churn down the line and less slowing down.
Why start with Mesos?
Start with Microservices, Private PaaS (Heroku)
Easy rolling deployments
Higher node utilization
Forced decoupling
Stop worrying about individual machines!
Invest in Infrastructure
Developer Happiness :) We love building things
Productivity soars when tedium is removed
Hackathon projects can be quickly scaled to prod
Let computers do the boring things - avoid human mistakes