Scale like an ant, distribute the workload - DPC, Amsterdam, 2011
DESCRIPTION
Many services and applications nowadays are ill-equipped to handle a sudden rush of popularity, as often happens on the internet, to the point where they become either unavailable or unbearably slow.
By taking a chapter from ant colonies in the wild, whose strength lies in their numbers and in the fact that everyone works together towards the same goal, we can apply the same principle to our services. By using systems such as
- gearman
- memcache
- daemons
- message queues
- load balancers
and many more, you can achieve greater performance, more redundancy and higher availability, and easily scale your services up and down as required.
During this talk attendees will be led through the world of distributed systems and scalability, and shown the how, where and what of taking the average application and splitting it into smaller, more manageable pieces.
TRANSCRIPT
Distribute the workload
Helgi Þormar Þorbjörnsson
Dutch PHP Conference, Amsterdam, 19th May 2011
Thursday, 19 May 2011
Who am I?
Helgi
VP of Engineering at Orchestra.io
Developer at PEAR
From Iceland
@h on Twitter
Why Distribute?
Budget
Efficiency
Perception
Budget
Spend wisely
Commodity servers
Low overhead, high yield
Cloud Computing (EC2)
Efficiency
10 small servers > 1 big
Venue Security
1000 people can exit quicker through 10 small doors than 1 big
Perception
Defer intensive processes
Give instant feedback
Users keep on browsing
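A minimal Python sketch of the three points above, under stated assumptions: an in-process queue.Queue stands in for a real job server, and handle_upload and resize_image are hypothetical names that do not come from the talk.

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for a real job queue (assumption)
results = {}           # finished work, keyed by job id

def resize_image(name):
    # Placeholder for an intensive process.
    return f"thumb_{name}"

def worker():
    # Runs in the background and drains the queue.
    while True:
        job_id, name = jobs.get()
        results[job_id] = resize_image(name)
        jobs.task_done()

def handle_upload(job_id, name):
    # Defer the heavy work and answer the user straight away.
    jobs.put((job_id, name))
    return "Upload received, processing in the background"

threading.Thread(target=worker, daemon=True).start()
message = handle_upload(1, "cat.png")
jobs.join()            # only so this sketch can show the finished result
print(message, results)
```

The request handler returns before the image is processed, which is what lets the user keep on browsing.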
“It all depends on how we look at things, and not how they are in themselves.”
- Carl G. Jung
Chapter from Nature
Ant Colonies
Algorithms
Scheduling
Vehicle Routing
Assignment
Sets
Other
How do ants fit?
Strength in numbers
Work together
Size benefits them
Teamwork
When faced with a problem they will solve the problem as one.
What if they were bigger?
Types of Ants
Military
Maids
Tunnel diggers
Food gatherers
How does this map to my application?
Colony = Application
Ants = Components
Ants do many different types of work to keep their colony running
Architect for Distribution
Characteristics
Decoupling
Elasticity
High Availability
Concurrency
Decoupling
[Diagram: Application box containing DB, API, Cache and FE; later builds add separate Cache, API and API boxes]
Elasticity
Cloud Computing
Load Balancing
HAProxy
Nginx
My Favourite
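To make the load balancing idea concrete, here is a toy round-robin picker in Python. It is only an illustration of one strategy that both tools above support, not their configuration or API; the backend names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hands out backends in turn; a toy model of what a load balancer does."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)

balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical hosts
picked = [balancer.pick() for _ in range(5)]
print(picked)
```

Each request goes to the next server in the list, so no single machine takes the whole rush.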
Monitoring
When do I need more servers?
Needs to be around from the start!
Keep records
Spot trends
Different types
Hardware Performance
Software Performance
Availability
Resourcing
Applications
New Relic
CloudKick
ScoutApp
Nagios
Cacti
Circonus
Automation
Want to sleep easy at night?
Go out partying without worrying about getting a phone call?
Plug into your monitoring
Bringing together Monitoring and Elastic behaviour into one beautiful whole!
Add some intelligence to add / remove servers as needed based on current information.
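One way that "intelligence" could look, as a hedged Python sketch. The thresholds, the one-step-at-a-time policy and the function name desired_servers are all assumptions made for illustration; the talk does not prescribe a specific rule.

```python
def desired_servers(current, cpu_samples, low=0.30, high=0.75,
                    min_servers=2, max_servers=20):
    """Return the server count to aim for, given recent CPU load (0.0-1.0)."""
    if not cpu_samples:
        return current                      # no data, change nothing
    average = sum(cpu_samples) / len(cpu_samples)
    if average > high:
        target = current + 1                # scale up one step at a time
    elif average < low:
        target = current - 1                # scale down one step at a time
    else:
        target = current
    # Never go below the floor or above the ceiling.
    return max(min_servers, min(max_servers, target))

busy = desired_servers(4, [0.9, 0.8, 0.85])
idle = desired_servers(4, [0.1, 0.2, 0.15])
floor = desired_servers(2, [0.1, 0.1])
print(busy, idle, floor)
```

The hard floor and ceiling are the simplest guard against the automation running away with itself.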
This is why good monitoring is essential, or this wouldn’t be possible
Just make sure it doesn’t turn into...
Skynet!!
High Availability
Get a highly available and resilient setup by following a few of these recommendations
Remember, even Google has outages
Benefits
Easy management
Ability to stop / start servers quickly
Responsibilities are separate
Quickly move to a new cluster
Reduced risk
What to avoid
Local Sessions
Solution: Store sessions in DB / Memcache
Local Memory
Solution: Networked Memcache
Local Files
Local Uploads
Writing to /tmp
Solution: Store on S3 or a networked FS
Solution: Serve up static files from CDNs
Servers can vanish at any given time
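A small Python sketch of why local state hurts, using sessions as the example. SharedSessionStore is an in-memory stand-in for a networked store such as a DB or Memcache, and all class and method names are hypothetical.

```python
class SharedSessionStore:
    """Stand-in for a networked store such as a DB or Memcache (assumption)."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)

class WebServer:
    def __init__(self, name, store):
        self.name = name
        self.store = store          # no session state kept on the server itself

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]

store = SharedSessionStore()
web1, web2 = WebServer("web1", store), WebServer("web2", store)
web1.add_to_cart("abc", "book")
del web1                            # the first server vanishes
cart = web2.add_to_cart("abc", "pen")
print(cart)
```

Because the session lives outside the web servers, the second server picks up where the first left off.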
Internal APIs
[Diagram: Application -> Internal Storage API -> S3, GFS, FS]
[Diagram: Application -> Internal DB API -> MySQL, Mongo, Cache]
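A hedged Python sketch of an internal storage API: the application talks to one interface and the backend behind it can be swapped. The Storage interface, both backends and save_avatar are hypothetical; MemoryStorage merely stands in for a remote backend such as S3.

```python
from abc import ABC, abstractmethod
import os
import tempfile

class Storage(ABC):
    """The internal API the application codes against."""

    @abstractmethod
    def put(self, key, data): ...

    @abstractmethod
    def get(self, key): ...

class MemoryStorage(Storage):
    """In-memory stand-in for a remote backend such as S3 (assumption)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

class FileStorage(Storage):
    """Backend that writes to a directory, e.g. a networked FS mount."""

    def __init__(self, root):
        self._root = root

    def put(self, key, data):
        with open(os.path.join(self._root, key), "wb") as fh:
            fh.write(data)

    def get(self, key):
        with open(os.path.join(self._root, key), "rb") as fh:
            return fh.read()

def save_avatar(storage, user_id, data):
    # Application code only knows about the Storage interface.
    storage.put(f"avatar-{user_id}", data)

fetched = []
for backend in (MemoryStorage(), FileStorage(tempfile.mkdtemp())):
    save_avatar(backend, 42, b"png-bytes")
    fetched.append(backend.get("avatar-42"))
print(fetched)
```

The same application function works unchanged against either backend, which is the point of putting an internal API in between.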
SOA
Service Oriented Architecture
Sort of :-)
Eventually Consistent
CAP Theorem
Consistency
Availability
Partition Tolerance
Consistency: All nodes see the same data at the same time
Availability: Node failures do not prevent survivors from continuing to operate
Partition Tolerance: The system continues to operate despite arbitrary message loss
Queue Systems
Good for
Image Processing
Distributed Logs
Data Mining
Mass Emails
Intensive transformation
Search
Common Tools
Gearman
Hadoop
Zero MQ (0MQ)
RabbitMQ
And many others!
Gearman
[Diagram: Gearman stack, top to bottom]
Your Client Code (your app)
Gearman Client API (C, PHP, Perl, MySQL UDF, ...)
Gearman Job Server (gearmand)
Gearman Worker API (C, PHP, Perl, Python, ...)
Your Worker Code (your app)
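The client / job server / worker split can be sketched in a few lines of Python. This is a toy in-process stand-in built on the standard library, not the Gearman API: JobServer, submit and work are hypothetical names chosen for illustration.

```python
import queue
import threading

class JobServer:
    """Toy in-process stand-in for gearmand; not the Gearman API."""

    def __init__(self):
        self._queues = {}

    def _queue_for(self, function_name):
        return self._queues.setdefault(function_name, queue.Queue())

    def submit(self, function_name, payload):
        # Client side: queue a job and get a handle to wait on.
        reply = queue.Queue(maxsize=1)
        self._queue_for(function_name).put((payload, reply))
        return reply

    def work(self, function_name, callback):
        # Worker side: take jobs for one function name, forever.
        jobs = self._queue_for(function_name)
        while True:
            payload, reply = jobs.get()
            reply.put(callback(payload))

server = JobServer()

# Your worker code
threading.Thread(target=server.work, args=("reverse", lambda s: s[::-1]),
                 daemon=True).start()

# Your client code
handle = server.submit("reverse", "Hello DPC")
answer = handle.get(timeout=5)
print(answer)
```

Client and worker never call each other directly; they only agree on a function name and a payload, so either side can be scaled or replaced on its own.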
A Story!
Financial Software
3000+ Clients
Each one has 5 external data sources
Each data source is a web service
Ran every 6 hours every day
[Diagram labels: Cron, Job 1, Gearman, tasks 1-5, Web Services, Processing]
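The numbers in this story add up quickly. The Python sketch below works out the volume and shows one job fanning out into one task per data source. A thread pool stands in for the Gearman workers, and fetch and run_for_client are hypothetical names; the slides do not show the real code.

```python
from concurrent.futures import ThreadPoolExecutor

CLIENTS = 3000          # "3000+" on the slide, so a lower bound
SOURCES_PER_CLIENT = 5
RUNS_PER_DAY = 24 // 6  # one run every 6 hours

calls_per_run = CLIENTS * SOURCES_PER_CLIENT
calls_per_day = calls_per_run * RUNS_PER_DAY
print(calls_per_run, calls_per_day)

def fetch(client_id, source_id):
    # Placeholder for calling one external web service.
    return (client_id, source_id, "ok")

def run_for_client(pool, client_id):
    # One job per client fans out into one task per data source.
    tasks = [pool.submit(fetch, client_id, s)
             for s in range(1, SOURCES_PER_CLIENT + 1)]
    return [t.result() for t in tasks]

with ThreadPoolExecutor(max_workers=5) as pool:
    results = run_for_client(pool, client_id=1)
print(results)
```

That is at least 15,000 web service calls per run and 60,000 per day, which is why the calls are worth spreading over several workers.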
But! That wasn’t enough
Job kicked off on login
Another Story!
CloudSplit
Near Real Time Cloud Analytics
Clients install logging agent locally
syslogd
Public API
Multiple Persistent Gearman Servers
Internal DB API
[Diagram: Agent -> syslogd -> API (load balanced) -> Gearman x2 (persistent) -> Worker x3 -> Internal API (load balanced) -> CouchDB]
CouchDB Setup
Write vs Read
Writes
Multi Master setup
Replicated
Deals with writes only
Reads
Multi Master setup
Replicated from write cluster
Slaves handle website requests
Heavy Map/Reduce usage for data
Supervisord
phpadvent.org/2009/daemonize-your-php-by-sean-coates
Map/Reduce
Map
Master gets a problem to solve
Breaks into multiple sub-problems
Distributed to multiple workers
A worker can take the same steps
Answer passed back to Master
Reduce
Takes in answers from the map workers
Combines together to get an answer
There can be multiple reducers
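The Map and Reduce steps above can be sketched with a word count in Python. A thread pool stands in for the workers, and the function names and sample lines are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_problem(lines, parts):
    # Master: break the problem into sub-problems.
    return [lines[i::parts] for i in range(parts)]

def map_count(chunk):
    # Worker: solve one sub-problem and hand the answer back.
    return Counter(word for line in chunk for word in line.split())

def reduce_counts(partials):
    # Reducer: combine the partial answers into one.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

lines = ["ants work together", "ants scale", "work work work"]
with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(map_count, split_problem(lines, 3)))
totals = reduce_counts(partials)
print(totals["ants"], totals["work"])
```

Each worker only ever sees its own chunk, so adding more workers needs no change to the map or reduce functions.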
Process petabytes of data in a few hours on a commodity server farm
CouchDB
Highly Concurrent
Schema free, document based
RESTful API
Map/Reduce Views
Easy Replication
Hadoop
Hadoop is a framework for running applications on large clusters of commodity hardware.
The Hadoop framework transparently provides applications both reliability and data motion
Uses Map/Reduce concept to farm out work
Distributed FS to handle node failure automagically
Join 2 datasets of a significant size together
500 GB worth of log files with a large location dataset
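The slides do not say how this join was written. As an assumption, the Python sketch below shows a reduce-side join, one common way to join two datasets with Map/Reduce. The join key (an IP address) and the sample records are invented for illustration.

```python
from collections import defaultdict

log_entries = [("10.0.0.1", "/home"), ("10.0.0.2", "/buy"), ("10.0.0.1", "/buy")]
locations = [("10.0.0.1", "Amsterdam"), ("10.0.0.2", "Reykjavik")]

def map_phase():
    # Tag every record with its source and emit it under the join key.
    for ip, path in log_entries:
        yield ip, ("log", path)
    for ip, city in locations:
        yield ip, ("loc", city)

def shuffle(pairs):
    # Group records by key, as the framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # For each key, pair every log record with every location record.
    for ip, values in sorted(groups.items()):
        cities = [v for tag, v in values if tag == "loc"]
        paths = [v for tag, v in values if tag == "log"]
        for city in cities:
            for path in paths:
                yield ip, path, city

joined = list(reduce_phase(shuffle(map_phase())))
print(joined)
```

Because records are grouped by key before the reduce step, each reducer only needs the records for its own keys, never the whole of either dataset.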
ØMQ
Async Message System
Thin and lightweight
High Performance
Simple
Scalable
One socket can load balance to multiple end points
Multiple end points can be funnelled into a single socket
Handle deployments to multiple servers
Scale is an example of that
Mongrel2 is a web server that uses it in a similar way to FastCGI
Move around text (JSON) and binary data for real time communication
Could have replaced syslogd and the external API in my previous example
Code time? :-)
Questions?
Joind.in: http://joind.in/3212