node.js, javascript and the future
TRANSCRIPT
javascript & the future
Jeff Miccolis, Development Seed
Open Atrium, Features, Context, Strongarm
Jeff Miccolis, MapBox
First, a brief apology.
Three “war” stories about things that are hard with php & drupal
...and easier in node.js
Two lessons I learned
...from Drupal and node.js work
SERVER SIDE JAVASCRIPT
V8
CHROME'S JAVASCRIPT ENGINE
ON YOUR SERVER
1. Sending lots of email
Open Atrium
Single emails, digest emails
SMS, XMPP (Jabber), Twitter, etc, etc, etc...
d.o/project/Messaging
channel independent messages
don’t send e-mails to users, send them 'messages'
delivered by mail, IM, SMS, etc...
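A minimal sketch of that idea in node.js terms (not the Messaging module's actual API); the channel senders below are placeholders:

// One "message", many delivery channels; pick the sender per user preference.
// These channel functions are placeholders, not a real library's API.
var channels = {
  mail: function(user, msg, done) { /* hand off to an SMTP library */ done(null); },
  sms:  function(user, msg, done) { /* hand off to an SMS gateway */ done(null); },
  xmpp: function(user, msg, done) { /* hand off to an XMPP client */ done(null); }
};

function sendMessage(user, msg, done) {
  var send = channels[user.channel] || channels.mail;
  send(user, msg, done);
}

sendMessage({ name: 'alice', channel: 'sms' },
            { subject: 'New opportunity', body: '...' },
            function(err) { if (err) console.error(err); });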
Problems
Problems
Send an email to one user - 50 ms
28 users - 1.4 seconds
600 users - 30 seconds
Problems
Run cron every 5 minutes
and you can send a lot more emails
but if one run doesn’t complete...
<?php
$addresses = array('[email protected]', '[email protected]', '[email protected]');
$message = 'I am writing to inform you of a...';
$count = 0;
foreach ($addresses as $sucker) {
  if (mail($sucker, 'New opportunity', $message)) {
    $count++;
  }
}
echo "$count messages sent";
var mail = require('mail');
var addresses = ['[email protected]', '[email protected]', '[email protected]'];
var message = 'I am writing to inform you of a...';
var count = 0;
addresses.forEach(function(sucker, i) {
  mail(sucker, 'New opportunity', message, function(err) {
    if (!err) { count++; }
    if (i == addresses.length - 1) {
      console.log(count + " messages sent");
    }
  });
});
// Attempt to send email.
mail(sucker, 'New opportunity', message, function(err) {
  if (!err) { count++; }
});
// Don't wait, get the next address NOW.
console.log('next');
// Attempt to send email.
if (mail($sucker, 'New opportunity', $message)) {
  $count++;
}
// Once email is sent, get the next address.
echo 'next';
Send an email ~ 5 ms processing, 45 ms waiting
28 emails ~ 140 ms to process, done 45 ms later.
600 users ~ 3 seconds to process, done 45 ms later.
Stop waiting around
Stop waiting around
600 emails w/PHP: ~30 seconds
600 emails w/node.js: ~3 seconds
Stop waiting around
DEMO
2. Handling feeds
The task
Fetch RSS feeds, fetch original article
Tag items (Calais), tag items (watchlist), tag items (Wikipedia)
Geocode items
The scale
Thousands of feeds.
Millions of items.
Gigs and Gigs of data.
Problems
cron.php
maggie
maggied
multi-threaded python daemon
maggied
4 “retriever” workers get batches of 50 items.
they fetch/tag/geocode each item.
maggied
retrieve original story: 300ms
tag: 100ms
geocode: 150ms
TOTAL: 550ms
Nearly all of that 550ms is spent idle.
Stop waiting around
...but we could run “packs” of retrievers!
Is that really the best idea?
Replace the retrievers with a single HYPERACTIVE SQUID!
The squid runs up and down as fast as it can dealing with each item in turn.
It fires off any long running I/O operations and then moves on to the next item.
When the I/O operation reports progress, it does a little more work on behalf of the corresponding item.
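A minimal sketch of that pattern, assuming hypothetical tag() and geocode() wrappers around the tagging and geocoding services, and an items array already pulled from the feeds (only the request module is real):

var request = require('request');

function processItem(item, done) {
  // Fire off the long-running I/O and move on; the event loop calls back later.
  request(item.url, function(err, resp, body) {
    if (err) return done(err);
    tag(body, function(err, tags) {            // hypothetical tagging call
      if (err) return done(err);
      geocode(body, function(err, location) {  // hypothetical geocoding call
        if (err) return done(err);
        done(null, { url: item.url, tags: tags, location: location });
      });
    });
  });
}

// Start every item at once; while one waits on the network,
// the single process keeps working on the others.
items.forEach(function(item) {
  processItem(item, function(err, result) {
    if (!err) console.log('done:', result.url);
  });
});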
100% async
Event loop
100% async
Anything that leaves V8 takes a callback.
100% async
filesystem, network, stdio, timers, child processes
var request = require('request');
request('http://example.com', function (err, resp, body) {
  if (!err && resp.statusCode == 200) {
    console.log(body);
  }
});
var fs = require('fs');
fs.readFile('/etc/passwd', function (err, data) {
  if (err) throw err;
  console.log(data);
});
100% async
Limiting factors change:
how many open sockets are you allowed? how much bandwidth can you grab?
how fast can you issue requests?
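One way to choose that limit yourself is a small in-flight cap; this sketch is not from the talk, just an illustration built on the request module:

var request = require('request');

// Fetch many URLs, but keep at most `limit` requests open at once.
function fetchAll(urls, limit, each) {
  var next = 0, active = 0;
  function startOne(url) {
    active++;
    request(url, function(err, resp, body) {
      each(err, url, body);
      active--;
      fill(); // a slot freed up, start the next request
    });
  }
  function fill() {
    while (active < limit && next < urls.length) {
      startOne(urls[next++]);
    }
  }
  fill();
}

fetchAll(['http://example.com/a', 'http://example.com/b'], 10, function(err, url, body) {
  if (!err) console.log(url, 'fetched', body.length, 'bytes');
});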
3. Big files, long sessions.
What’s big?
gigabyte
What’s long?
hours
What’s a session?
HTTP
A category of stories...
Big uploads in php
upload_max_filesize
post_max_size
max_input_time
max_execution_time
Big uploads in php
this approach caps out ~500mb
Problems
Opens the door to DOS.
Tolerates application bloat.
Problems in production can get really bad.
Never gonna get gigabyte uploads.
Problems
PHP makes you look elsewhere.
If only we could stream...
If only we could stream...
Streaming
Deal with things bucket by bucket
and you don’t need all that memory.
Streaming
write that file to disk as it comes in
...or stream it off to S3!
var formidable = require('formidable');
var http = require('http');

http.createServer(function(req, res) {
  if (req.url == '/upload' && req.method == 'POST') {
    var form = new formidable.IncomingForm();
    form.parse(req);
    form.onPart = function(part) {
      part.addListener('data', function(chunk) {
        // Do cool stuff, like streaming!
        console.log(chunk.toString('utf8'));
      });
    };
  }
}).listen(80);
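Building on the formidable example above, a sketch of the "write that file to disk as it comes in" idea; the destination path is just an illustration:

var fs = require('fs');

form.onPart = function(part) {
  var out = fs.createWriteStream('/tmp/' + (part.filename || 'upload'));
  part.addListener('data', function(chunk) {
    out.write(chunk); // each bucket goes straight to disk, never held in memory
  });
  part.addListener('end', function() {
    out.end();
  });
};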
_changes
_changes
A persistent connection to your database, which feeds you new data when it has some.
_changes
it’s amazing
MapBox uploads
1. Map uploads to S3
2. Save a record to CouchDB
3. “Downloader” listens for new maps
MapBox uploads
node.js process
very long-lived HTTP connection to CouchDB
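A minimal sketch of such a downloader, assuming a CouchDB database named "maps" on localhost; the continuous _changes feed keeps a single HTTP connection open and sends one line of JSON per change:

var request = require('request');

var feed = request('http://localhost:5984/maps/_changes?feed=continuous&include_docs=true');
var buffer = '';

feed.on('data', function(chunk) {
  buffer += chunk.toString('utf8');
  var lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk
  lines.forEach(function(line) {
    if (!line) return; // the continuous feed sends blank keep-alive lines
    var change = JSON.parse(line);
    console.log('new or updated map:', change.id);
  });
});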
Everything is different
non-blocking I/O & single event loop
4. Package Management
drush make
Package management for Drupal
drush make
d.o - project namespace
d.o - inclusive project policy
drush make
a way to stay sane.
drush make
...and is part of drush proper now!
drush make
But I’d been using Drupal for YEARS by then
pear
pear
PHP Extension and Application Repository
pear
high threshold for new projects
imagine if pear was...
wildly inclusive
awesomely useful
awesomely successful
npm
wildly inclusive
awesomely useful
awesomely successful
npm
node package manager
packages
pear: 584
d.o: 15,296 (~3,600 for D7)
npm: 7,976 (2 years old!)
npm - package.json
{ "author": "Jeff Miccolis <[email protected]>", "name": "portal", "description": "Data Catalog", "version": "0.0.0", "engines": { "node": "~v0.6.6" }, "dependencies": { "couchapp": "https://github.com/.../attachment_operators", "underscore": "= 1.2.0", "request": "= 2.1.1" }}
npm
you’ll love it.
5. Nice hammer
Nice hammer...
“Javascript, really?”
Nice hammer...
“Clearly he’s overly excited about this async stuff”
Nice hammer...
“...and thinks it’ll work for everything.”
“Do it with Drupal”, eh?
node.js is bad for...
computationally heavy tasks, databases
node.js is awesome for...
interacting with other services.
services like...
databases, mail servers, web services, web clients...
Other people’s words
http://substack.net/posts/b96642
http://blog.nelhage.com/2012/03/why-node-js-is-cool/
Limited surface area
“The primary focus of most node modules is on using, not extending... A big part of what makes node modules so great is how they tend to have really obvious entry points as a consequence of focusing on usability and limited surface area”
Callback Austerity
“Instead of the http server being an external service that we configure to run our code, it becomes just another tool in our arsenal”
Async by default
"The upshot of this pressure is that, since essentially every node.js library works this way,
you can pick and choose arbitrary node.js libraries and combine them in the same
program, without even having to think about the fact that you’re doing so."
Try it!
If nothing else you’ll get better at javascript.
Try it!
But I bet you’ll like it.
Thanks!
Photo credit due to these wonderful people who offer their photos on flickr under a creative commons license:
Spam - http://www.flickr.com/photos/olivabassa/
Cows - http://www.flickr.com/photos/lynndombrowski/
Dogs - http://www.flickr.com/photos/photosightfaces/
Squid - http://www.flickr.com/photos/laughingsquid/
Ants - http://www.flickr.com/photos/fuzzcat/
Stream - http://www.flickr.com/photos/universalpops/
Boxes - http://www.flickr.com/photos/sillydog/
Pear - http://www.flickr.com/photos/reebob/
Hammer - http://www.flickr.com/photos/kefraya
What did you think? Locate this session on the DrupalCon Denver website
http://denver2012.drupal.org/program
Click the “Take the Survey” link.
Thank You!