Cowboy development with Django
DESCRIPTION
Keynote for DjangoCon 2009, presented on the 8th of September 2009. Covers two cowboy projects - WildLifeNearYou.com and MP expenses - and talks about ways of "reining in the cowboy" and developing in a more sustainable way.
TRANSCRIPT
Cowboy development with Django
Simon Willison
DjangoCon 2009
http://www.youtube.com/watch?v=nZx9sNXv9h0
Just one problem... we didn’t have cowboys in England
The Napoleonic Wars
A Napoleonic Sea Fort
http://en.wikipedia.org/wiki/File:Alderney_-_Fort_Clonque_02.jpg
Super Evil Dev Fort
http://www.anotherurl.com/travel/fort_clonque/handbook.htm
Photos by Cindy Li
http://www.flickr.com/photos/cindyli/sets/72157610369683426/
WildLifeNearYou.com (Built in 1 week and 10 months)
DEMO
Search uses the geospatial branch of Xapian
Species database comes from Freebase
Photos can be imported from Flickr
“Suggest changes” to our Zoo information uses model objects representing proposed changes to other model objects
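The "model objects representing proposed changes" idea can be sketched roughly as follows. This is an illustration only, not the WildLifeNearYou code: it uses plain dataclasses instead of Django models so that it runs on its own, and the names Zoo, ProposedChange, approve and reject are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Zoo:
    # Hypothetical stand-in for a Django model; the fields are assumptions.
    name: str
    url: str = ""


@dataclass
class ProposedChange:
    # One suggested edit: the object to change, the attribute, the new value.
    target: Any
    field_name: str
    new_value: Any
    status: str = "pending"  # "pending", "approved" or "rejected"

    def approve(self) -> None:
        # The target is only modified once a moderator accepts the suggestion.
        if self.status != "pending":
            raise ValueError("change has already been %s" % self.status)
        setattr(self.target, self.field_name, self.new_value)
        self.status = "approved"

    def reject(self) -> None:
        # Rejecting records the decision and leaves the target untouched.
        if self.status != "pending":
            raise ValueError("change has already been %s" % self.status)
        self.status = "rejected"


zoo = Zoo(name="Example Zoo")
accepted = ProposedChange(zoo, "url", "http://example.com/zoo")
declined = ProposedChange(zoo, "name", "Exampel Zoo")

accepted.approve()
declined.reject()
```

Because each suggestion is stored as its own object, it can be queued, reviewed and audited before anything touches the live record.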
What is /dev/fort?
Imagine a place of no distractions, no IM, no Twitter - in fact, no internet. Within, a group of a dozen or more developers, designers, thinkers and doers. And a lot of food.
Now imagine that place is a fort.
The idea behind /dev/fort is to throw a group of people together, cut them off from the rest of the world, and
/dev/fort
Cohort 3: Winter 2009
The trip
The third /dev/fort will run from 9th to 16th November on the Kintyre Peninsula in Scotland.
Cohort 2: Summer 2009
The trip
The second /dev/fort ran from 30th May to 6th June 2009 at Knockbrex Castle in Scotland. As with the first cohort, we have a few remaining problems still to iron out (thorny issues inside Django we were hoping to avoid, that sort of thing). We hope to have the site in alpha by the end of the summer.
Cohort members
Ryan Alexander, Steven Anderson, James Aylett, Hannah Donovan, Natalie Downe, Mark Norman Francis, Matthew Hasler, Steve Marshall, Richard Pope, Gareth Rushgrove, Simon Willison.
Cohort 1: Winter 2008
http://devfort.com/
Cowboy development at work
MP expenses
Heather Brooke
January 2005: The FOI request
February 2008: The Information Tribunal
“Transparency will damage democracy”
January 2009: The exemption law
March 2009: The mole
“All of the receipts of 650-odd MPs, redacted and unredacted, are for sale at a price of £300,000, so I am told. The price is going up because of the interest in the subject.”
Sir Stuart Bell, MP
Newsnight, 30th March
8th May, 2009: The Daily Telegraph
At the Guardian...
April: “Expenses are due out in a couple of months, is there anything we can do?”
June: “Expenses have been bumped forward, they’re out next week!”
Thursday 11th June: The proof-of-concept
Monday 15th June: The tentative go-ahead
Tuesday 16th June: Designer + client-side engineer
Wednesday 17th June: Operations engineer
Thursday 18th June: Launch day!
How we built it
$ convert Frank_Comm.pdf pages.png
Frictionless registration
Page filters
page_filters = (
    # Maps name of filter to dictionary of kwargs to doc.pages.filter()
    ('reviewed', {
        'votes__isnull': False
    }),
    ('unreviewed', {
        'votes__isnull': True
    }),
    ('with line items', {
        'line_items__isnull': False
    }),
    ('interesting', {
        'votes__interestingvote__status': 'yes'
    }),
    ('interesting but known', {
        'votes__interestingvote__status': 'known'
    ...
)
page_filters_lookup = dict(page_filters)
pages = doc.pages.all()
if page_filter:
    # Look up the kwargs for the requested filter; unknown names are a 404
    kwargs = page_filters_lookup.get(page_filter)
    if kwargs is None:
        raise Http404('Invalid page filter: %s' % page_filter)
    pages = pages.filter(**kwargs).distinct()

# Build the filters
filters = []
for name, kwargs in page_filters:
    filters.append({
        'name': name,
        'count': doc.pages.filter(**kwargs).distinct().count(),
    })
Matching names
http://github.com/simonw/datamatcher
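As a rough illustration of the name-matching problem (scraped names rarely line up exactly with the names already in the database), here is a minimal sketch using the standard library's difflib. It is not taken from datamatcher and makes no claim about how that project works; the function names, the 0.8 cutoff and the sample names are all assumptions.

```python
import difflib


def normalise(name):
    # Lower-case, drop punctuation and collapse runs of whitespace.
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def match_names(scraped, known, cutoff=0.8):
    """Map each scraped name to the closest known name, or None if none is close."""
    lookup = dict((normalise(k), k) for k in known)
    matches = {}
    for name in scraped:
        candidates = difflib.get_close_matches(
            normalise(name), list(lookup), n=1, cutoff=cutoff
        )
        matches[name] = lookup[candidates[0]] if candidates else None
    return matches


# Made-up names, purely for illustration.
known_names = ["Jane Smith", "John Doe"]
scraped_names = ["Ms Jane Smith", "john  doe", "Zed Quux"]
matches = match_names(scraped_names, known_names)
```

Names that fall below the cutoff come back as None, which leaves them for a human to match by hand.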
On the day
def get_mp_pages():
    "Returns list of (mp-name, mp-page-url) tuples"
    soup = Soup(urllib.urlopen(INDEX_URL))
    mp_links = []
    for link in soup.findAll('a'):
        if link.get('title', '').endswith("'s allowances"):
            mp_links.append(
                (link['title'].replace("'s allowances", ''), link['href'])
            )
    return mp_links

def get_pdfs(mp_url):
    "Returns list of (description, years, pdf-url, size) tuples"
    soup = Soup(urllib.urlopen(mp_url))
    pdfs = []
    trs = soup.findAll('tr')[1:]  # Skip the first, it's the table header
    for tr in trs:
        name_td, year_td, pdf_td = tr.findAll('td')
        name = name_td.string
        year = year_td.string
        pdf_url = pdf_td.find('a')['href']
        size = pdf_td.find('a').contents[-1].replace('(', '').replace(')', '')
        pdfs.append(
            (name, year, pdf_url, size)
        )
    return pdfs
“Drop Everything”
Photoshop + AppleScript vs. Java + IntelliJ
Images on our docroot (S3 upload was taking too long)
Blitz QA
Launch! (on EC2)
Crash #1: more Apache children than MySQL connections
unreviewed_count = Page.objects.filter(
    votes__isnull=True
).distinct().count()

SELECT COUNT(DISTINCT `expenses_page`.`id`)
FROM `expenses_page`
LEFT OUTER JOIN `expenses_vote`
    ON (`expenses_page`.`id` = `expenses_vote`.`page_id`)
WHERE `expenses_vote`.`id` IS NULL
unreviewed_count = cache.get('homepage:unreviewed_count')
if unreviewed_count is None:
    # Cache miss: run the expensive query, then keep the result for 60 seconds
    unreviewed_count = Page.objects.filter(
        votes__isnull=True
    ).distinct().count()
    cache.set('homepage:unreviewed_count', unreviewed_count, 60)
With 70,000 pages and a LOT of votes...
DB takes up 135% of CPU
Cache the count in memcached...
DB drops to 35% of CPU
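The effect of the 60-second cache can be shown without Django or memcached. The sketch below is an assumption-laden stand-in: TTLCache is a toy in-process cache, cached_count and slow_count are hypothetical names, and 1234 is an invented value. It illustrates that the expensive query runs once per expiry window, however many requests arrive.

```python
import time


class TTLCache(object):
    """Toy in-process stand-in for memcached: get/set with an expiry in seconds."""

    def __init__(self, clock=time.time):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._data[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._data[key] = (value, self._clock() + timeout)


def cached_count(cache, key, compute, timeout=60):
    # Cache-aside: try the cache first, fall back to the expensive call.
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def slow_count():
    # Stands in for the COUNT(DISTINCT ...) query; 1234 is an invented value.
    calls.append(1)
    return 1234


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(clock=lambda: now[0])

first = cached_count(cache, "homepage:unreviewed_count", slow_count)
second = cached_count(cache, "homepage:unreviewed_count", slow_count)
now[0] = 61.0  # move past the 60 second timeout
third = cached_count(cache, "homepage:unreviewed_count", slow_count)
```

The trade-off is that the homepage count can be up to a minute out of date.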
unreviewed_count = Page.objects.filter(
    votes__isnull=True
).distinct().count()

reviewed_count = Page.objects.filter(
    votes__isnull=False
).distinct().count()

unreviewed_count = Page.objects.filter(
    is_reviewed=False
).count()
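The last query depends on an is_reviewed flag stored on the page itself, so the count no longer needs a join against the votes table. The slides do not show how the flag is kept up to date; the sketch below shows one possible approach in plain Python (not Django), where a hypothetical add_vote method sets the flag whenever a vote is recorded.

```python
class Page(object):
    def __init__(self, number):
        self.number = number
        self.votes = []
        # Denormalised copy of "this page has at least one vote".
        self.is_reviewed = False

    def add_vote(self, vote):
        # Hypothetical write path: update the flag at the moment a vote arrives,
        # so that reads never have to inspect the votes themselves.
        self.votes.append(vote)
        self.is_reviewed = True


pages = [Page(n) for n in range(5)]
pages[0].add_vote("interesting")
pages[3].add_vote("not interesting")
pages[3].add_vote("interesting")

# Cheap read: a single flag check per page.
unreviewed_count = sum(1 for p in pages if not p.is_reviewed)

# The slow equivalent, derived from the votes, should always agree.
derived_count = sum(1 for p in pages if len(p.votes) == 0)
```

The cost moves from every read to every write, which suits a homepage that is read far more often than pages are voted on.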
Migrating to InnoDB on a separate server
ssh mps-live "mysqldump mp_expenses" |
  sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' |
  sed 's/CHARSET=latin1/CHARSET=utf8/g' |
  ssh mysql-big "mysql -u root mp_expenses"
Reining in the cowboy
An RSS to JSON proxy service
Pair programming
Comprehensive unit tests, with mocks
Continuous integration (TeamCity)
Deployment scripts against CI build numbers
Reining in the cowboy
Points of embarrassment
Database required to run the test suite
Logging? What logging?
Tests get deployed alongside the code (!)
... but generally pretty smooth sailing
A final thought
Web development in 2005
Relational database
Cache
Application
Admin tools
Templates
XML feeds
Web development in 2009
Relational database
Cache
Application
Admin tools
Templates
XML feeds
Datastructure servers
Search index
External web services
Monitoring and reporting
API
Webhooks
Message queue
Offline workers
Non-relational database
Thank you