Cowboy development with Django

Simon Willison, DjangoCon 2009

http://www.youtube.com/watch?v=nZx9sNXv9h0



Just one problem... we didn’t have cowboys in England

The Napoleonic Wars

A Napoleonic Sea Fort

http://en.wikipedia.org/wiki/File:Alderney_-_Fort_Clonque_02.jpg



Super Evil Dev Fort

http://www.anotherurl.com/travel/fort_clonque/handbook.htm



Photos by Cindy Li

http://www.flickr.com/photos/cindyli/sets/72157610369683426/



WildLifeNearYou.com (Built in 1 week and 10 months)

Search uses the geospatial branch of Xapian

Species database comes from Freebase

Photos can be imported from Flickr

“Suggest changes” to our Zoo information uses model objects representing proposed changes to other model objects
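One way to picture "model objects representing proposed changes to other model objects" is a record that names the target object, the field and the suggested value, and that only touches the target once it is approved. The sketch below uses plain Python dataclasses rather than Django models. The names `Zoo`, `ProposedChange`, `approve` and `reject` are hypothetical and are not taken from the WildLifeNearYou code.

```python
from dataclasses import dataclass


@dataclass
class Zoo:
    # Stand-in for a real model; the fields are illustrative only.
    name: str
    opening_hours: str


@dataclass
class ProposedChange:
    # A suggestion is stored as data and does not touch the target
    # until someone approves it.
    target: Zoo
    field_name: str
    new_value: str
    status: str = "pending"  # pending -> approved or rejected

    def approve(self) -> None:
        # Copy the suggested value onto the target and record the decision.
        setattr(self.target, self.field_name, self.new_value)
        self.status = "approved"

    def reject(self) -> None:
        # Leave the target untouched.
        self.status = "rejected"


zoo = Zoo(name="Example Zoo", opening_hours="10:00-17:00")
change = ProposedChange(zoo, "opening_hours", "09:00-18:00")
```

Calling `change.approve()` would then update `zoo.opening_hours`, while `change.reject()` leaves the zoo as it was.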

What is /dev/fort?

Imagine a place of no distractions, no IM, no Twitter, in fact, no internet. Within, a group of a dozen or more developers, designers, thinkers and doers. And a lot of food.

Now imagine that place is a fort.

The idea behind /dev/fort is to throw a group of people together, cut them off from the rest of the world, and

/dev/fort

Cohort 3: Winter 2009

The trip

The third /dev/fort will run from 9th to 16th November on the Kintyre Peninsula in Scotland.

Cohort 2: Summer 2009

The trip

The second /dev/fort ran from 30th May to 6th June 2009 at Knockbrex Castle in Scotland. As with the first cohort, we have a few remaining problems still to iron out (thorny issues inside Django we were hoping to avoid, that sort of thing). We hope to have the site in alpha by the end of the summer.

Cohort members

Ryan Alexander, Steven Anderson, James Aylett, Hannah Donovan, Natalie Downe, Mark Norman Francis, Matthew Hasler, Steve Marshall, Richard Pope, Gareth Rushgrove, Simon Willison.

Cohort 1: Winter 2008

http://devfort.com/


Cowboy development at work

MP expenses

Heather Brooke

January 2005: The FOI request

February 2008: The Information Tribunal

“Transparency will damage democracy”

January 2009: The exemption law

March 2009: The mole

“All of the receipts of 650-odd MPs, redacted and unredacted, are for sale at a price of £300,000, so I am told. The price is going up because of the interest in the subject.”

Sir Stuart Bell, MP

Newsnight, 30th March

8th May, 2009: The Daily Telegraph

At the Guardian...

April: “Expenses are due out in a couple of months, is there anything we can do?”

June: “Expenses have been bumped forward, they’re out next week!”

Thursday 11th June: The proof-of-concept

Monday 15th June: The tentative go-ahead

Tuesday 16th June: Designer + client-side engineer

Wednesday 17th June: Operations engineer

Thursday 18th June: Launch day!

How we built it

$ convert Frank_Comm.pdf pages.png

Frictionless registration

Page filters

page_filters = (
    # Maps name of filter to dictionary of kwargs to doc.pages.filter()
    ('reviewed', {
        'votes__isnull': False
    }),
    ('unreviewed', {
        'votes__isnull': True
    }),
    ('with line items', {
        'line_items__isnull': False
    }),
    ('interesting', {
        'votes__interestingvote__status': 'yes'
    }),
    ('interesting but known', {
        'votes__interestingvote__status': 'known'...
)
page_filters_lookup = dict(page_filters)

pages = doc.pages.all()
if page_filter:
    kwargs = page_filters_lookup.get(page_filter)
    if kwargs is None:
        # Call form of raise works in both Python 2 and Python 3
        raise Http404('Invalid page filter: %s' % page_filter)
    pages = pages.filter(**kwargs).distinct()
# Build the filters
filters = []
for name, kwargs in page_filters:
    filters.append({
        'name': name,
        'count': doc.pages.filter(**kwargs).distinct().count(),
    })

Matching names

http://github.com/simonw/datamatcher



On the day

def get_mp_pages():
    "Returns list of (mp-name, mp-page-url) tuples"
    soup = Soup(urllib.urlopen(INDEX_URL))
    mp_links = []
    for link in soup.findAll('a'):
        if link.get('title', '').endswith("'s allowances"):
            mp_links.append(
                (link['title'].replace("'s allowances", ''), link['href'])
            )
    return mp_links

def get_pdfs(mp_url):
    "Returns list of (description, years, pdf-url, size) tuples"
    soup = Soup(urllib.urlopen(mp_url))
    pdfs = []
    trs = soup.findAll('tr')[1:]  # Skip the first, it's the table header
    for tr in trs:
        name_td, year_td, pdf_td = tr.findAll('td')
        name = name_td.string
        year = year_td.string
        pdf_url = pdf_td.find('a')['href']
        size = pdf_td.find('a').contents[-1].replace('(', '').replace(')', '')
        pdfs.append(
            (name, year, pdf_url, size)
        )
    return pdfs
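The scrapers above are Python 2 code from 2009 and rely on a `Soup` parser and an `INDEX_URL` that the slides do not define. As a rough illustration of the same link-matching idea on Python 3, here is a sketch using only the standard library's `html.parser`. The names `AllowanceLinkParser` and `extract_mp_links` and the sample HTML are made up for this example, and this is not the code that ran on the day.

```python
from html.parser import HTMLParser

SUFFIX = "'s allowances"


class AllowanceLinkParser(HTMLParser):
    """Collects (mp-name, href) pairs from <a title="X's allowances"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attributes without a value are reported as None, so default to "".
        title = attr_map.get("title") or ""
        href = attr_map.get("href")
        if href and title.endswith(SUFFIX):
            # Strip the suffix to leave just the MP's name.
            self.links.append((title[: -len(SUFFIX)], href))


def extract_mp_links(html):
    parser = AllowanceLinkParser()
    parser.feed(html)
    return parser.links


sample = (
    '<a title="Jane Doe\'s allowances" href="/mps/jane-doe">Jane Doe</a>'
    '<a title="Contact us" href="/contact">Contact</a>'
)
print(extract_mp_links(sample))
```

Only the first link in the sample matches, because the second title does not end with the expected suffix.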

“Drop Everything”

Photoshop + AppleScript vs. Java + IntelliJ

Images on our docroot (S3 upload was taking too long)

Blitz QA

Launch! (on EC2)

Crash #1: more Apache children than MySQL connections

unreviewed_count = Page.objects.filter(
    votes__isnull = True
).distinct().count()

SELECT COUNT(DISTINCT `expenses_page`.`id`)
FROM `expenses_page`
LEFT OUTER JOIN `expenses_vote`
    ON (`expenses_page`.`id` = `expenses_vote`.`page_id`)
WHERE `expenses_vote`.`id` IS NULL

unreviewed_count = cache.get('homepage:unreviewed_count')
if unreviewed_count is None:
    unreviewed_count = Page.objects.filter(
        votes__isnull = True
    ).distinct().count()
    # Same key as the cache.get() above, cached for 60 seconds
    cache.set('homepage:unreviewed_count', unreviewed_count, 60)

With 70,000 pages and a LOT of votes...

DB takes up 135% of CPU

Cache the count in memcached...

DB drops to 35% of CPU
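The fix here is the cache-aside pattern: look in the cache first, run the expensive count only on a miss, and store the result with a short timeout. The sketch below reproduces that flow without Django or memcached so it can be run anywhere. `TTLCache`, `cached_count` and `expensive_count` are hypothetical names, and the in-process dictionary is only a stand-in for a real memcached server.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache with get/set and a timeout."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def cached_count(cache, key, compute, timeout=60):
    # Cache-aside: try the cache, fall back to the expensive query,
    # then store the result so later requests skip the query.
    value = cache.get(key)
    if value is None:
        value = compute()
        cache.set(key, value, timeout)
    return value


calls = []


def expensive_count():
    # Stands in for the COUNT(DISTINCT ...) query; records each call.
    calls.append(1)
    return 1234


cache = TTLCache()
first = cached_count(cache, "homepage:unreviewed_count", expensive_count)
second = cached_count(cache, "homepage:unreviewed_count", expensive_count)
```

The second call is served from the cache, so `expensive_count` runs once. The trade-off is that the homepage figure can be up to 60 seconds out of date.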

unreviewed_count = Page.objects.filter(
    votes__isnull = True
).distinct().count()

reviewed_count = Page.objects.filter(
    votes__isnull = False
).distinct().count()

unreviewed_count = Page.objects.filter(
    is_reviewed = False
).count()
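Filtering on `is_reviewed` avoids the join, but it only works if the flag is kept in step with the votes. The slides do not show how that was done, so the following is a plain-Python sketch of the idea under that assumption: set the flag in the same step that records a vote. The `record_vote` method and the in-memory `Page` class are hypothetical.

```python
class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.votes = []
        self.is_reviewed = False  # denormalised: True once any vote exists

    def record_vote(self, vote):
        # Update the flag together with the write, so reads never need
        # to look at the votes to find reviewed pages.
        self.votes.append(vote)
        self.is_reviewed = True


pages = [Page(i) for i in range(5)]
pages[1].record_vote("interesting")
pages[3].record_vote("not interesting")
pages[3].record_vote("interesting")

# Rough equivalent of Page.objects.filter(is_reviewed=False).count()
unreviewed_count = sum(1 for p in pages if not p.is_reviewed)
```

The cost of this design is a second place to keep consistent: any code path that adds votes without going through `record_vote` would leave the flag wrong.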

Migrating to InnoDB on a separate server

ssh mps-live "mysqldump mp_expenses" |
    sed 's/ENGINE=MyISAM/ENGINE=InnoDB/g' |
    sed 's/CHARSET=latin1/CHARSET=utf8/g' |
    ssh mysql-big "mysql -u root mp_expenses"
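For readers less familiar with `sed`, the two substitutions in the pipeline amount to a line-by-line text rewrite of the dump. A minimal Python sketch of that rewrite follows. `convert_dump` and the sample dump lines are invented for illustration and are not part of the original migration.

```python
def convert_dump(lines):
    """Yield dump lines with the same two substitutions as the sed calls."""
    for line in lines:
        line = line.replace("ENGINE=MyISAM", "ENGINE=InnoDB")
        line = line.replace("CHARSET=latin1", "CHARSET=utf8")
        yield line


sample_dump = [
    "CREATE TABLE `expenses_page` (\n",
    "  `id` int(11) NOT NULL\n",
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n",
]
converted = list(convert_dump(sample_dump))
```

Like the `sed` version, this rewrites any matching text, including text that happens to appear inside row data, so it suits a quick migration better than a general-purpose tool.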

Reining in the cowboy

An RSS to JSON proxy service

Pair programming

Comprehensive unit tests, with mocks

Continuous integration (Team City)

Deployment scripts against CI build numbers

Reining in the cowboy

Points of embarrassment

Database required to run the test suite

Logging? What logging?

Tests get deployed alongside the code (!)

... but generally pretty smooth sailing

A final thought

Web development in 2005

Relational database

Cache

Application

Admin tools

Templates

XML feeds

Web development in 2009

Relational database

Cache

Application

Admin tools

Templates

XML feeds

Data structure servers

Search index

External web services

Monitoring and reporting

API

Webhooks

Message queue

Offline workers

Non-relational database

Thank you
