Lessons learned while building omroep.nl
Bart Zonneveld (@bartzon)
Sjoerd Tieleman (@tieleman)
Nederlandse Publieke Omroep
Dutch Public Broadcasting Company
AVRO Joodse Omroep NMO Teleac
BNN KRO NOS TROS
BOS LLiNK NPS VARA
EO MAX OHM VPRO
HUMAN NCRV RKK ZvK
IKON NIO RVU
Rails sites
• Beetlejuice
• Centrale navigatie
• Omroep.nl
• Nederland 1
• Nederland 3
• Radio 1
• Z@PP
• Z@ppelin
• Zelda
• Various tools
Team
• 2 coders
• 1 designer
• 1 editor
• 1 project manager
6 months, CMS built from scratch
Requirements
• Handle 30.000-40.000 pageviews per day
• Handle traffic spikes
• Flexible, multi user CMS
• Loads of external data
Daily spread
Some numbers
+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers          |  1864 |  1535 |      41 |     185 |   4 |     6 |
| Helpers              |   797 |   631 |       1 |      75 |  75 |     6 |
| Models               |  1303 |  1055 |      40 |     153 |   3 |     4 |
| Libraries            |   814 |   620 |      15 |      79 |   5 |     5 |
| Integration tests    |     0 |     0 |       0 |       0 |   0 |     0 |
| Functional tests     |     0 |     0 |       0 |       0 |   0 |     0 |
| Unit tests           |     0 |     0 |       0 |       0 |   0 |     0 |
| Model specs          |  1932 |  1573 |       0 |       9 |   0 |   172 |
| View specs           |  7322 |  5950 |       0 |     153 |   0 |    36 |
| Controller specs     |  7292 |  5846 |       0 |     175 |   0 |    31 |
| Helper specs         |   900 |   676 |       0 |       2 |   0 |   336 |
| Library specs        |    56 |    45 |       2 |      12 |   6 |     1 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                | 22280 | 17931 |      99 |     843 |   8 |    19 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 3841 Test LOC: 14090 Code to Test Ratio: 1:3.7
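The ratio follows directly from the LOC column of the stats table. A small sketch of the arithmetic; the hash names are made up for illustration, the numbers are copied from the table:

```ruby
# LOC per category, copied from the stats table.
code_loc = { :controllers => 1535, :helpers => 631, :models => 1055, :libraries => 620 }
test_loc = { :model_specs => 1573, :view_specs => 5950, :controller_specs => 5846,
             :helper_specs => 676, :library_specs => 45 }

code_total = code_loc.values.inject(:+)
test_total = test_loc.values.inject(:+)

# Test lines per line of application code, rounded to one decimal.
ratio = (test_total.to_f / code_total).round(1)

puts "Code LOC: #{code_total}  Test LOC: #{test_total}  Code to Test Ratio: 1:#{ratio}"
```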
Moar numbers
• 410 Cucumber scenarios, 600 step definitions
• 2235 RSpec specifications
so it must be bug-free, right? ;-)
Tools
• Ruby on Rails 2.3.4
• RSpec + Webrat + Cucumber
• Apache 2.2
• Paperclip
• SVN
• App monitoring: RPM, Hoptoad
• MySQL 5
• Service monitoring: Nagios
• Memcache
Tools: app monitoring
Hoptoad New Relic RPM
Architecture
• Apache 2.2 with mod_proxy
• Rails 2.3.4 running on Phusion Passenger 2.2.5 with REE
• 4 hosts, each running 4 instances (per app)
Apdex: 1.0, avg. response time 40 ms, 130 rpm, db load 0.6%
Servers
• Quad-core Intel Xeon E542, 32 GB RAM
• Fedora 8
• Other mumbojumbo
Architecture
Front proxy Front proxy
Application server
Application server
Application server
Application server
Database memcache
Workflow
• BDD
• Shared behaviours
• Performance testing
• Staging and production environment
BDD
• RSpec
• Cucumber
• (Webrat)
3 slide intro to BDD: RSpec
describe Article do
  it_should_behave_like "all objects with userstamps"
  it_should_behave_like "all objects that can be published"
  it_should_behave_like "all objects that have an url"
  it_should_behave_like "all objects that can be searched"
  it_should_behave_like "all objects with related articles"

  it "should not be valid without a name" do
    @article.attributes = @valid_attributes.except(:name)
    @article.should_not be_valid
  end

  it "should not be valid without contents" do
    @article.attributes = @valid_attributes.except(:contents)
    @article.should_not be_valid
  end
end
3 slide intro to BDD: Cucumber features
Feature: Articles on the homepage
  As a visitor
  I want to view articles on the homepage
  So that I can see the latest content

  Scenario: 5 most recent articles
    Given there are 8 articles
    When I visit the homepage
    Then I should see the 5 last published articles
3 slide intro to BDD: Cucumber steps
Given "there are $num articles" do |num|
  num.to_i.times { create_article }
end

When "I visit the homepage" do
  visit root_path
end

Then "I should see the $num last published articles" do |num|
  Article.last_published(num).each do |article|
    response.should contain(article.title)
  end
end
Shared behaviours
• Tags
• User stamps (created by, updated by)
• Searching
• “Related” articles
• Publication timestamps (on/offline at)
Shared behaviours
module UserStamps
  def self.included(klass)
    klass.instance_eval do
      include InstanceMethods
    end
  end

  module InstanceMethods
    def created_by
      User.find_by_id(creator_id)
    end

    def updated_by
      User.find_by_id(updater_id)
    end
  end
end

class Article < ActiveRecord::Base
  include Shared::UserStamps
  include Shared::Published
  include Shared::Url
  include Shared::Search
  include Shared::RelatedArticles

  # stuff
end
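The same mixin pattern can be tried outside Rails. This is a minimal sketch, not the production code: the `User` class here is a hypothetical in-memory stand-in for the ActiveRecord model, and `Article` is a plain Ruby object with hand-written `creator_id` and `updater_id` accessors.

```ruby
# Hypothetical stand-in for the ActiveRecord User model (illustration only).
class User
  USERS = { 1 => "bart", 2 => "sjoerd" }

  def self.find_by_id(id)
    USERS[id]
  end
end

module Shared
  module UserStamps
    # Called by Ruby whenever a class includes this module.
    def self.included(klass)
      klass.send(:include, InstanceMethods)
    end

    module InstanceMethods
      def created_by
        User.find_by_id(creator_id)
      end

      def updated_by
        User.find_by_id(updater_id)
      end
    end
  end
end

# Plain Ruby object in place of an ActiveRecord model.
class Article
  include Shared::UserStamps
  attr_accessor :creator_id, :updater_id
end

article = Article.new
article.creator_id = 1
article.updater_id = 2
puts article.created_by
puts article.updated_by
```

Any class that exposes `creator_id` and `updater_id` gets both lookups by including the module, which is what makes the shared RSpec behaviours reusable across models.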
Workflow: performance testing
• ab, httperf, autobench, cURL
• NewRelic RPM
• Safari Web Inspector
• http://railslab.newrelic.com/scaling-rails
Autobench
Challenges
• Content Management System
• Loads and loads and loads of external data
CMS
• Articles, Themas, Specials, Subsites
• Multiple feeds, images, links
• Version control
• Media database
CMS: Articles
Article
PageThema
Subsite
Special
Link Feed Image
CMS: Version control
Media DB
• Implemented as REST app
• To be used as REST service
• Files, folders, crops
External data
• RSS feeds
• EPG data
• Zelda
• Babel
• News/sport/teletekst
• Lots of custom XML formats
External data: XML/RSS
• Empty feeds
• Encodings are off (Windows-1252, ISO-8859-1, UTF-8)
• “Custom” fields
• Incorrect fields (dates, unescaped HTML)
• Timeouts
• Everything that can go wrong, will go wrong
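For the encoding problems, a rough sketch of normalizing feed bytes to UTF-8. It uses the String encoding API of Ruby 1.9 and later, not the Ruby 1.8 setup from this talk, and the fallback to Windows-1252 is an assumption (a feed in another encoding would need its own rule). The method name `to_utf8` is made up.

```ruby
# Returns a UTF-8 copy of raw feed bytes.
# Assumption: anything that is not valid UTF-8 is treated as Windows-1252,
# which also covers the printable range of ISO-8859-1.
def to_utf8(raw)
  utf8 = raw.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  raw.dup
     .force_encoding("Windows-1252")
     .encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
end

legacy = "caf\xE9".b          # "cafe" with e-acute, as Windows-1252 bytes
puts to_utf8(legacy)          # transcoded to UTF-8
puts to_utf8("caf\u00E9")     # already UTF-8, returned unchanged
```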
External data: Twitter
External data: EPG data
Zelda
don’t sue us Nintendo... please? :)
External data
• Empty feeds
• Retrieving the feed while someone is updating it
• Required fields that are empty
• DTD?
<!ELEMENT aflevering ( prid?, tite?, medium?, icon?, aankondiging?, inkl?, ingl?, infi?, inak?, inds?, inbb?, kykw?, orti?, aant?, land?, lcod?, psrt?, prem?, inh1?, afle?, atit?, inh2?, bron?, prij?, inh3?, mail?, webs?, inhk?, gids_tekst?, omroepen?, genres?, personen?, streams?, fragmenten?, serie?)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT begi (#PCDATA)>
<!ELEMENT eind (#PCDATA)>
<!ELEMENT inkl (#PCDATA)>
<!ELEMENT ingl (#PCDATA)>
<!ELEMENT intt (#PCDATA)>
<!ELEMENT inhh (#PCDATA)>
<!ELEMENT omro (#PCDATA)>
<!ELEMENT lcod (#PCDATA)>
<!ELEMENT herh (#PCDATA)>
<!ELEMENT inds (#PCDATA)>
<!ELEMENT infi (#PCDATA)>
<!ELEMENT inbb (#PCDATA)>
<!ELEMENT genr (#PCDATA)>
<!ELEMENT kykw (#PCDATA)>
<!ELEMENT afle (#PCDATA)>
<!ELEMENT inh1 (#PCDATA)>
<!ELEMENT inh2 (#PCDATA)>
<!ELEMENT inh3 (#PCDATA)>
Lessons learned
• Cache the crap out of everything
• Rescue everything
• Test everything (frontend and backend)
Caching
• Cache the homepage
• Page cache → Fragment cache
• Don’t cache forms
• Cache as much as possible
Case: article views
• Article is page cached
• Update the number of views in realtime?
Use AJAX!
<% javascript_tag do %>
  <%= remote_function :url => update_views_article_path(@article) %>
<% end %>
Case: banner items
• Fast requests (<10ms)
• ETags (304 Not Modified)
• Static resource → page cache
• Move to front proxy, frees up Rails cluster
• 1100rpm → 130rpm
• 20ms → 40ms
• Average response time going up? Oh nooooes!
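The ETag step can be sketched in plain Ruby. This is not the Rails or proxy implementation, only an illustration of the mechanism: `conditional_response` is a made-up helper, and using an MD5 digest of the body as the ETag is an assumption.

```ruby
require "digest/md5"

# Returns [status, headers, body] for a response body and the value of the
# client's If-None-Match header (nil when the client sent none).
def conditional_response(body, if_none_match)
  etag = %("#{Digest::MD5.hexdigest(body)}")
  if if_none_match == etag
    # Client already has this version: send headers only.
    [304, { "ETag" => etag }, ""]
  else
    [200, { "ETag" => etag }, body]
  end
end

status, headers, _body = conditional_response("<div>banner</div>", nil)
puts status                                   # first request, full body
status, _headers, _body = conditional_response("<div>banner</div>", headers["ETag"])
puts status                                   # repeat request with matching ETag
```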
Caching external data
• Don’t expire cache (preferably)
• Explicitly overwrite
• Update in background (feeeeeeeds)
• memcache FTW!
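A minimal sketch of "don't expire, explicitly overwrite": the cached value is replaced only when a background refresh succeeds and actually returns items. `FeedCache` is a hypothetical in-memory stand-in, not a memcache client; a real client would take the place of the Hash.

```ruby
# In-memory stand-in for memcache (hypothetical, for illustration only).
class FeedCache
  def initialize
    @store = {}
  end

  def read(key)
    @store[key]
  end

  # Runs the given block to fetch fresh items and overwrites the cached
  # value only when the fetch succeeds and returns something.
  def refresh(key)
    items = yield
    @store[key] = items unless items.nil? || items.empty?
    @store[key]
  rescue StandardError => e
    warn "Refresh of #{key} failed: #{e.message}"
    @store[key]   # keep serving the previous value
  end
end

cache = FeedCache.new
cache.refresh("news") { ["item 1", "item 2"] }    # background job succeeds
cache.refresh("news") { raise "feed timed out" }  # failure: old items stay
cache.refresh("news") { [] }                       # empty feed: old items stay
p cache.read("news")
```

Visitors never trigger a fetch this way; they only read whatever the last successful refresh stored.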
memcache
• Escape your keys using CGI::escape
• Max keylength is 250
• Max value size is 1MB
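A small sketch of a key builder that respects the first two limits. `cache_key` is a made-up helper; replacing the tail of an overlong key with a SHA1 digest is one possible approach, not necessarily what was used on omroep.nl.

```ruby
require "cgi"
require "digest/sha1"

MAX_KEY_LENGTH = 250

# Builds a memcache-safe key: escaped, and never longer than 250 characters.
def cache_key(*parts)
  key = CGI.escape(parts.join("/"))
  return key if key.length <= MAX_KEY_LENGTH

  # Too long: keep a readable prefix and append a digest of the full key.
  digest = Digest::SHA1.hexdigest(key) # 40 hex characters
  "#{key[0, MAX_KEY_LENGTH - digest.length - 1]}-#{digest}"
end

puts cache_key("feed", "http://example.com/a b")
puts cache_key("feed", "x" * 300).length
```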
Rescuing
# Needs: require 'open-uri' and require 'rss'
def self.get_feed_contents(url)
  content = ""
  open(url) { |s| content = s.read }
  RSS::Parser.parse(content, false).items
rescue => e
  logger.warn "Feed #{url} raised error: #{e.message}"
  []
rescue Timeout::Error => e
  logger.warn "Feed #{url} timed out: #{e.message}"
  []
end
Timeout::Error is an Interrupt in Ruby 1.8, so the bare rescue does not catch it; rescue it explicitly.
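A self-contained sketch of the same idea with an explicit time limit. `safe_feed_items` and `FEED_ERRORS` are made-up names, and the block stands in for the real fetch-and-parse step. On current Ruby versions Timeout::Error is a RuntimeError and a bare rescue would catch it too; listing it explicitly keeps the code correct on both old and new interpreters.

```ruby
require "timeout"

# Errors treated as "feed unavailable". Timeout::Error is listed explicitly
# so the behaviour does not depend on its place in the exception hierarchy.
FEED_ERRORS = [Timeout::Error, StandardError]

# Yields the url to a fetcher block and returns its result, or [] when the
# fetch raises or takes longer than the given number of seconds.
def safe_feed_items(url, seconds = 5)
  Timeout.timeout(seconds) { yield url }
rescue *FEED_ERRORS => e
  warn "Feed #{url} failed (#{e.class}): #{e.message}"
  []
end

p safe_feed_items("http://example.com/feed.rss") { |_url| ["item"] }
p safe_feed_items("http://example.com/slow.rss", 0.05) { |_url| sleep 1 }
```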
Testing
• rcov
• Refactor your tests
• Peer reviews, external audits
• Run specs/features (continuously) in parallel (your Cucumber is too slooooow!)
Cucumber salad
num_of_processes.times do |count|
  pids << Process.fork do
    setup_database(conn, count)
    Cucumber::Cli::Main.execute(
      ["-f", "progress", "-l", "nl", "-r", "features"] + feature_sets[count]
    )
  end
end
• “Regular”: 12:12
• MacBook Pro (4): 4:34
• Mac Pro (8): 2:12
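The slide does not show how `feature_sets` is built. One simple way, sketched here as an assumption, is to deal the feature files out round-robin over the worker processes; `split_features` is a made-up helper name.

```ruby
# Distributes feature files round-robin over the worker processes, so each
# process gets roughly the same number of files.
def split_features(files, num_of_processes)
  sets = Array.new(num_of_processes) { [] }
  files.sort.each_with_index do |file, index|
    sets[index % num_of_processes] << file
  end
  sets
end

feature_sets = split_features(Dir.glob("features/**/*.feature"), 4)
puts feature_sets.map { |set| set.size }.inspect
```

Splitting by file count ignores that some features run far longer than others; balancing on recorded run times would even out the processes further.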
Conclusions
• Test
• Optimize
• Monitor
What’s next for us?
• Building a high-performance backend
• Uitzending Gemist statistics API
• 250+ reqs/s at minimum
@questions.any?