feed burner scalability

FeedBurner: Scalable Web Applications using MySQL and Java

Joe Kottke, Director of Network Operations

© 2006 FeedBurner

What is FeedBurner?

• Market-leading feed management provider
• 170,000 bloggers, podcasters and commercial publishers including Reuters, USA TODAY, Newsweek, Ars Technica, BoingBoing…
• 11 million subscribers in 190 countries
• Web-based services help publishers expand their reach online, attract subscribers and make money from their content
• The largest advertising network for feeds

Scaling history

• July 2004
  – 300Kbps, 5,600 feeds
  – 3 app servers, 3 web servers, 2 DB servers
• April 2005
  – 5Mbps, 47,700 feeds
  – My first MySQL Users Conference
  – 6 app servers, 6 web servers (same machines)
• September 2005
  – 20Mbps, 109,200 feeds
• Currently
  – 115Mbps, 270,000 feeds, 100 million hits per day

Scalability Problem 1: Plain old reliability

• August 2004
• 3 web servers, 3 app servers, 2 DB servers. Round Robin DNS
• Single-server failure, seen by 1/3 of all users

Solution: Load Balancers, Monitoring

• Health Check pages
  – Round trip all the way back to the database
  – Same page monitored by load balancers and monitoring
• Monitoring
  – Cacti (http://www.cacti.net/)
  – Nagios (http://www.nagios.org)

Health Check

UserComponent uc = UserComponentFactory.getUserComponent();
User user = uc.getUser("monitor-user");

// If first load, mark as down.
// Let FeedServlet mark things as up in init method. load-on-startup
String healthcheck = (String) application.getAttribute("healthcheck");
if (healthcheck == null || healthcheck.length() < 1) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}
// We return null in case of problem, or if user doesn’t exist
if (user == null) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}
System.out.print(healthcheck);

Cacti

Start/Stop scripts

#!/bin/bash

# Source the environment
. ${HOME}/fb.env

# Start TOMCAT
cd ${FB_APPHOME}

# Remove stale temp files
find ~/rsspp/catalina/temp/ -type f -exec rm -rf {} \;

# Remove the work directory
#rm -rf ~/rsspp/catalina/work/*

${CATALINA_HOME}/bin/startup.sh

Start/Stop scripts

#!/bin/bash
FB_APPHOME=/opt/fb/fb-app
JAVA_HOME=/usr
CATALINA_HOME=/opt/tomcat
CATALINA_BASE=${FB_APPHOME}/catalina
CATALINA_OPTS="-Xmx768m -Xms768m -Dnetworkaddress.cache.ttl=0"
WEBROOT=/opt/fb/webroot

export JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_OPTS WEBROOT

Scalability Problem 2: Stats recording/mgmt

• Every hit is recorded
• Certain hits mean more than others
• Flight recorder
• Any table management locks
• Inserts slow way down (90GB table)

Solution: Executor Pool

• Executor Pool
  – Doug Lea’s concurrency library
  – Use a PooledExecutor so stats inserts happen in a separate thread
  – Spring bean definition:

<bean id="StatsExecutor" class="EDU.oswego.cs.dl.util.concurrent.PooledExecutor">
  <constructor-arg>
    <bean class="EDU.oswego.cs.dl.util.concurrent.LinkedQueue"/>
  </constructor-arg>
  <property name="minimumPoolSize" value="10" />
  <property name="keepAliveTime" value="5000" />
</bean>
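
The hand-off from the request thread to a worker pool can be sketched as follows. This is an illustration only, not FeedBurner's code: it uses java.util.concurrent (the JDK successor to Doug Lea's library) rather than EDU.oswego PooledExecutor, the class and method names are hypothetical, and an in-memory counter stands in for the real stats INSERT. The pool size of 10 and the 5000 ms keep-alive are borrowed from the bean definition.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical recorder; names are illustrative.
class AsyncStatsRecorder {
    // Fixed pool of 10 workers fed by an unbounded queue.
    private final ThreadPoolExecutor pool = new ThreadPoolExecutor(
            10, 10, 5000, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    // Stand-in for the stats table: hit count per feed.
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    // Called on the request thread; queues the work and returns immediately.
    void recordHit(String feedId) {
        pool.execute(() -> hits.merge(feedId, 1, Integer::sum));
    }

    // Stops accepting new work and waits up to 10 seconds for queued work to finish.
    boolean drain() {
        pool.shutdown();
        try {
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int hitsFor(String feedId) {
        return hits.getOrDefault(feedId, 0);
    }
}
```

With an unbounded queue, a slow database shows up as a growing backlog rather than as slow page loads, so queue depth is worth monitoring.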

Solution: Lazy rollup

• Only today’s detailed stats need to go against the real-time table
• Roll up previous days into sparse summary tables on-demand
• First access for stats for a day is slow, subsequent requests are fast
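
A minimal sketch of the on-demand rollup idea, assuming a hypothetical LazyRollup class in which an in-memory map stands in for the summary tables and a caller-supplied function stands in for the expensive aggregate query over the detail rows:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical illustration; the real system stored summaries in MySQL tables.
class LazyRollup {
    // Stand-in for the sparse summary table: day -> total hits.
    private final Map<String, Long> summaries = new ConcurrentHashMap<>();
    // Stand-in for the slow aggregate query against the detailed stats.
    private final Function<String, Long> slowAggregate;

    LazyRollup(Function<String, Long> slowAggregate) {
        this.slowAggregate = slowAggregate;
    }

    // The first call for a day runs the aggregate; later calls read the stored summary.
    long hitsFor(String day) {
        return summaries.computeIfAbsent(day, slowAggregate);
    }
}
```

The trade-off is that whoever asks first pays the full cost of the aggregate.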

Scalability Problem 3: Primary DB overload

• Mostly used master DB server for everything
• Read vs. Read/Write load didn’t matter in the beginning
• Slow inserts would block reads, when using MyISAM

Solution: Balance read and read/write load

• Looked at workload
  – Found where we could break up read vs. read/write
  – Created Spring ExtendedDaoObjects
  – Tomcat-managed DataSources
• Balanced master vs. slave load (Duh)
  – Slave becomes perfect place for snapshot backups
• Watch for replication problems
  – Merge table problems (later)
  – Slow queries slow down replication

Example: Cacti graph of MySQL handlers

ExtendedDaoObject

• Application code extends this class and uses getHibernateTemplate() or getReadOnlyHibernateTemplate() depending upon requirements
• Similar class for JDBC

public class ExtendedHibernateDaoSupport extends HibernateDaoSupport {

    private HibernateTemplate readOnlyHibernateTemplate;

    public void setReadOnlySessionFactory(SessionFactory sessionFactory) {
        this.readOnlyHibernateTemplate = new HibernateTemplate(sessionFactory);
        readOnlyHibernateTemplate.setFlushMode(HibernateTemplate.FLUSH_NEVER);
    }

    protected HibernateTemplate getReadOnlyHibernateTemplate() {
        return (readOnlyHibernateTemplate == null) ? getHibernateTemplate() : readOnlyHibernateTemplate;
    }
}
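
The slides do not show the JDBC counterpart. The same fallback rule can be sketched generically, assuming a hypothetical ReadWriteRouter whose type parameter would be javax.sql.DataSource in a JDBC setting: reads go to the replica when one is configured, otherwise to the primary.

```java
// Hypothetical helper; C would typically be javax.sql.DataSource.
class ReadWriteRouter<C> {
    private final C primary;
    private C replica; // optional: left null when no slave is configured

    ReadWriteRouter(C primary) {
        this.primary = primary;
    }

    void setReplica(C replica) {
        this.replica = replica;
    }

    // Writes always go to the primary (master).
    C forWrite() {
        return primary;
    }

    // Reads prefer the replica and fall back to the primary if none is set.
    C forRead() {
        return (replica == null) ? primary : replica;
    }
}
```

Because of the fallback, code written against forRead() keeps working in environments that have only one database.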

Scalability Problem 4: Total DB overload

• Everything slowing down
• Using DB as cache
• Database is the ‘shared’ part of all app servers
• Ran into table size limit defaults on MyISAM (4GB). We were lazy.
  – Had to use Merge tables as a bridge to newer larger tables

Solution: Stop using the database

• Where possible :)
• Multi-level caching
  – Local VM caching (EHCache, memory only)
  – Memcached (http://www.danga.com/memcached/)
  – And finally, database.
• Memcached
  – Fault-tolerant, but client handles that.
  – Shared nothing
  – Data is transient, can be recreated
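
The lookup order above can be sketched as a read-through cache. This is a simplified illustration under stated assumptions: plain maps stand in for EHCache and for a memcached client, a function stands in for the database query, and the TieredCache name is hypothetical. Expiry and invalidation are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical three-level lookup: local VM cache, shared cache, then database.
class TieredCache {
    private final Map<String, String> local = new ConcurrentHashMap<>(); // stand-in for EHCache
    private final Map<String, String> shared;                            // stand-in for memcached
    private final Function<String, String> database;                     // stand-in for the DB query

    TieredCache(Map<String, String> shared, Function<String, String> database) {
        this.shared = shared;
        this.database = database;
    }

    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;                 // level 1 hit
        }
        value = shared.get(key);
        if (value == null) {
            value = database.apply(key);  // level 3: last resort
            if (value == null) {
                return null;              // nothing to cache
            }
            shared.put(key, value);       // populate level 2 for the other app servers
        }
        local.put(key, value);            // populate level 1
        return value;
    }
}
```

Since the cached data is transient and can be recreated, losing either cache level costs only extra database reads.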

Scalability Problem 5: Lazy initialization

• Our stats get rolled up on demand
  – Popular feeds slowed down the whole system
• FeedCount chicklet calculation
  – Every feed gets its circulation calculated at the same time
  – Contention on the table

Solution: BATCH PROCESSING

• For FeedCount, we staggered the calculation
  – Still would run into contention
  – Stats stuff again slowed down at 1AM Chicago time.
• We now process the rolled-up data every night
  – Delay showing the previous circulation in the FeedCount until roll-up is done.
• Still wasn’t enough
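
The slides do not say how the FeedCount calculation was staggered. One common approach, shown here purely as an assumption, derives a per-feed start offset from the feed ID so that work spreads across a time window instead of starting all at once. The Stagger class and its method are hypothetical.

```java
// Hypothetical helper: spreads per-feed jobs across a window by feed ID.
class Stagger {
    // Returns a delay in [0, windowSeconds) that is stable for a given feed.
    static int offsetSeconds(long feedId, int windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        // floorMod keeps the result non-negative even for negative IDs.
        return (int) Math.floorMod(feedId, (long) windowSeconds);
    }
}
```

As the slide notes, staggering alone still ran into contention, which is why the nightly batch followed.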

Scalability Problem 6: Stats writes, again

• Too much writing to master DB
• More and more data stored associated with each feed
• More stats tracking
  – Ad Stats
  – Item Stats
  – Circulation Stats

Solution: Merge Tables

• After the nightly rollup, we truncate the subtable from 2 days ago
• Gotcha with truncating a subtable:
  – FLUSH TABLES; TRUNCATE TABLE ad_stats0;
  – Could succeed on master, but fail on slave
• The right way to truncate a subtable:
  – ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats1,ad_stats2);
  – TRUNCATE TABLE ad_stats0;
  – ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats0,ad_stats1,ad_stats2);
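
The detach, truncate, reattach sequence generalizes to any subtable. The sketch below only builds the three statements as strings. The MergeTruncate helper is hypothetical, and executing the statements (for example through JDBC) is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that generates the three-step sequence from the slide.
class MergeTruncate {
    static List<String> statements(String mergeTable, List<String> subtables, String toTruncate) {
        if (!subtables.contains(toTruncate)) {
            throw new IllegalArgumentException(toTruncate + " is not part of " + mergeTable);
        }
        List<String> remaining = new ArrayList<>(subtables);
        remaining.remove(toTruncate);

        List<String> sql = new ArrayList<>();
        // 1. Detach the subtable from the merge table.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", remaining) + ")");
        // 2. Truncate it while nothing references it.
        sql.add("TRUNCATE TABLE " + toTruncate);
        // 3. Reattach it in the original order.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", subtables) + ")");
        return sql;
    }
}
```

Calling statements("ad_stats", [ad_stats0, ad_stats1, ad_stats2], "ad_stats0") reproduces the three statements on the slide, without the trailing semicolons.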

Solution: Horizontal Partitioning

• Constantly identifying hot spots in the database
  – Ad serving
  – Flare serving
  – Circulation (constant writes, occasional reads)
• Move hottest tables/queries off to own clusters
  – Hibernate and certain lazy patterns allow this
  – Keeps the driving tables from slowing down

Scalability Problem 7: Master DB Failure

• Still using just a primary and slave
• Master crash: Single point of failure
• No easy way to promote a slave to a master

Solution: No easy answer

• Still using auto_increment
  – Multi-master replication is out
• Tried DRBD + HeartBeat
  – Disk is replicated block-by-block
  – Hot primary, cold secondary
• Didn’t work as we hoped
  – Myisamchk takes too long after failure
  – I/O + CPU overhead
• InnoDB is supposedly better

Our multi-master solution

• Low-volume master cluster
  – Uses DRBD + HeartBeat
  – Works well under smaller load
  – Does mapping to feed data clusters
• Feed Data Cluster
  – Standard Master + Slave(s) structure
  – Can be added as needed
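
The role of the low-volume master cluster, mapping each feed to the data cluster that holds its rows, can be sketched as a small directory. This is an assumption about the shape of that mapping, not the actual schema: the ClusterDirectory class and the cluster names are hypothetical, and an in-memory map stands in for the mapping table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical directory: feed ID -> name of the feed data cluster holding its data.
class ClusterDirectory {
    private final Map<Long, String> feedToCluster = new ConcurrentHashMap<>();

    // Record which cluster a feed lives on (done once, when the feed is created).
    void assign(long feedId, String clusterName) {
        feedToCluster.put(feedId, clusterName);
    }

    // Look up the cluster before issuing any feed-data query.
    String clusterFor(long feedId) {
        String cluster = feedToCluster.get(feedId);
        if (cluster == null) {
            throw new IllegalStateException("no cluster assigned for feed " + feedId);
        }
        return cluster;
    }
}
```

Because the directory stores an explicit assignment rather than computing one, new clusters can be added without moving existing feeds.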

Scalability Problem 8: Power Failure

• Chicago has ‘questionable’ infrastructure.
• Battery backup, generators can be problematic
• Colo techs have been known to hit the Big Red Switch
• Needed a disaster recovery/secondary site
  – Active/Active not possible for us. Yet.
  – Would have to keep fast connection to redundant site
  – Would require 100% of current hardware, but would lie quiet

Code Name: Panic App

• Product Name: Feed Insurance
• Elegant, simple solution
• Not Java (sorry)
• Perl-based feed fetcher
  – Downloads copies of feeds, saved as flat XML files
  – Synchronized out to local and remote servers
  – Special rules for click tracking, dynamic GIFs, etc.

General guidelines

• Know your DB workload
  – Cacti really helps with this
• ‘EXPLAIN’ all of your queries
  – Helps keep crushing queries out of the system
• Cache everything that you can
• Profile your code
  – Usually only needed on hard-to-find leaks

Our settings / what we use

• Don’t always need the latest and greatest
  – Hibernate 2.1
  – Spring
  – DBCP
  – MySQL 4.1
  – Tomcat 5.0.x
• Let the container manage DataSources

JDBC

• Hibernate/iBatis/Name-Your-ORM-Here
  – Use ORM when appropriate
  – Watch the queries that your ORM generates
  – Don't be afraid to drop to JDBC
• Driver parameters we use:

# For Internationalization of Ads, multi-byte characters in general
useUnicode=true
characterEncoding=UTF-8

# Biggest performance bits
cacheServerConfiguration=true
useLocalSessionState=true

# Some other settings that we've needed as things have evolved
useServerPrepStmts=false
jdbcCompliantTruncation=false
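
These parameters are normally passed as the query string of the MySQL Connector/J JDBC URL. A minimal sketch of assembling such a URL follows. The MySqlUrl helper is hypothetical, and the host and database names in the usage note are placeholders, not values from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that appends the driver parameters listed above to a JDBC URL.
class MySqlUrl {
    static String build(String host, String database) {
        // LinkedHashMap keeps the parameters in the order they were listed.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");
        params.put("cacheServerConfiguration", "true");
        params.put("useLocalSessionState", "true");
        params.put("useServerPrepStmts", "false");
        params.put("jdbcCompliantTruncation", "false");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(entry.getKey() + "=" + entry.getValue());
        }
        return "jdbc:mysql://" + host + "/" + database + "?" + query;
    }
}
```

For example, MySqlUrl.build("db.example.com", "feeds") yields a URL beginning jdbc:mysql://db.example.com/feeds?useUnicode=true. In a container-managed DataSource the same string would go in the resource's url attribute, where "&" must be escaped as "&amp;" inside XML.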
