kottke_joe

8/7/2019 kottke_joe

FeedBurner: Scalable Web Applications using MySQL and Java

Joe Kottke, Director of Network Operations

© 2006 FeedBurner

What is FeedBurner?

• Market-leading feed management provider

• 170,000 bloggers, podcasters and commercial publishers including Reuters, USA TODAY, Newsweek, Ars Technica, BoingBoing…

• 11 million subscribers in 190 countries

• Web-based services help publishers expand their reach online, attract subscribers and make money from their content

• The largest advertising network for feeds

Scaling history

• July 2004

– 300 Kbps, 5,600 feeds

– 3 app servers, 3 web servers, 2 DB servers

• April 2005

– 5 Mbps, 47,700 feeds

– My first MySQL Users Conference

– 6 app servers, 6 web servers (same machines)

• September 2005

– 20 Mbps, 109,200 feeds

• Currently

– 115 Mbps, 270,000 feeds, 100 million hits per day

Scalability Problem 1: Plain old reliability

• August 2004

• 3 web servers, 3 app servers, 2 DB servers. Round Robin DNS

• Single-server failure, seen by 1/3 of all users

Solution: Load Balancers, Monitoring

• Health Check pages

– Round trip all the way back to the database

– Same page monitored by load balancers and monitoring

• Monitoring

– Cacti (http://www.cacti.net/)

– Nagios (http://www.nagios.org)

Health Check

UserComponent uc = UserComponentFactory.getUserComponent();
User user = uc.getUser("monitor-user");

// If first load, mark as down.
// Let FeedServlet mark things as up in its init method (load-on-startup).
String healthcheck = (String) application.getAttribute("healthcheck");
if (healthcheck == null || healthcheck.length() < 1) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}

// We return null in case of problem, or if user doesn’t exist
if (user == null) {
    healthcheck = "DOWN";
    application.setAttribute("healthcheck", healthcheck);
}

// Write the status to the response (JSP implicit "out") so that
// load balancers and monitoring can read it
out.print(healthcheck);
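The state handling in this page can also be sketched as plain Java, outside a JSP. This is only an illustration under assumptions, not FeedBurner's code: `HealthCheckSketch`, `markUp`, and the `Supplier`-based probe are hypothetical names standing in for the servlet context attribute and the `UserComponent` lookup.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical stand-in for the JSP health check: status starts DOWN,
// is flipped to UP at startup, and drops back to DOWN when the probe fails.
class HealthCheckSketch {
    private final AtomicReference<String> status = new AtomicReference<>("DOWN");
    private final Supplier<Object> probe; // e.g. a database round trip that loads a monitor user

    HealthCheckSketch(Supplier<Object> probe) {
        this.probe = probe;
    }

    // Called once the application has finished initializing.
    void markUp() {
        status.set("UP");
    }

    // Runs the probe; a null result or an exception marks the node DOWN.
    String check() {
        Object result;
        try {
            result = probe.get();
        } catch (RuntimeException e) {
            result = null;
        }
        if (result == null) {
            status.set("DOWN");
        }
        return status.get();
    }
}
```

As in the page above, a node that has gone DOWN stays DOWN until something explicitly marks it up again.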

Cacti

Start/Stop scripts

#!/bin/bash

# Source the environment
. "${HOME}/fb.env"

# Start TOMCAT
cd "${FB_APPHOME}" || exit 1

# Remove stale temp files
find ~/rsspp/catalina/temp/ -type f -exec rm -f {} \;

# Remove the work directory
#rm -rf ~/rsspp/catalina/work/*

"${CATALINA_HOME}/bin/startup.sh"

Start/Stop scripts

#!/bin/bash

FB_APPHOME=/opt/fb/fb-app
JAVA_HOME=/usr
CATALINA_HOME=/opt/tomcat
CATALINA_BASE=${FB_APPHOME}/catalina

# Initial heap (-Xms) must not exceed maximum heap (-Xmx)
CATALINA_OPTS="-Xmx768m -Xms768m -Dnetworkaddress.cache.ttl=0"

WEBROOT=/opt/fb/webroot

export JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_OPTS WEBROOT

Scalability Problem 2: Stats recording/mgmt

• Every hit is recorded

• Certain hits mean more than others

• Flight recorder

• Any table management locks

• Inserts slow way down (90GB table)

Solution: Executor Pool

• Executor Pool

– Doug Lea’s concurrency library

– Use a PooledExecutor so stats inserts happen in a separate thread

– Spring bean definition:

<bean id="StatsExecutor" class="EDU.oswego.cs.dl.util.concurrent.PooledExecutor">
  <constructor-arg>
    <bean class="EDU.oswego.cs.dl.util.concurrent.LinkedQueue"/>
  </constructor-arg>
  <property name="minimumPoolSize" value="10" />
  <property name="keepAliveTime" value="5000" />
</bean>
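A rough equivalent of this setup can be sketched with `java.util.concurrent`, the standard-library successor to Doug Lea's package, rather than the `PooledExecutor` class named on the slide. The class `StatsRecorderSketch` and its methods are hypothetical, and the database insert is simulated with a counter.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical recorder: the request thread only enqueues work,
// and pool threads perform the (here simulated) stats insert.
class StatsRecorderSketch {
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            10, 10,                        // pool size, mirroring minimumPoolSize=10
            5000, TimeUnit.MILLISECONDS,   // mirrors keepAliveTime=5000
            new LinkedBlockingQueue<>());  // unbounded queue, in the spirit of LinkedQueue
    private final AtomicInteger inserted = new AtomicInteger();

    // Returns immediately; the insert happens later on a pool thread.
    void recordHit(String feedId) {
        // Placeholder for an INSERT of the hit for feedId.
        executor.execute(() -> inserted.incrementAndGet());
    }

    // Stops accepting work, waits for queued inserts, and reports how many ran.
    int shutdownAndCount() {
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return inserted.get();
    }
}
```

An unbounded queue keeps request threads from blocking on slow inserts, at the cost of memory growth if the inserts fall behind for long.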

Solution: Lazy rollup

• Only today’s detailed stats need to go against the real-time table

• Roll up previous days into sparse summary tables on demand

• First access for stats for a day is slow, subsequent requests are fast
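The on-demand rollup described here can be sketched as a compute-once lookup. This is a minimal illustration under assumptions: `LazyRollupSketch`, `hitsFor`, and the `detailScan` function are hypothetical, and an in-memory map stands in for the summary tables.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical on-demand rollup: a past day's summary is computed the first
// time it is requested and served from the summary store afterwards.
class LazyRollupSketch {
    private final Map<LocalDate, Long> summaries = new ConcurrentHashMap<>();
    private final Function<LocalDate, Long> detailScan; // stands in for the slow query over detailed stats
    int scans = 0; // how many rollups ran (demo counter only, not thread-safe)

    LazyRollupSketch(Function<LocalDate, Long> detailScan) {
        this.detailScan = detailScan;
    }

    long hitsFor(LocalDate day, LocalDate today) {
        if (!day.isBefore(today)) {
            // Today's numbers always come from the real-time table.
            return detailScan.apply(day);
        }
        // Past days: roll up once, then reuse the stored summary.
        return summaries.computeIfAbsent(day, d -> {
            scans++;
            return detailScan.apply(d);
        });
    }
}
```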

Scalability Problem 3: Primary DB overload

• Mostly used master DB server for everything

• Read vs. Read/Write load didn’t matter in the beginning

• Slow inserts would block reads when using MyISAM

Solution: Balance read and read/write load

• Looked at workload

– Found where we could break up read vs. read/write

– Created Spring ExtendedDaoObjects

– Tomcat-managed DataSources

• Balanced master vs. slave load (Duh)

– Slave becomes perfect place for snapshot backups

• Watch for replication problems

– Merge table problems (later)

– Slow queries slow down replication

Example: Cacti graph of MySQL handlers

ExtendedDaoObject

• Application code extends this class and uses getHibernateTemplate() or getReadOnlyHibernateTemplate() depending upon requirements

• Similar class for JDBC

public class ExtendedHibernateDaoSupport extends HibernateDaoSupport {

    private HibernateTemplate readOnlyHibernateTemplate;

    public void setReadOnlySessionFactory(SessionFactory sessionFactory) {
        this.readOnlyHibernateTemplate = new HibernateTemplate(sessionFactory);
        readOnlyHibernateTemplate.setFlushMode(HibernateTemplate.FLUSH_NEVER);
    }

    protected HibernateTemplate getReadOnlyHibernateTemplate() {
        return (readOnlyHibernateTemplate == null)
                ? getHibernateTemplate()
                : readOnlyHibernateTemplate;
    }
}
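The fallback in getReadOnlyHibernateTemplate() can be illustrated without Spring or Hibernate. The generic class below is a hypothetical, framework-free analogue (the name `ReadWriteRouterSketch` and its methods are not from the slides): reads go to the read-only handle when one has been configured, and to the master otherwise.

```java
// Hypothetical analogue of the DAO base class: T could be a DataSource,
// a template object, or any other handle to a database.
class ReadWriteRouterSketch<T> {
    private final T master;
    private T readOnly; // optional; stays null when no slave is configured

    ReadWriteRouterSketch(T master) {
        this.master = master;
    }

    void setReadOnly(T readOnly) {
        this.readOnly = readOnly;
    }

    // Writes always go to the master.
    T forWrite() {
        return master;
    }

    // Reads prefer the slave but fall back to the master.
    T forRead() {
        return (readOnly == null) ? master : readOnly;
    }
}
```

Because of the fallback, code written against the read-only accessor still works in an environment that has only a single database.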

Scalability Problem 4: Total DB overload

• Everything slowing down

• Using DB as cache

• Database is the ‘shared’ part of all app servers

• Ran into table size limit defaults on MyISAM (4GB). We were lazy.

– Had to use Merge tables as a bridge to newer, larger tables

Solution: Stop using the database

• Where possible :)

• Multi-level caching

– Local VM caching (EHCache, memory only)

– Memcached (http://www.danga.com/memcached/)

– And finally, database.

• Memcached

– Fault-tolerant, but client handles that.

– Shared nothing

– Data is transient, can be recreated
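The lookup order above (local VM cache, then memcached, then the database) can be sketched as a read-through chain. This is an illustration under assumptions: plain maps stand in for EHCache and a memcached client, the `database` function stands in for a query, and expiry and invalidation are left out entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through lookup across three tiers.
class TieredCacheSketch {
    private final Map<String, String> local = new ConcurrentHashMap<>(); // per-VM tier
    private final Map<String, String> shared;         // stand-in for a memcached client
    private final Function<String, String> database;  // stand-in for the DB lookup

    TieredCacheSketch(Map<String, String> shared, Function<String, String> database) {
        this.shared = shared;
        this.database = database;
    }

    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value; // served from this VM
        }
        value = shared.get(key);
        if (value == null) {
            value = database.apply(key); // last resort
            if (value == null) {
                return null; // nothing to cache
            }
            shared.put(key, value); // make it visible to other app servers
        }
        local.put(key, value);
        return value;
    }
}
```

Since cached data is transient and can be recreated, losing either cache tier only costs extra database reads.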

Scalability Problem 5: Lazy initialization

• Our stats get rolled up on demand

– Popular feeds slowed down the whole system

• FeedCount chicklet calculation

– Every feed gets its circulation calculated at the same time

– Contention on the table

Solution: BATCH PROCESSING

• For FeedCount, we staggered the calculation

– Still would run into contention

– Stats stuff again slowed down at 1AM Chicago time.

• We now process the rolled-up data every night

– Delay showing the previous circulation in the FeedCount until roll-up is done.

• Still wasn’t enough
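The slides do not say how the FeedCount calculation was staggered. One simple possibility, shown purely as an assumption, is to derive a per-feed start offset from the feed id so that work spreads across a time window; `StaggerSketch` and `offsetFor` are hypothetical names.

```java
import java.time.Duration;

// Hypothetical staggering scheme: each feed gets a fixed offset inside the
// window, so calculations do not all start at the same moment.
class StaggerSketch {
    // The window must be at least one second long.
    static Duration offsetFor(long feedId, Duration window) {
        long seconds = Math.floorMod(feedId, window.getSeconds());
        return Duration.ofSeconds(seconds);
    }
}
```

As the slide notes, spreading the start times did not remove contention on its own, which is what led to the nightly batch roll-up.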

Scalability Problem 6: Stats writes, again

• Too much writing to master DB

• More and more data stored associated with each feed

• More stats tracking

– Ad Stats

– Item Stats

– Circulation Stats

Solution: Merge Tables

• After the nightly rollup, we truncate the subtable from 2 days ago

• Gotcha with truncating a subtable:

– FLUSH TABLES; TRUNCATE TABLE ad_stats0;

– Could succeed on master, but fail on slave

• The right way to truncate a subtable:

– ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats1,ad_stats2);

– TRUNCATE TABLE ad_stats0;

– ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats0,ad_stats1,ad_stats2);
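The three-step sequence (detach the subtable from the union, truncate it, re-attach it) can be generated for any subtable. The helper below is a hypothetical sketch that only builds the SQL strings and does not execute them. It keeps the `TYPE=MERGE` spelling from the slide, which matches MySQL 4.1; later MySQL versions use `ENGINE=MERGE` instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the statements for safely truncating
// one subtable of a MERGE table.
class MergeTruncateSketch {
    static List<String> truncateSubtable(String mergeTable, List<String> subtables, String target) {
        List<String> remaining = new ArrayList<>(subtables);
        if (!remaining.remove(target)) {
            throw new IllegalArgumentException(target + " is not part of " + mergeTable);
        }
        List<String> sql = new ArrayList<>();
        // 1. Take the target out of the union.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", remaining) + ")");
        // 2. Truncate it while nothing references it.
        sql.add("TRUNCATE TABLE " + target);
        // 3. Restore the full union.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", subtables) + ")");
        return sql;
    }
}
```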

Solution: Horizontal Partitioning

• Constantly identifying hot spots in the database

– Ad serving

– Flare serving

– Circulation (constant writes, occasional reads)

• Move hottest tables/queries off to own clusters

– Hibernate and certain lazy patterns allow this

– Keeps the driving tables from slowing down

Scalability Problem 7: Master DB Failure

• Still using just a primary and slave

• Master crash: Single point of failure

• No easy way to promote a slave to a master

Solution: No easy answer

• Still using auto_increment

– Multi-master replication is out

• Tried DRBD + HeartBeat

– Disk is replicated block-by-block

– Hot primary, cold secondary

• Didn’t work as we hoped

– Myisamchk takes too long after failure

– I/O + CPU overhead

• InnoDB is supposedly better

Our multi-master solution

• Low-volume master cluster

– Uses DRBD + HeartBeat

– Works well under smaller load

– Does mapping to feed data clusters

• Feed Data Cluster

– Standard Master + Slave(s) structure

– Can be added as needed
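The mapping role of the low-volume master cluster can be pictured as a small directory from feed to data cluster. The slides do not describe the actual schema, so everything here is an assumption: `ClusterDirectorySketch`, its methods, and the cluster names are hypothetical, and an in-memory map stands in for the mapping table.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical directory kept on the low-volume master cluster:
// it records which feed data cluster owns each feed.
class ClusterDirectorySketch {
    private final Map<Long, String> feedToCluster = new ConcurrentHashMap<>();

    // New feeds can be assigned to whichever cluster has room,
    // including clusters added later.
    void assign(long feedId, String clusterName) {
        feedToCluster.put(feedId, clusterName);
    }

    String clusterFor(long feedId) {
        String cluster = feedToCluster.get(feedId);
        if (cluster == null) {
            throw new IllegalStateException("no cluster assigned for feed " + feedId);
        }
        return cluster;
    }
}
```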

Scalability Problem 8: Power Failure

• Chicago has ‘questionable’ infrastructure.

• Battery backup, generators can be problematic

• Colo techs have been known to hit the Big Red Switch

• Needed a disaster recovery/secondary site

– Active/Active not possible for us. Yet.

– Would have to keep fast connection to redundant site

– Would require 100% of current hardware, but would lie quiet

Code Name: Panic App

• Product Name: Feed Insurance

• Elegant, simple solution

• Not Java (sorry)

• Perl-based feed fetcher

– Downloads copies of feeds, saved as flat XML files

– Synchronized out to local and remote servers

– Special rules for click tracking, dynamic GIFs, etc.

General guidelines

• Know your DB workload

– Cacti really helps with this

• ‘EXPLAIN’ all of your queries

– Helps keep crushing queries out of the system

• Cache everything that you can

• Profile your code

– Usually only needed on hard-to-find leaks

Our settings / what we use

• Don’t always need the latest and greatest

– Hibernate 2.1

– Spring

– DBCP

– MySQL 4.1

– Tomcat 5.0.x

• Let the container manage DataSources

JDBC

• Hibernate/iBatis/Name-Your-ORM-Here

– Use ORM when appropriate

– Watch the queries that your ORM generates

– Don't be afraid to drop to JDBC

• Driver parameters we use:

# For Internationalization of Ads, multi-byte characters in general
useUnicode=true
characterEncoding=UTF-8

# Biggest performance bits
cacheServerConfiguration=true
useLocalSessionState=true

# Some other settings that we've needed as things have evolved
useServerPrepStmts=false
jdbcCompliantTruncation=false
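These parameters are normally passed to the MySQL driver as query-string options on the JDBC URL. The sketch below assembles such a URL; the host `db.example.com` and the database name `feeds` in the usage note are placeholders, and `JdbcUrlSketch` is a hypothetical helper rather than part of any driver API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that appends driver parameters to a MySQL JDBC URL.
class JdbcUrlSketch {
    // The parameters listed on the slide, in the same order.
    static Map<String, String> slideParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");
        params.put("cacheServerConfiguration", "true");
        params.put("useLocalSessionState", "true");
        params.put("useServerPrepStmts", "false");
        params.put("jdbcCompliantTruncation", "false");
        return params;
    }

    static String buildUrl(String host, String database, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(entry.getKey() + "=" + entry.getValue());
        }
        return "jdbc:mysql://" + host + "/" + database + "?" + query;
    }
}
```

For example, `JdbcUrlSketch.buildUrl("db.example.com", "feeds", JdbcUrlSketch.slideParams())` yields a URL that begins with `jdbc:mysql://db.example.com/feeds?useUnicode=true&characterEncoding=UTF-8`. With a container-managed DataSource, the same string would go into the DataSource's URL setting.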
