kottke_joe
8/7/2019 kottke_joe
http://slidepdf.com/reader/full/kottkejoe 1/33
FeedBurner: Scalable Web Applications using MySQL and Java
Joe Kottke, Director of Network Operations
© 2006 FeedBurner
What is FeedBurner?
• Market-leading feed management provider
• 170,000 bloggers, podcasters and commercial publishers including Reuters, USA TODAY, Newsweek, Ars Technica, BoingBoing…
• 11 million subscribers in 190 countries
• Web-based services help publishers expand their reach online, attract subscribers and make money from their content
• The largest advertising network for feeds
Scaling history
• July 2004
– 300Kbps, 5,600 feeds
– 3 app servers, 3 web servers, 2 DB servers
• April 2005
– 5Mbps, 47,700 feeds
– My first MySQL Users Conference
– 6 app servers, 6 web servers (same machines)
• September 2005
– 20Mbps, 109,200 feeds
• Currently
– 115 Mbps, 270,000 feeds, 100 Million hits per day
Scalability Problem 1: Plain old reliability
• August 2004
• 3 web servers, 3 app servers, 2 DB servers. Round Robin DNS
• Single-server failure, seen by 1/3 of all users
Solution: Load Balancers, Monitoring
• Health Check pages
– Round trip all the way back to the database
– Same page monitored by load balancers and monitoring
• Monitoring
– Cacti (http://www.cacti.net/)
– Nagios (http://www.nagios.org)
Health Check
UserComponent uc = UserComponentFactory.getUserComponent();
User user = uc.getUser("monitor-user");
// If first load, mark as down.
// Let FeedServlet mark things as up in init method. load-on-startup
String healthcheck = (String) application.getAttribute("healthcheck");
if (healthcheck == null || healthcheck.length() < 1) {
    healthcheck = new String("DOWN");
    application.setAttribute("healthcheck", healthcheck);
}
// We return null in case of problem, or if user doesn't exist
if (user == null) {
    healthcheck = new String("DOWN");
    application.setAttribute("healthcheck", healthcheck);
}
System.out.print(healthcheck);
Cacti
Start/Stop scripts
#!/bin/bash
# Source the environment
. ${HOME}/fb.env
# Start TOMCAT
cd ${FB_APPHOME}
# Remove stale temp files
find ~/rsspp/catalina/temp/ -type f -exec rm -rf {} \;
# Remove the work directory
#rm -rf ~/rsspp/catalina/work/*
${CATALINA_HOME}/bin/startup.sh
Start/Stop scripts
#!/bin/bash
FB_APPHOME=/opt/fb/fb-app
JAVA_HOME=/usr
CATALINA_HOME=/opt/tomcat
CATALINA_BASE=${FB_APPHOME}/catalina
CATALINA_OPTS="-Xmx768m -Xms768m -Dnetworkaddress.cache.ttl=0"
WEBROOT=/opt/fb/webroot
export JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_OPTS WEBROOT
Scalability Problem 2: Stats recording/mgmt
• Every hit is recorded
• Certain hits mean more than others
• Flight recorder
• Any table management locks
• Inserts slow way down (90GB table)
Solution: Executor Pool
• Executor Pool
– Doug Lea’s concurrency library
– Use a PooledExecutor so stats inserts happen in a separate thread
– Spring bean definition:
<bean id="StatsExecutor" class="EDU.oswego.cs.dl.util.concurrent.PooledExecutor">
  <constructor-arg>
    <bean class="EDU.oswego.cs.dl.util.concurrent.LinkedQueue"/>
  </constructor-arg>
  <property name="minimumPoolSize" value="10" />
  <property name="keepAliveTime" value="5000" />
</bean>
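Doug Lea's util.concurrent library later became the basis for java.util.concurrent in Java 5, so the same idea can be sketched with the JDK alone. The following is a minimal illustration, not FeedBurner's code: the class name StatsRecorder, the method recordHit, and the in-memory list standing in for the stats table are all assumptions.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: hand stats inserts to a thread pool so the request thread never waits on them.
class StatsRecorder {
    // 10 threads and a 5000 ms keep-alive mirror the bean values above. With an unbounded
    // queue the pool never grows past its core size, so the keep-alive is effectively unused.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            10, 10, 5000, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());

    // Stand-in for the stats table; a real task would run an INSERT here.
    private final List<String> recorded = new CopyOnWriteArrayList<>();

    // Returns immediately; the "insert" runs on a pool thread.
    void recordHit(String feedId) {
        executor.execute(() -> recorded.add(feedId));
    }

    // Stops accepting work, waits for queued inserts to finish, and returns what was recorded.
    List<String> shutdownAndDrain() {
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return recorded;
    }
}
```

A caller would invoke recordHit("some-feed") from the request path and shutdownAndDrain() at application shutdown. The unbounded queue keeps requests from blocking, at the cost of unbounded memory growth if inserts fall far behind.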
Solution: Lazy rollup
• Only today’s detailed stats need to go against real-time table
• Roll up previous days into sparse summary tables on-demand
• First access for stats for a day is slow, subsequent requests are fast
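The lazy rollup described above can be sketched as a compute-on-first-access cache. This is only an illustration under stated assumptions: the class LazyRollup, the hitsFor method, and the in-memory map standing in for the summary tables are hypothetical, and the slow scan of the detailed table is passed in as a function.

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of on-demand rollup: past days are summarized once, today is always read live.
class LazyRollup {
    // Stand-in for the sparse summary tables.
    private final Map<LocalDate, Long> summaries = new ConcurrentHashMap<>();
    // Stand-in for the expensive query against the detailed real-time table.
    private final Function<LocalDate, Long> detailScan;

    LazyRollup(Function<LocalDate, Long> detailScan) {
        this.detailScan = detailScan;
    }

    long hitsFor(LocalDate day, LocalDate today) {
        if (!day.isBefore(today)) {
            // Today's numbers are still changing, so always go to the detailed table.
            return detailScan.apply(day);
        }
        // First request for a past day pays for the scan; later requests hit the summary.
        return summaries.computeIfAbsent(day, detailScan);
    }
}
```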
Scalability Problem 3: Primary DB overload
• Mostly used master DB server for everything
• Read vs. Read/Write load didn’t matter in the beginning
• Slow inserts would block reads when using MyISAM
Solution: Balance read and read/write load
• Looked at workload
– Found where we could break up read vs. read/write
– Created Spring ExtendedDaoObjects
– Tomcat-managed DataSources
• Balanced master vs. slave load (Duh)
– Slave becomes perfect place for snapshot backups
• Watch for replication problems
– Merge table problems (later)
– Slow queries slow down replication
Example: Cacti graph of MySQL handlers
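ExtendedDaoObject
• Application code extends this class and uses getHibernateTemplate() or getReadOnlyHibernateTemplate() depending upon requirements
• Similar class for JDBC
// Imports assume the Hibernate 2.1 / Spring 1.x package names
import net.sf.hibernate.SessionFactory;
import org.springframework.orm.hibernate.HibernateTemplate;
import org.springframework.orm.hibernate.support.HibernateDaoSupport;

public class ExtendedHibernateDaoSupport extends HibernateDaoSupport {
    private HibernateTemplate readOnlyHibernateTemplate;

    // SessionFactory for read-only work (e.g. one backed by a slave DataSource)
    public void setReadOnlySessionFactory(SessionFactory sessionFactory) {
        this.readOnlyHibernateTemplate = new HibernateTemplate(sessionFactory);
        readOnlyHibernateTemplate.setFlushMode(HibernateTemplate.FLUSH_NEVER);
    }

    // Falls back to the read/write template when no read-only factory is configured
    protected HibernateTemplate getReadOnlyHibernateTemplate() {
        return (readOnlyHibernateTemplate == null) ? getHibernateTemplate() : readOnlyHibernateTemplate;
    }
}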
Scalability Problem 4: Total DB overload
• Everything slowing down
• Using DB as cache
• Database is the ‘shared’ part of all app servers
• Ran into table size limit defaults on MyISAM (4GB). We were lazy.
– Had to use Merge tables as a bridge to newer larger tables
Solution: Stop using the database
• Where possible :)
• Multi-level caching
– Local VM caching (EHCache, memory only)
– Memcached (http://www.danga.com/memcached/)
– And finally, database.
• Memcached
– Fault-tolerant, but client handles that.
– Shared nothing
– Data is transient, can be recreated
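The lookup order above (local VM cache, then memcached, then the database) can be sketched as follows. This is an assumption-laden illustration rather than the deck's implementation: plain maps stand in for EHCache and for a memcached client, the database read is passed in as a function, and the class name MultiLevelCache is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of a read-through lookup across three levels.
class MultiLevelCache {
    // Level 1: per-VM memory cache (stand-in for EHCache).
    private final Map<String, String> local = new ConcurrentHashMap<>();
    // Level 2: cache shared by all app servers (stand-in for a memcached client).
    private final Map<String, String> shared;
    // Level 3: the database, consulted only when both caches miss.
    private final Function<String, String> database;

    MultiLevelCache(Map<String, String> shared, Function<String, String> database) {
        this.shared = shared;
        this.database = database;
    }

    String get(String key) {
        String value = local.get(key);
        if (value != null) {
            return value;
        }
        value = shared.get(key);
        if (value == null) {
            value = database.apply(key);
            if (value == null) {
                return null; // nothing to cache
            }
            shared.put(key, value); // other app servers now skip the database too
        }
        local.put(key, value);
        return value;
    }
}
```

Because cached data is transient and can be recreated, losing either cache level only costs extra database reads. This sketch omits expiry and invalidation.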
Scalability Problem 5: Lazy initialization
• Our stats get rolled up on demand
– Popular feeds slowed down the whole system
• FeedCount chicklet calculation
– Every feed gets its circulation calculated at the same time
– Contention on the table
Solution: BATCH PROCESSING
• For FeedCount, we staggered the calculation
– Still would run into contention
– Stats stuff again slowed down at 1AM Chicago time.
• We now process the rolled-up data every night
– Delay showing the previous circulation in the FeedCount until roll-up is done.
• Still wasn’t enough
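The slides do not say how the FeedCount calculation was staggered. One simple possibility, shown here purely as an assumption, is to derive a per-feed start offset from the feed ID so that calculations spread evenly over a time window. The class and method names are hypothetical.

```java
// Hypothetical sketch: spread per-feed calculations across a window instead of starting them all at once.
class StaggeredSchedule {
    // Returns how many minutes after the window opens this feed's calculation should start.
    static int offsetMinutes(long feedId, int windowMinutes) {
        // floorMod keeps the result in [0, windowMinutes) even for negative IDs.
        return (int) Math.floorMod(feedId, (long) windowMinutes);
    }
}
```

As the slide notes, staggering alone still ran into contention, which is why the nightly batch rollup replaced it.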
Scalability Problem 6: Stats writes, again
• Too much writing to master DB
• More and more data stored associated with each feed
• More stats tracking
– Ad Stats
– Item Stats
– Circulation Stats
Solution: Merge Tables
• After the nightly rollup, we truncate the subtable from 2 days ago
• Gotcha with truncating a subtable:
– FLUSH TABLES; TRUNCATE TABLE ad_stats0;
– Could succeed on master, but fail on slave
• The right way to truncate a subtable:
– ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats1,ad_stats2);
– TRUNCATE TABLE ad_stats0;
– ALTER TABLE ad_stats TYPE=MERGE UNION=(ad_stats0,ad_stats1,ad_stats2);
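The three-step sequence above (detach the subtable, truncate it, reattach it) can be generated for any subtable. The helper below is a hypothetical sketch that only builds the SQL strings. It assumes subtables are named by appending an index to the merge table name, as in the ad_stats example, and it keeps the MySQL 4.1-era TYPE=MERGE syntax used on the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the detach / truncate / reattach statements for one subtable.
class MergeTableRotation {
    static List<String> truncateStatements(String mergeTable, int subtableCount, int truncateIndex) {
        List<String> all = new ArrayList<>();
        List<String> others = new ArrayList<>();
        for (int i = 0; i < subtableCount; i++) {
            String name = mergeTable + i;
            all.add(name);
            if (i != truncateIndex) {
                others.add(name);
            }
        }
        List<String> sql = new ArrayList<>();
        // 1. Redefine the merge table without the subtable that is about to be emptied.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", others) + ")");
        // 2. Truncate the now-detached subtable.
        sql.add("TRUNCATE TABLE " + mergeTable + truncateIndex);
        // 3. Put the emptied subtable back into the union.
        sql.add("ALTER TABLE " + mergeTable + " TYPE=MERGE UNION=(" + String.join(",", all) + ")");
        return sql;
    }
}
```

Calling truncateStatements("ad_stats", 3, 0) yields the same three statements shown on the slide, without the trailing semicolons.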
Solution: Horizontal Partitioning
• Constantly identifying hot spots in the database
– Ad serving
– Flare serving
– Circulation (constant writes, occasional reads)
• Move hottest tables/queries off to own clusters
– Hibernate and certain lazy patterns allow this
– Keeps the driving tables from slowing down
Scalability Problem 7: Master DB Failure
• Still using just a primary and slave
• Master crash: Single point of failure
• No easy way to promote a slave to a master
Solution: No easy answer
• Still using auto_increment
– Multi-master replication is out
• Tried DRBD + HeartBeat
– Disk is replicated block-by-block
– Hot primary, cold secondary
• Didn’t work as we hoped
– Myisamchk takes too long after failure
– I/O + CPU overhead
• InnoDB is supposedly better
Our multi-master solution
• Low-volume master cluster
– Uses DRBD + HeartBeat
– Works well under smaller load
– Does mapping to feed data clusters
• Feed Data Cluster
– Standard Master + Slave(s) structure
– Can be added as needed
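The role of the low-volume master cluster, mapping each feed to the data cluster that holds it, can be pictured with the sketch below. Everything here is an assumption for illustration: the slides do not describe the mapping schema, so an in-memory map stands in for the mapping table, and the class name, method names, and cluster names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a small directory that says which feed data cluster owns a feed.
class ClusterDirectory {
    // Stand-in for the mapping table kept on the low-volume master cluster.
    private final Map<Long, String> feedToCluster = new ConcurrentHashMap<>();

    // Record that a feed lives on the named data cluster.
    void assign(long feedId, String clusterName) {
        feedToCluster.put(feedId, clusterName);
    }

    // Look up the cluster before issuing any feed-data query.
    String clusterFor(long feedId) {
        String cluster = feedToCluster.get(feedId);
        if (cluster == null) {
            throw new IllegalStateException("No cluster mapped for feed " + feedId);
        }
        return cluster;
    }
}
```

Because only the small directory needs the DRBD + HeartBeat setup, new master/slave data clusters can be added by assigning new feeds to them.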
Mapping / Marshalling Database Cluster
Scalability Problem 8: Power Failure
• Chicago has ‘questionable’ infrastructure.
• Battery backup, generators can be problematic
• Colo techs have been known to hit the Big Red Switch
• Needed a disaster recovery/secondary site
– Active/Active not possible for us. Yet.
– Would have to keep fast connection to redundant site
– Would require 100% of current hardware, but would lie quiet
Code Name: Panic App
• Product Name: Feed Insurance
• Elegant, simple solution
• Not Java (sorry)
• Perl-based feed fetcher
– Downloads copies of feeds, saved as flat XML files
– Synchronized out to local and remote servers
– Special rules for click tracking, dynamic GIFs, etc
General guidelines
• Know your DB workload
– Cacti really helps with this
• ‘EXPLAIN’ all of your queries
– Helps keep crushing queries out of the system
• Cache everything that you can
• Profile your code
– Usually only needed on hard-to-find leaks
Our settings / what we use
• Don’t always need the latest and greatest
– Hibernate 2.1
– Spring
– DBCP
– MySQL 4.1
– Tomcat 5.0.x
• Let the container manage DataSources
JDBC
• Hibernate/iBatis/Name-Your-ORM-Here
– Use ORM when appropriate
– Watch the queries that your ORM generates
– Don't be afraid to drop to JDBC
• Driver parameters we use:
# For Internationalization of Ads, multi-byte characters in general
useUnicode=true
characterEncoding=UTF-8
# Biggest performance bits
cacheServerConfiguration=true
useLocalSessionState=true
# Some other settings that we've needed as things have evolved
useServerPrepStmts=false
jdbcCompliantTruncation=false
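The driver parameters above are MySQL Connector/J connection properties, which are commonly passed as a query string on the JDBC URL. The sketch below assembles such a URL. The class name, host, and database name are placeholders, not values from the talk.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Connector/J URL carrying the driver parameters listed above.
class FeedJdbcUrl {
    static String build(String host, String database) {
        // LinkedHashMap keeps the parameters in the order they appear on the slide.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useUnicode", "true");
        params.put("characterEncoding", "UTF-8");
        params.put("cacheServerConfiguration", "true");
        params.put("useLocalSessionState", "true");
        params.put("useServerPrepStmts", "false");
        params.put("jdbcCompliantTruncation", "false");

        String query = params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return "jdbc:mysql://" + host + "/" + database + "?" + query;
    }
}
```

With container-managed DataSources, as recommended earlier, the resulting string would go into the DataSource's URL setting rather than into application code. Inside an XML configuration file the ampersands must be escaped as &amp;amp;.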
Thank You
Questions?