SerBenfiquista.com Drupal Performance Case Study, DrupalCamp Lisbon 2011
DESCRIPTION
With more than 6 million pageviews each month and listed by alexa.com as one of the 200 most visited websites in Portugal, SerBenfiquista.com is probably the most popular website in Portugal powered by Drupal. It has been built and maintained since 2001 by a community of fans with scarce resources for running it, so the site's performance must be planned carefully to tolerate usage peaks that can reach about 2000 online users on match days. This session details every step we took during the migration to a Drupal architecture, starting with design and architecture and moving on to the configuration of several cache levels and to database and server tuning. Pressflow, AuthCache, Memcached, nginx, APC and lots of imagination are active stakeholders and will star in the credits. Unfortunately not everything has the same glory as Benfica, so it is imperative to talk about the problems we have had since the opening date, how little details can become critical when exposed to an unexpected traffic load, and what we learnt from our experience to prevent them.
TRANSCRIPT
Drupal Performance - Case Study
Hernâni Borges de Freitas
@hernanibf DrupalCamp Lisboa 2011
Biggest online fan community about Benfica. About 10 years old. Run as a hobby by:
▪ Staff of around 20 people
▪ Members (~31 000 users, 8000 active members)
Articles, blog aggregation, image gallery, forum, press aggregation, matches and players’ profiles.
January 2011 7M Pageviews total ▪ 225k / Day
590,848 Visits ▪ 19k Visits / Day ▪ ~8k Unique Visits / Day
12 pages per Session (!) 5500 forum messages per day. 50 blogs aggregated.
According to Alexa.com (Feb 2011): 170th most visited website in Portugal.
Among the top 50 Portuguese websites. Most popular Portuguese website made in DRUPAL!
Technology was slowing us down: custom-designed CMS with 10 years of legacy code and cache control based on Smarty.
The site must be run by the community: the workflow and revision process was weak and unsafe.
Development took too much time: hard to implement new features or change existing ones.
Performance problems at traffic peaks: around 1700 users at the same time.
Dedicated server: quad Xeon 2.4 GHz, 4 GB memory. Lighttpd/Apache as app server. PHP with eAccelerator.
Most pageviews come from registered users. Most pageviews are generated by the forum (based on Simple Machines Forum).
Cache control was done with Smarty and the forum's cache system.
Development started in August 2010. Done in spare time by 1 Drupalista. We went live on 24 October 2010 (2 months of dev...)
Website redone from scratch. Data migration done using custom scripts.
60k nodes, 30k users, 2.5k terms, 16 content types, 100 modules
Portuguese/English, Web/Mobile site
Optimize using iterative improvements (Progressive Doping)
1. Architecture
2. Profiling
3. Caching
4. Application diet
5. WebServer
6. Drupal Tips and Tricks
7. Future Ideas
Use Pressflow! High-speed Drupal fork
▪ Optimized for PHP 5 and MySQL (no wrapper functions)
▪ Designed for database replication and reverse proxies (Squid / Varnish)
▪ Optimized session and path handling
▪ No anonymous sessions (lazy session creation)
▪ Fast path alias detection: avoids tons of calls to drupal_lookup_path.
Use the Devel module. Use Xdebug profiling. Identify the heaviest pages, functions and queries.
Start with the most visited pages. Try to identify which functions take the most time.
You'll find an unpleasant detail: the Drupal bootstrap on a normal site can be slow.
Great for understanding how Drupal core works! Great for measuring cost vs importance.
Page Cache
• Only for Anonymous Users
Content Cache
• Block Cache • Filter Cache • Modules Cache
Function Cache
• Static variables already fetched from db
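The "function cache" level above can be sketched as plain PHP memoization. This is only an illustration of the pattern, not Drupal core code: fetch_term_name_from_db() and get_term_name() are hypothetical names, and the "database" is a stub that counts how often it is hit.

```php
<?php
// Hypothetical stand-in for an expensive database query; counts its calls.
function fetch_term_name_from_db($tid, &$query_count) {
  $query_count++;
  return 'term-' . $tid;
}

// Keep values already fetched in a static variable for the rest of the request.
function get_term_name($tid, &$query_count) {
  static $names = array();
  if (!isset($names[$tid])) {
    $names[$tid] = fetch_term_name_from_db($tid, $query_count);
  }
  return $names[$tid];
}

$queries = 0;
get_term_name(7, $queries);
get_term_name(7, $queries); // served from the static variable
get_term_name(9, $queries);
echo $queries, "\n"; // 2
```

The static array lives only for the current request, so it needs no invalidation logic.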
Generic Drupal cache handling
If the user is anonymous and the cache system is active: ▪ On bootstrap, check whether we have a valid cached page for that URL and deliver it without loading all modules, rendering all regions... FAST!
Blocks and some content are also cached, and can be served to authenticated users.
Cached content is stored in database tables. Tables are flushed when:
▪ Nodes and comments are posted
▪ Cron runs
▪ Explicit calls to cache_clear_all are made
However, most of our traffic is authenticated, so we can't use Drupal's base page cache. There's a module for that! => authcache
AuthCache: on login, register cookie variables such as roles, login name and login profile.
In the early page cache bootstrap phase, check whether there is a cached version of that page for the roles the user belongs to.
If there isn't, do a full bootstrap, render the page and save the rendered version in the cache for future use.
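The lookup described above can be sketched in a few lines of standalone PHP. This is not Authcache's actual code: the array cache, page_cache_key(), render_page() and serve_page() are all hypothetical names used only to show a page cache keyed by path plus role set.

```php
<?php
// In-memory array standing in for the cache backend (assumption).
$page_cache = array();
$render_count = 0;

// Build the cache key from the path and the user's roles, sorted so that
// role order does not matter.
function page_cache_key($path, array $roles) {
  sort($roles);
  return $path . '|' . implode(',', $roles);
}

// Hypothetical stand-in for a full bootstrap plus page render.
function render_page($path, &$render_count) {
  $render_count++;
  return '<html>page ' . $path . '</html>';
}

function serve_page($path, array $roles, array &$page_cache, &$render_count) {
  $key = page_cache_key($path, $roles);
  if (isset($page_cache[$key])) {
    return $page_cache[$key]; // early hit: nothing is rendered
  }
  $html = render_page($path, $render_count);
  $page_cache[$key] = $html; // save for the next user with the same roles
  return $html;
}

serve_page('forum', array('member'), $page_cache, $render_count);
serve_page('forum', array('member'), $page_cache, $render_count);
serve_page('forum', array('member', 'donor'), $page_cache, $render_count);
serve_page('forum', array('donor', 'member'), $page_cache, $render_count);
echo $render_count, "\n"; // 2
```

Four requests cost two renders: one per distinct role set, regardless of how many users share it.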
Include a setting in settings.php:
$conf['cache_inc'] = './sites/all/modules/authcache/authcache.inc';
Configure the roles and pages to cache. We are not caching anything for editors/moderators, nor any page in the admin section or in content editing.
Be careful with AJAX stuff...
Small problem: every page looks the same to everyone.
We want to customize the header with a pleasant message.
Authcache's recommendation is to do page replacements using AJAX calls => more HTTP calls.
To avoid the extra HTTP traffic I tweaked the authcache module to do a str_replace on a certain zone, and started storing cached pages without gzip compression.
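A minimal sketch of the str_replace idea, assuming the cached page carries a marker where the greeting goes. The marker string and the personalize() function are hypothetical; the actual tweak to the authcache module is not shown in the slides. Storing the page uncompressed matters here because the replacement works on the raw HTML.

```php
<?php
// Hypothetical marker written into the page before it is cached.
define('GREETING_PLACEHOLDER', '<!--user-greeting-->');

// One cached copy shared by every user with the same roles, stored uncompressed.
$cached_html = '<div id="header">' . GREETING_PLACEHOLDER . '</div><div>Forum</div>';

// Swap the marker for a per-user greeting just before the page is sent.
function personalize($cached_html, $user_name) {
  $greeting = 'Welcome, ' . htmlspecialchars($user_name, ENT_QUOTES, 'UTF-8');
  return str_replace(GREETING_PLACEHOLDER, $greeting, $cached_html);
}

echo personalize($cached_html, 'ana'), "\n";
// <div id="header">Welcome, ana</div><div>Forum</div>
```

The user name is escaped before insertion, since it ends up inside HTML that bypasses the normal theme layer.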
MySQL is not designed to be used as a cache backend.
Fortunately Drupal allows pluggable cache backends.
We started with Cache Router using memcached, with one bin for most cache tables. We ended up using the memcache module because of several crashes and white screens.
Install memcached and the memcache PECL extension.
$conf = array(
  // One memcached instance per bin.
  'memcache_servers' => array(
    'localhost:11211' => 'default',
    'localhost:11212' => 'cache_block',
    'localhost:11213' => 'cache_page',
    'localhost:11214' => 'users',
    'localhost:11215' => 'session',
  ),
  'memcache_bins' => array(
    'cache' => 'default',
    'cache_block' => 'cache_block',
    'cache_page' => 'cache_page',
    'users' => 'users',
    'session' => 'session',
  ),
  'memcache_key_prefix' => 'sb10prod',
);
$conf['cache_inc'] = './sites/all/modules/authcache/authcache.inc';
On popular pages use a custom cache!
We store it in cache_block so that the block is refreshed when new content arrives.
// Serve the rendered block from cache_block when it is there.
if ($content = cache_get('artigos:home:cronicas', 'cache_block')) {
  $block->content = $content->data;
}
else {
  // Cache miss: render the view once and store the result.
  $view = views_get_view('artigos');
  $view->set_display('panel_pane_1');
  $block->content = $view->render();
  $block->content .= '<span style="clear:both;float:right">' . l(t("View all"), 'cronicas') . '</span>';
  cache_set('artigos:home:cronicas', $block->content, 'cache_block');
}
The sessions table was heavily used. A typical query:
UPDATE sessions SET uid = 1, cache = 0, hostname = '10.1.1.2', session = '', timestamp = 1243567406 WHERE sid = '74db42a7f35d1d54fc6b274ce840736e'
We replaced it with the memcache session module:
$conf['session_inc'] = './sites/all/modules/memcache/memcache-session.inc';
Serious drop in server load. Pressflow already does not store sessions for anonymous users => the non_anon module does the same.
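The "no sessions for anonymous users" idea can be illustrated with a small standalone sketch. It is not Pressflow's implementation: the array store and session_write_lazy() are hypothetical, and the only rule shown is "skip the write when an anonymous visitor has nothing in the session".

```php
<?php
// Array standing in for the sessions store (assumption).
$session_store = array();

// Persist a session only when there is something worth keeping.
function session_write_lazy(array &$store, $sid, $uid, array $data) {
  if ($uid == 0 && empty($data)) {
    return FALSE; // anonymous visitor with an empty session: skip the write
  }
  $store[$sid] = array('uid' => $uid, 'data' => $data);
  return TRUE;
}

session_write_lazy($session_store, 'aaa', 0, array());               // skipped
session_write_lazy($session_store, 'bbb', 42, array());              // stored
session_write_lazy($session_store, 'ccc', 0, array('lang' => 'pt')); // stored
echo count($session_store), "\n"; // 2
```

On a site where a large share of hits come from crawlers and anonymous readers, every skipped write is one less query against the session backend.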
In forum pages, load only what is needed.
require_once './includes/bootstrap.inc';
// Bootstrap only up to path handling instead of a full bootstrap.
drupal_bootstrap(DRUPAL_BOOTSTRAP_PATH);
$arg0 = arg(0);
if ($arg0 == 'forum') {
  require_once './includes/common.inc';
  drupal_load('module', 'serbenfiquista');
  drupal_load('module', 'filter');
  drupal_load('module', 'locale');
  require_once './includes/theme.inc';
  $content = render_forum();
  /* do some load of modules when content not cached, vars will be available at theme …. */
  require_once './sites/default/themes/serbenfiquista/page.tpl.php';
}
Remember: we were redirecting only non-static content to Apache.
What about aggregated CSS/JS files and imagecache files? => They were going to Apache too...
$HTTP["url"] !~ "\.(js|css|png|gif|jpg|ico|txt|swf)$" {
  proxy.server = ( "" => ( ( "host" => "localhost", "port" => 81 ) ) )
}
07/Nov/2010 – Things went bad
Apache was handling too many connections. We were running out of memory, with no more connections available...
After that nightmare of a night we decided to switch to nginx.
Using PHP in php-fpm mode. Configuration based on perusio's examples: https://github.com/perusio/drupal-with-nginx
Using APC 3.1.7 as opcode cache; SMF also uses it for shared in-memory keys.
apc.enabled = 1
apc.shm_segments = 1
apc.shm_size = 300
apc.max_file_size = 100M
; do not check whether files were changed
apc.stat = 0
Don't forget to use CSS and JS aggregation to reduce HTTP connections.
Index customization (use EXPLAIN to understand queries):
select title, created, nid from node use index (node_status_type) where status = 1 and type = 'usernews' order by created desc limit 0,4
Run cron twice a day:
0 */12 * * * cd /var/www/html/ && drush core-cron >> /var/log/crondrupal
Do not use Drupal cron to import feeds; refresh them from their own crontab entry:
*/10 * * * * cd /var/www/html/ && drush feeds-refresh blogs >> /var/log/cronblogs
Use Apache Solr to index your content.
We have load peaks when some content is changed: most cached content is erased.
Control in detail what is cached and expire only what is invalid.
Pre-cache most page details. Use Cache Actions and Rules to clear specific views/blocks/panes. When the page is regenerated its components are already rendered.
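"Expire only what is invalid" can be sketched as a prefix-based clear over cache IDs. The snippet is a standalone illustration with an array as the cache: cache_clear_prefix() is a hypothetical helper, and every cache ID except 'artigos:home:cronicas' is invented for the example. In Drupal 6 the closest built-in is the wildcard argument of cache_clear_all().

```php
<?php
// Array standing in for one cache bin; only the first ID comes from the slides.
$cache = array(
  'artigos:home:cronicas' => 'rendered block A',
  'artigos:home:noticias' => 'rendered block B',
  'forum:recent' => 'rendered block C',
);

// Remove only the entries whose cache ID starts with the given prefix.
function cache_clear_prefix(array &$cache, $prefix) {
  $removed = 0;
  foreach (array_keys($cache) as $cid) {
    if (strpos($cid, $prefix) === 0) {
      unset($cache[$cid]);
      $removed++;
    }
  }
  return $removed;
}

// A new article arrives: expire the article blocks, keep the forum block warm.
$removed = cache_clear_prefix($cache, 'artigos:');
echo $removed, "\n"; // 2
```

Naming cache IDs with a consistent prefix per content area is what makes this kind of selective expiry possible.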
SerBenfiquista.com by: Alberto Rodrigues, André Garcia, André Sabino, André Marques, António Alves, António Costa, Bernardo Azevedo, Diogo Sarmento, Élvio da Silva, Filipe Varela, Francisco Castro, Hugo Manita, Hernâni Freitas, João Pessoa Jorge, João Cunha, João Mariz, José Barata, Isabel Cutileiro, Luis Correia, Miguel Ferreira, Nelson Vidal, Nuno Silva, Paulo Paulos, Pedro Lança, Pedro Neto, Rafael Santos, Rodrigo Marques, Ricardo Martins, Ricardo Solnado, Valter Gouveia
and plenty of others!