Challenges when building high profile editorial sites

BUILDING HIGH PROFILE EDITORIAL SITES, YANN MALET, 2014.DJANGOCON.EU, MAY 2014

Uploaded by: yann-malet

Posted on 05-Jul-2015

DESCRIPTION

This talk is a walk through the challenges encountered when building high profile editorial sites. My goal is to present some of the common pitfalls we have encountered at Lincoln Loop and to explain how we solved them:

* Legacy migration always takes longer
* Devops
* Multiple environments
* Easy deployment
* Responsive design impacts the backend
* Journey of an image
* Picturefill.js
* Danger of reusing published Django applications
* Caching strategy
* HTML fragments
* Varnish

Audience:

* Decision makers who are going to rebuild their magazine
* Developers bidding for this kind of project for the first time

TRANSCRIPT

Page 1: Challenges when building high profile editorial sites

BUILDING HIGH PROFILE EDITORIAL SITES

YANN MALET, 2014.DJANGOCON.EU

MAY 2014

Page 2: Challenges when building high profile editorial sites

ABOUT THIS TALK

● It follows my previous talks:

− Data Herding: How to Shepherd Your Flock Through Valleys of Darkness (2010)

− Breaking down the process of building a custom CMS (2010)

− Stop Tilting at Windmills - Spotting Bottlenecks (2011)

Page 3: Challenges when building high profile editorial sites

AGENDA

● Foreword

● Multi-layer cache to protect your database

● Image management on a responsive site

● Devops

Page 4: Challenges when building high profile editorial sites

HIGH PERFORMANCE

Django is web scale...

Page 5: Challenges when building high profile editorial sites

… AS ANYTHING ELSE ...

Page 6: Challenges when building high profile editorial sites

AGENDA

● Foreword

● Multi-layer cache to protect your database

● Image management on a responsive site

● Devops

Page 7: Challenges when building high profile editorial sites

VARNISH CACHE

Page 8: Challenges when building high profile editorial sites

VARNISH

● Varnish Cache is a web application accelerator

− aka caching HTTP reverse proxy

− 10 – 1000 times faster

● This is hard stuff: don't try to reinvent this wheel

Page 9: Challenges when building high profile editorial sites

VARNISH: TIPS AND TRICKS

● Strip cookies

● Saint Mode

● Custom error page instead of the default "Guru Meditation" page

Page 10: Challenges when building high profile editorial sites

STRIP COOKIES

● Increasing the hit rate is all about reducing

− the number of variants created by the Vary: header, e.g. on

● Accept-Language

● Cookie

Page 11: Challenges when building high profile editorial sites

STRIP COOKIES

sub vcl_recv {
    # unless sessionid/csrftoken is in the request,
    # don't pass ANY cookies (referral_source, utm, etc)
    if (req.request == "GET" &&
        (req.url ~ "^/static" ||
         (req.http.cookie !~ "sessionid" && req.http.cookie !~ "csrftoken"))) {
        remove req.http.Cookie;
    }
    ...
}

sub vcl_fetch {
    # pass through for anything with a session/csrftoken set
    if (beresp.http.set-cookie ~ "sessionid" || beresp.http.set-cookie ~ "csrftoken") {
        return (pass);
    } else {
        return (deliver);
    }
    ...
}
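For reasoning about, or unit-testing, the vcl_recv rule above, it can help to restate it as a plain predicate. The Python function below is a hypothetical mirror of that rule, not something Varnish runs; `should_strip_cookies` is an invented name, and the substring checks stand in for VCL's unanchored regex matches.

```python
from typing import Optional


def should_strip_cookies(method: str, url: str, cookie: Optional[str]) -> bool:
    """Hypothetical mirror of the vcl_recv rule: drop the Cookie header on
    GET requests for static files, or when neither Django's sessionid nor
    its csrftoken cookie is present."""
    if method != "GET":
        return False  # only GET requests are considered for caching
    if url.startswith("/static"):  # same as req.url ~ "^/static"
        return True
    cookie = cookie or ""
    # same as: req.http.cookie !~ "sessionid" && req.http.cookie !~ "csrftoken"
    return "sessionid" not in cookie and "csrftoken" not in cookie
```

With this rule, anonymous requests that only carry tracking cookies become cacheable, while logged-in users (who have a sessionid) still reach Django.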

Page 12: Challenges when building high profile editorial sites

VARNISH: SAINT MODE

● Varnish Saint Mode lets you serve stale content from cache, even when your backend servers are unavailable.

− http://lincolnloop.com/blog/varnish-saint-mode/

Page 13: Challenges when building high profile editorial sites

VARNISH: SAINT MODE 1/2

# /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "8000";
    .saintmode_threshold = 0;
    .probe = {
        .url = "/";
        .interval = 1s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    if (req.backend.healthy) {
        set req.grace = 1h;
        set req.ttl = 5s;
    } else {
        # Accept serving stale object (extend TTL by 6h)
        set req.grace = 6h;
    }
}

Page 14: Challenges when building high profile editorial sites

VARNISH: SAINT MODE 2/2

sub vcl_fetch {
    # keep all objects for 6h beyond their TTL
    set beresp.grace = 6h;

    # If we fetch a 500, 502 or 503, serve stale content instead (saint mode)
    if (beresp.status == 500 || beresp.status == 502 || beresp.status == 503) {
        set beresp.saintmode = 30s;
        return(restart);
    }
}

Page 15: Challenges when building high profile editorial sites

VARNISH: SAINT MODE

.url: Format the default request with this URL.

.timeout: How fast the probe must finish, you must specify a time unit with the number, such as “0.1 s”, “1230 ms” or even “1 h”.

.interval: How long to wait between polls; you must specify a time unit here too. Note that this is an interval, not a rate. The lowest poll rate is (.timeout + .interval).

.window: How many of the latest polls to consider when determining if the backend is healthy.

.threshold: How many of the .window last polls must be good for the backend to be declared healthy.
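The interplay of .window and .threshold is easiest to see in a toy model. The Python class below is a hypothetical illustration (the name `ProbeWindow` is made up), not Varnish code, and it ignores the .initial parameter Varnish uses at startup.

```python
from collections import deque


class ProbeWindow:
    """Toy model of a health probe's .window/.threshold logic: the backend
    counts as healthy when at least `threshold` of the last `window` polls
    succeeded. Varnish's startup handling (.initial) is not modelled."""

    def __init__(self, window: int = 5, threshold: int = 3) -> None:
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest poll result
        self.polls = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self.polls.append(ok)

    @property
    def healthy(self) -> bool:
        return sum(self.polls) >= self.threshold


# Same values as the backend definition: .window = 5, .threshold = 3
probe = ProbeWindow(window=5, threshold=3)
for result in [True, True, True, False, False]:
    probe.record(result)
print(probe.healthy)  # 3 good polls out of the last 5 -> True
probe.record(False)   # window is now [True, True, False, False, False]
print(probe.healthy)  # 2 good polls out of the last 5 -> False
```

With .interval = 1s and .timeout = 1s, three consecutive failed polls are enough to flip a fully healthy backend to sick, which is when the 6h grace in vcl_recv kicks in.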

Page 16: Challenges when building high profile editorial sites

VARNISH: CUSTOM ERROR PAGE

# std.fileread() needs "import std;" at the top of the VCL file
sub vcl_error {
    ...
    # Otherwise, return the custom error page
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic std.fileread("/var/www/example_com/varnish_error.html");
    return(deliver);
}

● Use a nicely formatted error page instead of the default white "Guru Meditation" page

Page 17: Challenges when building high profile editorial sites

CACHING STRATEGY IN YOUR APP

Page 18: Challenges when building high profile editorial sites

INEVITABLE QUOTE

“THERE ARE ONLY TWO HARD THINGS IN COMPUTER SCIENCE: CACHE INVALIDATION AND NAMING THINGS, AND OFF-BY-ONE ERRORS.”

– PHIL KARLTON

Page 19: Challenges when building high profile editorial sites

CACHING STRATEGY

● Russian doll caching

● Randomize your cache invalidation for the HTML cache

● Cache buster URL for your HTML cache

● Cache database queries

● More resilient cache backend

Page 20: Challenges when building high profile editorial sites

RUSSIAN DOLL CACHING

● Nested cache with increasing TTL as you walk down

{% cache MIDDLE_TTL "article_list" request.GET.page last_article.id last_article.last_modified %}
    {% include "includes/article/list_header.html" %}
    <div class="article-list">
    {% for article in article_list %}
        {% cache LONG_TTL "article_list_teaser_" article.id article.last_modified %}
            {% include "includes/article/article_teaser.html" %}
        {% endcache %}
    {% endfor %}
    </div>
{% endcache %}

Page 21: Challenges when building high profile editorial sites

RUSSIAN DOLL CACHING

It gets faster as traffic increases: the more requests you serve, the more nested fragments are already warm
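The nesting pays off because an edit only invalidates the fragments that depend on it. The Python sketch below models the template above with a plain dict standing in for memcached (TTLs omitted). It assumes `last_article` is the most recently modified article in the list; the key format is simplified (Django's {% cache %} tag hashes the vary-on values), and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    id: int
    last_modified: datetime


def fragment_key(prefix, *vary_on):
    # Simplified: Django's {% cache %} tag hashes the vary-on values instead
    return prefix + ":" + ":".join(str(v) for v in vary_on)


def render_article_list(cache, articles, page, render_teaser):
    """Outer fragment keyed on the page and the most recently modified
    article; inner fragments keyed on each article's id and timestamp."""
    last = max(articles, key=lambda a: a.last_modified)
    outer_key = fragment_key("article_list", page, last.id, last.last_modified)
    if outer_key in cache:
        return cache[outer_key]  # hot path: a single cache lookup
    teasers = []
    for article in articles:
        inner_key = fragment_key("article_list_teaser_", article.id, article.last_modified)
        if inner_key not in cache:
            cache[inner_key] = render_teaser(article)
        teasers.append(cache[inner_key])
    html = '<div class="article-list">' + "".join(teasers) + "</div>"
    cache[outer_key] = html
    return html
```

Editing one article changes the outer key (it becomes the most recently modified article) and its own teaser key, so only that teaser is re-rendered while its siblings come straight from the cache.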

Page 22: Challenges when building high profile editorial sites

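from random import randint

from django.template import TemplateSyntaxError

# Inside the forked {% cache %} tag's render(), once expire_time is resolved:
try:
    expire_time = int(expire_time)
    # randint() needs integer bounds, so cast the +/- 20% limits explicitly
    expire_time = randint(int(expire_time * 0.8), int(expire_time * 1.2))
except (ValueError, TypeError):
    raise TemplateSyntaxError(
        '"cache" tag got a non-integer timeout value: %r' % expire_time)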

RANDOMIZED CACHE TTL

● Do not let all the fragments sharing an `X_TTL` expire at the same time

− Modify the cache templatetag: TTL +/- 20%

● Fork the {% cache … %} templatetag
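A minimal sketch of the +/- 20% jitter as a standalone helper; `jittered_ttl` is a hypothetical name, not part of Django. The bounds are cast to int because random.randint() expects integer arguments, which is easy to miss when multiplying a TTL by 0.8 and 1.2.

```python
import random
from typing import Optional


def jittered_ttl(expire_time: int, spread: float = 0.2,
                 rng: Optional[random.Random] = None) -> int:
    """Return a TTL drawn uniformly from
    [expire_time * (1 - spread), expire_time * (1 + spread)], in whole
    seconds, so fragments cached at the same moment expire at different times."""
    rng = rng or random.Random()
    low = int(expire_time * (1 - spread))
    high = int(expire_time * (1 + spread))
    return rng.randint(low, high)
```

For a 600-second MIDDLE_TTL, fragments cached together now expire anywhere between 480 and 720 seconds later, spreading the regeneration load over four minutes instead of one instant.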

Page 23: Challenges when building high profile editorial sites

CENTRAL TTL DEFINITION

● Context processor to set TTL

− SHORT_TTL

− MIDDLE_TTL

− LONG_TTL

− FOR_EVER_TTL (* not really)
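A context processor is just a function that takes the request and returns a dict that gets merged into every RequestContext. A minimal sketch, assuming a hypothetical module such as myproject/context_processors.py and illustrative TTL values; FOR_EVER_TTL is capped at 30 days because memcached interprets expiration times longer than 30 days as absolute Unix timestamps.

```python
# Illustrative values in seconds; tune them for your own site.
SHORT_TTL = 60
MIDDLE_TTL = 60 * 15
LONG_TTL = 60 * 60 * 6
FOR_EVER_TTL = 60 * 60 * 24 * 30  # "for ever", not really: 30 days


def cache_ttls(request):
    """Context processor exposing the central TTL definitions to templates."""
    return {
        "SHORT_TTL": SHORT_TTL,
        "MIDDLE_TTL": MIDDLE_TTL,
        "LONG_TTL": LONG_TTL,
        "FOR_EVER_TTL": FOR_EVER_TTL,
    }
```

Register it in TEMPLATE_CONTEXT_PROCESSORS (Django 1.7 and earlier) or in TEMPLATES[...]['OPTIONS']['context_processors'] (Django 1.8+), and templates can then write {% cache MIDDLE_TTL ... %} without hard-coding numbers.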

Page 24: Challenges when building high profile editorial sites

RESILIENT CACHE BACKEND

● Surviving memcached node outages is not built in

− Wrap the Django cache backend in try / except

− You might also want to report it in New Relic

● Fork Django cache backend
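The try/except wrapper can be sketched independently of any particular client. The class below is a hypothetical illustration, not Django's cache backend API: it wraps any object exposing get/set/delete, turns failures into cache misses, and offers an `on_error` hook where you might report to New Relic. In a real backend you would catch your memcached client's specific exception classes rather than Exception.

```python
import logging

logger = logging.getLogger(__name__)


class ResilientCache:
    """Wrap a cache client so that a dead cache node degrades to a cache
    miss instead of a 500. `backend` is any object exposing
    get(key, default), set(key, value, timeout) and delete(key)."""

    def __init__(self, backend, on_error=None):
        self._backend = backend
        # Reporting hook; defaults to a log warning
        self._on_error = on_error or (lambda exc: logger.warning("cache error: %r", exc))

    def get(self, key, default=None):
        try:
            return self._backend.get(key, default)
        except Exception as exc:  # narrow this to your client's error classes
            self._on_error(exc)
            return default  # behave like a cache miss

    def set(self, key, value, timeout=None):
        try:
            self._backend.set(key, value, timeout)
            return True
        except Exception as exc:
            self._on_error(exc)
            return False

    def delete(self, key):
        try:
            self._backend.delete(key)
            return True
        except Exception as exc:
            self._on_error(exc)
            return False
```

The design choice is that a cache outage costs you speed (every request goes to the database) rather than availability.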

Page 25: Challenges when building high profile editorial sites

CACHE BUSTER URL

● http://example.com/*/?PURGE_CACHE_HTML

● This URL

− traverses your stack

− purges the HTML cache fragment

− generates a fresh one

● Fork the {% cache … %} templatetag
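The logic a forked {% cache %} tag needs for the cache buster URL fits in one function. A minimal sketch, assuming the tag can see the request's query parameters and a cache object with get/set; `cached_fragment` and `PURGE_PARAM` are hypothetical names.

```python
PURGE_PARAM = "PURGE_CACHE_HTML"


def cached_fragment(cache, key, render, query_params, timeout=None):
    """Return the cached HTML fragment for `key`, rendering it on a miss.

    When PURGE_PARAM appears in the query string, the cached copy is
    ignored: the fragment is re-rendered and overwrites the stored one.
    """
    if PURGE_PARAM not in query_params:
        html = cache.get(key)
        if html is not None:
            return html
    html = render()
    cache.set(key, html, timeout)
    return html
```

Because Varnish's default hash includes the full URL, the query string makes Varnish treat the purge URL as a separate object, so unless that exact URL is already cached it is forwarded to Django. You would probably want to restrict who can trigger it.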

Page 26: Challenges when building high profile editorial sites

# johnny/cache.py
def enable():
    """Enable johnny-cache, for use in scripts, management commands, async
    workers, or other code outside the Django request flow."""
    get_backend().patch()

CACHING DB QUERIES

● Johnny cache

− It is a middleware, so there are surprising side effects

− If you change the DB outside the request/response cycle, call enable() first

Page 27: Challenges when building high profile editorial sites

MULTIPLE CACHE BACKENDS

CACHES = {
    'default': {
        'BACKEND': 'myproject.apps.core.backends.cache.PyLibMCCache',
        'OPTIONS': cache_opts,
        'VERSION': 1},
    'html': {
        'BACKEND': 'myproject.apps.core.backends.cache.PyLibMCCache',
        'TEMPLATETAG_CACHE': True,
        'VERSION': 1},
    'session': {
        'BACKEND': 'myproject.apps.core.backends.cache.PyLibMCCache',
        'VERSION': 1,
        'OPTIONS': cache_opts},
    'johnny': {
        'BACKEND': 'myproject.apps.core.backends.cache.JohnnyPyLibMCCache',
        'JOHNNY_CACHE': True,
        'VERSION': 1},
}

Page 28: Challenges when building high profile editorial sites

CACHED_DB SESSION

SESSION_ENGINE = "django.contrib.sessions.backends.cached_db"
SESSION_CACHE_ALIAS = "session"

Page 29: Challenges when building high profile editorial sites

AGENDA

● Foreword

● Multi-layer cache to protect your database

● Image management on a responsive site

● Devops

Page 30: Challenges when building high profile editorial sites

RESPONSIVE DESIGN IMPACTS

● 3x more image sizes

− Desktop

− Tablet

− Mobile

Page 31: Challenges when building high profile editorial sites

IMAGE MANAGEMENT

● Django-filer

● Easy-thumbnails

● Cloudfiles (cloud containers)

● Forget the assumption of a fast & reliable disk

− The software stack is not helping; a lot of work is left to you

● Forked:

− Django-filer (fork)

− Easy-thumbnails (fork, very close to being able to drop it)

− Django-cumulus (81 forks)

− Monkey-patched pyrax

− ...

Huh?!

Page 32: Challenges when building high profile editorial sites

DJANGO-CUMULUS

● The truth is much worse

− Log everything from the swiftclient

● Target zero API and DB calls on a hot page

− The main repo is getting better ...

'loggers': {
    ...
    # django.db.backends only logs SQL queries when DEBUG = True
    'django.db': {
        'handlers': ['console'],
        'level': 'DEBUG',
        'propagate': True,
    },
    'swiftclient': {
        'handlers': ['console'],
        'level': 'DEBUG',
        'propagate': True,
    },
},
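To check the "zero calls on a hot page" target in a test rather than by reading console output, you can count the records those loggers emit. A sketch using only the standard logging module; it assumes one DEBUG record per SQL query (django.db.backends, with DEBUG = True) and per HTTP request (swiftclient), and `count_calls` is a hypothetical helper.

```python
import logging
from contextlib import contextmanager


class CallCounter(logging.Handler):
    """Count log records per logger name."""

    def __init__(self):
        super().__init__(level=logging.DEBUG)
        self.counts = {}

    def emit(self, record):
        self.counts[record.name] = self.counts.get(record.name, 0) + 1


@contextmanager
def count_calls(logger_names=("django.db.backends", "swiftclient")):
    """Temporarily attach a CallCounter at DEBUG level to the given loggers."""
    counter = CallCounter()
    loggers = [logging.getLogger(name) for name in logger_names]
    saved_levels = [lg.level for lg in loggers]
    for lg in loggers:
        lg.setLevel(logging.DEBUG)
        lg.addHandler(counter)
    try:
        yield counter
    finally:
        # Restore the loggers exactly as we found them
        for lg, level in zip(loggers, saved_levels):
            lg.removeHandler(counter)
            lg.setLevel(level)
```

In a test, render a warm page inside `with count_calls() as c:` and assert that `c.counts == {}`.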

Page 33: Challenges when building high profile editorial sites

DJANGO-CUMULUS

● Django storage backend for Cloudfiles from Rackspace

− Be straight to the point when talking to a slow API

diff --git a/cumulus/storage.py b/cumulus/storage.py
@@ -201,6 +202,19 @@ class SwiftclientStorage(Storage):
 ...
+    def save(self, name, content):
+        """
+        Don't check for an available name before saving, just overwrite.
+        """
+        # Get the proper name for the file, as it will actually be saved.
+        if name is None:
+            name = content.name
+        name = self._save(name, content)
+        # Store filenames with forward slashes, even on Windows
+        return force_text(name.replace('\\', '/'))

Page 34: Challenges when building high profile editorial sites

DJANGO-CUMULUS

Trust unreliable API at scale

diff --git a/cumulus/storage.py b/cumulus/storage.py
@@ -150,8 +150,11 @@ class SwiftclientStorage(Storage):
     def _get_object(self, name):
         """
         Helper function to retrieve the requested Object.
         """
-        if self.exists(name):
+        try:
             return self.container.get_object(name)
+        except pyrax.exceptions.NoSuchObject as err:
+            pass
@@ -218,7 +221,7 @@ class SwiftclientStorage(Storage):
     def exists(self, name):
         """
         exists in the storage system, or False if the name is available for a new file.
         """
-        return name in self.container.get_object_names()
+        return bool(self._get_object(name))
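The patch replaces look-before-you-leap (list every object name, then fetch) with try/except, which is cheaper against a slow remote API. The sketch below makes the difference countable with a fake container; `FakeContainer` and `NoSuchObject` are hypothetical stand-ins that only mimic the shape of the calls in the diff, not pyrax itself.

```python
class NoSuchObject(Exception):
    """Stand-in for the client's 'object not found' error."""


class FakeContainer:
    """Hypothetical remote container that counts API calls."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.api_calls = 0

    def get_object_names(self):
        self.api_calls += 1
        return list(self._objects)

    def get_object(self, name):
        self.api_calls += 1
        try:
            return self._objects[name]
        except KeyError:
            raise NoSuchObject(name) from None


def get_lbyl(container, name):
    # Look before you leap: list every name, then fetch (2 calls on a hit)
    if name in container.get_object_names():
        return container.get_object(name)
    return None


def get_eafp(container, name):
    # Ask forgiveness: a single call, "not found" becomes None
    try:
        return container.get_object(name)
    except NoSuchObject:
        return None
```

Beyond halving the calls on a hit, the look-before-you-leap version lists the whole container, whose cost grows with the number of stored files.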

Page 35: Challenges when building high profile editorial sites

PATCH PYRAX

● Assume the best

− Reduce the auth attempts

− Reduce the connection timeout

import pyrax.cf_wrapper.client
from django.db import models


def patch_things():
    # Automatically generate thumbnails for all aliases
    # (queue_thumbnail_generation is shown on the next slide)
    models.signals.post_save.connect(queue_thumbnail_generation)
    # Force the retries for pyrax to 1, to stop the request doubling
    pyrax.cf_wrapper.client.AUTH_ATTEMPTS = 1
    pyrax.cf_wrapper.client.CONNECTION_TIMEOUT = 2

Page 36: Challenges when building high profile editorial sites

GENERATE THE THUMBS

● Generate the thumbs as soon as possible

− post save signals that offload to a task

− easy-thumbnails

def queue_thumbnail_generation(sender, instance, **kwargs):
    """
    Iterate over the sender's fields, and if there is a FileField
    instance (or a subclass like MultiStorageFileField) send the
    instance to a task to generate all the thumbnails defined in
    settings.THUMBNAIL_ALIASES.
    """
    …
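The body of queue_thumbnail_generation is elided on the slide. As a separate, hypothetical illustration of the fan-out it triggers (one job per image and per alias), here is a sketch with a standard-library queue standing in for the real task queue. The alias names and sizes are invented, written in the THUMBNAIL_ALIASES format easy-thumbnails uses, where the "" target applies to every field.

```python
import queue

# Hypothetical aliases; a 0 dimension leaves that side unconstrained.
THUMBNAIL_ALIASES = {
    "": {
        "small": {"size": (320, 0)},
        "medium": {"size": (640, 0)},
        "large": {"size": (1024, 0)},
    },
}

# Stand-in for a real task queue; a worker would pop jobs and render thumbs.
thumbnail_jobs = queue.Queue()


def enqueue_thumbnail_jobs(image_names, aliases=THUMBNAIL_ALIASES):
    """Queue one (image, alias, options) job per image and global alias,
    so the request that saved the model can return immediately."""
    count = 0
    for name in image_names:
        for alias, options in aliases.get("", {}).items():
            thumbnail_jobs.put((name, alias, options))
            count += 1
    return count
```

With three responsive sizes, every uploaded image becomes three jobs, which is why generating them off the request path matters.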

Page 37: Challenges when building high profile editorial sites

PICTUREFILL.JS

… A Responsive Images approach that you can use today that mimics the proposed picture element using spans...

− Old API, as demonstrated in version 1.2.1

<span data-picture data-alt="A giant stone face in Angkor Thom, Cambodia">
    <span data-src="small.jpg"></span>
    <span data-src="medium.jpg" data-media="(min-width: 400px)"></span>
    <span data-src="large.jpg" data-media="(min-width: 800px)"></span>
    <span data-src="extralarge.jpg" data-media="(min-width: 1000px)"></span>

    <!-- Fallback content for non-JS browsers. Same img src as the initial, unqualified source element. -->
    <noscript>
        <img src="small.jpg" alt="A giant stone face in Angkor Thom, Cambodia">
    </noscript>
</span>

Page 38: Challenges when building high profile editorial sites

PUTTING IT ALL TOGETHER 1/2

<!-- article_list.html -->
{% extends "base.html" %}
{% load image_tags cache_tags pagination_tags %}

{% block content %}
{% cache MIDDLE_TTL "article_list_" category author tag request.GET.page all_pages %}
    <div class="article-list archive-list ">
    {% for article in object_list %}
        {% cache LONG_TTL "article_teaser_" article.id article.modified %}
            {% include "newsroom/includes/article_teaser.html" with columntype="categorylist" %}
        {% endcache %}
    {% endfor %}
    </div>
{% endcache %}
{% endblock %}

● Iterate through article_list

● Nested cache

Page 39: Challenges when building high profile editorial sites

PUTTING IT ALL TOGETHER 2/2

<!-- article_teaser.html -->
{% load image_tags %}
<section class="blogArticleSection">
{% if article.image %}
    <a href="{{ article.get_absolute_url }}" class="thumbnail">
        <span data-picture data-alt="{{ article.image.default_alt_text }}">
            <span data-src="{{ article.image|thumbnail_url:"large" }}"></span>
            <span data-src="{{ article.image|thumbnail_url:"medium" }}" data-media="(min-width: 480px)"></span>
            <span data-src="{{ article.image|thumbnail_url:"small" }}" data-media="(min-width: 600px)"></span>
            <noscript>
                <img src="{{ article.image|thumbnail_url:"small" }}" alt="{{ article.image.default_alt_text }}">
            </noscript>
        </span>
    </a>
{% endif %}
...

Use Picturefill to render your images

Page 40: Challenges when building high profile editorial sites

AGENDA

● Foreword

● Multi-layer cache to protect your database

● Image management on a responsive site

● Devops

Page 41: Challenges when building high profile editorial sites

DEVOPS

● Configuration management

● Single command deployment for all environments

● Settings parity

Page 42: Challenges when building high profile editorial sites

CONFIGURATION MANAGEMENT

● Pick one that fits your brain & skillset

− Puppet

− Chef

− Ansible

− Salt

● At Lincoln Loop we are using Salt

− One master per project

− Minion installed on all the cloud servers

Page 43: Challenges when building high profile editorial sites

SALT

● Provision & deploy a server role

− +X app servers to absorb a traffic spike

− Replace an unsupported OS

● Update a package

● Run a one-liner command

− Restart a service on all instances (Varnish, memcached, ...)

− Check the version

Page 44: Challenges when building high profile editorial sites

SINGLE COMMAND DEPLOYMENT

● One-liner or you will get it wrong

● Consistency for each role is critical

− Avoid endless debugging of pseudo-random issues

Page 45: Challenges when building high profile editorial sites

SETTINGS PARITY

● It is the utopia you want to strive for, but …

− There are some differences

● Avoid logic in settings.py

● Fetch data from external sources: .env

Page 46: Challenges when building high profile editorial sites

SETTINGS.PY READS FROM .ENV

import os
import ConfigParser  # Python 2 module; it is "configparser" on Python 3

from superproject.settings.base import *

TEMPLATE_LOADERS = (
    ('django.template.loaders.cached.Loader', TEMPLATE_LOADERS),
)

config = ConfigParser.ConfigParser()
config.read(os.path.abspath(VAR_ROOT + "/../.env"))

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': config.get("mysql", "mysql_name"),
        'USER': config.get("mysql", "mysql_user"),
        'PASSWORD': config.get("mysql", "mysql_password"),
        'HOST': config.get("mysql", "mysql_host"),
        'PORT': config.get("mysql", "mysql_port"),
    }
}
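On Python 3 the module is `configparser`, and sections support mapping access. A minimal sketch of the same DATABASES construction; the .env content is an inline example with hypothetical values (a real settings file would call config.read() on the .env path as above), and `mysql_settings` is an invented helper.

```python
import configparser


def mysql_settings(config: configparser.ConfigParser) -> dict:
    """Build DATABASES['default'] from the [mysql] section of the .env file."""
    section = config["mysql"]
    return {
        "ENGINE": "django.db.backends.mysql",
        "NAME": section["mysql_name"],
        "USER": section["mysql_user"],
        "PASSWORD": section["mysql_password"],
        "HOST": section["mysql_host"],
        "PORT": section["mysql_port"],
    }


# Hypothetical .env content; each environment ships its own file.
EXAMPLE_ENV = """
[mysql]
mysql_name = newsroom
mysql_user = newsroom
mysql_password = change-me
mysql_host = 127.0.0.1
mysql_port = 3306
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_ENV)
DATABASES = {"default": mysql_settings(config)}
```

The settings module stays identical across environments; only the .env file differs, which is what keeps logic out of settings.py.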

Page 47: Challenges when building high profile editorial sites

CONCLUSION

● Multi-layer cache to protect your database

− Varnish

− Russian doll cache for the HTML fragments

● Smart key naming and invalidation conditions

● Cache buster URL

● Image management

− Harder on high traffic responsive site

− Software stack not mature

● Devops

− Configuration management is a must

− Try to have settings parity between your environments

Page 48: Challenges when building high profile editorial sites

HIGH PERFORMANCE DJANGO

Kickstarter http://lloop.us/hpd

Page 49: Challenges when building high profile editorial sites

BACKUP SLIDES

Page 50: Challenges when building high profile editorial sites

A WORD ABOUT LEGACY MIGRATION

● This is often the hardest part to estimate

− Huge volume of data

− Often inconsistent

− Unknown implicit business logic

● At scale, if something can go wrong, it will

● It always takes longer

Page 51: Challenges when building high profile editorial sites

REUSING PUBLISHED APPLICATIONS

● Careful review before adding an external requirement

− Read the code

● Best practice

● Security audit

− Can operate at your targeted scale

− In line with the rest of your project

● It is not a binary choice; you can

− extract a very small part

− write your own version based on what you learned

Page 52: Challenges when building high profile editorial sites