When dispatcher caching is not enough, by Jakub Wądołowski
When dispatcher caching is not enough…
Jakub Wądołowski
Senior Systems Engineer @ Cognifide
The What
The Why
The How
Agenda
The What
It all started in 2012…
www.flickr.com/photos/nasahqphoto/16327416694
To be perfectly honest, initially it was more like this…
www.flickr.com/photos/garryknight/5703519506
The client
EU pharmaceutical company
75 offices across the globe
Over 40,000 employees
Medical products available worldwide (180+ countries)
www.flickr.com/photos/worak/2258271659
Country-specific brochureware websites for medical products
iPad app for sales representatives
Single point for content entry
Multiple integration points (SSO, user/device authentication, etc.)
CQ 5.5, upgrade to AEM 6.1 in progress
Requirements
Main components
Brochureware website
iPad app
AEM Authoring
Single datacenter in London (Rackspace)
REST-like API for iPad app
Integrations with local and remote services
Logical architecture
Initially it was just Spain, Argentina and Sweden
6 months later the number of countries had tripled
Finally it reached 21, and it is still not over
The Why
“Our team in Argentina complains that the app feels slow. They can’t download presentations sometimes. Could you please investigate that?”
Mr B.
www.flickr.com/photos/r4vi/8640618489
Latency, latency, latency…
Way too high round-trip times (RTT)
Timeouts
Broken streams
Connection resets
Poor Internet connections in some areas
Problems
Solutions
It was decided that Hong Kong was the way to go for us
There are over 10,000 km between London and Buenos Aires…
…which is nearly the same distance as between London and Hong Kong
Client-server problems became server-server ones
How are we going to sync all the changes (both ways)?
What about deployments?
Do we have enough licenses?
What’s the best way to implement content sharding?
How long will it take to implement all of these things?
When initial excitement was gone…
www.flickr.com/photos/geishaboy500/2496995573
PoC conclusion
We can’t just cache more on the dispatcher
This is a very well-known problem
Let’s use the right tool to solve the problem the right way
A Content Delivery Network (CDN) is the way to go!
The road to CDN
“(…) CDN is a large distributed system of servers deployed in multiple data centers across the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance. CDNs serve a large fraction of the Internet content today (…).”, Wikipedia
CDN definition
AEM + CDN
www.flickr.com/photos/pictures-of-money/16678590844
CDN, huh?
That's not necessarily true nowadays…
www.flickr.com/photos/halfrain/14410890555
Pay-as-you-go model
Powered by Varnish
Highly customizable (ability to upload your own VCL)
150 ms to purge, globally
~5 s to change a config through the web API
SSD-powered servers connected to Tier 1 networks
Real-time insight into what’s happening (graphs, logs, etc.)
Great support
Why Fastly?
https://www.fastly.com/network
Still not convinced?
The How
Ok… how should I start?
www.flickr.com/photos/kleuske/8004416109
www.flickr.com/photos/martinbamford/5638834940
The logs!
grep, awk, sed: all of these are your friends
Count your requests
Leverage the power of log monitoring tools (ELK, Splunk, etc.)
Plan your content structure carefully
Logs and content structure
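Counting requests does not need much tooling. A minimal Python sketch of what a grep/awk one-liner would do, assuming access logs in the Apache/httpd common log format (the sample lines and the function name are made up for illustration): it groups requests by HTTP method and the first two path segments, which is often enough to see which parts of the site dominate the traffic.

```python
import re
from collections import Counter

# Matches the quoted request line of a common-log-format entry, e.g.
# "GET /bin/myapp/v1/a_string.json HTTP/1.1"
REQUEST_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def count_requests(lines, depth=2):
    """Count requests grouped by method and the first `depth` path segments."""
    counts = Counter()
    for line in lines:
        match = REQUEST_LINE.search(line)
        if match is None:
            continue  # not an access-log entry; skip it
        path = match.group("path").split("?", 1)[0]  # drop the query string
        segments = [s for s in path.split("/") if s][:depth]
        counts[(match.group("method"), "/" + "/".join(segments))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /bin/myapp/v1/a_string.json HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2015:13:55:37 +0000] "GET /bin/myapp/v2/a_string.json?x=1 HTTP/1.1" 200 2326',
    '10.0.0.3 - - [10/Oct/2015:13:55:38 +0000] "GET /content/something/deck/_jcr_content.zip HTTP/1.1" 302 0',
]

if __name__ == "__main__":
    for (method, prefix), n in count_requests(SAMPLE).most_common():
        print(f"{n:>6} {method} {prefix}")
```

Sorting the counters with `most_common()` puts the heaviest path prefixes first, which is a natural starting point for looking for request patterns.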
Look for patterns
www.flickr.com/photos/wwarby/4915777722
If it is a GET request and its path starts with /bin/myapp/v[1-2]/a_string.json, then it is X
All requests to /content/something/*/_jcr_content.zip end with a 302 redirect to /some/path/to/file.zip
Request patterns
Assign these patterns to multiple buckets
www.flickr.com/photos/ddebold/15991919514
Public content
Private content
Content available for authorized users only
Content groups/buckets
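Once the patterns are known, assigning a request to a bucket is a first-match lookup. A minimal Python sketch: the first two paths mirror the example patterns above, while the third rule, the bucket assignments and all names are hypothetical and only illustrate the mechanism.

```python
import re
from enum import Enum


class Bucket(Enum):
    PUBLIC = "public content"
    PRIVATE = "private content"
    AUTHORIZED = "content available for authorized users only"
    PASS = "not cached, passed to the origin"


# Hypothetical rule table: (allowed methods, path regex, bucket).
# Bucket assignments are illustrative, not taken from the real project.
RULES = [
    ({"GET"}, re.compile(r"/bin/myapp/v[1-2]/a_string\.json"), Bucket.PRIVATE),
    ({"GET", "HEAD"}, re.compile(r"/content/something/[^/]+/_jcr_content\.zip$"), Bucket.AUTHORIZED),
    ({"GET", "HEAD"}, re.compile(r"/etc/designs/myapp/.*\.(css|js|png)$"), Bucket.PUBLIC),
]


def classify(method, path):
    """Return the bucket of the first matching rule; unmatched requests are passed."""
    for methods, pattern, bucket in RULES:
        # re.match anchors at the start of the path, i.e. "path starts with pattern"
        if method in methods and pattern.match(path):
            return bucket
    return Bucket.PASS
```

Falling back to `PASS` keeps the design conservative: anything not explicitly classified goes to the origin instead of being cached by accident.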
Reverse HTTP proxy
In-memory, time-based cache
Blazing fast
Big “state” machine
Varnish Configuration Language (VCL)
Full control of the HTTP flow
Varnish in 1 slide!
Cacheable methods: GET, HEAD
Cacheable response codes: 200, 203, 300, 301, 302, 404, 410
“Cache-Control: private” if not defined otherwise
General caching rules
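The method and status rules boil down to a simple check. A minimal Python sketch using the lists from the slide; the function name is hypothetical, and header-driven rules such as the Cache-Control default are deliberately not modelled.

```python
# Defaults listed on the slide above.
CACHEABLE_METHODS = frozenset({"GET", "HEAD"})
CACHEABLE_STATUSES = frozenset({200, 203, 300, 301, 302, 404, 410})


def is_cacheable(method: str, status: int) -> bool:
    """True if both the request method and the response status allow caching.

    Header-driven rules (Cache-Control, Set-Cookie, etc.) are left out
    of this sketch.
    """
    return method.upper() in CACHEABLE_METHODS and status in CACHEABLE_STATUSES
```

Note that 404 and 410 are on the list: caching "not found" answers protects the origin from repeated requests for missing content.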
Let’s start with the iPad app
www.flickr.com/photos/pestoverde/15048774061
3 request types:
REST API requests
Presentation requests (ZIP files)
Image requests
iPad – HTTP flows
2 content groups:
Private
For all authorized users
8 request patterns
TTL varies from 10 minutes to 7 days
35/65 dynamic/static content (frequently changing JSON files vs. PDFs/PNGs)
All REST API responses are private
iPad app content
Private content is cacheable
What makes an HTTP response private?
It is tied to a user session; in other words, the HTTP request carries a unique authorization cookie
Private content
www.flickr.com/photos/hyku/368912557
Is it really safe to cache that type of content?
Varnish cache is a key-value store
Default key: req.url + req.http.host
req.url + req.http.host + sessionId = private cache space, voilà!
Private cache
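A minimal Python sketch of the private cache space idea, assuming the session is identified by AEM's `login-token` cookie (the cookie name and the function are assumptions for illustration). Public objects share one key per URL and host; private objects add the session ID, so two users never receive each other's copy. In Varnish this would live in `vcl_hash`; the SHA-256 digest below only stands in for the cache's internal hashing.

```python
import hashlib
from http.cookies import SimpleCookie
from typing import Optional

SESSION_COOKIE = "login-token"  # assumption: name of the AEM session cookie


def cache_key(url: str, host: str, cookie_header: Optional[str] = None,
              private: bool = False) -> str:
    """Build a cache key from url + host, plus the session ID for private content."""
    parts = [url, host]
    if private:
        cookies = SimpleCookie()
        cookies.load(cookie_header or "")
        session = cookies.get(SESSION_COOKIE)
        if session is None:
            # Without a session there is no private cache space to look in.
            raise ValueError("private content requested without a session cookie")
        parts.append(session.value)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Only the session cookie goes into the key, so unrelated cookies (language, analytics) do not split a user's private cache space into several copies.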
Dynamic means uncacheable?
www.flickr.com/photos/gsfc/7402445224
Caching usually comes with a trade-off: updates won’t be instantaneous
The TTL has to expire, or a purge request has to be triggered
A CDN is the way to go if you can accept this delay
Dynamic content
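The trade-off is easy to see in a toy in-memory TTL cache. A minimal Python sketch (class and method names are hypothetical) with an injectable clock so the example is deterministic: after the content changes at the origin, the old value keeps being served until the TTL runs out or the entry is purged.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Toy time-based cache: entries expire `ttl` seconds after being stored."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[str, float]] = {}

    def put(self, key: Hashable, value: str, ttl: float) -> None:
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key: Hashable) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None  # miss
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the next request goes to the origin
            return None
        return value

    def purge(self, key: Hashable) -> None:
        self._entries.pop(key, None)


# Usage with a fake clock (seconds).
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("/bin/myapp/v1/a_string.json", "version 1", ttl=600)  # 10 minutes
# The content is updated in AEM, but the cache still returns the old copy:
assert cache.get("/bin/myapp/v1/a_string.json") == "version 1"
now[0] = 600.0
assert cache.get("/bin/myapp/v1/a_string.json") is None  # TTL expired
```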
Content purging
www.flickr.com/photos/librariesrock/13522859053
Fastly exposes a purge REST API:
Purge URL
Purge Key: purge all assets marked with a special “label” (surrogate key), https://www.fastly.com/blog/surrogate-keys-part-1
Purge All
Purge vs. Soft Purge: https://www.fastly.com/blog/introducing-soft-purge
Content purging
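To make the purge flavours more concrete, here is a minimal in-memory model in Python. It is not Fastly's implementation or API, only a sketch of the semantics described in the linked posts under simplifying assumptions: objects are tagged through a space-separated `Surrogate-Key` response header, a hard purge removes objects, and a soft purge only marks them stale, so a stale copy can still be served while the origin is unavailable (a simplification of stale-if-error behaviour). All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CachedObject:
    body: str
    surrogate_keys: Set[str] = field(default_factory=set)
    stale: bool = False


class EdgeCache:
    """Toy model of URL, key and purge-all invalidation, hard or soft."""

    def __init__(self) -> None:
        self.objects: Dict[str, CachedObject] = {}

    def store(self, url: str, body: str, surrogate_key_header: str = "") -> None:
        # The Surrogate-Key header holds space-separated labels.
        self.objects[url] = CachedObject(body, set(surrogate_key_header.split()))

    def purge_url(self, url: str, soft: bool = False) -> None:
        self._invalidate([url] if url in self.objects else [], soft)

    def purge_key(self, key: str, soft: bool = False) -> None:
        tagged = [u for u, o in self.objects.items() if key in o.surrogate_keys]
        self._invalidate(tagged, soft)

    def purge_all(self) -> None:
        self.objects.clear()

    def _invalidate(self, urls: List[str], soft: bool) -> None:
        for url in urls:
            if soft:
                self.objects[url].stale = True  # keep the copy, but force a refetch
            else:
                del self.objects[url]  # hard purge: the copy is gone

    def lookup(self, url: str, origin_up: bool = True) -> Optional[str]:
        obj = self.objects.get(url)
        if obj is None:
            return None  # miss
        if obj.stale and origin_up:
            return None  # stale: treat as a miss and refetch from the origin
        return obj.body  # fresh hit, or stale copy served because the origin is down
```

In this model, purging by key is what keeps invalidation cheap: one call invalidates every object tagged with, say, a presentation ID, without the publisher having to know the individual URLs.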
Results
www.flickr.com/photos/89228431@N06/11322953266
Hit ratio: 49.9%
Cache coverage: 66.1%
Requests: 89K
iPad app statistics
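The two percentages measure different things, which also matters when reading the authoring numbers later on. A minimal Python sketch, assuming the definitions used in Fastly's statistics: hit ratio is hits / (hits + misses), and cache coverage is the share of requests that were cache lookups at all, (hits + misses) / (hits + misses + passes). The function names and the counts in the usage line are made up.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Share of cache lookups that were answered from the cache."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def cache_coverage(hits: int, misses: int, passes: int) -> float:
    """Share of all requests that were cache lookups, i.e. not passed to the origin."""
    total = hits + misses + passes
    return (hits + misses) / total if total else 0.0


# Made-up counts: the cache rarely misses, yet half of the traffic bypasses it.
print(hit_ratio(90, 10), cache_coverage(90, 10, 100))
```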
What about the speed?
www.flickr.com/photos/129341635@N02/16609174727
Presentation downloads:
Europe: up to 21% faster
South America: up to 50% faster
APAC: up to 83% faster
API responses:
Europe: up to 60% faster
South America: up to 40% faster
APAC: up to 55% faster
Speed boost
Issues?
www.flickr.com/photos/giuseppemilo/15414290956
Crimes against cacheability
www.flickr.com/photos/alancleaver/4121423119
Adding Set-Cookie to every response
Auth cookie is not revoked in the browser after logout
TBD
Crimes against cacheability
“iPad app performance is much better now! But we still have some issues with authoring. It is really slow in some countries.”
Mr B.
www.flickr.com/photos/r4vi/8640618489
I was rather skeptical:
Way too dynamic to be considered cacheable?
What kind of improvement might we get? 5-10%?
Is it worth it?
I don’t know how, but it was decided to roll things out
CDN in front of authoring?
3 content groups
36 request patterns
TTL up to 14 days
Mostly dynamic content + static web GUI resources
A lot of assets common to every logged-in user
CDN + AEM Author
Request pattern                                             Cacheable?
/apps/cq/core/content/login/.*(png|jpg|css|js)$             YES
/libs/cq/i18n/dict.en.json                                  YES
/etc/.*\.(png|woff|css|js|jpg|gif|ttf|svg|eot|swf|ico)$     YES
/cf#/content/myapp/en/about.html                            NO
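Expressed as code, the table becomes a short allow-list. A minimal Python sketch (the function name is hypothetical) that uses the regular expressions from the table as they are and treats every other path as uncacheable:

```python
import re

# The three cacheable patterns from the table above; everything else is passed to AEM.
CACHEABLE_AUTHOR_PATTERNS = [
    re.compile(r"/apps/cq/core/content/login/.*(png|jpg|css|js)$"),
    re.compile(r"/libs/cq/i18n/dict\.en\.json$"),
    re.compile(r"/etc/.*\.(png|woff|css|js|jpg|gif|ttf|svg|eot|swf|ico)$"),
]


def is_cacheable_on_author(path: str) -> bool:
    """True if the request path (query string ignored) matches a cacheable pattern."""
    path = path.split("?", 1)[0]
    return any(p.match(path) for p in CACHEABLE_AUTHOR_PATTERNS)
```

The `#` fragment in the last row is never sent to the server by the browser, so the CDN and AEM only see `/cf` for that request; it matches no pattern and is therefore passed through.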
Authorized only!
www.flickr.com/photos/rudyjuanito/5170435542
CDN knows nothing about the user session
The goal is to cache common content for successfully authorized users
Authorize them at the edge!
Authorize at the edge
Auth tokens
www.flickr.com/photos/cfortier/426610972
A 2nd auth cookie (token), readable by the CDN
HMAC function
The 2 auth cookies are tied together
Reference implementation: https://github.com/fastly/token-functions
Secret key shared between AEM and the CDN
The CDN can validate the user session without a request to AEM
Auth tokens
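A minimal Python sketch of the idea, not the reference implementation linked above: the token carries a binding to the AEM auth cookie, an expiry timestamp and an HMAC-SHA256 signature computed with the secret shared by AEM and the CDN. The token layout, the secret and the function names are assumptions for illustration; on Fastly the verification half would be written in VCL, and Python here only shows the logic.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret; in a real setup it would come from configuration
# on both the AEM and the CDN side, never from source code.
SECRET = b"change-me"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def _binding(aem_cookie: str) -> str:
    # Ties the CDN token to one AEM auth cookie without revealing the cookie.
    return hashlib.sha256(aem_cookie.encode("utf-8")).hexdigest()[:32]


def issue_token(aem_cookie: str, ttl: int = 3600, now: Optional[float] = None) -> str:
    """AEM side, after a successful login: build the second cookie's value."""
    expires = int((time.time() if now is None else now) + ttl)
    payload = f"{_binding(aem_cookie)}:{expires}"
    return f"{payload}:{_sign(payload)}"


def verify_token(token: str, aem_cookie: str, now: Optional[float] = None) -> bool:
    """Edge side: valid signature, matching AEM cookie, not expired."""
    try:
        binding, expires, signature = token.split(":")
        expires_at = int(expires)
    except ValueError:
        return False  # malformed token
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest.
    if not hmac.compare_digest(_sign(f"{binding}:{expires}").encode(), signature.encode()):
        return False  # forged or tampered token
    if not hmac.compare_digest(binding.encode(), _binding(aem_cookie).encode()):
        return False  # token belongs to a different session
    return (time.time() if now is None else now) < expires_at
```

Because the check needs only the secret and the two cookies, the edge can decide "authorized or not" locally and serve the shared authoring assets from cache.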
96.3%
www.flickr.com/photos/spacexphotos/16169087563
Hit ratio: 96.3%
Cache coverage: 45.7%
Requests: 83K
Author statistics
Adding Set-Cookie to every response
Auth cookie is not revoked in the browser after logout
“Vary: Cookie” usage
Crimes against cacheability
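Why these hurt can be shown in a few lines. A minimal Python sketch (function names hypothetical): Varnish's built-in logic and Fastly's default VCL do not cache responses carrying `Set-Cookie`, and `Vary: Cookie` makes the stored variant depend on the whole `Cookie` request header, so every distinct cookie string, including analytics cookies, gets its own copy.

```python
from typing import Dict, Tuple


def storable(response_headers: Dict[str, str]) -> bool:
    """Mimic the default behaviour: never store a response that sets a cookie."""
    return all(name.lower() != "set-cookie" for name in response_headers)


def variant_key(url: str, request_headers: Dict[str, str],
                response_headers: Dict[str, str]) -> Tuple[str, ...]:
    """Cache key extended with the request headers named in the Vary response header."""
    vary = next((v for k, v in response_headers.items() if k.lower() == "vary"), "")
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    request = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple(f"{n}={request.get(n, '')}" for n in names)


# Three users request the same static asset with different cookies.
users = [
    {"Cookie": "login-token=alice; _ga=GA1.1", "Accept-Encoding": "gzip"},
    {"Cookie": "login-token=bob; _ga=GA1.2", "Accept-Encoding": "gzip"},
    {"Cookie": "login-token=carol", "Accept-Encoding": "gzip"},
]
with_cookie = {variant_key("/etc/app.css", u, {"Vary": "Cookie"}) for u in users}
without_cookie = {variant_key("/etc/app.css", u, {"Vary": "Accept-Encoding"}) for u in users}
print(len(with_cookie), len(without_cookie))  # 3 copies vs. 1 copy
```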
www.flickr.com/photos/aushiker/20369395093
www.flickr.com/photos/andrewhurley/6254409229
What about deployments?
Does every deployment involve a full CDN cache purge? Nope!
iPad presentations are packaged in ZIP files and versioned
The majority of authoring-related cacheable assets stay untouched between deployments
AEM deployments
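Versioned URLs are what make this work: a new version means a new URL, hence a new cache key, while unchanged files keep their URL and stay cached. A minimal Python sketch that derives the version from a content hash; this is an assumption for illustration, since the slides do not say how the presentations are versioned, and the function name and paths are hypothetical.

```python
import hashlib
import posixpath


def versioned_url(path: str, content: bytes, length: int = 12) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"


old = versioned_url("/content/dam/presentations/deck.zip", b"slides v1")
same = versioned_url("/content/dam/presentations/deck.zip", b"slides v1")
new = versioned_url("/content/dam/presentations/deck.zip", b"slides v2")
assert old == same  # unchanged package: same URL, cached copy stays valid
assert old != new   # changed package: new URL, no purge needed
```

In this scheme old versions are never purged explicitly; they simply age out of the cache once their TTL expires.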
Summary
www.flickr.com/photos/andrewhurley/6254409229
Traffic growth is no longer an issue
Over 2 TB of traffic reaches the CDN servers monthly
~5.5 million HTTP requests per month
Only ~570 GB was passed through to AEM
License, budget and time savings
More than satisfying results
Very small changes in the AEM app itself
Happy client
Summary
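The summary figures imply a rough bandwidth offload. Assuming decimal units (1 TB = 1000 GB), both figures being monthly, and taking 2 TB as the lower bound stated on the slide:

```latex
\text{offload} = 1 - \frac{\text{traffic passed to AEM}}{\text{traffic reaching the CDN}}
\approx 1 - \frac{570\ \text{GB}}{2000\ \text{GB}} \approx 71.5\%
```

Since more than 2 TB reaches the CDN, the actual share of traffic kept away from AEM is at least this high.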