NY Times: so news doesn't break your server
@NYTDevs | developers.nytimes.com
Varnish: Linchpin of the NYTimes.com Re-architecture
Adam E. Falk
Software Architect, Web Products
Who I Am
A software architect focusing on server configuration and resiliency, with sidelines in DevOps, release engineering, and testing.
Started as a LAMP developer but has always been a generalist interested in all aspects of the data center.
Who We Are
Photo credit: Tony Cenicola/The New York Times
Scope of this Presentation
Everything that follows pertains to the use of Varnish to accelerate serving content on the <www.nytimes.com> hostname, only.
There are several other Varnish clusters at NYTimes.com.
NYTimes.com: Size
15+ million page URLs (1851–present)
● Not all HTML; working on that
200+ new page URLs created each day
Millions more image URLs
NYTimes.com: Traffic
<www.nytimes.com> normal daily peak is ~75,000 requests/second – just this hostname.
● primarily APIs
● HTML traffic is ~4,000 req/sec
Traffic spikes up to 4x during a breaking news event
R.I.P. Leonard Nimoy
2013 Redesign of NYTimes.com
Mission Statement
“Leverage the latest technology in order to improve the user experience, enhance our journalism, and provide a more effective environment for our advertisers.”
Project document
Improve the User Experience
Technical goals:
1. 25% improvement in browser load time, minimum.
2. ...
Sounds like a job for page caching!
Achievement Unlocked
50% or better improvement in:
● Time to first byte
● Time to paint
● Time to page ready
Brave New World
Exception to the Rule
A complete code rewrite (almost). Why?
● < insert usual suspects here >
● Deeply embedded server-side personalization (includes ads)
Output was simply uncacheable.
Never Let a Crisis Go To Waste
☒ (Test|Behavior) Driven Development
☒ Web performance was core from Day 0
☒ Async wherever, whenever
☒ New APIs
☒ CSS: LESS (then), SASS (now)
Can We Cache Pages Now?
Yes, Virginia.
</summary>
Spotlights for You
VCL file modular organization
Cache refresh instead of purge
Varnish cluster today
Changing Horses in Midstream
Site functionality that must not break:
● redirects (mobile, registration, et al.)
● user tracking
● web crawler detection
Best Practice (singular)
Easy Yet Powerful
Greatest Thing Since Sliced Bread
☒ Single responsibility principle
☒ Code readability (and understanding!)
☒ Time spent troubleshooting
☒ Coding standards
Intermission
There are only two hard things in Computer Science:
1. Cache invalidation
2. Naming things
3. Off-by-one errors
http://martinfowler.com/bliki/TwoHardThings.html
Cache Invalidation
Purge is not good enough (in Varnish 3).
PURGE causes cache misses on the highest-traffic content.
Needed cache re(set|build|prime).
NYT Homepage
● Must always be in Varnish cache.
● Every article linked to on the homepage should already be in Varnish cache.
No cache misses = long TTL.
But...
Some content changes frequently.
The latest version must be served in real time after every publish action.
Short TTL = more cache misses.
PURGE = more cache misses.
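The trade-off above can be illustrated with a toy model. This is only a sketch: `ToyCache` and `simulate` are hypothetical names, and the model is a single in-memory dictionary, not Varnish. It counts how many client requests hit an empty cache entry when each publish is followed by a purge versus a refresh.

```python
class ToyCache:
    """A single-node cache that counts how many client requests miss."""

    def __init__(self, backend):
        self.backend = backend      # callable: url -> current content
        self.store = {}
        self.client_misses = 0

    def get(self, url):
        """Serve a client request, fetching from the backend on a miss."""
        if url not in self.store:
            self.client_misses += 1
            self.store[url] = self.backend(url)
        return self.store[url]

    def purge(self, url):
        """Drop the entry; the next client request pays for the refetch."""
        self.store.pop(url, None)

    def refresh(self, url):
        """Refetch and overwrite the entry without involving a client."""
        self.store[url] = self.backend(url)


def simulate(invalidate, publishes=3):
    """Publish several times, with one client request after each publish."""
    version = {"n": 0}
    cache = ToyCache(lambda url: f"{url} v{version['n']}")
    cache.get("/")                      # warm the cache once
    cache.client_misses = 0             # count only misses after warm-up
    served = []
    for _ in range(publishes):
        version["n"] += 1               # a publish action changes the content
        getattr(cache, invalidate)("/")  # call cache.purge or cache.refresh
        served.append(cache.get("/"))
    return cache.client_misses, served
```

In this model both strategies serve the latest version after every publish, but only the purge strategy makes a client wait on the backend each time.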
Cache Rules Everything Around Me
CREAM: an API to re(set|build|prime) a single cache entry.
Publish event calls API synchronously.
req.hash_always_miss = true
CREAM requests the just-updated article from every Varnish server, in parallel.
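A minimal sketch of the fan-out, not the actual CREAM implementation. It assumes each Varnish server's VCL sets `req.hash_always_miss = true` when a trusted client sends a marker header; the header name `X-Cache-Refresh`, the function names, and the server addresses in the usage note are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

REFRESH_HEADER = "X-Cache-Refresh"  # hypothetical marker header


def build_refresh_request(server, path, host="www.nytimes.com"):
    """Build a GET aimed at one Varnish server for the public hostname."""
    return urllib.request.Request(
        f"http://{server}{path}",
        headers={"Host": host, REFRESH_HEADER: "1"},
    )


def http_fetch(request, timeout=5):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def refresh_everywhere(servers, path, fetch=http_fetch):
    """Request `path` from every server in parallel; return status by server."""
    requests = [build_refresh_request(server, path) for server in servers]
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        statuses = list(pool.map(fetch, requests))
    return dict(zip(servers, statuses))
```

A publish hook could then call something like `refresh_everywhere(["varnish1:80", "varnish2:80"], "/index.html")` and block until every server has answered, which matches the synchronous call described earlier. The `fetch` parameter exists only so the fan-out can be exercised without a network.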
Where We Are Today: Software
~2,300 lines of VCL code
● Minimum of inline C
10 VMODs
● std, utils, crashhandler, wurfl, boltsort, queryfilter
● 4 custom
Where We Are Today: Traffic
Of the ~4,000 page requests/second to <www.nytimes.com>:
● ~1,500 now served by Varnish
● ~91% cache hit rate (down from ~96%)
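To make the hit-rate change concrete, here is a rough calculation of how many requests per second fall through to the backend. It assumes the hit rate applies uniformly to the ~1,500 requests/second served by Varnish; the function name is illustrative.

```python
def backend_requests_per_second(varnish_rps, hit_rate_percent):
    """Requests per second that miss the cache and reach the backend."""
    return varnish_rps * (100 - hit_rate_percent) / 100


at_96 = backend_requests_per_second(1500, 96)   # 60.0
at_91 = backend_requests_per_second(1500, 91)   # 135.0
```

Under that assumption, a five-point drop in hit rate more than doubles the backend load, from about 60 to about 135 requests/second.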
Where We Are Today: Performance
Load test: ~3,000 requests/second/server with current configuration
We could handle a 4x spike with 2 servers
We run 8 servers per data center
8 Servers? Why?!
Because:
● Biggest spike ever was 10x (2012 Election Night)
● 2 hypervisors => even number of server instances
● Takes too long for us to dynamically provision
● We can afford to stay over-provisioned
Yes, this causes extra backend network traffic.
Scaled out for resilience, scaling up for performance.
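The sizing arithmetic can be sketched as follows. It assumes the baseline is the ~1,500 requests/second currently served by Varnish, which is the figure that reproduces the "2 servers for a 4x spike" statement; the function name is illustrative.

```python
import math


def servers_needed(baseline_rps, spike_multiplier, per_server_rps):
    """Smallest number of servers that can absorb the spiked load."""
    return math.ceil(baseline_rps * spike_multiplier / per_server_rps)


for_4x = servers_needed(1500, 4, 3000)    # 2
for_10x = servers_needed(1500, 10, 3000)  # 5
```

By this estimate, even a repeat of the 10x Election Night spike would need 5 servers, so 8 per data center leaves headroom for a server or hypervisor failure.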
Next Steps for Us
1. Install Varnish Cache Plus 4
2. Utilize the Varnish Plus tools for monitoring.
3. Replace CREAM with VHA
Thank You
Adam E. Falk
[email protected]
@xenograg
adamfalk.com xenograg.com
We’re hiring: nytimes.com/careers
@NYTDevs | #timesopen | developers.nytimes.com