Riding the N(ode) Train: Dismantling the Monoliths
Tuesday, December 3, 2013
Sean McCullough – Engineer at Groupon @mcculloughsean
Part I
Broken Architecture and a Changing Business
Business in Early 2012
Architecture in 2012
Leading the Mobile Commerce Revolution
Chart: Mobile Transaction Mix, monthly, January 2011 to September 2013 (% of transactions)
Product Engineering was Stuck
• We couldn’t build features fast enough
• We wanted to build features world-wide
• Mobile and Web weren’t at feature parity
Part II
The Rewrite
Should ...
• be built on APIs for a consistent contract with mobile
• make it easy to hire developers
• allow for teams to work at their own pace
• allow teams to deploy their own code
• allow for global design changes
• have out-of-the-box I18n/L10n support
• be optimized for our read-heavy traffic pattern
• be small
How do we…?
• Deploy
• Authorize Users
• Share Sessions
• Route to different applications
• Manage distributed ops
• QA the whole site
We Tried This Before and Failed
• Rolled out a new site design in our monolith
• Too many things changed all at once
• Hard to evaluate performance of each feature
New Platform Evaluation
We evaluated:
• Node
• MRI Ruby/Rails, MRI Ruby/Sinatra
• JRuby/Rails, JRuby/Sinatra
• MRI Ruby/Sinatra + EventMachine
• Java/Play, Java/Vert.x
• Python+Twisted
• PHP
Why Node?
• Vibrant community
• NPM!
• Easy to hire JavaScript developers
• Met our minimum viable performance requirements
• Easy scaling (process model)
The First App
Growing Pains
Poking Holes in our Infrastructure
• Longevity Test over two days
• Try to root out memory leaks
• Talking only to non-production systems
Within 2 hours we had a major site outage
• SSL termination on our hardware load balancer maxed out its CPU at 100%
• Production systems were using the same load balancer as test and development systems
Lessons Learned
• You will run into problems with Node
• You will find problems with your infrastructure
• Don’t panic!
The Second App
• Looking for the next page to migrate
• Chose the “Browse” page
• Recently built
• Built mostly with Backbone
• Experienced team of JS developers
New Problems:
• User authentication
• More service calls
• Complicated routing
• More traffic
• Needed to share look and feel
• Cultural problems
• Change of workflow
• Feedback loop fell apart
3 rewrites
6 months to launch
Shared Layout
Maintain consistent look and feel across site:
• Distribute layout as library
• Use ESIs (Edge Side Includes) for top/bottom of page
• Apps are called through a “chrome service”
• Fetch templates from service
Groupon Interface Guidelines
Layout Service
• Uses semantic versioning
• Roll forward with bug fixes
• Stay locked on a specific version
• Enable Site-Wide Experiments
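The versioning behavior in the Layout Service bullets (roll forward with bug fixes, or stay locked on a specific version) can be sketched roughly as below. This is a minimal illustration, not Groupon's implementation: the `resolve` function, the request format ("1.4" meaning "newest 1.4.x"), and the version list are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Turn "1.4.10" into (1, 4, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requested: str, published: List[str]) -> Optional[str]:
    """Pick a layout version for an app (hypothetical request format).

    "1.4.1" stays locked on exactly that version.
    "1.4"   rolls forward to the newest 1.4.x bug-fix release.
    "1"     rolls forward to the newest 1.x release.
    Returns None when nothing published matches.
    """
    prefix = parse(requested)
    candidates = [v for v in published if parse(v)[: len(prefix)] == prefix]
    if not candidates:
        return None
    return max(candidates, key=parse)


# Hypothetical list of published layout versions.
published = ["1.4.0", "1.4.1", "1.4.10", "1.5.0", "2.0.0"]
print(resolve("1.4", published))    # rolls forward within 1.4.x
print(resolve("1.4.1", published))  # stays locked
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.4.10" would sort before "1.4.2".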
Routing Service
The Big Push… or There’s No Going Back
• Decided to get the whole company to move at once
• Supporting two platforms is hard – rip off the Band-Aid!
• End of June 2012 - move to I-Tier by September 1st
• ~150 developers
• Global effort
• Feature freeze – A/B testing against mostly the same features
Part III
It Worked!
95% Consumer Traffic On Node
Sustained US Traffic Over 120k RPM
Our Pages Got Faster
Success?
• Moving to a new platform is not a straight line
• Solving for old problems
• Solving for new problems
• Culture shift
Next Steps
• Streaming responses for better performance
• Better resiliency to outages… circuit breakers, brownouts
• Distributed Tracing
• International
• Open Source
New I-Tier apps as we build new teams, products, ideas.
Latest technologies to help us drive our business.
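The circuit breakers named under Next Steps can be illustrated with a small sketch. This shows the general pattern only and is not taken from I-Tier: the `CircuitBreaker` class, its thresholds, and the fallback behavior are assumptions chosen for the example. After a set number of consecutive failures, the breaker stops calling the backing service for a cool-down period and serves a fallback instead, then lets one trial call through.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker (hypothetical names and defaults)."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, operation: Callable[[], T], fallback: Callable[[], T]) -> T:
        # Open and still cooling down: skip the service entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the breaker.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A page could wrap each service call in `breaker.call(fetch, render_without_it)`, so a failing dependency degrades one part of the page rather than the whole response.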
Q&A