geecon 2016: scaling microservices at gilt

63
h u d s o n b a y t e c h

Upload: adrian-trenaman

Post on 16-Apr-2017

239 views

Category:

Software


2 download

TRANSCRIPT

h u ds o nb a yt e c h

~300… the number of micro services running gilt.com.

But what about the Persian horde?

Gilt: luxury designer brands at discounted prices

we shoot the product in our studios

we receive, store, pick, pack and ship...

we sell every day at noon...

stampede...

this is what the stampede really looks like...

The Hype Cycle

2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!

2012: This is AMAZING!

Gilt’s Microservice Hype Cycle

2013 :Look at all these services!

service growth over time: point of inflexion === scala.

2011: Boo: we have a monolith! Maybe these micro-services can help us move faster!

2012: This is AMAZING!

2013: Look at all these services!

2014: Holy cr&p, what have we done? Look at ALL these services

2015: Let’s get a handle on§ this

Gilt’s Microservice Hype Cycle

2016: Ah, the sweet taste of awesome sauce.

from rails to riches

rails to riches: 2007 - ruby-on-rails monolith

2011: java, loosely-typed, monolithic services

(5) Hidden linkages; buried business logic

(4) Monolithic Java App; huge bottleneck for innovation.

(2) Lots of duplicated code :(

(3) Teams focused on business lines

(1) Large loosely-typed JSON/HTTP services

enter: µ-services

“How can we arrange our teams around strategic initiatives? How can we make it fast

and easy to get to change to production?”

2015: micro-services

driving forces behind gilt’s emergent architecture● team autonomy● voluntary adoption (tools, techniques, processes)● kpi or goal-driven initiatives● failing fast and openly● open and honest, even when it’s difficult

anatomy of a gilt service

anatomy of a gilt service - typical choices

gilt-service-framework,

log4j, cloudwatch Cave,

, , javascript

or

service discovery: straight forward

zookeeper

Brocade Traffic Manager (aka Zeus, Stringray, SteelApp,...)

cloudiness

from bare-metal...

PHXIAD

… to vapour.

Lift-and-shift + elastic teams

Existing Data Centre

Dual 10Gb direct connect line, 2ms latency.

‘Legacy VPC’

MobileCommon Person-alisation Admin Data

(1) Deploy to VPC

(2) ‘Department’ accounts for elasticity & devops

single tenant: one EC2 instance per service instance

reproducible, immutable deployments: docker

service discovery: same pattern, different LB

zookeeper

Amazon ELB

# running instances per service: ‘rule of three’

AWS instance sizing

Lessen dependencies between teams: faster code-to-prod

Lots of initiatives in parallel

Your favourite <tech/language/framework> here

We (heart) μ-servicesGraceful degradation of service

Disposable Code: easy to innovate, easy to fail and move on.

We (heart) cloudDo devops in a meaningful way.Low barrier of entry for new tech (dynamoDB, Kinesis, ...)Isolation

Cost visibilitySecurity tools (IAM)Well documentedResilience is easyHybrid is easyPerformance is great

Lessons from the Slope:

1. µservice architecture is emergent

2. manage ownership & risk

3. make your clients thin

4. avoid snowflakes

5. test in production where possible

emergent architecture

It’s hard to think of architecture in one dimension.

n = 265, where n is the number of services.

… we used a “spread sheet”.‘The Gilt Genome Project’

It’s hard to think of architecture in one dimension.

We added ‘Functional Area’, ‘System’ and ‘Subsystem’ columns to Gilt Genome; provides a strong (although subjective) taxonomy.

It turns out we have an elegant, emergent architecture.

Some services / components are deceptively simple.

Others are simply deceptive, and require knowledge of their surrounding ‘constellation’

n = 265, where n is the number of services.

Deceptively Simple - many services are small; < 2048 loc

Deceptively Simple - many services are small, < 32 files.

Gilt Admin (Legacy Ruby on Rails Application)

City

Discounts FinancialReporting

Fraud Mgmt

Gift Cards Inventory Mgmt Order Mgmt

Sales Mgmt Product Catalog

Purchase Orders

Targetting

Billing

Other Admin Applications (Scala + Play Framework)*

City Creative (2) CS

Discounts Distribution i18n Inventory (2)

Order Processing

(2)Util

Service Constellations (Scala, Java)*

Auth (1) Billing (1) City (6) Creative (4) CS (2) Discounts (1) Distribution (9) i18n (3) inventory (6)

Order Processing

(8)Payments (3) Product

Catalog (5) Referrals (1) Util (2)

Core Database - ‘db3’

Job System (Java, Ruby)

Gilt Logical Architecture - Back Office Systems

* counts denote number of service / app components.

Simply deceptive: service context only make sense in constellation.

Emergent Architecture: Using the three-level taxonomy approach, we’ve been able to get a better understanding of an emergent architecture, at a department level, and where the complexity lies.

We’ve also concluded that the department is the right level of granularity for consensus on technical decisions (language, framework, …)

Gilt’s Architecture Board set’s the overall standards that teams must follow when interacting across departmental boundaries. HTTP. REST. DNS. AWS.

ownership

1. Software is owned by departments, tracked in ‘genome project’. Directors assign services to teams.

2. Teams are responsible for building & running their services; directors are accountable for their overall estate.

bottom-up ownership, RACI-style

Notes:

Zero Power, High Influence: The Architecture Board https://github.com/gilt/arch-board Gilt Standards and Recommendations: https://github.com/gilt/standards

The perfect size for a team

5 ± 2

The perfect size for a ‘department’ (team of teams)

20 ± 4

30%Amount of time a department should spend on operations / maintenance / red-hot.

We build the notion of SRE (Site Reliability Engineering) into the team.

‘ownership donut’ informs tech strategy

We classify ownership as: active, passive, at-risk.

‘done’ === 0% ‘at risk’

Getting a handle on ownership...

Jul 2015 Sep 2015

Oct 2015 Feb 2015

Emergent Architecture + Ownership Oriented Org: “You just pulled an inverse Conway manoeuvre”

Back-Office Personalisation Mobile Web & Core Services

Back-Office Personalisation Mobile Web & Core Services

Architectural Area

Department

thin clients

Consumer Dependencies

ConsumerRepo

Take as few code dependencies as possible. This stuff HURTS when n ~= 300.

Service Code

Common Code

Client Code

ServiceRepo

Service Dependencies

Client JAR

Dependency hell as client JAR dependencies conflicts with service dependencies.

X

This is way easier. http://apidoc.me

<<apidoc>>Service API

Service CodeServiceRepo Service

Dependencies

Consumer Dependencies

ConsumerRepo

apidoc: define RESTful service API agnostically and generate dependency free, thin clients.

Client Code

<< generate>>

Service Stub

<< generate>>

stop building snowflakes

7 different code deployment pipelines... Really?

6Andrey’s Rule of Six:

“We could solve this now, or, just wait six months, and Amazon will provide a solution”

Andrey Kartashov, Distinguished Engineer, Gilt.

Current thinking on deployment:

(1) Re-use as much AWS tooling as possible: Code Pipeline, Code Deploy, Cloud Formation.

(2) Very lightweight tool chain to support dark canaries, canary releases, phased roll-out and roll-back: NOVA

https://github.com/gilt/nova

testing in production

Testing and TiPMaintaining stage environments in a micro-service architecture is HARD.

Prefer to test in production where possible: use dark canaries, canaries, staged roll-out and roll-back.

Invest in monitoring and alerting over hard-to-maintain test pipelines.

Where teams need a stage environment, let them build a minimal environment, and manage it themselves.

Estimate: about 85% of Gilt’s teams use TiP techniques; 15% need a stage environment.

Lessons from the Slope:

1. µservice architecture is emergent

2. manage ownership & risk

3. make your clients thin

4. avoid snowflakes

5. test in production where possible

#thanks @adrian_trenaman @gilttech