1 Scaling Stack Overflow David Fullerton, VP Engineering • @df07 QCon NYC • 2015-06-12

Scaling Stack Overflow

David Fullerton, VP Engineering • @df07

QCon NYC • 2015-06-12

**SPOILERS**

Conclusions

1. Our architecture is boring

2. How we keep it boring is interesting

What’s Stack Overflow?

Q&A for Programmers

• 9.4M questions
• 16M answers
• 45M uniques / month
• 8,000 new questions every day

(quantcast.com/stackoverflow.com)

Developer Jobs

• Best place on the internet to get a programming job or hire a developer

Part of Stack Exchange Network

• Stack Overflow-style Q&A in 143 other topics & languages

A Distributed Team

• 34 developers, 6 sysadmins, 6 designers
• 75% remote

How do we work?

• Remote work culture
• Hire smart people and get out of their way
• Full-stack developers / sysadmins with a specialty

Our Architecture (I warned you, it’s boring)

(stackexchange.com/performance)

“Monolith Plus” architecture

• Almost everything happens in the web tier + DB

• A few services pulled out and optimized

Scales pretty well (for us)

• 4 billion requests per month, 3000 req/s peak
• 800M SQL queries per day, 8500/s peak
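
To put the totals and the peaks on the same scale, the totals can be converted to average per-second rates. This is only a back-of-the-envelope sketch: the 30-day month is an assumption (the slide does not define "month"), and `average_rate` is a helper name made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(total: float, days: float) -> float:
    """Average events per second for `total` events spread evenly over `days` days."""
    return total / (days * SECONDS_PER_DAY)


# Assumption: a 30-day month.
avg_requests = average_rate(4e9, days=30)
avg_queries = average_rate(800e6, days=1)

print(f"average requests/s:    {avg_requests:.0f} (quoted peak: 3000)")
print(f"average SQL queries/s: {avg_queries:.0f} (quoted peak: 8500)")
```

Under these assumptions the request peak is roughly twice the average. The SQL average computed this way comes out above the quoted 8500/s peak, which suggests the daily total and the peak were measured over different scopes; the slide does not say which.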

(opserver – https://github.com/opserver/opserver)

Availability (also boring)

• New York (primary)
• Oregon (secondary)

Deploys

• All day every day
• Rolling deploys through the web tier

(TeamCity)

Fast!
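
A rolling deploy through a web tier can be sketched roughly as follows. This is a generic illustration, not the TeamCity pipeline from the talk: the four callbacks and the server names are hypothetical placeholders for whatever the load balancer and build tooling actually provide.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    servers: Iterable[str],
    remove_from_rotation: Callable[[str], None],
    install_build: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    add_to_rotation: Callable[[str], None],
) -> List[str]:
    """Update one server at a time and halt at the first failed health check.

    Returns the servers that were updated and put back into rotation.
    """
    updated: List[str] = []
    for server in servers:
        remove_from_rotation(server)  # stop sending traffic to this node
        install_build(server)         # copy the new build and restart the app
        if not is_healthy(server):
            # The failed node stays out of rotation; untouched nodes keep
            # serving the previous build.
            raise RuntimeError(f"{server} failed its health check; deploy halted")
        add_to_rotation(server)
        updated.append(server)
    return updated


if __name__ == "__main__":
    done = rolling_deploy(
        ["web01", "web02", "web03"],  # hypothetical host names
        remove_from_rotation=lambda s: print(f"{s}: out of rotation"),
        install_build=lambda s: print(f"{s}: installing build"),
        is_healthy=lambda s: True,
        add_to_rotation=lambda s: print(f"{s}: back in rotation"),
    )
    print("updated:", done)
```

Because only one node is out of rotation at a time, the rest of the tier keeps serving traffic throughout the deploy.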

Testing

• Test on our users
• Feature flag
  – Turn it on for a subset of sites to see how it performs
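
A per-site feature flag of the kind described can be as small as a set of enabled sites. The sketch below is an assumption about the general shape, not Stack Overflow's implementation; `FeatureFlag`, `render_sidebar`, and the site names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """A named flag that is on only for an explicit subset of sites."""

    name: str
    enabled_sites: Set[str] = field(default_factory=set)

    def enable_for(self, site: str) -> None:
        self.enabled_sites.add(site)

    def is_enabled(self, site: str) -> bool:
        return site in self.enabled_sites


def render_sidebar(site: str, flag: FeatureFlag) -> str:
    """Pick the code path for this request based on the flag."""
    return "new sidebar" if flag.is_enabled(site) else "old sidebar"


new_sidebar = FeatureFlag("new-sidebar")
new_sidebar.enable_for("small-site")  # hypothetical low-traffic site

print(render_sidebar("small-site", new_sidebar))  # new sidebar
print(render_sidebar("big-site", new_sidebar))    # old sidebar
```

Starting with a small site limits the blast radius while real traffic shows how the change performs.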

* Works for us!

• Read-heavy load centered on one page
• Not as much customized content as some sites
• A forgiving community

How did we get here?

Our Process

1. Start with what we know

2. Measure it live

3. Fix the slow

Step 1: Start with what we know

• Original developers knew C# and MSSQL
• Started with a bunch of off-the-shelf tools:
  – ASP.NET MVC
  – LINQ to SQL
  – MSSQL + SQL fulltext search
  – Built-in caching (no Redis)

Step 2: Measure it live

• Performance is a feature!
• Test under real load
• Measure, don’t guess
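
Measuring live usually means timing named steps inside each real request. The following is a minimal sketch of that idea in Python; `StepProfiler` and the step names are hypothetical and do not reflect the API of any particular profiling library.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class StepProfiler:
    """Collects (depth, name, elapsed_ms) for nested steps within one request."""

    def __init__(self) -> None:
        self.timings: List[Tuple[int, str, float]] = []
        self._depth = 0

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        depth = self._depth
        self._depth += 1
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._depth -= 1
            # Steps are recorded when they finish, so inner steps appear
            # before the outer step that contains them.
            self.timings.append((depth, name, elapsed_ms))

    def report(self) -> str:
        return "\n".join(
            f"{'  ' * depth}{name}: {ms:.1f} ms" for depth, name, ms in self.timings
        )


profiler = StepProfiler()
with profiler.step("render question page"):
    with profiler.step("sql: load question"):
        time.sleep(0.01)  # stand-in for a real query
print(profiler.report())
```

Running a wrapper like this on production requests, rather than on synthetic benchmarks, is what surfaces the slow steps that real load actually hits.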

(miniprofiler – https://github.com/MiniProfiler/dotnet)

(opserver – https://github.com/opserver/opserver)

Step 3: Fix the slow

• Slow performance is a bug, fix it now!
• Over time, replace major parts of our stack:
  – Caching and Redis
  – SQL access
  – Tag Engine
  – Elasticsearch

Dapper

• Already hand-rolling queries for performance
• LINQ to SQL provides basic ORM:

Dapper

• Problem:

Dapper

• Solution: replace the object mapper
• Idea: emit raw IL, then cache mapper
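
The "generate a specialized mapper once, then cache it" idea can be illustrated in Python, with generated source code standing in for emitted IL. This is an analogy only, not Dapper's implementation or API: `get_mapper`, `_mapper_cache`, and the `Post` class are invented for the sketch, and column names are assumed to be trusted identifiers.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple

Mapper = Callable[[Sequence[Any]], Any]

# One generated mapper per (target class, column list).
_mapper_cache: Dict[Tuple[type, Tuple[str, ...]], Mapper] = {}


def get_mapper(cls: type, columns: Sequence[str]) -> Mapper:
    """Return a function that turns a result row into an instance of `cls`."""
    key = (cls, tuple(columns))
    mapper = _mapper_cache.get(key)
    if mapper is None:
        if not all(col.isidentifier() for col in columns):
            raise ValueError("column names must be valid identifiers")
        # Generate straight-line code such as: return cls(id=row[0], title=row[1])
        args = ", ".join(f"{col}=row[{i}]" for i, col in enumerate(columns))
        source = f"def mapper(row):\n    return cls({args})\n"
        namespace: Dict[str, Any] = {"cls": cls}
        exec(source, namespace)  # the expensive step, done once per key
        mapper = namespace["mapper"]
        _mapper_cache[key] = mapper
    return mapper


@dataclass
class Post:
    id: int
    title: str


to_post = get_mapper(Post, ["id", "title"])
print(to_post((1, "Hello")))  # Post(id=1, title='Hello')
```

After the first call for a given shape, mapping a row costs one dictionary lookup plus a direct constructor call, with no per-row reflection.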

Dapper

• Results (500 iterations):

(dapper – https://code.google.com/p/dapper-dot-net/)

Tag Engine

• Early hack: use SQL fulltext search to index tags

Tag Engine

• Problem:

• Performance!

Tag Engine

• Highly custom in-memory tag index cache
• Carefully memory-managed to avoid GC stalls
  – Learned the hard way: see “Assault by GC” by Marc Gravell
• Serialize / deserialize from disk on build
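
The slides do not describe the Tag Engine's internals, so the following is only a guess at the general shape of an in-memory tag index: an inverted index from tag to question ids, kept in compact typed arrays (one way to hold millions of ids without millions of small objects) and written to and read from a binary stream. `TagIndex` and its methods are hypothetical names.

```python
import io
import pickle
from array import array
from typing import BinaryIO, Dict, Iterable, List, Sequence, Tuple


class TagIndex:
    """Maps each tag to a compact, ascending array of question ids."""

    def __init__(self, postings: Dict[str, array]) -> None:
        self._postings = postings

    @classmethod
    def build(cls, questions: Iterable[Tuple[int, Sequence[str]]]) -> "TagIndex":
        postings: Dict[str, array] = {}
        # Visit questions in id order so every posting list stays sorted.
        for question_id, tags in sorted(questions, key=lambda q: q[0]):
            for tag in set(tags):
                postings.setdefault(tag, array("q")).append(question_id)
        return cls(postings)

    def query(self, *tags: str) -> List[int]:
        """Ids of questions that carry every given tag, in ascending order."""
        if not tags:
            return []
        lists = [self._postings.get(tag) for tag in tags]
        if any(posting is None for posting in lists):
            return []
        lists.sort(key=len)  # start from the rarest tag
        matches = set(lists[0])
        for posting in lists[1:]:
            matches.intersection_update(posting)
        return sorted(matches)

    def save(self, stream: BinaryIO) -> None:
        pickle.dump(self._postings, stream)

    @classmethod
    def load(cls, stream: BinaryIO) -> "TagIndex":
        return cls(pickle.load(stream))


index = TagIndex.build([(1, ["c#", "linq"]), (2, ["c#"]), (3, ["c#", "linq", "sql"])])
buffer = io.BytesIO()  # stands in for a file on disk
index.save(buffer)
buffer.seek(0)
restored = TagIndex.load(buffer)
print(restored.query("c#", "linq"))  # [1, 3]
```

Loading a prebuilt index from disk avoids rebuilding it from the database each time the process starts.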

Results

1. Start with what we know

2. Measure it live

3. Fix the slow

Optimize for performance, get scale thrown in

Results

• “Monolith Plus” architecture
• Extract services that solve real problems, not imagined ones
• Avoid SOA “tax”

So my primary guideline would be don’t even consider microservices unless you have a system that’s too complex to manage as a monolith

- Martin Fowler, “MicroservicePremium”

Conclusions

1. Our architecture is boring

2. How we keep it boring is interesting:

1. Start with what we know

2. Measure it live

3. Fix the slow

Application

• You can optimize for performance and get scale thrown in (almost for free)
• Your monolith can scale further than you think
• SOA is not the only way
  – Know your own problem space
  – Fix actual problems

Questions? (We’re all about questions)

Obligatory:

• We’re hiring! stackexchange.com/work-here
• Open source! stackexchange.github.io
• Follow me! twitter.com/df07

Here Be Dragons (rejected slides)

Caching

• Started with basic OutputCache (cache rendered HTML for a page)
• ~4% cache hit rate

Caching

• Add in-memory & Redis caching
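
Combining in-memory and Redis caching typically means checking a per-process cache first, then a shared cache, and only then doing the expensive work. The sketch below assumes that layering; it is not the talk's code, and a plain dict stands in for the shared Redis store so the example runs without a server. `TwoTierCache` and its parameters are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

Entry = Tuple[float, Any]  # (expiry timestamp, cached value)


class TwoTierCache:
    """Check a per-process cache, then a shared store, then compute the value."""

    def __init__(
        self,
        shared: Dict[str, Entry],
        ttl_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._local: Dict[str, Entry] = {}
        self._shared = shared  # stand-in for a shared Redis instance
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self._clock()
        for tier in (self._local, self._shared):
            entry = tier.get(key)
            if entry is not None and entry[0] > now:
                self._local[key] = entry  # promote shared hits to the local tier
                return entry[1]
        # Miss in both tiers: do the expensive work once and fill both.
        value = load()
        entry = (now + self._ttl, value)
        self._local[key] = entry
        self._shared[key] = entry
        return value


shared_store: Dict[str, Entry] = {}
web01 = TwoTierCache(shared_store, ttl_seconds=60)
web02 = TwoTierCache(shared_store, ttl_seconds=60)

print(web01.get("question:1", lambda: "rendered by web01"))
print(web02.get("question:1", lambda: "rendered by web02"))  # served from the shared tier
```

The second server never runs its loader because the first server already populated the shared tier.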

StackExchange.Redis

• Wrote our own library for talking to Redis
• Multiplexing operations over a single connection
• Aware of primary / secondary instances
  – Can target reads at a secondary
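
Multiplexing over a single connection can be sketched as follows: many callers write commands to one shared channel, and a single reader hands each reply to the oldest waiting caller. This is a Python illustration of the general technique, not the StackExchange.Redis API. It assumes replies arrive in the same order as the commands, and it uses in-memory queues and a made-up `fake_server` in place of a real socket.

```python
import asyncio
from collections import deque
from typing import Deque, List


class MultiplexedConnection:
    """Many callers share one connection; replies are matched to requests in FIFO order."""

    def __init__(self, outbound: asyncio.Queue, inbound: asyncio.Queue) -> None:
        self._outbound = outbound
        self._inbound = inbound
        self._pending: Deque[asyncio.Future] = deque()
        self._reader = asyncio.create_task(self._read_replies())

    async def execute(self, command: str) -> str:
        reply = asyncio.get_running_loop().create_future()
        # No await between these two lines, so the order of pending futures
        # always matches the order of commands on the wire.
        self._pending.append(reply)
        self._outbound.put_nowait(command)
        return await reply

    async def _read_replies(self) -> None:
        while True:
            text = await self._inbound.get()
            self._pending.popleft().set_result(text)

    def close(self) -> None:
        self._reader.cancel()


async def fake_server(requests: asyncio.Queue, replies: asyncio.Queue) -> None:
    """Hypothetical remote end: answers each command in the order received."""
    while True:
        command = await requests.get()
        await replies.put(f"reply to {command}")


async def demo() -> List[str]:
    requests: asyncio.Queue = asyncio.Queue()
    replies: asyncio.Queue = asyncio.Queue()
    server = asyncio.create_task(fake_server(requests, replies))
    connection = MultiplexedConnection(requests, replies)
    # Three concurrent callers, one underlying connection.
    results = await asyncio.gather(
        *(connection.execute(f"GET key{i}") for i in range(3))
    )
    connection.close()
    server.cancel()
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Sharing one connection this way avoids a connection per caller and lets commands be pipelined instead of waiting for each round trip in turn.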

StackExchange.Redis

(opserver – https://github.com/opserver/opserver)

Moonspeak (Localization)