Scaling Stack Overflow (QCon NYC 2015)

Upload: dfullerton

Post on 19-Aug-2015

Page 1: Scaling Stack Overflow (QCon NYC 2015)

1

Scaling Stack Overflow

David Fullerton, VP Engineering • @df07

QCon NYC • 2015-06-12

Page 2: Scaling Stack Overflow (QCon NYC 2015)

2

**SPOILERS**

Page 3: Scaling Stack Overflow (QCon NYC 2015)

3

Conclusions

1. Our architecture is boring

Page 4: Scaling Stack Overflow (QCon NYC 2015)

4

Conclusions

1. Our architecture is boring

2. How we keep it boring is interesting

Page 5: Scaling Stack Overflow (QCon NYC 2015)

5

What’s Stack Overflow?

Page 6: Scaling Stack Overflow (QCon NYC 2015)

6

Q&A for Programmers

• 9.4M questions
• 16M answers
• 45M uniques / month
• 8,000 new questions every day

(quantcast.com/stackoverflow.com)

Page 7: Scaling Stack Overflow (QCon NYC 2015)

7

Developer Jobs

• Best place on the internet to get a programming job or hire a developer

Page 8: Scaling Stack Overflow (QCon NYC 2015)

8

Part of Stack Exchange Network

• Stack Overflow-style Q&A in 143 other topics & languages

Page 9: Scaling Stack Overflow (QCon NYC 2015)

9

A Distributed Team

• 34 developers, 6 sysadmins, 6 designers
• 75% remote

Page 11: Scaling Stack Overflow (QCon NYC 2015)

11

How do we work?

• Remote work culture
• Hire smart people and get out of their way
• Full-stack developers / sysadmins with a specialty

Page 12: Scaling Stack Overflow (QCon NYC 2015)

12

Our Architecture (I warned you, it’s boring)

Page 13: Scaling Stack Overflow (QCon NYC 2015)

13

(stackexchange.com/performance)

Page 14: Scaling Stack Overflow (QCon NYC 2015)

14

“Monolith Plus” architecture

• Almost everything happens in the web tier + DB

• A few services pulled out and optimized

Page 15: Scaling Stack Overflow (QCon NYC 2015)

15

Scales pretty well (for us)

• 4 billion requests per month, 3000 req/s peak
• 800M SQL queries per day, 8500/s peak

Page 16: Scaling Stack Overflow (QCon NYC 2015)

16

(opserver – https://github.com/opserver/opserver)

Page 17: Scaling Stack Overflow (QCon NYC 2015)

17

(opserver – https://github.com/opserver/opserver)

Page 18: Scaling Stack Overflow (QCon NYC 2015)

18

Availability (also boring)

New York (primary)

Oregon (secondary)

Page 19: Scaling Stack Overflow (QCon NYC 2015)

19

Deploys

• All day every day
• Rolling deploys through the web tier

(TeamCity)

Fast!
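
A rolling deploy through the web tier can be sketched roughly as follows. This is a minimal illustration, not the TeamCity setup from the talk: `WebServer`, `rolling_deploy`, the build names, and the `health_check` callback are all hypothetical, and the "deploy" is just a version string being swapped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WebServer:
    # Hypothetical model of one web-tier machine behind a load balancer.
    name: str
    version: str
    in_rotation: bool = True


def rolling_deploy(servers: List[WebServer], new_version: str,
                   health_check: Callable[[WebServer], bool]) -> bool:
    """Upgrade one server at a time so the rest keep serving traffic."""
    for server in servers:
        previous = server.version
        server.in_rotation = False      # drain: stop routing requests here
        server.version = new_version    # stand-in for copying the new build
        if not health_check(server):
            server.version = previous   # roll this server back
            server.in_rotation = True
            return False                # stop; remaining servers are untouched
        server.in_rotation = True       # healthy: take traffic again
    return True


servers = [WebServer(f"web-{i:02d}", "build-100") for i in range(1, 4)]
ok = rolling_deploy(servers, "build-101", health_check=lambda s: True)
```

Because only one server is out of rotation at any moment, a bad build is caught on the first machine it reaches and the others keep running the previous version.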

Page 20: Scaling Stack Overflow (QCon NYC 2015)

20

Testing

• Test on our users
• Feature flag
– Turn it on for a subset of sites to see how it performs
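
A per-site feature flag of the kind described above might look like this. It is a sketch under assumptions: the `FeatureFlags` class, the `"new-editor"` feature, and the site names are invented for illustration and are not taken from the Stack Overflow codebase.

```python
from typing import Dict, Set


class FeatureFlags:
    """Per-site feature switches; a feature is off unless enabled for a site."""

    def __init__(self) -> None:
        self._enabled: Dict[str, Set[str]] = {}

    def enable(self, feature: str, *sites: str) -> None:
        self._enabled.setdefault(feature, set()).update(sites)

    def disable(self, feature: str, *sites: str) -> None:
        self._enabled.get(feature, set()).difference_update(sites)

    def is_enabled(self, feature: str, site: str) -> bool:
        return site in self._enabled.get(feature, set())


flags = FeatureFlags()
# Hypothetical rollout: try the feature on two small sites first.
flags.enable("new-editor", "small-site-a", "small-site-b")


def render_editor(site: str) -> str:
    # Both code paths ship in the same build; the flag picks one at runtime.
    return "new editor" if flags.is_enabled("new-editor", site) else "old editor"
```

Since the flag is checked at request time, the feature can be switched off for a site without another deploy if it performs badly.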

Page 21: Scaling Stack Overflow (QCon NYC 2015)

21

* Works for us!

• Read-heavy load centered on one page
• Not as much customized content as some sites
• A forgiving community

Page 22: Scaling Stack Overflow (QCon NYC 2015)

22

Page 23: Scaling Stack Overflow (QCon NYC 2015)

23

How did we get here?

Page 24: Scaling Stack Overflow (QCon NYC 2015)

24

Our Process

1. Start with what we know

2. Measure it live

3. Fix the slow

Page 25: Scaling Stack Overflow (QCon NYC 2015)

25

Step 1: Start with what we know

• Original developers knew C# and MSSQL
• Started with a bunch of off-the-shelf tools:
– ASP.NET MVC
– LINQ to SQL
– MSSQL + SQL fulltext search
– Built-in caching (no Redis)

Page 26: Scaling Stack Overflow (QCon NYC 2015)

26

Step 2: Measure it live

• Performance is a feature!
• Test under real load
• Measure, don’t guess
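
"Measure, don’t guess" can be made concrete with a small per-request step timer. The sketch below is only loosely in the spirit of a request profiler; `RequestProfiler` and its methods are hypothetical names, not the MiniProfiler API, and the `sleep` calls stand in for real work.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class RequestProfiler:
    """Collects named timings (in milliseconds) for one request."""

    def __init__(self) -> None:
        self.timings: List[Tuple[str, float]] = []

    @contextmanager
    def step(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the step even if the wrapped code raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timings.append((name, elapsed_ms))

    def slowest(self) -> Tuple[str, float]:
        return max(self.timings, key=lambda t: t[1])


profiler = RequestProfiler()
with profiler.step("load question"):
    time.sleep(0.02)     # stand-in for a SQL query
with profiler.step("render page"):
    time.sleep(0.005)    # stand-in for view rendering
```

Running this on live requests, rather than in a benchmark, is what lets the slowest step on a real page be found and fixed first.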

Page 27: Scaling Stack Overflow (QCon NYC 2015)

27

(miniprofiler – https://github.com/MiniProfiler/dotnet)

Page 28: Scaling Stack Overflow (QCon NYC 2015)

28

(miniprofiler – https://github.com/MiniProfiler/dotnet)

Page 29: Scaling Stack Overflow (QCon NYC 2015)

29

(opserver – https://github.com/opserver/opserver)

Page 30: Scaling Stack Overflow (QCon NYC 2015)

30

(opserver – https://github.com/opserver/opserver)

Page 31: Scaling Stack Overflow (QCon NYC 2015)

31

Step 3: Fix the slow

• Slow performance is a bug, fix it now!
• Over time, replace major parts of our stack:
– Caching and Redis
– SQL access
– Tag Engine
– Elasticsearch

Page 32: Scaling Stack Overflow (QCon NYC 2015)

32

Dapper

• Already hand-rolling queries for performance
• LINQ to SQL provides basic ORM:

Page 33: Scaling Stack Overflow (QCon NYC 2015)

33

Dapper

• Problem:

Page 34: Scaling Stack Overflow (QCon NYC 2015)

34

Dapper

• Solution: replace the object mapper
• Idea: emit raw IL, then cache mapper

Page 35: Scaling Stack Overflow (QCon NYC 2015)

35

Dapper

• Results (500 iterations):

(dapper – https://code.google.com/p/dapper-dot-net/)

Page 36: Scaling Stack Overflow (QCon NYC 2015)

36

Tag Engine

Page 37: Scaling Stack Overflow (QCon NYC 2015)

37

Tag Engine

• Early hack: use SQL fulltext search to index tags

Page 38: Scaling Stack Overflow (QCon NYC 2015)

38

Tag Engine

• Problem:

Page 39: Scaling Stack Overflow (QCon NYC 2015)

39

Tag Engine

• Problem:

• Performance!

Page 40: Scaling Stack Overflow (QCon NYC 2015)

40

Tag Engine

• Highly custom in-memory tag index cache
• Carefully memory-managed to avoid GC stalls
– Learned the hard way: see “Assault by GC” by Marc Gravell
• Serialize / deserialize from disk on build
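
The slides do not show how the tag index is laid out. One plausible shape for an in-memory tag index is an inverted index from tag to question ids, sketched below. `TagIndex` and its methods are hypothetical, and the sketch ignores the memory management and disk serialization mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class TagIndex:
    """In-memory inverted index: tag -> set of question ids (an assumed layout)."""

    def __init__(self) -> None:
        self._by_tag: Dict[str, Set[int]] = defaultdict(set)

    def add(self, question_id: int, tags: Iterable[str]) -> None:
        for tag in tags:
            self._by_tag[tag].add(question_id)

    def query(self, *tags: str) -> List[int]:
        """Ids of questions carrying every given tag, highest id first."""
        if not tags:
            return []
        # Intersect starting from the smallest set to keep the work low.
        sets = sorted((self._by_tag.get(t, set()) for t in tags), key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return sorted(result, reverse=True)


index = TagIndex()
index.add(1, ["c#", "linq"])
index.add(2, ["c#", "redis"])
index.add(3, ["c#", "linq", "sql"])
matches = index.query("c#", "linq")   # questions tagged with both
```

Answering tag queries from memory like this avoids a fulltext search round trip to SQL for every tag page.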

Page 41: Scaling Stack Overflow (QCon NYC 2015)

41

Results

Page 42: Scaling Stack Overflow (QCon NYC 2015)

42

Results

1. Start with what we know

2. Measure it live

3. Fix the slow

Optimize for performance, get scale thrown in

Page 43: Scaling Stack Overflow (QCon NYC 2015)

43

Results

• “Monolith Plus” architecture
• Extract services that solve real problems, not imagined ones
• Avoid SOA “tax”

Page 44: Scaling Stack Overflow (QCon NYC 2015)

44

“So my primary guideline would be don’t even consider microservices unless you have a system that’s too complex to manage as a monolith.”

- Martin Fowler, “MicroservicePremium”

Page 45: Scaling Stack Overflow (QCon NYC 2015)

45

Conclusions

Page 46: Scaling Stack Overflow (QCon NYC 2015)

46

Conclusions

1. Our architecture is boring

2. How we keep it boring is interesting:

1. Start with what we know

2. Measure it live

3. Fix the slow

Page 47: Scaling Stack Overflow (QCon NYC 2015)

47

Application

• You can optimize for performance and get scale thrown in (almost for free)

• Your monolith can scale further than you think
• SOA is not the only way
– Know your own problem space
– Fix actual problems

Page 48: Scaling Stack Overflow (QCon NYC 2015)

48

Questions? (We’re all about questions)

Obligatory:

• We’re hiring! stackexchange.com/work-here

• Open source! stackexchange.github.io
• Follow me! twitter.com/df07

Page 49: Scaling Stack Overflow (QCon NYC 2015)

49

Page 50: Scaling Stack Overflow (QCon NYC 2015)

50

Here Be Dragons (rejected slides)

Page 51: Scaling Stack Overflow (QCon NYC 2015)

51

Caching

• Started with basic OutputCache (cache rendered HTML for a page)
• ~4% cache hit rate

Page 52: Scaling Stack Overflow (QCon NYC 2015)

52

Caching

• Add in-memory & Redis caching
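
Layering an in-memory cache over a shared cache can be sketched as a two-level lookup. This is an illustration under assumptions: `TwoLevelCache` is a hypothetical class, a plain dict stands in for Redis, and only the local level expires entries.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class TwoLevelCache:
    """Check local memory first, then a shared store, then compute the value."""

    def __init__(self, shared: Dict[str, Any], ttl_seconds: float = 60.0) -> None:
        self._local: Dict[str, Tuple[float, Any]] = {}   # key -> (expires_at, value)
        self._shared = shared                            # stand-in for Redis
        self._ttl = ttl_seconds

    def get(self, key: str, compute: Callable[[], Any]) -> Any:
        now = time.monotonic()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # local hit: no network, no SQL
        if key in self._shared:
            value = self._shared[key]        # shared hit: filled by another server
        else:
            value = compute()                # miss: do the expensive work once
            self._shared[key] = value
        self._local[key] = (now + self._ttl, value)
        return value


calls: List[str] = []


def load_question() -> str:
    calls.append("sql")                      # count trips to the database
    return "question 42 body"


shared_store: Dict[str, Any] = {}
web1 = TwoLevelCache(shared_store)
web2 = TwoLevelCache(shared_store)
first = web1.get("question:42", load_question)    # computes and fills both levels
second = web1.get("question:42", load_question)   # served from web1's memory
third = web2.get("question:42", load_question)    # served from the shared store
```

Unlike caching whole rendered pages, caching the underlying data lets different pages and different web servers reuse the same entries.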

Page 53: Scaling Stack Overflow (QCon NYC 2015)

53

StackExchange.Redis

• Wrote our own library for talking to Redis
• Multiplexing operations over a single connection
• Aware of primary / secondary instances
– Can target reads at a secondary
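
Targeting reads at a secondary can be illustrated with a small routing client. The sketch covers only the routing idea, not multiplexing, and it is not the StackExchange.Redis API: `Node`, `RoutingClient`, and the `prefer_secondary` parameter are hypothetical, and replication is faked by copying writes in-process.

```python
from typing import Any, Dict, List, Optional, Tuple


class Node:
    """Stand-in for one Redis server; a real client would hold a socket."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Any] = {}
        self.commands: List[Tuple[str, str]] = []   # log of commands received


class RoutingClient:
    """Writes go to the primary; reads may be sent to a secondary."""

    def __init__(self, primary: Node, secondaries: List[Node]) -> None:
        self._primary = primary
        self._secondaries = secondaries
        self._next = 0

    def set(self, key: str, value: Any) -> None:
        self._primary.commands.append(("SET", key))
        self._primary.data[key] = value
        for node in self._secondaries:
            node.data[key] = value      # stand-in for asynchronous replication

    def get(self, key: str, prefer_secondary: bool = False) -> Optional[Any]:
        node = self._primary
        if prefer_secondary and self._secondaries:
            # Round-robin across secondaries to spread read load.
            node = self._secondaries[self._next % len(self._secondaries)]
            self._next += 1
        node.commands.append(("GET", key))
        return node.data.get(key)


primary = Node("primary")
secondary = Node("secondary")
client = RoutingClient(primary, [secondary])
client.set("question:42:score", 17)
score = client.get("question:42:score", prefer_secondary=True)
```

In a real deployment replication lags slightly, so reads sent to a secondary are best kept to data that tolerates being briefly stale.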

Page 54: Scaling Stack Overflow (QCon NYC 2015)

54

StackExchange.Redis

(opserver – https://github.com/opserver/opserver)

Page 55: Scaling Stack Overflow (QCon NYC 2015)

55

Moonspeak (Localization)