SRE in Startup
What is SRE?
"What happens when a software engineer is tasked with what used to be called operations."
» Ben Treynor Sloss, Vice President, Google Engineering, founder of Google SRE
"Our work is like being part of the world's most intense pit crew. We change the tires of a race car as it's going 100 mph."
» Andrew Widdowson, Site Reliability Engineer, Mountain View
In general, an SRE team is responsible for:
» availability
» latency
» performance
» efficiency
» change management
» monitoring
» emergency response
» capacity planning
If the team agrees on a 99.9% SLA, that gives them an error budget of 0.1%.
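As a rough illustration of the arithmetic on this slide, the sketch below converts an availability target into an error budget and into allowed downtime. The function names and the 30-day window are assumptions made for the example; the slide only states the 99.9% and 0.1% figures.

```python
def error_budget(sla_percent: float) -> float:
    """Return the error budget as a fraction, e.g. about 0.001 for a 99.9% target."""
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    return (100.0 - sla_percent) / 100.0


def allowed_downtime_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Minutes of full outage the budget permits over the window.

    The 30-day default window is an assumption for this sketch.
    """
    minutes_in_window = window_days * 24 * 60
    return error_budget(sla_percent) * minutes_in_window


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves about 43.2 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime per 30 days, which is the budget the team can spend on launches and experiments.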
Rule
» If the service is within SLA, launch away: clearly the Dev team is doing a good job
» If the service is not within SLA, freeze launches until enough error budget is earned back
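The rule above could be sketched as a simple launch gate. This is a minimal example, assuming the budget is measured in failed requests over some window; `ServiceWindow`, `budget_remaining`, and `launch_allowed` are hypothetical names introduced for illustration, not something the talk defines.

```python
from dataclasses import dataclass


@dataclass
class ServiceWindow:
    """Request counts observed over one measurement window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def budget_remaining(window: ServiceWindow, sla_percent: float) -> float:
    """Failed requests the service can still afford in this window."""
    allowed_failures = window.total_requests * (100.0 - sla_percent) / 100.0
    return allowed_failures - window.failed_requests


def launch_allowed(window: ServiceWindow, sla_percent: float) -> bool:
    """Within SLA: launch away. Budget exhausted: launch freeze."""
    return budget_remaining(window, sla_percent) >= 0


if __name__ == "__main__":
    healthy = ServiceWindow(total_requests=1_000_000, failed_requests=500)
    burned = ServiceWindow(total_requests=1_000_000, failed_requests=1_500)
    print(launch_allowed(healthy, 99.9))  # budget left, launches continue
    print(launch_allowed(burned, 99.9))   # budget overspent, launches freeze
```

The freeze lifts on its own once a later window shows the failure count back under the allowance, which matches the idea of earning the budget back.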
Error budget
» removes SRE-Dev conflict
» Dev teams self-police
Common staffing pool
» one more SRE = one fewer Dev
SRE hires only coders
» they get bored easily
» they speak the same language as Dev
50% cap on ops work
» if you succeed, work scales with traffic
» coding reduces the work/traffic ratio
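One way to make the cap concrete is to track how a team's hours split between operations and project (coding) work. The sketch below is only an illustration: the hour-based accounting, the function names, and the example figures are assumptions, and only the 50% threshold comes from the slide.

```python
OPS_CAP = 0.5  # the 50% cap on ops work named on the slide


def ops_fraction(ops_hours: float, project_hours: float) -> float:
    """Share of total time spent on operations work."""
    total = ops_hours + project_hours
    if total <= 0:
        raise ValueError("total hours must be positive")
    return ops_hours / total


def over_ops_cap(ops_hours: float, project_hours: float, cap: float = OPS_CAP) -> bool:
    """True when ops work exceeds the cap and is crowding out coding time."""
    return ops_fraction(ops_hours, project_hours) > cap


if __name__ == "__main__":
    # Hypothetical week: 25 hours of tickets and oncall, 15 hours of project work.
    print(over_ops_cap(25, 15))
```

A team that stays over the cap has no coding time left to reduce the work/traffic ratio, so its ops load keeps growing with traffic.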
Keep Dev in rotation
» 5% of ops handled by Devs
Speaking of Dev and Ops work
» excess operations load (tickets, oncall, etc.)
SRE portability
» no requirement to stick with project or SRE
Outages
» minimize impact
» prevent recurrence
Minimize damage
» no NOC
» good diagnostic information
» practice, practice, practice
Prevent recurrence
1. Handle event
2. Write post-mortems
3. Reset
Post-mortem philosophy
» blameless, focus on process and technology
» create timeline
» get all facts
» create bugs for all follow-up work
What is specific to SRE in a startup?
1:10
Horizontal team
SaaS oriented
Oncall culture
It's cool work
"May the Queries Flow, And the Pagers Remain Silent"
» SRE Benediction