Google Confidential and Proprietary

Bad Machinery: Managing Interrupts Under Load

Dave O'Connor (daveoc@), Storage SRE

Whut

● Google SRE since 2004
  ○ GMail
  ○ Google Apps Console
  ○ Reader
  ○ Google Accounts
  ○ Google Analytics
  ○ BigTable
  ○ Colossus
  ○ Spanner
  ○ Logs
  ○ MySQL

Interrupts vs. Projects

Interrupts

● More tactical, immediate fixes.

● Usually not so creative.

● More often than not, no fun to do.

Projects

● Contribute to development of service medium/long term.

● Highly Creative.

● (Theoretically) fun to do.

How Interrupts work in Theory

How Interrupts work in Practice

Why the model breaks down

● Interrupt load is too much for 1+2.

● Interrupts are specialised to a person or subset of the team.

● Intentionally.

Antipatterns

● "The Gauntlet"

● "The Busy Worker"

● "The Amazing Disappearing Category"

Things We Must Discuss

● Context Switches are Hard

Things We Must Discuss

● Cognitive Flow State is Hard

Things We Must Discuss

Oncall is a Project

Oncall does not care whether you consider it a priority. It just is.

Things We Must Discuss

Fairness is Easy to Program

...if you assume people are machines

Fundamentals

● Humans are Bad Machinery.

"Humans are bad machinery. They get bored, they have processors (and sometimes UIs) that aren’t very well-understood, and aren’t very efficient"

● Cognitive Flow is Precious.

● Time should be Polarised. Do one thing well.

Practicals

● Polarise Time

Practicals

● Think about Interruptibility

Practicals

Do For Tickets What You Do For Pages

Practicals

Respect Yourselves

Other Lessons Learned

● Email alerts are from the past.

● Consensus is nice.

● Policy is as powerful a tool as code.

● The A stands for agreement.

How Can I Apply this?

● Minimise time the individual can be interrupted.

● Do for Interrupts what you do for Oncall.

● Respect the Customer and Respect Yourself.