Transcript
Page 1: I volunteer as tribute: the future of oncall (Uptime)

@bridgetkromhout

I volunteer as tribute

the future of

oncall

Page 2: I volunteer as tribute: the future of oncall (Uptime)

lives: Minneapolis,

Minnesota

works: Pivotal

podcasts: Arrested DevOps

organizes: devopsdays

Bridget Kromhout

Page 3: I volunteer as tribute: the future of oncall (Uptime)

traded oncall… for more travel (similar effect on sleep)

Page 4: I volunteer as tribute: the future of oncall (Uptime)

things fall apart

Page 5: I volunteer as tribute: the future of oncall (Uptime)

“In a world that celebrates pioneers— be the settlers instead.”

— Laura Bell (@lady_nerd)

Page 6: I volunteer as tribute: the future of oncall (Uptime)

previously, on #opslife…

Page 7: I volunteer as tribute: the future of oncall (Uptime)

Image credit: James Ernest

Page 8: I volunteer as tribute: the future of oncall (Uptime)

Image credit: 00abstrahiert99 on Flickr

…but #opslife means I’m a

cynical realist

Page 9: I volunteer as tribute: the future of oncall (Uptime)

Page 10: I volunteer as tribute: the future of oncall (Uptime)

Page 11: I volunteer as tribute: the future of oncall (Uptime)

Attack Kitten is skeptical about NoOps

Page 12: I volunteer as tribute: the future of oncall (Uptime)

Attack Kitten Cat Reality Check

Page 13: I volunteer as tribute: the future of oncall (Uptime)

empathy

Page 14: I volunteer as tribute: the future of oncall (Uptime)

Page 15: I volunteer as tribute: the future of oncall (Uptime)

serverless (in the brave new cloudy-with-a-chance-of-containers world)

Page 16: I volunteer as tribute: the future of oncall (Uptime)

serverless (in the brave new cloudy-with-a-chance-of-containers world)

Page 17: I volunteer as tribute: the future of oncall (Uptime)

two-pizza silo

Page 18: I volunteer as tribute: the future of oncall (Uptime)

Image credit: Wikipedia

“Any organization that designs a system… will produce a design whose structure is a copy of the organization's communication structure.”

Mel Conway

Page 19: I volunteer as tribute: the future of oncall (Uptime)

Image credit: Vasa Museet

probably fine

Page 20: I volunteer as tribute: the future of oncall (Uptime)

in a perfect world

Page 21: I volunteer as tribute: the future of oncall (Uptime)

for ops, don’t tell devs: gl;hf!

do: automate, document, share

Page 22: I volunteer as tribute: the future of oncall (Uptime)

for devs, build for operability:

observability, debuggability, reality

Page 23: I volunteer as tribute: the future of oncall (Uptime)

The Wall of Confusion

Page 24: I volunteer as tribute: the future of oncall (Uptime)

The Wall of Confusion

yolo nope

Page 25: I volunteer as tribute: the future of oncall (Uptime)

Image credit: wikimedia

Page 26: I volunteer as tribute: the future of oncall (Uptime)

“The past is never dead. It's not even past.” William Faulkner

Page 27: I volunteer as tribute: the future of oncall (Uptime)

limited custom dev; network incidents

oncall handled by ops only

Image credit: Wallpaper Up

Page 28: I volunteer as tribute: the future of oncall (Uptime)

limited custom dev; colo incidents

oncall handled by ops only

Page 29: I volunteer as tribute: the future of oncall (Uptime)

low trust; difficult to grant partial access

oncall handled by ops only

Page 30: I volunteer as tribute: the future of oncall (Uptime)

everyone’s on call!!1!

high trust; variable ability

Image credit: Robot Unicorn Attack 2

Page 31: I volunteer as tribute: the future of oncall (Uptime)

ops on call; devs available

building trust; variable visibility

Page 32: I volunteer as tribute: the future of oncall (Uptime)

shared oncall; branching decision tree

follow-the-sun if possible
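
A follow-the-sun rotation hands the pager to whichever region is in its working day. The sketch below is only an illustration: the region names and the eight-hour UTC windows in `SHIFTS` are hypothetical, and real schedules would also need holidays, overrides, and escalation.

```python
from datetime import datetime, timezone

# Hypothetical regions with UTC shift windows: (name, start hour inclusive, end hour exclusive).
SHIFTS = [
    ("APAC", 0, 8),
    ("EMEA", 8, 16),
    ("AMER", 16, 24),
]


def region_on_call(now: datetime) -> str:
    """Return the region whose shift window covers the given timezone-aware time."""
    hour = now.astimezone(timezone.utc).hour
    for region, start, end in SHIFTS:
        if start <= hour < end:
            return region
    raise ValueError(f"no shift covers hour {hour}")


if __name__ == "__main__":
    print(region_on_call(datetime.now(timezone.utc)))
```

Because each window falls in local daytime, nobody in this toy schedule is paged overnight, which is the point of the slide.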

Page 33: I volunteer as tribute: the future of oncall (Uptime)

oncall investments: architecture, observability, culture

Page 34: I volunteer as tribute: the future of oncall (Uptime)

Page 35: I volunteer as tribute: the future of oncall (Uptime)

keep on shipping (implementation details vary)

Page 36: I volunteer as tribute: the future of oncall (Uptime)

tree failure?!?

Page 37: I volunteer as tribute: the future of oncall (Uptime)

Page 38: I volunteer as tribute: the future of oncall (Uptime)

architecture: plan for continuous partial failure
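
One common way to plan for partial failure is to retry a flaky dependency a bounded number of times and then degrade to a fallback instead of failing the whole request. This is a minimal sketch, not something taken from the talk; `with_fallback` and its parameters are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[], T], attempts: int = 3) -> T:
    """Try the primary call up to `attempts` times, then return the fallback result."""
    for _ in range(attempts):
        try:
            return primary()
        except Exception:
            # Treat any error as a transient partial failure and try again.
            continue
    return fallback()


if __name__ == "__main__":
    def flaky() -> str:
        raise TimeoutError("dependency unavailable")

    # Falls back to a degraded answer rather than propagating the error.
    print(with_fallback(flaky, lambda: "cached response"))
```

A production version would usually add backoff between attempts and catch only the errors known to be transient.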

Page 39: I volunteer as tribute: the future of oncall (Uptime)

CAP diagram: Consistency, Availability, Partition Tolerance (pairings: CA, CP, AP)

“a partition is a time bound on communication.” Eric Brewer

Page 40: I volunteer as tribute: the future of oncall (Uptime)

observability: answering questions we didn’t know to ask
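
One way to be able to answer questions nobody anticipated is to record wide, structured events and slice them by any field afterwards. The sketch below assumes that approach; the `events` sample data and the `count_by` helper are hypothetical.

```python
from collections import Counter
from typing import Any, Callable

# Hypothetical structured events, one per request, with as much context as is cheap to record.
events = [
    {"route": "/cart", "status": 500, "region": "us-east", "duration_ms": 912},
    {"route": "/cart", "status": 200, "region": "eu-west", "duration_ms": 48},
    {"route": "/login", "status": 500, "region": "us-east", "duration_ms": 1030},
    {"route": "/login", "status": 200, "region": "us-east", "duration_ms": 51},
]


def count_by(
    rows: list[dict[str, Any]],
    field: str,
    where: Callable[[dict[str, Any]], bool] = lambda e: True,
) -> Counter:
    """Count matching events grouped by an arbitrary field chosen at query time."""
    return Counter(e.get(field) for e in rows if where(e))


if __name__ == "__main__":
    # A question not planned for in advance: which regions are seeing server errors?
    print(count_by(events, "region", where=lambda e: e["status"] >= 500))
```

The grouping field and the filter are chosen when the question is asked, not when the event is written.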

Page 41: I volunteer as tribute: the future of oncall (Uptime)

observability: understand your environment

Page 42: I volunteer as tribute: the future of oncall (Uptime)

monitoring: the old way

Page 43: I volunteer as tribute: the future of oncall (Uptime)

Monitoring

monitoring: the new way

Page 44: I volunteer as tribute: the future of oncall (Uptime)

monitoring needs of…

The business: UX data for product & engineering; measure value delivered

Information Technology: visibility into state and failures; product & engineering decisions; measure success of projects

Source: The Art of Monitoring (2016), James Turnbull, artofmonitoring.com

Page 45: I volunteer as tribute: the future of oncall (Uptime)

culture of collaboration

Page 46: I volunteer as tribute: the future of oncall (Uptime)

a tranquil beach… or is it?

Page 47: I volunteer as tribute: the future of oncall (Uptime)

Page 48: I volunteer as tribute: the future of oncall (Uptime)

Page 49: I volunteer as tribute: the future of oncall (Uptime)

learning culture: be adaptable

Page 50: I volunteer as tribute: the future of oncall (Uptime)

Computers are easy; people are hard

Page 51: I volunteer as tribute: the future of oncall (Uptime)

Massively scalable fault-tolerant distributed systems require a significant engineering effort to build and operate; complex socio-technical systems are even more challenging.

Computers are easy; people are hard

Page 52: I volunteer as tribute: the future of oncall (Uptime)

Who owns your availability? The answer may surprise you!

Image credit: Wikipedia

Page 53: I volunteer as tribute: the future of oncall (Uptime)

not actually 20 units of devops

Page 54: I volunteer as tribute: the future of oncall (Uptime)

silos are for grain

Page 55: I volunteer as tribute: the future of oncall (Uptime)

Page 56: I volunteer as tribute: the future of oncall (Uptime)

still computers

Page 57: I volunteer as tribute: the future of oncall (Uptime)

oncall blood and tears don’t scale

gif credit: @paddyforan

Page 58: I volunteer as tribute: the future of oncall (Uptime)

oncall blood and tears don’t scale

gif credit: @paddyforan

Page 59: I volunteer as tribute: the future of oncall (Uptime)

don’t volunteer as tribute

Page 60: I volunteer as tribute: the future of oncall (Uptime)

don’t volunteer as tribute

invest in architecture, observability, culture

Page 61: I volunteer as tribute: the future of oncall (Uptime)

Page 62: I volunteer as tribute: the future of oncall (Uptime)
