I volunteer as tribute: the future of oncall (Uptime)

Post on 17-Mar-2018

Category: Technology

TRANSCRIPT

  • @bridgetkromhout

    I volunteer as tribute

    the future of

    oncall

  • @bridgetkromhout

    lives: Minneapolis, Minnesota

    works: Pivotal

    podcasts: Arrested DevOps

    organizes: devopsdays

    Bridget Kromhout

  • @bridgetkromhout

    traded oncall for more travel (similar effect on sleep)

  • @bridgetkromhout

    things fall apart

  • @bridgetkromhout

    In a world that celebrates pioneers be the settlers instead.

    Laura Bell (@lady_nerd)

  • @bridgetkromhout

    previously, on #opslife

  • @bridgetkromhout

    Image credit: James Ernest

  • @bridgetkromhout Image credit: 00abstrahiert99 on Flickr

    but #opslife means I'm a cynical realist

  • @bridgetkromhout

    Attack Kitten is

    skeptical about

    NoOps

  • @bridgetkromhout

    Attack Kitten Cat Reality Check

  • @bridgetkromhout

    empathy

  • @bridgetkromhout

    serverless (in the brave new cloudy-with-a-chance-of-containers world)

  • @bridgetkromhout

    two-pizza silo

  • @bridgetkromhout Image credit: Wikipedia

    Any organization that designs a system will produce a design

    whose structure is a copy of the organization's

    communication structure.

    Mel Conway

  • @bridgetkromhout

    Image credit: Vasa Museet

    probably fine

  • @bridgetkromhout

    in a perfect world

  • @bridgetkromhout

    for ops, don't tell devs: gl;hf!

    do: automate, document, share

  • @bridgetkromhout

    for devs, build for operability:

    observability, debuggability, reality

  • @bridgetkromhout

    The Wall of Confusion

  • @bridgetkromhout

    The Wall of Confusion

    yolo nope

  • @bridgetkromhout

    Image credit: wikimedia

  • @bridgetkromhout

    "The past is never dead. It's not even past." William Faulkner

  • @bridgetkromhout

    limited custom dev; network incidents

    oncall handled by ops only

    Image credit: Wallpaper Up

  • @bridgetkromhout

    limited custom dev; colo incidents

    oncall handled by ops only

  • @bridgetkromhout

    low trust; difficult to grant partial access

    oncall handled by ops only

  • @bridgetkromhout

    everyone's on call!!1!

    high trust; variable ability

    Image credit: Robot Unicorn Attack 2

  • @bridgetkromhout

    ops on call; devs available

    building trust; variable visibility

  • @bridgetkromhout

    shared oncall; branching decision tree

    follow-the-sun if possible

  • @bridgetkromhout

    oncall investments: architecture, observability, culture

  • @bridgetkromhout

    keep on shipping (implementation details vary)

  • @bridgetkromhout

    tree failure?!?

  • @bridgetkromhout

    architecture: plan for continuous partial failure

  • @bridgetkromhout

    CAP: Consistency, Availability, Partition Tolerance (pairings: CA, CP, AP)

    a partition is a time bound on communication.

    Eric Brewer

  • @bridgetkromhout

    observability: answering questions we didn't know to ask

  • @bridgetkromhout

    observability: understand your environment

  • @bridgetkromhout

    monitoring: the old way

  • @bridgetkromhout

    monitoring: the new way

  • @bridgetkromhout

    monitoring needs of

    The business: UX data for product & engineering; measure value delivered

    Information Technology: visibility into state and failures; product & engineering decisions; measure success of projects

    The Art of Monitoring (2016), James Turnbull, artofmonitoring.com

  • @bridgetkromhout

    culture of collaboration

  • @bridgetkromhout

    a tranquil beach... or is it?

  • @bridgetkromhout

    learning culture: be adaptable

  • @bridgetkromhout

    Computers are easy; people are hard

  • @bridgetkromhout

    Massively scalable fault-tolerant distributed systems require a

    significant engineering effort to build and operate; complex socio-technical systems are even more challenging.

    Computers are easy; people are hard

  • @bridgetkromhout

    Who owns your availability? The answer may surprise you!

    Image credit: Wikipedia

  • @bridgetkromhout

    not actually 20 units of devops

  • @bridgetkromhout

    silos are for grain

  • @bridgetkromhout

    still computers

  • oncall blood and tears don't scale

    @bridgetkromhout gif credit: @paddyforan

  • @bridgetkromhout

    don't volunteer as tribute

  • @bridgetkromhout

    don't volunteer as tribute

    invest in architecture, observability, culture

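The shared-oncall slide suggests follow-the-sun coverage where possible. A minimal sketch of that idea, assuming three hypothetical regional teams with made-up UTC shift windows (the region names, hours, and the `region_on_call` helper are illustrative and do not come from the talk):

```python
from datetime import datetime, timezone

# Hypothetical coverage windows as UTC hours [start, end); not from the talk.
SHIFTS = [
    ("APAC", 0, 8),
    ("EMEA", 8, 16),
    ("AMER", 16, 24),
]


def region_on_call(now: datetime) -> str:
    """Return the region whose shift window covers the given moment."""
    # Normalize to UTC so callers can pass any timezone-aware datetime.
    hour = now.astimezone(timezone.utc).hour
    for region, start, end in SHIFTS:
        if start <= hour < end:
            return region
    # Only reachable if SHIFTS is edited to leave some hour unowned.
    raise ValueError("coverage gap: no region owns this hour")


if __name__ == "__main__":
    print(region_on_call(datetime.now(timezone.utc)))
```

Raising on an uncovered hour is a deliberate choice in this sketch: a gap in the schedule is better found when the rotation is planned than when a page goes unanswered.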