Moving forward under the weight of all that state

Upgrading under the weight of all that state Quinton Anderson

Upload: quinton-anderson

Post on 11-Apr-2017


TRANSCRIPT

Page 1: Moving forward under the weight of all that state

Upgrading under the weight of all that state

Quinton Anderson

Page 2: Moving forward under the weight of all that state

Context

Page 3: Moving forward under the weight of all that state

Canonical Model

Page 4: Moving forward under the weight of all that state

Source

Source

Source

Source

Raw Data

Business Data

Access Layer

Access Layer

Access Layer

Access Layer

Page 5: Moving forward under the weight of all that state

Load Balancer

//TODO

Function

Ctrl-V

Scaling

Page 6: Moving forward under the weight of all that state
Page 7: Moving forward under the weight of all that state

Downstream systems
• Specialised management systems
• Reporting Systems
• Product management

Channel & product systems

Master Data Management

Hadoop
• Leverage all data & reduce integration costs
• Comprehensive dataset – internal & external, realtime & batch, structured & unstructured
• Advanced analytics / machine learning

Group Data Warehouse
• Understand our business
• Accurate, conformed, and reconciled data
• Access layer to support BI & reporting

BI/Reporting
• User facing tools
• Regulatory reporting
• Dashboarding
• Self service BI for the masses

Customer record & insights

All data

Price, conversation, credit decision, etc.

Financial Data

Subset of data

User access

Reconciled data

Information for people

Core Financial Systems and functions
• P&L
• Recon
• General Ledger
• Etc…

Closed loop, automated ‘decisions’

Decisioning
• Personalise/optimise decisions, maximise customer value
• E.g. price, credit decision, next conversation, experience

Core information repositories

Analytics applications

Other systems

Page 8: Moving forward under the weight of all that state

Channels

Hadoop

Rules

Serving and decisioning

Analytic Records

Systems Of Record

Core Banking Payments

Event Processor

Raw Data

Derived Data

Feature Store

Event Store

Scoring

Machine Learning

www

Event Streams

Customer Information data loaded

Data analysed & processed

Insights & events captured

Integration API/Service Discovery

Page 9: Moving forward under the weight of all that state

> 4000 Daily Batch Jobs

Page 10: Moving forward under the weight of all that state

> 6 PB of State and growing

Page 11: Moving forward under the weight of all that state

HBase, Cassandra, HDFS, Influx, Elasticsearch, Kafka, Etcd, Zookeeper, OpenStack Swift

Page 12: Moving forward under the weight of all that state

Oracle, MySQL, Postgres

Page 13: Moving forward under the weight of all that state

Hundreds of services

Page 14: Moving forward under the weight of all that state

MR1, MR2, Spark, Akka

Page 15: Moving forward under the weight of all that state

Dev, Test, Staging, Prod 1, Prod 2, Etc…

Page 16: Moving forward under the weight of all that state

== Complexity

Page 17: Moving forward under the weight of all that state

Imperative:

Page 18: Moving forward under the weight of all that state
Page 19: Moving forward under the weight of all that state
Page 20: Moving forward under the weight of all that state
Page 21: Moving forward under the weight of all that state

Culture

Page 22: Moving forward under the weight of all that state
Page 23: Moving forward under the weight of all that state
Page 24: Moving forward under the weight of all that state

Architecture

Page 25: Moving forward under the weight of all that state

Immutable

Page 26: Moving forward under the weight of all that state

Someone else’s computer

Page 27: Moving forward under the weight of all that state

State Locality

Page 28: Moving forward under the weight of all that state

Workload non-locality

Page 29: Moving forward under the weight of all that state

Flexible over optimal

Page 30: Moving forward under the weight of all that state

Practically, it is a closed system

Page 31: Moving forward under the weight of all that state

State management is my problem

Page 32: Moving forward under the weight of all that state

All abstractions are leaky

Page 33: Moving forward under the weight of all that state

Repo(s) CI/CD Apps

Docker Calico

Mesos Yarn

Spark, MR, Impala, etc

Marathon, Chronos, Cassandra, etc

CI/CD

CI/CD

Repo(s)

Repo(s)

OpenStack

Nova

Nova/Ironic

OS

KVM

OS

Firmware + Hardware + Tags

Page 34: Moving forward under the weight of all that state

Strategies

Page 35: Moving forward under the weight of all that state

Outsource the problem, and tool away the resulting issues

Page 36: Moving forward under the weight of all that state

Delete it, tool away the resulting issues

Page 37: Moving forward under the weight of all that state

Be stateless, tool away the resulting issues

Page 38: Moving forward under the weight of all that state

Implement some patterns, incrementally optimise. Tool away the resulting issues

Page 39: Moving forward under the weight of all that state

Excess Capacity

Page 40: Moving forward under the weight of all that state

Patterns

Page 41: Moving forward under the weight of all that state

Consumer

Router

DB

Old Old

Web

App

DB

Web

App

Page 42: Moving forward under the weight of all that state

Consumer

Router

DB

Old Old

Web

App

DB

Web

App

Page 43: Moving forward under the weight of all that state

L4

HAProxy

Old Old Old Old

Page 44: Moving forward under the weight of all that state

L4

HAProxy

Old Old Old Old New

Page 45: Moving forward under the weight of all that state

L4

HAProxy

Old Old Old Old New

Page 46: Moving forward under the weight of all that state

L4

HAProxy

Old Old Old Old New

Page 47: Moving forward under the weight of all that state

L4

HAProxy

Old Old Old New New

Page 48: Moving forward under the weight of all that state

L4

HAProxy

Old Old New New New

Page 49: Moving forward under the weight of all that state

L4

HAProxy

Old New New New New

Page 50: Moving forward under the weight of all that state

L4

HAProxy

New New New New New

Page 51: Moving forward under the weight of all that state

== Incrementally accept risk
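The sequence on the preceding slides (one New backend added behind HAProxy, then each Old backend swapped in turn) can be sketched roughly as follows. This is a minimal illustration, not the tooling used in the talk: `rolling_upgrade` and `is_healthy` are hypothetical names, and the pool is modelled as a plain list of labels rather than real HAProxy backends.

```python
from typing import Callable, List


def rolling_upgrade(pool: List[str], is_healthy: Callable[[int], bool]) -> List[List[str]]:
    """Add one 'New' backend, then swap each 'Old' backend in turn.

    Every step is gated on a health check, so risk is accepted one
    backend at a time. Returns the pool after each successful step.
    """
    pool = list(pool)
    states = [list(pool)]

    # Step 1: use spare capacity to add a single New backend alongside the Old ones.
    pool.append("New")
    if not is_healthy(len(pool) - 1):
        pool.pop()  # the new backend never became healthy: remove it and stop
        return states
    states.append(list(pool))

    # Step 2: replace Old backends one by one, right to left as on the slides.
    for i in reversed(range(len(pool))):
        if pool[i] != "Old":
            continue
        pool[i] = "New"
        if not is_healthy(i):
            pool[i] = "Old"  # revert only this backend and halt the rollout
            break
        states.append(list(pool))
    return states


if __name__ == "__main__":
    for state in rolling_upgrade(["Old"] * 4, lambda index: True):
        print(" ".join(state))
```

Because each swap is followed by a health check, a bad release affects at most one backend before the rollout stops.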

Page 52: Moving forward under the weight of all that state

In place upgrade

Page 53: Moving forward under the weight of all that state

Stateful

Page 54: Moving forward under the weight of all that state
Page 55: Moving forward under the weight of all that state

CAP, PACELC

Page 56: Moving forward under the weight of all that state

Data models

Page 57: Moving forward under the weight of all that state

Atomicity

Page 58: Moving forward under the weight of all that state

Access patterns

Page 59: Moving forward under the weight of all that state

Implementation approaches = ??

Page 60: Moving forward under the weight of all that state

Upgrade Duration O(N)

Page 61: Moving forward under the weight of all that state

# Upgrade one node at a time, so total duration grows linearly with the node count
for node in nodes:
    if info[node]['instance']:
        if Status(node).run().wait() == AVAILABLE_FOR_MAINTENANCE:
            MaintenanceMode(node).run().wait()
            Upgrade(node).run().wait()
            health = HealthTests(node).run().wait()
            UpdateStatus(node, health).run().wait()

Page 62: Moving forward under the weight of all that state

all_good = True
host = self.cdh.get_host(self.host_map[self.node_name])
if host.healthSummary != 'GOOD':
    all_good = False

# Look up the host by its roles
for c in self.cdh.get_all_clusters():
    for s in c.get_all_services():
        for r in s.get_all_roles():
            h = r.hostRef
            if h.hostId == self.host_map[self.node_name]:
                if r.healthSummary != 'GOOD':
                    all_good = False

return all_good

Page 63: Moving forward under the weight of all that state

O(log N)
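The slide does not say how the upgrade gets from O(N) to O(log N). One plausible reading, offered here purely as an assumption, is to upgrade in batches that double in size (1, 2, 4, 8, ...), so confidence from each healthy batch is spent on a larger next batch. The helper name `doubling_batches` is hypothetical.

```python
from typing import List, Sequence


def doubling_batches(nodes: Sequence[str]) -> List[List[str]]:
    """Split nodes into batches of size 1, 2, 4, 8, ... (the last may be smaller).

    If every node in a batch is upgraded in parallel, the number of
    sequential rounds grows logarithmically with the number of nodes.
    """
    batches: List[List[str]] = []
    start, size = 0, 1
    while start < len(nodes):
        batches.append(list(nodes[start:start + size]))
        start += size
        size *= 2
    return batches


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(1000)]
    plan = doubling_batches(nodes)
    print(len(plan), [len(batch) for batch in plan])
```

The trade-off is blast radius: the later batches are large, so this only makes sense when the early, small batches are given enough time to surface problems.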

Page 64: Moving forward under the weight of all that state
Page 65: Moving forward under the weight of all that state

// Per-node steps, sequenced in order
nodeComputation = for {
  status <- Status(node)
  _ <- MaintenanceMode(status, node)
  _ <- Upgrade(node)
  nodeResult <- HealthTests(node)
} yield nodeResult

// Apply the node computation to every node in a group
upgrade = for {
  node <- group
  comp <- nodeComputation(node)
} yield comp.exec

groups.map(upgrade)
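The same shape (an ordered pipeline per node, applied across groups) can be sketched in Python to match the earlier sequential loop. This is an illustration under assumptions: `upgrade_node`, `upgrade_groups`, and the step callables are hypothetical stand-ins, nodes within a group run on a thread pool, and the rollout stops after the first group containing a failure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[str], bool]  # a step takes a node name and reports success


def upgrade_node(node: str, steps: List[Step]) -> bool:
    """Run the steps in order for one node, stopping at the first failure."""
    return all(step(node) for step in steps)


def upgrade_groups(groups: List[List[str]], steps: List[Step]) -> Dict[str, bool]:
    """Upgrade each group in turn, with the nodes inside a group in parallel."""
    results: Dict[str, bool] = {}
    for group in groups:
        with ThreadPoolExecutor(max_workers=max(1, len(group))) as executor:
            outcomes = list(executor.map(lambda n: upgrade_node(n, steps), group))
        results.update(zip(group, outcomes))
        if not all(outcomes):
            break  # do not touch later groups once a group has a failure
    return results


if __name__ == "__main__":
    def always_ok(node: str) -> bool:
        return True

    print(upgrade_groups([["a"], ["b", "c"]], [always_ok, always_ok]))
```

Grouping by failure domain (for example rack or replica set) is one way to choose the groups, though the talk does not specify how they were formed.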

Page 66: Moving forward under the weight of all that state

Repo(s) CI/CD Apps

Docker Calico

Mesos Yarn

Spark, MR, Impala, etc

Marathon, Chronos, Cassandra, etc

CI/CD

CI/CD

Repo(s)

Repo(s)

OpenStack

Nova

Nova/Ironic

OS

KVM

OS

Firmware + Hardware + Tags

Page 67: Moving forward under the weight of all that state

Workflow

Page 68: Moving forward under the weight of all that state

Jenkins

Environment

Branch PR

Merge

Dev

Deploy

Master

Deploy

Test

Change Plan

Page 69: Moving forward under the weight of all that state

clusters:
  green-cluster:
    dns:
      nameservers:
        - x.x.x.x
      data_domain: *.*.*
    etcd:
      token: green-cluster
    masters:
      able:
        provision_id: 1
        lan:
          - mac: 0c:c4:7a:c1:2e:92
            ip: 1.1.11.151/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: a7af76ad-6583-4209-ba5f-cf1477b6405e
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
      theta:
        provision_id: 2
        lan:
          - mac: 0c:c4:7a:a9:04:0c
            ip: 1.1.11.53/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: 8ff1fd1c-4893-11e6-a447-2f366077ca0e
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
      tobias:
        provision_id: 3
        lan:
          - mac: 0c:c4:7a:a8:f6:ac
            ip: 1.11.11.52/24
            vlan: 11
            gateway: 1.1.1.1
        ironic_id: c89fdd08-232c-40fe-b965-49fc3e4dcba7
        flavor: ramish-baremetal-flavor2
        image: *mesos-master-green
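The workflow slide mentions a "Change Plan" but the deck does not show how one is produced. A minimal sketch, assuming the plan is simply the difference between the masters currently provisioned and the masters declared in a cluster definition like the one above, might look like this. The function name `change_plan` and the `mesos-master-blue` image are hypothetical; the host names and flavor are taken from the slide.

```python
from typing import Dict, List


def change_plan(current: Dict[str, dict], desired: Dict[str, dict]) -> List[str]:
    """Describe the actions needed to move from the current hosts to the desired ones."""
    plan: List[str] = []
    for name in sorted(desired.keys() - current.keys()):
        plan.append(f"provision {name}")
    for name in sorted(current.keys() - desired.keys()):
        plan.append(f"decommission {name}")
    for name in sorted(current.keys() & desired.keys()):
        keys = current[name].keys() | desired[name].keys()
        changed = sorted(k for k in keys if current[name].get(k) != desired[name].get(k))
        if changed:
            plan.append(f"update {name}: {', '.join(changed)}")
    return plan


if __name__ == "__main__":
    flavor = "ramish-baremetal-flavor2"
    current = {"able": {"flavor": flavor, "image": "mesos-master-blue"}}  # hypothetical old image
    desired = {
        "able": {"flavor": flavor, "image": "mesos-master-green"},
        "theta": {"flavor": flavor, "image": "mesos-master-green"},
    }
    for action in change_plan(current, desired):
        print(action)
```

Producing the plan as plain text before anything is applied gives a reviewer something concrete to approve in the pull request.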

Page 70: Moving forward under the weight of all that state
Page 71: Moving forward under the weight of all that state

Recommendations

Page 72: Moving forward under the weight of all that state

Instrument as much of deployment and provisioning as you can
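As one small illustration of what instrumenting a deployment step could mean, the sketch below records how long each named step takes, including steps that fail. It is an assumption about the kind of instrumentation intended, not something shown in the talk; `timed` and `timings` are hypothetical names, and a real setup would ship the numbers to a metrics system rather than keep them in a dictionary.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Durations in seconds, keyed by step name.
timings: Dict[str, List[float]] = {}


@contextmanager
def timed(step: str) -> Iterator[None]:
    """Record the wall-clock duration of a step, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings.setdefault(step, []).append(time.monotonic() - start)


if __name__ == "__main__":
    with timed("upgrade"):
        time.sleep(0.01)  # stand-in for the real work
    print(timings)
```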

Page 73: Moving forward under the weight of all that state

Optimise incrementally, learn the right hard lessons

Page 74: Moving forward under the weight of all that state

Allow for manual intervention, but attack it aggressively

Page 75: Moving forward under the weight of all that state

Encourage your people to intervene

Page 76: Moving forward under the weight of all that state

Prevent Pets

Page 77: Moving forward under the weight of all that state

Spend more time on testing