operational and scaling wins at workday · 2019. 2. 26. · cassandra rabbitmq zookeeper contrail...

Operational and Scaling Wins at Workday From 50K to 300K Cores OpenStack Summit Berlin 2018


Operational and Scaling Wins at Workday: From 50K to 300K Cores

OpenStack Summit Berlin 2018


Edgar Magana, Imtiaz Chowdhury

Architecture Overview and Use Cases

Kyle Jorgensen

Clearing the Image Distribution Bottleneck

Sergio de Carvalho

Identifying and Fighting Scaling Issues

Howard Abrams

Monitoring, Logging and Metrics

Moderator / Image Challenges / API Challenges / Instrumentation


Workday provides enterprise cloud applications for financial management, human capital management (HCM), payroll, student systems, and analytics.


OpenStack @ Workday

Our Story


Our Journey So Far

Timeline: 2013, 2014, 2015, 2016, 2017, 2018, 2019

Milestones, in slide order:

● Cloud Engineering team formed

● OpenStack Icehouse in development - internal workload

● Deployment automation tools ready - 2 Workday services in QA

● First production workload

● OpenStack Mitaka development - 14 services

● Production workload on Mitaka - 39 services

● 50% of production workloads on OpenStack


Workday Private Cloud Growth

Revenue US $273M


Our Private Cloud

5 Data Centers

45 Clusters

4k Active VM Images

4.6k Compute Hosts

300k Cores

22k Running VMs


How Workday Uses the Private Cloud

Weekly update Narrow Update Window

https://www.blockchainsemantics.com/blog/immutable-blockchain/

Immutable Images


Architecture Evolution


Initial Control Plane Architecture

OpenStack Controller: MySQL, glance, keystone, nova, rabbitmq

SDN Controller: Cassandra, rabbitmq, zookeeper, Contrail API


Key drivers for architectural evolution

● 0 downtime upgrade: provide an upgrade path without affecting the workload

● High availability (99%): make critical services highly available

● Scalability (400%): scale API services horizontally


Control Plane

HAProxy 1

Controllers

HAProxy 2

Clients

OpenStack Controllers

SDN Controllers

Stateless API services Stateful services


Logging and Monitoring and Metrics, Oh My!

Instrumentation


● No access to production systems: full automation

● Dispersed logs among multiple systems

● Sporadic issues with services:

“What do you mean RabbitMQ stopped!?”

● Vague or subjective concerns:

“Why is the system slow!?”

Instrumentation Challenges


OpenStack Node

Instrumentation Architecture

Big Panda

Alerts

Wavefront

Metrics

Log Collector

Logs

HA ELK

Log Messages

Sensu Client

Uchiwa

Checks


For each issue, we:

● Fixed the issue/bug

● Wrote tests to address the issue/bug

● Wrote a check to alert if it happened again

Monitoring


Our customers use our project (OpenStack) in a particular way…

For each node in each cluster, test by:

● Start a VM with a particular image

● Check that DNS resolves the host name

● Verify the SSH service

● Validate LDAP access

● Stop the VM

Rinse and Repeat

Example: Our Health Check
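The health check steps above can be sketched as a small runner. This is only an illustration, not the check Workday runs: `run_health_check`, `HealthResult`, and the injected `start_vm` / `checks` / `stop_vm` callables are hypothetical names, and the real DNS, SSH, and LDAP probes are left to the caller. The one design point it tries to show is that the VM is always stopped, even when a probe fails.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class HealthResult:
    """Outcome of one health-check pass against a single node."""
    node: str
    passed: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failed


def run_health_check(
    node: str,
    start_vm: Callable[[str], Any],
    checks: List[Tuple[str, Callable[[Any], None]]],
    stop_vm: Callable[[Any], None],
) -> HealthResult:
    """Start a VM on `node`, run each named check, then stop the VM.

    A check signals failure by raising an exception. All callables are
    hypothetical hooks supplied by the caller.
    """
    result = HealthResult(node)
    try:
        vm = start_vm(node)
    except Exception as exc:
        # Without a VM there is nothing else to test or clean up.
        result.failed.append(("start_vm", str(exc)))
        return result
    result.passed.append("start_vm")
    try:
        for name, check in checks:
            try:
                check(vm)
                result.passed.append(name)
            except Exception as exc:
                result.failed.append((name, str(exc)))
    finally:
        # Always stop the VM so a failed probe does not leak capacity.
        try:
            stop_vm(vm)
            result.passed.append("stop_vm")
        except Exception as exc:
            result.failed.append(("stop_vm", str(exc)))
    return result
```

A scheduler would call this for each node in each cluster and raise an alert whenever `result.ok` is false.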


CRITICAL: Health validation suite had failures. Connection Error - While attempting to get VM details. See logging system with r#3FBM for details.

Troubleshooting Issues

Internal Wiki Support Documents

Check Failure Details

Internal Logging Collection System


Troubleshooting with Logs


There’s death, and then there’s illness…

Metrics

If all the compute node load levels are down here…

What is this guy doing up here?
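One way to spot the "ill" node automatically is to compare each compute node's load with the fleet median. The sketch below is an assumption for illustration, not the query behind the slide's chart: `find_load_outliers` is a hypothetical helper, and the 3.5 cutoff is a commonly used default for the modified z-score rather than a value from the talk.

```python
from statistics import median
from typing import Dict, List


def find_load_outliers(loads: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return hosts whose load sits far above the fleet median.

    Uses the modified z-score, 0.6745 * (x - median) / MAD, which stays
    stable even when one node is wildly off.
    """
    values = list(loads.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the fleet reports the same load: flag anything above it.
        return sorted(host for host, v in loads.items() if v > center)
    return sorted(
        host
        for host, v in loads.items()
        if 0.6745 * (v - center) / mad > threshold
    )
```

Only nodes above the median are flagged, matching the "what is this guy doing up here?" case.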


Dashboards to Track Changes

nbproc=1 -mc -set2

nbproc=1 +mc -set2

nbproc=1 +mc +set2

nbproc=2 -mc -set2

nbproc=2 +mc -set2

nbproc=2 +mc +set2


Transient Dashboards

What’s up with MySQL?


Instrumentation Takeaways

● Can’t scale if you can’t tweak. Can’t tweak if you can’t monitor.

● Collect and filter all the logs

● Create checks for everything...especially running services

● Invest in a good metric visualization tool:

○ Create focused graphs

○ Dashboards start with key metrics (correlated to your service level agreements)

○ Be able to create one-shots and special-cases

○ Learn how to accurately monitor all the OpenStack services

○ Overview/Summary

○ Networking Services

○ Network Traffic

○ HAProxy

○ RabbitMQ

○ MySQL

○ Cassandra

○ Zookeeper

○ Hardware (CPU Load / Disk)


Clearing the Image Distribution Bottleneck

Image Distribution


Challenge: Control Plane Usage

Example - Nova Scheduler response time


Example - Count of deployed VMs


Large images: worst offender

~6 GB: image size

~1700: instance count across DCs
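A rough sense of scale for these two numbers can be worked out as follows. This is a back-of-the-envelope sketch under loud assumptions: it treats every instance as triggering its own image download (the worst case, with no cache hits), and the 10 Gbit/s link speed in the usage note is hypothetical, not a figure from the talk.

```python
def total_transfer_gb(image_gb: float, downloads: int) -> float:
    """Total data moved if every download pulls the full image."""
    return image_gb * downloads


def transfer_seconds(total_gb: float, link_gbit_per_s: float) -> float:
    """Time to move `total_gb` over one link, ignoring protocol overhead."""
    gb_per_s = link_gbit_per_s / 8.0  # 8 bits per byte
    return total_gb / gb_per_s
```

With the slide's figures, `total_transfer_gb(6, 1700)` gives about 10,200 GB; pushed through a single assumed 10 Gbit/s path, `transfer_seconds(10200, 10)` comes to 8,160 seconds, a little over two hours.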


Problem

Glance

Compute Compute Compute Compute

Many VM boots in short period of time + large images = bottleneck


Cache Cache Cache Cache


SLOW...


curl https://<host>:8774/v2.1/image_prefetch -X POST \
  ...
  -H "X-Auth-Token: MIIOvwYJKoZIQcCoIIOsDCCDasdkoas=" \
  -H "Content-Type: application/json" \
  -d '{ "image_id": "d5ac4b1a-9abe-4f88-8f5f-7896ece564b9" }'

Solution: Extend Nova API

Operator
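The same request can be built from Python. This is a minimal sketch: `image_prefetch` is the extension shown on the slide, not part of the standard Nova API, and `build_prefetch_request` plus the host and token values passed to it are hypothetical. Only the URL path, headers, and body shape are taken from the curl command; the sketch builds the request and does not send it.

```python
import json
import urllib.request


def build_prefetch_request(host: str, token: str, image_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks for an image prefetch."""
    url = f"https://{host}:8774/v2.1/image_prefetch"
    body = json.dumps({"image_id": image_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out because it depends on the deployment's TLS and authentication setup.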


Operator, Nova API, Nova Compute, libvirtd driver, Nova Conductor, Nova DB API


HTTP/1.1 202 Accepted
Content-Type: application/json
Content-Length: 50
X-Compute-Request-Id: req-f7a3bd10-ab76-427f-b6ee-79b92fc2a978
Date: Mon, 02 Jul 2018 20:52:37 GMT

{"job_id": "f7a3bd10-ab76-427f-b6ee-79b92fc2a978"}

(Async job)

Solution: Extend Nova API

Operator

Nova API


curl https://<host>:8774/v2.1/image_prefetch/image/<image_ID>

...

OR

curl https://<host>:8774/v2.1/image_prefetch/job/<job_ID>

...

Solution: Extend Nova API

Operator Nova API Nova DB API


HTTP/1.1 200 OK ...

{
  "overall_status": "5 of 10 hosts done. 0 errors.",
  "image_id": "d5ac4b1a-9abe-4f88-8f5f-7896ece564b9",
  "job_id": "f7a3bd10-ab76-427f-b6ee-79b92fc2a978",
  "total_errors": 0,
  "num_hosts_done": 5,
  "start_time": "2018-07-02T20:52:37.000000",
  "num_hosts_downloading": 2,
  "error_hosts": 0,
  "num_hosts": 10
}

Solution: Extend Nova API

Operator

Nova API
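A client polling the job could reduce the status payload above to a progress summary. The sketch below is an illustration only: `PrefetchProgress` and `parse_prefetch_status` are hypothetical names, the field names are copied from the slide's response, and the rule "finished when done plus errored hosts reaches the total" is an assumption about how the job completes.

```python
import json
from dataclasses import dataclass


@dataclass
class PrefetchProgress:
    """Host counts taken from an image-prefetch status response."""
    done: int
    downloading: int
    error_hosts: int
    total: int

    @property
    def fraction_done(self) -> float:
        return self.done / self.total if self.total else 0.0

    @property
    def finished(self) -> bool:
        # Assumption: every host ends up either done or in error.
        return self.done + self.error_hosts >= self.total


def parse_prefetch_status(payload: str) -> PrefetchProgress:
    """Parse the JSON body returned by the status endpoints."""
    data = json.loads(payload)
    return PrefetchProgress(
        done=data["num_hosts_done"],
        downloading=data["num_hosts_downloading"],
        error_hosts=data["error_hosts"],
        total=data["num_hosts"],
    )
```

For the response on the slide this yields 5 of 10 hosts done, 2 still downloading, and `finished` false.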


Before

Cache hit

• Average VM boot time reduced by 300 sec

• VM creation failure rate decreased by 20%

After

Image Prefetch API Result


HAProxy Bottleneck

Load balancer

Nova Compute

GET image

Glance API (x3)

Download

307 redirect

HTTPD (x3)


• Under heavy load, downloading images can be a bottleneck

‒ Contribute image prefetch back to community

• HA Tradeoffs

• API-specific monitoring allows for unique insights

Image Distribution: Key Takeaways


Identifying and Fighting Scaling Issues

API Challenges


Nova Metadata API

14 seconds!

Average response time (sec)

Each VM makes > 20 API requests


Nova Metadata API & Database Transfer Rate

Average response time (sec): 14 seconds!

Bytes sent (MB/sec): 1 GB/sec

Each VM makes > 20 API requests


SELECT ...
FROM (SELECT ...
      FROM instances
      WHERE instances.deleted = 0
        AND instances.uuid = ?
      LIMIT 1) AS instances
LEFT OUTER JOIN instance_system_metadata
  ON instances.uuid = instance_system_metadata.instance_uuid
LEFT OUTER JOIN instance_extra
  ON instance_extra.instance_uuid = instances.uuid
LEFT OUTER JOIN instance_metadata
  ON instance_metadata.instance_uuid = instances.uuid
  AND instance_metadata.deleted = 0
...

Top Query by “Rows Sent”


Instance Object-Relational Mapping

instances (1) to (N) instance_metadata

instances (1) to (N) instance_system_metadata


Expected result set (metadata union): 50 + 50 = 100 rows

Actual result set (metadata product): 50 x 50 = 2,500 rows!
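The row blow-up can be reproduced on a toy schema. The sketch below uses an in-memory SQLite database rather than the MySQL database from the talk, and the three tables are cut down to the few columns the joins need (the `meta_key` column and the `vm-1` UUID are made up for the example). The `instance_extra` join is left out because it is one-to-one and does not change the row count.

```python
import sqlite3

JOINED_QUERY = """
SELECT COUNT(*)
FROM (SELECT * FROM instances
      WHERE instances.deleted = 0 AND instances.uuid = ?
      LIMIT 1) AS instances
LEFT OUTER JOIN instance_system_metadata
  ON instances.uuid = instance_system_metadata.instance_uuid
LEFT OUTER JOIN instance_metadata
  ON instance_metadata.instance_uuid = instances.uuid
  AND instance_metadata.deleted = 0
"""


def count_rows(n_meta: int = 50, n_sys_meta: int = 50):
    """Return (rows from the joined query, rows from two separate queries)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE instances (uuid TEXT PRIMARY KEY, deleted INTEGER);
        CREATE TABLE instance_metadata (
            instance_uuid TEXT, meta_key TEXT, deleted INTEGER);
        CREATE TABLE instance_system_metadata (
            instance_uuid TEXT, meta_key TEXT);
    """)
    db.execute("INSERT INTO instances VALUES ('vm-1', 0)")
    db.executemany(
        "INSERT INTO instance_metadata VALUES ('vm-1', ?, 0)",
        [(f"user-{i}",) for i in range(n_meta)],
    )
    db.executemany(
        "INSERT INTO instance_system_metadata VALUES ('vm-1', ?)",
        [(f"sys-{i}",) for i in range(n_sys_meta)],
    )
    # Both one-to-many joins hang off the same instance row, so every
    # metadata row pairs with every system-metadata row.
    joined = db.execute(JOINED_QUERY, ("vm-1",)).fetchone()[0]
    # Fetching each table on its own returns only the rows that exist.
    separate = (
        db.execute(
            "SELECT COUNT(*) FROM instance_metadata "
            "WHERE instance_uuid = ? AND deleted = 0", ("vm-1",)
        ).fetchone()[0]
        + db.execute(
            "SELECT COUNT(*) FROM instance_system_metadata "
            "WHERE instance_uuid = ?", ("vm-1",)
        ).fetchone()[0]
    )
    db.close()
    return joined, separate
```

With 50 rows in each metadata table the joined query returns 2,500 rows while the two separate queries return 100 in total, matching the figures on the slide.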


https://bugs.launchpad.net/nova/+bug/1799298

Thanks to Dan Smith & Matt Riedemann!


Commit: Avoid lazy-loads in metadata requests (Feb 5 2016)

The metadata server currently doesn't pre-query for metadata and system_metadata, which ends up generating *two* lazy-loads on many requests. Since especially user metadata is almost definitely one of the things an instance is going to fetch from the metadata server, this is fairly inefficient.

--- a/nova/api/metadata/base.py
+++ b/nova/api/metadata/base.py
 def get_metadata_by_instance_id(instance_id, address, ctxt=None):
     ctxt = ctxt or context.get_admin_context()
     instance = objects.Instance.get_by_uuid(
-        ctxt, instance_id, expected_attrs=['ec2_ids', 'flavor', 'info_cache'])
+        ctxt, instance_id, expected_attrs=['ec2_ids', 'flavor', 'info_cache',
+                                           'metadata', 'system_metadata'])
     return InstanceMetadata(instance, address)

Nova Pre-loads Metadata Tables (since Mitaka)


Reverting Metadata Pre-load

Baseline test: average response time 2.2 sec, bytes sent 700 MB/sec

No metadata pre-load: average response time 0.5 sec, bytes sent 345 MB/sec


Can We Do Better?

HAProxy

VM

Nova Metadata API (x3)

GET metadata


Database


Memcached!


Enabling Memcached

Baseline test: average response time 2.2 sec, bytes sent 700 MB/sec

Memcached enabled: average response time 0.2 sec, bytes sent 400 MB/sec
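Why a cache helps here can be shown with a read-through cache in front of the database lookup. This is a sketch, not Nova's implementation: `TTLCache` is a hypothetical in-process stand-in for Memcached, and the loader callable represents the heavy instance query. Since each VM makes more than 20 metadata requests, only the first one within the expiry window needs to reach the database.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for Memcached, for illustration only."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], dict]) -> dict:
        """Return the cached value for `key`, calling `loader` on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database query
        value = loader(key)  # cache miss or expired: query once
        self._store[key] = (now + self._ttl, value)
        return value
```

With Memcached the cache is shared by all API servers behind the load balancer; an in-process cache like this one would have to be warmed separately on each server.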


No Metadata pre-load + Memcached

No metadata pre-load Memcached enabled Both


Root Causes

● Heavy SQL query: product of metadata tables

● No Memcached: repeated database fetching

● HA architecture: multiple API servers fetching data through load balancers

● Lots of metadata: booting many VMs simultaneously with lots of metadata


Fixes

● Reduced SQL load: rolled back pre-load of metadata tables (2-line code change)

● Memcached: enabled Memcached (3-line config change)

● HA architecture: SQL proxy? Clustered Memcached?

● Lots of metadata: reduce (ab)use of metadata?


Questions?