playstation and cassandra streams (alexander filipchik & dustin pham, sony) | c* summit 2016


Upload: datastax

Post on 06-Jan-2017


TRANSCRIPT

Page 1: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

PlayStation and Cassandra Streams

Cassandra Summit 2016

Page 2: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Who are we?

Alexander Filipchik (PSN: LaserToy), Principal Software Engineer at Sony Interactive Entertainment

Dustin Pham (PSN: quibfan), Principal Software Engineer at Sony Interactive Entertainment

Page 3: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Agenda
• Multi-regional deployment problem
• Proving C* replications will work for us
• Designing a Test System
• Cassandra Streams as a result

Page 4: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

How it all started

We want multiple regions and always on

Page 5: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

A lot of unknowns
• Will it work?
• Will performance degrade?
• How eventual is multiregion eventual consistency?
• Will we hit any roadblocks?
• Well, how many roadblocks will we hit?

Page 6: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

What did we know?

Netflix is doing it and they actually tested it:
• They wrote 1M records in one region of a multi-region cluster
• 500 ms later, a read in the other clusters was initiated
• All records were successfully read

Page 7: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Well…

Some questions to answer:

Should we just trust Netflix’s results, replicate the data, and see what happens?

Is their experiment applicable to our situation?

Can we do better?

Page 8: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016
Page 9: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Some wants:
• Track replication latencies between regions
• Use close to production traffic (both load and data)
• Write/read in all the regions at the same time
• To be able to simulate different disruptions
• To have a reusable system we can use to test our future Cassandra deployments
• Do it in one month

Page 10: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Tracking latencies

To track latencies, we need to record some information when a message arrives on a specific node:

17:06:52 Received from DC1, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333612729000 at 1456333612735000. Diff is: 6000

17:06:53 Received from DC2, R1: update KS Test CF test K 1000 C hello Size 76 Timestamp 1456333613344000 at 1456333613345000. Diff is: 10000
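As an illustration only, the replication lag can be recovered from a log line like the ones above by subtracting the write timestamp from the receive time. The sketch below assumes both values are microseconds since the epoch and that lines follow exactly the format shown; the regular expression and the function name `replication_lag_us` are hypothetical and not taken from the talk.

```python
import re

# Hypothetical pattern for the log format shown above; only the source
# datacenter and the two timestamps are extracted.
LINE_RE = re.compile(
    r"Received from (?P<dc>\w+), \w+: update .*?"
    r"Timestamp (?P<written>\d+) at (?P<received>\d+)"
)


def replication_lag_us(line):
    """Return (source datacenter, lag in microseconds), or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    lag = int(match.group("received")) - int(match.group("written"))
    return match.group("dc"), lag


sample = (
    "17:06:52 Received from DC1, R1: update KS Test CF test K 1000 "
    "C hello Size 76 Timestamp 1456333612729000 at 1456333612735000. "
    "Diff is: 6000"
)
print(replication_lag_us(sample))
```

Recomputing the difference from the two timestamps, rather than trusting the logged "Diff" field, keeps the analysis independent of how the node formatted the line.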

Page 11: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Using real data
• We need something we can use as a buffer, so we can store production-sized data in it and replay it when we want

Page 12: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Results

Bad Idea

We need a way to store all the latencies and something to analyze the results

Page 13: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Putting everything together.

Page 14: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Preparation

[Diagram labels: Exporter; Thrift, CQL, Thrift, JSON; Ingester in Region 1; Ingester in Region 2]

Page 15: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Test

[Diagram: a Read/Write Loader in Region 1 and a Read/Write Loader in Region 2]

Page 16: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Analysis

Page 17: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

How did we extract latencies?

Page 18: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Just injected code here and there

[Diagram labels: Write, StorageProxy, Messaging Service, Keyspace, CommitLog, Memtable, etc.; injected steps: Store Context info, Use, Fire Async event]
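The idea of storing context and firing an asynchronous event can be sketched as follows. This is not Cassandra code: `apply_write`, `drain`, and the dictionary used as a store are hypothetical stand-ins, and the only point is that the write path hands its context to a queue and never waits on the analysis side.

```python
import queue
import time

# Hand-off point between the write path and the analysis side.
events = queue.Queue()


def now_us():
    """Current wall-clock time in microseconds since the epoch."""
    return time.time_ns() // 1000


def apply_write(store, key, value, written_us, source_dc):
    """Stand-in for one write-path step: apply the write, then fire an event."""
    store[key] = value  # the actual work of the write
    context = {
        "key": key,
        "source_dc": source_dc,
        "written_us": written_us,
        "received_us": now_us(),
    }
    events.put_nowait(context)  # fire and forget; the write does not block


def drain(results):
    """Consumer loop: turn stored contexts into lag measurements."""
    while True:
        context = events.get()
        if context is None:  # sentinel that stops the consumer
            break
        results.append(context["received_us"] - context["written_us"])
```

In use, `drain` would run on a background thread (for example `threading.Thread(target=drain, args=(results,))`) and be stopped by putting `None` on the queue.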

Page 19: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Example of results

[Chart: Two DC connection cut-off and recovery (latency in logarithmic scale, axis from 10 to 10000000); series: Pct95, Pct99, Pct999, MaxLag]
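For readers unfamiliar with the series names in the chart, the sketch below shows one common way to derive such values from a window of lag samples, using the nearest-rank percentile. It assumes Pct95, Pct99 and Pct999 mean the 95th, 99th and 99.9th percentiles and MaxLag the maximum; the talk does not say how the authors computed them, and the function names are hypothetical.

```python
def percentile(ordered, per_mille):
    """Nearest-rank percentile of an ascending list.

    per_mille is the percentile in tenths of a percent (950 means 95%).
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    if not ordered:
        raise ValueError("no samples")
    rank = (per_mille * len(ordered) + 999) // 1000  # integer ceiling
    return ordered[max(rank, 1) - 1]


def summarize(lags_us):
    """Summarize one window of lag samples (microseconds)."""
    ordered = sorted(lags_us)
    return {
        "Pct95": percentile(ordered, 950),
        "Pct99": percentile(ordered, 990),
        "Pct999": percentile(ordered, 999),
        "MaxLag": ordered[-1],
    }
```

Plotting these four values per time window on a logarithmic axis makes a connection cut-off visible as a spike in the tail series, even when typical lag stays low.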

Page 20: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Looking at the Bigger Picture

Page 21: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

What now?
• The information gathered through tests was extremely useful but also not easily reachable in Cassandra’s current state
• Could we somehow ‘tap’ off of Cassandra’s data streams?

Page 22: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Cassandra streams

queues logs metrics

Page 23: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Why????
• Why not use triggers?
• Why not put data routing ahead of Cassandra?
• Wouldn’t this cause a performance impact?
• Wouldn’t this result in data bloat somewhere else?

Page 24: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Knowing what happens at different points can power different use cases

[Diagram: write path stages (message, storage, keyspace, commitlog, memtable, etc.), each shown with a binary data stream]

Page 25: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Use Cases
• Building personalized search indices
• In-place migrations at the data tier level
• Cache invalidation
• Building analytic views
• Data into read-optimized views (transformations)
• Smart backups
• Disabling hints and providing alternative mechanisms
• Providing more failure handling possibilities
• Production-level tests (stress tests)

Page 26: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Tap flow: In-place Migration

[Diagram: write path stages (message, storage, keyspace, commitlog, memtable, etc.) with a tapped binary data stream feeding consumers]

Consumers: Read Schema A, Transform, Write Schema B
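A consumer for the in-place migration flow could look roughly like the sketch below: read a tapped write in schema A, transform it, and write it in schema B. Both schemas, the `full_name` split, and the dictionary standing in for the target table are invented for illustration; the talk does not describe concrete schemas.

```python
def transform_a_to_b(row_a):
    """Hypothetical transform: schema A has full_name, schema B splits it."""
    first, _, last = row_a["full_name"].partition(" ")
    return {
        "user_id": row_a["user_id"],
        "first_name": first,
        "last_name": last,
    }


def migrate(tapped_writes, table_b):
    """Apply every tapped schema-A write to the schema-B table."""
    for row_a in tapped_writes:
        row_b = transform_a_to_b(row_a)
        table_b[row_b["user_id"]] = row_b  # later writes overwrite earlier ones
    return table_b
```

Because the consumer only sees writes that flow through the tap, existing rows would still need a separate backfill; the sketch covers only the streaming half.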

Page 27: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Tap flow: Alternative Failure

[Diagram: write path stages (message, storage, keyspace, commitlog, memtable, etc.) with a tapped binary data stream; on "Failure!" the stream feeds a Replay Log]

Hints causing Cassandra to die faster

Page 28: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Tap flow: Production load test

[Diagram: write path stages (message, storage, keyspace, commitlog, memtable, etc.) with a tapped binary data stream feeding consumers]

Formalizing the previous Cassandra multi-regional latency tests into the ‘Streams’ framework

Page 29: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

High level framework
• Cassandra configuration to enable ‘Streams’ per keyspace
  – Tap hooks (+after StorageProxy => topic)
  – Sampling/Throttling capability/circuit breaking
  – Request Log mode (not recommended) / Kafka mode
• Common interfaces for consumers with common reference implementations
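To make the sampling and circuit-breaking bullets concrete, here is a minimal sketch of a tap hook. Everything in it is an assumption: the `Tap` class, its parameters, and the `publish(topic=..., message=...)` callable (which stands in for a Kafka producer or a request log) are hypothetical, as is the choice of one topic per keyspace.

```python
class Tap:
    """Hypothetical tap hook with 1-in-N sampling and a circuit breaker."""

    def __init__(self, publish, sample_every=1, max_failures=3):
        self.publish = publish            # stand-in for a topic producer
        self.sample_every = sample_every  # forward every Nth write
        self.max_failures = max_failures  # consecutive failures before opening
        self.seen = 0
        self.failures = 0
        self.open = False                 # open circuit means the tap is off

    def on_write(self, keyspace, mutation):
        """Called after the write is accepted; returns True if forwarded."""
        if self.open:
            return False
        self.seen += 1
        if self.seen % self.sample_every != 0:
            return False                  # dropped by sampling
        try:
            self.publish(topic=keyspace, message=mutation)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.open = True          # stop tapping to protect the write path
            return False
        self.failures = 0
        return True
```

The breaker here never closes again on its own; a real deployment would likely want a timed retry, which is left out to keep the sketch short.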

Page 30: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

Still a W.I.P.
• The ‘Cassandra Streams’ for our Cassandra clusters is still a W.I.P. and used only for measurement/analysis
• Introducing a tap off of the write path introduces a new set of complexities
  – Consistency
  – Paxos
  – Etc.
• However, depending on use-case, it is a useful tool that can be enabled & disabled via configuration

Page 31: PlayStation and Cassandra Streams (Alexander Filipchik & Dustin Pham, Sony) | C* Summit 2016

PlayStation is hiring:

hackitects.com