Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Launching PS4 with Cassandra

Upload: planet-cassandra

Post on 27-Nov-2014


DESCRIPTION

Presenters: Alexander Filipchik and Dustin Pham, Staff Software Engineers at Sony Network Entertainment. Since the launch of the PlayStation 4, many of the PSN features have been delivered using Cassandra. We will be talking about our experience as we launched one of the most popular gaming consoles in the world on well over 300 nodes.
- Why we picked Cassandra
- Exactly which PSN features for PS4 are powered by Cassandra
- The infrastructure used to deploy our clusters
- How we monitor system health
- How we design, test and deploy
- Issues we faced and lessons learned along the way

TRANSCRIPT

Page 1: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Launching PS4 with Cassandra

Page 2: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Introduction
•  Alexander Filipchik – Staff Software Engineer at SNEI
•  Dustin Pham – Staff Software Engineer at SNEI

Page 3: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Agenda
•  Journey towards Cassandra
•  Cassandra-backed PS4 Features
•  Ops-y Stuff
•  Lessons learned

Page 4: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Journey towards Cassandra

Page 5: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Challenges
•  Small Team
•  Legacy Support
•  Hardware Deadline
•  Scaling @ Peak Time

Page 6: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Why Cassandra
•  Strong community
•  Horizontally scalable architecture
•  Good performance
•  Cost effective
•  New adventure :)


Page 7: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

PS4 Features backed by Cassandra

Page 8: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cassandra-backed PS4 features

•  What’s New
•  Video Library
•  My Library
•  PS Now
•  Notifications
•  LiveArea
•  Store catalog
•  Pre-order
•  PS Plus
•  Recommendations
•  Remote Download
•  Share
•  Authentication
•  +more

Page 22: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Ops-y Stuff

Page 23: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Infrastructure
•  Hosted in cloud and physical DCs
•  Several hundred nodes and growing
•  Cluster by feature
•  Vnodes and assigned-token clusters
•  Astyanax client

Page 24: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Stats for PS4 cloud nodes
•  Data throughput: gigabytes / sec
•  Cassandra reads/writes: > 200,000 / sec
•  Data size: tens of terabytes
•  10M PS4 and 80M PS3 sold


Page 25: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Clusters
•  Cluster per Read/Write pattern initially
•  Now use cluster per feature
•  Seeds referenced by DNS names
•  Size-tiered compaction
•  Manual compactions for some CFs


Page 26: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

A typical node
•  m2.4xl + i2.2xl
•  2 ephemeral disks (~ 2 x 800 GB)
•  Commit log on root partition
•  Topology managed in the topology file, which is managed by Chef


Page 27: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

AWS
•  Nodes are interleaved between AZs
   –  Replication factor spreads data across AZs
   –  Minimizes downtime due to AZ outage

[Diagram: nodes alternating between Availability Zone A and Availability Zone C]
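The interleaving idea can be illustrated with a small sketch. This is a simplified model, not the deck's actual topology: it assumes replicas are chosen by walking the ring clockwise from the owning node, and the node names and helper functions are hypothetical.

```python
def interleave_nodes(zones, nodes_per_zone):
    """Build a ring in which consecutive nodes alternate between zones."""
    ring = []
    for i in range(nodes_per_zone):
        for zone in zones:
            ring.append(f"{zone}-node{i}")  # hypothetical node names
    return ring


def replicas_for(ring, start_index, replication_factor):
    """Simplified placement: take the next RF nodes clockwise on the ring."""
    return [ring[(start_index + k) % len(ring)]
            for k in range(replication_factor)]


def zones_of(replicas):
    """Return the set of zones that hold a copy."""
    return {name.split("-")[0] for name in replicas}


ring = interleave_nodes(["A", "C"], nodes_per_zone=3)
# With alternating zones and RF=2, every key has one copy in each zone,
# so losing a whole zone still leaves one replica reachable.
for start in range(len(ring)):
    print(ring[start], "->", sorted(zones_of(replicas_for(ring, start, 2))))
```

Under this model, an outage of either zone leaves one replica of every range available, which is the property the slide describes.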

Page 28: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Disk Layout

Pre-Launch: 2 ephemerals in RAID 0 (AWS m2.4xl, Eph0 + Eph1)
✓  Higher throughput (I/O spreads across 2 devices for reading and writing)
✓  If you lose 1 device, you lose the array!

Launch: 2 ephemerals in RAID 1 (AWS m2.4xl, Eph0 + Eph1)
✓  Higher throughput for reading (I/O spreads across 2 devices), but not for writing
✓  If you lose 1 device, the array keeps running in degraded mode
✓  ½ the available space

Current: 2 individual ephemerals (AWS m2.4xl, Eph0 + Eph1)
✓  Higher throughput (I/O spreads across 2 devices for reading and writing)
✓  If you lose 1 device, Cassandra stops (configurable)
✓  No RAID overhead
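The capacity trade-off between the three layouts is simple arithmetic. The sketch below assumes two disks of roughly 800 GB each (the figure from the "typical node" slide) and data spread evenly across individual disks; the function names are hypothetical.

```python
def usable_gb(layout, disk_gb=800, disks=2):
    """Usable capacity for each layout, ignoring filesystem overhead."""
    if layout in ("raid0", "individual"):
        return disk_gb * disks      # both disks hold distinct data
    if layout == "raid1":
        return disk_gb              # the second disk is a mirror
    raise ValueError(f"unknown layout: {layout}")


def fraction_lost_on_one_disk_failure(layout):
    """Share of the node's local data lost when one device dies."""
    return {"raid0": 1.0,           # the striped array is gone
            "raid1": 0.0,           # the mirror keeps a full copy
            "individual": 0.5}[layout]  # assumes an even spread


for layout in ("raid0", "raid1", "individual"):
    print(layout, usable_gb(layout), fraction_lost_on_one_disk_failure(layout))
```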

Page 29: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cluster Resizing

Page 30: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Thrift Payload Size

thrift_framed_transport_size_in_mb
thrift_max_message_length_in_mb

Page 31: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Bouncing Nodes
phi_convict_threshold
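The slide names only the setting, so the following is a hedged sketch of why the threshold matters. It models heartbeat gaps as exponentially distributed, which gives phi = t / (mean * ln 10); this is a simplification of a phi accrual failure detector, and the function names are hypothetical.

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level: -log10 of the chance the heartbeat is merely late."""
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def is_convicted(silence_seconds, mean_interval_seconds, phi_convict_threshold=8.0):
    """A node is marked down once phi exceeds the configured threshold."""
    return phi(silence_seconds, mean_interval_seconds) > phi_convict_threshold


# With a 1 s mean heartbeat interval, a 19 s pause convicts the node at a
# threshold of 8 but not at 12, so a higher threshold tolerates longer
# pauses at the cost of slower detection of real failures.
print(is_convicted(19, 1.0, 8.0), is_convicted(19, 1.0, 12.0))
```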

Page 32: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Inter-DC Latency

Page 33: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Monitor system health
•  Nagios
•  Kibana/Elasticsearch
•  Graphite
•  AWS CloudWatch
•  App level monitoring
•  OpsCenter

Page 34: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Page 35: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

App level metrics

Page 36: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Lessons Learned

Page 37: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Fun with Astyanax Client
•  Cross DC Latencies
   –  Several second latencies in JP and EE data centers
   –  Astyanax configs to ensure local datacenters used
•  Imbalanced node traffic
   –  Hashing algorithm (MD5 vs Murmur3)
•  DNS Caching in the JVM
   –  Stale seed nodes

Page 38: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

A tale of 2 Nodes

Page 39: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Cluster lessons
•  A single bad node can raise app latencies significantly
•  Taking out an entire Cassandra cluster is easy (not so fun)
   –  Compressing data before sending to Cassandra helps a lot
•  Corrupted SSTable resulted in cascading failure
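The compression tip can be sketched on the client side as follows. This is an illustration only: the deck does not say which codec was used, so zlib is an assumption, and `max_frame_bytes` is a hypothetical parameter standing in for whatever transport limit the cluster enforces (such as the Thrift payload size settings listed earlier in the deck).

```python
import zlib


def pack(payload: bytes, max_frame_bytes: int) -> bytes:
    """Compress a value before writing it, and refuse oversized blobs."""
    compressed = zlib.compress(payload, 6)
    if len(compressed) > max_frame_bytes:
        raise ValueError(
            f"compressed payload is {len(compressed)} bytes, "
            f"limit is {max_frame_bytes}"
        )
    return compressed


def unpack(blob: bytes) -> bytes:
    """Reverse of pack(): decompress a value read back from the store."""
    return zlib.decompress(blob)


document = b'{"title": "example"}' * 1000   # repetitive, compresses well
blob = pack(document, max_frame_bytes=15 * 1024 * 1024)
print(len(document), "->", len(blob), "bytes")
```

Rejecting an oversized value in the application keeps one large write from reaching the cluster at all, which is cheaper than discovering the limit on the server.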

Page 40: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra
Page 41: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Monitoring
   –  Memtable flush frequency
   –  Hinted handoffs
   –  Garbage collection
   –  Compactions
   –  Histograms

Page 42: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  VPNs are a dangerous bottleneck
•  Easier to rebuild a node than to fix
•  Backup data
   –  Replication factor helps but does not account for data corruption

Page 43: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Denormalization costs
•  Disk is cheap but EC2 instances are not
•  TTL on almost everything
•  Adjust gc_grace_seconds based on TTL times
•  Transactions? Be creative
•  Load test with real data

Page 44: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

•  Replication strategy:
   –  Read / Write pattern
   –  Data is source of truth or not
   –  Data locality
   –  User level data vs App level data
•  Cluster wide commands should be staggered
   –  Global repair :(
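Staggering a cluster-wide command amounts to giving each node its own start time instead of launching everywhere at once. A minimal sketch, assuming a fixed gap between nodes; the node names, the gap, and the helper are hypothetical, and the actual command invocation is left out.

```python
from datetime import datetime, timedelta


def staggered_schedule(nodes, start, gap):
    """Assign each node a start time, spaced `gap` apart."""
    return [(node, start + i * gap) for i, node in enumerate(nodes)]


nodes = ["node-a1", "node-c1", "node-a2", "node-c2"]   # hypothetical names
plan = staggered_schedule(nodes, datetime(2014, 9, 1, 2, 0), timedelta(hours=2))
for node, when in plan:
    # In practice, each entry would trigger the maintenance command
    # (for example a repair) on that one node only.
    print(node, when.isoformat())
```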

Page 45: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Tokens
•  Vnodes vs Assigned Tokens
   –  Increased chattiness on gossip protocol with vnodes
   –  Perceived slowness on repair and cleanup operations on vnode-enabled clusters
   –  Astyanax client does not like vnodes…
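For the assigned-token case, evenly spaced initial tokens can be computed from the partitioner's token range. The sketch below uses the ranges of Murmur3Partitioner (-2^63 to 2^63-1) and RandomPartitioner (0 to 2^127-1, MD5 based); the function names are hypothetical, and the deck does not show how its tokens were actually generated.

```python
def murmur3_tokens(node_count):
    """Evenly spaced tokens over the Murmur3Partitioner range."""
    step = 2**64 // node_count
    return [-(2**63) + i * step for i in range(node_count)]


def random_partitioner_tokens(node_count):
    """Evenly spaced tokens over the RandomPartitioner (MD5) range."""
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]


# The two partitioners use different token spaces, so a client that hashes
# keys with the wrong one routes requests to the wrong nodes.
print(murmur3_tokens(4))
print(random_partitioner_tokens(4))
```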

Page 46: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

Compactions
•  Compactions are your worst enemy
   –  Larger disk usage = high CPU & longer compactions
•  Leveled compaction vs size-tiered compaction
   –  Start-up time
   –  CPU tradeoff
   –  IO tradeoff
•  Updates + removals eat up disks
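To make the size-tiered side of the comparison concrete, here is a simplified sketch of how size-tiered compaction groups SSTables of similar size into buckets and compacts a bucket once it holds enough tables. It omits details of the real strategy (for example the special handling of very small SSTables), and the parameter defaults are assumptions for illustration.

```python
def bucket_sstables(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size."""
    buckets = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])   # no similar bucket, start a new one
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4):
    """Buckets with at least min_threshold tables are merged together."""
    return [b for b in bucket_sstables(sizes_mb) if len(b) >= min_threshold]


sizes = [10, 11, 9, 10, 100, 400]
print(bucket_sstables(sizes))
print(compaction_candidates(sizes))
```

Merging a bucket rewrites all of its tables into one larger table, which is why disk headroom and compaction time grow with the amount of data on a node.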

Page 47: Cassandra Summit 2014: Launching PlayStation 4 with Apache Cassandra

We are hiring… sonyentertainmentnetwork.com/careers