Posted on 30-Oct-2014
STAMPEDECON 2014
CASSANDRA IN THE REAL WORLD
Nate McCall @zznate
Co-Founder & Sr. Technical Consultant
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
We work with clients to deliver and improve Apache Cassandra-based solutions.
Based in New Zealand & USA.
“…in the Real World?”
There is a lot of hype: stats get attention, as do big names.
“Real World?”
“…1.1 million client writes per second. Data was automatically replicated across all three zones making a total of 3.3 million writes per second across the cluster.”
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
“Real World?”
“10+ clusters, 100s of nodes, 250TB provisioned,
9 billion writes/day, 5 billion reads/day”
http://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-cassandra-summit-2013
“Real World?”
• “but I don’t have an ∞ AMZN budget”
• “maybe one day I’ll have that much data”
“Real World!”
What most folks need: real fault tolerance, scale-out characteristics.
“Real World!”
What most folks have: 3 to 12 nodes holding 2–15TB, commodity hardware, small teams.
Cassandra in the Real World.
• Cassandra at 10k feet
• Case Studies
• Common Best Practices
Cassandra Architecture (briefly).
[Diagram: clients connect to the APIs on each node; beneath the APIs each node has a cluster-aware layer and a cluster-unaware layer, with the node's local disk at the bottom. The same stack repeats on Node 1 and Node 2.]
Dynamo Cluster Architecture (briefly).
[Diagram: clients connect to the APIs; each node stacks APIs over Dynamo over a local database, down to disk. The same stack repeats on Node 1 and Node 2.]
Cassandra Architecture (briefly).
Three layers: API, Dynamo, Database.
API Transports.
• Thrift
• Native Binary
Thrift Transport.
Extremely performant for specific workloads (e.g. via Astyanax); disruptor-based HSHA server in 2.0.
Native Binary Transport.
The focus of future development: uses Netty, CQL 3 only, asynchronous.
API Services.
• JMX
• Thrift
• CQL 3
Please see:
• http://www.slideshare.net/aaronmorton/cassandra-community-webinar-introduction-to-apache-cassandra-12-20353118
• http://www.slideshare.net/planetcassandra/c-summit-eu-2013-cassandra-internals
• http://www.slideshare.net/aaronmorton/cassandra-community-webinar-august-29th-2013-in-case-of-emergency-break-glass
Cassandra in the Real World.
• Cassandra at 10k feet
• Case Studies
• Common Best Practices
Case Studies.
• Ad Tech
• Sensor Data
• Mobile Device Diagnostics
Ad Tech.
Latency = $$$
Ad Tech.
Large “hot data” set: active users, targeting, display counts.
Ad Tech.
Huge long tail: who saw what, used for billing, campaign effectiveness over time, all sorts of analytics.
Ad Tech: Software.
• Java: CQL via the DataStax Java Driver
• Python: Pycassa (Thrift)
Ad Tech: Cluster.
12 nodes across 2 datacenters, replication {DC1:R1:3, DC2:R2:3}.
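A topology like the {DC1:3, DC2:3} one above maps directly onto a NetworkTopologyStrategy keyspace definition. A minimal sketch of generating that CQL, assuming a hypothetical keyspace name (`adtech`):

```python
# Build a CREATE KEYSPACE statement for a multi-datacenter topology.
# The keyspace name and DC names here are illustrative, not from the talk's
# actual schema; the NetworkTopologyStrategy syntax is standard CQL.
def keyspace_ddl(name, dc_replicas):
    opts = ", ".join("'%s': %d" % (dc, rf) for dc, rf in sorted(dc_replicas.items()))
    return ("CREATE KEYSPACE %s WITH replication = "
            "{'class': 'NetworkTopologyStrategy', %s};" % (name, opts))

ddl = keyspace_ddl("adtech", {"DC1": 3, "DC2": 3})
print(ddl)
```

With replication factor 3 in each datacenter, every write lands on 3 replicas per DC, which is what makes the per-DC fault tolerance described later possible.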
Ad Tech: Systems.
Physical hardware: commodity 1U with 8x SSD, 36GB RAM, 10GigE + 4x 1GigE.
Case Studies.
• Ad Tech
• Sensor Data
• Mobile Device Diagnostics
Sensor Data.
Latency != $$$
Sensor Data.
High write throughput: consistent “shape”, immutable data, large sequential reads, high uptime (for writes).
Sensor Data: Software.
REST application: separate reader service, writes go to Kafka, ELB in front of multiple regions.
Sensor Data: Software.
Java: Thrift via Astyanax; consumers read from Kafka and batch insertions to an optimal size.
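The batching step above is just chunking the consumed stream into fixed-size groups before each insertion. A minimal sketch of that grouping logic (the batch size of 100 is illustrative; the "optimal" size in practice is found by measurement):

```python
from itertools import islice

# Group an iterable of consumed messages into fixed-size batches so each
# insertion to Cassandra carries a full batch rather than a single row.
def batches(messages, batch_size):
    it = iter(messages)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

# e.g. 250 sensor readings drained from the queue -> batches of 100, 100, 50
sizes = [len(b) for b in batches(range(250), batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Because the sensor data is immutable and consistently shaped, oversized or undersized final batches are harmless; only throughput is affected.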
Sensor Data: Cluster.
9 nodes in 1 availability zone, {RF:3}.
Sensor Data: Systems.
m1.xlarge (15GB RAM, “high” network tier), 2TB RAID0, tablesnap for backups.
Case Studies.
• Ad Tech
• Sensor Data
• Mobile Device Diagnostics
Device Diagnostics.
Latency = battery
Device Diagnostics.
Write bursts: large single payloads, large hot data set.
Device Diagnostics.
Huge long tail, but irrelevant after 2 months; external partner API*
*thar be dragons
Device Diagnostics: Software.
Java: CQL via the DataStax Java Driver.
Device Diagnostics: Software.
REST application: payloads go to S3, with a pointer to each payload in Kafka.
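The pattern above keeps the large diagnostic payload out of the queue: the blob goes to object storage, and only a small pointer message travels through Kafka. A minimal sketch, with in-memory stand-ins for S3 and the Kafka topic (all names here are illustrative):

```python
import hashlib
import json

blob_store = {}   # stand-in for the S3 bucket
topic = []        # stand-in for the Kafka topic

def submit(payload):
    """Store the payload, then publish a small pointer message."""
    key = hashlib.sha256(payload).hexdigest()   # content-addressed key
    blob_store[key] = payload                   # "upload to S3"
    topic.append(json.dumps({"s3_key": key,     # pointer, not the payload
                             "size": len(payload)}))
    return key

key = submit(b"large diagnostic payload ...")
```

Consumers then read the pointer, fetch the payload from storage, and write to Cassandra; the queue stays fast regardless of payload size.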
Device Diagnostics: Cluster.
12 nodes across 3 availability zones, {us-east-1:1}.
Device Diagnostics: Systems.
i2.2xlarge: 61GB RAM, 1.8TB RAID0 SSD, “Enhanced Networking”, dedicated ENI.
Device Diagnostics: Systems.
No backups.
“Replay the front end.”
Cassandra in the Real World.
• Cassandra at 10k feet
• Case Studies
• Common Best Practices
Common Best Practices.
[Architecture layers revisited: Clients, APIs, Cluster Aware, Cluster Unaware, Disk — best practices for each in turn.]
Client Best Practices.
Decouple! Buffer writes for event-based systems; use asynchronous operations.
Client Best Practices.
Use Official Drivers (but there are exceptions)
Client Best Practices.
CQL 3: collections, user-defined types, tooling available.
API Best Practices.
Understand Replication!
API Best Practices.
Monitor & Instrument
Cluster Best Practices.
Understand replication! Learn all you can about topology options.
Cluster Best Practices.
Verify Assumptions: test failure scenarios explicitly
Systems Best Practices.
Better to have a lot of small commodity machines*: 32-64GB of RAM (or more).
*10GigE is now commodity
Systems Best Practices.
BUT: do you have staff who can tune kernels? Larger hardware needs tuning (e.g. “receive packet steering”).
Systems Best Practices.
On EC2: SSD instances if you can; use VPCs, placement groups, and ENIs.
Storage Best Practices.
Dependent on workload; you can mix and match: rotational disks for commitlog and system, SSD for data.
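Splitting the commitlog and data directories across disk types as described above is done in cassandra.yaml. A sketch of the relevant fragment (the mount paths are illustrative; the setting names are standard):

```yaml
# cassandra.yaml fragment: commitlog on a rotational disk (sequential
# appends), data files on SSD (random reads).
commitlog_directory: /mnt/hdd1/cassandra/commitlog
data_file_directories:
    - /mnt/ssd1/cassandra/data
```

The commitlog is append-only, so a cheap rotational disk handles it well, while the read path against SSTables benefits from SSD random-read latency.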
Storage Best Practices.
SSD: consider JBOD; consumer grade works fine.
Storage Best Practices.
“What about SANs?”
NO.
(You would be moving a distributed system onto a centralized component.)
Storage Best Practices.
Backups: tablesnap on EC2; rsync (immutable data FTW!).
Storage Best Practices.
Backups: combine rebuild + replay for best results. (Bonus: loading production data into staging is testing your backups!)
Thanks.
Nate McCall @zznate
Co-Founder & Sr. Technical Consultant
www.thelastpickle.com