Seattle Cassandra Meetup - HasOffers

Tuesday, August 28, 2012 Apache Cassandra @ HasOffers

Upload: btoddb

Post on 13-Dec-2014


DESCRIPTION

HasOffers presentation on using Apache Cassandra in AWS

TRANSCRIPT

Page 1: Seattle Cassandra Meetup - HasOffers

Tuesday, August 28, 2012

Apache Cassandra @ HasOffers

Page 2: Seattle Cassandra Meetup - HasOffers

Topics

● Cassandra Configuration
● Amazon Web Services

Page 3: Seattle Cassandra Meetup - HasOffers

Why Cassandra?

● High write throughput
● Low latency
● Multiple datacenter replication
● Fault tolerant
● Large online community
● Linear scalability

Page 4: Seattle Cassandra Meetup - HasOffers

Keyspace Configuration

● One keyspace with two column families
● Multiple secondary indexes
● No super column families
● Counter columns
● Consistent column counts
● Large row and key cache
● Compression

Page 5: Seattle Cassandra Meetup - HasOffers

Keyspace Configuration

● placement_strategy = 'NetworkTopologyStrategy'
● strategy_options = {eu-west : 1, us-west : 1, us-east : 1}
● Replication factor of 1
● 9 Nodes total
● 3 Nodes at each datacenter
● EC2 Snitch

[Figure: Token Ring showing Node 1, Node 2, Node 3]
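As a rough sketch of what these settings imply, the snippet below tallies the replicas defined by the strategy_options on this slide. The helper name `replica_summary` is hypothetical, and the per-node share assumes tokens are evenly balanced within each datacenter, which the slides suggest but do not state outright.

```python
# Values taken from the slide; the helper itself is illustrative only.
STRATEGY_OPTIONS = {"eu-west": 1, "us-west": 1, "us-east": 1}
NODES_PER_DC = 3  # "3 Nodes at each datacenter"


def replica_summary(strategy_options, nodes_per_dc):
    """Return (copies of each row cluster-wide, data share held per node)."""
    total_copies = sum(strategy_options.values())
    # Assumption: tokens are evenly spaced inside every datacenter.
    share_per_node = {
        dc: rf / nodes_per_dc for dc, rf in strategy_options.items()
    }
    return total_copies, share_per_node


total, share = replica_summary(STRATEGY_OPTIONS, NODES_PER_DC)
print("copies of each row:", total)
print("share of the data per us-east node:", share["us-east"])
```

With one replica per datacenter, each row exists three times across the cluster, but losing a single node makes its token range unavailable inside that datacenter until it is recovered or read from another region.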

Page 6: Seattle Cassandra Meetup - HasOffers

HasOffers Keyspace Statistics

● Number of Keys (estimate): 318453248
● Key ttl: 90 days
● Approximately 13.8 Million daily inserts
● Replication to 3 Datacenters
● Approximately 3 Million daily queries
● Compacted row mean size: 1408
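To put the daily figures in perspective, this small calculation converts them to average per-second rates. It assumes load is spread evenly over 24 hours, which real traffic rarely is, so peaks were presumably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_inserts = 13_800_000  # "Approximately 13.8 Million daily inserts"
daily_queries = 3_000_000   # "Approximately 3 Million daily queries"


def per_second(daily_count):
    """Average rate per second, assuming a perfectly even daily spread."""
    return daily_count / SECONDS_PER_DAY


avg_inserts = per_second(daily_inserts)
avg_queries = per_second(daily_queries)
print(f"{avg_inserts:.1f} inserts/s, {avg_queries:.1f} queries/s")
```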

Page 7: Seattle Cassandra Meetup - HasOffers

[Chart: Daily Inserts. Series: USW, USE, EUW, ALL. X-axis: Month. Y-axis: Keys, 0 to 16,000,000.]

Page 8: Seattle Cassandra Meetup - HasOffers

Cassandra Configuration

● Keep Commitlog and Data on separate disks
● Set Initial Token to prevent hotspots
● RandomPartitioner for good data distribution
● commitlog_sync: batch vs. periodic
  ● Batch mode won't ack writes until the log has been synced to disk, to prevent dropped mutations.
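A minimal sketch of how evenly spaced initial tokens can be computed for RandomPartitioner, whose token space runs from 0 to 2**127. The function name and the per-datacenter offsets are assumptions for illustration; the talk does not say which token values HasOffers used.

```python
RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def initial_tokens(node_count, offset=0):
    """Evenly spaced tokens for one datacenter, shifted by a small offset."""
    return [
        (i * RING_SIZE // node_count + offset) % RING_SIZE
        for i in range(node_count)
    ]


# Hypothetical offsets: each datacenter gets the same spacing, nudged by a
# small amount so that no two nodes in the cluster share a token.
DC_OFFSETS = {"us-west": 0, "us-east": 1, "eu-west": 2}
tokens_by_dc = {
    dc: initial_tokens(3, offset) for dc, offset in DC_OFFSETS.items()
}

for dc, tokens in tokens_by_dc.items():
    print(dc, tokens)
```

Each value would go into `initial_token` in that node's cassandra.yaml. Spacing the tokens evenly gives every node in a datacenter an equal slice of the ring, which is what avoids hotspots.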

Page 9: Seattle Cassandra Meetup - HasOffers

Cassandra Configuration

● max_hint_window_in_ms:
  ● Depends on replication factor and response time
● flush_largest_memtables_at: 0.95
  ● Depends on heap size
● rpc_timeout_in_ms: 18000
  ● Network latency and datacenter location are factors
● index_interval: 512
  ● Larger index interval can lower memory usage
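The memory effect of index_interval can be sketched with simple arithmetic: Cassandra keeps roughly one in-memory index sample per index_interval row keys. The key count below is the estimate from the statistics slide, used purely for illustration (the slides do not say whether it is a per-node or cluster-wide figure), and 128 is the default index_interval in Cassandra releases of that era.

```python
def index_samples(key_count, index_interval):
    """Approximate number of index entries held in memory."""
    return key_count // index_interval


keys = 318_453_248  # "Number of Keys (estimate)" from the statistics slide

default_samples = index_samples(keys, 128)  # default interval
tuned_samples = index_samples(keys, 512)    # value shown on this slide

print(default_samples, tuned_samples)
print("reduction factor:", default_samples / tuned_samples)
```

The trade-off is that a larger interval means more index entries must be scanned on disk to locate a key, so reads can become somewhat slower.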

Page 10: Seattle Cassandra Meetup - HasOffers

Cassandra Configuration

● Nodes networked behind a VPN
● Pycassa client library
● Multiple-clause, resource-intensive queries
● Key-based queries
● Regional failover strategy controlled by client script
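One way a client script might implement regional failover is sketched below. The talk does not show HasOffers' actual script: the function names are hypothetical, `run_query` stands in for whatever call the client makes against a region, and the built-in `ConnectionError` stands in for the client library's own exceptions.

```python
def failover_order(local_region, regions):
    """Try the local region first, then the others in their listed order."""
    return [local_region] + [r for r in regions if r != local_region]


def query_with_failover(local_region, regions, run_query):
    """Run the query in the first region that answers."""
    last_error = None
    for region in failover_order(local_region, regions):
        try:
            return region, run_query(region)
        except ConnectionError as exc:  # stand-in for client library errors
            last_error = exc
    raise RuntimeError("all regions failed") from last_error


# Demo with a fake query function in which us-west is unreachable.
def fake_query(region):
    if region == "us-west":
        raise ConnectionError("us-west unreachable")
    return f"rows from {region}"


REGIONS = ["us-west", "us-east", "eu-west"]
served_by, rows = query_with_failover("us-west", REGIONS, fake_query)
print(served_by, rows)
```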

Page 11: Seattle Cassandra Meetup - HasOffers

Cassandra Configuration

● Pycassa client scripts
● Query with exception handling
  ● Retry
  ● Reconnect
  ● Fail
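The retry, reconnect, fail sequence above could look roughly like this. It is a sketch under stated assumptions, not the scripts from the talk: `connect` and `run_query` are hypothetical callables, and the built-in `TimeoutError` and `ConnectionError` stand in for the exceptions a client library such as Pycassa would raise.

```python
import time


class QueryFailed(Exception):
    """Raised once retries and reconnects are exhausted."""


def query_with_recovery(connect, run_query, retries=2, reconnects=1, delay=0.0):
    conn = connect()
    for reconnect_round in range(reconnects + 1):
        for _ in range(retries + 1):
            try:
                return run_query(conn)
            except TimeoutError:
                time.sleep(delay)  # Retry on the same connection
            except ConnectionError:
                break              # Connection is unusable; stop retrying
        if reconnect_round < reconnects:
            conn = connect()       # Reconnect and start a fresh retry round
    raise QueryFailed("query failed after retries and reconnects")  # Fail


# Demo: the first connection is stale, the second one works.
calls = {"connects": 0, "queries": 0}


def fake_connect():
    calls["connects"] += 1
    return calls["connects"]


def fake_query(conn):
    calls["queries"] += 1
    if conn == 1:
        raise ConnectionError("stale connection")
    return "row"


result = query_with_recovery(fake_connect, fake_query)
print(result, calls)
```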

Page 12: Seattle Cassandra Meetup - HasOffers

Data Recovery

● nodetool repair
  ● Resource intensive
  ● Depends on data locality
● sstableloader
● snapshots

Page 13: Seattle Cassandra Meetup - HasOffers

AWS

● Instance Sizes
  ● Complex Cassandra queries require more memory
● Ephemeral vs. EBS vs. Provisioned IOPS
● Root partition: instance store vs. EBS

Page 14: Seattle Cassandra Meetup - HasOffers

AWS Instances

● 14 different instance types
● Varying specifications and prices

High-Memory Quadruple Extra Large Instance

68.4 GB of memory

26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)

1690 GB of instance storage

64-bit platform

I/O Performance: High

EBS-Optimized Available: 1000 Mbps

API name: m2.4xlarge

Page 15: Seattle Cassandra Meetup - HasOffers

Disk Options

● EBS RAID
  ● Good performance
  ● Easy implementation
● Provisioned IOPS
  ● Additional cost
  ● Very good performance
  ● Optimized for use with only some instance types
● Ephemeral
  ● Good performance
  ● Lost when instance is stopped

Page 16: Seattle Cassandra Meetup - HasOffers

EBS RAID Performance on AWS

Page 17: Seattle Cassandra Meetup - HasOffers

Provisioned IOPS for Amazon EBS

“Provisioned IOPS are a new EBS volume type designed to deliver predictable, high performance for I/O intensive workloads, such as database applications, that rely on consistent and fast response times. With EBS Provisioned IOPS, customers can flexibly specify both volume size and volume performance, and Amazon EBS will consistently deliver the desired performance over the lifetime of the volume. Customers can then attach multiple volumes to an Amazon EC2 instance and stripe across them to deliver thousands of IOPS to their application.”

*EBS-Optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS, with options between 500 Megabits per second and 1,000 Megabits per second depending on the instance type used.
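For a sense of scale, the dedicated EBS throughput figures quoted above convert to bytes per second as follows. This is plain unit arithmetic (8 bits per byte, decimal megabytes) and ignores any protocol overhead.

```python
def megabits_to_megabytes_per_second(megabits_per_second):
    """Convert a link rate in Mbps to MB/s (decimal units, no overhead)."""
    return megabits_per_second / 8


low = megabits_to_megabytes_per_second(500)
high = megabits_to_megabytes_per_second(1000)
print(f"500 Mbps is about {low} MB/s; 1,000 Mbps is about {high} MB/s")
```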

Page 18: Seattle Cassandra Meetup - HasOffers

Questions?