Cassandra Day Denver 2014: So, You Want to Use Cassandra?

Post on 01-Dec-2014

DESCRIPTION

This talk covers what to weigh when evaluating Cassandra, seen through a Pearson team’s recent Cassandra adoption after coming from a .NET/SQL world. Topics include data model design, operationalization of a cluster, and other best practices, along with what happens when they aren’t followed.

TRANSCRIPT

Introduction

So You Want To Use Cassandra?

Lessons Learned Implementing Cassandra at Pearson

Data Model

Data Modeling

● Know not only your data, but how you plan to retrieve it

● Can Cassandra store it in an easily retrievable manner?

● Will the data scale well without breaking Cassandra?

About Your Data...

● Data partitioning strategy

● Know how you need to search your data

● Limit the number of updates and deletes on data that must be indexed

● Denormalize ALL THE THINGS

Things C* Does Well

● Non-relational Data

● Permanent Data

● Storing Data as it should be viewed

Things C* Does NOT Do Well

● Constructible Views Across Data

● Queue-like Data Patterns

● Highly Volatile Indexed Data

Searching Your Data

● Do not rely on a single column family to handle all lookups

● A single set of data can have multiple column families, one for each way you need to look up the data

● Avoid secondary indexes in almost all use cases
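
A minimal sketch of the "one column family per lookup" idea above, using plain Python dictionaries as stand-ins for column families. Every name here (`users_by_id`, `users_by_email`, `save_user`) is hypothetical and not from the talk; a real application would issue the equivalent writes through a Cassandra driver.

```python
# Plain dicts stand in for two column families that hold the same data,
# each keyed by the value one query needs. All names are hypothetical.
users_by_id = {}
users_by_email = {}


def save_user(user_id, email, name):
    """Write the same record once per lookup pattern (denormalization)."""
    record = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(record)
    users_by_email[email] = dict(record)


def find_by_id(user_id):
    """Lookup served by the table keyed on user id."""
    return users_by_id.get(user_id)


def find_by_email(email):
    """Lookup served by the table keyed on email, with no secondary index."""
    return users_by_email.get(email)


save_user(1, "ada@example.com", "Ada")
print(find_by_id(1)["name"])
print(find_by_email("ada@example.com")["user_id"])
```

The cost is an extra write per lookup pattern; the benefit is that each read is a direct key lookup rather than a search.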

Searching Your Data (continued)

● Avoid indexing volatile data

● Limit your lookups to single partitions where possible
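
To illustrate why single-partition lookups are cheaper, here is a rough sketch that maps partition keys to nodes by hashing. The node names and the MD5-modulo placement are assumptions made for illustration only; this is not Cassandra's actual token-ring placement.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key):
    """Map a partition key to a node by hashing it.

    MD5 modulo the node count is a simplification for illustration;
    it is not how Cassandra assigns token ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


def nodes_touched(partition_keys):
    """Return the set of nodes a read over these partitions must contact."""
    return {node_for(key) for key in partition_keys}


single = nodes_touched(["user:42"])
many = nodes_touched(["user:%d" % i for i in range(100)])
print(len(single), len(many))
```

A read confined to one partition key always lands on one node in this model, while a read spanning many keys can fan out across the cluster.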

Tombstones

How to Kill Your Cassandra Service

Tombstones

● Cassandra’s mechanism for handling deletes in a distributed fashion

● Created whenever a row or column is deleted or an indexed value is updated

● Essentially timestamped soft deletes

● Can cause your lookups to fail, seemingly inexplicably, when a single read scans too many of them (100,000)
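
The behavior described above can be sketched with a toy model. This is an illustration under stated assumptions, not Cassandra's storage engine: the `Partition` class, the `None`-as-tombstone convention, and the tiny threshold of 5 are all hypothetical stand-ins (the slide cites 100,000).

```python
import time

TOMBSTONE_FAILURE_THRESHOLD = 5  # tiny demo value; the slide cites 100,000


class TooManyTombstones(Exception):
    """Raised when a read scans more tombstones than the threshold allows."""


class Partition:
    """Toy partition: a delete writes a timestamped marker instead of removing data."""

    def __init__(self):
        self.cells = {}  # column name -> (timestamp, value); value None = tombstone

    def write(self, column, value):
        self.cells[column] = (time.time(), value)

    def delete(self, column):
        # Soft delete: keep the column, but record a timestamped tombstone.
        self.cells[column] = (time.time(), None)

    def read_all(self):
        live, tombstones = {}, 0
        for column, (_, value) in self.cells.items():
            if value is None:
                tombstones += 1
                if tombstones > TOMBSTONE_FAILURE_THRESHOLD:
                    raise TooManyTombstones("scanned %d tombstones" % tombstones)
            else:
                live[column] = value
        return live


partition = Partition()
for i in range(10):
    partition.write("col%d" % i, i)
for i in range(3):
    partition.delete("col%d" % i)
print(len(partition.read_all()))  # 3 tombstones: under the threshold, read succeeds

for i in range(3, 6):
    partition.delete("col%d" % i)
try:
    partition.read_all()  # 6 tombstones: over the threshold, read fails
except TooManyTombstones as error:
    print("read failed:", error)
```

Note that the failing read still has four live columns to return; the failure comes purely from the deleted data it had to scan past.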

Managing Tombstones

● Avoid data models that:

○ Update indexed columns

○ Have too many deletes

○ Need to query data across partitions

● Try to make your data as immutable as possible

● Fine-tune your garbage collection settings
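
Assuming the garbage collection settings above refer to tombstone cleanup (the talk does not say which settings it means), the sketch below shows how a grace period controls when tombstones can be dropped. `gc_grace_seconds` is a real Cassandra table option with a default of 864000 seconds (ten days); the `compact` function and the cell layout are hypothetical simplifications.

```python
GC_GRACE_SECONDS = 864000  # ten days, the default gc_grace_seconds


def compact(cells, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """Return the cells with tombstones older than the grace period dropped.

    `cells` maps column -> (timestamp, value); a value of None marks a
    tombstone. This layout is a toy model, not Cassandra's on-disk format.
    """
    kept = {}
    for column, (timestamp, value) in cells.items():
        is_tombstone = value is None
        expired = now - timestamp > gc_grace_seconds
        if is_tombstone and expired:
            continue  # old enough to purge
        kept[column] = (timestamp, value)
    return kept


cells = {
    "a": (0, "live"),       # live data is always kept
    "b": (0, None),         # old tombstone: purged
    "c": (1_900_000, None)  # recent tombstone: kept until the grace period passes
}
print(sorted(compact(cells, now=2_000_000)))
```

A shorter grace period clears tombstones sooner, at the cost of a smaller window for every replica to learn about the delete.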

Operationalization

Maintaining a C* Cluster

Operationalization

● Cassandra requires more maintenance than most RDBMS

● Strange, difficult to debug issues will arise when your cluster is neglected

● Need to perform maintenance jobs regularly to keep cluster healthy and consistent

● Consider running major compactions to help keep reads performant
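
The talk does not name the maintenance jobs. Assuming one of them is anti-entropy repair, which is commonly scheduled so that every node is repaired at least once per `gc_grace_seconds`, a small check like the following could flag neglected nodes. The function name, node names, and timestamps are hypothetical.

```python
GC_GRACE_SECONDS = 864000  # ten days, the default gc_grace_seconds


def overdue_for_repair(last_repair_by_node, now, max_age=GC_GRACE_SECONDS):
    """Return the nodes whose last repair finished more than max_age seconds ago."""
    return sorted(
        node
        for node, finished_at in last_repair_by_node.items()
        if now - finished_at > max_age
    )


# Hypothetical timestamps (seconds) of each node's last completed repair.
last_repair = {"node-a": 0, "node-b": 900_000, "node-c": 100_000}
print(overdue_for_repair(last_repair, now=1_000_000))
```

Where the timestamps come from is left open; the point is only that the check is cheap to automate.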

Thank You
