time series data with apache cassandra

Time Series Data With Apache Cassandra

StrangeloopSeptember 19, 2014

Eric Evanseevans@opennms.org

@jericevans

Network

Management

System

OpenNMS: What It Is

● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management, notifications

● Java, open source, GPLv3● Since 1999

Time series: RRDTool

● Round Robin Database● First released 1999● Time series storage● File-based, constant-size, self-maintaining● Automatic, incremental aggregation

… and oh yeah, graphing

Consider

● 5+ IOPs per update (read-modify-write)!● 100,000s of metrics, 1,000s IOPS● 1,000,000s of metrics, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS

We collect and write a great deal; We read (graph) relatively little.

So why are we aggregating everything?

● Not everything is a graph● Inflexible● Incremental backups impractical● Availability subject to filesystem access

Metrics typically appear in groups that are accessed together.

Optimizing storage for grouped access is a great idea!

What OpenNMS needs:

● High throughput● High availability● Late aggregation● Grouped storage/retrieval

Cassandra

● Apache top-level project● Distributed database● Highly available● High throughput● Tunable consistency

SSTables

Writes

Commitlog

Memtable

SSTable

DiskMemory

Write Properties

● Optimized for write throughput● Sorted on disk● Perfect for time series!

Partitioning

Key: Apple

Placement

Key: Apple

Replication

Key: Apple

CAP Theorem

Consistency

Availability

Partition tolerance

Consistency

R+W > N

Distribution Properties

● Symmetrical● Linearly scalable● Redundant● Highly available

D ata odelM

Data Modelresource

T1 T2 T3

Data Modelresource

Data Model

CREATE TABLE samples ( T timestamp,

M text,

V double,

resource text,

PRIMARY KEY(resource, T, M));

Data model

V1T1 M1 V1T2 M1 T3 V1M1resource

Data model

SELECT * FROM samplesWHERE resource = ‘resource’AND T >= ‘T1’ AND T <= ‘T3’;

Data model

SELECT * FROM samplesWHERE resource = ‘resource’AND T >= ‘T1’ AND T <= ‘T3’;

Data model

T1 M1 V1resource

T2 M2 V2resource

T3 M3 V3resource

● Standalone time series data-store○ Java API○ REST interface

● Raw sample storage and retrieval● Flexible aggregations (computed at read)

○ Rate (from counter types)○ Pluggable aggregation functions○ Arbitrary calculations

● Cassandra-speed● Resource search indexing (preliminary)● Approaching “1.0”● Apache license● Github (http://github.com/OpenNMS/newts)● http://newts.io

time series data with apache cassandra

Technology

time series data with apache cassandra (apachecon eu 2014)

Введение в apache cassandra

cassandra summit 2014: apache cassandra on pivotal...

cassandra day denver 2014: introduction to apache cassandra

apache cassandra in bangalore - cassandra internals and...

time series data with apache cassandra

storing time series data with apache cassandra

introduction to cassandra • why spark - apache cassandra |...

state of cassandra, 2012 - nosql | apache cassandra ·...

analyzing time series data with apache spark and cassandra

apache cassandra in action - o'reilly...

apache cassandra from the ground up -...

apache cassandra, part 3 – machinery, work with cassandra

taller apache cassandra -...

nosql apache cassandra

amazon managed apache cassandra service - developer...

apache cassandra & apache spark for time series data

devcenter apache cassandra

with kaa, apache cassandra, and apache zeppelin … ·...

apache cassandra at target - cassandra summit 2014