time series storage in cassandra

20
Time Series Storage Cassandra London Meetup April 7, 2014 Eric Evans [email protected] @jericevans

Upload: eric-evans

Post on 15-Jan-2015

987 views

Category:

Technology


3 download

DESCRIPTION

Presented at Cassandra London (April 7, 2014); The challenges of time-series storage and analytics in OpenNMS, with an introduction to Newts, a new Cassandra-based time-series data store.

TRANSCRIPT

Page 1: Time series storage in Cassandra

Time Series StorageCassandra London Meetup

April 7, 2014

Eric [email protected]

@jericevans

Page 2: Time series storage in Cassandra

Open

Page 3: Time series storage in Cassandra

Open

Page 4: Time series storage in Cassandra

Network

Management

System

Page 5: Time series storage in Cassandra

OpenNMS: What It Is

● Network Management System○ Discovery and Provisioning○ Service monitoring○ Data collection○ Event management and notifications

● Java, open source, GPLv3● Since 1999

Page 6: Time series storage in Cassandra

Graph All The Things

Page 7: Time series storage in Cassandra

RRDTool

● Round robin database● First released 1999● Time-series storage● File-based● Constant-size● Automatic, amortized aggregation

Page 8: Time series storage in Cassandra

Consider

● 2 IOPs per update (read-update-write)● 1 RRD per data source (storeByGroup=false)● 100,000s of data sources, 1,000s IOPS● 1,000,000s of data sources, 10,000s IOPS● 15,000 RPM SAS drive, ~175-200 IOPS

Page 9: Time series storage in Cassandra

Also

● Not everything is a graph● Inflexible● Incremental backups impractical● ...

Page 10: Time series storage in Cassandra

Observation #1

We collect and write a great deal; We read (graph) relatively little.

We are optimized for reading everything, always.

Page 11: Time series storage in Cassandra

Observation #2

Samples are naturally collected, and graphed together in groups.

Grouping samples that are accessed together is an easy optimization.

Page 12: Time series storage in Cassandra

Project: NewtsGoals:● Stand-alone time-series data store● High-throughput● Horizontally scalable● Grouped metric storage/retrieval● Late-aggregating

Page 13: Time series storage in Cassandra

Cassandra

Why:● Write-optimized● Sorted● Horizontally scalable (linear)

Page 14: Time series storage in Cassandra

Gist

● Samples stored as-is.● Samples can be retrieved as-is.● Measurements are aggregations calculated

from samples (at time of query).

Page 15: Time series storage in Cassandra

Sample{

“resource” : “london”,

“timestamp” : 1396289065,

“name” : “meanTemp”,

“type” : “GAUGE”,

“value” : 17.2,

“attributes” : { “units”: “celsius” }

}

Page 16: Time series storage in Cassandra

SamplesCREATE TABLE newts.samples ( resource text, collected_at timestamp, metric_name text, metric_type text, value blob, attributes map<text, text>, PRIMARY KEY(resource, collected_at, metric_name));

Page 17: Time series storage in Cassandra

Samples

resource | collected_at | metric_name | value

---------+---------------------+--------------+-----------

london | 2014-03-31 18:04:25 | dewPoint | 0xc01a0000

london | 2014-03-31 18:04:25 | maxTemp | 0x40280000

london | 2014-03-31 18:04:25 | maxWindGust | 0x7ff80000

london | 2014-03-31 18:04:25 | maxWindSpeed | 0x40180000

london | 2014-03-31 18:04:25 | meanTemp | 0xbfe00000

Page 18: Time series storage in Cassandra

Behind the scenes...

london (2014-03-31 18:04:25, dewPoint): 0xc01a0000

(2014-03-31 18:04:25, maxTemp): 0x40280000

...

Ascending Order

Page 20: Time series storage in Cassandra