Paul Dix, InfluxDB // Open-Source Time Series Database

InfluxDB - an open source time series database Paul Dix CEO @pauldix paul@influxdb.com

Upload: firstmark-capital

Post on 23-Jan-2018


InfluxDB - an open source time series database

Paul Dix

CEO @pauldix

paul@influxdb.com

What it’s for…

Metrics

Time Series

Analytics

Events

Use Cases

DevOps

Real-time analytics (user & business)

Sensor Data

Can’t you just use a regular DB?

order by time?

Doesn’t Scale

Example from metrics:

100 measurements per host * 10 hosts * 8640 per day (once every 10s) * 365 days

= 3,153,600,000 records per year

Have fun with that table…
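The arithmetic on this slide can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the constant names are made up for illustration, and the figures are the ones from the slide.

```python
# Back-of-the-envelope record count for the metrics example on the slide.
MEASUREMENTS_PER_HOST = 100
HOSTS = 10
SAMPLE_INTERVAL_S = 10          # one sample every 10 seconds
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365

samples_per_day = SECONDS_PER_DAY // SAMPLE_INTERVAL_S  # 8640
records_per_year = (
    MEASUREMENTS_PER_HOST * HOSTS * samples_per_day * DAYS_PER_YEAR
)

print(f"{records_per_year:,}")  # 3,153,600,000
```

Even this small fleet of 10 hosts produces billions of rows per year, which is the point of "Have fun with that table".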

But wait, we’ll just keep the summaries!

1h averages =

8,760,000 per year

Lose Detail and AdHoc Queryability
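A small sketch of what "lose detail" can mean in practice. The sample data below is hypothetical (one flat hour with a short spike); only the per-year counts come from the slides.

```python
from statistics import mean

# Row counts from the slides: raw 10-second samples vs. 1-hour averages.
raw_points_per_year = 100 * 10 * 8640 * 365
hourly_points_per_year = 100 * 10 * 24 * 365

# Hypothetical data: one hour of 10-second samples, flat at 0.20
# except for a 30-second spike to 0.95.
samples = [0.20] * 360
samples[100:103] = [0.95, 0.95, 0.95]

hourly_average = mean(samples)

print(f"{hourly_points_per_year:,}")                  # 8,760,000
print(raw_points_per_year // hourly_points_per_year)  # 360
print(round(hourly_average, 3))                       # 0.206
print(max(samples))                                   # 0.95
```

The hourly average stores 360 times fewer rows, but the spike to 0.95 is flattened into a value close to the baseline, and there is no way to ask a new question of the raw data after the fact.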

So let’s use Cassandra, HBase, or Scaleasaurus!

Too much application code and complexity

Application logic and scripts to compute summaries

Application level logic for balancing

No data locality for AdHoc queries

How to handle data retention?

And then there’s more…

Web services

Libraries for web services

Data collection

Visualization

“Building an application with an analytics component today is like building a web application in 1998. You spend months building infrastructure before getting to the actual thing you want to build.”

–Paul Dix

Analytics and monitoring should be about analyzing and interpreting data, not the infrastructure to store and process it.

A time series database with no external dependencies

Features

Upcoming 0.9.0 release

Data model

• Databases

• Measurements

  • cpu_load, temperature, log, click, etc.

• Tags

  • region=uswest, host=serverA, building=23, service=redis, etc.

• Series - measurement + unique tagset

• Points

  • Fields - bool, int64, float64, string, []byte

  • Timestamp - nano epoch
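One way to picture the data model is as plain Python types. This is an illustrative sketch, not InfluxDB's internal representation: the `Point` class and `series_key` helper are hypothetical names, and the only rule taken from the slide is "series = measurement + unique tagset".

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Field types listed on the slide, mapped to rough Python equivalents.
FieldValue = Union[bool, int, float, str, bytes]
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(measurement: str, tags: Dict[str, str]) -> SeriesKey:
    """Measurement plus the tag set, sorted so tag order does not matter."""
    return (measurement, tuple(sorted(tags.items())))


@dataclass
class Point:
    measurement: str
    tags: Dict[str, str]
    fields: Dict[str, FieldValue]
    timestamp_ns: int  # nanoseconds since the Unix epoch

    @property
    def series(self) -> SeriesKey:
        return series_key(self.measurement, self.tags)


t0 = 1257894000 * 10**9  # 2009-11-10T23:00:00Z in nanoseconds
a = Point("cpu_load", {"host": "serverA", "region": "uswest"}, {"value": 0.64}, t0)
b = Point("cpu_load", {"region": "uswest", "host": "serverA"}, {"value": 0.70}, t0 + 10 * 10**9)
c = Point("cpu_load", {"host": "serverB", "region": "uswest"}, {"value": 0.10}, t0)

print(a.series == b.series)  # True: same measurement and tag set
print(a.series == c.series)  # False: different host tag
```

Points `a` and `b` land in the same series even though their tags were written in a different order; changing any tag value, as in `c`, creates a new series.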

Writing Data

curl -XPOST 'http://localhost:8086/write' -d '...'

{
  "database": "mydb",
  "retentionPolicy": "30d",
  "points": [
    {
      "name": "cpu_load",
      "tags": {
        "host": "server01",
        "region": "us-west"
      },
      "timestamp": "2009-11-10T23:00:00Z",
      "fields": {
        "value": 0.64
      }
    }
  ]
}

Callouts on the payload: Measurement ("name"), Tags ("tags"), Fields ("fields")
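The write call can be sketched in Python with only the standard library. The body shape and the `/write` endpoint are taken from the slides (the upcoming 0.9.0 release at the time of the talk); the helper names and the `Content-Type` header are assumptions, so check them against the release you actually run.

```python
import json
import urllib.request


def build_write_body(database, retention_policy, name, tags, fields, timestamp):
    """Build the JSON body shown on the slide for a single point."""
    return {
        "database": database,
        "retentionPolicy": retention_policy,
        "points": [
            {"name": name, "tags": tags, "timestamp": timestamp, "fields": fields}
        ],
    }


def build_write_request(body, base_url="http://localhost:8086"):
    """Prepare (but do not send) a POST to the /write endpoint."""
    return urllib.request.Request(
        base_url + "/write",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


body = build_write_body(
    database="mydb",
    retention_policy="30d",
    name="cpu_load",
    tags={"host": "server01", "region": "us-west"},
    fields={"value": 0.64},
    timestamp="2009-11-10T23:00:00Z",
)
request = build_write_request(body)

# urllib.request.urlopen(request) would send it; that call is left out
# so the sketch runs without a server.
print(request.get_method(), request.full_url)
```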

Querying

curl -G 'http://localhost:8086/query' --data-urlencode "q=..."

SQL-ish query language

QUERY:

SELECT value FROM cpu WHERE host = 'serverA'

RESULTS:

{
  "results": [
    {
      "query": "SELECT value FROM cpu WHERE host='serverA'",
      "series": [
        {
          "name": "cpu",
          "tags": { "host": "serverA" },
          "columns": ["time", "value"],
          "values": [
            ["2009-11-10T23:00:00Z", 22.1],
            ["2009-11-10T23:00:10Z", 25.2]
          ]
        }
      ]
    }
  ]
}
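A sketch of the client side of a query: building the `/query` URL and flattening the response into one dict per row. The helper names are hypothetical, and the response shape is the one shown on the slide, which may differ in other releases.

```python
from urllib.parse import urlencode


def build_query_url(q, base_url="http://localhost:8086"):
    """URL-encode the query string, as curl's --data-urlencode does."""
    return base_url + "/query?" + urlencode({"q": q})


def rows_from_response(response):
    """Flatten results -> series -> values into a list of row dicts."""
    rows = []
    for result in response.get("results", []):
        for series in result.get("series", []):
            for values in series.get("values", []):
                row = dict(zip(series["columns"], values))
                row["measurement"] = series["name"]
                row.update(series.get("tags", {}))
                rows.append(row)
    return rows


query = "SELECT value FROM cpu WHERE host = 'serverA'"
url = build_query_url(query)

# Response copied from the slide.
sample = {
    "results": [
        {
            "query": "SELECT value FROM cpu WHERE host='serverA'",
            "series": [
                {
                    "name": "cpu",
                    "tags": {"host": "serverA"},
                    "columns": ["time", "value"],
                    "values": [
                        ["2009-11-10T23:00:00Z", 22.1],
                        ["2009-11-10T23:00:10Z", 25.2],
                    ],
                }
            ],
        }
    ]
}

for row in rows_from_response(sample):
    print(row)
```

Merging tags into the row dict is a convenience for this sketch; a tag that shares a name with a column would overwrite it, so a real client would likely keep them separate.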

QUERY:

SELECT value FROM cpu
WHERE host = 'serverA' OR host = 'serverB'

SERIES IN RESULT:

{
  "series": [
    {
      "name": "cpu",
      "tags": { "host": "serverA" },
      "columns": ["time", "value"],
      "values": []
    },
    {
      "name": "cpu",
      "tags": { "host": "serverB" },
      "columns": ["time", "value"],
      "values": []
    }
  ]
}

QUERY:

SELECT percentile(90, value) FROM cpu
WHERE time > now() - 4h
GROUP BY time(10m), region

SERIES IN RESULT:

[
  {
    "name": "cpu",
    "tags": { "region": "us-west" },
    "columns": ["time", "percentile"],
    "values": []
  },
  {
    "name": "cpu",
    "tags": { "region": "us-east" },
    "columns": ["time", "percentile"],
    "values": []
  }
]
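To make the GROUP BY semantics concrete, here is a small in-memory sketch of bucketing by time and splitting by a tag, so that each tag value becomes its own result series. The function names and sample points are hypothetical, and the nearest-rank percentile is an assumption for illustration, not necessarily the method InfluxDB uses.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile (assumed method, for illustration only)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def group_percentile(points, p, bucket_seconds, tag):
    """Mimic GROUP BY time(bucket), tag: one output series per tag value."""
    buckets = defaultdict(list)
    for ts, tags, value in points:
        bucket_start = ts - ts % bucket_seconds
        buckets[(tags[tag], bucket_start)].append(value)

    series = defaultdict(list)
    for (tag_value, bucket_start), values in sorted(buckets.items()):
        series[tag_value].append((bucket_start, percentile(values, p)))
    return dict(series)


# (timestamp in seconds, tags, value); hypothetical sample points.
points = [
    (0, {"region": "us-west"}, 10.0),
    (60, {"region": "us-west"}, 30.0),
    (120, {"region": "us-west"}, 20.0),
    (600, {"region": "us-west"}, 50.0),
    (30, {"region": "us-east"}, 5.0),
    (90, {"region": "us-east"}, 7.0),
]

result = group_percentile(points, p=90, bucket_seconds=600, tag="region")
print(result["us-west"])  # [(0, 30.0), (600, 50.0)]
print(result["us-east"])  # [(0, 7.0)]
```

As in the slide's result, the output has one series per region, each with a time column and a percentile column.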

Multiple aggregates

SELECT mean(value), percentile(90, value), min(value), max(value)
FROM cpu
WHERE host='serverA' AND time > now() - 48h
GROUP BY time(1h)

Return every series in CPU

SELECT mean(value)
FROM cpu
WHERE time > now() - 48h
GROUP BY time(1h), *

Discovery based on tags

{
  "results": [
    {
      "query": "SHOW MEASUREMENTS",
      "series": [
        {
          "name": "measurements",
          "columns": ["name"],
          "values": [ ["cpu"], ["memory"], ["network"] ]
        }
      ]
    }
  ]
}

{
  "results": [
    {
      "query": "SHOW SERIES",
      "series": [
        {
          "name": "cpu",
          "columns": ["id", "region", "host"],
          "values": [
            [1, "us-west", "serverA"],
            [2, "us-east", "serverB"]
          ]
        }
      ]
    }
  ]
}

{
  "query": "SHOW MEASUREMENTS WHERE service='redis'",
  "series": [
    {
      "name": "measurements",
      "columns": ["measurement"],
      "values": [ ["key_count"], ["connections"] ]
    }
  ]
}

{
  "query": "SHOW TAG KEYS FROM cpu",
  "series": [
    {
      "name": "keys",
      "columns": ["key"],
      "values": [ ["region"], ["host"] ]
    }
  ]
}

{
  "query": "SHOW TAG VALUES WITH KEY = service",
  "series": [
    {
      "name": "series",
      "columns": ["service"],
      "values": [ ["redis"], ["apache"] ]
    }
  ]
}

{
  "query": "SHOW TAG VALUES FROM cpu WITH KEY = service",
  "series": [
    {
      "name": "series",
      "columns": ["service"],
      "values": [ ["redis"], ["apache"] ]
    }
  ]
}
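The discovery responses are convenient to turn into lookup structures on the client, for example to fill a dashboard's drop-downs. The sketch below builds a per-measurement tag index from a SHOW SERIES result; `tag_index` is a hypothetical helper, and the input shape is the one shown on the slide.

```python
def tag_index(show_series_result):
    """Map measurement -> tag key -> set of tag values seen in SHOW SERIES."""
    index = {}
    for series in show_series_result["series"]:
        per_measurement = index.setdefault(series["name"], {})
        for row in series["values"]:
            for column, value in zip(series["columns"], row):
                if column == "id":  # series id, not a tag
                    continue
                per_measurement.setdefault(column, set()).add(value)
    return index


# One result object, copied from the SHOW SERIES slide.
show_series = {
    "query": "SHOW SERIES",
    "series": [
        {
            "name": "cpu",
            "columns": ["id", "region", "host"],
            "values": [
                [1, "us-west", "serverA"],
                [2, "us-east", "serverB"],
            ],
        }
    ],
}

index = tag_index(show_series)
print(sorted(index["cpu"]["region"]))  # ['us-east', 'us-west']
print(sorted(index["cpu"]["host"]))    # ['serverA', 'serverB']
```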

Much more

• Retention policies

• Automatic downsampling and aggregation

• Clustering

Grafana Dashboards

Thank you!

Paul Dix @pauldix

paul@influxdb.com