dealing with an upside down internet

Upload: mapr-technologies

Post on 28-Jul-2015

TRANSCRIPT

Page 1: Dealing with an Upside Down Internet

© 2014 MapR Technologies

Dealing with an Upside Down Internet

Ted Dunning

June 10, 2015

Page 2: Dealing with an Upside Down Internet

Who am I?

Ted Dunning, Chief Applications Architect, MapR Technologies

Email [email protected] [email protected]

Twitter @Ted_Dunning

Page 3: Dealing with an Upside Down Internet

e-book available courtesy of MapR
http://bit.ly/1jQ9QuL

A New Look at Anomaly Detection
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)

Page 4: Dealing with an Upside Down Internet

Agenda

• The Internet is turning upside down
• A story
• The last (mile) shall be first
• Time series on NoSQL
• Faster time series on NoSQL
• Summary

Page 5: Dealing with an Upside Down Internet

How the Internet Works

• Big content servers feed data across the backbone to

• Regional caches and servers feed data across neighborhood transport to

• The “last mile”

• Bits are nearly conserved, $ are concentrated centrally
  – But total $ mass at the edge is much higher

Page 6: Dealing with an Upside Down Internet

How The Internet Works

Page 7: Dealing with an Upside Down Internet

Conservation of Bits Decreases Bandwidth

Page 8: Dealing with an Upside Down Internet

Total Investment Dominated by Last Mile

Page 9: Dealing with an Upside Down Internet

The Rub

• What's the problem?
  – Speed (end-to-end latency, backbone bandwidth)
  – Feasibility (cost for consumer links)
  – Caching

• What do we need?
  – Cheap last-mile hardware
  – Good caches

Page 10: Dealing with an Upside Down Internet

First:

An apology for going off-script

Page 11: Dealing with an Upside Down Internet

Now, the story

Page 13: Dealing with an Upside Down Internet

By the 1840s, the NY-SF sailing time was down to 130-180 days

Page 15: Dealing with an Upside Down Internet

In 1851, the record was set at 89 days by the Flying Cloud

Page 16: Dealing with an Upside Down Internet

The difference was due (in part) to big data

and a primitive kind of time-series database

Page 20: Dealing with an Upside Down Internet

These charts were free …

If you donated your data

Page 21: Dealing with an Upside Down Internet

But how does this apply today?

Page 22: Dealing with an Upside Down Internet

What has changed?
Where will it lead?

Page 33: Dealing with an Upside Down Internet

Things

Page 34: Dealing with an Upside Down Internet

Emitting data

Page 35: Dealing with an Upside Down Internet

How The Internet Works

Page 36: Dealing with an Upside Down Internet

How the Internet is Going to Work

Page 37: Dealing with an Upside Down Internet

Where Will The $ Go?

Page 38: Dealing with an Upside Down Internet

Sensors

Page 39: Dealing with an Upside Down Internet

Controllers

Page 40: Dealing with an Upside Down Internet

The Problems

• Sensors and controllers have little processing or space
  – SIM cards = 20 MHz processor, 128 kb space = 16 kB
  – Arduino mini = 15 kB RAM (more EPROM)
  – BeagleBone/Raspberry Pi = 500 kB RAM

• Sensors and controllers have little power
  – Very common to power down 99% of the time

• Sensors and controls often have very low bandwidth
  – Mesh networks with base rates << 1 Mb/s
  – Power line networking
  – Intermittent 3G/4G/LTE connectivity

Page 41: Dealing with an Upside Down Internet

What Do We Need to Do With a Time Series

• Acquire
  – Measurement, transmission, reception
  – Mostly not our problem

• Store
  – We own this

• Retrieve
  – We have to allow this

• Analyze and visualize
  – We facilitate this via retrieval

Page 42: Dealing with an Upside Down Internet

Retrieval Requirements

• Retrieve by time-series, time range, tags
  – Possibly pull millions of data points at a time
  – Possibly do on-the-fly windowed aggregations

• Search by unstructured data
  – Typically requires time-windowed faceting after search
  – Also need to dive in with the first kind of retrieval
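
The first kind of retrieval (time range plus tags, with an on-the-fly windowed aggregation) can be sketched in a few lines. This is only an in-memory illustration, not how any particular time-series database implements it; the function name `windowed_mean` and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def windowed_mean(points, start, end, window, tags=None):
    """Average values per fixed-size window.

    points: iterable of (timestamp_seconds, value, tag_dict) tuples.
    Only points with start <= timestamp < end whose tags match are used.
    Returns {window_start: mean_value} ordered by window start.
    """
    tags = tags or {}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value, point_tags in points:
        if not (start <= ts < end):
            continue  # outside the requested time range
        if any(point_tags.get(k) != v for k, v in tags.items()):
            continue  # tag filter does not match
        bucket = start + ((ts - start) // window) * window
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical data: three points from db1, one from db2.
points = [
    (0, 1.0, {"host": "db1"}),
    (10, 3.0, {"host": "db1"}),
    (35, 5.0, {"host": "db1"}),
    (40, 100.0, {"host": "db2"}),
]
result = windowed_mean(points, start=0, end=60, window=30, tags={"host": "db1"})
print(result)
```

A real system would push the time-range and tag filters down into the storage scan rather than filtering in the client, which is why the storage layout discussed on the following slides matters.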

Page 43: Dealing with an Upside Down Internet

Storage choices and trade-offs

• Flat files
  – Great for rapid ingest with massive data
  – Handles essentially any data type
  – Less good for data requiring frequent updates
  – Harder to find specific ranges

• Traditional relational db
  – Ingests up to tens of thousands of rows/sec; prefers well-structured (numerical) data; expensive

• Non-relational db: Tables (such as MapR tables in M7 or HBase)
  – Ingests up to 100,000 rows/sec
  – Handles wide variety of data
  – Good for frequent updates
  – Easily scanned in a range

Page 44: Dealing with an Upside Down Internet

Specific Example

• Consider a server farm
• Lots of system metrics
• Typically 100-300 stats / 30 s
• Loads, RPCs, packets, requests/s
• Common to have 100 – 10,000 machines

Page 45: Dealing with an Upside Down Internet

The General Outline

• 10 samples / second / machine

x 1,000 machines

= 10,000 samples / second

• This is what OpenTSDB was designed to handle

• Install and go, but don’t test at scale

Page 46: Dealing with an Upside Down Internet

Specific Example

• Consider oil drilling rigs
• When drilling wells, there are *lots* of moving parts
• Typically a drilling rig makes about 10K samples/s
• Temperatures, pressures, magnetics, machine vibration levels, salinity, voltage, currents, many others
• Typical project has 100 rigs

Page 47: Dealing with an Upside Down Internet

The General Outline

• 10K samples / second / rig

x 100 rigs

= 1M samples / second

Page 48: Dealing with an Upside Down Internet

The General Outline

• 10K samples / second / rig

x 100 rigs

= 1M samples / second

• But wait, there’s more
  – Suppose you want to test your system
  – Perhaps with a year of data
  – And you want to load that data in << 1 year

• 100x real-time = 100M samples / second
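
The rate arithmetic from the server-farm and drilling-rig outlines can be checked directly. The sketch below only restates the numbers on the slides; the helper name `aggregate_rate` and the 365-day year are assumptions for illustration.

```python
def aggregate_rate(samples_per_second_per_source, sources, replay_speedup=1):
    """Total samples/second for a fleet, optionally replayed faster than real time."""
    return samples_per_second_per_source * sources * replay_speedup


server_farm = aggregate_rate(10, 1_000)          # 10 samples/s x 1,000 machines
drilling = aggregate_rate(10_000, 100)           # 10K samples/s x 100 rigs
backfill = aggregate_rate(10_000, 100, 100)      # same fleet at 100x real time

# How long does a year of drilling data take to load at the backfill rate?
seconds_per_year = 365 * 24 * 3600
samples_per_year = drilling * seconds_per_year
days_to_load = samples_per_year / backfill / 86_400

print(server_farm, drilling, backfill, days_to_load)
```

At 100x real time, a year of history loads in a few days, which is the sense in which the load has to finish in "<< 1 year".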

Page 49: Dealing with an Upside Down Internet

How Should That Work?

[Diagram labels: Samples, Message queue, Collector, MapR table, Web service, Users]

Page 50: Dealing with an Upside Down Internet

Example Time Series

...
1409497082 327810227706 mysql.bytes_received schema=foo host=db1
1409497099 6604859181710 mysql.bytes_sent schema=foo host=db1
1409497106 327812421706 mysql.bytes_received schema=foo host=db1
1409497113 6604901075387 mysql.bytes_sent schema=foo host=db...

UNIX epoch timestamp: $(date +%s)

a metric (often hierarchical)

two tags
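
A minimal sketch of splitting one such line into its parts follows. It assumes the field order exactly as it appears in this transcript (timestamp, value, metric, then tags); the actual OpenTSDB line protocol may order the fields differently, and `Sample` and `parse_sample` are hypothetical names.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    timestamp: int   # UNIX epoch seconds
    value: int
    metric: str      # often hierarchical, e.g. mysql.bytes_received
    tags: dict       # tag name -> tag value


def parse_sample(line):
    """Parse 'timestamp value metric tag=value ...' into a Sample."""
    ts, value, metric, *tag_fields = line.split()
    tags = dict(field.split("=", 1) for field in tag_fields)
    return Sample(int(ts), int(value), metric, tags)


sample = parse_sample(
    "1409497082 327810227706 mysql.bytes_received schema=foo host=db1"
)
print(sample)
```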

Page 51: Dealing with an Upside Down Internet

The Whole Picture

HBase or MapR-DB

Page 52: Dealing with an Upside Down Internet

Wide Table Design: Point-by-Point

Page 53: Dealing with an Upside Down Internet

Wide Table Design: Hybrid Point-by-Point + Blob

Insertion of data as blob makes original columns redundant
Non-relational, but you can query these tables with Drill
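
The slide images for the two wide-table designs did not survive in this transcript, so the following is a sketch under stated assumptions rather than the schema shown in the talk. It models a table as a Python dict, assumes one row per metric, tag set, and hour, uses one column per second offset for the point-by-point form, and packs the row into a single `blob` column on compaction. The key layout, the one-hour row span, and all function names are illustrative.

```python
import struct

ROW_SPAN = 3600          # assumed: one row holds one hour of samples
POINT = struct.Struct(">Hq")  # 2-byte offset within the row, 8-byte integer value


def row_key(metric, tags, timestamp):
    """Row key = metric + sorted tags + start of the hour (hypothetical layout)."""
    base = timestamp - timestamp % ROW_SPAN
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}|{tag_part}|{base}"


def insert_point(table, metric, tags, timestamp, value):
    """Point-by-point design: one column per sample, keyed by offset in the row."""
    row = table.setdefault(row_key(metric, tags, timestamp), {})
    row[timestamp % ROW_SPAN] = value


def compact_row(row):
    """Hybrid design: pack all point columns into one blob column."""
    blob = b"".join(POINT.pack(off, row[off]) for off in sorted(row))
    return {"blob": blob}


def read_row(row):
    """Return (offset, value) pairs from either representation."""
    if "blob" not in row:
        return sorted(row.items())
    blob = row["blob"]
    return [POINT.unpack_from(blob, i) for i in range(0, len(blob), POINT.size)]


table = {}
tags = {"schema": "foo", "host": "db1"}
insert_point(table, "mysql.bytes_received", tags, 1409497082, 327810227706)
insert_point(table, "mysql.bytes_received", tags, 1409497106, 327812421706)

key = row_key("mysql.bytes_received", tags, 1409497082)
before = read_row(table[key])
table[key] = compact_row(table[key])   # the original columns are now redundant
after = read_row(table[key])
print(before == after, len(table[key]["blob"]))
```

Both samples land in the same row because they fall in the same hour, and the reader sees the same points before and after compaction.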

Page 54: Dealing with an Upside Down Internet

Status to This Point

• Each sample requires one insertion, compaction requires another

• Typical performance on SE cluster
  – 1 edge node + 4 cluster nodes
  – 20,000 samples per second observed
  – Would be faster on performance cluster, possibly not a lot

• Suitable for server monitoring
• Not suitable for large scale history ingestion
• Bulk load helps a little, but not much
• Still 1000x too slow for industrial work

Page 55: Dealing with an Upside Down Internet

Speeding up OpenTSDB

20,000 data points per second per node in the test cluster

Why can’t it be faster?

Page 56: Dealing with an Upside Down Internet

Speeding up OpenTSDB: open source MapR extensions

Available on Github: https://github.com/mapr-demos/opentsdb

Page 57: Dealing with an Upside Down Internet

Status to This Point

• 3600 samples require one insertion
• Typical results on SE cluster
  – 1 edge node + 4 cluster nodes
  – 14 million samples per second observed
  – ~700x faster ingestion

• Typical results on performance cluster
  – 2-4 edge nodes + 4-9 cluster nodes
  – 110 million samples/s (4 nodes) to >200 million samples/s (8 nodes)

• Suitable for large scale history ingestion
• 30 million data points retrieved in 20s
• Ready for industrial work

Page 58: Dealing with an Upside Down Internet

Going Further

• OpenTSDB is substantially limited in many respects
  – Millisecond resolution is a bit of a hack
  – Data formats “just growed”, better design needed
  – Internal code is difficult to modify safely

• Possible improvements
  – Compress and batch at collectors
  – Use advanced compression technology
  – Interface with modern query systems (Apache Drill)

Page 59: Dealing with an Upside Down Internet

Compression example

Samples are 64-bit time, 16-bit sample

Sample time at 10 kHz

Sample time jitter makes it important to keep original time-stamp

How much overhead to retain time-stamp?
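
The transcript does not record the answer given in the talk. As one way to explore the question, the sketch below encodes jittered 10 kHz timestamps with delta-of-delta coding plus zigzag varints, a common technique that is swapped in here for illustration and is not necessarily the compression the talk used. The microsecond resolution, the ±3 µs jitter, and all function names are assumptions.

```python
import random


def zigzag(n):
    """Map signed integers to unsigned so small magnitudes stay small."""
    return n * 2 if n >= 0 else -n * 2 - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def put_varint(out, z):
    """Append z as a little-endian base-128 varint."""
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def encode_times(times):
    """Store each timestamp as the change in its delta from the previous one."""
    out = bytearray()
    prev, prev_delta = 0, 0
    for t in times:
        delta = t - prev
        put_varint(out, zigzag(delta - prev_delta))
        prev, prev_delta = t, delta
    return bytes(out)


def decode_times(data):
    times, prev, prev_delta = [], 0, 0
    z, shift = 0, 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            continue  # more bytes follow for this varint
        prev_delta += unzigzag(z)
        prev += prev_delta
        times.append(prev)
        z, shift = 0, 0
    return times


# Synthetic timestamps in microseconds: nominal 100 us period with small jitter.
random.seed(0)
times, t = [], 1_409_497_082_000_000
for _ in range(10_000):
    t += 100 + random.randint(-3, 3)
    times.append(t)

encoded = encode_times(times)
bytes_per_timestamp = len(encoded) / len(times)
print(bytes_per_timestamp)
```

With jitter this small, every delta-of-delta after the first two fits in a single byte, so the exact timestamp costs roughly one byte per sample in this synthetic case instead of eight. Real sensor jitter may behave differently.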

Page 60: Dealing with an Upside Down Internet

Key Results

• Ingestion is network limited
  – Edge nodes are the critical resource
  – Number of edge nodes defines a limit to scaling

• With enough edge nodes scaling is near perfect

• Performance of raw OpenTSDB is limited by stateless daemon

• Modified OpenTSDB can run 1000x faster

Page 61: Dealing with an Upside Down Internet

[Chart: two ingestors vs. one ingestor]

Page 62: Dealing with an Upside Down Internet

[Chart: two ingestors vs. one ingestor]

Page 63: Dealing with an Upside Down Internet

Why MapR?

• MapR tables are inherently faster, safer
  – Sustained > 1GB/s ingest rate in tests

• Mirror to M5 or M7 cluster to isolate analytics load

• Transaction logs involve frequent appends, many files

Page 64: Dealing with an Upside Down Internet

When is this All Wrong?

• In some cases, retrieval by series-id + time range is not sufficient
• May need very flexible retrieval of events based on text-like criteria

• Search may be better than a classic time-series database

• Can scale Lucene-based search to > 1 million events / second

Page 65: Dealing with an Upside Down Internet

When is it Even More Right

• In many industrial settings, data rates from individual sensors are relatively high
  – Latency to view is still measured in seconds, not sample points

• This allows batching at source
• Common requirement for highly variable sample rates
  – 1 sample/s baseline, switching to 10k samples/s
  – Small batches during slow times are just fine since number of sensors is constant
  – Requires variable window sizes
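
Batching at the source with variable window sizes can be sketched as a buffer that flushes on whichever comes first: a sample-count limit or an age limit. This is an illustrative sketch only; the class name `Batcher`, the limits of 1000 samples and 2 seconds, and the in-memory `batches` list standing in for a write to the table are all assumptions.

```python
class Batcher:
    """Collects (timestamp, value) samples and emits variable-size batches."""

    def __init__(self, max_samples, max_age):
        self.max_samples = max_samples  # flush when this many samples are pending
        self.max_age = max_age          # flush when the oldest pending sample is this old (s)
        self.pending = []
        self.batches = []               # stands in for one insertion per batch

    def add(self, timestamp, value):
        # Age-based flush keeps view latency bounded during slow periods.
        if self.pending and timestamp - self.pending[0][0] >= self.max_age:
            self.flush()
        self.pending.append((timestamp, value))
        # Size-based flush keeps batches bounded during bursts.
        if len(self.pending) >= self.max_samples:
            self.flush()

    def flush(self):
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []


batcher = Batcher(max_samples=1000, max_age=2.0)
for i in range(5):                       # baseline: 1 sample/s
    batcher.add(float(i), 0.0)
for i in range(2500):                    # burst: 10k samples/s
    batcher.add(5.0 + i / 10_000, 1.0)
batcher.flush()

sizes = [len(batch) for batch in batcher.batches]
print(sizes)
```

During the slow period the batches are tiny, and during the burst they fill to the size limit, so latency to view stays within seconds in both regimes.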

Page 66: Dealing with an Upside Down Internet

Summary

• The internet is turning upside down

• This will make time series ubiquitous

• Current open source systems are much too slow

• We can fix that with modern NoSQL systems
  – (I wear a red hat for a reason)

Page 67: Dealing with an Upside Down Internet

Questions

Page 68: Dealing with an Upside Down Internet

Thank You

@mapr maprtech

[email protected]@apache.org

Ted Dunning, Chief Application Architect

MapR Technologies

maprtech

mapr-technologies