Monoids and Sketches and CRDTs, oh my! Kevin Scaldeferri OSB 2016

Upload: kscaldef

Post on 14-Apr-2017

Page 1: Monoids and sketches and crdts, oh my!

Monoids and Sketches and CRDTs, oh my!

Kevin Scaldeferri
OSB 2016

Page 2: Monoids and sketches and crdts, oh my!

How Do I Math with Big Data?

Page 3: Monoids and sketches and crdts, oh my!

This document and the information herein (including any information that may be incorporated by reference) is provided for informational purposes only and should not be construed as an offer, commitment, promise or obligation on behalf of New Relic, Inc. (“New Relic”) to sell securities or deliver any product, material, code, functionality, or other feature. Any information provided hereby is proprietary to New Relic and may not be replicated or disclosed without New Relic’s express written permission.

Such information may contain forward-looking statements within the meaning of federal securities laws. Any statement that is not a historical fact or refers to expectations, projections, future plans, objectives, estimates, goals, or other characterizations of future events is a forward-looking statement. These forward-looking statements can often be identified as such because the context of the statement will include words such as “believes,” “anticipates,” “expects” or words of similar import.

Actual results may differ materially from those expressed in these forward-looking statements, which speak only as of the date hereof, and are subject to change at any time without notice. Existing and prospective investors, customers and other third parties transacting business with New Relic are cautioned not to place undue reliance on this forward-looking information. The achievement or success of the matters covered by such forward-looking statements are based on New Relic’s current assumptions, expectations, and beliefs and are subject to substantial risks, uncertainties, assumptions, and changes in circumstances that may cause the actual results, performance, or achievements to differ materially from those expressed or implied in any forward-looking statement. Further information on factors that could affect such forward-looking statements is included in the filings we make with the SEC from time to time. Copies of these documents may be obtained by visiting New Relic’s Investor Relations website at ir.newrelic.com or the SEC’s website at www.sec.gov.

New Relic assumes no obligation and does not intend to update these forward-looking statements, except as required by law. New Relic makes no warranties, expressed or implied, in this document or otherwise, with respect to the information provided.

Page 4: Monoids and sketches and crdts, oh my!
Page 5: Monoids and sketches and crdts, oh my!

How?

Page 6: Monoids and sketches and crdts, oh my!

Monoids and Sketches and CRDTs, oh my!

Page 7: Monoids and sketches and crdts, oh my!

Monoids

超音波システム研究所 / http://bit.ly/26bBTQ1 / CC BY 3.0

Page 8: Monoids and sketches and crdts, oh my!

Wikipedia: A monoid is an algebraic structure with a single associative binary operation and an identity element.

http://bit.ly/1Wlrigv / CC0

Page 9: Monoids and sketches and crdts, oh my!

It’s just a thing you can “add”

Page 10: Monoids and sketches and crdts, oh my!

interface Monoid[T] {
  // (x + y) + z = x + (y + z)
  T add(T x, T y);

  // 0 + x = x = x + 0
  T unit();
}
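The interface above can be sketched in Python roughly as follows. This is a minimal illustration, not the speaker's code: the `Monoid` class, its `fold` helper, and the string-concatenation instance are assumptions made for the example, and `unit` is stored as a value rather than a method.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Monoid(Generic[T]):
    """Hypothetical container for a monoid: an operation plus its identity."""
    add: Callable[[T, T], T]  # must be associative
    unit: T                   # identity element for add

    def fold(self, xs: Iterable[T]) -> T:
        # Combine any number of values, starting from the identity.
        return reduce(self.add, xs, self.unit)


# String concatenation: associative, with "" as the identity.
concat = Monoid(add=lambda x, y: x + y, unit="")

x, y, z = "mon", "oi", "ds"
# (x + y) + z = x + (y + z)
assert concat.add(concat.add(x, y), z) == concat.add(x, concat.add(y, z))
# 0 + x = x = x + 0
assert concat.add(concat.unit, x) == x == concat.add(x, concat.unit)

print(concat.fold([x, y, z]))  # monoids
```

Because `fold` starts from `unit`, folding an empty collection is well defined, which is what makes empty partitions harmless later on.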

Page 15: Monoids and sketches and crdts, oh my!

One data type can have multiple monoids!

Page 16: Monoids and sketches and crdts, oh my!

Operation Unit

Sum 0

Product 1

Max -∞

Min +∞
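As a rough sketch of why the units in this table matter, the four monoids can be folded over separate partitions and the partial results merged. The `MONOIDS` table, the `fold` helper, and the sample partitions are made up for illustration.

```python
import math
import operator
from functools import reduce

# (operation, unit) pairs matching the table above.
MONOIDS = {
    "sum": (operator.add, 0),
    "product": (operator.mul, 1),
    "max": (max, -math.inf),
    "min": (min, math.inf),
}


def fold(name, values):
    op, unit = MONOIDS[name]
    return reduce(op, values, unit)


# Pretend each inner list lives on a different machine; one is empty.
partitions = [[3, 1, 4], [], [1, 5, 9, 2]]

for name in MONOIDS:
    partials = [fold(name, p) for p in partitions]          # computed locally
    merged = fold(name, partials)                           # merged centrally
    whole = fold(name, [v for p in partitions for v in p])  # single-machine answer
    assert merged == whole
    print(name, merged)
```

The empty partition contributes the unit (for example -∞ for Max), so it never changes the merged answer.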

Page 17: Monoids and sketches and crdts, oh my!

Live Demo!

Page 18: Monoids and sketches and crdts, oh my!

More Monoids

Count

Boolean And

Lists & String Concatenation

Boolean Or

Set Union

Function Composition

Page 19: Monoids and sketches and crdts, oh my!

Tuple Monoids

Monoid[U] & Monoid[V]

Monoid[(U,V)]
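One way to picture the tuple construction, assuming each monoid is represented as a plain `(add, unit)` pair (a representation chosen only for this sketch):

```python
from functools import reduce


def tuple_monoid(*monoids):
    """Build a monoid on tuples from component (add, unit) pairs."""
    def add(xs, ys):
        # Combine position by position, each with its own operation.
        return tuple(m_add(x, y) for (m_add, _), x, y in zip(monoids, xs, ys))

    unit = tuple(u for _, u in monoids)
    return add, unit


sum_m = (lambda x, y: x + y, 0)
max_m = (max, float("-inf"))

add, unit = tuple_monoid(sum_m, max_m)

# Compute the sum and the max in a single pass over the data.
values = [3, 1, 4, 1, 5]
total, largest = reduce(add, [(v, v) for v in values], unit)
print(total, largest)  # 14 5
```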

Page 20: Monoids and sketches and crdts, oh my!

Derived Monoids

Count & Sum ➜ Average

Count & Sum & SumOfSquares ➜ StdDev
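A small sketch of the derived statistics, assuming a hypothetical `Moments` record that carries Count, Sum, and SumOfSquares and adds component-wise. The average and standard deviation are read off only at the end, after all merging is done.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Moments:
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @staticmethod
    def of(x):
        return Moments(1, x, x * x)

    def __add__(self, other):
        # Component-wise sum: a tuple monoid of three Sum monoids.
        return Moments(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    def mean(self):
        return self.total / self.count

    def stddev(self):
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        return math.sqrt(self.total_sq / self.count - self.mean() ** 2)


def summarize(values):
    return reduce(lambda a, b: a + b, map(Moments.of, values), Moments())


left = summarize([2, 4, 4, 4])    # e.g. computed on one machine
right = summarize([5, 5, 7, 9])   # e.g. computed on another
merged = left + right
print(merged.mean(), merged.stddev())  # 5.0 2.0
```

Averages themselves do not merge (the average of two averages is generally wrong), which is why the mergeable state is the triple rather than the final number. The sum-of-squares formula can lose precision on large or tightly clustered data, so treat it as illustrative.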

Page 21: Monoids and sketches and crdts, oh my!

Sets don’t scale

Dan Morgan / http://bit.ly/1UiFhGs / CC BY 2.0

Page 22: Monoids and sketches and crdts, oh my!

Sketches = Monoids + Physics

Page 23: Monoids and sketches and crdts, oh my!

Counting by Flipping Coins

HHT T T HHHHHT HT T HHT HT T T

T T T HT T T T T T HT

Page 24: Monoids and sketches and crdts, oh my!

Unique Count by Hashing

0111101001 1110101100 0010010010 0100100011 1000111000 0100011011 1100100110 1111011011 0011100001 1001011100

1110100101 1001110101 1010111001 1011110111 0000101001 0100101001 0100110000 0011110100 1011011010 0010011011

Page 25: Monoids and sketches and crdts, oh my!

Set Cardinality (uniqueCount) ≈ HyperLogLog

Aldo Schumann / http://bit.ly/1Yqzvme / public domain
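To make the coin-flip and hashing intuition concrete, here is a simplified HyperLogLog sketch. It is an illustration under stated assumptions, not a production implementation: the class name, the choice of SHA-1 as the hash, and the default of 2^10 registers are all choices made for this example, and only the basic small-range correction is included.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining bits are the "coin flips"
        # Position of the first 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The monoid operation: element-wise max of registers.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


left, right = HyperLogLog(), HyperLogLog()
for i in range(0, 6000):
    left.add(i)
for i in range(4000, 10000):
    right.add(i)

merged = left.merge(right)   # sketch of the union, without ever storing the sets
print(round(merged.estimate()))  # close to 10000, typically within a few percent
```

Merging with an all-zero sketch changes nothing, so the empty sketch is the unit and the sketches form a monoid.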

Page 26: Monoids and sketches and crdts, oh my!

Set Membership

Page 27: Monoids and sketches and crdts, oh my!

interface ExtensionalSet[T] {
  Iterator[T] iterator();
}

Page 28: Monoids and sketches and crdts, oh my!

interface IntensionalSet[T] {
  boolean isMember(T t);
}

Page 29: Monoids and sketches and crdts, oh my!

Intensional Sets ≈ Bloom Filters

Page 30: Monoids and sketches and crdts, oh my!

HashSet (animation, pages 30-48)

Add A. Add B. Add C: Ohnoes!

D? Nopes!

E? Hmmm. E == ? Nope!

Page 49: Monoids and sketches and crdts, oh my!

BloomFilter (animation, pages 49-57)

empty   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
add A   0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0
add B   0 0 1 0 1 0 0 0 1 1 0 0 1 0 1 0
add C   0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0

D? Nope!

A? Yes*
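The walkthrough above can be sketched as a tiny Bloom filter. The class, the 16-bit size, the three hash positions per item, and the use of salted SHA-256 are assumptions for illustration; the actual bit positions will differ from the slide.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive several hash positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means "yes*" (possibly a false positive).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for item in ("A", "B", "C"):
    bf.add(item)

print(bf.bits)
print(bf.might_contain("A"))  # True: added items are never reported absent
print(bf.might_contain("D"))  # usually False, but a false positive is possible
```

Unlike the HashSet, the filter never stores the items themselves, so it cannot do the final equality check. That is the source of the asterisk on "Yes*".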

Page 58: Monoids and sketches and crdts, oh my!

BloomFilter Monoid

  0 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0
+ 0 1 1 0 0 0 0 1 0 1 0 0 0 0 0 1
= 0 1 1 0 1 0 1 1 1 1 0 0 1 0 1 1
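The "+" on this slide is a bitwise OR, and the unit is the all-zero filter. A short check using the slide's own bit vectors (the `merge` function name is just for this sketch):

```python
left = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
right = [0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1]
unit = [0] * 16  # the empty filter


def merge(a, b):
    """Bloom filter 'add': bitwise OR of two filters of the same size."""
    if len(a) != len(b):
        raise ValueError("filters must have the same number of bits")
    return [x | y for x, y in zip(a, b)]


merged = merge(left, right)
print(merged)
assert merged == [0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
assert merge(left, unit) == left == merge(unit, left)
```

Both filters must use the same size and the same hash functions for the merge to be meaningful.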

Page 59: Monoids and sketches and crdts, oh my!

Circling Back: BloomFilters are a scalable approximation to Sets

Page 60: Monoids and sketches and crdts, oh my!

CountMinSketch

Page 61: Monoids and sketches and crdts, oh my!

CountMinSketch (animation, pages 61-70; three rows of 16 counters)

empty:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

add A:
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

add B, add C:
0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 2 0
0 0 1 0 2 0 0 0 0 0 0 0 0 0 0 0

add A (second time):
0 0 0 0 0 0 1 0 1 2 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 1 0 3 0
0 0 2 0 2 0 0 0 0 0 0 0 0 0 0 0

D? Min(2,1,0) = 0

A? Min(2,2,3) = 2

Page 71: Monoids and sketches and crdts, oh my!

CountMinSketch

Frequency of Occurrence
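A minimal count-min sketch along the lines of the walkthrough, with the same shape as the slides (3 rows of 16 counters). The class name, the salted SHA-256 hashing, and the `merge` method are assumptions for this example; the cell positions will not match the slide.

```python
import hashlib


class CountMinSketch:
    def __init__(self, depth=3, width=16):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One counter per row, chosen by a row-specific hash.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess
        # and never underestimates the true count.
        return min(self.table[row][col] for row, col in self._cells(item))

    def merge(self, other):
        # The monoid operation: cell-wise addition.
        out = CountMinSketch(self.depth, self.width)
        out.table = [[a + b for a, b in zip(ra, rb)]
                     for ra, rb in zip(self.table, other.table)]
        return out


cms = CountMinSketch()
for item in ("A", "B", "C", "A"):
    cms.add(item)

print(cms.estimate("A"))  # at least 2
print(cms.estimate("D"))  # 0 unless D collides with other items in every row
```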

Page 72: Monoids and sketches and crdts, oh my!

Funnels

% of users who do A, then B

Size(A ∪ B) ≈ HyperLogLog

Size(A ∩ B) / Size(A ∪ B) ≈ MinHash

pedrik / http://bit.ly/25WzP1H / CC BY 2.0
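A rough MinHash sketch for the ratio Size(A ∩ B) / Size(A ∪ B). The function names, the 200 salted SHA-256 hash functions, and the sample user-id sets are all made up for illustration. Multiplying this ratio by a HyperLogLog estimate of Size(A ∪ B) would give an estimate of Size(A ∩ B).

```python
import hashlib


def _hash(seed, item):
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def signature(items, num_hashes=200):
    """Keep only the minimum hash value per hash function (items must be non-empty)."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def merge(sig_a, sig_b):
    # Signature of the union: element-wise minimum (the monoid operation).
    return [min(a, b) for a, b in zip(sig_a, sig_b)]


def jaccard(sig_a, sig_b):
    # Fraction of hash functions on which the two minima agree.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


did_a = set(range(0, 300))    # hypothetical user ids who did A
did_b = set(range(100, 400))  # hypothetical user ids who did B

sig_a, sig_b = signature(did_a), signature(did_b)
print(jaccard(sig_a, sig_b))  # approximately 0.5 (true value: 200 / 400)
```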

Page 73: Monoids and sketches and crdts, oh my!

What About Streaming Data?

Page 74: Monoids and sketches and crdts, oh my!

Streaming is Distributed-in-Time

Computation

Page 75: Monoids and sketches and crdts, oh my!

What About Mutable Data?

Page 76: Monoids and sketches and crdts, oh my!

CRDTs

Page 77: Monoids and sketches and crdts, oh my!

Conflict-Free

Replicated

Data

Types

Page 78: Monoids and sketches and crdts, oh my!

Available, Eventually Consistent Data Structures

Page 79: Monoids and sketches and crdts, oh my!

How Can Two People Count?

Page 80: Monoids and sketches and crdts, oh my!

Shared Counter (animation, pages 80-82; two replicas)

Replica 1:  0  (+5) 5  (-4) 1  -2
Replica 2:  0       5  (-3) 2  -2

Page 83: Monoids and sketches and crdts, oh my!

Op-based Counter (animation, pages 83-84; two replicas)

Replica 1:  0  (+5) 5  (-4) 1  -2
Replica 2:  0       5  (-3) 2  -2

Replica 1:  0  (+5) 5
Replica 2:  0       5  10  Oops! (the +5 operation is applied twice)

Page 85: Monoids and sketches and crdts, oh my!

Naive Sets (animation, pages 85-88; two replicas)

Replica 1:  {}  (+X) {X}  {X}  (-X) {}
Replica 2:  {}  (+X) {X}  {X}       {}

Oops!

Page 89: Monoids and sketches and crdts, oh my!
Page 90: Monoids and sketches and crdts, oh my!

Observed-Remove Sets (two replicas)

Replica 1:  {}  (+Xa) {Xa}  (-Xa) {}  {Xb}
Replica 2:  {}  (+Xb) {Xb}  {XaXb}    {Xb}
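The idea of tagging each add (Xa, Xb) so that a remove only cancels the adds it has observed can be sketched as follows. This is a simplified state-based variant with tombstones; the `ORSet` class and the use of random UUIDs as tags are assumptions for the example.

```python
import uuid


class ORSet:
    def __init__(self):
        self.adds = set()     # (element, unique tag) pairs
        self.removed = set()  # tombstones: pairs that were observed and removed

    def add(self, element):
        # Every add gets a fresh tag, like Xa and Xb on the slide.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Only tags this replica has already seen are removed.
        self.removed |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Set union is idempotent, commutative, and associative.
        self.adds |= other.adds
        self.removed |= other.removed

    def value(self):
        return {e for (e, _) in self.adds - self.removed}


r1, r2 = ORSet(), ORSet()
r1.add("X")      # Xa
r2.add("X")      # Xb, added concurrently
r1.remove("X")   # removes only Xa, the one r1 has observed

r1.merge(r2)
r2.merge(r1)
print(r1.value(), r2.value())  # {'X'} {'X'}
```

The concurrent add on the second replica survives the remove, which is the behavior the naive set got wrong.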

Page 91: Monoids and sketches and crdts, oh my!

State-based Counter (animation, pages 91-94; replicas a and b)

Replica a:  0  (+5) {a=5}=5  (+4) {a=9}=9      {a=9,b=3}=12
Replica b:  0       {a=5}=5  (+3) {a=5,b=3}=8  {a=9,b=3}=12

Replica a:  0  (+5) {a=5}=5  (+4) {a=9}=9
Replica b:  0       {a=9}=9  ???

Page 95: Monoids and sketches and crdts, oh my!

Increment-only Counter (replicas a and b)

Replica a:  0  (+5) {a=5}=5  (+4) {a=9}=9
Replica b:  0       {a=9}=9       {a=9}=9
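A minimal sketch of an increment-only counter, assuming the merge rule is "take the per-replica maximum". The `GCounter` class and its method names are illustrative, not taken from the talk.

```python
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> total increments made by that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("increment-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def state(self):
        return dict(self.counts)

    def merge(self, state):
        # Per-replica max: stale or duplicated states cannot lower a count.
        for rid, n in state.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(5)
first = a.state()    # {a=5}
a.increment(4)
second = a.state()   # {a=9}

b.merge(second)      # newer state arrives first
b.merge(first)       # stale state arrives late: ignored by max
b.merge(second)      # duplicate delivery: no effect
print(b.value())     # 9

b.increment(3)
a.merge(b.state())
print(a.value())     # 12
```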

Page 96: Monoids and sketches and crdts, oh my!

PN Counter (replicas a and b)

Replica a:  0  (+5) {a=+5}=5     (-4) {a=+5,-4}=1  (+3) {a=+8,-4}=4
Replica b:  0       {a=+5,-4}=1       {a=+5,-4}=1       {a=+8,-4}=4
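The PN counter can be sketched as two increment-only tallies per replica, one for increments and one for decrements, each merged by max. The `PNCounter` class and its method names are assumptions for illustration.

```python
class PNCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.p = {}  # replica id -> sum of increments
        self.n = {}  # replica id -> sum of decrements (stored as positive numbers)

    def add(self, amount):
        side = self.p if amount >= 0 else self.n
        side[self.replica_id] = side.get(self.replica_id, 0) + abs(amount)

    def state(self):
        return dict(self.p), dict(self.n)

    def merge(self, state):
        # Both tallies only grow, so per-replica max is a safe, idempotent merge.
        for mine, theirs in zip((self.p, self.n), state):
            for rid, v in theirs.items():
                mine[rid] = max(mine.get(rid, 0), v)

    def value(self):
        return sum(self.p.values()) - sum(self.n.values())


a, b = PNCounter("a"), PNCounter("b")
a.add(5)
s1 = a.state()   # {a=+5}      = 5
a.add(-4)
s2 = a.state()   # {a=+5,-4}   = 1
a.add(3)
s3 = a.state()   # {a=+8,-4}   = 4

b.merge(s2)      # out of order: the second state arrives first
b.merge(s1)      # stale state changes nothing
print(b.value())  # 1
b.merge(s3)
print(b.value())  # 4
```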

Page 97: Monoids and sketches and crdts, oh my!

Versioned State (replicas a and b)

Replica a:  0  (+5) {a:1:5}=5  (-4) {a:2:1}=1  (+3) {a:3:4}=4
Replica b:  0       {a:2:1}=1       {a:2:1}=1       {a:3:4}=4
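Reading the slide's notation as replica:version:value, a versioned-state counter might look like the sketch below. The `VersionedCounter` class is hypothetical, and it assumes only the owning replica ever writes its own entry, so versions for a given replica are totally ordered.

```python
class VersionedCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.entries = {}  # replica id -> (version, net contribution)

    def add(self, amount):
        version, total = self.entries.get(self.replica_id, (0, 0))
        # Each local update bumps this replica's version.
        self.entries[self.replica_id] = (version + 1, total + amount)

    def state(self):
        return dict(self.entries)

    def merge(self, state):
        # Keep whichever entry has the higher version; older ones are ignored.
        for rid, (version, total) in state.items():
            if version > self.entries.get(rid, (0, 0))[0]:
                self.entries[rid] = (version, total)

    def value(self):
        return sum(total for _, total in self.entries.values())


a, b = VersionedCounter("a"), VersionedCounter("b")
a.add(5)
s1 = a.state()   # {a:1:5} = 5
a.add(-4)
s2 = a.state()   # {a:2:1} = 1
a.add(3)
s3 = a.state()   # {a:3:4} = 4

b.merge(s2)
b.merge(s1)      # older version, ignored
print(b.value())  # 1
b.merge(s3)
print(b.value())  # 4
```

Replaying or reordering any of these states leaves the result unchanged, since merging never moves an entry back to an older version.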

Page 98: Monoids and sketches and crdts, oh my!

Replace exactly-once, in-order delivery

with an idempotent merge strategy

Page 99: Monoids and sketches and crdts, oh my!

Summing Up

Monoids allow computations to be done across many machines and merged

Sketches allow approximate results when the exact answers are computationally infeasible

CRDTs give an approach for mutable distributed data

Page 100: Monoids and sketches and crdts, oh my!

Thank you!

[email protected]
@kscaldef