Streaming Data Mining

Edo Liberty (Yahoo! Research, [email protected])
Jelani Nelson (Princeton University, [email protected])

Edo Liberty, Jelani Nelson: Streaming Data Mining 1 / 111



The need for Streaming Data Mining

A standard interface between data and mining algorithms


The need for Streaming Data Mining

We have a lot of data... Examples: videos, images, email messages, webpages, chats, click data, search queries, shopping history, user browsing patterns, GPS trails, financial transactions, stock exchange data, electricity consumption, traffic records, seismology, astronomy, physics, medical imaging, chemistry, computational biology, weather measurements, maps, telephony data, SMSs, audio tracks and songs, applications, gaming scores, user ratings, question-and-answer forums, legal documentation, medical records, network traffic records, satellite measurements, digital microscopy, cellular records...


The need for Streaming Data Mining

Distributed storage and file systems (HadoopFS, GFS, Cassandra, XIV)


The need for Streaming Data Mining

Distributed computation (MapReduce, Hadoop, Message passing)


The need for Streaming Data Mining

Distributed computation (Web search, Hbase/bigtable)


The need for Streaming Data Mining

“Study Projects Nearly 45-Fold Annual Data Growth by 2020” EMC Press Release.

Figure: The Economist: Data, data everywhere


The need for Streaming Data Mining

IDC 2011 Digital Universe Study.


More exact model

Trivial tasks: count items, sum values, sample, find min/max.


Communication complexity

Impossible tasks: finding the median, alerting on a new item, finding the most frequent item.


Streaming Data Mining

When things are possible and not trivial:

1 Most tasks/query-types require different sketches

2 Algorithms are usually randomized

3 Results are, as a whole, approximate

But

1 An approximate result is acceptable → significant speedup (one pass)

2 Data cannot be stored → only option


Streaming Data Mining

1 Items (words, IP addresses, events, clicks, ...):
Item frequencies
Distinct elements
Moment estimation

2 Vectors (text documents, images, example features, ...):
Dimensionality reduction
k-means
Linear Regression

3 Matrices (text corpora, user preferences, social graphs, ...):
Efficiently approximating the covariance matrix
Sparsification by sampling


Item frequencies

Computing f(i) for all i is easy in O(n) space.


Sampling

We sample each item with probability p and estimate f′(i) = (1/p) · A(i), where A(i) is the number of sampled copies of item i.


Sampling

For f′(i) = f(i) ± εN it suffices that p ≥ c log(n/δ)/(Nε²).


Sampling

The space requirement is therefore O(log(n/δ)/ε²).
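As a sanity check, the sampling estimator can be sketched in a few lines of Python; the function and variable names below are illustrative, not from the slides:

```python
import random
from collections import Counter

def sample_frequencies(stream, p, seed=0):
    """Sampling sketch: keep each item independently with probability p,
    then estimate f'(i) = A(i) / p, where A(i) counts sampled copies of i."""
    rng = random.Random(seed)
    sampled = Counter()
    for item in stream:
        if rng.random() < p:
            sampled[item] += 1
    return {i: c / p for i, c in sampled.items()}

# A stream where "a" has true frequency 700 and "b" has 300.
stream = ["a"] * 700 + ["b"] * 300
est = sample_frequencies(stream, p=0.1)
```

With p = 0.1 the estimates concentrate around the true frequencies up to an additive εN, matching the bound above.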


Count min sketches

This time we keep only 2/ε counters in an array A


Count min sketches

Items are counted in buckets according to a hash function h : [n] → [2/ε].


Count min sketches

Obviously we have that f′(i) ≥ f(i)


Count min sketches

But also Pr[f′(i) ≤ f(i) + εN] ≥ 1/2.


Count min sketches

This reduces the space requirement to O(log(n/δ)/ε).
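A minimal Count-Min sketch along these lines might look as follows; the salt-based buckets stand in for independent hash functions h : [n] → [2/ε], and all names are illustrative:

```python
import random

class CountMinSketch:
    """Count-Min: d rows of w counters; each row hashes items to buckets.
    Estimates never undercount: query(i) >= f(i)."""
    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.rows = [[0] * w for _ in range(d)]
        # random salts stand in for d independent hash functions
        self.salts = [rng.getrandbits(64) for _ in range(d)]

    def _bucket(self, r, item):
        return hash((self.salts[r], item)) % self.w

    def update(self, item, count=1):
        for r in range(self.d):
            self.rows[r][self._bucket(r, item)] += count

    def query(self, item):
        # the minimum over rows is the least-contaminated estimate
        return min(self.rows[r][self._bucket(r, item)] for r in range(self.d))

cm = CountMinSketch(w=200, d=5)
for x in ["a", "b", "a", "c", "a"]:
    cm.update(x)
```

Choosing w = O(1/ε) buckets per row and d = O(log(n/δ)) rows gives the stated O(log(n/δ)/ε) space, with each update and query touching d counters.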


Lossy Counting

We show how to reduce the space requirement to 1/ε.

Misra, Gries. Finding repeated elements, 1982.
Demaine, Lopez-Ortiz, Munro. Frequency estimation of internet packet streams with limited space, 2002.
Karp, Shenker, Papadimitriou. A simple algorithm for finding frequent elements in streams and bags, 2003.
The name "Lossy Counting" was used for a different algorithm by Manku and Motwani, 2002.
Metwally, Agrawal, Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams, 2006 (Space Saving).


Lossy Counting

We keep at most 1/ε different items (in this case 2)


Lossy Counting

If we have more than 1/ε we reduce all counters


Lossy Counting

And repeat...


Lossy Counting

Until the stream is consumed


Lossy Counting

We have f(i) ≥ f′(i) ≥ f(i) − εN.


Lossy Counting

This is because each reduction step decrements 1/ε counters at once, and with only N arrivals in total this can happen at most εN times!
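The keep-at-most-1/ε-counters scheme above (the Misra-Gries algorithm cited earlier) can be sketched in a few lines; here k plays the role of 1/ε:

```python
def misra_gries(stream, k):
    """Keep at most k counters. Guarantee: f(i) - N/k <= estimate <= f(i),
    where N is the stream length (items absent from the summary estimate 0)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # full: decrement every counter (the new item's count cancels too)
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

summary = misra_gries(["a", "a", "a", "a", "a", "b", "c", "b", "c"], k=2)
```

On this toy stream (N = 9, f(a) = 5) the surviving counter for "a" sits between f(a) − N/k and f(a), as the guarantee promises.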


Error Relative to the Tail

Figure: Distribution of the top 20 and top 1000 most frequent words in a messaging text corpus. The heavy-tail distribution is common to many data sources.

Therefore, εN = ε ∑_{i=1}^n f(i) might not be tight enough. We now see how to reduce the approximation guarantee to ε ∑_{i=k+1}^n f(i).


Count Max Sketches

Distributing items with a hash function h : [n] → [4k] to 4k lossy counters. Beware: this algorithm does not exist in the literature; it is described for didactic reasons only.


Count Max Sketches

Then n_i = ∑_{j : h(j)=h(i)} f(j) ≤ (1/k) ∑_{i=k+1}^n f(i) with probability at least 1/2


Count Max Sketches

So, f′(i) > f(i) − ε ∑_{i=k+1}^n f(i) with probability at least 1/2


Count Max Sketches

f′(i) is taken as the maximum value over log(n/δ) such structures.


Count Max Sketches

This gives a total size of O(log(n/δ)/min(ε, 1/k))


Count Sketches

Reduces the error to ε [∑_{i=k+1}^n f²(i)]^{1/2}

Charikar, Chen, Farach-Colton. Finding frequent items in data streams. 2002


Count Sketches

While the space increases slightly to O(log(n/δ)/min(ε², 1/k))
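A minimal Count Sketch in the same spirit: each row hashes an item to a bucket and a random ±1 sign, and the estimate is a median rather than a minimum. The salt-based hashes stand in for the pairwise-independent families of Charikar et al.; all names are illustrative:

```python
import random
from statistics import median

class CountSketch:
    """Count Sketch: per row, add sign(i) * count into i's bucket;
    estimate f'(i) as the median over rows of sign(i) * bucket value."""
    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.rows = [[0] * w for _ in range(d)]
        self.salts = [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(d)]

    def _bucket(self, r, item):
        return hash((self.salts[r][0], item)) % self.w

    def _sign(self, r, item):
        return 1 if hash((self.salts[r][1], item)) & 1 else -1

    def update(self, item, count=1):
        for r in range(self.d):
            self.rows[r][self._bucket(r, item)] += self._sign(r, item) * count

    def query(self, item):
        return median(self._sign(r, item) * self.rows[r][self._bucket(r, item)]
                      for r in range(self.d))

cs = CountSketch(w=100, d=7)
for _ in range(10):
    cs.update("heavy")
cs.update("light")
```

The random signs make colliding items cancel in expectation, which is what turns the first-moment error ∑ f(i) into the second-moment error [∑ f²(i)]^{1/2}.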


Item frequency estimation

Table: Recap of the six algorithms presented. All quantities are given in big-O notation.

Algorithm          | Space                 | Update   | Query    | Approximation                | Pr[fail]
Naive              | n                     | 1        | 1        | 0                            | 0
Sampling           | log(n/δ)/ε²           | 1        | 1        | ε ∑_{i=1}^n f(i)             | δ
Count Min Sketches | log(n/δ)/ε            | log(n/δ) | log(n/δ) | ε ∑_{i=1}^n f(i)             | δ
Lossy Counting     | 1/ε                   | 1        | 1        | ε ∑_{i=1}^n f(i)             | 0
Count Max Sketches | log(n/δ)/min(ε, 1/k)  | log(n/δ) | log(n/δ) | ε ∑_{i=k+1}^n f(i)           | δ
Count Sketches     | log(n/δ)/min(ε², 1/k) | log(n/δ) | log(n/δ) | ε [∑_{i=k+1}^n f²(i)]^{1/2}  | δ

See also: Berinde, Indyk, Cormode, Strauss, Space-optimal heavy hitters with strong error bounds, PODS 2009

Survey at: Gibbons, Matias, External memory algorithms, 1999


Streaming Data Mining

1 Items:
Item frequencies
Distinct elements
Moment estimation

2 Vectors:
Dimensionality reduction
k-means
Linear Regression

3 Matrices:
Efficiently approximating the covariance matrix
Sparsification by sampling


Distinct elements

Packets arrive at a server (to: cs.princeton.edu), each carrying a source address:

from: 18.9.22.69 → addresses seen: 18.9.22.69
from: 69.172.200.24 → addresses seen: 18.9.22.69, 69.172.200.24
from: 18.9.22.69 → addresses seen: unchanged (already recorded)
from: 106.10.165.51 → addresses seen: 18.9.22.69, 69.172.200.24, 106.10.165.51
from: . . . → addresses seen: . . .

Goal: Count the number of distinct IP addresses that contacted the server.

Distinct elements

Two obvious solutions

Store a bitvector x ∈ {0, 1}^{2^128} (x_i = 1 if we've seen address i)

Store a hash table: O(N) words of memory

In general: a sequence of N integers, each in {1, . . . , n}. Can either use O(N log n) bits of memory, or n bits.

But we can do better!


Distinct elements

KMV algorithm [Bar-Yossef, Jayram, Kumar, Sivakumar, Trevisan, RANDOM 2002]

also see [Beyer, Gemulla, Haas, Reinwald, Sismanis, Commun. ACM 52(10), 2009]

(first small-space algorithm published is due to [Flajolet, Martin, FOCS 1983])

Guarantee: Let F0 be the number of distinct elements. The algorithm outputs a value F̃0 such that, with probability at least 2/3, |F̃0 − F0| ≤ εF0.

KMV algorithm

1 Pick a random hash function h : [n] → [0, 1]

2 Maintain the k = Θ(1/ε²) smallest distinct hash values seen in the stream, X1 < X2 < . . . < Xk

3 If fewer than k distinct hash values were seen by the end of the stream, output the number of distinct hash values seen; else output k/Xk
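The three steps can be sketched as follows; SHA-256 stands in for the random hash h : [n] → [0, 1], and the names are illustrative:

```python
import hashlib

def kmv_estimate(stream, k):
    """KMV: keep the k smallest distinct hash values; estimate F0 by k / X_k."""
    def h(item):
        # map item to a pseudo-uniform value in [0, 1)
        digest = hashlib.sha256(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    kmv = set()
    for item in stream:
        v = h(item)
        if v in kmv:
            continue                      # already one of the k smallest
        if len(kmv) < k:
            kmv.add(v)
        elif v < max(kmv):
            kmv.remove(max(kmv))          # evict the current k-th smallest
            kmv.add(v)
    if len(kmv) < k:
        return len(kmv)                   # fewer than k distinct hashes seen
    return k / max(kmv)                   # X_k = k-th smallest hash value

exact = kmv_estimate([5, 1, 5, 2, 7, 1, 3, 8, 4, 6], k=100)  # short stream: exact count
approx = kmv_estimate(range(10000), k=1000)                  # long stream: k / X_k
```

Intuition: if F0 values are spread uniformly in [0, 1], the k-th smallest lands near k/F0, so k/Xk recovers F0.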


Distinct elements — KMV algorithm example

k = 3k minimum (hash) values (i.e. kmv)

0.00239167

Stream: 5

1 5 2 7 1 3 8 4 6

h(5) = 0.00239167

Note: true answer is 8

Edo Liberty , Jelani Nelson : Streaming Data Mining 52 / 111

Page 68: Streaming Data Mining - Edo Liberty's homepage · Edo Liberty , Jelani Nelson : Streaming Data Mining 24 / 111. Lossy Counting We keep at most 1="di erent items (in this case 2) Edo

Distinct elements — KMV algorithm example

k = 3k minimum (hash) values (i.e. kmv)

0.00239167

Stream: 5

1 5 2 7 1 3 8 4 6

h(5) = 0.00239167

Note: true answer is 8

Edo Liberty , Jelani Nelson : Streaming Data Mining 52 / 111

Distinct elements — KMV algorithm example

Keep the k = 3 minimum hash values (i.e., KMV)

Stream: 5 1 5 2 7 1 3 8 4 6

Hash values: h(5) = 0.00239167, h(1) = 0.973811, h(2) = 0.0929362, h(7) = 0.425028, h(3) = 0.770643, h(8) = 0.223476, h(4) = 0.204447, h(6) = 0.88464

The k = 3 smallest values kept: 0.00239167, 0.0929362, 0.204447

output k/Xk = 3/0.204447 = 14.6737

Note: true answer is 8
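The KMV loop above is easy to sketch in Python (a minimal illustration, not the authors' code; a SHA-256-based hash stands in for the random hash function h):

```python
import hashlib

def kmv_hash(item):
    """Map an item to a pseudo-random value in [0, 1) via a fixed hash."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv_estimate(stream, k=3):
    """KMV: keep the k smallest distinct hash values; output k / (kth smallest)."""
    mins = []  # up to k smallest distinct hash values, kept sorted
    for item in stream:
        h = kmv_hash(item)
        if h in mins:
            continue            # duplicate of a stored item
        if len(mins) < k:
            mins.append(h)
            mins.sort()
        elif h < mins[-1]:
            mins[-1] = h        # evict the current kth smallest
            mins.sort()
    if len(mins) < k:
        return len(mins)        # fewer than k distinct items seen: exact count
    return k / mins[-1]         # X_k = kth smallest hash value
```

As on the slide, duplicates hash to the same value and so never change the stored minima.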


Distinct elements — Why does KMV work?

Question: Suppose we pick t random numbers X1, . . . , Xt in the range [0, 1]. What do we expect the kth smallest Xi to be on average?

Answer: k/(t + 1)

[Figure: t = 3 points splitting [0, 1] into four intervals, each of expected length 1/4]

We expect the t random numbers to be evenly spaced when sorted from smallest to largest, so the kth smallest is expected to be k/(t + 1).

For us t = F0, so if things go according to expectation then k/Xk = F0 + 1.

Of course, things don't always go exactly according to expectation.

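The k/(t + 1) claim is easy to check empirically (a quick simulation sketch; parameters are arbitrary):

```python
import random

def mean_kth_smallest(t, k, trials=20000, seed=0):
    """Empirical mean of the kth smallest of t uniform [0, 1] samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(t))
        total += xs[k - 1]      # kth smallest order statistic
    return total / trials
```

With t = 7 and k = 2, for instance, the average comes out near 2/(7 + 1) = 0.25.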


Distinct elements — KMV analysis

Assume F0 > k. Define good events:

Event E1: fewer than k elements hash below k/(F0(1 + ε))

Event E2: at least k elements hash below k/(F0(1 − ε))

As long as both E1, E2 happen, (1 − ε)F0 ≤ k/Xk ≤ (1 + ε)F0, as we want.

What's Pr[¬E1]?

Let Yi be the indicator random variable for the event h(ith item) ≤ k/(F0(1 + ε)); then Y = Σ_{i=1}^{F0} Yi is the number of items below the threshold.

EY = Σ_{i=1}^{F0} EYi = k/(1 + ε)

Var[Y] = Σ_{i=1}^{F0} Var[Yi] ≤ k/(1 + ε)

Chebyshev: Pr(¬E1) = Pr(Y ≥ k) ≤ Var[Y]/(k − EY)² ≤ (1 + ε)/(ε²k)
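The Chebyshev bound on Pr[¬E1] can be sanity-checked by direct simulation (an illustrative sketch; the parameter values are arbitrary):

```python
import random

def failure_rate_E1(F0=1000, k=100, eps=0.5, trials=1000, seed=1):
    """Empirical Pr[not E1]: at least k of F0 uniform hashes below k/(F0(1+eps))."""
    rng = random.Random(seed)
    thresh = k / (F0 * (1 + eps))
    fails = 0
    for _ in range(trials):
        below = sum(rng.random() < thresh for _ in range(F0))
        fails += below >= k     # event not-E1 occurred on this trial
    return fails / trials
```

For these parameters the Chebyshev bound is (1 + ε)/(ε²k) = 0.06; the empirical rate sits well below it, as Chebyshev is quite loose here.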

Distinct elements

See algorithms with even better space performance in:

[Durand, Flajolet, 2003] (with implementation!)

[Flajolet, Fusy, Gandouet, Meunier, Disc. Math. and Theor. Comp. Sci., 2007] (with implementation!)

[Kane, N., Woodruff, 2010]

Streaming Data Mining

1 Items: Item frequencies; Distinct elements; Moment estimation

2 Vectors: Dimensionality reduction; k-means; Linear Regression

3 Matrices: Efficiently approximating the covariance matrix; Sparsification by sampling


Moment estimation

Problem: Compute a value F̂p which, with probability 2/3, lies in the interval [(1 − ε)Fp, (1 + ε)Fp].

Fp = ‖f‖p^p = Σ_{i=1}^n |f(i)|^p

p = 0: Distinct elements

p = ∞: Most frequent item (actually "F∞^(1/∞)", i.e. ‖f‖∞)

Larger p: closer approximation to F∞

Unfortunately, it is known that poly(n) space is required for p > 2.
[Bar-Yossef, Jayram, Kumar, Sivakumar, JCSS 68(4), 2004]
[Chakrabarti, Khot, Sun, CCC 2003]
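For concreteness, the moments of a small frequency vector can be computed directly (a toy illustration; the vector is hypothetical):

```python
def moment(f, p):
    """F_p = sum_i |f(i)|^p, with the convention F_0 = number of nonzero entries."""
    if p == 0:
        return sum(1 for v in f if v != 0)
    return sum(abs(v) ** p for v in f)

f = [3, 0, 1, 2]   # hypothetical frequency vector
# F0 = 3 distinct items, F1 = 6 total updates, F2 = 9 + 1 + 4 = 14
```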


Moment estimation — p = 2

A look at p = 2:

p = 2 is as close as we can get to ‖f‖∞ while not taking polynomial space ("is there an outlier?")

Linear sketches give a way to estimate dot products (read: cosine similarity)

Suppose x, y are unit vectors and ‖z̃‖₂² denotes an estimate in (1 ± ε)‖z‖₂²

Recall ‖x − y‖₂² = ‖x‖₂² + ‖y‖₂² − 2⟨x, y⟩,

so ½ · (‖x‖₂² + ‖y‖₂² − ‖x̃ − y‖₂²) = ⟨x, y⟩ ± 2ε
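The identity above recovers ⟨x, y⟩ from three squared norms; a direct numeric check (a sketch with exact norms standing in for the (1 ± ε) sketch estimates):

```python
def dot_from_norms(sq_x, sq_y, sq_diff):
    """<x, y> = (||x||^2 + ||y||^2 - ||x - y||^2) / 2."""
    return (sq_x + sq_y - sq_diff) / 2

x = [0.6, 0.8]                           # unit vectors
y = [1.0, 0.0]
sq = lambda v: sum(t * t for t in v)     # exact squared norm
diff = [a - b for a, b in zip(x, y)]
# dot_from_norms(sq(x), sq(y), sq(diff)) recovers <x, y> = 0.6
```

Feeding in a (1 ± ε) estimate of ‖x − y‖₂² instead perturbs the result by at most 2ε, since ‖x − y‖₂² ≤ 4 for unit vectors.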

Moment estimation — p = 2

TZ sketch [Thorup, Zhang, SODA 2004]

also see [Charikar, Chen, Farach-Colton, ICALP 2002]

(the first small-space algorithm published is due to [Alon, Matias, Szegedy, STOC 1996])

TZ sketch:
Initialize counters A1, . . . , Ak to 0 for k = Θ(1/ε²)
Pick random hash functions h : [n] → [k] and σ : [n] → {−1, +1}
Upon update fi ← fi + v, add σ(i) · v to A_{h(i)}
Output Σ_{j=1}^k Aj²

Just O(1/ε²) words of space and constant update time!
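A minimal Python rendering of the pseudocode above (an illustration only: memoized random draws stand in for the suitably independent hash families the analysis actually requires):

```python
import random

class TZSketch:
    """Sketch a frequency vector f and estimate F2 = ||f||_2^2."""

    def __init__(self, k, seed=0):
        self.k = k
        self.A = [0] * k                # the k counters A_1..A_k
        self._rng = random.Random(seed)
        self._h = {}                    # lazily sampled stand-ins for the
        self._sigma = {}                # bounded-independence hash families

    def _lookup(self, i):
        if i not in self._h:
            self._h[i] = self._rng.randrange(self.k)
            self._sigma[i] = self._rng.choice((-1, 1))
        return self._h[i], self._sigma[i]

    def update(self, i, v=1):
        """Process update f_i <- f_i + v in constant time."""
        j, s = self._lookup(i)
        self.A[j] += s * v

    def estimate_f2(self):
        return sum(a * a for a in self.A)
```

A real implementation would replace the dictionaries with O(1)-space hash functions; the dictionaries here just keep the demo short.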


Page 125: Streaming Data Mining - Edo Liberty's homepage · Edo Liberty , Jelani Nelson : Streaming Data Mining 24 / 111. Lossy Counting We keep at most 1="di erent items (in this case 2) Edo

Moment estimation — TZ sketch

n = 10

k = 5A1 A2 A3 A4 A5

A = 0 0 -4 -6 0a a a -2 a

output 02 +02 +(−4)2 +(−6)2 +02 = 52

(note ‖f ‖22 = (−4)2 + (−4)2 + 22 = 38)

h valuesh(1): 4

h(2): 1

h(3): 3

h(4): 1

h(5): 3

h(6): 4

h(7): 4

h(8): 4

h(9): 2

h(10): 4

σ valuesσ(1): +1

σ(2): +1

σ(3): +1

σ(4): −1

σ(5): +1

σ(6): +1

σ(7): −1

σ(8): +1

σ(9): +1

σ(10): +1

f =

f (1) f (2) f (3) f (4) f (5) f (6) f (7) f (8) f (9) f (10)

-4 0 0 0 -4 0 2 0 0 0

a a a a +2 a a a

Edo Liberty , Jelani Nelson : Streaming Data Mining 60 / 111

Page 126: Streaming Data Mining - Edo Liberty's homepage · Edo Liberty , Jelani Nelson : Streaming Data Mining 24 / 111. Lossy Counting We keep at most 1="di erent items (in this case 2) Edo

Moment estimation — TZ sketch

n = 10

k = 5A1 A2 A3 A4 A5

A = 0 0 -4 -6 0a a a a

output 02 +02 +(−4)2 +(−6)2 +02 = 52

(note ‖f ‖22 = (−4)2 + (−4)2 + 22 = 38)

h valuesh(1): 4

h(2): 1

h(3): 3

h(4): 1

h(5): 3

h(6): 4

h(7): 4

h(8): 4

h(9): 2

h(10): 4

σ valuesσ(1): +1

σ(2): +1

σ(3): +1

σ(4): −1

σ(5): +1

σ(6): +1

σ(7): −1

σ(8): +1

σ(9): +1

σ(10): +1

f =

f (1) f (2) f (3) f (4) f (5) f (6) f (7) f (8) f (9) f (10)

-4 0 0 0 -4 0 2 0 0 0

a a a a a a a

Edo Liberty , Jelani Nelson : Streaming Data Mining 60 / 111
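The update/query logic above can be sketched in a few lines of Python. This is not the authors' code: the class name is mine, and the lazily cached per-item (h, σ) pairs stand in for the limited-independence hash functions a real TZ sketch would use. Replaying the slides' example (with buckets shifted to 0-indexed) reproduces the output 52:

```python
import random

class TugOfWarSketch:
    """Minimal sketch of the TZ-style F2 estimator from the slides:
    item i gets a bucket h(i) in {0, ..., k-1} and a sign sigma(i);
    bucket r holds A_r = sum over i with h(i) = r of sigma(i) * f(i)."""

    def __init__(self, k, h=None, sigma=None, seed=0):
        self.k = k
        self.buckets = [0.0] * k
        self._rng = random.Random(seed)
        self._h = dict(h or {})          # item -> bucket (0-indexed)
        self._sigma = dict(sigma or {})  # item -> +1 / -1

    def _hash(self, item):
        # Lazily draw (h, sigma) per item; real implementations use
        # 4-wise independent hash functions instead of a cache.
        if item not in self._h:
            self._h[item] = self._rng.randrange(self.k)
            self._sigma[item] = self._rng.choice((-1, 1))
        return self._h[item], self._sigma[item]

    def update(self, item, delta=1):
        r, s = self._hash(item)
        self.buckets[r] += s * delta

    def estimate_f2(self):
        # output = sum_r A_r^2
        return sum(a * a for a in self.buckets)

# Replay the slides' example: h and sigma fixed as shown (0-indexed buckets),
# and a stream whose net effect is f(1) = -4, f(5) = -4, f(7) = +2.
h = {1: 3, 2: 0, 3: 2, 4: 0, 5: 2, 6: 3, 7: 3, 8: 3, 9: 1, 10: 3}
sigma = {1: +1, 2: +1, 3: +1, 4: -1, 5: +1,
         6: +1, 7: -1, 8: +1, 9: +1, 10: +1}
sk = TugOfWarSketch(5, h=h, sigma=sigma)
for item, delta in [(1, -4), (5, -4), (7, +2)]:
    sk.update(item, delta)
print(sk.estimate_f2())  # 52.0, vs. the true ||f||_2^2 = 36
```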

Page 127: Streaming Data Mining - Edo Liberty's homepage · Edo Liberty , Jelani Nelson : Streaming Data Mining 24 / 111. Lossy Counting We keep at most 1="di erent items (in this case 2) Edo

Moment estimation — TZ sketch analysis

Why does it work?

Let δi ,r be an indicator random variable for the event h(i) = r

Ar =∑n

i=1 δi ,rσ(i)f (i)

⇒ A2r =

∑ni=1 δi ,r f (i)2 +

∑i 6=j δi ,rδj ,rσ(i)σ(j)f (i)f (j)

⇒ EA2r =

∑ni=1(Eδi ,r )f (i)2 +

∑i 6=j(Eδi ,rδj ,r )(Eσ(i)Eσ(j))f (i)f (j)

⇒ EA2r = ‖f ‖2

2/k

⇒ E∑k

r=1 A2j =

∑kr=1 EA2

r = ‖f ‖22

Edo Liberty , Jelani Nelson : Streaming Data Mining 61 / 111


Moment estimation — TZ sketch analysis

Expectation is unbiased . . . what about the variance?

Var(‖A‖_2^2) = E(‖A‖_2^2 − E‖A‖_2^2)^2 = E‖A‖_2^4 − ‖f‖_2^4

After some calculations I’ll omit

Var(‖A‖_2^2) = ∑_{r=1}^k ∑_{i≠j} (E δ_{i,r} δ_{j,r}) f_i^2 f_j^2 ≤ ∑_{r=1}^k (1/k^2) (‖f‖_2^2)^2 = ‖f‖_2^4 / k

Chebyshev: Pr(| ‖A‖_2^2 − E‖A‖_2^2 | > ε‖f‖_2^2) < Var(‖A‖_2^2) / (ε^2 ‖f‖_2^4)

So, we can set k = 3/ε^2 to get error probability 1/3

Edo Liberty , Jelani Nelson : Streaming Data Mining 62 / 111
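A quick Monte Carlo check of this guarantee, under assumptions of my own: fully independent (rather than limited-independence) hashing, the frequency vector from the earlier example, and ε = 1/2. With full independence the variance constant differs by a small factor, so the checks below only assert loose bounds:

```python
import random

def f2_estimate(f, k, rng):
    # One independent TZ estimate of ||f||_2^2 with k buckets.
    # h and sigma are drawn fully at random for simplicity; the
    # analysis on the slide only needs limited independence.
    buckets = [0.0] * k
    for fi in f:
        buckets[rng.randrange(k)] += rng.choice((-1, 1)) * fi
    return sum(b * b for b in buckets)

rng = random.Random(1)
f = [-4, 0, 0, 0, -4, 0, 2, 0, 0, 0]
true_f2 = sum(x * x for x in f)                      # 36
eps = 0.5
k = int(3 / eps ** 2)                                # k = 12 buckets
estimates = [f2_estimate(f, k, rng) for _ in range(2000)]
mean = sum(estimates) / len(estimates)               # unbiased: close to 36
bad = sum(abs(e - true_f2) > eps * true_f2
          for e in estimates) / len(estimates)       # failure fraction
print(round(mean, 1), bad)
```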


Streaming Data Mining

1 Items
Item frequencies
Distinct elements
Moment estimation

2 Vectors
Dimensionality reduction
k-means
Linear Regression

3 Matrices
Efficiently approximating the covariance matrix
Sparsification by sampling

Edo Liberty , Jelani Nelson : Streaming Data Mining 63 / 111

Dimensionality reduction

Many problems, as one would expect, become computationally harder as the dimensionality of the underlying input data grows

Nearest neighbor search (exponential preprocessing time to get sublinear query time)

Optimization problems for geometric problems: closest pair, diameter, minimum spanning tree, . . .

Linear algebra problems: regression, low-rank approximation

Clustering

How can we reduce dimensionality in such a way that we can still (approximately) solve the problems above?

Edo Liberty , Jelani Nelson : Streaming Data Mining 64 / 111


Dimensionality reduction

Good news when the underlying distance metric is ‖ · ‖_2

Theorem (Johnson-Lindenstrauss (JL) lemma, 1984)

For every 0 < ε ≤ 1/2 and set of points x_1, . . . , x_N ∈ R^n, there exists a matrix A ∈ R^{m×n} for m = O(ε^{−2} log N) such that

∀ i ≠ j , (1 − ε)‖x_i − x_j‖_2 ≤ ‖Ax_i − Ax_j‖_2 ≤ (1 + ε)‖x_i − x_j‖_2

Bad news when the underlying distance metric is not ‖ · ‖_2

Work of [Naor, Johnson, SODA 2009] shows that ‖ · ‖_2 (or metric spaces “close” to it) are the only spaces where we could hope to embed into O(log N) dimensions with any constant distortion guarantee.

Edo Liberty , Jelani Nelson : Streaming Data Mining 65 / 111


Dimensionality reduction

How do you prove the JL lemma?

Theorem (Johnson-Lindenstrauss (JL) lemma, 1984)

For every 0 < ε ≤ 1/2 and set of points x_1, . . . , x_N ∈ R^n, there exists a matrix A ∈ R^{m×n} for m = O(ε^{−2} log N) such that

∀ i ≠ j , (1 − ε)‖x_i − x_j‖_2 ≤ ‖Ax_i − Ax_j‖_2 ≤ (1 + ε)‖x_i − x_j‖_2

Use the Distributional JL (DJL) lemma

Theorem

For every 0 < ε, δ ≤ 1/2 there exists a distribution D_{ε,δ} over R^{m×n} for m = O(ε^{−2} log(1/δ)) such that for any x ∈ R^n with ‖x‖_2 = 1

Pr_{A∼D_{ε,δ}} ( | ‖Ax‖_2^2 − 1 | > ε ) < δ

Proof of JL via DJL: Set δ = 1/N^2 so that each (x_i − x_j)/‖x_i − x_j‖_2 is preserved w.p. 1 − 1/N^2. Union bound over all (N choose 2) pairs i < j.

Edo Liberty , Jelani Nelson : Streaming Data Mining 66 / 111
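A minimal sketch of the Gaussian construction behind DJL (function names are mine; m is set to the (4/ε^2) ln N quoted on the next slide), checking that all pairwise distances of a small point set survive with low distortion:

```python
import math
import random

def jl_project(points, eps, seed=0):
    """Project points with an i.i.d. Gaussian matrix scaled by 1/sqrt(m)
    (the classic JL construction; m = (4/eps^2) ln N as on the slides)."""
    N, n = len(points), len(points[0])
    m = max(1, math.ceil(4 / eps ** 2 * math.log(N)))
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    scale = 1.0 / math.sqrt(m)
    return [[scale * sum(row[i] * x[i] for i in range(n)) for row in A]
            for x in points]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

rng = random.Random(7)
pts = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(5)]
proj = jl_project(pts, eps=0.2, seed=7)
# Distortion of each pairwise distance; all ratios should be near 1.
ratios = [dist(proj[i], proj[j]) / dist(pts[i], pts[j])
          for i in range(5) for j in range(i + 1, 5)]
print(min(ratios), max(ratios))
```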


Dimensionality reduction

Can choose A to have random Gaussian entries [Indyk, Motwani, STOC 1998]; also see [Dasgupta, Gupta, Rand. Struct. Alg. 22(1), 2003]

[figure: y = (1/√m) A x, where each entry of A is an i.i.d. standard Gaussian with density function p(x) = (1/√(2π)) e^{−x^2/2}]

Can choose m = (4 + o(1)) ε^{−2} ln N

Edo Liberty , Jelani Nelson : Streaming Data Mining 67 / 111

Dimensionality reduction

Can choose A to have random Gaussian entries

Sketch of why it works:

Recall, we want Pr(| ‖Ax‖_2^2 − 1 | > ε) < δ for all x with unit norm.

We have ‖Ax‖_2^2 − 1 = x^T A^T A x − 1

x^T A^T A x = (1/m) · ∑_{r=1}^m ∑_{i,j} g_{r,i} g_{r,j} x_i x_j . Let g = (g_{1,1}, . . . , g_{1,n}, g_{2,1}, . . .) be the vector of rows of A concatenated.

This is (1/m) · g^T B g where B is a block-diagonal matrix with m blocks, each equaling x x^T . Orthogonal change of basis!

g^T B g = g^T Q^T Λ Q g = (Qg)^T Λ (Qg) = g′^T Λ g′ = ∑_{i=1}^m g′_i^2 (each block x x^T has the single nonzero eigenvalue ‖x‖_2^2 = 1, and g′ = Qg is again a standard Gaussian vector). Thus we just want to show (1/m) ∑_{i=1}^m (g′_i^2 − 1) is small with high probability. Can use statistics about the chi-squared distribution.

Edo Liberty , Jelani Nelson : Streaming Data Mining 68 / 111


Dimensionality reduction

Another perhaps useful fact . . .

[Klartag, Mendelson, J. Funct. Anal. 225(1), 2005]: Don’t need m = Ω((log N)/ε^2) all the time, but rather can take m = O(γ_2^2(X)/ε^2) where X = {x_i − x_j}_{i≠j}.

Here γ_2(X) = E sup_{x∈X} ⟨g, x⟩ for a Gaussian vector g.

Bottom line: Given your data you can estimate γ_2 by picking a few independent Gaussian vectors and taking the empirical mean of sup_{x∈X} ⟨g, x⟩. Might improve dimensionality reduction on your data.

Caveat: Takes Ω(N^2) time to estimate γ_2 (X is the set of pairwise differences), so it probably only makes sense when n ≫ N (note: the alternative of projecting onto the span of the points via Gram-Schmidt costs O(N^2 n)).

Edo Liberty , Jelani Nelson : Streaming Data Mining 69 / 111
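The estimation recipe can be sketched as follows, on a toy set where the answer is known: for X = {e_i − e_j} in R^5, sup_{x∈X} ⟨g, x⟩ is the range max_i g_i − min_i g_i of five standard Gaussians, whose expectation is about 2.33. Function names are mine:

```python
import random

def gamma2_estimate(X, samples=500, seed=0):
    """Monte Carlo estimate of gamma_2(X) = E sup_{x in X} <g, x>,
    following the slide's recipe: draw a few independent Gaussian
    vectors and average the observed suprema."""
    rng = random.Random(seed)
    n = len(X[0])
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sum(gi * xi for gi, xi in zip(g, x)) for x in X)
    return total / samples

# Toy X: pairwise differences e_i - e_j of the standard basis in R^5.
n = 5
basis = [[1.0 if t == i else 0.0 for t in range(n)] for i in range(n)]
X = [[a - b for a, b in zip(basis[i], basis[j])]
     for i in range(n) for j in range(n) if i != j]
est = gamma2_estimate(X)
print(est)  # close to E(max_i g_i - min_i g_i), about 2.33 for n = 5
```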


Dimensionality reduction

Can also choose A to have random sign entries [Achlioptas, PODS 2001]

[figure: y = (1/√m) A x, where each entry of A is an i.i.d. uniformly random sign, e.g.

+ − + + + + + − + +
+ + − − + − + − − +
− − − + + + − + − +
+ + + + + + − − − −
+ − + − − − − − − − ]

Can choose m = (4 + o(1)) ε^{−2} ln N

Edo Liberty , Jelani Nelson : Streaming Data Mining 70 / 111

Dimensionality reduction

Downside of the last two constructions: The preprocessing (performing the embedding) is dense matrix-vector multiplication: O(mn) time

Achlioptas actually showed that we can take a random sign matrix with 2/3rds of its entries zero (sparser, so faster)

Then there was the Fast Johnson-Lindenstrauss Transform* (FJLT) [Ailon, Chazelle, SIAM J. Comput. 39(1), 2009]

[figure: y = (1/√m) S H D x, where S is a matrix of random samples (each row selects one coordinate), H is a Hadamard (or Fourier) matrix, and D is a diagonal matrix of random ±1 signs]

* The actual Ailon-Chazelle construction does something a tad better than the sampling matrix (see their paper).

Edo Liberty , Jelani Nelson : Streaming Data Mining 71 / 111
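A minimal sketch of the (1/√m) S H D map using an iterative fast Walsh-Hadamard transform, so applying it takes O(n log n) time. Assumptions of mine: n is a power of two, S samples coordinates without replacement, and the function names are invented. For a 1-sparse x the norm estimate comes out exact, since HD spreads the mass perfectly evenly:

```python
import math
import random

def fwht(a):
    """Iterative fast Walsh-Hadamard transform (unnormalized +-1
    Hadamard; len(a) must be a power of two), O(n log n) time."""
    a = list(a)
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def fjlt(x, m, seed=0):
    """y = (1/sqrt(m)) S H D x: random signs D, Hadamard H, then a
    uniform sample S of m coordinates (a simplified S)."""
    rng = random.Random(seed)
    n = len(x)
    d = [rng.choice((-1, 1)) for _ in range(n)]
    hdx = fwht([di * xi for di, xi in zip(d, x)])
    idx = rng.sample(range(n), m)
    return [hdx[i] / math.sqrt(m) for i in idx]

# HD perfectly flattens a 1-sparse vector, the worst case for sampling:
x = [1.0] + [0.0] * 7                    # e_1 in R^8, ||x||_2^2 = 1
y = fjlt(x, m=4)
print(sum(v * v for v in y))             # exactly 1.0 for this x
```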


Dimensionality reduction — FJLT

y = (1/√m) S H D x

Recall H is the matrix with H_{i,j} = (−1)^{⟨~i, ~j⟩ mod 2}, where ~i and ~j are the binary expansions of the indices. Actually we just need any matrix whose entries are bounded in magnitude by 1 and which is orthogonal when we normalize by 1/√n.

Why does this work?

Randomly sampling entries of x has the correct ℓ_2^2 expectation but poor variance (e.g. when x has exactly one non-zero coordinate)

Can show ‖HDx‖_∞ ≤ √(log(nN)/n) ≈ √(log(N)/n) w.p. 1 − 1/N^2

Conditioned on the above item, apply the Chernoff bound to say sampling by S works with high probability as long as we have enough samples.

Unfortunately this requires m = Θ(ε^{−2} log^2 N) samples (an extra log N). Can fix by finishing off with a slow matrix (e.g. Gaussian or sign) for an additional O(mε^{−2} log N) time. Total time is O(n log n + ε^{−4} log^3 N).

Edo Liberty , Jelani Nelson : Streaming Data Mining 72 / 111
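The ℓ_∞ bound in the second bullet is exact for a 1-sparse vector, the worst case for naive sampling: after HD, every coordinate of the (normalized) vector has magnitude exactly 1/√n. A small sanity check (the fwht helper is mine; n must be a power of two):

```python
import math
import random

def fwht(a):
    # Unnormalized fast Walsh-Hadamard transform; len(a) a power of two.
    a = list(a)
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

rng = random.Random(3)
n = 8
x = [1.0] + [0.0] * (n - 1)     # 1-sparse: the bad case for naive sampling
d = [rng.choice((-1, 1)) for _ in range(n)]
# Normalized HDx: every coordinate now has magnitude exactly 1/sqrt(n).
hdx = [v / math.sqrt(n) for v in fwht([di * xi for di, xi in zip(d, x)])]
print(max(abs(v) for v in x), max(abs(v) for v in hdx))  # 1.0 vs 1/sqrt(8) ~ 0.354
```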


Page 165: Streaming Data Mining - Edo Liberty's homepage · Edo Liberty , Jelani Nelson : Streaming Data Mining 24 / 111. Lossy Counting We keep at most 1="di erent items (in this case 2) Edo

Dimensionality reduction — FJLT

Improvements to the original FJLT since Ailon and Chazelle's work:

[Ailon, Liberty, SODA 2008]: m = O(ε⁻² log N) in O(n log n + m^{2+γ}) time.

[Ailon, Liberty, SODA 2011], [Krahmer, Ward, SIAM J. Math. Anal. 43(3), 2011]: m = O(ε⁻² log N log⁴ n) in O(n log n) time (faster for huge N). The construction is actually what we just saw ((1/√m)SHD), but with a much different-looking analysis (it doesn't use DJL).

It's conceivable that the (1/√m)SHD construction actually achieves the optimal m = O(ε⁻² log N), but we just don't know how to prove it yet. So one can try a smaller m, but buyer beware.

Edo Liberty, Jelani Nelson: Streaming Data Mining 73 / 111

Dimensionality reduction

One downside to the FJLT: it kills sparsity.

It works by randomly spreading mass around to decrease the variance of sampling. It takes O(n log n) time, but dense matrices (Gaussian or sign) take only O(m · ‖x‖₀).

Who cares about sparsity?

Edo Liberty, Jelani Nelson: Streaming Data Mining 74 / 111

Dimensionality reduction

Who cares about sparsity? You do:

Document as bag of words: x_i = number of occurrences of word i. Compare documents using cosine similarity. d = lexicon size; most documents aren't dictionaries.

Network traffic: x_{i,j} = #bytes sent from i to j. d = 2⁶⁴ (2²⁵⁶ in IPv6); most servers don't talk to each other.

User ratings: x_i is a user's score for movie i on Netflix. d = #movies; most people haven't rated all movies.

Streaming: x receives a stream of updates of the form "add v to x_i". Maintaining Sx requires calculating v · Se_i.

. . .

Edo Liberty, Jelani Nelson: Streaming Data Mining 75 / 111
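The last bullet can be made concrete with a tiny sketch (illustrative only; a dense random sign matrix stands in here for any linear sketching matrix S): each update "add v to x_i" touches only column i of S.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 16, 1000
S = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)  # any linear sketching matrix

x = np.zeros(d)          # kept here only to check correctness
y = np.zeros(m)          # the maintained sketch S x
for i, v in [(3, 1.0), (17, -2.5), (3, 0.5)]:          # stream of updates
    x[i] += v
    y += v * S[:, i]     # v * S e_i: O(m) work; x itself need not be stored
```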

Dimensionality reduction — SparseJL

One way to embed sparse vectors faster: use sparse matrices
[Kane, N., SODA 2012], building on work of [Weinberger, Dasgupta, Langford, Smola, Attenberg, ICML 2009], [Dasgupta, Kumar, Sarlos, STOC 2010].

Embedding time: O(m + s‖x‖₀) = O(m + εm‖x‖₀)

[Figure: an m-row sparse matrix in which each column has s non-zero cells; each black (non-zero) cell is ±1/√s at random.]

m = O(ε⁻² log N), s = O(ε⁻¹ log N)

Edo Liberty, Jelani Nelson: Streaming Data Mining 76 / 111

Dimensionality reduction — SparseJL

Why does it work?

Let's look at the construction where each column has non-zeroes in exactly s random locations. Let δ_{r,i} be 1 if A_{r,i} ≠ 0, and let σ_{r,i} be a random sign, so that A_{r,i} = δ_{r,i}σ_{r,i}. Then,

(Ax)_r = (1/√s) Σ_{i=1}^n δ_{r,i} σ_{r,i} x_i

(Ax)_r² = (1/s) Σ_{i=1}^n δ_{r,i} x_i² + (1/s) Σ_{i≠j} δ_{r,i} δ_{r,j} σ_{r,i} σ_{r,j} x_i x_j

‖Ax‖₂² = (1/s) Σ_{r=1}^m Σ_{i=1}^n δ_{r,i} x_i² + (1/s) Σ_{r=1}^m Σ_{i≠j} δ_{r,i} δ_{r,j} σ_{r,i} σ_{r,j} x_i x_j
       = (1/s) Σ_{i=1}^n x_i² · (Σ_{r=1}^m δ_{r,i}) + (1/s) Σ_{r=1}^m Σ_{i≠j} δ_{r,i} δ_{r,j} σ_{r,i} σ_{r,j} x_i x_j
       = ‖x‖₂² + (1/s) Σ_{r=1}^m Σ_{i≠j} δ_{r,i} δ_{r,j} σ_{r,i} σ_{r,j} x_i x_j,

where the last step uses that each column has exactly s non-zeroes, i.e. Σ_{r=1}^m δ_{r,i} = s.

The error term above is some quadratic form σᵀBσ. One can argue that it's small with high probability using known tail bounds for quadratic forms (the "Hanson–Wright inequality").

Edo Liberty, Jelani Nelson: Streaming Data Mining 77 / 111
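The decomposition above is easy to verify numerically. This is a small sketch under the slide's assumptions (exactly s non-zeroes per column, each ±1/√s); the construction and constants here are illustrative.

```python
import numpy as np

def sparse_jl(m, n, s, rng):
    """m x n matrix with exactly s non-zeroes per column, each +-1/sqrt(s)."""
    A = np.zeros((m, n))
    for i in range(n):
        rows = rng.choice(m, size=s, replace=False)
        A[rows, i] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return A

rng = np.random.default_rng(1)
m, n, s = 40, 200, 8
A = sparse_jl(m, n, s, rng)
x = rng.standard_normal(n)

# The "diagonal" part (1/s) sum_r sum_i delta_{r,i} x_i^2 equals ||x||_2^2
# exactly, because every column contributes exactly s non-zero cells.
delta = (A != 0).astype(float)
diag_part = float((delta @ (x ** 2)).sum()) / s
cross_term = float(np.sum((A @ x) ** 2)) - diag_part   # the sigma^T B sigma error term
```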

Streaming Data Mining

1 Items
  Item frequencies
  Distinct elements
  Moment estimation

2 Vectors
  Dimensionality reduction
  k-means
  Linear Regression

3 Matrices
  Efficiently approximating the covariance matrix
  Sparsification by sampling

Edo Liberty, Jelani Nelson: Streaming Data Mining 78 / 111

k-means clustering

Input: x₁, ..., x_N ∈ ℝⁿ, integer k > 1
Output: a partition of the input points into k clusters C₁, ..., C_k

Goal: minimize

  Σ_{r=1}^k Σ_{i∈C_r} ‖x_i − μ_r‖₂²,

where μ_r is the centroid of C_r, i.e. μ_r = (1/|C_r|) Σ_{i∈C_r} x_i.

Edo Liberty, Jelani Nelson: Streaming Data Mining 79 / 111

k-means clustering

Let's look at the objective function:

Minimize  Σ_{r=1}^k Σ_{i∈C_r} ‖x_i − (1/|C_r|) Σ_{j∈C_r} x_j‖₂²   (∗)

Let's focus on a particular r with |C_r| > 1.

Look at each term in the inner sum as ⟨x_i − (1/|C_r|) Σ_{j∈C_r} x_j , x_i − (1/|C_r|) Σ_{j∈C_r} x_j⟩.

For each i ∈ C_r, the coefficient of ‖x_i‖₂² in the inner sum is 1 − 2/|C_r| + |C_r| · (1/|C_r|²) = 1 − 1/|C_r|.

For each i < j ∈ C_r, the coefficient of ⟨x_i, x_j⟩ is −2/|C_r|.

Noting that ‖x_i − x_j‖₂² = ‖x_i‖₂² + ‖x_j‖₂² − 2⟨x_i, x_j⟩, (∗) equals

  Σ_{r=1}^k (1/|C_r|) · (Σ_{i<j∈C_r} ‖x_i − x_j‖₂²)

⇒ JL with O(ε⁻² log N) rows preserves the optimality of k-means up to 1 ± ε.

Edo Liberty, Jelani Nelson: Streaming Data Mining 80 / 111
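The final identity (a cluster's cost equals its average pairwise squared distance) can be checked numerically; this is a small sanity script, not part of the original slides.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((12, 5))     # one cluster of 12 points in R^5
mu = C.mean(axis=0)                  # its centroid

lhs = float(np.sum((C - mu) ** 2))   # sum_i ||x_i - mu||^2
pairwise = sum(float(np.sum((C[i] - C[j]) ** 2))
               for i in range(len(C)) for j in range(i + 1, len(C)))
rhs = pairwise / len(C)              # (1/|C|) sum_{i<j} ||x_i - x_j||^2
```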

k-means clustering

Also see [Boutsidis, Zouzias, Mahoney, Drineas, arXiv abs/1110.2897], which shows that O(k/ε²) dimensions preserve the optimal solution up to 2 ± ε.

For papers on how to actually do fast k-means clustering with provable guarantees, see:

[Guha, Meyerson, Mishra, Motwani, O'Callaghan, IEEE Trans. Knowl. Data Eng. 15(3), 2003]
[Har-Peled, Mazumdar, STOC 2004]
[Ostrovsky, Rabani, Schulman, Swamy, FOCS 2006]
[Arthur, Vassilvitskii, SODA 2007]
[Aggarwal, Deshpande, Kannan, APPROX 2009]
[Ailon, Jaiswal, Monteleoni, NIPS 2009]
[Jaiswal, Garg, RANDOM 2012]
. . .

Edo Liberty, Jelani Nelson: Streaming Data Mining 81 / 111

Streaming Data Mining

1 Items
  Item frequencies
  Distinct elements
  Moment estimation

2 Vectors
  Dimensionality reduction
  k-means
  Linear Regression

3 Matrices
  Efficiently approximating the covariance matrix
  Sparsification by sampling

Edo Liberty, Jelani Nelson: Streaming Data Mining 82 / 111

Linear regression

We want to find a linear function that best matches the data, with a quadratic penalty:

  min_x ‖Sx − b‖₂   (∗)

If the rows of S are S_i, we want our linear function f to have f(S_i) = b_i for all i.

The JL connection: ‖Sx − b‖₂² = ‖Sx − b_∥ − b_⊥‖₂² = ‖Sx − b_∥‖₂² + ‖b_⊥‖₂², where b_∥ is in the column span of S and b_⊥ is in the orthogonal complement. Since b_∥ = Sy_∥ for some y_∥, we have Sx − b_∥ = S(x − y_∥). Thus if we apply some JL matrix A which satisfies ‖ASz‖₂ ≈ ‖Sz‖₂ for all z, we preserve (∗). One can get away with m = O(d/ε²) [Arora, Hazan, Kale, RANDOM 2006].

Edo Liberty, Jelani Nelson: Streaming Data Mining 83 / 111
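A minimal numerical sketch of this reduction (illustrative: it uses a dense Gaussian matrix for A and an arbitrary sketch size m, not a tuned O(d/ε²) constant):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 10
S = rng.standard_normal((n, d))
b = S @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_exact, *_ = np.linalg.lstsq(S, b, rcond=None)      # min_x ||Sx - b||_2

m = 400                                              # sketch dimension (a guess)
A = rng.standard_normal((m, n)) / np.sqrt(m)         # JL matrix
x_sketch, *_ = np.linalg.lstsq(A @ S, A @ b, rcond=None)

res_exact = float(np.linalg.norm(S @ x_exact - b))
res_sketch = float(np.linalg.norm(S @ x_sketch - b)) # true residual of sketched solution
```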

Linear regression

For more on fast linear regression, low-rank approximation, and other numerical linear algebra problems:

[Sarlos, FOCS 2006]
[Clarkson, Woodruff, STOC 2009]
[Mahoney, Randomized Algorithms for Matrices and Data, 2011]
[Halko, Martinsson, Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, 2009]
[Boutsidis, Drineas, Magdon-Ismail, arXiv abs/1202.3505]
[Clarkson, Woodruff, arXiv abs/1207.6365]
. . .

The last paper listed can do linear regression on d × n matrices in O(nnz(S) + poly(d/ε)) time! (nnz(S) is the number of non-zero entries in S.)

Edo Liberty, Jelani Nelson: Streaming Data Mining 84 / 111

Streaming Data Mining

1 Items
  Item frequencies
  Distinct elements
  Moment estimation

2 Vectors
  Dimensionality reduction
  k-means
  Linear Regression

3 Matrices
  Efficiently approximating the covariance matrix
  Sparsification by sampling

Edo Liberty, Jelani Nelson: Streaming Data Mining 85 / 111

Streaming Matrices

A stream of vectors can be viewed as a very large matrix A.

Edo Liberty, Jelani Nelson: Streaming Data Mining 86 / 111

Streaming Matrices

What do we usually want to get from large matrices?

Low rank approximation
Singular Value Decomposition
Principal Component Analysis
Latent Dirichlet allocation
Latent Semantic Indexing
...

For the above, it is sufficient to compute AAᵀ.

Edo Liberty, Jelani Nelson: Streaming Data Mining 87 / 111

Streaming Matrices

Computing AAᵀ is trivial from the stream of columns A_i:

  AAᵀ = Σ_{i=1}^n A_i A_iᵀ

In words, AAᵀ is the sum of outer products of the columns of A.

Naïve solution: time O(nd²) and space O(d²).

What if d = 10,000? Or 100,000? Order-of-d² time or space is out of the question...

Edo Liberty, Jelani Nelson: Streaming Data Mining 88 / 111
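The naïve streaming computation looks like this (a toy illustration of the O(d²)-per-update accumulation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 6, 50
stream = [rng.standard_normal(d) for _ in range(n)]   # columns A_i arrive one by one

C = np.zeros((d, d))
for a in stream:
    C += np.outer(a, a)        # add one outer product: O(d^2) time and space

A = np.column_stack(stream)    # only for checking against the batch product
```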

Matrix Column Sampling

We want to create a "sketch" B which approximates A:

1. B is as small as possible.
2. Computing B is as efficient as possible.
3. AAᵀ ≈ BBᵀ. More accurately: ‖AAᵀ − BBᵀ‖ ≤ ε‖A‖_F².

Amazingly enough, we can get this by sampling columns from A!

[Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations, 1998]
[Rudolf Ahlswede and Andreas Winter. Strong converse for identification via quantum channels, 2002]
[Petros Drineas and Ravi Kannan. Pass efficient algorithms for approximating large matrices, 2003]
[Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach through geometric functional analysis, 2007]
[Roberto Imbuzeiro Oliveira. Sums of random Hermitian matrices and an inequality by Rudelson, 2010]
[Christos Boutsidis, Petros Drineas, and Malik Magdon-Ismail. Near-optimal column-based matrix reconstruction, 2011]

Edo Liberty, Jelani Nelson: Streaming Data Mining 89 / 111

Page 206:

Matrix Column Sampling

Let B contain ℓ independently chosen columns of A:

B_j = A_i/√(ℓp_i)  w.p.  p_i = ‖A_i‖²₂/‖A‖²_F

To see why this makes sense, let us compute the expectation of BBᵀ:

E[B_j B_jᵀ] = ∑_{i=1}^n p_i · (A_i/√(ℓp_i))(A_iᵀ/√(ℓp_i)) = (1/ℓ) ∑_{i=1}^n A_i A_iᵀ = (1/ℓ) AAᵀ

So,

E[BBᵀ] = ∑_{j=1}^ℓ E[B_j B_jᵀ] = AAᵀ

Note that BBᵀ = ∑_{j=1}^ℓ B_j B_jᵀ is a sum of independent random variables...
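The expectation computation above can be sanity-checked numerically. A small Monte-Carlo sketch (ours, not from the slides): averaging BBᵀ over many independent draws of the sketch approaches AAᵀ.

```python
import numpy as np

def sample_columns(A, ell, rng):
    """ell i.i.d. columns: B_j = A_i / sqrt(ell * p_i) w.p. p_i = ||A_i||^2 / ||A||_F^2."""
    p = (A ** 2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=ell, p=p)
    return A[:, idx] / np.sqrt(ell * p[idx])

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 20))
acc = np.zeros((3, 3))
trials = 5000
for _ in range(trials):
    B = sample_columns(A, 10, rng)
    acc += B @ B.T
# The empirical mean of BB^T is close to AA^T, as the expectation predicts.
err = np.linalg.norm(acc / trials - A @ A.T) / np.linalg.norm(A @ A.T)
assert err < 0.05
```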

Page 207:

Matrix Column Sampling

Figuring out the minimal value for ℓ is non-trivial. But it is enough to require

ℓ = c·log(r)/(r·ε²)

where r = ‖A‖²_F/‖A‖²₂ is the numeric rank of A:

1 ≤ r ≤ Rank(A) ≤ d

r ≈ const for "interesting" matrices.

Matrix Column Sampling

We can compute B in time O(dn) (the same as reading the matrix). B requires at most O(d/ε²) space, and:

‖AAᵀ − BBᵀ‖ ≤ ε‖A‖²_F


Page 209:

Matrix Column Sampling

A streaming procedure for matrix column sampling.

Input: ε ∈ (0, 1], A ∈ R^{d×n}
  ℓ ← ⌈c/ε²⌉
  B ← all-zeros matrix ∈ R^{d×ℓ}
  T ← 0
  for i ∈ [n] do
    t ← ‖A_i‖²
    T ← T + t
    for j ∈ [ℓ] do
      w.p. t/T: B_j ← A_i/√(ℓ·t/T)
Return: B

The dependency on 1/ε² is problematic for small ε.
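A runnable interpretation (ours) of the procedure above. Each of the ℓ slots is a weighted reservoir over the stream, so slot j ends up holding column A_i with probability ‖A_i‖²/T; here we keep the raw columns and apply the 1/√(ℓ·t/T) scaling only once the final T = ‖A‖²_F is known.

```python
import numpy as np

def column_sampling_sketch(columns, d, ell, rng):
    """One pass: ell independent weighted reservoirs over the columns."""
    kept = np.zeros((d, ell))    # unscaled column currently held by each slot
    kept_t = np.zeros(ell)       # its squared norm t = ||A_i||^2
    T = 0.0                      # running total, ends at ||A||_F^2
    for a in columns:
        t = float(a @ a)
        T += t
        replace = rng.random(ell) < t / T   # each slot flips its own coin
        kept[:, replace] = a[:, None]
        kept_t[replace] = t
    # Slot j holds A_i w.p. t_i / T; rescale so that E[BB^T] = AA^T.
    return kept / np.sqrt(ell * kept_t / T)

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 50))
B = column_sampling_sketch(A.T, d=5, ell=400, rng=rng)
err = np.linalg.norm(A @ A.T - B @ B.T, 2)
assert err < 0.5 * np.linalg.norm(A, 'fro') ** 2
```

With ℓ = ⌈c/ε²⌉ slots this is the O(d/ε²) space cost the slide flags as problematic for small ε.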

Page 210:

Matrix Sketching

The space requirement can be reduced to O(d/ε).

Page 211:

Matrix Sketching

Columns are added until the sketch is 'full'.

Page 212:

Matrix Sketching

Then, we compute the SVD of the sketch and rotate it.

Page 213:

Matrix Sketching

B = UΣVᵀ is rotated to B_new = UΣ (note that BBᵀ = B_new B_newᵀ).

Page 214:

Matrix Sketching

Let ∆ = ‖B_{1/ε}‖², the squared norm of the (1/ε)-th column of the rotated sketch.

Page 215:

Matrix Sketching

Shrink all the columns so that their squared ℓ₂ norm is reduced by ∆.

Page 216:

Matrix Sketching

Start aggregating columns again...

Page 217:

Matrix Sketching

Input: ε ∈ (0, 1], A ∈ R^{d×n}
  ℓ ← ⌈1/ε⌉
  B ← all-zeros matrix ∈ R^{d×ℓ}
  for i ∈ [n] do
    B_ℓ ← A_i
    [U, Σ, V] ← SVD(B)
    ∆ ← Σ²_{cℓ,cℓ}
    Σ ← √(max(Σ² − I_ℓ·∆, 0))
    B ← UΣ
Return: B

[Edo Liberty. Simple and deterministic matrix sketching, 2012]

Matrix Sketching

We can compute B in O(dn/ε) time and O(d/ε) space, and:

‖AAᵀ − BBᵀ‖ ≤ ε‖A‖²_F
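The loop above (Liberty, 2012) as a NumPy sketch (ours), in this talk's column convention. We assume d ≥ ℓ and fix the shrink index at cℓ = ℓ/2, one common choice: when the sketch fills, rotate to UΣ and subtract ∆ = Σ²_{ℓ/2,ℓ/2} from every squared singular value, which frees at least half the columns.

```python
import numpy as np

def frequent_directions(columns, d, ell):
    """Deterministic sketch B with ||AA^T - BB^T|| <= 2 ||A||_F^2 / ell."""
    B = np.zeros((d, ell))
    nxt = 0                                   # next free (all-zero) column
    for a in columns:
        if nxt == ell:                        # sketch is full: rotate and shrink
            U, s, _ = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2          # squared middle singular value
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = U * s                         # rotated sketch U @ diag(s)
            nxt = ell // 2                    # columns ell//2 .. ell-1 are now zero
        B[:, nxt] = a
        nxt += 1
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 300))
B = frequent_directions(A.T, d=20, ell=8)
err = np.linalg.norm(A @ A.T - B @ B.T, 2)
assert err <= 2 * np.linalg.norm(A, 'fro') ** 2 / 8 + 1e-6
```

With ℓ = ⌈2/ε⌉ the deterministic bound 2‖A‖²_F/ℓ asserted in the code becomes ε‖A‖²_F, matching the boxed claim up to the constant in ℓ.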

Page 218:

Matrix Sketching

Example approximation of a synthetic matrix of size 1,000 × 10,000.

Page 219:

Streaming Data Mining

1 Items
  Item frequencies
  Distinct elements
  Moment estimation
2 Vectors
  Dimensionality reduction
  k-means
  Linear Regression
3 Matrices
  Efficiently approximating the covariance matrix
  Sparsification by sampling

Page 220:

Sparsification by sampling

Sometimes, we only get one matrix entry at a time...

Page 221:

Sparsification by sampling

Recommender system example: user i rated item j with A_{i,j} stars.

Page 222:

Sparsification by sampling

Note that these matrices are usually very sparse!

Page 223:

Sparsification by sampling

If we sample entries we can only improve sparsity.

Page 224:

Sparsification by sampling

How close is B to the original matrix A? That is, how small is ‖A − B‖₂?

Page 225:

Sparsification by sampling

Generic sampling algorithm:

B ← all-zeros matrix
for (i, j, A_{i,j}) do
  w.p. p_{i,j}: B_{i,j} ← A_{i,j}/p_{i,j}
Return: B

For any choice of p_{i,j} we have E[A − B] = 0, or equivalently:

E[B] = A

What about Var[A − B]? In other words, when is ‖A − B‖₂ small?
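The generic algorithm above in NumPy (ours, for illustration). Any probability matrix P with P_{i,j} > 0 wherever A_{i,j} ≠ 0 gives an unbiased sketch:

```python
import numpy as np

def sparsify(A, P, rng):
    """Keep entry (i, j) w.p. P[i, j], rescaled by 1/P[i, j]; then E[B] = A."""
    keep = rng.random(A.shape) < P
    B = np.zeros_like(A, dtype=float)
    B[keep] = A[keep] / P[keep]
    return B

rng = np.random.default_rng(0)
A = rng.uniform(-1, 1, size=(4, 4))
P = np.full(A.shape, 0.5)                 # keep each entry with probability 1/2
mean = np.zeros_like(A)
trials = 20000
for _ in range(trials):
    mean += sparsify(A, P, rng)
mean /= trials
assert np.allclose(mean, A, atol=0.08)    # unbiased: the empirical mean approaches A
```

Setting P = 1 everywhere returns A exactly; nonuniform choices of P trade sketch size against the variance question the slide raises.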


Page 227:

Sparsification by sampling

Assume w.l.o.g. that |A_{i,j}| ≤ 1. We can get that:

‖A − B‖₂ ≤ ε

[Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations]
  p_{i,j} = O(1) · max(A²_{i,j}/n, |A_{i,j}|/√n)
  |B| = O(n + (n/ε²)·∑_{i,j} A²_{i,j})

[Sanjeev Arora, Elad Hazan, and Satyen Kale. A fast random sampling algorithm for sparsifying matrices]
  p_{i,j} = min(|A_{i,j}|·√n/ε, 1)
  |B| = O((√n/ε)·∑_{i,j} |A_{i,j}|)

The latter has a very simple proof; see the paper for more details.
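For concreteness, a small sketch (ours) of the Arora–Hazan–Kale probabilities for a square n×n matrix; the expected sketch size ∑ p_{i,j} is bounded exactly as stated above.

```python
import numpy as np

def ahk_probabilities(A, eps):
    """p_ij = min(|A_ij| * sqrt(n) / eps, 1); zero entries are never sampled."""
    n = A.shape[0]
    return np.minimum(np.abs(A) * np.sqrt(n) / eps, 1.0)

rng = np.random.default_rng(0)
# A sparse-ish matrix with entries in [-1, 1] (about 5% nonzeros).
A = rng.uniform(-1, 1, size=(100, 100)) * (rng.random((100, 100)) < 0.05)
P = ahk_probabilities(A, eps=0.5)
expected_nnz = P.sum()                          # E|B| = sum of keep-probabilities
bound = np.sqrt(100) / 0.5 * np.abs(A).sum()    # (sqrt(n)/eps) * sum |A_ij|
assert expected_nnz <= bound + 1e-9
assert P.max() <= 1.0
```

Large entries get p_{i,j} = 1 and are always kept; entries below ε/√n are kept only rarely, which is where the sparsification comes from.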

Page 228:

Streaming Data Mining

1 Items (words, IP addresses, events, clicks, ...):
  Item frequencies
  Distinct elements
  Moment estimation
2 Vectors (text documents, images, example features, ...):
  Dimensionality reduction
  k-means
  Linear Regression
3 Matrices (text corpora, user preferences, social graphs, ...):
  Efficiently approximating the covariance matrix
  Sparsification by sampling

Page 229:

Thank you!

We would like to thank Nir Ailon, Moses S. Charikar, Sudipto Guha, Elad Hazan, Yishay Mansour, Muthu Muthukrishnan, David P. Woodruff, Petros Drineas, Piotr Indyk, Graham Cormode, Andrew McGregor, and Dimitris Achlioptas for pointing out important algorithms and references.