Distinct Elements Problem
Ariel Rosenfeld
Input: a stream of m integers $i_1, i_2, \ldots, i_m$, each in $\{1, \ldots, n\}$.
Output: the number of distinct elements in the stream.
Example: count the number of distinct IP addresses you encounter.
Bit vector of size n (set bit i to 1 when i is encountered).
Keep all m integers and answer naively.
◦ Sort and count
O(min{n, m log m})
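The two naive exact approaches above can be sketched roughly as follows. The function names are made up for illustration, and the "bit vector" is stored as one byte per flag purely to keep the code short.

```python
def count_distinct_bitvector(stream, n):
    """Exact count with one flag per possible value 1..n."""
    seen = bytearray(n + 1)  # index 0 unused; one byte per flag for simplicity
    count = 0
    for i in stream:
        if not seen[i]:
            seen[i] = 1
            count += 1
    return count


def count_distinct_sort(stream):
    """Exact count by storing all m items, sorting, and counting value changes."""
    items = sorted(stream)
    count = 0
    previous = None
    for x in items:
        if count == 0 or x != previous:
            count += 1
            previous = x
    return count


print(count_distinct_bitvector([3, 1, 3, 2, 1], n=5))  # 3
print(count_distinct_sort([3, 1, 3, 2, 1]))            # 3
```

Both return the exact answer, at the cost of memory that grows with n or with m.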
A deterministic exact algorithm is impossible using o(n) bits.
A deterministic approximation algorithm for this problem providing a (1 ± 1/1000)-approximation using o(n) bits is impossible.
$\mathrm{Var}(X) = E(X^2) - E(X)^2$.
Pick a random hash function $h : [n] \to [0, 1]$.
Compute $z = \min_{i \in \text{stream}} h(i)$.
Output $1/z - 1$.
Equal integers get the same hash value.
We will show that the output is a good approximation.
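A minimal sketch of the idealized estimator, assuming the random function h can be simulated by drawing a fresh uniform value the first time each integer appears and remembering it. The name `idealized_fm` and the dictionary are illustrative only; the dictionary itself uses memory proportional to the number of distinct elements, so this demonstrates the estimator rather than a small-space algorithm.

```python
import random


def idealized_fm(stream, seed=None):
    """Return 1/z - 1, where z is the minimum hash value seen in the stream."""
    rng = random.Random(seed)
    hash_values = {}  # memoised h(i): equal integers get the same value
    z = 1.0           # hash values lie in [0, 1), so 1.0 is a safe starting minimum
    for i in stream:
        if i not in hash_values:
            hash_values[i] = rng.random()
        z = min(z, hash_values[i])
    if z == 0.0:      # practically never happens; avoids division by zero
        return float("inf")
    return 1.0 / z - 1.0


# Repeating an element does not change the estimate.
print(idealized_fm([7, 7, 7], seed=1) == idealized_fm([7], seed=1))  # True
```

A single run is quite noisy, since the output depends on one random minimum.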
This is idealized for 2 reasons:
1. We don’t have perfect precision.
2. We need at least n bits to remember the randomness associated with every i.
Let’s ignore this for now…
$S = \{j_1, \ldots, j_t\}$ (the distinct elements in the stream)
$h(j_1), \ldots, h(j_t) = X_1, \ldots, X_t$ are independent random variables from Unif[0, 1]
$Z = \min_i X_i$
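Each $X_i \sim \mathrm{Unif}[0, 1]$ has density $f(x) = 1$ and CDF $F(x) = x$ on $[0, 1]$.
$F_Z(x) = 1 - (1 - F(x))^t = 1 - (1 - x)^t$
$f_Z(x) = t(1 - x)^{t-1}$
$E[Z] = \int_0^1 y\,f_Z(y)\,dy = \int_0^1 y\,t(1 - y)^{t-1}\,dy$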
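$E[Z] = \int_0^1 y\,t(1 - y)^{t-1}\,dy$
Integrate by parts with $u = y$ and $dv = t(1 - y)^{t-1}\,dy$, so $du = dy$ and $v = -(1 - y)^t$:
$E[Z] = \left[-y(1 - y)^t\right]_0^1 + \int_0^1 (1 - y)^t\,dy = 0 + \left[-\frac{(1 - y)^{t+1}}{t+1}\right]_0^1 = \frac{1}{t+1}$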
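1. $E[Z] = \frac{1}{t+1}$.
2. $E[Z^2] = \frac{2}{(t+1)(t+2)}$ (HW).
We get a bounded variance: $\mathrm{Var}(Z) = E[Z^2] - E[Z]^2 = \frac{t}{(t+1)^2(t+2)} < \frac{1}{(t+1)^2}$.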
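As a sanity check on the two moments, a small Monte Carlo simulation can compare empirical averages of the minimum of t uniform samples with 1/(t+1) and 2/((t+1)(t+2)). This is only an illustrative sketch; the function name, trial count, and seed are arbitrary choices.

```python
import random


def simulate_min_moments(t, trials=20000, seed=0):
    """Estimate E[Z] and E[Z^2] for Z = min of t independent Unif[0, 1] samples."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(trials):
        z = min(rng.random() for _ in range(t))
        total += z
        total_sq += z * z
    return total / trials, total_sq / trials


t = 9
mean, second_moment = simulate_min_moments(t)
print(mean, 1 / (t + 1))                         # empirical vs. 1/(t+1)
print(second_moment, 2 / ((t + 1) * (t + 2)))    # empirical vs. 2/((t+1)(t+2))
```

The empirical values should land close to the closed forms, with the gap shrinking as `trials` grows.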
As q increases, the approximation improves.
Chebyshev: $P\left(\left|Z - \frac{1}{t+1}\right| > a\right) \le \frac{\mathrm{Var}(Z)}{a^2}$
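The effect of q can be illustrated with a short simulation. This sketch assumes that q denotes the number of independent copies of the idealized estimator whose minima are averaged before inverting; that reading is an assumption, since the text here does not define q. Averaging q independent copies divides the variance by q, which is what makes the Chebyshev bound useful. The function name is hypothetical.

```python
import random


def averaged_min_estimate(distinct_count, q, seed=None):
    """Simulate q independent copies on a stream with `distinct_count` (>= 1)
    distinct elements, average their minima, and return 1/mean - 1."""
    rng = random.Random(seed)
    # Each copy's minimum hash value is the min of `distinct_count` uniforms.
    z_values = [min(rng.random() for _ in range(distinct_count)) for _ in range(q)]
    z_mean = sum(z_values) / q
    return 1.0 / z_mean - 1.0


# Larger q typically gives estimates closer to the true count of 100.
for q in (1, 10, 100, 1000):
    print(q, averaged_min_estimate(100, q, seed=0))
```

The simulation draws the minima directly rather than hashing a real stream, which is equivalent under the idealized model.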
We want a hash function that does not need n or more bits to represent.
So we will use a k-wise independent family of hash functions H; each function in H can be represented using a small number of bits (log|H|).
◦ In lecture.
An example: let q > k be a prime power, and define $H_{\text{poly},k}$ to be the set of all polynomials of degree ≤ k − 1 in $\mathbb{F}_q[x]$.
$H_{\text{poly},k}$ is a k-wise independent family.
Size: $q^k$
Needs: k log q bits.
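A rough sketch of sampling from the polynomial family, under the simplifying assumption that q is a prime (so that arithmetic modulo q is field arithmetic; general prime powers would need a proper finite-field implementation). The helper names `make_poly_hash` and `sample_poly_hash` are invented for this example.

```python
import random


def make_poly_hash(coeffs, q):
    """Return h(x) = coeffs[0] + coeffs[1]*x + ... (mod q), for prime q."""
    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % q
        return acc
    return h


def sample_poly_hash(k, q, rng=None):
    """Pick a uniformly random polynomial of degree <= k-1 over F_q (q prime)."""
    rng = rng or random.Random()
    coeffs = [rng.randrange(q) for _ in range(k)]  # k values, about k*log2(q) bits
    return make_poly_hash(coeffs, q), coeffs


h, coeffs = sample_poly_hash(k=3, q=101, rng=random.Random(0))
print(coeffs, [h(x) for x in range(5)])
```

Storing the function means storing only the k coefficients, which matches the k log q bits stated above.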