Ariel Rosenfeld
Input: a stream of m integers i1, i2, ..., im. (over 1,…,n)
Output: the number of distinct elements in the stream.
Example – count the distinct number of IP addresses you encounter.
Bit vector of size n (mark 1 when encountered)
Keep all m integers and answer naively: sort and count.
O(min{n, m log m})
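The two trivial solutions above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the bit vector is stored as one byte per flag for simplicity rather than as n packed bits.

```python
def count_distinct_bitvector(stream, n):
    """Exact count using one flag per possible value 1..n."""
    seen = bytearray(n + 1)  # index 0 unused; one byte per flag, not a packed bit vector
    count = 0
    for x in stream:
        if not seen[x]:
            seen[x] = 1      # mark 1 when first encountered
            count += 1
    return count


def count_distinct_sorted(stream):
    """Exact count by storing all m items, sorting, and counting value changes."""
    items = sorted(stream)   # O(m log m) comparisons
    count = 0
    prev = None
    for x in items:
        if x != prev:
            count += 1
            prev = x
    return count


stream = [3, 1, 3, 2, 1]
print(count_distinct_bitvector(stream, n=5), count_distinct_sorted(stream))
```

Both return the exact answer; they differ only in whether the cost scales with n or with m.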
A deterministic exact algorithm is impossible using o(n) bits.
A deterministic approximation algorithm for this problem providing a (1 ± 1/1000)-approximation using o(n) bits is impossible.
Var(X) = E(X²) − E(X)².
Pick a random hash function h : [n] → [0, 1].
Calculate z = min_{i ∈ stream} h(i).
Output 1/z − 1
This is idealized for 2 reasons:
1. We don't have perfect precision.
2. We need at least n bits to remember the randomness associated with every i.
Let's ignore this for now…
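A minimal simulation of the idealized algorithm, assuming Python floats stand in for real numbers in [0, 1]. The function name and the `seed` parameter are hypothetical. The dictionary `h` plays the role of the perfectly random hash function, and it is exactly the per-element randomness that makes this version too expensive in practice.

```python
import random


def idealized_distinct_estimate(stream, seed=0):
    """Return 1/z - 1, where z is the minimum hash value seen in the stream."""
    rng = random.Random(seed)
    h = {}   # lazily sampled "random function": one stored value per distinct element
    z = 1.0  # running minimum of h(i)
    for i in stream:
        if i not in h:
            # 1 - random() lies in (0, 1], which avoids a zero minimum
            h[i] = 1.0 - rng.random()
        z = min(z, h[i])
    return 1.0 / z - 1.0


stream = [5, 3, 5, 7, 3, 3, 9]
print(idealized_distinct_estimate(stream))
```

Repeated elements hash to the same value, so only the set of distinct elements affects z. A single run can be far from the true count; this sketch shows the mechanics only.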
S = {j1, …, jt} (the unique elements in the stream)
h(j1), ..., h(jt) = X1, ..., Xt are independent random variables drawn from Unif[0, 1]
Z = min{Xi}
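The transcript does not show why 1/z − 1 is a natural output. The standard computation below, for the minimum of t independent Unif[0, 1] variables, gives the motivation:

```latex
\Pr[Z > x] = \prod_{i=1}^{t} \Pr[X_i > x] = (1 - x)^t, \qquad x \in [0, 1]

\mathbb{E}[Z] = \int_0^1 \Pr[Z > x]\,dx = \int_0^1 (1 - x)^t\,dx = \frac{1}{t + 1}

\frac{1}{\mathbb{E}[Z]} - 1 = t
```

This shows only that 1/E[Z] − 1 equals t. It does not say that E[1/Z − 1] equals t, so a concentration argument (for example via Var(Z)) is still needed to turn Z into a reliable estimate.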
We want a hash function that doesn't need n bits or more to represent.
So we will use k-wise independent hash functions: each member of the family H can be represented using a small number of bits (log|H|).
(Covered in lecture.)
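One standard construction of a k-wise independent family, not necessarily the one used in the lecture, is a random polynomial of degree k − 1 over a prime field. The sketch below assumes the prime `P` exceeds n; the names `P` and `make_kwise_hash` are hypothetical. Only the k coefficients are stored, which is about k·log P bits and matches log|H| for |H| = P^k.

```python
import random

P = 2**61 - 1  # a Mersenne prime, assumed to be larger than n


def make_kwise_hash(k, seed=None):
    """Sample h(x) = (a_0 + a_1 x + ... + a_{k-1} x^(k-1)) mod P, scaled to [0, 1]."""
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(k)]  # the only stored randomness

    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest-degree coefficient first
            acc = (acc * x + c) % P
        return acc / P              # float approximation of a point in [0, 1]

    return h


h = make_kwise_hash(k=2, seed=42)   # pairwise independent over the field
print(h(17), h(17) == h(17))
```

The field values are k-wise independent. Scaling them to floats loses precision, which is the first of the two idealizations the slides mention.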