

Page 1: Zohar Karnin arXiv:1603.05346v1 [cs.DS] 17 Mar 2016 · Almost Optimal Streaming Quantiles Algorithms Zohar Karnin Yahoo Research Kevin Lang Yahoo Research Edo Liberty Yahoo Research

Almost Optimal Streaming Quantiles Algorithms

Zohar Karnin, Yahoo Research

Kevin Lang, Yahoo Research

Edo Liberty, Yahoo Research

November 21, 2018

Abstract

Streaming approximation of quantiles is a widely performed operation and, as a result, a well studied problem. The goal is to construct a data structure for approximately answering quantile queries in one pass under strict memory constraints.¹ For ε approximating a single quantile query with probability 1 − δ we obtain an O((1/ε) log log(1/δ)) space algorithm and a matching lower bound. We also provide an O((1/ε) log^2 log(1/δ)) sketch which is fully mergeable. These give non-mergeable and mergeable summaries for simultaneously approximating all quantiles in space O((1/ε) log log(1/εδ)) and O((1/ε) log^2 log(1/εδ)) respectively. The algorithms operate in the comparison model and are simple to implement.

1 Introduction

Given a set of items x_1, . . . , x_n, the quantile of a value x is the fraction of items in the stream such that x_i ≤ x. It is also convenient to define the rank of x as the number of items such that x_i ≤ x. An additive error εn for R(x) is an ε approximation of its rank. The literature distinguishes between several different definitions of this problem. In this manuscript we distinguish between the single quantile approximation problem and the all quantiles approximation problem.

Definition 1. The single quantile approximation problem: Given x_1, . . . , x_n in a streaming fashion in arbitrary order, construct a data structure for computing R̃(x). By the end of the stream, receive a single element x and compute R̃(x) such that |R̃(x) − R(x)| ≤ εn with probability 1 − δ.

There are variations of this problem in which the algorithm is not given x (as a query) but rather a rank r. It should be able to provide an element x_i from the stream such that |R(x_i) − r| ≤ εn. There are also variants that make the value r, or r/n, known to the algorithm in advance. For example, one could search for an approximate median, which is not trivial in the streaming model. The most difficult variant is the all quantiles problem, in the sense that a solution to it provides a solution to all other variants.

¹ The problem is defined explicitly in the introduction section.


Definition 2. The all quantiles approximation problem: Given x_1, . . . , x_n in a streaming fashion in arbitrary order, construct a data structure for computing R̃(x). By the end of the stream, with probability 1 − δ, for all values of x simultaneously it should hold that |R̃(x) − R(x)| ≤ εn.

Observe that approximating a set of O(1/ε) single quantiles correctly suffices for solving the all quantiles approximation problem. Using the union bound, solving the single quantile approximation problem with failure probability at most εδ constitutes a valid solution for the all quantiles approximation problem.
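Spelled out, with R̃ the estimate and x_1, . . . , x_{O(1/ε)} one fixed query point per quantile, the union bound reads:

```latex
\Pr\Big[\exists\, i:\ |\tilde R(x_i) - R(x_i)| > \varepsilon n\Big]
  \;\le\; \sum_{i=1}^{O(1/\varepsilon)} \Pr\big[|\tilde R(x_i) - R(x_i)| > \varepsilon n\big]
  \;\le\; O(1/\varepsilon)\cdot \varepsilon\delta \;=\; O(\delta).
```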

1.1 Related work

Two recent surveys [1][2] on this problem give ample motivation and explain the state of the art in terms of algorithms and theory in a very accessible way. In what follows, we shortly review some of the prior work that is most relevant in the context of this manuscript.

Manku, Rajagopalan and Lindsay [3] built on the work of Munro and Paterson [4] and gave a randomized solution which uses at most O((1/ε) log^2(nε)) space.² A simple deterministic version of their algorithm achieves the same bounds.³ We refer to their algorithm as MRL. Greenwald and Khanna [5] created another deterministic algorithm that requires O((1/ε) log(nε)) space. We refer to their algorithm as GK.

Note that a uniform sample of size n′ = O(log(1/ε)/ε^2) from the stream suffices to produce approximate quantiles. Feeding the sampled elements into the GK sketch yields an O((1/ε) log(1/ε)) solution. However, to produce such samples, one must know n (approximately) in advance. This observation was already made by Manku et al. [3]. Since n is not known in advance it is not a trivial task to combine a sampler with a GK sketch. Recently, Felber and Ostrovsky [6] managed to do exactly that and achieved space complexity of O((1/ε) log(1/ε)) by using sampling and several GK sketches in conjunction in a non trivial way. To the best of our knowledge, this is the best known space complexity result to date.

Mergeability: an important property of sketches is called mergeability [7]. Informally, this property allows one to sketch different sections of the stream independently and then combine the resulting sketches. The combined sketch should be as accurate as a single sketch one would have computed had the entire stream been consumed by a single sketcher. This property is extremely important in practice since large datasets are often distributed across many machines. Agarwal et al. [7] conjecture that the GK sketch is not mergeable and go on to describe a mergeable sketch of space complexity (1/ε) log^{3/2}(1/ε). It is worth noting that this result predates that of [6].

² For readability, all the mentioned results for randomized algorithms apply for a constant success probability, unless otherwise stated.

³ This was pointed out, for example, by [1].


Model: different sketching algorithms perform different kinds of operations on the elements in the stream. The most restricted model is the comparison model. In this model, there is a strong order imposed on the elements and the algorithm can only compare two items to decide which is larger. This is the case, for example, for lexicographic ordering of strings. All the cited works above operate in this model. Another model assumes the total size of the universe is bounded by |U|. An O((1/ε) log(|U|)) space algorithm was suggested by [8] in that model. If the items are numbers, for example, one could also compute averages or perform gradient descent like algorithms such as [9]. In machine learning, a quantile loss function refers to an asymmetric or weighted ℓ_1 loss. Predicting the values of the items in the stream with the quantile loss converges to the correct quantiles. Such methods however only apply to randomly shuffled streams.

Lower bounds: For any algorithm, there exists a (trivial) space lower bound of Ω(1/ε). Hung and Ting [10] showed that any deterministic comparison based algorithm for the single quantile approximation problem must store Ω((1/ε) log(1/ε)) items. Felber and Ostrovsky [6] suggest, as an open problem, that the Ω((1/ε) log(1/ε)) lower bound could potentially hold for randomized algorithms as well. Prior to this work, it was very reasonable to believe this conjecture is true. For example, no o((1/ε) log(1/ε)) space algorithm was known even for randomly permuted streams.⁴

1.2 Main contribution

We begin by re-explaining the work of [7] and the deterministic version of [3] from a slightly different view point. The basic building block for these algorithms is a single compactor. A compactor with capacity k can store up to k items, all with the same weight w. When the compactor is full (it contains k elements) it can compact its k elements into k/2 elements of weight 2w. This is done as follows. First, the items are sorted. Then either the even or the odd elements in the sequence are chosen and their weight is set to 2w. The unchosen items are discarded. Consider a single element x. Its rank estimation before and after the compaction differs by at most w regardless of k. This is illustrated in Figure 1.
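This bounded rank drift is easy to check numerically; the snippet below is our own illustration of a single compaction (the helper names are ours, not the paper's):

```python
import random

def compact(buf):
    """One compaction: sort, then keep the odd- or even-indexed half, fairly."""
    buf = sorted(buf)
    offset = random.randint(0, 1)
    return buf[offset::2]

def weighted_rank(items, weight, x):
    """Weighted count of stored items that are <= x."""
    return weight * sum(1 for y in items if y <= x)

random.seed(0)
for _ in range(100):
    buf = [random.random() for _ in range(6)]   # capacity-6 compactor, weight w = 1
    out = compact(buf)                          # 3 survivors, weight doubles to 2
    for x in buf:
        drift = weighted_rank(out, 2, x) - weighted_rank(buf, 1, x)
        assert abs(drift) <= 1                  # rank moves by at most w = 1
print("rank drift bounded by w for all queries")
```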

This already gives a deterministic algorithm. Assume we use such a compactor. When it outputs items, we feed them into another compactor and so on. Since each compactor halves the number of items in the sequence, there could be at most H ≤ ⌈log(n/k)⌉ compactors chained together. Let h denote the height of the compactor, where the last one created has height H and the first one has height 1. Let w_h = 2^{h−1} be the weight of items compactor h gets. Then, the number of compact operations it performs is at most m_h = n/(k w_h). Summing up all the errors in the system:

∑_{h=1}^{H} m_h w_h = ∑_{h=1}^{H} n/k = Hn/k ≤ n log(n/k)/k.

⁴ If the stream is presented in random order, obtaining an O((1/ε) log(1/ε)) solution is easy. Namely, save the first O((1/ε) log(1/ε)) items and, from that point on, only count how many items fall between two consecutive samples. Due to the coupon collector argument, at least one sample will fall in each stretch of εn items, which makes the solution trivially correct.


Figure 1: An illustration of a single compactor of capacity 6 performing a single compaction operation.

Since we have H compactors, the space usage is kH ≤ k log(n/k). Setting k = O((1/ε) log(εn)) yields error of εn with space O((1/ε) log^2(εn)).

One conceptual contribution of Agarwal et al. [7] is to have each compactor delete the odd or even items with equal probability. This has the benefit that the expected error is zero. It also lets one use standard concentration results to bound the maximal error. This eliminates one log factor from the worst case analysis. The dependence over δ, however, adds a factor of √log(1/δ). In the all quantiles problem this translates into an additional √log(1/ε) factor for constant failure probability. Agarwal et al. [7] also show that when n ≥ poly(1/ε) one can sample items from the stream before feeding them to the sketch. This gives total space usage of O((1/ε) log(1/ε)√log(1/δ)) for the single quantile problem and O((1/ε) log(1/ε)√log(1/εδ)) for the all quantiles problem.

The first improvement we provide to the algorithms above is by using different compactor capacities in different heights, denoted by k_h. We show that k_h can, for example, decrease exponentially: k_h ≈ k_H (2/3)^{H−h}. That is, compactors in lower levels in the hierarchy can operate with significantly less capacity. Surprisingly enough, this turns out not to affect the asymptotic statistical behavior of the error at all. Moreover, the space complexity is clearly improved.

The capacity of any functioning compactor must be at least 2. This could contribute O(H) = O(log(n/k)) to the space complexity. To remove this dependence, we notice that a sequence of H′′ compactors with capacity 2 essentially performs sampling. Out of every 2^{H′′} elements they select one at random and output that element with weight 2^{H′′}. This is clearly very efficiently computable and does not truly require memory complexity of O(H′′) but rather of O(1). The total capacity of all compactors whose capacity is more than 2 is bounded by ∑_{h=H′′+1}^{H} k(2/3)^{H−h} ≤ 3k. This yields a total space complexity of O(k). Setting k = O((1/ε)√log(1/δ)) gives an algorithm with space complexity O((1/ε)√log(1/δ)) for the single quantile problem, which constitutes our first result. Interestingly, our algorithm can be thought of as a smooth interpolation between careful compaction of heavy items and efficient sampling for light items.
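The geometric capacity bound used here can be verified in a few lines (the test values of k and H below are arbitrary; this is our illustration):

```python
# Total capacity of the compactors above the sampler: a geometric series.
# With k_h = k * (2/3)^(H - h), the sum over all heights stays below
# k / (1 - 2/3) = 3k, no matter how many compactors there are.
def total_capacity(k, H):
    return sum(k * (2 / 3) ** (H - h) for h in range(1, H + 1))

for k in (10, 100, 1000):
    for H in (5, 20, 50):
        assert total_capacity(k, H) <= 3 * k
print("total capacity stays below 3k for every tested k, H")
```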

The next improvement comes from special handling of the top log log(1/δ) compactors. Intuitively, the number of compaction operations (and therefore random bits) in those levels is so small that one could expect the worst case behavior. For worst case analysis a fixed k_h is preferred to diminishing values of k_h. Therefore, we suggest to set k_h = k when h ≥ H − O(log log(1/δ)) and k_h ≈ k(2/3)^{H−h} otherwise. By analyzing the worst case error of the top log log(1/δ) compactors separately from the bottom H − log log(1/δ) we improve our analysis to O((1/ε) log^2 log(1/δ)). Interestingly, the worst case analysis of the top log log(1/δ) compactors is identical to that of the MRL sketch [3].

This last observation leads us to our third and final improvement. If one replaces the top log log(1/δ) compactors with a GK sketch, the space complexity can be shown to reduce to O((1/ε) log log(1/δ)). However, this prevents the sketch from being mergeable because the GK sketch is not known to have this property. Another way to view this algorithm is as a concatenation of three sketches. The first, receiving the stream of elements, is a sampler that simulates all the compactors of capacity 2. Its output is fed into a KLL sketch with a hierarchy of compactors with increasing capacities. This sketch outputs items of weight εn/poly log(1/δ) that are fed into an instance of GK. This idea is illustrated in Figure 2.

                 Single quantile             All quantiles               Deterministic   Mergeable   Comment
MRL [3]          (1/ε) log^2(εn)             (1/ε) log^2(εn)             Yes             Yes
GK [5]           (1/ε) log(εn)               (1/ε) log(εn)               Yes             No
ACHPWY [7]       (1/ε) log(1/ε)·√log(1/δ)    (1/ε) log(1/ε)·√log(1/εδ)   No              Yes
FO [6]           (1/ε) log(1/ε)              (1/ε) log(1/ε)              No              No          fixed δ = e^{−(1/ε)^c}
KLL [Here]       (1/ε) log^2 log(1/δ)        (1/ε) log^2 log(1/δε)       No              Yes
KLL [Here]       (1/ε) log log(1/δ)          (1/ε) log log(1/δε)         No              No

Table 1: The table describes the space complexity of several streaming quantiles algorithms in big-O notation. The randomized algorithms are required to succeed with constant probability. They all work in the comparison model and for arbitrarily ordered streams. The non-mergeability of some of these algorithms is due to the fact that they use GK, which is not known to be mergeable, as a subroutine.

2 The Algorithm

As mentioned above our algorithm includes a hierarchy of compactors. The code for a single compactor object is given as Algorithm 1. Compactors contain a list of items and support the operations of append and compact.


Algorithm 1 Compactor: a compactor is a linked list with one added function, namely, compact. It inherits the following linked list functions: sort(), length(), pop(), and extend().

procedure compact()
    sort()
    with probability 1/2
        while length() ≥ 2 do: pop(); yield pop()
    else
        while length() ≥ 2 do: yield pop(); pop()
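Algorithm 1 can be rendered as a small executable class; the following Python sketch is our own illustration (not the authors' code), with the linked list replaced by a Python list:

```python
import random

class Compactor(list):
    """Algorithm 1 as a Python list: compact() halves the buffer,
    doubling the weight of the surviving items."""
    def compact(self):
        self.sort()
        offset = random.randint(0, 1)   # keep odd or even positions, fairly
        survivors = self[offset::2]     # these items now carry weight 2w
        self.clear()                    # the unchosen items are discarded
        return survivors

random.seed(0)
c = Compactor([5, 2, 4, 1, 3, 0])
print(c.compact())  # either [0, 2, 4] or [1, 3, 5]
```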

Algorithm 2 KLL Sketch: the algorithm below uses a lazy approach to compacting. It verifies that each compaction uses enough items for the proof to hold. Yet, while the overall number of items is below its limit, it accepts more items. This practical improvement makes a significant difference in the experimental behavior of the algorithm. The pseudo code uses 1-based indexing to stay consistent with the mathematical derivation.

procedure initialize(k)
    grow()

procedure grow()
    Add a new empty compactor to the compactors hierarchy
    H = number of compactors
    maxSize = ∑_{h=1}^{H} (⌈k(2/3)^{H−h}⌉ + 1)

procedure update(x)
    Append x to compactors[1]
    compact()

procedure compact()
    while sum of lengths of compactors ≥ maxSize do
        for h ∈ [1, . . . , H] do
            if length(compactors[h]) ≥ ⌈k(2/3)^{H−h}⌉ + 1 then
                if h + 1 > H then grow()  # Add compactor
                compactors[h+1].extend(compactors[h].compact())
                if sum of lengths of compactors < maxSize then break

procedure merge(other)
    # Assumes H ≥ other.H, otherwise switch roles.
    for h ∈ [1, . . . , other.H] do
        compactors[h].extend(other.compactors[h])
    compact()

procedure rank(x)
    r = 0
    for h = 1, . . . , H do
        for x′ ∈ compactors[h] do
            if x′ ≤ x then r = r + 2^{h−1}
    return r
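For concreteness, the following Python class is our illustrative rendering of Algorithm 2 under the ⌈k(2/3)^{H−h}⌉ + 1 capacity rule; the sampler of Section 4 and the merge procedure are omitted, so this is a sketch of the idea rather than the full algorithm:

```python
import math
import random

class KLL:
    """A minimal KLL sketch: lazily compacted buffers whose capacities
    decay geometrically with depth (illustration only; no sampler, no merge)."""
    def __init__(self, k=128):
        self.k = k
        self.compactors = [[]]

    def capacity(self, h):
        # Height h is 1-based; the top compactor gets capacity k + 1.
        H = len(self.compactors)
        return math.ceil(self.k * (2 / 3) ** (H - h)) + 1

    def max_size(self):
        return sum(self.capacity(h) for h in range(1, len(self.compactors) + 1))

    def size(self):
        return sum(len(c) for c in self.compactors)

    def update(self, x):
        self.compactors[0].append(x)
        # Lazy compaction: only compact once the total size hits the limit.
        while self.size() >= self.max_size():
            for h, c in enumerate(self.compactors):
                if len(c) >= self.capacity(h + 1):
                    if h + 1 == len(self.compactors):
                        self.compactors.append([])  # grow the hierarchy
                    c.sort()
                    offset = random.randint(0, 1)
                    self.compactors[h + 1].extend(c[offset::2])
                    del c[:]
                    break

    def rank(self, x):
        # An item stored at 1-based height h carries weight 2^(h-1).
        return sum((1 << h) for h, c in enumerate(self.compactors)
                   for y in c if y <= x)

random.seed(0)
n = 50_000
s = KLL(k=200)
for v in range(n):
    s.update(v)
print(s.rank(n // 2))  # close to n/2, up to a small additive error
```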


Figure 2: Sampling, varying capacity compactors, equal capacity compactors and GK achieve different efficiencies at different stream lengths. The set of capacitated compactors used by KLL creates a smooth transition between sampling and GK sketching. The top figure corresponds to the first construction explained in Section 3. The second from the top is explained in Section 4. The two bottom figures correspond to our main contributions and are explained in Sections 5 and 5.1 respectively.

Our KLL sketch is detailed in Algorithm 2. For simplicity we provide a version of it that may maintain log(n) compactors and explain later how to reduce this number to O(log(1/ε)). We remind the reader that the additional compactors have capacity exactly 2. Hence the difference in terms of memory consumption between the two versions is an additive log(n) term. For additional simplification, the version given here is one where a compactor of height h is assigned a capacity of k_h = ⌈k(2/3)^{H−h}⌉ + 1 rather than an arbitrary capacity.

Our sketch supports the procedure initialize, with an input of k, determining the memory capacity. It also supports the procedures update, rank, and merge. The update procedure feeds an additional single item to the sketch. The merge procedure receives as input another instance of a KLL sketch. It merges the two sketches into a single one that sketches the concatenation of both streams. The rank procedure receives as input an item x and returns the estimate for the rank of x in the stream. It does so by summing the weights of items in the sketch that are smaller than or equal to x.

3 Analysis

Let us begin by reviewing some properties of the KLL sketch. Consider a run of the algorithm that terminates with H different compactors. The compactors are indexed by their height h ∈ {1, . . . , H}. The weight of items at height h is w_h = 2^{h−1}. Denote by k_h the smallest number of items that the compactor at height h contained during a compact operation. For brevity, denote k = k_H. For reasons that will become clear later, assume that k_h ≥ kc^{H−h} for c ∈ (0.5, 1).

Since the top compactor was created, we know that the second compactor from the top compacted its elements at least once. Therefore k_{H−1} 2^{H−2} ≤ n. This gives that

H ≤ log(n/k_{H−1}) + 2 ≤ log(n/ck) + 2.

Therefore, the number of compact operations m_h at height h is at most (2/c)^{H−h−1}. To see this, note that every compact procedure call is performed on at least k_h items and the items have a weight of w_h = 2^{h−1}. Therefore,

m_h ≤ n/(k_h w_h) ≤ (2n/(k 2^H)) (2/c)^{H−h} ≤ (2/c)^{H−h−1}.

To analyze the total error produced by the sketch, we first consider the error generated in each individual level. Define by R(x, h) the rank of x among the following weighted set of items: the items yielded by the compactor at height h and all the items stored in the compactors of heights h′ ≤ h at the end of the stream. For convenience R(x, 0) = R(x) is the exact rank of x in the input stream. Define err(x, h) = R(x, h) − R(x, h − 1) to be the total change in the approximated rank of x due to level h.

Note that each compaction operation in level h either leaves the rank of x unchanged, or adds or subtracts w_h with equal probability. To be more explicit, if x has an even rank among the items inside the compactor, the total mass to the left of it (its rank) is unchanged by the compaction operation. If its rank inside the compactor is odd, however, and the odd items are chosen, this mass increases by w_h. If its rank inside the compactor is odd and the even items are chosen, its mass decreases by w_h. Therefore, err(x, h) = ∑_{i=1}^{m_h} w_h X_{i,h} where E[X_{i,h}] = 0 and |X_{i,h}| ≤ 1. The final discrepancy between the real rank of x and its approximation R̃(x) = R(x, H) is

R(x, H) − R(x, 0) = ∑_{h=1}^{H} [R(x, h) − R(x, h − 1)] = ∑_{h=1}^{H} err(x, h) = ∑_{h=1}^{H} ∑_{i=1}^{m_h} w_h X_{i,h}.
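The zero-mean, bounded-increment structure of err(x, h) is easy to verify empirically; the following snippet (our illustration) measures the rank change of a single query point over many independent compactions of the same buffer:

```python
import random

def rank_change(buf, x, w=1):
    """Rank change of x caused by one random compaction of buf (weight w)."""
    buf = sorted(buf)
    before = sum(w for y in buf if y <= x)
    offset = random.randint(0, 1)             # delete even or odd positions, fairly
    after = sum(2 * w for y in buf[offset::2] if y <= x)
    return after - before

random.seed(1)
buf = list(range(8))
x = 4.5                                        # rank 5 (odd) inside the compactor
changes = [rank_change(buf, x) for _ in range(100_000)]
assert all(c in (-1, 0, 1) for c in changes)   # |X_{i,h}| <= 1 per operation
print(sum(changes) / len(changes))             # close to 0: zero-mean increments
```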

Lemma 1 (Hoeffding). Let X_1, . . . , X_m be independent random variables, each with an expected value of zero, taking values in the range [−w_i, w_i]. Then for any t > 0 we have

Pr[|∑_{i=1}^{m} X_i| > t] ≤ 2 exp(−t^2 / (2 ∑_{i=1}^{m} w_i^2))

with exp being the natural exponent function.

We now apply Hoeffding's inequality to bound the probability that the bottom H′ compactors contribute more than εn to the total error. The reason for considering only the bottom levels and not all the levels will become apparent in Section 4.

Pr[∑_{h=1}^{H′} ∑_{i=1}^{m_h} w_h X_{i,h} > εn] ≤ 2 exp(−ε^2 n^2 / (2 ∑_{h=1}^{H′} ∑_{i=1}^{m_h} w_h^2))    (1)

A straightforward computation shows that

∑_{h=1}^{H′} ∑_{i=1}^{m_h} w_h^2 = ∑_{h=1}^{H′} m_h w_h^2 = ∑_{h=1}^{H′} (2/c)^{H′−h−1} 2^{2h−2}
= ((2/c)^{H′−1}/4) ∑_{h=1}^{H′} (2c)^h ≤ ((2/c)^{H′−1}/4) · (2c)^{H′}/(2c − 1) ≤ (c/(8(2c − 1))) 2^{2H′}.

Substituting 2^{2H′} = 2^{2H}/2^{2(H−H′)} and recalling that H ≤ log(n/ck) + 2 we get that

∑_{h=1}^{H′} ∑_{i=1}^{m_h} w_h^2 ≤ (n^2/k^2)/(2c^2(2c − 1)) · 1/2^{2(H−H′)}.    (2)

Substituting Equation 2 into Equation 1 and setting C = c^2(2c − 1) we get the following convenient form:

Pr[|R(x, H′) − R(x)| ≥ εn] ≤ 2 exp(−Cε^2 k^2 2^{2(H−H′)})    (3)

Theorem 1. There exists a streaming algorithm that computes an ε approximation for the rank of a single item with probability 1 − δ whose space complexity is O((1/ε)√log(1/δ) + log(εn)). This algorithm also produces mergeable summaries.

Proof. Let k_h = ⌈kc^{H−h}⌉ + 1. Note that k_h changes throughout the run of the algorithm. Nevertheless, H can only increase and so k_h is monotonically decreasing with the length of the stream. This matches the requirement that k_h ≥ kc^{H−h} where H is the final number of compactors. Notice that k_h is at least 2.

Setting H′ = H in Equation 3 and requiring failure probability at most δ we conclude that it suffices to set k = (C/ε)√log(2/δ). The space complexity of the algorithm is O(∑_{h=1}^{H} k_h):

∑_{h=1}^{H} k_h ≤ ∑_{h=1}^{H} (kc^{H−h} + 2) ≤ k/(1 − c) + 2H = O(k + log(n/k)).

Setting δ = Ω(ε) suffices to union bound over the failure probabilities of O(1/ε) different quantiles. This provides a mergeable sketching algorithm for the all quantiles problem of space O((1/ε)√log(1/ε) + log(εn)).

4 Reducing the space complexity

As mentioned above, the sketch in Algorithm 2 maintains a total of H = log(εn) sketches. Note, however, that only O(log(k)) compactors have capacity greater than 2. More accurately, the bottom H′′ = H − ⌈log_c(2/k)⌉ all have capacity exactly 2. In what follows we argue that any number of such compactors can be replaced with a simple sampling scheme that requires O(1) space to manage.

To understand why, consider a sequence of capacity 2 compactors. Each of those receives two items at a time, performs a random match between them, and sends the winner of the match to the compactor of the next level. Hence, the compactor of level H′′ simply selects one item uniformly at random from every 2^{H′′} elements in the stream and passes that item with weight 2^{H′′} to the compactor at height H′′ + 1. This is easily done efficiently but could potentially cause issues with merging such sketches. In what follows we describe how to achieve O(1) space while maintaining mergeability.

In the new sketch we have, in addition to the compactors, a new object we call a sampler. The pseudo-code for its methods is given in Algorithm 3. The sampler supports an update method where an item of weight w may be inserted. When observing a stream this method will always be called with a weight of 1. Other weight values are supported only for merging two samplers. At any time the sampler will have an associated height h and it will output items of weight 2^h, as input to the compactor of level h + 1; since the height of the sampler should depend on H, it will increase as the stream length grows, hence it supports a Grow procedure that increments its height. When having a sampler of height h, the sketch will only maintain compactors with height > h. The sampler keeps a single item in storage along with a weight of at most 2^h − 1. When merging two sketches, the sketch with the sampler of smaller height will feed its item with its appropriate weight to the sampler of the other sketch. Also, all compactors with height ≤ h in the 'smaller' sketch will feed the items in their buffers to the sampler of the 'larger' sketch.

It is easy to verify that the above described sketching scheme corresponds to the following offline operations. The sampler outputs items by performing an action we call sample. This action takes as input W items of weight 1 and outputs either no items or a single item of weight 2^h. More explicitly, with probability W/2^h it outputs one of the observed items uniformly at random. With probability 1 − W/2^h it outputs nothing. Before analyzing the error associated with a sample operation we mention that in a streaming setting, it suffices to restrict the value of W to be exactly 2^h. However, in order to account for merges we must let W obtain values in the range W ∈ (2^{h−1}, 2^h].

Lemma 2. Let R(x, h, i) be the rank of x after sample operation i of the sampler of height h, where the rank is computed based on the stream elements that did not undergo a sample operation, and the weighted items outputted by the sample operations 1 through i. Let R(x, h, i) − R(x, h, i − 1) = 2^h Y_{h,i}. Then E[Y_{h,i}] = 0 and |Y_{h,i}| ≤ 1.

Proof. Let r be the exact rank of x among the W items before the sampling operation. After the sampling, those W items are replaced with a single element. The rank of x after the sampling is 2^h with probability (W/2^h) · (r/W). The first term is the probability of outputting anything. The second is of selecting an element smaller than x. Therefore, the expected contribution is 2^h · (W/2^h) · (r/W) − r = 0. Clearly, the maximal contribution is bounded by 2^h.

Let m_h be the total number of times a sample operation can be performed at height h. Since the sampler at that height takes a minimum of W > 2^{h−1} items and there are a total of n items,

m_h ≤ n/2^{h−1}.

It follows that the error accounted for by samplers of heights up to⁵ H′′ can be expressed as

err_{H′′} = ∑_{h=1}^{H′′} ∑_{i=1}^{m_h} [R(x, h, i) − R(x, h, i − 1)] = ∑_{h=1}^{H′′} ∑_{i=1}^{m_h} 2^h Y_{i,h}

with Y_{i,h} being independent, E[Y_{i,h}] = 0 and |Y_{i,h}| ≤ 1. We compute the sum of weights appearing in Hoeffding's inequality (Lemma 1) in order to apply it.

∑_{h=1}^{H′′} ∑_{i=1}^{m_h} 2^{2h} ≤ ∑_{h=1}^{H′′} n 2^{h+1} ≤ 4n 2^{H′′} = 4n 2^H / 2^{H−H′′} ≤ 16n^2 / (ck 2^{H−H′′})

Hence,

Pr[err_{H′′} > εn] ≤ 2 exp(−cε^2 k 2^{H−H′′}/32).    (4)

Theorem 2. Assume we apply a KLL sketch that uses H levels of compactors, with capacity k_h ≥ kc^{H−h}. Also, assume that an arbitrary subset of the stream is fed into samplers of heights 1 through H′′, while the output of these samplers is fed to appropriate compactors. Then for any H′ > H′′ it holds that

Pr[err_{H′} > 2εn] < 2 exp(−cε^2 k 2^{H−H′′}/32) + 2 exp(−Cε^2 k^2 2^{2(H−H′)}).

Here, err_{H′} denotes the error of the stream outputted by the compactors of level H′, and C = c^2(2c − 1).

⁵ Notice that unlike compactors, there is no hierarchy of samplers. However, due to the fact that the height of the sampler grows with time, the sample operations may be performed on different heights. The only guarantee is that any item appears in at most one sample operation and that the height of the sampler is always at most H′′.

Algorithm 3 Mergeable sampler

procedure initialize()
    h = 1
    W = 0

procedure update(x, w)
    if W + w ≤ 2^h then
        W ← W + w
        with probability w/W set y ← x
        if W = 2^h then W = 0; yield y
    else if W + w > 2^h and W ≤ w then
        with probability w/2^h yield x
    else if W + w > 2^h and W ≥ w then
        W = w; y ← x
        with probability W/2^h yield y

procedure grow()
    h ← h + 1
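Ignoring merging, the sampler's streaming behavior (pick one of every 2^h unit-weight items uniformly at random and emit it with weight 2^h) can be sketched as follows; this simplified class is our own illustration, not the paper's full mergeable sampler:

```python
import random

class Sampler:
    """Streaming-only sampler: from every 2^h unit-weight items,
    keep one uniformly at random and emit it with weight 2^h."""
    def __init__(self, h):
        self.h = h
        self.W = 0        # weight seen in the current block
        self.y = None     # reservoir item for the current block

    def update(self, x):
        self.W += 1
        # Reservoir sampling: the new item replaces y with probability 1/W,
        # so y is uniform over the block seen so far.
        if random.random() < 1 / self.W:
            self.y = x
        if self.W == 2 ** self.h:
            out, self.W, self.y = self.y, 0, None
            return out    # emitted with weight 2^h
        return None

random.seed(3)
s = Sampler(h=3)
emitted = [y for x in range(80) if (y := s.update(x)) is not None]
print(len(emitted))  # 80 / 2^3 = 10 items, each representing weight 8
```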

By taking H′ = H, H′′ = H − O(log(k)), and k = (1/ε)√log(1/δ) we obtain the following corollary.

Corollary 1. There exists a streaming algorithm that computes an ε approximation for the rank of a single item with probability 1 − δ whose space complexity is O((1/ε)√log(1/δ)). This algorithm also produces mergeable summaries.

5 Reducing the failure probability

In this section we take full advantage of Equation 3 (and Theorem 2 for the version with space complexity independent of n) to obtain a streaming algorithm with asymptotically better space complexity. Notice that the lion's share of the contribution to the error is due to the top compactors. For those, however, Hoeffding's bound is not tight. Let s = O(log log(1/δ)) be a small number of top layers. For the bottom H − s layers we use Theorem 2, applied on H′ = H − s and corresponding H′′, to bound their error. For the top s we simply use a deterministic bound.

Theorem 3. There exists a streaming algorithm that computes an ε approximation for the rank of a single item with probability 1 − δ whose space complexity is O((1/ε) log^2 log(1/δ)). This algorithm also produces mergeable summaries.

Proof. Using Theorem 2 we see that the bottom compactors of height at most H′ = H − s and the sampler, when set to be of height at most H′′ = H − 2s − log_2(k), contribute at most εn to the error with probability 1 − δ as long as εk2^s ≥ c′√log(2/δ) for a sufficiently small constant c′. For the top s compactors, we set their capacities to k_h = k. That is, we do not let their capacity drop exponentially. Those levels contribute to the error at most ∑_{h=H′+1}^{H} m_h w_h = ∑_{h=H′+1}^{H} n/k_h = sn/k. Requiring that this contribution is at most εn as well we obtain the relation s ≤ kε. Setting s = O(log log(1/δ)) and k = O((1/ε) log log(1/δ)) satisfies both conditions. The space complexity of this algorithm is dominated by maintaining the top s levels, which is O(ks) = O((1/ε) log^2 log(1/δ)).

Interestingly, the analysis of the top s levels is identical to that of the equal-capacity compactors used in the MRL sketch. In the next section we show that one can replace the top s levels with a different algorithm and reduce the dependence on δ even further.
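For concreteness, the equal-capacity compaction step referenced here can be sketched as follows. This is a minimal illustration in Python; the function name and in-memory list representation are ours, not the paper's.

```python
import random

def compact(buffer):
    """One compaction: sort the buffered items and promote every other
    one, so each survivor represents twice its previous weight.  The
    surviving parity (odd or even positions) is chosen uniformly at
    random, which makes the rank error of each compaction zero-mean."""
    buffer.sort()
    offset = random.randint(0, 1)
    return buffer[offset::2]
```

Each call halves the buffer; stacking s such compactors of equal capacity k for the top levels costs O(ks) space, matching the bound in Theorem 3.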

5.1 Reducing further; losing mergeability

The most space-efficient version of our algorithm, with respect to the failure probability, operates as follows. For δ being the target error probability, we set s = O(log log(1/δ)) as in the section above but set k = O(1/ε). At any point in time we keep two different copies of the GK sketch, tuned for a relative error of ε. They are correspondingly associated with the compactors of heights h1 < h2, which are the two largest heights that are multiples of s. The GK sketch associated with height h receives as input the outputs of the compactor of layer h − 1. For h = 0, the GK sketch associated with it receives as input the stream elements. When a new GK sketch is built due to a new compactor (i.e., the Grow procedure), the bottom one is discarded.

Theorem 4. There exists a streaming algorithm that computes an ε approximation for the rank of a single item with probability 1 − δ whose space complexity is O((1/ε) log log(1/δ)).

Proof. Notice that the height of h1 is at least H − 2s. It follows that the total number of items ever fed into a single GK sketch is at most n1 = k2^{2s}. Applying Theorem 2 again, with H′ = H − s and H′′ = H − 2s − log₂(k), the error w.r.t. the output of the compactor feeding elements into the GK sketch matching h1 is at most O(εn), with k = O(1/ε). Therefore, the sum of errors is still O(εn). The memory required by the GK sketch with respect to its input is at most O((1/ε) log(εn1)) = O((1/ε)s) = O((1/ε) log log(1/δ)), which dominates the memory of our sketch with k = O(1/ε). The claim follows.
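As a quick check of the last equality chain, with n₁ = k · 2^{2s} and k = O(1/ε):

```latex
\log(\varepsilon n_1)
  = \log\bigl(\varepsilon\, k\, 2^{2s}\bigr)
  = \log(O(1)) + 2s
  = O(s)
  = O(\log\log(1/\delta)).
```

Multiplying by the GK factor of O(1/ε) gives the stated O((1/ε) log log(1/δ)) bound.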

We note that the GK sketch is not known to be fully mergeable, and so that property is lost by this construction.

6 Tightness of our result

We mention that in [10] a lower bound is given for deterministic algorithms. A more precise statement of their result, implicitly shown in the proof of their Theorem 2, is as follows.

Lemma 3 (Implicit in [10], Theorem 2). Let AD be a deterministic comparison-based algorithm solving the single quantile ε approximation problem for all streams of length at most C(1/ε)² log²(1/ε) for some sufficiently large universal constant C. Then AD must store at least c(1/ε) log(1/ε) elements from the stream for some sufficiently small constant c.

Below we obtain a lower bound matching our result completely for the case of the single quantile problem, and almost completely for the case of ε approximation of all quantiles.

Theorem 5. Let AR be a randomized comparison-based algorithm solving the ε approximate single quantile problem with probability at least 1 − δ. Then AR must store at least Ω((1/ε) log log(1/δ)) elements from the stream.

Proof. Assume by contradiction that there exists a randomized algorithm AR that succeeds in computing a single quantile approximation up to error ε with probability 1 − δ while storing o((1/ε) log log(1/δ)) elements from the stream. Let n be the length of the stream and δ = 1/(2 · n!). By a union bound, with probability 1/2 the randomized algorithm succeeds simultaneously on all n! possible inputs. Let r denote a sequence of random bits used by AR in one such instance. It is now possible to construct a deterministic algorithm AD(r) which is identical to AR but with r hardcoded into it. Note that AD(r) deterministically succeeds for streams of length n. Let n = C(1/ε)² log²(1/ε) for the same C as in Lemma 3. We obtain that AD(r) succeeds on all streams of length C(1/ε)² log²(1/ε) while storing o((1/ε) log(1/ε)) elements from the stream. This contradicts Lemma 3 above.
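The translation between the two space bounds in the last step deserves a line of arithmetic. With δ = 1/(2 · n!) and n = C(1/ε)² log²(1/ε), Stirling's approximation gives log(2 · n!) = Θ(n log n), and so:

```latex
\log\log(1/\delta)
  = \log\log(2 \cdot n!)
  = \Theta\bigl(\log(n \log n)\bigr)
  = \Theta(\log n)
  = \Theta\bigl(\log(1/\varepsilon)\bigr).
```

Hence storing o((1/ε) log log(1/δ)) elements is the same as storing o((1/ε) log(1/ε)) elements, which is what contradicts Lemma 3.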

This lower bound perfectly matches the single quantile approximation result we achieve. Using a union bound over a set of O(1/ε) quantiles shows that O((1/ε) log log(1/ε)) elements suffice. Nevertheless, it is potentially possible that a more careful analysis could avoid the log log(1/ε) term.

References

[1] Lu Wang, Ge Luo, Ke Yi, and Graham Cormode. Quantiles over data streams: An experimental study. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, SIGMOD '13, pages 737–748, New York, NY, USA, 2013. ACM.

[2] Michael B. Greenwald and Sanjeev Khanna. Quantiles and equidepth histograms over streams. 2016.

[3] Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. SIGMOD Rec., 28(2):251–262, June 1999.

[4] J. I. Munro and M. S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, 12(3):315–323, 1980.

[5] Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, SIGMOD '01, pages 58–66, New York, NY, USA, 2001. ACM.

[6] David Felber and Rafail Ostrovsky. A randomized online quantile summary in O((1/ε) log(1/ε)) words. CoRR, abs/1503.01156, 2015.

[7] Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. In Proceedings of the 31st Symposium on Principles of Database Systems, PODS '12, pages 23–34, New York, NY, USA, 2012. ACM.

[8] Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, and Subhash Suri. Medians and beyond: New aggregation techniques for sensor networks. In Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, SenSys '04, pages 239–249, New York, NY, USA, 2004. ACM.

[9] Andrej Brodnik, Alejandro Lopez-Ortiz, Venkatesh Raman, and Alfredo Viola. Space-Efficient Data Structures, Streams, and Algorithms: Papers in Honor of J. Ian Munro, on the Occasion of His 66th Birthday, volume 8066. Springer, 2013.

[10] Regant Y. S. Hung and Hingfung F. Ting. An Ω((1/ε) log(1/ε)) space lower bound for finding ε-approximate quantiles in a data stream. In Frontiers in Algorithmics, pages 89–100. Springer, 2010.
