A Formal Analysis of Conservative Update Based Approximate
Counting
Gil Einziger and Roy Friedman
Technion, Haifa
Approximate Counting
We wish to count the number of occurrences of various items from a very large domain.
To gain space efficiency, we are willing to tolerate only an “approximate count”.
Bloom Filters
• An array BF of m bits and k hash functions {h1,…,hk} with range [0,…,m-1]
• Adding an object obj to the Bloom filter is done by computing h1(obj),…, hk(obj) and setting the corresponding bits in BF
• Checking for set membership for an object cand is done by computing h1(cand),…, hk(cand) and verifying that all corresponding bits are set
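Example (figure): m=11, k=3. Adding o1 with h1(o1)=0, h2(o1)=7, h3(o1)=5 sets three bits of BF. Checking o2 with h1(o2)=0, h2(o2)=7, h3(o2)=4 finds set bits (√) at 0 and 7 but an unset bit (×) at 4, so o2 is not reported as a member.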
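The add and membership-check operations described above can be sketched as follows. This is a minimal illustration, not the authors' code: the class name `BloomFilter`, the helper `make_hashes`, and the use of salted SHA-256 as the hash family are assumptions made for the example. The slide's hash values for o1 and o2 are hard-coded as lookup tables so the example can be followed by hand.

```python
import hashlib

def make_hashes(k, m):
    """Return k hash functions with range [0, m-1] (salted SHA-256, an illustrative choice)."""
    def make(i):
        def h(obj):
            digest = hashlib.sha256(f"{i}:{obj}".encode()).digest()
            return int.from_bytes(digest[:8], "big") % m
        return h
    return [make(i) for i in range(k)]

class BloomFilter:
    def __init__(self, m, hashes):
        self.bits = [0] * m      # the array BF of m bits
        self.hashes = hashes     # the k hash functions

    def add(self, obj):
        # Set every bit selected by h1(obj), ..., hk(obj).
        for h in self.hashes:
            self.bits[h(obj)] = 1

    def contains(self, cand):
        # cand is reported as a member only if all of its bits are set.
        return all(self.bits[h(cand)] for h in self.hashes)

# The slide's example (m=11, k=3), with hash values given as lookup tables.
h1 = {"o1": 0, "o2": 0}.__getitem__
h2 = {"o1": 7, "o2": 7}.__getitem__
h3 = {"o1": 5, "o2": 4}.__getitem__
bf = BloomFilter(11, [h1, h2, h3])
bf.add("o1")
print(bf.contains("o1"))  # True
print(bf.contains("o2"))  # False: bit 4 was never set
```

For real data, `BloomFilter(m, make_hashes(k, m))` replaces the hard-coded tables.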
Counting Bloom Filters
• A vector of counters (instead of bits)
• A counting Bloom filter supports the operations:
– Increment
• Increment by 1 all entries that correspond to the results of the k hash functions
– Decrement
• Decrement by 1 all entries that correspond to the results of the k hash functions
– Estimate (instead of get)
• Return the minimal value of all corresponding entries
Example (figure): m=11, k=3, h1(o1)=0, h2(o1)=7, h3(o1)=5. The CBF counters of o1 go from 3, 6, and 8 to 4, 7, and 9 after an Increment, so Estimate(o1)=4.
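The three operations can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the class name `CountingBloomFilter` is made up for the example, colliding hash results are treated as a single entry, and the hash functions are constants that return the slide's values for o1. The slide confirms only that one of o1's counters starts at 3; which of the other two positions holds 6 and which holds 8 is assumed here.

```python
class CountingBloomFilter:
    def __init__(self, m, hashes):
        self.counters = [0] * m   # a vector of counters instead of bits
        self.hashes = hashes

    def _entries(self, obj):
        # A set, so two hash functions that collide touch the entry once (a design choice).
        return {h(obj) for h in self.hashes}

    def increment(self, obj):
        for j in self._entries(obj):
            self.counters[j] += 1

    def decrement(self, obj):
        for j in self._entries(obj):
            self.counters[j] -= 1

    def estimate(self, obj):
        # The minimal value over all corresponding entries.
        return min(self.counters[j] for j in self._entries(obj))

# Slide example: h1(o1)=0, h2(o1)=7, h3(o1)=5 (constant functions, valid for o1 only).
hashes = [lambda o: 0, lambda o: 7, lambda o: 5]
cbf = CountingBloomFilter(11, hashes)
cbf.counters[0], cbf.counters[5], cbf.counters[7] = 3, 6, 8  # assumed placement
cbf.increment("o1")
print(cbf.estimate("o1"))  # 4
```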
Conservative Update Technique
• Give up the ability to Decrement in favor of accuracy/space efficiency
– During an Increment operation, only update the lowest counters
Example (figure): m=11, k=3, h1(o1)=0, h2(o1)=7, h3(o1)=5, SBF-MI with o1's counters at 3, 6, and 8. Increment(o1) only adds to the first entry (3->4).
Empirically shown to improve accuracy, by up to two orders of magnitude for some workloads.
– But not formally understood.
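The conservative update rule (an Increment touches only the lowest of the item's counters) can be sketched as below. The class name `ConservativeUpdateSketch` and the constant hash functions are assumptions for illustration; the counter placement follows the slide only in that the first entry holds 3.

```python
class ConservativeUpdateSketch:
    def __init__(self, m, hashes):
        self.counters = [0] * m
        self.hashes = hashes

    def increment(self, obj):
        entries = {h(obj) for h in self.hashes}
        lowest = min(self.counters[j] for j in entries)
        # Only counters equal to the current minimum are incremented.
        for j in entries:
            if self.counters[j] == lowest:
                self.counters[j] += 1

    def estimate(self, obj):
        return min(self.counters[h(obj)] for h in self.hashes)

# Slide example: h1(o1)=0, h2(o1)=7, h3(o1)=5 (constant functions, valid for o1 only).
hashes = [lambda o: 0, lambda o: 7, lambda o: 5]
cu = ConservativeUpdateSketch(11, hashes)
cu.counters[0], cu.counters[5], cu.counters[7] = 3, 6, 8  # placement of 6 and 8 is assumed
cu.increment("o1")
print(cu.counters)        # only entry 0 changed: 3 -> 4
print(cu.estimate("o1"))  # 4
```

There is no decrement method: once only some counters are updated, the sketch no longer knows which counters a given increment touched.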
Motivation
• Applications:
– Network measurements and heavy hitters.
– Network security: anomaly detection.
– Cache admission policy
Additional applications in other fields: e.g. databases and natural language processing.
TinyLFU - Cache Admission Policy (PDP 2014)
• The access distribution of most content is skewed
▫ Often modeled using Zipf-like functions, power-law, etc.
[Figure: frequency versus rank. A small number of very popular items (for example, ~50% of the weight), followed by a long heavy tail (for example, ~50% of the weight).]
Eviction and Admission Policies
[Figure: a new item arrives at the cache. The eviction policy selects the cache victim ("One of you guys should leave …"). The admission policy picks the winner ("Is the new item any better than the victim?").]
What is the common answer?
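The interplay between the two policies can be sketched as follows. This is a minimal sketch under assumptions, not the TinyLFU implementation: the class name `AdmissionCache` is hypothetical, LRU is picked as the eviction policy only for concreteness, and an exact `Counter` stands in for the approximate counting structure.

```python
from collections import Counter, OrderedDict

class AdmissionCache:
    """LRU eviction guarded by a frequency based admission policy (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # keys ordered from least to most recently used
        self.freq = Counter()       # exact counts; a stand-in for an approximate counter

    def access(self, key):
        """Record an access and return True on a cache hit."""
        self.freq[key] += 1
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) < self.capacity:
            self.items[key] = None
            return False
        victim = next(iter(self.items))          # eviction policy: LRU names the victim
        if self.freq[key] > self.freq[victim]:   # admission policy: is the new item better?
            del self.items[victim]
            self.items[key] = None
        return False

cache = AdmissionCache(2)
for key in ["a", "a", "b", "b", "c"]:
    cache.access(key)
print(list(cache.items))  # ['a', 'b']: "c" was seen once and is not admitted
```

A rarely seen item therefore cannot push out a frequently used one; it is admitted only after its count exceeds the victim's.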
Conservative Update - Intuition
• Conservative Update allows counting just the head items, with high accuracy, so our cache can make educated admission decisions.
[Figure: desired items versus undesired items.]
Admission Policy Example
[Figure: hit rate versus cache size, without an admission policy and with a frequency based admission policy. Annotations: more memory; better cache management.]
The Basic Observation
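A CBF is exactly like [figure: "CBF =" shown next to "LCS =", drawn as stacked levels of entries; the individual cell values are not recoverable from the transcript].
If we can quantify how many items are inserted to each level in the LCS, we can bound the error.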
Simple Observations
• It is useful to discuss the number of items that are inserted to each level of the LCS.
• Since all levels are considered the same, the false positive probability of each level is determined only by the number of items inserted to that level.
• A false positive at a higher level implies false positive at all lower levels.
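One way to make the level view concrete is sketched below. It rests on an assumption that the transcript does not state explicitly: level i of the LCS is read as a bit vector whose entry j is set when counter j holds at least i. The function names `levels` and `estimate_from_levels` are made up for the example.

```python
def levels(counters):
    """Split a counter vector into levels; level i marks counters with value >= i (assumed reading)."""
    top = max(counters, default=0)
    return [[1 if c >= i else 0 for c in counters] for i in range(1, top + 1)]

def estimate_from_levels(lv, entries):
    """Count consecutive levels, from the lowest up, in which all of the item's bits are set."""
    est = 0
    for level in lv:
        if all(level[j] for j in entries):
            est += 1
        else:
            break
    return est

counters = [4, 0, 0, 0, 0, 6, 0, 8, 0, 0, 0]
lv = levels(counters)
entries = [0, 7, 5]
print(estimate_from_levels(lv, entries))   # equals the minimum of the item's counters
print(min(counters[j] for j in entries))
```

Under this reading each level behaves like a plain Bloom filter, and a bit set at a higher level is also set at every lower level, which matches the last observation above.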
The Model
• Known (constant) distribution
• Large enough sample
– We assume that we can make a ‘characteristic’ histogram.
Formally, we know how many items are going to appear each number of times.
Lower Bound
Denote A[i]: the number of items that are actually inserted to level i.
• By definition: $A[i] \geq D[i]$
• A min/max argument about the lowest level that could have experienced a false positive yields the following:
$P(F_{\text{Estimated}} - F_{\text{Real}} \geq k) \leq FP(D[k])$
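To get a feel for the right-hand side of the bound, the sketch below evaluates FP for a few level sizes. It assumes that FP(n) is the standard Bloom filter false positive approximation for n inserted items; the transcript does not spell this out. The number of hash functions is named `h` to avoid clashing with the error threshold k in the bound, and the values in `D` are invented purely for illustration.

```python
import math

def bloom_fp(n, m, h):
    """Standard approximation of the false positive probability of a Bloom filter
    with m bits and h hash functions after n insertions."""
    return (1.0 - math.exp(-h * n / m)) ** h

# Hypothetical level sizes: D[i] items at level i (illustrative numbers only).
D = {1: 1000, 2: 400, 3: 150, 4: 50}
m, h = 4096, 3

for k in sorted(D):
    # FP(D[k]) is the right-hand side of the bound for error threshold k.
    print(k, bloom_fp(D[k], m, h))
```

Because the level sizes shrink as the level grows, FP(D[k]) falls quickly with k in this example.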
Upper Bound
• Derived similarly, by upper bounding A[i].
• Requires a few further assumptions.
Technical details in the paper.