Strata New York 2012
DESCRIPTION
This set of slides describes several on-line learning algorithms which, taken together, can provide significant benefit to real-time applications.

TRANSCRIPT
© MapR Technologies - Confidential
Online Learning: Bayesian bandits and more
whoami – Ted Dunning
Ted Dunning
[email protected]
@apache.org
@ted_dunning
We’re hiring at MapR
For slides and other info http://www.slideshare.net/tdunning
Online
Scalable
Incremental
Scalability and Learning
What does scalable mean?
What are inherent characteristics of scalable learning?
What are the logical implications?
Scalable ≈ On-line
If you squint just right
unit of work ≈ unit of time
[Diagram: an infinite data stream feeding a learning algorithm that maintains state.]
Pick One
Now pick again
A Quick Diversion
You see a coin
– What is the probability of heads?
– Could it be larger or smaller than that?
I flip the coin and, while it is in the air, ask again
I catch the coin and ask again
I look at the coin (and you don't) and ask again
Why does the answer change?
– And did it ever have a single value?
Which One to Play?
One may be better than the other
The better coin pays off at some rate
Playing the other will pay off at a lesser rate
– Playing the lesser coin has "opportunity cost"
But how do we know which is which?
– Explore versus Exploit!
A First Conclusion
Probability as expressed by humans is subjective and depends on information and experience
A Second Conclusion
A single number is a bad way to express uncertain knowledge
A distribution of values might be better
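The "5 and 5" and "2 and 10" slides that follow appear to show Beta distributions. A minimal sketch of the idea, assuming those numbers are Beta counts built from observed heads and tails (the helper names here are illustrative, not from the talk):

```python
import random

# Represent belief about a coin's probability of heads as a
# Beta(alpha, beta) distribution instead of a single number.
# Assumption: the "5 and 5" / "2 and 10" slides show Beta(5, 5)
# and Beta(2, 10) beliefs.

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

def update(alpha, beta, heads):
    """Bayesian update after one flip: bump the matching count."""
    return (alpha + 1, beta) if heads else (alpha, beta + 1)

# Start from ignorance: Beta(1, 1) is uniform on [0, 1].
a, b = 1, 1
for flip in [True, True, False, True]:
    a, b = update(a, b, flip)

print(a, b, beta_mean(a, b))     # current counts and mean estimate
print(random.betavariate(a, b))  # one plausible value for the bias
```

Each flip narrows the distribution; the single-number answer never existed, only a belief that sharpens with evidence.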
I Dunno
5 and 5
2 and 10
The Cynic Among Us
Demo
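The demo itself is not captured in the transcript. A hedged sketch of a Bayesian-bandit demo in this spirit, using Thompson sampling over Beta beliefs (all names are illustrative, not the talk's actual code):

```python
import random

def thompson_choice(beliefs):
    """Pick the bandit whose sampled win rate is highest.

    beliefs is a list of (wins, losses) counts, one per bandit; each
    induces a Beta(wins + 1, losses + 1) belief about that bandit's
    payoff probability.  Sampling from each belief and taking the max
    balances explore and exploit automatically."""
    samples = [random.betavariate(w + 1, l + 1) for w, l in beliefs]
    return samples.index(max(samples))

def play(true_rates, rounds=1000, rng=random.random):
    """Simulate Thompson sampling against bandits with given payoff rates."""
    beliefs = [(0, 0) for _ in true_rates]
    for _ in range(rounds):
        i = thompson_choice(beliefs)
        w, l = beliefs[i]
        beliefs[i] = (w + 1, l) if rng() < true_rates[i] else (w, l + 1)
    return beliefs

beliefs = play([0.12, 0.09])
print(beliefs)  # the better bandit typically accumulates far more plays
```

Early on the beliefs are wide, so both arms get sampled; as evidence accumulates, the worse arm's samples rarely win and play concentrates on the better arm.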
An Example
An Example
The Cluster Proximity Features
Every point can be described by the nearest cluster
– 4.3 bits per point in this case
– Significant error that can be decreased (to a point) by increasing the number of clusters
Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
– Error is negligible
– Unwinds the data into a simple representation
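The proximity encoding above can be sketched as follows (a simplified illustration, not the talk's implementation; the helper names are invented):

```python
import math

# Describe each point by the ids of its two nearest cluster centroids
# plus the distances to them, instead of its raw coordinates.

def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def proximity_features(point, centroids):
    """Return (nearest_id, second_id, d1, d2) for one point."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: distance(point, centroids[i]))
    i, j = ranked[0], ranked[1]
    return i, j, distance(point, centroids[i]), distance(point, centroids[j])

centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(proximity_features((1.0, 1.0), centroids))
```

With enough clusters, the two ids plus two proximities pin a point down almost exactly, which is the "unwound" representation the slide refers to.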
Diagonalized Cluster Proximity
Lots of Clusters Are Fine
Surrogate Method
Start with sloppy clustering into κ = k log n clusters
Use these clusters as a weighted surrogate for the data
Cluster surrogate data using ball k-means
Results are provably high quality for highly clusterable data
Sloppy clustering can be done on-line
Surrogate can be kept in memory
Ball k-means pass can be done at any time
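A toy one-dimensional sketch of the sloppy on-line pass (illustrative only; real streaming k-means implementations grow the distance threshold adaptively to bound the number of centroids, which is omitted here):

```python
import random

def sloppy_cluster(points, threshold):
    """One streaming pass: each point either joins its nearest centroid
    (updating a weighted running mean) or, if too far from every
    centroid, starts a new one.  The weighted centroids are the
    in-memory surrogate that ball k-means can later refine."""
    centroids = []   # list of [coordinate, weight]
    for p in points:
        if centroids:
            c = min(centroids, key=lambda c: abs(c[0] - p))
            if abs(c[0] - p) < threshold:
                c[1] += 1
                c[0] += (p - c[0]) / c[1]   # running mean update
                continue
        centroids.append([p, 1])
    return centroids

random.seed(42)
data = ([random.gauss(0, 0.1) for _ in range(100)] +
        [random.gauss(5, 0.1) for _ in range(100)])
surrogate = sloppy_cluster(data, threshold=1.0)
print(len(surrogate))  # a handful of weighted centroids stand in for 200 points
```

The surrogate is tiny compared with the data, so the later ball k-means pass is cheap and can be rerun whenever fresh results are wanted.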
Algorithm Costs
O(k d log n) per point for Lloyd's algorithm … not so good for k = 2000, n = 10⁸
Surrogate methods … O(d log κ) = O(d (log k + log log n)) per point
This is a big deal:
– k d log n = 2000 × 10 × 26 ≈ 500,000
– log k + log log n ≈ 11 + 5 = 16
– 30,000 times faster makes the grade as a bona fide big deal
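The slide's back-of-envelope numbers can be checked directly (base-2 logs; the slide's figures are rounded):

```python
import math

# Cost comparison for k = 2000 clusters, d = 10 dimensions, n = 10**8 points.
k, d, n = 2000, 10, 10**8

lloyd = k * d * math.log2(n)                    # k d log n, a bit over 500,000
small = math.log2(k) + math.log2(math.log2(n))  # log k + log log n, about 16

print(round(lloyd), round(small), round(lloyd / small))
```

The ratio comes out on the order of the slide's 30,000× headline.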
30,000 times faster sounds good
but that isn’t the big news
these algorithms do on-line clustering
Parallel Speedup?
✓
What about deployment?
[Diagram: the infinite data stream again feeding a learning algorithm that maintains state.]
[Diagram: a single mapper holds the learning state for its data split.]
[Diagram: several mappers, each with its own data split, all trying to update one learning state.]
Need shared memory!
whoami – Ted Dunning
We’re hiring at MapR
Ted Dunning
[email protected]
@apache.org
@ted_dunning
For slides and other info
http://www.slideshare.net/tdunning