How to Tell Which Algorithms Really Matter
Ted Dunning, MapR Technologies
© 2014 MapR Technologies 2
Which Algorithms are Important? (and how can you know?)
Ted Dunning, Chief Application Architect, MapR Technologies
00:01 · 1.65 TB with 298 servers
00:02 · 129K recommendations
Advertising Automation
(diagram: Sellers Cloud, Buyers Cloud)
00:03 · 63M ad auctions
00:04 · 422.2K genetic sequences
Largest Biometric Database
00:05 · 4.73M authentications
But How is This Done?
What really matters?
Topic For Today
• What is important? What is not?
• Why?
• What is the difference from academic research?
• Some examples
What is Important?
• Deployable – Clever prototypes don’t count if they can’t be standardized
• Robust – Mishandling is common
• Transparent – Will degradation be obvious?
• Skillset and mindset matched? – How long will your fancy data scientist enjoy doing standard ops tasks?
• Proportionate – Where is the highest value per minute of effort?
Academic Goals vs Pragmatics
• Academic goals
– Reproducible
– Isolate theoretically important aspects
– Work on novel problems
• Pragmatics
– Highest net value
– Available data is constantly changing
– Diligence and consistency have larger impact than cleverness
– Many systems feed themselves; exploration and exploitation are both important
– Engineering constraints on budget and schedule
Example 1: Making Recommendations Better
Recommendation Advances
• What are the most important algorithmic advances in recommendations over the last 10 years?
• Cooccurrence analysis?
• Matrix completion via factorization?
• Latent factor log-linear models?
• Temporal dynamics?
The Winner – None of the Above
1. Result dithering (random noise)
2. Anti-flood (don’t repeat yourself)
The Real Issues
• Exploration
• Diversity
• Speed
• Not the last fraction of a percent
Result Dithering
• Dithering is used to re-order recommendation results – Re-ordering is done randomly
• Dithering is guaranteed to make off-line performance worse
• Dithering also has a near perfect record of making actual performance much better
“Made more difference than any other change”
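A minimal sketch of the dithering idea. The slides only say the re-ordering is random; the particular scheme below (synthetic score = log of original rank plus Gaussian noise of scale ε) is an assumption, chosen to match the ε values on the following slides:

```python
import math
import random

def dither(ranked_items, epsilon=0.69, rng=random):
    """Randomly perturb a ranked result list.

    Each item gets a synthetic score log(rank) + Gaussian noise with
    standard deviation epsilon, then the list is re-sorted by that
    score.  Small epsilon barely moves the top results; larger epsilon
    mixes deeper results onto the first page.
    """
    scored = [(math.log(rank + 1.0) + rng.gauss(0.0, epsilon), item)
              for rank, item in enumerate(ranked_items)]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]
```

With ε = 0 the original order is preserved exactly; with ε = log 2 ≈ 0.69, an item can plausibly swap with one roughly a factor of two away in rank, which is what lets second-page results get explored.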
Example … ε = 0.5
Example … ε = log 2 = 0.69
Exploring The Second Page
Lesson 1: Exploration is good
Example 2: Bayesian Bandits
Bayesian Bandits
• Based on Thompson sampling
• Very general sequential test
• Near-optimal regret
• Trades off exploration and exploitation
• Possibly best known solution for exploration/exploitation
• Incredibly simple
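"Incredibly simple" is not an exaggeration. The slides show no code, so the sketch below is an assumed Beta-Bernoulli formulation (uniform Beta(1, 1) prior) for click/no-click style rewards:

```python
import random

def thompson_pick(successes, failures, rng=random):
    """Pick an arm by Thompson sampling with a Beta-Bernoulli model.

    Sample a plausible conversion rate from each arm's Beta posterior
    and play the arm whose sample is largest.  Uncertain arms get
    explored; good arms get exploited, automatically.
    """
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def update(successes, failures, arm, reward):
    """Record one Bernoulli reward (0 or 1) for the chosen arm."""
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Each round: pick an arm, show that variant, observe the reward, update the counts. No schedule, no tuning parameters, and exploration fades out on its own as the posteriors sharpen.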
Fast Convergence
Thompson Sampling on Ads
An Empirical Evaluation of Thompson Sampling - Chapelle and Li, 2011
Bayesian Bandits versus Result Dithering
• Many useful systems are difficult to frame in fully Bayesian form
• Thompson sampling cannot be applied without posterior sampling
• Can still do useful exploration with dithering
• But better to use Thompson sampling if possible
Lesson 2: Exploration is easy to do and pays big benefits.
Example 3: On-line Clustering
The Problem
• K-means clustering is useful for feature extraction or compression
• At scale and at high dimension, the desirable number of clusters increases
• Very large number of clusters may require more passes through the data
• Super-linear scaling is generally infeasible
The Solution
• Sketch-based algorithms produce a sketch of the data
• Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution
• The size of the sketch grows very slowly with increasing data size
• Many operations such as clustering are well behaved on sketches
Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.
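A toy one-pass sketch of the dp-means idea behind the streaming step. This is a simplification, not the full Shindler et al. algorithm: the facility cost `f` is fixed here, whereas the real algorithm grows it adaptively as centroids accumulate and then clusters the sketch itself:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def dp_means_sketch(points, f, rng=random):
    """One pass over the data, returning [centroid, weight] pairs.

    Each point either merges into its nearest centroid or, with
    probability min(d^2 / f, 1), opens a new one.  Far-away points
    almost always open a centroid, so well-separated clusters each
    end up represented in the sketch.
    """
    centroids = []  # each entry: [coords list, weight]
    for p in points:
        if centroids:
            best = min(range(len(centroids)),
                       key=lambda i: dist2(p, centroids[i][0]))
            d2 = dist2(p, centroids[best][0])
        if not centroids or rng.random() < min(d2 / f, 1.0):
            centroids.append([list(p), 1])
        else:
            c, w = centroids[best]
            for j in range(len(c)):  # weighted running mean
                c[j] = (c[j] * w + p[j]) / (w + 1)
            centroids[best][1] = w + 1
    return centroids
```

The sketch is tiny compared to the data, and because the centroids are weighted, an ordinary (ball) k-means run over the sketch recovers clusters of the original distribution.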
An Example
Streaming k-means Ideas
• By using a sketch with lots (k log N) of centroids, we avoid pathological cases
• We still get a very good result if the sketch is created – in one pass– with approximate search
• In fact, adaptive dp-means works just fine
• In the end, the sketch can be used for clustering or …
Lesson 3: Sketches make big data small.
Example 4: Search Abuse
Recommendations
Alice got an apple and a puppy
Charles got a bicycle
Bob got an apple
What else would Bob like?
Log Files
(figure of raw per-user log entries for Alice, Bob, and Charles omitted)
History Matrix: Users by Items
Alice:   ✔ ✔ ✔
Bob:     ✔ ✔
Charles: ✔ ✔
(rows are users, columns are items; ✔ marks which items each user got)
Co-occurrence Matrix: Items by Items
(matrix of co-occurrence counts omitted; each entry counts how many user histories contain both items)
How do you tell which co-occurrences are useful?
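The deck doesn't spell out the test on this slide, but the standard answer in this line of work is Dunning's log-likelihood ratio (G²): score each item pair by how surprising its co-occurrence count is given a 2×2 contingency table of user counts. A sketch of that computation:

```python
import math

def entropy_term(counts):
    """Sum of p*log(p) over nonzero counts (negative Shannon entropy)."""
    total = sum(counts)
    return sum((k / total) * math.log(k / total) for k in counts if k > 0)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11 = users with both items, k12 = item A without B,
    k21 = item B without A, k22 = neither.  Large values mean the
    co-occurrence is anomalous, i.e. a useful indicator.
    """
    table = [k11, k12, k21, k22]
    n = sum(table)
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    return 2.0 * n * (entropy_term(table)
                      - entropy_term(rows)
                      - entropy_term(cols))
```

Co-occurrences whose LLR clears a threshold are kept; the surviving entries form the indicator matrix described next. Counts consistent with independence score near zero, so merely popular items don't flood the indicators.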
Indicator Matrix: Anomalous Co-Occurrence
Result: The marked row will be added to the indicator field in the item document…
Indicator Matrix
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine.
Note: data for the indicator field is added directly to the metadata for a document in the Solr index. You don’t need to create a separate index for the indicators.
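Deployment then reduces to an ordinary search query against that field: take the item ids from the user's recent history and query the indicator field with them. A hypothetical request fragment (the field name `indicators` follows the example document above; the item ids `t3` and `t7` are made up for illustration) might look like:

```
q=indicators:(t1 t3 t7)&fl=id,title,score
```

Solr's relevance scoring does the rest: items whose indicators overlap most strongly with the user's history rank highest.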
Internals of the Recommender Engine
Real-life example
Lesson 4: Recursive search abuse pays.
Search can implement recommendations, which can implement search.
How Does This Apply?
How Can I Start?
Q & A
@ted_dunning @mapr maprtech