recommendation engines & accumulo - sqrrl data science group may 21, 2013

Recommendation Engines &

Accumulo - Sqrrl

Data Science Group

May 21, 2013

Agenda

6:15 - 6:30 More data trumps better algorithms by Michael

Walker

6:30 - 7:30 Recommendation Engines by Tom Rampley

7:30 - 8:30 Accumulo - Sqrrl by John Dougherty

8:30 - 9:30 Network at Old Chicago at 14th and Market.

Data Science Group New Sponsors

Cloudera

O'Reilly Media

More data is better

Even if less exact or messier

One sensor = strict accuracy

Multiple sensors = less accurate & messy

More data points = greater value

Aggregate = more comprehensive picture

Increase frequency of sensor readings

One measure per min = accurate

100 readings per second = less accurate

> volume vs. exactitude

Accept messiness to get scale

Sacrifice accuracy in return for knowing general trend

Big data = probabilistic (not precise)

Sacrifice accuracy in return for knowing general trend

Big data = probabilistic (not precise)

Good yet has problems

Internet of Things

"Data Science" means the scientific study of the creation, manipulation and transformation of data to create meaning.

Internet of Things

"Data Scientist" means a professional who uses scientific methods to liberate and create meaning from raw data.

Internet of Things

"Big Data" means large data sets that have different properties from small data sets and requires special data science methods to differentiate signal from noise to extract meaning and requires special compute systems and power.

Data Science

"Signal" means a meaningful interpretation of data based on science that may be transformed into scientific evidence and knowledge.

Data Science

"Noise" means a competing interpretation of data not grounded in science that may not be considered scientific evidence. Yet noise may be manipulated into a form of knowledge (what does not work).

Machine Learning

Field of study that gives computers the ability to learn without being explicitly programmed.

Algorithms

Process or set of rules to be followed in calculations or other problem-solving operations to achieve a goal, especially a mathematical rule or procedure used to compute a desired result, produce the answer to a question or the solution to a problem in a finite number of steps.

More data trumps better algorithms

Microsoft Word Grammar Checker

Improve algorithms

New techniques

Improve algorithms

New techniques

New features

Feed more data into existing methods

Most ML-A one million words or less

Experiment: 10 mil - 100 mill - 1 billion

Results: algorithms improved dramatically

Simple algorithm that was worst performer with 1/2 mill words performed better than all others with 1 bill words

Algorithm worked best with 1/2 mill performed worst with 1 bill words

Conclusions:

More trumps less

More trumps smarter (not always)

Tradeoff between spending time and money on algorithm development versus spending it on data development

Google language translation

1 billion words

1 trillion words

larger yet messier data set - entire internet

Tom Rampley

Recommendation Engines: an Introduction

A Brief History of Recommendation Engines

What Does a Recommender Do?

Recommendation engines use algorithms of varying complexity to suggest items based upon historical information

• Item ratings or content• Past user behavior/purchase history

Recommenders typically use some form of collaborative filtering

Collaborative Filtering

The name:•‘Collaborative’ because the algorithm takes the choices of many users into account to make a recommendation•Rely on user taste similarity•‘Filtering’ because you use the preferences of other users to filter out the items most likely to be of interest to the current user

Collaborative Filtering

algorithms include:

•K nearest neighbors•Cosine similarity•Pearson correlation•Bayesian belief nets•Markov decision processes•Latent semantic indexing methods•Association Rules Learning

Cosine Similarity ExampleLets walk through an example of a simple collaborative

filtering algorithm, namely cosine similarityCosine similarity can be used to find similar items, or

similar individuals. In this case, we’ll be trying to identify individuals with similar taste

Imagine individual ratings on a set of items to be a [user,item] matrix. You can then treat the ratings of each individual as an N-dimensional vector of ratings on items: {r1, r2…rN}

The similarity of vectors (individuals’ ratings) can be computed by computing the cosine of the angle between them:

The closer the cosine is to 1, the more alike the two individuals’ ratings are

Cosine Similarity Example ContinuedLets say we have the following matrix of users

and ratings of TV shows:

And we encounter a new user, James, who has only seen and rated 5 of these 7 shows:

Of the two remaining shows, which one should we recommend to James?

True Blood CSI JAG Star Trek Castle The Wire

Twin Peaks

Bob 5 2 1 4 3 2 5Mary 4 4 2 1 3 1 2Jim 1 1 5 2 5 2 3

George 3 4 3 5 5 4 3Jennifer 5 2 4 2 4 1 0Natalie 0 5 0 4 4 1 4Robin 5 5 0 0 4 2 2

True Blood CSI JAG Star Trek CastleJames 5 5 3 1 0

Cosine Similarity Example ContinuedTo find out, we’ll see who James is most similar to

among the folks who have rated all the shows by calculating the cosine similarity between the vectors of the 5 shows that each individual have in common:

It seems that Mary is the closest to James in terms of show ratings among the group. Of the two remaining shows, The Wire and Twin Peaks, Mary slightly preferred Twin Peaks so that is what we recommend to James

Cosine Similarity James

Bob 0.73

Mary 0.89

Jim 0.47

George 0.69

Jennifer 0.78

Natalie 0.50

Robin 0.79

Collaborative Filtering Continued

This simple cosine similarity example could be extended to extremely large datasets with hundreds or thousands of dimensions

You can also compute item to item similarity by treating the item as the vectors for which you’re computing similarity, and the users as the dimensions•Allows for recommending similar items to a user after they’ve made a purchase•Amazon uses a variant of this algorithm•This is an example of item-to-item collaborative filtering

Adding ROI to the Equation: an Example with Naïve Bayes

When recommending products, some may generate more margin for the firm than others

Some algorithms can take cost into account when making recommendations

Naïve Bayes is a commonly used classifier that allows for the inclusion of marginal value of a product sale in the recommendation decision

Naïve Bayes

Bayes theorem tells us the probability of

our beliefs being true given prior beliefs

and evidence

Naïve Bayes is a classifier that utilizes Bayes’

theorem (with simplifying assumptions) to generate a probability of an instance

belonging to a class

Class likelihood can be combined with expected payoff to generate the optimal payoff from a

recommendation

Naïve Bayes ContinuedHow does the NB algorithm generate class

probabilities, and how can we use the algorithmic output to maximize expected payoff?

Let’s say we want to figure out which of two products to recommend to a customerEach product generates a different amount of

profit for our firm per unit soldWe know the target customer’s past purchasing

behavior, and we know the past purchasing behavior of twelve other customers who have bought one of the two potential recommendation products

Let’s represent our knowledge as a series of matrices and vectors

Naïve Bayes Continued

Past Customer Purchasing Behavior

Toys Games Candy Books BoatJohn Squirt Gun Chess Skittles Harry Potter SpeedboatMary Doll Life M&Ms Emma SpeedboatPete Kite Chess M&Ms Twilight SailboatKevin Squirt Gun Life Snickers Emma SailboatDale Doll Life Skittles Twilight SpeedboatJane Kite Monopoly Skittles Twilight SpeedboatRaquelle Squirt Gun Monopoly Skittles Harry Potter SailboatJoanne Kite Chess Snickers Twilight SpeedboatSusan Squirt Gun Chess Skittles Twilight SailboatTim Doll Life M&Ms Harry Potter SailboatLarry Kite Chess M&Ms Twilight SpeedboatRegina Doll Monopoly Snickers Harry Potter SailboatEric Squirt Gun Life Snickers Harry Potter ?

Naïve Bayes ContinuedNB uses (independent) probabilities of events

to generate class probabilitiesUsing Bayes’ theorem (and ignoring the

scaling constant) the probability of a customer with past purchase history α (a vector of past purchases) buying item θ is:

P (α1, …, αi | θj ) P (θj )Where P (θj) is the frequency with which the

item appears in the training data, and P (α1,

…, αi | θj ) is Π P (αi | θj ) for all i items in the training dataThat P (α1, …, αi | θj ) P (θj ) = Π P (αi | θj ) P (θj) is

dependent up on the assumption of conditional independence between past purchases

Naïve Bayes ContinuedIn our example, we can calculate the

following probabilities:Sailboat Speedboat

P(θ) 6/12 6/12

Sailboat SpeedboatSquirt Gun 3/6 1/6Kite 1/6 3/6Doll 2/6 2/6Life 2/6 2/6Monopoly 2/6 1/6Chess 2/6 3/6Skittles 2/6 3/6M&Ms 2/6 2/6Snickers 2/6 1/6Harry Potter 3/6 1/6Twilight 2/6 4/6Emma 1/6 1/6

Now that we can calculate P (α1, …, αi | θj ) P (θj ) for all instances, let’s figure out the most likely boat purchase for Eric:

These probabilities may seem very low, but recall that we left out the scaling constant in Bayes theorem since we’re only interested in the relative probabilities of the two outcomes

P(θ) Toys Games Candy Books Boat

Eric Squirt Gun Life Snickers Harry Potter ?

Sailboat 6/12 3/12 2/12 2/12 3/12 0.00086806

Speedboat 6/12 1/12 2/12 1/12 1/12 0.00004823

So it seems like the sailboat is a slam dunk to

recommend. It’s much more likely (18 times!) for

Eric to buy than the speedboat.

But let’s consider a scenario: let’s say our hypothetical firm

generates $20 of profit whenever a customer buys a speedboat, but only $1 when they buy a sailboat (outboard motors are apparently very high margin)

In that case, it would make more sense to recommend the

speedboat, because our expected payoff from the speedboat

recommendation would be 11% greater ($20/$1 * .0000048/.00087) than our expected payout from the

sailboat recommendation

This logic can be applied to any number of products, by

multiplying the set of purchase probabilities by the

set of purchase payoffs, taking the maximum value as the

recommendation engines & accumulo - sqrrl data science group may 21, 2013

data points

transformation of data

raw data

data science noise

special data science

large data sets

small data sets

internet of things big

Documents

accumulo fotovoltaico

accumulo design

fusion iomemory pcie solutions from sandisk ® and sqrll...

completing the big data ecosystem: sqrrl and...

sqrrl real time_big_data_20130411

sqrrl june webinar: an accumulo love story

sqrrl february webinar: breaking down data silos

accumulo summit 2015: real-time distributed and reactive...

accumulo summit 2015: accumulo in-depth: building bulk...

accumulo summit 2014: accismus -- percolating with accumulo

sqrrl and accumulo -...

accumulo summit 2015: event-driven big data with accumulo -...

sqrrl€¦ · secure.’’scale.’’adapt.’ 3...

sqrrl march webinar: how to build a big app

aws sqrrl · sqrrl reveals connections between alerts,...

accumulo summit 2015: verifiable responses to accumulo...

sqrrl november webinar: encryption and security in accumulo

scaldaqua elettrici ad accumulo i accumulo smaltato

sqrrl may webinar: data-centric security

accumulo summit 2015: zookeeper, accumulo, and you...