Partner Webinar: Recommendation Engines with MongoDB and Hadoop

K Young - CEO, Mortar

Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

Recommendation engines automatically recommend the "right" items for each user.

• Retail
• Music
• Videos
• Dating
• Etc.

WHAT IS IT

EXAMPLES

LinkedIn: 50% of new connections come from "People You May Know"

Netflix: 75% of content is viewed because of a recommendation

Amazon: 35% of sales are driven by recommendations

THAT’S ME

K Young

FOR THIS WEBINAR

Agenda

1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A

Recommendation Engine

NOW GENERALLY AVAILABLE

• Open source, free
• Very flexible
• Massively scalable
• 100% customizable
• Tested and proven

Technical implementation of how humans make recommendations.

Using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram

HOW DO THEY WORK?

Recommendation Engine

USER INTERACTIONS: SIGNALS

Recommendation Engine

ITEM-ITEM RECOMMENDATIONS

Recommendation Engine

USER-ITEM RECOMMENDATIONS

WHERE DO RECOMMENDATIONS APPEAR?

• Landing page
• Product page
• Cart
• Push email
• Etc.

Predictions based on macro-trends, e.g. trending on Twitter

Numeric predictions, e.g. price elasticity

WHAT IT ISN’T

A WARNING

Recommendation engines are famously hard to launch because they touch many teams: engineering, finance, product, and executive.

How to succeed:
1) speedy implementation (target 1 week)
2) engine flexibility
3) gradual roll-out
4) visible KPI impact

RAPID OVERVIEW

Hadoop

Platform for distributed data processing.

Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open

Problem:
• Difficult to use for complex problems

Pig

ON HADOOP

• Less code
• Compiles to native Hadoop code
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)

BRIEF, EXPRESSIVE

LIKE PROCEDURAL SQL

(thanks: Twitter Hadoop World presentation)

FOR SERIOUS

The Same Script, In MapReduce

MOTIVATIONS

MongoDB + Pig

Data storage and data processing are often separate concerns

Hadoop is built for scalable processing of large datasets

SIMILAR PHILOSOPHY

MongoDB, Pig

Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure

SIMILAR PHILOSOPHY

MongoDB Hadoop Connector

Open source connector for Hadoop (and family) to read from and write to MongoDB.

(Links at end).

Build a recommendation engine

ENOUGH PREAMBLE, NOW IT’S…

Demo Time!

Build a recommendation engine

DEMO AGENDA

1) Intro to Mortar

2) Download recommendation code

3) Hook up the demo implementation (last.fm)

4) Generate recommendations at scale

5) View recommendations

Build a recommendation engine

DEMO

Use Mortar for demo

Free to use

Open, code runs anywhere

Complete tutorial online (link at end)

Mortar

ONLINE TUTORIAL

Mortar

FAST INTRO

Data science lacks a way to organize, test, deploy, and collaborate with code. So:

• One-button code deployment, powered by GitHub

• Award-winning job monitoring and visualization

• Realtime log collection and error analysis

• Free local development with one-click installation

> mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415

Sending request to register project: mortar_webinar_20140415... done

Status: Success!

Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.

DEFINITIONS

Users: People interacting with your items and generating events that you capture.
Items: The things you are recommending: videos, articles, products, etc.
Signal: A user-item interaction with a weighting that tells us the relative value of the interaction.

Recommendation Engine

USER INTERACTIONS: SIGNALS

Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations

Not covered today:
• Serve your recommendations
• Track KPI impact

17.5MM documents of 360K users’ top played artists. Provided by Last.fm at http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html

Used a Pig job to load a MongoLab database with the data.

> db.lastfm_plays.find()
{ "user" : "faf…a60", "num_plays" : 67, "artist_name" : "beastie boys" }
{ "user" : "faf0…a60", "num_plays" : 66, "artist_name" : "the beatles" }
{ "user" : "faf0…a60", "num_plays" : 65, "artist_name" : "the smashing pumpkins" }

DEMO: LOAD THE DATA

First step: Load our listening data.

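%default DB 'mongo_webinar'
%default PLAYS_COLLECTION 'lastfm_plays'

-- $CONN (the MongoDB connection string) is not set here; it is presumably
-- supplied at run time, e.g. by the params file passed to mortar run.
raw_input =
    load '$CONN/$DB.$PLAYS_COLLECTION'
    using com.mongodb.hadoop.pig.MongoLoader('
        user:chararray,
        artist_name:chararray,
        num_plays:int
    ');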

Pig code

DEMO: GENERATE SIGNALS

Now that we have our data loaded we need to extract: user, item, signal.

user_signals = foreach raw_input generate
    user,
    artist_name as item,
    num_plays as weight:int;

Pig code
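The same user, item, signal extraction can be sketched outside Pig. The Python below is only an illustration of the mapping, assuming documents shaped like the lastfm_plays sample; the Signal type, the to_signals helper, and the user IDs are hypothetical names made up for this example.

```python
from typing import NamedTuple


class Signal(NamedTuple):
    """One user-item interaction and its relative weight."""
    user: str
    item: str
    weight: int


def to_signals(plays):
    """Map raw play documents to (user, item, weight) signals."""
    return [Signal(p["user"], p["artist_name"], p["num_plays"]) for p in plays]


# Sample documents shaped like the lastfm_plays collection (IDs are made up).
plays = [
    {"user": "u1", "artist_name": "beastie boys", "num_plays": 67},
    {"user": "u1", "artist_name": "the beatles", "num_plays": 66},
    {"user": "u2", "artist_name": "the beatles", "num_plays": 12},
]
signals = to_signals(plays)
```

Here the play count is used directly as the weight, as in the Pig script; other signal types (purchases, ratings) would need their own weighting.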

DEMO: CALL MORTAR

Now that the data is in the correct format, we’ll call the Mortar algorithms for generating item-item and user-item recommendations.

item_item_recs =
    recsys__GetItemItemRecommendations(user_signals);

user_item_recs =
    recsys__GetUserItemRecommendations(user_signals, item_item_recs);

Pig code
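The slides do not show what the recsys__ macros do internally, so the following Python is not Mortar's algorithm. It is a minimal sketch of one common approach, assuming a plain co-occurrence score: two items are similar when the same users interact with both, and a user is recommended the neighbors of items they already have. The function names and sample signals are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def item_item_recs(signals, top_n=2):
    """For each item, rank the items that most often share users with it.

    For every user who has both items, the score grows by the smaller
    of the two signal weights.
    """
    by_user = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0) + weight

    scores = defaultdict(lambda: defaultdict(int))
    for items in by_user.values():
        for a, b in combinations(sorted(items), 2):
            shared = min(items[a], items[b])
            scores[a][b] += shared
            scores[b][a] += shared

    recs = {}
    for item, neighbors in scores.items():
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[item] = [other for other, _ in ranked[:top_n]]
    return recs


def user_item_recs(signals, item_recs, top_n=2):
    """Recommend to each user the neighbors of their items that they lack."""
    seen = defaultdict(set)
    for user, item, _ in signals:
        seen[user].add(item)

    recs = {}
    for user, items in seen.items():
        candidates = []
        for item in sorted(items):
            for other in item_recs.get(item, []):
                if other not in items and other not in candidates:
                    candidates.append(other)
        recs[user] = candidates[:top_n]
    return recs


signals = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "b", 2), ("u2", "c", 1),
    ("u3", "b", 6), ("u3", "c", 6),
]
ii = item_item_recs(signals)
ui = user_item_recs(signals, ii)
```

As in the Pig script, the user-item step takes the item-item output as an input, which is why item-item recommendations are generated first.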

DEMO: STORE OUR RESULTS

Now that we have our results let’s store them back to MongoDB for use by our application.

%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'

store item_item_recs into
    '$CONN/$DB.$II_COLLECTION' using
    com.mongodb.hadoop.pig.MongoInsertStorage('','');

store user_item_recs into
    '$CONN/$DB.$UI_COLLECTION' using
    com.mongodb.hadoop.pig.MongoInsertStorage('','');

Pig code

DEMO: RUN IT!

Now we’re going to use Mortar to start and manage a Hadoop cluster to run our recommender.

> mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10

Taking code snapshot... done

Sending code snapshot to Mortar... done

Requesting job execution... done

job_id: 534462bea22f3803fd9cacca

Job status can be viewed on the web at:

https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca

> db.item_item_recs.find()
{ "item_A":"yo-yo ma", "rank":1, "item_B":"natalie clein" }
{ "item_A":"miley cyrus", "rank":1, "item_B":"miley cyrus and billy ray cyrus" }
{ "item_A":"dimmu borgir", "rank":1, "item_B":"ad inferna" }

EVALUATING YOUR RESULTS

Your Recommendation Engine

At first, use your domain knowledge to determine whether recommendations are sensible.

Mortar provides a recommendation browser.

Optionally get detailed recommendations.

item_item_recs =
    recsys__GetItemItemRecommendationsDetailed(user_signals);

Pig code

Later, run A/B tests with your recommendations to see how they improve the metrics you care about.

Usually not multivariate.

Usually no training set is possible.
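An A/B test needs a stable way to decide which users see recommendations and a way to compare a metric between the groups. The Python below is a hedged sketch of both pieces, not something shown in the webinar: the experiment name, the 50% split, and the choice of conversion rate as the metric are all assumptions for illustration.

```python
import hashlib


def variant_for(user_id, experiment="recs_v1", treatment_share=0.5):
    """Deterministically assign a user to 'control' or 'treatment'.

    Hashing the experiment name together with the user ID keeps the
    assignment stable across requests without storing any state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def conversion_rates(events):
    """Turn (variant, converted) pairs into a conversion rate per variant."""
    totals, wins = {}, {}
    for variant, converted in events:
        totals[variant] = totals.get(variant, 0) + 1
        wins[variant] = wins.get(variant, 0) + (1 if converted else 0)
    return {variant: wins[variant] / totals[variant] for variant in totals}
```

Raising treatment_share over time is one way to get the gradual roll-out recommended earlier: users already in the treatment group stay there as the share grows.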

CUSTOMIZING

To make customization easier Mortar has help documentation and code covering more than a dozen common cases:

• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
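As an illustration of the first case, bot traffic can be filtered out of the signals before recommendations are generated. Mortar's documented approach is not shown in these slides; the Python sketch below assumes a simple rule (drop every user with an implausibly large number of events), and both the function name and the threshold are hypothetical.

```python
from collections import Counter


def remove_suspected_bots(signals, max_events=1000):
    """Drop all signals from users with more than max_events signals."""
    counts = Counter(user for user, _, _ in signals)
    return [s for s in signals if counts[s[0]] <= max_events]


# Tiny example: "crawler" produces far more events than a real listener.
signals = [("crawler", "a", 1)] * 5 + [("u1", "a", 3)]
clean = remove_suspected_bots(signals, max_events=3)
```

A sensible threshold depends on the domain, so it is worth checking the distribution of events per user before picking one.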

PRODUCTION QUESTIONS

How do you read your MongoDB?

1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into BSON

PRODUCTION QUESTIONS

How do you release new recommendations while serving the old ones?

• API
• Flip between live and offline database
• Also enables rollback
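The flip can be sketched with two stores and a pointer to the one currently being served. The Python below keeps everything in memory so that it is runnable on its own; it assumes an API layer that consults the pointer on each read, and the class and collection names are hypothetical. In production the two dictionaries would be two MongoDB databases or collections.

```python
class RecStore:
    """Two collections plus a pointer to the one currently served."""

    def __init__(self):
        self.collections = {"recs_a": {}, "recs_b": {}}
        self.live = "recs_a"

    @property
    def offline(self):
        """Name of the collection that is not being served."""
        return "recs_b" if self.live == "recs_a" else "recs_a"

    def load_offline(self, recs):
        """Replace the offline collection; readers are unaffected."""
        self.collections[self.offline] = dict(recs)

    def flip(self):
        """Promote the offline collection. Calling it again rolls back."""
        self.live = self.offline

    def get(self, item):
        """Serve recommendations for an item from the live collection."""
        return self.collections[self.live].get(item, [])


store = RecStore()
store.load_offline({"yo-yo ma": ["natalie clein"]})
before_flip = store.get("yo-yo ma")   # old (empty) recommendations still served
store.flip()
after_flip = store.get("yo-yo ma")    # new recommendations now live
store.flip()
after_rollback = store.get("yo-yo ma")
```

Because the previous collection is left untouched until the next load, rolling back is just a second flip.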

WE DISCUSSED

Summary

• What a recommendation engine is
• How Hadoop works with MongoDB
• Set up a demo recommendation engine
• How to connect your data
• Touched on advanced techniques
• Steered away from potholes
• Resources for next steps

help.mortardata.com/recommenders

answers.mortardata.com

@kky
@mortardata
