Partner Webinar: Recommendation Engines with MongoDB and Hadoop

K Young - CEO, Mortar

Upload: mongodb

Post on 15-Jan-2015

DESCRIPTION

Personalized recommendations drive business, helping people find the products they want, the news they need, and the music they didn't know they would love. Despite the obvious advantages, many companies either don't have recommendations or don't leverage their data to make good ones. Too many recommendation engines are black-box algorithms that are hard to change or don't scale well.

Using the same recommendation techniques as used at StubHub, Viacom, and AP, this technical webinar will show you how to load your data from MongoDB into Hadoop, generate recommendations, and then put those recommendations into MongoDB, ready to serve end-users. This webinar will prepare you to build a custom recommender for your company that is highly scalable, easy to understand, and built on open-source technology.

About the speaker: K Young

K Young is the CEO of Mortar Data. Mortar serves data scientists and engineers with a service that makes creating and operating high-scale data pipelines easy. Mortar contributes to several open source projects including Pig, Luigi, and the Mongo-Hadoop connector. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a Computer Science degree from Rice University.

TRANSCRIPT

Page 1: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

K Young - CEO, Mortar

Recommendation Engines with MongoDB and Hadoop

Page 2: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

Recommendation engines automatically recommend the "right" items for each user.

• Retail
• Music
• Videos
• Dating
• Etc…

WHAT IS IT

Page 3: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

EXAMPLES

Recommendation Engine

LinkedIn: 50% of new connections come from "People You May Know"

Netflix: 75% of content is viewed because of a recommendation

Amazon: 35% of sales are driven by recommendations

Page 4: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

THAT’S ME

K Young

Page 5: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

FOR THIS WEBINAR

Agenda

1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A

Page 6: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

NOW GENERALLY AVAILABLE

• Open source, free
• Very flexible
• Massively scalable
• 100% customizable
• Tested and proven

Page 7: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

Technical implementation of how humans make recommendations.

Using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram

HOW DO THEY WORK?

Page 8: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

USER INTERACTIONS: SIGNALS

Page 9: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

ITEM-ITEM RECOMMENDATIONS

Page 10: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

USER-ITEM RECOMMENDATIONS

Page 11: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

WHERE DO RECOMMENDATIONS APPEAR?

Recommendation Engine

Landing page
Product page
Cart
Push email
Etc.

Page 12: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

Predictions based on macro-trends, e.g. trending on Twitter

Numeric predictions, e.g. price elasticity

WHAT IT ISN’T

Page 13: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

A WARNING

Recommendation Engine

Recommendation engines are famously hard to launch because they touch: engineering, finance, product, executive.

How to succeed:
1) speedy implementation (target 1 week)
2) engine flexibility
3) gradual roll-out
4) visible KPI-impact
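One common way to get a gradual roll-out is to bucket users deterministically, so the same user always lands on the same side as the percentage grows. The sketch below is only an illustration under that assumption; the function name `in_rollout` and the salt value are hypothetical and not part of the webinar material.

```python
import hashlib

def in_rollout(user_id: str, percent: int, salt: str = "recs-v1") -> bool:
    """Return True if this user falls inside the first `percent` of 100 buckets.

    Hashing (salt, user_id) gives a stable bucket per user, so raising
    `percent` only ever adds users; nobody flips back and forth.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # stable value in 0..99
    return bucket < percent

# Example: show recommendations to roughly 10% of users first.
exposed = [u for u in (f"user-{i}" for i in range(1000)) if in_rollout(u, 10)]
```

Because the bucket depends only on the salt and the user id, moving from 10% to 50% keeps the original 10% exposed, which makes before/after KPI comparisons easier to reason about.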

Page 14: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

RAPID OVERVIEW

Hadoop

Platform for distributed data processing.

Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open

Problem:
• Difficult to use for complex problems

Page 15: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

ON HADOOP

Pig

Less code
Compiles to native Hadoop code
Popular (LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)

Page 16: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

BRIEF, EXPRESSIVE
LIKE PROCEDURAL SQL

Pig

(thanks: Twitter Hadoop World presentation)

Page 17: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

FOR SERIOUS

The Same Script, In MapReduce

Page 18: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

MOTIVATIONS

MongoDB + Pig

Data storage and data processing are often separate concerns

Hadoop is built for scalable processing of large datasets

Page 19: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

SIMILAR PHILOSOPHY

MongoDB, Pig

Poly-structured data
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure

Page 20: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

SIMILAR PHILOSOPHY

MongoDB Hadoop Connector

Open source connector for Hadoop (and family) to read from and write to MongoDB.

(Links at end).

Page 21: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Build a recommendation engine

ENOUGH PREAMBLE, NOW IT’S…

Demo Time!

Page 22: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Build a recommendation engine

DEMO AGENDA

1) Intro to Mortar

2) Download recommendation code

3) Hook up the demo implementation (last.fm)

4) Generate recommendations at scale

5) View recommendations

Page 23: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Build a recommendation engine

DEMO

Use Mortar for demo

Free to use

Open, code runs anywhere

Complete tutorial online (link at end)

Page 24: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Mortar

ONLINE TUTORIAL

Page 25: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Mortar

FAST INTRO

Page 26: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Mortar

FAST INTRO

Data science lacks a way to organize, test, deploy, and collaborate with code. So:

• One-button code deployment, powered by GitHub

• Award-winning job monitoring and visualization

• Realtime log collection and error analysis

• Free local development with one-click installation

Page 27: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

> mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415

Sending request to register project: mortar_webinar_20140415... done

Status: Success!

Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.

Page 28: Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Page 29: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEFINITIONS

Recommendation Engine

Users: Someone interacting with your items and generating events that you capture.

Items: The things you are recommending: videos, articles, products, etc.

Signal: A user-item interaction with a weighting that tells us the relative value of the interaction.
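To make the three definitions concrete, here is a minimal sketch of a signal as a data structure. The event names and the weights in `EVENT_WEIGHTS` are hypothetical values chosen for illustration; the slides do not prescribe any particular weighting.

```python
from dataclasses import dataclass

# Hypothetical weights: stronger interactions count for more.
EVENT_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 5.0}

@dataclass(frozen=True)
class Signal:
    user: str      # who interacted
    item: str      # what they interacted with
    weight: float  # relative value of the interaction

def to_signal(user: str, item: str, event: str) -> Signal:
    """Turn one captured event into a weighted user-item signal."""
    return Signal(user=user, item=item, weight=EVENT_WEIGHTS[event])

signals = [
    to_signal("u1", "sku-42", "view"),
    to_signal("u1", "sku-42", "purchase"),
]
```

The only part that matters downstream is the (user, item, weight) triple; how the weight is derived is a product decision that can be tuned later.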

Page 30: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Recommendation Engine

USER INTERACTIONS: SIGNALS

Page 31: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

STEPS

Recommendation Engine

Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations

Not covered today:
• Serve your recommendations
• Track KPI-impact

Page 32: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO

Recommendation Engine

17.5MM documents of 360K users’ top played artists. Provided by Last.fm at http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html

Used a Pig job to load a MongoLab database with the data.

Page 33: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

> db.lastfm_plays.find()

{ "user" : "faf…a60", "num_plays" : 67, "artist_name" : "beastie boys" }

{ "user" : "faf0…a60", "num_plays" : 66, "artist_name" : "the beatles" }

{ "user" : "faf0…a60", "num_plays" : 65, "artist_name" : "the smashing pumpkins" }

Page 34: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO: LOAD THE DATA

Recommendation Engine

First step: Load our listening data.

Page 35: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

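-- $CONN (the MongoDB connection URI) is not defined in this script; it is
-- presumably supplied at run time, e.g. from the params file passed with -f.
%default DB 'mongo_webinar'
%default PLAYS_COLLECTION 'lastfm_plays'

-- Load each play-count document using an explicit schema.
raw_input =
    load '$CONN/$DB.$PLAYS_COLLECTION'
    using com.mongodb.hadoop.pig.MongoLoader(
        'user:chararray, artist_name:chararray, num_plays:int'
    );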

Pig code

Page 36: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO: GENERATE SIGNALS

Recommendation Engine

Now that we have our data loaded we need to extract: user, item, signal.

Page 37: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

-- Project each play document to (user, item, weight).
user_signals = foreach raw_input generate
    user,
    artist_name as item,
    num_plays as weight:int;

Pig code

Page 38: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO: CALL MORTAR

Recommendation Engine

Now that the data is in the correct format, we’ll call the Mortar algorithms for generating item-item and user-item recommendations.

Page 39: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

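-- Item-item recommendations are computed from the signals alone.
item_item_recs = recsys__GetItemItemRecommendations(user_signals);

-- User-item recommendations take the signals plus the item-item results.
user_item_recs = recsys__GetUserItemRecommendations(user_signals, item_item_recs);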

Pig code
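The slides do not show what the two `recsys__` macros do internally, so the following is not Mortar's algorithm. It is a small stand-in, assuming a plain weighted co-occurrence approach, meant only to illustrate how item-item links can be derived from (user, item, weight) signals and then reused for user-item recommendations. The function names and the user ids `u1` to `u3` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def _signals_by_user(signals):
    """Group (user, item, weight) triples into {user: {item: total_weight}}."""
    by_user = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0) + weight
    return by_user

def item_item_recs(signals, top_n=2):
    """Link two items whenever one user interacted with both.

    Each shared user adds min(weight_a, weight_b) to the link score.
    Returns {item_A: [(item_B, score), ...]} sorted by descending score.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for items in _signals_by_user(signals).values():
        for a, b in combinations(sorted(items), 2):
            link = min(items[a], items[b])
            scores[a][b] += link
            scores[b][a] += link
    return {
        a: sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for a, neighbors in scores.items()
    }

def user_item_recs(signals, ii_recs, top_n=2):
    """Score unseen items by the user's own weights times the item-item links."""
    recs = {}
    for user, items in _signals_by_user(signals).items():
        candidates = defaultdict(float)
        for item, weight in items.items():
            for other, score in ii_recs.get(item, []):
                if other not in items:  # never recommend something already seen
                    candidates[other] += weight * score
        recs[user] = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    return recs

signals = [
    ("u1", "beastie boys", 67), ("u1", "the beatles", 66),
    ("u2", "the beatles", 10), ("u2", "the smashing pumpkins", 8),
    ("u3", "beastie boys", 5), ("u3", "the smashing pumpkins", 4),
]
ii = item_item_recs(signals)
ui = user_item_recs(signals, ii)
```

In this toy data, `u1` has played the first two artists, so the only candidate left for that user is the third. A production engine would also need to normalize for item popularity and user activity, which this sketch deliberately leaves out.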

Page 40: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO: STORE OUR RESULTS

Recommendation Engine

Now that we have our results let’s store them back to MongoDB for use by our application.

Page 41: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'

-- Write each relation to its own MongoDB collection.
store item_item_recs into '$CONN/$DB.$II_COLLECTION'
    using com.mongodb.hadoop.pig.MongoInsertStorage('','');

store user_item_recs into '$CONN/$DB.$UI_COLLECTION'
    using com.mongodb.hadoop.pig.MongoInsertStorage('','');

Pig code

Page 42: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

DEMO: RUN IT!

Recommendation Engine

Now we’re going to use Mortar to start and manage a Hadoop cluster to run our recommender.

Page 43: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

> mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10

Taking code snapshot... done

Sending code snapshot to Mortar... done

Requesting job execution... done

job_id: 534462bea22f3803fd9cacca

Job status can be viewed on the web at:

https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca

Page 44: Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Page 45: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

> db.item_item_recs.find()

{ "item_A":"yo-yo ma", "rank":1, "item_B":"natalie clein" }

{ "item_A":"miley cyrus", "rank":1, "item_B":"miley cyrus and billy ray cyrus" }

{ "item_A":"dimmu borgir", "rank":1, "item_B":"ad inferna" }

Page 46: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

EVALUATING YOUR RESULTS

Your Recommendation Engine

At first, use your domain knowledge to determine whether recommendations are sensible.

Mortar provides a recommendation browser.

Page 47: Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Page 48: Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Page 49: Partner Webinar: Recommendation Engines with MongoDB and Hadoop
Page 50: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

EVALUATING YOUR RESULTS

Your Recommendation Engine

Optionally get detailed recommendations.

Page 51: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

item_item_recs = recsys__GetItemItemRecommendationsDetailed(user_signals);

Pig code

Page 52: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

EVALUATING YOUR RESULTS

Your Recommendation Engine

Later, run A/B tests with your recommendations to see how they improve the metrics you care about.

Usually not multivariate.

Usually no training set is possible.
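When an A/B test compares a control group with a group that sees recommendations, the headline numbers are the two conversion rates and the relative lift. The sketch below assumes a simple two-proportion z-test as the significance check; that choice, the function name `ab_summary`, and the sample counts are illustrative assumptions and do not come from the webinar.

```python
import math

def ab_summary(control_conv, control_n, treat_conv, treat_n):
    """Compare conversion rates of a control group and a treatment group.

    Returns the two rates, the relative lift of treatment over control,
    and a two-sided p-value from a pooled two-proportion z-test.
    """
    p_control = control_conv / control_n
    p_treat = treat_conv / treat_n
    lift = (p_treat - p_control) / p_control

    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (p_treat - p_control) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    return {
        "control_rate": p_control,
        "treatment_rate": p_treat,
        "lift": lift,
        "z": z,
        "p_value": p_value,
    }

# Hypothetical counts: 1,000 users per arm.
result = ab_summary(control_conv=100, control_n=1000, treat_conv=120, treat_n=1000)
```

Whatever metric is chosen, it helps to fix it before the test starts so the comparison stays tied to the KPI the business already tracks.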

Page 53: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

CUSTOMIZING

Your Recommendation Engine

To make customization easier, Mortar has help documentation and code covering more than a dozen common cases:

• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
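Two of the cases listed above can be pictured as plain filters, one applied to the signals before recommendations are generated and one applied to the results afterwards. This is not Mortar's code; the activity threshold, the function names, and the `in_stock` set are assumptions made for the sketch.

```python
from collections import Counter

def remove_bots(signals, max_signals_per_user=1000):
    """Drop every signal from users with implausibly many interactions."""
    per_user = Counter(user for user, _item, _weight in signals)
    return [s for s in signals if per_user[s[0]] <= max_signals_per_user]

def remove_out_of_stock(item_item_recs, in_stock):
    """Keep only recommendations whose recommended item (item_B) is in stock."""
    return [rec for rec in item_item_recs if rec["item_B"] in in_stock]

# One hyperactive "bot" account and one ordinary user.
signals = [("bot", f"item-{i}", 1) for i in range(5)] + [("u1", "item-0", 3)]
clean = remove_bots(signals, max_signals_per_user=3)

recs = [
    {"item_A": "yo-yo ma", "rank": 1, "item_B": "natalie clein"},
    {"item_A": "dimmu borgir", "rank": 1, "item_B": "ad inferna"},
]
available = remove_out_of_stock(recs, in_stock={"natalie clein"})
```

A fixed threshold is the crudest possible bot heuristic; the point is only that the filter sits on the signal data, so the recommendation step itself does not need to change.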

Page 54: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

PRODUCTION QUESTIONS

Your Recommendation Engine

How do you read your MongoDB?

1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into BSON
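For the options that connect to live nodes, the choice between primary and secondary reads is usually expressed through the `readPreference` option of a MongoDB connection string. The sketch below only builds such a URI; the host names and replica set name are hypothetical, and whether a given connector version honors the option is worth verifying before relying on it.

```python
from urllib.parse import urlencode

# Read preference modes defined for MongoDB connection strings.
READ_PREFERENCES = {
    "primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest",
}

def mongo_uri(hosts, database, read_preference="secondary", replica_set=None):
    """Build a mongodb:// URI that states where reads should be routed."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = {"readPreference": read_preference}
    if replica_set:
        options["replicaSet"] = replica_set
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"

# Hypothetical hosts; 'mongo_webinar' is the database name used in the demo.
uri = mongo_uri(
    ["db1.example.com:27017", "db2.example.com:27017"],
    "mongo_webinar",
    replica_set="rs0",
)
```

Reading from secondaries keeps a heavy batch job off the node that serves application writes, at the cost of possibly reading slightly stale data.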

Page 55: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

PRODUCTION QUESTIONS

Your Recommendation Engine

How do you release new recommendations while serving the old ones?

API
Flip between live and offline database
Also enables rollback
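The flip can be pictured as an API that holds two databases and a pointer to whichever one is live. The sketch below uses in-memory dictionaries in place of real databases, and the class and method names are hypothetical; it is meant only to show why the same mechanism also gives you rollback.

```python
class RecommendationStore:
    """Serve from the live database while new results load into the offline one."""

    def __init__(self):
        self._databases = {"a": {}, "b": {}}  # stand-ins for two real databases
        self._live = "a"

    @property
    def _offline(self):
        return "b" if self._live == "a" else "a"

    def load_offline(self, recs):
        """Replace the offline database's contents; readers are unaffected."""
        self._databases[self._offline] = dict(recs)

    def flip(self):
        """Swap live and offline. Calling it again rolls back."""
        self._live = self._offline

    def get(self, user):
        return self._databases[self._live].get(user, [])

store = RecommendationStore()
store.load_offline({"u1": ["the beatles"]})
store.flip()                                   # first release goes live
store.load_offline({"u1": ["the smashing pumpkins"]})
before = store.get("u1")                       # still the old recommendations
store.flip()                                   # release the new ones
after = store.get("u1")
store.flip()                                   # roll back
rolled_back = store.get("u1")
```

Because the previous release stays intact in the now-offline database, rolling back is just a second flip rather than a reload.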

Page 56: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

WE DISCUSSED

Summary

What a recommendation engine is
How Hadoop works with MongoDB
Set up a demo recommendation engine
How to connect your data
Touched on advanced techniques
Steered away from potholes
Resources for next steps

Page 57: Partner Webinar: Recommendation Engines with MongoDB and Hadoop

help.mortardata.com/recommenders

answers.mortardata.com

@kky
@mortardata