Partner Webinar: Recommendation Engines with MongoDB and Hadoop

Posted on 15-Jan-2015


DESCRIPTION

Personalized recommendations drive business, helping people find the products they want, the news they need, and the music they didn't know they would love. Despite the obvious advantages, many companies either don't offer recommendations or don't leverage their data to make good ones. Too many recommendation engines are black-box algorithms that are hard to change or don't scale well. Using the same recommendation techniques used at StubHub, Viacom, and AP, this technical webinar will show you how to load your data from MongoDB into Hadoop, generate recommendations, and then write those recommendations back into MongoDB, ready to serve to end users. This webinar will prepare you to build a custom recommender for your company that is highly scalable, easy to understand, and built on open-source technology.

About the speaker: K Young is the CEO of Mortar Data. Mortar serves data scientists and engineers with a service that makes creating and operating high-scale data pipelines easy. Mortar contributes to several open-source projects, including Pig, Luigi, and the Mongo-Hadoop connector. Prior to founding Mortar Data, K built software that reaches one in ten public school students in the U.S. He holds a Computer Science degree from Rice University.

TRANSCRIPT

K Young - CEO, Mortar

Recommendation Engines with MongoDB and Hadoop

WHAT IS IT

Recommendation Engine

Recommendation engines automatically recommend the "right" items for each user.

• Retail
• Music
• Videos
• Dating
• Etc…

EXAMPLES

Recommendation Engine

LinkedIn: 50% of new connections come from "People You May Know"

Netflix: 75% of content is viewed because of a recommendation

Amazon: 35% of sales are driven by recommendations

THAT’S ME

K Young

FOR THIS WEBINAR

Agenda

1. Recommendation Engines
2. Hadoop
3. Demo: Build a Recommendation Engine
4. Your Recommendation Engine
5. Q&A

NOW GENERALLY AVAILABLE

Recommendation Engine

• Open source, free
• Very flexible
• Massively scalable
• 100% customizable
• Tested and proven

HOW DO THEY WORK?

Recommendation Engine

Technical implementation of how humans make recommendations.

Using:
• past behavior
• similar users
• content metadata
• outside signals, e.g. Instagram

USER INTERACTIONS: SIGNALS

Recommendation Engine

ITEM-ITEM RECOMMENDATIONS

Recommendation Engine

USER-ITEM RECOMMENDATIONS

Recommendation Engine

WHERE DO RECOMMENDATIONS APPEAR?

Recommendation Engine

• Landing page
• Product page
• Cart
• Push email
• Etc.

WHAT IT ISN’T

Recommendation Engine

Predictions based on macro-trends, e.g. trending on Twitter

Numeric predictions, e.g. price elasticity

A WARNING

Recommendation Engine

Recommendation engines are famously hard to launch because they touch engineering, finance, product, and executive teams.

How to succeed:
1) Speedy implementation (target: 1 week)
2) Engine flexibility
3) Gradual roll-out
4) Visible KPI impact
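A gradual roll-out needs a stable way to decide which users see the new recommendations. One common approach, which the slide does not prescribe, is to hash each user ID into one of 100 buckets and raise the exposed percentage over time. This Python sketch assumes string user IDs; the salt value and function names are hypothetical.

```python
import hashlib

# Hypothetical salt; changing it reshuffles which users land in which bucket.
ROLLOUT_SALT = "recs-rollout-v1"


def rollout_bucket(user_id: str, salt: str = ROLLOUT_SALT) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def sees_recommendations(user_id: str, rollout_percent: int) -> bool:
    """True if the user's bucket falls inside the current roll-out percentage.

    Buckets are stable, so raising rollout_percent only adds users;
    anyone who already saw recommendations keeps seeing them.
    """
    return rollout_bucket(user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    for percent in (1, 10, 50, 100):
        exposed = sum(sees_recommendations(u, percent) for u in users)
        print(f"{percent:>3}% roll-out: {exposed} of {len(users)} users")
```

Because the bucket depends only on the salt and the user ID, moving from 10% to 20% keeps the original 10% exposed, which keeps KPI comparisons between exposed and unexposed users clean.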

RAPID OVERVIEW

Hadoop

Platform for distributed data processing.

Strengths:
• Can scale up to thousands of computers
• Widely used
• Very broadly applicable
• Free, open

Problem:
• Difficult to use for complex problems
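To make the programming model concrete, here is a single-process Python simulation of Hadoop's map, shuffle, and reduce phases, totaling plays per artist over a few made-up rows shaped like the Last.fm data used later in the demo. It only illustrates the model: real Hadoop runs each phase in parallel across many machines, and the function names are our own.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, int]  # (user, artist_name, num_plays)


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit one (artist, plays) pair per input record."""
    for _user, artist, plays in records:
        yield artist, plays


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def total_plays_per_artist(records: Iterable[Record]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = [
        ("u1", "the beatles", 66),
        ("u2", "the beatles", 10),
        ("u1", "beastie boys", 67),
    ]
    print(total_plays_per_artist(sample))
```

Running it prints `{'beastie boys': 67, 'the beatles': 76}`. Writing even this simple aggregation as explicit mapper and reducer code is the kind of overhead Pig is meant to remove.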

ON HADOOP

Pig

• Less code
• Compiles to native Hadoop code
• Popular (LinkedIn, Twitter, Salesforce, Yahoo, Spotify...)

BRIEF, EXPRESSIVE, LIKE PROCEDURAL SQL

Pig

(Thanks: Twitter's Hadoop World presentation)

FOR SERIOUS

The Same Script, in MapReduce

MOTIVATIONS

MongoDB + Pig

Data storage and data processing are often separate concerns

Hadoop is built for scalable processing of large datasets

SIMILAR PHILOSOPHY

MongoDB, Pig

Poly-structured data:
• MongoDB: stores data, regardless of structure
• Pig: reads data, regardless of structure


MongoDB Hadoop Connector

Open-source connector that lets Hadoop (and related tools such as Pig) read from and write to MongoDB.

(Links at end).

Build a recommendation engine

ENOUGH PREAMBLE, NOW IT’S…

Demo Time!

DEMO AGENDA

Build a recommendation engine

1) Intro to Mortar

2) Download recommendation code

3) Hook up the demo implementation (last.fm)

4) Generate recommendations at scale

5) View recommendations

DEMO

Build a recommendation engine

Use Mortar for demo

Free to use

Open, code runs anywhere

Complete tutorial online (link at end)

ONLINE TUTORIAL

Mortar

FAST INTRO

Mortar


Data science lacks a good way to organize, test, deploy, and collaborate on code. So Mortar offers:

• One-button code deployment, powered by Github

• Award-winning job monitoring and visualization

• Realtime log collection and error analysis

• Free local development with one-click installation

> mortar projects:fork git@github.com:mortardata/mortar-recsys.git mortar_webinar_20140415

Sending request to register project: mortar_webinar_20140415... done

Status: Success!

Your project is ready for use. Type 'mortar help' to see the commands you can perform on the project.

DEFINITIONS

Recommendation Engine

Users: Someone who interacts with your items and generates events that you capture.

Items: The things you are recommending: videos, articles, products, etc.

Signal: A user-item interaction with a weight that tells us the relative value of the interaction.

USER INTERACTIONS: SIGNALS

Recommendation Engine

STEPS

Recommendation Engine

Steps in a recommendation engine:
• Load your data
• Generate your signals
• Call code to generate recommendations
• Store your recommendations

Not covered today:
• Serve your recommendations
• Track KPI impact

DEMO

Recommendation Engine

17.5 million documents describing the top-played artists of 360K users, from the Last.fm dataset available at http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-360K.html

We used a Pig job to load the data into a MongoLab database.

> db.lastfm_plays.find()

{ "user" : "faf…a60", "num_plays" : 67, "artist_name" : "beastie boys" }
{ "user" : "faf0…a60", "num_plays" : 66, "artist_name" : "the beatles" }
{ "user" : "faf0…a60", "num_plays" : 65, "artist_name" : "the smashing pumpkins" }

DEMO: LOAD THE DATA

Recommendation Engine

First step: Load our listening data.

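%default DB 'mongo_webinar'
%default PLAYS_COLLECTION 'lastfm_plays'

-- $CONN (the MongoDB connection URI) is not defined in this excerpt;
-- it is assumed to come from the params file passed to mortar run with -f.
raw_input =
    load '$CONN/$DB.$PLAYS_COLLECTION'
    using com.mongodb.hadoop.pig.MongoLoader('user:chararray, artist_name:chararray, num_plays:int');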

Pig code
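MongoLoader reads documents that need not share a structure and projects each one onto the schema declared in the script. The Python sketch below imitates that schema-on-read idea on plain dictionaries. It is not the connector's code, and the real MongoLoader's handling of missing or mismatched fields may differ from the None-and-cast behavior assumed here.

```python
from typing import Any, Dict, List, Optional, Tuple

# Mirrors the schema string passed to MongoLoader in the Pig script above.
SCHEMA: List[Tuple[str, type]] = [
    ("user", str),
    ("artist_name", str),
    ("num_plays", int),
]


def project(doc: Dict[str, Any], schema: List[Tuple[str, type]] = SCHEMA) -> Tuple[Optional[Any], ...]:
    """Project one MongoDB-style document onto a fixed schema.

    Fields not named in the schema (such as _id) are ignored, missing fields
    become None, and values are cast to the declared type; a value that
    cannot be cast also becomes None.
    """
    row: List[Optional[Any]] = []
    for field, field_type in schema:
        value = doc.get(field)
        if value is None:
            row.append(None)
            continue
        try:
            row.append(field_type(value))
        except (TypeError, ValueError):
            row.append(None)
    return tuple(row)


if __name__ == "__main__":
    docs = [
        {"_id": 1, "user": "u1", "artist_name": "the beatles", "num_plays": 66},
        {"_id": 2, "user": "u2", "artist_name": "beastie boys"},  # no num_plays
        {"_id": 3, "user": "u3", "num_plays": "12", "country": "US"},  # extra field, string count
    ]
    for d in docs:
        print(project(d))
```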

DEMO: GENERATE SIGNALS

Recommendation Engine

Now that our data is loaded, we need to extract the user, the item, and the signal weight.

user_signals = foreach raw_input generate
    user,
    artist_name as item,
    num_plays as weight:int;

Pig code
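In plain Python, the same renaming step might look like this. The `Signal` dataclass and `to_signals` function are hypothetical names, and the null filtering is an extra assumption that the Pig snippet does not perform.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[Optional[str], Optional[str], Optional[int]]  # (user, artist_name, num_plays)


@dataclass(frozen=True)
class Signal:
    user: str    # who interacted
    item: str    # what they interacted with (here: an artist)
    weight: int  # relative value of the interaction (here: play count)


def to_signals(rows: Iterable[Row]) -> Iterator[Signal]:
    """Rename artist_name -> item and num_plays -> weight, like the FOREACH above.

    Unlike the Pig snippet, this sketch also drops rows with a missing field.
    """
    for user, artist_name, num_plays in rows:
        if user is None or artist_name is None or num_plays is None:
            continue
        yield Signal(user=user, item=artist_name, weight=num_plays)


if __name__ == "__main__":
    rows = [("u1", "the beatles", 66), ("u2", None, 5)]
    print(list(to_signals(rows)))
```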

DEMO: CALL MORTAR

Recommendation Engine

Now that the data is in the correct format, we'll call Mortar's algorithms to generate item-item and user-item recommendations.

item_item_recs = recsys__GetItemItemRecommendations(user_signals);

user_item_recs = recsys__GetUserItemRecommendations(user_signals, item_item_recs);

Pig code
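The talk does not describe how the `recsys__` macros compute their results, so the following is not Mortar's algorithm. It is a deliberately simple co-occurrence recommender in Python, assuming signals arrive as `(user, item, weight)` tuples, that shows what the two outputs mean: items are related when the same users weight both, and a user's candidates are items related to what they already interacted with but have not seen. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Signal = Tuple[str, str, float]       # (user, item, weight)
Scores = Dict[str, Dict[str, float]]  # key -> {other key: score}


def _weights_by_user(signals: List[Signal]) -> Scores:
    """Sum each user's weight per item."""
    by_user: Scores = defaultdict(dict)
    for user, item, weight in signals:
        by_user[user][item] = by_user[user].get(item, 0.0) + weight
    return by_user


def item_item_scores(signals: List[Signal]) -> Scores:
    """Co-occurrence: each user who touched both A and B adds min(w_A, w_B) to the pair."""
    scores = defaultdict(lambda: defaultdict(float))
    for items in _weights_by_user(signals).values():
        for a, b in combinations(sorted(items), 2):
            shared = min(items[a], items[b])
            scores[a][b] += shared
            scores[b][a] += shared
    return {a: dict(neighbors) for a, neighbors in scores.items()}


def user_item_scores(signals: List[Signal], item_item: Scores) -> Scores:
    """Score items a user has NOT seen: sum of (user's weight on A) * score(A, B)."""
    recs: Scores = {}
    for user, items in _weights_by_user(signals).items():
        candidates: Dict[str, float] = defaultdict(float)
        for a, weight in items.items():
            for b, score in item_item.get(a, {}).items():
                if b not in items:
                    candidates[b] += weight * score
        recs[user] = dict(candidates)
    return recs


if __name__ == "__main__":
    signals = [
        ("u1", "a", 3), ("u1", "b", 2),
        ("u2", "a", 1), ("u2", "c", 5),
        ("u3", "b", 4), ("u3", "c", 4),
    ]
    ii = item_item_scores(signals)
    print(ii)
    print(user_item_scores(signals, ii))
```

On the toy data, `u1` (who weighted `a` and `b`) is recommended only `c`, the one item they have not interacted with. Plain co-occurrence also tends to favor popular items, which a production engine would typically correct for.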

DEMO: STORE OUR RESULTS

Recommendation Engine

Now that we have our results, let’s store them back in MongoDB for use by our application.

%default II_COLLECTION 'item_item_recs'
%default UI_COLLECTION 'user_item_recs'

store item_item_recs into '$CONN/$DB.$II_COLLECTION'
    using com.mongodb.hadoop.pig.MongoInsertStorage('','');

store user_item_recs into '$CONN/$DB.$UI_COLLECTION'
    using com.mongodb.hadoop.pig.MongoInsertStorage('','');

Pig code

DEMO: RUN IT!

Recommendation Engine

Now we’re going to use Mortar to start and manage a Hadoop cluster to run our recommender.

> mortar run pigscripts/mongo/lastfm-recsys-online.pig -f params/lastfm.params --clustersize 10

Taking code snapshot... done
Sending code snapshot to Mortar... done
Requesting job execution... done
job_id: 534462bea22f3803fd9cacca

Job status can be viewed on the web at:
https://app.mortardata.com/jobs/job_detail?job_id=534462bea22f3803fd9cacca

> db.item_item_recs.find()

{ "item_A" : "yo-yo ma", "rank" : 1, "item_B" : "natalie clein" }
{ "item_A" : "miley cyrus", "rank" : 1, "item_B" : "miley cyrus and billy ray cyrus" }
{ "item_A" : "dimmu borgir", "rank" : 1, "item_B" : "ad inferna" }
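Each stored document pairs an `item_A` with a ranked `item_B`. A minimal Python sketch of turning per-item scores into documents of that shape follows. The field names are copied from the query output above; the `top_n` cutoff, the tie-breaking rule, and the function name are assumptions, and the real collection may carry additional fields.

```python
from typing import Dict, List, Union

Doc = Dict[str, Union[str, int]]


def to_ranked_docs(scores: Dict[str, Dict[str, float]], top_n: int = 3) -> List[Doc]:
    """Turn {item_A: {item_B: score}} into rank-ordered documents.

    The highest score gets rank 1; ties are broken alphabetically by item_B so
    the output is deterministic. Only the top_n neighbors per item_A are kept.
    """
    docs: List[Doc] = []
    for item_a in sorted(scores):
        neighbors = sorted(scores[item_a].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (item_b, _score) in enumerate(neighbors[:top_n], start=1):
            docs.append({"item_A": item_a, "rank": rank, "item_B": item_b})
    return docs


if __name__ == "__main__":
    example = {"a": {"b": 2.0, "c": 1.0}, "b": {"a": 2.0, "c": 4.0}}  # made-up scores
    for doc in to_ranked_docs(example):
        print(doc)
```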

EVALUATING YOUR RESULTS

Your Recommendation Engine

At first, use your domain knowledge to judge whether the recommendations are sensible.

Mortar provides a recommendation browser.

EVALUATING YOUR RESULTS

Your Recommendation Engine

Optionally, generate more detailed recommendations:

item_item_recs = recsys__GetItemItemRecommendationsDetailed(user_signals);

Pig code

EVALUATING YOUR RESULTS

Your Recommendation Engine

Later, run A/B tests with your recommendations to see how they improve the metrics you care about.

Usually not multivariate.

Usually no training set is possible.
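For a plain A/B comparison on a single conversion metric, one standard choice is a two-proportion z-test. The Python sketch below uses only the standard library; the metric, counts, and function name are placeholders, and the normal approximation it relies on is poor for very small samples or very low conversion rates.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Compare conversion rates of control (A) and treatment (B).

    Returns (z statistic, two-sided p-value) using the pooled-proportion
    normal approximation. Positive z means B converts better than A.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups at 0% or 100%: no measurable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% control conversion rate with a 15% treatment rate and gives a z statistic of roughly 3.4.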

CUSTOMIZING

Your Recommendation Engine

To make customization easier, Mortar has help documentation and code covering more than a dozen common cases:

• Removing bots from your signal data
• Removing out-of-stock items
• Boosting popular items
• Adding categories to your items
• Cold start
• Greater discovery and variety
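Mortar's documentation covers these cases with its own code, which is not reproduced here. As an illustration only, this Python sketch shows the crudest version of two of them: treating users with implausibly many distinct items as bots, and filtering candidates against an in-stock set. The threshold and function names are hypothetical; real bot detection usually combines several signals.

```python
from typing import Dict, List, Set, Tuple

Signal = Tuple[str, str, float]  # (user, item, weight)


def remove_bots(signals: List[Signal], max_distinct_items: int) -> List[Signal]:
    """Drop every signal from users who touched more than max_distinct_items items."""
    items_per_user: Dict[str, Set[str]] = {}
    for user, item, _weight in signals:
        items_per_user.setdefault(user, set()).add(item)
    bots = {u for u, items in items_per_user.items() if len(items) > max_distinct_items}
    return [s for s in signals if s[0] not in bots]


def remove_out_of_stock(candidates: Dict[str, float], in_stock: Set[str]) -> Dict[str, float]:
    """Keep only candidate items that are currently in stock."""
    return {item: score for item, score in candidates.items() if item in in_stock}


if __name__ == "__main__":
    signals = [("u1", "a", 1.0), ("u1", "b", 1.0),
               ("bot", "a", 1.0), ("bot", "b", 1.0), ("bot", "c", 1.0)]
    print(remove_bots(signals, max_distinct_items=2))
    print(remove_out_of_stock({"a": 1.0, "b": 2.0}, in_stock={"b"}))
```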

PRODUCTION QUESTIONS

Your Recommendation Engine

How do you read your MongoDB data?

1) Read backup files from S3
2) Connect to secondary nodes
3) Connect to primary nodes
4) Connect to dedicated analytics nodes
5) Turn file-system snapshot backups into BSON

PRODUCTION QUESTIONS

Your Recommendation Engine

How do you release new recommendations while serving the old ones?

Serve through an API that flips between a live and an offline database. This also enables rollback.
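The live/offline flip can be modeled as a pointer to one of two databases: each batch job writes to the offline one, the API then flips the pointer, and a rollback flips it back. The Python sketch below tracks only that pointer; the database names are hypothetical, and a real deployment would keep the pointer somewhere every API server can read it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecsDatabases:
    """Track which of two databases serves traffic; new recs load into the other."""
    names: Tuple[str, str] = ("recs_db_a", "recs_db_b")  # hypothetical names
    live_index: int = 0
    history: List[int] = field(default_factory=list)

    @property
    def live(self) -> str:
        """Database the API currently reads from."""
        return self.names[self.live_index]

    @property
    def offline(self) -> str:
        """Database the next batch job should write to."""
        return self.names[1 - self.live_index]

    def flip(self) -> str:
        """Call after the batch job finishes loading `offline`; it becomes live."""
        self.history.append(self.live_index)
        self.live_index = 1 - self.live_index
        return self.live

    def rollback(self) -> str:
        """Return to the previously live database."""
        if not self.history:
            raise RuntimeError("nothing to roll back to")
        self.live_index = self.history.pop()
        return self.live


if __name__ == "__main__":
    dbs = RecsDatabases()
    print("serving", dbs.live, "| loading", dbs.offline)
    print("after flip, serving", dbs.flip())
    print("after rollback, serving", dbs.rollback())
```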

WE DISCUSSED

Summary

• What a recommendation engine is
• How Hadoop works with MongoDB
• Set up a demo recommendation engine
• How to connect your data
• Touched on advanced techniques
• Steered away from potholes
• Resources for next steps

help.mortardata.com/recommenders

answers.mortardata.com

@kky
@mortardata
