Building Personalized Applications at Scale

Posted on 25-Jun-2015


DESCRIPTION

Garrett Wu presents WibiData to the Bay Area Software Engineering meetup.

TRANSCRIPT

Building Personalized Applications at Scale

Garrett Wu, Director of Engineering

Odiago, Inc.

Personalized Applications


Examples

● Recommendations
  ○ Amazon
  ○ Netflix

● Ad Targeting
  ○ Hulu
  ○ YouTube

● Fraud Detection
  ○ Visa
  ○ JPMC

● Spam
  ○ Gmail

● Search Personalization
  ○ Google

Overall Requirements

● React to events in near real time.
  ○ Low-latency reads and writes.
  ○ Event-driven analysis, not just batch.

● Web scale: hundreds of millions of users.
  ○ High-throughput reads and writes.

● Reliable.
  ○ Distributed, fault tolerant, with graceful degradation.

● Flexible.
  ○ Evolvable schema.
  ○ Support for ad hoc experimentation and analysis.

Data Flow


Datastore Requirements

1. Random writes.
2. Analysis (MapReduce).
3. Random reads.


Data Model Requirements

1. Write user-centric data.
  ○ "Bob bought The Hunger Games."
  ○ "Sally viewed product page X."

2. Query user-centric data.
  ○ "What were Jim's 5 most recent purchases?"
  ○ "What are Sue's top 3 recommendations?"
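A minimal sketch of these two requirements, assuming a hypothetical in-memory `UserTable` in place of a real datastore: each row is keyed by user, each column holds timestamped events, and the query returns the newest values first. The class, column names, and sample events are illustrative only.

```python
from collections import defaultdict


class UserTable:
    """Hypothetical in-memory stand-in for a user-centric table (not a real datastore API)."""

    def __init__(self):
        # user -> column -> list of (timestamp, value) events
        self._rows = defaultdict(lambda: defaultdict(list))

    def write(self, user, column, timestamp, value):
        """Record one user-centric event in that user's row."""
        self._rows[user][column].append((timestamp, value))

    def most_recent(self, user, column, n):
        """Return up to n values from the user's column, newest first."""
        events = self._rows.get(user, {}).get(column, [])
        newest_first = sorted(events, key=lambda event: event[0], reverse=True)
        return [value for _, value in newest_first[:n]]


table = UserTable()
table.write("bob", "purchases", 100, "The Hunger Games")
table.write("sally", "page_views", 101, "product page X")
for ts, item in enumerate(["item-1", "item-2", "item-3", "item-4", "item-5", "item-6"], start=200):
    table.write("jim", "purchases", ts, item)

print(table.most_recent("jim", "purchases", 5))
# ['item-6', 'item-5', 'item-4', 'item-3', 'item-2']
```

Because every event lands in the user's own row, both the write and the "most recent 5" query touch a single row; a real store would keep versions sorted rather than sorting on each read.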

Given everything we know about John:
● Transactions.
● Tweets.
● Likes.

... recommend, classify, predict, cluster, profile.

User-centric Data Model


<column>
  <name>email</name>
  <description>Email address</description>
  <schema>"string"</schema>
</column>

Cells have Avro schemas for evolvable storage and retrieval.
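The slide does not show how schema evolution plays out. As a rough illustration, the sketch below hand-codes a simplified subset of Avro's record schema-resolution rules; it does not use the Avro library and ignores type checking and promotion. The evolved schema with a `verified` field is hypothetical.

```python
def resolve(written, reader_fields):
    """Adapt a record written under an older or newer schema to the reader's schema.

    Simplified subset of Avro record resolution:
      * fields the writer stored but the reader does not declare are ignored;
      * fields the reader declares but the writer never stored take the reader's default;
      * a missing field without a default is an error.
    """
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# Hypothetical evolved schema: a boolean "verified" field added with a default.
reader_fields = [
    {"name": "email", "type": "string"},
    {"name": "verified", "type": "boolean", "default": False},
]

old_cell = {"email": "bob@example.com"}  # written before "verified" existed
new_cell = {"email": "sally@example.com", "verified": True, "source": "signup"}  # extra field

print(resolve(old_cell, reader_fields))  # {'email': 'bob@example.com', 'verified': False}
print(resolve(new_cell, reader_fields))  # {'email': 'sally@example.com', 'verified': True}
```

Old cells stay readable after a field is added, and cells written by newer code remain readable by older readers, which is the property "evolvable storage and retrieval" relies on.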

User-centric Data Model

● 3-D storage: each cell is addressed by row, column, and timestamp.
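A small sketch of 3-D addressing, taking the three dimensions to be row, column, and timestamp (as in HBase, which the talk builds on). `Cells3D` is a hypothetical in-memory class, not an HBase or WibiData API; it supports reading the newest value or the value as of a given timestamp.

```python
import bisect
from collections import defaultdict


class Cells3D:
    """Hypothetical sketch: every value is addressed by (row, column, timestamp)."""

    def __init__(self):
        # (row, column) -> timestamps in ascending order, with values in a parallel list
        self._timestamps = defaultdict(list)
        self._values = defaultdict(list)

    def put(self, row, column, timestamp, value):
        key = (row, column)
        stamps, values = self._timestamps[key], self._values[key]
        i = bisect.bisect_left(stamps, timestamp)
        if i < len(stamps) and stamps[i] == timestamp:
            values[i] = value  # same (row, column, timestamp): overwrite
        else:
            stamps.insert(i, timestamp)
            values.insert(i, value)

    def get(self, row, column, as_of=None):
        """Return the newest value at or before as_of (or the newest overall), else None."""
        key = (row, column)
        stamps = self._timestamps.get(key, [])
        i = len(stamps) if as_of is None else bisect.bisect_right(stamps, as_of)
        return self._values[key][i - 1] if i > 0 else None


cells = Cells3D()
cells.put("alex", "email", 10, "old@example.com")
cells.put("alex", "email", 20, "new@example.com")
print(cells.get("alex", "email"))            # new@example.com
print(cells.get("alex", "email", as_of=15))  # old@example.com
print(cells.get("alex", "email", as_of=5))   # None
```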

Analyzing Data: Producers

● produce() generates derived data for a single row:
  ○ recommend
  ○ profile
  ○ classify
  ○ etc.
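The producer idea can be sketched as an interface whose produce() sees exactly one row and returns derived columns for it. The `Producer` base class, `run_producer`, and the purchase-count "profile" are hypothetical illustrations, not the WibiData API.

```python
class Producer:
    """Hypothetical base class: produce() sees one row and returns derived columns."""

    def produce(self, row):
        raise NotImplementedError


class PurchaseCountProfiler(Producer):
    """Example 'profile' producer: counts the purchases stored in a row."""

    def produce(self, row):
        return {"purchase_count": len(row.get("purchases", []))}


def run_producer(producer, rows):
    """Apply the producer to every row independently and write its output back into that row."""
    for row in rows.values():
        row.update(producer.produce(row))


rows = {
    "bob": {"purchases": ["The Hunger Games"]},
    "sally": {"page_views": ["product page X"]},
}
run_producer(PurchaseCountProfiler(), rows)
print(rows["bob"]["purchase_count"], rows["sally"]["purchase_count"])  # 1 0
```

Because produce() reads only its own row, rows can in principle be processed in parallel, or re-run for a single user when a new event arrives.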

Analyzing Data: Gatherers

● gather() aggregates data across all rows:
  ○ build association rules for collaborative filtering.
  ○ train classifier models.
  ○ compute prior probabilities for events.
  ○ etc.
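As a sketch of the "compute prior probabilities for events" case, assuming a prior here means the fraction of users in whose row an event type appears at least once, a gatherer reads every row and emits one aggregate. `gather_event_priors` and the sample rows are hypothetical.

```python
from collections import Counter


def gather_event_priors(rows):
    """Hypothetical gatherer: for each event type, the fraction of rows (users)
    in which it appears at least once."""
    counts = Counter()
    # Map-like step: each row contributes each distinct event type once.
    for row in rows.values():
        counts.update(set(row.get("events", [])))
    # Reduce-like step: turn per-event row counts into fractions of all rows.
    total = len(rows)
    return {event: count / total for event, count in counts.items()} if total else {}


rows = {
    "user1": {"events": ["click", "view", "view"]},
    "user2": {"events": ["view"]},
    "user3": {"events": []},
    "user4": {"events": ["purchase", "view"]},
}
priors = gather_event_priors(rows)
print(priors["view"], priors["click"], priors["purchase"])  # 0.75 0.25 0.25
```

Unlike a producer, the result depends on all rows at once, which is why gathering maps naturally onto a MapReduce job.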

Example: Ad Targeting

User    Games                                   Interests       Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Game                 Categories
MiniGolf Pro         Golf, Sports
Kitten Krash         Cats, Racing
Apples Everywhere    Puzzles

Example: Ad Targeting

User    Games                                   Interests       Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Game                 Categories
MiniGolf Pro         Golf, Sports
Kitten Krash         Cats, Racing
Apples Everywhere    Puzzles

Producer
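The producer step on this slide, filling in Alex's interests from his games, can be sketched as a per-row lookup through the Game Categories table. This assumes interests are simply the union of the categories of a user's games; `produce_interests` is a hypothetical name, and games with no listed category contribute nothing.

```python
# Game -> categories, as listed in the Game Categories table on the slide.
GAME_CATEGORIES = {
    "MiniGolf Pro": ["Golf", "Sports"],
    "Kitten Krash": ["Cats", "Racing"],
    "Apples Everywhere": ["Puzzles"],
}


def produce_interests(games):
    """Hypothetical per-row producer: interests are the categories of the user's games,
    de-duplicated in first-seen order. Games without a listed category add nothing."""
    interests = []
    for game in games:
        for category in GAME_CATEGORIES.get(game, []):
            if category not in interests:
                interests.append(category)
    return interests


print(produce_interests(["MiniGolf Pro", "Extreme Pond Fishing"]))  # ['Golf', 'Sports']
```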

Example: Ad Targeting

User    Games                                   Interests       Recommended Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports    ESPN.com
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Category    Advertisement
Golf        ESPN.com
Animals     Petco.com
Racing      Nascar.com

Producer

Wait, where did this come from?
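The step that filled in ESPN.com for Alex can be sketched as a second per-row lookup, assuming the producer simply maps each interest through the Category/Advertisement table. `produce_recommended_ads` is a hypothetical name.

```python
# Category -> advertisement, as listed on the slide.
CATEGORY_ADS = {"Golf": "ESPN.com", "Animals": "Petco.com", "Racing": "Nascar.com"}


def produce_recommended_ads(interests):
    """Hypothetical per-row producer: map each interest to its ad, skipping
    interests with no ad and keeping each ad once."""
    ads = []
    for interest in interests:
        ad = CATEGORY_ADS.get(interest)
        if ad is not None and ad not in ads:
            ads.append(ad)
    return ads


print(produce_recommended_ads(["Golf", "Sports"]))  # ['ESPN.com']
```

The lookup itself is trivial; the open question is where the Category/Advertisement table comes from.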

Example: Gathering Associations

User    Games                                   Interests       Clicked Ads
Alex    MiniGolf Pro, Extreme Pond Fishing      Golf, Sports
Bob     Kitten Krash
Carol   Apples Everywhere, Underground Racer

Example: Gathering Associations

[Diagram: Map phase, then Reduce phase]
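One plausible reading of the Map and Reduce steps, sketched as a single-process simulation rather than a real MapReduce job: the map step emits an (interest, clicked ad) pair for every combination in a user's row, the reduce step sums the counts per pair, and a final pass keeps the most-clicked ad for each interest. The result is shaped like the Category/Advertisement table used above. All function names are hypothetical, and the clicked-ad data is invented for illustration.

```python
from collections import Counter
from itertools import product


def map_row(row):
    """Map: emit ((interest, ad), 1) for every interest/clicked-ad combination in one row."""
    for interest, ad in product(row["interests"], row["clicked_ads"]):
        yield (interest, ad), 1


def reduce_counts(pairs):
    """Reduce: sum the counts emitted for each (interest, ad) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def best_ad_per_interest(totals):
    """Keep, for each interest, the ad clicked most often alongside it (ties: first seen)."""
    best = {}
    for (interest, ad), count in totals.items():
        if interest not in best or count > totals[(interest, best[interest])]:
            best[interest] = ad
    return best


# Invented sample rows; the transcript does not include the clicked-ad data.
rows = [
    {"interests": ["Golf", "Sports"], "clicked_ads": ["ESPN.com"]},
    {"interests": ["Golf"], "clicked_ads": ["ESPN.com", "Petco.com"]},
    {"interests": ["Cats", "Racing"], "clicked_ads": ["Nascar.com"]},
]
totals = reduce_counts(pair for row in rows for pair in map_row(row))
ad_table = best_ad_per_interest(totals)
print(ad_table["Golf"])  # ESPN.com
```

A real job would shuffle the map output so each reducer sees all counts for one key; the simulation just sums them in one process.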

Final Thoughts

● A user-centric data storage model has great advantages:
  ○ Fast per-user reads and writes.
  ○ Data is already pivoted by user, the axis of your most common analysis.

● HBase provides fast, reliable random access and scans.
  ○ Billions of rows, millions of columns.
  ○ Integrates well with MapReduce for analysis.

● Build scalable personalized applications with WibiData.
  ○ Check out www.wibidata.com

Garrett Wu | gwu@odiago.com
