large scale recommendation in e-commerce -- qiang yan

Large Scale Recommendation in E-Commerce

Qiang Yan, Quan Yuan

Taobao Search&P13N Team

Alibaba Group

Outline

•  Introduction
   – Data in Taobao
   – Recommendation in Taobao
•  Approaches to recommendation
   – eTREC
   – Rank
•  Lessons we learn
•  Conclusion & Challenges

Outline

•  Introduction
   – Data in Taobao
   – Recommendation in Taobao



Largest online and mobile commerce company in the world

Data in Taobao

Item Discovery

P13n in Vertical Industry

My Taobao – Guess You Like

Recommendations in taobao.com

Shop Discovery

Recommendations in Taobao Mobile Flash Sale

Dress rec for female

P13n HQ REC

New Items REC

Powered by Recommendation

Recommendations in Taobao Mobile

Powered by Recommendation

P13n in Vertical Industry

Shop Discovery

Item Rec

Outline




Overview

•  Application: HomePage (PC), Vertical (PC), HomePage (mobile), Shopping Path, Vertical (mobile)
•  Re-Rank: Diversity, Freshness, Business Goal
•  Rank: RTP, Xlib, Olive
•  Match/Retrieval: RT CF Content User-Based DT LC
•  Platform: TPP, Tair, Hbase, UPS, JStorm, ODPS


eTREC

A highly efficient distributed feature-based collaborative filtering tool

[Figure: ItemCF, UserCF, and ContentCF over Items, Users, and Content (word, tag), shown alongside Feature-Based CF, whose User/Item features include Tags, Style, Latent Class, ….]

Implementation trick 1 – Operators

•  Jaccard
•  Cosine
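The two operators named above can be sketched as follows. This is a minimal illustration rather than eTREC's code: the function names and the sparse `dict`/`set` representation are assumptions made for the example.

```python
import math


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| over the feature sets of two entities."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(u: dict, v: dict) -> float:
    """dot(u, v) / (norm(u) * norm(v)) over sparse feature -> preference maps."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Both functions return 0.0 for empty inputs instead of raising, which is a convenience choice for the sketch.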

eTREC

Implementation trick 2

–  Fewer Map/Reduce jobs
   •  NormAndDot
   •  CalSim
–  Fewer emitted item-item pairs

Input: feature_id, entity_id, preference, payload

NormAndDot Job output: entity_id, norm(i), <j, dot(i, j)> ….

CalSim Job output: entity_id, <j, sim(i, j)> ….

ItemCF in Mahout
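A single-process sketch of the two jobs, assuming cosine similarity and unique (feature_id, entity_id) rows. The function names `norm_and_dot` and `cal_sim` and the in-memory dictionaries are illustrative stand-ins for the Map/Reduce jobs, not eTREC's actual interfaces; the payload column is ignored here.

```python
import math
from collections import defaultdict
from itertools import combinations


def norm_and_dot(rows):
    """rows: iterable of (feature_id, entity_id, preference).

    Returns {entity_id: (norm, {other_entity: dot})}, mirroring the
    'entity_id norm(i) <j, dot(i, j)>' record layout.
    """
    by_feature = defaultdict(list)
    for feature_id, entity_id, pref in rows:
        by_feature[feature_id].append((entity_id, pref))

    sq_norm = defaultdict(float)
    dots = defaultdict(lambda: defaultdict(float))
    for entities in by_feature.values():
        for entity_id, pref in entities:
            sq_norm[entity_id] += pref * pref
        # Only entities that share a feature ever form a pair.
        for (i, pi), (j, pj) in combinations(entities, 2):
            dots[i][j] += pi * pj
            dots[j][i] += pi * pj
    return {i: (math.sqrt(sq_norm[i]), dict(dots[i])) for i in sq_norm}


def cal_sim(records):
    """Turn norms and dot products into cosine similarities."""
    norms = {i: norm for i, (norm, _) in records.items()}
    return {
        i: {
            j: dot / (norm_i * norms[j])
            for j, dot in neighbours.items()
            if norm_i and norms[j]
        }
        for i, (norm_i, neighbours) in records.items()
    }
```

For example, `cal_sim(norm_and_dot(rows))` yields one similarity map per entity; entities that share no feature never appear in each other's map.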

eTREC

•  Features
   – Fast
     •  400M users × 200M items in less than 20 mins
   – Easy to use
   – Scalable
     •  User-defined similarity (default: cosine, jaccard, asymcosine)
     •  User-defined item-item pairs

eTREC

Rank: Olive

•  Olive = Real-time Streaming System + Online Learning
•  Why do we need online learning?
   – User: user interests shift; mixed accounts and family accounts
   – Item: millions of new items per day; 10M updated items (title, price, etc.)
   – Context: promotions, discounts, etc.; festivals such as National Day and 11.11

Olive

Goal
•  Respond in real time to P/N (positive/negative) feedback and improve the user experience
•  More accurate recommendations
•  Stable model

Model
•  FTRL
•  AdPredictor

Framework: Asynchronous Distributed OGD

[Figure: architecture diagram. Components: Parameter Server (Tair/Hbase), Reducers, Updaters, FG, IG, Data Shards, Storm/JStorm, TT/MetaQ. Arrows labelled w and △w connect the parameter server to the workers.]
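The pull-w / push-△w pattern in the diagram can be sketched in a single process. Everything below is an assumption made for illustration: the `ParameterServer` class stands in for the Tair/Hbase-backed store, the loss is logistic, and asynchrony is simulated by letting every worker read the same weights before any update is applied.

```python
import math
from collections import defaultdict


class ParameterServer:
    """In-memory stand-in for the weight store: holds w, applies pushed deltas."""

    def __init__(self):
        self.w = defaultdict(float)

    def pull(self, keys):
        return {k: self.w[k] for k in keys}

    def push(self, delta):
        for k, d in delta.items():
            self.w[k] += d


def worker_step(server, batch, lr=0.1):
    """One step on a non-empty data shard: pull w, compute the
    logistic-loss gradient, and return delta w for the server."""
    keys = {f for x, _ in batch for f in x}
    w = server.pull(keys)
    grad = defaultdict(float)
    for x, y in batch:  # x: {feature: value}, y in {0, 1}
        z = sum(w[f] * v for f, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for f, v in x.items():
            grad[f] += (p - y) * v
    return {f: -lr * g / len(batch) for f, g in grad.items()}


def train(server, shards, rounds=50):
    for _ in range(rounds):
        # Every worker reads the same, possibly stale, w before any push lands.
        deltas = [worker_step(server, shard) for shard in shards]
        for delta in deltas:
            server.push(delta)
```

Because workers only exchange sparse deltas with the server, a shard touches just the weights for the features it actually contains.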

•  FTRL-Proximal
   – Update with (sub-)gradient
   – Updated models not far from previous ones
   – L1-norm
   – L2-norm

Olive -- FTRL
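A minimal per-coordinate FTRL-Proximal sketch for logistic regression, following the commonly published form of the algorithm (McMahan's per-coordinate update). It is not Olive's implementation; the class name, the hyperparameter defaults, and the clipping of the logit are assumptions for this example.

```python
import math
from collections import defaultdict


class FTRLProximal:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def weight(self, f):
        """Closed-form weight; the L1 term zeroes out small |z|."""
        z = self.z.get(f, 0.0)
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        n = self.n.get(f, 0.0)
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        """x: {feature: value}. Returns the predicted click probability."""
        s = sum(self.weight(f) * v for f, v in x.items())
        s = max(min(s, 35.0), -35.0)  # guard against overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """One online step on example (x, y) with y in {0, 1}."""
        p = self.predict(x)
        for f, v in x.items():
            g = (p - y) * v  # gradient of the log loss for this coordinate
            w = self.weight(f)
            sigma = (math.sqrt(self.n[f] + g * g) - math.sqrt(self.n[f])) / self.alpha
            self.z[f] += g - sigma * w  # proximal term keeps w near earlier iterates
            self.n[f] += g * g
        return p
```

Only `z` and `n` are stored; weights are derived on demand, so features whose accumulated signal never exceeds the L1 threshold cost nothing at serving time.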

FTRL in Action

•  Cold start
   – OWLQN-LR based FTRL pre-train model
   – 3.9B samples, 1.7B features, pre-train in 21 mins

[Figure: AUC (0.5 to 0.75) by hour (0 to 12)]

•  Stability
   – Residual-based cascading online training
   – |w − w0| constraint
   – Mini-batch update

[Figure: beta (−3 to 0)]

Experiments

Samples (3.9B):
•  Offline: pre-train an offline FTRL model on PV and click data from 14 days
•  Online: the following 4 days

Model: FTRL

•  Accuracy

[Figure: GAUC (0.65 to 0.77) by day, 20140604 to 20140607, LR vs. FTRL]

•  Stability

[Figure: beta (−2.196 to −2.19) and gender_comb_1_1 (0.15 to 0.35)]

10%+
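The slides do not define GAUC. Assuming it means the common "group AUC" (AUC computed per user, then averaged with impression-count weights, skipping users whose samples are all one class), it could be computed roughly like this. The function names and row layout are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC; returns None when only one class is present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(rows):
    """rows: iterable of (user_id, label, score) with label in {0, 1}."""
    by_user = defaultdict(lambda: ([], []))
    for user, y, s in rows:
        by_user[user][0].append(y)
        by_user[user][1].append(s)

    total, weight = 0.0, 0
    for labels, scores in by_user.values():
        a = auc(labels, scores)
        if a is not None:  # skip users with no clicks or only clicks
            total += a * len(labels)
            weight += len(labels)
    return total / weight if weight else None
```

The pairwise AUC is quadratic in the number of samples per user, which is acceptable for a sketch but would be replaced by a rank-based computation at scale.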

Olive -- AdPredictor

•  AdPredictor: not sparse
   – Pruning parameters
•  Advantages:
   – Bayesian model (easy to add domain knowledge)
   – Models uncertainty explicitly
   – Natural exploration
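A compact sketch of the AdPredictor update for binary features, following the published equations of Graepel et al. (Gaussian belief per weight, probit link). It is not Olive's code: the class name, the priors, and the clamping of `t` are assumptions, and parameter pruning is left out. The `sample_score` method illustrates the "natural exploration" point by drawing weights from their posteriors.

```python
import math
import random
from collections import defaultdict


def pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


class AdPredictor:
    def __init__(self, beta=1.0, prior_mean=0.0, prior_var=1.0):
        self.beta = beta
        self.mean = defaultdict(lambda: prior_mean)  # posterior mean per feature
        self.var = defaultdict(lambda: prior_var)    # posterior variance per feature

    def _total(self, features):
        mu = sum(self.mean[f] for f in features)
        var = self.beta ** 2 + sum(self.var[f] for f in features)
        return mu, var

    def predict(self, features):
        """Click probability for a collection of active feature ids."""
        mu, var = self._total(features)
        return cdf(mu / math.sqrt(var))

    def sample_score(self, features):
        """Thompson-style draw: uncertain weights yield more varied scores."""
        return sum(random.gauss(self.mean[f], math.sqrt(self.var[f])) for f in features)

    def update(self, features, clicked):
        y = 1.0 if clicked else -1.0
        mu, var = self._total(features)
        sd = math.sqrt(var)
        t = max(min(y * mu / sd, 5.0), -5.0)  # numerical guard (assumption)
        v = pdf(t) / cdf(t)
        w = v * (v + t)
        for f in features:
            var_f = self.var[f]
            self.mean[f] += y * (var_f / sd) * v
            self.var[f] = var_f * (1.0 - (var_f / var) * w)
```

Each update shrinks the variance of the features it touches, so exploration through `sample_score` fades naturally for well-observed features.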

Outline




Lessons we learn

•  Ways to improve recommender systems

[Figure: layers with percentages: Data 40%, Match 30%, Rank 20%, Re-rank 10%]

•  Relevance vs. user experience

[Figure: RT, Item-CF, Content, User-CF placed between Relevance and User Experiences]

•  Mobile (contextual) features are very important when ranking recommendations on mobile

Outline



•  Lessons we learn
•  Challenges

Challenges

•  Heterogeneous data (search, social, POI, image, etc.) for recommendation

•  Multimodal inputs: images, speech, QR codes

•  Context-aware and interactive recommendation

•  Recommendation traffic allocation for a better e-commerce ecosystem

Reference

•  T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proc. 27th Internat. Conf. on Machine Learning, 2010.

•  H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In AISTATS, 2011.

•  H. B. McMahan and O. Muralidharan. On calibrated predictions for auction selection mechanisms. CoRR, abs/1211.3955, 2012.

•  Jing Jiang, Jie Lu, Guangquan Zhang, Guodong Long. Scaling-Up Item-Based Collaborative Filtering Recommendation Algorithm Based on Hadoop. Proceedings of the 2011 IEEE World Congress on Services, pp. 490-497, July 04-09, 2011.

•  Chu, W., L. Li, et al. (2011). "Contextual Bandits with Linear Payoff Functions." JMLR.

•  Peter Auer. (2002). "Using confidence bounds for exploitation/exploration trade-offs." JMLR.

WE’RE HIRING

Qiang Yan
Chang Liu
