Large-Scale Machine Learning for E-commerce
Data: Structured or Unstructured
Structured Data Unstructured Data
Product Catalog is Unstructured Data
Two Key Tasks:
Organizing and Searching Product Catalog
Source: http://www.macosxautomation.com/automator/examples/ex04/index.html
Organizing Product Catalog
Product Catalog, Taxonomy
Machine Learning
Organize Information for browsing / search / data analysis
Application
Organizing Product Catalog using Classification
Input (product titles):
• Lips Too Women's 'Too Sliver' Patent Casual Shoes Size 6
• 10 Crosby Women's Ynez Pump
• 1883 by Wolverine Women's Maisie Oxford Tan/Taupe Leather/Suede
• 1803 Women's 'Nome' Crocodile Dress Shoes Size 9
Machine Learning Model
Output (categories):
• Women’s Shoes
• Comfort Shoes
• Pumps
• Sneakers
• Flats
Decision Tree
Machine Learning Model: Many Decision Trees
Trees: $f_1(x), f_2(x), \dots, f_M(x)$ with weights $w_1, w_2, \dots, w_M$
Combined decision for x: $w_1 f_1(x) + w_2 f_2(x) + \dots + w_M f_M(x)$
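A minimal sketch of the weighted combination of trees, assuming each tree votes +1 or -1 for a single category and the sign of the weighted sum gives the decision. The one-split trees, feature names, thresholds, and weights below are all hypothetical and chosen only for illustration; the slides do not say how the trees or weights are trained.

```python
from typing import Callable, Dict, List

Example = Dict[str, float]
Tree = Callable[[Example], int]


def stump(feature: str, threshold: float, vote_if_above: int = 1) -> Tree:
    """Build a one-split decision tree that votes +1 or -1 (hypothetical)."""
    def predict(x: Example) -> int:
        above = x.get(feature, 0.0) > threshold
        return vote_if_above if above else -vote_if_above
    return predict


def combined_decision(x: Example, trees: List[Tree], weights: List[float]) -> float:
    """Weighted sum w1*f1(x) + w2*f2(x) + ... + wM*fM(x)."""
    return sum(w * f(x) for w, f in zip(weights, trees))


# Hypothetical trees and weights for the question "is this product a pump?"
trees = [
    stump("mentions_pump", 0.5),                        # f1
    stump("heel_height_cm", 5.0),                       # f2
    stump("mentions_sneaker", 0.5, vote_if_above=-1),   # f3
]
weights = [0.9, 0.5, 0.2]

x = {"mentions_pump": 1.0, "heel_height_cm": 3.0, "mentions_sneaker": 0.0}
score = combined_decision(x, trees, weights)  # 0.9*(+1) + 0.5*(-1) + 0.2*(+1)
label = "Pumps" if score > 0 else "not Pumps"
print(round(score, 2), label)
```

Here the second tree disagrees with the other two, but its weight is too small to flip the combined decision.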
Our Large-Scale Machine Learning System for Classification
1. Normalize text
2. Extract features
3. Many levels of Decision Trees serve as classification models
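The first two steps can be sketched as follows. This is an illustration under stated assumptions, not the system described in the talk: the specific normalization rules (lowercasing, punctuation stripping) and the choice of word unigrams and bigrams as features are assumptions, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def normalize(title: str) -> str:
    """Step 1 (assumed rules): lowercase, unify apostrophes, drop punctuation."""
    text = unicodedata.normalize("NFKC", title).lower()
    text = text.replace("\u2019", "'")          # curly apostrophe to straight
    text = re.sub(r"[^a-z0-9']+", " ", text)    # anything else becomes a space
    return " ".join(text.split())


def extract_features(title: str) -> Counter:
    """Step 2 (assumed features): counts of word unigrams and bigrams."""
    tokens = normalize(title).split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)


normalized = normalize("Cut-Out Leather Platform Wedge Espadrilles")
features = extract_features("Cut-Out Leather Platform Wedge Espadrilles")
print(normalized)
print(sorted(features))
```

The resulting feature counts would then be the input to the decision-tree models in step 3.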
Classification Results of New Product Titles
Product Title: Cross-Front Peplum Layered Dress
General: Women’s Clothing > Clothing
More Specific: Party & Cocktail Dresses > Dresses > Women’s Clothing > Clothing
Product Title: Cut-Out Leather Platform Wedge Espadrilles
General: Shoes
More Specific: Pumps > Women’s Shoes > Shoes
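One plausible reading of the "General" and "More Specific" labels is a top-down walk through the taxonomy, with one model per level, stopping early to get the general label. The sketch below assumes that reading. The toy taxonomy, the keyword sets, and the keyword-overlap scorer are hypothetical stand-ins for the trained decision-tree models, which the slides do not detail.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: parent category -> child categories.
TAXONOMY: Dict[str, List[str]] = {
    "ROOT": ["Shoes", "Clothing"],
    "Shoes": ["Women's Shoes"],
    "Women's Shoes": ["Pumps", "Flats"],
    "Clothing": ["Women's Clothing"],
    "Women's Clothing": ["Dresses"],
    "Dresses": ["Party & Cocktail Dresses"],
}

# Hypothetical keyword sets standing in for a trained model at each node.
KEYWORDS: Dict[str, Set[str]] = {
    "Shoes": {"espadrilles", "wedge", "platform", "pump", "shoes"},
    "Women's Shoes": {"espadrilles", "wedge", "platform", "pump"},
    "Pumps": {"platform", "wedge", "pump"},
    "Flats": {"flat", "ballet"},
    "Clothing": {"dress", "peplum", "shirt"},
    "Women's Clothing": {"dress", "peplum"},
    "Dresses": {"dress"},
    "Party & Cocktail Dresses": {"peplum", "cocktail"},
}


def score(category: str, tokens: Set[str]) -> int:
    """Stand-in for a model score: number of matching keywords."""
    return len(KEYWORDS.get(category, set()) & tokens)


def classify(title: str, max_depth: int = 10) -> List[str]:
    """Walk down the taxonomy, picking the best-scoring child at each level."""
    tokens = set(title.lower().replace("-", " ").split())
    path: List[str] = []
    node = "ROOT"
    while node in TAXONOMY and len(path) < max_depth:
        best = max(TAXONOMY[node], key=lambda c: score(c, tokens))
        if score(best, tokens) == 0:
            break  # no child is supported by the title; keep the general label
        path.append(best)
        node = best
    return path


def format_path(path: List[str]) -> str:
    """Render most-specific category first, as in the results above."""
    return " > ".join(reversed(path))


title = "Cut-Out Leather Platform Wedge Espadrilles"
general = format_path(classify(title, max_depth=1))
specific = format_path(classify(title))
print("General:", general)
print("More Specific:", specific)
```

Limiting `max_depth` gives the general label, while the unrestricted walk gives the more specific path.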
Compact desktop computer
Somewhere in US on Wed, 13 Apr 2016 15:59:47 GMT ….
Compact desktop computer
Lenovo thinkcentre
Lenovo all in one
Page 2
Page 3
Purchase!
Ideal Situation
Current Situation
?
How do we find the most relevant products for a search query?
Text-based search alone does not do the job!
Learning to Rank
Machine Learning Model that learns to rank search results
Source: http://blog.csdn.net/eastmount/article/details/43080791
query
document
relevance
Relevance based on text alone is not enough!
What else can we use?
How about user-behavior signals?
User’s behavior signals
click, add, buy
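A minimal sketch of how behavior signals could feed a learned ranker, assuming a linear scoring function trained with a pairwise perceptron-style update. The talk does not specify the ranking model, so the algorithm, the feature layout, the product names, and every number below are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple

# Assumed feature order: text_match, click_rate, add_rate, buy_rate
Features = List[float]


def dot(w: Features, x: Features) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs: List[Tuple[Features, Features]], n_features: int,
                   epochs: int = 20, lr: float = 0.1) -> List[float]:
    """Learn weights so each preferred item scores above the other item."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            if dot(w, better) - dot(w, worse) <= 0:   # pair is misordered
                for i in range(n_features):
                    w[i] += lr * (better[i] - worse[i])
    return w


# Made-up candidates for the query "40inch tv".
catalog: Dict[str, Features] = {
    "40-inch LED TV":           [0.8, 0.30, 0.10, 0.05],
    "TV stand for 40-inch TVs": [0.9, 0.05, 0.01, 0.00],
    "HDMI cable for TV":        [0.7, 0.02, 0.00, 0.00],
}

# Preference pairs (preferred, other), assumed to come from user behavior.
tv = catalog["40-inch LED TV"]
pairs = [
    (tv, catalog["TV stand for 40-inch TVs"]),
    (tv, catalog["HDMI cable for TV"]),
]

weights = train_pairwise(pairs, n_features=4)
text_only = sorted(catalog, key=lambda name: catalog[name][0], reverse=True)
learned = sorted(catalog, key=lambda name: dot(weights, catalog[name]), reverse=True)
print("Text only:", text_only)
print("Learned:  ", learned)
```

In this toy data the TV stand has the best text match, so text-only ranking puts it first, while the learned weights favor the item users actually click, add, and buy.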
Results of Learning to Rank
Search Query: “40inch tv”
Regular Text Search vs. Search with User-Signals and Learning to Rank
Not relevant
Not relevant
Summary
• E-commerce data is primarily unstructured data
– Product catalog, merchant and item reviews, search queries
• Proper organization and precise search of this data is necessary for good customer experience
– We built machine learning models for large-scale classification of product catalogs
– Also, we are learning from user behavior to improve our search relevance