machine learning for retail and ecommerce
Machine Learning (ML) for eCommerce and Retail
Dr. Andrei Lopatenko
Director of Engineering, Recruit Institute of Technology, Recruit Holdings
former Walmart Labs, Google (twice), Apple (twice)
ML for eCommerce
• Search and browse for commerce sites and applications
• Help users to find and discover items they will purchase
• Maximize revenue/profit per user session
Search
Search - ranking
Search - LHN (Left Hand Navigation)
Search spell correction
Search type ahead
Browse
Search data size
• Catalogue items
• 8 M items now, compared with ~ 400 M at Amazon / eBay
• x10 in the near future
• 2 K of text description per item + images
• Several hundred structured attributes per catalog
Search – user searches
• Tens of millions per day
• Tens of billions of sessions per year
• Online sales 13.2 B per year (http://fortune.com/2015/11/17/walmart-ecommerce/)
• 500B per year sales in offline stores (8% of the USA economy) in ~ 11K stores
• The number of transactions ~ 10B (public data)
ML addressable problems
• Learning to rank
• Given a query, what is the list of items with the highest probability of conversion (purchase), ATC (add to cart), or page view?
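One way to picture the ranking step is the sketch below. It assumes some upstream model has already produced per-item probabilities of purchase, add to cart, and page view for the query; the `ItemScores` type, the `WEIGHTS` values, and the sample numbers are hypothetical and only illustrate blending the three signals into one sort key.

```python
from dataclasses import dataclass


@dataclass
class ItemScores:
    item_id: str
    p_purchase: float  # predicted probability of conversion (purchase)
    p_atc: float       # predicted probability of add to cart
    p_view: float      # predicted probability of an item page view


# Hypothetical blending weights; not taken from the slides.
WEIGHTS = {"purchase": 1.0, "atc": 0.3, "view": 0.05}


def blended_score(s: ItemScores) -> float:
    """Combine the three predicted probabilities into a single score."""
    return (WEIGHTS["purchase"] * s.p_purchase
            + WEIGHTS["atc"] * s.p_atc
            + WEIGHTS["view"] * s.p_view)


def rank(candidates: list[ItemScores], k: int = 10) -> list[str]:
    """Return the ids of the k highest-scoring candidates."""
    ordered = sorted(candidates, key=blended_score, reverse=True)
    return [s.item_id for s in ordered[:k]]


# Made-up predictions for one query.
candidates = [
    ItemScores("tv-a", 0.02, 0.10, 0.40),
    ItemScores("tv-b", 0.05, 0.08, 0.30),
    ItemScores("tv-c", 0.01, 0.02, 0.60),
]
top = rank(candidates, k=2)
```

In practice the weights would be tuned (or the ranker trained end to end) against the business metric, for example revenue per session.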
ML addressable problems
• Typeahead
• Given a sequence of characters typed by the user, what are the most probable completions, and what are the most probable items the user wants to buy?
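A minimal baseline for the completion half of this problem, assuming a log of past queries is available: suggest the most frequent past queries that start with the typed prefix. The function names and the sample log are hypothetical, and a production system would likely use a trie and a learned ranker rather than a linear scan.

```python
from collections import Counter


def build_index(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log)


def complete(prefix, counts, k=3):
    """Return up to k past queries starting with prefix, most frequent first."""
    prefix = prefix.lower()
    matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically for stable ties.
    matches.sort(key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in matches[:k]]


# Made-up query log.
log = ["samsung tv", "samsung tv", "samsung phone", "sandals",
       "samsung tv", "samsung phone", "salt lamp"]
counts = build_index(log)
suggestions = complete("sam", counts)
```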
ML addressable problems
• Spell correction
• Given a user query, what is the query the user actually wanted to type?
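As a rough illustration, a frequency-based corrector can be sketched as follows. This is a simple edit-distance-1 baseline, not necessarily what was used in production: each unknown token is replaced by the most frequent vocabulary word one edit away. The `vocab` counts are made up, and the sketch assumes lowercase ASCII tokens.

```python
from collections import Counter
import string


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct_token(token, vocab):
    """Keep known tokens; otherwise pick the most frequent neighbor, if any."""
    if token in vocab:
        return token
    candidates = [w for w in edits1(token) if w in vocab]
    if not candidates:
        return token
    return max(candidates, key=lambda w: (vocab[w], w))


def correct_query(query, vocab):
    return " ".join(correct_token(t, vocab) for t in query.lower().split())


# Hypothetical token frequencies mined from past queries.
vocab = Counter({"samsung": 900, "tv": 1500, "queen": 400, "bed": 700, "bad": 50})
fixed = correct_query("samsng tv", vocab)
```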
ML addressable problems
• Cold start
• Given a new item with its set of attributes and no history of sales or exposure on the site, predict the item's sales and its sales per query
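One simple baseline for cold start, sketched under the assumption that attributes are available as sets of `key:value` strings: estimate sales of the new item as the similarity-weighted average of the sales of its most similar catalog items. The choice of Jaccard similarity, the `catalog` data, and `k` are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_sales(new_attrs, catalog, k=2):
    """Similarity-weighted mean of the sales of the k nearest catalog items."""
    scored = sorted(
        ((jaccard(new_attrs, attrs), sales) for attrs, sales in catalog),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_sim = sum(sim for sim, _ in scored)
    if total_sim == 0:
        return 0.0  # no similar item at all; fall back to zero
    return sum(sim * sales for sim, sales in scored) / total_sim


# Made-up catalog: (attribute set, weekly sales).
catalog = [
    ({"brand:nike", "type:shoes", "color:black"}, 120.0),
    ({"brand:nike", "type:shoes", "color:white"}, 80.0),
    ({"brand:acme", "type:mug", "color:white"}, 10.0),
]
new_item = {"brand:nike", "type:shoes", "color:red"}
estimate = predict_sales(new_item, catalog, k=2)
```

The same idea extends to sales per query by restricting the history to sales attributed to that query.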
ML addressable problems
• Prediction of LHN
• Given a user query, what is the best set of facets and facet values, i.e. the set with the highest probability that users interact with them and finally buy an item?
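A simplified sketch of facet selection: rank facets for a query by a smoothed click-through rate. Note that this swaps in clicks as a proxy for the "interact and finally buy" objective on the slide; the `stats` numbers and the smoothing prior are hypothetical.

```python
def rank_facets(stats, k=2, prior_clicks=1.0, prior_impressions=20.0):
    """Rank facets by smoothed CTR; the prior damps facets with little data.

    stats maps facet name -> (clicks, impressions) for one query.
    """
    def smoothed_ctr(clicks, impressions):
        return (clicks + prior_clicks) / (impressions + prior_impressions)

    ranked = sorted(
        stats.items(),
        key=lambda item: (-smoothed_ctr(*item[1]), item[0]),
    )
    return [facet for facet, _ in ranked[:k]]


# Made-up interaction counts for the query "tv".
stats = {
    "brand": (300, 1000),
    "screen size": (250, 1000),
    "color": (2, 5),       # high raw CTR, but almost no data
    "price": (100, 1000),
}
facets = rank_facets(stats, k=3)
```

Without the prior, "color" would rank first on a raw rate of 2/5 despite having only five impressions.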
ML addressable problems
• Query understanding
• Given a query, build a semantic parse of the query, tag tokens with attributes: blue tshirts for teenagers -> blue:color tshirts:type for:opt teenagers:agerestriction10-20
• Classification: blue tshirts for teenagers -> type:apparel, price preference: 10-30, releaseyearpreference: 2014-2016
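To make the tagging output concrete, here is a toy dictionary-based tagger that reproduces the slide's example. A real system would presumably use a learned sequence tagger; the `LEXICON` below is a hypothetical hand-written stand-in that reuses the labels from the slide.

```python
# Hypothetical token -> attribute lexicon; labels follow the slide's example.
LEXICON = {
    "blue": "color",
    "red": "color",
    "tshirts": "type",
    "shoes": "type",
    "for": "opt",
    "teenagers": "agerestriction10-20",
}


def tag_query(query):
    """Tag each whitespace token with its attribute, or 'unknown'."""
    return [(tok, LEXICON.get(tok, "unknown")) for tok in query.lower().split()]


def format_parse(tagged):
    """Render tags in the token:attribute form used on the slide."""
    return " ".join(f"{tok}:{label}" for tok, label in tagged)


parse = format_parse(tag_query("blue tshirts for teenagers"))
```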
ML addressable problems
• Related searches
• Given a query, what are the queries which are either semantically close to this one, or represent coinciding user interests?
• Nike shoes -> adidas shoes, sport shoes
• Coffee mugs -> travel mugs, photo coffee mugs, cappuccino cups
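The "coinciding interests" half of this problem can be approximated from session logs: queries issued in the same session are treated as related, ranked by how often they co-occur. This is a sketch under that assumption; the session data is made up, and semantic closeness would need a separate signal such as query embeddings.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_related(sessions):
    """Count, for each query, how often other queries share a session with it."""
    related = defaultdict(Counter)
    for session in sessions:
        # set() so a query repeated within one session is counted once.
        for a, b in permutations(set(session), 2):
            related[a][b] += 1
    return related


def related_searches(query, related, k=3):
    """Top-k co-occurring queries, most frequent first, ties alphabetical."""
    counts = related.get(query, Counter())
    ranked = sorted(counts.items(), key=lambda qc: (-qc[1], qc[0]))
    return [q for q, _ in ranked[:k]]


# Made-up sessions, each a list of queries from one user visit.
sessions = [
    ["nike shoes", "adidas shoes"],
    ["nike shoes", "sport shoes", "adidas shoes"],
    ["coffee mugs", "travel mugs"],
    ["coffee mugs", "cappuccino cups", "travel mugs"],
]
related = build_related(sessions)
nike_related = related_searches("nike shoes", related)
```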
ML addressable problems
• Product discovery
• help users to explore the product assortment
• drive users to diverse products
• reduce the risk of selecting irrelevant items
• help to find price, quality, brand, etc. alternatives
• reduce pigeonhole risk
• provide relevant data to make a decision
ML addressable problems
• Image similarity
• Given images of an item, find other items whose images are visually appealing to users who like the original item (appealing by shape? color? texture?) -> causing high conversion in recommendation
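A common way to approach this, sketched here as an assumption rather than the method from the talk, is to embed each item image as a vector and retrieve nearest neighbors by cosine similarity. The three-dimensional `embeddings` below are invented stand-ins for the output of an image model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(item_id, embeddings, k=2):
    """Ids of the k items whose image embeddings are closest to item_id's."""
    query = embeddings[item_id]
    scored = [
        (cosine(query, vec), other)
        for other, vec in embeddings.items()
        if other != item_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


# Hypothetical image embeddings (real ones would have hundreds of dimensions).
embeddings = {
    "red-mug": [0.9, 0.1, 0.0],
    "red-cup": [0.8, 0.2, 0.1],
    "blue-sofa": [0.0, 0.1, 0.9],
    "pink-mug": [0.7, 0.3, 0.0],
}
neighbors = most_similar("red-mug", embeddings)
```

Whether neighbors in embedding space actually convert is the open question the slide raises; that would have to be checked against recommendation click and purchase data.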
ML addressable problems
• Voice search
• Given voice input, reply with a list of the best items
• “what are the cheapest samsung tvs in the store”
• “what is the best deal on a queen bed today?”
ML addressable problems
• Extraction of item attributes
• Given an item, what are the item attributes: brand, color, size (wheel, screen, height, S/M/XL, Queen/Twin/King/Full), gender, pattern, shape, features
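As a rule-based baseline (a learned extractor would replace it in practice), attributes can be pulled from an item title with small dictionaries and a regular expression. The dictionaries, the attribute names, and the sample titles are hypothetical.

```python
import re

# Hypothetical closed vocabularies for a few attributes.
BRANDS = {"samsung", "nike", "adidas"}
COLORS = {"black", "blue", "red", "white"}
BED_SIZES = {"twin", "full", "queen", "king"}

# A number followed by an inch marker, e.g. '55 inch', '55in', '55"'.
SCREEN_INCHES = re.compile(r'(\d+(?:\.\d+)?)\s*(?:inch|in\b|")', re.IGNORECASE)


def extract_attributes(title):
    """Return the attributes found in an item title as a dict."""
    attrs = {}
    for token in re.findall(r"[a-z0-9]+", title.lower()):
        if token in BRANDS:
            attrs.setdefault("brand", token)
        elif token in COLORS:
            attrs.setdefault("color", token)
        elif token in BED_SIZES:
            attrs.setdefault("size", token)
    match = SCREEN_INCHES.search(title)
    if match:
        attrs["screen_inches"] = float(match.group(1))
    return attrs


tv = extract_attributes("Samsung 55 inch 4K Smart TV, Black")
bed = extract_attributes("Queen Bed Frame, White")
```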
ML addressable problems
• Representations of users
• actions on websites/apps -> searches, clicks, browsing behaviour
• product -> purchase preferences, reviews, ratings, return rates
ML addressable problems
• Title generation: how to generate the title which will cause the maximum conversion rate
• which product attributes to select for the title?
What makes a good title?
Limits
• Most models should be served in production
• 50 ms per prediction
• Part of a big system, memory limits ~ 10G
Retail
• Key directions which require machine learning:
• discounting tools
• coupons and rewards
• loyalty
• inventory management
Inventory management
• Customers want to buy products
• Customers have diverse needs
• Products should be in stock, ideally in warehouses close to customers
• but it is expensive to store products
• Problem: how many products of each type should be stored, and when should product supply be refilled?
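A classical starting point for the "when to refill" question is the reorder-point rule with safety stock: reorder when stock falls to expected demand over the lead time plus a buffer for demand variability. The sketch below assumes independent, roughly normal daily demand; the function names, the example numbers, and `z = 1.65` (roughly a 95% service level under that assumption) are illustrative. In an ML setting the demand mean and deviation would come from a forecasting model.

```python
import math


def reorder_point(mean_daily_demand, std_daily_demand, lead_time_days, z=1.65):
    """Stock level at which a replenishment order should be placed."""
    # Buffer against demand variability over the lead time.
    safety_stock = z * std_daily_demand * math.sqrt(lead_time_days)
    # Expected demand while waiting for the order to arrive.
    expected_demand = mean_daily_demand * lead_time_days
    return expected_demand + safety_stock


def should_reorder(on_hand, mean_daily_demand, std_daily_demand, lead_time_days):
    """True when current stock is at or below the reorder point."""
    return on_hand <= reorder_point(
        mean_daily_demand, std_daily_demand, lead_time_days
    )


# Made-up item: 20 units/day on average, std 5, supplier lead time 4 days.
rop = reorder_point(20.0, 5.0, 4.0)
```

Raising `z` trades higher storage cost for fewer stock-outs, which is the tension the bullets above describe.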