classifying

Products Classification Joyce Chan

Upload: joyce-chan

Post on 02-Nov-2014


Page 1: Classifying

Products Classification Joyce Chan

Page 2: Classifying

Preliminary Knowledge

• the words labeling, tagging, classification, and categorization are used interchangeably
• the words taxonomy and hierarchy are used interchangeably
• facet: one of the paths of the hierarchy
• static taxonomy: products are manually (editorially) mapped to the hierarchy
• dynamic taxonomy: the mapping of products to the hierarchy is generated automatically by the system, with no help from people
• document: a commonly used term when talking about searching
o here, we are referring to a product plus all the metadata associated with the product
o e.g. for the document Beatrice homo milk, its metadata or attributes can be:
   – title is Beatrice homo milk
   – it is a type of recipe ingredient
   – description is that it's tasty and rich & creamy
   – made by the Beatrice company
   – its price is $3.60 per bag, etc.
   – it is on sale at the Oakville location Loblaws
   – users highly recommend buying this milk
   – its image name is b-milk.jpg
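To make the "document" idea concrete, here is a minimal sketch of the milk example as a record. The class name and every field name are my own assumptions for illustration, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductDocument:
    """A product plus its associated metadata (all field names are hypothetical)."""
    title: str
    product_type: str
    description: str
    manufacturer: str
    price: float                      # in dollars
    price_unit: str                   # e.g. "bag"
    on_sale_at: List[str] = field(default_factory=list)
    user_recommended: bool = False
    image_name: str = ""


# Values taken from the Beatrice homo milk example on this slide.
milk = ProductDocument(
    title="Beatrice homo milk",
    product_type="recipe ingredient",
    description="tasty and rich & creamy",
    manufacturer="Beatrice",
    price=3.60,
    price_unit="bag",
    on_sale_at=["Loblaws, Oakville"],
    user_recommended=True,
    image_name="b-milk.jpg",
)
```

Every attribute here is something a search engine could index, which is why the word "document" fits a product.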

Page 3: Classifying

Classification: Different strategies

No Classification
1. products are not tagged to anything

Single level classification
1. single level static taxonomy

Multi level classification
– 1 dimensional static taxonomy applying tree breakdown
– hybrid of one dimensional & single level static taxonomy w/ Jeremy's tree breakdown
– multi-dimensional static taxonomy applying Jeremy's tree breakdown
– dynamic taxonomy w/ supervised extraction of facets from annotated text documents
– dynamic taxonomy w/ unsupervised extraction of facets
– static taxonomy w/ data providers' labels & unsupervised extraction of facets

Page 4: Classifying

No tagging / No Classification

• all products are retrieved directly through SQL or search engine queries

• we assume users can find relevant information quickly with no further assistance

Pros
• simple to implement; this is done already, as we have a product database

Cons
• with a large product database, the results are confusing to users
• e.g. users search for milk and many types of results are returned; they may have to flip through a few pages before finding what they need
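A rough sketch of what direct retrieval looks like, using an in-memory SQLite table. The table layout, the UPCs, and every product other than Beatrice homo milk are made up for illustration.

```python
import sqlite3

# Hypothetical product table; the real product database schema is not shown here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (upc TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("0001", "Beatrice homo milk"),
        ("0002", "Chocolate milk"),
        ("0003", "Milk chocolate bar"),
        ("0004", "Coconut milk"),
        ("0005", "Orange juice"),
    ],
)


def search(term):
    """Return every title containing the term, with no grouping or ranking."""
    rows = conn.execute(
        "SELECT title FROM products WHERE title LIKE ? ORDER BY title",
        (f"%{term}%",),
    )
    return [title for (title,) in rows]


results = search("milk")
```

Searching for "milk" brings back dairy milk, a chocolate bar, and coconut milk in one flat list, which is the confusion described in the cons above.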

Page 5: Classifying

The case for having a products taxonomy

Pros
• helps people find & explore what they are looking for on the website and the concierge device if they cannot quickly find it through direct searching
• users have become used to e-commerce interfaces with a product taxonomy

Page 6: Classifying

The case for static/editorially classified taxonomy

Pros
• maps closely to product shelving, much like the Dewey Decimal Classification system in a library

Cons
• a lot of manual labour to maintain the classification structure that we provide, since we have thousands and thousands of products and hope to expand our product database in the future

Page 7: Classifying

Single Level Static Taxonomy - only labeling / tagging

• each product has one label
• e.g. Beatrice brand homo milk <= 'dairy'

Pros
• provided by Gladson already, very straightforward to implement

Cons
• not very descriptive, so not very useful to users (customers, managers, inventory staff, or us)
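As a small illustration, single-level labeling amounts to one lookup table. Only the Beatrice entry comes from the slide; the other rows are invented.

```python
# Hypothetical product-to-label table: exactly one label per product.
LABELS = {
    "Beatrice homo milk": "dairy",
    "Cheddar cheese": "dairy",
    "Orange juice": "beverages",
}


def products_with_label(label):
    """Return all product names carrying the given label, sorted by name."""
    return sorted(name for name, lab in LABELS.items() if lab == label)


dairy_products = products_with_label("dairy")
```

Milk and cheese end up in the same bucket with nothing to tell them apart, which is why one label per product is not very descriptive.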

Page 8: Classifying

One dimensional (one path) static taxonomy with fixed levels of classification

• here there is a path from the root (department) down to the product
• for instance, Beatrice homo milk is classified as dairy, milk, homo, upc=1234567890

Pros
• easy to implement
• everything is classified under a standard number of concept levels
• improves searching quite a bit

Cons
• does not allow a product to be classified in multiple 'classes'
• labour intensive to editorially edit product classifications
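The one-path idea can be sketched as a nested dictionary with a fixed depth. The level names (other than department) and the second UPC are assumptions for illustration.

```python
# Hypothetical names for the fixed levels, from the root (department) to the product.
LEVELS = ("department", "category", "type", "upc")


def add_product(taxonomy, path):
    """Insert one product; the path must have exactly one entry per level."""
    if len(path) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {len(path)}")
    *concepts, upc = path
    node = taxonomy
    for concept in concepts[:-1]:
        node = node.setdefault(concept, {})
    # The last concept level holds the list of product UPCs.
    node.setdefault(concepts[-1], []).append(upc)


def find_path(taxonomy, upc, prefix=()):
    """Return the path leading to a UPC, or None if it is not classified."""
    for concept, child in taxonomy.items():
        if isinstance(child, list):
            if upc in child:
                return prefix + (concept, upc)
        else:
            found = find_path(child, upc, prefix + (concept,))
            if found:
                return found
    return None


taxonomy = {}
add_product(taxonomy, ("dairy", "milk", "homo", "1234567890"))
add_product(taxonomy, ("dairy", "milk", "skim", "1234567891"))
```

Because `find_path` returns a single path, a product that belongs in two places (say, milk as both a dairy item and a baking ingredient) cannot be represented, which is the first con above.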

Page 9: Classifying

The case for dynamic taxonomy

Pros
• cheap to have the computer place the products on the taxonomy by itself every time we add a new product to the database

Cons
• we're probably going to be applying a fairly complex taxonomy scheme, such as Amazon's
• some possible implementation challenges, such as the correct use of machine learning libraries

Page 10: Classifying

Dynamic taxonomy predefined w/ a fixed product db & supervised facet extraction from collections of text-annotated items

• we would have a predefined taxonomy, with some data already mapped under it
• when a new item appears that has not been mapped to the base taxonomy, machine learning algorithms are used to put it in the correct place

Pros
• completely automated classification
• we could start from Amazon's taxonomy, because it's the most feature-complete multi-level grocery taxonomy that I found (Tesco's being another good one, but it's n/a right now)

Cons
• new types of facets cannot be discovered, because we're using the predefined ones
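A toy sketch of the supervised step: products already mapped under the taxonomy act as training data, and a new item goes to the path whose known titles share the most words with it. The slides do not name a specific ML algorithm, so this word-overlap scorer is only a stand-in for a real classifier, and the seed data is invented.

```python
from collections import Counter

# Hypothetical seed data: product titles already mapped to a taxonomy path.
SEED = {
    ("dairy", "milk"): ["Beatrice homo milk", "Skim milk 2L"],
    ("dairy", "cheese"): ["Old cheddar cheese", "Mozzarella cheese block"],
    ("bakery", "bread"): ["Whole wheat bread", "White sandwich bread"],
}


def tokens(text):
    """Lowercase whitespace tokenization; a real system would do more."""
    return text.lower().split()


def train(seed):
    """Count how often each token appears under each taxonomy path."""
    return {
        path: Counter(t for title in titles for t in tokens(title))
        for path, titles in seed.items()
    }


def place(model, title):
    """Return the path with the highest token overlap, or None if nothing matches."""
    scores = {
        path: sum(counts[t] for t in tokens(title))
        for path, counts in model.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


model = train(SEED)
new_item_path = place(model, "Chocolate milk 1L")
```

An item like "Dish soap" gets `None` because nothing under the predefined paths resembles it, which mirrors the con above: new facets cannot be discovered this way.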

Page 11: Classifying

Dynamic taxonomy & unsupervised facet extraction for collections of text documents

• no prior facets to begin with; the algorithm builds the taxonomy all by itself
• usually used on things like unclassified articles, etc.

Algorithm
– for each item in the products collection, identify which terms are important
– for each important term, query one or more external resources & get the contextual terms that appear in the results; add the retrieved terms to the original document as part of its metadata, so that it becomes a context-aware document
– analyze the frequency of the terms in both the original collection & the expanded collection to identify the candidate facet terms

Pros
• new facet keywords can be created and automatically inserted into the taxonomy with no human intervention

Cons
• for each step in the above algorithm, we need to use an ML algorithm
• hard (for our company) to evaluate recall & precision given our small and non-standardized set of data
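The three steps above can be sketched end to end with plain term statistics. This is a simplified illustration, not the ML pipeline the slide has in mind: the product titles are invented, `EXTERNAL` is a hard-coded stand-in for a real external resource, TF-IDF is my choice for "important", and the frequency comparison is a plain document-frequency difference.

```python
import math
from collections import Counter

# Hypothetical product collection (id -> title).
PRODUCTS = {
    "p1": "beatrice homo milk",
    "p2": "lactantia skim milk",
    "p3": "kraft cheddar cheese",
}

# Hypothetical stand-in for an external resource that returns context terms.
EXTERNAL = {
    "beatrice": ["brand", "dairy"],
    "homo": ["milk", "dairy"],
    "lactantia": ["brand", "dairy"],
    "skim": ["milk", "dairy"],
    "cheddar": ["cheese", "dairy"],
    "cheese": ["dairy"],
}


def doc_freq(docs):
    """Number of documents in which each term appears."""
    return Counter(t for terms in docs.values() for t in set(terms))


def important_terms(docs, k=2):
    """Step 1: keep the k highest TF-IDF terms per document (ties: alphabetical)."""
    df, n = doc_freq(docs), len(docs)
    picked = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        ranked = sorted(tf, key=lambda t: (-tf[t] * math.log(n / df[t]), t))
        picked[doc_id] = ranked[:k]
    return picked


def expand(docs, picked, resource):
    """Step 2: append the context terms retrieved for each important term."""
    return {
        doc_id: terms + [c for t in picked[doc_id] for c in resource.get(t, [])]
        for doc_id, terms in docs.items()
    }


def candidate_facets(docs, expanded):
    """Step 3: terms whose document frequency grows after expansion."""
    before, after = doc_freq(docs), doc_freq(expanded)
    shift = {t: after[t] - before[t] for t in after}
    return sorted((t for t in shift if shift[t] > 0), key=lambda t: (-shift[t], t))


docs = {pid: title.split() for pid, title in PRODUCTS.items()}
picked = important_terms(docs)
facets = candidate_facets(docs, expand(docs, picked, EXTERNAL))
```

In this toy data, "dairy" and "brand" never appear in any product title yet surface as candidate facets, which is the point of expanding documents with outside context. With a small, non-standardized product set, how good those candidates are is exactly what is hard to measure.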

Page 12: Classifying

Hybrid: i) dynamic taxonomy w/ a fixed hierarchy & supervised facet extraction + ii) social tagging (aka folksonomy)

• we see that unsupervised learning is not suitable for our dataset, therefore I propose the use of a hybrid scheme to enable taxonomy creation
• we can use our dynamic taxonomy scheme and also allow users to create new facet keywords, but maybe only the moderator can add a new keyword to the taxonomy
• the rest of the tags just float freely outside of the taxonomy
• e.g. http://www.amazon.com/gp/product/tags-on-product/B001EO5XTO/ref=tag_dpp_cust_edpp_sao : Amazon allowed its customers to create their own product tags that are helpful for their own purposes
• possibly even merge our tags with Facebook
o http://techcrunch.com/2010/07/27/amazon-now-taps-into-facebook-for-social-product-recommendations/

• Pros: possibly more useful to shoppers, who can use tags to remember their own stuff
• Cons: we'd have to get comfortable with having a plethora of tags not necessarily related to each other
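A minimal sketch of the moderation rule described above: anyone can tag a product, but only a moderator can promote a tag into the taxonomy. The class, method names, and sample tags are all hypothetical.

```python
class TaggedCatalog:
    """Taxonomy keywords plus free-floating user tags (illustrative only)."""

    def __init__(self, taxonomy_keywords):
        self.taxonomy = set(taxonomy_keywords)  # facet keywords inside the taxonomy
        self.user_tags = {}                     # product -> set of user-created tags

    def tag(self, product, keyword):
        """Any user may attach a tag to a product."""
        self.user_tags.setdefault(product, set()).add(keyword.lower())

    def promote(self, keyword, is_moderator):
        """Only a moderator may add a keyword to the taxonomy."""
        if not is_moderator:
            raise PermissionError("only a moderator can extend the taxonomy")
        self.taxonomy.add(keyword.lower())

    def floating_tags(self, product):
        """User tags on a product that are not part of the taxonomy."""
        return sorted(self.user_tags.get(product, set()) - self.taxonomy)


catalog = TaggedCatalog({"dairy", "milk"})
catalog.tag("Beatrice homo milk", "For pancakes")
catalog.tag("Beatrice homo milk", "lactose")
before_promotion = catalog.floating_tags("Beatrice homo milk")
catalog.promote("lactose", is_moderator=True)
after_promotion = catalog.floating_tags("Beatrice homo milk")
```

Personal tags like "for pancakes" stay outside the taxonomy unless a moderator decides they are worth promoting, so the hierarchy stays clean while shoppers keep their own labels.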

Page 13: Classifying

Hybrid: 4 level (Jeremy's) taxonomy creation, w/ Gladson or GS1 labels and unsupervised facet extraction

Pros
• sounds the closest to what we're trying to accomplish
• possible extensions with social tagging as well
• works fairly well w/ shelving

Cons
• not as richly descriptive, due to having only a few levels in the taxonomy
• since the taxonomy is confined to a certain number of levels, I don't really know how to implement this right now (I can research)