classifying
![Page 1: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/1.jpg)
Products Classification Joyce Chan
![Page 2: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/2.jpg)
Preliminary Knowledge
• the words labeling, tagging, classification, and categorization are used interchangeably
• the words taxonomy and hierarchy are used interchangeably
• facet: one of the paths of the hierarchy
• static taxonomy: products are mapped to the hierarchy manually, or editorially
• dynamic taxonomy: the mapping of products to the hierarchy is generated automatically by the system, with no help from people
• document: a commonly used term when talking about searching
  o here, we are referring to a product, plus all the metadata associated with the product
  o e.g. for the document Beatrice homo milk, its metadata or attributes can be:
    – title is Beatrice homo milk
    – it is a type of recipe ingredient
    – description is that it's tasty and rich & creamy
    – made by the Beatrice company
    – its price is $3.60 per bag, etc.
    – it is on sale at the Oakville Loblaws location
    – users highly recommend buying this milk
    – its image name is b-milk.jpg
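A minimal sketch of such a document as a data structure, assuming a Python dataclass. The class name and field names are hypothetical; only the values come from the milk example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProductDocument:
    """A product plus its metadata, i.e. one searchable 'document'."""
    title: str
    product_type: str
    description: str
    manufacturer: str
    price: float
    price_unit: str
    on_sale_at: List[str] = field(default_factory=list)
    recommended_by_users: bool = False
    image_name: str = ""


# The Beatrice homo milk example, expressed as one document.
milk = ProductDocument(
    title="Beatrice homo milk",
    product_type="recipe ingredient",
    description="tasty and rich & creamy",
    manufacturer="Beatrice",
    price=3.60,
    price_unit="bag",
    on_sale_at=["Loblaws, Oakville"],
    recommended_by_users=True,
    image_name="b-milk.jpg",
)
```

Every strategy in the following slides decides which labels or paths get attached to a record like this one.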
![Page 3: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/3.jpg)
Classification: Different strategies
No Classification
1. products are not tagged to anything
Single level classification
1. single level static taxonomy
Multi level classification
– 1 dimensional static taxonomy applying tree breakdown
– hybrid of one dimensional & single level static taxonomy w/ Jeremy's tree breakdown
– multi-dimensional static taxonomy applying Jeremy's tree breakdown
– dynamic taxonomy w/ supervised extraction of facets from annotated text documents
– dynamic taxonomy w/ unsupervised extraction of facets
– static taxonomy w/ data providers' labels & unsupervised extraction of facets
![Page 4: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/4.jpg)
No tagging / No Classification
• all products are to be retrieved directly through SQL or search engine queries
• we assume users can find relevant information quickly with no further assistance
Pros
• simple to implement; this is done already, as we have a product database
Cons
• with a large product database, it is confusing to users
• e.g. users search for milk and many types of results are returned; they may have to flip through a few pages before finding what they need
![Page 5: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/5.jpg)
The case for having a products taxonomy
Pros
• helps people find & explore what they are looking for on the website and concierge device if they cannot quickly find it through direct searching
• users have become used to e-commerce interfaces with a product taxonomy
![Page 6: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/6.jpg)
The case for static/editorially classified taxonomy
Pros
• maps well to product shelving, much like the Dewey Decimal Classification system for libraries
Cons
• a lot of manual labor to maintain the classification structure that we provide, since we have thousands and thousands of products and hope to expand our product database in the future
![Page 7: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/7.jpg)
Single Level Static Taxonomy - only labeling / tagging
• each product has one label
• e.g. Beatrice brand homo milk <= 'dairy'
Pros
• provided by Gladson already, very straightforward to implement
Cons
• not very descriptive, not useful to users (customers, managers, inventory staff, or us)
![Page 8: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/8.jpg)
One dimensional (one path) static Taxonomy with fixed levels of classifications
• here there is a path from the root (department) down to the product
• for instance, Beatrice homo milk is classified as dairy, milk, homo, upc=1234567890
Pros
• easy to implement
• everything is classified under a standard number of levels of concepts
• improves searching quite a bit
Cons
• does not allow a product to be classified in multiple 'classes'
• labour intensive to editorially edit product classifications
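A rough sketch of the one-path, fixed-level idea, assuming a nested dictionary as storage. The class name, the level names "category" and "type", and the second UPC are made up for illustration; only the dairy, milk, homo, upc=1234567890 path comes from the slide.

```python
class FixedLevelTaxonomy:
    """Every product sits on exactly one department > category > type > upc path."""

    def __init__(self):
        self.tree = {}          # department -> category -> type -> set of UPCs
        self.path_by_upc = {}   # reverse index: upc -> its single path

    def add(self, department, category, type_, upc):
        # The "one path" restriction: a product cannot be classified twice.
        if upc in self.path_by_upc:
            raise ValueError(f"{upc} is already classified; only one path is allowed")
        leaf = self.tree.setdefault(department, {}).setdefault(category, {})
        leaf.setdefault(type_, set()).add(upc)
        self.path_by_upc[upc] = (department, category, type_, upc)

    def products_under(self, *prefix):
        """Return all UPCs below a partial path, e.g. ('dairy', 'milk')."""
        node = self.tree
        for label in prefix:
            node = node[label]
        return sorted(self._collect(node))

    @staticmethod
    def _collect(node):
        if isinstance(node, set):
            return set(node)
        found = set()
        for child in node.values():
            found |= FixedLevelTaxonomy._collect(child)
        return found


taxonomy = FixedLevelTaxonomy()
taxonomy.add("dairy", "milk", "homo", "1234567890")
taxonomy.add("dairy", "milk", "skim", "0987654321")  # hypothetical second product
```

Browsing by any prefix of the path is cheap, which is where the search improvement comes from; the `ValueError` on a second `add` is the multiple-classes limitation listed under Cons.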
![Page 9: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/9.jpg)
The case for dynamic taxonomy
Pros
• cheap to have the computer place the products on the taxonomy by itself every time we add a new product to the database
Cons
• we're probably going to be applying a fairly complex taxonomy scheme, such as Amazon's
• some possible implementation challenges, such as the correct use of machine learning libraries
![Page 10: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/10.jpg)
Dynamic taxonomy predefined w/ a fixed product db & supervised facet extraction from collections of text annotated items
• we would have a predefined taxonomy, with some data already mapped under it
• when a new item appears that has not been mapped to the base taxonomy, use machine learning algorithms to put it in the correct place
Pros
• completely automated classification
• could be modeled on Amazon's taxonomy, because it's the most feature-complete multi-level grocery taxonomy that I found (Tesco's is another good one, but it's not available right now)
Cons
• new types of facets cannot be discovered, because we're using the predefined ones
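One way the "put it in the correct place" step could look is sketched below with a small multinomial Naive Bayes classifier over product titles. This is an assumption, not the method chosen in these slides: the function names and the training titles are invented, and a real system would more likely use an ML library.

```python
import math
from collections import Counter, defaultdict


def train(labeled_items):
    """labeled_items: (title, taxonomy_path) pairs that are already mapped."""
    word_counts = defaultdict(Counter)   # path -> word frequencies
    path_counts = Counter()              # path -> number of training items
    vocabulary = set()
    for title, path in labeled_items:
        tokens = title.lower().split()
        word_counts[path].update(tokens)
        path_counts[path] += 1
        vocabulary.update(tokens)
    return word_counts, path_counts, vocabulary


def classify(title, model):
    """Return the predefined taxonomy path with the highest log-probability."""
    word_counts, path_counts, vocabulary = model
    total_items = sum(path_counts.values())
    best_path, best_score = None, -math.inf
    for path, n_items in path_counts.items():
        score = math.log(n_items / total_items)           # prior
        total_words = sum(word_counts[path].values())
        for token in title.lower().split():
            # Laplace smoothing so unseen words do not zero out a path.
            count = word_counts[path][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_path, best_score = path, score
    return best_path


# Made-up training data: items already mapped under the base taxonomy.
model = train([
    ("homo milk bag", ("dairy", "milk")),
    ("skim milk carton", ("dairy", "milk")),
    ("old cheddar cheese", ("dairy", "cheese")),
    ("marble cheese block", ("dairy", "cheese")),
])
new_item_path = classify("chocolate milk", model)
```

Note that `classify` can only ever return a path seen during training, which is exactly the limitation listed under Cons.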
![Page 11: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/11.jpg)
Dynamic Taxonomy & unsupervised facet extraction for collections of text documents
• no prior facets to begin with; the algorithm will build the taxonomy all by itself
• usually used on things like unclassified articles, etc.
algorithm
– for each item in the products collection, identify which terms are important
– for each important term, query 1+ external resources & get the contextual terms that appear in the results. Add the retrieved terms to the original document as part of its metadata; it is now a context-aware document
– analyze the frequency of the terms, both in the original collection & the expanded collection, to identify the candidate facet terms
pros
• new facet keywords can be created and automatically inserted into the taxonomy with no human intervention
cons
• for each step in the above algorithm, we need to use an ML algorithm
• hard (for our company) to evaluate recall & precision given our small and non-standardized set of data
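The three algorithm steps can be sketched as follows. Everything here is a simplifying assumption: a stopword filter stands in for important-term detection, a hard-coded dictionary (`EXTERNAL_CONTEXT`, hypothetical) stands in for the external resources, and "frequency analysis" is reduced to the rise in document frequency after expansion. A real version would replace each step with a proper ML component.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and"}

# Hypothetical stand-in for querying external resources about a term.
EXTERNAL_CONTEXT = {
    "milk": ["dairy", "beverage"],
    "cheddar": ["dairy", "cheese"],
    "yogurt": ["dairy"],
    "cola": ["beverage", "soda"],
}


def important_terms(text):
    """Step 1: pick the important terms of an item (here: non-stopwords)."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def expand(text):
    """Step 2: add contextual terms, giving a context-aware document."""
    terms = important_terms(text)
    context = [c for t in terms for c in EXTERNAL_CONTEXT.get(t, [])]
    return terms + context


def document_frequency(docs):
    """Count in how many documents each term appears."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return df


def candidate_facets(texts, min_shift=2):
    """Step 3: keep terms whose document frequency rises after expansion."""
    df_original = document_frequency([important_terms(t) for t in texts])
    df_expanded = document_frequency([expand(t) for t in texts])
    shifts = {term: df_expanded[term] - df_original[term] for term in df_expanded}
    ranked = sorted(shifts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, shift in ranked if shift >= min_shift]


facets = candidate_facets(["homo milk", "old cheddar", "plain yogurt", "diet cola"])
```

Terms such as "dairy" never appear in the product titles themselves but become frequent once context is added, which is what makes them candidate facets.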
![Page 12: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/12.jpg)
Hybrid: i) Dynamic taxonomy w/ a fixed hierarchy & supervised facet extraction + ii) social tagging (aka folksonomy)
• we see that unsupervised learning is not suitable for our dataset, therefore I propose the use of a hybrid scheme to enable taxonomy creation
• we can use our dynamic taxonomy scheme and also allow users to create new facet keywords, but maybe only the moderator can add a new keyword into the taxonomy
• the rest of the tags just float freely outside of the taxonomy
• e.g. http://www.amazon.com/gp/product/tags-on-product/B001EO5XTO/ref=tag_dpp_cust_edpp_sao Amazon allowed its customers to create their own tags for a product, which are helpful for their own purposes
• possibly even merge our tags with Facebook
  o http://techcrunch.com/2010/07/27/amazon-now-taps-into-facebook-for-social-product-recommendations/
• Pros: possibly more useful to shoppers for remembering their own stuff
• Cons: we'd have to get comfortable with having a plethora of tags not necessarily related to each other
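A small sketch of how user tags and moderator-approved facets might coexist. The class and method names are hypothetical, and the tags are invented; the only rule taken from the slide is that anyone can tag but only a moderator can move a keyword into the taxonomy.

```python
class HybridTagStore:
    """Free-form user tags next to a moderated set of taxonomy facets."""

    def __init__(self, taxonomy_facets):
        self.taxonomy_facets = {f.lower() for f in taxonomy_facets}
        self.user_tags = {}  # upc -> set of tags added by users

    def tag(self, upc, keyword):
        """Any user may attach any keyword to a product."""
        self.user_tags.setdefault(upc, set()).add(keyword.lower())

    def promote(self, keyword, is_moderator):
        """Only a moderator may add a keyword into the taxonomy."""
        if not is_moderator:
            raise PermissionError("only a moderator can add facets to the taxonomy")
        self.taxonomy_facets.add(keyword.lower())

    def floating_tags(self, upc):
        """Tags on a product that sit outside the taxonomy."""
        return sorted(self.user_tags.get(upc, set()) - self.taxonomy_facets)


store = HybridTagStore({"dairy"})
store.tag("1234567890", "dairy")
store.tag("1234567890", "Lactose")
store.tag("1234567890", "for pancakes")
before = store.floating_tags("1234567890")
store.promote("lactose", is_moderator=True)
after = store.floating_tags("1234567890")
```

Keeping the two sets separate means unrelated personal tags never disturb the taxonomy, at the cost of carrying a large pool of floating tags.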
![Page 13: Classifying](https://reader036.vdocuments.mx/reader036/viewer/2022082419/54594e66af795953128b4ce8/html5/thumbnails/13.jpg)
Hybrid: 4 level (Jeremy's) taxonomy creation, w/ Gladson or GS1 labels and unsupervised facet extraction
Pros
• sounds the closest to what we're trying to accomplish
• possible extensions with social tagging as well
• works fairly well w/ shelving
Cons
• not as richly descriptive due to having only a few levels in the taxonomy
• since the taxonomy is confined to a certain number of levels, I don't really know how to implement this right now (I can research)