Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products

Jianfu Chen, David S. Warren
Stony Brook University

Post on 06-Jan-2016

Page 1: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products

Jianfu Chen, David S. Warren

Stony Brook University

Page 2: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Classification is a fundamental problem in information management.

Two examples:

• Email content: Spam or Ham
• Product description: a category in the UNSPSC taxonomy

UNSPSC taxonomy levels, with example categories (codes in parentheses):

• Segment: Food Beverage and Tobacco Products (50); Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44)
• Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
• Class: Product and material transport vehicles (16); Passenger motor vehicles (15); Safety and rescue vehicles (17)
• Commodity: Limousines (06); Automobiles or cars (03); Buses (02)

Page 3: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

How should we design a classifier for a given real world task?

Page 4: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Method 1. No Design

Training Set → f(x) → Test Set

Try off-the-shelf classifiers:

• SVM
• Logistic regression
• Decision tree
• Neural network
• ...

Implicit assumption: we are trying to minimize the error rate or, equivalently, maximize accuracy.

Page 5: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

What’s the use of the classifier?

How do we evaluate the performance of a classifier according to our interests?

Method 2. Optimize what we really care about

Quantify what we really care about

Optimize what we care about

Page 6: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Hierarchical classification of commercial products

Textual product description → UNSPSC

UNSPSC taxonomy levels, with example categories (codes in parentheses):

• Segment: Food Beverage and Tobacco Products (50); Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44)
• Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)
• Class: Product and material transport vehicles (16); Passenger motor vehicles (15); Safety and rescue vehicles (17)
• Commodity: Limousines (06); Automobiles or cars (03); Buses (02)

Page 7: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Product taxonomy helps customers to find desired products quickly.

• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis

Looking for gift ideas for a kid?

Toys & Games: dolls, puzzles, building toys, ...

Page 8: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

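We assume misclassification of products leads to revenue loss.

[Figure: the textual product description of a mouse is assigned to a category in the product taxonomy. Product → ... → Desktop computer and accessories → mouse, keyboard, ...; another branch leads to pet.]

• Classified correctly: the vendor realizes an expected annual revenue.
• Misclassified: the vendor loses part of the potential revenue.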

Page 9: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

What do we really care about?

A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss

Page 10: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Observation 1: the misclassification cost of a product depends on its potential revenue.

Page 11: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.

[Figure: product taxonomy for the textual product description of a mouse. Product → ... → Desktop computer and accessories → mouse, keyboard, ...; another branch leads to pet.]

Page 12: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

The proposed performance evaluation metric: average revenue loss

• The example weight v(x) is the potential annual revenue of product x.

• The error function L_{yy’} is the loss ratio:
– the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y’;
– a monotonically non-decreasing function of the hierarchical distance between y and y’, f(d(y, y’)).

d(y, y’)      0    1    2    3    4
loss ratio    0    0.2  0.4  0.6  0.8

The revenue loss of product x is v(x) · L_{yy’}.
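The metric can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: class labels are taken to be 8-digit UNSPSC codes (two digits per level), d(y, y’) is assumed to be the number of levels below the deepest common ancestor, and the function names are hypothetical. The loss ratios are the ones listed in the table on this slide.

```python
from typing import Sequence

# Loss ratio f(d) for d = 0..4, copied from the slide's table.
LOSS_RATIO = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}


def hier_distance(y: str, y_pred: str) -> int:
    """Assumed d(y, y'): levels below the deepest common ancestor (0..4)."""
    if len(y) != 8 or len(y_pred) != 8:
        raise ValueError("expected 8-digit UNSPSC codes")
    for level in range(4):  # segment, family, class, commodity
        if y[2 * level:2 * level + 2] != y_pred[2 * level:2 * level + 2]:
            return 4 - level
    return 0


def average_revenue_loss(revenues: Sequence[float],
                         y_true: Sequence[str],
                         y_pred: Sequence[str]) -> float:
    """Mean over products of v(x) * f(d(y, y'))."""
    losses = [v * LOSS_RATIO[hier_distance(y, p)]
              for v, y, p in zip(revenues, y_true, y_pred)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical revenues; 25101503 and 25101502 follow the UNSPSC example.
    revenues = [1000.0, 2000.0]
    y_true = ["25101503", "25101503"]  # Automobiles or cars
    y_pred = ["25101502", "50101503"]  # Buses; a made-up code in segment 50
    print(average_revenue_loss(revenues, y_true, y_pred))
```

Confusing a car with a bus (distance 1) costs 20% of the product's revenue under this table, while placing it in a different segment (distance 4) costs 80%.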

Page 13: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Learning – minimizing average revenue loss

Minimize convex upper bound
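One standard way to obtain such a bound is sketched below, using the notation of the margin re-scaling slides that follow ($\theta_y$ is the weight vector of class $y$, and $L(x_i, y_i, y')$ is the misclassification cost). The prediction rule $\hat{y}_i = \arg\max_{y'} \theta_{y'}^T x_i$ is an assumption, since the transcript does not state it.

```latex
% Assumed prediction rule: \hat{y}_i = \arg\max_{y'} \theta_{y'}^T x_i
\begin{align*}
L(x_i, y_i, \hat{y}_i)
  &\le L(x_i, y_i, \hat{y}_i) + \theta_{\hat{y}_i}^T x_i - \theta_{y_i}^T x_i
     && \text{since } \theta_{\hat{y}_i}^T x_i \ge \theta_{y_i}^T x_i \\
  &\le \max_{y'} \left[ L(x_i, y_i, y') + \theta_{y'}^T x_i - \theta_{y_i}^T x_i \right]
\end{align*}
```

The right-hand side is a maximum of functions that are affine in $\theta$, so it is convex in $\theta$ even though the loss of the predicted class is not.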

Page 14: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Multi-class SVM with margin re-scaling

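$$\min_{\theta,\,\xi}\ \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i$$

$$\text{s.t.}\quad \forall i,\ \forall y':\ \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') - \xi_i, \qquad \xi_i \ge 0$$

Here $\theta_{y_i}^T x_i$ is the score of the true class $y_i$ and $\theta_{y'}^T x_i$ is the score of another class $y'$.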
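To make the constraints concrete, the sketch below evaluates the objective for a fixed parameter setting. It is a minimal illustration, not the authors' solver: it assumes a linear model with one weight vector per class, and it uses the fact that for fixed weights the smallest feasible slack of example i is the largest constraint violation, clipped at zero. All function names and the toy values are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
# loss(i, y_true, y_candidate) -> cost; expected to be 0 when the classes match.
LossFn = Callable[[int, int, int], float]


def score(theta_y: Vector, x: Vector) -> float:
    """Linear score theta_y^T x of one class."""
    return sum(t * v for t, v in zip(theta_y, x))


def slack(theta: List[Vector], x: Vector, i: int, y: int, loss: LossFn) -> float:
    """Smallest xi_i >= 0 satisfying every margin constraint of example i."""
    true_score = score(theta[y], x)
    violations = [loss(i, y, k) + score(theta[k], x) - true_score
                  for k in range(len(theta)) if k != y]
    return max(0.0, max(violations, default=0.0))


def objective(theta: List[Vector], xs: List[Vector], ys: List[int],
              loss: LossFn, C: float) -> float:
    """0.5 * ||theta||^2 + (C / m) * sum_i xi_i at the smallest feasible slacks."""
    m = len(xs)
    reg = 0.5 * sum(t * t for theta_y in theta for t in theta_y)
    total_slack = sum(slack(theta, x, i, y, loss)
                      for i, (x, y) in enumerate(zip(xs, ys)))
    return reg + C / m * total_slack


if __name__ == "__main__":
    theta = [[1.0, 0.0], [0.0, 1.0]]   # two classes, two features (toy values)
    xs, ys = [[0.0, 1.0]], [0]         # one example whose true class is 0
    cost = lambda i, y, k: 0.0 if y == k else 2.0  # toy constant cost
    print(slack(theta, xs[0], 0, ys[0], cost))
    print(objective(theta, xs, ys, cost, C=1.0))
```

Margin re-scaling means the required margin between the true class and a competing class grows with the cost of confusing them, so costly mistakes are penalized more heavily than cheap ones.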

Page 15: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Loss function    What it measures
0-1              error rate (standard multi-class SVM)
VALUE            product revenue
TREE             hierarchical distance
REVLOSS          revenue loss

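Multi-class SVM with margin re-scaling

$$\min_{\theta,\,\xi}\ \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i$$

$$\text{s.t.}\quad \forall i,\ \forall y':\ \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') - \xi_i, \qquad \xi_i \ge 0$$

Plug in any loss function $L(x_i, y_i, y')$.

The objective is a convex upper bound of the average loss.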
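The four plug-in losses can be written as small functions of a product's revenue and the hierarchical distance d between the true and the candidate class. The transcript gives only their names and one-line descriptions, so the exact forms below are assumptions (in particular, TREE is taken to be the distance itself and VALUE a flat revenue penalty per error); the loss ratios come from the slide that defines average revenue loss.

```python
# Loss ratio f(d) from the slide defining average revenue loss.
LOSS_RATIO = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8}


def zero_one(revenue: float, d: int) -> float:
    """0-1: every error costs 1 (standard multi-class SVM)."""
    return 1.0 if d > 0 else 0.0


def tree(revenue: float, d: int) -> float:
    """TREE (assumed form): the hierarchical distance itself."""
    return float(d)


def value(revenue: float, d: int) -> float:
    """VALUE (assumed form): every error costs the product's revenue."""
    return revenue if d > 0 else 0.0


def revloss(revenue: float, d: int) -> float:
    """REVLOSS: revenue times the loss ratio, v(x) * f(d)."""
    return revenue * LOSS_RATIO[d]


if __name__ == "__main__":
    # Hypothetical product with 500 K$ revenue, misclassified at distance 4.
    for fn in (zero_one, tree, value, revloss):
        print(fn.__name__, fn(500_000.0, 4))
```

Under these assumed forms, the revenue-based losses are several orders of magnitude larger than the 0-1 and TREE losses for the same mistake.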

Page 16: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Dataset

• UNSPSC (United Nations Standard Products and Services Code) dataset

data source           multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)
taxonomy structure    4-level balanced tree (UNSPSC taxonomy)
#examples             1.4M
#leaf classes         1073

• Product revenues are simulated
– revenue = price * sales

Page 17: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Experimental results

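[Figure: bar chart of the average revenue loss (in K$) of different algorithms. Loss functions on the x-axis: 0-1, TREE, VALUE, REVLOSS; series: IDENTITY, UNIT; y-axis from 0 to 60. Data labels in extraction order: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082.]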

Page 18: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

What’s wrong?

$v(x_i) \cdot L_{y_i y'}$

Revenue loss ranges from a few K to several M

Page 19: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Loss normalization

• Linearly scale the loss function to a fixed range, say [1, 10].

The objective now upper bounds both 0-1 loss and the average normalized loss.
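The transcript does not give the scaling formula, so the following is a minimal sketch under stated assumptions: min-max scaling of the positive losses observed in training onto [1, 10], with a loss of 0 (correct prediction) left at 0. The function name is hypothetical. Because every error then costs at least 1, the normalized loss is never smaller than the 0-1 loss, which is consistent with the statement above.

```python
from typing import Callable, Iterable


def make_range_normalizer(losses: Iterable[float],
                          lo: float = 1.0,
                          hi: float = 10.0) -> Callable[[float], float]:
    """Build a map sending positive losses linearly onto [lo, hi].

    Assumption: the minimum and maximum are taken over the positive
    losses seen in training; a loss of 0 (correct prediction) stays 0.
    """
    positive = [l for l in losses if l > 0]
    if not positive:
        raise ValueError("need at least one positive loss")
    l_min, l_max = min(positive), max(positive)
    span = l_max - l_min

    def normalize(loss: float) -> float:
        if loss <= 0:
            return 0.0
        if span == 0:
            return lo
        return lo + (hi - lo) * (loss - l_min) / span

    return normalize


if __name__ == "__main__":
    # Hypothetical revenue losses in dollars, from a few K to several M.
    normalize = make_range_normalizer([2_000.0, 50_000.0, 3_000_000.0])
    for loss in (0.0, 2_000.0, 50_000.0, 3_000_000.0):
        print(loss, normalize(loss))
```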

Page 20: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

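Final results

[Figure: bar chart of the average revenue loss (in K$) of different algorithms. Loss functions on the x-axis: 0-1, TREE, VALUE, REVLOSS; series: IDENTITY, UNIT, RANGE; y-axis from 0 to 60. Data labels in extraction order: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082, 4.387, 4.371.]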

7.88% reduction in average revenue loss!
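The 7.88% figure can be reproduced from two of the chart's data labels. The transcript does not say which bars are compared, so this sketch assumes the baseline is the 4.745 label and the best result is the 4.371 label; the function name is hypothetical.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Data labels taken from the chart; which bars they belong to is assumed.
    print(f"{relative_reduction(4.745, 4.371):.2%}")  # prints 7.88%
```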

Page 21: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Conclusion

What do we really care about for this task? Minimize error rate? Minimize revenue loss?

• Performance evaluation metric: empirical risk, the average misclassification cost
• Model + tractable loss function: how do we approximate the performance evaluation metric to make it tractable? Regularized empirical risk minimization
• Optimization: find the best parameters

A general method: multi-class SVM with margin re-scaling and loss normalization

Page 22: Cost-Sensitive Learning  for Large-Scale  Hierarchical Classification  of Commercial Products

Thank you!

Questions?