Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products. Jianfu Chen, David S. Warren, Stony Brook University

Post on 16-Dec-2015

Page 1

Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products

Jianfu Chen, David S. Warren

Stony Brook University

Page 2

Classification is a fundamental problem in information management.

Examples: email content is classified as Spam or Ham; a product description is classified into the UNSPSC taxonomy.

UNSPSC taxonomy (excerpt), by level:

Segment: Food Beverage and Tobacco Products (50); Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44)

Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)

Class: Product and material transport vehicles (16); Passenger motor vehicles (15); Safety and rescue vehicles (17)

Commodity: Limousines (06); Automobiles or cars (03); Buses (02)

Page 3

How should we design a classifier for a given real-world task?

Page 4

Method 1. No Design

Training Set → f(x) → Test Set

Try off-the-shelf classifiers:

• SVM
• Logistic regression
• Decision tree
• Neural network
• ...

Implicit Assumption: We are trying to minimize error rate, or equivalently, maximize accuracy

Page 5

What’s the use of the classifier?

How do we evaluate the performance of a classifier according to our interests?

Method 2. Optimize what we really care about

Quantify what we really care about

Optimize what we care about

Page 6

Hierarchical classification of commercial products

Textual product description → UNSPSC category

UNSPSC taxonomy (excerpt), by level:

Segment: Food Beverage and Tobacco Products (50); Vehicles and their Accessories and Components (25); Office Equipment and Accessories and Supplies (44)

Family: Marine transport (11); Motor vehicles (10); Aerospace systems (20)

Class: Product and material transport vehicles (16); Passenger motor vehicles (15); Safety and rescue vehicles (17)

Commodity: Limousines (06); Automobiles or cars (03); Buses (02)

Page 7

Product taxonomy helps customers to find desired products quickly.

• Facilitates exploring similar products
• Helps product recommendation
• Facilitates corporate spend analysis

Looking for gift ideas for a kid?

Toys & Games: dolls, building toys, puzzles, ...

Page 8

We assume misclassification of products leads to revenue loss.

[Figure: the textual product description of a mouse is placed in the product taxonomy (Product > ... > Desktop computer and accessories > mouse, keyboard, ...; Product > ... > pet > ...). Outcome labels: "realize an expected annual revenue" and "lose part of the potential revenue".]

Page 9

What do we really care about?

A vendor’s business goal is to maximize revenue, or equivalently, minimize revenue loss

Page 10

Observation 1: the misclassification cost of a product depends on its potential revenue.

Page 11

Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.

[Figure: the textual product description of a mouse placed in the product taxonomy (Product > ... > Desktop computer and accessories > mouse, keyboard, ...; Product > ... > pet > ...).]

Page 12

The proposed performance evaluation metric: average revenue loss

• example weight v(x) is the potential annual revenue of product x

• error function L_{yy'} is the loss ratio
  – the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y'
  – a monotonically non-decreasing function of the hierarchical distance between y and y', f(d(y, y'))

d(y, y')      0     1     2     3     4
loss ratio    0     0.2   0.4   0.6   0.8

revenue loss of product x: v(x) · L_{yy'}
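The metric above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it assumes classes are UNSPSC-style 8-digit codes with two digits per level, that d(y, y') counts the levels from the leaf up to the lowest common ancestor (so it ranges from 0 to 4), and that f(d) = 0.2 · d as in the table. The function names are hypothetical.

```python
def tree_distance(y: str, y_pred: str, level_width: int = 2) -> int:
    """Levels from the leaf up to the lowest common ancestor of two codes.

    Assumption: UNSPSC-style codes, `level_width` digits per taxonomy level
    (segment, family, class, commodity), so the result is in 0..4.
    """
    if len(y) != len(y_pred):
        raise ValueError("codes must have the same length")
    depth = len(y) // level_width
    shared = 0
    for k in range(depth):
        lo, hi = k * level_width, (k + 1) * level_width
        if y[lo:hi] != y_pred[lo:hi]:
            break
        shared += 1
    return depth - shared


def loss_ratio(d: int) -> float:
    """Loss ratio f(d) matching the slide's table: 0, 0.2, 0.4, 0.6, 0.8."""
    return 0.2 * d


def average_revenue_loss(revenues, y_true, y_pred) -> float:
    """Mean over products of v(x) * f(d(y, y'))."""
    total = sum(
        v * loss_ratio(tree_distance(y, yp))
        for v, y, yp in zip(revenues, y_true, y_pred)
    )
    return total / len(revenues)


if __name__ == "__main__":
    # Made-up revenues and predictions, for illustration only.
    revenues = [1000.0, 500.0, 2000.0]
    y_true = ["25101503", "25101503", "50000000"]
    y_pred = ["25101503", "25101502", "25101503"]
    print(average_revenue_loss(revenues, y_true, y_pred))
```

In this toy input the three products sit at distances 0, 1 and 4 from their predicted classes, so only the second and third contribute to the loss.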

Page 13

Learning – minimizing average revenue loss

Minimize convex upper bound
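The equations for this slide did not survive extraction. As a hedged reconstruction of the usual margin re-scaling argument (not necessarily the exact form the authors showed), write \hat{y}_i = \arg\max_{y} \theta_y^T x_i for the predicted class and L(x_i, y_i, y') for the loss of predicting y' when the truth is y_i. Because the predicted class has the highest score, the loss actually incurred is bounded by a maximum of functions that are affine in \theta, which is convex:

```latex
\begin{aligned}
L(x_i, y_i, \hat{y}_i)
  &\le L(x_i, y_i, \hat{y}_i) + \theta_{\hat{y}_i}^T x_i - \theta_{y_i}^T x_i
  && \text{since } \theta_{\hat{y}_i}^T x_i \ge \theta_{y_i}^T x_i \\
  &\le \max_{y'} \Big[ L(x_i, y_i, y') + \theta_{y'}^T x_i - \theta_{y_i}^T x_i \Big]
\end{aligned}
```

Averaging the right-hand side over the m training examples gives a convex surrogate for the average loss.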

Page 14

Multi-class SVM with margin re-scaling

\[
\min_{\theta,\,\xi}\ \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i
\]

\[
\text{s.t.}\quad \forall i,\ \forall y':\ \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ \ge\ L(x_i, y_i, y') - \xi_i, \qquad \xi_i \ge 0
\]
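To make the constraints concrete: at the optimum, each slack ξ_i equals the largest loss-augmented margin violation over all classes. The NumPy sketch below computes that value and takes one batch subgradient step on the objective. Plain subgradient descent is swapped in here for illustration; the slides do not say which solver was used, and the function names and the `loss_rows` layout (one row of losses per example, with a zero at the true class) are assumptions.

```python
import numpy as np


def margin_rescaled_hinge(theta, x, y, loss_row):
    """Slack for one example under margin re-scaling.

    theta: (K, D) weights, one row per class; x: (D,) features;
    y: index of the true class; loss_row: (K,) losses L(x, y, y') with
    loss_row[y] == 0. Returns (xi, y_star), where y_star is the most
    violating class.
    """
    scores = theta @ x
    augmented = loss_row + scores - scores[y]  # equals 0 at y' == y
    y_star = int(np.argmax(augmented))
    return float(augmented[y_star]), y_star


def subgradient_step(theta, X, Y, loss_rows, C=1.0, lr=0.1):
    """One batch subgradient step on 1/2 ||theta||^2 + (C/m) * sum_i xi_i."""
    m = X.shape[0]
    grad = theta.copy()  # gradient of the regularizer
    for x, y, loss_row in zip(X, Y, loss_rows):
        xi, y_star = margin_rescaled_hinge(theta, x, y, loss_row)
        if xi > 0:  # constraint violated: push y_star down and y up
            grad[y_star] += (C / m) * x
            grad[y] -= (C / m) * x
    return theta - lr * grad
```

Because the loss enters only through `loss_row`, swapping in a different loss changes the required margin without touching the optimizer.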

Page 15

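Multi-class SVM with margin re-scaling

\[
\min_{\theta,\,\xi}\ \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i
\]

\[
\text{s.t.}\quad \forall i,\ \forall y':\ \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ \ge\ L(x_i, y_i, y') - \xi_i, \qquad \xi_i \ge 0
\]

Plug in any loss function L(x_i, y_i, y'):

• 0-1: error rate (standard multi-class SVM)
• VALUE: product revenue
• TREE: hierarchical distance
• REVLOSS: revenue loss

The objective is a convex upper bound of the average loss on the training set.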
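The four loss choices can be written as one small function. This is a sketch under assumptions: the slides name the losses but do not give their formulas, so the definitions below (TREE as the raw distance d, REVLOSS as v(x) · 0.2 · d following the loss-ratio table) are one plausible reading, and `loss_value` is a hypothetical name.

```python
def loss_value(kind: str, v: float, d: int) -> float:
    """Loss for predicting a class at hierarchical distance d from the truth.

    v is the product's potential revenue; d == 0 means a correct prediction.
    The formulas are assumptions inferred from the slide labels.
    """
    if d == 0:
        return 0.0
    if kind == "0-1":
        return 1.0            # every mistake costs the same
    if kind == "VALUE":
        return float(v)       # cost depends only on product revenue
    if kind == "TREE":
        return float(d)       # cost depends only on hierarchical distance
    if kind == "REVLOSS":
        return v * 0.2 * d    # revenue times loss ratio f(d) = 0.2 * d
    raise ValueError(f"unknown loss kind: {kind}")
```

For a product with revenue 1000 misclassified at distance 2, these give 1, 1000, 2 and 400 respectively, which shows how differently the four losses scale.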

Page 16

Dataset

• UNSPSC (United Nations Standard Products and Services Code) dataset

• Product revenues are simulated
  – revenue = price * sales

data source: multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL)

taxonomy structure: 4-level balanced tree (UNSPSC taxonomy)

#examples: 1.4M

#leaf classes: 1073

Page 17

Experimental results

[Bar chart: average revenue loss (in K$) of different algorithms. x-axis: 0-1, TREE, VALUE, REVLOSS; y-axis: 0 to 60; series: IDENTITY, UNIT. Bar labels as extracted: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082.]

Page 18

What’s wrong?

\[ v(x_i) \cdot L_{y_i y'} \]

Revenue loss ranges from a few K to several M

Page 19

Loss normalization

• Linearly scale the loss function to a fixed range, say [1, 10]

The objective now upper bounds both 0-1 loss and the average normalized loss.
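A minimal sketch of this normalization, assuming min-max scaling over the nonzero losses while a correct prediction keeps loss 0 (the slide's own scaling formula was not captured, so the details here are an assumption). Once every mistake costs at least 1, the scaled loss dominates the 0-1 loss, which is why the objective bounds both quantities.

```python
import numpy as np


def normalize_losses(loss, lo=1.0, hi=10.0):
    """Linearly map the nonzero entries of `loss` onto [lo, hi].

    Zeros (correct predictions) stay zero. Assumption: min-max scaling over
    all nonzero entries; `normalize_losses` is a hypothetical helper name.
    """
    loss = np.asarray(loss, dtype=float)
    out = np.zeros_like(loss)
    mask = loss > 0
    if not mask.any():
        return out
    l_min, l_max = loss[mask].min(), loss[mask].max()
    if l_max == l_min:
        out[mask] = lo  # all mistakes cost the same: map them to the lower end
    else:
        out[mask] = lo + (loss[mask] - l_min) * (hi - lo) / (l_max - l_min)
    return out
```

For example, raw losses of 0, 100, 550 and 1000 map to 0, 1, 5.5 and 10, so a loss spread of several orders of magnitude is compressed into a single decade.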

Page 20

Final results

[Bar chart: average revenue loss (in K$) of different algorithms. x-axis: 0-1, TREE, VALUE, REVLOSS; y-axis: 0 to 60; series: IDENTITY, UNIT, RANGE. Bar labels as extracted: 4.745, 4.964, 47.708, 48.082, 5.092, 5.082, 4.387, 4.371.]

7.88% reduction in average revenue loss!
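One reading that reproduces the headline figure is the drop from the bar labelled 4.745 to the bar labelled 4.371. Which algorithm each bar belongs to cannot be recovered from the transcript, so the pairing below is an assumption; the arithmetic itself can be checked directly:

```python
# Bar labels taken from the chart; treating them as "baseline" and "best"
# is an assumption about which bars the 7.88% compares.
baseline = 4.745  # average revenue loss in K$
best = 4.371      # average revenue loss in K$

reduction = (baseline - best) / baseline
print(f"{reduction:.2%}")
```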

Page 21

Conclusion

Performance evaluation metric
• What do we really care about for this task? Minimize error rate? Minimize revenue loss?
• empirical risk, average misclassification cost

Model + Tractable loss function
• How do we approximate the performance evaluation metric to make it tractable?
• regularized empirical risk minimization

Optimization
• Find the best parameters

A general method: multi-class SVM with margin re-scaling and loss normalization

Page 22

Thank you!

Questions?