
Page 1:

Experience with Simple Approaches

Wei Fan‡ Erheng Zhong† Sihong Xie† Yuzhao Huang† Kun Zhang$

Jing Peng# Jiangtao Ren†

‡IBM T. J. Watson Research Center    †Sun Yat-sen University

$Xavier University of Louisiana    #Montclair State University

Page 2:

RDT: Random Decision Tree (Fan et al. '03)

“Encoding data” in trees: at each node, an unused feature is chosen at random.

A discrete feature is unused if it has never been chosen on the decision path from the root to the current node.

A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is drawn.

Stop when one of the following happens: the node becomes too small, all of its examples belong to the same class, or the total height of the tree exceeds a preset limit (a minimal sketch of these rules follows).
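A minimal sketch of these growing rules, assuming examples are dicts of feature values; the function name, defaults, and node layout are illustrative and not taken from the RDT paper.

import random

def grow_rdt(X, y, features, depth=0, max_depth=10, min_size=4):
    """Grow one random decision tree following the rules above.

    X: list of dicts {feature name -> value}; y: parallel list of class labels.
    features: list of (name, kind) pairs with kind in {"discrete", "continuous"}.
    """
    # Stop: node too small, all examples share one class, height limit reached,
    # or no feature left to split on.
    if len(y) < min_size or len(set(y)) <= 1 or depth >= max_depth or not features:
        return {"leaf": True, "counts": {c: y.count(c) for c in set(y)}}

    name, kind = random.choice(features)            # feature chosen at random
    if kind == "discrete":
        # A discrete feature may appear only once on any root-to-node path.
        child_feats = [f for f in features if f[0] != name]
        go_left = lambda row: row[name] == 0
        node = {"feature": name, "test": "== 0"}
    else:
        # A continuous feature can recur, each time with a fresh random threshold.
        child_feats = features
        lo, hi = min(r[name] for r in X), max(r[name] for r in X)
        t = random.uniform(lo, hi)
        go_left = lambda row: row[name] < t
        node = {"feature": name, "test": "< %.3f" % t}

    left = [i for i, r in enumerate(X) if go_left(r)]
    right = [i for i, r in enumerate(X) if not go_left(r)]
    node["left"] = grow_rdt([X[i] for i in left], [y[i] for i in left],
                            child_feats, depth + 1, max_depth, min_size)
    node["right"] = grow_rdt([X[i] for i in right], [y[i] for i in right],
                             child_feats, depth + 1, max_depth, min_size)
    return node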

Page 3:

Illustration of RDT

[Figure: an example random tree over three features B1: {0,1}, B2: {0,1}, B3: continuous. The root tests B1 == 0 (B1 chosen randomly); a child node tests B2 == 0 (B2 chosen randomly); B3 (chosen randomly) appears in two nodes with different random thresholds, B3 < 0.3 and B3 < 0.6.]

Page 4:

Probabilistic view of decision trees - PETs

Given an example x, the leaf L that x falls into supplies the class-membership probabilities, e.g. in C4.5 and CART:

P(y_i|x,θ) = N_i / N_L

where N_i is the number of training examples of class y_i at leaf L and N_L is the total number of examples at that leaf.

• confidences in the predicted labels
• the dependence of P(y|x,θ) on θ is non-trivial

For example:

[Figure: the iris tree. The root split Petal.Length < 2.45 gives a pure setosa leaf (50/0/0); the other branch splits on Petal.Width < 1.75, giving a versicolor leaf (0/49/5) and a virginica leaf (0/1/45).]

For an x reaching the versicolor leaf (counts 0/49/5):

P(setosa|x,θ) = 0
P(versicolor|x,θ) = 49/54
P(virginica|x,θ) = 5/54
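A minimal worked version of that frequency estimate, using the leaf counts from the slide (0/49/5); the function name leaf_probabilities is just illustrative.

def leaf_probabilities(counts):
    """P(y_i|x,theta) = N_i / N_L, from the class counts at the leaf x reaches."""
    n_leaf = sum(counts.values())
    return {label: n / n_leaf for label, n in counts.items()}

# Versicolor leaf from the slide: 0/49/5 out of 54 examples.
print(leaf_probabilities({"setosa": 0, "versicolor": 49, "virginica": 5}))
# {'setosa': 0.0, 'versicolor': 0.907..., 'virginica': 0.0926...}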

Page 5:

Problems of probability estimation via conventional DTs

1. Probability estimates tend to approach the extremes of 1 and 0.

2. Additional inaccuracies result from the small number of examples at a leaf.

3. The same probability is assigned to the entire region of space defined by a given leaf.

C4.4 (Provost '03), BC44 (Zhang '06), RDT (Fan '03)
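Problems 1 and 2 are usually tackled by smoothing the leaf estimate; C4.4 is commonly described as using a Laplace correction. A minimal sketch of that idea, not the exact implementation from the cited papers:

def laplace_probabilities(counts):
    """Laplace-corrected leaf estimate (N_i + 1) / (N_L + C), C = number of classes.
    Pulls small-leaf estimates away from the extremes of 0 and 1 (problems 1-2)."""
    n_leaf, n_classes = sum(counts.values()), len(counts)
    return {label: (n + 1) / (n_leaf + n_classes) for label, n in counts.items()}

# A tiny pure leaf: the raw frequency estimate would be 1.0 vs 0.0.
print(laplace_probabilities({"pos": 2, "neg": 0}))   # {'pos': 0.75, 'neg': 0.25}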

Page 6:

bRDT

“bRDT” is the average of the RDT and BC44 predictions, where RDT is Random Decision Tree and BC44 is Bagged C4.4.

RDT: pr(y|x)

BC44: pb(y|x)

bRDT: [pr(y|x) + pb(y|x)] / 2
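A minimal sketch of that combination rule, assuming both ensembles expose their posteriors as dicts keyed by class label (names are illustrative).

def brdt_posterior(p_rdt, p_bc44):
    """bRDT: unweighted average of the RDT and Bagged C4.4 posteriors."""
    return {y: (p_rdt[y] + p_bc44[y]) / 2 for y in p_rdt}

print(brdt_posterior({"pos": 0.9, "neg": 0.1}, {"pos": 0.6, "neg": 0.4}))
# {'pos': 0.75, 'neg': 0.25}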

Page 7:

Sampling strategy for Tasks 1 & 2

For station Z, negative instances are partitioned into “blocks” such that the size of each block is approximately three times that of the positive set (a sketch follows the figure below).

[Figure: the positive instances, and the negative instances partitioned into Block 1 … Block n.]
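A minimal sketch of that partitioning, assuming the positives and negatives are plain Python lists and a 3:1 negative-to-positive block size; the shuffle and the function name are assumptions, not from the slides.

import random

def negative_blocks(negatives, positives, ratio=3, seed=0):
    """Split the negative set into blocks of roughly ratio * |positives| instances each."""
    negatives = list(negatives)
    random.Random(seed).shuffle(negatives)   # assumed: block membership is arbitrary
    size = max(1, ratio * len(positives))
    return [negatives[i:i + size] for i in range(0, len(negatives), size)]

# Example: 4 positives and 100 negatives give 9 blocks of up to 12 negatives each.
blocks = negative_blocks(list(range(100)), list(range(4)))
print(len(blocks), [len(b) for b in blocks[:3]])   # 9 [12, 12, 12]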

Page 8:

Tasks 1 & 2 - Results

For station V, rows 2 and 3 correspond to tasks 1 and 2. The optimal classifiers for tasks 1 and 2 are the same for stations W, X, Y, and Z, so there is only one row for these four stations.

Page 9:

Task 1 - ROC

[Figure: ROC curves for stations V, W, X, Y, Z.]

Page 10:

Task 2 - ROC

[Figure: ROC curves for stations V, W, X, Y, Z.]

Page 11:

Task 3 – Feature Expansion

Each feature X is expanded to (X, X², ln(X+1)).

Example: three instances with a single feature; A and B are positive while C is negative: A(0.9), B(1.0), C(1.1).

Distance(A, B) = Distance(B, C): 0.01 vs. 0.01 (squared Euclidean), so the raw feature cannot separate them.

After expansion: A(0.9, 0.81, 0.64), B(1.0, 1.0, 0.69), C(1.1, 1.21, 0.74).

Distance(A, B) < Distance(B, C): 0.049 vs. 0.056, so the two positives are now closer to each other than to the negative.
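A short check of those numbers, expanding each value x to (x, x², ln(x+1)) and comparing squared Euclidean distances; the helper names are just for illustration.

import math

def expand(x):
    """Task 3 style expansion of a single feature value."""
    return (x, x * x, math.log(x + 1))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

A, B, C = 0.9, 1.0, 1.1
print(round(sq_dist((A,), (B,)), 3), round(sq_dist((B,), (C,)), 3))   # 0.01 vs 0.01: tie
print(round(sq_dist(expand(A), expand(B)), 3),
      round(sq_dist(expand(B), expand(C)), 3))                        # 0.049 vs 0.056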

Page 12:

Task 3 – Result of test 3

Parameter-free