A Framework for Modeling Positive
Class Expansion with Single Snapshot
Yang Yu and Zhi-Hua Zhou
LAMDA Group
National Key Laboratory for Novel Software Technology
Nanjing University, China
Motivating task
[Figure: evolution of the mobile telecom network, from 1G through 2G to 3G]
We are at the moment of moving towards 3G.
Task: predict the 2G users who will switch to 3G.
Analysis of the task
[Figure: timeline of events (2G starts, 2G dominates, 3G starts, 3G dominates) with the class distribution shifting along it; the model is trained at one point on the timeline, but what we want to predict lies later]
This is the positive class expansion with single snapshot (PCES) problem.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Formulation of classical learning
• i.i.d. instances
• a training set $D = \{x_i\}_{i=1}^{n}$ drawn from a distribution $\mathcal{D}$
• a fixed labeling function $p(y \mid x)$

A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p(y \mid x))$ to minimize:

$$err = \mathbb{E}_{x \sim \mathcal{D}}\, L\big(\hat{f}(x; D, p(y \mid x)),\ p(y \mid x)\big)$$

This cannot model a changing labeling function.
Formulation of PCES
• labeling function at training time: $p_{tr}(y \mid x)$
• labeling function at testing time: $p_{te}(y \mid x)$

A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p_{tr}(y \mid x))$ to minimize:

$$err = \mathbb{E}_{x \sim \mathcal{D}}\, L\big(\hat{f}(x; D, p_{tr}(y \mid x)),\ p_{te}(y \mid x)\big)$$

with a constraint that the target class only expands:

$$\forall x:\ p_{te}(y_t \mid x) \ge p_{tr}(y_t \mid x)$$

For convenience, we assume $y \in \{-1, +1\}$, so the constraint reads:

$$\forall x:\ p_{te}(y = +1 \mid x) \ge p_{tr}(y = +1 \mid x)$$
Another example
positive class: hot items
negative class: not hot items
The positive class is expanding, and only one snapshot is available: the PCES problem.
Further example
positive class: hot items
negative class: not hot items
The positive class keeps expanding, and still only one snapshot is available: the PCES problem.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Related learning frameworks
• PU-Learning (learning with positive and unlabeled data)
• Concept drift
• Covariate shift
PU-Learning
Setting:
only positive instances and unlabeled instances are in
the training data
Assumption:
the positive instances are representatives of the positive
class concept [Liu et al, ICML02][Yu et al, KDD02]
PCES: the positive class is expanding;
PU-learning cannot capture the expanded class concept
Concept Drift
Setting:
instances arrive sequentially, batch by batch;
the target concept may change in the coming batch
Assumption:
a series of data samples are available for drift detection [Klinkenberg & Joachims, ICDM00][Kolter & Maloof, ICML03]
PCES: only a single snapshot is available,
so concept drift approaches do not apply
Covariate Shift
(or sample selection bias [Shimodaira, JSPI00])
Setting:
training and test instances are drawn from different
distributions, i.e., $p(x)$ changes
Assumption:
the labeling function $p(y \mid x)$ is fixed
PCES: $p(x)$ is fixed but $p(y \mid x)$ changes,
so covariate shift approaches do not apply
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
The proposed approach
[Diagram: "learn from pure data" and "incorporate preference bias" feed a combined objective, optimized by SGBDota]
Learn from pure data
Observation:
a desired learner ranks positive training instances higher
than negative training instances

This is exactly expressed by the AUC (area under the ROC curve) criterion, written as a loss over all (positive, negative) pairs:

$$L_{auc}(f) = 1 - \frac{1}{|D^{+}|\,|D^{-}|} \sum_{x_a \in D^{+}} \sum_{x_b \in D^{-}} I\big(f(x_a) > f(x_b)\big)$$
Learn from pure data
smoothed loss function:

$$L_{auc}(f) = \frac{1}{|D^{+}|\,|D^{-}|} \sum_{x_a \in D^{+}} \sum_{x_b \in D^{-}} \frac{1}{1 + e^{\,f(x_a) - f(x_b)}}$$

instance-wise loss function:

$$L_{auc}(f, x) = \begin{cases} \dfrac{1}{|D^{-}|} \displaystyle\sum_{x_b \in D^{-}} \dfrac{1}{1 + e^{\,f(x) - f(x_b)}}, & x \in D^{+} \\[8pt] \dfrac{1}{|D^{+}|} \displaystyle\sum_{x_a \in D^{+}} \dfrac{1}{1 + e^{\,f(x_a) - f(x)}}, & x \in D^{-} \end{cases}$$
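A minimal NumPy sketch of the smoothed pairwise loss above (function and variable names are ours, not from the paper; labels assumed to be +1/-1):

import numpy as np

def smoothed_auc_loss(scores, labels):
    """Smoothed AUC loss: mean sigmoid-style penalty over all
    (positive, negative) score pairs; labels are +1 / -1."""
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    diff = pos[:, None] - neg[None, :]        # f(x_a) - f(x_b) for every pair
    return np.mean(1.0 / (1.0 + np.exp(diff)))

The loss approaches 0 as every positive is scored above every negative, and 1 in the fully reversed ranking.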
Incorporate preference bias
Users can provide preferences by
• indicating preferences on randomly sampled instance pairs
• applying a priori rules that indicate the preferences

In either way, we have a preference function:

$$k(x_a, x_b) = \begin{cases} +1, & x_a \text{ is preferred} \\ -1, & x_b \text{ is preferred} \\ 0, & \text{equal or unknown} \end{cases}$$

Loss function:

$$L_{pref}(f) = \frac{1}{|D|^{2}} \sum_{x_a \in D} \sum_{x_b \in D} I\big( (f(x_a) - f(x_b))\, k(x_a, x_b) \le 0 \big)$$
Incorporate preference bias
smoothed loss function:

$$L_{pref}(f) = \frac{1}{|D|^{2}} \sum_{x_a \in D} \sum_{x_b \in D} \frac{1}{1 + e^{\,(f(x_a) - f(x_b))\, k(x_a, x_b)}}$$

instance-wise loss function:

$$L_{pref}(f, x_a) = \frac{1}{|D|} \sum_{x_b \in D} \frac{1}{1 + e^{\,(f(x_a) - f(x_b))\, k(x_a, x_b)}}$$
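A matching sketch for the smoothed preference loss, with the preference function represented as an n-by-n matrix (again our own names; the matrix encoding is an assumption):

import numpy as np

def smoothed_pref_loss(scores, k):
    """Smoothed preference loss; k[a, b] in {+1, -1, 0},
    where k[a, b] = +1 means x_a is preferred over x_b."""
    diff = scores[:, None] - scores[None, :]   # f(x_a) - f(x_b)
    n = len(scores)
    return np.sum(1.0 / (1.0 + np.exp(diff * k))) / (n * n)

Pairs with k = 0 contribute a constant 1/2, so they do not affect the optimization.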
Combine the two objectives
the combined loss function:

$$L(f) = L_{auc}(f) + L_{pref}(f)$$

the learning problem thus is:

$$\hat{f} = \arg\min_{f} L(f) = \arg\min_{f} \big( L_{auc}(f) + L_{pref}(f) \big)$$
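Reusing the two sketches above, the combined objective is just their sum (the slide shows an unweighted sum; a trade-off weight would be a straightforward extension):

def combined_loss(scores, labels, k):
    # unweighted sum of the two smoothed losses defined earlier
    return smoothed_auc_loss(scores, labels) + smoothed_pref_loss(scores, k)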
Optimization
Gradient Boosting [Friedman, AnnStat01, CSDA02]

$$f^{*} = \arg\min_{f} L\big(f(x), y\big)$$

The model is built as an additive expansion:

$$F(x) = \sum_{t=0}^{T} \rho_t\, h(x; \theta_t)$$

$$(\rho_t, \theta_t) = \arg\min_{\rho, \theta} L\big(F_{t-1} + \rho\, h(\cdot\,; \theta)\big)$$

approximated in two steps: fit a base learner to the negative gradient,

$$\theta_t = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

then do a line search:

$$\rho_t = \arg\min_{\rho} L\big(F_{t-1} + \rho\, h(\cdot\,; \theta_t)\big)$$

Gradient Boosting fits $y$, but we need to
fit both $y$ and $k$
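A compact sketch of gradient boosting on the smoothed AUC loss (our own helper names; scikit-learn regression trees as an assumed base learner, and a fixed step size standing in for the line search):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def auc_loss_gradient(scores, labels):
    """Per-instance gradient of the smoothed AUC loss."""
    pos, neg = labels == 1, labels == -1
    d = scores[pos][:, None] - scores[neg][None, :]     # f(x_a) - f(x_b)
    g = -np.exp(d) / (1.0 + np.exp(d)) ** 2 / d.size    # d(loss)/d f(x_a) per pair
    grad = np.zeros_like(scores)
    grad[pos] = g.sum(axis=1)       # positives: accumulate over negatives
    grad[neg] = -g.sum(axis=0)      # negatives: opposite sign
    return grad

def gradient_boost(X, labels, rounds=100, step=0.1):
    """Plain gradient boosting: each round fits a tree to the
    negative gradient and adds it with a fixed step size."""
    F, learners = np.zeros(len(labels)), []
    for _ in range(rounds):
        h = DecisionTreeRegressor(max_depth=3)
        h.fit(X, -auc_loss_gradient(F, labels))
        F += step * h.predict(X)
        learners.append(h)
    return learners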
Optimization with double targets
SGBDota (Stochastic Gradient Boosting with DOuble TArgets)
$$f^{*} = \arg\min_{f} \big( L_{auc}(f) + L_{pref}(f) \big)$$

Two base learners are added per round:

$$F(x) = \sum_{t=0}^{T} \big( \rho_{t,1}\, h(x; \theta_{t,1}) + \rho_{t,2}\, h(x; \theta_{t,2}) \big)$$

$$(\rho_{t,1}, \theta_{t,1}, \rho_{t,2}, \theta_{t,2}) = \arg\min L\big(F_{t-1} + \rho_1\, h(\cdot\,; \theta_1) + \rho_2\, h(\cdot\,; \theta_2)\big)$$

approximated by fitting one learner to each loss's negative gradient:

$$\theta_{t,1} = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L_{auc}(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

$$\theta_{t,2} = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L_{pref}(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

followed by a joint line search:

$$(\rho_{t,1}, \rho_{t,2}) = \arg\min_{\rho_1, \rho_2} L\big(F_{t-1} + \rho_1\, h(\cdot\,; \theta_{t,1}) + \rho_2\, h(\cdot\,; \theta_{t,2})\big)$$
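One round of this double-target update, sketched under the same assumptions as the boosting code above (auc_loss_gradient reused from there; fixed steps instead of the joint line search; the "stochastic" part of SGBDota would additionally subsample the training data each round, omitted here):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def pref_loss_gradient(scores, k):
    """Per-instance gradient of the smoothed preference loss."""
    d = scores[:, None] - scores[None, :]       # f(x_a) - f(x_b)
    e = np.exp(d * k)
    g = -k * e / (1.0 + e) ** 2 / k.size        # d(loss)/d f(x_a) per pair
    return g.sum(axis=1) - g.sum(axis=0)

def sgbdota_round(X, F, labels, k, step=0.1):
    """Fit one tree to each negative gradient, then add both."""
    h1 = DecisionTreeRegressor(max_depth=3).fit(X, -auc_loss_gradient(F, labels))
    h2 = DecisionTreeRegressor(max_depth=3).fit(X, -pref_loss_gradient(F, k))
    return F + step * h1.predict(X) + step * h2.predict(X), (h1, h2)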
SGBDota
[Diagram: "learn from pure data" and "incorporate preference bias" feed the combined objective, optimized by SGBDota]
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Data Sets
A synthetic data set + 4 UCI data sets
postoperative
segment
veteran
pbc
Evaluation method: 2/3 as training data, 1/3 as test data,
repeated over 20 random splits
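The protocol in code form (a sketch; `fit` is a placeholder for any of the compared learners, and models are scored by AUC against the expanded test labels, matching the PCES setting):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def run_trials(X, y_snapshot, y_expanded, fit, n_trials=20):
    """Train on snapshot labels, test against expanded labels,
    over 20 random 2/3 vs 1/3 splits."""
    aucs = []
    for seed in range(n_trials):
        tr, te = train_test_split(np.arange(len(X)), test_size=1/3,
                                  random_state=seed)
        model = fit(X[tr], y_snapshot[tr])
        aucs.append(roc_auc_score(y_expanded[te], model.predict(X[te])))
    return np.mean(aucs), np.std(aucs)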
Data Sets – cont'd
Dataset
name: postoperative
description: patient state after operation
original classes:
ICU, general hospital floor, prepare to go home
Positive class for training
ICU
Positive class for testing
ICU + general hospital floor
(some patients on the general hospital floor will be sent to the ICU)
Data Sets – cont'd
Dataset
name: segment
description: outdoor images
original classes:
brickface, sky, cement, window, path, foliage, and grass
Positive class for training
grass
Positive class for testing
grass + foliage + path
moving focus
Data Sets – cont'd
Dataset
name: veteran
description: lung cancer trial data
original class:
survival time
Positive class for training
survival time < 12 hours
Positive class for testing
survival time < 24 hours
predict future victims
Data Sets – cont'd
Dataset
name: pbc
description: primary biliary cirrhosis trial data
original class:
living time
Positive class for training
living time < 365 days
Positive class for testing
living time < 1460 days
predict future victims
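How the snapshot and expanded labels are constructed for pbc, as a sketch (the thresholds come from the slide; the argument name is ours, and the same pattern applies to the other data sets):

import numpy as np

def pces_labels(living_time_days):
    """Positive class at training time: living time < 365 days;
    expanded positive class at test time: living time < 1460 days."""
    y_snapshot = np.where(living_time_days < 365, 1, -1)
    y_expanded = np.where(living_time_days < 1460, 1, -1)
    return y_snapshot, y_expanded

Every training positive stays positive at test time, which is exactly the PCES expansion constraint.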
Comparing Methods
The only previous approach for PCES
GetEnsemble
A classical learning approach
Random Forests
A PU-learning approach
PU-SVM
A degenerate version of our approach, which does not use domain knowledge
SGBAUC
A trivial baseline
Random guess
SGBDota Configuration
SGBDota-1: positive class expands from dense positive area
to sparse positive area
SGBDota-2: positive class expands from dense positive area
to sparse positive area and sparse negative area
SGBDota-3: positive class expands along with the
neighborhoods linearly
For the UCI data sets, we try three preferences; the first two are reasonable for most tasks.
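As one plausible concrete reading of the SGBDota-1 preference (our illustration, not the paper's code): estimate the positive-class density, and between two instances prefer the one lying in the sparser positive region, since the expansion is assumed to move from dense toward sparse positive areas.

import numpy as np
from sklearn.neighbors import KernelDensity

def make_preference(X_pos, bandwidth=1.0):
    """Hypothetical density-based preference rule for SGBDota-1."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_pos)  # density of positives
    def k(X):
        logdens = kde.score_samples(X)                   # log positive density
        d = logdens[:, None] - logdens[None, :]
        return np.sign(-d).astype(int)   # +1 when the row instance is in a sparser region
    return k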
Result on Synthetic Data
[Figure: rankings learned on the synthetic data by Random Forests, PU-SVM, SGBAUC, and SGBDota-1]
Results on UCI data sets
AUC values of SGBDota (using the first two preferences), GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF), and Random:

Dataset  SGBDota-1  SGBDota-2  GetEnsemble  SGBAUC     PU-SVM     RF         Random
posto    .470±.131  .483±.111  .464±.083    .457±.084  .457±.107  .448±.076  .456±.148
segment  .821±.031  .822±.029  .757±.030    .744±.012  .753±.020  .750±.014  .506±.018
veteran  .658±.118  .650±.115  .663±.090    .658±.093  .627±.146  .637±.102  .522±.069
pbc      .721±.034  .726±.032  .684±.033    .665±.041  .709±.033  .710±.043  .503±.043

t-test results (win/tie/loss counts):

           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-1  2/2/0        2/2/0   1/3/0   1/3/0  3/1/0
SGBDota-2  2/2/0        2/2/0   2/2/0   2/2/0  3/1/0

SGBDota with a reasonable preference performs better.
Results on UCI data sets
How about using a less reasonable preference?

AUC values of SGBDota-3, GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF), and Random:

Dataset  SGBDota-3  GetEnsemble  SGBAUC     PU-SVM     RF         Random
posto    .459±.132  .464±.083    .457±.084  .457±.107  .448±.076  .456±.148
segment  .744±.025  .757±.030    .744±.012  .753±.020  .750±.014  .506±.018
veteran  .544±.094  .663±.090    .658±.093  .627±.146  .637±.102  .522±.069
pbc      .638±.054  .684±.033    .665±.041  .709±.033  .710±.043  .503±.043

t-test results (win/tie/loss counts):

           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-3  0/2/2        0/2/2   0/2/2   0/2/2  2/2/0

The preference must not be misleading.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Conclusions
Main contribution
• A new data mining problem: PCES
• exists in many real-world applications
• not well handled by current techniques
• An initial solution
Future work
• better solutions
• real applications
THANK YOU