A Framework for Modeling Positive
Class Expansion with Single Snapshot
Yang Yu and Zhi-Hua Zhou
LAMDA Group
National Key Laboratory for Novel Software Technology
Nanjing University, China
Motivating task
[Figure: evolution of the mobile telecom network, from 1G through 2G to 3G]
We are at the moment of moving towards 3G.
Task: predict the 2G users who will switch to 3G.
Analysis of the task
[Figure: timeline of events (2G starts, 2G dominates, 3G starts, 3G dominates) with the class distribution shifting along it; the model is trained at one point on the timeline, but what we want to predict lies later]
This is the positive class expansion with single snapshot (PCES) problem.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Formulation of classical learning
• i.i.d. instances
• a training set $D = \{x_i\}_{i=1}^{n}$ drawn from a distribution $\mathcal{D}$
• a fixed labeling function $p(y \mid x)$

A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p(y \mid x))$ to minimize:

$$err = \mathbb{E}_{x \sim \mathcal{D}}\, L\big(\hat{f}(x; D, p(y \mid x)),\ p(y \mid x)\big)$$

This cannot model a changing labeling function.
Formulation of PCES
• labeling function at training time: $p_{tr}(y \mid x)$
• labeling function at testing time: $p_{te}(y \mid x)$

A learning algorithm outputs a function $\hat{f}(\cdot\,; D, p_{tr}(y \mid x))$ to minimize:

$$err = \mathbb{E}_{x \sim \mathcal{D}}\, L\big(\hat{f}(x; D, p_{tr}(y \mid x)),\ p_{te}(y \mid x)\big)$$

with a constraint that the target class only expands:

$$\forall x:\ p_{te}(y_t \mid x) \ge p_{tr}(y_t \mid x)$$

For convenience, we assume $y \in \{-1, +1\}$, so the constraint reads:

$$\forall x:\ p_{te}(y = +1 \mid x) \ge p_{tr}(y = +1 \mid x)$$
Another example
positive class: hot items
negative class: not hot items
The positive class is expanding, and only one snapshot is available: the PCES problem.
Further example
positive class: hot items
negative class: not hot items
The positive class keeps expanding, and still only one snapshot is available: the PCES problem.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Related learning frameworks
• PU-Learning (learning with positive and unlabeled data)
• Concept drift
• Covariate shift
PU-Learning
Setting:
only positive instances and unlabeled instances are in
the training data
Assumption:
the positive instances are representatives of the positive
class concept [Liu et al, ICML02][Yu et al, KDD02]
PCES: the positive class is expanding;
PU-learning cannot capture the expanded class concept
Concept Drift
Setting:
instances arrive sequentially, batch by batch;
the target concept may change in the coming batch
Assumption:
a series of data samples are available for drift detection [Klinkenberg & Joachims, ICDM00][Kolter & Maloof, ICML03]
PCES: only a single snapshot is available,
so concept drift approaches do not apply
Covariate Shift
(or sample selection bias [Shimodaira, JSPI00])
Setting:
training and test instances are drawn from different
distributions, i.e., $p(x)$ changes
Assumption:
the labeling function $p(y \mid x)$ is fixed
PCES: $p(x)$ is fixed but $p(y \mid x)$ changes,
so covariate shift approaches do not apply
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
The proposed approach
[Diagram: "learn from pure data" and "incorporate preference bias" feed a combined objective, optimized by SGBDota]
Learn from pure data
Observation:
a desired learner ranks positive training instances higher
than negative training instances

This is exactly expressed by the AUC (area under the ROC curve) criterion, written as a loss over all (positive, negative) pairs:

$$L_{auc}(f) = 1 - \frac{1}{|D^{+}|\,|D^{-}|} \sum_{x_a \in D^{+}} \sum_{x_b \in D^{-}} I\big(f(x_a) > f(x_b)\big)$$
Learn from pure data
smoothed loss function:

$$L_{auc}(f) = \frac{1}{|D^{+}|\,|D^{-}|} \sum_{x_a \in D^{+}} \sum_{x_b \in D^{-}} \frac{1}{1 + e^{\,f(x_a) - f(x_b)}}$$

instance-wise loss function:

$$L_{auc}(f, x) = \begin{cases} \dfrac{1}{|D^{-}|} \displaystyle\sum_{x_b \in D^{-}} \dfrac{1}{1 + e^{\,f(x) - f(x_b)}}, & x \in D^{+} \\[8pt] \dfrac{1}{|D^{+}|} \displaystyle\sum_{x_a \in D^{+}} \dfrac{1}{1 + e^{\,f(x_a) - f(x)}}, & x \in D^{-} \end{cases}$$
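A minimal NumPy sketch of the smoothed pairwise loss above (function and variable names are ours, not from the paper; labels assumed to be +1/-1):

import numpy as np

def smoothed_auc_loss(scores, labels):
    """Smoothed AUC loss: mean sigmoid-style penalty over all
    (positive, negative) score pairs; labels are +1 / -1."""
    pos = scores[labels == 1]
    neg = scores[labels == -1]
    diff = pos[:, None] - neg[None, :]        # f(x_a) - f(x_b) for every pair
    return np.mean(1.0 / (1.0 + np.exp(diff)))

The loss approaches 0 as every positive is scored above every negative, and 1 in the fully reversed ranking.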
Incorporate preference bias
Users can provide preferences by
• indicating preferences on randomly sampled instance pairs
• applying a priori rules that indicate the preferences

In either way, we have a preference function:

$$k(x_a, x_b) = \begin{cases} +1, & x_a \text{ is preferred} \\ -1, & x_b \text{ is preferred} \\ 0, & \text{equal or unknown} \end{cases}$$

Loss function:

$$L_{pref}(f) = \frac{1}{|D|^{2}} \sum_{x_a \in D} \sum_{x_b \in D} I\big( (f(x_a) - f(x_b))\, k(x_a, x_b) \le 0 \big)$$
Incorporate preference bias
smoothed loss function:

$$L_{pref}(f) = \frac{1}{|D|^{2}} \sum_{x_a \in D} \sum_{x_b \in D} \frac{1}{1 + e^{\,(f(x_a) - f(x_b))\, k(x_a, x_b)}}$$

instance-wise loss function:

$$L_{pref}(f, x_a) = \frac{1}{|D|} \sum_{x_b \in D} \frac{1}{1 + e^{\,(f(x_a) - f(x_b))\, k(x_a, x_b)}}$$
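A matching sketch for the smoothed preference loss, with the preference function represented as an n-by-n matrix (again our own names; the matrix encoding is an assumption):

import numpy as np

def smoothed_pref_loss(scores, k):
    """Smoothed preference loss; k[a, b] in {+1, -1, 0},
    where k[a, b] = +1 means x_a is preferred over x_b."""
    diff = scores[:, None] - scores[None, :]   # f(x_a) - f(x_b)
    n = len(scores)
    return np.sum(1.0 / (1.0 + np.exp(diff * k))) / (n * n)

Pairs with k = 0 contribute a constant 1/2, so they do not affect the optimization.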
Combine the two objectives
the combined loss function:

$$L(f) = L_{auc}(f) + L_{pref}(f)$$

the learning problem thus is:

$$\hat{f} = \arg\min_{f} L(f) = \arg\min_{f} \big( L_{auc}(f) + L_{pref}(f) \big)$$
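Reusing the two sketches above, the combined objective is just their sum (the slide shows an unweighted sum; a trade-off weight would be a straightforward extension):

def combined_loss(scores, labels, k):
    # unweighted sum of the two smoothed losses defined earlier
    return smoothed_auc_loss(scores, labels) + smoothed_pref_loss(scores, k)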
Optimization
Gradient Boosting [Friedman, AnnStat01, CSDA02]

$$f^{*} = \arg\min_{f} L\big(f(x), y\big)$$

The model is built as an additive expansion:

$$F(x) = \sum_{t=0}^{T} \rho_t\, h(x; \theta_t)$$

$$(\rho_t, \theta_t) = \arg\min_{\rho, \theta} L\big(F_{t-1} + \rho\, h(\cdot\,; \theta)\big)$$

approximated in two steps: fit a base learner to the negative gradient,

$$\theta_t = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

then do a line search:

$$\rho_t = \arg\min_{\rho} L\big(F_{t-1} + \rho\, h(\cdot\,; \theta_t)\big)$$

Gradient Boosting fits $y$, but we need to
fit both $y$ and $k$
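A compact sketch of gradient boosting on the smoothed AUC loss (our own helper names; scikit-learn regression trees as an assumed base learner, and a fixed step size standing in for the line search):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def auc_loss_gradient(scores, labels):
    """Per-instance gradient of the smoothed AUC loss."""
    pos, neg = labels == 1, labels == -1
    d = scores[pos][:, None] - scores[neg][None, :]     # f(x_a) - f(x_b)
    g = -np.exp(d) / (1.0 + np.exp(d)) ** 2 / d.size    # d(loss)/d f(x_a) per pair
    grad = np.zeros_like(scores)
    grad[pos] = g.sum(axis=1)       # positives: accumulate over negatives
    grad[neg] = -g.sum(axis=0)      # negatives: opposite sign
    return grad

def gradient_boost(X, labels, rounds=100, step=0.1):
    """Plain gradient boosting: each round fits a tree to the
    negative gradient and adds it with a fixed step size."""
    F, learners = np.zeros(len(labels)), []
    for _ in range(rounds):
        h = DecisionTreeRegressor(max_depth=3)
        h.fit(X, -auc_loss_gradient(F, labels))
        F += step * h.predict(X)
        learners.append(h)
    return learners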
Optimization with double targets
SGBDota (Stochastic Gradient Boosting with DOuble TArgets)
$$f^{*} = \arg\min_{f} \big( L_{auc}(f) + L_{pref}(f) \big)$$

Two base learners are added per round:

$$F(x) = \sum_{t=0}^{T} \big( \rho_{t,1}\, h(x; \theta_{t,1}) + \rho_{t,2}\, h(x; \theta_{t,2}) \big)$$

$$(\rho_{t,1}, \theta_{t,1}, \rho_{t,2}, \theta_{t,2}) = \arg\min L\big(F_{t-1} + \rho_1\, h(\cdot\,; \theta_1) + \rho_2\, h(\cdot\,; \theta_2)\big)$$

approximated by fitting one learner to each loss's negative gradient:

$$\theta_{t,1} = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L_{auc}(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

$$\theta_{t,2} = \arg\min_{\theta} \sum_{x \in D} \Big( h(x; \theta) + \frac{\partial L_{pref}(f(x))}{\partial f(x)}\Big|_{f = F_{t-1}} \Big)^{2}$$

followed by a joint line search:

$$(\rho_{t,1}, \rho_{t,2}) = \arg\min_{\rho_1, \rho_2} L\big(F_{t-1} + \rho_1\, h(\cdot\,; \theta_{t,1}) + \rho_2\, h(\cdot\,; \theta_{t,2})\big)$$
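One round of this double-target update, sketched under the same assumptions as the boosting code above (auc_loss_gradient reused from there; fixed steps instead of the joint line search; the "stochastic" part of SGBDota would additionally subsample the training data each round, omitted here):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def pref_loss_gradient(scores, k):
    """Per-instance gradient of the smoothed preference loss."""
    d = scores[:, None] - scores[None, :]       # f(x_a) - f(x_b)
    e = np.exp(d * k)
    g = -k * e / (1.0 + e) ** 2 / k.size        # d(loss)/d f(x_a) per pair
    return g.sum(axis=1) - g.sum(axis=0)

def sgbdota_round(X, F, labels, k, step=0.1):
    """Fit one tree to each negative gradient, then add both."""
    h1 = DecisionTreeRegressor(max_depth=3).fit(X, -auc_loss_gradient(F, labels))
    h2 = DecisionTreeRegressor(max_depth=3).fit(X, -pref_loss_gradient(F, k))
    return F + step * h1.predict(X) + step * h2.predict(X), (h1, h2)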
SGBDota
[Diagram: "learn from pure data" and "incorporate preference bias" feed the combined objective, optimized by SGBDota]
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Data Sets
A synthetic data set + 4 UCI data sets
postoperative
segment
veteran
pbc
Evaluation method: 2/3 as training data, 1/3 as test data,
repeated over 20 random splits
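The protocol in code form (a sketch; `fit` is a placeholder for any of the compared learners, and models are scored by AUC against the expanded test labels, matching the PCES setting):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

def run_trials(X, y_snapshot, y_expanded, fit, n_trials=20):
    """Train on snapshot labels, test against expanded labels,
    over 20 random 2/3 vs 1/3 splits."""
    aucs = []
    for seed in range(n_trials):
        tr, te = train_test_split(np.arange(len(X)), test_size=1/3,
                                  random_state=seed)
        model = fit(X[tr], y_snapshot[tr])
        aucs.append(roc_auc_score(y_expanded[te], model.predict(X[te])))
    return np.mean(aucs), np.std(aucs)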
Data Sets – cont'd
Dataset
name: postoperative
description: patient state after operation
original classes:
ICU, general hospital floor, prepare to go home
Positive class for training
ICU
Positive class for testing
ICU + general hospital floor
(some patients on the general hospital floor will be sent to the ICU)
Data Sets – cont'd
Dataset
name: segment
description: outdoor images
original classes:
brickface, sky, cement, window, path, foliage, and grass
Positive class for training
grass
Positive class for testing
grass + foliage + path
moving focus
Data Sets – cont'd
Dataset
name: veteran
description: lung cancer trial data
original class:
survival time
Positive class for training
survival time < 12 hours
Positive class for testing
survival time < 24 hours
predict future victims
Data Sets – cont'd
Dataset
name: pbc
description: primary biliary cirrhosis trial data
original class:
living time
Positive class for training
living time < 365 days
Positive class for testing
living time < 1460 days
predict future victims
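How the snapshot and expanded labels are constructed for pbc, as a sketch (the thresholds come from the slide; the argument name is ours, and the same pattern applies to the other data sets):

import numpy as np

def pces_labels(living_time_days):
    """Positive class at training time: living time < 365 days;
    expanded positive class at test time: living time < 1460 days."""
    y_snapshot = np.where(living_time_days < 365, 1, -1)
    y_expanded = np.where(living_time_days < 1460, 1, -1)
    return y_snapshot, y_expanded

Every training positive stays positive at test time, which is exactly the PCES expansion constraint.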
Comparing Methods
The only previous approach for PCES
GetEnsemble
A classical learning approach
Random Forests
A PU-learning approach
PU-SVM
A degenerate version of our approach, which does not use domain knowledge
SGBAUC
A trivial baseline
Random guess
SGBDota Configuration
SGBDota-1: positive class expands from dense positive area
to sparse positive area
SGBDota-2: positive class expands from dense positive area
to sparse positive area and sparse negative area
SGBDota-3: positive class expands along with the
neighborhoods linearly
For the UCI data sets, we try three preferences; the first two are reasonable for most tasks.
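As one plausible concrete reading of the SGBDota-1 preference (our illustration, not the paper's code): estimate the positive-class density, and between two instances prefer the one lying in the sparser positive region, since the expansion is assumed to move from dense toward sparse positive areas.

import numpy as np
from sklearn.neighbors import KernelDensity

def make_preference(X_pos, bandwidth=1.0):
    """Hypothetical density-based preference rule for SGBDota-1."""
    kde = KernelDensity(bandwidth=bandwidth).fit(X_pos)  # density of positives
    def k(X):
        logdens = kde.score_samples(X)                   # log positive density
        d = logdens[:, None] - logdens[None, :]
        return np.sign(-d).astype(int)   # +1 when the row instance is in a sparser region
    return k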
Result on Synthetic Data
[Figure: rankings learned on the synthetic data by Random Forests, PU-SVM, SGBAUC, and SGBDota-1]
Results on UCI data sets
AUC values of SGBDota (using the first two preferences), GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF), and Random:

Dataset  SGBDota-1  SGBDota-2  GetEnsemble  SGBAUC     PU-SVM     RF         Random
posto    .470±.131  .483±.111  .464±.083    .457±.084  .457±.107  .448±.076  .456±.148
segment  .821±.031  .822±.029  .757±.030    .744±.012  .753±.020  .750±.014  .506±.018
veteran  .658±.118  .650±.115  .663±.090    .658±.093  .627±.146  .637±.102  .522±.069
pbc      .721±.034  .726±.032  .684±.033    .665±.041  .709±.033  .710±.043  .503±.043

t-test results (win/tie/loss counts):

           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-1  2/2/0        2/2/0   1/3/0   1/3/0  3/1/0
SGBDota-2  2/2/0        2/2/0   2/2/0   2/2/0  3/1/0

SGBDota with a reasonable preference performs better.
Results on UCI data sets
How about using a less reasonable preference?

AUC values of SGBDota-3, GetEnsemble, SGBAUC, PU-SVM, Random Forests (RF), and Random:

Dataset  SGBDota-3  GetEnsemble  SGBAUC     PU-SVM     RF         Random
posto    .459±.132  .464±.083    .457±.084  .457±.107  .448±.076  .456±.148
segment  .744±.025  .757±.030    .744±.012  .753±.020  .750±.014  .506±.018
veteran  .544±.094  .663±.090    .658±.093  .627±.146  .637±.102  .522±.069
pbc      .638±.054  .684±.033    .665±.041  .709±.033  .710±.043  .503±.043

t-test results (win/tie/loss counts):

           GetEnsemble  SGBAUC  PU-SVM  RF     Random
SGBDota-3  0/2/2        0/2/2   0/2/2   0/2/2  2/2/0

The preference must not be misleading.
Outline
• A new data mining problem: PCES
• Why we need the PCES problem
• A solution to the PCES problem
• Results
• Conclusion
Conclusions
Main contribution
• A new data mining problem: PCES
• exists in many real-world applications
• not well handled by current techniques
• An initial solution
Future work
• better solutions
• real applications
THANK YOU