Global Mutual Information Based Feature Selection By Quantum Annealing
Kotaro Tanahashi*, Shinichi Takayanagi*, Tomomitsu Motohashi*, Shu Tanaka✝
* Recruit Communications Co., Ltd.
✝ Waseda University, JST PRESTO
Apr. 11, 2018
(C)Recruit Communications Co., Ltd.
Introduction of Recruit
We provide various kinds of online services from job search to hotel reservations across the world.
Automobile
Education
Life & Local O2O
Travel
Beauty
Housing
Bridal & Baby
Human Resources
IT & Trends Media
Dining
Icons: www.flaticon.com
Introduction of Recruit
Recruit connects internet users and clients:
• We help users find the best clients through our services.
• Data science plays an important role in the business.
Data Science at Recruit
Recruit has hosted two data mining competitions on Kaggle (www.kaggle.com): Recruit Restaurant Visitor Forecasting (2018) and Coupon Purchase Prediction (2015).
Kaggle and KDD Cup are international data mining competitions.
Engineers at Recruit (as of March 2018): we are passionate about data science, and some of us placed 1st and 2nd in KDD Cup 2015 (www.kdd.org/kdd-cup).
Feature Selection: A Key Technique
“Beating Kaggle the easy way”
• A key technique for winning data mining competitions
• Finds the most relevant features
• Balances the bias-variance trade-off
Figure: a data matrix (rows: User 1 … User n; columns: features) from which the relevant features are selected
Benefits:
✔ Improved prediction
✔ Reduced computational cost
https://www.ke.tu-darmstadt.de/lehre/arbeiten/studien/2015/Dong_Ying.pdf
Feature selection is essential in predictive analysis
Types of Feature Selection (FS) Algorithms
Wrapper methods: iteratively evaluate a feature subset with a black-box learning algorithm.
Set of all features → generate a subset → learning algorithm → performance (repeat) → select the best subset
Embedded methods: train a model and select features at the same time.
Set of all features → generate a subset → learning algorithm + performance → select the best subset
Filter methods: features are selected by a criterion such as Mutual Information.
✔ Independent of the learning algorithm
✔ Can be used as a pre-processing step
Set of all features → select the best subset → learning algorithm → performance
Filter methods are useful for pre-processing because they do not depend on the predictive model.
What is Mutual Information (MI)?
Figures are retrieved from http://minepy.readthedocs.io/en/latest/python.html
Mutual Information I(X;Y) is a measure of the mutual dependence between two random variables X and Y:
$$I(X;Y) = H(Y) - H(Y \mid X),$$
where $H$ denotes the Shannon entropy.
Example scatter plots: Pearson r = 0.8, MI = 0.5; Pearson r = 0.0, MI = 0.7; Pearson r = 0.0, MI = 0.1
✔ MI can capture non-linear relationships unlike Pearson’s correlation coefficient
Mutual Information I(X;Y): Low → hard to predict Y given X; High → able to predict Y given X
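The contrast between Pearson's r and MI can be checked on a toy example. This is a minimal sketch that assumes discrete samples and a simple plug-in (empirical-frequency) MI estimator in bits. The helper names `pearson` and `mutual_information` are illustrative. The numbers are not those of the minepy figures, which use different data and may use a different estimator.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) p(y))), using counts / n
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def pearson(xs, ys):
    """Pearson correlation coefficient of two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: Y = X^2 with X symmetric around 0 (purely non-linear dependence)
xs = [-1, 0, 1] * 100
ys = [x * x for x in xs]

r = pearson(xs, ys)              # 0.0: no linear relationship at all
mi = mutual_information(xs, ys)  # about 0.918 bits: Y is a function of X
print(f"Pearson r = {r:.3f}, MI = {mi:.3f} bits")
```

Because Y is fully determined by X here, the MI equals the entropy H(Y), while the symmetric shape cancels any linear correlation.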
General formulation of MIFS
Mutual Information based Feature Selection (MIFS)
MIFS: using Mutual Information as the criterion in filter methods
MIFS selects a feature subset $S$ of size $k$ that maximizes the MI between the selected features $X_S$ and the target variable $Y$:
$$S^* = \arg\max_{S \subseteq \{1,\dots,n\},\, |S| = k} I(X_S; Y)$$
Unfortunately, the exact calculation of $I(X_S; Y)$ is intractable…
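Brute force shows both why the global objective matters and why it is expensive. This sketch runs it on a hypothetical toy dataset: y = x1 XOR x2, plus a feature x3 that copies y in 3 of every 4 rows. It assumes a plug-in MI estimator and scores every size-k subset. The number of candidate subsets grows as C(n, k), and estimating the joint distribution of k features needs ever more data, which illustrates why exact computation is impractical at scale. Function names are illustrative.

```python
import itertools
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits; values may be tuples (joint variables)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def toy_data():
    """Hypothetical toy data: y = x1 XOR x2, and x3 equals y in 3 of every 4 rows."""
    rows, ys = [], []
    for x1, x2 in itertools.product((0, 1), repeat=2):
        y = x1 ^ x2
        for x3 in (y, y, y, 1 - y):
            rows.append((x1, x2, x3))
            ys.append(y)
    return rows, ys


def exhaustive_mifs(rows, ys, k):
    """Global MIFS by brute force: evaluate I(X_S; Y) for every subset with |S| = k."""
    n_features = len(rows[0])
    best_subset, best_mi = None, -math.inf
    for subset in itertools.combinations(range(n_features), k):
        xs = [tuple(row[i] for i in subset) for row in rows]
        mi = mutual_information(xs, ys)
        if mi > best_mi:
            best_subset, best_mi = subset, mi
    return best_subset, best_mi


rows, ys = toy_data()
print(exhaustive_mifs(rows, ys, k=2))  # ((0, 1), 1.0): x1 and x2 are informative only jointly
```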
Heuristic MIFS Algorithms
[1] H. Peng et al., 2005 [2] J. R. Vergara & P. A. Estévez, 2015
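Max Relevance method: iteratively select the most relevant feature and add it to the selected set $S$ (repeat k times)
$$i^* = \arg\max_{i \notin S} I(X_i; Y)$$
Min Redundancy & Max Relevance (MRMR) method[1]: iteratively select the most relevant and least redundant feature (repeat k times)
$$i^* = \arg\max_{i \notin S} \Big[ I(X_i; Y) - \frac{1}{|S|} \sum_{j \in S} I(X_i; X_j) \Big]$$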
Several heuristic MIFS algorithms have been developed; however, these methods are greedy approximations[2].
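The greedy nature of these heuristics shows up on a hypothetical XOR toy dataset (y = x1 XOR x2, x3 a noisy copy of y). This sketch implements Max Relevance and the MRMR difference criterion with a plug-in MI estimator; the function names are illustrative. Both heuristics pick the individually relevant x3 first. They then miss that x1 and x2 are informative only together.

```python
import itertools
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits; values may be tuples (joint variables)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def toy_data():
    """Hypothetical toy data: y = x1 XOR x2, and x3 equals y in 3 of every 4 rows."""
    rows, ys = [], []
    for x1, x2 in itertools.product((0, 1), repeat=2):
        y = x1 ^ x2
        for x3 in (y, y, y, 1 - y):
            rows.append((x1, x2, x3))
            ys.append(y)
    return rows, ys


def max_relevance(rows, ys, k):
    """Greedy Max Relevance: take the k features with the largest I(X_i; Y)."""
    cols = [list(c) for c in zip(*rows)]
    relevance = [mutual_information(c, ys) for c in cols]
    return sorted(range(len(cols)), key=lambda i: relevance[i], reverse=True)[:k]


def mrmr(rows, ys, k):
    """Greedy MRMR: relevance minus mean redundancy with already-selected features."""
    cols = [list(c) for c in zip(*rows)]
    relevance = [mutual_information(c, ys) for c in cols]
    selected = []
    for _ in range(k):  # repeat k times, adding one feature per step
        best, best_score = None, -math.inf
        for i in range(len(cols)):
            if i in selected:
                continue
            redundancy = (sum(mutual_information(cols[i], cols[j]) for j in selected)
                          / len(selected)) if selected else 0.0
            score = relevance[i] - redundancy
            if score > best_score:  # ties keep the lower index
                best, best_score = i, score
        selected.append(best)
    return selected


rows, ys = toy_data()
for name, subset in [("Max Relevance", max_relevance(rows, ys, 2)),
                     ("MRMR", mrmr(rows, ys, 2))]:
    joint = mutual_information([tuple(r[i] for i in subset) for r in rows], ys)
    print(name, subset, f"I(X_S;Y) = {joint:.3f} bits")  # [2, 0], about 0.189 bits
```

The globally best pair (x1, x2) reaches 1 bit, so each greedy answer here is far from optimal.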
Our Contributions
(1) We reformulate MIFS as a QUBO (Quadratic Unconstrained Binary Optimization) problem.
(2) We confirmed that optimization by D-Wave performs well for MIFS.
Figure: MIFS → (how?) → QUBO formulation of MIFS → MIFS optimization; MI increase (%) w.r.t. Linear vs. #features (5 to 40), higher is better
Image retrieved from https://www.dwavesys.com/resources/media-resources
Reformulation of MIFS by QUBO (1)
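Theorem 1.1: Chain rule for Conditional Mutual Information
$$I(X_1, \dots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, \dots, X_1)$$
Using Theorem 1.1, the following equation holds for all $i \in S$ (where $X_S$ denotes the selected features and $|S| = k$):
$$I(X_S; Y) = I(X_i; Y) + I(X_{S \setminus \{i\}}; Y \mid X_i)$$
Averaging the equation above over all $i \in S$ leads to
$$I(X_S; Y) = \frac{1}{k} \sum_{i \in S} \Big[ I(X_i; Y) + I(X_{S \setminus \{i\}}; Y \mid X_i) \Big]$$
Proof. Expand the MI term with Theorem 1.1, taking $X_i$ first:
$$I(X_S; Y) = I(X_i, X_{S \setminus \{i\}}; Y) = I(X_i; Y) + I(X_{S \setminus \{i\}}; Y \mid X_i)$$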
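The plug-in estimator computes every term from the same empirical distribution. As a result, the per-i identity I(X_S;Y) = I(X_i;Y) + I(X_{S∖{i}};Y|X_i) and its average hold exactly, up to floating-point error, on any discrete sample. Here is a short numerical check. It assumes hypothetical random ternary data, and the helper names are illustrative.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in I(X;Y) in bits; values may be tuples (joint variables)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mi(xs, ys, zs):
    """Plug-in I(X;Y|Z) in bits: sum p(x,y,z) log2(p(z) p(x,y,z) / (p(x,z) p(y,z)))."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz, yz, z_counts = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * math.log2(c * z_counts[z] / (xz[(x, z)] * yz[(y, z)]))
               for (x, y, z), c in xyz.items())


# Hypothetical random data: three ternary features and a noisy target
rng = random.Random(0)
rows = [tuple(rng.randint(0, 2) for _ in range(3)) for _ in range(500)]
ys = [(a + b * c + rng.randint(0, 1)) % 3 for a, b, c in rows]

S = (0, 1, 2)
k = len(S)
total = mutual_information(rows, ys)  # I(X_S; Y), X_S treated as one joint variable
per_i = []
for i in S:
    xi = [r[i] for r in rows]
    rest = [tuple(r[j] for j in S if j != i) for r in rows]
    # I(X_i; Y) + I(X_{S\{i}}; Y | X_i) equals I(X_S; Y) for every i
    per_i.append(mutual_information(xi, ys) + conditional_mi(rest, ys, xi))

print(total, per_i, sum(per_i) / k)
```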
Reformulation of MIFS by QUBO (2)
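Approximate under the assumption of Conditional Independence (CI):
$$I(X_S; Y) \approx \frac{1}{k} \sum_{i \in S} \Big[ I(X_i; Y) + \sum_{j \in S \setminus \{i\}} I(X_j; Y \mid X_i) \Big]$$
Proof. If we assume that the features in $S \setminus \{i\}$ are conditionally independent given $X_i$ and given $(X_i, Y)$, we can obtain
$$I(X_{S \setminus \{i\}}; Y \mid X_i) = \sum_{j \in S \setminus \{i\}} I(X_j; Y \mid X_i)$$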
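The approximation can be compared with the exact value on a hypothetical XOR toy dataset (y = x1 XOR x2, x3 a noisy copy of y). This sketch uses plug-in estimates and assumes the CI approximation has the form I(X_S;Y) ≈ (1/k) Σ_{i∈S} [I(X_i;Y) + Σ_{j∈S∖{i}} I(X_j;Y|X_i)]. With all three features selected, the CI assumption is violated: given x1, both x2 and x3 depend on y. The approximation therefore underestimates the true MI.

```python
import itertools
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in I(X;Y) in bits; values may be tuples (joint variables)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mi(xs, ys, zs):
    """Plug-in I(X;Y|Z) in bits."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz, yz, z_counts = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * math.log2(c * z_counts[z] / (xz[(x, z)] * yz[(y, z)]))
               for (x, y, z), c in xyz.items())


def toy_data():
    """Hypothetical toy data: y = x1 XOR x2, and x3 equals y in 3 of every 4 rows."""
    rows, ys = [], []
    for x1, x2 in itertools.product((0, 1), repeat=2):
        y = x1 ^ x2
        for x3 in (y, y, y, 1 - y):
            rows.append((x1, x2, x3))
            ys.append(y)
    return rows, ys


rows, ys = toy_data()
cols = [[r[i] for r in rows] for i in range(3)]
S = (0, 1, 2)
k = len(S)

exact = mutual_information([tuple(r[i] for i in S) for r in rows], ys)
approx = sum(
    mutual_information(cols[i], ys)
    + sum(conditional_mi(cols[j], ys, cols[i]) for j in S if j != i)
    for i in S
) / k
# exact is 1.0 bit; the CI approximation is lower (about 0.86 bits)
print(f"exact = {exact:.3f} bits, CI approximation = {approx:.3f} bits")
```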
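Reformulation of MIFS by QUBO (3)
Optimization of MIFS: let $x_i \in \{0,1\}$ with $x_i = 1$ if feature $i$ is selected, and define $Q_{ii} = I(X_i; Y)$ and $Q_{ij} = I(X_j; Y \mid X_i)$ for $i \neq j$. Dropping the constant factor $1/k$, maximizing the approximated MI becomes
$$\max_{x \in \{0,1\}^n} \sum_{i,j} Q_{ij} x_i x_j \quad \text{s.t.} \quad \sum_i x_i = k$$
QUBO formulation of MIFS:
$$\min_{x \in \{0,1\}^n} \; -\sum_{i,j} Q_{ij} x_i x_j + \alpha \Big( \sum_i x_i - k \Big)^2$$
The first term is the (approximated) MI, and the second is a penalty for selecting only k features; α is the penalty strength.
MIFS can be optimized by Ising annealing machines.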
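For a handful of features, the QUBO can simply be minimized by enumeration. That makes it easy to see what an annealer is asked to do. This is a minimal sketch on a hypothetical XOR toy dataset (y = x1 XOR x2, x3 a noisy copy of y). It assumes Q_ii = I(X_i;Y), Q_ij = I(X_j;Y|X_i) with plug-in estimates, k = 2, and an ad hoc α = 10, chosen only to be large enough that the penalty enforces |S| = k here. The exhaustive loop stands in for D-Wave or qbsolv and is not how they work internally.

```python
import itertools
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in I(X;Y) in bits."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mi(xs, ys, zs):
    """Plug-in I(X;Y|Z) in bits."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz, yz, z_counts = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * math.log2(c * z_counts[z] / (xz[(x, z)] * yz[(y, z)]))
               for (x, y, z), c in xyz.items())


def toy_data():
    """Hypothetical toy data: y = x1 XOR x2, and x3 equals y in 3 of every 4 rows."""
    rows, ys = [], []
    for x1, x2 in itertools.product((0, 1), repeat=2):
        y = x1 ^ x2
        for x3 in (y, y, y, 1 - y):
            rows.append((x1, x2, x3))
            ys.append(y)
    return rows, ys


def build_q(rows, ys):
    """Q[i][i] = I(X_i;Y) and Q[i][j] = I(X_j;Y|X_i) for i != j."""
    cols = [[r[i] for r in rows] for i in range(len(rows[0]))]
    n = len(cols)
    return [[mutual_information(cols[i], ys) if i == j
             else conditional_mi(cols[j], ys, cols[i])
             for j in range(n)] for i in range(n)]


def qubo_energy(x, Q, k, alpha):
    """-sum_ij Q_ij x_i x_j + alpha * (sum_i x_i - k)^2 for a 0/1 vector x."""
    n = len(x)
    mi_term = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return -mi_term + alpha * (sum(x) - k) ** 2


rows, ys = toy_data()
Q = build_q(rows, ys)
k, alpha = 2, 10.0  # alpha is an ad hoc choice for this toy example
# Exhaustive minimization over all 2^n assignments (feasible only for tiny n)
best = min(itertools.product((0, 1), repeat=len(Q)),
           key=lambda x: qubo_energy(x, Q, k, alpha))
print(best)  # (1, 1, 0): selects x1 and x2
```

Unlike the greedy heuristics, the quadratic terms let the pair (x1, x2) win through their conditional MI.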
Interpretation of the Derived Formulation
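Heuristic methods such as Max Relevance and MRMR are included in the derived formulation.
Expand the derived formulation:
$$I(X_j; Y \mid X_i) = \underbrace{I(X_j; Y)}_{\text{Relevance}} - \underbrace{I(X_j; X_i)}_{\text{Redundancy}} + \underbrace{I(X_j; X_i \mid Y)}_{\text{Complementary}}$$
Increase: Relevance, Complementary. Reduce: Redundancy.
Keeping only the relevance terms recovers Max Relevance, and keeping relevance minus redundancy gives an MRMR-style criterion.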
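The relevance/redundancy/complementary expansion is an exact information-theoretic identity. It therefore also holds for plug-in estimates on any discrete sample. This quick check uses hypothetical random data and illustrative helper names.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in I(X;Y) in bits."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mi(xs, ys, zs):
    """Plug-in I(X;Y|Z) in bits."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz, yz, z_counts = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(c / n * math.log2(c * z_counts[z] / (xz[(x, z)] * yz[(y, z)]))
               for (x, y, z), c in xyz.items())


# Hypothetical data: X_j correlated with X_i, and a target depending on both
rng = random.Random(1)
xi = [rng.randint(0, 2) for _ in range(400)]
xj = [(a + rng.randint(0, 1)) % 3 for a in xi]
y = [(a * b + rng.randint(0, 1)) % 2 for a, b in zip(xi, xj)]

lhs = conditional_mi(xj, y, xi)             # I(X_j; Y | X_i)
relevance = mutual_information(xj, y)       # I(X_j; Y)
redundancy = mutual_information(xj, xi)     # I(X_j; X_i)
complementary = conditional_mi(xj, xi, y)   # I(X_j; X_i | Y)
print(lhs, relevance - redundancy + complementary)
```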
Comparison of Optimization Methods
Problem formulation → optimization methods:
• Binary Quadratic Problem (BQP): Linear Relaxation[1] (Linear), Truncated Power[1,2] (TPower)
• QUBO: Tabu Search by qbsolv[3], D-Wave 2000Q
[1] H. Venkateswara, et al., 2015 [2] X. T. Yuan & T. Zhang, 2013 [3] https://github.com/dwavesystems/qbsolv
We compared several optimization methods for two types of formulations (BQP, QUBO).
Linear Relaxation Method (Linear)
[1] H. Venkateswara, et al., 2015
Linearize the quadratic term by introducing new variables.
One of the optimality conditions reduces the problem to a linear one.
Since $Q_{ij} \geq 0$, the solution of this problem is given by the k largest column sums of Q. This solution is tightly bounded[1]. The time complexity is O(nk).
The computation of Linear is fast, and the solution is tightly bounded.
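Once the column sums are known, the Linear solution is just a top-k selection. This is a minimal sketch that assumes Q is a dense list of lists with non-negative entries; `linear_select` is an illustrative name. In this sketch, computing the column sums of a dense Q costs O(n²), and the selection itself uses `heapq.nlargest`.

```python
import heapq


def linear_select(Q, k):
    """Return the sorted 0-based indices of the k largest column sums of Q."""
    n = len(Q)
    col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    return sorted(heapq.nlargest(k, range(n), key=col_sums.__getitem__))


# Hypothetical non-negative 3x3 matrix (roughly the shape of an XOR-style toy Q)
Q = [[0.0, 1.0, 0.19],
     [1.0, 0.0, 0.19],
     [0.0, 0.0, 0.19]]
print(linear_select(Q, 2))  # [0, 1]
```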
Truncated Power Method (TPower)
Finding the largest k-sparse eigenvector of Q is defined as
$$\max_{x \in \mathbb{R}^n} x^\top Q x \quad \text{s.t.} \quad \|x\|_2 = 1,\ \|x\|_0 \leq k$$
We select the i-th feature if $x_i > 0$. The eigenvector is calculated by the following procedure[1] (repeat T times):
(1) $x \leftarrow Qx / \|Qx\|_2$
(2) Keep the k entries of $x$ with the largest absolute values and set the others to 0
(3) $x \leftarrow x / \|x\|_2$
This method has been confirmed to be the best-performing method for BQP problems with non-negative matrices[2]. The time complexity of the algorithm is O(Tn²).
TPower is known to be the state-of-the-art method for BQP problems.
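Here is a minimal sketch of the truncated power iteration for a dense Q given as lists. The uniform starting vector is an assumption of this sketch, as is the symmetrization of Q: a non-symmetric Q defines the same quadratic form as its symmetric part. The name `tpower` is illustrative. Each iteration is one dense matrix-vector product plus a top-k truncation, so T iterations cost O(Tn²).

```python
import math


def tpower(Q, k, T=50):
    """Truncated power iteration for a k-sparse leading eigenvector of Q.

    Q is symmetrized first, since x^T Q x depends only on (Q + Q^T) / 2.
    Returns the sorted indices i with x_i > 0.
    """
    n = len(Q)
    A = [[(Q[i][j] + Q[j][i]) / 2 for j in range(n)] for i in range(n)]
    x = [1 / math.sqrt(n)] * n  # uniform start (an assumption of this sketch)
    for _ in range(T):  # repeat T times
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # power step
        keep = set(sorted(range(n), key=lambda i: abs(y[i]), reverse=True)[:k])
        y = [y[i] if i in keep else 0.0 for i in range(n)]             # truncate to k entries
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]                                      # normalize
    return sorted(i for i in range(n) if x[i] > 0)


# Hypothetical non-negative matrix (same shape as an XOR-style toy Q)
Q = [[0.0, 1.0, 0.19],
     [1.0, 0.0, 0.19],
     [0.0, 0.0, 0.19]]
print(tpower(Q, 2))  # [0, 1]
```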
Optimization by D-Wave Machine
We used the D-Wave machine with the following settings:
• Machine: D-Wave 2000Q
• Embedding: 64-bit full connection
• Annealing time: 20 µs
• Annealing repetitions: 10
Figure: Full Connection Embedding for C(4,4,4)
When the number of features n exceeds the hardware size h (= 64), we use Linear as a pre-processing step to narrow the candidate features down to h.
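The pre-processing step and the hand-off to a sampler can be sketched as follows. The sketch assumes the QUBO objective −Σ Q_ij x_i x_j + α(Σ x_i − k)², and it assumes the pre-filter ranks candidates by the same column sums Linear uses. All helper names are hypothetical. The QUBO is written as an upper-triangular dictionary keyed by index pairs, using x_i² = x_i to fold linear terms onto the diagonal. This is a common input convention for QUBO samplers such as qbsolv and dimod's `BinaryQuadraticModel.from_qubo`. No D-Wave API is called here; a brute-force loop stands in for the annealer.

```python
import itertools


def linear_prefilter(Q, h):
    """Keep at most h candidate features: those with the largest column sums of Q."""
    n = len(Q)
    if n <= h:
        return list(range(n))
    col_sums = [sum(Q[i][j] for i in range(n)) for j in range(n)]
    return sorted(sorted(range(n), key=lambda j: col_sums[j], reverse=True)[:h])


def qubo_dict(Q, idx, k, alpha):
    """Upper-triangular QUBO for -sum Q_ij x_i x_j + alpha (sum x_i - k)^2 over idx.

    Uses x_i^2 = x_i. Keys (a, b) are positions in idx; returns (qubo, offset).
    """
    qubo = {}
    for a, i in enumerate(idx):
        qubo[(a, a)] = -Q[i][i] + alpha * (1 - 2 * k)  # diagonal MI + linear penalty part
        for b in range(a + 1, len(idx)):
            j = idx[b]
            qubo[(a, b)] = -(Q[i][j] + Q[j][i]) + 2 * alpha
    return qubo, alpha * k * k  # constant term of the expanded penalty


def qubo_value(qubo, offset, x):
    """Evaluate sum over (a, b) of coeff * x_a * x_b, plus the constant offset."""
    return offset + sum(c * x[a] * x[b] for (a, b), c in qubo.items())


# Hypothetical non-negative 3x3 matrix and parameters
Q = [[0.0, 1.0, 0.19],
     [1.0, 0.0, 0.19],
     [0.0, 0.0, 0.19]]
idx = linear_prefilter(Q, h=64)  # n = 3 <= h, so every feature stays a candidate
qubo, offset = qubo_dict(Q, idx, k=2, alpha=10.0)
best = min(itertools.product((0, 1), repeat=len(idx)),
           key=lambda x: qubo_value(qubo, offset, x))
print([idx[a] for a, bit in enumerate(best) if bit])  # [0, 1]
```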
Comparison of Mutual Information Score
Data Name: a1a #features: 122 #data points: 8000
We compared the MI scores obtained by each optimization method on a public dataset; the graph shows the increase relative to Linear.
Figure: Mutual Information Score, MI increase (%) w.r.t. Linear vs. #features (5 to 40); higher is better
D-Wave obtained the best MI scores among all compared methods.
Classification Accuracy
We calculated the classification accuracy for different #features. Accuracy is a good measure of the quality of a selected feature subset.
Original features → selected k features → measure the classification accuracy with random forest classifiers
Classification Accuracy
We evaluated each method by classification accuracy for different #features.
Data Name: a1a #features: 122 #data points: 8000
Figure: accuracy (0.70 to 0.78) vs. #features (5 to 40) for D-Wave, TPower, Tabu (qbsolv), and Linear; higher is better
The accuracies of D-Wave are better when #features is small.
Summary
• We derived the QUBO formulation of MIFS so that the problem can be embedded in Ising machines
• We used the D-Wave quantum annealing machine as a solver in MIFS
• Optimization by D-Wave outperformed TPower, the state-of-the-art optimization method for BQP
• We are planning to use MIFS by D-Wave in Kaggle!
Thank you for listening
Runtime of Optimizations
Data Name: a1a #features: 122 #data points: 8000
| Method | Average runtime |
|---|---|
| Linear | 9.0 msec |
| TPower | 26.1 msec |
| Tabu (qbsolv) | 14.3 sec |
| D-Wave | 9.0 msec (Linear) + 100 μsec (annealing) |
Comparison to MRMR and Max Relevance
Data Name: a1a #features: 122 #data points: 8000
Figure: accuracy (0.70 to 0.78) vs. #features (5 to 40) for D-Wave, MRMR, and Max Rel.