ACM International Conference on the
Theory of Information RetrievalUniversity of Delaware, Newark, DE, USA September 13-16, 2016
Fast Feature Selection Algorithms
for Learning to Rank
Andrea Gigli
Department of Computer Science, University of Pisa & ISTI – CNR, Pisa
Franco Maria Nardini, Claudio Lucchese, Raffaele Perego
ISTI – CNR & istella*, Pisa
Outline
• Introduction
• Proposed Feature Selection Algorithms (FSA)
• Application to Learning to Rank
ICTIR 2016, Newark, DE
How to Rank Documents using Supervised Learning
[Figure: learning-to-rank pipeline — a Learning System is trained on labeled query-document data; the learned model is then used by the Ranking System to score and rank the Indexed Documents at prediction time.]
Training data: a set of queries, each with its associated documents and relevance judgements.
• q_i: query i
• d_{i,j}: document j associated to query i
• y_{i,j}: relevance label for the j-th document associated to the i-th query
• f(d_{i,j}, q_i): scoring function
The goal is to learn a scoring function f(d, q) whose induced ranking of the documents agrees with the relevance labels.
Learning to Rank
Each query-document pair (q_i, d_{i,j}) is represented by a feature vector x_{i,j} = (x_{i,j}^(1), …, x_{i,j}^(K)), and the scoring function is learned over these features:
f(d_{i,j}, q_i) ≈ h(x_{i,j})
• K is in the order of hundreds or thousands of features.
Outline
• Introduction
• Proposed Feature Selection Algorithms (FSA)
• Application to Learning to Rank
Proposed Algorithms for Feature Selection
We propose the following algorithms:
• Naïve Greedy search Algorithm for feature Selection (N-GAS)
• eXtended naïve Greedy search Algorithm for feature Selection (X-GAS)
• Hierarchical clustering Greedy search Algorithm for feature Selection (H-GAS)
• We compare them with the Greedy search Algorithm for feature Selection (GAS) proposed by Geng, Liu, Qin, Li (SIGIR '07).
• All the competing FSAs belong to the filter-methods family.
• Competing FSAs try to maximise the importance of each feature w.r.t. the relevance judgements and to minimise the similarity among the selected features.
• Both X-GAS and GAS require hyper-parameter calibration.
Proposed Algorithms for Feature Selection: N-GAS
A graph over the features is built: each node carries the importance of that feature w.r.t. the query-document relevance judgements (e.g. the importance of the 8th feature), and each edge carries the similarity between two features (e.g. between the 6th and the 7th). The subset S of n = 4 selected features is initialised empty.
1. Start by adding the node with the highest importance to S (Node ❶ in this example).
2. Let u be the node with the lowest similarity w.r.t. Node ❶ and v the node with the highest similarity w.r.t. u. From (u, v) select the node with the highest importance and add it to S.
3. Let ❷ be the node with the lowest similarity w.r.t. the node just added and ❸ the node with the highest similarity w.r.t. ❷. From (❷, ❸) select the node with the highest importance and add it to S (Node ❷ in the example).
4. Let ❹ be the node with the lowest similarity w.r.t. Node ❷ and ❽ the node with the highest similarity w.r.t. ❹. From (❹, ❽) select the node with the highest importance and add it to S (Node ❹ in the example).
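The N-GAS loop above can be sketched as follows. This is an illustrative reconstruction from the slide walkthrough, not the authors' code: `importance[k]` and `sim[k][l]` stand for precomputed feature importance and pairwise similarity, and all function and variable names are assumptions.

```python
def n_gas(importance, sim, n):
    """N-GAS sketch: greedily grow S, at each step considering the node
    least similar to the last pick and the node most similar to that one,
    and keeping whichever of the two is more important."""
    K = len(importance)
    remaining = set(range(K))
    # Step 1: start from the most important feature.
    current = max(remaining, key=lambda k: importance[k])
    selected = [current]
    remaining.remove(current)
    while len(selected) < n and remaining:
        # u: remaining node least similar to the last selected node.
        u = min(remaining, key=lambda k: sim[current][k])
        # v: remaining node most similar to u (v != u).
        others = remaining - {u}
        if others:
            v = max(others, key=lambda k: sim[u][k])
            current = u if importance[u] >= importance[v] else v
        else:
            current = u
        selected.append(current)
        remaining.remove(current)
    return selected
```

Ties and the u/v comparison order are design choices not pinned down by the slides; any consistent tie-breaking rule fits the described procedure.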
Proposed Algorithms for Feature Selection: X-GAS
The graph is built and the subset S of n = 4 selected features is initialised empty.
1. Start by adding the node with the highest importance to S (Node ❶ in this example).
2. Select the 50% of nodes least similar to ❶ (this fraction is the filter hyper-parameter). From the selection take the node with the highest importance and add it to S.
3. Select the 50% of nodes least similar to the node just added. From the selection take the node with the highest importance and add it to S (Node ❸ in the example).
4. Select the 50% of nodes least similar to ❸. From the selection take the node with the highest importance and add it to S (Node ❹ in the example).
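The X-GAS steps above can be sketched in the same style; again an illustrative reconstruction under assumed inputs (`importance`, `sim`), with p the filter hyper-parameter (0.5 matching the 50% used in the example slides).

```python
def x_gas(importance, sim, n, p=0.5):
    """X-GAS sketch: at each step keep only the fraction p of remaining
    features least similar to the last selected one, then pick the most
    important feature among them."""
    K = len(importance)
    remaining = set(range(K))
    # Start from the most important feature.
    current = max(remaining, key=lambda k: importance[k])
    selected = [current]
    remaining.remove(current)
    while len(selected) < n and remaining:
        # Rank remaining features by ascending similarity to `current`
        # and keep the least-similar fraction p (at least one node).
        ranked = sorted(remaining, key=lambda k: sim[current][k])
        kept = ranked[:max(1, int(p * len(ranked)))]
        current = max(kept, key=lambda k: importance[k])
        selected.append(current)
        remaining.remove(current)
    return selected
```

With p = 1 every remaining node survives the filter and the method degenerates into picking features purely by importance; smaller p enforces more dissimilarity, which is why p needs calibration.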
Proposed Algorithms for Feature Selection: H-GAS
[Figure: the 8 features are organised by hierarchical clustering according to their pairwise distance; in the example the subset S = {1, 5, 8, 4} is selected.]
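A minimal sketch of the H-GAS idea, assuming the reading suggested by the figure: cluster the features hierarchically by distance, cut at n clusters, and keep the most important feature of each cluster. Single linkage is shown here (the results tables report both "single" and "ward" linkage); `dist[k][l]` is the feature distance defined later as 1 − similarity. Names are illustrative.

```python
def h_gas(importance, dist, n):
    """H-GAS sketch: single-linkage agglomerative clustering of features,
    cut at n clusters, one representative (most important) per cluster."""
    clusters = [[k] for k in range(len(importance))]
    while len(clusters) > n:
        # Merge the pair of clusters with minimal single-linkage distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    # Representative of each cluster: its most important feature.
    return sorted(max(c, key=lambda k: importance[k]) for c in clusters)
```

Because the whole dendrogram is determined by the distances alone, the only choice left is the cut level n, which is exactly the target subset size — no extra hyper-parameter to calibrate.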
Outline
• Introduction
• Proposed Feature Selection Algorithms (FSA)
• Application to Learning to Rank
Application to Web Search Engine Data
• Bing data: http://research.microsoft.com/en-us/projects/mslr/

            Train    Validation  Test
# queries   18,919   6,306       6,306
# urls      723,412  235,259     241,521
# features  136

• Yahoo! data: http://webscope.sandbox.yahoo.com

            Train    Validation  Test
# queries   19,944   2,994       6,983
# urls      473,134  71,083      165,660
# features  519
Experimental Framework
• Importance, I(f_k): NDCG@10 obtained using feature f_k alone as a ranking model
• Similarity, S(f_k, f_l): Spearman rank correlation coefficient
• Distance, D(f_k, f_l): 1 − S(f_k, f_l)
• L2R algorithm: LambdaMART
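The two building blocks of the framework can be sketched as below, for a single query's documents. Assumptions: a common DCG formulation with gain 2^rel − 1, and a Spearman coefficient without tie correction; in the paper the importance would be averaged over all queries. Function names are illustrative.

```python
import math

def dcg_at_10(labels):
    """DCG@10 of a ranked list of relevance labels (gain = 2^rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(labels[:10]))

def ndcg_at_10(scores, labels):
    """NDCG@10 when documents are sorted by `scores` (descending)."""
    ranked = [lab for _, lab in sorted(zip(scores, labels),
                                       key=lambda t: -t[0])]
    ideal = dcg_at_10(sorted(labels, reverse=True))
    return dcg_at_10(ranked) / ideal if ideal > 0 else 0.0

def feature_importance(feature_values, labels):
    """Importance of one feature: NDCG@10 using that feature alone
    as the ranking model (single-query version, for brevity)."""
    return ndcg_at_10(feature_values, labels)

def spearman(x, y):
    """Spearman rank correlation coefficient (no tie correction)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

The distance used by H-GAS then follows directly as `1 - spearman(f_k, f_l)`.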
Experimental Protocol
1. Select a subset of n < K features using a given FSA.
2. Train LambdaMART using the n features.
3. Measure LambdaMART performance on the test set.
4. Compare FSAs using average NDCG@10.
Repeat for different n in {5%K, 10%K, 20%K, 30%K, 40%K, 50%K, 75%K, K}, and repeat from step 1 for each FSA.
Results on the “Bing” dataset
[Figure: NDCG@10 vs. feature subset size as % of the feature set size (K).]
Results on the “Yahoo!” dataset
[Figure: NDCG@10 vs. feature subset size as % of the feature set size (K).]
Results (NDCG@10)

Bing dataset
Feature subset size    5%       10%      20%      30%      40%      100%
N-GAS                  0.4011▼  0.4459   0.4710   0.4739▼  0.4813   0.4863
X-GAS, p = 0.05        0.4376▲  0.4528   0.4577▼  0.4825   0.4834   0.4863
H-GAS, "single"        0.4423▲  0.4643▲  0.4870▲  0.4854   0.4848   0.4863
H-GAS, "ward"          0.4289   0.4434▼  0.4820   0.4879   0.4853   0.4863
GAS, c = 0.01          0.4294   0.4515   0.4758   0.4848   0.4863   0.4863

Yahoo! dataset
Feature subset size    5%       10%      20%      30%      40%      100%
N-GAS                  0.7430▼  0.7601   0.7672   0.7717   0.7724   0.7753
X-GAS, p = 0.8         0.7655   0.7666   0.7723   0.7742   0.7751   0.7753
H-GAS, "single"        0.7350▼  0.7635   0.7666   0.7738   0.7742   0.7753
H-GAS, "ward"          0.7570▼  0.7626   0.7704   0.7743   0.7755   0.7753
GAS, c = 0.01          0.7628   0.7649   0.7671   0.7730   0.7737   0.7753
Conclusion
• X-GAS and H-GAS show performance greater than or equal to that of the benchmark model.
• H-GAS and N-GAS are more efficient than the others because they do not need any hyper-parameter calibration.
• Future work:
  • experiments on the new LtR dataset provided by istella* (http://blog.istella.it/istella-learning-to-rank-dataset/)
  • application to other ML contexts, sorting problems and ensemble learning.
Thank you and
special thanks to ACM-SIGIR for
the Travel Grant support
Andrea Gigli Email: [email protected]
Twitter: @andrgig
http://www.slideshare.net/andrgig