Temporal Latent Topic User Profiles for Search Personalisation
Thanh Vu, Alistair Willis, Dawei Song
The Open University, UK
Son N. Tran
City University London
The 37th European Conference on Information Retrieval
31st of March, 2015
Search Personalisation
Return search results based on:
• The input query
• The user's search interests
Different users who submit the same query will probably get different search result lists.
Even an individual user will get different search results at different search times (e.g., Open US).
The performance of search personalisation depends on the richness of a user profile.
J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009.
Topic-based user profiles
Use a human-generated ontology (ODP – dmoz.org) to extract topics from all clicked/relevant documents of a specific user to build her profile.
1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013.
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012.
Challenges for Human-Generated Ontology
• New topics that the ontology does not cover may emerge over time.
• Classifying each document into the correct categories, and maintaining that classification, requires expensive human effort.
Challenges for Time-awareness
Previous methods use all the clicked/relevant documents of a user to build her search profile.
The documents are treated equally, without considering temporal features (i.e., the time at which documents were clicked and viewed):
• The profile is too broad
• It cannot fully express the current interest of the user
1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014.
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013.
Research Questions
1. How can we build user profiles with time-awareness?
2. Do the time-aware profiles help improve search performance?
Applying Latent Dirichlet Allocation
Building temporal latent topic user profiles (1)
Non-temporal method
Clicked documents (distribution over topics):
1st: Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd: Football 0.55, Law 0.27, OS 0.10, Health 0.08
3rd: Law 0.41, OS 0.37, Health 0.12, Football 0.10
4th: OS 0.65, Law 0.21, Football 0.10, Health 0.04
The topic-based user profile (means over topics):
Football 0.32, Law 0.30, OS 0.29, Health 0.09
Building temporal latent topic user profiles (2)
Our method
Clicked documents (decay weight, distribution over topics):
1st (0.9^0): Football 0.51, Law 0.33, Health 0.11, OS 0.05
The temporal topic user profile:
Football 0.51, Law 0.33, Health 0.11, OS 0.05
Building temporal latent topic user profiles (2)
Clicked documents (decay weight, distribution over topics):
1st (0.9^1): Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd (0.9^0): Football 0.55, Law 0.27, OS 0.10, Health 0.08
The temporal topic user profile:
Football 0.53, Law 0.30, Health 0.09, OS 0.08
Building temporal latent topic user profiles (2)
Clicked documents (decay weight, distribution over topics):
1st (0.9^2): Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd (0.9^1): Football 0.55, Law 0.27, OS 0.10, Health 0.08
3rd (0.9^0): Law 0.41, OS 0.37, Health 0.12, Football 0.10
The temporal topic user profile:
Football 0.37, Law 0.34, OS 0.19, Health 0.10
Building temporal latent topic user profiles (2)
Clicked documents (decay weight, distribution over topics):
1st (0.9^3): Football 0.51, Law 0.33, Health 0.11, OS 0.05
2nd (0.9^2): Football 0.55, Law 0.27, OS 0.10, Health 0.08
3rd (0.9^1): Law 0.41, OS 0.37, Health 0.12, Football 0.10
4th (0.9^0): OS 0.65, Law 0.21, Football 0.10, Health 0.04
Temporal topic profile:
OS 0.32, Law 0.30, Football 0.29, Health 0.09
Non-temporal topic profile:
Football 0.32, Law 0.30, OS 0.29, Health 0.09
Building temporal latent topic user profiles (3)
• D_u = {d_1, d_2, …, d_n} is the set of relevant documents of the user u
• The user profile of u is a distribution over the topics Z (extracted by LDA)
• t_{d_i} = n indicates that d_i is the nth most recent relevant/clicked document of u
• α is the decay parameter; K is the normalisation factor
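The definitions above can be read as a decay-weighted average of per-document topic distributions. The sketch below assumes the profile has the form p(z|u) = (1/K) Σ_i α^(t_di − 1) p(z|d_i), that the most recently clicked document has t = 1 (which is what the 0.9^0 to 0.9^3 weights in the worked example on the preceding slides suggest), and that K is the sum of the weights. The function name `temporal_profile` is illustrative, and the input distributions are the four example documents from the slides rather than real LDA output.

```python
def temporal_profile(docs, alpha):
    """Decay-weighted mean of topic distributions.

    docs: list of {topic: probability} dicts in click order (oldest first).
    alpha: decay parameter in (0, 1]; alpha = 1 gives the plain mean.
    """
    n = len(docs)
    # The most recent document has t = 1, so its weight is alpha ** 0.
    weights = [alpha ** (n - 1 - i) for i in range(n)]
    k = sum(weights)  # normalisation factor K (assumed to be the weight sum)
    topics = {t for d in docs for t in d}
    return {
        t: sum(w * d.get(t, 0.0) for w, d in zip(weights, docs)) / k
        for t in topics
    }


clicked = [
    {"Football": 0.51, "Law": 0.33, "Health": 0.11, "OS": 0.05},  # 1st click
    {"Football": 0.55, "Law": 0.27, "OS": 0.10, "Health": 0.08},  # 2nd click
    {"Law": 0.41, "OS": 0.37, "Health": 0.12, "Football": 0.10},  # 3rd click
    {"OS": 0.65, "Law": 0.21, "Football": 0.10, "Health": 0.04},  # 4th click
]

temporal = temporal_profile(clicked, alpha=0.9)
static = temporal_profile(clicked, alpha=1.0)
print(max(temporal, key=temporal.get))  # OS
print(max(static, key=static.get))      # Football
```

With α = 0.9 the most recent interest (OS) becomes the top topic, whereas the unweighted mean still ranks Football first, which matches the temporal and non-temporal profiles shown on the slides.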
Building temporal latent topic user profiles (4)
• Long-term user profile: uses relevant documents extracted from the user’s whole search history
• Daily user profile: uses relevant documents extracted from the user’s search history in the current search day
• Session user profile: uses relevant documents extracted from the user’s search history in the current search session
Re-ranking search results (1)
16
1 32
HealthLawFootballOS
0.510.330.110.05
FootballLawHealthOS
0.550.270.130.05
FootballOSHealthLaw
0.410.370.120.10
Original Rank
132
HealthLawFootballOS
0.510.330.110.05
FootballLawHealthOS
0.550.270.130.05
FootballOSHealthLaw
0.410.370.120.10
After re-ranking
FootballLawOSHealth
0.470.240.160.12
The user profile (p)
Re-ranking search results (2)
Personalised scores: use the Jensen-Shannon divergence (D_JS[d||p]) between each returned document (d) and the user profile (p).
Returned documents (d):
• Health 0.51, Law 0.33, Football 0.11, OS 0.05
• Football 0.55, Law 0.27, Health 0.13, OS 0.05
• Football 0.41, OS 0.37, Health 0.12, Law 0.10
The user profile (p):
Football 0.47, Law 0.24, OS 0.16, Health 0.12
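To make the personalised score concrete, here is a minimal sketch that computes the Jensen-Shannon divergence between each returned document and the user profile, using the numbers from the slide. It assumes that a lower divergence means a closer match to the profile and that the documents are listed in their original rank order; how the talk converts the divergence into the final feature value is not stated, so that step is left out. The names `doc1` to `doc3` are placeholders, and the profile is renormalised because the rounded slide values sum to 0.99.

```python
import math


def kl_divergence(p, q):
    """D_KL[p || q] in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(d, p):
    """Jensen-Shannon divergence D_JS[d || p] between two distributions."""
    m = [(di + pi) / 2 for di, pi in zip(d, p)]  # mixture distribution
    return 0.5 * kl_divergence(d, m) + 0.5 * kl_divergence(p, m)


def normalise(values):
    total = sum(values)
    return [v / total for v in values]


# Topic order: Football, Law, OS, Health (values taken from the slide).
profile = normalise([0.47, 0.24, 0.16, 0.12])
returned = {
    "doc1": [0.11, 0.33, 0.05, 0.51],
    "doc2": [0.55, 0.27, 0.05, 0.13],
    "doc3": [0.41, 0.10, 0.37, 0.12],
}

scores = {name: js_divergence(d, profile) for name, d in returned.items()}
reranked = sorted(scores, key=scores.get)  # closest to the profile first
print(reranked)  # ['doc2', 'doc3', 'doc1']
```

The Health-dominated document drops to the bottom because it diverges most from a profile dominated by Football and Law.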
Re-ranking search results (3)
Re-ranking algorithm: LambdaMART [1]
1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.
Re-ranking features
Personalised features:
• LongTermScore: personalised score between document and long-term profile
• DailyScore: personalised score between document and daily profile
• SessionScore: personalised score between document and session profile
Non-personalised features:
• DocRank: rank of the document in the original returned list
• QuerySim: cosine similarity score between the current and previous queries
• QueryNo: total number of queries submitted in the current search session (including the current query)
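As an illustration of the non-personalised features, the sketch below builds DocRank, QuerySim and QueryNo for one document. It makes several assumptions that the slides do not confirm: queries are represented as term-count vectors split on whitespace, and QuerySim compares the current query only with the immediately preceding query in the session. The function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(query_a, query_b):
    """Cosine similarity between two queries as term-count vectors."""
    a = Counter(query_a.lower().split())
    b = Counter(query_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def non_personalised_features(session_queries, doc_rank):
    """session_queries: queries of the current session, current query last."""
    current = session_queries[-1]
    # Assumption: compare only with the immediately preceding query.
    previous = session_queries[-2] if len(session_queries) > 1 else ""
    return {
        "DocRank": doc_rank,
        "QuerySim": cosine_similarity(current, previous),
        "QueryNo": len(session_queries),
    }


features = non_personalised_features(
    ["open us tennis", "open us tennis results"], doc_rank=3
)
print(features)
```

These values would be combined with the three personalised scores into one feature vector per query-document pair before training the ranker.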
Evaluation
Dataset
• The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012
• A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user’s dwell time
• The content of all the URLs was downloaded for learning topics
• A search session is demarcated by 30 minutes of user inactivity
• A relevant document is a click with a dwell time of at least 30 seconds or the last click in a session (SAT click)
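The session and SAT-click rules above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (`time`, `url`, `dwell`) is hypothetical, inactivity is measured between consecutive clicks only, and a gap of exactly 30 minutes is treated as starting a new session.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # inactivity that ends a session
SAT_DWELL = 30                       # minimum dwell time in seconds


def split_sessions(clicks):
    """Group time-ordered clicks into sessions separated by inactivity."""
    sessions = []
    for click in clicks:
        if sessions and click["time"] - sessions[-1][-1]["time"] < SESSION_GAP:
            sessions[-1].append(click)
        else:
            sessions.append([click])
    return sessions


def sat_clicks(session):
    """URLs with dwell time >= 30 s, plus the last click of the session."""
    last = len(session) - 1
    return [
        c["url"]
        for i, c in enumerate(session)
        if c["dwell"] >= SAT_DWELL or i == last
    ]


day = datetime(2012, 7, 1)
clicks = [
    {"time": day.replace(hour=9, minute=0), "url": "a.example", "dwell": 45},
    {"time": day.replace(hour=9, minute=5), "url": "b.example", "dwell": 10},
    {"time": day.replace(hour=9, minute=10), "url": "c.example", "dwell": 5},
    {"time": day.replace(hour=10, minute=0), "url": "d.example", "dwell": 12},
]
sessions = split_sessions(clicks)
print([sat_clicks(s) for s in sessions])
# [['a.example', 'c.example'], ['d.example']]
```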
Evaluation methodology
Assign a positive (relevant) label to a returned URL if:
• it is a SAT click in the current query, or
• it is a SAT click in one of the other repeated queries in the same search session
Assign negative (irrelevant) labels to the rest of the URLs.
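The labelling rule can be expressed in a few lines. In this sketch a session is a list of hypothetical records with the keys `query`, `urls` (the returned list) and `sat` (the SAT-clicked URLs), and a "repeated query" is assumed to mean an exact match on the query string.

```python
def label_urls(session, index):
    """Return {url: 1 or 0} for the query record at position `index`."""
    current = session[index]
    positives = set(current["sat"])
    for i, other in enumerate(session):
        # SAT clicks from repeats of the same query in this session also count.
        if i != index and other["query"] == current["query"]:
            positives |= set(other["sat"])
    return {url: int(url in positives) for url in current["urls"]}


session = [
    {"query": "mu", "urls": ["u1", "u2", "u3"], "sat": ["u2"]},
    {"query": "mu fixtures", "urls": ["u4", "u5"], "sat": ["u4"]},
    {"query": "mu", "urls": ["u1", "u2", "u3"], "sat": ["u3"]},
]
print(label_urls(session, 0))  # {'u1': 0, 'u2': 1, 'u3': 1}
```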
Personalisation Methods and Baselines
Personalisation methods
• LON uses only LongTermScore from the long-term profile
• DAI uses only DailyScore from the daily profile
• SES uses only SessionScore from the session profile
• ALL uses all personalised scores from the three profiles
Baselines
• Default is the default ranking returned by the search engine
• Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)
Results
Evaluation metrics
• Mean Average Precision (MAP)
• Precision (P@k)
• Mean Reciprocal Rank (MRR)
• Normalized Discounted Cumulative Gain (nDCG@k)
For each evaluation metric, a higher value indicates a better ranking.
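For reference, the four metrics can be computed per query from a ranked list of binary relevance labels, as in the sketch below. It uses common textbook definitions, which may differ in detail from the evaluation code behind the talk: average precision is normalised by the number of relevant documents in the list, and nDCG uses a log2 discount. MAP and MRR are the means of the per-query average precision and reciprocal rank.

```python
import math


def precision_at_k(labels, k):
    return sum(labels[:k]) / k


def average_precision(labels):
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at each relevant position
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def dcg(labels):
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))


def ndcg_at_k(labels, k):
    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal else 0.0


labels = [0, 1, 0, 1]  # 1 = relevant, in ranked order
print(precision_at_k(labels, 2))   # 0.5
print(average_precision(labels))   # 0.5
print(reciprocal_rank(labels))     # 0.5
print(round(ndcg_at_k(labels, 4), 4))
```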
Overall Performance
• All improvements over the baselines are significant (paired t-test, p < 0.001)
Conclusions (1)
• The three temporal profiles help to improve search performance over the default ranking and over the non-temporal profile
Conclusions (2)
• Using all features (ALL) achieves the highest performance
Conclusions (3)
• The session profile achieves better performance than the daily profile
• The daily profile gains advantages over the long-term profile
Conclusions (4)
• Without time-awareness, the long-term profile gets no improvement over the default ranking
Summary
• Build long-term, daily and session profiles with time-awareness, using topics extracted automatically from relevant documents at different time scales
• Use the three profiles to re-rank search results returned by Bing, showing a significant improvement in search performance
Click Entropies
P(d|q) is the percentage of the clicks on document d among all the clicks for q
A smaller query click entropy value indicates more agreement between users on clicking a small number of web pages
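The slide does not reproduce the formula, so the sketch below assumes the usual definition of query click entropy, ClickEntropy(q) = Σ_d −P(d|q) log2 P(d|q), summed over the documents clicked for q; the base-2 logarithm is an assumption. The function name and example URLs are illustrative.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Entropy (in bits) of the click distribution for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    # p * log2(1 / p) is the same as -p * log2(p).
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Every user clicks the same page: full agreement.
print(click_entropy(["a.example"] * 4))  # 0.0
# Clicks spread evenly over four pages: no agreement.
print(click_entropy(["a.example", "b.example", "c.example", "d.example"]))  # 2.0
```

Queries with higher click entropy are the ones where users disagree most about which result is relevant.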
Query Positions in Search Session
Aim to study whether the position of a query has any effect on the performance of the temporal latent topic profiles
Label the queries by their positions during the search
Clicked documents (distribution over topics):
• Football 0.51, Law 0.33, Health 0.11, OS 0.05
• Football 0.55, Law 0.27, Health 0.13, OS 0.05
• Law 0.41, OS 0.37, Health 0.12, Football 0.10
• OS 0.65, Law 0.15, Football 0.11, Health 0.09
The topic-based user profile (means over topics):
Football 0.32, Law 0.29, OS 0.28, Health 0.11
Re-ranking search results (1)
Query: MU