Folksonomy-Based Adaptive Query Expansion
Claudio Biancalana, Fabio Gasparetti, Alessandro Micarelli, Alfonso Miola, and Giuseppe Sansonetti
Department of Computer Science and Automation
Artificial Intelligence Laboratory, Roma Tre University
Via della Vasca Navale, 79, 00146 Rome, Italy
SRS 2012 – Montreal, Canada, July 17, 2012
State of the Art
• 1993 - Web Search Engines
• Popular techniques to improve their performance
Explicit Relevance Feedback and (Automatic) Query Expansion (Maron and Kuhns 1960, Rocchio 1971)
PageRank (1998)
(Implicitly built) User Profiles (2004)
• e.g., Google Personalized
Exploiting Social Networks or Signals (2010)
• Facebook, YouTube, Twitter
Implicitly Understanding User Actions
Query Expansion
The process of expanding a user query with additional related words and phrases
Original Query Q: {q1, q2,…, qk, qk+1,…, qn}
Terms to Add Q+: {e1, e2,..., em}
Terms to Remove Q-: {qk+1,..., qn}
Expanded Query
EQ = (Q ∪ Q+) - Q- = {q1, q2, ..., qk, e1, e2, ..., em}
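The set definition above can be sketched in a few lines of Python. This is only an illustration: the function name `expand_query` and the sample terms are hypothetical and do not come from the slides.

```python
def expand_query(query, to_add, to_remove):
    """Return EQ = (Q ∪ Q+) - Q- as a list, keeping the original term order."""
    removed = set(to_remove)
    expanded = []
    for term in list(query) + list(to_add):
        # Skip removed terms and avoid duplicates (set semantics).
        if term not in removed and term not in expanded:
            expanded.append(term)
    return expanded


# Hypothetical example: Q- drops the last two query terms, Q+ adds two new ones.
q = ["jaguar", "speed", "the", "of"]
q_plus = ["animal", "cat"]
q_minus = ["the", "of"]
print(expand_query(q, q_plus, q_minus))
```

A list is used instead of a plain set so that the surviving original terms stay ahead of the added ones.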
Building a Co-Occ Matrix
For each document, a co-occurrence matrix is generated; the per-document matrices are then summed into a single matrix
Usually a POS tagger extracts nouns, proper nouns, and adjectives
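Co-Occurrence Matrix (example over terms t1 to t5; rows and columns are both labelled t1 t2 t3 t4 t5)
Cell values in extraction order (the row/column layout of the original figure is not recoverable):
0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0 0.0 1.0 9.0 1.0 4.0 1.0 2.0 2.0 1.0 0.0 3.0 3.0 4.0 0.0 2.0 9.0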
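As a rough sketch of the "one matrix per document, then sum" idea, the Python below counts term pairs per document and adds the counts together. It rests on several assumptions that the slides do not state: terms are already extracted (the POS-tagging step is skipped), two terms co-occur when they appear in the same document, and each pair is counted once per document. The weighting actually used in the talk may differ.

```python
from collections import Counter
from itertools import combinations


def doc_cooccurrence(terms):
    """Sparse co-occurrence matrix of one document: {(t1, t2): count}."""
    matrix = Counter()
    # Assumption: each unordered pair of distinct terms counts once per document.
    for t1, t2 in combinations(sorted(set(terms)), 2):
        matrix[(t1, t2)] += 1
    return matrix


def corpus_cooccurrence(docs):
    """Sum the per-document matrices into a single matrix."""
    total = Counter()
    for terms in docs:
        total.update(doc_cooccurrence(terms))  # Counter.update adds counts
    return total


# Hypothetical pre-extracted terms for three documents.
docs = [
    ["river", "bank", "water"],
    ["bank", "money", "loan"],
    ["river", "water", "mouth"],
]
matrix = corpus_cooccurrence(docs)
print(matrix[("river", "water")])  # 2
```

Keys are stored as alphabetically sorted pairs, so the matrix is symmetric by construction and only one triangle is kept.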
Limits of Co-Occ Matrices
• Furnas' Vocabulary problem (1987)
Polysemy and Homonymy
• Mouth (river-sea; cave entrance; body part)
• River Bank or Financial Bank
• Corpus-dependent
Small corpora contain few statistics
Relevant concepts missing
Research Question
Is it possible to combine Query Expansion, Social Web, Semantic Search, and User Personalization in traditional Web search tools?
Nereau
Nereau, Master of Spiders, is the name of a divinity worshipped in the Nauru islands, in Micronesia. It is a foremost figure in many myths, some of which give it a specific role, that of endowing the mad with rationality and the mute with speech, thus making them complete human beings.
Nereau Co-Occurrence Matrix
• Extension of the co-occurrence matrix:
Semantic metadata as a 3rd dimension
The user matrix is built on usage data
• Use of Social Bookmarking Services for metadata retrieval: e.g., delicious, StumbleUpon, Digg
Nereau Co-Occurrence Matrix
• Tags associated with visited URLs are collected and linked to the (stemmed) keywords extracted from the page content.
• Each co-occ matrix is associated with a tag
<t1, t2, tag, co-occ>
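One possible way to hold the <t1, t2, tag, co-occ> tuples is a dictionary with one sparse co-occurrence matrix per tag. The sketch below is an assumption about the data layout, not the system's actual implementation: `build_tagged_cooccurrence`, `as_tuples`, and the sample pages are hypothetical, keywords are assumed to be stemmed already, and tags are assumed to have been fetched from a social bookmarking service beforehand.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_tagged_cooccurrence(visited_pages):
    """visited_pages: iterable of (keywords, tags), one entry per visited URL.

    Returns {tag: Counter({(t1, t2): co-occ})}, i.e. one matrix per tag.
    """
    tagged = defaultdict(Counter)
    for keywords, tags in visited_pages:
        pairs = list(combinations(sorted(set(keywords)), 2))
        for tag in tags:
            # Every tag of the page receives the page's keyword pairs.
            tagged[tag].update(pairs)
    return tagged


def as_tuples(tagged):
    """Flatten to the <t1, t2, tag, co-occ> form shown on the slide."""
    return [(t1, t2, tag, n)
            for tag, matrix in tagged.items()
            for (t1, t2), n in matrix.items()]


# Hypothetical usage data: keywords and tags of three visited pages.
pages = [
    (["bank", "loan", "rate"], ["finance"]),
    (["bank", "river", "water"], ["nature", "travel"]),
    (["loan", "rate"], ["finance"]),
]
tagged = build_tagged_cooccurrence(pages)
print(tagged["finance"][("loan", "rate")])  # 2
```

Because the matrices are built only from pages the user visited, the same structure doubles as a simple user profile.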
Nereau Co-Occurrence Matrix
• The expansion follows similar steps: each term of the query retrieves multiple co-occ matrices associated with different tags
• The occurrences of the tags are summed up over all the query terms, obtaining a weighted set of tags
Nereau Co-Occurrence Matrix
The co-occurring keywords associated with the most relevant tags compose the new query
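The expansion steps described in these slides can be sketched as follows. This is a simplified illustration under stated assumptions: the per-tag matrices are plain dictionaries, a tag's weight is the sum of the co-occurrence values of pairs that involve a query term, and the cut-offs `top_tags` and `top_terms` are hypothetical parameters. The slides do not give the exact weighting formula.

```python
from collections import Counter, defaultdict


def expand_with_tags(query, tagged, top_tags=1, top_terms=2):
    """Expand `query` using per-tag co-occurrence matrices.

    tagged: {tag: {(t1, t2): co-occ}}
    Returns (expanded_query, tag_weights).
    """
    tag_weights = Counter()            # tag -> summed co-occurrence
    candidates = defaultdict(Counter)  # tag -> keywords co-occurring with the query
    for tag, matrix in tagged.items():
        for (t1, t2), count in matrix.items():
            for q, other in ((t1, t2), (t2, t1)):
                if q in query:
                    tag_weights[tag] += count
                    if other not in query:
                        candidates[tag][other] += count
    expanded = list(query)
    # Keywords of the highest-weighted tags compose the new query.
    for tag, _ in tag_weights.most_common(top_tags):
        for term, _ in candidates[tag].most_common(top_terms):
            if term not in expanded:
                expanded.append(term)
    return expanded, tag_weights


# Hypothetical per-tag matrices for an ambiguous query term.
tagged = {
    "finance": {("bank", "loan"): 3, ("bank", "rate"): 2, ("loan", "rate"): 4},
    "nature": {("bank", "river"): 1, ("river", "water"): 5},
}
expanded, weights = expand_with_tags(["bank"], tagged)
print(expanded)       # ['bank', 'loan', 'rate']
print(dict(weights))  # {'finance': 5, 'nature': 1}
```

In this toy profile the "finance" tag dominates, so the ambiguous term "bank" is expanded toward its financial sense.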
Experimental Evaluations
Does it work?
Experimental Evaluations
• Three kinds of evaluations
TREC corpus-based (500K docs, 249 queries)
• RF vs CoOcc vs Google vs Nereau
ODP corpus-based
• RF vs CoOcc vs Google vs Nereau
Web user-based
• Google vs PersGoogle vs RF vs Nereau
• nDCG, P@n, MAP
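For readers unfamiliar with the three measures, here is a minimal Python sketch of P@n, average precision (whose mean over queries is MAP), and nDCG@k. The formulas are common textbook variants and are assumptions here: the slides do not say which formulation was used. In particular, DCG uses a log2(rank + 1) discount, the ideal ranking is derived from the judged list itself, and average precision is normalised by the number of relevant results in the list.

```python
import math


def precision_at_n(rels, n):
    """Fraction of the top-n results that are relevant (rel > 0)."""
    return sum(1 for r in rels[:n] if r > 0) / n


def average_precision(rels):
    """Mean of the precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg_at_k(rels, k):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels, k):
    """DCG normalised by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance judgements, listed in ranked order.
print(precision_at_n([1, 0, 1, 0], 2))  # 0.5
print(ndcg_at_k([3, 2, 1], 3))          # 1.0
```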
Experimental Evaluations
Web corpus: 42 users on real Web sessions, nDCG@{1,5,10}
Conclusions and Future Work
• A Nereau search engine that combines:
Traditional Query Expansion
Social Web
Semantic Spaces
Basic User Personalization
• Suitable to be included in traditional search engines;
Complexity O(n²K)
n = training docs
K = keywords extracted
• Future Work
Including more social data (e.g., networks, user authority)
Addressing the dynamics of folksonomies
Automatically assigning tags when no social data is available