1
Ranked-Listed or Categorized Results in IR
Zheng Zhu, Ingemar J. Cox, Mark Levene
Birkbeck College, University of London
UCL
2
Content
• Motivation
• Methodology
• Results
• Conclusions
3
The motivation
• Improve navigational experience for both normal users and users of handheld devices.
• Intuitively, we would expect grouping documents to reduce search time.
4
Introduction
• We quantify the benefits of grouping documents based on classification.
• We study how the benefits of grouping degrade with classification errors.
• We take into account errors that arise from both the user and the classifier.
5
The methodology
• Three types of simulated user model:
1. The user knows the class.
2. The user doesn’t know the class.
3. The user thinks he knows the class.
• Two classification scenarios:
1. Correct classification
2. Misclassification
6
The methodology
• To measure the benefits, we define:
– class rank,
– document rank.
• For ranked-list results, scroll rank (SR) is used.
7
The methodology
• For categorized results, based on different user models and operation scenarios, we define:
– In-Class Rank (ICR),
– Scrolled-Classification Rank (SCR),
– Out-Class/Scroll-Class Rank (OSCR),
– Out-Class/Revert Rank (ORR).
8
The methodology
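[Figure: a query returns the ranked list doc1 to doc7; the same seven documents are also shown grouped under Class1 and Class2. Ranks for the example target:]
• SR = 6
• ICR = 1 + 3 = 4
• SCR = 1 + 3 = 4
• ORR = 2 + 4 + 6 = 12
• OSCR = 2 + 4 + 4 = 10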
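The arithmetic on this slide can be reproduced with a short sketch. The slide does not give the formulas or say which documents belong to which class, so both are assumptions here: the class membership below is hypothetical, each class label and each document scanned is assumed to cost one unit, and a user who opens the wrong class is assumed to scan all of it before giving up. The function name `compute_ranks` is also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Ranks:
    sr: int    # scroll rank in the flat ranked list
    icr: int   # in-class rank
    scr: int   # scrolled-classification rank
    orr: int   # out-class / revert rank
    oscr: int  # out-class / scroll-class rank


def compute_ranks(ranked, classes, target, wrong_class):
    """ranked: doc ids in rank order.
    classes: list of (class name, doc ids) in class-rank order.
    wrong_class: the class a mistaken user opens first (assumption)."""
    sr = ranked.index(target) + 1

    # Walk the classes top-down until the target is found.
    scr = 0
    for class_rank, (_, docs) in enumerate(classes, start=1):
        if target in docs:
            doc_rank = docs.index(target) + 1
            icr = class_rank + doc_rank   # jump straight to the right class
            scr += 1 + doc_rank           # label + docs read in this class
            break
        scr += 1 + len(docs)              # label + every doc in a skipped class
    else:
        raise ValueError("target is not in any class")

    # Cost of opening the wrong class and scanning it fully.
    for class_rank, (name, docs) in enumerate(classes, start=1):
        if name == wrong_class:
            wasted = class_rank + len(docs)
            break
    else:
        raise ValueError("unknown class")

    return Ranks(sr, icr, scr, orr=wasted + sr, oscr=wasted + scr)


ranked = ["doc1", "doc2", "doc3", "doc4", "doc5", "doc6", "doc7"]
# Hypothetical grouping: the slide only implies 3 docs in Class1 and 4 in Class2.
classes = [
    ("Class1", ["doc2", "doc4", "doc6"]),
    ("Class2", ["doc1", "doc3", "doc5", "doc7"]),
]
print(compute_ranks(ranked, classes, target="doc6", wrong_class="Class2"))
```

Under these assumptions the sketch yields SR=6, ICR=4, SCR=4, ORR=12 and OSCR=10, matching the slide.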
9
The methodology
Simulated user/target | Correctly classified | Misclassified
Knows class | ICR | OSCR or ORR
Does not know class | SCR or SR | SCR or SR
Thinks knows class | OSCR or ORR | OSCR or ORR
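The table on this slide amounts to a lookup from (user model, classification outcome) to the rank measures that apply. A minimal sketch of that lookup follows; the names `UserModel`, `APPLICABLE_RANKS` and `applicable_ranks` are illustrative and do not come from the slides.

```python
from enum import Enum


class UserModel(Enum):
    KNOWS_CLASS = "knows class"
    DOES_NOT_KNOW_CLASS = "does not know class"
    THINKS_KNOWS_CLASS = "thinks knows class"


# Key: (user model, target correctly classified?) -> applicable rank measures.
APPLICABLE_RANKS = {
    (UserModel.KNOWS_CLASS, True): ("ICR",),
    (UserModel.KNOWS_CLASS, False): ("OSCR", "ORR"),
    (UserModel.DOES_NOT_KNOW_CLASS, True): ("SCR", "SR"),
    (UserModel.DOES_NOT_KNOW_CLASS, False): ("SCR", "SR"),
    (UserModel.THINKS_KNOWS_CLASS, True): ("OSCR", "ORR"),
    (UserModel.THINKS_KNOWS_CLASS, False): ("OSCR", "ORR"),
}


def applicable_ranks(user, correctly_classified):
    """Return the rank measures the table lists for this combination."""
    return APPLICABLE_RANKS[(user, correctly_classified)]


print(applicable_ranks(UserModel.KNOWS_CLASS, correctly_classified=False))
```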
10
The methodology
• Known-Item Search (Target Testing), followed by comparison of the ranks.
• Given a document, we generate a query so that the target document appears within a designated range of scroll rank.
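The slides do not say how the queries are generated. One possible approach, sketched below purely as an assumption, is to try small combinations of the target document's own terms and keep the first query that places the target inside the required scroll-rank range. The toy term-count ranker `search` stands in for the real search engine, and `generate_query` and the three-document corpus are hypothetical.

```python
from itertools import combinations


def search(query_terms, corpus):
    """Toy ranker (assumption): order documents by the number of
    query-term occurrences, breaking ties by document id."""
    scores = {
        doc_id: sum(text.split().count(term) for term in query_terms)
        for doc_id, text in corpus.items()
    }
    hits = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(hits, key=lambda doc_id: (-scores[doc_id], doc_id))


def generate_query(target, corpus, lo, hi, max_terms=2):
    """Return (query terms, scroll rank) for the first query that ranks
    `target` within [lo, hi], or None if no candidate query does."""
    terms = sorted(set(corpus[target].split()))
    for n in range(1, max_terms + 1):
        for query in combinations(terms, n):
            ranking = search(query, corpus)
            # The target always matches: every query term is drawn from it.
            rank = ranking.index(target) + 1
            if lo <= rank <= hi:
                return list(query), rank
    return None


corpus = {
    "d1": "apple apple banana",
    "d2": "apple cherry",
    "d3": "banana cherry cherry",
}
print(generate_query("d2", corpus, lo=2, hi=2))
```

With this corpus, the single-term query `apple` ranks `d1` first and `d2` second, so it is accepted for the range [2, 2].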
11
The implementation
• The Open Directory Project provides an oracle for classification, so that we can control both user and machine error.
• The search engine is based on Lucene, an open-source search library.
12
The ideal case with an Oracle
13
KNN Classifier
14
More realistic scenario
15
Conclusions
• Classification-based display can improve users’ interaction with search engines.
• However, this depends on the user strategy:
– The hybrid strategy has the best performance.
– Using a hybrid strategy, performance degrades gracefully with errors.
16
Thanks!
19
References
• Kummamuru, K., Lotlikar, R., Roy, S., Singal, K., Krishnapuram, R.: A hierarchical monothetic document clustering algorithm for summarization and browsing search results. In: Proceedings of the 13th International Conference on World Wide Web, pp. 658–665 (2004)
• Chen, H., Dumais, S.: Bringing order to the web: Automatically categorizing search results. In: CHI 2000: Proceedings of the SIGCHI conference on Human factors in computing systems, pp. 145–152. ACM Press, New York (2000)
• http://www.useit.com/alertbox/reading_pattern.html