Social Book Search: A Combination of Personalized Recommendations and Retrieval

Author: Justin van Wees
Supervisor: Marijn Koolen
Second assessor: Frank Nack

Master Thesis, Information Science
Human Centered Multimedia, August 23, 2012
Outline
1. Background
2. Research questions
3. Data collection
4. Experiments and results
5. Conclusions
6. Discussion and future work
7. Questions
Current situation
• Traditional information retrieval (IR) models:
• developed for use on small collections
• containing only officially published documents, annotated by professionals
• Many modern web (2.0) applications still use traditional models for search, even though they hold:
• millions of documents
• a combination of user-generated content (UGC) and professional metadata
Current situation
• A user uses an IR system to find documents that are topically relevant to her information need
• Queries can lead to thousands of relevant documents
• Evaluating a large number of results is expensive for the user
• Other notions of relevance exist, e.g. how well-written, popular, recent, or fun the document is
• Combination of professional and user-generated metadata
Social Book Search Track
• Evaluate relative value of controlled book metadata versus social metadata (Koolen et al., 2012)
• Amazon.com and LibraryThing (LT) corpus
• ~2.8 million book records, both social and professional metadata
• book search requests from LT discussion forums as topics, suggestions by other users as relevance judgements
Recommender Systems
• Recommender Systems (RSs) suggest items of interest to individuals or groups of users (Resnick and Varian, 1997)
• Assumes that individual’s taste or interest in a particular item can be explained by features recorded by the RS (demographics, previous interactions, etcetera)
• Different strategies: collaborative filtering (CF), content-, community-, knowledge-based, hybrid (Burke, 2007)
• Differs from traditional retrieval in terms of query formulation, source of relevance feedback and personalization (Furner, 2002)
Research Questions
• What data are we able to collect?
• Can we automatically make accurate predictions of a user’s preference for an unknown book?
• How do we combine results from the IR system with those from RSs?
• Social Book Search scenario and data

Does a combination of techniques from the field of IR with those from RSs improve retrieval performance when searching for works in a large-scale online collaborative media catalogue?
Crawling LibraryThing
• Performed four different crawls of user profiles and personal catalogues
• For each crawl, also crawled links to other profiles
• Compared crawls to determine representativeness for the entire LT user base
• All crawls combined cover approximately 6% of the LT user base
| Crawl | Seed list | Profiles | Unique works | Profile overlap |
|---|---:|---:|---:|---:|
| Forum users | 1,104 | 60,131 | 4,354,387 | – |
| Random – 211 works | 1,306 | 8,040 | 2,537,065 | 7,048 |
| Random – 1,000 works | 5,577 | 18,381 | 3,580,296 | 14,262 |
| Random – 10,000 works | 35,671 | 64,379 | 5,122,848 | 37,300 |
| Total | – | 89,693 | 5,299,399 | – |
Crawling LibraryThing
| Crawl | Link type | Min. | Max. | Median | Mean | Std. dev. |
|---|---|---:|---:|---:|---:|---:|
| Forum users | Friends | 0 | 172 | 3.0 | 8.47 | 16.31 |
| | Groups | 0 | 10 | 9.0 | 6.79 | 3.74 |
| | Interesting Libraries | 0 | 510 | 2.0 | 11.19 | 26.46 |
| Random – 211 works | Friends | 0 | 79 | 0.0 | 2.61 | 7.46 |
| | Groups | 0 | 10 | 0.0 | 1.70 | 3.05 |
| | Interesting Libraries | 0 | 394 | 0.0 | 3.30 | 17.80 |
| Random – 1,000 works | Friends | 0 | 84 | 0.0 | 2.18 | 6.07 |
| | Groups | 0 | 10 | 0.0 | 1.64 | 3.02 |
| | Interesting Libraries | 0 | 574 | 0.0 | 2.74 | 14.41 |
| Random – 10,000 works | Friends | 0 | 2,858 | 0.0 | 1.73 | 17.49 |
| | Groups | 0 | 10 | 0.0 | 1.24 | 2.61 |
| | Interesting Libraries | 0 | 855 | 0.0 | 1.69 | 10.40 |
| Total | Friends | 0 | 2,858 | 1.0 | 2.14 | 12.77 |
| | Groups | 0 | 10 | 0.0 | 1.18 | 2.44 |
| | Interesting Libraries | 0 | 855 | 0.0 | 1.27 | 8.00 |
Crawling LibraryThing
| Crawl | Works | Min. | Max. | Median | Mean | Std. dev. | Sum |
|---|---|---:|---:|---:|---:|---:|---:|
| Forum users | Unrated | 0 | 28,402 | 84.00 | 397.22 | 929.70 | 23,885,231 |
| | Rated | 0 | 12,190 | 3.00 | 78.80 | 238.53 | 4,738,018 |
| | Total | 0 | 28,402 | 148.00 | 476.02 | 980.88 | 28,623,249 |
| Random – 211 works | Unrated | 0 | 28,402 | 458.00 | 1,112.81 | 1,835.81 | 8,946,997 |
| | Rated | 0 | 12,190 | 10.00 | 182.77 | 472.08 | 1,469,531 |
| | Total | 0 | 28,402 | 657.00 | 1,295.58 | 1,908.65 | 10,416,528 |
| Random – 1,000 works | Unrated | 0 | 28,402 | 331.00 | 864.32 | 1,480.98 | 15,887,025 |
| | Rated | 0 | 12,190 | 3.00 | 130.20 | 369.06 | 2,393,233 |
| | Total | 0 | 28,402 | 475.00 | 994.52 | 1,539.15 | 18,280,258 |
| Random – 10,000 works | Unrated | 0 | 28,402 | 163.00 | 486.63 | 955.86 | 31,328,971 |
| | Rated | 0 | 12,190 | 1.00 | 74.04 | 237.01 | 4,766,750 |
| | Total | 0 | 28,402 | 201.00 | 560.68 | 1,000.50 | 36,095,721 |
| Total | Unrated | 0 | 28,402 | 102.00 | 378.18 | 834.94 | 33,920,353 |
| | Rated | 0 | 12,190 | 1.00 | 62.85 | 206.40 | 5,637,097 |
| | Total | 0 | 28,402 | 156.00 | 441.03 | 876.76 | 39,557,450 |
Generating Recommendations
• Collaborative filtering approach
• Unary and rated transactions
• Memory- and model-based recommenders
• Randomly split transactions (80% train/20% test) for performance evaluation
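The random 80/20 split used for evaluation can be sketched as follows (an illustrative snippet; the function and transaction format are my own, not taken from the thesis):

```python
import random

def split_transactions(transactions, train_frac=0.8, seed=42):
    """Randomly split (user, work, rating) transactions into train/test sets."""
    rng = random.Random(seed)
    shuffled = transactions[:]          # copy so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Tiny invented example: 5 transactions -> 4 train, 1 test
transactions = [("u1", "b1", 4.0), ("u1", "b2", 3.5), ("u2", "b1", 5.0),
                ("u2", "b3", 2.0), ("u3", "b2", 4.5)]
train, test = split_transactions(transactions)
print(len(train), len(test))  # 4 1
```

Fixing the seed makes the split reproducible across evaluation runs.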
Generating Recommendations
• Neighbourhood (Desrosiers and Karypis, 2011):
• Directly use user-item ratings to predict ratings for ‘unseen’ items
• Find n most similar neighbours (Pearson correlation)
• Use the weighted average of the ratings given by the user’s neighbours
• Let neighbours ‘vote’ on unary transactions
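A minimal sketch of this neighbourhood approach, following the Desrosiers and Karypis formulation (Pearson similarity, then a weighted average over the n nearest neighbours who rated the item); the toy ratings and names are invented for illustration:

```python
from math import sqrt

def pearson(ratings_u, ratings_v):
    """Pearson correlation over the items both users rated."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings_u[i] for i in common) / len(common)
    mu_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den = (sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common)) *
           sqrt(sum((ratings_v[i] - mu_v) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(user, item, ratings, n=25):
    """Weighted average of the n most similar neighbours' ratings for item."""
    neighbours = [(pearson(ratings[user], ratings[v]), ratings[v][item])
                  for v in ratings if v != user and item in ratings[v]]
    neighbours = sorted(neighbours, key=lambda x: -abs(x[0]))[:n]
    num = sum(w * r for w, r in neighbours)
    den = sum(abs(w) for w, _ in neighbours)
    return num / den if den else None

ratings = {
    "ann": {"b1": 5.0, "b2": 3.0, "b3": 4.0},
    "bob": {"b1": 4.0, "b2": 2.0, "b3": 5.0, "b4": 4.0},
    "eve": {"b1": 4.0, "b2": 2.5, "b4": 2.0},
}
print(round(predict("ann", "b4", ratings), 2))  # 2.79
```

For unary transactions the same neighbourhood would vote on whether the item is present in the neighbours’ catalogues instead of averaging ratings.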
Generating Recommendations
• Singular Value Decomposition (SVD) (Schafer et al., 2007):
• Reduce domain complexity by mapping item space to k dimensions
• Remaining dimensions represent latent topics: preference classes of users, categorical classes of items
• Currently considered ‘state of the art’
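The thesis uses an SVD-based model; as a rough stand-in, here is a minimal latent-factor sketch trained with stochastic gradient descent, a common substitute for SVD in CF that captures the same idea of mapping users and items into k dimensions (all names, data, and hyperparameters are illustrative, not the thesis’s setup):

```python
import random

def factorize(ratings, k=2, steps=500, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional user/item latent factors by SGD on observed
    (user, item, rating) triples; predictions are dot products P[u].Q[i]."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(k)] for i in items}
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)   # regularised update
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

ratings = [("u1", "b1", 5.0), ("u1", "b2", 3.0), ("u2", "b1", 4.0),
           ("u2", "b3", 2.0), ("u3", "b2", 4.0), ("u3", "b3", 1.0)]
P, Q = factorize(ratings)
# Estimated preference of u1 for the unseen book b3:
print(sum(p * q for p, q in zip(P["u1"], Q["b3"])))
```

The learned factor vectors play the role of the latent topics mentioned above: similar users and similar items end up close together in the k-dimensional space.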
Recommender Performance
Rated transactions:

| Method | MAE | RMSE | P@5 | P@10 | P@50 |
|---|---:|---:|---:|---:|---:|
| Neighbourhood (N=25) | 0.7813 | 1.0286 | 0.0712 | 0.0661 | 0.0614 |
| Neighbourhood (N=50) | 0.7721 | 1.0105 | 0.0376 | 0.0371 | 0.0339 |
| Neighbourhood (N=100) | 0.7633 | 0.9927 | 0.0246 | 0.0239 | 0.0232 |
| SVD (K=50) | 0.6210 | 0.8139 | 0.0021 | 0.0019 | 0.0026 |
| SVD (K=100) | 0.6203 | 0.8131 | 0.0025 | 0.0022 | 0.0028 |
| SVD (K=150) | 0.6192 | 0.8122 | 0.0281 | 0.0107 | 0.0030 |

Unary transactions:

| Method | Accuracy | P@5 | P@10 | P@50 |
|---|---:|---:|---:|---:|
| Neighbourhood (N=25) | 0.2430 | 0.3711 | 0.2425 | 0.1829 |
| Neighbourhood (N=50) | 0.3014 | 0.3824 | 0.2561 | 0.1861 |
| Neighbourhood (N=100) | 0.3621 | 0.3640 | 0.2422 | 0.1812 |
| SVD (K=50) | 0.2240 | 0.0214 | 0.0198 | 0.0216 |
| SVD (K=100) | 0.2601 | 0.0219 | 0.0203 | 0.0229 |
| SVD (K=150) | 0.2676 | 0.0424 | 0.0212 | 0.0234 |
Retrieving Works
• Setup used for INEX 2012; top-performing run
• Index consists of user-generated content
• Removed stopwords
• Stemming with Krovetz
• Topic titles as queries
• Language model
• Pseudo-relevance feedback, 50 terms from the top 10 results
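The actual INEX run is not reproduced here; the following is a toy query-likelihood sketch with Jelinek-Mercer smoothing to illustrate how a language model scores documents (the documents, queries, and smoothing parameter are invented):

```python
from collections import Counter
from math import log

def lm_score(query_terms, doc_terms, collection, lam=0.5):
    """Query likelihood with Jelinek-Mercer smoothing:
    log P(q|d) = sum_t log((1-lam)*P(t|d) + lam*P(t|C))."""
    doc = Counter(doc_terms)
    coll = Counter(t for d in collection for t in d)   # collection model
    dlen, clen = len(doc_terms), sum(coll.values())
    return sum(log((1 - lam) * doc[t] / dlen + lam * coll[t] / clen)
               for t in query_terms if coll[t])        # skip unseen terms

docs = {
    "d1": "fantasy novel with dragons and magic".split(),
    "d2": "cookbook with italian pasta recipes".split(),
}
query = "fantasy dragons".split()
ranked = sorted(docs, key=lambda d: lm_score(query, docs[d], docs.values()),
                reverse=True)
print(ranked)  # ['d1', 'd2']
```

Pseudo-relevance feedback would then expand `query` with the most frequent terms from the top-ranked documents and re-run the scoring.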
Combining IR and RS
• Retrieval system: ranked list, with a probability score between 0 and 1 per work
• Recommendations: estimated preference of the user for a work, between 0.5 and 5.0 (rated) or 0 or 1 (unary)
• Normalise ratings
• ‘Boost’ works with an estimated preference, using CombSUM (Fox and Shaw, 1994)
• Use the average rating when no prediction can be made
• Introduce a weight (λ) between the systems
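The fusion steps above might be sketched like this (a hypothetical helper, not the thesis code; the optimal λ values in the results are on the order of 1e-4, but an exaggerated λ is used in the example so the boost is visible):

```python
def combine(ir_scores, rs_preds, avg_rating, lam=0.000185):
    """CombSUM-style fusion: S(d) = (1 - lam) * P_ret(d|q) + lam * pref(d).
    Star ratings are normalised to [0, 1]; works without a prediction
    fall back to the collection's average rating."""
    norm = lambda r: r / 5.0                   # map 0.5-5.0 stars onto [0, 1]
    fused = {}
    for work, p_ret in ir_scores.items():
        pref = norm(rs_preds.get(work, avg_rating))
        fused[work] = (1 - lam) * p_ret + lam * pref
    return sorted(fused, key=fused.get, reverse=True)

ir_scores = {"w1": 0.91, "w2": 0.90, "w3": 0.40}
rs_preds = {"w2": 4.8, "w3": 1.0}              # no prediction for w1
ranking = combine(ir_scores, rs_preds, avg_rating=3.5, lam=0.2)
print(ranking)  # ['w2', 'w1', 'w3']
```

With this λ, the strongly recommended `w2` overtakes the slightly better-retrieved `w1`, while the poorly rated `w3` stays last.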
Results
| Method | λ | nDCG@10 | P@10 | R@10 |
|---|---:|---:|---:|---:|
| Baseline | – | 0.1437 | 0.1219 | 0.1494 |
| Neighbourhood | | | | |
| Rated (n=25) | 0.0001700 | 0.1709 (18.93%) | 0.1490 (22.23%) | 0.1899 (27.11%) |
| Rated (n=50) | 0.0001855 | 0.1778 (23.73%) | 0.1500 (23.05%) | 0.1913 (28.05%) |
| Rated (n=100) | 0.0001800 | 0.1669 (16.14%) | 0.1490 (22.23%) | 0.1878 (25.70%) |
| Unary (n=25) | 0.0001500 | 0.1446 (0.63%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| Unary (n=50) | 0.0001500 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| Unary (n=100) | 0.0001500 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| SVD | | | | |
| Rated (K=50) | 0.0001800 | 0.1718 (19.55%) | 0.1490 (22.23%) | 0.1866 (24.90%) |
| Rated (K=100) | 0.0001850 | 0.1721 (19.76%) | 0.1490 (22.23%) | 0.1866 (24.90%) |
| Rated (K=150) | 0.0001850 | 0.1720 (19.69%) | 0.1490 (22.23%) | 0.1866 (24.90%) |
| Unary (K=50) | 0.0001500 | 0.1449 (0.84%) | 0.1240 (1.72%) | 0.1541 (3.15%) |
| Unary (K=100) | 0.0001550 | 0.1441 (0.28%) | 0.1229 (0.82%) | 0.1520 (1.74%) |
| Unary (K=150) | 0.0001550 | 0.1424 (-0.90%) | 0.1250 (2.54%) | 0.1561 (4.48%) |
Conclusions
• Collected a representative sample of user profiles
• Collaborative filtering is the obvious choice
• SVD best at estimating rated preference
• Poor performance on unary transactions
• Successfully combined retrieval with personalized recommendations
• Rated transactions are the most useful
• Personal preference is relevance evidence that can substantially improve retrieval performance in SBS
Discussion and Future Work
• Popularity as relevance evidence
• Value of λ depending on IR score distribution
• Other (mixtures of) RS setups
• Scaling, cold-start problems
• Trust and transparency of the system
Questions?
References

• R. Burke. Hybrid web recommender systems. In The Adaptive Web, pages 377–408. Springer-Verlag, 2007.
• C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, pages 107–144, 2011.
• E. Fox and J. Shaw. Combination of multiple searches. NIST Special Publication SP, pages 243–243, 1994.
• J. Furner. On recommending. Journal of the American Society for Information Science and Technology, 53(9):747–763, 2002.
• M. Koolen, G. Kazai, J. Kamps, A. Doucet, and M. Landoni. Overview of the INEX 2011 books and social search track. In S. Geva, J. Kamps, and R. Schenkel, editors, Focused Retrieval of Content and Structure: 10th International Workshop of the Initiative for the Evaluation of XML Retrieval (INEX 2011), volume 7424 of LNCS. Springer, 2012.
• P. Resnick and H. Varian. Recommender systems. Communications of the ACM, 40(3):56–58, 1997.
• J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative filtering recommender systems. International Journal of Electronic Business, 2(1):77, 2007. ISSN 1470-6067. doi: 10.1504/IJEB.2004.004560. URL http://www.springerlink.com/index/t87386742n752843.pdf.
Number of Books in Catalogue
[figure: distribution of catalogue sizes, (a) unrated works, (b) rated works]
Document scoring
$$S(d) = (1 - \lambda)\, P_{\mathit{Ret}}(d \mid q) + \lambda\, P_{\mathit{CF}}(d)$$

• $P_{\mathit{Ret}}(d \mid q)$: the work’s score obtained through the IR system
• $P_{\mathit{CF}}(d)$: the estimated rating of the current user for the work, obtained through the RS
• $\lambda$: weight between the two systems
Estimating preference (rated)
$$\hat{r}_{ui} = \frac{\sum_{v \in N_i(u)} w_{uv}\, r_{vi}}{\sum_{v \in N_i(u)} |w_{uv}|}$$

• $\hat{r}_{ui}$: estimated preference of user $u$ for item $i$
• $w_{uv}$: preference similarity between users $u$ and $v$
• $N_i(u)$: the $k$ nearest neighbours of $u$ that rated item $i$
Desrosiers and Karypis, 2011
Estimating preference (unary)
$$v_{ir} = \sum_{v \in N_i(u)} \delta(r_{vi} = r)\, w_{uv}$$
Desrosiers and Karypis, 2011