To Each His Own: Personalized Content Selection Based on Text Comprehensibility
Date: 2013/01/24
Authors: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang
Source: WSDM ’12
Advisor: Dr. Jia-Ling Koh
Speaker: Yi-Hsuan Yeh
Outline
• Introduction
• Methodology
  • Estimating text comprehensibility
  • Generating pairwise preferences
  • Modeling user comprehensibility preferences
  • Combining the rankings
• Experiments
  • Web search
  • CQA
• Conclusion
Introduction
• Search engines mostly adopt a one-size-fits-all solution.
• A famous physician might also be a beginner cook, and would therefore prefer easy-to-understand cooking instructions.
  The comprehensibility level of a text should match the user’s level of preparedness.
• Goal: use text comprehensibility, together with each user’s comprehensibility preferences, to predict future content choices and to rank results.
Introduction
1. Rank texts by comprehensibility.
2. Model user comprehensibility preferences.
3. Produce a personalized ranking.
Estimating text comprehensibility
• Comprehensibility classifier
  • Output: the likelihood that the article is hard to read (the comprehensibility score Sc, ranging from 0 to 1)
  • Training data: Simple English Wikipedia (easy) vs. English Wikipedia (hard)
  • Features: frequencies of the 850 Basic English words (TF, L2-normalized)
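A minimal sketch of such a classifier follows. The slide does not name the learning algorithm, so logistic regression trained by stochastic gradient descent is an assumption here. `BASIC_WORDS` is a hypothetical ten-word stand-in for the 850-word Basic English list, and the toy "easy" and "hard" sentences are invented for illustration only.

```python
import math
from collections import Counter

# Hypothetical stand-in for the 850-word Basic English list.
BASIC_WORDS = ["go", "see", "thing", "come", "get",
               "the", "of", "and", "be", "have"]

def featurize(text, vocab=BASIC_WORDS):
    """Term frequencies of the vocabulary words, L2-normalized."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(docs, labels, epochs=500, lr=0.5):
    """Logistic regression by SGD (assumed learner).
    Label 1 = hard (English Wikipedia), 0 = easy (Simple English Wikipedia)."""
    X = [featurize(d) for d in docs]
    w, b = [0.0] * len(BASIC_WORDS), 0.0
    for _ in range(epochs):
        for x, y in zip(X, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of the log loss w.r.t. the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def comprehensibility_score(text, model):
    """Sc: estimated probability that `text` is hard to read, in [0, 1]."""
    w, b = model
    x = featurize(text)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

easy = ["go see thing", "come see thing go", "get thing go come"]
hard = ["the of and the", "of the be have", "the have of and"]
model = train(easy + hard, [0] * len(easy) + [1] * len(hard))
print(comprehensibility_score("see go thing", model))
```

Because the features are restricted to a fixed basic vocabulary, the classifier can only judge difficulty through how often those basic words occur, which matches the slide's feature description.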
Global threshold = 0.5
• It is more meaningful to use Sc for comparing documents within the same topic.
Generating pairwise preferences
1. Web search click logs
  1) CSA (Click > Skip)
  2) LCSA (Last click > Skip above)
  3) LCAA (Last click > All above)
  Example: query "toy"; results l1, l2, l3, l4, l5; clicks on l2 and l4
2. Best answers in CQA forums
  • The best answer > all the other answers
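The three click strategies and the CQA rule can be sketched as follows. CSA is read here as "Click > Skip Above", meaning a clicked result is preferred over each unclicked result ranked above it. That reading is an assumption based on the acronym. All function names are hypothetical. On the slide's "toy" example, the sketch prints the pairs that each strategy extracts.

```python
def csa(ranked, clicked):
    """Click > Skip Above (assumed reading of CSA): each clicked result is
    preferred over every unclicked result ranked above it."""
    pairs = []
    for i, doc in enumerate(ranked):
        if doc in clicked:
            pairs += [(doc, above) for above in ranked[:i] if above not in clicked]
    return pairs

def _last_click(ranked, clicked):
    """0-based position of the lowest-ranked clicked result, or None."""
    positions = [i for i, doc in enumerate(ranked) if doc in clicked]
    return positions[-1] if positions else None

def lcsa(ranked, clicked):
    """Last click > Skip above: the last click over unclicked results above it."""
    last = _last_click(ranked, clicked)
    if last is None:
        return []
    return [(ranked[last], d) for d in ranked[:last] if d not in clicked]

def lcaa(ranked, clicked):
    """Last click > All above: the last click over every result above it."""
    last = _last_click(ranked, clicked)
    if last is None:
        return []
    return [(ranked[last], d) for d in ranked[:last]]

def cqa_pairs(answers, best):
    """The best answer is preferred over every other answer to the question."""
    return [(best, other) for other in answers if other != best]

ranked = ["l1", "l2", "l3", "l4", "l5"]  # results for the query "toy"
clicked = {"l2", "l4"}
print(csa(ranked, clicked))   # [('l2', 'l1'), ('l4', 'l1'), ('l4', 'l3')]
print(lcsa(ranked, clicked))  # [('l4', 'l1'), ('l4', 'l3')]
print(lcaa(ranked, clicked))  # [('l4', 'l1'), ('l4', 'l2'), ('l4', 'l3')]
```

LCAA is the only strategy that also yields a pair between two clicked results (l4 over l2), because it uses every result above the last click.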
Preference weighting
1. Web search
  • Example: l4 >u l1, w = 0.25
2. CQA
  • w = 1/n (where the question has n answers)
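The CQA weighting w = 1/n can be attached directly to the best-answer pairs. Below is a minimal sketch using a hypothetical helper name. Only the CQA rule is sketched, because the text does not give the Web search weighting formula behind the w = 0.25 example.

```python
def weighted_cqa_pairs(answers, best):
    """Return (best, other, w) triples with w = 1/n for a question with n answers."""
    n = len(answers)
    return [(best, other, 1.0 / n) for other in answers if other != best]

pairs = weighted_cqa_pairs(["a1", "a2", "a3", "a4"], best="a3")
print(pairs)  # [('a3', 'a1', 0.25), ('a3', 'a2', 0.25), ('a3', 'a4', 0.25)]
```

With this weighting, every question contributes a total weight of (n − 1)/n. Questions with many answers therefore do not dominate a user's profile.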
Modeling user comprehensibility preferences
1. Topic-independent profile (BASIC)
  • n: the number of preference pairs for each user
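The profile formula itself is not reproduced in the text. The minimal sketch below assumes that Pu is the fraction of the user's n preference pairs in which the preferred document has the higher Sc. This is consistent with Pu being the likelihood of preferring harder content. The neutral 0.5 returned for a user with no pairs is a hypothetical choice.

```python
def basic_profile(pairs, sc):
    """pairs: (preferred_doc, other_doc) tuples for one user; sc: doc -> Sc."""
    n = len(pairs)
    if n == 0:
        return 0.5  # hypothetical neutral value when the user has no pairs
    harder = sum(1 for preferred, other in pairs if sc[preferred] > sc[other])
    return harder / n

sc = {"l1": 0.2, "l2": 0.7, "l3": 0.9, "l4": 0.6}
pairs = [("l4", "l1"), ("l4", "l2"), ("l4", "l3")]  # LCAA pairs from the toy query
print(basic_profile(pairs, sc))  # one of three pairs favors the harder page
```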
• Weighted version
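Under the same assumption as the basic profile, a weighted version can divide by the total pair weight instead of n. The text does not give the exact weighted formula, so the function below is a hypothetical sketch.

```python
def weighted_profile(weighted_pairs, sc):
    """weighted_pairs: (preferred_doc, other_doc, w) triples; sc: doc -> Sc."""
    total = sum(w for _, _, w in weighted_pairs)
    if total == 0:
        return 0.5  # hypothetical neutral value
    harder = sum(w for preferred, other, w in weighted_pairs
                 if sc[preferred] > sc[other])
    return harder / total

sc = {"a": 0.8, "b": 0.3, "c": 0.9, "d": 0.4}
triples = [("a", "b", 0.5), ("a", "c", 0.25), ("d", "b", 0.25)]
print(weighted_profile(triples, sc))  # 0.75
```

When every weight is equal, this reduces to the unweighted fraction.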
2. Topic-dependent profile (TOPIC)
• Many users might have read only a few texts in each topic, or none at all in some topics → lack of topic-specific preference pairs
[Figure: topic tree with a root, top-level topics t1, t2, …, t17, and their subtopics (t11, t12, …, t1n, …, t17m), plus a "default" node]
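One plausible reading of the topic tree is hierarchical back-off. The profile uses the most specific topic node that has enough preference pairs, otherwise moves toward the root, and falls back to a default when even the root is too sparse. The sketch below assumes this reading. `min_pairs`, the 0.5 default, and the convention that each node's pair list already includes its subtopics' pairs are all hypothetical.

```python
def topic_profile(pairs_by_topic, parent, sc, topic, min_pairs=5):
    """Return (node used, Pu) for `topic`, backing off up the topic tree.

    pairs_by_topic: node -> (preferred_doc, other_doc) pairs, assumed to
    already include the pairs of all subtopics of that node.
    parent: node -> parent node (None for the root).
    """
    node = topic
    while node is not None:
        pairs = pairs_by_topic.get(node, [])
        if len(pairs) >= min_pairs:
            harder = sum(1 for p, o in pairs if sc[p] > sc[o])
            return node, harder / len(pairs)
        node = parent.get(node)
    return "default", 0.5  # hypothetical default profile

parent = {"t11": "t1", "t1": "root", "root": None}
sc = {"a": 0.9, "b": 0.2, "c": 0.4, "d": 0.8}
pairs_by_topic = {
    "t11": [("a", "b")],
    "t1": [("a", "b"), ("c", "d"), ("d", "c")],
    "root": [("a", "b"), ("c", "d"), ("d", "c"), ("b", "a")],
}
print(topic_profile(pairs_by_topic, parent, sc, "t11", min_pairs=2))  # backs off from t11 to t1
```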
3. Collaborative filtering (COLLABORATIVE)
• Analyze the observed comprehensibility preferences over all (user, topic) pairs, and predict comprehensibility preferences for unseen ones
• G is an nu × nt matrix:
  • nu: the number of users
  • nt: the number of topics
  • Gij: the likelihood of user i preferring harder content in topic j (Pu)
• Maximum-margin matrix factorization: G ≈ UᵀV
  • Minimize a loss over the observed entries of G plus Frobenius-norm regularization of U and V
  • Implemented with CofiRank
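To illustrate filling in unseen (user, topic) cells, the sketch below swaps in a simpler technique than the slide's. It fits a regularized squared-loss matrix factorization by stochastic gradient descent, not the max-margin objective that CofiRank optimizes. U and V are stored row-wise, so U·Vᵀ here corresponds to the slide's UᵀV. Centering entries at 0.5 and all hyperparameter values are assumptions.

```python
import random

def factorize(G, rank=2, lam=0.05, lr=0.05, epochs=2000, seed=0):
    """Fit G ≈ U V^T on observed entries (None = unobserved) by SGD.
    Entries are centered at 0.5 so an all-zero model predicts a neutral Pu."""
    rng = random.Random(seed)
    nu, nt = len(G), len(G[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(nu)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(nt)]
    observed = [(i, j, G[i][j] - 0.5) for i in range(nu) for j in range(nt)
                if G[i][j] is not None]
    for _ in range(epochs):
        for i, j, target in observed:
            err = sum(U[i][k] * V[j][k] for k in range(rank)) - target
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient of 0.5 * err^2 + 0.5 * lam * (u^2 + v^2)
                U[i][k] -= lr * (err * v + lam * u)
                V[j][k] -= lr * (err * u + lam * v)
    return U, V

def predict(U, V, i, j):
    """Predicted Pu for user i in topic j (undoing the 0.5 centering)."""
    return 0.5 + sum(a * b for a, b in zip(U[i], V[j]))

G = [[0.8, 0.7, None],
     [0.2, None, 0.3],
     [None, 0.75, 0.85]]
U, V = factorize(G)
print(round(predict(U, V, 0, 2), 2))  # predicted Pu for an unseen (user, topic) cell
```

The low rank forces users with similar observed preferences to share structure. That shared structure is what lets the model fill in topics a user has never read about.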
Combining the rankings
• R(d): the rank of document d in the original, topical-relevance-based ordering
• Ru(d): the rank of document d when ordered by comprehensibility score (Sc) for user u
• β = 0.4
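The combination formula is not reproduced in the text. The minimal sketch below makes two assumptions. First, it linearly interpolates the two ranks as (1 − β)·R(d) + β·Ru(d), with lower values ranked first. Second, Ru orders documents by decreasing Sc for users with Pu > 0.5 and by increasing Sc otherwise. `personalized_rerank` is a hypothetical name.

```python
def personalized_rerank(docs, sc, pu, beta=0.4):
    """docs: documents in the original relevance order (rank 1 first)."""
    R = {d: i + 1 for i, d in enumerate(docs)}
    # Assumed Ru: harder-first for users who tend to prefer harder content.
    by_sc = sorted(docs, key=lambda d: sc[d], reverse=pu > 0.5)
    Ru = {d: i + 1 for i, d in enumerate(by_sc)}
    combined = {d: (1 - beta) * R[d] + beta * Ru[d] for d in docs}
    # Ties are broken by the original relevance rank.
    return sorted(docs, key=lambda d: (combined[d], R[d]))

docs = ["a", "b", "c", "d"]
sc = {"a": 0.1, "b": 0.9, "c": 0.3, "d": 0.8}
print(personalized_rerank(docs, sc, pu=0.8))  # ['b', 'a', 'c', 'd']
print(personalized_rerank(docs, sc, pu=0.2))  # ['a', 'c', 'b', 'd']
```

With β = 0.4, relevance still carries most of the weight. A document therefore moves only when the comprehensibility ordering disagrees strongly with the original ranking.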
Search dataset
• Yahoo! Web Search query logs, May 2011
• Navigational queries filtered out
• Top 10 results for each query
• Snippets: higher Sc
• Domain-averaged Sc used as the smoothed Sc for each page
• Pages classified into 17 top-level topic nodes
• 154,650,334 pages
• 424,566 users
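Domain-averaged smoothing can be sketched as follows. The sketch assumes that a page's domain is its URL host name, as returned by `urllib.parse.urlparse(url).netloc`, because the slide does not define "domain".

```python
from collections import defaultdict
from urllib.parse import urlparse

def domain_smoothed_sc(page_sc):
    """Replace each page's Sc with the average Sc of all pages on its host."""
    totals, counts = defaultdict(float), defaultdict(int)
    for url, s in page_sc.items():
        host = urlparse(url).netloc
        totals[host] += s
        counts[host] += 1
    return {url: totals[urlparse(url).netloc] / counts[urlparse(url).netloc]
            for url in page_sc}

page_sc = {"http://a.com/x": 0.2, "http://a.com/y": 0.6, "http://b.org/z": 0.9}
print(domain_smoothed_sc(page_sc))
```

Averaging over a domain trades per-page detail for stability. This helps when a single page's score is noisy.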
Evaluation measures
1. Average Click Rank
  • Lower average click rank values indicate better performance
  • s: a query; Cs: the set of clicked Web pages for query s; R(p): the rank of page p; S: the query set
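The formula is not reproduced in the text. The minimal sketch below assumes that R(p) is averaged over Cs within each query and then averaged over the queries in S. Skipping queries without clicks is also an assumption.

```python
def average_click_rank(queries):
    """queries: list of (ranked_pages, clicked_pages), one entry per query s in S."""
    per_query = []
    for ranked, clicked in queries:
        # R(p) for every clicked page p in Cs (1-based ranks)
        ranks = [r for r, page in enumerate(ranked, start=1) if page in clicked]
        if ranks:  # skip queries without clicks (assumption)
            per_query.append(sum(ranks) / len(ranks))
    return sum(per_query) / len(per_query)

queries = [(["l1", "l2", "l3", "l4", "l5"], {"l2", "l4"}),
           (["m1", "m2", "m3"], {"m1"})]
print(average_click_rank(queries))  # 2.0
```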
2. Rank Scoring
  • Larger rank scoring values indicate better performance
  • j: the rank of a page in the list
  • α: the half-life, i.e., the rank position at which there is a 50-50 chance that the user will review the page (α = 5)
  • Click indicator: 1 if page j is clicked for query s, 0 otherwise
  • Maximum score: attained when all clicked pages appear at the top of the ranked list
3. User Saliency
Experiments on search data
LCAA + weight + collaborative
• Overall analysis over non-repeated queries
LCAA + weight
• Improvement by topic
• The variance in Sc across the search results of a query
• Percentage of users with Pu > 0.5
Answer dataset
• Yahoo! Answers, January 2010 – April 2011
• Best answers chosen by the asker
• 4.9 million questions, 39.5 million answers
• 85,172 users
Experiments on answers
• Lower values indicate better performance.
• Random baseline: answers ranked in random order
• Majority baseline: answers ranked in decreasing order of Sc
Conclusion
• Modeling text comprehensibility can significantly improve content ranking, in both Web search and a CQA forum.
• A comprehensibility classifier
• Modeling of user comprehensibility preferences
• Future work: develop more sophisticated comprehensibility classifiers, and use demographic information (e.g., gender, age).