Penguins in Sweaters, or Serendipitous Entity Search on User-Generated Content
DESCRIPTION
In many cases, when browsing the Web users are searching for specific information or answers to concrete questions. Sometimes, though, users find unexpected yet interesting and useful results, and are encouraged to explore further. What makes a result serendipitous? We propose to answer this question by exploring the potential of entities extracted from two sources of user-generated content, Wikipedia (a user-curated online encyclopedia) and Yahoo! Answers (a more unconstrained question answering forum), in promoting serendipitous search. In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. We devise an algorithm based on a lazy random walk with restart to retrieve entity recommendations from the networks. We show that our method provides novel results from both datasets, compared to standard web search engines. However, unlike previous research, we find that choosing highly emotional entities does not increase user interest for many categories of entities, suggesting a more complex relationship between topic matter and the desirable metadata attributes in serendipitous search.
TRANSCRIPT
Penguins in Sweaters, or Serendipitous Entity Search on User-Generated Content
Ilaria Bordino, Yelena Mejova, and Mounia Lalmas (Yahoo Labs)
ACM International Conference on Information and Knowledge Management (CIKM 2013)
October 29th, 2013
Entity Search
We build an entity-driven serendipitous search system based on enriched entity networks extracted from Wikipedia and Yahoo! Answers.

Serendipity: finding something good or useful while not specifically looking for it. Serendipitous search systems provide relevant and interesting results.
Why/when do penguins wear sweaters?
WHAT: 1. What connections between entities do web community knowledge portals offer?
WHY: 2. How do they contribute to an interesting, serendipitous browsing experience?
Yahoo! Answers vs. Wikipedia

Yahoo! Answers: a community-driven question & answer portal; 67M questions & 262M answers; 2 years of data [2010/2011]; English-language; minimally curated; opinions, gossip, personal info; variety of points of view.

Wikipedia: a community-driven encyclopedia; 3,795,865 articles; snapshot from end of December 2011; English Wikipedia; curated; high-quality knowledge; variety of niche topics.
Entity & Relationship Extraction
Entity: any concept having a Wikipedia page
We use an internal tool to (1) identify surface forms, (2) resolve them to Wikipedia entities, and (3) rank entities by an aboutness score (a sketch of this pipeline follows the references below).
W. Zhao, J. Jiang, J. Weng, J. He, E.P. Lim, H. Yan, and X. Li. Comparing Twitter and traditional media using topic models. ECIR 2011.
D. Paranjpe. Learning document aboutness from implicit user feedback and document structure. CIKM 2009.
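The paper uses an internal Yahoo tool for this pipeline, so no public API is implied; the Python sketch below is purely illustrative, with a toy surface-form dictionary, a first-candidate resolution rule, and a frequency-based aboutness heuristic, all of which are hypothetical stand-ins:

```python
# Hypothetical sketch of the three-stage entity linking pipeline.
# None of this reproduces the internal Yahoo tool; the dictionary and
# the aboutness heuristic below are illustrative stand-ins.

SURFACE_FORMS = {            # surface form -> candidate Wikipedia entities
    "apple": ["Apple_Inc.", "Apple"],
    "steve jobs": ["Steve_Jobs"],
}

def identify_surface_forms(text):
    """(1) Find known surface forms in the text (naive substring scan)."""
    lowered = text.lower()
    return [sf for sf in SURFACE_FORMS if sf in lowered]

def resolve(surface_form):
    """(2) Resolve a surface form to one Wikipedia entity.
    A real resolver would use context; here we take the first candidate."""
    return SURFACE_FORMS[surface_form][0]

def aboutness(surface_form, text):
    """(3) Score how central the entity is to the document.
    Stand-in heuristic: mention frequency, boosted for early mentions."""
    lowered = text.lower()
    freq = lowered.count(surface_form)
    early = 1.5 if lowered.find(surface_form) < len(lowered) // 4 else 1.0
    return freq * early

def extract_entities(text):
    scored = [(resolve(sf), aboutness(sf, text))
              for sf in identify_surface_forms(text)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(extract_entities("Steve Jobs founded Apple in a garage."))
```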
Relationship: cosine similarity of tf-idf vectors, where each entity is represented by the concatenation of the documents in which it appears (sketch below).
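As a minimal sketch of this edge weighting (assuming scikit-learn, and a toy set of per-entity "virtual documents" that stand in for the real concatenations):

```python
# Sketch of the relationship weighting: each entity's "virtual document"
# is the concatenation of all documents (Wikipedia articles or Q&A
# threads) in which the entity appears; edges are weighted by cosine
# similarity of the tf-idf vectors. The corpus below is a toy stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entity_docs = {
    "Steve_Jobs": "apple founder iphone keynote apple",
    "Apple_Inc.": "apple iphone ipad company cupertino",
    "Penguin": "antarctica bird sweater oil spill",
}

names = list(entity_docs)
tfidf = TfidfVectorizer().fit_transform(entity_docs[n] for n in names)
sims = cosine_similarity(tfidf)  # pairwise edge weights

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} -- {names[j]}: {sims[i, j]:.3f}")
```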
Dataset Features
Sentiment
› using SentiStrength, compute positive & negative scores
› compute attitude (polarity) and sentimentality (strength) [Kucuktunc'12]
› entity-level scores

Quality
› Flesch Reading Ease score (readability)

Topical Category
› Yahoo Content Taxonomy
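A minimal sketch of these per-text features: SentiStrength produces a positive score p in [1, 5] and a negative score n in [-5, -1]; the attitude and sentimentality definitions below follow one reading of Kucuktunc et al. [Kucuktunc'12] and should be treated as an assumption, while the Flesch Reading Ease formula is the standard one:

```python
# Per-text metadata features. Entity-level scores would average these
# over all texts mentioning the entity. The attitude/sentimentality
# definitions are an assumed reading of Kucuktunc et al. (2012).

def attitude(p: int, n: int) -> int:
    """Polarity: 0 for neutral text (p=1, n=-1), signed otherwise."""
    return p + n

def sentimentality(p: int, n: int) -> int:
    """Strength of emotion regardless of direction; 0 for neutral text."""
    return p - n - 2

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

print(attitude(4, -1), sentimentality(4, -1))   # clearly positive text
print(flesch_reading_ease(words=120, sentences=8, syllables=170))
```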
Dataset          # Nodes      # Edges        # Isolated
Yahoo! Answers     896,799    112,595,138        69,856
Wikipedia        1,754,069    237,058,218        82,381
[Figure: entity network visualizations of the Wikipedia and Yahoo! Answers datasets]
Retrieval
              Wikipedia   Yahoo! Answers   Combined
Precision@5   0.668       0.724            0.744
MAP           0.716       0.762            0.782
Query set (50 entities): Justin Bieber, Nicki Minaj, Katy Perry, Shakira, Eminem, Lady Gaga, Jose Mourinho, Selena Gomez, Kim Kardashian, Miley Cyrus, Robert Pattinson, Adele (singer), Steve Jobs, Osama bin Laden, Ron Paul, Twitter, Facebook, Netflix, iPad, iPhone, Touchpad, Kindle, Olympic Games, Cricket, FIFA, Tennis, Mount Everest, Eiffel Tower, Oxford Street, Nürburgring, Haiti, Chile, Libya, Egypt, Middle East, Earthquake, Oil spill, Tsunami, Subprime mortgage crisis, Bailout, Terrorism, Asperger syndrome, McDonald's, Vitamin D, Appendicitis, Cholera, Influenza, Pertussis, Vaccine, Childbirth
3 labels per query-result pair
Example query: Steve Jobs
Yahoo! Answers: Jon Rubinstein, Timothy Cook, Kane Kramer, Steve Wozniak, Jerry York
Wikipedia: System 7, PowerPC G4, SuperDrive, Power Macintosh, Power Computing Corp.
Annotator agreement (overlap): 0.85
Average overlap in top 5 results: 12%
Algorithm: Lazy Random Walk with Restart
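The slide names the algorithm without parameters, so the following is a minimal sketch of a lazy random walk with restart over a row-stochastic transition matrix W; the restart probability, laziness factor, and iteration count are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def lazy_rwr(W, query, alpha=0.15, laziness=0.5, iters=100):
    """Stationary distribution of a lazy random walk with restart.

    W        : row-stochastic transition matrix (n x n numpy array)
    query    : index of the query entity (restart node)
    alpha    : restart probability (assumed value)
    laziness : probability of staying at the current node
    """
    n = W.shape[0]
    restart = np.zeros(n)
    restart[query] = 1.0
    p = restart.copy()
    for _ in range(iters):
        # with prob alpha restart to the query node; otherwise either
        # stay put (lazy step) or follow an outgoing edge
        p = alpha * restart + (1 - alpha) * (laziness * p + (1 - laziness) * p @ W)
    return p  # rank non-query entities by p to produce recommendations

# toy 3-node entity network
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
print(lazy_rwr(W, query=0))
```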
Serendipity
“making fortunate discoveries by accident”
M. Ge, C. Delgado-Battenfeld, and D. Jannach. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010.
Serendipity = unexpectedness + relevance. "Expected" result baselines come from web search.

Serendipity = interestingness + relevance. Result interestingness given the query; personal interest in the result.

P. Andre, J. Teevan, and S. T. Dumais. From x-rays to silly putty via Uranus: Serendipity and its role in web search. SIGCHI 2009.
Serendipity against web-search baselines, for General and High-Readability result sets (first metric defined below, second metric in parentheses):

Baseline                                          Data   General       High Read.
Top: the 5 entities that occur most frequently    WP     0.63 (0.58)   0.56 (0.53)
in the top 5 search results provided by Bing      YA     0.69 (0.63)   0.71 (0.65)
and Google                                        Comb   0.70 (0.61)   0.68 (0.61)

Top -WP: same as above, but excluding the         WP     0.63 (0.58)   0.56 (0.54)
Wikipedia page from the set of results            YA     0.70 (0.64)   0.71 (0.66)
                                                  Comb   0.71 (0.64)   0.68 (0.63)

Rel: top 5 entities in the related query          WP     0.64 (0.61)   0.57 (0.56)
suggestions provided by Bing and Google           YA     0.70 (0.65)   0.71 (0.66)
                                                  Comb   0.72 (0.67)   0.69 (0.65)

Rel + Top: union of Top and Rel                   WP     0.61 (0.54)   0.55 (0.51)
                                                  YA     0.68 (0.57)   0.69 (0.59)
                                                  Comb   0.68 (0.55)   0.66 (0.56)
Serendipity metrics:
|relevant & unexpected| / |unexpected|: the number of serendipitous results out of all of the unexpected results retrieved
|relevant & unexpected| / |retrieved|: serendipitous results out of all of the results retrieved
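A minimal sketch of the two metrics over sets of result IDs; the entity names in the example are placeholders, not the paper's data:

```python
# The two serendipity metrics over sets of result IDs. `expected` is the
# web-search baseline set (Top / Top -WP / Rel / Rel + Top above).

def serendipity(retrieved: set, relevant: set, expected: set):
    unexpected = retrieved - expected
    serendipitous = unexpected & relevant
    per_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    per_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return per_unexpected, per_retrieved

# placeholder example
retrieved = {"Jon Rubinstein", "Timothy Cook", "Kane Kramer",
             "Steve Wozniak", "Jerry York"}
relevant = {"Jon Rubinstein", "Timothy Cook", "Steve Wozniak"}
expected = {"Steve Wozniak", "Apple Inc."}
print(serendipity(retrieved, relevant, expected))  # (0.5, 0.4)
```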
User-perceived Quality
1. Which result is more relevant to the query?
2. If someone is interested in the query, would they also be interested in these results?
3. Even if you are not interested in the query, are these results interesting to you personally?
4. Would you learn anything new about the query?
Interestingness
Labelers provide pairwise comparisons between results; these are combined into a reference ranking, and each system's result ranking is compared to the optimal one using Kendall's tau-b (sketch below).
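A sketch of this comparison, assuming SciPy is available (scipy.stats.kendalltau computes the tau-b variant by default, which accounts for ties):

```python
# Compare a system's ranking of results against the reference ranking
# built from the labelers' pairwise comparisons.
from scipy.stats import kendalltau

# positions of the same 5 results in the reference and system rankings
# (illustrative values, not the paper's data)
reference = [1, 2, 3, 4, 5]
system    = [2, 1, 3, 5, 4]

tau, p_value = kendalltau(reference, system)
print(f"Kendall's tau-b: {tau:.3f} (p = {p_value:.3f})")
```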
Annotator agreement: relevance (83%), query interest (81%), personal interest (76%), learning something new (81%).
Interesting > Relevant:
  Oil Spill → Sweaters for Penguins (WP)
  Robert Pattinson → Water for Elephants (WP)
  Egypt → Ptolemaic Kingdom (WP & YA)

Relevant > Interesting:
  Egypt → Cairo Conference (WP)
  Netflix → Blu-ray Disc (YA)

J. Arguello, F. Diaz, J. Callan, and B. Carterette. A methodology for evaluating aggregated search results. ECIR 2011.
Similarity (Kendall's tau-b) between result sets and the reference ranking:

Question                              Data   General   +Topic
Which result is more relevant         WP     0.162     0.194
to the query?                         YA     0.336     0.374
                                      Comb   0.201     0.222

If someone is interested in the       WP     0.162     0.176
query, would they also be             YA     0.312     0.343
interested in the result?             Comb   0.184     0.222

Even if you are not interested in     WP     0.139     0.144
the query, is the result              YA     0.324     0.359
interesting to you personally?        Comb   0.168     0.198

Would you learn anything new about    WP     0.167     0.164
the query from this result?           YA     0.307     0.346
                                      Comb   0.184     0.203
Topical category constraint: promotes results of the same topic as the query entity.
Sentiment and readability constraints hurt performance.
1. What connections between entities do web community knowledge portals offer?
ANSWER: connections different (≠) from those surfaced by standard web search.
2. How do they contribute to an interesting, serendipitous browsing experience?
ANSWER: both sources yield novel, interesting results, with Yahoo! Answers outperforming (>) Wikipedia.
What did we learn?
Thank you!