Q&A over Social Networks KSE 801 Uichin Lee


Page 1: Q&A over Social Networks KSE 801 Uichin Lee. Aardvark: The Anatomy of a Large- Scale Social Search Engine Damon Horowitz, Aardvark Sepandar D. Kamvar,

Q&A over Social Networks

KSE 801
Uichin Lee


Aardvark: The Anatomy of a Large-Scale Social Search Engine

Damon Horowitz, Aardvark
Sepandar D. Kamvar, Stanford University

WWW '10

Original slides by Hailong Sun, April 13, 2010


Web Search vs. Social Q&A

• Google is about traditional Web search
  – Give me keywords, I will provide contents
  – Search for the most suitable contents

• While Aardvark is about social search
  – Users can ask questions in natural language, not keywords
  – Content is generated “on-demand”, tapping the huge amount of information in people’s heads (i.e., everyone knows something)
  – The system is fueled by the goodwill of its users


Problem in Aardvark

• How to find the user who can best answer a given question?

[Figure label: Search Engine]


Aardvark Architecture

[Figure: Aardvark architecture, with steps numbered 1 to 4. Recoverable labels:]

• Social graph indexing
• User’s topic parsing
• “1 Question?”
• Determine the appropriate topics for the question
• Routing Suggestion Request: find a list of candidate answerers (and rank them)
• “3 Edit question?”
• Ask one by one


VarkRank: Relevance and Connectedness

[Figure: graph with a Question node, a Topic node, and several User nodes]

• P(ui|t): used for measuring the relevance score of a user to a question (query dependent)
• P(ui|uj): used for measuring the connectedness score between users (query independent)


VarkRank: Relevance and Connectedness

• Given a question q, the probability that a user ui can answer it (relevance score)

• Score that user ui can answer a question from uj: (query-dependent relevance score × query-independent user connectedness score)
  – Query-independent user quality score = p(ui|uj), i.e., user i delivers a satisfying answer to user j (simply measured using connectedness)
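The formulas on this slide did not survive extraction. As a minimal sketch, assume the relevance score is a mixture over topics, p(ui|q) = Σ_t p(ui|t) · p(t|q), and that the final score is the product described above, s(ui, uj, q) = p(ui|uj) · p(ui|q). The function names and all numbers below are hypothetical and chosen only for illustration.

```python
from typing import Dict


def relevance(p_user_given_topic: Dict[str, float],
              p_topic_given_question: Dict[str, float]) -> float:
    """Query-dependent relevance: sum over topics of p(ui|t) * p(t|q)."""
    return sum(p_user_given_topic.get(topic, 0.0) * p_tq
               for topic, p_tq in p_topic_given_question.items())


def score(p_connected: float,
          p_user_given_topic: Dict[str, float],
          p_topic_given_question: Dict[str, float]) -> float:
    """Total score: connectedness p(ui|uj) times relevance p(ui|q)."""
    return p_connected * relevance(p_user_given_topic, p_topic_given_question)


# Hypothetical topic distribution for one question, p(t|q).
p_topic_given_question = {"hiking": 0.75, "travel": 0.25}

# Hypothetical expertise scores p(ui|t) for two candidate answerers.
alice = {"hiking": 0.5, "travel": 0.25}
bob = {"travel": 0.5}

# Hypothetical connectedness to the asker: 0.5 for alice, 0.25 for bob.
score_alice = score(0.5, alice, p_topic_given_question)   # 0.21875
score_bob = score(0.25, bob, p_topic_given_question)      # 0.03125
```

In this toy example alice outranks bob because she is both closer to the asker and more relevant to the dominant topic.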


Relevance Scores (Expertise)

• How to find experts on a given topic?
• Expertise score: p(ui|t), and w/ Bayes’ law, we have: p(ui|t) = p(t|ui) p(ui) / p(t)

• For each user, we need to profile a user’s interest in a given topic by using the following information sources:
  – 3+ topics provided by a user
  – Topics provided by friends of a user
  – Online profiles
  – Online unstructured data
  – Status message updates
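The Bayes inversion can be sketched as follows. This assumes each user's profile has already been reduced to a distribution p(t|ui) over topics and that a prior p(ui) is available; how the listed information sources are merged into that profile is not specified on the slide, so the input dictionaries here are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def expertise_scores(p_topic_given_user: Dict[str, Dict[str, float]],
                     p_user: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Return p(u|t) for every topic using p(u|t) = p(t|u) p(u) / p(t)."""
    # Marginal topic probability: p(t) = sum over users of p(t|u) * p(u).
    p_topic = defaultdict(float)
    for user, topics in p_topic_given_user.items():
        for topic, p_tu in topics.items():
            p_topic[topic] += p_tu * p_user[user]

    result: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, topics in p_topic_given_user.items():
        for topic, p_tu in topics.items():
            if p_topic[topic] > 0.0:  # skip topics nobody has mass on
                result[topic][user] = p_tu * p_user[user] / p_topic[topic]
    return dict(result)


# Hypothetical profiles p(t|u) and a uniform prior p(u).
profiles = {
    "alice": {"hiking": 0.5, "cooking": 0.5},
    "bob": {"hiking": 1.0},
}
prior = {"alice": 0.5, "bob": 0.5}

scores = expertise_scores(profiles, prior)
# scores["hiking"] is roughly {"alice": 1/3, "bob": 2/3}
```

Because bob's profile is concentrated on hiking, he receives the larger share of the expertise score for that topic.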


Connectedness Scores

• Connection strengths between people, i.e., p(ui|uj), are computed using a weighted cosine similarity over this feature set (normalized)

• Utilize existing social networks
  – Facebook, Twitter, LinkedIn…

• Feature set measures similarities in demographics/behavior
  – Social connection (common friends and affiliations)
  – Demographic similarity
  – Profile similarity (e.g., common favorite movies)
  – Vocabulary match (e.g., IM shortcuts)
  – Chattiness match (frequency of follow-up messages)
  – Verbosity match (the average length of messages)
  – Politeness match (e.g., use of “Thanks!”)
  – Speed match (responsiveness to other users)
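A minimal sketch of a weighted cosine similarity with normalization is given below. It assumes each user is described by a non-negative numeric vector with one entry per feature, and that the normalization makes the scores for one asker sum to 1. The vector encoding, the weights, and the user names are hypothetical; the slide does not say how Aardvark encodes the features.

```python
import math
from typing import Dict, List


def weighted_cosine(a: List[float], b: List[float], weights: List[float]) -> float:
    """Cosine similarity where feature k contributes with weight weights[k]."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def connectedness(asker: List[float],
                  others: Dict[str, List[float]],
                  weights: List[float]) -> Dict[str, float]:
    """Normalize similarities to the asker so they sum to 1, giving p(ui|uj)."""
    sims = {name: weighted_cosine(vec, asker, weights) for name, vec in others.items()}
    total = sum(sims.values())
    if total == 0.0:
        return {name: 0.0 for name in sims}
    return {name: s / total for name, s in sims.items()}


# Hypothetical three-feature vectors (e.g., social, profile, vocabulary match).
weights = [2.0, 1.0, 1.0]
asker = [1.0, 0.0, 1.0]
others = {
    "carol": [1.0, 0.0, 1.0],  # identical to the asker
    "dave": [0.0, 1.0, 0.0],   # no overlap with the asker
    "erin": [1.0, 1.0, 0.0],   # partial overlap
}

p_connected = connectedness(asker, others, weights)
# roughly {"carol": 0.6, "dave": 0.0, "erin": 0.4}
```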


Analyzing Questions: TopicMapper

• Question classification:
  – Question or not?
  – Inappropriate question?
  – Trivial question?
  – Location sensitive question?

• Map a question to a topic (weighted linear sum of the following features)
  – Keyword matches w/ a user’s profile topics?
  – Classify the question text into a taxonomy of roughly 3000 popular question topics (using an SVM trained on an annotated corpus of several million questions)
  – Extract salient phrases from questions and find semantically similar user topics
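The weighted linear sum can be sketched as below, assuming each of the three mappers has already produced a per-topic score for the question. The mapper names, their weights, and the scores are hypothetical, and the final normalization into a distribution p(t|q) is an assumption rather than something the slide states.

```python
from typing import Dict


def combine_topic_scores(mapper_outputs: Dict[str, Dict[str, float]],
                         mapper_weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted linear sum of per-mapper topic scores, normalized to sum to 1."""
    combined: Dict[str, float] = {}
    for mapper, topic_scores in mapper_outputs.items():
        weight = mapper_weights[mapper]
        for topic, value in topic_scores.items():
            combined[topic] = combined.get(topic, 0.0) + weight * value
    total = sum(combined.values())
    if total == 0.0:
        return {}
    return {topic: value / total for topic, value in combined.items()}


# Hypothetical outputs of the three mappers for one question.
mapper_outputs = {
    "keyword_match": {"hiking": 1.0},
    "taxonomy_classifier": {"hiking": 0.5, "travel": 0.5},
    "salient_phrase": {"travel": 1.0},
}
# Hypothetical mixing weights.
mapper_weights = {
    "keyword_match": 0.5,
    "taxonomy_classifier": 0.25,
    "salient_phrase": 0.25,
}

p_topic_given_question = combine_topic_scores(mapper_outputs, mapper_weights)
# {"hiking": 0.625, "travel": 0.375}
```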


Ranking Algorithm

• Topic and connectedness matching, availability

• Routing Engine prioritizes candidate answerers
  – Optimize the chances that the present question will be answered
  – Yet preserve the available set of answerers (i.e., the quantity of “answering resource” in the system) as much as possible by spreading out the answering load across the user base

• Factors considered: currently online users (e.g., via IM presence data, iPhone usage, etc.), user’s daily activity history, and user’s response history (lowering scores of non-responsive users)

• The Conversation Manager serially asks candidates whether they would like to answer the present question, iterating until an answer is provided and returned to the asker.
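The prioritization and the serial asking loop might look like the sketch below. The `Candidate` fields, the priority formula, and the load penalty are all hypothetical illustrations of the factors named above, not Aardvark's actual routing rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Candidate:
    name: str
    match_score: float    # topic and connectedness score for this question
    online: bool          # e.g., from IM presence data
    response_rate: float  # fraction of past requests answered, in [0, 1]
    recent_requests: int  # questions recently routed to this user


def priority(c: Candidate, load_penalty: float = 0.1) -> float:
    """Hypothetical priority: favor available, responsive, lightly loaded users."""
    availability = 1.0 if c.online else 0.5
    return (c.match_score * availability * c.response_rate
            / (1.0 + load_penalty * c.recent_requests))


def rank(candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=priority, reverse=True)


def route(candidates: List[Candidate],
          ask: Callable[[Candidate], Optional[str]]) -> Optional[Tuple[str, str]]:
    """Ask candidates one by one until someone answers; None if nobody does."""
    for candidate in rank(candidates):
        answer = ask(candidate)
        if answer is not None:
            return candidate.name, answer
    return None


candidates = [
    Candidate("ann", 0.8, online=False, response_rate=0.5, recent_requests=0),
    Candidate("ben", 0.5, online=True, response_rate=1.0, recent_requests=0),
    Candidate("cat", 0.5, online=True, response_rate=1.0, recent_requests=10),
]

# Hypothetical replies: ben declines (no entry), cat answers.
replies = {"cat": "Try the ridge trail."}
order = [c.name for c in rank(candidates)]           # ["ben", "cat", "ann"]
result = route(candidates, lambda c: replies.get(c.name))
```

Dividing by the recent load is one simple way to spread questions across the user base; other penalties would serve the same purpose.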


User Interface

[Screenshots: Aardvark over IM, Email, and Twitter]


Deployment Status

• Aardvark is actively used
  – Users: from 2,272 to 90,361
  – 55.9% active users, 73.8% passive users
  – 3,167.2 questions/day
  – 3.1 queries/month

• Mobile users are particularly active
  – Average 3.6322 sessions/month
  – Comparison: Google
    • Desktop vs. mobile users: 3
    • Mobile users: 5.68 sessions/month


Categories of Questions in Aardvark

• Questions are highly contextualized
  – Average query length: 18.6 words
    • 2.2~2.9 for Web search
  – 45.3% are about context

• Questions often have a subjective element
  – What are the things/crafts/toys your children have made that made them really proud of themselves?


Answers

• Answers
  – 57.2% received answers in less than 10 minutes
  – A question receives 2 answers on average
  – The quality of answers is good
    • 70.4% are “good”; 14.1% are “OK”; 15.5% are “bad”

[Figures: distribution of questions and answering times; distribution of questions and number of answers received]


Topic Distribution

• People are indexable
  – 97.7% have 3+ topics


Comparative Evaluation with Google

• Experimental setup
  – Randomly select a group of questions
  – Insert a tip “do you want to help Aardvark run an experiment?”
  – Record response time and quality of answers from Google and Aardvark

• Experimental results
  – Aardvark: 71.5% answered; rating: 3.93 (σ=1.23)
  – Google: 70.5% answered; rating: 3.07 (σ=1.46)

• Aardvark is more suitable for subjective questions


What Do People Ask Social Networks?

Meredith Ringel Morris, MSR
Jaime Teevan, MSR
Katrina Panovich, MIT


good restaurants in atlanta

http://www.yelp.com


Questions About People’s Questions

• What questions do people ask?
  – How are the questions phrased?
  – What are the question types and topics?
  – Who asks which questions and why?

• Which questions get answered?
  – How is answer speed and utility perceived?
  – What are people’s motivations for answering?


What Is Known About Question Asking

• Collaborative search [Morris & Teevan]

• Searching v. asking [Evans et al.; Morris et al.]

• Expertise-finding [vark.com; White et al.; Bernstein et al.]

• Online question answering (Q&A) tools
  – Question type [Harper et al.: conversational v. informational]
  – Response rate [Hsieh & Counts: 80%]
  – Response time [Zhang et al.: 9 hours; Hsieh & Counts: 3 hours]
  – Motivation [Raban & Harper; Ackerman & Palen; Beenen et al.]


Survey of Asking via Status Messages

• Survey content
  – Used a status message to ask a question?
    • Frequency of asking, question type, responses received
    • Provide an example
  – Answered a status message question?
    • Why or why not?
    • Provide an example

• 624 participants
  – Focus on Facebook and Twitter behavior


Questions About People’s Questions

• What questions do people ask?
  – How are the questions phrased?
  – What are the question types and topics?
  – Who asks which questions and why?

• Which questions get answered?
  – How is answer speed and utility perceived?
  – What are people’s motivations for answering?


Questions: Phrasing

• Questions short (75 characters, 1 sentence)
• 18.5% phrased as a statement
    I need a recommendation on a good all purpose pair of sandals.

• Often scoped
  – 1 out of 5 directed to “anyone”
      Anyone know of a good Windows 6 mobile phone that won’t break the bank?
  – Network subset
      Hey Seattle tweeps: Feel like karaoke on the Eastside tonight?


Questions: Types

Type | % | Example
Recommendation | 29% | Building a new playlist – any ideas for good running songs?
Opinion | 22% | I am wondering if I should buy the Kitchen-Aid ice cream maker?
Factual | 17% | Anyone know a way to put Excel charts into LaTeX?
Rhetorical | 14% | Why are men so stupid?
Invitation | 9% | Who wants to go to Navya Lounge this evening?
Favor | 4% | Need a babysitter in a big way tonight… anyone??
Social connection | 3% | I am hiring in my team. Do you know anyone who would be interested?
Offer | 1% | Could any of my friends use boys size 4 jeans?


Questions: Topics

Topic | % | Example
Technology | 29% | Anyone know if WOW works on Windows 7?
Entertainment | 17% | Was seeing Up in the theater worth the money?
Home & Family | 12% | So what’s the going rate for the tooth fairy?
Professional | 11% | Which university is better for Masters? Cornell or Georgia Tech?
Places | 8% | Planning a trip to Whistler in the off-season. Recommendation on sites to see?
Restaurants | 6% | Hanging in Ballard tonight. Dinner recs?
Current events | 5% | What is your opinion on the recent proposition that was passed in California?
Shopping | 5% | What’s a good Mother’s Day gift?
Philosophy | 2% | What would you do if you had a week to live?

Missing: Health and Pornography

Religion, Politics, Dating, Finance


Questions: Who Asks What

[Figure: the question types (Recommendation, Opinion, Factual, Rhetorical, Invitation, Favor, Social connection, Offer) and topics (Technology, Entertainment, Home & Family, Professional, Places, Restaurants, Current events, Shopping, Philosophy) annotated by who asks them: men, women, old, young, Twitter, Facebook]


Questions: Motives for Asking

Motive | % | Example
Trust | 24.8% | I trust my friends more than I trust strangers.
Subjective | 21.5% | Search engine can provide data but not an opinion.
Thinks search would fail | 15.2% | I’m pretty [sure] search engine couldn’t answer a question of that nature.
Audience | 14.9% | Friends with kids, first hand real experience.
Connect | 12.4% | I wanted my friends to know I was asking the question.
Speed | 6.6% | Quick response time, no formalities.
Context | 5.4% | Friends know my tastes.
Tried search | 5.4% | I tried searching and didn’t get good results.
Easy | 5.4% | Didn’t want to look through multiple search results.
Quality | 4.1% | Human-vetted responses.


Questions About People’s Questions

• What questions do people ask?
  – How are the questions phrased?
  – What are the question types and topics?
  – Who asks which questions and why?

• Which questions get answered?
  – How is answer speed and utility perceived?
  – What are people’s motivations for answering?


Answers: Speed and Utility

• 94% of questions received an answer
• Answer speed
  – A quarter in 30 minutes, almost all in a day
  – People expected faster, but satisfied with speed
  – Shorter questions got more useful responses

• Answer utility
  – 69% of responses helpful


Answers: Speed and Utility

[Figure: the question types (Recommendation, Opinion, Factual, Rhetorical, Invitation, Favor, Social connection, Offer) and topics (Technology, Entertainment, Home & Family, Professional, Places, Restaurants, Current events, Shopping, Philosophy) annotated with answer outcomes: Fast, Unhelpful, No correlation]


Answers: Motives for Answering

Motive | % | Example
Altruism | 37.0 | Just trying to be helpful.
Expertise | 31.9 | If I’m an expert in the area.
Question | 15.4 | Interest in the topic.
Relationship | 13.7 | If I know and like the person.
Connect | 13.5 | Keeps my network alive.
Free time | 12.3 | Boredom/procrastination.
Social capital | 10.5 | I will get help when I need it myself.
Obligation | 5.4 | A tit-for-tat.
Humor | 3.7 | Thinking I might have a witty response.
Ego | 3.4 | Wish to seem knowledgeable.

Motives for Not Answering

- Don’t know the answer

- Private topic

- Question impersonal


Answers About People’s Questions

• The questions people ask
  – Short, directed to “anyone”
  – Subjective questions on acceptable topics
  – Social relationships important motivators

• The questions that get answered
  – Fast, helpful responses, related to length and type
  – Answers motivated by altruism and expertise


QUESTIONS?

Meredith Ringel Morris
Jaime Teevan
Katrina Panovich

M. R. Morris, J. Teevan, and K. Panovich. What Do People Ask Their Social Networks, and Why? A Survey Study of Status Message Q&A Behavior. CHI 2010.

M. R. Morris, J. Teevan, and K. Panovich. A Comparison of Information Seeking Using Search Engines and Social Networks. ICWSM 2010 (to appear).