Chunyi Peng, Zaoyang Gong, Guobin Shen
Microsoft Research Asia
HotWeb 2006
MEASUREMENT AND MODELING OF A WEB-BASED QUESTION ANSWERING SYSTEM
When you have a question…
Solve it yourself! – Ooh, often out of our scope!
Search it! – A common and good way in many cases, but search engines typically return pages of links, not direct answers; it is sometimes very difficult to describe a question precisely; and not all information is readily available on the web.
So, Ask! – A natural and effective way. Question-Answering (QA) leverages grassroots intelligence and collaboration, especially for acquiring specific information.
So, our goals…
Measurement and modeling of a real large-scale QA system: How does a real QA system work? What are the typical user behaviors and their impacts?
Seek a better QA system: How should a QA system be designed? How to make performance tradeoffs?
iAsk (http://iask.sina.com.cn)
A topic-based web-QA system
Question lifecycle: questioning -> waiting for replies -> confirmation (closed)
Provides optimal reply selection & reply rewarding
Measurement Results
Data set: 2 months (Nov 22, 2005 to Jan 23, 2006); 350K questions and 2M replies; 220K users, 1,901 topics
Measurements: question/reply patterns over time, over topics, across users, and question/reply incentive mechanisms
Behavior Pattern over Time
On an hourly scale: a consistent usage pattern
Behavior Pattern over Topics
Topic characteristics:
P -- Popularity (#Q): overall questioning and replying activity; follows a Zipf distribution
Q -- Question Proneness (#Q/#U): the likelihood that a user will ask a question
R -- Reply Proneness (#R/#U): the likelihood that a user will reply to a question
Our measurements show that topic characteristics vary widely and users behave quite differently across topics.
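The three per-topic characteristics above fall straight out of a question/reply log. A minimal sketch, assuming a hypothetical log of (topic, user, action) records rather than the actual iAsk data set:

```python
from collections import defaultdict

def topic_characteristics(log):
    """Compute P (#Q), Question Proneness (#Q/#U) and Reply Proneness (#R/#U)
    per topic, from an iterable of (topic, user, action) tuples where
    action is "question" or "reply"."""
    questions = defaultdict(int)   # #Q per topic
    replies = defaultdict(int)     # #R per topic
    users = defaultdict(set)       # distinct users per topic
    for topic, user, action in log:
        users[topic].add(user)
        if action == "question":
            questions[topic] += 1
        else:
            replies[topic] += 1
    return {
        t: {
            "P": questions[t],
            "Q_proneness": questions[t] / len(users[t]),
            "R_proneness": replies[t] / len(users[t]),
        }
        for t in users
    }

# Toy log: two topics, three users.
log = [
    ("health", "u1", "question"),
    ("health", "u2", "reply"),
    ("health", "u3", "reply"),
    ("travel", "u1", "question"),
    ("travel", "u1", "reply"),
]
stats = topic_characteristics(log)
print(stats["health"])  # P=1, Q_proneness=1/3, R_proneness=2/3
```

Ranking topics by P and plotting rank vs. frequency on log-log axes is then the standard check for the Zipf-popularity claim.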
Behavior Pattern across Users
Active and non-active users: about 9% of users contribute 80% of replies, vs. about 22% of users contribute 80% of questions
Asymmetric questioning/replying pattern: 4.7% altruists vs. 17.7% free-riders
Narrow user interests: users ask questions in 1.8 topics on average, but reply in 3.3
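Figures of the form "9% of users produce 80% of replies" come from sorting users by contribution and walking the cumulative sum. A sketch with made-up per-user counts (the real iAsk percentages are in the bullets above):

```python
def fraction_for_share(counts, share=0.8):
    """Smallest fraction of users whose contributions cover `share` of the total."""
    counts = sorted(counts, reverse=True)   # heaviest contributors first
    total = sum(counts)
    running, needed = 0, 0
    for c in counts:
        running += c
        needed += 1
        if running >= share * total:
            break
    return needed / len(counts)

# Toy reply counts per user: one heavy replier, ten light ones.
reply_counts = [80] + [2] * 10   # 11 users, 100 replies in total
print(fraction_for_share(reply_counts))  # 1/11 of users (~9%) cover 80% of replies
```

Running the same function over per-user question counts vs. per-user reply counts exposes the asymmetry the slide reports.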
Performance Metrics
Reply-Rate: how likely a question is to get a reply
Reply-Number: how many replies a question receives (how likely it gets an expected answer)
Reply-Latency: how quickly an answer arrives
iAsk performance
Long-term performance: Reply-Rate: 99.8%; Reply-Number: about 5; Reply-Latency: about 10 hr
Within 24 hrs: Reply-Rate: 85%; Reply-Number: about 4; Reply-Latency: about 6 hr
In summary, the performance is quite satisfactory, except that users sometimes need to tolerate a relatively long delay.
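One reasonable reading of the three metrics can be computed from per-question reply delays, with an optional time window to reproduce the "within 24 hrs" variant. A sketch over hypothetical log data (latency here is time to first reply, an assumption of this sketch):

```python
def qa_performance(questions, horizon_hr=None):
    """Compute (reply-rate, mean reply-number, mean first-reply latency).

    `questions` maps a question id to a list of reply delays in hours;
    `horizon_hr` optionally keeps only replies within a window
    (e.g. 24 for the "within 24 hrs" figures)."""
    replied, reply_counts, first_latencies = 0, [], []
    for delays in questions.values():
        if horizon_hr is not None:
            delays = [d for d in delays if d <= horizon_hr]
        if delays:
            replied += 1
            reply_counts.append(len(delays))
            first_latencies.append(min(delays))
    n = len(questions)
    rate = replied / n
    number = sum(reply_counts) / replied if replied else 0.0
    latency = sum(first_latencies) / replied if replied else 0.0
    return rate, number, latency

qs = {1: [2.0, 30.0], 2: [50.0], 3: []}
print(qa_performance(qs))                 # rate 2/3, 1.5 replies, 26.0 hr
print(qa_performance(qs, horizon_hr=24))  # rate 1/3, 1.0 reply, 2.0 hr
```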
Measurement on Incentive Mechanism
Modeling
Question arrivals: Poisson distribution
Reply behavior: an approximately exponentially-decaying model
Performance formula: define dynamic performance
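The two modeling assumptions above can be exercised with a small Monte-Carlo sketch: questions arrive as a Poisson process, and each question attracts replies at a rate that decays exponentially with its age. All parameter values below (lam, mu, tau) are illustrative assumptions, not the fitted iAsk values from the paper:

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's method for sampling a Poisson(mean) variate."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_day(lam=10.0, mu=1.0, tau=6.0, horizon=24.0, seed=0):
    """One day of the model: questions arrive as a Poisson process with rate
    `lam` per hour; replies to each question arrive with exponentially
    decaying rate mu * exp(-t / tau), so a question posted at time t0
    expects mu * tau * (1 - exp(-(horizon - t0) / tau)) replies by the
    horizon. Returns (reply-rate, mean reply-number over replied questions)."""
    rng = random.Random(seed)
    t, arrivals = rng.expovariate(lam), []
    while t < horizon:
        arrivals.append(t)
        t += rng.expovariate(lam)
    counts = [poisson_draw(mu * tau * (1 - math.exp(-(horizon - t0) / tau)), rng)
              for t0 in arrivals]
    replied = [c for c in counts if c > 0]
    reply_rate = len(replied) / len(counts)
    reply_number = sum(replied) / len(replied) if replied else 0.0
    return reply_rate, reply_number

rate, number = simulate_day()
print(rate, number)
```

Sweeping lam (user scale) and mu (reply probability) in this sketch is a quick way to see the parameter impacts the next slide discusses.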
Parameter Impact
Possible Improvement
Active or push-based question delivery
Better webpage layout, e.g. adding shortcuts
Better incentive mechanisms
Utilize the power of social networks
Conclusions
Web-QA that leverages grassroots intelligence and collaboration is hot and getting hotter…
Our measurement and model reveal that QA quality of service depends heavily on three key factors: user scale, user reply probability, and system design artifacts such as webpage layout.
Current simple Web-QA systems achieve acceptable performance, but there is still room for improvement.
Backup
Narrow User Interest Scope
Reply distribution (measured)
Static Performance Formula
Reply-Rate
Reply-Number
Reply-Latency
Dynamic Performance Formula: define dynamic performance