Analyzing and Predicting Question Quality in Community Question Answering Services
Baichuan Li, Tan Jin, Michael R. Lyu, Irwin King, and Barley Mak
CQA2012, Lyon, France

Introduction

Question Quality [figure axes: number of tag-of-interests, number of answers]

Motivation
Question quality affects answer quality. Low-quality questions hinder CQA services, while high-quality questions promote the development of the community. Identifying question quality facilitates question search and recommendation.

Outline
Problem Definition; Data; Two Studies (Factors Affecting Question Quality; Prediction of Question Quality); Discussion and Conclusion.

Problem Definition
Figure 1. Construct of question quality in CQA.

Data Description
Table 1. Summary of data in the Entertainment & Music category and its subcategories.

Ground Truth
Table 2. Rule base for the ground truth setting.
NTA: number of tag-of-interests + number of answers.
RM: reciprocal of the minutes taken to get the best answer.
Table 3. Summary of questions in four levels.
Level    Count
1        53,806
2        62,192
3        69,836
4        52,715

Study One: Factors Affecting Question Quality
Possible factors.
Process: select the two most popular subcategories (Music and Movies) and check their distributions of question quality; then track askers with at least five questions in both subcategories.

Observations [figure panels: Askers, Topics]

Observations
Table 4. Summary of question quality for different askers.

Observations [figure: question quality]

Study Two: Prediction of Question Quality
Model the relationships among questions, topics, and askers as a bipartite graph [figure labels: asking expertise, question quality].

Mutual Reinforcement Label Propagation (MRLP) for Predicting Question Quality

MRLP [figure labels: similar users, asking expertise, question quality, similar questions]

Data for Study Two

Methods Comparison
Logistic Regression: LG_Q and LG_QA
Stochastic Gradient Boosted Tree (Friedman, J. H., 1999): SGBT_Q and SGBT_QA
Harmonic Function (Zhou et al., 2007): HF_Q and HF_QA

Experimental Results: Accuracy

Sensitivity & Specificity
Sensitivity measures an algorithm's ability to identify high-quality questions: Sensitivity = TP/(TP+FN).
Specificity measures an algorithm's ability to identify low-quality questions: Specificity = TN/(TN+FP).

Experimental Results: Music

Experimental Results: Movies

Discussion
MRLP is more effective than state-of-the-art methods at distinguishing high-quality questions from low-quality ones. At present, however, neither MRLP nor the other methods achieves satisfactory performance, owing to the influence of the features.

Discussion
Salient features? A user study via crowdsourcing systems.

Conclusion
We define question quality in CQA and conduct two studies to investigate it in CQA services: we analyze the factors influencing question quality, and we propose a mutual reinforcement-based label propagation algorithm to predict question quality.

Future Work
Explore more salient features. Use question quality to improve question search and question recommendation.

Thank You! Q&A

Data Description
238,549 resolved questions under the Entertainment & Music category of Yahoo! Answers.
Question features: text, post time, etc.
Asker features: total points, number of questions asked, number of questions resolved, etc.

MRLP
For the question part of the bipartite graph, we create edges between any two questions that share a topic, yielding an n × n probabilistic transition matrix. For the asker part of the bipartite graph, we generate the probabilistic transition matrix M similarly.
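The MRLP appendix slide says questions sharing a topic are linked into an n × n probabilistic transition matrix, but the formula itself is not recoverable. The following is a minimal Python sketch that makes several assumptions not stated on the slides: edges are unweighted, a question does not link to itself, and each row is normalized to sum to 1. A question with no same-topic neighbour gets a self-loop so that its row remains a valid distribution. The function name `topic_transition_matrix` is hypothetical.

```python
from typing import List, Sequence


def topic_transition_matrix(topics: Sequence[str]) -> List[List[float]]:
    """Hypothetical row-stochastic n x n matrix over questions sharing a topic.

    Assumptions (illustrative only): unweighted edges, no self-loops,
    uniform row normalization, and a self-loop for isolated questions.
    """
    n = len(topics)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the other questions posted under the same topic.
        neighbours = [j for j in range(n) if j != i and topics[j] == topics[i]]
        if not neighbours:
            matrix[i][i] = 1.0  # isolated question keeps all its probability mass
            continue
        weight = 1.0 / len(neighbours)  # uniform transition to each neighbour
        for j in neighbours:
            matrix[i][j] = weight
    return matrix


P = topic_transition_matrix(["Music", "Music", "Movies", "Music"])
print(P[0])  # transitions from the first Music question
```

Under these assumptions, the first question moves with probability 0.5 to each of the two other Music questions, and the lone Movies question maps to itself. Under the same assumptions, the asker-side matrix M could be built the same way from an asker-similarity relation.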
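The MRLP slides show the method only as a diagram linking similar users, asking expertise, question quality, and similar questions. The following is a hypothetical Python sketch of that mutual-reinforcement idea, not the authors' algorithm. Each question's score blends its neighbours' scores with its asker's expertise, and each asker's expertise blends similar askers' expertise with the mean quality of that asker's questions. Labelled questions stay clamped to their known scores. The function name, the blending weight `alpha`, the neutral 0.5 starting value, and the fixed iteration count are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def mutual_reinforcement_sketch(
    question_graph: Sequence[Sequence[float]],  # n x n row-stochastic (questions)
    asker_graph: Sequence[Sequence[float]],     # m x m row-stochastic (askers)
    asker_of: Sequence[int],                    # asker index of each question
    labels: Dict[int, float],                   # known quality: 1.0 high, 0.0 low
    alpha: float = 0.5,                         # hypothetical blending weight
    iterations: int = 100,
) -> Tuple[List[float], List[float]]:
    n, m = len(question_graph), len(asker_graph)
    questions_by_asker: List[List[int]] = [[] for _ in range(m)]
    for q, u in enumerate(asker_of):
        questions_by_asker[u].append(q)

    quality = [labels.get(i, 0.5) for i in range(n)]  # unknown questions start neutral
    expertise = [0.5] * m

    for _ in range(iterations):
        # Question side: neighbours' quality blended with the asker's expertise.
        new_quality = [
            alpha * sum(w * quality[j] for j, w in enumerate(question_graph[i]))
            + (1 - alpha) * expertise[asker_of[i]]
            for i in range(n)
        ]
        # Asker side: similar askers' expertise blended with own questions' quality.
        new_expertise = []
        for u in range(m):
            spread = sum(w * expertise[v] for v, w in enumerate(asker_graph[u]))
            own = questions_by_asker[u]
            if own:
                mean_quality = sum(quality[q] for q in own) / len(own)
                new_expertise.append(alpha * spread + (1 - alpha) * mean_quality)
            else:
                new_expertise.append(spread)  # asker with no questions: graph term only
        # Clamp labelled questions to their ground-truth scores.
        for i, y in labels.items():
            new_quality[i] = y
        quality, expertise = new_quality, new_expertise
    return quality, expertise
```

Every update is a convex combination of values in [0, 1], so scores stay in that range. In this toy setup, an unlabelled question drifts toward the label of its same-topic neighbour and the quality of its asker's other questions.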
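The Sensitivity & Specificity slide defines both measures from confusion-matrix counts. The following is a small Python sketch. It assumes high-quality questions are the positive class (label 1) and low-quality questions the negative class (label 0). The function name is hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity); 1 = high quality, 0 = low quality."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one high- and one low-quality question")
    sensitivity = tp / (tp + fn)  # share of high-quality questions identified
    specificity = tn / (tn + fp)  # share of low-quality questions identified
    return sensitivity, specificity


print(sensitivity_specificity([1, 1, 1, 0, 0], [1, 0, 1, 0, 1]))
```

In this example, two of the three high-quality questions are caught (sensitivity 2/3), and one of the two low-quality questions is caught (specificity 1/2).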
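The Ground Truth slide defines NTA and RM, but the thresholds in Table 2 that map them to the four quality levels are not recoverable. This Python sketch therefore only computes the two quantities as defined. The function names are hypothetical.

```python
def nta(num_tag_of_interests: int, num_answers: int) -> int:
    """NTA: number of tag-of-interests plus number of answers."""
    return num_tag_of_interests + num_answers


def rm(minutes_to_best_answer: float) -> float:
    """RM: reciprocal of the minutes taken to get the best answer."""
    if minutes_to_best_answer <= 0:
        raise ValueError("minutes_to_best_answer must be positive")
    return 1.0 / minutes_to_best_answer


print(nta(3, 5), rm(20))
```

A question with 3 tag-of-interests and 5 answers has NTA 8. A best answer arriving after 20 minutes gives RM 0.05, so faster best answers yield larger RM values.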