FaitCrowd: Fine Grained Truth Discovery for Crowdsourced Data Aggregation
Fenglong Ma1, Yaliang Li1, Qi Li1, Minghui Qiu2,
Jing Gao1, Shi Zhi3, Lu Su1, Bo Zhao4, Heng Ji5, Jiawei Han3
Presenter: Jing Gao
1SUNY Buffalo; 2Singapore Management University; 3University of Illinois Urbana-Champaign; 4LinkedIn; 5Rensselaer Polytechnic Institute
Which of these square numbers also happens to be the sum of two smaller square numbers?
A: 16   B: 25   C: 36   D: 49
https://www.youtube.com/watch?v=BbX44YSsQ2I
Audience poll: A 50%, B 30%, C 19%, D 1%
A Straightforward Aggregation Method
• Voting/Averaging
  – Take the value that is claimed by the majority of the sources (users)
  – Or compute the mean of all the claims
• Limitation
  – Ignores source reliability (user expertise)
• Source reliability
  – Is crucial for finding the true fact, but is unknown
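The two baseline aggregators above can be sketched in a few lines. This is only an illustration: the function names and the sample claims are hypothetical, and every source counts equally, which is exactly the limitation noted on the slide.

```python
from collections import Counter
from statistics import mean

def majority_vote(claims):
    """Return the most frequently claimed value (on a tie, the value seen first wins)."""
    return Counter(claims).most_common(1)[0][0]

def average(claims):
    """Return the arithmetic mean of numeric claims."""
    return mean(claims)

# Hypothetical claims from five users on one categorical question.
categorical_claims = ["A", "A", "B", "A", "C"]
print(majority_vote(categorical_claims))

# Hypothetical numeric claims; one outlier pulls the mean upward.
numeric_claims = [20.0, 22.0, 21.0, 35.0]
print(average(numeric_claims))
```

Neither function looks at who made a claim, so a single expert cannot outweigh several unreliable users.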
[Figure: claims about one object from Source 1 to Source 5 are combined by aggregation.]
Truth Discovery
• Principle
  – To learn users’ reliability degrees and discover trustworthy information (i.e., the truths) from conflicting data provided by various users on the same object.
    • A user is reliable if they provide many pieces of true information.
    • A piece of information is likely to be true if it is provided by many reliable users.
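The two mutually dependent statements above suggest an alternating procedure. The following is a minimal sketch of that idea, not the update rule of any specific published method: truths come from a reliability-weighted vote, and reliability is a smoothed fraction of answers that agree with the current truths. The function name, the smoothing constants, and the sample answers are all assumptions.

```python
from collections import defaultdict

def truth_discovery(answers, iterations=10):
    """answers: {question: {user: value}}. Returns (truths, reliability)."""
    users = {u for per_q in answers.values() for u in per_q}
    reliability = {u: 1.0 for u in users}  # start by trusting everyone equally
    truths = {}
    for _ in range(iterations):
        # Step 1: a value is likely true if reliable users provide it.
        for q, per_q in answers.items():
            score = defaultdict(float)
            for u, value in per_q.items():
                score[value] += reliability[u]
            truths[q] = max(score, key=score.get)
        # Step 2: a user is reliable if many of their answers match the truths.
        for u in users:
            answered = [q for q in answers if u in answers[q]]
            correct = sum(answers[q][u] == truths[q] for q in answered)
            reliability[u] = (correct + 1) / (len(answered) + 2)  # add-one smoothing
    return truths, reliability

# Hypothetical answers from three users on three binary questions.
sample = {
    "q1": {"u1": 1, "u2": 1, "u3": 2},
    "q2": {"u1": 2, "u2": 2, "u3": 1},
    "q3": {"u1": 1, "u2": 2, "u3": 1},
}
truths, reliability = truth_discovery(sample)
print(truths)
print(reliability)
```

On this toy input the user who always agrees with the estimated truths ends up with the highest reliability score.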
Existing Work on Truth Discovery
• Existing methods
  – Assign a single expertise (reliability degree) to each user (source).
[Figure: a single expertise value per user, illustrated with Barack Obama, Albert Einstein, and Michael Jackson.]
Example--Existing Truth Discovery Methods
• Input
  – Question Set
  – User Set
  – Answer Set
• Output
  – Users’ Expertise
  – Truths

Answers (rows: questions; columns: users):
Question u1 u2 u3
q1 1 2 1
q2 2 1 2
q3 1 2 2
q4 1 2 2
q5 2 1 (only two of the three users answered)
q6 1 2 2

Estimated expertise:
User u1 u2 u3
Expertise 5.00E-11 0.961 3.989

Estimated truths:
Question q1 q2 q3 q4 q5 q6
Truth 1 2 2 2 1 2

Ground truth:
Question q1 q2 q3 q4 q5 q6
Ground Truth 1 2 1 2 1 2
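One way to see how a single expertise score per user leads to the truths in this example is a weighted vote. The sketch below assumes that aggregation rule for illustration; the slide does not say which existing method produced the numbers. The expertise values and answers are copied from the tables above, and q5 is omitted because the transcript does not show which user skipped it.

```python
from collections import defaultdict

# Single expertise score per user, as listed in the example.
expertise = {"u1": 5.00e-11, "u2": 0.961, "u3": 3.989}

# Answers from the example; q5 is left out (the missing answer cannot be placed).
answers = {
    "q1": {"u1": 1, "u2": 2, "u3": 1},
    "q2": {"u1": 2, "u2": 1, "u3": 2},
    "q3": {"u1": 1, "u2": 2, "u3": 2},
    "q4": {"u1": 1, "u2": 2, "u3": 2},
    "q6": {"u1": 1, "u2": 2, "u3": 2},
}

def weighted_vote(per_question, weights):
    """Pick the value whose supporters have the largest total weight."""
    score = defaultdict(float)
    for user, value in per_question.items():
        score[value] += weights[user]
    return max(score, key=score.get)

estimated = {q: weighted_vote(a, expertise) for q, a in answers.items()}
print(estimated)
```

Under this rule u3 dominates every vote, so q3 is estimated as 2 even though the ground truth is 1, which matches the error shown in the example.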
Overview of Our Work
• Goal
  – To learn fine-grained (topical-level) user expertise and the truths from conflicting crowd-contributed answers.
[Figure: expertise shown per topic: Politics, Physics, Music.]
Example--Our Model
• Input
  – Question Set
  – User Set
  – Answer Set
  – Question Content
• Output
  – Questions’ Topic
  – Topical-Level Users’ Expertise
  – Truths

Answers and question words (rows: questions):
Question u1 u2 u3 Word
q1 1 2 1 a b
q2 2 1 2 b c
q3 1 2 2 a c
q4 1 2 2 d e
q5 2 1 e f (only two of the three users answered)
q6 1 2 2 d f

Estimated topics:
Topic Question
K1 q1 q2 q3
K2 q4 q5 q6

Estimated topical expertise:
Expertise u1 u2 u3
K1 2.34 2.70E-4 1.00
K2 1.30E-4 2.34 2.35

Estimated truths:
Question q1 q2 q3 q4 q5 q6
Truth 1 2 1 2 1 2

Ground truth:
Question q1 q2 q3 q4 q5 q6
Ground Truth 1 2 1 2 1 2
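To illustrate why topic-level expertise changes the outcome, the sketch below applies a weighted vote in which each user's weight depends on the question's topic. The weighted vote itself is an assumption made for illustration and is not the FaitCrowd inference procedure; the expertise values, topic assignments, and answers are taken from this example, and q5 is omitted because the transcript does not show which user skipped it.

```python
from collections import defaultdict

# Topic-level expertise from the example: topic -> user -> score.
topical_expertise = {
    "K1": {"u1": 2.34, "u2": 2.70e-4, "u3": 1.00},
    "K2": {"u1": 1.30e-4, "u2": 2.34, "u3": 2.35},
}
topic_of = {"q1": "K1", "q2": "K1", "q3": "K1", "q4": "K2", "q6": "K2"}

# Answers from the example; q5 is left out (the missing answer cannot be placed).
answers = {
    "q1": {"u1": 1, "u2": 2, "u3": 1},
    "q2": {"u1": 2, "u2": 1, "u3": 2},
    "q3": {"u1": 1, "u2": 2, "u3": 2},
    "q4": {"u1": 1, "u2": 2, "u3": 2},
    "q6": {"u1": 1, "u2": 2, "u3": 2},
}

def topical_weighted_vote(question, per_question):
    """Weight each user's answer by their expertise on the question's topic."""
    weights = topical_expertise[topic_of[question]]
    score = defaultdict(float)
    for user, value in per_question.items():
        score[value] += weights[user]
    return max(score, key=score.get)

estimated = {q: topical_weighted_vote(q, a) for q, a in answers.items()}
print(estimated)
```

With topic-aware weights, u1's strong K1 expertise outweighs u2 and u3 on q3, so the estimate for q3 becomes 1 and agrees with the ground truth.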
FaitCrowd Model
• Overview
  – Jointly modeling question content and users’ answers by introducing latent topics.
  – Modeling question content can help estimate reasonable user reliability, and in turn, modeling answers leads to the discovery of meaningful topics.
  – Learning topic-level user expertise, truths, and topics simultaneously.
[Figure: plate diagram of the FaitCrowd model with two parts, Modeling Content and Modeling Answers; the legend distinguishes input, output, hyperparameter, and intermediate variables.]
Modeling Question Content
• Word Generation
  – Assume that each question is about a single topic (the length of each question is short).
    • Draw a topic indicator
  – Assume that a word can be drawn from the topical word distribution or the background word distribution.
    • Draw a word category
    • Draw a word
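The three draws above can be sketched as a small sampler. The transcript does not preserve the distributions used on the slide, so the choices here are assumptions: a categorical prior over topics, a Bernoulli switch between topical and background words, and categorical word distributions. All names, vocabularies, and probabilities are hypothetical.

```python
import random

def generate_question_words(topic_prior, topic_words, background_words,
                            p_topical, n_words, rng):
    """Sample one topic for the question, then a (word, category) pair per position."""
    topics = list(topic_prior)
    # Draw a topic indicator: one topic per (short) question.
    z = rng.choices(topics, weights=[topic_prior[k] for k in topics])[0]
    words = []
    for _ in range(n_words):
        # Draw a word category: 1 = topical word, 0 = background word.
        y = 1 if rng.random() < p_topical else 0
        dist = topic_words[z] if y == 1 else background_words
        vocab = list(dist)
        # Draw a word from the selected distribution.
        w = rng.choices(vocab, weights=[dist[v] for v in vocab])[0]
        words.append((w, y))
    return z, words

# Hypothetical distributions for two topics plus shared background words.
topic_prior = {"K1": 0.5, "K2": 0.5}
topic_words = {
    "K1": {"president": 0.6, "election": 0.4},
    "K2": {"album": 0.7, "singer": 0.3},
}
background_words = {"which": 0.5, "the": 0.5}

z, words = generate_question_words(
    topic_prior, topic_words, background_words,
    p_topical=0.7, n_words=5, rng=random.Random(0),
)
print(z, words)
```

Separating background words from topical words keeps common question words such as "which" from dominating every topic.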
Modeling Answers
• Answer Generation
  – The correctness of a user’s answer may be affected by the question’s topic, the user’s expertise on the topic, and the question’s bias.
    • Draw user’s expertise
    • Draw the truth
    • Draw the bias
    • Draw a user’s answer
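The four draws above can be sketched as follows. The distributions are not preserved in this transcript, so everything specific here is an assumption: Gaussian draws for expertise and bias, a uniform draw for the truth, and a logistic link that turns expertise plus bias into the probability of a correct answer. Function and variable names are hypothetical.

```python
import math
import random

def sigmoid(x):
    """Logistic function (assumed link between expertise, bias, and correctness)."""
    return 1.0 / (1.0 + math.exp(-x))

def generate_answer(truth, choices, expertise, bias, rng):
    """Return the truth with probability sigmoid(expertise + bias), else a wrong choice."""
    if rng.random() < sigmoid(expertise + bias):
        return truth
    return rng.choice([c for c in choices if c != truth])

rng = random.Random(0)
choices = [1, 2]

expertise = rng.gauss(0.0, 1.0)   # draw the user's expertise on the question's topic
truth = rng.choice(choices)       # draw the truth
bias = rng.gauss(0.0, 1.0)        # draw the question's bias
answer = generate_answer(truth, choices, expertise, bias, rng)  # draw the user's answer
print(truth, answer)
```

Under this assumed link, a large positive expertise makes a correct answer almost certain, and a large negative expertise makes it almost impossible.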
Inference Method
• Gibbs-EM
  – Gibbs sampling to learn the hidden variables.
  – Gradient descent to learn the hidden factors.
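As a rough illustration of the gradient-based half of such a scheme, the sketch below fits per-topic user expertise and per-question bias by gradient ascent on a regularised logistic log-likelihood, which is equivalent to gradient descent on its negative. It assumes the topics and the correctness of each answer are already fixed (in the full procedure they would come from the sampling step, which is not shown), and the logistic link, regulariser, learning rate, and names are all assumptions rather than the authors' equations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_expertise_and_bias(observations, n_steps=500, lr=0.1, l2=0.1):
    """observations: list of (topic, user, question, correct), correct in {0, 1}."""
    expertise = {(k, u): 0.0 for k, u, _, _ in observations}
    bias = {q: 0.0 for _, _, q, _ in observations}
    for _ in range(n_steps):
        # Gradient of the L2 penalty.
        grad_e = {key: -l2 * value for key, value in expertise.items()}
        grad_b = {q: -l2 * value for q, value in bias.items()}
        # Gradient of the logistic log-likelihood: (observed - predicted).
        for k, u, q, correct in observations:
            residual = correct - sigmoid(expertise[(k, u)] + bias[q])
            grad_e[(k, u)] += residual
            grad_b[q] += residual
        for key in expertise:
            expertise[key] += lr * grad_e[key]
        for q in bias:
            bias[q] += lr * grad_b[q]
    return expertise, bias

# Correctness flags for topic K1 in the earlier example: u1 is right on q1 to q3, u2 is wrong.
observations = [
    ("K1", "u1", "q1", 1), ("K1", "u1", "q2", 1), ("K1", "u1", "q3", 1),
    ("K1", "u2", "q1", 0), ("K1", "u2", "q2", 0), ("K1", "u2", "q3", 0),
]
expertise, bias = fit_expertise_and_bias(observations)
print(expertise)
```

The L2 term keeps the scores finite when a user is always right or always wrong on a topic.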
Datasets & Measure
• Datasets
  – The Game Dataset
    • Collected from a crowdsourcing platform via an Android app based on the TV game show “Who Wants to Be a Millionaire”.
    • 2,103 questions, 37,029 sources, 214,849 answers, and 12,995 words
  – The SFV Dataset
    • Extracted from the Slot Filling Validation (SFV) task of the NIST Text Analysis Conference Knowledge Base Population (TAC-KBP) track.
    • 328 questions, 18 sources, 2,538 answers, and 5,587 words
• Measure
  – Error Rate
    • The lower the better
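The slide does not define the error rate, so the sketch below assumes the usual reading: the fraction of questions whose estimated truth differs from the ground truth. The function name is hypothetical; the inputs reuse the six-question example from earlier in the deck.

```python
def error_rate(estimated, ground_truth):
    """Fraction of questions whose estimated truth differs from the ground truth."""
    wrong = sum(estimated[q] != ground_truth[q] for q in ground_truth)
    return wrong / len(ground_truth)

questions = ["q1", "q2", "q3", "q4", "q5", "q6"]
ground_truth = dict(zip(questions, [1, 2, 1, 2, 1, 2]))

# Truths from the single-expertise example (q3 is wrong).
single_expertise_truths = dict(zip(questions, [1, 2, 2, 2, 1, 2]))
# Truths from the topic-level example (all correct).
topic_level_truths = dict(zip(questions, [1, 2, 1, 2, 1, 2]))

print(error_rate(single_expertise_truths, ground_truth))
print(error_rate(topic_level_truths, ground_truth))
```

On the toy example the single-expertise output misses one question out of six, while the topic-level output misses none.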
Baseline Methods
• Basic Method
  – MV
• Truth Discovery
  – Truth Finder
  – AccuPr
  – Investment
  – 3-Estimates
  – CRH
  – CATD
• Crowdsourcing
  – D&S
  – ZenCrowd
• Variations of FaitCrowd
  – FaitCrowd-b
  – FaitCrowd-b-g
Performance Validation
• Analysis
  – For easy questions (Level 1 to Level 7), all the methods estimate most answers correctly.
  – For difficult questions (Level 8 to Level 10), FaitCrowd performs much better than the baseline methods.
  – FaitCrowd performs well on both the Game and SFV datasets.
Table 1: Performance on the Game Dataset.
Table 2: Performance on the SFV Dataset.
Model Validation
• Goal
  – Illustrate the importance of jointly modeling question content and answers by comparing with a method that conducts topic modeling and true answer inference separately.
• Explanation
  – Dividing the whole dataset into sub-topical datasets reduces the number of responses per topic, which leaves insufficient data for the baseline approaches.
Table 3: Results of Model Validation.
Topical Expertise Validation
• Goal
  – Validate the correctness of the topical expertise learned by FaitCrowd.
  – Ideally, the expertise estimated by the proposed method is consistent with the ground truth accuracy.
Figure 1: Topic 2 on the Game Dataset.
Figure 2: Topic 4 on the SFV Dataset.
Expertise Diversity Analysis
• Goal
  – Demonstrate that the topical expertise of each source varies across topics.
  – Ideally, the topical expertise should correspond to the ground truth accuracy, i.e., the higher the expertise, the higher the ground truth accuracy.
Figure 3: Source 7 on the Game Dataset.
Figure 4: Source 16 on the SFV Dataset.
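The ground truth accuracy that the learned expertise is compared against can be computed per source and per topic. The sketch below assumes accuracy means the fraction of a source's answers within a topic that match the ground truth; the function name is hypothetical, and the data are u1's answers from the six-question example earlier in the deck, not the Game or SFV datasets (q5 is omitted because the transcript does not show who answered it).

```python
from collections import defaultdict

def accuracy_by_topic(user_answers, ground_truth, topic_of):
    """Return {topic: fraction of this user's answers in the topic that are correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for question, answer in user_answers.items():
        topic = topic_of[question]
        total[topic] += 1
        correct[topic] += int(answer == ground_truth[question])
    return {topic: correct[topic] / total[topic] for topic in total}

topic_of = {"q1": "K1", "q2": "K1", "q3": "K1", "q4": "K2", "q6": "K2"}
ground_truth = {"q1": 1, "q2": 2, "q3": 1, "q4": 2, "q6": 2}
u1_answers = {"q1": 1, "q2": 2, "q3": 1, "q4": 1, "q6": 1}

print(accuracy_by_topic(u1_answers, ground_truth, topic_of))
```

In the toy example u1 is always right on K1 and always wrong on K2, which lines up with the high K1 and near-zero K2 expertise listed for u1.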
Summary
• Problem
  – Recognize the difference in source reliability among topics in the truth discovery task and propose to incorporate the estimation of fine-grained reliability into truth discovery.
• Solution
  – Propose a probabilistic model that simultaneously learns the topic-specific expertise for each source, aggregates true answers, and assigns topic labels to questions.
• Results
  – Empirically show that the proposed model outperforms existing methods in multi-source aggregation on two real-world datasets.
Thank you! Questions?