#crowdtruth: biomedical data mining, modeling & semantic integration (bdm2i 2015) @iswc2015
TRANSCRIPT
![Page 1: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/1.jpg)
Anca Dumitrache, Lora Aroyo, Chris Welty http://CrowdTruth.org
Achieving Expert-Level Annotation Quality with the Crowd
The Case of Medical Relation Extraction
Biomedical Data Mining, Modeling & Semantic Integration @ ISWC2015
#CrowdTruth @anouk_anca @laroyo @cawelty #BDM2I
![Page 2: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/2.jpg)
• Annotator disagreement is signal, not noise.
• It is indicative of the variation in human semantic interpretation of signs
• It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality
CrowdTruth http://CrowdTruth.org
![Page 3: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/3.jpg)
CrowdTruth for medical relation extraction
• Goals:
  • collect a relation extraction gold standard
  • improve the performance of a relation extraction classifier
• Approach:
  • crowdsource 900 medical sentences
  • measure disagreement with CrowdTruth metrics
  • train & evaluate classifier with CrowdTruth score
![Page 4: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/4.jpg)
RelEx TASK in CrowdFlower
Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1
Is ACUTE FEVER – related to → INFLUENZA AH1N1?
![Page 5: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/5.jpg)
Worker Vector
[Figure: example worker vector with three entries set to 1]
![Page 6: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/6.jpg)
Sentence Vector
[Figure: worker vectors stacked and summed per column into the sentence vector]
0 1 1 0 0 4 3 0 0 5 1 0
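The worker and sentence vectors on the preceding slides can be sketched in a few lines. This is a minimal illustration, not the CrowdTruth implementation: it assumes the sentence vector is the element-wise sum of the worker vectors, and that the sentence-relation score is the cosine similarity between the sentence vector and the unit vector of one relation (the slides do not spell out the formula). The function names and the small `workers` example are hypothetical.

```python
import math


def sentence_vector(worker_vectors):
    """Sum the worker vectors element-wise (one column per relation)."""
    return [sum(column) for column in zip(*worker_vectors)]


def sentence_relation_score(vector, relation_index):
    """Cosine similarity between the sentence vector and one relation's unit vector.

    Assumption: this is how the sentence-relation score is defined.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    return vector[relation_index] / norm if norm else 0.0


# Hypothetical example: three workers annotating one sentence over four relations.
workers = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
vec = sentence_vector(workers)  # [1, 1, 3, 0]
scores = [sentence_relation_score(vec, i) for i in range(len(vec))]

# The sentence vector shown on the slide, scored the same way.
slide_vector = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
slide_scores = [sentence_relation_score(slide_vector, i)
                for i in range(len(slide_vector))]
print([round(s, 3) for s in slide_scores])
```

Under this assumption a relation chosen by many workers scores close to 1 only when few workers chose other relations, so disagreement lowers the score instead of being discarded.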
![Page 7: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/7.jpg)
Annotation Quality of Expert vs. Crowd Annotations
[Figure: F1 of crowd vs. expert annotations across thresholds; labeled values: 0.907 (p = 0.007), 0.844]
![Page 8: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/8.jpg)
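Annotation Quality of Expert vs. Crowd Annotations
[Figure: F1 of crowd vs. expert annotations across thresholds; labeled values: 0.907 (p = 0.007), 0.844]
In the [0.6, 0.8] threshold interval, the crowd significantly outperforms the expert, with a maximum F1 of 0.907 at the 0.7 threshold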
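A small sketch of how a threshold turns crowd scores into positive/negative labels that can then be compared to a reference with F1. The scores and reference labels below are made-up illustration data, not results from the experiments, and treating a score equal to the threshold as positive is an assumption.

```python
def binarize(scores, threshold):
    """Label a sentence positive when its score reaches the threshold (assumed >=)."""
    return [score >= threshold for score in scores]


def f1_score(predicted, reference):
    """F1 = harmonic mean of precision and recall over boolean labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if r and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence-relation scores and reference labels.
crowd_scores = [0.91, 0.72, 0.55, 0.30, 0.10]
reference = [True, True, False, True, False]

for threshold in (0.5, 0.7):
    labels = binarize(crowd_scores, threshold)
    print(threshold, round(f1_score(labels, reference), 3))
```

Sweeping the threshold over a range and recording F1 at each step gives the kind of curve the slide summarizes.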
![Page 9: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/9.jpg)
Relex CAUSE Classifier F1 for Crowd vs. Expert Annotations
[Figure: classifier F1 for crowd vs. expert annotations; labeled values: 0.642 (p = 0.016), 0.638]
![Page 10: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/10.jpg)
Relex CAUSE Classifier F1 for Crowd vs. Expert Annotations
[Figure: classifier F1 for crowd vs. expert annotations; labeled values: 0.642 (p = 0.016), 0.638]
crowd provides training data that is at least as good as, if not better than, experts
![Page 11: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/11.jpg)
Learning Curves
(crowd with pos./neg. threshold at 0.5)
![Page 12: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/12.jpg)
Learning Curves
(crowd with pos./neg. threshold at 0.5)
• above 400 sentences: crowd consistently above baseline & single annotator
• above 600 sentences: crowd outperforms experts
![Page 13: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/13.jpg)
Learning Curves Extended
(crowd with pos./neg. threshold at 0.5)
![Page 14: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/14.jpg)
Learning Curves Extended
(crowd with pos./neg. threshold at 0.5)
crowd consistently performs better than baseline
![Page 15: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/15.jpg)
# of Workers: Impact on Sentence-Relation Score
![Page 16: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/16.jpg)
# of Workers: Impact on Annotation Quality
only 54 sentences had 15 or more workers
![Page 17: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/17.jpg)
Experts vs. Crowd in Human Annotation
Overall Comparison
• 91% of expert annotations covered by the crowd
• expert annotators reach agreement only in 30%
• most popular crowd vote covers 95% of this expert annotation agreement
![Page 18: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/18.jpg)
Expert vs. Crowd in Human Annotation
Cost Comparison

| Annotation source | F1 | Cost per sentence |
| --- | --- | --- |
| CrowdTruth | 0.642 | $0.66 |
| Expert Annotator | 0.638 | $2.00 |
| Single Annotator | 0.492 | $0.08 |
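As a rough worked example, the per-sentence costs above can be scaled to the 900 sentences mentioned in the approach. This assumes cost grows linearly with the number of sentences, which the slides do not state; the names below are illustrative only. Costs are kept in cents to avoid floating-point rounding.

```python
# Per-sentence costs from the cost comparison, in cents.
COST_CENTS_PER_SENTENCE = {
    "CrowdTruth": 66,
    "Expert Annotator": 200,
    "Single Annotator": 8,
}
NUM_SENTENCES = 900  # size of the crowdsourced set named in the approach


def total_cost_dollars(cents_per_sentence, num_sentences):
    """Total cost in dollars, assuming cost scales linearly with sentence count."""
    return cents_per_sentence * num_sentences / 100


totals = {
    name: total_cost_dollars(cents, NUM_SENTENCES)
    for name, cents in COST_CENTS_PER_SENTENCE.items()
}
for name, dollars in totals.items():
    print(f"{name}: ${dollars:.2f}")
```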
![Page 19: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/19.jpg)
Experiments proved that:
• crowd performs just as well as medical experts
• crowd is also cheaper
• crowd is always available
• using only a few annotators for ground truth is faulty
• min 10 workers/sentence are needed for highest quality annotations
• CrowdTruth = a solution to Clinical NLP Challenge:
  • lack of ground truth for training & benchmarking
![Page 20: #CrowdTruth: Biomedical Data Mining, Modeling & Semantic Integration (BDM2I 2015) @ISWC2015](https://reader031.vdocuments.mx/reader031/viewer/2022021920/58d129121a28abe3298b4bc1/html5/thumbnails/20.jpg)
#CrowdTruth @anouk_anca @laroyo @cawelty #BDM2I #ISWC2015
CrowdTruth.org
http://data.CrowdTruth.org/medical-relex