(presentation chris) crowdsourcing & semantic web: dagstuhl 2014

How to Measure Quality with Disagreement? or the Three Sides of CrowdTruth Lora Aroyo & Chris Welty

Upload: lora-aroyo

Post on 15-Nov-2014

TRANSCRIPT

Page 1: (Presentation Chris) Crowdsourcing & Semantic Web: Dagstuhl 2014

How to Measure Quality with Disagreement?

or the Three Sides of CrowdTruth

Lora Aroyo & Chris Welty

Page 2

CrowdTruth

•  Annotator disagreement is signal, not noise.

•  It is indicative of the variation in human semantic interpretation of signs.

•  It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality.

Page 3

CrowdTruth Dependencies

worker metrics for detecting spam → quality of sentences → quality of the target semantics

worker quality metrics can improve significantly when the quality of these other aspects of semantic interpretation is considered

Page 4

The Three Sides of CrowdTruth

Page 5

Representation

Worker Vector

[Figure: a worker vector with a 1 in each of three positions]

Page 6

Representation

Sentence Vector

[Figure: nine worker vectors, each with a 1 in one to three positions, added up position by position into the sentence vector]

0 1 1 0 0 4 3 0 0 5 1 0
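A minimal Python sketch of how a sentence vector can be built, assuming (as the figure suggests) that it is the position-wise sum of the worker vectors for that sentence. The three worker vectors and the four-relation layout are made up for illustration and are not data from the talk.

```python
import numpy as np

# Hypothetical annotations for one sentence: one row per worker,
# one column per relation; 1 means the worker selected that relation.
worker_vectors = np.array([
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
])

# Assumed construction: the sentence vector is the column-wise sum,
# i.e. the number of workers who chose each relation.
sentence_vector = worker_vectors.sum(axis=0)
print(sentence_vector)  # [1 3 1 0]
```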

Page 7

Disagreement for Sentence Clarity

Feeling the way the CHEST expands (PALPATION), can identify areas of the lung that are full of fluid.

Is CHEST related to PALPATION?

[Figure: worker counts per relation; visible labels: diagnose, location, associated with, is_a, part_of, other]

0 0 02 3 0 0 0 1 0 0 44 1

Unclear relationship between the two arguments reflected in the disagreement

Page 8

Disagreement for Sentence Clarity

Redness (HYPERAEMIA), irritation (chemosis) and watering (epiphora) of the eyes are symptoms common to all forms of CONJUNCTIVITIS.

Is HYPERAEMIA related to CONJUNCTIVITIS?

[Figure: worker counts per relation; visible labels: symptom, cause]

0 0 0 1 0 0 0 013 0 0 0 0 0

Clearly expressed relation between the two arguments reflected in the agreement

Page 9

Sentence-Relation Score

Measures how clearly a sentence expresses a relation

Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0

Cosine (Sentence Vector, unit vector for relation R6) = .55
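The value on this slide can be reproduced with a few lines of Python. This is a sketch, assuming R6 is the sixth position of the sentence vector (index 5 when counting from zero); the function name `sentence_relation_score` is chosen here for illustration.

```python
import numpy as np

def sentence_relation_score(sentence_vector, relation_index):
    """Cosine between the sentence vector and the unit vector of one relation."""
    v = np.asarray(sentence_vector, dtype=float)
    unit = np.zeros_like(v)
    unit[relation_index] = 1.0
    norm = np.linalg.norm(v)
    if norm == 0:
        return 0.0  # no annotations at all: treat the score as 0
    # The unit vector has length 1, so only the sentence norm is needed.
    return float(v @ unit / norm)

sentence_vector = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
score = sentence_relation_score(sentence_vector, 5)  # assumed: R6 -> index 5
print(round(score, 2))  # 0.55
```

Because the other vector is a unit vector, the score reduces to the count for that relation divided by the length of the sentence vector, here 4 / sqrt(53).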

Page 10

Worker Disagreement

Measured per worker

Worker-sentence disagreement

Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0

AVG (Cosine) of the worker’s sentence vector with the Sentence Vector
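A hedged Python sketch of this per-worker measure: for every sentence a worker annotated, take the cosine between the worker's vector and the sentence vector with that worker's own contribution removed, then average. The data layout (a dict of sentences to worker vectors), the worker ids, and the function names are assumptions for illustration. The code returns the average cosine; whether the score is reported as that value or as one minus it is not settled by the slide.

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def avg_worker_sentence_cosine(annotations, worker):
    """annotations: {sentence_id: {worker_id: vector}} (hypothetical layout)."""
    scores = []
    for workers in annotations.values():
        if worker not in workers:
            continue
        own = np.asarray(workers[worker], dtype=float)
        total = np.sum([np.asarray(v, dtype=float) for v in workers.values()], axis=0)
        # Compare against the crowd without this worker's own vote.
        scores.append(cosine(own, total - own))
    return float(np.mean(scores)) if scores else 0.0

# Made-up example: w1 and w2 agree, w3 always picks something else.
annotations = {
    "s1": {"w1": [1, 0, 0], "w2": [1, 0, 0], "w3": [0, 1, 0]},
    "s2": {"w1": [0, 1, 0], "w2": [0, 1, 0], "w3": [1, 0, 0]},
}
agree_w1 = avg_worker_sentence_cosine(annotations, "w1")
agree_w3 = avg_worker_sentence_cosine(annotations, "w3")
print(round(agree_w1, 3), agree_w3)
```

In this toy data the worker who never overlaps with the rest of the crowd gets an average cosine of 0, while the like-minded workers score well above it.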

Page 11

Worker Metrics

•  How much a worker disagrees with the crowd per sentence → the avg of all cosine distances between each worker’s sentence vector & the full sentence vector (minus that worker)

•  Are there consistently like-minded workers? → pairwise metric, avg for a particular worker → there may be communities of thought that consistently disagree with others, but agree within themselves

•  Low quality workers generally have high scores in both

•  Avg relations per sentence → per worker, the number of relations he/she chooses per sentence, averaged over all sentences he/she annotates. A high score here can help indicate low quality workers.
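The two remaining worker metrics can be sketched in Python as follows. The slide does not spell out the pairwise formula, so the version here (average cosine between a worker's vector and each other worker's vector on the same sentence) is an assumption, as are the data layout, worker ids, and function names.

```python
import numpy as np

def cosine(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def avg_relations_per_sentence(annotations, worker):
    """Mean number of relations the worker selected per annotated sentence."""
    counts = [int(np.sum(ws[worker])) for ws in annotations.values() if worker in ws]
    return float(np.mean(counts)) if counts else 0.0

def avg_pairwise_cosine(annotations, worker):
    """Assumed pairwise metric: mean cosine with every co-annotating worker."""
    scores = []
    for ws in annotations.values():
        if worker not in ws:
            continue
        for other, vec in ws.items():
            if other != worker:
                scores.append(cosine(ws[worker], vec))
    return float(np.mean(scores)) if scores else 0.0

# Made-up example: w3 ticks every relation on every sentence.
annotations = {
    "s1": {"w1": [1, 0, 0], "w2": [1, 0, 0], "w3": [1, 1, 1]},
    "s2": {"w1": [0, 1, 0], "w2": [0, 1, 0], "w3": [1, 1, 1]},
}
rel_w1 = avg_relations_per_sentence(annotations, "w1")
rel_w3 = avg_relations_per_sentence(annotations, "w3")
pair_w1 = avg_pairwise_cosine(annotations, "w1")
print(rel_w1, rel_w3, round(pair_w1, 3))
```

A worker who selects every relation stands out on the relations-per-sentence average even though the cosine with other workers never drops to zero, which is one reason to look at the metrics together.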

Page 12

Sentence Metrics

•  Sentence-relation score → core CrowdTruth metric for relation extraction → measured for each relation on each sentence as the cosine of the unit vector for the relation with the sentence vector, indicating whether a relation is clearly or vaguely expressed

•  Sentence clarity → defined for each sentence as the max relation score for that sentence, indicating a clear or an ambiguous or confusing sentence
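As a sketch in Python, sentence clarity follows directly from the sentence-relation scores. The function names are chosen here for illustration, and the example reuses the sentence vector shown on the earlier slides.

```python
import numpy as np

def relation_scores(sentence_vector):
    """Cosine of the sentence vector with each relation's unit vector."""
    v = np.asarray(sentence_vector, dtype=float)
    norm = np.linalg.norm(v)
    # Cosine with a unit vector is just that component over the vector length.
    return v / norm if norm else np.zeros_like(v)

def sentence_clarity(sentence_vector):
    """Max sentence-relation score over all relations."""
    return float(relation_scores(sentence_vector).max())

clarity = sentence_clarity([0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0])
print(round(clarity, 2))  # 0.69
```

The votes in this vector are spread over several relations, so even the best-supported relation scores well below 1.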

Page 13

Relation Metrics

•  Relation similarity → the causal power (pairwise conditional probability). A high similarity score indicates the relations are confusable to workers.

•  Relation ambiguity → defined for each relation as the max relation similarity for the relation. If a relation is clear, then it will have a low score.

•  Relation clarity → defined for each relation as the max sentence-relation score for the relation over all sentences. If a relation has a high clarity score, it means that it is at least possible to express the relation clearly.

•  Relation frequency → the number of times the relation is annotated at least once in a sentence
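One way these four relation metrics could be computed is sketched below in Python. The slide does not say at which level the conditional probability is estimated; this sketch assumes it is taken over sentences, as the fraction of sentences annotated with relation Ri that are also annotated with Rj. The sentence vectors are made up for illustration.

```python
import numpy as np

# Hypothetical sentence vectors: rows = sentences, columns = relations.
S = np.array([
    [3, 2, 0],
    [4, 0, 0],
    [0, 1, 5],
], dtype=float)

present = S > 0

# Relation frequency: sentences in which the relation is annotated at least once.
frequency = present.sum(axis=0)

# Relation similarity (assumed estimate): similarity[i, j] = P(Rj present | Ri present).
co_occurrence = present.T.astype(float) @ present.astype(float)
similarity = co_occurrence / np.maximum(frequency[:, None], 1)
np.fill_diagonal(similarity, 0.0)  # ignore a relation's similarity with itself

# Relation ambiguity: max similarity with any other relation.
ambiguity = similarity.max(axis=1)

# Relation clarity: max sentence-relation score over all sentences.
norms = np.linalg.norm(S, axis=1, keepdims=True)
scores = S / np.where(norms == 0, 1, norms)
clarity = scores.max(axis=0)

print(frequency, ambiguity, clarity.round(2))
```

In this toy data the third relation only ever appears together with the second, so its ambiguity is 1.0, while the first relation has a sentence where it is the only choice and therefore a clarity of 1.0.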

Page 14

Impact of Dependencies

Page 15

Impact of Dependencies

Page 16

Impact of Sentence Quality on Worker Quality

(a) the space with no filtering of sentences or relations: a single line cannot separate the spammers from non-spammers

(b) the space after sentence filtering, (c) after relation filtering, and (d) after both sentence and relation filtering. Sentence filtering makes the classes linearly separable, and the separation between the classes increases in the subsequent figures.

Page 17

Impact of Relation Quality on Worker Quality

(a) the space with no filtering of sentences or relations: a single line cannot separate the spammers from non-spammers

(c) the space after relation filtering

Relation filtering defines the space much more clearly, with a large separation between positive and negative instances. The pairwise improvements to the worker scores are significant with p < .001, which is better than the sentence clarity improvements.

Page 18

Combining Sentence & Relation Filtering

•  first filtering out low clarity sentences

•  then filtering vague and ambiguous relations

•  worker metrics were computed on these new sentences and vectors

•  this separates the space even further, and the pairwise improvement in worker scores from the baseline (unfiltered) is significant with p < .0005.

•  The improvement over sentence filtering alone is also significant (p < .01)

•  The improvement over relation filtering alone is only significant with p < .05.
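The order of the two filtering steps listed above can be sketched in Python as follows. The clarity threshold, the list of relations to keep, and the function name are hypothetical; the talk does not state which cut-offs were used or how the vague and ambiguous relations were selected, so both are passed in as inputs.

```python
import numpy as np

def filter_annotations(sentence_vectors, min_clarity, keep_relations):
    """Drop low-clarity sentences first, then drop filtered-out relations.

    sentence_vectors: {sentence_id: vector}
    min_clarity:      hypothetical threshold on sentence clarity
    keep_relations:   indices of relations that survive relation filtering
    """
    filtered = {}
    for sentence_id, vec in sentence_vectors.items():
        v = np.asarray(vec, dtype=float)
        norm = np.linalg.norm(v)
        clarity = v.max() / norm if norm else 0.0
        if clarity < min_clarity:
            continue                              # step 1: sentence filtering
        filtered[sentence_id] = v[keep_relations]  # step 2: relation filtering
    return filtered

# Made-up data: s1 has one dominant relation, s2 is spread evenly.
sentence_vectors = {"s1": [0, 0, 5, 1], "s2": [2, 2, 2, 2]}
filtered = filter_annotations(sentence_vectors, min_clarity=0.6, keep_relations=[0, 2, 3])
print({k: v.tolist() for k, v in filtered.items()})  # {'s1': [0.0, 5.0, 1.0]}
```

Worker metrics would then be recomputed on the filtered sentences and shortened vectors rather than on the original ones.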

Page 19

•  quality measures in semantic interpretation tasks are inter-dependent

•  higher accuracy can be achieved by considering the impact of sentence quality & relation quality on worker quality measurements

•  significant improvement in worker quality metrics with respect to known spammers by incorporating the quality of the individual sentences & target relations

•  relationships between the different corners of the triangle of reference, e.g. → the impact of relation & worker quality on sentence measures, → the impact of worker & sentence quality on relation measures

Page 20

crowdtruth.org