
Semi-Automated Assessment of Annotation Trustworthiness

Davide Ceolin, Archana Nottamkandath, Wan Fokkink

Museums...

Photo: flickr.com/clumsyjim


have a problem.

Photo: flickr.com/grrrl

Let’s recruit some help

Photo: flickr.com/anirudhkoul

Photo: flickr.com/hz536n

Red!

Tulips!

Tasty!

???

Museums are meticulous

• The crowd alone does not solve the problem.

• We need to select only the most trustworthy annotations.

• System requirements:

✓reliable

✓with minimum effort

✓efficient

User Reputation

User reputation should reflect the trustworthiness of their annotations.

Tulip

Flower

Ugly

Photo: flickr.com/hz536n


Reputation estimation

Tulip

Flower

Red

Pink

Purple

[Plot: probability density over reputation values]

We compute reputations as in subjective logic.

Photo: flickr.com/hz536n
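The reputation computation above follows subjective logic. Below is a minimal sketch in Python, assuming the evidence comes from counting a user's previously accepted and rejected annotations; the function name and the accept/reject evidence source are illustrative, not the authors' exact procedure.

```python
# A minimal sketch of a subjective-logic reputation, built from the counts of
# a user's previously accepted (positive) and rejected (negative) annotations.
# The evidence-to-opinion mapping is standard subjective logic; treating
# accept/reject evaluations as the evidence source is an assumption here.
def reputation(positive, negative, base_rate=0.5):
    """Return (belief, disbelief, uncertainty, expected_value)."""
    total = positive + negative + 2.0      # 2 = non-informative prior weight
    belief = positive / total
    disbelief = negative / total
    uncertainty = 2.0 / total
    expected = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, expected

# A user with 5 accepted and 3 rejected annotations:
b, d, u, e = reputation(positive=5, negative=3)
print(round(e, 2))   # expected reputation value: 0.6
```

This is equivalent to taking the expected value of a Beta(positive+1, negative+1) distribution over the user's accuracy, which matches the density plotted on the slide.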

Reputation estimation

Few annotations per author: ✓ minimum effort

[Plot: probability density over reputation values]

Photo: flickr.com/hz536n


Interpreting reputation

User reputation: 0.6

[Plot: probability density over reputation values, expected value E = 0.6]


Interpreting reputation

User reputation: 0.6

Should we accept all of their annotations? Reject them all?

No: accept only the best 60%. How?

Estimating expertise can help

Expertise

Training set: Tulip, Flower, Red, Pink, Purple

New annotation: Rose

Semantic distance
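A minimal sketch of the semantic-distance step, assuming a WordNet-based Wu-Palmer similarity via NLTK as the measure; the slides only say "semantic distance", so the concrete measure and the helper names are assumptions for illustration.

```python
# Score a new annotation against a user's evaluated (training-set) annotations
# by semantic similarity. The similarity measure (Wu-Palmer over WordNet) is
# an assumption; the slides do not name one.
from nltk.corpus import wordnet as wn   # may require nltk.download('wordnet')

def similarity(word_a, word_b):
    """Best Wu-Palmer similarity between any noun senses of the two words."""
    synsets_a = wn.synsets(word_a, pos=wn.NOUN)
    synsets_b = wn.synsets(word_b, pos=wn.NOUN)
    scores = [a.wup_similarity(b) for a in synsets_a for b in synsets_b]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

def expertise_score(new_annotation, training_set):
    """Closeness of a new annotation to this user's evaluated annotations."""
    if not training_set:
        return 0.0
    return max(similarity(new_annotation, t) for t in training_set)

training = ["tulip", "flower", "red", "pink", "purple"]
print(expertise_score("rose", training))   # roses and tulips are close in WordNet
```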

We are almost there... Rank the annotations and evaluate them.

But we can improve the efficiency of the system.

Test set, ranked by semantic distance (user reputation: 0.6):

Rose    0.9  }
Violet  0.8  }  60% accepted
...     ...  }

...     ...  }
Tomato  0.3  }  40% rejected
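A minimal sketch of the ranking and evaluation step, assuming each test-set annotation already carries a semantic-similarity score and that the acceptance quota equals the user's reputation, as on the slide; the names and the rounding rule are illustrative.

```python
# Accept the top fraction of a user's new annotations, where the fraction is
# given by the user's reputation and the ranking by semantic-similarity score.
def evaluate_annotations(scored_annotations, reputation):
    """Split (annotation, score) pairs into accepted and rejected lists."""
    ranked = sorted(scored_annotations, key=lambda item: item[1], reverse=True)
    cutoff = round(reputation * len(ranked))
    accepted = ranked[:cutoff]
    rejected = ranked[cutoff:]
    return accepted, rejected

test_set = [("rose", 0.9), ("violet", 0.8), ("daisy", 0.7), ("tomato", 0.3), ("car", 0.1)]
accepted, rejected = evaluate_annotations(test_set, reputation=0.6)
print(accepted)   # top 60% by score
print(rejected)   # bottom 40%
```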

Semantic clustering

✓efficiency

Clustering the training set semantically helps in improving the system efficiency.

Tulip

Flower

Red

Pink

Purple

Rose
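A minimal sketch of the clustering idea, assuming a greedy threshold clustering over WordNet similarity, so that a new annotation is compared against one representative per cluster rather than the whole training set; the algorithm and threshold are illustrative, not the authors' exact method.

```python
# Cluster the training-set annotations by semantic similarity. A new
# annotation then only needs comparing against cluster representatives.
from nltk.corpus import wordnet as wn   # may require nltk.download('wordnet')

def similarity(word_a, word_b):
    pairs = [(a, b) for a in wn.synsets(word_a, pos=wn.NOUN)
                    for b in wn.synsets(word_b, pos=wn.NOUN)]
    scores = [a.wup_similarity(b) for a, b in pairs]
    return max([s for s in scores if s is not None], default=0.0)

def cluster(words, threshold=0.7):
    """Greedily group words; each cluster is keyed by its first member."""
    clusters = {}                        # representative -> list of members
    for word in words:
        for representative in clusters:
            if similarity(word, representative) >= threshold:
                clusters[representative].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters

training = ["tulip", "flower", "rose", "red", "pink", "purple"]
print(cluster(training))   # e.g. flower-like terms vs. colour terms
```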

Results

• We ran experiments with 5 to 30 annotations per user, on two datasets.

• Accuracy: 68% - 84%

• Precision: 87% - 88%

• Recall: 80% - 96%

• Time saved by clustering: 44% - 52%

Discussion

• Results are satisfactory but they can be further improved

• Provenance can help

• Image analysis programs can help

• Reuse of evaluations can reduce the museum effort

Recap

• We propose a system to evaluate the trustworthiness of museum annotations

• Uses reputation, semantic similarity and clustering

✓Reliable

✓With minimum effort

✓Efficient


Thanks!