
Semi-Automated Assessment of Annotation Trustworthiness

Davide Ceolin, Archana Nottamkandath, Wan Fokkink

Museums...

Photo: flickr.com/clumsyjim


have a problem.

Photo: flickr.com/grrrl

Let’s recruit some help

Photo: flickr.com/anirudhkoul

Photo: flickr.com/hz536n

Red!

Tulips!

Tasty!

???

Museums are meticulous

• The crowd alone does not solve the problem.

• We need to select only the most trustworthy annotations.

• System requirements:

✓reliable

✓with minimum effort

✓efficient

User Reputation

User reputation should reflect the trustworthiness of their annotations.

Tulip

Flower

Ugly

Photo: flickr.com/hz536n


Reputation estimation

Tulip

Flower

Red

Pink

Purple

[Plot: probability density over reputation values]

We compute reputations as in subjective logic.

Photo: flickr.com/hz536n
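The reputation computation above follows subjective logic. Below is a minimal sketch in Python, assuming the evidence comes from counting a user's previously accepted and rejected annotations; the function name and the accept/reject evidence source are illustrative, not the authors' exact procedure.

```python
# A minimal sketch of a subjective-logic reputation, built from the counts of
# a user's previously accepted (positive) and rejected (negative) annotations.
# The evidence-to-opinion mapping is standard subjective logic; treating
# accept/reject evaluations as the evidence source is an assumption here.
def reputation(positive, negative, base_rate=0.5):
    """Return (belief, disbelief, uncertainty, expected_value)."""
    total = positive + negative + 2.0      # 2 = non-informative prior weight
    belief = positive / total
    disbelief = negative / total
    uncertainty = 2.0 / total
    expected = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, expected

# A user with 5 accepted and 3 rejected annotations:
b, d, u, e = reputation(positive=5, negative=3)
print(round(e, 2))   # expected reputation value: 0.6
```

This is equivalent to taking the expected value of a Beta(positive+1, negative+1) distribution over the user's accuracy, which matches the density plotted on the slide.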

Reputation estimation

Few annotations per author: ✓ minimum effort

[Plot: probability density over reputation values]

Photo: flickr.com/hz536n


Interpreting reputation

User reputation: 0.6

[Plot: probability density over reputation values, expected value E = 0.6]


Interpreting reputation

User reputation: 0.6

Should we accept all of their annotations? Reject them all?

No: accept only the best 60%. How?

Estimating expertise can help

Expertise

Training set: Tulip, Flower, Red, Pink, Purple

New annotation: Rose

Semantic distance
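A minimal sketch of the semantic-distance step, assuming a WordNet-based Wu-Palmer similarity via NLTK as the measure; the slides only say "semantic distance", so the concrete measure and the helper names are assumptions for illustration.

```python
# Score a new annotation against a user's evaluated (training-set) annotations
# by semantic similarity. The similarity measure (Wu-Palmer over WordNet) is
# an assumption; the slides do not name one.
from nltk.corpus import wordnet as wn   # may require nltk.download('wordnet')

def similarity(word_a, word_b):
    """Best Wu-Palmer similarity between any noun senses of the two words."""
    synsets_a = wn.synsets(word_a, pos=wn.NOUN)
    synsets_b = wn.synsets(word_b, pos=wn.NOUN)
    scores = [a.wup_similarity(b) for a in synsets_a for b in synsets_b]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)

def expertise_score(new_annotation, training_set):
    """Closeness of a new annotation to this user's evaluated annotations."""
    if not training_set:
        return 0.0
    return max(similarity(new_annotation, t) for t in training_set)

training = ["tulip", "flower", "red", "pink", "purple"]
print(expertise_score("rose", training))   # roses and tulips are close in WordNet
```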

We are almost there... Rank the annotations and evaluate them.

But we can improve the efficiency of the system.

Test set, ranked by semantic distance (user reputation: 0.6):

Rose    0.9  }
Violet  0.8  }  60% accepted
...     ...  }

...     ...  }
Tomato  0.3  }  40% rejected
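A minimal sketch of the ranking and evaluation step, assuming each test-set annotation already carries a semantic-similarity score and that the acceptance quota equals the user's reputation, as on the slide; the names and the rounding rule are illustrative.

```python
# Accept the top fraction of a user's new annotations, where the fraction is
# given by the user's reputation and the ranking by semantic-similarity score.
def evaluate_annotations(scored_annotations, reputation):
    """Split (annotation, score) pairs into accepted and rejected lists."""
    ranked = sorted(scored_annotations, key=lambda item: item[1], reverse=True)
    cutoff = round(reputation * len(ranked))
    accepted = ranked[:cutoff]
    rejected = ranked[cutoff:]
    return accepted, rejected

test_set = [("rose", 0.9), ("violet", 0.8), ("daisy", 0.7), ("tomato", 0.3), ("car", 0.1)]
accepted, rejected = evaluate_annotations(test_set, reputation=0.6)
print(accepted)   # top 60% by score
print(rejected)   # bottom 40%
```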

Semantic clustering

✓efficiency

Clustering the training set semantically helps in improving the system efficiency.

Tulip

Flower

Red

Pink

Purple

Rose
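A minimal sketch of the clustering idea, assuming a greedy threshold clustering over WordNet similarity, so that a new annotation is compared against one representative per cluster rather than the whole training set; the algorithm and threshold are illustrative, not the authors' exact method.

```python
# Cluster the training-set annotations by semantic similarity. A new
# annotation then only needs comparing against cluster representatives.
from nltk.corpus import wordnet as wn   # may require nltk.download('wordnet')

def similarity(word_a, word_b):
    pairs = [(a, b) for a in wn.synsets(word_a, pos=wn.NOUN)
                    for b in wn.synsets(word_b, pos=wn.NOUN)]
    scores = [a.wup_similarity(b) for a, b in pairs]
    return max([s for s in scores if s is not None], default=0.0)

def cluster(words, threshold=0.7):
    """Greedily group words; each cluster is keyed by its first member."""
    clusters = {}                        # representative -> list of members
    for word in words:
        for representative in clusters:
            if similarity(word, representative) >= threshold:
                clusters[representative].append(word)
                break
        else:
            clusters[word] = [word]
    return clusters

training = ["tulip", "flower", "rose", "red", "pink", "purple"]
print(cluster(training))   # e.g. flower-like terms vs. colour terms
```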

Results

• We ran experiments with 5 to 30 annotations per user, on two datasets.

• Accuracy: 68% - 84%

• Precision: 87% - 88%

• Recall: 80% - 96%

• Time saved by clustering: 44% - 52%

Discussion

• Results are satisfactory but they can be further improved

• Provenance can help

• Image analysis programs can help

• Reuse of evaluations can reduce the museum effort

Recap

• We propose a system to evaluate the trustworthiness of museum annotations

• Uses reputation, semantic similarity and clustering

✓Reliable

✓With minimum effort

✓Efficient


Thanks!