
Page 1: Does History Help?

Does History Help? An Experiment on How Context Affects Crowdsourcing Dialogue Annotation

Elnaz Nouri

Computer Science Department, University of Southern California
Natural Dialogue Group, Institute for Creative Technologies

Page 2: Does History Help?

Crowdsourcing Annotation

Faster (?)
Cheaper (?)
Quality (?) (Snow et al., 2008)

Page 3: Does History Help?

In Crowdsourcing Dialogue Annotation Tasks

Dialogue data is sequential by nature.

Does providing context from previous parts of the dialogue (e.g. turns) affect the annotation of the target part?

Example: Judge the sentiment of the following turn of the dialogue:

Person 1: Come on out, honey! I'm telling you look good! Tell her she looks good, tell her she looks good.

Person 2: Oh my God, you look so good!

Page 4: Does History Help?
Page 5: Does History Help?

From Seinfeld…

WAITRESS : Tuna on toast, coleslaw, cup of coffee.

GEORGE: Yeah. No, no, no, wait a minute, I always have tuna on toast. Nothing's ever worked out for me with tuna on toast. I want the complete opposite of on toast. Chicken salad, on rye, un-toasted with a side of potato salad ... and a cup of tea.

JERRY: You know chicken salad is not the opposite of tuna, salmon is the opposite of tuna, 'cuz salmon swim against the current, and the tuna swim with it.

GEORGE: Good for the tuna!

Link to Video

Page 6: Does History Help?

Interesting Questions

General aspect:
• Do annotators need context to do each instance of the annotation?
• Can we present them with only the needed previous context?
• How does context affect the stability of the annotation?

Crowdsourcing aspect:
• Should we present the whole dialogue to the annotator if the compensation rate is low?
• Can we consider each annotation task as a stand-alone micro-task?
• Do annotators on Amazon Mechanical Turk read the instructions or the context provided?

Page 7: Does History Help?

So we ran an experiment…

Page 8: Does History Help?

The Idea: A Variable Context Window Size

How is it going? I am Bronson from the Hill restaurant.

I am Milton. I am from the Valley restaurant.

Alright, cool. So looks like we got some good resources on the table. And, uh, we want to find a way that works for both of us.

Uh, yeah I agree. I just want to, we want to maximize both of our profits.

So what do we have right here?
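A minimal sketch (Python, not part of the original slides) of the variable context window idea: each target turn is shown to the annotator together with the k turns that immediately precede it. The function name and the way the turns are listed here are illustrative.

```python
# Sketch: pair each target turn with a variable number of preceding turns.
from typing import List, Tuple

def build_stimuli(turns: List[str], window: int) -> List[Tuple[List[str], str]]:
    """Return (context, target) pairs, where context holds the `window`
    turns immediately preceding the target turn (fewer near the start)."""
    stimuli = []
    for i, target in enumerate(turns):
        context = turns[max(0, i - window):i]
        stimuli.append((context, target))
    return stimuli

# Turns from the example dialogue above (speaker labels omitted).
turns = [
    "How is it going? I am Bronson from the Hill restaurant.",
    "I am Milton. I am from the Valley restaurant.",
    "Alright, cool. So looks like we got some good resources on the table.",
    "And, uh, we want to find a way that works for both of us.",
    "Uh, yeah I agree. I just want to, we want to maximize both of our profits.",
    "So what do we have right here?",
]

for context, target in build_stimuli(turns, window=3):
    print(f"{len(context)} turns of context -> target: {target}")
```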

Page 9: Does History Help?

The Data Set

The “Farmers Market” negotiation dataset:
• 41 dyadic negotiation sessions based on instructions
• Two restaurant owners are trying to divide some items among themselves

Page 10: Does History Help?

The Task: Sentiment Analysis

Emotion Tag          Score   Emotion Embodied
Strongly positive      2     extremely happy or excited toward the topic
Positive               1     generally happy or satisfied, but the emotion wasn't extreme
Neutral                0     not positive or negative
Negative              -1     perceived to be angry or upsetting toward the topic, but not to the extreme
Strongly negative     -2     extremely negative toward the topic

• “Sentiment Annotation Task” on the turns of the dialogue
• 3 dialogues used: D1 (31 turns), D2 (16 turns), D3 (30 turns) = 77 turns
• 5 annotators for each instance: A1, A2, A3, A4, A5
• annotators recruited on Amazon Mechanical Turk
• $0.02 paid for annotating each instance
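The sketch below (not from the slides) restates the rating scale as a label-to-score map and works out the cost implied by the stated numbers, under the assumption that each context window size was run as a separate batch of 77 × 5 paid instances.

```python
# Sentiment scale from the slide, as a label -> score map.
SENTIMENT_SCORES = {
    "Strongly positive": 2,
    "Positive": 1,
    "Neutral": 0,
    "Negative": -1,
    "Strongly negative": -2,
}

# Cost implied by the stated setup (an assumption about how batches were run).
turns_total = 31 + 16 + 30        # D1 + D2 + D3 = 77 turns
annotators_per_turn = 5
pay_per_instance = 0.02           # USD per annotated instance

cost = turns_total * annotators_per_turn * pay_per_instance
print(f"{turns_total} turns -> ${cost:.2f} per context-window condition")  # $7.70
```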

Page 11: Does History Help?

Example Stimuli: Previous Context Window Size = 3

Page 12: Does History Help?

Example Annotation Result

Turn: Person 2: I need the apples so that is done. We get equal bananas and equal strawberries, so… Done!
A1=1, A2=2, A3=1, A4=0, A5=1, AVG=1.0, Gold=1

Turn: Person 1: Perfect!
A1=1, A2=2, A3=1, A4=1, A5=2, AVG=1.4, Gold=2

Turn: Person 2: We have reached an agreement.
A1=1, A2=1, A3=2, A4=1, A5=1, AVG=1.2, Gold=1

Gold annotation: the whole dialogue was presented to the annotator.

Page 13: Does History Help?

Evaluation Method 1: Distance to the Gold Annotation

Context Window Size    D1       D2       D3
0 turns                0.260    0.341    0.236
1 turn                 0.261    0.317*   0.228*
2 turns                0.215*   0.326    0.248
3 turns                0.299    0.349    0.313
4 turns                0.277    0.413    0.268
5 turns                0.238    0.356    0.247
6 turns                0.246    0.341    0.255

(* marks the minimum distance from the gold annotation)
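The slide does not spell out how the distance was computed; one plausible reading, sketched below in Python, is the mean absolute difference between each individual crowd label and the gold label for the same turn. Treat the exact formula as an assumption; the example values reuse the annotation-result slide above.

```python
# Assumed reading of "distance to the gold annotation": mean absolute
# difference between every individual crowd label and the gold label.
from statistics import mean

def distance_to_gold(crowd_labels, gold_labels):
    """crowd_labels: per-turn lists of annotator scores (-2..2);
    gold_labels: per-turn gold scores (-2..2)."""
    diffs = [abs(label - gold)
             for labels, gold in zip(crowd_labels, gold_labels)
             for label in labels]
    return mean(diffs)

# Three turns from the example annotation result.
crowd = [[1, 2, 1, 0, 1], [1, 2, 1, 1, 2], [1, 1, 2, 1, 1]]
gold = [1, 2, 1]
print(round(distance_to_gold(crowd, gold), 3))  # 0.467 for this toy example
```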

Page 14: Does History Help?

Evaluation Method 2: Inter-annotator Agreement

Context Window Size    Krippendorff's alpha
0                      0.0976
1                      0.2165
2                      0.1133
3                      0.2431*
4                      0.1670
5                      0.1923
6                      0.1790

(* marks the maximum agreement)

Hypothesis:
• higher inter-annotator reliability implies more stability
• an indicator of the optimal context window size

The differences between window sizes were not statistically significant according to a t-test, except for the zero-turn (no-context) window.
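A hedged sketch of how such agreement figures can be computed with the third-party krippendorff Python package (pip install krippendorff); the slide does not say which level of measurement was used, so "interval" here is an assumption. The toy reliability matrix reuses the three turns from the annotation-result slide and will not reproduce the table values.

```python
# Krippendorff's alpha over an (annotators x turns) matrix of sentiment scores.
import numpy as np
import krippendorff  # third-party package: pip install krippendorff

# Rows = annotators A1..A5, columns = turns; values are scores in -2..2.
reliability_data = np.array([
    [1, 1, 1],
    [2, 2, 1],
    [1, 1, 2],
    [0, 1, 1],
    [1, 2, 1],
])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="interval")
print(f"Krippendorff's alpha: {alpha:.4f}")
```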

Page 15: Does History Help?

Conclusions (and Considerations)

Our results imply that:
• the number of previous turns doesn't really affect the annotation of the target turn
• it is not necessary to show a large number of previous turns, or the whole dialogue
• a context window size of 3 is perhaps enough to do the job

Considerations:
• the sample size is very small
• the nature of the dialogues and the negotiation task might have affected the results
• our dataset wasn't very emotional; these are not real negotiations or conversations
• the annotation task can also affect the outcome

Page 16: Does History Help?

Future Work

Further investigation is needed into:
• different datasets
• different annotation tasks
• appropriate metrics for measurement
• a suitable baseline annotation for comparison

Questions? Please tell me what you think! Your feedback and ideas are sincerely appreciated!