
Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an Empirical Evaluation

Federico Cerutti Nava Tintarev Nir Oren

ECAI 2014 — Friday 22nd August, 2014


Motivation

– Distributed autonomous systems increasingly used
– Reasoning can be formalized as argumentation
– However, if we need to explain this to people, the information presentation needs to be more natural
– Can we create a bridge between natural language and formal argumentation?
– What kind of factors need to be considered?
  - Preferences between arguments?
  - Domain-specific knowledge?


– Background
– The Experiment
– Methodology
– Results
– Conclusions


Background on P&S (Prakken and Sartor)

– Rule-based argumentation framework
– Allows arguments in favour of preferences among rules to be expressed
– Includes negation as failure and strong negation
– Although it predates Dung (1995), it is easy to draw a correspondence with an abstract argumentation framework (there are some points where we should be cautious, but they do not arise in this work)


Crash course on P&S

– Each rule has a set of antecedents and a consequent
– Strict rules (which cannot contain negation-as-failure atoms) and defeasible rules
– Arguments are sequences of rules (instead of recursive structures as in ASPIC)
– The conclusions of an argument are the set containing the consequent of each rule of the argument
– Attacks:
  - on some antecedent of some rule
  - on some conclusion
– Skeptical semantics: grounded
– Credulous semantics: stable
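The two semantics named above can be sketched on an abstract attack graph. This is a minimal illustration, assuming the rule-based arguments have already been mapped to abstract arguments and a binary attack relation; the function names and the three-argument chain are hypothetical and do not come from the talk.

```python
from typing import FrozenSet, Set, Tuple

Attack = Tuple[str, str]  # (attacker, target)


def grounded_extension(args: Set[str], attacks: Set[Attack]) -> FrozenSet[str]:
    """Least fixed point of the characteristic function of (args, attacks)."""
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    current: Set[str] = set()
    while True:
        # An argument is defended if every attacker is attacked by `current`.
        defended = {
            a for a in args
            if all(any((d, b) in attacks for d in current) for b in attackers[a])
        }
        if defended == current:
            return frozenset(current)
        current = defended


def is_stable(candidate: Set[str], args: Set[str], attacks: Set[Attack]) -> bool:
    """Conflict-free and attacking every argument outside the candidate."""
    conflict_free = not any(
        (x, y) in attacks for x in candidate for y in candidate
    )
    attacks_rest = all(
        any((x, y) in attacks for x in candidate) for y in args - candidate
    )
    return conflict_free and attacks_rest


# Hypothetical chain: a attacks b, b attacks c.
ARGS = {"a", "b", "c"}
ATTACKS = {("a", "b"), ("b", "c")}
print(sorted(grounded_extension(ARGS, ATTACKS)))
print(is_stable({"a", "c"}, ARGS, ATTACKS))
```

In the chain, a is unattacked and defends c against b, so the grounded extension is {a, c}, and the same set is stable.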


Example

S:
s1: ⇒ sAAA
s2: ⇒ sBBB
s3: ⇒ sdoc

D:
r1: sAAA ∧ ∼exAAA ⇒ poorer
r2: sBBB ∧ sdoc ∧ ∼exBBB ∧ ∼exdoc ⇒ ¬poorer
r3: ∼exexpert ⇒ r1 ≺ r2

A politician and an economist discuss the potential financial outcome of the independence of a region X. The politician puts forward an argument in favour of the conclusion “If Region X becomes independent, X’s citizens will be poorer than they are now”. Another argument holding a contradicting conclusion (i.e. Region X will not be poorer) is advanced by the economist. The economist’s opinion is likely to be preferred to that of the politician, and is supported by a scientific document.

Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩}
a2 Args-defeats a1
a2 is justified


The Experiment

– Each participant is presented with a text, written in natural language, followed by a questionnaire
– Between-subjects design across eight texts: each participant is shown a single (randomly selected) text
– Four domains:
  1. weather forecast
  2. political debate
  3. used car sale
  4. romantic relationship
– Two KBs: base case and extended case
– The base case always considers two arguments a1 and a2 with two contradicting conclusions, and a preference in favour of a2


The Extended Case for the Example

More recent research disputes the claim of the economist

S:
s1: ⇒ sAAA
s2: ⇒ sBBB
s3: ⇒ sdoc
s4: ⇒ sresearch
s5: sresearch ⇒ ¬sdoc

D:
r1: sAAA ∧ ∼exAAA ⇒ poorer
r2: sBBB ∧ sdoc ∧ ∼exBBB ∧ ∼exdoc ⇒ ¬poorer
r3: ∼exexpert ⇒ r1 ≺ r2

Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩, a4 = ⟨s4, s5⟩}
a2 Args-defeats a1, a2 Args-defeats a4, a4 Args-defeats a2

Two stable extensions: {a1, a3, a4} and {a2, a3}
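The two stable extensions can be checked by brute force over the defeat relation listed on this slide. The sketch below is an illustration, not the authors' tooling: it treats the arguments as abstract nodes, and `stable_extensions` is a hypothetical helper name. The base case (a2 defeats a1 only) is included for contrast.

```python
from itertools import combinations

ARGS = ["a1", "a2", "a3", "a4"]
# Defeat relations as listed on the slides (attacker, target).
BASE_DEFEATS = {("a2", "a1")}
EXTENDED_DEFEATS = {("a2", "a1"), ("a2", "a4"), ("a4", "a2")}


def stable_extensions(args, defeats):
    """Enumerate every conflict-free set that defeats all outside arguments."""
    found = []
    for size in range(len(args) + 1):
        for subset in combinations(args, size):
            chosen = set(subset)
            conflict_free = not any(
                (x, y) in defeats for x in chosen for y in chosen
            )
            covers_rest = all(
                any((x, y) in defeats for x in chosen)
                for y in args if y not in chosen
            )
            if conflict_free and covers_rest:
                found.append(chosen)
    return found


EXTENDED = stable_extensions(ARGS, EXTENDED_DEFEATS)
BASE = stable_extensions(ARGS[:3], BASE_DEFEATS)  # base case has no a4
print([sorted(e) for e in EXTENDED])
print([sorted(e) for e in BASE])
```

The mutual defeat between a2 and a4 is what splits the extended case into two extensions, whereas the base case has the single extension {a2, a3}.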


Domain 1: weather forecast

The weather forecasting service of the broadcasting company AAA says that it will rain tomorrow (a1).
Meanwhile, the forecast service of the broadcasting company BBB says that it will be cloudy tomorrow but that it will not rain (a2).
It is also well known that the forecasting service of BBB is more accurate than the one of AAA (a3).
However, yesterday the trustworthy newspaper CCC published an article which said that BBB has cut the resources for its weather forecasting service in the past months, thus making it less reliable than in the past (a4).


Domain 2: political debate

In a TV debate, the politician AAA argues that if Region X becomes independent then X’s citizens will be poorer than now (a1).
Subsequently, financial expert (a3) Dr. BBB presents a document; which scientifically shows that Region X will not be worse off financially if it becomes independent (a2).
After that, the moderator of the debate reminds BBB of more recent research by several important economists that disputes the claims in that document (a4).


Domain 3: buying a car

You are planning to buy a second-hand car, and you go to a dealership with BBB, a mechanic whom has been recommended you by a friend (a3).
The salesperson AAA shows you a car and says that it needs very little work done to it (a1).
BBB says it will require quite a lot of work, because in the past he had to fix several issues in a car of the same model (a2).
While you are at the dealership, your friend calls you to tell you that he knows (beyond a shadow of a doubt) that BBB made unnecessary repairs to his car last month (a4).


Domain 4: romance

After several dates, you would like to start a serious relationship with J. but you turn to ask two friends of yours, AAA and BBB, for advice. You have known BBB for longer than you have known AAA (a3).
AAA tells you that J is lovely and you should go ahead (a1), while BBB suggests that you should be very cautious because J might have a hidden agenda (a2).
After some weeks, CCC, who is also a close friend of BBB, tells you that BBB has been into you for years; BBB is too shy to tell you about their feelings about you, but are still possessive of you (a4).


Formalisation summary

Domain          Base case   Extended case   Type of reinstatement
1, weather      1.B         1.E             preference attack
2, politics     2.B         2.E             a2 rebuttal
3, buying car   3.B         3.E             preference attack
4, romance      4.B         4.E             preference rebuttal


Methodology

Participants are asked to determine which of the following positions they think is accurate:

– PA: I think that AAA’s position is correct (e.g. “X’s citizens will be poorer than now”)
– PB: I think that BBB’s position is correct (e.g. “X’s citizens will not be worse off financially”)
– PU: I cannot determine if either AAA’s or BBB’s position is correct (e.g. “I cannot conclude anything about Region X’s finances”)

Participants also rate statements in terms of relevance (for the conclusion) and agreement, on a 7-point scale from Disagree to Agree for each statement


Hypotheses

– H1: In the base cases (Scenarios 1.B, 2.B, 3.B and 4.B), the majority of participants will agree with BBB’s statement (position PB).
– H2: In the extended cases (Scenarios 1.E, 2.E, 3.E and 4.E), the majority of participants will agree that they cannot conclude anything from the text (position PU).
– H3: The majority of participants who view a base case scenario will agree with the preference argument, and find it relevant.


Hypotheses H1 and H2

[Figure: Distribution of acceptability of actors’ positions. Bar chart of the percentage (%) of participants choosing PA, PB and PU, for base cases and extended cases.]

Distribution of the final conclusion PA/PB/PU. Base cases: χ²(2, N=77) = 37.74, p < 0.001; extended cases: χ²(2, N=84) = 8.0, p < 0.02
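The reported χ² values can be reproduced with a goodness-of-fit test against a uniform split over PA, PB and PU. Two assumptions are made here and are not stated on the slides: the null hypothesis is a uniform distribution, and the pooled counts below are inferred from the rounded per-domain percentages on the post hoc slide (they sum to the reported N=77 and N=84).

```python
import math


def chi_square_uniform(observed):
    """Goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_two_df(statistic):
    """Upper tail of the chi-square distribution with 2 degrees of freedom.

    For 2 degrees of freedom the survival function is exactly exp(-x / 2).
    """
    return math.exp(-statistic / 2)


# Inferred pooled counts in the order PA, PB, PU (assumption, see lead-in).
BASE_COUNTS = [4, 48, 25]
EXTENDED_COUNTS = [24, 20, 40]

for label, counts in (("base", BASE_COUNTS), ("extended", EXTENDED_COUNTS)):
    stat = chi_square_uniform(counts)
    print(f"{label}: N={sum(counts)}, chi2={stat:.2f}, p={p_value_two_df(stat):.2g}")
```

Under these assumptions the statistics come out at 37.74 and 8.0, matching the slide, which suggests the inferred counts and the uniform null are at least consistent with what was reported.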


Hypothesis H3

Participants rate how much (on a scale of 1 to 7) they agree with the following statement (agreement), and whether it is relevant in drawing their conclusion (relevance): “BBB is more trustworthy than AAA.”

Significant difference between the base and the extended cases for agreement (Mann-Whitney U(1778), Z=−5.0, p < 0.001) and relevance (Mann-Whitney U(1852), Z=−4.7, p < 0.001).

In addition, the median values both for agreement and relevance are greater for the base cases than for the extended cases


Post Hoc: Motivations

                Base cases (%)          Extended cases (%)
Domain          PA     PB     PU        PA     PB     PU
1, weather      5.0    50.0   45.0      15.8   21.1   63.2
2, politics     5.3    63.2   31.6      21.1   10.5   68.4
3, buying car   0.0    68.2   31.8      23.8   23.8   52.4
4, romance      12.5   68.8   18.8      48.0   36.0   16.0

Distribution of the final conclusion PA/PB/PU. Fisher (N = 161) = 48.756, p < 0.001, 10000 sampled tables, Monte Carlo approach with 99% confidence interval (MC99)
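The slides report percentages only, but the smallest group sizes compatible with the rounded values can be recovered by search. The sketch below is an inference rather than reported data: `smallest_counts` is a hypothetical helper, and it assumes each percentage was rounded to one decimal place.

```python
REPORTED = {
    # scenario: (PA %, PB %, PU %) as shown in the table
    "1.B": (5.0, 50.0, 45.0), "1.E": (15.8, 21.1, 63.2),
    "2.B": (5.3, 63.2, 31.6), "2.E": (21.1, 10.5, 68.4),
    "3.B": (0.0, 68.2, 31.8), "3.E": (23.8, 23.8, 52.4),
    "4.B": (12.5, 68.8, 18.8), "4.E": (48.0, 36.0, 16.0),
}


def smallest_counts(percentages, max_n=100):
    """Smallest integer counts whose percentages round to the reported ones."""
    for n in range(1, max_n + 1):
        counts = [round(p * n / 100) for p in percentages]
        fits = all(
            abs(100 * c / n - p) <= 0.05 + 1e-9
            for c, p in zip(counts, percentages)
        )
        if sum(counts) == n and fits:
            return counts
    return None


COUNTS = {name: smallest_counts(pcts) for name, pcts in REPORTED.items()}
BASE_N = sum(sum(c) for name, c in COUNTS.items() if name.endswith(".B"))
EXTENDED_N = sum(sum(c) for name, c in COUNTS.items() if name.endswith(".E"))

for name, counts in COUNTS.items():
    print(name, counts, "n =", sum(counts))
print("base N =", BASE_N, "extended N =", EXTENDED_N)
```

The recovered sizes add up to 77 base and 84 extended participants, 161 in total, which agrees with the N values quoted with the test statistics; larger multiples of each group size would also fit the percentages but not those totals.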


Post Hoc: Distributions of Base Cases

[Figure: Distributions of motivations for PU (scenarios 1.B and 3.B). Bar chart of the percentage (%) of participants giving motivation U1, U2 or U3, for scenarios 1.B and 3.B.]

Agreement with the PU position in scenarios 1.B and 3.B. U1: lack of information; U2: domain specific reasons; U3: other


Post Hoc: Distributions between Base/Extended Cases


Is the distribution of choices (among PA, PB, and PU) in the base case significantly different from the distribution of choices in the corresponding extended case?

– YES for the third domain (3.B and 3.E, buying a car): Fisher (N = 43) = 10.693, p < 0.001, 10000 sampled tables, MC99.
– NO for the first domain (1.B and 1.E, weather forecasts): Fisher (N = 39) = 3.832, p = 0.187, 10000 sampled tables, MC99.


Post Hoc: Distributions Extended Cases


Among the extended cases, domain has a significant effect on the distribution of positions: Fisher (N = 84) = 16.308, p < 0.05, 10000 sampled tables, MC99.


Post Hoc: Relevance and Agreement

                             Base cases        Extended cases
                             R_B†     Md*_B    R_E†     Md*_E    C.D.‡
Relevance   1, weather       110.38   6.00     82.92    4.00     46.60
            2, politics      107.45   6.00     69.45    4.00     47.19
            3, buying car    118.05   6.50     67.45    4.00     44.38
            4, romance       48.34    2.00     44.40    2.00     46.57
Agreement   1, weather       116.38   6.00     87.18    4.00     46.60
            2, politics      103.34   6.00     65.05    4.00     47.19
            3, buying car    121.93   6.50     64.33    4.00     44.38
            4, romance       44.94    2.00     44.20    2.00     46.57

Statistically significant cases when |Rx − Ry| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05.
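The decision rule |Rx − Ry| > C.D. can be applied mechanically to the rows of the table. The sketch below only restates the slide's numbers; the tuple layout and the name `exceeds_critical_difference` are illustrative choices.

```python
ROWS = [
    # (measure, domain, mean rank base, mean rank extended, critical difference)
    ("relevance", "1, weather", 110.38, 82.92, 46.60),
    ("relevance", "2, politics", 107.45, 69.45, 47.19),
    ("relevance", "3, buying car", 118.05, 67.45, 44.38),
    ("relevance", "4, romance", 48.34, 44.40, 46.57),
    ("agreement", "1, weather", 116.38, 87.18, 46.60),
    ("agreement", "2, politics", 103.34, 65.05, 47.19),
    ("agreement", "3, buying car", 121.93, 64.33, 44.38),
    ("agreement", "4, romance", 44.94, 44.20, 46.57),
]


def exceeds_critical_difference(rank_x, rank_y, critical_difference):
    """Slide's rule: significant when |Rx - Ry| > C.D."""
    return abs(rank_x - rank_y) > critical_difference


SIGNIFICANT = [
    (measure, domain)
    for measure, domain, rank_base, rank_ext, cd in ROWS
    if exceeds_critical_difference(rank_base, rank_ext, cd)
]

for measure, domain, rank_base, rank_ext, cd in ROWS:
    gap = abs(rank_base - rank_ext)
    print(f"{measure:9s} {domain:14s} gap={gap:6.2f} C.D.={cd:5.2f}")
print(SIGNIFICANT)
```

With the tabulated values, only the buying-a-car domain clears the critical difference between base and extended case, for both relevance and agreement.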


Post Hoc: Relevance and Agreement

            Scenario 3.B        Scenario 4.B
            R_3.B†   Md*_3.B    R_4.B†   Md*_4.B    C.D.‡
Relevance   118.05   6.50       48.34    2.00       47.79
Agreement   121.93   6.50       44.94    2.00       47.79

Statistically significant cases when |Rx − Ry| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05.
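The critical difference itself can be approximated from the pairwise-comparison formula that follows a Kruskal-Wallis test, C.D. = z · sqrt(N(N+1)/12 · (1/n_u + 1/n_v)), with z taken at α / (k(k−1)). Several inputs below are assumptions rather than figures from the slides: k = 8 groups (four domains, two cases), N = 161 participants overall, and group sizes of 22 (3.B) and 16 (4.B) inferred from the rounded percentages.

```python
import math
from statistics import NormalDist


def critical_difference(total_n, n_u, n_v, groups, alpha=0.05):
    """Critical difference between two mean ranks after a Kruskal-Wallis test."""
    # Upper-tail z value at alpha split over k(k-1) comparisons.
    z = NormalDist().inv_cdf(1 - alpha / (groups * (groups - 1)))
    spread = total_n * (total_n + 1) / 12 * (1 / n_u + 1 / n_v)
    return z * math.sqrt(spread)


# Assumed inputs: 161 participants, 8 scenario groups, 22 and 16 per group.
CD_3B_4B = critical_difference(total_n=161, n_u=22, n_v=16, groups=8)

RELEVANCE_GAP = abs(118.05 - 48.34)
AGREEMENT_GAP = abs(121.93 - 44.94)
print(f"C.D. = {CD_3B_4B:.2f}")
print(f"relevance gap = {RELEVANCE_GAP:.2f}, agreement gap = {AGREEMENT_GAP:.2f}")
```

Under these assumptions the computed value lands within a few hundredths of the tabulated 47.79; the small gap may come from rounding of the critical z value. Both mean-rank gaps between 3.B and 4.B exceed it by a wide margin.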


Conclusions

– Investigation into the relationship between formal systems of defeasible argumentation and arguments in natural language
– Results suggest a correspondence between the formal theory and its representation in natural language
– Preference generally applied “following” Prakken and Sartor: importance of being able to represent them
– Humans evaluate preference depending on the context
  - Collateral knowledge
  - Reverse of preference


Acknowledgement

Research was sponsored by US Army Research Laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defense, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

This research has been carried out within the project “Scrutable Autonomous Systems” (SAsSY), funded by the Engineering and Physical Sciences Research Council (EPSRC, UK), grant ref. EP/J012084/1.


References

[Field, 2009] Field, A. (2009). Discovering Statistics Using SPSS (Introducing Statistical Methods series). SAGE Publications Ltd.

[Siegel and Castellan Jr., 1988] Siegel, S. and Castellan Jr., N. J. (1988). Nonparametric Statistics for The Behavioral Sciences. McGraw-Hill Humanities/Social Sciences/Languages.
