Formal Arguments, Preferences, and Natural Language Interfaces to Humans: an Empirical Evaluation
Talk given during ECAI 2014
Federico Cerutti, Nava Tintarev, Nir Oren
ECAI 2014 — Friday 22nd August, 2014
Motivation
– Distributed autonomous systems are increasingly used
– Reasoning can be formalized as argumentation
– However, if we need to explain this to people, the information presentation needs to be more natural
– Can we create a bridge between natural language and formal argumentation?
– What kind of factors need to be considered?
  - Preferences between arguments?
  - Domain specific knowledge?
Outline
– Background
– The Experiment
– Methodology
– Results
– Conclusions
Background on P&S (Prakken and Sartor)
– Rule-based argumentation framework
– Allows arguments to be expressed in favour of preferences among rules
– Includes negation as failure and strong negation
– Although it predates Dung (1995), it is easy to draw a correspondence with an abstract argumentation framework (there are some points where caution is needed, but they do not arise in this work)
Crash course on P&S
– Each rule has a set of antecedents and a consequent
– Rules are either strict (these cannot contain negation as failure atoms) or defeasible
– Arguments are sequences of rules (instead of a recursive structure as in ASPIC)
– The conclusions of an argument are the set containing the consequent of each rule of the argument
– Attacks:
  - on some antecedent of some rule
  - on some conclusion
– Skeptical semantics: grounded
– Credulous semantics: stable
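To make the definitions above concrete, here is a minimal Python sketch of rules and arguments. The `Rule` class, the `conclusions` helper, and the convention of writing a negation as failure atom as `"~x"` are illustrative assumptions, not notation from Prakken and Sartor; the two rules are borrowed from the political debate example used in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A rule with a set of antecedents and one consequent (hypothetical encoding)."""
    name: str
    antecedents: tuple  # literals; "~x" marks negation as failure on x (our convention)
    consequent: str
    strict: bool = False

    def __post_init__(self):
        # Strict rules cannot contain negation as failure atoms.
        if self.strict and any(a.startswith("~") for a in self.antecedents):
            raise ValueError(f"strict rule {self.name} contains negation as failure")


def conclusions(argument):
    """Conclusions of an argument: the set of consequents of its rules."""
    return {rule.consequent for rule in argument}


# An argument is a sequence of rules, here a1 = <s1, r1>.
s1 = Rule("s1", (), "s_AAA", strict=True)
r1 = Rule("r1", ("s_AAA", "~ex_AAA"), "poorer")
a1 = (s1, r1)

print(sorted(conclusions(a1)))  # ['poorer', 's_AAA']
```

Representing an argument as a plain tuple mirrors the "sequence of rules" view; a recursive structure in the style of ASPIC would need a different data type.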
Example
S (strict rules):
  s1: ⇒ s_AAA
  s2: ⇒ s_BBB
  s3: ⇒ s_doc
D (defeasible rules):
  r1: s_AAA ∧ ∼ex_AAA ⇒ poorer
  r2: s_BBB ∧ s_doc ∧ ∼ex_BBB ∧ ∼ex_doc ⇒ ¬poorer
  r3: ∼ex_expert ⇒ r1 ≺ r2
A politician and an economist discuss the potential financial outcome of the independence of a region X. The politician puts forward an argument in favour of the conclusion “If Region X becomes independent, X’s citizens will be poorer than they are now”. Another argument holding a contradicting conclusion (i.e. Region X will not be poorer) is advanced by the economist. The economist’s opinion is likely to be preferred to that of the politician, and is supported by a scientific document.
Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩}
a2 Args-defeats a1
a2 is justified
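The claim that a2 is justified can be checked on the corresponding abstract framework. The sketch below assumes the defeat relation has already been extracted (only a2 defeats a1) and computes the grounded extension by iterating the characteristic function from the empty set; the function name `grounded_extension` is ours, not part of any library.

```python
def grounded_extension(arguments, defeats):
    """Least fixed point of the characteristic function (grounded semantics)."""
    attackers = {a: {x for (x, y) in defeats if y == a} for a in arguments}
    extension = set()
    while True:
        # An argument is acceptable if each of its attackers is defeated
        # by some member of the current extension.
        acceptable = {
            a for a in arguments
            if all(any((d, b) in defeats for d in extension) for b in attackers[a])
        }
        if acceptable == extension:
            return extension
        extension = acceptable


base_args = {"a1", "a2", "a3"}
base_defeats = {("a2", "a1")}  # assumption: the only defeat in the base case

print(sorted(grounded_extension(base_args, base_defeats)))  # ['a2', 'a3']
```

Under this reading, a2 and a3 are skeptically accepted and a1 is rejected, which matches the statement that a2 is justified.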
The Experiment
– Each participant is presented with a text, written in natural language, followed by a questionnaire
– Between-subjects design across eight texts: each participant is shown a single (randomly selected) text
– Four domains:
  1. weather forecast
  2. political debate
  3. used car sale
  4. romantic relationship
– Two KBs: base case and extended case
– The base case always considers two arguments a1 and a2 with two contradicting conclusions, and a preference in favour of a2
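As an illustration of the design above, the eight texts are the product of four domains and two cases. The sketch below enumerates them with the scenario labels used later in the talk (1.B, 1.E, ...) and draws one at random per participant. The variable names and the seeded random generator are assumptions for illustration; the talk does not describe how the random selection was implemented.

```python
import itertools
import random

DOMAINS = {1: "weather forecast", 2: "political debate",
           3: "used car sale", 4: "romantic relationship"}
CASES = {"B": "base case", "E": "extended case"}

# One scenario label per (domain, case) pair: 1.B, 1.E, ..., 4.E
SCENARIOS = [f"{d}.{c}" for d, c in itertools.product(DOMAINS, CASES)]


def assign_scenario(rng):
    """Between-subjects assignment: each participant sees a single random text."""
    return rng.choice(SCENARIOS)


rng = random.Random(2014)  # seeded only to make the sketch reproducible
print(SCENARIOS)
print(assign_scenario(rng))
```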
The Extended Case for the Example
More recent research disputes the claim of the economist
S (strict rules):
  s1: ⇒ s_AAA
  s2: ⇒ s_BBB
  s3: ⇒ s_doc
  s4: ⇒ s_research
  s5: s_research ⇒ ¬s_doc
D (defeasible rules):
  r1: s_AAA ∧ ∼ex_AAA ⇒ poorer
  r2: s_BBB ∧ s_doc ∧ ∼ex_BBB ∧ ∼ex_doc ⇒ ¬poorer
  r3: ∼ex_expert ⇒ r1 ≺ r2
Args = {a1 = ⟨s1, r1⟩, a2 = ⟨s2, s3, r2⟩, a3 = ⟨r3⟩, a4 = ⟨s4, s5⟩}
a2 Args-defeats a1
a2 Args-defeats a4
a4 Args-defeats a2
Two stable extensions: {a1, a3, a4} and {a2, a3}
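The two stable extensions can be recovered by brute force on the abstract framework induced by the three defeats listed above. This is a minimal sketch: the function name `stable_extensions` is ours, and enumerating all subsets is only reasonable for a framework this small.

```python
from itertools import combinations


def stable_extensions(arguments, defeats):
    """All conflict-free sets that defeat every argument outside the set."""
    args = sorted(arguments)
    found = []
    for size in range(len(args) + 1):
        for subset in combinations(args, size):
            chosen = set(subset)
            conflict_free = not any((x, y) in defeats for x in chosen for y in chosen)
            defeats_rest = all(
                any((x, y) in defeats for x in chosen)
                for y in set(args) - chosen
            )
            if conflict_free and defeats_rest:
                found.append(chosen)
    return found


extended_args = {"a1", "a2", "a3", "a4"}
extended_defeats = {("a2", "a1"), ("a2", "a4"), ("a4", "a2")}

for extension in stable_extensions(extended_args, extended_defeats):
    print(sorted(extension))
# ['a2', 'a3']
# ['a1', 'a3', 'a4']
```

Only a3 belongs to both extensions, so neither a1 nor a2 is skeptically accepted; this is the formal counterpart of the "cannot conclude" position.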
Domain 1: weather forecast
The weather forecasting service of the broadcasting company AAA says that it will rain tomorrow (a1).
Meanwhile, the forecast service of the broadcasting company BBB says that it will be cloudy tomorrow but that it will not rain (a2).
It is also well known that the forecasting service of BBB is more accurate than the one of AAA (a3).
However, yesterday the trustworthy newspaper CCC published an article which said that BBB has cut the resources for its weather forecasting service in the past months, thus making it less reliable than in the past (a4).
Domain 2: political debate
In a TV debate, the politician AAA argues that if Region X becomes independent then X’s citizens will be poorer than now (a1).
Subsequently, financial expert (a3) Dr. BBB presents a document which scientifically shows that Region X will not be worse off financially if it becomes independent (a2).
After that, the moderator of the debate reminds BBB of more recent research by several important economists that disputes the claims in that document (a4).
Domain 3: buying a car
You are planning to buy a second-hand car, and you go to a dealership with BBB, a mechanic who has been recommended to you by a friend (a3).
The salesperson AAA shows you a car and says that it needs very little work done to it (a1).
BBB says it will require quite a lot of work, because in the past he had to fix several issues in a car of the same model (a2).
While you are at the dealership, your friend calls you to tell you that he knows (beyond a shadow of a doubt) that BBB made unnecessary repairs to his car last month (a4).
Domain 4: romance
After several dates, you would like to start a serious relationship with J, but you turn to two friends of yours, AAA and BBB, for advice. You have known BBB for longer than you have known AAA (a3).
AAA tells you that J is lovely and you should go ahead (a1), while BBB suggests that you should be very cautious because J might have a hidden agenda (a2).
After some weeks, CCC, who is also a close friend of BBB, tells you that BBB has been into you for years; BBB is too shy to tell you about their feelings for you, but is still possessive of you (a4).
Formalisation summary
Domain          Base Case   Extended Case   Type of reinstatement
1, weather      1.B         1.E             preference attack
2, politics     2.B         2.E             a2 rebuttal
3, buying car   3.B         3.E             preference attack
4, romance      4.B         4.E             preference rebuttal
Methodology
– Participants are asked to determine which of the following positions they think is accurate:
  - PA: I think that AAA’s position is correct (e.g. “X’s citizens will be poorer than now”)
  - PB: I think that BBB’s position is correct (e.g. “X’s citizens will not be worse off financially”)
  - PU: I cannot determine if either AAA’s or BBB’s position is correct (e.g. “I cannot conclude anything about Region X’s finances”)
– Participants then rate each statement in terms of relevance (for the conclusion) and agreement, on a 7-point scale from Disagree to Agree
Hypotheses
H1: In the base cases (Scenarios 1.B, 2.B, 3.B and 4.B), the majority of participants will agree with BBB’s statement (position PB).
H2: In the extended cases (Scenarios 1.E, 2.E, 3.E and 4.E), the majority of participants will agree that they cannot conclude anything from the text (position PU).
H3: The majority of participants who view a base case scenario will agree with the preference argument, and find it relevant.
Hypotheses H1 and H2
[Figure: bar chart “Distribution of acceptability of actors’ positions”, showing the percentage (%) of participants choosing PA, PB and PU, for base cases and extended cases]
Distribution of the final conclusion PA/PB/PU: base cases, χ²(2, N = 77) = 37.74, p < 0.001; extended cases, χ²(2, N = 84) = 8.0, p < 0.02.
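The two χ² statistics can be re-derived as a sanity check. The sketch below assumes a goodness-of-fit test against equal expected frequencies for PA, PB and PU, and uses per-scenario counts that we reconstructed from the per-scenario percentages reported later in the talk together with N = 77 and N = 84. Both the counts and the choice of test are our assumptions, not figures stated by the authors; under them the reported statistics are reproduced.

```python
import math

# Counts (PA, PB, PU) per scenario, reconstructed from the reported
# percentages. This reconstruction is an assumption, not source data.
BASE = {"1.B": (1, 10, 9), "2.B": (1, 12, 6), "3.B": (0, 15, 7), "4.B": (2, 11, 3)}
EXTENDED = {"1.E": (3, 4, 12), "2.E": (4, 2, 13), "3.E": (5, 5, 11), "4.E": (12, 9, 4)}


def pooled(counts_by_scenario):
    """Sum the PA, PB and PU counts over all scenarios."""
    return [sum(column) for column in zip(*counts_by_scenario.values())]


def chi_square_three_way(observed):
    """Goodness of fit against a uniform split over three positions (df = 2)."""
    assert len(observed) == 3
    expected = sum(observed) / 3
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # With 2 degrees of freedom the chi-square tail probability is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


for label, data in (("base", BASE), ("extended", EXTENDED)):
    statistic, p_value = chi_square_three_way(pooled(data))
    print(label, pooled(data), round(statistic, 2), p_value)
```

With these counts the pooled totals are 4/48/25 for the base cases and 24/20/40 for the extended cases, and the statistics come out at 37.74 and 8.0.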
Hypothesis H3
Participants rate how much (on a scale of 1 to 7) they agree with the following statement (agreement), and whether it is relevant in drawing their conclusion (relevance): “BBB is more trustworthy than AAA.”
Significant difference between the base and the extended cases for agreement (Mann-Whitney U = 1778, Z = −5.0, p < 0.001) and relevance (Mann-Whitney U = 1852, Z = −4.7, p < 0.001).
In addition, the median values both for agreement and relevance are greater for the base cases than for the extended cases.
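For readers who want to relate the U and Z values, the sketch below applies the usual normal approximation for the Mann-Whitney statistic. It assumes group sizes of 77 (base) and 84 (extended), i.e. that every participant rated the statement, and it omits the correction for ties, so it is only expected to land near the reported Z values rather than match them exactly.

```python
import math


def mann_whitney_z(u, n1, n2):
    """Normal approximation for the Mann-Whitney U statistic (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


N_BASE, N_EXTENDED = 77, 84  # assumption: all participants rated the statement

print(round(mann_whitney_z(1778, N_BASE, N_EXTENDED), 2))  # -4.93 (agreement)
print(round(mann_whitney_z(1852, N_BASE, N_EXTENDED), 2))  # -4.68 (relevance)
```

A tie correction shrinks the standard deviation and pushes Z further from zero, which may explain the small gap to the reported −5.0 for agreement; this is a guess, not something the talk states.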
Post Hoc: Motivations
                 Base Cases              Extended Cases
                 PA     PB     PU        PA     PB     PU
1, weather       5.0    50.0   45.0      15.8   21.1   63.2
2, politics      5.3    63.2   31.6      21.1   10.5   68.4
3, buying car    0.0    68.2   31.8      23.8   23.8   52.4
4, romance       12.5   68.8   18.8      48.0   36.0   16.0
Distribution of the final conclusion PA/PB/PU (% of participants per scenario). Fisher (N = 161) = 48.756, p < 0.001, 10000 sampled tables, Monte Carlo approach with 99% confidence interval (MC99).
Post Hoc: Distributions of Base Cases
[Figure: bar chart “Distributions of motivations for PU (scenarios 1.B and 3.B)”, showing the percentage (%) of participants per motivation U1, U2 and U3, for scenarios 1.B and 3.B]
Agreement with the PU position in scenarios 1.B and 3.B. U1: lack of information; U2: domain specific reasons; U3: other.
Post Hoc: Distributions between Base/Extended Cases
Is the distribution of choices (among PA, PB, and PU) in the base case significantly different from the distribution of choices in the corresponding extended case?
YES for the third domain (3.B and 3.E, buying a car): Fisher (N = 43) = 10.693, p < 0.001, 10000 sampled tables, MC99.
NO for the first domain (1.B and 1.E, weather forecasts): Fisher (N = 39) = 3.832, p = 0.187, 10000 sampled tables, MC99.
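The idea of a Monte Carlo test on a contingency table can be sketched as follows. Note the substitutions: this is a permutation test on the Pearson chi-square statistic, not the Fisher exact statistic reported above, and the 2×3 count tables are our reconstruction from the reported percentages (N = 43 and N = 39). It is meant to illustrate the sampling approach, so its p-values are not expected to equal the reported ones.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


def monte_carlo_p(table, samples=10_000, seed=0):
    """Share of label-shuffled tables at least as extreme as the observed one."""
    rng = random.Random(seed)
    # One choice label (column index) per participant, grouped by row.
    choices = [j for row in table for j, count in enumerate(row) for _ in range(count)]
    sizes = [sum(row) for row in table]
    observed = chi_square(table)
    hits = 0
    for _ in range(samples):
        rng.shuffle(choices)
        resampled, start = [], 0
        for size in sizes:
            block = choices[start:start + size]
            resampled.append([block.count(j) for j in range(len(table[0]))])
            start += size
        if chi_square(resampled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (samples + 1)


# Rows: base case, extended case. Columns: PA, PB, PU (reconstructed counts).
car = [[0, 15, 7], [5, 5, 11]]
weather = [[1, 10, 9], [3, 4, 12]]

print("buying car:", monte_carlo_p(car))
print("weather:", monte_carlo_p(weather))
```

At α = 0.05 this sketch gives the same qualitative picture as the slide: a difference for the car domain and none for the weather domain.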
Post Hoc: Distributions Extended Cases
Domain has a significant effect on the distribution of positions in the extended cases: Fisher (N = 84) = 16.308, p < 0.05, 10000 sampled tables, MC99.
Post Hoc: Relevance and Agreement
                             Base cases          Extended cases
                             R_B†      Md∗_B     R_E†     Md∗_E    C.D.‡
Relevance   1, weather       110.38    6.00      82.92    4.00     46.60
            2, politics      107.45    6.00      69.45    4.00     47.19
            3, buying car    118.05    6.50      67.45    4.00     44.38
            4, romance       48.34     2.00      44.40    2.00     46.57
Agreement   1, weather       116.38    6.00      87.18    4.00     46.60
            2, politics      103.34    6.00      65.05    4.00     47.19
            3, buying car    121.93    6.50      64.33    4.00     44.38
            4, romance       44.94     2.00      44.20    2.00     46.57
Statistically significant cases when |Rx − Ry| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05.
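The critical differences can be approximated with the post hoc formula for the Kruskal-Wallis test given by Siegel and Castellan: C.D. = z · sqrt(N(N+1)/12 · (1/n_u + 1/n_v)), where z is the normal quantile for α/(k(k−1)) and k is the number of groups. The sketch assumes k = 8 scenarios, N = 161, and per-scenario group sizes that we reconstructed from the reported percentages; none of these group sizes is stated explicitly in the talk.

```python
import math
from statistics import NormalDist

N_TOTAL = 161
# Reconstructed group sizes per scenario (assumption, inferred from percentages).
GROUP_SIZES = {"1.B": 20, "1.E": 19, "2.B": 19, "2.E": 19,
               "3.B": 22, "3.E": 21, "4.B": 16, "4.E": 25}


def critical_difference(n_u, n_v, n_total=N_TOTAL, k=len(GROUP_SIZES), alpha=0.05):
    """Critical difference between the mean ranks of two groups."""
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    return z * math.sqrt(n_total * (n_total + 1) / 12 * (1 / n_u + 1 / n_v))


def significant(rank_u, rank_v, n_u, n_v):
    """True when the mean ranks differ by more than the critical difference."""
    return abs(rank_u - rank_v) > critical_difference(n_u, n_v)


for domain in "1234":
    n_b, n_e = GROUP_SIZES[f"{domain}.B"], GROUP_SIZES[f"{domain}.E"]
    print(domain, round(critical_difference(n_b, n_e), 2))

# Relevance, buying car: mean ranks 118.05 (3.B) and 67.45 (3.E)
print(significant(118.05, 67.45, GROUP_SIZES["3.B"], GROUP_SIZES["3.E"]))
```

Under these assumptions the computed values fall within about 0.1 of the tabulated critical differences; rounding z to 3.12, as a printed table lookup would, appears to close the gap.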
Post Hoc: Relevance and Agreement
             Scenario 3.B          Scenario 4.B
             R_3.B†    Md∗_3.B     R_4.B†    Md∗_4.B    C.D.‡
Relevance    118.05    6.50        48.34     2.00       47.79
Agreement    121.93    6.50        44.94     2.00       47.79
Statistically significant cases when |Rx − Ry| > C.D.
† Mean rank as computed with the Kruskal-Wallis test
‡ Critical Difference, as computed in [Siegel and Castellan Jr., 1988] cited by [Field, 2009] with α = 0.05.
Conclusions
– Investigation into the relationship between formal systems of defeasible argumentation and arguments in natural language
– Results suggest a correspondence between the formal theory and its representation in natural language
– Preferences are generally applied “following” Prakken and Sartor: importance of being able to represent them
– Humans evaluate preferences depending on the context
  - Collateral knowledge
  - Reversal of preference
Acknowledgement
Research was sponsored by US Army Research laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the U.S. Government, the UK Ministry of Defense, or the UK Government. The US and UK Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
This research has been carried out within the project “Scrutable Autonomous Systems” (SAsSY), funded by the Engineering and Physical Sciences Research Council (EPSRC, UK), grant ref. EP/J012084/1.
References
[Field, 2009] Field, A. (2009). Discovering Statistics Using SPSS (Introducing Statistical Methods series). SAGE Publications Ltd.
[Siegel and Castellan Jr., 1988] Siegel, S. and Castellan Jr., N. J. (1988). Nonparametric Statistics for The Behavioral Sciences. McGraw-Hill Humanities/Social Sciences/Languages.