TRANSCRIPT
Connectionist modeling of sentence comprehension as mental simulation
in simple microworld
Igor Farkaš
Department of Applied Informatics
Comenius University, Bratislava
AI seminar, October 2009, FIIT STU Bratislava
How does human cognition work?
[Diagram: brain, body and environment coupled through perception and action; cognition spans this loop.]
● What is cognition?
● Where and how is knowledge represented?
Symbolic knowledge representation
● properties
– symbols, transduced from perceptual inputs
– (conceptual) symbols are amodal (new repr. language)
– mental narratives using “inner speech” or words
– cognition separated from perception
● Virtues (of this expressively powerful type of KR)
– Productivity, type-token distinction, categorical inferences, accounts for abstractness, compositionality, propositions
● Problems
– lacking empirical evidence; symbol grounding problem (no explanation of transduction); poor integration with other sciences
– neither parsimonious, nor falsifiable (post-hoc accounts)
Embodied knowledge representation
● properties
– Symbols are perceptual, derived from perceptual inputs
– (conceptual) symbols are modal
– mental narratives are modality-tied (e.g. perceptual simulations)
– cognition overlaps with perception
● Virtues
– accumulated empirical evidence, symbol grounding solved, accounts for abstractness, makes predictions for experiments
● difficulties
– abstractness, type-token distinctions, categorical inferences
Amodal vs Perceptual Symbol System
(Barsalou, 1999)
Meaning - a key concept for cognition
● What is meaning?
– content carried by signs during communication with environment
● realist semantics
– Extensional – meanings as objects in the world (Frege, Tarski)
– Intensional – meanings as mappings to possible worlds (Kripke)
● cognitive semantics
– meanings as mental entities (Barsalou, Lakoff, Rosch, ...)
● Meanings go beyond language
– linguistic view too restricted
– cf. functionalist semantics (Wittgenstein,...), speech acts
Meanings in language comprehension
● Are propositions necessary?
– Barsalou: yes; his belief: they can be realized by (mental) simulators
● Mental simulation as alternative theory
● empirical evidence, e.g. Stanfield & Zwaan 2001
– “John put the pencil in the cup / drawer”
– How to get from in(pencil, cup) to orientation(pencil, vertical)?
● theory of text understanding:
3 levels of representation (Kintsch & van Dijk, 1978)
● surface level – e.g. "Pencil is in cup." / "There is a pencil in the cup."
● propositional level – e.g. in(pencil, cup)
● situational level – goes beyond language
Sentence comprehension in neural nets
● typically off-line training mode (no autonomy)
● distributed representations involved
● earlier NN models – use propositional representations (usually prepared before-hand)
– e.g. Hadley, Desay, Dominey, St. John & McClelland, Miikkulainen, Mayberry et al, …
● our approach – based on (distributed) situational representations
– motivated by Frank et al's (2003-) work
InSOMnet
(Mayberry & Miikkulainen, 2003, in press)
Minimal Recursion Semantics framework
Situation space of a microworld
● situational space is built from example situations, exploiting their statistical properties (constraints), in self-organized way
● representations are analogical (cf. Barsalou's perceptual symbols) and non-compositional
● microworld of Frank et al (2003-)
– 3 persons, engaged in various activities at various places, jointly or independently
– Situation ~ consists of basic events
– operates on 'higher' level: using amodal reps
● Our (object) microworld
– max. 2 objects in a scene, various positions, identity and color
– Situation ~ consists of object properties (rather than events)
– hence, representations are modal
Microworld properties / constraints
● small 2D grid microworld (3x3)
● max. 2 objects (blocks, pyramids) simultaneously present in a situation, two colours (red, blue)
● Microworld constraints:
– all objects are subject to gravity
– only one object at a time can be held in the air (by an arm)
– pyramid is an unstable object (cannot support another object)
=> objects are more likely to be on the ground
Building a situational space
● train a self-organizing map (SOM) with possible example situations
● Situations presented to SOM in the form of binary proposition vectors – specifying object position & features (two visual streams):
[x1 y1 x2 y2 | id1 id2 clr1 clr2]
"where" | "what", e.g. [0110 1100 0011 0011 | 01 10 00 11]
● Position encoding (x / y): left = 1100 / bottom = 1100, middle = 0110 / middle = 0110, right = 0011 / up = 0011
● Property encoding: block = [10], pyramid = [01]; red = [10], blue = [01]
● Situational representations = (non-propositional) distributed output activations of SOM: each unit i carries a 24-dim membership vector μ_i = [μ_i(p), ..., μ_i(q)]
(Kohonen, 1995)
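Building the situational space amounts to standard Kohonen SOM training on the binary proposition vectors. A minimal sketch, assuming a 12×12 map as in the talk; the learning-rate and neighborhood schedules and the function name are illustrative, not the talk's actual settings:

```python
import numpy as np

def train_som(data, n_units=144, epochs=50, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal SOM training sketch for binary situation vectors.
    n_units=144 assumes the 12x12 map from the talk; lr0/sigma0 schedules
    are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    side = int(np.sqrt(n_units))
    w = rng.uniform(0.0, 1.0, size=(n_units, data.shape[1]))
    # 2D grid coordinate of each unit, used by the neighborhood function
    coords = np.array([(i // side, i % side) for i in range(n_units)], float)
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)              # decaying learning rate
        sigma = sigma0 * (1.0 - t / epochs) + 0.5  # shrinking neighborhood
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(w - x, axis=1))  # best-matching unit
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))   # Gaussian neighborhood
            w += lr * h[:, None] * (x - w)         # move weights toward input
    return w
```

After training, each weight row acts as one unit's membership profile over the 24 basic properties.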
Propositions – occurrence of properties
● Microworld is described by example situations (non-linguistic description)
● Each situation j: proposition vector = a boolean combination of 24 basic properties:
b_j = [b_j(p), b_j(q), ...]
– b_j(p) indicates whether basic property p occurs in situation j
– there exist dependencies b/w components (properties)
● Rules of fuzzy logic applicable for combination of properties:
b_j(¬p) = 1 − b_j(p)
b_j(p∧q) = b_j(p)·b_j(q); we used instead: min{b_j(p), b_j(q)}
b_j(p∨q) = b_j(p) + b_j(q) − b_j(p∧q)
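The fuzzy combination rules translate directly into code. A minimal sketch (function names are mine), using min for conjunction as the talk does:

```python
def b_not(bp):
    """Fuzzy negation: b(¬p) = 1 - b(p)."""
    return 1.0 - bp

def b_and(bp, bq):
    """Fuzzy conjunction; the talk uses min instead of the product b(p)*b(q)."""
    return min(bp, bq)

def b_or(bp, bq):
    """Fuzzy disjunction: b(p∨q) = b(p) + b(q) - b(p∧q)."""
    return bp + bq - b_and(bp, bq)
```

On strictly boolean inputs these reduce to ordinary propositional logic; on graded values they yield the belief arithmetic used below.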
Probabilities and beliefs about properties
● A priori probability of occurrence of property p in the microworld (K = 275 example situations):
Prob(p) = (1/K) Σ_{j=1..K} b_j(p)
● Dimensionality reduction (K situations → n = 12×12 map units); corresponding prior belief in p in the DSS:
τ(p) = (1/n) Σ_{i=1..n} μ_i(p)
● SOM accurately approximates microworld probabilities by beliefs in DSS (CorrCoef ≈ 0.98)
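Both quantities are simple column means. A sketch under the talk's definitions (function names are mine); the reported CorrCoef ≈ 0.98 would be obtained by correlating the two resulting vectors:

```python
import numpy as np

def prior_probabilities(B):
    """A priori probability of each property: Prob(p) = (1/K) * sum_j b_j(p).
    B: (K, 24) matrix of proposition vectors, one row per example situation."""
    return B.mean(axis=0)

def prior_beliefs(M):
    """Prior belief per property in the DSS: tau(p) = (1/n) * sum_i mu_i(p).
    M: (n, 24) matrix of membership values mu_i(p), one row per map unit."""
    return M.mean(axis=0)
```

With the talk's data, `np.corrcoef(prior_probabilities(B), prior_beliefs(M))` is where the ≈ 0.98 agreement would show up.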
SOM representations of basic properties
Extracting beliefs from SOM output
● SOM: neurons i = 1, 2, ..., n
● For each proposition p and each neuron i:
membership value μ_i(p) = extent to which neuron i contributes to representing property p
the whole map: μ(p) = [μ_1(p), μ_2(p), ..., μ_n(p)]
● Assume generated situation vector (SOM output) X = [x_1, x_2, ..., x_n]
● Belief in p in situation X, the fuzzy analogue of the conditional probability P(p|X) = P(p∧X) / P(X):
τ(p|X) = Σ_i min{μ_i(p), x_i} / Σ_i x_i
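The belief-extraction formula is a one-liner over the map. A sketch (the function name is mine):

```python
import numpy as np

def belief(mu_p, x):
    """Belief in property p given SOM output X:
    tau(p|X) = sum_i min(mu_i(p), x_i) / sum_i x_i,
    the fuzzy analogue of P(p|X) = P(p ∧ X) / P(X).
    mu_p: membership vector [mu_1(p), ..., mu_n(p)]; x: SOM activations."""
    mu_p, x = np.asarray(mu_p, float), np.asarray(x, float)
    return np.minimum(mu_p, x).sum() / x.sum()
```

The min in the numerator is the same fuzzy conjunction used for combining properties, applied unit-wise across the map.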
Modeling text comprehension
● microlanguage with 13 words:
red, blue, block, pyramid, left, right, on-top, up, in-middle, bottom, above, just, '.'
● Length: 4-5 (1 obj), 7-8 (2 obj)
● Word encoding: localist
● Standard Elman network, with 13-h-144 units
● trained via error back-propagation learning algorithm
● (in general) a rather complex mapping; simplified scheme used (1 sentence ~ 1 situation)
● Input sequence: "red block in-middle blue pyramid up right ."
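The architecture above is a standard Elman simple recurrent network. A forward-pass sketch with the layer sizes and weight-initialization range from the slides (class and method names are illustrative; training via back-propagation is omitted):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ElmanNet:
    """Sketch of the Elman network from the talk: 13 localist word inputs ->
    hidden layer (with a context copy of the previous hidden state) ->
    144-dim situational output. Forward pass only."""
    def __init__(self, n_in=13, n_hid=110, n_out=144, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.1, 0.1, (n_hid, n_in))
        self.W_ctx = rng.uniform(-0.1, 0.1, (n_hid, n_hid))
        self.W_out = rng.uniform(-0.1, 0.1, (n_out, n_hid))
        self.h = np.zeros(n_hid)  # context: previous hidden activation

    def step(self, x):
        """Process one localist word vector; return the situation output."""
        self.h = sigmoid(self.W_in @ x + self.W_ctx @ self.h)
        return sigmoid(self.W_out @ self.h)
```

A sentence is processed by calling `step` once per word; the output after the final '.' is compared against the target DSS vector.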
Rules for sentence generation
● Object 1 - always specified with absolute position
● If object 2 shares one coordinate with object 1,
then object 2 is given relative position
– e.g. “red block in-middle red pyramid above .”
otherwise absolute position
– e.g. “red block in-middle red pyramid up right .”
● If an object lies at the bottom alone, its posY is not specified by any word.
● For relative pos: just left (distance 1) or left (distance 2)
● In-middle – ambiguous (applies to both coordinates)
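The distance rule for relative positions can be sketched for the horizontal case; this is my illustrative reading of the slide (function name and the `None` convention are mine), assuming grid columns x ∈ {0, 1, 2}:

```python
def horizontal_relation(x1, x2):
    """Relative-position word for object 2 w.r.t. object 1 (same row):
    distance 1 maps to 'just left'/'just right', distance 2 to bare
    'left'/'right'; same column yields no horizontal word."""
    d = x2 - x1
    if d == 0:
        return None  # same column: no horizontal relation word
    word = "left" if d < 0 else "right"
    return f"just {word}" if abs(d) == 1 else word
```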
Simulation setup
● hidden layer size manipulated (60-120 units)
● logistic units at hidden and output layers
● all network weights randomly initialized in (−0.1, +0.1)
● constant learning rate = 0.05
● weights updated at each step
● target (DSS vector) fixed during sentence presentation
● average results reported (over 3 random splits)
● training set: 200 sentences, test set: 75 sentences
● training: 4000 epochs
Sentence comprehension score
Evolution of comprehension score during sentence processing (110 hidden units; evaluated at the end of sentences)

Comprehension score:
score(p|S) = (τ(p|S) − τ(p)) / (1 − τ(p))   if τ(p|S) > τ(p)
score(p|S) = (τ(p|S) − τ(p)) / τ(p)         otherwise
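The score compares the belief after the sentence against the prior belief, normalized so that it lies in [−1, 1]. A direct sketch (the function name is mine):

```python
def comprehension_score(tau_p_given_S, tau_p):
    """Comprehension score for property p after sentence S:
    positive when the sentence raises the belief above the prior tau(p),
    negative when it lowers it; normalized to [-1, 1]."""
    if tau_p_given_S > tau_p:
        return (tau_p_given_S - tau_p) / (1.0 - tau_p)
    return (tau_p_given_S - tau_p) / tau_p
```

A score of 1 means the sentence made the property certain; 0 means it told the network nothing beyond the prior.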
Merging syntax with semantics
● NN forced to simultaneously learn to predict next words (in addition to situational representation)
● internal representations shared
● Prediction measure used: normalized negative log-likelihood (outputs first converted to probs of the next word):
NNL ∝ −⟨log p(w_next | ctx)⟩
[Diagram: current word → hidden layer (with delayed context copy) → next-word probabilities]
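The core of the NNL measure is an average negative log-probability of the observed next words. A sketch up to the normalization constant implied by the "∝" (function name is mine):

```python
import numpy as np

def nnl(probs, targets):
    """Negative log-likelihood of next-word predictions, averaged over
    word positions: -<log p(w_next | context)>. The slide's NNL is
    proportional to this (its normalization constant is not given).
    probs: (T, V) rows of next-word probabilities (already normalized);
    targets: length-T indices of the actually observed next words."""
    probs = np.asarray(probs, float)
    ll = np.log(probs[np.arange(len(targets)), targets])
    return -ll.mean()
```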
Prediction results
# hidden units   Model 1: compreh. score   Model 2: compreh. score   Model 2: NNL
                 (trn / tst)               (trn / tst)               (trn / tst)
90               .61 / .42                 .61 / .44                 .34 / .42
100              .62 / .47                 .67 / .40                 .30 / .41
110              .64 / .43                 .64 / .44                 .31 / .37
The lower the NNL, the better the prediction.
Model 1: w/out next-word predictionModel 2: with next-word prediction
Breaking down comprehension score
Most difficult testing predictions
Lowest compreh. score (<.1):
● Situations with two objects, at least one not at bottom.
● Situations that were more different from all training sentences (by 2 properties).
=>
2 degrees of generalization (underlying systematicity)
Summary
● We presented a connectionist approach to (simple) sentence comprehension based on (mental simulation of) distributed situational representations of the block microworld.
● Situational representations are grounded in vision (what+where info), constructed online from example situations.
● Sentence understanding was evaluated by comprehension score which was in all cases positive.
● The model can learn both semantics and syntax at the same time.
● Questions: Scaling up (non-propositional reps)? How about abstract concepts?
Thank you for your attention.