Systematicity in sentence processing by recurrent neural networks
Stefan Frank, Nijmegen Institute for Cognition and Information, Radboud University Nijmegen, The Netherlands

TRANSCRIPT

Page 1: Systematicity in sentence processing by recurrent neural networks

Systematicity in sentence processing

by recurrent neural networks

Stefan Frank
Nijmegen Institute for Cognition and Information
Radboud University Nijmegen
The Netherlands

Page 2: Systematicity in sentence processing by recurrent neural networks

“Please make it heavy on computers and AI and light on the psycho stuff”

(Konstantopoulos, personal communication, December 23, 2005)

Page 3: Systematicity in sentence processing by recurrent neural networks

Systematicity in language

Imagine you meet someone who only knows two sentences of English:

Could you please tell me where the toilet is?
I can’t find my hotel.

So (s)he does not know:

Could you please tell me where my hotel is?
I can’t find the toilet.

This person has no knowledge of English but simply memorized some lines from a phrase book.

Page 4: Systematicity in sentence processing by recurrent neural networks

Systematicity in language

Human language behavior is (more or less) systematic: if you know some sentences, you know many. Sentences are not atomic but made up of words. Likewise, words can be made up of morphemes.

(e.g., un + clear = unclear, un + stable = unstable, …)

It seems like language results from applying a set of rules (grammar, morphology) to symbols (words, morphemes).

Page 5: Systematicity in sentence processing by recurrent neural networks

Systematicity in language

The Classical symbol system hypothesis: the mind contains word-like symbols that are manipulated by structure-sensitive processes (Fodor & Pylyshyn, 1988). E.g., for dealing with language:

– boy and girl are nouns (N)
– loves and sees are verbs (V)
– N V N is a possible sentence structure

This hypothesis explains the systematicity found in language: If you know the N V N structure, you know all N V N sentences (boy sees girl, girl loves boy, boy sees boy, …)
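To make this concrete, here is a toy sketch (in Python, as are all sketches below) of how one structure-sensitive rule plus two word categories yields every N V N sentence at once; the word lists are the ones from this slide:

```python
from itertools import product

nouns = ["boy", "girl"]    # N
verbs = ["loves", "sees"]  # V

# One structure-sensitive rule: any N V N combination is a sentence.
sentences = [f"{n1} {v} {n2}" for n1, v, n2 in product(nouns, verbs, nouns)]
print(sentences)  # ['boy loves boy', 'boy loves girl', ..., 'girl sees girl']
```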

Page 6: Systematicity in sentence processing by recurrent neural networks

Some issues for the Classical theory

Lack of systematic behavior: Why are people often so unsystematic in practice?

The boy plays. OK
The boy who the girl likes plays. OK
The boy who the girl who the man sees likes plays. OK?
The athlete who the coach who the sponsor hired trained won. OK!

Page 7: Systematicity in sentence processing by recurrent neural networks

Some issues for the Classical theory

Lack of systematic behavior: Why are people often so unsystematic in practice?

Lack of systematicity in language: Why are there exceptions to rules?

help + full = helpful
help + less = helpless
meaning + full = meaningful
meaning + less = meaningless
beauty + full = beautiful
beauty + less = ugly

Page 8: Systematicity in sentence processing by recurrent neural networks

Some issues for the Classical theory

Lack of systematic behavior: Why are people often so unsystematic in practice?

Lack of systematicity in language: Why are there exceptions to rules?

Development: How do children learn the rules from what they hear?

The Classical theory has answers to these questions, but no explanations.

Page 9: Systematicity in sentence processing by recurrent neural networks

Connectionism

The “state of mind” is represented as a pattern of activity over a large number of simple, quantitative (i.e., non-logical) processing units (“neurons”).

These units are connected by weighted links, forming a (neural) network through which activation moves around.

The connection weights are adjusted to the network’s input and task.

The network develops its own internal representation of the input.

It should generalize to new (test) inputs.
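As a concrete illustration, a minimal sketch of one such layer of units; the sigmoid activation and the layer sizes are assumptions, since the slides do not specify them:

```python
import numpy as np

def layer_activation(inputs, weights):
    """One layer of simple quantitative units: weighted sum, then squashing."""
    net = weights @ inputs                # activation flows over the weighted links
    return 1.0 / (1.0 + np.exp(-net))     # graded (non-logical) unit response

rng = np.random.default_rng(0)
pattern = rng.random(18)                        # activity pattern over 18 input units
weights = rng.normal(scale=0.5, size=(10, 18))  # connection weights, tuned by learning
print(layer_activation(pattern, weights))       # the layer's internal representation
```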

Page 10: Systematicity in sentence processing by recurrent neural networks

Connectionism and the Classical issues

Lack of systematic behavior: Systematicity is built on top of an unsystematic architecture.

Lack of systematicity in language: “Beautiless” is expected statistically but never occurs, so the network learns it doesn’t exist.

Development: The network adapts to its input.

But can neural networks explain systematicity, or even behave systematically?

Page 11: Systematicity in sentence processing by recurrent neural networks

Connectionism and systematicity

Fodor & Pylyshyn (1988): Neural networks cannot be systematic. They only learn to associate examples rather than becoming sensitive to structure.

Systematicity: knowing X → knowing Y.
Generalization: training on X → learning Y.
So, systematicity equals generalization (Hadley, 1994).

Demonstrations of connectionist systematicity:
– require many training examples but use only a few test items
– are not robust: oversensitive to training details
– only display weak systematicity: words occur in the same ‘syntactic positions’ in training and test sentences

Page 12: Systematicity in sentence processing by recurrent neural networks

Simple Recurrent Networks (Elman, 1990)

[Figure: feedforward network — input layer → hidden layer → output layer]

Feedforward networks have long-term memory (LTM) but no short-term memory (STM). So how to process sequential input, like the words of a sentence?

An SRN answers this by feeding the hidden layer’s previous state back as extra input, giving the network STM. A common SRN task is next-word prediction: the words of a sentence form the input sequence, and after each word, the output should be the next word.

Page 13: Systematicity in sentence processing by recurrent neural networks

SRNs and systematicity (Van der Velde et al., 2004)

An SRN processed a mini-language with 18 words (boy, girl, loves, sees, who, “.”, …) and 3 sentence types:

– N V N . (boy sees girl.)
– N V N who V N . (boy sees girl who loves boy.)
– N who N V V N . (boy who girl sees loves boy.)

Nouns and verbs were divided into four groups, each with two nouns and two verbs. In training sentences, nouns and verbs were from the same group: fewer than 0.44% of all possible sentences were used for training. In test sentences, nouns and verbs came from different groups. Note: this tests weak systematicity only.
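A sketch of how such a training/test split could be constructed; only boy, girl, loves, and sees are named on the slide, so the other nouns and verbs below are hypothetical fillers:

```python
from itertools import product

# Four groups of two nouns and two verbs each; only "boy", "girl",
# "loves", "sees" come from the slides, the rest are made up here.
groups = [
    (["boy", "girl"],   ["loves", "sees"]),
    (["man", "woman"],  ["likes", "hears"]),
    (["cat", "dog"],    ["chases", "bites"]),
    (["king", "queen"], ["helps", "fears"]),
]

def nvn(nouns, verbs):
    """All 'N V N .' sentences over the given nouns and verbs."""
    return [f"{a} {v} {b} ." for a, v, b in product(nouns, verbs, nouns)]

# Training: nouns and verbs taken from the same group.
train = [s for nouns, verbs in groups for s in nvn(nouns, verbs)]

# Test: verbs combined with nouns from the *other* groups (weak systematicity).
test = [s for i, (_, verbs) in enumerate(groups)
        for j, (nouns, _) in enumerate(groups) if i != j
        for s in nvn(nouns, verbs)]

print(len(train), len(test))  # the training set covers few of the possible sentences
```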

Page 14: Systematicity in sentence processing by recurrent neural networks

SRNs and systematicity (Van der Velde et al., 2004)

SRNs “fail” on test sentences, so:
– They do not generalize to structurally similar sentences
– They cannot learn systematic behavior from a small training set
– They do not form good models of human language behavior

But:
– What does it mean to “fail”? Maybe the network was more than completely non-systematic?
– Was the size of the network appropriate?
  larger network → more STM → better processing?
  smaller network → less LTM → better generalization?
– Was the language complex enough? With more different words there is more reason to abstract to syntactic types (nouns, verbs).

Page 15: Systematicity in sentence processing by recurrent neural networks

SRNs and systematicity: replication of Van der Velde et al. (2004)

What if a network does not generalize at all? When given a new sentence, it can only use the last word, because combining words requires generalization.

This hypothetical, unsystematic network serves as the baseline for rating SRN performance.
– Performance +1: the network never makes ungrammatical predictions
– Performance 0: the network does not generalize at all, but gives the best possible output based on the last word
– Performance –1: the network only makes ungrammatical predictions

Positive performance indicates systematicity.
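A hedged sketch of this scale in code; the uniform baseline used here is a toy stand-in for the last-word-only network, and the study’s exact scoring may differ in detail:

```python
import numpy as np

def performance(output, grammatical, baseline):
    """Score a next-word prediction on the -1..+1 scale described above."""
    gram = output[grammatical].sum() / output.sum()      # mass on legal next words
    base = baseline[grammatical].sum() / baseline.sum()  # same, for the baseline net
    if gram >= base:
        return (gram - base) / (1.0 - base)  # 0 at baseline, +1 when fully grammatical
    return (gram - base) / base              # -1 when fully ungrammatical

grammatical = np.array([2, 5, 7])                # indices of words that may come next
out = np.full(18, 0.01); out[grammatical] = 0.3  # network output over 18 word units
base = np.full(18, 1.0 / 18)                     # toy baseline: uniform prediction
print(performance(out, grammatical, base))       # > 0: some systematicity
```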

Page 16: Systematicity in sentence processing by recurrent neural networks

Network architecture

[Figure: SRN architecture — input layer (w = 18 units, one for each word) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units, one for each word)]
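A minimal sketch of one forward pass through this architecture; the tanh and softmax choices and the weight scales are assumptions, not the study’s settings:

```python
import numpy as np

w, n, h = 18, 20, 10                        # lexicon, recurrent units, hidden units
rng = np.random.default_rng(1)
W_in  = rng.normal(scale=0.1, size=(n, w))  # input -> recurrent layer
W_rec = rng.normal(scale=0.1, size=(n, n))  # recurrent -> recurrent (the STM)
W_hid = rng.normal(scale=0.1, size=(h, n))  # recurrent -> hidden layer
W_out = rng.normal(scale=0.1, size=(w, h))  # hidden -> output layer

def step(word, state):
    """Process one word (as an index) and predict a distribution over next words."""
    x = np.zeros(w); x[word] = 1.0             # one input unit per word
    state = np.tanh(W_in @ x + W_rec @ state)  # new state mixes input with old state
    out = np.exp(W_out @ np.tanh(W_hid @ state))
    return out / out.sum(), state              # next-word prediction, updated STM

state = np.zeros(n)
for word in (0, 8, 1, 17):                     # e.g. "boy sees girl ."
    probs, state = step(word, state)
```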

Page 17: Systematicity in sentence processing by recurrent neural networks

SRN Results

Positive performance at each word of each test sentence type, so there is some systematicity.

Page 18: Systematicity in sentence processing by recurrent neural networks

SRN Results: effect of recurrent layer size

Larger networks (n = 40) do better, but very large ones (n = 100) overfit.

[Chart: performance for sentence types N V N, N V N who V N, and N who N V V N]

Page 19: Systematicity in sentence processing by recurrent neural networks

SRN performance and memory

SRNs do show systematicity to some extent. But their performance is limited:

− small n → limited processing capacity (STM)
− large n → large LTM → overfitting

How to combine large STM with small LTM?

Page 20: Systematicity in sentence processing by recurrent neural networks

Echo State Networks (Jaeger, 2003)

Keep the connections to and within the recurrent layer fixed at random values. The recurrent layer becomes a “dynamical reservoir”: a non-specific STM for the input sequence. Some constraints on the dynamical reservoir:

− large enough
− sparsely connected (here: 15%)
− weight matrix has spectral radius < 1

LTM capacity:
− in SRNs: O(n²)
− in ESNs: O(n)

So, can ESNs combine large STM with small LTM?
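The constraints listed above translate directly into code. A minimal sketch of building such a dynamical reservoir, assuming a rescaling target of 0.9 for the spectral radius:

```python
import numpy as np

def make_reservoir(n, sparsity=0.15, target_radius=0.9, seed=0):
    """Fixed random recurrent weights: sparse, with spectral radius < 1."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, n))
    W *= rng.random((n, n)) < sparsity       # keep ~15% of the connections
    radius = max(abs(np.linalg.eigvals(W)))  # largest eigenvalue magnitude
    return W * (target_radius / radius)      # rescale so input activity fades (echoes)

W_res = make_reservoir(100)                  # never trained: a non-specific STM
```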

Page 21: Systematicity in sentence processing by recurrent neural networks

Network architecture

[Figure: ESN architecture — input layer (w = 18 units) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units); the connections into and within the recurrent layer are untrained, the remaining connections are trained]

The STM remains untrained, but the network does develop internal representations.

Page 22: Systematicity in sentence processing by recurrent neural networks

ESN Results

Positive performance at each word of each test sentence type, so there is some systematicity, but less than in an SRN of the same size.

Page 23: Systematicity in sentence processing by recurrent neural networks

ESN Results: effect of recurrent layer size

Bigger is better: no overfitting even when n = 1530!

[Chart: performance for sentence types N V N, N V N who V N, and N who N V V N]

Page 24: Systematicity in sentence processing by recurrent neural networks

ESN Results: effect of lexicon size (n = 100)

Note: with larger w, a smaller percentage of possible sentences is used for training.

[Chart: performance for sentence types N V N, N V N who V N, and N who N V V N]

Page 25: Systematicity in sentence processing by recurrent neural networks

Strong systematicity

30 words (boy(s), girl(s), like(s), see(s), who, …) and many sentence types:

– N V N . (girl sees boys.)
– N V N who V N . (girl sees boys who like boy.)
– N who N V V N . (girl who boy sees likes boy.)
– N who V N who N V . (girls who like boys see boys who girl likes.)

Unlimited recursion (girls see boy who sees boy who sees man who …)

Number agreement between nouns and verbs
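A sketch of how agreement and recursion could be generated for such a language; the lexicon is only partly from the slides, and only subject relatives (N who V N) are covered:

```python
import random

NOUNS = {"sg": ["boy", "girl", "man", "woman"], "pl": ["boys", "girls", "men", "women"]}
VERBS = {"sg": ["likes", "sees"], "pl": ["like", "see"]}

def noun_phrase(depth=0):
    """A noun, optionally extended with a recursive 'who V N' relative clause."""
    number = random.choice(["sg", "pl"])
    head = random.choice(NOUNS[number])
    if depth < 3 and random.random() < 0.3:  # unlimited in principle, capped here
        inner, _ = noun_phrase(depth + 1)
        head = f"{head} who {random.choice(VERBS[number])} {inner}"  # agrees with head
    return head, number

subj, number = noun_phrase()
obj, _ = noun_phrase()
print(f"{subj} {random.choice(VERBS[number])} {obj} .")  # verb agrees with subject
```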

Page 26: Systematicity in sentence processing by recurrent neural networks

Strong systematicity

In training sentences: females as grammatical subjects, males as grammatical objects (girl sees boy)

In test sentences: vice versa (boy sees girl).

Positive performance on all words of four test sentence types:

– N who V N V N . (boy who likes girls sees woman.)
– N V N who V N . (boy likes girls who see woman.)
– N who N V V N . (boys who man likes see girl.)
– N V N who N V . (boys like girl who man sees.)
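A sketch of this split for simple N V N sentences; the female/male word sets extend the slide’s examples:

```python
FEMALE = {"girl", "girls", "woman", "women"}
MALE = {"boy", "boys", "man", "men"}

def is_training_sentence(sentence):
    """Training keeps female subjects and male objects; tests reverse the roles."""
    subject, _verb, obj, _dot = sentence.split()
    return subject in FEMALE and obj in MALE

print(is_training_sentence("girl sees boy ."))  # True: seen in training
print(is_training_sentence("boy sees girl ."))  # False: held out for testing
```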

Page 27: Systematicity in sentence processing by recurrent neural networks

Conclusions

ESNs can display both weak and strong systematicity, even with few training sentences and many test sentences. By doing less training, the network can learn more:

– Training fewer connections gives better results
– Training on a smaller fraction of the possible sentences gives better results

Can connectionism explain systematicity?
− No, because neural networks do not need to be systematic.
− Yes, because they need to adapt to systematicity in the training input.

The source of systematicity is not the cognitive system, but the external world.