Systematicity in sentence processing by recurrent neural networks
Post on 12-Jan-2016
Systematicity in sentence processing by recurrent neural networks
Stefan Frank
Nijmegen Institute for Cognition and Information
Radboud University Nijmegen
The Netherlands
"Please make it heavy on computers and AI and light on the psycho stuff" (Konstantopoulos, personal communication, December 23, 2005)
Systematicity in language
Imagine you meet someone who only knows two sentences of English:
"Could you please tell me where the toilet is?"
"I can't find my hotel."
So (s)he does not know:
"Could you please tell me where my hotel is?"
"I can't find the toilet."
This person has no knowledge of English but has simply memorized some lines from a phrase book.
Systematicity in language
Human language behavior is (more or less) systematic: if you know some sentences, you know many.
Sentences are not atomic but made up of words.
Likewise, words can be made up of morphemes (e.g., un + clear = unclear, un + stable = unstable, ...).
It seems that language results from applying a set of rules (grammar, morphology) to symbols (words, morphemes).
Systematicity in language
The Classical symbol-system hypothesis: the mind contains word-like symbols that are manipulated by structure-sensitive processes (Fodor & Pylyshyn, 1988). E.g., for dealing with language:
"boy" and "girl" are nouns (N)
"loves" and "sees" are verbs (V)
N V N is a possible sentence structure
This hypothesis explains the systematicity found in language: if you know the N V N structure, you know all N V N sentences ("boy sees girl", "girl loves boy", "boy sees boy", ...).
Some issues for the Classical theory
Lack of systematic behavior: why are people often so unsystematic in practice?
The boy plays. (OK)
The boy who the girl likes plays. (OK)
The boy who the girl who the man sees likes plays. (OK?)
The athlete who the coach who the sponsor hired trained won. (OK!)
Some issues for the Classical theory
Lack of systematic behavior: why are people often so unsystematic in practice?
Lack of systematicity in language: why are there exceptions to rules?
help + full = helpful
help + less = helpless
meaning + full = meaningful
meaning + less = meaningless
beauty + full = beautiful
beauty + less = ugly
Some issues for the Classical theory
Lack of systematic behavior: why are people often so unsystematic in practice?
Lack of systematicity in language: why are there exceptions to rules?
Development: how do children learn the rules from what they hear?
The Classical theory has answers to these questions, but no explanations.
Connectionism
The state of mind is represented as a pattern of activity over a large number of simple, quantitative (i.e., non-logical) processing units ("neurons").
These units are connected by weighted links, forming a (neural) network through which activation flows.
The connection weights are adjusted to the network's input and task.
The network develops its own internal representation of the input.
It should generalize to new (test) inputs.
Connectionism and the Classical issues
Lack of systematic behavior: systematicity is built on top of an unsystematic architecture.
Lack of systematicity in language: "beautiless" is expected statistically but never occurs, so the network learns it doesn't exist.
Development: the network adapts to its input.
But can neural networks explain systematicity, or even behave systematically?
Connectionism and systematicity
Fodor & Pylyshyn (1988): neural networks cannot be systematic. They only learn to associate examples rather than becoming sensitive to structure.
Systematicity: knowing X → knowing Y. Generalization: training on X → learning Y. So, systematicity equals generalization (Hadley, 1994).
Demonstrations of connectionist systematicity:
- require many training examples but use only a few tests
- are not robust: oversensitive to training details
- only display weak systematicity: words occur in the same syntactic positions in training and test sentences
Simple Recurrent Networks (Elman, 1990)
Feedforward networks have long-term memory (LTM) but no short-term memory (STM). So how can they process sequential input, like the words of a sentence?
[Figure: input layer → hidden layer → output layer]
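Elman's (1990) answer is to feed a copy of the previous hidden state back in as context, giving the feedforward network a short-term memory. A minimal sketch of one such step (layer sizes follow the slides; the weight scales and the omission of the extra 10-unit hidden layer are simplifying assumptions):

```python
import numpy as np

# Minimal sketch of one Elman-style SRN step. Layer sizes follow the
# slides (w = 18 words, n = 20 recurrent units); the weight scales and
# the single-hidden-layer simplification are my assumptions.
rng = np.random.default_rng(0)
n_words, n_hidden = 18, 20

W_in  = rng.normal(0.0, 0.1, (n_hidden, n_words))   # input   -> hidden
W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # context -> hidden
W_out = rng.normal(0.0, 0.1, (n_words, n_hidden))   # hidden  -> output

def srn_step(word_onehot, context):
    """Process one word; `context` is a copy of the previous hidden
    state, which is what gives the network its short-term memory."""
    hidden = np.tanh(W_in @ word_onehot + W_rec @ context)
    output = W_out @ hidden            # scores over possible next words
    return output, hidden

context = np.zeros(n_hidden)               # empty memory at sentence start
word = np.zeros(n_words); word[0] = 1.0    # one-hot code for one word
out, context = srn_step(word, context)
```

The returned hidden state is passed back in as the context for the next word, so the network's prediction at each position can depend on the whole sentence so far.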
SRNs and systematicity (Van der Velde et al., 2004)
An SRN processed a mini-language with:
18 words (boy, girl, loves, sees, who, ".", ...)
3 sentence types:
N V N . (boy sees girl.)
N V N who V N . (boy sees girl who loves boy.)
N who N V V N . (boy who girl sees loves boy.)
Nouns and verbs were divided into four groups, each with two nouns and two verbs.
In training sentences, nouns and verbs were from the same group: < 0.44% of sentences were used for training.
In test sentences, nouns and verbs came from different groups. Note: this tests weak systematicity only.
SRNs and systematicity (Van der Velde et al., 2004)
SRNs fail on test sentences, so:
- they do not generalize to structurally similar sentences
- they cannot learn systematic behavior from a small training set
- they do not form good models of human language behavior
But...
- what does it mean to "fail"? Maybe the network was more than completely non-systematic?
- was the size of the network appropriate?
  larger network → more STM → better processing?
  smaller network → less LTM → better generalization?
- was the language complex enough? With more different words there is more reason to abstract to syntactic types (nouns, verbs).
SRNs and systematicity: replication of Van der Velde et al. (2004)
What if a network does not generalize at all? When given a new sentence, it can only use the last word, because combining words requires generalization.
This hypothetical, unsystematic network serves as the baseline for rating SRN performance.
Performance +1: the network never makes ungrammatical predictions.
Performance 0: the network does not generalize at all, but gives the best possible output based on the last word.
Performance −1: the network only makes ungrammatical predictions.
Positive performance indicates systematicity.
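The three-point scale above can be made concrete. The interpolation rule below is my illustration, not necessarily the exact formula used in the replication: the score compares the probability mass the network puts on grammatical next words against the mass the no-generalization baseline puts there.

```python
import numpy as np

# Hedged sketch of the performance scale: +1 if all output mass falls
# on grammatical next words, 0 at the no-generalization baseline, -1 if
# all mass is ungrammatical. The interpolation rule is my assumption.
def performance(net_probs, grammatical, baseline_probs):
    g_net  = net_probs[grammatical].sum()       # grammatical mass, network
    g_base = baseline_probs[grammatical].sum()  # grammatical mass, baseline
    if g_net >= g_base:                         # at or above the baseline
        return (g_net - g_base) / max(1.0 - g_base, 1e-12)
    return (g_net - g_base) / max(g_base, 1e-12)  # below the baseline

probs = np.array([0.7, 0.2, 0.1])       # network's next-word distribution
base  = np.array([0.5, 0.3, 0.2])       # baseline's next-word distribution
gram  = np.array([True, False, False])  # which next words are grammatical
score = performance(probs, gram, base)  # positive: some systematicity
```

Scoring the baseline against itself gives exactly 0, and a network that never predicts an ungrammatical word gets +1, matching the endpoints of the scale.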
Network architecture
[Figure: input layer (w = 18 units, one for each word) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units, one for each word)]
SRN results
Positive performance at each word of each test sentence type, so there is some systematicity.
SRN results: effect of recurrent layer size
Larger networks (n = 40) do better, but very large ones (n = 100) overfit.
[Figure legend: N V N; N V N who V N; N who N V V N]
SRN performance and memory
SRNs do show systematicity to some extent, but their performance is limited:
small n → limited processing capacity (STM)
large n → large LTM → overfitting
How can a large STM be combined with a small LTM?
Echo State Networks (Jaeger, 2003)
Keep the connections to and within the recurrent layer fixed at random values.
The recurrent layer becomes a "dynamical reservoir": a non-specific STM for the input sequence.
Some constraints on the dynamical reservoir:
- large enough
- sparsely connected (here: 15%)
- weight matrix has spectral radius < 1
LTM capacity:
In SRNs: O(n²)
In ESNs: O(n)
So, can ESNs combine a large STM with a small LTM?
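The reservoir constraints listed above can be sketched directly in code. The reservoir size n and the target radius 0.9 are illustrative choices; the 15% connectivity is the figure given on the slide.

```python
import numpy as np

# Sketch: build a dynamical reservoir satisfying the constraints above.
# n = 100 and the target radius 0.9 are illustrative assumptions;
# 15% connectivity is the figure from the slide.
rng = np.random.default_rng(1)
n = 100
density = 0.15

W = rng.normal(0.0, 1.0, (n, n))
W *= rng.random((n, n)) < density       # sparsify: keep ~15% of weights

rho = max(abs(np.linalg.eigvals(W)))    # current spectral radius
W *= 0.9 / rho                          # rescale so the radius is 0.9 < 1
```

Keeping the spectral radius below 1 makes the reservoir's memory of past inputs fade gradually instead of blowing up, which is what lets it act as a non-specific STM.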
Network architecture
[Figure: input layer (w = 18 units) → recurrent hidden layer (n = 20 units) → hidden layer (10 units) → output layer (w = 18 units); legend: trained vs. untrained connections]
The STM remains untrained, but the network does develop internal representations.
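A common way to train only the readout of such a fixed reservoir is linear (ridge) regression on the collected reservoir states. This is a sketch of that standard recipe; note that the network described here instead trains a small nonlinear hidden layer between reservoir and output, so the ridge readout is a simplifying assumption.

```python
import numpy as np

# Sketch of ESN training: drive the fixed random reservoir with the
# input sequence, collect its states, and fit only the readout weights.
# Ridge regression is a standard ESN readout; the network in the slides
# trains a small nonlinear hidden layer instead, so this is simplified.
rng = np.random.default_rng(2)
n_in, n_res = 18, 100

W_in  = rng.uniform(-0.5, 0.5, (n_res, n_in))      # fixed, untrained
W_res = rng.normal(0.0, 1.0, (n_res, n_res))       # fixed, untrained
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))  # spectral radius 0.9

def run_reservoir(inputs):
    x, states = np.zeros(n_res), []
    for u in inputs:                   # one update per word
        x = np.tanh(W_in @ u + W_res @ x)
        states.append(x)
    return np.array(states)

# Toy data: 50 random one-hot "words"; the target is the next word.
seq = np.eye(n_in)[rng.integers(0, n_in, 50)]
targets = np.roll(seq, -1, axis=0)
X = run_reservoir(seq)                 # (50, n_res) state matrix

# Only these readout weights are learned (ridge-regression solution).
W_out = targets.T @ X @ np.linalg.inv(X.T @ X + 1e-4 * np.eye(n_res))
```

Because only the readout is fitted, the trainable LTM stays small (O(n) weights) while the untrained reservoir supplies the STM.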
ESN results
Positive performance at each word of each test sentence type, so there is some systematicity, but less than in an SRN of the same size.
ESN results: effect of recurrent layer size
Bigger is better: no overfitting even when n = 1530!
[Figure legend: N V N; N V N who V N; N who N V V N]
ESN results: effect of lexicon size (n = 100)
Note: with larger w, a smaller percentage of the possible sentences is used for training.
[Figure legend: N V N; N V N who V N; N who N V V N]
Strong systematicity
30 words (boy(s), girl(s), like(s), see(s), who, ...)
Many sentence types:
N V N . (girl sees boys.)
N V N who V N . (girl sees boys who like boy.)
N who N V V N . (girl who boy sees likes boy.)
N who V N V N who N V . (girls who like boys see boys who girl likes.)
Unlimited recursion (girls see boy who sees boy who sees man who ...)
Number agreement between nouns and verbs
Strong systematicity
In training sentences: females as grammatical subjects, males as grammatical objects (girl sees boy).
In test sentences: vice versa (boy sees girl).
Positive performance on all words of four test sentence types:
N who V N V N . (boy who likes girls sees woman.)
N V N who V N . (boy likes girls who see woman.)
N who N V V N . (boys who man likes see girl.)
N V N who N V . (boys like girl who man sees.)
Conclusions
ESNs can display both weak and strong systematicity, even with few training sentences and many test sentences.
By doing less training, the network can learn more:
- training fewer connections gives better results
- training on a smaller part of the possible sentences gives better results
Can connectionism explain systematicity?
No, because neural networks do not need to be systematic.
Yes, because they need to adapt to systematicity in the training input.
The source of systematicity is not the cognitive system, but the external world.