Contrasting Approaches To Semantic Knowledge Representation and Inference

Psychology 209, February 15, 2013
The Nature of Cognition

• Many view human cognition as inherently
  – Structured
  – Systematic
  – Rule-governed
• We argue instead that cognition (and the domains to which cognition is applied) is inherently
  – Quasi-regular
  – Semi-systematic
  – Context-sensitive
Emergent vs. Stipulated Structure
Old London vs. Midtown Manhattan
Natural vs. Stipulated Structure
Where does structure come from?
• It’s built in
• It’s learned from experience
• There are constraints built in that shape what's learned
  – We all agree about this!
What is the nature of the constraints?
• Do they specify types of structural forms?
  – Kemp & Tenenbaum
• Or are they more generic?
  – Many other probabilistic models
  – Most work in the PDP framework
• Are they constraints on the process and mechanism? Or on the space of possible outcomes?
Outline
• The Rumelhart model of semantic cognition, and how it relates to probabilistic models of semantic cognition.
• Some claims it instantiates about the nature of semantic cognition that distinguish it from K&T
  – Data supporting these claims, and challenging K&T
• Toward a fuller characterization of what the Rumelhart model learns (and one of the things it can do that sustains our interest in it).
• Some future directions
The Rumelhart Model
The Quillian Model
DER's Goals for the Model

1. Show how learning could capture the emergence of hierarchical structure
2. Show how the model could make inferences as in the Quillian model
Experience
[Figure: training patterns at three stages of experience: Early, Later, Later Still]
Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.
The result is a representation similar to that of the average bird…
Use the representation to infer what this new thing can do.
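This backprop-to-representation procedure can be sketched directly: hold the connection weights of a two-layer sigmoid network fixed and run backprop all the way down to the representation units. The weights here are random stand-ins for a trained network, and the layer sizes, learning rate, and observed attribute vector are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in weights for an already-trained network:
# representation (4 units) -> hidden (8 units) -> attributes (6 units).
W1 = rng.normal(0, 1, (8, 4))
W2 = rng.normal(0, 1, (6, 8))

# Attributes observed for the novel item.
target = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])

# Start with a neutral representation and adjust it by gradient
# descent on the cross-entropy error, holding the weights fixed.
rep = np.full(4, 0.5)
lr = 0.05
for _ in range(3000):
    h = sigmoid(W1 @ rep)
    out = sigmoid(W2 @ h)
    d_out = out - target                # error at the output units
    d_h = (W2.T @ d_out) * h * (1 - h)  # backpropagated to the hidden units
    rep -= lr * (W1.T @ d_h)            # ...and on to the representation units

# The settled representation reproduces the observed attributes as closely
# as the fixed weights allow, and can then be used to predict the item's
# unobserved properties.
final = sigmoid(W2 @ sigmoid(W1 @ rep))
```

The same mechanism used for learning (backprop) here performs inference, with the representation rather than the weights as the thing being adjusted.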
Questions About the Rumelhart Model
• Does the model offer any advantages over other approaches?
• Can the mechanisms of learning and representation in the model tell us anything about
  – Development?
  – Effects of neuro-degeneration?
Phenomena in Development

• Progressive differentiation
• Overgeneralization of
  – Typical properties
  – Frequent names
• Emergent domain-specificity of representation
• Basic-level advantage
• Expertise and frequency effects
• Conceptual reorganization
Disintegration in Semantic Dementia
• Loss of differentiation
• Overgeneralization
The Neural Basis of Semantic Cognition

• Distributed representations reflect different kinds of content in different areas.
• Semantic dementia kills neurons in the temporal pole and affects all kinds of knowledge.
• We propose the temporal pole serves as 'hidden units' mediating relations among properties in different modalities (language among them).
• Bi-directional connections run between the content-specific regions and the temporal pole, along with recurrent connections.
• [The medial temporal lobes provide a fast-learning system for acquiring new knowledge that may not be consistent with what is already known, and serve as the site of initial storage of this knowledge.]
Neural Networks and Probabilistic Models
• The Rumelhart model is learning to match the conditional probability structure of the training data:

  P(attribute_i = 1 | item_j, context_k), for all i, j, k

• The adjustments to the connection weights move them toward values that minimize a measure of the divergence between the network's estimates of these probabilities and their values as reflected in the training data.
• It does so subject to strong constraints imposed by the initial connection weights and the architecture.
  – These constraints produce progressive differentiation, overgeneralization, etc.
• Depending on the structure in the training data, it can behave as though it is learning something much like one of K&T's structure types, as well as many structures that cannot be captured exactly by any of the structures in K&T's framework.
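A minimal sketch of the claim above: a small Rumelhart-style network (item → representation; representation plus context → hidden → attributes) trained by backpropagation on a cross-entropy error, whose outputs come to approximate a table of conditional probabilities. The layer sizes, learning rate, and probability table are all invented for illustration, and training on the probabilities directly stands in, in expectation, for training on sampled binary attribute patterns.

```python
import numpy as np

rng = np.random.default_rng(1)

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy table of P(attribute_i = 1 | item_j, context_k); values are made up.
P = rng.uniform(0.1, 0.9, (4, 2, 5))
n_items, n_ctx, n_attr = P.shape
n_rep, n_hid = 4, 12

# Small random initial weights, as in the model.
W_rep = rng.normal(0, 0.1, (n_rep, n_items))
W_hid = rng.normal(0, 0.1, (n_hid, n_rep + n_ctx))
W_out = rng.normal(0, 0.1, (n_attr, n_hid))

lr = 0.2
for _ in range(20000):
    j, k = rng.integers(n_items), rng.integers(n_ctx)
    item = np.eye(n_items)[j]
    ctx = np.eye(n_ctx)[k]
    target = P[j, k]  # expected value of a sampled binary pattern

    rep = sig(W_rep @ item)
    hid = sig(W_hid @ np.concatenate([rep, ctx]))
    out = sig(W_out @ hid)

    # Backprop on the cross-entropy error.
    d_out = out - target
    d_hid = (W_out.T @ d_out) * hid * (1 - hid)
    d_rep = (W_hid.T @ d_hid)[:n_rep] * rep * (1 - rep)
    W_out -= lr * np.outer(d_out, hid)
    W_hid -= lr * np.outer(d_hid, np.concatenate([rep, ctx]))
    W_rep -= lr * np.outer(d_rep, item)

# Average absolute deviation between the trained outputs and the
# conditional probabilities in the training data.
err = 0.0
for j in range(n_items):
    for k in range(n_ctx):
        rep = sig(W_rep @ np.eye(n_items)[j])
        hid = sig(W_hid @ np.concatenate([rep, np.eye(n_ctx)[k]]))
        err += np.abs(sig(W_out @ hid) - P[j, k]).mean()
err /= n_items * n_ctx
```

The network has no explicit representation of probabilities; matching them is simply what minimizing the cross-entropy error amounts to.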
The Hierarchical Naïve Bayes Classifier Model (with R. Grosse and J. Glick)
• The world consists of things that belong to categories.
• Each category in turn may consist of things in several sub-categories.
• The features of members of each category are treated as independent:
  – P({f_i} | C_j) = Π_i p(f_i | C_j)
• Knowledge of the features is acquired for the most inclusive category first.
• Successive layers of sub-categories emerge as evidence accumulates supporting the presence of co-occurrences violating the independence assumption.
Living Things
  Animals
    Birds
    Fish
  Plants
    Flowers
    Trees
  …
A One-Class and a Two-Class Naïve Bayes Classifier Model

| Property | One-Class Model | Two-Class Model, 1st Class | Two-Class Model, 2nd Class |
|---|---|---|---|
| Can Grow | 1.0 | 1.0 | 0 |
| Is Living | 1.0 | 1.0 | 0 |
| Has Roots | 0.5 | 1.0 | 0 |
| Has Leaves | 0.4375 | 0.875 | 0 |
| Has Branches | 0.25 | 0.5 | 0 |
| Has Bark | 0.25 | 0.5 | 0 |
| Has Petals | 0.25 | 0.5 | 0 |
| Has Gills | 0.25 | 0 | 0.5 |
| Has Scales | 0.25 | 0 | 0.5 |
| Can Swim | 0.25 | 0 | 0.5 |
| Can Fly | 0.25 | 0 | 0.5 |
| Has Feathers | 0.25 | 0 | 0.5 |
| Has Legs | 0.25 | 0 | 0.5 |
| Has Skin | 0.5 | 0 | 1.0 |
| Can See | 0.5 | 0 | 1.0 |
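Using a few rows of this table, a sketch of how the independence assumption turns per-feature probabilities into the likelihood of a whole feature vector, and why splitting into two classes raises the likelihood of a coherent item. The observation vector is invented for illustration; the probabilities come from the table.

```python
import numpy as np

# Feature probabilities for four properties, from the table:
# the one-class model pools everything; the two-class model splits
# a plant-like class from an animal-like class.
features  = ["has_roots", "has_leaves", "has_skin", "can_see"]
one_class = np.array([0.5, 0.4375, 0.5, 0.5])
class_1   = np.array([1.0, 0.875, 0.0, 0.0])  # plant-like class
class_2   = np.array([0.0, 0.0, 1.0, 1.0])    # animal-like class

def likelihood(obs, p):
    """P({f_i} | C) under the independence assumption: the product
    over features of p_i if the feature is present, else 1 - p_i."""
    return float(np.prod(np.where(obs == 1, p, 1.0 - p)))

# An item with roots and leaves but no skin or eyes.
obs = np.array([1, 1, 0, 0])

print(likelihood(obs, one_class))  # 0.5 * 0.4375 * 0.5 * 0.5
print(likelihood(obs, class_1))    # 1.0 * 0.875 * 1.0 * 1.0
print(likelihood(obs, class_2))    # 0 (an impossible animal)
```

The two-class model assigns this item a much higher likelihood under its plant-like class than the one-class model can: the co-occurrence of roots with leaves (and their joint absence with skin and eyes) violates independence in the pooled class, which is exactly the kind of evidence that drives the emergence of sub-categories.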
Accounting for the Network's Feature Attributions with Mixtures of Classes at Different Levels of Granularity

[Figure: regression beta weight for classes at each level of granularity, plotted against epochs of training]

Property attribution model:

  P(f_i | item) = a_k p(f_i | c_{j,k}) + (1 - a_k)[ a_{k-1} p(f_i | c_{j,k-1}) + (1 - a_{k-1})[ … ] ]
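The property attribution model is a recursive mixture over nested classes, from most specific to most general. A small sketch, with class probabilities echoing the plant/living-thing values in the table above and purely hypothetical mixture weights a_k:

```python
def attribute(p_levels, alphas):
    """P(f_i | item) as a recursive mixture over nested classes.

    p_levels: p(f_i | c_{j,k}) at each level, most specific first.
    alphas:   mixture weights a_k, same order (one fewer than levels);
              each a_k weights its level, and the remainder recurses
              up to the more general classes.
    """
    if len(p_levels) == 1:
        return p_levels[0]
    return alphas[0] * p_levels[0] + (1 - alphas[0]) * attribute(p_levels[1:], alphas[1:])

# "Has leaves" for a tree, mixing tree-, plant-, and living-thing-level
# probabilities with illustrative weights.
p_levels = [1.0, 0.875, 0.4375]
a_levels = [0.6, 0.8]
print(attribute(p_levels, a_levels))  # 0.6*1.0 + 0.4*(0.8*0.875 + 0.2*0.4375)
```

As the regression betas in the figure shift across training, the effective weights move from the general classes toward the specific ones, mirroring progressive differentiation.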
Should we replace the PDP model with the Naïve Bayes Classifier?
• It explains a lot of the data, and offers a succinct abstract characterization
• But:
  – It only characterizes what's learned when the data actually has hierarchical structure
• So it may be a useful approximate characterization in some cases, but can't really replace a model that captures structure more organically.
An exploration of these ideas in the domain of mammals
• What is the best representation of domain content?
• How do people make inferences about different kinds of object properties?
Structure Extracted by a Structured Statistical Model
Predictions
• Similarity ratings and patterns of inference will violate the hierarchical structure
• Patterns of inference will vary by context
Experiments
• Size, predator/prey status, and other properties affect similarity across birds, fish, and mammals
  – That is, there is cross-cutting structure that is not consistent with the hierarchy
• Property inferences for blank biological properties violate the predictions of K&T's tree model for weasels, hippos, and other animals
Applying the Rumelhart model to the mammals dataset
[Network diagram: Items and Contexts as inputs; output attributes grouped into Behaviors, Body Parts, Foods, Locations, Movement, and Visual Properties]
Simulation Results
• We look at the model's outputs and compare them to the structure in the training data.
• The model captures the overall and context-specific structure of the training data.
• The model extracts progressively more detailed aspects of the structure as training progresses.
Simulation Output
Training Data
Network Output
Progressive PCs
Improving on the Naïve Bayes Classifier as a Model of the Rumelhart Model
(with Andrew Saxe & Surya Ganguli)
[Network diagram: Items → Features]

• Items can be distributed patterns or localist units
• The target is the set of features, as in the Rumelhart model, but collapsed across contexts
SVD of a Simple Hierarchical Domain
There can also be cross-cutting modes, or semi-cross-cutting modes, or even quasi-cross-cutting modes
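The SVD of a simple hierarchical domain can be computed directly. The item-feature matrix below is invented, but its structure (one feature shared by all items, one feature per category, one idiosyncratic feature per item) mirrors the kind of domain discussed here: the strongest singular mode treats every item alike, and the next mode splits the two categories.

```python
import numpy as np

# Four items x seven features: a global feature, a plant and an
# animal category feature, and one item-specific feature each.
items = ["flower", "tree", "bird", "fish"]
M = np.array([
    # global plant animal  item-specific...
    [1, 1, 0, 1, 0, 0, 0],  # flower
    [1, 1, 0, 0, 1, 0, 0],  # tree
    [1, 0, 1, 0, 0, 1, 0],  # bird
    [1, 0, 1, 0, 0, 0, 1],  # fish
], dtype=float)

U, s, Vt = np.linalg.svd(M)
print(s)          # singular values: sqrt(7), sqrt(3), 1, 1
print(U[:, 0])    # first mode: same sign for every item ("living thing")
print(U[:, 1])    # second mode: plants and animals get opposite signs
```

The item-specific features survive only in the weak, degenerate trailing modes, which is why detail is the last thing a mode-by-mode learner acquires.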
Time course of learning
Note the strong constraints imposed by the network on what is represented; this arises solely from the conservatism of learning, rather than from an explicit constraint on representation
Learning the Structure in the Training Data
• Progressive sensitization to successive singular dimensions accounts for learning of the mammals data set and other data sets, as captured in the SVD of the input-output correlation matrix.
• This subsumes the naïve Bayes classifier as a special case when there is pure hierarchical structure (as in the Quillian training corpus), and also applies to a range of other types of structure (cross-cutting structure, linear structure, …).
• The progressive learning of singular dimensions is more generic, yet still very constraining, compared with other models.
• Question: Are PDP networks – and our brain’s neural networks – simply performing familiar statistical analyses?
– Maybe neural networks can extend beyond what we generally envision in such analyses
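Progressive sensitization to successive singular dimensions can be demonstrated in a small simulation, in the spirit of the linear-network analysis above: a two-weight-layer linear network trained by gradient descent from small random weights learns the singular modes of a hierarchical item-feature matrix in order of their singular values. The matrix, layer sizes, learning rate, and iteration count are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hierarchical item-feature matrix (features x items): a shared
# feature, two category features, and item-specific features.
M = np.array([
    [1, 1, 0, 1, 0, 0, 0],  # flower
    [1, 1, 0, 0, 1, 0, 0],  # tree
    [1, 0, 1, 0, 0, 1, 0],  # bird
    [1, 0, 1, 0, 0, 0, 1],  # fish
], dtype=float).T

n_feat, n_items = M.shape
X = np.eye(n_items)  # one-hot item inputs

# Deep linear network with small random initial weights
# (the "conservatism" of learning).
W1 = rng.normal(0, 0.01, (3, n_items))
W2 = rng.normal(0, 0.01, (n_feat, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
strengths = []  # learned strength of each singular mode over training
lr = 0.05
for _ in range(3000):
    E = W2 @ W1 @ X - M                 # output error
    W2 -= lr * E @ (W1 @ X).T           # gradient of 0.5 * ||E||^2
    W1 -= lr * W2.T @ E @ X.T
    # Project the network's input-output map onto each singular mode.
    strengths.append([U[:, a] @ (W2 @ W1) @ Vt[a] for a in range(3)])

strengths = np.array(strengths)
# The strongest mode (the global "living thing" dimension) is learned
# first, the category split second, the item-specific detail last.
```

Nothing in the network stipulates this order; it falls out of gradient descent, with each mode's growth rate set by its singular value.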
Extracting Cross-Domain Structure
Input Similarity
Learned Similarity
A Newer Direction
• Exploiting knowledge sharing across domains
  – Lakoff:
    • Abstract cognition is grounded in concrete reality
    • And this applies to mathematical cognition
  – Boroditsky:
    • Cognition about time is grounded in our conceptions of space
• Can we capture these kinds of influences through knowledge sharing across contexts?
  – Work by Thibodeau, Glick, Sternberg & Flusberg shows that the answer is yes
Summary
• Distributed representations provide the substrate for learned semantic representations that develop gradually with experience and degrade gracefully with damage.
• The structure that is learned is organic, rather than constrained to conform to any specific structure type.
• The SVD representation can capture much of this structure, but neural networks can go beyond it, e.g., by exploiting second-order similarity.
• Structures of specific types should be seen as useful approximations, rather than a Procrustean bed into which actual knowledge must be made to conform.