Baby’s Eye View: Temporal Dynamics of Rapid Visual Object Learning
Nicholas Butko ♦ Dept. of Cognitive Science ♦ UCSD
Ian Fasel, Javier Movellan ♦ Institute for Neural Computation ♦ {ianfasel, movellan}@mplab.ucsd.edu
Motivation
We set out to explore the nature of the visual information available to newborn infants. Is it enough to learn detailed object categories reliably? If so, this would provide evidence that an alternative to the dominant paradigm is feasible, namely that infants might not be born with the ability to recognize conspecifics.
Hyper Adaptation?
“We wish to propose the general term CONSPEC to refer to a unit of mental architecture in any species that ... contains structural information concerning the visual characteristics of conspecifics.” [emphasis added]
--Morton & Johnson, Psych. Review, 1991
Computational Model
Social Contingency
Baby Robot BEV
Can Faces Be Learned?
Rapid Learning Hypothesis
BEV Dataset
• We hope to show that infants may be able to gather enough information to learn to locate faces very quickly.
• To this end, we will record the visual and other sensory information available to infants.
• We will present a computational learning system showing that, using only this limited information, faces can be reliably located in images.
Current Hypotheses:
• To collect data from a Baby’s Eye View, we created BEV, a simple baby robot.
• BEV has two sensors:
  - A microphone in the chest, to detect overall volume.
  - An IEEE 1394 webcam in the forehead, capturing unrectified 320×240-pixel images.
• BEV has one actuator:
  - A monaural speaker in the chest, for vocalization.
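The poster says only that the chest microphone detects overall volume. One common way to measure that is the root-mean-square (RMS) level of each audio frame in decibels. The minimal Python sketch below assumes normalized float samples, and the `threshold_db` cutoff that turns volume into a binary "sound heard" cue is a hypothetical value, not one reported for BEV.

```python
import math


def frame_volume_db(samples, ref=1.0, floor_db=-100.0):
    """RMS level of one audio frame in dB relative to `ref`.

    Assumes `samples` are floats normalized to [-1.0, 1.0]; empty or
    silent frames are clamped to `floor_db` instead of -infinity.
    """
    if not samples:
        return floor_db
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(rms / ref))


def sound_present(samples, threshold_db=-30.0):
    """Binary 'sound heard' cue; threshold_db is a hypothetical cutoff."""
    return frame_volume_db(samples) > threshold_db


print(round(frame_volume_db([0.5, -0.5, 0.5, -0.5]), 2))  # -6.02
```

A half-scale square wave sits about 6 dB below full scale, so it counts as a sound under the assumed -30 dB cutoff.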
Contingency Detection and Data Collection:
Schematic Generalization
Evidence for Conspecific Processing: [Johnson et al., Cognition, 1991]
• Social Hypothesis: - Infants are genetically predisposed to look at things that look like human faces.
• Sensory Hypothesis: - Infants look for general visual features, which are shared by faces.
--Kleiner & Banks, J. Experimental Psych.: Human Perception & Performance, 1987
Infants quickly become interested in certain aspects of the visual scene presented to them, and learn to attend to specific salient things.
Bushnell et al., 1989
- Two-day-old infants fixate longer on images of their mothers than on images of other women with similar hair color and complexion.
Evidence for Rapid Learning:
Segmental Boltzmann Processes
• Watson (1972) found that two-month-old infants exhibit social responses to contingent mobiles, suggesting that infants use contingency as a means of identifying caregivers.
• Movellan & Watson (1985) found that ten-month-old infants are near-optimal detectors of contingency.
• Movellan (2002) developed a model of this optimal contingency detection based on the principles of information maximization and optimal control.
• For this experiment we used auditory contingencies as a cue for the presence of a person. However, other cues, such as touch or uninitiated motion, may be more appropriate for neonates and should produce similar results.
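Movellan's (2002) model also decides when to vocalize (the optimal-control half), which is beyond a short example. The sketch below keeps only the inference half, as a simple Bayesian test swapped in for illustration rather than the published model: after each vocalization BEV either hears a response or not, and the posterior that a responsive person is present is updated in log-odds form. The response probabilities `p_resp` and `p_bg` and the prior are hypothetical values, not taken from the poster.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def contingency_posterior(responses, p_resp=0.7, p_bg=0.2, prior=0.5):
    """Posterior probability that a responsive person is present.

    responses: one bool per vocalization, True if a sound was heard in the
               response window that followed it.
    p_resp:    assumed chance of a response when a person is present.
    p_bg:      assumed chance of background sound when nobody is there.
    prior:     assumed prior probability that a person is present.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for heard in responses:
        if heard:
            log_odds += math.log(p_resp / p_bg)
        else:
            log_odds += math.log((1.0 - p_resp) / (1.0 - p_bg))
    return _sigmoid(log_odds)


print(round(contingency_posterior([True, True, False, True]), 3))  # 0.941
```

Working in log-odds keeps the update a running sum and avoids underflow during long interactions.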
• Fasel & Movellan (2006) developed a novel visual learning algorithm called “Segmental Boltzmann Processes” (SBP).
• This algorithm is weakly supervised: it requires only one label per image, indicating whether an object of the category of interest is present in that image, and that label need only be correct with better-than-chance probability.
• From this weak label, the algorithm learns to localize the object of interest in novel images, or to indicate that the object is not present.
• The algorithm is a probabilistic model that looks for “objects”: clusters of pixels that are mutually dependent but independent of the rest of the image.
• Segmental Boltzmann Processes can be viewed as a connectionist architecture, simulating 4,000,000 neurons running in real time (30 frames per second).
• Segmental Boltzmann Processes are well suited to multimodal learning, in which a secondary modality can provide a better-than-chance label about the presence of an object in the visual field.
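Why can a label that is merely better than chance drive learning? The toy simulation below is not SBP; it only illustrates the statistical point. With balanced classes and a weak label that agrees with the truth with probability a, the difference between the mean feature value of "labeled present" and "labeled absent" images is (2a - 1) times the true difference, so it shrinks as the label gets noisier but vanishes only at chance (a = 0.5). The one-dimensional Gaussian feature and all parameter values are assumptions.

```python
import random


def label_mean_gap(accuracy, n=100_000, mu_obj=1.0, mu_bg=0.0, seed=0):
    """Mean feature gap between weakly-labeled 'present' and 'absent' images.

    Each simulated image truly contains the object with probability 0.5.
    Its feature is drawn from N(mu_obj, 1) if so, else N(mu_bg, 1).
    The weak label agrees with the truth with probability `accuracy`.
    Expected result: (2 * accuracy - 1) * (mu_obj - mu_bg).
    """
    rng = random.Random(seed)
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        present = rng.random() < 0.5
        x = rng.gauss(mu_obj if present else mu_bg, 1.0)
        # Flip the true answer with probability 1 - accuracy.
        label = present if rng.random() < accuracy else not present
        sums[label] += x
        counts[label] += 1
    return sums[True] / counts[True] - sums[False] / counts[False]


print(round(label_mean_gap(0.6), 2))  # close to 0.2 = (2 * 0.6 - 1) * 1.0
```

With a = 0.6 the gap is about a fifth of the clean-label gap of 1.0, but averaging over enough images still estimates it reliably.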
• BEV was attached to a Mac PowerBook G4 laptop that ran the contingency detection software and stored data continuously for 88 minutes while the experiment was in progress, making moment-to-moment decisions about how best to vocalize in order to detect people.
• BEV used her speaker to utter baby sounds collected from the Internet. The experimenters ranked five sounds by level of excitement; the higher the level of contingency BEV detected, the more excited the sound she uttered.
• Nine subjects were asked to interact with BEV so as to make her excited.
• An image was added to the dataset whenever (a) a vocalization was made and (b) BEV was 97.5% confident either that a person was present or that no person was present.
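The vocalization and data-collection rules above can be sketched in Python as follows. The sound names are hypothetical placeholders, the equal-width mapping from contingency level to the five ranked sounds is an assumption (the poster does not give the cutoffs), and `p_person` stands for whatever posterior the contingency detector reports.

```python
# Hypothetical placeholders for the five ranked baby sounds, most excited first.
SOUNDS = ["sound_most_excited", "sound_2", "sound_3", "sound_4", "sound_least_excited"]


def choose_sound(p_contingency, sounds=SOUNDS):
    """Pick a sound so that higher detected contingency -> more excited sound.

    Splits [0, 1] into len(sounds) equal-width bins (an assumption).
    """
    k = len(sounds)
    level = min(int(p_contingency * k), k - 1)  # 0 = lowest contingency bin
    return sounds[k - 1 - level]


def image_label(vocalized, p_person, confidence=0.975):
    """Weak label for a frame: True (person), False (no person), or None (skip).

    A frame is kept only right after a vocalization and only when the
    contingency posterior is confident in either direction.
    """
    if not vocalized:
        return None
    if p_person >= confidence:
        return True
    if p_person <= 1.0 - confidence:
        return False
    return None
```

Frames that pass `image_label` carry exactly the kind of better-than-chance presence label that the SBP learner expects.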
• 3,700 images collected over 90 minutes of interaction.
• No experimenter intervention.
• Variety of lighting and background conditions.
• No post-processing of images (rectification, etc.)
[Figure: example images under “Contingency Detected” and “No Contingency Detected”; reported proportions: 18% no face, 4% no person; 17% face, 20% person.]
Very little information required:
[Figure: max-posterior and posterior probability maps after 0, 3, and 6 minutes of training data (of 3,700 images collected over 90 minutes).]
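To put the 3- and 6-minute panels in terms of images: assuming the 3,700 images were collected at a roughly uniform rate over the 90 minutes (the poster does not say so), the arithmetic is:

```latex
r = \frac{3700\ \text{images}}{90\ \text{min}} \approx 41.1\ \text{images/min}
\qquad
3\,\text{min} \cdot r \approx 123\ \text{images}
\qquad
6\,\text{min} \cdot r \approx 247\ \text{images}
```

That is, roughly 3% and 7% of the dataset.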
“We would not expect the experience of the mother’s face to transfer to the two-dimensional schematic stimuli used with newborns.”
--Morton & Johnson, Psych. Review, 1991
Performance of one BEV-Trained SBP learner on Johnson Stimuli:
Performance of all BEV-Trained SBP learners on Johnson Stimuli: