Lecture 1 Connecting the mind with the
world: An empirical introduction
Zenon Pylyshyn, Rutgers University
http://ruccs.rutgers.edu/faculty/pylyshyn.html
Plan of the lectures
1. I will introduce a mechanism that I believe to be part of the autonomous vision module. This mechanism, which I call a Visual Index or FINST, is able to keep track of several token individual things in a visual scene while the scene is being encoded or is changing. I will illustrate this mechanism with some experimental findings and demonstrations. I will conclude the first lecture by beginning to build a bridge from the empirically based assumptions about FINSTs to certain philosophical issues – particularly issues of nonconceptual representation.
2. In the second lecture I will discuss the generality and importance of attention and selection for this set of problems, and will relate these to the FINST proposal. I will then discuss the proposal that FINSTs pick out things rather than places. This will lead me to examine Austen Clark’s views of sentience and his claim that in sentience, spatiotemporal regions are picked out nonconceptually – i.e., a version of the Feature-Placing hypothesis initially proposed by Peter Strawson.
3. Lecture 3 will introduce the topic of the nonconceptual representation of space, including the question of the status of conscious experience in perception science. I will then attempt to cast the problem of spatial representation in a general framework – one to which Jean Nicod contributed important insights. This framework will draw heavily on the insights of Henri Poincaré and will introduce the role of visuomotor coordination and the general coordinate transformation problem. From this I will argue that there is no unitary spatial coordinate system nor any representations of empty places – only indexes to sensory individuals.
4. Following the discussion of space, the final lecture will examine contemporary views concerning how properties of space are captured in thought, namely the view that our ability to represent space in reasoning derives from some form of internalization of space. I will argue that a popular version of this approach, based on neuroimaging data, is misguided, and I will suggest a different approach – one that relies on the function provided by visual indexes.
Different types of mind-world interfaces
“I say that vision occurs when the image of the whole hemisphere of the world that is before the eye … is fixed in the reddish white concave surface of the retina. How the image or picture is composed by the visual spirits that reside in the retina and the [optic] nerve, and whether it is made to appear before the soul or the tribunal of the visual faculty by a spirit within the hollows of the brain, or whether the visual faculty, like a magistrate sent by the soul, goes forth from the administrative chamber of the brain into the optic nerve and the retina to meet this image, as though descending to a lower court – I leave to be disputed by [others]. For the armament of the opticians does not take them beyond this first opaque wall encountered within the eye.” (Johannes Kepler, quoted in Lindberg, 1976)
From geometrical optics to Information, and from information to reference
Two distinct types of mind-world connections:
The nonconceptual connection: cause
The semantic connection: satisfaction
The question of how we make the transition from cause to meaning/reference is one of the great mysteries of mind (Brentano’s Problem). I will address a (very small) corner of that problem – the question of how perception picks out individuals without already having concepts – as the first nonconceptual step in the mind-world relation.
Why do we need to be able to pick out individuals without concepts?
We need to make nonconceptual contact with the world through perception in order to stop the regress of concepts being defined in terms of other concepts, which are defined in terms of still other concepts… This is known as the Grounding Problem.
The question of where to stop has received different answers from different philosophical schools. But sensory transduction by itself will not work: most concepts cannot be reduced to sense data. It has to be something more abstract than sensory properties.
Our candidate is individuals as the forerunner of predication. To explain why, I will begin with a personal experience: trying to develop a computer model of reasoning about geometry by drawing a diagram.
Notice what you have so far… (noticings are local – you encode what you attend to – as we will see in Lecture 2)
There is an intersection of two lines…
But which of the two lines you drew are they?
There is no way to indicate which individual things are seen again unless there is a way to refer to individual things
Look around some more to see what is there ….
Here is another intersection of two lines…
Is it the same intersection as the one seen earlier?
To be able to tell without a reference to individuals you would have to encode unique properties of the individual lines. Which properties should you encode?
Footnote: It would not help this problem at all if you labelled the diagram as you drew it. Why? Because to refer to the line with label L1 you would have to be able to think “This is line L1”, and you could not think that unless you had a way to think “this” – and the label would not help you to do that.
Being able to think “this” is another way to view the very problem I will be concerned with in these lectures. You still need an independent way to pick out and refer to an individual element even if it is labelled! And you need to do it for several individuals
That is exactly the point of John Perry’s claim about the “essential indexical”
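The label-versus-reference point can be made concrete with a toy sketch (my own illustration, not from the lecture): a label stored inside a representation is just another symbol, whereas a bound reference actually reaches a particular object in the scene.

```python
# Toy illustration: a label inside the representation is only a symbol;
# without an independent reference ("this"), it cannot pick out a line.

class Line:
    """A line in the (simulated) visual scene."""
    def __init__(self, x1, y1, x2, y2):
        self.coords = (x1, y1, x2, y2)

# The scene: two lines the reasoner has drawn.
scene = [Line(0, 0, 4, 4), Line(0, 4, 4, 0)]

# Representation A: labels only. "L1" is a string; nothing connects it
# to either Line object, so re-identifying "line L1" is impossible.
labels_only = ["L1", "L2"]

# Representation B: each label is *bound* to a scene object by a direct
# reference -- the analogue of a demonstrative "this".
bound = {"L1": scene[0], "L2": scene[1]}

# With the binding we can return to the very same individual:
assert bound["L1"] is scene[0]
# Without it, the labels give no route back to any particular line:
assert all(isinstance(item, str) for item in labels_only)
```

The design point is that the binding in `bound` lives outside the symbols themselves, which is just what a FINST-style index is supposed to supply.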
Keeping track by encoding unique properties of individual items will not work in general
No description can work when items are moving or changing their appearance during the representation-construction process. No description is guaranteed to remain unique if the scene is changing. But a visual representation is always changing, since it is always built up over time as properties are noticed. An observer can pick out several individuals even if they are in a field of identical individuals – e.g., pick out a dot in a uniform field of dots.
Many writers have postulated a “marking” process. But where is the “mark” placed? It can’t be placed in the representation, because it is supposed to help keep track of which things in the world correspond to which things in the representation!
Pick out a single dot (or pick out 4 dots). Can you keep track of the dots you picked if you move your eyes?
(Only for 4 or fewer; Irwin, 1996)
The requirements for picking out individual things and keeping track of them reminded me of an early comic book character called “Plastic Man”.
Imagine being able to place your fingers on things in the world without being able to detect their properties in this way, but being able to refer to those things so you could move your gaze or attention to them. If you could you would possess FINgers of INSTantiation (FINSTs)!
FINST Theory postulates a limited number of pointer mechanisms in early vision that are elicited by causal events in the visual field and that enable vision to refer to things without doing so under a concept or description.
The assumption is that you have a way to directly pick out individual distal elements as sensory individuals – not as bearers of certain known properties.
Examples where such indexes are needed:
Incremental construction of visual representations – the correspondence problem over time (geometry example).
Encoding relational predicates, e.g., Collinear(x,y,z,…); Closed(C); Inside(x,C); Above(x,y); Square(w,x,y,z) – this requires simultaneously binding the arguments of n-place predicates to n elements in the visual scene.
Evaluating such visual predicates requires individuating and referencing the objects over which the predicate is evaluated: i.e., the arguments in the predicate must be bound to individual objects in the scene.
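As a minimal sketch of what argument binding involves (the code, the `Dot` class, and the collinearity test are my own illustration, not the lecture’s model), an n-place visual predicate can only be evaluated once each argument slot is bound to a particular individual in the scene:

```python
# Hypothetical sketch: evaluating a relational visual predicate requires
# its argument slots to be bound to individual scene objects first.

def collinear(p, q, r, tol=1e-9):
    """3-place predicate over scene objects (positions given as (x, y))."""
    (x1, y1), (x2, y2), (x3, y3) = p.pos, q.pos, r.pos
    # Zero (within tol) signed area of triangle p-q-r means collinear.
    area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    return abs(area2) < tol

class Dot:
    def __init__(self, x, y):
        self.pos = (x, y)

scene = [Dot(0, 0), Dot(1, 1), Dot(2, 2), Dot(3, 0)]

# FINST-like step: bind the predicate's three arguments to individuals.
a, b, c = scene[0], scene[1], scene[2]
assert collinear(a, b, c)              # these three lie on the line y = x
assert not collinear(a, b, scene[3])
```

Note that `collinear` is applied to the bound objects `a`, `b`, `c`, not to descriptions of them; the binding itself is what the predicate evaluation presupposes.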
Several objects must be picked out at once in making relational judgments
When we judge that certain objects are collinear, we must have picked out the relevant individual objects first.
The same is true for other relational judgments like inside or on-the-same-contour… etc. We must pick out the relevant individual objects first.
Demonstrating FINSTs in action withMultiple Object Tracking
In a typical experiment, 8 simple identical objects are presented on a screen and 4 of them are briefly distinguished in some visual manner – usually by flashing them on and off.
After these 4 targets are briefly identified, all objects resume their identical appearance and move randomly. The subjects’ task is to keep track of the ones designated as targets.
After a period of 5-10 seconds the motion stops and subjects must indicate, using a mouse, which objects were the targets.
People are very good at this task (85%-98% correct). The question is: How do they do it?
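The structure of the task can be sketched in a toy simulation (my own illustration; the nearest-neighbour matching rule here is a location-updating baseline of the kind the lecture argues against, not the FINST mechanism itself). It shows what any tracker must accomplish: re-finding four targets among eight identical moving items, frame after frame.

```python
import random

random.seed(7)
N, T, FRAMES, STEP = 8, 4, 50, 0.2   # 8 identical dots, 4 cued targets

# Eight identical dots, well separated; dots 0..T-1 are the cued targets.
world = [[20.0 * i, 20.0 * i] for i in range(N)]
tracked = [list(world[i]) for i in range(T)]   # tracker's last-known loci

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

for _ in range(FRAMES):
    for p in world:                            # every dot drifts randomly
        p[0] += random.uniform(-STEP, STEP)
        p[1] += random.uniform(-STEP, STEP)
    # The tracker sees only an unlabeled list of positions each frame and
    # must re-find every target: the correspondence problem over time.
    tracked = [list(min(world, key=lambda w: dist2(t, w))) for t in tracked]

# Structural checks: four pointers survive, each stuck to some dot.
assert len(tracked) == T
assert all(any(t == w for w in world) for t in tracked)
```

With slow motion and wide spacing this baseline works, but it stores and updates locations; the lecture’s point below is that human tracking is unlikely to work this way.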
How do we do it? What properties of individual objects do we use?
If we had such a mechanism, what should we be able to do?
Basic finding: Most people (even 5-year-old children) can track at least 4 individual objects that have no unique visual properties.
How is it done? We have shown that it is unlikely that the tracking is done by keeping a record of target locations and updating them while serially visiting the objects.
We have proposed that individuating and keeping track of certain kinds of individuals uses the mechanism of visual indexes or FINSTs
Analysing Multiple Object Tracking
So what are FINSTs? They are a primitive reference mechanism that refers to individual visual objects in the world (FINGs?). Objects are picked out and referred to without using any encoding of their properties, including their location. Picking out objects is prior to encoding any properties!
Indexing is nonconceptual because it does not represent the individuals as members of some conceptual category.
FINSTs function something like visual demonstratives (like this or that); they pick out and refer to individual visual objects without appealing to their properties to do so.
An important function of FINST indexes is to bind arguments of visual predicates or of motor commands to things in the world to which they refer. Only predicates with bound arguments can be evaluated.
Additional examples of MOT:
MOT with occlusion
MOT with virtual occluders
MOT with matched nonoccluding disappearance
Track endpoints of lines
Track rubber-band linked boxes
Track and remember ID by location
Track and remember ID by name (number)
Track while everything briefly disappears (½ sec) and goes on moving while invisible
Track while everything briefly disappears and reappears where they were when they disappeared
Another example: Subitizing vs Counting
How many squares are there?
Concentric squares cannot be subitized because individuating them requires the serial operation of curve tracing
Subitizing indexed objects is fast, accurate and independent of how many items there are. Only the squares on the right can be subitized because only automatically indexed items can be subitized
Example of the operation of Visual Indexes: Subset selection for search
[Figure: example search displays – a single-feature search panel and a conjunction-feature search panel, with the target shown among “+” items]
Subset search results:
Only the properties of the subset matter – but properties of the entire subset are taken into account (since that is what distinguishes a feature search from a conjunction search)
If the subset is a single-feature search it is fast and the slope is very shallow
If the subset is a conjunction search set, it takes longer and is more sensitive to the set size
The dispersion among the targets does not matter
[Figure: the same single-feature and conjunction search displays, with the point where a saccade occurs marked]
The selective search experiment with a saccade induced between the late onset cues and start of search
Must we encode location in detecting the presence of a property?
Some researchers claim that “… when an observer detects a target defined by other dimensions, this provides information about the location of the stimulus being detected” (Nissen, 1985). True, but is this information encoded? Is it used?
Because such experiments always confound individuals and locations we cannot distinguish two competing explanations of how vision primitively selects things in the world (by location or by individual)
How can we unconfound locations and individuals in order to study the role of individuals in property encoding?
Study moving objects (e.g., MOT experiments, as well as I.O.R. and Object File priming studies to be described in Lecture 2)
Study more general types of “objects” that share the same spatial locus and “move” through a space of properties
Trajectories:
* pseudo-random and independent
* frequent changes in speed and direction
* Gabors frequently "pass" each other along a dimension(s)
Surfaces in feature-space
Snapshots taken every 250 msec
People are able to track these single-location “objects”, and a single-object advantage is obtained.
What goes on in MOT?
According to the FINST account, some objects cause a small number of indexes (4-5) to be captured. These indexes remain attached to their objects despite changes in their properties (in fact subjects do not notice property changes). When objects stop moving, the subject can move focal attention to indexed objects and thus is able to indicate which ones are targets.
Following up…
The next 3 lectures will explore several aspects of this idea that FINSTs provide a nonconceptual reference to visual objects in the world, including its implications for sentience, for nonconceptual impressions of space, and for the possible role of mental space in thought (i.e., mental imagery)
[Extra] Distinguishing several functions associated with indexing
Before objects are indexed they must first be individuated and their spatiotemporal boundaries established – they must undergo what Gestalt Psychologists called figure-ground separation
This Gestalt separation process parses a scene into enduring objects – into individual ‘space-time worms.’ We will call these individuals (or sometimes objects).
In constructing such enduring individuals the visual system must solve the correspondence problem, a central recurring problem in vision
Once such individuals have been isolated, some of the salient ones can capture the small number of available indexes. Those indexed in this way can be referred to and bound to arguments of visual predicates.
Individuation in early vision
Individuation is an important function of mind. It is what enables you to know when two encounters are encounters with the same individual – to distinguish this and that from this and this again. It requires a conceptual apparatus and conditions of individuation.
But there is a weaker sense of individuation computed by the early vision system without benefit of identity or of concepts. It is what allows two proximal tokens to be treated as arising from the same distal individual – i.e., to solve the correspondence problem.
Computing correspondence determines same-individual within the framework of early vision. Same-individual is whatever the visual system treats as an enduring individual by placing separate occurrences into correspondence, and by selecting and tracking it as one single thing.
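One way to see what “computing correspondence” involves is a small sketch (greedy nearest-neighbour matching is my own simplification; the lecture does not claim early vision works this way): tokens in a second frame are paired with tokens in a first frame by proximity, so each pair counts as one enduring individual.

```python
# Toy correspondence sketch: match each token in frame 1 to the nearest
# still-unused token in frame 2, yielding "space-time worm" pairings.

frame1 = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
frame2 = [(0.5, 9.6), (0.4, 0.3), (10.2, 0.1)]   # same dots, jittered

def match(prev, curr):
    pairs, used = {}, set()
    for i, p in enumerate(prev):
        # Nearest unused token in the current frame (squared distance).
        j = min((k for k in range(len(curr)) if k not in used),
                key=lambda k: (curr[k][0] - p[0])**2 + (curr[k][1] - p[1])**2)
        pairs[i] = j
        used.add(j)
    return pairs

print(match(frame1, frame2))   # {0: 1, 1: 2, 2: 0}
```

Each key-value pair says “this earlier token and that later token are the same individual”, which is exactly the same-individual relation the text describes.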
There can’t be indexing without individuation, but can there be individuation without indexing?
Two different functions are involved in indexing:
1) Individuating by clustering and establishing correspondences (space-time worms)
2) Establishing a reference relation to bind objects to the arguments of predicates
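The two functions can be kept apart in a toy sketch (my own illustration; only the pool size of 4 follows the lecture): individuation groups proximal tokens into units, and indexing then binds a limited pool of references to some of them.

```python
# Function 1 (individuation): cluster nearby tokens into units.
# Function 2 (reference): assign a small fixed pool of indexes to units.

MAX_INDEXES = 4

def cluster(points, radius=2.0):
    """Group points lying within `radius` of an existing cluster member."""
    clusters = []
    for p in sorted(points):
        for c in clusters:
            if any((p[0]-q[0])**2 + (p[1]-q[1])**2 < radius**2 for q in c):
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

points = [(0, 0), (0.5, 0.5), (10, 10), (10.5, 10), (20, 0), (30, 5),
          (40, 40), (41, 40)]
units = cluster(points)                       # individuation: five units
indexes = {i: u for i, u in enumerate(units[:MAX_INDEXES])}

assert len(units) == 5                 # more units can be individuated ...
assert len(indexes) == MAX_INDEXES     # ... than can be indexed at once
```

The gap between `len(units)` and `len(indexes)` is the sketch’s version of individuation without indexing.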
There are empirical cases where objects appear to be individuated, yet where there does not seem to be a limit of 4 or 5 objects that can be individuated at once:
Apparent motion
Stereo vision
Punctate inhibition of moving objects
Could these involve individuating without reference?
The puzzle of individuation without indexing, and what that might mean!
If objects show punctate inhibition, then inhibition moves with the objects. How can this be unless they are being tracked?
This puzzle may signal the need for a kind of individuation that does not involve indexing – a mere clustering, circumscribing, figure-ground distinction without a pointer or access mechanism – i.e. without reference
Such a circumscribing-clustering is needed often when the correspondence problem arises – in stereo, apparent motion, and other grouping situations
Data show that correspondence is not computed over continuous manifolds but over proximal clusters
But in many of these cases the difficulty of pairing does not increase with the number of elements to be paired
This makes individuating-by-clustering fundamental (see later)