Within-Category Variation is Used in Spoken Word Recognition
Temporal Integration at Two Time Scales
Bob McMurray, University of Iowa
Dept. of Psychology
Collaborators
Richard Aslin, Michael Tanenhaus, David Gow
Joe Toscano, Dana Subik, Julie Markant
Perception & Cognition
A detailed understanding of perceptual processing is critical to understanding higher level cognition.
Specifically:
Sensitivity to fine-grained perceptual detail can help integrate information over time.
Temporal Integration
Temporal integration: a critical problem for cognition. Information never arrives synchronously.
• Vision: integration across head-movements, saccades and attention-shifts.
• Music perception: long-term dependencies and short term expectancies.
In language, information arrives sequentially.
• Partial syntactic and semantic representations are formed as words arrive.
The Hawkeyes beat the Boilermakers
• Words are identified over sequential phonemes (e.g. “once”).
Spoken Word Recognition is an ideal arena in which to study these issues because:
• Research divides word recognition into perceptual and cognitive mechanisms.
• Perceptual information is available to support temporal integration.
Scales of temporal integration in word recognition
• A word: ordered series of articulations.
  - Build abstract representations.
  - Form expectations about future events.
  - Fast (online) processing.
• A phonology:
  - Abstract across utterances.
  - Expectations about possible future events.
  - Slow (developmental) processing.
Mechanisms of Temporal Integration
Stimuli do not change arbitrarily.
Perceptual cues reveal something about the change itself.
Active integration:
• Anticipate future events.
• Retain partial present representations.
• Resolve prior ambiguity.
Overview
1) Speech perception and spoken word recognition.
2) Lexical activation is sensitive to fine-grained detail in speech.
3) Fast temporal integration: taking advantage of regularity in the signal for temporal integration.
4) Slow temporal integration: developmental consequences.
[Figure: candidate words for the input “ba…” (bakery, basic, barrier, barricade, bait, baby) are crossed out as “bakery” unfolds.]
Online Word Recognition
• Information arrives sequentially.
• At early points in time, signal is temporarily ambiguous.
• Later arriving information disambiguates the word.
Current models of spoken word recognition
• Immediacy: Hypotheses formed from the earliest moments of input.
• Activation Based: Lexical candidates (words) receive activation to the degree they match the input.
• Parallel Processing: Multiple items are active in parallel.
• Competition: Items compete with each other for recognition.
[Figure: activation over time for the input “b... u… tt… e… r”; candidates: beach, bump, putter, dog, butter.]
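The four properties listed above (immediacy, activation, parallel processing, competition) can be illustrated with a toy simulation. This is only a sketch under loud assumptions: letters stand in for phonemes, the five-word lexicon is taken from the figure, and the matching rule and normalization are hypothetical choices, not any published model of spoken word recognition.

```python
# Toy sketch only: the lexicon, scoring rule, and normalization are hypothetical.
LEXICON = ["beach", "bump", "putter", "dog", "butter"]

def match_score(word, heard):
    """Fraction of the input heard so far that the word matches, position by position."""
    if not heard:
        return 0.0
    hits = sum(1 for w, h in zip(word, heard) if w == h)
    return hits / len(heard)

def recognize(spoken, lexicon=LEXICON):
    """Return (input so far, activations) after each successive segment of the input."""
    history = []
    for t in range(1, len(spoken) + 1):      # immediacy: update after every segment
        heard = spoken[:t]
        # activation-based and parallel: every word is scored against the input
        raw = {w: match_score(w, heard) for w in lexicon}
        total = sum(raw.values())
        # competition: candidates share a fixed amount of activation
        acts = {w: (s / total if total else 0.0) for w, s in raw.items()}
        history.append((heard, acts))
    return history

history = recognize("butter")
for heard, acts in history:
    ranked = sorted(acts.items(), key=lambda kv: -kv[1])
    print(heard.ljust(6), " ".join(f"{w}={a:.2f}" for w, a in ranked))
```

Early in the input several candidates are equally active; later segments pull the target ahead while mismatching candidates lose their share.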
These processes have been well defined for a phonemic representation of the input.
But there is considerably less ambiguity if we consider subphonemic information.
Example: subphonemic effects of motor processes.
Coarticulation
Sensitivity to these perceptual details might yield earlier disambiguation.
Example: Coarticulation. Articulation (lips, tongue…) reflects current, future and past events.
Subtle subphonemic variation in speech reflects temporal organization.
[Figure: coarticulation example contrasting “net” and “neck”.]
Any action reflects future actions as it unfolds.
These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded.
Example: Categorical Perception
Categorical Perception
Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
• Sharp identification of tokens on a continuum.
[Figure: identification (% /pa/, 0-100) and discrimination as a function of VOT, from B to P.]
• Discrimination poor within a phonetic category.
Evidence against the strong form of Categorical Perception from psychophysical-type tasks:
Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977)
Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982)
Goodness ratings: Miller (1997); Massaro & Cohen (1983)
Does within-category acoustic detail systematically affect higher level language?
Is there a gradient effect of subphonemic detail on lexical activation?
Experiment 1
A gradient relationship would yield systematic effects of subphonemic information on lexical activation.
If this gradiency is useful for temporal integration, it must be preserved over time.
Need a design sensitive to both acoustic detail and detailed temporal dynamics of lexical activation.
McMurray, Aslin & Tanenhaus (2002)
Use a speech continuum: more steps yield a better picture of the acoustic mapping.
KlattWorks: generate synthetic continua from natural speech.
Acoustic Detail
9-step VOT continua (0-40 ms)
6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter
6 fillers. l-initial: lamp, leg, lock, ladder, lip, leaf; sh-initial: shark, shell, shoe, ship, sheep, shirt
How do we tap on-line recognition? With an on-line task: eye-movements.
Subjects hear spoken language and manipulate objects in a visual world.
Visual world includes set of objects with interesting linguistic properties.
a beach, a peach and some unrelated items.
Eye-movements to each object are monitored throughout the task.
Temporal Dynamics
Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995
• Relatively natural task.
• Eye-movements generated very fast (within 200ms of first bit of information).
• Eye movements time-locked to speech.
• Subjects aren’t aware of eye-movements.
• Fixation probability maps onto lexical activation.
Why use eye-movements and visual world paradigm?
Task
A moment to view the items, then the spoken target (e.g. “Bear”).
Repeat 1080 times.
By subject: 17.25 +/- 1.33ms By item: 17.24 +/- 1.24ms
High agreement across subjects and items for category boundary.
Identification Results
[Figure: proportion of /p/ responses (0-1) as a function of VOT (0-40 ms), from B to P.]
Task
Target = Bear
Competitor = Pear
Unrelated = Lamp, Ship
[Figure: fixations on individual trials (1-5) plotted over time (200 ms scale) and combined into % fixations.]
[Figure: fixation proportion (0-0.9) over time (0-2000 ms), one panel for VOT=0 and one for VOT=40, each labeled with its response.]
More looks to competitor than unrelated items.
Given that
• the subject heard “bear”
• clicked on “bear”…
How often was the subject looking at the “pear”?
[Figure: predicted fixation proportion over time to target and competitor under two outcomes: categorical results vs. gradient effect.]
Results
[Figure: competitor fixations (0-0.16) as a function of time since word onset (ms), one curve per VOT step; one panel for 0, 5, 10, and 15 ms VOT and one for 20, 25, 30, 35, and 40 ms VOT, each labeled with its response.]
Long-lasting gradient effect: seen throughout the timecourse of processing.
Area under the curve:
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms) on either side of the category boundary, by response.]
Clear effects of VOT: B: p=.017*, P: p<.001***
Linear trend: B: p=.023*, P: p=.002***

Unambiguous stimuli only:
[Figure: competitor fixations (0.02-0.08) as a function of VOT (0-40 ms) on either side of the category boundary, by response.]
Clear effects of VOT: B: p=.014*, P: p=.001***
Linear trend: B: p=.009**, P: p=.007**
Summary
Subphonemic acoustic differences in VOT have gradient effect on lexical activation.
• Gradient effect of VOT on looks to the competitor.
• Seems to be long-lasting.
• Effect holds even for unambiguous stimuli.
Consistent with growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).
The Proposed Framework
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
2) Acoustic detail is represented as gradations in activation across the lexicon.
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
4) This has fundamental consequences for development: learning phonological organization.
Sensitivity & Use
Lexical Sensitivity
1) Word recognition is systematically sensitive to subphonemic acoustic detail.
[Figure: competitor fixations (0-0.1) as a function of VOT (0-40 ms) on either side of the category boundary, by response (experiment 1).]
Voicing; Laterality, Manner, Place; Natural Speech
X Metalinguistic Tasks
? Non-minimal pairs
? Duration of effect
2) Acoustic detail is represented as gradations in activation across the lexicon.
[Figure: activation over time for the input “b... u… m… p…”; candidates: bun, bumper, pump, dump, bump, bomb.]
3) This sensitivity enables the system to take advantage of subphonemic regularities for temporal integration.
Regressive ambiguity resolution (exp 1):
• Ambiguity retained until more information arrives.
Progressive expectation building (exp 2):
• Phonetic distinctions are spread over time.
• Anticipate upcoming material.
Temporal Integration
4) Consequences for development: learning phonological organization.
Learning a language:
• Integrating input across many utterances to build long-term representation.
Sensitivity to subphonemic detail (exp 4 & 5).
• Allows statistical learning of categories (model).
Development
Experiment 2
How long are gradient effects of within-category detail maintained?
Can subphonemic variation play a role in ambiguity resolution?
How is information at multiple levels integrated?
Misperception
What if the initial portion of a stimulus was misperceived?
Competitor still active: easy to activate it the rest of the way.
Competitor completely inactive: system will “garden-path”.
P(misperception) depends on distance from the boundary.
Gradient activation allows the system to hedge its bets.
[Figure: activation over time for the input “p/b eI r ə k i t…” (barricade vs. parakeet; /beIrəkeId/ vs. /peIrəkit/), comparing a categorical lexicon with gradient sensitivity.]
10 Pairs of b/p items.
Voiced          Voiceless       Overlap
Bumpercar       Pumpernickel    6
Barricade       Parakeet        5
Bassinet        Passenger       5
Blanket         Plankton        5
Beachball       Peachpit        4
Billboard       Pillbox         4
Drain Pipes     Train Tracks    4
Dreadlocks      Treadmill       4
Delaware        Telephone       4
Delicatessen    Television      4
Methods
Eye Movement Results
[Figure: fixations to target (0-1) over time, one curve per VOT step (0-35 ms); panels: Barricade -> Parricade (300-900 ms) and Parakeet -> Barakeet (300-1200 ms).]
Faster activation of target as VOTs near lexical endpoint.
• Even within the non-word range.
Experiment 2 Conclusions
Gradient effect of within-category variation without minimal pairs.
Gradient effect long-lasting: mean POD = 240 ms.
Regressive ambiguity resolution:
• Subphonemic gradations maintained until more information arrives.
• Subphonemic gradation can improve (or hinder) recovery from garden path.
Progressive Expectation Formation
Can within-category detail be used to predict future acoustic/phonetic events?
Yes: Phonological regularities create systematic within-category variation.
• Predicts future events.
[Figure: activation over time for the input “m… a… rr… oo… ng… g… oo… s…”; candidates: maroon, goose, goat, duck.]
Word-final coronal consonants (n, t, d) assimilate the place of the following segment.
Place assimilation -> ambiguous segments that can be used to anticipate upcoming material.
Experiment 3: Anticipation
Maroong Goose, Maroon Duck
Subject hears:
• “select the maroon duck”
• “select the maroon goose”
• “select the maroong goose”
• “select the maroong duck” *
We should see faster eye-movements to “goose” after assimilated consonants.
Results
[Figure: looks to “goose” as a function of time (0-600 ms; fixation proportion 0-0.9), assimilated vs. non-assimilated; marker at onset of “goose” + oculomotor delay.]
Anticipatory effect on looks to non-coronal.
[Figure: looks to “duck” as a function of time (0-600 ms; fixation proportion 0-0.3), assimilated vs. non-assimilated; marker at onset of “goose” + oculomotor delay.]
Inhibitory effect on looks to coronal (duck, p=.024).
Sensitivity to subphonemic detail:
• Increase priors on likely upcoming events.
• Decrease priors on unlikely upcoming events.
• Active temporal integration process.
Occasionally assimilation creates ambiguity.
• Resolves prior ambiguity: mudg drinker
• Similar to experiment 2…
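One way to picture “increase priors on likely upcoming events, decrease priors on unlikely ones” is a single application of Bayes' rule. The sketch below is only an illustration: the probabilities are made-up numbers, and the function name `update_priors` is hypothetical rather than anything taken from the experiments.

```python
def update_priors(prior, likelihood):
    """Bayes' rule: expectations about the next segment's place of articulation."""
    unnorm = {place: prior[place] * likelihood[place] for place in prior}
    z = sum(unnorm.values())
    return {place: p / z for place, p in unnorm.items()}

# Hypothetical numbers, for illustration only.
prior = {"velar": 0.5, "coronal": 0.5}

# Assumed P(hearing a velar-coloured final nasal, as in "maroong" | next place).
likelihood_assimilated = {"velar": 0.8, "coronal": 0.1}

posterior = update_priors(prior, likelihood_assimilated)
print(posterior)
```

With these assumed values the expectation for a velar onset (“goose”) rises and the expectation for a coronal onset (“duck”) falls, which mirrors the direction of the anticipatory and inhibitory effects.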
Lexical activation is exquisitely sensitive to within-category detail.
This sensitivity is useful to integrate material over time.
• Regressive ambiguity resolution.
• Progressive facilitation.
Taking advantage of phonological and lexical regularities.
Adult Summary
Historically, work in speech perception has been linked to development.
Sensitivity to subphonemic detail must revise our view of development.
Development
Use: Infants face additional temporal integration problems
No lexicon available to clean up noisy input: rely on acoustic regularities.
Extracting a phonology from the series of utterances.
Sensitivity to subphonemic detail:
For 30 years, virtually all attempts to address this question have yielded categorical discrimination (e.g. Eimas, Siqueland, Jusczyk & Vigorito, 1971).
Exception: Miller & Eimas (1996).
• Only at extreme VOTs.
• Only when habituated to non-prototypical token.
Nonetheless, infants possess abilities that would require within-category sensitivity.
• Infants can use allophonic differences at word boundaries for segmentation (Jusczyk, Hohne & Bauman, 1999; Hohne & Jusczyk, 1994).
• Infants can learn phonetic categories from distributional statistics (Maye, Werker & Gerken, 2002; Maye & Weiss, 2004).
Use?
Speech production causes clustering along contrastive phonetic dimensions.
E.g. Voicing / Voice Onset Time
B: VOT ~ 0
P: VOT ~ 40
Result: Bimodal distribution
Within a category, VOT forms Gaussian distribution.
[Figure: bimodal distribution of VOT with modes near 0 ms and 40 ms.]
Statistical Category Learning
To statistically learn speech categories, infants must:
• Record frequencies of tokens at each value along a stimulus dimension.
• Extract categories from the distribution.
[Figure: frequency of tokens as a function of VOT (0-50 ms), with +voice and -voice categories.]
This requires the ability to track specific VOTs.
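The two steps (record token frequencies, then extract categories from the distribution) can be sketched with a simple histogram. Everything numeric here is an assumption made for illustration: the within-category standard deviation of 5 ms, the 5 ms bin width, the sample size, and the peak-and-valley rule are hypothetical, and only the modes near 0 ms and 40 ms come from the slides.

```python
import random
from collections import Counter

random.seed(0)
BIN = 5  # bin width in ms (assumed)

# Simulated input: two Gaussian clusters of VOT, /b/ near 0 ms and /p/ near 40 ms.
tokens = [random.gauss(0, 5) for _ in range(1000)] + \
         [random.gauss(40, 5) for _ in range(1000)]

def bin_of(vot):
    """Map a VOT value to the centre of its bin."""
    return int(round(vot / BIN)) * BIN

# Step 1: record frequencies of tokens at each value along the dimension.
freq = Counter(bin_of(v) for v in tokens)

# Step 2: extract categories as the two most frequent local peaks,
# with a boundary at the least frequent bin between them.
bins = range(min(freq), max(freq) + BIN, BIN)
local_max = [b for b in bins if freq[b] > freq[b - BIN] and freq[b] >= freq[b + BIN]]
peaks = sorted(sorted(local_max, key=lambda b: freq[b], reverse=True)[:2])
boundary = min(range(peaks[0], peaks[1] + BIN, BIN), key=lambda b: freq[b])

print("category centres (ms):", peaks)
print("estimated boundary (ms):", boundary)
```

The sketch only works because each token's specific VOT is tracked before any category exists, which is the point made on the slide.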
Why no demonstrations of sensitivity?
• Habituation: discrimination, not ID. Possible selective adaptation. Possible attenuation of sensitivity.
• Synthetic speech: not ideal for infants.
• Single exemplar/continuum: not necessarily a category representation.
Experiment 4: Reassess issue with improved methods.
Experiment 4
Head-Turn Preference Procedure (Jusczyk & Aslin, 1995)
Infants exposed to a chunk of language:
• Words in running speech.
• Stream of continuous speech (à la statistical learning paradigm).
• Word list.
Memory for exposed items (or abstractions) assessed:
• Compare listening time between consistent and inconsistent items.
HTPP
Test trials start with all lights off.
Center Light blinks.
Brings infant’s attention to center.
One of the side-lights blinks.
When infant looks at side-light, he hears a word (“Beach… Beach… Beach…”) for as long as he keeps looking.
7.5 month old infants exposed to either 4 b-, or 4 p-words.
80 repetitions total.
Form a category of the exposed class of words.
b-words: Beach, Bail, Bear, Bomb
p-words: Peach, Pail, Pear, Palm
Measure listening time on…
• Original words: Bear / Pear
• VOT closer to boundary: Bear* / Pear*
• Competitors: Pear / Bear
Methods
Stimuli constructed by cross-splicing naturally produced tokens of each end point.
B: M = 3.6 ms VOT; P: M = 40.7 ms VOT
B*: M = 11.9 ms VOT; P*: M = 30.2 ms VOT
B* and P* were judged /b/ or /p/ at least 90% consistently by adult listeners (B*: 97%; P*: 96%).
Novelty or Familiarity?
Novelty/Familiarity preference varies across infants and experiments.
Infants were classified as novelty or familiarity preferring by performance on the endpoints.
     Familiarity   Novelty
P    12            21
B    16            36
We’re only interested in the middle stimuli (b*, p*).
Within each group will we see evidence for gradiency?
After being exposed to bear… beach… bail… bomb…
Infants who show a novelty effect will look longer for pear than bear.
What about in between (Bear*)? Categorical or gradient?
[Figure: predicted listening time for Bear, Bear*, and Pear under categorical vs. gradient accounts.]
Results
[Figure: listening time (ms; 4000-10000) for Target, Target*, and Competitor, by exposure group (B, P); one panel for novelty infants and one for familiarity infants.]
Novelty infants (B: 36, P: 21): Target vs. Target*: p<.001; Competitor vs. Target*: p=.017
Familiarity infants (B: 16, P: 12): Target vs. Target*: p=.003; Competitor vs. Target*: p=.012
Infants exposed to /p/
[Figure: listening time (ms) for P, P*, and B.]
Novelty (N=21): .024*, .009**
Familiarity (N=12): .018*, .028*

Infants exposed to /b/
[Figure: listening time (ms) for B, B*, and P.]
Novelty (N=36): <.001**, >.1; <.001**, >.2
Familiarity (N=16): .06, .15
Experiment 4 Conclusions
Contrary to all previous work:
7.5 month old infants show gradient sensitivity to subphonemic detail.
• Clear effect for /p/.
• Effect attenuated for /b/.
Reduced effect for /b/… But:
[Figure: schematic listening times for Bear, Bear*, and Pear in three panels: “Null Effect?”, “Expected Result?”, and “Actual result”.]
• Actual result: Bear* patterns with Pear.
• Category boundary lies between Bear and Bear* (between 3 ms and 11 ms)?
• Within-category sensitivity in a different range?
Same design as experiment 3.
VOTs shifted away from hypothesized boundary
Train
40.7 ms: Palm, Pear, Peach, Pail
3.6 ms: Bomb*, Bear*, Beach*, Bale*
-9.7 ms: Bomb, Bear, Beach, Bale
Test:
-9.7 ms: Bomb, Bear, Beach, Bale
Experiment 5
Familiarity infants (34 infants)
[Figure: listening time (ms; 4000-9000) for B-, B, and P.] p=.05*, p=.01**
Novelty infants (25 infants)
[Figure: listening time (ms; 4000-9000) for B-, B, and P.] p=.02*, p=.002**
Experiment 5 Conclusions
• Within-category sensitivity in /b/ as well as /p/.
• Shifted category boundary in /b/: not consistent with adult boundary (or prior infant work). Why?
/b/ results consistent with (at least) two mappings.
1) Shifted boundary
• Inconsistent with prior literature.
• Why would infants have this boundary?
[Figure: category mapping strength as a function of VOT for /b/ and /p/, with the boundary shifted relative to the adult boundary.]
2) Sparse categories
[Figure: category mapping strength as a function of VOT for /b/ and /p/, with unmapped space around the adult boundary.]
HTPP is a one-alternative task. Asks: B or not-B, not: B or P.
Hypothesis: Sparse categories are a by-product of efficient learning.
Computational Model
Distributional learning model
1) Model distribution of tokens as a mixture of Gaussian distributions over a phonetic dimension (e.g. VOT).
2) After receiving an input, the Gaussian with the highest posterior probability is the “category”.
3) Each Gaussian has three parameters.
[Figure: category mapping strength as a function of VOT for /b/ and /p/, with unmapped space around the adult boundary.]
Statistical Category Learning
1) Start with a set of randomly selected Gaussians.
2) After each input, adjust each parameter to find best description of the input.
3) Start with more Gaussians than necessary: the model doesn’t innately know how many categories there are.
   The contribution of unneeded categories -> 0.
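The model described above can be sketched as an online mixture of Gaussians. This is a minimal sketch under stated assumptions, not the author's implementation: the three parameters are assumed to be a mixing weight `phi`, a mean `mu`, and a standard deviation `sigma`; the update is one plausible stochastic gradient step on the log-likelihood; and the learning rates, step clipping, `min_sigma` floor, and training distribution are all hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Gaussian:
    phi: float    # mixing weight (assumed parameter)
    mu: float     # mean VOT in ms (assumed parameter)
    sigma: float  # standard deviation in ms (assumed parameter)

def weighted_density(g, x):
    z = (x - g.mu) / g.sigma
    return g.phi * math.exp(-0.5 * z * z) / (g.sigma * math.sqrt(2 * math.pi))

def posteriors(mixture, x):
    """Posterior probability of each Gaussian given input x."""
    dens = [weighted_density(g, x) for g in mixture]
    total = sum(dens)
    if total == 0.0:  # input far from every Gaussian: fall back to uniform
        return [1.0 / len(mixture)] * len(mixture)
    return [d / total for d in dens]

def categorize(mixture, x):
    """The 'category' is the Gaussian with the highest posterior."""
    post = posteriors(mixture, x)
    return max(range(len(mixture)), key=lambda k: post[k])

def clip(step, limit=2.0):
    return max(-limit, min(limit, step))

def update(mixture, x, lr=1.0, lr_phi=0.01, min_sigma=1.0):
    """Adjust every parameter a little after a single input."""
    post = posteriors(mixture, x)
    for g, r in zip(mixture, post):
        grad_mu = r * (x - g.mu) / g.sigma ** 2
        grad_sigma = r * ((x - g.mu) ** 2 - g.sigma ** 2) / g.sigma ** 3
        g.mu += clip(lr * grad_mu)
        g.sigma = max(min_sigma, g.sigma + clip(lr * grad_sigma))
        # Weights drift toward each Gaussian's share of the input,
        # so rarely used Gaussians decay toward 0.
        g.phi += lr_phi * (r - g.phi)

# Start with more Gaussians than needed, at random locations, with small sigma.
random.seed(1)
model = [Gaussian(1 / 6, random.uniform(-20, 60), 3.0) for _ in range(6)]
for _ in range(5000):
    centre = random.choice([0.0, 40.0])        # hypothetical /b/ and /p/ modes
    update(model, random.gauss(centre, 5.0))   # hypothetical within-category spread
for g in sorted(model, key=lambda g: g.mu):
    print(f"phi={g.phi:.3f}  mu={g.mu:6.1f}  sigma={g.sigma:5.1f}")
```

Because the posteriors sum to 1, the mixing weights keep summing to 1 without an explicit normalization step; other update schemes (for example batch EM) would serve the same illustrative purpose.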
Overgeneralization
• large σ
• costly: lose phonetic distinctions…
Undergeneralization
• small σ
• not as costly: maintain distinctiveness.
To increase likelihood of successful learning:
• err on the side of caution.
• start with small σ.
[Figure: P(Success) (0-1) as a function of starting σ (0-60) for the 2-category model and the 3-category model; 39,900 models run.]
Sparseness coefficient: % of space not strongly mapped to any category.
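A sparseness coefficient of this kind could be computed as follows. This is a hedged sketch: the slides do not say how “strongly mapped” is defined, so the density threshold, the VOT range, the grid step, and the function names are all assumptions made for illustration.

```python
import math

def mapping_strength(x, categories):
    """Strongest weighted Gaussian density at x; categories = [(phi, mu, sigma), ...]."""
    return max(
        phi * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        for phi, mu, sigma in categories
    )

def sparseness(categories, lo=-20.0, hi=60.0, step=0.5, threshold=0.005):
    """Fraction of the VOT range whose mapping strength is below threshold (assumed rule)."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    unmapped = sum(1 for x in grid if mapping_strength(x, categories) < threshold)
    return unmapped / len(grid)

# Hypothetical category sets: same means, different widths.
narrow = [(0.5, 0.0, 2.0), (0.5, 40.0, 2.0)]
broad = [(0.5, 0.0, 10.0), (0.5, 40.0, 10.0)]

print("narrow categories:", round(sparseness(narrow), 3))
print("broad categories: ", round(sparseness(broad), 3))
```

Under these assumptions narrow categories leave more of the dimension unmapped than broad ones, which is the property the coefficient is meant to track over training.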
[Figure: average sparseness coefficient (0-0.4) over training epochs (0-12000), by starting σ. Small starting σ (.5-1): unmapped space remains. Large starting σ (20-40). Intermediate starting σ (3-11 and 12-17).]
Limitations
1) Occasionally model leaves sparse regions at the end of learning.
• Competition/choice framework: additional competition or selection mechanisms during processing allow categorization despite incomplete information.
2) Multi-dimensional categories
1-D: 3 parameters / category
2-D: 6 parameters / category
3-D: 10 parameters / category
4-D: 15 parameters / category
• Cue/model-reliability may reduce dimensionality.
Non-parametric approach?
• Competitive Hebbian Learning (Rumelhart & Zipser, 1986).
• Not constrained by a particular equation: can fill space better.
• Similar properties in terms of starting σ and sparseness.
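The growth in parameters per category can be checked with a short count. The sketch assumes each category is a Gaussian with one mixing weight, a d-dimensional mean, and a full symmetric covariance matrix; that parameterization is an assumption, although it reproduces the 1-D, 2-D, and 4-D counts listed under the limitations and gives 10 for three dimensions.

```python
def gaussian_param_count(d):
    """Mixing weight + d means + d(d+1)/2 unique covariance entries (assumed parameterization)."""
    return 1 + d + d * (d + 1) // 2

for d in range(1, 5):
    print(f"{d}-D: {gaussian_param_count(d)} parameters / category")
```

The covariance term grows quadratically with the number of cues, which is why reducing dimensionality matters for learnability.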
Model Conclusions
To avoid overgeneralization, it is better to start with small estimates for σ.
Small or even medium starting σ’s lead to sparse category structure during infancy: much of phonetic space is unmapped.
Sparse categories: similar temporal integration to exp 2.
Retain ambiguity (and partial representations) until more input is available.
Examination of sparseness/completeness of categories needs a two alternative task.
Anticipatory Eye Movements (McMurray & Aslin, 2005)
Infants are trained to make anticipatory eye movements in response to an auditory or visual stimulus.
Post-training, generalization can be assessed with respect to both targets.
AEM Paradigm
[Demo: example trials with “bear” and “pail”.]
Also useful with:
• Color
• Shape
• Spatial Frequency
• Faces
Experiment 6
Train: Bear0: Left; Pail35: Right
Test:
Bear0     Pear40
Bear5     Pear35
Bear10    Pear30
Bear15    Pear25
Same naturally-produced tokens from Exps 4 & 5.
Expected results
[Figure: predicted performance as a function of VOT for Bear and Pail under sparse categories, with unmapped space around the adult boundary.]
Results
Training tokens: % correct: 67%; 9 / 16 better than chance.
[Figure: % correct (0-1) as a function of VOT (0-40) for Beach and Palm.]
Infant Summary
Infants show graded sensitivity to subphonemic detail.
/b/-results: regions of unmapped phonetic space.
Statistical approach provides support for sparseness.
• Given current learning theories, sparseness results from optimal starting parameters.
Empirical test will require a two-alternative task.
• AEM: train infants to make eye-movements in response to stimulus identity.
Conclusions
Infants and adults are sensitive to subphonemic detail.
Sensitivity is important to adult and developing word recognition systems.
1) Short term cue integration.
2) Long term phonology learning.
In both cases…
Partially ambiguous material is retained until more data arrives.
Partially active representations anticipate likelihood of future material.
Conclusions
Spoken language is defined by change.
But the information to cope with it is in the signal—if we look online.
Within-category acoustic variation is signal, not noise.
[Figure: eye-tracking setup: head-tracker cam and monitor, IR head-tracker emitters, head unit with 2 eye cameras, eyetracker computer and subject computer connected via Ethernet.]
Misperception: Additional Results
10 Pairs of b/p items.
• 0 – 35 ms VOT continua.
20 Filler items (lemonade, restaurant, saxophone…)
Option to click “X” (Mispronounced).
26 Subjects
1240 Trials over two days.
Identification Results
[Figure: response rate (0-1) for voiced, voiceless, and non-word (NW) responses as a function of VOT (0-35 ms); panels: Barricade - Parricade and Barakeet - Parakeet.]
Significant target responses even at extreme.
Graded effects of VOT on correct response rate.
Phonetic “Garden-Path”
“Garden-path” effect: difference between looks to each target (b vs. p) at same VOT.
[Figure: fixations to target (0-1) over time for Barricade and Parakeet; panels: VOT = 0 (/b/) and VOT = 35 (/p/).]
[Figure: garden-path effect (Barricade - Parakeet) as a function of VOT (0-35 ms); panels: target and competitor.]
GP effect: gradient effect of VOT.
Target: p<.0001; Competitor: p<.0001
Assimilation: Additional Results
runm picks
runm takes ***
When /p/ is heard, the bilabial feature can be assumed to come from assimilation (not an underlying /m/).
When /t/ is heard, the bilabial feature is likely to be from an underlying /m/.
Within-category detail used in recovering from assimilation: temporal integration.
• Anticipate upcoming material.
• Bias activations based on context.
  - Like Exp 2: within-category detail retained to resolve ambiguity.
Phonological variation is a source of information.
Exp 3 & 4: Conclusions
Subject hears:
• “select the mud drinker”
• “select the mudg gear”
• “select the mudg drinker”
Critical Pair
[Figure: fixation proportion (0-0.45) over time (0-2000 ms) to the initial coronal (Mud Gear) and the initial non-coronal (Mug Gear); markers at onset of “gear” and avg. offset of “gear” (402 ms).]
Mudg Gear is initially ambiguous with a late bias towards “Mud”.
[Figure: fixation proportion (0-0.6) over time (0-2000 ms) to the initial coronal (Mud Drinker) and the initial non-coronal (Mug Drinker); markers at onset of “drinker” and avg. offset of “drinker” (408 ms).]
Mudg Drinker is also ambiguous with a late bias towards “Mug” (the /g/ has to come from somewhere).
[Figure: looks to non-coronal (gear) following assimilated or non-assimilated consonant; fixation proportion (0-0.8) over time (0-600 ms); marker at onset of “gear”.]
In the same stimuli/experiment there is also a progressive effect!