

Source: wraplab.co/publications/GetzToscano-Psychonomics-2018.pdf

McGurk Doesn’t Work: Evidence Against the McGurk Effect as a Perceptual Illusion

Laura M. Getz & Joseph C. Toscano
[laura.getz, joseph.toscano]@villanova.edu

INTRODUCTION

Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this.

[Figure: HEAR / SEE / REPORT schematic with the syllables “ba”, “da”, “ga”]

Expt.                        Subjects           Report “Ba”   Report “Ga”   Report “Da”
McGurk & MacDonald (1976)    3-5 yr (n=21)      19%           0%            81%
                             7-8 yr (n=28)      36%           0%            64%
                             18-40 yr (n=54)    2%            0%            98%
MacDonald & McGurk (1978)    18-24 yr (n=44)    9%            27%           64%

McGurk & MacDonald’s explanation for the illusory “fusion” effect deals with the way the sounds are articulated.

             Bilabial   Alveolar   Velar
Voiced       /b/        /d/        /g/
Voiceless    /p/        /t/        /k/

More recent work shows that the effect may not be as robust as previously believed, as the proportion of fusion responses depends on individual and task differences (Mallick et al., 2015).

CURRENT EXPERIMENTS

We set out to systematically look at these individual differences, investigating a number of factors that could influence fusion rates.

[Figure labels: Stimulus, Participant, Design]

➢ Participant differences: lab vs. online
  • Lab: Villanova University Intro Psychology students
    - Age range: 18-21 years
  • Online: Amazon’s Mechanical Turk (MTurk)
    - Age range: 21-72 years
➢ Stimulus differences: synthetic vs. natural
  • Synthetic: Klatt-synthesized audio; /ɑ/ vs. /æ/ vowel contexts
    - Baldi visual lip movements; /ba/ vs. /da/
    - Combined audio and video using iMovie
  • Natural: 2 male and 2 female talkers (Mallick et al., 2015)
    - Congruent AV stimuli separated and recombined in audB-visG and audG-visB combinations using iMovie
➢ Design differences: open-ended vs. 3-alternative forced choice
    - Asked to report: What did the speaker say?

RESULTS

[Figure: Auditory, Fusion, Visual, and Other responses by experiment (Synthetic-Lab, Natural-Lab, Natural-MTurk) and design (Forced Choice, Open-Ended)]

[Figure labels, placement not recoverable: M=45 years, M=35 years, M=36 years, M=39 years]

Synthetic Stimuli: Lab Participants

                    Forced Choice (N=24)    Open-Ended (N=11)
                    BA     DA     GA        BA     DA     GA     combo  other
audioB              0.93   0.02   0.05      0.73   0.05   0.02   0.00   0.20
audioD              0.01   0.95   0.04      0.01   0.72   0.02   0.00   0.25
audioG              0.04   0.08   0.88      0.03   0.14   0.75   0.00   0.09
visualB             0.89   0.05   0.06      0.52   0.04   0.04   0.00   0.40
visualD/G           0.02   0.51   0.46      0.01   0.32   0.23   0.00   0.44
AV-congruentB       0.89   0.05   0.06      0.41   0.00   0.02   0.00   0.55
AV-congruentD       0.01   0.93   0.06      0.00   0.64   0.02   0.00   0.34
AV-congruentG       0.00   0.02   0.98      0.00   0.01   0.94   0.00   0.03
AV-audioB-visD/G    0.39   0.34   0.26      0.05   0.01   0.08   0.00   0.86
AV-audioG-visB      0.10   0.02   0.89      0.04   0.02   0.88   0.00   0.04

Natural Stimuli: Lab Participants

                    Forced Choice (N=46)    Open-Ended (N=46)
                    BA     DA     GA        BA     DA     GA     combo  other
audioB              0.98   0.02   0.00      0.98   0.00   0.00   0.00   0.01
audioD              0.00   0.99   0.01      0.00   0.99   0.00   0.00   0.00
audioG              0.00   0.01   0.99      0.00   0.00   1.00   0.00   0.00
visualB             0.99   0.00   0.01      0.99   0.00   0.00   0.00   0.00
visualD             0.00   0.91   0.09      0.01   0.85   0.11   0.00   0.03
visualG             0.01   0.39   0.60      0.01   0.38   0.57   0.00   0.03
AV-congruentB       0.99   0.01   0.00      0.99   0.00   0.01   0.00   0.00
AV-congruentD       0.01   0.95   0.04      0.00   0.97   0.02   0.00   0.01
AV-congruentG       0.00   0.06   0.94      0.00   0.05   0.94   0.00   0.00
AV-audioB-visG      0.75   0.14   0.11      0.74   0.10   0.11   0.00   0.04
AV-audioG-visB      0.21   0.01   0.79      0.23   0.00   0.74   0.01   0.00

Natural Stimuli: Online MTurk Participants

                    Forced Choice (N=37)    Open-Ended (N=39)
                    BA     DA     GA        BA     DA     GA     combo  other
audioB              0.93   0.04   0.02      0.80   0.02   0.00   0.00   0.18
audioD              0.02   0.91   0.07      0.00   0.94   0.03   0.00   0.03
audioG              0.02   0.02   0.96      0.00   0.02   0.95   0.00   0.03
visualB             0.90   0.08   0.02      0.89   0.02   0.02   0.00   0.06
visualD             0.03   0.82   0.15      0.02   0.55   0.14   0.01   0.28
visualG             0.04   0.50   0.46      0.01   0.35   0.43   0.01   0.20
AV-congruentB       0.94   0.05   0.01      0.92   0.01   0.00   0.00   0.07
AV-congruentD       0.01   0.93   0.06      0.00   0.95   0.03   0.00   0.02
AV-congruentG       0.03   0.03   0.94      0.01   0.02   0.95   0.00   0.02
AV-audioB-visG      0.49   0.41   0.09      0.37   0.17   0.06   0.00   0.40
AV-audioG-visB      0.03   0.02   0.95      0.09   0.01   0.77   0.11   0.02

DISCUSSION

➢ Lower proportion of fusion responses overall than in original experiments
  • Open-ended MTurk fusion response rate (0.38) similar to Mallick et al. (2015) with “tha” included as a fusion response
  • Participant differences: more fusion responses on MTurk than in lab
    - One relevant individual difference may be age, with older participants more likely to show the fusion effect
    - This suggests that phonetic cue weights continue to change across the lifespan, in line with previous work (Toscano & Lansing, 2017)
  • Stimulus differences: synthetic stimuli resulted in more “other” responses, suggesting that despite high control, we may need to use natural stimuli to see the fusion effect
  • Design differences: in each experiment, more fusion responses to 3-alternative forced-choice than open-ended design (cf. Mallick et al., 2015)
    - Similar proportion of fusion responses with single modality trials integrated with AV trials and blocked design
➢ Rather than a robust perceptual illusion, we argue that the McGurk effect is a product of individual differences and task demands
  • Maybe it’s time to find a more reliable classroom demonstration of visual influence on spoken word recognition?

REFERENCES

MacDonald, J., & McGurk, H. (1978). Visual influences on speech perception processes. Perception & Psychophysics.

Mallick, D. B., Magnotti, J. F., & Beauchamp, M. S. (2015). Variability and stability in the McGurk effect: Contributions of participants, stimuli, time, and response type. Psychonomic Bulletin & Review.

Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. MIT Press.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature.

Toscano, J. C., & Lansing, C. R. (2017). Age-related changes in temporal and spectral cue weights in speech. Language and Speech.
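
SUPPLEMENTARY SKETCH

The Results figure groups responses as Auditory, Fusion, Visual, and Other, while the tables above report raw BA / DA / GA / combo / other proportions. The following is a minimal Python sketch of one way to collapse the audioB-visG rows into those four categories and compare designs. It is not the authors' analysis code. The mapping (BA = Auditory, DA = Fusion, GA = Visual, combo + other = Other) is an assumption, and the names `AUDIO_B_VIS_G`, `collapse`, and `fusion_gap` are hypothetical. The synthetic experiment is left out because its visual token is ambiguous between D and G.

```python
# Proportions for the incongruent audioB-visG trials, copied from the Results tables.
# Forced-choice rows have three options; open-ended rows add "combo" and "other".
AUDIO_B_VIS_G = {
    ("Natural-Lab", "forced"): {"BA": 0.75, "DA": 0.14, "GA": 0.11},
    ("Natural-Lab", "open"): {"BA": 0.74, "DA": 0.10, "GA": 0.11,
                              "combo": 0.00, "other": 0.04},
    ("Natural-MTurk", "forced"): {"BA": 0.49, "DA": 0.41, "GA": 0.09},
    ("Natural-MTurk", "open"): {"BA": 0.37, "DA": 0.17, "GA": 0.06,
                                "combo": 0.00, "other": 0.40},
}


def collapse(row):
    """Collapse raw responses into four categories (assumed mapping, see lead-in)."""
    return {
        "Auditory": row["BA"],  # response matches the audio token
        "Fusion": row["DA"],    # the classic fused percept
        "Visual": row["GA"],    # response matches the visual token
        "Other": row.get("combo", 0.0) + row.get("other", 0.0),
    }


def fusion_gap(experiment):
    """Forced-choice minus open-ended fusion proportion for one experiment."""
    forced = collapse(AUDIO_B_VIS_G[(experiment, "forced")])["Fusion"]
    open_ended = collapse(AUDIO_B_VIS_G[(experiment, "open")])["Fusion"]
    return round(forced - open_ended, 2)


if __name__ == "__main__":
    for (experiment, design), row in AUDIO_B_VIS_G.items():
        print(experiment, design, collapse(row))
    for experiment in ("Natural-Lab", "Natural-MTurk"):
        print(experiment, "forced minus open fusion gap:", fusion_gap(experiment))
```

Under this mapping the open-ended MTurk fusion proportion is the DA value alone (0.17). The 0.38 rate quoted in the Discussion also counts “tha” responses, which the table folds into “other”, so it cannot be recovered from the tabled values.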