
Neural Representations of Speech at the “Cocktail Party” in Human Auditory Cortex

Jonathan Z. Simon
Department of Electrical & Computer Engineering
Department of Biology
Institute for Systems Research
University of Maryland

Peking University / NYU Shanghai / Zhejiang University 2016
http://www.isr.umd.edu/Labs/CSSL/simonlab

Acknowledgements

Current (Simon Lab & Affiliates): Christian Brodbeck, Francisco Cervantes, David Nahmias, Mahshid Najafi, Krishna Puvvada, Peng Zan

Past (Simon Lab & Affiliate Labs): Nayef Ahmar, Murat Aytekin, Claudia Bonin, Maria Chait, Marisel Villafane Delgado, Kim Drnec, Nai Ding, Victor Grau-Serrat, Julian Jenkins, Natalia Lapinskaya, Kai Sum Li, Huan Luo, Ling Ma, Alex Presacco, Raul Rodriguez, Ben Walsh, Juanjuan Xiang, Jiachen Zhuo

Collaborators: Pamela Abshire, Samira Anderson, Behtash Babadi, Catherine Carr, Monita Chatterjee, Alain de Cheveigné, Didier Depireux, Mounya Elhilali, Bernhard Englitz, Jonathan Fritz, Stefanie Kuchinsky, Cindy Moss, David Poeppel, Shihab Shamma

Past Postdocs & Visitors: Sahar Akram, Aline Gesualdi Manhães, Dan Hertz, Jonas Vanthornhout, Yadong Wang

Undergraduate Students: Abdulaziz Al-Turki, Nicholas Asendorf, Sonja Bohr, Elizabeth Camenga, Corinne Cameron, Julien Dagenais, Katya Dombrowski, Kevin Hogan, Kevin Kahn, Alexandria Miller, Andrea Shome, Sandra Soltz, Madeleine Varmer, James Williams

Funding: NIH (NIDCD, NIA, NIBIB); USDA; UMD

Outline

• Cortical Representations of Speech (via MEG)

‣ Encoding vs. Decoding

• Cortical Representations of the “Cocktail Party”

• Recent Results

‣ Attentional Dynamics

‣ Aging & Cortical Representations of Speech

‣ Higher Level Interference & Noise

Functional Brain Imaging = Non-invasive recording from human brain

Hemodynamic techniques
• fMRI: functional magnetic resonance imaging
• PET: positron emission tomography
• Excellent Spatial Resolution (~1 mm)
• Poor Temporal Resolution (~1 s)

Electromagnetic techniques
• EEG: electroencephalography
• MEG: magnetoencephalography
• Poor Spatial Resolution (~1 cm)
• Excellent Temporal Resolution (~1 ms)

fMRI & MEG can capture effects in single subjects

MEG & Auditory Cortex

• Non-invasive, Passive, Silent Neural Recordings

• MEG Response Patterns Time-Locked to Stimulus Events

• Robust

• Strongly Lateralized

• Cortical Origin Only

[Figure: MEG responses to a Pure Tone and to Broadband Noise; x-axis: time (ms)]

MEG Responses to Speech Modulations

[Diagram: Auditory Model]

Ding & Simon, J Neurophysiol (2012)

“Spectro-Temporal Response Function”

(up to ~10 Hz)

MEG Responses Predicted by STRF Model

Linear Kernel = STRF

Long duration speech: ~60 s
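The slide describes the STRF as a linear kernel that maps the stimulus to the predicted MEG response. A minimal sketch of that idea follows, assuming the stimulus is a spectrogram on a uniform time grid and the kernel is indexed by (frequency band, lag). The function name `predict_response`, the array shapes, and the toy values are illustrative assumptions and are not taken from Ding & Simon (2012).

```python
import numpy as np

def predict_response(spectrogram, strf):
    """Predict a response by filtering each frequency band with its kernel row.

    spectrogram: array (n_bands, n_times), the stimulus representation
    strf:        array (n_bands, n_lags), the linear kernel (hypothetical layout)
    """
    n_bands, n_times = spectrogram.shape
    predicted = np.zeros(n_times)
    for band in range(n_bands):
        # Causal convolution: the response at time t depends on the
        # stimulus at times t, t-1, ..., t-(n_lags-1).
        predicted += np.convolve(spectrogram[band], strf[band])[:n_times]
    return predicted

# Toy check: an impulse in one band returns that band's kernel row.
spec = np.zeros((2, 10))
spec[1, 3] = 1.0
kernel = np.array([[0.0, 0.0, 0.0],
                   [1.0, -2.0, 0.5]])
response = predict_response(spec, kernel)
print(response)
```

In practice the kernel is estimated from data rather than specified by hand; the sketch only shows the forward (encoding) direction once a kernel is available.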


Neural Reconstruction of Speech Envelope

[Diagram: MEG Responses ... → Decoder → Speech Envelope (up to ~10 Hz)]

[Figure: stimulus speech envelope and reconstructed stimulus speech envelope; scale bar: 2 s]

Reconstruction accuracy comparable to single unit & ECoG recordings

Ding & Simon, J Neurophysiol (2012); Zion-Golumbic et al., Neuron (2013)
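The decoder on this slide maps MEG responses back to the speech envelope. One common way to build such a backward model is regularized linear regression on time-lagged sensor data. The sketch below assumes that approach and runs on synthetic data. The helper names (`lagged_design`, `fit_decoder`, `reconstruct`), the ridge penalty, the lag count, and the simulated sensors are all hypothetical and are not the pipeline used in the cited papers.

```python
import numpy as np

def lagged_design(responses, n_lags):
    """Stack every sensor at lags 0..n_lags-1 after the stimulus time.

    responses: array (n_times, n_sensors). Row t of the result holds
    responses[t + lag] for each lag, zero-padded at the end.
    """
    n_times, n_sensors = responses.shape
    design = np.zeros((n_times, n_sensors * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_sensors, (lag + 1) * n_sensors)
        design[: n_times - lag, cols] = responses[lag:]
    return design

def fit_decoder(responses, envelope, n_lags, ridge=1.0):
    """Ridge-regularized least squares from lagged responses to the envelope."""
    X = lagged_design(responses, n_lags)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ envelope)

def reconstruct(responses, weights, n_lags):
    return lagged_design(responses, n_lags) @ weights

# Synthetic data: three noisy "sensors" that follow the envelope 5 samples late.
rng = np.random.default_rng(0)
n_times, delay, n_lags = 2000, 5, 10
envelope = rng.standard_normal(n_times)
delayed = np.concatenate([np.zeros(delay), envelope[:-delay]])
gains = np.array([1.0, -0.8, 0.5])
responses = delayed[:, None] * gains + 0.5 * rng.standard_normal((n_times, 3))

# Fit on the first half, evaluate on the held-out second half.
half = n_times // 2
weights = fit_decoder(responses[:half], envelope[:half], n_lags)
estimate = reconstruct(responses[half:], weights, n_lags)
accuracy = np.corrcoef(estimate, envelope[half:])[0, 1]
print(f"held-out reconstruction accuracy (Pearson r): {accuracy:.2f}")
```

Here reconstruction accuracy is taken to be the correlation between the reconstructed and the actual envelope on held-out data, which is one reasonable reading of the term as used on these slides.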


Neural Representation of Speech: Temporal

Speech in Stationary Noise

Ding & Simon, J Neuroscience (2013)


Speech in Noise: Results

[Figure, panel A: Neural Reconstruction of Underlying Speech Envelope, shown at +6 dB and -6 dB; scale bar: 1 s]

[Figure, panel B: Reconstruction Accuracy; y-axis: correlation (0 to .2); x-axis: SNR (dB): Q, +6, +2, −3, −6, −9]

[Figure, panel C: Correlation with Intelligibility across Subjects; x-axis: intelligibility (%), 0 to 100; y-axis: reconstruction accuracy (0 to .3)]

Ding & Simon, J Neuroscience (2013)

Noise-Vocoded Speech

Ding, Chatterjee & Simon, NeuroImage (2014)

Intelligibility Reflected only in Delta Band (1–4 Hz)
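To look at the delta band separately, a signal first has to be limited to 1–4 Hz. The sketch below does this with an idealized FFT mask, chosen here only because it is short and dependency-free. The function name `bandlimit`, the 100 Hz sampling rate, and the test tones are assumptions; the filtering actually used in Ding, Chatterjee & Simon (2014) is not stated on the slide.

```python
import numpy as np

def bandlimit(signal, fs, low_hz, high_hz):
    """Zero every Fourier component outside [low_hz, high_hz] (zero-phase)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 100.0                               # assumed sampling rate in Hz
t = np.arange(1000) / fs                 # 10 s of samples
slow = np.sin(2 * np.pi * 2.0 * t)       # 2 Hz component, inside the delta band
fast = np.sin(2 * np.pi * 8.0 * t)       # 8 Hz component, outside the delta band
delta = bandlimit(slow + fast, fs, 1.0, 4.0)
print(np.max(np.abs(delta - slow)))      # residual after removing the 8 Hz part
```

A brick-wall mask like this rings on real, non-periodic data, so a properly designed band-pass filter would normally be preferred for analysis.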

Multiple Cortical Speech Representations?

Di Liberto, et al. (2015) Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing

Kayser et al. (2015) Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha

Ding et al. (2015) Cortical tracking of hierarchical linguistic structures in connected speech

Cortical Speech Representations

• Neural Representations: Encoding & Decoding

• Linear models: Useful & Robust

• Speech Envelope only (as seen in MEG)

• Envelope Rates: ~ 1 - 10 Hz

• Intelligibility linked to lower range of frequencies (Delta)

Alex Katz, The Cocktail Party

Listening to Speech at the Cocktail Party


Competing Speech Streams

[Figure: speech and competing speech]

Selective Neural Encoding

Unselective vs. Selective Neural Encoding

Selective Encoding: Results

[Figure: representative subject; attended speech envelopes reconstructed from MEG, while attending to speaker 1 and while attending to speaker 2]

Identical Stimuli!

Ding & Simon, PNAS (2012)

Single Trial Speech Reconstruction

Ding & Simon, PNAS (2012)
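Once an envelope has been reconstructed from a single trial, one simple way to ask which speaker was attended is to compare its correlation with each speaker's envelope. The sketch below illustrates that comparison on synthetic signals. The function `classify_attended`, the mixing weights, and the noise level are hypothetical and do not reproduce the analysis in Ding & Simon (2012).

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equal-length 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])

def classify_attended(reconstruction, envelope_1, envelope_2):
    """Return (label, r1, r2); label is 1 or 2, whichever speaker correlates more."""
    r1 = pearson(reconstruction, envelope_1)
    r2 = pearson(reconstruction, envelope_2)
    return (1 if r1 >= r2 else 2), r1, r2

# Synthetic trial: the reconstruction is dominated by speaker 2.
rng = np.random.default_rng(1)
speaker_1 = rng.standard_normal(500)
speaker_2 = rng.standard_normal(500)
reconstruction = (0.6 * speaker_2 + 0.2 * speaker_1
                  + 0.5 * rng.standard_normal(500))

label, r1, r2 = classify_attended(reconstruction, speaker_1, speaker_2)
print(f"r1={r1:.2f}, r2={r2:.2f}, classified as attending to speaker {label}")
```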


STRF Results

• STRF separable (time, frequency)
• 300 Hz - 2 kHz dominant carriers
• M50STRF positive peak
• M100STRF negative peak
• M100STRF strongly modulated by attention, but not M50STRF

[Figure: STRFs for Attended and Background speech; x-axis: time (ms), 0 to 200; y-axis: frequency (kHz), .2 to 3]

[Figure: TRF for attended and background speech]

Neural Sources

[Figure: source locations of M50STRF, M100STRF, and M100 in Right and Left hemispheres; directions: anterior, posterior, medial; scale bar: 5 mm]

• M100STRF source near (same as?) M100 source: Planum Temporale
• M50STRF source is anterior and medial to M100 (same as M50?): Heschl’s Gyrus
• PT strongly modulated by attention, but not HG

Recent Results

• Attentional Dynamics

• Aging & Cortical Representations of Speech

• High Level Interference & Noise


Attentional Dynamics

[Figure: Probability of attending Speaker 1, shown for three conditions: Attend to Speaker 1, Switch Attention, Attend to Speaker 2]

Akram et al., NeuroImage (2016)
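The figure tracks the probability of attending Speaker 1 as attention is held or switched. As a rough illustration only, the sketch below computes a per-window score by passing the difference between the two speakers' correlations through a logistic function. This is a deliberately simple stand-in and is not the estimator of Akram et al. (2016). The function `attention_probability`, the `gain` parameter, the window length, and the simulated listener are all assumptions.

```python
import numpy as np

def attention_probability(reconstruction, envelope_1, envelope_2, window, gain=5.0):
    """Logistic of the per-window correlation difference (illustrative only)."""
    probs = []
    for start in range(0, len(reconstruction) - window + 1, window):
        seg = slice(start, start + window)
        r1 = np.corrcoef(reconstruction[seg], envelope_1[seg])[0, 1]
        r2 = np.corrcoef(reconstruction[seg], envelope_2[seg])[0, 1]
        # Values near 1 favor speaker 1, values near 0 favor speaker 2.
        probs.append(1.0 / (1.0 + np.exp(-gain * (r1 - r2))))
    return np.array(probs)

# Simulated listener: follows speaker 1 for the first half, then switches.
rng = np.random.default_rng(2)
n, window = 2000, 200
speaker_1 = rng.standard_normal(n)
speaker_2 = rng.standard_normal(n)
attended = np.concatenate([speaker_1[: n // 2], speaker_2[n // 2:]])
reconstruction = attended + 0.5 * rng.standard_normal(n)

probs = attention_probability(reconstruction, speaker_1, speaker_2, window)
print(np.round(probs, 2))
```

Independent windows trade time resolution against noise; a model that shares information across neighboring windows can follow a switch more smoothly than this sketch does.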

Recent Results

• Attentional Dynamics

• Aging & Cortical Representations of Speech

• High Level Interference & Noise

Aging & Auditory Cortex

Average Responses to Pure Tone

[Figure: Evoked response to the 500 Hz tone burst, Younger vs. Older; x-axis: Time (ms), -200 to 800; y-axis: Standardized amplitude (fT), -5 to 3; asterisks as marked in the original]

[Figure: M100 Power by Subject, Older vs. Younger; x-axis: Subject; y-axis: z-score, 0 to 6]

Over-Representation

Speech Over-Representation

[Figure: Speech Reconstruction by Subject; y-axis: Reconstruction Accuracy]

[Figure: Speech Reconstruction by SNR, Older vs. Younger; conditions: Quiet, +3 dB, 0 dB, -3 dB, -6 dB; y-axis: Reconstruction Accuracy (r), 0 to 0.4]

[Figure: Reconstruction Accuracy (r), 0 to 0.4, by Integration time (ms): 500, 350, 150, in Quiet and in Noise; significance markers (***, **, *, N.S.) as in the original]

Presacco et al., J Neurophysiol (2016a)

Aging & Integration Time

[Figure: Reconstruction Accuracy vs. Integration window (ms), Older Adults and Younger Adults; panels: In Quiet, with Competing Speaker]

Presacco et al., J Neurophysiol (2016a)

Neural vs Inhibitory Control

[Figure: Reconstruction Accuracy vs. Flanker Score; Older: ρ = –0.62; Younger: ρ = 0.43]

Presacco et al., J Neurophysiol (2016b)

Recent Results

• Attentional Dynamics

• Aging & Cortical Representations of Speech

• High Level Interference & Noise

High Level Interference Effects

[Figure: Speech Reconstruction by SNR, with In Quiet and Unfamiliar conditions marked; y-axis: Reconstruction Accuracy]

• Unfamiliarity of Background
- Boosts Intelligibility of Attended Speech
- Also Boosts Cortical Reconstruction of Attended Speech

Presacco et al., J Neurophysiol (2016b)

Summary

• Cortical representations of speech
- representation of envelope (up to ~10 Hz)
- robust against a variety of noise types
- neural representation of perceptual object

• Object-based representation at 100 ms latency (PT), but not by 50 ms (HG)

• Robust Dynamical Foreground Monitoring

• Over-Representation with Aging
- Reconstruction depends on integration time
- Over-Representation tracks inhibitory control

• Background familiarity: neural tracks behavior

Thank You
