medizinische physik auditory principles in speech...

41
Medizinische Physik Auditory principles in speech Auditory principles in speech processing processing do computers need do computers need silicon ears ? silicon ears ? Prof. Dr. Prof. Dr. rer rer . nat. Dr. med. . nat. Dr. med. Birger Birger Kollmeier Kollmeier * * * with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group

Upload: others

Post on 11-May-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Auditory principles in speech Auditory principles in speech processing processing �� do computers need do computers need

silicon ears ?silicon ears ?

Prof. Dr. Prof. Dr. rerrer. nat. Dr. med.. nat. Dr. med.

Birger Birger KollmeierKollmeier**

* with contributions by V. Hohmann, M. Kleinschmidt, T. Brand, J. Nix, R. Beutelmann, and more members of our medical physics group

Page 2: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

A quick test of your Cocktail-party-processor....

A quick test of your Cocktail-party-processor....

Can you understand this sentence?(Three repetitions, increasing signal-to-noise ratio)

Rachel has seven pretty spoons

-10 dB -5dB 0dB 5 dB 10 dB SNR

Humans, optimum conditions

Humans, poorconditions

Humans, hearing impaired

Computer, optimum condition

Computer, poorconditions

Page 3: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

OutlineOutline

! Auditory principles already �in silico�! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing! ...why it matters not only for hearing aids

Page 4: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Auditory properties used in speech processing systemsAuditory properties used in speech processing systems

! Logarithmic/compressive intensity units (dB, loudness, cepstrum)

! Quasi-logarithmic Mel/bark-scale as frequency scale

! Linear predictive coding with quantization noise masked by speech (Schroeder & Atal)

! Masking model for HiFi-coding (MPEG, Brandenburg)

! Speech coding & Audio quality objective assessment (Beerends & Stemerdink)

! RASTA techniques for ASR (Hermansky& Morgan)

Page 5: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Spectral Masking modelsSpectral Masking models

! Classical approach: Loudness model (Fletcher, Stevens, Zwicker & Fastl, Moore)

! Forward & backward masking produce temporal sluggishness

Filter-bank

Filter-bank

Envelope/ compressionEnvelope/

compressionLevel-depen-dent slopes

Level-depen-dent slopes

ExcitationPattern =

MaximumAcross

channels

ExcitationPattern =

MaximumAcross

channels

Page 6: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

! Assessment ofsignal quality(Cellular phone networks,....)

! Signal coding (MP3, MiniDisc,..)

! Speech & pattern recognition

! Hearing aids

Application of auditory modelsApplication of auditory models

CodedSignal

Original-Signal

ModelQualityMeasure

Signal Model Recognizer

SignalModell (NH)

Modell (HI)

Modifi-cations

CoderSignal

DecoderModel

Page 7: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

What do we miss when applying classical models?

What do we miss when applying classical models?

! Temporal processing! Modulation processing ! Spectro-temporal modulation processing ! Binaural/spatial processing! Cognitive effects (interpolation,

suppression)

Page 8: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Speech perception without a spectrum?

Speech perception without a spectrum?

Flat spectrum (phase only) speech using the Oldenburg sentence test

Ear performs temporal analysis

Page 9: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Temporal Temporal processing used for processing used for robust robust speech recognitionspeech recognition

! Hermansky & Sharma: TRAPS! Perception model

�Auditory front endfor speech recognizer�: Robustnessagainst noise

Tchorz & Kollmeier, JASA 106,1999

Signal Model Recognizer

Page 10: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

What do we miss when applying classical models?

What do we miss when applying classical models?

! Temporal processing ! Modulation processing ! Spectro-temporal modulation processing ! Binaural/spatial processing! Cognitive effects (interpolation,

suppression)

Sinusoidally Amplitude modulated noise Spoken sentence

Page 11: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Asymmetry in masking: Which is the �better� masker?

Asymmetry in masking: Which is the �better� masker?

Tone 2 kHzNarrow-band noise

2 kHz, 256 Hz bandwith

Count the audible steps!Tone in steps masked

by continuous noiseNoise in steps masked

by continuous tone

Ear performs detailed envelope analysis (modulationspectrum)

f

f

Page 12: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

What do we miss when applying classical models?

What do we miss when applying classical models?

! Temporal processing ! Modulation processing ! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects

time frequency

Gabor spectro-temporal feature

Page 13: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Music + pauses + noise (+6dB)

Music & pauses (500/125 ms)

Continuity illusion: Can we trust our ears?

Continuity illusion: Can we trust our ears?

Do you hear ongoing music?

top-down processing by our brain

Page 14: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Visual analogyVisual analogy

Bregman´s Bs

Page 15: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

What do we miss when applying classical models?

What do we miss when applying classical models?

! Temporal processing ! Modulation processing ! Spectro-temporal modulation processing! Binaural/spatial processing! Cognitive effects (interpolation,

suppression)

How can we quantify these effects and put them to work in speech processing?

Page 16: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

44

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

InternalNoise

InternalNoise

Envelope/CompressionEnvelope/

Compression

Central �optimal�

patternrecognizer/

processor

Central �optimal�

patternrecognizer/

processor

BinauralNoise

Reduction

BinauralNoise

Reduction44

2211

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

Envelope/CompressionEnvelope/

Compression

2211

Model of the �effective� processing in the auditory system 11& Impairments

Internal representation

Sound input33

Model frameworkModel framework

Dau, Kollmeier& Kohlrausch, 1997, Zerbs, 2000. Kollmeier, 2000, Derleth et al., 2001

Page 17: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Approach: Analysis by model& simulation system

Approach: Analysis by model& simulation system

Acoustical scene

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

InternalNoise

InternalNoise

Envelope/CompressionEnvelope/

Compression

Central �optimal�

patternrecognizer/

processor

Central �optimal�

patternrecognizer/

processor

BinauralNoise

Reduction

BinauralNoise

Reduction

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

Envelope/CompressionEnvelope/

Compression

Human listener

Auditory model

Hearing Aid/computer

Test result

PredictedTest result

System perfor-mance

Page 18: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Working hypothesisWorking hypothesis

! Lossy Front End (quantified by auditorymodel with information compression): acoustical input �internal representation�

! Perfect Back End: central pattern recognition (limited only by �internal noise�)

Auditory System exploits all available acoustical cues, employing

Technical �copy� of auditory front end yields near optimum performance of technical system

Page 19: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

OutlineOutline

! Auditory principles already �in silico�! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing! ...why it matters not only for hearing aids

Page 20: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Filter-bank

Filter-bank

InternalNoise

InternalNoise

Envelope/CompressionEnvelope/

Compression

Central �optimal�

patternrecognizer/

processor

Central �optimal�

patternrecognizer/

processor

BinauralNoise

Reduction

BinauralNoise

Reduction

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

Envelope/CompressionEnvelope/

Compression

Model of the �effective� processing in the auditory system

Model framework: Modulation frequency analysis

Model framework: Modulation frequency analysis

Dau, Kollmeier& Kohlrausch, 1997, Zerbs, 2000. Kollmeier, 2000, Derleth et al., 2001

Modulation-Filterbank

Modulation-Filterbank

Page 21: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Modulation frequency selectivity

Modulation frequency selectivity

! Modulation-map in the auditory system (Langer & Schreiner)

! Psychoacoustics: Modulationfilterbank(Dissertation Dau, Dau et al., JASA 1997, Dissertations Verhey, Derleth, Ewert)

! Signal processing, noise reduction (Kollmeier & Koch, JASA 1994)

! Advantage: Separation of different auditory objects covering the same frequency region

Page 22: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

ExamplesExamples of Amplitude of Amplitude Modulation Modulation SpectrogramSpectrogram (AMS)(AMS)

(voiced) speech speech simulation noise

Speech shows joint distribution in frequency/modulation frequency domain

Page 23: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

SNR Estimation fromModulation spectrograms

SNR SNR Estimation fromEstimation fromModulation Modulation spectrogramsspectrograms

Analysis frame

AMSPattern

Neural Network

Measured S/N in Analysis frame

Output Activities = Estimate of SNR!

Speech-to-Noise Ratio estimate either broadband or multiple narrowband

During training only

SignalSignal

Model

Recognizer

Page 24: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Importance of two-dimensional featuresin modulation spectrogram

Importance of two-dimensional featuresin modulation spectrogram

! Estimation error based on

+

5.2 dB

6.6 dB

7.6 dB

5.8 dB

Joint distribution of modulations and spectrum required!

�Full 2-dim distribution

�modulation spectrum

�bark spectrum

�combination of both

Page 25: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Monaural Noise reduction using Monaural Noise reduction using Amplitude Modulation Amplitude Modulation cuescues

Suppression of a fluctuating background noise using AMS

Industrial noise with speechunprocessed processed

Improves speech recognizer in noise (Tchorz & Kollmeier, 2002)

Modulation frequency analysis is important for speech perception & promising for speech processing

Page 26: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Spectrotemporal features from neurophysiology

Spectrotemporal features from neurophysiology

DeCharms et al. (1998)

Receptive fields of cortical neurons

Model of modulationperception

Chi et al. (1999)

Indications for spectro-temporal feature extraction

Depireux et al. (2000)

Page 27: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Gabor Tandem recognizerGabor Tandem recognizerMore details: M. Kleinschmidt: 'Localized Spectro-temporal Features for ASR'

Eurospeech Session SThBb - Time is of the Essence, Thursday 10am, Room 2

Page 28: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Aurora: relative reduction in word error rate (WER)

Aurora: relative reduction in word error rate (WER)

51,750,4

46,7

53,9

55,8

58,3

40

42

44

46

48

5052

54

56

58

60

rela

tive

redu

ctio

n in

WER

new ETSI standard

ICSI/OGI

ICSI/OGI +

Gabor

Tidigits

speechdat-car

Kleinschmidt & GelbartICSLP 2002

Page 29: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

OutlineOutline

! Auditory principles already �in silico�! Additional properties not yet exploited! Auditory models! Modulation processing! Binaural information processing! ...why it matters not only for hearing aids

Page 30: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Filter-bank

Filter-bank

InternalNoise

InternalNoise

Envelope/CompressionEnvelope/

Compression

Central �optimal�

patternrecognizer/

processor

Central �optimal�

patternrecognizer/

processor

BinauralNoise

Reduction

BinauralNoise

Reduction

Filter-bank

Filter-bank

Modulation-Filterbank

Modulation-Filterbank

Envelope/CompressionEnvelope/

Compression

Model of the �effective� processing in the auditory system

Model framework: Binaural noise reduction

Model framework: Binaural noise reduction

Modulation-Filterbank

Modulation-Filterbank

Page 31: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Speech Reception Threshold for different spatial arrangements

Speech Reception Threshold for different spatial arrangements

! One continuous jammer provides maximum effect

! Second, opposite jammer can not be cancelled simultaneously

Azimuth of jammer

Second, fixed jammer

Target speech from front

Jammer from different azimuth

Peissig &Kollmeier,

1997

Binaural hearing operates like 2-sensor adaptive beamformer

Binaural advantage

Page 32: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

v.Hövel-model (´84) modelv.Hövel-model (´84) model

Zur Bedeutung der Übertragungseigenschaften des Außenohres sowie des binauralen Hörsystems bei gestörter Sprachübertragung, Dissertation, RWTH

Aachen

EC-circuit

......

EC-circuit

EC-circuit

EC-circuit

EC-circuit

F1

......

F4

F3

F2

Fn

S/N estimate

BinauralInput

Filterbank Noisereduction

F1

......

F4

F3

F2

Fn

S/N estimate

S/N estimate

S/N estimate

S/N estimate

....

Articulation

Index

Output:

Intelligibility estimate

τw

Page 33: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

v.Hövel-model (´84) model: modifications

v.Hövel-model (´84) model: modifications

EC-circuit

......

EC-circuit

EC-circuit

EC-circuit

EC-circuit

F1

......

F4

F3

F2

Fn

S/N estimate

BinauralInput

Filterbank

F1

......

F4

F3

F2

Fn

S/N estimate

S/N estimate

S/N estimate

S/N estimate

....

Articulation

Index

Output:

Intelligibility estimate

Gammatone Filterbank

Numerical optimization of delay& gains

Speech Intelligibility Index (SII)

τw

Page 34: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Speech Reception Threshold fordifferent spatial arrangements

Speech Reception Threshold fordifferent spatial arrangements

Azimuth of jammer

Second, fixed jammer

Target speech from front

Jammer from different azimuth

Peissig &Kollmeier,

1997

Predictions by binaural model(R. Beutelmann, T.Brand)

Page 35: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Corresponding binaural Beamformer hearing aid Corresponding binaural Beamformer hearing aid

Adaptation

Activation

Control algorithm

Cocktail-Party processor

Page 36: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Performance of two-input �Cocktail party processors�Performance of two-input

�Cocktail party processors�

Binaural situation-adaptive directional filter (Wittkop, 2000)! One speaker in stationary noise! One speaker from the front

! + 3 interfering speakers! + Algorithmus

Localization model-driven beamformer (Nix & Hohmann, 2002)! 2 Sources, unprocessed! 2 Sources, processed, first direction // second direction! 3 Sources, unprocessed//processed

Blind Sound Source Separation (Anemüller&Kollmeier, 2002)! Mixture of two sources in a room! Separation of first source second source

No convincing separation of more than two sources

Page 37: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Particle filter to estimate best beamformer online

Particle filter to estimate best beamformer online

Feature ExtractionConvolution withHRTF/AOI

BinauralSpeech Signals

Codebook

Transition Matrix

Particle Initialization

Calculate HypotheticalSpectra

State Prediction

Compare to Observation

Resample Particles

Get ExpectationValues

HRTF Database

Details: Johannes Nix, M. Kleinschmidt, V. Hohmann: 'CASA by Using Statistics of High-Dimensional Speech Dynamics and Sound Source Direction� Eurospeech 2003 Session PTuDe - Speech Enhancement II, Tuesday4pm, Main Hall, Level-1

Page 38: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Results: On-line Recovery of Spectral Envelopes

Results: On-line Recovery of Spectral Envelopes

(see: Nix, Kleinschmidt, Hohmann, Tuesday 4pm)

original voices

left/right ear signal

estimated voices (left side)

Separation from multiple sources with much computation & �cognitive� complexity!

Page 39: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

ConclusionsConclusions

! Auditory principles for speech processing look promising

! Interaction experiment-model-application

! Amplitude Modulation Spectrogram ! ! optimally switched two-sensor

beamformer to mimic binaural system! Top-down vs. bottom-up processing

yet to be explored

Test result

PredictedTest result

System perfor-mance

Page 40: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

Hearing aid or personal communication device?Hearing aid or personal communication device?

Technology for hearing aids and mobile phones convergeKnowledge from hearing aid design is required for modern speech communication systems!

JABRAJABRAK-WON

HörTech PrototypeIn-the-ear hearing aid

Behind-the-ear

hearing aid

Page 41: Medizinische Physik Auditory principles in speech ...medi.uni-oldenburg.de/download/docs/paper/koll...jammer provides maximum effect! Second, opposite jammer can not be cancelled simultaneously

Medizinische Physik

........thankthank youyou!!

A binaural hearing aid to sit in �

The ultimate way of achieving a good

performance in cocktailparties!

Auditory throne in front of the

new�House ofhearing�

(Oldenburg)