Neuromorphic Audition Group

Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma

Outline

• Field Programmable Analog Array (Dave)
• Speaker Identification (Malcolm, Nima and Max)
• Speech Recognition (Hynek, Misha, Jordon)
• STRF Noise Suppression (Nima, Shihab, Dave)
• Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
• Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
• Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
• Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)

Speaker ID

[Diagram labels: Features (MFCC, STRF); Model (GMM, ART); Winner Take All]

Speaker ID - STRF

Speaker ID – ART

Malcolm Slaney – Heather Ames – Max Versace

Supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories

First test: three synthesized vowels (large clusters) spoken by three speakers (different colors) represented in 2D feature space.

Speaker ID - ART Results

[Block diagram labels: Speech input (.wav); Feature extraction (12 MFCC + E, first and second derivatives); Vowel extraction (½-wave rectify, lowpass filter, choice of high-energy time slices); Feature vectors for “vowel” data; Utterance-independent transformation (TBD); Transformed features; ARTMAP (training, testing); Acoustic model of speaker identity; Predicted speaker identity]

50% correct after 100 cross-validations (number of ARTMAP instances run) on 10-speaker identification.

Continued work:
1. Improved vowel extraction
2. Utterance-independent transformation of feature space

Why we care? Top-down, online.

Speaker ID - Results

Test: % correct / % correct in 5 dB noise
MFCC (baseline): 81.3% / 81.0%
STRF: 79.8%
ART: ~60%

Very preliminary work! These results are being compared against a technology (MFCC+GMM) that has been perfected over decades.

ASR - Phoneme Posteriors

ASR - Combining Information

[Diagram labels: Training, Context, C, X]

Q{X, C} = Pr{Correct | X, C}

Machines: P(word|sound) P(word|context)

Humans: [1-P(word|sound)] [1-P(word|context)]

Maximize

Inverse model: from neural responses to sound


Reconstruction of speech in white noise

• Reconstructed speech is “cleaner” than the original noisy



[Figure panels: Original Spectrograms; Reconstructed Spectrograms]

Psychoacoustically-motivated Speech Enhancement

• Perceptual loudness: L = (b*e(t))^a

• By mapping loudness using the same type of function, noise can be decreased

• Results from STRF processing
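The power-law mapping above can be illustrated with a small sketch. The slide does not define e(t), a, or b, so the envelope values and exponents below are hypothetical, and treating "the same type of function" as a power law with a larger exponent is an assumption made only for illustration.

```python
import numpy as np

def power_law(e, a, b=1.0):
    """Return (b * e) ** a for a non-negative envelope e (scalar or array)."""
    e = np.asarray(e, dtype=float)
    if np.any(e < 0):
        raise ValueError("envelope must be non-negative")
    return (b * e) ** a

# Hypothetical envelope: a noise floor of 0.1 with one speech peak of 1.0.
envelope = np.array([0.1, 0.1, 1.0, 0.1])

# Exponent below 1 (assumed value): loudness-like compression.
compressed = power_law(envelope, a=0.3)

# Exponent above 1 (assumed value): the floor is pushed down relative to the peak.
expanded = power_law(envelope, a=2.0)

floor_to_peak_before = envelope[0] / envelope.max()
floor_to_peak_after = expanded[0] / expanded.max()
print(floor_to_peak_before, floor_to_peak_after)
```

With an exponent above 1 the floor-to-peak ratio shrinks, which is one way a power-law remapping could reduce low-level noise; with an exponent below 1 the ratio grows instead.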

Noise suppression using inverse model

• Train G-filters on reconstructing clean stimuli from corresponding noisy responses. Apply the trained filters to new noisy responses
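The train-then-apply procedure can be sketched as a linear regression from noisy responses back to clean stimuli. This is only an illustration under stated assumptions: the slide does not say how the G-filters are fitted, so the ridge-regularized least squares, the absence of time lags, the synthetic forward model, and all names and sizes below are hypothetical.

```python
import numpy as np

def train_inverse_filter(noisy_responses, clean_stimuli, ridge=1e-3):
    """Fit G so that noisy_responses @ G approximates clean_stimuli (ridge regression)."""
    R = np.asarray(noisy_responses, dtype=float)
    S = np.asarray(clean_stimuli, dtype=float)
    n_channels = R.shape[1]
    return np.linalg.solve(R.T @ R + ridge * np.eye(n_channels), R.T @ S)

def apply_inverse_filter(G, noisy_responses):
    """Reconstruct stimuli from new noisy responses with a trained G."""
    return np.asarray(noisy_responses, dtype=float) @ G

# Synthetic stand-in data (assumption): 8 stimulus bands, 32 response channels.
rng = np.random.default_rng(0)
forward = rng.normal(size=(8, 32))  # hypothetical forward model

def simulate(n_frames):
    clean = rng.normal(size=(n_frames, 8))
    noisy = clean @ forward + 0.1 * rng.normal(size=(n_frames, 32))
    return clean, noisy

clean_train, noisy_train = simulate(500)
clean_test, noisy_test = simulate(200)

G = train_inverse_filter(noisy_train, clean_train)
reconstruction = apply_inverse_filter(G, noisy_test)

# Relative reconstruction error on responses not seen during training.
rel_error = np.linalg.norm(reconstruction - clean_test) / np.linalg.norm(clean_test)
print(rel_error)
```

The key point is that G is fitted against clean targets while its inputs are noisy, so the fitted mapping absorbs part of the noise suppression.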


[Figure panels: Cortical decomposition; “Trained” inverse filters]





Noise Suppression for White, Jet and City Noise


RS Media

Linux version 2.4.18-rmk5-mx1ads-p3 ([email protected]) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing

Cochlear - ITD Detector



[Figure axes: Time, Position]

Cochlear - JAER Demo



Cochlear - Periodicity detector

[Figure panels: Response to “hiss”; Response to “coo”]

When both channels are conditionally independent:
• pCpA – probability of correct recognition in both channels
• pC(1-pA) – correct in ch1 but not in ch2
• pA(1-pC) – correct in ch2 but not in ch1

These three cases are mutually exclusive, thus the probability of correct recognition is

p = pCpA + pC(1-pA) + pA(1-pC) = pC+pA-pCpA

Probability of error

e = (1-p) = 1-pC-pA+pCpA = (1-pC)(1-pA) = eCeA
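As a quick numeric check of the two identities above, the following sketch sums the three mutually exclusive cases and compares the result with the product of the per-channel errors. The probabilities 0.7 and 0.8 are hypothetical values chosen only for illustration.

```python
def combined_correct(p_c, p_a):
    """Probability that at least one channel is correct, summed case by case."""
    both = p_c * p_a                 # correct in both channels
    only_context = p_c * (1 - p_a)   # correct in ch1 (context) only
    only_acoustic = p_a * (1 - p_c)  # correct in ch2 (acoustic) only
    return both + only_context + only_acoustic

def combined_error(p_c, p_a):
    """Product of the per-channel error probabilities, eC * eA."""
    return (1 - p_c) * (1 - p_a)

# Hypothetical channel accuracies.
p_c, p_a = 0.7, 0.8

p = combined_correct(p_c, p_a)
e = combined_error(p_c, p_a)

# p should equal pC + pA - pC*pA, and e should equal 1 - p.
print(p, p_c + p_a - p_c * p_a, e, 1 - p)
```

Because the combined error is a product, it is never larger than the error of the better channel alone, which is the sense in which the two information sources help each other.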

[Diagram labels: stimulus; context (top-down), pC; acoustic (bottom-up), pA; decision]
