![Page 1: Neuromorphic Audition Group Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky,](https://reader036.vdocuments.mx/reader036/viewer/2022062618/55149091550346d36e8b51ea/html5/thumbnails/1.jpg)
Neuromorphic Audition Group
Massimiliano Versace, Jonathan Tapson, Rico Moeckel, Cristina Muoz, Yingxue Wang, Shih-Chii Liu, Andre van Schaik, Hynek Hermansky, David Anderson, Malcolm
Slaney, Andrew Schwartz, Tara Julia Hamilton, John Harris, Nima Mesgarani, Shihab Shamma
Outline
• Field Programmable Analog Array (Dave)
• Speaker Identification (Malcolm, Nima and Max)
• Speech Recognition (Hynek, Misha, Jordon)
• STRF Noise Suppression (Nima, Shihab, Dave)
• Reconstructions from STRF/Modulation Detectors (Nima, Shihab)
• Social sonar demonstration using silicon cochlea and RoboQuad toy (Toby and Malcolm)
• Cochlear ITD Detector (Andrew, Malcolm, Shih-Chii)
• Cochlear Periodicity Detector (Teddy, John, Malcolm, Shih-Chii)
FPAA
Speaker ID
[Diagram: four parallel Features → Model pathways feeding a Winner-Take-All stage. Features: MFCC, STRF. Models: GMM, ART.]
Speaker ID - STRF
Speaker ID – ART
Malcolm Slaney – Heather Ames – Max Versace
Supervised Fuzzy Adaptive Resonance Theory neural network (ARTMAP) uses top-down expectations to learn categories
First test: three synthesized vowels (large clusters) spoken by three speakers (different colors) represented in 2D feature space.
Speaker ID - ART Results
[Diagram: training and testing pipelines, with these labeled stages]
• Speech input (.wav)
• Feature extraction: 12 MFCC + E, first and second derivatives
• Vowel extraction: ½-wave rectify, lowpass filter, choice of high-energy time slices
• Features: feature vectors for “vowel” data
• Utterance-independent transformation: TBD
• Transformed features
• Training: ARTMAP, acoustic model of speaker identity
• Testing: ARTMAP, predicted speaker identity
50% correct after 100 cross-validations (number of ARTMAP instances run) on 10-speaker identification
Continued work:
1. Improved vowel extraction
2. Utterance-independent transformation of feature space
Why do we care?
• Top-down
• Online
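The vowel-extraction stage named on this slide (half-wave rectify, lowpass filter, choice of high-energy time slices) could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the frame length, and the use of a moving average as the lowpass filter are hypothetical, since the slide does not specify the filter or the selection rule.

```python
def half_wave_rectify(signal):
    """Keep positive samples, set negative samples to zero."""
    return [max(x, 0.0) for x in signal]


def moving_average(signal, width):
    """Crude causal lowpass (assumed here): mean of the last `width` samples."""
    out = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= width:
            running -= signal[i - width]
        out.append(running / min(i + 1, width))
    return out


def high_energy_slices(signal, frame_len, keep):
    """Return indices of the `keep` frames with the largest envelope energy."""
    envelope = moving_average(half_wave_rectify(signal), frame_len)
    n_frames = len(envelope) // frame_len
    energies = [
        sum(envelope[f * frame_len:(f + 1) * frame_len])
        for f in range(n_frames)
    ]
    ranked = sorted(range(n_frames), key=lambda f: energies[f], reverse=True)
    return sorted(ranked[:keep])


# Toy signal (made up): quiet, loud, quiet.
toy = [0.1, -0.1] * 4 + [1.0, -1.0] * 4 + [0.1, -0.1] * 4
print(high_energy_slices(toy, frame_len=4, keep=2))
```

On the toy signal the two selected frames are the ones covering the loud middle segment; a real system would then compute features only on such frames.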
Speaker ID - Results
| Test | % Correct | % Correct in 5 dB noise |
|---|---|---|
| MFCC (Baseline) | 81.3% | 81.0% |
| STRF | 79.8% | |
| ART | ~60% | |
Very preliminary work! We are comparing against a technology (MFCC+GMM) that has been perfected over decades.
ASR - Phoneme Posteriors
ASR - Combining Information
[Diagram labels: Training, Context, C, X]
Maximize $Q\{X, C\} = \Pr\{\text{Correct} \mid X, C\}$
• Machines (?): P(word|sound) P(word|context)
• Humans: [1-P(word|sound)] [1-P(word|context)]
Inverse model: from neural responses to sound
Reconstruction of speech in white noise
• Reconstructed speech is “cleaner” than the original noisy speech
[Figure panels: Original Spectrograms | Reconstructed Spectrograms]
Psychoacoustically-motivated Speech Enhancement
• Perceptual loudness: $L = (b \cdot e(t))^a$
• By mapping loudness using the same type of function, noise can be decreased
• Results from STRF processing
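As a small numeric illustration of the power-law loudness function L = (b·e(t))^a, the sketch below applies it to a toy envelope. The function name, the default parameter values, and the envelope samples are assumptions for illustration; the slide does not give the values of a and b or the exact mapping used for noise reduction.

```python
def perceptual_loudness(envelope, a=0.5, b=1.0):
    """Map a non-negative envelope e(t) to L = (b * e(t)) ** a."""
    if any(x < 0 for x in envelope):
        raise ValueError("envelope samples must be non-negative")
    return [(b * x) ** a for x in envelope]


# Toy envelope (made up): silence, a low-level sample, a high-level sample.
envelope = [0.0, 0.25, 1.0]

compressed = perceptual_loudness(envelope, a=0.5)  # compressive exponent
expanded = perceptual_loudness(envelope, a=2.0)    # expansive exponent

print(compressed)  # low-level sample is raised relative to the peak
print(expanded)    # low-level sample is pushed down relative to the peak
```

An exponent below 1 compresses the range, while an exponent above 1 expands it and lowers low-level components relative to the peak. Which regime the authors used is not stated on the slide.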
Noise suppression using inverse model
• Train G-filters on reconstructing clean stimuli from corresponding noisy responses. Apply the trained filters to new noisy responses
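The train-then-apply idea can be sketched with a drastically simplified stand-in: one least-squares gain per channel, fitted to map noisy responses onto clean stimuli. The actual G-filters are presumably much richer than a scalar gain; the function names, the small `ridge` term, and the toy data below are all assumptions.

```python
def train_inverse_gains(noisy_frames, clean_frames, ridge=1e-9):
    """Fit one gain per channel minimizing sum_t (g * noisy - clean) ** 2."""
    n_channels = len(noisy_frames[0])
    gains = []
    for k in range(n_channels):
        num = sum(n[k] * c[k] for n, c in zip(noisy_frames, clean_frames))
        # ridge keeps the division safe when a channel is silent
        den = sum(n[k] ** 2 for n in noisy_frames) + ridge
        gains.append(num / den)
    return gains


def apply_inverse_gains(gains, noisy_frames):
    """Apply the trained per-channel gains to new noisy responses."""
    return [[g * x for g, x in zip(gains, frame)] for frame in noisy_frames]


# Toy training pair (made up): each row is a time frame, each column a channel.
clean_train = [[1.0, 2.0], [2.0, 4.0]]
noisy_train = [[2.0, 8.0], [4.0, 16.0]]

gains = train_inverse_gains(noisy_train, clean_train)
estimate = apply_inverse_gains(gains, [[6.0, 4.0]])
print(gains, estimate)
```

In this toy case the noisy responses are just scaled copies of the clean ones, so the fitted gains come out close to 0.5 and 0.25; real noisy responses would leave a residual error.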
[Figure panels: Cortical decomposition | “Trained” inverse filters]
Noise Suppression for White, Jet and City Noise
RS Media
Linux version 2.4.18-rmk5-mx1ads-p3 ([email protected]) (gcc version 2.95.3 20010315 (release)) #517 Fri Feb 16 11:40:45 HKT 2007
Processor: ARM/CIRRUS Arm920Tsid(wb) revision 0
Architecture: Motorola MX1ADS
On node 0 totalpages: 8192
zone(0): 8192 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: root=fe01 ro mem=32M
Console: colour dummy device 80x30
Calibrating delay loop... 98.50 BogoMIPS
Memory: 32MB = 32MB total
Memory: 30816KB available (1023K code, 316K data, 60K init)
Dentry-cache hash table entries: 4096 (order: 3, 32768 bytes)
Inode-cache hash table entries: 2048 (order: 2, 16384 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 8192 (order: 3, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttySA0 at I/O 0x206000 (irq = 29) is a MX1ADS
ttySA1 at I/O 0x207000 (irq = 23) is a MX1ADS
pty: 256 Unix98 ptys configured
DMA Initializing
Cochlear - ITD Detector
[Figure: axes labeled Time and Position]
Cochlear - JAER Demo
Cochlear - Periodicity Detector
[Figure panels: Response to “hiss” | Response to “coo”]
When both channels are conditionally independent:
• $p_C p_A$ – probability of correct recognition in both channels
• $p_C(1-p_A)$ – correct in ch1 but not in ch2
• $p_A(1-p_C)$ – correct in ch2 but not in ch1
These three cases are mutually exclusive, thus the probability of correct recognition is
$$p = p_C p_A + p_C(1-p_A) + p_A(1-p_C) = p_C + p_A - p_C p_A$$
Probability of error
$$e = 1 - p = 1 - p_C - p_A + p_C p_A = (1-p_C)(1-p_A) = e_C e_A$$
[Diagram: stimulus → context (top-down) channel with $p_C$ and acoustic (bottom-up) channel with $p_A$ → decision]
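The two-channel result can be checked numerically. The sketch below assumes, as on the slide, that the channels are independent and that recognition succeeds if at least one channel is correct; the function name and the example probabilities are illustrative only.

```python
def combine_independent(p_context, p_acoustic):
    """Return (p, e): probability correct and probability of error."""
    for prob in (p_context, p_acoustic):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    e_context = 1.0 - p_context
    e_acoustic = 1.0 - p_acoustic
    error = e_context * e_acoustic  # e = e_C * e_A
    return 1.0 - error, error


# Example values (made up): context alone 60% correct, acoustics alone 70%.
p_c, p_a = 0.6, 0.7
p, e = combine_independent(p_c, p_a)

# Same answer as summing the three mutually exclusive cases.
p_by_cases = p_c * p_a + p_c * (1 - p_a) + p_a * (1 - p_c)
print(round(p, 2), round(e, 2), round(p_by_cases, 2))
```

With these example values the combined error is 0.4 × 0.3 = 0.12, so the combined probability of correct recognition is 0.88, higher than either channel alone.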