
Page 1: Expanding Accessibility Audio Signal Processing

Expanding Accessibility – Audio Signal Processing

Ivan Tashev

Partner Software Architect

Audio and Acoustics Research Group

MSR Labs – Redmond

Page 2: Expanding Accessibility Audio Signal Processing

Agenda

• Audio Understanding
  • Bumblebee project

• Spatial Audio
  • Cities Unlocked project

• HoloLens device
  • Research platform for applications helping visually and hearing impaired people

7/6/2016 UW CSE - MSR Summer Institute: Audio Signal Processing 2

Page 3: Expanding Accessibility Audio Signal Processing

Collaborators

• Audio and Acoustics Research Group in MSR Labs, Redmond

• Interns: Piotr Bilinski, Archontis Politis, Nilesh Madhu, Jinkyu Lee, Kun Han, Keith Godin, Hoang Do, many others

• The exceptional engineering teams in HoloLens, Kinect, and Windows we had the honor to work with

• Hannes Gamper, Microsoft Research
• David Johnston, Microsoft Research
• Ivan Tashev, Microsoft Research
• Mark R. P. Thomas, Dolby Laboratories
• Jens Ahrens, Chalmers University, Sweden


Page 4: Expanding Accessibility Audio Signal Processing

Audio Understanding

Bumblebee project


Page 5: Expanding Accessibility Audio Signal Processing

Extracting non-verbal cues

• The meaning of speech – less than 50% of human-to-human communication

• Audio understanding
  • Speaker identification and verification
  • Gender and age detection
  • Emotion detection
  • Audio environment recognition
  • Audio event detection

• Core application
  • Smarter and better HMIs


Page 6: Expanding Accessibility Audio Signal Processing

General architecture

• Framing and feature extraction
  • MFCCs, pitch, log-energy, ZCR (zero-crossing rate), …
  • Up to 960 features in some cases

• Classifier
  • Frame/segment level: DNN, GMM
  • Utterance level: SVM, ELM, HMM
  • LSTM RNNs for end-to-end classification

[Figure: block diagram of an emotion recognition pipeline. Block labels: segment-level feature extraction, DNN, utterance-level feature, utterance-level classifier, emotion.]
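The framing and feature extraction step can be illustrated with a minimal sketch. It covers only two of the listed features (log-energy and ZCR) in pure Python; the function names, the 25 ms frame length, and the 10 ms hop are assumptions chosen for illustration, not values taken from the slides.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames, dropping an incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame, floor=1e-12):
    """Natural log of the mean squared amplitude, floored to avoid log(0)."""
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(samples, frame_len=400, hop=160):
    """Return one (log-energy, ZCR) pair per frame.

    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz
    (an assumption, not a value from the talk).
    """
    return [(log_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    features = extract_features(tone)
    print(len(features), features[0])
```

In a full system, per-frame vectors like these (together with MFCCs, pitch, and others) would feed the frame/segment-level classifier.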


Page 7: Expanding Accessibility Audio Signal Processing

Project Bumblebee

• Mobile phone application

• Visualizes the sound
  • Level and frequency content

• Recognizes audio objects
  • Fire or CO2 alarm
  • Doorbell
  • Phone ring
  • Baby crying

• Social involvement
  • Sound can be sent for recognition to a support group
  • Then added to the dataset of recognizable sounds

• Started as a Hackathon project in 2015
  • Work continues this summer with 6 interns
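As a rough illustration of what "level and frequency content" could mean numerically, the sketch below computes the RMS level in dB relative to full scale and the strongest frequency in a short block using a naive DFT. This is not the Bumblebee implementation; the function names and the choice of dBFS as the level unit are assumptions.

```python
import cmath
import math


def level_dbfs(samples, floor=1e-12):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(max(rms, floor))


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(samples)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples)))
            for k in range(n // 2 + 1)]


def dominant_frequency(samples, fs):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = magnitude_spectrum(samples)
    k = max(range(1, len(mags)), key=mags.__getitem__)
    return k * fs / len(samples)


if __name__ == "__main__":
    fs = 8000
    block = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
    print(round(level_dbfs(block), 2), dominant_frequency(block, fs))
```

A real-time visualizer would use an FFT and a window function instead of the naive DFT; the slow version is kept here only so the example has no dependencies.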


Page 8: Expanding Accessibility Audio Signal Processing

Spatial Audio

Cities Unlocked project


Page 9: Expanding Accessibility Audio Signal Processing

Binaural recording and reproduction

• Theatrophone, 1881

• Binaural recordings, mid-1950s

• Problems:
  • Fixed audio scene
  • HRTF mismatch

[Image: Neumann KU-100 dummy head]


Page 10: Expanding Accessibility Audio Signal Processing

HRTF and personalization

• HRTFs (head-related transfer functions) describe the acoustic path from the sound source to the ear entrances
  • Contain all interaural and spectral localization cues
  • Are a function of sound direction
  • Can be considered distance-independent for radii > 1 m

• Head and torso geometry affects wave propagation
  • Anthropometric features are individual
  • Hence HRTFs are individual
  • Spatial hearing is individual!

• Using machine learning approaches for HRTF personalization
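One interaural cue, the interaural time difference (ITD), shows concretely how direction and head geometry enter the picture. The sketch below uses Woodworth's classic spherical-head approximation, ITD = (a / c)(θ + sin θ), which is a textbook simplification and not the personalization method of the talk. The default head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875     # m, assumed average head radius


def woodworth_itd(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Far-field ITD in seconds for a rigid spherical head.

    azimuth_deg is measured from straight ahead, positive to the right,
    and must lie in [-90, 90] for this approximation.
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("Woodworth's formula is used here for |azimuth| <= 90 only")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    for radius in (0.080, 0.0875, 0.095):
        itd_us = woodworth_itd(90, head_radius=radius) * 1e6
        print(f"head radius {radius * 100:.2f} cm -> ITD at 90 deg: {itd_us:.0f} us")
```

Even in this crude model the ITD scales linearly with head radius, which is one simple reason a generic HRTF does not fit every listener.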


Page 11: Expanding Accessibility Audio Signal Processing

Cities Unlocked project

• Joint project with Guide Dogs UK, Microsoft UK, and MSR Labs in Cambridge and Redmond

• Headset device with IMU + smartphone

• Knot of problems
  • Detection and tracking of markers
  • 3D audio representation and rendering
  • UI aspects for visually impaired

• November 2014 – first phase
  • trial deployment, 5 people

• November 2015 – second phase
  • deployed to 50 people

• November 2016 – third phase
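To render a marker as a spatial sound, a system of this kind needs the marker's direction relative to where the listener's head is pointing. The sketch below is a hypothetical illustration of that step, not the project's code: it assumes a flat local coordinate frame in meters (x east, y north) and a head yaw from the IMU measured clockwise from north.

```python
import math


def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(listener_xy, head_yaw_deg, marker_xy):
    """Azimuth of the marker relative to the head direction, in degrees.

    Positive values mean the marker is to the listener's right.
    """
    dx = marker_xy[0] - listener_xy[0]  # east offset
    dy = marker_xy[1] - listener_xy[1]  # north offset
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from north
    return wrap_degrees(bearing - head_yaw_deg)


def distance(listener_xy, marker_xy):
    """Straight-line distance to the marker in meters."""
    return math.hypot(marker_xy[0] - listener_xy[0],
                      marker_xy[1] - listener_xy[1])


if __name__ == "__main__":
    listener = (0.0, 0.0)
    marker = (10.0, 0.0)  # 10 m due east
    for yaw in (0.0, 90.0, 180.0):
        print(yaw, relative_azimuth(listener, yaw, marker), distance(listener, marker))
```

The resulting azimuth and distance would then select the rendering direction and level for the marker's sound; as the head turns, the yaw changes and the sound stays anchored in the world.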


Page 12: Expanding Accessibility Audio Signal Processing

HoloLens device

A platform for applications helping visually and hearing impaired people


Page 13: Expanding Accessibility Audio Signal Processing

HoloLens – released March 2016

• Wearable AR device:
  • Heads-up display
  • Spatial audio system
  • Windows 10 computer
  • Set of DSPs underneath

• Sensors:
  • RGB camera
  • 4 microphones
  • Depth camera
  • Head orientation and position tracking


Page 14: Expanding Accessibility Audio Signal Processing

Usage scenarios

• Gaming

• Entertainment

• Productivity

• Science

• Design and art

• Education


Page 15: Expanding Accessibility Audio Signal Processing

HoloLens for enabling scenarios research

• Autonomous wearable device

• Packed with sensors

• Gaze, gesture, voice (GGV) – HMI input modalities

• HUD and spatial audio – HMI output modalities

• Substantial computing power, Wi-Fi connected

• Attractive device for conducting research and designing UI and other functionality


Page 16: Expanding Accessibility Audio Signal Processing

Finally …

Questions?
