Improving Speech Recognition with Embodied Cognition and Behaviour-based Robotics

Page 1: Improving Speech Recognition with Embodied Cognition and Behaviour-based Robotics

Jorge Davila-Chacon

University of Hamburg - Knowledge Technology

www.informatik.uni-hamburg.de/WTM/

Spotify ML Meetup – November 3rd 2014

Page 2: Motivation

Jorge Davila-Chacon Bio-Inspired SSL for Robot ASR 2

• Why is bio-inspired sound source localisation (SSL) interesting and useful?

Page 3: Neurobotic Experiments

Page 4: Virtual Reality Lab

J. Bauer, J. Dávila-Chacón, E. Strahl, S. Wermter. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic Robotics. Multisensor Fusion and Integration for Intelligent Systems, 2012.

Page 5: Neurobotic Experiments

Page 6: Bio-Inspired Sound Source Localisation

Spatial cues allow sound source localisation:

• Interaural Time Difference (ITD), obtained from low frequencies
• Interaural Level Difference (ILD), obtained from high frequencies

[Figure: ITD and ILD illustrated for the same frequency component]
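The two cues can be illustrated with classic signal processing rather than the spiking model discussed in the talk. The minimal sketch below (the function names `estimate_itd` and `estimate_ild` are hypothetical) estimates the ITD as the lag that maximises the cross-correlation between the two ear signals, and the ILD as the ratio of their RMS levels in dB, on a synthetic broadband signal.

```python
import math
import random

def estimate_itd(left, right, sample_rate, max_lag):
    """Estimate the ITD (seconds) as the lag that maximises the cross-correlation.

    Both channels must have the same length. A positive result means the
    right channel lags the left one, i.e. the sound reached the left ear first.
    """
    n = len(left)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples only
        score = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / sample_rate

def estimate_ild(left, right):
    """Estimate the ILD in dB as the RMS level of the left channel relative to the right."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms(left) / rms(right))

# Synthetic example (hypothetical data): broadband noise that reaches the
# right ear 3 samples later and at half the amplitude (about -6 dB).
fs = 16000
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
right = [0.0] * 3 + [0.5 * v for v in left[:-3]]

itd = estimate_itd(left, right, fs, max_lag=10)
ild = estimate_ild(left, right)
print(f"ITD = {itd * 1e6:.1f} us, ILD = {ild:.2f} dB")
```

Cross-correlation works best with broadband input; for a pure tone the correlation is periodic, so `max_lag` has to stay below half a period to avoid ambiguous peaks.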

Page 7: Interaural Time Differences (Neuroanatomy)

ITDs extracted in Medial Superior Olive (MSO)

• AVCN - Anterior Ventral Cochlear Nucleus
• AN - Auditory Nerve
• IC - Inferior Colliculus

Page 8: Interaural Time Differences (Computational Principle)
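A widely cited computational principle for ITD extraction in the MSO is the Jeffress model: delay lines feed an array of coincidence detectors, and the detector whose internal delay cancels the external ITD responds most. The sketch below assumes that model; it is a simplified illustration (only the left input is delayed, and spike times are hypothetical), not the spiking implementation referenced in the talk.

```python
def coincidence_counts(left_spikes, right_spikes, delays, window):
    """Response of each coincidence detector along a simplified delay line.

    Detector k adds the internal delay delays[k] (seconds) to every left-ear
    spike and counts left/right spike pairs that then arrive within `window`
    seconds of each other.
    """
    counts = []
    for d in delays:
        n = sum(1 for tl in left_spikes for tr in right_spikes
                if abs((tl + d) - tr) <= window)
        counts.append(n)
    return counts

def best_delay(left_spikes, right_spikes, delays, window):
    """Return the internal delay of the most active detector (the ITD estimate)."""
    counts = coincidence_counts(left_spikes, right_spikes, delays, window)
    return delays[counts.index(max(counts))]

# Hypothetical spike times (seconds): the right ear receives each spike 0.3 ms later.
left = [0.0012, 0.0051, 0.0093, 0.0160, 0.0204]
right = [t + 0.0003 for t in left]
delays = [k * 0.0001 for k in range(-6, 7)]   # -0.6 ms ... +0.6 ms in 0.1 ms steps

print(coincidence_counts(left, right, delays, window=0.00004))
print(best_delay(left, right, delays, window=0.00004))
```

The detector index that wins acts as a place code for azimuth, which is why a map of detector activity (rather than a single number) is what later stages such as the IC receive.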

Page 9: Interaural Level Differences (Neuroanatomy)

ILDs extracted in Lateral Superior Olive (LSO)

• MNTB - Medial Nucleus of the Trapezoid Body
• IC - Inferior Colliculus
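In the LSO, input from the ipsilateral ear is excitatory, while input from the contralateral ear arrives as inhibition relayed through the MNTB, so an LSO cell's firing reflects the ILD. The toy rate model below sketches that subtraction; the sigmoid, gain, threshold and maximum rate are illustrative assumptions, not parameters from the talk.

```python
import math

def lso_rate(ipsi_db, contra_db, gain=0.3, threshold_db=0.0, max_rate=200.0):
    """Firing rate (spikes/s) of a toy LSO neuron.

    The ipsilateral ear excites the cell while the contralateral ear inhibits it
    via the MNTB, so the net drive is the level difference ipsi - contra (dB),
    passed through a sigmoid. All parameter values are illustrative.
    """
    drive = gain * ((ipsi_db - contra_db) - threshold_db)
    return max_rate / (1.0 + math.exp(-drive))

# A source on the left: 70 dB at the left ear, 60 dB at the right ear.
left_lso = lso_rate(ipsi_db=70.0, contra_db=60.0)   # left LSO: left ear is ipsilateral
right_lso = lso_rate(ipsi_db=60.0, contra_db=70.0)  # right LSO: right ear is ipsilateral
print(f"left LSO {left_lso:.1f} spikes/s, right LSO {right_lso:.1f} spikes/s")
```

Comparing the two hemispheres' LSO rates gives a direction signal: the side facing the source is excited, the other side is suppressed.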

Page 10: Bio-Inspired Sound Source Localisation

The outputs of the MSO and LSO are integrated in the IC.

J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation. International Conference on Artificial Neural Networks, 2012.
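One simple way to picture the integration in the IC is a frequency-dependent blend: ITD-based (MSO) evidence is weighted up in low-frequency channels and ILD-based (LSO) evidence in high-frequency channels, matching the low/high split described for the spatial cues. The logistic weighting, its 1.5 kHz crossover and its slope are assumptions for illustration, not the model from the cited paper.

```python
def ic_response(mso_map, lso_map, centre_freqs_hz, crossover_hz=1500.0, slope=4.0):
    """Combine per-channel MSO (ITD) and LSO (ILD) azimuth maps into an IC map.

    mso_map[c][a] and lso_map[c][a] are the activities of frequency channel c
    for azimuth bin a. Each channel is a weighted mix: low channels lean on the
    MSO output, high channels on the LSO output. The weight is a logistic
    function of log-frequency centred on `crossover_hz` (illustrative values).
    """
    ic = []
    for c, f in enumerate(centre_freqs_hz):
        # w -> 1 well below the crossover (ITD dominates), w -> 0 well above it
        w = 1.0 / (1.0 + (f / crossover_hz) ** slope)
        ic.append([w * m + (1.0 - w) * l for m, l in zip(mso_map[c], lso_map[c])])
    return ic

# Hypothetical data: two azimuth bins; MSO favours bin 0, LSO favours bin 1.
freqs = [200.0, 1500.0, 6000.0]
mso = [[1.0, 0.0]] * 3
lso = [[0.0, 1.0]] * 3
for f, row in zip(freqs, ic_response(mso, lso, freqs)):
    print(f, [round(v, 3) for v in row])
```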

Page 11: Bio-Inspired Sound Source Localisation

Page 12: Bio-Inspired Sound Source Localisation

Page 13: Bio-Inspired Sound Source Localisation

Page 14: Bio-Inspired Sound Source Localisation

[Figure: diagram labelled IC and MLP]
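The diagram pairs the IC output with an MLP. Below is a hedged sketch of the forward pass of such a classifier, assuming a flattened IC representation as input, one tanh hidden layer and a softmax over discrete azimuth classes. The layer sizes, azimuth classes and weights are hypothetical; the weights are hand-set for illustration, not trained.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: weights[j][i] connects input i to output unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Turn scores into probabilities; subtracting the max avoids overflow."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mlp_forward(x, w1, b1, w2, b2):
    """One tanh hidden layer followed by a softmax output over azimuth classes."""
    hidden = [math.tanh(v) for v in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))

# Hypothetical: a 3-value IC feature vector and three azimuth classes (degrees).
azimuths = [-45, 0, 45]
ic_features = [0.1, 0.9, 0.2]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
w1, b1 = identity, [0.0, 0.0, 0.0]
w2, b2 = [[5.0 * v for v in row] for row in identity], [0.0, 0.0, 0.0]

probs = mlp_forward(ic_features, w1, b1, w2, b2)
print(azimuths[probs.index(max(probs))], [round(p, 3) for p in probs])
```

The softmax output is a probability distribution over directions, so it can be thresholded, averaged over time or passed to later probabilistic stages.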

Page 15: Bio-Inspired Sound Source Localisation

J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation. International Joint Conference on Neural Networks, 2013.

Page 16: Bio-Inspired Sound Source Localisation

Page 17: Bio-Inspired Sound Source Localisation

Simple IC output

Page 18: Bio-Inspired Sound Source Localisation

Complex IC output

Page 19: Bio-Inspired Sound Source Localisation

[Figure panels: Static SSL, Dynamic SSL; label: feed-forward neural network]
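The slide contrasts static and dynamic SSL. Assuming that dynamic SSL means the robot turns its head between estimates (an interpretation, not stated on the slide), the closed loop can be sketched as below. `estimate_azimuth` is a hypothetical callback returning the source direction relative to the head, and the demo uses a deliberately imperfect estimator that reports only 60 % of the true offset.

```python
def orient_towards_source(estimate_azimuth, head_angle=0.0, tolerance_deg=5.0, max_steps=10):
    """Turn the head until the source is (approximately) straight ahead.

    estimate_azimuth(head_angle) is a hypothetical callback that returns the
    source direction relative to the current head orientation, in degrees.
    Returns the final head angle and the number of turns taken.
    """
    for step in range(max_steps):
        relative = estimate_azimuth(head_angle)
        if abs(relative) <= tolerance_deg:
            return head_angle, step
        head_angle += relative  # rotate by the current estimate, then listen again
    return head_angle, max_steps

# Hypothetical scene: the source sits at 60 degrees; the estimator is biased.
SOURCE_DEG = 60.0

def biased_estimate(head_angle):
    return 0.6 * (SOURCE_DEG - head_angle)

final_angle, turns = orient_towards_source(biased_estimate)
print(f"head at {final_angle:.2f} deg after {turns} turns")
```

Even with a biased estimator, each turn shrinks the remaining offset, so the loop converges near the source; a one-shot (static) estimate would keep the full 40 % error.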

Page 20: Robotic Automatic Speech Recognition

Platforms used for ASR: iCub and Soundman

Page 21: Robotic Automatic Speech Recognition

J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks, 2014.

Binary measure - Static ASR

Page 22: Robotic Automatic Speech Recognition

Continuous measure - Static ASR

J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks, 2014.
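The slides name a binary and a continuous measure without defining them. The sketch below assumes that the binary measure scores a sentence as correct only if every word matches, and that the continuous measure is the word-level Levenshtein distance normalised by the reference length. The appendix lists Levenshtein distance, but the word-level unit and the normalisation are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = curr
    return prev[-1]

def binary_measure(reference, hypothesis):
    """Assumed binary measure: 1 if the whole sentence is recognised exactly, else 0."""
    return 1 if reference.split() == hypothesis.split() else 0

def continuous_measure(reference, hypothesis):
    """Assumed continuous measure: word edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)

ref = "the robot turns left"
hyp = "the robot turn left"
print(binary_measure(ref, hyp), continuous_measure(ref, hyp))
```

The continuous measure gives partial credit that the binary one hides: here one wrong word out of four yields 0.25 rather than a flat failure.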

Page 23: Conclusion

● Robotics as a “sandbox” for learning ML
● Neuroscience provides clues for computational principles
● Embodiment
  • iCub allows computation of spatial cues
  • Interaction with the environment can reduce noise
● Signal processing with ANNs
  • Spiking ANNs are an effective representation of spatial cues
  • Bayesian integration is important for dimensionality reduction
  • A softmax neural layer is robust to ego-noise and reverberation

Page 24: Future Work

● Neural SSL
  • Integrate GPU version of MSO and LSO
  • Propagation of probabilities through time
  • From discrete to continuous
● Integration with vision
  • From supervised to unsupervised SSL
  • Possible extension to sensorimotor contingencies
  • Vision to select between multiple sound sources
  • Vision for speech segregation

Page 25

Thank you for your attention.


LinkedIn: Jorge Davila Chacon

• J. Liu, D. Perez-Gonzalez, A. Rees, H. Erwin, S. Wermter. A Biologically Inspired Spiking Neural Network Model of the Auditory Midbrain for Sound Source Localisation. Neurocomputing (2010)

• J. Dávila-Chacón, S. Heinrich, J. Liu, S. Wermter. Biomimetic Binaural Sound Source Localisation with Ego-Noise Cancellation. International Conference on Artificial Neural Networks (2012)

• J. Bauer, J. Dávila-Chacón, E. Strahl, S. Wermter. Smoke and Mirrors — Virtual Realities for Sensor Fusion Experiments in Biomimetic Robotics. Multisensor Fusion and Integration for Intelligent Systems (2012)

• J. Dávila-Chacón, S. Magg, J. Liu, S. Wermter. Neural and Statistical Processing of Spatial Cues for Sound Source Localisation. International Joint Conference on Neural Networks (2013)

• J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks (2014)

Page 26: Appendix

Best performances with clustering layer

Page 27: Appendix

Best performances with clustering layer

Page 28: Appendix

Bayesian IC model

Page 29: Appendix

Bayesian IC model
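A minimal sketch of Bayesian integration over frequency channels, assuming each channel supplies a strictly positive likelihood over azimuth bins and that channels are conditionally independent given the azimuth. Multiplying the likelihoods (summing their logs) collapses a channels-by-azimuth array into a single posterior over azimuth, which is one way to read the "dimensionality reduction" point in the conclusion. The function name and the data are hypothetical, not the talk's Bayesian IC model.

```python
import math

def posterior_azimuth(likelihoods, prior=None):
    """Fuse per-channel likelihoods p(r_c | azimuth) into p(azimuth | r).

    likelihoods[c][a] is channel c's (strictly positive) likelihood for azimuth
    bin a. Assuming conditional independence across channels, the posterior is
    proportional to prior[a] * prod_c likelihoods[c][a].
    """
    n_az = len(likelihoods[0])
    if prior is None:
        prior = [1.0 / n_az] * n_az   # flat prior over azimuth bins
    log_post = [math.log(prior[a]) + sum(math.log(ch[a]) for ch in likelihoods)
                for a in range(n_az)]
    m = max(log_post)                 # shift before exponentiating to avoid underflow
    unnorm = [math.exp(v - m) for v in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

azimuths = [-30, 0, 30]               # hypothetical azimuth bins (degrees)
likelihoods = [
    [0.2, 0.5, 0.3],   # channel 1
    [0.1, 0.6, 0.3],   # channel 2
    [0.3, 0.4, 0.3],   # channel 3
]
posterior = posterior_azimuth(likelihoods)
print([round(p, 3) for p in posterior])
```

Working in log space matters in practice: with dozens of frequency channels, a direct product of small likelihoods would underflow to zero.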

Page 30: Appendix

Levenshtein distance

J. Dávila-Chacón, J. Twiefel, J. Liu, S. Wermter. Improving Humanoid Robot Speech Recognition with Sound Source Localisation. International Conference on Artificial Neural Networks, 2014.
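For reference, here is the standard Levenshtein recurrence for two sequences a_1 … a_m and b_1 … b_n. The slide does not state whether these are characters or words. The Iverson bracket [a_i ≠ b_j] is 1 if the elements differ and 0 otherwise, and the distance is d_{m,n}.

```latex
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} = \min\left\{
  d_{i-1,j} + 1,\;
  d_{i,j-1} + 1,\;
  d_{i-1,j-1} + [\,a_i \neq b_j\,]
\right\}
```

The three terms correspond to a deletion, an insertion and a substitution (free when the elements match). For example, turning "kitten" into "sitting" takes two substitutions and one insertion, so d = 3.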