TRANSCRIPT
STARDUST PROJECT – Speech Recognition for People with Severe Dysarthria
Mark Parker, Specialist Speech and Language Therapist
Project Team
DoH NEAT
University of Sheffield
Barnsley District General Hospital
Prof P Enderby / M Parker – Clinical Speech Therapy
Prof P Green / Dr Athanassios Hatzis – Computer Sciences
Prof M Hawley / Dr Simon Brownsall – Medical Physics
What is Dysarthria?
A neurological motor speech impairment characterised by slow, weak, imprecise and/or uncoordinated movements of the speech musculature.
May be congenital or acquired
Prevalence: 170 per 100,000 (Emerson & Enderby, 1995)
Severity Rating
Typically based on ‘intelligibility’: ‘…the extent a listener understands the speech produced…’ (Yorkston et al., 1999)
Not a pure measure – interaction of events
Mild 70–90%
Moderate 40–70%
Severe 10–40%
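The severity bands above amount to a simple threshold rule on the intelligibility percentage. A minimal sketch in Python (how exact boundary values such as 70% are assigned, and the labels outside the three quoted bands, are assumptions – the slide does not specify them):

```python
def severity_band(intelligibility: float) -> str:
    """Map an intelligibility percentage (0-100) to a severity band.

    The 70/40/10 cut-offs come from the slide; boundary handling
    (lower bound inclusive) and the two out-of-band labels are
    assumptions made here for illustration.
    """
    if intelligibility > 90:
        return "within normal limits"
    if intelligibility >= 70:
        return "mild"
    if intelligibility >= 40:
        return "moderate"
    if intelligibility >= 10:
        return "severe"
    return "unintelligible"

print(severity_band(80))  # -> mild
print(severity_band(25))  # -> severe
```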
Aim
VRS used to access other technology
Many people with severe dysarthria will have an associated severe physical disability
ECA operated with switching systems – slow, laborious, dependent on positioning
VRS to supplement or replace switching
Background
Voice recognition systems: commercially available packages – mobile phones, WP packages (Dragon Dictate); continuous vs discrete
Normal speech: with recognition training can achieve >90% recognition rates (Rose and Galdo, 1999)
Dysarthric speech: mild – 10–15% lower recognition rates (Ferrier, 1992), declining rapidly as speech deteriorates; 30–40% on single words (Thomas-Stonell, 1998) – functionally useless
Intelligibility vs Consistency
Difference between machine recognition and human perception
‘Normal’ speech may be 100% intelligible and show a narrow band of differences across time (consistency).
‘Severe’ dysarthria may be completely unintelligible but may show consistency of key elements (or not).
Development of the system
10–12 volunteers – severe dysarthria and physical disability
Speech <30% intelligibility rating
Video/DAT recording/computer sampling
Assessing for the range of phonetic contrasts that can be achieved
Development of a system (2)
Discrete system - the number of contrasts that can be achieved will determine the number of commands that the VRS can handle
Don’t need intelligibility - need consistency
Determine what word/sound/phonetic contrast will represent what command
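The discrete-system idea above – each reliably produced sound represents one command – can be sketched as a simple lookup from a recognised label to an action. The labels and actions below are hypothetical placeholders, not the project's actual command set:

```python
# Hypothetical command vocabulary: each entry pairs a phonetic target
# the speaker can produce consistently with the environmental-control
# action it should trigger. All names here are illustrative.
COMMAND_MAP = {
    "aa": "lights_on",
    "ii": "lights_off",
    "tv": "television_power",
    "al": "alarm",
}

def dispatch(recognised_label: str) -> str:
    """Return the command for a recognised label, or a safe no-op
    when the recogniser produces a label outside the vocabulary."""
    return COMMAND_MAP.get(recognised_label, "no_action")

print(dispatch("aa"))  # -> lights_on
print(dispatch("zz"))  # -> no_action
```

The size of `COMMAND_MAP` is bounded by the number of phonetic contrasts the speaker can achieve – exactly the constraint the slide describes.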
Development of a system (3)
Train the VRS – neural networks and hidden Markov modelling
Speech consistency training
Implement the system
Current position
Software development – sophisticated recording and data-logging facility to be combined with a ‘consistency’ measure and spectrography package.
Developing ‘user friendliness’ and possibility of ‘remote’ usage.
Identifying & recording EC commands
‘Labelling’ the sample
Attempting to define measures of baseline consistency at an ‘acoustic’ level
Experimenting with recognition accuracy of a commercially available product – Sicare
Labelling
Breaking an utterance into component parts
To establish the extent of variance over time
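One simple way to quantify the ‘variance over time’ of labelled components is the coefficient of variation of each component's duration across repeated utterances. A sketch under assumed inputs (the input format – one dict of component label to duration per repetition – and the choice of metric are illustrative, not the project's actual consistency measure):

```python
from statistics import mean, pstdev

def duration_consistency(repetitions):
    """Coefficient of variation (std/mean) of each labelled
    component's duration across repeated utterances. Lower values
    mean the speaker reproduces that component more consistently.

    `repetitions`: list of dicts, each mapping a component label to
    its measured duration in seconds for one repetition.
    """
    labels = repetitions[0].keys()
    return {lab: pstdev([r[lab] for r in repetitions])
                 / mean([r[lab] for r in repetitions])
            for lab in labels}

# Toy durations (seconds) for one component over three repetitions.
reps = [{"t": 0.10}, {"t": 0.12}, {"t": 0.11}]
print(duration_consistency(reps))
```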
Sicare testing
Recognition rates compatible with previous research
Begins to illustrate the points at which a recogniser becomes ‘confused’
May illustrate the areas where distinction has to be made
May start to illustrate some of the key acoustic factors that are crucial in dysarthric speech and VR
Non-adapted commercial product functionally useless for this population
Subsidiary Questions
Is dysarthric speech consistent?
Does the underlying acoustic/soundwave pattern contain consistent differences in contrasts that are not perceptually distinguishable?
Can consistency be trained in the absence of intelligibility?
Does increasing consistency increase intelligibility?
Normal speech “alarm” 1&2
Normal speech “alarm” 2
Normal speech “television”
Dysarthric speech “television”