Post on 22-Dec-2015
Medical Diagnosis
COMS 6998 Fall 2009
Promiti Dutta
An Exploratory Social-Emotional Prosthetic for Autism Spectrum Disorders
R. Kaliouby, A. Teeters, R. W. Picard, 2006
Measurement of emotional involvement in spontaneous speaking behavior
B. Z. Pollermann, 2000
Vocal Emotion Recognition with Cochlear Implants
S. Luo, Q. J. Fu, J. J. Galvin, 2006
Use of prosodic speech characteristics for automated detection of alcohol intoxication
M. Levit, R. Huber, A. Batliner, E. Noeth, 2001
Vocal Emotion Recognition with Cochlear Implants
S. Luo, Q. J. Fu, J. J. Galvin, 2006
Cochlear Implants (CI)
• Can restore hearing sensation to deaf individuals
• How do they work?
– Use spectrally-based speech-processing strategies
– Temporal envelope is extracted from a number of frequency analysis bands and used to modulate pulse trains of current delivered to the appropriate electrodes
• CI performance is poor for challenging listening tasks
– Speech in noise
– Music perception
– Voice gender
– Speaker recognition
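A minimal sketch of the envelope-to-pulse-train step described above, assuming a single analysis band that has already been band-pass filtered. The moving-average smoother, the 400 Hz cutoff, the 800 pulses-per-second rate, and all function names are illustrative assumptions, not parameters of any particular CI speech processor.

```python
import numpy as np

def temporal_envelope(band_signal, fs, cutoff_hz=400.0):
    """Full-wave rectify, then smooth with a moving average (a crude low-pass)."""
    win = max(1, int(round(fs / cutoff_hz)))
    kernel = np.ones(win) / win
    return np.convolve(np.abs(band_signal), kernel, mode="same")

def modulated_pulse_train(envelope, fs, pulse_rate_hz=800.0):
    """Keep the envelope value at regularly spaced pulse times; zero elsewhere."""
    period = int(round(fs / pulse_rate_hz))
    pulses = np.zeros_like(envelope)
    pulses[::period] = envelope[::period]
    return pulses

fs = 16000
t = np.arange(fs) / fs
# Hypothetical band signal: 1 kHz carrier with a slow 4 Hz amplitude modulation
band = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
env = temporal_envelope(band, fs)
pulses = modulated_pulse_train(env, fs)
```

The pulse amplitudes follow the slow modulation of the band signal, while the fine structure of the 1 kHz carrier is discarded. That discarded fine structure is what the later slides refer to as limited access to pitch and spectro-temporal fine structure cues.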
Cochlear Implants (CI)
• Current research goal
– Improve performance in difficult listening conditions
– How?
• Improve transmission of spectro-temporal fine structure cues
– Methods
• Increase spectral resolution for apical electrodes to better code pitch information (Geurts and Wouters)
• Sharpen the temporal envelope to enhance periodicity cues transmitted by the speech processor (Green et al.)
Prosodic Information in Spoken Language
• Prosodic features = variations in speech rhythm, intonation, etc.
• Prosodic cues = emotion of speaker
• Acoustic features associated with vocal emotion
– Pitch (mean value and variability)
– Intensity
– Speech rate
– Voice quality
– Articulation
Normal-hearing listeners: 70–80% accuracy in recognition
AI, NN, and statistical classifiers perform about equally well
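As a rough illustration of the first two acoustic features listed above (pitch mean and variability, and intensity), the sketch below estimates a per-frame fundamental frequency with a plain autocorrelation peak and summarizes it together with RMS level. The autocorrelation estimator, the 75 to 400 Hz search range, the 40 ms frame length, and the function names are assumptions made for the example; the slides do not say how these features were measured.

```python
import numpy as np

def frame_f0(frame, fs, f_min=75.0, f_max=400.0):
    """Estimate F0 of one frame from the strongest autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / lag

def prosodic_summary(signal, fs, frame_ms=40.0):
    """Mean and spread of F0 plus mean RMS level over non-overlapping frames."""
    n = int(fs * frame_ms / 1000.0)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    f0 = np.array([frame_f0(f, fs) for f in frames])
    rms = np.array([np.sqrt(np.mean(f ** 2)) for f in frames])
    return {"f0_mean_hz": f0.mean(), "f0_std_hz": f0.std(), "rms_mean": rms.mean()}

fs = 16000
# Hypothetical test signal: a steady 200 Hz tone, half a second long
tone = np.sin(2 * np.pi * 200 * np.arange(fs // 2) / fs)
summary = prosodic_summary(tone, fs)
```

Real speech would also need a voiced/unvoiced decision before F0 is averaged; this sketch treats every frame as voiced.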
This Study
• CI users: investigate ability to recognize vocal emotions in acted emotional speech
– Limited access to pitch information and spectro-temporal fine structure cues
• Normal hearers: vocal emotion recognition using unprocessed speech and speech processed by acoustic CI simulations
– Simulations: different amounts of spectral resolution and temporal information to examine relative contributions of spectral and temporal cues
Subjects
• 6 NH (3 males and 3 females)
– Pure-tone thresholds better than 20 dB HL at octave frequencies from 125 to 8000 Hz in both ears
• 6 CI (3 males and 3 females)
– Post-lingually deafened
– 5 of 6 subjects: at least one year of experience with device
– 1 subject: 4 months' experience with device
– 3 Nucleus-22 users, 2 Nucleus-24 users, 1 Freedom user
– Tested using clinically assigned speech processors
• Native English speakers
• Participants paid for participation
Stimuli and Speech Processing
• HEI-ESD – emotional speech database
– 1 male and 1 female speaker
– 50 simple English sentences
– 5 target emotions (neutral, anxious, happy, sad, and angry)
– Same sentences used to convey different target emotions in order to minimize contextual and discourse cues
• Speech processing
– Digitized using 16-bit A/D converter
– 22,050 Hz sampling rate, without high-frequency pre-emphasis
– Relative intensity cues preserved for each emotional quality
– Samples NOT normalized
Stimuli and Speech Processing
• Database evaluated
– 3 NH English-speaking listeners
– 10 sentences that produced highest vocal emotion recognition scores selected for experimental testing
– Total = 100 tokens (2 speakers × 5 emotions × 10 sentences)
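The token count above can be checked by enumerating the stimulus set. The speaker labels and the sentence indices 1 to 10 are placeholders; the slides do not identify which 10 sentences were selected.

```python
from itertools import product

speakers = ["male", "female"]
emotions = ["neutral", "anxious", "happy", "sad", "angry"]
sentences = range(1, 11)  # placeholder indices for the 10 selected sentences

# One token per (speaker, emotion, sentence) combination
tokens = list(product(speakers, emotions, sentences))
print(len(tokens))
```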
Emotion Recognition Tests
• CI subjects - unprocessed speech
• NH subjects - unprocessed speech + speech processed by acoustic, sine-wave vocoder CI simulations
• Continuous Interleaved Sampling (CIS) strategy
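A sine-wave vocoder of the kind named above can be sketched as follows: split the signal into frequency bands, extract each band's temporal envelope, and use it to modulate a sine carrier at the band's center frequency. The number of channels sets the spectral resolution and the envelope cutoff sets how much temporal detail survives. The FFT-mask filterbank, moving-average envelope smoother, band edges, and default parameter values here are assumptions for illustration, not the settings used in the study.

```python
import numpy as np

def sine_vocoder(signal, fs, n_channels=4, f_lo=200.0, f_hi=7000.0,
                 env_cutoff_hz=50.0):
    """Acoustic CI simulation sketch: band envelopes modulate sine carriers."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    t = np.arange(len(signal)) / fs
    win = max(1, int(round(fs / env_cutoff_hz)))
    kernel = np.ones(win) / win
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Band-pass by zeroing FFT bins outside [lo, hi)
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spectrum * mask, n=len(signal))
        # Envelope: full-wave rectification followed by moving-average smoothing
        env = np.convolve(np.abs(band), kernel, mode="same")
        # Carrier: sine at the geometric center of the band
        carrier = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
        out += env * carrier
    return out

fs = 22050  # sampling rate listed for the stimuli
tone = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # hypothetical input
simulated = sine_vocoder(tone, fs, n_channels=4)
peak_hz = np.fft.rfftfreq(len(simulated), d=1.0 / fs)[
    np.argmax(np.abs(np.fft.rfft(simulated)))
]
```

With 4 channels the 1000 Hz input is re-synthesized at the center of whichever band contains it, which shows how reduced spectral resolution smears pitch information.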
Experimental Set-Up
• Subjects seated in double-walled sound-treated booth
• Listen to stimuli in free field over loudspeaker
• Presentation level = 65 dBA
• Calibrated by average power of "angry" emotion sentences produced by male talker
• Closed-set, 5-alternative identification task used to measure vocal emotion recognition
• Trial - sentence randomly selected (without replacement) from stimulus set and presented to subject
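Random selection without replacement amounts to presenting every token exactly once in a shuffled order. A small sketch, where the function name and the optional seed are assumptions added for reproducibility:

```python
import random

def run_order(stimuli, seed=None):
    """Return all stimuli in random order, each appearing exactly once."""
    rng = random.Random(seed)
    items = list(stimuli)
    return rng.sample(items, k=len(items))

order = run_order(range(100), seed=1)  # 100 token indices for one run
```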
Experimental Set-Up: Response
• Subject responds by clicking on 1 of 5 choices on screen (neutral, anxious, happy, sad, angry)
• No feedback or training
• Responses collected and scored in terms of percent correct
• At least 2 runs for each experimental condition
• CI simulations
– Test order of speech processing conditions randomized across subjects
– Test order differed between the two runs
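Scoring in percent correct is a simple tally of matching target and response labels. The sketch below also counts target/response pairs, which is one way to see which emotions get confused; the function names and sample responses are made up for illustration. With 5 alternatives, chance performance is 100 / 5 = 20%.

```python
from collections import Counter

def percent_correct(targets, responses):
    """Percentage of trials where the response equals the target emotion."""
    if len(targets) != len(responses) or not targets:
        raise ValueError("targets and responses must be non-empty and equal length")
    hits = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)

def confusion_counts(targets, responses):
    """Count how often each (target, response) pair occurred."""
    return Counter(zip(targets, responses))

# Hypothetical trial data
targets = ["happy", "sad", "angry", "neutral"]
responses = ["happy", "sad", "sad", "neutral"]
score = percent_correct(targets, responses)
confusions = confusion_counts(targets, responses)
```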
Results
Discussion
• Results show both spectral and temporal cues significantly contribute to performance
• Spectral cues may contribute more strongly to recognition of linguistic information
• Temporal cues may contribute more strongly to recognition of emotional content coded in spoken language
• Results show a potential trade-off between spectral resolution and periodicity cues when performing vocal emotion recognition task
• Future: improve access to spectral and temporal fine structure cues to enhance recognition
Use of prosodic speech characteristics for automated detection of alcohol intoxication
M. Levit, R. Huber, A. Batliner, E. Noeth, 2001
Introduction
• Spoken language influenced by
– Speaker
– Emotions
– Physiological impairment caused by drugs or alcohol
• Goal – to identify and classify stress and emotions in spoken language
– Acoustic features (cepstral coefficients)
– Prosodic features (fundamental frequency)
• No experiments on automated detection of alcohol intoxication by spoken language
– Structural prosodic features – one vector of prosodic features for each signal interval of a lexical unit of speech
• ASR problems?
Contribution
• New approach to determine signal intervals which underlie extraction of prosodic features
• Avoid use of ASR
• Use of phrasal units
– Relates prosodic structural features to signal intervals localized through basic prosodic features
• i.e., zero-crossing, energy, fundamental frequency
Phrasal Units
• Prosodic units
– Micro-intervals, entire signal as an interval, macro-intervals
• Phrasal unit - speech intervals calculated frame-wise
– Fundamental frequency, zero-crossing, energy
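To make "calculated frame-wise" concrete, the sketch below computes two of the basic features named above, energy and zero-crossing rate, per frame, and then groups consecutive high-energy frames into speech intervals. The 10 ms frame length, the fixed energy threshold, and the grouping rule are assumptions for illustration; the slides do not give the actual phrasal-unit segmentation procedure.

```python
import numpy as np

def frame_features(signal, fs, frame_ms=10.0):
    """Per-frame mean energy and zero-crossing rate (non-overlapping frames)."""
    n = int(fs * frame_ms / 1000.0)
    n_frames = len(signal) // n
    frames = np.reshape(signal[:n_frames * n], (n_frames, n))
    energy = np.mean(frames ** 2, axis=1)
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    return energy, zcr

def speech_intervals(energy, threshold):
    """Return (start_frame, end_frame_exclusive) runs where energy > threshold."""
    intervals, start = [], None
    for i, on in enumerate(energy > threshold):
        if on and start is None:
            start = i
        elif not on and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(energy)))
    return intervals

fs = 8000
# Hypothetical signal: 0.1 s silence, 0.2 s of a 200 Hz tone, 0.1 s silence
tone = np.sin(2 * np.pi * 200 * np.arange(1600) / fs)
signal = np.concatenate([np.zeros(800), tone, np.zeros(800)])
energy, zcr = frame_features(signal, fs)
intervals = speech_intervals(energy, threshold=0.01)
```

Because the intervals come from the signal itself, no word-level transcription is needed, which matches the stated aim of avoiding ASR.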
Features
• Prosodic units – one vector
– PM21 – prosodic features describing macro-tendencies in fundamental frequency and energy (21)
– VUI11 – duration characteristics of voiced and unvoiced intervals (11)
– LTM24 – long-term cepstral coefficients (non-prosodic features)
– Jitter, shimmer, short-term fluctuations in energy, fundamental frequency
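Jitter and shimmer are cycle-to-cycle perturbation measures of pitch period and amplitude respectively. One common "local" definition, used here as an assumption since the slides do not state which variant was computed, is the mean absolute difference between consecutive values divided by the mean value. The input sequences below are invented.

```python
import numpy as np

def local_perturbation(values):
    """Mean absolute consecutive difference, relative to the mean value."""
    values = np.asarray(values, dtype=float)
    if values.size < 2:
        raise ValueError("need at least two cycles")
    return np.mean(np.abs(np.diff(values))) / np.mean(values)

# Hypothetical per-cycle measurements from a voiced segment
periods_s = [0.010, 0.011, 0.010, 0.011]      # glottal cycle durations
peak_amplitudes = [0.80, 0.80, 0.80, 0.80]    # per-cycle peak amplitudes

jitter = local_perturbation(periods_s)
shimmer = local_perturbation(peak_amplitudes)
```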
Database
• Alcoholized speech samples from Germany
• 120 readings of a German fable
• 33 male speakers in different alcoholization conditions
– Blood alcohol level – 0 to 2.4 per mille
– Phrasal units
• Average duration – 2.3 seconds
• Average speech tempo – 20.8 PhU/min
• Alcoholized = 0.8 per mille and higher
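The two-class labelling implied by the threshold above can be written as a one-line rule. The label strings and the function name are placeholders chosen for this sketch.

```python
ALCOHOLIZED_THRESHOLD_PER_MILLE = 0.8  # threshold stated on the slide

def intoxication_label(blood_alcohol_per_mille):
    """Binary class label for a recording given the speaker's blood alcohol level."""
    if blood_alcohol_per_mille >= ALCOHOLIZED_THRESHOLD_PER_MILLE:
        return "alcoholized"
    return "non-alcoholized"

labels = [intoxication_label(x) for x in (0.0, 0.5, 0.8, 2.4)]
```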
Conclusions
• Prosodic speech characteristics can be used to determine intoxication
• Shown how to extract prosodic features with classification abilities from speech signal without lexical segmentation
• Shown phrasal units correspond to syntactic structures of language
• Determined set of structural prosodic features capable of best classification for automatic detection of intoxication
– 69% accuracy on unseen data
An Exploratory Social-Emotional Prosthetic for Autism Spectrum Disorders
R. Kaliouby, A. Teeters, R. W. Picard, 2006
Autism Spectrum Disorders: Interesting Facts
• Affects 1 in 91 children and 1 in 58 boys
• Autism prevalence figures are growing
• More children will be diagnosed with autism this year than with AIDS, diabetes & cancer combined
• Fastest-growing serious developmental disability in the U.S.
Autism Spectrum Disorders: Overview
• Neuro-developmental disorder
– Mainly characterized by difficulties in communication and social interaction
• May exhibit atypical autonomic nervous system patterns
– May be monitored by electrodes (measure skin conductance)
Goal and Proposed Solution
• Goal:
– Find an objective set of features to describe social interactions
• Proposed solution
– Use a wearable device as exploratory and monitoring tool for people with ASD
Wearable Device
affective computing + wearable computing + real-time machine perception = novel wearable device that analyzes social-emotional information in human-human interaction
A Novel Device?
• Small wearable camera
• Sensors combined with machine vision and perception algorithms
• System analyzes facial expression and head movements of the person with whom user is interacting
• Possible integration of skin conductance sensors
– Match video with co-occurring measurable physiological changes
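One simple way to match video with co-occurring physiological readings is to interpolate the skin-conductance signal at each video frame's timestamp. This is a sketch under the assumption that both streams share a common clock; the timestamps and values are invented, and the slides do not describe how the device would actually align the two.

```python
import numpy as np

def conductance_at_frames(frame_times_s, sensor_times_s, sensor_values):
    """Linearly interpolate skin-conductance samples at video frame times."""
    return np.interp(frame_times_s, sensor_times_s, sensor_values)

# Hypothetical streams: video frames every 0.5 s, sensor samples every 1 s
frame_times = np.array([0.0, 0.5, 1.0])
sensor_times = np.array([0.0, 1.0])
sensor_values = np.array([2.0, 4.0])  # arbitrary conductance units

aligned = conductance_at_frames(frame_times, sensor_times, sensor_values)
```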
Wearable Device Benefit
Record a corpus of natural face-to-face interactions + machine perception algorithms = help identify the spatio-temporal features of a social interaction that predict how the interaction is perceived
Overall Benefit
• Monitor progress of people with ASD in terms of social-emotional interactions
• Determine effectiveness of social skill and behavioral therapies
Measurement of emotional involvement in spontaneous speaking behavior
B. Z. Pollermann, 2000
Introduction
• Measurement of vocal indicators of emotions in laboratory settings: typically done by computing the deviation of emotionally charged speech patterns from a neutral pattern
• Measurement of genuine emotional reactions occurring spontaneously poses the problem of comparison with a baseline level
Similarities
• Spontaneous speaking behavior
• Intra-subject comparisons
• Do not require a "neutral" condition
Method 1
Subjects
39 diabetic patients with different levels of ANS impairment
Data
Emotion induced through subjects' verbal recall of their emotional experiences of joy, sadness and anger
At end of each episode - standard sentence spoken in an emotion-congruent tone
Method
1) Standard sentence acoustically analyzed
2) Extract basic vocal parameters (Zei & Archinard, 1998)
3) Compute Vocal Differential Index (emotional involvement)
• Ratio between value obtained in high arousal conditions (anger and joy) and that in low arousal condition (sadness)
4) Additional variables per vocal parameter: Anger/Sadness Differential and Joy/Sadness Differential
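Steps 3 and 4 can be sketched for a single vocal parameter as below. The slide does not say how the anger and joy values are combined into one high-arousal value; taking their mean is an assumption of this example, as are the function name and the sample F0 values.

```python
def vocal_differentials(anger, joy, sadness):
    """Ratios of high-arousal to low-arousal values for one vocal parameter."""
    if sadness == 0:
        raise ValueError("sadness value must be non-zero")
    high_arousal = (anger + joy) / 2.0  # assumption: average the two conditions
    return {
        "anger_sadness_differential": anger / sadness,
        "joy_sadness_differential": joy / sadness,
        "vocal_differential_index": high_arousal / sadness,
    }

# Hypothetical mean F0 values (Hz) of the standard sentence in each episode
result = vocal_differentials(anger=220.0, joy=200.0, sadness=150.0)
```

Because every ratio is taken against the same speaker's own sadness value, the comparison is intra-subject and needs no separate neutral recording.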
Results
1) Vocal Differential Index is positively correlated with functioning of the autonomic nervous system
2) Vocal Arousal Index
a) Computed from cumulative score consisting of acoustic parameters significantly related to differentiation of the 3 emotions
b) Score was composed of Z values
c) Reflects degree of emotional involvement for each emotion
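A cumulative score built from Z values can be sketched as: standardize each acoustic parameter, then sum the standardized values per emotion episode. Standardizing across one subject's three episodes is an assumption made here to keep the comparison intra-subject; the slide does not specify the reference distribution, and the parameter values below are invented.

```python
import numpy as np

def vocal_arousal_index(params):
    """Sum of per-parameter z-scores for each condition.

    params: array-like of shape (n_conditions, n_parameters).
    """
    x = np.asarray(params, dtype=float)
    std = x.std(axis=0)
    if np.any(std == 0):
        raise ValueError("each parameter must vary across conditions")
    z = (x - x.mean(axis=0)) / std
    return z.sum(axis=1)

# Hypothetical values: rows = anger, joy, sadness; columns = mean F0 (Hz), level (dB)
episodes = [[220.0, 70.0], [200.0, 68.0], [150.0, 60.0]]
index = vocal_arousal_index(episodes)
```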
Method 2
Subjects
10 breast cancer patients
Data Collection
Interview to determine coping style (well-adaptive vs. ill-adaptive)
Hypothesis
Confrontation with emotional contents during interview would cause subjects to encode emotional reactions into their voices
Method
1) Interview screened for passages of high and low vocal arousal
2) Vocal Differential Index calculated for each arousal
3) Vocal Arousal measured for passage where subject talks about coping with illness
Results
1) Vocal Arousal Index inside the baseline range indicative of coping style
2) Relatively narrow Vocal Differential Index related to coping difficulties