medical diagnosis coms 6998 fall 2009 promiti dutta

Medical Diagnosis

COMS 6998 Fall 2009

Promiti Dutta

• An Exploratory Social-Emotional Prosthetic for Autism Spectrum Disorders – R. Kaliouby, A. Teeters, R. W. Picard (2006)

• Measurement of emotional involvement in spontaneous speaking behavior – B. Z. Pollermann (2000)

• Vocal Emotion Recognition with Cochlear Implants – S. Luo, Q. J. Fu, J. J. Galvin (2006)

• Use of prosodic speech characteristics for automated detection of alcohol intoxication – M. Levit, R. Huber, A. Batliner, E. Noeth (2001)

Vocal Emotion Recognition with Cochlear Implants (S. Luo, Q. J. Fu, J. J. Galvin, 2006)

Cochlear Implants (CI)

• Can restore hearing sensation to deaf individuals

• How do they work?
– Use spectrally-based speech-processing strategies
– Temporal envelope is extracted from a number of frequency analysis bands and used to modulate pulse trains of current delivered to appropriate electrodes

• CI performance is poor for challenging listening tasks
– Speech in noise
– Music perception
– Voice gender
– Speaker recognition
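
The envelope-extraction step described on this slide can be illustrated for a single analysis band. This is a minimal sketch, not the processing used in any particular implant: it assumes the band-pass filtering has already been done, uses full-wave rectification followed by a one-pole low-pass filter, and the 160 Hz cutoff, the pulse period, and both function names are assumptions made for illustration.

```python
import math

def extract_envelope(band_signal, sample_rate, cutoff_hz=160.0):
    """Full-wave rectify a band-limited signal, then smooth it with a one-pole low-pass filter."""
    # Smoothing coefficient of a first-order low-pass filter (assumed design).
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    envelope, state = [], 0.0
    for sample in band_signal:
        state += alpha * (abs(sample) - state)
        envelope.append(state)
    return envelope

def modulate_pulse_train(envelope, pulse_period):
    """Place one pulse every `pulse_period` samples, scaled by the envelope at that instant."""
    return [env if i % pulse_period == 0 else 0.0 for i, env in enumerate(envelope)]

sample_rate = 22050
# Hypothetical band output: a 1 kHz tone whose amplitude rises linearly over 50 ms.
n = int(0.05 * sample_rate)
band = [(i / n) * math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(n)]
env = extract_envelope(band, sample_rate)
pulses = modulate_pulse_train(env, pulse_period=25)
```

In a multi-channel processor the same two steps would run once per analysis band, with each band's pulse train routed to its own electrode.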

Cochlear Implants (CI)

• Current research goal – improve performance in difficult listening conditions
– How?
  • Improve transmission of spectro-temporal fine structure cues
– Methods
  • Increase spectral resolution for apical electrodes to better code pitch information (Geurts and Wouters)
  • Sharpen the temporal envelope to enhance periodicity cues transmitted by the speech processor (Green et al.)

Prosodic Information in Spoken Language

• Prosodic features = variations in speech rhythm, intonation, etc.

• Prosodic cues convey the emotion of the speaker

• Acoustic features associated with vocal emotion
– Pitch (mean value and variability)
– Intensity
– Speech rate
– Voice quality
– Articulation

• Normal-hearing listeners – 70–80% accuracy in recognition

• AI, neural network (NN), and statistical classifiers perform about equally well

This Study

• CI users: investigate ability to recognize vocal emotions in acted emotional speech
– Limited access to pitch information and spectro-temporal fine structure cues

• Normal hearers: vocal emotion recognition using unprocessed speech and speech processed by acoustic CI simulations
– Simulations: different amounts of spectral resolution and temporal information to examine relative contributions of spectral and temporal cues

Subjects

• 6 NH (3 males and 3 females)
– Pure-tone thresholds better than 20 dB HL at octave frequencies from 125 to 8000 Hz in both ears

• 6 CI (3 males and 3 females)
– Post-lingually deafened
– 5 of 6 subjects: at least one year of experience with the device
– 1 subject: 4 months' experience with the device
– 3 Nucleus-22 users, 2 Nucleus-24 users, 1 Freedom user
– Tested using clinically assigned speech processors

• Native English speakers
• Participants paid for participation

Stimuli and Speech Processing

• HEI-ESD – emotional speech database
– 1 male and 1 female talker
– 50 simple English sentences
– 5 target emotions (neutral, anxious, happy, sad, and angry)
– Same sentences used to convey different target emotions in order to minimize contextual and discourse cues

• Speech processing
– Digitized using 16-bit A/D converter
– 22,050 Hz sampling rate, without high-frequency pre-emphasis
– Relative intensity cues preserved for each emotional quality
– Samples NOT normalized

Stimuli and Speech Processing

• Database evaluated
– 3 NH English-speaking listeners
– 10 sentences that produced the highest vocal emotion recognition scores selected for experimental testing
– Total = 100 tokens (2 speakers × 5 emotions × 10 sentences)
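
The token count on this slide is a full crossing of talker, emotion, and sentence. A minimal sketch of how such a stimulus list could be enumerated follows; the sentence IDs and the dictionary layout are hypothetical, since the slides do not say how the tokens were indexed.

```python
from itertools import product

speakers = ["male", "female"]
emotions = ["neutral", "anxious", "happy", "sad", "angry"]
sentence_ids = range(1, 11)  # hypothetical IDs for the 10 selected sentences

# One token per (speaker, emotion, sentence) combination.
tokens = [
    {"speaker": s, "emotion": e, "sentence": n}
    for s, e, n in product(speakers, emotions, sentence_ids)
]
print(len(tokens))  # 100
```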

Emotion Recognition Tests

• CI subjects – unprocessed speech
• NH subjects – unprocessed speech + speech processed by acoustic, sine-wave vocoder CI simulations
• Continuous Interleaved Sampling (CIS) strategy

Experimental Set-Up

• Subjects seated in double-walled sound-treated booth
• Listen to stimuli in free field over loudspeaker
• Presentation level = 65 dBA
• Calibrated by average power of "angry" emotion sentences produced by male talker
• Closed-set, 5-alternative identification task used to measure vocal emotion recognition
• Trial – sentence randomly selected (without replacement) from stimulus set and presented to subject

Experimental Set-Up: Response

• Subjects respond by clicking on 1 of 5 choices on screen (neutral, anxious, happy, sad, angry)
• No feedback or training
• Responses collected and scored in terms of percent correct
• At least 2 runs for each experimental condition
• CI simulations
– Test order of speech processing conditions randomized across subjects
– Test order differed between the two runs
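
The trial procedure and scoring rule can be sketched as follows: tokens are presented once each in random order, and a run is scored as percent correct. This is an illustrative sketch only; the function names, the fixed seed, and the simulated listener are assumptions and are not taken from the study.

```python
import random

EMOTIONS = ["neutral", "anxious", "happy", "sad", "angry"]

def trial_order(tokens, rng):
    """Return the tokens in random order; each token is presented exactly once."""
    order = list(tokens)
    rng.shuffle(order)
    return order

def percent_correct(targets, responses):
    """Score a run as the percentage of trials where the response matches the target emotion."""
    if len(targets) != len(responses):
        raise ValueError("targets and responses must have the same length")
    hits = sum(t == r for t, r in zip(targets, responses))
    return 100.0 * hits / len(targets)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
tokens = [(emotion, sentence) for emotion in EMOTIONS for sentence in range(10)]
run = trial_order(tokens, rng)
targets = [emotion for emotion, _ in run]
# Hypothetical listener who always answers "neutral".
responses = ["neutral"] * len(targets)
score = percent_correct(targets, responses)
```

With five equally frequent target emotions, this constant-response listener scores 20%, which is the chance level for the 5-alternative task.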

Results

Discussion

• Results show both spectral and temporal cues significantly contribute to performance

• Spectral cues may contribute more strongly to recognition of linguistic information

• Temporal cues may contribute more strongly to recognition of emotional content coded in spoken language

• Results show a potential trade-off between spectral resolution and periodicity cues when performing vocal emotion recognition task

• Future: improve access to spectral and temporal fine structure cues to enhance recognition

Use of prosodic speech characteristics for automated detection of alcohol intoxication (M. Levit, R. Huber, A. Batliner, E. Noeth, 2001)

Introduction

• Spoken language influenced by
– Speaker
– Emotions
– Physiological impairment caused by drugs or alcohol

• Goal – to identify and classify stress and emotions in spoken language
– Acoustic features (cepstral coefficients)
– Prosodic features (fundamental frequency)

• No experiments on automated detection of alcohol intoxication by spoken language
– Structural prosodic features – one vector of prosodic features for each signal interval of a lexical unit of speech

• ASR problems?

Contribution

• New approach to determine signal intervals which underlie extraction of prosodic features

• Avoid use of ASR
• Use of phrasal units
– Relates prosodic structural features to signal intervals localized through basic prosodic features
  • i.e., zero-crossing, energy, fundamental frequency

Phrasal Units

• Prosodic units
– Micro-intervals, entire signal as one interval, macro-intervals

• Phrasal unit – speech intervals calculated frame-wise
– Fundamental frequency, zero-crossing, energy
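
Two of the frame-wise measurements named above, zero-crossing rate and energy, can be computed as in the sketch below. The frame length, hop size, sample rate, and test signal are assumptions chosen for illustration. Fundamental-frequency estimation is omitted, and the slides do not specify how the paper combines these measurements into phrasal-unit boundaries.

```python
import math

def frames(signal, frame_len, hop):
    """Yield successive frames of `frame_len` samples, advancing by `hop` samples."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

sample_rate = 16000  # assumed sample rate
# Hypothetical signal: 0.1 s of a 100 Hz tone followed by 0.1 s of silence.
tone = [math.sin(2 * math.pi * 100 * i / sample_rate) for i in range(1600)]
signal = tone + [0.0] * 1600
features = [
    (zero_crossing_rate(f), short_time_energy(f))
    for f in frames(signal, frame_len=400, hop=160)
]
```

Frames covering the tone show high energy and a low zero-crossing rate, while the trailing silent frames show zero energy, which is the kind of contrast a frame-wise segmenter can use to separate speech intervals from pauses.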

Features

• Prosodic units – one vector
– PM21 – prosodic features describing macro-tendencies in fundamental frequency and energy (21)
– VUI11 – duration characteristics of voiced and unvoiced intervals (11)
– LTM24 – long-term cepstral coefficients (non-prosodic features)
– Jitter, shimmer, short-term fluctuations in energy, fundamental frequency
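
Jitter and shimmer describe cycle-to-cycle variation in period and amplitude. The sketch below uses one common "local" definition (mean absolute difference between consecutive cycles, divided by the mean). The slides do not state which definition the paper uses, so this choice, the function name, and the sample values are assumptions.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical per-cycle measurements from one voiced interval.
periods_ms = [8.0, 8.2, 7.9, 8.1, 8.0]       # glottal cycle durations
amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]  # peak amplitude of each cycle

jitter = local_perturbation(periods_ms)   # period perturbation
shimmer = local_perturbation(amplitudes)  # amplitude perturbation
```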

Database

• Alcoholized speech samples from Germany
• 120 readings of a German fable
• 33 male speakers in different alcoholization conditions
– Blood alcohol level – 0 to 2.4 per mille
– Phrasal units
  • Average duration – 2.3 seconds
  • Average speech tempo – 20.8 PhU/min

• Alcoholized = 0.8 per mille and higher

Conclusions

• Prosodic speech characteristics can be used to determine intoxication

• Shown how to extract prosodic features with classification abilities from speech signal without lexical segmentation

• Shown phrasal units correspond to syntactic structures of language

• Determined set of structural prosodic features capable of best classification for automatic detection of intoxication
– 69% accuracy on unseen data

An Exploratory Social-Emotional Prosthetic for Autism Spectrum Disorders (R. Kaliouby, A. Teeters, R. W. Picard, 2006)

Autism Spectrum Disorders: Interesting Facts

• Affects 1 in 91 children and 1 in 58 boys
• Autism prevalence figures are growing
• More children will be diagnosed with autism this year than with AIDS, diabetes & cancer combined
• Fastest-growing serious developmental disability in the U.S.

Autism Spectrum Disorders: Overview

• Neuro-developmental disorder
– Mainly characterized by difficulties in communication and social interaction

• May exhibit atypical autonomic nervous system patterns
– May be monitored by electrodes (measure skin conductance)

Goal and Proposed Solution

• Goal
– Find an objective set of features to describe social interactions

• Proposed solution
– Use a wearable device as exploratory and monitoring tool for people with ASD

Wearable Device

Novel wearable device that analyzes social-emotional information in human-human interaction = affective computing + wearable computing + real-time machine perception

A Novel Device?

• Small wearable camera
• Sensors combined with machine vision and perception algorithms
• System analyzes facial expression and head movements of the person with whom user is interacting
• Possible integration of skin conductance sensors
– Match video with co-occurring measurable physiological changes

Wearable Device Benefit

Record a corpus of natural face-to-face interactions + machine perception algorithms = help identify the spatio-temporal features of a social interaction that predict how the interaction is perceived

Overall Benefit

• Monitor progress of people with ASD in terms of social-emotional interactions

• Determine effectiveness of social skill and behavioral therapies

Measurement of emotional involvement in spontaneous speaking behavior (B. Z. Pollermann, 2000)

Introduction

• Measurement of vocal indicators of emotions - in laboratory settings: typically done by computing deviation of emotionally charged speech patterns from neutral pattern

• Measurement of genuine emotional reactions occurring spontaneously poses the problem of comparison with base-line level

Similarities

• Spontaneous speaking behavior
• Intra-subject comparisons
• Do not require a "neutral" condition

Method 1

Subjects

39 diabetic patients with different levels of autonomic nervous system (ANS) impairment

Data

Emotion induced through subjects' verbal recall of their emotional experiences of joy, sadness and anger

At the end of each episode, a standard sentence was spoken in an emotion-congruent tone

Method

1) Standard sentence acoustically analyzed

2) Extract basic vocal parameters (Zei & Archinard, 1998)

3) Compute Vocal Differential Index (emotional involvement)
• Ratio between value obtained in high arousal conditions (anger and joy) and that in low arousal condition (sadness)

4) Additional variables per vocal parameter: Anger/Sadness Differential and Joy/Sadness Differential
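
Steps 3 and 4 reduce to simple ratios per vocal parameter. The sketch below illustrates them for a single parameter. The F0 values are invented, and pooling anger and joy by averaging is an assumption, since the slide does not say how the two high-arousal conditions are combined.

```python
def differential(high, low):
    """Ratio of a vocal parameter in a high-arousal condition to its value in the low-arousal one."""
    if low == 0:
        raise ValueError("low-arousal value must be non-zero")
    return high / low

# Hypothetical mean F0 (Hz) of the standard sentence in each emotion condition.
mean_f0 = {"anger": 220.0, "joy": 210.0, "sadness": 172.0}

anger_sadness = differential(mean_f0["anger"], mean_f0["sadness"])
joy_sadness = differential(mean_f0["joy"], mean_f0["sadness"])
# Assumption: the two high-arousal conditions are pooled by averaging.
vocal_differential_index = differential(
    (mean_f0["anger"] + mean_f0["joy"]) / 2, mean_f0["sadness"]
)
```

Because every value comes from the same speaker, the ratio needs no separate neutral recording as a reference.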

Results

1) Vocal Differential Index is positively correlated with functioning of the autonomic nervous system

2) Vocal Arousal Index

a) Computed from cumulative score consisting of acoustic parameters significantly related to differentiation of 3 emotions

b) Score was composed of Z values

c) Reflects degree of emotional involvement for each emotion
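
A cumulative score built from Z values could be computed along the following lines. This is a sketch under stated assumptions: standardizing each parameter within one speaker across the three emotions, the choice of parameters, and all numeric values are illustrative and are not taken from the study.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

EMOTIONS = ["joy", "sadness", "anger"]
# Hypothetical measurements for one speaker; parameter names are illustrative only.
measurements = {
    "mean_f0_hz": [210.0, 170.0, 220.0],
    "intensity_db": [68.0, 60.0, 70.0],
}

# Sum the Z values of all parameters to get one cumulative score per emotion.
arousal_index = {emotion: 0.0 for emotion in EMOTIONS}
for values in measurements.values():
    for emotion, z in zip(EMOTIONS, z_scores(values)):
        arousal_index[emotion] += z
```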

Method 2

Subjects

10 breast cancer patients

Data Collection

Interview to determine coping style (well-adaptive vs. ill-adaptive)

Hypothesis

Confrontation with emotional contents during interview would cause subjects to encode emotional reactions into their voices

Method

1) Interview screened for passages of high and low vocal arousal

2) Vocal Differential Index calculated for each arousal

3) Vocal Arousal measured for passage where subject talks about coping with illness

Results

1) Vocal Arousal index inside the base line range indicative of coping style

2) Relatively narrow Vocal Differential Index related to coping difficulties
