Österreichisches Forschungsinstitut für Artificial Intelligence
Featuring the GEMEP Corpus
Experiences and Future Plans
Hannes Pirker
OFAI, Vienna
Overview
• Features from Audio Channel
  • Segments and pitch contours
• Features from Video Channel
  • Faces, silhouettes and hands
• Discussion
  • What we do have & what we need
© ÖFAI, Wien 3
Features from Audio Channel
• Phonetic segmentation into
  • Phonemes
  • Syllables
• Pitch Extraction
Speech Analysis: Phonetic Segmentation
• Phonetic segmentation works quite well (forced alignment with HTK)
• Bootstrapping cycle with
  • Manual labelling of training data
  • Training of HMMs
  • Automatic alignment
  • Manual correction of alignment results
  • Re-training of HMMs
• Processed data: Type 1 sentences (Ne kal ibam sud molen)
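The alignment results can be read back for later timing analyses. The following is a minimal sketch, assuming the alignments are stored as HTK label files (one segment per line, with start and end times in units of 100 ns). The function name and the example timings are illustrative and not taken from the corpus.

```python
from typing import List, NamedTuple

# HTK label files give times in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    label: str


def parse_htk_labels(text: str) -> List[Segment]:
    """Parse 'start end label [extra fields]' lines into segments in seconds."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip empty or malformed lines
        start, end = int(fields[0]), int(fields[1])
        segments.append(
            Segment(start / HTK_UNITS_PER_SECOND,
                    end / HTK_UNITS_PER_SECOND,
                    fields[2])
        )
    return segments


# Hypothetical alignment output; the timings are made up for illustration.
example = """0 1200000 n
1200000 2500000 e
2500000 4000000 k
"""

for seg in parse_htk_labels(example):
    print(f"{seg.label}: {seg.start:.3f}-{seg.end:.3f} s")
```

Only the first three fields of each line are used, so trailing score or word-level columns are ignored.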
Speech Analysis: Pitch Extraction
Testing different pitch extraction methods from SFS (Mark Huckvale, UCL)
• Promising results with fine-tuning of parameters for contour-smoothing & high-pitch correction
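To illustrate what contour smoothing and high-pitch correction can look like, here is a minimal sketch. It is not the SFS implementation and not necessarily the procedure used here. It assumes an F0 contour given as a list of Hz values with 0 for unvoiced frames, and the threshold `ratio` and the window `width` are illustrative parameters.

```python
from statistics import median
from typing import List


def correct_octave_jumps(f0: List[float], ratio: float = 1.7) -> List[float]:
    """Halve voiced values that lie far above the median of the voiced frames."""
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return list(f0)
    reference = median(voiced)
    corrected = []
    for value in f0:
        if value > 0 and value > ratio * reference:
            value = value / 2.0  # treat as an octave error
        corrected.append(value)
    return corrected


def median_smooth(f0: List[float], width: int = 3) -> List[float]:
    """Median-filter voiced frames; unvoiced frames (0) stay unvoiced."""
    half = width // 2
    smoothed = []
    for i, value in enumerate(f0):
        if value <= 0:
            smoothed.append(0.0)
            continue
        window = [x for x in f0[max(0, i - half): i + half + 1] if x > 0]
        smoothed.append(median(window))
    return smoothed


# Made-up contour with one octave error (410 Hz) between plausible values.
contour = [0.0, 200.0, 205.0, 410.0, 210.0, 0.0]
corrected = correct_octave_jumps(contour)
smoothed = median_smooth(corrected)
print(corrected)
print(smoothed)
```

Using a single global median as the reference is a simplification; a contour with a wide pitch range would need a local reference instead.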
Sample 06joi112
Features from Video Channel
• Face detection
• Silhouettes & Bounding Boxes
• Hand tracking
Face Detection Using OpenCV
• Feature-based face detection with a pre-trained cascaded classifier
• (Almost) ‘out of the box’
• Very good results under close-to-optimal conditions
Clip: 06joi112
Silhouettes & Bounding Boxes
• Combining silhouettes & bounding boxes in frontal and side view as a simple & robust estimator of
  • Dynamics of movements
  • Amount of expansion
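The following minimal sketch shows one way a 3D bounding box volume could be derived from a frontal and a side silhouette. It assumes binary masks given as lists of rows, takes width and height from the frontal view and depth from the side view, and reports the volume in pixel units. How the volumes on the result slides were actually computed is not stated, so all of this is an assumption.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]            # rows of 0/1 pixels
Box = Tuple[int, int, int, int]   # left, top, right, bottom (inclusive)


def bounding_box(mask: Mask) -> Optional[Box]:
    """Return the tight bounding box of all non-zero pixels, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


def box_size(box: Box) -> Tuple[int, int]:
    """Return (width, height) of an inclusive box in pixels."""
    left, top, right, bottom = box
    return (right - left + 1, bottom - top + 1)


def volume_3d(frontal: Mask, side: Mask) -> int:
    """Width and height from the frontal view, depth from the side view."""
    frontal_box, side_box = bounding_box(frontal), bounding_box(side)
    if frontal_box is None or side_box is None:
        return 0
    width, height = box_size(frontal_box)
    depth, _ = box_size(side_box)
    return width * height * depth


# Tiny made-up silhouettes for illustration.
frontal_mask = [[0, 0, 0, 0],
                [0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 1, 0, 0]]
side_mask = [[0, 0, 0],
             [0, 1, 0],
             [0, 1, 0],
             [0, 1, 0]]
print(volume_3d(frontal_mask, side_mask))
```

Computing this per frame gives a time series whose level reflects expansion and whose frame-to-frame change reflects movement dynamics.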
1st Results: 3D Bounding Box Volume per Emotion – Actor 01
1st Results: 3D Bounding Box Volume per Emotion – Actor 07
3D Bounding Box Volume: JOI vs. TRI / Actor 01 vs. Actor 06
ACTOR 01 ACTOR 06
Temporal Dynamics: Plotting Bounding Box & Speech Timing
Clip: 06joi112
Hand Tracking
• Find skin areas in 1st frame
• Find hand & use center of area as hand position
• Interactively accept or correct
• Perform automatic tracking (using mean shift algorithm)
• Interactively classify quality of tracking
  • The GOOD (70%)
  • The BAD (15%)
  • The UGLY (15%)
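The tracking step can be sketched as follows. This is a minimal pure-Python illustration of the mean shift idea, not the implementation used here. It assumes a per-pixel skin-likelihood map (`weights`) and repeatedly moves a square search window to the weighted centre of the pixels it covers. The map, the window size and the start position are made up.

```python
from typing import List, Tuple


def mean_shift(weights: List[List[float]],
               center: Tuple[int, int],
               half_size: int,
               max_iter: int = 20) -> Tuple[int, int]:
    """Shift a square window to the weighted centroid until it stops moving."""
    height, width = len(weights), len(weights[0])
    cx, cy = center
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for y in range(max(0, cy - half_size), min(height, cy + half_size + 1)):
            for x in range(max(0, cx - half_size), min(width, cx + half_size + 1)):
                w = weights[y][x]
                total += w
                sum_x += w * x
                sum_y += w * y
        if total == 0:
            break  # no skin-like pixels in the window: tracking is lost
        new_cx, new_cy = round(sum_x / total), round(sum_y / total)
        if (new_cx, new_cy) == (cx, cy):
            break  # converged
        cx, cy = new_cx, new_cy
    return cx, cy


# Made-up 8x8 likelihood map with a 3x3 "hand" blob centred at x=5, y=4.
skin_map = [[0.0] * 8 for _ in range(8)]
for row in (3, 4, 5):
    for col in (4, 5, 6):
        skin_map[row][col] = 1.0

# Start from the previous frame's hand position (hypothetical) at x=3, y=3.
print(mean_shift(skin_map, center=(3, 3), half_size=2))
```

In a video, the converged position from one frame serves as the start position for the next, which is why a hand that moves too far between frames or crosses another skin area can be lost.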
Different Actors – Different Results
Difficult
Easier
Discussion
• What have we gained?
• What do we need?
Investigations 1: Interaction
• Influence of affect on gesture AND speech
  • Activation dimension should be reflected in speech and body movement in a parallel way, i.e. look at speed, effort etc.
Data already usable
Investigations 2: Timing
• Temporal aspects
  • Look at the relative timing of speech and non-verbal signals, e.g. the location of strokes in relation to accented syllables
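As an illustration of such a timing measure, the sketch below computes the signed offset between each gesture stroke and the nearest accented syllable. It assumes both are available as lists of time points in seconds. The function name and the example values are hypothetical and not taken from the corpus.

```python
from typing import List


def nearest_accent_offsets(stroke_times: List[float],
                           accent_times: List[float]) -> List[float]:
    """For each stroke, return (stroke time - nearest accent time) in seconds.

    Negative values mean the stroke precedes the accented syllable.
    """
    if not accent_times:
        return []
    offsets = []
    for stroke in stroke_times:
        nearest = min(accent_times, key=lambda accent: abs(accent - stroke))
        offsets.append(stroke - nearest)
    return offsets


# Made-up time points for one clip.
strokes = [0.50, 1.40]
accents = [0.62, 1.35, 2.10]
print([round(offset, 2) for offset in nearest_accent_offsets(strokes, accents)])
```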
Timing: Traditional Anchor Points in Speech
• Borders of
  • Syllables +++
  • Words +++
  • (Prosodic & Syntactic) Phrases + -
  • Utterances +?
  • Turns --
• Location of
  • Pauses +
  • (Pitch) Accents + --
Temporal Alignment: Traditional Anchor Points in Gestures
• Prepare
• Stroke
• Hold
• Retract
(Graphics by A. Marshall, http://twiki.isi.edu/Public/BML_Specification)
Phases difficult to obtain, but general information on dynamics available
Investigations 3: Bodily Expression of Emotion
• Relevant features (cf. Wallbott 1998):
  • Upper body: away from camera, collapsed
  • Shoulders: up, backward, forward
  • Head: down, back, turned or bent
  • Arms: lateral, stretched out frontal/sideways, crossed
  • Hand form: fist(s), opening/closing
  • Movement qualities: activity, expansiveness, dynamics/energy/power
The data currently focus on hand location. Upper body and posture are not directly assessed. Movement qualities are partially accessible.
What we DO have by Now
What we might WANT to have?
Possible Representations
• H-ANIM/MPEG-4-style
  • joint angles
• Gesticon/MURML/HamNoSys-style
  • wrist position in relation to body
  • encoding of the stroke phase etc.
• Less ambitious
  • Symmetric/Asymmetric
  • Extended/Collapsed
  • Static/Dynamic
  • …
• In any case: need to relate pixel values to anthropometric measures!
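One simple way to move from pixel coordinates towards body-related measures is to express positions relative to the actor's bounding box. The sketch below does this and derives a coarse symmetric/asymmetric label from the two hand positions. The normalisation scheme, the `tolerance` value and the example coordinates are assumptions for illustration only, not a proposal made in the slides.

```python
from typing import Tuple

Point = Tuple[float, float]            # x, y in pixels
Box = Tuple[float, float, float, float]  # left, top, right, bottom in pixels


def normalize_point(point: Point, body_box: Box) -> Point:
    """Map a pixel position to body-relative units (0..1 across the body box)."""
    left, top, right, bottom = body_box
    x, y = point
    return ((x - left) / (right - left), (y - top) / (bottom - top))


def is_symmetric(left_hand: Point, right_hand: Point, body_box: Box,
                 tolerance: float = 0.1) -> bool:
    """True if the hands mirror each other around the vertical body midline."""
    lx, ly = normalize_point(left_hand, body_box)
    rx, ry = normalize_point(right_hand, body_box)
    mirrored_rx = 1.0 - rx  # reflect the right hand across x = 0.5
    return abs(lx - mirrored_rx) <= tolerance and abs(ly - ry) <= tolerance


# Made-up body box and hand positions in pixel coordinates.
body = (100.0, 50.0, 300.0, 450.0)
print(is_symmetric((140.0, 250.0), (260.0, 250.0), body))
print(is_symmetric((140.0, 250.0), (260.0, 130.0), body))
```

Because the units are relative to the body box, the same thresholds can be applied across actors and camera distances, which raw pixel values do not allow.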
• Also consider (manually supported) classification into prototypical classes
Summary
• Numerous possibilities for improvement, e.g. in hand tracking (but is it necessary?)
• Rather concentrate on representations, to ensure the data really are useful and reusable (e.g. for ECAs)
Time to bundle expertise and distribute efforts within
Thanks for your attention