Image based Static Facial Expression Recognition with Multiple Deep Network Learning, Zhiding Yu (Carnegie Mellon University), Cha Zhang (Microsoft Research), Nov 9th, 2015


Page 1:

Image based Static Facial Expression Recognition with Multiple Deep Network Learning

Zhiding Yu, Carnegie Mellon University

Cha Zhang, Microsoft Research

Nov 9th, 2015

Page 2:

Motivation

• Helps computers better understand humans

• Helps computers interact with humans more naturally

• Wide array of practical applications

• Current emotional intelligence is limited (considerable room for improvement)

Example applications: affect-aware personal assistant/companion devices, autism intervention, honest signals, affect-aware game development

Page 3:

Datasets

Page 4:

FER 2013 Dataset

• Web crawled + human labeling

• 48x48 image resolution

• 28709 training samples

• 3589 validation samples

• 3589 testing samples

• Noisy data: Inconsistent face cropping + non-face images + observed labeling errors

Page 5:

EmotiW-SFEW Challenge 2015

• Frames from movies (requires face detection)

• Wild (spontaneous) setting

• Limited training data (958 Train + 436 Val + 372 Test)

• Unbalanced class sizes

Page 6:

Face Detection

Page 7:

The Face Detection Cascade

Input Images → JDA [1] → DCNN [2] → MoT [3]

• JDA [1]: Yes → faces detected by JDA; No → images without detected faces go to DCNN

• DCNN [2]: Yes → faces detected by DCNN; No → images without detected faces go to MoT

• MoT [3]: Yes → faces detected by MoT; No → images not containing faces

[1] D. Chen, S. Ren, Y. Wei, X. Cao and J. Sun, Joint cascade face detection and alignment, ECCV 2014

[2] C. Zhang and Z. Zhang, Improving Multiview Face Detection with Multi-Task Deep Convolutional Neural Networks, WACV 2014

[3] X. Zhu and D. Ramanan, Face detection, pose estimation and landmark localization in the wild, CVPR 2012
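The cascade logic on this slide (try JDA, fall back to DCNN, then to MoT, and otherwise treat the image as containing no face) can be sketched as below. This is only an illustration of the control flow: the detector callables, the box format and the stub detectors are hypothetical stand-ins, not the actual JDA, DCNN or MoT implementations.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]            # hypothetical (x, y, width, height) format
Detector = Callable[[object], Optional[Box]]


def detect_with_cascade(image: object,
                        detectors: Sequence[Tuple[str, Detector]]
                        ) -> Tuple[Optional[str], Optional[Box]]:
    """Try each detector in order and return the first hit.

    Returns (detector_name, box), or (None, None) when every stage fails,
    in which case the image is treated as not containing a face.
    """
    for name, detect in detectors:
        box = detect(image)
        if box is not None:
            return name, box
    return None, None


def make_stub(hits):
    """Hypothetical stand-in for a real detector: looks the image up in a dict."""
    return lambda image: hits.get(image)


# Stage order follows the slide: JDA first, then DCNN, then MoT.
cascade = [
    ("JDA", make_stub({"easy.png": (10, 10, 40, 40)})),
    ("DCNN", make_stub({"profile.png": (5, 8, 42, 42)})),
    ("MoT", make_stub({"occluded.png": (0, 0, 48, 48)})),
]

for name in ["easy.png", "profile.png", "occluded.png", "landscape.png"]:
    print(name, detect_with_cascade(name, cascade))
```

Ordering the stages this way means a later detector only runs on images the earlier ones rejected, which matches the "missed by JDA but found by DCNN" wording used in the detection results.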

Page 8:

Examples of JDA and DCNN Detections

Red: JDA Blue: DCNN

Page 9:

Detection Results on SFEW Test (372 Faces)

Detector          Correct detections
JDA               333
DCNN              358
MoT               352
JDA+DCNN          363
JDA+DCNN+MoT      371

Faces missed by JDA but found by DCNN

Faces missed by JDA+DCNN but found by MoT

False Positive

Page 10:

Recognition System

Page 11:

The Basic CNN Architecture

[Figure: basic CNN architecture. Labels recoverable from the diagram: 48×48 input; 5×5 and 3×3 kernels; feature maps of 24×24×64, 12×12×128 and 6×6×128; three stochastic pooling layers; dense layers with 1024, 1024 and 7 outputs.]
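The slide names stochastic pooling without saying how it is computed. A minimal NumPy sketch, assuming the commonly described formulation (during training, sample one activation per pooling region with probability proportional to its value; at test time, take the probability-weighted average) and assuming non-overlapping regions and non-negative activations. The function name and its parameters are illustrative, not taken from the slides.

```python
import numpy as np


def stochastic_pool(feature_map, size=2, rng=None, train=True):
    """Stochastic pooling over non-overlapping size x size regions.

    feature_map: 2-D array of non-negative activations whose height and
    width are divisible by `size`.
    """
    h, w = feature_map.shape
    # Rearrange to (h/size, w/size, size*size): one row of values per region.
    regions = (feature_map.reshape(h // size, size, w // size, size)
               .transpose(0, 2, 1, 3)
               .reshape(h // size, w // size, size * size))
    totals = regions.sum(axis=-1, keepdims=True)
    # Probabilities proportional to activation; uniform when a region is all zeros.
    safe_totals = np.where(totals > 0, totals, 1.0)
    probs = np.where(totals > 0, regions / safe_totals, 1.0 / (size * size))
    if not train:
        # Test time: probability-weighted average of each region.
        return (probs * regions).sum(axis=-1)
    rng = np.random.default_rng() if rng is None else rng
    # Inverse-CDF sampling of one index per region.
    cdf = probs.cumsum(axis=-1)
    u = rng.random(cdf.shape[:-1] + (1,))
    idx = np.minimum((u >= cdf).sum(axis=-1), size * size - 1)
    return np.take_along_axis(regions, idx[..., None], axis=-1)[..., 0]


demo = np.array([[0, 3, 0, 0],
                 [0, 0, 5, 0],
                 [0, 0, 1, 1],
                 [2, 0, 1, 1]], dtype=float)
print(stochastic_pool(demo, rng=np.random.default_rng(0)))
print(stochastic_pool(demo, train=False))
```

Compared with max pooling, the sampling step acts as a regularizer because weaker activations in a region are occasionally selected.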

Page 12:

Improvement I: Image Perturbation

With image perturbation, we can:

• Augment the data by randomly perturbing training samples (Data Aug)

• Make predictions more robust by voting over perturbed testing samples (Voting)
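The voting idea can be sketched as follows: classify several randomly perturbed copies of a test image and combine the results. The sketch assumes that "voting" means averaging the per-class probabilities, which is one common reading and may differ from the authors' exact rule. `predict_proba` and `perturb` are hypothetical callables supplied by the caller.

```python
import numpy as np


def predict_with_voting(image, predict_proba, perturb, num_votes=10, rng=None):
    """Average class probabilities over `num_votes` perturbed copies of `image`.

    predict_proba(image) -> 1-D array of class probabilities (hypothetical model).
    perturb(image, rng)  -> randomly perturbed copy of the image (hypothetical).
    Returns (predicted_class_index, averaged_probabilities).
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = np.stack([predict_proba(perturb(image, rng)) for _ in range(num_votes)])
    mean_probs = votes.mean(axis=0)
    return int(mean_probs.argmax()), mean_probs
```

The same `perturb` function can be reused at training time for data augmentation, so both uses share one perturbation model.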

Page 13:

Perturbation with Parameterized Warping

Translation + Rotation + Skewing + Scaling

Before warping

After warping
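A parameterized warp combining translation, rotation, skewing and scaling can be written as a single affine map. The sketch below is an assumption-laden illustration: the parameter ranges, the composition order (scale × rotation × skew), the nearest-neighbour sampling and the zero fill outside the image are all choices made here, not values from the slides.

```python
import numpy as np


def random_affine(rng, max_shift=3.0, max_angle=0.17, max_skew=0.1, max_log_scale=0.1):
    """Sample a 2x2 linear map (scale * rotation * skew) and a translation.

    All ranges are illustrative defaults (angle in radians, shift in pixels).
    """
    shift = rng.uniform(-max_shift, max_shift, size=2)
    angle = rng.uniform(-max_angle, max_angle)
    skew = rng.uniform(-max_skew, max_skew)
    scale = np.exp(rng.uniform(-max_log_scale, max_log_scale))
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle), np.cos(angle)]])
    shear = np.array([[1.0, skew], [0.0, 1.0]])
    return scale * (rotation @ shear), shift


def warp(image, linear, shift):
    """Warp a 2-D image about its centre: p_out = linear @ (p_in - c) + c + shift."""
    h, w = image.shape
    centre = np.array([(w - 1) / 2.0, (h - 1) / 2.0])[:, None]
    ys, xs = np.mgrid[0:h, 0:w]
    out_xy = np.stack([xs.ravel(), ys.ravel()]).astype(float)
    # Inverse mapping: find the source location of every output pixel.
    src = np.linalg.inv(linear) @ (out_xy - centre - shift[:, None]) + centre
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)   # pixels mapped from outside stay 0
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)


rng = np.random.default_rng(0)
face = rng.random((48, 48))                    # stand-in for a 48x48 face crop
warped = warp(face, *random_affine(rng))
```

Mapping output pixels back to source pixels (rather than pushing source pixels forward) avoids holes in the warped image.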

Page 14:

CNN Architecture with Data Aug. & Voting

[Figure: the CNN applied to perturbed images. Labels recoverable from the diagram: Perturbed Images; 48×48 input; 5×5 and 3×3 kernels; feature maps of 24×24×64, 12×12×128 and 6×6×128; three stochastic pooling layers; dense layers with 1024, 1024 and 7 outputs; Averaged Weight; 7 outputs.]

Page 15:

Improvement II: Multiple Network Learning (MNL)

The MNL Algorithm Diagram:

[Figure: CNN #1, CNN #2, ..., CNN #K each produce a training response. The responses are weighted by w1, w2, ..., wK and summed (+) into a combined training response, which is compared with the desired training response by the cost function.]

Page 16:

Proposed MNL Cost Functions

Hinge Loss (HL):

Log Likelihood Loss (LL):
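The slide names the two losses without giving their form here. As a rough sketch only, and not necessarily the authors' exact formulation, one way to learn committee weights under a log likelihood loss is to minimize the negative log of the weighted average of the networks' true-class probabilities, keeping the weights non-negative and summing to one. The softmax parameterization, step count and learning rate below are assumptions made for illustration.

```python
import numpy as np


def learn_ensemble_weights(probs, labels, steps=500, lr=0.5):
    """Learn simplex weights for K networks by gradient descent.

    probs:  array of shape (K, N, C), class probabilities from each network.
    labels: integer array of shape (N,), ground-truth class indices.
    Minimizes  -mean_i log( sum_k w_k * p_k(y_i | x_i) )  with w = softmax(theta).
    """
    num_nets, num_samples, _ = probs.shape
    p_true = probs[:, np.arange(num_samples), labels]    # (K, N)
    theta = np.zeros(num_nets)                           # start from uniform weights
    for _ in range(steps):
        w = np.exp(theta - theta.max())
        w /= w.sum()
        mix = w @ p_true + 1e-12                         # combined response, (N,)
        grad_w = -(p_true / mix).mean(axis=1)            # dLoss/dw_k
        theta -= lr * w * (grad_w - w @ grad_w)          # chain rule through softmax
    w = np.exp(theta - theta.max())
    return w / w.sum()
```

A hinge-loss variant would replace the log term with a margin penalty on the combined response; it is omitted here because its exact form is not given in this transcript.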

Page 17:

Experimental Results

Page 18:

FER 2013

• Preprocessing: Hist Eq + Plane Fitting + Unit Norm

• Training: CNN(s) trained on the FER training set

• Validation: Select the optimal training epoch by maximizing the FER validation set accuracy
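The preprocessing chain (Hist Eq + Plane Fitting + Unit Norm) can be sketched in NumPy as below. Several details are assumptions: 8-bit input images, plane fitting read as subtracting a least-squares plane a·x + b·y + c (a simple illumination correction), and "unit norm" read as scaling to unit L2 norm. The function names are illustrative.

```python
import numpy as np


def histogram_equalize(image):
    """Map 8-bit grey levels through the normalized cumulative histogram."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum() / image.size
    return cdf[image]                                    # floats in (0, 1]


def remove_plane(image):
    """Subtract the least-squares plane a*x + b*y + c from the image."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    design = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(design, image.ravel(), rcond=None)
    return image - (design @ coeffs).reshape(h, w)


def unit_norm(image, eps=1e-8):
    """Scale the image so that its L2 norm is (approximately) one."""
    return image / (np.linalg.norm(image) + eps)


def preprocess(image):
    """Hist Eq -> Plane Fitting -> Unit Norm, applied to one uint8 image."""
    return unit_norm(remove_plane(histogram_equalize(image)))
```

Because the fitted plane includes a constant term, the output also has zero mean.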

Page 19:

FER 2013

• Human label consistency against majority voted GT label: 65-68%

• Basic CNN: 65.07%

• CNN + Data Aug: 68.6%

• CNN + Data Aug + Voting: 70.33%

• FER 2013 Winner: 71.162%

• MNL (Log Likelihood Loss): 72.05%

• MNL (Hinge Loss): 72.08%

Page 20:

FER 2013

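[Figure: example FER 2013 faces in two panels, "Correct Prediction (Fear)" and "Wrong Prediction". Labels printed with the faces, row 1: surprise, sad, angry, sad, happy, angry, sad, sad, happy, angry; row 2: happy, neutral, disgust, surprise, sad, happy, angry, surprise, sad, angry.]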

Page 21:

EmotiW-SFEW 2015

• Preprocessing: Hist Eq + Plane Fitting + Unit Norm

• Training:

  • Pre-train on the FER combined set (Train + Val + Test)

  • Freeze the network parameters in the bottom layers (only the last two dense layers are updated)

  • Fine-tune on the SFEW training set (domain adaptation)

• Validation: Select the optimal fine-tune epoch by maximizing the SFEW validation set accuracy
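Two mechanics of this recipe, freezing the bottom layers and picking the fine-tune epoch by validation accuracy, can be sketched in plain Python. The layer names, the scalar "parameters" and the use of plain SGD are hypothetical simplifications for illustration; a real implementation would operate on the network's weight tensors.

```python
def sgd_step(params, grads, trainable, lr=0.01):
    """Return updated parameters, changing only those named in `trainable`."""
    return {name: (value - lr * grads[name] if name in trainable else value)
            for name, value in params.items()}


def best_epoch(val_accuracies):
    """Return the 1-based epoch with the highest validation accuracy.

    Ties are resolved in favour of the earliest epoch.
    """
    return max(range(len(val_accuracies)), key=lambda i: val_accuracies[i]) + 1


# Hypothetical layer names: bottom layers stay frozen, last two dense layers train.
params = {"conv1": 1.0, "conv2": 1.0, "dense1": 1.0, "dense2": 1.0, "dense3": 1.0}
grads = {name: 1.0 for name in params}
updated = sgd_step(params, grads, trainable={"dense2", "dense3"})
print(updated)
print(best_epoch([0.40, 0.52, 0.55, 0.55, 0.51]))
```

With only 958 SFEW training images, restricting updates to the top layers limits how far fine-tuning can drift from the features learned on FER.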

Page 22:

EmotiW-SFEW 2015

[Figure: accuracy numbers on the Validation and Test sets (accuracy axis from 0.46 to 0.62; baselines: 35.96%/39.13%) for Single, Average1, Average2, SVM, LogLike and HingeLoss.]

[Figure: learned network ensemble weights.]

Page 23:

EmotiW-SFEW 2015

Log Likelihood

Hinge Loss

Page 24:

Conclusions

• CNNs are arguably the most powerful tool so far for emotion recognition tasks

• Fine-tuning plays the role of domain adaptation

• Image perturbation and voting-based prediction both contribute significantly to performance

• A weighted committee of multiple networks can further improve the classification performance

Page 25:

Thank You!