modeling visual search in a thousand sceneskehinger.com/pdf/vss2009_ehinger_slides.pdf ·...

51
Modeling visual search in a thousand scenes: The roles of saliency, target features, and scene context Krista Ehinger, Barbara HidalgoSotelo, Antonio Torralba, & Aude Oliva VSS 2009, 5/10/09 1 Modeling Visual Search in 1000 Scenes

Upload: others

Post on 07-Apr-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Modeling visual search in a thousand scenes:The roles of saliency, target features,and scene context

Krista Ehinger, Barbara Hidalgo‐Sotelo, Antonio Torralba, & Aude Oliva

VSS 2009, 5/10/09 1Modeling Visual Search in 1000 Scenes

Page 2: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 2

Page 3: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

• 14 participants, 912 images = 45,144 fixations• Person present/absent?

Overview of experiment

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 3Fixations 1-3 on target-absent scenes

Page 4: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 4

Local and globalimage features

Combination ofsources of guidance

Scene context

Target features

Saliency

Page 5: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

• 14 participants, 912 images = 45,144 fixations

Overview of experiment

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 5

Page 6: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Human Agreement

• Inter‐observer agreement = upper bound for model performance

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 6

Page 7: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Human Agreement

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 7

• Inter‐observer agreement = upper bound for model performance

• Cross‐image control = lower bound for model performance

Page 8: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 9: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 10: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 11: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 12: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 13: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 14: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 15: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 16: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

Page 17: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

The ROC curve

Cog Lunch 10/21/2008

Model

Selected image regionsROC curve

AUC

Page 18: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Human Agreement

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 18False alarm rate

Fixa

tion

dete

ctio

n ra

teHuman AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Page 19: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Human agreement examples

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 19

High inter-observer agreement

Low inter-observer agreement

Page 20: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 20

Local and globalimage features

Combination ofsources of guidance

Target features

Scene context

Saliency

Page 21: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 21

Guidance by Saliency

Page 22: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Saliency Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 22False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Saliency ModelAUC = 0.77

Page 23: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Saliency Model: Examples

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 23

Best performance

AUC = 0.94

Worst performance

AUC = 0.36

Page 24: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 24

Local and globalimage features

Combination ofsources of guidance

Scene context

Target featuresTarget features

SaliencySaliency

Page 25: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Pedestrian Detector• Histograms of Oriented Gradients 

(HOG) detector by Dalal & Triggs

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 25

Dalal & Triggs, 2005 CVPR

Positivefeatures

Negativefeatures

Average gradient

Page 26: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 26

Guidance by Target Features

Page 27: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Target Features Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 27False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Target Features ModelAUC = 0.78

Page 28: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Target Features Model: Examples

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 28

AUC = 0.95

Best performance

AUC = 0.50

Worst performance

Page 29: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 29

Local and globalimage features

Combination ofsources of guidance

Scene context

Target featuresTarget features

SaliencySaliency

Scene context

Target features

Saliency

Page 30: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 30

What is the context region for pedestrians?

Page 31: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Training image(contains a pedestrian)

Scene Context Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 31

Oliva & Torralba, 2001

Orientations at variousspatial scales

Scene “gist” + positionof pedestrian

Page 32: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 32

Guidance by Scene Context

Page 33: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Scene Context Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 33False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Scene Context ModelAUC = 0.85

Page 34: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Scene Context Model: Examples

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 34

Best performance

AUC = 0.95

Worst performance

AUC = 0.27

Page 35: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 35

Local and globalimage features

Combination ofsources of guidance

Scene context

Target features

Saliency

Page 36: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 36

Combined Sources of Guidance

Page 37: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Combined Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 37False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Combined ModelAUC = 0.88

Page 38: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Combined Model: Examples

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 38

Best performance

AUC = 0.94

Worst performance

AUC = 0.36

Page 39: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Target Absent vs. Target Present

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 39

False alarm rate

Fixa

tion

dete

ctio

n ra

te

Target Absent Scenes

False alarm rate

Target Present Scenes

Page 40: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Summary of results

• Combined model accounts for 94% of human agreement in search fixations

• Scene context gives the best prediction of human search fixations in this task

• How to get that last 6%?

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 40

Page 41: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 41

Local and globalimage features

Combination ofsources of guidance

Scene context

Target features

Saliency

Context “oracle”

Page 42: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

“Context Oracle” Implementation

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 42

Context Oracle, AUC = 0.90

Context Model, AUC = 0.67

Page 43: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Context Oracle

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 43False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Combined ModelAUC = 0.88Context Oracle

AUC = 0.88

Page 44: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Overview of Model

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 44

Local and globalimage features

Combination ofsources of guidance

Scene context

Target features

Saliency

Context “oracle”

Page 45: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Combined Model with Oracle

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 45False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Combined Model(with oracle)AUC = 0.89

Combined Model(computational)AUC = 0.88

Page 46: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Summary of results

• Combined model accounts for 94% of human agreement in search fixations

• Context predicts human fixations better than saliency or target features in this search task

• How to get that last 6%?– Context “oracle”?

• Improves performance to 95% of human agreement

– Something else?

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 46

Page 47: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

What’s next?

VSS 2009, 5/10/09 Modeling Visual Search in 1000 Scenes 47False alarm rate

Fixa

tion

dete

ctio

n ra

te

Human AgreementAUC = 0.93

Cross-Image ControlAUC = 0.68

Combined ModelAUC = 0.88

Page 48: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target
Page 49: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target
Page 50: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target
Page 51: Modeling visual search in a thousand sceneskehinger.com/pdf/VSS2009_Ehinger_slides.pdf · 2012-04-10 · Modeling visual search in a thousand scenes: The roles of saliency, target

Acknowledgements

Ehinger, K. A., Hidalgo‐Sotelo, B., Torralba, A. & Oliva, A. Modeling Search for People in 900 Scenes: A combined source model of eye guidance. Visual Cognition, in press.

Barbara Hildalgo‐Sotelo Aude OlivaAntonio Torralba

VSS 2009, 5/10/09 51Modeling Visual Search in 1000 Scenes

Funded by a Singleton graduate research fellowship to KE, an NSF graduate research fellowship to BHS, and NSF CAREER awards to AT and AO.