


Quantifying Context Effects on Image Memorability
Zoya Bylinskii, Phillip Isola, Antonio Torralba, Aude Oliva
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology

FIGRIM Dataset (Fine-Grained Image Memorability): 21 scene categories, more than 300 instances per category

[Figure: example images from five FIGRIM categories (mountain, pasture, amusement park, airport terminal, cockpit), with category hit rates of 64.2%, 59.2%, 50.2%, 55.6%, and 49.5%. For each category, the highest-memorability images are shown alongside the lowest-memorability images, each labeled with its individual hit rate.]

Crowd-sourced (AMT) memory (recognition) games

[Figure: memory game timeline. Images are interleaved with fixation crosses (1.0 s and 1.4 s intervals); trials are scored as hits, false alarms, or correct rejections, depending on whether a repeated or a new image was flagged.]
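To make the scoring concrete, here is a minimal sketch of how hits, misses, false alarms, and correct rejections turn into the hit rate (HR) reported throughout the poster. The trial format (a list of (is_repeat, responded) pairs) is an assumption for illustration, not the dataset's actual file format.

```python
# Minimal sketch: scoring one memory-game session.
# Each trial is (is_repeat, responded): is_repeat is True if the image was
# shown earlier in the sequence, responded is True if the participant pressed
# the key. This trial encoding is assumed for illustration only.

def score_session(trials):
    hits = sum(1 for is_repeat, responded in trials if is_repeat and responded)
    misses = sum(1 for is_repeat, responded in trials if is_repeat and not responded)
    false_alarms = sum(1 for is_repeat, responded in trials if not is_repeat and responded)
    correct_rejections = sum(1 for is_repeat, responded in trials
                             if not is_repeat and not responded)

    # HR (hit rate): ratio of hits to the total of hits and misses
    hr = hits / (hits + misses) if (hits + misses) else float("nan")
    # False-alarm rate, computed analogously over non-repeat trials
    far = (false_alarms / (false_alarms + correct_rejections)
           if (false_alarms + correct_rejections) else float("nan"))
    return hr, far

# Example: 2 hits, 1 miss, 1 false alarm, 2 correct rejections -> HR = 0.67
print(score_session([(True, True), (True, True), (True, False),
                     (False, True), (False, False), (False, False)]))
```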

Extrinsic Effects of Image Context

Contextually distinct images are more memorable
Pearson r(D1, HR1) = 0.26
Pearson r(D2, HR2) = 0.24
Pearson r(ΔD, ΔHR) = 0.35

More varied image contexts are more memorable overall
Pearson r(H, HR) = 0.53
Pearson r(ΔH, ΔHR) = 0.74

Extrinsic Effects of Individual Observer

$$P_c(f_i) = \frac{1}{\|C\|} \sum_{j \in C} K(f_i - f_j)$$

Intrinsic Effects on Memorability


Memorable within categories

Memorable across categories

We train a classifier to predict whether a set of eye movements will lead to a successful encoding

$$H(C) = \mathbb{E}_c[-\log P_c(f_i)]$$

Memorability rank of images is consistent across participants and experiments: Spearman r = 0.69–0.86 (for each of the 21 categories)

Paper: Bylinskii, Z., Isola, P., Bainbridge, C., Torralba, A., Oliva, A. "Intrinsic and Extrinsic Effects on Image Memorability", Vision Research 2015.

Dataset: http://figrim.mit.edu

Information-theoretic model for context

Memorability rank of categories is stable: Spearman r = 0.68 (across splits of images)

context: the set of images from which the experimental sequence is sampled

Modeling is done with CNN (convolutional neural network) features, which encode image semantics.
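A minimal sketch of this information-theoretic context model, under stated assumptions: the CNN feature vectors are given as rows of a NumPy array, an isotropic Gaussian kernel with a hand-picked bandwidth stands in for K, and the leave-one-out handling is an illustrative choice (the poster specifies only the kernel density estimate, the distinctiveness D(I;C), and the entropy H(C)).

```python
import numpy as np

def log_kde(f_i, context_feats, bandwidth=1.0):
    """log P_c(f_i): kernel density estimate of feature f_i under context C,
    using an (assumed) isotropic Gaussian kernel over CNN features."""
    diffs = context_feats - f_i                       # shape (|C|, d)
    sq_dists = np.sum(diffs ** 2, axis=1)
    d = context_feats.shape[1]
    log_k = (-0.5 * sq_dists / bandwidth**2
             - 0.5 * d * np.log(2 * np.pi * bandwidth**2))
    # log of the mean kernel response over the context set
    return np.logaddexp.reduce(log_k) - np.log(len(context_feats))

def distinctiveness(f_i, context_feats, bandwidth=1.0):
    """D(I; C) = -log P_c(f_i): how unlike its context an image is."""
    return -log_kde(f_i, context_feats, bandwidth)

def context_entropy(context_feats, bandwidth=1.0):
    """H(C) = E_c[-log P_c(f_i)], estimated here by averaging leave-one-out
    distinctiveness over the images in the context."""
    scores = []
    for i in range(len(context_feats)):
        rest = np.delete(context_feats, i, axis=0)
        scores.append(distinctiveness(context_feats[i], rest, bandwidth))
    return float(np.mean(scores))

# Usage sketch: correlate per-image distinctiveness with per-image hit rates.
# feats: (N, d) CNN features for one context; hrs: (N,) measured hit rates.
# feats, hrs = load_figrim_category(...)   # hypothetical loader
# D = [distinctiveness(feats[i], np.delete(feats, i, axis=0)) for i in range(len(feats))]
# print(np.corrcoef(D, hrs)[0, 1])         # Pearson r(D, HR)
```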

Pairwise correlations between experiments: r > 0.60

Images that are more likely to look like images from other categories are more likely to be memorable across different contexts.

Pipeline: coarse binning and smoothing of fixation positions → exemplar classifier trained per image → test-time classification (sketched below)

HR (hit rate): ratio of hits to total of hits and misses

* we only use fixation position for prediction
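The following is a minimal sketch of that fixation-based pipeline: bin fixation positions into a coarse grid, smooth, and train a classifier to predict whether the viewing led to a successful encoding. The grid size, the smoothing sigma, scikit-learn's LinearSVC, and the use of a single classifier over all viewings (in place of the poster's per-image exemplar classifiers) are all illustrative assumptions; the poster specifies only the pipeline stages and that fixation position is the sole input.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.svm import LinearSVC

def fixation_feature(fixations, img_size=(768, 1024), grid=(12, 16), sigma=1.0):
    """Bin fixation (x, y) positions into a coarse grid and smooth it.
    Grid size and smoothing sigma are illustrative choices."""
    h, w = img_size
    gh, gw = grid
    fmap = np.zeros(grid)
    for x, y in fixations:
        r = min(int(y / h * gh), gh - 1)
        c = min(int(x / w * gw), gw - 1)
        fmap[r, c] += 1
    fmap = gaussian_filter(fmap, sigma=sigma)
    total = fmap.sum()
    return (fmap / total if total else fmap).ravel()

# X: one feature vector per viewing; y: 1 if that viewing led to a later hit, else 0.
# fixation_sets and encoded_labels are hypothetical inputs.
# X = np.stack([fixation_feature(f) for f in fixation_sets])
# y = np.array(encoded_labels)
# clf = LinearSVC().fit(X, y)          # train
# print(clf.score(X_test, y_test))     # test-time classification accuracy
```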

Prediction accuracies: 60.09% and 66.02%

Eye movements are predictive of whether an image will be remembered later

[Scatter plot: $HR_{c_2}(I) - HR_{c_1}(I)$ versus $D(I;C_2) - D(I;C_1)$.]

[Scatter plot: average HR (%) versus context entropy $H(C)$ for the 21 scene categories (airport, amusement park, badlands, bathroom, bedroom, bridge, castle, cockpit, conference room, dining room, golf course, highway, house, kitchen, lighthouse, living room, mountain, pasture, playground, skyscraper, tower); Pearson r = 0.53 (p = 0.01).]

[Figure: images sorted by expected classifier confidence.]

* figures include scores for the within-category experiment

$$D(I;C) = -\log P_c(f_i)$$