Poster transcript: Quantifying Context Effects on Image Memorability (figrim.mit.edu/figrimposter.pdf)
[Figure: memory game timeline. Images are shown for 1.0 s each, separated by 1.4 s fixation crosses (+); responses on repeats are scored as hits or misses, and responses on first showings as false alarms or correct rejections.]
[Figure: per-category examples of the highest-memorability and lowest-memorability images with their individual hit rates. Category-level hit rates as laid out on the poster: mountain (HR: 64.2%), pasture (HR: 59.2%), amusement park (HR: 50.2%), airport terminal (HR: 55.6%), cockpit (HR: 49.5%).]
Quantifying Context Effects on Image Memorability
Zoya Bylinskii, Phillip Isola, Antonio Torralba, Aude Oliva
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Crowd-sourced (AMT) memory (recognition) games
FIGRIM Dataset (Fine-Grained Image Memorability): 21 scene categories, more than 300 instances/category
Extrinsic Effects of Image Context
Contextually distinct images are more memorable: Pearson r(D1, HR1) = 0.26, r(D2, HR2) = 0.24, r(ΔD, ΔHR) = 0.35
More varied image contexts are more memorable overall: Pearson r(H, HR) = 0.53, r(ΔH, ΔHR) = 0.74
Extrinsic Effects of Individual Observer
$P_c(f_i) = \frac{1}{\|C\|} \sum_{j \in C} K(f_i - f_j)$
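A minimal sketch of this estimate, assuming an isotropic Gaussian kernel for K and CNN feature vectors as numpy arrays (the kernel choice and bandwidth are illustrative, not the poster's settings); it also evaluates the distinctiveness D(I; C) = -log P_c(f_i) defined later on the poster:

```python
import numpy as np

def context_density(f_i, context_feats, bandwidth=1.0):
    """Kernel density estimate P_c(f_i) of image features under context C.

    f_i           -- CNN feature vector of image I, shape (d,)
    context_feats -- stacked features of the context images, shape (|C|, d)
    bandwidth     -- width of the Gaussian kernel K (illustrative choice)
    """
    sq_dists = np.sum((context_feats - f_i) ** 2, axis=1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))   # K(f_i - f_j)
    return k.mean()                                   # (1/|C|) sum over j in C

def distinctiveness(f_i, context_feats, bandwidth=1.0):
    """D(I; C) = -log P_c(f_i): lower context density => more distinct."""
    return -np.log(context_density(f_i, context_feats, bandwidth) + 1e-12)
```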
Intrinsic Effects on Memorability
Memorable within categories
Memorable across categories
We train a classifier to predict whether a set of eye movements will lead to a successful encoding
$H(C) = \mathbb{E}_c[-\log P_c(f_i)]$
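Continuing the sketch above (reusing the hypothetical `context_density`/`distinctiveness` helpers), the context entropy can be estimated as the leave-one-out average of -log P_c over the context's own images:

```python
import numpy as np

def context_entropy(context_feats, bandwidth=1.0):
    """H(C) = E_c[-log P_c(f_i)], estimated leave-one-out over C."""
    nll = []
    for i in range(len(context_feats)):
        rest = np.delete(context_feats, i, axis=0)  # C without image i
        nll.append(distinctiveness(context_feats[i], rest, bandwidth))
    return float(np.mean(nll))
```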
Memorability rank of images is consistent across participants and experiments: Spearman r = 0.69-0.86 (for each of 21 categories)
Paper: Bylinskii, Z., Isola, P., Bainbridge, C., Torralba, A., & Oliva, A. "Intrinsic and Extrinsic Effects on Image Memorability", Vision Research, 2015.
Dataset: http://figrim.mit.edu
Information-theoretic model for context
Memorability rank of categories is stable: Spearman r = 0.68 (across splits of images)
context: the set of images from which the experimental sequence is sampled
Modeling is done with CNN (convolutional neural network) features, which encode image semantics.
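For illustration, such features could be extracted along these lines; the ResNet-18 backbone is an assumption for the sketch (the poster does not name the network), with the classifier head replaced by an identity so the forward pass returns the penultimate-layer embedding:

```python
import torch
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()   # keep the 512-d penultimate features
backbone.eval()
preprocess = weights.transforms()   # matching resize/crop/normalization

def cnn_features(pil_image):
    """Map a PIL image to a semantic feature vector f_i."""
    with torch.no_grad():
        x = preprocess(pil_image).unsqueeze(0)   # (1, 3, H, W)
        return backbone(x).squeeze(0).numpy()    # shape (512,)
```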
Pairwise correlations between experiments: r > 0.60
Images that are more likely to look like images from other categories are more likely to be memorable across different contexts.
Pipeline: coarse binning and smoothing → exemplar classifier trained per image → test-time classification (see the sketch below)
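One way this pipeline might look in code (assumptions: fixations given as normalized (x, y) pairs, a coarse 12x16 grid, Gaussian smoothing, and logistic regression standing in for the exemplar classifier; all of these are illustrative choices, not the poster's exact settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.linear_model import LogisticRegression

def fixation_map(fixations, grid=(12, 16), sigma=1.0):
    """Coarsely bin normalized (x, y) fixation positions, then smooth."""
    h, w = grid
    m = np.zeros(grid)
    for x, y in fixations:
        m[min(int(y * h), h - 1), min(int(x * w), w - 1)] += 1
    return gaussian_filter(m, sigma).ravel()

def train_exemplar(fixation_lists, remembered):
    """Per-image classifier: do these eye movements predict a later hit?"""
    X = np.stack([fixation_map(f) for f in fixation_lists])
    return LogisticRegression(max_iter=1000).fit(X, remembered)

# Test-time classification on a new viewer's fixations:
#   clf.predict_proba(fixation_map(fixs)[None])[:, 1]
```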
HR (hit rate): ratio of hits to total of hits and misses
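As a trivial sketch of the scoring (the trial format is an assumption: one (is_repeat, responded) pair per presentation):

```python
def hit_rate(trials):
    """HR = hits / (hits + misses), computed over repeat presentations only."""
    hits = sum(1 for is_repeat, responded in trials if is_repeat and responded)
    misses = sum(1 for is_repeat, responded in trials if is_repeat and not responded)
    return hits / (hits + misses)
```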
* we only use fixation position for prediction
Eye movements are predictive of whether an image will be remembered later (classification accuracies: 60.09%, 66.02%).
$\Delta D = D(I; C_2) - D(I; C_1)$, $\Delta HR = HR_{c_2}(I) - HR_{c_1}(I)$
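Given per-image values aligned across the two contexts, the reported correlation r(ΔD, ΔHR) could be computed as below (the array names are assumptions; `pearsonr` is scipy's):

```python
import numpy as np
from scipy.stats import pearsonr

def delta_correlation(D1, D2, HR1, HR2):
    """Pearson r between dD = D(I;C2) - D(I;C1) and dHR = HR_c2(I) - HR_c1(I)."""
    dD = np.asarray(D2) - np.asarray(D1)
    dHR = np.asarray(HR2) - np.asarray(HR1)
    return pearsonr(dD, dHR)   # (r, p-value)
```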
[Figure: scatter of context entropy H(C) against average HR (%) for the 21 scene categories: airport, amusement park, badlands, bathroom, bedroom, bridge, castle, cockpit, conference room, dining room, golf course, highway, house, kitchen, lighthouse, living room, mountain, pasture, playground, skyscraper, tower. Pearson r = 0.53 (p = 0.01).]
[Figure: images sorted by expected classifier confidence.]
* figures include scores for within-category experiment
$D(I; C) = -\log P_c(f_i)$