Poster transcript: Quantifying Context Effects on Image Memorability (figrim.mit.edu/figrimposter.pdf)
[Figure: memory game timeline. Images are shown for 1.0 s each, separated by 1.4 s fixation crosses (+); responses on repeats are scored as hits or misses, and responses on first showings as false alarms or correct rejections.]
[Figure: per-category examples of the highest-memorability and lowest-memorability images with their individual hit rates. Category-level hit rates as laid out on the poster: mountain (HR: 64.2%), pasture (HR: 59.2%), amusement park (HR: 50.2%), airport terminal (HR: 55.6%), cockpit (HR: 49.5%).]
Quantifying Context Effects on Image Memorability
Zoya Bylinskii, Phillip Isola, Antonio Torralba, Aude Oliva
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology
Crowd-sourced (AMT) memory (recognition) games
FIGRIM Dataset (Fine-Grained Image Memorability): 21 scene categories, more than 300 instances/category
Extrinsic Effects of Image Context
Contextually distinct images are more memorable: Pearson r(D1, HR1) = 0.26, r(D2, HR2) = 0.24, r(ΔD, ΔHR) = 0.35
More varied image contexts are more memorable overall: Pearson r(H, HR) = 0.53, r(ΔH, ΔHR) = 0.74
Extrinsic Effects of Individual Observer
$P_c(f_i) = \frac{1}{\|C\|} \sum_{j \in C} K(f_i - f_j)$
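A minimal sketch of this estimate, assuming an isotropic Gaussian kernel for K and CNN feature vectors as numpy arrays (the kernel choice and bandwidth are illustrative, not the poster's settings); it also evaluates the distinctiveness D(I; C) = -log P_c(f_i) defined later on the poster:

```python
import numpy as np

def context_density(f_i, context_feats, bandwidth=1.0):
    """Kernel density estimate P_c(f_i) of image features under context C.

    f_i           -- CNN feature vector of image I, shape (d,)
    context_feats -- stacked features of the context images, shape (|C|, d)
    bandwidth     -- width of the Gaussian kernel K (illustrative choice)
    """
    sq_dists = np.sum((context_feats - f_i) ** 2, axis=1)
    k = np.exp(-sq_dists / (2.0 * bandwidth ** 2))   # K(f_i - f_j)
    return k.mean()                                   # (1/|C|) sum over j in C

def distinctiveness(f_i, context_feats, bandwidth=1.0):
    """D(I; C) = -log P_c(f_i): lower context density => more distinct."""
    return -np.log(context_density(f_i, context_feats, bandwidth) + 1e-12)
```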
Intrinsic Effects on Memorability
Memorable within categories
Memorable across categories
We train a classifier to predict whether a set of eye movements will lead to a successful encoding
$H(C) = \mathbb{E}_c[-\log P_c(f_i)]$
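Continuing the sketch above (reusing the hypothetical `context_density`/`distinctiveness` helpers), the context entropy can be estimated as the leave-one-out average of -log P_c over the context's own images:

```python
import numpy as np

def context_entropy(context_feats, bandwidth=1.0):
    """H(C) = E_c[-log P_c(f_i)], estimated leave-one-out over C."""
    nll = []
    for i in range(len(context_feats)):
        rest = np.delete(context_feats, i, axis=0)  # C without image i
        nll.append(distinctiveness(context_feats[i], rest, bandwidth))
    return float(np.mean(nll))
```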
Memorability rank of images is consistent across participants and experiments: Spearman r = 0.69-0.86 (for each of 21 categories)
Paper: Bylinskii, Z., Isola, P., Bainbridge, C., Torralba, A., & Oliva, A. "Intrinsic and Extrinsic Effects on Image Memorability", Vision Research, 2015.
Dataset: http://figrim.mit.edu
Information-theoretic model for context
Memorability rank of categories is stable: Spearman r = 0.68 (across splits of images)
context: the set of images from which the experimental sequence is sampled
Modeling is done with CNN (convolutional neural network) features, which encode image semantics.
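For illustration, such features could be extracted along these lines; the ResNet-18 backbone is an assumption for the sketch (the poster does not name the network), with the classifier head replaced by an identity so the forward pass returns the penultimate-layer embedding:

```python
import torch
from torchvision import models

weights = models.ResNet18_Weights.DEFAULT
backbone = models.resnet18(weights=weights)
backbone.fc = torch.nn.Identity()   # keep the 512-d penultimate features
backbone.eval()
preprocess = weights.transforms()   # matching resize/crop/normalization

def cnn_features(pil_image):
    """Map a PIL image to a semantic feature vector f_i."""
    with torch.no_grad():
        x = preprocess(pil_image).unsqueeze(0)   # (1, 3, H, W)
        return backbone(x).squeeze(0).numpy()    # shape (512,)
```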
Pairwise correlations between experiments: r > 0.60
Images that are more likely to look like images from other categories are more likely to be memorable across different contexts.
Pipeline: coarse binning and smoothing → exemplar classifier trained per image → test-time classification (see the sketch below)
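One way this pipeline might look in code (assumptions: fixations given as normalized (x, y) pairs, a coarse 12x16 grid, Gaussian smoothing, and logistic regression standing in for the exemplar classifier; all of these are illustrative choices, not the poster's exact settings):

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.linear_model import LogisticRegression

def fixation_map(fixations, grid=(12, 16), sigma=1.0):
    """Coarsely bin normalized (x, y) fixation positions, then smooth."""
    h, w = grid
    m = np.zeros(grid)
    for x, y in fixations:
        m[min(int(y * h), h - 1), min(int(x * w), w - 1)] += 1
    return gaussian_filter(m, sigma).ravel()

def train_exemplar(fixation_lists, remembered):
    """Per-image classifier: do these eye movements predict a later hit?"""
    X = np.stack([fixation_map(f) for f in fixation_lists])
    return LogisticRegression(max_iter=1000).fit(X, remembered)

# Test-time classification on a new viewer's fixations:
#   clf.predict_proba(fixation_map(fixs)[None])[:, 1]
```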
HR (hit rate): ratio of hits to total of hits and misses
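As a trivial sketch of the scoring (the trial format is an assumption: one (is_repeat, responded) pair per presentation):

```python
def hit_rate(trials):
    """HR = hits / (hits + misses), computed over repeat presentations only."""
    hits = sum(1 for is_repeat, responded in trials if is_repeat and responded)
    misses = sum(1 for is_repeat, responded in trials if is_repeat and not responded)
    return hits / (hits + misses)
```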
* we only use fixation position for prediction
Eye movements are predictive of whether an image will be remembered later (classification accuracies: 60.09%, 66.02%).
$\Delta D = D(I; C_2) - D(I; C_1)$, $\Delta HR = HR_{c_2}(I) - HR_{c_1}(I)$
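Given per-image values aligned across the two contexts, the reported correlation r(ΔD, ΔHR) could be computed as below (the array names are assumptions; `pearsonr` is scipy's):

```python
import numpy as np
from scipy.stats import pearsonr

def delta_correlation(D1, D2, HR1, HR2):
    """Pearson r between dD = D(I;C2) - D(I;C1) and dHR = HR_c2(I) - HR_c1(I)."""
    dD = np.asarray(D2) - np.asarray(D1)
    dHR = np.asarray(HR2) - np.asarray(HR1)
    return pearsonr(dD, dHR)   # (r, p-value)
```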
[Figure: scatter of context entropy H(C) against average HR (%) for the 21 scene categories: airport, amusement park, badlands, bathroom, bedroom, bridge, castle, cockpit, conference room, dining room, golf course, highway, house, kitchen, lighthouse, living room, mountain, pasture, playground, skyscraper, tower. Pearson r = 0.53 (p = 0.01).]
[Figure: images sorted by expected classifier confidence.]
* figures include scores for within-category experiment
$D(I; C) = -\log P_c(f_i)$