

Uncovering Scene Context for Predicting Privacy of Online Shared Images
Ashwini Tonge¹, Cornelia Caragea¹ and Anna Squicciarini²

¹Kansas State University, ²Pennsylvania State University

IMAGE PRIVACY PREDICTION?

- The rapid growth of social media can pose a threat to users' privacy.
- Many users are quick to share private images without realizing the consequences of an unwanted disclosure of these images.
- Users rarely change default privacy settings, which could jeopardize their privacy [Zerr et al., 2012].
- Current social networking sites do not assist users in making privacy decisions for images that they share online.
- An image privacy prediction system predicts the privacy setting for images and helps avoid a possible loss of users' privacy.

PRIOR WORK

- Recently, Squicciarini et al. [Squicciarini et al., 2014] and Zerr et al. [Zerr et al., 2012] found that user tags are informative for classifying images as private or public.
- However, since user tags are at the sole discretion of users, they typically tend to be noisy and incomplete.
- Tonge and Caragea [Tonge and Caragea, 2016] automatically derived object tags from images' content using Convolutional Neural Networks (CNNs) and showed that the combination of object tags and user tags outperforms each set of tags individually.

MOTIVATION

Public             Objects   Private
Furniture Store    Bed       Home
Fashion Event      Women     Birthday Party
Concerts           People    Religious Ceremony

The same object can appear in both public and private images depending on the scene context, e.g., a bed in a furniture store vs. a bed in someone's home.

OUR CONTRIBUTIONS

- We propose the extraction of scene-centric tags to capture additional information from the visual content that is not captured by existing object-centric tags.
- We show that models trained on scene tags can learn privacy characteristics and make appropriate predictions.
- We explore the combination of user tags with object, scene, and object-scene tags for privacy prediction.
- Our results show that the combination of all three types of tags (object, scene, and user) yields better performance than user tags alone and than the combination of user tags with scene or object tags.

DATASETS

- We evaluate the proposed features on a subset of Flickr images sampled from the PicAlert dataset [Zerr et al., 2012].
- PicAlert consists of Flickr images on various subjects, manually labeled as public or private by external viewers.
- We randomly select 32,000 images from PicAlert, of which 10,000 are used as the train set and 22,000 as the test set (see the sketch below).
- Public and private images appear in a ratio of 3:1.

Private: A private image discloses sensitive information about a user, e.g., images with self-portraits, family, friends, someone's home, etc.

Public: Public images generally depict scenery, objects, animals, etc., which do not provide any personal information about a user.
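A minimal sketch of the split described above, assuming the PicAlert image identifiers are already loaded; the seed and the loading step are hypothetical details, since the poster only states the set sizes.

```python
import random

def split_picalert(image_ids, seed=42):
    """Randomly pick 32,000 PicAlert images and split them 10k train / 22k test."""
    rng = random.Random(seed)            # seed is an assumption, not from the poster
    sample = rng.sample(image_ids, 32000)
    return sample[:10000], sample[10000:]  # train set, test set
```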

IMPORTANT LINKS

https://goo.gl/bPruf6

IMAGE PRIVACY PREDICTION USING TAG FEATURES

[Figure: pipeline diagram. The input image passes through the convolution, pooling, and fully-connected layers (fc6, fc7, fc8) of a CNN; a softmax over the output layer yields a probability for each tag (e.g., Maillot, Kimono, Athletic field, Two-piece, Tank suit, Arena performance), which fills a bag-of-words vector x. Classification is performed with the SVM decision function w_SVM^T x + b.]

Figure: Image privacy prediction using tag features. 1. CNNs are used to extract object and scene tags (shown in green italics in the original figure) for input images. 2. The bag-of-words vectors for the tag features are used to predict the class of an image as public or private using a Support Vector Machine (SVM) classifier.
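A minimal sketch of the classification step in the figure above: a bag-of-words tag vector x is scored with a linear SVM decision function w^T x + b. The tag vocabulary and probabilities are toy values, and scikit-learn's LinearSVC stands in for the maximum-margin classifier described on the poster.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy tag vocabulary; on the poster this would cover all object/scene tags.
VOCAB = ["maillot", "kimono", "athletic field", "two-piece", "tank suit"]

def to_bow(tag_probs):
    """Map a {tag: probability} dict to a fixed-length bag-of-words vector."""
    x = np.zeros(len(VOCAB))
    for i, tag in enumerate(VOCAB):
        x[i] = tag_probs.get(tag, 0.0)
    return x

# One bag-of-words row per image; labels: 1 = private, 0 = public (toy data).
X_train = np.array([to_bow({"maillot": 0.15, "two-piece": 0.32}),
                    to_bow({"athletic field": 0.23, "kimono": 0.06})])
y_train = np.array([1, 0])

clf = LinearSVC().fit(X_train, y_train)     # learns w and b
score = clf.decision_function(X_train[:1])  # w^T x + b for the first image
print("private" if score[0] > 0 else "public")
```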

OBSERVATIONS

WHAT TYPE OF PRIVACY-AWARE INFORMATION DO USER TAGS CONTAIN?

Rank    Tags
1-5     people portrait outdoor girl woman
6-10    landscape smile architecture plant building
11-15   nature boy sky hair wedding
16-20   family bride bird animal flower
21-25   eyes child happy snow indoor
26-30   cloud party illustration drawing winter

Table: Tags having high information gain.
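A ranking like the one above can be produced by scoring each user tag's information gain with respect to the public/private label, i.e., the mutual information between the binary tag indicator and the class. A minimal sketch, assuming a binary tag-occurrence matrix; the toy data below is illustrative, not the PicAlert statistics.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

tags = ["people", "portrait", "landscape", "sky"]
# Rows = images, columns = binary user-tag indicators (toy data).
X = np.array([[1, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])
y = np.array([1, 1, 0, 0])  # 1 = private, 0 = public

ig = mutual_info_classif(X, y, discrete_features=True, random_state=0)
for tag, score in sorted(zip(tags, ig), key=lambda t: -t[1]):
    print(f"{tag}: {score:.3f}")  # tags with the highest information gain first
```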

PROPOSED FEATURES: PRIVACY PREDICTION

- Feature Extraction
  We believe that scene tags can contribute, along with object tags, to learning the privacy characteristics of a given image, as they can provide clues into what the image owners intended to show through the photo. Therefore, we employ two types of semantic features for privacy prediction (a sketch of the tag extraction follows this list):
  - Object-centric Tags
    - We use a CNN pre-trained on a large-scale object dataset (ImageNet) [Russakovsky et al., 2015] to capture the objects depicted in the image.
    - We take the probability distribution over 1,000 object categories for the input image, obtained by applying the softmax function over the last fully-connected layer of the AlexNet CNN [Krizhevsky et al., 2012].
    - We consider the top k objects with the highest probabilities as object tags.
  - Scene-centric Tags
    - We use a CNN pre-trained on a large-scale scene dataset (Places2) [Zhou et al., 2016] to obtain the scene context of the image.
    - We obtain the top k scenes from the probability distribution over 365 scene categories of the AlexNet CNN.
    - We refer to the top k predicted scenes as scene tags.
- Feature Representations for Classification
  - To encode the scene and object tags, we use the probability of each tag obtained from the softmax layer of the corresponding CNN.
  - We also consider user tags and encode them using a binary representation.
  - Using these feature representations, we train maximum-margin (SVM) classifiers to predict the class of an image as private or public.
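A minimal sketch of the object-tag extraction described above, using torchvision's ImageNet-pretrained AlexNet; the weights, preprocessing, and label list come from torchvision, not necessarily the poster's exact setup. Scene tags would be obtained the same way from an AlexNet trained on Places2/Places365, which is not bundled with torchvision and is therefore omitted here.

```python
import torch
from torchvision import models
from PIL import Image

weights = models.AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights).eval()
preprocess = weights.transforms()  # resize, crop, and normalize as AlexNet expects

def top_k_object_tags(image_path, k=10):
    """Return the k most probable ImageNet categories as (tag, probability) pairs."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]  # distribution over 1000 categories
    top = torch.topk(probs, k)
    labels = weights.meta["categories"]
    return [(labels[i], float(p)) for p, i in zip(top.values, top.indices)]
```

Each returned probability then fills the tag's slot in the bag-of-words vector (as in the pipeline figure above), while user tags are encoded as 0/1 indicators.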

OBJECT TAGS, SCENE TAGS & USER TAGS

[Figure: an example input image with its tags. Object & Scene Tags: Maillot, Kimono, Athletic field, Two-piece, Tank suit, Arena performance. User Tags: Beijing, China, People, outdoor, landscape.]

Figure: Object, scene, and user tags for the input image.

EXPERIMENTS AND RESULTS

WOULD SCENE-CENTRIC TAGS OBTAINED FROM THE VISUAL CONTENT BRING ADDITIONAL INFORMATION TO IMPROVE PRIVACY PREDICTION?

Features    Acc %   F1      Precision   Recall   #IncPred
UT          81.73   0.789   0.803       0.817    -

k = 2
UT+ST       82.26   0.797   0.810       0.823    293
UT+OT       83.09   0.812   0.819       0.831    477
UT+ST+OT    83.59   0.819   0.825       0.836    587

k = 10
UT+ST       83.21   0.814   0.821       0.832    503
UT+OT       84.35   0.833   0.834       0.843    755
UT+ST+OT    84.80   0.841   0.840       0.848    854

Table: Object tags vs. scene tags. The best performance is achieved by UT+ST+OT with k = 10.
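The table's metrics can be computed with scikit-learn as sketched below. The "weighted" averaging is an assumption, since the poster does not state how F1, precision, and recall are averaged over the two classes; it is consistent with the Recall column equaling accuracy in the table, as weighted recall reduces to accuracy.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 0, 0]  # toy labels: 1 = private, 0 = public
y_pred = [1, 0, 0, 0, 0]

print("Acc %:    ", 100 * accuracy_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred, average="weighted"))
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall:   ", recall_score(y_true, y_pred, average="weighted"))
```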

RESULTS

[Figure: qualitative predictions for three example images under each feature combination.]

Features     Image 1   Image 2   Image 3
UT           public    public    public
UT+OT        private   private   public
UT+ST        public    private   private
UT+ST+OT     private   public    private

Figure: Results obtained using tag features.

CONCLUSIONS

- We proposed the use of scene-centric tags (along with user tags and object tags) and showed that they can improve image privacy prediction.
- The results show that adding scene tags to user tags improves performance over user tags alone.
- The best performance is achieved when we consider the combination of user, scene, and object tags.
- We conclude that scene and object tags complement each other and help boost performance.
- These automatically derived tags can provide relevant cues for privacy-aware image retrieval and can become an essential tool for surfacing the hidden content of the deep web without exposing sensitive details.
- In the future, more sophisticated methods to combine objects and scenes can be explored.

REFERENCES

Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS '12, pages 1097–1105.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. IJCV, pages 1–42.

Squicciarini, A. C., Caragea, C., and Balakavi, R. (2014). Analyzing images' privacy for the modern web. In HT '14, pages 136–147.

Tonge, A. K. and Caragea, C. (2016). Image privacy prediction using deep features. In AAAI '16, pages 4266–4267.

Zerr, S., Siersdorfer, S., Hare, J., and Demidova, E. (2012). Privacy-aware image classification and search. In ACM SIGIR '12.

Zhou, B., Khosla, A., Lapedriza, A., Torralba, A., and Oliva, A. (2016). Places: An image database for deep scene understanding. arXiv preprint arXiv:1610.02055.

Acknowledgements: This research is supported by a grant from NSF to Cornelia Caragea and Anna Squicciarini. WWW: people.cs.ksu.edu/~ccaragea