resource recommendation vs privacy enhancement
DESCRIPTION
Social tagging has opened new possibilities for application interoperability on the semantic web, while at the same time posing new privacy threats. Recommendation and information filtering systems predict users' preferences and provide personalized content, but in doing so they expose user profiles to possible privacy attacks. Tag suppression and forgery are privacy-enhancing techniques that protect users' privacy to a certain extent at the cost of semantic accuracy; in other words, they trade utility for privacy. The impact of tag suppression and forgery on content-based recommendation is hence investigated in a real-world application scenario.
TRANSCRIPT
1/21Research Seminar. Silvia Puglisi
Departament d'Enginyeria Telemàtica
Silvia [email protected]
“Research Seminar”, Master in Telematics Engineering, UPC
On Content-Based Recommendation and Users Privacy in Social Tagging Systems
Silvia Puglisi
Barcelona, UPC, 2013
What is social tagging?
Social tagging is the activity that allows users to assign keywords (tags) to web-based resources.
Tagging and tags
Tag: a label attached to someone or something for identification or other information
Scenario
Social tagging enables semantic interoperability in web applications.
Recommendation and information filtering systems have been developed to predict users' preferences.
Users hence reveal their personal preferences on social tagging platforms.
Privacy-enhancing techniques (PETs) have been developed to protect user privacy to a certain extent, at the expense of semantic loss.
Objective
Taking as a starting point existing research on recommender systems [1] and PETs [2], the objective of this study is to evaluate the impact of two PETs, tag forgery and tag suppression, on the performance of a recommender system, using real-world application data.
[1] Bellogín, Alejandro, Iván Cantador, and Pablo Castells. "A comparative study of heterogeneous item recommendations in social systems." Information Sciences (2012)
[2] Parra-Arnau, Javier, David Rebollo-Monedero, and Jordi Forné. "A privacy-protecting architecture for collaborative filtering via forgery and suppression of ratings." Data Privacy Management and Autonomous Spontaneus Security (2012): 42-57.
Dataset
Among the social bookmarking platforms considered, Delicious was identified as representative of applications rich in collaborative tagging information.
Delicious is a social bookmarking platform for web resources.
The Delicious dataset used here is one of those made publicly available at the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems.
Delicious
Techniques: Modelling the User/Item Profile
The simplest approach to model users and items is to count the number of times a tag has been used:
• by a user to annotate different items in the same category, or
• by the community to annotate the item.
The user/item profile is then described as a histogram of the relative frequencies of tags within a predefined set of categories of interest.
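As a minimal sketch of the histogram profile described above, assuming each annotation has already been mapped to one of the predefined categories (the function name `tag_profile` and the sample categories are illustrative, not taken from the study):

```python
from collections import Counter

def tag_profile(annotation_categories, categories):
    """Return the relative frequency of annotations per category,
    in the order given by `categories`."""
    counts = Counter(annotation_categories)
    total = sum(counts[c] for c in categories)
    if total == 0:
        # No annotations yet: return an all-zero profile.
        return [0.0] * len(categories)
    return [counts[c] / total for c in categories]

# Hypothetical example: four annotations over three categories.
categories = ["tech", "news", "music"]
annotations = ["tech", "tech", "music", "tech"]
profile = tag_profile(annotations, categories)
print(profile)  # [0.75, 0.0, 0.25]
```

The same function can be applied to the tags a community assigned to an item, which gives an item profile over the same categories.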
Techniques: Histogram of a user profile
Techniques: Privacy Metric
The Kullback-Leibler (KL) divergence has been adopted as the privacy criterion, following Jaynes' rationale on entropy maximization methods.
Since the KL divergence may be regarded as a generalization of the entropy of a distribution, relative to another, it is often referred to as relative entropy.
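The relation between KL divergence and entropy can be checked numerically. The sketch below assumes profiles are probability vectors over the same categories and, purely for illustration, uses a uniform reference distribution; the reference distribution actually used in the study is not stated on the slide.

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log2(p_i / q_i), in bits.
    Terms with p_i == 0 contribute nothing."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf  # p puts mass where q has none
            total += p_i * math.log2(p_i / q_i)
    return total

def entropy(p):
    """Shannon entropy H(p) in bits."""
    return -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)

# Hypothetical user profile over four categories.
user = [0.5, 0.25, 0.25, 0.0]
uniform = [0.25] * 4

divergence = kl_divergence(user, uniform)
# Against a uniform reference, D(p || u) = log2(n) - H(p).
print(divergence, math.log2(4) - entropy(user))
```

Under this assumption, a profile closer to the reference has a smaller divergence, so it stands out less.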
Techniques: Utility Metric
A measure of how useful an item is for a certain user is needed.
We can say that an item is useful if its profile is similar to the user's profile.
Hence we need a measure of similarity.
Content-based recommender models are defined as similarity measures between user and item profiles. The measure used here is the cosine-based similarity.
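The cosine-based similarity mentioned above can be sketched as follows, assuming user and item profiles are vectors over the same ordered set of categories (the function name and sample vectors are illustrative):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between profile vectors u and v.
    Returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical user and item profiles over three categories.
user_profile = [0.75, 0.0, 0.25]
item_profile = [0.5, 0.5, 0.0]
score = cosine_similarity(user_profile, item_profile)
print(round(score, 3))
```

Items can then be ranked for a user by decreasing score.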
Techniques: Performance Metric
The recommender system is evaluated in a content retrieval scenario where a user is provided with a ranked list of N recommended items.
The performance metric adopted is hence one commonly used for ranked-list prediction: precision at top N.
In the field of Information Retrieval, precision can be defined as the fraction of recommended items that are relevant for a target user.
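A minimal sketch of precision at top N, assuming the set of relevant items for the target user is known (for example, items the user bookmarked in a held-out test split; that evaluation protocol is an assumption here, not stated on the slide):

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n recommended items that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_items[:n]
    hits = sum(1 for item in top_n if item in relevant_items)
    return hits / n

# Hypothetical ranked list and relevance set.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "x"}
print(precision_at_n(ranked, relevant, 2))  # 0.5
print(precision_at_n(ranked, relevant, 4))  # 0.5
```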
Techniques: Tag Forgery and Suppression
Tag suppression and forgery are privacy-enhancing techniques that help users who tag resources online avoid revealing sensitive information to a possible attacker.
Techniques: Tag Forgery and Suppression Rates
The tag forgery rate represents the ratio of forged items.
The tag suppression rate is the proportion of items that the user consents to eliminate.
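To make the two rates concrete, the sketch below applies forgery and suppression to a list of genuine tags and reports both rates. It assumes, in the spirit of [2], that each rate is measured relative to the number of genuine tags; the function name and the sample data are hypothetical, and the strategy for choosing which tags to forge or suppress is left out.

```python
from collections import Counter

def apply_forgery_and_suppression(genuine_tags, forged_tags, suppressed_tags):
    """Return (apparent_counts, forgery_rate, suppression_rate).
    Rates are relative to the number of genuine tags (an assumption)."""
    genuine = Counter(genuine_tags)
    forged = Counter(forged_tags)
    suppressed = Counter(suppressed_tags)
    total = sum(genuine.values())
    if total == 0:
        raise ValueError("no genuine tags")
    # A user can only suppress tags they would actually have submitted.
    for tag, k in suppressed.items():
        if k > genuine[tag]:
            raise ValueError(f"cannot suppress {k} x {tag!r}")
    # The apparent profile is what an observer sees.
    apparent = genuine - suppressed + forged
    forgery_rate = sum(forged.values()) / total
    suppression_rate = sum(suppressed.values()) / total
    return apparent, forgery_rate, suppression_rate

# Hypothetical user: 10 genuine tags, skewed towards "health".
genuine = ["health"] * 6 + ["tech"] * 3 + ["news"]
apparent, rho, sigma = apply_forgery_and_suppression(
    genuine, forged_tags=["news", "news"], suppressed_tags=["health"] * 3
)
print(dict(apparent), rho, sigma)
```

In this example the apparent counts become flat across the three categories, which hides the user's interest in "health" at the price of a profile that no longer reflects it.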
Techniques: The Privacy-Forgery-Suppression Function
Consistently, the privacy-forgery-suppression function can be defined:
Evaluation
Evaluation: Statistics about the dataset
Categories                  11
Users                       1867
Items                       69226
Item-Category Tuples        98998
Avg. tags per user          477.75
Tags per item               13.06
Avg. items per category     81044
Avg. categories per item    1.4
Results: Relative Risk Reduction with forgery - Utility
Results: Relative Risk Reduction with suppression - Utility
Conclusions
Tag suppression and forgery are simple privacy-enhancing techniques able to protect users' privacy at the cost of some semantic loss.
This study shows, through a simple experimental evaluation in a real-world application scenario, that the performance degradation of a recommender system is small compared to the privacy risk reduction offered by these techniques.
Thank you!