
Personal DLP for Facebook

Marco Ghiglieri∗, Martin Stopczynski∗, Michael Waidner∗†
∗ Technische Universität Darmstadt (CASED)

† Fraunhofer SIT

Email: (firstname.lastname)@cased.de

Abstract—Data Loss Prevention (DLP) is a well-established security and privacy concept in enterprise environments: Enterprise DLP tools scan outgoing messages and stop unintended information flows. It may catch malicious insiders, but the main use case is avoiding data leaks due to human errors. Good DLP tools prevent careless employees from doing something they would probably regret if made aware of. Individuals using online social networks are in a very similar situation: Often they share the wrong information with the wrong people, unaware of the risks and often even unaware of the technical meaning of what they are doing.
Personal DLP, introduced in this paper, extends the notion of DLP to individual users. It makes the individual users aware of risks and mistakes, and it does so based on rules explicitly set by each user, and rules derived from that user's past behavior and individual settings. Personal DLP raises awareness by explaining the risks, but the final decision is always with the user.

I. INTRODUCTION

Data Loss Prevention (DLP) is a well-established security and privacy concept in enterprise environments [1]: DLP tools scan outgoing messages (electronic mail, files written to USB sticks, etc.) and stop unintended information flows, i.e., messages that would reveal confidential or sensitive information to unauthorized recipients outside the enterprise. There is a plethora of DLP products on the market [2]. In Enterprise DLP it is always the enterprise that decides how to classify messages and who is allowed to send what kind of information to whom. Classification rules are mostly content-based, e.g., any text might be considered confidential. Some DLP tools might also raise an alarm whenever the overall communication behavior of an employee changes dramatically [3], but due to the data collection and management overhead and the obvious privacy problems this is rarely done in practice. The main use case for Enterprise DLP is avoiding data leaks due to human errors.

Our work on Personal DLP is motivated by the observation that users of Online Social Networks (OSN) like Facebook¹ are essentially in the same situation: they accidentally leak their own data to the wrong people. For many online users, OSNs are the primary way of communicating with other users. In June 2013 Facebook, the market leader, had more than 699 million active daily users [4]. A gigantic stream of messages and photos is posted every day. Many are meant to be shared only with a certain select group of other users, but are nevertheless leaked to the world. Many are consciously published, but regretted later on [5]. Users have a variety of personal data associated with them, for instance all kinds of personal interests and preferences, shopping behavior or security sensitive data like credit card numbers. Even travel plans could be critical. For example, in 2010 CNN reported on a case where a burglary happened because a status message revealed that nobody was at home at a certain time [6].

¹ https://www.facebook.com

In the abstract, many users know that publishing private data might have negative consequences, like hurting personal relationships, becoming the victim of stalking and mobbing, or (as in the example above) becoming a prime target for crime. But the actual risk awareness is surprisingly low [7]. Many users still publish sensitive status updates on a daily basis, regretting them only when it is too late [5], [8]. Facebook in particular made matters even worse by continually changing the default privacy settings such that personal information became more and more visible to the public [9]. Several studies confirm that people do not know about their current privacy settings in Facebook, how to change them, and who can access their content [10], [11], [12].

Personal DLP, introduced in this paper, extends the notion of Enterprise DLP to individual users. It does for a user's data and communications what Enterprise DLP does for the enterprise's data and communications. It makes the individual users aware of risks and mistakes, and it does so based on rules explicitly set by the users themselves, and rules derived from the users' past behavior and individual settings. Personal DLP raises awareness, but the final decision is always with the user. Despite all similarities there are a couple of fundamental differences between Enterprise and Personal DLP, which make this a challenging and interesting problem for research:

Personal DLP operates on rules which are specific to each user, not on a global rule set. These rules can be set and modified dynamically by the user and therefore require a very simple and intuitive management interface.

Personal DLP rules must be consistent with the user's privacy settings and any other settings that describe the user's normal and wanted communication behavior. Ideally, the starter set of rules is derived from these settings.

Enterprise DLP often simply blocks unauthorized communication and raises an alarm, pretty much like classical intrusion detection. This is not an option for Personal DLP. The user needs to understand why the system considers a certain communication (e.g., a status posting) as risky, and needs to make an informed decision on whether to cancel the communication or overrule the recommendation.

Enterprise DLP often scans the data without considering its source context, i.e., the decision depends only on the data seen. Personal DLP should be sensitive to the source context, e.g., it should consider the entire user profile and the intended audience for a specific posting.



This paper makes two major contributions. Firstly, we introduce Personal DLP as a new class of tools for individual end users, helping them protect their privacy (Section III). We have implemented the system, the Facebook Privacy Protector (FPP), as a proof-of-concept extension for the Google Chrome browser. Secondly, we conducted a user study involving 221 users (Section IV). This study confirmed the general value of Personal DLP and identified surprising differences between certain user segments. It showed convincingly that the FPP is a usable and well-designed tool. For the study, we implemented all functionalities of FPP into our test environment and did not force anybody to use the real FPP tool.

II. RELATED WORK

Enterprise DLP products are available from various vendors, e.g., [13], [14]. They differ mostly by their accuracy and the channels they monitor. State-of-the-art Enterprise DLP products support multiple channels, several gateways with different detection methods, central management and complex rule sets [2]. Their powerful capabilities imply a high configuration and management complexity. To the best of our knowledge none of the existing Enterprise DLP products enables the end-users to set their own rules, i.e., monitoring and filtering rules are always set by the enterprise. Both Enterprise DLP and Personal DLP use standard search and machine learning algorithms for monitoring and filtering, i.e., in that respect both concepts are very similar. In [13] a machine learning algorithm is introduced which could be used in a similar way in a Personal DLP for detecting privacy issues; in our implementation we use a Naive Bayes algorithm. Despite the commercial success of DLP there has been very little academic research on the topic (as is confirmed by [3]). Open general challenges for DLP are dealing with encrypted content, identifying the outsider in a communication, combining DLP with access control, and preventing data leaks at a semantic level [3]. Interestingly, none of these challenges applies to Personal DLP, since the data is checked at the end-point before it is encrypted, outsiders can be identified through the context information, access control is not needed, and the semantic level is being addressed. Other research in the area of DLP considers watermarking schemes and identifying evidence for forensic analysis after a data leakage has happened [15].

In essence, there are three main open problems with the way information sharing is controlled in OSNs like Facebook:

1) Users are concerned about the exposure of their private data [16], [10], because they do not understand how to set or change their privacy settings to achieve the appropriate and desired protection for their data [17], [12], [11].

2) Although users' awareness of privacy controls seems to increase over time [18], [19], another problem arises. Paradoxically, users act less concerned about sharing more personal information when they believe they gained better control by reducing the audience for their personal information from 'public' (all Facebook users) to 'only friends' [20], [21], [10]. Note that in Facebook the group 'friends' contains all contacts of the user.

3) Users who share their personal information with 'only friends' are again anxious about sharing content in an OSN and often regret publishing personal information [22], [8].

Since the audience of 'friends' is constantly growing (the average Facebook user already has 400 'friends' [21]) and since users continue to willingly share more and more content [23], [24], the risk of unintentionally exposing personal data is rising. To solve this problem, researchers have made various efforts.

To protect personal data, authors suggest using limited access groups (e.g., friend sub-lists like co-workers or family). Those restrictions are possible in modern OSNs, but as research has shown, this mitigation strategy still raises concerns among OSN users over unwanted audiences viewing shared content [25], [26], [27]. Egelman et al. have shown a graphical interface concept for Facebook to allow better selection of user groups having access to content. However, in this design only the intersection and union of 3 groups are possible. The user study showed that the new interface was accepted, but 50% of the users still made errors while solving specific tasks of sharing content with subgroups of people [12]. A drastic alternative has been discussed by Becker: He proposes a system to measure the privacy risk attributed to friend relationships in Facebook. The system quantifies the amount of personal information that can be derived from one's own contacts. As a mitigation strategy Becker suggests limiting access or removing high-risk friends from the friends list [28]. Unfortunately, this system works only on static profile information and not on the continuous sharing of status messages. Alternatively, Watson et al. [29] have proposed a system which assists users while configuring privacy settings on profile information to restrict the access for specific people. Here again, this solution cannot tackle the three mentioned problems while sharing status messages.

A first attempt in the right direction is the work of Mao et al. [30], who present a detection system for three types of sensitive information (vacation plans, alcohol and medical conditions). However, this approach only classifies existing Twitter messages and does not provide any real-time feedback to the user. Moreover, the classification is done in a general context and does not cover one's own personal context information.

In summary: Despite the large amount of work on privacy and OSNs, there is still no solution that lets users effectively control who will be able to see status messages containing truly sensitive personal information. The technical means might exist, but users either do not understand those means, cannot handle them, or do not trust them. Personal DLP brings us significantly closer to a solution for this problem by directly highlighting sensitive personal information before it is published to the wrong context or audience.

III. CONCEPT OF A PERSONAL DLP

In this section we present the new general concept of Personal DLP, in particular the Facebook Privacy Protector (FPP). We implemented the FPP as an extension for the Google Chrome browser, but we will not cover implementation details in this paper.

A. Overview

Figure 1 outlines the overall concept of a Personal DLP. We use context information (e.g., profile information) and rules to label privacy sensitive elements in a specific content (e.g., the user's status message). Users control the rules being used with the Management GUI. The PDLP Analyzer has two different pipelines to process incoming content: the Labeling Pipeline and the Machine Learning Pipeline.

Fig. 1. General concept of Personal DLP

The result of the PDLP Analyzer is the set of elements which are considered privacy sensitive according to the decision functions defined below. Each element has a privacy-aware Label and User Advice. In the Facebook Privacy Protector the PDLP Analyzer receives a status message as input, evaluates it with additional context information from the data's source (e.g., the user's profile) and outputs the status message with labeled elements and advice for the user explaining why a certain element has been labeled (e.g., credit card information should not be posted). Labels are implemented as colors, e.g., a word labeled "Critical" is highlighted in red.
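To make this data flow concrete, the following TypeScript sketch illustrates one possible shape of the PDLP Analyzer interface described above; all type and field names are our own illustrative assumptions and not taken from the actual FPP implementation.

```typescript
// Illustrative sketch (assumed names): the analyzer takes the content plus
// context information and returns the privacy-sensitive elements it found.
type Label = "not-critical" | "maybe-critical" | "critical"; // green / yellow / red

interface LabeledElement {
  element: string;   // e.g., a word of the status message
  label: Label;      // privacy-aware label, rendered as a highlight color
  advice: string;    // user advice explaining why the element was labeled
}

interface Context {
  profile: Record<string, string>;  // e.g., likes, about, location
  friends: string[];                // names of the user's contacts
  audience: "public" | "friends" | "custom";
}

// Conceptual signature of the PDLP Analyzer.
declare function analyze(content: string, context: Context): LabeledElement[];
```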

B. Facebook Privacy Protector

The FPP implements the concept of Personal DLP specifically for Facebook. It helps end-users by preventing them from publishing privacy sensitive information to the wrong audience. Key features of FPP are:

• Real-time classification and highlighting of text while typing, and assisting information when hovering the mouse cursor over the highlighted elements.

• Immediate feedback on critical elements, with assisting information on why those elements might compromise one's privacy and how to resolve the issues.

• Classification of unknown content by an appropriate machine learning method, here Naive Bayes.

C. Analyzing Pipelines

The Machine Learning Pipeline and the Labeling Pipeline (Figure 1) operate in parallel and both classify content. The Machine Learning Pipeline classifies content with algorithms such as Naive Bayes [31] (implemented in the FPP) or a Support Vector Machine (SVM) [32]. Formally, the Labeling Pipeline gets content C as input, from which the features F = {f_1, f_2, ..., f_n} are extracted. In order to obtain the set of labels L for content C we define a set of functions M = {m_1, m_2, ..., m_{n'}}, which are able to rate and label features accordingly. Using this notation, the function for deriving set L is:

L_0 = \emptyset, \quad L_i = m_i\left(F \setminus \bigcup_{j=1}^{i-1} L_j\right) \quad \text{for } i = 1, \ldots, n' \qquad (1)

L = \bigcup_{i=1}^{n'} L_i \qquad (2)

Eq. (1) describes the process of applying each function m_i to the feature set F minus the elements already labeled in all previous steps. Eq. (2) determines the final labeling as the union of all L_i. The features F in FPP are the individual and cleaned words W of C, i.e., of the status message the user is currently typing. "Cleaned" means that all symbols are removed that cannot be processed properly. The functions M are the highlighting modules, i.e., they scan for privacy sensitive elements. Each module in M evaluates only one specific aspect it is designed for, e.g., a pattern matching module for detecting international account numbers. Due to this modular pipeline design it is easy to add highlighting modules and to maintain the modules individually. It allows fast scanning of words as well as the efficient analysis of full text sections. Already highlighted words are removed (see Eq. (1)), which turned out to significantly improve the scanning performance. Figure 2 visualizes this concept and shows the already implemented modules. The remainder of this subsection describes each module of FPP in some detail.
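A minimal sketch of the Labeling Pipeline defined by Eq. (1) and (2), under the assumption that each highlighting module is a function returning the subset of features it labels; the names and the toy module are illustrative, not taken from the FPP code.

```typescript
// A highlighting module m_i: given the remaining (not yet labeled) features,
// it returns the subset it labels as privacy sensitive.
type Module = (features: Set<string>) => Set<string>;

// Eq. (1)/(2): apply each module to F minus everything labeled so far,
// and return the union of all labeled sets L_1 ... L_{n'}.
function labelFeatures(features: Set<string>, modules: Module[]): Set<string> {
  const labeled = new Set<string>();               // union of L_1 ... L_{i-1}
  for (const m of modules) {
    const remaining = new Set(
      [...features].filter((f) => !labeled.has(f)) // F \ (L_1 ∪ ... ∪ L_{i-1})
    );
    for (const f of m(remaining)) labeled.add(f);  // add L_i to the union
  }
  return labeled;                                  // L
}

// Toy example module: flags plain 16-digit numbers.
const creditCardModule: Module = (fs) =>
  new Set([...fs].filter((w) => /^\d{16}$/.test(w)));

// labelFeatures(new Set(["hello", "4111111111111111"]), [creditCardModule])
// -> Set { "4111111111111111" }
```

Skipping already labeled features mirrors the performance optimization mentioned above: each subsequent module only sees the shrinking remainder of F.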

Fig. 2. FPP Analyzing Module Pipeline

Profile Information: In Facebook the profile information is privacy sensitive depending on the audience set by the user. Thus, all information gathered is context sensitive and has to be evaluated. This includes the personal information in the likes and about sections. In the future, it would also be possible to analyze the photos and events sections to gather more personal data for analysis.

Friends: To prevent unintended naming and exposure of friends, FPP warns before publishing the exact names of the user's contacts.

Location Information: Publishing location information tells a lot about the user's private life. FPP highlights location information like city names, especially the ones included in the user's own contact address field and profile (maps section).

Pattern Matching: Sensitive information like credit card numbers, telephone numbers, e-mail addresses and indicators for customer identities is described by a set of regular expressions. Adding more regular expressions is easily possible.
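As an illustration of how such a pattern-matching module could be organized, the sketch below maps regular expressions to advice texts; the concrete patterns and hints are simplified examples of our own, not the expressions shipped with FPP.

```typescript
// Simplified example patterns (illustrative only, not FPP's actual rules).
const patterns: { name: string; regex: RegExp; hint: string }[] = [
  {
    name: "credit card number",
    regex: /\b(?:\d[ -]?){13,16}\b/,
    hint: "Credit card numbers should never be posted.",
  },
  {
    name: "e-mail address",
    regex: /\b[\w.+-]+@[\w-]+\.[a-z]{2,}\b/i,
    hint: "Publishing an e-mail address invites spam and phishing.",
  },
  {
    name: "phone number",
    regex: /\b\+?\d[\d ()\/-]{6,}\d\b/,
    hint: "Phone numbers expose a direct contact channel to everyone.",
  },
];

// Returns the hint of the first matching pattern, or null if nothing matches.
function matchSensitivePattern(word: string): string | null {
  const hit = patterns.find((p) => p.regex.test(word));
  return hit ? hit.hint : null;
}
```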

Naive Bayes: With this classifier, FPP can be trained to recognize potentially privacy-violating postings. It decides between the classes critical and non-critical, which determines the overall assessment. This algorithm is known for its good accuracy in deciding two-class problems [33].
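To illustrate the two-class decision, here is a minimal word-count Naive Bayes sketch with Laplace smoothing, assuming training postings labeled critical or non-critical; it is a textbook formulation, not the classifier bundled with FPP.

```typescript
type Cls = "critical" | "non-critical";

// Minimal multinomial Naive Bayes over word counts with Laplace smoothing.
// Assumes at least one training posting per class before classify() is called.
class NaiveBayes {
  private counts: Record<Cls, Map<string, number>> = {
    "critical": new Map(),
    "non-critical": new Map(),
  };
  private docs: Record<Cls, number> = { "critical": 0, "non-critical": 0 };
  private totals: Record<Cls, number> = { "critical": 0, "non-critical": 0 };
  private vocab = new Set<string>();

  train(text: string, cls: Cls): void {
    this.docs[cls]++;
    for (const w of text.toLowerCase().split(/\W+/).filter(Boolean)) {
      this.counts[cls].set(w, (this.counts[cls].get(w) ?? 0) + 1);
      this.totals[cls]++;
      this.vocab.add(w);
    }
  }

  // Returns the more likely class using log probabilities.
  classify(text: string): Cls {
    const classes: Cls[] = ["critical", "non-critical"];
    const nDocs = this.docs["critical"] + this.docs["non-critical"];
    const scores = classes.map((cls) => {
      let score = Math.log(this.docs[cls] / nDocs); // class prior
      for (const w of text.toLowerCase().split(/\W+/).filter(Boolean)) {
        const count = this.counts[cls].get(w) ?? 0;
        score += Math.log((count + 1) / (this.totals[cls] + this.vocab.size));
      }
      return score;
    });
    return scores[0] >= scores[1] ? "critical" : "non-critical";
  }
}
```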

In the user study we additionally tested the highlighting of pronouns as in 'my brother', since we assume that they often reveal sensitive information. However, we could not confirm this assumption in the user study.



D. Visualization

FPP uses four visualization elements to give the users feedback about the possible privacy threat in the status message, see Figure 3.

Fig. 3. Implementation Screen Shot with Visualization Elements

Above the input text box, on the right side, FPP gives a hint on how many other users might be able to see the posting. The number depends on the selected audience, in this example 'public', and is estimated from the user's friends list and other groups/lists. The value for 'public' is fixed and taken from [4].

Element Highlighting: Each module labels privacy sensitive elements and highlights them depending on the privacy threat in three different colors: green (not critical), yellow (maybe critical), red (critical). We used the traffic-light color concept, since it is well known to users as an indicator of danger (mental model), inspired by ISO 9241-10 (conformity with user expectations) [34]. Moreover, we added a manual color configuration option, so that color-blind users can set their own color scheme.

Privacy Advice Pop-up: In addition to the element highlighting, FPP offers assistance when hovering the mouse cursor over a highlighted element. Depending on the context, the user gets specific advice on why an element was highlighted. The background color of the pop-up message is always the same as that of the highlighted element. The typical structure of the user advice pop-up is defined as follows (a minimal data sketch is given after the list):

• Privacy Setting: Displays the privacy setting (visibility/audience) set for the element or the category it was identified in (public, friends, custom).

• Found in: Shows the source (category) in which the highlighted element was found (Likes/Music, Friends, Location, ...).

• Hint: Provides the user with advice on why this element was highlighted and why it is not recommended to publish it.
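A minimal sketch of how these pop-up fields could be represented, under the assumption that every highlighted element carries one such advice record; field names and the example values are illustrative, not the FPP data model.

```typescript
// Illustrative record backing the privacy advice pop-up (assumed names).
type Audience = "public" | "friends" | "custom";

interface PrivacyAdvice {
  privacySetting: Audience; // visibility set for the element or its category
  foundIn: string;          // source category, e.g. "Likes/Music", "Friends", "Location"
  hint: string;             // why the element was highlighted and why not to publish it
}

// Made-up example: advice attached to a highlighted home-town name.
const example: PrivacyAdvice = {
  privacySetting: "public",
  foundIn: "Location",
  hint: "Your home town is visible to everyone; posting it narrows down where you live.",
};
```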

For each different privacy sensitive element evaluated by a module, we developed specific assisting advice (hints) for the user on how to deal with the highlighted elements (see table appendix²). Inspired by the work of Egelman et al. [35], in which the researchers show that task-interrupting warning messages lead to better user behavior, we tested our tool in a pre-study with 10 students on the real Facebook, with warning messages (hints) shown automatically while typing. For visibility reasons, the warning message was displayed until the next highlighted element was detected or until the user hovered the mouse over another element. As a result, 9 of 10 users liked this behavior and only one reported that an option to disable the automatic pop-up would be nice, which we added in the final version of the tool.

² URL will be available in the final version

In contrast to the findings of Sunshine et al. [36], we assume that in long-term usage users will not ignore the highlighted elements: unlike hard-to-understand SSL warnings, users understand the personal information they have typed in themselves and want to react to specific highlights to protect their privacy [37].

Smiley Icon: To give the user an overall rating indication for the status message and to get the user's attention, we included a smiley icon that changes automatically depending on the privacy sensitivity. As seen in Figure 3, the smiley is always in the right corner inside the status message text box. This indicator gives the user feedback on whether the typed text is privacy sensitive or not.

Fig. 4. Status Message Visualization: not critical, maybe critical, very critical

Post Button: By coloring the background of the post button, the user gets an additional visualization of the overall status message evaluation. The resulting color depends on the posting evaluation (see Figure 4).

IV. USER STUDY

We conducted an extensive online study focused on users' habits regarding sensitive status messages and on measuring the usability of the Facebook Privacy Protector. As the technical basis we developed a survey website simulating a lightweight OSN environment similar to Facebook, integrated the FPP and hosted it on a server at our institute. This guaranteed a stable environment for each user. With this setup we could reach a broad range of diverse participants in February 2013. Summarizing the demographics of the 221 participants: 126 male and 95 female, 74 with an IT background, 126 students, and 204 Facebook users, aged 14 to 54.

A. Methodology

The study was split into three phases: Phase 1 was a classical usability test; Phase 2 requested the user to choose the right audience for certain status messages; Phase 3 was a conventional questionnaire³ about the user's behavior in OSNs and the perception during the previous two phases. Prior to each phase a tutorial explained the tasks to be performed next, the environment to be used, and where appropriate also the features and functionality of the Facebook Privacy Protector relevant for that phase.

Phase 1 was developed to measure whether the tool is usable and can raise users' awareness of critical information. We developed 20 status messages³, Q1 to Q20, and three different task types, T1 to T3. All tasks had a pre-written status message and were differentiated by the existence of FPP tool assistance as well as by the method of entering the message (see Table I).

³ URL will be available in the final version



TABLE I. TASK TYPES

Task Type | Pre-Written Status Msg. | FPP Tool Assistance | Text-entering Method
----------|-------------------------|---------------------|---------------------
T1        | yes                     | yes                 | yes
T2        | yes                     | yes                 | no
T3        | yes                     | no                  | yes

To evaluate three different sequences, we rotationally assigned each user at registration time to a group U1, U2 or U3 (see Table II).

TABLE II. USER GROUPS

User Group | Sequence                              | Users
-----------|---------------------------------------|-------------
U1         | 10x T1 (Q1 - Q10), 10x T2 (Q11 - Q20) | 70 (31.67%)
U2         | 10x T3 (Q1 - Q10), 10x T2 (Q1 - Q10)  | 81 (36.65%)
U3         | 10x T3 (Q1 - Q10), 10x T2 (Q11 - Q20) | 70 (31.67%)

The objective of the groups U1 and U3 was to determine if there is a difference in the ratings of status messages Q11 to Q20, depending on whether the user had tool assistance on Q1 to Q10 or not. U2 was developed to give a direct indicator of whether the tool is helping people to choose a better privacy rating for a status message according to its audience.

Phase 2 aimed at showing whether users are able to select the least privacy critical audience for a given status message. Each user had to select the 'right' audience for the pre-written status messages Q21 to Q23 with tool assistance.

B. Results

The next paragraphs demonstrate (1) that FPP is usable and (2) that it can help protect people from the negative effects of posting privacy critical status messages.

Data Evaluation: As a first step in the evaluation, we grouped the participants according to the average privacy rating of the presented status messages. Those with an average above 5.0 (critical) were put into group (A) with 19 participants, those below 3.0 into group (B) with 9 participants, and those in between into group (C) with 193 participants. Intuitively, (A) are the most privacy-aware and (B) the most privacy-unaware users. The split also showed that the grouping does not depend on age or gender. Starting with the main user group (C) and looking into U2, which had been developed to measure the performance of FPP, we found 22 (27.16%) participants who, after using the tool, ended up with a better privacy rating. A better privacy rating here means a higher rating, e.g., from 3.5 to 5. Even in the privacy-aware group (A) we found 9 (11.11%) participants with a better rating. This indicates that FPP helped 31 (38.27%) out of 81 users in U2. We measured up to 25% better privacy ratings for these users, which means a higher sensitivity to critical information.

In the questionnaire, we asked the participants to select those pre-defined categories which describe a critical status message they would regret when published. In comparison to the results reported by other researchers [22], [5], [8], we show a significant shift in the ranking of critical categories one would regret later. While Morrison et al. found that people regret sharing content most in the following order of categories: Romance, Family, Career and Education, we found that users mostly regret content in this category order: Alcohol 90%, Work & Phone Numbers 88%, Relationships 77%, Current Location 74%, Nightlife 69%, Family 58%. By including these new categories, we believe that this represents a more realistic posting behavior. These findings demonstrate the actual need for the protection features provided by the Facebook Privacy Protector.

Looking at groups, we asked the users to select those pre-defined groups which are most similar to those they are using in the OSN. As a result, around 40% of the users voted for each of the groups 'Best Friends', 'Family' and 'Co-Worker', and 5-10% for 'Online Gaming Friends' and 'Sport Club'. Surprisingly, only 60.3% of the participants selected 'Friends', which in fact is the default Facebook group. This supports the results found by other researchers [21], [10], [26], indicating that many users are not aware of which audience they are publishing to.

Switching to Phase 2, we measured that the selection of groups in Facebook is only clear to the users when the groups are indirectly defined by the status message. In our case, Q21 was supposed to post the credit card number publicly; here nearly all participants changed the audience to 'close friends', the smallest group in our user study. However, for status message Q23, in which an invitation to a party is published (a similar case as in [38]), 24 people selected 'friends' as the appropriate recipient group. Although our tool indicated that this group is not the most privacy-friendly choice, the status message was still posted by the user. One explanation is that the Facebook group 'friends' is a superset of all groups, which is not very clearly defined and hence often misunderstood by users.

Related to this observation, we confirmed in Phase 3 that users rarely change groups in Facebook. 66% of the participants answered that they never, or just a few times per year, modified their group members. Even more (71%) said that they hardly ever change the audience when posting to their wall. This indicates that users are not used to this option of limiting the audience and thus do not use it as a privacy setting for the content they publish. An interesting phenomenon identified in Phase 3 concerns the question whether people actively consider the sensitivity of a status message before publishing it. Around 87% of the users answered yes, suggesting that only the remaining 13% of users should be publishing sensitive status messages. Yet 58% of the users see privacy sensitive messages posted by others on their wall.

Usability Evaluation: We tested the usability of FPP with the System Usability Scale (SUS) [39], which consists of ten questions to measure effectiveness, efficiency and user satisfaction (for the calculation formula see [39]). The results indicate a very good usability value of 76.63 points (out of 100). Referring to Bangor et al. [39], acceptable products have a SUS score of 70. Evaluating the questionnaire, the majority of users liked to see the number of potential recipients of a status message displayed above the text box (see Figure 3). 72.22% of the users answered that seeing these numbers changed their rating of status messages. 77.78% stated that the colored word highlighting was very useful, which demonstrates the value of FPP.
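For readers unfamiliar with the SUS calculation referenced above, the sketch below implements the standard scoring scheme from [39]: odd-numbered items contribute (score - 1), even-numbered items (5 - score), and the sum is multiplied by 2.5. The example answers are made up for illustration only.

```typescript
// Standard SUS scoring: 10 answers on a 1-5 scale, yielding a 0-100 score.
function susScore(answers: number[]): number {
  if (answers.length !== 10) throw new Error("SUS requires exactly 10 answers");
  const sum = answers.reduce(
    // index 0, 2, 4, ... are the odd-numbered items 1, 3, 5, ...
    (acc, a, i) => acc + (i % 2 === 0 ? a - 1 : 5 - a),
    0
  );
  return sum * 2.5;
}

// Made-up example: a fairly positive response pattern.
console.log(susScore([4, 2, 4, 2, 5, 2, 4, 2, 4, 2])); // 77.5
```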



V. CONCLUSION AND FUTURE WORK

We introduced Personal DLP as a new concept and described a prototypical Personal DLP tool specifically for Facebook, the Facebook Privacy Protector (FPP). The tool helps users to understand and assess the privacy sensitivity of their Facebook status messages, in real time and relative to the chosen audience. A broad user study of FPP with 221 participants demonstrated that the tool is useful and well-designed. The tool has no recognizable impact on performance and offers a good level of usability. Most importantly, FPP has a positive impact on the users' privacy awareness: according to our analysis, more than 38% of the user group U2 achieved a more realistic self-assessment of their status messages when supported by FPP. As expected, very privacy-aware users (group (A) in our analysis) benefit less from FPP than the average user (group (C)). Overall, the test users had up to 25% better privacy ratings (a higher sensitivity to critical information).

From a technical perspective FPP is just the first step towards a comprehensive Personal DLP for Facebook. In ongoing work we are extending the concept along multiple dimensions. Firstly, the tool will be equipped with more sophisticated patterns, machine learning algorithms and other sources of training data. Secondly, we will work on an auto-selection of appropriate audiences by the FPP. To address this challenge, we will start with the already developed profile-scanning algorithms (extended to all friends) in order to generate context-specific audience groups on the fly.

ACKNOWLEDGMENT

We thank B. Jeutter and S. Funke for their support; both wrote their Bachelor theses in the context of this project. This work was supported by the European Center for Security and Privacy by Design (EC SPRIDE), funded by the German Federal Ministry of Education and Research (BMBF), and the Center for Advanced Security Research Darmstadt (CASED), funded by the LOEWE program of the Hessian Ministry for Science and the Arts (HMWK).

REFERENCES

[1] Ernst and Young, "Insights on governance, risk and compliance – Data loss prevention: Keeping your sensitive data out of the public domain," Tech. Rep., 2011.

[2] E. Ouellet, "Magic Quadrant for Content-Aware Data Loss Prevention," Gartner, Inc., 2013.

[3] P. Raman, H. G. Kayacik, and A. Somayaji, "Understanding Data Leak Prevention," 2011.

[4] Facebook Inc., "Key Facts - Statistics as of June 2013," http://newsroom.fb.com/Key-Facts, 2013.

[5] Y. Wang, G. Norcie, and S. Komanduri, "I regretted the minute I pressed share: A qualitative study of regrets on Facebook," SOUPS, 2011.

[6] R. Kaye, CNN, "Facebook friend or foe?" http://ac360.blogs.cnn.com/2010/08/05/facebook-friend-or-foe, 2010.

[7] T. Govani and H. Pashley, "Student awareness of the privacy implications when using Facebook," 2005.

[8] M. Morrison and N. J. Roese, "Regrets of the Typical American: Findings From a Nationally Representative Sample," Social Psychological and Personality Science, 2011.

[9] T. Paul, M. Stopczynski, D. Puscher, M. Volkamer, and T. Strufe, "C4PS - Helping Facebookers Manage their Privacy Settings," 2012.

[10] L. Brandimarte, A. Acquisti, and G. Loewenstein, "Misplaced Confidences: Privacy and the Control Paradox," Social Psychological and Personality Science, 2012.

[11] Y. Liu, K. Gummadi, B. Krishnamurthy, and A. Mislove, "Analyzing Facebook Privacy Settings: User Expectations vs. Reality," ACM SIGCOMM, 2011.

[12] S. Egelman, A. Oates, and S. Krishnamurthi, "Oops, I Did It Again: Mitigating Repeated Access Control Errors on Facebook," in SIGCHI, 2011.

[13] Symantec, "Machine Learning Sets New Standard for Data Loss Prevention: Describe, Learn," White Paper: Data Loss Prevention, 2010.

[14] Websense, "Selecting the Right DLP Solution: Enterprise, Lite, or Channel?" 2011.

[15] P. Papadimitriou and H. Garcia-Molina, "Data leakage detection," IEEE Transactions, 2011.

[16] M. Skeels and J. Grudin, "When social networks cross boundaries: A case study of workplace use of Facebook and LinkedIn," ACM SIGGROUP, 2009.

[17] M. Madejski, M. Johnson, and S. Bellovin, "A study of privacy settings errors in an online social network," SESOC, 2012.

[18] D. Boyd and E. Hargittai, "Facebook privacy settings: Who cares?" 2010.

[19] A. I. Anton, J. B. Earp, and J. D. Young, "How Internet Users' Privacy Concerns Have Evolved since 2002," IEEE Security & Privacy Magazine, 2010.

[20] F. Stutzman and J. Kramer-Duffield, "Friends only: Examining a privacy-enhancing behavior in Facebook," SIGCHI, 2010.

[21] M. Johnson, S. Egelman, and S. Bellovin, "Facebook and Privacy: It's Complicated," 2012.

[22] N. Roese and A. Summerville, "What we regret most... and why," PETs, 2005.

[23] A. Joinson, "Looking at, looking up or keeping up with people?: Motives and use of Facebook," SIGCHI, 2008.

[24] A. Young and A. Quan-Haase, "Information revelation and internet privacy concerns on social network sites: A case study of Facebook," C&T, 2009.

[25] H. Krasnova, O. Gunther, S. Spiekermann, and K. Koroleva, "Privacy concerns and identity in online social networks," Identity in the Information Society, 2009.

[26] P. Kelley, R. Brewer, and Y. Mayer, "An investigation into Facebook friend grouping," IFIP TC, 2011.

[27] S. Jones and E. O'Neill, "Feasibility of structural network clustering for group-based privacy control in social networks," SOUPS, 2010.

[28] J. Becker and H. Chen, "Measuring privacy risk in online social networks," W2SP, 2009.

[29] J. Watson, M. Whitney, and H. R. Lipford, "Configuring audience-oriented privacy policies," ACM, 2009.

[30] H. Mao, X. Shuai, and A. Kapadia, "Loose tweets: An analysis of privacy leaks on Twitter," WPES, 2011.

[31] G. Dusbabek, "A naive Bayesian classifier," http://www.dusbabek.org/~garyd/bayes, 2012.

[32] I. Steinwart and A. Christmann, Support Vector Machines, 2008.

[33] I. Rish, "An empirical study of the naive Bayes classifier," in IJCAI Workshop on Empirical Methods in Artificial Intelligence, 2001.

[34] International Organization for Standardization, "ISO 9241-10: Dialogue principles," 1996, http://www.iso.org.

[35] S. Egelman, L. F. Cranor, and J. Hong, "You've Been Warned: An Empirical Study of the Effectiveness of Web Browser Phishing Warnings," CHI, 2008.

[36] J. Sunshine, S. Egelman, H. Almuhimedi, N. Atri, and L. F. Cranor, "Crying Wolf: An Empirical Study of SSL Warning Effectiveness," USENIX, 2009.

[37] P. Kumaraguru, S. Sheng, A. Acquisti, L. F. Cranor, and J. Hong, "Teaching Johnny not to fall for phish," 2010.

[38] C. Matyszczyk, "Girl makes Facebook party invite public, riot police called," http://news.cnet.com/8301-17852_3-57518327-71/girl-makes-facebook-party-invite-public-riot-police-called, 2012.

[39] J. Brooke, "SUS - A quick and dirty usability scale," Usability Evaluation in Industry, 1996.
