
Accepted Manuscript

Privacy Leakage Analysis in Online Social Networks

Yan Li, Yingjiu Li, Qiang Yan, Robert H. Deng

PII: S0167-4048(14)00158-8

DOI: 10.1016/j.cose.2014.10.012

Reference: COSE 847

To appear in: Computers & Security

Received Date: 16 June 2014

Revised Date: 16 September 2014

Accepted Date: 19 October 2014

Please cite this article as: Li Y, Li Y, Yan Q, Deng RH, Privacy Leakage Analysis in Online Social Networks, Computers & Security (2014), doi: 10.1016/j.cose.2014.10.012.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Privacy Leakage Analysis in Online Social Networks

Yan Li∗, Yingjiu Li, Qiang Yan, Robert H. Deng

School of Information Systems, Singapore Management University

Abstract

Online Social Networks (OSNs) have become one of the major platforms for social interactions, such as building up relationships, sharing personal experiences, and providing other services. The wide adoption of OSNs raises privacy concerns due to the personal data shared online. Privacy control mechanisms have been deployed in popular OSNs for users to determine who can view their personal information. However, users' sensitive information could still be leaked even when privacy rules are properly configured. We investigate the effectiveness of privacy control mechanisms against privacy leakage from the perspective of information flow. Our analysis reveals that the existing privacy control mechanisms do not protect the flow of personal information effectively. By examining representative OSNs including Facebook, Google+, and Twitter, we discover a series of privacy exploits. We find that most of these exploits are inherent due to the conflicts between privacy control and OSN functionalities. These conflicts reveal that the effectiveness of privacy control may not be guaranteed as most OSN users expect. We provide remedies for OSN users to mitigate the risk of involuntary information leakage in OSNs. Finally, we discuss the costs and implications of resolving the privacy exploits.

Keywords: Online social network, privacy control, information flow, private information leakage, inherent exploit

1. Introduction

Online Social Network services (OSNs) have become an essential element in modern life for human beings to stay connected to each other. About 82% of the online population use at least one OSN such as Facebook, Google+, Twitter, and LinkedIn, which facilitates building relationships, sharing personal experiences, and providing other services [1]. Via OSNs, a massive amount of personal data is published online and accessed by users from all over the world. Prior research [2, 3, 4, 5] shows that it is possible to infer undisclosed personal data from publicly shared information. Nonetheless, the availability and quality of the public data causing privacy leakage are decreasing for the following reasons: 1) privacy control mechanisms have become a standard feature of OSNs and keep evolving; 2) the percentage of users who choose not to publicly share information is also increasing [4]. Given this trend, it seems that privacy leakage could be prevented as increasingly comprehensive privacy control is put in place. However, this may not be achievable according to our findings.

Instead of focusing on new attacks, we investigate the problem of privacy leakage under privacy control (PLPC). PLPC refers to private information leakage even if privacy rules are properly configured and enforced. For example, Facebook allows its users to control who can view their friend lists on Facebook. Alice, who has Bob in her friend list on Facebook, may not allow Bob to view her complete friend list. As an essential functionality, Facebook recommends to Bob a list of users, called "people you may know", to help Bob make more friends. This list is usually compiled by enumerating the friends of Bob's friends on Facebook, which includes Alice's friends. Even though Alice does not allow Bob to view her friend list, Alice's friend list could be leaked as a recommendation to Bob by Facebook.

∗ Corresponding author. Email addresses: [email protected] (Yan Li), [email protected] (Yingjiu Li), [email protected] (Qiang Yan), [email protected] (Robert H. Deng)

We investigate the underlying reasons that make privacy control vulnerable from the perspective of information flow. We start with categorizing the personal information of an OSN user into three attribute sets according to who the user is, whom the user knows, and what the user does, respectively. We model the information flow between these attribute sets and examine the functionalities which control the flow. We inspect representative real-world OSNs including Facebook, Google+, and Twitter, where privacy exploits and their corresponding attacks are identified.

Our analysis reveals that most of the privacy exploits are inherent due to the underlying conflicts between privacy control and essential OSN functionalities. The recommendation feature for social relationships is a typical example: it helps expand a user's social network, but it may also conflict with other users' privacy concerns about hiding their social relationships. Therefore, the effectiveness of privacy control may not be guaranteed even if it is technically achievable. We investigate necessary conditions for protecting against privacy leakage due to the discovered exploits and attacks. Based on the necessary conditions, we provide suggestions for users to minimize the risk of involuntary information leakage when sharing private personal information in OSNs.

We analyze the potentially vulnerable users due to our identified attacks through a user study, in which we investigate participants' usage, knowledge, and privacy attitudes towards Facebook, Google+, and Twitter. Based on the collected data, we investigate the vulnerability of those participants whose private information could be leaked through the attacks. We further discuss the costs and implications of resolving these privacy exploits.

We summarize the contributions of this paper as follows:

• We investigate the interaction between privacy control and information flow in OSNs. We show that the conflict between privacy control and essential OSN functionalities restricts the effectiveness of privacy control in OSNs.

• We identify privacy exploits for current privacy control mechanisms in typical OSNs, including Facebook, Google+, and Twitter. Based on these privacy exploits, we introduce a series of attacks for adversaries with different capabilities to obtain private personal information.

• We investigate necessary conditions for protecting against privacy leakage due to the discovered exploits and attacks. We provide suggestions for users to minimize the risk of privacy leakage in OSNs. We also analyze the costs and implications of resolving the discovered exploits. While it is possible to fix the exploits due to implementation defects, it is not easy to eliminate the inherent exploits due to the conflicts between privacy control and the functionalities. These conflicts reveal that the effectiveness of privacy control may not be guaranteed as most OSN users expect.

The rest of this paper is organized as follows: Section 2 provides background information about OSNs. Section 3 presents our threat model and assumptions. Section 4 models information flows between attribute sets in OSNs. Section 5 presents discovered exploits, attacks, and mitigations for the exploits. Section 6 analyzes the potentially vulnerable users due to the attacks. Section 7 discusses the implications of our findings. Finally, Section 8 describes related work and Section 9 summarizes our conclusions.

2. Background

In a typical OSN, Alice owns a space which consists of a profile page and a feed page for publishing Alice's personal information and receiving other users' personal information, respectively. Alice's profile page displays Alice's personal information, which can be viewed by others. Alice's feed page displays other users' personal information which Alice would like to keep up with. The personal information in a user's profile page can be categorized into three attribute sets: a) personal particular set (PP set), b) social relationship set (SR set), and c) social activity set (SA set), according to who the user is, whom the user interacts with, and what the user does, respectively. We show the corresponding personal information and attribute sets on Facebook, Google+, and Twitter in Table 1.

Alice’s PP set describes persistent facts about Alice in anOSN, such as gender, date of birth, and race, which usually donot change frequently. Alice’s SR set records her social rela-tionships in an OSN, which consist of an incoming list and an

outgoing list. The incoming list consists of the users who in-clude Alice as their friends while the outgoing list consists ofthe users whom Alice includes as her friends. In particular, onGoogle+, the incoming list and the outgoing list correspond to“have you in circles” and “your circles”, respectively. On Twit-ter, the incoming list and the outgoing list correspond to “fol-lowing” and “follower”, respectively. The social relationshipsin certain OSNs are mutual. For example, on Facebook, if Aliceis a friend of Bob, Bob is also a friend of Alice. In such a case,a user’s incoming list and outgoing list are the same, which arecalled friend list. Lastly, Alice’s SA set describes Alice’s socialactivities in her daily life. The SA set includes status messages,photos, links, videos, etc.

To enable users to protect their personal information in the three attribute sets, most OSNs provide privacy control, by which users may set up certain privacy rules to control the disclosure of their personal information. Given a piece of personal information, the privacy rules specify who can/cannot view the information. A privacy rule usually contains two types of lists, a white list and a black list. A white list specifies who can view the information, while a black list specifies who cannot view the information. A white/black list can be local or global. If a white/black list is local, it takes effect on specific information only (e.g. an activity, age information, or gender information). If a white/black list is global, it takes effect on all information in a user's profile page. For example, if Alice wants to share a status with all her friends except Bob, Alice may use a local white list which includes all of Alice's friends, together with a local black list which includes Bob only. If Alice does not want to share any information with Bob, she may use a global black list which includes Bob.
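To make the white/black-list semantics concrete, the following Python fragment gives a minimal model of this evaluation logic. It is our own sketch: names such as PrivacyRule and can_view are illustrative and not taken from any OSN's actual implementation.

# Minimal model of white/black-list evaluation in OSN privacy rules.
class PrivacyRule:
    def __init__(self, white, black):
        self.white = set(white)  # receivers allowed to view the item
        self.black = set(black)  # receivers explicitly denied

def can_view(viewer, item_rule, global_black=frozenset()):
    """A viewer sees an item iff she is in the effective audience
    (white list minus black list) and is not globally blocked."""
    if viewer in global_black:  # a global black list covers all information
        return False
    return viewer in (item_rule.white - item_rule.black)

# Alice shares a status with all her friends except Bob:
friends = {"Bob", "Carl", "Derek"}
status_rule = PrivacyRule(white=friends, black={"Bob"})
print(can_view("Carl", status_rule))  # True
print(can_view("Bob", status_rule))   # False

The effective audience wl \ bl computed here is the same quantity that appears in the necessary conditions of Section 5.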

To help users share their personal information and interact with each other, most OSNs provide four basic functionalities: PUB, REC, TAG, and PUSH. The first three functionalities, PUB, REC, and TAG, mainly affect the personal information displayed in a user's profile page, while the last functionality, PUSH, makes some other users' personal information appear in the user's feed page. These basic functionalities are described as follows. We exclude other functionalities which are not relevant to our findings.

Alice can use the PUB functionality to share her personal information with other users. As shown in Figure 1a, PUB displays Alice's personal information in her profile page. Other users may view Alice's personal information in Alice's profile page.

To help Alice make more friends in an OSN, REC is an essential functionality by which the OSN recommends to Alice a list of users that Alice may include in her SR set. The list of recommended users is composed based on the social relationships of the users in Alice's SR set. Consider the example shown in Figure 1b: Alice's SR set consists of Bob, while Bob's SR set consists of Alice, Carl, Derek, and Eliza. After Alice logs into her space, REC automatically recommends Carl, Derek, and Eliza to Alice, who may update her SR set. If Alice intends to include Carl in her SR set, Alice may need Carl's approval depending on the OSN implementation. Upon approval if needed, Alice can include Carl in her SR set. At the same time, Alice is automatically included in Carl's SR set.



Table 1: Types of Personal Information on Facebook, Google+, and Twitter

PP (Personal Particulars)
  Facebook: current city, hometown, sex, birthday, relationship status, employer, college/university, high school, religion, political views, music, books, movies, emails, address, city, zip
  Google+: taglines, introduction, bragging rights, occupation, employment, education, places lived, home phone, relationship, gender
  Twitter: name, location, bio, website

SR (Social Relationship: incoming list, outgoing list)
  Facebook: friends, friends
  Google+: have you in circles, your circles
  Twitter: following, follower

SA (Social Activities)
  Facebook: status message, photo, link, video, comments, like
  Google+: post, photo, comments, link, video, plus 1's
  Twitter: tweets

Figure 1: Basic functionalities in OSNs. (a) Alice publishes her personal information. (b) Bob's social relationships are recommended to Alice. (c) Alice tags Bob in her social activity. (d) Bob's personal information is pushed to Alice's feed page when Bob publishes his personal information.

In particular, on Facebook, if Alice intends to include Carl in her SR set, Alice needs to get Carl's approval. Upon approval, Alice includes Carl in her friend list. Meanwhile, Facebook automatically includes Alice in Carl's friend list. On Google+, Alice can include Carl in her outgoing list without Carl's approval. Then Google+ automatically includes Alice in Carl's incoming list. On Twitter, if Alice intends to include Carl in her SR set, Alice may need Carl's approval, depending on whether Carl has opted to require approval. Upon approval if required, Alice includes Carl in her incoming list. Then Twitter includes Alice in Carl's outgoing list automatically.
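As a rough approximation, REC can thus be thought of as enumerating friends of friends. The sketch below, with our own function name and toy data, shows why the users in Alice's friends' SR sets can surface in her recommendation list; real recommendation systems combine many more signals.

# Illustrative approximation of REC: recommend friends of friends.
def recommend(user, sr_sets):
    """sr_sets maps each user to the set of users in his/her SR set."""
    candidates = set()
    for friend in sr_sets.get(user, set()):
        candidates |= sr_sets.get(friend, set())  # friends of friends
    return candidates - sr_sets.get(user, set()) - {user}

sr_sets = {
    "Alice": {"Bob"},
    "Bob": {"Alice", "Carl", "Derek", "Eliza"},
}
print(recommend("Alice", sr_sets))  # {'Carl', 'Derek', 'Eliza'}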

To motivate users' interactions, the TAG functionality allows a user to mention another user's name in his/her social activities when the user publishes social activities in his/her profile page. In Figure 1c, when Alice publishes a social activity in her profile page, she can mention Bob in the social activity via TAG, which provides a link to Bob's profile page (shown as an HTML hyperlink).

For the convenience of keeping up with the personal information published by other users, OSNs provide a feed page for each user. Consider an example in which Alice intends to keep up with Bob: Alice can subscribe to Bob, and Alice is then called Bob's subscriber. As Bob's subscriber, Alice is included in Bob's SR set. In particular, on Facebook, a user's subscribers are usually his/her "friends". On Google+, a user's subscribers are usually the users in his/her outgoing list, i.e. "your circles". On Twitter, a user's subscribers are usually the users in his/her outgoing list, i.e. "follower". Figure 1d shows that when Bob updates his personal information via PUB and allows Alice to view the updated personal information, a copy of the updated personal information is automatically pushed to Alice's feed page via PUSH. Then, Alice can view Bob's updated personal information both in her feed page and in Bob's profile page.



3. Threat Model

The problem of PLPC concerns privacy leakage in a system where privacy control is enforced. Given a privacy control mechanism, PLPC examines whether a user's private personal information is leaked even if the user properly configures privacy rules to protect the corresponding information.

The problem of PLPC in OSNs involves two parties, a distributor and a receiver. A user who publishes and shares his/her personal information is a distributor, while a user with whom the personal information is shared is a receiver. An adversary is a receiver who intends to learn a distributor's information that is not shared with him. Correspondingly, the target distributor is referred to as the victim.

Prior research [2, 4, 5] mainly focuses on the inference of undisclosed user information from publicly shared information. Since the effectiveness of these inference techniques is hampered by increasing user awareness of privacy concerns [4], we further include insiders in our analysis. Such adversaries have the incentive to register as OSN users so that they may directly access a victim's private personal information or infer it from other users connected with the victim in OSNs.

The capabilities of an adversary can be characterized according to two factors. The first factor is the distance between the adversary and the victim. According to the privacy rules available in existing OSNs, a distributor usually chooses specific receivers to share her information with based on the distance between the distributor and the receivers. Therefore, we classify an adversary's capability based on his distance to a victim. Considering the social network as a directed graph, the distance between two users can be measured by the number of hops in the shortest connected path between them. An n-hop adversary is defined such that the length of the shortest connected path from victim to adversary is n hops. We consider three types of adversaries in our discussion: 1-hop adversaries, 2-hop adversaries, and k-hop adversaries, where k > 2. On Facebook, they correspond to Friend-only, Friend-of-Friend, and Public, respectively. On Google+, they correspond to Your-circles, Extended-circles, and Public, respectively. For ease of readability, we use friend, friend of friend, and stranger to represent 1-hop, 2-hop, and k-hop (k > 2) adversaries, respectively: 1) If an adversary is a friend of a victim, he is stored in the outgoing list in the victim's SR set. The adversary can view the victim's information that is shared with her friends, friends of friends, or all receivers in an OSN. However, the adversary cannot view the information that is not shared with any receivers (e.g. the "only me" option on Facebook). 2) If an adversary is a friend of a friend, he can view the victim's information shared with her friends of friends or all receivers. However, the adversary cannot view any information that is shared with friends only, or any information that is not shared with any receivers. 3) If an adversary is a stranger, he can access the victim's information that is shared with all receivers. However, the adversary cannot view any information which is shared with friends of friends or friends.
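The distance measure can be made precise with a breadth-first search over the directed social graph. The following sketch, with our own function names, classifies an adversary by hop count under the definitions above.

# Classify an adversary by the shortest directed path from the victim.
from collections import deque

def hop_distance(graph, victim, adversary):
    """graph maps each user to the users in his/her outgoing list."""
    seen, queue = {victim}, deque([(victim, 0)])
    while queue:
        user, dist = queue.popleft()
        if user == adversary:
            return dist
        for nxt in graph.get(user, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no connected path from victim to adversary

def adversary_class(n):
    if n is None:
        return "unconnected"
    if n == 1:
        return "friend"            # 1-hop: Friend-only / Your-circles
    if n == 2:
        return "friend of friend"  # 2-hop: Friend-of-Friend / Extended-circles
    return "stranger"              # k-hop, k > 2: Public

graph = {"Alice": {"Bob"}, "Bob": {"Carl"}, "Carl": {"Dave"}}
print(adversary_class(hop_distance(graph, "Alice", "Dave")))  # stranger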

Besides the above restrictions, an adversary cannot view a victim's personal information if the adversary is included in the victim's black lists (e.g. the "except" or "block" option on Facebook, and the "block" option on Google+).

An adversary may have prior knowledge about a victim. We will specify the exact requirements of such prior knowledge for different attacks in Section 5.

Since a user may use multiple OSNs, it is possible for an adversary to infer the user's private data by collecting and analyzing the information shared in different OSNs. We exclude social engineering attacks where a victim is deceived into disclosing her private information voluntarily. We also exclude privacy leakage caused by improper privacy settings. These two cases cannot be addressed completely by technical measures alone.

4. Information Flows Between Attribute Sets In Profile Pages

In this section, we examine explicit and implicit information flows in OSNs. These information flows could leak users' private information to an adversary even after the users have properly configured the privacy rules to protect their information.

As analyzed in Section 2, the personal information shared in a user's profile page can be categorized into three attribute sets: the PP set, SR set, and SA set, which are illustrated as circles in Figure 2. The attribute sets of multiple users are connected within an OSN, where personal information may explicitly flow from one profile page to another via inter-profile functionalities, including REC (recommending) and TAG (tagging), as represented by solid arrows and rectangles in Figure 2. It is also possible to access a user's personal information in the PP set and SR set via implicit information flows, marked by dashed arrows. The details of these information flows are described below.


Figure 2: Information flows between attribute sets

The first explicit flow is caused by REC, shown as arrow (1) in Figure 2. REC recommends to an OSN user Bob a list of users according to the social relationships of the users included in Bob's SR set. Therefore, the undisclosed users included in Alice's SR set may be recommended to Bob via REC if Bob is connected with Alice.

The second explicit flow, caused by TAG, is shown as arrow (2) in Figure 2.



A typical OSN user may mention the names of other users in a social activity in the SA set of his/her profile page via TAG, which creates explicit links connecting SA sets within different profile pages.

The third flow is an implicit flow caused by the design of information storage for SR sets, shown as arrow (3) in Figure 2. A user's SR set stores his/her social relationships as connections. From the perspective of information flow, a connection is a directional relationship between two users, including a distributor and his/her 1-hop receiver, i.e., friend. The direction of a connection represents the direction of information flow. Correspondingly, Alice's SR set consists of an incoming list and an outgoing list as defined in Section 2. For each user u_i in Alice's incoming list, there is a connection from u_i to Alice. For each user u_o in Alice's outgoing list, there is a connection from Alice to u_o. Alice can receive information distributed from the users in her incoming list, and distribute her information to the users in her outgoing list. Given a connection from Alice to Bob, Bob is included in the outgoing list in Alice's SR set. Meanwhile, Alice is included in the incoming list in Bob's SR set. The social relationships in certain OSNs such as Facebook are mutual. Such a mutual relationship can be considered as a pair of connections linking two users with opposite directions, similar to replacing a bidirectional edge with two equivalent unidirectional edges.
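The following fragment, a toy model in our own terms, mirrors this bookkeeping: each directed connection updates one user's outgoing list and the other user's incoming list, and a mutual friendship is simply a pair of opposite connections.

# Toy model of SR-set storage: a connection src -> dst places dst in
# src's outgoing list and src in dst's incoming list.
def add_connection(sr, src, dst):
    sr.setdefault(src, {"in": set(), "out": set()})["out"].add(dst)
    sr.setdefault(dst, {"in": set(), "out": set()})["in"].add(src)

def add_mutual(sr, a, b):
    """A Facebook-style friendship: two connections with opposite directions."""
    add_connection(sr, a, b)
    add_connection(sr, b, a)

sr = {}
add_mutual(sr, "Alice", "Bob")
print(sr["Alice"]["out"], sr["Bob"]["in"])  # {'Bob'} {'Alice'}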

The fourth flow is an implicit flow related to the PP set, shown as arrow (4) in Figure 2. Due to the homophily effect [6, 7], a user is more willing to connect with users who have similar personal particulars than with users who have different personal particulars. This tendency can be used to link the PP sets of multiple users. For example, colleagues working in the same department are often friends with each other on Facebook.

In addition to the above information flows, an OSN user may simultaneously use multiple OSNs, and thus create other information flows connecting the attribute sets of the same user across different OSNs.

It is difficult to prevent privacy leakage from all these information flows. A user may be able to prevent privacy leakage caused by explicit information flows by carefully using the corresponding functionalities, as these flows are materialized only when inter-profile functionalities are used. However, it is difficult to avoid privacy leakage due to implicit information flows, as they are caused by inherent correlations among the information shared in OSNs. In fact, all four information flows illustrated in Figure 2 correspond to inherent exploits, which will be analyzed in Sections 5 and 7. The existence of these information flows introduces a large attack surface for an adversary to access undisclosed personal information if any of these flows is not properly protected. The existing privacy control mechanisms [8, 9] regarding data access within a profile page are not sufficient to prevent privacy leakage. However, full coverage by privacy control may not be feasible as it conflicts with the social/business values of OSNs, as analyzed in Section 7.

In this paper, we focus on the information flows from the attribute sets in one profile page to the attribute sets in another profile page, which may lead to privacy leakage even if users properly configure their privacy rules. There may exist other exploitable information flows leading to privacy leakage, which are left as future work.

5. Exploits, Attacks, And Mitigations

In this section, we analyze the exploits and attacks which may lead to privacy leakage in existing OSNs even if privacy controls are enforced. We organize the exploits and attacks according to their targets, which could be a victim's PP set, SR set, or SA set. We also investigate necessary conditions for preventing privacy leakage due to the identified exploits and attacks. The proofs of the necessary conditions are available in the Appendices. Based on these necessary conditions, we provide suggestions on mitigating the corresponding exploits and attacks. All of our findings have been verified in real-world settings on Facebook, Google+, and Twitter¹.

¹ All of our experiments were conducted from September 2011 to September 2012.

5.1. PP Set

A user's PP set describes persistent facts about who the user is. The undisclosed information in the PP set protected by existing privacy control mechanisms can be inferred via the following inherent exploits, namely inferable personal particulars and cross-site incompatibility.

5.1.1. Inferable Personal Particulars

Human beings have the tendency to interact with others who share the same or similar personal particulars (such as race, organization, and education). This assumption is called homophily [6, 7]. Due to homophily, users are connected with those who have similar personal particulars at a higher rate than with those who have dissimilar personal particulars. This causes an inherent exploit named inferable personal particulars, which corresponds to the information flow shown as dashed arrow (4) in Figure 2.

Exploit 1: If most of a victim's friends have common or similar personal particulars (such as employer information), it could be inferred that the victim may have the same or similar personal particulars.

An adversary may use Exploit 1 to obtain undisclosed personal particulars in a victim's PP set. The following is a typical attack on Facebook.

Attack 1: Consider a scenario on Facebook shown in Figure 3, where Bob, Carl, Derek, and some other users are Alice's friends, and Bob is a friend of Carl, Derek, and most of Alice's friends (note that in Figure 3, a solid arrow connects a distributor to a friend of the distributor). Alice publishes her employer information "XXX Agency" in her PP set and allows only Carl and Derek to view her employer information. However, most of Alice's friends may publish their employer information and allow their friends to view it due to different perceptions of privacy protection. In this setting, Bob can collect the employer information of Alice's friends and infer that Alice's employer is "XXX Agency" with high probability.
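A toy illustration of this inference, assuming the adversary has already collected the friends' visible values, is sketched below; the function name and data are ours, and a real attack would weight its evidence more carefully.

# Sketch of Attack 1: guess a hidden personal particular as the most
# common value among the visible particulars of the victim's friends.
from collections import Counter

def infer_particular(visible_values):
    """visible_values: values collected from the victim's friends
    (None for friends who hide the attribute)."""
    counts = Counter(v for v in visible_values if v is not None)
    if not counts:
        return None
    value, freq = counts.most_common(1)[0]
    return value, freq / sum(counts.values())  # guess and its support

friends_employers = ["XXX Agency", "XXX Agency", None, "XXX Agency", "Acme"]
print(infer_particular(friends_employers))  # ('XXX Agency', 0.75)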


Figure 3: Alice and most of her friends have common personal particulars (e.g. employer information)

The above attack works on Facebook, Google+, and Twitter. The attack can be performed by any adversary who has two types of knowledge. The first type of knowledge is a large portion of the users stored in the victim's SR set. The second type of knowledge is the personal particulars of these users. The attack may lead to the leakage of personal particulars including employer, university, current city, religion, etc. Other personal particulars, including gender, age, and relationship status, may not be reliably inferred via the above attack, but they can be leaked based on additional context information such as users' and their friends' names, social activities, and interests [10]. To prevent privacy leakage due to Exploit 1, the following necessary condition should be satisfied.

Necessary Condition 1: Given a subset U = {u_1, u_2, ..., u_n} of a victim v's SR set in an OSN and the personal particular value pp_ui (pp_ui ≠ null) of each receiver u_i ∈ U obtained by an adversary, there exists at least one personal particular value pp such that |U_pp| ≥ |U_v| and pp ≠ pp_v, where pp_v is the victim's personal particular value, U_pp = {u_i | (u_i ∈ U) ∧ (pp_ui = pp)}, and U_v = {u_j | (u_j ∈ U) ∧ (pp_uj = pp_v)}.

To satisfy Necessary Condition 1, the following mitigations are suggested.

Mitigation 1: If a victim publishes information in her PP set and allows a set of receivers to view the information, the privacy rules chosen by the victim should be propagated to all users in the victim's SR set who have similar or common information in their PP sets.

Mitigation 2: A victim should intentionally set up a certain number of connections with other users who have different personal particulars.

5.1.2. Cross-site Incompatibility

If a user publishes personal information in multiple OSNs, she may employ different privacy control rules provided by different OSNs. This causes an inherent exploit named cross-site incompatibility.

Exploit 2: Personal information could be inferred across multiple OSNs if it is protected by incompatible privacy rules in different OSNs.

The incompatibility of privacy rules in different OSNs is due to: 1) inconsistent privacy rules in different OSNs, 2) different social relationships in different OSNs, and 3) different privacy control mechanisms in different OSNs (e.g. different privacy control granularities). Due to Exploit 2, an adversary may obtain a victim's personal particulars which are hidden from the adversary in one OSN but shared with the adversary in another OSN. The following is an exemplary attack on Facebook and Google+.

Attack 2: Bob is Alice's friend on both Google+ and Facebook. On Google+, Alice publishes her gender information in her PP set and shares this information with some friends, not including Bob. On Facebook, Alice publishes her gender information and allows all users to view it because Facebook allows her to share it with either all users or no users. Comparing Alice's personal information published on Facebook and Google+, Bob is able to learn from Facebook Alice's gender, which is not supposed to be viewable by Bob on Google+.

Any adversary can perform this attack to infer personal information in a victim's PP set from multiple OSNs. This exploit can also be used to infer undisclosed information in the SR set and SA set. To prevent privacy leakage due to Exploit 2, the following necessary condition needs to be satisfied.

Necessary Condition 2: Given a set of privacy rules PR = {pr_1, pr_2, ..., pr_n} with pr_i = (wl_i, bl_i), where pr_i is the privacy rule for a victim's personal particular published in OSN_i, wl_i is the set of all receivers in its white list, and bl_i is the set of all receivers in its black list, for i ∈ {1, 2, ..., n}, the following condition holds: for any i, j ∈ {1, 2, ..., n}, wl_i \ bl_i = wl_j \ bl_j.²

² Given a privacy rule pr = (wl, bl) with a white list wl and a black list bl, only the receivers who are in the white list and not in the black list (i.e. any receiver u ∈ wl \ bl) are allowed to view the protected information.
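Under this set-based rule model, checking Necessary Condition 2 amounts to comparing effective audiences across sites. A minimal sketch, with our own function names:

# Necessary Condition 2: the effective audience wl \ bl of the same
# information must be identical in every OSN where it is published.
def effective(rule):
    white, black = rule
    return set(white) - set(black)

def satisfies_nc2(rules_per_osn):
    audiences = [effective(r) for r in rules_per_osn.values()]
    return all(a == audiences[0] for a in audiences)

rules = {
    "Google+": ({"Carl", "Derek"}, set()),          # gender: two friends only
    "Facebook": ({"Bob", "Carl", "Derek"}, set()),  # gender: everyone
}
print(satisfies_nc2(rules))  # False: Facebook reveals what Google+ hides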

To satisfy Necessary Condition 2, the following mitigation strategies can be applied.

Mitigation 3: A victim should share her personal information with the same users in all OSNs.

Mitigation 4: If different OSNs provide incompatible privacy control for certain personal information, a victim should choose a privacy rule for this information under two requirements: 1) the privacy rule can be enforced in all OSNs; 2) the privacy rule is at least as rigid as the privacy rules which the victim intends to choose in any OSN.

5.2. SR Set

A user’s SR set records social relationships regarding whomthe user knows. The undisclosed information in SR set pro-tected by existing privacy control mechanisms can be inferredby two inherent exploits, namely inferable social relationshipand unregulated relationship recommendation.

5.2.1. Inferable Social Relationship

OSNs provide the SR set for a user to store the lists of the users who have connections with him/her.



Carl’s SR set. The connection between Alice and Carl is storedin both Alice’s SR set and Carl’s SR set. This causes an inherentexploit named inferable social relationship, which correspondsto the information flow shown as dashed arrow (3) in Figure 2.

Exploit 3: Each social relationship in a victim's SR set indicates a connection between the victim and another user u. User u's SR set also stores a copy of this relationship for the same connection. The social relationship in the victim's SR set can thus be inferred from the SR set of another user who is in the victim's SR set.

An adversary may use Exploit 3 to obtain undisclosed social relationships in a victim's SR set, as shown in the following exemplary attack on Facebook.

Attack 3: Figure 4 shows a scenario on Facebook, where Bob is a stranger to Alice, and Carl is Alice's friend. Alice shares her SR set with a user group including Carl. Bob guesses that Carl may be connected with Alice, but cannot confirm this by viewing Alice's SR set, as it is protected against him (a stranger to Alice). However, Carl shares his SR set with the public due to different concerns in privacy protection. Seeing Alice in Carl's SR set, Bob infers that Carl is Alice's friend.

Figure 4: Alice’s social relationships flow to Carl’s SR set

Although the adversary is assumed to be a stranger in the above attack, any adversary with stronger capabilities can utilize Exploit 3 to perform the attack as long as he has two types of knowledge: 1) a list of users in the victim's SR set; 2) the social relationships in these users' SR sets. This attack could be a stepping stone for an adversary to infiltrate a victim's social network. Once the adversary discovers a victim's friends and establishes connections with them, he becomes a friend of the victim's friends. After that, he has a higher probability of being accepted as the victim's friend, as they have common friends [11]. To prevent privacy leakage caused by Exploit 3, the following necessary condition should be satisfied.

Necessary Condition 3: Given a victim v's privacy rule pr_v = (wl_v, bl_v) for her SR set, the set of all users U = {u_1, u_2, ..., u_n} included in the victim's SR set in an OSN, and a set of privacy rules PR = {pr_1, pr_2, ..., pr_n}, where each pr_i = (wl_i, bl_i) is the privacy rule for u_i's SR set with white list wl_i and black list bl_i, the following condition holds: for all i ∈ {1, 2, ..., n}, wl_i \ bl_i ⊆ wl_v \ bl_v.
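Checking Necessary Condition 3 is a subset test over effective audiences; the sketch below, again in our own toy rule model, flags the scenario of Attack 3.

# Necessary Condition 3: every friend's effective audience for his/her
# SR set must be contained in the victim's.
def effective(rule):
    white, black = rule
    return set(white) - set(black)

def satisfies_nc3(victim_rule, friend_rules):
    allowed = effective(victim_rule)
    return all(effective(r) <= allowed for r in friend_rules)

alice_rule = ({"Carl"}, set())               # Alice: SR set shared with Carl only
carl_rule = ({"Bob", "Carl", "Eve"}, set())  # Carl: SR set shared publicly
print(satisfies_nc3(alice_rule, [carl_rule]))  # False: Carl leaks the relationship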

To satisfy Necessary Condition 3, the following mitigation strategy can be applied.

Mitigation 5: Let U = {u_1, u_2, ..., u_m} denote the set of users in a victim's SR set. If the victim shares her SR set with a set of receivers, then each user u_i ∈ U should share the social relationship between himself/herself and the victim with the same set of receivers only. Since most existing OSNs use coarse-grained privacy rules to protect social relationships in the SR set, all users in the victim's SR set should share their whole SR sets with the same set of receivers chosen by the victim in order to prevent privacy leakage.

5.2.2. Unregulated Relationship Recommendation

To help a user build more connections, most OSNs provide the REC functionality to automatically recommend a list of other users whom this user may know. The recommendation list is usually calculated based on the relationships in SR sets but is not regulated by the privacy rules chosen by the users in the recommendation list. This causes an inherent exploit named unregulated relationship recommendation, which corresponds to the information flow shown as solid arrow (1) in Figure 2.

Exploit 4: All social relationships recorded in a victim's SR set could be automatically recommended by REC to all users in the victim's SR set, irrespective of whether or not the victim uses any privacy rules to protect her SR set.

An adversary may use Exploit 4 to obtain undisclosed social relationships in a victim's SR set, as shown in the following attack on Facebook.

Attack 4: On Facebook, Bob is a friend of Alice, but not in a user group named Close Friends. Alice shares her SR set with Close Friends only. Although Bob is not allowed to view Alice's social relationships in her SR set, such information is automatically recommended by REC to Bob as "users he may know". If Bob is connected with Alice only, the recommendation list consists of the social relationships in Alice's SR set only.

The recommendation list generated by REC may be affected by other factors such as personal particulars and interests, which may introduce noise into the recommended social relationships. To minimize such noise, Bob could temporarily delete all his personal particulars and stay connected with the victim only.

The attack may happen on both Facebook and Google+ as long as the adversary is a friend of the victim. No prior knowledge is required for this attack. The attack on Google+ is similar to the attack on Facebook, with a slight difference. On Facebook, the adversary cannot be connected with the victim unless the victim agrees, since the relationship is mutual. By contrast, the adversary can set up a connection with the victim on Google+ without getting approval from the victim, because the connection is unidirectional. This may make it easier for the adversary to obtain social relationships in the victim's SR set via REC.

We have reported Exploit 4 to Facebook and received confirmation from them. Exploit 4 occurs because the REC functionality is implemented in a separate system not regulated by Facebook's privacy control. To prevent privacy leakage due to Exploit 4, the following necessary condition should be satisfied.

Necessary Condition 4: Given a privacy rule pr = (wl, bl) with white list wl and black list bl for a victim's SR set in an OSN, and the set of all users U included in the SR set, the following condition holds: U ⊆ wl \ bl.
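A check of Necessary Condition 4 under the same toy model is a one-line subset test; the names below are ours.

# Necessary Condition 4: every user recorded in the victim's SR set must
# lie inside the SR set's effective audience, since REC may reveal the
# set's contents to any of them.
def satisfies_nc4(sr_users, white, black):
    return set(sr_users) <= (set(white) - set(black))

sr_users = {"Bob", "Carl", "Derek"}
close_friends = {"Carl", "Derek"}  # Alice's white list for her SR set
print(satisfies_nc4(sr_users, close_friends, set()))  # False: REC informs Bob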

To satisfy Necessary Condition 4, the following mitigation strategy can be applied.



Mitigation 6: Let U = {u_1, u_2, ..., u_m} denote the set of users in a victim's SR set. If the victim shares her SR set with a set of users U′ ⊆ U only, the victim should remove all users in U \ U′ from her SR set in order to mitigate privacy leakage caused by REC.

5.3. SA Set

A user’s SA set contains social activities about what the userdoes. The undisclosed information in SA set protected by ex-isting privacy control mechanisms can be inferred due to thefollowing inherent exploits and implementation defects, includ-ing inferable social activity, ineffective rule update, and invalidhiding list.

5.3.1. Inferable Social Activity

If two users are connected in an OSN, a user's name can be mentioned by the other in a social activity via TAG such that this social activity provides a link to the profile page of the mentioned user. Such links create correlations among all the users involved in the same activity. This causes an inherent exploit named inferable social activity, which corresponds to the information flow shown as solid arrow (2) in Figure 2.

Exploit 5: If a victim's friend uses TAG to mention the victim in a social activity published by the friend, it implies that the victim may also have attended the activity, as indicated by the link created by TAG pointing to the victim's profile page. Although this activity may involve the victim, the visibility of the activity is solely determined by the privacy rules specified by the victim's friend who publishes it, which is out of the victim's control.

An adversary may use Exploit 5 to obtain undisclosed social activities in a victim's SA set, as shown in the following attack on Facebook.

Attack 5: Figure 5 shows a scenario on Facebook, where Bob and Carl are Alice's friends, and Bob is Carl's friend. Alice publishes a social activity in her SA set regarding a party which Carl and she attended together, and she allows Carl only to view this social activity. However, Carl publishes the same social activity in his SA set and mentions Alice via TAG. Due to different concerns in privacy protection, Carl allows all his friends to view this social activity. By viewing Carl's social activity, Bob can infer that Alice attended this party.


Figure 5: Alice’s social activities flow to Carl’s SA set

This attack works on Facebook, Google+, and Twitter. Any adversary can perform this attack if he knows the social activities published by the victim's friends pointing to the victim via TAG. To prevent privacy leakage due to Exploit 5, the following necessary condition should be satisfied.

Necessary Condition 5: Given a privacy rule pr_u = (wl_u, bl_u) for an activity in which a victim v is tagged by her friend u in an OSN, and v's intended privacy rule pr_v = (wl_v, bl_v) for the activity, the following condition holds: wl_u \ bl_u ⊆ wl_v \ bl_v.

To satisfy Necessary Condition 5, the following mitigation strategy can be applied.

Mitigation 7: If a victim is mentioned in a social activity in another user's SA set via TAG, the victim should be able to specify additional privacy rules to address her privacy concerns even though the social activity is not in her profile page.
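One way to realize Mitigation 7, sketched below under our set-based rule model, is to intersect the tagger's effective audience with an audience supplied by the tagged victim. This is a proposal sketch in our own terms, not an existing OSN feature.

# Sketch of Mitigation 7: the audience of a tagged activity is the
# intersection of the tagger's and the tagged victim's effective audiences.
def tagged_audience(tagger_rule, victim_rule):
    (wl_u, bl_u), (wl_v, bl_v) = tagger_rule, victim_rule
    return (set(wl_u) - set(bl_u)) & (set(wl_v) - set(bl_v))

carl_rule = ({"Alice", "Bob", "Derek"}, set())  # Carl: all his friends
alice_rule = ({"Carl", "Derek"}, set())         # Alice: Carl and Derek only
print(tagged_audience(carl_rule, alice_rule))   # {'Derek'}: Bob is excluded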

5.3.2. Ineffective Rule Update

It is common in OSNs that users regret sharing their social activities with the wrong audience. Typical reasons include being in a state of high emotion or under the influence of alcohol [12]. It is thus necessary to allow users to correct their mistakes by revoking the access rights of the unwanted audience. Once the access right for viewing a particular social activity is revoked, a receiver should not be able to view the activity protected by the updated privacy rule. On Facebook, a user can remove a receiver from the local white list specifying who is allowed to view a social activity, or add the receiver to the local black list for the activity. Google+ and Twitter currently do not provide local black lists for individual social activities. A user may remove a receiver from the white list, or from a user group if the user group is used to specify the scope of the white list (e.g. sharing a social activity within a circle on Google+). However, if a user's social activity has already been pushed to her subscribers' feed pages, the update of privacy rules on Google+ and Twitter does not apply to this social activity in feed pages. This causes an implementation defect named ineffective rule update.

Exploit 6: Once a victim publishes a social activity, the social activity is immediately pushed to the feed pages of the victim's subscribers who are allowed to view it according to the victim's privacy rule. Later, even after the victim changes the privacy rule for this activity to disallow a subscriber to view it, the social activity still appears in this subscriber's feed page on Google+ and Twitter. The current implementations of Google+ and Twitter enforce a privacy rule only when a social activity is published and pushed to the corresponding subscribers' feed pages. Updated privacy rules are not applied to the activities which have already been pushed to feed pages (see Figure 6).
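The defect comes down to where the rule is enforced. A toy model of push-time enforcement, in our own terms, shows why a later rule update never reaches the copies that were already pushed:

# Push-time enforcement: the rule is checked once, when copies of the
# activity are placed into subscribers' feed pages.
feeds = {}  # subscriber -> list of pushed activities

def publish(activity, rule, subscribers):
    for s in subscribers:
        if s in rule["white"] - rule["black"]:
            feeds.setdefault(s, []).append(activity)  # copy made at push time

rule = {"white": {"Bob", "Carl"}, "black": set()}
publish("party photo", rule, {"Bob", "Carl"})
rule["black"].add("Bob")  # Alice revokes Bob's access afterwards
print(feeds["Bob"])       # ['party photo']: the pushed copy remains

Read-time enforcement, which re-evaluates the current rule on every view, avoids this defect; per our observations, this matches Facebook's behavior described below.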

An adversary may use Exploit 6 to obtain undisclosed social activities in a victim's SA set without the victim's awareness. The following is a typical attack on Google+.

Attack 6: On Google+, Bob is Alice's friend and subscriber. Alice publishes a social activity and allows only her friends in the group Classmate to view the activity. Alice assigned Bob to the group Classmate by mistake and realized this mistake after publishing the activity. Then, Alice removed Bob from the group. However, Bob can still view this social activity as it has already been pushed to his feed page.



Figure 6: Privacy control does not apply the updated privacy rule to a social activity that has already been pushed to a feed page.

The above attack can happen on Google+ and Twitter. To perform the attack, an adversary should be the victim's friend and subscriber. The attack does not work on Facebook, as privacy control on Facebook actively examines whether the privacy rule for a social activity has been updated. If a privacy rule is updated, the privacy control is immediately applied to the social activity in the corresponding feed pages. Consequently, the social activity is removed from the feed pages. To prevent this attack in affected OSNs such as Google+ and Twitter, the following mitigation strategy can be applied.

Mitigation 8: If a victim mistakenly shares a social activity with an unintended receiver, instead of changing the privacy rules, the victim should delete the social activity as soon as possible so that the social activity is removed from all feed pages.

Note that Mitigation 8 is not effective unless the deletion of the social activity takes place before an adversary views it. If the adversary views the social activity before it is deleted, the adversary could keep a copy of the activity, which cannot be prevented.

5.3.3. Invalid Hiding List

To support flexible privacy control, many OSNs enable users to use black lists to hide information from specific receivers. On Facebook, a local black list is called a hiding list. Using hiding lists, a user may apply fine-grained privacy control to various types of personal information. However, hiding lists take effect only on the user's friends. This causes an implementation defect named invalid hiding list.

Exploit 7: In certain OSNs, a victim may include some of her friends in hiding lists to protect her personal information. However, when a friend breaks his relationship with the victim, the OSN automatically removes him from the hiding lists as the friend relationship terminates. Released from the hiding lists, this former friend is allowed to view the victim's protected information if he is not restricted by other privacy rules.
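A toy model of the defect, in our own terms: the hiding list is maintained as a subset of the friend list, so breaking the friendship silently drops the entry while the friends-of-friends audience rule still grants access.

# Toy model of Exploit 7 on an activity visible to friends of friends.
friends = {"Bob", "Carl"}
hiding_list = {"Bob"}          # Bob may not view the activity
friends_of_friends = {"Bob"}   # Bob is also a friend of Carl, Alice's friend

def may_view(user):
    if user in hiding_list:
        return False
    return user in friends or user in friends_of_friends

print(may_view("Bob"))    # False: hidden while Bob is Alice's friend
friends.discard("Bob")    # Bob breaks the friendship
hiding_list &= friends    # the OSN drops non-friends from the hiding list
print(may_view("Bob"))    # True: Bob re-enters via friends of friends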

The implementation defect behind this exploit creates a false impression of the effectiveness of hiding lists. An adversary may use Exploit 7 to obtain undisclosed social activities in a victim's SA set without the victim's awareness. A typical attack on Facebook is given below.

Attack 7: On Facebook, Bob and Carl are Alice's friends. Bob is Carl's friend, which means Bob is also a friend of Alice's friend. Alice publishes a social activity which her friends and friends of friends are allowed to view, except that Bob is added to the hiding list of this activity. Although Bob cannot view this activity under the current privacy rule, he can break his connection with Alice. Then, he is automatically removed from the hiding list. After that, Bob is able to view the undisclosed activity since he is a friend of Alice's friend.

Note that this attack does not work on Google+ or Twitter because their current privacy control mechanisms do not support local black lists. Also note that Exploit 7 can be used to target not only the SA set, but also the PP set and SR set.

We have reported Exploit 7 to Facebook and received a confirmation from them³. To prevent this attack in affected OSNs such as Facebook, the following mitigation strategy can be applied.

³ Exploit 7 has been fixed by Facebook in 2013.

Mitigation 9: A victim should avoid using hiding lists when protecting personal information. Instead, a victim may use white lists or global black lists in forming privacy rules.

6. Analysis of Potentially Vulnerable Users

A user’s personal information in OSNs could be leaked toadversaries who acquire necessary capabilities to perform theattacks, which have been discussed in Section 5. The effective-ness of the attacks can be affected by users’ and their friends’sharing behaviors in OSNs. To investigate the users who can bevulnerable to these attacks, we conducted an online survey andcollected users’ usage data on Facebook, Google+, and Twitter.In this section, we first describe the design of the online survey.We then present the demographic data collected in the survey.Based on the survey results, we analyze how widely the usersin OSNs can be vulnerable to the corresponding attacks.

6.1. Methodology

The participants in our online survey were mainly recruited from undergraduate students at our university. We focus on young students because they are active users of OSNs. Our study shows that they are particularly vulnerable to the privacy attacks. Each participant uses at least one OSN among Facebook, Google+, and Twitter.

The survey questionnaire consists of four sections with 37 questions in total. In the first section, we gave an initial set of demographic questions and a set of general questions, such as participants' awareness of privacy and which OSNs (i.e. Facebook, Google+, and Twitter) they use. All participants needed to answer the questions in the first section. The following three sections raise questions about participants' knowledge of and privacy attitudes towards Facebook, Google+, and Twitter, respectively. Each participant only needed to answer the questions in these three sections that were relevant to him/her.

6.2. Demographics

There are 97 participants in total, among whom 60 reported being male and 37 reported being female. Our participants' ages range from 18 to 31, with an average of 22.7.




All 97 participants are Facebook users, among whom 95 have been using Facebook for more than 1 year, and 2 have been using Facebook for less than 1 month. About half of the participants (41/97) are Google+ users, among whom 23 have been using Google+ for more than 1 year, 13 for between 1 month and 1 year, and 5 for less than 1 month. Similarly, about half of the participants (40/97) are Twitter users, among whom 36 have been using Twitter for more than 1 year, 3 for between 1 month and 1 year, and 1 for less than 1 month.

6.3. Attacks To PP Set

To obtain the undisclosed personal information in a victim's PP set, adversaries could use the inferable personal particulars and cross-site incompatibility exploits to launch the two corresponding attacks discussed below.

6.3.1. Inferable Personal Particulars

As discussed in Section 5.1.1, due to inferable personal particulars (Exploit 1), a victim and most of his/her friends may share common or similar personal particulars. Our study results show that 71% of the Facebook users are connected with their classmates on Facebook; 78% of the Google+ users are connected with their classmates on Google+; and 73% of the Twitter users are connected with their classmates on Twitter.

Via Exploit 1, an adversary could perform Attack 1 and infer a victim's personal particular from the personal particulars shared by most of her friends. To perform Attack 1, two types of knowledge are required: a large portion of the users stored in the victim's SR set, and their personal particulars.

The protection of the victim’s SR set could help prevent theadversary from obtaining the victim’s relationships. Unfortu-nately, our study shows that 22% of the Facebook users, 39%of the Google+ users, and 35% of the Twitter users choose the“Public” privacy rule or the default privacy rule4 for their socialrelationships, which means that these users share their socialrelationships with the public. Moreover, the OSNs users mayconnect to strangers. According to our study, 60% of the Face-book users, 27% of the Google+ users, and 30% of the Twitterusers have set up connections with strangers, which leave theirSR set information vulnerable to Exploit 4 (unregulated rela-tionship recommendation) as discussed in Section 5.2.2.

The privacy rules for the personal particulars of the victim's friends can be set to prevent the adversary from obtaining the second type of knowledge required in Attack 1. However, the victim's personal particulars can be exposed to threats if his/her friends publicly share their personal particulars. In our study, 43% of the Facebook users, 44% of the Google+ users, and 48% of the Twitter users share their personal particulars publicly because they choose the "Public" privacy rule or the default privacy rule⁵.

⁴ Facebook, Google+, and Twitter set "Public" as the default privacy rule for the SR set of each user.

⁵ Facebook, Google+, and Twitter set "Public" as the default privacy rule for each user's personal particulars such as "university" information.

6.3.2. Cross-site Incompatibility

Users may use multiple OSNs at the same time. According to our survey, 54 out of 97 participants use at least two OSNs, as shown in Figure 7, and 27 participants publish their posts in more than one OSN at the same time, as shown in Figure 8. If a user publishes personal information in multiple OSNs, he/she may set different privacy control rules across sites and become vulnerable to Exploit 2, i.e., cross-site incompatibility.

[Figure 7 (Participants' usage of multiple OSNs): 43 participants use 1 OSN, 27 use 2 OSNs, and 27 use 3 OSNs.]

[Figure 8 (Participants publishing posts in multiple OSNs): 70 participants publish in 1 OSN, 11 in 2 OSNs, and 16 in 3 OSNs.]

Due to Exploit 2, an adversary can perform Attack 2 if the victim shares her personal information with the adversary on any OSN site. This attack is possible for three reasons.

The first reason is that users employ inconsistent privacy rules in different OSNs. The results of our study show that 27 out of 97 participants use inconsistent privacy rules to protect their gender information, 25 participants use inconsistent privacy rules to protect their university information, and 21 participants use inconsistent privacy rules to protect their political view information.

The second reason is that users maintain different social relationships in different OSNs. According to the study, 59 out of 97 participants reported that their social relationships on Facebook, Google+, and Twitter are different. Therefore, even if users protect their information with the same privacy rules on multiple OSNs, an adversary can still obtain their information if he can exploit this vulnerability.

The third reason is the difference between the privacy control mechanisms of different OSNs. The protection of gender information, discussed in Section 5.1.2, is a typical example.

6.4. Attacks To SR Set

Adversaries could obtain the social relationships in a victim's SR set through two exploits: inferable social relationship and unregulated relationship recommendation.

6.4.1. Inferable Social Relationship

Inferable social relationship (Exploit 3) is caused by the storage format of social relationships in the SR set, as explained in Section 5.2.1. If two users set up a relationship with each other, then each of them stores a copy of the relationship in his/her SR set and chooses a privacy rule to protect his/her SR set.

Via Exploit 3, an adversary could perform Attack 3 given two types of knowledge: a list of users in the victim's SR set, and the social relationships in these users' SR sets. Therefore, the protection of the social relationships in the victim's SR set depends on the privacy rules for the SR sets of the users in the victim's SR set. Unfortunately, as mentioned in Section 6.3.1, 22% of the Facebook users, 39% of the Google+ users, and 35% of the Twitter users share their SR sets publicly. These users reveal their social relationships with their friends publicly regardless of the privacy rules for their friends' SR sets.

6.4.2. Unregulated Relationship Recommendation

The REC functionality helps users establish more social relationships. According to our study, 71 out of 97 Facebook users, 21 out of 41 Google+ users, and 17 out of 40 Twitter users have used the REC functionality in OSNs. Unregulated relationship recommendation (Exploit 4) could leak all social relationships in a user's SR set due to REC's automatic relationship recommendation.

By Exploit 4, an adversary can perform Attack 4 to obtain all social relationships in a victim's SR set on Facebook or Google+ if the adversary manages to become a "friend" of the victim.

As shown in Figure 9, 4% of the Facebook users and 7% of the Google+ users choose to share their SR set with a proper subset of their friends6. Exploit 4 explicitly violates these users' privacy rules.

6 An empty subset corresponds to the privacy rule "Only me".

[Figure 9 (Privacy rules for participants' SR sets in OSNs): proportion of Facebook, Google+, and Twitter users choosing the default privacy rule, Public, Friends of friends, Friends, Subset of friends, or NIL.]

Although most of the Facebook users and the Google+ users share their SR sets with friends, friends of friends, or the public, their selection of privacy rules may contradict their privacy attitudes.

In Figure 9, 53% of the Facebook users share their SR sets with friends of friends or publicly7. Among the Facebook users who share their SR sets with friends of friends or the public, 88% express concerns about their social relationships being revealed to others whom they don't know.

Among the Google+ users, 36% share their SR sets with friends of friends or the public. However, 71% of the Google+ users who share their SR sets with friends of friends or the public are not willing to reveal their social relationships to strangers.

As shown in our survey, 43% of the Facebook users and 20% of the Google+ users have concerns about revealing their social relationships to strangers but have nevertheless included strangers in their SR sets. This may leak these users' social relationships to the strangers irrespective of any privacy rules chosen to protect their SR sets.

6.5. Attacks To SA Set

To obtain the social activity information in a victim's SA set, adversaries could perform three attacks based on three exploits: inferable social activity, ineffective rule update, and invalid hiding list.

6.5.1. Inferable Social Activity

In OSNs, if a user is mentioned in his/her friends' social activities via TAG, the privacy rule for each such activity is determined by the friend and is out of this user's control. This leads to inferable social activity (Exploit 5).

7 The "Public" privacy rule and the default privacy rule both lead to sharing the SR set publicly.

Via Exploit 5, an adversary may infer a victim's social activities from the victim's friends' SA sets. As shown in Figure 10, 99% of the Facebook users, 44% of the Google+ users, and 78% of the Twitter users have experience of being tagged in activities. On the other hand, 36% of the Facebook users, 34% of the Google+ users, and 40% of the Twitter users have concerns about being tagged in certain activities published by their friends without any negotiation. Since their friends determine the visibility of the activities, these users can only inform their friends of their concerns. Our results show that 82% of the Facebook users, 73% of the Google+ users, and 73% of the Twitter users would inform their friends of their concerns if they don't agree on being tagged by their friends. The rest keep silent even though their privacy could be violated.

[Figure 10 (Participants being mentioned in OSNs): proportion of Facebook, Google+, and Twitter users who have / have not been mentioned.]

6.5.2. Ineffective Rule Update

As discussed in Section 5.3.2, if a user changes his/her privacy rules for social activities, the updated privacy rules do not apply to the activities that have already been pushed to the feed pages of the user's subscribers. This is referred to as ineffective rule update (Exploit 6).

Via Exploit 6, an adversary could perform Attack 6 on Google+ and Twitter and obtain a victim's activities that were shared with the adversary before the privacy rules were updated.

Users may change their privacy rules if they regret publishing their activities. According to our study, 15% of the Google+ users and 15% of the Twitter users have experience of regretting publishing their posts. As shown in Figure 11, 20% of the Google+ users choose to change their privacy rules if they regret sharing activities, while 38% of the Twitter users do so by turning on the "protect my tweets" option.

To mitigate Exploit 6, users may delete the activities they regret sharing as soon as possible. We found that 61% of the Google+ users and 23% of the Twitter users choose to do so.

[Figure 11 (Participants' actions if regretting sharing activities, Google+ vs. Twitter): change privacy rule, 8 vs. 15; delete activity, 25 vs. 9; comment, 3 vs. 4; do nothing, 11 vs. 13.]

6.5.3. Invalid Hiding List

On Facebook, if a user protects his/her social activity by using a hiding list that includes some of the user's friends, these friends are automatically removed from the hiding list after they terminate their relationships with the user. This is referred to as the invalid hiding list (Exploit 7).

Via Exploit 7, an adversary could perform Attack 7 to obtain a victim's social activities if the victim uses the "friends of friends" privacy rule with a hiding list containing the adversary. Our study shows that 54% of the Facebook users have used the "friends of friends" privacy rule with a hiding list that includes their friends when publishing activities. To evaluate the awareness of the risks caused by the invalid hiding list, we summarized the participants' confidence levels regarding whether their activities are hidden from the friends included in their hiding lists on Facebook. As shown in Figure 12, 31% (30 out of 97) of the Facebook users feel confident in the effectiveness of the hiding list on Facebook. If Attack 7 happens, these participants may misjudge the validity of the hiding lists and still believe that their activities are hidden from the friends included in them.

[Figure 12 (Users' confidence in the validity of the Facebook hiding list), on a 5-point scale from 1 (not confident at all) to 5 (very confident): 14, 11, 42, 21, and 9 participants, respectively.]
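The mechanics of Exploit 7 can be sketched in a few lines of Python (our reconstruction of the observed behavior; the class and names are illustrative only): unfriending silently prunes the hiding list, so an ex-friend who remains a friend of a friend regains visibility under the "friends of friends" rule.

class Activity:
    def __init__(self, rule, hiding_list):
        self.rule = rule                     # e.g. "friends of friends"
        self.hiding_list = set(hiding_list)  # friends denied access

def unfriend(friends, activities, ex_friend):
    # Observed Facebook behavior (Exploit 7): terminating a relationship
    # silently removes the ex-friend from the hiding lists as well.
    friends.discard(ex_friend)
    for activity in activities:
        activity.hiding_list.discard(ex_friend)

alice_friends = {"bob", "carl"}
post = Activity("friends of friends", hiding_list={"bob"})
unfriend(alice_friends, [post], "bob")
# Bob is still Carl's friend, hence a friend of a friend of Alice, and he
# is no longer on the hiding list: the post becomes visible to him again.
print(post.hiding_list)  # -> set()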

7. Discussion

On the surface, our exploits are caused by the inconsistencies between privacy control and the functionalities of OSNs. In fact, these inconsistencies reflect the conflicts between users' intentions regarding privacy protection and the social/business values of OSNs. In this section, we discuss the implications of these conflicts and their impact on users' sharing behavior.

Most of the functionalities involved in our exploits are essential in OSNs. These functionalities deal with personal particulars, social relationships, and social activities. While the social values of these functionalities should be preserved from a user's perspective, they are restricted due to privacy controls.

First, exhibiting personal particulars is an important feature for social recognition. Most OSNs encourage users to share genuine information about their personal particulars in order to foster trust and respect in OSNs [13]. This helps users discover new relationships with those who have similar interests. It is explained by the homophily theory [6, 7], which states that a human being is more willing to interact with others who have similar personal particulars such as race, organization, and education. Meanwhile, the implicit connections among users may be exploited to infer undisclosed personal particulars. According to [14, 2, 15], 62% of users consider the personal particulars they publish in OSNs to be sensitive. To mitigate this threat, mitigations 1 and 2 require these users to connect with dissimilar users whom they may not even like.

Second, maintaining and expanding social relationships is one of the major benefits of OSNs. As socially oriented beings, humans have a desire to stay connected so that they have a sense of communion with others [16]. This desire is addressed in OSNs with the relationship list and the recommendation function. Although the public display of a user's relationship list may disclose certain private information, it also helps build more connections in OSNs. If a user's profile contains a large number of connections, it brings satisfactory social recognition for the user [17]. The recommendation function further makes it easier to establish new connections based on relationship lists and other information. This is especially important for new users to make friends in OSNs. The current recommendation function operates according to the small-world theory [11], which states that two connected users are likely to have common friends who are not yet recorded in their current relationship lists. This function can also be exploited by an adversary to enumerate all social relationships of a victim. 45% of users believe that their social relationships in OSNs are sensitive [14, 15]. Since the disclosure of social relationships can be a stepping stone for advanced attacks on personal particulars and social activities, the protection of social relationships is also important to those who consider personal particulars and social activities sensitive [14, 15, 2]. To mitigate the privacy leakage of social relationships, a user may apply mitigations 5 and 6. The consequences of applying these mitigation strategies are: 1) if a user sets up a strict privacy rule on his relationship list, this rule should propagate to all users in his relationship list; 2) the effectiveness of the recommendation function would be significantly reduced by such mitigations.

Third, sharing social activities is an important part of human social life. Human beings are curious about what happens around them. They would like to understand the surrounding environment by knowing how other people behave, think, and feel [18]. OSNs enable users to receive the activities published by other users to satisfy such curiosity. On the other hand, users who publish activities feel rewarded by the attention of other users, which is usually interpreted as a sign of social recognition [19]. Since a social activity usually involves multiple users, sharing an activity may conflict with the privacy concerns of these users. Social activities are considered sensitive information by 66% of users [14, 15]. For these users, in order to mitigate this threat, the scope of privacy control in OSNs should be extended as described in mitigation 7, which enforces privacy control on an activity no matter who publishes it. An application of mitigation 7 can be a privacy policy negotiation mechanism [20, 21], which seeks a tradeoff between the users' privacy intentions and their desire to share social activities by requiring the users to choose their privacy rules and the sensitivity of each activity before publishing it. However, this may frustrate users who intend to share that activity, and might be difficult to achieve due to the incompatibility among privacy control mechanisms in different OSNs. As suggested in mitigations 3 and 4, a user may choose a strict privacy rule so as to achieve his privacy objective. However, this may significantly restrict the sharing nature of OSNs.
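As a rough sketch of what mitigation 7 entails, the effective audience of a co-owned activity could be restricted to the intersection of the audiences permitted by all involved users, in the spirit of the negotiation mechanisms in [20, 21]. The Python sketch below is our own illustration, not the exact mechanism proposed in those papers:

def effective_audience(white_list, black_list):
    # Receivers actually permitted by a privacy rule pr = (wl, bl).
    return set(white_list) - set(black_list)

def negotiated_audience(rules):
    # Intersect the effective audiences of all users involved in a
    # co-owned activity, so that no involved user's rule is overridden
    # by the publisher's rule alone.
    audiences = [effective_audience(wl, bl) for wl, bl in rules]
    result = audiences[0]
    for audience in audiences[1:]:
        result &= audience
    return result

# Carl tags Alice in a party photo: it is shown only to receivers that
# both Carl's rule and Alice's rule allow.
carl_rule = ({"alice", "bob", "derek"}, set())
alice_rule = ({"carl"}, set())
print(negotiated_audience([carl_rule, alice_rule]))  # -> set()

The empty result in this example illustrates why negotiation can frustrate sharing: the strictest rule always wins.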

While OSN users are concerned with the social values of OSN functionalities, OSN service providers are more concerned with business values. As a company, the first priority of an OSN service provider is to generate revenue. However, most existing OSN service providers do not charge their users. As Andrew Lewis pointed out, "If you're not paying for something, you're not the customer; you're the product being sold." This is exactly what OSN service providers do, monetizing user-generated content by maintaining an OSN-based ecosystem. One of the most successful OSN-based business models, targeted advertising [22], usually demands high-quality personal information and a large number of connected individuals [13, 23, 24, 25, 26]. Thus, an OSN service provider has a strong incentive to encourage users to share personal information and connect to more users.

To address users’ concern about disclosure of sensitive per-sonal information, the existing OSN service provider, suchFacebook and Google+, provide fine-grained privacy rules forusers. As these privacy rules become more complex, userscould leak their sensitive information due to misunderstand-ing and misusing the privacy rules [12]. Thus efforts are spenton improving the usability of the privacy control by automaticcontent-based reminders [12, 27]. For example, an automaticprivacy rule recommendation tool is proposed to deduce users’sharing preferences based on the users’ sharing content in orderto help users avoid misconfiguration of the privacy rules [27].However, the above methods cannot completely resolve theidentified exploits. And these methods and almost all mitiga-tions discussed in the paper add additional restrictions on user

13

Page 15: Privacy leakage analysis in online social networks

MANUSCRIP

T

ACCEPTED

ACCEPTED MANUSCRIPT

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

generated contents published or shared in OSNs.

8. Related Work

Due to the wide adoption of OSNs, the privacy problem of OSNs has attracted strong interest among researchers. We summarize the related work in this area in terms of attacks, privacy settings, and access control models.

The attack techniques proposed in prior literature mainly focus on inferring users' identities [28] and other personal information [2, 5, 4] from public information shared in OSNs. Zheleva et al. [2] proposed a classification-based approach to infer users' undisclosed personal particulars from their publicly shared social relationships and group information. Chaabane et al. [4] proposed to infer users' undisclosed personal particulars from publicly shared interests and the public personal particulars of other users who have similar interests. Balduzzi et al. [5] utilized email addresses as unique identifiers to identify and link user profiles across several popular OSNs. Since users' information may be shared publicly in one OSN but not in another, certain hidden information can be revealed by combining public information collected from different OSNs. The effectiveness of these attacks largely depends on the quality of public information, which can be affected by users' awareness of privacy concerns. As reported in [4], only 18% of Facebook users now publicly share their social relationships and 2% of Facebook users publicly share their dates of birth. Thus, it is more realistic to analyze the threats caused by more powerful adversaries or insiders, as in our analysis.

The threat of privacy leakage caused by insiders is also mentioned by Johnson et al. [3]. They investigated users' privacy concerns on Facebook and discovered that the privacy control mechanisms in existing OSNs help users manage outsider threats effectively but cannot mitigate insider threats, because users often wrongly include inappropriate audiences as members of their friend networks. Wang et al. [12] analyzed the reasons why users wrongly configure privacy settings and provided suggestions for users to avoid such mistakes. To help users handle complex privacy policy management, Cheek et al. [29] proposed two approaches using clustering techniques to assist users in grouping friends and setting appropriate privacy rules. However, as shown in our work, privacy leakage could still happen even if a user correctly configures his privacy settings, due to the exploits caused by inherent conflicts between privacy control and OSN functionalities.

Some researchers have addressed the privacy control problem through traditional access control modeling. Several models [9, 8] have been established to provide more flexible and fine-grained control so as to increase the expressive power of privacy control models. Nevertheless, this is not sufficient to guarantee effective privacy protection. As our analysis of information flows shows, OSN functionalities may be affected by privacy control. On the other hand, a more complex privacy control model increases users' burden in configuring privacy rules.

One of the exploits found in our work (Exploit 5) is also mentioned in previous research on resolving privacy conflicts in collaborative data sharing. Wishart et al. [20] and Hu et al. [21] analyzed co-owned information disclosure due to conflicts among privacy rules set by multiple owners. They also introduced a negotiation mechanism to seek a balance between the risk of privacy leakage and the benefit of data sharing. Compared to these works, our work investigates a broader range of privacy threats in OSNs, discovers the underlying conflicts between privacy control and the social/business values of OSNs, and analyzes the difficulty of resolving these conflicts, which had not been addressed in previous work.

9. Conclusion

In this paper, we investigated privacy leakage under privacy control in online social networks. Our analysis showed that privacy leakage could still happen even after users correctly configure their privacy settings. We examined real-world OSNs including Facebook, Google+, and Twitter, and discovered the exploits which lead to privacy leakage. Based on the findings, a series of attacks were introduced for adversaries with different capabilities to learn undisclosed personal information. We analyzed the necessary conditions of these attacks and provided suggestions for users to mitigate privacy leakage in OSNs. We conducted a user study to investigate the users potentially vulnerable to the attacks. In the end, we discussed the implications of resolving privacy leakage in OSNs.

Appendix A. Proof of Necessary Condition 1

The input of an adversary includes two types of knowledge about a victim: a subset $U = \{u_1, u_2, \ldots, u_n\}$ of a victim $v$'s SR set in an OSN, and the personal particular value $pp_{u_i}$ ($pp_{u_i} \neq \text{null}$) of each receiver $u_i \in U$. The adversary may infer the victim's personal particular $pp_v$ ($pp_v \neq \text{null}$) by calculating the common personal particular value shared by most of the victim's friends with Algorithm 1.

Algorithm 1 Infer Personal Particular
Input: $U = \{u_1, u_2, \ldots, u_n\}$; $pp_{u_1}, pp_{u_2}, \ldots, pp_{u_n}$
Output: $pp_{\mathit{infer}}$
1: compute the set of distinct values $PP = \{pp_1, pp_2, \ldots, pp_m\}$ from $pp_{u_i}$ for all $i \in \{1, 2, \ldots, n\}$
2: for all $j \in \{1, 2, \ldots, m\}$ do
3:   calculate $U_{pp_j} \subseteq U$ such that for all $u \in U_{pp_j}$, $pp_u = pp_j$
4: end for
5: if there exists $U_{pp_t}$ such that $|U_{pp_t}| > |U_{pp_s}|$ for all $s \in \{1, 2, \ldots, m\}$ and $t \neq s$ then
6:   return personal particular value $pp_t$
7: else
8:   return null
9: end if

Given these inputs, if Algorithm 1 returns a value $pp_{\mathit{infer}}$ equal to the victim's personal particular $pp_v$, then the victim's personal particular information is leaked to the adversary.
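For concreteness, the following Python sketch implements Algorithm 1 (the function and variable names are our own illustrative choices). It returns the strictly most frequent particular among the values visible to the adversary, or None when no value wins outright:

from collections import Counter

def infer_personal_particular(visible_particulars):
    # visible_particulars: values pp_{u_1}, ..., pp_{u_n} collected from
    # the users in the victim's SR set; None marks undisclosed values.
    counts = Counter(pp for pp in visible_particulars if pp is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    # Require a strict winner: |U_{pp_t}| > |U_{pp_s}| for all s != t.
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    return None

# Attack 1 example: most of Alice's friends expose the same employer.
print(infer_personal_particular(
    ["XXX Agency", "XXX Agency", "ACME", "XXX Agency", None]))
# -> XXX Agency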


Appendix B. Proof of Necessary Condition 2

A victim uses privacy rules $pr_1, pr_2, \ldots, pr_n$ to protect her personal particular published in $OSN_1, OSN_2, \ldots, OSN_n$, respectively, where each privacy rule $pr_i = (wl_i, bl_i)$ contains a white list $wl_i$ and a black list $bl_i$. Assume there are two privacy rules $pr_t$ and $pr_j$ such that $wl_t \setminus bl_t \neq wl_j \setminus bl_j$, where $t, j \in \{1, 2, \ldots, n\}$ and $t \neq j$; then, without loss of generality (swapping $t$ and $j$ if necessary), we have $U_{\mathit{diff}} = (wl_t \setminus bl_t) \setminus (wl_j \setminus bl_j) \neq \emptyset$. If an adversary $adv \in U_{\mathit{diff}}$, then the victim's personal information is leaked to the adversary, although the information is supposed to be hidden from the adversary by $pr_j$ on $OSN_j$.

Appendix C. Proof of Necessary Condition 3

A victim $v$ sets the privacy rule $pr_v = (wl_v, bl_v)$ for her SR set, with white list $wl_v$ and black list $bl_v$. The victim's SR set includes a set of users $U = \{u_1, u_2, \ldots, u_n\}$. Each user $u_i$ sets the privacy rule $pr_i = (wl_i, bl_i)$ for his/her SR set, with white list $wl_i$ and black list $bl_i$, for all $i \in \{1, 2, \ldots, n\}$. Assuming an adversary $adv$ is not in $wl_v \setminus bl_v$, the adversary is not allowed to view any relationships in the victim's SR set. If there is a privacy rule $pr_t$ such that $wl_t \setminus bl_t$ is not a subset of $wl_v \setminus bl_v$, where $t \in \{1, 2, \ldots, n\}$, then we have $U_{\mathit{diff}} = (wl_t \setminus bl_t) \setminus (wl_v \setminus bl_v) \neq \emptyset$. Assuming $adv \in U_{\mathit{diff}}$, the relationship between user $u_t$ and victim $v$ is known by adversary $adv$, although the information in the victim's SR set should be hidden from $adv$ by $pr_v$.

Appendix D. Proof of Necessary Condition 4

A victim sets a privacy rule $pr_v = (wl_v, bl_v)$ for her SR set, with white list $wl_v$ and black list $bl_v$. The victim's SR set includes a set of users $U = \{u_1, u_2, \ldots, u_n\}$. Assuming that $U$ is not a subset of $wl_v \setminus bl_v$, we have $U_{\mathit{diff}} = U \setminus (wl_v \setminus bl_v) \neq \emptyset$. If adversary $adv \in U_{\mathit{diff}}$, then the REC functionality recommends almost all users in $U$ to $adv$. Note that these users should be hidden from $adv$ by privacy rule $pr_v$ because $adv$ is not in $wl_v \setminus bl_v$.
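A toy recommender makes the condition concrete. In this Python sketch (our simplification; deployed REC implementations are more elaborate), the recommendation is computed from the victim's SR set without ever consulting the rule protecting it:

def recommend(viewer, victim_sr_set, victim_rule):
    # Naive relationship recommendation based on the victim's SR set.
    # The leak: victim_rule = (wl, bl) is never consulted, so a viewer
    # outside wl \ bl still receives users from the protected SR set.
    return {u for u in victim_sr_set if u != viewer}

# Alice hides her SR set from everyone (empty white list, "Only me"),
# yet her friend Bob still receives her relationships as suggestions.
alice_sr_set = {"carl", "derek", "erin"}
print(recommend("bob", alice_sr_set, (set(), set())))
# -> {'carl', 'derek', 'erin'}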

Appendix E. Proof of Necessary Condition 5

Given a privacy rule $pr_u = (wl_u, bl_u)$, with white list $wl_u$ and black list $bl_u$, for an activity in which victim $v$ is mentioned by her friend $u$, any receiver in $wl_u \setminus bl_u$ is allowed to view the activity. We assume that $v$'s intended privacy rule for the activity is $pr_v = (wl_v, bl_v)$, with white list $wl_v$ and black list $bl_v$. If $wl_u \setminus bl_u$ is not a subset of $wl_v \setminus bl_v$, then we have $U_{\mathit{diff}} = (wl_u \setminus bl_u) \setminus (wl_v \setminus bl_v) \neq \emptyset$. Assuming $adv \in U_{\mathit{diff}}$, $adv$ can obtain the activity published by $u$ although the victim's privacy rule $pr_v$ prevents $adv$ from viewing the activity.

References

[1] C. Aquino, http://blog.comscore.com/2012/01/its a social world.html.
[2] E. Zheleva, L. Getoor, To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles, in: Proceedings of the 18th International Conference on World Wide Web, 2009, pp. 531–540.
[3] M. Johnson, S. Egelman, S. M. Bellovin, Facebook and privacy: it's complicated, in: Proceedings of the Eighth Symposium on Usable Privacy and Security, 2012, pp. 9:1–9:15.
[4] A. Chaabane, G. Acs, M. A. Kaafar, You are what you like! Information leakage through users' interests, in: Proceedings of the 19th Annual Network & Distributed System Security Symposium, 2012.
[5] M. Balduzzi, C. Platzer, T. Holz, E. Kirda, D. Balzarotti, C. Kruegel, Abusing social networks for automated user profiling, in: Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection, 2010, pp. 422–441.
[6] M. McPherson, L. Smith-Lovin, J. M. Cook, Birds of a feather: Homophily in social networks, Annual Review of Sociology 27 (2001) 415–444.
[7] D. Centola, J. C. Gonzalez-Avella, V. M. Eguiluz, M. S. Miguel, Homophily, cultural drift, and the co-evolution of cultural groups, The Journal of Conflict Resolution 51 (6) (2007) 905–929.
[8] B. Carminati, E. Ferrari, R. Heatherly, M. Kantarcioglu, B. Thuraisingham, A semantic web based framework for social network access control, in: Proceedings of the 14th ACM Symposium on Access Control Models and Technologies, 2009, pp. 177–186.
[9] P. Fong, M. Anwar, Z. Zhao, A privacy preservation model for Facebook-style social network systems, in: Computer Security – ESORICS 2009, Vol. 5789, 2009, pp. 303–320.
[10] C. Tang, K. Ross, N. Saxena, R. Chen, What's in a name: A study of names, gender inference, and gender behavior in Facebook, in: Database Systems for Advanced Applications, Vol. 6637 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2011, pp. 344–356.
[11] D. J. Watts, Small Worlds: The Dynamics of Networks between Order and Randomness, Princeton University Press, 1999.
[12] Y. Wang, G. Norcie, S. Komanduri, A. Acquisti, P. G. Leon, L. F. Cranor, "I regretted the minute I pressed share": a qualitative study of regrets on Facebook, in: Proceedings of the Seventh Symposium on Usable Privacy and Security, 2011, pp. 10:1–10:16.
[13] Facebook, http://sec.gov/Archives/edgar/data/1326801/000119312512034517/d287954ds1.htm.
[14] A. Yamada, T. H.-J. Kim, A. Perrig, Exploiting privacy policy conflicts in online social networks, Technical report, Carnegie Mellon University, 2012, pp. 1–9.
[15] R. Dey, Z. Jelveh, K. Ross, Facebook users have become much more private: A large-scale study, in: Proceedings of the 10th Pervasive Computing and Communications Workshops, 2012, pp. 346–352.
[16] K. M. Sheldon, B. A. Bettencourt, Psychological need-satisfaction and subjective well-being within social groups, British Journal of Social Psychology 41 (1) (2002) 25–38.
[17] E. Hei-man TS, An ethnography of social network in cyberspace: The Facebook phenomenon, The Hong Kong Anthropologist 2 (2008) 53–77.
[18] B. Renner, Curiosity about people: The development of a social curiosity measure in adults, Journal of Personality Assessment 87 (3) (2006) 305–316.
[19] R. L. Hotz, Science reveals why we brag so much, http://online.wsj.com/article/SB10001424052702304451104577390392329291890.html?mod=googlenews wsj.
[20] R. Wishart, D. Corapi, S. Marinovic, M. Sloman, Collaborative privacy policy authoring in a social networking context, in: Proceedings of the 2010 IEEE International Symposium on Policies for Distributed Systems and Networks, 2010, pp. 1–8.
[21] H. Hu, G.-J. Ahn, J. Jorgensen, Detecting and resolving privacy conflicts for collaborative data sharing in online social networks, in: Proceedings of the 27th Annual Computer Security Applications Conference, 2011, pp. 103–112.
[22] T. J. Spaulding, How can virtual communities create value for business?, Electronic Commerce Research and Applications (2010) 38–49.
[23] W. F. Brown, The determination of factors influencing brand choice, Journal of Marketing 14 (5) (1950) 699–706.
[24] T. E. Coffin, A pioneering experiment in assessing advertising effectiveness, Journal of Marketing 27 (3) (1963) 1–10.
[25] L. D. Wolin, P. Korgaonkar, Web advertising: gender differences in beliefs, attitudes and behavior, Internet Research 13 (5) (2003) 375–385.
[26] P. J. Danaher, G. W. Mullarkey, Factors affecting online advertising recall: A study of students, Journal of Advertising Research 43 (3) (2003) 252–267.
[27] A. Sinha, Y. Li, L. Bauer, What you want is not what you get: Predicting sharing policies for text-based content on Facebook, in: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, AISec '13, ACM, 2013, pp. 13–24.
[28] L. Backstrom, C. Dwork, J. Kleinberg, Wherefore art thou R3579X?: anonymized social networks, hidden patterns, and structural steganography, in: Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 181–190.
[29] G. P. Cheek, M. Shehab, Policy-by-example for online social networks, in: Proceedings of the 17th ACM Symposium on Access Control Models and Technologies, SACMAT '12, 2012, pp. 23–32.


Biographical Sketch

Yan Li

Yan Li is currently a Ph.D. candidate in Information Systems at Singapore Management University. He received his Master's degree from Peking University in 2009. His current research interests include privacy and security in social networks, face biometric-based user authentication, and liveness detection for face authentication.

Yingjiu Li

Yingjiu Li is currently an Associate Professor in the School of Information Systems at Singapore Management University. He received his Ph.D. degree in Information Technology from George Mason University in 2003. His research interests include RFID security, applied cryptography, and data application security. He has published over 70 technical papers in international conferences and journals. He has served on the program committees of over 50 international conferences and workshops. Yingjiu Li is a senior member of the ACM and a member of the IEEE.

Qiang Yan

Qiang Yan is currently a privacy engineer at Google. He received his Ph.D. degree in Information Systems from Singapore Management University in 2013. His research interests include mobile security and privacy, human factors in security system design, and security and privacy issues in social networks.

Robert H. Deng

Robert H. Deng is currently a professor and Associate Dean for Faculty and Research in the School of Information Systems at Singapore Management University. He received his Ph.D. degree from the Illinois Institute of Technology. He has more than 200 technical publications in international conferences and journals in the areas of computer networks, network security, and information security. He has served as general chair, program committee chair, and program committee member of numerous international conferences. He is an Associate Editor of the IEEE Transactions on Dependable and Secure Computing and an Associate Editor of the Security and Communication Networks Journal (John Wiley).


Think Twice before You Share: Analyzing Privacy Leakage under Privacy Control in Online Social Networks

Yan Li, Yingjiu Li, Qiang Yan, and Robert H. Deng

School of Information Systems, Singapore Management University
{yan.li.2009, yjli, qiang.yan.2008, robertdeng}@smu.edu.sg

Abstract. Online Social Networks (OSNs) have become one of the major platforms for social interactions. Privacy control is deployed in popular OSNs to protect users' data. However, users' sensitive information could still be leaked even when privacy rules are properly configured. We investigate the effectiveness of privacy control against privacy leakage from the perspective of information flow. Our analysis reveals that the existing privacy control mechanisms do not protect the flow of personal information effectively. By examining typical OSNs including Facebook, Google+, and Twitter, we discover a series of privacy exploits caused by the conflicts between privacy control and OSN functionalities. Our analysis reveals that the effectiveness of privacy control may not be guaranteed as most OSN users expect.

Keywords: Online Social Network, Privacy Control, Attacks, Information Flow

1 Introduction

Online Social Network services (OSNs) have become an essential element of modern life, where massive amounts of personal data are published. Prior research [9, 3, 1] shows that it is possible to infer undisclosed personal data from publicly shared information. However, the availability and quality of the public data that cause privacy leakage are decreasing due to two factors: 1) privacy control mechanisms have become a standard feature of OSNs and keep evolving [4, 2]; 2) the percentage of users who choose not to publicly share information is also increasing [3]. Given this tendency, it may seem that privacy leakage can be perfectly prevented as increasingly comprehensive privacy control mechanisms become available to users. However, this may not be achievable according to our findings.

In this paper, we investigate privacy protection from a new perspective, referred to as privacy leakage under privacy control (PLPC). PLPC examines whether a user's private personal information is leaked even if the user properly configures privacy rules. The problem of PLPC in OSNs involves a distributor and a receiver. An adversary is a receiver who intends to learn private information published by a victim, who is a distributor. An adversary's capabilities can be characterized by two factors. The first factor is the distance between adversary and victim. Considering a social network as a directed graph, an n-hop adversary can be defined such that the length of the shortest connected path from victim to adversary is n hops. In our discussion, we consider a 1-hop adversary (i.e., friend), a 2-hop adversary (i.e., friend of friend), and a k-hop adversary where k > 2 (i.e., stranger). The second factor is the prior knowledge about a victim required by the corresponding attacks.

We examine the underlying reasons that make privacy control vulnerable using information flow based analysis. We start by categorizing the personal information of an OSN user into three attribute sets according to who the user is, whom the user knows, and what the user does, respectively. We model the information flow between these attribute sets and examine the functionalities which control the flow. We inspect typical real-world OSNs including Facebook, Google+, and Twitter, where privacy exploits and corresponding attacks are identified.

Our analysis reveals that most of the privacy exploits are inherent due to the underlying conflicts between privacy control and essential OSN functionalities. Therefore, the effectiveness of privacy control may not be guaranteed even if it is technically achievable.

We summarize the contributions of this paper as follows:

– We investigate the interaction between privacy control and information flow in OSNs. We identify privacy exploits for current privacy control mechanisms in typical OSNs. Based on these privacy exploits, we introduce a series of attacks for adversaries with different capabilities to obtain private personal information.

– We analyze the discovered exploits caused by the conflicts between privacy control and the functionalities. These conflicts reveal that the effectiveness of privacy control may not be guaranteed as most OSN users expect.

2 Attribute Sets, Functionalities, and Information Flows In OSNs

In a typical OSN, Alice owns a profile page for publishing her personal information. The personal information can be categorized into three attribute sets: a) the personal particular set (PP set), b) the social relationship set (SR set), and c) the social activity set (SA set), according to who the user is, whom the user interacts with, and what the user does, respectively. We show the corresponding personal information and attribute sets on Facebook, Google+, and Twitter in Table 1.

Table 1. Types of personal information on Facebook, Google+, and Twitter

Acronym | Attribute set | Facebook | Google+ | Twitter
PP | Personal Particulars | Current city, hometown, sex, birthday, employer, university, religion, political views, music, emails, city, about me | Introduction, occupation, employment, education, places lived, phone, gender | Name, location, bio, website
SR | Social Relationship (incoming list, outgoing list) | Friends, friends | Have you in circles, your circles | Following, follower
SA | Social Activities | Status message, photo, link, video, comments, like | Post, photo, comments, link, video, plus 1's | Tweets


Alice's PP set describes persistent facts about Alice, such as gender and race. Alice's SR set stores her social relationships as connections. A connection represents information flow from a distributor to her 1-hop receiver. Alice's SR set consists of an incoming list and an outgoing list. For each user ui in Alice's incoming list, there is a connection from ui to Alice. For each user uo in Alice's outgoing list, there is a connection from Alice to uo. Alice can receive information from the users in her incoming list and distribute her information to the users in her outgoing list. The social relationships in certain OSNs such as Facebook are mutual. Such a mutual relationship can be considered as a pair of connections linking two users in opposite directions. The incoming list and outgoing list in the SR set, and their corresponding names on Facebook, Google+, and Twitter, are shown in Table 1. Lastly, Alice's SA set describes her social activities, such as status messages and photos.
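To make the model concrete, a profile in this abstraction can be sketched as follows (a minimal Python illustration; the class and field names are our own):

class Profile:
    # A user's profile page with the three attribute sets.
    def __init__(self, name):
        self.name = name
        self.pp = {}           # PP set: persistent facts, e.g. gender
        self.incoming = set()  # SR set: users the owner receives from
        self.outgoing = set()  # SR set: users the owner distributes to
        self.sa = []           # SA set: status messages, photos, ...

def connect(distributor, receiver):
    # A connection is information flow from distributor to receiver;
    # it is recorded in both SR sets (this double bookkeeping is what
    # Exploit 2 below builds on).
    distributor.outgoing.add(receiver.name)
    receiver.incoming.add(distributor.name)

alice, bob = Profile("Alice"), Profile("Bob")
connect(alice, bob)  # a directed connection, as on Twitter
connect(bob, alice)  # the reverse edge models a mutual friendship
print(alice.outgoing, alice.incoming)  # -> {'Bob'} {'Bob'}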

Most OSNs provide two basic functionalities, REC and TAG. The REC functionality recommends to Alice a list of users that Alice may include in her SR set. The list of recommended users is composed based on the social relationships of the users in Alice's SR set. The TAG functionality allows Alice to mention another user's name in her social activities, which provides a link to the user's profile page.

The attribute sets (illustrated as circles in Figure 1) of multiple users are connected within an OSN, where personal information may explicitly flow from one profile page to another via REC and TAG, as represented by solid arrows and rectangles in Figure 1. It is also possible to access a user's personal information in the PP set and SR set via implicit information flows, marked by dashed arrows. The details of these information flows are described below.

[Fig. 1. Information flows between attribute sets. Legend: attribute sets (circles), functionalities (rectangles), and profile pages (boxes).]

The first explicit flow is caused by REC, shown as arrow (1) in Figure 1. REC recommends to Bob a list of users based on the social relationships of the users in Bob's SR set. Thus, the undisclosed users in Alice's SR set may be recommended to Bob via REC if Bob is connected with Alice.

The second explicit flow, caused by TAG, is shown as arrow (2) in Figure 1. An OSN user may mention other users' names in a social activity in his/her SA set via TAG, which creates explicit links connecting SA sets within different profile pages.

The third flow is an implicit flow caused by the design of information storage for SR sets, shown as arrow (3) in Figure 1. Given a connection from Alice to Bob, Bob is included in Alice's outgoing list while Alice is included in Bob's incoming list.


The fourth flow is an implicit flow related to the PP set, shown as arrow (4) in Figure 1. Due to the homophily effect [6], a user tends to connect with users who have similar personal particulars. This tendency can be used to link the PP sets of multiple users.

It is difficult to prevent privacy leakage from all these information flows. A user may be able to prevent privacy leakage caused by explicit information flows by carefully using the corresponding functionalities, as these flows are materialized only when the functionalities are used. However, it is difficult to avoid privacy leakage due to implicit information flows, as they are caused by inherent correlations among the information shared in OSNs. In fact, all four information flows illustrated in Figure 1 correspond to inherent exploits, which will be analyzed in Section 3.

3 Exploits And Attacks

In this section, we analyze the exploits and attacks against a victim's PP set, SR set, and SA set. All of our findings have been verified on Facebook, Google+, and Twitter.

3.1 PP Set

The undisclosed information in the PP set can be inferred via the following exploit, namely the inferable personal particular.

Inferable Personal Particular. Human beings are more likely to interact with others who have the same or similar personal particulars [6]. This phenomenon is called homophily. It causes an exploit named inferable personal particulars, which corresponds to the information flow shown as dashed arrow (4) in Figure 1.

Exploit 1: If most of a victim's friends have similar personal particulars, it could be inferred that the victim may have the same or similar personal particulars.

An adversary may use Exploit 1 to obtain undisclosed information in a victim's PP set. The following is a typical attack on Facebook.

Fig. 2. Alice and most of her friends have common employer information

Attack 1: Consider a scenario on Facebook, shown in Figure 2, where Bob, Carl, Derek, and some other users are Alice's friends, and Bob is a friend of Carl, Derek, and most of Alice's friends. Alice shares her employer information "XXX Agency" with Carl and Derek. Most of Alice's friends may share their employer information with their friends due to different perceptions of privacy protection. Bob can collect the employer information of Alice's friends and infer that Alice's employer may be "XXX Agency".

The above attack works on Facebook, Google+, and Twitter. An adversary should have two types of knowledge. The first type of knowledge includes a large portion of the users stored in the victim's SR set. The second type of knowledge includes the personal particulars of these users.

3.2 SR Set

The information in the SR set can be leaked by two exploits, namely the inferable social relationship and the unregulated relationship recommendation.

Inferable Social Relationship. A user's SR set consists of an incoming list and an outgoing list. Given a connection from Alice to Carl, Carl is recorded in Alice's outgoing list while Alice is recorded in Carl's incoming list. This causes an exploit named inferable social relationship, which corresponds to the information flow shown as dashed arrow (3) in Figure 1.

Exploit 2: Each social relationship in a victim's SR set indicates a connection between the victim and another user u. User u's SR set also stores a copy of this relationship for the same connection, which can be used to infer the relationship in the victim's SR set.

An adversary may use Exploit 2 to obtain undisclosed social relationships in a victim's SR set, as shown in the following exemplary attack on Facebook.

Attack 2: Consider a scenario on Facebook where Bob is a stranger to Alice, and Carl is Alice's friend. Alice shares her SR set with a user group including Carl. Bob guesses Carl may be connected with Alice, but cannot confirm this by viewing Alice's SR set, as it is protected against him. However, Carl shares his SR set publicly due to different concerns in privacy protection. Seeing Alice in Carl's SR set, Bob infers that Carl is Alice's friend.

Any adversary can use Exploit 2 as long as he has two types of knowledge: 1) a list of users in the victim's SR set; 2) the social relationships in these users' SR sets. This attack could be a stepping stone for an adversary to infiltrate a victim's social network. Once the adversary discovers a victim's friends and becomes a friend of the victim's friends, he is more likely to be accepted as the victim's friend [7].
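A short Python sketch of Attack 2 (our illustration; the names are hypothetical): even when the victim's own SR set is protected, scanning the publicly shared SR sets of candidate friends recovers edges incident to the victim.

def infer_relationships(victim, candidates, public_sr_sets):
    # Recover relationships hidden in the victim's SR set from the
    # copies stored in other users' publicly shared SR sets.
    inferred = set()
    for user in candidates:
        # Only publicly shared SR sets are visible to a stranger.
        if victim in public_sr_sets.get(user, set()):
            inferred.add((user, victim))
    return inferred

# Alice protects her SR set, but Carl shares his publicly.
public_sr_sets = {"carl": {"alice", "dave"}}
print(infer_relationships("alice", ["bob", "carl"], public_sr_sets))
# -> {('carl', 'alice')}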

Unregulated Relationship Recommendation. Most OSNs provide the REC functionality to recommend to a user a list of other users whom this user may know. The recommendation list is usually calculated based on the relationships in SR sets but is not regulated by the privacy rules chosen by the users in the recommendation list. This causes an exploit named unregulated relationship recommendation, which corresponds to the information flow shown as solid arrow (1) in Figure 1.


Exploit 3: All social relationships in a victim's SR set could be automatically recommended by REC to all users in the victim's SR set, irrespective of whether or not the victim uses any privacy rules to protect her SR set.

An adversary may use Exploit 3 to obtain undisclosed relationships in a victim's SR set, as shown in the following attack on Facebook.

Attack 3: On Facebook, Bob is a friend of Alice but not in a user group named Close Friends. Alice shares her SR set with Close Friends only. Although Bob is not allowed to view Alice's social relationships in her SR set, such information is automatically recommended by REC to Bob. If Bob is connected with Alice only, the recommendation list consists of the social relationships in Alice's SR set only.

Any adversary who is a friend of a victim can perform this attack on both Facebook and Google+. No prior knowledge is required for this attack.

3.3 SA Set

The undisclosed information in the SA set protected by existing privacy control mechanisms can be inferred via the following exploit, the inferable social activity.

Inferable Social Activity. A user's name can be mentioned by another user in a social activity via TAG, such that this social activity provides a link to the profile page of the mentioned user. Such links create correlations among all the users involved in the same activity. This causes an exploit named inferable social activity, which corresponds to the information flow shown as solid arrow (2) in Figure 1.

Exploit 4: If a victim's friend uses TAG to mention the victim in a social activity published by the friend, it implies that the victim may also have attended the activity. Although this activity may involve the victim, its visibility is solely determined by the privacy rules specified by the victim's friend who publishes the activity, which is out of the victim's control.

An adversary may use Exploit 4 to obtain undisclosed social activities in a victim's SA set, as shown in the following attack on Facebook.

Attack 4: Consider a scenario on Facebook where Bob and Carl are Alice's friends, and Bob is Carl's friend. Alice publishes a social activity in her SA set regarding a party which Carl and she attended together, and she allows only Carl to view this social activity. However, Carl publishes the same social activity in his SA set and mentions Alice via TAG. Due to different concerns in privacy protection, Carl allows all his friends to view this social activity. By viewing Carl's social activity, Bob can infer that Alice attended this party.

This attack may work on Facebook, Google+, and Twitter. Any adversary can perform this attack if he knows the social activities published by the victim's friends pointing to the victim via TAG.
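The root cause of Exploit 4 can be stated in a few lines of Python (our simplification of the observed behavior): the visibility check consults only the publisher's rule, never the rules of the users tagged in the activity.

def can_view(viewer, activity):
    # Visibility in current OSNs is decided solely by the publisher's
    # rule (wl, bl); the tagged users' own privacy rules are ignored,
    # which is exactly Exploit 4.
    white_list, black_list = activity["publisher_rule"]
    return viewer in (set(white_list) - set(black_list))

party_post = {
    "publisher": "carl",
    "tagged": ["alice"],  # Alice's own rule is never consulted
    "publisher_rule": ({"bob", "alice"}, set()),
}
print(can_view("bob", party_post))
# -> True, although Alice allowed only Carl to view the party activity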

To mitigate the threat, privacy control should enforce privacy rules on an activity no matter who publishes it. To resolve privacy conflicts in collaborative data sharing, policy negotiation mechanisms have been proposed [5, 8]. However, these policy negotiation mechanisms may significantly restrict the sharing nature of OSNs and frustrate users who intend to share an activity, as the consent of all involved users is required for each joint activity.

4 Conclusion

In this paper, we investigated privacy leakage under privacy control in online social networks. Our analysis showed that privacy leakage could still happen even when users correctly configure their privacy rules. We examined real-world OSNs including Facebook, Google+, and Twitter, and discovered the exploits which lead to privacy leakage. Detailed attacks were demonstrated by utilizing these exploits to learn undisclosed personal information that is supposed to be protected by the corresponding privacy rules. Our analysis further revealed that these exploits are associated with the underlying conflicts between privacy control and functionalities, which are difficult to resolve.

Acknowledgment

This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.

References

1. M. Balduzzi, C. Platzer, T. Holz, E. Kirda, D. Balzarotti, and C. Kruegel. Abusing social networks for automated user profiling. In Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection, pages 422–441, 2010.
2. B. Carminati, E. Ferrari, R. Heatherly, M. Kantarcioglu, and B. Thuraisingham. A semantic web based framework for social network access control. In Proceedings of the 14th ACM Symposium on Access Control Models and Technologies, pages 177–186, 2009.
3. A. Chaabane, G. Acs, and M. A. Kaafar. You are what you like! Information leakage through users' interests. In Proceedings of the 19th Annual Network & Distributed System Security Symposium, 2012.
4. P. Fong, M. Anwar, and Z. Zhao. A privacy preservation model for Facebook-style social network systems. In Computer Security – ESORICS 2009, volume 5789, pages 303–320, 2009.
5. H. Hu, G.-J. Ahn, and J. Jorgensen. Detecting and resolving privacy conflicts for collaborative data sharing in online social networks. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 103–112, 2011.
6. M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415–444, 2001.
7. D. J. Watts. Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, 1999.
8. A. Yamada, T.-J. Kim, and A. Perrig. Exploiting privacy policy conflicts in online social networks. Technical report, Carnegie Mellon University, 2012.
9. E. Zheleva and L. Getoor. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web, pages 531–540, 2009.