noyb - privacy in online social networks

Upload: alexandra-fodorean

Post on 07-Jul-2018

216 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    1/6

    NOYB: Privacy in Online Social Networks

    Saikat Guha, Kevin Tang, and Paul FrancisCornell University, Ithaca

    {saikat, francis}@cs.cornell.edu, [email protected]

    ABSTRACT

    Increasingly, Internet users trade privacy for service. Face-book, Google, and others mine personal information to tar-get advertising. This paper presents a preliminary and par-tial answer to the general question “Can users retain theirprivacy while still benefiting from these web services?”. Wepropose NOYB, a novel approach that provides privacy whilepreserving some of the functionality provided by online ser-

    vices. We apply our approach to the Facebook online socialnetworking website. Through a proof-of-concept implemen-tation we demonstrate that NOYB is practical and incre-mentally deployable, requires no changes to or cooperationfrom an existing online service, and indeed can be non-trivialfor the online service to detect.

    Categories and Subject Descriptors

    C.2.4 [Computer Systems Organization]: Computer Com-munication Networks—Distributed Systems ; E.3 [Data]: DataEncryption

    General Terms

    Design, Security

    Keywords

    NOYB, Privacy, Cloud Computing

    1. INTRODUCTIONThe Internet was originally designed to connect endhosts.

    All data was stored and processed in these endhosts, and thenetwork simply provided transit. Reasoning about privacylargely involved which users and endhosts were processingthe data. But that is no longer the case. Today, the Inter-net   is   the computer. Data is stored and processed in the“cloud”. Service providers, such as Google and Facebook,are the faceless entities that control the cloud, and the user,

    for better or for worse, is merely on for the ride.

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.WOSN’08, August 18, 2008, Seattle, Washington, USA.Copyright 2008 ACM 978-1-60558-182-8/08/08 ...$5.00.

    Of course, there are good reasons for cloud computing.Gmail, Google’s web-based email service, for instance, pro-vides searchable, any-time any-where any-device access tothe user’s email through nothing more than the universalweb browser. Facebook, an online social networking web-site, allows groups of friends to stay up-to-date with devel-opments in each other’s lives. Meanwhile, under the covers,the cloud ensures data durability, disaster recovery, platformindependence, and provides integration with other services,

    all completely transparent to the user.In today’s economic climate, advertisers, rather than the

    users, pay for the cloud, and consequently, it is the interestsof the advertisers that takes priority over age old princi-ples such as that of least privilege [18]. The cloud, today,collects massive amounts of private information to providehighly targeted advertisements. Not surprisingly, lapses insecurity, poor judgement, or the lack of judicial oversightleaves users vulnerable: identity theft is rampant [19], infor-mation leakage is common [13], and the government is givenan avenue to impinge on civil liberties by sidestepping usersand going directly after cloud [2].

    Recent work in programming language techniques [4] demon-strate that it is possible to build online services that guar-

    antee conformance with strict privacy policies. However,such approaches require buy-in from the service providerwho, arguably,  need   the private data to generate revenue,and therefore have the incentive to do precisely the oppo-site. The research question that we seek to explore thereforeis to what extent a user can ensure his own privacy whilebenefiting from existing online services.

    The user unilaterally encrypting his data preserves pri-vacy, however, doing so precludes search, and more impor-tantly, breaks targeted ads. In order to preserve its bottom-line, a cooperative service provider may re-engineer the ser-vice to provide privacy, or push ad targeting to the client.An adversarial service provider not willing to expend thiseffort, on the other hand, can simply deny service to un-profitable users. To account for the latter case, a solution

    must protect privacy conscious users from being (easily) dis-covered.

    NOYB, short for  none of your business , is based on theobservation that some online services, notably social net-working websites, can operate on “fake” data. If the op-erations performed on the fake data by the online servicecan be mapped back onto the real data, the user can, to adegree, make use of the service. Furthermore, privacy canbe preserved by restricting the ability to recover the realdata from the fake data to authorized users only. This ob-

    49

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    2/6

    servation leads naturally to our solution: user data is firstencrypted, and the ciphertext encoded to look like legitimatedata. The online service can operate on the ciphered data,however, only authorized users can decode and decrypt theresult.

    More broadly, NOYB proposes a new way of thinkingabout how to achieve privacy in online services whereby theuser devises a transformation under which much of the func-tionality of the service is preserved, but which can only be

    undone by authorized users. The transformation is weakerthan traditional encryption in that strictly more informa-tion is revealed to an adversary, but with the benefit thatthe victim can fly low under the adversary’s radar by mak-ing it hard for the adversary to find the victim amongstordinary users. Such an approach can be deployed incre-mentally by small groups of users without buy-in from theservice provider.

    Overall this paper makes three contributions. First, wepresent a general cipher and encoding scheme that preservescertain semantic and statistical properties such that onlineservices can process the data oblivious to the encryption.Second, we show how to apply this general approach toFacebook. And third, we report on our proof-of-concept im-

    plementation which demonstrates that NOYB is practical,feasible, and incrementally deployable by endusers withoutthe need for additional infrastructure.

    Having said that, we do not answer the question whethermodeling the service provider as an adversary, and corre-spondingly buying into heavyweight mechanisms such as aPKI, is necessary. But the existence of our solution thatpreserves, to a large extent, both user privacy and servicefunctionality under such an extreme model suggests the barcan be set high for other solutions that trade off complexityfor a more cooperative service provider.

    2. MOTIVATING EXAMPLEFacebook is a popular online social network. A user’s

    Facebook profile contains a wealth of personal information,including name, photo, date of birth, contact information,sexual orientation and relationship status, political and re-ligious views, personal interests, hobbies, education historyand more. This information is made available to membersof the user’s social network, allowing friends to stay in touchand up-to-date with each other’s lives. At the same time,Facebook generates revenue by targeting ads to highly spe-cific demographics (e.g. single males between 18–24 years of age in New York City).

    Registering on Facebook under a pseudonym, or obfuscat-ing one’s personal information is forbidden by Facebook’sterms of service [9]. Indeed, Facebook has banned users incases where it identified violations of these terms [16]. Al-though, negative publicity has forced Facebook to reinstate

    some users [17].Based on the study conducted by Acquisti and Gross [1],

    Facebook users   are   concerned about who can access theirpersonal information. While most users (60%) trust theirfriends almost completely with their personal information,significantly fewer (18%) trust Facebook (the company) tothe same degree, and even fewer (6%) trust strangers. Yet,under the covers, Facebook allows any application developer(a stranger) access to a user’s profile [10]. While a privacyconscious user may choose to not use Facebook, [1] finds thatpeer pressure drives membership, and unawareness of the

    true visibility of profiles lulls users into revealing personalinformation to untrusted parties.

    Facebook provides two primary services in the minds of users [1]: first, advertising information about oneself, typ-ically restricted to one’s extended social network, in orderto attract dates, and second, to find classmates. Since theability to search inherently requires revealing information toFacebook, in this paper we focus on preserving user privacywhile allowing users to advertise their private information

    to friends.

    3. GOALS AND ASSUMPTIONSMotivated by the example in Section 2, NOYB strives to

    achieve the following goals.Privacy Preserving:   NOYB preserves privacy defined

    as contextual integrity [12]. In such a framework, piecesof a user’s information are scattered but public; it is theinability of an adversary to combine  these pieces that definesprivacy. For instance, it may be public knowledge that thereexists some user named Alice, and some user aged 25, butan adversary is not able to conclude with any certainty thatAlice is 25. Only trusted parties, as designated by the user,typically excluding the untrusted online service, are able tocombine the user’s information.

    Incrementally Deployable: NOYB can be deployed in-crementally (by small groups of users) without any cooper-ation from the online service. In this deployment mode, a(hopefully large) fraction of the functionality provided bythe online service is preserved. If the service cooperates,however, a greater fraction of the functionality may be pre-served, and NOYB can be rolled out to all users.

    Hard to Detect:  In the case where an online service ishostile to NOYB, NOYB users blend into the crowd. Theimplicit assumption, as substantiated in Section 2, is thatthe service pays a high penalty for falsely accusing non-users. Making it difficult for the service to find NOYBusers through automated means serves as a deterrent against

    punitive action as long as the fraction of users using NOYBis not significant.

    4. NOYB: NONE OF YOUR BUSINESSThis section starts with an overview of NOYB, followed

    by a detailed description of its operation.

    4.1 Basic PrimitiveNOYB operates by first partitioning private information

    into multiple  atoms , and then replacing each atom with itsencryption. A simplistic approach would be to encrypt eachatom, and share the key with other users authorized to viewthat atom. While such a scheme does not reveal any userinformation to the online service, an online service can easily

    find users encrypting their data by looking for the traits of the cipher-text. Steganography can help avoid detection,but it is impractical to store cipher-text inside Facebookatoms as the cover-text, for example, as the atoms tend tobe small.

    Instead of this simple encryption scheme, NOYB sub-stitutes the user’s atom with another user’s atom pickedpseudorandomly. In essence, if all the atoms of the sameclass compose a dictionary , NOYB encrypts the index  of theuser’s atom in this dictionary, and uses the ciphered index topick the replacement atom from the dictionary (Figure 1).

    50

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    3/6

    Key

    Nonce (IV)

    Plain-Text (atom)

    Dict ionary (in)

    Dict ionary (out)

    Index Index’Residue

    Cipher-text (atom)

    Key channel

    Residue channel

    Online service

    Public channel

    Index’ Index Plain-Text (atom)

    S ubs t i t ut i o n   e v e r s e S ubs t i t ut i o n

    DecryptEncryptReverse Lookup LookupFLookup

    Figure 1:   The basic NOYB primitive is a pseudorandom substitution cipher applied to each atom of private data. Dictionaries aremaintained for each class of atoms; input and output dictionaries may be the same. The pseudorandom sequence is generated by asymmetric-key cipher. Key management is out-of-band. Nonce is unique per-update. Residue channel may be in-band, possibly covert(e.g. steganography), or out-of-band (e.g. peer-to-peer pub-sub network).

    The benefit over the previous approach is that NOYB isharder to detect as the cipher atom is a legitimate atom, al-beit for some random user. To illustrate, assume Alice’sname, sex and age (Alice, F, 25) is partitioned into twoatoms: (Alice, F) and (25). The first atom is substitutedwith (Bob, M) say, and the second with (28) say, from Boband Charlie respectively based on the encrypted indices. Al-ice’s friends can reverse the encryption to recover Alice’s in-formation. And while Alice’s atoms may similarly show upin other users’ profiles, an adversary cannot piece togetherher atoms.

    4.2 Key ManagementExisting key management protocols (e.g. [3]) are used to

    distribute keys to users authorized to view the private infor-mation; however, the much of this complexity is hidden fromthe user by the NOYB application. The key managementprotocol is implemented out-of-band (OOB) — i.e. does notexchange messages using the online service to which NOYBis being applied. This is pursuant with our goal of pre-venting the online service from identifying NOYB users by

    identifying (high entropy) key management messages. Thechoice of the OOB channel depends on the trust model, andmay range from unencrypted emails exchanged through oneor more trusted third-parties, to a peer-to-peer network se-cured with a PKI.

    The number of keys depends on the online service. Forinstance in the case of Facebook, each user maintains onemaster key, from which it generates keys to encrypt eachprofile field. The master key is distributed to the user’s so-cial network (friends, friends of friends etc.). Revocation ishandled by negotiating a new key, however, only updatesto the profile are encrypted with the new key. By not re-encrypting the profile with the new key at the time of revo-cation, NOYB avoids the flurry of updates across multiple

    fields that may otherwise be detected by the online service.The tradeoff, however, is that multiple keys may be neededto decrypt a profile. The number of decryption keys neededis a function of how often profile fields are updated in re-lation to how often new keys are negotiated, and is upperbounded by the number of fields.

    4.3 DictionaryThe dictionary used for substitution is a mapping from

    a sequential index to unique atoms and their correspondingfrequency. The dictionary supports three functions: first,

    given an index it returns the corresponding atom (Lookup);second, given an atom it returns the corresponding index(Reverse Lookup); and third, given a (random) number in therange [0, 1), it returns an atom with the largest index suchthat the sum of frequencies of all atoms with lower indicesis less than the given number (FLookup) — in essence, pre-serving the marginal distribution of the cipher atoms giventhe uniformly distributed ciphered index. The joint distri-bution of atoms can be preserved by using different inputand output dictionaries based on the values of other atoms(Figure 1).

    Dictionaries are public, and include atoms from both NOYBusers and (some) non-users. The data from non-users thwartsthe service from using the dictionary to discover users, aswell as improves the confusion property of the pseudoran-dom cipher. While we assume for simplicity that there isone global dictionary per class of atoms, private dictionariesmay be used by groups of friends. For public dictionaries,external mechanisms are used to disseminate the mappings.

    Finally, dictionaries must support dynamic updates to al-low new atoms to be added over time. One concern in this

    regard is that adding new atoms must not break the existingindex-to-atom mappings that may be in use. Consequently,the dictionary is append-only. Updates to atom frequencies,however, are in-place since the frequencies are used only forencryption. Updates are performed anonymously over se-cure channels to defend against timing and eavesdroppingattacks.

    4.4 Pseudorandom SubstitutionThe semantic security of the substitution cipher is derived

    from the security provided by the encryption process thatlies at the core of the algorithm (Figure 1). As illustratedin the figure, a dictionary lookup is used to transform theatom to an index. The index is encrypted using the symmet-

    ric key and random nonce; we assume existing ciphers forthis purpose. The output of the underlying cipher is used toselect the cipher-text atom using the   FLookup  primitive of the output dictionary. By design, the distribution of the ci-pher text matches that of the plain text, which helps protectagainst certain frequency analysis attacks. The informationdiscarded by the   FLookup  process is encoded as a  residue ,and is transmitted separately along with the nonce to as-sist in the reverse substitution process. Message integrity isensured through a MAC computed over the output of theunderlying cipher across all atoms, and transmitted along-

    51

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    4/6

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    5/6

     0 1 2 3 4 5 6 7 8 9

     10

     0 2 4 6 8 10 12 14 16 18

       T  o   t  a   l   d   i  c   t   i  o  n  a  r  y  s   i  z  e   (   M   B   )

    Number of users (thousands)

    Linear trendlineDictionary size

    Figure 2:   Dictionary grows sublinearly with users.

    be encrypted deterministically for each community (e.g. auniversity) using a predetermined secret. This allows com-munity members to search for a name by searching for theciphered version without revealing the real name to Face-book.

    7. IMPLEMENTATIONWe implemented a proof-of-concept version of NOYB as

    a browser plugin for the Firefox web browser. Our imple-mentation modifies Facebook pages by adding a button thatencrypts the user’s own profile. A second button added toother user’s pages decrypts their profile. The plugin con-sists of 1400 lines of XUL code, and 500 lines of pythoncode that uses AES in counter mode as the underlying ci-pher. The dictionaries necessary for the plugin are generatedby a service we built; the service samples Facebook profilepages of NOYB users and non-users, and posts the dictio-nary to a public website that NOYB users can query anony-mously. At present, our implementation does not managekeys and instead defers to the user at the time of encryptionand decryption to enter the password, however, it is pos-sible to modify our plugin to automatically distribute keysto the friends, and receive keys from friends through web

    based email services. We have manually verified that the en-crypted profiles look plausible without revealing significantprivate information. We are quick to point out, however,that our experience is limited owing to our small userbase.

    A second purpose of our implementation exercise is tostudy the feasibility of maintaining the dictionaries. Fig-ure 2 shows the size of the dictionary as a function of thenumber of users sampled. We find that the size grows sub-linearly reflecting overlapping values across different users.While the numbers are encouraging in that we expect a dic-tionary of a million users to initially be manageably small(344 MB), we do not, at present, have data to comment onthe size of the dictionary over time as new atoms are ap-pended. Nevertheless we believe that if the size grows by upto 3–4 orders of magnitude over the lifetime of the onlineservice, a distributed peer-to-peer dictionary infrastructurecould be necessary.

    8. RELATED WORKUser privacy has been an active field of research stretching

    back to the beginnings of public and commercial adoption of the Internet. Pretty Good Privacy (PGP) [20] applies end-to-end public-key cryptography to emails. TLS [7] appliesthe same to end-to-end interactive communication channels.While NOYB is similarly end-to-end in the privacy it pro-

    vides, it is different from the aforementioned systems in thatit is not completely opaque to the middle, and can thereforemake greater use of the functionality provided by the onlineservice.

    A second class of privacy preserving services operate inthe middle of the network. Such services include anonymiz-ing proxies [8], and dark nets [5]. These services provide anall-or-nothing model to privacy, where the privacy preserv-ing mechanism, such as stripping of a HTTP cookie, either

    completely shields the user potentially breaking the applica-tion, or, when absent, leaves the user completely vulnerable.NOYB, instead, is tightly coupled with the application al-lowing fine-grained control over user privacy while balancingthe functionality preserved.

    Complementary to NOYB is the large amount of researchin ciphers, key management, steganography, and DHTs. NOYBshares a resemblance to the pseudorandom character substi-tution cipher in [15]. Particularly of use to NOYB are ex-isting and future broadcast key management algorithms [3],strong underlying ciphers [6], resillient steganographic tech-niques [11], and distributed store-lookup infrastructures [14],which can be used to implement the external mechanismsthat NOYB relies on.

    9. SUMMARY AND FUTURE WORKIn this paper we have described the NOYB mechanism,

    which provides fine-grained control over user privacy in on-line services while preserving much of the functionality pro-vided by the service. We apply this generic approach to thespecific case of protecting privacy in Facebook. While we arestill exploring the implications of such an approach; basedon a proof-of-concept implementation we conclude that thecore idea is feasible, and incrementally deployable by groupsof users without explicit cooperation from the online service.

    Beyond social networks, there are a number of interest-ing research directions we hope to explore. We only brieflylist them here. Foremost among these is to apply NOYBto online services that focus on search. Another is to applyNOYB in a way that allows online services to provide cus-tomized content, such as targeted advertisements, to userswithout the service being able to compromise the privateinformation on which the customization is based. Finally,NOYB appears to be a promising first step towards a newdesign paradigm of online services where the user plays anactive role in performing the sensitive operations on data,while the service takes care of the rest.

    Acknowledgements

    The authors would like to thank Tom Roeder for his inputin the design. Additional thanks to the Facebook users whocontributed their profile data for analysis.

    10. REFERENCES[1]  Acquisti, A., and Gross, R.  Imagined Communities

    Awareness, Information Sharing, and Privacy on theFacebook . In   Proceedings of the Privacy Enhancing Technologies Symposium (PET ’06)  (Cambridge, UK,June 2006).

    [2]   BBC News. Yahoo ‘helped jail China writer’ , Sept.2005.

    [3]  Boneh, D., Gentry, C., and Waters, B.  CollusionResistant Broadcast Encryption With Short

    53

  • 8/18/2019 NOYB - Privacy in Online Social Networks

    6/6

    Ciphertexts and Private Keys. In  Proceedings of the CRYPTO ’05   (Santa Barbara, CA, Aug. 2005).

    [4]   Chong, S., Liu, J., Myers, A. C., Qi, X., Vikram,K., Zheng, L., and Zheng, X.  Secure webapplications via automatic partitioning. In  Proceedings of the 21th ACM Symposium on Operating Systems Principles  (Stevenson, WA, Oct. 2007).

    [5]  Clarke, I., Sandberg, O., Wiley, B., and Hong,T. W.  Freenet: A Distributed AnonymousInformation Storage and Retrieval System. InProceedings of the Privacy Enhancing Technologies Symposium (PET ’01)  (San Francisco, CA, Apr.2001).

    [6]  Daemen, J., and Rijmen, V.  The Design of Rijndael: AES–the Advanced Encryption Standard .Springer, 2002.

    [7]   Dierks, T., and Rescorla, E.  RFC 4346: TheTransport Layer Security (TLS) Protocol Version 1.1,Apr. 2006.

    [8]   Dingledine, R., Mathewson, N., and Syverson,P. TOR: The Second-Generation Onion Router. InProceedings of 13th conference on USENIX Security Symposium  (San Deigo, CA, Aug. 2004).

    [9]   Facebook Inc. Terms of Use, Nov. 2007.[10]  Felt, A., and Evans, D.  Privacy Protection for

    Social Networking APIs. 2008.

    [11]   Katzenbeisser, S., and Petitolas, F.  InformationHiding Techniques for Steganography and DigitalWatermaking.  EDPACS 28 , 6 (2000), 1–2.

    [12]   Nissenbaum, H.  Privacy as Contextual Integrity.Washington Law Review 79 , 1 (Feb. 2004), 119–158.

    [13]   PCWorld. Facebook’s Beacon More Intrusive ThanPreviously Thought, Nov. 2007.

    [14]  Rhea, S., Godfrey, B., Karp, B., Kubiatowicz,J., Ratnasamy, S., Shenker, S., Stoica, I., , and

    Yu, H.  OpenDHT: A Public DHT Service and ItsUses. In  Proceedings of SIGCOMM’05  (Philadelphia,PA, Aug. 2005).

    [15]   Ritter, T.  Substitution cipher with pseudo-randomshuffling: The dynamic substitution combiner.Cryptologia 14, 4 (1990), 289–303.

    [16]   Robert Scoble. Facebook disabled my account, Jan.2008.

    [17]   Robert Scoble. Facebook lets me back in..., Jan.2008.

    [18]  Saltzer, J., and Schroeder, M.  The Protection of Information in Computer Systems.  Proceedings of the IEEE 63 , 9 (Sept. 1975), 1278–1308.

    [19]   Stana, R. M., and Burton, D. R.  Identity Theft:Prevalence and Cost Appear to be Growing. Tech.Rep. GAO-02-363, U.S. General Accounting Office,Washington, D.C, 2002.

    [20]  Zimmermann, P. R.  The official PGP user’s guide .MIT Press, Cambridge, MA, 1995.

    54