preserving location and absence privacy in geo-social networks

Preserving Location and Absence Privacyin Geo-Social Networks

Dario Freni1 Carmen Ruiz Vicente2 Sergio Mascetti1 Claudio Bettini1 Christian S. Jensen3

1EveryWare Lab, DICoUniversità degli Studi di Milano

Italy{freni,mascetti,bettini}

@dico.unimi.it

2Dept. of Computer ScienceAalborg University

[email protected]

3Dept. of Computer ScienceAarhus University

[email protected]

ABSTRACTOnline social networks often involve very large numbers ofusers who share very large volumes of content. This contentis increasingly being tagged with geo-spatial and temporalcoordinates that may then be used in services. For example,a service may retrieve photos taken in a certain region. Theresulting geo-aware social networks (GeoSNs) pose privacythreats beyond those found in location-based services. Con-tent published in a GeoSN is often associated with referencesto multiple users, without the publisher being aware of theprivacy preferences of those users. Moreover, this content isoften accessible to multiple users. This renders it difficultfor GeoSN users to control which information about them isavailable and to whom it is available. This paper addressestwo privacy threats that occur in GeoSNs: location privacyand absence privacy. The former concerns the availability ofinformation about the presence of users in specific locationsat given times, while the latter concerns the availability ofinformation about the absence of an individual from spe-cific locations during given periods of time. The challengeaddressed is that of supporting privacy while still enablinguseful services. We believe this is the first paper to formal-ize these two notions of privacy and to propose techniquesfor enforcing them. The techniques offer privacy guarantees,and the paper reports on empirical performance studies ofthe techniques.

Categories and Subject DescriptorsH.2.8 [Database Management]: Database Applications—Spatial databases and GIS ; K.4.1 [Computers and Soci-ety]: Public Policy Issues—Privacy

General TermsAlgorithms

KeywordsSocial networks, Location privacy, Absence privacy

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.CIKM’10, October 26–30, 2010, Toronto, Ontario, Canada.Copyright 2010 ACM 978-1-4503-0099-5/10/10 ...$10.00.

1. INTRODUCTIONGeo-aware social networks (GeoSNs) are enabled by the

availability of social network services, mobile devices withInternet connectivity, and geo-location capabilities. GeoSNusers generate and share very large volumes of content, orresources, tagged with geo-location. Thus, resources such asstatus messages, photos, and “check-ins” are tagged with thelocation in which they were generated. Further, resourcesmay reference other users—for example, this occurs when auser tags a photo with the people in the photo.

A variety of services exist and can be envisioned that ex-ploit GeoSN resources. For example, the Google Picasa ser-vice lets users share and search photos that are both geo-tagged and tagged with other users. Dozens of other suchcommercial applications exist, including Brightkite, Flickr,Foursquare, Google Buzz, Google Latitude, Gowalla, Loopt,Twitter, and Whrrl. Indications are that geo-tagging is alsocoming to Facebook1. In addition, some GeoSN servicesenable third-party services that exploit GeoSN resources.

Privacy in social network services has become a hot topic,and reports indicate that users are leaving social networkservices due to privacy concerns. In GeoSNs, it is possi-ble for exact locations of users to be exposed to untrustedentities that may in turn utilize these to infer sensitive infor-mation about the users. For example, the presence of a userin certain locations, e.g., a hospital or a night club, mayreveal sensitive information about the user. And becauseGeoSN resources are easily spread among users in real time,additional threats such as stalking or assault are possible. Inaddition, the instant publishing of GeoSN resources can en-able an adversary to infer how far users are from their homes(or other locations) and hence how long these are possiblyunattended. This information represents an absence privacyviolation and may be used to plan a burglary2.

It may be argued that users should be aware of the privacyimplications of making resources available and should simplybehave responsibly, and thus should not publish resourcesthat may cause privacy concerns. However, this arrange-ment is undesirable for two main reasons. First, GeoSNservice providers are generally interested in as much contentas possible being available, as this attracts users and thusincreases advertising revenue. Second, in most GeoSN ser-vices, users can reference other users in resources; and it is

1See www.facebook.com/policy.php; tcrn.ch/99VSSN2See PleaseRobMe.com

generally not possible for a user to control the resources pub-lished by another user. For instance, imagine that Alice andBob are having a drink together in a pub. Alice updates herGeoSN status, writing “having a beer with Bob” and taggingthe post with 10:05 p.m., Manhattan. A bit later, Bob seeshis friend Charlie and invites him to join them at the table.Charlie then updates his status, writing “just met Bob at apub!” and tagging the post with 10:10 p.m., 24 West 35thStreet. A user with access to both posts (e.g., a friend ofBob) can infer that Alice is at a pub with Charlie as well asthe exact address of the pub, although Alice used a coarserlocation in her tag.

For these reasons, techniques that enable users to specifytheir privacy preferences and then enforce these preferencesare desirable for service providers and users alike. The ob-jective of this paper is to provide such techniques.

The algorithms proposed combine meta-data generaliza-tion over spatio-temporal granularities with an analysis ofdynamic constraints on inter-dependent resources, also tak-ing into account constraints on the maximum velocity ofuser movement. Absence privacy is enforced by computingappropriate temporal delays in resource publication basedon the above analysis.

Although privacy has been studied extensively in location-based services and social networks, we are not aware of anystudies that consider location and absence privacy in theGeoSN setting.

Research on privacy in social networks focuses on profileinformation, which differs from dynamic geo-spatial infor-mation such as a user’s current location. In the GeoSN set-ting, specifically in relation to friend-finder services, privacypreserving proximity computation has been studied [6–8],since friend-finders are typical GeoSN services. However,the solutions proposed do not apply to resource publication.

Research on privacy in location-based services (LBSs) isalso not directly applicable to the GeoSN setting. Manytechniques proposed for LBSs assume that users are anony-mous and offer anonymity by extending the principle of k-anonymity to the geo-spatial setting, thus avoiding user re-identification through location data [4,5]. These techniquesare not suitable for our problem, since the assumption ofuser anonymity is not realistic for most existing social net-works.

Other techniques for LBSs assume that the identities ofthe users are known and offer location privacy by obfuscat-ing the users’ locations [3]. We adopt this location privacymodel that is also the reference model for privacy-preservingproximity services. Among the LBS solutions adopting thismodel, the proposals by Ghinita et al. [2] are arguably themost related since they also consider dynamic privacy at-tacks based on the maximum velocity of the users. However,like other LBS proposals, these proposals do not take intoaccount resources related to multiple users at the same timeas in GeoSNs, and they do not consider temporal uncertaintyand absence as privacy preferences. Finally, some existingGeoSN services offer some form of control of the geo-tags ofresources, e.g., by enabling tags at coarse granularities suchas the city level (e.g., Google Latitude), but much finer con-trols are necessary to avoid the privacy threats consideredin this paper.

The contributions of the paper are the following:

• Formalization of location and absence privacy attacksin the GeoSN setting.

• Proposals of means of expressing privacy preferences.

• A privacy preserving technique with formal guaranteeson the enforcement of user preferences.

• Empirical studies that suggest that the proposed meth-ods are applicable in realistic scenarios.

The rest of the paper is organized as follows. Section 2 for-mally characterizes the privacy problems in GeoSN resourcepublication and also covers how users can specify their pri-vacy preferences; Section 3 describes the architecture of theproposed GeoSN privacy preservation technique; Section 4reports the technical details of the algorithms that composeour technique; Section 5 describes experimental results, andSection 6 concludes the paper.

2. PROBLEM FORMALIZATIONWe formally describe the assumed GeoSN service setting

and the privacy threats we address. We then define meansfor the users to express their privacy preferences, the adver-sary model, and sufficient conditions for satisfying a user’sprivacy preferences.

2.1 The GeoSN service settingA GeoSN service allows its users to publish a resource

(e.g., a picture, a text message, a check-in) tagged with thecurrent location and time, as well as a set of users related tothe resource. A resource is either tagged automatically (e.g.an integrated GPS can provide location and time), or taggedmanually. Since resources and their tags become available toother users as well as to service providers, we are concernedwith the privacy violations that the publication can lead to.

Formally, a resource r is a tuple:

〈Udata, STdata, Content〉,

where the first two elements are meta-data tags with r.Udatabeing a set of identifiers of users, r.STdata being a spatio-temporal tag and r.Content being the resource itself. In thefollowing, when referring to a resource r, we also denote withr.Sdata and r.Tdata the spatial and temporal components,respectively, of r.STdata. We assume that all the users inr.Udata are in the location r.Sdata at the time r.Tdata.

As an example, recall the user Charlie performing a sta-tus update informing his friends about his presence in thepub together with Alice and Bob. In our formalization,the update is a resource with Alice, Bob, and Charlie asr.Udata, and the location of the pub with the current timeas r.STdata.

We consider techniques for privacy preservation based onthe generalization of resources before publication. In partic-ular, we consider generalization functions that generalize thespatio-temporal tag of a resource. Formally, STdata for anoriginal resource is a point in the spatio-temporal domain,while STdata for a generalized resource is a 3D volume inthe spatio-temporal domain that contains the point of thecorresponding original resource.3 In case of generalized re-sources r′, we denote by r′.Tmax and r′.Tmin the maximumand minimum time instant of r′.Tdata, respectively.

Definition 1. Given the domain OR of all possible origi-nal resources and the domain GR of all possible generalized

3We assume that location and time of the original resourceare recorded at the finest available resolution and is approx-imated by a point.

resources, a partial function g : OR → GR is a general-ization function if, for each r ∈ OR such that g(r) is de-fined, r.Udata = g(r).Udata, r.Content = g(r).Content andr.STdata ⊆ g(r).STdata.

The generalization function g(r) will take into accountthe privacy preferences of the users involved in r, as wellas all the generalizations of the original resources submittedbefore r. To avoid the case of two original resources beingsubmitted at the same time, we assume a total orderingamong the submission times of original resources.

2.2 Privacy concerns and user preferencesWhen an adversary associates a user’s identity with user’s

private information, a privacy violation has occured. Whenit is not possible to exclude that the user’s identity can beobtained by the adversary (as it is often the case in socialnetworks), privacy must be preserved by obfuscating the pri-vate information. We consider location and time as privateinformation (in the following we will for brevity often use“location” to refer to a spatio-temporal location), and weconsider two privacy notions: location privacy and absenceprivacy.

A location privacy concern exists when uncontrolled dis-closure of the geographic position of a user at specific timescan occur. Most of the currently available GeoSNs sufferfrom this privacy concern. A typical example is a user whoelects to not let people know that he attended a religiousceremony or a political meeting.

An absence privacy concern exists when uncontrolled dis-closure of the absence of a user from a geographic positionat specific times can occur. This concern is conceptuallydifferent from location privacy and requires different protec-tion techniques. A typical example is a user not wishing tolet people know that he will not be at home for an extendedperiod of time.

These privacy concerns can be addressed by offering theusers means of controlling the location information to be dis-closed. These means include different kinds of preferencesthe users can express: (a) for certain regions, the user’s lo-cation and time should be revealed only at a “sufficientlycoarse” granularity, (b) for certain regions, the user’s ab-sence should not be revealed. Hence, preferences for loca-tion privacy should specify the minimum uncertainty that anadversary should have about the location of the user uponpublication of a resource (possibly considering the resourcepublication history). In contrast, preferences for absenceprivacy should specify regions and time intervals such thatan adversary cannot exclude any point of the region as theuser’s location during the associated interval, independentlyof the publication of resources during the interval.

For both type of preferences, we introduce the notion ofminimal uncertainty region (MUR) as a spatio-temporal re-gion for which an adversary cannot exclude any internalpoint as the location of the user. In the solution we pro-pose, each user can express location privacy preferences byspecifying a partition of the spatio-temporal domain thatdefine the desired MURs. For example, Alice specifies thatany resource in which she is tagged should not report thespecific campus building where she was at 10:30 a.m. today;in this case, one solution is to define the combination of theentire campus region and the time period “this morning” asone of the MURs. In other words, when defining a MUR, auser accepts that the adversary learns that she is located in

that MUR, but requires that no location within the MURcan be excluded as a possible location.

The MURs for a user can be captured formally with thenotion of spatio-temporal granularity. Intuitively, a spatial(temporal) granularity is a partitioning of the spatial (tem-poral) domain into a discrete set of non-overlapping gran-ules [1]. Each granule is associated with an index i, and thei-th granule of a granularity G is denoted as G(i).

For clarity, we consider spatio-temporal granularities wherethe granules are products of granules of spatial and tempo-ral granularities. Thus, we assume that a user u specifies aspatial granularity GS

u and a temporal granularity GTu as pri-

vacy preferences. The spatio-temporal granularity Gu thatrepresents the user’s privacy preference is then derived fromGS

u and GTu as follows: for each index j of GS

u and each indexk of GT

u , Gu(〈j, k〉) = GSu(j) × GT

u (k). In this case the setof indexes of Gu is a set of pairs of numbers, and each gran-ule of Gu(〈j, k〉) is the temporal extension of GS

u(j) for thetime interval defined by GT

u (k). When no confusion arises,we denote the granules of Gu with a single index i = 〈j, k〉.Since the granularities partition the spatial and temporaldomains, for each spatio-temporal point p, there exists agranule of the granularity Gu that contains p. We denotethis granule with Gu[p]. Definition 2 formally states when alocation privacy preference is enforced.

Definition 2. Let u be a user and r be an original resourcewith u ∈ r.Udata. The location privacy preference of u,expressed as the spatio-temporal granularity Gu, is enforcedif the adversary cannot exclude any point of Gu[r.STdata]as the possible location of u.

Next, a user can express an absence privacy preference interms of a set of granules of a spatio-temporal granularity,and we call each granule an Absence Privacy Region (APR).The intuitive semantics is the following: no location infor-mation should ever be disclosed about the user such that,for any point p in an APR, p can be excluded as a possiblelocation of the user. Note that the difference with respectto location privacy is that the user requires that, indepen-dently of the previous or current publication of resources inlocations outside or inside the APR, no point of an APRcan be excluded as a possible location of the user. Defini-tion 3 formally states when an absence privacy preference isenforced.

Definition 3. Let u be a user and A be the set of APRschosen by u. The absence privacy requirement of u is en-forced if, for every point p of each APR in A, the adver-sary, when considering all published resources, cannot ex-clude that user u is located at p.

2.3 Adversary modelWe assume that the adversary has access to all the re-

sources published by all the users. This is a conservativeapproach, and in some cases, it would be possible to ex-ploit Access Control (AC) techniques to identify classes ofadversaries that can only access a subset of the resourcespublished by the users. The above assumption accounts forthe case of collusion among adversaries in different classes.

We assume that the adversary knows the technique usedto generalize resources before publication. Also, since we donot assume that the users’ privacy preferences are kept se-cret, we conservatively assume that the adversary can obtain

the privacy preferences. In general, we assume that the ad-versary has the knowledge to compute g(r) for each r ∈ OR.We also assume that the adversary knows, for each user, themaximum velocity v at which that user can move. Whileusers can have different maximum velocities, we assume forsimplicity that a single velocity v applies to all users.

As opposed to adversary models that assume a uniformdistribution of user locations, our defense techniques alsoallow adversaries having a non-uniform a-priori probabilis-tic distribution function P that gives for each user u theprobability that at a given time instant t, the user is in acertain spatial location s. Since we assume that a user canpublish a resource any time and from any location the useris at, the same probability distribution applies to the pub-lication of resources: for each potential original resource r,P [r.STdata = p] denotes the a-priori probability that r isissued from a spatio-temporal point p. Note that if the ad-versary knows from the a-priori knowledge that a user haszero probability of being located in a certain location at agiven time (and hence no resource tagging him can be pub-lished from there) then it is not possible to guarantee thelocation privacy of that user. Hence, for each potential orig-inal resource r and each spatio-temporal point p, we assumeP [r.STdata = p] > 0.

To summarize, we assume that the adversary has exactlythe following knowledge:

• the set R′ of all the published resources

• g(r) for each r ∈ OR

• the users’ maximum velocity v

• the a-priori probabilistic distribution function of orig-inal resources P [r.STdata = p] > 0.

Finally, we note that how to infer spatio-temporal infor-mation from the content of a resource, e.g., a photo, is anorthogonal research topic. If such information is made ex-plicit, it can be treated as a tag and thus handled by thepaper’s proposal.

2.4 Location privacy preservationA necessary condition to guarantee location privacy is that

each original resource r is generalized into a resource r′ suchthat, for each user u tagged in the resource, the MUR ofu that contains r.STdata must be contained in r′.STdata.Formally:

∀u ∈ r.Udata (Gu[r.STdata] ⊆ r′.STdata) (1)

When Equation 1 holds, we say that r′ covers its users’MURs.

As an intuition for this necessary condition, consider Fig-ure 1(a) that, for the sake of simplicity, shows a one-dimen-sional spatial domain. The attribute r.STdata is generalizedinto a spatio-temporal region r′.STdata that only partiallycovers the granule representing the MUR required by theuser u that is tagged in r. Consequently, the adversary canexclude u from the dark gray region, hence violating thelocation privacy requirement of u.

The fact that a generalized resource r′ covers its users’MURs is a necessary but insufficient condition to guaran-tee location privacy. Consider the example in Figure 1(b)that shows two generalized resources r1 and r2. User u1 istagged in both resources, while user u2 is tagged in r1 only.Resource r1 and r2 is generalized into r′1 and r′2, respec-tively. It is easily seen that each generalized resource covers

(a) Single resource (b) Set of resources

Figure 1: Example of privacy violations

its users’ MURs. However, location privacy of neither u1

nor u2 is enforced if we take into account the maximumvelocity v. First, the adversary knows that during time pe-riod r′1.Tdata, user u1 is in r′1.Sdata. Now, if resource r1

is issued from the bottom left corner of r′1.STdata, it is im-possible for user u1 to reach, with the maximum velocityv, the dark gray region of r′2.STdata. In other words, notall points of r′2.STdata are reachable from r′1.STdata, andhence some points can be excluded as possible locations ofu1. A similar attack can be performed to exclude some partsof r′1.STdata. Indeed, no point of r′2.STdata can be reachedfrom any point of the dark gray region in r′1.STdata. Sincethe adversary knows that at the time of r1 the two users arein the same location, the adversary can exclude u2 from thedark gray region of r′1, hence violating the location privacyof u2.

The above example shows that two resources can be de-pendent if they have at least one user in common and iftheir temporal distance is small when compared with thespatial distance. In order to capture this intuition formally,we first define the notion of reachability and then the notionof independence of resources.

Definition 4. Given a velocity v and two spatio-temporalregions str1 and str2, we say that str1 is reachable from str2

if

∀p1 ∈ str1 (∃p2 ∈ str2 (ds(p1, p2) ≤ v · dt(p1, p2))), (2)

where ds and dt denotes the spatial and temporal distancebetween two points, respectively.

In the following, we say that a resource r1 is reachable froma resource r2 if r1.STdata is reachable from r2.STdata. Thetransitivity property applies to reachability in case resourcesare totally ordered as defined in Property 1.

Property 1. (Transitivity) Let r1, r2, and r3 be gener-alized resources such that: r1.Tmax ≤ r2.Tmin and r2.Tmax ≤r3.Tmin . Then,(a) if r3 is reachable from r2 and r2 is reachable from r1

then r3 is reachable from r1;(b) if r1 is reachable from r2 and r2 is reachable from r3

then r1 is reachable from r3.

We can now define independent resources as follows.

Definition 5. Two resources r1 and r2 are independent ifat least one of the following conditions hold:a) r1.Udata ∩ r2.Udata = ∅;b) r1 is reachable from r2, and r2 is reachable from r1.

Intuitively, Definition 5 states that two resources having atleast one user in common are independent if they are reach-able from each other.

One last necessary condition to guarantee location pri-vacy is related to the generalization function g(). Considerthe following example, in which we exploit the fact that theadversary can compute g(r) for each potential original re-source r ∈ OR. Let r′ be a published generalized resource,p a point in r′.STdata, and r a potential original resourcethat is identical to r′ except r.STdata = p. Also assumethat g(r).STdata 6= r′.STdata. The adversary can excludep as a possible location of the users in r′.Udata since if theusers posted the resource from p then it would be general-ized to something different from r′. Clearly, this can violatethe location privacy requirement.

In order to guard against this kind of attack, the gen-eralization function must have the property known in theliterature as “non-invertibility” or “reciprocal” [4, 5]. Thisproperty is stated in Definition 6.

Definition 6. A generalization function g() is non-invert-ible if for each pair of original resources r1 and r2 such thatr2.Udata = r1.Udata, r2.Content = r1.Content , andr2.STdata ∈ g(r1).STdata, it holds that g(r1).STdata =g(r2).STdata.

We can now define a set of conditions that are sufficientto guarantee location privacy.

Theorem 1. Let R′ = {g(r)|r ∈ R and g(r) is defined}be the set of all published resources, where R is the set ofcorresponding original resources, and g() is a generalizationfunction. If all the following conditions hold then the lo-cation privacy of all the users tagged in these resources isenforced:(a) for each r′ in R′, r′ covers its users’ MURs(b) each pair of distinct resources r′1 and r′2 in R′ are inde-pendent(c) g() is non-invertible

If this is satisfied, we say that R′ is a safe set.The proofs of Theorem 1 and two following theorems are

omitted due to space limitations.

2.5 Absence privacy preservationWe consider the class of absence privacy preferences that

model the following situation: at each time instant, a userwants to prevent the adversary from learning, that, at thattime, the user is not in the APRs. This protects from theproblem of unattended locations (like a user’s home) thatarises in the case of instant publishing of resources.

In our model, this privacy preference can be specified eas-ily. Intuitively, the user defines a spatial region s. Then,at each time instant t, the APR corresponding to s is thespatio-temporal region formed by s at time t. This can beeasily extended to a set S of spatial regions, and we denoteby Su the set of spatial regions specified by user u to protectabsence privacy.

We now define a set of conditions that are sufficient toguarantee that absence privacy is enforced. Intuitively, ifall the published resources are such that location privacy isenforced then by definition, for each resource r′ in which auser u is tagged, no location in r′.STdata can be excludedas a possible location of u. Then, in order to guarantee thatabsence privacy is enforced, we postpone the publication ofthe resources until each APR of u is reachable from theSTdata of each resource in which u is tagged. Theorem 2captures this intuition.

Theorem 2. Let R′ be the set of published resources andu be a user. Then absence privacy is enforced if the followingtwo condition hold:(a) R′ is such that location privacy is enforced(b) the publication time t of each r′ ∈ R′ is such that ∀s ∈Su, the spatio-temporal region given by s×t is reachable fromr′.STdata

3. ARCHITECTUREFigure 2 shows the architecture of the privacy enforcing

system, including the interaction of the users with the sys-tem. We assume the presence of a centralized trusted en-tity that is in charge of processing the original resource andpublishing it to the GeoSN after the generalization processdescribed in the following.

Figure 2: System architecture

When a user wants to publish a resource r, possibly re-lated to multiple users, the user sends the resource to thetrusted system. A pre-processing module retrieves the pri-vacy preferences of the involved users and computes the gen-eralization srg that covers its users’ MURs.

The cloaking module checks whether the disclosure of srgintroduces privacy violations when related to other resourcespreviously processed. If so, the module applies a general-ization algorithm in an attempt to produce a more generalresource that eliminates the violation. We show in Section 4that there are cases in which such generalization will not bepossible, and hence the publication should be denied. In allthe other cases, the cloaking module finds a safe generaliza-tion r′ and sends it to the Publisher module.

The Publisher module checks whether the users’ absenceprivacy preferences are violated by the immediate publica-tion of the resource and computes a publication time afterwhich it is safe to publish the resource. The final resourceis then stored in the local database, since it must be usedto check new resources for possible violations. After thepublication time has passed, the resource is published.

The privacy-enforcing trusted system needs to know theUdata and STdata of each resource to be processed, but itdoes not rely fundamentally on the content of a resource.Thus, a variant of the system can return instructions to auser for how to safely publish a resource when simply giventhe Udata and STdata of the resource. This variant is at-tractive if the user does not want to reveal the content of aresource, e.g., a photo, to the system.

This also makes it possible to implement our techniquesas a third-party application for existing GeoSNs. Indeed,

many existing GeoSNs allow third-party applications to havelimited access to the users’ data and to post resources onbehalf of the users upon being authorized to do so.

4. THE Wyse TECHNIQUEThis section presents Wyse (Watch Your Social stEp),

our proposed privacy-preservation technique.

4.1 OverviewThe general idea is the following. Given a resource r to be

published and a stored, safe set R′ of generalized resources,the technique produces a generalized resource r′ such that{r′} ∪ R′ remains safe. As will be explained, it may hap-pen that it is impossible to find a proper r′. In this case,publication is denied.

Figure 3: Steps of Wyse

Figure 3 presents the different steps of the technique.First, the resource is generalized to a resource covering itsusers’ MURs. This step is performed by the Single ResourceGeneralization (SRG) module. Then the resource is sent toa cloaking module, where an additional spatial or temporalgeneralization is applied, if necessary. Finally, the resourcepasses through the absence module, where the publicationtime of the resource is defined so that absence privacy isenforced. We proceed to cover these steps in detail.

4.2 Single resource generalizationThe SRG module performs a generalization of the origi-

nal resource r according to the privacy preferences of eachuser associated with the resource and produce as output ageneralized resource srg that covers its users’ MURs.

To obtain this result, a granularity Gr is computed suchthat each granule of Gr is the union of a set of granules ofGu for each u in r.Udata. Then, the resource srg is producedsuch that srg .STdata = Gr[r.STdata].

In order to guarantee that a proper granularity Gr existsfor each possible set r.Udata, we limit the possible spatialand temporal granularities each user can choose as locationprivacy preferences. In brief, we assume that each user u canchoose a spatial granularity GS

u in a set of granularities thatare totally ordered. If a granularity G1 precedes G2 withrespect to this order then every granule of G2 is equal to theunion of some granules of G1. The same applies for tempo-ral granularities. With these assumptions, the granularityGr exists and is obtained as the combination of the coarserspatial granularity GS

u for each user u ∈ r.Udata and thecoarser temporal granularity GT

u for each user u ∈ r.Udata,denoted in the following with GS

r and GTr , respectively. As

srg .STdata is the granule Gr[r.STdata], for simplicity werefer to this as the granule of srg , and we denote it asGr[srg .STdata].

4.3 CloakingAs observed in Section 2, privacy violations occur when

resources are not independent. The aim of the cloaking mod-

ule is to find a generalized resource, derived from srg andGr, that does not cause a privacy violation when combinedwith resources in R′. The resulting generalized region isequivalent to or larger than srg .STdata. We propose twodifferent cloaking algorithms, called CountryCloakWyseand ClockWyse, that apply a generalization only on thespatial or temporal dimensions, respectively.

In order to reduce the number of resources belonging to R′

that need to be considered during the generalization process,both algorithms exploit the transitivity property of reach-ability (Property 1). Indeed, to obtain the independencewith respect to all resources, it is sufficient to identify a setof relevant resources and to enforce independence with re-spect to the resources in this set. To formally define the rel-evant resource, we need the following notation: resource r1

“precedes” resource r2 if r1.Tmax ≤ r2.Tmin and resource r3

“succeeds” resource r4 if r3.Tmin ≥ r4.Tmax . The set of rele-vant resources can then be defined as the union, for each useru ∈ r.Udata, of the resources ri ∈ R′ such that u ∈ ri.Udataand one of the following conditions holds:

• (concurrency): ri.Tdata ∩ srg .Tdata 6= ∅,• (preceding): ri precedes srg and there does not exist

any resource that precedes srg and succeeds ri,

• (succeeding): ri succeeds srg and there does not existany resource that succeeds srg and precedes ri.

In addition, in order to efficiently compute the independenceof the relevant resources, we exploit Property 2 that reducesthe problem of evaluating reachability to the computationof a single inequality.

Property 2. Given a velocity v and two resources r1 andr2, r1 is reachable from r2 if and only if the following con-dition holds:

hs(r1.STdata, r2.STdata) ≤

v ·max(|r2.Tdata|

2, ht(r2.STdata, r1.STdata)),

where |r2.Tdata| = r2.Tmax − r2.Tmin and hs and ht are thedirect spatial and temporal Hausdorff distances, respectively.

The direct spatial (temporal) Hausdorff distance betweenr1.STdata and r2.STdata is the maximum distance betweenany point in r1.Sdata (Tdata) to its closest point in r2.Sdata(Tdata). The spatial Hausdorff distance has been exploitedin literature to describe attacks based on the knowledgeof maximum velocity [2]. Property 2 is an extension forspatio-temporal regions that also considers the direct tem-poral Hausdorff distance.

The next two sections illustrate our two proposed gener-alization algorithms.

4.4 Spatial generalization: CountryCloakWyse

CountryCloakWyse generalizes Sdata with respect tothe srg while preserving Tdata. This kind of generalizationis thus suitable for services in which a high time accuracy isdesirable, such as microblogging services.

The pseudocode can be found in Algorithm 1. Overall,the idea of the algorithm is the following. First, from therelevant resources, two sets of granules of Gr are identified,among the set Gsrg.Tdata of the granules obtained as theproduct of each spatial granule of GS

r with GTr [srg .Tdata],

The set Ssafe contains granules such that if the granule of srg

Algorithm 1 CountryCloakWyse

Input: The resource srg , the granularity Gr and a safe setof resources R′.Output: A resource r′ such that R′ ∪ {r′} is a safe set.

1: compute Gsrg.Tdata

2: Ssafe , Sgener ← Gsrg.Tdata

3: RR = { set of resources in R′ that are relevant to srg}4: for each rR ∈ RR do5: SI gener = {g ∈ Gsrg.Tdata : g is reachable from rR}6: SI safe = {g ∈ Gsrg.Tdata : g is independent from rR}7: Sgener = Sgener ∩ SI gener

8: Ssafe = Ssafe ∩ SI safe

9: end for10: Sgener = Sgener \ Ssafe

11: if (Gr[srg .STdata] 6∈ Sgener ∪Ssafe) or (Ssafe = ∅) then12: return null {Deny the publication}13: end if14: cg = NN of srg in Ssafe

15: r′.Sdata =S{ RNN of cg in Sgener}

16: r′.Sdata = r′.Sdata ∪ cg17: 〈r′.Tdata, r′.Content〉 ← 〈srg .Tdata, srg .Content〉

is one of these granules then it would not require any gen-eralization. The set Sgener contains the granules such thatif the granule of srg is one of these granules then a general-ization is required to guarantee independence (lines 2–10).

Figure 4: Spatial generalization

If the granule of srg is neither in Sgener nor Sgener , or ifSsafe is empty, then the publication of the resource is denied(lines 11–12). Otherwise, due to Property 3, if srg .STdatais a granule of Sgener , then, to guarantee the independenceof the resource, it is sufficient to generalize srg .STdata sothat a granule of Ssafe is included. In order to achieve thiswith a non-invertible algorithm, the following is performed:the granule cg is computed as the granule in Ssafe that is thenearest neighbor (NN) of srg .STdata (this can be srg .STdataitself, if it is a granule in Ssafe). Then the generalization iscomputed as the union of cg with the set of granules ofSgener that considers cg as the closest granule in Ssafe . Thisis computed with a reverse nearest neighbor (RNN) query(lines 14–15). Both the NN and the RNN queries are com-puted considering the centers of the granules as source andtargets.

Property 3. Let A, B, and C be spatio-temporal regionssuch that A is reachable from C and A is not reachable fromB. Then A is reachable from B ∪ C.

An example of the algorithm is illustrated in Figure 4.The figure represents the safe and generalizable granules forsrg and a related resource rR, the closest safe granule to srg

and the resulting generalization. In general, the generaliz-able granules are partitioned with respect to their closer safegranule, and all the granules that belong to a partition areretrieved as the output of the generalization.

4.5 Temporal generalization: ClockWyse

ClockWyse performs temporal generalization techniqueand is suitable for services in which it is desirable to preservethe accuracy of the spatial location, like in check-ins, ratherthan maintaining an accurate temporal interval. In this case,a generalization is computed considering the set Gsrg.Sdata

of the granules obtained as the product of each temporalgranule of GT

r with GSr [srg .Sdata].

The pseudocode is shown in Algorithm 2. The idea of thealgorithm is the following: as inCountryCloakWyse, thetwo set of granules Ssafe and Sgener are computed from therelevant resources. However, in this case, since the gener-alization is performed on a single dimension (the temporaldimension) Ssafe and Sgener are intervals of time granules andhence can be specified in terms of the intervals’ boundaries.Another difference with respect to CountryCloakWyseis that to compute the two sets, it is necessary to considerthe three cases in which a relevant resource is preceding,succeeding, or concurrent wrt. srg (lines 3–16).

Figure 5: Temporal generalization (one-dimensionalspace)

We explain the algorithm for the case in which a relevantresource rR precedes srg ; the other cases are analogous. SeeFigure 5 that shows, for the sake of simplicity, a single spatialdimension. If g1 is the granule of srg then srg .STdata is notreachable from rR. Hence, even if srg is generalized, it is stillunreachable from rR, and the publication of the resource hasto be denied (lines 17–18). All of the granules to the right ofg1 are reachable from rR. However rR is not reachable fromall of them. Indeed, rR is not reachable from the granulesto the left of sfut . According to Property 3, these granulescan be generalized together with a granule in Ssafe to enforcelocation privacy. The solution adopted to achieve this with anon-invertible algorithm is the following: if the granule of srgis sfut or in Sgener then the temporal domain is generalizedto the “Generalization” box, i.e., the union of sfut and all thegranules in Sgener (lines 19–20). Otherwise, if the granule ofsrg is in Ssafe , but is not sfut , no temporal generalization isapplied (line 23–24).

A numerical example showing a two-dimensional spacecan be seen in Figure 6. The striped lines centered in eachresource r′ with radius t = x enclose, for a user who is insider′ and can travel at a maximum velocity of v, the possiblelocations after x time instants. Graphically, these regions

Algorithm 2 ClockWyse

Input: The resource srg , the granularity Gr and a safe setof resources R′.Output: A resource r′ such that R′ ∪ {r′} is a safe set.

1: compute Gsrg.Sdata

2: RR = { set of resources in R′ that are relevant to srg}3: for all rR ∈ RR do4: if (srg is concurrent with rR) or

(hs(rR, srg) ≤ v · |srg.Tdata|2

) then5: spast , sfut ← past/future safety boundaries6: ge fut ← future generalization boundary7: gepast ← Previous granule of ge fut − 18: else if srg .Tmin ≥ rR.Tmax then9: ge fut ← future generalizable boundary

10: sfut ← future safety boundary11: else12: gepast ,← past generalizable boundary13: spast ← past safety boundary14: end if15: Update generalizable and safety boundaries16: end for17: if (Gepast < srg < Ge fut) or (Sfut > Spast) then18: return null {Deny the publication}19: else if Ge fut ≤ srg ≤ Sfut then20: r′.Tdata =

S{g ∈ Gsrg.Sdata : Ge fut ≤ g ≤ Sfut}

21: else if Spast ≤ g ≤ Gepast then22: r′.Tdata =

S{g ∈ Gsrg.Sdata : Spast ≤ g ≤ Gepast}

23: else24: r′.Tdata = srg .Tdata {No generalization}25: end if26: 〈r′.Sdata, r′.Content〉 ← 〈srg .Sdata, srg .Content〉

correspond to the Minkowski sum of r′.Sdata and a circularregion having radius v · x (to simplify the notation, in thefigure we write t = x to refer to t = v · x).

Figure 6: Example of temporal generalization (two-dimensional space)

The figure shows the Minkowski sum of rR.Sdata and acircular region having radius t = 4, which is the time dif-ference between srg .Tdata and rR.Tdata. As this regioncontains srg , it means that it is possible to travel from rR

to any point of srg in 4 time instances. Therefore, srg isreachable from rR and the resource can be generalized byour technique (if this first condition does not hold, our al-gorithm denies publication). However, as can be observedin the figure, the opposite does not hold: it is not possi-ble to reach, from srg , all the points in rR in 4 time in-stants. Thus, rR is not reachable from srg . Intuitively, ifsrg is published, there is a part of rR that could be dis-carded as the origin of the resource rR (a user must have

been in the light gray region of rR in order to be able toreach srg by time t = 5). Therefore, a generalization tech-nique must be applied. Our temporal generalization obtainsthe following values: (a) Ge fut = 3 + 1, which intuitivelycorresponds to the minimum radius of a circular region suchthat the Minkowski sum of rR.Sdata and that circular regionfully covers srg , plus rR.Tdata, and (b) Sfut = 6 + 1, whichintuitively corresponds to the minimum radius of a circu-lar region such that the Minkowski sum of srg .Sdata andthat circular region fully covers rR, plus rR.Tdata. Then,srg .Tdata is generalized to the interval [Ge fut , Sfut ] = [4, 7].

4.6 Absence privacy enforcement and publi-cation

A resource r′ generalized by CountryCloakWyse orClockWyse is guaranteed to be independent from everyother existing resource, but this does not offer any guaran-tees with respect to the absence privacy preferences Su ofthe users (see Section 2.5). To enforce absence privacy, thepublication time of the resource must be computed in a waysuch that every spatial region in the Su of the users in Udatais reachable from r′ at that time. Algorithm 3 computes thepublication time of a resource r′ considering absence privacypreferences.

Algorithm 3 MinimumPublicationTime

Input: A resource r′ generalized by the cloaking moduleOutput: The minimum publication time for r′

1: delay = 02: for all u ∈ r′.Udata do3: Su = the set of spatial regions chosen by u in the

absence privacy preference4: for all apr ∈ Su do

5: delay = max(delay , hs(apr,r′.Sdata)v

)6: end for7: end for8: return delay + r′.Tmin

To determine the publication time of the resource, themaximum value between r′.Tmax and the value of Mini-mumPublicationTime is first stored in a temporary attributer′.pTime. The resource is then stored in the database ofgeneralized resources, and the Publisher component is re-sponsible for releasing the generalized resource to the GeoSNonce the time instant r′.pTime is reached. This guaranteesthat (a) it is not possible to exclude any of the points ofr′.Tdata as possible Tdata of the original resource, and (b)after pTime has passed, each user in Udata can be located inany of the spatial regions indicated by the absence privacypreference.

However, if another resource r′′ relevant to r′ is submitted,processed and published before r′.pTime, in some particularcases, an adversary may detect, by looking at r′′, that thereis a resource waiting to be published in the system. In theworst case, the adversary can obtain accurate informationabout STdata and Udata of r′, and this could enable theadversary to determine that a user in Udata can not be lo-cated in one of the spatial regions indicated in the absenceprivacy preference. To ensure that this attack cannot beperformed, we modify our cloaking algorithms so that, dur-ing the generalization of a resource r′, the maximum pTimeof the resources that were considered for the generalization

is stored in a variable called relpTime. The final publica-tion time of r′ then becomes the maximum among relpTime,MinimumPublicationTime(r′), and r′.Tmax.

4.7 Correctness of privacy enforcementThe Wyse algorithm enforces privacy because it guaran-

tees the sufficient properties of Theorems 1 and 2. This isformally stated in Theorem 3.

Theorem 3. The Wyse algorithm enforces location andabsence privacy.

The correctness of the theorem follows from the followingreasoning. To offer location privacy, the SRG module gener-alizes each resource so that it covers its users’ MURs (con-dition (a) of Theorem 1). Also, the cloaking module guar-antees that each resource is independent with respect to itsrelevant resources which, according to Property 1, meansthat the resource is independent of all the other resources(condition (b) of Theorem 1). As a result, the Wyse al-gorithm is not invertible (condition (c) of Theorem 1). In-tuitively, this holds because the Wyse algorithm implicitlycomputes a partition of the spatio-temporal domain. Then,the spatio-temporal attribute of each generalized resourceg(r) is computed as the block containing r.STdata.

According to Theorem 2, it is sufficient to guarantee twoconditions in order to enforce absence privacy. The first,which concerns the location privacy of the published re-sources, follows from the above reasoning. The second, whichrelates to the publication time, is guaranteed by the Absencemodule that delays the publication of resources according touser’s preferences.

5. EMPIRICAL STUDYWe conducted experiments to measure the impact of the

Wyse technique on the quality of service, comparing theClockWyse and CountryCloakWyse variants.

5.1 SettingThe experimental evaluation was performed on a survey-

driven synthetic dataset of user movements, which was ob-tained using the MilanoByNight simulation4. We carefullytuned the simulator in order to reflect a typical deploymentscenario of a GeoSN: 100,000 potential users moving betweentheir homes and one or more entertainment places in the cityof Milan during a weekend night. Locations are sampled ev-ery minute. The total size of the map is 215 km2, and theaverage density is 465 users/km2. The simulation also mod-els the time spent at the entertainment places, i.e., whenno movement occurs, following probability distributions ex-tracted from user surveys. The observed maximum velocityis 55.05 km/h, which is used as the velocity parameter forthe generalization techniques.

We extended the simulation by specifying how often theresources are generated by users. When a resource is gener-ated by a user, the other users that are located in the sameplace are associated with the resource, and the resource issubmitted. However, as it is unlikely that all the users inthe same place are associated with each resource generatedin that place, we limit the number of users that can be as-sociated with the same resource.

4See everywarelab.dico.unimi.it/lbs-datasim

The experiments use regular granularities, i.e., all thegranules of a granularity have the same size and shape. Eachtemporal granularity is identified by the duration of timecovered by one single granule. Each spatial granularity isidentified by the size of the edge of one cell of a grid. Theconversions from a spatio-temporal location to the respec-tive spatial and temporal granules are performed in constanttime.

For absence privacy, we assume that each user’s homeis sensitive. For simplicity, we assume that each user se-lects randomly, at the beginning of the simulation, a spatio-temporal granularity among the ones we consider. To ob-serve the impact of the preferences, we limit the minimumgranularity that a user can choose. The parameters used inthe experiments are shown in Table 1, with default values inbold. We implemented our algorithms using the Java lan-guage, and the studies were performed on a computer witha 2.5 GHz Intel Xeon processor and 32GB of shared RAM,running 64-bit RedHat Enterprise Linux 5 and using 1GBof allocated memory for the JVM.

Parameter ValuesResources posted by a user in 1 hour 1,2,6,12

Max number of users in UData 1,5,10,20

Min time interval of GTu (minutes) 1,4,8,16,32

Min edge of a cell of GSu (meters) 128,256,512,1024,2048

Table 1: Parameters and their values

5.2 Quality of serviceFigure 7(a) shows the percentage of times a technique

fails to find a safe generalized resource, considering differ-ent resource posting frequencies. In order to observe whenthe cloaking techniques actually avoid a resource from be-ing dropped, we developed a technique that consists onlyof the Single Resource Generalization step (SRG) and thatdrops the resource when the resulting generalization violatesprivacy. As can be observed, the percentage of droppedresources grows when the resources become more frequent.This is because it then becomes more likely that the relevantresources are close in time and the privacy condition is notsatisfied. We can also observe that both the ClockWyseand CountryCloakWyse techniques can often produce asafe generalized resource even when the resource generatedby SRG is unsafe. In particular, with our default values, theCountryCloakWyse and ClockWyse algorithms dropabout one half of the resources dropped by SRG, and thedrop ratio of ClockWyse grows much more slowly thanthose of the other techniques.

To observe the generalizations applied to the resources,we analyzed the average spatial area, the average temporalduration (i.e., the length of the temporal interval Tdata),and the average publication delay.

Figure 7(b) shows the average area using different val-ues for the minimum Gs chosen by users. As it can beobserved, the area grows linearly with the edge length ofthe spatial granules. In addition, the CountryCloakWysegenerates, on average, regions having areas larger than theones generated by SRG. This is expected because the Coun-tryCloakWyse attempts to extend the spatial region toavoid privacy violations and/or dropping.

In Figure 7(c), we measure the average time duration ofthe generalized resource considering different minimum val-ues of GT

u . With our default parameters and using bothalgorithms, the average time duration is around 40 minutes,

0

1

2

3

0 2 4 6 8 10 12

%of

dro

pp

edre

s.

Resources posted per hour

SRGCCWyse

CWyse

(a) Drop ratio

10

20

30

40

50

0 400 800 1200 1600 2000

km

2

Min. edge length of a cell of Gs (m)

CCWyseCWyse/ SRG

(b) Area

0

100

200

300

0 5 10 15 20 25 30 35

min

ute

s

Min duration of a granule of Gt (minutes)

CWyseCCWyse/ SRG

(c) Time interval

1416182022242628

0 2 4 6 8 10 12 14 16 18 20

min

ute

s

Maximum cardinality of Udata

CWyseCCWyse

(d) Publication delay

Figure 7: Impact on the generalized resource

with ClockWyse and CountryCloakWyse performingsimilarly. However, ClockWyse may produce a time inter-val that is significantly longer than the average. The verti-cal bars show the maximum time durations observed in theexperiments. For this reason, it is suggested to use Coun-tryCloakWyse for those services that perform better witha more accurate time duration.

Figure 7(d) shows the average publication delay of the re-sources generated by our techniques for varying maximumvalues of Udata. The delay can be caused by both the timegeneralization due to location privacy enforcement and thedelay due to absence privacy enforcement. The results indi-cate that the larger the number of users in Udata, the moredelay is added to the resource. This happens mainly be-cause having more users in Udata increases the probabilitythat some user has stricter (i.e., coarser) location privacypreferences and also results in more absence privacy prefer-ences that need to be satisfied.

5.3 RuntimeWe measured the runtime required by each algorithm to

process a resource. With our default values, the averagecomputation time for ClockWyse is 23.24 ms, while it is39.13 ms for CountryCloakWyse. These results suggestthat our prototype can support a mid-scale resource publish-ing service. This is without any optimizations to the code.We believe that the cost of some operations (e.g., the NNand the RNN operations) can be significantly reduced byoptimizations, including the use of spatio-temporal indexes.

6. CONCLUSIONSAs social networking services continue to proliferate, there

is an increasing need for means of affording users privacy.This paper proposes such means for a setting in which so-cial network users can publish resources that have spatio-temporal tags and that reference other users—a prototypi-cal example is a photo with location and time and a listingof who appears in the photo.

This paper addresses two privacy threats in this setting,namely location privacy and absence privacy. The formerconcerns the availability of information about the presenceof users in specific locations at given times, while the latterconcerns the availability of information about the absenceof an individual from specific locations during given periodsof time. The paper is the first to address these threats inthis setting.

The paper formalizes the setting, provides a foundationfor easily defining privacy preferences, and provides tech-niques that generalize the tags of resources so that theseremain useful while ensuring that the privacy preferencesare enforced. These techniques exploit spatial and temporalgeneralization, and they utilize publication delays. An em-

pirical study characterizes the impact of the proposal on thequality of service, considering the number of resources thatare dropped, the extents of the spatio-temporal generaliza-tions, and the publication delays, suggesting that these arerelatively modest. Further, the runtime of the techniques islow.

In future research, it is relevant to study how to best en-able users to specify their location and absence privacy pref-erences as supported by the paper’s proposal. Next, it maybe of interest to consider more general privacy preferences.Finally, it is of interest to consider integration with accesscontrol mechanisms.

AcknowledgmentsThis work was partially supported by Italian MIUR un-der grants PRIN-2007F9437X and FIRB-RBFR081L58 002,and by the NSF under grant CNS-0716567. The researchwas performed when C. S. Jensen was with Aalborg Univer-sity. C. S. Jensen is an adjunct professor at University ofAgder, Norway.

7. REFERENCES[1] C. Bettini, X. S. Wang, and S. Jajodia. Time Granularities in

Databases, Temporal Reasoning, and Data Mining. Springer,2000.

[2] G. Ghinita, M. L. Damiani, C. Silvestri, and E. Bertino.Preventing velocity-based linkage attacks in location-awareapplications. Proc. of ACM Int. Symp. on Advances inGeographic Information Systems, 246–255, 2009.

[3] C. S. Jensen, H. Lu, and M. L. Yiu. Location privacytechniques in client-server architectures. Privacy inLocation-Based Applications, LNCS 5599, 31–58. Springer,2009.

[4] P. Kalnis, G. Ghinita, K. Mouratidis, and D. Papadias.Preventing location-based identity inference in anonymousspatial queries. IEEE Transactions on Knowledge and DataEngineering, 19(12):1719–1733, 2007.

[5] S. Mascetti, C. Bettini, D. Freni, and X. S. Wang. Spatialgeneralization algorithms for LBS privacy preservation.Journal of Location Based Services, 1(3):179–207, 2007.

[6] S. Mascetti, C. Bettini, D. Freni, X. S. Wang, and S. Jajodia.Privacy-aware proximity based services. Proc. of the 10th Int.Conference on Mobile Data Management, 31–40, 2009.

[7] L. Siksnys, J. R. Thomsen, S. Saltenis, and M. L. Yiu. Privateand flexible proximity detection in mobile social networks.Proc. of the 11th Int. Conference on Mobile DataManagement, 75–84, 2010.

[8] G. Zhong, I. Goldberg, and U. Hengartner. Louis, Lester andPierre: Three protocols for location privacy. PrivacyEnhancing Technologies, LNCS 4776, 62–76. Springer, 2007.

preserving location and absence privacy in geo-social networks

Documents