2011 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Seattle, WA, USA, 21-25 March 2011

Demystifying Privacy in Sensory Data: A QoI Based Approach

Supriyo Chakraborty, Haksoo Choi and Mani B. Srivastava
University of California, Los Angeles
{supriyo, haksoo, mbs}@ucla.edu

Abstract: There is a growing consensus regarding the emergence of privacy concerns as a major deterrent to the widespread adoption of emerging technologies such as mobile healthcare, participatory sensing, and other social-network-based applications. In this paper, we motivate the need for privacy awareness and present a taxonomy of privacy problems and the various existing solutions. We highlight the tension that exists between quality of service at the receiver and the privacy requirement at the source, and present a linear program formalization to model the tradeoff between the two objectives. We further present the design and architecture of SensorSafe, a framework which allows privacy-aware sharing of sensory information.

Keywords: Privacy; Trust; Information Sharing; Access Control

I. INTRODUCTION

The canonical privacy problem for databases can be summed up as: given a collection of personal records from individuals, how do we disclose either the data or useful function values, such as correlations and population characteristics computed over the data, without revealing individual information? This notion of absolute privacy is analogous to the principle of semantic security formulated for cryptosystems by Goldwasser et al. [1]. A trivial solution to the above problem is to disclose random data bearing no relation to the actual database. However, the implicit notion of utility associated with the disclosed data prohibits the use of such a scheme. In addition, the utility requirement, in conjunction with adversarial access to auxiliary information, makes it impossible to achieve absolute privacy [4]. Current research in database privacy has thus evolved into a study of the tradeoff between the degradation in the quality of information shared, owing to privacy concerns, and the corresponding effect on the quality of service [9][10].

While database privacy has been extensively studied over the past decade, the proliferation of mobile smartphones with embedded and wirelessly connected wearable sensors has added a new dimension to the problem. Miniature sensors that can be easily placed on an individual can be used to unobtrusively collect large amounts of personal data. This data, which is richly annotated with both temporal and spatial information, is in turn used by a variety of participatory sensing applications [11], healthcare studies [32][33], and behavioral studies to extract both population- and individual-level inferences. Evidently, sensory data pertaining to an individual's daily life can be used to draw extremely sensitive inferences, and hence adequate privacy measures are essential. Concerns about privacy have the potential to reduce user participation, which is critical for the success of population-based studies, and/or to reduce application fidelity due to random degradation in data quality. Thus, on one hand, from a data source's perspective the fundamental questions while sharing sensory data are: (1) With whom should the data be shared? (2) How much data should be shared? and (3) How should neighborhood influences (collusion) be accounted for while sharing information? On the other hand, a data receiver is typically concerned about the quality of information (QoI) received and the resulting quality of service (QoS). These seemingly contradictory objectives of the source and the receiver create a tension which is fundamental to the privacy problem.

In this paper, we present a linear program formulation of the above risk-mitigation problem. We use the notion of trust to quantify the risk of leakage, and use a source-specified tolerance parameter to simultaneously maximize application QoS and minimize information leakage. Varying the tolerance parameter allows the source to compute the QoS for a given privacy level, and thus provides greater flexibility of operation. Using the trust graph provides insight into possible collusion between receivers, which further affects the quality of the data shared. In addition, we briefly discuss the design and architecture of SensorSafe, a framework which allows users in a participatory sensing setting to share data in a private way. It does so by providing users with fine-grained access control primitives and a library of obfuscation algorithms to control information and inference disclosure.
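A QoS-versus-leakage tradeoff of this kind can be sketched as a small linear program. The sketch below is only an illustration under assumed structure, not the paper's actual formulation: it posits one share level per receiver, a per-receiver trust value, a single aggregate leakage constraint bounded by the tolerance parameter, and made-up utility and trust numbers. With one constraint and box bounds, this LP reduces to a fractional knapsack, so a greedy fill by utility-per-unit-leakage is optimal.

```python
def share_levels(u, t, eps):
    """Maximize sum(u[i] * x[i]) s.t. sum((1 - t[i]) * x[i]) <= eps, 0 <= x[i] <= 1.

    u[i]: utility receiver i derives per unit of data quality (assumed).
    t[i]: trust in receiver i, in [0, 1]; leakage cost per unit share is 1 - t[i].
    eps:  source-specified leakage tolerance.
    """
    w = [1.0 - ti for ti in t]  # leakage cost per receiver
    # Greedy by utility-per-unit-leakage is optimal for this LP relaxation.
    order = sorted(range(len(u)),
                   key=lambda i: (u[i] / w[i]) if w[i] > 0 else float("inf"),
                   reverse=True)
    x, cap = [0.0] * len(u), eps
    for i in order:
        if w[i] == 0.0:
            x[i] = 1.0          # fully trusted receiver: sharing leaks nothing
            continue
        x[i] = min(1.0, cap / w[i])
        cap -= w[i] * x[i]
    return x

# Hypothetical numbers: three receivers with differing utilities and trust.
u = [3.0, 1.5, 2.0]
t = [0.9, 0.4, 0.7]
x = share_levels(u, t, eps=0.5)
qos = sum(ui * xi for ui, xi in zip(u, x))
```

Raising `eps` traces out the QoS achievable at each privacy level, which is the flexibility the tolerance parameter is meant to provide; a richer formulation would add the trust-graph collusion structure as additional constraints.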

Section II discusses recent privacy attacks and also outlines possible privacy concerns in future applications. The following section categorizes the privacy threats. This is important for two reasons. First, it allows the source to better estimate the possible threats and their consequences. Second, it allows a better selection of the information disclosure mechanism. Section IV discusses the importance of trust establishment and its role as a precursor to privacy concerns. Section V summarizes the current approaches for solving the privacy problem in various application domains. This is followed by Section VI and Section VII, where we present our work. We conclude in Section VIII.

    The Third International Workshop on Information Quality and Quality of Service for Pervasive Computing

978-1-61284-937-9/11/$26.00 © 2011 IEEE


Is privacy something that people really care about? A recent survey on location privacy [18] summarized interesting opinions about privacy from multiple independent studies. While people in general were oblivious to privacy violations and amenable to sharing their data, this perception quickly morphed into one of concern when they were apprised of the various sensitive inferences that could be drawn and the resulting consequences.

Some recent high-visibility fiascos have further established privacy as an important sharing constraint. Examples include the de-anonymization of the publicly released AOL search logs [13] and of the movie-rating records of Netflix subscribers [12]. The large datasets in question were released to enable data-mining and collaborative-filtering research. However, when combined with auxiliary information, the anonymized datasets were shown to reveal the identities of individual users.

Sharing sensory data also presents unique challenges. Due to its high temporal and spatial granularity, sensory data offers great potential for data mining, making it difficult to preserve privacy. For example, households in the U.S. are being equipped with smart meters that collect temporally fine-grained reports of energy consumption. This will allow utilities to better estimate domestic power consumption, leading to optimized distribution and control of the grid. However, as shown in [15], several unintended and sensitive inferences, such as occupancy and the lifestyle patterns of the occupants, can be made from the data in addition to total power consumption. In fact, privacy has been identified as a major challenge in fine-grained monitoring of residential spaces [17]. Similarly, in medical research, the continuous physiological data collected by wearable sensors can be used to infer potentially sensitive information such as smoking or drinking habits and food preferences. While informed consent of the data source is the currently used sharing policy, it can easily be overlooked, causing privacy violations, as exemplified in [14]: DNA information collected from blood samples of a particular tribe was originally meant for type-2 diabetes research, but was later used to further research in schizophrenia, a condition stigmatized by the tribe, causing extreme anguish and a sense of dissatisfaction. Similarly, participatory sensing applications [11] require users to voluntarily upload their personal data for the purposes of a study. For example, PEIR [16], a popular tool for computing carbon exposure levels during trips, requires users to upload their travel paths. In the absence of adequate privacy transformations, the uploaded traces could be used to find out frequently traveled paths, workplace, home, and other sensitive locations.

Thus, the privacy threat during data disclosure is real, and unless adequate mitigation steps are taken it could delay the adoption of various ubiquitous-sensing-based applications.



Depending on how data is shared by the source, we can group the various privacy problems into two broad classes:

Identity Violation: The data is syntactically sanitized by stripping it of personally identifiable information (PII) before sharing, a process called anonymization. The shared data is intended for research pertaining to population-level statistics. However, a privacy violation occurs when the data, in the presence of auxiliary information, is de-anonymized to reveal identity information. The Netflix [12] and AOL [13] fiascos fall under this category. There are two important challenges in using PII-based anonymization. First, the definition of PII is inadequate for many types of datasets [3], including sensory data. For example, while sharing sanitized location traces, the location data itself can be used to re-identify an individual. Hence, it is hard to clearly demarcate the PII from the other shared attributes. Second, it is assumed that the non-PII attributes cannot be linked to an individual record. However, auxiliary information has been used along with non-PII attributes to de-anonymize large datasets [12][2].
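The second challenge, linkage via non-PII attributes, can be made concrete with a toy example. Every field name, record, and auxiliary row below is invented for illustration; the point is only that stripping the designated PII fields leaves quasi-identifiers that still join against outside data.

```python
# Fields designated as PII in this hypothetical dataset.
PII = {"name", "ssn"}

def anonymize(records):
    """Syntactic sanitization: drop only the fields designated as PII."""
    return [{k: v for k, v in r.items() if k not in PII} for r in records]

# Released dataset: name and SSN stripped, but "zip" survives as a quasi-identifier.
released = anonymize([
    {"name": "Alice", "ssn": "123", "zip": "90024", "movie": "Heat"},
])

# Auxiliary information the adversary already holds, e.g. a public profile.
auxiliary = [{"name": "Alice", "zip": "90024"}]

# De-anonymization: join the released records to the auxiliary data on
# the non-PII attribute, recovering who watched what.
reidentified = [
    (aux["name"], rel["movie"])
    for rel in released
    for aux in auxiliary
    if rel["zip"] == aux["zip"]
]
```

In realistic datasets the join key is a combination of attributes (or, for sensory data, the location trace itself), which is why a fixed list of PII fields is so hard to get right.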

Inference Violation: For this class of problems the source's identity is not concealed in the shared data. Instead, there exists a specific set of inferences which the source wants to protect. For example, in a medical study EKG data is needed to monitor variability in heart rate. However, the data should not be used for other sensitive inferences, such as the smoking or drinking habits of the individual. The attacks using smart-meter data [15], location traces in PEIR [16], or the

