statistical databases – query auditing
DESCRIPTION
Statistical Databases – Query Auditing. Li Xiong CS573 Data Privacy and Anonymity. Partial slides credit: Vitaly Shmatikov, Univ Texas at Austin. Q 1. Q 2 … Q n. …. Q. A 1. A 2 … A n. A or “Denied”. Query Audit Problem. Maintaining “privacy” of data. Auditor. Database. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/1.jpg)
Statistical Databases – Query Auditing
Li Xiong
CS573 Data Privacy and Anonymity
Partial slides credit: Vitaly Shmatikov, Univ Texas at Austin
![Page 2: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/2.jpg)
slide 2
Query Audit Problem
Database
Q1
Auditor
Maintaining “privacy” of data
…
Q2 … Qn
A1 A2 … An
Q
Does answer to Q combined with answers to Q1,…,Qn reveal something?
A or “Denied”
![Page 3: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/3.jpg)
slide 3
Variations of the Problem
Database
Q1
Auditor
…
Q2 … Qn
A1 A2 … An
List of real, integer, or Boolean values
Specifies subset of the variables
Min, max, median, sum, average, or count of specified subset
Wants to learn value of some variable
![Page 4: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/4.jpg)
slide 4
Offline auditing Given a collection of queries and answers to them,
check whether anything “forbidden” was revealed Detects privacy breaches after the fact
Online auditing Queries are presented to auditor one at a time;
auditor checks if answering the current query (in combination with past answers) reveals “forbidden” information
Prevents privacy breaches on-the-fly
Offline vs. Online
![Page 5: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/5.jpg)
Disclosure measures
Full disclosure (exact-value disclosure) – the exact value of a protected attribute is disclosed
Partial disclosure (interval-based disclosure) – the disclosed range (difference of the lower and upper bounds) of the protected attribute is smaller than a predefined threshold
Probability-based disclosure – the posterior distribution of the data after answering queries is significantly different from its prior distribution
![Page 6: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/6.jpg)
Offline Auditors for Full Disclosure Sum queries Max and min queries Median and average queries
![Page 7: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/7.jpg)
Audit Expert (Chin 1982)
Query auditing method for SUM queries A SUM query can be considered as a linear equation
where is whether record i belongs to the query set, xi is the sensitive value, and q is the query result
A set of SUM queries can be thought of as a system of linear equations
Maintains the binary matrix representing linearly independent queries and update it when a new query is issued
A row with all 0s except for ith column indicates disclosure
![Page 8: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/8.jpg)
Offline Auditing for Full Disclosure Arbitrary combinations of aggregate queries It is unlikely there will be efficient on-line
application algorithms for SUM + MAX queries, SUM + MIN queries, or SUM + MAX + MIN queries.
Example hardness results:
Theorem. There is no polynomial time full-disclosure auditing algorithm for sum and max queries unless P=NP.
![Page 9: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/9.jpg)
slide 9
Auditing Sum Queries on Booleans
Database: collection of secret Boolean variables Query: specifies subset S of variables Answer: sum of variables in S Privacy breach: after asking several queries, user
learns the value of some secret variable(s) Auditing problem: given a set of Boolean
equations, is there a variable that has the same value in all solutions?
Auditing Boolean attributes, Kleinberg, 2000
![Page 10: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/10.jpg)
slide 10
Linear Diaphantine equations Query can be safe on real-valued, unbounded
data, but reveal information when the data are discrete, with known bounds
Hardness results: the auditing problem for Boolean values is coNP-complete.
x + y + w = 1y + z = 1x + z = 1
Real: multiple solutions, secureBoolean: unique solution, insecure (why?)
Why Is This Interesting?
![Page 11: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/11.jpg)
Offline Auditor for Partial Disclosure
Partial disclosure – the disclosed range of the protected attribute is smaller than a predefined threshold
Sum queries
Interval-based disclosure: monitoring upper and lower bounds for all confidential attributes
Auditing interval-based inference. Li et al. 2002
![Page 12: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/12.jpg)
Offline Auditor for Partial Disclosure New query A series of linear programming problems
![Page 13: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/13.jpg)
Incremental evaluation of the LP
Treats the auditing problem as a series of updation problems and updates the bounds with certain rules
Horizontal updation – given the same set of queries, the bounds of one variable, how can the prior result be modified to get the bounds of other variables
Vertical updatation – given the same set of variables, and bounds under the previous queries, how can the prior result modified to get the bounds when a new query arrives
An efficient online auditing approach to limit private data disclosure, Lu, 2009
![Page 14: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/14.jpg)
slide 14
Online Auditing
Database
Qi+1
Auditor
A or “Denied”
Previous queries Q1 … Qi
“Denied” if answering Qi+1 would cause a
privacy breach
![Page 15: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/15.jpg)
Online Auditing
Given a sequence of queries and corresponding answers, and a new query, determine if the new query should be answered or denied in order to prevent a privacy breach.
Earliest approaches Query set size control Query set overlap control Limited data utility
![Page 16: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/16.jpg)
Offline to Online?
Can an offline auditor directly solve the online auditing problem?
Denials leak information!
![Page 17: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/17.jpg)
slide 17
“On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights.”
Colonel Oliver North, on the Iran-Contra arms deal
“Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States.”
David Duncan, former auditor for Enron and partner in Arthur Andersen
Sounds Familiar?[slide stolen from Kobbi Nissim]
![Page 18: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/18.jpg)
slide 18
Variables di are real, privacy breached if adversary learns some di
Example: Sum/Max
Database
Gimme sum(d1,d2,d3)Auditor
Answer=15
Gimme max(d1,d2,d3)
“Denied”
Wait… there must be a reason why second
query was denied
Oh well
The only possible reason for
denial is if d1=d2=d3=5
![Page 19: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/19.jpg)
slide 19
Online Audit
Denials reduce the search space
Possible assignments to {d1,…,dn}
Assignments consistent with (q1,…qi; a1,…,ai)
qi+1 denied
![Page 20: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/20.jpg)
One workaround
Deny whenever the offline algorithm does, in addition, randomly deny queries.
Issues Leakage is not prevented Have to remember which queries were
randomly denied Semantically determine whether two queries
are equivalent
![Page 21: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/21.jpg)
Simulatable Auditing
Observation: denials have the potential to leak information if the auditor uses information that is unavailable to the attacker (the answer to the new query)
Simlatable auditing: the attacker should be able to simulate or mimic the auditors decisions to answer or deny a query
Denials provably do not leak information
Simulatable Auditing, Kenthapadi, 2005
![Page 22: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/22.jpg)
slide 22
Simulatable Auditing An auditor is a function of Q, A and X An auditor is simulatable if there exists a simulator
that is a function of only Q and A-ai+1 and whose output is always equal to the auditor.
Auditor
qi+1
Deny or answer
qi+1
Deny or answer
Simulator
q1,…,qi
a1,…,aiDatabase
q1,…,qi
![Page 23: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/23.jpg)
slide 23
Possible assignments to {d1,…,dn}
Assignments consistent with (q1,…qi, a1,…,ai )
qi+1 denied/allowed
Simulatable Auditing
![Page 24: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/24.jpg)
Simulatable Auditing
Query-set-size control Query-set-overlap control Audit expert for sum queries
![Page 25: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/25.jpg)
Constructing Simulatable Auditor
General sufficient condition: the auditor should determine if there is any possible dataset, consistent with all past responses, in which the answer to the current query would cause some element to be fully disclosed
![Page 26: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/26.jpg)
slide 26
Example revisited: Sum/Max
Database
Gimme sum(d1,d2,d3)Auditor
Answer=a1
Gimme max(d1,d2,d3)
“Denied”
Simulatable auditor would always deny the max query following a sum query
Lose some utility due to the requirement of simulatability
![Page 27: Statistical Databases – Query Auditing](https://reader036.vdocuments.mx/reader036/viewer/2022062409/56814bf1550346895db8d99a/html5/thumbnails/27.jpg)
Challenges
Privacy definition Privacy of groups/families
Algorithmic limitations Simulatable algorithms computationally prohibitive Most work on sum queries, some on max, min,
median, hardness results on mixed queries Collusion
Reduced utility for legitimate users Large audit trail
Utility Percentage of denials may not be the best measure