1
Global Privacy Guarantee in Serial Data Publishing
Raymond Chi-Wing Wong1, Ada Wai-Chee Fu2, Jia Liu2, Ke Wang3, Yabo Xu4
The Hong Kong University of Science and Technology1
The Chinese University of Hong Kong2
Simon Fraser University3
Sun Yat-sen University4
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
2
Outline
1. Sequential Releases
2. Related Work
3. Our Proposed Privacy Model
Local Guarantee
4. Conclusion
3
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Release the data set to public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
This table satisfies some privacy requirements (e.g., m-invariance)
4
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Release the data set to public
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
This table satisfies some privacy requirements (e.g., m-invariance)
Insertions, deletions and updates
5
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 3
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
This table satisfies some privacy requirements (e.g., m-invariance)
Insertions, deletions and updates
7
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 3
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data that he has ever contracted chlamydia in the past.
Chlamydia: a sexually transmitted disease (STD)
These published data: one or more published datasets
8
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 3
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data that he has ever contracted chlamydia in the past.
Chlamydia: a sexually transmitted disease (STD)
Privacy Requirement: Probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
Global Guarantee
9
1. Sequential Releases
This global guarantee requirement seems quite “obvious” and “natural”.
No existing work considers this global guarantee requirement.
Instead, existing works consider another requirement called local guarantee.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Privacy Requirement: Probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
Global Guarantee
10
1. Sequential Releases
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 3
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Chlamydia: a sexually transmitted disease (STD)
Privacy Requirement: Probability that Peter is linked to chlamydia in each published dataset is at most a given threshold (e.g., 1/2).
Local Guarantee
Probability that Peter is linked to chlamydia in the dataset at time = 1 is at most a given threshold (e.g., 1/2).
Probability that Peter is linked to chlamydia in the dataset at time = 2 is at most a given threshold (e.g., 1/2).
Probability that Peter is linked to chlamydia in the dataset at time = 3 is at most a given threshold (e.g., 1/2).
11
2. Related Work
Local Guarantee
m-invariance: Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007
l-scarcity: Bu et al., “Privacy Preserving Serial Data Publishing by Role Composition”, VLDB, 2008
12
Contribution
We are the first to propose the global guarantee requirement
We prove that global guarantee is a stronger requirement than local guarantee
13
How can we calculate the probability?
According to the published datasets, we derive a formula based on the possible world analysis.
We skip the details.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Privacy Requirement: Probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
Global Guarantee
14
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Time = 1
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 2
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
Time = 3
Hospital
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Medical Data
Public
Name PID Disease
Raymond p1 Flu
Peter p2 HIV
Mary p3 Fever
Alice p4 HIV
Bob p5 Flu
John p6 Fever
Published Data
15
Property
Theorem: Global guarantee is a stronger privacy requirement than local guarantee.
If the published tables satisfy global guarantee, then they satisfy local guarantee.
16
Our Algorithm
How can we generate tables such that they satisfy global guarantee?
Idea: Large group size
17
5. Conclusion
We are the first to propose global guarantee
Global guarantee is a stronger privacy requirement than local guarantee.
19
In the following, I will elaborate on two concepts:
Local Guarantee (e.g., m-invariance)
Global Guarantee
20
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M 65001 flu
M 65002 chlamydia
F 65014 flu
F 65015 fever
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Release the data set to public
22
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* flu
F 6501* fever
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Release the data set to public
Generalization
Each individual is linked to “chlamydia” with probability at most 1/2 in THIS PUBLISHED TABLE
2-diversity only focuses on ONE-TIME publishing
2-invariance focuses on MULTIPLE-TIME publishing
It also makes use of the idea of 2-diversity
Idea:
Each individual is linked to “chlamydia” with probability at most 1/2 for each of the MULTIPLE PUBLISHED TABLES
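The generalization step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `generalize` and `link_probability` are hypothetical, zipcodes are masked by keeping the first four digits, and a group is every published record sharing the same (Sex, Zipcode) pair. The probability is taken as the fraction of a group's records that carry the disease.

```python
from collections import defaultdict
from fractions import Fraction

# Medical data at time = 1, copied from the slide: (name, sex, zipcode, disease).
MEDICAL_DATA = [
    ("Raymond", "M", "65001", "flu"),
    ("Peter", "M", "65002", "chlamydia"),
    ("Mary", "F", "65014", "flu"),
    ("Alice", "F", "65015", "fever"),
]


def generalize(records, keep_digits=4):
    """Drop names and mask trailing zipcode digits (hypothetical helper)."""
    published = []
    for _name, sex, zipcode, disease in records:
        masked = zipcode[:keep_digits] + "*" * (len(zipcode) - keep_digits)
        published.append((sex, masked, disease))
    return published


def link_probability(published, disease):
    """For each group, the fraction of its records that carry `disease`."""
    groups = defaultdict(list)
    for sex, zipcode, value in published:
        groups[(sex, zipcode)].append(value)
    return {key: Fraction(values.count(disease), len(values))
            for key, values in groups.items()}


published = generalize(MEDICAL_DATA)
probabilities = link_probability(published, "chlamydia")
print(probabilities)
# True when no group links anyone to chlamydia with probability above 1/2.
print(max(probabilities.values()) <= Fraction(1, 2))
```

The check covers one published table only, which is exactly the one-time setting that 2-diversity addresses.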
23
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* flu
F 6501* fever
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Release the data set to public
Name Signature
Raymond {flu, chlamydia}
Peter {flu, chlamydia}
Mary {flu, fever}
Alice {flu, fever}
2-invariance
28
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* flu
F 6501* fever
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Release the data set to public
Name Signature
Raymond {flu, chlamydia}
Peter {flu, chlamydia}
Mary {flu, fever}
Alice {flu, fever}
Time = 2
Hospital
Name Sex Zipcode Disease
Raymond M 65001 chlamydia
Peter M 65002 flu
Mary F 65014 fever
Emily F 65010 flu
Medical Data
Release the data set to public
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
F 6501* fever
F 6501* flu
Published Data
Name Signature
Raymond {flu, chlamydia}
Peter {flu, chlamydia}
Mary {flu, fever}
Emily {flu, fever}
This table satisfies 2-invariance.
This is because each individual is linked to the SAME signature.
Idea of 2-invariance:
Each individual is linked to the SAME signature in each published table.
2-invariance
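The signature idea can be checked mechanically. The sketch below is a simplified reading of m-invariance, not the algorithm of Xiao et al.: it assumes each release is given as a list of (members, diseases) groups, requires every group to hold at least m records with pairwise distinct diseases, and requires every individual to keep the same signature in every release where they appear. The function name `is_m_invariant` and the input layout are assumptions made for illustration.

```python
def is_m_invariant(releases, m):
    """releases: list of releases; each release is a list of (members, diseases) groups."""
    signature_of = {}
    for groups in releases:
        for members, diseases in groups:
            signature = frozenset(diseases)
            # Each group needs at least m records, all with distinct diseases.
            if len(diseases) < m or len(signature) != len(diseases):
                return False
            for name in members:
                # An individual must keep the signature first seen for them.
                if signature_of.setdefault(name, signature) != signature:
                    return False
    return True


# Groups behind the two published tables on this slide.
TIME_1 = [(("Raymond", "Peter"), ("flu", "chlamydia")),
          (("Mary", "Alice"), ("flu", "fever"))]
TIME_2 = [(("Raymond", "Peter"), ("chlamydia", "flu")),
          (("Mary", "Emily"), ("fever", "flu"))]

print(is_m_invariant([TIME_1, TIME_2], m=2))
```

Mary appears in both releases with the signature {flu, fever}, and Emily, who only appears at time = 2, joins a group with that same signature, so the check passes for m = 2.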
30
Public
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* flu
F 6501* fever
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
F 6501* fever
F 6501* flu
Published Data
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why?
Possible World Analysis
32
Public
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
Published Data
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why?
Possible World Analysis
Sex Zipcode Disease
M 65001 flu
M 65002 chlamydia
Sex Zipcode Disease
M 65001 chlamydia
M 65002 flu
This is the possible world analysis based on the published table at time = 1 only.
33
Public
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
Published Data
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why?
Possible World Analysis
Sex Zipcode Disease
M 65001 flu
M 65002 chlamydia
Sex Zipcode Disease
M 65001 chlamydia
M 65002 flu
This is the possible world analysis based on the published table at time = 1 only.
Sex Zipcode Disease
M 65001 flu
M 65002 chlamydia
Sex Zipcode Disease
M 65001 chlamydia
M 65002 flu
This is the possible world analysis based on the published table at time = 2 only.
35
Public
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
Published Data
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why?
Possible World Analysis
World 1: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia at time = 1: Yes
World 2: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia at time = 1: Yes
World 3: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia at time = 1: No
World 4: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia at time = 1: No
In the published data at time = 1, Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2
36
Public
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
Published Data
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
Sex Zipcode Disease
M 6500* chlamydia
M 6500* flu
Published Data
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why?
Possible World Analysis
World 1: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia at time = 2: Yes
World 2: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia at time = 2: No
World 3: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia at time = 2: Yes
World 4: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia at time = 2: No
In the published data at time = 2, Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2
41
Public
Time = 1
Voter Registration List
Name Sex Zipcode
Raymond M 65001
Peter M 65002
Mary F 65014
Alice F 65015
Emily F 65010
Time = 2
2-invariance
2-invariance provides the local guarantee.
Probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Possible World Analysis
World 1: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia in one or more published datasets: Yes
World 2: time = 1 (65001 flu, 65002 chlamydia); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia in one or more published datasets: Yes
World 3: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 flu, 65002 chlamydia). Peter linked to chlamydia in one or more published datasets: Yes
World 4: time = 1 (65001 chlamydia, 65002 flu); time = 2 (65001 chlamydia, 65002 flu). Peter linked to chlamydia in one or more published datasets: No
Global Guarantee: Probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2.
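The possible world counts on these slides can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming (as the four-world picture does) that every assignment of a group's published diseases to its members is equally likely and that the assignments at different times combine freely. The function names are hypothetical, and the general formula mentioned in the talk is not reproduced here.

```python
from fractions import Fraction
from itertools import permutations, product


def worlds_for_release(members, diseases):
    """All distinct assignments of the group's published diseases to its members."""
    return [dict(zip(members, p)) for p in sorted(set(permutations(diseases)))]


def local_probability(release, person, disease):
    """Probability that `person` is linked to `disease` in one release."""
    worlds = worlds_for_release(*release)
    hits = sum(1 for world in worlds if world.get(person) == disease)
    return Fraction(hits, len(worlds))


def global_probability(releases, person, disease):
    """Probability that `person` is linked to `disease` in one or more releases."""
    # A combined world picks one assignment per release (assumed independent).
    combined = list(product(*(worlds_for_release(*r) for r in releases)))
    hits = sum(1 for combo in combined
               if any(world.get(person) == disease for world in combo))
    return Fraction(hits, len(combined))


RELEASES = [
    (("Raymond", "Peter"), ("flu", "chlamydia")),   # group M 6500* at time = 1
    (("Raymond", "Peter"), ("chlamydia", "flu")),   # group M 6500* at time = 2
]

for release in RELEASES:
    print(local_probability(release, "Peter", "chlamydia"))   # 1/2 for each release
print(global_probability(RELEASES, "Peter", "chlamydia"))     # 3/4
```

Each release on its own meets the threshold of 1/2, while the combined worlds exceed it, which is the gap between local and global guarantee.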
42
We illustrated with an example how to derive the probability that an individual is linked to chlamydia (for both local guarantee and global guarantee).
In fact, the general formula is much more complicated.
43
Theorem: Global guarantee is a stronger privacy requirement than local guarantee.
If the published tables satisfy global guarantee, then they satisfy local guarantee.
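One way to see why the theorem holds is the monotonicity of probability. This is a short sketch, not the proof from the paper. The symbols are assumptions introduced here: $E_i$ is the event that the individual is linked to the sensitive value in the table published at time $i$, $t$ is the number of published tables, and $\theta$ is the threshold (e.g., 1/2).

```latex
\[
E_i \subseteq \bigcup_{j=1}^{t} E_j
\quad\Longrightarrow\quad
\Pr[E_i] \;\le\; \Pr\Bigl[\,\bigcup_{j=1}^{t} E_j\Bigr] \;\le\; \theta
\qquad \text{for every } i \in \{1,\dots,t\}.
\]
```

The converse does not hold: in the 2-invariance example each $\Pr[E_i]$ equals 1/2, yet the probability of the union is 3/4.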
45
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M/F 650** flu
M/F 650** chlamydia
M/F 650** flu
M/F 650** fever
Published Data
Release the data set to public
Time = 2
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 fever
Emily F 65010 flu
Medical Data
Release the data set to public
Sex Zipcode Disease
M/F 650** flu
M/F 650** chlamydia
M/F 650** fever
M/F 650** flu
Published Data
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 7/16
Global Guarantee
This value is smaller than 1/2.
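The value 7/16 can be checked with a short calculation. The sketch below assumes that, within one release, each record of a group is equally likely to belong to the individual, and that releases are independent. Under those assumptions the probability of never being linked is the product of the per-release probabilities of not being linked. The function name and the (group size, sensitive count) input format are hypothetical, and this shortcut is not the general formula from the paper.

```python
from fractions import Fraction


def global_link_probability(groups):
    """groups: one (group_size, sensitive_count) pair per release containing the person."""
    not_linked = Fraction(1)
    for group_size, sensitive_count in groups:
        # Probability of not being linked to the sensitive value in this release.
        not_linked *= 1 - Fraction(sensitive_count, group_size)
    return 1 - not_linked


# Groups of size 4 with one chlamydia record at time = 1 and time = 2.
print(global_link_probability([(4, 1), (4, 1)]))   # 7/16
# Groups of size 2 (the 2-invariance example) for comparison.
print(global_link_probability([(2, 1), (2, 1)]))   # 3/4
```

Doubling the group size from 2 to 4 brings the value from 3/4 down to 7/16, which is in line with the "large group size" idea of the algorithm.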
46
5. Conclusion
We are the first to propose global guarantee
Global guarantee is a stronger privacy requirement than local guarantee.
48
Public
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 flu
Alice F 65015 fever
Medical Data
Time = 1
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* flu
F 6501* fever
Published Data
Release the data set to public
Time = 2
Hospital
Name Sex Zipcode Disease
Raymond M 65001 flu
Peter M 65002 chlamydia
Mary F 65014 fever
Emily F 65010 flu
Medical Data
Release the data set to public
Sex Zipcode Disease
M 6500* flu
M 6500* chlamydia
F 6501* fever
F 6501* flu
Published Data
2-invariance (Local Guarantee)
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2.