2015.9.28. - a hospital has a database of patient records, each record containing a binary value...
2015.9.28
Differential Privacy
A preliminary story
- A hospital has a database of patient records, each record containing a binary value indicating whether or not the patient has some form of cancer.
- Suppose we want the total number of patients with cancer. Easy: it is just the sum of these binary values.
| Patient | Has cancer |
|---|---|
| Amy | 0 |
| Tom | 1 |
| Jack | 1 |
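- But what if an attacker knows that a particular person is on the list, or knows who is at the end of the list? Can the attacker tell whether Jack (the last record) has cancer? Yes: S(3) - S(2), where S(i) is the sum of the first i values.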
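A minimal sketch of this differencing attack on the three records from the table. `prefix_sum` is a hypothetical helper that computes S(i), and the sketch assumes the attacker already knows that Jack is the third record.

```python
# Records from the slide: 1 = has cancer, 0 = does not.
records = [("Amy", 0), ("Tom", 1), ("Jack", 1)]

def prefix_sum(db, i):
    """S(i): exact number of cancer patients among the first i records."""
    return sum(has_cancer for _, has_cancer in db[:i])

total = prefix_sum(records, len(records))
# Two exact aggregate answers are enough to leak Jack's individual bit.
jacks_bit = prefix_sum(records, 3) - prefix_sum(records, 2)
print(total, jacks_bit)  # → 2 1
```

Neither query asks about Jack directly; the leak comes only from the exactness of both answers.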
A preliminary story
- Now suppose f is a randomized query function, for example f(i) = count(i) + noise, where the noise makes f(5) and f(4) produce the same values, {2, 2, 5, 3}, with the same probabilities.
- Then f(5) - f(4) is useless to the attacker!
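The slide leaves the noise unspecified. As a minimal sketch, assuming Laplace noise with scale 1 (the distribution named at the end of this section), the hypothetical helper `noisy_prefix_count` answers f(i) = count(i) + noise on the three records above. Each difference f(3) - f(2) is now a noisy real number instead of Jack's exact bit. Laplace noise only makes the two answer distributions close, not identical as in the slide's idealized example.

```python
import random

def laplace_noise(rng, scale):
    # Assumption: Laplace(0, scale) noise, sampled as the difference of two
    # i.i.d. exponential variables with rate 1/scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_prefix_count(db, i, rng, scale=1.0):
    """f(i) = count(i) + noise, where count(i) sums the first i binary values."""
    return sum(db[:i]) + laplace_noise(rng, scale)

bits = [0, 1, 1]          # Amy, Tom, Jack
rng = random.Random(0)    # seeded only to make the run reproducible
diffs = [noisy_prefix_count(bits, 3, rng) - noisy_prefix_count(bits, 2, rng)
         for _ in range(5)]
print([round(d, 2) for d in diffs])
```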
GIC Incident [Sweeney 2002]
• Group Insurance Commission (GIC, Massachusetts)
  – Collected patient data for ~135,000 state employees.
  – Gave the data to researchers and sold it to industry.
  – The medical record of the former state governor was identified.
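Figure: records of Patient 1, Patient 2, …, Patient n are collected into the GIC, MA database.

| Name | Age | Sex | Zip code | Disease |
|---|---|---|---|---|
| Bob | 69 | M | 47906 | Cancer |
| Carl | 65 | M | 47907 | Cancer |
| Daisy | 52 | F | 47902 | Flu |
| Emily | 43 | F | 46204 | Gastritis |
| Flora | 42 | F | 46208 | Hepatitis |
| Gabriel | 47 | F | 46203 | Bronchitis |
| … | … | … | … | … |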
Re-identification occurs!
Topic 21: Data Privacy
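The re-identification above is a linkage attack: the released table drops names but keeps age, sex and zip code, and joining those quasi-identifiers against another table that still carries names recovers each person's diagnosis. The sketch below assumes a hypothetical public list (for example a voter roll) that pairs each name with the same age, sex and zip code as in the table; `link` is an illustrative helper, not the method used in the original study.

```python
# Released table from the slide: names removed, quasi-identifiers kept.
released = [
    (69, "M", "47906", "Cancer"),
    (65, "M", "47907", "Cancer"),
    (52, "F", "47902", "Flu"),
    (43, "F", "46204", "Gastritis"),
    (42, "F", "46208", "Hepatitis"),
    (47, "F", "46203", "Bronchitis"),
]

# Hypothetical public list (e.g. a voter roll) that still carries names.
public = [
    ("Bob", 69, "M", "47906"),
    ("Carl", 65, "M", "47907"),
    ("Daisy", 52, "F", "47902"),
    ("Emily", 43, "F", "46204"),
    ("Flora", 42, "F", "46208"),
    ("Gabriel", 47, "F", "46203"),
]

def link(released, public):
    """Join the two tables on (age, sex, zip code), keeping unique matches only."""
    by_qid = {}
    for age, sex, zip_code, disease in released:
        by_qid.setdefault((age, sex, zip_code), []).append(disease)
    reidentified = {}
    for name, age, sex, zip_code in public:
        matches = by_qid.get((age, sex, zip_code), [])
        if len(matches) == 1:  # exactly one candidate record: re-identified
            reidentified[name] = matches[0]
    return reidentified

reidentified = link(released, public)
print(reidentified["Bob"])  # → Cancer
```

Every (age, sex, zip code) combination in this toy table is unique, so every name is linked; coarsening those columns would produce multiple matches and block the join.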
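Definitions
Let $\mathcal{A}$ be a randomized algorithm. Let $D_1, D_2$ be two datasets that differ in at most one entry (we call these database neighbors).
Figure: database neighbors $D_1$ and $D_2$ are identical except that entry $x_i$ in $D_1$ is replaced by $x_i'$ in $D_2$.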
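Definition 1. Let $\epsilon > 0$. Define a randomized algorithm $\mathcal{A}$ to be $\epsilon$-differentially private if for all neighboring databases $D_1, D_2$, and for all (measurable) subsets $S \subseteq \mathrm{Range}(\mathcal{A})$, we have
$$\Pr[\mathcal{A}(D_1) \in S] \le e^{\epsilon} \cdot \Pr[\mathcal{A}(D_2) \in S],$$
where the probability is taken over the coin tosses of $\mathcal{A}$.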
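For a mechanism with finitely many outputs, Definition 1 can be checked directly by enumerating every subset S. The sketch below does this for randomized response, a standard ε-differentially private mechanism for a single binary value; it is swapped in purely as an illustration. It assumes the database is one patient's cancer bit, so the two neighbors are "bit = 1" and "bit = 0". The names `randomized_response_dist` and `satisfies_definition_1` are hypothetical.

```python
import math
from itertools import combinations

def randomized_response_dist(bit, eps):
    """Output distribution of randomized response on one binary record.

    Assumption: report the true bit with probability e^eps / (1 + e^eps),
    otherwise report the flipped bit.
    """
    p_truth = math.exp(eps) / (1 + math.exp(eps))
    return {bit: p_truth, 1 - bit: 1 - p_truth}

def satisfies_definition_1(dist1, dist2, eps, tol=1e-12):
    """Check Pr[A(D1) in S] <= e^eps * Pr[A(D2) in S] for every subset S."""
    outputs = sorted(set(dist1) | set(dist2))
    for r in range(len(outputs) + 1):
        for subset in combinations(outputs, r):
            p1 = sum(dist1.get(o, 0.0) for o in subset)
            p2 = sum(dist2.get(o, 0.0) for o in subset)
            if p1 > math.exp(eps) * p2 + tol:  # tol absorbs float rounding
                return False
    return True

eps = 0.5
d1 = randomized_response_dist(1, eps)  # neighbor D1: the record is 1
d2 = randomized_response_dist(0, eps)  # neighbor D2: the record is 0
print(satisfies_definition_1(d1, d2, eps))      # → True
print(satisfies_definition_1(d1, d2, eps / 2))  # → False
```

Here the bound is tight: for S = {1} the ratio is exactly e^ε, so the same mechanism fails the check for any smaller ε.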
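Observation 2. Because the roles of the neighbors $D_1$ and $D_2$ can be switched, applying Definition 1 in both directions implies that for the randomized algorithm $\mathcal{A}$ and every subset $S$,
$$e^{-\epsilon} \Pr[\mathcal{A}(D_2) \in S] \le \Pr[\mathcal{A}(D_1) \in S] \le e^{\epsilon} \Pr[\mathcal{A}(D_2) \in S].$$
Since $e^{\epsilon} \approx 1 + \epsilon$ and $e^{-\epsilon} \approx 1 - \epsilon$ for small $\epsilon$, we have roughly
$$(1 - \epsilon) \Pr[\mathcal{A}(D_2) \in S] \lesssim \Pr[\mathcal{A}(D_1) \in S] \lesssim (1 + \epsilon) \Pr[\mathcal{A}(D_2) \in S].$$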
satisfies
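The "for small ε" approximation in Observation 2 can be checked numerically. A small sketch, where `ratio_bounds` is a hypothetical helper returning the exact bounds e^(-ε), e^ε and their first-order approximations 1 - ε, 1 + ε:

```python
import math

def ratio_bounds(eps):
    """Exact two-sided bounds from Observation 2 and their linear approximations."""
    exact = (math.exp(-eps), math.exp(eps))
    approx = (1 - eps, 1 + eps)
    return exact, approx

for eps in (0.01, 0.1, 1.0):
    (lo, hi), (lo_approx, hi_approx) = ratio_bounds(eps)
    print(f"eps={eps:<5} exact=[{lo:.4f}, {hi:.4f}]  "
          f"approx=[{lo_approx:.4f}, {hi_approx:.4f}]")
```

At ε = 0.01 the two ranges agree to about four decimal places; at ε = 1 the linear approximation is far off (1 - ε = 0 versus e^(-1) ≈ 0.368), which is why it is only stated for small ε.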
Laplace distribution