Inference Problem Privacy Preserving Data Mining

Upload: andrew-robinson

Post on 14-Jan-2016


Page 1: Inference Problem Privacy Preserving Data Mining

Inference Problem
Privacy Preserving Data Mining

Page 2: Inference Problem Privacy Preserving Data Mining

CSCE 522 - Farkas, Lecture 19

Readings and Assignments

I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf

Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems http://www.acsac.org/secshelf/book001/book001.html, essay 24

Page 3: Inference Problem Privacy Preserving Data Mining

Indirect Information Flow Channels

Covert channels
Inference channels

Page 4: Inference Problem Privacy Preserving Data Mining

Communication Channels

Overt channel: designed into a system and documented in the user's manual
Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.

Page 5: Inference Problem Privacy Preserving Data Mining

Covert Channel

Timing channel: based on system times
Storage channel: communication that is not time related
Each kind of channel can be turned into the other

Page 6: Inference Problem Privacy Preserving Data Mining

Inference Channels

Non-sensitive information + Meta-data = Sensitive information

Page 7: Inference Problem Privacy Preserving Data Mining

Inference Channels

Statistical Database Inferences
General Purpose Database Inferences

Page 8: Inference Problem Privacy Preserving Data Mining

Statistical Databases

Goal: provide aggregate information about groups of individuals
  E.g., average grade point of students
Security risk: specific information about a particular individual
  E.g., grade point of student John Smith
Meta-data:
  Working knowledge about the attributes
  Supplementary knowledge (not stored in database)

Page 9: Inference Problem Privacy Preserving Data Mining

Types of Statistics

Macro-statistics: collections of related statistics presented in 2-dimensional tables

Sex\Year  1997  1998  Sum
Female    4     1     5
Male      6     13    19
Sum       10    14    24

Micro-statistics: individual data records used for statistics after identifying information is removed

Sex  Course    GPA  Year
F    CSCE 590  3.5  2000
M    CSCE 590  3.0  2000
F    CSCE 790  4.0  2001

Page 10: Inference Problem Privacy Preserving Data Mining

Statistical Compromise

Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith's GPA is 3.8)
Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith's GPA is between 3.5 and 4.0)

Page 11: Inference Problem Privacy Preserving Data Mining

Methods of Attacks and Protection

Small/Large Query Set Attack
C: characteristic formula that identifies groups of individuals
If C identifies a single individual I, e.g., count(C) = 1:
  Find out the existence of a property:
    count(C and D) = 1 means I has property D
    count(C and D) = 0 means I does not have property D
  OR find the value of a property:
    sum(C, D) gives the value of D
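
The attack on this slide can be sketched in a few lines of Python. This is only an illustration: the records, the `honors` attribute, and the helper names `count`, `total`, and `attack` are invented here and are not part of the lecture.

```python
# Invented records for illustration; the query interface returns aggregates only.
RECORDS = [
    {"sex": "M", "course": "CSCE 590", "gpa": 3.0, "honors": False},
    {"sex": "F", "course": "CSCE 590", "gpa": 3.5, "honors": True},
    {"sex": "F", "course": "CSCE 790", "gpa": 4.0, "honors": True},
]


def count(formula):
    """Statistical query: number of records that satisfy the formula."""
    return sum(1 for record in RECORDS if formula(record))


def total(formula, attribute):
    """Statistical query: sum of an attribute over the matching records."""
    return sum(record[attribute] for record in RECORDS if formula(record))


def C(record):
    """Characteristic formula built from facts the attacker already knows."""
    return record["sex"] == "M" and record["course"] == "CSCE 590"


def D(record):
    """Property the attacker wants to test for."""
    return record["honors"]


def attack():
    """Return (individual has D, individual's GPA) when C isolates one record."""
    if count(C) != 1:
        return None
    has_d = count(lambda r: C(r) and D(r)) == 1
    gpa = total(C, "gpa")  # a sum over a single record is that record's value
    return has_d, gpa
```

No query ever returns a record, yet two aggregate answers reveal both the property and the exact value.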

Page 12: Inference Problem Privacy Preserving Data Mining

Small/Large Query Set Attack cont.

Protection from small/large query set attack: query-set-size control

A query q(C) is permitted only if
$N - n \ge |C| \ge n$, where $n \ge 0$ is a parameter of the database and $N$ is the number of records in the database
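
A minimal sketch of query-set-size control, assuming a count query over an in-memory list. The function names and the use of `PermissionError` to signal a refused query are choices made for this example only.

```python
def query_set_size_ok(query_set_size, n, total_records):
    """True when n <= |C| <= N - n."""
    return n <= query_set_size <= total_records - n


def guarded_count(records, formula, n):
    """Answer a count query only if its query set passes size control."""
    matching = [record for record in records if formula(record)]
    if not query_set_size_ok(len(matching), n, len(records)):
        raise PermissionError("query set size outside [n, N - n]")
    return len(matching)
```

Large query sets are refused as well as small ones because the answer for a set of size N - 1 can be subtracted from the answer for the whole database, which isolates the one remaining record.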

Page 13: Inference Problem Privacy Preserving Data Mining

Tracker attack

q(C) is disallowed
C = C1 and C2
Tracker: T = C1 and ~C2
q(C) = q(C1) - q(T)

[Diagram: C1 and C2 overlap; C is the intersection and the tracker T is the part of C1 outside C2]
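
The tracker identity q(C) = q(C1) - q(T) can be checked on a toy database that enforces query-set-size control. The records, the parameter value `N_MIN = 2`, and all function names are assumptions made for this sketch.

```python
N_MIN = 2  # hypothetical size-control parameter n

# Invented records: (sex, course, gpa)
RECORDS = [
    ("M", "CSCE 590", 3.0),
    ("F", "CSCE 590", 3.5),
    ("F", "CSCE 590", 3.2),
    ("F", "CSCE 790", 4.0),
    ("M", "CSCE 790", 3.7),
    ("M", "CSCE 790", 2.9),
]


def q_count(formula):
    """Count query guarded by query-set-size control."""
    size = sum(1 for record in RECORDS if formula(record))
    if not N_MIN <= size <= len(RECORDS) - N_MIN:
        raise PermissionError("query set size control")
    return size


def C1(record):
    return record[1] == "CSCE 590"


def C2(record):
    return record[0] == "M"


def C(record):
    """Target formula; it matches one record, so q(C) is disallowed."""
    return C1(record) and C2(record)


def T(record):
    """Tracker: C1 and not C2."""
    return C1(record) and not C2(record)


def tracker_count():
    """Recover q(C) from two queries that are both allowed."""
    return q_count(C1) - q_count(T)
```

Here `q_count(C)` raises `PermissionError`, while `q_count(C1)` and `q_count(T)` have query sets of size 3 and 2 and are both answered.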

Page 14: Inference Problem Privacy Preserving Data Mining

Tracker attack

q(C and D) is disallowed
C = C1 and C2
Tracker: T = C1 and ~C2
q(C and D) = q(T or C and D) - q(T)

[Diagram: C1 and C2 overlap; D cuts across C, and the region C and D lies outside the tracker T]
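
The same idea extends to a property D. The sketch below uses invented employee records and a generic guarded query `q` that returns a count by default or a sum when given a value function; none of these names come from the lecture.

```python
N_MIN = 2  # hypothetical size-control parameter n

# Invented records: (sex, department, salary)
RECORDS = [
    ("M", "CS", 45000),
    ("F", "CS", 38000),
    ("F", "CS", 41000),
    ("F", "EE", 52000),
    ("M", "EE", 39000),
    ("M", "EE", 47000),
]


def q(formula, value=lambda record: 1):
    """Guarded statistic: a count by default, a sum when `value` is given."""
    matching = [record for record in RECORDS if formula(record)]
    if not N_MIN <= len(matching) <= len(RECORDS) - N_MIN:
        raise PermissionError("query set size control")
    return sum(value(record) for record in matching)


def C(record):
    return record[1] == "CS" and record[0] == "M"      # C1 and C2


def T(record):
    return record[1] == "CS" and record[0] != "M"      # C1 and ~C2


def D(record):
    return record[2] > 40000


def salary(record):
    return record[2]


# q(C and D) = q(T or C and D) - q(T)
has_property = q(lambda r: T(r) or (C(r) and D(r))) - q(T)

# The same subtraction on sums leaks the individual's exact salary.
leaked_salary = q(lambda r: T(r) or C(r), salary) - q(T, salary)
```

The subtraction is valid because T and C are disjoint, so padding the forbidden query set with T adds exactly q(T) to the answer.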

Page 15: Inference Problem Privacy Preserving Data Mining

Query overlap attack

[Diagram: overlapping query sets C1 and C2 over the individuals John, Kathy, Max, Fred, Eve, Paul, and Mitch]

q(John) = q(C1) - q(C2)

Protection: query-overlap control
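
A sketch of the overlap attack and of one possible overlap control. The salary figures, the membership of C1 and C2 (the slide's diagram does not survive in this transcript), and the `OverlapControlledDB` class are all assumptions made for illustration.

```python
# Invented values, in thousands.
SALARY = {"John": 50, "Kathy": 61, "Max": 47, "Fred": 58,
          "Eve": 66, "Paul": 43, "Mitch": 55}

# Assumed query sets: C1 is C2 plus John.
C1 = {"John", "Kathy", "Max", "Fred"}
C2 = {"Kathy", "Max", "Fred"}


def q_sum(names):
    """Unprotected sum query over a set of individuals."""
    return sum(SALARY[name] for name in names)


class OverlapControlledDB:
    """Refuses a query that shares too many individuals with an earlier one."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []  # query sets that were already answered

    def q_sum(self, names):
        names = frozenset(names)
        for earlier in self.answered:
            if len(names & earlier) > self.max_overlap:
                raise PermissionError("query overlap control")
        self.answered.append(names)
        return q_sum(names)


johns_salary = q_sum(C1) - q_sum(C2)
```

With `OverlapControlledDB(max_overlap=1)`, the query for C1 is answered and the follow-up query for C2 is refused because the two sets share three individuals. The price is that the database has to remember every answered query set.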

Page 16: Inference Problem Privacy Preserving Data Mining

Insertion/Deletion Attack

Observing changes over time:
  q1 = q(C)
  insert(i)
  q2 = q(C)
  q(i) = q2 - q1

Protection: insertions/deletions performed as pairs
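
The steps above, and the effect of the pairing protection, can be sketched as follows. The salary values and function names are invented for this example.

```python
def q_sum(salaries):
    """Sum query over the current contents of the database."""
    return sum(salaries)


def single_insert_leak(salaries, new_salary):
    """Query, insert one record, query again: the difference is the record."""
    q1 = q_sum(salaries)
    salaries.append(new_salary)  # insert(i)
    q2 = q_sum(salaries)
    return q2 - q1


def paired_insert_difference(salaries, first, second):
    """With paired inserts the observer learns only the sum of the pair."""
    q1 = q_sum(salaries)
    salaries.extend([first, second])
    q2 = q_sum(salaries)
    return q2 - q1
```

For example, `single_insert_leak([38000, 42000], 47000)` returns the inserted value itself, while the paired version returns a combined figure from which neither individual value follows directly.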

Page 17: Inference Problem Privacy Preserving Data Mining

Statistical Inference Theory

Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)

Page 18: Inference Problem Privacy Preserving Data Mining

Inferences in General-Purpose Databases

Queries based on sensitive data
Inferences via database constraints
Inferences via updates

Page 19: Inference Problem Privacy Preserving Data Mining

Queries based on sensitive data

Sensitive information is used in the selection condition but not returned to the user.
Example: Salary: secret, Name: public
  Query: return Name for the tuples where Salary = $25,000
Protection: apply the query to database views at different security levels
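
A small sketch of this leak and of the view-based protection, using Python's built-in `sqlite3` module. The table name `employee`, the view name `employee_public`, and the rows are invented; a real multilevel system would enforce the view through its access control rather than by convention.

```python
import sqlite3


def demo():
    """Return (names leaked from the base table, whether the view blocks the query)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?)",
        [("Black", 25000), ("Red", 42000)],
    )
    # The public view exposes only the public attribute.
    conn.execute("CREATE VIEW employee_public AS SELECT name FROM employee")

    # Against the base table the secret attribute is used in the selection
    # condition, so the returned names reveal who earns $25,000.
    leaked = conn.execute(
        "SELECT name FROM employee WHERE salary = 25000"
    ).fetchall()

    # Against the public view the same condition cannot be evaluated.
    try:
        conn.execute("SELECT name FROM employee_public WHERE salary = 25000")
        blocked = False
    except sqlite3.OperationalError:
        blocked = True

    conn.close()
    return leaked, blocked
```

Evaluating the query on the view at the user's level means the selection condition can only mention attributes the user is cleared to see.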

Page 20: Inference Problem Privacy Preserving Data Mining

Database Constraints

Integrity constraints
Database dependencies
Key integrity

Page 21: Inference Problem Privacy Preserving Data Mining

Integrity Constraints

C = A + B
A = public, C = public, and B = secret
B can be calculated from A and C, i.e., secret information can be calculated from public data
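
As a rough sketch, an inference detector for constraints of this kind can flag any constraint in which exactly one attribute is secret. This assumes the constraint can be solved for each of its attributes, as C = A + B can; the function names are invented.

```python
def inferable_secret(constraint_attributes, levels):
    """Return the one secret attribute a public user can solve for, if any."""
    secret = [a for a in constraint_attributes if levels[a] == "secret"]
    return secret[0] if len(secret) == 1 else None


def infer_b(a, c):
    """Solve C = A + B for the secret value B using the public values A and C."""
    return c - a
```

With A and C public and B secret, the detector reports B, and `infer_b(3, 10)` recovers the value 7.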

Page 22: Inference Problem Privacy Preserving Data Mining

Database Dependencies

Metadata:
  Functional dependencies
  Multi-valued dependencies
  Join dependencies
  etc.

Page 23: Inference Problem Privacy Preserving Data Mining

Functional Dependency

FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
Example: FD: Rank → Salary
Secret information: Name and Salary together
  Query 1: Name and Rank
  Query 2: Rank and Salary
  Combine the answers to queries 1 and 2 to reveal Name and Salary together
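
The combination step is a join on Rank. A minimal sketch with invented names, ranks, and salaries:

```python
# Answer to query 1 (Name, Rank) and query 2 (Rank, Salary); values are invented.
name_rank = [("Black", "Assistant"), ("Red", "Full")]
rank_salary = [("Assistant", 38000), ("Full", 42000)]

# Because Rank -> Salary holds, each rank maps to exactly one salary,
# so the lookup below is unambiguous.
salary_of_rank = dict(rank_salary)
name_salary = {name: salary_of_rank[rank] for name, rank in name_rank}
```

Each query is harmless on its own; the functional dependency is what makes the joined result exact rather than a guess.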

Page 24: Inference Problem Privacy Preserving Data Mining

Key integrity

Every tuple in the relation has a unique key
Users at different levels see different versions of the database
Users might attempt to update data that is not visible to them

Page 25: Inference Problem Privacy Preserving Data Mining

Example

Each value is followed by its classification: P (public) or S (secret).

Secret View
Name (key)  Salary    Address
Black  P    38,000 P  Columbia S
Red    S    42,000 S  Irmo     S

Public View
Name (key)  Salary    Address
Black  P    38,000 P  Null     P

Page 26: Inference Problem Privacy Preserving Data Mining

Updates

Public user:
Name (key)  Salary    Address
Black  P    38,000 P  Null     P

1. Update Black's address to Orlando
2. Add new tuple: (Red, 22,000, Manassas)

If the system
Refuses the update: covert channel
Allows the update:
  Overwrite high data: may be incorrect
  Create new tuple: which data is correct? (polyinstantiation); violates key constraints

Page 27: Inference Problem Privacy Preserving Data Mining

Updates

Secret user:
Name (key)  Salary    Address
Black  P    38,000 P  Columbia S
Red    S    42,000 S  Irmo     S

1. Update Black's salary to 45,000

If the system
Refuses the update: denial of service
Allows the update:
  Overwrite low data: covert channel
  Create new tuple: which data is correct? (polyinstantiation); violates key constraints
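
The polyinstantiation option can be sketched by making the security level part of the key. This example simplifies the slides' element-level labels to one level per tuple; the data structure and function names are assumptions made for illustration.

```python
LEVEL_ORDER = {"P": 0, "S": 1}  # public is below secret


def build_example():
    """Relation keyed by (name, level); tuple-level labels for simplicity."""
    return {
        ("Black", "P"): (38000, None),
        ("Red", "S"): (42000, "Irmo"),
    }


def insert(relation, name, level, salary, address):
    """Insert at the user's level; only a clash at the same level is refused."""
    key = (name, level)
    if key in relation:
        raise KeyError(f"{name} already exists at level {level}")
    relation[key] = (salary, address)


def view(relation, clearance):
    """Tuples whose level is at or below the user's clearance."""
    rows = [
        (name, level, salary, address)
        for (name, level), (salary, address) in relation.items()
        if LEVEL_ORDER[level] <= LEVEL_ORDER[clearance]
    ]
    return sorted(rows, key=lambda row: row[:2])
```

A public insert of (Red, 22,000, Manassas) succeeds without revealing that a secret Red exists, but a secret user then sees two tuples named Red and has to decide which one to trust. Name alone is no longer a key.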

Page 28: Inference Problem Privacy Preserving Data Mining

Inference Problem

No general technique is available to solve the problem
Need assurance of protection
Hard to incorporate outside knowledge

Page 29: Inference Problem Privacy Preserving Data Mining

Web Evolution

Past: Human usage
  Static Web pages (HTML, XML)
Present: Human & Automated usage
  Semantic Web, WS, SOA
Future: Mobile Computing

Page 30: Inference Problem Privacy Preserving Data Mining

Web Data Security

Access Control Models
Heterogeneous Data: XML, Stream, Text
Limitations:
  Syntax-based
  No association protection
  Limited handling of updates
  No data or application semantics
  No inference control

Page 31: Inference Problem Privacy Preserving Data Mining

Secure XML Views - Example

<medicalFiles>                          UC
  <countyRec>                           S
    <patient>                           S
      <name>John Smith</name>           UC
      <phone>111-2222</phone>           S
    </patient>
    <physician>Jim Dale</physician>     UC
  </countyRec>
  <milBaseRec>                          TS
    <patient>                           S
      <name>Harry Green</name>          UC
      <phone>333-4444</phone>           S
    </patient>
    <physician>Joe White</physician>    UC
    <milTag>MT78</milTag>               TS
  </milBaseRec>
</medicalFiles>

[Diagram: tree of the labeled document above, captioned "View over UC data"]

Page 32: Inference Problem Privacy Preserving Data Mining

Secure XML Views - Example

<medicalFiles>
  <countyRec>
    <patient>
      <name>John Smith</name>
    </patient>
    <physician>Jim Dale</physician>
  </countyRec>
  <milBaseRec>
    <patient>
      <name>Harry Green</name>
    </patient>
    <physician>Joe White</physician>
  </milBaseRec>
</medicalFiles>

[Diagram: tree of the document above, captioned "View over UC data"]

Page 33: Inference Problem Privacy Preserving Data Mining

Secure XML Views - Example

<medicalFiles>
  <tag01>
    <tag02>
      <name>John Smith</name>
    </tag02>
    <physician>Jim Dale</physician>
  </tag01>
  <tag03>
    <tag02>
      <name>Harry Green</name>
    </tag02>
    <physician>Joe White</physician>
  </tag03>
</medicalFiles>

[Diagram: document tree with nodes medicalFiles, countyRec, milBaseRec, patient, name, and physician, captioned "View over UC data"]

Page 34: Inference Problem Privacy Preserving Data Mining

Secure XML Views - Example

<medicalFiles>                          UC
  <countyRec>                           S
    <patient>                           S
      <name>John Smith</name>           UC
    </patient>
    <physician>Jim Dale</physician>     UC
  </countyRec>
  <milBaseRec>                          TS
    <patient>                           S
      <name>Harry Green</name>          UC
    </patient>
    <physician>Joe White</physician>    UC
  </milBaseRec>
</medicalFiles>

[Diagram: tree of the document above, captioned "View over UC data"]

Page 35: Inference Problem Privacy Preserving Data Mining

Secure XML Views - Example

<medicalFiles>
  <name>John Smith</name>
  <physician>Jim Dale</physician>
  <name>Harry Green</name>
  <physician>Joe White</physician>
</medicalFiles>

[Diagram: tree with root medicalFiles and the four UC leaves attached directly to it, captioned "View over UC data"]
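
The flattened view on this slide can be computed by attaching every visible element to its nearest visible ancestor. The sketch below uses Python's `xml.etree.ElementTree` and makes several assumptions: labels are stored in a `level` attribute, the levels are ordered UC < S < TS, and the root is always visible. It is one possible construction, not necessarily the algorithm behind the lecture's examples.

```python
import xml.etree.ElementTree as ET

LEVEL_ORDER = {"UC": 0, "S": 1, "TS": 2}  # assumed ordering of the labels

DOC = """<medicalFiles level="UC">
  <countyRec level="S">
    <patient level="S">
      <name level="UC">John Smith</name>
      <phone level="S">111-2222</phone>
    </patient>
    <physician level="UC">Jim Dale</physician>
  </countyRec>
  <milBaseRec level="TS">
    <patient level="S">
      <name level="UC">Harry Green</name>
      <phone level="S">333-4444</phone>
    </patient>
    <physician level="UC">Joe White</physician>
    <milTag level="TS">MT78</milTag>
  </milBaseRec>
</medicalFiles>"""


def flattened_view(source_root, clearance):
    """Copy visible elements, attaching each to its nearest visible ancestor."""

    def copy_visible(node, visible_parent):
        target = visible_parent
        if LEVEL_ORDER[node.get("level")] <= LEVEL_ORDER[clearance]:
            target = ET.SubElement(visible_parent, node.tag)
            target.text = (node.text or "").strip() or None
        for child in node:
            copy_visible(child, target)

    view_root = ET.Element(source_root.tag)  # the root is assumed visible
    for child in source_root:
        copy_visible(child, view_root)
    return view_root


uc_view = flattened_view(ET.fromstring(DOC), "UC")
uc_children = [(element.tag, element.text) for element in uc_view]
```

Flattening hides the element names countyRec, milBaseRec, and patient, but it also loses the association between each name and its physician, which is the trade-off the sequence of views illustrates.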

Page 36: Inference Problem Privacy Preserving Data Mining

The Inference Problem

General Purpose Database:
  Non-confidential data + Metadata → Undesired Inferences

Semantic Web:
  Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences

Page 37: Inference Problem Privacy Preserving Data Mining

Correlated Inference

Public: address, fort
Public: district, basin
Confidential: water source, base

Ontology:
  Object[].
  waterSource :: Object
  basin :: waterSource
  place :: Object
  district :: place
  address :: place
  base :: Object
  fort :: base

[Diagram: the public pairs are generalized through the ontology to place, base, and water source, which yields the confidential pair]

Page 38: Inference Problem Privacy Preserving Data Mining

[Diagram. Labels: Organizational Data (Confidential, Public), Attacker, Access Control, Misinfo, Ontology, Data Integration and Inferences, Web Data, Inference Control]

Page 39: Inference Problem Privacy Preserving Data Mining

[Diagram. Labels: Organizational Data (Confidential, Public), Misinfo, Inference Control]

ACCESS and INFERENCE CONTROL POLICY
Logic-based inference detection
Exact and partial disclosure
Data and metadata protection
Heterogeneous data manipulation
Metadata discovery

Page 40: Inference Problem Privacy Preserving Data Mining

Data Mining and Privacy

Statistical inference:
  K-anonymity
  Correlation
General inference:
  Pattern metadata
  Biased learning
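
K-anonymity is only named on this slide. As a hedged illustration of the usual definition (every combination of quasi-identifier values must be shared by at least k records), here is a small check run on the micro-statistics records from the "Types of Statistics" slide. The function name and the choice of quasi-identifiers are assumptions.

```python
from collections import Counter

# Micro-statistics records from the "Types of Statistics" slide.
RECORDS = [
    {"sex": "F", "course": "CSCE 590", "gpa": 3.5, "year": 2000},
    {"sex": "M", "course": "CSCE 590", "gpa": 3.0, "year": 2000},
    {"sex": "F", "course": "CSCE 790", "gpa": 4.0, "year": 2001},
]


def is_k_anonymous(records, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    groups = Counter(
        tuple(record[attribute] for attribute in quasi_identifiers)
        for record in records
    )
    return all(size >= k for size in groups.values())
```

With sex, course, and year as quasi-identifiers, every record in the table is unique, so the release is not 2-anonymous even though the names were removed.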

Page 41: Inference Problem Privacy Preserving Data Mining

Future


Page 42: Inference Problem Privacy Preserving Data Mining

Next Class

Midterm exam