Post on 13-Apr-2016

Current Trends in Data Security

Dan Suciu

Joint work with Gerome Miklau

2

Data Security

Dorothy Denning, 1982:

• Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification

• Data Security = Confidentiality + Integrity

3

Data Security

• Distinct from systems and network security
  – Assumes these are already secure

• Tools:
  – Cryptography, information theory, statistics, …

• Applications:
  – An enabling technology

4

Outline

• Traditional data security

• Two attacks

• Data security research today

• Conclusions

5

Traditional Data Security

• Security in SQL = Access control + Views

• Security in statistical databases = Theory

6

Access Control in SQL

GRANT privileges ON object TO users [WITH GRANT OPTION]

privileges = SELECT | INSERT | DELETE | . . .

object = table | attribute

REVOKE privileges ON object FROM users [CASCADE ]

[Griffiths&Wade'76, Fagin'78]

7

Views in SQL

A SQL View = (almost) any SQL query

• Typically used as:

CREATE VIEW pmpStudents AS SELECT * FROM Students WHERE …

GRANT SELECT ON pmpStudents TO DavidRispoli
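A minimal sketch of a view used as a security boundary, run through Python's sqlite3 module. The column names, the rows, and the predicate program = 'pmp' are assumptions for illustration; the slide leaves the WHERE clause unspecified. SQLite has no GRANT statement, so only the filtering effect of the view is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and rows, not taken from the slides.
    CREATE TABLE Students (name TEXT, program TEXT, gpa REAL);
    INSERT INTO Students VALUES
        ('Ann', 'pmp', 3.9),
        ('Bob', 'phd', 3.5),
        ('Cy',  'pmp', 3.2);

    -- Hypothetical predicate standing in for the elided WHERE clause.
    CREATE VIEW pmpStudents AS
        SELECT * FROM Students WHERE program = 'pmp';
""")

# A user who may only read the view never sees Bob's row.
visible = [row[0] for row in
           conn.execute("SELECT name FROM pmpStudents ORDER BY name")]
print(visible)
```

In a DBMS with access control, the GRANT would be issued on pmpStudents only, so the base table stays hidden.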

8

Summary of SQL Security

Access control = great success story of the DB community...

… or spectacular failure:
• Only 30% assign privileges to users/roles
  – And then to protect entire tables, not columns

Limitations:
• No row-level access control
• Table creator owns the data: that’s unfair!

9

Summary (cont)

• Most policies in middleware: slow, error-prone:
  – SAP has 10**4 tables
  – GTE over 10**5 attributes
  – A brokerage house has 80,000 applications
  – A US government entity thinks that it has 350K

• Today the database is not at the center of the policy administration universe

[Rosenthal&Winslett’2004]

10

Security in Statistical DBs

Goal:
• Allow arbitrary aggregate SQL queries
• Hide confidential data

OK:

SELECT count(*)
FROM Patients
WHERE age=42 and sex='M' and diagnostic='schizophrenia'

Not OK:

SELECT name
FROM Patients
WHERE age=42 and sex='M' and diagnostic='schizophrenia'

[Adam&Wortmann’89]

11

Security in Statistical DBs

What has been tried:
• Query restriction
  – Query-size control, query-set overlap control, query monitoring
  – None is practical

• Data perturbation
  – Most popular: cell combination, cell suppression
  – Other methods, for continuous attributes: may introduce bias

• Output perturbation
  – For continuous attributes only

[Adam&Wortmann’89]
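One way to see why query-size control alone is not practical is a differencing attack. The sketch below is an illustration under assumptions: the threshold K, the helper restricted_count, and all rows are made up. Each count is answered only if its query set is neither too small nor too large, yet two allowed counts subtract to a single patient's diagnosis.

```python
import sqlite3

K = 2  # hypothetical minimum query-set size

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up rows for illustration only.
    CREATE TABLE Patients (name TEXT, age INTEGER, sex TEXT, diagnostic TEXT);
    INSERT INTO Patients VALUES
        ('p1', 42, 'M', 'schizophrenia'),
        ('p2', 42, 'M', 'flu'),
        ('p3', 42, 'F', 'flu'),
        ('p4', 37, 'M', 'schizophrenia'),
        ('p5', 37, 'F', 'schizophrenia'),
        ('p6', 50, 'F', 'flu');
""")

def restricted_count(where):
    """Answer COUNT(*) only if the query set has between K and N-K rows.

    The predicate is spliced into the SQL text for brevity; this is a demo,
    not a safe way to accept user input.
    """
    total = conn.execute("SELECT count(*) FROM Patients").fetchone()[0]
    n = conn.execute(
        f"SELECT count(*) FROM Patients WHERE {where}").fetchone()[0]
    if n < K or n > total - K:
        raise PermissionError("query set too small or too large")
    return n

# The direct question about the only 37-year-old woman is refused ...
try:
    restricted_count("age=37 AND sex='F' AND diagnostic='schizophrenia'")
    direct_refused = False
except PermissionError:
    direct_refused = True

# ... but two permitted counts differ by exactly that one person.
a = restricted_count("diagnostic='schizophrenia'")
b = restricted_count("diagnostic='schizophrenia' AND NOT (age=37 AND sex='F')")
leaked = a - b  # 1 means the target has the diagnosis
print(direct_refused, leaked)
```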

12

Summary on Security in Statistical DB

• Original goal seems impossible to achieve

• Cell combination/suppression are popular, but do not allow arbitrary queries

13

Outline

• Traditional data security

• Two attacks

• Data security research today

• Conclusions

14

SQL Injection

Your health insurance company lets you see the claims online:

First login:
  User: fred
  Password: ********

Now search through the claims:
  Search claims by: Dr. Lee

SELECT … FROM … WHERE doctor='Dr. Lee' and patientID='fred'

[Chris Anley, Advanced SQL Injection In SQL]

15

SQL Injection

Now try this:

Search claims by: Dr. Lee' OR patientID = 'suciu'; --

….. WHERE doctor='Dr. Lee' OR patientID='suciu'; --' and patientID='fred'

Better:

Search claims by: Dr. Lee' OR 1 = 1; --

16

SQL Injection

When you’re done, do this:

Search claims by: Dr. Lee'; DROP TABLE Patients; --

17

SQL Injection

• The DBMS works perfectly. So why is SQL injection possible so often?

• Quick answer:
  – Poor programming: use stored procedures!

• Deeper answer:
  – Move policy implementation from apps to DB
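The attack from the previous slides can be reproduced in a few lines. This is a sketch under assumptions: the Claims table, its rows, and both helper functions are hypothetical. The fix shown is bound parameters, a related remedy to the stored procedures the slide recommends; the point in both cases is that user input never becomes part of the SQL text. The trailing semicolon of the slide's attack string is omitted because sqlite3's execute() is meant for a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical claims data.
    CREATE TABLE Claims (doctor TEXT, patientID TEXT, amount INTEGER);
    INSERT INTO Claims VALUES
        ('Dr. Lee', 'fred',  100),
        ('Dr. Lee', 'suciu', 250),
        ('Dr. Kim', 'suciu',  75);
""")

def search_vulnerable(doctor, patient_id):
    # BAD: the search box content is pasted straight into the query string.
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           f"WHERE doctor='{doctor}' and patientID='{patient_id}'")
    return conn.execute(sql).fetchall()

def search_safe(doctor, patient_id):
    # Bound parameters: the input is always treated as a value, never as SQL.
    sql = ("SELECT doctor, patientID, amount FROM Claims "
           "WHERE doctor=? and patientID=?")
    return conn.execute(sql, (doctor, patient_id)).fetchall()

attack = "Dr. Lee' OR 1 = 1 --"

leaked_rows = search_vulnerable(attack, "fred")  # every claim in the table
blocked_rows = search_safe(attack, "fred")       # no doctor has that name
normal_rows = search_safe("Dr. Lee", "fred")     # fred's own claim only
print(len(leaked_rows), blocked_rows, normal_rows)
```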

18

Latanya Sweeney’s Finding

• In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees

• GIC has to publish the data:

GIC(zip, dob, sex, diagnosis, procedure, ...)

19

Latanya Sweeney’s Finding

• Sweeney paid $20 and bought the voter registration list for Cambridge Massachusetts:

GIC(zip, dob, sex, diagnosis, procedure, ...)

VOTER(name, party, ..., zip, dob, sex)

20

Latanya Sweeney’s Finding

• William Weld (former governor) lives in Cambridge, hence is in VOTER

• 6 people in VOTER share his dob
• Only 3 of them were men (same sex)
• Weld was the only one in that zip
• Sweeney learned Weld’s medical records!

Linking attributes: zip, dob, sex
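The linkage itself is an ordinary join on (zip, dob, sex). The sketch below uses entirely made-up rows and reduced schemas; only the table names and the three linking attributes come from the slides. It lists every voter who is unique on the linking attributes, together with the diagnosis that the join attaches to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- All rows are fabricated placeholders for illustration.
    CREATE TABLE GIC (zip TEXT, dob TEXT, sex TEXT, diagnosis TEXT);
    CREATE TABLE VOTER (name TEXT, party TEXT, zip TEXT, dob TEXT, sex TEXT);

    INSERT INTO GIC VALUES
        ('00001', '1950-01-01', 'M', 'dx-A'),
        ('00001', '1950-01-01', 'F', 'dx-B'),
        ('00002', '1950-01-01', 'M', 'dx-C');

    INSERT INTO VOTER VALUES
        ('Voter One',   'X', '00001', '1950-01-01', 'M'),
        ('Voter Two',   'Y', '00001', '1950-01-01', 'F'),
        ('Voter Three', 'X', '00002', '1950-01-01', 'M'),
        ('Voter Four',  'Y', '00002', '1950-01-01', 'M');
""")

# Re-identify only voters whose (zip, dob, sex) combination is unique.
reidentified = conn.execute("""
    SELECT v.name, g.diagnosis
    FROM VOTER v
    JOIN GIC g ON v.zip = g.zip AND v.dob = g.dob AND v.sex = g.sex
    WHERE (SELECT count(*) FROM VOTER w
           WHERE w.zip = v.zip AND w.dob = v.dob AND w.sex = v.sex) = 1
    ORDER BY v.name
""").fetchall()
print(reidentified)
```

Voters Three and Four share all three attributes, so the join cannot tell them apart; that ambiguity is what k-anonymity tries to guarantee for everyone.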

21

Latanya Sweeney’s Finding

• All systems worked as specified, yet important data leaked

• How do we protect against that?

Some of today’s research in data security addresses breaches that happen even if all systems work correctly

22

Summary on Attacks

SQL injection:
• A correctness problem:
  – Security policy implemented poorly in the application

Sweeney’s finding:
• Beyond correctness:
  – Leakage occurred when all systems work as specified

23

Outline

• Traditional data security

• Two attacks

• Data security research today

• Conclusions

24

Research Topics in Data Security

Rest of the talk:
• Information Leakage
• Privacy
• Fine-grained access control
• Data encryption
• Secure shared computation

25

Information Leakage: k-Anonymity

Original table:

First     Last    Age  Race
Harry     Stone   34   Afr-Am
John      Reyser  36   Cauc
Beatrice  Stone   47   Afr-Am
John      Ramos   22   Hisp

Anonymized table:

First  Last   Age    Race
*      Stone  30-50  Afr-Am
John   R*     20-40  *
*      Stone  30-50  Afr-Am
John   R*     20-40  *

Definition: each tuple is equal to at least k-1 others

Anonymizing: through suppression and generalization

Hard: NP-complete for suppression only
Approximations exist

[Samarati&Sweeney’98, Meyerson&Williams’04]
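The definition is easy to check mechanically. The sketch below tests the slide's four-row example for k = 2. The generalize function is a hand-written stand-in that reproduces the slide's anonymized table; it is not an anonymization algorithm, and both function names are assumptions.

```python
from collections import Counter

def is_k_anonymous(rows, k):
    """True if every tuple occurs at least k times (equals k-1 others)."""
    counts = Counter(rows)
    return all(c >= k for c in counts.values())

def generalize(row):
    """Hard-coded suppression/generalization matching the slide's example."""
    first, last, age, race = row
    if last == "Stone":
        # Suppress the first name, generalize the age to a range.
        return ("*", last, "30-50", race)
    # Keep the first name, truncate the last name, suppress the race.
    return (first, last[0] + "*", "20-40", "*")

original = [
    ("Harry", "Stone", 34, "Afr-Am"),
    ("John", "Reyser", 36, "Cauc"),
    ("Beatrice", "Stone", 47, "Afr-Am"),
    ("John", "Ramos", 22, "Hisp"),
]
anonymized = [generalize(r) for r in original]

print(is_k_anonymous(original, 2))    # every original row is unique
print(is_k_anonymous(anonymized, 2))  # each anonymized row has a twin
```

Finding the generalization that loses the least information is the hard part; checking a candidate, as above, is linear in the table size.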

26

Information Leakage: Query-view Security

Have data: TABLE Employee(name, dept, phone)

Secret Query              View(s)                         Disclosure ?
S(name)                   V(name,phone)                   total
S(name,phone)             V1(name,dept), V2(dept,phone)   big
S(name)                   V(dept)                         tiny
S(name) where dept='HR'   V(name) where dept='RD'         none

[Miklau&S’04, Miklau&Dalvi&S’05, Yang&Li’04]
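The "big" row can be made concrete: joining V1(name,dept) with V2(dept,phone) on dept yields a candidate set that always contains the secret S(name,phone). The sketch below uses made-up employees and shows only this intuition; it is not the probabilistic definition of query-view security from the cited papers.

```python
# Hypothetical Employee(name, dept, phone) instance.
employee = {
    ("Ann", "HR", "555-0001"),
    ("Bob", "HR", "555-0002"),
    ("Cy",  "RD", "555-0003"),
}

# The two published views are projections of the base table.
v1 = {(name, dept) for name, dept, phone in employee}
v2 = {(dept, phone) for name, dept, phone in employee}

# What an adversary can infer: join the views on dept.
candidates = {(name, phone)
              for name, dept in v1
              for dept2, phone in v2
              if dept == dept2}

# The secret the owner wanted to protect.
secret = {(name, phone) for name, dept, phone in employee}

cy_phones = {phone for name, phone in candidates if name == "Cy"}
print(len(secret), len(candidates), cy_phones)
```

Every true pair is among the candidates, and for a one-person department the phone number is pinned down exactly.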

27

Summary on Information Disclosure

• The theoretical research:
  – Exciting new connections between databases and information theory, probability theory, cryptography

• The applications:
  – many years away

[Abadi&Warinschi’05]

28

Privacy

• “Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others”

• More complex than confidentiality

[Agrawal’03]

29

Privacy

Involves:
• Data
• Owner
• Requester
• Purpose
• Consent

Example: Alice gives her email to a web service

alice@a.b.com

Privacy policy: P3P

30

Hippocratic Databases

DB support for implementing privacy policies.
• Purpose specification
• Consent
• Limited use
• Limited retention
• …

[Agrawal’03, LeFevre’04]

alice@a.b.com

Privacy policy: P3P

Hippocratic DB

Protection against: Sloppy organizations Malicious organizations

31

Privacy for Paranoids

• Idea: rely on trusted agents

alice@a.b.com

Agent

aly1@agenthost.com

lice27@agenthost.com

foreign keys ?

[Aggarwal’04]

Protection against: Sloppy organizations Malicious attackers

32

Summary on Privacy

• Major concern in industry
  – Legislation
  – Consumer demand

• Challenge:
  – How to enforce an organization’s stated policies

33

Fine-grained Access Control

Control access at the tuple level.

• Policy specification languages
• Implementation

34

Policy Specification Language

CREATE AUTHORIZATION VIEW PatientsForDoctors AS
  SELECT Patient.*
  FROM Patient, Doctor
  WHERE Patient.doctorID = Doctor.ID
    and Doctor.login = %currentUser

Context parameter: %currentUser

No standard, but usually based on parameterized views.

35

Implementation

User query:

SELECT Patient.name, Patient.age
FROM Patient
WHERE Patient.disease = 'flu'

Rewritten query (e.g. Oracle):

SELECT Patient.name, Patient.age
FROM Patient, Doctor
WHERE Patient.disease = 'flu'
  and Patient.doctorID = Doctor.ID
  and Doctor.login = %currentUser
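A sketch of the rewriting step, following the PatientsForDoctors authorization view: the user's predicate is kept and the policy predicate is appended, with the login checked on the Doctor table. SQLite has no %currentUser context parameter, so a bound parameter stands in for it. The rows and the function name flu_patients are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data.
    CREATE TABLE Doctor (ID INTEGER, login TEXT);
    CREATE TABLE Patient (name TEXT, age INTEGER, disease TEXT,
                          doctorID INTEGER);
    INSERT INTO Doctor VALUES (1, 'lee'), (2, 'kim');
    INSERT INTO Patient VALUES
        ('Ann', 30, 'flu',  1),
        ('Bob', 41, 'flu',  2),
        ('Cy',  25, 'cold', 1);
""")

def flu_patients(current_user):
    """Run the user's flu query with the policy predicate added."""
    sql = """
        SELECT Patient.name, Patient.age
        FROM Patient, Doctor
        WHERE Patient.disease = 'flu'            -- the user's own predicate
          AND Patient.doctorID = Doctor.ID       -- added by the rewriter
          AND Doctor.login = ?                   -- stands in for %currentUser
    """
    return conn.execute(sql, (current_user,)).fetchall()

print(flu_patients("lee"))
print(flu_patients("kim"))
print(flu_patients("nobody"))
```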

36

Two Semantics

• The Truman Model = filter semantics
  – transform reality
  – ACCEPT all queries
  – REWRITE queries
  – Sometimes misleading results

• The non-Truman model = deny semantics
  – reject queries
  – ACCEPT or REJECT queries
  – Execute query UNCHANGED
  – May define multiple security views for a user

[Rizvi’04]

SELECT count(*)
FROM Patients
WHERE disease='flu'
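The count(*) query shows the difference between the two semantics. In the sketch below, the Truman version silently counts only the rows the user may see, which can mislead. The non-Truman version is a crude, data-dependent stand-in for the validity test in [Rizvi’04]: it runs the query unchanged but rejects it whenever the answer would depend on rows outside the user's view. The table contents and both function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: each patient row is visible to one doctor.
    CREATE TABLE Patients (name TEXT, disease TEXT, doctor TEXT);
    INSERT INTO Patients VALUES
        ('Ann', 'flu', 'lee'),
        ('Bob', 'flu', 'kim'),
        ('Cy',  'flu', 'kim');
""")

def truman_count(user):
    """Filter semantics: rewrite the query, always answer."""
    return conn.execute(
        "SELECT count(*) FROM Patients WHERE disease='flu' AND doctor=?",
        (user,)).fetchone()[0]

def non_truman_count(user):
    """Deny semantics (toy check): answer the unchanged query or reject."""
    unchanged = conn.execute(
        "SELECT count(*) FROM Patients WHERE disease='flu'").fetchone()[0]
    if unchanged != truman_count(user):
        raise PermissionError("query needs rows outside the user's view")
    return unchanged

misleading = truman_count("lee")  # looks like a hospital-wide count

try:
    non_truman_count("lee")
    rejected = False
except PermissionError:
    rejected = True

print(misleading, rejected)
```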

37

Summary of Fine Grained Access Control

• Trend in industry: label-based security
• Killer app: application hosting
  – Independent franchises share a single table at headquarters (e.g., Holiday Inn)
  – Application runs under requester’s label, cannot see other labels
  – Headquarters runs Read queries over them
• Oracle’s Virtual Private Database

[Rosenthal&Winslett’2004]

38

Data Encryption for Publishing

Scientist wants to publish medical research data on the Web

• Users and their keys:
  – All authorized users: Kuser
  – Patient: Kpat
  – Doctor: Kdr
  – Nurse: Knu
  – Administrator: Kadmin

• Complex Policies:
  – Doctor researchers may access trials
  – Nurses may access diagnostic
  – Etc…

What is the encryption granularity?

39

Data Encryption for Publishing

An XML tree protection:

[Figure: XML tree rooted at <patient>, with elements <privateData>, <name> (JoeDoe), <age> (28), <address> (Seattle), <diagnostic> (flu), <trial>, <drug> (Tylenol), <placebo> (Candy). Nodes are annotated with the keys that protect them: Kuser, Kpat, (Knu Kadm), Knu, Kdr, Kmaster.]

Keys held by each user:
• Doctor: Kuser, Kdr
• Nurse: Kuser, Knu
• Nurse+admin: Kuser, Knu, Kadm

[Miklau&S.’03]

40

Summary on Data Encryption

• Industry:
  – Supported by all vendors: Oracle, DB2, SQL-Server
  – Efficiency issues still largely unresolved

• Research:
  – Hard theoretical security analysis

[Abadi&Warinschi’05]

41

Secure Shared Processing

• Alice has a database DB_A

• Bob has a database DB_B

• How can they compute Q(DB_A, DB_B), without revealing their data?

• Long history in cryptography
• Some database queries are easier than general case

42

Secure Shared Processing

[Agrawal’03]

Task: find intersection without revealing the rest

                        Alice                  Bob
Data:                   a b c d                c d e
Compute one-way hash:   h(a) h(b) h(c) h(d)    h(c) h(d) h(e)
Exchange:               h(c) h(d) h(e)         h(a) h(b) h(c) h(d)

What’s wrong?
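One answer to "what's wrong": when the values come from a small or guessable domain, the receiver can hash every candidate and invert the hashes. The sketch below assumes SHA-256 as the one-way hash and single lowercase letters as the domain; both are choices made for this illustration.

```python
import hashlib

def h(x):
    """One-way hash of a string value."""
    return hashlib.sha256(x.encode()).hexdigest()

alice = {"a", "b", "c", "d"}
bob = {"c", "d", "e"}

# Exchange step: each side sends only hashes.
alice_sent = {h(x) for x in alice}
bob_sent = {h(x) for x in bob}

# Intended use: Bob learns which of HIS values Alice also has.
intersection = {x for x in bob if h(x) in alice_sent}

# Unintended use: Bob hashes the whole (small) domain and recovers
# everything Alice sent, including values outside the intersection.
domain = "abcdefghijklmnopqrstuvwxyz"
recovered = {x for x in domain if h(x) in alice_sent}

print(sorted(intersection), sorted(recovered))
```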

43

Secure Shared Processing

commutative encryption: h(x) = EA(EB(x)) = EB(EA(x))

                      Alice                      Bob
Data:                 a b c d                    c d e
Encrypt (own key):    EA(a) EA(b) EA(c) EA(d)    EB(c) EB(d) EB(e)
Exchange:             EB(c) EB(d) EB(e)          EA(a) EA(b) EA(c) EA(d)
Encrypt (own key):    h(c) h(d) h(e)             h(a) h(b) h(c) h(d)
Exchange:             h(a) h(b) h(c) h(d)        h(c) h(d) h(e)

[Agrawal’03]
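A toy version of the protocol, assuming modular exponentiation as the commutative cipher: E_k(y) = y^k mod P, so that E_a(E_b(y)) = y^(ab) = E_b(E_a(y)). The prime, the hash-to-group mapping, and the helper names are assumptions chosen for illustration; the parameters are far too small for real use, and a real run would also shuffle the lists before sending them.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus for illustration only

def to_group(x):
    """Map a value to an integer in [1, P-1] via a hash."""
    digest = hashlib.sha256(x.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def keygen():
    """Pick a secret exponent coprime to P-1, so encryption is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def enc(k, y):
    """Commutative encryption: exponentiation modulo P."""
    return pow(y, k, P)

alice = ["a", "b", "c", "d"]
bob = ["c", "d", "e"]
ka, kb = keygen(), keygen()

# Round 1: each side encrypts its own values and sends them over.
alice_to_bob = [enc(ka, to_group(x)) for x in alice]  # EA(a) .. EA(d)
bob_to_alice = [enc(kb, to_group(x)) for x in bob]    # EB(c) .. EB(e)

# Round 2: each side encrypts what it received with its own key.
bob_double = [enc(kb, y) for y in alice_to_bob]       # h(a) .. h(d), in order
alice_double = {enc(ka, y) for y in bob_to_alice}     # h(c), h(d), h(e)

# Bob returns bob_double in order; Alice matches it against alice_double.
intersection = sorted(x for x, hx in zip(alice, bob_double)
                      if hx in alice_double)
print(intersection)
```

Unlike the plain-hash exchange, neither side can test guesses offline, because computing h(x) requires both secret keys.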

44

Summary on Secure Shared Processing

• Secure intersection, joins, data mining

• But are there other examples ?

45

Outline

• Traditional data security

• Two attacks

• Data security research today

• Conclusions

46

Conclusions

• Traditional data security confined to one server
  – Security in SQL
  – Security in statistical databases

• Attacks possible due to:
  – Poor implementation of security policies: SQL injection
  – Unintended information leakage in published data

47

Conclusions

• State of the industry:
  – Data security policies: scattered throughout applications
  – Database no longer center of the security universe
  – Needed: automatic means to translate complex policies into physical implementations

• State of research: data security in global data sharing
  – Information leakage, privacy, secure computations, etc.
  – Database research community has an increased appetite for cryptographic techniques

48

Questions ?
