current trends
TRANSCRIPT
1
Current Trends in Data Security
Dan Suciu
Joint work with Gerome Miklau
2
Data Security
Dorothy Denning, 1982:
• Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification
• Data Security = Confidentiality + Integrity
3
Data Security
• Distinct from systems and network security– Assumes these are already secure
• Tools:– Cryptography, information theory, statistics, …
• Applications:– An enabling technology
4
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
5
Traditional Data Security
• Security in SQL = Access control + Views
• Security in statistical databases = Theory
6
Access Control in SQL
GRANT privileges ON object TO users [WITH GRANT OPTIONS]
privileges = SELECT | INSERT | DELETE | . . .
object = table | attribute
REVOKE privileges ON object FROM users [CASCADE ]
[Griffith&Wade'76, Fagin'78]
7
Views in SQL
A SQL View = (almost) any SQL query
• Typically used as:
GRANT SELECT ON pmpStudents TO DavidRispoli
CREATE VIEW pmpStudents AS SELECT * FROM Students WHERE…
8
Summary of SQL Security
Limitations:• No row level access control• Table creator owns the data: that’s unfair !
… or spectacular failure:• Only 30% assign privileges to users/roles
– And then to protect entire tables, not columns
Access control = great success story of the DB community...
9
Summary (cont)
• Most policies in middleware: slow, error prone:– SAP has 10**4 tables– GTE over 10**5 attributes– A brokerage house has 80,000 applications– A US government entity thinks that it has 350K
• Today the database is not at the center of the policy administration universe
[Rosenthal&Winslett’2004]
10
Security in Statistical DBs
Goal:• Allow arbitrary aggregate SQL queries• Hide confidential data
SELECT count(*)FROM PatientsWHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’
OK
SELECT nameFROM PatientWHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’
Not OK
[Adam&Wortmann’89]
11
Security in Statistical DBsWhat has been tried:• Query restriction
– Query-size control, query-set overlap control, query monitoring– None is practical
• Data perturbation– Most popular: cell combination, cell suppression– Other methods, for continuous attributes: may introduce bias
• Output perturbation– For continuous attributes only
[Adam&Wortmann’89]
12
Summary on Security in Statistical DB
• Original goal seems impossible to achieve
• Cell combination/suppression are popular, but do not allow arbitrary queries
13
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
14
Search claims by:
SQL InjectionYour health insurance company lets you see the claims online:
Now search through the claims :
Dr. Lee
First login: User:
Password:
fred
********
SELECT…FROM…WHERE doctor=‘Dr. Lee’ and patientID=‘fred’
[Chris Anley, Advanced SQL Injection In SQL]
15
SQL InjectionNow try this:
Search claims by: Dr. Lee’ OR patientID = ‘suciu’; --
Better:
Search claims by: Dr. Lee’ OR 1 = 1; --
…..WHERE doctor=‘Dr. Lee’ OR patientID=‘suciu’; --’ and patientID=‘fred’
16
SQL InjectionWhen you’re done, do this:
Search claims by: Dr. Lee’; DROP TABLE Patients; --
17
SQL Injection
• The DBMS works perfectly. So why is SQL injection possible so often ?
• Quick answer:– Poor programming: use stored procedures !
• Deeper answer:– Move policy implementation from apps to DB
18
Latanya Sweeney’s Finding
• In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees
• GIC has to publish the data:
GIC(zip, dob, sex, diagnosis, procedure, ...)
19
Latanya Sweeney’s Finding
• Sweeney paid $20 and bought the voter registration list for Cambridge Massachusetts:
GIC(zip, dob, sex, diagnosis, procedure, ...)VOTER(name, party, ..., zip, dob, sex)
20
Latanya Sweeney’s Finding
• William Weld (former governor) lives in Cambridge, hence is in VOTER
• 6 people in VOTER share his dob• only 3 of them were man (same sex)• Weld was the only one in that zip• Sweeney learned Weld’s medical records !
zip, dob, sex
21
Latanya Sweeney’s Finding
• All systems worked as specified, yet an important data has leaked
• How do we protect against that ?
Some of today’s research in data security address breachesthat happen even if all systems work correctly
22
Summary on Attacks
SQL injection:• A correctness problem:
– Security policy implemented poorly in the application
Sweeney’s finding:• Beyond correctness:
– Leakage occurred when all systems work as specified
23
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
24
Research Topics in Data Security
Rest of the talk:• Information Leakage• Privacy• Fine-grained access control• Data encryption• Secure shared computation
25
First Last Age RaceHarry Stone 34 Afr-AmJohn Reyser 36 Cauc
Beatrice Stone 47 Afr-amJohn Ramos 22 Hisp
First Last Age Race* Stone 30-50 Afr-Am
John R* 20-40 ** Stone 30-50 Afr-am
John R* 20-40 *
Information Leakage:k-Anonymity
Definition: each tuple is equal to at least k-1 others
Anonymizing: through suppression and generalization
Hard: NP-complete for supression onlyApproximations exists
[Samarati&Sweeney’98, Meyerson&Williams’04]
26
Information Leakage:Query-view Security
Secret Query View(s) Disclosure ?S(name) V(name,phone)
S(name,phone) V1(name,dept)V2(dept,phone)
S(name) V(dept)S(name)
where dept=‘HR’V(name)
where dept=‘RD’
TABLE Employee(name, dept, phone)Have data:
total
big
tiny
none
[Miklau&S’04, Miklau&Dalvi&S’05,Yang&Li’04]
27
Summary on Information Disclosure
• The theoretical research:– Exciting new connections between databases
and information theory, probability theory, cryptography
• The applications: – many years away
[Abadi&Warinschi’05]
28
Privacy
• “Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others”
• More complex than confidentiality
[Agrawal’03]
29
Privacy
Involves:• Data• Owner• Requester• Purpose• Consent
Example: Alice gives her email to a web service
Privacy policy: P3P
30
Hippocratic Databases
DB support for implementing privacy policies.• Purpose specification• Consent• Limited use• Limited retention• …
[Agrawal’03, LeFevrey’04]
Privacy policy: P3P
Hippocratic DB
Protection against: Sloppy organizations Malicious organizations
31
Privacy for Paranoids
• Idea: rely on trusted agents
Agent
foreign keys ?
[Aggarwal’04]
Protection against: Sloppy organizations Malicious attackers
32
Summary on Privacy
• Major concern in industry– Legislation– Consumer demand
• Challenge:– How to enforce an organization’s stated
policies
33
Fine-grained Access Control
Control access at the tuple level.
• Policy specification languages• Implementation
34
Policy Specification Language
CREATE AUTHORIZATION VIEW PatientsForDoctors AS SELECT Patient.* FROM Patient, Doctor WHERE Patient.doctorID = Doctor.ID and Doctor.login = %currentUser
Contextparameters
No standard, but usually based on parameterized views.
35
ImplementationSELECT Patient.name, Patient.ageFROM PatientWHERE Patient.disease = ‘flu’
SELECT Patient.name, Patient.ageFROM Patient, DoctorWHERE Patient.disease = ‘flu’ and Patient.doctorID = Doctor.ID and Patient.login = %currentUser
e.g. Oracle
36
Two Semantics• The Truman Model = filter semantics
– transform reality– ACCEPT all queries– REWRITE queries– Sometimes misleading results
• The non-Truman model = deny semantics– reject queries– ACCEPT or REJECT queries– Execute query UNCHANGED– May define multiple security views for a user
[Rizvi’04]
SELECT count(*)FROM PatientsWHERE disease=‘flu’
37
Summary of Fine Grained Access Control
• Trend in industry: label-based security• Killer app: application hosting
– Independent franchises share a single table at headquarters (e.g., Holiday Inn)
– Application runs under requester’s label, cannot see other labels
– Headquarters runs Read queries over them• Oracle’s Virtual Private Database
[Rosenthal&Winslett’2004]
38
Data Encryption for Publishing
• Users and their keys:
• Complex Policies:
All authorized users: Kuser
Patient: Kpat
Doctor: Kdr
Nurse: Knu
Administrator : Kadmin
What is the encryption granularity ?
Doctor researchers may access trials Nurses may access diagnosticEtc…
Scientist wants to publish medical research data on the Web
39
Data Encryption for PublishingAn XML tree protection:
<patient>
<privateData>
<name> <age>
<diagnostic>
JoeDoe 28
<address>
Seattle
<trial>
<drug>
flu
<placebo>
Kuser
Kpat (KnuKadm) Knu KdrKdr
Kpat Kmaster Kmaster
Tylenol Candy
[Miklau&S.’03]
Doctor: Kuser, Kdr
Nurse: Kuser, Knu
Nurse+admin: Kuser, Knu, Kadm
40
Summary on Data Encryption
• Industry:– Supported by all vendors:
Oracle, DB2, SQL-Server– Efficiency issues still largely unresolved
• Research:– Hard theoretical security analysis
[Abadi&Warinschi’05]
41
Secure Shared Processing
• Alice has a database DBA
• Bob has a database DBB
• How can they compute Q(DBA, DBB), without revealing their data ?
• Long history in cryptography• Some database queries are easier than general case
42
Secure Shared Processing
[Agrawal’03]
Alice Bob
a b c d c d e
h(a) h(b) h(c) h(d) h(c) h(d) h(e)
Compute one-way hash
Exchange
h(c) h(d) h(e) h(a) h(b) h(c) h(d)What’s wrong ?
Task: find intersectionwithout revealing the rest
43
Secure Shared ProcessingAlice Bob
a b c d c d e
EB(c) EB(d) EB(e) EA(a) EA(b) EA(c) EA(d)
commutative encryption:h(x) = EA(EB(x)) = EB(EA(x))
EA(a) EA(b) EA(c) EA(d) EB(c) EB(d) EB(e)
EA EB
h(c) h(d) h(e) h(a) h(b) h(c) h(d)EA EB
h(a) h(b) h(c) h(d) h(c) h(d) h(e)
[Agrawal’03]
44
Summary on Secure Shared Processing
• Secure intersection, joins, data mining
• But are there other examples ?
45
Outline
• Traditional data security
• Two attacks
• Data security research today
• Conclusions
46
Conclusions
• Traditional data security confined to one server– Security in SQL– Security in statistical databases
• Attacks possible due to:– Poor implementation of security policies: SQL
injection– Unintended information leakage in published data
47
Conclusions• State of the industry:
– Data security policies: scattered throughout applications– Database no longer center of the security universe– Needed: automatic means to translate complex policies into
physical implementations
• State of research: data security in global data sharing– Information leakage, privacy, secure computations, etc.– Database research community has an increased appetite for
cryptographic techniques
48
Questions ?