| PROTECT THE KEYS TO EVERYTHING © 2015 Dyadic. All rights reserved. Confidential and proprietary.
Securing Data-in-Use in the Cloud: Myths and Facts
Yehuda Lindell, Dyadic Security & Bar-Ilan University
Moving to the cloud has many operational advantages
Security in the cloud is a double-edged sword:
• Security may be much better in many cases
• But, there is a loss of control
Loss of control
• Psychological effect (car vs plane accidents)
• Regulatory effect
The real threat – being caught up in a mega breach
• The stakes are rising all the time
The Problem
In 2012, attackers breached LinkedIn
• Password hashes were not salted (and no iterations)
A Dropbox employee reused their password on LinkedIn
Attackers used the password to access the Dropbox user/password database
The file of passwords was dumped online
• 68 million credentials exposed
• Brute-force attacks are effective (albeit not so easy, since bcrypt was used, but the hashes are certainly vulnerable, contrary to Dropbox's claim)
All Dropbox users caught up in the breach...
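To make the "not salted (and no iterations)" point concrete, here is a minimal sketch contrasting a single unsalted hash with a salted, iterated one. The use of SHA-1 for the weak variant, PBKDF2-HMAC-SHA256 for the strong one, and the iteration count are assumptions chosen for illustration, not details taken from the talk.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch; tune for real hardware


def unsalted_hash(password: str) -> str:
    """Weak: one hash, no salt. Equal passwords give equal hashes for every user."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a per-user random salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

With the salted variant, two users who share a password still get different stored digests, so one precomputed table cannot be reused across the whole dump, and each guess costs `ITERATIONS` hash evaluations.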
The Dropbox Breach (via LinkedIn)
A seemingly simple solution: Encrypt!
Problem: Where do you put the key and how do you protect it?
• Option 1: You keep it, but the cloud then becomes an encrypted disk and no more
• Fine for Dropbox-type applications
• Option 2: On the VM – worthless
• Option 3: Managed by the cloud
Protecting Data-at-Rest in the Cloud
We are all familiar with the “data-in-motion” and “data-at-rest” paradigm
The power of the cloud is in computation
• If encrypted data-at-rest is decrypted only by the client, then the cloud becomes a disk
• If encrypted data-at-rest is decrypted in the cloud, then it becomes vulnerable
A new paradigm: encrypting data-in-use
• Data is encrypted by client
• Data is uploaded to cloud
• Cloud computes on the data while encrypted and returns encrypted answer to client
• Client decrypts
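The encrypt, upload, compute, decrypt flow can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption, which is a swapped-in technique chosen for illustration and not a scheme named in the talk; the primes are far too small for real use and only addition of plaintexts is supported.

```python
import math
import secrets

# Toy key: two well-known Mersenne primes. Insecure sizes, for illustration only.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                     # public key
N2 = N * N
LAMBDA = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAMBDA, -1, N)       # private: valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Client side: probabilistic encryption of 0 <= m < N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: recover the plaintext using the private values."""
    u = pow(c, LAMBDA, N2)
    return ((u - 1) // N * MU) % N


def cloud_sum(ciphertexts) -> int:
    """Cloud side: multiply ciphertexts, which adds the hidden plaintexts."""
    acc = 1  # a valid encryption of 0
    for c in ciphertexts:
        acc = (acc * c) % N2
    return acc


salaries = [52000, 61000, 58500]
uploaded = [encrypt(s) for s in salaries]   # client encrypts and uploads
encrypted_total = cloud_sum(uploaded)       # cloud computes without any key
total = decrypt(encrypted_total)            # client decrypts the answer
```

The cloud only ever sees `N` and the ciphertexts; the private values `LAMBDA` and `MU` stay with the client.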
Protecting Data
Two Very Different Settings
An existing cloud/service provider
• Encrypt data before it gets to the cloud – use proxy
• The cloud/service doesn’t know that it’s encrypted
Advantage: use existing infrastructure
Disadvantage: limits what can be done
A modified cloud/service
• Encrypt data before it gets to the cloud – use proxy
• The cloud/service knows about the encryption and is modified appropriately
Advantage: more flexibility; can do more
Disadvantage: doesn’t work with existing solutions; much harder to adopt
Deterministic Encryption
Encrypt each “word” by applying deterministic encryption to the term
• Technically – apply a pseudorandom permutation
Search by encrypting the search term and simply comparing (standard search)
Notes:
• Partial-word search functionality is broken
• The output of deterministic encryption doesn’t look like a credit card number or English word
• Some services check formats!
• Can use format-preserving encryption to preserve the format
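A minimal sketch of word-level deterministic encryption and search follows. It is an assumption-laden stand-in: HMAC-SHA256 is a pseudorandom function rather than a pseudorandom permutation, so these tokens cannot be decrypted, and the key, helper names and sample documents are hypothetical. It is enough to show why whole-word search works and partial-word search breaks.

```python
import hashlib
import hmac

KEY = b"demo-key-held-by-the-proxy"  # hypothetical key, never sent to the cloud


def det_token(key: bytes, word: str) -> str:
    """Deterministic keyed token for one word (same word, same token)."""
    mac = hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def encrypt_document(key: bytes, text: str) -> list:
    """Proxy side: replace every word by its token before upload."""
    return [det_token(key, w) for w in text.split()]


def search(encrypted_docs: list, token: str) -> list:
    """Cloud side: ordinary equality search over opaque tokens."""
    return [i for i, doc in enumerate(encrypted_docs) if token in doc]


docs = [
    "Merger talks resume Monday",
    "lunch menu for Monday",
    "confidential merger terms",
]
encrypted_docs = [encrypt_document(KEY, d) for d in docs]
hits_full = search(encrypted_docs, det_token(KEY, "merger"))
hits_partial = search(encrypted_docs, det_token(KEY, "merg"))
```

Searching for the full word finds both documents, while the prefix `merg` finds nothing because its token is unrelated to the token of `merger`.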
Assume for now that the deterministic encryption scheme is perfect
• Nothing can be revealed (except that the same value repeats)
Is this good enough?
• It is the best possible in this model, so it must be!
• But the best possible depends on the model
• We don’t have to use a solution based on not changing the server infrastructure
• We don’t have to recommend going to cloud
Not all security is good – the existence of weak solutions can provide a false sense of security, which is dangerous
The Best Possible
Security Analysis
The same word is encrypted to the same ciphertext word every time
• It is possible to gather statistics and learn a lot about the original text
• Natural language has a very specific distribution
• It can be fine for encrypting unique values (e.g., credit card numbers or ID numbers, when used as a key attribute and therefore unique)
Deterministic Encryption – Auxiliary Information
Plaintext:
Name           | Illness
Yehuda Lindell | Diabetes
John Smith     | High blood pressure
Jane Doe       | Asthma
Mike Jones     | Diabetes
Encrypted:
Name           | Illness
Yehuda Lindell | 0010110001101101
John Smith     | 1111011010101000
Jane Doe       | 0011110110011001
Mike Jones     | 0010110001101101
Auxiliary information about Yehuda Lindell reveals Mike Jones’ condition
Research paper: Inference Attacks on Property-Preserving Encrypted Databases, by Naveed, Kamara and Wright (ACM CCS 2015)
Analyzed the National Inpatient Sample (NIS) database of the Healthcare Cost and Utilization Project (HCUP)
• Available under strict controls only
The target: HCUP 2009 database (encrypted using deterministic encryption)
Auxiliary data: HCUP 2004 database
Simple attack:
• Compare histograms and guess based on what minimizes the distance
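The histogram comparison can be sketched as plain frequency analysis: rank the ciphertexts by how often they occur, rank the auxiliary plaintexts the same way, and pair them by rank. Rank matching is one simple way to line up two histograms and is an assumption of this sketch rather than the paper's exact procedure. All values below are made-up toy data, not HCUP records.

```python
from collections import Counter


def frequency_attack(encrypted_column, auxiliary_column):
    """Guess a ciphertext -> plaintext mapping by matching frequency ranks."""
    ct_ranked = [c for c, _ in Counter(encrypted_column).most_common()]
    pt_ranked = [p for p, _ in Counter(auxiliary_column).most_common()]
    return dict(zip(ct_ranked, pt_ranked))


# Auxiliary data: an older, unencrypted column with a similar distribution.
auxiliary_column = ["flu"] * 50 + ["diabetes"] * 30 + ["asthma"] * 15 + ["gout"] * 5

# Target data: deterministically encrypted; the codes stand in for ciphertexts.
true_codes = {"flu": "c7", "diabetes": "a1", "asthma": "9f", "gout": "e3"}
target_plain = ["flu"] * 48 + ["diabetes"] * 33 + ["asthma"] * 14 + ["gout"] * 5
encrypted_column = [true_codes[v] for v in target_plain]

guess = frequency_attack(encrypted_column, auxiliary_column)
```

The attacker never touches the key: the repetition pattern preserved by deterministic encryption is all that is used.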
Statistical (Auxiliary Information) Attack on Real Data
Consider the use of deterministic encryption in Office365 email encryption
Assume an attacker with access to the encrypted database (but not to the key)
The attack:
• The attacker sends an email with keywords of interest to an employee
• The email appears encrypted in the database
• The attacker obtains the encrypted versions of the keywords of interest
• The attacker scans the rest of the database and finds the emails with the keywords
This is a very easy attack to carry out
Chosen-Document Attack
Order Preserving Encryption
Order-preserving encryption:
• If $x < y$, then $E_K(x) < E_K(y)$
An oxymoron
• Encryption mixes the plaintext, so order-preserving encryption seems impossible
How does it work?
• The “small” domain is randomly thrown into a large range
Can be constructed efficiently
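A toy version of "throwing a small domain into a large range" is sketched below: pick a random subset of the range with as many elements as the domain, sort it, and use it as the encryption table. Everything here is an assumption for illustration. Python's `random` module is not cryptographic, the seed plays the role of the key, and storing the whole table only works for tiny domains.

```python
import random


def ope_keygen(domain_size: int, range_size: int, seed: int) -> list:
    """Return a strictly increasing table mapping 1..domain_size into 1..range_size."""
    if domain_size > range_size:
        raise ValueError("range must be at least as large as the domain")
    rng = random.Random(seed)  # NOT a cryptographic generator; toy only
    return sorted(rng.sample(range(1, range_size + 1), domain_size))


def ope_encrypt(table: list, x: int) -> int:
    """Encrypt x in 1..len(table) by table lookup."""
    return table[x - 1]


def ope_decrypt(table: list, c: int) -> int:
    """Invert the lookup; raises ValueError if c is not a valid ciphertext."""
    return table.index(c) + 1


table = ope_keygen(domain_size=6, range_size=23, seed=2015)
ciphertexts = [ope_encrypt(table, x) for x in range(1, 7)]
```

Because the table is sorted, `x < y` always implies `ope_encrypt(table, x) < ope_encrypt(table, y)`, which is exactly the property a server needs for range queries and exactly the property an attacker exploits.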
[Figure: the domain 1–6 mapped in increasing order into the larger range 1–23]
Order-Preserving Encryption
Plaintext:
Name           | Blood sugar | Illness
Yehuda Lindell | 125         | Diabetes
John Smith     | 89          | High blood pressure
Jane Doe       | 98          | Asthma
Mike Jones     | 141         | Diabetes
Encrypted:
Name           | Blood sugar | Illness
Yehuda Lindell | 01101101    | 0010110001101101
John Smith     | 00011010    | 1111011010101000
Jane Doe       | 01001100    | 0011110110011001
Mike Jones     | 10100110    | 0010110001101101
Order Preserving Dangers
An attacker can easily see that Mike Jones has the highest blood sugar (no auxiliary information needed)
Inference Attacks on Property-Preserving Encrypted Databases, Naveed, Kamara and Wright, ACM CCS 2015
Considered attacks on format- and order-preserving encryption on medical databases
Compare sorted values – works better as data density increases
• All values appear (e.g., age) – perfect recovery
• If fewer values appear, then it may be hard to recover exact values
Results show that even in low-density databases, exact recovery is not hard
• Approximate recovery is always easy
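For the dense case, the sorted-values comparison reduces to a few lines: if every value of the domain appears in the column, the i-th smallest ciphertext must encrypt the i-th smallest plaintext. The sketch below assumes a hypothetical order-preserving table and a made-up age column; the attacker uses only the ciphertexts and public knowledge of the domain.

```python
def sorting_attack(ciphertexts, domain):
    """Recover ciphertext -> plaintext when every domain value occurs in the column."""
    distinct = sorted(set(ciphertexts))
    ordered_domain = sorted(domain)
    if len(distinct) != len(ordered_domain):
        raise ValueError("column is not dense; exact recovery needs more work")
    return dict(zip(distinct, ordered_domain))


# Hypothetical secret OPE table for ages 0..5 (unknown to the attacker).
secret_table = {0: 3, 1: 7, 2: 8, 3: 15, 4: 19, 5: 22}

# Encrypted column as stored in the cloud: every age occurs at least once.
age_column = [4, 1, 0, 5, 2, 3, 1, 4, 4, 0]
encrypted_ages = [secret_table[a] for a in age_column]

recovered = sorting_attack(encrypted_ages, domain=range(6))
decoded_column = [recovered[c] for c in encrypted_ages]
```

No auxiliary data set is needed here, only the knowledge that the column holds ages from a known small range.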
Experimentation with Real Datasets
• What Else is Revealed by Order-Revealing Encryption?, Durak, DuBuisson and Cash, ACM CCS 2016.
• Consider use of order-preserving encryption for location data
• Conjecture: since the data is very sparse, it could be effective
More on Order-Preserving Encryption
Deterministic and Order-Preserving Encryption
Deterministic and format-preserving encryption is not secure
• It can be used for very specific cases only (e.g., unique values like credit-card numbers or key attributes in a database)
Order-preserving encryption is even worse
These should be called “encodings” rather than “encryption”
• If they are considered to be a method of obfuscation or hardening, then OK
The Second Setting
Service is modified to enable better security
• Assume a proxy at the gateway, as before
• Encryption and decryption keys are in the hands of the data owner
Can better security be achieved? Can probabilistic encryption be used?
Fully homomorphic encryption (FHE)
• Can compute anything over encrypted data
Theoretically, FHE is perfect
Practically:
• It is way too inefficient for most problems
Data-in-Use Encryption – Theory
Encrypt each term probabilistically
• Same term appears differently when encrypted multiple times
• Protection against snapshot attacks
In order to search:
• Client provides a “token” that enables the server to determine if a ciphertext is an encryption of the value being searched
Can be efficiently constructed
Unlike FHE, this reveals the access pattern but only the access pattern
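One simple way to get this token behaviour is sketched below. It is a toy construction assumed for illustration, not the scheme used in any product mentioned in the talk: each stored term is a fresh random nonce plus a tag derived from a per-keyword secret, and the search token is that per-keyword secret. A real system would also store a regular probabilistic encryption of the term so the client can decrypt it.

```python
import hashlib
import hmac
import os


def _prf(key: bytes, data: bytes) -> bytes:
    """Keyed pseudorandom function, instantiated here with HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()


def search_token(key: bytes, word: str) -> bytes:
    """Client side: per-keyword secret handed to the server only when searching."""
    return _prf(key, word.encode("utf-8"))


def encrypt_term(key: bytes, word: str) -> tuple:
    """Client side: fresh nonce each time, so equal words look unrelated at rest."""
    nonce = os.urandom(16)
    tag = _prf(search_token(key, word), nonce)
    return nonce, tag


def server_matches(token: bytes, encrypted_term: tuple) -> bool:
    """Server side: test a stored term against a token, learning only match/no match."""
    nonce, tag = encrypted_term
    return hmac.compare_digest(_prf(token, nonce), tag)


KEY = b"hypothetical-client-key"
stored = [encrypt_term(KEY, w) for w in ["diabetes", "asthma", "diabetes"]]
matches = [server_matches(search_token(KEY, "diabetes"), t) for t in stored]
```

Before any token is issued the two encryptions of `diabetes` are indistinguishable from the encryption of `asthma`; once the token is used, the server sees which entries match, which is the access pattern.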
Probabilistic Searchable Encryption
Security properties
• Static data reveals nothing – secure against snapshot attacks
• Search patterns are revealed (the same search twice uses the same token and returns the same set of documents/transactions)
• If two documents are returned then they both contain the same keyword
• If two transactions are returned then they both have the same value
Probabilistic Searchable Encryption
Consider encryption of email and an attacker with access to the encrypted disk (but not the key)
Research paper: All Your Queries Are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption, Zhang, Katz and Papamanthou, Usenix Security 2016.
A warm-up:
• Attacker generates a document containing half the keywords and injects it
• Attacker views whether the injected document is retrieved with others after a query
• If yes, it knows the queried keyword is one of the chosen keywords (and not one of the others)
• If no, vice versa
Known-Document Attacks
• Let $K$ be the number of keywords
• Generate $\lceil \log_2 K \rceil$ documents
• In the $i$th document, insert all keywords whose index has its $i$th bit equal to 1
• Inject all documents (20 documents suffice for 1,000,000 keywords!)
• For the email setting, send one email a day for three weeks
• Upon any query, attacker observes exactly which of the injected documents are returned
• These define a unique keyword (known to the attacker)
• The attacker learns the exact keyword and exactly which documents it belongs to
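The steps above can be sketched directly. The keyword list and helper names are hypothetical, and the server's response to a query is simulated by checking which injected documents contain the queried keyword.

```python
def num_documents(keyword_count: int) -> int:
    """Number of injected documents needed: ceil(log2(K)), at least 1."""
    return max(1, (keyword_count - 1).bit_length())


def build_injected_documents(keywords: list) -> list:
    """Document i contains every keyword whose index has bit i set."""
    return [
        {kw for idx, kw in enumerate(keywords) if (idx >> i) & 1}
        for i in range(num_documents(len(keywords)))
    ]


def recover_keyword(keywords: list, returned_flags: list) -> str:
    """Read the returned/not-returned pattern as the bits of the keyword index."""
    idx = sum(1 << i for i, hit in enumerate(returned_flags) if hit)
    return keywords[idx]


keywords = ["budget", "lawsuit", "layoffs", "merger", "patent"]
injected = build_injected_documents(keywords)


def simulate_query(secret_keyword: str) -> list:
    """What the attacker observes: which injected documents the query returns."""
    return [secret_keyword in doc for doc in injected]


recovered = [recover_keyword(keywords, simulate_query(kw)) for kw in keywords]
```

One caveat of this simplified version: the keyword with index 0 appears in no injected document, so it cannot be told apart from a keyword outside the attacker's list.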
General Binary Attack
Limit the number of keywords in any document
• The paper has more advanced attacks that overcome this limitation
Semantic filtering of nonsensical documents
• Can automatically generate text that makes sense from a series of keywords
Conclusion
• In settings where file injection is possible, probabilistic searchable encryption is very problematic
• In closed settings (e.g., database warehouse with authorized upload), can be OK
Attempted Countermeasures
Assume a semi-honest attacker with knowledge of some plaintext documents
• Attacker must reside on server over time and view access pattern of returned documents
Cash et al. tested this by taking 20 random emails from the Enron email database, and using them to learn another random email*
Known-Plaintext Attack
* Cash, Grubbs, Perry and Ristenpart, Leakage-Abuse Attacks Against Searchable Encryption, ACM CCS 2015
The downside:
• The security of probabilistic searchable encryption can be compromised under
• Chosen-document attacks (devastating)
• Known-document attacks (also very effective)
• Leakage from access patterns in real scenarios is unclear
The upside:
• Probabilistic searchable encryption is secure against snapshot attacks
• Attacks require sophistication and power (attacker needs to run code on the server)
• Very effective against “curious administrator attacks” and “curious external attacker”
• Not very effective when server itself is not trusted
Probabilistic Searchable Encryption – Security
Secure multiparty computation
• Different parties with private inputs compute without revealing anything but the output
• Studied in academia since the mid-1980s
• Became practical very recently
Examples:
• Compare DNA without revealing it
• Decrypt/sign using shares of secret key and without any server knowing the key
• Verify biometrics without revealing the template
Secure Multiparty Computation
Combination of probabilistic searchable encryption and MPC
• Filter by unencrypted values (e.g., date of transaction)
• Filter by WHERE EQUALS clauses using probabilistic searchable encryption
• Run a mix-net to break access patterns (and obtain a fresh sharing of all remaining data)
• Filter WHERE GREATER THAN clauses (and the like) using MPC
• Aggregate and compare using additively homomorphic secret shares
• JOIN and GROUP BY using a special 3-party protocol
Dyadic’s experiments show that this can achieve performance within an order of magnitude of standard PostgreSQL (with three mid-level servers)
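The aggregation step relies on a basic property of additive secret sharing: each server can add its own shares locally, and only the recombined result reveals the sum. The sketch below shows that property for three servers. The modulus, function names and sample amounts are assumptions for illustration; it is not the protocol from the talk, which also covers comparison, JOIN and GROUP BY.

```python
import secrets

MODULUS = 2**64  # assumed share space for this sketch


def share(value: int, n_parties: int = 3) -> list:
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list) -> int:
    """Combine one share from every party to recover the value."""
    return sum(shares) % MODULUS


def local_sum(column_of_shares: list) -> int:
    """What a single server does: add up the shares it holds, nothing else."""
    return sum(column_of_shares) % MODULUS


amounts = [120, 75, 305]
shared_rows = [share(a) for a in amounts]       # each row: one share per server

# Server j holds the j-th share of every row and sums them on its own.
per_server_totals = [local_sum([row[j] for row in shared_rows]) for j in range(3)]
total = reconstruct(per_server_totals)
```

Any two servers together still see only uniformly random values, so the SUM is computed without any single server learning an individual amount.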
Secure SQL
• Access patterns of probabilistic searchable encryption are leaked
• A mix-net is used to prevent direct access patterns from being revealed within a query and over different queries (after searchable encryption)
• Note that some patterns can be determined by the number of transactions that pass a query
• This is the first solution that provides rich SQL with full security against snapshots and leakage equivalent to probabilistic searchable encryption
Security of the Solution
Encryption in the cloud is very difficult
• Even for just data-at-rest, can you trust the cloud to manage your keys?
• For data-in-use, standard strong encryption cannot be used
Deterministic and format-preserving encryption is not secure
• Should not be used except for very limited settings
Order-preserving encryption is extremely weak and should not be used at all
Probabilistic searchable encryption is much better
• Protection against snapshot attacks
• Can be attacked using chosen and known-document attacks
• More effective when the threat is an outside attacker and not the server itself
A combination of probabilistic searchable encryption and MPC can go even further
Summary