
| PROTECT THE KEYS TO EVERYTHING © 2015 Dyadic. All rights reserved. Confidential and proprietary.

Securing Data-in-Use in the Cloud: Myths and Facts

Yehuda Lindell, Dyadic Security & Bar-Ilan University


Moving to the cloud has many operational advantages

Security in the cloud is a double-edged sword:

• Security may be much better in many cases

• But there is a loss of control

Loss of control:

• Psychological effect (car vs. plane accidents)

• Regulatory effect

The real threat – being caught up in a mega breach

• The stakes are rising all the time


The Problem


In 2012, attackers breached LinkedIn

• Password hashes were not salted (and not iterated)

A Dropbox employee reused their password in LinkedIn

Attackers used the password to access the Dropbox user/password database

The file of passwords was dumped online

• 68 million credentials exposed

• Brute-force attacks are effective (albeit not so easy, since bcrypt was used, but the hashes are certainly vulnerable, contrary to Dropbox's claim)

All Dropbox users were caught up in the breach


The Dropbox Breach (via LinkedIn)
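The difference between unsalted, non-iterated hashes and salted, iterated ones can be sketched as follows. This is an illustration only: the choice of SHA-1 for the weak case and PBKDF2-HMAC-SHA256 with 100,000 iterations for the stronger case are assumptions made for the example, not details taken from the slides (Dropbox used bcrypt, which is not in the Python standard library).

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def unsalted_hash(password):
    # Weak: a single fast hash with no salt. Equal passwords give equal
    # hashes, so one precomputed table attacks every user at once.
    return hashlib.sha1(password.encode()).hexdigest()


def salted_iterated_hash(password, salt=None):
    # Stronger: a fresh random salt per user plus many iterations, so each
    # guess is slow and must be repeated separately for every user.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password, salt, digest):
    # Recompute with the stored salt and compare in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


salt_a, digest_a = salted_iterated_hash("hunter2")
salt_b, digest_b = salted_iterated_hash("hunter2")
```

Even with salting and iteration, a reused or weak password can still be guessed; the stronger scheme only raises the cost per guess.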


A seemingly simple solution: Encrypt!

Problem: Where do you put the key and how do you protect it?

• Option 1: You keep it, but the cloud then becomes an encrypted disk and no more

• Fine for Dropbox-type applications

• Option 2: On the VM – worthless

• Option 3: Managed by the cloud


Protecting Data-at-Rest in the Cloud


We are all familiar with the “data-in-motion” and “data-at-rest” paradigm

The power of the cloud is in computation

• If encrypted data-at-rest is decrypted only by the client, then the cloud becomes a disk

• If encrypted data-at-rest is decrypted in the cloud, then it becomes vulnerable

A new paradigm: encrypting data-in-use

• Data is encrypted by the client

• Data is uploaded to cloud

• Cloud computes on the data while encrypted and returns encrypted answer to client

• Client decrypts


Protecting Data
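The encrypt, upload, compute, decrypt flow can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption, which is an assumption chosen for the example (the slides do not name a scheme). The primes are two small Mersenne primes, far too small for real use, and all function names are hypothetical.

```python
import math
import secrets

# Toy key material held by the client. Illustration only, NOT secure sizes.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM modulo N


def encrypt(m):
    # Client side: probabilistic encryption of an integer 0 <= m < N.
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return ((1 + m * N) % N2) * pow(r, N, N2) % N2


def decrypt(c):
    # Client side: only the key holder can do this.
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N


def cloud_add(c1, c2):
    # Cloud side: multiplying ciphertexts adds the hidden plaintexts.
    # The cloud never sees 1200, 345, or their sum.
    return c1 * c2 % N2


encrypted_sum = cloud_add(encrypt(1200), encrypt(345))
result = decrypt(encrypted_sum)
```

This supports only addition. Computing arbitrary functions over ciphertexts is what fully homomorphic encryption aims at.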


Two Very Different Settings

An existing cloud/service provider

• Encrypt data before it gets to the cloud – use proxy

• The cloud/service doesn’t know that it’s encrypted

Advantage: use existing infrastructure

Disadvantage: limits what can be done


A modified cloud/service

• Encrypt data before it gets to the cloud – use proxy

• The cloud/service knows about the encryption and is modified appropriately

Advantage: more flexibility; can do more

Disadvantage: doesn’t work with existing solutions; much harder to adopt


Deterministic Encryption

Encrypt each “word” by applying deterministic encryption to the term

• Technically – apply a pseudorandom permutation

Search by encrypting the search term and simply comparing (standard search)

Notes:

• Partial-word search functionality is broken

• The output of deterministic encryption doesn’t look like a credit card number or English word

• Some services check formats!

• Can use format-preserving encryption to preserve the format
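A minimal sketch of word-level deterministic encryption and search, under stated assumptions: HMAC-SHA256 stands in for the pseudorandom permutation (it is a pseudorandom function, so tokens cannot be decrypted; a real proxy would use an invertible cipher), and the key and function names are hypothetical.

```python
import hashlib
import hmac

PROXY_KEY = b"hypothetical key held by the proxy"


def det_token(word, key=PROXY_KEY):
    # Same word and key always give the same token (deterministic).
    mac = hmac.new(key, word.lower().encode(), hashlib.sha256)
    return mac.hexdigest()[:16]


def encrypt_document(text):
    # The proxy replaces each word by its token before upload.
    return [det_token(w) for w in text.split()]


def search(encrypted_docs, term):
    # The service runs an ordinary equality search on tokens.
    target = det_token(term)
    return [i for i, doc in enumerate(encrypted_docs) if target in doc]


docs = ["wire transfer approved", "lunch on friday", "Transfer delayed"]
encrypted_docs = [encrypt_document(d) for d in docs]
full_word_hits = search(encrypted_docs, "transfer")
partial_word_hits = search(encrypted_docs, "trans")
```

The whole-word search finds documents 0 and 2, while the partial-word search finds nothing, which matches the note that partial-word search is broken.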



Assume for now that the deterministic encryption scheme is perfect

• Nothing can be revealed (except that the same value repeats)

Is this good enough?

• It is the best possible in this model, so it must be!

• But the best possible depends on the model

• We don’t have to use a solution based on not changing the server infrastructure

• We don’t have to recommend going to cloud

Not all security is good – the existence of weak solutions can provide a false sense of security, which is dangerous


The Best Possible


Security Analysis

The same word is encrypted to the same ciphertext word every time

• It is possible to gather statistics and learn a lot about the original text

• Natural language has a very specific distribution

• It can be fine for encrypting unique values (credit card numbers, ID numbers, or a column used as a key attribute, which is therefore unique)



Deterministic Encryption – Auxiliary Information

Plaintext table:

| Name | Illness |
|---|---|
| Yehuda Lindell | Diabetes |
| John Smith | High blood pressure |
| Jane Doe | Asthma |
| Mike Jones | Diabetes |

Encrypted table:

| Name | Illness |
|---|---|
| Yehuda Lindell | 0010110001101101 |
| John Smith | 1111011010101000 |
| Jane Doe | 0011110110011001 |
| Mike Jones | 0010110001101101 |



Auxiliary information about Yehuda Lindell reveals Mike Jones’ condition
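The inference can be written out in a few lines. This sketch uses the ciphertexts from the encrypted table above; the function name `infer` is hypothetical.

```python
# Encrypted column as seen by an attacker without the key.
encrypted_rows = {
    "Yehuda Lindell": "0010110001101101",
    "John Smith": "1111011010101000",
    "Jane Doe": "0011110110011001",
    "Mike Jones": "0010110001101101",
}

# Auxiliary information: the attacker knows one patient's condition.
known = {"Yehuda Lindell": "Diabetes"}


def infer(encrypted_rows, known):
    # Link each ciphertext with a known plaintext, then look for
    # other rows carrying the same ciphertext.
    cipher_to_plain = {encrypted_rows[name]: illness
                       for name, illness in known.items()}
    return {name: cipher_to_plain[c]
            for name, c in encrypted_rows.items()
            if c in cipher_to_plain and name not in known}


inferred = infer(encrypted_rows, known)
```

Here `inferred` contains Mike Jones with Diabetes, although the key was never touched.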


A Real Life Example



Research paper: Inference Attacks on Property-Preserving Encrypted Databases, by Naveed, Kamara and Wright (ACM CCS 2015)

Analyzed the National Inpatient Sample (NIS) database of the Healthcare Cost and Utilization Project (HCUP)

• Available under strict controls only

The target: HCUP 2009 database (encrypted using deterministic encryption)

Auxiliary data: HCUP 2004 database

Simple attack:

• Compare histograms and guess based on what minimizes the distance


Statistical (Auxiliary Information) Attack on Real Data

https://cs.brown.edu/~seny/pubs/edb.pdf
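The simplest form of a histogram attack is rank matching: pair the most frequent ciphertext with the most frequent value in the auxiliary data, the second with the second, and so on. The sketch below is that simplified variant on made-up data, not the exact procedure or datasets of the paper; the illness counts and ciphertext labels are invented for illustration.

```python
from collections import Counter


def frequency_attack(ciphertexts, auxiliary_plaintexts):
    # Rank both sides by frequency and pair them off in order.
    c_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    p_ranked = [p for p, _ in Counter(auxiliary_plaintexts).most_common()]
    return dict(zip(c_ranked, p_ranked))


# Auxiliary data (an older, public-ish dataset), invented counts.
auxiliary = ["flu"] * 50 + ["diabetes"] * 30 + ["asthma"] * 15 + ["gout"] * 5

# Target data: similar distribution, deterministically encrypted.
# The mapping below is the secret the attacker does NOT have.
secret_mapping = {"flu": "c7", "diabetes": "a1", "asthma": "f3", "gout": "09"}
target_plain = ["flu"] * 45 + ["diabetes"] * 33 + ["asthma"] * 16 + ["gout"] * 6
target_encrypted = [secret_mapping[p] for p in target_plain]

guess = frequency_attack(target_encrypted, auxiliary)
```

Because the two distributions have the same ranking, the guess recovers the full mapping from ciphertext to illness.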


Results


Consider the use of deterministic encryption in Office365 email encryption

Assume an attacker with access to the encrypted database (but not to the key)

The attack:

• The attacker sends an email with keywords of interest to an employee

• The email appears encrypted in the database

• The attacker obtains the encrypted versions of the keywords of interest

• The attacker scans the rest of the database and finds the emails with the keywords

This is a very easy attack to carry out


Chosen-Document Attack
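The steps of the attack can be simulated in a few lines. This is a sketch under assumptions: HMAC-SHA256 stands in for the deterministic word encryption, the mailbox contents are invented, and the attacker is assumed to recognize their own email in the database (here, simply the last row).

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical key, never seen by the attacker"


def det_token(word):
    mac = hmac.new(SECRET_KEY, word.lower().encode(), hashlib.sha256)
    return mac.hexdigest()[:16]


def store_email(db, text):
    # The gateway encrypts word by word before the email reaches the database.
    db.append([det_token(w) for w in text.split()])


db = []
store_email(db, "merger talks start monday")
store_email(db, "lunch menu for friday")
store_email(db, "board approves merger budget")

# Step 1: the attacker emails an employee the keywords of interest.
planted = ["merger", "layoffs"]
store_email(db, " ".join(planted))

# Steps 2-3: the attacker reads the encrypted row of their own email
# and pairs each token with the keyword they put in that position.
leaked = dict(zip(db[-1], planted))

# Step 4: scan the rest of the database for those tokens.
hits = {keyword: [i for i, row in enumerate(db[:-1]) if token in row]
        for token, keyword in leaked.items()}
```

The attacker only ever reads `db`; the key is used solely by the simulated gateway.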


Order Preserving Encryption

Order-preserving encryption:

• If x < y, then E_K(x) < E_K(y)

An oxymoron

• Encryption mixes the plaintext, so order-preserving encryption seems impossible

How does it work?

• The “small” domain is randomly thrown into a large range

Can be constructed efficiently


[Figure: a small domain (1 to 6) mapped, in order, into a larger range (1 to 23)]
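The idea of throwing a small domain into a large range while keeping order can be sketched with a lookup table: pick as many distinct random points of the range as there are domain values, sort them, and use the i-th point as the encryption of the i-th value. This table-based version is a toy under stated assumptions (a seeded `random.Random` plays the role of the key, and the names are hypothetical); practical schemes avoid storing the whole table.

```python
import random


def ope_keygen(domain_size, range_size, seed):
    # The "key" is a sorted random subset of the range, one point
    # per domain value. `seed` is a stand-in for a secret key.
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, range_size + 1), domain_size))


def ope_encrypt(table, x):
    # Plaintexts are 1..domain_size.
    return table[x - 1]


def ope_decrypt(table, c):
    return table.index(c) + 1


# Domain 1..6 mapped into range 1..23, as in the figure.
table = ope_keygen(domain_size=6, range_size=23, seed=2015)
ciphertexts = [ope_encrypt(table, x) for x in range(1, 7)]
```

Because the table is sorted, comparing two ciphertexts gives the same answer as comparing the plaintexts, which is exactly the leakage discussed in the following slides.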


Order-Preserving Encryption

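Plaintext table:

| Name | Blood sugar | Illness |
|---|---|---|
| Yehuda Lindell | 125 | Diabetes |
| John Smith | 89 | High blood pressure |
| Jane Doe | 98 | Asthma |
| Mike Jones | 141 | Diabetes |

Encrypted table:

| Name | Blood sugar | Illness |
|---|---|---|
| Yehuda Lindell | 01101101 | 0010110001101101 |
| John Smith | 00011010 | 1111011010101000 |
| Jane Doe | 01001100 | 0011110110011001 |
| Mike Jones | 10100110 | 0010110001101101 |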


Order Preserving Dangers



| Name | Blood sugar | Illness |
|---|---|---|
| Yehuda Lindell | 01101101 | 0010110001101101 |
| John Smith | 00011010 | 1111011010101000 |
| Jane Doe | 01001100 | 0011110110011001 |
| Mike Jones | 10100110 | 0010110001101101 |

An attacker can easily learn that Mike Jones has the highest blood sugar (no auxiliary information needed)


Inference Attacks on Property-Preserving Encrypted Databases, Naveed, Kamara and Wright, ACM CCS 2015

Considered attacks on format and order preserving encryption on medical databases

Compare sorted values – works better as data density increases

• All values appear (e.g., age) – perfect recovery

• If fewer values appear, then it may be hard to recover exact values

Results show that even in low-density databases, exact recovery is not hard

• Approximate recovery is always easy


Experimentation with Real Datasets
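The dense case (every value of the domain appears) can be sketched as a sorting attack: sort the distinct ciphertexts, sort the domain, and pair them off. The data below is synthetic and the secret order-preserving mapping is a toy table; neither comes from the paper.

```python
import random


def sorting_attack(ciphertexts, domain):
    # Works when every domain value occurs at least once (dense column).
    distinct = sorted(set(ciphertexts))
    if len(distinct) != len(domain):
        return None  # not dense: sorting alone does not give exact values
    return dict(zip(distinct, sorted(domain)))


rng = random.Random(7)
domain = list(range(18, 26))  # e.g. ages 18..25

# Secret order-preserving mapping (toy), unknown to the attacker.
codes = sorted(rng.sample(range(1, 1000), len(domain)))
secret = dict(zip(domain, codes))

# A column in which every age appears at least once.
ages = [rng.choice(domain) for _ in range(200)] + domain
column = [secret[a] for a in ages]

recovered = sorting_attack(column, domain)
sparse_result = sorting_attack(column[:1], domain)
```

With a dense column the attacker recovers every plaintext exactly, using nothing but the ciphertexts and knowledge of the domain.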


• What Else is Revealed by Order-Revealing Encryption, Durak, DuBuisson and Cash, ACM CCS 2016.

• Consider use of order-preserving encryption for location data

• Conjecture: since location data is very sparse, order-preserving encryption could be effective here


More on Order-Preserving Encryption


SpitzLoc Database – Utilizing Column Correlations


2000 Random Points out of 20,000 California Intersections


Attacks Using Most-Significant Bit Leakage


Deterministic and Order-Preserving Encryption

Deterministic and format-preserving encryption is not secure

• It can be used for very specific cases only (e.g., unique values like credit-card numbers or key attributes in a database)

Order-preserving encryption is even worse

These should be called “encodings” rather than “encryption”

• If they are considered to be a method of obfuscation or hardening, then OK



The Second Setting

Service is modified to enable better security

• Assume a proxy at the gateway, as before

• Encryption and decryption key in the hands of the owner

Can we do better? Can probabilistic encryption be used?



Fully homomorphic encryption (FHE)

• Can compute anything over encrypted data

Theoretically, FHE is perfect

Practically:

• It is way too inefficient for most problems


Data-in-Use Encryption – Theory

[Figure labels: Encrypted data-in-use; Encrypted data-at-rest]


Encrypt each term probabilistically

• Same term appears differently when encrypted multiple times

• Protection against snapshot attacks

In order to search:

• Client provides a “token” that enables the server to determine if a ciphertext is an encryption of the value being searched

Can be efficiently constructed

Unlike FHE, this reveals the access pattern but only the access pattern


Probabilistic Searchable Encryption
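One way to picture the token mechanism is the following sketch, which is an assumption-laden toy rather than the construction used in any particular product: each stored term is a fresh random nonce plus a tag computed under a per-word token, and the token itself is derived from the owner's key. It covers searchability only; the document content would be encrypted separately with a standard probabilistic cipher.

```python
import hashlib
import hmac
import os


def prf(key, data):
    return hmac.new(key, data, hashlib.sha256).digest()


def search_token(master_key, word):
    # Client side: the token for a word, handed to the server at search time.
    return prf(master_key, word.lower().encode())


def encrypt_term(master_key, word):
    # Client side: a fresh nonce makes every encryption of a word different.
    nonce = os.urandom(16)
    return nonce, prf(search_token(master_key, word), nonce)


def server_match(token, ciphertext):
    # Server side: with the token, test whether a ciphertext hides that word.
    nonce, tag = ciphertext
    return hmac.compare_digest(prf(token, nonce), tag)


master_key = os.urandom(32)
c1 = encrypt_term(master_key, "diabetes")
c2 = encrypt_term(master_key, "diabetes")
diabetes_token = search_token(master_key, "diabetes")
asthma_token = search_token(master_key, "asthma")
```

Before any search, `c1` and `c2` look unrelated (snapshot security). Once the server holds `diabetes_token`, it learns which stored terms match, which is the access-pattern leakage mentioned above.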


Security properties

• Static data reveals nothing – secure against snapshot attacks

• Search patterns are revealed (the same search twice uses the same token and returns the same set of documents/transactions)

• If two documents are returned then they both contain the same keyword

• If two transactions are returned then they both have the same value


Probabilistic Searchable Encryption


Consider encryption of email and an attacker with access to the encrypted disk (but not the key)

Research paper: All Your Queries Are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption, Zhang, Katz and Papamanthou, Usenix Security 2016.

A warm-up:

• Attacker generates a document containing half the keywords and injects it

• Attacker views whether the injected document is retrieved with others after a query

• If yes, the attacker knows that the queried keyword is one of the chosen keywords (and not one of the others), and that the other retrieved documents contain it

• If no, vice versa


Known-Document Attacks

https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_zhang.pdf


• Let 𝐾 be the number of keywords

• Generate log₂ 𝐾 documents

• In the 𝑖th document, insert all keywords whose index has its 𝑖th bit equal to 1

• Inject all documents (20 documents suffice for 𝐾 = 1,000,000 keywords!)

• For the email setting, send one email a day for three weeks

• Upon any query, attacker observes exactly which of the injected documents are returned

• These define a unique keyword (known to the attacker)

• The attacker learns the exact keyword and exactly which documents it belongs to


General Binary Attack
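The binary attack above can be simulated directly. In this sketch the keyword list is invented, and the function `observe` stands in for what the attacker sees on the server: which of the injected documents are returned for a query. The attacker never sees the query keyword itself.

```python
def docs_needed(num_keywords):
    # Bits required to write the indices 0 .. num_keywords - 1.
    return max(1, (num_keywords - 1).bit_length())


def build_injected_docs(keywords):
    # Document i contains every keyword whose index has bit i set.
    return [
        {kw for idx, kw in enumerate(keywords) if (idx >> i) & 1}
        for i in range(docs_needed(len(keywords)))
    ]


def recover_keyword(keywords, returned):
    # The pattern of returned injected documents spells out the index.
    idx = sum(1 << i for i, hit in enumerate(returned) if hit)
    return keywords[idx]


keywords = ["audit", "bonus", "budget", "layoffs",
            "lawsuit", "merger", "patent", "salary"]
injected = build_injected_docs(keywords)


def observe(query):
    # Stand-in for the server's behaviour: injected document i is
    # returned exactly when it contains the queried keyword.
    return [query in doc for doc in injected]


recovered = [recover_keyword(keywords, observe(k)) for k in keywords]
```

Eight keywords need three injected documents; one million keywords need twenty.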


Limit the number of keywords in any document

• The paper has more advanced attacks that overcome this limitation

Semantic filtering of nonsensical documents

• Can automatically generate text that makes sense from a series of keywords

Conclusion

• In settings where file injection is possible, probabilistic searchable encryption is very problematic

• In closed settings (e.g., database warehouse with authorized upload), can be OK


Attempted Countermeasures


Assume a semi-honest attacker with knowledge of some plaintext documents

• Attacker must reside on server over time and view access pattern of returned documents

Cash et al. tested this by taking 20 random emails from the Enron email database, and using them to learn another random email*


Known-Plaintext Attack

* Cash, Grubbs, Perry and Ristenpart, Leakage-Abuse Attacks Against Searchable Encryption, ACM CCS 2015


The downside:

• The security of probabilistic searchable encryption can be compromised under:

• Chosen-document attacks (devastating)

• Known-document attacks (also very effective)

• Leakage from access patterns in real scenarios is unclear

The upside:

• Probabilistic searchable encryption is secure against snapshot attacks

• Attacks require sophistication and power (attacker needs to run code on the server)

• Very effective against “curious administrator attacks” and “curious external attacker”

• Not very effective when server itself is not trusted


Probabilistic Searchable Encryption – Security


Secure multiparty computation

• Different parties with private inputs compute a joint function without revealing anything but the output

• Studied in academia since the mid 1980s

• Became practical very recently

Examples:

• Compare DNA without revealing it

• Decrypt/sign using shares of secret key and without any server knowing the key

• Verify biometrics without revealing the template


Secure Multiparty Computation
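A small taste of how parties can compute without seeing each other's inputs is additive secret sharing. The sketch below splits each value into three random-looking shares that sum to the value modulo a prime; each server adds the shares it holds, and only the recombined result reveals the sum. The modulus, the three-party setting, and the function names are assumptions for illustration, not a description of any specific protocol from the talk.

```python
import secrets

MOD = 2**61 - 1  # assumed modulus; any modulus larger than the values works


def share(value, n_parties=3):
    # All but the last share are uniformly random; the last one is
    # chosen so that the shares sum to the value modulo MOD.
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def add_shared(a_shares, b_shares):
    # Each party adds the two shares it holds, locally and without
    # communicating. No party learns either input.
    return [(a + b) % MOD for a, b in zip(a_shares, b_shares)]


def reconstruct(shares):
    return sum(shares) % MOD


salary_a = share(1200)
salary_b = share(345)
total = reconstruct(add_shared(salary_a, salary_b))
```

Addition is the easy case. Comparisons, joins, and similar operations need interactive protocols between the parties.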


Combination of probabilistic searchable encryption and MPC

• Filter by unencrypted values (e.g., date of transaction)

• Filter by WHERE EQUALS clauses using probabilistic searchable encryption

• Run a mix-net to break access patterns (and obtain a fresh sharing of all remaining data)

• Filter WHERE GREATER THAN clauses (and the like) using MPC

• Aggregate and compare using additively homomorphic secret shares

• JOIN and GROUP BY using a special 3-party protocol

Dyadic’s experiments show that this can achieve performance within an order of magnitude of standard PostgreSQL (with three mid-level servers)


Secure SQL


• Access patterns of probabilistic searchable encryption are leaked

• A mix-net is used to prevent direct access patterns from being revealed within a query and over different queries (after searchable encryption)

• Note that some patterns can be determined by the number of transactions that pass a query

• This is the first solution that provides rich SQL with full security against snapshots and leakage equivalent to probabilistic searchable encryption


Security of the Solution


Encryption in the cloud is very difficult

• Even for just data-at-rest, can you trust the cloud to manage your keys?

• For data-in-use, standard strong encryption cannot be used

Deterministic and format-preserving encryption is not secure

• Should not be used except for very limited settings

Order-preserving encryption is extremely weak and should not be used at all

Probabilistic searchable encryption is much better

• Protection against snapshot attacks

• Can be attacked using chosen and known-document attacks

• More effective when the threat is an outside attacker and not the server itself

A combination of probabilistic searchable encryption and MPC can go even further


Summary


THANK YOU
