[ieee 22nd international conference on data engineering (icde'06) - atlanta, ga, usa...

Provable Security for Outsourcing Database Operations

Sergei Evdokimov, Matthias Fischmann, Oliver GuntherHumboldt-Universitat zu Berlin

{evdokim,fis,guenther}@wiwi.hu-berlin.de

Database outsourcing, whilst becoming more popu-lar in recent years, is creating substantial security andprivacy risks. In this paper, we assess cryptographicsolutions to the problem that some client party (Alex)wants to outsource database operations on sensitivedata sets to a service provider (Eve) without having totrust her. Contracts are an option, but for various rea-sons their effectiveness is limited [2]. Alex would ratherlike to use privacy homomorphisms [6], i.e., encryptionschemes that transform relational data sets and queriesinto ciphertext such that (i) the data is securely hid-den from Eve; and (ii) Eve computes hidden resultsfrom hidden queries that Alex can efficiently decrypt.Unfortunately, all privacy homomorphisms we know oflack a rigorous security analysis. Before they can beused in practice, we need formal definitions that areboth sound and practical to assess their effectiveness.

The strongest notion of confidentiality requires ev-ery single bit of information on queries as well as dataitself to be kept secret from Eve, no matter how manyqueries she can observe, or how rudely she breaks thecontract. This notion would allow for doing businesswith arbitrarily malicious service providers – but is im-possible to achieve. We present and analyse a solutionfor the case that (i) Alex trusts Eve to behave accord-ing to protocol and not secretly analyse the data shecan gather, but (ii) is worried about Eve’s data beingsold to an adversary of Alex.

Related Work: In 2002, Hacıgumus et al. pro-pose a database encryption scheme for full SQL [4].The idea is that every tuple is encrypted with a securecipher first, then weakly encrypted attributes are at-tached to the ciphertext. These weak encryptions areobtained by taking a plaintext attribute value, map-ping it to a containing interval, and encrypting thatinterval using a secret permutation. Two plaintextsmay therefore map to the same ciphertext, even if theyare not equal. Some of the information contained inthe plaintext is destroyed but not as much as in anordinary encryption scheme. While the remaining in-formation (like the number of tuples of the table, or

which tuples have similar values in which secret at-tributes) is enough to query the encrypted databasewith little post-processing, one hopes that Eve stilldoes not obtain too much information on the data ifthe partitionings into intervals and the other securityparameters have been chosen properly. However, thesecurity is based on intuition alone; a rigorous analysisis missing.

1 Relevant Notions of Security

An encryption scheme is a tuple (K, E,D), whereE : K × X �→ Y, D : K × Y �→ X are encryp-tion and decryption functions that convert a key andplaintext x ∈ X into the corresponding ciphertexty ∈ Y . It must hold that Dk(Ek(x)) = x, or for short:D(E(x)) = x. Keys are chosen uniformly at randomfrom the key space K. The bit length n = log(|K|) ofall keys is called the security parameter of the scheme.We now introduce a class of encryption schemes calleddatabase privacy homomorphisms that has additionalproperties:

Definition 1.1 (Database PH) Let (K, E,D),(K, Eq, Dq) be encryption schemes, R be a set ofrelations, C be a set of ciphertexts. A databaseprivacy homomorphism (or database PH, for short) isa tuple (K, E,Eq, D) such that

1. E : K × R → C encrypts tables,D : K × C → R decrypts tables, andEq : K × {σi} → {ψi} encrypts queries.

2. For any relation R and any relational operationσi, Ek(σi(R)) = ψi(Ek(R)).

Intuitively, that means that the plaintext operationscan be carried out on the ciphertext, producing cipher-texts containing the result. Note that our definitioncontains an extra query encryption function Eq. Thisturns out to be useful for reasoning about data confi-dentiality: In some attacks, if Eve knows a few plain-

1

Proceedings of the 22nd International Conference on Data Engineering (ICDE’06) 8-7695-2570-9/06 $20.00 © 2006 IEEE

text queries and some matching encrypted data, shecan infer significant information on the plaintext data.

Also, we only consider schemes that perform tuple-by-tuple encryption, i.e., for table R = {v1, . . . , vn},the ciphertext is the set C = {c1, . . . , cn}, where ci isthe encryption of tuple vi.

The most common notions of security are based ongames between an honest user (such as Alex) and anadversary (such as Eve) modeled by a probabilisticpolynomial-time algorithm. A scheme is secure if Evecannot win the game with probability 1

2 + 1/p(n) forevery polynomial p in polynomial time for sufficientlylarge n. If this is true, the winning probability is callednegligible. The following (or similar) definition can befound in any textbook on cryptography.:

Definition 1.2 An encryption scheme (K, E,D) is in-distinguishably secure if Eve cannot win the followinggame:1. Eve chooses two plaintexts m1,m2 of the same

length and presents them to Alex.2. Alex chooses i ∈ {1, 2} uniformly at random and

presents Ek(mi) to Eve.3. Eve must guess i.

E.g., Eve can win this game if by having plaintextencrypted she can learn enough about it to decrypt onebit of a chosen ciphertext. She merely needs to choosethe plaintexts such that this bit suffices to distinguishthe two.

This definition considers a passive adversary thatonly intercepts and processes the ciphertexts. Thereare exist more powerful adversary models capable ofproducing plaintexts and ciphertexts using encryptionand decryption oracles.1 But it is already easy tofind adversaries that win the game in this weak sensefor most schemes. Consider the scheme proposed byHacıgumus et al. in [4]. Let Eve produce two tables:

table 1:ID salary171 4900481 1200

table 2:ID salary171 4900481 4900

Next, Eve obtains a ciphertext, which is the encryp-tion of one of the tables, from Alex. According to thescheme, the salaries in the first table will be mapped todifferent intervals with high probability. The salaries inthe second table will be mapped to the same interval.Since the intervals are encrypted deterministically, the

1Eve is said to be an active adversary if she can produce sam-ples of her choice. This is modelled with a fictitious oracle. Inpractice, Eve usually obtains her answers from Alex by sendinghim confusing messages. If Alex is not a human but a machinerunning a complex application, this is a very real risk.

weak encryptions of the ”salary“ attribute of the firsttable will different, and the analogous weak encryptionfor the second table will be identical. Hence, Eve candetermine with high probability to which table corre-sponds the received ciphertext: If there are two dif-ferent weak encryptions of the “salary” attribute, Eveoutputs 1; otherwise, she outputs 2. Similar attackswork on the scheme of Damiani et al. [3].

2 Database Privacy Homomorphismsand Security Limitations

Although privacy homomorphisms are just a sub-set of the set of all encryption schemes, the ability totransform plaintext operations into corresponding ci-phertext operations yields new possibilities for an ad-versary to get insights into encrypted data. Consideran indistinguishably secure database PH. If an adver-sary runs the query σai=d on the encrypted table, shereceives a set of encrypted tuples. Though she cannotdecrypt them, she knows that the value of the attributeai of these tuples is d.

Why does a scheme that is secure against thestrongest adversary still leak so much information?The problem is that the classical model considers onlyparties operating with plain- and ciphertexts. Butin this scenario parties also exchange queries and theresults of these queries which can reveal informationabout plaintext. Kantarcıoglu and Clifton [5] proposea definition that addresses this problem in which theystate that a database PH is secure if any two tableswith the same number of tuples as well as any twoqueries returning the same number of tuples are indis-tinguishable. They suspect that it may already be toostrong to have a solution. Although we show that thisis not true, there is another problem: A scheme securein this sense does allow the adversary to get informa-tion about the plaintext with high probability.

Consider an example where Alex owns a databasewith statistics for three competing hospitals, keepingtrack of the state in which patients are leaving eachhospital. Each patient is described by the attributesid, name, hospital, and outcome (outcome is a binaryattribute either set to ‘fatal’ or ‘healthy’). Now supposethat Eve knows the database schema, the number ofhospitals, and has good estimates of the distribution ofpatient flows (0.2, 0.3, 0.5 resp.) and the ratio of fatalvs. successful outcomes (0.08, 0.92).

Alex issues the following queries:

SELECT * FROM table WHERE hospital = 1;SELECT * FROM table WHERE hospital = 2;SELECT * FROM table WHERE hospital = 3;

2


SELECT * FROM table WHERE outcome = ’fatal’;

From the size of the results and the fact that we onlyhave exact selects, Eve can guess the exact queries withhigh confidence (conditions on id and name yield farfewer hits). Then, by intersecting the answers to thefirst and the fourth query, Eve can infer the ratio oflethal to successful outcomes in hospital 1!

In another example we consider an active adver-sary with the ability to get encrypted queries of herchoice and show that no matter how secure the ta-ble is encrypted, such an adversary is able to de-duce a significant amount of information about theencrypted data. Suppose there was a patient “John”and Eve wants to find out in which hospital he wastreated and what happened to him. She issues the en-cryption of query σname:John using the query encryp-tion oracle. Then Eve issues encryptions of queriesσhospital:X , X ∈ {1, 2, 3}. By intersecting the resultsof the four queries issued, Eve can determine the hos-pital where John was treated. Analogously, she canfind his status.

The definition of a secure database PH addressingthese issues can be formulated as follows:

Definition 2.1 A database PH provides indistin-guishability if Eve cannot win the following game:

1. Eve chooses two tables T1(R), T2(R) containingthe same numbers of tuples and presents them toAlex.

2. Alex chooses i ∈ {1, 2} uniformly at random andpresents Ek(Ti(R)) to Eve.

3. Eve receives at most q encrypted queries issued toEk(Ti(R)) and computes the results (in case of ac-tive adversary Eve has access to the queries en-cryption oracle and issues q encryptions of plain-text queries of her choice).

4. Eve must guess i.

The above examples demonstrate that this level ofsecurity cannot be achieved by any database PH, nomatter how advanced. This can be established for-mally:

Theorem 2.1 Any database PH (K, E,Eq, D) is in-secure in the sense of Definition 2.1 if q > 0.

Although if the adversary is passive, the case is lessobvious, in both cases the security of the encrypteddata cannot be guaranteed.

Intuition indicates that Definition 2.1 is too strong.Several relaxations can be considered: (i) Allow passiveadversaries only, (ii) allow for some limited information

leakage, or (iii) grant less plaintext knowledge to Eve.Unfortunately, none of these ideas is very promising.

(i) Passive adversaries are indeed an option in gen-eral. However in our setting, Eve has unlimited writeaccess to all ciphertext and can produce any messagesequence necessary for her attack. To demonstrate theplausibility of active adversaries that have consider-ably fewer options than Eve, Bleichenbacher has im-plemented a practical attack against SSL in which therole of the oracle is played by a confused web server[1]. (ii) Often only one bit of information is of in-terest to Eve (e.g., in the acknowledgement resp. can-celling of a stock order sent to a broker). Even if thisis not the case, it is usually infeasible for the applica-tion designer to specify which bits are confidential andwhich are not. Hence, a secure scheme must not leaka single bit. (iii) Changing the amount of Eve’s plain-text knowledge is the most problematic approach. Ifan encryption scheme is used as a building block forbuilding complex applications, Eve must be assumedto have full access to the source code. In combina-tion with timing attacks and other traffic analysis tech-niques, this gives her extensive knowledge on the struc-ture and purpose of the encrypted data. Assuming aless well-informed Eve is a textbook case of security byobscurity and usually leads to disaster. The Risk Di-gest provides a rich collection of anecdotical evidence:http://catless.ncl.ac.uk/Risks. For a more de-tailed discussion, see the full version of this paper.

So what can be done? Assume that Alex trusts Evenot to attack him directly but still worries about herbecoming adversarial in the future (e.g., by a changeof company ownership). If Alex’s trust in Eve dete-riorates, he can cancel the contract in time and stopsending queries. Consequently, q = 0 and Theorem2.1 does not apply. In the next section we present asolution for this case.

3 A Privacy Homomorphism Preserv-ing Exact Selects

To conclude this poster, we give a general construc-tion of a database PH based on searchable encryptionschemes. One such scheme has been proposed by Songet al. [7] and includes a proof of security, but otherscan be used instead. In the full version of this paper,we describe a few straight-forward optimizations suchas attributes of variable length, and present a formalsecurity proof of our construction under the assump-tion that the underlying searchable encryption schemeis secure.

We write a|b for concatenation of strings aand b. ’#’ is the padding symbol. Consider

3


the relation Emp(name:string[9], dept:string[5],salary:int). We can construct a database privacyhomomorphism for the schema as follows.

First, create words that are strings of the samelength (the shorter the better) and that identify theattributes of the relation. Then bijectively mapthe tuples of the given relation to the documents,or sets of words.2 The number of words in eachdocument is equal to the number of the attributesin the relation. The globally fixed word length isthe length of the longest attribute value plus thelength of an attribute identifier (required for decryp-tion, see below). For example, the relation Empwill be mapped to the following set of documents:{string[9]|”N”, string[9]|”D”, string[9]|”S”}.

Now, map all tuples of the original relation Emp todocuments. For example:

〈name:"Montgomery", dept:"HR", sal:7500〉 �→{"MontgomeryN", "HR########D", "7500######S"}

These documents are encrypted using a searchableencryption scheme and stored on a remote server.

In order to perform exact selects on the encryptedrelation, queries of

σattribute name:value

will be mapped to the search operation

ϕtoString(attribute value)|”attribute id”

and processed as a search operation, returning a setof encrypted strings. The strings then are decryptedand mapped to the corresponding tuples, producingthe result of the issued query. E.g.:

σname:”Montgomery” �→ ϕ”MontgomeryN”

Note that some searchable encryption schemes, andin particular the scheme presented in [7], sometimesreturn false positives. Alex needs to run a filter onthe output. As the error rate is relatively small for allpractical purposes, this does not affect the efficiency ofour construction.

4 Conclusions and Future Work

If a database production system is deployed, nobodyquestions the virtue of a sound theory of databasesthat this system is based on. If an insecure network

2One could model documents as sequences and not sets, butsets are strictly more general, and for our purposes the wordorder is irrelevant.

connection is used between client and server for sen-sitive applications, reliable encryption and authentica-tion mechanisms are used to protect the user againstattacks. However, for the case that the database serveritself goes adversarial, recent work has been based onmuch lower standards: Rather than focusing on ac-ceptable worst-case bounds of security mechanisms, re-searchers have been overly concerned with minimizingtheir performance overhead.

In this paper, we have proposed a new security def-inition for database PHs. Any scheme that satisfiesthis definition allows for secure processing of encrypteddata in the presence of an untrusted database serviceprovider. Further, we have reasoned that this defini-tion cannot be satisfied, suggesting that we should notexpect too much from any previous or future work inthis field.

On the bright side, we have given a general construc-tion for a database PH that can be proved to be securein a relaxed, but still rigorous and plausible sense underwidely accepted cryptographic assumptions.

References

[1] D. Bleichenbacher. Chosen Ciphertext AttacksAgainst Protocols Based on the RSA EncryptionStandard PKCS#1. In Advances in Cryptology,1998.

[2] C. Boyens and O. Gunther. Trust is not Enough:Privacy and Security in ASP and Web Service En-vironments. In ADBIS’02, volume 2435 of LNCS.

[3] E. Damiani, S. De Capitani Vimercati, S. Jajodia,S. Paraboschi, and P. Samarati. Balancing Con-fidentiality and Efficiency in Untrusted RelationalDBMSs. In CCS’03. ACM Press.

[4] H. Hacıgumus, B. Iyer, C. Li, and S. Mehrotra. Ex-ecuting SQL over Encrypted Data in the Database-Service-Provider Model. In SIGMOD’02. ACMPress.

[5] M. Kantarcıoglu and C. Clifton. Security Issuesin Querying Encrypted Data. Purdue ComputerScience Technical Report 04-013, 2004.

[6] R. L. Rivest, L. Adleman, and M. L. Dertouzos.On Data Banks and Privacy Homomorphisms.In Foundations of Secure Computation. AcademicPress, 1978.

[7] D. X. Song, D. Wagner, and A. Perrig. Practi-cal Techniques for Searches on Encrypted Data. InIEEE Symposium on Security and Privacy, 2000.

4


[ieee 22nd international conference on data engineering (icde'06) - atlanta, ga, usa...

Documents