# [IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Provable Security for Outsourcing Database Operations

Post on 27-Feb-2017

212 views

Embed Size (px)

TRANSCRIPT

<ul><li><p>Provable Security for Outsourcing Database Operations</p><p>Sergei Evdokimov, Matthias Fischmann, Oliver GuntherHumboldt-Universitat zu Berlin</p><p>{evdokim,fis,guenther}@wiwi.hu-berlin.de</p><p>Database outsourcing, whilst becoming more popu-lar in recent years, is creating substantial security andprivacy risks. In this paper, we assess cryptographicsolutions to the problem that some client party (Alex)wants to outsource database operations on sensitivedata sets to a service provider (Eve) without having totrust her. Contracts are an option, but for various rea-sons their effectiveness is limited [2]. Alex would ratherlike to use privacy homomorphisms [6], i.e., encryptionschemes that transform relational data sets and queriesinto ciphertext such that (i) the data is securely hid-den from Eve; and (ii) Eve computes hidden resultsfrom hidden queries that Alex can efficiently decrypt.Unfortunately, all privacy homomorphisms we know oflack a rigorous security analysis. Before they can beused in practice, we need formal definitions that areboth sound and practical to assess their effectiveness.</p><p>The strongest notion of confidentiality requires ev-ery single bit of information on queries as well as dataitself to be kept secret from Eve, no matter how manyqueries she can observe, or how rudely she breaks thecontract. This notion would allow for doing businesswith arbitrarily malicious service providers but is im-possible to achieve. We present and analyse a solutionfor the case that (i) Alex trusts Eve to behave accord-ing to protocol and not secretly analyse the data shecan gather, but (ii) is worried about Eves data beingsold to an adversary of Alex.</p><p>Related Work: In 2002, Hacgumus et al. pro-pose a database encryption scheme for full SQL [4].The idea is that every tuple is encrypted with a securecipher first, then weakly encrypted attributes are at-tached to the ciphertext. These weak encryptions areobtained by taking a plaintext attribute value, map-ping it to a containing interval, and encrypting thatinterval using a secret permutation. Two plaintextsmay therefore map to the same ciphertext, even if theyare not equal. Some of the information contained inthe plaintext is destroyed but not as much as in anordinary encryption scheme. While the remaining in-formation (like the number of tuples of the table, or</p><p>which tuples have similar values in which secret at-tributes) is enough to query the encrypted databasewith little post-processing, one hopes that Eve stilldoes not obtain too much information on the data ifthe partitionings into intervals and the other securityparameters have been chosen properly. However, thesecurity is based on intuition alone; a rigorous analysisis missing.</p><p>1 Relevant Notions of Security</p><p>An encryption scheme is a tuple (K, E,D), whereE : K X Y, D : K Y X are encryp-tion and decryption functions that convert a key andplaintext x X into the corresponding ciphertexty Y . It must hold that Dk(Ek(x)) = x, or for short:D(E(x)) = x. Keys are chosen uniformly at randomfrom the key space K. The bit length n = log(|K|) ofall keys is called the security parameter of the scheme.We now introduce a class of encryption schemes calleddatabase privacy homomorphisms that has additionalproperties:</p><p>Definition 1.1 (Database PH) Let (K, E,D),(K, Eq, Dq) be encryption schemes, R be a set ofrelations, C be a set of ciphertexts. A databaseprivacy homomorphism (or database PH, for short) isa tuple (K, E,Eq, D) such that1. E : K R C encrypts tables,</p><p>D : K C R decrypts tables, andEq : K {i} {i} encrypts queries.</p><p>2. For any relation R and any relational operationi, Ek(i(R)) = i(Ek(R)).</p><p>Intuitively, that means that the plaintext operationscan be carried out on the ciphertext, producing cipher-texts containing the result. Note that our definitioncontains an extra query encryption function Eq. Thisturns out to be useful for reasoning about data confi-dentiality: In some attacks, if Eve knows a few plain-</p><p>1</p><p>Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE </p></li><li><p>text queries and some matching encrypted data, shecan infer significant information on the plaintext data.</p><p>Also, we only consider schemes that perform tuple-by-tuple encryption, i.e., for table R = {v1, . . . , vn},the ciphertext is the set C = {c1, . . . , cn}, where ci isthe encryption of tuple vi.</p><p>The most common notions of security are based ongames between an honest user (such as Alex) and anadversary (such as Eve) modeled by a probabilisticpolynomial-time algorithm. A scheme is secure if Evecannot win the game with probability 12 + 1/p(n) forevery polynomial p in polynomial time for sufficientlylarge n. If this is true, the winning probability is callednegligible. The following (or similar) definition can befound in any textbook on cryptography.:</p><p>Definition 1.2 An encryption scheme (K, E,D) is in-distinguishably secure if Eve cannot win the followinggame:1. Eve chooses two plaintexts m1,m2 of the same</p><p>length and presents them to Alex.2. Alex chooses i {1, 2} uniformly at random and</p><p>presents Ek(mi) to Eve.3. Eve must guess i.</p><p>E.g., Eve can win this game if by having plaintextencrypted she can learn enough about it to decrypt onebit of a chosen ciphertext. She merely needs to choosethe plaintexts such that this bit suffices to distinguishthe two.</p><p>This definition considers a passive adversary thatonly intercepts and processes the ciphertexts. Thereare exist more powerful adversary models capable ofproducing plaintexts and ciphertexts using encryptionand decryption oracles.1 But it is already easy tofind adversaries that win the game in this weak sensefor most schemes. Consider the scheme proposed byHacgumus et al. in [4]. Let Eve produce two tables:</p><p>table 1:ID salary171 4900481 1200</p><p>table 2:ID salary171 4900481 4900</p><p>Next, Eve obtains a ciphertext, which is the encryp-tion of one of the tables, from Alex. According to thescheme, the salaries in the first table will be mapped todifferent intervals with high probability. The salaries inthe second table will be mapped to the same interval.Since the intervals are encrypted deterministically, the</p><p>1Eve is said to be an active adversary if she can produce sam-ples of her choice. This is modelled with a fictitious oracle. Inpractice, Eve usually obtains her answers from Alex by sendinghim confusing messages. If Alex is not a human but a machinerunning a complex application, this is a very real risk.</p><p>weak encryptions of the salary attribute of the firsttable will different, and the analogous weak encryptionfor the second table will be identical. Hence, Eve candetermine with high probability to which table corre-sponds the received ciphertext: If there are two dif-ferent weak encryptions of the salary attribute, Eveoutputs 1; otherwise, she outputs 2. Similar attackswork on the scheme of Damiani et al. [3].</p><p>2 Database Privacy Homomorphismsand Security Limitations</p><p>Although privacy homomorphisms are just a sub-set of the set of all encryption schemes, the ability totransform plaintext operations into corresponding ci-phertext operations yields new possibilities for an ad-versary to get insights into encrypted data. Consideran indistinguishably secure database PH. If an adver-sary runs the query ai=d on the encrypted table, shereceives a set of encrypted tuples. Though she cannotdecrypt them, she knows that the value of the attributeai of these tuples is d.</p><p>Why does a scheme that is secure against thestrongest adversary still leak so much information?The problem is that the classical model considers onlyparties operating with plain- and ciphertexts. Butin this scenario parties also exchange queries and theresults of these queries which can reveal informationabout plaintext. Kantarcoglu and Clifton [5] proposea definition that addresses this problem in which theystate that a database PH is secure if any two tableswith the same number of tuples as well as any twoqueries returning the same number of tuples are indis-tinguishable. They suspect that it may already be toostrong to have a solution. Although we show that thisis not true, there is another problem: A scheme securein this sense does allow the adversary to get informa-tion about the plaintext with high probability.</p><p>Consider an example where Alex owns a databasewith statistics for three competing hospitals, keepingtrack of the state in which patients are leaving eachhospital. Each patient is described by the attributesid, name, hospital, and outcome (outcome is a binaryattribute either set to fatal or healthy). Now supposethat Eve knows the database schema, the number ofhospitals, and has good estimates of the distribution ofpatient flows (0.2, 0.3, 0.5 resp.) and the ratio of fatalvs. successful outcomes (0.08, 0.92).</p><p>Alex issues the following queries:</p><p>SELECT * FROM table WHERE hospital = 1;SELECT * FROM table WHERE hospital = 2;SELECT * FROM table WHERE hospital = 3;</p><p>2</p><p>Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE </p></li><li><p>SELECT * FROM table WHERE outcome = fatal;</p><p>From the size of the results and the fact that we onlyhave exact selects, Eve can guess the exact queries withhigh confidence (conditions on id and name yield farfewer hits). Then, by intersecting the answers to thefirst and the fourth query, Eve can infer the ratio oflethal to successful outcomes in hospital 1!</p><p>In another example we consider an active adver-sary with the ability to get encrypted queries of herchoice and show that no matter how secure the ta-ble is encrypted, such an adversary is able to de-duce a significant amount of information about theencrypted data. Suppose there was a patient Johnand Eve wants to find out in which hospital he wastreated and what happened to him. She issues the en-cryption of query name:John using the query encryp-tion oracle. Then Eve issues encryptions of querieshospital:X , X {1, 2, 3}. By intersecting the resultsof the four queries issued, Eve can determine the hos-pital where John was treated. Analogously, she canfind his status.</p><p>The definition of a secure database PH addressingthese issues can be formulated as follows:</p><p>Definition 2.1 A database PH provides indistin-guishability if Eve cannot win the following game:</p><p>1. Eve chooses two tables T1(R), T2(R) containingthe same numbers of tuples and presents them toAlex.</p><p>2. Alex chooses i {1, 2} uniformly at random andpresents Ek(Ti(R)) to Eve.</p><p>3. Eve receives at most q encrypted queries issued toEk(Ti(R)) and computes the results (in case of ac-tive adversary Eve has access to the queries en-cryption oracle and issues q encryptions of plain-text queries of her choice).</p><p>4. Eve must guess i.</p><p>The above examples demonstrate that this level ofsecurity cannot be achieved by any database PH, nomatter how advanced. This can be established for-mally:</p><p>Theorem 2.1 Any database PH (K, E,Eq, D) is in-secure in the sense of Definition 2.1 if q > 0.</p><p>Although if the adversary is passive, the case is lessobvious, in both cases the security of the encrypteddata cannot be guaranteed.</p><p>Intuition indicates that Definition 2.1 is too strong.Several relaxations can be considered: (i) Allow passiveadversaries only, (ii) allow for some limited information</p><p>leakage, or (iii) grant less plaintext knowledge to Eve.Unfortunately, none of these ideas is very promising.</p><p>(i) Passive adversaries are indeed an option in gen-eral. However in our setting, Eve has unlimited writeaccess to all ciphertext and can produce any messagesequence necessary for her attack. To demonstrate theplausibility of active adversaries that have consider-ably fewer options than Eve, Bleichenbacher has im-plemented a practical attack against SSL in which therole of the oracle is played by a confused web server[1]. (ii) Often only one bit of information is of in-terest to Eve (e.g., in the acknowledgement resp. can-celling of a stock order sent to a broker). Even if thisis not the case, it is usually infeasible for the applica-tion designer to specify which bits are confidential andwhich are not. Hence, a secure scheme must not leaka single bit. (iii) Changing the amount of Eves plain-text knowledge is the most problematic approach. Ifan encryption scheme is used as a building block forbuilding complex applications, Eve must be assumedto have full access to the source code. In combina-tion with timing attacks and other traffic analysis tech-niques, this gives her extensive knowledge on the struc-ture and purpose of the encrypted data. Assuming aless well-informed Eve is a textbook case of security byobscurity and usually leads to disaster. The Risk Di-gest provides a rich collection of anecdotical evidence:http://catless.ncl.ac.uk/Risks. For a more de-tailed discussion, see the full version of this paper.</p><p>So what can be done? Assume that Alex trusts Evenot to attack him directly but still worries about herbecoming adversarial in the future (e.g., by a changeof company ownership). If Alexs trust in Eve dete-riorates, he can cancel the contract in time and stopsending queries. Consequently, q = 0 and Theorem2.1 does not apply. In the next section we present asolution for this case.</p><p>3 A Privacy Homomorphism Preserv-ing Exact Selects</p><p>To conclude this poster, we give a general construc-tion of a database PH based on searchable encryptionschemes. One such scheme has been proposed by Songet al. [7] and includes a proof of security, but otherscan be used instead. In the full version of this paper,we describe a few straight-forward optimizations suchas attributes of variable length, and present a formalsecurity proof of our construction under the assump-tion that the underlying searchable encryption schemeis secure.</p><p>We write a|b for concatenation of strings aand b. # is the padding symbol. Consider</p><p>3</p><p>Proceedings of the 22nd International Conference on Data Engineering (ICDE06) 8-7695-2570-9/06 $20.00 2006 IEEE </p></li><li><p>the relation Emp(name:string[9], dept:string[5],salary:int). We can construct a database privacyhomomorphism for the schema as follows.</p><p>First, create words that are strings of the samelength (the shorter the better) and that identify theattributes of the relation. Then bijectively mapthe tuples of the given relation to the documents,or sets of words.2 The number of words in eachdocument is equal to the number of the attributesin the relation. The globally fixed word length isthe length of the longest attribute value plus thelength of an attribute identifier (required for decryp-tion, see below). For example, the relation Empwill be mapped to the following set of documents:{string[9]|N, string[9]|D, string[9]|S}.</p><p>Now, map all tuples of the original relation Emp todocuments. For example:</p><p>name:"Montgomery", dept:"HR", sal:7500 {"MontgomeryN", "HR########D", "7500######S"}</p><p>These documents are encrypted using a searchableencryption scheme and stored on a remote server.</p><p>In order to perform exact selects on the encryptedrelation, queries of</p><p>attribute name:value</p><p>will be ma...</p></li></ul>

Recommended

View more >