ppgj: a privacy-preserving general join for outsourced encrypted database



SECURITY AND COMMUNICATION NETWORKS
Security Comm. Networks 2014; 7:1232–1244

Published online 24 October 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/sec.854

RESEARCH ARTICLE

PPGJ: A privacy-preserving general join for outsourced encrypted database

Sha Ma 1*, Bo Yang 2 and Mingwu Zhang 3

1 Department of informatics, South China Agricultural University, Guangzhou, Guangdong, China
2 School of Computer Science, Shaanxi Normal University, Shaanxi, Xi’an, China
3 School of Computers, Hubei University of Technology, Wuhan, Hubei, China

ABSTRACT

In an outsourced database, it is desirable to store sensitive data in encrypted form to reduce security and privacy risks, because the server may not be fully trusted. Several approaches have been proposed in the recent literature to efficiently support queries on encrypted databases. Most research focuses on keyword search and range queries, while few works study join queries because of a lack of related cryptographic primitives. We propose a privacy-preserving general join that supports both equality tests and simple non-equality tests on ciphertexts by using a revised Boneh–Goh–Nissim encryption algorithm and a Bloom filter. Finally, we analyze its advantages and disadvantages by comparing it with an existing method in terms of performance and the newly introduced security notions. Copyright © 2013 John Wiley & Sons, Ltd.

KEYWORDS

outsourced database; general join; ciphertext comparison

*Correspondence

Sha Ma, Department of informatics, South China Agricultural University, Guangzhou, Guangdong, China.
E-mail: [email protected]

1. INTRODUCTION

In the last few years, cloud computing has grown from being a promising business concept to one of the fastest growing segments of the IT industry [1]. The database-as-a-service [2–5] model is a new computing paradigm in cloud computing. Because highly sensitive data are now stored in locations that are not under the data owner’s control, such as leased space and partners’ sites, data confidentiality is put at risk. Therefore, such a varying trust scenario necessitates encryption techniques in the context of outsourced databases [6–11].

Although several approaches have been proposed in the recent literature to efficiently support queries on encrypted databases, most research focuses on keyword search and range queries, while few works study join queries because of a lack of related cryptographic primitives. Consider the following real-life scenario: a company database is required not only to store personal information for each employee, for instance, id, name, job and salary, but also to record statistical information, for instance, an annual average salary avg_salary. Suppose there are at least two relations in the company database: employee(id, name, job, salary) and stat_salary(year, avg_salary). Since salary is sensitive information, the outsourced database stores encrypted relations as follows (the symbol * denotes that all elements in this column are encrypted):

employee_enc(id, name, job, salary*)

stat_salary_enc(year_id, avg_salary*)

Suppose that we need to analyze the state of human resource management of this company by querying the jobs of employees whose salaries were greater than the average salary of all employees when the financial crisis broke out in 2008. This requirement can be expressed as the following structured query language (SQL) expression:

SELECT job

FROM employee_enc, stat_salary_enc

WHERE employee_enc.salary* > stat_salary_enc.avg_salary*

AND stat_salary_enc.year=2008.

Our proposed scheme gives a solution for a general join with a predicate like the first expression in the WHERE clause. The main contributions of this paper are summarized as follows.

(1) We propose a construction of a privacy-preserving general join protocol supporting both equality tests and simple non-equality tests between ciphertexts.


(2) We give formal security definitions for our system: data security, controllability, predicate confidentiality, and proxy security. These definitions can also be used to analyze other related query protocols.

(3) We prove the security of our proposed scheme and validate its advantages through a comparison with another method.

The rest of the paper is organized as follows. We review the related work in Section 2. Section 3 describes the system model and assumptions of our scheme. In Section 4, we introduce the fundamental preliminaries of this work, which is followed by the solution in Section 5. In Section 6, we analyze our proposed scheme in terms of security and performance on the basis of a comparison with the other method. We conclude our paper in Section 7.

2. RELATED WORK

In Yao’s famous millionaires’ problem, two parties try to compare their riches without disclosing their assets. Secure two-party protocols computing the greater-than function GT(a, b) = (a > b) provide a solution to this problem. Our scenario is different: the encrypted values C(a) and C(b) generated by the query proxy are stored in the outsourced database server, which should be able to compare a and b with the permission of the query proxy but without the decryption key, and should not obtain other useful information after the protocol is finished.

Many studies focus on trapdoor encryption to allow users to search for a particular keyword [12]. However, few studies solve the problem of comparison queries on an encrypted database, except order-preserving symmetric encryption [13,14] and hidden-vector encryption [15]. The former is viewed as a tool like a block cipher rather than a full-fledged encryption scheme itself because of the very large size of the ciphertext space. The latter requires knowledge of the data’s location in a finite set before encryption, which is not suitable for database encryption because computing the location information for each value in a domain such as char(10) is a nontrivial task. Our work provides a practical solution for join queries, which are an important and fundamental operation on a database.

An existing related technology is deterministic encryption, a primitive first proposed by Bellare, Boldyreva, and O’Neill [16] and then further investigated by Bellare, Fischlin, O’Neill, and Ristenpart [17], and Boldyreva, Fehr, and O’Neill [18]. Because each plaintext is encrypted to a deterministic ciphertext, it is very suitable for equality joins in our system model. However, these primitives do not support general joins such as the greater-than predicate. Another recent cryptographic primitive, public key encryption with equality test between ciphertexts (PKEET), is close in nature to ours. It was first proposed by Boldyreva, Fehr, and O’Neill [19], and then different varieties of PKEET with authorization properties were proposed by Yang, Tan, Huang, and Wong [20], and Tang [21,22]. This primitive is used to check whether two ciphertexts owned by different data owners with different public keys are encryptions of the same message. However, our setting is a little different from PKEET and its variants because we consider the comparison of ciphertexts owned by the same data owner.

A mechanism for executing general binary join operations (for predicates that satisfy certain properties) in an outsourced relational database framework was first introduced in [23]; its newest version is [24]. Each element $a \in A$ is stored on the server as $[E_K(a), O(a), BF(a)]$, where $E_K(a)$ is the ciphertext of a under a standard encryption algorithm, $O(a)$ is an obfuscation of a, and $BF(a)$ is a Bloom filter (BF) containing all possible values associated with a that satisfy the predicate. For each element $b \in B$, the server computes $e_A(b) = r_{AB}^{O(b)} \bmod p$, where p is a long prime and $r_{AB} = g^{y_A/x_B}$ is the trapdoor. For each element $a \in A$, if $BF(a)$ contains $e_A(b)$, then the tuple $\langle E_K(a), E_K(b) \rangle$ is returned. We will compare our scheme with [24] in Section 6.2. Furthermore, Ma, Yang, Li, and Xia [25] provided a novel method for equality joins on an outsourced database by using Boneh–Goh–Nissim (BGN) encryption and a BF. This solution achieves computational privacy and low overhead. Our proposed scheme supports both equality and simple non-equality joins.

3. MODEL AND ASSUMPTION

3.1. System model

We consider a system model consisting of two parties: the server and the client.

(1) Server. The server stores the outsourced database and performs queries by using the trapdoor generated by the query proxy in the client.

(2) Client. The client consists of three entities:

(a) Query proxy. The query proxy performs the encryption for the data owner and generates the trapdoor to support queries on the outsourced database.

(b) Data owner. The data owner generates the data to be stored in the outsourced database and the keys for data management, and performs the decryption on received query results as needed.

(c) User. The user submits SQL expressions to the query proxy and processes the plaintexts.

Note that the only difference between the data owner and the query proxy is that the former owns all private keys for both decryption and query, while the latter only owns a private key for query. In Figure 1, the query proxy generates the encrypted data stored in the outsourced database. After receiving an SQL expression from the user, the query proxy generates a trapdoor on the basis of the query predicate and sends it to the server for query execution. After receiving all ciphertext results computed by the server, the user obtains the plaintexts through the data owner’s decryption as needed and finally obtains the query results.

Figure 1. An illustration of system model.

3.2. Formulation

Suppose a join query is performed on attributes A and B with column identities CID(A) and CID(B), the join predicate p(A, B) is a mathematical expression on attributes A and B, and a (or b) is a value in column A (or B). The PPGJ scheme consists of the following polynomial-time algorithms:

(1) Setup(k): This algorithm is run by the data owner. It takes a security parameter k as input and outputs a key tuple (pk, sk, qk), where pk is public and (sk, qk) are private. The data owner keeps (sk, qk) for decryption, and qk is also sent to the query proxy for the generation of trapdoors.

(2) Encrypt(pk, M, qk, CID): This algorithm is run by the query proxy. It takes a public key pk, a value M, a private key qk, and a column identity CID as inputs, and outputs a ciphertext C(M) for the value M in column CID.

(3) Trapdoor(qk, p(A, B)): This algorithm is run by the query proxy. It takes a private key qk and a predicate p(A, B) as inputs, and outputs a trapdoor $T_{p(A,B)}$ for a join query on column A and column B.

(4) GeneralJoin(C(A), C(B), $T_{p(A,B)}$): This algorithm is run by the server. It takes all ciphertexts in columns A and B (C(A), C(B)) and a trapdoor $T_{p(A,B)}$ as inputs, and outputs all ciphertext pairs that satisfy the predicate p(A, B).

Note that although the decryption algorithm is not an essential part of our scheme, we still describe it for the sake of a complete cryptographic system.

(5) Decryption(sk, qk, C(M), CID): This algorithm is run by the data owner. It takes private keys (sk, qk) and a ciphertext C(M) in column CID as inputs, and outputs a plaintext M.

3.3. Security model

Firstly, we define the consistency of such a system, which ensures that the scheme fulfills its function.

Definition 1. (Consistency) A privacy-preserving general join (PPGJ) scheme is said to satisfy consistency if the following two conditions are satisfied for a and b:

(1) If p(a, b) = true, then GeneralJoin(C(a), C(b), $T_{p(A,B)}$) = true.

(2) If p(a, b) = false, then $\Pr[\mathrm{GeneralJoin}(C(a), C(b), T_{p(A,B)}) = \mathrm{false}] > 1 - \epsilon(k)$, where $\epsilon(k)$ is a negligible function.

Next, we define data security of PPGJ against one-way chosen-plaintext attacks (OW-CPA). Given the oracle Encrypt(·), the attacker should not be able to obtain the plaintext for a challenged ciphertext.

Definition 2. (Data security) Consider the following probabilistic polynomial-time (PPT) adversary A that eavesdrops on the encryption of M by using PPGJ.

(1) Setup(k) is run to obtain keys (pk, sk, qk). pk is given to adversary A.

(2) The challenger receives a message M in the column CID, which is chosen by adversary A for its encryption, and then returns C = Encrypt(pk, M, qk, CID) to A.

(3) The challenger chooses $(M_t, CID_t)$ only once in the game and then sends the ciphertext $C_t$ = Encrypt(pk, $M_t$, qk, $CID_t$) to A.

(4) The challenger continues to receive a message M in the column CID, which is chosen by adversary A for its encryption, and then returns C = Encrypt(pk, M, qk, CID) to A.

(5) A outputs a guess $M'_t$.

We define the adversary’s advantage as $\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{A,\mathrm{PPGJ}}(k) = |\Pr[M_t = M'_t] - \frac{1}{2}|$. We say that a PPGJ scheme satisfies data security if $\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{A,\mathrm{PPGJ}}(k)$ is a negligible function.


Then we define controllability, which means that the server should not be able to evaluate join predicates without the permission of the query proxy. In more detail, without receiving the trapdoor for the join predicate, the server cannot succeed in performing the join query.

Definition 3. (Controllability) Suppose all possible predicate operators are in the set {“=”, “>”, “<”}, denoted as $p_1$, $p_2$, and $p_3$. Consider the following PPT adversary A that eavesdrops on the execution of a join predicate by using PPGJ.

(1) Setup(k) is run to obtain keys (pk, sk, qk). pk is given to adversary A.

(2) The challenger chooses a predicate operator $p_t$ ($t \in \{1, 2, 3\}$), $(M_t, CID_t)$, and $(M'_t, CID'_t)$ such that $p_t(M_t, M'_t)$ = true, and then returns $p_t$, $C_t$ = Encrypt(pk, $M_t$, qk, $CID_t$) and $C'_t$ = Encrypt(pk, $M'_t$, qk, $CID'_t$) to A.

(3) A guesses two columns $CID_g$ and $CID'_g$ used for the generation of $C_t$ and $C'_t$ and then receives a trapdoor $T_g$ = Trapdoor(qk, $p_t(CID_g, CID'_g)$) from the challenger.

(4) A outputs the result of GeneralJoin($C_t$, $C'_t$, $T_g$).

We define the adversary’s advantage as $\mathrm{Adv}^{\mathrm{Control}}_{A,\mathrm{PPGJ}}(k) = |\Pr[\mathrm{GeneralJoin}(C_t, C'_t, T_g) = \mathrm{true}] - \frac{1}{2}|$. We say that a PPGJ scheme satisfies controllability if $\mathrm{Adv}^{\mathrm{Control}}_{A,\mathrm{PPGJ}}(k)$ is a negligible function.

Then we define predicate confidentiality. We need to ensure that the attacker does not learn the type of the join predicate given the trapdoor for the predicate.

Definition 4. (Predicate confidentiality) Suppose all possible predicate operators are in the set {“=”, “>”, “<”}, denoted as $p_1$, $p_2$, and $p_3$. Consider the following PPT adversary A that eavesdrops on the generation of a trapdoor using PPGJ.

(1) Setup(k) is run to obtain keys (pk, sk, qk). pk is given to adversary A.

(2) A chooses column identities (CID, CID') and sends them to the challenger.

(3) The challenger chooses a predicate operator $p_b$ ($b \in \{1, 2, 3\}$) on columns CID and CID' and returns $T_b$ = Trapdoor(qk, $p_b$(CID, CID')) to A.

(4) A finally outputs a guess $b' \in \{1, 2, 3\}$.

We define the adversary’s advantage as $\mathrm{Adv}^{\mathrm{PreCon}}_{A,\mathrm{PPGJ}}(k) = |\Pr[b = b'] - \frac{1}{3}|$. We say that a PPGJ scheme satisfies predicate confidentiality if $\mathrm{Adv}^{\mathrm{PreCon}}_{A,\mathrm{PPGJ}}(k)$ is a negligible function.

Finally, we define proxy security. We need to ensure that the attacker does not obtain the plaintexts for two challenged ciphertexts on (CID, CID') given the trapdoor on columns (CID, CID').

Definition 5. (Proxy security) Consider the following PPT adversary A that eavesdrops on the processing of the proxy by using PPGJ.

(1) Setup(k) is run to obtain keys (pk, sk, qk). pk is given to adversary A.

(2) A chooses two columns $CID_t$ and $CID'_t$ for the trapdoor of join predicate p(CID, CID'), and the challenger returns T = Trapdoor(qk, p($CID_t$, $CID'_t$)) to A.

(3) The challenger chooses $M_t$ in column CID and $M'_t$ in column CID' only once in the game and then returns $C_t$ = Encrypt(pk, $M_t$, qk, $CID_t$) and $C'_t$ = Encrypt(pk, $M'_t$, qk, $CID'_t$) to A.

(4) A outputs guesses $M_g$ and $M'_g$ for the plaintexts of the ciphertexts $C_t$ and $C'_t$.

We define the adversary’s advantage as $\mathrm{Adv}^{\mathrm{ProSec}}_{A,\mathrm{PPGJ}}(k) = |\Pr[(M_g = M_t) \wedge (M'_g = M'_t)]|$. We say that a PPGJ scheme satisfies proxy security if $\mathrm{Adv}^{\mathrm{ProSec}}_{A,\mathrm{PPGJ}}(k)$ is a negligible function.

4. FUNDAMENTAL PRELIMINARY

4.1. Revised Boneh–Goh–Nissim encryption system

The public key encryption algorithm of Boneh, Goh, and Nissim [26] resembles the Paillier and the Okamoto–Uchiyama encryption schemes. It consists of the following three polynomial-time algorithms: BGNkg, BGNenc, BGNdec.

(1) BGNkg

(a) Choose two random s-bit primes p and q.

(b) Generate two multiplicative groups $G$ and $G_1$ of order $N = pq$ and a bilinear map $e : G \times G \to G_1$ such that for all $u, v \in G$ and $a, b \in \mathbb{Z}$, we have $e(u^a, v^b) = e(u, v)^{ab}$. It is also required that if g is a generator of group $G$, then $e(g, g)$ is a generator of group $G_1$.

(c) Choose two random generators $g, u \in_R G$.

(d) Calculate the generator $h = u^q$ of the subgroup of $G$ of order p.

(e) Publish the public key $(N, G, G_1, e, g, h)$ and keep the private key p secret.

(2) BGNenc

(a) Choose a random $r \in_R \{0, \dots, N - 1\}$.

(b) Produce the ciphertext $c = g^m h^r \in G$.


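(3) BGNdec

To decrypt the ciphertext c, first compute $c^p = (g^m h^r)^p = (g^p)^m = \hat{g}^m$, where $\hat{g} = g^p$ (the factor $h^{rp}$ vanishes because h has order p), and then use Pollard’s lambda method to calculate the discrete logarithm of $c^p$ to the base $\hat{g}$ and retrieve m.

Boneh–Goh–Nissim encryption is clearly additively homomorphic because $E(m_1) \cdot E(m_2) = g^{m_1} h^{r_1} \cdot g^{m_2} h^{r_2} = g^{m_1 + m_2} h^{r_1 + r_2} = E(m_1 + m_2)$.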
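The encryption, decryption, and homomorphic property above can be tried out on toy numbers. The following is a minimal sketch under stated assumptions, not the BGN construction itself: it works in a subgroup of composite order N = pq of the integers modulo a small prime P (the values P = 859, p = 11, q = 13 and the helper names are illustrative choices, not from the paper), it omits the bilinear map because the additive property does not need it, and it uses brute-force search in place of Pollard's lambda method. The parameters are far too small to offer any security.

```python
import random

# Toy parameters (assumptions, far too small to be secure): P is prime and
# P - 1 = 6 * p * q, so Z_P^* has a subgroup of composite order N = p * q.
P, p, q = 859, 11, 13
N = p * q

def element_of_order_N(start):
    """Return the first y^((P-1)/N) mod P, y >= start, whose order is exactly N."""
    for y in range(start, P):
        x = pow(y, (P - 1) // N, P)
        if pow(x, p, P) != 1 and pow(x, q, P) != 1:
            return x
    raise ValueError("no element of order N found")

g = element_of_order_N(2)
h = pow(element_of_order_N(3), q, P)            # h = u^q has order p

def bgn_enc(m, r=None):
    """c = g^m * h^r with fresh randomness r."""
    if r is None:
        r = random.randrange(N)
    return pow(g, m, P) * pow(h, r, P) % P

def bgn_dec(c, max_m):
    """Raise to the private key p to strip h^r, then search m in [0, max_m]."""
    target, base = pow(c, p, P), pow(g, p, P)   # c^p = (g^p)^m
    for m in range(max_m + 1):                  # brute force replaces Pollard's lambda
        if pow(base, m, P) == target:
            return m
    return None

c1, c2 = bgn_enc(3), bgn_enc(4)
c_sum = c1 * c2 % P                             # E(3) * E(4) = E(3 + 4)
```

In this toy group, g^p has order q = 13, so only messages in [0, 12] can be recovered unambiguously; `bgn_dec(c_sum, 12)` returns 7.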

Theorem 1. The public key system of Boneh–Goh–Nissim encryption is semantically secure assuming the subgroup decision assumption is satisfied [26].

In this paper, we use a revised BGN encryption scheme, called deterministic BGN (DBGN), which replaces the coins used by a standard encryption scheme with a random number generated from a private key qk and auxiliary information about the message, denoted as $d(\cdot)$. More formally, DBGNenc draws its coins from a set $\mathrm{Coins}_{qk}(d(x))$, where qk is a secret key and d(x) is auxiliary information about the message x. We write DBGNenc($1^k$, pk, x; R) for the output of DBGN on inputs pk, x and coins R. Let $F : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$ be a keyed pseudorandom function, where the first input is called the key and the second input is just called the input. It has the property that $F(qk, d(x)) \in \mathrm{Coins}_{qk}(d(x))$ for $qk, d(x) \in \{0,1\}^*$. The DBGN encryption scheme DBGN = (DBGNkg, DBGNenc, DBGNdec) is defined via

DBGNkg($1^k$)
    (pk, sk) ← BGNkg($1^k$)
    qk ∈ $\{0,1\}^*$
    Return (pk, sk, qk)

DBGNenc($1^k$, pk, x, qk, d(x))
    R ← F(qk, d(x))
    y ← BGNenc($1^k$, pk, x; R)
    Return y

DBGNdec($1^k$, sk, qk, y, d(x))
    x ← BGNdec($1^k$, sk, y)
    R ← F(qk, d(x))
    If BGNenc($1^k$, pk, x; R) = y then return x
    Else return ⊥
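The DBGNenc and DBGNdec pseudocode above can be illustrated on toy numbers. This is a sketch under stated assumptions rather than the authors' implementation: HMAC-SHA256 reduced modulo N stands in for the pseudorandom function F, the group is a small subgroup of the integers modulo P = 859 instead of a pairing group, and the names `prf`, `dbgn_enc`, and `dbgn_dec` are hypothetical. It shows the two properties the text relies on: the same message and auxiliary information always give the same ciphertext, and decryption re-encrypts to validate the result.

```python
import hashlib
import hmac

# Toy parameters (assumptions, not secure): P is prime and P - 1 = 6 * p * q.
P, p, q = 859, 11, 13
N = p * q

def element_of_order_N(start):
    for y in range(start, P):
        x = pow(y, (P - 1) // N, P)
        if pow(x, p, P) != 1 and pow(x, q, P) != 1:
            return x
    raise ValueError("no element of order N found")

g = element_of_order_N(2)
h = pow(element_of_order_N(3), q, P)     # h has order p

def prf(qk: bytes, aux: bytes) -> int:
    """Stand-in for F(qk, d(x)): HMAC-SHA256 output reduced modulo N."""
    return int.from_bytes(hmac.new(qk, aux, hashlib.sha256).digest(), "big") % N

def dbgn_enc(m: int, qk: bytes, aux: bytes) -> int:
    """y = g^m * h^R with coins R = F(qk, aux), so encryption is deterministic."""
    return pow(g, m, P) * pow(h, prf(qk, aux), P) % P

def dbgn_dec(y: int, qk: bytes, aux: bytes, max_m: int):
    """Recover m from y^p = (g^p)^m, then accept only if re-encryption gives y."""
    target, base = pow(y, p, P), pow(g, p, P)
    for m in range(max_m + 1):
        if pow(base, m, P) == target and dbgn_enc(m, qk, aux) == y:
            return m
    return None                           # the reject branch of DBGNdec

qk, aux = b"query-key", b"CID(A)"
y = dbgn_enc(5, qk, aux)
```

Multiplying a valid ciphertext by h leaves the recovered message unchanged but breaks the re-encryption check, so `dbgn_dec(y * h % P, qk, aux, 12)` is rejected.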

Definition 6. Deterministic BGN encryption is OW-CPA secure if any polynomial-time adversary has only a negligible advantage in the following attack game.

(1) DBGNkg($1^k$) is run to obtain keys (pk, sk, qk). pk is given to adversary A.

(2) The challenger receives a message M and its auxiliary information d(M), which is chosen by adversary A for its encryption, and then returns C = DBGNenc(pk, M, qk, d(M)) to A.

(3) The challenger chooses $(M_t, d(M_t))$ only once in the game and then sends the ciphertext $C_t$ = DBGNenc(pk, $M_t$, qk, $d(M_t)$) to A.

(4) The challenger continues to receive a message M and its auxiliary information d(M), which is chosen by adversary A for its encryption, and then returns C = DBGNenc(pk, M, qk, d(M)) to A.

(5) At some point, A outputs a guess $M'_t$.

We define the adversary’s advantage as $\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{A,\mathrm{DBGN}}(k) = |\Pr[M_t = M'_t] - \frac{1}{2}|$. We say that the DBGN cryptosystem achieves OW-CPA security if $\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{A,\mathrm{DBGN}}(k)$ is a negligible function.

4.2. Bloom filter

A BF [27] offers a compact representation of a set of dataitems, allowing for fast set inclusion tests.

A BF is an array B of t bits, initialized to zero. It requires a set of n independent hash functions $H_i$ ($i \in \{1, \dots, n\}$) that produce uniformly distributed output in the range [0, t-1] over all possible inputs. To add an entry W to the filter, set $B[H_i(W)] = 1$ for $1 \le i \le n$. To check whether W is in the set, one computes $b_i = H_i(W)$ for all $1 \le i \le n$ and checks the bits $B[b_i]$. If any of the bits is 0, one concludes that W is not in the set; if all are 1, W is probably in the set. This introduces false positives, in which W appears to be in the set but actually is not, because each location $b_i$ may also have been set by some element other than W.

Eu-Jin Goh quantifies the probability of a false positive occurring in a BF [28]. After using n hash functions to insert r distinct elements into an array of size t, the probability of a false positive is $(1 - (1 - \frac{1}{t})^{rn})^n \approx (1 - e^{-rn/t})^n$. For a given number r of inserted elements, the optimal number of hash functions is $h_0 = \frac{t}{r} \ln 2 \approx 0.7\,\frac{t}{r}$, which yields a false positive probability of $p = (\frac{1}{2})^{h_0} = (\frac{1}{2})^{\frac{t}{r} \ln 2} \approx 0.62^{t/r}$.

For a BF, we denote BF.insert(v) as the insertion operation and BF.contains(v) as the set inclusion test (returning true if it contains value v, false otherwise).
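The insert and contains operations, together with the false positive estimate, can be sketched in a few lines. This is an illustrative sketch, not the filter used by the authors: the n hash functions are simulated by SHA-256 over an index prefix, which is an assumption made here for convenience, and the class and function names are hypothetical.

```python
import hashlib
import math

class BloomFilter:
    """Array of t bits probed by n hash functions."""

    def __init__(self, t: int, n: int):
        self.t, self.n = t, n
        self.bits = bytearray(t)              # one byte per bit, for clarity

    def _positions(self, value: bytes):
        # H_i(W): SHA-256 of the index i and the value, reduced to [0, t-1]
        for i in range(self.n):
            digest = hashlib.sha256(i.to_bytes(4, "big") + value).digest()
            yield int.from_bytes(digest, "big") % self.t

    def insert(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def contains(self, value: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(value))

def false_positive_rate(t: int, n: int, r: int) -> float:
    """Approximation (1 - e^(-rn/t))^n for r inserted elements."""
    return (1 - math.exp(-r * n / t)) ** n

def optimal_hash_count(t: int, r: int) -> int:
    """h0 = (t / r) * ln 2, rounded to a usable integer."""
    return max(1, round(t / r * math.log(2)))

bf = BloomFilter(t=1024, n=optimal_hash_count(1024, 100))
bf.insert(b"alice")
```

An inserted value is always reported as present, so `bf.contains(b"alice")` is true; only values that were never inserted can be misreported, at roughly the rate given by `false_positive_rate`.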

4.3. Pseudorandom function

Definition 7. Let $F : \{0,1\}^* \times \{0,1\}^* \to \{0,1\}^*$ be an efficient length-preserving keyed function. We say F is a pseudorandom function if for all probabilistic polynomial-time distinguishers D, there exists a negligible function negl such that

$$\left| \Pr\left[D^{F_k(\cdot)}(1^n) = 1\right] - \Pr\left[D^{f_n(\cdot)}(1^n) = 1\right] \right| \le \mathrm{negl}(n)$$

where $k \in \{0,1\}^n$ is chosen uniformly at random and $f_n$ is chosen uniformly at random from the set of functions mapping n-bit strings to n-bit strings.


Figure 2. A PPGJ scheme.

4.4. Discrete logarithm assumption

Let G be a group of order q in a finite field of prime size p, and let g be a generator of G. The discrete logarithm (DL) assumption is stated as follows:

Definition 8. Given $g, v \in G$, it is intractable to find $r \in Z_q$ such that $v = g^r \bmod p$.

5. A PRIVACY-PRESERVING GENERAL JOIN SCHEME

We propose a PPGJ scheme consisting of the following polynomial-time algorithms for a general join whose predicate operator is “=”, “>” or “<” (Figure 2). The proposed scheme has the following global parameters.

(1) k is the security parameter, $e : G \times G \to G_1$ is a bilinear map, and $G$ and $G_1$ are groups of order $N = q_1 q_2 \in \mathbb{Z}$, where $q_1$ and $q_2$ are two random k-bit primes. Pick two random generators $g, u \in_R G$ and set $h = u^{q_2}$. Then h is a random generator of the subgroup of $G$ of order $q_1$.

(2) $F : \{0,1\}^* \times \{0,1\}^* \to Z_N$ is a pseudorandom function.

(3) CID denotes a column identity, which uniquely identifies a column in the table.

The intuition behind our construction is that the encryption algorithm encrypts both the message and its column identity by using a public key and a private key for query. When the query proxy receives a join predicate from the user, it uses the private key for query to generate the trapdoor, so that the server can evaluate the join query but cannot recover the message. After receiving ciphertext results from the server, the user may interact with the data owner to request the decryption of the results as needed. In the following, we define the algorithms (Setup, Encrypt, Trapdoor, GeneralJoin).

(1) Setup(k).

(a) Given a security parameter $k \in Z^+$ as input, run the key generation algorithm of BGN encryption to obtain the public key $pk = (N, G, G_1, e, g, h)$ and the private key $sk = p$ (the order of h, written $q_1$ in the global parameters).

(b) Generate another private key $qk \in \{0,1\}^*$ for the pseudorandom function F.

(c) For brevity, the two joined attributes are assumed to share the same domain $[x_1, x_2]$ ($x_1 < x_2$, $x_1, x_2 \in Z_N$).

(d) Initialize the BF to 0 and calculate:

$$\theta_1 = h^{F(qk, CID(A), CID(B), 1)} \quad (1)$$

For all $v \in (0, x_2 - x_1]$, calculate:

$$\theta_2 = g^{v} h^{F(qk, CID(A), CID(B), 2)} \quad (2)$$

$$\theta_3 = g^{-v} h^{F(qk, CID(A), CID(B), 3)} \quad (3)$$

and execute BF.insert($\theta_i$) ($i \in \{1, 2, 3\}$). Send the BF to the server.

(2) Encrypt(pk, M, qk, CID)

For $M \in Z_N$ in column CID, encrypt it using the DBGNenc algorithm in Section 4.1:

$$C(M) = \mathrm{DBGNenc}(1^k, pk, M, qk, CID) = g^{M} h^{F(qk, CID)} \quad (4)$$

For example, encrypt $a \in Z_N$ in column A:

$$C(a) = \mathrm{DBGNenc}(1^k, pk, a, qk, CID(A)) = g^{a} h^{F(qk, CID(A))} \quad (5)$$

Similarly, encrypt $b \in Z_N$ in column B:

$$C(b) = \mathrm{DBGNenc}(1^k, pk, b, qk, CID(B)) = g^{b} h^{F(qk, CID(B))} \quad (6)$$


Figure 3. The pseudocode of GeneralJoin.

In a database application, because a join query is performed on all values in two columns, we apply Encrypt to all values in column A via

Encrypt(pk, A, qk, CID(A))
    For i = 1, ..., |A| do
        C(A)[i] ← Encrypt(pk, A[i], qk, CID(A))
    Return C(A)

And we apply Encrypt to all values in column B via

Encrypt(pk, B, qk, CID(B))
    For i = 1, ..., |B| do
        C(B)[i] ← Encrypt(pk, B[i], qk, CID(B))
    Return C(B)

(3) Trapdoor(qk, p(A, B))

Given the description of a predicate p(A, B) on columns A and B, let $r_A = F(qk, CID(A))$, $r_B = F(qk, CID(B))$, $r_1 = F(qk, CID(A), CID(B), 1)$, $r_2 = F(qk, CID(A), CID(B), 2)$, and $r_3 = F(qk, CID(A), CID(B), 3)$, and output the trapdoor for p(A, B) as follows:

$$T_{p(A,B)} = \begin{cases} h^{r_B - r_A + r_1} & \text{if } p(A, B) = \text{“}A = B\text{”} \\ h^{r_B - r_A + r_2} & \text{if } p(A, B) = \text{“}A > B\text{”} \\ h^{r_B - r_A + r_3} & \text{if } p(A, B) = \text{“}A < B\text{”} \end{cases}$$

(4) GeneralJoin(C(A), C(B), $T_{p(A,B)}$)

For each pair of values in columns A and B, $(C(A)[i], C(B)[j])$ ($1 \le i \le |C(A)|$, $1 \le j \le |C(B)|$), the server first computes $e(C(A)[i], C(B)[j])$ by using the trapdoor $T_{p(A,B)}$:

$$e(C(A)[i], C(B)[j]) = \frac{C(A)[i]}{C(B)[j]} \cdot T_{p(A,B)} \quad (7)$$

The server then tests whether $e(C(A)[i], C(B)[j])$ is in the BF. If BF.contains($e(C(A)[i], C(B)[j])$) returns true, it sends $(C(A)[i], C(B)[j])$ back to the client. In a real scenario, GeneralJoin also outputs any additional attributes specified in the SELECT clause, but for simplicity, here and in the following we make explicit only the join attributes (Figure 3).

Theorem 2. The PPGJ scheme from the preceding con-struction satisfies consistency according to definition 1.

Proof.

(1) If p(A,B) = “A = B”, we can verify the following:

C(a)

C(b)� Tp(A,B) =

gahrA

gbhrB� hrB–rA+r1

= ga–bhr1

= hr1 2 �1

(8)

Because the BF always contains $\Phi_1$, the first condition of consistency is satisfied. A false positive occurs when a is not equal to b but BF.contains(e(a, b)) returns true. This can happen in two cases:

(a) e(a, b) is a member of the set for the BF. This means that there exists $v_1 \in [-(x_2 - x_1), 0) \cup (0, (x_2 - x_1)]$ satisfying the following:

$$
g^{v_1} h^{r_1} \in \Phi_1 \cup \Phi_2 \cup \Phi_3 \qquad (9)
$$

Because $r_1, r_2, r_3$ are random numbers derived from the private key qk, the maximum probability in this case is $\varepsilon_1 = \frac{2 \cdot |x_2 - x_1|}{|G|}$, where |G| is the order of group G. If the domain of the joined attributes is a small range of $Z_N$, $\varepsilon_1$ is negligible. For example, if N is 1024 bits long and the domain of the monthly salary is [0, 100000], which requires at least 17 bits, then $\varepsilon_1 = \frac{2 \cdot 2^{17}}{2^{1024}}$, which is small enough for applications.

(b) BF.contains(e(a, b)) returns true although e(a, b) is not a member of the set for the BF. We have discussed the controllable false positive rate of the set inclusion test, so we can choose optimal parameters to obtain a desired small false positive rate. The probability of this event is therefore negligible; we denote it by $\varepsilon_2$:

$$
\Pr\left[\text{BF.contains}(g^{v_1} h^{r_1}) = \text{true} \mid \neg\,\text{BF.insert}(g^{v_1} h^{r_1})\right] = \varepsilon_2 \qquad (10)
$$

Combining the two cases, the total false positive rate is $\varepsilon_1 + \varepsilon_2 - \varepsilon_1 \varepsilon_2$, which is negligible. So when p(A, B) = “A = B”, the consistency of the scheme is satisfied according to Definition 1.


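(2) If p(A, B) = “A > B”, we can verify the following, with $v_2 \in (0, (x_2 - x_1)]$:

$$
\frac{C(a)}{C(b)} \cdot T_{p(A,B)} = \frac{g^{a} h^{r_A}}{g^{b} h^{r_B}} \cdot h^{r_B - r_A + r_2} = g^{a-b} h^{r_2} = g^{v_2} h^{r_2} \in \Phi_2 \qquad (11)
$$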

Because the BF always contains $\Phi_2$, the first condition of consistency is satisfied. A false positive occurs when a is not greater than b but BF.contains(e(a, b)) returns true. This can happen in two cases:

(a) e(a, b) is a member of the set for the BF. This means that there exists $v_2 \in (0, (x_2 - x_1)]$ satisfying:

$$
g^{v_2} h^{r_2} \in \Phi_1 \cup \Phi_2 \cup \Phi_3 \qquad (12)
$$

Because $r_1, r_2, r_3$ are random numbers derived from the private key qk, the maximum probability in this case is $\varepsilon_3 = \frac{|x_2 - x_1|}{|G|}$. As in the previous analysis of “A = B”, it is small enough if the domain of the joined attributes is a small range of $Z_N$.

(b) BF.contains(e(a, b)) returns true although e(a, b) is not a member of the set for the BF. The probability of this event is also $\varepsilon_2$, which is negligible.

Combining the two cases, the total false positive rate is $\varepsilon_3 + \varepsilon_2 - \varepsilon_3 \varepsilon_2$, which is negligible. So when p(A, B) = “A > B”, the consistency of the scheme is satisfied according to Definition 1.

(3) If p(A, B) = “A < B”, the proof is similar. We omit it here.

So we conclude that the scheme satisfies consistency according to Definition 1.
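The combined false positive rate used in the proof above is the probability that at least one of two events occurs, which can be checked with exact arithmetic. The following is a minimal sketch for the salary example of case (1); it assumes the two events are independent, and it borrows 0.8% for the Bloom filter false positive rate purely as an illustrative value. The function name `combined_rate` is hypothetical.

```python
from fractions import Fraction


def combined_rate(eps_a, eps_b):
    """Probability that at least one of two independent events occurs."""
    return eps_a + eps_b - eps_a * eps_b


# Salary example: 17-bit domain, 1024-bit group order, so eps1 = 2 * 2^17 / 2^1024.
eps1 = Fraction(2 * 2**17, 2**1024)
eps2 = Fraction(8, 1000)   # assumed Bloom filter false positive rate (0.8%)
total = combined_rate(eps1, eps2)
```

With these values `eps1` equals 2^-1006, so the total is dominated entirely by the Bloom filter term, which is the quantity the filter parameters control.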

6. ANALYSIS OF OUR PROPOSED SCHEME

In this section, we present the security analysis of our scheme and then compare our scheme with other methods.

6.1. Security analysis

Theorem 3. Assuming the underlying DBGN cryptosystem is OW-CPA secure, the PPGJ scheme satisfies data security, according to Definition 2.

Proof. Suppose that there exists an adversary $\mathcal{A} \in$ PPT that can break the security game of Definition 2 with a non-negligible advantage. Under those conditions, $\mathcal{A}$ can break the OW-CPA security of the DBGN cryptosystem even when given the BF. It is easy to see that there is no correlation between Encrypt(pk, M, mk, CID) and the BF, because the BF is constructed in the first stage of a PPGJ scheme without referring to M. Next, we bound the probability that $\mathcal{A}$ succeeds in breaking the OW-CPA security of the DBGN cryptosystem.

Suppose an adversary $\mathcal{A}$ has advantage $\varepsilon$ in the attack game against the OW-CPA security of the DBGN cryptosystem. We construct an algorithm $\mathcal{B}$ that breaks the CPA security of BGN by using the DBGN cryptosystem:

(1) BGNkg is run to obtain pk and sk; pk is given to $\mathcal{B}$.

(2) $\mathcal{B}$ chooses $qk \in \{0, 1\}^*$, gives pk to $\mathcal{A}$ and runs $\mathcal{A}$ to obtain two messages $(M_0, d(M_0)), (M_1, d(M_1)) \in Z_N$.

(3) $\mathcal{B}$ then chooses a random bit b and computes a ciphertext $C_b = BGN(pk, M_b)$.

(4) $\mathcal{B}$ runs $\mathcal{A}$, gives it the ciphertext $C_b$ and obtains a message $(M, d(M))$. If $M = M_0$, $\mathcal{B}$ outputs 0; otherwise, it outputs 1.

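Our goal is to prove that there exists a negligible function negl such that

$$
\Pr\left[\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{\mathcal{A},\mathrm{DBGN}} = 1\right] \le \frac{1}{2} + \mathrm{negl}(k)
$$

By assumption, BGN is CPA secure and so there exists a negligible function $\mathrm{negl}_1$ such that

$$
\frac{1}{2} + \mathrm{negl}_1(k) \ge \Pr\left[\mathrm{Adv}^{\mathrm{CPA}}_{\mathcal{B},\mathrm{DBGN}}(k) = 1\right]
$$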

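If $\mathcal{B}$ outputs 0, then, given $C_0$, $\mathcal{A}$ can compute $\frac{C_0}{g^{m_0}}$. To succeed, $\mathcal{A}$ needs to obtain t satisfying $h^{F(qk,t)} = \frac{C_0}{g^{m_0}}$. F is a pseudorandom function with range $Z_N$ and qk is a random string, so

$$
\Pr[\mathcal{A} \text{ outputs } (M_0, d(M_0)) \mid b = 0] = \Pr[\mathcal{B} \text{ outputs } 0 \mid b = 0] \qquad (13)
$$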

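It therefore follows that

$$
\begin{aligned}
\frac{1}{2} + \mathrm{negl}_1(k) &\ge \Pr\left[\mathrm{Adv}^{\mathrm{CPA}}_{\mathcal{B},\mathrm{DBGN}}(k) = 1\right] \\
&= \frac{1}{2} \cdot \Pr[\mathcal{B} \text{ outputs } 0 \mid b = 0] + \frac{1}{2} \cdot \Pr[\mathcal{B} \text{ outputs } 1 \mid b = 1] \\
&= \frac{1}{2} \cdot \Pr[\mathcal{A} \text{ outputs } (M_0, d(M_0)) \mid b = 0] + \frac{1}{2} \cdot \Pr[\mathcal{A} \text{ outputs } (M_0, d(M_0)) \mid b = 1] \\
&= \Pr\left[\mathrm{Adv}^{\mathrm{OW\text{-}CPA}}_{\mathcal{A},\mathrm{DBGN}} = 1\right]
\end{aligned}
\qquad (14)
$$

That is exactly what we want to prove. The theorem now follows.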


Theorem 4. The PPGJ scheme satisfies controllability, according to Definition 3.

Proof. Suppose that there exists an adversary $\mathcal{A} \in$ PPT that can break the security game of Definition 3 with some non-negligible advantage. Under those conditions, $\mathcal{A}$ can decide whether the result of the comparison on a and b is correct with non-negligible advantage even though it does not use the trapdoor for the join predicate on A and B. We suppose that $\mathcal{A}$ computes the correct results with non-negligible advantage by executing the GeneralJoin protocol with an improper trapdoor $T_{p(A',B')}$. Let $r'_A = F(qk, CID(A'))$, $r'_B = F(qk, CID(B'))$, $r'_1 = F(qk, CID(A'), CID(B'), 1)$, $r'_2 = F(qk, CID(A'), CID(B'), 2)$ and $r'_3 = F(qk, CID(A'), CID(B'), 3)$. Then

$$
\begin{aligned}
e'(a, b) &= \frac{C(a)}{C(b)} \cdot T_{p(A',B')} \\
&= \frac{g^{a} h^{F(qk,CID(A))}}{g^{b} h^{F(qk,CID(B))}} \cdot T_{p(A',B')} \\
&= g^{a-b} h^{r_A - r_B} \cdot T_{p(A',B')}
\end{aligned}
\qquad (15)
$$

(1) When p(A, B) = “A = B”:

$$
e'(a, b) = h^{r_A - r_B + r'_B - r'_A + r'_1} \qquad (16)
$$

BF.contains(e'(a, b)) returns true in two cases:

(a) e'(a, b) is a member of the set for the BF. Because $r_A, r_B, r'_A, r'_B, r'_1$ are random numbers derived from the private key qk, the maximum probability in this case is $\varepsilon_4 = \frac{(2 \cdot |x_2 - x_1| + 1) \cdot |x_2 - x_1|}{|G|}$. As in the earlier analysis of consistency, it is small enough if the domain of the joined attributes is a small range of $Z_N$.

(b) BF.contains(e'(a, b)) returns true although e'(a, b) is not a member of the set for the BF. The probability of this event is $\varepsilon_2$, which is negligible.

Combining the two cases, the probability of $\mathcal{A}$ succeeding in the experiment of controllability is $\frac{1}{2} + \varepsilon_4 + \varepsilon_2 - \varepsilon_4 \varepsilon_2$. Because $\varepsilon_4 + \varepsilon_2 - \varepsilon_4 \varepsilon_2$ is negligible, PPGJ satisfies controllability when p(A, B) = “A = B” according to Definition 3.

(2) When p(A, B) = “A > B” or p(A, B) = “A < B”, the proof is similar. The probability of $\mathcal{A}$ succeeding in the experiment of controllability is $\frac{1}{2} + \varepsilon_3 + \varepsilon_2 - \varepsilon_3 \varepsilon_2$. Because $\varepsilon_3 + \varepsilon_2 - \varepsilon_3 \varepsilon_2$ is negligible, PPGJ satisfies controllability when p(A, B) = “A > B” or p(A, B) = “A < B” according to Definition 3.

Combining the three cases, the theorem now follows.
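The intuition behind the controllability argument can be illustrated with a small experiment: a trapdoor issued for a different column pair leaves a random-looking exponent of h in the test value, so the membership test fails except with negligible probability. The sketch below is a toy under stated assumptions, not the scheme itself: the group modulo 2^61 - 1, the elements 3 and 5 for g and h, HMAC-SHA256 for F, a Python set in place of the BF, and the column identifiers "A2" and "B2" for the improper pair are all hypothetical choices.

```python
import hashlib
import hmac

P, G, H = 2**61 - 1, 3, 5   # toy group Z_P^* and stand-ins for g, h (insecure sizes)
D = 10                      # |x2 - x1|, the width of the attribute domain


def prf(qk, *parts):
    """Stand-in for F(qk, ...), reduced modulo the group order."""
    msg = "|".join(map(str, parts)).encode()
    return int.from_bytes(hmac.new(qk, msg, hashlib.sha256).digest(), "big") % (P - 1)


def enc(qk, cid, value):
    """C(a) = g^a * h^{F(qk, CID)}."""
    return pow(G, value, P) * pow(H, prf(qk, cid), P) % P


def trapdoor(qk, cid_a, cid_b, index):
    """T = h^{r_B - r_A + r_index} for the given column pair."""
    exp = prf(qk, cid_b) - prf(qk, cid_a) + prf(qk, cid_a, cid_b, index)
    return pow(H, exp % (P - 1), P)


def filter_members(qk, cid_a, cid_b):
    """Values inserted for the pair (A, B); a set replaces the Bloom filter."""
    r1, r2, r3 = (prf(qk, cid_a, cid_b, i) for i in (1, 2, 3))
    members = {pow(H, r1, P)}
    for v in range(1, D + 1):
        members.add(pow(G, v, P) * pow(H, r2, P) % P)
        members.add(pow(G, -v, P) * pow(H, r3, P) % P)
    return members


def join_value(ca, cb, td):
    """e(a, b) = C(a) / C(b) * T, as in Equations (7) and (15)."""
    return ca * pow(cb, -1, P) * td % P


qk = b"demo query key"
members = filter_members(qk, "A", "B")
ca, cb = enc(qk, "A", 4), enc(qk, "B", 4)                    # a = b = 4
proper = join_value(ca, cb, trapdoor(qk, "A", "B", 1))       # trapdoor for (A, B)
improper = join_value(ca, cb, trapdoor(qk, "A2", "B2", 1))   # trapdoor for another pair
```

Here `proper` reduces to h^{r_1} and is found in the filter contents, while `improper` keeps the extra exponent of Equation (16) and is not.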

Theorem 5. The PPGJ scheme satisfies predicate confidentiality, according to Definition 4.

Proof. Suppose that there exists a polynomial-time algorithm $\mathcal{A} \in$ PPT that can break the security game of Definition 4 with non-negligible advantage. Under those conditions, $\mathcal{A}$ can distinguish the trapdoors for the different predicates $p_1, p_2, p_3$ by performing the following steps:

(1) $\mathcal{A}$ guesses v satisfying $h^{v} = T_t$. The probability is denoted as $\varepsilon_5$.

(2) $\mathcal{A}$ guesses $i \in \{1, 2, 3\}$ satisfying $v = r_{CID'} - r_{CID} + r_i$. The probability is denoted as $\varepsilon_6$.

According to the DL assumption, $\varepsilon_5$ is negligible. According to the definition of a pseudorandom function, because $r_{CID'}$, $r_{CID}$ and $r_i$ are pseudorandom numbers that are computationally indistinguishable from random strings, $\varepsilon_6$ is also negligible. So the success probability of $\mathcal{A}$ is $\frac{1}{3} + \varepsilon_5 \varepsilon_6$. Because $\varepsilon_5 \varepsilon_6$ is negligible, PPGJ satisfies predicate confidentiality according to Definition 4. The theorem now follows.

Theorem 6. The PPGJ scheme satisfies proxy security, according to Definition 5.

Proof. This theorem follows directly from the preceding results. As in the proof of Theorem 5, even given the trapdoor, the adversary $\mathcal{A}$ in the security game of Definition 5 cannot obtain the private key qk, because of the DL assumption and the security of the pseudorandom function. Furthermore, sk is hidden from $\mathcal{A}$ because of the subgroup decision assumption in Theorem 1. So the advantage of $\mathcal{A}$ in obtaining the plaintexts of the ciphertexts, even given the trapdoor, is negligible. The theorem now follows.

6.2. Comparison

The trivial method of querying encrypted data is to transmit all encrypted data from the server to the client and then run the query on the decrypted data. It is easy to observe that this method cannot achieve controllability or proxy security and hence gives no control over the query proxy. Next, we compare our scheme with [24] in terms of storage, transmission and computation overhead and of the security notions in Section 3.3.

In [24], in order to realize the join predicates targeted in this paper, the set of values that satisfy these join predicates for each element should be inserted into the BF for that element. As recommended by the Wassenaar Arrangement [29], we set $N_1$, the size of the multiplicative group in a classical asymmetric system, to be 512 bits; q, the size of the subgroup, to be 160 bits; and $N_2$, the key size in a symmetric key cryptosystem, to be 64 bits (we suppose that the sizes of a plaintext, a ciphertext and a key are the same in the symmetric encryption algorithm). Although these cryptographic key sizes are not long enough for current applications, they can be used to compare the performance of different methods. As recommended by Carbunar and Sion [24], we set a desired false positive rate of no more than pfp = 0.8% and h0 = 7 for each BF.

Figure 4. Storage overhead of two methods: (a) PPGJ; (b) [24].

Figure 5. Transmission overhead of two methods: (a) PPGJ; (b) [24].
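The chosen Bloom filter parameters can be sanity-checked with the standard approximation for Bloom filters, $(1 - e^{-k/c})^k$ for k hash functions and c bits per inserted element. The sketch below assumes that h0 denotes the number of hash functions and that the filter is sized optimally for that number, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def bloom_false_positive(k, bits_per_element):
    """Standard approximation (1 - e^{-k/c})^k of the Bloom filter false positive rate."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


def optimal_bits_per_element(k):
    """Bits per inserted element for which k hash functions is optimal: c = k / ln 2."""
    return k / math.log(2)


k = 7                                  # assumed meaning of h0 = 7
c = optimal_bits_per_element(k)        # roughly 10.1 bits per inserted value
p_fp = bloom_false_positive(k, c)      # 2^-7, just under the 0.8% target
```

Under these assumptions, seven hash functions give a false positive rate of 2^-7 (about 0.78%), which is consistent with the stated bound of 0.8%.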

6.2.1. Storage overhead.

For the PPGJ scheme, (m + n) values are encrypted to (m + n) numbers in G, and one BF is built for the join query. The storage overhead is $(m + n)N_1 + |BF_1|$. For [24], (m + n) values are encrypted to (m + n) numbers with $N_2$ bits, (m + n) numbers with q bits and (m + n) BFs. The storage overhead is $(m + n)(N_2 + q + |BF_2|)$.

Figure 4 shows the storage overhead of the two methods as the number of tuples and the size of the attribute domain vary. For the PPGJ scheme, all elements in the attribute domain should be inserted into the BF. For [24], the worst case is that the values inserted into the BF for each element come from the whole domain of the joined attribute. We suppose that, on average, half of all elements in the attribute domain are inserted into the BF for each element. Figure 4(b) shows that the storage overhead of [24] grows more quickly, because it stores one BF for each element. Furthermore, Figure 4(a) and (b) show that the storage overhead increases with the size of the attribute domain, because the size of the BF grows with the number of inserted values for a fixed pfp and h0.
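The two storage formulas can be evaluated directly to see why the per-element filters of [24] dominate. The sketch below uses the parameter sizes from this section; the Bloom filter size is an assumption, taken as the optimal size for seven hash functions (about 10.1 bits per inserted value), and the function names are hypothetical.

```python
import math

N1, N2, Q_BITS = 512, 64, 160   # sizes in bits, as set in Section 6.2
K = 7                           # assumed number of BF hash functions


def bf_bits(inserted, k=K):
    """Assumed BF size: optimal number of bits for `inserted` values and k hashes."""
    return math.ceil(inserted * k / math.log(2))


def storage_ppgj(m, n, domain):
    """(m + n) * N1 + |BF1|: one ciphertext per value and a single BF over the domain."""
    return (m + n) * N1 + bf_bits(domain)


def storage_ref24(m, n, domain):
    """(m + n) * (N2 + q + |BF2|): each element carries its own BF with half the domain."""
    return (m + n) * (N2 + Q_BITS + bf_bits(domain // 2))


ppgj_bits = storage_ppgj(1000, 1000, 1000)
ref24_bits = storage_ref24(1000, 1000, 1000)
```

For 1000 tuples per column and a domain of 1000 values, the single shared filter adds only about 10 kbit to PPGJ, whereas the per-element filters of [24] multiply that cost by the number of tuples.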

6.2.2. Transmission overhead.

We consider the data transmission overhead of executing a join query. For the PPGJ scheme, a trapdoor for a join query, which is a number in G, is uploaded to the server. The upload overhead is $N_1$ bits. After executing the join query with the trapdoor, the server sends a set of ciphertexts with $|Q|(m + n)$ entries to the client. The download overhead is $|Q|(m + n)N_1$ bits. The total transmission overhead is $(|Q|(m + n) + 1)N_1$ bits. For [24], the upload overhead of sending a trapdoor for a join query to the server is also $N_1$ bits. After executing the join query with the trapdoor, the server sends a set of values with $|Q|(m + n)$ entries to the client. The download overhead is $|Q|(m + n)N_2$ bits. The total transmission overhead is $N_1 + |Q|(m + n)N_2$ bits.
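The transmission formulas above translate into two one-line functions. This is a minimal sketch with hypothetical function names; the default sizes are the 512-bit and 64-bit values chosen earlier in this section, and `selectivity` plays the role of |Q|.

```python
def transmission_ppgj(m, n, selectivity, n1=512):
    """Total bits for PPGJ: (|Q|(m + n) + 1) * N1."""
    return (selectivity * (m + n) + 1) * n1


def transmission_ref24(m, n, selectivity, n1=512, n2=64):
    """Total bits for [24]: N1 + |Q|(m + n) * N2."""
    return n1 + selectivity * (m + n) * n2


ppgj_total = transmission_ppgj(1000, 1000, 0.5)
ref24_total = transmission_ref24(1000, 1000, 0.5)
```

The upload term is the same single trapdoor in both cases, so the difference comes entirely from the size of each returned entry.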

Figure 5 shows the data transmission overhead of the two methods as the number of tuples and the query selectivity vary. Query selectivity is an important factor in the size of the final results. In Figure 5, the data transmission overhead increases with the query selectivity, because more final results must be sent back to the query proxy. Furthermore, because all ciphertexts satisfying the join predicate are transferred to the client in both our scheme and [24], and the ciphertexts of the former are larger than those of the latter, the data transmission overhead of our method grows more quickly than that of [24]. However, we observe that if the join attributes are not specified in the SELECT clause, our scheme and [24] incur the same transmission overhead.

Table I. Comparison of two methods.

| Category | Item | PPGJ | [24] |
|---|---|---|---|
| Storage | | $(m + n)N_1 + \lvert BF_1 \rvert$ | $(m + n)(N_2 + q + \lvert BF_2 \rvert)$ |
| Transmission | Up | Trapdoor: $N_1$ | Trapdoor: $N_1$ |
| | Down | $\lvert Q \rvert (m + n)N_1$ | $\lvert Q \rvert (m + n)N_2$ |
| | Total | $(\lvert Q \rvert (m + n) + 1)N_1$ | $N_1 + \lvert Q \rvert (m + n)N_2$ |
| Computation | Encryption | C: O(m + n)Exp + O(k(m + n))Hash1 | C: O(m + n)AES.enc + O(m + n)Mul + O($\lvert D \rvert$(m + n))Exp + O(k(m + n))Hash2 |
| | Trapdoor/Decryption | C: O(1)Exp | C: O(1)Exp |
| | Join execution | S: O(mn)Mul + O(kmn)Hash1 | S: O(mn)Exp + O(kmn)Hash2 |
| Security | Data security | OW-CPA | OW-CPA |
| | Controllability | ✓ | ✓ |
| | Predicate confidentiality | ✓ | |
| | Proxy security | ✓ | ✓ |

Note: m, n, the number of elements in the two joined columns; $N_1$, the size of the multiplicative group G in a classical asymmetric system, for example, Boneh–Goh–Nissim (BGN) or RSA; $N_2$, the key size in a symmetric key cryptosystem, for example, AES; q, the size of the subgroup of the multiplicative group G; $|BF_1|$, the size of the Bloom filter (BF) in the PPGJ scheme; $|BF_2|$, the size of the BF in [24]; C, client; S, server; k, the number of hash functions used for the BF; |Q|, estimated query selectivity; |D|, estimated number of elements satisfying the query; Exp, the complexity of one exponentiation operation on group G; Mul, the complexity of one multiplication operation on group G; Hash1, the complexity of one hash operation that produces uniformly distributed output in the range $[0, |BF_1|]$ over all possible inputs; Hash2, the complexity of one hash operation that produces uniformly distributed output in the range $[0, |BF_2|]$ over all possible inputs; PPGJ, privacy-preserving general join; OW-CPA, one-wayness under chosen-plaintext attacks.

6.2.3. Computation overhead.

For the PPGJ scheme, encryption generates the DBGN ciphertexts and a BF, so the computation overhead of encryption is O(m + n)Exp + O(k(m + n))Hash1. After receiving the trapdoor from the client, whose generation requires O(1)Exp, the server computes Equation (7) for each pair and then tests whether the result is in the BF. So the computation cost of join execution is O(mn)Mul + O(kmn)Hash1. For [24], encryption generates a symmetric encryption ciphertext, an obfuscation and a BF for each element, $[E_K(a), O(a), BF(a)]$, as described in Section 2, so the computation overhead of encryption is C: O(m + n)AES.enc + O(m + n)Mul + O(|D|(m + n))Exp + O(k(m + n))Hash2. After receiving the trapdoor $r_{AB}$ from the client, whose generation requires O(1)Exp, the server computes $e_A(b) = r_{AB}^{O(b)} \bmod p$ for each pair and then tests whether the result is in the BF. So the computation cost of join execution is S: O(mn)Exp + O(kmn)Hash2.

As shown in Table I, for encryption by the client, our proposed scheme incurs a lower cost than [24], because [24] incurs Exp operations for all expected values inserted into the BF for each element. For trapdoor generation by the client, our scheme and [24] incur the same cost: O(1)Exp. For join execution, our scheme has a great advantage in server computation compared with [24], because an Exp operation is much more expensive than a Mul operation. For example, according to the results in [24], 512-bit modular exponentiations (with 160-bit exponents) take 274 μs, while 512-bit modular multiplications take only 687 ns.
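To give a feel for the gap in server cost, the per-operation timings quoted from [24] can be scaled by the number of pairs tested during join execution. This is a rough sketch only: it assumes exactly one group operation per pair (a constant of 1 in the O(mn) terms) and ignores the hashing cost, which is of the same order for both schemes; the function name is hypothetical.

```python
EXP_SECONDS = 274e-6   # 512-bit modular exponentiation with a 160-bit exponent, from [24]
MUL_SECONDS = 687e-9   # 512-bit modular multiplication, from [24]


def join_group_op_time(m, n, seconds_per_op):
    """Time spent on group operations when each of the m * n pairs costs one operation."""
    return m * n * seconds_per_op


ppgj_seconds = join_group_op_time(1000, 1000, MUL_SECONDS)    # Mul per pair (PPGJ)
ref24_seconds = join_group_op_time(1000, 1000, EXP_SECONDS)   # Exp per pair ([24])
speedup = ref24_seconds / ppgj_seconds
```

For two columns of 1000 elements each, this estimate gives well under a second of multiplications against several minutes of exponentiations, a ratio of roughly 400.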

6.2.4. Security.

With regard to security, we have shown that our PPGJ scheme achieves OW-CPA data security, controllability, predicate confidentiality and proxy security. For [24], because the same message in the same column is encrypted to the same ciphertext, it cannot achieve CPA security and achieves only OW-CPA security. Controllability was described in [24]. It is obvious that [24] cannot achieve predicate confidentiality, because during the join execution, the server needs the corresponding open(pred) for the predicate, which is revealed by the client. This information can be considered a part of the join predicate. Furthermore, [24] can achieve proxy security because the generation of the trapdoor does not need the decryption key for the message and the message cannot be recovered from the auxiliary information (e.g., the secret value $(x_A, y_A)$ corresponding to column A) because of the subgroup decision assumption.

From the previous analysis, we conclude that our PPGJ scheme is efficient in terms of storage and computation cost while achieving the security notions above, whereas [24] has an advantage in transmission overhead. In fact, [24] is suitable for finite match predicates, where the number of all possible pairs satisfying the predicate has a small upper bound, while our proposed scheme has an advantage in supporting simple general join predicates (“>”, “<”, “=”).

7. CONCLUSION

In this paper, we propose a privacy-preserving general join for outsourced databases on the basis of the comparison of ciphertexts, where the functional tests include “A = B”, “A > B” and “A < B”, and we provide a security analysis showing that our scheme achieves data security, controllability, predicate confidentiality and proxy security. We also compare it with [24] and analyze the advantages and disadvantages of both theoretically. Future work includes (1) further study of the security model of PPGJ: for example, although the server cannot learn the whole plaintext, it may learn the relationship between elements in the same column, which also occurs in [24], where it is called same-column duplicate leaks, and which may be a significant information leakage; and (2) extension of the functional test to the comparison of ciphertexts generated by different data owners.

ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China under Grants 61272436 and 61272404, and by the Natural Science Foundation of Guangdong Province under Grants 10351806001000000 and S2012010010383.

REFERENCES

1. Subashini S, Kavitha V. A survey on security issues in service delivery models of cloud computing. Journal of Network and Computer Applications 2011; 34(1): 1–11.

2. Agrawal D, El Abbadi A, Emekci F, Metwally A. Database management as a service: challenges and opportunities, The 25th International Conference on Data Engineering, Piscataway, NJ, USA, IEEE, 2009; 1709–1716.

3. Lehner W, Sattler KU. Database as a service (DBaaS), The 26th International Conference on Data Engineering, Piscataway, NJ, USA, IEEE, 2010; 1216–1217.

4. Hacigumus H, Iyer B, Li C, Mehrotra S. Executing SQL over encrypted data in the database-service-provider model, Proceedings of the ACM SIGMOD International Conference on Management of Data, Madison, WI, United States, Association for Computing Machinery, 2002; 216–227.

5. Hacigumus H, Iyer B, Mehrotra S. Efficient execution of aggregation queries over encrypted relational databases. In The 9th International Conference of Database Systems for Advanced Applications, LNCS 2973. Springer-Verlag: Berlin, Germany, 2004; 125–136.

6. Kantarcioglu M, Clifton C. Security issues in querying encrypted data. In The 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, LNCS 3654. Springer Verlag: Storrs, CT, United States, 2005; 325–337.

7. Amanatidis G, Boldyreva A, O’Neill A. Provably-secure schemes for basic query support in outsourced databases. Data and Applications Security XXI 2007: 14–30.

8. Li J, Omiecinski ER. Efficiency and security trade-off in supporting range queries on encrypted databases. In The 19th Annual IFIP WG 11.3 Working Conference on Data and Applications Security, LNCS 3654. Springer Verlag: Storrs, CT, United States, 2005; 69–83.

9. Xie M, Wang H, Yin J, Meng X. Integrity auditing of outsourced data, Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment, Vienna, Austria, 2007; 782–793.

10. Song DX, Wagner D, Perrig A. Practical techniques for searches on encrypted data, Symposium on Security and Privacy, Berkeley, CA, USA, IEEE, 2000; 44–55.

11. Pang HH, Zhang J, Mouratidis K. Scalable verification for outsourced dynamic databases. Proceedings of the VLDB Endowment 2009; 2(1): 802–813.

12. Boneh D, Crescenzo GD, Ostrovsky R, Persiano G. Public key encryption with keyword search. In Advances in Cryptology – EUROCRYPT 2004, LNCS 3027. Springer-Verlag: Berlin, Germany, 2004; 506–522.

13. Agrawal R, Kiernan J, Srikant R, Xu Y. Order preserving encryption for numeric data, Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, Association for Computing Machinery, 2004; 563–574.

14. Boldyreva A, Chenette N, Lee Y, O’Neill A. Order-preserving symmetric encryption. In The 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT, LNCS 5479. Springer Verlag: Cologne, Germany, 2009; 224–241.


15. Boneh D, Waters B. Conjunctive, subset, and range queries on encrypted data. Theory of Cryptography 2007: 535–554.

16. Bellare M, Boldyreva A, O’Neill A. Deterministic and efficiently searchable encryption. In CRYPTO, LNCS 4622. Springer: Santa Barbara, CA, USA, 2007; 535–552.

17. Bellare M, Fischlin M, O’Neill A, Ristenpart T. Deterministic encryption: Definitional equivalences and constructions without random oracles. In CRYPTO, LNCS 5157. Springer: Santa Barbara, CA, USA, 2008; 360–378.

18. Boldyreva A, Fehr S, O’Neill A. On notions of security for deterministic encryption, and efficient constructions without random oracles. In CRYPTO, LNCS 5157. Springer: Santa Barbara, CA, USA, 2008; 335–359.

19. Yang G, Tan C, Huang Q, Wong DS. Probabilistic public key encryption with equality test. In CT-RSA, Pieprzyk J (ed), LNCS 5985. Springer: San Francisco, CA, USA, 2010; 119–131.

20. Tang Q. Towards public key encryption scheme supporting equality test with fine-grained authorization. In The 16th Australasian Conference on Information Security and Privacy, LNCS 6812. Springer: Melbourne, Australia, 2011; 389–406.

21. Tang Q. Public key encryption supporting plaintext equality test and user-specified authorization. Security and Communication Networks 2012; 5(12): 1351–1362.

22. Tang Q. Public key encryption schemes supporting equality test with authorization of different granularity. International Journal of Applied Cryptography 2012; 2(4): 304–321.

23. Carbunar B, Sion R. Joining privately on outsourced data. In The 7th VLDB Workshop on Secure Data Management, LNCS 6358. Springer: Singapore, 2010; 70–86.

24. Carbunar B, Sion R. Towards private joins on outsourced data. IEEE Transactions on Knowledge and Data Engineering 2012; 24(9): 1699–1710.

25. Ma S, Yang B, Li K, Xia F. A privacy-preserving join on outsourced database. In The 14th Information Security Conference, ISC 2011, LNCS 7001. Springer: Xi’an, China, 2011; 278–292.

26. Boneh D, Goh EJ, Nissim K. Evaluating 2-DNF formulas on ciphertexts. Theory of Cryptography 2005: 325–341.

27. Bloom BH. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM 1970; 13(7): 422–426.

28. Goh EJ. Secure indexes, 2003. Cryptology ePrint Archive, http://eprint.iacr.org/2003/216/.

29. Lenstra AK, Verheul ER. Selecting cryptographic key sizes. Journal of Cryptology 2001; 14(4): 255–293.
