New order preserving encryption model for outsourced databases in cloud environments

Zheli Liu^a, Xiaofeng Chen^b, Jun Yang^a, Chunfu Jia^a, Ilsun You^c,*

^a College of Computer and Control Engineering, Nankai University, China
^b State Key Laboratory of Integrated Service Networks, Xidian University, China
^c School of Information Science, Korean Bible University, South Korea

Article info

Article history: Received 18 October 2013; received in revised form 7 June 2014; accepted 7 July 2014

Keywords: Order preserving encryption; Outsourced database; Privacy protection; Cloud computing; Ciphertext-only attack

Abstract

The order of the plaintext remains in the ciphertext, so an order-preserving encryption (OPE) scheme is under threat if the adversary is allowed to issue many queries. To hide the order in the ciphertext, the only ideal-security OPE scheme (Popa et al., 2013) requires the database server to maintain extra information and to realize comparison or range queries through user-defined functions (UDFs). However, order operations can then no longer be performed directly on the ciphertext, which affects efficiency and makes the scheme unsuitable for some cases.

In this paper, we aim at constructing an efficient and programmable OPE scheme for outsourced databases. Firstly, we introduce the system model of the outsourced database in which the OPE scheme will be used, and show that security against the ciphertext-only attack is the basic and practical security goal. Secondly, we discuss the statistical attack on OPE schemes and point out that hiding the data distribution and data frequency is important when designing OPE schemes. Thirdly, we propose a new, simple OPE model, which uses message space expansion and nonlinear space splitting to hide the data distribution and frequency, and analyze its security against two kinds of attack in detail. Finally, we discuss implementation details, including how to use our OPE scheme in database applications, and evaluate its performance experimentally. The security analysis and performance evaluation show that our OPE scheme is secure enough and more efficient.

© 2014 Elsevier Ltd. All rights reserved.

Journal of Network and Computer Applications (2014), http://dx.doi.org/10.1016/j.jnca.2014.07.001
* Corresponding author: I. You.

1. Introduction

Order-preserving encryption (OPE) is a common encryption scheme which ensures that the order of the plaintexts is preserved in the ciphertexts. It is appealing because systems can perform order operations on ciphertexts in the same way as on plaintexts: for example, a database server can build an index, perform SQL range queries, and sort encrypted data, all in the same way as for plaintext data. This property results in good performance and requires minimal changes to existing software, making OPE easier to adopt.

In cloud computing and big data environments, OPE is even more useful, because: (1) outsourced databases have attracted much attention recently due to the emergence of cloud computing, but protecting outsourced data stored on an untrusted cloud server has become a serious problem. Because it preserves order, OPE allows the untrusted server to perform database operations, such as comparison and range queries, over encrypted data without decrypting them; (2) in the big data environment, a fruitful direction for future research in data mining will be the development of cryptographic techniques that incorporate privacy concerns. Since most data mining algorithms rely on the order of the data, OPE is also an ideal tool for protecting data privacy with cryptographic techniques while ensuring that correct results can still be mined.

The ideal security goal for an OPE scheme, IND-OCPA (Boldyreva et al., 2009), is to reveal no additional information about the plaintext values besides their order (which is the minimum requirement for the order-preserving property). Until now, the only ideal-security OPE scheme is the mutable order-preserving encoding (mOPE) scheme proposed by Popa et al. (2013), in which the ciphertexts reveal nothing except the order of the plaintext values. mOPE works by building, on the database side, a balanced search tree containing all of the plaintext values encrypted by the application. It requires the encryption protocol to be interactive, and a small number of ciphertexts of already-encrypted values must change as new plaintext values are encrypted (it is straightforward to update a few ciphertexts stored in a database). These database-side operations can be implemented by user-defined functions (UDFs).

How to improve the security of OPE while preserving its functionality and efficiency has long been an open problem. Although mOPE has ideal security, the interaction and tree balancing affect its efficiency. Besides, the UDFs and the maintained balanced tree make it unsuitable for cases in which: (1) the user has no permission to create UDFs in the database, for example, when small companies deploy their web applications on a rented web server with a rented database; (2) the application requires direct order comparison on the ciphertext, for example, when OPE is used to achieve privacy-preserving data publishing for a specific data mining task. Another scheme (Boldyreva et al., 2009) has provable security guarantees: the encryption is equivalent to a random order-preserving mapping. However, the experiment in Popa et al. (2011) shows that it has poor efficiency, with an encryption time of 9 ms. Apart from these, other OPE schemes (Kadhem et al., 2010; Seungmin et al., 2009; Yum et al., 2012) have been proposed, but they all leak more information than just the order of values. Thus, an efficient OPE scheme with practical security is still needed.

In this paper, we aim at proposing a feasible, programmable and secure OPE scheme which is practical for outsourced databases or privacy-preserving data publishing (Fung et al., 2010). In particular, we assume that: (1) the database should support direct order comparison on the ciphertext, i.e., the ciphertext should also be numerical data; (2) the new OPE scheme should have good performance and require minimal changes to existing software, and ideally the OPE ciphertext can be stored in the original field; (3) the basic security goal for an outsourced database is security against the ciphertext-only attack; security against the chosen-plaintext attack can also be achieved if some restrictions are placed on the database system, depending on the scenario.

2. System model

In this section, we briefly discuss the system model for database applications based on cloud storage, in which the OPE scheme will be applied, and then discuss its adversary model.

2.1. Basic model

As shown in Fig. 1, there are three different roles in the model: the owner, the cloud database service provider and the application server.

• Cloud database service provider: It is the service provider who offers the cloud storage service and allows paying customers to store their application data. It helps customers reduce management and maintenance costs and avoid purchasing expensive hardware and database software. However, it must be regarded as untrusted and can be modeled as "honest but curious", i.e., it is interested in the users' private data.

• Owner: It is the data provider, who stores data in the rented cloud database.

• Application server: It is not a necessary role in our model, but database applications based on a three-layer architecture usually use it to process business operations. For database applications based on the "client/server" model, the owner and the application server are the same role. So, in our model, we assume that the application server is as trusted as the owner, and we refer to both as the "OPE client".

There are also two data flows in the model, i.e., storing data and querying data.

• Storing data: To store data, the data owner first uses OPE in the OPE client (owner or application server) to encrypt the data whose order must be preserved, and then stores the ciphertext in the cloud.

• Querying data: To perform a query, the data owner first uses OPE to encrypt the keyword in the range-query or exact-query SQL statement, and then sends the new SQL query to the cloud. The cloud database can directly execute the SQL statement and return the results to the OPE client.

Notice: In our system model, the OPE operations happen in the OPE client, but comparison and range queries are supported directly by the database server. Thus, the OPE scheme is also suitable for privacy-preserving data publishing.

2.2. Adversary model

If OPE encryption is executed on the application server side, we assume that sufficient access control or other effective methods are applied to make sure that the application server does not leak the key information.

We consider two types of attackers:

1. Attackers who have access rights to the database, such as the DBA or the cloud service provider of the outsourced database. They can see the encrypted data and the database structure, but can only launch the ciphertext-only attack. Security against such attackers is the basic and practical security notion.

2. Attackers who have access rights to both the application system and the database. They can access the SQL interpretation interface deployed in database applications, construct SQL statements with plaintext, obtain the interpreted SQL statements with encrypted data, and view all fields and the structure of the database. They have more information with which to guess the encryption details, and can launch chosen-plaintext or chosen-ciphertext attacks in order to guess the encryption key. Security against such attackers is the advanced security notion.

In practical applications, the main threat is the first type of attacker. In this case, a curious attacker can easily obtain the data stored in the database, but it is difficult for him to obtain the encryption key, which can be protected by cryptographic methods. Therefore, security against the ciphertext-only attack is our basic and practical security goal.

3. Related works

In this section, we summarize the related works, discuss the statistical attack on OPE schemes and introduce two typical OPE schemes.

Fig. 1. System model for outsourced database.


3.1. Summary

Table 1 shows the comparison between typical OPE schemes.

Regarding usage, i.e., the order comparison operation, all the OPE schemes except those of Popa et al. (2013) and Boldyreva et al. (2011) support direct comparison on the ciphertext, because their ciphertexts are numerical data. For a database, it would obviously cost plenty of time if other operations (e.g., UDFs as in the scheme of Popa et al., 2013) had to be performed to realize ciphertext comparison. The ideal ciphertext is numerical data with the natural order preserved, and can be stored in the original field.

Regarding security, Boldyreva et al. (2009) were the first to provide a rigorous treatment of OPE security; in fact, they showed that it is infeasible to achieve ideal security for OPE under certain implicit assumptions. As a result, they settled on a weaker security guarantee that was later shown to leak at least half of the plaintext bits (Boldyreva et al., 2011). Popa et al. (2013) presented the first ideal-security order-preserving encoding scheme, in which the ciphertexts reveal nothing except the order of the plaintext values. Although Liu and Wang's scheme (Liu and Wang, 2013) leaks more information and can be dangerous for users, it is efficient and programmable; thus, further research based on their scheme may help in proposing a practical and efficient OPE scheme for database applications.

Regarding efficiency, we can see that security and efficiency are in conflict: the higher the security, the lower the efficiency.

From the above summary we can also conclude that, to achieve high security, OPE must hide the order in the ciphertext and use additional functions to perform order comparison with privacy guarantees, such as the operations in Popa et al. (2013); however, this approach leads to low efficiency, extra storage and no direct comparison on the ciphertext.

3.2. Statistical attack for OPE schemes

As described in the above section, schemes supporting direct order comparison on the ciphertext always have relatively low security, because the order gives the adversary more background knowledge. But these schemes always have good efficiency and require minimal changes to existing database application software, so further research on them is worthwhile.

In this paper, we consider a practical OPE scheme for outsourced databases, in which the most common attack is the ciphertext-only attack. Under this attack, the statistical attack may be the most effective method. We consider an adversary with background knowledge, i.e., one who can obtain some statistical information from other data providers. This kind of adversary is often mentioned in privacy-preserving data mining (Agrawal and Srikant, 2000; Lindell and Pinkas, 2000; Vaidya and Chris, 2000) in the big data environment, but has not yet been discussed for other OPE schemes. Such adversaries can obtain useful statistical information to launch an attack, including:

• Data distribution: From the data distribution of the plaintexts and ciphertexts, the adversary can easily locate ranges of ciphertext. For example, for a given field such as "salary", assume the adversary knows the distribution of employees' salaries shown in Fig. 2; he can first compute statistics on the ciphertexts and then try to determine which ciphertext range contains the dense data from 4000 to 5000.

• Data frequency: From knowledge of the high-frequency data, the adversary can easily identify ciphertext values with the same frequency, and then launch further attacks. As shown in Fig. 2, the adversary may know that the salary with the highest frequency is 5000; he can then easily guess the ciphertext of 5000 by a frequency attack.
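To make the two statistical attacks concrete, the following Python sketch runs them against a deterministic order-preserving stand-in. The function `toy_enc` and the salary counts are hypothetical, chosen only to loosely mirror Fig. 2; the point is that, without noise, the most frequent ciphertext and the sorted order of distinct ciphertexts give the plaintexts away.

```python
from collections import Counter

def frequency_attack(ciphertexts, most_frequent_plaintext):
    """Guess that the most frequent ciphertext encrypts the known most frequent plaintext."""
    top_ct, _ = Counter(ciphertexts).most_common(1)[0]
    return {top_ct: most_frequent_plaintext}

def rank_attack(ciphertexts, known_plaintext_values):
    """Pair the sorted distinct ciphertexts with the sorted known plaintext values.

    Against a deterministic order-preserving scheme this recovers every value
    once the adversary knows which plaintext values occur in the column.
    """
    distinct_cts = sorted(set(ciphertexts))
    distinct_pts = sorted(set(known_plaintext_values))
    if len(distinct_cts) != len(distinct_pts):
        raise ValueError("background knowledge does not match the column")
    return dict(zip(distinct_cts, distinct_pts))

def toy_enc(v):
    """Hypothetical deterministic OPE stand-in: strictly increasing, no noise."""
    return 3 * v + 17

salaries = [4000] * 30 + [4500] * 40 + [5000] * 90 + [6000] * 20
column = [toy_enc(v) for v in salaries]          # what the cloud server stores

print(frequency_attack(column, 5000))            # {15017: 5000}
print(rank_attack(column, {4000, 4500, 5000, 6000}))
# {12017: 4000, 13517: 4500, 15017: 5000, 18017: 6000}
```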

None of the proposed OPE schemes takes the statistical characteristics into consideration except Agrawal et al. (2004), and even Agrawal et al. (2004) did not offer a feasible method against the statistical attack. Therefore, hiding the data distribution and data frequency is very important for OPE schemes supporting direct order comparison, and it is the goal of our OPE scheme.

Table 1
Comparison between typical OPE schemes.

Scheme | Efficiency level | Security level | Order comparison
Agrawal'04 (Agrawal et al., 2004) | Medium | Low | Directly
Boldyreva'09 (Boldyreva et al., 2009) | Low | Medium | Directly
Agrawal'09 (Agrawal et al., 2009) | Medium | Medium | Directly
Boldyreva'11 (Boldyreva et al., 2011) | Low | Medium | Directly
Liu'13 (Liu and Wang, 2013) | High | Low | Directly
Popa'13 (Popa et al., 2013) | Low | High | UDFs

Fig. 2. Data distribution of salaries.

3.3. Liu and Wang's linear scheme

In 2013, Liu and Wang presented a basic scheme which works as follows.

The secret key is a pair $K = (a, b)$ with $a, b \in \mathbb{Z}^{+}$. Each plaintext value $v$ is encrypted as $\mathrm{Enc}(v) = av + b + \mathit{noise}$, where the noise is chosen at random from $\{0, 1, \ldots, a-1\}$ so that the order of the plaintexts does not change after encryption. Unfortunately, this scheme has no security guarantee, because $a$ and $b$ are constant across all encryptions. If the adversary obtains two plaintext/ciphertext pairs,

$$\begin{cases} \mathrm{Enc}(v_1) = a v_1 + b + \mathit{noise}_1 \\ \mathrm{Enc}(v_2) = a v_2 + b + \mathit{noise}_2 \end{cases}$$

he can easily derive $a(v_1 - v_2) + \Delta\mathit{noise} = \mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)$, where $\Delta\mathit{noise} = \mathit{noise}_1 - \mathit{noise}_2$ satisfies $|\Delta\mathit{noise}| \le a - 1$.

Then, for $v_1 - v_2 > 1$, the adversary obtains a range for $a$:

$$\frac{\mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)}{v_1 - v_2 + 1} \le a \le \frac{\mathrm{Enc}(v_1) - \mathrm{Enc}(v_2)}{v_1 - v_2 - 1}$$

So the key space $\mathcal{K}$ shrinks considerably and the parameter $a$ is easily learned. After some statistical analysis of a series of values of the form $b + \mathit{noise}$, the value of $b$ can be obtained as well.

Liu and Wang's quasi-linear encryption scheme is therefore insecure against such an attack. Although it is obviously feasible for outsourcing, it fails to protect the information in the database. Nevertheless, this quasi-linear scheme is indeed efficient, with a simple closed form for encryption and decryption.
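The two-pair attack on Liu and Wang's scheme can be sketched in a few lines of Python. This is a minimal illustration, assuming integer plaintexts that differ by more than 1 and noise drawn uniformly from $\{0, \ldots, a-1\}$; the key values and helper names are hypothetical. For plaintexts 3000 apart, the bound on $a$ is narrower than 1, so $a$ is pinned down exactly, and the residues $c - av = b + \mathit{noise}$ then bracket $b$.

```python
import math
import random

def liu_wang_encrypt(v, a, b, rng):
    """Liu and Wang's basic scheme: Enc(v) = a*v + b + noise, noise in {0, ..., a-1}."""
    return a * v + b + rng.randrange(a)

def bound_a(v1, c1, v2, c2):
    """Interval for the secret slope a from two plaintext/ciphertext pairs.

    With dv = v1 - v2 > 1 and dc = c1 - c2 = a*dv + (noise1 - noise2),
    |noise1 - noise2| <= a - 1 gives dc/(dv + 1) < a < dc/(dv - 1).
    """
    if v1 < v2:
        v1, c1, v2, c2 = v2, c2, v1, c1
    dv, dc = v1 - v2, c1 - c2
    if dv <= 1:
        raise ValueError("need two plaintexts that differ by more than 1")
    return dc / (dv + 1), dc / (dv - 1)

rng = random.Random(7)
a, b = 1000, 123456                       # secret key, illustrative values
pairs = [(v, liu_wang_encrypt(v, a, b, rng)) for v in (5000, 2000)]
(v1, c1), (v2, c2) = pairs

lo, hi = bound_a(v1, c1, v2, c2)
candidates = [x for x in range(math.floor(lo), math.ceil(hi) + 1) if lo < x < hi]
print(candidates)                         # [1000]: a is recovered exactly

# With a known, each residue c - a*v equals b + noise with noise in [0, a-1].
a_guess = candidates[0]
offsets = [c - a_guess * v for v, c in pairs]
b_lo, b_hi = max(offsets) - (a_guess - 1), min(offsets)
print(b_lo, "<= b <=", b_hi)              # more pairs narrow this further
```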

3.4. Popa et al.'s mutable tree scheme

To achieve ideal security, Popa et al. proposed an order-preserving encoding scheme based on a binary search tree. In their scheme, all the plaintext values are put in order and each of them is given its sequence number as the encoding. Suppose the plaintext values are $\{v_j\}$ with $v_j < v_{j+1}$ for each $j = 0, 1, \ldots, n-1$; then the encoding value is $\mathrm{Enc}(v_j) = j$.

In this way, all the pairs $(j, v_j)$ can be put in a binary tree. According to its path from the root, each pair $(j, v_j)$ has a binary path string, and different pairs have paths of different lengths. Therefore, to obtain encodings of the same length, the OPE encoding is defined as follows:

$$\text{OPE encoding of } v_j = [\mathit{path}]\,1\,0\cdots0$$

The order of the plaintext values is preserved by this encoding. To prevent the encodings of some nodes in the binary tree from becoming too large, they apply tree balancing to maintain a B-tree. But the cost of the insert, delete and lookup operations remains relatively high. For instance, a full binary tree of height $h$ has $n = \sum_{i=0}^{h-1} 2^i = 2^h - 1$ nodes. A common operation on such a tree visits about $h$ nodes, i.e., costs $O(\log n)$.

As a result, it is indeed a strictly secure scheme: it leaks only the order of the plaintexts and nothing else. Unfortunately, the interaction and tree balancing affect its efficiency.
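The $[\mathit{path}]\,1\,0\cdots0$ encoding can be illustrated with a small Python sketch. It assumes a left edge contributes bit 0 and a right edge bit 1 (the text does not fix this convention), and it uses a plain unbalanced binary search tree built locally, so it shows only why the encoding is order-preserving, not mOPE's interactive protocol or its B-tree balancing.

```python
class Node:
    """Binary search tree node holding one plaintext value."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Plain BST insertion (no balancing, unlike mOPE's B-tree)."""
    if root is None:
        return Node(value)
    node = root
    while value != node.value:
        side = "left" if value < node.value else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(value))
            break
        node = child
    return root

def encodings(root, length):
    """Return {value: int} whose bits are [path] 1 0...0, padded to `length` bits.

    Assumed convention: a left edge contributes 0 and a right edge contributes 1.
    """
    out = {}
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if node is None:
            continue
        bits = path + "1"
        if len(bits) > length:
            raise ValueError("tree is deeper than the encoding length allows")
        out[node.value] = int(bits.ljust(length, "0"), 2)
        stack.append((node.left, path + "0"))
        stack.append((node.right, path + "1"))
    return out

values = [50, 30, 70, 20, 40, 60, 80, 35]
root = None
for v in values:
    root = insert(root, v)
enc = encodings(root, length=8)
print(sorted(enc.items()))
# [(20, 32), (30, 64), (35, 80), (40, 96), (50, 128), (60, 160), (70, 192), (80, 224)]
```

Left-subtree nodes carry a 0 where their ancestor carries its terminating 1, and right-subtree nodes carry an extra 1 after it, so comparing the padded integers reproduces the in-order (plaintext) order.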

4. Technical preliminaries

4.1. Order-preserving encryption

A regular cryptosystem $\{\mathcal{M}, \mathcal{C}, \mathcal{K}, \mathrm{Enc}, \mathrm{Dec}\}$ consists of the plaintext space $\mathcal{M}$, the ciphertext space $\mathcal{C}$, the key space $\mathcal{K}$, the encryption algorithm Enc and the decryption algorithm Dec.

An order-preserving encryption scheme is then an encryption (or encoding) scheme such that, given an order $<$ on the plaintexts, $x < y$ implies $\mathrm{Enc}(x) < \mathrm{Enc}(y)$ (and vice versa: if $\mathrm{Enc}(x) < \mathrm{Enc}(y)$, then $x < y$).

4.2. General idea

The order of the plaintext remains in the ciphertext, so such a cryptosystem is under threat if the adversary can query many times, as proved in Lemma 4.1. Although mOPE achieves ideal security, the interaction and tree balancing affect its efficiency. Liu's scheme is insecure because any two plaintext/ciphertext pairs obtained by a successful chosen-plaintext attack expose the whole cryptosystem; however, its linear operation gives it good efficiency and makes it programmable. Building on Liu's scheme, we try to present an efficient and more secure OPE scheme for outsourced databases.

The intuition behind our OPE scheme is simple. Firstly, we randomly split the original message space into successive intervals of different lengths. Secondly, we select an extended ciphertext space and split it into the same number of intervals. Finally, we use nonlinear mapping functions to map each original element to an element of the extended space, using a different mapping function for each interval.

There are two key points, described as follows:

1. Extending the message space: Extending the message space is the precondition of our OPE scheme, for two reasons. The first reason is that databases support storing the ciphertext in the extended message space. For example, in Oracle it is feasible to change the original field's datatype from "NUMBER(8,0)" to a higher-precision type such as "NUMBER(14,6)", which still holds eight digits before the decimal point, so such a change from low precision to high precision causes no loss of data. The second reason is that it helps to hide the data frequency: the same value can be randomly mapped into a range of the extended ciphertext space, so the frequency of this value is hidden.

2. Nonlinear splitting of the message space: Splitting the message space provides an effective way to hide the data distribution. For example, one adopted method is: the more data a range contains, the more intervals it is split into; and a dense interval containing high-frequency data is assigned a large corresponding ciphertext interval.

4.3. Extended message space

Both the plaintext space $\mathcal{M}$ and the ciphertext space $\mathcal{C}$ are treated as metric spaces. That is, there is a function $d(x, y)$ measuring distance, where

• $d(x, y) = 0$ if and only if $x = y$;
• $d(x, y) = d(y, x)$;
• $d(x, y) + d(y, z) \ge d(x, z)$.

Since the plaintext space $\mathcal{M}$ is encoded as a set of successive integers with $d(x, y) = |x - y|$, it always satisfies the above three conditions. The ciphertext space $\mathcal{C}$, however, needs to satisfy only the first two, since we only care about how dispersed the ciphertexts are.

Suppose $\mathcal{M} = \{1, 2, \ldots, M\}$, so $|\mathcal{M}| = M$, i.e., $\mathcal{M}$ has $M$ elements, and the distance between any two adjacent elements of $\mathcal{M}$ is obviously 1. We can set $|\mathcal{C}| = M$ and $\mathcal{C} = \{c_1, c_2, \ldots, c_M\}$, where $c_i = \mathrm{Enc}(i)$, and require the distance between any two adjacent elements (e.g., $c_i$ and $c_{i+1}$) to be much greater than 1. Otherwise, if $d(c_i, c_{i+1}) = 1$, the ciphertext space is almost the same as the plaintext space ($\mathcal{C}$ is just a shift of $\mathcal{M}$). In fact, we want the ciphertexts to spread over a space much larger than $\mathcal{M}$.

Lemma 4.1. Suppose $|\mathcal{M}| = M$ and Enc is order-preserving. Then, under the chosen-plaintext attack, the plaintext of any given ciphertext $\mathrm{Enc}(x)$ can be determined after about $\log_2 M$ queries.

Proof. Suppose the adversary wants to find the plaintext $x$ of a known $\mathrm{Enc}(x)$. Under the chosen-plaintext attack, the adversary can select a value $m$ and query the encryption oracle to obtain its ciphertext $\mathrm{Enc}(m)$. For convenience, let $\mathcal{M}_0$ be the set of values that $x$ may take.

In the beginning, $\mathcal{M}_0 = \{1, 2, \ldots, M\}$, and the adversary first selects $m \in \mathcal{M}_0$; for efficiency, $m$ can be chosen as about $M/2$. Because Enc is order-preserving, if $\mathrm{Enc}(m) < \mathrm{Enc}(x)$, then $x \in \{m+1, m+2, \ldots, M\}$; otherwise, $x \in \{1, 2, \ldots, m\}$. So, after this query, the reduced range $\mathcal{M}_0$ containing $x$ is determined.

By repeating the above operation, i.e., selecting $m \in \mathcal{M}_0$ (as the middle element of $\mathcal{M}_0$) and determining the reduced range $\mathcal{M}_0$, after about $k = \log_2 M$ queries the adversary finally obtains $x$.

For example, for $|\mathcal{M}| = 2^k$, the range is halved by each query, so it contains only one element after at most $k$ queries.
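The proof of Lemma 4.1 is a binary search driven only by ciphertext comparisons. A minimal Python sketch, assuming a deterministic, strictly increasing encryption function (the polynomial `secret_enc` is a hypothetical stand-in) and an oracle wrapper that counts chosen-plaintext queries:

```python
import math

def make_counting_oracle(enc):
    """Wrap an encryption function so every chosen-plaintext query is counted."""
    counter = {"queries": 0}
    def oracle(m):
        counter["queries"] += 1
        return enc(m)
    return oracle, counter

def recover_plaintext(target_ct, oracle, M):
    """Find x in {1, ..., M} with Enc(x) == target_ct using only order comparisons."""
    lo, hi = 1, M                    # invariant: lo <= x <= hi
    while lo < hi:
        m = (lo + hi) // 2
        if oracle(m) < target_ct:    # Enc(m) < Enc(x) implies m < x
            lo = m + 1
        else:                        # Enc(x) <= Enc(m) implies x <= m
            hi = m
    return lo

def secret_enc(v):
    """Hypothetical deterministic, strictly increasing Enc (nonlinear on purpose)."""
    return 7 * v * v + 3 * v + 11

M = 2 ** 20
oracle, counter = make_counting_oracle(secret_enc)
x = 777_777
found = recover_plaintext(secret_enc(x), oracle, M)
print(found, counter["queries"], math.ceil(math.log2(M)))   # 777777 20 20
```

Nothing in the search depends on the shape of Enc beyond monotonicity, which is why hiding order alone cannot be achieved by choosing a cleverer increasing function.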

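Lemma 4.2. Let $\mathcal{M}$ be a set of integers with $|\mathcal{M}| = M$, and let the encryption function $\mathrm{Enc}(x)$ satisfy

$$k \le \frac{d(\mathrm{Enc}(i), \mathrm{Enc}(j))}{d(i, j)} \le K \quad (i \ne j).$$

If we add a random noise $\delta$ with $|\delta| < \frac{1}{2}k$ to $\mathrm{Enc}(i)$, then $\mathrm{Enc}(i)$ is still reversible. In other words, each noisy value $\mathrm{Enc}(i) + \delta$ maps back to a unique $i$.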

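Proof. In this case, $\mathcal{C}$ is a $k$- to $K$-fold extension of $\mathcal{M}$. For any adjacent integers $i, i+1$, we have

$$k \le \frac{d(\mathrm{Enc}(i), \mathrm{Enc}(i+1))}{d(i, i+1)} = d(\mathrm{Enc}(i), \mathrm{Enc}(i+1)) \le K,$$

so the gap between $\mathrm{Enc}(i)$ and $\mathrm{Enc}(i+1)$ is no less than $k$. Hence, if $|\delta| < \frac{1}{2}k$, the noisy value $\mathrm{Enc}(i) + \delta$ is closer to $\mathrm{Enc}(i)$ than to any other element of $\{\mathrm{Enc}(i)\}_{i=1}^{M}$, so decoding $\mathrm{Enc}(i) + \delta \mapsto \mathrm{Enc}(i)$ is unambiguous and $\mathrm{Enc}(i)$ remains reversible. The values in $\{\mathrm{Enc}(i)\}_{i=1}^{M}$ stay isolated from each other, so none of them merge, and the order is preserved. □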
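Lemma 4.2 can be checked numerically. The Python sketch below builds a hypothetical increasing Enc on $\{1, \ldots, 1000\}$ whose consecutive gaps lie in $[k, K]$, adds integer noise with $|\delta| \le \lfloor (k-1)/2 \rfloor < k/2$, and decodes each noisy value to the nearest $\mathrm{Enc}(i)$. The gap values, seed and helper name are illustrative assumptions.

```python
import bisect
import random

def nearest_index(y, centers):
    """Index i of the element of the sorted list `centers` closest to y."""
    j = bisect.bisect_left(centers, y)
    if j == 0:
        return 0
    if j == len(centers):
        return len(centers) - 1
    return j if centers[j] - y < y - centers[j - 1] else j - 1

rng = random.Random(1)
k, K = 10, 40
# Hypothetical Enc on {1, ..., 1000}: consecutive gaps drawn from [k, K].
centers, c = [], 0
for _ in range(1000):
    c += rng.randint(k, K)
    centers.append(c)

h = (k - 1) // 2                     # integer noise bound, so |delta| <= h < k/2
noisy = [ci + rng.randint(-h, h) for ci in centers]
decoded = [nearest_index(y, centers) + 1 for y in noisy]     # back to plaintexts 1..M
print(decoded == list(range(1, 1001)))                        # True
print(all(p < q for p, q in zip(noisy, noisy[1:])))           # True: order kept
```

Each noisy value stays within $h$ of its own center and at least $k - h > h$ from every other center, which is exactly the isolation argument in the proof.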

We should mention that if $d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j) = k$ for a constant $k$, then the encryption function is linear, e.g., $\mathrm{Enc}(x) = kx + b$, and it is broken after just two queries. Similarly, if $|d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j) - k| \le \delta$ with $\delta = o(k)$, then $\mathrm{Enc}(x)$ is semi-linear, as in Liu's scheme discussed above.

The conclusion we want to present here is that the more the value of $d(\mathrm{Enc}(i), \mathrm{Enc}(j))/d(i, j)$ varies over different pairs $(i, j)$, the more secure the order-preserving cryptosystem is.

5. New OPE model

Based on the two key points described in Section 4.2, we extend the ciphertext space so that the ciphertexts are more dispersed, and propose a new OPE model.

5.1. Notations

For convenience, we introduce the notation used in the rest of the paper:

• Let $D_i$ be an interval of the original message space, $D_i = (l_i, r_i]$ $(i = 1, 2, \ldots, m)$, where $l_i$ is the left endpoint and $r_i$ is the maximal value of $D_i$; if $x \in D_i$, then $l_i < x \le r_i$. For two adjacent intervals $D_i$ and $D_{i+1}$, we further have $r_i = l_{i+1}$.

• Let $C_i$ be an interval of the extended ciphertext space, $C_i = (l'_i, r'_i]$ $(i = 1, 2, \ldots, m)$.

• Let $\mathrm{Enc}(\cdot)$ be a family of monotonically increasing functions, composed of a different function $\mathrm{Enc}_i(\cdot)$ for each interval $D_i$. Thus $\mathrm{Enc}_i(x)$ denotes a concrete function, where $x \in D_i$, i.e., $l_i < x \le r_i$.

• Let $\mathrm{range}(i)$ be the function returning the left and right endpoints of interval $D_i$; its output is $(l, r)$.

• Let $\mathrm{range}'(i)$ be the function returning the left and right endpoints of interval $C_i$; its output is $(l', r')$.

• Let $\mathrm{index}(x)$ be the function returning the index $i$ of the interval $D_i$ that contains $x$.
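A minimal Python sketch of $\mathrm{index}(x)$ and $\mathrm{range}(i)$, assuming the split is stored as sorted split points $b_0 < b_1 < \cdots < b_m$ with $D_i = (b_{i-1}, b_i]$; $\mathrm{range}'(i)$ is the same structure built over the ciphertext split points. The class name and the example split points are hypothetical. Because the intervals are left-open, `bisect_left` returns the 1-based index directly.

```python
import bisect

class IntervalSplit:
    """Intervals (b[i-1], b[i]] for i = 1..m, given split points b[0] < b[1] < ... < b[m]."""

    def __init__(self, bounds):
        if len(bounds) < 2 or any(p >= q for p, q in zip(bounds, bounds[1:])):
            raise ValueError("need at least two strictly increasing split points")
        self.bounds = list(bounds)

    @property
    def m(self):
        return len(self.bounds) - 1

    def index(self, x):
        """Return the 1-based i with l_i < x <= r_i."""
        i = bisect.bisect_left(self.bounds, x)
        if not 1 <= i <= self.m:
            raise ValueError(f"{x} is outside ({self.bounds[0]}, {self.bounds[-1]}]")
        return i

    def range(self, i):
        """Return (l_i, r_i); adjacent intervals share an endpoint, r_i == l_{i+1}."""
        if not 1 <= i <= self.m:
            raise IndexError(i)
        return self.bounds[i - 1], self.bounds[i]

D = IntervalSplit([0, 2000, 4000, 5000, 8000, 20000])          # plaintext split
C = IntervalSplit([0, 6000, 36000, 236000, 278000, 290000])    # ciphertext split
print(D.index(4500), D.range(3), C.range(3))   # 3 (4000, 5000) (36000, 236000)
```

Both lookups are $O(\log m)$, which meets the feasibility requirement that $\mathrm{index}$ and $\mathrm{range}$ be cheap to compute.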

5.2. Operational principle

As shown in Fig. 3, there are three steps in our OPE model:

5.2.1. Splitting the message space

To extend the message space, the first step is to split the message space $\mathcal{M}$ into a sequence of intervals $D_i = (l_i, r_i]$ $(i = 1, 2, \ldots, m)$.

Because $\mathcal{M}$ is always a discrete space, we set $l_i, r_i \in \mathbb{Z}$ satisfying

$$\begin{cases} \mathcal{M} = \bigcup_{i=1}^{m} D_i = \bigcup_{i=1}^{m} (l_i, r_i] \\ (l_i, r_i] \cap (l_j, r_j] = \emptyset \quad (i \ne j) \end{cases}$$

Splitting the message space helps to destroy the pattern of the data distribution: the more data a range contains, the more intervals it should be divided into. How to split is described in detail in Section 5.3.

5.2.2. Splitting the ciphertext space

The second step is to split the ciphertext space $\mathcal{C}$ into a sequence of intervals $C_i = (l'_i, r'_i]$ $(i = 1, 2, \ldots, m)$, where $l'_i, r'_i \in \mathbb{Z}$ satisfy

$$\begin{cases} \mathcal{C} = \bigcup_{i=1}^{m} C_i = \bigcup_{i=1}^{m} (l'_i, r'_i] \\ (l'_i, r'_i] \cap (l'_j, r'_j] = \emptyset \quad (i \ne j) \end{cases}$$

Given an original data interval $D_i$, its corresponding ciphertext interval is $C_i$, and we further have $\mathrm{Enc}(l_i) = l'_i$ and $\mathrm{Enc}(r_i) = r'_i$. Notice that this split also helps to destroy the pattern of the data distribution: the more data in $D_i$, the larger the range of $C_i$ should be.

5.2.3. Mapping each data item to the extended space

After both splits, the third step is to map each data item to the extended ciphertext space $\mathcal{C}$. We use the function $\mathrm{Enc}_i(x)$ (one piece of a piecewise function) to extend $D_i = (l_i, r_i]$ to $C_i = (l'_i, r'_i]$ as follows.

For each value $x$ in $\mathcal{M}$, we find the interval $D_i = (l_i, r_i]$ that contains $x$. For each interval $D_i$, a different function $\mathrm{Enc}_i(\cdot)$ is used to map each value into the target interval $C_i = (l'_i, r'_i]$. The overall process is:

$$x \in D_i = (l_i, r_i] \xrightarrow{\ \mathrm{Enc}_i(\cdot)\ } \mathrm{Enc}_i(x) \in C_i = (l'_i, r'_i]$$

To achieve security against the frequency attack, an effective way is to use a one-to-many mapping function $\mathrm{Enc}(\cdot)$. So the key of our OPE scheme is how to design such encryption functions, and we discuss the details in Section 5.4.

Fig. 3. Operational principle of OPE.

5.3. Split methods

To implement the split, we should give the parameters:

$$\mathit{paras} = (x_{\min}, x_{\max}, \{T_i\}, d_{\min})$$

where $x_{\min}$ and $x_{\max}$ are the start and end points of the plaintext space, $\{T_i\}$ is the set of dense intervals and $d_{\min}$ is the minimal interval length we can set.

In fact, even with no dense intervals offered, our extension plan still breaks the statistical characteristics. It is common that there is no information about the distribution before encryption. In the next section, we will prove that our scheme, by breaking the distribution, is secure against the ciphertext-only attack.

The goal of the split is to break the statistical characteristics. We now discuss how to split the message space $\mathcal{M}$ and the ciphertext space $\mathcal{C}$ to achieve it.

The principles for designing such splits are as follows:

• For an original data collection, the more data a range contains, the more intervals it should be divided into; we call these intervals dense intervals.

• For an original dense interval $D_i$ containing high-frequency data, the corresponding ciphertext interval $C_i$ should have a large range, i.e., for two adjacent elements $x_1, x_2 \in C_i$, $d(x_1, x_2) \gg 1$, or $|r'_i - l'_i| \gg |r_i - l_i|$.

The above principles help to destroy the data distribution, because a dense interval is extended to a ciphertext interval with a large range while a sparse interval is extended to one with a small range; in this way, the ciphertexts become close to uniformly distributed.

Notice: To be feasible, the method of splitting the ciphertext space should satisfy: (a) for any ciphertext $y \in \mathcal{C}$, it is easy to get $\mathrm{index}(y)$, the number of the split interval containing $y$; (b) for any index $i$, it is also easy to get the bounds $(l'_i, r'_i)$ of interval $C_i$. Moreover, $\mathrm{range}$ and $\mathrm{index}$ take $\mathit{paras}$ as input and must be efficiently computable.

Designing such an ideal split may be difficult. For our OPEmodel, we will provide a simple solution in Section 7.2.
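The second split principle can be illustrated with a hypothetical sizing rule, sketched in Python: give each $D_i$ an integer scale $a_i$ so that the ciphertext width $a_i \cdot |D_i|$ is roughly proportional to the number of records in $D_i$. This is not the paper's split method. It does not choose the plaintext split points or use $\{T_i\}$ and $d_{\min}$, and the record counts are made up in the spirit of Fig. 2. It only shows how widening dense intervals flattens the per-unit density of ciphertexts.

```python
import math

def allocate_scales(plain_bounds, counts, units_per_record=100, min_scale=1):
    """Hypothetical sizing rule: width(C_i) = a_i * width(D_i) ~ units_per_record * count_i.

    plain_bounds: split points b[0] < ... < b[m]; D_i = (b[i-1], b[i]].
    counts: number of records (expected or observed) in each D_i.
    Returns (scales, cipher_bounds) with C_i = (c[i-1], c[i]].
    """
    if len(counts) != len(plain_bounds) - 1:
        raise ValueError("need one count per interval")
    scales, cipher_bounds = [], [0]
    for (l, r), n in zip(zip(plain_bounds, plain_bounds[1:]), counts):
        width = r - l
        a = max(min_scale, math.ceil(units_per_record * n / width))
        scales.append(a)
        cipher_bounds.append(cipher_bounds[-1] + a * width)
    return scales, cipher_bounds

bounds = [0, 2000, 4000, 5000, 8000, 20000]
counts = [50, 300, 2000, 400, 50]          # made-up salary counts, dense in (4000, 5000]
scales, cbounds = allocate_scales(bounds, counts)
print(scales)    # [3, 15, 200, 14, 1]
print(cbounds)   # [0, 6000, 36000, 236000, 278000, 290000]

for i, n in enumerate(counts):
    plain_density = n / (bounds[i + 1] - bounds[i])
    cipher_density = n / (cbounds[i + 1] - cbounds[i])
    print(f"D_{i + 1}: {plain_density:.4f} per plaintext unit, "
          f"{cipher_density:.4f} per ciphertext unit")
```

In this example the plaintext densities differ by a factor of about 480, while the ciphertext densities differ by a factor of about 3, the remaining gap coming from the `min_scale` floor on the sparse last interval.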

5.4. Encrypt function

The encryption function $\mathrm{Enc}(\cdot)$ should satisfy the following two properties:

1. $\mathrm{Enc}(\cdot)$ is solvable: for any $x \in D_i$, the ciphertext $\mathrm{Enc}_i(x)$ is easy to compute programmatically, and given $y_i = \mathrm{Enc}_i(x_i)$, the corresponding $x_i = \mathrm{Enc}_i^{-1}(y_i)$ is also computable.
2. For $k_i$ and $K_i$ in $k_i \le d(\mathrm{Enc}_i(x_1), \mathrm{Enc}_i(x_2))/d(x_1, x_2) \le K_i$: $k_i$ must not be too small, to keep the scheme secure, and $K_i$ must not be too large, to avoid wasting storage space.

Computational process: To be programmable, for the $i$th interval and $x \in D_i$, $\mathrm{Enc}_i(x)$ takes the ranges of $D_i$ and $C_i$ as input, i.e., $(l_i, r_i) \leftarrow \mathrm{range}(i)$ and $(l'_i, r'_i) \leftarrow \mathrm{range}'(i)$, and outputs the result $y \in C_i$. The simplest computational process is as follows:

• compute $\mathit{scale} = (l'_i - r'_i)/(l_i - r_i)$;
• map $x$ to the ciphertext interval, i.e., $x' = l'_i + \mathit{scale} \cdot (x - l_i)$;
• add noise to $x'$, i.e., compute $x' = x' + r$, where $r$ is a random value in $(0, \mathit{scale})$.
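The three steps above can be sketched for a single interval in Python. Two assumptions are made for illustration and are not stated in the text: the width of $C_i$ is an integer multiple of the width of $D_i$, so the scale is an integer and all arithmetic is exact; and the noise is taken just below $x' = l'_i + \mathit{scale} \cdot (x - l_i)$ rather than above it, so that $x = r_i$ still lands inside the half-open $C_i = (l'_i, r'_i]$. Apart from that shift of one block, this is the same linear map plus uniform noise of width $\mathit{scale}$.

```python
import random

def _scale(plain_range, cipher_range):
    """scale = (r' - l') / (r - l), required here to be a positive integer."""
    (l, r), (lc, rc) = plain_range, cipher_range
    scale, rem = divmod(rc - lc, r - l)
    if scale < 1 or rem:
        raise ValueError("sketch assumes width(C_i) is a positive multiple of width(D_i)")
    return scale

def enc_i(x, plain_range, cipher_range, rng):
    """Map x in D_i = (l, r] into C_i = (l', r'] with an integer scale and random noise."""
    l, r = plain_range
    lc, _ = cipher_range
    if not l < x <= r:
        raise ValueError("x is not in D_i")
    scale = _scale(plain_range, cipher_range)
    top = lc + scale * (x - l)              # x' = l' + scale * (x - l)
    return top - rng.randrange(scale)       # noise in {0, ..., scale-1}, taken below x'

def dec_i(y, plain_range, cipher_range):
    """Invert enc_i: y lies in the block (top - scale, top] reserved for x."""
    l, _ = plain_range
    lc, _ = cipher_range
    scale = _scale(plain_range, cipher_range)
    return l + 1 + (y - lc - 1) // scale

rng = random.Random(3)
D3, C3 = (4000, 5000), (36000, 236000)      # hypothetical D_i and C_i, scale = 200
ys = [enc_i(4500, D3, C3, rng) for _ in range(5)]
print(ys, {dec_i(y, D3, C3) for y in ys})   # several ciphertexts of 4500, all decrypt to 4500
```

Each plaintext owns a disjoint block of `scale` ciphertext values, and blocks are laid out in plaintext order, so comparisons on ciphertexts still agree with comparisons on plaintexts.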

In fact, $\mathrm{Enc}(x)$ can be generated from any increasing function to achieve a nonlinear mapping. But here, we use the above linear mapping to introduce our OPE model and, for convenience, use the form of Liu's scheme to describe $\mathrm{Enc}_i(x)$ as

$$\mathrm{Enc}_i(x) = a_i x + b_i + \delta_i,$$

and in this description:

• $a_i = \mathit{scale}$ and $b_i = l'_i - \mathit{scale} \cdot l_i$.
• $\delta_i$ is the noise, with $|\delta_i| < \frac{1}{2}a_i$, and $k_i = \mathrm{Enc}'_i(x) = a_i = K_i$.

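As a result, the encryption function $\mathrm{Enc}(\cdot)$ can be written as

$$\mathrm{Enc}(x) = \sum_{i=1}^{m} \mathrm{Enc}_i(x)\,\delta(x, D_i) = \sum_{i=1}^{m} (a_i x + b_i + \delta_i)\,\delta(x, D_i)$$

where

$$\delta(x, D_i) = \begin{cases} 1, & x \in D_i, \\ 0, & x \notin D_i. \end{cases}$$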

Noise: It should be mentioned that the noise $\delta_i$ helps to break the statistical characteristics. Consider, for instance, some $x_i$ with a high probability in the distribution. If $\delta_i$ is uniformly distributed subject to $|\delta_i| < a_i/2$, the distribution of the ciphertext is

$$\Pr\{y = a_i x_i + b_i + \delta_i\} = \frac{\Pr\{x = x_i\}}{|\delta_i|},$$

where $|\delta_i|$ here denotes the number of values the noise can take. As shown in Section 4 (Lemma 4.2), in this way we obtain a one-to-many function

$$x_i \mapsto y = a_i x_i + b_i + \delta_i \quad \text{with } |\delta_i| < \tfrac{1}{2}a_i.$$

Viewed as a map from a point to a set, it is also a one-to-one function; in other words, it is reversible. So the dense distribution around $x_i$ is reduced in this way, and the statistical characteristics are altered.
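The frequency-hiding effect of the noise can be observed with a short Python sketch of $\mathrm{Enc}_i(x) = a_i x + b_i + \delta_i$. It assumes integer noise drawn uniformly with $|\delta_i| \le \lfloor (a_i - 1)/2 \rfloor < a_i/2$, so decryption is rounding to the nearest integer, done here with exact integer arithmetic. The values of $a_i$, $b_i$ and the column contents are hypothetical.

```python
import random
from collections import Counter

def enc_linear(x, a, b, rng):
    """Enc_i(x) = a*x + b + delta with integer noise |delta| <= (a - 1) // 2 < a/2."""
    h = (a - 1) // 2
    return a * x + b + rng.randint(-h, h)

def dec_linear(y, a, b):
    """Nearest-integer inversion: y - b + h lies in [a*x, a*x + a - 1]."""
    h = (a - 1) // 2
    return (y - b + h) // a

rng = random.Random(2014)
a, b = 201, 7000                       # hypothetical a_i, b_i for one dense interval
column = [5000] * 1000 + [4999] * 10   # 5000 is the high-frequency salary
cts = [enc_linear(x, a, b, rng) for x in column]

freq = Counter(cts)
print("distinct ciphertexts:", len(freq))
print("largest ciphertext count:", max(freq.values()), "(plaintext count was 1000)")
```

The 1000 copies of 5000 are spread over up to 201 ciphertext values, so no single ciphertext stands out the way the plaintext 5000 does, while every ciphertext still decrypts correctly and all ciphertexts of 4999 stay below those of 5000.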

5.5. OPE cryptosystem

Based on the above split function and encryption function, we use three algorithms, OPE = (Setup, Encrypt, Decrypt), to describe our OPE cryptosystem:

• Setup(): This algorithm is run by the OPE client to set up the scheme. It must set the right parameters, including:
  1. Message space $\mathcal{M}$: it must set the minimum value $x_{\min}$ and the maximum value $x_{\max}$.
  2. Ciphertext space $\mathcal{C}$: it must set the minimum value $y_{\min}$ and the maximum value $y_{\max}$.
  3. Parameters for splitting the message space: in our implementation, it must set the dense intervals $\{T_i\}$ and the minimal interval length $d_{\min}$.
  4. Parameters for splitting the ciphertext space: in our implementation, it must set the corresponding dense intervals $\{T'_i\}$.
  The final key $sk$ is composed of $(x_{\min}, x_{\max}, \{T_i\}, \{T'_i\}, d_{\min})$.

• Encrypt(x, sk): This algorithm is run by the OPE client to encrypt the data $x$ and output its OPE ciphertext. To encrypt $x$, it first gets $i$ by computing $\mathrm{index}(x)$ and then runs $\mathrm{Enc}_i(x)$ to output the ciphertext. We should mention that $\delta_i$ is randomly generated and satisfies $|\delta_i| < \frac{1}{2}a_i$.






� Decrypt(y,sk): This algorithm is run by the OPE client to decryptthe data y and outputs its OPE plaintext. In fact, the decryptionis done by

x¼ y�biai

� �:

Before decryption, it should firstly get i of which interval Cicontains y.
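A toy end-to-end version of Setup, Encrypt and Decrypt is sketched below in Python. It is only an illustration under stated assumptions: integer plaintexts, and a key given directly as a list of intervals with explicit $(a_i, b_i)$ rather than the key format $(x_{min}, x_{max}, \{T_i\}, \{T'_i\}, d_{min})$ described above. The class names and the parameter values are hypothetical, chosen so that the ciphertext ranges $C_i$ are disjoint and increasing:

```python
import random
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Interval:
    lo: int  # plaintext interval D_i = [lo, hi]
    hi: int
    a: int   # slope a_i
    b: int   # offset b_i


class ToyOPE:
    """Hypothetical piecewise-linear OPE following Setup/Encrypt/Decrypt.

    Assumes consecutive integer intervals whose ciphertext ranges
    C_i = [a*lo + b - h, a*hi + b + h], with h = (a - 1) // 2, are
    disjoint and increasing.
    """

    def __init__(self, intervals, seed=None):  # Setup
        self.iv = sorted(intervals, key=lambda d: d.lo)
        self.his = [d.hi for d in self.iv]
        self.c_his = [d.a * d.hi + d.b + (d.a - 1) // 2 for d in self.iv]
        self.rng = random.Random(seed)

    def encrypt(self, x):
        d = self.iv[bisect_left(self.his, x)]  # i = index(x)
        half = (d.a - 1) // 2                  # so |delta| < a_i / 2
        return d.a * x + d.b + self.rng.randint(-half, half)

    def decrypt(self, y):
        d = self.iv[bisect_left(self.c_his, y)]  # the C_i that contains y
        return round((y - d.b) / d.a)           # rounding removes the noise


ope = ToyOPE([Interval(0, 49, 3, 0), Interval(50, 79, 11, -390),
              Interval(80, 100, 5, 100)], seed=1)
print(ope.decrypt(ope.encrypt(72)))  # 72
```

The dense interval [50, 79] gets the largest slope here, mirroring the idea that dense plaintext regions are expanded over more ciphertext space.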

6. Security and maintenance

For the above OPE cryptosystem, we analyze its security against the ciphertext-only attack and against a particular chosen-plaintext attack defined below. In our model, only ciphertexts are stored in the untrusted database, so the adversary can obtain some ciphertexts but nothing else.

6.1. Ciphertext-only attack

In fact, we can show that the basis of our model, $\mathrm{Enc}(x) = ax + b + \delta$, is secure enough under the ciphertext-only attack.

Lemma 6.1. Suppose $\mathrm{Enc}(x) = ax + b + \delta$ $(x \in X)$, and the expectation $E(X)$ and variance $\mathrm{Var}(X)$ of $X$ are known. Let the noise $\delta$ be independent of $x$, with expectation $E(\Delta)$ and variance $\mathrm{Var}(\Delta)$. Then

$$E(\mathrm{Enc}(X)) = aE(X) + b + E(\Delta), \qquad \mathrm{Var}(\mathrm{Enc}(X)) = a^2\,\mathrm{Var}(X) + \mathrm{Var}(\Delta).$$

The proof follows directly from basic properties of expectation and variance.
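For completeness, the omitted derivation is sketched below, writing $\Delta$ for the noise as a random variable and using the lemma's assumption that it is independent of $X$:

$$\begin{aligned}
E(\mathrm{Enc}(X)) &= E(aX + b + \Delta) = aE(X) + b + E(\Delta), \\
\mathrm{Var}(\mathrm{Enc}(X)) &= \mathrm{Var}(aX + \Delta) && \text{(the constant } b \text{ does not change the variance)} \\
&= a^2\,\mathrm{Var}(X) + \mathrm{Var}(\Delta) + 2a\,\mathrm{Cov}(X, \Delta) \\
&= a^2\,\mathrm{Var}(X) + \mathrm{Var}(\Delta) && \text{(independence gives } \mathrm{Cov}(X, \Delta) = 0\text{)}.
\end{aligned}$$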

6.1.1. Method by statistic

The most common ciphertext-only attack is the statistical attack, which counts the frequency of each ciphertext. Suppose the plaintexts $\mathcal{M} = \{1, 2, \ldots, M\}$ have a clearly non-uniform distribution, i.e., the probabilities $P\{x=1\}, P\{x=2\}, \ldots, P\{x=M\}$ have an obvious order. Sorting them as $P\{x=\sigma(1)\} > P\{x=\sigma(2)\} > \cdots > P\{x=\sigma(M)\}$, where $\sigma$ is a permutation of $\{1, 2, \ldots, M\}$, we can assert that the ciphertexts keep the same order:

$$P\{y = \mathrm{Enc}(\sigma(1))\} > \cdots > P\{y = \mathrm{Enc}(\sigma(M))\}.$$

Meanwhile, statistical analysis of the ciphertexts gives their frequencies, which can be sorted in the same way as $P\{y=y_1\} > P\{y=y_2\} > \cdots > P\{y=y_M\}$. Comparing the two sequences, the adversary can reasonably conclude that

$$y_i = \mathrm{Enc}(\sigma(i)),$$

and thus recover the entire cryptosystem.
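To make the attack concrete, here is a minimal Python sketch, assuming a deterministic (noise-free) scheme and an adversary who knows the plaintext prior. The function name `frequency_attack` and the toy encryption $7x + 3$ are hypothetical illustrations, not part of any scheme discussed here:

```python
from collections import Counter


def frequency_attack(ciphertexts, plaintext_probs):
    """Guess the plaintext of each ciphertext by rank-matching frequencies.

    ciphertexts: observed ciphertexts of a deterministic encryption.
    plaintext_probs: dict mapping each plaintext to its known prior probability.
    Returns a dict ciphertext -> guessed plaintext.
    """
    # sigma(1), sigma(2), ...: plaintexts by decreasing prior probability.
    ranked_plain = sorted(plaintext_probs, key=plaintext_probs.get, reverse=True)
    # y_1, y_2, ...: observed ciphertexts by decreasing frequency.
    ranked_cipher = [c for c, _ in Counter(ciphertexts).most_common()]
    # Match the i-th most frequent ciphertext with the i-th most likely plaintext.
    return dict(zip(ranked_cipher, ranked_plain))


def toy_enc(x):
    return 7 * x + 3  # hypothetical noiseless linear OPE


counts = {1: 50, 2: 30, 3: 15, 4: 5}  # skewed plaintext distribution
data = [x for x, n in counts.items() for _ in range(n)]
guess = frequency_attack([toy_enc(x) for x in data],
                         {x: n / 100 for x, n in counts.items()})
print(guess)  # {10: 1, 17: 2, 24: 3, 31: 4}
```

Every ciphertext is matched to its true plaintext, which is exactly the situation $y_i = \mathrm{Enc}(\sigma(i))$ described above.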

6.1.2. Prevention

The statistical attack can easily be prevented before encryption by spreading out the large probabilities, i.e., by splitting the frequent elements of the sequence. For example, we can use a one-to-many map

$$g: \{1, 2, \ldots, M\} \to \{1, 2, \ldots, M'\}, \qquad g: i \mapsto S_i = \{k_{i,1}, k_{i,2}, \ldots, k_{i,n(i)}\},$$

where $n(i)$ is the size of the set $S_i$. From another point of view, $g$ is a one-to-one map from a number to a set.

Any dense distribution can be reduced in this way, and the final sequence can consist of numbers with almost equal probability, i.e., close to a uniform distribution.
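One possible way to build such a $g$ while keeping the order is sketched below in Python. It is an assumption for illustration (a frequency-proportional, homophonic-style split), not the construction used in our scheme: each value $i$ receives $n(i)$ codes in proportion to its frequency, and the sets $S_i$ are consecutive integer blocks, so $i < j$ still implies that every code of $i$ is below every code of $j$. All names are hypothetical:

```python
import random


def homophonic_split(freqs):
    """Hypothetical order-preserving one-to-many map g: i -> S_i.

    freqs maps each plaintext to a positive occurrence count. Value i gets
    n(i) = max(1, freqs[i] // unit) codes, where unit is the smallest count,
    and the code blocks are allocated consecutively in plaintext order.
    """
    unit = min(freqs.values())
    g, next_code = {}, 1
    for i in sorted(freqs):
        n_i = max(1, freqs[i] // unit)
        g[i] = range(next_code, next_code + n_i)  # S_i = {k_i1, ..., k_in(i)}
        next_code += n_i
    return g


def spread(i, g, rng=random):
    """Encode i as a uniformly chosen element of S_i."""
    return rng.choice(g[i])


def unspread(k, g):
    """The blocks are disjoint, so exactly one S_i contains k."""
    return next(i for i, s in g.items() if k in s)


g = homophonic_split({1: 5, 2: 50, 3: 15})
print({i: list(s) for i, s in g.items()})
# {1: [1], 2: [2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 3: [12, 13, 14]}
```

With these counts, every code is used about five times on average, so the frequencies seen after the split are roughly uniform.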

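As mentioned in Section 5, the noise $\delta_i$ also helps to prevent this attack. The corresponding map has the form

$$x \xrightarrow{\ \mathrm{Enc}_i(\cdot)\ } \{a_i x + b_i + \delta_i\}, \qquad \{a_i x + b_i + \delta_i\} \xrightarrow{\ \mathrm{Dec}_i(\cdot)\ } x,$$

where $|\delta_i| < a_i/2$ and the size of the set satisfies $|\{a_i x + b_i + \delta_i\}| \approx a_i$, so adding the noise $\delta_i$ alters the distribution.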

As a result, the probability becomes

$$\Pr\{y = a_i x_i + b_i + \delta_i\} = \frac{\Pr\{x = x_i\}}{|\delta_i|},$$

so the statistical attack is prevented.

Moreover, to resist statistical analysis, $X$ can be made to follow a roughly uniform distribution, as $\delta$ does. The resulting statistics are then almost uniform. It is also clear that our model is no less secure than this basic linear function.

6.2. A particular chosen-plaintext attack

We have mentioned that, through continuous arbitrary queries, almost every OPE cryptosystem is insecure under the common chosen-plaintext attack. We therefore turn to a restricted chosen-plaintext attack, which we call the sparse and random chosen-plaintext attack (SR-CPA).

SR-CPA has the following properties:

Sparse: If an adversary wants the ciphertexts of $\{x_1, x_2, \ldots, x_k\}$ and this set is dense, he will only obtain $\{0, \ldots, \mathrm{Enc}(x_{i_1}), \ldots, \mathrm{Enc}(x_{i_l}), \ldots, 0\}$, which contains only $l$ ciphertexts with $l \ll k$. Here we impose the restriction that, after returning any ciphertext, the database will not answer queries for ciphertexts close to it for some time.

Random: In SR-CPA, the adversary cannot immediately obtain any sequence he wants. For instance, if he wants the ciphertexts of $\{x_1, x_2, \ldots, x_k\}$, he cannot obtain the sequence $\{\mathrm{Enc}(x_1), \mathrm{Enc}(x_2), \ldots, \mathrm{Enc}(x_k)\}$ at once. In other words, he can only get $\{0, \ldots, \mathrm{Enc}(x_{i_1}), \ldots, \mathrm{Enc}(x_{i_l}), \ldots, 0\}$, which contains only $l$ ciphertexts with $l \ll k$.

These two restrictions are reasonable for queries to a database or cloud server: a legitimate client will hardly keep asking for adjacent ciphertexts at a high frequency, and replies can be subject to a time limit. In this way, we prevent any interval from being attacked by arbitrary queries that would leak the encryption function on it.
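The "sparse" restriction could be enforced on the server side roughly as follows. This Python sketch is an assumption for illustration: the class name, the `radius` and `cooldown` parameters, and the policy of refusing any answer within `radius` of a recently returned ciphertext are hypothetical, and the "random" property is not modeled:

```python
import time


class SparseQueryGate:
    """Hypothetical server-side filter for the 'sparse' rule of SR-CPA.

    After ciphertext c is returned at time t, any answer within `radius`
    of c is refused until `cooldown` seconds have passed.
    """

    def __init__(self, radius, cooldown, clock=time.monotonic):
        self.radius = radius
        self.cooldown = cooldown
        self.clock = clock
        self.recent = []  # (returned ciphertext, time it was returned)

    def allow(self, c):
        now = self.clock()
        # Forget answers whose cooldown has expired.
        self.recent = [(r, t) for r, t in self.recent if now - t < self.cooldown]
        if any(abs(c - r) <= self.radius for r, _ in self.recent):
            return False  # too close to a recently returned ciphertext
        self.recent.append((c, now))
        return True
```

A refused query corresponds to one of the 0 entries in the sequence the adversary receives. Injecting the clock makes the policy easy to test without waiting in real time.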

6.2.1. Two kinds of attack

(a) Whole-range attack: the adversary obtains a sequence of ciphertexts $\{c_1, c_2, \ldots, c_n\}$ at random, i.e., without any selection, simply recording them. (b) Exact-height attack: the adversary is interested only in an exact ciphertext range, such as $[D, U]$, and also obtains a sequence of ciphertexts $\{c_1, c_2, \ldots, c_n\}$, all of which satisfy $c_i \in [D, U]$.

Note that the second attack can take two forms, one basic but dangerous and the other complex but safe:

1. $\exists\, i \in \{1, 2, \ldots, m\}$ such that $[D, U] \subseteq [\mathrm{Enc}_i(l_i), \mathrm{Enc}_i(r_i)]$.
2. $\nexists\, i \in \{1, 2, \ldots, m\}$ such that $[D, U] \subseteq [\mathrm{Enc}_i(l_i), \mathrm{Enc}_i(r_i)]$, i.e., $[D, U] \subseteq \bigcup_{i=m_1}^{m_2} [\mathrm{Enc}_i(l_i), \mathrm{Enc}_i(r_i)]$.
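Which of the two forms applies can be decided with a binary search over the ciphertext intervals. The Python sketch below assumes the intervals $[\mathrm{Enc}_i(l_i), \mathrm{Enc}_i(r_i)]$ are given as a sorted, non-overlapping list of pairs and that $D \le U$; the function name and the example bounds are hypothetical:

```python
from bisect import bisect_right


def exact_height_case(D, U, cipher_ranges):
    """Return 1 if [D, U] lies inside a single ciphertext interval, else 2.

    cipher_ranges: sorted, non-overlapping list of (Enc_i(l_i), Enc_i(r_i)).
    """
    lefts = [lo for lo, _ in cipher_ranges]
    k = bisect_right(lefts, D) - 1  # last interval starting at or before D
    if k >= 0 and U <= cipher_ranges[k][1]:
        return 1  # basic but dangerous: one linear piece is exposed
    return 2      # complex but safe: the range straddles several pieces


ranges = [(0, 99), (100, 299), (300, 999)]
print(exact_height_case(120, 250, ranges))  # 1
print(exact_height_case(50, 150, ranges))   # 2
```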

First, because of the restrictions on the attack, any adversary must spend a great deal of time to obtain the information he wants. Second, if the attack nevertheless succeeds by chance, the updating algorithm presented in Section 6.3 allows the encryption of any leaked interval to be altered within a certain time, after which the leaked encryption is out of date.






6.3. The updating and maintenance

In case the attack cannot be resisted, there should be an algorithm to alter or update our cryptosystem. Obviously, the new instance must use different parameters to remain secure. Meanwhile, the algorithm should be feasible on a large database and easy for the server to run. In other words, our cryptosystem should be both mutable and feasible. Here we only offer an idea and an algorithm for keeping the cryptosystem mutable.

6.3.1. Change $(a_i, b_i)$

In case some parameter pairs $(a_i, b_i)$ leak, we can keep all of them mutable but alter only one pair at a time. For instance, if $(a_i, b_i)$ is revealed, we update the data as follows:

(a) Traverse the database; for every $y$ in the ciphertext interval $C_i$, decrypt it to obtain the corresponding $x$.

(b) Set a new parameter pair: $(a_i, b_i) \leftarrow (a'_i, b'_i)$.

(c) Replace $y$ by $y' = a_i x + b_i + \delta'_i$, computed with the updated parameters and fresh noise $\delta'_i$.

Finally, note that this algorithm is intended only for the case of leakage in some interval. The operations above may be costly in computation and updating time, so unless it is necessary, we do not recommend running this update as routine maintenance.
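Steps (a) to (c) can be sketched in Python as follows, assuming integer ciphertexts stored in a plain list and a known ciphertext interval $C_i = [lo, hi]$ for the leaked pair. The function and parameter names are hypothetical, and the sketch does not check that the re-encrypted values stay inside $C_i$; the caller must choose the new pair so that order across intervals is kept:

```python
import random


def rekey_interval(db, lo, hi, old, new, rng=random):
    """Re-encrypt every ciphertext in C_i = [lo, hi] under a new (a_i, b_i).

    db: list of integer ciphertexts, modified in place.
    old, new: (a, b) pairs; the old noise is assumed to satisfy |delta| < a/2.
    """
    a, b = old
    a2, b2 = new
    half = (a2 - 1) // 2  # largest integer strictly below a2/2
    for idx, y in enumerate(db):
        if lo <= y <= hi:                       # (a) y lies in C_i
            x = round((y - b) / a)              # (a) decrypt with the leaked pair
            delta = rng.randint(-half, half)    # (c) fresh noise delta'_i
            db[idx] = a2 * x + b2 + delta       # (b)+(c) encrypt with the new pair
    return db
```

Only ciphertexts inside $C_i$ are touched, which keeps the cost proportional to one interval's worth of rows plus a scan; in a real database this would be a single range query followed by an update.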

7. Implementation and evaluation

7.1. Implementation details

Popa et al. (2013) prove that any stateful OPE scheme that is IND-OCPA-secure has ciphertext size exponential in the plaintext size; thus, extending the ciphertext space can effectively enhance security. Our OPE scheme can be implemented in two ways:

• Preserving the data format and generating numerical OPE ciphertexts. In practical applications, the range of numbers is generally not large, and databases provide numerical datatypes with large ranges; for example, the range of “real” in SQL Server is from -3.40E+38 to 3.40E+38, which is enough for some OPE applications. In this way, the OPE ciphertext is a real number and can be stored in the original field, which leads to minimal changes to existing software.

• Generating the OPE ciphertext as a string. To use a larger ciphertext space, the OPE ciphertext can be a character string that preserves order. However, string comparison differs from numerical comparison, so to ensure correctness all OPE ciphertexts must have the same length. To represent big numbers, we can use fixed-length hex strings. For data whose hex string is shorter than the fixed length len, we pad the prefix with '0' characters; for example, with len = 10, the integer 15 becomes “000000000F”. In this way, a datatype such as “varchar” of length len can represent values from 0 up to $16^{len} - 1$. This approach is less efficient than the first one.
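The padding rule can be sketched in Python as follows. The function name `to_fixed_hex` is hypothetical; the point is that, because the ASCII codes of '0' to '9' precede those of 'A' to 'F', zero-padded strings of equal length sort in the same order as the integers they encode:

```python
def to_fixed_hex(value, length):
    """Encode a non-negative integer as an upper-case hex string of fixed length.

    Left zero-padding makes lexicographic string order agree with numeric order.
    """
    if not 0 <= value < 16 ** length:
        raise ValueError("value does not fit in the given length")
    return f"{value:0{length}X}"


print(to_fixed_hex(15, 10))   # 000000000F
print(sorted(["F", "10"]))    # ['10', 'F']: unpadded strings sort wrongly (16 before 15)
```

In a database, the comparison also depends on the column's collation; a binary or case-consistent collation keeps this ordering intact.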

Figure 4 shows how to use our OPE scheme in database applications. The OPE operations, including encryption and decryption, are executed only in the OPE client. As a result, the scheme is easy to deploy and can be implemented in any programming language, such as Java, C++, or C#. The encrypted field should be changed to one of two datatypes, “real” or “varchar”: the former suits most database applications, and the latter serves stronger security requirements. If “real” is used, changes to existing software are reduced.

7.2. Performance evaluation

To evaluate the performance of the OPE scheme, we focus on two issues: (1) whether the OPE encryption algorithms are fast enough for batch data encryption and for concurrent use by multiple users; and (2) whether the distribution of the original data is hidden in the OPE ciphertexts.

7.2.1. Concrete split method

To implement our OPE model for the experiment, we need a concrete split function. We suppose there is only one dense interval $T_1$ in our experiment, so $M = R_1 \cup T_1 \cup R_2$, where $R_1$ and $R_2$ are the remaining intervals. We then randomly select two integers $r_1$ and $r_2$, and split $R_1$ and $R_2$ into $r_1$ and $r_2$ equal intervals, respectively. The dense interval $T_1 = (l, r]$, in contrast, is split as follows (shown in Fig. 5):

$$T_1 = (d_{min} + m) \cup \cdots \cup d_{min} \cup \cdots \cup (d_{min} + m),$$

where the terms give the lengths of consecutive subintervals, shrinking to $d_{min}$ in the middle of $T_1$ and growing back toward its ends.

The ciphertext space $y \in C$ can be split in the same way as the plaintext space; we omit the details.

With this split method, index(x) and range(i) can be determined, i.e.,

$$\mathrm{index}(x) = \begin{cases}
\dfrac{x - x_0}{t_1}, & x \le l, \\[6pt]
\dfrac{D_{min} + m + \sqrt{(D_{min} + m)^2 + 4(x - R_1)}}{2} + r_1, & l < x \le r, \\[6pt]
\dfrac{x - x_2}{t_3} + r_1 + r_2, & r < x.
\end{cases}$$

Clearly, the undetermined parameters here are the endpoint values or the total numbers of the corresponding intervals, so index(x) can be put in a form that is easy to compute. The same holds for range(i) under the above split method.

7.2.2. Experimental result

We set the ciphertext space C to the range of “real” and implemented the scheme in C++ to measure the average execution time.

Fig. 4. OPE usage in database application.






We measured the performance on a machine with an Intel Core(TM) i7-3517U processor running Windows 7. The results are shown in Fig. 6, from which we observe that:

1. Figure 6a shows the average execution time of OPE, which is about 0.00025 μs; hence our OPE scheme is highly efficient.

2. Figure 6b shows the distribution of the original data set of students' scores; most scores lie between 70 and 88.

3. Figure 6c shows the distribution of the OPE ciphertexts of the students' scores, in which the original distribution is clearly destroyed.

7.3. Comparison

Table 2 compares our OPE scheme with other typical OPE schemes, including Popa'13 (Popa et al., 2013) and Liu'13 (Liu and Wang, 2013). From Table 2, we can see that:

1. Regarding efficiency, Popa's scheme has the lowest performance. In Popa's scheme, the client must interact with the server when encrypting a value, and the server must rebalance the encoding tree when adding or removing a node. Our scheme and Liu's scheme are constructed from linear mathematical functions without any interaction, and they can be regarded as having the same efficiency.

2. Regarding security, Popa's scheme achieves ideal security, whereas Liu's scheme has the lowest security. Compared with Liu's scheme, our scheme achieves security against the ciphertext-only attack; in particular, it uses message space expansion and nonlinear space split to hide the data distribution and frequency, and thus it can resist the statistical attack.

Fig. 5. An example of splitting the message space.

Fig. 6. System model for outsourced database. (a) Execution time of OPE, (b) data distribution of score and (c) data distribution of ciphertexts.

Table 2
Comparison between our scheme and other typical OPE schemes.

Scheme                                  Efficiency level   Security level   Programmability
Our scheme (Boldyreva et al., 2011)     High               Medium           High
Liu'13 (Liu and Wang, 2013)             High               Low              High
Popa'13 (Popa et al., 2013)             Low                High             Low






3. Regarding programmability, our scheme and Liu's scheme are better than Popa's scheme. Linear OPE schemes, including ours and Liu's, are easy to implement, and the implementation of our scheme was discussed above. In Popa's scheme, however, besides the interaction and tree-balancing operations, user-defined functions must be implemented for each database, which makes implementation more difficult.

8. Conclusion

From our survey of the proposed OPE schemes, we conclude that an OPE scheme must hide the order in the ciphertext to achieve high security. However, this approach means the database server can no longer perform order operations directly, which limits the applicability of OPE. We also find that most of the proposed OPE schemes do not take statistical characteristics into consideration, and we further introduce a practical statistical attack. We point out that hiding the data distribution and data frequency is very important for OPE schemes that support direct order comparison, and it is also a goal of OPE design.

Based on further research on Liu's scheme (Liu and Wang, 2013), we proposed a new OPE model. With the help of noise and an extended space, we offer several ways to break the statistical characteristics of the plaintext and resist the ciphertext-only attack. The security analysis and performance evaluation show that our OPE scheme is both secure and efficient.

Our OPE model can be implemented in any programming language, and users can define their own split methods and encryption functions. In future work, we will study how to provide a formal nonlinear encryption function and a new, general, and perfect split function.

Acknowledgments

This work is supported by the National Key Basic Research Program of China (No. 2013CB834204), the National Natural Science Foundation of China (Nos. 61272423 and 61300241), the National Natural Science Foundation of Tianjin (Nos. 12JCYBJC10100, 13JCQNJC00300 and 14JCYBJC15300), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20120031120036).

References

Agrawal R, Srikant R. Privacy-preserving data mining. ACM SIGMOD Record 2000;29(2):439–50.

Agrawal R, Kiernan J, Srikant R, Xu Y. Order preserving encryption for numeric data. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data. ACM; 2004.

Agrawal D, El Abbadi A, Emekci F, Metwally A. Database management as a service: challenges and opportunities. In: IEEE 25th international conference on data engineering (ICDE'09). IEEE; 2009.

Boldyreva A, Chenette N, Lee Y, O'Neill A. Order-preserving symmetric encryption. Advances in Cryptology-EUROCRYPT 2009. Berlin, Heidelberg: Springer; 2009. p. 224–41.

Boldyreva A, Chenette N, O'Neill A. Order-preserving encryption revisited:improved security analysis and alternative solutions. Advances in Cryptology-CRYPTO 2011. Berlin, Heidelberg: Springer; 2011. p. 578–95.

Fung B, Wang K, Chen R, Yu PS. Privacy-preserving data publishing: a survey ofrecent developments. ACM Comput Surv CSUR 2010;42(4):14.

Kadhem H, Amagasa T, Kitagawa H. MV-OPES: multivalued-order preserving encryption scheme: a novel scheme for encrypting integer value to many different values. IEICE Trans Inf Syst 2010;93(9):2520–33.

Lindell Y, Pinkas B. Privacy preserving data mining. Advances in Cryptology-CRYPTO 2000. Berlin, Heidelberg: Springer; 2000.

Liu D, Wang S. Nonlinear order preserving index for encrypted database query in service cloud environments. Concurr Comput Pract Exp 2013;25(13):1967–84.

Popa RA, Redfield MSC, Zeldovich N, Balakrishnan H. CryptDB: protecting confidentiality with encrypted query processing. In: Proceedings of the twenty-third ACM symposium on operating systems principles. ACM; 2011.

Popa RA, Li FH, Zeldovich N. An ideal-security protocol for order-preserving encoding. In: 2013 IEEE symposium on security and privacy (S&P). IEEE; 2013.

Lee S, Park TJ, Lee D, Nam T, Kim S. Chaotic order preserving encryption for efficientand secure queries on databases. IEICE Trans Inf Syst 2009;92(11):2207–17.

Vaidya J, Clifton C. Privacy preserving association rule mining in vertically partitioned data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. ACM; 2002.

Yum DH, Kim DS, Kim JS, Lee PJ, Hong SJ. Order-preserving encryption for non-uniformly distributed plaintexts. Information security applications. Berlin, Heidelberg: Springer; 2012. p. 84–97.


