# nonlinear order preserving index for encrypted database query in service cloud environments

Posted on 11-Dec-2016

CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE

Concurrency Computat.: Pract. Exper. 2013; 25:1967-1984

Published online 25 January 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.2992

SPECIAL ISSUE PAPER

Nonlinear order preserving index for encrypted database query in service cloud environments

Dongxi Liu* and Shenlu Wang

CSIRO ICT Centre, Marsfield, NSW 2122, Australia

SUMMARY

Cloud database services are emerging as an attractive way to outsource databases. When a database is deployed on a cloud database service, data security and privacy become a major concern for users. A straightforward way to address this concern is to encrypt the database. However, once encrypted, the database cannot be easily queried. In this paper, we propose a nonlinear order preserving scheme for indexing encrypted data, which facilitates range queries over encrypted databases. The scheme is secure even when there are a large number of duplicates in the plaintexts. Moreover, our scheme allows basic indexing expressions to be programmed, and thus provides the capability of hiding the distribution of plaintexts from the distribution of indexes. The scheme is suitable for long-standing databases because it does not require any assumption about the characteristics of the database data, such as their distribution, range, and number, which may change dramatically over time. Copyright © 2013 John Wiley & Sons, Ltd.

Received 19 July 2012; Accepted 13 December 2012

KEY WORDS: database encryption; cloud database; secure index; query

1. INTRODUCTION

Cloud database services, such as Amazon Relational Database Service (RDS) and Microsoft SQL Azure, are emerging as an attractive way for enterprises to outsource their databases. In cloud database services, the hardware and software underlying the databases are shared among users. These services allow enterprises to deploy their databases quickly without a large investment in proprietary hardware and software, hence reducing the total cost of ownership. Moreover, cloud database services can be elastic: an enterprise can dynamically increase or decrease the compute resources allocated to its databases according to its business requirements.

Although attractive as a new paradigm of data management, database services cannot be fully exploited unless the problems of data privacy and security are addressed [1, 2]. When a database is deployed into a public database service, the service provider has complete physical control over the database. The data in the database might be improperly accessed, accidentally or intentionally, by untrusted cloud database administrators or by attackers who compromise the database service platforms. Because database services are a kind of cloud computing service, the techniques of trusted cloud computing could potentially be used to build trusted database services. However, there is still a gap in applying the techniques of trusted cloud computing, such as [3, 4], to the security and privacy problems of database services.

*Correspondence to: Dongxi Liu, CSIRO ICT Centre, Marsfield, NSW 2122, Australia. E-mail: dongxi.liu@csiro.au

Shenlu Wang was a vacation student at CSIRO, coming from RMIT University.


For cloud database services, a straightforward approach to addressing the security and privacy problem is to encrypt the database. In this way, untrusted cloud database administrators or attackers can see only meaningless ciphertexts. However, once encrypted, a database cannot be easily queried. Decrypting the entire database before each query is not acceptable, because decryption might be very slow for a large database, and the decrypted database is again at risk of having its security and privacy breached. Ideally, a query should be executed directly over the encrypted database.

A database query can be an equality query, a range query, an aggregate query, or a combination of these. In this paper, we focus on the problem of performing range queries over encrypted databases. For example, a range query can be "select staff who joined the company between 2000 and 2012". Equality queries can be handled when a deterministic encryption scheme (e.g. Advanced Encryption Standard (AES) in Electronic Codebook (ECB) mode) is used, because such a scheme always encrypts the same plaintext into the same ciphertext. For aggregate queries using SUM and AVG operations, homomorphic encryption algorithms [5] are needed to sum and average ciphertexts directly. We also discuss how to apply our method together with secure hash algorithms and homomorphic encryption algorithms to deal with all types of queries over encrypted databases.
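To make the two standard techniques mentioned above concrete, here is a minimal Python sketch. It is our illustration, not part of the paper's scheme. HMAC-SHA256 from the standard library stands in for a deterministic cipher such as AES-ECB: both map equal plaintexts to equal outputs, which is all an equality query needs (unlike AES, a keyed hash cannot be decrypted, so it would be stored next to a normal ciphertext). A toy Paillier cryptosystem with deliberately tiny, insecure primes shows how multiplying ciphertexts yields an encryption of the sum. All function names are hypothetical.

```python
import hashlib
import hmac
import math
import secrets


def eq_tag(key: bytes, value: str) -> str:
    """Deterministic keyed tag: equal plaintexts always give equal tags.
    HMAC-SHA256 stands in here for a deterministic cipher such as AES-ECB."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def paillier_keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key pair with generator g = n + 1.
    The tiny default primes are for illustration only and are NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def paillier_encrypt(n: int, m: int) -> int:
    n2 = n * n
    while True:  # pick a random r coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # g^m = (n + 1)^m = 1 + m*n (mod n^2)
    return (1 + m * n) * pow(r, n, n2) % n2


def paillier_decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n  # L(u) = (u - 1) / n, then multiply by mu


def encrypted_sum(n: int, ciphertexts) -> int:
    """Multiplying Paillier ciphertexts adds the underlying plaintexts."""
    n2 = n * n
    total = 1
    for c in ciphertexts:
        total = total * c % n2
    return total


key = secrets.token_bytes(32)
print(eq_tag(key, "Manager") == eq_tag(key, "Manager"))  # True

n, sk = paillier_keygen()
salaries = [5200, 6100, 5200]
cts = [paillier_encrypt(n, s) for s in salaries]
print(paillier_decrypt(n, sk, encrypted_sum(n, cts)))  # 16500
```

The server can compare tags for equality and multiply ciphertexts for SUM without ever seeing a salary; AVG follows by dividing the decrypted sum by the row count on the client.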

To deal with range queries over encrypted databases, an order preserving encryption scheme has been proposed in [6]. In this scheme, the i-th value in the plaintext domain is mapped to the i-th value in the ciphertext domain, so that the order between plaintexts is preserved between ciphertexts. To use this scheme, users need to be able to model the distributions of values in the plaintext and ciphertext domains. However, an enterprise using cloud database services may not have database professionals who know the techniques [7] needed for distribution modeling.

In addition, the scheme [6] can deal only with plaintexts in a finite domain. The cryptographic analysis of the order preserving encryption scheme is performed in [8].
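The following sketch illustrates the i-th-to-i-th mapping idea for a finite domain. It is a deliberate simplification that assumes ciphertext values are drawn uniformly at random; the scheme in [6] instead models the plaintext and ciphertext distributions, which this sketch does not attempt. The function `build_ope_table` and all parameters are hypothetical.

```python
import random


def build_ope_table(domain, cipher_range, seed=None):
    """Toy order preserving mapping for a FINITE plaintext domain: the i-th
    smallest plaintext is mapped to the i-th smallest of len(domain) distinct
    values drawn uniformly from [0, cipher_range)."""
    plaintexts = sorted(set(domain))
    if cipher_range < len(plaintexts):
        raise ValueError("ciphertext range must be at least the domain size")
    rng = random.Random(seed)
    ciphertexts = sorted(rng.sample(range(cipher_range), len(plaintexts)))
    return dict(zip(plaintexts, ciphertexts))


table = build_ope_table(range(2000, 2013), cipher_range=10**6, seed=1)
joined = [2003, 2011, 2000, 2007, 2005]   # plaintext join years
stored = [table[y] for y in joined]       # what the server stores
lo, hi = table[2004], table[2010]         # client encrypts the query bounds

# The server compares ciphertexts only; the client maps hits back to rows.
matches = [y for y, c in zip(joined, stored) if lo <= c <= hi]
print(matches)          # [2007, 2005]
print(2013 in table)    # False: values outside the domain cannot be encrypted
```

The last line mirrors the finite-domain restriction: the table has no entry for a year outside the domain it was built for, so adding such a value would require rebuilding the mapping.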

The work [1] shows a way of building order preserving polynomials, which are based on the polynomials proposed by Shamir for secret sharing [9]. However, to use this mechanism, the number of plaintexts is needed to determine the range of coefficients in a polynomial. Moreover, the evaluation results of order preserving polynomials may reveal the distribution of plaintexts, because similar plaintexts are transformed with similar polynomials. As discussed in [6], the coupling of the plaintext distribution and the ciphertext distribution might be exploited by attackers to guess the likely range of the plaintext behind a ciphertext.
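A small illustration of the distribution-leakage concern, using a simplified monotone polynomial with positive coefficients rather than the Shamir-based construction of [1] (whose coefficient ranges are not described here). The coefficients are hypothetical. Two clusters of plaintexts remain visible as two clusters of index values, so an observer can see the shape of the plaintext distribution without knowing the polynomial.

```python
def monotone_poly(coeffs):
    """Return p(x) = c0 + c1*x + c2*x**2 + ...
    With every c_k > 0 and degree >= 1, p is strictly increasing for x >= 0."""
    if len(coeffs) < 2 or any(c <= 0 for c in coeffs):
        raise ValueError("need degree >= 1 and positive coefficients")
    return lambda x: sum(c * x**k for k, c in enumerate(coeffs))


p = monotone_poly([7, 5, 3])                  # hypothetical secret coefficients
plaintexts = [10, 11, 12, 13, 500, 501, 502]  # two clusters of plaintexts
indexes = sorted(p(x) for x in plaintexts)    # what the server sees
gaps = [hi - lo for lo, hi in zip(indexes, indexes[1:])]
print(gaps.index(max(gaps)))                  # 3: the split between the clusters
```

The largest gap in the index values falls exactly where the plaintexts split into two groups, which is the kind of coupling between plaintext and ciphertext distributions described above.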

In [10], an indexing mechanism for range queries is proposed. This mechanism is not strictly order preserving, because two different values may be mapped into the same bucket, and buckets are what is compared when checking query conditions. The mechanism can therefore make query results inaccurate, and some post-processing is needed to remove results that do not satisfy the query.
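A minimal sketch of bucket-based indexing in the spirit of [10], under our own assumptions: fixed-width buckets, randomly permuted bucket labels, and the encrypted payload omitted (each row keeps the plaintext only so that the client-side filtering step is visible). The server matches opaque labels, so a range query returns every row in an overlapping bucket, and the client must discard the false positives. All names and the bucket width are hypothetical.

```python
import random

WIDTH = 10  # hypothetical bucket width


def bucket_of(x):
    return x // WIDTH


def make_labels(num_buckets, seed=None):
    """Random permutation: labels[i] is the opaque label stored for bucket i."""
    labels = list(range(num_buckets))
    random.Random(seed).shuffle(labels)
    return labels


def server_select(rows, wanted):
    # The server compares labels only; it cannot see where a range starts or ends.
    return [row for row in rows if row[0] in wanted]


def range_query(rows, labels, lo, hi):
    """Translate [lo, hi] into bucket labels, fetch candidates, then filter."""
    wanted = {labels[i] for i in range(bucket_of(lo), bucket_of(hi) + 1)}
    candidates = server_select(rows, wanted)
    exact = sorted(v for _, v in candidates if lo <= v <= hi)  # client-side post-processing
    return exact, len(candidates)


values = [3, 12, 15, 19, 27, 33, 41]
labels = make_labels(5, seed=7)
# In a real deployment the second field would be a ciphertext the client decrypts.
rows = [(labels[bucket_of(v)], v) for v in values]
print(range_query(rows, labels, 14, 28))  # ([15, 19, 27], 4)
```

Four rows come back from the server but only three satisfy the query: the value 12 shares a bucket with 15 and 19, which is exactly the inaccuracy that the post-processing step has to remove.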

In previous work [11], we proposed an order preserving indexing scheme that indexes plaintexts using simple linear expressions of the form $a \cdot x + b + \mathit{noise}$. In such indexing expressions, the coefficients $a$ and $b$ are kept secret (not known to untrusted cloud database administrators), and $\mathit{noise}$ is randomly sampled from a particular range, such that the order of plaintexts is preserved. As in [6, 12], the threat model taken in our work assumes that untrusted cloud database administrators can access only ciphertexts. However, even in this threat model, the indexing scheme in [11] might become vulnerable when there are duplicates in plaintexts. This vulnerability is realistic, because duplicate plaintexts occur in real databases. For example, in a company, all staff at the same level usually have the same salary (i.e. duplicate salaries).
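The linear expression can be sketched as follows for integer plaintexts. We assume noise is drawn from [0, a), which is one range that keeps the order; the range actually used in [11] may differ. The second half shows one plausible way duplicates hurt: many copies of the same plaintext spread their indexes over almost the whole window of width a, so the spread of a cluster gives an estimate of the secret a. This is our illustration, not the analysis from [11], and the coefficient values are hypothetical.

```python
import random


def linear_index(x, a, b, rng):
    """Index a*x + b + noise for an integer x, with noise drawn from [0, a).
    For integers x < y this gives index(x) <= a*x + b + a - 1 < a*y + b <= index(y)."""
    return a * x + b + rng.randrange(a)


rng = random.Random(2013)
a, b = 97, 12345  # hypothetical secret coefficients

# Distinct plaintexts: order is preserved.
idx = [linear_index(x, a, b, rng) for x in range(50)]
print(all(i < j for i, j in zip(idx, idx[1:])))  # True

# 1,000 staff with the same salary: their indexes fill the window
# [a*5200 + b, a*5200 + b + a - 1], so the cluster's spread approaches a.
dups = [linear_index(5200, a, b, rng) for _ in range(1000)]
spread = max(dups) - min(dups) + 1
print(spread, a)
```

An observer who sees only the index column can spot the large cluster and read off its width, which already narrows down the secret coefficient without decrypting anything.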

In this paper, we propose a nonlinear indexing scheme to address the vulnerability of linear indexing. A nonlinear indexing expression has the form $a \cdot f(x) \cdot x + b + \mathit{noise}$, where $f(x)$ is a function over $x$. To keep the order preserving property, we determine the correctness requirements on the function $f(x)$. Any function satisfying these requirements can be used as $f(x)$ to define nonlinear indexing expressions. We have identified several instances of $f(x)$, such as the logarithm function and the cosine function, and proven their correctness. The nonlinear indexing expressions can keep $a$ and the definition of $f(x)$ secret even when there are duplicates in plaintexts.
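The sketch below instantiates $a \cdot f(x) \cdot x + b + \mathit{noise}$ with $f(x) = \ln x$ for integers $x \ge 1$, where $x \ln x$ is strictly increasing. The paper's actual correctness requirements on $f(x)$ are not given in this excerpt, so the sketch uses a conservative rule of our own: noise is kept below half the gap to the next integer's expression value, which is enough to preserve order whenever $a \cdot f(x) \cdot x$ is strictly increasing. The function name and parameter values are hypothetical.

```python
import math
import random


def nonlinear_index(x, a, b, f, rng):
    """Index a*f(x)*x + b + noise for an integer x >= 1 (sketch).

    Assumes g(t) = a*f(t)*t is strictly increasing on the integers. Noise is
    drawn from [0, (g(x+1) - g(x)) / 2), so index(x) < g(x+1) + b <= index(x+1);
    halving the gap leaves a margin for floating-point rounding.
    """
    if x < 1:
        raise ValueError("this sketch assumes x >= 1")

    def g(t):
        return a * f(t) * t

    gap = g(x + 1) - g(x)
    if gap <= 0:
        raise ValueError("a*f(x)*x is not strictly increasing at this x")
    return g(x) + b + rng.random() * gap / 2


rng = random.Random(42)
a, b = 3.7, 1000.0                                       # hypothetical secrets
plaintexts = [rng.randint(1, 300) for _ in range(2000)]  # many duplicates
ranked = [x for _, x in sorted(
    (nonlinear_index(x, a, b, math.log, rng), x) for x in plaintexts)]
print(ranked == sorted(plaintexts))                      # True
```

Sorting rows by index reproduces the plaintext order, so range queries still work. Unlike the linear case, the width of a duplicate cluster here depends on $x$ through $f$, so it is no longer a single constant tied directly to $a$.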

In the indexing scheme [11], programmability is a feature giving users the capability to unlink the distributions of plaintexts and indexes. That is, indexing expressions can be programmed to process plaintexts in different ranges with different indexing expressions. In this work, we still allow nonlinear indexing expressions to be programmed. Moreover, programmability is enhanced in two aspects. The first aspect is that the addition of two indexing expressions is supported as a new way to compose indexing expressions. For example, from the two expressions $a_1 \cdot f_1(x) \cdot x + b_1 + \mathit{noise}_1$ and $a_2 \cdot f_2(x) \cdot x + b_2 + \mathit{noise}_2$, we can build the following one by addition:

$$(a_1 f_1(x) + a_2 f_2(x)) \cdot x + b_1 + b_2 + \mathit{noise}_1 + \mathit{noise}_2$$

The composite indexing expressions make it harder for untrusted administrators to guess the secret values $a_1$ and $a_2$ and the definitions of $f_1(x)$ and $f_2(x)$, even when there are a large number of plaintext duplicates in a cloud database.
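A sketch of composition by addition. The noise rule (noise below half the gap to the next integer value) is our assumption, not the paper's stated requirement. Because each summand's possible values for $x$ lie strictly below its possible values for $x + 1$, the same holds for their sum, and expanding the sum gives exactly the composite form above. The choice of $f_2(x) = \sqrt{x}$ and all coefficients are hypothetical.

```python
import math
import random


def make_index(a, b, f):
    """One nonlinear expression a*f(x)*x + b + noise for integers x >= 1.
    Assumption: noise stays below half the gap to the next integer value."""
    def g(t):
        return a * f(t) * t

    def index(x, rng):
        if x < 1:
            raise ValueError("this sketch assumes x >= 1")
        gap = g(x + 1) - g(x)
        if gap <= 0:
            raise ValueError("a*f(x)*x must be strictly increasing")
        return g(x) + b + rng.random() * gap / 2

    return index


def add_indexes(*indexes):
    """Sum of expressions: (a1*f1(x) + a2*f2(x) + ...)*x + (b1 + b2 + ...)
    + (noise1 + noise2 + ...). Each summand preserves order, so the sum does too."""
    def index(x, rng):
        return sum(ix(x, rng) for ix in indexes)

    return index


rng = random.Random(7)
idx1 = make_index(2.5, 40.0, math.log)    # hypothetical a1, b1, f1
idx2 = make_index(0.8, -15.0, math.sqrt)  # hypothetical a2, b2, f2
combined = add_indexes(idx1, idx2)

xs = [rng.randint(1, 500) for _ in range(3000)]
ranked = [x for _, x in sorted((combined(x, rng), x) for x in xs)]
print(ranked == sorted(xs))               # True
```

An observer now sees a single index column produced by two secret coefficients and two secret functions mixed together, rather than one expression at a time.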

The second aspect of the programmability enhancement is that the function $f(x)$ can also be composed. For example, if $f_1(x)$ and $f_2(x)$ are two functions satisfying the correctness requirements, then their composition $f_1(f_2(x))$ also satisfies the requirements and hence can be used in indexing expressions. Composing $f(x)$ increases the robustness of indexing expressions by generating more complex forms of $f(x)$. For example, a function $f(x)$ can be composed from the logarithm function and the cosine function.
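Since the correctness requirements are not reproduced in this excerpt, the following sketch uses a simpler sufficient condition of our own: for $a > 0$, if $f(x)$ is positive and nondecreasing on the integers $x \ge 1$, then $a \cdot f(x) \cdot x$ is strictly increasing. Composing nondecreasing functions keeps the result nondecreasing, so the check passes for $f(x) = \ln(1 + \ln(1 + x))$, built here from the standard `math.log1p`. The paper lists the cosine function among valid choices, so its own requirements presumably differ from this condition; the helper names and coefficients are hypothetical.

```python
import math
import random


def compose(f1, f2):
    """Return the function x -> f1(f2(x))."""
    return lambda x: f1(f2(x))


def strictly_increasing(g, xs):
    """True if g(u) < g(v) for every consecutive pair (u, v) in xs."""
    return all(g(u) < g(v) for u, v in zip(xs, xs[1:]))


f = compose(math.log1p, math.log1p)   # f(x) = ln(1 + ln(1 + x))
a, b = 5.0, 300.0                     # hypothetical secret coefficients


def expr(x):
    return a * f(x) * x               # a * f(x) * x, without b and noise


print(strictly_increasing(expr, list(range(1, 10_001))))  # True


def index(x, rng):
    # Noise is kept below half the gap to x + 1, which preserves order.
    return expr(x) + b + rng.random() * (expr(x + 1) - expr(x)) / 2


rng = random.Random(3)
values = [rng.randint(1, 1000) for _ in range(2000)]
ranked = [x for _, x in sorted((index(x, rng), x) for x in values)]
print(ranked == sorted(values))       # True
```

The composed function is harder to recognize from index values than either building block alone, while the order check shows the composite expression still supports range queries.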

Like the indexing scheme in [11], the nonlinear indexing scheme in thi
