Data confidentiality and keyword search in the cloud using visual cryptography
Varun Maheshwari
School of Computer Science, McGill University, Montreal, Canada
December 2011
A thesis submitted to McGill University in partial fulfillment of the requirements for the degree of Master of Science.
© 2011 Varun Maheshwari
Dedication
To Mummy and Rohan, for their undying love and support.
Acknowledgements
This thesis would have been impossible without the support and mentoring of my supervisor, Muthucumaru Maheswaran. I was constantly inspired by his expertise on the subject matter, innovative insights and boundless optimism. His timely suggestions and guidance were invaluable to the completion of this work.
A warm thanks to my colleagues at the Advanced Networking Research Laboratory for a wonderful and rewarding time. I received important suggestions and insights in the analysis of my results from Arash Nourian. A special thanks to Yijia Xu for proofreading my thesis. I am also especially thankful to Vince Forgetta for translating the abstract into French. I would also like to extend my appreciation for the facilities provided by the Laboratory, the School of Computer Science and McGill University.
Finally, my heartfelt thanks to my mother and brother for always being there for me, and to whom I owe everything.
Abstract
Security has emerged as the most feared aspect of cloud computing and a major hindrance for customers. Current cloud frameworks do not allow encrypted data to be stored, owing to the absence of efficient searchable encryption schemes that allow query execution on a cloud database. Storing unencrypted data exposes the data not only to an external attacker but also to the cloud provider itself. Thus, trusting a provider with confidential data is highly risky.
To enable querying on a cloud database without compromising data confidentiality, we propose to use data obfuscation through visual cryptography. A new scheme for visual cryptography is developed and configured for the cloud for storing and retrieving textual data. Testing the system with query execution on a cloud database indicates full accuracy in record retrievals with negligible false positives. In addition, the system is resilient to attacks from within and outside the cloud. Since standard encryption and key management are avoided, our approach is computationally efficient and data confidentiality is maintained.
Résumé
La sécurité a émergé comme l'aspect le plus redouté de l'informatique en nuage et comme un obstacle majeur pour les clients. Le cadre actuel de l'informatique en nuage ne permet pas que les données chiffrées soient stockées en raison de l'absence de schémas efficaces de cryptage qui permettent l'exécution des requêtes sur une base de données des nuages. Le stockage des données non cryptées expose les données non seulement à un agresseur extérieur, mais aussi au fournisseur de nuage lui-même. Ainsi, faire confiance à un fournisseur avec des données confidentielles est très risqué.
Afin de permettre des requêtes sur une base de données des nuages sans compromettre la confidentialité des données, nous proposons d'utiliser l’obscurcissement des données à travers la cryptographie visuelle. Un nouveau schéma pour la cryptographie visuelle est développé et configuré pour le nuage pour stocker et récupérer des données textuelles. Tester le système avec l'exécution des requêtes sur une base de données nuée indique une grande précision dans la récupération des enregistrements avec négligeables faux positifs. En outre, le système est résistant aux attaques de l'intérieur et l'extérieur du nuage. Parce que le cryptage standard et la gestion des clés sont évités, notre approche est mathématiquement efficace et la confidentialité des données est assurée.
Contents
Acknowledgements iv
Abstract v
Contents vii
List of Figures ix
List of Tables x

1 Introduction 1
1.1 Motivation 1
1.2 Thesis contribution 5
1.3 Thesis organization 6
2 Related Work 7
2.1 Encrypted cloud 7
2.1.1 Data encryption 8
2.1.2 Searching on encrypted data 9
2.1.3 Infeasibility of encryption alone for the cloud 12
2.2 Data obfuscation 13
2.3 Visual cryptography 14
3 Data Confidentiality using Visual Cryptography 16
3.1 Introduction 16
3.2 Overall concept 17
3.3 Sending data 18
3.3.1 Converting text to image 19
3.3.2 Image obfuscation using noise 21
3.3.3 Data division across the cloud 25
3.3.4 Sending algorithm 26
3.4 Retrieving data 29
3.4.1 Image retrieval from the records 30
3.4.2 Matching the image 31
3.4.3 Retrieval algorithm 35
3.5 Complexity analysis 38
3.6 Security analysis 40
4 Results 42
4.1 System design 42
4.2 Implementation 43
4.3 Parameter estimation 43
4.3.1 Noise parameters 44
4.3.2 Sending and retrieving data 48
4.3.3 NCC threshold 50
4.4 Multiple queries on large dataset 55
4.5 Running time 60
5 Threat Analysis 63
5.1 Threat scenarios 63
5.2 Experimental setup 65
5.3 Results 67
5.4 Analysis 69
6 Conclusions and Future Work 73
6.1 Conclusions 73
6.2 Future work 74
7 References 76
List of Figures
3.1 Overall system concept 17
3.2 Image library of ASCII characters 20
3.3 Normal distribution at different mean and variance 22
3.4 Gaussian noise 23
3.5 Speckle noise 24
3.6 Data carried by each cloud 25
3.7 Sending data 28
3.8 Problem with creating the mask 32
3.9 Creating the correct mask 33
3.10 Pattern matching with NCC 34
3.11 Retrieving data 37
4.1 Mean PSNR of all ASCII characters for Gaussian noise 45
4.2 Slope of mean PSNR of all ASCII characters for Gaussian noise 45
4.3 Images at different (mean, variance) with their PSNR values in dB 46
4.4 Mean PSNR of all ASCII characters for speckle noise 47
4.5 Slope of mean PSNR of all ASCII characters for speckle noise 47
4.6 Optimum NCC threshold for Gaussian noise 53
4.7 Optimum NCC threshold for speckle noise 54
4.8 Data retrieval on >5K and >10K character dataset 59
5.1 Data when one cloud out of four is breached 64
5.2 Data when two clouds out of four are breached 64
5.3 False positives and successful search with four clouds 68
5.4 False positives and successful search with eight clouds 68
List of Tables
4.1 NCC for Gaussian noise 49
4.2 NCC for speckle noise 50
4.3 NCC threshold estimation for Gaussian noise 52
4.4 NCC threshold estimation for speckle noise 54
4.5 Data retrieval with multiple search queries on a >5K character dataset 56
4.6 Data retrieval with multiple search queries on a >10K character dataset 58
4.7 Running time measurements 61
5.1 Threat scenarios tested 66
5.2 Results for different attack scenarios 69
Chapter 1
Introduction
In this chapter we briefly present the current scenario in the domain of cloud
computing security and the motivation for our thesis. The general organization of
the thesis is given at the end.
1.1 Motivation
The ever-increasing demand for large scale computing, combined with advances in
low cost yet fast networking technologies, has helped cloud computing to emerge
as a promising computing model. In contrast to traditional IT services, it
distinguishes itself as a high-performance Internet based technology which is
economically sustainable, both in terms of initial setup and maintenance. With revenue projected to grow to about $150 billion by 2014 [23], cloud computing has finally arrived on the IT landscape.
A cloud provider offers numerous services; however, they can be essentially classified into the following categories [8]:
Infrastructure as a Service (IaaS): the consumer employs the computing, storage and networking infrastructure from the provider and uses it to deploy and run its own software. The consumer controls the operating system, storage and applications, but does not need to manage the underlying infrastructure.
Platform as a Service (PaaS): the consumer can deploy its own software applications on the provider's infrastructure. As in IaaS, the consumer does not manage the underlying infrastructure but has control over the custom applications.
Software as a Service (SaaS): the consumer uses a cloud provider's computing, networking, storage and individual application capabilities without having to manage or control the underlying infrastructure. Applications are accessible to various clients, such as via a web browser.
Regardless of the service model, a cloud may be deployed in one of the following ways:
Public cloud: cloud infrastructure is owned and managed by an
organization (cloud service provider) and is located outside the
customer's premises. The data is thus out of the customer's control.
Providers like Amazon, Google and Microsoft come under this
category.
Private cloud: the infrastructure is owned and controlled by the
customer itself and located within the customer's premises. In
contrast to the public cloud, data is under the customer's control and
hence access is only given to trusted parties.
Hybrid cloud: two or more clouds, combining public and private ones, make up a hybrid cloud, where specific parts of the infrastructure and applications lie in public or private clouds.
While the benefits of using the cloud are clear and understood, some
problems remain unsolved. The biggest hurdle to existing cloud services is
security. The public cloud provider acting as a custodian of a customer's
confidential data has become a major concern for organizations planning a shift to
the cloud. Numerous studies, for instance the IDC Cloud Services User Survey, identify security as the primary concern for about three-fourths of IT executives/CIOs.
The issue of security in cloud computing has gained attention in academia [4], [13], industry [33] and government [38]. Potential customers,
especially enterprises and government organizations, which hold a lot of critical
data such as financial or medical records, are unwilling to trade privacy for
the performance that the cloud promises. Major security loopholes have been
exposed not only in the early days of cloud computing with Google Docs [46], but
also recently with Amazon [18].
A good cloud provider must provide the following:
Data confidentiality: the cloud provider should not learn anything
from the customer data
Integrity: if a provider modifies the customer data, all such
unauthorized modifications must be detectable by the customer
Availability: data should be accessible by the customer from any
machine and at any time
Also, the following are desirable in conjunction with the above:
Reliability: customer data should be backed up
Efficient retrieval: data should be retrievable efficiently
Sharing: customer should be able to share data with parties they
trust
Another concern is the geographical location of the data, as discussed in
[31], since there are currently no internationally agreed rules on data protection
and privacy. Depending on where the cloud provider has stored your data, that
country may exercise its right to investigate a customer's data. For instance, a
Canadian customer might be concerned about using SaaS in the United States
given the USA Patriot Act [43].
A potential solution to all of these concerns is to use an encryption/decryption scheme for data storage and retrieval. However, this approach is highly inefficient and raises other implementation difficulties. More importantly, it hinders the efficient implementation of a database system on the cloud that can process queries while maintaining data confidentiality and the other attributes above. This renders standard cryptographic methods unsuitable for the cloud, which is expected to be economical yet deliver high performance. Hence, a new method for efficient data retrieval from the cloud without compromising data confidentiality is required.
1.2 Thesis contribution
The main contribution of this thesis is a novel procedure to send data to and retrieve data from a cloud using database-style queries, without standard cryptography schemes, thus offering efficient retrievals while maintaining data confidentiality.
We propose to use data obfuscation instead of an encryption/decryption
scheme to achieve data confidentiality. In this work, we have come up with a
novel procedure for visual cryptography, which we will use to conduct obfuscation.
We show that using our procedure, information cannot be understood by the
cloud and is only decipherable by the user. Further, we show our system can be
used to retrieve records from a database using a database style query. Our system
can retrieve records which satisfy a single query. Moreover, the system distinguishes uppercase from lowercase characters and handles queries containing non-alphanumeric characters such as the space character and brackets.
For this thesis we focus on retrieving records from a database which begin
with a query string, that is, the equivalent of the database operation LIKE 'query%'.
We run tests with various configurations and analyze the possible threat
component to prove the practicality and confidentiality of our system.
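For concreteness, this prefix-match operation can be expressed in plain SQL. The sketch below uses Python's built-in sqlite3 module with a hypothetical single-column table; note that, unlike our system, SQLite's LIKE is case-insensitive for ASCII characters by default.

```python
import sqlite3

# In-memory database with a hypothetical single-column table of records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (entry TEXT)")
conn.executemany("INSERT INTO records VALUES (?)",
                 [("Meeting with Alice",), ("Meeting notes",), ("Budget 2011",)])

# Retrieve records that BEGIN with the query string: LIKE 'query%'.
query = "Meeting"
rows = conn.execute("SELECT entry FROM records WHERE entry LIKE ?",
                    (query + "%",)).fetchall()
print([r[0] for r in rows])  # the two "Meeting..." records match; "Budget 2011" does not
```

In a conventional cloud database this query runs directly on the plaintext; the point of our system is to support the same prefix semantics while the cloud holds only obfuscated image shares.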
To the best of our knowledge, this is the first work that uses visual
cryptography as a data obfuscation technique to achieve data confidentiality on
the cloud and runs database query for effective data retrieval.
1.3 Thesis organization
Chapter 2 presents related work in the domain of data confidentiality on the
cloud. We discuss why present systems that employ standard cryptographic procedures have not yet stood up to the task of performing efficient retrieval from
the cloud. We also discuss present visual cryptography schemes and their
relationship with data obfuscation.
Chapter 3 presents our main work along with detailed algorithms, working
examples and a discussion of the complexity of our approach. The background information and the metrics used are discussed as well.
Results are presented in Chapter 4. Discussion and analysis are presented
to assess data confidentiality in our system. We also discuss the running time of our approach and how it was optimized.
Chapter 5 discusses how the system reacts to threats and attacks from
outside and within the cloud. Results are presented and we discuss possible
measures to further strengthen the security framework for our system.
We conclude with a final discussion and future work in Chapter 6.
Chapter 2

Related Work

In this chapter we discuss the present work in the domain of data confidentiality
in the cloud, namely using encryption/decryption. Later we contrast our work in
visual cryptography with the existing work.
2.1 Encrypted cloud
A customer naturally wants to protect his data on the cloud from unauthorized access. If the resources on which the data is stored are owned by the customer itself,
existing authentication and/or authorization measures can protect the data from
being disclosed, lost, corrupted or stolen. However, when the data is in the hands
of a third party, that is, a public cloud provider, the data is exposed. Data can be sabotaged by an external attacker, who may hack some part of the cloud, or even by an employee of the resource vendor [49], [50]. This means that a customer wants a two-fold security envelope: the data should be protected from attacks outside the cloud provider, and it should not be visible to the cloud itself.
One of the methods to accomplish this is to encrypt the data on the cloud. This
section discusses this in further detail.
2.1.1 Data encryption
There is a vast literature on how to perform encryption on the cloud. Various
cutting-edge algorithms have been proposed and proved, at least theoretically, to secure the data on the cloud. However, only a few of them have been tested on realistic systems to judge their practical deployment.
A high level architecture of a cryptographic cloud is presented in [32].
Essentially, a data processor at the customer encrypts the data and metadata
(size, keywords etc.) and sends it to the cloud. The key is stored with the
customer only. A data verifier at the customer can verify the integrity of the data
at any time. A token generator at the customer side creates a token for data retrieval and sends it to the cloud. The cloud retrieves the encrypted data using the token and sends it to the customer. The customer then uses its decryption key to decrypt and recover the data. If a customer wants to grant access to another
user, a token can be sent to the new user for communicating with the cloud.
Such a scheme is suitable for simple storage operations. The Amazon
Simple Storage Service, S3, works in a similar manner. S3 authenticates its users
using encryption, but data is not encrypted by default. However, a user can
encrypt the data and store it on S3 [3].
The advantages of an encrypted cloud are well documented and discussed
such as in [8], [11], [42]. Namely, data is controlled and maintained by the user,
and cryptography provides a strong secure framework for data confidentiality.
The risks of an untrustworthy cloud accessing the data and of an attack from outside the cloud are both mitigated. Data is also protected from legal jurisdiction arising from its location, such as the US government using the Patriot Act to collect data [19]. Encrypted data is immune to such jurisdictions since, without the decryption key, it is meaningless. Also,
integrity of the data can be readily verified at any time by the user to detect a security
breach or data corruption.
2.1.2 Searching on encrypted data
Although encryption seems a favourable solution to data confidentiality, an important aspect has not yet been discussed: can we perform efficient operations on encrypted data? A cloud acting as a simple storage platform is not a feasible economic model; it must enable us to perform operations on the data. At the very least, a cloud must act as a virtual database which can process queries and from which records can be retrieved. Until now, no major cloud provider has come up with a facility to search on encrypted data. For instance, on Amazon's SimpleDB, encrypted data cannot be used as part of query filtering conditions [2]. The
only way to run queries on encrypted data is to retrieve the entire dataset,
decrypt it and then run queries. Clearly, it is not a practical solution for large
databases. Even with technologies like Transparent Data Encryption used by
Microsoft and Oracle [54], which encrypts the physical files rather than the data
itself, one still cannot run queries on encrypted data and must rely on decryption
of the data before querying.
In the past decade, a lot of literature has emerged in the domain of
searchable encryption. Such a scheme generates a search index, over the full-text
or a keyword, and encrypts this index. An authorized user is given a token, using
which files that contain a keyword can be retrieved. The token can only be
generated using a key which is only available to an authorized user. Without the
token, the index is not revealed. The output of such a retrieval process does not
reveal the contents of the files. It only indicates that the files have a keyword in
common.
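The index-and-token idea can be sketched as follows. This is an illustrative toy, not any of the cited schemes: the token for a keyword is simply a keyed hash (here HMAC-SHA256), so the server stores and matches only opaque values, learning which files share a keyword but never the keyword itself.

```python
import hashlib
import hmac

def token(key: bytes, keyword: str) -> str:
    # Token = keyed hash of the keyword; deriving it requires the secret key.
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(key: bytes, files: dict) -> dict:
    # files maps file_id -> list of keywords. The stored index holds only
    # opaque tokens, never the plaintext keywords.
    return {fid: {token(key, w) for w in words} for fid, words in files.items()}

def search(index: dict, tok: str) -> list:
    # The server matches a submitted token against the stored index.
    return [fid for fid, toks in index.items() if tok in toks]

key = b"user-secret-key"  # held by the authorized user only
index = build_index(key, {"f1": ["budget", "q3"], "f2": ["budget"], "f3": ["memo"]})
print(search(index, token(key, "budget")))  # ['f1', 'f2']
```

Without the key, a valid token cannot be generated. Real searchable encryption schemes add protections this sketch lacks, such as hiding the keyword co-occurrence pattern until a search is actually made.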
The first searchable encryption scheme was proposed in [55]. In the early years, only single-keyword search was supported [1], [14], [20]. Multi-keyword search involving conjunctive and disjunctive queries has been proposed in [6], [10], [27], [51]. The concept of searching on part of the encrypted data using attribute-based encryption, which also conceals the token, was introduced in [28] and later improved in [35]. These works indicate the possibility of performing search on encrypted data; however, no real system even at a moderate scale has proved the practicality of encrypted search. The complexity of such operations is high, making them impractical for large-scale operations such as on a cloud.
Recently, fully homomorphic encryption was developed in [24]. It allows
algebraic operations to be performed on plain data such that it is equivalent to
performing the operation on encrypted data. It is described as the "holy grail" of
cryptography. It would enable searching on encrypted data in a cloud system as if
searching on unencrypted data without exposing the data or the search query to
the cloud provider. The theoretical framework is in place but it is still far from being practical, as it takes immense computational power. A single Google search would be about one trillion times slower with fully homomorphic encryption, and even anticipated improvements would only reduce this to 100,000 times the computation required for unencrypted computing [22]. In [25] and more recently in [26], it is evident that fully homomorphic encryption is not yet efficient enough for any practical application.
The idea of using a hybrid cloud has also emerged in the academia. The
work in [12] proposes to use two clouds. A cloud which is trusted encrypts the
data in the setup phase and performs security-critical operations. This data is sent
to an un-trusted cloud which handles high load of queries and communicates with
the trusted cloud over a secure channel. Still, the performance aspect is not discussed, and we believe such a system cannot be efficient as long as encryption is involved. Secondly, relying on one cloud as the trusted party does not eliminate the possibility of a data breach within that cloud.
Using trusted hardware instead of a software service provider is introduced
in [5]. A server-side trusted hardware is used to achieve full privacy control on the
data. However, trusted hardware suffers performance bottlenecks when working with large data due to heat dissipation, and is constrained in computing and memory capacity. Besides, although their system achieves better performance than a complete encryption-based system, it is far less efficient than querying an unencrypted database. Trusted hardware is also not immune to attacks, for example during bootstrapping [45].
Most recently, secure ranked keyword search over encrypted cloud data has been proposed [58]. In this model, a secure searchable index is built from distinct keywords that are part of the data. The index and the encrypted data are sent to the cloud, and when a query in the form of a keyword is received, the cloud searches the index and returns ranked results. Searching over untrustworthy servers and retrieving the top results using a confidential index were discussed in [62], while [56] presents ranked search over an order-preserving cryptographic function. However, [62] is inefficient and [56] does not support dynamic changes in score. The work in [58] uses a new order-preserving symmetric encryption scheme [9].
These works closely address the problem we present; however, there are various scenarios where a keyword search linking to a document is not applicable. For instance, when setting up a confidential meeting or a personal task list, the keyword-document pairing mechanism is not appropriate, as there are no indexed documents for each entry. Also, the calendar for such tasks would be fed into a relational database model. This makes our approach more suitable, as we address the problem as a natural database query operation, LIKE. Using our approach, a wider spectrum of problems can be addressed than with the ranked keyword search over encrypted data schemes.
2.1.3 Infeasibility of encryption alone for the cloud
Following from the previous section we can observe that encryption is not a viable
option for data confidentiality on the cloud. Apart from the overhead involved in the process itself, a large overhead is involved in key management and in keeping the keys safe. Besides searchable encryption, Private Information Retrieval (PIR) [16], which allows the data owner himself to query a database, faces similar challenges. The failure of cryptography as a stand-alone means of achieving privacy on the cloud is shown in [21], which indicates that even fully homomorphic encryption is not suitable for the cloud environment. The cost incurred by employing cryptography on the cloud is infeasible even when considering only core technology costs, and even with basic schemes such as AES, MD5, SHA-1, RSA and DSA [15]. In addition, [15] states that to break even, we need a large amount of data in the cloud (e.g., 10^9 tuples) and queries that return an infinitesimal fraction of the data (e.g., 0.00037%).
Hence, we definitely need a new paradigm for searching on the cloud which
is efficient and provides data confidentiality. In our work we propose to use data
obfuscation using visual cryptography. To the best of our knowledge, no work has
been presented which combines visual cryptography with application to the cloud.
We do not use any standard encryption procedure that involves keys and key management. Our decryption procedure does not use keys either, and relies on the human visual system to verify the retrieved data.
2.2 Data obfuscation
Protecting a database using data obfuscation is a promising approach, as it avoids
the large computational requirements associated with searching on an encrypted
database, and renders a feasible and practical system.
The impossibility of achieving full confidentiality using obfuscation was
mentioned in [7]; however, their work revolves around the standard encryption
paradigm and works for certain classes of operations. Our work achieves a "virtual
black box" model as we consider a different set of operations without using
encryption. Obfuscating the database such that only certain queries run on it is presented in [41]. That work relies on server-side solutions and thus on a
trusted provider. In [39], obfuscation from a cloud perspective is introduced,
however, only weak privacy concerns are addressed, and some information about
the data is made available to the service provider. Also, [39], [41] involve key
management, which introduces overhead.
There have been attempts at obfuscating the search query itself using some
noise [61]. However, as highlighted in [47], an attacker can easily determine the
query using rudimentary attacks. Thus, a noisy query will not help in achieving
privacy on the cloud. Moreover, such a scheme assumes that the provider is a
trusted entity, which is not always the case, and exposes the system to various
threats as discussed in the preceding sections.
Thus, we believe that the data, along with the query itself, needs to be obfuscated
to protect the data from an untrusted provider. We propose to use visual
cryptography to achieve data confidentiality.
2.3 Visual cryptography
Visual cryptography was introduced in [40]. It relies on decryption using only the
human visual system, where the data is in a visual form such as printed text or
pictures. Thus, it avoids the huge computational complexity associated with
standard encryption schemes discussed in preceding sections. Visual cryptography
relies on breaking up an image into multiple shares such that the image can be
reconstructed only when all the shares are available. A share is printed separately
and when all the shares are superimposed, the original image can be revealed.
While the preliminary work in this domain involved only black and white,
single images, later schemes involved two images and two shares [60], and later
[53] generalized for multiple images into two shares. This approach was extended
to colored images in [57] and more recently in [52], [59].
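The basic two-share idea can be sketched as follows. This toy implementation, which uses a one-dimensional "image" and expands each pixel into a pair of subpixels, is only meant to illustrate the superimposition principle of [40]; it is not the scheme developed in this thesis.

```python
import random

def make_shares(image):
    """(2,2)-style scheme: each pixel (1=black, 0=white) becomes a pair of
    subpixels in each share. Each share on its own looks uniformly random."""
    s1, s2 = [], []
    for pixel in image:
        pattern = random.choice([(0, 1), (1, 0)])   # random half-black pair
        s1.append(pattern)
        if pixel == 0:                              # white: identical patterns
            s2.append(pattern)
        else:                                       # black: complementary patterns
            s2.append((1 - pattern[0], 1 - pattern[1]))
    return s1, s2

def superimpose(s1, s2):
    # Stacking printed transparencies acts as a pixel-wise OR of the subpixels.
    return [(a[0] | b[0], a[1] | b[1]) for a, b in zip(s1, s2)]

image = [1, 0, 1, 1, 0]          # a tiny one-dimensional binary "image"
s1, s2 = make_shares(image)
stacked = superimpose(s1, s2)
# Black pixels reconstruct as fully black pairs; white pixels stay half black,
# which the human visual system perceives as lighter.
print([1 if p == (1, 1) else 0 for p in stacked])  # recovers [1, 0, 1, 1, 0]
```

Each share alone is a random sequence of half-black pairs and reveals nothing about the image; only stacking both shares reconstructs it, with black pixels fully dark and white pixels half dark.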
We consider using multiple cloud providers for data confidentiality. Data,
which is plaintext, is converted to images. Random noise is added to the images
to corrupt the data. Each image is divided into equal sized multiple shares and
sent to separate cloud providers. Cloud providers are not aware of each other's
presence, and hence the data is disjoint across the clouds. For data retrieval, a novel method to create a mask is introduced, which is used to reconstruct the original data. The human visual system verifies the results.
We believe this is the first time visual cryptography has been used to achieve
data confidentiality on a cloud system and retrieve database style records based
on a search query.
Chapter 3

Data confidentiality using visual cryptography

In this chapter, we describe the overall system design and the algorithms developed in
detail. The necessary background information, including the terminology and metrics
employed, is also explained.
3.1 Introduction

Our work mainly deals with using visual cryptography, instead of encryption
schemes, to secure data in the cloud. As discussed in Chapter 2, all the
existing work in the domain of cloud computing security is focused on using some
encryption scheme for sending and retrieving data. This makes performing
operations such as a numerical calculation or a database query almost impossible
or highly inefficient. Our approach is to avoid encryption altogether. Instead, we
employ visual cryptography for sending and retrieving data. Moreover, we show
how efficient database operations can be without losing any data in the process.
Experimental results pertaining to our work are presented and analyzed in the
next chapter.
3.2 Overall concept

The overall concept of the system is simple: instead of sending the data in its raw
form to one cloud, we convert the data into basic images and send parts of each image
to different independent clouds. Figure 3.1 illustrates the process.
Figure 3.1 Overall concept. Sending: the data is converted to images, each image
is cropped into parts equal to the number of cloud providers, and the parts are
sent to clouds #1–#4. Retrieval: based on a query, data is requested from each
cloud; if the criteria are satisfied, each cloud sends its image parts to the user,
who combines the images, evaluates them using certain metrics, and retrieves the
record pertaining to the query.
We consider data that consists only of ASCII characters; data containing
images is not yet compatible with our system. The data should be in the form of
a simple text file, and if it is not, it must be converted to the requisite
format. Each ASCII character in the text file is then represented by
an image. Each image is cropped into as many equal parts as the number of
cloud service providers, and each part is sent to a different cloud. Note that all cloud
providers are independent and are not aware of each other's presence. The clouds
return an address to the user indicating where his data is stored. As in a regular
database, the data itself is stored as database records.
For data retrieval, the system is designed to respond to database-style
queries; for instance, using the LIKE keyword means the user wants to retrieve
records that contain a certain string of characters. The system searches for records
that satisfy the query and calculates certain metrics. Based on the result of these
metrics, the system retrieves the record from each cloud. Image merging and data
recovery occur at the user end.
3.3 Sending data

The user has some data in a simple text file that is to be uploaded to the clouds.
If the data is in another format, it must be converted to text. The data can
consist of any printable ASCII character. (Of the 128 ASCII characters,
95 are printable.) We will create a library of images corresponding to each ASCII
character at the user end; the library needs to be constructed only once. The
next step is to convert the text into images.
3.3.1 Converting text to image
We propose using the BMP file format for converting text to images. The bits
representing the pixels are packed in rows, which allows easy image manipulation
in our system. To begin with, we create an ASCII library of images in BMP
format (henceforth referred to as lib). Each image is constructed in grayscale
with 1-bit depth and two colors, and thus is black and white. We fix the resolution
at 40×40, with which each image is 382 bytes in size.
The image resolution can be increased if the number of cloud providers among
which the image is to be cropped and distributed is high. However, for security
reasons we would not want one cloud storing too many parts of an image; thus,
40×40 seems an optimal size. Also note that other image formats such as JPEG or
TIFF do not allow 1-bit-per-pixel images and thus lead to larger file sizes. To
allow scalability on the cloud, we try to keep the image file size as low as possible.
Figure 3.2 shows some of the ASCII characters and their respective BMP
images. The images are displayed in their actual 40×40 resolution without any
resizing. They were generated using the font Arial, style bold. The lib can be
generated using any font, but preferably one close to the font used in the
text file. The text file uses the font Lucida Console, regular style, size 10.
Figure 3.2 Image library of ASCII characters, shown for 'a', 'A', 's', 'S', '3', '8', '#', '!', '+' and '('.
3.3.2 Image obfuscation using noise
As data is read, its equivalent images are obfuscated by adding noise to each
image. Adding noise to the data severely decreases the possibility of discovering
the actual text. We focus on Gaussian and speckle noise for our system
since they are two of the most important and common categories of image noise.
We use both noises in our system and compare their performance.
Gaussian noise

For Gaussian noise, the noise density follows a normal distribution, also known as
the Gaussian distribution. Mathematically, its probability density is given by (1):

p(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))  (1)

The mean μ and variance σ² identify the normal distribution. The noise is often
represented as N(μ, σ²); in the following sections of the thesis, we will represent
Gaussian noise simply as (μ, σ²). The mean controls where the center of the
distribution lies and the variance measures the width, that is, how concentrated
the distribution is around its mean. Generally speaking, as the variance increases,
the image becomes noisier.
The standard normal distribution has zero mean. Varying the mean shifts the
distribution, so the noise values are no longer symmetric about zero. When
the mean is positive, more noise values are high, and when the mean is negative,
more are low. Figure 3.3 illustrates this property for Gaussian noise. Having a
positive mean results in more high values, that is, white pixels (=1) as noise;
the opposite occurs with a negative mean.
Figure 3.3 Normal distribution at different mean and variance.
While adding noise to the images, each affected pixel is replaced by a black
or white pixel, so the image size and other attributes remain the same after noise
addition. Note that in the original image, the text is black on a white background.
Thus, if the noise is composed of too many high values, that is, has a positive
mean, the black colored text will have a large amount of white colored noise on it;
however, the original white background will have less black colored noise added to it.
Black pixels which are added to the text as noise simply replace the
original black pixels of the text, so the text is not affected by
noisy black pixels. The opposite holds for the white background: the black noise
added to the background will be visible, but white pixels added as noise
simply swap the original '1' in the background with a '1'.
3.3 Sending data
Data confidentiality using visual cryptography 23
On the other hand, if the mean is negative, the background will be very
noisy as black colored pixels are added as noise; however, the black text will
be much less noisy, with very few white pixels on it. Figure 3.4 illustrates the
above concept. As is evident, positive and negative means disproportionately affect
the images for Gaussian noise. In the next chapter we show the experiments
carried out for determining the optimum noise parameters for our system.
Figure 3.4 Gaussian noise at (μ, σ²) values (0, 0), (0, 0.30), (0.10, 0.30), (0.20, 0.30), (−0.10, 0.30), (−0.20, 0.30) and (0, 0.50).
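As an illustration of the obfuscation step, the following Python sketch adds Gaussian noise to a binary image and re-thresholds it so that every pixel remains black or white. The threshold of 0.5 is our assumption for illustration; the thesis's actual noise addition is done in MATLAB.

```python
import numpy as np

def add_gaussian_noise(img, mean, var, seed=None):
    """Obfuscate a binary image (0 = black text, 1 = white background)
    with Gaussian noise N(mean, var), then re-threshold at 0.5 so every
    pixel stays black or white.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    noisy = img + rng.normal(mean, np.sqrt(var), img.shape)
    return (noisy >= 0.5).astype(np.uint8)

# a tiny 'glyph': black bar on a white background
glyph = np.ones((6, 6), dtype=np.uint8)
glyph[2:4, 1:5] = 0
obfuscated = add_gaussian_noise(glyph, mean=0.0, var=0.30, seed=1)
```

Consistent with the discussion above, a strongly positive mean pushes pixels above the threshold (white noise over the black text), while a strongly negative mean darkens the white background.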
Speckle noise

Speckle noise is multiplicative in nature, as indicated in (2), where Iorig is the
original image, Inoise is the output after adding noise, and n is uniformly
distributed random noise with mean 0 and variance σ²:

Inoise = Iorig + n · Iorig  (2)

As speckle noise has μ = 0, we need to vary only σ². Due to the nature of
the noise, only black colored pixels are added to the image as noise. Thus, when
speckle noise is added, some of the original white pixels in the background are
replaced by black pixels, while the original black pixels on the text are replaced by black
pixels, leaving the text unchanged. Figure 3.5 shows sample images when speckle
noise is added at different variances; the mean is 0 in all cases.

Figure 3.5 Speckle noise at variances 0, 0.15, 0.39, 0.50, 1.00, 1.50, 2.00 and 3.00.
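A corresponding sketch for speckle noise, under the same assumptions as for the Gaussian case (0 = black, 1 = white, re-threshold at 0.5). Note how black pixels can never change, matching the behaviour described above:

```python
import numpy as np

def add_speckle_noise(img, var, seed=None):
    """Multiplicative speckle noise per equation (2): I_noise = I + n*I,
    with n uniform, zero mean and variance `var` (so n spans
    [-sqrt(3*var), sqrt(3*var)]).  Black pixels (0) are unchanged since
    n*0 = 0; only white pixels can flip to black after re-thresholding."""
    rng = np.random.default_rng(seed)
    half_width = np.sqrt(3.0 * var)
    n = rng.uniform(-half_width, half_width, img.shape)
    return ((img + n * img) >= 0.5).astype(np.uint8)

# same toy glyph as before: black bar on a white background
glyph = np.ones((6, 6), dtype=np.uint8)
glyph[2:4, 1:5] = 0
speckled = add_speckle_noise(glyph, var=0.50, seed=1)
```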
Peak signal-to-noise ratio

Peak signal-to-noise ratio or PSNR is a measure of the ratio of maximum possible
signal power to the power of the noise. It is expressed in decibels (dB) and
defined by (3) and (4):

MSE = (1 / (m·n)) · Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]²  (3)

PSNR = 20 · log₁₀(MAX_I / √MSE)  (4)

The mean-square-error or MSE in (3), for two images I and K both of dimensions
m×n, quantifies the difference between the two images. In (4), MAX_I is the
maximum possible pixel value in image I. PSNR is preferred since its mathematical
complexity is the least among quality metrics such as SSIM and VQM, and thus it
puts the least burden on the cloud while remaining a strong quality metric.

A lower PSNR implies lower quality of a signal. Thus, to increase the obfuscation
level and make it harder for an unauthorised user to determine the actual text, we
need a low PSNR. To determine how PSNR affects the data in our system, we
carried out experiments which are shown in Chapter 4.
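Equations (3) and (4) can be transcribed directly (an illustrative Python sketch, not the thesis's measurement code):

```python
import numpy as np

def mse(i, k):
    """Mean squared error between two equal-sized images, equation (3)."""
    return np.mean((i.astype(float) - k.astype(float)) ** 2)

def psnr(i, k, max_i=1.0):
    """Peak signal-to-noise ratio in dB, equation (4).
    max_i is the maximum possible pixel value (1 for binary images)."""
    m = mse(i, k)
    if m == 0:
        return float('inf')   # identical images
    return 20.0 * np.log10(max_i / np.sqrt(m))
```

For example, flipping one of the four pixels of a 2×2 binary image gives MSE = 0.25 and PSNR = 20·log₁₀(2) ≈ 6.02 dB.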
3.3.3 Data division across the cloud

After adding noise to the bitmap images, each image is split into as many equal
parts as the number of cloud providers. The cloud providers are independent and
are not aware of each other's presence or of the data held by other providers. The
data held by them is disjoint, that is, no two providers have any part of the data
in common. Segregating the data among different clouds maintains the disjoint
property and better preserves data confidentiality. This way, even if one provider
is hacked and an unauthorized user accesses the data, the data is still safe since
only a small part of it will be revealed. With respect to the current visual
cryptography schemes as discussed in Section 1.3, each split part of the original
image is essentially a secret image or a share.
For our system, we consider four cloud service providers. If the number of
cloud providers increases, the data is further divided, thus ensuring that each cloud
holds a minimal amount of data. The effect of changing the number of providers on
the security of the data is examined in Chapter 5. Figure 3.6 illustrates how
data is handled by each cloud for the word 'Once', with four clouds and a record
size of four. All characters, including spaces, are split similarly.
Figure 3.6 Data carried by each cloud: the four characters of the record 'Once' (positions 1–4) are each split across clouds 1–4.
3.3.4 Sending algorithm
In the previous sections we presented in detail the procedure of preparing the
data. Here we present the pseudocode for the algorithm, 'Algorithm 1: SEND
(input_data)', where input_data is a text file to be sent to the cloud.
Algorithm 1: SEND (input_data)
1  ns ← number of cloud service providers/servers
2  rs ← record size, that is, number of images per record
3  lib ← library of ASCII images in .bmp, each of size p×p where (p mod ns) = 0
4  tmp ← temporary working directory
5  nor ← number of records
6  char[ ] ← 0, array to read data character by character until end_of_line is reached
7  length ← 0, length of char[ ]
8  while input_data ≠ EOF
9    char[ ] ← read from input_data each character until end_of_line
10   for i ← 1 until char[i] = '\0' do
11     copy char[i].bmp from lib to tmp
12     i ← i + 1, length ← length + 1
13   end for
14   add noise to all images in tmp
15   c ← 0
16   r ← 1
17   for j ← 1 to length do
18     crop char[j].bmp in tmp into ns parts, output is ns images of size (p/ns)×p
19     for k ← 1 to ns do
20       transmit part k of char[j].bmp to cloud k
21       k ← k + 1
22     end for
23     c ← c + 1
24     if c == rs then
25       c ← 0, r ← r + 1
26     end if
27     j ← j + 1
28   end for
29 end while
30 nor ← r
In our system, we have four cloud providers, thus ns = 4. In Chapter 4 we
present the results and discuss the effect of increasing the number of cloud
providers to 8. The record size, rs, is 8 or 16, and the results for both are
presented in the next chapter. As we saw in Section 3.3.1, with an image size of
382 bytes, the record size is 3056 or 6112 bytes for rs of 8 and 16 respectively.
We read the data character by character until we reach the end of line, '\0'. For
the length of the line, we copy the respective characters from our library of
ASCII images lib into a temporary directory tmp.
Next, the noise, Gaussian or speckle in our case, is added to all these
characters. Then, each character is split into as many equal parts as the number
of servers, ns. This requires that (p mod ns) = 0 for a p×p image. In our case we
have p as 40. The size of the image can be changed to fit the space requirements.
We believe that working with smaller images will reduce the space and
computation requirements. Thus, a p×p image is converted to ns images of size
(p/ns)×p. That is, for a 40×40 image and with 4 clouds, we get 4 images of size
10×40 if cropped row-wise.
Cropping along the columns is also an alternative. For better security, other
combinations such as a diagonal or even a random crop can be used. In the case of
random cropping, the user will have to store the random sequence so that on data
retrieval, the user knows how to combine the images received from the clouds. Note
that random cropping puts more computational burden on the user. Lastly, the
number of records, each of size rs, is stored as nor, and will be used for data
retrieval. Figure 3.7 illustrates the sending process.
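The row-wise cropping and its inverse can be sketched as follows (illustrative Python, with array slicing standing in for the actual BMP manipulation):

```python
import numpy as np

def crop_rowwise(img, ns):
    """Split a p x p image into ns horizontal strips of size (p/ns) x p,
    one strip per cloud provider; requires p mod ns == 0."""
    p = img.shape[0]
    assert p % ns == 0, "image height must divide evenly among providers"
    strip = p // ns
    return [img[i * strip:(i + 1) * strip, :] for i in range(ns)]

def reassemble(parts):
    """Stack the strips retrieved from the clouds back into one image."""
    return np.vstack(parts)

img = np.arange(40 * 40).reshape(40, 40)
parts = crop_rowwise(img, 4)          # four 10x40 shares
assert all(part.shape == (10, 40) for part in parts)
assert (reassemble(parts) == img).all()
```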
Figure 3.7 Sending data: each character of the text 'Hi cloud' is read, the respective images are read from lib, noise is added to the images, and each image is split and sent to clouds 1–4; the characters occupy positions 1–8 in a record.
Note that even though the clouds are independent, the system behaves as one
to the user. The user first logs into each of the cloud provider accounts to which
data is to be sent. When the user gives the command to send, the system connects
to the specified clouds internally, establishes independent channels of
communication, and sends the data.
3.4 Retrieving data
The overall motive for our work is to construct a system which can allow a user
to perform database operations on the cloud. A user must be able to send
database style queries to the cloud and the cloud must return records which
satisfy the query. In our work, we concentrated on developing a system which can
handle the LIKE database query. To be precise, the user should be able to retrieve
records beginning with a certain string, that is, in terms of database terminology:
LIKE 'query%'. The search query is composed in a simple text file and can consist
of any ASCII character.
Since our objective is to retrieve records which begin with a certain query,
for a query of length l, we need to evaluate the first l locations in a record. Then,
we need to check if each of the first l images in a record match with the respective
equivalent images of the search query. If the query string matches with the images
in the first l locations of the record, then we retrieve the record and send it to the
user.
The key to the procedure above is pattern matching. We need to
determine a suitable method to match the noisy data in the server with the
unnoisy images of the search query. Then a suitable metric needs to be assigned
to determine whether a match is found or not. We must also consider false
positives and unsuccessful searches. In the subsequent sections we will discuss
these issues further.
3.4.1 Image retrieval from the records
Once the query is received, the system retrieves the images stored at selected
locations at the beginning of a record from each provider. Recall that the number
of records at each cloud is the same, and each record contains only parts of the
original images.
The first image from the first record in each server is retrieved. Aligning
these images together, we get the original noisy image of a character which
was cropped and sent to the individual clouds. For instance, an original 40×40
noisy image of a character was split into four 10×40 images and sent to four cloud
providers. The same is repeated for all characters, and they are stored in records
of size eight at each cloud.
Using the terminology from Algorithm 1, let rij[k] indicate an element at
server i (1 ≤ i ≤ ns), in record number j (1 ≤ j ≤ nor), where k (1 ≤ k ≤ rs) is
the location of the element inside record j. Note that for our system ns = 4,
rs = 8, and nor depends on the length of the input text and on rs. Then, in the
first step of data retrieval, the images at r11[1], r21[1], r31[1], r41[1] are
retrieved and assembled to produce the original noisy image of a single character.
This character is matched with the first character in the query string, which is
not noisy. If a match is found, then r11[2], r21[2], r31[2], r41[2] are retrieved
and assembled, and the constructed image is matched with the second character in
the query. This is repeated for all the characters in the query string.
If a match is not found, then r12[1], r22[1], r32[1], r42[1] are retrieved and
the procedure repeated. The process of performing the match is explained in the
next section.
3.4.2 Matching the images
The images retrieved from the clouds are noisy, and pattern matching on these
images generates extremely chaotic results; the images must be denoised first.

Creating the mask

Note that we are working on bitmap images in which each pixel is simply black or
white; the text is black and the background is white. Thus, performing a bitwise
AND operation between the noisy and unnoisy images of the same character should
produce an unnoisy image, which we call the mask. Then, we could compare the
mask with the actual image of the character in lib, which should result in an
almost perfect match. However, there is a problem, as illustrated in Figure 3.8
for Gaussian noise.
Figure 3.8 Problem with creating the mask: a bitwise AND between the unnoisy image stored in lib and the noisy image from the cloud yields a noisy mask.
Clearly, the mask is noisy whereas we expected an unnoisy image. This is
because bitwise AND produces a 1 (=white) only when both inputs are 1, and in
all other cases the output is 0 (=black). In the background, the unnoisy image is
all 1, while the noisy image has both 0 and 1. Thus, the output remains noisy in the
background. For the text, the original image is all 0 in that part, while the noisy
image has both 1 and 0; thus we get 0, that is black, as the text color in the
mask. We cannot perform pattern matching with so much noise in the mask.
Instead, if we perform a bitwise NOT on the original and noisy images, then
perform an AND between them, and then again NOT the output of the last step, we
will get a perfect mask. Note that in this mask the background is all white;
however, the text remains noisy for Gaussian noise, while for speckle noise even
the text is perfectly unnoisy (for details, refer to Section 3.3.2). Thus, if the
original image is Iorig and the noisy image is Inoisy, we generate the mask using
equation (5); the process is illustrated in Figure 3.9.

mask = NOT(NOT(Iorig) AND NOT(Inoisy))  (5)

Figure 3.9 Creating the correct mask, shown for Gaussian and speckle noise.
Using De Morgan's law as expressed in (6),

A OR B = NOT(NOT(A) AND NOT(B))  (6)

we get

NOT(NOT(Iorig) AND NOT(Inoisy)) = Iorig OR Inoisy.  (7)

Thus,

mask = Iorig OR Inoisy.  (8)
With respect to data retrieval, the mask is created between the search query
character and record rij[k]. The masks are created on the fly and pattern matching
is performed as explained below.
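Since the mask reduces to a bitwise OR, its construction is a one-liner (an illustrative Python sketch, with 1 = white and 0 = black as above):

```python
import numpy as np

def make_mask(orig, noisy):
    """Equation (5): mask = NOT(NOT(orig) AND NOT(noisy)), which by
    De Morgan's law is simply orig OR noisy (1 = white, 0 = black).
    Background pixels (orig == 1) are always white in the mask; text
    pixels (orig == 0) stay black unless the noise there is white."""
    return np.logical_or(orig, noisy).astype(np.uint8)

orig = np.array([[1, 1], [0, 0]], dtype=np.uint8)   # toy 'glyph'
noisy = np.array([[0, 1], [1, 0]], dtype=np.uint8)  # toy noisy copy
mask = make_mask(orig, noisy)
```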
Normalized cross-correlation

To perform the matching between two images, we employ the normalized cross-
correlation (NCC) metric as described in [34] and indicated in (9),

ncc = (1/n) · Σ_{x,y} (f(x, y) − μ_f)(t(x, y) − μ_t) / (σ_f · σ_t)  (9)

where the similarity between images f(x, y) and t(x, y), each having n pixels, is
calculated using (9). The variables μ_f and σ_f are the average and standard
deviation of f respectively, and the same holds for t. The output is 1 if the
images match perfectly and close to 0 if they do not match. We use NCC to
determine whether a mask and the search query image match. As noted before, a mask
is created every time an image in the record is accessed during data retrieval.
The NCC value, labelled ncc henceforth, between the mask and the character which
is to be searched, is then calculated. Detailed experiments were conducted using
this metric, the results of which are presented in Chapter 5. As an example,
Figure 3.10 illustrates the nature of the NCC metric.
Figure 3.10 Pattern matching with NCC against the unnoisy image stored in lib: ncc values of 0.5730 and 0.9020 for Gaussian noise, and 0.7469 and 1.0000 for speckle noise (for the noisy image from the cloud and for the mask, respectively).
Note that ncc is between 0 and 1. It is important to define a threshold
value, thv, such that a match is considered positive only if ncc is beyond the
threshold. If a very low threshold is chosen, we will have numerous false positives,
where a match is declared by the system even though ncc is very low. If the
threshold is high and the image is very noisy, it may result in more unsuccessful
searches, since ncc will be low in case of high noise. Note that avoiding a high
threshold is important for Gaussian noise, because the mask created in this case
has noisy text. Thus, ncc will never be 1 for Gaussian noise, and in fact it will
drop as the level of noise increases in the characters stored in the cloud. On the
other hand, a high threshold does not affect speckle noise, as the mask is a
perfectly unnoisy image which produces an ncc of 1 in case of a match, irrespective
of the amount of noise.
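Equation (9) can be transcribed directly (an illustrative Python sketch; it assumes neither image is constant, since a zero standard deviation would make the quotient undefined):

```python
import numpy as np

def ncc(f, t):
    """Normalized cross-correlation per equation (9) for two images of
    identical size.  Returns 1.0 for a perfect match.  Assumes neither
    image has zero standard deviation."""
    f = f.astype(float)
    t = t.astype(float)
    return np.mean((f - f.mean()) * (t - t.mean())) / (f.std() * t.std())
```

In our setting, f would be the mask and t the unnoisy query glyph; a match is declared when the result exceeds the threshold thv.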
3.4.3 Retrieval algorithm
Having presented in detail the procedure for data retrieval from the cloud in
the preceding sections, here we present the pseudocode of the retrieval algorithm,
'Algorithm 2: RETRIEVE (search_query)'. The content of the search_query is
contained in a text file, and we want to retrieve records from the cloud which
begin with it. Only newly introduced variables are indicated; the rest are
explained in Algorithm 1. The variables ns and rs are the same as in the
sending procedure and are specified in Algorithm 1. The variable nor
depends on rs and the number of characters in input_data, and is sent by the
cloud to the user after SEND (input_data) is called. The search_query must not
exceed the maximum size of the record, that is, rs, to avoid overflow.
Algorithm 2: RETRIEVE (search_query, ns, rs, nor)
1  query[ ] ← 0, array to read query character by character until end_of_string
2  length ← 0, length of query[ ]
3  fin ← directory where retrieved records are placed
4  ncc ← normalized cross-correlation (NCC) value
5  thv ← NCC threshold value
6  loc ← location identifier for a record rij[k] (refer to Section 3.4.1 for details)
7  found ← a boolean indicating whether a record matching the query was found or not
8  query[ ] ← read from search_query each character until end_of_string
9  length ← length of query[ ]
10 for j ← 1 to nor do
11   found ← 1
12   for k ← 1 to length and found = 1 do
13     loc ← 0
14     for i ← 1 to ns do
15       loc ← record location rij[k]
16       i ← i + 1
17     end for
18     noise[k] ← append records identified in loc and place them in tmp
19     original[k] ← copy query[k].bmp from lib to tmp
20     mask[k] ← original[k] OR noise[k]
21     ncc ← perform NCC between mask[k] and original[k]
22     if ncc < thv then
23       found ← 0
24     else found ← 1
25     end if
26     k ← k + 1
27   end for
28   if found = 1 then
29     for k ← 1 to rs do
30       loc ← 0
31       for i ← 1 to ns do
32         loc ← record location rij[k]
33         i ← i + 1
34       end for
35       image[k] ← append records identified in loc and place them in fin
36       k ← k + 1
37     end for
38     complete record satisfying the search_query is retrieved and placed in fin
39   else record not found
40   j ← j + 1
41 end for
The details of the algorithm have already been discussed in the preceding
sections. Figure 3.11 illustrates the retrieval algorithm. Let the input_data be
'computer science', which we already stored in the clouds with the number of cloud
providers, ns, being 4 and the record size, rs, being 8. Thus, as the number of
characters in input_data is 16, the number of records, nor, will be 2. We want to
retrieve records that begin with 'computer', which constitutes the search_query.
Assume we are working with Gaussian noise with μ and σ² as 0 and 0.30
respectively. Let the ncc threshold, thv, be 0.8750.
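Putting the mask and NCC steps together, the per-record matching loop of the retrieval procedure can be sketched as follows (illustrative Python; the function and variable names are ours, not the thesis's implementation):

```python
import numpy as np

def ncc(f, t):
    """Normalized cross-correlation, equation (9)."""
    f = f.astype(float)
    t = t.astype(float)
    return np.mean((f - f.mean()) * (t - t.mean())) / (f.std() * t.std())

def record_matches(noisy_record, query_glyphs, thv):
    """Check whether a record of reassembled noisy character images begins
    with the query.  noisy_record[k] is the noisy image at position k;
    query_glyphs[k] is the unnoisy lib image of the k-th query character."""
    for orig, noisy in zip(query_glyphs, noisy_record):
        mask = np.logical_or(orig, noisy).astype(np.uint8)  # equation (5)
        if ncc(mask, orig) < thv:     # below threshold: character mismatch
            return False
    return True

# toy glyphs standing in for 40x40 lib images
g1 = np.array([[0, 0], [1, 1]], dtype=np.uint8)
g2 = np.array([[1, 0], [1, 0]], dtype=np.uint8)
assert record_matches([g1, g2], [g1, g2], thv=0.875)       # prefix matches
assert not record_matches([g2, g1], [g1, g2], thv=0.875)   # prefix differs
```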
Figure 3.11 Retrieving data. The user requests access to each cloud of the cloud system and retrieves records r111, r211, r311, r411, appends them, retrieves the unnoisy 'c' from lib, creates the mask, and performs normalized cross-correlation. If ncc > thv, the character matches and the present record may contain the query, so r112, r212, r312, r412 are retrieved and the process repeats until the end of string in search_query; otherwise, the character does not match, the present record does not contain the query, and the search moves to r121, r221, r321, r421, repeating until the end of string in search_query.
3.5 Complexity analysis
Analysing Algorithm 1 for sending data, we observe its time complexity is O(w),
where w is the number of characters in the input text input_data. Reading the
input data and copying the respective images is an O(1) operation per character.
Adding noise to the data is also O(1). Cropping each image into as many parts as
the number of clouds and subsequently storing them at the clouds is O(1) as well.
For w characters in the input text, the time complexity is therefore O(w).

For Algorithm 2, the time complexity is dominated by the NCC procedure.
The process of reading and assembling the images from the records, calculating
the mask and finally retrieving the complete record is O((w/rs) · l), where w/rs
is the number of records and l is the number of characters in the query, with
l ≤ rs. The NCC procedure, using the algorithm described in [34], is bounded by
the image and template dimensions; in our case both are p×p, where p is the pixel
dimension of the image such that (p mod ns) = 0, so each comparison costs O(p²).
Thus, the complexity of retrieval is O((w/rs) · l · p²). (Recall p = 40 in our
experiments and each image is 382 bytes.) Comparing this complexity to standard
encryption based schemes as presented in Section 1.1, namely the recent work in
[5], [37], our system incurs a far lower complexity.
The storage requirement at the cloud is O(w), since in our scheme we are
storing w images. This is more than the storage used when the data is stored only
as unencrypted plaintext. It might seem that our scheme incurs slightly higher
space overhead than that incurred by encryption based schemes; however, the latter
incur a large key management overhead as well. Moreover, given how cheap storage
is on the cloud1, a slight increase in storage requirements is a small price for
data confidentiality and the ability to run queries on the cloud. Clearly, the
latter is still not possible in encryption based schemes.
With regard to the visual cryptography schemes discussed in Section 1.3.3,
the number of secret images or shares [40] is ns in our system. The space required
to store the ns shares is equal to the space requirement for the original unnoisy
image. Adding noise does not increase the image size. Following noise addition, we
crop the noisy image into ns equal parts, which are then directly stored at the
designated clouds without incurring any unnecessary space overhead.
The pixel expansion [40], which is the number of sub-pixels in the
generated shares that represent a pixel from the original image, is one for our
system, the minimum possible. A larger pixel expansion, as used in other visual
cryptography schemes [48], results in larger storage requirements. In addition, we
are working with binary two-color images, which results in a small image size. We
do not believe using colored images instead is a better choice, as that would lead
to large computational and storage overheads.
Thus, the total space requirement to store one image is O(ns · 1/ns) =
O(1). Moving the entire data to the clouds is thus O(w). Hence, in terms of cost
incurred by the user, the operation of storing parts of an image at ns clouds is
equivalent to storing one image at one cloud. Comparing our result with other
visual cryptography schemes [48], our approach is better in terms of complexity.
We discuss more on the security aspect in the following section.
1 For Amazon's S3, the first 1 TB is $0.140 per GB and the next 49 TB is $0.125 per GB. All data transfer to the cloud is free, and retrieval is free for the first 1 GB/month and $0.120 per GB up to 10 TB/month. In all cases, the price falls as usage increases. (All figures as of July 28, 2011.)
In our implementation of identifying the records and matching them, we
used simple hashing. This is easy to implement but suffers from overheads; a
better hash function could further optimize the time complexity.
3.6 Security analysis
Our approach to achieving data confidentiality on the cloud using visual
cryptography is a promising way to provide security against an untrusted
cloud provider. At any instance, a cloud holds only a part of the data; the more
clouds the data is divided among, the better the security. A cloud is
neither aware of which part of the data it holds, nor does it know how many other
clouds hold the remaining data. Data stored on the clouds is disjoint, and the
data itself is highly noisy. Thus, any attempt by an attacker to extract meaningful
information from the data will be rendered useless, as only noisy (garbage) data
will be returned.
Let us further assume that the attacker has access to optimum values of
the parameters used for noise addition and data retrieval. Even then, considering
a nominal case that one of the service providers is hacked, an attempt to run a
query and retrieve the data will not yield anything meaningful. We cover the
threat analysis in detail in Chapter 5. Extensive experiments are run to prove our
case.
Comparing our visual cryptography scheme to the current work in this
domain [48], the number of secret images or shares generated is ns. In the
preceding Section 3.5, we discussed the complexity for this number of shares. With
respect to security, the more shares we have, the better the security. In our
case, we have multiple shares, yet this does not result in an increase in
complexity. Multiple shares preserve confidentiality, since even if ns − 1 shares
are compromised and available to an attacker, the data will not be completely
revealed.
Our framework protects against both types of attack on a cloud: (1) from
outside the system by an external agent, and (2) from within the system by an
internal agent. We already explained (1) above. For (2), consider an internal
employee eavesdropping on the data and trying to gain meaningful information from
it. The employee may be a self-motivated attacker trying to steal the data, or may
be supported by the resource provider itself, where the provider wants to know
what the data is. With only a 1/ns part of the noisy data available, full data
retrieval will be impossible. Refer to Chapter 5 for results on the threat
analysis that validate our hypothesis.
Chapter 4
Results
In this chapter, we describe the system design to evaluate the algorithms described
in Chapter 3. Further, experiments are run to test the effectiveness of the proposed
scheme and the results are discussed and analysed.
4.1 System design
The system comprises one computer acting as a client, which issues the commands
to store and retrieve the data, and four servers, which act as virtual cloud
providers. The four servers are independent and do not interact with each other.
The choice of four servers is based on the premise that each server should hold only
a part of the data, so that if one or even two servers are compromised, the data is
not revealed. We later argue how varying the number of servers affects data
confidentiality.
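As a minimal sketch of this design (the names and byte layout here are illustrative, not the thesis implementation), a character image can be split into ns disjoint byte ranges, one per provider, so that any single server sees only its own fragment:

```c
#include <string.h>

#define NS 4          /* number of simulated cloud providers */
#define IMG_BYTES 382 /* one 40x40, 1-bit character image */

/* Hypothetical sketch: partition one character image into NS disjoint
 * byte ranges, one per provider. Each provider receives only its own
 * range and learns nothing about the other NS-1 fragments. */
void split_image(const unsigned char img[IMG_BYTES],
                 unsigned char parts[NS][IMG_BYTES / NS + 1],
                 int part_len[NS])
{
    int base = IMG_BYTES / NS;  /* 95 bytes per provider */
    int extra = IMG_BYTES % NS; /* first two providers get one extra byte */
    int off = 0;
    for (int i = 0; i < NS; i++) {
        part_len[i] = base + (i < extra ? 1 : 0);
        memcpy(parts[i], img + off, (size_t)part_len[i]);
        off += part_len[i];
    }
}
```

Only a client holding all NS fragments can reassemble an image; compromising one or two servers yields only a quarter or half of the bytes, mirroring the breach scenarios analysed in Chapter 5.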
For adding noise to the data, Gaussian and speckle noise were selected. As
discussed in Section 3.3.2, choosing the noise parameters, that is, the mean and
variance, is not straightforward. First, experiments are run to determine the
optimal noise parameter range; based on this range, we decide the NCC threshold
value beyond which data retrieval from the clouds becomes infeasible.
4.2 Implementation
The system backend at the client and cloud providers is written in C. It handles the
basic tasks of sending and retrieving the data. Adding noise to the images was
accomplished using MATLAB [36], where a standalone executable was created for
noise addition. To perform the operations on the images, namely cropping,
appending, creation of masks and calculation of NCC values, ImageMagick [30] was
used. The noise executable and the ImageMagick libraries are wrapped in C, which
is combined with the backend to produce a single executable. The entire system is
accessible through a command line interface.
The data to be sent to and retrieved from the clouds is stored in a simple
text file. The library of images of ASCII characters is generated using the default
font Arial in bold style. As mentioned in Section 3.3.1, each image is 40×40 in
grayscale, with two colors and a depth of one; thus, each pixel is either black or
white. The size of each image is 382 bytes.¹
4.3 Parameter estimation
We focus on evaluating two types of noise, Gaussian and speckle. The
experiments revolve around determining how the system can be configured for
effective data confidentiality on the clouds, following which the system is tested on
a large dataset to validate effective data retrieval. The first part of the experiments
involves evaluating the parameters of the noise added to each character, to judge
how effectively the data is obfuscated. Based on the results from the first part, a sample text
¹ When adding noise in MATLAB, each image is first converted to double precision, noise is added, and the output noisy image is then converted back to a grayscale, 1-bit depth image of 382 bytes.
is sent to the clouds for storage. Data retrieval is initiated using a search query,
and the NCC value, ncc, is calculated between the data retrieved and the actual
data, as described in Chapter 3. The third part of the experiment involves
determining an optimal ncc threshold, thv, at which false positives and unsuccessful
searches cease to exist. Based on these results, the system is tested on a larger
dataset where multiple queries are sent to the cloud for retrieval. Both noises are
tested to evaluate their effectiveness.
4.3.1 Noise parameters
As discussed in Section 3.3.2, peak signal-to-noise ratio (PSNR) is a sound metric to
measure the effectiveness of the noise added to the characters. Below, we discuss
how varying the mean and variance of the noises affects PSNR and thus how
effectively data can be concealed in the clouds.
Gaussian noise
For Gaussian noise, our experiment varies the mean (μ) from -0.25 to 0.25 and the
variance (σ²) from 0 to 1 in increments of 0.05, and records the mean PSNR of all
the ASCII characters for each (μ, σ²). Since noise is added to all ASCII characters
and each has an equal probability of being queried, the lower the mean PSNR of the
characters, the more obfuscated the images become. Figure 4.1 plots noise variance
against mean PSNR, and Figure 4.2 shows the slope of the PSNR. The slope
plotted in the figure is the negative of the actual slope, so that changes in the slope
are visually easier to notice.
Figure 4.1 Mean PSNR of all ASCII characters for Gaussian noise.
Figure 4.2 Slope of mean PSNR of all ASCII characters for Gaussian noise.
It is evident from the figures that the PSNR drops sharply before σ² = 0.20 and
becomes asymptotic at σ² = 0.35 and beyond for all values of μ. Thus, σ² = 0.20 to
0.35 represents an optimal range for noise addition.
Following our discussion in Section 3.3.2, we further examine the skewness for
Gaussian noise here. Taking the PSNR values into account, we can see that
negative and positive means distort the distribution unevenly. Though it is
desirable to have noise parameters for which the PSNR is lowest, a negative mean
leads to a noisier background and only a moderately noisy text. In our algorithm,
the background is effectively filtered out by the mask. Thus, as long as the PSNR
of an image is low enough to render the text sufficiently noisy for effective
obfuscation, noise in the background is not relevant. For a positive mean, though
the PSNR is higher than in the negative-mean case, the text portion is noisier.
Hence, positive and negative skewness each have advantages and disadvantages.
For further experiments, we restrict the system to μ = -0.10 to 0.10 to avoid
generating noise from a strongly asymmetric distribution. Figure 4.3 shows some
images at different (μ, σ²) with their individual PSNR values.
Figure 4.3 Images at different (μ, σ²) with their PSNR values in dB:
(-0.20, 0.30): 5.2560; (-0.10, 0.30): 5.7573; (0, 0.30): 7.0997;
(0, 0.50): 5.9666; (0.10, 0.30): 7.7924; (0.20, 0.30): 8.4239.
Speckle noise
As noted in Section 3.3.2, speckle noise is essentially a multiplicative noise
with μ = 0, while the variance can be varied. Speckle noise only adds black pixels
to the image, on both the background and the text. We vary σ² from 0 to 5 in
increments of 0.05. Figure 4.4 plots the mean PSNR of all characters against the
variance, and Figure 4.5 plots the negative of the slope.
Figure 4.4 Mean PSNR of all ASCII characters for speckle noise.
Figure 4.5 Slope of mean PSNR of all ASCII characters for speckle noise.
The figures indicate that the mean PSNR stabilizes at σ² = 1.85 to 2.00 and
becomes asymptotic beyond this point. Increasing the variance beyond 2.00 does
not yield a significant reduction in PSNR, as is evident from the slope; thus, we
limit our system to the range determined above.
4.3.2 Sending and retrieving data
Based on the findings of the previous section, we conducted experiments for
sending and retrieving actual data on the cloud system. The data consists of a book
chapter with 784 characters (including white spaces). As discussed in Chapter 3, we
focus on retrieving records which begin with a certain string. In our test, we use
the search query the, as it is the most frequently occurring word in the English
language [44]. The size of a record is kept at 3056 bytes, which means 8 images
compose a record, as one image is 382 bytes. Following the discussion in Section
4.1, the number of cloud service providers is 4.
Gaussian noise
Table 4.1 summarizes the ncc observed for Gaussian noise at different values of μ
and σ² for the search query the. The numbers 43, 61 and 91 refer to the actual
locations of the records beginning with the.
Table 4.1 reveals that a negative μ indeed causes negative skewness, leading
a large number of pixel values to be low, that is, 0 (black). When the mask is
created from such a noisy image, the background will have more black-pixel noise
while the black text will have less noise. The correlation of such a mask with the
original image will lead to a high ncc, as indicated in Section 3.4.2.
Mean (μ) = -0.10
        σ² = 0.20             σ² = 0.25             σ² = 0.30             σ² = 0.35
      43     61     91     43     61     91     43     61     91     43     61     91
 t  0.9618 0.9502 0.9657 0.9541 0.9502 0.9463 0.9463 0.9345 0.9463 0.9306 0.9306 0.9227
 h  0.9596 0.9468 0.9698 0.9494 0.9210 0.9468 0.9443 0.9132 0.9364 0.9417 0.8843 0.9106
 e  0.9715 0.9483 0.9715 0.9570 0.9278 0.9570 0.9454 0.9130 0.9454 0.9366 0.8981 0.9249

Mean (μ) = -0.05
 t  0.9618 0.9520 0.9541 0.9502 0.9385 0.9463 0.9345 0.9306 0.9227 0.9266 0.9187 0.9227
 h  0.9571 0.9443 0.9647 0.9443 0.9158 0.9391 0.9417 0.9001 0.9106 0.9236 0.8790 0.9054
 e  0.9570 0.9308 0.9570 0.9454 0.9160 0.9454 0.9366 0.8981 0.9308 0.9219 0.8830 0.9071

Mean (μ) = 0.00
 t  0.9541 0.9502 0.9502 0.9385 0.9345 0.9345 0.9266 0.9187 0.9227 0.9026 0.9187 0.9187
 h  0.9494 0.9210 0.9443 0.9417 0.9080 0.9262 0.9314 0.8790 0.9054 0.9027 0.8629 0.8922
 e  0.9512 0.9160 0.9541 0.9425 0.9011 0.9366 0.9308 0.8830 0.9130 0.9190 0.8739 0.8951

Mean (μ) = 0.05
 t  0.9463 0.9345 0.9463 0.9266 0.9306 0.9227 0.9067 0.9187 0.9187 0.8905 0.9147 0.9147
 h  0.9443 0.9132 0.9314 0.9314 0.8843 0.9054 0.9054 0.8683 0.8922 0.8922 0.8521 0.8736
 e  0.9454 0.9100 0.9454 0.9337 0.8830 0.9190 0.9190 0.8739 0.8951 0.9041 0.8678 0.8708

Mean (μ) = 0.10
 t  0.9187 0.9187 0.9187 0.9345 0.9345 0.9345 0.8905 0.9187 0.9147 0.8823 0.9067 0.9147
 h  0.9158 0.8736 0.8922 0.9417 0.9001 0.9106 0.8922 0.8548 0.8710 0.8869 0.8303 0.8656
 e  0.9219 0.8739 0.8981 0.9366 0.8951 0.9308 0.9011 0.8678 0.8708 0.8860 0.8585 0.8616

Table 4.1 NCC for Gaussian noise (columns: records 43, 61 and 91 for each of σ² = 0.20, 0.25, 0.30, 0.35).

A positive μ leads to a positive skewness; thus the majority of the pixel values are
relatively high, that is, 1 (white). The text in the mask then has a large number of
white pixels as the noise. Thus, ncc for such an image is low, as correlation becomes tough with
such a noisy text. In both the positive- and negative-μ cases, ncc drops steadily
with increasing σ². This indicates that successful data retrieval becomes
increasingly difficult as the variance grows.
Speckle noise
As the mean is 0 in speckle noise, we only need to vary the variance in this
case. Table 4.2 summarizes the observations.
Mean (μ) = 0.00
        σ² = 1.85             σ² = 1.90             σ² = 1.95             σ² = 2.00
      43     61     91     43     61     91     43     61     91     43     61     91
 t  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
 h  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
 e  1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
Table 4.2 NCC for speckle noise.
The ncc is observed to be exactly 1 in all cases, owing to the nature of the
noise. Recall that speckle noise is multiplicative: a large amount of black noise is
added to the white background, since it is bright, while the text essentially has
black noise added to it, leaving it unaffected. However, if the variance is high, such
as 2.00, the noise added to the white background fills the image with so much black
noise that the text itself is obfuscated in the process, leading to the very low PSNR
noted in Section 4.3.1.
4.3.3 NCC threshold
When retrieving data from the clouds, the NCC plays a key role in successful
retrieval. A low ncc threshold might lead to a large number of false positives:
records that do not actually begin with the search query will be incorrectly
identified as satisfying it. A high ncc threshold might cause unsuccessful searches:
if the observed ncc is smaller than the threshold, records that do satisfy the search
query will be ignored. Thus, identifying the correct ncc threshold thv is a
non-trivial task.
Gaussian noise
Using the same system settings as in Section 4.3.2, the results for thv estimation for
Gaussian noise are presented in Table 4.3. The number of false positives for each
thv is indicated in each cell for a range of (μ, σ²) values. Cells with 0 indicate no
false positives, which is the threshold we are searching for. Cells with -1 indicate
unsuccessful searches, that is, the search query was not found. This occurs when
the ncc of any character in the search string falls below the threshold, causing that
record to be overlooked. Note that the -1 cases are more detrimental to the system:
a small number of false positives is still acceptable in real systems, whereas missing
the data altogether is not.
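Each cell of Table 4.3 can be reproduced by a simple scan over the per-record ncc scores, as in this illustrative sketch (the function and variable names are ours, not the thesis code's):

```c
/* One cell of Table 4.3: given each record's ncc score and whether it
 * truly starts with the query, count false positives at threshold thv,
 * or return -1 if any true match scores below thv (unsuccessful search). */
int cell(const double *score, const int *is_match, int nrec, double thv)
{
    int fp = 0;
    for (int i = 0; i < nrec; i++) {
        if (is_match[i]) {
            if (score[i] < thv)
                return -1; /* a true record would be missed */
        } else if (score[i] >= thv) {
            fp++;          /* a non-matching record passes the threshold */
        }
    }
    return fp;
}
```

Sweeping thv upward drives the false-positive count toward 0 until, past some point, a true match falls below the threshold and the cell flips to -1, which is exactly the pattern visible in Table 4.3.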
Figure 4.6 presents the key findings from these observations. The number of false
positives is high for low threshold values, as expected. It is interesting to note that
the average number of false positives across all four variances falls by 38.51% from
μ = -0.10 to μ = 0.10 at a low thv of 0.7; at a high thv of 0.825, the fall is 93.75%.
At thv = 0.85, ten contiguous observations with zero false positives are seen from
(0.00, 0.25) to (0.10, 0.30). This is a good indicator, since working with a range of
(μ, σ²) allows flexibility in setting up a large-scale system which caters to different
search queries. At thv = 0.875, we observe unsuccessful searches at (0.05, 0.35) and
beyond. Thus, the block from (0.00, 0.25) to (0.05, 0.30) allows the flexibility to use
a range of thv from 0.85 to 0.875, which is again useful for a large system setup.
thv        False positives
           μ = -0.10            μ = -0.05            μ = 0.00             μ = 0.05             μ = 0.10
     σ² =  0.20 0.25 0.30 0.35  0.20 0.25 0.30 0.35  0.20 0.25 0.30 0.35  0.20 0.25 0.30 0.35  0.20 0.25 0.30 0.35
0.7000 45 44 43 42 39 39 39 38 38 37 35 34 31 29 29 28 29 28 25 25
0.7250 38 37 36 35 33 31 30 29 26 24 26 25 24 23 21 20 23 20 20 18
0.7500 29 28 28 27 26 24 25 21 24 21 19 17 19 18 15 12 15 13 9 7
0.7750 24 23 23 21 19 18 13 11 15 14 1 8 12 10 6 6 6 6 4 4
0.8000 14 13 10 8 13 12 7 4 7 4 4 3 5 3 3 2 2 2 1 0
0.8250 5 4 4 3 4 2 2 2 3 2 2 1 2 2 1 0 1 0 0 0
0.8500 2 2 2 0 2 2 1 0 2 0 0 0 0 0 0 0 0 0 0 -1
0.8750 1 1 0 0 1 1 0 0 1 0 0 0 0 0 0 -1 -1 -1 -1 -1
0.9000 1 0 0 -1 0 0 -1 -1 0 0 -1 -1 0 -1 -1 -1 -1 -1 -1 -1
0.9250 0 -1 -1 -1 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
0.9500 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Table 4.3 NCC threshold estimation for Gaussian noise.
Figure 4.6 Optimum NCC threshold for Gaussian noise.
The parameter (0.05, 0.20) provides a good setting, as it lies almost midway
between (0.00, 0.25) and (0.05, 0.30), and more importantly the thv spans up to 0.9
without leading to false positives or misses. Thus, thv = 0.875 is a good choice, as
it is supported by zero false positives above and below it. Hence, we can confidently
set the system to μ = 0.05, σ² = 0.20 and thv = 0.875 for further experiments with
Gaussian noise.
Speckle noise
The results for speckle noise are presented in Table 4.4. Owing to the nature of
speckle noise, there is no unsuccessful search at any threshold. Moreover, the false
positives cease to exist when thv=0.95. Figure 4.7 clearly shows the results at
different variances.
Table 4.4 NCC threshold estimation for speckle noise.
Figure 4.7 Optimum NCC threshold for speckle noise.
thv      False positives
          σ² = 1.85   σ² = 1.90   σ² = 1.95   σ² = 2.00
0.7000       63          63          64          63
0.7250       52          53          53          52
0.7500       43          42          42          42
0.7750       39          38          39          40
0.8000       36          35          36          36
0.8250       25          24          25          25
0.8500       14          13          14          14
0.8750        6           7           6           7
0.9000        4           4           4           4
0.9250        2           2           2           2
0.9500        0           0           0           0
0.9750        0           0           0           0
1.0000        0           0           0           0
It is noteworthy that the number of false positives remains almost constant
despite the increasing variance. This again is attributed to the multiplicative
nature of the noise addition. Thus, even when the image is greatly distorted by
noise, as it is at high variance values, data retrieval is more effective for speckle
than for Gaussian noise. From these results we can confidently assign thv = 0.99
for speckle noise.
Hence, we have now found the optimum parameters for both Gaussian and
speckle noise. Before we proceed to testing on a large dataset, it is interesting to
compare the effectiveness of the two noises. As we observed, the optimum
parameters for Gaussian noise are μ = 0.05 and σ² = 0.20. From Section 4.3.1 and
Figures 4.1 and 4.2, the mean PSNR of all ASCII characters at (0.05, 0.20) is
8.0723 dB. For speckle, the optimum is at σ² = 2.00, where the mean PSNR of all
ASCII characters is 4.6580 dB. Thus, speckle is about 42% more effective in noise
addition for our system. In the following experiments, we evaluate both noises with
their optimal parameters to assess their performance on a larger scale.
4.4 Multiple queries on large dataset
Based on the above findings, we now test our system on an independent dataset
with both Gaussian and speckle noise. We selected a book chapter to be sent to the
cloud. The system has the following statistics:
number of characters in the data = 5370
size of each record, rs = 3056 bytes (8 images per record. 382 bytes each)
total number of records, nor = 683
number of cloud service providers simulated, ns = 4
Gaussian noise: μ = 0.05 and σ² = 0.20
Speckle noise: σ² = 2.00 (μ = 0 for speckle noise)
thv: Gaussian noise = 0.8750 and speckle noise = 0.99
Table 4.5 summarizes the result for both the noises:
                             Gaussian noise           Speckle noise
search query   occurrences   false pos.  unsucc.    false pos.  unsucc.
an                 11            1          0            0         0
ing                 6            0          0            0         0
of                  4            0          0            0         0
as                  6            0          0            0         0
s.                  1            0          0            0         0
and                 1            1          0            0         0
the                 9            0          0            0         0
all                 4            1          0            0         0
995                 1            0          0            0         0
We                  1            0          0            0         0
we                  4            0          0            0         0
because             1            0          0            0         0
warming             1            0          0            0         0
green               2            0          0            0         0
Earth               2            0          0            0         0
have                2            0          0            0         0
know                1            0          0            0         0
on                  3            0          0            0         0
he                 10            1          0            0         0
me                  2            1          0            0         0
Table 4.5 Data retrieval with multiple search queries on a 5K character dataset.
The results indicate that the system successfully retrieved all the queries
with no misses. There were no false positives for speckle noise, while Gaussian noise
produced only five. The system also correctly distinguished between lowercase and
uppercase versions of the same letter, as well as the space character and numbers.
In the case of multiple occurrences of a query, all were successfully retrieved.
Investigating the false positives reveals that of the five for Gaussian noise
(speckle has zero), three are due to errors in distinguishing lowercase from
uppercase characters. For instance, for the query all, the one false positive is due to
the system recognizing All as all. The system detects long search queries, symbols,
and uppercase and lowercase versions of the same character with 100% accuracy.
The total number of query occurrences is 72 and the number of false positives for
Gaussian noise is 5, which accounts for 6.94% of all queries. Speckle noise has zero
false positives.
As the last series of experiments, we tested the system with longer search
queries and a larger dataset to reflect a more realistic scenario. We doubled the
record size to accommodate longer queries; the remaining parameters are the same
as in the previous experiments. The system parameters are as follows:
number of characters in the data = 10730
size of each record, rs = 6112 bytes (16 images per record. 382 bytes each)
total number of records, nor = 744
number of cloud service providers simulated, ns = 4
The findings are summarized in Table 4.6. The system consistently has zero
unsuccessful searches, and the number of false positives is small for both noises.
Table 4.6 Data retrieval with multiple search queries on a 10K character dataset.
                                 Gaussian noise           Speckle noise
search query       occurrences   false pos.  unsucc.    false pos.  unsucc.
computer                4            0          0            0         0
programming             1            0          0            0         0
basic                   4            0          0            0         0
on set B (say)          1            0          0            0         0
numbers:                1            0          0            0         0
simultaneously          1            0          0            0         0
comp                    8            0          0            0         0
comm                    1            0          0            0         0
to                      9            3          0            2         0
too                     1            0          0            0         0
machine                 1            0          0            0         0
from the wheels         1            0          0            0         0
in fact                 2            1          0            1         0
element                 2            0          0            0         0
alphabets.              1            0          0            0         0
"                       8            0          0            0         0
"hardware"              1            0          0            0         0
operation               2            0          0            0         0
multi                   2            0          0            0         0
Cipher, with A          1            0          0            0         0
For                     1            0          0            0         0
for                     3            0          0            0         0
That is, why            1            0          0            0         0
Also, our               1            0          0            0         0
Let us examine          1            0          0            0         0
R.L. Brown              1            0          0            0         0
quite a variety!        1            0          0            0         0
buses - to              1            0          0            0         0
It's a deep ques        1            0          0            0         0
what?                   1            0          0            0         0
Referring to Table 4.6, which shows the test results for a dataset twice the
size of the former and covers 64 query occurrences that are much more complex
than before, the number of false positives for Gaussian noise is 4, that is, 6.25% of
the total queries, while speckle has 3, or 4.69%. The final results are presented in
Figure 4.8.
Figure 4.8 Data retrieval on >5K and >10K character dataset
with multiple search queries.
Hence, we conclude that both Gaussian and speckle noise can be used for
data obfuscation in our system with 100% retrieval accuracy. The number of false
positives is very low and does not affect the system's accuracy. It is evident that
speckle noise performs slightly better than Gaussian noise.
4.5 Running time
Before we proceed, we discuss the running time of the above experiments.
All results were generated on a 64-bit, 3.4 GHz Intel Core i7 2600 processor with
8 GB RAM. At peak usage, processor and memory consumption were each only
about 20%.
The average running time for a query was determined from observations on
the >5K and >10K character datasets. To give an accurate picture, we report the
time per unit character, that is, the time required to send one character to, or
retrieve one character from, the cloud. The sending time depends on the number of
characters in the text; thus, we report sending_time / number_of_characters for
the sending time. Retrieval time depends primarily on the number of records.
Query length also affects retrieval time, but we chose queries of varying length for
our experiments, so an average result on retrieval time suffices. Thus, the retrieval
time is reported as retrieval_time / number_of_records.
Note that we are working with images; thus, time per character intrinsically
means the time taken to send the equivalent image of a character to, and retrieve it
from, the cloud system. Recall that each image is 382 bytes. In addition, the figures
were averaged over both noises, as their running times were within 0.1 ms of each
other. Results are presented in Table 4.7 for different cloud configurations, varying
the number of clouds, ns, and the record size, rs. All figures are in milliseconds.
ns   rs (images)   Sending time (ms)   Retrieval time (ms)
 4        8              22.21                28.53
 4       16              19.40                14.08
 8        8              34.17                23.29
 8       16              30.92                11.75
Table 4.7: Running time measurements.
The running times are reasonably fast. It is interesting to note that as ns is
doubled for the same rs, the sending time increases by 35%; however, for the same
case, the retrieval time drops by 18%. The drop can be attributed to the fact that
it is computationally less expensive to withdraw small parts of a fixed-size object
than large chunks. When ns doubles, the number of parts of an image also doubles
and the data held by each cloud halves. During retrieval, when each cloud is asked
to transmit its part of the image, a cloud sends images that are half the size of
those when ns is not doubled. Thus, it is beneficial to have more clouds, not only
from a security perspective but also from a computational point of view.
If rs is doubled while keeping ns the same, the sending time drops by 12.5%.
The reason is that writing to contiguous locations within a record is faster than
writing data across consecutive records; with a higher rs, the system has less
overhead in going from one record to the next. On the other hand, the retrieval
time drops by 50% as rs doubles while ns is constant. This is also because
traversing a single record and reading the data is quicker than going from one
record to the next. We reckon these effects are attributable to caching, since we are
reading from contiguous locations in a record. Thus, it is beneficial to have longer
records from a computational perspective.
However, from a security perspective it is not advisable to have long records.
If the record length is small, there will be more records for a given input text. If an
attacker eavesdrops and tries to determine which record the query is being retrieved
from, then the smaller the number of records, the smaller the attack domain for the
attacker. We want to generate as many false positives for an attacker as possible,
which can only occur when the number of records is high, that is, when the record
length is small.
Note that the program is not written to exploit a multi-threaded
architecture; we use just one thread at a time. Modifying the code to be
multi-threaded would bring the running time down by a large margin, and
parallelizing the image processing would also make the system much faster.
Calculating the NCC is a time-consuming task, and we would like to investigate
how to speed up this operation. We strongly believe it is possible, as we currently
use a very basic way of calculating the NCC that is not optimized for speed.
Also, we are not using any indexing services on the database; simple hashing
is used to identify the records. Simple hashing is easy to implement but not an
optimal choice. We believe a good hash function could reduce the running time and
also add security to the system.
We have not yet discussed how the system behaves under attack: if one or
more servers are sabotaged, how much data will an adversary be able to retrieve?
We discuss these circumstances in the next chapter.
Chapter 5
Threat Analysis
In this chapter, we discuss how the system reacts in the event of an unauthorized
access. Various scenarios are developed and extensively tested to determine the
system resilience to attacks.
5.1 Threat scenarios
Our system achieves data confidentiality by storing an authorized user's data in
the form of images at independent cloud providers. The providers are not aware of
each other's presence. Moreover, the data stored in them is disjoint; no two clouds
have any data in common. Only authorized users can access the cloud to add or
retrieve data.
When an unauthorized user tries to access the data and targets a specific
cloud provider, only a part of the data will be accessible to him. That is, for ns
clouds, only a 1/ns part of the data is revealed to the attacker. In such a scenario,
when an attacker sends a search_query, only the cloud which he has compromised
will return actual data, while the rest of the providers will return noisy data.
As for our system, consider ns=4, and assume the authorized user has
already stored the data in the clouds. From the discussion at the end of Section
3.3, note that the user has to connect to each of the clouds, and the system
establishes independent channels of communication between the user and the
providers. Consider an attacker who gains access to one of the clouds and wants to
retrieve all the data stored on the cloud by a user. When he sends a search_query,
only the compromised cloud will return actual data, while the others simply return
noise. Figure 5.1 and Figure 5.2 show how the data, for instance the character a,
would appear to an attacker when one and two clouds, respectively, are breached
for ns=4.
Figure 5.1 Data when one cloud out of four is breached.
Figure 5.2 Data when two clouds out of four are breached.
We suppose that the attacker does not know which cloud holds which part
of the data. Thus, if he plans to attack one cloud, each cloud has an equal
probability of holding any given one-fourth of the data. In other words, the
probability of an attacker breaching any particular cloud i out of ns clouds is 1/ns.
For four clouds, if one cloud is to be breached, the attacker has four equally
probable options: {1}, {2}, {3}, {4}; while if two clouds are to be breached, there
are six equally likely scenarios: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}.
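The number of equally likely breach scenarios is the binomial coefficient C(ns, k). A small sketch (our own helper, not part of the thesis code) reproduces the combination counts used in Table 5.1, which are ten queries times these values:

```c
/* Binomial coefficient C(n, k), computed incrementally so that every
 * intermediate division is exact. */
long choose(int n, int k)
{
    if (k < 0 || k > n)
        return 0;
    long c = 1;
    for (int i = 1; i <= k; i++)
        c = c * (n - k + i) / i; /* c = C(n-k+i, i) after each step */
    return c;
}
```

For example, choose(4, 1) = 4 and choose(4, 2) = 6 give the four and six scenarios listed above, while choose(8, 2) = 28 and choose(8, 4) = 70 cover the eight-cloud cases tested later.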
In Section 3.3.3, we briefly discussed how the number of cloud providers
affects the overall security of the system. We postulate that as the number of
clouds increases, the data is divided further; hence, in the case of a breach, only a
very small fraction of the data will be leaked to the attacker. We also test our
system with ns=8 to validate this claim.
5.2 Experimental setup
To test the threat scenarios, we sent some data to the cloud for storage. As we saw
in Section 4.4, although both Gaussian and speckle noise performed well, Gaussian
noise performed slightly worse than speckle. Thus, we can assume that the chance
of breaching a cloud where data obfuscation is done with Gaussian noise is higher.
Hence, for the following experiments we use Gaussian noise for image obfuscation.
The system has the following parameters:
number of characters in the data = 1130
size of each record, rs = 3056 bytes (8 images per record. 382 bytes each)
total number of records, nor = 145
number of cloud service providers simulated, ns = 4 and ns = 8
Gaussian noise: μ = 0.05 and σ² = 0.20
thv: Gaussian noise = 0.8750
The parameters for Gaussian noise are the same as in Chapter 4. As a
worst-case scenario, we also assume that the attacker knows the optimum threshold
to use for correlation. We test with the following ten queries:
six, an, in, for, the, My, grown-up, wing, to, any one
In other words, we want to see if the attacker can successfully retrieve
records for a certain input query by breaching a certain number of clouds. Table
5.1 indicates the number of cases for which we tested the system. For a specific
ns, the integers denote the clouds breached and in brackets we indicate the
fraction of total data leaked to the attacker.
                           ns = 4               ns = 8
clouds breached          1       2         1       2       4
(fraction leaked)      (25%)   (50%)    (12.5%)  (25%)   (50%)
combinations tested      40      60        80     280     700
Table 5.1 Threat scenarios tested.
5.3 Results
We measured the number of successful retrievals and false positives for a specific
search string. The results for all ten queries were averaged over a specific number of
clouds breached, that is, over each column in Table 5.1. The rationale for averaging
is that once an attacker decides to breach one cloud out of a total of four, all four
combinations are equally likely. The attacker is not aware of the total number of
clouds into which the data is divided; he only decides the number of clouds he
wants to breach.
Figure 5.3 and Figure 5.4 illustrate the results for ns = 4 and ns = 8,
respectively. Each observation point represents the results of the ten queries for a
single cloud configuration. For instance, with ns = 4 and one cloud breached, all
ten queries were tested for that configuration; the false positives and successful
retrievals were recorded and converted into percentages, giving one observation
point each for false positives and successful retrievals. In other words, each dot in
the graph represents ten query results. For one cloud breached there are
C(4,1) = 4 combinations, hence (10×4 =) 40 recordings; refer to Table 5.1 for the
other scenarios.
The results are summarized in Table 5.2. Of the ten queries, some had
single occurrences, while others had multiple; in total, fifteen records should be
retrieved for a 100% successful retrieval. The lower the success rate and the higher
the false positives, the better the resilience of the system to an attack.
Figure 5.3 False positives and successful search with four clouds.
Figure 5.4 False positives and successful search with eight clouds.
                               ns = 4               ns = 8
 clouds breached          1 (25%)  2 (50%)   1 (12.5%)  2 (25%)  4 (50%)
 successful retrieval*       5.0     37.8        0.0       5.7     37.2
 false positive*            25.0     18.9        2.5       5.5     15.1
 * all figures in %.

Table 5.2 Results for different attack scenarios: average false positives and successful retrievals.
5.4 Analysis
The results are encouraging. Notably, even with 25% of the data in the clouds breached, the attacker's success rate is only about 5%. Even when 50% of the data is available to the attacker, he can still correctly retrieve less than 40% of the queries. The number of false positives is also moderate. For our system, we want the false positives to be high in case of an attack: a high number further confuses the attacker and makes it more difficult to retrieve any useful information from the data. An attacker may use artificial intelligence or machine learning techniques to extract a pattern from the queries and thereby try to estimate the data; a good number of false positives makes such an attempt increasingly difficult.
In real life, a 50% breach is too high a number; we believe an attacker may be able to access about 10-20% of the data in a realistic scenario. Also, we tested the system with just four and eight providers; for better resilience to attacks, this number will be much higher in realistic cases. From the table we can see that when only one server out of eight, that is, about 12% of the data, is breached, the attacker's success rate is zero.
We can also observe that the ratio of successful retrievals (and false positives) to the amount of data breached is almost constant. This indicates the consistency and robustness of our system. Given the large number of cloud providers in the industry, the more providers we use, the lower the attacker's success rate will be, with a favorable ratio to the number of false positives. Thus, with respect to the theoretical security analysis in Section 3.6, our results validate our claim. Also, note that the attacker here can be an external agent or even internal to the service provider; in either case, data confidentiality is maintained. Thus, even the cloud provider itself cannot determine the data with full accuracy.
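The near-constancy of this ratio can be read off Table 5.2 directly: at the same fraction of data breached, the ns = 4 and ns = 8 averages nearly coincide. A small check using the Table 5.2 figures:

```python
# Average successful-retrieval rates from Table 5.2, keyed by the
# fraction of data breached: (ns = 4 value, ns = 8 value), in %.
success = {0.25: (5.0, 5.7), 0.50: (37.8, 37.2)}

for frac, (four, eight) in success.items():
    gap = abs(four - eight)
    print(f"{frac:.0%} breached: ns=4 -> {four}%, ns=8 -> {eight}%, gap {gap:.1f}")
    assert gap < 1.0  # near-constant across cloud configurations
```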
In our experiments, when the attacker has access to some of the clouds and tries to retrieve data from the entire system, the clouds to which the attacker does not have access return garbage or noisy data. These uncompromised clouds add to the output the same noise pattern that was used to obfuscate the original data before it was sent to the cloud. This should theoretically facilitate the attacker when he creates the mask and attempts retrieval.
Also, the optimum ncc threshold, thv, which was determined as described in Section 4.3.3, is provided to the attacker in our tests. In realistic cases, this threshold is known only to the user; even the cloud is not aware of it. Even when armed with thv, the attacker is unable to fully retrieve the data. Our experimental setup was thus a best-case scenario for an attacker, yet the results still favour strong data confidentiality. Moreover, the user can verify the integrity of the data at any time, simply by evaluating random records to see whether they have been corrupted.
We would like to highlight that there are cases where an attacker is able to retrieve parts of the data. The figures reported in the previous section are averages: overall, the data is safe from an attack, but as revealed in Figures 5.3 and 5.4, there are instances where the attacker succeeds in retrieving records. On closer inspection, the reason for this is how the data is split. When a 40×40 image is cropped into four or eight parts, the topmost and bottommost parts do not contain the text portion; the parts belonging to the middle of the image contain the most text. Even when noise is added to these middle sections, there is a possibility that an attacker may be able to reveal the data. The high success rate observed in some cases is due to the attacker having access to these middle parts of the image.
In a realistic scenario, we note that which cloud holds which part of the data is secret. Thus, the attacker has an almost equal probability of attacking a cloud that holds the top or bottom parts of the image and attacking a cloud that holds the middle region. To overcome this scenario we propose a modified strategy. When cropping an image, a certain percentage of the top and bottom regions should be split into a nominal number of parts, such as four.
Even if these top and bottom parts are revealed to an attacker, since they hardly contain any text, no meaningful data can be revealed. The middle region of the image should be split into more parts than the end regions, for instance eight or more. When these middle parts are sent to the cloud providers, each cloud will hold only a very small part of the original image, rendering a situation similar to the case where only one out of eight parts is revealed to an attacker. Such a scheme will allow full data confidentiality. If a user is limited by the number of available cloud providers, then two parts of the image instead of one can be sent to the same cloud, provided the two parts are as far apart as possible in the original image. This is analogous to the concept of Hamming distance used in visual cryptography [40].
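A minimal sketch of this uneven cropping strategy for a 40-row image follows; the `uneven_bands` helper, the 25% edge fraction, and the part counts are illustrative assumptions, not values fixed by the scheme:

```python
def uneven_bands(height: int, edge_frac: float = 0.25,
                 edge_parts: int = 2, middle_parts: int = 8):
    """Return (start, end) row ranges for cropping: the top and bottom
    edge_frac of the image are cut into coarse bands, while the
    text-bearing middle region is cut into many fine bands."""
    edge = int(height * edge_frac)

    def split(start, end, n):
        step = (end - start) / n
        return [(start + round(i * step), start + round((i + 1) * step))
                for i in range(n)]

    return (split(0, edge, edge_parts)                   # top: few parts
            + split(edge, height - edge, middle_parts)   # middle: many parts
            + split(height - edge, height, edge_parts))  # bottom: few parts

bands = uneven_bands(40)
assert bands[0] == (0, 5) and bands[-1] == (35, 40)
assert len(bands) == 2 + 8 + 2  # twelve shares in total
```

Because the middle bands are much finer, each cloud receives only a sliver of the text-bearing region, approximating the one-part-in-eight case discussed above.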
Chapter 6
Conclusions and Future Work
In this chapter we present the conclusion for our work and suggest improvements
to the existing framework.
6.1 Conclusions

We introduced a novel method to achieve data confidentiality in the cloud
computing environment. The cloud provider is considered untrustworthy and the
data must be concealed not only from an outside attack but the provider itself
must not be able to extract meaningful information from the data. Instead of
relying on one cloud service provider, we propose to use multiple (untrustworthy)
public cloud providers. We use visual cryptography to protect the data on the
cloud. Standard encryption schemes are avoided, yet we achieve strong privacy of
the data. A new visual cryptography scheme for binary images is introduced for
our system. The complexity of our approach is shown to be reasonable and much lower than that of standard encryption-based schemes. We also avoid the overhead associated with key management that encryption requires. Besides achieving full data confidentiality, a user can also run database-style queries on the system for efficient data retrieval, which is not possible with encryption-based schemes.
The system was tested for a simple query scheme on small and large
datasets. Results indicate that the system is able to retrieve data successfully with
zero failure rate and very few false positives. Threat analysis of the system with
worst-case security conditions indicates the system is resilient to attacks and data
leaks. Attacks not only from outside the cloud, but also from within the cloud will
not be successful. A cloud owner itself will be unaware of the contents of the data
even during data retrieval operations.
We believe our system is best suited for storing sensitive data such as medical records and financial transactions. At present, cloud providers do not allow storing encrypted data because search cannot be performed on it; thus, current schemes cannot achieve both data confidentiality and efficient query execution. With our approach, data such as credit card information or a person's health record can be stored and queried on the cloud. At the expense of a small computational overhead, much less than that of encryption-based schemes, we achieve both query execution and data confidentiality.
6.2 Future work
The results are encouraging for our system and indicate that our work has the
potential for a large-scale application. To this end, we propose some key
improvements and suggestions for future systems.
To validate our proposed method, we conducted small-scale experiments and restricted ourselves to a basic LIKE database query for retrieval operations. Our system can easily be modified to include other basic database queries as well. However, we do believe that complex queries involving JOIN operations, while not impossible, will be difficult to incorporate into our system.
In addition, with a larger range of queries to work with, we expect to test our system against a larger database and more query operations. We relied on a simulated cloud environment for our experiments; the system must be tested on real clouds to better evaluate the complexity and overhead of implementation. It would also be interesting to see how the system behaves with noise models other than Gaussian and Speckle. These two noise models were deemed sufficient for a preliminary analysis, but further investigation of other noise types remains pending.
We also acknowledge that the running time of our system is only moderately good. We strongly believe we can optimize the system from a programming perspective and increase its speed by a good factor. Multi-threading and parallelization would increase the performance manyfold. A faster approach to calculating the NCC should also be investigated, since we currently use a very basic version that is not optimized for speed. Also, indexing the database and using a better hash function would lower the retrieval time by a good margin. How these can be accommodated in a secure manner, such that the service provider cannot exploit them, must be investigated.
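For reference, the basic (unoptimized) form of normalized cross-correlation referred to above can be sketched as follows; faster variants, such as the FFT-based formulation of [34], compute the same quantity. The function name and pure-Python form are illustrative assumptions:

```python
from math import sqrt

def ncc(a, b):
    """Basic normalized cross-correlation between two equal-length
    pixel sequences; returns a value in [-1, 1]."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) *
               sum((y - mb) ** 2 for y in b))
    return num / den

template = [0, 255, 255, 0, 128, 64]
assert abs(ncc(template, template) - 1.0) < 1e-9                     # perfect match
assert abs(ncc(template, [255 - p for p in template]) + 1.0) < 1e-9  # inverted image
```

The double loop over pixel sums is what makes this naive version slow; precomputing running sums or moving to the frequency domain avoids the repeated passes.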
With respect to our visual cryptography technique, instead of a simple
horizontal crop operation, a pseudo-random approach can be investigated which
will add security to the system. However, the overhead involved in such a scheme
must be balanced such that performance does not suffer. We also expect to test
the system against artificial intelligence/machine learning based attacks where an
attacker may employ techniques to gain useful information from the data.
References

[1] M. Abdalla, M. Bellare, D. Catalano, E. Kiltz, T. Kohno, T. Lange, J.
Malone-Lee, G. Neven, P. Paillier, and H. Shi, "Searchable encryption revisited: Consistency properties, relation to anonymous ibe, and extensions," CRYPTO 2005. LNCS vol. 3621, pp. 205–222. Springer, Heidelberg (2005).
[2] Amazon, "Amazon Web Services: Overview of Security Processes", May 2011.
http://awsmedia.s3.amazonaws.com/pdf/AWS_Security_Whitepaper.pdf. [Accessed: June 14, 2011.]
[3] Amazon S3, "Amazon Simple Storage Service FAQs". http://aws.amazon.com/s3/faqs/#How_secure_is_my_data. [Accessed: Feb
20, 2011.] [4] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee,
D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud computing," Technical Report No. UCB/EECS-2009-28, University of California at Berkeley, USA, Feb. 10, 2009.
[5] S. Bajaj and R. Sion, "TrustedDB: a trusted hardware based database with
privacy and data confidentiality," in Proceedings of the 2011 international conference on Management of data (SIGMOD '11), ACM, New York, NY, USA, pp. 205-216, 2011. [doi=10.1145/1989323.1989346]
[6] L. Ballard, S. Kamara, and F. Monrose, "Achieving efficient conjunctive
keyword searches over encrypted data," in Proceedings of the Seventh International Conference on Information and Communication Security (ICICS '05), pp. 414-426, 2005.
[7] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. P. Vadhan, and K. Yang, "On the (Im)possibility of Obfuscating Programs," in Proceedings of the 21st Annual International Cryptology Conference on Advances in Cryptology, pp.1-18, August 19-23, 2001.
[8] J. Bardin, J. Callas, S. Chaput, P. Fusco, F. Gilbert, C. Hoff, D. Hurst, S.
Kumaraswamy, L. Lynch, S. Matsumoto, B. O'Higgins, J. Pawluk, G. Reese, J. Reich, J. Ritter, J. Spivey, J. Viega, "Security guidance for critical areas of focus in cloud computing," [Online.] Cloud Security Alliance, Technical report, April 2009. Available: https://cloudsecurityalliance.org/csaguide.pdf. [Accessed: Mar 5, 2011].
[9] A. Boldyreva, N. Chenette, Y. Lee, and A. O’Neill, “Order-preserving
symmetric encryption,” in Proc. of Eurocrypt ’09, vol. 5479 of LNCS. Springer, 2009.
[10] D. Boneh and B. Waters, "Conjunctive, Subset, and Range Queries on
Encrypted Data," in Proceedings of the 4th Theory of Cryptography Conference (TCC '07). LNCS vol. 4392, pp, 535-554, Springer, Heidelberg (2007).
[11] J. Brodkin. "Gartner: Seven cloud-computing security risks", July 2, 2008.
http://www.networkworld.com/news/2008/070208-cloud.html. [Accessed: Nov 18, 2010.] [12] S. Bugiel, S. Nürnberger, A. Sadeghi, and T. Schneider, "Twin Clouds:
An architecture for secure cloud computing (Extended Abstract)," in Workshop on Cryptography and Security in Clouds (WCSC '11), Zurich, March 15-16, 2011.
[13] R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud
Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility" in Future Generation Computer Systems, Elsevier Science, Amsterdam, The Netherlands, 2009.
[doi: http://dx.doi.org/10.1016/j.future.2008.12.001]
[14] Y. C. Chang and M. Mitzenmacher, "Privacy preserving keyword searches
on remote encrypted data." in Applied Cryptography and Network Security Conference (ACNS '05). LNCS, vol. 3531, Springer, Heidelberg (2005).
[15] Y. Chen and R. Sion, "On securing untrusted clouds with cryptography," in Proceedings of the 9th annual ACM workshop on Privacy in the electronic society (WPES '10), pp.109-114, 2010. [16] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, "Private information
retrieval," in 36th IEEE Conference on the Foundations of Computer Science, pp. 41–50. IEEE Computer Society Press, 1995.
[17] Cloud Security Alliance, "CSA Cloud Security Alliance Security Guidance
for Critical Areas of Focus in Cloud Computing V2.1," December 2009. Available: https://cloudsecurityalliance.org/csaguide.pdf.
[Accessed on: Mar 8, 2011.] [18] CNBC, "Amazon Failure Takes Down Sites Across Internet," April 21, 2011.
http://www.cnbc.com/id/42706104/. [Accessed: April 25, 2011.] [19] CSO Online, "Avoid Patriot Act Surprises: Encrypt Cloud Data on-
Premise", July 20, 2011. http://www.cio.com.au/article/394098/. [Accessed: July 21, 2011.] [20] R. Curtmola, J. A. Garay, S. Kamara and R. Ostrovsky, "Searchable
symmetric encryption: Improved definitions and efficient constructions," in Conference on Computer and Communications Security (CCS '06), ACM Press, New York, 2006.
[21] M. van Dijk and A. Juels, "On the impossibility of cryptography alone for
privacy-preserving cloud computing," in Cryptology ePrint Archive, Report 2010/305, 2010. Available: http://eprint.iacr.org/2010/305.
[22] Forbes, "DARPA Will Spend $20 Million To Search For Crypto’s Holy Grail," April 6, 2011.
Available: http://blogs.forbes.com/andygreenberg/2011/04/06/darpa-will-spend-20-million-to-search-for-cryptos-holy-grail/. [Accessed: April 28, 2011.]
[23] Gartner, "Gartner Says Worldwide Cloud Services Market to Surpass $68
Billion in 2010," June 22, 2010. http://www.gartner.com/it/page.jsp?id=1389313. [Accessed: May 2, 2011.] [24] C. Gentry, "Fully homomorphic encryption using ideal lattices," in
Proceedings of the 41st ACM Symposium on Theory of Computing (STOC), pp. 169–178. ACM, New York, 2009.
[25] C. Gentry and S. Halevi, "Implementing Gentry's fully-homomorphic
encryption scheme," in Cryptology ePrint Archive, Report 2010/520, 2010. Available: http://eprint.iacr.org/2010/520.
[26] C. Gentry, "Fully Homomorphic Encryption without Bootstrapping," in
Cryptology ePrint Archive, Report 2011/277, 2011. Available: http://eprint.iacr.org/2011/277.
[27] P. Golle, J. Staddon, and B. Waters, "Secure conjunctive keyword search
over encrypted data," in Applied Cryptography and Network Security Conference (ACNS '04). LCNS, vol. 3089, pp. 31-45. Springer-Verlag, 2004.
[28] V. Goyal, O. Pandey, A. Sahai and B. Waters, "Attribute-based encryption
for fine-grained access control of encrypted data," in Proceedings of the 13th ACM conference on Computer and communications security (CCS '06), pp. 89-98, 2006.
[29] IDC, "IT Cloud Services User Survey, pt.2: Top Benefits & Challenges," Oct
2, 2008. http://blogs.idc.com/ie/?p=210. [Accessed: May 16, 2011.] [30] ImageMagick, ImageMagick 6.7.1-1, 2011. http://www.imagemagick.org.
[31] P. T. Jaeger, J. Lin, J. M. Grimes, and S. N. Simmons, "Where is the cloud? Geography, economics, environment, and jurisdiction in cloud computing," First Monday, vol. 14, no. 5, 2009.
http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2456/. [Accessed: Jan 10, 2011.]
[32] S. Kamara, and K. Lauter, "Cryptographic cloud storage," in Proceedings of
Financial Cryptography: Workshop on Real-Life Cryptographic Protocols and Standardization, Tenerife, Canary Islands, Spain, 2010.
[33] L. M. Kaufman, "Data Security in the World of Cloud Computing," IEEE
Security and Privacy, vol.7, no.4, pp. 61-64, 2009. [34] J. P. Lewis, “Fast normalized cross-correlation,” Vision Interface, pp. 120–
123, 1995. [35] A. Lewko, T. Okamoto, A. Sahai, K. Takashima, and B. Waters, "Fully
secure functional encryption: Attribute-based encryption and (hierarchical) inner product encryption," in Advances in Cryptology-EUROCRYPT 2010, LNCS vol. 6110, pp. 62-91. Springer, Heidelberg (2010).
[36] Matlab Users Guide, The Math Works, Inc., Natick, MA.
http://www.mathworks.com/help/techdoc/. [Accessed: March 20, 2011.] [37] L. Ming, Y. Shucheng, C. Ning and L. Wenjing, "Authorized Private
Keyword Search over Encrypted Data in Cloud Computing," in 31st International Conference on Distributed Computing Systems (ICDCS '11), pp. 383-392, 2011. [doi=10.1109/ICDCS.2011.55.]
[38] P. Mell and T. Grance, "Effectively and securely using the cloud computing
paradigm," NIST, Information Technology Laboratory, 2009. Available: http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-computing-v26.ppt
[39] M. Mowbray, S. Pearson, and Y. Shen, “Enhancing privacy in cloud computing via policy-based obfuscation,” in Journal of Supercomputing, pp. 1-25, 2010.
[40] M. Naor and A. Shamir, “Visual cryptography,” in Advances in Cryptology:
EUROCRYPT’ 94, LNCS, vol. 950, pp. 1–12, Berlin, Springer-Verlag, 1995. [41] A. Narayanan, and V. Shmatikov, "Obfuscated databases and group
privacy," in ACM Conference on Computer and Communications Security, pp. 102–111. ACM Press, New York, 2005.
[42] Network Computing, "Encryption Is Cloud Computing Security Savior", Nov
19, 2009. http://www.networkcomputing.com/security/229502349. [Accessed: June 18, 2011.] [43] Network World, "The U.S. Patriot Act has an impact on cloud security,"
Sep 29, 2009. http://www.networkworld.com/newsletters/2009/092909cloudsec1.html.
[Accessed: April 19, 2011.] [44] Oxford Dictionaries, "The OEC: Facts about the language". http://oxforddictionaries.com/page/oecfactslanguage/the-oec-facts-about-
the-language. [Accessed: Dec 16, 2010.] [45] B. Parno, "Bootstrapping trust in a “trusted” platform," in HOTSEC 2008:
Proceedings of the 3rd conference on Hot topics in security, Berkeley, CA, USA, USENIX Association, pp. 1–6, 2008.
[46] PC World, "Google Docs Glitch Exposes Private Files", March 9, 2009. http://www.pcworld.com/article/160927/google_docs_glitch_exposes_priv
ate_files.html. [Accessed: Feb 12, 2011.] [47] S. T. Peddinti and N. Saxena, "On the effectiveness of anonymizing
networks for web search privacy," in Proceedings of the 6th ACM
Symposium on Information, Computer and Communications Security (ASIACCS '11), pp. 483-489, 2011, ACM, New York, NY, USA.
[48] P. S. Revenkar, A. Anjum, and W. Z. Gandhare, "Survey of Visual Cryptography
Schemes," in International Journal of Security and Its Applications, vol. 4, no. 2, pp. 49-56, 2010.
[49] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage, "Hey, you, get off of
my cloud: exploring information leakage in third-party compute clouds," in Proceedings of the 16th ACM conference on computer and communications security, pp. 199-212, Chicago, Illinois, USA, 2009.
[50] RSA, "The Role of Security in Trustworthy Cloud Computing," 2010.
http://www.emc.com/collateral/about/investor-relations/9921_CLOUD_W P_0209_lowres.pdf. [Accessed: May 22, 2011.]
[51] E. Shi, J. Bethencourt, T-H. H. Chan, D. Song and A. Perrig, "Multi-
Dimensional Range Query over Encrypted Data," in Proceedings of the 2007 IEEE Symposium on Security and Privacy, pp. 350-364, 2007.
[52] S. J. Shyu, "Efficient visual secret sharing scheme for color images," Pattern
Recognition, vol. 39, pp. 866–880, 2006. [53] S. J. Shyu, S. Y. Huang, Y. K. Lee, R. Z. Wang and K. Chen, "Sharing
multiple secrets in visual cryptography," Pattern Recognition, vol. 40, pp. 3633–3651, 2007.
[54] Simple-Talk, "Transparent Data Encryption", March 16, 2010. http://www.simple-talk.com/sql/database-administration/transparent-data-
encryption/. [Accessed: March 28, 2011.] [55] D. Song, D. Wagner and A. Perrig, "Practical Techniques for Searches on
Encrypted Data," in Proceedings of the 2000 IEEE Symposium on Security and Privacy, pp. 44-55, May 14-17, 2000.
[56] A. Swaminathan, Y. Mao, G.-M. Su, H. Gou, A. L. Varna, S. He, M.Wu, and D.W. Oard, “Confidentiality-preserving rank-ordered search,” in Proc. of the Workshop on Storage Security and Survivability, 2007.
[57] E. R. Verheul and H. C. A. van Tilborg, "Constructions and properties of k
out of n visual secret sharing schemes," in Designs, Codes and Cryptography, vol. 11, pp. 179-196, 1997.
[58] C. Wang, N. Cao, K. Ren, and W. Lou, "Enabling Secure and Efficient
Ranked Keyword Search over Outsourced Cloud Data," in IEEE Transactions on Parallel and Distributed Systems (TPDS), 2011. (To be published.)
[59] D. Wang, F. Yi, and X. Li, "On general construction for extended visual
cryptography schemes," Pattern Recognition, vol. 42, pp. 3071–3082, 2009. [60] C. Wu and L. Chen, "A study on visual cryptography," Master's thesis,
Institute of Computer and Information Science, National Chiao Tung University, Taiwan, R.O.C., 1998.
[61] S. Ye, F. Wu, R. Pandey and H. Chen, "Noise Injection for Search Privacy
Protection," in Proceedings of the 2009 International Conference on Computational Science and Engineering, pp. 1-8, 2009.
[62] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+r: Top-k retrieval
from a confidential index,” in Proc. of EDBT ’09, 2009.