Hello from CS
Security on Cloud Computing, Query Computation and Data Mining on Encrypted Database
Professor David CHEUNG
Head, Department of Computer Science, The University of Hong Kong
RIVF 2010, Hanoi
What is Cloud Computing?
From YouTube
Song from YouTube
What can the Cloud offer? An unlimited supply of computing power (electricity)
What do they say about Cloud Computing?
• 27.5 M hits in Google Search on “cloud computing”
• “The term cloud is used as a metaphor for Internet, based on how Internet is depicted in the network diagrams” –Wikipedia
• “The ability to connect to software and data on the Internet (the cloud) instead of your hard drive or local area network” ‐ComputerWorld
Some examples of Cloud Computing
Gmail – Web mail
Flickr – Photo Album
Google Docs – Internet Docs
Amazon Web Services – Infrastructure as a Service
The three pillars of Cloud Computing
Virtualization
Separation of applications and infrastructure
Software as a Service
Utility Computing
Impacts and Risks of Cloud Computing
Survey by World Economic Forum (Dec 2009)
Barriers – from World Economic Forum
• Lack of Understanding
• Interoperability
• Fear of vendor lock-in
• Privacy
• Security
• Governance
• Integration and Migration
How about security in Cloud Computing?
Trust me, it is very secure.
How about that 0.001% chance ???
Can we run query applications on the Cloud if the provider cannot be trusted?
Secure Computation on Encrypted Database
Secure Computation on Encrypted Data – What?
[Figure: data values x, y, z are encrypted. Can computations such as x + y, or "find a such that a > 0", be carried out on the encrypted data?]
Security on Cloud Computing
[Figure: Company A (Hong Kong) submits its encrypted customer data E(DB) to Amazon's EC2 database (U.S.) for query services; A's customers (U.K.) send customer queries to it.]
Gartner listed SEVEN security issues in Cloud Computing:
4. Data Segregation: an encryption scheme must be available to protect the corporate data
Challenge: How to perform computation (queries) on encrypted data?
Computation on Encrypted Data – what is the key issue?
Bob: the master (holds x, y, …)
Alice: the untrusted slave
Bob: I want to find x + y, and I want Alice to do it for me. But I don't trust Alice.
Bob sends E(x), E(y) to Alice.
Alice returns the result R = E(x) + E(y).
Privacy homomorphism (additive):
E(x) + E(y) = E(x + y)
E^{-1}(R) = E^{-1}(E(x) + E(y))
          = E^{-1}(E(x + y))
          = x + y
[Rivest, Adleman, Dertouzos, Foundations of Secure Computation 1978]
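The additive property above can be tried out with a toy version of the Paillier cryptosystem. This is a swapped-in, well-known additively homomorphic scheme, not the scheme from the talk; the primes and function names are illustrative assumptions, and the parameters are far too small to be secure. Note that in Paillier the "+" on ciphertexts is realised as a multiplication modulo n^2.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy key generation; p and q are small illustrative primes (insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+), valid for g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """E(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Bob's step: recover m from the ciphertext with the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Alice's step: combine two ciphertexts without knowing any key."""
    return (c1 * c2) % (n * n)
```

Bob would call `encrypt` on x and y, hand both ciphertexts to Alice, and apply `decrypt` to whatever `add_encrypted` returns.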
Can I use R to recover (x + y)?
Key issue?
Need a nice and secure encryption scheme
What is a secure encryption scheme?
A secure scheme: it can stop the attacker and achieve the defined security goal.
Security Goal and Adversary Model
Attack Model
• What can the adversary do? e.g., he can access some data (plain text) and encrypt it
• What does the adversary know? e.g., he has some background knowledge
Security Goal
• What is the primary goal? e.g., prevent the adversary (service provider) from seeing the data
• What is the additional requirement? e.g., prevent the adversary from finding any statistical information on the protected data
A model for Secure Queries on Encrypted Database (SCONEDB)
• Encrypted DBMS (EDBMS) hosted at an untrusted service provider
– Stores encrypted data
– Processes queries
• A three-player game
– Player 1: Database owner - encrypts data and sends them to the db at the service provider
– Player 2: User of the database - issues queries to the EDBMS
– Player 3: Attacker (service provider) - tries to break into the encrypted database
SCONEDB Model
[Figure: SCONEDB model. Player 1 (data owner) holds DB and the key K, encrypts each tuple t with ET() and stores E(DB) in the EDBMS. Player 2 (query issuer) holds K, encrypts a query q with EQ() and sends EQ(q); the EDBMS processes it against the encrypted tuples ET(t) and returns R, which Player 2 decrypts as D(R). Player 3 (attacker) has full access to E(DB), holds knowledge H about the data, and runs a cryptanalysis that outputs DBA, his guess in the attempt to break the encryption. The service provider hosts the encrypted database.]
Problem definition (SCONEDB Model):
Define an encryption scheme (ET, EQ and D) and a query processing method on E(DB) such that the query results returned are correct and the attacker cannot compromise E(DB), i.e., DBA is empty, given background knowledge H.
Attack Model: three levels of background knowledge
• Basic capability: the attacker has full access to the encrypted data
• Background knowledge (a three-level model):
– Level 1: no background knowledge
– Level 2: the attacker knows some records in DB (plain text)
– Level 3: the attacker knows some records in DB and the encrypted values of these records, i.e., knows some (x, E(x)) pairs
[Figure: points x1, x2, x3 in DB are mapped by E to x1', x2', x3' in E(DB); level 1, level 2 and level 3 attackers each try to recover the plain values.]
Attack model – different background knowledge levels
• The attacker has full access to the EDBMS (encrypted database, encrypted queries)
• Three levels of attacker model
– Level 1: has no background knowledge (i.e., H = empty set) (basic level)
  • e.g., the service provider knows nothing about the business of the data owner
– Level 2: knows a set of points P in DB (practical level)
  • e.g., the adversary is a customer of the bank
– Level 3: knows a set of points P in DB and their corresponding encrypted values in E(DB) (avoidable)
  • e.g., the adversary created a new account yesterday and observes that only one new encrypted account has been added since then
If a scheme survives a higher level attack, it survives a lower level one.
SCONEDB on an important query type: Secure kNN Computation
• We develop an encryption scheme for kNN queries on SCONEDB to explore its applicability
• k nearest neighbor query (kNN)
– Database DB: a set of d-dimensional points
– Given a query point q, find the k nearest points to q in the database
[Figure: query point q and database points x1, x2 in DB, at distances d1 and d2 from q.]
x2 is the 1-nearest neighbor of q
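For reference, the unencrypted kNN query that the secure schemes must reproduce can be sketched as below. The function name and the use of Euclidean distance are assumptions made for illustration.

```python
import math

def knn(db, q, k):
    """Return the k points of db nearest to q (Euclidean distance), nearest first."""
    return sorted(db, key=lambda x: math.dist(x, q))[:k]
```

Any encryption scheme for secure kNN has to return the same points as this function would on the plain data, while the service provider sees only ciphertexts.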
Is Distance Preserving Transformation (DPT) an answer to the kNN problem?
• E is a DPT if d(x, y) = d'(E(x), E(y)) for all points x, y
• kNN can be computed on E(DB) if E is a DPT
[Figure: a DPT E maps q, x1, x2 in DB (distances d1, d2) to q', x1', x2' in E(DB) with the same distances d1', d2'.]
Nice property?
Is this a real solution? Can it survive an attack?
DPT fails under level 2 and level 3 attacks
• DPT fails at a level 3 attack
• DPT also fails at a level 2 attack (signature attack)
[Figure: level 3 attack on a DPT. The attacker's background knowledge is x1, x2, x3 in DB together with x1', x2', x3' in E(DB), and he wants to compromise the encryption y' of an unknown point y. The distances d1', d2', d3' from y' to x1', x2', x3' equal the distances d1, d2, d3 from y to x1, x2, x3.]
DPT is not a good solution: it is not secure enough.
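The level 3 attack on a DPT can be illustrated in two dimensions. The sketch below assumes a hypothetical DPT (a rotation followed by a translation) and an attacker who knows three non-collinear plain points and their ciphertexts. Because distances are preserved, the attacker measures the distances from y' to the known ciphertexts and recovers y by trilateration. All names and parameters are illustrative.

```python
import math

def dpt_encrypt(p, theta=0.7, shift=(5.0, -3.0)):
    """Hypothetical DPT: rotate by theta, then translate. Distances are preserved."""
    x, y = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1])

def trilaterate(known, dists):
    """Recover a 2-D point from its distances to three known, non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = known
    d1, d2, d3 = dists
    # Subtracting the circle equation of point 1 from those of points 2 and 3
    # leaves a 2x2 linear system, solved here with Cramer's rule.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 + y3**2 - x1**2 - y1**2
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)

def level3_attack(known_plain, known_enc, target_enc):
    """Attacker never sees the key: only (x, E(x)) pairs and the target ciphertext."""
    dists = [math.dist(target_enc, e) for e in known_enc]
    return trilaterate(known_plain, dists)
```

In d dimensions the same idea needs d + 1 known points in general position.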
An Asymmetric Scalar Product Preservation Encryption
Scheme 1: Asymmetric Scalar Product Preservation Encryption [SIGMOD 2009]
x' = ET(x) = M^T (x^T, -0.5 ||x||^2)^T
q' = EQ(q) = M^{-1} (r (q^T, 1)^T), where r > 0 is a random number
x = D(x') = Π_d((M^T)^{-1} x'), where Π_d keeps the first d coordinates
Encryption key: M is a (d+1) × (d+1) random invertible matrix
Scheme 1: An ASPE encryption scheme
Theorem: Let x1', x2', and q' be the encrypted values of x1, x2, and q with Scheme 1. Then
||x2 - q|| < ||x1 - q|| iff x2' · q' > x1' · q'
and the scheme is not a DPT. (Very nice!)
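The theorem follows because x' · q' = r (x · q - 0.5 ||x||^2) for a positive r, and a larger value of x · q - 0.5 ||x||^2 means a smaller ||x - q||. A minimal sketch of Scheme 1 in exact rational arithmetic is given below; the helper names, the small integer key range and the Gauss-Jordan inverse are assumptions made for illustration, not the authors' implementation.

```python
from fractions import Fraction

def mat_inv(m):
    """Gauss-Jordan inverse with exact fractions; raises ValueError if singular."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def keygen(d, rng):
    """Draw random (d+1) x (d+1) integer matrices until one is invertible."""
    while True:
        m = [[rng.randint(-9, 9) for _ in range(d + 1)] for _ in range(d + 1)]
        try:
            return m, mat_inv(m)
        except ValueError:
            continue

def enc_point(m, x):
    """x' = M^T (x, -0.5 ||x||^2)^T."""
    x_hat = list(x) + [Fraction(-1, 2) * dot(x, x)]
    m_t = [list(col) for col in zip(*m)]
    return mat_vec(m_t, x_hat)

def enc_query(m_inv, q, r):
    """q' = M^{-1} (r (q, 1)^T) with a positive random r."""
    return mat_vec(m_inv, [r * v for v in list(q) + [1]])

def secure_knn(enc_db, enc_q, k):
    """Server side: rank by scalar product only; a larger product means nearer."""
    order = sorted(range(len(enc_db)), key=lambda i: dot(enc_db[i], enc_q), reverse=True)
    return order[:k]
```

The server-side function `secure_knn` sees neither M nor any plain coordinate, yet returns the indices of the true nearest neighbours.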
[Figure: Scheme 1 maps q, x1, x2 in d dimensions (distances d1, d2) to q', x1', x2' in d+1 dimensions.]
Scheme 1: how safe?
• Scheme 1 resists a level 2 attack – it preserves only an asymmetric scalar product (between an encrypted point and an encrypted query), so it breaks the curse of distance preservation
• Unfortunately, Scheme 1 fails under a level 3 attack – if enough (x, ET(x)) pairs are known to the attacker, the random matrix M can be solved (compromised)
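The level 3 weakness is plain linear algebra. The sketch below assumes an attacker who knows d + 1 pairs (x, x') whose extended vectors (x, -0.5 ||x||^2) are linearly independent; stacking them as rows gives A M = B, so M = A^{-1} B. Function names and the example key are illustrative assumptions.

```python
from fractions import Fraction

def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def mat_inv(m):
    """Gauss-Jordan inverse with exact fractions; raises ValueError if singular."""
    n = len(m)
    a = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [v / pv for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def lift(x):
    """Extended vector x_hat = (x, -0.5 ||x||^2)."""
    return [Fraction(v) for v in x] + [Fraction(-1, 2) * sum(v * v for v in x)]

def enc_point(m, x):
    """x' = M^T x_hat, computed row-wise as x_hat^T M."""
    return mat_mul([lift(x)], m)[0]

def recover_key(known_points, known_ciphertexts):
    """Level 3 attacker: rows of A are x_hat, rows of B are x'; M = A^{-1} B."""
    a = [lift(x) for x in known_points]
    return mat_mul(mat_inv(a), known_ciphertexts)

def decrypt_with(m, ciphertext):
    """x_hat^T = x'^T M^{-1}; dropping the last coordinate yields x."""
    return mat_mul([ciphertext], mat_inv(m))[0][:-1]
```

Once `recover_key` has returned M, every other record in E(DB) can be decrypted with `decrypt_with`.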
Scheme 2: Asymmetric Random Splitting
• The scheme is based on a splitting technique applied independently and randomly to each dimension [SIGMOD 2009]
• 2^(d+1) possible splitting configurations for the (d+1)-dimensional case – exponentially many configurations for the adversary to guess
• Scheme 2 resists a level 3 attack if the attacker cannot derive the splitting configuration
• If the number of dimensions is over 80, the scheme is as safe as a 1024-bit RSA key
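The slide does not spell out how the splitting works. One way such a per-dimension split could preserve the scalar product is sketched below, under the assumption that a secret bit per dimension decides whether the data coordinate or the query coordinate is split into two random shares while the other side is duplicated. The matrix encryption of the two halves is omitted, and all names are hypothetical.

```python
import random

def split_data(p, config, rng):
    """Data side: where the bit is 1, split the coordinate into two random shares
    that sum to it; where the bit is 0, copy it into both halves."""
    a, b = [], []
    for value, bit in zip(p, config):
        share = rng.randint(-1000, 1000) if bit else value
        a.append(share)
        b.append(value - share if bit else value)
    return a, b

def split_query(q, config, rng):
    """Query side does the opposite: split where the bit is 0, copy where it is 1."""
    return split_data(q, [1 - bit for bit in config], rng)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))
```

With this construction `dot(p_a, q_a) + dot(p_b, q_b)` equals `dot(p, q)` for every configuration, so ranking by scalar product still works, while an attacker who does not know the configuration cannot tell which coordinates of a known record were split.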
What have we shown you?
• It is possible to process queries on encrypted data, so query processing at a cloud service provider can be done, but it needs new techniques
• This is a practical approach – the security level is not as strong as the conventional goal (e.g., semantic security), but the approach is practical and performs well enough to be implemented
Can we do data mining on the Cloud if the provider cannot be trusted?
Data Mining on Encrypted Data
Example: Association rules mining
• In the form of X => Y
• Meaning
– If a transaction contains itemset X, the transaction will probably contain itemset Y
• A rule must have
– High support: the number of transactions that contain both X and Y
– High confidence: the ratio of the number of transactions containing both X and Y to the number of transactions containing X
• Key issue: compute large itemsets
– X is large if the percentage of transactions containing X is larger than a threshold [Agrawal VLDB 94]
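A small sketch of these measures, using the standard definition of confidence as support(X and Y together) divided by support(X). The function names and the sample transactions in the usage note are made up for illustration.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))

def confidence(transactions, x, y):
    """Confidence of the rule X => Y: support(X and Y) / support(X)."""
    sup_x = support(transactions, x)
    if sup_x == 0:
        return 0.0
    return support(transactions, set(x) | set(y)) / sup_x

def is_large(transactions, itemset, threshold):
    """An itemset is large if its fraction of transactions exceeds the threshold."""
    return support(transactions, itemset) / len(transactions) > threshold
```

For example, with the four transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk}, the rule bread => milk has support 2 and confidence 2/3.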
Data mining by a Service Provider (cloud)
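[Figure: the data owner passes DB through a transformer (encryption) to obtain DB' and sends it to the data mining service provider (cloud computing); the provider returns the encrypted association rules (Association Rules'), which the data owner decrypts into the association rules.]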
Solution: Item mapping (encryption cipher)
• T is a set of transactions, e.g., t = {cheese, book, bread, chocolate, ..}
• bread -> 54 (item mapping)
• chocolate -> 165
• t = <cheese, book, bread, chocolate> becomes t' = <8, 69, 54, 165>
• <54, 165> is large to the miner, but what is it? <cheese, book> or <bread, chocolate>?
• Similar to the substitution cipher used in the encryption of text
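The one-to-one item mapping amounts to a lookup table kept by the data owner. A minimal sketch, using the codes from the example above; the function names are illustrative assumptions.

```python
def encrypt_transaction(t, mapping):
    """Replace every item by its secret code."""
    return [mapping[item] for item in t]

def decrypt_itemset(itemset, mapping):
    """Data owner's step: translate a mined itemset back to the original items."""
    inverse = {code: item for item, code in mapping.items()}
    return {inverse[code] for code in itemset}

# Secret table held only by the data owner (codes taken from the example).
MAPPING = {"cheese": 8, "book": 69, "bread": 54, "chocolate": 165}
```

The miner works only on the codes; without the table it cannot tell whether <54, 165> stands for <cheese, book> or <bread, chocolate>. Item frequencies are unchanged, though, which is the same weakness a substitution cipher has.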
A More Secure Mapping?
• A one-to-n item mapping (more secure)
– B: a set of items
– m: I -> 2^B
• Example
– m(a) = {1, 4, 5}
– m(b) = {2}
– m(c) = {3, 5}
Itemset mapping using one‐to‐n item mapping (encryption)
• m: I -> 2^B : one-to-n item mapping
• M: 2^I -> 2^B : itemset mapping
• With m: a -> {1, 4, 5}, b -> {2}, c -> {3, 5}
• Example:
– M(<a, c>) = <1, 3, 4, 5>
– M(<b, c>) = <2, 3, 5>
– M^{-1}(<1, 2, 4, 5>) = <a, b>
– M^{-1}(<1, 2, 3, 4, 5>) = <a, b, c>
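The itemset mapping M and its inverse can be sketched as a union and a containment test. The function names are assumptions; the mapping is the one from the example.

```python
def map_itemset(m, itemset):
    """M(itemset): union of the images of all items."""
    out = set()
    for item in itemset:
        out |= m[item]
    return out

def unmap_itemset(m, code):
    """M^{-1}(code): every item whose whole image is contained in the code."""
    code = set(code)
    return {item for item, image in m.items() if image <= code}

# One-to-n mapping from the example.
M_EXAMPLE = {"a": {1, 4, 5}, "b": {2}, "c": {3, 5}}
```

Decoding by containment is only safe when no item outside the original itemset happens to be fully covered, which is the collision issue discussed next.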
Correctness – collision on one‐to‐n mapping
With m: a -> {1, 2}, b -> {2, 3}, c -> {1, 3}
• <a, b> => <1, 2, 3>
• <a, b, c> => <1, 2, 3>
Collisions! Decryption failure!
With m': a -> {1, 2}, b -> {2, 3}, c -> {2, 4}
• <a> => <1, 2>
...
• <a, b> => <1, 2, 3>
…
• <a, b, c> => <1, 2, 3, 4>
A good one-to-n mapping: the mapping of each item must contain at least one element that appears in no other item's mapping
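Both observations can be checked mechanically for small examples. The sketch below enumerates all non-empty itemsets to find collisions and tests the "unique element" condition. It is a brute-force illustration with hypothetical function names, not an efficient algorithm.

```python
from itertools import combinations

def find_collisions(m):
    """Return pairs of distinct itemsets that map to the same code set."""
    seen = {}
    collisions = []
    items = sorted(m)
    for size in range(1, len(items) + 1):
        for subset in combinations(items, size):
            code = frozenset().union(*(m[i] for i in subset))
            if code in seen:
                collisions.append((seen[code], subset))
            else:
                seen[code] = subset
    return collisions

def has_unique_elements(m):
    """True if every item's image contains an element found in no other image."""
    for item, image in m.items():
        others = set().union(*(img for other, img in m.items() if other != item))
        if not image - others:
            return False
    return True
```

For the mapping m above, `find_collisions` reports that <a, b> and <a, b, c> collide and `has_unique_elements` is false; for m' there are no collisions and the condition holds.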
One‐to‐n vs one‐to‐one
• One-to-n vs one-to-one?
– Intuitively, one-to-n should be more secure
Unfortunate scenario:
• one-to-n + item mapping = one-to-one + item mapping
Our solution:
– Add a random component to the transaction transformation
– It will make one-to-n always better (more secure) than one-to-one [VLDB 2007]
Criteria for a valid transformation
• Correctness - The randomly added items must not affect the decryption
• Completeness - A good random transformation algorithm should be able to generate every possible valid transformation, so that the transformation is hard to guess
Algorithm to perform valid and complete transformation
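[Flowchart: for a transaction t = <…>, start with E = Ø. Until the quota is met, pick one mapping x -> x1, …, xn from the mappings (a -> …, b -> …, … -> …) via N(t), and filter its items xi, …, xj against the history, which stores items we must not add; some items are added to E, the others to the history; then move on to the next. When the quota is met, add E to the result.]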
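The correctness criterion can be illustrated with a simplified randomised transformation. This is a sketch under assumptions and not the algorithm from the flowchart: it adds random noise items one by one and keeps a noise item only if the result still decodes to the original transaction. The function names, the `quota` parameter and the code universe are hypothetical.

```python
import random

def encode(m, itemset):
    """Plain one-to-n encoding: union of the images of all items."""
    out = set()
    for item in itemset:
        out |= m[item]
    return out

def decode(m, code):
    """Every item whose whole image is contained in the code."""
    return {item for item, image in m.items() if image <= code}

def randomized_encode(m, itemset, universe, quota, rng):
    """Add up to `quota` random noise codes that do not change the decoding."""
    result = encode(m, itemset)
    candidates = [b for b in universe if b not in result]
    rng.shuffle(candidates)
    added = 0
    for b in candidates:
        if added >= quota:
            break
        trial = result | {b}
        # Correctness check: noise must not complete the image of an absent item.
        if decode(m, trial) == set(itemset):
            result = trial
            added += 1
    return result
```

With m(a) = {1, 4, 5}, m(b) = {2}, m(c) = {3, 5} and the transaction <a>, the codes 2 and 3 are always rejected as noise because they would make b or c appear in the decoded transaction.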
Integrity concerns
• The mining results returned from the service provider could be wrong
– Bugs in their mining program
– Only compute part of the answer (laziness)
– Modify the results adversely (malicious)
• Two problems
– Soundness
  • All rules are correct (no false positive)
  • All rules are with correct support and confidence
– Completeness
  • All correct rules are included (no false negative)
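The two properties can be stated as simple set checks. The sketch below assumes, for illustration only, that a trusted ground truth is available as a dictionary from itemset to support; in practice the data owner does not have it, which is why a dedicated audit method is needed.

```python
def is_sound(returned, truth):
    """No false positives, and every returned support count is correct."""
    return all(itemset in truth and truth[itemset] == sup
               for itemset, sup in returned.items())

def is_complete(returned, truth):
    """No false negatives: every true result appears in the returned answer."""
    return all(itemset in returned for itemset in truth)
```

A lazy provider that returns only part of the answer stays sound but is not complete; a provider that tampers with supports or adds itemsets is not sound.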
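Lightweight audit of the mining result
• Pre-processing + lightweight verification [VLDB 2009]
[Figure: audit environment. The data owner applies transformations to T, producing the database sent to the service provider together with auxiliary data kept for the audit; the service provider returns the frequent itemsets (FI), which the data owner verifies against the auxiliary data.]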
What have we shown you ?
• It is possible to do data mining on encrypted data, at least for association rule mining; the mining result is also not visible to the service provider
• We can audit the mining result returned by a service provider; therefore, the integrity of the result is protected against the service provider
Publications on Computation and Mining on Encrypted Data
• Data mining on encrypted data
– [Wong, Cheung, Hung, Liu] Protecting Privacy in Incremental Maintenance for Distributed Association Rule Mining, PAKDD 2008.
– [Wong, Cheung, Hung, Kao and Mamoulis] Security in Outsourcing of Association Rule Mining, VLDB 2007.
• Integrity of data mining on encrypted data
– [Wong, Cheung, Hung, Kao and Mamoulis] An Audit Environment for Outsourcing of Frequent Itemset Mining, PVLDB 2009 (VLDB).
• Secure kNN computation on encrypted data
– [Wong, Cheung, Kao and Mamoulis] Secure k-NN Computation on Encrypted Databases, SIGMOD 2009.
• Privacy preservation
– [Wong, Mamoulis and Cheung] Non-homogeneous Generalization in Privacy Preserving Data Publishing, SIGMOD 2010.
Acknowledgement
• Wong Wai Kit
• Ben Kao
• Nikos Mamoulis
• Edward Hung
The End
Thank you. Questions, please!