[IEEE 22nd International Conference on Data Engineering (ICDE'06) - Atlanta, GA, USA (2006.04.3-2006.04.7)] 22nd International Conference on Data Engineering (ICDE'06) - Sovereign Joins



<ul><li><p>Sovereign Joins</p><p>Rakesh Agrawal, Dmitri Asonov, Murat Kantarcioglu, Yaping Li</p><p>IBM Almaden Research Center; The University of Texas at Dallas; University of California, Berkeley</p><p>Abstract</p><p>We present a secure network service for sovereign information sharing whose only trusted component is an off-the-shelf secure coprocessor. The participating data providers send encrypted relations to the service, which sends the encrypted results to the recipients. The technical challenge in implementing such a service arises from the limited capability of secure coprocessors: they have small memory, no attached disk, and no facility for communicating directly with other machines in the network. The internal state of an ongoing computation within the secure coprocessor cannot be seen from outside, but its interactions with the server can be exploited by an adversary.</p><p>We formulate the problem of computing joins in this setting, where the goal is to prevent information leakage through patterns in I/O while maximizing performance. We specify criteria for proving the security of a join algorithm and provide provably safe algorithms. These algorithms can be used to compute general joins involving arbitrary predicates and multiple sovereign databases. We thus enable a new class of applications requiring query processing across sovereign entities such that nothing apart from the result is revealed to the recipients.</p><p>1 Introduction</p><p>Conventional information integration approaches, as exemplified by centralized data warehouses and mediator-based data federations, assume that the data in each database can be revealed completely to the other databases. Consequently, information sharing across autonomous entities is inhibited due to confidentiality and privacy concerns. The goal of sovereign information sharing [2, 3, 8] is to enable such sharing by allowing queries to be computed across sovereign databases such that nothing apart from the result is revealed. 
The computation of a join of sovereign databases in such a manner is referred to as a sovereign join. We cite below two motivating applications of sovereign joins [3]:</p><p>Security: For national security, it might be necessary to check if any of the airline passengers is on the watch list of a federal agency [21]. A sovereign join may be used to find only those passengers who are on the list, without obtaining information about all the passengers from the airline or revealing the watch list.</p><p>Healthcare: In epidemiological research, it might be of interest to ascertain whether there is a correlation between a reaction to a drug and some DNA sequence, which may require joining DNA information from a gene bank with patient records from various hospitals. However, a hospital disclosing patient information could be in violation of privacy protection laws, and it may be desirable to access only the matching sequences from the gene bank.</p><p>1.1 Desiderata</p><p>A system offering sovereign join service has the following desirable attributes:</p><p>The system should be able to handle general joins involving arbitrary predicates. The national security application cited above requires a fuzzy match on profiles. Similarly, the patient records spread across hospitals may require complex matching in the healthcare application.</p><p>The system should be able to handle multi-party joins. The recipient of the join result can be a party different from one of the data providers.</p><p>The recipient should only be able to learn the result of the join computation. No other party should be able to learn the result values or the data values in someone else's input.</p><p>The system should be provably secure. 
The trusted component should be small, simple, and isolated [4].</p><p>Proceedings of the 22nd International Conference on Data Engineering (ICDE'06) 8-7695-2570-9/06 $20.00 © 2006 IEEE</p></li><li><p>1.2 Problem Addressed</p><p>We present a secure network service for sovereign information sharing whose only trusted component is a secure coprocessor [15, 26, 32]. The IBM 4758 cryptographic coprocessor [17] is an example of a commercially available, tamper-responding secure coprocessor.</p><p>The technical challenge in implementing such a service arises from the following:</p><p>Secure coprocessors have limited capabilities. They rely on the server to which they are attached for disk storage or communication with other machines. They also have small memory (e.g., 4 MB in the IBM 4758). The factors constraining the memory size are cost and heat dissipation. The trend towards consolidating the secure coprocessor functionality on a single chip also constrains the amount of memory, as larger memories reduce the yield.</p><p>While the internal state of a computation within the secure coprocessor cannot be seen from outside, the interactions between the server and the secure coprocessor can be observed.</p><p>Simply encrypting communication between the data providers and the secure coprocessor is, therefore, insufficient. The join computation needs to be carefully orchestrated such that the read and write accesses made by the secure coprocessor cannot be exploited to make unwanted inferences.</p><p>Careful orchestration of join computation in the face of limited memory has been a staple of database research for a long time. The goal in the past, however, has been the minimization of I/O to maximize performance. While I/O minimization is still important, avoiding leakage through patterns in I/O accesses now becomes paramount.</p><p>1.3 Related Work</p><p>In principle, sovereign information sharing can be implemented by using techniques for secure function evaluation (SFE) [13, 31]. 
Given two parties, each holding a private input, SFE computes a function of the two inputs such that the parties learn only the result. SFE techniques are considered to have mostly theoretical significance and have rarely been applied in practice, although some effort is afoot to change the situation [22].</p><p>To avoid the high cost of SFE, the approach taken in [3] was to develop specialized protocols for intersection, intersection size, equijoin, and equijoin size. Similar protocols for intersection have been proposed in [8, 16]. A new intersection protocol has been recently proposed in [10]. However, the protocols provided in [3] have the following shortcomings: (1) It is not clear how to extend them to operations involving general predicates, as they are hash-based. (2) It is not obvious how to extend them to efficiently handle a large number of parties. (3) They leak information. For example, the equijoin size protocol leaks the distribution of duplicates; if no two values have the same number of duplicates, it can also leak the intersection.</p><p>Secure coprocessors have been earlier used in a variety of applications, including secure e-commerce [33], auditable digital time stamping [30], secure fine-grained access control [12], secure data mining [1], and private information retrieval [5, 28]. See [27] for a taxonomy of secure coprocessing applications. The techniques developed therein, though, are quite different. Note that the capabilities provided in architectures such as the Trusted Computing Group's trusted platform module [29], while complementary, do not solve our problem.</p><p>1.4 Paper Layout</p><p>The rest of the paper is organized as follows. In Section 2, we specify the adversarial model and give the simplifying assumptions and notations. In Section 3, we illustrate, using the classical nested loop join, some of the subtleties of the problem. This investigation enables us to distill the design principles underlying the proposed algorithms. 
We also define the correctness criteria for proving the safety of the join algorithms.</p><p>In Section 4, we provide two provably safe algorithms for general join in which the matching predicate can be an arbitrary function. They offer a range of performance trade-offs under different operating parameters.</p><p>Section 5 is devoted to the study of equijoins. Surprisingly, adaptations of the classical sort-merge join or hash join turn out to be unsafe. We then provide a safe algorithm.</p><p>In Section 6, we analyze the performance characteristics of the proposed algorithms. We conclude with a summary and directions for future work in Section 7.</p><p>2 Preliminaries</p><p>This section specifies the adversarial model and our simplifying assumptions and notations.</p><p>2.1 Adversarial Model</p><p>Our computing model admits any number of data providers and result recipients. Without loss of generality, we will consider the case where two parties that hold private relations participate in the sovereign join operation and the result is sent to a third party that is neither of the two data providers. We assume that the join algorithms and the join predicates are known to the parties.</p></li><li><p>The server, offering sovereign information sharing, is a general purpose computer. A secure coprocessor is attached to the server. The only trusted component is the secure coprocessor. All other components, including the server, are untrusted. We assume that no party (including the server) can observe the state of the computation inside the secure coprocessor or tamper with the code loaded into it.</p><p>Communication between the secure coprocessor and the data providers or the recipient is encrypted. 
Similarly, any temporary value output by the secure coprocessor to the server is also encrypted.</p><p>2.2 Authenticated Computation</p><p>Given that nothing but the secure coprocessor is trusted, we have the challenge of validating the authenticity and protecting the secrecy of the computation done by it.</p><p>We use the remote attestation mechanism provided by the secure coprocessor to ensure that it is indeed executing a known, trusted version of the application code, running under a known, trusted version of the OS, and loaded by a known, trusted version of the bootstrap code [12].</p><p>We assume that the data providers have signed a digital contract [12] prescribing what data can be shared and which computations are permissible. The secure coprocessor holds a copy of the contract and serves as an arbiter of it. Contracts are kept encrypted at the server. At the start of a join computation, the secure coprocessor authenticates the identities of the data providers to ensure that the parties it is interacting with are indeed the ones listed in the contract. It then sets up the symmetric keys to be used with each of them. Each party prepends its relation with the contract ID and encrypts the two together as one message.</p><p>We require an encryption scheme that provides both message privacy and message authenticity. Such schemes are called authenticated encryption and include XCBC, IAPM, and OCB [11, 19, 24]. We choose OCB (which stands for offset codebook) over the other two, as it requires the least number of block cipher operations ($m + 2$ block cipher operations to encrypt (resp. decrypt) $m$ plaintext (resp. ciphertext) blocks). It is also provably secure: (a) an adversary is unable to distinguish OCB outputs from an equal number of random bits (privacy), and (b) an adversary is unable to generate any valid (Nonce, Ciphertext, Authentication Tag) triple (authenticity). 
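The indistinguishability from random strings implies that OCB is semantically secure [24], which ensures with high probability that duplicate tuples will be encrypted differently.</p><p>Encryption under OCB [24] requires an $n$-bit nonce $N$, where $n$ is the block size. The nonce would typically be an identifier selected by the sender. In OCB, two states, Offset and Checksum, are computed accumulatively as blocks are sequentially encrypted. Let $E_K$ and $D_K$ denote the block cipher and its inverse under key $K$, and let $L = E_K(0^n)$. The offset $Z[i]$ is used in encrypting and decrypting block $i$, where $Z[1] = E_K(N \oplus L) \oplus L$, and $Z[i] = f(Z[i-1], L, i)$ for $i \ge 2$ and some easily computable function $f$. When encrypting a plaintext block $M[i]$, the ciphertext $C[i] = E_K(M[i] \oplus Z[i]) \oplus Z[i]$ for $1 \le i \le m-1$, where $m$ is the total number of message blocks. The final cipher block $C[m] = M[m] \oplus Y[m]$ [first $|M[m]|$ bits], where $Y[m] = E_K(\mathrm{len}(M[m]) \oplus g(L) \oplus Z[m])$, $\mathrm{len}(M[m])$ is the length of the final message block, and $g$ is some easily computable function. The state Checksum $= M[1] \oplus \cdots \oplus M[m-1] \oplus C[m]0^{*} \oplus Y[m]$ and the tag $T = E_K(\mathrm{Checksum} \oplus Z[m])$ [first $\tau$ bits], where $C[m]0^{*}$ represents padding the last cipher block to the block size. The first $\tau$ bits are the authentication tag $T$. The nonce $N$ and the ciphertext $C[1] \cdots C[m]\,T$ are transferred to the recipient.</p><p>When decrypting a ciphertext block $C[i]$, the plaintext $M[i] = D_K(C[i] \oplus Z[i]) \oplus Z[i]$ for $1 \le i \le m-1$, where $Z[i]$ is computed from the received nonce. Let $Y[m] = E_K(\mathrm{len}(C[m]) \oplus g(L) \oplus Z[m])$. $M[m] = C[m] \oplus Y[m]$ [first $|C[m]|$ bits]. Checksum $= M[1] \oplus \cdots \oplus M[m-1] \oplus C[m]0^{*} \oplus Y[m]$. Let $T' = E_K(\mathrm{Checksum} \oplus Z[m])$ [first $\tau$ bits]. If $T = T'$, then accept the message, otherwise reject.</p><p>Since we use authenticated encryption, an adversary who does not know the key cannot impersonate either data provider, nor can it tamper with the encrypted tuples in any way that will not be detected. 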
The same holds for the communication of the result from the secure coprocessor to the recipient.</p><p>Thus, the only vulnerability that an adversary can hope to exploit is the pattern in the interactions between the secure coprocessor and the server. Our algorithms are designed to thwart the adversary from learning anything by observing this interaction.</p><p>2.3 Assumptions and Notations</p><p>To simplify exposition, we will assume that the tuples of the input relations and of the result are of the same size and that free memory of the secure coprocessor can hold at most […] such tuples. Note that we need to be able to hold at least two input tuples in memory during the join processing, and expressing memory size as […] simplifies cost expressions. […] is the maximum number of tuples from […] that match a tuple from […]. Our algorithms have been designed to handle the general case where […]. We also assume that […] is much smaller than […] or […].</p><p>We will omit from the algorithms the details of the communication between the participating entities. Assume that the data providers have sent their encrypted relations to the server, which has stored them on its local disk. Similarly, the secure coprocessor writes the encrypted join result to the server's disk (invoking the server process running on the server), which then sends it to the recipient. The algorithms will describe the code executed by the secure coprocessor.</p><p>We will indicate a transfer of data between the server and the secure coprocessor by prepending the operation with one of two keywords ([…] and […]), one for each direction of transfer. We will use […] and […] to denote the encryption and decryption functions respectively. We will ignore the use of keys in these functions. We assume fixed size tuples and that the server knows their size.</p></li><li><p>We do not discuss issues such as schema discovery and schema mappings. We assume schemas can be shared. The design presented in [2] can be used for this purpose.</p><p>3 Design Principles</p><p>We first present two straightforward, but unsafe, adaptations of the classical nested loop join algorithm. 
We discuss them as they help derive the design principles underlying our proposed algorithms.</p><p>3.1 A Straightforward, but Unsafe Algorithm</p><p>Here is a straightforward adaptation of the classical nested loop join algorithm. The secure coprocessor first obtains an encrypted tuple of the outer relation by sending a read request to the server and decrypts the tuple inside its memory. It then reads a tuple of the inner relation, decrypts it, and compares it with the decrypted tuple of the outer relation. If the match succeeds, the secure coprocessor encrypts the result tuple and outputs it to the server to write to disk. The above step is repeated for the rest of the tuples of the inner relation and then the procedure is repeated for the rest of the tuples of the outer relation.</p><p>Unfortunately, this straightforward adaptation is not safe, although the input as well as output values remain encrypted outside of the secure coprocessor. An adversary (e.g., the server colluding with a party that does not receive the join result) can easily determine which encrypted tuples of one relation joined with which tuples of the other, simply by observing wh...</p></li></ul>
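<p>The leakage described in Section 3.1 can be illustrated with a small simulation. This is a minimal sketch rather than the paper's code: the function names <code>naive_nested_loop_join</code> and <code>infer_matches</code>, the trace format, and the sample data are all hypothetical, and encryption is left out on the assumption that the observer sees only which tuple slot is read or written, never any plaintext.</p>

```python
from typing import Callable, List, Tuple

# One observed interaction: (operation, relation label, slot index).
Event = Tuple[str, str, int]


def naive_nested_loop_join(
    outer: List[int],
    inner: List[int],
    predicate: Callable[[int, int], bool],
) -> Tuple[List[Tuple[int, int]], List[Event]]:
    """Join as the trusted component would, recording what an observer sees.

    The trace holds only access positions, never tuple values, which models
    an observer who sees encrypted tuples moving in and out (an assumption).
    """
    result: List[Tuple[int, int]] = []
    trace: List[Event] = []
    for i, outer_tuple in enumerate(outer):
        trace.append(("read", "outer", i))
        for j, inner_tuple in enumerate(inner):
            trace.append(("read", "inner", j))
            if predicate(outer_tuple, inner_tuple):
                result.append((outer_tuple, inner_tuple))
                # A write happens only on a match: this is the leak.
                trace.append(("write", "out", len(result) - 1))
    return result, trace


def infer_matches(trace: List[Event]) -> List[Tuple[int, int]]:
    """Observer's view: recover matching (outer, inner) slots from the trace."""
    matches: List[Tuple[int, int]] = []
    current_outer = current_inner = -1
    for op, relation, index in trace:
        if op == "read" and relation == "outer":
            current_outer = index
        elif op == "read" and relation == "inner":
            current_inner = index
        elif op == "write":
            matches.append((current_outer, current_inner))
    return matches


outer_relation = [3, 7, 9]
inner_relation = [7, 1, 9, 7]
result, trace = naive_nested_loop_join(
    outer_relation, inner_relation, lambda x, y: x == y
)
leaked = infer_matches(trace)
print(leaked)
```

<p>In this sketch the observer never touches a tuple value, yet the position of each write in the trace is enough to pair up the matching slots. A safe algorithm would have to make the sequence of reads and writes independent of which tuples match.</p>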

