
Root ORAM: A Tunable Differentially Private Oblivious RAM

Sameer Wagh, Princeton University

[email protected]

Paul Cuff, Princeton University

[email protected]

Prateek Mittal, Princeton University

[email protected]

Abstract—State-of-the-art mechanisms for oblivious RAM (ORAM) suffer from significant bandwidth overheads (greater than 100x) that impact the throughput and latency of memory accesses. This renders their deployment in high-performance and bandwidth-constrained applications difficult, motivating the design of low-overhead approaches for memory access obfuscation.

We introduce and formalize the notion of a differentially private ORAM that provides statistical privacy guarantees, and which, to the best of our knowledge, is the first of its kind. The formalization of differentially private ORAM opens up a large design space of low-bandwidth ORAM protocols that can be deployed in bandwidth-constrained applications.

We present Root ORAM, a family of practical ORAMs that provide a tunable trade-off between the desired bandwidth overhead and system security, and that provide the rigorous privacy guarantees of differentially private ORAMs. Our Root ORAM family can be tuned to achieve application-specific bandwidth constraints, enabling practical deployment, at the cost of statistical privacy guarantees which are quantified under the differential privacy framework.

We demonstrate the practicality of Root ORAM using theoretical analysis, simulations, as well as real-world experiments on Amazon EC2. Our theoretical analysis rigorously quantifies the privacy offered by Root ORAM, and provably bounds the information leaked from observing memory access patterns. Our experimental analysis shows that the simplest protocol in the Root ORAM family requires a bandwidth of a mere 10 blocks, at the cost of rigorously quantified security loss, and that this number is independent of the number of outsourced blocks N. This is an order of magnitude improvement over the existing state-of-the-art ORAM schemes such as Path ORAM, which incurs a bandwidth overhead of 10 · log N blocks.

I. INTRODUCTION

Cloud storage and computing are important tools to outsource data but have given rise to significant privacy concerns due to the non-local nature of data storage. Though encryption goes a long way in assuring data confidentiality, recent work [5], [16] has shown that encryption is not sufficient. Encryption does not hide memory access patterns; an untrusted storage server can thus perform traffic analysis of memory access patterns to compromise client privacy. The work of Islam et al. has shown the leakage of sensitive keyword information by performing traffic analysis of access patterns over encrypted email [16]. Similarly, Dautrich et al. have shown that access patterns over database tuples can leak ordering information [5].

Oblivious RAM (ORAM), first introduced by Goldreich and Ostrovsky [13], [14], is a cryptographic primitive which allows a client to protect its data access pattern from an untrusted server storing the data. Since its introduction, substantial progress has been made by the research community in developing novel and efficient ORAM schemes [4], [12], [25], [26], [30]–[32]. Recent work has also shown the promise of using ORAMs as a critical component in developing protocols for other cryptographic primitives such as Secure Multi-Party Computation [12].

However, ORAM schemes incur a large overhead in terms of bandwidth that renders them impractical. For example, even the most efficient ORAM protocols [26], [31], [32] incur a logarithmic overhead compared to conventional RAMs (greater than 100x including constants). This significantly impacts the throughput and latency of memory accesses, and presents a bottleneck for real-world deployment of oblivious RAMs in high-performance and bandwidth-constrained applications. The lack of low-bandwidth ORAMs, despite considerable efforts from the security community, is a strong indicator that we need a fundamentally new approach.

Hence, we propose a novel approach for developing practical ORAM protocols that can support even a constant bandwidth overhead compared to conventional RAMs. Our key idea is to reduce bandwidth at the cost of some of the privacy offered by the ORAM. We first formalize the notion of a differentially private ORAM that provides statistical privacy guarantees, and which, to the best of our knowledge, is the first of its kind. As the name suggests, we use the differential privacy framework developed by Dwork et al. [6] with its (ε, δ)-differential privacy modification [9]. In the current formulation of a (perfect) ORAM, the output is computationally indistinguishable for any two input sequences. In a differentially private ORAM, we characterize the effect of a small change in the ORAM input on the probability distribution at the output.

We present Root ORAM¹, a family of practical ORAM schemes that provide a tunable trade-off between the desired bandwidth overhead and system security, including a design point that supports constant bandwidth construction, and that provide the rigorous privacy guarantees of differentially private ORAMs. The low-bandwidth protocols are an order of magnitude improvement over previous work, in which the protocols still incur a logarithmic bandwidth [12], [31], [32].

¹ The protocol family is called Root ORAM because in the lowest bandwidth regime, the data structure reduces to just the root and the leaves (tree of depth 1). Refer to Sec. V for details.

arXiv:1601.03378v1 [cs.CR] 13 Jan 2016

The formalization of a differentially private ORAM opens up a large underlying design space currently not considered by the community. With a rigorously quantified privacy-utility trade-off, we propose Root ORAM as the first step in this direction of statistically private ORAMs.

A. Our Contributions

Root ORAM introduces a number of paradigm shifts in the design of ORAM protocols while at the same time building on the prevailing ideas of contemporary ORAM constructions². Our main contributions can be summarized as follows:

The notion of a differentially private ORAM: We formalize the notion of a differentially private ORAM, which, to the best of our knowledge, is the first of its kind. Formally defined in Section III, a differentially private ORAM is a way to characterize an ORAM protocol that bounds the information leakage from memory access patterns.

Tunable/parametric protocol family: In bandwidth-constrained applications, the large bandwidth overhead (>100x) of conventional ORAM schemes is a significant bottleneck. We propose to reduce bandwidth overhead at the cost of a rigorously quantified privacy loss. We provide a new family of ORAM schemes called Root ORAM that can achieve tunable trade-offs between bandwidth and privacy while at the same time providing the security guarantees of differentially private ORAMs. This allows Root ORAM to be tailored to the needs and constraints of the application, serving as an enabler for practical deployment.

Security: We theoretically analyze the security offered by Root ORAM, and prove that Root ORAM provides the rigorous privacy guarantee of differentially private ORAMs. Thus, Root ORAM provably bounds the information leakage from observed memory access patterns. Our theoretical security analysis also gives a novel proof of the security of the Path ORAM protocol in the framework of differentially private ORAMs. We believe that our approach is general, and will be useful to rigorously reason about the security of alternative differentially private ORAM schemes in the future.

Performance: Root ORAM introduces a new design point with constant bandwidth overhead. The simplest protocol of the family has bandwidth usage per access as low as a constant around 10 data blocks³ compared to 10 · log N blocks in the case of Path ORAM. At the same time, the server-side storage efficiency can be as high as 1:2, i.e., one fake block per real block outsourced (compared to Path ORAM which uses around 1:4). We implement Root ORAM and demonstrate its practicality using simulations as well as real-world experiments on Amazon EC2.

² This work is inspired by the Path ORAM paper [32] and we would like to give the authors of the paper all due credit. At the same time, we would like to bring it to the reader's attention that there are considerable differences between the two protocols, as highlighted in Section IV.

³ Achieved for λ = 4 and Z = 2.

Some of these are order of magnitude improvements in performance over the state of the art, though we would like to remind the reader that these come at the cost of a rigorously quantified privacy loss. Finally, Root ORAM does not assume any server-side computation and uses low, practical amounts of client-side storage, while at the same time being extremely simple to implement at both the client and the server side.

B. Paper organization

The paper is organized as follows:

• We begin by motivating the formulation of ORAMs with statistical privacy guarantees in Section II along with different metrics that we considered.

• Section III formalizes the notion of a differentially private ORAM.

• Section IV gives an overview of the Root ORAM protocol and Section V presents a detailed description of the Root ORAM family.

• We theoretically evaluate the security offered by Root ORAM in Section VI and present our systems evaluation in Section VII.

• Section VIII contrasts Root ORAM with related work.

• Limitations and directions for future work are addressed in Section IX.

• Finally, we conclude in Section X.

II. PRELIMINARIES: STATISTICAL PRIVACY

Fig. 1: A representation of an ORAM. The protocol box translates an input access sequence into an output access sequence.

The significant bandwidth overhead in conventional ORAM schemes motivates our design, which reduces the protocol bandwidth at the cost of a rigorously quantified privacy loss. The lack of deployed ORAM schemes despite considerable efforts from the security community to develop low-overhead approaches is a strong indicator that we need a paradigm shift in our approach. To this end, we formulate the concept of statistical privacy in ORAMs.

In this section, we give an overview of the notion of a differentially private ORAM. A perfect ORAM roughly⁴ leaks no information about the input access sequence. In other words, we can consider an ORAM to be a black box with an input sequence X and an output sequence Y, as shown in Fig. 1. An ORAM with perfect privacy would guarantee the independence of X and Y. A slightly stronger condition could be to say that the distribution of the output sequence is uniform over its space for any given input⁵.

The most natural way to extend the latter condition for designing statistically private ORAMs is to consider ORAM schemes that give non-uniform distributions of the output sequences Y (for a given input X) and use security metrics that quantify the "non-uniformness" of this distribution. This is graphically illustrated in Fig. 2, where an attacker aims to guess the original access pattern after observing the output access pattern o. Next, we discuss a number of metrics that can be used to quantify the statistical privacy guarantees offered by ORAMs with non-uniform output distributions.

1. k-Anonymity: k-Anonymity was first formulated by Sweeney in [33]. It refers to the largest set within which a data point is anonymous. In the ORAM setting, the adversary will try to infer the input sequence of the ORAM after observing its output sequence. Given an outcome, we cluster the set of all inputs into two sets: the set of potential inputs which can lead to the observed sequence, and the set of improbable inputs which cannot lead to the observed sequence. Now if the smallest of the potential input sets is of size k for any observed access pattern, then we can define the ORAM as k-anonymous.

More formally, for a given observed access sequence Y, if we denote its potential set by I(Y), then we can define an ORAM protocol as k-anonymous if

$$k \le \min_{\forall Y} |I(Y)|$$

where |I(Y)| denotes the cardinality of the set I(Y).

Though k-anonymity is an effective metric, it is often too simplistic because it does not take underlying probability distributions into account. If, within a given cluster, one element is much more probable than all the rest, then it defeats the intuitive privacy offered by this metric. Similarly, k-anonymity does not resist adaptive composition attacks [11]. Since the formalization of k-anonymity, metrics such as l-diversity [22] and t-closeness [19] have been studied as potential modifications for it.
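As an illustration of the definition above, the following minimal sketch computes k for a toy mechanism. The mechanism `toy` and the helper name `k_anonymity` are hypothetical and not part of the paper; the only information used is which outputs each input can produce with non-zero probability, which is also why the metric ignores how likely each input is.

```python
from collections import defaultdict

def k_anonymity(support):
    """support maps each input to the set of outputs it can produce.

    Returns min over observable outputs Y of |I(Y)|, where I(Y) is the
    set of inputs that can lead to Y.
    """
    potential_inputs = defaultdict(set)
    for x, outputs in support.items():
        for y in outputs:
            potential_inputs[y].add(x)
    return min(len(inputs) for inputs in potential_inputs.values())

# Hypothetical toy mechanism with three inputs and three outputs.
toy = {
    "x1": {"y1", "y2"},
    "x2": {"y1", "y2", "y3"},
    "x3": {"y2", "y3"},
}
k = k_anonymity(toy)  # I(y1) and I(y3) both have two elements
```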

2. KL divergence: Next, we consider first-order metrics which take into account the underlying probability distributions on the access sequences. The most natural candidate is Kullback-Leibler divergence⁶, which measures the distance between two probability distributions. KL-divergence is formally defined as:

⁴ We say roughly because ORAMs could leak information such as timing of accesses/access pattern size.

⁵ This is stronger because conventional definitions of perfect ORAMs involve outputs being computationally indistinguishable, which hides a small detail which we shall see in Sec. VI.

⁶ Also known as information gain or relative entropy.

$$D_{KL}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$$

In the context of ORAMs, we can define a metric as follows: Given the observed access sequence Y, the attacker has a distribution over the original access sequence (which in general will not be uniform). Denoting this distribution by P and a uniform distribution over the original access sequence by U, we can quantify the privacy loss using γ given by

$$\gamma = D_{KL}(P \,\|\, U) \tag{1}$$

The KL divergence between a distribution P and a uniform distribution U is directly related to the entropy of the distribution, as in Eq. 3. Hence, the KL-divergence metric is directly related to a metric based on Shannon entropy.

3. Entropy & Min-entropy: Shannon entropy as well as min-entropy are very well studied metrics in information theory⁷. They have been used frequently to quantify security and privacy [15], [18], [23] by different research communities. Formally, these are defined as:

$$H_1(P) = H(P) = \sum_i -P(i) \log P(i), \qquad H_\infty(P) = -\log \max_i P(i) \tag{2}$$

If we denote by A the set of points in the sample space of output access sequences and by U a uniform distribution over this space, then in the context of ORAMs, we can define this metric using the KL-divergence by noting the simple relation between KL-divergence and Shannon entropy:

$$D_{KL}(P \,\|\, U) = \log |A| - H(P) \tag{3}$$

where H(Q) is the Shannon entropy of a distribution Q. Hence, H(P) can be suitably used to give a metric of the privacy loss in ORAMs.
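For completeness, the relation in Eq. 3 follows in one line from the definition of KL-divergence, using only that the uniform distribution assigns U(i) = 1/|A| to every point and that P sums to one:

```latex
\begin{aligned}
D_{KL}(P \,\|\, U)
  &= \sum_i P(i) \log \frac{P(i)}{1/|A|} \\
  &= \sum_i P(i) \log P(i) + \log |A| \sum_i P(i) \\
  &= \log |A| - H(P).
\end{aligned}
```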

Min-entropy lower bounds Shannon entropy and can hence be used to derive lower bounds on statistical security defined using Shannon entropy. Min-entropy has also been used in the past to characterize statistical security [23], [29].
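The quantities in Eqs. 1 to 3 can be checked numerically. The sketch below is illustrative only: the distribution `P` is a made-up example, and logarithms are taken base 2 (an assumption; the text does not fix a base). It confirms that the privacy loss γ equals log |A| − H(P) and that min-entropy does not exceed Shannon entropy.

```python
import math

def shannon_entropy(p):
    """H(P) = sum_i -P(i) log2 P(i); zero-probability points contribute 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def min_entropy(p):
    """H_inf(P) = -log2 max_i P(i)."""
    return -math.log2(max(p))

def kl_from_uniform(p):
    """D_KL(P || U) with U uniform over the len(p) points of the space."""
    n = len(p)
    return sum(q * math.log2(q * n) for q in p if q > 0)

# Hypothetical attacker posterior over four candidate input sequences.
P = [0.5, 0.25, 0.125, 0.125]
H = shannon_entropy(P)
H_min = min_entropy(P)
gamma = kl_from_uniform(P)
```

For a uniform `P`, `gamma` is zero and both entropies equal log2 of the number of points, which corresponds to the perfect-privacy case.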

4. Differential Privacy: Differential Privacy is the state-of-the-art privacy metric and there is an emerging consensus among the security and privacy community about its use. First formulated in a seminal paper by Dwork et al. [6], differential privacy was introduced in the context of security for publishing databases [7], [8], [17]. We leverage this framework in our setting, and use it to define a metric for quantifying the privacy offered by statistical ORAMs. Differential privacy is defined as follows:

Definition 1: ε-Differential Privacy: A randomized algorithm K gives ε-differential privacy if for all data sets $D_1$ and $D_2$ differing on at most one element, and all S ⊂ Range(K),

$$\Pr[K(D_1) \in S] \le e^{\varepsilon} \Pr[K(D_2) \in S] \tag{4}$$

⁷ Rényi entropy of order 1 and ∞, respectively.


Fig. 2: Panels (a) and (b) show the intuitions behind general statistically private ORAMs. Fig. 2a shows a way to define a statistical ORAM using KL-divergence as a metric, whereas Fig. 2b shows the intuition behind a differentially private ORAM.

The definition bounds how much a change in the input can change the distribution of the output sequence. In other words, if the input is perturbed slightly, i.e., $|D_1 - D_2| = 1$, the output probability distribution changes only slightly. For more in-depth reading, we refer the reader to the differential privacy paper by Dwork et al. [6]. As we will see later in Sec. VI, ε-differential privacy alone is inadequate to completely capture ORAM security for subtle reasons⁸. It turns out that the extension of differential privacy called (ε, δ)-differential privacy is well suited to characterize ORAM security. (ε, δ)-Differential Privacy was first defined by Dwork et al. [9], was formalized by Nissim et al. [24], and can be stated as follows:

Definition 2: (ε, δ)-Differential Privacy: A randomized algorithm K is (ε, δ)-differentially private if for all data sets $D_1$ and $D_2$ differing on at most one element, and for all S ⊂ Range(K),

$$\Pr[K(D_1) \in S] \le e^{\varepsilon} \Pr[K(D_2) \in S] + \delta \tag{5}$$

When δ = 0, the algorithm is ε-differentially private.
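To make Definition 2 concrete for finite output spaces, the following sketch computes the smallest δ for which Eq. 5 holds, in both directions, for one pair of output distributions at a given ε. It relies on the standard observation that the worst-case set S consists of exactly those outputs o with P₁(o) > e^ε P₂(o). The function name `smallest_delta` and the example distributions are hypothetical; a full privacy analysis would maximize this quantity over all neighboring input pairs.

```python
import math

def smallest_delta(p1, p2, eps):
    """Smallest delta such that Eq. 5 holds for p1 vs p2 and p2 vs p1.

    p1 and p2 map outputs to probabilities; missing outputs have probability 0.
    """
    outcomes = set(p1) | set(p2)

    def one_sided(a, b):
        # Sum the excess mass over the worst-case set S.
        return sum(
            max(0.0, a.get(o, 0.0) - math.exp(eps) * b.get(o, 0.0))
            for o in outcomes
        )

    return max(one_sided(p1, p2), one_sided(p2, p1))

# Hypothetical output distributions for two neighboring inputs.
p1 = {"a": 0.6, "b": 0.4}
p2 = {"a": 0.4, "b": 0.6}
delta_at_zero = smallest_delta(p1, p2, 0.0)            # total variation distance
delta_at_log15 = smallest_delta(p1, p2, math.log(1.5))  # ratio 0.6/0.4 is covered

# An output that only one input can produce forces delta > 0 for every eps.
p3 = {"a": 0.99, "c": 0.01}
p4 = {"a": 1.0}
delta_disjoint = smallest_delta(p3, p4, 10.0)
```

The last case shows why a purely multiplicative bound (δ = 0) cannot hold when some output has zero probability under one input but not under the other.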

Both ε-differential privacy and (ε, δ)-differential privacy have an interesting composability property. Given two independent mechanisms with privacy guarantees $(\varepsilon_1, \delta_1)$ and $(\varepsilon_2, \delta_2)$, any function of them is an $(\varepsilon_1 + \varepsilon_2, \delta_1 + \delta_2)$-differentially private mechanism.

III. A DIFFERENTIALLY PRIVATE ORAM

The notion of probabilistic security has been commonly used across security/privacy applications in the literature [1], [20]. But in the context of ORAMs, this notion of having statistical privacy has not been rigorously explored. We believe formulating such a framework would greatly expand the ability to develop novel ORAM protocols with low-bandwidth overhead, serving as an enabler for real-world deployment.

⁸ For example, for a fixed stash size, even Path ORAM is not differentially private for any ε. Refer to Sec. VI-D.

A large number of papers have been published over the last few years in the ORAM domain which adopt quite a few different definitions to quantify ORAM bandwidth. Hence it is important to mention the definition we use in this paper. We will stick to the original and straightforward definition of bandwidth as the average number of blocks transferred for one access [4].

Definition 3: The bandwidth cost of a storage scheme is given by the average number of blocks transferred in order to read or write a single block.

Formally, an ORAM can be defined as a mechanism (possibly randomized) which takes an input access sequence $\vec{y}$ as given below,

$$\vec{y} = ((\mathrm{op}_M, \mathrm{addr}_M, \mathrm{data}_M), \ldots, (\mathrm{op}_1, \mathrm{addr}_1, \mathrm{data}_1)) \tag{6}$$

and outputs a resulting output sequence denoted by ORAM($\vec{y}$). Here M is the length of the access sequence, $\mathrm{op}_i$ denotes whether the i-th operation is a read or a write, $\mathrm{addr}_i$ denotes the address for that access, and $\mathrm{data}_i$ denotes the data (if $\mathrm{op}_i$ is a write). Denoting by $|\vec{y}|$ the length of the access sequence $\vec{y}$, the currently accepted definition of ORAM security can be summarized as follows:

Definition 4: (Currently accepted ORAM Security): Let $\vec{y}$, as denoted in Eq. 6, denote an access sequence. Let ORAM($\vec{y}$) be the resulting randomized data request sequence of an ORAM algorithm. The ORAM protocol guarantees that for any $\vec{y}$ and $\vec{y}'$, ORAM($\vec{y}$) and ORAM($\vec{y}'$) are computationally indistinguishable if $|\vec{y}| = |\vec{y}'|$, and also that for any $\vec{y}$ the data returned to the client by ORAM is consistent with $\vec{y}$ (i.e., the ORAM behaves like a valid RAM) with high probability.


| Symbol | Description |
|---|---|
| $k \ge 1$ | Model parameter (to tune trade-off) |
| $p = 1 - 1/2^k$ | Derived model parameter |
| $N = 2^L$ | Number of real data blocks outsourced |
| Z | Number of blocks in each bucket |
| B | Size of each block (in bits) |
| P(x) | Path from leaf x to the root |
| P(x, i) | Node at level i in P(x) |
| x := position[a] | Data block a is currently mapped to leaf x, i.e., a resides in some bucket in P(x) |

TABLE I: Notation for Root ORAM

The existing frameworks for ORAM security are constructed with complete security at their core [4], [12], [26], [32]. There is no natural way to extend these frameworks to incorporate a statistical privacy notion. Hence, we formalize the notion of a differentially private ORAM as follows.

A. Formalizing Differentially Private ORAMs

The intuition behind a statistically private ORAM is that given any two input sequences that differ by a single access, the distributions of their output sequences should be "close". In other words, a differentially private ORAM can be thought of as a mechanism whose output distributions are "close enough" if the input sequence is changed slightly. We formally define it as follows:

Definition 5: Differentially Private ORAM: Let $\vec{y}$, as defined in Eq. 6, denote the input to an ORAM. Let ORAM($\vec{y}$) be the resulting randomized data request sequence of an ORAM algorithm. We say that an ORAM protocol is (ε, δ)-differentially private if for all input access sequences $\vec{y}_1$ and $\vec{y}_2$ which differ in at most one access, the following condition is satisfied by the ORAM protocol,

$$\Pr[\mathrm{ORAM}(\vec{y}_1) \in S] \le e^{\varepsilon} \Pr[\mathrm{ORAM}(\vec{y}_2) \in S] + \delta \tag{7}$$

where S is any set of output sequences of the ORAM.

We note that the definition does not make any assumption about the size of the output sequences in S. Thus, if the input to the ORAM is changed by a single access tuple $(\mathrm{op}_i, \mathrm{addr}_i, \mathrm{data}_i)$, the output distribution does not change significantly. Fig. 2 graphically represents this intuition. Given the two sequences $r_1$ and $r_2$, the two distributions generated (the red and the blue) are close to each other in the differential privacy sense.

IV. ROOT ORAM OVERVIEW

In this section, we briefly describe our key design goals and give a high-level overview of the Root ORAM protocol. Notation is briefly given in Table I.

A. Design Goals

1) Tunable ORAM scheme: We target a tunable architecture with an explicit privacy-utility trade-off which can be used to design ORAM protocols for bandwidth-constrained applications. In general, we would like to be able to tune the protocol parameters based on the system constraints and explicitly demonstrate the security-bandwidth trade-off.

2) Framework for security: We target a protocol that provides rigorous privacy guarantees, viz. those of differentially private ORAMs formalized in Sec. III.

3) Low Storage and Computation: The design should use as little storage as possible on both the client and the server side. Similarly, we would like to avoid assuming any server-side computation.

B. Approach Overview

The Root ORAM protocol can be split into three components: the access, the new mapping and the eviction. These are briefly described below. As Path ORAM is an instantiation of Root ORAM, the protocols are very similar in their structure.

The server-side storage is a partial binary tree where each node is a bucket which can hold up to Z data blocks. A stash at the client is used to store a small amount of data whenever needed. Data elements are mapped to leaves of the tree and a local position map is used to store this mapping.

Access: The main invariant (same as Path ORAM) is that any data block is either along the path from the root to the leaf it is mapped to, or in the stash. To access a data element, the client looks up the local mapping to find the leaf that the data element is mapped onto. This completes the access part.

New Mapping: The relevant data block is then read or written with the new data and a new mapping is generated. The crucial difference here is that the new mapping is not uniform among all the leaves: the new mapping is slightly more likely to be the same as the old mapping than any other random leaf. The exact distribution is given by Eq. 8.

Finally, new randomized encryptions are generated and all the data is written back, with elements pushed further down the tree (towards the leaf) where possible and new elements written back to the tree where possible.

Eviction: The eviction scheme used in Root ORAM is that of fake accesses. The client machine independently sends fake access queries to the server, completely indistinguishable from normal requests, through a Poisson process with parameter λ. The eviction process ensures that the stash size remains low.

C. Comparison with Path ORAM [32]

Root ORAM is inspired by the Path ORAM protocol and we would like to give the authors all due credit. At the same time, in this subsection, we would like to highlight the critical differences between the two papers.

Differentially Private ORAM: Root ORAM introduces a new rigorous metric to quantify ORAM security, which extends the current formalism to include the notion of a statistically private ORAM. We rigorously bound the statistical privacy offered by the Root ORAM family of protocols using this metric.

Storage structure: Root ORAM uses a partial binary tree as the storage structure at the server where the height of the tree is a model parameter k. This is represented in Fig. 3. The parameter k governs the security-bandwidth trade-off. The Path ORAM protocol, on the contrary, has a fixed-height binary tree (complete binary tree).

Tunability: The ability to tune the protocol as per the system constraints is a stark difference between Root ORAM and Path ORAM. There is no way to optimize Path ORAM when the bandwidth is constrained and statistical security is acceptable. Root ORAM introduces the novel notion of non-uniform mapping, a specific choice which allows Root ORAM to give statistical privacy guarantees. Path ORAM's update mapping scheme then turns out to be a simple case of this generalized mapping.

Simply by tuning the parameters, Root ORAM matches or exceeds the performance of Path ORAM. We provide the ability to operate in the low-bandwidth regime which Path ORAM cannot support. The eviction scheme allows Root ORAM to achieve perfect security (ε = 0) at even lower bandwidth than the Path ORAM protocol, as can be seen in Fig. 6. Path ORAM uses a bandwidth of ∼ 10 log N data blocks per access⁹ whereas Root ORAM can perform the same with around ∼ 8 log N.

Eviction scheme: The eviction schemes of the two protocols are very different. Path ORAM relies on a sufficiently large bucket size to achieve its goals. In contrast, Root ORAM uses an eviction scheme of fake accesses. Root ORAM parameters can be tuned to obtain the Path ORAM protocol, but the latter is not the lowest-bandwidth full-security (ε = 0) protocol in the Root ORAM family.

Multi-dimensional trade-off space: Root ORAM can be tuned as per the user's requirements not just in the security-bandwidth space but also in terms of the server storage used and the local stash size required. Thus, Root ORAM offers attractive design points that can support a diverse range of multi-dimensional trade-offs just by tuning its parameters.

V. ROOT ORAM DETAILS

In this section, we provide the details of Root ORAM. We begin by describing the basics of the protocol. The required notation is tabulated in Table I.

A. Server Storage

Server Storage: The server stores data in the form of a partial binary tree consisting of buckets¹⁰ as nodes. In other words, given an integer k, we first construct a binary tree of depth k − 1, i.e., the root is level 0 and the lowest level is level k − 1 (this tree will have $2^{k-1}$ leaves). Then each of the $2^{k-1}$ leaves of this tree has $2^{L-k+1}$ children (from here on, we shall refer to these $N = 2^L$ nodes as the leaves of the tree). This set-up is illustrated in Fig. 3.

⁹ For bucket size Z = 5.

¹⁰ A bucket contains multiple blocks of data storage which can be real or dummy.

Bucket structure: Each node is a bucket consisting of Z blocks; each block can either be real or dummy (an encryption of 0). Note that the bucket size directly affects the bandwidth of the ORAM scheme, and Root ORAM demonstrates the practicality of bucket sizes as low as Z = 2.

Path structure: The leaves are numbered in the set $\{0, 1, \ldots, 2^L - 1\}$. P(x) denotes the path (set of buckets along the way) from leaf x to the root and P(x, i) denotes the bucket in P(x) at level i. It is important to emphasize here that the path length in Root ORAM is (k + 1) buckets compared to (log N) + 1 in Path ORAM.
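The path structure can be sketched as follows. The text only states that the N leaves are evenly distributed over the leaves of the binary tree; this sketch additionally assumes a contiguous layout, in which leaf x hangs under the level-(k − 1) node with index x >> (L − k + 1). The function name `path` and the (level, index) bucket labels are illustrative choices, not the paper's.

```python
def path(x, k, L):
    """Buckets on P(x) as (level, index) pairs, root first.

    Assumes leaves 0 .. 2**L - 1 are attached to the level-(k - 1) nodes
    in contiguous groups of 2**(L - k + 1).
    """
    if not (1 <= k <= L and 0 <= x < 2 ** L):
        raise ValueError("need 1 <= k <= L and 0 <= x < 2**L")
    # Levels 0 .. k-1 form an ordinary binary tree over the high bits of x.
    inner = [(i, x >> (L - i)) for i in range(k)]
    # The final level holds the leaf bucket itself.
    return inner + [(k, x)]

example = path(5, 2, 3)  # L = 3 (N = 8 leaves), k = 2
```

With k = 1 the path is just the root and the leaf, and with k = L it has L + 1 buckets, matching the (k + 1) versus (log N) + 1 comparison above.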

Dummy blocks and randomized encryption: We use the standard padding technique (fill buckets with dummy blocks when needed) along with randomized encryption to ensure indistinguishability of real and dummy blocks.

B. Invariants of the scheme

Main Invariant (same as Path ORAM): The main invariant is that each real data block a is mapped to a leaf x := position[a], $x \in \{0, 1, 2, \ldots, 2^L - 1\}$, and at any point in the execution of the ORAM, the real block will be somewhere in a bucket ∈ P(x) or in the local Stash.¹¹

Secondary Invariant: We maintain the secondary invariant that after each access to an element, its new mapping is governed by a constant non-uniform distribution D given by the following equation and shown graphically in Fig. 4.

$$P_{z,x} = p_2 + (p_1 - p_2)\,\delta_{zx} \tag{8}$$

where $P_{z,x}$ is the probability that an element accessed from leaf x is mapped to a leaf z, $\delta_{ij}$ is the Kronecker delta¹², $p_1 = (1 - p)$ and $p_2 = p/(N - 1)$, where p is the model parameter probability as defined in Table I.
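As a quick check of Eq. 8, the sketch below builds the remapping distribution for a leaf x using exact rational arithmetic. The function name `remap_distribution` is hypothetical; the formulas for p, p₁ and p₂ are those given above and in Table I. The distribution sums to one for every k, and becomes uniform exactly when k = L.

```python
from fractions import Fraction

def remap_distribution(x, k, L):
    """Return [P_{z,x} for z in 0 .. N-1] as exact fractions (Eq. 8)."""
    N = 2 ** L
    p = 1 - Fraction(1, 2 ** k)  # derived model parameter
    p1 = 1 - p                   # probability of keeping the same leaf
    p2 = p / (N - 1)             # probability of each other leaf
    return [p1 if z == x else p2 for z in range(N)]

skewed = remap_distribution(2, 1, 3)   # k = 1, N = 8
uniform = remap_distribution(2, 3, 3)  # k = L
```

For k = 1 and N = 8, the old leaf keeps probability 1/2 while each of the other seven leaves receives 1/14.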

C. Client Storage

Position Map: The client side stores a position map which maps each real data block to a leaf of the server-side tree. This can be stored recursively by the technique introduced in the Path ORAM paper [32].

Stash: As in the Path ORAM protocol, the client maintains a local Stash, which is a small amount of storage locally at the client. The purpose of the stash is to store overflown data blocks locally.

D. Main idea

The main idea of the protocol is very simple: we read data along a path, try to write data back to the same path (with some modifications and new encryptions) and, if there is insufficient storage, we retain the overflown data elements in the local Stash.

¹¹ It is important to note that the invariant does not say that the position of each data block is uniform over the set of leaves.

¹² The Kronecker delta is defined as

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$$

Fig. 3: Root ORAM server storage: The figure illustrates the server-side storage. Levels 0 to k − 1 form a binary tree and the last level of the tree contains $N = 2^L$ leaves evenly distributed over the binary tree leaves.

Fig. 4: The new position of a data block is in general non-uniform according to this distribution. Note that $p = 1 - 2^{-k}$ and hence, the distribution reduces to uniform if k = L.

Along with this, there is an independent access process of fake accesses¹³. These accesses are made by the user to the server and are indistinguishable from real accesses. Fake accesses are drawn from a Poisson process with a parameter λ. It is important to note that the real accesses made by the client and the fake accesses made by the client machine are exactly the same and hence are indistinguishable from the server's perspective.

¹³ This is similar to the eviction scheme described in [27], [28] with the crucial difference that our fake accesses are completely indistinguishable from real accesses.

E. Details of the protocol

An access is defined as a 3-tuple

$$\mathrm{Access}_i = (\mathrm{data}_i, \mathrm{element}_i, \mathrm{operation}_i)$$

For a real access, given a particular access 3-tuple, the user finds the mapping of the data block needed using the local position map. The user then requests the whole path of that leaf from the server tree. After processing the data and generating new randomized encryptions, the user writes the data back to the tree with the element that was accessed at a new location along the path. But the key idea here is that the element that was accessed has a non-uniform distribution¹⁴ of being mapped to other leaves. It is more likely to be mapped to the same leaf than to others, and the probabilities involved are decided by the security parameter k.

The broader picture of the protocol is as follows. The client system makes real as well as fake accesses to the server. The real access is as described in the previous paragraph. There is a parameter λ which controls the number of fake accesses. One way of implementing the protocol is the following.¹⁵

Access(op, a, data∗):

    while ORAM is under use do
        α ← Poisson(λ)
        for i = 1 : α do
            normal_access(a)
        end for
        fake_access()
    end while
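The loop above can be sketched as an event schedule that interleaves real and fake accesses. This is an illustration under stated assumptions rather than the authors' implementation: the helper names `poisson` and `schedule` are hypothetical, the Poisson draw uses Knuth's multiplication method to stay within the standard library, and the schedule simply stops once the queue of real requests is exhausted (the paper's loop runs for as long as the ORAM is in use).

```python
import math
import random

_DONE = object()  # sentinel marking the end of the request stream

def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def schedule(requests, lam, rng):
    """Yield ("real", a) and ("fake", None) events following the loop above."""
    pending = iter(requests)
    while True:
        alpha = poisson(lam, rng)        # alpha <- Poisson(lambda)
        for _ in range(alpha):
            a = next(pending, _DONE)
            if a is _DONE:
                return
            yield ("real", a)            # normal_access(a)
        yield ("fake", None)             # fake_access()

events = list(schedule(range(20), 4.0, random.Random(0)))
```

On average one fake access is issued per λ real accesses under this structure, so larger λ means fewer fake accesses per real access.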

^14 The distribution becomes uniform if $k = L = \log N$.

^15 It should be noted that the code has been structured in this way for clarity of understanding and hence can be optimized in a number of ways.


normal_access(a): A normal access consists of the following functions in order: read(a), push_down(position[a]), update_mapping(a) and finally write(a).

read(a): The reading phase is the same as in Path ORAM. Using the local client-side mapping, the client finds the leaf to which the data element a is currently mapped, i.e., finds x such that x := position[a]. The client then requests all the data blocks in the buckets along path P(x). The invariant ensures that the client can retrieve a, its data element, from these. This completes the reading phase of the protocol.

read(a):

    x ← position[a]
    for i ∈ {0, 1, ..., k} do
        S ← S ∪ ReadBucket(P(x, i))
    end for

update_mapping(a): After reading a data block, we modify its mapping using the distribution given in Eq. 8, i.e., update_mapping keeps the mapping the same with probability (1 − p) and with the remaining probability changes it to a uniformly random leaf among the remaining leaves.

update_mapping(a):

    x ← position[a]
    if Bernoulli(p) = 0 then
        return x
    else
        return UniformRandom({0, 1, 2, ..., 2^L − 1} \ {x})
    end if
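As a rough illustration of the remapping rule, the following Python sketch samples a new leaf and also returns the two transition probabilities $p_1 = 1 - p$ and $p_2 = p/(N-1)$ exactly. The function names and the use of Python's `random.Random` as the randomness source are assumptions made for this example; a deployment would need a cryptographically secure source.

```python
import random
from fractions import Fraction


def remap_probabilities(num_leaves, k):
    """Return (p1, p2): probability of staying on the same leaf, and of
    moving to any one specific other leaf, with p = 1 - 2^-k."""
    p = 1 - Fraction(1, 2 ** k)
    return 1 - p, p / (num_leaves - 1)


def update_mapping(x, num_leaves, k, rng):
    """Sample the new leaf for a block currently mapped to leaf x."""
    p = 1.0 - 2.0 ** (-k)
    if rng.random() >= p:              # Bernoulli(p) = 0, probability 1 - p
        return x
    z = rng.randrange(num_leaves - 1)  # uniform over the other leaves
    return z if z < x else z + 1       # skip over x itself
```

For $k = L$ both probabilities equal $1/N$, which recovers the uniform remapping of Path ORAM.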

push_down(position[a]): When any path is accessed, this function tries to move data blocks along the path P(position[a]) or in the Stash to lower positions on the same path, if possible.

write(a): Once the mapping is updated, say from x := position[a] initially to z := position[a] after the update, we try to write the data block back into the lowest bucket in the intersection of the two paths, i.e., the lowest bucket in P(x) ∩ P(z) that has an empty/dummy block (with the convention that the lowest bucket is the one with the highest level number, the root being at level 0).

fake_access(): A fake access is issued to push elements back from the stash to the tree. More precisely, a fake access is issued on a non-empty stash. One data block, say a′, is chosen at random from those in the Stash and a normal access is performed on a′, i.e., read(a′) followed by push_down(position[a′]), update_mapping(a′) and write(a′).

VI. THEORETICAL EVALUATION

In this section, we state our main theorems, their proofs and a few interesting special cases.

Theorem 1 (Main theorem): Given a stash size $C$, the Root ORAM with parameters $k$, $Z$ and $\lambda$ is $(\epsilon, \delta)$-differentially private for

$$\epsilon = 2 \log\left(\frac{N-1}{2^k-1}\right) \quad \text{and} \quad \delta = 0.5^{M_k},$$

where $M_k = C + Z(k+1) + 1$.

Proof: The theorem has two parts, the ε bound and the δ bound. We first give a brief insight into the two security parameters ε and δ. The proof is then structured as follows:

The ε bound:
• We set up the differential privacy framework in the ORAM setting.
• Then we set up the probability evaluation model, which takes the real and observed access sequences as inputs and finds the probability of that real sequence leading to that output sequence by the ORAM.
• Finally, we compute the maximum change that one access in the input sequence can have on the probability of the output sequence, over all possible output sequences.
• These together give the ε bound.

The δ bound:
• We begin by showing the need for δ in the security framework.
• We then conservatively evaluate a bound on δ.

ε and δ can be interpreted as follows. Given an ORAM scheme with an unbounded local stash, we show that such a scheme is ε-differentially private. But the moment we introduce a finite stash, this is no longer true, as shown in Sec. VI-D. The privacy loss in such a situation is precisely the quantity that is bounded by δ.

In the context of Path ORAM, δ characterizes the privacy loss if the stash size exceeds its bounds. Another quantity of interest is the probability of stash overflow. Similarly, in Root ORAM, δ quantifies the privacy loss if the stash size is exceeded.

The ε bound

A. Framework set-up

The notation used is specified in Table II. Recall that a fake access in Root ORAM is indistinguishable from a real access. For our theoretical analysis, however, we make a conservative assumption: among the M accesses, some are real and some are fake, and we assume that these can be distinguished, so that the whole sequence of accesses issued by the client machine is treated as the real access sequence. In practice, the security offered by our approach is higher, since the untrusted server storage cannot differentiate fake accesses from real accesses.

More formally, let $f_i$ denote the set of fake accesses and $r_i$ the set of real accesses made by the ORAM. We


| Symbol | Description |
|---|---|
| $k \ge 1$ | Model parameter |
| $p = 1 - 1/2^k$ | Derived model parameter |
| $N = 2^L$ | Number of real data blocks outsourced |
| $M$ | Access pattern size |
| $C$ | Stash size |
| $p_1$ | $(1 - p)$ |
| $p_2$ | $p/(N - 1)$ |
| $M_k$ | $M_k = C + Z(k + 1) + 1$ |

TABLE II: Notation for analysis of Root ORAM

denote by $R_i$ the complete set of accesses made ($r_i$ along with $f_i$). Thus we have that:

$$\max_{\substack{r_1, r_2 \\ |r_1 - r_2| = 1}} \frac{\Pr[\mathrm{ORAM}(r_1) = o]}{\Pr[\mathrm{ORAM}(r_2) = o]} \le \max_{\substack{R_1, R_2 \\ |R_1 - R_2| = 1}} \frac{\Pr[\mathrm{ORAM}(R_1) = o]}{\Pr[\mathrm{ORAM}(R_2) = o]} \tag{9}$$

where $\mathrm{ORAM}(R_i)$ denotes the ORAM protocol output on sequence $R_i$ without any additional fake accesses. But to prove the bounds of differential privacy in the theorem, we need to bound the following term:

$$\max_{\substack{r_1, r_2 \\ |r_1 - r_2| = 1}} \frac{\Pr[\mathrm{ORAM}(r_1) = o]}{\Pr[\mathrm{ORAM}(r_2) = o]} \le e^{\epsilon}$$

Hence, it is sufficient to bound the latter quantity in Eq. 9 by $e^{\epsilon}$.

B. Probability model

Next, we evaluate the ratio of the probabilities by invoking the secondary invariant. Recall that our secondary invariant is: after each access to an element (real or fake), the position map of that element (and none other) changes randomly according to the distribution D given in Eq. 8.

With this invariant, we can compute the probability of a particular real sequence R leading to a particular observed sequence o. For our computation, we write the real sequence (including fake accesses) below the observed sequence and calculate the probabilities according to the following rules:
• The first time a data block is accessed, its location is random. Hence we write a $1/N$ below this access.
• When an element that was accessed before is accessed again, we write a $p_1$ or $p_2$ in the probability calculation, depending on whether the observed locations were the same or different respectively.
• A background check is maintained: if at any time there are more than $(k + 1) \times Z + C$ data blocks mapped to the same location, the probability becomes 0, where k is the model parameter, Z is the bucket size and C is the maximum stash size. Refer to Subsection VI-D for details.
• Finally, we multiply all the written probabilities to get the final probability $\Pr[\mathrm{ORAM}(r_1) = o]$.

This is demonstrated in Table III.

| Observed seq. | a | b | a | c | a | a | b | d |
|---|---|---|---|---|---|---|---|---|
| Real seq. | x | y | x | z | y | y | z | x |
| Probabilities | $1/N$ | $1/N$ | $p_1$ | $1/N$ | $p_2$ | $p_1$ | $p_2$ | $p_2$ |

TABLE III: An example of how one can write probabilities directly given the real and observed access patterns r and o. Different symbols are used for real and observed access patterns merely for the clarity of the demonstration. $p_1$ and $p_2$ are as defined in Eq. 8 or Table II.
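The rules of the probability model can be sketched as a short Python function. This is a minimal sketch rather than the authors' code: the function name is hypothetical, exact rational arithmetic is used for clarity, and the background capacity check (the third rule) is deliberately omitted, so the result is only meaningful for sequences that never overload a location.

```python
from fractions import Fraction


def sequence_probability(real, observed, num_leaves, k):
    """Probability that the real sequence produces the observed leaf sequence.

    real[i] is the element accessed at step i and observed[i] is the leaf
    seen by the server. The capacity check of the model is not implemented.
    """
    p = 1 - Fraction(1, 2 ** k)
    p1, p2 = 1 - p, p / (num_leaves - 1)
    last_leaf = {}                      # element -> leaf observed at its last access
    prob = Fraction(1)
    for element, leaf in zip(real, observed):
        if element not in last_leaf:
            prob *= Fraction(1, num_leaves)   # first access: uniform location
        elif last_leaf[element] == leaf:
            prob *= p1                        # stayed on the same leaf
        else:
            prob *= p2                        # moved to a different leaf
        last_leaf[element] = leaf
    return prob
```

Applied to the sequences of Table III, i.e. `sequence_probability("xyxzyyzx", "abacaabd", N, k)`, the function multiplies three factors of $1/N$, two of $p_1$ and three of $p_2$, matching the last row of the table.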

| Observed seq. | a | b | a | c | a | a | b | d |
|---|---|---|---|---|---|---|---|---|
| Real seq. | x | y | x | z | y | y | z | x |
| Probabilities | $1/N$ | $1/N$ | $p_1$ | $1/N$ | $p_2$ | $p_1$ | $p_2$ | $p_2$ |

TABLE IV: The symbols in blue are the only ones that affect the probability that will be written under the data element shown by an enclosing box. The data elements in red show the previous and next access of the boxed data element.

C. Maximum change

Next, we find the maximum change in the probabilities that can occur as a result of changing one real access.

First we note that in the probability model, different accessed data elements have independent chains of probabilities, i.e., each data element has a probability written independently of the other symbols. Also, the probability written under each data element depends only on its previous location and nothing else (and is governed by the distribution D given by Eq. 8). Hence, if one data access is changed, the probability can change only at the location which was modified and at the next accessed location of each affected data element. With this, we can enumerate all the possible cases that can occur and find the maximum change in probabilities. To do this efficiently, we develop some more notation.

Let the accessed data element be changed from a to b. Let the previous location of access of data element a be $l_{pa}$ (leaf pa) and the next location be $l_{na}$. Similarly, let the previous location of access of b be $l_{pb}$ and the next location be $l_{nb}$. If any of these four do not exist, i.e., the symbol was never accessed before or was never accessed afterwards, we define that leaf to be 0 to simplify the equations. In other words, if data element a was never accessed after the changed access, then $l_{na} = 0$. Let $l$ be the location of the access under consideration, i.e., the location of the data access that differs between $r_1$ and $r_2$. Note that since we have the same observed sequence for both $r_1$ and $r_2$, the location of access $l$ is the same in both sequences. This is shown in Fig. 5.

Now, the probabilities can differ in at most 3 places, viz. $l$, $l_{na}$ and $l_{nb}$. Let $r_1$ be the sequence with symbol a and $r_2$ be the sequence with symbol b. To make the equations crisp, we


Fig. 5: The sequences $r_1$ and $r_2$ differ by one element, shown in the box. The previous accessed location and the next accessed location are as shown. Note that the observed sequence o for both is the same (condition for Differential Privacy). The dots denote irrelevant accesses (accesses for elements different from a and b).

define the following extension to the Kronecker delta function,

$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \\ \dfrac{1/N - p_2}{p_1 - p_2} & \text{if } j = 0 \end{cases}$$

This modification of the Kronecker delta is for the simplicity of the equations. Specifically, the modification ensures that if a symbol is accessed for the first time, then its probability, given by $P_{z,x} = p_2 + (p_1 - p_2)\delta_{zx}$, evaluates to $1/N$ as it should. Now if $\Pr[\mathrm{ORAM}(R_1) = o] > 0$ and $\Pr[\mathrm{ORAM}(R_2) = o] > 0$, so that their ratio is well-defined, we can calculate the ratio of the probabilities as:

$$\frac{\Pr[\mathrm{ORAM}(R_1) = o]}{\Pr[\mathrm{ORAM}(R_2) = o]} = \frac{P_{l,l_{pa}} \cdot P_{l_{na},l} \cdot P_{l_{nb},l_{pb}}}{P_{l_{na},l_{pa}} \cdot P_{l,l_{pb}} \cdot P_{l_{nb},l}}$$

After observing that

$$\frac{1/N}{p_1} \ge \frac{p_2}{p_1}$$

we can see that the maximum value of the ratio of probabilities occurs when $l_{na} = l = l_{pa}$ and $l_{pb} = l_{nb} \neq l$. In this case, the ratio is given by

$$\frac{p_1 \cdot p_1 \cdot p_1}{p_1 \cdot p_2 \cdot p_2} = \left(\frac{p_1}{p_2}\right)^2$$

Evaluating this in terms of our parameters, $p_1 = (1 - p) = 2^{-k}$ and $p_2 = \frac{p}{N-1} = \frac{1 - 2^{-k}}{N-1}$, and plugging this into the differential privacy equation, we get

$$\max_{\substack{r_1, r_2 \\ |r_1 - r_2| = 1}} \frac{\Pr[\mathrm{ORAM}(r_1) = o]}{\Pr[\mathrm{ORAM}(r_2) = o]} \le \max_{\substack{R_1, R_2 \\ |R_1 - R_2| = 1}} \frac{\Pr[\mathrm{ORAM}(R_1) = o]}{\Pr[\mathrm{ORAM}(R_2) = o]} \le \left(\frac{p_1}{p_2}\right)^2 = \left(\frac{N-1}{2^k-1}\right)^2$$

It is important to note that the above equation holds for all observed access sequences o. Hence, Root ORAM guarantees $\epsilon = 2 \log\left(\frac{N-1}{2^k-1}\right)$. This completes the ε bound^16.
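The case analysis behind the maximum ratio can be checked numerically for small trees. The sketch below enumerates every combination of $l$, $l_{pa}$, $l_{na}$, $l_{pb}$, $l_{nb}$ and evaluates the ratio with exact fractions. It rests on two simplifying assumptions that are not in the text: leaves are numbered 1 to N so that 0 can mean "no previous access", and the next accesses $l_{na}$ and $l_{nb}$ are assumed to exist. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def max_ratio(num_leaves, k):
    """Brute-force the largest probability ratio over all location choices."""
    p = 1 - Fraction(1, 2 ** k)
    p1, p2 = 1 - p, p / (num_leaves - 1)
    first_access = Fraction(1, num_leaves)

    def P(z, x):
        # Probability of observing leaf z given previous leaf x (0 = none).
        if x == 0:
            return first_access
        return p1 if z == x else p2

    leaves = range(1, num_leaves + 1)        # locations that exist
    previous = range(0, num_leaves + 1)      # 0 encodes "never accessed before"
    best = Fraction(0)
    for l, lna, lnb in product(leaves, repeat=3):
        for lpa, lpb in product(previous, repeat=2):
            numerator = P(l, lpa) * P(lna, l) * P(lnb, lpb)
            denominator = P(lna, lpa) * P(l, lpb) * P(lnb, l)
            best = max(best, numerator / denominator)
    return best
```

For the small parameter sets checked so far, the enumeration agrees with the closed form $\left(\frac{N-1}{2^k-1}\right)^2$; this is a sanity check on small instances, not a substitute for the proof.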

The δ bound

D. The need for δ

In this subsection, we motivate the need for δ in the theorem. We use the notation from the Path ORAM paper to demonstrate the shortcoming of a bounded stash. We assume that the Stash size is bounded by C and let $M_L$ denote $Z \log N + C + 1$.

We assume that the ORAM has been used to access each element at least once. For demonstration purposes, we construct a minimal working example:

$$\vec{y} = ((r, 1, x), (r, 1, x), \ldots, (r, 1, x)) \quad \text{and} \tag{10}$$

$$\vec{y}\,' = ((r, 1, x), (r, 2, x), \ldots, (r, M_L, x)) \tag{11}$$

where r denotes the read operation and x denotes data which is not important for the demonstration. In words, one access sequence consists of $M_L$ accesses to the same element and the second access sequence consists of $M_L$ different accesses to elements $1, 2, \ldots, M_L$.

Now, of all the possible sequences $\mathrm{ORAM}(\vec{y})$ can produce, we can see that the sequence 1, 1, ..., 1 can be one of them.^17

But it is not hard to see that the same sequence 1, 1, ..., 1 can never occur as $\mathrm{ORAM}(\vec{y}\,')$. The reason is simply that we cannot ever map $M_L$ or more elements to the same path (or else the Path ORAM invariant is broken, i.e., the stash overflows) and hence the $M_L$ accesses to that one location cannot all be for different symbols.

To demonstrate this, we present it as an attack on the Path ORAM protocol. We imagine a hypothetical situation where a program is using the Path ORAM protocol to hide its access pattern. We know that the program has the following traits,

$$\text{Access Pattern} = \begin{cases} 1, 1, 1, \ldots, 1 & \text{if Secret} = 1 \\ 1, 2, 3, \ldots, M & \text{if Secret} = 0 \end{cases}$$

We assume that the program makes a sufficiently large number of accesses. Now, if $\vec{y}$ is the real access pattern, we know that if we ever see a sequence of $M_L$ or more accesses made to the same location in $\mathrm{ORAM}(\vec{y})$, we can immediately infer that Secret = 1.^18

E. δ bound

Back to the Root ORAM protocol, we can see that the probability of an observed sequence can suddenly jump from 0 to a non-zero value after one data access has been changed.

^16 It is interesting to note that as k becomes reasonably large (compared to 1), we can approximate the last expression by $N/2^k$ and get an estimate of ε as $2(L - k)$, where $N = 2^L$ as before.

^17 For that matter, so can any sequence a, a, ..., a for any $a \in \{1, 2, 3, \ldots, N\}$.

^18 The reason for this is that there is another constraint in the system, which is that no leaf can have $M_L$ or more data blocks mapped to it. This is because each path P(x) has room for $Z \log N$ blocks, together with the main invariant that each block is stored somewhere along the path from the mapped leaf to the root or in the Stash. We assume that this is covered in the failure probability of the ORAM because the probability of this occurring is very low.


And this is what is captured by the δ in the (ε, δ)-differential privacy framework for ORAMs.

Let $M_k$ denote the number $(C + Z(k + 1) + 1)$. It is easy to see that there is a sudden jump in the probability from 0 to a non-zero value when the real access is changed at one location, when we look at any such sequence. In particular, we choose the two sequences to be the following:

$$r_1 = (1, 2, 3, \ldots, M_k)$$

$$r_2 = (1, 2, 3, \ldots, M_k - 1, 1)$$

If $\Pr[\mathrm{ORAM}(r_i) = o] > 0$ for $i = 1, 2$, then we have already shown the ε bound and hence δ = 0. So it remains to find the maximum δ when one of these terms is 0. WLOG, $\Pr[\mathrm{ORAM}(r_1) = o] = 0$. Hence δ is the maximum value of $\Pr[\mathrm{ORAM}(r_2) = o]$, i.e., the maximum probability over a neighboring sequence compared to a zero probability over the original sequence. Now, one simple upper bound on δ can be found by noting the following: since the probabilities used for each access are at most $p_1$ (they are either $p_1$ or $p_2$ or $1/N$), we can get a quick upper bound on δ as

$$\delta \le p_1^{M_k} \le 0.5^{M_k} \tag{12}$$

where $M_k = (C + Z(k + 1) + 1)$ and the last inequality follows by inserting the worst-case value of $p_1$, which is $p_1 = 1/2$ when $k = 1$. This completes the δ bound.^19 □
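To get a feel for the two bounds of Theorem 1, the following Python sketch evaluates them for given parameters. The function names are hypothetical. The base of the logarithm is passed in explicitly: this is an assumption made here because the value of roughly 60 quoted for $N = 2^{30}$, $k = 1$ and the footnote estimate $2(L - k)$ both correspond to a base-2 logarithm, whereas the usual $e^{\epsilon}$ form of differential privacy would use the natural logarithm.

```python
import math


def epsilon(num_blocks, k, log=math.log):
    """Epsilon bound 2 * log((N - 1) / (2^k - 1)); `log` selects the base."""
    return 2 * log((num_blocks - 1) / (2 ** k - 1))


def delta_bound(stash_size, bucket_size, k):
    """Conservative delta bound 0.5^(M_k) with M_k = C + Z(k + 1) + 1."""
    m_k = stash_size + bucket_size * (k + 1) + 1
    return 0.5 ** m_k
```

For example, `epsilon(2**30, 1, log=math.log2)` is just under 60, `epsilon(2**20, 20)` is exactly 0 (the $k = L$ case), and `delta_bound(0, 2, 1)` is $0.5^5$ even with no stash at all, which suggests why δ becomes negligible once a realistic stash size C is added.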

F. Bandwidth

Theorem 2: The bandwidth of the Root ORAM protocol with parameters k, Z and λ is $2 \times Z(k + 1) \times (1 + 1/\lambda)$ per real access.

Proof: The number of blocks in any path of the tree is equal to $Z(k + 1)$, and hence twice this number of blocks is transferred per access (one read and one write). Also, by the way the parameter λ is set (i.e., the way the fake accesses are programmed), we perform on average λ real accesses per fake access (the mean of a Poisson distribution with parameter λ is λ). Hence, the bandwidth gets an additional factor of $(1 + 1/\lambda)$ per real access. □
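The bandwidth formula of Theorem 2 is easy to evaluate directly. The helper below is a small sketch with a hypothetical name; λ = ∞ (no fake accesses) is represented by `float("inf")`.

```python
def bandwidth(k, bucket_size, lam):
    """Blocks transferred per real access: 2 * Z * (k + 1) * (1 + 1/lambda)."""
    fake_overhead = 1.0 / lam          # evaluates to 0.0 when lam is infinite
    return 2 * bucket_size * (k + 1) * (1 + fake_overhead)
```

With $k = 1$, $Z = 2$ and $\lambda = 4$ this gives 10 blocks per real access, independent of N. With $k = L = 20$, $Z = 2$ and $\lambda = 1$ it gives $8(L + 1) = 168$ blocks, in line with the $\sim 8 \log N$ figure for the $k = L$ case.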

G. Special Cases of Root ORAM

Case 1: k = 1. This is a particularly interesting case. The bandwidth used is an extremely low constant. With realistic estimates of parameters, the security guaranteed by this scheme is about $\epsilon = 2 \log N$, which is roughly 60, whereas the bandwidth used is a mere 10 blocks per real access and is independent of N (using Z = 2 and λ = 4).

Case 2 : Path ORAM

Theorem 3: The Path ORAM protocol is an instantiation of Root ORAM with the following values of parameters: $k = L = \log N$, $Z = 5$ and $\lambda = \infty$.

Proof: This follows directly by noting that the distribution D, which is the distribution of the new location of any data

^19 A strictly better bound can be obtained by actually evaluating $\Pr[\mathrm{ORAM}(r_1) = o]$.

Fig. 6: This figure shows the security-bandwidth trade-off for $N = 2^{30}$. We would like to highlight the low-bandwidth design space that has opened up with the introduction of Root ORAM. The low-bandwidth regime shows an order of magnitude improvement over the state of the art. A red cross is marked to reference the Path ORAM protocol in this space and a blue circle is used to reference the Ring ORAM protocol.

block, reduces to a uniform distribution when k = L and that λ = ∞ corresponds to no fake accesses. This is precisely the Path ORAM protocol. □

Case 3: k = L. We can see that k = L corresponds to a complete binary tree. We get ε = 0, the equivalent of perfect security in the differential privacy formulation. Given λ = 1 and Z = 2, the protocol has a very low stash size (refer to Sec. VII for technical details). The average bandwidth used is $\sim 8 \log N$ (compared to $\sim 10 \log N$ for Path ORAM). This is because Root ORAM uses a combination of fake accesses (λ) and a smaller bucket size (Z).

H. Security-Bandwidth trade-off

Fig. 6 shows the security-bandwidth trade-off, the central result of this paper. It shows the possibility of having lower bandwidth at the cost of a rigorously quantified privacy loss, for various values of Z and λ. We plot the figure on a log scale to show the order of magnitude improvement in bandwidth over existing protocols. The lowest-bandwidth protocol uses a mere 10 blocks of bandwidth per access (for λ = 4) and gives a security of ε = 60. We would also like to highlight that the bandwidth in the low-bandwidth regime is independent of N. With realistic values of parameters, δ is negligibly small and hence we do not show plots of δ.

VII. SYSTEMS EVALUATION

We have already established the security-bandwidth trade-off using theoretical analysis. Next, we would like to investigate the hidden component of the system, viz. the stash size used. To recapitulate, ε gives a bound on the protocol with an unbounded stash. If we introduce a stash size constraint, then δ characterizes the privacy loss if the stash overflows.


| Symbol | Description |
|---|---|
| $L$ | From 10 to 21 |
| $k$ | Runs from 1 to $L$ |
| $Z$ | $Z \in \{2, 3, 4, 5\}$ |
| $\lambda$ | $\lambda \in \{0.25, 0.5, 0.75, 1, 2, \infty\}$ |

TABLE V: Simulation limits

But we need to show the relation between stash size and the overflow probability, viz. a small stash is much more likely to overflow whereas a large stash is less likely to overflow. We resort to simulations to demonstrate how large the stash should be to have a low probability of overflow.

We simulate Root ORAM for various values of the parameters to understand the impact of design parameters on the stash size. Specifically, we varied L from 10 to 21, k from 1 to L, Z from 2 to 5, and used six different values of λ. This has been tabulated in Table V. We define the outsourcing ratio as the ratio of client storage to the total data outsourced.

We begin by giving the details of our implementations.

A. Details of the implementation

We implemented the complete functionality of Root ORAM in C++. We plan to make our implementation publicly available as open source software. We performed all experiments on a 1.4 GHz Intel processor. The Amazon EC2 experiments were performed using a TCP connection for reliable data downloads.

We use random access patterns for the simulations, and the maximum stash size is calculated excluding the transient storage for one path. Unlike current work, we independently study the effect of increasing the number of accesses (M) on the maximum stash size. The rationale is that in current ORAM evaluations, a fixed number of accesses allows us to absolutely bound the stash size. But in any probabilistic ORAM, the stash size is probabilistic and, given a sufficiently large number of accesses, will exceed the given bounds. Hence, we independently present results about the dependence of the maximum stash size on M, the number of accesses made to the ORAM. Next we briefly describe the aims of our evaluation before showing its results.

Security and Bandwidth trade-off curve: First we would like to explicitly show the trade-off curve of security vs. bandwidth. We would like to highlight the two-dimensional nature of the graph, which demonstrates an explicit design space based on the system requirements. Similarly, we remark on the low-bandwidth regime, which has a constant bandwidth independent of N.

Stash usage: Next we explore the effect of N on the maximum stash used at the client side. We aim to investigate the outsourcing ratio for large values of N and demonstrate the low stash usage of our protocol.

Fig. 7: This figure illustrates the maximum stash usage as a function of N. Different lines correspond to different values of λ. To put this in perspective, the maximum stash usage for 10 GB of outsourced data is roughly 40 MB (using a 4 KB block size).

Number of accesses: It is also important to show the dependence of the maximum stash usage on M, the number of accesses made by the ORAM. We expect the growth of the maximum stash used to be extremely slow with M, which will make this a feasible architecture. Since this is not a critical aspect of the paper, the content has been deferred to Appendix A.

k-dependence of stash size: We would then like to show the maximum stash usage as a function of the parameters of the model, viz. k, λ and Z, which gives a holistic view of the effect of the various design parameters. We aim to investigate the different regimes of these graphs, such as the low-bandwidth regime and the high-bandwidth regime.

Latency study via Amazon EC2: Finally, we study the latency incurred for Root ORAM memory accesses using an Amazon EC2 server. Details of the client machine are not disclosed to maintain the anonymity of the authors. We study the latency as a function of k for three different block sizes, viz. 1 KB, 4 KB and 16 KB.

B. Max stash size vs N

In light of the recent paper by Bindschaedler et al. [2], we base our experimental evaluation on giving due importance to the constants involved in the relevant equations. Fig. 7 shows the dependence of the maximum stash used on N, the number of outsourced blocks. Different lines correspond to different values of λ. As can be seen, the effect of λ goes down for relatively large values of N. We believe the reason for this is that during a probabilistic run of the protocol, certain paths of the tree have more real data elements, causing crowding, and it becomes less likely for fake accesses to alleviate that for large values of N.

As can be seen in Fig. 7, the worst-case stash size for standard 4 KB blocks for 10 GB of data outsourced is 40


Fig. 8: Panels (a) Z = 2, (b) Z = 3, (c) Z = 4, (d) Z = 5. These figures show the dependence of the maximum stash used on the fake access parameter (λ), for $N = 2^{20}$; the corresponding Z values are given below each panel. Towards the low-bandwidth regime, outsourcing ratios of 1/1000 can be achieved (for Z = 5). Another interesting feature of these graphs is in their blue regions, where k is close to $L = \log N$. In these regimes, the stash size is extremely low.

MB. The growth in Fig. 7 suggests that this outsourcing ratio will be smaller for larger values of N.

C. k-dependence of stash size

Fig. 8 and Fig. 9 show the dependence of the maximum stash used as a function of k, Z and λ. All these plots are for $N = 2^{20}$.

When low bandwidth is used, the stash usage is relatively high, though with modest values of the parameters, the outsourcing ratio can be reduced to acceptable values. For small values of Z, we have an outsourcing ratio of about 1/30, whereas for larger values, the ratio is almost 1/1000. This enables a smartphone client with only 1 GB of local storage to outsource 1 TB of data to the untrusted cloud server. Similarly, as can be seen from Fig. 7, the growth of the maximum stash size is considerably flat, implying that the outsourcing ratio gets much better with larger amounts of outsourced data.
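The outsourcing ratios quoted here are simple arithmetic on storage sizes, as the following sketch shows. The helper name is hypothetical and binary units (1 KB = 1024 bytes) are an assumption; with decimal units the numbers shift slightly.

```python
from fractions import Fraction

KB = 1024          # binary units are assumed throughout this sketch
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB


def outsourcing_ratio(client_bytes, outsourced_bytes):
    """Ratio of client-side storage to the total data outsourced."""
    return Fraction(client_bytes, outsourced_bytes)
```

Under these assumptions, a 40 MB stash for 10 GB of outsourced data is a ratio of 1/256 (10240 blocks of 4 KB held locally), and 1 GB of client storage for 1 TB outsourced is a ratio of 1/1024, close to the 1/1000 figure.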

Another interesting feature of Root ORAM is the extremely low stash size used by the models with near-perfect security, i.e., models with k between roughly 13 and 20. Though Root ORAM does not have theoretical bounds on the stash usage as Path ORAM does, it can be seen clearly from these figures how these aspects tie together, as Path ORAM is one particular instantiation of Root ORAM in the ε = 0 security regime and, as can be seen, the stash size is extremely low.

D. Real-world implementation

Next, we evaluate our real-world implementation of Root ORAM using an Amazon EC2 server. We aim to compute the latency overhead of a memory access as a function of Root ORAM parameters. We anonymize the geographic locations of the servers to protect author identities. Fig. 10a depicts latency as a function of k for Z = 2, λ = 4, while Fig. 10b depicts the


Fig. 9: Panels (a) λ = 0.25, (b) λ = 1. These figures show the dependence of the maximum stash used on the bucket size Z, for $N = 2^{20}$. The small difference between Fig. 9a and Fig. 9b is because at high values of N, there is little difference between λ = 1 and λ = 0.25 in terms of stash size usage. It is only noticeable at the lower end, i.e., around Z = 2.

Fig. 10: Panels (a), (b). Real-world implementations over Amazon EC2. These figures show the latency as a function of k and of the application bandwidth, for $N = 2^{20}$ and three block sizes, viz. 1 KB, 4 KB and 16 KB. Fig. 10a shows the latency vs. k, whereas Fig. 10b shows the latency as a function of the constrained application bandwidth. It is worth noting the significant difference between the latencies for different values of k when application bandwidth is constrained.

latency as a function of the client bandwidth for k = 1, Z = 2, λ = 4. We used the trickle application to constrain the bandwidth at our client machine to the desired values. We can see that Root ORAM enables an application designer to achieve desired trade-offs between system performance and security. It is important to note that the difference between latencies for low and high values of k is significant in the regime of low client bandwidth. We provide the full data set in Appendix B.

E. Summary

To summarize, we have shown the practicality of Root ORAM through theoretical analysis, simulations and real-world experiments. Theoretically, we have shown the bandwidth and security for different values of the protocol parameters. Experimentally, we have shown the dependence of the stash size and memory access latency on Root ORAM parameters, demonstrating the possibility of multi-dimensional trade-offs and orders of magnitude performance improvements.

VIII. RELATED WORK

Since the formalization of the concept of an Oblivious RAM in a seminal paper by Goldreich and Ostrovsky [14], the research community has made substantial progress in making ORAM practical by improving its performance [4], [12], [25], [26], [30]–[32]. Recent work has also shown the promise of using ORAMs as a critical component in developing protocols for Secure Multi-Party Computation [12].


A recent benchmark for ORAMs has been the Path ORAM protocol [32]. It builds upon previous hierarchical constructions such as [31] and gives rigorous bounds on stash usage. Root ORAM generalizes the construction of Path ORAM to provide a tunable framework offering differentially private guarantees. Root ORAM also reintroduces the eviction scheme of dummy/fake accesses. This was first looked into by Shi et al. [28] and formalized in Ren et al. [27]. The latter also highlights the potential pitfalls in proposing eviction schemes that are not provably secure, while demonstrating the security consequences of one such scheme. Root ORAM uses dummy accesses which are indistinguishable from real ORAM accesses and gives rigorous bounds on the security.

Another novel concept that was recently introduced in the ORAM domain is the XOR technique to reduce online bandwidth. Online bandwidth, first formalized by Boneh et al. in [3], was reduced to O(1) using the XOR technique by Dautrich et al. [4]. The XOR technique can be extended to Root ORAM as well, which will further influence the trade-offs in the design space.

Two optimizations for [28] were provided by Gentry et al. [12]. Concretely, they show the benefits of using a tree structure in which each node has multiple children instead of 2 as in the case of a binary tree. This idea is similar in spirit to that of Root ORAM, though the two differ considerably in terms of their working as well as the eviction scheme used. The higher-degree tree in that paper is a complete higher-degree tree (i.e., the degree of each node is the same), whereas in Root ORAM the tree is binary until the last level and only the last-level nodes have a higher degree. This leads to very different dynamics in the two schemes. Similarly, Root ORAM uses fake accesses as its eviction process, which is different from that in [12].

ORAM has been implemented and shown to be feasible at the chip level in prototypes such as the Ascend architecture [10] and the Phantom architecture [21]. But unlike the case of chip-level implementations, where trusted local cache is expensive, most other applications have more client space. In fact, in today's settings, it is feasible to have client storage of the order of 1 GB for outsourced data of about 1 TB [2].

In short, Root ORAM is the only protocol with a tunable security-bandwidth construction. Similarly, none of the previous work deals with statistical privacy in the context of ORAMs. The notion and formalism of a differentially private ORAM is a novel and important contribution of our work.

IX. LIMITATIONS AND FUTURE WORK

To enable the design of stringently bandwidth-constrained applications, a security-bandwidth trade-off is desirable, and Root ORAM takes the first step in this direction by introducing a tunable framework that provides rigorous differential privacy guarantees. This opens up a number of research ideas which remain unexplored in the current work.

First, we would like to explore the integration of techniques that leverage server-side computation in the Root ORAM architecture, such as the XOR technique [3], [4]. Such an approach can reduce bandwidth at the cost of server-side computation and can influence the design space of differentially private ORAMs. Second, we would like to explore the effect of varying the ratio of the number of blocks outsourced to the size of the server-side storage. Gentry et al. [12] have explored similar techniques in the case of Path ORAM, and it would be interesting to combine these techniques with Root ORAM.

Our experimental results have demonstrated the required stash size for various parameters of Root ORAM. However, we acknowledge that Root ORAM lacks rigorous theoretical guarantees on the stash usage. In future work, it would be interesting to rigorously bound the stash size required by Root ORAM.

X. CONCLUSIONS

To summarize, we present Root ORAM, a tunable family of ORAM protocols which trade off bandwidth (performance) against security. We introduce and formalize the notion of a differentially private ORAM, which to our knowledge is the first of its kind.

We evaluate the protocol using theoretical analysis, simulations, and a real-world implementation on Amazon EC2. We theoretically prove that Root ORAM provides the rigorous privacy guarantee of differential privacy. We experimentally demonstrate that the stash size used by Root ORAM is bounded for realistic parameters of the scheme. Overall, Root ORAM can serve as an enabler for real-world deployment of oblivious RAM by providing novel design points that provide an order of magnitude performance improvement over the current state of the art.


REFERENCES

[1] Mihir Bellare, Shafi Goldwasser, Carsten Lund, and Alexander Russell. Efficient probabilistically checkable proofs and applications to approximations. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing, pages 294–304. ACM, 1993.

[2] Vincent Bindschaedler, Muhammad Naveed, Xiaorui Pan, XiaoFeng Wang, and Yan Huang. Practicing oblivious access on cloud storage: The gap, the fallacy, and the new way forward. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 837–849, New York, NY, USA, 2015. ACM.

[3] Dan Boneh, David Mazieres, and Raluca Ada Popa. Remote oblivious storage: Making oblivious RAM practical. 2011.

[4] Jonathan Dautrich, Emil Stefanov, and Elaine Shi. Burst ORAM: Minimizing ORAM response times for bursty access patterns. In USENIX Security, 2012.

[5] Jonathan L Dautrich Jr and Chinya V Ravishankar. Compromising privacy in precise query protocols. In Proceedings of the 16th International Conference on Extending Database Technology, pages 155–166. ACM, 2013.

[6] Cynthia Dwork. Differential privacy. In Automata, languages and programming, pages 1–12. Springer, 2006.

[7] Cynthia Dwork. Differential privacy: A survey of results. In Theory and applications of models of computation, pages 1–19. Springer, 2008.

[8] Cynthia Dwork. The differential privacy frontier. In Theory of cryptography, pages 496–502. Springer, 2009.

[9] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology-EUROCRYPT 2006, pages 486–503. Springer, 2006.

[10] Christopher W Fletcher, Marten van Dijk, and Srinivas Devadas. A secure processor architecture for encrypted computation on untrusted programs. In Proceedings of the seventh ACM workshop on Scalable trusted computing, pages 3–8. ACM, 2012.

[11] Srivatsava Ranjit Ganta, Shiva Prasad Kasiviswanathan, and Adam Smith. Composition attacks and auxiliary information in data privacy. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 265–273. ACM, 2008.

[12] Craig Gentry, Kenny A Goldman, Shai Halevi, Charanjit Jutla, Mariana Raykova, and Daniel Wichs. Optimizing ORAM and using it efficiently for secure computation. In Privacy Enhancing Technologies, pages 1–18. Springer, 2013.

[13] O. Goldreich. Towards a theory of software protection and simulation by oblivious RAMs. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing, STOC ’87, pages 182–194, New York, NY, USA, 1987. ACM.

[14] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. Journal of the ACM (JACM), 43(3):431–473, 1996.

[15] Jean-Pierre Hubaux, Srdjan Capkun, and Jun Luo. The security and privacy of smart vehicles. IEEE Security & Privacy, (3):49–55, 2004.

[16] MS Islam, Mehmet Kuzu, and Murat Kantarcioglu. Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. In Proc. NDSS, volume 14, 2014.

[17] Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193–204. ACM, 2011.

[18] Boris Kopf and David Basin. An information-theoretic model for adaptive side-channel attacks. In Proceedings of the 14th ACM conference on Computer and communications security, pages 286–296. ACM, 2007.

[19] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, pages 106–115. IEEE, 2007.

[20] Yingbin Liang, H Vincent Poor, et al. Information theoretic security. Foundations and Trends in Communications and Information Theory, 5(4–5):355–580, 2009.

[21] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. Phantom: Practical oblivious computation in a secure processor. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 311–324. ACM, 2013.

[22] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):3, 2007.

[23] Prateek Mittal and Nikita Borisov. Information leaks in structured peer-to-peer anonymous communication systems. ACM Transactions on Information and System Security (TISSEC), 15(1):5, 2012.

[24] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84. ACM, 2007.

[25] Ling Ren, Christopher W Fletcher, Albert Kwon, Emil Stefanov, Elaine Shi, Marten van Dijk, and Srinivas Devadas. Constants count: Practical improvements to oblivious RAM. In 24th USENIX Security Symposium (USENIX Security 15). USENIX Association.

[26] Ling Ren, Christopher W Fletcher, Albert Kwon, Emil Stefanov, Elaine Shi, Marten van Dijk, and Srinivas Devadas. Ring ORAM: Closing the gap between small and large client storage oblivious RAM. Technical report, Cryptology ePrint Archive, Report 2014/997, 2014. http://eprint.iacr.org.

[27] Ling Ren, Xiangyao Yu, Christopher W Fletcher, Marten Van Dijk, and Srinivas Devadas. Design space exploration and optimization of path oblivious RAM in secure processors. In ACM SIGARCH Computer Architecture News, volume 41, pages 571–582. ACM, 2013.

[28] Elaine Shi, T-H Hubert Chan, Emil Stefanov, and Mingfei Li. Oblivious RAM with O((log N)^3) worst-case cost. In Advances in Cryptology–ASIACRYPT 2011, pages 197–214. Springer, 2011.

[29] Vitaly Shmatikov and Ming-Hsiu Wang. Measuring relationship anonymity in mix networks. In Proceedings of the 5th ACM workshop on Privacy in electronic society, pages 59–62. ACM, 2006.

[30] Emil Stefanov and Elaine Shi. Oblivistore: High performance oblivious cloud storage. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 253–267. IEEE, 2013.

[31] Emil Stefanov, Elaine Shi, and Dawn Song. Towards practical oblivious RAM. arXiv preprint arXiv:1106.3652, 2011.

[32] Emil Stefanov, Marten Van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security, pages 299–310. ACM, 2013.

[33] Latanya Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570, 2002.


APPENDIX

Previous venues submitted: This paper has not been submitted previously to any venue and is being sent for publication for the first time.

APPENDIX A

A. Max stash size vs M

We examine the dependence of the max stash size on M. This max stash size is computed as the maximum over all values of k for a given value of N. Fig. 11 shows the results of the simulations as M varies from N to 10^5 × N. The results show that the stash size does grow with M, which can be expected since the model is probabilistic (footnote 20). The key conclusion, however, is that the maximum stash usage grows extremely slowly as a function of M. From the simulations in Fig. 11, the growth appears to be logarithmic or sub-logarithmic.

Fig. 11: Dependence of the maximum stash usage on the number of accesses made by the ORAM. N = 2^10 and λ = 1 were used in these simulations, and the x-axis plots the number of real accesses (M). To put this in context, we start with M ∼ N accesses and go all the way to M ∼ 10^5 N. Although the stash size shows an increasing trend with M, as it should for a probabilistic system, its growth with M is extremely slow (logarithmic or sub-logarithmic) (footnote 21).
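The argument in footnote 21 about running an ORAM for xN accesses can be illustrated numerically. The following is a minimal sketch, assuming that each batch of N accesses fails independently with the same probability p_f; this independence is an idealization, and the value of p_f used below is hypothetical rather than taken from our experiments. The function names are ours and chosen only for illustration.

```python
import math


def success_probability(p_f: float, x: int) -> float:
    """Probability that x independent batches of N accesses all avoid stash overflow.

    Assumes every batch fails independently with the same probability p_f.
    """
    if not 0.0 <= p_f <= 1.0:
        raise ValueError("p_f must be a probability in [0, 1]")
    if x < 0:
        raise ValueError("x must be non-negative")
    return (1.0 - p_f) ** x


def exponential_approximation(p_f: float, x: int) -> float:
    """Approximation exp(-x * p_f) of (1 - p_f)^x, accurate when p_f is small."""
    return math.exp(-x * p_f)


if __name__ == "__main__":
    p_f = 1e-6  # hypothetical per-batch failure probability
    for x in (1, 10**3, 10**5, 10**7):
        print(x, success_probability(p_f, x), exponential_approximation(p_f, x))
```

Under these assumptions the success probability decays roughly as exp(−x · p_f), so a stash of any fixed size eventually overflows as x grows, which is consistent with the slow but steady increase of the maximum stash usage with M.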

Footnote 20: Hence, given a sufficient number of accesses, a stash of any size will overflow.

Footnote 21: Path ORAM guarantees a stash size of O(log N) because its main theorem gives that O(log N) bound for M = N accesses. If we use Path ORAM for xN accesses, the probability of success goes down roughly exponentially in x, as ∼ (1 − p_f)^x, where p_f is the failure probability of the ORAM. This can be worked around by not having independent simulations, but it is still worth mentioning.

APPENDIX B

Due to high fluctuations in the bandwidth, the data points generated by real-world implementations over Amazon EC2 servers have high variation. This is most pronounced for the 16 KB block size around k = 20, and hence we provide a full box plot of all the data points in Fig. 12.

Fig. 12: Real-world implementation over EC2. This figure shows the complete box plot of the latency data for 16 KB block size.
