
Proceedings on Privacy Enhancing Technologies; 2017 (3):175–193

    Anrin Chakraborti*, Chen Chen, and Radu Sion

DataLair: Efficient Block Storage with Plausible Deniability against Multi-Snapshot Adversaries

Abstract: Sensitive information is present on our phones, disks, watches and computers. Its protection is essential. Plausible deniability of stored data allows individuals to deny that their device contains a piece of sensitive information. This constitutes a key tool in the fight against oppressive governments and censorship. Unfortunately, existing solutions, such as the now-defunct TrueCrypt [5], can defend only against an adversary that can access a user's device at most once ("single-snapshot adversary"). Recent solutions have traded significant performance overheads for the ability to handle more powerful adversaries able to access the device at multiple points in time ("multi-snapshot adversary"). In this paper we show that this sacrifice is not necessary. We introduce and build DataLair^1, a practical plausible deniability mechanism. When compared with existing approaches, DataLair is two orders of magnitude faster for public data accesses, and 5 times faster for hidden data accesses. An important component in DataLair is a new write-only ORAM construction which improves on the complexity of the state-of-the-art write-only ORAM by a factor of O(log N), where N denotes the underlying storage disk size.

    Keywords: Plausible deniability, oblivious access

DOI 10.1515/popets-2017-0034
Received 2016-11-30; revised 2017-03-15; accepted 2017-03-16.

1 Introduction

With increasing amounts of sensitive data being stored on portable storage devices, disk encryption has become a necessity. Although full disk encryption (FDE) tools (such as dm-crypt) provide protection against unauthorized adversaries attempting to access sensitive data at rest, it does not allow the owner to deny possession of sensitive data. This is a serious challenge in the presence of oppressive regimes and other powerful adversaries that may want to coerce the user into revealing encryption keys. Unfortunately, this is an all-too-common occurrence as illustrated by numerous examples [24, 25], where sensitive data in the possession of human rights activists has been subject to government scrutiny in oppressive regimes, thus endangering the witnesses.

1 A preliminary version of this paper was presented as a poster [26] with an overview of the technical solution described here.

*Corresponding Author: Anrin Chakraborti: Stony Brook University, E-mail: [email protected]
Chen Chen: Stony Brook University, E-mail: [email protected]
Radu Sion: Stony Brook University, E-mail: [email protected]

Plausible deniability (PD) provides a strong defense against such coercion. A system with PD allows its users to deny the existence of stored sensitive information or the occurrence of a sensitive transaction [15].

An example of a plausibly deniable storage solution is the successful, yet unfortunately now-defunct TrueCrypt [5]. TrueCrypt divides a disk into multiple "password-protected" volumes and allows some of these volumes to be "hidden" in order to store sensitive data. Password-derived encryption keys are used to encrypt each such volume. Upon coercion, a user can plausibly deny the existence of a hidden volume by simply providing a valid password for one of the non-hidden ones, thus showing a plausible use for the disk without revealing the hidden volume data. TrueCrypt stores hidden volumes in the free space of non-hidden (public) volumes. To mask their existence, TrueCrypt fills all free space with random data and encrypts the hidden data with a randomized semantically secure encryption scheme with output indistinguishable from random.

However, as pointed out by Czeskis [9], TrueCrypt is not secure against an adversary that can access the disk at multiple points in time (e.g., multiple security checks or border crossings). In such scenarios, an adversary can save a disk snapshot and compare subsequent snapshots with it. Changes to free space occurring between snapshots will suggest the existence of hidden data.

A major reason why TrueCrypt fails to provide PD against an adversary with multiple snapshots is that it does not attempt to hide access patterns. The adversary can point out exact locations on disk that have changed in between snapshots and notice that the apparently free portion of the disk (potentially containing hidden data) appears altered.


To defeat a multi-snapshot adversary, we need to eliminate all evidence of hidden data and its corresponding accesses – for example by ensuring that all modifications on the disk are attributable to, and indistinguishable from, the traces of public data operations.

This means that modifications to apparently free space should be part of the normal behavior of plausible public operations, and the traces of hidden data accesses should be indistinguishable from the traces of public data accesses.

One effective way to achieve this is to "cloak" hidden accesses within public data accesses by always performing a public operation for every hidden operation. Further, oblivious access mechanisms (ORAMs) can be used for randomizing accesses and making them indistinguishable [7]. Unfortunately, ORAMs come with very high overheads and reduce overall throughput by orders of magnitude.

Fortunately, a new insight emerges that enables a significant throughput increase: accesses to public data do not need to be hidden since the existence of public data is admitted. In fact, revealing access patterns to public data reinforces deniability as it shows non-hidden disk use to a curious adversary.

Consequently, DataLair uses this insight to design a significantly more efficient way to achieve strong PD: protecting only operations on the hidden data, while ensuring that they are indistinguishable from operations on public data (thus allowing the user to claim that all I/O to the disk is due to public accesses). Further, DataLair also optimizes the oblivious access mechanism deployed for hidden data.

In summary, public data is accessed (almost) directly without the need to use an oblivious access mechanism, while hidden data accesses are mediated through a new throughput-optimized write-only ORAM which significantly reduces access complexity when compared to existing work [7, 12]. As a result, DataLair is two orders of magnitude faster for public data accesses, and 5 times faster for hidden data accesses, when compared to existing work.

2 Related Work

Plausible deniability (PD) was first proposed in relation to deniable encryption [8]. Deniable encryption uses cryptographic techniques to allow decrypting the same ciphertext to different plaintexts.

Filesystem Level PD. For storage devices, Anderson et al. first explored the idea of steganographic filesystems and proposed two solutions for hiding data in [6]. The first solution is to use a set of cover files and their linear combinations to reconstruct hidden files. The ability to correctly compute the linear combination required to reconstruct a file was based on the knowledge of a user-defined password. The second solution was to use a hash-based scheme for storing files at locations determined by the hash of the filename. This requires storing multiple copies of the same file at different locations to prevent data loss. McDonald and Kuhn [14] designed and implemented an optimized steganographic filesystem for Linux that is derived from the second solution proposed in [6]. Pang et al. [17] further improved on the previous constructions by avoiding hash collisions and providing more efficient storage.

The solutions based on steganographic filesystems only defend against a single-snapshot adversary. Han et al. [11] designed a steganographic filesystem that allows multiple users to share the same hidden file. Further, runtime relocation of data ensures deniability against an adversary with multiple snapshots. However, the solution does not scale well to practical scenarios as deniability is attributed to joint ownership of sensitive data. Defy [18] is a log-structured file system for flash devices that offers PD using secure deletion. Although Defy protects against a multi-snapshot adversary, it does so by storing all filesystem-related metadata in memory, which does not scale well for memory-constrained systems with large external storage devices.

Block device level PD. At device level, disk encryption tools such as TrueCrypt [5] and Rubberhose [4] provide deniability but cannot protect against a multi-snapshot adversary. Mobiflage [20] also provides PD for mobile devices against a single-snapshot adversary. Blass et al. [7] were the first to deal with deniability against a multi-snapshot adversary at device level. The solution in [7] deploys a write-only ORAM for mapping data from logical volumes to an underlying storage device and hiding access patterns for hidden data within reads to non-hidden (public) data.

3 Model

We focus on storage-centric plausible deniability (PD) as in the results discussed above, but we note that PD has also been defined in more general frameworks [3].


Plausible Deniability in real life. It is important to understand, however, that the mere use of a system with PD capability may raise suspicion! This is particularly the case if PD-enabled systems have high overheads or are outright impractical when accessed for public data storage. This is why it is important to design mechanisms that are practical and do not impede the use of the device, especially for storing non-sensitive data. We envision a future where all block device logic is endowed with a PD mode of operation.

    3.1 Preliminaries

Adversary. We consider a computationally bounded "multi-snapshot" adversary that has the power to coerce the user into providing a password. As in existing research [7], we also assume that the device user is not directly observed by the adversary during writes – a small amount of volatile memory is used during reads and writes and is inaccessible to the adversary, who can only see static device snapshots.

Configuration. While we note that there may be a number of other ways to achieve PD, our focus is on a practical solution involving storage devices with multiple logical volumes, independently encrypted with user password-derived keys.

To protect her sensitive data from adversaries, a user may write it encrypted to one of these logical volumes (the "hidden" volume). For PD, the user may choose to also write non-sensitive data to a "public" volume, encrypted with a key which can be provided to an adversary as proof of plausible device use.

Logical and physical blocks. For a block device hosting multiple logical (hidden or public) volumes, clients address data within each volume using logical block addresses. The data is stored on the underlying device using physical block addresses.

Access patterns. We define an access pattern informally as an ordered sequence of logical block reads and writes.

Write traces. We define a write trace as the actual modifications to physical blocks due to the execution of a corresponding access pattern.

Solution Space. While there might be multiple ways to achieve PD in a multi-volume multi-snapshot adversary setting, one idea [7] is to "hide" operations to the hidden volume "within" operations to the public volume. This prevents the adversary from gathering any information regarding user access patterns to the hidden volume (hidden data access) by ensuring that the user can plausibly attribute any and all write traces as accesses to the public volume (public data access) instead. On coercion, the user can provide the credentials for the public volume and thus plausibly deny the existence of the hidden data. Arguably, otherwise, in the absence of public volume operations, an adversary could question the reason for any observed changes to the space allocated for the hidden volume and then rightfully suspect the existence of a hidden volume.

Atomicity. As in existing work, a very small number of physical block I/O ops (corresponding to one DataLair read/write operation) are assumed to be performed as atomic transactions. The adversary may gain access only after entire transactions have completed (or rolled back). This is reasonable since the user is unlikely to perform any sensitive operation in the presence of an adversary.

Access pattern indistinguishability. Computational indistinguishability of traces has been widely discussed in the ORAM literature [13, 21]. Further, most ORAMs employ randomization techniques which ensure that the write traces generated due to accesses are indistinguishable from random. As will be detailed later, indistinguishability of access patterns is one of the main requirements to achieve PD.

We also define the link between access patterns and write traces.

Definition 1. Given two access patterns O0 = {a1, a2, ..., ai} and O1 = {b1, b2, ..., bi} and their corresponding write traces W0 = {w1, w2, ..., wi} and W1 = {y1, y2, ..., yi}, O0 is called indistinguishable from O1 iff. W0 is computationally indistinguishable from W1.

3.2 Defining Plausible Deniability (PD-CPA)

We model PD as a "chosen pattern" security game, PD-CPA^2, for the block device. Since we focus on mechanisms that "hide" operations to the hidden volume "within" operations to the public volume, we also define an implementation-specific parameter establishing the number of operations that can be hidden within a public operation. Specifically, φ is the ratio of the number of hidden operations that can be performed with a public operation such that the write traces due to the public operation with a hidden operation are indistinguishable from the write traces for the same operation without the hidden operations. This paper proposes a solution with φ = 1 to ensure that each public operation performs one hidden operation.

2 The similarity with IND-CPA is intentional. The access patterns correspond to the "plaintexts" in an access pattern privacy setting.

We define PD-CPA(λ, φ), with security parameter λ, between a challenger and an adversary as follows:

1. Adversary A provides a storage device D (the adversary can decide its state fully) to challenger C.

2. C chooses two encryption keys Kpub and Khid using security parameter λ and creates two logical volumes, Vpub and Vhid, both stored in D. Writes to Vpub and Vhid are encrypted with keys Kpub and Khid respectively. C also fairly selects a random bit b.

3. C returns Kpub to A.

4. The adversary A and the challenger C then engage in a polynomial number of rounds in which:
   (a) A selects two access patterns O0 and O1 with the following restrictions:
       – O1 and O0 include the same writes to Vpub
       – Both O1 and O0 may include writes to Vhid
       – O0 or O1 should not include more writes to Vhid than φ times the number of operations to Vpub in that sequence.
   (b) C executes Ob on D and sends a snapshot of the device to A.
   (c) A outputs b′.
   (d) A is said to have "won" the round iff. b′ = b.
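To make the round structure concrete, it can be sketched in Python. This is a toy illustration, not part of DataLair: the device writer, the snapshot format, and the specific access patterns are hypothetical simplifications for φ = 1.

```python
import secrets

def run_round(device_writer, adversary_guess):
    """One toy round of PD-CPA: the challenger flips a fair bit b,
    executes access pattern O_b, and the adversary guesses b from
    the resulting device snapshot (write trace)."""
    b = secrets.randbelow(2)
    # O_0 and O_1 contain the same public writes; O_1 adds one hidden
    # write, which the phi = 1 restriction allows.
    patterns = (["pub-write"], ["pub-write", "hid-write"])
    snapshot = device_writer(patterns[b])
    return adversary_guess(snapshot) == b  # True iff. the adversary won

def deniable_writer(pattern):
    """A deniable writer: every public operation emits one fixed-size
    random-looking physical write that may (or may not) carry a hidden
    write, so snapshots for O_0 and O_1 are identically distributed."""
    n_public = sum(op.startswith("pub") for op in pattern)
    return [secrets.token_bytes(16) for _ in range(n_public)]

# Against such a writer, any fixed guessing strategy wins about half the time.
wins = sum(run_round(deniable_writer, lambda s: 0) for _ in range(2000))
```

Because the snapshot distribution here is independent of b, wins/2000 concentrates around 1/2, matching the "no non-negligible advantage over random guessing" requirement formalized below.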

We note that the restrictions imposed on the adversary-generated access patterns are necessary to eliminate trivially-identifiable cases. E.g., allowing different writes to Vpub in O0 and O1 would allow the adversary to distinguish between the two sequences by simply decrypting the contents of Vpub (using the known Kpub) and comparing the decrypted data. Fortunately, a similar comparison is not possible for Vhid since Khid is not accessible to the adversary.

The restriction imposed by φ ensures that the adversary may not trivially distinguish between O0 and O1 by providing sequences of different "true lengths" – the number of actual blocks modified in the write trace generated by a given access pattern. Specifically, for a given φ, PD-CPA assumes that the true length of a sequence with k writes to Vpub is k × φ. Since O0 and O1 have the same writes to Vpub, their true lengths are the same. The additional φ writes in the corresponding write traces generated for the sequences allow hiding writes to Vhid. Thus, if the number of writes to Vhid exceeds the number allowed by φ, the sequences become trivially distinguishable by their true lengths.

Relationship with existing work. PD-CPA is similar to the hidden volume encryption game in [7] with the notable difference that PD-CPA empowers the adversary further by giving her the ability to choose the input device. This addresses a practical scenario where an oppressive regime officer might be aware of particular underlying properties of a storage device. Thus, a PD-CPA-secure solution should not rely on the properties of a particular kind of storage device used by the challenger, as that in itself would be suspicious to the adversary^3.

Definition 2. A storage mechanism is "plausibly deniable" if it ensures that A cannot win any round of the corresponding PD-CPA game with a non-negligible advantage (in the security parameter λ) over random guessing.

3.3 Necessary and sufficient conditions for PD-CPA

DataLair provides a plausibly deniable solution defeating a PD-CPA adversary. We note informally here (and prove later) that the following conditions are necessary and sufficient to ensure PD-CPA security.

1. Indistinguishability between hidden data write access patterns ("access pattern indistinguishability", HWA).

2. Indistinguishability between write traces that include public data accesses with one (or more) hidden data accesses, and the same public data accesses without any hidden data accesses ("access type indistinguishability", PAT).

Indistinguishability between hidden write access patterns (HWA). Recall that in PD-CPA, the adversary can include writes to Vhid in both O0 and O1. If the write traces were distinguishable, the two sequences would become distinguishable to an adversary providing different accesses to Vhid in the sequences.

Note that HWA indistinguishability also ensures that writes to the same logical location in Vhid in two different rounds of PD-CPA result in write traces that are independently distributed, and thus prevents an adversary from "linking" accesses to the same logical location in successive rounds. In the absence of this "unlinkability", an adversary could provide the same accesses to Vhid as part of O1 in successive rounds with only writes (possibly different) to Vpub as O0. On observing the same write traces for successive rounds, the adversary could correctly predict that the sequence executed is O1.

Indistinguishability between public access write traces (PAT). In PD-CPA, an adversary can provide O0 with only accesses to Vpub and O1 with accesses to both Vpub and Vhid, and win trivially if the sequences' write traces were distinguishable. In effect, to ensure PD-CPA security, in case O0 and O1 include the same public data operations, they should be indistinguishable. Note that both sequences should contain the same public data operations since otherwise they are trivially distinguishable on the basis of any additional public operation that is performed, as discussed above. This is why the write trace due to a sequence of public data operations plus (one or more) hidden data write(s) should be indistinguishable from the write trace due to accesses including the same public data operations without any hidden data write(s).

3 For example, [23] exploits the variations in the programming time of a flash device to hide data by encoding bits in the programming time of individual cells.

Theorem 1. A storage mechanism is PD-CPA secure iff. it guarantees indistinguishability between hidden write access patterns (HWA) and indistinguishability between public operations with and without hidden writes (PAT).

    See appendix for the detailed proof.

4 Access Pattern Indistinguishability

Section 3 shows that one of the necessary conditions to plausibly deny the existence of a logical volume to a multi-snapshot adversary is to ensure indistinguishability of hidden data write access patterns (HWA).

A straightforward solution here is to use an oblivious RAM (ORAM) which allows a client/CPU to hide its data access patterns from an untrusted server/RAM hosting the accessed data. Informally, ORAMs prevent an adversary from distinguishing between equal-length sequences of queries made by a client to the server. This usually also includes indistinguishability between reads and writes. We refer to the vast amount of existing literature on ORAMs for more formal definitions [10, 22].

Write-only ORAM. As noted by Blass et al. [7], an off-the-shelf full ORAM is simply too powerful since it protects both read and write accesses – while for PD only write access patterns are of concern. In this case, a write-only ORAM [7, 12] can be deployed, which provides access pattern privacy against an adversary that can monitor only writes.

Logical vs. practical complexity. ORAM literature traditionally defines access complexity as the number of logical blocks of data accessed per I/O. This allows optimizations by using logical blocks of smaller size [19, 21]. However, it is important to note that standard off-the-shelf block-based storage devices can access data only in units of physical blocks (sectors). For example, accessing a 256-byte logical block still entails accessing the corresponding 4KB block from the disk. Thus, in the context of locally deployed block-based storage devices, practical complexity needs to be measured in terms of the number of physical blocks (sectors) accessed per I/O.

HIVE-ORAM. The most bandwidth-efficient write-only ORAM is the construction by Blass et al. [7] (further referred to as HIVE-ORAM). HIVE-ORAM [7] maps data from a logical address space uniformly randomly to the physical blocks on the underlying device. The position map for the ORAM is recursively stored in O(log N) smaller ORAMs, a standard technique introduced in [19]. The recursive technique reduces the logical block access complexity for the position map by storing the position map blocks in logical blocks of smaller sizes. Under this assumption, HIVE-ORAM [7] accesses a constant number of logical blocks per ORAM operation at the cost of some overflow that is stored in an in-memory stash.

Unfortunately, as noted in [7], with physical blocks of uniform size, HIVE-ORAM has a practical block read complexity (number of blocks read) of O(log_β N) and a practical block write complexity (number of blocks written) of O(log²_β N), where β = B/(2 · addr), B is the physical block size in bytes and addr is the size of one logical/physical block address in bytes. This is because to perform a write, HIVE-ORAM [7] needs to update all log_β N position map ORAMs recursively, and updating each ORAM requires O(log_β N) accesses to find free blocks. More specifically, to find free blocks in an ORAM, a constant number of randomly chosen physical blocks are selected and then the position map for that ORAM is checked to determine free blocks within the sample. Consequently, with O(log_β N) block read complexity of each position map ORAM, the overall write complexity of HIVE-ORAM is O(log²_β N).

DL-ORAM. We propose DL-ORAM, a new efficient write-only ORAM scheme with a practical block write complexity of O(log_β N). Similar to [7], DL-ORAM maps blocks from the logical address space to uniformly random blocks in the physical address space. The DL-ORAM construction however is significantly different and incorporates two key optimizations.
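To put these asymptotics in perspective, consider some hypothetical but typical parameters (4KB sectors, 4-byte block addresses, a 1TB disk). The numbers below are our own illustration, not figures from the paper:

```python
import math

B = 4096           # physical block (sector) size in bytes (assumed)
addr = 4           # size of one block address in bytes (assumed)
N = 2 ** 28        # number of 4KB blocks on a 1TB disk

beta = B // (2 * addr)                    # beta = B / (2 * addr) = 512
read_cost = math.ceil(math.log(N, beta))  # O(log_beta N) physical blocks
hive_write_cost = read_cost ** 2          # HIVE-ORAM write: O(log^2_beta N)
dl_write_cost = read_cost                 # DL-ORAM write:   O(log_beta N)
```

With these parameters a HIVE-ORAM write touches on the order of 16 sectors versus roughly 4 for DL-ORAM, illustrating the O(log_β N) factor saved.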

First, DL-ORAM eliminates the need for recursive position map ORAMs by storing the position map as a B+ tree, indexed by the logical block addresses. The tree is stored along with the data within the same ORAM address space, thus judiciously utilizing space. Second, DL-ORAM uses a novel O(1) scheme for identifying uniformly random free blocks using an auxiliary encrypted data structure. This allows writes with O(log_β N) communication complexity. We detail below.

    4.1 Position Map

In DL-ORAM, all blocks are stored encrypted with a semantically secure cipher. Further, DL-ORAM stores the logical-to-physical block address mappings in a B+ tree indexed on the logical data block address. Logical and physical addresses are assumed to be of the same size. The position map tree is stored on the device with each node stored in an individual physical block. Each leaf node stores a sorted list of logical block addresses along with the corresponding physical block addresses they are mapped to.

Further, the leaf nodes are ordered from left to right, e.g., the left-most leaf node contains the entries for logical block addresses in the range of 1 to β. This ensures that the ⌈i/β⌉-th leaf node always contains the mapping for logical block address i. If a logical block is currently not mapped (e.g., when the block hasn't been written to yet by the upper layer), the entry corresponding to that address in the map is set to null.

For traversal, each internal node stores only the list of physical block addresses of the node's children. Note that since the leaves are ordered on logical addresses, the index within an internal node determines the child node to visit next in order to retrieve a path corresponding to a particular logical block address. Searching for a particular logical block address mapping requires reading all the blocks along a path from the root to the leaf that contains the mapping for that logical block ID. As each node is stored in a physical block, the number of block addresses that can be stored in each leaf node is bounded by the physical block size, β. Consequently, the depth of the tree is bounded by log_β N with a fanout of β. Thus, querying the map for a particular logical-to-physical block address mapping requires log_β N block accesses.

Fig. 1. DL-ORAM. The position map blocks are randomly interleaved with the data blocks within the same address space.
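The index arithmetic above can be sketched as follows. This is a minimal illustration assuming 0-indexed logical addresses and leaves, not DataLair's actual code:

```python
import math

def leaf_index(logical_id, beta):
    """Leaves are ordered left to right: leaf k holds the mappings for
    logical addresses [k*beta, (k+1)*beta), so the leaf for address i
    is found by integer division (the paper's i/beta, 0-indexed here)."""
    return logical_id // beta

def path_length(n_blocks, beta):
    """With fanout beta, a root-to-leaf path touches ceil(log_beta N)
    nodes, which bounds the block accesses per position-map query."""
    return max(1, math.ceil(math.log(n_blocks, beta)))
```

For example, with β = 512 the addresses 0..511 all fall in leaf 0, and a device of 2^28 blocks yields root-to-leaf paths of 4 nodes.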

The position map shares the same physical address space with the ORAM data blocks. Specifically, the B+ tree blocks are assigned a logical block address and are written to random physical blocks interleaved with the data blocks, using the ORAM protocols. The physical location of the tree root is stored at a known location outside the ORAM address space (intuition explained later). Semantic security of the cipher used ensures that the position map blocks are indistinguishable from the data blocks in the ORAM.

DL-ORAM supports two operations: read_oram and write_oram. Detailed pseudocode can be found in the Appendix.

read_oram(id) returns the data in the block with logical block address id. It locates the mapping for id in the B+ tree and returns the data stored in the corresponding physical block.

write_oram(id, d) writes data d to the block with logical block address id. It first determines the entry corresponding to id in the B+ tree and then writes data d to a new free block. Finally, the map is updated corresponding to id. The mechanism for finding a free block is described next in Section 4.2.

Finally, updating the map requires recursively writing all the nodes along the specific path of the B+ tree to new free blocks to ensure indistinguishability between map blocks and data blocks. Specifically, the updated leaf node (after adding the entry for id) is written to a newly selected free block. This results in its parent node being modified and being mapped to a new free location. Consequently, on an ORAM write, all the blocks from the root of the map to the corresponding leaf node are modified and remapped to new locations.
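The recursive remap can be sketched as below. This is a simplified in-memory toy in which hypothetical nodes carry child pointers by physical address; the real construction stores each node encrypted in its own physical block:

```python
def remap_path(store, path_addrs, free_addrs):
    """Toy sketch of the map update on write_oram: every node on the
    root-to-leaf path moves to a fresh free physical block, and each
    parent is rewritten to point at its child's new location.
    `store` maps physical address -> node (a dict with a child pointer);
    `path_addrs` lists the current path root..leaf; `free_addrs`
    supplies fresh free blocks (one per node on the path)."""
    new_addrs = [free_addrs.pop() for _ in path_addrs]
    # Relocate bottom-up so each parent can record its child's new address.
    for depth in range(len(path_addrs) - 1, -1, -1):
        node = store.pop(path_addrs[depth])
        if depth + 1 < len(path_addrs):
            node["child"] = new_addrs[depth + 1]
        store[new_addrs[depth]] = node
    return new_addrs[0]  # new root address, recorded at the fixed location

# Toy usage: a two-node path (root at block 10 -> leaf at block 20)
# relocates to the fresh blocks 42 and 99.
store = {10: {"child": 20}, 20: {"child": None, "entry": "id->blk"}}
new_root = remap_path(store, [10, 20], [99, 42])
```

After the call, the old blocks 10 and 20 are vacated and the new root address must be written back to the fixed known location, which is exactly what terminates the recursion described next.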

  • DataLair 181

    To ensure that the recursion terminates, the physical location of the tree root is always stored at a known fixed location, as described before. For DL-ORAM, this information is held at a fixed location on the disk where the physical block address of the root is written encrypted and is modified on each access. Once the root is modified and remapped to a new physical block, this information is updated.

    4.2 Finding Free Blocks

    A major challenge for write-only ORAMs is selecting a uniformly random free block from the distribution of all free blocks on the device for writing data. Writing in this manner eliminates correlations between the logical data block addresses and their physical locations, rendering an adversary incapable of linking modifications to physical blocks with the corresponding modifications to logical blocks. Even when data already in the ORAM needs to be updated, it is relocated to a new random location to prevent correlation with previous writes.

    DL-ORAM deploys a new O(1) scheme for identifying uniformly random free blocks using an auxiliary encrypted data structure. This allows writes with O(logβ(N)) access complexity. This is achieved by storing free block addresses in a novel encrypted data structure, the free block matrix (FBM). The FBM is designed with the following two properties: i) it allows retrieval of uniformly random elements in O(1) accesses, and ii) it does not reveal the actual number of elements it contains at any given point in time. Property (i) allows efficient retrieval of random free block addresses (as we detail later), while (ii) ensures that the actual number of data blocks in the ORAM is not revealed, preventing possible correlations.
    Free Block Matrix (FBM). The FBM is an encrypted β × N/β matrix (where N is the total number of physical blocks in the ORAM and β is the number of physical block addresses that fit in one disk block) that stores the addresses of all currently free blocks. The columns of the matrix are stored in consecutive disk blocks outside the ORAM, each containing β free block addresses (Figure 2). The coordinates at which each block address is stored are randomized independently of the address. Since all blocks are free at initialization, the FBM is initialized with all N block addresses, placed at random coordinates.

    Fig. 2. FBM design. The FBM is a matrix with β = B/addr rows and N/β columns, containing N entries. Each entry of the matrix is a physical block ID. Each column of the FBM is stored in a disk block. A special “FBM header” array stores the number of valid physical block IDs per row. The FBM requires N/β + 1 disk blocks for storage: N/β columns and 1 FBM header. The figure illustrates an example FBM configuration with 9 entries. The first row has two invalid entries, as indicated by the FBM header.

    As physical blocks are used for writing data, the corresponding block addresses need to be removed from the FBM. This is achieved by invalidating entries (described next) when the corresponding blocks are used for ORAM writes. To track the number of valid entries in the FBM, the FBM header block stores a block-sized array recording the number of valid entries per FBM row.

    More specifically, the FBM header contains an entry for each row indicating the number of valid entries currently present in that row. Since each row can contain N/β valid entries (corresponding to the number of columns), each entry in the FBM header is of size log(N/β). For β rows, the total size of the FBM header is β × log(N/β) bits. Also, since β ≥ 1 and β ≤ B/logN (each block address occupies at least logN bits),

    β × log(N/β) ≤ β × logN ≤ B/logN × logN ≤ B

    Hence, the FBM header always fits in one physical block if B > logN. With a standard 4KB block size on off-the-shelf storage disks, the FBM header requires more than one block (B < logN) only if the disk has more than 2^4096 blocks in total.
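A quick sanity check of this bound, using hypothetical parameters (4 KB blocks, 64-bit block addresses, 2^30 blocks; all values are illustrative, not from the paper):

```python
import math

B_bits = 4096 * 8               # 4 KB physical block, in bits
addr_bits = 64                  # assumed size of one block address
beta = B_bits // addr_bits      # addresses per block: 512
N = 2 ** 30                     # total physical blocks
entry_bits = math.ceil(math.log2(N / beta))  # log(N/beta) = 21 bits per header entry
header_bits = beta * entry_bits              # 512 * 21 = 10752 bits
assert header_bits <= B_bits                 # the header fits in a single block
```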

    Figure 2 describes the FBM design.
    Selection from the FBM. The FBM allows selection of uniformly random block addresses with two disk block accesses as follows: select a random i in the range of the total number of valid entries in the FBM. Then, determine the coordinate of the ith valid entry in the FBM by counting valid entries row-wise; this is straightforward since the header stores the number of valid entries per row. The block address at that coordinate is then retrieved from the corresponding column block.
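The coordinate lookup via the header can be sketched as follows. The helper name and 0-based indexing are illustrative; the sketch assumes the invariant the paper establishes through compaction below (invalid entries precede valid ones row-wise, so row r's valid entries occupy its trailing header[r] columns):

```python
def locate_valid(header, ncols, i):
    """Map a selection index i (0-based, i < sum(header)) to the FBM
    coordinate (row, col) of the i-th valid entry, using only the
    per-row valid-entry counts stored in the header."""
    for r, h in enumerate(header):
        if i < h:
            # Valid entries sit in the trailing h columns of row r.
            return (r, ncols - h + i)
        i -= h
    raise IndexError("fewer than i valid entries")
```

For the 3×3 configuration of Figure 3(a) (1 valid entry in the first row, 3 in the second), the 3rd valid entry maps to the paper's coordinate (2,2), i.e. (1,1) with 0-based indices.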


    Fig. 3. Example of an FBM with 9 entries. In (a), the FBM header correctly determines that the 3rd valid entry is at (2,2) since there is 1 valid entry in the first row and 3 in the second. In (b), the 3rd valid entry is determined incorrectly: the FBM header correctly indicates that the second row has 2 valid entries but does not show which columns hold them.

    If the block is subsequently used (possibly for a new write), it is invalidated by reducing the valid entry count for the corresponding row in the FBM header. Consequently, for subsequent accesses the FBM header indicates that the row has one less valid entry. Note that invalidating an entry does not entail removing it from the disk block storing the FBM column; rather, the FBM header is updated to ensure that the entry is not used in subsequent accesses.

    An important condition for the correctness of this mechanism is ensuring that a location determined from the FBM header always contains a valid entry. A possible scenario where this might be violated is shown in Figure 3, where the FBM incorrectly points to a location that contains an invalid entry. The problem in this case is due to the presence of invalid entries between valid entries (as in Figure 3(b)). Here, the FBM header correctly indicates that the second row has 2 valid entries but does not indicate the columns with the valid entries. Determining the exact coordinate at which a valid entry is present is non-trivial given that the header only records the number of valid entries in a row but not their corresponding locations. Consequently, with gaps between valid entries in the FBM (as in Figure 3), the selection mechanism described above can erroneously return block addresses (corresponding to invalid entries) that are already being used.
    Compacting the FBM. To ensure correctness, DL-ORAM maintains the following invariant: all invalid entries in the FBM appear before all valid entries when the FBM is traversed row-wise. This requires all valid entries in a particular row to be invalidated before invalidating an entry from the next row in sequence, e.g., all entries in the first row are invalidated before the second row, and so on. Since entries are selected randomly from the FBM, DL-ORAM performs an extra compaction step to maintain the invariant.

    Fig. 4. Compacting the FBM. Invalid entries appear before valid entries when entries are traversed row-wise. In the example, 7 was selected randomly from (2,3) and was replaced by 5 from (1,2) to ensure compactness.

    The compaction replaces a randomly selected valid entry in the FBM with another valid entry, selected row-wise from the FBM. This replacement entry is the first valid entry encountered while counting valid entries per row sequentially from the FBM header. After the replacement, the FBM header is updated.

    For example, in Figure 4, 7 is uniformly randomly selected (and invalidated) from coordinate (2,3) in the FBM using the protocol described above. This leads to a configuration that violates the invariant. For compaction, 5 is then copied from coordinate (1,2) to (2,3) and the entry at (1,2) is invalidated. Note that the entry at (1,2) is the first valid entry encountered while traversing the FBM row-wise. Also, even though the FBM now contains duplicate block addresses at (1,2) and (2,3), the block address at (1,2) will not be used subsequently during uniform selection since the FBM header correctly indicates that row 1 has one valid entry. Due to the invariant, this single valid entry has to be present at (1,3).

    Compaction straightforwardly ensures the invariant since randomly selected entries are replaced by entries that are selected row-wise and subsequently invalidated. To summarize, the compaction modifies two blocks: the header and the block from which the random entry is selected and replaced.
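The combined select-then-compact step can be sketched as follows, operating on an in-memory model of the FBM (rows as lists, header as per-row valid counts; the function name and 0-based indexing are illustrative):

```python
import secrets

def select_and_invalidate(rows, header, ncols):
    """Pick a uniformly random valid FBM entry, then compact.

    Invariant assumed and preserved: traversed row-wise, all invalid
    entries precede all valid ones, so row r's valid entries occupy
    its trailing header[r] columns.
    """
    total = sum(header)
    i = secrets.randbelow(total)      # index of the i-th valid entry, 0-based
    r = 0
    while i >= header[r]:             # locate the row via the header counts
        i -= header[r]
        r += 1
    c = ncols - header[r] + i
    addr = rows[r][c]
    # Compaction: overwrite the selected slot with the first valid entry
    # (row-wise), then invalidate that entry's original slot via the header.
    r0 = next(row for row in range(len(header)) if header[row] > 0)
    c0 = ncols - header[r0]
    rows[r][c] = rows[r0][c0]
    header[r0] -= 1
    return addr
```

Repeated calls return each stored address exactly once: the selected address leaves the valid set while the replacement entry merely relocates, which matches the duplicate-but-unreachable situation described for Figure 4.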

    Moreover, in certain cases this compaction is not required, for instance when a free block address is to be added to the FBM and a new random free block address is required in turn. Recall that this is indeed the case for updates to DL-ORAM blocks: the updated data is written to a new free block while the block containing the previous data becomes free, and the corresponding free block address can be added back to the FBM. In this case, the newly chosen random free block address can


    be directly replaced by the free block address that is to be added back to the FBM. Note that due to the direct replacement, the header does not need to be updated.

    In order to ensure that such an access is indistinguishable from an access with compaction (otherwise an adversary could identify accesses which are simply updates to existing blocks and thus deduce the actual number of data blocks in the ORAM), the FBM header is re-encrypted along with the modification to the block where the replacement takes place. Due to semantic security, re-encryption of the header is indistinguishable from an actual modification during compaction.

    Finally, note that entries are added back to the FBM only as part of updates (as described above). DL-ORAM does not support deletes since modern filesystems do not indicate deletes to block devices. Specifically, filesystems such as ext4 only update their internal metadata to track blocks that have been deleted, while the deleted data is overwritten only when the block is subsequently allocated for writing new data. Thus, when DL-ORAM is used as a logical volume with an overlying filesystem mounted on it (the deployment scenario considered here), deletes are logically equivalent to updates and the deleted data is overwritten with new data in future accesses.
    Uniform free block selection. As discussed, the FBM provides an efficient mechanism for determining uniformly random free block addresses for ORAM writes with O(1) access complexity. Unfortunately, writing only to blocks selected from the FBM for every write can result in certain blocks in the ORAM never being modified. For instance, consider a block containing data that is never updated. This block is never modified again after the first write since it never becomes free. This can leak subtle correlations since an adversary can differentiate disk blocks that are updated frequently from locations that have not been updated since they were first written.

    To solve this, DL-ORAM deploys an O(1) free block selection scheme using the FBM that modifies uniformly random locations in the ORAM and thus prevents any block-level correlations. The intuition is to always modify k (a chosen constant) random disk blocks while also finding a free block.

    For every ORAM write, DL-ORAM creates two sets, each with k items, as follows:
    1. A free set that contains k randomly chosen free block addresses from the FBM.
    2. A random set that contains k randomly chosen block addresses out of all the blocks in the ORAM.

    In case of duplicate items between the sets, one of the duplicates is randomly discarded and a new item is selected into either the free set or the random set, depending on the set from which the item was discarded. Then, items from the two sets are merged randomly to create a combined set.
    Selection Protocol. Using the combined set, DL-ORAM executes the following protocol:
    1. Select an item randomly from the combined set.
    2. If the item i selected in step 1 originally belonged to the free set, use the corresponding block for the ORAM write. Otherwise, re-encrypt the block corresponding to the selected block address. Remove the item from the combined set.
    3. Repeat steps 1-2 k times.
    4. For all remaining items i that are also in the free set, replace the addresses back in the FBM at the positions from which they were selected.
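The k-round protocol can be sketched as follows; this is a toy in-memory model (the function name and return convention are illustrative), assuming the two input sets have already been de-duplicated as described:

```python
import secrets

def selection_protocol(free_set, random_set, k):
    """One execution of the k-round selection protocol (in-memory sketch).

    free_set / random_set: k de-duplicated addresses each.
    Returns (write_targets, reencrypt_targets, returned_to_fbm).
    """
    rng = secrets.SystemRandom()
    combined = list(free_set | random_set)          # 2k merged items
    write_targets, reencrypt_targets = [], []
    for _ in range(k):                              # selection without replacement
        item = combined.pop(rng.randrange(len(combined)))
        if item in free_set:
            write_targets.append(item)              # free block: use it for the write
        else:
            reencrypt_targets.append(item)          # occupied block: just re-encrypt
    # Unselected free-set items go back into the FBM.
    returned = [a for a in combined if a in free_set]
    return write_targets, reencrypt_targets, returned
```

Exactly k blocks are touched per execution, regardless of how many of them turn out to be free, which is the basis of the indistinguishability argument in Lemma 1.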

    Lemma 1. The sample of k physical blocks modified for every ORAM write (due to the selection protocol) is indistinguishable from a sample of k blocks selected uniformly at random out of the N physical blocks in the ORAM.

    Proof. The idea here is to show that the probability of a block x being selected (and modified) randomly in a sample of k blocks out of N is the same as the probability of x being one of the k blocks chosen by the selection protocol. Let X denote the event of x being chosen uniformly at random in a sample of k blocks out of N. Then, straightforwardly, Pr[X] = k/N.

    Next, consider the process of building the free set. Let Pr[x ∈ FBM] denote the probability that x is currently in the FBM. Since the FBM is initialized with all N block addresses and, for each access, entries are selected (and invalidated) uniformly at random, all N block addresses are equally likely to be present and valid in the FBM during the current access. More specifically, all the N initial entries are equally likely to have been invalidated in the writes that have preceded this access. Thus, Pr[x ∈ FBM] = f/N, where f is the current number of valid entries in the FBM.

    Let E1 be the event that x is selected into the free set, i.e., x is one of the k randomly chosen block addresses selected from the FBM to form the free set. This requires x to be present in the FBM. Thus, Pr[E1] = Pr[x selected from FBM | x ∈ FBM] × Pr[x ∈ FBM] = k/f × f/N = k/N, since conditioned on x being among the f valid entries in the FBM, x is selected with probability k/f.

    Now, since the random set is created by selecting k random block addresses out of all N, the event E2 that x is selected into the random set straightforwardly has probability Pr[E2] = k/N. Thus, x has equal probability of being added to the combined set from either the free set or the random set. Note that the combined set has size 2k.

    Let X′ be the event of x being selected for modification by the protocol. The goal here is to show Pr[X] = Pr[X′]. X′ requires x being chosen into either the free set or the random set (E1 or E2) and x being selected in one of the rounds of the protocol, denoted by event Y. Thus, Pr[X′] = Pr[Y] × Pr[E1 ∪ E2].

    Let Y = Y1 + Y2 + . . . + Yk, where Yi is the event that x is selected in the ith round of the protocol. Also, since x can be selected only once (selection without replacement), the probability of x being selected in the ith round conditionally depends on x not being selected in previous rounds⁴. Then,

    Pr[Y = Yi] = Pr[Yi | Y ≠ Yi−1, Yi−2, . . . , Y1]

    Note that in round i, any given remaining item is selected from the combined set with probability 1/(2k − i + 1), since previously selected items are removed without replacement.

    Now, Pr[Y = Y1] = 1/2k.

    Pr[Y = Y2] = Pr[Y2 | Y ≠ Y1] × Pr[Y ≠ Y1] = 1/(2k − 1) × (2k − 1)/2k = 1/2k.

    It can similarly be shown that Pr[Y = Yi] = 1/2k for all i ∈ [k]. Thus,

    Pr[Y] = ∑_{i=1}^{k} Pr[Y = Yi] = k × 1/2k = 1/2.

    Finally, Pr[X′] = Pr[Y] × Pr[E1 ∪ E2] = 1/2 × 2k/N = k/N, thus proving Pr[X′] = Pr[X].
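The k without-replacement draws amount to choosing a uniform k-subset of the 2k combined items, so Pr[Y] = 1/2 can be verified exactly by enumeration (shown here for k = 3; the variable names are illustrative):

```python
from itertools import combinations

k = 3
items = range(2 * k)                      # the combined set of 2k items
subsets = list(combinations(items, k))    # all equally likely k-subsets
hits = sum(1 for s in subsets if 0 in s)  # outcomes containing a fixed item x = 0
assert hits / len(subsets) == 0.5         # Pr[Y] = 1/2
```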

    Stash. If none of the k rounds yields a free block, the data is written to an in-memory stash. Since initially there are an equal number of items from the free set and the random set in the combined set, and uniformly random items are selected each round, k rounds of the protocol yield an expected k/2 items from the free set. If k = 3, the DL-ORAM stash can be modeled as a D/M/1 queue similar to [7], and bounded to a constant size with negligible failure probability in the security parameter. We refer to [7] for details.

    4 Y follows a hypergeometric distribution with population size 2k, k draws and 1 observed and possible success state. Thus, the above could also be derived straightforwardly using the probability mass function of the distribution.

    Device utilization. A final detail is to ensure that the expected number of free blocks obtained from the selection protocol does not exceed the expected number of free blocks obtained from an equal-sized sample selected randomly out of all the blocks in the ORAM. Otherwise, the protocol will always yield an expected k/2 free blocks, unlike a randomly selected sample where the expected number of free blocks would be a function of the actual distribution of data. Note that Lemma 1 shows indistinguishability between a randomly selected sample and the sample obtained due to the modifications in Step 2 of the protocol for each write. However, a significant difference in the expected number of free blocks obtained can leak subtle correlations over multiple rounds. Fortunately, a straightforward solution is to ensure that half of the ORAM blocks are always free. Note that a similar assumption is also made by HIVE [7], but for a different purpose, namely to bound the in-memory stash for the write protocol.

    Lemma 2. If half of the ORAM blocks are always ensured to be free, the expected number of free blocks provided by the selection protocol per write is equal to the expected number of free blocks in a sample of k blocks randomly selected out of all N blocks in the ORAM.

    Proof. Note that a free block is obtained by the protocol only if an item is selected from the free set in Step 1 in at least 1 of the k rounds. Since each round selects an item uniformly from the combined set, it can be straightforwardly observed (by linearity of expectation) that the expected number of free blocks yielded by the selection protocol is k/2. This is equal to the expected number of free blocks in a randomly selected sample of k blocks if half of the blocks in the ORAM are always free.

    Access Complexity. Both read_oram() and write_oram() access the ORAM map to locate the target block. The complexity of accessing an entry in the B+ tree is O(logβN). Further, write_oram() needs to find and write to O(logβN) free blocks for writing data and updating the map blocks. Finding a free block requires O(1) accesses as discussed above, and thus the write complexity of DL-ORAM is O(logβN). Consequently, the overall access complexity of DL-ORAM is O(B × logβ(N)).

    Lemma 3. DL-ORAM provides write access pattern indistinguishability.


    Proof. This follows directly from the ORAM properties. For every DL-ORAM write, logβ(N) + 1 blocks need to be written to the ORAM (logβ(N) blocks for updating the position map and one data block that needs to be written/updated). Free blocks for writing these blocks are randomly chosen and modified independently of the logical block address that the user wants to write to. In fact, for any given access pattern A = {a1, a2, . . .} and its corresponding write trace B = {b1, b2, . . .}, the selection of the blocks in B is independent of the blocks in A since B is composed of uniformly randomly selected blocks from the disk (Lemma 1). Thus, observing B does not leak any information about A to the computationally-bounded adversary. Further, since the elements in B are randomly and uniformly chosen, the write trace of any access pattern C ≠ A is indistinguishable from the write trace for A.

    Finally, each access results in writes to blocks of the position map along with the block that the user wants to access. Position map and data blocks are re-encrypted with semantic security and are thus indistinguishable. This reduces the security of DL-ORAM to the security of the underlying semantically secure encryption.

    Simulating a write. Note that all write_oram accesses have indistinguishable write traces: modifying k × (logβN + 1) randomly selected ORAM data blocks, FBM blocks, and the FBM header block. Consequently, a write_oram access can be simulated, even with random data stored in the ORAM and the FBM, by randomly selecting and re-randomizing an equal number of ORAM blocks, FBM blocks and the FBM header. Due to semantic security, re-randomizing blocks containing random data is indistinguishable from re-encrypting a block with valid data. We use this property of DL-ORAM as a basis for the solution described in Section 5.

    5 Access Type Indistinguishability

    We now detail the design of DataLair, which ensures HWA by deploying DL-ORAM (Section 4), and PAT through its access protocols described below. For PAT, DataLair ensures that a device containing both public and hidden data is indistinguishable from a device containing only public data.

    First, we describe a simple secure design, DataLair Lite, which sacrifices storage space for reduced design complexity. Section 5.2 introduces a more complex but space-efficient design dubbed DataLair Full.

    Fig. 5. Design of DataLair Lite. Writes to the hidden partitionhappen through DL-ORAM.

    5.1 DataLair Lite: Isolating Volumes

    Setup. DataLair Lite maps blocks from two logical volumes to an underlying block device and can be set up in two modes of operation: ONLY_PUB (only public) and PUB_HID (public and hidden). In the ONLY_PUB mode, the device only stores data from a “public” logical volume that is disclosed to an adversary. In PUB_HID mode, DataLair Lite also stores data from a hidden volume. DataLair Lite fixes the size of each volume to 50% of the underlying physical device size. Each logical volume may support a filesystem. DataLair Lite creates two physical partitions on the device, a public partition and a hidden partition, according to the corresponding logical volume sizes (Figure 5). When DataLair Lite is initialized in ONLY_PUB mode, the hidden partition is filled with random data.

    The underlying storage is a block device with N blocks of size B each and physical block addresses Pid ∈ [1, N]. The format of the address may vary across different types of block devices; e.g., in the case of a hard disk, the physical block address would be the sector numbers that constitute one block on the physical device.

    Logical block addresses Vid ∈ [1, |Vid|] are used to reference blocks in the logical volumes (|Vid| is the size of the volume). The logical blocks are mapped to the physical blocks using a device mapper. Data from the public volume is mapped to the public partition directly (as indicated by the overlying FS) while data from the hidden volume is mapped to the hidden partition using DL-ORAM. In addition, semantically secure encryption is used to encrypt all data and metadata before writing. Public data is encrypted with a key Kpub available to the adversary. Hidden data is encrypted with a secret key Khid. In practice, keys may be derived from user passwords.
    I/O. DataLair Lite maps logical volume I/O into either public or hidden volume operations.


    Public reads and writes are straightforward since DataLair Lite can linearly translate the logical block address and perform a read/write to the corresponding physical block in the public partition. A read to the hidden volume calls read_oram on the hidden partition. In addition, to ensure PD, with every public write, DataLair Lite performs a hidden operation as follows:
    – If there is a write queued up for the hidden logical volume, it is performed by calling write_oram.
    – In the absence of a write for the hidden volume, or if DataLair Lite is used in ONLY_PUB mode, a write is simulated for DL-ORAM (as described in Section 4).
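This coupling can be sketched as a toy model, where `write_oram` and `simulate_write` stand in for the DL-ORAM operations of Section 4 and all names are illustrative:

```python
from collections import deque

def on_public_write(pub_dev, lid, data, hidden_queue, oram):
    """Toy sketch of DataLair Lite's I/O coupling: every public write
    triggers exactly one hidden-partition operation, real or simulated."""
    pub_dev[lid] = data                      # direct linear mapping for public data
    if hidden_queue:                         # outstanding hidden write?
        hid_lid, hid_data = hidden_queue.popleft()
        oram.write_oram(hid_lid, hid_data)   # real hidden write
    else:
        oram.simulate_write()                # indistinguishable dummy write
```

An adversary observing the device sees one hidden-partition modification per public write in either branch, which is what makes the two cases indistinguishable.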

    Effectively, this ensures that every write to the hidden volume is preceded by a write to the public volume. If there is no public write when a hidden write request arrives at DataLair Lite, the hidden data is queued in the DL-ORAM stash. With every write to the public volume, either an outstanding write (or data from the stash) is performed on the hidden volume (using DL-ORAM) or a write is simulated. As shown before, these two cases are indistinguishable to an adversary without the key for DL-ORAM.
    Security. The construction described above provides PD for the hidden volume. First, both the ONLY_PUB and PUB_HID modes of operation create public and hidden partitions of equal size. A write to the public volume in both cases results in indistinguishable modifications to the hidden partition, either due to an actual write or a simulation. This guarantees PAT indistinguishability. Further, deploying DL-ORAM on the hidden partition ensures HWA indistinguishability. Recall that these are the necessary and sufficient conditions to ensure PD-CPA security.

    5.2 DataLair Full: Merging Volumes

    Although DataLair Lite (Section 5.1) achieves PD, it makes sub-optimal use of storage space; e.g., in ONLY_PUB mode, the hidden partition takes up 50% of the device space, regardless of actual use. A more reasonable solution would allow physical volume storage space to correspond to logical use requirements. Further, space not used for hidden data should be available for public data and vice-versa.

    To this end, DataLair Full allows the user to create two (or more) volumes of variable sizes and stores them on the same physical partition. In this case, both the public and the hidden volume can be of the same logical

    size as the underlying partition and use all the available space (in this case up to 50% of total device size) for either hidden or public data. We provide the intuition for the restricted device usage further below.

    Unfortunately, achieving this is significantly more challenging than the Lite construction, the main problem being mapping public data to independent locations in the presence of hidden data. We detail below.
    Mapping Public Data. First, with both public and hidden data being stored within the same physical address space, writing public data straightforwardly to the physical blocks indicated by an overlying filesystem is not possible. Since DataLair does not restrict the choice of filesystem, the distribution of public data would also determine the distribution of hidden data. For example, with a log-structured filesystem on the public volume, all hidden data would end up being “pushed” towards the end of the disk. This breaks the security of the random free block selection mechanism in DL-ORAM. Instead, public data will need to be mapped randomly without compromising overall PD. This requires storing a corresponding mapping table for the public volume.
    Public Position Map (PPM). DataLair Full stores the logical to physical block address mappings for public data in an array called the Public Position Map (PPM), stored at a fixed device location. The PPM is similar to the mapping table used by most device mappers. Importantly, the PPM is considered to be public data and is thus not subject to PD.

    To proceed, it is necessary to define two terms that categorize physical blocks on the basis of their state of occupancy: truly free and apparently free. Truly free block: a block that does not contain any (public or hidden) data. Apparently free block: a block that contains hidden data and whose use needs to be hidden from an adversary, i.e., the block needs to “appear” free to an adversary.

    To maintain PD, public data writes should not avoid apparently free blocks by writing around hidden data. Thus, while writing public data to randomly selected blocks, DataLair must ensure that all free blocks (both truly free and apparently free) are equally likely to be selected to complete the write. An obvious solution then is to choose a random block and write the public data there if it is unoccupied. If the block is apparently free, the hidden data there can subsequently be relocated to a new random location.

    This approach however creates a significant problem. Recall that DL-ORAM writes hidden data by deploying the uniform free block selection protocol (detailed in Section 4.2), using the FBM for selecting “free” block addresses. In the current context, to ensure correctness, the FBM should contain addresses of only truly free blocks, i.e., blocks that do not contain either public or hidden data. Thus, randomly choosing a block for writing public data will also require the corresponding block address to be invalidated in the FBM. Otherwise, the FBM will contain block addresses that are already occupied by public data. If such a block address is selected for a subsequent hidden write, the public data in the block will have to be moved elsewhere (leaking the presence of hidden data) to complete the write. Unfortunately, invalidating a particular randomly chosen block address in the FBM is not straightforward: by construction, the location of a block address in the FBM is randomized.

    Fig. 6. Sample PFL with 7 entries. Each entry in the FMA is a physical block address. The reverse mapping array (RMA) stores an index pointer into the forward mapping array (FMA) for a particular entry.

    The solution then is to select block addresses for public writes also from the FBM, while updating the FBM in the process. The problem with naively implementing this, however, is that the FBM is hidden data, i.e., the FBM is encrypted with the DL-ORAM secret key. Thus, using the FBM for public writes would require the user to provide the FBM key to the adversary, since all public operations and data structures in DataLair need to be transparent to the adversary for PD. Providing the FBM key to the adversary breaks the security of DL-ORAM if hidden data is also stored.
    Public Free List (PFL). To solve this, DataLair selects free block addresses from the FBM but also provides a way to plausibly deny this. The idea is to store a public (encrypted with the public key) list of block addresses corresponding to the blocks that do not contain public data. The list supports the following two properties: i) it allows efficient retrieval of uniformly random entries, and ii) the location of any given entry in the list can be determined efficiently. We detail below.

    The PFL (Figure 6) is a data structure keeping track of the addresses of blocks that do not contain public data. The PFL is public (not subject to deniability) and is composed of two arrays:
    1. The forward mapping array (FMA) (size N) of block addresses that currently do not contain public data (apparently free + truly free). Uniformly random block addresses can be selected by picking the entry at a randomly selected index in the array. If the selected entry is to be removed from the PFL (use case described later), the array is subsequently compacted by moving the entry at the end of the array to the index corresponding to the removed item. The compaction ensures that the real size of the array is always known and entries can be picked uniformly.
    2. The reverse mapping array (RMA) (size N) tracks the index in the forward mapping array corresponding to each physical block address. Whenever an entry is removed, added or replaced in the forward mapping array, the reverse mapping array is updated as well.
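The two arrays above amount to the standard swap-with-last trick for O(1) random selection and removal; a minimal in-memory sketch (class and method names are illustrative, and the RMA is modeled as a dictionary rather than a size-N array):

```python
import secrets

class PFL:
    """Sketch of the Public Free List: an FMA of addresses without public
    data, plus an RMA mapping each address to its FMA index so a given
    address can be removed in O(1) via swap-with-last compaction."""
    def __init__(self, addrs):
        self.fma = list(addrs)
        self.rma = {a: i for i, a in enumerate(self.fma)}

    def pick_random(self):
        # Uniform selection: the FMA is kept compact, so any index works.
        return self.fma[secrets.randbelow(len(self.fma))]

    def remove(self, addr):
        i = self.rma.pop(addr)
        last = self.fma.pop()
        if last != addr:          # move the tail entry into the vacated slot
            self.fma[i] = last
            self.rma[last] = i    # keep the reverse mapping consistent
```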

    The intuition behind the PFL is that the FMA allows DataLair to efficiently select and retrieve a free block address that does not contain public data, while the RMA allows efficiently locating the index of a particular free block address in the forward mapping array.
    Public write free block selection. DataLair Full performs public write free block selection as follows. First, the uniform free block selection protocol using the FBM is deployed as described in Section 4.2. More specifically, for a public write, DataLair Full creates a free set and a random set. In addition, the random set can now be built using random block addresses from the PFL, because the blocks that already contain public data (and are thus not part of the PFL) can be trivially excluded as occupied. This is followed by executing the k rounds of the protocol using the combined set. If the protocol yields a free block address, the data is written to that block. Then, using the RMA, DataLair Full determines the location of the block address in the FMA, which is subsequently removed. This allows DataLair Full to claim that the block address was actually selected from the PFL.

If the protocol yields no free blocks, the data is still written to the disk (instead of being added to the in-memory stash). The idea here is to ensure that a public write always translates to a write to the disk. Since modifications to blocks not containing public data (due to a hidden write or DL-ORAM simulation) are attributed to public data writes, writing public data to the stash can result in inconsistent modifications on the disk and violate PD.

To implement this, when the selection protocol does not yield a truly free block for a public write, the public data is instead written to an apparently free block. The hidden data there can then be moved to the stash and written back in a subsequent hidden write. An apparently free block address can be straightforwardly selected from the DL-ORAM position map, which as described before stores the logical-to-physical address mappings for data in the ORAM (hidden data in this case). Subsequently, this block address is removed from the PFL. Note that the PFL necessarily contains this address, since it contains entries for all blocks not containing public data, which also includes blocks that already contain hidden data (apparently free blocks). Thus, a free block address selected using this procedure can also be attributed to being selected from the PFL.

Hidden write free block selection. Hidden writes follow the same procedure as DataLair Lite by invoking the DL-ORAM write protocol. However, unlike public writes, hidden data is still written to the stash if the selection protocol yields no free blocks. Recall that one of the requirements for a bounded DL-ORAM stash is to ensure that half of the ORAM is free. In this case, since DL-ORAM will write to blocks which are shared with public data, it is necessary to ensure that the combined size of public and hidden data is at most half the size of the device. This is achieved by the DataLair device mapper only allowing the user to create logical volumes with size equal to or less than half of the device capacity. Further, on reaching 50% utilization, the device mapper informs the user that the disk is full.

Indistinguishability between the two modes of operation. Recall that in the ONLY_PUB mode, DL-ORAM and the FBM are initialized with random data. Thus, the free block selection mechanism for public data writes described above (using the FBM) cannot be deployed in this case. Fortunately, the PFL can be used instead to efficiently select uniformly random free block addresses. Once the address has been selected and removed from the FMA, the array is compacted.
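The resulting public-write path can be sketched as follows. This is an illustrative in-memory model under our own helper names; the actual selection protocol, disk layout, and position map are as described in Sections 4.2 and 5:

```python
import random

def public_write(data, disk, pfl, selection_protocol, position_map, stash):
    """Illustrative public-write path. A public write must always reach
    the disk; when no truly free block exists, an apparently free block
    (currently holding hidden data) is taken over instead."""
    addr = selection_protocol()                  # k-round protocol of Section 4.2
    if addr is None:
        # Fall back to an apparently free block from the DL-ORAM position map.
        addr = random.choice(list(position_map.values()))
        stash.append(disk[addr])                 # hidden data moves to the stash,
                                                 # rewritten by a later hidden write
    disk[addr] = data
    pfl.discard(addr)                            # claimable as "selected from the PFL"
    return addr

disk = {a: b"free" for a in range(8)}
disk[3] = b"hidden"                              # one block holds hidden data
pfl = set(range(8))                              # all blocks lack public data
position_map, stash = {0: 3}, []

public_write(b"pub1", disk, pfl, lambda: 5, position_map, stash)     # protocol succeeds
public_write(b"pub2", disk, pfl, lambda: None, position_map, stash)  # protocol fails
assert disk[3] == b"pub2" and stash == [b"hidden"]
```

Either way the removed address was in the PFL, so the removal itself reveals nothing about whether the block was truly or apparently free.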

Further, to ensure indistinguishability, the accesses due to the uniform free block selection using the FBM are simulated by re-encrypting the required number of FBM blocks and the required number of random block addresses selected from the PFL.

Fig. 7. Free block selection protocol for public writes in DataLair Full. Step 1 is indistinguishable for the two cases since a simulated access to the FBM is indistinguishable from a real access. Using either the PFL or the free block selection protocol in step 2 provides uniformly random block addresses of blocks that do not contain public data. The entry selected by the selection protocol in step 2 will be at a random index in the FMA, as all entries in the FBM are randomized independently of their locations in the FMA.

In summary (Figure 7), when writing public data in ONLY_PUB mode, a randomly chosen block address is removed from the PFL while simulating the uniform free block selection protocol. In PUB_HID mode, the randomly chosen block address for writing public data is determined using the free block selection protocol while removing its corresponding address from the PFL. The two cases are indistinguishable since an actual access to DL-ORAM and the FBM is indistinguishable from a simulation.

Storing metadata for encryption. Since encryption is performed at the block level and the reverse mapping array contains an entry for each disk block, the IVs/counters for the randomized semantically secure cipher used to encrypt the physical blocks are stored in the reverse mapping array.

Optimization: in-place updates for public data. When a public block is written for the first time (insert), it requires searching for a random free block as described above, but subsequent updates can be made in place, thus avoiding additional accesses for finding free blocks and updating the PPM.

Storing the stash. The in-memory stash is stored to the disk at a graceful power-down. DataLair Full allocates a fixed location to store the constant-sized stash. On power-down, the stash is written encrypted to that location. If the stash is empty or not being used (in the case of ONLY_PUB mode), DataLair Full writes random data instead of the stash for indistinguishability. On boot-up, the stash is read to memory and re-encrypted.

Fig. 8. DataLair Full design with the four main components: DL-ORAM, PPM, PFL and FBM. Hidden data I/O is mapped through DL-ORAM while public data I/O is mapped through the PPM. Public data inserts and updates simulate an access to DL-ORAM. In ONLY_PUB mode, free blocks are located using the PFL.
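The power-down serialization can be sketched as below, assuming the 50-block (200KB) stash region used later in the evaluation; `encrypt` is a stand-in for the randomized semantically secure cipher, and the function name is ours:

```python
import os

STASH_REGION_BYTES = 50 * 4096   # 50 blocks of 4KB, as in the evaluation

def serialize_stash(stash_bytes, encrypt):
    """On graceful power-down, always emit exactly STASH_REGION_BYTES:
    either the encrypted, padded stash, or pure random bytes when the
    stash is empty/unused (e.g., ONLY_PUB mode). The two outputs must
    be indistinguishable to a multi-snapshot adversary."""
    if not stash_bytes:
        return os.urandom(STASH_REGION_BYTES)
    padded = stash_bytes.ljust(STASH_REGION_BYTES, b"\0")
    return encrypt(padded)

assert len(serialize_stash(b"", None)) == STASH_REGION_BYTES
```

The fixed output length is the point: the on-disk region looks identical whether or not a hidden volume (and hence a stash) exists.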

Figure 8 illustrates the DataLair Full design.

Security. DataLair Full derives its security properties straightforwardly from DataLair Lite. First, note that the only difference between the schemes is that public data and hidden data coexist in the same physical address space. Public data is mapped randomly, independent of the locations where hidden data is already stored. This is ensured by the free block selection mechanism. The PPM and the PFL (added in this construction) are public data and do not need to be protected.

Similar to the Lite construction, hidden writes through DL-ORAM are either performed with public writes (inserts and updates) or simulated indistinguishably. This provides HWA indistinguishability. Further, as shown above, the ONLY_PUB mode of operation is indistinguishable from the PUB_HID mode for DataLair Full. This ensures PAT indistinguishability. As shown before, these are the necessary and sufficient conditions for PD.

    6 Evaluation

Implementation. We implemented DataLair Full as a kernel module device mapper; the device mapper is a Linux framework for mapping blocks in logical volumes to physical blocks. The default cipher is AES-256 in counter mode with individual per-block random IVs. Underlying hardware blocks are 512 bytes each, and 8 adjacent hardware blocks constitute a DataLair "physical block". Logical and physical block sizes are 4KB.

Access         dm-crypt   DataLair   HIVE [7]
Public Read    225.56     84.1       0.88
Public Write   210.10     2.00       0.57
Hidden Read    n/a        6.00       5.36
Hidden Write   n/a        2.92       0.60

Table 1. Throughput comparison (MB/s). Higher is better. DataLair performance for public data reads is practical when compared to dm-crypt and almost 100x faster than existing work [7]. For hidden data writes, DataLair is 5x faster.

DataLair was benchmarked with two logical volumes (one public and one hidden) using an ext4 filesystem in ordered mode (metadata journaling) on top. Each volume was allocated a logical size of 25% of the underlying device capacity. This ensures that the combined size of the logical volumes is 50% of the device, thus ensuring that 50% of the device is always free. Throughput was compared against HIVE [7] as well as dm-crypt, a commonly used Linux device mapper for full disk encryption.

Platform. Benchmarks were conducted on Linux boxes with Intel Core i7-3520M processors running at 2.90GHz and 6GB+ of DDR3 DRAM. The storage device was a 1TB Samsung 850 Evo SSD. Logical volume sizes were set to 64GB, while DataLair was built on a 256GB physical partition. Benchmarking was performed using Bonnie++ [1] on Ubuntu 14.04 LTS, kernel version 3.13.6. Benchmarking for HIVE [7] was performed with the same parameters by compiling the open source project [2].

System caches were flushed between tests. All tests were run 3 times and results collected with a 95% confidence interval. DataLair stores all internal data structures persistently. The in-memory stash was constrained to 50 4KB blocks (200KB of data).

Throughput. Table 1 shows the throughput comparison for DataLair, HIVE and dm-crypt. Public reads feature a throughput of about 85MB/s, 100x faster than existing work [7] and only 2.5x slower than dm-crypt. The speedup results from the fact that public reads do not need to use the ORAM. Note that the PPM still needs to be accessed first to determine the physical location of the logical block. This additional synchronous access accounts for the overhead when compared to dm-crypt.

Public writes simulate a DL-ORAM access. The improved write complexity of DL-ORAM compared to HIVE-ORAM [7] results in a 4x speedup. Later we show how to optimize this further for more practical use. Similarly, hidden writes for DataLair are almost 5x faster than HIVE. Hidden reads perform comparably to HIVE, since the overall read complexity is asymptotically the same for DL-ORAM and HIVE-ORAM [7].

Access         dm-crypt   DataLair   HIVE [7]
Public Read    .007       .018       1
Hidden Read    n/a        .10        .19
Public Write   .7         25         332
Hidden Write   n/a        92         219

Table 2. Latency comparison (in seconds). Lower is better. DataLair is 100x faster than HIVE [7] for public reads and almost 10x faster for public writes.

Latency. Table 2 shows the latency comparison for DataLair, HIVE and dm-crypt. Expectedly, DataLair public reads are almost 100x faster than HIVE [7]. It is also interesting to note that DataLair public writes are almost 15x faster than HIVE [7]. This is due to the reduced number of I/Os that need to be performed per access, owing to the better write complexity of DL-ORAM.

Fig. 9. Variations in public write throughput vs. the public-write-to-private-write ratio. The x-axis represents the number of public writes performed between two private writes/simulations. The throughput plateaus at around 12 MB/s, around 6x faster than the configuration where hidden writes are made with each public write. The benefit of performing hidden writes only with updates is visible even in this case.

Writing hidden data with public updates. A straightforward optimization for DataLair is to use only public updates (in-place) for hidden writes/simulations, avoiding the expensive free block selection. In fact, since filesystem block access patterns typically follow a zipfian distribution [16] (only a small group of existing blocks is accessed and updated frequently), an update to a group of already existing public data blocks is more likely than an insert of new data. Also, since filesystems do not indicate deletes to the device, once the public volume is completely occupied, all subsequent operations during the lifetime of the disk will be treated as updates by the device mapper.

Frequency of hidden writes. DataLair features a solution for PD-CPA (Section 3) where the number of hidden operations performed with each public operation is φ = 1. For real-world applications, it is reasonable to assume that a user will access hidden data less often than public data. In that case, φ can be configured according to an estimated workload to improve the public write throughput.
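One way to realize a configurable φ is a simple counter that piggybacks a hidden write or simulation every 1/φ public writes. This is our own sketch of the scheduling logic, not the paper's interface:

```python
class HiddenOpScheduler:
    """Perform one hidden write/simulation every `ratio` public writes.
    ratio = 1 reproduces the base PD-CPA design (phi = 1); ratio = 10
    corresponds to the ~12 MB/s plateau reported in Figure 9."""

    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def on_public_write(self):
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True        # piggyback a hidden write or simulation now
        return False

s = HiddenOpScheduler(3)
assert [s.on_public_write() for _ in range(6)] == [False, False, True] * 2
```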

Figure 9 shows the variations in public write throughput while increasing the public-write-to-private-write ratio. The throughput achieves a maximum of around 12MB/s when hidden writes/simulations are made every 10 public writes (inserts and updates). When hidden writes/simulations are performed only with updates (as described above), the write throughput achieves a maximum of around 30MB/s, around 40x faster than HIVE [7]. Note that since HIVE [7] uses a write-only ORAM for public writes, excluding hidden writes for a fraction of the public writes does not result in significant gains when compared to the overhead of the ORAM. Although DataLair is still 7x slower than dm-crypt (Table 1), the additional PD guarantees over full disk encryption make this acceptable in practice.

7 Conclusion

This work shows that it is not necessary to sacrifice performance to achieve plausible deniability (PD), even in the presence of a powerful multi-snapshot adversary. DataLair is a block device with practical performance and PD assurances, designed around a new efficient write-only ORAM construction. DataLair public data reads are two orders of magnitude faster than existing approaches, while accesses to hidden data are 5 times faster. For more restricted settings, DataLair can achieve public data write performance almost 50x faster than existing work.

Acknowledgement

This work was supported by the National Science Foundation through awards 1161541, 1318572, 1526102, and 152670. We would like to thank our shepherd, Giulia Fanti, and the anonymous reviewers for their suggestions on improving the paper. We also thank Vinay Ganeshmal Jain for helping us implement DataLair.


References

[1] Bonnie++. http://www.coker.com.au/bonnie++.
[2] Hive. http://www.onarlioglu.com/hive.
[3] Plausible Deniability. http://en.wikipedia.org/wiki/Plausible_deniability.
[4] Rubberhose. http://en.wikipedia.org/wiki/Rubberhose_(filesystem).
[5] TrueCrypt. http://truecrypt.sourceforge.net/.
[6] Ross Anderson, Roger Needham, and Adi Shamir. The steganographic file system. pages 43–60, 1998.
[7] Erik-Oliver Blass, Travis Mayberry, Guevara Noubir, and Kaan Onarlioglu. Toward robust hidden volumes using write-only oblivious RAM. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS '14, pages 203–214, New York, NY, USA, 2014. ACM.
[8] Ran Canetti, Cynthia Dwork, Moni Naor, and Rafail Ostrovsky. Deniable encryption. In Proceedings of the 17th Annual International Cryptology Conference on Advances in Cryptology, CRYPTO '97, pages 90–104, London, UK, 1997. Springer-Verlag.
[9] Alexei Czeskis, David J. St. Hilaire, Karl Koscher, Steven D. Gribble, Tadayoshi Kohno, and Bruce Schneier. Defeating encrypted and deniable file systems: TrueCrypt v5.1a and the case of the tattling OS and applications. In Proceedings of the 3rd Conference on Hot Topics in Security, HOTSEC'08, pages 7:1–7:7, Berkeley, CA, USA, 2008. USENIX Association.
[10] Oded Goldreich and Rafail Ostrovsky. Software protection and simulation on oblivious RAMs. J. ACM, 43(3):431–473, May 1996.
[11] Jin Han, Meng Pan, Debin Gao, and HweeHwa Pang. A multi-user steganographic file system on untrusted shared storage. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC '10, pages 317–326, New York, NY, USA, 2010. ACM.
[12] Lichun Li and Anwitaman Datta. Write-only oblivious RAM based privacy-preserved access of outsourced data. Cryptology ePrint Archive, Report 2013/694, 2013. http://eprint.iacr.org/.
[13] Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanovic, John Kubiatowicz, and Dawn Song. Phantom: Practical oblivious computation in a secure processor. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 311–324, New York, NY, USA, 2013. ACM.
[14] Andrew D. McDonald and Markus G. Kuhn. StegFS: A steganographic file system for Linux. In Information Hiding, 1999.
[15] Andrew D. McDonald and Markus G. Kuhn. StegFS: A steganographic file system for Linux. In Proceedings of the Third International Workshop on Information Hiding, IH '99, pages 462–477, London, UK, 2000. Springer-Verlag.
[16] Ranjit Noronha, Xiangyong Ouyang, and Dhabaleswar K. Panda. Designing a high-performance clustered NAS: A case study with pNFS over RDMA on InfiniBand. In Proceedings of the 15th International Conference on High Performance Computing, HiPC'08, pages 465–477, Berlin, Heidelberg, 2008. Springer-Verlag.
[17] HweeHwa Pang, K.-L. Tan, and X. Zhou. StegFS: A steganographic file system. In Data Engineering, 2003. Proceedings. 19th International Conference on, pages 657–667, March 2003.
[18] Timothy Peters, Mark Gondree, and Zachary N. J. Peterson. DEFY: A deniable, encrypted file system for log-structured storage. In 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, California, USA, February 8–11, 2015.
[19] Elaine Shi, T.-H. Hubert Chan, Emil Stefanov, and Mingfei Li. Oblivious RAM with O((log N)^3) worst-case cost.
[20] A. Skillen and M. Mannan. Mobiflage: Deniable storage encryption for mobile devices. Dependable and Secure Computing, IEEE Transactions on, 11(3):224–237, May 2014.
[21] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 299–310, New York, NY, USA, 2013. ACM.
[22] Emil Stefanov, Marten van Dijk, Elaine Shi, Christopher Fletcher, Ling Ren, Xiangyao Yu, and Srinivas Devadas. Path ORAM: An extremely simple oblivious RAM protocol. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS '13, pages 299–310, New York, NY, USA, 2013. ACM.
[23] Yinglei Wang, Wing-kei Yu, S. Q. Xu, E. Kan, and G. E. Suh. Hiding information in flash memory. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 271–285, May 2013.
[24] "How a Syrian refugee risked his life to bear witness to atrocities". https://www.thestar.com/news/world/2012/03/14/how_a_syrian_refugee_risked_his_life_to_bear_witness_to_atrocities.html.
[25] "Former Press Writer Mike Giglio Detained, Beaten and Released During Egypt Protests". http://www.houstonpress.com/news/former-press-writer-mike-giglio-detained-beaten-and-released-during-egypt-protests-6722336.
[26] Anrin Chakraborti, Chen Chen, and Radu Sion. Poster: DataLair: A storage block device with plausible deniability. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pages 1757–1759, New York, NY, USA, 2016. ACM.

    "http://www.coker.com.au/bonnie++" "http://www.onarlioglu.com/hive" "http://en.wikipedia.org/wiki/Plausible_deniability" "http://en.wikipedia.org/wiki/Plausible_deniability" "http://en.wikipedia.org/wiki/Rubberhose_(filesystem)" "http://en.wikipedia.org/wiki/Rubberhose_(filesystem)" "http://truecrypt.sourceforge.net/" http://eprint.iacr.org/"https://www.thestar.com/news/world/2012/03/14/how_a_syrian_refugee_risked_his_life_to_bear_witness_to_atrocities.html" "https://www.thestar.com/news/world/2012/03/14/how_a_syrian_refugee_risked_his_life_to_bear_witness_to_atrocities.html" "https://www.thestar.com/news/world/2012/03/14/how_a_syrian_refugee_risked_his_life_to_bear_witness_to_atrocities.html" "http://www.houstonpress.com/news/former-press-writer-mike-giglio-detained-beaten-and-released-during-egypt-protests-6722336" "http://www.houstonpress.com/news/former-press-writer-mike-giglio-detained-beaten-and-released-during-egypt-protests-6722336" "http://www.houstonpress.com/news/former-press-writer-mike-giglio-detained-beaten-and-released-during-egypt-protests-6722336"


A Proofs

Theorem 1. A storage mechanism is PD-CPA secure iff it guarantees indistinguishability between hidden write access patterns (HWA) and indistinguishability between public operations with and without hidden writes (PAT).

Proof. HWA straightforwardly ensures that even if the same logical locations in Vhid are written by two access patterns in two rounds, their write traces are independent. Otherwise, an adversary could win one of the games by observing the same write traces for the same writes to Vhid in subsequent rounds.

Also, in the absence of PAT, an adversary could win PD-CPA by providing O0 and O1 with the same public data operations but with and without writes to Vhid, respectively. Thus, PAT is a necessary condition for PD-CPA security.

We now show that HWA and PAT are also sufficient for PD-CPA. First, note that HWA ensures that writes to locations in the hidden volume Vhid map to physical locations selected independently of their corresponding logical locations. Logical and physical locations are uncorrelated. An adversary cannot determine the logical locations corresponding to observed modified physical locations.

Second, observe that in the context of the PD-CPA game, PAT ensures that traces due to combined writes to Vhid and Vpub can be attributed to writes corresponding to Vpub only. And, as the writes to Vpub are the same in both sequences, the adversary cannot distinguish between the write traces non-negligibly better than guessing.

Now, consider a PD solution S which provides both HWA and PAT. Also, consider an adversary A that wins PD-CPA against S by selecting two sequences O0 and O1. Since O0 and O1 differ only in the writes to Vhid (writes to Vpub are the same), either of the following holds:

– O0 and O1 contain different writes to Vhid, and they are distinguishable from the corresponding write traces, in direct contradiction to the HWA property of S.

– O0 contains writes to Vhid while O1 does not, and the corresponding write traces are distinguishable. This implies that traces due to combined writes to Vhid and Vpub in O0 are distinguishable from traces with only the same writes to Vpub in O1, in direct contradiction to the PAT property of S.

    Note that ensuring HWA without PAT and vice versa isnot sufficient for PD-CPA since either of the above twocases will allow the adversary to win with non-negligibleadvantage.

    B DL-ORAM Protocols

input : logical block address id
 1  root := determine B+ tree root address from fixed location on disk
 2  depth := log_β(N)
 3  index := β^depth
 4  while not at leaf do
 5      root := child #⌊id/index⌋      // search subtree rooted at root
 6      blk := read physical block corresponding to root
 7      depth := depth - 1
 8      index := index/β
 9  end
10  addr := entry for id in root
11  blk := read block from disk with address addr
return : Decrypt(blk)

Algorithm 1: read_oram(id)
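The index arithmetic of the traversal (steps 2–9 above) can be sketched as follows. This is one plausible reading of the pseudocode, in which the logical id is reduced per level; all disk reads and decryption are elided, and the function name is ours:

```python
def oram_lookup_path(block_id, N, beta):
    """Child indices visited when walking the beta-ary position-map tree
    from root to leaf for logical block `block_id`. Illustrative only."""
    depth = 0
    while beta ** (depth + 1) < N:      # depth = number of levels below the root
        depth += 1
    path = []
    index = beta ** depth
    while index >= 1:
        path.append(block_id // index)  # child taken at this level
        block_id %= index               # descend into that subtree
        index //= beta
    return path

# For beta = 2 and N = 16 the path is just the 4-bit binary expansion of the id.
assert oram_lookup_path(5, 16, 2) == [0, 1, 0, 1]
```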

input : logical block address id, data d
 1  root := determine B+ tree root address from fixed location on disk
 2  depth := log_β(N)
 3  index := β^depth
 4  while not at leaf do
 5      root := child #⌊id/index⌋      // search subtree rooted at root
 6      blk := read physical block corresponding to root
 7      depth := depth - 1
 8      index := index/β
 9  end
10  new_blk_id := find free block for the new write
11  disk.Write(new_blk_id, d)
12  Map.updateMap(id, new_blk_id)

Algorithm 2: write_oram(id, d)


input : logical block address for map node id, physical block address
        where the map node is written new_blk_id
 1  root := determine B+ tree root address from fixed location
 2  if at root then
 3      update new root address at fixed location
 4  else
 5      l := read leaf node for id
 6      id := ID for leaf node l
 7      update l with new mapping for id
 8      write_oram(id, l)

Algorithm 3: Map.updateMap(id, new_blk_id)
