riposte: an anonymous messaging system handling millions ...this paper presents riposte, a new...

arX

iv:1

503.

0611

5v2

[cs.

CR

] 29

Jun

201

5

Riposte: An Anonymous Messaging SystemHandling Millions of Users

Henry Corrigan-Gibbs, Dan Boneh, and David MazièresStanford University

Abstract

This paper presents Riposte, a new system for anonymousbroadcast messaging. Riposte is the first such system,to our knowledge, that simultaneously protects againsttraffic-analysis attacks, prevents anonymous denial-of-service by malicious clients, and scales to million-useranonymity sets. To achieve these properties, Ripostemakes novel use of techniques used in systems for privateinformation retrieval and secure multi-party computation.For latency-tolerant workloads with many more readersthan writers (e.g. Twitter, Wikileaks), we demonstrate thata three-server Riposte cluster can build an anonymity setof 2,895,216 users in 32 hours.

1 Introduction

In a world of ubiquitous network surveillance [7, 35,36, 40, 63], prospective whistleblowers face a dauntingtask. Consider, for example, a government employee whowants to anonymously leak evidence of waste, fraud, orincompetence to the public. The whistleblower couldemail an investigative reporter directly, butpost hocanal-ysis of email server logs could easily reveal the tip-ster’s identity. The whistleblower could contact a re-porter via Tor [28] or another low-latency anonymizingproxy [32, 54, 60, 72], but this would leave the leakervulnerable to traffic-analysis attacks [4, 61, 62]. Thewhistleblower could instead use an anonymous messagingsystem that protects against traffic analysis attacks [15,39, 78], but these systems typically only support rela-tively small anonymity sets (tens of thousands of users,at most). Protecting whistleblowers in the digital age re-quires anonymous messaging systems that provide strongsecurity guarantees, but that also scale to very large net-work sizes.

This is the extended version of a paper by the same name that appearedat theIEEE Symposium on Security and Privacyin May 2015.

In this paper, we present a new system that attempts tomake traffic-analysis-resistant anonymous broadcast mes-saging practical at Internet scale. Our system, called Ri-poste, allows a large number of clients to anonymouslypost messages to a shared “bulletin board,” maintainedby a small set of minimally trusted servers. (As few asthree non-colluding servers are sufficient). Whistleblow-ers could use Riposte as a platform for anonymously pub-lishing Tweet- or email-length messages and could com-bine it with standard public-key encryption to build point-to-point private messaging channels.

While there is an extensive literature on anonymity sys-tems [23,29], Riposte offers a combination of security andscalability properties unachievable with current designs.To the best of our knowledge, Riposte is the only anony-mous messaging system that simultaneously:

1. protects against traffic analysis attacks,2. prevents malicious clients from anonymously exe-

cuting denial-of-service attacks, and3. scales to anonymity set sizes ofmillionsof users, for

certain latency-tolerant applications.

We achieve these three properties in Riposte by adapt-ing three different techniques from the cryptography andprivacy literature. First, we defeat traffic-analysis attacksand protect against malicious servers by using a protocol,inspired by client/server DC-nets [15, 78], in which ev-ery participating client sends a fixed-length secret-sharedmessage to the system’s servers in every time epoch. Sec-ond, we achieve efficient disruption resistance by using asecure multi-party protocol to quickly detect and excludemalformed client requests [30, 42, 79]. Third, we achievescalability by leveraging a specific technique developed inthe context of private information retrieval (PIR) to min-imize the number of bits each client must upload to eachserver in every time epoch. The tool we use is called adistributed point function[17, 38]. The novel synthesisof these techniques leads to a system that is efficient (interms of bandwidth and computation) and practical, evenfor large anonymity sets.

1

http://arxiv.org/abs/1503.06115v2

Our particular use of private information retrieval (PIR)protocols is unusual: PIR systems [18] allow a client toefficiently read a row from a database, maintained collec-tively at a set of servers, without revealing to the serverswhich row it is reading. Riposte achieves scalable anony-mous messaging by running a private information re-trieval protocolin reverse: with reverse PIR, a Riposteclient can efficientlywrite into a database maintained atthe set of servers without revealing to the servers whichrow it has written [68].

As we discuss later on, a large Riposte deploymentcould form the basis for an anonymous Twitter service.Users would “tweet” by using Riposte to anonymouslywrite into a database containing all clients’ tweets for aparticular time period. In addition, by having read-onlyusers submit “empty” writes to the system, the effectiveanonymity set can be much larger than the number of writ-ers, with little impact on system performance.

Messaging in Riposte proceeds in regulartime epochs(e.g., each time epoch could be one hour long). To posta message, the client generates awrite request, crypto-graphically splits it into many shares, and sends one shareto each of the Riposte servers. A coalition of serverssmaller than a certain threshold cannot learn anythingabout the client’s message or write location given its sub-set of the shares.

The Riposte servers collect write requests until the endof the time epoch, at which time they publish the aggrega-tion of the write requests they received during the epoch.From this information, anyone can recover the set of postsuploaded during the epoch, but the system reveals no in-formation about who posted which message. The identityof the entire set of clients who posted during the interval isknown, but no one can link a client to a post. (Thus, eachtime epoch must be long enough to ensure that a largenumber of honest clients are able to participate in eachepoch.)

In this paper, we describe two Riposte variants, whichoffer slightly different security properties. The first vari-ant scales to very large network sizes (millions of clients)but requires three servers such that no two of these serverscollude. The second variant is more computationally ex-pensive, but provides security even when all but one of thes> 1 servers are malicious. Both variants maintain theirsecurity properties when network links are actively adver-sarial, when all but two of the clients are actively mali-cious, and when the servers are actively malicious (subjectto the non-collusion requirement above).

The three-server variant uses a computationally inex-pensive multi-party protocol to detect and exclude mal-formed client requests. (Figure 1 depicts this protocol at

a high-level.) Thes-server variant uses client-producedzero-knowledge proofs to guarantee the well-formednessof client requests.

Unlike Tor [28] and other low-latency anonymity sys-tems [39, 49, 54, 72], Riposte protects against active traf-fic analysis attacks by a global network adversary. Priorsystems have offered traffic-analysis-resistance only at thecost of scalability:• Mix-net-based systems [16] require large zero-

knowledge proofs of correctness to provide privacyin the face of active attacks by malicious servers [2,5,33,46,66].• DC-nets-based systems require clients to transfer

datalinear in the size of the anonymity set [15, 78]and rely on expensive zero-knowledge proofs to pro-tect against malicious clients [21,45].

We discuss these systems and other prior work in Sec-tion 7.

Experiments. To demonstrate the practicality of Ri-poste for anonymous broadcast messaging (i.e., anony-mous whistleblowing or microblogging), we implementedand evaluated the complete three-server variant of the sys-tem. When the servers maintain a database table largeenough to fit 65,536 160-byte Tweets, the system can pro-cess 32.8 client write requests per second. In Section 6.3,we discuss how to use a table of this size as the basisfor very large anonymity sets in read-heavy applications.When using a larger 377 MB database table (over 2.3 mil-lion 160-byte Tweets), a Riposte cluster can process 1.4client write requests per second.

Writing into a 377 MB table requires each client toupload less than 1 MB of data to the servers. In con-trast, a two-server DC-net-based system would requireeach client to upload more than 750 MB of data. Moregenerally, to process a Riposte client request for a table ofsizeL, clients and servers perform onlyO(

√L) bytes of

data transfer.The servers’ AES-NI encryption throughput limits the

rate at which Riposte can process client requests at largetable sizes. Thus, the system’s capacity to handle clientwrite request scales with the number of available CPUcores. A large Riposte deployment could shard thedatabase table acrossk machines to achieve a near-k-foldspeedup.

We tested the system with anonymity set sizes of upto 2,895,216 clients, with a read-heavy latency-tolerantmicroblogging workload. To our knowledge, this is thelargest anonymity setever constructedin a system defend-ing against traffic analysis attacks. Prior DC-net-basedsystems scaled to 5,120 clients [78] and prior verifiable-shuffle-based systems scaled to 100,000 clients [5]. In

2

(a) A client submits one shareof its write request to each ofthe two database servers. If thedatabase has lengthL, each sharehas lengthO(

√L).

(b) The database servers gen-erate blinded “audit request”messages derived from theirshares of the write request.

(c) The audit serveruses the audit requestmessages to validatethe client’s requestand returns an “OK”or “Invalid” bit to thedatabase servers.

(d) The servers apply thewrite request to their localdatabase state. The XOR ofthe servers’ states containsthe clients message at thegiven row.

Figure 1: The process of handling a single client write request. The servers run this process once per client in eachtime epoch.

contrast, Riposte scales to millions of clients for certainapplications.

Contributions. This paper contributes:• two new bandwidth-efficient and traffic-analysis-

resistant anonymous messaging protocols, obtainedby running private information retrieval protocols “inreverse” (Sections 3 and 4),• a fast method for excluding malformed client re-

quests (Section 5),• a method to recover from transmission collisions in

DC-net-style anonymity systems,• experimental evaluation of these protocols with

anonymity set sizes of up to 2,895,216 users (Sec-tion 6).

In Section 2, we introduce our goals, threat model, andsecurity definitions. Section 3 presents the high-level sys-tem architecture. Section 4 and Section 5 detail our tech-niques for achieving bandwidth efficiency and disruptionresistance in Riposte. We evaluate the performance of thesystem in Section 6, survey related work in Section 7, andconclude in Section 8.

2 Goals and Problem Statement

In this section, we summarize the high-level goals of theRiposte system and present our threat model and securitydefinitions.

2.1 System Goals

Riposte implements an anonymous bulletin board usinga primitive we call awrite-private database scheme. Ri-poste enables clients to write into a shared database, col-lectively maintained at a small set of servers, without re-vealing to the servers the location or contents of the write.

Conceptually, the database table is just a long fixed-lengthbitstring divided into fixed-length rows.

To write into the database, a client generates awrite re-quest. The write request encodes the message to be writ-ten and the row index at which the client wants to write.(A single client write request modifies a single databaserow at a time.) Using cryptographic techniques, the clientsplits its write request into a number of shares and theclient sends one share to each of the servers. By construc-tion of the shares, no coalition of servers smaller than aparticular pre-specified threshold can learn the contents ofa single client’s write request. While the cluster of serversmust remain online for the duration of a protocol run, aclient need only stay online for long enough to upload itswrite request to the servers. As soon as the servers receivea write request, they can apply it to to their local state.

The Riposte cluster divides time into a series of epochs.During each time epoch, servers collect many write re-quests from clients. When the servers agree that the epochhas ended, they combine their shares of the database to re-veal the clients’ plaintext messages. A particular client’sanonymity set consists of all of the honest clients whosubmitted write requests to the servers during the timeepoch. Thus, if 50,000 distinct honest clients submittedwrite requests during a particular time epoch, each honestclient is perfectly anonymous amongst this set of 50,000clients.

The epoch could be measured in time (e.g., 4 hours), ina number of write requests (e.g., accumulate 10,000 writerequests before ending the epoch), or by some more com-plicated condition (e.g., wait for a write request signedfrom each of these 150 users identified by a pre-definedlist of public keys). The definition of what constitutes anepoch is crucial for security, since a client’s anonymityset is only as large as the number of honest clients whosubmit write requests in the same epoch [74].

3

When using Riposte as a platform for anonymous mi-croblogging, the rows would be long enough to fit aTweet (140 bytes) and the number of rows would besome multiple of the number of anticipated users. Toanonymously Tweet, a client would use the write-privatedatabase scheme to write its message into a random rowof the database. After many clients have written tothe database, the servers can reveal the clients’ plain-text Tweets. The write-privacy of the database schemeprevents eavesdroppers, malicious clients, and coalitionsof malicious servers (smaller than a particular threshold)from learning which client posted which message.

2.2 Threat Model

Clients in our system arecompletely untrusted: they maysubmit maliciously formed write requests to the systemand may collude with servers or with arbitrarily manyother clients to try to break the security properties of thesystem.

Servers in our system are trusted for availability. Thefailure—whether malicious or benign—of any one serverrenders the database state unrecoverable butdoes notcompromise the anonymity of the clients. To protectagainst benign failures, server maintainers could imple-ment a single “logical” Riposte server with a cluster ofmany physical servers running a standard state-machine-replication protocol [55,67].

For each of the cryptographic instantiations of Riposte,there is a threshold parametert that defines the number ofmalicious servers that the system can tolerate while stillmaintaining its security properties. We make no assump-tions about the behavior of malicious servers—they canmisbehave by publishing their secret keys, by colludingwith coalitions of up tot malicious servers and arbitrar-ily many clients, or by mounting any other sort of attackagainst the system.

The thresholdt depends on the particular cryptographicprimitives in use. For our most secure scheme,all but oneof the servers can collude without compromising clientprivacy (t = |Servers|−1). For our most efficient scheme,no twoservers can collude (t = 1).

2.3 Security Goals

The Riposte system implements awrite-private anddisruption-resistantdatabase scheme. We describe thecorrectness and security properties for such a schemehere.

Definition 1 (Correctness). The scheme iscorrectif, whenall servers execute the protocol faithfully, the plaintext

state of the database revealed at the end of a protocol runis equal to the result of applying each valid client writerequests to an empty database (i.e., a database of all ze-ros).

Since we rely on all servers for availability, correctnessneed only hold when all servers run the protocol correctly.

To be useful as an anonymous bulletin board, thedatabase scheme must bewrite-privateanddisruption re-sistant. We define these security properties here.

(s, t)-Write Privacy. Intuitively, the system provides(s, t)-write-privacy if an adversary’s advantage at guess-ing which honest client wrote into a particular row of thedatabase is negligibly better than random guessing, evenwhen the adversary controls all but two clients and up tot out of s servers (wheret is a parameter of the scheme).We define this property in terms of aprivacy game, givenin full in Appendix A.

Definition 2 ((s, t)-Write Privacy). We say that the proto-col provides(s, t)-write privacyif the adversary’s advan-tage in the security game of Appendix A is negligible inthe (implicit) security parameter.

Riposte provides a very robust sort of privacy: the ad-versary can select the messages that the honest clients willsend and can send maliciously formed messages that de-pend on the honest clients’ messages. Even then, the ad-versary still cannot guess which client uploaded whichmessage.

Disruption resistance.The system isdisruption resistantif an adversary who controlsn clients can write into atmostn database rows during a single time epoch. A sys-tem that lacks disruption resistance might be susceptibleto denial-of-service attacks: a malicious client could cor-rupt every row in the database with a single write request.Even worse, the write privacy of the system might preventthe servers from learning which client was the disruptor.Preventing such attacks is a major focus of prior anony-mous messaging schemes [15, 39, 45, 76, 78]. Under ourthreat model, we trust all servers for availability of thesystem (though not for privacy). Thus, our definition ofdisruption resistance concerns itself only with clients at-tempting to disrupt the system—wedo nottry to preventservers from corrupting the database state.

We formally definedisruption resistanceusing the fol-lowing game, played between a challenger and an adver-sary. In this game, the challenger plays the role of all ofthe servers and the adversary plays the role of all clients.

1. The adversary sendsn write requests to the chal-lenger (wheren is less than or equal to the numberof rows in the database).

4

2. The challenger runs the protocol for a single timeepoch, playing the role of the servers. The challengerthen combines the servers’ database shares to revealthe plaintext output.

The adversary wins the game if the plaintext outputcontains more thann non-zero rows.

Definition 3 (Disruption Resistance). We say that the pro-tocol isdisruption resistantif the probability that the ad-versary wins the game above is negligible in the (implicit)security parameter.

2.4 Intersection Attacks

Riposte makes it infeasible for an adversary to determinewhich client posted which messagewithin a particulartime epoch. If an adversary can observe traffic patternsacrossmany epochs, as the set of online clients changes,the adversary can make statistical inferences about whichclient is sending which stream of messages [25, 52, 57].These “intersection” or “statistical disclosure” attacksaf-fect many anonymity systems and defending against themis an important, albeit orthogonal, problem [57,77]. Evenso, intersection attacks typically become more difficult tomount as the size of the anonymity set increases, so Ri-poste’s support for very large anonymity sets makes it lessvulnerable to these attacks than are many prior systems.

3 System Architecture

As described in the prior section, a Riposte deploymentconsists of a small number of servers, who maintain thedatabase state, and a large number of clients. To writeinto the database, a client splits its write request using se-cret sharing techniques and sends a single share to eachof the servers. Each server updates its database state us-ing the client’s share. After collecting write requests frommany clients, the servers combine their shares to revealthe plaintexts represented by the write requests. The secu-rity requirement is that no coalition oft servers can learnwhich client wrote into which row of the database.

3.1 A First-Attempt Construction: Toy Pro-tocol

As a starting point, we sketch a simple “straw man”construction that demonstrates the techniques behind ourscheme. This first-attempt protocol shares some designfeatures with anonymous communication schemes basedon client/server DC-nets [15,78].

In the simple scheme, we have two servers,A andB,and each server stores anL-bit bitstring, initialized toall zeros. We assume for now that the serversdo notcollude—i.e., that one of the two servers is honest. Thebitstrings represent shares of the database state and each“row” of the database is a single bit.

Consider a client who wants to write a “1” into rowℓof the database. To do so, the client generates a randomL-bit bitstringr. The client sendsr to serverA andr⊕eℓto serverB, whereeℓ is anL-bit vector of zeros with a oneat indexℓ and⊕ denotes bitwise XOR. Upon receivingthe write request from the client, each server XORs thereceived string into its share of the database.

After processingn write requests, the database state atserverA will be:

dA = r1⊕·· ·⊕ rn

and the database at serverB will be:

dB = (eℓ1⊕·· ·⊕eℓn)⊕ (r1⊕·· ·⊕ rn)

= (eℓ1⊕·· ·⊕eℓn)⊕dA

At the end of the time epoch, the servers can revealthe plaintext database by combining their local statesdA anddB.

The construction generalizes to fields larger thanF2.For example, each “row” of the database could be ak-bitbitstring instead of a single bit. To prevent impersonation,network-tampering, and replay attacks, we use authenti-cated and encrypted channels with per-message noncesbound to the time epoch identifier.

This protocol satisfies the write-privacy property aslong as the two servers do not collude (assuming that theclients and servers deploy the replay attack defenses men-tioned above). Indeed, serverA can information theoreti-cally simulate its view of a run of the protocol given onlyeℓ1⊕·· ·⊕eℓn as input. A similar argument shows that theprotocol is write-private with respect to serverB as well.

This first-attempt protocol has two major limitations.The first limitation is that it is not bandwidth-efficient. Ifmillions of clients want to use the system in each timeepoch, then the database must be at least millions of bitsin length. To flip asingle bit in the database then, eachclient must sendmillions of bitsto each database, in theform of a write request.

The second limitation is that it is not disruption resis-tant: a malicious client can corrupt the entire databasewith a single malformed request. To do so, the maliciousclient picks randomL-bit bitstringsr and r ′, sendsr toserverA, and sendsr ′ (instead ofr⊕eℓ) to serverB. Thus,

5

a single malicious client can efficiently and anonymouslydeny service to all honest clients.

Improving bandwidth efficiency and adding disruptionresistance are the two core contributions of this work, andwe return to them in Sections 4 and 5.

3.2 Collisions

Putting aside the issues of bandwidth efficiency and dis-ruption resistance for the moment, we now discuss the is-sue ofcolliding writes to the shared database. If clientswrite into random locations in the database, there is somechance that one client’s write request will overwrite a pre-vious client’s message. If client A writes messagemA intolocationℓ, client B might later write messagemB into thesame locationℓ. In this case, rowℓ will containmA⊕mB,and the contents of rowℓ will be unrecoverable.

To address this issue, we set the size of the database ta-ble to be large enough to accommodate the expected num-ber of write requests for a given “success rate.” For exam-ple, the servers can choose a table size that is large enoughto accommodate 210 write requests such that 95% of writerequests will not be involved in a collision (in expecta-tion). Under these parameters, 5% of the write requestswill fail and those clients will have to resubmit their writerequests in a future time epoch.

We can determine the appropriate table size by solvinga simple “balls and bins” problem. If we throwm ballsindependently and uniformly at random inton bins, howmany bins contain exactly one ball? Here, them ballsrepresent the write requests and then bins represent therows of the database.

Let Bi j be the probability that balli falls into bin j. For

all i and j, Pr[Bi j ] = 1/n. LetO(1)i be the event thatexactly

one ball falls into bini. Then

Pr[

O(1)i

]

=mn

(

1− 1n

)m−1

Expanding using the binomial theorem and ignoring loworder terms we obtain

Pr[

O(1)i

]

≈ mn−(m

n

)2+

12

(mn

)3

where the approximation ignores terms of order(m/n)4

ando(1/n). Thenn ·Pr[O(1)i ] is the expected number of

bins with exactly one ball which is the expected numberof messages successfully received. Dividing this quantityby mgives the expected success rate so that:

E[SuccessRate] =nm

Pr[O(1)i ]≈ 1−m

n+

12

(mn

)2

So, if we want an expected success rate of 95% then weneedn≈ 19.5m. For example, withm= 210 writers, wewould use a table of sizen≈ 20,000.

Handling collisions. We can shrink the table sizen bycoding the writes so that we can recover from collisions.We show how to handle two-way collisions. That is,when at most two clients write to the same location in thedatabase. Let us assume that the messages being writtento the database are elements in some fieldF of odd char-acteristic (sayF = Fp wherep = 264−59). We replacethe XOR operation used in the basic scheme by additionin F.

To recover from a two-way collision we will need todouble the size of each cell in the database, but the overallnumber of cellsn will shrink by more than a factor of two.

When a clientA wants to write the messagemA ∈ F

to locationℓ in the database the client will actually writethe pair(mA,m2

A) ∈ F2 into that location. Clearly if nocollision occurs at locationℓ then recoveringmA at the endof the epoch is trivial: simply drop the second coordinate(it is easy to test that no collision occurred because thesecond coordinate is a square of the first). Now, supposea collision occurs with some clientB who also added herown message(mB,m2

B) ∈ F2 to the same locationℓ (andno other client writes to locationℓ). Then at the end of theepoch the published values are

S1=mA+mB (mod p) and S2=m2A+m2

B (mod p)

From these values it is quite easy to recover bothmA andmB by observing that

2S2−S21 = (mA−mB)

2 (mod p)

from which we obtainmA−mB by taking a square rootmodulop (it does not matter which of the two square rootswe use—they both lead to the same result). SinceS1 =mA+mB is also given it is now easy to recover bothmA

andmB.Now that we can recover from two-way collisions we

can shrink the number of cellsn in the table. LetO(2)i be

the event that exactly two balls fell into bini. Then theexpected number of received messages is

nPr[O(1)i ]+2nPr[O(2)

i ] (1)

where Pr[O(2)i ] =

(m2

)

1n2

(

1− 1n

)m−2. As before, dividing

the expected number of received messages (1) bym, ex-panding using the binomial theorem, and ignoring low or-der terms gives the expected success rate as:

E[SuccessRate]≈ 1− 12

(mn

)2+

13

(mn

)3

6

So, if we want an expected success rate of 95% we needa table withn ≈ 2.7m cells. This is a far smaller tablethan before, when we could not handle collisions. In thatcase we neededn≈ 19.5m which results in much biggertables, despite each cell being half as big. Shrinking thetable reduces the storage and computational burden on theservers.

This two-way collision handling technique generalizesto handlek-way collisions fork > 2. To handlek-waycollisions, we increase the size of each cell by a factor ofk and have each clienti write (mi ,m2

i , . . . ,mki ) ∈ Fk to its

chosen cell. Ak-collision givesk equations ink variablesthat can be efficiently solved to recover allk messages,as long as the characteristic ofF is greater thank. Usingk> 2 further reduces the table size as the desired successrate approaches one.

The collision handling method described in this sectionwill also improve performance of our full system, whichwe describe in the next section.

Adversarial collisions. The analysis above assumes thatclients behave honestly. Adversarial clients, however,need not write into random rows of the database—i.e., allmballs might not be thrown independently and uniformlyat random. A coalition of clients might, for example, tryto increase the probability of collisions by writing into thedatabase using some malicious strategy.

By symmetry of writes we can assume that all ˆmadver-sarial clients write to the database before the honest clientsdo. Now a message from an honest client is properly re-ceived at the end of an epoch if it avoids all the cells filledby the malicious clients. We can therefore carry out thehonest client analysis above assuming the database con-tain n− mcells instead ofn cells. In other words, given aboundm on the number of malicious clients we can cal-culate the required table sizen. In practice, if too manycollisions are detected at the end of an epoch the serverscan adaptively double the size of the table so that the nextepoch has fewer collisions.

3.3 Forward Security

Even the first-attempt scheme sketched in Section 3.1 pro-videsforward securityin the event that all of the servers’secret keys are compromised [14]. To be precise: an ad-versary could compromise the state and secret keys ofallserversafter the servers have processedn write requestsfrom honest clients, butbeforethe time epoch has ended.Even in this case, the adversary will be unable to deter-mine which of thenclients submitted which of then plain-text messages with a non-negligible advantage over ran-dom guessing. (We assume here that clients and servers

communicate using encrypted channels which themselveshave forward secrecy [51].)

This forward security property means that clients neednot trust thatS− t servers stay honest forever—just thatthey are honest at the moment when the client submits itsupload request. Being able to weaken the trust assumptionabout the servers in this way might be valuable in hostileenvironments, in which an adversary could compromise aserver at any time without warning.

Mix-nets do not have this property, since servers mustaccumulate a set of onion-encrypted messages beforeshuffling and decrypting them [16]. If an adversary al-ways controls the first mix server and if it can compro-mise the rest of the mix servers after accumulating a setof ciphertexts, the adversary can de-anonymize all of thesystem’s users. DC-net-based systems that use “blame”protocols to retroactively discover disruptors have a simi-lar weakness [20,78].

The full Riposte protocol maintains this forward secu-rity property.

4 Improving Bandwidth Efficiencywith Distributed Point Functions

This section describes how application of private informa-tion retrieval techniques can improve the bandwidth effi-ciency of the first-attempt protocol.

Notation. The symbolF denotes an arbitrary finite field,ZL is the ring of integers moduloL. We useeℓ ∈ F

L torepresent a vector that is zero everywhere except at indexℓ∈ ZL, where it has value “1.” Thus, form∈ F, the vectorm· eℓ ∈ FL is the vector whose value is zero everywhereexcept at indexℓ, where it has valuem. For a finite setS,the notationx←R S indicates that the value ofx is sam-pled independently and uniformly at random fromS. Theelementv[i] is the value of a vectorv at indexi. We indexvectors starting at zero.

4.1 Definitions

The bandwidth inefficiency of the protocol sketchedabove comes from the fact that the client must send anL-bit vector to each server to flip a single bit in the logicaldatabase. To reduce thisO(L) bandwidth overhead, weapply techniques inspired by private information retrievalprotocols [17,18,38].

The problem of private information retrieval (PIR) isessentially the converse of the problem we are interestedin here. In PIR, the client mustreada bit from a replicateddatabase without revealing to the servers the index being

7

read. In our setting, the client mustwrite a bit into a repli-cated database without revealing to the servers the indexbeing written. Ostrovsky and Shoup first made this con-nection in the context of a “private information storage”protocol [68].

PIR schemes allow the client to split its query to theservers into shares such that (1) a subset of the shares doesnot leak information about the index of interest, and (2)the length of the query shares is much less than the lengthof the database. The core building block of many PIRschemes, which we adopt for our purposes, is adistributedpoint function. Although Gilboa and Ishai [38] defineddistributed point functions as a primitive only recently,many prior PIR schemes make implicit use the primi-tive [17,18]. Our definition of a distributed point functionfollows that of Gilboa and Ishai, except that we generalizethe definition to allow for more than two servers.

First, we define a (non-distributed) point function.

Definition 4 (Point Function). Fix a positive integer L anda finite fieldF. For all ℓ∈ZL and m∈F, thepoint functionPℓ,m : ZL → F is the function such that Pℓ,m(ℓ) = m andPℓ,m(ℓ′) = 0 for all ℓ 6= ℓ′.

That is, the point functionPℓ,m has the value 0 whenevaluated at any input not equal toℓ and it has the valuemwhen evaluated atℓ. For example, ifL= 5 andF= F2, thepoint functionP3,1 takes on the values(0,0,0,1,0) whenevaluated on the values(0,1,2,3,4) (note that we indexvectors from zero).

An (s, t)-distributed point function provides a way todistribute a point functionPℓ,m amongstsservers such thatno coalition of at mostt servers learns anything aboutℓ orm given theirt shares of the function.

Definition 5 (Distributed Point Function (DPF)). Fix apositive integer L and a finite fieldF. An (s, t)-distributedpoint functionconsists of a pair of possibly randomizedalgorithms that implement the following functionalities:• Gen(ℓ,m)→ (k0, . . . ,ks−1). Given an integerℓ ∈ ZL

and value m∈ F, output a list of s keys.• Eval(k, ℓ′)→m′. Given a key k generated usingGen,

and an indexℓ′ ∈ ZL, return a value m′ ∈ F.

We define correctness and privacy for a distributedpoint function as follows:• Correctness. For a collection ofs keys generated

usingGen(ℓ,m), the sum of the outputs of these keys(generated usingEval) must equal the point functionPℓ,m. More formally, for allℓ,ℓ′ ∈ ZL andm∈ F:

Pr[(k0, . . . ,ks−1)← Gen(ℓ,m) :

Σs−1i=0Eval(ki , ℓ

′) = Pℓ,m(ℓ′)] = 1

where the probability is taken over the randomnessof theGen algorithm.• Privacy. Let S be any subset of{0, . . . ,s− 1} such

that|S| ≤ t. Then for anyℓ∈ZL andm∈F, letDS,ℓ,m

denote the distribution of keys{(ki) | i ∈ S} inducedby (k0, . . . ,ks−1)← Gen(ℓ,m). We say that an(s, t)-DPF maintains privacy if there exists a p.p.t. algo-rithm Sim such that the following distributions arecomputationally indistinguishable:

DS,ℓ,m≈c Sim(S)

That is, any subset of at mostt keys leaks no informa-tion aboutℓ or m. (We can also strengthen this defi-nition to require statistical or perfect indistinguisha-bility.)

Toy Construction. To make this definition concrete, wefirst construct a trivial information-theoretically secure(s,s−1)-distributed point function with length-L keys.As above, we fix a lengthL and a finite fieldF.• Gen(ℓ,m)→ (k0, . . . ,ks−1). Generate random vec-

torsk0, . . . ,ks−2 ∈ FL. Setks−1 = m·eℓ−Σs−2i=0ki .

• Eval(k, ℓ′)→m′. Interpretk as a vector inFL. Returnthe value of the vectork at indexℓ′.

The correctness property of this construction follows im-mediately. Privacy is maintained because the distributionof any collection ofs−1 keys is independent ofℓ andm.

This toy construction uses length-L keys to distribute apoint function with domainZL. Later in this section wedescribe DPF constructions which use much shorter keys.

4.2 Applying Distributed Point Functionsfor Bandwidth Efficiency

We can now use DPFs to improve the efficiency of thewrite-private database scheme introduced in Section 3.1.We show that the existence of an(s, t)-DPF with keysof length|k| (along with standard cryptographic assump-tions) implies the existence of write-private databasescheme usings servers that maintains anonymity in thepresence oft malicious servers, such that write requestshave lengths|k|. Any DPF construction with short keysthus immediately implies a bandwidth-efficient write-private database scheme.

The construction is a generalization of the one pre-sented in Section 3.1. We now assume that there aresservers such that no more thant of them collude. Eachof thes servers maintains a vector inFL as their databasestate, for some fixed finite fieldF and integerL. Each“row” in the database is now an element ofF and thedatabase hasL rows.

8

When the client wants to write a messagem∈ F intolocationℓ ∈ ZL in the database, the client uses an(s, t)-distributed point function to generate a set ofsDPF keys:

(k0, . . . ,ks−1)← Gen(ℓ,m)

The client then sends one of the keys to each of theservers. Each serveri can then expand the key into avectorv ∈ F

L by computingv(ℓ′) = Eval(ki , ℓ′) for ℓ′ =

0, . . . ,L− 1. The server then adds this vectorv into itsdatabase state, using addition inFL. At the end of thetime epoch, all servers combine their database states toreveal the set of client-submitted messages.

Correctness.The correctness of this construction followsdirectly from the correctness of the DPF. For each of then write requests submitted by the clients, denote thej-thkey in thei-th request aski, j , denote the write location asℓi , and the message being written asmi . When the serverscombine their databases at the end of the epoch, the con-tents of the final database at rowℓ will be:

dℓ =n−1

∑i=0

s−1

∑j=0

Eval(ki, j , ℓ) =n−1

∑i=0

Pℓi ,mi (ℓ) ∈ F

In words: as desired, the combined database contains thesum ofn point functions—one for each of the write re-quests.

Anonymity. The anonymity of this construction followsdirectly from the privacy property of the DPF. Given theplaintext database stated (as defined above), any coali-tion of t servers can simulate its view of the protocol. Bydefinition of DPF privacy, there exists a simulatorSim,which simulates the distribution of any subset oft DPFkeys generated usingGen. The coalition of servers canuse this simulator to simulate each of then write requestsit sees during a run of the protocol. Thus, the servers cansimulate their view of a protocol run and cannot win theanonymity game with non-negligible advantage.

Efficiency. A client in this scheme sends|k| bits to eachserver (wherek is a DPF key), so the bandwidth efficiencyof the scheme depends on the efficiency of the DPF. Aswe will show later in this section,|k| can be much smallerthan the length of the database.

4.3 A Two-Server Scheme Tolerating OneMalicious Server

Having established that DPFs with short keys lead tobandwidth-efficient write-private database schemes, wenow present one such DPF construction. This construc-tion is a simplification of computational PIR scheme ofChor and Gilboa [17].

This is a (2,1)-DPF with keys of lengthO(√

L) op-erating on a domain of sizeL. This DPF yields a two-server write-private database scheme tolerating one ma-licious server such that writing into a database of sizeLrequires sendingO(

√L) bits to each server. Gilboa and

Ishai [38] construct a(2,1)-DPF with even shorter keys(|k| = polylog(L)), but the construction presented here isefficient enough for the database sizes we use in practice.Although the DPF construction works over any field, wedescribe it here using the binary fieldF= F2k (the field ofk-bit bitstrings) to simplify the exposition.

WhenEval(k, ℓ′) is run on every integerℓ′ ∈ {0, . . . ,L−1}, its output is a vector ofL field elements. The DPFkey construction conceptually works by representing thisa vector ofL field elements as anx× y matrix, such thatxy≥ L. The trick that makes the construction work is thatthe size of the keys needs only to grow with the size ofthesidesof this matrix rather than itsarea. The DPF keysthatGen(ℓ,m) outputs give an efficient way to constructtwo matricesMA andMB that differ only at one cellℓ =(ℓx, ℓy) ∈ Zx×Zy (Figure 2).

Fix a binary finite fieldF = F2k, a DPF domain sizeL, and integersx andy such thatxy≥ L. (Later in thissection, we describe how to choosex andy to minimizethe key size.) The construction requires a pseudo-randomgenerator (PRG)G that stretches seeds from some spaceS

into length-y vectors of elements ofF [48]. So the signa-ture of the PRG isG : S→ Fy. In practice, an implemen-tation might use AES-128 in counter mode as the pseudo-random generator [65].

The algorithms comprising the DPF are:• Gen(ℓ,m)→ (kA,kB). Compute integersℓx ∈ Zx andℓy ∈ Zy such thatℓ= ℓxy+ ℓy. Sample a random bit-vectorbA←R {0,1}x, a random vector of PRG seedssA←R Sx, and a single random PRG seeds∗ℓx

←R S.GivenbA andsA, we definebB andsB as:

bA = (b0, . . . ,bℓx, . . . ,bx−1)

bB = (b0, . . . , bℓx, . . . ,bx−1)

sA = (s0, . . . ,sℓx, . . . ,sx−1)

sB = (s0, . . . ,s∗ℓx, . . . ,sx−1)

That is, the vectorsbA andbB (similarly sA andsB)differ only at indexℓx.Letm·eℓy be the vector inFy of all zeros except that ithas valuemat indexℓy. Definev←m·eℓy +G(sℓx)+G(s∗ℓx

).The output DPF keys are:

kA = (bA,sA,v) kB = (bB,sB,v)

• Eval(k, ℓ′)→ m′. Interpretk as a tuple(b,s,v). Toevaluate the PRF at indexℓ′, first write ℓ′ as an

9

Figure 2: Left: We represent the output ofEval as anx× y matrix of field elements. Left-center: Construction of thev vector used in the DPF keys. Right: using thev, s, andb vectors,Eval expands each of the two keys into anx× ymatrix of field elements. These two matrices sum to zero everywhere except at(ℓx, ℓy) = (3,4), where they sum tom.

(ℓ′x, ℓ′y) tuple such thatℓ′x ∈ Zx, ℓ′y ∈ Zy, and ℓ′ =

ℓ′xy+ ℓ′y. Use the PRGG to stretch theℓ′x-th seedof s into a length-y vector: g← G(s[ℓ′x]). Returnm′← (g[ℓ′y]+b[ℓ′x]v[ℓ

′y]).

Figure 2 graphically depicts howEval stretches the keysinto a table ofx× y field elements.

Correctness.We prove correctness of the scheme in Ap-pendix B.

Privacy. The privacy property requires that there existsan efficient simulator that, on input “A” or “ B,” outputssamples from a distribution that is computationally indis-tinguishable from the distribution of DPF keyskA or kB.

The simulatorSim simulates each component of theDPF key as follows: It samplesb←R {0,1}x, s←R Sx,andv←R Fy. The simulator returns(b,s,v).

We must now argue that the simulator’s output distri-bution is computationally indistinguishable from that in-duced by the distribution of a single output ofGen. Sincethe b and s vectors outputted byGen are random, thesimulation is perfect. Thev vector outputted byGen iscomputationally indistinguishable from random, since itis padded with the output of the PRG seeded with a seedunknown to the holder of the key. An efficient algorithmto distinguish the simulatedv vector from random canthen also distinguish the PRG output from random.

Key Size.A key for this DPF scheme consists of: a vectorin {0,1}x, a vector inSx, and a vector inFy. Let α bethe number of bits required to represent an element ofS

and letβ be the number of bits required to represent anelement ofF. The total length of a key is then:

|k|= (1+α)x+βy

For fixed spacesS andF, we can find the optimal choicesof x andy to minimize the key length. To do so, we solve:

minx,y

((1+α)x+βy) subject to xy≥ L

and conclude that the optimal values ofx andy are:

x= c√

L and y=1c

√L where c=

√

β1+α

.

The key size is thenO(√

L).When using a database table of one million rows in

length (L = 220), a row length of 1 KB per row (F =F28192), and a PRG seed size of 128 bits (using AES-128,for example) the keys will be roughly 263 KB in length.For these parameters, the keys for the naïve construction(Section 3.1) would be 1 GB in length. Application of ef-ficient DPFs thus yields a 4,000× bandwidth savings inthis case.

Computational Efficiency. A second benefit of thisscheme is that both theGen andEval routines are com-putationally efficient, since they just require performingfinite field additions (i.e., XOR for binary fields) and PRGoperations (i.e., computations of the AES function). Theconstruction requires no public-key primitives.

4.4 An s-Server Scheme Toleratings− 1Malicious Servers

The (2,1)-DPF scheme described above achieved a keysize ofO(

√L) bits using only symmetric-key primitives.

The limitation of that construction is that it only maintainsprivacy when a single key is compromised. In the contextof a write-private database scheme, this means that theconstruction can only maintain anonymity in the presenceof a singlemalicious server. It would be much better tohave a write-private database scheme withs servers thatmaintains anonymity in the presence ofs− 1 maliciousservers. To achieve this stronger security notion, we needa bandwidth-efficient(s,s−1)-distributed point function.

In this section, we construct an(s,s− 1)-DPF whereeach key has sizeO(

√L). We do so at the cost of requir-

ing more expensive public-key cryptographic operations,

10

instead of the symmetric-key operations used in the priorDPF. While the(2,1)-DPF construction above directlyfollows the work of Chor and Gilboa [17], this(s,s−1)-DPF construction is novel, as far as we know.

This construction uses aseed-homomorphic pseudo-random generator[3, 11, 64], to split the key for thepseudo-random generatorG across a collection ofs DPFkeys.

Definition 6 (Seed-Homomorphic PRG). A seed-homomorphicPRG is a pseudo-random generator Gmapping seeds in a group(S,⊕) to outputs in a group(G,⊗) with the additional property that for any s0,s1 ∈ S:

G(s0⊕ s1) = G(s0)⊗G(s1)

It is possible to construct a simple seed-homomorphicPRG from the decision Diffie-Hellman (DDH) assump-tion [11,64]. The public parameters for the scheme are listof y generators chosen at random from an order-q groupG, in which the DDH problem is hard [10]. For example,if G is an elliptic curve group [58], then the public param-eters will bey points(P0, . . . ,Py−1) ∈Gy. The seed spaceis Zq and the generator outputs vectors inGy. On inputs∈ Zq, the generator outputs(sP0, . . . ,sPy−1). The gen-erator is seed-homomorphic because, for anys0,s1 ∈ Zq,and for alli ∈ {1, . . . ,y}: s0Pi + s1Pi = (s0+ s1)Pi .

As in the prior DPF construction, we fix a DPF domainsizeL, and integersx andy such thatxy≥ L. The con-struction requires a seed-homomorphic PRGG : S 7→Gy,for some groupG of prime orderq.

For consistency with the prior DPF construction, wewill write the group operation inG using additive nota-tion. Thus, the group operation applied component-wiseto vectorsu,v ∈ G

y results in the vector(u+ v) ∈ Gy.

SinceG has orderq, qA= 0 for all A∈G.The algorithms comprising the(s,s−1)-DPF are:• Gen(ℓ,m)→ (k0, . . . ,ks−1). Compute integersℓx ∈Zx andℓy ∈ Zy such thatℓ = ℓxy+ ℓy. Sample ran-dom integer-valued vectorsb0, . . . ,bs−2 ←R (Zq)

x,random vectors of PRG seedss0, . . . ,ss−2←RSx, anda single random PRG seeds∗←R S.Selectbs−1 ∈ (Zq)

x such thatΣs−1k=0bk = eℓx (mod q)

and selectss−1 ∈ Sx such thatΣs−1

k=0sk = s∗ ·eℓx ∈Gx.

Definev←m·eℓy−G(s∗).The DPF key for serveri ∈ {0, . . . ,s− 1} is ki =(bi ,si ,v).• Eval(k, ℓ′)→ m′. Interpretk as a tuple(b,s,v). To

evaluate the PRF at indexℓ′, first write ℓ′ as an(ℓ′x, ℓ

′y) tuple such thatℓ′x ∈ Zx, ℓ′y ∈ Zy, and ℓ′ =

ℓ′xy+ ℓ′y. Use the PRGG to stretch theℓ′x-th seed

of s into a length-y vector: g← G(s[ℓ′x]). Returnm′← (g[ℓ′y]+b[ℓ′x]v[ℓ

′y]).

We omit correctness and privacy proofs, since they fol-low exactly the same structure as those used to prove se-curity of our prior DPF construction. The only differenceis that correctness here relies on the fact thatG is a seed-homomorphic PRG, rather than a conventional PRG. Asin the DPF construction of Section 4.3, the keys here areof lengthO(

√L).

Computational Efficiency. The main computational costof this DPF construction comes from the use of theseed-homomorphic PRGG. Unlike a conventional PRG,which can be implemented using AES or another fastblock cipher in counter mode, known constructions ofseed-homomorphicPRGs require algebraic groups [64] orlattice-based cryptography [3,11].

When instantiating the(s,s− 1)-DPF with the DDH-based PRG construction in elliptic curve groups, eachcall to the DPFEval routine requires an expensive ellip-tic curve scalar multiplication. Since elliptic curve opera-tions are, per byte, orders of magnitude slower than AESoperations, this(s,s−1)-DPF will be orders of magnitudeslower than the(2,1)-DPF. Security against an arbitrarynumber of malicious servers comes at the cost of compu-tational efficiency, at least for these DPF constructions.

With DPFs, we can now construct a bandwidth-efficientwrite-private database scheme that tolerates one mali-cious server (first construction) ors−1 out ofsmaliciousservers (second construction).

5 Preventing Disruptors

The first-attempt construction of our write-privatedatabase scheme (Section 3.1) had two limitations: (1)client write requests were very large and (2) maliciousclients could corrupt the database state by sending mal-formed write requests. We addressed the first of these twochallenges in Section 4. In this section, we address thesecond challenge.

A client write request in our protocol just consists ofa collection ofs DPF keys. The client sends one key toeach of thes servers. The servers must collectively de-cide whether the collection ofs keys is a valid output ofthe DPFGen routine, without revealing any informationabout the keys themselves.

One way to view the servers’ task here is as a securemulti-party computation [42, 79]. Each serveri’s privateinput is its DPF keyki . The output of the protocol is asingle bit, which determines if thes keys (k0, . . . ,ks−1)are a well-formed collection of DPF keys.

11

Since we already rely on servers for availability (Sec-tion 2.2), weneed notprotect against servers maliciouslytrying to manipulate the output of the multi-party proto-col. Such manipulation could only result in corrupting thedatabase (if a malicious server accepts a write request thatit should have rejected) or denying service to an honestclient (if a malicious server rejects a write request that itshould have accepted). Since both attacks are tantamountto denial of service, we need not consider them.

We do care, in contrast, about protecting client privacyagainst malicious servers. A malicious server participat-ing in the protocol should not gain any additional infor-mation about the private inputs of other parties, no matterhow it deviates from the protocol specification.

We construct two protocols for checking the validityof client write requests. The first protocol is computa-tionally inexpensive, but requires introducing a third non-colluding party to the two-server scheme. The secondprotocol requires relatively expensive zero-knowledgeproofs [31, 43, 44, 71], but it maintains security when allbut one ofs servers is malicious. Both of these protocolsmust satisfy the standard notions of soundness, complete-ness, and zero-knowledge [13].

5.1 Three-Party Protocol

Our first protocol for detecting malformed write requestsworks with the (2,1)-DPF scheme presented in Sec-tion 4.3. The protocol uses only hashing and finite fieldadditions, so it is computationally inexpensive. Thedownside is that it requires introducing a thirdauditserver, which must not collude with either of the othertwo servers.

We first develop a three-party protocol calledAlmostEqual that we use as a subroutine to imple-ment the full write request validation protocol. TheAlmostEqual protocol takes place between three parties:serverA, serverB, and an audit server. ServerA’s privateinput is a vectorvA ∈ Fn and serverB’s private input is avectorvB ∈ Fn. The audit server has no private input. Theoutput of theAlmostEqual protocol is “1” bit if vA andvB differ at exactly one index and is “0” bit otherwise.As with classical secure multi-party computations, thegoal of the protocol is to accurately compute the outputwithout leaking any extraneous information about theplayers’ private inputs [30, 42, 79]. We useAlmostEqual

in such a way that, whenever the client’s write request isproperly formed and whenever no two servers collude,the output of the protocol will be “1.” Thus, we need onlyprove the protocol secure in the case when the output is“1.”

We denote an instance of the three-party protocol asAlmostEqual(vA,vB), where the arguments denote thetwo secret inputs of partyA and partyB. The protocolproceeds as follows:

1. ServersA andB use a coin-flipping protocol [9] tosamplenhash functionsh0, . . . ,hn−1 from a family ofpairwise independent hash functionsH [56] havingdomainF. The servers also agree upon a random“shift” value f ∈ Zn.

2. Server A computes the valuesmi ← hi(vA[i])for every index i ∈ {0, . . . ,n − 1} and sends(mf ,mf+1, . . . ,mn−1,m0, . . . ,mf−1) to the auditor.

3. ServerB repeats Step 2 withvB.

4. The audit server returns “1” to serversA andB if andonly if the vectors it receives from the two servers areequal at every index except one. The auditor returns“0” otherwise.

We include proofs of soundness, correctness, and zero-knowledge for this construction in Appendix C.

The keys for the(2,1)-DPF construction have the form

kA = (bA,sA,v) kB = (bB,sB,v).

In a correctly formed pair of keys, theb and s vectorsdiffer at a single indexℓx, and thev vector is equal tov = m·eℓy +G(sA[ℓx])+G(sB[ℓx]).

To determine whether a pair of keys is correct, serverA constructs a test vectortA such thattA[i] = bA[i]‖sA[i]for i ∈ {0, . . . ,x− 1}. (where‖ denotes concatenation).ServerB constructs a test vectortB in the same way andthe two servers, along with the auditor run the protocolAlmostEqual(tA, tB). If the output of this protocol is “1,”then the servers conclude that theirb ands vectors differat a single index, though the protocol does not reveal to theservers which index this is. Otherwise, the servers rejectthe write request.

Next, the servers must verify that thev vector is well-formed. To do so, the servers compute another pair of testvectors:

uA =x−1

∑i=0

G(sA[i]) uB = v+x−1

∑i=0

G(sB[i]).

The servers runAlmostEqual(uA,uB) and accept the writerequest as valid if it returns “1.”

We prove security of this construction in Appendix D.An important implementation note is that ifm= 0—

that is, if the client writes the string of all zeros into thedatabase—then theu vectors will not differ at any index

12

and this information is leaked to the auditor. The protocolonly provides security if the vectors differ atexactly oneindex. To avoid this information leakage, client requestsmust be defined such thatm 6= 0 in every write request. Toachieve this, clients could define some special non-zerovalue to indicate “zero” or could use a padding scheme toensure that zero values occur with negligible probability.

As a practical matter, the audit server needs to beable to match up the portions of write requests comingfrom serverA with those coming from serverB. Riposteachieves this as follows: When the client sends its uploadrequest to serverA, the client includes a cryptographichash of the request it sent to serverB (and vice versa).Both servers can use these hashes to derive a commonnonce for the request. When the servers send audit re-quests to the audit server, they include the nonce for thewrite request in question. The audit server can use thenonce to match every audit request from serverA with thecorresponding request from serverB.

This three-party protocol is very efficient—it only re-quiresO(

√L) applications of a hash function andO(

√L)

communication from the servers to the auditor. The audi-tor only performs a simple string comparison, so it needsminimal computational and storage capabilities.

5.2 Zero Knowledge Techniques

Our second technique for detecting disruptors makes useof non-interactive zero-knowledge proofs [12,44,71].

We apply zero-knowledge techniques to allow clientsto prove the well-formedness of their write requests.This technique works in combination with the(s,s− 1)-DPF presented in Section 4.4 and maintains client write-privacy whenall but oneof sservers is dishonest.

The keys for the(s,s− 1)-DPF scheme are tuples(bi ,si ,v) such that:

s−1

∑i=0

bi = eℓx

s−1

∑i=0

si = s∗ ·eℓx v = m·eℓy−G(s∗)

To prove that its write request was correctly formed, wehave the client perform zero-knowledge proofs over col-lections of Pedersen commitments [69]. The public pa-rameters for the Pedersen commitment scheme consist ofa groupG of prime orderq and two generatorsP andQ ofG such that no one knows the discrete logarithm logQ P.A Pedersen commitment to a messagem∈ Zq with ran-domnessr ∈ Zq is C(m, r) = (mP+ rQ) ∈G (writing thegroup operation additively). Pedersen commitments arehomomorphic, in that given commitments tom0 andm1,it is possible to compute a commitment tom0+m1:

C(m0, r0)+C(m1, r1) =C(m0+m1, r0+ r1)

Here, we assume that the(s,s−1)-DPF is instantiatedwith the DDH-based PRG introduced in Section 4.4 andthat the groupG used for the Pedersen commitments isthe same order-q group used in the PRG construction.

To execute the proof, the client first generates Peder-sen commitments to elements of each of thes DPF keys.Then each serveri can verify that the client computed thecommitment to thei-th DPF key elements correctly. Theservers use the homomorphic property of Pedersen com-mitments to generate commitments to thesumof the ele-ments of the DPF keys. Finally, the client proves in zeroknowledge that these sums have the correct values.

The protocols proceed as follows:

1. The client generates vectors of Pedersen commit-mentsBi andSi committing to each element ofbi andsi . client sends theB andS vectors toevery server.

2. To serveri, the client sends the opening of the com-mitmentsBi and Si . Each serveri verifies thatBi

andSi are valid commitments to thebi andsi vectorsin the DPF key. If this check fails at some serveri,serveri notifies the other servers and all servers rejectthe write request.

3. Using the homomorphic property of the commit-ments, each server can compute vectors of com-mitmentsBsum and Ssum to the vectorsΣs−1

i=0bi andΣs−1

i=0si .

4. Using a non-interactive zero-knowledge proof, theclient proves to the servers thatBsum and Ssum arecommitments to zero everywhere except at a single(secret) indexℓx, and thatBsum[ℓx] is a commitmentto one.1 This proof uses standard witness hidingtechniques for discrete-logarithm-based zero knowl-edge proofs [12,22]. If the proof is valid, the serverscontinue to check thev vector.

This first protocol convinces each server that theb andscomponents of the DPF keys are well formed. Next, theservers check thev component:

1. For each serveri, the client sums up the seed valuessi it sent to serveri: σi = Σs−1

j=0si [ j]. The client thengenerates the output ofG(σk) and blinds it:

Gi = (σiP1+ r1Q, σiP2+ r2Q, . . . ).

2. The client sends theG values to all servers and theclient sends the opening ofGi to each serveri.

1Technically, this is a zero-knowledge proof of knowledge which provesthat the clientknowsan opening of the commitments to the stated val-ues.

13

3. Each server verifies that the openings are correct, andall servers reject the write request if this check failsat any server.

4. Using the homomorphic property of Pedersen com-mitments, every server can compute a vector of com-mitmentsGsum= (Σs−1

i=0Gi)+ v. If v is well formed,then theGsum vector contain commitments to zeroat every index except one (at which it will contain acommitment to the client’s messagem).

5. The client uses a non-interactive zero-knowledgeproof to convince the servers that the vector of com-mitmentsGsum contains commitments to zero at allindexes except one. If the proof is valid, the serversaccept the write request.

We prove in Appendix E that this protocol satisfies thestandard notions of soundness, completeness, and zero-knowledge [13].

6 Experimental Evaluation

To demonstrate that Riposte is a practical platform fortraffic-analysis-resistant anonymous messaging, we im-plemented two variants of the system. The first vari-ant uses the two-server distributed point function (Sec-tion 4.3) and uses the three-party protocol (Section 5.1)to prevent malicious clients from corrupting the database.This variant is relatively fast, since it relies primarily onsymmetric-key primitives, but requires that no two of thethree servers collude. Our results for the first variantinclude the costof identifying and excluding maliciousclients.

The second variant uses thes-server distributed pointfunction (Section 4.4). This variant protects againsts−1colluding servers, but relies on expensive public-key op-erations. We have not implemented the zero-knowledgeproofs necessary to prevent disruptors for thes-server pro-tocol (Section 5.2), so the performance numbers representonly an upper bound on the system throughput.

We wrote the prototype in the Go programming lan-guage and have published the source code online athttps://bitbucket.org/henrycg/riposte/. Weused the DeterLab network testbed for our experi-ments [59]. All of the experiments used commodityservers running Ubuntu 14.04 with four-core AES-NI-enabled Intel E3-1260L CPUs and 16 GB of RAM.

Our experimental network topology used between twoand ten servers (depending on the protocol variant in use)and eight client nodes. In each of these experiments, theeight client machines used many threads of execution to

submit write requests to the servers as quickly as possi-ble. In all experiments, the server nodes connected toa common switch via 100 Mbps links, the clients nodesconnected to a common switch via 1 Gbps links, andthe client and server switches connected via a 1 Gbpslink. The round-trip network latency between each pairof nodes was 20 ms. We chose this network topology tolimit the bandwidth between the servers to that of a fastWAN, but to leave client bandwidth unlimited so that thesmall number of client machines could saturate the serverswith write requests.

Error bars in the charts indicate the standard deviationof the throughput measurements.

6.1 Three-Server Protocol

A three-server Riposte cluster consists of two databaseservers and one audit server. The system maintains itssecurity properties as long asno twoof these three serverscollude. We have fully implemented the three-server pro-tocol, including the audit protocol (Section 5.1), so thethroughput numbers listed hereincludethe cost of detect-ing and rejecting malicious write requests.

The prototype used AES-128 in counter mode as thepseudo-random generator, Poly1305 as the keyed hashfunction used in the audit protocol [8], and TLS for linkencryption.

Figure 3 shows how many client write requests our Ri-poste cluster can service per second as the number of 160-byte rows in the database table grows. For a database tableof 64 rows, the system handles 751.5 write requests persecond. At a table size of 65,536 rows, the system han-dles 32.8 requests per second. At a table size of 1,048,576rows, the system handles 2.86 requests per second.

We chose the row length of 160 bytes because it was thesmallest multiple of 32 bytes large enough to to contain a140-byte Tweet. Throughput of the system depends onlythe total size of the table (number of rows× row length),so larger row lengths might be preferable for other appli-cations. For example, an anonymous email system usingRiposte with 4096-byte rows could handle 2.86 requestsper second at a table size of 40,960 rows.

An upper bound on the performance of the system isthe speed of the pseudo-random generator used to stretchout the DPF keys to the length of the database table. Thedashed line in Figure 3 indicates this upper bound (605MB/s), as determined using an AES benchmark written inGo. That line indicates the maximum possible through-put we could hope to achieve without aggressive opti-mization (e.g., writing portions of the code in assembly)or more powerful machines. Migrating the performance-

14

https://bitbucket.org/henrycg/riposte/

1

10

100

1000

10 100 1k 10k 100k 1M 10M

Thr

ough

put

(clie

nt r

eque

sts/

sec)

Database table size (# of 160-byte rows)

Actual throughputMaximum TLS throughputMaximum AES throughput

Figure 3: As the database table size grows, the throughputof our system approaches the maximum possible given theAES throughput of our servers.

0

10

20

30

40

50

0.0001 0.01 1 100 10000

Thr

ough

put

(clie

nt r

eque

sts/

sec)

Database table width-height ratio

Figure 4: Use of bandwidth-efficient DPFs gives a 768×speed-up over the naïve constructions, in which a client’srequest is as large as the database.

critical portions of our implementation from Go to C (us-ing OpenSSL) might increase the throughput by a fac-tor of as much as 6×, sinceopenssl speed reports AESthroughput of 3.9 GB/s, compared with the 605 MB/s weobtain with Go’s crypto library. At very small table sizes,the speed at which the server can set up TLS connectionswith the clients limits the overall throughput to roughly900 requests per second.

Figure 4 demonstrates how the request throughputvaries as the width of the table changes, while the num-ber of bytes in the table is held constant at 10 MB. Thisfigure demonstrates the performance advantage of usinga bandwidth-efficientO(

√L) DPF (Section 4) over the

naïve DPF (Section 3.1). Using a DPF with optimal ta-ble size yields a throughput of 38.4 requests per sec-ond. The extreme left and right ends of the figure indi-cate the performance yielded by the naïve construction, inwhich making a write request involves sending a(1×L)-dimension vector to each server. At the far right extremeof the table, performance drops to 0.05 requests per sec-ond, so DPFs yield a 768× speed-up.

Figure 5 indicates the total number of bytes transferredby one of the database servers and by the audit serverwhile processing a single client write request. The dashed

100 B

1kB

10kB

100kB

1MB

10MB

100MB

1GB

10GB

10 100 1k 10k 100k 1M 10M 100M

Dat

a tr

ansf

er (

byte

s)


No DPFServer - RecvServer - Send

Audit - RecvAudit - Send

Figure 5: The total client and server data transfer scalessub-linearly with the size of the database.

line at the top of the chart indicates the number of bytesa client would need to send for a single write request ifwe did not use bandwidth-efficient DPFs (i.e., the dashedline indicates the size of the database table). As the fig-ure demonstrates, the total data transfer in a Riposte clus-ter scalessub-linearlywith the database size. When thedatabase table is 2.5 GB in size, the database server trans-fers only a total of 1.23 MB to process a write request.

6.2 s-Server Protocol

In some deployment scenarios, having strong protectionagainst server compromise may be more important thanperformance or scalability. In these cases, thes-serverRiposte protocol provides the same basic functionality asthe three-server protocol described above, except that itmaintains privacy even ifs− 1 out of s servers colludeor deviate arbitrarily from the protocol specification. Weimplemented the basics-server protocol but have not yetimplemented the zero-knowledge proofs necessary to pre-vent malicious clients from corrupting the database state(Section 5.2). These performance figures thus representan upper boundon thes-server protocol’s performance.Adding the zero-knowledge proofs would require an addi-tionalΘ(

√L) elliptic curve operations per server in anL-

row database. The computational cost of the proofs wouldalmost certainly be dwarfed by theΘ(L) elliptic curve op-erations required to update the state of the database table.

The experiments use the DDH-based seed-homomorphic pseudo-random generator describedin Section 4.4 and they use the NIST P-256 elliptic curveas the underlying algebraic group. The table row size isfixed at 160 bytes.

Figure 6 demonstrates the performance of an eight-server Riposte cluster as the table size increases. At atable size of 1,024 rows, the cluster can process one re-

15

0.01

0.1

1

10

100

10 100 1k 10k

Thr

ough

put

(clie

nt r

eque

sts/

sec)


Actual throughputMaximum EC throughput

Figure 6: Throughput of an eight-server Riposte clusterusing the(8,7)-distributed point function.

1

2

3

4

5

6

7

8

9

2 3 4 5 6 7 8 9 10

Thr

ough

put

(clie

nt r

eque

sts/

sec)

Number of servers

16-row table64-row table

Figure 7: Throughput of Riposte clusters using two differ-ent database table sizes as the number of servers varies.

quest every 3.44 seconds. The limiting factor is the rateat which the servers can evaluate the DDH-based pseudo-random generator (PRG), since computing each 32-byteblock of PRG output requires a costly elliptic curve scalarmultiplication. The dashed line in the figure indicates themaximum throughput obtainable using Go’s implementa-tion of P-256 on our servers, which in turn dictates themaximum cluster throughput. Processing a single requestwith a table size of one million rows would take nearlyone hour with this construction, compared to 0.3 secondsin the AES-based three-server protocol.

Figure 7 shows how the throughput of the Riposte clus-ter changes as the number of servers varies. Since theworkload is heavily CPU-bound, the throughput only de-creases slightly as the number of servers increases fromtwo to ten.

6.3 Discussion: Whistleblowing andMicroblogging with Million-UserAnonymity Sets

Whistleblowers, political activists, or others discussingsensitive or controversial issues might benefit from an

anonymous microblogging service. A whistleblower, forexample, might want to anonymously blog about an in-stance of bureaucratic corruption in her organization.The utility of such a system depends on the size of theanonymity set it would provide: if a whistleblower is onlyanonymous amongst a group of ten people, it would beeasy for the whistleblower’s employer to retaliate againsteveryonein the anonymity set. Mounting this “punish-them-all” attack does not require breaking the anonymitysystem itself, since the anonymity set is public. As theanonymity set size grows, however, the feasibility of the“punish-them-all” attack quickly tends to zero. At ananonymity set size of 1,000,000 clients, mounting an“punish-them-all” attack would be prohibitively expen-sive in most situations.

Riposte can handle such large anonymity sets as longas (1) clients are willing to tolerate hours of messaginglatency, and (2) only a small fraction of clients writes intothe database in each time epoch. Both of these require-ments are satisfied in the whistleblowing scenario. First,whistleblowers might not care if the system delays theirposts by a few hours. Second, the vast majority of usersof a microblogging service (especially in the whistleblow-ing context) are more likely toreadposts than write them.To get very large anonymity sets, maintainers of an anony-mous microblogging service could take advantage of thelarge set of “read-only” users to provide anonymity for therelatively small number of “read-write” users.

The client application for such a microblogging ser-vice would enable read-write users to generate and sub-mit Riposte write requests to a Riposte cluster running themicroblogging service. However, the client applicationwould also allow read-only users to submit an “empty”write request to the Riposte cluster that would alwayswrite a random message into the first row of the Ripostedatabase. From the perspective of the servers, a read-onlyclient would be indistinguishable from a read-write client.By leveraging read-only users in this way, we can increasethe size of the anonymity set without needing to increasethe size of the database table.

To demonstrate that Riposte can support very largeanonymity set sizes—albeit with high latency—we con-figured a cluster of Riposte servers with a 65,536-rowdatabase table and left it running for 32 hours. In thatperiod, the system processed a total of 2,895,216 writerequests at an average rate of 25.19 requests per second.(To our knowledge, this is the largest anonymity seteverconstructedin a system that offers protection against traf-fic analysis attacks.) Using the techniques in Section 3.2,a table of this size could handle 0.3% of users writing ata collision rate of under 5%. Thus, to get an anonymity

16

set of roughly 1,000,000 users with a three-server Ripostecluster and a database table of size 65,536, the time epochmust be at least 11 hours long.

As of 2013, Twitter reported an average throughput of5,700 140-byte Tweets per second [53]. That is equiva-lent roughly 5,000 of our 160-byte messages per second.At a table size of one million messages, our Riposte clus-ter’s end-to-end throughput is 2.86 write requests per sec-ond (Figure 3). To handle the same volume of Tweets asTwitter does with anonymity set sizes on the order of hun-dreds of thousands of clients, we would need to increasethe computing power of our cluster by “only” a factor of≈1,750.2 Since we are using only three servers now, wewould need roughly 5,250 servers (split into three non-colluding data centers) to handle the same volume of traf-fic as Twitter. Furthermore, since the audit server is justdoing string comparisons, the system would likely needmany fewer audit servers than database servers, so the to-tal number of servers required might be closer to 4,000.

7 Related Work

Anonymity systems fall into one of two general cate-gories: systems that provide low-latency communicationand those that protect against traffic analysis attacks by aglobal network adversary.

Aqua [54], Crowds [72], LAP [49], Shad-owWalker [60], Tarzan [32], and Tor [28] belong tothe first category of systems: they provide an anonymousproxy for real-time Web browsing, but they do not protectagainst an adversary who controls the network, manyof the clients, and some of the nodes on a victim’s paththrough the network. Even providing a formal definitionof anonymity for low-latency systems is challenging [50]and such definitions typically do not capture the need toprotect against timing attacks.

Even so, it would be possible to combine Tor (or an-other low-latency anonymizing proxy) and Riposte tobuild a “best of both” anonymity system: clients wouldsubmit their write requests to the Riposte servers viathe Tor network. In this configuration, even ifall ofthe Riposte servers colluded, they could not learn whichuser wrote which message without also breaking theanonymity of the Tor network.

David Chaum’s “cascade” mix networks were one ofthe first systems devised with the specific goal of defend-ing against traffic-analysis attacks [16]. Since then, there

2We assume here that scaling the number of machines by a factorof kincreases our throughput by a factor ofk. This assumption is reason-able given our workload, since the processing of write requests is anembarrassingly parallel task.

have been a number of mix-net-style systems proposed,many of which explicitly weaken their protections againsta near omni-present adversary [75] to improve prospectsfor practical usability (i.e., for email traffic) [24]. In con-trast, Riposte attempts to provide very strong anonymityguarantees at the price of usability for interactive applica-tions.

E-voting systems (also called “verifiable shuffles”)achieve the sort of privacy properties that Riposte offers,and some systems even provide stronger voting-specificguarantees (receipt-freeness, proportionality, etc.), thoughmost e-voting systems cannot provide the forward secu-rity property that Riposte offers (Section 3.3) [1, 19, 33,46,47,66,70].

In a typical e-voting system, voters submit their en-crypted ballots to a few trustees, who collectively shuf-fle and decrypt them. While it is possible to repurposee-voting systems for anonymous messaging, they typi-cally require expensive zero-knowledge proofs or are in-efficient when message sizes are large. Mix-nets that donot use zero-knowledge proofs of correctness typically donot provide privacy in the face of active attacks by a subsetof the mix servers.

For example, the verifiable shuffle protocol of Bayerand Groth [5] is one of the most efficient in the litera-ture. Their shuffle implementation, when used with ananonymity set of sizeN, requires 16N group exponen-tiations per server and data transferO(N). In addition,messages must be small enough to be encoded in singlegroup elements (a few hundred bytes at most). In con-trast, our protocol requiresO(L) AES operations and datatransferO(

√L), whereL is the size of the database table.

When messages are short and when the writer/reader ratiois high, the Bayer-Groth mix may be faster than our sys-tem. In contrast, when messages are long and when thewriter/reader ratio is low (i.e.,L≪ O(N)), our system isfaster.

Chaum’s Dining Cryptographers network (DC-net) isan information-theoretically secure anonymous broad-cast channel [15]. A DC-net provides the same stronganonymity properties as Riposte does, but it requires everyuser of a DC-net to participate in every run of the proto-col. As the number of users grows, this quickly becomesimpractical.

The Dissent [78] system introduced the idea of us-ing partially trusted servers to make DC-nets practical indistributed networks. Dissent requires weaker trust as-sumptions than our three-server protocol does but it re-quires clients to sendO(L) bits to each server per timeepoch (compared with ourO(

√L)). Also, excludinga sin-

gle disruptor in a 1,000-client deployment takes over an

17

hour. In contrast, Riposte can excludes disruptors as fastas it processes write requests (tens to hundreds per sec-ond, depending on the database size). Recent work [21]uses zero-knowledge techniques to speed up disruptionresistance in Dissent (building on ideas of Golle andJuels [45]). Unfortunately, these techniques limit the sys-tem’s end to end-throughput end-to-end throughput to 30KB/s, compared with Riposte’s 450+ MB/s.

Herbivore scales DC-nets by dividing users into manysmall anonymity sets [39]. Riposte creates a single largeanonymity set, and thus enables every client to be anony-mous amongst theentire setof honest clients.

Our DPF constructions make extensive use of priorwork on private information retrieval (PIR) [17,18,34,38].Recent work demonstrates that it is possible to make the-oretical PIR fast enough for practical use [26,27,41].

Gertner et al. [37] considersymmetricPIR protocols, inwhich the servers prevent dishonest clients from learningabout more than a single row of the database per query.The problem that Gertner et al. consider is, in a way, thedual of the problem we address in Section 5, though theirtechniques do not appear to apply directly in our setting.

Ostrovsky and Shoup first proposed using PIR protocolas the basis for writing into a database shared across a setof servers [68]. However, Ostrovsky and Shoup consid-ered only the case of a single honest client, who uses theuntrusted database servers for private storage. Sincemanymutually distrustful clientsuse a single Riposte cluster,our protocol must also handle malicious clients.

Pynchon Gate [73] builds a private point-to-point mes-saging system from mix-nets and PIR. Clients anony-mously upload messages to email servers using a tradi-tional mix-net and download messages from the emailservers using a PIR protocol. Riposte could replace themix-nets used in the Pynchon Gate system: clients couldanonymously write their messages into the database us-ing Riposte and could privately read incoming messagesusing PIR.

8 Conclusion and Open Questions

We have presented Riposte, a new system for anony-mous messaging. To the best of our knowledge, Ri-poste is the first system that simultaneously (1) thwartstraffic analysis attacks, (2) prevents malicious clientsfrom anonymously disrupting the system, and (3) enablesmillion-client anonymity set sizes. We achieve these goalsthrough novel application of private information retrievaland secure multiparty computation techniques. We havedemonstrated Riposte’s practicality by implementing itand evaluating it with anonymity sets of over two million

nodes. This work leaves open a number of questions forfuture work, including:• Does there exist an(s,s− 1)-DPF construction for

s> 2 that uses only symmetric-key operations?• Are there efficient techniques (i.e., using no public-

key primitives) for achieving disruption resistancewithout the need for a non-colluding audit server?• Are there DPF constructions that enable processing

write requests in amortized timeo(L), for a length-Ldatabase?

With the design and implementation of Riposte, we havedemonstrated that cryptographic techniques can maketraffic-analysis-resistant anonymous microblogging andwhistleblowing more practical at Internet scale.

Acknowledgements

We would like to thank Joe Zimmerman and David Wufor helpful discussions about distributed point functions.We would like to thank Stephen Schwab and the staffof DeterLab for giving us access their excellent networktestbed. This work was supported by NSF, an IARPAproject provided via DoI/NBC, a grant from ONR, anNDSEG fellowship, and by a Google faculty scholarship.Opinions, findings and conclusions or recommendationsexpressed in this material are those of the author(s) anddo not necessarily reflect the views of DARPA or IARPA.

References

[1] B. Adida, “Helios: Web-based open-audit voting.”in USENIX Security Symposium, vol. 17, 2008.

[2] B. Adida and D. Wikström, “How to shuffle in pub-lic,” in Theory of Cryptography, 2007.

[3] A. Banerjee and C. Peikert, “New and improvedkey-homomorphic pseudorandom functions,” inCRYPTO, 2014.

[4] K. Bauer, D. McCoy, D. Grunwald, T. Kohno, andD. Sicker, “Low-resource routing attacks againstTor,” in WPES. ACM, 2007.

[5] S. Bayer and J. Groth, “Efficient zero-knowledgeargument for correctness of a shuffle,” inEURO-CRYPT, 2012.

[6] M. Bellare and P. Rogaway, “Random oracles arepractical: A paradigm for designing efficient proto-cols,” in CCS. ACM, 1993.

18

[7] K. Bennhold, “In Britain, guidelines for spying onlawyers and clients,”New York Times, p. A6, 7 Nov.2014.

[8] D. J. Bernstein, “The Poly1305-AES message-authentication code,” inFast Software Encryption,2005.

[9] M. Blum, “Coin flipping by telephone a protocol forsolving impossible problems,”ACM SIGACT News,vol. 15, no. 1, pp. 23–27, 1983.

[10] D. Boneh, “The decision Diffie-Hellman problem,”in Algorithmic Number Theory, ser. Lecture Notesin Computer Science, J. P. Buhler, Ed. Springer,1998, vol. 1423, pp. 48–63.

[11] D. Boneh, K. Lewi, H. Montgomery, and A. Raghu-nathan, “Key homomorphic PRFs and their applica-tions,” in CRYPTO, 2013.

[12] J. Camenisch and M. Stadler, “Proof systems forgeneral statements about discrete logarithms,” Dept.of Computer Science, ETH Zurich, Tech. Rep. 260,Mar. 1997.

[13] J. L. Camenisch, “Group signature schemes and pay-ment systems based on the discrete logarithm prob-lem,” Ph.D. dissertation, Swiss Federal Institute ofTechnology Zürich (ETH Zürich), 1998.

[14] R. Canetti, S. Halevi, and J. Katz, “A forward-securepublic-key encryption scheme,” inEUROCRYPT,2003.

[15] D. Chaum, “The Dining Cryptographers problem:Unconditional sender and recipient untraceability,”Journal of Cryptology, pp. 65–75, Jan. 1988.

[16] D. L. Chaum, “Untraceable electronic mail, returnaddresses, and digital pseudonyms,”Communica-tions of the ACM, vol. 24, no. 2, pp. 84–90, 1981.

[17] B. Chor and N. Gilboa, “Computationally private in-formation retrieval,” inSTOC. ACM, 1997.

[18] B. Chor, E. Kushilevitz, O. Goldreich, and M. Su-dan, “Private information retrieval,”Journal of theACM, vol. 45, no. 6, pp. 965–981, 1998.

[19] M. R. Clarkson, S. Chong, and A. C. Myers, “Civ-itas: A secure voting system,” Cornell University,Tech. Rep. TR 2007-2081, May 2007.

[20] H. Corrigan-Gibbs and B. Ford, “Dissent: Account-able anonymous group messaging,” inCCS. ACM,October 2010.

[21] H. Corrigan-Gibbs, D. I. Wolinsky, and B. Ford,“Proactively accountable anonymous messaging inVerdict,” in USENIX Security Symposium, 2013.

[22] R. Cramer, I. Damgård, and B. Schoenmakers,“Proofs of partial knowledge and simplified designof witness hiding protocols,” inCRYPTO, 1994.

[23] G. Danezis and C. Diaz, “A survey of anonymouscommunication channels,” Technical Report MSR-TR-2008-35, Microsoft Research, Tech. Rep., 2008.

[24] G. Danezis, R. Dingledine, and N. Mathewson,“Mixminion: Design of a type III anonymous re-mailer protocol,” inSecurity and Privacy. IEEE,2003.

[25] G. Danezis and A. Serjantov, “Statistical disclosureor intersection attacks on anonymity systems,” inIn-formation Hiding Workshop, May 2004.

[26] D. Demmler, A. Herzberg, and T. Schneider,“RAID-PIR: Practical multi-server PIR,” inWPES,2014.

[27] C. Devet and I. Goldberg, “The best of both worlds:Combining information-theoreticand computationalpir for communication efficiency,” inPETS, July2014.

[28] R. Dingledine, N. Mathewson, and P. Syverson,“Tor: The second-generation onion router,” inUSENIX Security Symposium, Aug. 2004.

[29] M. Edman and B. Yener, “On anonymity in an elec-tronic society: A survey of anonymous communi-cation systems,”ACM Computing Surveys, vol. 42,no. 1, p. 5, 2009.

[30] R. Fagin, M. Naor, and P. Winkler, “Comparing in-formation without leaking it,”Communications ofthe ACM, vol. 39, no. 5, pp. 77–85, 1996.

[31] U. Feige, A. Fiat, and A. Shamir, “Zero-knowledgeproofs of identity,” Journal of Cryptology, vol. 1,no. 2, pp. 77–94, 1988.

[32] M. J. Freedman and R. Morris, “Tarzan: A peer-to-peer anonymizing network layer,” inCCS. ACM,2002.

[33] J. Furukawa, “Efficient, verifiable shuffle decryptionand its requirement of unlinkability,” inPKC, 2004.

[34] W. Gasarch, “A survey on private information re-trieval,” in Bulletin of the EATCS, 2004.

19

[35] B. Gellman and A. Soltani, “NSA infiltrates linksto Yahoo, Google data centers worldwide, Snowdendocuments say,”Washington Post, Oct. 30 2013.

[36] B. Gellman, J. Tate, and A. Soltani, “In NSA-intercepted data, those not targeted far outnumberthe foreigners who are,”Washington Post, 5 Jul.2014.

[37] Y. Gertner, Y. Ishai, E. Kushilevitz, and T. Malkin,“Protecting data privacy in private information re-trieval schemes,” inSTOC, 1998.

[38] N. Gilboa and Y. Ishai, “Distributed point functionsand their applications,” inEUROCRYPT, 2014.

[39] S. Goel, M. Robson, M. Polte, and E. Sirer, “Herbi-vore: A scalable and efficient protocol for anony-mous communication,” Cornell University, Tech.Rep., 2003.

[40] V. Goel, “Government push for Yahoo’s user data setstage for broad surveillance,”New York Times, p. B3,7 Sept. 2014.

[41] I. Goldberg, “Improving the robustness of private in-formation retrieval,” inSecurity and Privacy. IEEE,2007.

[42] O. Goldreich, S. Micali, and A. Wigderson, “How toplay any mental game,” inSTOC. ACM, 1987.

[43] ——, “Proofs that yield nothing but their validityor all languages in NP have zero-knowledge proofsystems,”Journal of the ACM, vol. 38, no. 3, pp.690–728, 1991.

[44] S. Goldwasser, S. Micali, and C. Rackoff, “Theknowledge complexity of interactive proof systems,”SIAM Journal on computing, vol. 18, no. 1, pp. 186–208, 1989.

[45] P. Golle and A. Juels, “Dining cryptographers revis-ited,” in EUROCRYPT, 2004.

[46] J. Groth, “A verifiable secret shuffle of homomor-phic encryptions,”Journal of Cryptology, vol. 23,no. 4, pp. 546–579, 2010.

[47] J. Groth and S. Lu, “Verifiable shuffle of large sizeciphertexts,” inPKC, 2007.

[48] J. Håstad, R. Impagliazzo, L. A. Levin, and M. Luby,“A pseudorandom generator from any one-way func-tion,” SIAM Journal on Computing, vol. 28, no. 4,pp. 1364–1396, 1999.

[49] H.-C. Hsiao, T.-J. Kim, A. Perrig, A. Yamada,S. C. Nelson, M. Gruteser, and W. Meng, “LAP:Lightweight anonymity and privacy,” inSecurity andPrivacy. IEEE, May 2012.

[50] A. Johnson, “Design and analysis of efficientanonymous-communication protocols,” Ph.D. dis-sertation, Yale University, Dec. 2009.

[51] C. Kaufman, P. Hoffman, Y. Nir, P. Eronen, and K. T,“RFC7296: Internet key exchange protocol version2 (IKEv2),” Oct. 2014.

[52] D. Kedogan, D. Agrawal, and S. Penz, “Limits ofanonymity in open environments,” inInformationHiding, 2003.

[53] R. Krikorian, “New Tweets per second record,and how!” https://blog.twitter.com/2013/new-tweets-per-second-record-and-how, Aug.2013.

[54] S. Le Blond, D. Choffnes, W. Zhou, P. Druschel,H. Ballani, and P. Francis, “Towards efficient traffic-analysis resistant anonymity networks,” inSIG-COMM. ACM, 2013.

[55] B. Liskov and J. Cowling, “Viewstamped replicationrevisited,” MIT CSAIL, Tech. Rep. MIT-CSAIL-TR-2012-021, Jul. 2013.

[56] M. G. Luby, M. Luby, and A. Wigderson,Pairwiseindependence and derandomization. Now Publish-ers Inc, 2006.

[57] N. Mathewson and R. Dingledine, “Practical trafficanalysis: Extending and resisting statistical disclo-sure,” inPrivacy Enhancing Technologies, 2005.

[58] V. S. Miller, “Use of elliptic curves in cryptography,”in CRYPTO, 1986.

[59] J. Mirkovic and T. Benzel, “Teaching cybersecuritywith DeterLab,”Security & Privacy, vol. 10, no. 1,2012.

[60] P. Mittal and N. Borisov, “ShadowWalker: Peer-to-peer anonymous communication using redundantstructured topologies,” inCCS. ACM, November2009.

[61] S. J. Murdoch and G. Danezis, “Low-cost trafficanalysis of Tor,” inSecurity and Privacy. IEEE,2005.

20

https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

https://blog.twitter.com/2013/new-tweets-per-second-record-and-how

[62] S. J. Murdoch and P. Zielinski, “Sampled trafficanalysis by Internet-exchange-level adversaries,” inPETS, June 2007.

[63] E. Nakashima and B. Gellman, “Court gave NSAbroad leeway in surveillance, documents show,”Washington Post, 30 Jun. 2014.

[64] M. Naor, B. Pinkas, and O. Reingold, “Distributedpseudo-random functions and KDCs,” inEURO-CRYPT, 1999.

[65] National Institute of Standards and Technology,“Specification for the advanced encryption standard(AES),” Federal Information Processing StandardsPublication 197, Nov. 2001.

[66] C. A. Neff, “A verifiable secret shuffle and its appli-cation to e-voting,” inCCS. ACM, 2001.

[67] D. Ongaro and J. Ousterhout, “In search of an under-standable consensus algorithm,” inATC. USENIX,Jun. 2014.

[68] R. Ostrovsky and V. Shoup, “Private informationstorage,” inSTOC, 1997.

[69] T. P. Pedersen, “Non-interactive and information-theoretic secure verifiable secret sharing,” inCRYPTO, 1992.

[70] M. O. Rabin and R. L. Rivest, “Efficient end to endverifiable electronic voting employing split valuerepresentations,” inEVOTE 2014, Aug. 2014.

[71] C. Rackoff and D. R. Simon, “Non-interactive zero-knowledge proof of knowledge and chosen cipher-text attack,” inCRYPTO, 1992.

[72] M. K. Reiter and A. D. Rubin, “Crowds: Anonymityfor Web transactions,”ACM Transactions on Infor-mation and System Security, vol. 1, no. 1, pp. 66–92,1998.

[73] L. Sassaman, B. Cohen, and N. Mathewson, “ThePynchon gate: A secure method of pseudonymousmail retrieval,” inWPES, November 2005.

[74] A. Serjantov, R. Dingledine, and P. Syverson, “Froma trickle to a flood: Active attacks on several mixtypes,” inInformation Hiding, 2003.

[75] P. Syverson, “Why I’m not an entropist,” inSecurityProtocols XVII, 2013.

[76] M. Waidner and B. Pfitzmann, “The Dining Cryp-tographers in the disco: Unconditional sender andrecipient untraceability with computationally secureserviceability,” inEUROCRYPT, Apr. 1989.

[77] D. Wolinsky, E. Syta, and B. Ford, “Hang withyour buddies to resist intersection attacks,” inCCS,November 2013.

[78] D. I. Wolinsky, H. Corrigan-Gibbs, A. Johnson,and B. Ford, “Dissent in numbers: Making stronganonymity scale,” in10th OSDI. USENIX, Oct.2012.

[79] A. C. Yao, “Protocols for secure computations,” inFOCS. IEEE, 1982.

A Definition of Write Privacy

An (s, t)-write-private database schemeconsists of thefollowing three (possibly randomized) algorithms:

Write(ℓ,m)→ (w(0), . . . ,w(s−1)). Clients use theWrite

functionality to generate the write request queriessent to thes servers. TheWrite function takesas input a messagem (from some finite messagespace) and an integerℓ and produces a set ofs writerequests—one per server.

Update(σ ,w)→ σ ′. Servers use theUpdate functional-ity to process incoming write requests. TheUpdatefunction takes as input a server’s internal stateσ , awrite requestw, and outputs the updated state of theserverσ ′.

Reveal(σ0, . . . ,σs−1)→ D. At the end of the time epoch,servers use theReveal functionality to recover thecontents of the database. TheReveal function takesas input the set of states from each of thes serversand produces the plaintext database contentsD.

We define the write-privacy property using the follow-ing security game, played between the adversary (whostatically corrupts up tot servers and all but two clients)and a challenger.

1. In the first step, the adversary performs the followingactions:

• The adversary selects a subsetAs⊆ {0, . . . ,s−1} of the servers, such that|As| ≤ t. The setAs

represents the set of adversarial servers. Let thesetHs = {0, . . . ,s−1}\As represent the set ofhonest servers.

21

• The adversary selects a set of clientsHc ⊆{0, . . . ,n−1}, such that|Hc| ≥ 2, representingthe set of honest clients. The adversary selectsone message-location pair per honest client:

M = {(i,mi , ℓi) | i ∈Hc}

The adversary sendsAs andM to the challenger.

2. In the second step, the challenger responds to the ad-versary:

• For each(i,mi , ℓi) ∈M, the challenger gener-ates a write request:

(w(0)i , . . . ,w(s−1)

i )←Write(ℓi ,mi)

The set of shares of theith write requestrevealed to the malicious servers isWi =

{w( j)i } j∈AS.

In the next steps of the game, the challengerwill randomly reorder the honest clients’ writerequests. The challenger should learn nothingabout which client wrote what, despite all theinformation at its disposal.

• The challenger then samples a random permu-tationπ over{0, . . . , |Hc|−1}. The challengersends the following set of write requests to theadversary, permuted according toπ :

〈Wπ(0),Wπ(1), . . . ,Wπ(|Hc|−1)〉

3. For each clienti in {0, . . . ,n−1}\Hc, the adversary

computes a write request(w(0)i , . . . ,w(s−1)

i ) (possiblyaccording to some malicious strategy) and sends theset of these write requests to the challenger.

4. • For each serverj ∈ Hs, the challenger com-putes the server’s final stateσ j by running theUpdate functionality on each of then clientwrite requests in order. LetS= {( j,σ j ) | j ∈Hs} be the set of states of the honest servers.

• The challenger samples a bitb←R {0,1}. Ifb = 0, the challenger send(S,π) to the adver-sary. Otherwise, the challenger samples a freshpermutationπ∗ on Hc and sets(S,π∗) to theadversary.

5. The adversary makes a guessb′ for the value ofb.

The adversary wins the game ifb = b′. We definethe adversary’s advantage as|Pr[b = b′]− 1/2|. Thescheme maintains(s, t)-write privacy if no efficient ad-versary wins the game with non-negligible advantage (inthe implicit security parameter).

B Correctness Proof for(2,1)-DPF

This appendix proves correctness of the distributed pointconstruction of Section 4.3. For the scheme to be correct,it must be that, for(kA,kB)← Gen(ℓ,m), for all ℓ′ ∈ ZL:

Eval(kA, ℓ′)+Eval(kB, ℓ

′) = Pℓ,m(ℓ′).

Let (ℓx, ℓy) be the tuple inZx×Zy representing locationℓand let(ℓ′x, ℓ

′y) be the tuple representingℓ′. Let:

m′A← Eval(kA, ℓ′) m′B← Eval(kB, ℓ

′).

We use a case analysis to show that the left-hand side ofthe equation above equalsPℓ,m for all ℓ′:

Case I: ℓx 6= ℓ′x. Whenℓx 6= ℓ′x, the seedssA[ℓ′x] andsB[ℓ

′x]

are equal, sogA = gB. SimilarlybA[ℓ′x] = bB[ℓ

′x]. The

outputm′A will be gA[ℓ′y]+bA[ℓ

′x]v[ℓ

′y], The outputm′B

will be identical tom′A. Since the field is a binaryfield, adding a value to itself results in the zero ele-ment, so the summ′A+m′B will be zero as desired.

Case II: ℓx = ℓ′x and ℓy 6= ℓ′y. Whenℓx = ℓ′x, the seedssA[ℓ

′x] and sB[ℓ

′x] are not equal, sogA 6= gB. Simi-

larly bA[ℓ′x] 6= bB[ℓ

′x]. Whenℓy 6= ℓ′y, v[ℓ′y] = gA[ℓ

′y]+

gB[ℓ′y]. AssumebA[ℓ

′x] = 0 (an analogous argument

applies whenbA[ℓ′x] = 1), then:

v[ℓ′y] = (m·eℓy)[ℓ′y]+gA[ℓ

′y]+gB[ℓ

′y].

The summ′A+m′B will then be:

m′A+m′B = gA[ℓ′y]+gB[ℓ

′y]+ v[ℓ′y] = 0.

Case III: ℓx = ℓ′x andℓy = ℓ′y. This is the same as CaseII, except that(m·eℓy)[ℓ

′y] = m whenℓy = ℓ′y, so the

summ′A+m′B = m, as desired.

C Proofs for the AlmostEqual Proto-col

This appendix proves security of theAlmostEqual proto-col of Section 5.1.

Soundness.We compute the probability that an honestaudit server will output “1” when the vectors are not equalat exactly one index. First, consider the case when thevvectors are equal everywhere. In this case, the test vectorsthat serversA andB send to the audit server will be equaleverywhere and the audit server will always output “0.”

Next, consider the case when thev vectors differ atk+ 1 positions, wherek > 0. The soundness errorεk is

22

equal to the probability that, foreveryindex i′ where thevectors are unequal (except one), there is a hash collision.Since the probability of many hash collisions is boundedby the probability of a single hash collision,εk ≤ ε1. Theprobability, ε1, of a single collision we know from theproperties of a pairwise-independent hash function fam-ily, where each member of the family has rangeR:

ε1 = Pr[hi ←RH : hi(vA[i]) = hi(vB[i])]≤1|R|2

The overall soundness error is then at mostε ≤ 1/|R|.Since|R| (the output space of the hash function) is expo-nentially large in the security parameter, this probabilityis negligible.

Completeness.If the vectorsvA andvB differ in exactlyone position, the audit server must output “1” with over-whelming probability. Since the audit server only out-puts “1” if exactlyone element of the test vectors is equal,whenever there is at least one collision in the hash func-tion, the protocol will return an incorrect result. The prob-ability of this event happening is negligible, however, aslong as the length of the vectors is polynomial in the se-curity parameter.

Zero Knowledge. The zero-knowledge property needonly hold when the vectors differ at exactly one index. Inthis case, serversA and B receive a single bit from theaudit server (a “1”), so the simulation is trivial for thedatabase servers. Thus, we only need to prove that thezero-knowledge property holds for the audit server.

Whenever the vectors differ at exactly one position theaudit server can also simulate its view of the protocol. Theaudit server simulator runs by picking length-n vectors ofrandom elements in the range of the pairwise hash func-tion familyH subject to the constraint that the vectors areequal at a random indexi′ ∈Zn. The simulator outputs thetwo vectors as the vectors received from serversA andB.

The simulation is valid becauseH is a pairwise-independent hash function family. LetH be a family ofhash functionhi : D→ R Then for allx,y∈ D, by defini-tion of pairwise independence:

Pr[h←RH : h(x) = h(y)]≤ 1R

This property implies that the two vectors sent to the auditserver leak no information about thev vectors, since anhonest client’sv vector will be independent of the choiceof hash functionh, and so every element of the vectorssent to the audit servers takes on every value inR withequal probability. As in the real protocol, the simulatedvectors are equal at one random index.

D Security Proof for Three-ServerProtocol

This appendix contains the security proofs for the three-server protocol for detecting malicious client requests(Section 5.1).

Completeness. If the pair of keys is well-formed thenthe bA and bB vectors (also thesA and sB vectors) areequal at every indexi 6= ℓx and they differ at indexi = ℓx.Even in the negligibly unlikely event that the random seedchosen atsA[ℓx] is equal to the random seed chosen atsB[ℓx], the test vectorstA and tB will still differ becausebA[ℓx] 6= bB[ℓx]. Thus, a correct pair ofb ands vectorswill pass the firstAlmostEqual check.

The secondAlmostEqual check is more subtle. If thevvector is well formed then, lettingℓx be the index wherethesvectors differ, we have:

uB =

(

x−1

∑i=0

G(sA[i])

)

+G(sA[ℓx])+G(sB[ℓx])+ v

= uA+G(sA[ℓx])+G(sB[ℓx])+ v

= uA+m·eℓy

If v is well-formed, then two test vectorsuA anduB differonly at indexℓy.

Soundness. To show soundness, we must bound theprobability that the audit server will output “1” when theservers take a malformed pair of DPF keys as input. Iftheb andsvectors are not equal everywhere except at oneindex, the soundness of theAlmostEqual protocol impliesthat the audit server will return “0” with overwhelmingprobability when invoked the first time.

Now, given that thes vectors differ at one index, wecan demonstrate that if theu vectors pass the secondAlmostEqual check, thenv is also well formed. Letℓx

be the index at which thes vectors differ. Write the val-ues of thes vectors at indexℓx as s∗A ands∗B. Then, byconstruction:

uA =

(

x−1

∑i 6=ℓx

G(sA[i])

)

+G(s∗A)

uB =

(

x−1

∑i 6=ℓx

G(sB[i])

)

+G(s∗B)+ v

The first term of these two expressions are equal (be-cause thes vectors are equal almost everywhere). Thus,to violate the soundness property, an adversary must con-struct a tuple(s∗A,s

∗B,v) such that the vectorsG(s∗A) and

(G(s∗B) + v) differ at exactly one indexand such that

23

v 6= G(s∗A)+G(s∗B)+m·eℓ. This is a contradiction, how-ever, since ifG(s∗A) and(G(s∗B)+ v) differ at exactly oneindex, then:

m·eℓy = G(s∗A)+ [(G(s∗B)+ v)]

for someℓy andm, by definition ofm·eℓy.

Zero Knowledge. The audit server can simulate its viewof a successful run of the protocol (one in which the in-put keys are well-formed) by invoking theAlmostEqual

simulator twice.

E Security Proof for Zero-Knowledge Protocol

Completeness.Completeness for the first half of the pro-tocol, which checks the form of theB andS vectors, fol-lows directly from the construction of those vectors.

The one slightly subtle step comes in Step 5 of the sec-ond half of the protocol. For the protocol to be complete,it must be thatGsum is zero at every index except one.This is true because:

Gsum= (Σs−1i=0Gi)+ v = G(s∗)+m·eℓy−G(s∗) = m·eℓy

Soundness.The soundness of the non-interactive zero-knowledge proof in the first half of the protocol guaran-tees that theB vectors sum toeℓx and that thes vectorssum tos∗ ·eℓx for some valuesℓx ∈ Zx ands∗ ∈ S.

We must now argue that the probability that all serversaccept an invalid write request in the second half of theprotocol is negligible. The soundness property of the un-derlying zero-knowledge proof used in Step 5 implies thatthe vectorGsum contains commitments to zero at all in-dices except one. A client who violates the soundnessproperty produces a vectorv and seed values∗ such that(Σs−1

i=0Gi)+v = m·eℓy for some valuesℓy∈Zy andm∈G,and thatv 6= m·eℓy−G(s∗). This is a contradiction, how-

ever, since(Σs−1i=0Gi) = G(s∗), by the first half of the pro-

tocol, and so:

(Σs−1i=0Gi)+ v = m·eℓy = G(s∗)+ v

Finally, we conclude thatv = m·eℓy−G(s∗).

Zero Knowledge. The servers can simulate every mes-sage they receive during a run of the protocol. In particu-lar, they see only Pedersen commitments, which are statis-tically hiding, and non-interactive zero-knowledge proofs,which are simulatable in the random-oracle model [6].

24

riposte: an anonymous messaging system handling millions ...this paper presents riposte, a new...

Documents