
Curious Case of Rowhammer: Flipping Secret Exponent Bits Using Timing Analysis

Sarani Bhattacharya(B) and Debdeep Mukhopadhyay

Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, Kharagpur 721302, India

{sarani.bhattacharya,debdeep}@cse.iitkgp.ernet.in

Abstract. Rowhammer attacks have exposed a serious vulnerability in modern DRAM chips that allows bit flips to be induced in data stored in memory. In this paper, we develop a methodology that combines timing analysis with rowhammer to perform the hammering in a controlled manner and create bit flips in cryptographic keys stored in memory. The attack requires only user-level privilege on Linux kernel versions before 4.0 and does not need to know the memory location of the key. An intelligent combination of the timing-based Prime + Probe attack and row-buffer collision is shown to induce bit flip faults in a 1024-bit RSA key on modern processors using a realistic number of hammering attempts. This demonstrates the feasibility of fault analysis of ciphers using purely software means on commercial x86 architectures, which to the best of our knowledge has not been reported earlier. The attack is also relevant for the newest Linux kernel in a cross-VM environment where VMs having root privilege are not denied access to the pagemap.

Keywords: Rowhammer · Fault attack · Prime + Probe · Bit flip

1 Introduction

Rowhammer is a term coined for disturbances observed in recent DRAM devices in which repeated row activation causes the DRAM cells to electrically interact among themselves [1–4]. This results in bit flips [2] in DRAM due to discharging of the cells in the adjacent rows. DRAM cells are composed of an access transistor and a capacitor which stores charge to represent a bit. Since capacitors lose their charge with time, DRAM cells need to be refreshed within a fixed interval of time referred to as the refresh interval. DRAM comprises a two-dimensional array of cells, where each row of cells has its own wordline, and to access a row its wordline needs to be activated. Whenever some data is requested, the cells in the corresponding row are copied to a direct-mapped cache termed the row-buffer. If the same row is accessed again, the contents of the row-buffer are read without row activation. However, repeatedly activating a row causes cells in adjacent rows to discharge and results in bit flips.

© International Association for Cryptologic Research 2016. B. Gierlichs and A.Y. Poschmann (Eds.): CHES 2016, LNCS 9813, pp. 602–624, 2016. DOI: 10.1007/978-3-662-53140-2_29


The authors in [2] show that the rowhammer vulnerability exists in the majority of recent commodity DRAM chips and is most prevalent for sub-40 nm memory technology. DRAM process technology has seen a continuous reduction in the size of each cell over the years, and cell density has increased to a great extent, reducing the cost per bit of memory. Smaller cells hold less charge and this makes them even more vulnerable. Moreover, the increased cell density has a negative impact on DRAM reliability [2] as it introduces electromagnetic coupling effects and leads to such disturbance errors.

In [5], it was first demonstrated that rowhammer is not only a memory reliability issue but is also capable of causing serious security breaches. A DRAM vulnerability causing bit flips in a Native Client (NaCl) program was shown to achieve privilege escalation, such that program control escapes the x86 NaCl sandbox and acquires the ability to trigger OS syscalls. The blog also discusses how rowhammer can be exploited by user-level programs to gain kernel privileges. In [5], the bit flips are targeted at a page table entry which after the flip points to a physical page of the malicious process, thus providing read-write access to cause a kernel privilege escalation. A new approach for triggering rowhammer with x86 non-temporal store instructions is proposed in [6]. The paper discusses an interesting approach using libc's memset and memcpy functions and the threats involved in using non-temporal store instructions for triggering rowhammer. A vivid description of the possible attack scenarios exploiting bit flips from rowhammer is also presented in the paper.

A JavaScript-based implementation of the rowhammer attack, Rowhammer.js [7], uses an optimal eviction strategy and claims to exploit the rowhammer bug with high accuracy without using the clflush instruction. Being implemented in JavaScript, it can induce hardware faults remotely. But in order to mount a successful fault attack on a system, the adversary must be able to induce the fault in a location where its effect is useful for the attack. Another variant of rowhammer exists, termed double-sided rowhammer [8]. This variant chooses three adjacent rows at a time, targeting bit flips at the middle row by continuously accessing the outer rows for hammering. The existing implementation [8] is claimed to work in systems having a specific DRAM organization. In all of the existing works, precisely inducing a bit flip in the data used by a co-residing process has not been attempted. None of the previous works attempted to demonstrate a practical fault analysis attack using rowhammer. Since the address mappings to the various components of the LLC and DRAM are functions of physical address bits, inducing a bit flip in data residing in an unknown location of DRAM is a challenging task.

In this paper, we illustrate a software-driven fault attack on public key exponentiation by inducing a bit flip in the secret exponent. It is well known from [9] that, theoretically, if any fault is induced while public key exponentiation is taking place, then a single faulty signature is enough to leak the secret exponent in an unprotected implementation. However, inflicting the fault on the secret exponent using rowhammer so that it leads to a usable faulty signature requires further investigation. While [5] was able to successfully induce rowhammer flips in the DRAM to cause a fault in a page table entry, inducing faults to perform a fault attack on a cipher requires a better understanding of the location of the secret key in the corresponding row of a bank of the DRAM. More recent developments of rowhammering, like double-sided rowhammer [8], while increasing the probability of a bit flip, cannot be directly applied to the current scenario, as the row where the secret resides can be in any arbitrary location in the target bank. The chance of the memory location of the secret key lying between the rows of the memory allocated for rowhammer is low. In this scenario a double-sided rowhammer will reduce the probability of a successful exploitable fault. Hence, it is imperative to ascertain the location of the secret exponent before launching the rowhammer. Our novelty is to combine the Prime + Probe attack and row-buffer collision, detected again through a timing channel, to identify the target bank where the secret resides.
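To illustrate why a flipped exponent bit is informative, the following minimal sketch compares a correct and a faulty exponentiation result for the same input. It relies on the standard observation that flipping bit i of the exponent multiplies or divides the result by base^(2^i) modulo N; it is offered as an illustration and is not claimed to be the exact analysis of [9] or of this paper. The function name and the toy RSA parameters (N = 3233, d = 2753) are assumptions chosen for readability.

```python
def candidate_flips(base, correct, faulty, modulus, exponent_bits):
    """Return (bit_index, original_bit) pairs consistent with one flipped exponent bit.

    correct = base^d mod N, faulty = base^(d with one bit flipped) mod N.
    """
    candidates = []
    for i in range(exponent_bits):
        step = pow(base, 1 << i, modulus)  # base^(2^i) mod N
        if (faulty * step) % modulus == correct:
            candidates.append((i, 1))  # bit i was 1 and flipped to 0
        if (correct * step) % modulus == faulty:
            candidates.append((i, 0))  # bit i was 0 and flipped to 1
    return candidates


# Toy parameters for illustration only (far too small to be secure).
N, d, base = 3233, 2753, 65
correct = pow(base, d, N)
faulty = pow(base, d ^ (1 << 6), N)  # simulate a flip of exponent bit 6
flips = candidate_flips(base, correct, faulty, N, d.bit_length())
```

With a toy modulus, several positions may be consistent with the observed pair, so the function returns every candidate rather than a single answer.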

We combine knowledge of reverse engineering of LLC slice and DRAM addressing with a timing side channel to determine the bank in which the secret resides. We then trigger rowhammer precisely on addresses in the same bank as the secret. This increases the probability of a bit flip in the secret exponent, and the novelty of our work is that we provide a series of steps to improve the controllability of fault induction. The overall idea of the attack involves three major steps. First, the attacker identifies an eviction set, which is a set of data elements that map to the same cache set and slice as the secret exponent, by timing analysis using the Prime + Probe methodology. This set essentially behaves as an alternative to the clflush instruction of the x86 instruction set. Second, the attacker observes timing measurements of DRAM accesses to determine the DRAM bank in which the secret resides. The variation in timing is observed due to the row-buffer conflict forced by the adversary. Third, the attacker induces bit flips by repeated row activation in the particular bank where the secret resides: elements which map to the same bank but separate rows are accessed continuously until a successful bit flip is observed.

The organization of the paper is as follows. Section 2 provides a brief idea of cache and DRAM architecture and of the rowhammer bug. In Sect. 3, we provide the algorithm using timing observations to determine the LLC set, slice and DRAM banks to which the secret maps. Section 4 provides the experimental validation for a successful bit flip in the secret exponent in various steps. Section 5 discusses the existing hardware and software driven countermeasures to the rowhammer problem. Section 6 provides a detailed discussion on the assumptions and limitations of our attack model on a bigger range of systems. The final section concludes the work we present here.

2 Preliminaries

In this section, we provide a background on some key concepts, which include some DRAM details, the rowhammer bug and details of the cache architecture which have been subjected to attack.


2.1 Dynamic Random Access Memory

Dynamic Random-Access Memory (DRAM) is a random-access memory in which each bit is stored as charge in a capacitor associated with an access transistor; together they constitute a DRAM cell. The DRAM cells are organized in rows and columns. The access transistor is connected to the wordline of a row, which when enabled connects the capacitor to the bitline of the column and allows reading or writing data to the connected row. Reading or writing to cells in a row is done through the row-buffer, which can hold the charges of a single row at a time. Three steps are performed when data is requested to be read from a particular row:

– Opening Row - The wordline connected to the row is enabled, which allows the capacitors in the entire row to get connected to the bitlines. This results in the charge of each cell discharging through the bitlines to the row-buffer.

– Reading or Writing to cells - The row-buffer data is read or written by the memory controller by accessing the respective columns.

– Closing Row - The wordline of the respective row is disabled, before some other row is enabled.

Fig. 1. DRAM architecture [2] (figure: the memory controller of the processor is connected through a channel to Rank 0 and Rank 1, each of which contains multiple banks)

DRAM Architecture. DRAM is hierarchically composed of channels, ranks and banks. The physical link between the memory controller and the DRAM module is termed a channel. Inside the channel, the physical memory modules connected to the motherboard are named Dual Inline Memory Modules (DIMMs), each of which typically comprises one or two ranks as in Fig. 1. Each rank further comprises multiple banks; for example, 8 banks exist in a DDR3 rank. Each bank is a two-dimensional collection of cells having typically 2^14 to 2^17 rows and a row-buffer. Any row in a particular bank can only be read and written through the row-buffer. The latency in DRAM access when two access requests concurrently map to the same channel, rank and bank but to different rows is termed a row-buffer conflict. The channel, rank, bank and row index where a data element is going to reside are decided as functions of the physical address of the concerned data element.
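The row-buffer behaviour described above can be sketched with a toy model that tracks which row is open in each bank. This is a minimal illustration and not a model of any real memory controller; the class name and the three outcome labels are assumptions introduced here.

```python
from typing import Dict, Tuple


class ToyDram:
    """Tracks the row currently held in the row-buffer of each (channel, rank, bank)."""

    def __init__(self) -> None:
        self.open_rows: Dict[Tuple[int, int, int], int] = {}

    def access(self, channel: int, rank: int, bank: int, row: int) -> str:
        key = (channel, rank, bank)
        previous = self.open_rows.get(key)
        self.open_rows[key] = row  # the accessed row now occupies the row-buffer
        if previous is None:
            return "empty"      # no row was open: plain activation
        if previous == row:
            return "hit"        # served from the row-buffer, no activation
        return "conflict"       # another row must be closed first: slowest case


dram = ToyDram()
outcomes = [
    dram.access(0, 0, 3, 10),
    dram.access(0, 0, 3, 10),
    dram.access(0, 0, 3, 11),
    dram.access(0, 0, 4, 11),
]
```

Only the third access, which targets a different row of an already-open bank, is a row-buffer conflict; an access to another bank is unaffected.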

The capacitor in each DRAM cell discharges with time. The capacitor can hold charge up to a specific interval of time before it completely loses its charge. This interval is termed the retention time, which is guaranteed to be 64 ms in DDR3 DRAM specifications [10]. But it has been shown that repeated row activation over a period of time leads to faster discharge of cells in the adjacent rows [2].


2.2 The Rowhammer Bug

Persistent and continuous accesses to DRAM cells lead the neighboring cells of the accessed cell to electrically interact with each other. The resulting phenomenon of flipping bits in DRAM cells is termed the rowhammer bug [1,2]. As described in Sect. 2.1, accessing a byte in memory involves transferring data from the row into the bank's row-buffer, which also involves discharging the row's cells. Repeated discharging and recharging of the cells of a row can result in leakage of charge in the adjacent rows. If this is repeated enough times before the automatic refresh of the adjacent rows (which usually occurs every 64 ms), this disturbance error in the DRAM chip can cause bit flips.

Code-hammer {
  mov (X), %eax    // read from address X
  mov (Y), %ebx    // read from address Y
  clflush (X)      // flush cache for address X
  clflush (Y)      // flush cache for address Y
  jmp Code-hammer
}

In [2], a code snippet is provided in which the program generates a read to DRAM on every data access. The mov instructions generate access requests to DRAM at locations X and Y. The following clflush instructions evict the data elements from all levels of cache. If X and Y point to different rows in the same bank, code-hammer will cause the rows of X and Y to be repeatedly activated. This may lead to disturbance errors in the cells adjacent to the accessed rows, resulting in bit flips in the adjacent rows.
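A small counting sketch shows why X and Y need to lie in different rows of the same bank. It assumes every access reaches DRAM (as the clflush instructions ensure) and that each bank keeps exactly one row open; the function name and the (bank, row) encoding are assumptions made for illustration.

```python
def count_activations(accesses):
    """Count row activations for a sequence of (bank, row) DRAM accesses.

    A row is activated only when it is not already open in its bank.
    """
    open_row = {}
    activations = {}
    for bank, row in accesses:
        if open_row.get(bank) != row:
            open_row[bank] = row
            activations[(bank, row)] = activations.get((bank, row), 0) + 1
    return activations


# X and Y in the same bank, different rows: every access re-activates a row.
same_bank = count_activations([(0, 5), (0, 9)] * 1000)
# X and Y in different banks: each row is opened once and then stays open.
diff_bank = count_activations([(0, 5), (1, 9)] * 1000)
```

In this model the same-bank pattern produces one activation per access, while the different-bank pattern produces a single activation per row and therefore no hammering.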

Fig. 2. Cache architecture in Intel Ivy Bridge [11] (figure: Core 0 to Core 3, each with a 32 KB L1 instruction cache, a 32 KB L1 data cache and a 256 KB L2 cache, sharing a unified 6 MB L3 cache)

In the next subsection, we summarize some key concepts of cache architecture in modern processors.

2.3 Cache Memory Architecture

In recent processors, there exists a hierarchy of cache memories where the size of each cache level increases as we move to higher levels, but so does the access time. The L3 or Last Level Cache (LLC) is shared across processor cores, has the largest access time and is further divided into slices such that it can be accessed by multiple cores concurrently. Figure 2 illustrates the architectural specification of a typical Intel Ivy Bridge architecture [11]. In the Intel architecture, the data residing in any of the lower levels of cache are included in the higher levels as well; the caches are thus inclusive in nature. On an LLC miss, the requested element is brought into the cache from the main memory, and the cache miss penalty is much higher than the latency of a hit in the lower-level caches.

Fig. 3. Cache memory indexing [12] (figure: the MMU translates the virtual address (page number, page offset) into the physical address (frame number, page offset); the physical address supplies the tag, set index and line offset, and a hash of its bits selects one of Slice 0 to Slice 3)

Requested data are brought from the main memory into the cache in chunks called cache lines, which are typically 64 bytes in recent processors. The data requested by the processor is associated with a virtual address from the virtual address space allocated to the running process by the operating system. The virtual address can be partitioned into two parts: the lower bits in Fig. 3 are the offset within the page, typically represented by log2(page size) bits, while the remaining upper bits form the page number. The page number forms an index into the page table and translates to a physical frame number. The frame number together with the offset bits constitutes the physical address of the element. The translation of virtual to physical addresses is performed at run time, thus the physical address of each element is most likely to change from one execution to another.
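The address split described above can be written out as a short sketch. It assumes 4 KiB pages and models the page table as a plain dictionary from page number to frame number, which is a simplification of the real multi-level structure; the names are hypothetical.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # log2(page size) = 12


def split_virtual(vaddr):
    """Return (page_number, page_offset) of a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def translate(vaddr, page_table):
    """Combine the frame number from the page table with the unchanged page offset."""
    page_number, offset = split_virtual(vaddr)
    frame_number = page_table[page_number]
    return (frame_number << OFFSET_BITS) | offset


page_table = {0x12345: 0x00ABC}  # toy mapping: one virtual page to one frame
paddr = translate(0x12345678, page_table)
```

The page offset (0x678 here) is identical in the virtual and the physical address; only the upper bits change with the mapping.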

The physical address bits decide the cache set and slice in which a data element is going to reside. If the cache line size is b bytes, then the least significant log2(b) bits of the physical address are used as the index within the cache line. If the target system has k processor cores, the LLC is partitioned into k slices; each slice has c cache sets, and each set is m-way associative.

The log2(c) bits following the log2(b) line-offset bits determine the cache set in which the element is going to reside. Because of associativity, m such cache lines having identical set-index bits reside in the same set. The slice addressing in modern processors is implemented by computing a complex hash function. Recently, there have been works which reverse engineered [12,13] the LLC slice addressing function. Reverse engineering on Intel architectures has been attempted in [13] using timing analysis. The functions differ across architectures, and each of them is documented via successful reverse engineering in [12,13].
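As a rough sketch of this indexing, the snippet below extracts the line offset and set index from a physical address and computes a slice number as XOR parities of selected address bits. The line size (64 bytes), the number of sets per slice (2048) and especially the two slice masks are assumptions chosen for illustration; the masks are hypothetical placeholders and are not the reverse-engineered functions of [12,13].

```python
LINE_BYTES = 64          # b: assumed cache line size
SETS_PER_SLICE = 2048    # c: assumed number of sets in each slice
# Hypothetical masks: each slice-index bit is the XOR of the selected address bits.
SLICE_MASKS = ((1 << 17) | (1 << 19), (1 << 18) | (1 << 20))


def parity(value):
    """Return the XOR of all bits of value."""
    return bin(value).count("1") & 1


def cache_location(paddr):
    """Return (line_offset, set_index, slice_id) for a physical address."""
    offset_bits = LINE_BYTES.bit_length() - 1            # log2(b)
    line_offset = paddr & (LINE_BYTES - 1)
    set_index = (paddr >> offset_bits) & (SETS_PER_SLICE - 1)
    slice_id = 0
    for bit, mask in enumerate(SLICE_MASKS):
        slice_id |= parity(paddr & mask) << bit
    return line_offset, set_index, slice_id


location = cache_location(0xABC678)
```

Two addresses that agree on the set-index bits but differ in a bit covered by a slice mask fall into the same set number of different slices, which is why the slice function has to be known to build a precise eviction set.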


In the following sections, we use this knowledge of the underlying architecture to build an attack model which can cause successful flips in a secret value.

3 Combining Timing Analysis and Rowhammer

In this section we discuss the development of an algorithm to induce a bit flip in the secret exponent by combining timing analysis and the DRAM vulnerability.

3.1 Attack Model

In this paper, we aim to induce a bit fault in the secret exponent of the public key exponentiation algorithm using the rowhammer vulnerability of DRAM with increased controllability. The secret resides in some location in the cache memory and also in some unknown location in the main memory. The attacker, having user-level privileges in the system, does not know these locations in the LLC and DRAM since they are decided by the mapping of physical address bits. The threat model assumed throughout the paper allows the adversary to send known ciphertexts to the algorithm and observe the decrypted output.

Let us assume that the adversary sends an input plaintext to the encryption process and observes the output of the encryption. Thus the adversary gets hold of a valid plaintext-ciphertext pair, which will be used while checking for bit flips. The adversary is able to send a ciphertext to the decryption oracle, which decrypts the input and sends back the plaintext. The decryption process constantly polls for input ciphertexts and sends the plaintext to the requesting process. The adversary aims to produce a bit flip in the exponent and thus first needs to identify the bank in DRAM in which the secret exponent resides. Let us assume that the secret exponent resides in some bank, say bank A. Although the decryption process constantly performs exponentiation with accesses to the secret exponent, such access requests are usually served from the cache memory itself since they result in cache hits. In this scenario it is difficult for the adversary to determine the bank in which the secret resides, because the access requests from the decryption process hardly ever result in main memory accesses.

According to the DRAM architecture, the channel, rank, bank and row addressing of data elements depends on their physical addresses. In order to perform rowhammering on the secret exponent, precise knowledge of these parameters needs to be acquired, which is impossible for an adversary who does not have the privilege to obtain the physical addresses of the secret. This motivates the adversary to incorporate a spy process which probes the execution of the decryption algorithm and uses timing analysis to identify the channel, rank and even the bank to which the secret gets mapped.

The adversary introduces a spy process which runs every time before a decryption is requested. The spy process issues accesses to the data elements of the eviction set, which eventually evicts the existing cache lines and fills the cache with its own data. Thus, during the next request to the decryption process, the access to the secret exponent results in a cache miss and the corresponding access request is served from bank A of main memory. Effectively, a spy process running in alternation with the decryption process makes data accesses which ensure that every access request from the decryption process is served from the corresponding bank of the main memory.
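The effect of running the spy in alternation with the decryption can be sketched with a single cache set under an idealized LRU replacement policy. Real LLC replacement policies are more complex, so this is only a toy model; the class and function names are assumptions.

```python
from collections import OrderedDict


class LruSet:
    """One m-way cache set with idealized LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, tag):
        """Access a line; return True on a hit, False on a miss."""
        hit = tag in self.lines
        if hit:
            self.lines.move_to_end(tag)          # mark as most recently used
        else:
            if len(self.lines) >= self.ways:
                self.lines.popitem(last=False)   # evict least recently used line
            self.lines[tag] = True
        return hit


def victim_misses(rounds, ways, spy_between):
    """Count misses on the victim's line over several decryption rounds."""
    cache_set = LruSet(ways)
    misses = 0
    for _ in range(rounds):
        if spy_between:
            for i in range(ways):                # spy touches m lines of the same set
                cache_set.access(("spy", i))
        if not cache_set.access("secret"):
            misses += 1
    return misses


with_spy = victim_misses(rounds=10, ways=4, spy_between=True)
without_spy = victim_misses(rounds=10, ways=4, spy_between=False)
```

In this model the victim misses only once without the spy, but on every round when the spy refills the set in between, which is the condition needed for each secret access to reach DRAM.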

3.2 Determining the Eviction Set

As mentioned before, the attack model assumes that the adversary has access to a system where the decryption is running. The decryption algorithm performs exponentiation involving the secret exponent, and initially the adversary aims to determine the cache sets to which the secret exponent bits map. The adversary is oblivious of the virtual address space used by the decryption engine and thus involves a spy process which uses the Prime + Probe [11,14,15] cache access methodology to identify the target sets. The execution scenario of the decryption and the adversarial processes running concurrently on the same system is depicted in Fig. 4. In this context, the spy process targets the Last Level Cache (LLC) since it is shared among all cores of the system. The adversary sends input to the decryption engine, which performs decryption on request and otherwise remains idle. Figure 4 illustrates the following steps.

Fig. 4. Steps to determine cache sets shared by secret exponent (figure: a timeline across three columns, Decryption Engine, Adversary and Spy, with the steps: 1. Initiates the spy process. 2. Generates a memory map. 3. Computes the set, slice addressing from its physical addresses. 4. For the target set t, (a) selects m elements in distinct cache lines which map to set t for k slices, (b) primes the LLC by accessing the selected elements. 5. Sends a selected input to the Decryption Engine. 6. Decryption runs with input from the requesting process. 7. Receives the decrypted message from the Decryption Engine. 8. Spy accesses the selected elements again and measures their access times.)

1. Step 1: The adversary starts the spy process, which initially allocates a set of data elements and consults its own pagemap to obtain the corresponding physical address of each element. The kernel allows userspace programs to access their own pagemap (/proc/self/pagemap)^1 to examine the page tables and related information by reading files in /proc. The virtual pages are mapped to physical frames and this mapping information is utilized by the spy process in the following steps.

   ^1 This holds for all Linux kernels before version 4.0; the versions released from early 2015 require administrator privilege to consult the pagemap. Refer to Sect. 6.2.

2. Step 2: Once the physical addresses are obtained, the Last Level Cache set numbers and the corresponding LLC slice mappings are precomputed by the spy with the address mapping functions explained in Sect. 2.3. Let us suppose that the target system has k processor cores; thus the LLC is partitioned into k slices, each slice has c cache sets and each set is m-way associative. All elements belonging to the same cache line are fetched at a time.

– If the cache line size is b bytes, then the least significant log2(b) bits of the physical address are used as the index within the cache line.

– As described in Sect. 2.3, the log2(c) bits following these log2(b) bits determine the cache set in which the element is going to reside.

– Because of associativity, m such cache lines having identical set-index bits reside in the same set.

– In modern processors, there exists one more parameter that determines which slice the element belongs to. By computing the hash function reverse engineered in [12,13], we can also compute the slice to which a cache set gets mapped. The functions are elaborated in the experimental Sect. 4.1.

Thus, at the end of this step, the spy has computed the set number and slice number of each element in its virtual address space.

Repeat the following steps for all of the c sets in the LLC.

3. Priming Target set t: The spy primes the target set t and becomes idle. This is the most crucial step of the entire procedure, since the success of correctly determining the cache sets used by the decryption process depends entirely on how precisely the existing cache lines have been evicted from the cache by the spy in the Prime phase. In order to precisely control the eviction of existing cache lines from set t, a selection algorithm is run by the spy which selects an eviction set of m ∗ k elements, each belonging to set t, from its memory map.

– The selection algorithm selects elements belonging to distinct cache lines for each of the k cache slices, where their respective physical addresses map to the same set t. These selected data elements constitute the eviction set for set t.

– In addition, since each set of a slice is m-way associative, the selection algorithm selects m such elements for each of the k cache slices belonging to set t.

– The spy process accesses each of these m ∗ k selected memory elements repeatedly in order to ensure that the cache replacement policy has evicted all the previously existing cache lines.

This essentially ensures that the target set t of all slices is occupied only by elements accessed by the spy.

4. Decryption Runs: The adversary sends the chosen ciphertext for decryption and waits until the decryption engine sends back the message. During this decryption, some of the cache lines in the particular set to which the secret maps get evicted to accommodate the cache line of the secret exponent.


5. Probing LLC: On getting the decrypted output, the adversary signals the spy to start probing and the timing measurements are noted. In this probing step, the spy process accesses each of the m elements selected in the Prime phase of the eviction set for set t, for all slices, and the time to access each of these elements is observed.

   The timing measurements will show a variation when the decryption algorithm shares the same cache set as the target set t. This is because, after the priming step, the adversary allows the decryption process to run. If the cache set used by the decryption is the same as that of the spy, then some of the cache lines previously primed by the spy process get evicted during the decryption. Thus, when the spy accesses the same elements again and takes longer to access them, it is concluded that the cache set has been accessed by the decryption as well. On the other hand, if the cache set has not been used by the decryption, then the time observed during the Probe phase is lower, since no elements primed by the spy have been evicted in the decryption phase.
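The selection of an eviction set in the Prime phase and the decision taken in the Probe phase can be sketched as follows. The memory map is assumed to be a list of (virtual address, set index, slice) triples for distinct cache lines, as precomputed in Step 2; the probe threshold is assumed to have been calibrated separately. Function names and the synthetic values are hypothetical.

```python
from collections import defaultdict


def select_eviction_set(memory_map, target_set, m, k):
    """Pick m addresses per slice (k slices) that all map to target_set.

    memory_map: iterable of (vaddr, set_index, slice_id), one entry per cache line.
    Returns {slice_id: [vaddr, ...]}; raises ValueError if the map is too small.
    """
    per_slice = defaultdict(list)
    for vaddr, set_index, slice_id in memory_map:
        if set_index == target_set and len(per_slice[slice_id]) < m:
            per_slice[slice_id].append(vaddr)
    missing = [s for s in range(k) if len(per_slice[s]) < m]
    if missing:
        raise ValueError("not enough lines for slices %s" % missing)
    return {s: per_slice[s] for s in range(k)}


def set_was_touched(probe_times, threshold):
    """Probe decision: any access slower than the threshold indicates an eviction."""
    return any(t > threshold for t in probe_times)


# Synthetic memory map: 256 lines spread over 8 sets and 4 slices.
memory_map = [(0x1000 * i, i % 8, (i // 8) % 4) for i in range(256)]
eviction_set = select_eviction_set(memory_map, target_set=3, m=2, k=4)
```

On a real system the probe times would come from a cycle counter around each access; here they are left as plain numbers so that the decision rule can be read in isolation.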

Determining the LLC Slice Where the Secret Maps. The Prime + Probe timing analysis elaborated in the previous discussion identifies the LLC set in which the cache line containing the secret exponent resides. Thus, at the end of the previous step, we obtain an eviction set of m ∗ k elements which map to the same set as the secret in all of the k slices. The adversary can now identify the desired LLC slice by iteratively running the same Prime + Probe protocol separately for each of the k slices, with the m elements selected for that particular slice. The timing observations while probing will show significant variation for the set of m elements which corresponds to the slice to which the secret maps. Thus we further reduce the size of the eviction set from m ∗ k to m elements.
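One simple way to turn the per-slice probe timings into a decision is to pick the slice with the highest mean probe time, provided it stands out from the runner-up. This is a sketch of one possible decision rule, not necessarily the rule used in the experiments; the function name, the min_gap parameter and the sample timings are assumptions.

```python
from statistics import mean


def identify_slice(probe_times_by_slice, min_gap=0.0):
    """Return the slice whose mean probe time is highest, or None if unclear.

    probe_times_by_slice: {slice_id: [probe time of each of the m elements]}.
    min_gap: required difference between the best and second-best mean.
    """
    ranked = sorted(
        ((mean(times), s) for s, times in probe_times_by_slice.items()),
        reverse=True,
    )
    if len(ranked) > 1 and ranked[0][0] - ranked[1][0] < min_gap:
        return None
    return ranked[0][1]


# Synthetic timings (arbitrary units): one line of slice 2 was evicted by the victim.
timings = {0: [41, 40, 43], 1: [42, 41, 40], 2: [44, 190, 45], 3: [40, 42, 41]}
secret_slice = identify_slice(timings, min_gap=20)
```

Returning None when no slice stands out lets the caller repeat the Prime + Probe round instead of committing to a noisy measurement.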

3.3 Determining the DRAM Bank that the Secret Maps

In this section, we describe a timing side-channel analysis performed by the adversary to determine the DRAM bank to which the secret exponent maps. The timing analysis of the previous section returns an eviction set of m elements which map to the same set and the same slice as the secret exponent. Thus, if the adversary allows the spy and the decryption engine to run in strict alternation, the decryption engine will always encounter a cache miss for the secret exponent, and the access request will always result in a main memory access. As described in Sect. 2.1, DRAMs are primarily partitioned into channels, ranks and banks, where each bank access is serviced through a buffer termed the row-buffer. Concurrent accesses to different rows in the same DRAM bank result in a row-buffer conflict and lead to a higher access time. The functions which decide the channel, rank and bank mapping from the physical addresses are not disclosed by the architecture manufacturers. Some recent works have targeted the reverse engineering of these unknown mappings, and a successful deployment of a high-speed covert channel has also been reported [16].

In this paper, we illustrate a timing analysis of accesses to separate DRAM banks using this reverse-engineering knowledge, and the following steps highlight how this is achieved. In order to exploit the timing variation occurring due to row-buffer collisions, accesses requested by the decryption process as well as by the adversarial spy process must result in main memory accesses. Intuitively, the DRAM access time will increase only if the addresses map to the same bank but to different rows. Thus, to observe a row-buffer conflict between the decryption and the adversarial spy, the major challenges are:

– To ensure that every access to the secret exponent by the decryption process results in an LLC miss and thus in a main memory access. The previous subsection elaborates how the spy determines the eviction set and selectively accesses those elements to evict existing cache lines from the set. Let the spy generate an eviction set C with data elements in distinct cache lines mapping to the same set and slice as the secret.

– This suggests that before each decryption run, the spy has to fill the particular cache set by accessing the elements in eviction set C.

– In addition to this, a row-buffer conflict and the resulting access time delay can only be observed if two independent processes concurrently request data residing in the same bank but in different rows. In order to produce a row-buffer conflict with the secret exponent requested by the decryption process, the adversary has to produce concurrent access requests to the same bank.

The adversary allows the spy process to mmap a chunk of memory, and the spy refers to its own pagemap to obtain the physical address corresponding to each memory element. Following the functions reverse engineered in [16,17], the spy pre-computes the channel, rank and bank addressing for the corresponding physical addresses. As illustrated in Fig. 5, the timing analysis has to be performed by accessing elements from each bank. After each access request by the spy, the elements are flushed deliberately from the cache using clflush.

[Figure 5: timeline of the interaction between the Decryption Engine, the Adversary and the Spy. Step labels recovered from the figure:
1. Initiates the spy process.
2. Generates a memory map.
3. Computes the set, slice addressing from its physical addresses.
4. Computes the Channel, Rank, Bank indices from physical addresses.
5. Fill Set C with elements mapping to same LLC set and slice as the secret.
6. For each bank b in DRAM,
7. Primes LLC, by accessing elements in C.
8. Sends a selected input to the Decryption Engine.
9. Decryption runs with input from the requesting process.
9. Access randomly selected data which maps to target bank b and time the access.
10. Flush the accessed element from cache using clflush.
10. Receives decrypted message from Decryption Engine.]

Fig. 5. Steps to determine the DRAM bank in which secret maps

The adversary sends an input to the decryption engine and waits for the output to be received. While it waits for the output, the spy process targets one particular bank, selects a data element which maps to that bank and accesses it. This triggers concurrent accesses from the spy and the decryption to the DRAM banks. Timing measurements are repeated for each DRAM bank access by the spy, and the process is iterated over elements from each DRAM bank in turn.

3.4 Performing Rowhammer in a Controlled Bank

In the previous subsections, we have discussed how the adversary performs timing analysis to determine cache set collisions and subsequently uses it to determine DRAM bank collisions, identifying where the secret data resides. In this section, we aim to induce a fault in the secret by repeatedly toggling DRAM rows belonging to the same DRAM bank as the secret.

Inside the DRAM bank, the cells are arranged two-dimensionally in rows and columns. The row index to which a physical address maps is determined by the most significant bits of the physical address. Since the adversary does not know the physical address of the secret, it cannot determine the row index of the secret exponent. Thus rowhammer on the secret exponent has to be performed with elements which map to the same DRAM bank as the secret but to different row indices, until a bit flip is induced in the secret exponent.

The original algorithm for rowhammer in [2] can be modified to achieve this precise bit flip. The algorithm works in the following steps:

– A set of addresses is chosen which map to different rows but to the same bank of the DRAM.

– The row indices, being a function of the physical address bits, are computed during execution. Elements with random row indices are selected and accessed repeatedly by the adversary to induce bit flips in adjacent rows.

– A bit flip in the secret is detected if and only if the output of the decryption differs.

The number of rowhammering attempts required to produce a suitable bit flip in the secret depends on the total number of rows in a bank, since the adversary has no means of knowing in which row of the bank the secret exponent resides. Nor does it have any means of deliberately placing its own mmap-ed data adjacent to the secret so that it can easily exploit the rowhammer bug. Thus the adversary can only select elements which belong to the same bank as the secret and access them repeatedly to induce bit flips in the secret. To increase the probability of a bit flip in the secret exponent, the adversary needs to mmap multiple times to generate data which belong to different rows.

4 Experimental Validation for Inducing Bit Flips on Secret

In this section we present the validation of our previous discussion through experiments. Our experiments are framed with the following assumptions:

– We target a 1024-bit RSA implementation using square and multiply as the underlying exponentiation algorithm. We have used the standard GNU-MP big integer library (version number 2:5.0.2+dfsg-2) for our implementation. The adversary sends a chosen ciphertext for decryption, which involves an exponentiation using the secret exponent.

– The experiments are performed considering the address bit mappings of the Intel Ivy Bridge micro-architecture, the line of processors based on the 22 nm manufacturing process. The experiments are performed on an Intel Core i5-3470 processor running Ubuntu 12.04 LTS with kernel version 3.2.0-79-generic.

– The adversary is assumed to have user-level privileges on the system where the decryption process runs. It uses mmap to allocate a large chunk of data and accesses its own pagemap (/proc/self/pagemap) to get the virtual to physical address mappings. Since the Linux kernel version of our experimental setup is older than 4.0, we did not require administrator privileges to perform the entire attack.
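The virtual to physical translation through the pagemap mentioned in the last assumption can be sketched as follows. This is an illustration rather than the code used in our experiments: it assumes 4 KB pages and the documented pagemap layout of one 64-bit entry per virtual page, with the page frame number in bits 0-54 and the "present" flag in bit 63. The function names `decode_entry` and `virt_to_phys` are hypothetical. On kernels that restrict the pagemap, an unprivileged reader does not obtain usable frame numbers.

```python
import struct

PAGE_SIZE = 4096            # assumed page size
PFN_MASK = (1 << 55) - 1    # bits 0-54 of an entry hold the page frame number
PRESENT_BIT = 1 << 63       # bit 63 is set when the page is present in RAM


def decode_entry(entry, vaddr):
    """Turn one 64-bit pagemap entry into a physical address, or None if not present."""
    if not entry & PRESENT_BIT:
        return None
    pfn = entry & PFN_MASK
    return pfn * PAGE_SIZE + (vaddr % PAGE_SIZE)


def virt_to_phys(vaddr, pagemap_path="/proc/self/pagemap"):
    """Look up the pagemap entry of `vaddr` (Linux only) and decode it."""
    with open(pagemap_path, "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)     # one 8-byte entry per virtual page
        raw = f.read(8)
    (entry,) = struct.unpack("=Q", raw)      # native byte order, 64-bit unsigned
    return decode_entry(entry, vaddr)
```

Keeping `decode_entry` separate from the file access allows the bit layout to be checked without reading `/proc`.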

4.1 Identifying the Cache Set

Since the experiments are performed on RSA, the 1024-bit exponent resides in 1024 consecutive bit locations in memory. Considering a cache line size of 64 bytes, the 1024 bits of the secret map to 2 cache lines. As described in Sect. 2.3, the 11 bits b6, b7, · · · , b16 of the physical address refer to the Last Level Cache set. Moreover, the papers [12,13] both address reverse engineering of the cache slice selection functions. The authors of [13] used the Prime + Probe methodology to learn the cache slice function, while the authors of [12] monitored events from the performance counters to build the cache slice functions. It has been observed, however, that the LLC slice functions reported in these two papers are not the same.

In our paper, we devised a Prime + Probe based timing observation setup to identify the target cache set and slice which collide with the secret. We therefore followed the approach of [13] and used the function from [13] in our experiments for Prime + Probe based timing observations. As illustrated in the following section, the timing observations using functions from [13] can correctly identify the target cache slice to which the secret maps. The reverse engineering of the Last Level Cache (LLC) slice function for the Intel Ivy Bridge micro-architecture in [13] uses the following function:

b17 ⊕ b18 ⊕ b20 ⊕ b22 ⊕ b24 ⊕ b25 ⊕ b26 ⊕ b27 ⊕ b28 ⊕ b30 ⊕ b32.

[Figure 6: cache access time (y-axis, ticks from 550 to 1000) against iterations (x-axis, 0 to 60) for two LLC sets, labelled "No collision" and "Collision".]

Fig. 6. Timing observations for cache set collision

However, the architectural specification in [13] documents that their selected Ivy Bridge system has an LLC size of 3 MB. In our experimental setup, we instead had a 6 MB LLC which is divided among 4 cores. Thus we adopted the functions documented for Haswell, and they worked successfully. The functions used for slice selection are:

h0 = b17 ⊕ b18 ⊕ b20 ⊕ b22 ⊕ b24 ⊕ b25 ⊕ b26 ⊕ b27 ⊕ b28 ⊕ b30 ⊕ b32

h1 = b18 ⊕ b19 ⊕ b21 ⊕ b23 ⊕ b25 ⊕ b27 ⊕ b29 ⊕ b30 ⊕ b31 ⊕ b32

Our host machine has a 12-way associative LLC with 4 cache slices, each consisting of 2048 sets. The adversary mmaps a large chunk of memory and consults its own pagemap to obtain the physical address corresponding to each element in the memory map. Using the equations mentioned above, the adversarial process computes the cache set and slice to which each physical address maps. Since the LLC is 12-way associative, the selection algorithm picks, for each set and slice, an eviction set of 12 elements belonging to distinct cache lines and mapping to the same set of that particular slice. With 4 LLC slices for the 4 cores, the eviction set for a given set across all 4 slices has altogether 12 ∗ 4 = 48 data elements in distinct cache lines.
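The set and slice computation and the selection of a 12-element eviction set can be sketched as follows. The bit positions are those of the set index (b6 to b16) and of h0 and h1 given above. Combining the two hash bits into one slice number as 2·h1 + h0 is an assumption made for illustration, and the function names are hypothetical.

```python
H0_BITS = (17, 18, 20, 22, 24, 25, 26, 27, 28, 30, 32)
H1_BITS = (18, 19, 21, 23, 25, 27, 29, 30, 31, 32)
LINE_SIZE = 64   # cache line size in bytes
WAYS = 12        # LLC associativity of the experimental machine


def xor_bits(paddr, bits):
    """XOR together the selected bits of a physical address."""
    parity = 0
    for b in bits:
        parity ^= (paddr >> b) & 1
    return parity


def llc_set(paddr):
    """LLC set index: the 11 address bits b6..b16 (2048 sets)."""
    return (paddr >> 6) & 0x7FF


def llc_slice(paddr):
    """Slice number from h0 and h1; the ordering 2*h1 + h0 is an assumption."""
    return (xor_bits(paddr, H1_BITS) << 1) | xor_bits(paddr, H0_BITS)


def eviction_set(paddrs, target_set, target_slice, ways=WAYS):
    """Pick up to `ways` addresses in distinct cache lines mapping to one set and slice."""
    chosen, seen_lines = [], set()
    for p in paddrs:
        line = p // LINE_SIZE
        if line in seen_lines:
            continue                      # same cache line as an earlier pick
        if llc_set(p) == target_set and llc_slice(p) == target_slice:
            seen_lines.add(line)
            chosen.append(p)
            if len(chosen) == ways:
                break
    return chosen
```

Calling `eviction_set` once per slice for the same `target_set` yields the 12 ∗ 4 = 48 elements used in the set identification step.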

In our experimental setup, the adversary runs the Prime + Probe cache access methodology over each of the 2048 sets in each of the LLC slices. The 2048 sets are targeted one after another. Each cycle starts with priming a set with elements from the eviction set, then allowing the decryption to happen, and finally observing the time required for accessing the selected elements of the set again. The timing observations from the probe phase on 2 such LLC sets are illustrated in Fig. 6. The sets are chosen such that one of them has a collision with the secret exponent and the other does not. The variation of timing between these two sets is apparent in Fig. 6, where the set which observes a collision shows a higher access time than the other set. The average access times of these two sets during the probe phase differ by approximately 80 clock cycles. This implies that the LLC set colliding with the cache line of the secret exponent can be distinguished from the other sets, which do not collide with the decryption algorithm.

[Figure 7: cache access time against iterations during the Probe phase, with series "Slice 0" and "Slice 2". (a) Timing observations during Probe phase when secret maps to slice 0. (b) Timing observations during Probe phase when secret maps to slice 2.]

Fig. 7. Timing observations for LLC slice collision

4.2 Alternative Strategy to Determine the Target Cache Set

In the previous subsection, we observed that the timing observations obtained by repeating the m ∗ k accesses to the individual set on all k slices are sufficient for identification of the target cache set. However, it has been stated in [7] that only m ∗ k accesses may not be sufficient to guarantee that the existing cache lines are evicted from the cache. In this context, we note that the cache eviction sets are identified in [7] so that accessing the elements of such a set in a predetermined order has an effect equivalent to clflush for inducing rowhammer flips. Since hammering of rows needs to satisfy various preconditions to exhibit successful bit flips, it was crucial for the authors of [7] to generate an optimal eviction set.

In our paper the conditions are a little less stringent, since we use clflush to induce bit flips. In addition, we constrain the hammering to the target bank, whose identification requires a series of experiments. In this scenario, we claim that repeatedly accessing the near-optimal eviction set of m ∗ k elements for each cache set over all k slices results in eviction of the secret from the respective cache set and hence in DRAM accesses for the secret key. In addition, we have repeated our experiments with the optimal eviction set as described in [7]. The results we obtain in Fig. 8a can be compared with Fig. 6. The timing separation between the collision set and the non-collision set improves when the eviction set is accessed with the parameters defined in [7].

4.3 Identifying the LLC Slice

Once the cache set is identified, the variation in timing observations for different LLC slices leaks the information of which LLC slice the secret maps to. In the same experimental setup as in the previous section, we identify the slice in which the actual secret resides, using timing analysis with the slice selection function. Since we have already identified the LLC set with which the secret collides, 12 data elements belonging to each slice of that particular set are selected. Prime + Probe timing observations are noted for the set of 12 elements for each slice. The slice observing a collision with the secret exponent suffers cache misses in the probe phase and thus has a higher access time than the other slices.

[Figure 8: cache access time in clock cycles against iterations (0 to 100), with series "Collision" and "No Collision". (a) Timing observation of cache set collision from optimal eviction set. (b) Timing observation of cache slice collision using equations in [12].]

Fig. 8. Timing observations for LLC set and slice collision

We illustrate the timing observations for two scenarios in Fig. 7a and b. In Fig. 7a, the secret is mapped to LLC slice 0, while in Fig. 7b the secret is mapped to LLC slice 2. In both figures, the access time for probing elements of the cache slice with which the secret access collides is observed to be higher than for the other cache slice, which belongs to the same set but does not observe a cache collision. Thus, because the accesses of both processes collide in the same slice, the spy observes a higher probe time for slice 0 than for slice 2 in Fig. 7a. In contrast, in a different run the secret exponent was mapped to LLC slice 2, which in Fig. 7b shows a higher probe time than slice 0. Thus we can easily identify the cache slice of the particular set in which the accesses of the decryption and the spy process collide.

We also extended our experiment with the reverse engineered cache slice functions from [12]. Figure 8b shows the timing observations when we use the slice selection functions for a 4-core processor. The functions from [12] are:

o0 = b6 ⊕ b10 ⊕ b12 ⊕ b14 ⊕ b16 ⊕ b17 ⊕ b18 ⊕ b20 ⊕ b22 ⊕ b24 ⊕ b25 ⊕ b26 ⊕ b27 ⊕ b28 ⊕ b30 ⊕ b32 ⊕ b33

o1 = b7 ⊕ b11 ⊕ b13 ⊕ b15 ⊕ b17 ⊕ b19 ⊕ b20 ⊕ b21 ⊕ b22 ⊕ b23 ⊕ b24 ⊕ b26 ⊕ b28 ⊕ b29 ⊕ b31 ⊕ b33 ⊕ b34

Similar to our previous observations, Fig. 8b shows that we were able to identify the target cache slice from the timing observations using the reverse engineered cache slice functions from [12].

Determining the LLC set and slice to which the secret maps gives the adversary control to evict the existing cache lines in these locations, so that the decryption process has to access the main memory every time. In simple words, accesses made by the adversary to this particular LLC set and slice act as an alternative to a clflush instruction being added to the decryption process.

4.4 Identifying the DRAM Bank

From the previous subsections, we have identified the LLC set and slice mappings for the decryption process. Thus, if the adversary selects data elements which belong to the same set and slice as the secret exponent, and primes the LLC before each decryption run, the decryption process encounters a cache miss each time, and the secret is then fetched from the main memory. This aids the adversary in identifying the DRAM bank to which the secret exponent is mapped. For the 1024-bit RSA secret exponent, the channel, rank and bank mappings of the DRAM are decided by the equations reverse engineered in [16,17]. In our experimental setup, there are 2 channels, 1 DIMM per channel, 2 ranks per DIMM, 8 banks per rank and 2^14 rows per bank.

– The DRAM bank for Ivy Bridge [16] is decided by the physical address bits: ba0 = b14 ⊕ b18, ba1 = b15 ⊕ b19, ba2 = b17 ⊕ b21.

– The rank is decided by r = b16 ⊕ b20.

– The channel is decided by C = b7 ⊕ b8 ⊕ b9 ⊕ b12 ⊕ b13 ⊕ b18 ⊕ b19.

– The DRAM row index is decided by the physical address bits b18, · · · , b31.
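The mapping listed above can be written down directly. The following sketch computes the channel, rank, bank and row of a physical address from these equations; the function names and the dictionary layout are hypothetical and only serve the illustration.

```python
BANK_BIT_PAIRS = ((14, 18), (15, 19), (17, 21))   # ba0, ba1, ba2
RANK_BITS = (16, 20)
CHANNEL_BITS = (7, 8, 9, 12, 13, 18, 19)


def xor_bits(paddr, bits):
    """XOR together the selected bits of a physical address."""
    parity = 0
    for b in bits:
        parity ^= (paddr >> b) & 1
    return parity


def dram_location(paddr):
    """Return channel, rank, bank and row index of a physical address."""
    bank = 0
    for i, pair in enumerate(BANK_BIT_PAIRS):
        bank |= xor_bits(paddr, pair) << i
    return {
        "channel": xor_bits(paddr, CHANNEL_BITS),
        "rank": xor_bits(paddr, RANK_BITS),
        "bank": bank,
        "row": (paddr >> 18) & 0x3FFF,    # bits b18..b31, i.e. 2^14 rows
    }


def same_bank_different_row(a, b):
    """True if two addresses can cause a row-buffer conflict with each other."""
    la, lb = dram_location(a), dram_location(b)
    same_bank = all(la[k] == lb[k] for k in ("channel", "rank", "bank"))
    return same_bank and la["row"] != lb["row"]
```

With such a function the spy can group its mmap-ed elements by bank before the timing experiment, and later restrict hammering candidates to the identified bank.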

[Figure 9: histograms of frequencies (y-axis) against access time (x-axis). (a) Timing observations in clock cycles for DRAM bank collision (time axis from 350 to 750). (b) Timing observations in clock cycles of separate DRAM bank access (time axis from 300 to 550).]

Fig. 9. Timing observations for Row-buffer collision during DRAM bank accesses

In the same experimental setup as before, the adversary targets one bank at a time and selects elements from the memory map whose physical addresses map to that particular bank. The following process is repeated to obtain significant timing observations:

1. The spy primes the LLC and requests decryption by sending a ciphertext.

2. While the spy waits for the decrypted message, it selects an element for the target bank from the memory map, flushes it from the cache with clflush, and accesses the element. The clflush instruction removes the element from all levels of cache, so the following access to the element is served from the respective bank of the DRAM.

3. The time of this DRAM bank access is noted.

It is important to note that no explicit synchronization is imposed on the two concurrent processes in their software implementation. The decryption and the spy both request a DRAM bank access. If the target bank matches the bank to which the secret is mapped, we expect a higher access time. Figure 9a and b show the timing observations noted by the spy process while it accesses elements selected from the target bank. Figure 9a refers to the case where higher access times are observed due to row-buffer collisions, as the bank accessed by the spy is the same as the bank to which the secret maps, while Fig. 9b refers to the situation where the elements accessed by the spy are from an arbitrary bank different from the one where the secret maps. In both figures, the highest peak is observed in the timing range of 350 – 400 clock cycles. In Fig. 9a, however, the row-buffer collision is apparent because a significant number of observations have timings greater than the region where the peak is observed. Had the two processes accessing the same DRAM bank been perfectly synchronized, each access to the DRAM bank by either of the two processes would have suffered from a row-buffer collision. In our scenario, we therefore claim that in the majority of cases the accesses, though addressed to the same bank, seldom result in a row-buffer collision, which explains the peak around 350 – 400 clock cycles. From this we conclude that a row-buffer collision can only be detected over a significant number of timing observations. The DRAM bank which shows such higher access times is identified as the bank where the secret data resides.
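One way to turn this observation into a decision rule is to compare, per bank, the fraction of measurements that fall above the common peak. The sketch below is an assumption about how such a rule could look, not the procedure used for Fig. 9; the threshold of 450 cycles and the function names are hypothetical.

```python
def tail_fraction(timings, threshold):
    """Fraction of timing samples slower than `threshold` clock cycles."""
    if not timings:
        return 0.0
    return sum(1 for t in timings if t > threshold) / len(timings)


def pick_bank(samples_by_bank, threshold=450):
    """Return the bank whose timing distribution has the heaviest slow tail."""
    return max(
        samples_by_bank,
        key=lambda bank: tail_fraction(samples_by_bank[bank], threshold),
    )
```

Since collisions are rare without synchronization, the rule only becomes meaningful when each bank contributes many samples.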

4.5 Inducing Bit Flip Using Rowhammer

In the previous section, we illustrated how the adversary is able to distinguish the bank in which the secret exponent resides. The software implementation of the bit flip induction performs repeated accesses to elements of the same bank. The following pseudo-code is used to hammer rows in specific banks. After each access to an element, the adversary deliberately flushes it from the cache using the clflush instruction.

Code-hammer-specific-bank{
    Select set of 10 data elements mapping to specific bank
    Repeat{
        Access all elements of the set
        Clflush each element of the set
    }
    jmp Code-hammer-specific-bank
}

Statistics on the observed bit flips in the respective banks are reported in Fig. 10. The bar graph shows the number of bit flip instances we were able to observe for the respective banks of a single Dual In-line Memory Module (DIMM). The bit faults that we have observed in our experiments are bit-reset faults.

Fig. 10. Number of bit flips observed in all banks of a single DIMM

The row index of the location of the secret in the DRAM bank is determined by the physical address bits of the secret. This implies that the secret exponent can reside in any of the rows of the target bank. Accordingly, we restricted our hammering attempts to the target bank and selected random accesses to the target bank, which eventually resulted in bit flips. We slightly modified our setup such that the code runs iteratively until the decryption output changes, which signifies that secret exponent bits have been successfully flipped. The fault attack in [9] requires a single faulty signature to retrieve the secret. Thus, a bit flip introduced in the secret exponent by rowhammer in a specific bank can reveal the secret by applying the fault analysis techniques in [9]. The probability of a bit flip is 1/2^14, since there are 2^14 rows in a particular bank. Interestingly, the size of the secret key has an effect on the probability of a bit flip in the secret exponent: the probability is higher if the secret exponent is larger.
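To give a feeling for what a per-attempt probability of 1/2^14 means, the following sketch treats hammering attempts as independent trials, each succeeding with that probability. The independence is a simplifying assumption made here for illustration and is not a measured property of the experiments; the function names are hypothetical.

```python
ROWS_PER_BANK = 2 ** 14
P_HIT = 1 / ROWS_PER_BANK    # per-attempt probability stated in the text


def success_probability(attempts, p=P_HIT):
    """Probability of at least one success in `attempts` independent trials."""
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p=P_HIT):
    """Mean number of trials until the first success (geometric distribution)."""
    return 1.0 / p
```

Under this model the expected number of attempts equals the number of rows in the bank, and after that many attempts the chance of at least one success is roughly 63%.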

5 Possible Countermeasures

Various countermeasures against rowhammer attacks have been proposed in the literature. In [2], seven potential system-level mitigation techniques were proposed, ranging from designing secure DRAM chips, enforcing ECC protection on them, increasing the refresh rate, and identifying victim cells and retiring them, to refreshing vulnerable rows (those whose adjacent rows are accessed frequently). As mentioned in [2], each of these solutions suffers from a trade-off between feasibility, cost, performance, power and reliability. In particular, the solution named Probabilistic Adjacent Row Activation (PARA) has the least overhead among the solutions proposed in [2]. The memory controller in PARA is modeled such that every time a row closes, the controller decides with a small probability p (p ≪ 1) to refresh one of its two adjacent rows, each chosen with probability 1/2. Because of its probabilistic nature, the approach has low overhead, as it does not require any complex data structure for counting the number of row activations.
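The behaviour of PARA can be illustrated with a small simulation. The sketch below tracks one victim row next to a repeatedly activated aggressor row and assumes that, on each row close, one neighbour is refreshed with probability p and that the tracked victim is the chosen neighbour half of the time. The function names, the seed and the parameter values are hypothetical.

```python
import random


def survive_probability(activations, p):
    """Probability that the tracked victim is never refreshed in `activations` closes."""
    return (1.0 - p / 2.0) ** activations


def longest_unrefreshed_run(activations, p, seed=0):
    """Simulate PARA and return the longest run of activations without a victim refresh."""
    rng = random.Random(seed)
    run = longest = 0
    for _ in range(activations):
        run += 1
        longest = max(longest, run)
        # On row close: refresh a neighbour with probability p,
        # and pick the tracked victim with probability 1/2.
        if rng.random() < p and rng.random() < 0.5:
            run = 0
    return longest
```

No per-row counters are kept anywhere in the loop, which is the reason for the low overhead mentioned above.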

Another hardware-level mitigation is reported in [5], where it is mentioned that the LPDDR4 standard for DRAM incorporates two features for hardware-level mitigation, namely Targeted Row Refresh (TRR) and Maximum Activate Count (MAC). Of these, the TRR technique is reported to be deployed in the next generation of DDR4 memory units [18,19]. TRR incorporates a special module which tracks frequently activated rows and selectively refreshes the rows adjacent to these aggressor rows. All of the protections discussed above have to be incorporated in hardware, but this does not eliminate the threat from rowhammer attacks, since many manufacturers treat these as optional modules.

There are a few attempts to provide software-level protection from rowhammer attacks. The clflush instruction was responsible for removing the target element from the cache, which resulted in DRAM accesses. In order to stop the security breaches from NaCl sandbox escape and privilege escalation [5], the Google NaCl sandbox was recently patched to disallow applications from using the clflush instruction. The other software-level mitigation is to double the refresh rate by reducing the refresh interval from 64 ms to 32 ms, either by changing the BIOS settings or by upgrading the BIOS to a patched one. It has been reported in [20] that system manufacturers such as HP, Lenovo and Apple have already considered updating the refresh rate. However, both techniques, doubling the refresh rate and removing access to the clflush instruction, have been shown to be ineffective as prevention techniques in [20]. That paper illustrates a case study of introducing bit flips in spite of a refresh interval as low as 16 ms, using a method that does not rely on the clflush instruction. The paper also proposes an effective, low-cost software-based protection mechanism named ANVIL. ANVIL follows a two-step approach. It first observes the LLC miss statistics from the performance counters over a time interval and examines whether the number of cache misses crosses a predetermined threshold. If the number of cache misses is significantly high, the second phase of evaluation starts, which samples the DRAM accesses of the suspected process and identifies whether rows in a particular DRAM bank are being accessed repeatedly. If repeated row activation in the same bank is detected, ANVIL performs a selective refresh on the vulnerable rows.

6 Further Discussion

The main focus of the present paper is to show that targeted faults can be inflicted by rowhammer. To this end, we have presented the example of a fault analysis on RSA which is not protected by countermeasures. One of the objectives of this paper is to show that fault attacks are serious threats even when triggered by software means. This makes the threat more probable than fault injection by hardware means such as voltage fluctuations, and emphasizes the need for countermeasures at the software level.

Having said that, even standard libraries like OpenSSL use fault countermeasures, but they are not fully protected against this class of attacks. For example, at Black Hat 2012 [21], a hardware-based fault injection was shown to be a threat to OpenSSL-based RSA signature schemes. It was reported that the initial signature is verified with the public key exponent; however, in case of a fault, another signature is generated, and this time it is not verified [21]. The final signature is not verified because it is widely assumed that creating a controlled fault on a PC is impractical. Moreover, faults are believed to be accidental computational errors rather than malicious faults. Hence, the probability of two successive faults is rather low in normal computations. However, in case of rowhammer, as the fault is created in the key, repeating the process would again result in a wrong signature, which would thus be released.

Hence, the objective of the current paper is to highlight that inflicting controlled faults through software techniques is more probable than popularly believed, and hence that verification should be a compulsory step before releasing signatures.

6.1 Assumptions of the Proposed Attack

In our proposed attack, we assumed that the secret decryption exponent resides in a particular location of the DRAM and that the decryption oracle continuously polls for input ciphertexts. In addition, we assume that the secret resides in the same location in the DRAM throughout the duration of the attack and is not page-swapped by other running processes.

6.2 Limitations and Practicality of Our Attack

In this paper, access to the pagemap is assumed to be available at user privilege level, since our setup runs version 3.2.0-79-generic of the Linux kernel. From early 2015, however, for kernel versions 4.0 onwards, access to the pagemap has been restricted to processes with root privileges. Nevertheless, the attack would still be relevant in a cross-VM environment as in [22], where the users of the co-located VMs have administrator privileges and can consult the pagemap for the required virtual to physical address translation. In such a scenario, the attacker is assumed to run on a VM which is co-resident with the VM hosting the decryption oracle. Timing information obtained from the Prime + Probe methodology in this setup, along with the reverse-engineering knowledge, can be used to precisely induce a fault in the secret of the co-resident VM. Our paper primarily focuses on the vulnerability analysis of rowhammer in the context of Linux kernels, but the vulnerability may be equally or more relevant for other operating systems where access to a data structure such as the pagemap is not restricted to administrator privileges. Moreover, the attack in its original form might be relevant in customized embedded system applications, so it would be an interesting exercise to ascertain the security impact of rowhammer in such applications.

7 Conclusion

In this paper, we illustrate in steps a combined timing and fault analysis attack that exploits the disturbance-error vulnerability of recent DRAMs to induce a bit flip targeting the memory bank shared by the secret. This is a practical fault model and uses the Prime + Probe cache access attack methodology to narrow down the search space in which the adversary must induce the flip. The experimental results illustrate that the timing analysis shows significant variation and leads to the identification of the LLC set and slice. In addition, row-buffer collision has been exploited to identify the DRAM bank which holds the secret. The worst-case complexity of inducing a fault by repeated hammering of rows in the specific memory bank is typically the same as the number of rows in the bank. The proposed attack finds most relevance in a cross-VM setup, where the co-located VMs share the same underlying hardware and root privileges are usually granted to the attack instance.

Acknowledgements. We would like to thank the anonymous reviewers for their valuable comments and suggestions. We would also like to thank Prof. Berk Sunar for his insightful feedback and immense support. This research was supported in part by the TCS Research Scholarship Program in collaboration with IIT Kharagpur. This work was also supported in part by the Challenge Grant from IIT Kharagpur, India and Information Security Education Awareness (ISEA), Deity, India.

References

1. Wikipedia: Rowhammer wikipedia page (2016). https://en.wikipedia.org/wiki/Row-hammer

2. Kim, Y., Daly, R., Kim, J., Fallin, C., Lee, J.-H., Lee, D., Wilkerson, C., Lai, K., Mutlu, O.: Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors. In: ACM/IEEE 41st International Symposium on Computer Architecture, ISCA 2014, Minneapolis, MN, USA, June 14–18, 2014, pp. 361–372. IEEE Computer Society (2014)

3. Huang, R.-F., Yang, H.-Y., Chao, M.C.-T., Lin, S.-C.: Alternate hammering test for application-specific DRAMs and an industrial case study. In: Groeneveld, P., Sciuto, D., Hassoun, S. (eds.) The 49th Annual Design Automation Conference 2012, DAC 2012, San Francisco, CA, USA, June 3–7, 2012, pp. 1012–1017. ACM (2012)

4. Kim, D.-H., Nair, P.J., Qureshi, M.K.: Architectural support for mitigating row hammering in DRAM memories. Comput. Archit. Lett. 14(1), 9–12 (2015)

5. Seaborn, M., Dullien, T.: Exploiting the DRAM rowhammer bug to gain kernel privileges (2015). http://googleprojectzero.blogspot.in/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

6. Qiao, R., Seaborn, M.: A new approach for rowhammer attacks. In: HOST 2016 (2016)

7. Gruss, D., Maurice, C., Mangard, S.: Rowhammer.js: a remote software-induced fault attack in JavaScript. CoRR, abs/1507.06955 (2015)

8. Seaborn, M., Dullien, T.: Test DRAM for bit flips caused by the rowhammer problem (2015). https://github.com/google/rowhammer-test

9. Boneh, D., DeMillo, R.A., Lipton, R.J.: On the importance of checking cryptographic protocols for faults. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 37–51. Springer, Heidelberg (1997)

10. JEDEC. Standard No. 79-3F. DDR3 SDRAM Specification (2012)

11. Yarom, Y., Falkner, K.: FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack. In: Fu, K., Jung, J. (eds.) Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, August 20–22, 2014, pp. 719–732. USENIX Association (2014)

12. Maurice, C., Le Scouarnec, N., Neumann, C., Heen, O., Francillon, A.: Reverse engineering Intel last-level cache complex addressing using performance counters. In: Bos, H., et al. (eds.) RAID 2015. LNCS, vol. 9404, pp. 48–65. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26362-5_3

13. Irazoqui, G., Eisenbarth, T., Sunar, B.: Systematic reverse engineering of cache slice selection in Intel processors. In: 2015 Euromicro Conference on Digital System Design, DSD 2015, Madeira, Portugal, 26–28 August 2015, pp. 629–636. IEEE Computer Society (2015)

14. Osvik, D.A., Shamir, A., Tromer, E.: Cache attacks and countermeasures: the case of AES. In: Pointcheval, D. (ed.) CT-RSA 2006. LNCS, vol. 3860, pp. 1–20. Springer, Heidelberg (2006)

15. Liu, F., Yarom, Y., Ge, Q., Heiser, G., Lee, R.B.: Last-level cache side-channel attacks are practical. In: 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, 17–21 May 2015, pp. 605–622. IEEE Computer Society (2015)

16. Pessl, P., Gruss, D., Maurice, C., Mangard, S.: Reverse engineering Intel DRAM addressing and exploitation. CoRR, abs/1511.08756 (2015)

17. Hund, R., Willems, C., Holz, T.: Practical timing side channel attacks against kernel space ASLR. In: 2013 IEEE Symposium on Security and Privacy, SP 2013, Berkeley, CA, USA, 19–22 May 2013, pp. 191–205. IEEE Computer Society (2013)

18. JEDEC Solid State Technology Association: Low Power Double Data Rate 4 (LPDDR4) (2015)

19. Micron Inc.: DDR4 SDRAM MT40A2G4, MT40A1G8, MT40A512M16 Data sheet (2015)

20. Aweke, Z.B., Yitbarek, S.F., Qiao, R., Das, R., Hicks, M., Oren, Y., Austin, T.M.: ANVIL: software-based protection against next-generation rowhammer attacks. In: Conte, T., Zhou, Y. (eds.) Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2016, Atlanta, GA, USA, 2–6 April 2016, pp. 743–755. ACM (2016)

21. Bertacco, V., Alaghi, A., Arthur, W., Tandon, P.: Torturing OpenSSL

22. Inci, M.S., Gulmezoglu, B., Irazoqui, G., Eisenbarth, T., Sunar, B.: Cache attacks enable bulk key recovery on the cloud. IACR Cryptology ePrint Archive: 2016/596