
© 2007 IBM Corporation

WEST-2010

Practical and Secure PCM Memories via Online Attack Detection

Moinuddin Qureshi, Luis Lastras, Michele Franceschini, John Karidis

IBM T. J. Watson Research Center, Yorktown Heights, NY


Birthday Paradox Attack

We thank Andre Seznec for pointing out the Birthday Paradox Attack (BPA) vulnerability of Start-Gap.

In the paper, we first prove that for our RBSG scheme, BPA causes at most a 2x reduction in lifetime compared to the Repeat Address Attack (RAA).

Lifetime under attack: ~1-2 months

Here we introduce an adaptive-RBSG scheme which extends the lifetime to years under all analyzed attacks.


The Assumptions in Security Studies

No other application running except attack

Attack goes undetected for months/years

OS maintains mapping of pages for entire duration

Possible to write a line to memory at full speed

Other things that can help:

Significant spare lines needed for PCM anyway

Legal actions for minimizing warranty fraud


What we need …

Parameter                     RBSG       RandSwap    Ideal Req.
Lifetime for typical apps     97%        87.5%       95+%
Lifetime under attacks        Month(s)   Year(s)     Year(s)
Storage overhead              ~1.5KB     25-256KB    few KB
Latency overhead              <10 cyc    <10 cyc     <10 cyc
Write (performance) overhead  1%         12.5%       <1%
Complexity                    Low        High        Low

Security: Lifetime under attacks should be in the range of years.

Typical solution: Table based with RandSwap [Ben-Aroya et al.’06, Seznec’10]

We would like to have security for free (or almost free)


Key insight

Table-based methods make all applications pay high overhead to handle (infrequent) attacks: high overhead is okay for attacks, not otherwise.

Wear-leveling algorithms move lines at a particular rate. This rate determines write overhead and robustness under attacks.

Trade-off: rate must be low to reduce overhead and high for robustness.

If we can identify attack-like patterns, we can use a slower rate for typical apps and a faster rate for attacks: best of both worlds.


Our proposal: Adaptive Wear Leveling

[Figure: block diagram with labels WRITE ADDR, Online Attack Detector (OAD), Rate Control, Wear Leveling Algorithm, PCM-Based Main Memory]

OAD controls the rate of movement based on properties of the memory write stream


Outline

Introduction

Background on Start-Gap

Anatomy of Attacks

Practical Attack Detector

Adaptive Start-Gap

Summary and Discussions


Start-Gap Wear Leveling

Two registers (Start & Gap) + 1 line (GapLine) to support movement.

Move GapLine every G writes to memory.

[Figure: lines A-D in locations 0-4, with the START and GAP pointers]

PCMAddr = (Start + Addr); if (PCMAddr >= Gap) PCMAddr++

Storage overhead: less than 8 bytes (GapLine taken from spares)

Write overhead: one extra write every G writes, i.e. 1% (G=100)

Randomized address space to avoid “hot region” and predictability
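The mapping and gap movement on this slide can be sketched as a small Python model. This is a minimal sketch, not the authors' hardware design: it assumes the modulo-N wraparound and gap-rotation rule of the original Start-Gap scheme (the slide shows the mapping without the modulo), it leaves out the address-space randomization, and the class name `StartGap` and its methods are hypothetical.

```python
class StartGap:
    """Toy model of Start-Gap wear leveling: N lines plus one spare GapLine."""

    def __init__(self, num_lines, gap_interval):
        self.n = num_lines                     # N logical lines
        self.g = gap_interval                  # G: demand writes per gap movement
        self.start = 0                         # Start register
        self.gap = num_lines                   # Gap register (spare slot starts last)
        self.mem = [None] * (num_lines + 1)    # N + 1 physical slots
        self.wear = [0] * (num_lines + 1)      # writes seen by each physical slot
        self.writes_since_move = 0

    def map(self, addr):
        """PCMAddr = (Start + Addr) mod N, then skip over the gap slot."""
        pcm_addr = (self.start + addr) % self.n
        if pcm_addr >= self.gap:
            pcm_addr += 1
        return pcm_addr

    def _move_gap(self):
        """Copy the line next to the gap into the gap, shifting the gap by one."""
        if self.gap == 0:
            # Gap wraps around: the line in the last slot moves to slot 0.
            src, dst = self.n, 0
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            src, dst = self.gap - 1, self.gap
            self.gap -= 1
        self.mem[dst] = self.mem[src]
        self.mem[src] = None
        self.wear[dst] += 1                    # the extra write caused by wear leveling

    def write(self, addr, value):
        slot = self.map(addr)
        self.mem[slot] = value
        self.wear[slot] += 1
        self.writes_since_move += 1
        if self.writes_since_move == self.g:
            self.writes_since_move = 0
            self._move_gap()

    def read(self, addr):
        return self.mem[self.map(addr)]
```

Writing repeatedly to one logical address in this model spreads the writes over all N + 1 physical slots, because the line is relocated each time the gap passes it.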


Terminology

Line Vulnerability Factor (LVF): N*G

LVF = Max writes to same address before line gets moved

N = Number of lines in memory

G = Gap movement interval (writes per gap movement)

E = Endurance of each line

If (LVF > E), repeated writes to a line cause line failure

In general we want (LVF << E), say LVF=E/8 (not enough)

Therefore, G = E/(8N), which means G=1/16 (if N=2E)

But we want G>=100 to limit write overhead to less than 1%


Effective LVF

Intra Write Distance (d): number of writes between two writes to the same line

But we write to several other lines between writes to the same line

If (d>1), the line vulnerability factor reduces

Effective LVF: ELVF(d) = (N*G)/d

For ELVF=E/8: G = (E*d)/(8N)

Typically, d>>1K, which means G=128 is easy for typical apps
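The arithmetic behind LVF and ELVF can be checked with a few lines of Python. This is only a worked example under the slide's own formulas, ELVF(d) = (N*G)/d and G = (E*d)/(8N); the function names and the concrete value of E are illustrative assumptions, chosen so that N = 2E.

```python
from fractions import Fraction

def elvf(n_lines, gap_interval, d=1):
    """Effective LVF: ELVF(d) = (N * G) / d. With d = 1 this is the plain LVF."""
    return Fraction(n_lines * gap_interval, d)

def gap_interval_for(endurance, n_lines, d=1, safety=8):
    """Gap interval that gives ELVF(d) = E / safety: G = (E * d) / (safety * N)."""
    return Fraction(endurance * d, safety * n_lines)

E = 2**25      # illustrative endurance per line (assumption)
N = 2 * E      # number of lines, so that N = 2E as on the slide

print(gap_interval_for(E, N, d=1))       # 1/16: repeated writes to a single line
print(gap_interval_for(E, N, d=2048))    # 128: a stream with d around 2K
```

With d = 1 the required interval is far below the G >= 100 needed for a 1% write overhead, while d in the thousands makes G = 128 sufficient.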


Likelihood of Repeat Writes

To show that d>>1K, we conducted the following experiment

DRAM cache: 32MB, 8-way LRU, normal indexing

Keep track of most recent “w” writes to memory

Probability of hit in most recent “w” writes to memory

Likelihood of hit in most recent 1K entries is less than one in a million
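The measurement described on this slide can be approximated in software as follows. This is a hedged sketch: it takes the write stream that reaches memory as given (the DRAM cache itself is not modeled), and the function name `window_hit_rate` is hypothetical.

```python
from collections import Counter, deque

def window_hit_rate(write_addrs, w):
    """Fraction of writes whose address appears among the previous w writes."""
    window = deque()       # last w write addresses, oldest first
    counts = Counter()     # how many times each address occurs in the window
    hits = total = 0
    for addr in write_addrs:
        total += 1
        if counts[addr] > 0:
            hits += 1
        window.append(addr)
        counts[addr] += 1
        if len(window) > w:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
    return hits / total if total else 0.0
```

For example, a stream that cycles over five addresses never hits in a window of four writes but hits almost always in a window of five, which is the kind of distance information the detector needs.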


Requirements for Attack

For an attack to successfully cause failure, it must:

1. Write to a few lines (otherwise d>1K)

2. Repeatedly, at least a few million times

3. At a fairly high write bandwidth

We can exploit properties (1) and (2) for online attack detection.


Canonical Form of Attacks

SMA-like attacks are most challenging to detect

Generalized RAA

Stealth-Mode Attack


Typical Apps vs. Attacks

Easy to differentiate attacks by measuring hit rate of 1K entry window

But too much hardware and power. Window searched on each access.


Practical Attack Detector (PAD)

We are interested in measuring hit rate of a relatively small window (1K)

This can be approximated by a small stack into which addresses are inserted with probability “p”

[Figure: PAD block diagram with labels ADDR, HitCounter (16-bit), WriteCounter (16-bit)]

On each access, the stack is checked and WriteCounter is incremented

On a hit, the replacement info of the entry is updated and HitCounter is incremented

On a miss, with probability p, the address is inserted in the stack

When WriteCounter reaches max, the hit rate is calculated and both counters are halved

We conservatively calculate the distance (d) = 1/hitrate
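The detector steps above can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the actual circuit: the stack is assumed to use LRU replacement, the 16-entry size and p = 1/256 are taken from the configuration given later in the talk, and the class name and the seeded random generator are hypothetical.

```python
import random

class PracticalAttackDetector:
    """Toy model of PAD: small stack, probabilistic insert, two 16-bit counters."""

    def __init__(self, entries=16, p=1 / 256, counter_bits=16, rng=None):
        self.entries = entries
        self.p = p
        self.max_count = (1 << counter_bits) - 1
        self.stack = []                     # most recently used address first
        self.hit_counter = 0
        self.write_counter = 0
        self.distance = float("inf")        # latest estimate of d
        self.rng = rng or random.Random(0)  # seeded only for repeatability

    def observe(self, addr):
        """Process one write address and return the current distance estimate."""
        self.write_counter += 1
        if addr in self.stack:
            # Hit: update replacement info (assumed LRU) and count the hit.
            self.stack.remove(addr)
            self.stack.insert(0, addr)
            self.hit_counter += 1
        elif self.rng.random() < self.p:
            # Miss: insert with probability p, evicting the LRU entry if full.
            self.stack.insert(0, addr)
            if len(self.stack) > self.entries:
                self.stack.pop()
        if self.write_counter >= self.max_count:
            hit_rate = self.hit_counter / self.write_counter
            self.distance = 1 / hit_rate if hit_rate > 0 else float("inf")
            self.hit_counter //= 2          # halving gives the decaying effect
            self.write_counter //= 2
        return self.distance
```

Because both counters are halved when WriteCounter saturates, the next estimate arrives after roughly 32K further writes, and earlier hits still contribute with reduced weight.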


Effectiveness of PAD

PAD can monitor different window sizes, depending on the insert probability (p)

We assume a 16-entry stack with p=1/256

PAD can accurately estimate the distance (d) for SMA


Characteristics of PAD

1. Programmable and accurate for distances within window

2. Low detection latency (32K writes)

3. Pardons infrequent hits in window (must hit several times)

4. Decaying effect on hit rate: attack not easily forgotten

5. Handles multiple attacks: reports worst-case d

6. Low hardware cost (68 bytes) and non-intrusive

PAD is off the critical path, distance calculation every 32K writes


Adaptive Start-Gap

Key insight: Make Gap Movement Interval (G) a function of distance (d)

To ensure < E/8 writes occur to a given line before the line is moved:

G = (E*d)/(8N)

For typical apps, d>>1K, which means G=128

For repeat address attack (RAA), d=1, which means G=1/16 (if N=2E)

Slow (but not a problem, as it is an attack, not using the cache)

But useful writes reduced to less than 1/16.


Adaptive Region-Based Start-Gap

Key insight: Divide memory into regions (num regions = num banks)

With “r” banks, number of lines in each bank becomes N/r

Therefore, to ensure < E/8 writes occur before the line is moved:

G = (E*d*r)/(8N)

For N=2E and r=16: G = d

Thus, even under RAA, one line moves every demand write

Each bank has its own start-gap and attack detection circuit (68 bytes)

Total storage overhead of aRBSG: 16*(4+4+68) = 1216 bytes
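The rate control for the region-based scheme can be written as a single function of the detected distance. This is a minimal sketch assuming the per-region formula G = (E*d)/(8*(N/r)) implied by the slide; the cap `g_max = 128` for typical applications, the function name, and the concrete value of E are assumptions for illustration.

```python
from fractions import Fraction

def adaptive_gap_interval(d, endurance, n_lines, regions=1, safety=8, g_max=128):
    """G = (E * d * r) / (safety * N), capped at g_max (assumed cap)."""
    g = Fraction(endurance * d * regions, safety * n_lines)
    return min(g, Fraction(g_max))

E = 2**25      # illustrative endurance per line (assumption)
N = 2 * E      # N = 2E as on the slide

print(adaptive_gap_interval(1, E, N, regions=1))      # 1/16: single region under RAA
print(adaptive_gap_interval(1, E, N, regions=16))     # 1: one gap move per demand write
print(adaptive_gap_interval(4096, E, N, regions=16))  # 128: typical apps hit the cap
```

With N = 2E and 16 regions the formula reduces to G = d, so the gap moves once per demand write in the worst case instead of 16 times.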


Results for GRAA

[Figure: expected lifetime (years, axis from 0.0 to 4.0) for Rated, GRAA(X), GRAA(2X), GRAA(4X), GRAA(8X), GRAA(16X), GRAA(32X), GRAA(64X); legend: Extra Writes, Useful Writes]

Generalized Repeat Address Attack (GRAA) on n lines.

N=2^26, E=2^25, Banks=Regions=16, TimeToWriteOneLine=1 µs

Lifetime remains constant at 4 years. Useful writes from 50% to 98%
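The quoted range of useful writes is consistent with a simple count of one gap-movement write per G demand writes. The sketch below is an interpretation, not the authors' evaluation code: it assumes that a GRAA over n lines yields a distance d = n and that the adaptive scheme then sets G = d (the N = 2E, 16-region case); the function name is hypothetical.

```python
def useful_write_fraction(gap_interval):
    """One extra (gap-movement) write per G demand writes: share = G / (G + 1)."""
    return gap_interval / (gap_interval + 1)

for n in (1, 2, 4, 8, 16, 32, 64):
    # Assumption: GRAA on n lines gives d = n, and the adaptive scheme sets G = d.
    print(n, f"{useful_write_fraction(n):.1%}")
```

Under these assumptions the share of useful writes is 50% for n = 1 and about 98% for n = 64.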


Handling BPA

Effectiveness of BPA bounded by “Buckets and Balls” problem

[Figure: percentage of theoretical max (B*K), from 0% to 100%, versus capacity of each bucket (K), from 1 to 65536; number of buckets (B) = 4M; annotation: 8 months]

Assume attacker knows which line maps to which bank

ELVF of E/8 not enough, need ELVF of E/64 for ~50% demand writes
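The “Buckets and Balls” bound can be explored with a small Monte Carlo experiment. This is an illustrative sketch only: it uses far fewer buckets than the 4M on the slide so that it runs quickly, and the function name and the seeded generator are assumptions.

```python
import random

def fraction_of_max_before_overflow(num_buckets, capacity, rng):
    """Throw balls at random buckets until one would exceed its capacity.

    Returns the balls placed so far divided by the theoretical maximum B * K.
    """
    load = [0] * num_buckets
    placed = 0
    while True:
        b = rng.randrange(num_buckets)
        if load[b] == capacity:
            return placed / (num_buckets * capacity)
        load[b] += 1
        placed += 1

rng = random.Random(1)   # seeded only for repeatability
for k in (1, 16, 256):
    # B = 1024 here is an assumption chosen for speed, not the slide's 4M.
    print(k, fraction_of_max_before_overflow(1024, k, rng))
```

Small bucket capacities overflow after a tiny fraction of the theoretical maximum, which is the birthday-paradox effect, while large capacities come much closer to it.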


Extensions for Handling BPA

G = G/8

Move gap 8 times faster to ensure ELVF of E/64

Avoid 8x write overhead: under attack, ensure >= 8 demand entries in WRQ before writing. This ensures dmin = 8.

Worst-case overhead: one “extra” write per demand write

With ELVF=E/64, 37% of demand writes + 37% “extra” writes.

Lifetime: 74% of 4 years (~3 years) [without any spare lines]


Generalized SMA: “Fooling” the detector

SMA tries to trick PAD by hiding 1 attack line in random writes

SMA is easy to detect but GSMA is much harder to detect.

k small enough that attack line does not enter PAD (k < 1/p, i.e. k < Avg. Detection Period)

n large enough that PAD forgets about the line if inserted (n > NumEntries/p, i.e. n > Avg. Retention Period)

For our setup, k < 256 and n > 256*16, i.e. n > 4096
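The two evasion conditions can be expressed as a small check. This is a sketch of the slide's arithmetic only; the function names are hypothetical, and k and n are the GSMA parameters as used on the slide.

```python
def gsma_evasion_bounds(num_entries=16, insert_prob=1 / 256):
    """Return (average detection period, average retention period) in writes."""
    avg_detection_period = 1 / insert_prob             # 1/p
    avg_retention_period = num_entries / insert_prob   # NumEntries/p
    return avg_detection_period, avg_retention_period

def may_evade(k, n, num_entries=16, insert_prob=1 / 256):
    """True if k < 1/p and n > NumEntries/p, the conditions listed on the slide."""
    detect, retain = gsma_evasion_bounds(num_entries, insert_prob)
    return k < detect and n > retain

print(gsma_evasion_bounds())     # (256.0, 4096.0)
print(may_evade(128, 8192))      # True
```

An attack that satisfies both conditions is slowed by its own padding writes, since the attack line receives only a small share of the write bandwidth.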


Lifetime under GSMA

k={1,2,4,8,16,32,64,128,256} x n={1K,2K,4K,8K,16K,32K,64K,128K}

For the worst-case attack under LRU, distance calc. off by 8x

If distance calc. is off by 8x, this means ELVF of E/8 instead of E/64

If attack was at full speed: lifetime of 8 months

GSMA inherently slow (8K/128 = 64x slower): lifetime ~2.2 years


Comparison

Parameter                   RBSG       RandSwap    Ideal Req.   aRBSG
Lifetime for typical apps   97%        87.5%       95+%         97%
Lifetime under attacks      Month(s)   Year(s)     Year(s)      Year(s)
Storage overhead            ~1.5KB     25-256KB    few KB       <1.3KB
Latency overhead            <10 cyc    <10 cyc     <10 cyc      <10 cyc
Write (perf.) overhead      1%         12.5%       <1%          <1%
Complexity                  Low        High        Low          Low

Possible to get security for free … Well, almost free


Summary

Security: We want lifetime in range of year(s) under attacks

We propose to decouple attack detection from corrective actions

We propose a simple and accurate attack detector (68 bytes)

We propose Adaptive RBSG that keeps write overhead for typical apps to < 1% and provides years of lifetime under attacks

Attack detection can be used for other corrective actions: slow down service, inform OS, swap page, inform sys admin

Attack detection also applicable to other wear leveling algorithms

Beta version: Work in progress (other attacks, theoretical analysis)


Other Extensions (not evaluated yet)

Pagemode can be supported securely as well by:

Randomizing at page granularity

Having a GAP-PAGE instead of a GAP-LINE

Rotating lines within page using a static randomizer: F(start,addr)

Make it harder for attacker to devise attacks. For example, randomize the index of DRAM cache

This will also prevent typical apps showing attack-like patterns

[Figure: diagram with labels Last Level Cache, Rand, InvRand, Addr, To Memory]
