


Exploiting Memory Device Wear-Out Dynamics to Improve NAND Flash Memory System Performance

Yangyang Pan, Guiqiang Dong, and Tong Zhang
Electrical, Computer and Systems Engineering Department

Rensselaer Polytechnic Institute, USA.

Abstract

This paper advocates a device-aware design strategy to improve various NAND flash memory system performance metrics. It is well known that NAND flash memory program/erase (PE) cycling gradually degrades memory device raw storage reliability, and sufficiently strong error correction codes (ECC) must be used to ensure the PE cycling endurance. Hence, memory manufacturers must fabricate a sufficient number of redundant memory cells geared to the worst-case device reliability at the end of the memory lifetime. Given the memory device wear-out dynamics, the existing worst-case-oriented ECC redundancy is largely under-utilized over the entire memory lifetime, and can be adaptively traded for improving certain NAND flash memory system performance metrics. This paper explores such a device-aware adaptive system design space from two perspectives: (1) how to improve memory program speed, and (2) how to improve memory defect tolerance and hence enable aggressive fabrication technology scaling. To enable quantitative evaluation, we for the first time develop a NAND flash memory device model that captures the effects of PE cycling at the system level. We carry out simulations using the DiskSim-based SSD simulator and a variety of traces, and the results demonstrate up to 32% SSD average response time reduction. We further demonstrate the potential for achieving very good defect tolerance, and finally show that these two design approaches can readily be combined to noticeably improve SSD average response time even in the presence of high memory defect rates.

1 Introduction

The steady bit cost reduction over the past decade has enabled NAND flash memory to enter increasingly diverse

∗ This material is based upon work supported by the National Science Foundation under Grant No. 0937794.

applications, from consumer electronics to personal and enterprise computing. In particular, it is now economically viable to implement solid-state drives (SSDs) using NAND flash memory, which is expected to fundamentally change the memory and storage hierarchy in future computing systems. As the semiconductor industry aggressively pushes NAND flash memory technology scaling and the use of multi-bit-per-cell storage schemes, NAND flash memory increasingly relies on error correction codes (ECC) to ensure data storage integrity. It is well known that NAND flash memory cells gradually wear out with program/erase (PE) cycling [6], which is reflected as a gradually diminishing memory cell storage noise margin (or an increasing raw storage bit error rate). To meet a specified PE cycling endurance limit, NAND flash memory manufacturers must fabricate a sufficient number of redundant memory cells to tolerate the worst-case raw storage reliability at the end of the memory lifetime. Clearly, the memory cell wear-out dynamics tend to make the existing worst-case-oriented ECC redundancy largely under-utilized over the entire lifetime of the memory, especially in its early lifetime when the PE cycling number is relatively small.

Very intuitively, we may adaptively trade such under-utilized ECC redundancy for improving certain NAND flash memory system performance metrics throughout the memory lifetime. This naturally leads to a PE-cycling-aware adaptive NAND flash memory system design paradigm. Based upon the extensive open literature on flash memory devices, we first develop an approximate NAND flash memory device model that quantitatively captures the dynamic PE cycling effects, including random telegraph noise [15, 17] and interface trap recovery and electron detrapping [26, 31, 45], as well as another major noise source: cell-to-cell interference [25]. Such a device model makes it possible to explore and quantitatively evaluate possible adaptive system design strategies. In particular, this paper explores the adaptive system design space from two perspectives: (1) Since NAND flash


memory program speed also strongly affects the memory cell storage noise margin, we could trade the under-utilized ECC redundancy to adaptively improve NAND flash memory program speed; (2) we could also exploit the under-utilized ECC redundancy to realize stronger memory cell defect tolerance and hence enable more aggressive technology scaling. We elaborate on the underlying rationale and realizations of these two design approaches. In addition, for the latter, we propose a simple differential wear-leveling strategy in order to minimize its impact on the effective PE cycling endurance.

For the purpose of evaluation, using the developed NAND flash memory device model, we first obtain detailed quantitative memory cell characteristics under different PE cycling counts and different program speeds for a hypothetical 2-bit/cell NAND flash memory. Accordingly, with a sector size of 512B of user data, we construct a binary (4798, 4096, 54) BCH code with 1.1% coding redundancy that can ensure the data storage integrity at the PE cycling limit of 10K. Using representative workload traces and the SSD model [3] in DiskSim [8], we carry out extensive simulations to evaluate the potential of trading under-utilized ECC redundancy to improve memory program speed while assuming the memory is defect-free. The simulation results show that we can reduce the SSD average response time by up to 32%. Assuming memory defects follow Poisson distributions, we further show that the proposed differential wear-leveling technique can very effectively improve the effectiveness of allocating ECC redundancy for improving memory defect tolerance. Finally, we study the combined effects when we trade the under-utilized ECC redundancy to improve memory program speed and realize defect tolerance at the same time. DiskSim-based simulations show that, even in the presence of high defect rates, we can still achieve a noticeable SSD average response time reduction.

2 Background

2.1 Memory Erase and Program Basics

Each NAND flash memory cell is a floating gate transistor whose threshold voltage can be configured (or programmed) by injecting a certain amount of charge into the floating gate. Hence, data storage in NAND flash memory is realized by programming the threshold voltage of each memory cell into two or more non-overlapping voltage windows. Before one memory cell can be programmed, it must be erased (i.e., the charge in the floating gate is removed, which sets its threshold voltage to the lowest voltage window). NAND flash memory uses Fowler-Nordheim (FN) tunneling to realize both erase and program [7], because FN tunneling requires very low current and hence enables high erase/program parallelism. It is well known that the threshold voltage of erased memory cells tends to have a wide Gaussian-like distribution [41]. Hence, we can approximately model the threshold voltage distribution of the erased state as

$$p_e(x) = \frac{1}{\sigma_e\sqrt{2\pi}}\, e^{-\frac{(x-\mu_e)^2}{2\sigma_e^2}}, \qquad (1)$$

where µ_e and σ_e are the mean and standard deviation of the erased-state threshold voltage. Regarding memory programming, tight threshold voltage control is typically realized using incremental step pulse programming (ISPP) [6, 39], i.e., all the memory cells on the same word-line are recursively programmed using a program-and-verify approach with a staircase program word-line voltage V_pp. Let ∆V_pp denote the incremental program step voltage. For the k-th programmed state with the verify voltage V_p^{(k)}, ideal ISPP programming results in a uniform threshold voltage distribution:

$$p_p^{(k)}(x) = \begin{cases} \frac{1}{\Delta V_{pp}}, & \text{if } V_p^{(k)} \le x \le V_p^{(k)} + \Delta V_{pp} \\ 0, & \text{else.} \end{cases} \qquad (2)$$

Unfortunately, the above ideal memory cell threshold voltage distribution can be (significantly) distorted in practice, mainly due to PE cycling and cell-to-cell interference, which are discussed in the remainder of this section.

2.2 Effects of PE Cycling

Flash memory PE cycling causes damage to the tunnel oxide of floating gate transistors in the form of charge trapping in the oxide and interface states [9, 30, 34], which directly results in threshold voltage shift and fluctuation and hence gradually degrades the memory device noise margin. Major distortion sources include

1. Electron capture and emission events at charge trap sites near the interface, developed over PE cycling, directly result in memory cell threshold voltage fluctuation, which is referred to as random telegraph noise (RTN) [15, 17];

2. Interface trap recovery and electron detrapping [26, 31, 45] gradually reduce the memory cell threshold voltage, leading to the data retention limitation.

Moreover, electrons trapped in the oxide over PE cycling make it difficult to erase the memory cells, leading to a longer erase time; equivalently, under the same erase time, those trapped electrons make the threshold voltage of the erased state increase [4, 21, 27, 42]. Most commercial flash chips employ an erase-and-verify operation to prevent the increase of the erased-state threshold voltage, at the penalty of a gradually longer erase time with PE cycling.


[Figure: model pipeline — Memory Erase (µ_e, σ_e) → Ideal Programming (∆V_pp) → Distorted by RTN (λ_r) → Distorted by Cell-to-Cell Interference → Distorted by interface trap recovery and electron detrapping (µ_d, σ_d, t) → Final Threshold Voltage Distribution, all driven by the PE cycling number N.]

Figure 1: Illustration of the approximate NAND flash memory device model to incorporate major threshold voltage distortion sources.

RTN causes random fluctuation of the memory cell threshold voltage, where the fluctuation magnitude is subject to exponential decay. Hence, we can model the probability density function p_r(x) of RTN-induced threshold voltage fluctuation as a symmetric exponential function [15]:

$$p_r(x) = \frac{1}{2\lambda_r}\, e^{-\frac{|x|}{\lambda_r}}. \qquad (3)$$

Let N denote the PE cycling number; λ_r scales with N in an approximate power-law fashion, i.e., λ_r is approximately proportional to N^α, where α tends to be less than 1.

Interface trap recovery and electron detrapping processes approximately follow Poisson statistics [30]; hence the threshold voltage reduction due to interface trap recovery and electron detrapping can be approximately modeled as a Gaussian distribution N(µ_d, σ_d²). Both µ_d and σ_d² scale with N in an approximate power-law fashion, and scale with the retention time t in a logarithmic fashion. Moreover, the significance of the threshold voltage reduction induced by interface trap recovery and electron detrapping is also proportional to the initial threshold voltage magnitude [27], i.e., the higher the initial threshold voltage, the faster interface trap recovery and electron detrapping occur and hence the larger the threshold voltage reduction.

2.3 Cell-to-Cell Interference

In NAND flash memory, the threshold voltage shift of one floating gate transistor can influence the threshold voltage of its neighboring floating gate transistors through the parasitic capacitance-coupling effect [25]. This is referred to as cell-to-cell interference, which has been well recognized as one of the major noise sources in NAND flash memory [24, 29, 36]. The threshold voltage shift of a victim cell caused by cell-to-cell interference can be estimated as [25]

$$F = \sum_k \Delta V_t^{(k)} \cdot \gamma^{(k)}, \qquad (4)$$

where ∆V_t^{(k)} represents the threshold voltage shift of one interfering cell that is programmed after the victim cell, and the coupling ratio γ^{(k)} is defined as

$$\gamma^{(k)} = \frac{C^{(k)}}{C_{\mathrm{total}}}, \qquad (5)$$

where C^{(k)} is the parasitic capacitance between the interfering cell and the victim cell, and C_total is the total capacitance of the victim cell. The significance of cell-to-cell interference is affected by the NAND flash memory bit-line structure. In current design practice, there are two different bit-line structures: the conventional even/odd bit-line structure [35, 40] and the emerging all-bit-line structure [10, 28]. For a write, the all-bit-line structure writes all the cells on the same word-line. In the even/odd bit-line structure, memory cells on one word-line are alternately connected to even and odd bit-lines and are programmed at different times. Therefore, an even cell is mainly interfered by five neighboring cells while an odd cell is interfered by only three neighboring cells, so even cells and odd cells experience largely different amounts of cell-to-cell interference. Cells in the all-bit-line structure suffer less cell-to-cell interference than even cells in the even/odd structure, and the all-bit-line structure can most effectively support high-speed current sensing to improve the memory read and verify speed. Therefore, throughout the remainder of this paper, we mainly consider NAND flash memory with the all-bit-line structure.
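As a concrete illustration of Eq. (4) and Eq. (5), the sketch below evaluates the weighted sum for one victim cell in the all-bit-line structure, where a victim is mainly disturbed by one vertical and two diagonal neighbors on the adjacent word-line. The coupling-ratio means (0.08 vertical, 0.0048 diagonal) are the values given later in Example 2.1; the interfering-cell threshold voltage shifts are hypothetical numbers chosen only for illustration.

```python
# Eq. (4): F = sum_k dV_t^(k) * gamma^(k), where each gamma^(k) is the
# capacitance ratio of Eq. (5). In the all-bit-line structure a victim cell
# is mainly disturbed by three cells on the adjacent word-line: one vertical
# neighbor (mean coupling ratio 0.08) and two diagonal neighbors (0.0048).

def interference_shift(delta_vt, gamma):
    """Weighted sum of interfering-cell threshold voltage shifts, Eq. (4)."""
    assert len(delta_vt) == len(gamma)
    return sum(dv * g for dv, g in zip(delta_vt, gamma))

gammas = [0.08, 0.0048, 0.0048]   # vertical, two diagonals (Example 2.1 means)
shifts = [2.0, 1.5, 2.5]          # hypothetical dV_t of the interfering cells

F = interference_shift(shifts, gammas)
print(round(F, 4))                # 2.0*0.08 + 1.5*0.0048 + 2.5*0.0048 = 0.1792
```

Because the diagonal coupling ratios are more than an order of magnitude smaller than the vertical one, the vertical neighbor dominates the interference in this structure.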

2.4 An Approximate NAND Flash Memory Device Model

Based on the above discussions, we can approximately model NAND flash memory device characteristics as shown in Fig. 1. Accordingly, we can simulate the memory cell threshold voltage distribution and the corresponding memory cell raw storage reliability. Based upon Eq. (1) and Eq. (2), we can obtain the distortion-less threshold voltage distribution function p_p(x). Recall that p_r(x) denotes the RTN distribution function (see Eq. (3)), and let p_ar(x) denote the threshold voltage distribution after incorporating RTN, which is obtained by convolving p_p(x) and p_r(x):

$$p_{ar}(x) = p_p(x) \otimes p_r(x). \qquad (6)$$


Cell-to-cell interference is further incorporated based on Eq. (4). To capture the inevitable process variability, we set both the vertical coupling ratio γ_y and the diagonal coupling ratio γ_xy to be random variables with bounded Gaussian distributions:

$$p_c(x) = \begin{cases} \frac{c_c}{\sigma_c\sqrt{2\pi}}\, e^{-\frac{(x-\mu_c)^2}{2\sigma_c^2}}, & \text{if } |x-\mu_c| \le w_c \\ 0, & \text{else,} \end{cases} \qquad (7)$$

where µ_c and σ_c are the mean and standard deviation, and c_c is chosen to ensure that this bounded Gaussian distribution integrates to 1. We set w_c = 0.1µ_c and σ_c = 0.4µ_c in this work.

Let p_ac(x) denote the threshold voltage distribution after incorporating cell-to-cell interference, and let p_t(x) denote the distribution of the threshold voltage fluctuation induced by interface trap recovery and electron detrapping; the final threshold voltage distribution p_f(x) is obtained as

$$p_f(x) = p_{ac}(x) \otimes p_t(x). \qquad (8)$$

Example 2.1 Let us consider 2-bit/cell NAND flash memory. We set the normalized σ_e and µ_e of the erased state to 0.35 and 1.4, respectively. For the three programmed states, we set the normalized program step voltage ∆V_pp to 0.3, and the normalized verify voltages V_p to 2.85, 3.55, and 4.25, respectively. For the RTN distribution function p_r(x), we set the parameter λ_r = K_λ · N^{0.5}, where K_λ equals 4 × 10⁻⁴. Regarding cell-to-cell interference, according to [36, 38], we set the means of γ_y and γ_xy to 0.08 and 0.0048, respectively. For the function N(µ_d, σ_d²) that captures interface trap recovery and electron detrapping, according to [30, 31], we set µ_d to scale with N^{0.5} and σ_d² to scale with N^{0.6}, and both to scale with ln(1 + t/t₀), where t denotes the memory retention time and t₀ is an initial time, which can be set to 1 hour. In addition, as pointed out earlier, both µ_d and σ_d² also depend on the initial threshold voltage. Hence, we set both to approximately scale with K_s(x − x₀), where x is the initial threshold voltage, and x₀ and K_s are constants. Therefore, we have

$$\begin{cases} \mu_d = K_s(x-x_0)\, K_d\, N^{0.5} \ln(1+t/t_0) \\ \sigma_d^2 = K_s(x-x_0)\, K_m\, N^{0.6} \ln(1+t/t_0), \end{cases} \qquad (9)$$

where we set K_s = 0.333, x₀ = 1.4, K_d = 4 × 10⁻⁴, and K_m = 2 × 10⁻⁶ by fitting the measurement data presented in [30, 31]. Accordingly, we carry out Monte Carlo computer simulations to obtain the cell threshold voltage distribution shown in Fig. 2, which illustrates how RTN, cell-to-cell interference, and retention noise affect the threshold voltage distribution.

Figure 2: Simulated results to show the effects of RTN, cell-to-cell interference, and retention noise on the memory cell threshold voltage distribution.
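The Monte Carlo procedure of Example 2.1 can be sketched as follows. This is a simplified sketch, not the authors' simulator: it samples Eqs. (1)-(3) and (9) directly with the Example 2.1 parameters instead of computing the convolutions of Eq. (6) and Eq. (8) analytically, omits cell-to-cell interference (Eq. (4)) for brevity, and clamps the retention-loss scale at zero for cells below x₀, which is our own simplifying assumption.

```python
# Simplified Monte Carlo sketch of the Section 2.4 device model using the
# normalized Example 2.1 parameters. The convolutions of Eq. (6)/(8) are
# realized by sampling and summing independent noise terms; cell-to-cell
# interference is omitted for brevity.
import math
import random

random.seed(0)

MU_E, SIGMA_E = 1.4, 0.35        # erased state, Eq. (1)
V_PP = 0.3                       # program step voltage
VERIFY = [2.85, 3.55, 4.25]      # verify voltages of the programmed states
K_LAMBDA = 4e-4                  # RTN scale coefficient
K_S, X0 = 0.333, 1.4             # initial-voltage dependence, Eq. (9)
K_D, K_M = 4e-4, 2e-6            # retention-loss coefficients, Eq. (9)

def sample_cell(state, n_pe, t_hours):
    """One cell's final threshold voltage after RTN and retention noise."""
    if state == 0:
        v = random.gauss(MU_E, SIGMA_E)                    # Eq. (1)
    else:
        v = VERIFY[state - 1] + random.uniform(0.0, V_PP)  # ISPP window, Eq. (2)
    lam = K_LAMBDA * n_pe ** 0.5                           # lambda_r ~ N^0.5
    if lam > 0:                                            # Laplace RTN, Eq. (3)
        mag = random.expovariate(1.0 / lam)
        v += mag if random.random() < 0.5 else -mag
    # Retention loss, Eq. (9); clamping at 0 for v < x0 is our assumption.
    scale = max(K_S * (v - X0), 0.0) * math.log(1.0 + t_hours)
    mu_d = scale * K_D * n_pe ** 0.5
    sigma_d = math.sqrt(scale * K_M * n_pe ** 0.6)
    return v - random.gauss(mu_d, sigma_d)

# Highest programmed state after 10K PE cycles and 1 year of retention.
cells = [sample_cell(3, 10000, 24 * 365) for _ in range(10000)]
mean_v = sum(cells) / len(cells)
print(round(mean_v, 2))
```

Repeating the sampling for all four states and histogramming the results reproduces the qualitative shape of Fig. 2: the programmed-state distributions widen and shift downward as N and t grow.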

3 System Design Adaptive to PE Cycling

From the above discussions, it is clear that NAND flash memory cell raw storage reliability gradually degrades with PE cycling: during the early lifetime of the memory cells (i.e., when the PE cycling number N is relatively small), the aggregated PE cycling effects are relatively small, which leads to a relatively large memory cell storage noise margin and hence good raw storage reliability (i.e., a low raw storage bit error rate); since the aggregated PE cycling effects scale with N in approximate power-law fashions, the memory cell storage noise margin and hence the raw storage reliability gradually degrade as the PE cycling number N increases. Given the target PE cycling endurance limit (e.g., 10K PE cycles), each memory word-line must have enough redundant memory cells so that the corresponding ECC can ensure the storage integrity as the PE cycling reaches the endurance limit. Due to these memory cell raw storage reliability dynamics, the redundancy geared to the worst-case scenario will over-protect the user data for most of the memory lifetime, especially the early lifetime when the memory cell operational noise margin is much larger. This is illustrated in Fig. 3, which clearly suggests that the redundant memory cells are essentially under-utilized in the memory's early lifetime.

Very intuitively, we may trade such under-utilized redundancy to improve certain memory system performance metrics, which should be carried out adaptively to the memory PE cycling. In this work, we explore this adaptive memory system design space from two perspectives, as discussed in the remainder of this section.

[Figure: at 1K PE cycling (low raw bit error rate) the redundancy geared to the worst case is partly under-utilized; at 10K PE cycling (high raw bit error rate) it is fully utilized.]

Figure 3: Illustration of the under-utilized ECC redundancy before reaching the PE cycling endurance limit.

3.1 Approach I: Improve Memory Program Speed

In this subsection, we elaborate on the potential of trading the under-utilized ECC redundancy to improve the average memory program speed. As discussed in Section 2.1, NAND flash memory programming is carried out recursively by sweeping over the entire memory cell threshold voltage range with a program step voltage ∆V_pp. As a result, the memory program latency is inversely proportional to ∆V_pp, which suggests that we can improve the memory program speed by increasing ∆V_pp. However, a larger ∆V_pp directly results in a wider threshold voltage distribution for each programmed state, leading to less noise margin between adjacent programmed states and hence a worse raw storage bit error rate. Therefore, there is an inherent trade-off between memory program speed and memory raw bit error rate, which can be configured by adjusting the program step voltage ∆V_pp. Since the memory cell noise margin is further degraded by the PE cycling effects discussed above, a given ∆V_pp will result in a different noise margin (hence a different raw storage bit error rate) as memory cells undergo different amounts of PE cycling.

In current design practice, ∆V_pp is fixed and its value is sufficiently small that the ECC can tolerate the worst-case memory raw storage bit error rate as the PE cycling reaches its endurance limit. As a result, the memory program speed remains largely unchanged while the raw storage bit error rate gradually degrades. Before the PE cycling number reaches its endurance limit, the existing redundancy is under-utilized, as pointed out above. Clearly, to eliminate such redundancy under-utilization, we could intentionally increase the program step voltage ∆V_pp according to the run-time PE cycling number in such a way that the memory raw storage bit error rate always stays close to what can maximally be tolerated by the existing redundancy. Therefore, the existing redundancy is almost always fully utilized, and meanwhile the dynamically increased ∆V_pp leads to a higher average memory program speed. The above discussion is further illustrated in Fig. 4.

Although it would be ideal if the program step voltage ∆V_pp could be smoothly adjusted with very fine granularity, the limited reference voltage accuracy in real NAND flash memory chips may only enable the use of a few discrete program step voltages. Assume there are m different program step voltages, i.e., ∆V_pp^{(1)} > ∆V_pp^{(2)} > · · · > ∆V_pp^{(m)}. Given the existing ECC redundancy, we can obtain a sequence of PE cycling thresholds N₀ = 0 < N₁ < · · · < N_m such that, if the run-time PE cycling number falls into the range [N_{i−1}, N_i), we can use the program step voltage ∆V_pp^{(i)} and still ensure the overall system data storage integrity. If we follow the conventional design practice where the program step voltage is fixed according to the worst-case scenario, the smallest step voltage ∆V_pp^{(m)} will be used throughout the entire memory lifetime. Therefore, we can estimate the average program speed improvement over the entire memory lifetime as

$$s = 1 - \frac{\sum_{i=1}^{m}(N_i - N_{i-1})\cdot\frac{1}{\Delta V_{pp}^{(i)}}}{N_m\cdot\frac{1}{\Delta V_{pp}^{(m)}}}. \qquad (10)$$
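Eq. (10) can be evaluated directly once the thresholds N_i are known. The sketch below uses the step voltages and PE cycling thresholds reported later in Section 4.1, with program latency taken as proportional to 1/∆V_pp, and reproduces the roughly 18% average speed improvement stated there.

```python
# Evaluating Eq. (10): program latency is proportional to 1/dV_pp, so the
# lifetime-aggregate program time of the adaptive schedule is compared
# against always using the fixed worst-case (smallest) step voltage.

def avg_speed_improvement(n_thresholds, step_voltages):
    """Eq. (10): relative reduction in lifetime-aggregate program time."""
    assert len(n_thresholds) == len(step_voltages)
    adaptive_time, n_prev = 0.0, 0
    for n_i, dv_i in zip(n_thresholds, step_voltages):
        adaptive_time += (n_i - n_prev) / dv_i  # cycles in [N_{i-1}, N_i) use dV^(i)
        n_prev = n_i
    worst_case_time = n_thresholds[-1] / step_voltages[-1]
    return 1.0 - adaptive_time / worst_case_time

# Section 4.1 values: dV_pp = 0.45, 0.40, 0.35, 0.30 survive up to
# 2710, 4820, 7500, and 10000 PE cycles, respectively (1-year retention).
s = avg_speed_improvement([2710, 4820, 7500, 10000], [0.45, 0.40, 0.35, 0.30])
print(f"{s:.0%}")   # 18%, matching the improvement reported in Section 4.1
```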

3.2 Approach II: Improve Memory Technology Scalability

In this subsection, we elaborate on the potential of trading the under-utilized ECC redundancy to improve memory defect tolerance. With the help of very sophisticated techniques such as double patterning [20], decade-long 193nm photolithography has successfully pushed NAND flash memory into the sub-30nm region. However, as the industry strives to push NAND flash memory technology scaling into the sub-20nm region by using immersion photolithography or new lithography technologies such as nanoimprint, defects in such extremely dense memory arrays may inevitably increase.


[Figure: at a given PE cycling, a small ∆V_pp gives slow programming and a low raw bit error rate, while a large ∆V_pp gives fast programming and a high raw bit error rate.]

Figure 4: Illustration of the impact of the program step voltage ∆V_pp on the program speed vs. raw storage bit error rate trade-off.

As a result, conventional spare row/column repair techniques may become inadequate to ensure a sufficiently high yield.

Very intuitively, the existing ECC redundancy can be leveraged to tolerate memory defects, especially random memory cell defects. However, if a certain portion of the ECC redundancy is used for defect tolerance, it can no longer ensure the specified PE cycling limit, leading to PE cycling endurance degradation. Since all the pages in each memory block undergo the same number of PE cycles, the worst-case page (i.e., the page containing the most defects) in each block sets the achievable PE cycling endurance for that block. For example, assume the existing ECC redundancy can tolerate up to 50 errors per page and survive up to 10K PE cycles in the absence of any memory cell defects. If the worst-case page in one block contains 5 defective cells, then it can only use the residual 45-error-correcting capability to tolerate memory operational noises such as PE cycling effects and cell-to-cell interference. Suppose this means the worst-case page can only survive up to 8K PE cycles; then this block can only be erased 8K times instead of 10K times before risking data loss.

Clearly, if we attempt to reserve a certain amount of ECC redundancy for tolerating memory cell defects, we must minimize the impact on the overall memory PE cycling endurance. In current design practice, NAND flash memory uses wear-leveling to uniformly spread program/erase operations among all the memory blocks to maximize the overall memory lifetime. Since memory blocks with different numbers of defective memory cells can survive different numbers of PE cycles, uniform wear-leveling is clearly not an optimal option. Instead, we should make wear-leveling fully aware of the different achievable PE cycling limits among the memory blocks, which is referred to as differential wear-leveling. This can be illustrated in Fig. 5: instead of uniformly distributing program/erase operations among all the memory blocks, differential wear-leveling schedules the program/erase operations among the memory blocks in proportion to their achievable PE cycling limits. As a result, we may largely improve the overall memory lifetime compared with uniform wear-leveling.

[Figure: endurance and remaining life of Blocks 0-2 at early, middle, and end of lifetime under schemes (a) and (b).]

Figure 5: Illustration of (a) conventional uniform wear-leveling, and (b) the proposed differential wear-leveling, where the ECC is used to tolerate defective memory cells and hence different blocks may have different achievable PE cycling endurance.

Assume the worst-case page contains at most M defective memory cells, and let P_d denote the probability that the worst-case page in one block contains d ∈ [0, M] defective memory cells. Given the number of defective memory cells d in the worst-case page, we can obtain the corresponding achievable PE cycling endurance limit N^{(d)}, i.e., the ECC can ensure a PE cycling number of up to N^{(d)} while tolerating d defective memory cells. Clearly, we have N^{(0)} > N^{(1)} > · · · > N^{(M)}, where N^{(0)} is the achievable PE cycling limit in the defect-free scenario. Define the effective PE cycling endurance as the average PE cycling limit of all the memory blocks. Under uniform wear-leveling, the memory chip can only sustain PE cycling of N^{(M)}. Therefore, compared with the defect-free scenario, the effective PE cycling endurance degrades by a factor of N^{(0)}/N^{(M)}, which can result in a significant memory lifetime degradation. On the other hand, under ideal differential wear-leveling, each block can reach its own PE cycling limit as illustrated in Fig. 5; hence the effective PE cycling endurance will be ∑_{d=0}^{M} P_d · N^{(d)}, representing an improvement of

$$\frac{\sum_{d=0}^{M} P_d \cdot N^{(d)}}{N^{(M)}} \qquad (11)$$

over uniform wear-leveling. We note that this design approach can be combined with the one presented in Section 3.1 to improve the average memory program speed in the presence of memory cell defects. Given the number of defective memory cells d and the set of m program step voltages ∆V_pp^{(i)} for 1 ≤ i ≤ m, we can obtain a set of PE cycling thresholds N₀^{(d)} = 0 < N₁^{(d)} < · · · < N_m^{(d)}, i.e., if the present PE cycling number falls into the range [N_{i−1}^{(d)}, N_i^{(d)}), we can use the program step voltage ∆V_pp^{(i)} and meanwhile ensure tolerance of d defective memory cells. Therefore, for the blocks whose worst-case page contains d defective memory cells, the average program speed improvement is

$$s_d = 1 - \frac{\sum_{i=1}^{m}\bigl(N_i^{(d)} - N_{i-1}^{(d)}\bigr)\cdot\frac{1}{\Delta V_{pp}^{(i)}}}{N_m^{(d)}\cdot\frac{1}{\Delta V_{pp}^{(m)}}}. \qquad (12)$$

The overall average program speed improvement can be further estimated as ∑_{d=0}^{M} P_d · s_d.
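The endurance comparison above can be sketched numerically. Following the assumption used in Section 4, the worst-case-page defect count is modeled as Poisson-distributed; the per-defect-count endurance limits N(d) below are hypothetical placeholder values, since the real ones come from the device model and ECC strength.

```python
# Sketch of the effective PE cycling endurance comparison of Section 3.2.
# P_d follows a Poisson defect count for the worst-case page (the Section 4
# assumption); the endurance limits N(d) are hypothetical placeholders.
import math

def effective_endurance(defect_rate, n_limits):
    """Eq. (11) numerator: sum_d P_d * N(d) under ideal differential
    wear-leveling, with the Poisson tail mass folded into d = M."""
    m = len(n_limits) - 1
    p = [math.exp(-defect_rate) * defect_rate**d / math.factorial(d)
         for d in range(m)]
    p.append(1.0 - sum(p))          # P(d >= M): capped at the worst case N(M)
    return sum(pd * nd for pd, nd in zip(p, n_limits))

# Hypothetical endurance limits for d = 0..3 defects in the worst-case page.
n_limits = [10000, 9000, 8000, 7000]
diff = effective_endurance(0.5, n_limits)   # mean of 0.5 defects per page
uniform = n_limits[-1]                      # uniform wear-leveling: N(M)

print(round(diff / uniform, 3))             # Eq. (11): improvement factor
```

With these placeholder numbers most blocks are defect-free, so ideal differential wear-leveling recovers most of the endurance that uniform wear-leveling forfeits by pessimistically treating every block as worst-case.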

4 Evaluation Results

We carried out simulations and analysis to further demonstrate the effectiveness of the above two design approaches and their combination. To carry out trace-based simulations, we use the SSD module [3] in DiskSim [8] and six workload traces: Iozone and Postmark [3], Finance1 and Finance2 from [1], and Trace1 and Trace2 from [16]. The simulator supports several packages that operate in parallel to improve SSD throughput. Each package contains 2 dies that share an 8-bit I/O bus and a number of common control signals; each die contains 4 planes, and each plane contains 2048 blocks. Each block contains 64 4KB pages, each of which consists of eight 512B sectors. Following version 2.1 of the Open NAND Flash Interface (ONFI) [2], we set the NAND flash chip interface bus bandwidth as 200MB/s. Regarding the ECC, we assume that binary (n, k, t) BCH codes are used, where n is the codeword length, k is the user data length (i.e., 512B in this study), and t is the error-correcting capability. We consider the use of 2bit/cell NAND flash memory, and set the baseline 2bit/cell NAND flash memory using the equivalent memory channel model parameters presented in Example 2.1 in Section 2.4, for which a (4798, 4096, 54) BCH code can ensure a PE cycling endurance limit of 10K under a retention time of 1 year. We note that the target NAND flash memory retention time is fixed as 1 year throughout all the studies in this work.
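As a quick consistency check on the (4798, 4096, 54) code: a binary t-error-correcting BCH code built over GF(2^m) spends m parity bits per correctable error, and since n ≤ 2^13 − 1 = 8191 here, m = 13, so n − k = 13 · 54 = 702 = 4798 − 4096. A minimal sketch (the helper name is ours):

```python
# Sanity-check the (4798, 4096, 54) BCH code parameters: a (possibly
# shortened) binary BCH code over GF(2^m) uses m parity bits per
# correctable error, with m the smallest integer such that n <= 2^m - 1.
import math

def bch_parity_bits(n: int, t: int) -> int:
    """Parity bits of a t-error-correcting binary BCH code of length n."""
    m = math.ceil(math.log2(n + 1))  # field degree; here m = 13
    return m * t

n, k, t = 4798, 4096, 54
assert bch_parity_bits(n, t) == n - k  # 13 * 54 = 702 parity bits
```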

In this section, we first present trace-based simulation results to demonstrate how the first design approach can reduce the overall request response time and hence improve SSD speed performance. Then, we present analysis results to demonstrate the second design approach, assuming memory cell defects follow a Poisson distribution. Finally, we demonstrate the effectiveness when these two approaches are combined to improve SSD speed performance in the presence of memory cell defects.

4.1 Improve SSD Speed Performance

In the baseline scenario with the parameters listed in Example 2.1, the normalized program step voltage ∆Vpp is 0.3. As discussed in Section 3.1, we can use larger-than-worst-case ∆Vpp over the memory lifetime to improve memory program speed by exploiting the memory device wear-out dynamics. In this work, we assume that memory chip voltage generators can increase ∆Vpp with a step of 0.05; hence we consider four different normalized values: ∆V_pp^(1) = 0.45, ∆V_pp^(2) = 0.4, ∆V_pp^(3) = 0.35, and ∆V_pp^(4) = 0.3. By carrying out Monte Carlo simulations without changing the other memory model parameters, we find that these four program step voltages can survive up to N1 = 2710, N2 = 4820, N3 = 7500, and N4 = 10000 PE cycles, respectively, under a retention time of 1 year. Therefore, according to Eq.(10), the average NAND flash memory program speed can be improved by 18% compared with the baseline scenario. We further carried out DiskSim-based simulations to investigate how such improved memory program speed can reduce the SSD average response time (incorporating both write and read request response time) for different traces under different system configurations. We set the 2bit/cell NAND flash memory program latency as 600µs when the normalized program step voltage ∆Vpp is 0.3, the on-chip memory sensing latency as 30µs, and the erase time as 3ms.
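The 18% figure can be reproduced directly from the thresholds above, under the model's assumption that program latency scales as 1/∆Vpp; a sketch:

```python
# Reproduce the ~18% average program speed improvement from Section 4.1,
# assuming program latency is proportional to 1/dVpp (per the paper's model).
dVpp = [0.45, 0.40, 0.35, 0.30]   # program step voltages, largest first
N    = [2710, 4820, 7500, 10000]  # PE cycling thresholds for each dVpp

prev = 0
weighted_time = 0.0
for n_i, dv in zip(N, dVpp):
    weighted_time += (n_i - prev) / dv  # time spent in each PE-cycling interval
    prev = n_i

baseline_time = N[-1] / dVpp[-1]        # always using worst-case dVpp = 0.3
s = 1 - weighted_time / baseline_time
print(f"{s:.0%}")  # -> 18%
```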

In this study, we consider the use of 4 and 8 parallel packages. Fig. 6 compares the normalized SSD average response time when using 4 and 8 parallel packages, respectively, where we set ∆Vpp as 0.3. It shows that using more parallel packages can directly improve SSD speed performance, which can be intuitively justified. Fig. 7(a) and Fig. 7(b) show the normalized SSD average response time under the 4 different normalized program step voltages ∆Vpp for all 6 traces when the SSD contains 4 and 8 parallel packages, respectively. We use the first-come first-served (FCFS) scheduling scheme in the simulations. Compared with the baseline scenario with ∆Vpp = 0.3, the average response time can be reduced by up to ∼50%


Figure 7: Simulated normalized average response time when the SSD contains (a) 4 parallel packages, and (b) 8 parallel packages.


Figure 6: Comparison of normalized SSD average response time with 4 and 8 parallel packages (∆Vpp = 0.3).

with 4 parallel packages and up to ∼40% with 8 parallel packages. The results show that the use of a larger program step voltage can consistently improve SSD speed performance under different numbers of parallel packages.

Given the PE cycling thresholds Ni for i = 1, 2, 3, 4 as presented above, the NAND flash memory should employ the program step voltage ∆V_pp^(i) when the present PE cycling number falls into [N_{i−1}, N_i), where N0 is set to 0. Therefore, based on the simulation results shown in Fig. 7, we can obtain the overall SSD average response time reduction compared with the baseline scenario, as shown in Fig. 8. It shows that this proposed design approach can noticeably improve the overall SSD speed performance. Intuitively, those traces with higher write request ratios (e.g., Iozone, Trace1, and Trace2) tend to benefit more from this design approach, as shown in Fig. 8. In addition, as we increase the package parallelism from 4 to 8, the overall response time reduction consistently decreases over all the traces. This can be explained as follows: as the SSD contains more parallel packages, the increased architecture-level parallelism directly improves SSD speed performance, as illustrated in Fig. 6. As a result, the improvement in device-level program speed becomes relatively less significant with respect to the improvement of overall system speed performance.


Figure 8: Overall SSD average response time reduction compared with the baseline scenario when using 4 and 8 parallel packages.

In the above simulations, the FCFS scheduling scheme has been used. To study the sensitivity of this design approach to different scheduling schemes, we repeat the above simulations using two other popular scheduling schemes: ELEVATOR and SSTF (shortest seek time first) [43]. Fig. 9 shows the overall SSD average response time reduction compared with the baseline scenario, where the SSD contains 4 parallel packages. The results show that the proposed design approach can consistently improve overall SSD speed performance under different scheduling schemes.


Figure 9: Overall SSD average response time reduction compared with the baseline scenario under different scheduling schemes.

4.2 Improve Defect Tolerance

To demonstrate the proposed design approach for improving memory defect tolerance, we assume that the number of defective memory cells in each worst-case page follows a Poisson distribution, which is widely used to model defects in integrated circuits. Under this model, the probability that the worst-case page in each block contains k defective memory cells is

f(k; λ) = λ^k e^{−λ} / k!,

where the parameter λ is the mean number of defective memory cells in each worst-case page. Given the parameter λ, we find the value M so that Σ_{i=0}^{M} f(i; λ) ≥ 0.999, and assume that any block whose worst-case page contains more than M defective memory cells can be replaced by a redundant block. In this work, we consider the mean λ ranging from 1 to 4, and accordingly the maximum value of M is 12.
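The threshold M(λ) is easy to compute directly from the Poisson CDF. A sketch (with a strict ≥ 0.999 cutoff this yields M = 5, 8, 10, 11 for λ = 1, 2, 3, 4, which stays within the stated maximum of 12; the exact values depend on how the cutoff is rounded):

```python
# Find the block-replacement threshold M(lambda): the smallest M whose
# Poisson CDF reaches 0.999, as described in Section 4.2.
import math

def poisson_pmf(k: int, lam: float) -> float:
    return lam ** k * math.exp(-lam) / math.factorial(k)

def find_M(lam: float, target: float = 0.999) -> int:
    cdf, M = 0.0, 0
    while True:
        cdf += poisson_pmf(M, lam)
        if cdf >= target:
            return M
        M += 1

for lam in (1, 2, 3, 4):
    print(lam, find_M(lam))  # -> 5, 8, 10, 11
```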

Using the baseline NAND flash memory model parameters as listed in Example 2.1, we can obtain the achievable PE cycling limit N(d) for each d, i.e., we use the (4798, 4096, 54) BCH code to tolerate d defective memory cells and meanwhile use its residual (54 − d)-error-correcting capability to ensure a PE cycling endurance limit of N(d) under a retention time of 1 year. Fig. 10 shows the achievable PE cycling limit N(d) with d ranging from 0 to 12. Under different values of the mean λ, we have different values of M, denoted as M(λ). When uniform wear-leveling is used, the effective PE


Figure 10: Achievable PE cycling endurance under different numbers of defective memory cells in the worst-case page.

cycling endurance is simply N(d) with d = M(λ). When the proposed differential wear-leveling is used, the effective PE cycling endurance is

Σ_{d=0}^{M(λ)} (λ^d e^{−λ} / d!) · N(d),    (13)

for a given mean λ. Fig. 11 shows the effective PE cycling endurance when these two different wear-leveling schemes are used under different values of λ. The results show that the proposed differential wear-leveling can noticeably improve the effective PE cycling endurance, and hence the SSD lifetime, compared with uniform wear-leveling.

Figure 11: Effective PE cycling endurance when using uniform wear-leveling and differential wear-leveling under different values of λ.

As the defect density increases (i.e., λ increases), the gain of differential wear-leveling over uniform wear-leveling grows accordingly (from about 10% improvement at λ = 1 to about 30% improvement at λ = 4).
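To illustrate Eq. (13), the sketch below compares the two wear-leveling schemes using a hypothetical N(d) curve; the actual values come from the device model behind Fig. 10, so the linear decay here is only a stand-in:

```python
# Compare effective PE cycling endurance under uniform vs. differential
# wear-leveling, per Eq. (13). The N(d) curve below is a hypothetical
# stand-in for the measured values in Fig. 10 (N(0) = 10000 at zero
# defects, decreasing as the defect count d grows).
import math

def poisson_pmf(d: int, lam: float) -> float:
    return lam ** d * math.exp(-lam) / math.factorial(d)

def N(d: int) -> int:
    # Hypothetical achievable PE cycling limit for d defective cells.
    return 10000 - 600 * d

def uniform_endurance(M: int) -> int:
    # Uniform wear-leveling: every block is limited by the worst case d = M.
    return N(M)

def differential_endurance(lam: float, M: int) -> float:
    # Differential wear-leveling: each block is cycled up to its own N(d).
    return sum(poisson_pmf(d, lam) * N(d) for d in range(M + 1))

for lam, M in ((1, 5), (4, 12)):
    gain = differential_endurance(lam, M) / uniform_endurance(M)
    print(f"lambda={lam}: {gain:.2f}x over uniform")
```

With any decreasing N(d), the differential scheme's defect-weighted average always exceeds the uniform scheme's worst-case value N(M), which is the effect shown in Fig. 11.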

4.3 Combination of the Two Design Approaches

As discussed earlier, we can combine the two proposed design approaches in order to improve SSD speed performance when ECC is also used to tolerate defective memory cells. Following the discussion in Section 4.2, we assume that the number of defective memory cells in the worst-case page follows a Poisson distribution and consider the cases where the mean λ ranges from 1 to 4. Following the discussion in Section 4.1, beyond the normalized program step voltage ∆Vpp of 0.3 in the baseline scenario, we consider three larger values of ∆Vpp: 0.35, 0.4, and 0.45. Denote ∆V_pp^(1) = 0.45, ∆V_pp^(2) = 0.4, ∆V_pp^(3) = 0.35, and ∆V_pp^(4) = 0.3. Given the number of defective memory cells d and the (4798, 4096, 54) BCH code being used, we can obtain a set of PE cycling thresholds N_0^(d) = 0 < N_1^(d) < ··· < N_4^(d) so that, if the present PE cycling number falls into the range [N_{i−1}^(d), N_i^(d)), we can use the program step voltage ∆V_pp^(i) and meanwhile ensure the tolerance of d defective memory cells. Fig. 12 shows the PE cycling thresholds as the defect number increases from 0 to 12. The results can be intuitively justified: as the defect number increases, the residual ECC error-correcting capability degrades, and consequently each larger program step voltage can only be used over a smaller number of PE cycles.


Figure 12: PE cycling thresholds corresponding to different numbers of defective cells in the worst-case page of one block.

Given each program step voltage ∆V_pp^(i), we can obtain the normalized SSD response time τ_i for each specific trace, as shown in Fig. 7. Recall that, when the PE cycling number falls into the range [N_{i−1}^(d), N_i^(d)), we can use the program step voltage ∆V_pp^(i), while the baseline scenario fixes the program step voltage as ∆V_pp^(4) = 0.3 throughout the entire memory lifetime. Therefore, we can calculate the overall SSD average response time reduction over the baseline scenario for each trace as

1 − Σ_{d=0}^{12} f(d; λ) · [ Σ_{i=1}^{4} (N_i^(d) − N_{i−1}^(d)) · τ_i ] / [ N_4^(d) · τ_4 ],    (14)

and the results are shown in Fig. 13. The results suggest that we can still maintain a noticeable SSD speed performance improvement when ECC is also used to tolerate defective memory cells.
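The combined calculation can be sketched as follows; the thresholds N_i^(d) and the normalized response times τ_i below are illustrative stand-ins for the measured values in Fig. 12 and Fig. 7:

```python
# Sketch of the combined-approach calculation in Eq. (14): overall
# response-time reduction when adaptive program step voltage is combined
# with defect tolerance. The thresholds N_i^(d) and times tau_i below are
# hypothetical stand-ins for the measured values in Fig. 12 and Fig. 7.
import math

def poisson_pmf(d: int, lam: float) -> float:
    return lam ** d * math.exp(-lam) / math.factorial(d)

tau = [0.55, 0.65, 0.80, 1.00]  # normalized response time per dVpp setting

def thresholds(d: int) -> list:
    # Hypothetical N_i^(d): endurance intervals shrink as defects grow.
    scale = 1 - 0.05 * d
    return [int(n * scale) for n in (2710, 4820, 7500, 10000)]

def reduction(lam: float, max_d: int = 12) -> float:
    total = 0.0
    for d in range(max_d + 1):
        N = thresholds(d)
        prev, t = 0, 0.0
        for n_i, tau_i in zip(N, tau):
            t += (n_i - prev) * tau_i  # response time weighted by PE interval
            prev = n_i
        total += poisson_pmf(d, lam) * t / (N[-1] * tau[-1])
    return 1 - total

print(f"{reduction(4):.1%}")
```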

5 Related Work

NAND flash memory system design has attracted much recent attention, with most work focused on improving system speed performance and endurance. Dirik and Jacob [16] studied the effect on SSD system speed performance of varying SSD parallelism and concurrency at different levels, such as the number of planes on each channel and the number of channels, and compared various existing disk access scheduling algorithms. Agrawal et al. [3] analyzed the effect of page size, striping, and interleaving policy on memory system performance, and proposed the concept of a gang as a higher-level "superblock" to facilitate SSD system-level parallelism configurations. Min and Nam [32] developed several NAND flash memory performance enhancement techniques such as write request interleaving. Seong et al. [37] applied bus-level and chip-level interleaving to exploit the inherent parallelism in multiple flash memory chips to improve SSD speed performance. The authors of [11, 13] applied adaptive bank scheduling policies to achieve an even distribution of write requests and load balance to improve system speed performance.

Wear-leveling is used to improve NAND flash memory endurance. Gal and Toledo [18] surveyed many patented and published wear-leveling algorithms and data structures for NAND flash memory. Ben-Aroya and Toledo [5] more quantitatively evaluated different wear-leveling algorithms, including both on-line and off-line algorithms. The combination of wear-leveling and garbage collection and the involved design trade-offs have been investigated by many researchers, e.g., see [12, 14, 22, 23, 44]. In current design practice, defect tolerance has been mainly realized by bad block management, which monitors at run time and disables the future use of blocks with defects. Traditional redundant repair can also be used to compensate for certain memory defects,



Figure 13: Overall average response time reduction over the baseline scenario under different λ when the SSD contains (a) 4 parallel packages and (b) 8 parallel packages.

e.g., see [19]. In addition, a NAND flash memory device model was presented in [33], which nevertheless does not take into account RTN noise and cell-to-cell interference; that model was used to show that time-dependent trap recovery can be leveraged to improve memory endurance.

We note that most prior work on improving SSD system speed performance and/or memory endurance is carried out mainly from an architecture/system perspective to combat flash memory device issues. To the best of our knowledge, this paper represents the first attempt to adaptively exploit flash memory device characteristics, in particular PE-cycling-dependent device wear-out dynamics, at the system level to improve SSD system speed performance and NAND flash memory scalability. The proposed design approaches are completely orthogonal to prior architecture/system-level techniques and can be readily combined with them.

6 Conclusion

This paper investigates the potential of adaptively leveraging NAND flash memory cell wear-out dynamics to improve memory system performance. As memory PE cycling increases, NAND flash memory cell storage noise margin, and hence raw storage reliability, accordingly degrades. Therefore, the specified PE cycling endurance limit determines the worst-case raw memory storage reliability, which in turn sets the amount of redundant memory cells that must be fabricated. Motivated by the fact that such worst-case oriented redundancy is essentially under-utilized over the entire memory lifetime, especially when the PE cycling number is relatively small, this paper proposes to trade such under-utilized redundancy to improve system speed performance and/or tolerate defective memory cells. We further propose a simple differential wear-leveling scheme to minimize the impact on PE cycling endurance when the redundancy is used to tolerate defective memory cells. To quantitatively evaluate such adaptive NAND flash memory system design strategies, we first develop an approximate NAND flash memory device model that can capture the effects of PE cycling on memory cell storage reliability. To evaluate the effectiveness in improving memory system speed, we carry out extensive simulations over a variety of traces using the DiskSim-based SSD simulator under different system configurations, and the results show that up to 32% SSD average response time reduction can be achieved. To evaluate the effectiveness of defect tolerance, with a Poisson-based defect statistics model, we show that this design strategy can tolerate relatively high defect rates at a small degradation of effective PE cycling endurance. Finally, we show that these two aspects can be combined so that we can noticeably reduce SSD average response time even in the presence of high memory defect densities.

Acknowledgments

We thank Albert Fazio at Intel for his valuable comments on NAND flash memory device modeling. We also thank the anonymous reviewers and our shepherd Eno Thereska for their feedback.

References

[1] “SPC Trace File Format Specification. http://traces.cs.umass.edu/index.php/Storage/Storage, Last accessed on June 6, 2010,” Storage Performance Council, Tech. Rep. Revision 1.0.1, 2002.

[2] “Open NAND Flash Interface Specification,” Hynix Semiconductor and Intel Corporation and Micron Technology, Inc. and Numonyx and Phison Electronics Corp. and Sony Corporation and Spansion, Tech. Rep. Revision 2.1, Jan. 2009.

[3] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy, “Design Tradeoffs for SSD Performance,” in Proc. of USENIX Annual Technical Conference, 2008, pp. 57–70.

[4] S. Aritome, R. Shirota, G. Hemink, T. Endoh, and F. Masuoka, “Reliability Issues of Flash Memory Cells,” Proceedings of the IEEE, vol. 81, no. 5, pp. 776–788, 1993.

[5] A. Ben-Aroya and S. Toledo, “Competitive Analysis of Flash-Memory Algorithms,” in Proc. of the Annual European Symposium, 2006, pp. 100–111.

[6] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti, “Introduction to Flash Memory,” Proceedings of the IEEE, vol. 91, pp. 489–502, April 2003.

[7] R. Bez and P. Cappelletti, “Flash Memory and Beyond,” in Proc. of IEEE VLSI-TSA International Symposium on VLSI Technology, 2005, pp. 84–87.

[8] J. S. Bucy, J. Schindler, S. W. Schlosser, and G. R. Ganger, “The DiskSim Simulation Environment Version 4.0 Reference Manual,” Carnegie Mellon University Parallel Data Lab, Tech. Rep. CMU-PDL-08-101, May 2008.

[9] P. Cappelletti, R. Bez, D. Cantarelli, and L. Fratin, “Failure Mechanisms of Flash Cell in Program/Erase Cycling,” in Proc. of International Electron Devices Meeting (IEDM), 1994, pp. 291–294.

[10] R.-A. Cernea et al., “A 34 MB/s MLC Write Throughput 16 Gb NAND With All Bit Line Architecture on 56 nm Technology,” IEEE J. Solid-State Circuits, vol. 44, pp. 186–194, Jan. 2009.

[11] L.-P. Chang and T.-W. Kuo, “An Adaptive Striping Architecture for Flash Memory Storage Systems of Embedded Systems,” in Proc. of IEEE Real-Time and Embedded Technology and Applications Symposium, 2002, pp. 187–196.

[12] L.-P. Chang, T.-W. Kuo, and S.-W. Lo, “Real-time Garbage Collection for Flash-Memory Storage Systems of Real-time Embedded Systems,” ACM Transactions on Embedded Computing Systems, vol. 3, no. 4, pp. 837–863, 2004.

[13] Y.-B. Chang and L.-P. Chang, “A Self-Balancing Striping Scheme for NAND-Flash Storage Systems,” in Proc. of the ACM Symposium on Applied Computing, 2008, pp. 1715–1719.

[14] M.-L. Chiang and R.-C. Chang, “Cleaning Policies in Mobile Computers Using Flash Memory,” Journal of Systems and Software, vol. 48, no. 3, pp. 213–231, 1999.

[15] C. Compagnoni, M. Ghidotti, A. Lacaita, A. Spinelli, and A. Visconti, “Random Telegraph Noise Effect on the Programmed Threshold-Voltage Distribution of Flash Memories,” IEEE Electron Device Letters, vol. 30, no. 9, 2009.

[16] C. Dirik and B. Jacob, “The Performance of PC Solid-State Disks (SSDs) as a Function of Bandwidth, Concurrency, Device Architecture, and System Organization,” SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 279–289, 2009.

[17] K. Fukuda, Y. Shimizu, K. Amemiya, M. Kamoshida, and C. Hu, “Random Telegraph Noise in Flash Memories – Model and Technology Scaling,” in Proc. of IEEE International Electron Devices Meeting (IEDM), 2007, pp. 169–172.

[18] E. Gal and S. Toledo, “Algorithms and Data Structures for Flash Memories,” ACM Computing Surveys, vol. 37, no. 2, pp. 138–163, 2005.

[19] Y.-Y. Hsiao, C.-H. Chen, and C.-W. Wu, “Built-In Self-Repair Schemes for Flash Memories,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 8, pp. 1243–1256, Aug. 2010.

[20] B. Hwang et al., “Comparison of Double Patterning Technologies in NAND Flash Memory With Sub-30nm Node,” in Proc. of the European Solid State Device Research Conference (ESSDERC), 2009, pp. 269–271.

[21] T. Jung et al., “A 117-mm² 3.3-V only 128-Mb Multilevel NAND Flash Memory for Mass Storage Applications,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1575–1583, Nov. 1996.

[22] A. Kawaguchi, S. Nishioka, and H. Motoda, “A Flash-Memory Based File System,” in Proc. of the USENIX Technical Conference, 1995, pp. 13–13.

[23] H. Kim and S. Lee, “An Effective Flash Memory Manager for Reliable Flash Memory Space Management,” IEICE Transactions on Information and Systems, vol. 85, no. 6, pp. 950–964, 2002.

[24] K. Kim et al., “Future Memory Technology: Challenges and Opportunities,” in Proc. of International Symposium on VLSI Technology, Systems and Applications, Apr. 2008, pp. 5–9.

[25] J.-D. Lee, S.-H. Hur, and J.-D. Choi, “Effects of Floating-Gate Interference on NAND Flash Memory Cell Operation,” IEEE Electron Device Letters, vol. 23, no. 5, pp. 264–266, May 2002.

[26] J. Lee, J. Choi, D. Park, and K. Kim, “Data Retention Characteristics of sub-100 nm NAND Flash Memory Cells,” IEEE Electron Device Letters, vol. 24, no. 12, pp. 748–750, 2003.

[27] J. Lee, J. Choi, D. Park, and K. Kim, “Effects of Interface Trap Generation and Annihilation on the Data Retention Characteristics of Flash Memory Cells,” IEEE Transactions on Device and Materials Reliability, vol. 4, no. 1, pp. 110–117, 2004.

[28] Y. Li et al., “A 16Gb 3b/Cell NAND Flash Memory in 56nm with 8MB/s Write Rate,” in Proc. of IEEE International Solid-State Circuits Conference (ISSCC), Feb. 2008, pp. 506–632.

[29] H. Liu, S. Groothuis, C. Mouli, J. Li, K. Parat, and T. Krishnamohan, “3D Simulation Study of Cell-Cell Interference in Advanced NAND Flash Memory,” in Proc. of IEEE Workshop on Microelectronics and Electron Devices, April 2009.

[30] N. Mielke, H. Belgal, I. Kalastirsky, P. Kalavade, A. Kurtz, Q. Meng, N. Righos, and J. Wu, “Flash EEPROM Threshold Instabilities Due to Charge Trapping During Program/Erase Cycling,” IEEE Transactions on Device and Materials Reliability, vol. 4, no. 3, pp. 335–344, 2004.

[31] N. Mielke, H. Belgal, A. Fazio, Q. Meng, and N. Righos, “Recovery Effects in the Distributed Cycling of Flash Memories,” in Proc. of IEEE International Reliability Physics Symposium, 2006, pp. 29–35.

[32] S. L. Min and E. H. Nam, “Current Trends in Flash Memory Technology,” in Proc. of Asia and South Pacific Conference on Design Automation, Jan. 2006, p. 2.

[33] V. Mohan, T. Siddiqua, S. Gurumurthi, and M. R. Stan, “How I Learned to Stop Worrying and Love Flash Endurance,” in Proc. of the 2nd USENIX Conference on Hot Topics in Storage and File Systems, 2010, pp. 3–3.

[34] P. Olivo, B. Ricco, and E. Sangiorgi, “High Field Induced Voltage Dependent Oxide Charge,” Applied Physics Letters, vol. 48, pp. 1135–1137, 1986.

[35] K.-T. Park et al., “A Zeroing Cell-to-Cell Interference Page Architecture With Temporary LSB Storing and Parallel MSB Program Scheme for MLC NAND Flash Memories,” IEEE J. Solid-State Circuits, vol. 40, pp. 919–928, Apr. 2008.

[36] K. Prall, “Scaling Non-Volatile Memory Below 30nm,” in Proc. of IEEE Non-Volatile Semiconductor Memory Workshop, Aug. 2007, pp. 5–10.

[37] Y. J. Seong, E. H. Nam, J. H. Yoon, H. Kim, J. Choi, S. Lee, Y. H. Bae, J. Lee, Y. Cho, and S. L. Min, “Hydra: A Block-Mapped Parallel Flash Memory Solid-State Disk Architecture,” IEEE Transactions on Computers, vol. 59, no. 7, pp. 905–921, Jul. 2010.

[38] N. Shibata et al., “A 70 nm 16 Gb 16-level-cell NAND Flash Memory,” in Proc. of IEEE Symposium on VLSI Circuits, 2007, pp. 190–191.

[39] K.-D. Suh et al., “A 3.3 V 32 Mb NAND Flash Memory with Incremental Step Pulse Programming Scheme,” IEEE J. Solid-State Circuits, vol. 30, no. 11, pp. 1149–1156, Nov. 1995.

[40] K. Takeuchi et al., “A 56-nm CMOS 99-mm² 8-Gb Multi-Level NAND Flash Memory With 10-MB/s Program Throughput,” IEEE J. Solid-State Circuits, vol. 42, pp. 219–232, Jan. 2007.

[41] K. Takeuchi, T. Tanaka, and H. Nakamura, “A Double-Level-Vth Select Gate Array Architecture for Multilevel NAND Flash Memories,” IEEE J. Solid-State Circuits, vol. 31, no. 4, pp. 602–609, Apr. 1996.

[42] D. Wellekens, J. Van Houdt, L. Faraone, G. Groeseneken, and H. Maes, “Write/Erase Degradation in Source Side Injection Flash EEPROM’s: Characterization Techniques and Wearout Mechanisms,” IEEE Transactions on Electron Devices, vol. 42, no. 11, pp. 1992–1998, 1995.

[43] B. L. Worthington, G. R. Ganger, and Y. N. Patt, “Scheduling Algorithms for Modern Disk Drives,” in Proc. of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 1994, pp. 241–251.

[44] M. Wu and W. Zwaenepoel, “eNVy: A Non-Volatile Main Memory Storage System,” in Proc. of Fourth Workshop on Workstation Operating Systems, Oct. 1993, pp. 116–118.

[45] H. Yang et al., “Reliability Issues and Models of sub-90nm NAND Flash Memory Cells,” in Proc. of International Conference on Solid-State and Integrated Circuit Technology, 2006, pp. 760–762.