
Page 1: An Efficient Index Structure for NAND Flash Memory and It ... · B+Tree (1) Characteristics • Disk-based index structure • Used in most file systems and DBMSs • f-ary balanced

An Efficient Index Structure for NAND Flash Memory and Its Applications

Department of Computer Science, KAIST (Korea Advanced Institute of Science and Technology)

Jinsoo Kim (jinsoo@cs.kaist.ac.kr)

Page 2

Outline
μ-Tree
• Motivation
• Design
• Analysis
• Evaluations

Current Research based on μ-Tree
• μFS
• μ-FTL

NVRAMOS 2008 – April 25, 2008 (jinsoo@cs.kaist.ac.kr)

Page 3

μ-Tree

Page 4

Index Structure
Index
• A data structure that enables sublinear time lookup. (wikipedia.org)
• Key → Record

Examples
• File system:
  – File name → File metadata (directory)
• DBMS:
  – Primary key → Database record

Page 5

B+Tree (1)
Characteristics
• Disk-based index structure
• Used in most file systems and DBMSs
• f-ary balanced search tree
• O(log_f N) retrieval, insertion, and deletion
• Records are only in leaf nodes

[Figure: example B+Tree with root key 15, internal keys 10, 13, 19, and leaf keys 1, 7, 10, 11, 14, 18, 19, 21]
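As a rough illustration of the O(log_f N) bound, the sketch below computes the smallest height at which an f-ary tree can index N records. It assumes every node is completely full, which real trees are not; the function name `btree_height` is made up for this note. The fanout of 512 is the value used later in the evaluation slides.

```python
def btree_height(num_records: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= num_records (integer arithmetic only)."""
    if num_records < 1 or fanout < 2:
        raise ValueError("need num_records >= 1 and fanout >= 2")
    height, capacity = 1, fanout
    while capacity < num_records:
        capacity *= fanout  # one more level multiplies the capacity by f
        height += 1
    return height


# One million records with f = 512 need three levels,
# which agrees with "height = 3" on the microbenchmark slide.
print(btree_height(1_000_000, 512))  # -> 3
```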

Page 6

B+Tree (2)
B+Tree on Flash

[Figure: a B+Tree with root A, internal nodes B and C, and leaves D, E, F, written out of place on flash. Legend: valid node, invalid node. The first update writes F1, C1, A1; the second writes D2, B2, A2. The flash contents grow step by step from "D E F B C A" to "D E F B C A F1 C1 A1 D2 B2 A2".]

Page 7

B+Tree (3)
B+Tree on Flash (cont'd)
• Node size = Page size in NAND flash memory
• Wandering tree
  – A tree where any update in the tree requires updating parent nodes up to the root
• Update cost: Cw * H
  – H: The height of the B+Tree
  – Cw: The cost of a write operation
• B+Tree updates can be cached in memory for a while.
  – Journal tree in JFFS3
  – Flush out all the changes in bulk
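The wandering-tree behaviour can be sketched in a few lines. This is a toy model, not the author's simulator: the tree is the six-node example from the "B+Tree on Flash" figure, the assignment of leaf E to parent B is an assumption, and the suffix on each rewritten node is the update number, as in the figure.

```python
# Parent links for the example tree: root A, internal nodes B and C, leaves D, E, F.
# Placing E under B is an assumption; the transcript does not show the edges.
PARENT = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B", "F": "C"}


def wandering_update(parent, leaf, update_no):
    """Return the node copies written when `leaf` is updated out of place.

    Each rewritten node moves to a new flash page, so its parent must be
    rewritten too, all the way up to the root: H page writes in total.
    """
    written = []
    node = leaf
    while node is not None:
        written.append(f"{node}{update_no}")
        node = parent[node]
    return written


flash_log = ["D", "E", "F", "B", "C", "A"]        # initial tree
flash_log += wandering_update(PARENT, "F", 1)     # F1 C1 A1
flash_log += wandering_update(PARENT, "D", 2)     # D2 B2 A2
print(" ".join(flash_log))
```

Each update appends exactly H = 3 pages here, which is the Cw * H cost stated on the slide.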

Page 8

μ-Tree (1)

[Figure: the same update sequence on a μ-Tree. Legend: valid node, invalid node, page. The nodes rewritten on each leaf-to-root path (F1, C1, A1, then D2, B2, A2) are written together into a single page.]

Page 9

μ-Tree (2)
μ-Tree (minimally updated-Tree)
• A tree which stores all the nodes from the leaf to the root in a single page.

Pros
• Minimal update cost: 1 * Cw

Cons
• Smaller node size → Higher height
• Lower space utilization

Page 10

μ-Tree (3)
Page Layout

[Figure: layout of one page for tree heights 1 to 4. For height 4, the page holds a level-1 node of 2048 bytes, a level-2 node of 1024 bytes, a level-3 node of 512 bytes, and the root of 512 bytes.]
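The page layout can be written as a small function. This is a sketch inferred from the sizes in the figure (2048, 1024, 512, 512 bytes for height 4), assuming a 4KB page and a rule where each level takes half of what the level below it has and the root takes whatever is left; the function name is made up for this note.

```python
def mu_tree_node_sizes(height: int, page_size: int = 4096) -> dict:
    """Map level (1 = leaf, `height` = root) to node size in bytes."""
    if height < 1:
        raise ValueError("height must be at least 1")
    sizes = {}
    for level in range(1, height):
        sizes[level] = page_size >> level        # page/2, page/4, page/8, ...
    sizes[height] = page_size >> (height - 1)    # the root takes the remainder
    return sizes


for h in range(1, 5):
    layout = mu_tree_node_sizes(h)
    # Whatever the height, one leaf-to-root path fills exactly one page.
    assert sum(layout.values()) == 4096
    print(h, layout)
```

Under this rule only the root and its children change size when the height changes, which matches the third bullet of the summary slide.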

Page 11

μ-Tree (4)
Height Increase

[Figure: height increase shown step by step. Nodes shown: root A with children B, C, D; C split into CL and CR; A rewritten as A1 and then split into AL and AR under a new root R. Legend: valid node, invalid node, filled entry slot, vacant entry slot.]

Page 12

Analysis (1)
The Cost of Operations
• When there is no garbage collection: [cost table not preserved in the transcript]

Page 13

Analysis (2)
Comparison of B+-Tree and μ-Tree (the formula cells are not preserved in the transcript):
• Max # of entries per node at the l-th level
• The total # of records indexed with a height h tree
• The height of the tree needed for indexing n records
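The formulas themselves did not survive extraction. The following is a reconstruction, not the author's notation: it assumes a full page holds f entries, uses the halving page layout from the "Page Layout" slide, and was checked against the capacity table on the "Analysis (3)" slide (for example f = 512, h = 2 gives 64K for the μ-Tree). Here e_l is the maximum number of entries in a node at level l (1 = leaf, h = root) and N(h) is the number of records a tree of height h can index.

```latex
\begin{align*}
\text{B+-Tree:}\quad
  & e_l = f,
  & N(h) &= f^{h},
  & h(n) &= \lceil \log_f n \rceil \\[4pt]
\text{$\mu$-Tree:}\quad
  & e_l = \begin{cases} f / 2^{l} & 1 \le l < h \\ f / 2^{h-1} & l = h \end{cases},
  & N(h) &= \prod_{l=1}^{h} e_l = \frac{f^{h}}{2^{(h-1)(h+2)/2}}
\end{align*}
```

For the μ-Tree, the height needed for n records is then the smallest h with N(h) ≥ n.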

Page 14

Analysis (3)

Number of records indexed at each height:

Height   B+-Tree   μ-Tree
1        512       512
2        256K      64K
3        128M      4M
4        64G       128M
5        32T       2G
6        16P       16G
7        8E        64G

[Figure: the height of the tree (y-axis, 1 to 7) against log10(the total number of records) (x-axis, 0 to 10), with curves for B+Tree, B+Tree with ceiling, μ-Tree, and μ-Tree with ceiling, and markers at a million and a billion records.]
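The capacity table can be reproduced with a short calculation. This sketch assumes a fanout of f = 512 for a full 4KB page and the halving page layout (the leaf gets half the page, each level above gets half of the level below, the root takes the rest); the function names are made up for this note.

```python
def btree_capacity(height: int, fanout: int = 512) -> int:
    """Records indexed by a B+Tree whose nodes each fill a whole page."""
    return fanout ** height


def mu_tree_capacity(height: int, fanout: int = 512) -> int:
    """Records indexed by a mu-Tree whose leaf-to-root path shares one page."""
    capacity = 1
    for level in range(1, height):
        capacity *= fanout >> level       # f/2, f/4, ... entries per node
    capacity *= fanout >> (height - 1)    # root gets the remaining space
    return capacity


def human(n: int) -> str:
    """Format an exact power-of-1024 multiple the way the slide does (K, M, G, ...)."""
    units = ["", "K", "M", "G", "T", "P", "E"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n}{units[i]}"


for h in range(1, 8):
    print(h, human(btree_capacity(h)), human(mu_tree_capacity(h)))
```

The printed rows match the table on the slide, which suggests the layout rule assumed here is the one behind the published numbers.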

Page 15

Caching
Page-level cache vs. Node-level cache

[Figure: a page-level cache keeps whole pages on an LRU list (example page addresses 33, 54, 32, 57); a node-level cache keeps individual nodes on an LRU list, each tagged with its level (level 2, level 1) and page address. Legend: valid node, invalid node, page or node on flash, page or node which will be written.]

Page 16

Evaluation (1)
Simulator
• MLC NAND (Samsung K9GAG08U0M-P)
  – Flash size: 64 – 256MB
  – 4KB page, 512KB block
  – Read latency: 165.6 μs
  – Write latency: 905.8 μs
  – Erase latency: 1500 μs
• Fanout f = 512
  – 4-byte key
  – 4-byte pointer
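With these latencies, the update costs from the earlier slides (Cw * H for a B+Tree, 1 * Cw for a μ-Tree) can be turned into rough per-update times. This is a back-of-the-envelope model, not the simulator: it assumes no cache, one page read per level on the way down, and no garbage collection, so the measured averages on the following slides differ from it.

```python
READ_US = 165.6    # page read latency from the slide
WRITE_US = 905.8   # page write latency from the slide


def update_cost_us(page_reads: int, page_writes: int) -> float:
    """Time for one update under the simple no-cache, no-GC model."""
    return page_reads * READ_US + page_writes * WRITE_US


HEIGHT = 3  # one million records at f = 512, as in the microbenchmark

btree_us = update_cost_us(HEIGHT, HEIGHT)   # rewrite every node on the path
mu_tree_us = update_cost_us(HEIGHT, 1)      # the whole path fits in one page

print(f"B+Tree: {btree_us:.1f} us, mu-Tree: {mu_tree_us:.1f} us")
```

Under this model a B+Tree update costs roughly 3.2 ms and a μ-Tree update roughly 1.4 ms, because writes are about five times slower than reads on this chip.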

Page 17

Evaluation (2)
Microbenchmarks
• Total 1 million records (height = 3)
• 10,000 retrievals, deletions, and insertions
• 64MB, No cache

[Figure: bars for B+Tree and μ-Tree in three panels (Retrieval, Insertion, Deletion), showing the average # of reads, the average # of writes, and the costs for garbage collection. Bar labels as extracted: 3.00, 3.00, 4.00, 3.29, 4.02, 3.30, 3.75, 3.77, 1.07, 1.08, 0.00, 0.00.]

Page 18

Evaluation (3)
Traces from Real Workloads

[Figure: workloads (kernel compile, postmark, mp3) run on ReiserFS under Linux, and workloads (TPC-C, mp3 ID3 tag) run on MySQL; the traces are taken at the B+Tree of each system.]

Trace            Retrieval   Insertion   Deletion   Initial State
ReiserFS
  fs_kernel      2,274,867     260,974    236,027               0
  fs_postmark    4,617,494     574,148    391,847               0
  fs_mp3           512,986     223,188     23,950               0
MySQL
  db_tpcc            2,283       1,419          0       4,805,268
  db_mp3_io              0       5,992      4,280               0
  db_mp3_rand        1,712           0          0           1,712

Page 19

Evaluation (4)
Real Workload Results
• No cache
• 64MB (256MB in db_tpcc)

[Figure: the relative elapsed time (B+Tree = 1), broken down into read, write, and erase, for B+Tree and μ-Tree on fs_kernel, fs_postmark, fs_mp3, db_tpcc, db_mp3_io, and db_mp3_rand. Percentage labels above the six pairs, in that order: 18%, 28%, 20%, 78%, 42%, 0%.]

Page 20

Evaluation (5)
Effects of Caching

[Figure: fs_postmark. y-axis: the total elapsed time (sec), 0 to 6000, broken down into read, write, and erase. Bars: B+Tree, μ-Tree(P), and μ-Tree(N) for cache sizes 0KB, 4KB, 8KB, 16KB, 32KB, 64KB, and 128KB. Annotations: 13%, 64%.]

Page 21

Evaluation (6)
Garbage Collection Costs
• Total 1 million records (height = 3)
  – Actual tree size: 11MB (B+Tree), 22MB (μ-Tree)
• 10,000 insertions without caching

[Figure: bars for B+Tree and μ-Tree in three panels (Insertion / 48MB, Insertion / 64MB, Insertion / 80MB), showing the average # of reads, the average # of writes, and the costs for garbage collection. Bar labels as extracted: 5.82, 3.78, 4.00, 3.29, 3.47, 3.13, 5.11, 1.20, 3.75, 1.07, 3.35, 1.03.]

Page 22

Summary
μ-Tree
• A variant of B+Tree.
• The sum of the sizes of the nodes on a path from the root to the leaf is constant.
• The size and the position of a node in each level do not change after a height increase or decrease, except for the root node and its children.
• Only the direct ancestors can be stored in the upper levels, if any.
• Fewer writes, slightly more reads.

Page 23

μFS

Page 24

μFS (1)
μFS Requirements
• A flash-aware file system for portable multimedia devices (mp3 players, digital camcorders, etc.)
• MLC NAND support
• High-performance metadata operations
  – Lookup, unlink, unlink all, random seek, truncate, …
• Real-time support
  – For audio/video recorders (MP3, MP4, SD/HD, etc.)
  – HD: 2MB/s min.
• Power-off recovery

Page 25

μFS (2)
Index Structures in File Systems
• Directory
  – File name → Inode number
  – Linear array, B+Tree (or variants), ...
  – Local vs. Global
• File structure
  – Inode → File contents
  – FAT, Block pointers, B+Tree (or variants), ...
  – Fixed vs. Variable

Page 26

μFS Architecture (1)
μ-Tree for Directories
• Directory tree key (MSB first): 1 | Parent inode # | Filename hash
• Directory page
  – Journaling info.
  – Current inode # (.)
  – Parent inode # (..)
  – (Hash, Offset) pairs
  – Entries: Header, Inode #, Filename

[Figure: looking up "/dir/file". The μ-Tree leads to the root directory page (. = 3, .. = 3), where hash("dir") locates the entry (#12, "dir"); the μ-Tree then leads to the directory page of inode 12 (. = 12, .. = 3), where hash("file") locates the entry (#27, "file").]
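The key format can be illustrated with a small sketch. Everything about the bit layout here is hypothetical: the slides give only the field order (a flag bit in the MSB, then an inode number, then a hash or offset), so the widths `INODE_BITS` and `VALUE_BITS`, the use of CRC-32 as the filename hash, and the plain dictionary standing in for the μ-Tree and its directory pages are all assumptions made for this example.

```python
import zlib

INODE_BITS = 31   # hypothetical width of the inode field
VALUE_BITS = 32   # hypothetical width of the hash / offset field


def make_key(is_dir: bool, inode: int, value: int) -> int:
    """Pack (flag, inode #, hash-or-offset) into one integer, flag in the MSB."""
    if not 0 <= inode < (1 << INODE_BITS):
        raise ValueError("inode out of range")
    if not 0 <= value < (1 << VALUE_BITS):
        raise ValueError("value out of range")
    return (int(is_dir) << (INODE_BITS + VALUE_BITS)) | (inode << VALUE_BITS) | value


def name_hash(name: str) -> int:
    """Stand-in filename hash (the slides do not say which hash is used)."""
    return zlib.crc32(name.encode("utf-8"))


# A dict stands in for the mu-Tree plus directory pages: key -> child inode #.
# Inode numbers 3, 12, and 27 are the ones shown in the lookup figure.
index = {
    make_key(True, 3, name_hash("dir")): 12,
    make_key(True, 12, name_hash("file")): 27,
}


def lookup(path: str, root_inode: int = 3) -> int:
    """Resolve a path one component at a time, as in the "/dir/file" example."""
    inode = root_inode
    for name in path.strip("/").split("/"):
        inode = index[make_key(True, inode, name_hash(name))]
    return inode


print(lookup("/dir/file"))  # -> 27
```

Because the flag sits in the most significant bit, every directory key compares greater than every file key, so both kinds of entries can share one tree without interleaving.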

Page 27

μFS Architecture (2)
μ-Tree for Files
• File tree key (MSB first): 0 | Inode # | Offset
• Directory page
  – Journaling info.
  – Inode Status
  – Extents info.: (Offset, Page Address, Length) entries

[Figure: seeking "/dir/file" at offset 32768. The μ-Tree leads to the page for inode 27, whose extents are (0, P, 2), (2, Q, 6), (8, R, 4), (12, S, 8); the seek resolves to file data page R.]
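The seek in the figure can be traced with a short extent lookup. This is a sketch under two assumptions that the slide does not state: extent offsets and lengths are counted in pages, and a page is 4KB (with those units, byte offset 32768 is page 8, which is where the extent starting at R begins). The function name and the tuple layout are made up for this note.

```python
from bisect import bisect_right

PAGE_SIZE = 4096  # assumption: 4KB pages

# (start page within the file, page address, length in pages), from the figure.
EXTENTS = [(0, "P", 2), (2, "Q", 6), (8, "R", 4), (12, "S", 8)]


def find_extent(extents, byte_offset, page_size=PAGE_SIZE):
    """Return (page address of the extent, page index inside it), or None."""
    page_no = byte_offset // page_size
    starts = [start for start, _, _ in extents]
    i = bisect_right(starts, page_no) - 1    # last extent starting at or before page_no
    if i < 0:
        return None
    start, address, length = extents[i]
    if page_no >= start + length:            # past the end of that extent
        return None
    return address, page_no - start


print(find_extent(EXTENTS, 32768))  # -> ('R', 0)
```

A binary search over the sorted extent list keeps a random seek cheap even when a large file is described by many extents.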

Page 28

μFS Architecture (3)
Logical Layout

[Figure: on-flash layout: Super block | Bitmaps | Dir & File μ-Tree | Directory Pages | Inode Pages | File data area]

Dir Leaf Page
  type: dir
  p_inode#: 4
  5, "abc"
  9, "def"

File Leaf Page
  type: inode
  Inode#: 9
  ACL, mode, etc…
  0, 256, 64
  …
  1024, 8192, 2
  1026, 8196, 6

Page 29

μFS Performance
Write Bandwidth
• Write 3GB (average file size: 4.44MB)

[Figure: % of flash write bandwidth (y-axis, 0 to 120) against write buffer size (x-axis, 1K to 1024K) for μFS(SYNC), μFS(10M SYNC), μFS(Delayed SYNC), and TFS4. Annotation: 4521.97 KB/s.]

Page 30

μ-FTL

Page 31

Page Mapping
Characteristics
• Each table entry maps one page
• Large memory footprint
• Efficient handling of small writes

[Figure: a page mapping table with entries 0 to 7 pointing into flash memory, which holds data blocks with pages A to H and a free extra block. A write to Logical Page Number (LPN) 2 places the new data G' in a free page and marks the old page invalid.]
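A minimal page-mapping FTL can be sketched as follows. It is a toy model for illustration only: garbage collection, blocks, and erases are left out, and the class and attribute names are made up for this note.

```python
class PageMappingFTL:
    """Toy page-level FTL: one table entry per logical page, out-of-place writes."""

    def __init__(self, num_physical_pages: int):
        self.mapping = {}                               # LPN -> PPN
        self.flash = [None] * num_physical_pages        # page contents
        self.state = ["free"] * num_physical_pages      # free / valid / invalid
        self.next_free = 0

    def write(self, lpn: int, data: str) -> int:
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free page left (garbage collection not modelled)")
        old_ppn = self.mapping.get(lpn)
        if old_ppn is not None:
            self.state[old_ppn] = "invalid"             # old copy becomes garbage
        ppn = self.next_free
        self.next_free += 1
        self.flash[ppn] = data
        self.state[ppn] = "valid"
        self.mapping[lpn] = ppn                         # only one table entry changes
        return ppn

    def read(self, lpn: int) -> str:
        return self.flash[self.mapping[lpn]]


ftl = PageMappingFTL(num_physical_pages=4)
ftl.write(2, "G")
ftl.write(2, "G'")               # small overwrite: one page write, one table update
print(ftl.read(2), ftl.state)
```

The table needs one entry per logical page, which is the large memory footprint the slide mentions, while a small overwrite costs only a single page write.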

Page 32

Log Block Scheme
Log Block
• A temporary storage for small writes
• Incremental updates from the first page

[Figure: a block mapping table for the data blocks holding pages A to H, and a page mapping table for the log block. Repeated writes to LPN 2 append G', G'', G''' to the log block starting from its first page; the remaining log pages stay free.]
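The log block idea can be sketched in the same toy style. This is an illustration under simplifying assumptions, not a full log-block FTL: there is a single data block with one dedicated log block, the merge that happens when the log block fills up is not modelled, and the names are made up for this note.

```python
class LogBlock:
    """Toy log block: small writes are appended from the first page onward."""

    def __init__(self, pages_per_block: int):
        self.capacity = pages_per_block
        self.pages = []      # contents in write order, starting at page 0
        self.latest = {}     # page offset in the data block -> index in the log block

    def write(self, offset: int, data: str) -> None:
        if len(self.pages) == self.capacity:
            raise RuntimeError("log block full (merge with the data block not modelled)")
        self.latest[offset] = len(self.pages)   # page-level map inside the log block
        self.pages.append(data)

    def read(self, offset: int, data_block: list) -> str:
        idx = self.latest.get(offset)
        return self.pages[idx] if idx is not None else data_block[offset]


data_block = ["E", "F", "G", "H"]    # illustrative contents of one data block
log = LogBlock(pages_per_block=4)
for version in ("G'", "G''", "G'''"):
    log.write(2, version)            # repeated small writes to the same page

print(log.pages, log.read(2, data_block), log.read(0, data_block))
```

The data block stays untouched and block-mapped, while only the small log block needs a page-level map.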

Page 33

μ-FTL (1)
Index Structure in FTL
• LPN → <PBN, PPN>
  – LPN: Logical Page Number
  – PBN: Physical Block Number
  – PPN: Physical Page Number
• Table (linear array) is most widely used.
• Fixed mapping granularity: Page or Block
• Typical write patterns in real workloads
  – Small and random
  – Large and sequential

Page 34

μ-FTL (2)
Example

[Figure not preserved in the transcript]

Page 35

μ-FTL (3)
Extent Updates

[Figure not preserved in the transcript]

Page 36

μ-FTL Performance (1)
Garbage Collection Overhead

[Figure: two panels, MOV_8GB (16KB+44KB+4KB) and WEB_32GB (64KB+160KB+32KB). x-axis: extra blocks (1% to 5%); y-axis: overhead (seconds), with one panel scaled 0 to 250 and the other 0 to 3000. Series: Log block, FAST, Superblock, DAC, μ-FTL.]

Page 37

μ-FTL Performance (2)
Extent Distribution (GENERAL 32GB)

[Figure: extent distribution with regions labelled System files, Installed applications, Unused MFT zone, Temporary Internet files, and Downloaded movies]

Page 38

μ-FTL Performance (3)
Extent Distribution (MOV 8GB)

[Figure not preserved in the transcript]

Page 39

Conclusion
μ-Tree
• A flash-aware index structure

μFS
• A flash-aware file system based on μ-Tree

μ-FTL
• A flash translation layer based on μ-Tree