
Page 1: Application Managed Flash (FAST'16)

Application-Managed Flash

Sungjin Lee*, Ming Liu, Sangwoo Jun, Shuotao Xu, Jihong Kim† and Arvind

*Inha University

Massachusetts Institute of Technology, †Seoul National University

Operating System Support for Next Generation Large Scale NVRAM (NVRAMOS)

October 20-21, 2016

(Presented at USENIX FAST ‘16)

Page 2: Application Managed Flash (FAST'16)

NAND Flash and FTL


NAND flash SSDs have become the preferred storage devices in consumer electronics and datacenters

FTL plays an important role in flash management

The principal virtue of FTL is providing interoperability with the existing block I/O abstraction

[Slide diagram: host applications (databases, file systems, KV stores) access the flash device through the block I/O layer; inside the device, the Flash Translation Layer (FTL) sits between that interface and NAND flash, hiding NAND's overwriting restriction, limited P/E cycles, bad blocks, and asymmetric I/O operations.]

Page 3: Application Managed Flash (FAST'16)

FTL is a Complex Piece of Software


Requires significant hardware resources (e.g., 4 CPUs / 1-4 GB DRAM)

Incurs extra I/Os for flash management (e.g., GC)

Adversely affects the behavior of host applications

The FTL runs complicated firmware algorithms to avoid in-place updates and to manage the unreliable NAND substrate

[Slide diagram: the host system (databases, file systems, KV stores over the block I/O layer) talks to the flash device, where the FTL performs address remapping, garbage collection, I/O scheduling, wear-leveling, and bad-block management on top of NAND flash.]

But the FTL is a root of evil in terms of hardware resources and performance

Page 4: Application Managed Flash (FAST'16)

Existing Approaches


Improving FTL itself

Better logical-to-physical address mapping and garbage collection algorithms

Limited optimization due to the lack of information

Optimizing FTL with custom interface

Delivering system-level information to the FTL for better optimization (e.g., file access patterns, hints on when to trigger GC, and stream IDs, …)

Special interfaces, hard for standardization, more functions added to FTL

Offloading host functions into FTL

Moving some part of a file system to FTL (e.g., nameless writes and object-based flash storage)

More hardware resources and greater storage design complexity

Many efforts have been made to put more functions into flash storage devices

Page 5: Application Managed Flash (FAST'16)

However, Functionality of the FTL is Mostly Useless

Many host applications manage underlying storage in a log-like manner, mostly avoiding in-place updates

[Slide diagram: on the host side, log-structured applications already perform object-to-storage remapping, versioning & cleaning, and I/O scheduling above the block I/O layer; on the device side, the FTL repeats address remapping, garbage collection, I/O scheduling, wear-leveling, and bad-block management (labeled "Duplicate Management" on the slide).]

This duplicate management not only (1) incurs serious performance penalties but also (2) wastes hardware resources


Page 6: Application Managed Flash (FAST'16)

Which Applications???

[Slide: a map of log-structured applications spanning file systems, LSM-tree key-value stores, databases, and storage virtualization: F2FS, WAFL, Btrfs, NILFS, Sprite LFS, LevelDB, RocksDB, RethinkDB, BigTable, MongoDB, Cassandra, HDFS, LogBase, Hyder, FlexVol, and BlueSky.]

Page 7: Application Managed Flash (FAST'16)

Question:

What if we removed FTL from storage devices and allowed host applications to directly manage NAND flash?

Page 8: Application Managed Flash (FAST'16)

Application-Managed Flash (AMF)

[Slide diagram: the conventional stack (log-structured host applications over the block I/O layer and a full FTL doing address remapping, garbage collection, I/O scheduling, wear-leveling, and bad-block management) is replaced by the AMF stack, in which the device keeps only a light-weight flash translation layer and the host reaches it through the AMF block I/O layer (AMF I/O).]

(1) The device runs only the essential device-management algorithms: it manages unreliable NAND flash and hides the internal storage architecture

(2) The host runs almost all of the complicated algorithms, reusing the algorithms that log-structured applications already have for managing storage devices

(3) A new AMF block I/O abstraction enables us to separate the roles of the host and the device

Page 9: Application Managed Flash (FAST'16)

AMF Block I/O Abstraction (AMF I/O)


AMF I/O is similar to a conventional block I/O interface

A linear array of fixed-size sectors (e.g., 4 KB) with existing I/O primitives (e.g., READ and WRITE)

[Slide diagram: the host system exposes a logical layout to applications, an array of 4 KB sectors accessed with READ and WRITE, through the AMF block I/O layer sitting above the flash device.]

Minimize changes in existing host applications
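As a rough sketch of the abstraction above (the structure and function names are hypothetical illustrations, not the paper's interface), AMF I/O can be modeled as a linear array of 4 KB sectors read and written by sector number:

```c
/* Minimal in-memory model of the AMF I/O view seen by applications:
 * a linear array of fixed-size 4 KB sectors accessed with READ and
 * WRITE. Purely illustrative; names and layout are assumptions. */
#include <stdint.h>
#include <string.h>

#define AMF_SECTOR_SIZE 4096u            /* 4 KB sector (slide example) */

struct amf_dev {
    uint8_t  *sectors;                   /* backing store for the model */
    uint64_t  nr_sectors;                /* logical capacity in sectors */
};

static int amf_read(struct amf_dev *dev, uint64_t sector, void *buf)
{
    if (sector >= dev->nr_sectors)
        return -1;
    memcpy(buf, dev->sectors + sector * AMF_SECTOR_SIZE, AMF_SECTOR_SIZE);
    return 0;
}

static int amf_write(struct amf_dev *dev, uint64_t sector, const void *buf)
{
    if (sector >= dev->nr_sectors)
        return -1;
    /* AMF additionally restricts writes to appends; see the next slide. */
    memcpy(dev->sectors + sector * AMF_SECTOR_SIZE, buf, AMF_SECTOR_SIZE);
    return 0;
}
```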

Page 10: Application Managed Flash (FAST'16)

Append-only Segment


Segment: a group of 4 KB sectors (e.g., several MB)

A unit of free-space allocation and free-space reclamation

Append-only: overwrite of data is prohibited

[Slide diagram: within a segment (several MB), new data is appended with WRITE; overwriting existing sectors is prohibited, and TRIM reclaims a whole segment at once.]

Only sequential writes with no in-place updates → minimizes the functionality of the FTL
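A minimal sketch of how the append-only rule could be enforced in software (hypothetical structure and names; the fixed segment size is an assumed example value):

```c
/* Hypothetical enforcement of AMF's append-only segment rule: a write
 * must land exactly at the segment's append pointer, and TRIM resets a
 * whole segment so its flash blocks can be erased and reused.
 * Illustrative only, not the kernel code from the paper. */
#include <stdint.h>
#include <errno.h>

#define SECTORS_PER_SEGMENT 4096u       /* e.g., a 16 MB segment of 4 KB sectors */

struct amf_segment {
    uint32_t append_ptr;                /* next writable sector offset in the segment */
};

/* Returns 0 if the write is a legal append, -EINVAL if it would overwrite. */
static int amf_check_append(struct amf_segment *seg, uint32_t offset)
{
    if (offset != seg->append_ptr)
        return -EINVAL;                 /* in-place update or hole: rejected */
    seg->append_ptr++;
    return 0;
}

/* TRIM frees the whole segment; the device may now erase its blocks. */
static void amf_trim_segment(struct amf_segment *seg)
{
    seg->append_ptr = 0;
}
```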

Page 11: Application Managed Flash (FAST'16)

Case Study with AMF

[Slide: the map of log-structured applications from slide 6 again; the case study that follows focuses on file systems.]

Page 12: Application Managed Flash (FAST'16)

Case Study with File System

[Slide diagram: the host runs the AMF log-structured file system (ALFS), based on F2FS, which keeps object-to-storage remapping, versioning & cleaning, and I/O scheduling above the AMF block I/O layer; the flash device runs the AMF flash translation layer (AFTL), which performs only segment-level address remapping, wear-leveling, and bad-block management over NAND flash.]
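A rough sketch of what segment-level address remapping could look like inside the AFTL (hypothetical structures and parameter values; the real AFTL also handles wear-leveling and bad blocks):

```c
/* Hypothetical sketch of segment-level remapping: one mapping entry per
 * logical segment instead of one per 4 KB page, which is why the mapping
 * table shrinks so dramatically (see the table on slide 27).
 * Illustrative only. */
#include <stdint.h>

#define PAGES_PER_SEGMENT 4096u         /* assumed example value */

struct aftl {
    uint32_t *seg_map;                  /* logical segment -> physical segment */
    uint32_t  nr_segments;
};

/* Translate a logical page address by remapping only its segment;
 * the page offset within the segment is preserved as-is. */
static uint64_t aftl_translate(const struct aftl *ftl, uint64_t lpa)
{
    uint32_t lseg = (uint32_t)(lpa / PAGES_PER_SEGMENT);
    uint32_t off  = (uint32_t)(lpa % PAGES_PER_SEGMENT);
    return (uint64_t)ftl->seg_map[lseg] * PAGES_PER_SEGMENT + off;
}
```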

Page 13: Application Managed Flash (FAST'16)

[Figure: a comparison of source-code lines of F2FS and ALFS (scale 0 to 16,000 lines).]

AMF Log-structured File System (ALFS)


ALFS is based on the F2FS file system

How did we modify F2FS for ALFS?

Eliminate in-place updates: F2FS overwrites check-point and inode-map blocks, so ALFS must append them instead

Change the TRIM policy: F2FS issues TRIM to individual sectors, whereas AMF I/O reclaims whole segments

How much new code was added?

About 1,300 lines
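A hedged sketch of the TRIM-policy change (the request format and names are illustrative assumptions, not ALFS code): F2FS-style per-sector TRIMs versus a single segment-wide TRIM as AMF I/O expects.

```c
/* Contrast between per-sector TRIM and segment-granularity TRIM,
 * modeled as (start, length) requests. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define SECTORS_PER_SEGMENT 4096u       /* assumed example value */

struct trim_req { uint64_t start; uint64_t len; };

/* Conventional policy: one request per invalidated sector. */
static struct trim_req trim_sector(uint64_t sector)
{
    return (struct trim_req){ .start = sector, .len = 1 };
}

/* ALFS-style policy: one request covering a whole free segment, so the
 * device can erase the segment's flash blocks without copying anything. */
static struct trim_req trim_segment(uint64_t segment_no)
{
    return (struct trim_req){ .start = segment_no * SECTORS_PER_SEGMENT,
                              .len   = SECTORS_PER_SEGMENT };
}

int main(void)
{
    struct trim_req a = trim_sector(12345);
    struct trim_req b = trim_segment(3);
    printf("per-sector TRIM:  start=%llu len=%llu\n",
           (unsigned long long)a.start, (unsigned long long)a.len);
    printf("per-segment TRIM: start=%llu len=%llu\n",
           (unsigned long long)b.start, (unsigned long long)b.len);
    return 0;
}
```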

Page 14: Application Managed Flash (FAST'16)

How Conventional LFS (F2FS) Works

[Slide diagram: the conventional LFS layout (a check-point segment, an inode-map segment, and data segments 0-2) sits on top of a page-level FTL (PFTL); in this example each flash block holds 2 pages.]

* PFTL: page-level FTL

Page 15: Application Managed Flash (FAST'16)

How Conventional LFS (F2FS) Works

[Slide diagram: the LFS appends data pages A-G (updating B, which invalidates its old copy) to its data segments, but updates the check-point (CP) and inode-map (IM #0) blocks in place.]

Check-point and inode-map blocks are overwritten

Page 16: Application Managed Flash (FAST'16)

How Conventional LFS (F2FS) Works

[Slide diagram: since NAND forbids overwrites, the PFTL appends every incoming page, including the repeatedly updated CP and IM #0 blocks, leaving invalid pages behind in the flash blocks.]

The FTL appends incoming data to NAND flash

Page 17: Application Managed Flash (FAST'16)

How Conventional LFS (F2FS) Works

[Slide diagram: when free blocks run low, the PFTL copies the valid pages A, C, D, and E to fresh blocks and erases the victim blocks.]

The FTL triggers garbage collection: 4 page copies and 4 block erasures

Page 18: Application Managed Flash (FAST'16)

How Conventional LFS (F2FS) Works

[Slide diagram: independently, the LFS cleans its own data segments, copying the valid pages A, C, and D to a new segment and issuing TRIMs for the reclaimed sectors.]

The LFS triggers garbage collection: 3 page copies

Page 19: Application Managed Flash (FAST'16)

How ALFS Works

[Slide diagram: the ALFS layout (check-point segment, inode-map segment, data segments 0-2) sits directly on the AFTL; in this example each segment consists of 2 flash blocks.]

Page 20: Application Managed Flash (FAST'16)

How ALFS Works

[Slide diagram: ALFS writes data pages A-G to its data segments and appends new versions of the check-point (CP) and inode-map (IM #0) blocks within their segments instead of overwriting them, so the AFTL only ever sees sequential writes.]

No in-place updates

No obsolete pages: GC is not necessary

Page 21: Application Managed Flash (FAST'16)

How ALFS Works

[Slide diagram: ALFS cleans a data segment by copying the valid pages A, C, and D to a new segment and then issuing a TRIM for the whole victim segment; the AFTL simply erases that segment's 2 flash blocks.]

The ALFS triggers garbage collection: 3 page copies and 2 block erasures
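The cleaning step can be modeled in a few lines (a toy model with hypothetical names, not the paper's kernel code); it reproduces the counts on this slide: the file system performs the page copies, and after the segment-wide TRIM the device only erases the segment's two blocks.

```c
/* Toy model of ALFS-style segment cleaning: valid pages of the victim
 * segment are copied (appended) elsewhere by the file system, then the
 * whole segment is freed with one TRIM; the device never copies pages. */
#include <stdbool.h>
#include <stdio.h>

#define PAGES_PER_SEGMENT  8            /* tiny segment for the example */
#define BLOCKS_PER_SEGMENT 2            /* as in the slide's diagrams */

struct segment { bool valid[PAGES_PER_SEGMENT]; };

/* Returns the number of page copies the file system performs; the device
 * erases BLOCKS_PER_SEGMENT blocks after the segment-wide TRIM. */
static int alfs_clean_segment(struct segment *victim)
{
    int copies = 0;
    for (int p = 0; p < PAGES_PER_SEGMENT; p++) {
        if (victim->valid[p]) {
            copies++;                   /* append the valid page to the active segment */
            victim->valid[p] = false;
        }
    }
    /* A single TRIM for the whole segment would be issued here. */
    return copies;
}

int main(void)
{
    struct segment s = { .valid = { true, false, true, true } };  /* A, C, D valid */
    printf("page copies: %d, block erasures: %d\n",
           alfs_clean_segment(&s), BLOCKS_PER_SEGMENT);           /* prints 3 and 2 */
    return 0;
}
```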

Page 22: Application Managed Flash (FAST'16)

Comparison of F2FS and AMF

Garbage-collection cost in the example above:

                    F2FS                                     AMF
                    File system     PFTL                     File system
  GC work           3 page copies   4 copies + 4 erasures    3 copies + 2 erasures
  Total             7 copies + 4 erasures                    3 copies + 2 erasures

(The two F2FS columns are the duplicate management: both the file system and the PFTL perform garbage collection.)

Page 23: Application Managed Flash (FAST'16)

Experimental Setup

Implemented ALFS and AFTL in Linux kernel 3.13

Compared AMF with two file systems, EXT4 and F2FS, each running on a page-level FTL (PFTL)

Ran all of them on BlueDBM, our in-house SSD platform developed at MIT

Page 24: Application Managed Flash (FAST'16)

Performance with FIO

For random writes, AMF shows better throughput

F2FS is badly affected by the duplicate management problem


Page 25: Application Managed Flash (FAST'16)

Performance with Databases

AMF outperforms EXT4 even with more advanced GC policies in the FTL

F2FS shows the worst performance


Page 26: Application Managed Flash (FAST'16)

Erasure Counts


AMF achieves 6% and 37% better lifetimes than EXT4 and F2FS, respectively, on average

Page 27: Application Managed Flash (FAST'16)

Resource (DRAM & CPU)

FTL mapping table size:

  SSD capacity   Block-level FTL   Hybrid FTL   Page-level FTL   AMF
  512 GB         4 MB              96 MB        512 MB           4 MB
  1 TB           8 MB              186 MB       1 GB             8 MB

Host CPU usage: [figure omitted]
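As a back-of-the-envelope check on the page-level and block/segment-granularity figures (assuming 4-byte mapping entries, 4 KB pages, and 512 KB flash blocks; these parameter values are my assumptions, not stated on the slide):

```c
/* Mapping-table sizing: entries = capacity / mapping granularity,
 * table size = entries * bytes per entry. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t cap      = 512ULL << 30;   /* 512 GB SSD */
    uint64_t page_sz  = 4ULL   << 10;   /* 4 KB page */
    uint64_t block_sz = 512ULL << 10;   /* 512 KB flash block (assumed) */
    uint64_t entry    = 4;              /* bytes per mapping entry (assumed) */

    printf("page-level FTL      : %llu MB\n",
           (unsigned long long)((cap / page_sz)  * entry >> 20));  /* 512 MB */
    printf("block/segment level : %llu MB\n",
           (unsigned long long)((cap / block_sz) * entry >> 20));  /* 4 MB */
    return 0;
}
```

AMF's coarse, segment-granularity remapping is why its table lands near the block-level figure rather than the page-level one.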

Page 28: Application Managed Flash (FAST'16)

Conclusion

We proposed the Application-Managed Flash (AMF) architecture

AMF is based on a new block I/O interface, called AMF I/O, which exposes flash storage as append-only segments

Based on AMF I/O, we implemented a new FTL scheme (AFTL) and a new file system (ALFS) in the Linux kernel and evaluated them on our in-house SSD prototype

Our results showed that DRAM in the flash controller was reduced by 128x and performance was improved by 80%

Future Work

We are doing case studies with key-value stores, database systems, and storage virtualization platforms

Page 29: Application Managed Flash (FAST'16)

Discussion


Hardware Implementation of AFTL

Smaller segment size

Open-Channel SSDs vs AMF

Page 30: Application Managed Flash (FAST'16)

Hardware Implementation of AFTL


Implemented a pure hardware-based FTL in an FPGA that supports the basic functions of AFTL

Expose block I/O interfaces to host

Segment-level remapping

Dynamic wear-leveling

Bad-block management

It is still a proof-of-concept prototype

But it strongly shows that CPU-less and DRAM-less flash storage could be a promising design choice

Page 31: Application Managed Flash (FAST'16)

Smaller Segments


ALFS shows good performance with smaller segments

F2FS and ALFS(Small) use 2 MB segments

The segment size of ALFS increases in proportion to the number of channels and ways

[Figure legend: EXT4, F2FS(FTL), F2FS(FS), ALFS, ALFS(Small).]
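As a rough illustration of that proportionality (the block size, channel count, and way count below are hypothetical example values, not the prototype's configuration): if a segment stripes one flash block across every channel and way, then segment size ≈ block size × number of channels × number of ways, e.g., 512 KB × 8 × 4 = 16 MB.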

Page 32: Application Managed Flash (FAST'16)

Open-Channel SSDs vs AMF


The two approaches are based on similar ideas

The main difference is the level of abstraction

AMF still maintains the block I/O abstraction

AMF leaves the management of unreliable NAND to the device-side FTL

AMF allows SSD vendors to hide the details of their SSDs

AMF requires only small modifications on the host kernel side

AMF exhibits better data persistence and reliability

Page 33: Application Managed Flash (FAST'16)

Source Code


All of the software/hardware is being developed under the GPL license

Please refer to our Git repositories

Hardware Platform: https://github.com/sangwoojun/bluedbm.git

FTL: https://github.com/chamdoo/bdbm_drv.git

File-System: https://bitbucket.org/chamdoo/risa-f2fs

Thank you!