system co-design and data management for flash devices ... · ibm research, switzerland stratis d....

127
1 System Co-Design and Data Management for Flash Devices VLDB’2011 Philippe Bonnet, ITU, Denmark Luc Bouganim, INRIA, France Ioannis Koltsidas IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom

Upload: others

Post on 19-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

1

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

System Co-Design and Data Management for Flash Devices

VLDB’2011

Philippe Bonnet, ITU, Denmark

Luc Bouganim, INRIA, France

Ioannis Koltsidas IBM Research, Switzerland

Stratis D. Viglas University of Edinburgh, United Kingdom

Page 2: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

2

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash Devices (SSD)

Why Bother? Disk is disk

~650 mio units shipped in 2010

IO don't matter

CPU is the critical resource

PCM is coming

100x faster 10 mio write cycles

[Papandreou et al., IMW 2011]

Just a SATA drive

I can readily plug in flash devices in my server. What is the big deal?

Page 3: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

3

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Some Trends ... 2000 2010

HDD Capacity

HDD IOPS

200 GB 2 TB

200 200

SSD Capacity

SSD IOPS

14 GB (2001) 256 GB

HDD GB/$ 0,05

SSD GB/$ 3 x10E-4 0,5

30

10E6+ (PCIe) 5x10E3+ (SATA)

10E3 (SCSI)

x1 x600 x10

x20 x1000 x1000

PCM Capacity PCM IOPS 10E6+ (1 chip)

2x10E5 cells, 4 bits/cell

Page 4: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

4

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

… and a Fact

Flash-based SSDs do nothing well! They offer high throughput at low energy consumption.

[Tsorigiannis et al. 2010]

Page 5: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

5

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

SSD-based Systems With more than 1,000 stores, Danish Supermarket group is one of Denmark’s largest retailers. To help keep up with customer needs, the company manages more than 10 terabytes of business intelligence data.

Neteeza Twin-fin Oracle Exadata

Database Appliances SSD-based blades Scaled up

Super Micro 6026

Scaled down

Amdahl blade [Szalay et al., 2009]

IOs matter. Systems are being designed and commercialized for efficient data management for flash devices.

Page 6: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Block Device SSDs and HDDs provide the same memory abstraction: a block device interface

Figure courtesy of Koschaak and Saltzer

ERASE (address)

Page 7: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

7

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Strong Modularity SSDs and HDDs provide the same memory abstraction: a block device interface

application => There should be no impact on application (e.g., DBMS) ?

Page 8: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

8

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Design Assumptions

Controller

read/write head

disk arm

tracks

platter

spindle

actuator

disk interface

=> Actually DBMS design very much based on disk characteristics: (1) locality in the logical space preserved in the physical space, (2) sequential access is faster than random access.

Page-based IO quantization;

Identical representation In memory and on disk

Random accesses are avoided

Sequential accesses are favored: Extent-based

allocation, clustering

Write-ahead logging; Physiological logging

Page 9: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

9

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

How do flash devices impact DBMS design? (Bottom-up) We need to understand flash devices a bit better.

If they exhibit stable properties

=> Design principles for data management

If they do not exhibit stable properties

=> How to tackle the increased complexity?

(Top-down) We make assumptions about the behaviour of flash devices, and we design adapted DBMS components. We then need to make sure that (at least some) flash devices actually fit our assumptions.

Page 10: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

10

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Tutorial Outline

1. Introduction (Philippe)

2. Flash devices characteristics (Luc)

3. Data management for flash devices (Stratis)

4. Two outlooks (Stratis & Philippe)

Page 11: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

11

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

A short motivating story (1)

•! Alice, Bob, Charlie and Dave want to measure the performance of a given data intensive algorithm for flash devices…

•! They use different strategies but start from the same IO traces of that algorithm and own an MTRON and 2 identical INTEL X25-M SSDs.

IO Traces RW(2000, 2.0, 8000) SR(2000, 16.0) RW(500, 2.0, 8000) RW(500, 2.0, 8000) RR(100, 4.0, 8000) …

Algorithm X

Same model Same firmware

Never used

Used

Page 12: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

12

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

•! Alice believes in datasheets. She builds a simple SSD simulator configured with basic SSD performance numbers.

•! She takes the SSD performance numbers from the datasheet and runs the simulator using the traces….

•! Bob, does not believe in datasheets. He runs simple tests on both SSDs to obtain the basic performance numbers…He then runs Alice’s simulator on the traces with his numbers

Configuration File IOS SR RR SW RW 1 70 87 51 9023 2 81 98 64 8723 4 104 122 85 8686 8 150 167 129 8682

A short motivating story (2): Alice & Bob

IO Traces RW(2000, 2.0, 8000) SR(2000, 16.0) RW(500, 2.0, 8000) RW(500, 2.0, 8000) RR(100, 4.0, 8000)

Mtron Datasheet

Results Simulator

Page 13: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

13

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

A short motivating story (3): Charlie & Dave

•! Charlie, does not believe in Bob! He is more cautious and runs long tests on the same SSDs and obtain his own basic performance numbers. Then, he proceeds as Bob.

•! Dave does not like simulation and runs the traces directly on the SSDs.

What is your take on the resulting measures?

IO Traces RW(2000, 2.0, 8000) SR(2000, 16.0) RW(500, 2.0, 8000) RW(500, 2.0, 8000) RR(100, 4.0, 8000)

Page 14: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

14

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

A short motivating story (4): Results

! " #! #" $! $" %! %" &'(

)*+,- &./0/'1--0( 2345&'+67*-55

,/*+48/93:(5;1/8*+-5&*3:< ,/*+48/93:( =/>-5&8?:53:

@ABCD( !

!E" #

#E" $

$E" &'(

2345&'+67*- ,/*+48/93:( ;1/8*+-5&*3:<

,/*+48/93:( =/>-5&B?:53: ?'-.5F$"( =/>-5&B?:53:

:-G5F$"(

INTEL X25 MTRON Used

Never used

•! Mtron and Intel devices behave differently

•! Identical Intel devices behave differently

! Confidence in performance measurements is very low!

•! Modeling flash devices seems difficult •! What about designing algorithms for flash devices ?

"! e.g., database systems, operating systems, applications ?

Page 15: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

15

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Outline of the first part of this tutorial

Goal: understand the impact of flash memory on software (DBMS) design and vice-versa

•!We study flash chips, explaining their constraints and trends

•!We then consider flash devices as black boxes and try to understand their performance behavior (uFLIP). Goal: Find a simple model, basis for a DBMS design

•!We hit a wall with the black box approach # we open the box, i.e., the FTL, and look at FTL techniques.

•! Finally, we propose an alternative to complex FTLs, better adapted for DBMS design.

Page 16: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

16

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

The Good

NAND Flash chip performance! •!A single flash chip offers great performance

"! e.g., 40 MB/s Read, 10 MB/s Program "! Random access is as fast as sequential access "! Low energy consumption

Page 17: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

17

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

The Bad

The severe constraints of NAND flash chips! •! C1: Program granularity:

"! Program must be performed at flash page granularity (2KB-16KB)

•! C2: Must erase a block before updating a page (256 KB-1MB) •! C3: Pages must be programmed sequentially within a block •! C4: Limited lifetime (from 104 up to 105 erase operations)

Program granularity: a page (32 KB) Pagess must be

programmed sequentially

within the block (256 pages)

Erase granularity: a block (1 MB)

Page 18: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

18

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash chips BY

A bit of electronic to understand flash chip constraints and trends

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Page 19: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

19

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash cells

Control Gate

Floating Gate

P substrate N+ N+

Oxide Layer

Flash cell: a floating gate transistor

•! Flash cell: resembles a semiconductor transistor "! 2 gates instead of 1 "! Floating gate insulated all around by an oxide layer

•! Electrons placed on the floating gate are trapped

•! The floating gate will not discharge for many years

Page 20: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

20

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash cells: NOR vs NAND

NAND "! Slower read (Page) "! Quicker prog. (Page) "! Quicker erase (Block) "! Files, data

NOR "! Quick read (Byte) "! Slow prog. (Byte) "! Slow erase "! XIP # Code

Page 21: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

21

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

•! Programming: Apply a high voltage to the control gate # electrons get trapped in the floating gate

•! Erasing: Apply a high voltage to the substrate # electrons are removed from the floating gate

NAND Flash cells mode of operation

Programming

20 V

0 V 0 V

Wear out cell Erasing

0 V

20 V 20 V

•! Reading: the charge changes the threshold voltage of the cell "! Single level cell (SLC) store one bit per cell: charged = 0, not charged = 1 "! Multi level cell (MLC) store 2 bits per cell (4 levels)

•! After a number of program/erase cycle, electrons are getting trapped in the oxyde layer # End of life of the cell

Page 22: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

22

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

NAND Architecture & timings

•! Based upon independent blocks (4 Mio cells here)

•! Block: smallest erasable unit

•! Page: smallest programmable unit

Geometry & Timings

NAND flash MICRON MLC: MT29F128G08CJABB 34560 bits/page (4 KB + 224 B)

256 pages/block

1 page

1 flash cell

MLC Page Size 4 KB Block Size 1 MB Chip Size 16 GB Read Page (µs) 150 Program Page (µs) 1000 Erase Block (µs) 3000

Floating gate

Control gate

Page 23: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

23

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Program Disturb

•! Some cells not being programmed receive elevated voltage stress (near the cells being programmed)

•! Stressed cells can appear weakly programmed

Reducing program disturb: •!Use Error Correction Code to recover errors •!Program page sequentially within a block Cooke (FMS 2007)

Page 24: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

24

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Impact on flash chip IOs

•!Flash cell technology ! Limited lifetime for entire blocks (when a cell wear out,

the entire block is marked as failed).

•!NAND Layout and structure !Block is the smallest erase granularity

•!Program Disturb ! Page is the smallest program granularity (! for SLC) ! Pages must me programmed sequentially within a block ! Use of ECC is mandatory # ECC unit is the smallest

read unit (generally 1 or ! page)

Page 25: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

25

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash chips: trends

•! Density increases (price decreases) "! NAND process migration: faster than Moore’s Law (today 20 nm) "! More bits/cell:

–! SLC (1), MLC (2), TLC (3)

•! Flash chip layout and structure: larger, parallel "! Larger blocks (32 # 256 Pages) "! Larger pages: 512 B (old SLC) # 16KB (future TLC) "! Dual plane Flash # parallelism within the flash chip

•! Lifetime decreases "! 100 000 (SLC), 10 000 (MLC), 5000 (TLC)

•! ECC size increases

•! Basic performance decreases "! Compensated by parallelism

Abraham (FMS 2011), StorageSearch.com

Page 26: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

26

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Outline of the first part of this tutorial

Goal: understand the impact of flash memory on software (DBMS) design and vice-versa

•!We study flash chips, explaining their constraints and trends

•!We then consider flash devices as black boxes and try to understand their performance behavior (uFLIP)

•!We hit a wall with the black box approach # we open the box, i.e., the FTL, and look at FTL techniques

•! Finally, we propose an alternative to complex FTLs, better adapted for DBMS design

Page 27: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

27

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

The Good

The hardware! •! A single flash chip offers great performance

"! e.g., 40 MB/s Read, 10 MB/s Program "! Random access is as fast as sequential access "! Low energy consumption

•! A flash device contains many (e.g., 32, 64) flash chips and provides inter-chips parallelism

•! Flash devices may include some (power-failure resistant) SRAM

Page 28: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

28

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

The Bad

The severe constraints of flash chips! •! C1: Program granularity:

"! Program must be performed at flash page granularity

•! C2: Must erase a block before updating a page •! C3: Pages must be programmed sequentially within a block •! C4: Limited lifetime (from 104 up to 106 erase operations)

Program granularity: a page (32 KB) Pagess must be

programmed sequentially

within the block (256 pages)

Erase granularity: a block (1 MB)

Page 29: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

29

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

The software!, the Flash Translation Layer "!emulates a classical block device and

handle flash constraints

And The FTL

SSD

Write sector

Read sector

No constraint!

Flash chips

Read page

Program page

Erase block

Constraints

(C1) Program granularity

(C2) Erase before prog.

(C3) Sequential program within a block

(C4) Limited lifetime

MAPPING

GARBAGE COLLECTION

WEAR LEVELING

FTL

Page 30: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

30

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

DBMS

Flash devices are black boxes! •! Flash devices are not flash chips

"! Do not behave as the flash chip they contain "! No access to the flash chip API but only through the device API "! Complex architecture and software, proprietary and not documented

#! Flash devices are black boxes ! #! DBMS design cannot be based on flash chip behavior!

We need to understand flash devices behavior!

SSD

Flash chips

Read page

Program page

Erase block

Constraints

(C1) Program granularity

(C2) Erase before prog.

(C3) Sequential program within a block

(C4) Limited lifetime

Write sector

Read sector

No constraint!

MAPPING

GARBAGE COLLECTION

WEAR LEVELING

FTL ?

Page 31: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

31

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Understanding flash devices behavior

•!Define an experimental benchmark which can exhibit the behavior of flash devices.

•!Define a broad benchmark "! No safe assumption can be made on the device behavior (black box)

–! e.g., Random writes are expensive… "! No safe assumption on the benchmark usage!

•!Design a sound benchmarking methodology "! IO cost is highly variable and depends on the whole device history!

Page 32: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

32

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Methodology (1): Device state

Random Writes – Samsung SSD Out of the box

! Enforce a well-defined device state "! performing random write IOs of random size on the whole device "! The alternative, sequential IOs, is less stable, thus more difficult to enforce

Random Writes – Samsung SSD After filling the device

Page 33: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

33

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Methodology (2): Startup and running phases •!When do we reach a steady state? How long to run each test?

Startup and running phases for the Mtron SSD (RW)

Running phase for the Kingston DTI flash Drive (SW)

!! Startup and running phase: Run experiments to define "! IOIgnore: Number of IOs ignored when computing statistics "! IOCount: Number of measures to allow for convergence of those statistics.

Page 34: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

34

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Methodology (3): Interferences

! Interferences: Introduce a pause between experiments

0.1

1

10

0 250 500 750 1000 1250 1500

Sequential Reads Random Writes

Pause

Sequential Reads

Page 35: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

35

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Results (1): Samsung, memoright, Mtron

Locality for the Samsung, Memoright and Mtron SSDs

•! When limited to a focused area, RW performs very well

•! For SR, SW and RR, "! linear behavior, almost no latency "! good throughputs with large IO Size

•! For RW, !5ms for a 16KB-128KB IO

Granularity for the Memoright SSD

Page 36: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

36

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Results (2): Intel X25-E

RW (16 KB) performance varies from 100 µs to

100 ms!! (x 1000)

SR, SW and RW have similar performance.

RR are more costly!

IO size (KB)

Response time (µs)

Response time (µs)

Page 37: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

37

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Results (3): Fusion IO

•!Capacity vs Performance tradeoff (80 GB # 22 GB!) •!Sensitivity to device state

!"

#!"

$!!"

$#!"

%!!"

%#!"

&'()'*" &'(+,-./" " "

"

"

"

"

Low level formatted

Response time (µs)

"

"

"

"

"

"

" " " "

01"

11"

0+"

1+"

Fully written

!"

#!"

$!!"

$#!"

%!!"

%#!"

&'()'*" &'(+,-./" &'()'*" &'(+,-./"

01"

11"

0+"

1+"

IO Size = 4KB

Page 38: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

38

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Conclusion: Flash device behavior Finally, what is the behavior of flash devices? Common wisdom

$!Update in place are inefficient? $!Random writes are slower than sequential ones? $!Better not filling the whole device if we want good performance?

!Behavior varies across devices and firmware updates

!Behavior depends heavily on the device state!

Is it a problem ?

Page 39: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

39

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Conclusion: Flash device behavior (2)

•! Flash devices are difficult (impossible?) to model!

•! Hard to build DBMS design on such a moving ground!

Bill Nesheim: Mythbusting Flash Performance

•! Substantial performance variability "! Some cases can be even worse than disk

•! Performance outliers can have significant adverse impact

•!What’s Needed: –! Predictable scaling & performance over time –! Less asymmetry between reads/writes, random/sequential –! Predictable response time

(FMS 2011)

Page 40: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

40

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Outline of the first part of this tutorial

Goal: understand the impact of flash memory on software (DBMS) design and vice-versa

•!We study flash chips, explaining their constraints and trends

•!We then consider flash devices as black boxes and try to understand their performance behavior (uFLIP)

•!We hit a wall with the black box approach # we open the box, i.e., the FTL, and look at FTL techniques

•! Finally, we propose an alternative to complex FTLs, better adapted for DBMS design

Page 41: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

41

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Opening the black box !

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Page 42: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

42

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL – Basic components

SSD

Write sector

Read sector

No constraint!

Flash chips

Read page

Program page

Erase block

Constraints

(C1) Program granularity

(C2) Erase before prog.

(C3) Sequential program within a block

(C4) Limited lifetime

MAPPING

GARBAGE COLLECTION

WEAR LEVELING

FTL

Page 43: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

43

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL – Page Level Mapping •! Basic page level mapping: translation table stored in SRAM

Block 0 Block 1 Block 2 Block 3

Logical

Physical

Translation blocks

Data blocks

Cached Mapping Table

SRAM

Flash

Global Translation Directory

•! Demand-base FTL: DFTL (Gupta et al. 2009) "! The translation table is stored in Flash and cached in SRAM

"! Problem: the table is too large ! (1 GB for 1 TB flash) (4KB pages)

Page 44: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

44

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL - Mapping: Block Level / Hybrid

•! Pure Block Level Mapping "! Translation table at block level "! The page offset remains the same "! Does not comply with C3!

Block 0 Block 1 Block 2 Block 3

Logical

Physical

•! Hybrid Mapping "! Updates done out-of-place in log blocks "! Data blocks # block mapping "! Log blocks # page mapping "! Proposals differ in the way log blocks are managed

–! 1 log block for 1 data block # BAST (Kim et al. 2002) –! n log blocks for all data blocks # FAST (Lee et al. 2007) –! Exploiting locality # LAST (Lee et al. 2008)

"! Cleaning when log blocks are exhausted # Major costs "! Block mapping for data blocks does not either comply with C3!

Page 45: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

45

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL – Garbage Collection

•!With page mapping:

Block 1 Block 2 Block 3 !! !! !! !! !! !!

Block 1 Block 2 Block 3 !! !! !! !! !! !!

Erased

!!

Blo

ck 0

!!

!!

Log(

Blo

ck0)

!!

!! !! !! !!

New

Blo

ck0

Erase Erase

Full Merge Partial Merge Switch

!! B

lock

0 !!

Log(

Blo

ck0)

!!

Erase

!! !!

Blo

ck 0

Log(

Blo

ck0)

!!

Erase

!!

!!

•!With hybrid mapping: three cases with BAST

New block 0

•! More complex with FAST "! pages of the same block can be on different log blocks

Page 46: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

46

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL-Wear leveling

•! Goal: ensure that all blocks of the flash have about the same erase count (i.e., number of program/erase cycle).

•! Basic algorithm: hot-cold swapping (Jung et al. 2007) "! Swap the blocks with min and max erase count.

•! Difficulties: (1)!When to trigger the WL algorithm (2)!How to manage erase count, how to select min or max erase count block wrt

the limited CPU and memory resources of the flash controler (3)!What wear leveling strategy? (4)! Interactions between Garbage Collection and Wear Leveling

•! The same difficulties arise with garbage collection!

Page 47: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

47

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

MAPPING

GARBAGE COLLECTION

WEAR LEVELING

FTL: Trends

Hybrid mapping

Consider hot/cold data

Detect sequential or semi-random

writes

Temporal/spatial locality?

Background/on demand

Compression /deduplication Security /

encryption

Dynamic / static WL

Caching

Adaptivity

TRIM management

Page 48: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

48

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

FTL designers vs DBMS designers goals •! Flash device designers goals:

"! Hide the flash device constraints (usability) "! Improve the performance for most common workloads "! Make the device auto-adaptive "! Mask design decision to protect their advantage (black box approach)

•! DBMS designers goals: "! Have a model for IO performance (and behavior)

–! Predictable –! Clear distinction between efficient and inefficient IO patterns

! To design the storage model and query processing/optimization strategies "! Reach best performance, even at the price of higher complexity (having

a full control on actual IOs)

These goals are conflicting!

Page 49: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

49

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Outline of the first part of this tutorial

Goal: understand the impact of flash memory on software (DBMS) design and vice-versa

•!We study flash chips, explaining their constraints and trends

•!We then consider flash devices as black boxes and try to understand their performance behavior (uFLIP)

•!We hit a wall with the black box approach # we open the box, i.e., the FTL, and look at FTL techniques

•! Finally, we propose an alternative to complex FTLs, better adapted for DBMS design

Page 50: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

50

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Minimal FTL: Take the FTL out of equation!

FTL provides only wear leveling, using block mapping to address C4 (limited lifetime)

•! Pros "! Maximal performance for

–! SR, RR, SW –! Semi-Random Writes

"! Maximal control for the DBMS

•! Cons "! All complexity is handled

by the DBMS "! All IOs must follow C1-C3 –! The whole DBMS must

be rewritten –! The flash device is

dedicated

Flash chips

Block mapping, Wear Leveling (C4)

DBMS Constrained Patterns only

(C1, C2, C3) (C1) Write granularity (C2) Erase before prog. (C3) Sequential prog. within a block

(C4) Limited lifetime

Min

imal

flas

h de

vice

Page 51: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

51

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Semi-random writes (uFLIP [CIDR09]) •! Inter-blocks : Random •! Intra-block : Sequential •! Example with 3 blocks of 10 pages:

!"

#"

$!"

$#"

%!"

%#"

&!"

0 10 11 1 20 21 22 2 23 24 12 3 13 14 4 25 26 15 5 16 27 6 7 17 18 19 28 8 29 9

IO address

time

Page 52: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

52

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Bimodal FTL: a simple idea … •!Bimodal Flash Devices:

"! Provide a tunnel for those IOs that respect constraints C1-C3 ensuring maximal performance

"! Manage other unconstrained IOs in best effort "! Minimize interferences between these two modes of operation

•! Pros "! Flexible "! Maximal performance and

control for the DBMS for constrained IOs

•! Cons "! No behavior guarantees for

unconstrained IOs.

Flash chips

Block map., Wear Leveling (C4)

DBMS unconstrained constr. patterns patterns (C1, C2, C3)

(C1) Program granularity (C2) Erase before prog. (C3) Sequential prog. within a block (C4) Limited lifetime

Bim

odal

flas

h de

vice

Page map., Garb. Coll. (C1, C2, C3)

Page 53: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

53

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Bimodal FTL: easy to implement •! Constrained IOs lead to optimal blocks

•! Optimal blocks can be trivially "! mapped using a small map table in safe cache "! detected using a flag and cursor in safe cache

•! No interferences!

•! No change to the block device interface: "! Need to expose two constants: block size and page size

16 MB for a 1TB device

Page 0 Page 1 Page 2 Page 3 Page 4 Page 5

Flag = Optimal

CurPos=6

Page 0 Page 1 Page 1’ Page 1’’ Page 0’ Page 2

Flag = Non-Optimal

CurPos=6

Page 54: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

54

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Bimodal FTL: better than Minimal + FTL Free

(CurPos = 0)

Optimal

TRIM

Garbage collector actions

Write at @ ! CurPos

Write at @ CurPos++

Write at @ CurPos++

TRIM

Non optimal

•! Non-optimal block can become optimal (thanks to GC)

Page 0’ Page 1’’ Page 2

Flag = Optimal

CurPos=3

Page 0 Page 1 Page 1’ Page 1’’ Page 0’ Page 2

Flag = Non-Optimal

CurPos=6

Page 55: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

55

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Impact on DBMS Design Using bimodal flash devices, we have a solid basis

for designing efficient DBMS on flash:

•!What IOs should be constrained? "! i.e., what part of the DBMS should be redesigned?

•! How to enforce these constraints? Revisit literature: "! Solutions based on flash chip behavior enforce C1-C3 constraints "! Solutions based on existing classes of devices might not.

Page 56: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

56

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Example: Hash Join on HDD

Tradeoff: IOSize vs Memory consumption

•! IOSize should be as large as possible, e.g., 256KB – 1 MB "! To minimize IO cost when writing or reading partitions

•! IOSize should be as small as possible "! To minimize memory consumption: One pass partitioning needs

2 x IOSize x NbPartitions in RAM "! Insufficient memory # multi-pass # performance degrades!

One pass partitioning Multi-pass partitioning (2 passes)

Page 57: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

57

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Hash join on SSD and on bimodal SSD

•!With non bimodal SSDs "! No behavior guarantees but… "! Choosing IOSize = Block size (256 KB – 1MB) should bring good performance

•!With bimodal SSDs "! Maximal performance are guaranteed (constrained patterns) "! Use semi-random writes "! IOSize can be reduced up to page size (4 – 16 KB) with no penalty !!Memory savings !!Performance improvement !!Predictability!

Page 58: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

58

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Summary •! Flash chips

"! Performance & Energy consumption "! Wired in parallel in flash devices

•! Hardware constraints! (C1) Program granularity, (C2) Erase before program, (C3) Sequential program within a block, (C4) Limited lifetime

•! FTL: a complex piece of sofware "! Constantly evolving, no common behavior "! Hard to model "! Hard to build a consistent DBMS design!

Page 59: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

59

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Simple FTLs

HW Constraints

Bimodal

Predictable & Optimal

Stable Design

Complex FTLs

HW Constraints

Complex FTLs

Unpredictable performance

No stable design

Conclusion: DBMS Design ?

•! Adding bimodality does not hinder competition between flash device manufacturers, they can "! bring down the cost of constrained IO patterns (e.g., using parallelism) "! bring down the cost of unconstrained IO patterns without jeopardizing DBMS

design

Page 60: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

60

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Tutorial Outline

1. Introduction (Philippe)

2. Flash devices characteristics (Luc)

3. Data management for flash devices (Stratis)

4. Two outlooks (Stratis & Philippe)

Page 61: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%&'#'()$*+,-./'0/1%23"345'

%!6*(/*$'37'8#42)0+(/'9/:/*',/*73*8#21/'0%#2';<<''

%! =3>',3>/*'132$+8,-32'

%! <*3,,)24',*)1/$'

!! ?(/#&'@%*3>'#>#5';<<$'#2('*/,"#1/'/./*50%)24'>)0%'!"#$%'AA<$'!! B30'/23+4%'1#,#1)05'

!! B30'/23+4%'832/5'03'9+5'0%/'230C/23+4%C1#,#1)05'

!! ;3>/./*D'!"#$%'E0$'./*5'>/""'9/0>//2'<FGH'#2(';<<'!! <FGHI!"#$%I;<<',*)1/'*#-3&'JKLL&KL&K',/*'MN'

!! <FGHI!"#$%I;<<'"#0/215'*#-3&'JK&KL&KLL'

!! ?20/4*#0/'!"#$%')203'0%/'$03*#4/'%)/*#*1%5'!! 138,"/8/20'/O)$-24'<FGH'8/83*5'#2(';<<$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 61

Page 62: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 62

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 63: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 63

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 64: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%C9#$/('A3")('A0#0/'<*)./$'!! T38832'?I6')20/*7#1/'

!! N"31PC#((*/$$#9"/')20/*7#1/'

!! B3'8/1%#2)1#"'"#0/215'!! G11/$$'"#0/215')2(/,/2(/20'37'0%/'#11/$$',#:/*2'!! UL'03'VL'-8/$'83*/'/W1)/20')2'?6XAIY',/*'MN'0%#2';<<$'

!! F/#('I'Z*)0/'#$588/0*5'!! F/#($'#*/'7#$0/*'0%#2'>*)0/$'

!! [*#$/C9/73*/C>*)0/'")8)0#-32'

!! =)8)0/('/2(+*#21/'I'>/#*'"/./")24''!! V'5/#*'>#**#205'73*'/20/*,*)$/'AA<$'\#$$+8)24'KL'138,"/0/'*/C>*)0/$',/*'(#5]'

!! [2/*45'/W1)/215'!! KLL'^'_LL'-8/$'83*/'/W1)/20'0%#2';<<$')2'?6XA'I'Z#:'

!! X%5$)1#"',*3,/*-/$'!! F/$)$0#21/'03'/O0*/8/'$%31PD'.)9*#-32D'0/8,/*#0+*/D'#"-0+(/'!! B/#*C)2$0#20'$0#*0C+,'-8/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 64

Page 65: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

AA<'#*1%)0/10+*/'!! `#*)3+$'73*8'7#103*$'

!! XT?C/'

!! AGA'\Kabc'^'UaVc]'

!! AG@G'\Kabc'^'UaVc]'

!! B+89/*'37'1%#22/"$'!! d'03'Ke'3*'83*/'

!! FGH'9+Q/*$'!! KHN'+,'03'83*/'0%#2'_VeHN'

!! 6./*C,*3.)$)32)24'!! KLf'+,'03'dLf'

!! T388#2('X#*#""/")$8'!! ?20*#C1388#2('

!! ?20/*C1388#2('

!! F/3*(/*)24'I'8/*4)24'\BTS]'

!"#$%&#'()!*&+,(-#&.+*&/0.(1&2'#(3(!-14(

556(7#8,"9'89:#'(

M#*9#4/'T3""/103*'

Z/#*'=/./")24'

=NGC03CXNG'8#,,/*'

Z*)0/'X#4/'G""31#03*'

F/g+/$0';#2("/*' =NGC03CXNG'H#,'

!*//'N"31P'S+/+/'

N#('N"31P'=)$0'

H/0#'<#0#'T#1%/'

;3$0'?20/*7#1/'

H)1*3C',*31/$$3*'

T%#22/"'T320*3""/*'

!"#$%'T%),'

FGH'

<#0#'N+Q/*'

T%#22/"'T320*3""/*'

!"#$%'T%),'

h'

h'

!"#$%'T%),'

!"#$%'T%),'

h'

[TT'

[TT'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 65

Page 66: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6QC0%/C$%/"7'AA<$'

)5 25 ;5 =5 H5

I3865I/,038' XG@G'<*)./' AG@G'<*)./' AG@G'<*)./' AGA'<*)./' XT?C/'1#*('

I*/'15;1+7'' H=T' H=T' H=T' A=T' A=T'

;/7/,+0J' U_'MN' KLL'MN' KeLMN' KdLMN' dVL'MN'

B-/.52/:.G+.01'

VU'HNI$' _bV'HNI$' _VL'HNI$' __L'HNI$' iLL'HNI$'

K8+0-52/:.G+.01'

_b'HNI$' _VL'HNI$' KLL'HNI$' KKV'HNI$' VLL'HNI$'

B/:.365LM25B-/.5NCOP'

UaVP' ULP' UVP' dVP' KdLP'

B/:.365LM25K8+0-5NCOP'

LaLKP' KLP' LaeP' KeP' iLP'

P08--05O8+,-Q'J'KV'YIMN'\_LLi]'

J'd'YIMN'\_LKL]'

J'_aV'YIMN'\_LKL]'

JKb'YIMN'\_LKK]'

J'Ub'YIMN'\_LLj]'

H:0-878+'-5H:0-878+'-5;3:'?6-85;3:'?6-85;3:'?6-85

J'K'3*(/*'37'8#42)0+(/'

k'_'3*(/*$'37'8#42)0+(/'

KVP'FXH'AGA';<<&'J$"!R%!!'?6XA'ia_P'FXH'AG@G';<<&'JS!'?6XA'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 66

Page 67: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

F/#('"#0/215'

0

0,1

0,2

0,3

0,4

0,5

0,6

0,7

0,8

0,9

1

0 20000 40000 60000 80000 100000 120000 140000 160000

Late

ncy

(ms)

IOPS

B

C

D

E

d'lN'F#2(38'F/#($'+2)73*8"5'()$0*)9+0/('3./*'0%/'>%3"/'8/()+8'VLf'*#2(38'(#0#'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 67

Page 68: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

Z*)0/'"#0/215'd'lN'F#2(38'Z*)0/$'+2)73*8"5'()$0*)9+0/('3./*'0%/'>%3"/'8/()+8'

VLf'*#2(38'(#0#'

0

0,5

1

1,5

2

2,5

3

0 5000 10000 15000 20000 25000 30000 35000 40000 45000

Late

ncy

(ms)

IOPS

B C D E

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 68

Page 69: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

H)O/('>3*P"3#('^'F/#('"#0/215'd'lN'?I6'3,/*#-32$'+2)73*8"5'()$0*)9+0/('3./*'0%/'>%3"/'8/()+8'

VLf'*#2(38'(#0#D'S+/+/'(/,0%'m'U_'

L'

LDV'

K'

KDV'

_'

_DV'

U'

UDV'

d'

Lf' KLf' _Lf' ULf' dLf' VLf' eLf' iLf' bLf' jLf' KLLf'

B-'73:'-5A+6-5&6

'(5

T53U5K8+0-'5

)>-8/<-5B-/.5V/0-:,J5

N' T' <' ['

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 69

Page 70: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

H)O/('>3*P"3#('^'Z*)0/'"#0/215'd'lN'?I6'3,/*#-32$'+2)73*8"5'()$0*)9+0/('3./*'0%/'>%3"/'8/()+8'

VLf'*#2(38'(#0#D'S+/+/'(/,0%'m'U_'

L'

_'

d'

e'

b'

KL'

K_'

Lf' KLf' _Lf' ULf' dLf' VLf' eLf' iLf' bLf' jLf' KLLf'

B-'73:'-5A+6-5&6

'(5

T53U5K8+0-'5

)>-8/<-5K8+0-5V/0-:,J5

N' T' <' ['

L'

LDK'

LD_'

LDU'

LDd'

LDV'

LDe'

Lf' KLf' _Lf' ULf' dLf' VLf' eLf' iLf' bLf' jLf' KLLf'

B-'73:'-5A+6-5&6

'(5

T53U5K8+0-'5

)>-8/<-5K8+0-5V/0-:,J5

['

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 70

Page 71: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 71

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 72: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

?2C,#4/'"344)24'

!! X#4/'+,(#0/$'#*/'"344/('

!! =34'$/103*'73*'/#1%'<N',#4/'!! G""31#0/('>%/2',#4/'9/138/'()*05'

!! =34'*/4)32')2'/#1%'R#$%'9"31P'

!! X#4/'>*)0/C9#1P$'32"5')2.3"./'"34C$/103*'>*)0/$'!! n2-"'#'8/*4/')$'*/g+)*/('

!! n,32'*/#(&'!! !/01%'"34'*/13*($'7*38'R#$%''!! G,,"5'0%/8'03'0%/')2C8/83*5',#4/'

!! A#8/'3*'83*/'2+89/*'37'>*)0/$'!! 2?0D'$)42)E1#20'*/(+1-32'37'/*#$+*/$'

o=//'p'H332D'A?MH6<'_LLiq'

;3>/./*&'^! @%/'<NHA'2//($'03'1320*3"',%5$)1#"',"#1/8/20''''

^! X#*-#"'R#$%',#4/'>*)0/$'#*/')2.3"./('

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 72

Page 73: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

X#*#""/")$8r'

!! A-""D'$38/'3,/*#-32$'#*/'83*/'/W1)/20'32'%#*(>#*/'

!! H#,,)24'37'0%/'#((*/$$'$,#1/'03'R#$%',"#2/$D'()/$'#2('1%#22/"$'

!! [TTD'/21*5,-32'/01a'

!! Z/#*C"/./")24'$-""'2//($'03'9/'(32/'95'0%/'(/.)1/'E*8>#*/'

!! @%/')20/*2#"'(/.)1/'4/38/0*5')$'1*)-1#"'03'#1%)/./'8#O)8+8',#*#""/")$8'

!! @%/'<NHA'2//($'03'9/'#>#*/'37'0%/'4/38/0*5'03'$38/'(/4*//''

T%#22/"'T320*3""/*'

!"#$%'T%),'

T%#22/"'T320*3""/*'

!"#$%'T%),'

h'h'

!"#$%'T%),'

!"#$%'T%),'

h'

[TT'

[TT'

L=34)1#"'9"31P'

K _ U

LK

_U

V3<+,/*54*3,M5'+W-5X5I*/'154*3,M5'+W-5

=34)1#"'9"31P'$)s/'*/"/.#20'03'2+89/*'37'1%#22/"$D',),/")2)24'1#,#9)")-/$'32'/#1%'1%#22/"D'/01a'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 73

Page 74: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

AA<$'^'$+88#*5'

!! !"#$%'8/83*5'%#$'0%/',30/2-#"'03'*/83./'0%/'?I6'93:"/2/1P'

!! [$,/1)#""5'73*'*/#(C(38)2#0/('>3*P"3#($'

!! tAA<c&'8+"-,"/'1"#$$/$'37'(/.)1/$'

!! [O1/""/20'*#2(38'*/#('"#0/215')$'+2)./*$#"'

!! F/#('#2('>*)0/'9#2(>)(0%'.#*)/$'>)(/"5'

!! <*#8#-1'()Q/*/21/'#1*3$$'*#2(38'>*)0/'"#0/21)/$'

!! <*#8#-1'()Q/*/21/$')2'0/*8$'37'/13238)1$&'YIMN'13$0D',3>/*'132$+8,-32D'/O,/10/('")7/-8/D'*/")#9)")05'

!! G'"30'37'*/$/#*1%'03'9/'(32/'03>#*($'(/E2)24'<NHAC$,/1)E1')20/*7#1/$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 74

Page 75: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 75

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 76: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

A03*#4/D'9+Q/*)24'#2('1#1%)24'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

AA<',/*$)$0/20'$03*#4/'

;<<',/*$)$0/20'$03*#4/'

9+Q/*',33"'

(/8#2(',#4)24'

AA<'1#1%/'

/.)1-32'

76

Page 77: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%'8/83*5'73*',/*$)$0/20'$03*#4/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

AA<',/*$)$0/20'$03*#4/'

;<<',/*$)$0/20'$03*#4/'

9+Q/*',33"'

AA<'1#1%/'

77

Page 78: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

;59*)('$03*#4/'"#5/*'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

AA<',/*$)$0/20'$03*#4/'

;<<',/*$)$0/20'$03*#4/'

9+Q/*',33"'

AA<'1#1%/'

78

Page 79: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%'8/83*5'#$'1#1%/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

AA<',/*$)$0/20'$03*#4/'

;<<',/*$)$0/20'$03*#4/'

9+Q/*',33"'

AA<'1#1%/'

79

Page 80: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 80

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 81: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

;59*)('$5$0/8$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! X*39"/8'$/0+,'!! AA<$'#*/'9/138)24'13$0C/Q/1-./D'9+0'$-""'230'*/#(5'03'*/,"#1/';<<$'

)2'0%/'/20/*,*)$/'!! T/*0#)2"5'1321/).#9"/'03'%#./'930%'AA<$'#2(';<<$'#0'0%/'$#8/'"/./"'

37'0%/'$03*#4/'%)/*#*1%5'

!! F/$/#*1%'g+/$-32$'!! ;3>'1#2'>/'0#P/'#(.#20#4/'37'0%/'AA<'1%#*#10/*)$-1$'>%/2'

(/$)42)24'#'(#0#9#$/r'!! ;3>'1#2'>/'3,-8#""5',"#1/'(#0#'#1*3$$'930%'05,/$'37'8/()+8r'

!! H/0%3(3"34)/$'!! Z3*P"3#('(/0/1-32'73*'(#0#',"#1/8/20'!! =3#('9#"#21)24'03'8)2)8)s/'*/$,32$/'-8/'!! T#1%)24'(#0#'9/0>//2'()$P$'

81

Page 82: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%&'()%*'$(

Z3*P"3#(C(*)./2',#4/',"#1/8/20'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! !"#$%'8/83*5'#2(';<<'#0'0%/'$#8/'"/./"'37'0%/'$03*#4/'%)/*#*1%5'

!! H32)03*',#4/'+$/'95'P//,)24'0*#1P'37'*/#($'#2('>*)0/$'!! =34)1#"'3,/*#-32$'\)a/aD'*/7/*/21/$'

32"5]'!! X%5$)1#"'3,/*#-32$'\#10+#""5'

03+1%)24'0%/'()$P]'!! ;59*)('83(/"'\"34)1#"'3,/*#-32$'

8#2)7/$0/('#$',%5$)1#"'32/$]'

!! ?(/2-75'0%/'>3*P"3#('37'#',#4/'#2('#,,*3,*)#0/"5',"#1/')0'!! F/#(C)20/2$)./',#4/$'32'R#$%'!! Z*)0/C)20/2$)./',#4/$'32';<<'!! H)4*#0/',#4/$'>%/2'0%/5'%#./'

/O,/2$/('0%/)*'13$0')7'/**32/3+$"5',"#1/('

+%,-( .//(GPP=5

GY/'15

F/#(I>*)0/'3,/*#-32'

_C$0#0/'0#$P'$5$0/8'

8Y/'15&GY/'1(5 8PP=5&GPP=(5

n$/*C"/./"',#4/'?I6'

N+Q/*'8#2#4/*'

A03*#4/'8#2#4/*'

F/,"#1/8/20',3")15'

AA<' ;<<'X#4/'8)4*#-32$'

0#&12%)(345(#6'$%7#8,(

9':;',",(<#$(6-*,12%)(345(

=-*,12%)(345(#6'$%7#8,(

ol'p' D̀''`=<N'_LLbq'

82

Page 83: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

69u/10',"#1/8/20'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! ;59*)('()$P'$/0+,'!! 6v)2/'033"'

!! 6,-8#"'39u/10'#""31#-32'#1*3$$'0%/'0>3'05,/$'37'()$P'

!! @>3',%#$/$'!! X*3E")24&'$0#*0'>)0%'#""'39u/10$'

32'0%/';<<'#2('832)03*'$5$0/8'+$/'

!! </1)$)32&'9#$/('32',*3E")24'$0#-$-1$'/$-8#0/',/*73*8#21/'4#)2/('7*38'83.)24'/#1%'39u/10'7*38'0%/';<<'03'0%/'AA<'

!! F/(+1/'0%/'(/1)$)32'03'#'P2#,$#1P',*39"/8'#2('#,,"5'4*//(5'%/+*)$-1$'

!! ?8,"/8/20/(')2'<N_'

<#0#9#$/'/24)2/'

N+Q/*',33

"'832

)03*'

A03*#4/'$5$0/8'

69u/10',"#1/8/20'#(.)$3*'

>3*P"3#('(/.)1/'

,#*#8/0/*$'

F/#(I>*)0/$'

='$<#$>%82'(&%18(?,@(

!!/(A;B&'"(?C@(

!!/( .//(

D;"E#F(6#18"(

oT#2)8D'H)%#)"#D'N%#:#1%#*u//D'F3$$'p'=#24D'`=<N'_LLjq'

83

Page 84: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

Z*)0/'1#1%)24'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! AA<'73*',*)8#*5'$03*#4/D'#+O)")#*5';<<''!! @#P/'#(.#20#4/'37'9/:/*';<<'>*)0/',/*73*8#21/'03'/O0/2('AA<'")7/-8/'#2(')8,*3./'>*)0/'0%*3+4%,+0'

!! Z*)0/$'#*/',+$%/('03'0%/';<<''!! =34'$0*+10+*/'/2$+*/$'$/g+/2-#"'>*)0/$'!! !)O/('"34'$)s/'!! 621/'"34')$'7+""'8/*4/'>*)0/$'9#1P'03'0%/'AA<'

oA3+2(#*#*#u#2D'X*#9%#P#*#2D'N#"#P*)$%2#2D'p'Z399/*D'!GA@'_LKLq'o;3""3>#5'X%<'@%/$)$D'nZCH#()$32D'_LLjq'

>*)0/'

AA<'

;<<C*/$)(/20'"34'

*/#(' 8/*4/'

84

Page 85: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

=3#('9#"#21)24'03'8#O)8)s/'0%*3+4%,+0'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A/w24'132$)$0$'37'#'0*#2$#1-32',*31/$$)24'$5$0/8'>)0%'930%'05,/$'37'()$P'

!! 69u/1-./')$'03'9#"#21/'0%/'"3#('#1*3$$'8/()#'!! G1%)/./('>%/2'0%/'*/$,32$/'-8/$'#1*3$$'8/()#'#*/'/g+#"D')a/aD'#'

Z#*(*3,'/g+)")9*)+8'!! G"43*)0%8$'03'#1%)/./'0%)$'/g+)")9*)+8''

!! X#4/'1"#$$)E1#-32'\%30'3*'13"(]'!! X#4/'#""31#-32'#2('8)4*#-32'

A03*#4/'8#'

N:.+8-,056/77+:<50/4*-5

0#&12%)(%BB$G(

/'H12'( =-*,12%)(%BB$G(

;<<_'

AA<_'

6,/*#-32'*/()*/103*'

;30I13"('(#0#'1"#$$)E/*'

</.)1/',/*73*8#21/'832)03*'

<#0#'\*/]"31#03*'

X3")15'132E4+*#-32'

1#1%/'!"#$%&'(>%8%&'>'8"()%*'$(

oZ+'p'F/((5D''HGAT6@A'_LKLq'

85

Page 86: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 86

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 87: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

N+Q/*)24')2'8#)2'8/83*5'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! X*39"/8'$/0+,'!! !"#$%'8/83*5')$'+$/('73*',/*$)$0/20'$03*#4/'

!! @5,)1#"'32C(/8#2(',#4)24'

!! F/$/#*1%'g+/$-32$'!! Z%)1%',#4/$'(3'>/'9+Q/*r'

!! Z%)1%',#4/$'(3'>/'/.)10'#2('>%/2r'

!! H/0%3(3"34)/$'!! !"#$%'8/83*5'$)s/'#")428/20'

!! T3$0C9#$/('*/,"#1/8/20'

!! Z*)0/'$1%/(+")24'

87

Page 88: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

N"31P',#(()24'=Fn'\NX=Fn]'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! H#2#4/$'0%/'32'()$P'FGH'9+Q/*'

!! <#0#'9"31P$'#*/'3*4#2)s/('#0'/*#$/C+2)0'4*#2+"#*)05'!! =Fn'g+/+/')$'32'(#0#'9"31P$'

!! 62'*/7/*/21/D'83./'0%/'/2-*/'9"31P'03'0%/'%/#('37'0%/'g+/+/'

!! 62'/.)1-32D'$/g+/2-#""5'>*)0/'0%/'/2-*/'9"31P'

e' L'

K' d'

V'

j'

KK'

N"P'L'N"P'_' N"P'U' N"P'K'

I9J(A)#2K( 09J(A)#2K(

j'

KK'

e' d'

V'

L'

K'

N"P'_'N"P'U' N"P'L' N"P'K'

0#&12%)(,'2"#$(LL($'<'$'82'B(

M127>(A)#2KN(()#&12%)(,'2"#$,(OP(Q(R$1S'8(

ol)8'p'G%2D'!GA@'_LLbq'

88

Page 89: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

NX=Fn&'!+*0%/*'3,-8)s#-32$'!! n$/'37',#(()24'

!! ?7'#'(#0#'9"31P'03'9/'>*):/2'%#$'230'9//2'7+""5'*/#(D'*/#('>%#0x$'8)$$)24'#2('>*)0/'$/g+/2-#""5'

!! =Fn'138,/2$#-32'!! A/g+/2-#""5'>*):/2'9"31P$'#*/'83./('03'0%/'/2('37'0%/'=Fn'g+/+/'

!! =/#$0'")P/"5'03'9/'>*):/2')2'0%/'7+0+*/'

e'

b'

i'

j'

KK'

ol)8'p'G%2D'!GA@'_LLbq'

*/#('

>*)0/'

!@='*/#($'8)$$)24'$/103*$'#2('*/,"#1/$'(#0#'9"31P')2'32/'$/g+/2-#"'>*)0/'

e'

i'

b'

L'

K' d'

V'

j'

KK'

N"P'L'N"P'_' N"P'U' N"P'K'

I9J(A)#2K( 09J

(A)#2K(

L'

K'

j'

KK'

e'

i'

e'

d'

V'

N"P'U'N"P'L' N"P'K' N"P'_'Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 89

Page 90: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

T3$0C9#$/('*/,"#1/8/20'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! T%3)1/'37'.)1-8'(/,/2($'32',*39#9)")05'37'*/7/*/21/'\#$'+$+#"]'

!! N+0'0%/'/.)1-32'13$0')$'230'+2)73*8'!! T"/#2',#4/$'9/#*'23'>*)0/'13$0D'()*05',#4/$'*/$+"0')2'#'>*)0/'

!! ?I6'#$588/0*5&'>*)0/$'83*/'/O,/2$)./'0%#2'*/#($'

!! ?0'(3/$2x0'%+*0')7'>/'8)$/$-8#0/'0%/'%/#0'37'#',#4/'!! A3'"324'#$'>/'$#./'\/O,/2$)./]'>*)0/$'

!! l/5')(/#&'1389)2/'=FnC9#$/('*/,"#1/8/20'>)0%'13$0C9#$/('#"43*)0%8$'!! G,,")1#9"/'930%')2'AA<C32"5'#$'>/""'#$'%59*)('$5$0/8$'

90

Page 91: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

D)'%8ET$,"($'&1#8(U#$K18&($'&1#8(

T"/#2'E*$0'=Fn'\T!=Fn]'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! N+Q/*',33"'().)(/(')203'0>3'*/4)32$'

!! Z3*P)24'*/4)32&'9+$)2/$$'#$'+$+#"'

!! T"/#2CE*$0'*/4)32&'1#2()(#0/$'73*'/.)1-32'

!! B+89/*'37'1#2()(#0/$')$'1#""/('0%/'>)2(3>'$)s/'Z'

!! G">#5$'/.)10'7*38'1"/#2CE*$0'*/4)32'

!! [.)10'1"/#2',#4/$'9/73*/'()*05'32/$'03'$#./'>*)0/'13$0'

!! ?8,*3./8/20&'T"/#2C!)*$0'<)*05CT"+$0/*/('o6+D';#*(/*'p'y)2D'<GH6B'_LLjq'

!! T"+$0/*'()*05',#4/$'37'0%/'1"/#2CE*$0'*/4)32'9#$/('32'$,#-#"',*3O)8)05'

X_' XU' Xd'XK' Xe' Xi' Xb'XV'

I9J( 09J(

()*05',#4/' 1"/#2',#4/'

U18B#R(,1V'(U(

=Fn'3*(/*&' XbD'XiD'XeD'XV'

T!=Fn'3*(/*&' XiD'XVD'XbD'Xe'

oX#*PD'z+24D'l#24D'l)8'p'=//D'TGA[A'_LLeq'

91

Page 92: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

T3$0C9#$/('*/,"#1/8/20')2'%59*)('$5$0/8$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A)8)"#*'03'0%/',*/.)3+$')(/#D'9+0'73*'%59*)('$/0+,$'!! AA<'#2(';<<'73*',/*$)$0/20'$03*#4/'

!! <).)(/'0%/'9+Q/*',33"')203'0>3'*/4)32$'!! @)8/'*/4)32&'05,)1#"'=Fn'!! T3$0'*/4)32&'73+*'=Fn'g+/+/$D'32/'

,/*'13$0'1"#$$'!! T"/#2'R#$%'!! T"/#2'8#42/-1'!! <)*05'R#$%'!! <)*05'8#42/-1'

!! 6*(/*'g+/+/$'9#$/('32'13$0'

!! [.)10'7*38'-8/'*/4)32'03'13$0'*/4)32'

!! !)2#"'.)1-8')$'#">#5$'7*38'0%/'13$0'*/4)32'

D#,"($'&1#8(

W1>'($'&1#8(

13$0'

ol'p' D̀''`=<N'_LLbq'

92

Page 93: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

G,,/2('#2(',#1P'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! T32./*0'*#2(38'>*)0/$'03'$/g+/2-#"'32/$'!! A%)8'"#5/*'9/0>//2'$03*#4/'

8#2#4/*'#2('AA<'!! 62'/.)1-32D'4*3+,'()*05',#4/$D'

)2'9"31P$'0%#0'#*/'8+"-,"/$'37'0%/'/*#$/'+2)0'

!! <3'230'3./*>*)0/'3"('./*$)32$D')2$0/#('>*)0/'9"31P'$/g+/2-#""5'

!! ?2.#")(#0/'3"('./*$)32$'

!! X#5'0%/',*)1/'37'#'7/>'/O0*#'*/#($'9+0'$#./'0%/'13$0'37'*#2(38'>*)0/$'

oA03)1#D'G0%#2#$$3+")$D'y3%2$32'p'G)"#8#P)D'<GH6B'_LLjq'

A%)8'$03*#4/'8#2#4/*'"#5/*'

AA<',/*$)$0/20'$03*#4/'

*#2(38'>*)0/$'

4*3+,'#2('>*)0/''

$/g+/2-#""5'

)2.#")(#0/'

93

Page 94: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

T#1%)24')2'R#$%'8/83*5'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! X*39"/8'$/0+,'!! AA<'#2(';<<'#0'()Q/*/20'"/./"$'37'0%/'$03*#4/'%)/*#*1%5'

!! !"#$%'8/83*5'+$/('#$'#'1#1%/'73*';<<',#4/$'

!! F/$/#*1%'g+/$-32$'!! Z%/2'#2('%3>'03'+$/'0%/'AA<'#$'#'1#1%/r'

!! Z%)1%',#4/$'03'9*)24')203'0%/'1#1%/r'

!! ;3>'03'1%33$/'.)1-8',#4/$r'

!! H/0%3(3"34)/$'!! 6,-8#"'1%3)1/'37'%#*(>#*/'132E4+*#-32'

!! AA<'#$'#'*/#('1#1%/'

!! !"#$%C*/$)(/20'/O0/2(/('9+Q/*',33"$'

94

Page 95: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

?213*,3*#-24'AA<$')203'0%/'/20/*,*)$/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! <)$1),")2/('>#5'37')20*3(+1)24'AA<'$03*#4/'

!! H)4*#-32'7*38';<<$'03'AA<$'!! F/g+)*/8/20$'#2('83(/"$'

#2#"5-1#""5'03'$3"./'0%/'132E4+*#-32',*39"/8'

!! A)8,"5'*/,"#1)24';<<$'>)0%'AA<$')$'230'13$0C/Q/1-./'

!! AA<$'#*/'9/$0'+$/('#$'#'1#1%/'!! _C-/*/('#*1%)0/10+*/'!! =34'#2('*/#('1#1%/'32'0%/'

AA<$D'(#0#'32'0%/';<<$'!! N+0'/./2'0%/2'0%/'9/2/E0$'#*/'

")8)0/('

oB#*#5#2#2D'@%/*/$P#D'<322/""5D'["2)P/-'p'F3>$0*32D'[+*3A5$'_LLjq'

Z3*P"3#('*/g+)*/8/20$'

</.)1/'83(/"$'

A3"./*'132E4+*#-32'

39u/1-./$'

0*#1/$'

$,/1$'

9/21%8#*P$'

;<<'-/*'

U$1"'E%-'%B()#&( 9'%B(2%2-'( AA<'-/*'

>*)0/' */#('

95

Page 96: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

@%/'{!A',/*$,/1-./'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! H+"-C-/*/('#*1%)0/10+*/'!! T389)2#-32'37'"344)24'#2('*/#('1#1%)24'!! !"#$%'8/83*5')$'433('73*'

"#*4/'$/g+/2-#"'>*)0/$')2'#2'#,,/2(C32"5'7#$%)32'\23'+,(#0/$]'

!! G"$3'433('#$'#'*/#('1#1%/'73*';<<'(#0#'

!! [.)10C#%/#(',3")15'!! G44*/4#0/'1%#24/$'7*38'

8/83*5'#2(',*/()1-./"5',+$%'0%/8'03'R#$%'03'#83*-s/'%)4%'>*)0/'13$0' ;<<'-/*'

)#&( 9'%B(2%2-'( AA<'-/*'

=34'3,/*#-32$'

G44*/4#0/'1%#24/$'#2(',*/()1-./"5',+$%'

9+Q/*',33"'

o=/./20%#"D'T388+2a'GTHD'VK\i]'_LLbq'

96

Page 97: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

<NHA'9+Q/*',33"'/O0/2$)32$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! G4#)2'#'8+"-C-/*/('#,,*3#1%'

!! X3")1)/$'#2('#"43*)0%8$'73*'1#1%)24')2'R#$%C*/$)(/20'9+Q/*',33"$'

!! AA<'#$'$/132(#*5'9+Q/*',33"'!! @/8,/*#0+*/C9#$/('/.)1-32'

,3")15'7*38'8/83*5'03'AA<'

!! X#4/$'#*/'1#1%/('32"5')7'%30'/23+4%'

!! G"43*)0%8$'73*'$521)24'(#0#'#1*3$$'0%/'1#1%/$'

oT#2)8D'H)%#)"#D'N%#:#1%#*u//D'F3$$'p'=#24D'X`=<N'_LKLq'o;3""3>#5'X%<'@%/$)$D'nZCH#()$32D'_LLjq'

I%18(>'>#$*(A;F'$(6##)(

!!/(A;F'$(6##)(

.//(R1"-()#&12%)($'&1#8,(

@/8,/*#0+*/C9#$/('*/,"#1/8/20'

TXnI1#1%/'

\K]'*/#('*/13*(' \e]'>*)0/'*/13*('

\_]'*/#(',#4/'

\U]'*/#(',#4/'230'32'AA<'

\V]'>*)0/'()*05',#4/'

\e]'+,(#0/'AA<'13,5'

\d]'>*)0/',#4/'

97

Page 98: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

X+w24')0'#""'034/0%/*'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! [O0/2$)./'$0+(5'37'+$)24'R#$%'8/83*5'#$'1#1%/'

!! X#4/'R3>'$1%/8/$'()10#0/'%3>'(#0#'8)4*#0/$'#1*3$$'0%/'-/*$'!! ?21"+$)./&'(#0#')2'8/83*5')$'#"$3'32'

R#$%'!! [O1"+$)./&'23',#4/')$'930%')2'

8/83*5'#2('32'R#$%'!! =#s5&'#2')2C8/83*5',#4/'8#5'3*'

8#5'230'9/'32'R#$%'(/,/2()24'32'/O0/*2#"'1*)0/*)#'

!! T3$0'83(/"',*/()10$'%3>'#'1389)2#-32'37'>3*P"3#('#2('$1%/8/'>)""'9/%#./'32'132E4+*#-32''

!! B3'8#4)1'1389)2#-32|'()Q/*/20'$1%/8/$'73*'()Q/*/20'>3*P"3#($'#2('()Q/*/20';;<$'#2('AA<$'

ol'p' D̀'G<N?A'_LKKq'

;<<',/*$)$0/20'$03*#4/'

9+Q/*',33"'

AA<'1#1%/'

98

Page 99: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 99

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 100: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

?2(/O)24'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! X*39"/8'$/0+,'!! Z%)"/',*/$/2-24'0%/'$#8/'?I6')20/*7#1/'#$';<<$D'R#$%'8/83*5'%#$'*#()1#""5'()Q/*/20'1%#*#10/*)$-1$'

!! ?I6'#$588/0*5D'/*#$/C9/73*/C>*)0/'")8)0#-32'

!! F/$/#*1%'g+/$-32$'!! ;3>'$%3+"('>/'#(#,0'/O)$-24')2(/O)24'#,,*3#1%/$r'

!! ;3>'1#2'>/'(/$)42'/W1)/20'$/132(#*5'$03*#4/')2(/O/$'^',30/2-#""5'73*'83*/'0%#2'32/'8/0*)1r'

!! H/0%3(3"34)/$'!! G.3)('/O,/2$)./'3,/*#-32$'>%/2'+,(#-24'0%/')2(/O'

!! A/"7C0+2)24')2(/O)24D'1#0/*)24'73*'R#$%C*/$)(/20'(#0#'

!! T389)2/'AA<$'#2(';<<$'73*')21*/#$/('0%*3+4%,+0'

100

Page 101: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

A/8)C*#2(38'>*)0/$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A0#*-24',3)20')$'$0+(5)24'05,)1#"'>*)0/'#11/$$',#:/*2$')2'0%/'1320/O0'37'$#8,")24'

!! !#10&'*#2(38'>*)0/$'%+*0',/*73*8#21/'!! N+0'1#*/7+"'#2#"5$)$'37'#'05,)1#"'>3*P"3#('$%3>$'0%#0'>*)0/$'#*/'*#*/"5'

138,"/0/"5'*#2(38'!! F#0%/*D'0%/5'#*/'$/8)C*#2(38'

!! F#2(38"5'()$,#01%/('#1*3$$'9"31P$D'$/g+/2-#""5'>*):/2'>)0%)2'#'9"31P'!! A)8)"#*'03'0%/'"31#")05',*)21),"/$'37'8/83*5'#11/$$'!! @#P/'#(.#20#4/'37'0%)$'#0'0%/'$0*+10+*/'(/$)42'"/./"'#2('>%/2')$$+)24'>*)0/$'!! N+"P'>*)0/$'03'#83*-s/'>*)0/'13$0'

N"31P'K' N"31P'_' N"31P'U' N"31P'8(

h'

7>'(>*)0/$'$//8)24"5'*#2(38"5'()$,#01%/(')2'-8/h'

h9+0'#10+#""5'>*):/2'$/g+/2-#""5'>)0%)2'#'9"31P'

oB#0%'p'M)9932$D'`=<N'_LLbq'

101

Page 102: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

!"#$%<N&'$/"7C0+2)24'N}C0*//'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! F/#($'#*/'1%/#,D'>*)0/$'#*/',30/2-#""5'/O,/2$)./')7'*#2(38'

!! @>3'83(/$'73*'N}C0*//'23(/$'!! <)$P'83(/&'23(/')$',*)8#*)"5'*/#('!! =34'83(/&'23(/')$',*)8#*)"5'+,(#0/(|'

)2$0/#('37'3./*>*)-24D'8#)20#)2'"34'/20*)/$'73*'0%/'23(/'#2('*/132$0*+10'32'(/8#2('

!! @*#2$"#-32'"#5/*',*/$/20$'+2)73*8')20/*7#1/'73*'930%'83(/$'

!! A5$0/8'$>)01%/$'9/0>//2'83(/$'95'832)03*)24'+$/'

!! A)8)"#*'"344)24'#,,*3#1%')2'oZ+D'l+3'p'T%#24D'GTH'@*#2$a'62'[89/((/('A5$0/8$D'e\U]D'_LLiq'!! <)Q/*/21/')$')2'>%/2'#2('%3>'>*)0/$'

#*/'#,,")/('!! N+Q/*/('E*$0D'0%/2'9#01%/('#2('#,,")/('

95'0%/'N}C0*//'!@='

B3(/'(#0#'

=34'/20*)/$'

8/*4/'N}C0*//'23(/'

)#&(>#B'(

/1,K(>#B'(

0#&(>#B'(

8)4*#-32'13$0'7*38'()$P'03'"34'

8)4*#-32'13$0'7*38'"34'03'()$P'

F/#(I>*)0/'3,/*#-32'

_C$0#0/'0#$P'$5$0/8'

oB#0%'p'l#2$#"D'?AXB'_LLiq'

102

Page 103: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

H#P)24'N}C0*//$'R#$%C7*)/2("5'\0%/'!<C0*//]'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! H+"-,"/'"/./"$')2'0%/')2(/O'!! [#1%',*34*/$$)./'"/./"'37'(3+9"/'$)s/'!! A3'"324'#$'>/'#*/'230'+,(#-24')2',"#1/D'#2('#"$3',/*73*8)24'"#*4/'$/g+/2-#"'>*)0/$'03'

#83*-s/'0%/'13$0D')0x$'#""'433('

!! !<C0*//'"/./"$'#*/'$3*0/('*+2$'!! ;/#(C0*//'\E*$0'"/./"$]')2'8#)2'8/83*5'!! 621/'#'"3>/*'"/./"'/O1//($')0$'1#,#1)05')0')$'8/*4/('>)0%'0%/'2/O0'32/'!! A,/1)#"'/20*)/$'\7/21/$]'#*/'+$/('03'8#)20#)2'0%/'$0*+10+*/'#2('(/#"'>)0%',30/2-#"'$P/>'

/#1%'"/./"'#''$3*0/('*+2'

Z%/2'"/./"')$'7+""D'8/*4/'>)0%'"3>/*'#2(''>*)0/'$/g+/2-#""5'

o=)D';/D'=+3'p'z)D'?T<['_LLjq'

103

Page 104: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

A,#-#"')2(/O)24'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A)8)"#*'39$/*.#-32$'#$'>)0%'N}C0*//$'1#2'9/'8#(/'32'FC0*//$'!! @%/5x*/'0*//')2(/O/$'#~/*'#""�'

!! =/$$32')$'"#*4/"5'0%/'$#8/&'32/'2//($'03'1#*/7+""5'1*#~'0%/'$0*+10+*/'#2(')0$'#"43*)0%8$'73*'0%/'2/>'8/()+8'

!! N#01%'+,(#0/$'03'#83*-s/'>*)0/'13$0$'

!! oZ+D'T%#24'p'l+3D'M?A'_LLUq'

!! @*#(/'1%/#,'*/#($'73*'/O,/2$)./'>*)0/$'95')20*3(+1)24')89#"#21/'

!! ol'p' D̀'AA@<'_LKKq'

!! A5$0/8#-1'$0+(5'32',/*73*8#21/'37'FC0*//$'32'AA<$')2'o[8*)1%D'M*#7D'l*)/4/"D'A1%+9/*0'p'@%38#D'<GH6B'_LKLq'!! AA<$'230'#$'$/2$)-./'#$';<<$'03',#4/'$)s/'

!! T#,#9"/'37'#((*/$$)24'%)4%/*'()8/2$)32#"'(#0#')2'"/$$'-8/'

104

Page 105: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

;#$%')2(/O)24'9#$/('32'H)1*3;#$%'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A/0+,')$'$/2$3*'23(/$'>)0%'")8)0/('8/83*5'#2(',*31/$$)24'1#,#9)")-/$'

!! A)8)"#*'03'/O0/2()9"/'%#$%)24'0/1%2)g+/$'!! <)*/103*5'P//,$'0*#1P'37'9+1P/0'93+2(#*)/$'

!! !3*'/#1%'9+1P/0'8#)20#)2'0%/'"#$0'-8/')0'>#$'+$/('A'#2('0%/'2+89/*'37'-8/$')0'%#$'9//2'$,")0'T'

!! X*34*/$$)./'/O,#2$)32'9#$/('32'/g+)C>)(0%'$,")w24'!! [O,#2$)32'0*)44/*/('>%/2'0%/'2+89/*'37'

$,")0$'/O1//($'$38/'0%*/$%3"('

!! ?27*/g+/20"5'+$/('9+1P/0$'#*/'R+$%/('03'AA<'!! </"/-32'0%*3+4%'4#*9#4/'13""/1-32'#2('

*/3*4#2)s#-32'!! N#01%'+,(#0/$'#*/'%/",7+"'

!! M/2/*#")s#-32'37'")2/#*'%#$%)24'95'"#s5'$,")w24')2'oÄ)#24D'{%3+'p'H/24D'ZG?H'_LLbq''!! @#P/$'#(.#20#4/'37'9#01%'+,(#0/$'

()*/103*5'

oLCKL]'A&'_'T&'K'

oKLC_L]'A&'K'T&'U'

o_LCUL]'A&'L'T&'L'

oULCdL]'A&'L'T&'L'

()*/103*5'

oLCKL]'A&'_'T&'K'

oKLCKV]'A&'K'T&'L'

oKVC_L]'A&'U'T&'K'

o_LCUL]'A&'L'T&'L'

oULCdL]'A&'L'T&'L'

/.)10/('03'R#$%'

F/3*4#2)s#-32'\*/,#*--32)24]')7'$,")0'0%*/$%3"(')$'_'

o{/)2#"),3+*Cz#s-D'=)2D'l#"34/*#P)D'M+23,+"3$'p'B#uu#*D'!GA@'_LLVq'

105

Page 106: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

G'(/$)42'73*'%59*)('$5$0/8$'\!"#$%A03*/]'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

o</92#0%D'A/24+,0#'p'=)D'`=<N'_LKLq'

;<<'

h'

(/$0#4)24'

;&+,(9&<*'(".='>(

?#"9'(<:@'#(

l/5C.#"+/',#)*'

h'

A'&=(8&8,'(

l/5C.#"+/',#)*'

h'

A'8'.82(<"9(B'890#(

K' L' h' K'

6"+C(D#'+'.8'(E*00$(F*9'#(

K' L' h' K' h' L'

!)*$0'.#")(',#4/'

=#$0'.#")(',#4/'

X)%,-(>'>#$*&''*/151"/('#,,/2('"34'3*4#2)s/('#$'#''151")1'")$0'37',#4/$D'(/$0#4/('03';<<'9#$/('32'*/1/215'

9YI(>'>#$*&'>*)0/'#2('*/#('9+Q/*$'#2('8/0#(#0#'

l//,$'0*#1P'37'(/$0#4/('/20*)/$'

106

Page 107: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

X*30313"'

A/*./*'_'A/*./*'K'

A1#"#9"/'*/")#9"/'()$0*)9+0/('"34'

@*#2$#1-32'8#2#4/8/20'32'R#$%&';5(/*'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! G'()Q/*/20'#*1%)0/10+*/'73*'0*#2$#1-32'8#2#4/8/20'!! @%/'0#*4/0')$'(#0#'1/20/*$'

!! A%#*/('(#0#D'8+"-C13*/'23(/$'

!! B//('73*'$1#"/C3+0'!! =34C$0*+10+*/('8+"-C./*$)32/('

(#0#9#$/'!! t@%/'"34')$'0%/'(#0#9#$/c'

!! B3',#*--32)24'

!! F#>'R#$%'1%),$'+$/('73*'$03*#4/'!! @%3+4%'AA<$'3*'/./2';<<$'8#5'9/'

+$/('#$'>/""'

!! @%*//C"#5/*/('#*1%)0/10+*/'!! A03*#4/'"#5/*'8#)20#)2$'$%#*/('"34'!! ?2(/O'"#5/*'$+,,3*0$'"33P+,'#2('

./*$)32)24'!! @*#2$#1-32'"#5/*',*3.)(/$')$3"#-32'#2('

132-2+3+$"5'*/7*/$%/$'0%/'(#0#9#$/'1#1%/'95'*+22)24'0%/'t8/"(c'#"43*)0%8'

oN/*2$0/)2D'F/)('p'<#$D'T?<F'_LKLq'

@*#2$#1-32')2,+0'$2#,$%30'

@*#2$#1-32')20/2-32'\FZ'$/0$]'

N*3#(1#$0'03'$/*./*$'

G,,/2('03'"34'

!3*>#*(')20/2-32'

G$$/89"/'"31#"'"34'13,5'

F3""'"34'73*>#*('

107

Page 108: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0")2/'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011 108

!! !"#$%C9#$/('(/.)1/'(/$)42'!! A3")('$0#0/'(*)./$'

!! H#P)24'AA<$'(#0#9#$/C7*)/2("5'

!! A5$0/8C"/./"'1%#""/24/$'!! ;59*)('$5$0/8$'

!! A03*#4/D'9+Q/*)24'#2('1#1%)24'

!! ?2(/O)24'32'R#$%'

!! S+/*5'#2('0*#2$#1-32',*31/$$)24'

Page 109: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

S+/*5'#2('0*#2$#1-32',*31/$$)24'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! X*39"/8'$/0+,'!! A#8/'05,/$'37'g+/*5'#2('0*#2$#1-32#"'>3*P"3#('

!! <)Q/*/20'8/()+8|'230'>%#0'/O)$-24'#,,*3#1%/$'%#./'9//2'3,-8)s/('73*'

!! F/$/#*1%'g+/$-32$'!! G*/'0%/*/',*39"/8$'0%#0'9/$0'E0'03'AA<$r'

!! <3/$'32/'2//('*#()1#""5'()Q/*/20'#,,*3#1%/$D'3*'$")4%0'#(#,0#-32$r'

!! Z%/*/')2'0%/'$03*#4/'%)/*#*1%5'$%3+"('>/'+$/'AA<$'#2('%3>r'

!! H/0%3(3"34)/$'!! !"#$%C#>#*/'#"43*)0%8$'/)0%/*'95'(/$)42'3*'0%*3+4%'#(#,0#-32'

!! 6v3#(',#*0$'37'0%/'138,+0#-32'03'R#$%'8/83*5'

!! [13238)/$'37'$1#"/'

109

Page 110: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6"('$03*)/$D'2/>'035$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! ?8,#10'37'$/"/1-.)05'32',*/()1#0/'/.#"+#-32'oH5/*$D'HA1'@%/$)$D'H?@D'_LLiq'!! 6./*#""D'#$'$/"/1-.)05'7#103*')21*/#$/$',/*73*8#21/'(/4*#(/$'\2//("/'

)2'%#5$0#1P'g+/*)/$]'!! G0'-8/$';<<$'8)4%0'3+0,/*73*8'AA<$'

!! y3)2',*31/$$)24'32'AA<$'+$)24'#"43*)0%8$'(/$)42/('73*';<<$'o<3'p'X#0/"D'<GH6B'_LLjq'!! AA<'u3)2$'8#5'>/""'9/138/'TXnC93+2(D'$3'>#5$'03')8,*3./'0%/'TXn'

,/*73*8#21/'9/138/'$#")/20''!! @*#()24'*#2(38'*/#($'73*'*#2(38'>*)0/$',#5$'3Q'!! F#2(38'>*)0/$'*/$+"0')2'.#*5)24'?I6'#2('+2,*/()10#9"/',/*73*8#21/'!! N"31P/('?I6'$-""')8,*3./$',/*73*8#21/'!! N"31P'$)s/'$%3+"('9/'#'8+"-,"/'37'0%/',#4/'$)s/'

110

Page 111: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

?8,#10'37'$03*#4/'"#53+0&'0%/'!"#$%y3)2'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! A03*#4/'"#53+0'9#$/('32'XGÄ'

!! B3'2//('03'*/0*)/./'>%#0x$'230'2/1/$$#*5'!! n$/'#'$,/1)#")s/('3,/*#03*'

\!"#$%A1#2]'73*'32C0%/CR5',*3u/1-32$'3./*'XGÄ'

!! </"/4#0/'u3)2'138,+0#-32'03'0>3'$0/,$'!! !/01%)24'(#0#'95',*3u/1-24'

*/"/.#20'\0%/'7/01%'P/*2/"]'!! [.#"+#0/'0%/'u3)2',*/()1#0/'

\0%*3+4%'0%/'u3)2'P/*2/"]'#2('8#0/*)#")s/'0%/'*/$+"0')2'#'u3)2')2(/O' !"#$%'

A1#2'!"#$%'A1#2'

FK\GD'ND'T]' F_\<D'[D'MD';]'

!/01%'P/*2/"'

y3)2'P/*2/"'G'm'<'

!/01%'P/*2/"'

y3)2'P/*2/"'M'm'l'

!"#$%'A1#2'

FU\!D'lD'=]'G' <'

l'

ND'T'

!'

[D';'

M'

ND'TD'[D';D'!'select R1.B, R1.C, R2.E, R2.H, R3.F from R1, R2, R3 where R1.A = R2.D and R2.G = R3.K

o@$)*34)#22)$D';#*)s3,3+"3$D'A%#%D'Z)/2/*'p'M*#/7/D'A?MH6<'_LLjq'

111

Page 112: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

N+Q/*'"#5/*')2'8/83*5'

!)"0/*'"#5/*'32'AA<'

H/89/*$%),'g+/*)/$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! H3-.#-32&',33*'"31#")05'37'N"338'E"0/*$'!! ;+*0$'1#1%/',/*73*8#21/')2'TXnC)20/2$)./'#,,")1#-32$'!! M33('1#2()(#0/'73*'3v3#()24'03'R#$%'8/83*5'!! M33('*#2(38'*/#(',/*73*8#21/'138,#*/('03';<<$D'9+0'>/'$-""'

2//('03'1*3$$'0%/'8/83*5C()$P'93+2(#*5'

!! A3"+-32&'9/)24'"#s5',#5$'3Q'!! </7/*'*/#($'#2('>*)0/$'#2('0%*3+4%'9+Q/*)24'!! ?20*3(+1/'%)/*#*1%)1#"'$0*+10+*/'03'#113+20'73*'()$PC"/./"',#4)24'

LKKLKKLLLKLKKLLK' KKLLKKLLLKLKKLKK' KKLLKKLKLKLKKLKL' LLKKKKLLLKLKKLKL'

N+Q/*'9"31P$'P//,'0*#1P'37'(/7/**/('*/#($'#2('>*)0/$'<)$P',#4/$'#10'#$'

$+9CN"338'E"0/*$'

oT#2)8D'H)%#)"#D'N%#:#1%#*u//D'=#24'p'F3$$D'G<HA'_LKLq'

112

Page 113: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

<#0#9#$/'3,/*#-32$'$+)0/('73*'AA<$'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! M)./2'0*#2$#1-32#"'#2('g+/*5',*31/$$)24'>3*P"3#($D'>%)1%'3,/*#-32$'#*/'AA<$'9/:/*'73*r'

!! A0+(5'37'0%/'?I6',#:/*2$'

!! ?(/2-75'$/*./*'$03*#4/'$,#1/$'0%#0'/O%)9)0'AA<C7*)/2("5'?I6',#:/*2$'!! @#9"/$D')2(/O/$D'0/8,3*#*5'$03*#4/D'"34D'*3""9#1P'$/48/20$'

!! A/132(#*5'$0*+10+*/$'#*/'9/:/*'$+)0/('73*'AA<$'!! =324'$/g+/2-#"'>*)0/$'\23'+,(#0/$]'#2('*#2(38'*/#($'

!! X/*73*8#21/')8,*3./8/20'83*/'0%#2'32/'3*(/*'37'8#42)0+(/'>%/2'"344)24'#2('*3""9#1P'$/48/20$'#*/'(/"/4#0/('03'AA<$'

!! !#103*'37'0>3')8,*3./8/20'>%/2'0/8,3*#*5'$03*#4/')$'32'AA<$'

o=//'D'H332D'X#*PD'l)8'p'l)8D'A?MH6<'_LLbq'

113

Page 114: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

?2/O,/2$)./'"344)24')2'R#$%'8/83*5'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! G$'8/2-32/(D'"344)24')$'32/'37'0%/'9/$0'E0$'73*'AA<$'!! @5,)1#""5D'>*)0/$'#*/'#,,/2(/('03'0%/'"34'

!! @%/'32")2/'./*$)32'37'0%/'"34')$'+$+#""5'$8#""'

!! nAN'R#$%'8/83*5')$'1%/#,'#2('nAN',3*0$'#*/'#9+2(#20'

!! ?20+)-32&'$,*/#('0%/'"34'#1*3$$'8+"-,"/'1%/#,'nAN'R#$%'()$P$'!! X*3.)(/'$)8,"/'"344)24')20/*7#1/'

oT%/2D'A?MH6<'_LLjq'

>3*P/*'

>3*P/*'

#*1%)./*'

h'?2C8/83*5'"34'9+Q/*'

9':;',"(:;';'(

38"'$<%2'(>*)0/D'R+$%D'1%/1P,3)20|'*/13./*5'

114

Page 115: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

115

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Tutorial Outline

1. Introduction (Philippe)

2. Flash devices characteristics (Luc)

3. Data management for flash devices (Stratis)

4. Two outlooks (Stratis & Philippe)

Page 116: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

6+0"33P'

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

!! AA<')$'#'()./*$/'1"#$$'37'(/.)1/$'!! @%/'32"5'138832'1%#*#10/*)$-1'37'#""'8/89/*$'37'0%/'1"#$$')$'0%/'

/O1/""/20'*#2(38'*/#(',/*73*8#21/'!! n2(/*"5)24'0/1%23"345'#Q/10$',/*73*8#21/')2'30%/*'3,/*#-32$'!! AA<$'(3'230'138,"/0/"5'(38)2#0/';<<$'^'230'5/0'

!! A38/'05,/$'37'AA<'8#5'9/'#2'3*(/*'37'8#42)0+(/'$"3>/*'0%#2';<<$')2'*#2(38'>*)0/$'

!! Z%/*/'(3'0%/5'E0')2'0%/'(#0#9#$/'$0#1Pr'!! X/*$)$0/20'$03*#4/'^'8#59/')2'1389)2#-32'>)0%';<<$'!! F/#('1#1%/'37';<<'(#0#'!! @*#2$#1-32#"'"344)24'!! n$)24'0%/';<<'#$'#'"34C$0*+10+*/('>*)0/C1#1%/'73*'0%/'AA<'!! @/8,3*#*5'$03*#4/'#2('$0#4)24'#*/#'!! G25'37'0%/'#93./'

!! H3*/'*/$/#*1%'2/1/$$#*5'#0'0%/'AA<I<N')20/*7#1/$'

116

Page 117: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

117

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Design Space

Query Processor

Storage Manager

OS

FTL Which IOs are issued?

How are IOs scheduled?

How are IOs interpreted?

Cross-layer issues: - Avoid duplicating work

- Split work most effectively - Schedule work most effectively

- Avoid arbitrary limitations

Sources of increased complexity: Improve Performance AND predictability, no stable performance contract at interfaces,

high utilization, d(techno)/dt

FD HW

RAID Controller

Page 118: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

118

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Performance Contract

- Predictable, unconstrained and inefficient : USB key or low-end SSD

- Predictable, constrained and efficient : mininal FTL

- Unpredictable and unconstrainted

•!Which DBMS functions can efficiently enforce constraints? How? •!Performance Modelling.

Derive constraints for efficient regimes (ad-hoc)

Existing DBMS are probably good enough

Flash Devices Characteristics:

Query Processor

Storage Manager

OS

FTL

FD HW

RAID controller

Page 119: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

119

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Flash chips trend: Less into the chip: Storage Class Flash

Scaramuzzo (FMS 2010)

Page 120: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

120

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Dealing with complexity

Figures courtesy of Koschaak and Saltzer

From a memory abstraction (block device)...

... to a communication abstraction (rich interface)

Command Interpreter

send(link_name, outgoing_message_buffer)

ERASE (address)

receive(link_name, incoming_message_buffer)

send(link_name, outgoing_message_buffer)

ERASE (address)

Query Processor

Storage Manager

OS

FTL

FD HW

[Schloser et al, CMU tech report 2003; Schloser et al, FAST 2004; Prabakharan et al., OSDI 2008] RAID controller

Page 121: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

121

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

TRIM command - ATA/ATAPI Command set

Data Set Management command (TRIM) TRIM(LBA) is a hint to FTL to unmap LBA-PBA

•! Unmapping is asynchronous, i.e., fast (if at all executed)

•! Read after TRIM is unspecified

- Pushed by file systems community Supported in Linux kernel 2.6.33+ and Windows 7/2008R2 Implemented in X25

•! X25-M has 80GB capacity, but provides LBA for 74,4 GB

•! Trimming a whole disk does not take it back to factory settings.

Page 122: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

122

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Beyond TRIM

Slide courtesy of David Nellans, FusionIO, FMS'11

[Nellans et al. FusionIO 2010, Arpaci-Dusseau et al, HotStorage 2010]

Page 123: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

123

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Atomic Writes

Slide courtesy of , Gary Orenstein Fusion IO

[Prabakharan et al, OSDI 2008; Ouyang et al, HPCA'11]

Problem: partial writes due to system failure during an in-place update Solution: copy on write (InnoDB physiological logging + double write buffer);

atomic write at FTL level improve performance significantly (single write) and reduce DBMS complexity; it also limits concurrency.

Page 124: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

124

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Co-design: What's next?

- No in-place updates, do we still need WAL? - If we reconsider physiological logging, do we still

need page-based IOs? Do we still need the same representation in memory and on disk?

- Can we leverage prioritized IOs to improve a form of predictability?

- What does extent-based data allocation buy us? - How to efficiently deal with arrays of flash

devices?

Page 125: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

125

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Take-away Point # 1

&! Flash devices are here to stay Towards high-performance, energy proportionality

&! The key issue is to improve predictability AND performance

As long as flash devices hide flash chip constraints to support any types of IOs, performance characteristics will remain opaque.

Page 126: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

126

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

Take-away Point # 2

&! Need to revisit DBMS design decisions stemming from hard drive characteristics

&! Need to revisit strict layering between DBMS, OS and FTL

The complexity of flash devices should not be abstracted away if it results in unpredictable and suboptimal performance.

Page 127: System Co-Design and Data Management for Flash Devices ... · IBM Research, Switzerland Stratis D. Viglas University of Edinburgh, United Kingdom . 2 Bonnet, Bouganim, Koltsidas,

127

Bonnet, Bouganim, Koltsidas, Viglas, VLDB 2011

There is an opportunity for the DB community to stop running after the technology, and influence the

upcoming developments of flash devices

Take-away Point # 3

&! Lot of work in DB community based on FD assumptions

&! The co-design train has left the station. FusionIO and Oracle are leading the way.