Hyeong-Jun Kim, Jin-Soo Kim Sungkyunkwan University Young-Sik Lee KAIST HotStorage ’16 June 20, 2016

Page 1

Hyeong-Jun Kim, Jin-Soo Kim

Sungkyunkwan University

Young-Sik Lee

KAIST

HotStorage ’16, June 20, 2016

Page 2

• HDD: 10s of milliseconds

• NAND: 10s of microseconds (~1/1000 of HDD)

• 3D XPoint™: 10s of nanoseconds (~1/1000 of NAND)

• DRAM: nanoseconds

Page 3

[Figure: kernel I/O stacks for three device generations; User Apps run in USER, all other layers in KERNEL]

• HDD (~10ms): User Apps → VFS/File System → Block Layer → SAS Driver → Request Queue → SCSI XLAT

• SAS SSD (~150µs), optimized stack: User Apps → VFS/File System → Block Layer → SAS Driver → Request Queue → SCSI XLAT

• NVMe SSD (<100µs), minimized stack: User Apps → VFS/File System → Block Layer → NVMe Driver

Page 4

• Use of polling for fast I/O completion [Yang et al. FAST 2012]

• Optimization of a low-level hardware abstraction layer [Shin et al. ATC 2014]

– Reducing the translation overhead between abstraction layers

• Optimizations to fully exploit the performance of fast storage devices [Yu et al. ACM TOCS 2014]

– Polling, request merging, double buffering, and reducing context switches

Page 5

• Kernel should be general to provide an abstraction layer

• Kernel cannot implement any policy that favors a certain application

• Updating the kernel requires constant effort to port application-specific optimizations

Page 6

• Direct access to the special storage device [Caulfield et al. ASPLOS 2012]

– Special hardware is required

• Direct access to NVMe device

• Intel Storage Performance Development Kit – SPDK (Sep 2015)

• Micron Userspace NVMe driver project – UNVMe (Feb 2016)

– Device dedicated to a single user process

– Provides only a simple, polling-based read & write interface

– Not sufficient for porting existing applications

Page 7

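[Figure: kernel I/O path vs. NVMeDirect path]

• Kernel I/O: User Apps (USER) → VFS/File System → Block Layer → NVMe Driver (KERNEL) → NVMe SSD (H/W)

• NVMeDirect: User Apps → NVMeDirect Framework → NVMe I/O Queue (USER) → NVMe SSD (H/W) on the data path; the kernel NVMe Driver remains on the control path for permission management and queue management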

Page 8

• Allows user-space applications to directly access NVMe SSDs without any hardware modifications

• Achieves high performance by avoiding storage stack overhead

• Supports various I/O policies

• Applications can be optimized according to their I/O characteristics

• Selective use of block cache, I/O scheduler, or I/O completion thread

• Asynchronous I/O vs. Synchronous I/O

• Buffered I/O vs. Direct I/O

• Designed to maximize performance for trusted applications

• Storage appliance, private clouds, etc.

Page 9

[Figure: NVMeDirect architecture. User: NVMeDirect Library (NVMeDirect API; Block Cache, I/O Scheduler, I/O Completion Thread; I/O Handles; I/O Queues) and Admin Tool. Kernel: NVMe Driver with Default Queues and User Created Queues. HW: NVMe Controller.]

• NVMeDirect Management

– Kernel driver

– Admin tool

• NVMeDirect I/O

– I/O Handles

– User-space I/O Queues

• NVMeDirect I/O Framework

– Block Cache

– I/O Scheduler

– I/O Completion Thread

Page 10

• User-space I/O Queues

– Memory-mapped address space for NVMe I/O Queues created in the kernel address space

• I/O Handles

– Used to send I/O requests to NVMe I/O Queue(s)

– A thread can create one or more I/O Handles

– Each Handle can be configured to use different features: caching, I/O scheduling, I/O completion, etc.

[Figure: bindings between I/O Handles and I/O Queues: 1:1, 1:N, N:1]
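
The Handle-to-Queue bindings above can be pictured with a small data model. This is only a sketch under assumptions: the struct names, the feature flags, `handle_bind`, `handle_submit`, and the round-robin choice among bound Queues are hypothetical illustrations, not the NVMeDirect API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-Handle feature flags (illustrative names only). */
enum handle_feature {
    FEAT_BLOCK_CACHE       = 1u << 0,
    FEAT_IO_SCHEDULER      = 1u << 1,
    FEAT_COMPLETION_THREAD = 1u << 2
};

#define MAX_QUEUES_PER_HANDLE 4

/* Stand-in for one user-space I/O Queue. */
struct io_queue {
    int id;
    unsigned submitted;   /* requests sent through this Queue so far */
};

/* A Handle carries its own feature set and the Queues it is bound to. */
struct io_handle {
    unsigned features;
    struct io_queue *queues[MAX_QUEUES_PER_HANDLE];
    size_t nqueues;
    size_t next;          /* round-robin cursor, used for 1:N bindings */
};

/* Bind a Queue to a Handle; returns 0 on success, -1 if the Handle is full. */
int handle_bind(struct io_handle *h, struct io_queue *q)
{
    assert(h != NULL && q != NULL);
    if (h->nqueues == MAX_QUEUES_PER_HANDLE)
        return -1;
    h->queues[h->nqueues++] = q;
    return 0;
}

/* Submit one request through the next bound Queue (assumed round-robin).
 * Returns the id of the Queue used, or -1 if no Queue is bound. */
int handle_submit(struct io_handle *h)
{
    assert(h != NULL);
    if (h->nqueues == 0)
        return -1;
    struct io_queue *q = h->queues[h->next];
    h->next = (h->next + 1) % h->nqueues;
    q->submitted++;
    return q->id;
}
```

Binding one Handle to two Queues gives the 1:N case, and binding two Handles to the same Queue gives N:1. In a real N:1 setup with multiple threads, access to the shared Queue would also need synchronization, which this single-threaded sketch leaves out.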


Page 12

[Figure: NVMeDirect architecture with one Handle. User: NVMeDirect Library (NVMeDirect API; Block Cache, I/O Scheduler, I/O Completion Thread; I/O Handles; I/O Queues) and Admin Tool. Kernel: NVMe Driver with Default Queue and User Created Queue. HW: NVMe Controller.]

1) Open device

nvmed = nvmed_open("/proc/nvme0/n1");

2) Create queue

queue = nvmed_queue_create(nvmed);

3) Create handle

handle = nvmed_handle_create(queue);

4) Perform I/O

size = nvmed_read(handle, buf, len);

5) Configure Handle (here, enabling buffered I/O through the Block Cache)

ret = nvmed_set_param(handle, BUFFERED_IO, TRUE);

Page 13

• Enables high performance I/O

• Low latency and high throughput

• Easy to support new interfaces

• Weighted queue, multi-stream, etc.

• Easy to develop and debug

• Provides various I/O policies

• Free from kernel update

• Co-exists with legacy kernel I/O

Page 14

• Implementation on the Linux kernel 4.3.3

• Experimental setup

– Ubuntu 14.04 LTS

– 3.3GHz Intel Core i7 CPU (6 cores) & 64GB of DRAM

– Intel 750 Series 400GB NVMe SSD

• Systems compared

– Kernel I/O

– SPDK

– NVMeDirect

Page 15

• Asynchronous random I/O performance using FIO

[Figure: 4KB IOPS (x 1,000) vs. queue depth (1 to 64); panels: Random Read, Random Write]

Page 16

• Polling is inefficient for bandwidth-sensitive workloads because it significantly increases the CPU load

• Significant performance degradation occurs at a certain polling period

• Control the polling period dynamically based on I/O size or hints from applications

[Figure: CPU utilization (%) and normalized read IOPS vs. polling period (µs), for 4KB, 8KB, and 16KB I/O]
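
The last bullet above, choosing the polling period from the I/O size or an application hint, can be sketched as a small policy function. This is an illustration under assumptions, not the authors' policy: the hint names, the constants, and the linear scaling rule are all hypothetical and would need calibration against a real device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hint an application could pass along with its I/O. */
enum io_hint { HINT_NONE, HINT_LATENCY_SENSITIVE, HINT_BANDWIDTH_SENSITIVE };

/* Illustrative constants, not measured values. */
#define BASE_PERIOD_US   10u    /* assumed period for I/O of up to 4KB */
#define MAX_PERIOD_US   200u    /* assumed upper bound on the period */
#define BASE_IO_SIZE   4096u

/* Return how long to wait between completion polls, in microseconds.
 * A result of 0 means "poll continuously". */
unsigned polling_period_us(size_t io_size, enum io_hint hint)
{
    assert(io_size > 0);

    if (hint == HINT_LATENCY_SENSITIVE)
        return 0;   /* busy-poll: lowest latency, highest CPU load */

    /* Larger requests take longer to complete, so poll less often:
     * scale the period linearly with the size in 4KB units (rounded up). */
    size_t blocks = io_size / BASE_IO_SIZE + (io_size % BASE_IO_SIZE != 0);
    if (blocks > MAX_PERIOD_US)
        blocks = MAX_PERIOD_US;   /* keeps the multiplication small */
    size_t period = BASE_PERIOD_US * blocks;

    if (hint == HINT_BANDWIDTH_SENSITIVE)
        period *= 2;              /* trade some latency for CPU headroom */

    return period > MAX_PERIOD_US ? MAX_PERIOD_US : (unsigned)period;
}
```

With these assumed constants, a 4KB request without a hint is polled every 10µs and a 16KB request every 40µs, while a latency-sensitive hint falls back to continuous polling.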

Page 17

• Redis: in-memory data structure store

• Logs every operation for persistence

• Log records are 10 to 100 bytes in size

• A write buffer is required because the records are so small

• Difficult to run on SPDK without significant code modification

[Figure: Redis running on top of the NVMeDirect Library (NVMeDirect API; Block Cache, I/O Scheduler, I/O Completion Thread; I/O Handles; I/O Queues)]
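
Why small log records call for a write buffer can be seen in a minimal sketch that packs records into 4KB blocks before handing them to a block write. All names here (`write_buffer`, `wb_append`, `wb_flush`, `count_sink`) are hypothetical, the 4KB block size is an assumption, and the callback merely stands in for whatever block write the underlying I/O layer provides; this is not how Redis or NVMeDirect implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096u   /* assumed device block size */

/* Callback that writes one full block; returns 0 on success. */
typedef int (*block_write_fn)(const unsigned char *block, void *ctx);

struct write_buffer {
    unsigned char data[BLOCK_SIZE];
    size_t used;                 /* bytes currently buffered */
    block_write_fn write_block;
    void *ctx;
};

void wb_init(struct write_buffer *wb, block_write_fn fn, void *ctx)
{
    assert(wb != NULL && fn != NULL);
    wb->used = 0;
    wb->write_block = fn;
    wb->ctx = ctx;
}

/* Write out the buffered bytes as one block, zero-padding the unused tail. */
int wb_flush(struct write_buffer *wb)
{
    if (wb->used == 0)
        return 0;
    memset(wb->data + wb->used, 0, BLOCK_SIZE - wb->used);
    if (wb->write_block(wb->data, wb->ctx) != 0)
        return -1;
    wb->used = 0;
    return 0;
}

/* Append one log record; a record may be split across two blocks. */
int wb_append(struct write_buffer *wb, const void *rec, size_t len)
{
    const unsigned char *p = rec;
    while (len > 0) {
        size_t room = BLOCK_SIZE - wb->used;
        size_t n = len < room ? len : room;
        memcpy(wb->data + wb->used, p, n);
        wb->used += n;
        p += n;
        len -= n;
        if (wb->used == BLOCK_SIZE && wb_flush(wb) != 0)
            return -1;
    }
    return 0;
}

/* Demo sink: counts blocks instead of touching a device. */
int count_sink(const unsigned char *block, void *ctx)
{
    (void)block;
    ++*(int *)ctx;
    return 0;
}
```

With 100-byte records, roughly forty appends fit into one block, so the device sees one block write instead of forty tiny ones. A real logger would also call `wb_flush` according to its durability policy rather than only when a block fills up.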

Page 18

• Using workload-A in YCSB on Redis

• Update-heavy workload with Zipf distribution

[Figure: YCSB workload-A on Redis, Kernel I/O vs. NVMeDirect; panels: throughput (ops/s) and Read/Update latency (µs); annotated differences: 15%, 13%, 20%]

Page 19

• NVMeDirect supports prioritized I/O without H/W features

• Prioritized I/O without a weighted round-robin scheduler

• Using flexible binding between Handles and Queues

• Sharing a single Queue with multiple Handles

[Figure: NVMeDirect Library (NVMeDirect API; Block Cache, I/O Scheduler, I/O Completion Thread) with four I/O Handles bound to I/O Queues, several Handles sharing one Queue]

Page 20

• One prioritized thread with a dedicated queue, three threads with a shared queue

• Each thread performs 4KB random write

[Figure: 4KB IOPS per thread (0 to 100,000); series: Kernel I/O, NVMeDirect with Dedicated Queue]
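
One way to reason about this setup is an idealized model in which the device takes one command per queue in round-robin order and every thread always has work pending. The sketch below simulates that model; it is an assumption-laden illustration (the round-robin service order, the saturated threads, and all names are hypothetical) and does not reproduce the measured results in the figure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8

/* One submission queue and the threads that feed it (all assumed saturated). */
struct sim_queue {
    int threads[MAX_THREADS];  /* ids of the threads sharing this queue */
    size_t nthreads;
    size_t next;               /* which sharer's command is at the head next */
};

/* Serve `slots` commands, taking one command per queue in round-robin order,
 * and count in served[] how many commands each thread got through. */
void simulate_round_robin(struct sim_queue *queues, size_t nqueues,
                          unsigned long slots, unsigned long *served)
{
    assert(queues != NULL && served != NULL && nqueues > 0);
    for (unsigned long s = 0; s < slots; s++) {
        struct sim_queue *q = &queues[s % nqueues];
        assert(q->nthreads > 0);
        int tid = q->threads[q->next];
        q->next = (q->next + 1) % q->nthreads;
        served[tid]++;
    }
}
```

Under this model, a thread with a dedicated queue gets half of the service slots, while three threads sharing the other queue get one sixth each. If all four threads instead shared the device equally, each would get one quarter.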

Page 21

• NVMeDirect

• First full framework for user-space I/O based on stock NVMe devices

• Can be easily applied to many applications

• Useful for emerging storage devices, e.g., 3D XPoint™

• Available as open-source at https://github.com/nvmedirect (July 2016)

• Future work

• User-level file systems

• Porting diverse data-intensive applications over NVMeDirect

• Protecting the system from illegal access


Page 22

Thank you!