
2013 Storage Developer Conference. © Huawei Technologies Co. All Rights Reserved.

“Implementation of SMB3.0 in Scale-Out NAS”

Kalyan Das Jun Liu

Huawei Technologies Co.

Agenda

Target Storage Subsystem
Distributed NAS without DLM
Transparent Failover
Copy Offload (ODX)
Multi Channel
SMB Direct (RDMA)
Remote VSS


Huawei Unified Storage

Unified Storage: Two controller unit

[Diagram labels: Controller A, Controller B; File System (FS1), File System (FS2); PCIe Link; BBU Mirrored Memory Region; Disks Pool Spaces]

Unified Storage: 2C Unit (in real life)

[Figure: Sys0 front view and Sys0 rear view. Labels: BBU; Controller A, Controller B; Power (x2); FAN; Backend I/O Module; Frontend I/O Module; Management I/O Module; Data switch I/O Module; Controller; KVM; SVP; DKE 1 to DKE 8]

Unified Storage: Multi unit cluster

[Diagram labels: Enclosure 0 to Enclosure 3, each with controllers A and B; DSW 0, DSW 1. Legend: PCI-E link, PCI-E switch, PCIE2.0 port 4GB/s, Mirror channel]

Unified Storage: 16 Controller Rack layout

[Figure: front and rear rack views for Sys0, Sys1, and Sys2-7. Labels: Controller; KVM; SVP; DSW (x2); DKE 1 to DKE 16]

Unified Storage (2C): Mirrored FS Dirty pages

[Diagram labels: Controller A, Controller B; Protocol Layer; FS Upper Layer; FS Lower Layer; File System (FS1), File System (FS2); Memory Channel; Disks Pool Spaces; Write foo; Dirty Pages; Mirror Modified Pages; Ack; Flush; arrows numbered 1 to 7]

1: Client Write request
2: FS Write API
3: Dirty pages into mirror segment
4: Mirrored to 2nd controller
5: Ack from 2nd controller
6: Write request is committed
7: The dirty pages are flushed
Repeat steps 4 and 5 to clear dirty bits in the 2nd controller.
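
The seven steps above can be modeled in a few lines. This is only a sketch under stated assumptions: the `Controller`, `Disk`, `write`, and `flush` names are hypothetical, the mirror segment is modeled as an in-memory dictionary, and the acknowledgement is a local check rather than a message over the memory channel.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """One controller with a mirror segment for dirty pages (hypothetical model)."""
    name: str
    mirror_segment: dict = field(default_factory=dict)  # page_id -> data


@dataclass
class Disk:
    """Stand-in for the shared disk pool."""
    pages: dict = field(default_factory=dict)


def write(owner, partner, page_id, data):
    """Steps 1 to 6: the write is committed only after the partner acks the mirror."""
    owner.mirror_segment[page_id] = data       # step 3: dirty page into mirror segment
    partner.mirror_segment[page_id] = data     # step 4: mirrored to 2nd controller
    acked = page_id in partner.mirror_segment  # step 5: ack from 2nd controller
    return acked                               # step 6: write request is committed


def flush(owner, partner, disk):
    """Step 7, followed by clearing the partner's copy of each flushed page."""
    for page_id, data in list(owner.mirror_segment.items()):
        disk.pages[page_id] = data                 # step 7: dirty page flushed to disk
        del owner.mirror_segment[page_id]
        partner.mirror_segment.pop(page_id, None)  # dirty state cleared on 2nd controller


a, b, disk = Controller("A"), Controller("B"), Disk()
committed = write(a, b, page_id=0, data=b"foo")
mirrored_before_flush = dict(b.mirror_segment)
flush(a, b, disk)
```

The point of the ordering is that a page exists on both controllers before the client sees the write complete, so either controller can flush it if the other fails.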

Unified Storage (2C): File System Failover

[Diagram labels: Controller A, Controller B; X (failure mark); Protocol Layer (x2); FS Upper Layer (x2); FS Lower Layer; Dirty Pages; Flush; Memory Channel; File System (FS1), File System (FS2); Disks Pool Spaces; S11, S12; shares \\S12\FS2, \\S12\FS1, \\S11\FS1]

Unified Storage (2C): Active-Active File Systems

[Diagram labels: two controller boxes (both printed as Controller A), S11 and S12, each with Protocol Layer, Protocol Adapter Layer, and HWFS Storage Layer; System manager; PCIe Link; Memory Channel; File System (FS1), File System (FS2); Disks Pool Spaces; shares \\S11\FS1, \\S11\FS2, \\S12\FS2, \\S12\FS1]

Multi Protocol Access Handling

[Diagram labels: SMB, NFSv4, FTP/HTTP, Local Apps; Transport; NTFS Interface, POSIX Interface, NFSv4 Interface; Multi-Protocol Access Handler (MPA) with File States; File System]


Unified Storage: Scale Out NAS

Active-Active FS: Distributed Access without DLM

[Diagram: two node stacks connected by PCIe over FS1 and FS2. Each stack: SMB; Transport; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System. Each stack is shown with the shares \\This_Server\FS1 and \\This_Server\FS2]

Unified Storage (MC): Distributed NAS without DLM

[Diagram: four node stacks (S11, S12, S21, S22) connected through DSW 0 (PCIe Switch). Each stack: SMB; Transport; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System. One PCIe-linked pair sits over FS1 and FS2, the other over FS3 and FS4 (printed as FS24)]

Shares shown per node:
\\S11\FS1 \\S11\FS2 \\S11\FS3 \\S11\FS4
\\S12\FS1 \\S12\FS2 \\S12\FS3 \\S12\FS4
\\S21\FS1 \\S21\FS2 \\S21\FS3 \\S21\FS4
\\S22\FS1 \\S22\FS2 \\S22\FS3 \\S22\FS4


SMB3 Transparent Failover

[Diagram: Multi Protocol Lock and Transparent Failover Support. Labels: SMB, NFSv4, FTP/HTTP, Local App; Transport; NTFS Interface, POSIX Interface, NFSv4 Interface; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; Failover Partner DAS Interface, DLM Interface, Remote DAS Interface; Multi-Protocol Access Handler (MPA) with File States; File System]

SMB3 TF: Under 2C Unified Storage

[Diagram: two node stacks over FS1 and FS2, linked by PCIe, the Witness Partner heartbeat, and the Failover Partner DAS Interface. Each stack: SMB3; Transport; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System]

SMB3 TF: Under Multi-Controllers Unified Storage

[Diagram: four node stacks (S11, S12, S21, S22) connected through DSW 0 (PCIe Switch). Each stack: SMB3; Transport; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System. Each pair is linked by PCIe, a Witness Partner heartbeat, and a Failover Partner DAS Interface; one pair sits over FS1 and FS2, the other over FS3 and FS4 (printed as FS24)]

Shares shown per node:
\\S11\FS1 \\S11\FS2 \\S11\FS3 \\S11\FS4
\\S12\FS1 \\S12\FS2 \\S12\FS3 \\S12\FS4
\\S21\FS1 \\S21\FS2 \\S21\FS3 \\S21\FS4
\\S22\FS1 \\S22\FS2 \\S22\FS3 \\S22\FS4

SMB3 TF: Normal Flow

[Diagram: two node stacks over FS1 and FS2, linked by PCIe, the Witness Partner heartbeat, and the Failover Partner DAS Interface. Each stack: SMB3; Transport; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System. Arrows numbered 1 to 10; arrow 1 is labeled Open]

SMB3 TF: Flow after failover

[Diagram: one node stack over FS2, with PCIe, the Witness Partner heartbeat, and the Failover Partner DAS Interface. Stack: SMB3; Transport; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System. Arrows numbered 1 to 15; arrow 2 is labeled Re-open]

SMB3 TF: Performance through File State Batching

[Diagram: node stack over FS1 (SMB; Transport; Distributed Access Service (DAS) with File States; Witness Service, Witness Interface; NTFS Interface, POSIX Interface; Multi-Protocol Access Handler (MPA) with File States; File System), with two arrows labeled Periodic Batch]

File state structures shown (apparent layout):
File: path info, pACL, pBRL, pOL
SD: ACE, ACE, ACE
BRL: Info + TFF + TS (per entry)
Open List: SID, DH, CrGUID, info, TFF, TS, CF (per entry)

Legend:
DH: Durable Handle
CrGUID: Create-GUID
TFF: Transparent Failover Flag
TS: Time Stamp
CF: Close Flag
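
A minimal sketch of the batching idea, assuming that open-state changes are queued locally and shipped to the failover partner on a timer instead of one message per open. The field names follow the slide legend; the `OpenEntry` and `StateBatcher` classes, the `flush` trigger, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpenEntry:
    """One Open List entry, using the abbreviations from the slide legend."""
    sid: str                  # SID, as labeled on the slide
    durable_handle: int       # DH
    create_guid: str          # CrGUID
    info: str
    tf_flag: bool             # TFF: Transparent Failover Flag
    timestamp: float          # TS
    close_flag: bool = False  # CF


@dataclass
class StateBatcher:
    """Queues state changes and ships them to the failover partner in batches."""
    pending: List[OpenEntry] = field(default_factory=list)
    sent_batches: List[List[OpenEntry]] = field(default_factory=list)

    def record(self, entry: OpenEntry) -> None:
        self.pending.append(entry)  # no per-open round trip to the partner

    def flush(self) -> int:
        """Meant to be called by a periodic timer; sends everything queued so far."""
        if not self.pending:
            return 0
        batch, self.pending = self.pending, []
        self.sent_batches.append(batch)  # stand-in for the transfer to the partner
        return len(batch)


batcher = StateBatcher()
for i in range(3):
    batcher.record(OpenEntry("sid-1", i, f"guid-{i}", "open", True, float(i)))
sent = batcher.flush()
```

Three opens cost one transfer here instead of three, which is the trade the slide title points at: fewer partner round trips in exchange for a short window in which the partner's copy lags.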

SMB3 TF: Asymmetric Share migration (SMB 3.02)

[Diagram: a Client and four nodes (S11, S12, S21, S22) connected through DSW 0 (PCIe Switch), over FS1, FS2, FS3, and FS4. Each node: SMB3; Distributed Access Service (DAS); Witness Service; Multi-Protocol Access Handler (MPA); File System. Shares shown: \\S21\FS1, \\S21\FS3, \\S11\FS1. Migration shown: move \\S21\FS1 => \\S11\FS1. Arrows numbered 1 to 6]


Copy Offload (ODX)

Unified Storage: Copy Offload (ODX)

Storage support of T10 XCOPY
Storage Token structure
FS support to copy offload
Fsctl codes
Copy at the FS Layer
Fast Copy
Use of PCIe Link
Copy at the Protocol Layer

Copy Offload Practice

Three Layers to Implement Copy Offload

[Diagram: two NAS systems (both printed as NAS System I), each with NAS Protocol, FileSystem, and Backend Storage layers. Tokens shown per layer: TOKEN (fsid, fid, offset, len) at the protocol layer; TOKEN (fsid, fid, offset, len) at the filesystem layer; TOKENS (devno list, LBA list) at the backend storage layer]

Protocol Layer
Independent of the filesystem and backend storage solutions.
But in a distributed environment, the protocol layer must support parallel access, which may trigger token-invalidation events.
In a multi-protocol access environment, it must learn of invalidations triggered by other protocols such as NFS and HTTP.

Filesystem Layer
Leverages the filesystem snapshot or file-level snapshot feature.
Good practice for a distributed file system based on common disks.
Can also use the ODX feature of the lower block storage and manage the lower ODX token list.

Backend Storage Layer
Leverages block functions such as VAAI and ODX over SCSI.
Supports T10 XCOPY.
But a file may span multiple block devices, which are managed by the filesystem.
Also needs backend storage of the same type.

In our practice, we show how the Protocol and Filesystem layers implement the Copy Offload feature.

Copy Offload Practice

Some Data Structures and Workflows Design (I)

Protocol Layer
Converts the token into a type of lock that conflicts only with write and change-size operations.
The token can be revoked or broken by conflicting operations.

Filesystem Layer
Each file keeps an offload token list as metadata for the file (inode).
Write and change-size operations may invalidate the token.
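
The token-as-lock behavior can be sketched as follows. The slide only says that writes and size changes may invalidate a token; the rule used here (invalidate on byte-range overlap or on truncation into the token's range) is an assumption, and the `OffloadToken` and `Inode` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OffloadToken:
    """Token fields as named on the slides: fsid, fid, offset, len."""
    fsid: int
    fid: int
    offset: int
    length: int
    valid: bool = True

    def overlaps(self, offset: int, length: int) -> bool:
        return offset < self.offset + self.length and self.offset < offset + length


@dataclass
class Inode:
    """A file that keeps its offload token list as metadata."""
    size: int
    tokens: List[OffloadToken] = field(default_factory=list)

    def offload_read(self, fsid: int, fid: int, offset: int, length: int) -> OffloadToken:
        token = OffloadToken(fsid, fid, offset, length)
        self.tokens.append(token)
        return token

    def read(self, offset: int, length: int) -> None:
        pass  # reads never conflict with a token

    def write(self, offset: int, length: int) -> None:
        for token in self.tokens:
            if token.overlaps(offset, length):  # assumed rule: overlap breaks the token
                token.valid = False
        self.size = max(self.size, offset + length)

    def set_size(self, new_size: int) -> None:
        for token in self.tokens:
            if new_size < token.offset + token.length:  # assumed rule: truncation into range
                token.valid = False
        self.size = new_size


inode = Inode(size=4096)
t1 = inode.offload_read(fsid=1, fid=2, offset=0, length=1024)
t2 = inode.offload_read(fsid=1, fid=2, offset=2048, length=1024)
inode.read(0, 512)
valid_after_read = t1.valid
inode.write(512, 16)
valid_after_write = (t1.valid, t2.valid)
inode.set_size(2560)
```

An offload write presented with an invalid token would then be rejected, which forces the client back to an ordinary copy.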

Unified Storage (2C): ODX

[Diagram: an SMB3 Client issues "Copy \\S11\FS1\foo \\S12\FS2\foo.bar" to Controller A and Controller B, each with a Protocol Layer and FS Driver, over File System (FS1), File System (FS2), and Disks Pool Spaces. Read path: SMB_IOCTL (FSCTL_OFFLOAD_READ), FS_IOCTL (FSCTL_OFFLOAD_READ), T10 XCOPY (CDB 0x83 & 0x84). Write path: SMB_IOCTL (FSCTL_OFFLOAD_WRITE), FS_IOCTL (FSCTL_OFFLOAD_WRITE), T10 XCOPY (CDB 0x83 & 0x84). Token levels: Protocol Token, FS Token, Storage Token. Other labels: Token; Native; PCIe Link; Network. Arrows numbered 1 to 9]

XCOPY test Results

[Chart: Copy Offload Performance, Without Copy Offload vs. With Copy Offload]

Hyper-V Clone test Results

[Chart: Copy Offload Performance, Without Copy Offload vs. With Copy Offload]

Read Offload Operation

Copy Offload (Packet Trace for VM Cloning)

[Packet trace screenshot: Offload Read Request (file offset and length) and Offload Read Response, with these token fields highlighted]

TokenOffSet = 0x0
TokenFsid = 0x1f89
TokenLen = 0x014E400000
TokenFid = 0x26
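
To make the traced values concrete, the sketch below packs the four token fields into bytes and reads them back. The wire layout is an assumption (four little-endian unsigned 64-bit integers); the slides give only the field names and values, and `pack_token` and `unpack_token` are hypothetical helpers.

```python
import struct

# Hypothetical layout: fsid, fid, offset, length as little-endian unsigned 64-bit fields.
TOKEN_FORMAT = "<QQQQ"


def pack_token(fsid: int, fid: int, offset: int, length: int) -> bytes:
    return struct.pack(TOKEN_FORMAT, fsid, fid, offset, length)


def unpack_token(blob: bytes) -> dict:
    fsid, fid, offset, length = struct.unpack(TOKEN_FORMAT, blob)
    return {"fsid": fsid, "fid": fid, "offset": offset, "length": length}


# Values taken from the packet trace on the slide.
blob = pack_token(fsid=0x1F89, fid=0x26, offset=0x0, length=0x014E400000)
token = unpack_token(blob)
length_mib = token["length"] // (1024 * 1024)
```

The traced TokenLen of 0x014E400000 is 0x14E4 x 0x100000 bytes, that is 5348 MiB, so a single token starting at offset 0 covers roughly 5.2 GiB of the source file.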

Write Offload Operation

Copy Offload (Packet Trace for VM Cloning)

[Packet trace screenshot: Offload Write Request (offset and length in this Offload Write) and Offload Write Response (length successfully written), with these token fields highlighted]

TokenOffSet = 0x0
TokenFsid = 0x1f89
TokenLen = 0x014E400000
TokenFid = 0x26

Copy Offload Practice

Some Data Structures and Workflows Design (II)

The filesystem supports Copy Offload, so the protocol layer just passes the token request to the filesystem. (The protocol layer has to convert the FH to the corresponding fsid and fid known to the filesystem.)

Copy Offload Practice

Some Data Structures and Workflows Design (III)

The filesystem does not support Copy Offload, so the protocol layer generates the token and converts it to a lock for the multi-protocol access and distributed environment.
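
Cases (II) and (III) amount to one dispatch decision in the protocol layer, sketched below. All class and method names are hypothetical, the tokens are plain tuples, and the lock is modeled as an entry in a list rather than a real multi-protocol lock.

```python
class OffloadCapableFS:
    """Case (II): the filesystem issues the token itself."""
    supports_copy_offload = True

    def offload_read(self, fsid, fid, offset, length):
        return ("fs-token", fsid, fid, offset, length)


class PlainFS:
    """Case (III): the filesystem has no copy offload support."""
    supports_copy_offload = False


class ProtocolLayer:
    def __init__(self, fs, handle_map):
        self.fs = fs
        self.handle_map = handle_map  # FH -> (fsid, fid), as the filesystem knows them
        self.token_locks = []         # tokens held as locks by the protocol layer

    def offload_read(self, fh, offset, length):
        fsid, fid = self.handle_map[fh]  # convert the FH to fsid and fid
        if self.fs.supports_copy_offload:
            # Case (II): pass the token request through to the filesystem.
            return self.fs.offload_read(fsid, fid, offset, length)
        # Case (III): generate the token here and hold it as a lock.
        token = ("protocol-token", fsid, fid, offset, length)
        self.token_locks.append(token)
        return token


handles = {"fh-1": (0x1F89, 0x26)}
with_fs_support = ProtocolLayer(OffloadCapableFS(), handles)
without_fs_support = ProtocolLayer(PlainFS(), handles)
fs_token = with_fs_support.offload_read("fh-1", 0, 4096)
proto_token = without_fs_support.offload_read("fh-1", 0, 4096)
```

Keeping the FH-to-(fsid, fid) conversion in front of both branches means the token carries the same identifiers whichever layer issued it.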


Thank You Q & A