2013 Storage Developer Conference. © Huawei Technologies Co. All Rights Reserved.
“Implementation of SMB3.0 in Scale-Out NAS”
Kalyan Das Jun Liu
Huawei Technologies Co.
Agenda
Target Storage Subsystem
Distributed NAS without DLM
Transparent Failover
Copy Offload (ODX)
Multi Channel
SMB Direct (RDMA)
Remote VSS
Huawei Unified Storage
Controller B
Unified Storage: Two controller unit
Controller A
File System (FS1) File System (FS2)
PCIe Link
BBU Mirrored Memory Region
Disks Pool Spaces
Unified Storage: 2C Unit (in real life)
[Figure: Sys0 front view and rear view. Labels: BBU, Controller A, Controller B, Power (x2), FAN, Backend I/O Module, Frontend I/O Module, Management I/O Module, Data switch I/O Module, Controller, KVM, SVP, DKE 1 to DKE 8.]
Unified Storage: Multi unit cluster
[Figure: multi-unit cluster. Labels: controllers A and B (four pairs), Enclosure 0 to Enclosure 3, DSW 0, DSW 1, PCI-E link, PCI-E switch, PCIE2.0 port 4GB/s, Mirror channel.]
Unified Storage: 16 Controller Rack layout
[Figure: rack layout. Labels: Sys0, Sys1 and Sys2-7 front and rear views; Controller; DSW (x2); KVM; SVP; DKE 1 to DKE 16.]
Controller B
Unified Storage (2C): Mirrored FS Dirty pages
Controller A
File System (FS1) File System (FS2)
Memory Channel
Disks Pool Spaces
Protocol
Layer
FS Upper Layer
FS Lower Layer
Write foo
Dirty Pages
Mirror Modified
Pages
Ack Flush
1: Client Write request
2: FS Write API
3: Dirty pages into mirror segment
4: Mirrored to 2nd controller
5: Ack from 2nd controller
6: Write request is committed
7: The dirty pages are flushed
Repeat steps 4 and 5 to clear dirty bits in the 2nd controller.
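The seven-step mirrored write path above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the product's implementation: the class names `Controller` and `MirroredWriter`, and the plain dictionaries standing in for the mirrored memory region and the disk pool, are invented for the example.

```python
class Controller:
    """Hypothetical stand-in for the second controller's mirror region."""

    def __init__(self, name):
        self.name = name
        self.mirror = {}  # page_id -> data held on behalf of the partner

    def receive_mirror(self, page_id, data):
        # Steps 4-5: store the partner's dirty page, then acknowledge.
        self.mirror[page_id] = data
        return True

    def clear_mirror(self, page_id):
        # Repeat of steps 4-5 after a flush: drop the now-clean page.
        self.mirror.pop(page_id, None)
        return True


class MirroredWriter:
    """Hypothetical owner-side write path for one file system."""

    def __init__(self, partner):
        self.partner = partner
        self.dirty = {}  # step 3: dirty pages in the local mirror segment
        self.disk = {}   # stand-in for the disk pool

    def write(self, page_id, data):
        # Steps 1-3: the client write reaches the FS write API and
        # dirties a page in the mirror segment.
        self.dirty[page_id] = data
        # Steps 4-5: mirror to the second controller and wait for its ack.
        if not self.partner.receive_mirror(page_id, data):
            raise IOError("mirror failed; write not committed")
        # Step 6: only now is the write reported as committed.
        return "committed"

    def flush(self):
        # Step 7: flush dirty pages, then clear the partner's copies.
        for page_id, data in list(self.dirty.items()):
            self.disk[page_id] = data
            self.partner.clear_mirror(page_id)
            del self.dirty[page_id]
```

The point of the ordering is that the client sees the commit (step 6) before any disk I/O, because the page already exists in two controllers' memory.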
Controller B
Unified Storage (2C): File System Failover
Controller A
File System (FS1) File System (FS2)
Memory Channel
Disks Pool Spaces
Protocol Layer
FS Upper Layer
FS Lower Layer
Dirty Pages
FS Upper Layer
Protocol Layer
X
S11 S12
\\S12\FS2 \\S12\FS1 \\S11\FS1
Flush
Controller A
HWFS
Unified Storage (2C): Active-Active File Systems
File System (FS1) File System (FS2)
PCIe Link
Disks Pool Spaces
Storage Layer
Protocol Adapter
Layer
Protocol Layer
Controller A
HWFS Storage Layer
Protocol Adapter
Layer
Protocol Layer
\\S11\FS1 \\S11\FS2 \\S12\FS2 \\S12\FS1
S11 S12
Memory Channel
System manager
Multi Protocol Access Handling
SMB
NFSv4
Transport File System
Local Apps
NTFS Interface
POSIX Interface
NFSv4 Interface
FTP/HTTP
Multi-Protocol Access Handler
(MPA) File States
Unified Storage: Scale Out NAS
Active-Active FS: Distributed Access without DLM
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler
(MPA)
File States
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler
(MPA)
File States
PCIe
FS1 FS2
\\This_Server\FS1 \\This_Server\FS2
\\This_Server\FS1 \\This_Server\FS2
Unified Storage (MC): Distributed NAS without DLM
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler (MPA)
File States
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler (MPA)
File States
PCIe
FS1 FS2
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler (MPA)
File States
SMB
Transport
File System
NTFS Interface
POSIX Interface
Multi-Protocol Access Handler (MPA)
File States
PCIe
FS3 FS24
DSW 0 PCIe Switch
S11 S12 S21 S22
\\S11\FS1 \\S11\FS2 \\S11\FS3 \\S11\FS4
\\S12\FS1 \\S12\FS2 \\S12\FS3 \\S12\FS4
\\S21\FS1 \\S21\FS2 \\S21\FS3 \\S21\FS4
\\S22\FS1 \\S22\FS2 \\S22\FS3 \\S22\FS4
SMB3 Transparent Failover
SMB3 Transparent Failover
SMB
Distributed Access Service
(DAS)
File States
NFSv4
Transport File System
Local App
NTFS Interface
POSIX Interface
NFSv4 Interface
Multi Protocol Lock and Transparent Failover Support
FTP/HTTP
Witness Service
Witness Interface
Failover Partner DAS Interface
DLM Interface Remote DAS Interface
Multi-Protocol Access Handler
(MPA) File States
SMB3 TF: Under 2C Unified Storage
SMB3
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
SMB3
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
Witness Partner heartbeat
PCIe
FS1 FS2
Failover Partner DAS Interface
SMB3 TF: Under Multi-Controllers Unified Storage
SMB3
Distributed Access Service (DAS)
File States
Transport
File System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler (MPA)
File States
SMB3
Distributed Access Service (DAS)
File States
Transport
File System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler (MPA)
File States
Witness Partner heartbeat
PCIe
FS1 FS2
Failover Partner DAS Interface
SMB3
Distributed Access Service (DAS)
File States
Transport
File System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler (MPA)
File States
SMB3
Distributed Access Service (DAS)
File States
Transport
File System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler (MPA)
File States
Witness Partner heartbeat
PCIe
FS3 FS24
Failover Partner DAS Interface
DSW 0 PCIe Switch
S11 S12 S21 S22
\\S11\FS1 \\S11\FS2 \\S11\FS3 \\S11\FS4
\\S12\FS1 \\S12\FS2 \\S12\FS3 \\S12\FS4
\\S21\FS1 \\S21\FS2 \\S21\FS3 \\S21\FS4
\\S22\FS1 \\S22\FS2 \\S22\FS3 \\S22\FS4
SMB3 TF: Normal Flow
SMB3
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
SMB3
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
Witness Partner heartbeat
PCIe
FS1 FS2
[Flow steps 1 to 10 are numbered on the diagram; step 1 is the Open.]
Failover Partner DAS Interface
SMB3 TF: Flow after failover
Failover Partner DAS Interface
SMB3
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
Witness Partner heartbeat
PCIe
FS2
[Flow steps 1 to 15 are numbered on the diagram; step 2 is the Re-open.]
path info pACL pBRL pOL
SMB3 TF: Performance through File State Batching
SMB
Distributed Access Service
(DAS)
File States
Transport File
System
NTFS Interface
POSIX Interface
Witness Service
Witness Interface
Multi-Protocol Access Handler
(MPA)
File States
FS1
path info pACL pBRL pOL path info pACL pBRL pOL
ACE ACE ACE
Info + TFF + TS Info + TFF + TS Info + TFF + TS
Open List
File
SD BRL
SID DH CrGUID info TFF TS CF
SID DH CrGUID info TFF TS CF
SID DH CrGUID info TFF TS CF
SID DH CrGUID info TFF TS CF
DH: Durable Handle
CrGUID: Create-GUID
TFF: Transparent Failover Flag
TS: Time Stamp
CF: Close Flag
Periodic Batch
Periodic Batch
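As a rough sketch of the batching idea on this slide: each open-state change is queued and the queue is shipped as one message per period instead of one message per change. The field names follow the slide legend; the `StateBatcher` class, the `send` callback, and the batch interval are assumptions made for illustration, and the slide does not say where the batch is sent.

```python
import time
from dataclasses import dataclass, field


@dataclass
class OpenState:
    """One Open List entry, using the field names from the slide legend."""
    sid: str                   # SID
    durable_handle: int        # DH
    create_guid: str           # CrGUID
    info: str = ""
    tff: bool = True           # Transparent Failover Flag
    ts: float = field(default_factory=time.time)  # Time Stamp
    close_flag: bool = False   # CF


class StateBatcher:
    """Hypothetical batcher: queue state changes, ship them together."""

    def __init__(self, send, interval=0.05):
        self.send = send          # callable that receives a list of OpenState
        self.interval = interval  # assumed batch period, in seconds
        self.pending = []
        self.last_flush = time.monotonic()

    def record(self, state):
        # Queue the change; flush only when the period has elapsed.
        self.pending.append(state)
        if time.monotonic() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)
            self.pending = []
        self.last_flush = time.monotonic()
```

The trade-off is the usual one for batching: fewer cross-controller messages per open, at the cost of a window in which the most recent state changes have not yet been replicated.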
SMB3 TF: Asymmetric Share migration (SMB 3.02)
FS1
Multi-Protocol Access Handler (MPA)
SMB3
Distributed Access Service (DAS)
File System
Witness Service
FS2
Multi-Protocol Access Handler (MPA)
SMB3
Distributed Access Service (DAS)
File System
Witness Service
FS3
Multi-Protocol Access Handler (MPA)
SMB3
Distributed Access Service (DAS)
File System
Witness Service
FS4
Multi-Protocol Access Handler (MPA)
SMB3
Distributed Access Service (DAS)
File System
Witness Service
Client
S11 S12 S21 S22
\\S21\FS1 \\S21\FS3
\\S11\FS1 move \\S21\FS1 => \\S11\FS1
[Migration steps 1 to 6 are numbered on the diagram.]
DSW 0 PCIe Switch
Copy Offload (ODX)
Unified Storage: Copy Offload (ODX)
Storage support of T10 XCOPY
Storage Token structure
FS support for copy offload
Fsctl codes
Copy at the FS Layer
Fast Copy
Use of PCIe Link
Copy at the Protocol Layer
Copy Offload Practice
Three Layers to Implement Copy Offload
NAS System I
NAS Protocol
FileSystem
Backend Storage
Protocol Layer
Independent of the file system and backend storage solutions.
But in a distributed environment, the protocol layer must support parallel access, which may trigger token-invalidation events.
In a multi-protocol access environment, it must learn of invalidations triggered by other protocols such as NFS and HTTP.
Filesystem Layer
Leverages the file system snapshot or file-level snapshot feature.
Good practice for a distributed file system based on common disks.
Can also use the lower block storage ODX feature and manage the lower ODX token list.
Backend Storage Layer
Leverages block functions such as VAAI and ODX over SCSI.
Supports T10 XCOPY.
But a file may span multiple block devices, which are managed by the file system.
Needs backend storage of the same type.
NAS System I
NAS Protocol
FileSystem
Backend Storage
TOKEN (fsid fid,offset,len)
TOKEN (fsid fid,offset,len)
TOKENS (devno list, LBA list)
In our practice, we show how the Protocol and Filesystem Layers implement the Copy Offload feature.
Copy Offload Practice
Some Data Structures and Workflows Design(I)
Protocol Layer
Treats a token as a type of lock that conflicts only with write and change-size operations.
The token can be revoked or broken by conflicting operations.
Filesystem Layer
Each file (inode) keeps its offload token list as metadata.
Write and change-size operations may invalidate a token.
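A minimal sketch of the token-as-lock idea, assuming a per-file token list as described above. The `Token` fields mirror the (fsid, fid, offset, len) tuple shown on the slides; the `FileTokens` class and its two conflict rules (a write revokes only tokens whose range it overlaps, a size change revokes every token) are assumptions chosen for illustration, since the slides say only that such operations may invalidate a token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Token fields as shown on the slides: (fsid, fid, offset, len)."""
    fsid: int
    fid: int
    offset: int
    length: int


class FileTokens:
    """Hypothetical per-file (per-inode) offload token list."""

    def __init__(self, size):
        self.size = size
        self.tokens = []

    def offload_read(self, fsid, fid, offset, length):
        # Issue a token covering the requested range and remember it.
        tok = Token(fsid, fid, offset, length)
        self.tokens.append(tok)
        return tok

    def is_valid(self, tok):
        return tok in self.tokens

    def write(self, offset, length):
        # Assumed rule: keep only tokens the write does not overlap.
        end = offset + length
        self.tokens = [t for t in self.tokens
                       if end <= t.offset or offset >= t.offset + t.length]
        self.size = max(self.size, end)

    def set_size(self, new_size):
        # Assumed rule: any size change revokes every token on the file.
        self.size = new_size
        self.tokens = []
```

Reads are deliberately absent from the conflict logic: a token behaves like a shared lock that only writers and resizers can break.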
Controller B
Unified Storage (2C): ODX
Controller A
File System (FS1) File System (FS2)
Disks Pool Spaces
Protocol
Layer
FS Driver
Protocol
Layer
FS Driver
SMB3 Client
Copy \\S11\FS1\foo \\S12\FS2\foo.bar
SMB_IOCTL (FSCTL_OFFLOAD_READ)
FS_IOCTL (FSCTL_OFFLOAD_READ)
T10 XCOPY (CDB 0x83 & 0x84)
Token
Storage Token
FS Token
Protocol Token
Native
PCIe Link
Network
SMB_IOCTL (FSCTL_OFFLOAD_WRITE)
FS_IOCTL (FSCTL_OFFLOAD_WRITE)
T10 XCOPY (CDB 0x83 & 0x84)
[Steps 1 to 9 are numbered on the diagram.]
XCOPY test Results
[Chart: Copy Offload Performance, Without Copy Offload vs. With Copy Offload]
Hyper-V Clone test Results
[Chart: Copy Offload Performance, Without Copy Offload vs. With Copy Offload]
Read Offload Operation
Copy Offload (Packet Trace for VM Cloning)
Offload Read Request
Offload Read Response
fileoffset and length
TokenOffSet = 0x0
TokenFsid= 0x1f89
TokenLen = 0x014E400000
TokenFid = 0x26
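For scale, the token fields in this trace can be read as below. The `TraceToken` class is only a holder invented for the example, and treating TokenLen as a byte count is an assumption; the slide does not state the unit. Under that assumption the token covers 5348 MiB, roughly 5.2 GiB, which would fit a virtual disk image being cloned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceToken:
    """Token fields exactly as labeled in the packet trace."""
    fsid: int
    fid: int
    offset: int
    length: int


# Values copied from the Offload Read trace on this slide.
token = TraceToken(fsid=0x1F89, fid=0x26, offset=0x0, length=0x014E400000)

MIB = 1 << 20
length_mib = token.length // MIB  # whole MiB covered, if length is in bytes
```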
Write Offload Operation
Copy Offload (Packet Trace for VM Cloning)
Offload Write Request
Offload Write Response
TokenOffSet = 0x0
TokenFsid= 0x1f89
TokenLen = 0x014E400000
TokenFid = 0x26
Offset and Length in this Offload Write
Length successfully written
Copy Offload Practice
Some Data Structures and Workflows Design(II)
The filesystem supports Copy Offload, so the protocol layer just passes the token request to the filesystem. (The protocol layer has to convert the FH to the corresponding fsid and fid that the filesystem knows.)
Copy Offload Practice
Some Data Structures and Workflows Design(III)
The filesystem does not support Copy Offload, so the protocol layer generates the token and converts it to a lock for the multi-protocol access and distributed environment.
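The two designs, (II) and (III), differ only in who issues the token. A hedged sketch of that dispatch follows. All class names, the `supports_offload` flag, the handle table, and the token shapes are hypothetical and stand in for whatever the real protocol and filesystem layers use.

```python
import secrets


class OffloadFS:
    """Hypothetical file system that implements copy offload itself."""
    supports_offload = True

    def offload_read(self, fsid, fid, offset, length):
        return ("fs-token", fsid, fid, offset, length)


class PlainFS:
    """Hypothetical file system without copy offload support."""
    supports_offload = False


class ProtocolLayer:
    def __init__(self, fs, handle_table):
        self.fs = fs
        self.handles = handle_table  # FH -> (fsid, fid), an assumed mapping
        self.locks = {}              # protocol-generated token -> locked range

    def offload_read(self, fh, offset, length):
        # Convert the protocol file handle to ids the file system knows.
        fsid, fid = self.handles[fh]
        if self.fs.supports_offload:
            # Design (II): pass the token request straight down.
            return self.fs.offload_read(fsid, fid, offset, length)
        # Design (III): generate the token here and hold it as a lock.
        token = ("proto-token", secrets.token_hex(8))
        self.locks[token] = (fsid, fid, offset, length)
        return token
```

In the second branch the protocol layer owns the token, so it is also the layer that must revoke it when another protocol writes to or resizes the file.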
Thank You Q & A