nvme/tcp standards-based, fault-tolerant clustered storage … · 2020-02-19 · system architect...
TRANSCRIPT
![Page 1: NVMe/TCP Standards-Based, Fault-Tolerant Clustered Storage … · 2020-02-19 · System Architect Lightbits Labs alex@lightbitslabs.com Founded ... Distributed and fault tolerant](https://reader033.vdocuments.mx/reader033/viewer/2022043003/5f81dab7eb6da10c0c76a68f/html5/thumbnails/1.jpg)
NVMe/TCP Standards-Based, Fault-Tolerant Clustered Storage with LightOS
Alex Shpiner, System Architect, Lightbits Labs, alex@lightbitslabs.com
● Founded
● Key milestones
● 80 Employees
● Locations
● Funding
We are hiring!
Optional hardware acceleration for SSD management and data services
High performance, low latency Global Flash Translation Layer with data services
High performance, low latency NVMe/TCP target
NVMe/TCP target
Global FTL with Rich Data Services
Standard TCP/IP network (no RDMA required)
Standard NVMe/TCP client driver
With Application Replication
v1.x
do replicate
No Application Replication
v2.x
With Application Replication No Application Replication
● Storage server level protection (v2.x): storage server failure is handled via LightOS Clustering
● SSD level protection (v1.x): SSD failure is handled via Global FTL Erasure Coding
All clients continue working!
● Inherit storage services from LightOS 1.x
● High performance and low latency
○ Single hop reads
○ Two hop writes (user + replications)
● Standard unmodified clients and network
○ Leveraging standard NVMe 1.4 and NVMe-oF 1.1
○ Transparent failover via multipath with Asymmetric Namespace Access (ANA)
● Distributed and fault tolerant storage servers
○ Automatic volume assignment
○ Failure domains
○ Management
○ Discovery service
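The single hop read and two hop write paths listed above can be pictured with a small toy model. This is only a sketch under stated assumptions: `StorageServer` and `ReplicatedVolume` are hypothetical names, the servers are in-memory dictionaries, and the fan-out from the primary to all secondaries is counted as one hop. It is not LightOS code.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    """Hypothetical in-memory stand-in for one storage server holding a replica."""
    name: str
    blocks: dict = field(default_factory=dict)


@dataclass
class ReplicatedVolume:
    """Toy model: the client talks only to the primary replica."""
    primary: StorageServer
    secondaries: list

    def write(self, lba: int, data: bytes) -> int:
        """Store data on every replica and return the number of network hops."""
        self.primary.blocks[lba] = data      # hop 1: client -> primary (user write)
        for server in self.secondaries:      # hop 2: primary -> secondaries (replication)
            server.blocks[lba] = data
        return 2 if self.secondaries else 1

    def read(self, lba: int):
        """Serve the read from the primary alone and report a single hop."""
        return self.primary.blocks[lba], 1
```

In this model a read never touches a secondary, which is why it stays at one hop, while a write costs the extra replication hop.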
● Multi-replica volumes
● Each replica is stored on a separate storage server
[Figure: LightOS Cluster of storage servers; vol1_replica_1, vol1_replica_2, and vol1_replica_3 are each stored on a different storage server]
● Different groups of storage servers can be impacted by common elements that share a point of failure:
○ Network
○ Power
○ Geographical
● User-defined assignment of servers to specific failure domain groups.
● Configured via labels assigned to servers, reflecting common dependencies.
○ rack_01, rack_02, …
○ power_0, power_1, ...
● Replicas are placed in different failure domains.
[Figure: LightOS Cluster with storage servers grouped under the labels rack_01 through rack_05; vol1_replica_1, vol1_replica_2, and vol1_replica_3 are placed in different failure domains]
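The placement rule above (replicas go to different failure domains, identified by server labels) can be sketched as follows. This is a minimal illustration, not the LightOS placement algorithm: `place_replicas` and the server names are hypothetical, each server carries a single label for simplicity, and the choice within a domain is simply the first server in sorted order.

```python
from collections import defaultdict


def place_replicas(server_labels: dict, replica_count: int) -> list:
    """Pick one server per failure domain until replica_count servers are chosen.

    server_labels maps a server name to its failure-domain label (e.g. "rack_01").
    Raises ValueError when there are fewer domains than requested replicas.
    """
    by_domain = defaultdict(list)
    for server, label in sorted(server_labels.items()):
        by_domain[label].append(server)

    if len(by_domain) < replica_count:
        raise ValueError("not enough failure domains for the requested replicas")

    # Take the first server from each of the first replica_count domains,
    # so no two replicas ever share a label.
    chosen_domains = sorted(by_domain)[:replica_count]
    return [by_domain[domain][0] for domain in chosen_domains]


labels = {
    "server-a": "rack_01",
    "server-b": "rack_01",
    "server-c": "rack_02",
    "server-d": "rack_03",
}
placement = place_replicas(labels, 3)
```

With the labels shown, `server-a` and `server-b` share `rack_01`, so at most one of them can hold a replica of the same volume.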
[Figure: an NVMe/TCP client connected to a LightOS Cluster of three storage servers (Secondary, Secondary, Primary), with the write and read paths marked]
● Temporary failure
○ “Partial rebuild”: only the necessary data is sent
[Figure: an NVMe/TCP client connected to a LightOS Cluster of three storage servers (Secondary, Secondary, Primary), with the write and read paths and a temporary failure marked]
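One way to picture a partial rebuild is a primary that remembers which blocks a secondary missed while it was offline and resends only those. The sketch below is an assumption-laden toy, not the LightOS mechanism: `ReplicaTracker` is a hypothetical name, there is a single secondary, and replicas are plain dictionaries keyed by logical block address.

```python
class ReplicaTracker:
    """Toy model of a primary that tracks what an offline secondary missed."""

    def __init__(self) -> None:
        self.primary = {}
        self.secondary = {}
        self.secondary_online = True
        self.missed_lbas = set()

    def write(self, lba: int, data: bytes) -> None:
        self.primary[lba] = data
        if self.secondary_online:
            self.secondary[lba] = data
        else:
            # Remember only the addresses written during the outage.
            self.missed_lbas.add(lba)

    def partial_rebuild(self) -> int:
        """Bring the secondary back and resend only the missed blocks.

        Returns the number of blocks that had to be sent.
        """
        self.secondary_online = True
        sent = 0
        for lba in sorted(self.missed_lbas):
            self.secondary[lba] = self.primary[lba]
            sent += 1
        self.missed_lbas.clear()
        return sent
```

If 100 blocks exist and only two distinct addresses are written during the outage, the rebuild sends two blocks rather than the whole volume.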
○ Symmetric
○ Asymmetric
● LightOS leverages NVMe ANA for Clustering
○ Failure Handling
[Figure: an NVMe/TCP client connected to a LightOS Cluster of three storage servers (Secondary, Secondary, Primary)]
[Figure: the same cluster (Secondary, Secondary, Primary) with a failure marked]
[Figure: after the failure the roles are Secondary, Primary, Secondary; the primary role has moved to a different storage server]
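On the client side, failover with ANA comes down to preferring a path in the optimized state and skipping inaccessible ones. The sketch below shows that selection in isolation. The state names come from the NVMe ANA specification (which also defines Persistent Loss and Change, omitted here); `select_path`, the server names, and the mapping of primary to optimized and secondaries to inaccessible are assumptions for illustration. In practice this logic lives in the host's NVMe multipath driver, not in application code.

```python
from enum import Enum
from typing import Optional


class AnaState(Enum):
    OPTIMIZED = "optimized"
    NON_OPTIMIZED = "non-optimized"
    INACCESSIBLE = "inaccessible"


# Lower rank is preferred; inaccessible paths are never used for I/O.
_RANK = {AnaState.OPTIMIZED: 0, AnaState.NON_OPTIMIZED: 1}


def select_path(paths: dict) -> Optional[str]:
    """Return the name of the best usable path, or None if no path is usable."""
    usable = [(_RANK[state], name) for name, state in paths.items() if state in _RANK]
    return min(usable)[1] if usable else None


# Before the failure: the third server is primary (assumed to report optimized).
paths = {
    "server-1": AnaState.INACCESSIBLE,
    "server-2": AnaState.INACCESSIBLE,
    "server-3": AnaState.OPTIMIZED,
}
before = select_path(paths)

# After the failure: the second server takes over as primary.
paths["server-3"] = AnaState.INACCESSIBLE
paths["server-2"] = AnaState.OPTIMIZED
after = select_path(paths)
```

The client needs no cluster-specific knowledge here: it only reacts to the ANA state each path reports.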
[Figure: an NVMe/TCP client and a LightOS Cluster of three storage servers, with Cluster Management, Discovery, and API components]
Initial state
Missing
Initial state
Failover
Contact information
○ lsblk, nvme list
● optimized, inaccessible
○ nvme list-subsys <dev>