Rakuten Technology Conference 2014 [D-4]
"The Next Step of LeoFS and Introducing NewDB Project"
Yosuke Hara w/ Paras Patel (Rakuten)
The Next Step
Yosuke Hara - @yosukehara
Researcher at R.I.T. and Tech Lead of LeoFS, with Paras Patel, an engineer of Rakuten DU
1
Overview
Brief Benchmark Report
v1.0 - Multi Data Center Replication
v1.1 - NFS Support
LeoFS Administration at Rakuten
Future Plan
2
Overview
3
LeoFS - Background
[Timeline: data volume growing from Terabyte to Petabyte to Exabyte, 2008-2020]
4
The Lion of Storage Systems
HIGH Availability
HIGH Cost Performance Ratio
HIGH Scalability
LeoFS Non Stop
Velocity: Low Latency, Minimum Resources
Volume: Petabyte / Exabyte
Variety: Photo, Movie, Unstructured Data
3 Vs in 3 HIGHs
5
Metadata Object Storage
Storage Engine/Router
Monitor
GUI Console
( Erlang RPC)
LeoFS - Architecture
Storage(storage cluster)
Manager
( Erlang RPC)
Gateway
( TCP/IP,SNMP )
S3-API / NFS
- Software-Defined Storage
- Satisfies the 3 HIGHs
- Easy Administration
Metadata Object Storage
Storage Engine/Router
Metadata Object Storage
Storage Engine/Router
Clients
Monitoring System
Huge Capacity Storage
6
LeoFS Gateway
7
LeoFS Overview - Gateway
Stateless Proxy + Object Cache
REST-API / S3-API
Uses consistent hashing to decide the primary node
[ Memory Cache, Disc Cache ]
Storage Cluster
Gateway(s)
Clients
HTTP Request and Response
Built-in Object Cache Mechanism
Storage Cluster
Fast HTTP Server - Cowboy
API Handler
Object Cache Mechanism
8
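The gateway's role as a stateless proxy with a built-in object cache can be sketched as an LRU memory cache in front of the storage cluster. This is an illustrative sketch only (the real gateway is Erlang/Cowboy); `fetch_from_storage` is a hypothetical callback standing in for a request to the storage cluster.

```python
from collections import OrderedDict

class ObjectCache:
    """LRU memory cache, a stand-in for the gateway's object cache."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as recently used
            return self.items[key]
        return None

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

def handle_get(cache, key, fetch_from_storage):
    """On a cache miss, fetch from the storage cluster and cache the result."""
    obj = cache.get(key)
    if obj is None:
        obj = fetch_from_storage(key)
        cache.put(key, obj)
    return obj
```

Because the gateway itself keeps no authoritative state, any gateway can serve any request; the cache only reduces traffic to the storage cluster.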
LeoFS Storage
9
Storage (Storage Cluster)
Gateway
LeoFS Overview - Storage
Uses "Consistent Hashing" for data operations in the storage cluster
Choosing Replica Target Node(s)
RING: 2^128 (MD5)
# of replicas = 3
KEY = "bucket/leofs.key"
Hash = md5(Filename)
Secondary-1
Secondary-2
Primary Node
"P2P"
WRITE: Auto Replication
READ: Async Auto-Repair of Inconsistent Objects
10
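The replica placement described above — md5 of the key locates a point on the 2^128 ring, and the primary node plus the next distinct successors hold the copies — can be sketched as follows. This is a simplified illustration, not LeoFS's actual RING implementation (which, e.g., uses virtual nodes).

```python
import hashlib
from bisect import bisect_right

class Ring:
    """Minimal consistent-hash ring over the md5 (2^128) key space."""
    def __init__(self, nodes):
        # place each node on the ring by hashing its name
        self.points = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def replicas(self, key, n=3):
        """Primary node plus n-1 distinct successors, clockwise on the ring."""
        idx = bisect_right(self.points, (self._h(key), ""))
        chosen = []
        i = idx
        while len(chosen) < n:
            node = self.points[i % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen  # [primary, secondary-1, secondary-2]
```

Because every gateway and storage node computes the same ring, replica targets are found without any central lookup — the "P2P" property on the slide.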
Request From Gateway
LeoFS Overview - Storage
...
LeoFS Storage
Replicator
Recoverer
...
Storage Engine
Storage Engine, Metadata + Object Storage
Gateway
Storage consists of Object Storage and Metadata Storage.
It includes a Replicator and a Recoverer for eventual consistency.
Metadata Storage
Object Storage
11
LeoFS Overview - Storage - Data Structure
Metadata Storage
Object Storage
Robust and High Performance
Necessary for GC
<Metadata>
Checksum | KeySize | DataSize | Offset | Version | Time-stamp | Key | CustomMeta Size | File Size
- Checksum: for sync
- Offset / Version / Time-stamp / Key: for retrieving an object

<Needle>
Header (Metadata - fixed length) | Body (variable length) | Footer (8B)
Checksum | KeySize | DataSize | Offset | Version | Time-stamp | Key | User-Meta Size | User-Meta | Actual File | Footer
<Object Container>
Super-block | Needle-1 | Needle-2 | Needle-3 | Needle-4 | Needle-5
Log Structure File Format
12
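The needle layout above (fixed-length header, variable-length body, short footer) can be sketched with Python's struct module. Field widths here are illustrative, not LeoFS's exact on-disk sizes; the point is the append-only, self-describing record that a log-structured container is built from.

```python
import hashlib
import struct
import time

HEADER_FMT = ">16sIIQQQ"  # checksum, key size, data size, offset, version, timestamp

def pack_needle(key: bytes, body: bytes, offset: int, version: int) -> bytes:
    """Build one append-only record: header + key + body + footer."""
    checksum = hashlib.md5(body).digest()          # 16B, used for sync
    header = struct.pack(
        HEADER_FMT,
        checksum, len(key), len(body), offset, version, int(time.time()),
    )
    footer = struct.pack(">Q", 0)                  # 8B footer (terminator)
    return header + key + body + footer

def unpack_needle(buf: bytes):
    """Read one needle back out of an object container."""
    hdr_size = struct.calcsize(HEADER_FMT)
    checksum, ksize, dsize, offset, version, ts = struct.unpack(
        HEADER_FMT, buf[:hdr_size])
    key = buf[hdr_size:hdr_size + ksize]
    body = buf[hdr_size + ksize:hdr_size + ksize + dsize]
    assert hashlib.md5(body).digest() == checksum  # integrity check
    return key, body, offset, version
```

Because records are only ever appended, deletes and overwrites leave dead needles behind — which is why the slide notes that GC (data compaction) is necessary.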
LeoFS Manager
13
Storage Cluster
LeoFS Overview - Manager
Monitor
Operate
RING, Node State
status, suspend, resume, detach, whereis, ...
Gateway(s)
Storage Cluster
Gateway(s)
Manager(s)
Operates LeoFS Gateway and Storage Cluster
"RING Monitor" and "Node State Monitor"
14
Brief Benchmark Report
15
Brief Benchmark Report
1st Case: Group of Value Ranges Storage:5, Gateway:1, Manager:2 R:W = 9:1
2nd Case: Group of Value Ranges Storage:5, Gateway:1, Manager:2 R:W = 8:2
source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r9w1_240min
source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140605/tests/1m_r8w2_120min
16
Brief Benchmark Report
Server Spec - Gateway:
CPU: Intel(R) Xeon(R) X5650 @ 2.67GHz x 2 (12 cores / 24 threads)
Memory: 96GB
Disk: HDD 240GB RAID0
Network: 10G Ethernet

Server Spec - Storage x5:
CPU: Intel(R) Xeon(R) X5650 @ 2.67GHz x 2 (12 cores / 24 threads)
Memory: 96GB
Disk: HDD 240GB RAID0 (System), HDD 2TB RAID0 (Data)
Network: 10G Ethernet
17
Environment:
Network: 10Gbps
OS: CentOS release 6.5 (Final)
Erlang: OTP R16B03-1
LeoFS: v1.0.2
System Consistency Level: [ N:3, W:2, R:1, D:2 ]

Benchmark Configuration:
Duration: 4.0h
R:W: 9:1
# of Concurrent Processes: 64
# of Keys: 100,000
Value Size:
Range (byte)          Percentage
1,024 - 10,240        24%
10,241 - 102,400      30%
102,401 - 819,200     30%
819,201 - 1,572,864   16%
Brief Benchmark Report - 1st Case (R:W=9:1)
18
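The consistency level [ N:3, W:2, R:1, D:2 ] means each object has 3 replicas, a write succeeds once 2 replicas acknowledge, a read needs 1 reply, and a delete needs 2. A minimal sketch of the quorum rule (illustrative only):

```python
def quorum_met(acks: int, required: int) -> bool:
    """An operation succeeds once `required` replicas have acknowledged."""
    return acks >= required

# Consistency level used in the benchmark: N:3, W:2, R:1, D:2
N, W, R, D = 3, 2, 1, 2

# Note: with W + R <= N (2 + 1 <= 3), a read may hit a replica that
# missed the latest write, which is why LeoFS pairs this setting with
# async auto-repair of inconsistent objects (eventual consistency).
```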
Range (byte)          Percentage
1,024 - 10,240        24%
10,241 - 102,400      30%
102,401 - 819,200     30%
819,201 - 1,572,864   16%
Brief Benchmark Report - 1st Case (R:W=9:1)
The data size for 80% of the objects falls between 1KB and 1.5MB
19
source: https://github.com/leo-project/notes/tree/master/leofs/benchmark/leofs/20140601/tests/1m_r9w1_240min
Brief Benchmark Report - 1st Case (R:W=9:1)
[Charts: OPS ~1,500 ops; Latency ~50ms; No Errors]
20
Brief Benchmark Report - 1st Case / Network Traffic
[Chart: rx/tx bytes per second for gateway and storage-1..5 over 14,000s; Gateway ~7.0Gbps of the 10.0Gbps link (60%), Storage 5.0-6.0Gbps]
21
Brief Benchmark Report - 1st Case / Memory and CPU
[Charts: Memory Usage and CPU Load (5min) for gateway and storage-1..5 over 14,000s; CPU load stays around 1.0]
22
Brief Benchmark Report - 2nd Case (R:W=8:2)

Environment:
Network: 10Gbps
OS: CentOS release 6.5 (Final)
Erlang: OTP R16B03-1
LeoFS: v1.0.2
System Consistency Level: [ N:3, W:2, R:1, D:2 ]

Benchmark Configuration:
Duration: 2.0h
R:W: 8:2
# of Concurrent Processes: 64
# of Keys: 100,000
Value Size:
Range (byte)          Percentage
1,024 - 10,240        24%
10,241 - 102,400      30%
102,401 - 819,200     30%
819,201 - 1,572,864   16%
23
Brief Benchmark Report - 2nd Case (R:W=8:2)
[Charts: OPS ~1,000 ops; Latency 65-85ms; No Errors]
24
Compare 1st case with 2nd case
25
Brief Benchmark Report
[Charts: Network traffic (rx/tx bytes per second, gateway and storage-1..5 over 7,000s) for the 1st and 2nd cases; both peak around 6.0-7.0Gbps, with the 2nd case about 0.7Gbps lower]
26
Brief Benchmark Report
[Charts: Disk util% for storage-1..5 over 7,000s; disk utilization in the 2nd case is 1.8x higher than in the 1st case]
27
Brief Benchmark Report
[Charts: CPU Load (5min) for gateway and storage-1..5 over 7,000s; around 1.00 in both cases, with the 2nd case 1.6x higher]
28
Brief Benchmark Report
Conclusion:
- LeoFS kept stable performance throughout the benchmark
- The bottleneck is disk I/O
- The cache mechanism contributed to reducing network traffic between Gateway and Storage
29
LeoFS v1.0
Multi Data Center Replication
30
LeoFS - For Disaster Recovery
Storage cluster
Manager cluster
For Disaster Recovery = High Scalability + High Availability
w/o SPOF and Performance Degradation
31
1. Easy operation to build multiple clusters
2. Asynchronous data replication between clusters
   Stacked data is transferred to remote cluster(s)
3. Eventual consistency
Multi Data Center Replication
Designed to be as simple as possible
32
DC-3DC-2
Storage cluster
Manager cluster
Client
DC-1
Monitors and Replicates each “RING” and “System Configuration”
"Leo Storage Platform"
[# of replicas:1] [# of replicas:1] [# of replicas:3]
"join cluster DC-2 and DC-3"
leo_rpcleo_rpc
Multi Data Center Replication
Executing “Join Cluster” on Manager Console
Preparing the MDC Replication
33
DC-3DC-2
Storage cluster
Manager cluster
Client
Monitors and Replicates each “RING” and “System Configuration”
"Leo Storage Platform"
[# of replicas:1] [# of replicas:1]
Request tothe Target Region
Application(s)
DC-1
[# of replicas:3]
Temporarily stacking objects
- One container's capacity is 32MB*
- When the container is full, it is sent to the remote cluster(s)
* 32MB is the default capacity; a custom value can be set
leo_rpcleo_rpc
Multi Data Center Replication
Stacking objects
34
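The stacking step can be sketched as a buffer that accumulates objects until it reaches the container capacity (32MB by default) and then flushes to the remote cluster. Python for illustration only; `send_to_remote` is a hypothetical callback, and zlib stands in here for the LZ4 compression LeoFS actually applies before transfer.

```python
import zlib

class StackContainer:
    """Accumulates objects; flushes (compressed) when capacity is reached."""
    def __init__(self, send_to_remote, capacity=32 * 1024 * 1024):
        self.capacity = capacity          # 32MB default, configurable
        self.buf = bytearray()
        self.send_to_remote = send_to_remote

    def stack(self, metadata: bytes, body: bytes):
        # stack the object together with its metadata
        self.buf += metadata + body
        if len(self.buf) >= self.capacity:
            self.flush()

    def flush(self):
        if not self.buf:
            return
        # compress before transfer (LeoFS uses LZ4; zlib is a stand-in here)
        self.send_to_remote(zlib.compress(bytes(self.buf)))
        self.buf = bytearray()
```

Batching many small objects into one compressed container keeps the inter-DC link efficient, at the cost of replication being asynchronous (eventual consistency across clusters).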
DC-3DC-2
Storage cluster
Manager cluster
Client
Monitors and Replicates each “RING” and “System Configuration”
"Leo Storage Platform"
DC-1
Stacks an object with its metadata
Compresses it with LZ4
Replicates the object
Request tothe Target Region
Application(s)
leo_rpc
leo_rpcleo_rpc
Multi Data Center Replication
Transferring stacked objects
Stacked objects
35
DC-3DC-2
Storage cluster
Manager cluster
Client
Monitor and Replicate each “RING” and “System Configuration”
"Leo Storage Platform"
Request tothe Target Region
Application(s)
DC-1
1) Receive metadata of stored objects
2) Compare them at the local cluster
3) Fix inconsistent objects
leo_rpcleo_rpc
leo_rpcleo_rpc
Multi Data Center Replication
Investigating stored objects
36
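The three steps above amount to an anti-entropy pass: compare the remote cluster's metadata against local metadata and queue any object that is missing locally or whose checksum differs. A minimal sketch (names illustrative, not LeoFS's actual API):

```python
def find_inconsistent(local_meta: dict, remote_meta: dict) -> list:
    """Compare per-key checksums; return keys that need repair locally.

    local_meta / remote_meta map object key -> checksum.
    """
    to_fix = []
    for key, remote_sum in remote_meta.items():
        local_sum = local_meta.get(key)
        if local_sum != remote_sum:   # missing locally or checksum mismatch
            to_fix.append(key)
    return to_fix
```

Exchanging only metadata keeps this check cheap; full object bodies are transferred just for the keys that actually need fixing.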
LeoFS v1.1
NFS Support
37
PaaS / IaaS
BigData Analysis
Log / Event / Sensor data
BigDataAnalysis Platform
Document / Contents
Photo / Movie
Various Kind and Huge Amount Data
Various Kind Data
Any Services
Log / Event / Sensor data
Offline - NFS / S3-API Online - REST / S3-API
LeoFS v1.1 - NFS Support
38
LeoFS Administrationat Rakuten
Presented by Paras Patel, an engineer of Rakuten DU
39
Storage Platform
File Sharing Service
End-User Servers
Portal Site
Photo Storage
Background Storage of OpenStack
LeoFS Administration at Rakuten
40
Rakuten Services are using LeoFS
41
Project FiFo is an open-source Cloud Management and Orchestration system for SmartOS virtualization environments.
The components of FiFo are written entirely in Erlang which gives the suite excellent stability and fault recovery as it continues maturing to a production quality release.
Project FiFo uses LeoFS packages for SmartOS in their repository.
External Users
42
Storage Platform
43
Storage Platform - Scaling the Storage Platform
(Movie)
Reduce Costs
High Reliability
Easy to Scale
S3-API
44
Used by Various Services
Total Usage: 450TB / 600TB
# of Files: 600 Million
Daily Growth: 100GB
Daily Reqs: 13 Million
Storage Platform - Scaling the Storage Platform
E-Commerce
Blog
Insurance Calendar
Recruiting
Review Photoshare
Portal &Contents
Bookmark
Storage Platform
(Movie)
45
Monitor
GUI Console
( Erlang RPC)
( Erlang RPC) ( TCP/IP,SNMP )
Gateway x 4
Storage x 14
Manager x 2
Requests fromWeb Applications / Browsers
w/HTTP over S3-API
Load Balancer / Cache Servers
Storage Platform - System Layout
Total disk space: 600TB
Number of Files: 600 Million
Access Stats: 800Mbps (MAX), 400Mbps (AVG)
46
Monitor
GUI Console
( Erlang RPC)
( Erlang RPC) ( TCP/IP,SNMP )
Gateway x 4
Storage x 14
Manager x 2
Storage Platform - Monitor
Ganglia Agent
Status Collection (Ganglia)
Status Check (Nagios)
Port + Threshold Check
Send Mail Alert
47
Storage Platform - Spreading Globally
Covering All Services with Multi DC Replication
48
ownCloud (https://owncloud.com/) + ROMA (http://roma-kvs.org/) + LeoFS
49
File Sharing Service - Required Targets
- Reduce Costs
- Handle Confidential Files
- Store Large Files
- Scale Easily
50
Share Docs and Videos with Group Companies
Over 20 Companies, Over 10 Countries
Over 10,000 Users, Over 4,000 Teams
File Sharing Service - Usage
51
LDAP
Monitor
GUI Console
( Erlang RPC)
( Erlang RPC) ( TCP/IP,SNMP )
Manager x 2
Authenticate Users
Manage Configurations
Manage Login Sessions (KVS)
File Sharing Service - System Layout
Web GUI File Browser
52
LeoFS Client Test: https://github.com/leo-project/leofs_client_tests
LeoFS Continuous Automated Testing
AWS SDK for PHP
AWS SDK for Ruby
AWS SDK for Java
Boto (Python)
erlcloud (Erlang)
s3cmd
s3fuse
Continuous Integration with Jenkins
53
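The client tests above all exercise the same S3 round-trip (put, get, delete) against a LeoFS gateway through different SDKs. The shape of such a test can be sketched against an in-memory stand-in for the gateway; `FakeBucket` is purely illustrative — the real test suite at the URL above drives the SDKs listed over HTTP.

```python
class FakeBucket:
    """In-memory stand-in for an S3 bucket behind a LeoFS gateway."""
    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects.get(key)

    def delete(self, key):
        self.objects.pop(key, None)

def s3_roundtrip_test(bucket) -> bool:
    """The put/get/delete cycle every SDK client test exercises."""
    bucket.put("leofs.key", b"hello leofs")
    if bucket.get("leofs.key") != b"hello leofs":
        return False
    bucket.delete("leofs.key")
    return bucket.get("leofs.key") is None
```

Running the same round-trip through each SDK under Jenkins is what verifies LeoFS's S3-API compatibility continuously.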
Cover 25 Countries/Regions
Over 20,000 Users
File Sharing Service - Future Plans
54
Enhancing the Servicesand Empowering the Users through the Cloud Storage
55
LeoFS v2.0
Beyond Storage System,Toward Cloud Storage
56
LeoFS v2.0 - Cloud Storage
Autonomic Operation: Automatic Rebalance, Automatic Data Compaction
QoS: Statistics of internal LeoFS, Watchdog
Request / Traffic Control
Always keeps LeoFS in its best condition
57
LeoFS v2.0 - Cloud Storage
Hybrid Storage
Centralizes huge amounts of various kinds of data in LeoFS
Supports Online and Offline Access
58
REST-API (JSON)
Operate LeoFS
Notifies a message when the number of requests exceeds the threshold
NewDB Proj for LeoFS Insight
LeoFS v2.0 - Cloud Storage
+ Retrieves metrics and stats from NewDB Proj's Agents
- Autonomic Operation
- NFS Support
- Erasure Coding
- NewDB Integration
- OpenStack Integration
and more...
59
Introducing NewDB Project
60
NewDB Proj - Background
We need to get "Value in Data" in realtime
Transaction Analysis Cache
Typical System
NewDB Proj is a Hybrid DB:
it handles atomic data and analyzes data in realtime
+Database Statistics / ML
NewDB Proj
Typical systems still only achieve "semi-realtime" data analysis
61
NewDB Proj - Concept
Analyzing Data in Realtime
Retrieving "Value" from Exponentially Growing Data
High Reliability, High Scalability
Ad-hoc Query, Pluggable QL
ODBC/JDBC, REST, CLI, ...
NewDB Proj
ML-lib Integration
Pluggable ML/Statistics Mechanism
62
Analyzing Data Online
Transaction / Statistics / ML
Handle Event, Sensor and Metrics Data
Clustering Data, Finding Patterns in Data
Calculate and Generate Data, then Grow the Value of Data
Pluggable QL: Ease of Operation, and a QL that Expresses Intuition Directly
NewDB Proj - Concept
SQL
Snapshot, Backup, Restore
Visualizing Data
63
Early 2015
64
Website: leo-project.net
Twitter: @LeoFastStorage
LeoProject
Set Sail for the new Leo land
65