dynamic object routing
TRANSCRIPT
2016 Storage Developer Conference. © Insert Your Company Name. All Rights Reserved.
Dynamic Object Routing
Balaji GanesanBharat BodduCloudian Inc.
About Cloudian
GLOBAL PRESENCE
HQ: San Mateo CA; Offices in US, Japan, China, EMEA
TOP TECHNICAL TALENT
With deep experience in storage, big data & enterprise software
BASED IN SILICON VALLEY
To tackle storage challenges created by exponentially growing data
2011 90 Global
2© 2016 Cloudian, Inc. All rights reserved.
3
Why Object Storage
To the application user, the logical “object” matters,Not how it’s physically stored (e.g., pieces, versions, location).
© 2016 Cloudian, Inc. All rights reserved.
4
Object vs. File vs. Block Storage
Abstraction Level
OBJECTS
FILES
BLOCKS
HTTP (S3)
ApplicationLevel
OS UserLevel
OS KernelLevel
NAS (NFS, CIFS)
SAN (iSCSI)
© 2016 Cloudian, Inc. All rights reserved.
1. Full Amazon S3 API Compatibility, including error codes.
2. Multi-datacenter, peer-to-peer architecture. No single point of failure.
3. Multi-tenant: QoS controls, billing, reporting by user and group.
4. Elastic Capacity: Small start and scale-out as needed.
5. Management/Monitoring Console or REST API
6. Easy to Deploy : Packaged software or Appliance
5
HyperStore System Overview
Object Storage
User/Administrator
Management console
ApplicationS3 over HTTP
© 2016 Cloudian, Inc. All rights reserved.
HyperStore Use Cases
Media Content Store
Data Analytics
Private Cloud
File Distribution & Sharing
Backup Archive
6© 2016 Cloudian, Inc. All rights reserved.
7
Logical Architecture
AdminServer
S3 Server
Authentication
Account & QoS
Reports
Applications
HTTPREST
ManagementConsole
Data Store(Cassandra)
Authentication
User Management
Reports
Data Explorer
Cloudian Mgmt Console
HTTP/S (S3)
HTTP/SWebBrowser
Cloudian HyperStore
Data Store(Replicas)
Data Store(Erasure Coding)
HyperStore® Manager
© 2016 Cloudian, Inc. All rights reserved.
8
High-level System View
DNS Server and/or Load Balancer
Object Storage Cluster
© 2016 Cloudian, Inc. All rights reserved.
Peer-to-peer system = no SPOF Distributed Everything = Data , Metadata, Configuration
9
Distributed & Elastic Geo Cluster
User Defined Location Affinity
DC1
DC2
Add Node <-> Auto Rebalance
Server <-> vNodes <-> Disks
© 2016 Cloudian, Inc. All rights reserved.
ObjectId-MD5
Metadata
DataObjectId-
Timestamp
Objects In HyperStore
• Objects are immutable.
ObjectId-MD5
Metadata
DataObjectId-
Timestamp
10© 2016 Cloudian, Inc. All rights reserved.
Data Partitioning
• Each storage node in the cluster is randomly assigned a set of tokens, which represents the nodes position on the ring.
• Each node stores objects whose tokens are between it’s own token (including) and predecessor node token (excluding).
• This region on the ring is also call Token Range or Virtual Node (vNode)
11© 2016 Cloudian, Inc. All rights reserved.
12
vnodes
Vnodes are mapped to physical disks. Then one disk failure only affects those vnodes.
Max 256 vnodes per physical node. No token management. Tokens randomly assigned.
Increased rebuild speed in case of disk or node failure Allows heterogeneous machines in a cluster
HyperStore Node1
vNode vNode vNode
HyperStore Node2
vNode vNode
HyperStore Node3
vNode vNode vNodevNode vNode vNode
© 2016 Cloudian, Inc. All rights reserved.
Static Mapping Table
disk1
vNode-A1
vNode-A2
disk2
vNode-B1
vNode-B2
disk3
vNode-C1
vNode-C3
13© 2016 Cloudian, Inc. All rights reserved.
vNode-
A1vNode-
A2
vNode-
B1
vNode-
B2
vNode-
C1
disk1 disk2 disk3 disk4 disk5
Problems With Static Mapping
Uneven Disk Usage
14© 2016 Cloudian, Inc. All rights reserved.
vNode-
A1vNode-
A2
vNode-
B1
vNode-
B2,B3
vNode-
C1
disk1 disk2 disk3 disk4 disk5disk4
Problems With Static Mapping
Complex Failure Handling
© 2016 Cloudian, Inc. All rights reserved. 15
• Use a tool to move data from heavily used disk to less used disk.
• Tool needs to be run manually.• Lot of data movement. • Complex to recover from errors if data movement fails.
Initial Solution
16© 2016 Cloudian, Inc. All rights reserved.
• A routing table is used to determine object's storage location.
• ObjectId-MD5 as well as its insertion timestamp is used to determine the object's storage location.
• Each vNode is assigned initially to one of the available disks and a routing table entry is created for that vNode with timestamp 0.
• Periodically run checks on disks usage
Dynamic Object Routing
17© 2016 Cloudian, Inc. All rights reserved.
Dynamic Object Routing
• When a disk utilization is greater than overall average utilization, then some vNode(s) are moved from this disk to less utilized disk(s) and a new entry is added to routing table.
• All new objects that map to the vNode will be stored in new disk location. Existing objects will be accessed from old disk location using the routing table.
• This method avoids unnecessary data movement.
18© 2016 Cloudian, Inc. All rights reserved.
vNode-A1
Time:0,disk:disk1
Time:T1,disk:disk2
vNode-A2
Time:0, disk:disk1
vNode-B1
Time:0,disk:disk2
Time:T1, disk:disk3
vNode-C1
Time:0, disk: disk3
Time:T1, disk: disk4
Time:T, disk: disk3
Routing Table
19© 2016 Cloudian, Inc. All rights reserved.
vNode-
A1vNode-
A2
vNode-
B1
vNode-
B2,B3
vNode-
C1
disk1 disk2 disk3 disk4 disk5
Smart Disk Balancing
20© 2016 Cloudian, Inc. All rights reserved.
vNode-
A1vNode-
A2
vNode-
B1
vNode-
B2,B3
vNode-
C1
disk1 disk2 disk3 disk4 disk5disk4
Disk Failure/Maintenance
© 2016 Cloudian, Inc. All rights reserved. 21
Adding Multiple Nodes
• Cluster size is expanded by adding a new node to it• Objects are moved to newly added node to re-balance data • If cluster size is in the order of PBs, huge time is spent in
data re-balance• If multiple nodes added at once some objects need to be
moved multiple times.• DOR can be used to eliminate redundant object movement
and reduce data re-balance time• It also provides high availability while cluster is transitioning
through changes• It can be used to find the node storing a given object at any
point of time.
22© 2016 Cloudian, Inc. All rights reserved.
A
BC
Token: 0
Token: 400Token: 750
Token: 50
Token Range (0, 400] with Replication Factor 3 is stored on Nodes {B, C, A} at T0
Example Cluster
23© 2016 Cloudian, Inc. All rights reserved.
A
B
Token: 0
Token: 400Token: 750
Token: 50
Token: 200X
Token: 600
Y
Token: 900 Z
C
Cluster State Changes
24© 2016 Cloudian, Inc. All rights reserved.
Token Range (0, 200]
B C A Time T0
X B C Time T1
X B Y Time T2
DOR For Cluster
Time T3X B YX
25© 2016 Cloudian, Inc. All rights reserved.
THANK YOU
CLOUDIAN HYPERSTORE
More Info, free trial, demo, PoC:
● www.cloudian.com
● @CloudianStorage
● www.facebook.com/cloudian.cloudstorage