Cross-Site BigTable using HBase
DESCRIPTION
Speakers: Jingcheng Du and Ramkrishna Vasudevan (Intel)
As HBase continues to expand in application and enterprise or government deployments, there is a growing demand for storing data across geographically distributed data centers for improved availability and disaster recovery. The Cross-Site BigTable extends HBase to make it well suited for such deployments, providing the ability to create and access HBase tables that are partitioned and asynchronously backed up over a number of distributed data centers. This talk reveals how the Cross-Site BigTable manages data access over multiple data centers and removes the data center itself as a single point of failure in geographically distributed HBase deployments.
TRANSCRIPT
![Page 1: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/1.jpg)
Cross-Site Big Table using HBase
Anoop Sam John, Du Jingcheng, Ramkrishna S. Vasudevan
Big Data US Research And Development, Intel
![Page 2: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/2.jpg)
Partitioning Rule
• A rule that parses row keys and helps to map records to different clusters. A ClusterLocator provides this facility, and it is recorded in the central ZK
– PrefixClusterLocator
– SuffixClusterLocator
– …
• An example of PrefixClusterLocator
– If a row key is “clusterA,rowKey1”, then this record belongs to clusterA
Motivation
![Page 3: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/3.jpg)
Motivation
• Growing demand for storing data across geographically distributed data centers.
– Data and data patterns are similar across data centers.
– But the data is private to each data center.
• Improve data availability and disaster recovery.
• An easy way to access this distributed data.
• Manage the hierarchy relationship between data centers (grouping of data centers).
![Page 4: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/4.jpg)
Use Case
![Page 5: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/5.jpg)
Intelligent Transportation System
• Monitors traffic movements, traffic patterns, etc. in every city.
• Data in every data center is private and holds the traffic patterns of that city.
• Hierarchy of departments: National Transportation Department / State Transportation Department / City Transportation Department.
• National/State Transportation Department – virtual node.
• Helps to aggregate results/statistics over all the data centers.
• Easy access and a single point of view of all the data centers.
![Page 6: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/6.jpg)
Agenda
![Page 7: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/7.jpg)
Agenda
• Goals of CSBT[1]
• Architecture of CSBT
– Highly Available Global ZooKeeper Quorum
– Cross-Site Metadata in Global ZooKeeper
– Cluster Locator
– Hierarchy
• Admin Operations on CSBT
• Read/Write operations on CrossSiteHTable
• Data Replication and FailOver
• Future Improvements
[1] – CSBT refers to Cross-Site Big Table
![Page 8: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/8.jpg)
Goals
![Page 9: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/9.jpg)
Goals
• A global view for tables across different data centers.
• Define and manage the hierarchy relationship between data centers and data.
• High availability
• Locality – in terms of geography
– Each data center holds its own data.
![Page 10: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/10.jpg)
Architecture
![Page 11: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/11.jpg)
Architecture
[Figure: CSBT architecture diagram]
![Page 12: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/12.jpg)
Architecture
• A dedicated, distributed ZooKeeper quorum that spans data centers – the Global ZooKeeper
• Table partitioning
– Each data center holds a specific partition of the table
– Every partition of the Cross-site HTable is an HTable itself, bearing the table name “<tableName>_<clustername>”
– The partitioning rule is set by the user at table creation time using Cluster Locators.
• Supports all admin and table operations that are supported on the HTable.
Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services (Source: http://zookeeper.apache.org/)
![Page 13: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/13.jpg)
Architecture
• Data center relationship
– Allows each data center to configure its peer data center (for replication and read failover)
  • Peers for a cluster could be nodes in another cluster also.
  • Master-Master, Master-slave replication etc.
– Uses Asynchronous Replication of Apache HBase
  • Asynchronously writes the WAL entries to the configured peers
– Could define hierarchy for the data centers
![Page 14: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/14.jpg)
Highly Available Global ZooKeeper Quorum
• Dedicated ZooKeeper cluster.
• Split the ZooKeeper quorum across data centers.
• Recommended not to use the ZooKeeper cluster used by the individual HBase setups.
• Leverage the ZooKeeper Observer
– Does not impact the ZooKeeper write performance
– Configure in such a way that ‘reads’ are served locally
– Configure the ZooKeeper quorum as <local observers>,<leader/followers>,<observers in other DCs>
Observers are non-voting members of an ensemble which only hear the results of votes. Observers may be used to talk to an Apache ZooKeeper server from another data center. Clients of the Observer will see fast reads, as all reads are served locally, and writes result in minimal network traffic as the number of messages required in the absence of the vote protocol is smaller (Source: http://zookeeper.apache.org/)
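The quorum ordering recommended on this slide can be sketched as a small helper. This is only an illustration: the function name and the host names are hypothetical, and 2181 is simply ZooKeeper's default client port. How the client then picks a server from the list is outside this sketch.

```python
def build_quorum_string(local_observers, voters, remote_observers, port=2181):
    """Join hosts in the order: local observers, leader/followers, remote observers."""
    ordered = list(local_observers) + list(voters) + list(remote_observers)
    return ",".join(f"{host}:{port}" for host in ordered)


# Hypothetical hosts for a client running in data center A.
quorum = build_quorum_string(
    local_observers=["obs1.dc-a.example"],
    voters=["zk1.example", "zk2.example", "zk3.example"],
    remote_observers=["obs1.dc-b.example"],
)
print(quorum)
```

A client in another data center would call the same helper with its own local observers listed first.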
![Page 15: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/15.jpg)
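Cross-Site Metadata in Global ZooKeeper
[Figure: znode tree rooted at CrossSite, with ‘clusters’ and ‘tables’ subtrees. Example children: cluster1, cluster2, cluster3 and table1, table2, table3. Other znode labels in the figure: address, hierarchy, peers, state, splitkeys, desc, proposed_desc, locator.]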
![Page 16: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/16.jpg)
Cluster Locator
• Sets the data partition logic/rule.
• Helps to locate a specific cluster based on the row key.
• Users are allowed to create their own cluster locators
– PrefixClusterLocator – <clustername>,<row> where “,” is the delimiter
– SuffixClusterLocator – <row>,<xxx>,<yyy>,<clustername> where “,” is the delimiter and the cluster name is always the string that appears after the last occurrence of the delimiter.
• Note that it’s up to the user to specify the cluster name in the row key while doing ‘puts’, based on the cluster locator configured at table creation.
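The two locator rules can be sketched as follows. This is a minimal Python model of the parsing rule only, not the CSBT classes themselves; the method name `cluster_name` and the error handling are assumptions made for illustration.

```python
class PrefixClusterLocator:
    """Cluster name is the text before the first delimiter: <clustername>,<row>."""

    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def cluster_name(self, row_key):
        name, sep, _rest = row_key.partition(self.delimiter)
        if not sep or not name:
            raise ValueError(f"row key has no cluster prefix: {row_key!r}")
        return name


class SuffixClusterLocator:
    """Cluster name is the text after the last delimiter: <row>,...,<clustername>."""

    def __init__(self, delimiter=","):
        self.delimiter = delimiter

    def cluster_name(self, row_key):
        _rest, sep, name = row_key.rpartition(self.delimiter)
        if not sep or not name:
            raise ValueError(f"row key has no cluster suffix: {row_key!r}")
        return name


prefix_result = PrefixClusterLocator().cluster_name("clusterA,rowKey1")
suffix_result = SuffixClusterLocator().cluster_name("row,xxx,yyy,clusterB")
print(prefix_result, suffix_result)
```

Because the cluster name is read straight from the row key, a put whose key lacks the expected prefix or suffix cannot be routed, which is why the sketch raises an error in that case.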
![Page 17: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/17.jpg)
Hierarchy
![Page 18: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/18.jpg)
Hierarchy
• Could define the parent-child relationship for clusters
• A node in the hierarchy could be either a physical cluster or a virtual one. A virtual node may represent a logical grouping of a set of physical clusters
• The hierarchy is used while ‘scan’ing
– If a parent node is specified, all its descendants are also included in the scan
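Resolving a node to the physical clusters it covers might look like the sketch below. The dictionary representation and the function name are assumptions for illustration; the node names reuse the California example from the scan slides, with a hypothetical "USA" parent added.

```python
def physical_clusters(hierarchy, physical, node):
    """Return the physical clusters covered by `node`: itself (if physical) plus descendants."""
    result = []
    stack = [node]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in physical:
            result.append(current)
        # Children of a virtual or physical node, if any.
        stack.extend(hierarchy.get(current, []))
    return sorted(result)


hierarchy = {"USA": ["California"], "California": ["SFO", "LA", "San Diego"]}
physical = {"SFO", "LA", "San Diego"}

california = physical_clusters(hierarchy, physical, "California")
usa = physical_clusters(hierarchy, physical, "USA")
print(california)
```

Virtual nodes contribute no data of their own in this model; they only expand to the physical clusters beneath them.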
![Page 19: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/19.jpg)
Admin operations on Cross-site BigTable
![Page 20: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/20.jpg)
Admin Operations
• Operations are performed using CrossSiteHBaseAdmin
• Extends HBaseAdmin.
![Page 21: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/21.jpg)
Create Peers
• Specifying peers creates the peers under the ‘peers’ node
• The address of each peer is written as data in the peer znodes
[Figure: cluster1 znode with a ‘peers’ child holding cluster2, cluster3 and cluster4]
![Page 22: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/22.jpg)
Create Table
[Figure: CSBTAdmin and Global ZK (Cluster a01->Peer1, Cluster a02->Peer2); Cluster a01 with Table: T1_a01; Cluster a02 with Table: T1_a02; Peer1 with Table: T1_a01 (backup); Peer2 with Table: T1_a02 (backup)]
1. Create the table znode in ZK
2. Create tables in clusters
3. Create table in peers if any
4. Write the table-related data to the table’s znode and update the state in ZK
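The four numbered steps of table creation can be sketched against in-memory stand-ins. Everything here is a model for illustration: plain dictionaries and sets replace ZooKeeper and the HBase clusters, and the state names "CREATING" and "ENABLED" are assumptions, since the slide does not name the states used during creation.

```python
def create_cross_site_table(zk, clusters, peers, table, descriptor):
    """Model of the create flow; `zk`, `clusters` and `peers` are plain dicts."""
    # Step 1: create the table znode in ZK (state name is an assumption).
    zk[table] = {"state": "CREATING"}
    # Step 2: create the partition table "<tableName>_<clustername>" in each cluster.
    for cluster, tables in clusters.items():
        tables.add(f"{table}_{cluster}")
    # Step 3: create the backup table in each cluster's peers, if any.
    for cluster in clusters:
        for peer_tables in peers.get(cluster, {}).values():
            peer_tables.add(f"{table}_{cluster}")
    # Step 4: write the table data to the znode and update the state.
    zk[table].update({"desc": descriptor, "state": "ENABLED"})


zk = {}
clusters = {"a01": set(), "a02": set()}
peers = {"a01": {"Peer1": set()}, "a02": {"Peer2": set()}}
create_cross_site_table(zk, clusters, peers, "T1", {"families": ["cf"]})
print(zk["T1"]["state"], sorted(clusters["a01"]), sorted(peers["a02"]["Peer2"]))
```

Writing the final state only in step 4 means a crash in steps 2 or 3 leaves the znode in its intermediate state, which marks the operation as incomplete.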
![Page 23: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/23.jpg)
Disable Table
[Figure: CSBTAdmin and Global ZK (Cluster a01->Peer1, Cluster a02->Peer2); Cluster a01 with Table: T1_a01; Cluster a02 with Table: T1_a02; Peer1 with Table: T1_a01 (backup); Peer2 with Table: T1_a02 (backup)]
1. Update the state to DISABLING
2. Disable tables in clusters
3. Update the state to DISABLED
• Do NOT disable the tables in the peers. Because replication is asynchronous, disabling a peer may stop the entire replication while some WAL entries have not yet been replicated.
![Page 24: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/24.jpg)
Enable Table
[Figure: CSBTAdmin and Global ZK (Cluster a01->Peer1, Cluster a02->Peer2); Cluster a01 with Table: T1_a01; Cluster a02 with Table: T1_a02; Peer1 with Table: T1_a01 (backup); Peer2 with Table: T1_a02 (backup)]
1. Update the state to ENABLING
2. Enable tables in clusters
3. Handle TableNotDisabledException, as peers are already ENABLED
4. Update the state to ENABLED
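The enable flow, including the tolerated exception from peers that were never disabled, can be modeled as below. `FakeCluster` and `TableNotDisabledError` are hypothetical stand-ins written for this sketch; the latter only mimics the role of HBase's TableNotDisabledException.

```python
class TableNotDisabledError(Exception):
    """Stand-in for HBase's TableNotDisabledException in this model."""


class FakeCluster:
    """In-memory cluster that tracks which tables are enabled."""

    def __init__(self, enabled=()):
        self.enabled = set(enabled)

    def enable_table(self, name):
        if name in self.enabled:
            raise TableNotDisabledError(name)
        self.enabled.add(name)


def enable_cross_site_table(zk_state, table, clusters, peers):
    zk_state[table] = "ENABLING"                      # step 1
    for name, cluster in clusters.items():            # step 2
        cluster.enable_table(f"{table}_{name}")
    for name, peer in peers.items():                  # step 3
        try:
            peer.enable_table(f"{table}_{name}")
        except TableNotDisabledError:
            pass  # peers are not disabled by 'disable table', so this is expected
    zk_state[table] = "ENABLED"                       # step 4


zk_state = {"T1": "DISABLED"}
clusters = {"a01": FakeCluster(), "a02": FakeCluster()}
peers = {"a01": FakeCluster(enabled={"T1_a01"}), "a02": FakeCluster(enabled={"T1_a02"})}
enable_cross_site_table(zk_state, "T1", clusters, peers)
print(zk_state["T1"])
```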
![Page 25: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/25.jpg)
Alter Schema
[Figure: CSBTAdmin and Global ZK (Cluster a01->Peer1, Cluster a02->Peer2); Cluster a01 with Table: T1_a01; Cluster a02 with Table: T1_a02; Peer1 with Table: T1_a01 (backup); Peer2 with Table: T1_a02 (backup)]
1. Write the new HTD to the PROPOSED_DESC znode
2. Update the state to the MODIFYING/ADDING xxx state
3. Alter schema in clusters
4. Add/modify the column in peers by DISABLING; ENABLE after completion. If the table is not present, create it with the new HTD.
5. Update the table’s HTD znode
6. Update the table state to DISABLED
7. Delete the PROPOSED_DESC znode
![Page 26: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/26.jpg)
Delete Table
[Figure: CSBTAdmin and Global ZK (Cluster a01->Peer1, Cluster a02->Peer2); Cluster a01 with Table: T1_a01; Cluster a02 with Table: T1_a02; Peer1 with Table: T1_a01 (backup); Peer2 with Table: T1_a02 (backup)]
1. Update the state to DELETING
2. Delete tables in clusters
3. Disable and delete the tables from the peers
4. Remove the table from ZK
![Page 27: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/27.jpg)
Failure handling
• Failures in create/enable/disable/delete table are handled by using ZK states. On any failure, the entire operation has to be retried.
• A tool helps to detect and auto-correct inconsistencies in the CSBT cluster in terms of table state.
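The retry-the-whole-operation rule can be illustrated with a small model. The helper `run_with_retry` and the simulated failure are hypothetical; the point is only that the intermediate state stays recorded when a step fails, and that rerunning the operation from the start moves it to the final state.

```python
def run_with_retry(operation, attempts=3):
    """Run the whole operation again from the start after any failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
    raise last_error


zk_state = {"T1": "ENABLED"}
calls = {"count": 0}


def disable_t1():
    zk_state["T1"] = "DISABLING"          # intermediate state recorded first
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("simulated cluster failure")
    zk_state["T1"] = "DISABLED"           # reached only when every step succeeded


run_with_retry(disable_t1)
print(zk_state["T1"], calls["count"])
```

This only works if each step is safe to repeat, which is an assumption of the sketch rather than something the slide states.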
![Page 28: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/28.jpg)
Read/Write operations on CrossSiteHTable
![Page 29: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/29.jpg)
Operations using CrossSiteHTable
• Operations like put/get/scan/delete are performed using CrossSiteHTable
• Extends HTable
![Page 30: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/30.jpg)
Get/Put/Delete
• Get/Put/Delete “a01,row1” from table T1
[Figure: CSBTHTable, Global ZK, Cluster a01 (Table: T1_a01), Cluster a02 (Table: T1_a02)]
1. Retrieve the cluster locator for table “T1” (cached)
2. Map “a01,row1” to cluster “a01”
3. Find the address for cluster “a01” (cached)
4. Do get/put/delete(“a01,row1”) on table “T1_a01” in cluster “a01”
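The routing steps for a single-row operation can be sketched as follows. `CrossSiteRouter`, `fetch_locator` and `fetch_address` are hypothetical names standing in for the lookups against the global ZK, and the addresses are made up; the sketch assumes a prefix-style locator.

```python
class CrossSiteRouter:
    """Maps a row key to (cluster address, partition table), caching ZK lookups."""

    def __init__(self, fetch_locator, fetch_address):
        self._fetch_locator = fetch_locator
        self._fetch_address = fetch_address
        self._locators = {}
        self._addresses = {}

    def route(self, table, row_key):
        if table not in self._locators:            # step 1 (cached)
            self._locators[table] = self._fetch_locator(table)
        cluster = self._locators[table](row_key)   # step 2
        if cluster not in self._addresses:         # step 3 (cached)
            self._addresses[cluster] = self._fetch_address(cluster)
        # Step 4 would run the get/put/delete on this table at this address.
        return self._addresses[cluster], f"{table}_{cluster}"


lookups = []


def fetch_locator(table):
    lookups.append(("locator", table))
    return lambda row_key: row_key.split(",", 1)[0]  # prefix rule


def fetch_address(cluster):
    lookups.append(("address", cluster))
    return f"zk.{cluster}.example:2181"


router = CrossSiteRouter(fetch_locator, fetch_address)
first = router.route("T1", "a01,row1")
second = router.route("T1", "a01,row2")
print(first, len(lookups))
```

The second call reuses the cached locator and address, so no further lookups are recorded.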
![Page 31: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/31.jpg)
Scan with Start/Stop row
• New scan APIs added where cluster names could be passed while creating scans
• Scan from table T1 [start – “row1”, end – “row6”], clusters – [cluster a01, cluster a02]
[Figure: CSBTHTable, Global ZK, Cluster a01 (Table: T1_a01), Cluster a02 (Table: T1_a02), Cluster a03 (Table: T1_a03)]
1. Retrieve cluster info for table “T1” (cached)
2. Find the addresses for clusters “a01” and “a02” (cached)
3. Scan from (“a01,row1”) to (“a01,row6”) on table “T1_a01” in cluster “a01”
4. Scan from (“a02,row1”) to (“a02,row6”) on table “T1_a02” in cluster “a02”
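The way one logical scan fans out into per-cluster scans can be sketched like this. The function name and tuple layout are assumptions, and the sketch assumes a prefix-style locator with “,” as delimiter, as in the slide's example.

```python
def per_cluster_scans(table, start_row, stop_row, clusters, delimiter=","):
    """Return (cluster, partition table, start key, stop key) for each requested cluster."""
    return [
        (
            cluster,
            f"{table}_{cluster}",
            f"{cluster}{delimiter}{start_row}",
            f"{cluster}{delimiter}{stop_row}",
        )
        for cluster in clusters
    ]


plans = per_cluster_scans("T1", "row1", "row6", ["a01", "a02"])
for plan in plans:
    print(plan)
```

Cluster a03 is not in the requested list, so no scan is planned for it.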
![Page 32: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/32.jpg)
Scan with Hierarchy
Scan from table T1 [start – “row1”, end – “row6”], clusters – [California]
California – virtual node
SFO, LA, San Diego – physical nodes
![Page 33: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/33.jpg)
Scan
• Uses a merge sort iterator to merge the results from different clusters
[Figure: Client reads from a Merge(sort) Iterator, which draws from Cluster A, Cluster B, … Cluster Z (all clusters)]
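A merge-sort iterator over per-cluster results can be sketched with Python's standard `heapq.merge`. It assumes each cluster returns its rows already sorted by the merge key; the choice of the full row key as that key, and the sample rows, are assumptions for illustration.

```python
import heapq


def merged_scan(cluster_results, key=lambda row: row[0]):
    """Lazily merge already-sorted per-cluster results into one sorted stream."""
    return heapq.merge(*cluster_results, key=key)


# Hypothetical (row key, value) results, each list sorted by row key.
cluster_a = [("row1,a", 1), ("row3,a", 3)]
cluster_b = [("row2,b", 2), ("row4,b", 4)]

merged_keys = [row_key for row_key, _value in merged_scan([cluster_a, cluster_b])]
print(merged_keys)
```

The merge is lazy, so the client pulls only as many rows from each cluster as it actually consumes.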
![Page 34: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/34.jpg)
Operations on CSBT
• The admin operations have shell and Thrift support.
• Also supports MapReduce for operations on CrossSiteBigTable.
![Page 35: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/35.jpg)
Data Replication and FailOver
![Page 36: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/36.jpg)
Data Center Relationship
• Allows data centers to add peers
• Apache HBase replication
– Asynchronous data replication
– Customized replication sink for CSBT
• Read-only failover
– Automatically redirects reads to the peer data center
• Existing data is not replicated when a peer is added dynamically.
![Page 37: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/37.jpg)
Data Replication
[Figure: CSBTHTable issues puts to clusters “a01”, “a02” and “a03”. Cluster “a01” holds T1_a01 and T1_a02’ (backup); cluster “a02” holds T1_a02 and T1_a03’ (backup); cluster “a03” holds T1_a03 and T1_a01’ (backup). Each table is replicated to its backup copy.]
![Page 38: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/38.jpg)
Read-only Failover
[Figure: Cluster “a01” holds T1_a01 and T1_a02’ (backup); cluster “a02” holds T1_a02 and T1_a03’ (backup); cluster “a03” holds T1_a03 and T1_a01’ (backup). A get/scan from CSBTHTable fails over to the backup DC.]
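Read-only failover can be modeled as trying the primary cluster first and falling back to its backups for reads only. `ClusterUnavailable`, `get_with_failover` and the reader functions are hypothetical names for this sketch; how CSBT detects an unavailable cluster is not described in the slides.

```python
class ClusterUnavailable(Exception):
    """Raised by a reader in this model when its cluster cannot be reached."""


def get_with_failover(row_key, primary, backups):
    """Read from the primary; on failure, try each backup copy in order."""
    try:
        return primary(row_key)
    except ClusterUnavailable as error:
        for backup in backups:
            try:
                return backup(row_key)
            except ClusterUnavailable:
                continue
        raise error  # no backup could serve the read


def read_from_a01(row_key):
    raise ClusterUnavailable("a01 is down")


def read_from_a01_backup(row_key):
    # The backup is filled by asynchronous replication, so it may lag the primary.
    return {"a01,row1": "value-from-backup"}[row_key]


result = get_with_failover("a01,row1", read_from_a01, [read_from_a01_backup])
print(result)
```

Writes are deliberately not routed through this path, matching the read-only nature of the failover.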
![Page 39: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/39.jpg)
Future improvements
![Page 40: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/40.jpg)
Future improvements
• Security – CSBT security and how user/group authentications interact
• MR improvement
• Full-fledged CSBT HBCK.
Currently the MR tasks run in one cluster and all the result computation happens in one cluster. We could improve this by dispatching the tasks to each cluster and then collecting the results from them.
![Page 41: Cross-Site BigTable using HBase](https://reader035.vdocuments.mx/reader035/viewer/2022062703/554f73c9b4c905bb178b534f/html5/thumbnails/41.jpg)
Q & A