Creating an HBase Cluster and Replication on AWS


Upload: pinkpantherprem

Post on 15-Sep-2015


1 Setting up Amazon EC2 Instances
Create two clusters in the same region, with 3 nodes in one cluster and 3 nodes in the other, with a minimum volume of 8 GB.

1.1 Launch Instance
Log in to Amazon Web Services, click My Account, and navigate to the Amazon EC2 Console.

1.2 Select AMI
Select the Ubuntu Precise 12.04 Server 64-bit OS.

1.3 Select Instance Type
Select the `Instance Type` as `m3.medium`.

1.4 Configure Number of Instances
Provide the instance details, shutdown behavior, and availability zone.

1.5 Add Storage
Use the default options on this screen.

1.6 Instance Description
Provide the instance name and description.

1.7 Define a Security Group
It is very important to configure the EC2 firewall correctly. On the Configure Firewall page, choose Create a new Security Group and authorize all the required ports.

1.8 Review and Launch Instance
Check the instance details and click Launch.

1.9 Launch Instance and Create Key Pair
Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. Public-key cryptography uses a public key to encrypt a piece of data, such as a password; the recipient then uses the private key to decrypt the data. The public and private keys are known as a key pair.

1.10 Define a Security Group
Create a new security group, and modify it with the security rules.

1.11 Launching Instances
Once you click Launch Instance, 6 instances should be launched in the pending state.

Once the instances are in the running state, rename them as follows: NameNode, Standby1, Standby2, Master, Slave1, Slave2.

2 Setting up Client Access to Amazon Instances
Create a new key pair, name it Clusterkey, and download the key pair (.pem) file to your local machine. Click Launch Instance.

2.1 Generating Private Key
Launch the PuTTYgen client and import the key pair that was created during the launch instance step (Clusterkey.pem): navigate to Conversions and choose Import Key.

Click Generate.

Save the private key by clicking Save Private Key, click Yes, and leave the passphrase empty.

2.2 Connect to Amazon Instance
Launch the PuTTY client and load the .ppk file. Repeat this for the slave nodes.

2.3 Set up WinSCP Access to EC2 Instances

WinSCP is a handy utility for securely transferring files from your Windows machine to Amazon EC2. For User name, enter the default user name for your AMI; for Amazon Ubuntu AMIs, the user name is ubuntu. For Private key, enter the path to your private key, or click the "..." button to browse for the file. Click Login to connect, and click Yes to add the host fingerprint to the host cache.

Select the Clusterkey.pem file and drag it to the right pane.

Repeat this for slave nodes.

3 Setup Password-less SSH on Servers

The master server remotely starts services on the slave nodes, which requires password-less access to the slave servers. The AWS Ubuntu server comes with the OpenSSH server pre-installed. The public part of the key loaded into the agent must be placed on the target system in ~/.ssh/authorized_keys; the AWS server creation process has already taken care of this.

Now add the AWS EC2 key pair identity Clusterkey.pem to the SSH profile. This uses the following SSH utilities:

ssh-agent is a background program that handles passphrases for SSH private keys.
ssh-add prompts the user for a private key passphrase and adds the key to the list maintained by ssh-agent. Once a key has been added to ssh-agent, you are not asked to provide it again when using SSH or SCP to connect to hosts that have your public key.

Amazon EC2 has already taken care of authorized_keys on the master server. Execute the following commands to allow password-less SSH access to the slave servers.

Steps:
In a command-line shell, change to the directory that holds the private key file you created when you launched the instance. Use the chmod command to make sure your private key file isn't publicly viewable. For example, if the name of your private key file is Clusterkey.pem, you would use the following command:

chmod 400 Clusterkey.pem

Use the ssh command to connect to the instance. You'll specify the private key (.pem) file and username@public_dns_name. For Amazon Ubuntu, the default user name is ubuntu. For RHEL5, the user name is often root but might be ec2-user. For SUSE Linux, the user name is root. Otherwise, check with your AMI provider.

ssh -i Clusterkey.pem ubuntu@public_dns_name

You'll see a response like the following.

The authenticity of host 'ec2-198-51-100-1.compute-1.amazonaws.com (10.254.142.33)' can't be established.
RSA key fingerprint is 1f:51:ae:28:bf:89:e9:d8:1f:25:5d:37:2d:7d:b8:ca:9f:f5:f1:6f.
Are you sure you want to continue connecting (yes/no)?

(Optional) If you've launched a public AMI, verify that the fingerprint in the security alert matches the fingerprint that you obtained for the instance. If these fingerprints don't match, someone might be attempting a "man-in-the-middle" attack. If they match, continue to the next step.

Enter yes.

You'll see a response like the following.

Warning: Permanently added 'ec2-54-241-10-95.compute-1.amazonaws.com' (RSA) to the list of known hosts.
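The checks in this section can be sketched as two small shell helpers. This is only an illustration: the function names key_is_private and check_slaves, and the SSH override variable, are assumptions and not part of the original walkthrough. The first helper confirms that the key file has no group or other permissions (the effect of chmod 400); the second runs hostname on each slave over SSH in batch mode, so a password prompt counts as a failure.

```shell
# Sketch only: key_is_private and check_slaves are illustrative helper names.
# SSH may be overridden with a stub to preview the commands without connecting.
SSH="${SSH:-ssh}"

key_is_private() {
  # Characters 5-10 of the ls -l mode string are the group and other bits;
  # after chmod 400 they are all "-".
  [ "$(ls -ld "$1" | cut -c5-10)" = "------" ]
}

check_slaves() {
  # Run `hostname` on every host passed as an argument. BatchMode=yes makes
  # ssh fail instead of prompting, so the loop stops at the first host that
  # still asks for a password.
  for host in "$@"; do
    "$SSH" -o BatchMode=yes "ubuntu@$host" hostname || return 1
  done
}
```

With the key loaded into the agent (for example `eval "$(ssh-agent -s)"` followed by `ssh-add Clusterkey.pem`), calling `check_slaves` with the slave host names should print each host name without any prompt.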


4 Download the Cloudera Manager 4.5 installer and execute it on the remote instance:

$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin

Click Yes.

Note down the URL http://localhost:7180/; it is used to open the Cloudera Manager console in a browser.

4.2 Installing a CDH Cluster with Cloud Express Wizard
After you log in, Cloudera Manager detects that it is running on EC2 and greets you with the welcome screen of the new wizard. There is a warning that the instances started by this installer are instance store-based, which implies that stopping or terminating these instances results in losing all data stored on them. Remember to back up important data from the cluster before terminating the instances!

Default username: admin
Default password: admin

Select Cloudera Enterprise Trial and click Next.

Click Launch the Classic Wizard.

Click Continue.

Enter the internal IPs of each node in the clusters.

Select the package, version, and release.

Log in as the ubuntu user, click Browse to upload the .pem file, and click Continue.

The installation progress is shown here.

If there are no configuration issues, the installation will complete successfully.

Click Continue.

Choose whichever CDH services are required, and click Inspect Assignments.

Assign the appropriate services and their roles to the required hosts.

Click Test Connection.

Click Continue.

The cluster services start here.

Check the health status and configuration issues; the cluster should show good health.

The recommended minimum Java heap size is 1 GB.

HBase Replication:

Step 1: Enable replication in Cloudera Manager.

Restart HBase.

Step 2: Add the following property to HBase's configuration file (hbase-site.xml) to enable replication on the master cluster:

hadoop@master1$ vi $HBASE_HOME/conf/hbase-site.xml

<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
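Before restarting HBase, it can help to confirm that the property really is present in the file on each server. A minimal sketch follows, assuming the name and value elements of the property sit next to each other; the function name replication_enabled is hypothetical, and a real XML parser would be more robust than this text match.

```shell
replication_enabled() {
  # $1 = path to hbase-site.xml.
  # Flatten the file to one line so <name> and <value> can be matched together,
  # then look for hbase.replication immediately followed by the value true.
  tr -d '\r\n' < "$1" |
    grep -Eq '<name>[[:space:]]*hbase\.replication[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*true[[:space:]]*</value>'
}
```

For example, `replication_enabled "$HBASE_HOME/conf/hbase-site.xml" && echo "replication enabled"` prints the message only when the property is set to true.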

Sync the change to all the servers, including the client nodes in the cluster, and restart HBase. Repeat this on the slave node.

Step 3: Create the table with REPLICATION_SCOPE set to 1 on its column family:

hbase(main):010:0> create 'emp', { NAME => 'Details', REPLICATION_SCOPE => 1 }
0 row(s) in 1.1070 seconds
=> Hbase::Table - emp

If you are using an existing table, disable it and alter it to support replication:

hbase(main):011:0> disable 'emp'
0 row(s) in 1.2170 seconds
hbase(main):012:0> alter 'emp', NAME => 'Details', REPLICATION_SCOPE => '1'
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.5200 seconds

hbase(main):013:0> enable 'emp'
0 row(s) in 1.1860 seconds

Execute steps 2 to 3 on the peer (slave) cluster as well. This includes enabling replication, restarting HBase, and creating an identical copy of the table.

Step 4: Start replication on the master cluster and insert some data:

hbase(main):014:0> start_replication
0 row(s) in 0.1210 seconds
hbase(main):016:0> put 'emp', 'row1', 'Details:name', 'devaraj'
0 row(s) in 0.0180 seconds
hbase(main):017:0> put 'emp', 'row1', 'Details:Eid', '1009'
0 row(s) in 0.0130 seconds

hbase(main):019:0> put 'emp', 'row1', 'Details:mobile', '90000101011'
0 row(s) in 0.0140 seconds
hbase(main):021:0> put 'emp', 'row1', 'Details:Year', '2013'
0 row(s) in 0.0110 seconds
hbase(main):022:0> put 'emp', 'row2', 'Details:Name', 'Prabu'

Step 5: Check whether the peers are enabled:

hbase(main):001:0> list_peers
PEER_ID CLUSTER_KEY STATE
1 ip-10-202-169-141.us-west-1.compute.internal:2181:/hbase ENABLED
2 ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase ENABLED
3 ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase ENABLED

Peers are added with add_peer:

hbase(main):002:0> add_peer '2', 'ip-10-190-147-97.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0290 seconds

hbase(main):003:0> add_peer '3', 'ip-10-249-0-249.us-west-1.compute.internal:2181:/hbase'
0 row(s) in 0.0700 seconds
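The cluster key passed to add_peer has three colon-separated parts: the peer cluster's ZooKeeper quorum, the ZooKeeper client port, and the parent znode. The sketch below assembles that string and the matching add_peer line. The helper names cluster_key and add_peer_cmd are hypothetical, and the defaults 2181 and /hbase are taken from the keys shown above.

```shell
cluster_key() {
  # $1 = ZooKeeper quorum host, $2 = client port, $3 = parent znode.
  # Port and znode default to the values used in the peers listed above.
  printf '%s:%s:%s\n' "$1" "${2:-2181}" "${3:-/hbase}"
}

add_peer_cmd() {
  # $1 = peer ID; the remaining arguments are handed to cluster_key.
  id="$1"
  shift
  printf "add_peer '%s', '%s'\n" "$id" "$(cluster_key "$@")"
}
```

For example, `add_peer_cmd 2 ip-10-190-147-97.us-west-1.compute.internal` prints the same add_peer line that was entered for peer 2, ready to paste into the HBase shell.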

Step 6: Connect to the HBase shell on the peer cluster and scan the table to see whether the data has been replicated:

$HBASE_HOME/bin/hbase shell

hbase> scan 'emp'
ROW COLUMN+CELL
row1 column=Details:name, timestamp=1401702464224, value=Devaraj
row1 column=Details:Eid, timestamp=1401703326645, value=1010
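When the scan output is long, checking for one expected cell by eye is error-prone. As a rough sketch, the helper below reads scan output on standard input and succeeds only if a given row, column, and value appear on one line. The name has_cell is hypothetical, and the arguments are used as regular-expression text, so it suits simple keys and values like the ones in this example.

```shell
has_cell() {
  # $1 = row key, $2 = column as family:qualifier, $3 = expected value.
  # Matches one line of `scan` output of the form:
  #   row1 column=Details:name, timestamp=1401702464224, value=Devaraj
  grep -Eq "^[[:space:]]*$1[[:space:]]+column=$2,[[:space:]]*timestamp=[0-9]+,[[:space:]]*value=$3[[:space:]]*\$"
}
```

Assuming the HBase shell accepts commands on standard input, a check on the peer cluster could look like `echo "scan 'emp'" | $HBASE_HOME/bin/hbase shell | has_cell row1 Details:name Devaraj`.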

To verify the replicated data, run the verifyrep job with the peer ID and the table name:

$HADOOP_HOME/bin/hadoop jar $HBASE_HOME/hbase-0.92.1.jar verifyrep 1 emp

Step 7: Stop the replication on the master cluster by running the following command:

hbase> stop_replication

Step 8: Remove the replication peer from the master cluster by using the following command:

hbase> remove_peer '1'