003 Admin Features and Clients
Posted on 27-May-2015
Scott Miao 2012/7/12
HBase Admin API & Available Clients
1
Agenda
Course Credit
HBase Admin APIs
HTableDescriptor
HColumnDescriptor
HBaseAdmin
Available Clients
Interactive Clients
Batch Clients
Shell
Web-based UI
2
Course Credit
Show up: 30 points
Ask a question: 5 points each
Hands-on: 40 points
70 points are required to pass this course
Course credit is calculated once for each finished course
The course credit will be sent to you and your supervisor by
3
Hadoop RPC framework
Writable interface
void write(DataOutput out) throws IOException;
Serializes the object's data and sends it to the remote end
void readFields(DataInput in) throws IOException;
Creates a new instance and deserializes the remote data for subsequent operations
Parameterless constructor
Hadoop instantiates an empty object
Then calls the readFields() method to deserialize the remote data
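The contract above can be sketched as a minimal custom Writable. This is illustrative, not from the course materials: the class and field names are invented, and the Hadoop libraries are assumed to be on the classpath.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical value object sent over Hadoop RPC.
public class UserScore implements Writable {

  private String user;
  private int score;

  // Parameterless constructor: Hadoop creates an empty instance,
  // then calls readFields() to populate it from the wire.
  public UserScore() {}

  public UserScore(String user, int score) {
    this.user = user;
    this.score = score;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize field by field; the remote side must read in the same order.
    out.writeUTF(user);
    out.writeInt(score);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Deserialize in exactly the order written by write().
    this.user = in.readUTF();
    this.score = in.readInt();
  }
}
```

The field order in write() and readFields() must match exactly, since the wire format carries no field names.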
4
HTableDescriptor
Constructor
HTableDescriptor();
HTableDescriptor(String name);
HTableDescriptor(byte[] name);
HTableDescriptor(HTableDescriptor desc);
ch05/admin.CreateTableExample
Can be used to fine-tune the table’s performance
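In the spirit of CreateTableExample, a minimal table-creation sketch with the 0.9x-era client API; the table and family names are illustrative, and a reachable cluster is assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: build a table descriptor, attach one column family, create the table.
public class CreateTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    admin.createTable(desc);   // blocks until the table is online
    admin.close();
  }
}
```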
5
HTableDescriptor – Logical vs. physical views
6
HTableDescriptor - Properties

Name: specifies the table name
byte[] getName();
String getNameAsString();
void setName(byte[] name);
Column families: specify the column families
void addFamily(HColumnDescriptor family);
boolean hasFamily(byte[] c);
HColumnDescriptor[] getColumnFamilies();
HColumnDescriptor getFamily(byte[] column);
HColumnDescriptor removeFamily(byte[] column);
Maximum file size: specifies the maximum size a region within the table can grow to
long getMaxFileSize();
void setMaxFileSize(long maxFileSize);
It is really about the maximum size of each store, so a better name would be maxStoreSize. By default it is 256 MB; a larger value may be required when you have a lot of data.
HTableDescriptor - Properties (cont.)

Read-only: by default, all tables are writable. If this flag is set to true, you can only read from the table and not modify it at all
boolean isReadOnly();
void setReadOnly(boolean readOnly);
Memstore flush size: the size of the in-memory store that buffers values before writing them to disk as a new storage file; the default is 64 MB
long getMemStoreFlushSize();
void setMemStoreFlushSize(long memstoreFlushSize);
Deferred log flush: defers saving write-ahead-log entries to disk; set to false by default
synchronized boolean isDeferredLogFlush();
void setDeferredLogFlush(boolean isDeferredLogFlush);
Miscellaneous options: key/value pairs stored with the table definition that can be retrieved when necessary
byte[] getValue(byte[] key);
String getValue(String key);
Map<ImmutableBytesWritable, ImmutableBytesWritable> getValues();
void setValue(byte[] key, byte[] value);
void setValue(String key, String value);
void remove(byte[] key);
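The setters above can be combined when building a descriptor. A sketch with the 0.9x-era API; the values are illustrative, not tuning recommendations.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

// Sketch: tune table-level properties on a descriptor before createTable().
public class TablePropertiesSketch {
  public static void main(String[] args) {
    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    desc.setMaxFileSize(512L * 1024 * 1024);        // grow regions to ~512 MB per store
    desc.setMemStoreFlushSize(128L * 1024 * 1024);  // flush the memstore at 128 MB
    desc.setReadOnly(false);                        // default: table is writable
    desc.setValue("owner", "scott");                // miscellaneous key/value metadata

    System.out.println(desc);
  }
}
```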
HColumnDescriptor
A more appropriate name would be HColumnFamilyDescriptor
The family name must be printable
You cannot simply rename them later
Constructors
HColumnDescriptor();
HColumnDescriptor(String familyName);
HColumnDescriptor(byte[] familyName);
HColumnDescriptor(HColumnDescriptor desc);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
boolean inMemory, boolean blockCacheEnabled, int timeToLive,
String bloomFilter);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
boolean inMemory, boolean blockCacheEnabled, int blocksize,
int timeToLive, String bloomFilter, int scope);
HColumnDescriptor – Column families vs. store files
10
HColumnDescriptor – Properties

Name: specifies the column family name. A column family cannot be renamed; create a new family with the desired name and copy the data over using the API
byte[] getName();
String getNameAsString();
Maximum versions: predicate deletion; how many versions of each value you want to keep. The default value is 3
int getMaxVersions();
void setMaxVersions(int maxVersions);
Compression: HBase has pluggable compression algorithm support. The default value is NONE
11
HColumnDescriptor – Properties (cont.)

Block size: all store files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB
synchronized int getBlocksize();
void setBlocksize(int s);
Not to be confused with the HDFS block size, which is 64 MB by default
Block cache: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true
boolean isBlockCacheEnabled();
void setBlockCacheEnabled(boolean blockCacheEnabled);
If your use case only ever has sequential reads on a particular column family, it is advisable to disable it
Time-to-live (TTL): predicate deletion; a threshold based on the timestamp of a value. The internal housekeeping automatically checks whether a value exceeds its TTL
int getTimeToLive();
void setTimeToLive(int timeToLive);
By default, values are kept forever (set to Integer.MAX_VALUE)
HColumnDescriptor – Properties (cont.)

In-memory: hints HBase to keep entire blocks of this family in the block cache for efficient access to its values. The in-memory flag defaults to false
boolean isInMemory();
void setInMemory(boolean inMemory);
Good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast
Bloom filter: allows you to improve lookup times, given you have a specific access pattern. Since they add overhead in terms of storage and memory, they are turned off by default
Replication scope: enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0 (disabled)
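The family-level properties above can be set the same way as the table-level ones. A sketch with the 0.9x-era API; the family name and values are illustrative.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch: tune column-family properties on a descriptor.
public class FamilyPropertiesSketch {
  public static void main(String[] args) {
    HColumnDescriptor family = new HColumnDescriptor("colfam1");

    family.setMaxVersions(1);                // keep only the latest version of each value
    family.setBlocksize(32 * 1024);          // 32 KB blocks, smaller for random reads
    family.setTimeToLive(7 * 24 * 60 * 60);  // expire values after one week (in seconds)
    family.setInMemory(true);                // favor keeping this family's blocks cached
    family.setBlockCacheEnabled(true);       // the default; shown here for contrast

    System.out.println(family);
  }
}
```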
13
HBaseAdmin
Provides DDL-like operations, as in an RDBMS
Create tables with specific column families
Check for table existence
Alter table and column family definitions
Drop tables
And more…
14
HBaseAdmin – Basic Operations
boolean isMasterRunning()
HConnection getConnection()
Configuration getConfiguration()
close()
15
HBaseAdmin – Table Operations
Table-related admin API
They are asynchronous in nature
createTable() vs. createTableAsync(), etc.
Create table
ch05/admin.CreateTableExample
ch05/admin.CreateTableWithRegionsExample
numRegions must be at least 3; otherwise, the call will return with an exception
This ensures that you end up with at least a minimum set of regions
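In the spirit of CreateTableWithRegionsExample, a pre-split creation sketch with the 0.9x-era API; the table/family names and key range are illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: create a table pre-split into a fixed number of regions.
public class CreateTableWithRegionsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    // Split the key range into 10 regions; numRegions must be >= 3.
    admin.createTable(desc,
        Bytes.toBytes("row-000"), Bytes.toBytes("row-999"), 10);
    admin.close();
  }
}
```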
16
HBaseAdmin – Table Operations
Does the table exist?
ch05/admin.ListTablesExample
You should use existing table names
Otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
Delete table
ch05/admin.TableOperationsExample
Disabling a table can potentially take a very long time, up to several minutes
Depending on how much data is residing in the server's memory and not yet persisted to disk
Undeploying a region requires all of its data to be written to disk first
isTableAvailable() vs. isTableEnabled()/isTableDisabled()
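The existence check and the disable-then-delete sequence look like this with the 0.9x-era API; the table name is illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: a table must be disabled before it can be deleted.
public class DeleteTableSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    if (admin.tableExists("testtable")) {
      // May take minutes: every region must flush its memstore and close.
      admin.disableTable("testtable");
      admin.deleteTable("testtable");
    }
    admin.close();
  }
}
```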
17
HBaseAdmin – Table Operations
Modify table
ch05/admin.ModifyTableExample
HTableDescriptor.equals()
Compares the current with the specified instance
Returns true if they match in all properties
Including the contained column families and their respective settings
18
HBaseAdmin – Schema Operations
Besides the modifyTable() call, there are dedicated methods provided by HBaseAdmin
Make sure the table to be modified is disabled first
All of these calls are asynchronous
void addColumn(String tableName, HColumnDescriptor column)
void addColumn(byte[] tableName, HColumnDescriptor column)
void deleteColumn(String tableName, String columnName)
void deleteColumn(byte[] tableName, byte[] columnName)
void modifyColumn(String tableName, HColumnDescriptor descriptor)
void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
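A sketch of the addColumn() variant, with the disable/enable steps around it (0.9x-era API); the table and family names are illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: add a column family to an existing table.
public class AddColumnSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    admin.disableTable("testtable");  // schema changes require a disabled table
    admin.addColumn("testtable", new HColumnDescriptor("colfam2"));
    admin.enableTable("testtable");   // bring the table back online
    admin.close();
  }
}
```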
19
HBaseAdmin – Cluster Operations

static void checkHBaseAvailable(Configuration conf)
Checks whether a client application can communicate with the remote HBase cluster; either silently succeeds or throws an error
ClusterStatus getClusterStatus()
Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status
void closeRegion(String regionname, String hostAndPort)
void closeRegion(byte[] regionname, String hostAndPort)
Close regions that have previously been deployed to region servers. These bypass any master notification: the region is closed directly by the region server, unseen by the master node
void flush(String tableNameOrRegionName)
void flush(byte[] tableNameOrRegionName)
Tell the MemStore instances of the region or table to flush their cached modifications to disk. Otherwise, the data is written out once the memstore flush size is reached
These are for advanced users, so please check these APIs in the documentation and handle with care
20
HBaseAdmin – Cluster Operations (cont.)

void compact(String tableNameOrRegionName)
void compact(byte[] tableNameOrRegionName)
Minor compaction. Compactions can potentially take a long time to complete; they are executed in the background by the server hosting the named region, or by all servers hosting any region of the given table
void majorCompact(String tableNameOrRegionName)
void majorCompact(byte[] tableNameOrRegionName)
Major compaction
void split(String tableNameOrRegionName)
void split(byte[] tableNameOrRegionName)
…
These calls allow you to split a specific region or table
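The maintenance calls above can be triggered manually like this (0.9x-era API); they are asynchronous and normally left to HBase's own housekeeping, so this is a sketch for controlled experiments, with an illustrative table name.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: manually trigger flush, compactions, and a split for one table.
public class MaintenanceSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    admin.flush("testtable");         // write memstores out to disk
    admin.compact("testtable");       // queue a minor compaction
    admin.majorCompact("testtable");  // queue a major compaction
    admin.split("testtable");         // ask every region of the table to split

    admin.close();  // the calls above return immediately; work continues server-side
  }
}
```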
21
HBaseAdmin – Cluster Operations (cont.)

void assign(byte[] regionName, boolean force)
void unassign(byte[] regionName, boolean force)
When a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls
void move(byte[] encodedRegionName, byte[] destServerName)
Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random
boolean balanceSwitch(boolean b)
boolean balancer()
balanceSwitch() allows you to switch the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer
void shutdown()
void stopMaster()
void stopRegionServer(String hostnamePort)
Shut down the entire cluster, stop the master server, or stop a particular region server only. Once invoked, the affected servers are stopped immediately; there is no delay and no way to revert the process
HBaseAdmin – Cluster Status Information
You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
Related Classes
ClusterStatus
ServerName => HServerInfo
HServerLoad
RegionLoad
ch05/admin.ClusterStatusExample
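In the spirit of ClusterStatusExample, a sketch of reading the cluster status (0.9x-era API); a running cluster is assumed.

```java
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: pull a ClusterStatus snapshot and print a few summary fields.
public class ClusterStatusSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    ClusterStatus status = admin.getClusterStatus();

    System.out.println("HBase version: " + status.getHBaseVersion());
    System.out.println("Live servers:  " + status.getServers());
    System.out.println("Dead servers:  " + status.getDeadServers());
    System.out.println("Region count:  " + status.getRegionsCount());

    admin.close();
  }
}
```

Per-server and per-region details are reachable from the same snapshot via HServerLoad and RegionLoad.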
23
Available Clients
HBase comes with a variety of clients that can be used from various programming languages
Interactive clients: Native Java API, REST, Thrift, Avro
Batch clients: MapReduce, Hive, Pig
Shell
Web-based UI
24
Available Clients
Interactive clients
Native Java API (already covered)
REST
Thrift
Avro
Batch clients
MapReduce
Hive
Pig
Shell
Web-based UI
25
Batch Clients – MapReduce framework
HDFS: a distributed filesystem
MapReduce: a distributed processing framework
26
Batch Clients - MapReduce framework
27
Batch Clients - MapReduce
InputFormat and TableInputFormat
28
Batch Clients - MapReduce
Mapper and TableMapper
29
Batch Clients - MapReduce
Reducer and TableReducer
30
Batch Clients - MapReduce
OutputFormat and TableOutputFormat
31
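The classes above fit together as follows. This is an illustrative row-counting sketch (not the book's ImportFromFile example) using the 0.9x-era mapreduce API; the table name is an assumption, and a cluster with the HBase jars on the job classpath is assumed.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountSketch {

  // TableMapper fixes the input types: row key and Result from the scan.
  static class RowCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns,
        Context context) throws IOException, InterruptedException {
      context.write(new Text("rows"), ONE);  // one count per row read
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "row count sketch");
    job.setJarByClass(RowCountSketch.class);

    // Wires up TableInputFormat: the scan is split across the table's
    // regions, one map task per region.
    TableMapReduceUtil.initTableMapperJob("testtable", new Scan(),
        RowCountMapper.class, Text.class, IntWritable.class, job);

    job.setNumReduceTasks(0);                      // map-only sketch
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job writing back to HBase would use TableOutputFormat via TableMapReduceUtil.initTableReducerJob() instead.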
Batch Clients - MapReduce Sample
ch07/mapreduce.Driver
How to run
# in the root account, in the hbase shell
create 'testtable_mr', 'data'
# in the hbase-user account
cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
hadoop fs -copyFromLocal test-data.txt /tmp
hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable_mr -i /tmp/test-data.txt -c data:json
How to see the usage
hadoop jar target/hbase-book-ch07-1.0.jar   # will show the usage
32
Batch Clients - Pig
Apache Pig project
A platform to analyze large amounts of data
It has its own high-level query language, called Pig Latin
Uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
The opposite of Hive's declarative approach to emulate SQL (HiveQL)
Combined with the power of Hadoop and the MapReduce framework
33
Batch Clients – Pig Latin Sample
-- Load data from a file and write it to HBase
raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t')
      AS (user, time, query);
T = FOREACH raw GENERATE CONCAT(CONCAT(user, '\u0000'), time), query;
STORE T INTO 'excite'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');

-- Load back the records that were just written to HBase
R = LOAD 'excite'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query', '-loadKey')
      AS (key: chararray, query: chararray);
Shell
We already used it in course #1
hbase shell
The majority of commands have a direct match with a method provided by either the client or the administrative API
They are grouped into five different categories, representing their semantic relationships
35
Shell - General
36
Shell – Data definition
37
Shell – Data manipulation
38
Shell – Tools
39
Shell – Replication
40
Web-based UI
Master UI (http://${your_host}:8110/master.jsp)
Main page
User Table page
ZooKeeper page
Region Server UI
Shared pages
Local logs
Thread Dump
Log level
41
Phew, finally done!
42