003 Admin Features and Clients
Posted on 27-May-2015
Scott Miao 2012/7/12
HBase Admin API & Available Clients
1
Agenda
Course Credit
HBase Admin APIs
HTableDescriptor
HColumnDescriptor
HBaseAdmin
Available Clients
Interactive Clients
Batch Clients
Shell
Web-based UI
2
Course Credit
Show up: 30 points
Ask a question: 5 points each
Hands-on: 40 points
70 points are required to pass this course
Course credit is calculated once for each finished course
The course credit will be sent to you and your supervisor by
3
Hadoop RPC framework
Writable interface
void write(DataOutput out) throws IOException;
Serializes the object's data and sends it to the remote end
void readFields(DataInput in) throws IOException;
Creates a new instance and deserializes the remote data for subsequent operations
Parameterless constructor
Hadoop instantiates an empty object
Then calls the readFields() method to deserialize the remote data
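The contract above can be sketched as a minimal custom Writable. This is illustrative, not from the course materials: the class and field names are invented, and the Hadoop libraries are assumed to be on the classpath.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical value object sent over Hadoop RPC.
public class UserScore implements Writable {

  private String user;
  private int score;

  // Parameterless constructor: Hadoop creates an empty instance,
  // then calls readFields() to populate it from the wire.
  public UserScore() {}

  public UserScore(String user, int score) {
    this.user = user;
    this.score = score;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // Serialize field by field; the remote side must read in the same order.
    out.writeUTF(user);
    out.writeInt(score);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Deserialize in exactly the order written by write().
    this.user = in.readUTF();
    this.score = in.readInt();
  }
}
```

The field order in write() and readFields() must match exactly, since the wire format carries no field names.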
4
HTableDescriptor
Constructor
HTableDescriptor();
HTableDescriptor(String name);
HTableDescriptor(byte[] name);
HTableDescriptor(HTableDescriptor desc);
ch05/admin.CreateTableExample
Can be used to fine-tune the table’s performance
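In the spirit of CreateTableExample, a minimal table-creation sketch with the 0.9x-era client API; the table and family names are illustrative, and a reachable cluster is assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: build a table descriptor, attach one column family, create the table.
public class CreateTableSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    admin.createTable(desc);   // blocks until the table is online
    admin.close();
  }
}
```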
5
HTableDescriptor – Logical vs. physical views
6
HTableDescriptor - Properties

Name: specifies the table name
byte[] getName();
String getNameAsString();
void setName(byte[] name);
Column families: specify the column families
void addFamily(HColumnDescriptor family);
boolean hasFamily(byte[] c);
HColumnDescriptor[] getColumnFamilies();
HColumnDescriptor getFamily(byte[] column);
HColumnDescriptor removeFamily(byte[] column);
Maximum file size: specifies the maximum size a region within the table can grow to
long getMaxFileSize();
void setMaxFileSize(long maxFileSize);
It is really about the maximum size of each store, so a better name would be maxStoreSize. By default it is 256 MB; a larger value may be required when you have a lot of data.
HTableDescriptor - Properties (cont.)

Read-only: by default, all tables are writable. If this flag is set to true, you can only read from the table and not modify it at all
boolean isReadOnly();
void setReadOnly(boolean readOnly);
Memstore flush size: the size of the in-memory store that buffers values before writing them to disk as a new storage file; the default is 64 MB
long getMemStoreFlushSize();
void setMemStoreFlushSize(long memstoreFlushSize);
Deferred log flush: defers saving write-ahead-log entries to disk; set to false by default
synchronized boolean isDeferredLogFlush();
void setDeferredLogFlush(boolean isDeferredLogFlush);
Miscellaneous options: key/value pairs stored with the table definition that can be retrieved when necessary
byte[] getValue(byte[] key);
String getValue(String key);
Map<ImmutableBytesWritable, ImmutableBytesWritable> getValues();
void setValue(byte[] key, byte[] value);
void setValue(String key, String value);
void remove(byte[] key);
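The setters above can be combined when building a descriptor. A sketch with the 0.9x-era API; the values are illustrative, not tuning recommendations.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

// Sketch: tune table-level properties on a descriptor before createTable().
public class TablePropertiesSketch {
  public static void main(String[] args) {
    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    desc.setMaxFileSize(512L * 1024 * 1024);        // grow regions to ~512 MB per store
    desc.setMemStoreFlushSize(128L * 1024 * 1024);  // flush the memstore at 128 MB
    desc.setReadOnly(false);                        // default: table is writable
    desc.setValue("owner", "scott");                // miscellaneous key/value metadata

    System.out.println(desc);
  }
}
```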
HColumnDescriptor
A more appropriate name would be HColumnFamilyDescriptor
The family name must be printable
You cannot simply rename them later
Constructors
HColumnDescriptor();
HColumnDescriptor(String familyName);
HColumnDescriptor(byte[] familyName);
HColumnDescriptor(HColumnDescriptor desc);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
boolean inMemory, boolean blockCacheEnabled, int timeToLive,
String bloomFilter);
HColumnDescriptor(byte[] familyName, int maxVersions, String compression,
boolean inMemory, boolean blockCacheEnabled, int blocksize,
int timeToLive, String bloomFilter, int scope);
HColumnDescriptor – Column families vs. store files
10
HColumnDescriptor – Properties

Name: specifies the column family name. A column family cannot be renamed; create a new family with the desired name and copy the data over using the API
byte[] getName();
String getNameAsString();
Maximum versions: predicate deletion; how many versions of each value you want to keep. The default value is 3
int getMaxVersions();
void setMaxVersions(int maxVersions);
Compression: HBase has pluggable compression algorithm support. The default value is NONE
11
HColumnDescriptor – Properties (cont.)

Block size: all store files are divided into smaller blocks that are loaded during a get or scan operation. The default value is 64 KB
synchronized int getBlocksize();
void setBlocksize(int s);
Not to be confused with the HDFS block size, which is 64 MB by default
Block cache: HBase reads entire blocks of data for efficient I/O usage and retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation. The default is true
boolean isBlockCacheEnabled();
void setBlockCacheEnabled(boolean blockCacheEnabled);
If your use case only ever has sequential reads on a particular column family, it is advisable to disable it
Time-to-live (TTL): predicate deletion; a threshold based on the timestamp of a value. The internal housekeeping automatically checks whether a value exceeds its TTL
int getTimeToLive();
void setTimeToLive(int timeToLive);
By default, values are kept forever (set to Integer.MAX_VALUE)
HColumnDescriptor – Properties (cont.)

In-memory: hints HBase to keep entire blocks of this family in the block cache for efficient access to its values. The in-memory flag defaults to false
boolean isInMemory();
void setInMemory(boolean inMemory);
Good for small column families with few values, such as the passwords of a user table, so that logins can be processed very fast
Bloom filter: allows you to improve lookup times, given you have a specific access pattern. Since they add overhead in terms of storage and memory, they are turned off by default
Replication scope: enables you to have multiple clusters that ship local updates across the network so that they are applied to the remote copies. The default is 0 (disabled)
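The family-level properties above can be set the same way as the table-level ones. A sketch with the 0.9x-era API; the family name and values are illustrative.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

// Sketch: tune column-family properties on a descriptor.
public class FamilyPropertiesSketch {
  public static void main(String[] args) {
    HColumnDescriptor family = new HColumnDescriptor("colfam1");

    family.setMaxVersions(1);                // keep only the latest version of each value
    family.setBlocksize(32 * 1024);          // 32 KB blocks, smaller for random reads
    family.setTimeToLive(7 * 24 * 60 * 60);  // expire values after one week (in seconds)
    family.setInMemory(true);                // favor keeping this family's blocks cached
    family.setBlockCacheEnabled(true);       // the default; shown here for contrast

    System.out.println(family);
  }
}
```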
13
HBaseAdmin
Provides DDL-like operations, as in an RDBMS
Create tables with specific column families
Check for table existence
Alter table and column family definitions
Drop tables
And more…
14
HBaseAdmin – Basic Operations
boolean isMasterRunning()
HConnection getConnection()
Configuration getConfiguration()
close()
15
HBaseAdmin – Table Operations
Table-related admin API
They are asynchronous in nature
createTable() vs. createTableAsync(), etc.
Create table
ch05/admin.CreateTableExample
ch05/admin.CreateTableWithRegionsExample
numRegions must be at least 3; otherwise, the call will return with an exception
This ensures that you end up with at least a minimum set of regions
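In the spirit of CreateTableWithRegionsExample, a pre-split creation sketch with the 0.9x-era API; the table/family names and key range are illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: create a table pre-split into a fixed number of regions.
public class CreateTableWithRegionsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("testtable");
    desc.addFamily(new HColumnDescriptor("colfam1"));

    // Split the key range into 10 regions; numRegions must be >= 3.
    admin.createTable(desc,
        Bytes.toBytes("row-000"), Bytes.toBytes("row-999"), 10);
    admin.close();
  }
}
```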
16
HBaseAdmin – Table Operations
Does the table exist?
ch05/admin.ListTablesExample
You should use existing table names
Otherwise, org.apache.hadoop.hbase.TableNotFoundException will be thrown
Delete table
ch05/admin.TableOperationsExample
Disabling a table can potentially take a very long time, up to several minutes
Depending on how much data is residing in the server's memory and not yet persisted to disk
Undeploying a region requires all of its data to be written to disk first
isTableAvailable() vs. isTableEnabled()/isTableDisabled()
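The existence check and the disable-then-delete sequence look like this with the 0.9x-era API; the table name is illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: a table must be disabled before it can be deleted.
public class DeleteTableSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    if (admin.tableExists("testtable")) {
      // May take minutes: every region must flush its memstore and close.
      admin.disableTable("testtable");
      admin.deleteTable("testtable");
    }
    admin.close();
  }
}
```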
17
HBaseAdmin – Table Operations
Modify table
ch05/admin.ModifyTableExample
HTableDescriptor.equals()
Compares the current with the specified instance
Returns true if they match in all properties
Including the contained column families and their respective settings
18
HBaseAdmin – Schema Operations
Besides the modifyTable() call, there are dedicated methods provided by HBaseAdmin
Make sure the table to be modified is disabled first
All of these calls are asynchronous
void addColumn(String tableName, HColumnDescriptor column)
void addColumn(byte[] tableName, HColumnDescriptor column)
void deleteColumn(String tableName, String columnName)
void deleteColumn(byte[] tableName, byte[] columnName)
void modifyColumn(String tableName, HColumnDescriptor descriptor)
void modifyColumn(byte[] tableName, HColumnDescriptor descriptor)
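A sketch of the addColumn() variant, with the disable/enable steps around it (0.9x-era API); the table and family names are illustrative, and a running cluster is assumed.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: add a column family to an existing table.
public class AddColumnSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    admin.disableTable("testtable");  // schema changes require a disabled table
    admin.addColumn("testtable", new HColumnDescriptor("colfam2"));
    admin.enableTable("testtable");   // bring the table back online
    admin.close();
  }
}
```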
19
HBaseAdmin – Cluster Operations

static void checkHBaseAvailable(Configuration conf)
Checks whether a client application can communicate with the remote HBase cluster; either silently succeeds or throws an error
ClusterStatus getClusterStatus()
Retrieves an instance of the ClusterStatus class, containing detailed information about the cluster status
void closeRegion(String regionname, String hostAndPort)
void closeRegion(byte[] regionname, String hostAndPort)
Close regions that have previously been deployed to region servers. These bypass any master notification: the region is closed directly by the region server, unseen by the master node
void flush(String tableNameOrRegionName)
void flush(byte[] tableNameOrRegionName)
Tell the MemStore instances of the region or table to flush their cached modifications to disk. Otherwise, the data is written out once the memstore flush size is reached
These are for advanced users, so please check these APIs in the documentation and handle with care
20
HBaseAdmin – Cluster Operations (cont.)

void compact(String tableNameOrRegionName)
void compact(byte[] tableNameOrRegionName)
Minor compaction. Compactions can potentially take a long time to complete; they are executed in the background by the server hosting the named region, or by all servers hosting any region of the given table
void majorCompact(String tableNameOrRegionName)
void majorCompact(byte[] tableNameOrRegionName)
Major compaction
void split(String tableNameOrRegionName)
void split(byte[] tableNameOrRegionName)
…
These calls allow you to split a specific region or table
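The maintenance calls above can be triggered manually like this (0.9x-era API); they are asynchronous and normally left to HBase's own housekeeping, so this is a sketch for controlled experiments, with an illustrative table name.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: manually trigger flush, compactions, and a split for one table.
public class MaintenanceSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    admin.flush("testtable");         // write memstores out to disk
    admin.compact("testtable");       // queue a minor compaction
    admin.majorCompact("testtable");  // queue a major compaction
    admin.split("testtable");         // ask every region of the table to split

    admin.close();  // the calls above return immediately; work continues server-side
  }
}
```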
21
HBaseAdmin – Cluster Operations (cont.)

void assign(byte[] regionName, boolean force)
void unassign(byte[] regionName, boolean force)
When a client requires a region to be deployed to or undeployed from the region servers, it can invoke these calls
void move(byte[] encodedRegionName, byte[] destServerName)
Moves a region from its current region server to a new one. The destServerName parameter can be set to null to pick a new server at random
boolean balanceSwitch(boolean b)
boolean balancer()
balanceSwitch() allows you to switch the region balancer on or off. A call to balancer() starts the process of moving regions from the servers with more deployed regions to those with fewer
void shutdown()
void stopMaster()
void stopRegionServer(String hostnamePort)
Shut down the entire cluster, stop the master server, or stop a particular region server only. Once invoked, the affected servers are stopped immediately; there is no delay and no way to revert the process
HBaseAdmin – Cluster Status Information
You can get more detailed information about your HBase cluster from HBaseAdmin.getClusterStatus()
Related Classes
ClusterStatus
ServerName => HServerInfo
HServerLoad
RegionLoad
ch05/admin.ClusterStatusExample
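In the spirit of ClusterStatusExample, a sketch of reading the cluster status (0.9x-era API); a running cluster is assumed.

```java
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Sketch: pull a ClusterStatus snapshot and print a few summary fields.
public class ClusterStatusSketch {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    ClusterStatus status = admin.getClusterStatus();

    System.out.println("HBase version: " + status.getHBaseVersion());
    System.out.println("Live servers:  " + status.getServers());
    System.out.println("Dead servers:  " + status.getDeadServers());
    System.out.println("Region count:  " + status.getRegionsCount());

    admin.close();
  }
}
```

Per-server and per-region details are reachable from the same snapshot via HServerLoad and RegionLoad.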
23
Available Clients
HBase comes with a variety of clients that can be used from various programming languages
Interactive clients: Native Java API, REST, Thrift, Avro
Batch clients: MapReduce, Hive, Pig
Shell
Web-based UI
24
Available Clients
Interactive clients
Native Java API (already covered)
REST
Thrift
Avro
Batch clients
MapReduce
Hive
Pig
Shell
Web-based UI
25
Batch Clients – MapReduce framework
HDFS: a distributed filesystem
MapReduce: a distributed processing framework
26
Batch Clients - MapReduce framework
27
Batch Clients - MapReduce
InputFormat and TableInputFormat
28
Batch Clients - MapReduce
Mapper and TableMapper
29
Batch Clients - MapReduce
Reducer and TableReducer
30
Batch Clients - MapReduce
OutputFormat and TableOutputFormat
31
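The classes above fit together as follows. This is an illustrative row-counting sketch (not the book's ImportFromFile example) using the 0.9x-era mapreduce API; the table name is an assumption, and a cluster with the HBase jars on the job classpath is assumed.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountSketch {

  // TableMapper fixes the input types: row key and Result from the scan.
  static class RowCountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result columns,
        Context context) throws IOException, InterruptedException {
      context.write(new Text("rows"), ONE);  // one count per row read
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "row count sketch");
    job.setJarByClass(RowCountSketch.class);

    // Wires up TableInputFormat: the scan is split across the table's
    // regions, one map task per region.
    TableMapReduceUtil.initTableMapperJob("testtable", new Scan(),
        RowCountMapper.class, Text.class, IntWritable.class, job);

    job.setNumReduceTasks(0);                      // map-only sketch
    job.setOutputFormatClass(NullOutputFormat.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

A job writing back to HBase would use TableOutputFormat via TableMapReduceUtil.initTableReducerJob() instead.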
Batch Clients - MapReduce Sample
ch07/mapreduce.Driver
How to run
# in the root account, in the hbase shell
create 'testtable_mr', 'data'
# in the hbase-user account
cd ${GIT_HOME}/hbase-training/002/projects/hbase-book/ch07
hadoop fs -copyFromLocal test-data.txt /tmp
hadoop jar target/hbase-book-ch07-1.0.jar ImportFromFile -t testtable_mr -i /tmp/test-data.txt -c data:json
How to see the usage
hadoop jar target/hbase-book-ch07-1.0.jar   # will show the usage
32
Batch Clients - Pig
Apache Pig project
A platform to analyze large amounts of data
It has its own high-level query language, called Pig Latin
Uses an imperative programming style to formulate the steps involved in transforming the input data to the final output
The opposite of Hive's declarative approach to emulate SQL (HiveQL)
Combined with the power of Hadoop and the MapReduce framework
33
Batch Clients – Pig Latin Sample
-- Load data from a file and write it to HBase
raw = LOAD 'tutorial/data/excite-small.log' USING PigStorage('\t')
      AS (user, time, query);
T = FOREACH raw GENERATE CONCAT(CONCAT(user, '\u0000'), time), query;
STORE T INTO 'excite'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query');

-- Load back the records that were just written to HBase
R = LOAD 'excite'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('colfam1:query', '-loadKey')
      AS (key: chararray, query: chararray);
Shell
We already used it in course #1
hbase shell
The majority of commands have a direct match with a method provided by either the client or the administrative API
They are grouped into five different categories, representing their semantic relationships
35
Shell - General
36
Shell – Data definition
37
Shell – Data manipulation
38
Shell – Tools
39
Shell – Replication
40
Web-based UI
Master UI (http://${your_host}:8110/master.jsp)
Main page
User Table page
ZooKeeper page
Region Server UI
Shared pages
Local logs
Thread Dump
Log level
41
Phew, finally done!
42