HBase: dances on the elephant's back
DESCRIPTION
Apache HBase is a technology that turns the Hadoop infrastructure upside down. An elephant cannot become an antelope, and yet it is possible to do a group dance on its back.
TRANSCRIPT
HBase: DANCES ON THE ELEPHANT BACK
Roman Nikitchenko, 13.08.2014
www.vitech.com.ua
Agenda
HBASE: WHO AND WHY? Motivation and the place of HBase in the NoSQL world.
HBASE as is: architecture, data model, features.
AROUND HBASE: integration with Hadoop, crazy ideas, magic.
Is Hadoop good for data?
… so attractive
● Hadoop is an open-source framework for big data, providing both distributed storage and processing.
● Hadoop is reliable and fault-tolerant without relying on hardware for these properties.
● Hadoop has unique horizontal scalability: currently from a single computer up to thousands of cluster nodes.
Hadoop: classical picture
Hadoop historical top view
● HDFS serves as the file system layer.
● MapReduce originally served as the distributed processing framework.
● The native client API is Java, but there are lots of alternatives.
● But where is the SQL server here?
HBase motivation
So Hadoop is...
● Designed for throughput, not for latency.
● HDFS blocks are expected to be large; there is an issue with lots of small files.
● Write once, read many times ideology.
● MapReduce is not very flexible, and neither is any database built on top of it.
● How about realtime?
HBase motivation
BUT WE OFTEN NEED...
LATENCY, SPEED and all Hadoop properties.
So HBase is for this.
● Open-source Google BigTable implementation with an appropriate place in the infrastructure.
● Realtime, low latency, linear scalability.
● Distributed, reliable and fault-tolerant.
● Natural integration with other Hadoop components.
● No SQL and no secondary indexes out of the box.
● Limited ACID guarantees.
● Really good for massive scans.
Google Bigtable / Hadoop architecture and HBase
High layer applications
MapReduce (Hadoop MapReduce)
YARN (resource management)
Distributed file system (Google FS, HDFS).
HBASE facts and trends
2006 2007 2008 2009 2010 … 2014 … future
2006: the Google BigTable paper is published; HBase development starts.
2007: first code is released as part of Hadoop 0.15; the focus is on offline, crawl data storage.
2008: HBase goes OLTP (online transaction processing); 0.20 is the first performance release.
2010: HBase becomes an Apache top-level project.
November 2010: Facebook elects HBase to implement its new messaging platform.
HBase 0.92 is considered the production-ready release.
HBase data paths at the conceptual level
Analytics, long running jobs | Realtime operations
Adapters (Hive) | MapReduce API | HBase API | Adapters (Impala)
MapReduce (Hadoop MapReduce)
YARN (resource management)
Distributed file system (Google FS, HDFS)
● HBase can be used both for long-running analytics and for realtime low-latency operations.
● Third-party adapters are possible if you need a fast track; some functionality and performance drawbacks are the price you pay.
Loose data structure
Book: title, author, pages, price
Ball: color, size, material, price
Toy car: color, type, radio control, price
Kind    | Price | Title | Author | Pages | Color | Size | Material | Type | Radio control
Book    |   +   |   +   |   +    |   +   |       |      |          |      |
Ball    |   +   |       |        |       |   +   |  +   |    +     |      |
Toy car |   +   |       |        |       |   +   |      |          |  +   |      +
● Data looks like tables with a large number of columns.
● The column set can vary from row to row.
● No table modification is needed to add a column to a row.
Book #1: Kind, Price, Title, Author, Pages
Ball #1: Kind, Price, Color, Size, Material
Toy car #1: Price, Color, Type +Radio control
Book #2: Kind, Price, Title, Author
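The loose structure above can be sketched with plain dictionaries: a minimal illustrative model (not the HBase API) where each row carries only its own columns, using the book / ball / toy car rows from the slide.

```python
# Minimal sketch of the loose data structure: each row is just a mapping
# from column name to value, so the column set varies per row and adding
# a column to one row requires no table modification.
# (Illustrative model only, not the HBase API.)

table = {
    "book#1":   {"kind": "book", "price": 10, "title": "Moby Dick",
                 "author": "H. Melville", "pages": 720},
    "ball#1":   {"kind": "ball", "price": 3, "color": "red",
                 "size": "M", "material": "rubber"},
    "toycar#1": {"price": 15, "color": "blue", "type": "racing",
                 "radio_control": True},
}

# Adding a new column to a single row needs no schema change:
table["book#1"]["language"] = "en"

# Columns absent in a row are simply not stored (sparse storage):
assert "color" not in table["book#1"]
assert table["ball#1"]["material"] == "rubber"
```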
Table
Logical data structure
[Diagram: a Table is split into Regions; each Row has a Key and columns grouped into Family #1, Family #2, …]
Data is placed in tables.
Tables are split into regions based on row key ranges.
Columns are grouped into families.
Every table row is identified by a unique row key.
Every row consists of columns.
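Splitting a table into regions by row key ranges can be sketched as a sorted list of region start keys plus a binary search; the boundaries and names here are illustrative, not HBase internals.

```python
from bisect import bisect_right

# Sketch of region assignment: each region owns a half-open row key
# range [start, next_start). A lookup is a binary search over the
# sorted region start keys. (Illustrative; boundaries are made up.)

starts = ["", "g", "p"]  # region 0: [""-"g"), region 1: ["g"-"p"), region 2: ["p"-∞)

def find_region(row_key: str) -> int:
    """Return the index of the region responsible for row_key."""
    return bisect_right(starts, row_key) - 1

assert find_region("apple") == 0
assert find_region("grape") == 1
assert find_region("zebra") == 2
```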
Data storage structure
[Diagram: a Table Region; each Row has a Key and columns in Family #1, Family #2, …]
● Data is stored in HFiles.
● Families are stored on disk in separate files.
● Row keys are indexed in memory.
● Each column cell includes row key, qualifier, value and timestamp.
● There is no column limit.
● Storage is block-based (64K blocks by default).
HFile: family #1
Row key Column Value TS
... ... ... ...
... ... ... ...
HFile: family #2
Row key Column Value TS
... ... ... ...
... ... ... ...
● Delete is just another marker record.
● Periodic compaction is required.
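The delete-marker-plus-compaction idea can be sketched with an append-only list of cells; function names here are illustrative, not HBase internals.

```python
# Sketch of delete markers and compaction in an append-only store:
# a delete just appends a tombstone cell, reads pick the newest cell per
# (row, column), and compaction rewrites the data without deleted or
# shadowed cells. (Illustrative model, not HBase internals.)

TOMBSTONE = object()

def put(cells, row, col, ts, value):
    cells.append((row, col, ts, value))

def delete(cells, row, col, ts):
    cells.append((row, col, ts, TOMBSTONE))  # just another marker record

def get(cells, row, col):
    newest = max((c for c in cells if c[0] == row and c[1] == col),
                 key=lambda c: c[2], default=None)
    if newest is None or newest[3] is TOMBSTONE:
        return None
    return newest[3]

def compact(cells):
    """Keep only the newest live cell per (row, column)."""
    latest = {}
    for cell in cells:
        key = (cell[0], cell[1])
        if key not in latest or cell[2] > latest[key][2]:
            latest[key] = cell
    return [c for c in latest.values() if c[3] is not TOMBSTONE]

cells = []
put(cells, "r1", "f1:name", 1, "old")
put(cells, "r1", "f1:name", 2, "new")
put(cells, "r1", "f1:age", 1, 33)
delete(cells, "r1", "f1:age", 3)
assert get(cells, "r1", "f1:name") == "new"
assert get(cells, "r1", "f1:age") is None   # hidden by the tombstone
cells = compact(cells)                      # physical removal happens here
assert len(cells) == 1                      # only the live "new" cell left
```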
Architecture
● Zookeeper coordinates the distributed elements and is the primary contact point for clients.
● The Master server keeps metadata and manages data distribution over the Region servers.
● Region servers manage table regions, but the actual data storage service, including replication, is on HDFS data nodes. Clients communicate directly with region servers for data.
[Diagram: the Client gets META via Zookeeper, HMaster and the NameNode, and exchanges DATA directly with Region servers (RS), each co-located with HDFS DataNodes (DN) across racks]
CRUD: Put and Delete
● Writes are logged and cached in memory.
● The main thing to remember: the lower layer is a write-once filesystem (HDFS), so the PUT and DELETE paths are identical.
● Both PUT and DELETE requests are per row key; there is no row key range for DELETE.
● A DELETE is just another marker added.
● The actual delete is performed during compactions.
● Don't forget we can have several families.
CRUD: Put and Delete, write path
● The actual write goes to a region server; the master is not involved.
● All requests first go to the WAL (write-ahead log) to provide recovery.
● The region server keeps a MemStore as temporary storage.
● Only when needed is the write flushed to disk (into an HFile).
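The write path can be sketched as WAL first, MemStore second, flush on demand; the class name and the flush threshold below are illustrative, not HBase internals.

```python
# Sketch of the write path: every mutation first goes to the WAL for
# recovery, then into the in-memory MemStore; only when the MemStore
# grows past a threshold is it flushed into an immutable HFile.
# (Names and the threshold are illustrative, not HBase internals.)

class RegionServerSketch:
    def __init__(self, flush_threshold=3):
        self.wal = []            # write-ahead log (would live on HDFS)
        self.memstore = {}       # sorted in reality; a dict keeps it simple
        self.hfiles = []         # flushed immutable files
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append(("put", row, value))   # 1. log first, for recovery
        self.memstore[row] = value             # 2. then cache in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                       # 3. flush only when needed

    def flush(self):
        self.hfiles.append(dict(self.memstore))  # becomes an HFile on disk
        self.memstore.clear()

rs = RegionServerSketch()
rs.put("r1", "a")
rs.put("r2", "b")
assert rs.hfiles == [] and len(rs.memstore) == 2   # still only in memory
rs.put("r3", "c")                                  # threshold reached
assert len(rs.hfiles) == 1 and rs.memstore == {}
assert len(rs.wal) == 3                            # every write was logged
```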
CRUD: Get and Scan
● A Get operation is a simple data request by row key.
● A Scan operation is performed over a row key range, which can involve several table regions.
● Both Get and Scan can include filters: expressions that are processed on the server side and can seriously limit the results, and thus the traffic.
● Both Scan and Get operations can be performed over several column families.
● The Get operation is implemented through Scan.
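These read paths can be sketched over a sorted in-memory store: Scan walks a key range and applies a server-side filter, and Get is just a Scan over a single-key range. All names and data are illustrative, not the HBase API.

```python
# Sketch of Get and Scan: rows are kept sorted by key, a Scan walks a
# half-open key range [start, stop) applying a server-side filter, and
# Get is implemented as a Scan over a single-key range.
# (Illustrative model, not the HBase API.)

rows = {                       # one region's store, simplified
    "r1": {"f1:city": "Kyiv"},
    "r2": {"f1:city": "Lviv"},
    "r3": {"f1:city": "Kyiv"},
}

def scan(start, stop, row_filter=None):
    """Yield (key, row) for keys in [start, stop), filtered server-side."""
    for key in sorted(rows):
        if start <= key < stop and (row_filter is None or row_filter(rows[key])):
            yield key, rows[key]

def get(key):
    """Get is implemented through Scan over the single-key range."""
    return next(scan(key, key + "\x00"), (None, None))[1]

kyiv = [k for k, _ in scan("r1", "r9", lambda r: r["f1:city"] == "Kyiv")]
assert kyiv == ["r1", "r3"]        # the filter limits results (and traffic)
assert get("r2") == {"f1:city": "Lviv"}
```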
Integration with MapReduce
● HBase provides a number of classes for native MapReduce integration. The main point is data locality.
● TableInputFormat allows massive MapReduce table processing (it maps the table with one region per mapper).
● HBase classes like Result (Get / Scan result) or Put (Put request) can be passed between MapReduce job stages.
● We have moderate experience in making things even better here.
[Diagram: the DataNode, TaskTracker and RegionServer often run on a single node, so data is local; the NameNode, JobTracker and HMaster coordinate them]
Coprocessors: Key points
● Coprocessors are a feature that allows extending HBase without modifying the product code.
● A RegionObserver can attach code to operations at the region level.
● Similar functionality exists for the HMaster.
● Endpoints are the way to provide functionality equivalent to stored procedures.
● Together, the coprocessor infrastructure can provide a realtime distributed processing framework (a lightweight MapReduce).
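The RegionObserver idea can be sketched as stackable hooks around a region's put operation; the hook names mimic, but are not, the real HBase coprocessor interfaces.

```python
# Sketch of RegionObserver-style coprocessors: observers attach code
# around a region's put operation without changing the core code, and
# observers can be stacked. (Hook names mimic, but are not, the real
# HBase interfaces.)

class RegionSketch:
    def __init__(self):
        self.data = {}
        self.observers = []      # observers can be stacked

    def put(self, row, value):
        for obs in self.observers:           # pre-hooks, in order
            value = obs.pre_put(row, value)
        self.data[row] = value               # the core operation
        for obs in self.observers:           # post-hooks
            obs.post_put(row, value)

class AuditObserver:
    def __init__(self):
        self.log = []
    def pre_put(self, row, value):
        return value
    def post_put(self, row, value):
        self.log.append(row)

class UppercaseObserver:
    def pre_put(self, row, value):
        return value.upper()
    def post_put(self, row, value):
        pass

region = RegionSketch()
audit = AuditObserver()
region.observers = [UppercaseObserver(), audit]   # stacked observers
region.put("r1", "hello")
assert region.data["r1"] == "HELLO"   # the pre-hook transformed the value
assert audit.log == ["r1"]            # the post-hook saw the operation
```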
Request
Coprocessors: Region observer
Client
Table
Region observer Region observer
Result
Region Region
RegionServer RegionServer
A region observer works like a hook on region operations.
Region observers can be stacked.
RegionServer RegionServer
Coprocessors: Endpoints
Request (RPC)
Client Table
Region Region
Direct communication via separate protocol.
Response
Endpoint Endpoint
Your commands can take effect on the table regions.
Secondary indexes
● HBase has no support for secondary indexes out-of-the-box.
● A coprocessor (RegionObserver) is used to track Put and Delete operations and update the index table.
● Scan operations with a filter on an indexed column are intercepted and processed based on the index table content.
[Diagram: the Client's Put / Delete to a table Region passes through the Region observer, which writes an index update to the index table; a Scan with filter triggers an index search against the index table]
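The observer-maintained secondary index can be sketched as two tables kept in sync on every write; all names here ("observed_put", the "f1:city" column) are illustrative, not a real HBase indexing library.

```python
# Sketch of a coprocessor-maintained secondary index: an observer on
# Put / Delete keeps an index table mapping a column value to row keys,
# and a filtered "scan" is answered from the index instead of a full
# table scan. (All names are illustrative.)

table = {}   # main table: row key -> {column: value}
index = {}   # index table: value of "f1:city" -> set of row keys

def observed_put(row, columns):
    old = table.get(row, {}).get("f1:city")
    if old is not None:
        index[old].discard(row)              # keep the index consistent
    table[row] = columns
    city = columns.get("f1:city")
    if city is not None:
        index.setdefault(city, set()).add(row)

def observed_delete(row):
    city = table.pop(row, {}).get("f1:city")
    if city is not None:
        index[city].discard(row)

def scan_by_city(city):
    """An intercepted filtered scan, served from the index table."""
    return sorted(index.get(city, set()))

observed_put("r1", {"f1:city": "Kyiv"})
observed_put("r2", {"f1:city": "Lviv"})
observed_put("r3", {"f1:city": "Kyiv"})
observed_delete("r1")
assert scan_by_city("Kyiv") == ["r3"]
assert scan_by_city("Lviv") == ["r2"]
```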
Bulk load
● There is a way to load data into a table MUCH FASTER.
● An HFile is generated with the required data.
● It is preferable to generate one HFile per table region; MapReduce can be used.
● The prepared HFile is merged with the table storage at maximum speed.
[Diagram: data importers feed mappers; reducers work as HFile generators, producing one HFile per table region]
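The one-HFile-per-region idea can be sketched as partitioning the input rows by region key range and sorting each batch; the region boundaries here are illustrative, not a real pre-split table.

```python
from bisect import bisect_right

# Sketch of bulk load preparation: instead of millions of individual
# puts, rows are partitioned by region key range and written as one
# sorted batch (a future HFile) per region, which can then be merged
# into the table wholesale. (Boundaries are illustrative.)

starts = ["", "g", "p"]   # region start keys, as in a pre-split table

def region_of(key):
    return bisect_right(starts, key) - 1

def generate_hfiles(rows):
    """One sorted batch (future HFile) per target region."""
    hfiles = {i: [] for i in range(len(starts))}
    for key, value in rows:
        hfiles[region_of(key)].append((key, value))
    return {i: sorted(batch) for i, batch in hfiles.items()}

rows = [("zebra", 1), ("apple", 2), ("grape", 3), ("plum", 4)]
hfiles = generate_hfiles(rows)
assert hfiles[0] == [("apple", 2)]
assert hfiles[1] == [("grape", 3)]
assert hfiles[2] == [("plum", 4), ("zebra", 1)]   # sorted within the region
```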
Replication and search integration
[Diagram: Client → HBase cluster (WAL, Regions) → replication → Lily HBase NRT indexer → SOLR cloud]
● The user just puts (or deletes) data.
● Replication can be set up down to the column family level.
● The Lily HBase NRT indexer translates data changes into SOLR index updates.
● The SOLR cloud serves search requests (HTTP) and finally provides the search.
● Apache Zookeeper does all the coordination.
● HDFS serves as the low-level file system.
HUG benefits for members
USER GROUP MEMBERSHIP
Just enter ‘ug367’ in the Promotional Code box when you check out at manning.com.
To get this discount, please shop on www.oreilly.com
and quote reference DSUG.
Future meetups
http://[email protected]
We and O’Reilly encourage you to host future meetups, speak at them and participate in group activities.
Questions and discussion
Any questions?