introduction to · 2012-07-17 · introduction to andrás garzó (garzo@ilab.sztaki.hu) ... hbase...

2012.04.27.

Introduction to HBase

András Garzó (garzo@ilab.sztaki.hu)

Google BigTable (2006)

● Distributed multi level map

● Fault tolerant

● Horizontal Scalability

● Runs on commodity hardware

● Self managing

● Handles a large number of reads and writes

● Tight integration with MapReduce

HBase

● Open source BigTable implementation

● Based on Hadoop stack

● Data stored on HDFS

● Integrated with MapReduce

HBase Data Model

Column-oriented database

HBase is column-oriented only at the column-family level!
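The "distributed multi level map" idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class name `ToyTable` and its methods are hypothetical and are not the HBase client API. It nests row key, column family, column qualifier and timestamp, and returns rows in sorted key order.

```python
class ToyTable:
    """Toy model: row key -> column family -> qualifier -> {timestamp: value}."""

    def __init__(self, families):
        # Column families are declared up front; qualifiers can be added freely.
        self.families = set(families)
        self.rows = {}

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = (
            self.rows.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        versions[ts] = value

    def get(self, row, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start=None, stop=None):
        """Yield (row, data) in sorted row-key order; start inclusive, stop exclusive."""
        for row in sorted(self.rows):
            if start is not None and row < start:
                continue
            if stop is not None and row >= stop:
                break
            yield row, self.rows[row]
```

Writing the same cell twice with different timestamps keeps both versions, and `get` returns the newest one.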

HBase Architecture

Auto Sharding

Distribution

Auto Sharding and Distribution

● Region is the unit of scalability

● Sorted, contiguous range of rows

● Load balancing and failover

● Split automatically or manually to scale with growing data

● Capacity is solely a factor of cluster nodes vs. regions per node
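A minimal sketch of automatic splitting, assuming a row-count threshold for brevity (HBase itself uses a size-based threshold). The names `Region`, `RegionedTable` and `MAX_ROWS_PER_REGION` are hypothetical. Each region covers a sorted, contiguous key range and is cut at its middle key when it grows too large.

```python
MAX_ROWS_PER_REGION = 4  # hypothetical threshold, chosen only for this sketch


class Region:
    def __init__(self, start_key, end_key):
        self.start_key = start_key  # inclusive; "" means start of table
        self.end_key = end_key      # exclusive; None means end of table
        self.rows = {}

    def contains(self, key):
        return key >= self.start_key and (self.end_key is None or key < self.end_key)

    def split(self):
        """Cut the region at its middle row key into two contiguous regions."""
        keys = sorted(self.rows)
        mid = keys[len(keys) // 2]
        left = Region(self.start_key, mid)
        right = Region(mid, self.end_key)
        for key, value in self.rows.items():
            (left if key < mid else right).rows[key] = value
        return left, right


class RegionedTable:
    def __init__(self):
        # A new table starts with a single region covering the whole key space.
        self.regions = [Region("", None)]

    def put(self, key, value):
        for i, region in enumerate(self.regions):
            if region.contains(key):
                region.rows[key] = value
                if len(region.rows) > MAX_ROWS_PER_REGION:
                    # Replace the oversized region with its two halves.
                    self.regions[i:i + 1] = region.split()
                return
```

After a split, the two halves could be served by different nodes, which is how adding nodes adds capacity.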

Regions and Splitting

HMaster

● Coordinates region splitting

● Load balancing

● Table management

● Multiple masters for failover

Zookeeper

● Master election

● Locate -ROOT- region

● Region Server membership

Where is my row?
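Because regions are sorted, contiguous key ranges, finding the region for a row is a search over region start keys. In HBase releases of this period the client asks ZooKeeper for -ROOT-, which points to the .META. catalog, which in turn lists the user regions. The sketch below collapses that catalog into one sorted list; the function name, start keys and server names are made up for illustration.

```python
import bisect


def locate_region(region_start_keys, servers, row_key):
    """Return (start_key, server) of the region that holds row_key.

    region_start_keys must be sorted and begin with "" (start of table);
    servers is a parallel list naming the server that hosts each region.
    """
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_start_keys[idx], servers[idx]


start_keys = ["", "g", "p"]               # three regions: [""-"g"), ["g"-"p"), ["p"-end)
hosts = ["rs1.example", "rs2.example", "rs3.example"]  # hypothetical host names
```

For example, `locate_region(start_keys, hosts, "hello")` resolves to the region starting at `"g"`.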

Inside the Region

Writing to HBase

Reading from HBase

Merge Reads
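The write and read paths can be sketched as a tiny log-structured store. The original slides show these as diagrams only, so this is an assumed simplification: a write goes to the write-ahead log and the in-memory MemStore, a full MemStore is flushed to an immutable store file, and a read merges the MemStore with the store files, newest first. `ToyStore` and `flush_threshold` are hypothetical names.

```python
class ToyStore:
    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log, written first for durability
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # immutable sorted snapshots, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the MemStore as a new sorted store file and start a fresh one."""
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        """Merge read: check the MemStore, then store files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for store_file in reversed(self.store_files):
            if key in store_file:
                return store_file[key]
        return None
```

The more store files accumulate, the more places a read has to check, which is why real systems compact them periodically.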

Logical and physical layout

● Logical layout does not match physical one

● All values stored with the full coordinates, including Row Key, Column Family, Column Qualifier and Timestamp

● Folds columns into "row per column"

● NULLs are cost free

● Versions are multiple "rows" in folded table
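The folding described above can be sketched as a function that flattens a logical table into one record per cell. The function name `fold` and the sample layout are assumptions for illustration. Absent or `None` cells produce no record at all, and every version becomes its own record.

```python
def fold(logical_rows):
    """Flatten {row: {family: {qualifier: {timestamp: value}}}} into cell tuples.

    Each tuple carries the full coordinates:
    (row key, column family, column qualifier, timestamp, value).
    """
    cells = []
    for row, families in logical_rows.items():
        for family, columns in families.items():
            for qualifier, versions in columns.items():
                for ts, value in versions.items():
                    if value is not None:  # missing values are simply not stored
                        cells.append((row, family, qualifier, ts, value))
    # Sort by row, family, qualifier ascending and timestamp descending (newest first).
    cells.sort(key=lambda c: (c[0], c[1], c[2], -c[3]))
    return cells


logical = {
    "row1": {"cf": {"a": {1: "x", 2: "y"}, "b": {1: None}}},
    "row2": {"cf": {"a": {1: "z"}}},
}
```

Here `fold(logical)` yields three records: two versions of `row1/cf:a` and one for `row2/cf:a`; the empty `cf:b` cell costs nothing.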

Key Cardinality

● Best performance is gained from using row keys

● Selecting column families reduces the amount of data to be scanned

● Time-range-bound reads can skip store files (and so can Bloom filters!)

● Pure value-based filtering is a full table scan
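A small sketch of why time-range-bound reads are cheap, assuming each store file records the minimum and maximum timestamp it contains. The function name and the file tuples are hypothetical. Files whose timestamp range cannot overlap the query are never opened.

```python
def files_to_read(store_files, t_min, t_max):
    """Return the names of store files that may hold cells in [t_min, t_max).

    store_files is a list of (name, min_ts, max_ts) with inclusive bounds.
    """
    return [
        name
        for name, lo, hi in store_files
        if lo < t_max and hi >= t_min  # ranges overlap
    ]


files = [("f1", 0, 99), ("f2", 100, 199), ("f3", 200, 299)]
```

A read bounded to `[150, 250)` only touches `f2` and `f3`. A filter on cell values offers no such shortcut, so every file has to be scanned.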

Tall-Narrow or Flat-Wide tables?

● Same storage footprint

● Atomicity only on row level

● Rows do not split

● Put more details into the row keys

● Tall with scans, wide with gets

Example
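As an illustration (not necessarily the example shown on the original slide), consider storing messages per user. In the flat-wide layout each user is one row and each message is a column; in the tall-narrow layout the message id moves into the row key. All names and sample data below are made up.

```python
messages = [
    ("alice", "0001", "hi"),
    ("alice", "0002", "lunch?"),
    ("bob", "0001", "report"),
]


def to_wide(msgs):
    """Flat-wide: row key = user, one column qualifier per message id."""
    table = {}
    for user, msg_id, body in msgs:
        table.setdefault(user, {})[msg_id] = body
    return table


def to_tall(msgs):
    """Tall-narrow: row key = user-message id, a single column per row."""
    return {f"{user}-{msg_id}": body for user, msg_id, body in msgs}


def prefix_scan(table, prefix):
    """Scan sorted row keys and keep those starting with the given prefix."""
    return [(key, table[key]) for key in sorted(table) if key.startswith(prefix)]
```

With the wide layout one get on `"alice"` returns the whole inbox atomically; with the tall layout `prefix_scan(to_tall(messages), "alice-")` returns the same messages as a scan, and a very large inbox can spread over several regions.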

Sequential read and write

HBase and MapReduce

How is data locality achieved?

● Region Server and DataNode run on the same node

● HBase shuts down very rarely

● DataNodes help: when Region Servers write to HDFS, one copy of each data block is stored locally
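Locality also matters for MapReduce jobs over a table. HBase's table input format creates one input split per region, and each split can name the host serving that region so the map task is scheduled next to its data. The sketch below only models that bookkeeping; the function name, dictionary keys and host names are assumptions.

```python
def make_splits(regions):
    """Build one input split per region.

    regions is a list of (start_key, end_key, host) tuples; the host is
    passed on as a scheduling hint so the map task can run where the data is.
    """
    return [
        {"start": start, "stop": stop, "preferred_host": host}
        for start, stop, host in regions
    ]


table_regions = [
    ("", "g", "node1.example"),
    ("g", "p", "node2.example"),
    ("p", None, "node3.example"),
]
```

Three regions therefore give three map tasks, each scanning one contiguous key range.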

HBase vs. RDBMS

HBase                                           | RDBMS
Column oriented                                 | Row oriented
Flexible schema, add columns on the fly         | Fixed schema
Good with sparse tables                         | Not optimized for sparse tables
No query language (just Scan and Get)           | SQL
No joins (but they can be done with MapReduce)  | Optimized for joins
Horizontal scalability                          | Hard to scale
No transactions                                 | Transactional
Consistent                                      | Consistent

It’s a really wrong comparison!

HBase on the CAP triangle
