Page 1: Introduction to · 2012-07-17 · Introduction to András Garzó (garzo@ilab.sztaki.hu) ... HBase RDBMS Column oriented Row oriented Flexible scheme, add columns on the fly Fixed

2012.04.27.

Introduction to HBase

András Garzó (garzo@ilab.sztaki.hu)

Google BigTable (2006)

● Distributed multi-level map

● Fault tolerant

● Horizontal Scalability

● Runs on commodity hardware

● Self managing

● Large number of reads and writes

● Tight integration with MapReduce
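
As a rough illustration of the "distributed multi-level map" idea, the sketch below models one table as nested Python dictionaries: row key, then column family, then column qualifier, then timestamp, then value. The table contents and the helper name `get_latest` are invented for illustration; the real system keeps these maps sorted and spreads them over many servers.

```python
# Toy model of a BigTable-style multi-level map (illustrative only).
# row key -> column family -> column qualifier -> timestamp -> value
table = {
    "com.example/index": {
        "contents": {"html": {2: "<html>v2</html>", 1: "<html>v1</html>"}},
        "anchor": {"com.other": {1: "Example"}},
    },
}


def get_latest(table, row, family, qualifier):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    if not versions:
        return None
    return versions[max(versions)]


latest_html = get_latest(table, "com.example/index", "contents", "html")
```

A missing row, family or qualifier simply yields `None`, which mirrors how absent cells take no space in such a map.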

HBase

● Open source BigTable implementation

● Based on Hadoop stack

● Data stored on HDFS

● Integrated with MapReduce

HBase Data Model

Column-oriented database

HBase is column-oriented only at the column-family level!
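
One way to picture this: each column family gets its own store, and inside a store the cells of all qualifiers of that family sit together. The sketch below is a toy model under that assumption; `ToyRegion`, `put` and `read_family` are hypothetical names, not HBase API.

```python
from collections import defaultdict


class ToyRegion:
    """Keeps one separate store (a plain dict here) per column family."""

    def __init__(self):
        # family -> {(row, qualifier): value}
        self.stores = defaultdict(dict)

    def put(self, row, family, qualifier, value):
        self.stores[family][(row, qualifier)] = value

    def read_family(self, family):
        """Touch only the store of the requested family."""
        return dict(self.stores.get(family, {}))


region = ToyRegion()
region.put("row1", "info", "name", "Alice")
region.put("row1", "info", "email", "alice@example.com")
region.put("row1", "blob", "photo", b"\x89PNG")

info_cells = region.read_family("info")
```

Reading the `info` family never looks at the `blob` store, but within `info` the `name` and `email` columns are not separated any further.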

HBase Architecture

Auto Sharding

Distribution

Auto Sharding and Distribution

● A region is the unit of scalability

● Sorted, contiguous range of rows

● Load balancing and failover

● Split automatically or manually to scale with growing data

● Capacity is solely a factor of cluster nodes vs. regions per node
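
The points above can be sketched in a few lines: rows stay sorted, each region covers a contiguous key range identified by its start key, and a region that grows too large is cut in the middle. This is a simplified model, assuming a split threshold counted in rows (HBase itself decides by store size); `ToyTable` and its methods are hypothetical names.

```python
import bisect


class ToyTable:
    """Rows kept sorted and cut into contiguous regions by start key."""

    def __init__(self, max_rows_per_region=4):
        self.max_rows = max_rows_per_region
        self.start_keys = [""]   # sorted region start keys; "" = first region
        self.regions = [[]]      # sorted row keys held by each region

    def locate(self, row_key):
        """Index of the region whose key range contains row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def put(self, row_key):
        i = self.locate(row_key)
        rows = self.regions[i]
        pos = bisect.bisect_left(rows, row_key)
        if pos == len(rows) or rows[pos] != row_key:
            rows.insert(pos, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        """Split region i at its middle row into two daughter regions."""
        rows = self.regions[i]
        mid = len(rows) // 2
        self.regions[i:i + 1] = [rows[:mid], rows[mid:]]
        self.start_keys.insert(i + 1, rows[mid])


table = ToyTable(max_rows_per_region=4)
for key in ["a", "b", "c", "d", "e", "f"]:
    table.put(key)
```

After the fifth insert the single region exceeds the threshold and splits, so lookups for keys below and above the split point land in different regions, which could then be served by different nodes.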

Regions and Splitting

HMaster

● Coordinates region splitting

● Load balancing

● Table management

● Multiple masters for failover

ZooKeeper

● Master election

● Locate the -ROOT- region

● Region Server membership

Where is my row?
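
The figures for this slide are not part of the transcript. As a hedged sketch, assuming the three-level lookup used by HBase releases of that period (ZooKeeper points to -ROOT-, -ROOT- points to .META., .META. points to the user region), finding a row could look like the following. All server names, keys and function names are invented, and the catalog structure is heavily simplified; real clients also cache the results.

```python
import bisect

# Hypothetical cluster state; server names and keys are invented.
zookeeper = {"root-region-server": "rs1"}

# Each catalog maps a sorted list of region start keys to a location.
root_catalog = {"rs1": ([""], ["rs2"])}              # -ROOT- -> where .META. lives
meta_catalog = {"rs2": (["", "m"], ["rs3", "rs4"])}  # .META. -> user regions


def find_in(start_keys, locations, row_key):
    """Pick the entry whose start key is the greatest one <= row_key."""
    return locations[bisect.bisect_right(start_keys, row_key) - 1]


def locate_row(row_key):
    root_server = zookeeper["root-region-server"]                 # step 1
    meta_server = find_in(*root_catalog[root_server], row_key)    # step 2
    return find_in(*meta_catalog[meta_server], row_key)           # step 3


server_for_apple = locate_row("apple")
server_for_zebra = locate_row("zebra")
```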

Inside the Region

Writing to HBase

Reading from HBase

Merge Reads
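
The write and read slides were figures and did not survive in the transcript. As a general sketch of a log-structured store of this kind (an edit goes to a write-ahead log, then to an in-memory buffer, which is flushed to immutable sorted files; a read merges the buffer with all files), and not necessarily what the slides showed, consider the toy class below. `ToyStore`, the row-count flush threshold and the integer clock are assumptions made for illustration.

```python
class ToyStore:
    """LSM-style store: log first, then memory, flushed to sorted files."""

    def __init__(self, flush_threshold=2):
        self.wal = []            # append-only log, for recovery after a crash
        self.memstore = {}       # in-memory buffer: key -> (timestamp, value)
        self.store_files = []    # immutable flushed files, oldest first
        self.flush_threshold = flush_threshold
        self.clock = 0

    def put(self, key, value):
        self.clock += 1
        self.wal.append((self.clock, key, value))   # 1. log the edit
        self.memstore[key] = (self.clock, value)    # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """3. Write the buffer out as one sorted, immutable file."""
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        """Merge read: newest timestamp across memstore and all files wins."""
        candidates = [src[key] for src in [self.memstore, *self.store_files]
                      if key in src]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]


store = ToyStore(flush_threshold=2)
store.put("k1", "a")
store.put("k2", "b")   # second put reaches the threshold and triggers a flush
store.put("k1", "c")   # newer version of k1 lives only in the memstore
```

Here `store.get("k1")` has to consult both the memstore and the flushed file and returns the newer value, which is the essence of a merge read.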

Logical and physical layout

● The logical layout does not match the physical one

● All values are stored with their full coordinates, including Row Key, Column Family, Column Qualifier and Timestamp

● Folds columns into "row per column"

● NULLs are cost free

● Versions are multiple "rows" in the folded table
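
The folding described above can be sketched as follows: a logical table is flattened into one physical "row" per cell, each carrying its full coordinates. The function name `fold` and the sample data are made up; the sort order (coordinates ascending, newer timestamps first) is an assumption of this sketch.

```python
def fold(logical_rows):
    """Turn {row: {family: {qualifier: {timestamp: value}}}} into flat cells."""
    cells = []
    for row, families in logical_rows.items():
        for family, qualifiers in families.items():
            for qualifier, versions in qualifiers.items():
                for timestamp, value in versions.items():
                    cells.append((row, family, qualifier, timestamp, value))
    # Sort by coordinates; newer versions (larger timestamps) come first.
    cells.sort(key=lambda c: (c[0], c[1], c[2], -c[3]))
    return cells


logical = {
    "row1": {"cf": {"a": {2: "new", 1: "old"}}},   # two versions of one cell
    "row2": {"cf": {"b": {1: "x"}}},               # column "a" is absent (NULL)
}
physical = fold(logical)
```

The missing `a` column of `row2` produces no physical entry at all, and the two versions of `row1`/`a` become two separate entries.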

Key Cardinality

● Best performance is gained from using row keys

● Selecting column families reduces the amount of data to be scanned

● Time-range-bound reads can skip store files (and so can Bloom filters!)

● Pure value-based filtering is a full table scan!
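
To illustrate how a time-range-bound read can skip store files, the sketch below assumes each store file records the minimum and maximum timestamp of its cells, so files that cannot overlap the requested range are never opened. `ToyStoreFile`, `scan_time_range` and the half-open range `[start_ts, end_ts)` are assumptions of this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStoreFile:
    min_ts: int
    max_ts: int
    cells: list = field(default_factory=list)   # (row, timestamp, value)


def scan_time_range(store_files, start_ts, end_ts):
    """Return matching cells and how many files had to be opened."""
    opened, result = 0, []
    for f in store_files:
        # Skip files whose timestamp span cannot overlap [start_ts, end_ts).
        if f.max_ts < start_ts or f.min_ts >= end_ts:
            continue
        opened += 1
        result += [c for c in f.cells if start_ts <= c[1] < end_ts]
    return result, opened


files = [
    ToyStoreFile(1, 10, [("r1", 3, "a"), ("r2", 9, "b")]),
    ToyStoreFile(11, 20, [("r1", 12, "c"), ("r3", 18, "d")]),
    ToyStoreFile(21, 30, [("r2", 25, "e")]),
]
cells, opened = scan_time_range(files, 12, 19)
```

Only the middle file is opened for this range. A filter on values alone has no such metadata to lean on, so it would have to open every file.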

Tall-Narrow or Flat-Wide tables?

● Same storage footprint

● Atomicity only at the row level

● Rows do not split

● Put more details into the row keys

● Tall tables are read with scans, wide tables with gets
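
A small sketch of the two designs, using an invented message-inbox example: the tall-narrow table puts the message ID into the row key and is read with a prefix scan, while the flat-wide table keeps one row per user and is read with a single get. The key format `user-messageid` and the helper `prefix_scan` are assumptions for illustration.

```python
import bisect

# Tall-narrow: one row per message, details encoded in the row key.
tall_rows = sorted([
    ("alice-0001", "hi"),
    ("alice-0002", "lunch?"),
    ("bob-0001", "hello"),
])

# Flat-wide: one row per user, one column per message.
wide_rows = {
    "alice": {"0001": "hi", "0002": "lunch?"},
    "bob": {"0001": "hello"},
}


def prefix_scan(rows, prefix):
    """Scan the contiguous run of sorted rows whose key starts with prefix."""
    keys = [k for k, _ in rows]
    start = bisect.bisect_left(keys, prefix)
    out = []
    for key, value in rows[start:]:
        if not key.startswith(prefix):
            break
        out.append((key, value))
    return out


tall_result = prefix_scan(tall_rows, "alice-")   # scan over several rows
wide_result = wide_rows["alice"]                 # single get of one row
```

Both return the same messages. The difference is that an update touching several of Alice's messages is atomic only in the wide design, where they share one row, while the tall design lets her data spread over more than one region.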

Example

Sequential read and write

HBase and MapReduce

How is data locality achieved?

● The Region Server and the DataNode run on the same node

● HBase shuts down very rarely

● DataNodes help: when a Region Server writes to HDFS, the data blocks are also stored locally

HBase vs. RDBMS

HBase                                         RDBMS
Column oriented                               Row oriented
Flexible schema, add columns on the fly       Fixed schema
Good with sparse tables                       Not optimized for sparse tables
No query language (just Scan and Get)         SQL
No joins (but we can do them with MapReduce)  Optimized for joins
Horizontal scalability                        Hard to scale
No transactions                               Transactional
Consistent                                    Consistent

It's really the wrong comparison to make!

HBase on the CAP triangle

Q&A