HBase


Upload: shashwat-shriparv

Post on 20-May-2015

TRANSCRIPT

Page 1: H base

dwivedishashwat@gmail.com

The term planet-size web application comes to mind, and in this case it is fitting.

Page 2: H base

WHAT IS IT?

It is the Hadoop database,

Scalable

Distributed

Big data store

Column Oriented

[Diagram labels: HBase, HDFS, Reader, Writer]

Page 3: H base

FEATURES OF HBASE

Scalable.

Automatic failover

Consistent reads and writes.

Sharding of tables

Failover support

Classes for backing Hadoop MapReduce jobs

Java API for client access

Thrift gateway and a RESTful web service

Shell support

Page 4: H base

WHAT IT IS NOT

No SQL

No relations

No joins

Not a replacement for an RDBMS

Page 5: H base

INTRODUCTION

NoSQL

HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which supports SQL as its primary access language.

When to consider using it

HBase isn't suitable for every problem. It pays off when there is a lot of data; for smaller data sets, an RDBMS is the better choice.

Difference between HDFS and HBase

HDFS is a distributed file system that is well suited to storing large files. Its documentation states, however, that it is not a general-purpose file system and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
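To make the lookup difference concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not HBase or HDFS code: `scan_lookup` and `sorted_lookup` are hypothetical helpers, and the assumption is simply that finding a record in an unindexed file means reading from the start, while a store that keeps row keys sorted can search them directly.

```python
import bisect

def scan_lookup(records, key):
    """Linear scan, as when reading an unindexed file from the start."""
    steps = 0
    for k, v in records:
        steps += 1
        if k == key:
            return v, steps
    return None, steps

def sorted_lookup(keys, values, key):
    """Binary search over row keys that are kept in sorted order."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None

# 100,000 illustrative records with zero-padded, sortable row keys.
records = [(f"row{i:06d}", i * 10) for i in range(100_000)]
keys = [k for k, _ in records]
values = [v for _, v in records]

# The scan touches every record before reaching the last key;
# the sorted lookup needs only about 17 comparisons for the same key.
value, steps = scan_lookup(records, "row099999")
found = sorted_lookup(keys, values, "row099999")
```

The point of the sketch is only the access pattern: keyed, sorted storage avoids reading everything to find one record.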

Page 6: H base

THINK ON THIS

Facebook, for example, adds and processes more than 15 TB of data daily.

Google adds and processes petabytes of data.

Companies store logs, temperature readings, and many other kinds of data that they need to process. This data runs into petabytes, a scale at which conventional technologies would take days just to read it, let alone process it.

Page 7: H base

WHAT COLUMN-ORIENTED MEANS

Data is grouped by columns.

The reason to store values on a per-column basis is the assumption that, for specific queries, not all of the values are needed.

Reduced I/O
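The reduced I/O can be illustrated with a small sketch. This is a toy in-memory model and not HBase's actual storage format; the sample records and the helper names (`to_columns`, `average_age_row_store`, `average_age_column_store`) are made up for illustration. It counts how many stored values each layout has to touch to answer a query that needs only one column.

```python
# Three illustrative records, stored row by row.
rows = [
    {"user": "alice", "city": "Pune", "age": 31},
    {"user": "bob", "city": "Delhi", "age": 27},
    {"user": "carol", "city": "Mumbai", "age": 45},
]

def to_columns(rows):
    """Regroup row-wise records so each column's values sit together."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

def average_age_row_store(rows):
    """Row layout: every value of every row is read to reach 'age'."""
    values_read = sum(len(row) for row in rows)
    total = sum(row["age"] for row in rows)
    return total / len(rows), values_read

def average_age_column_store(columns):
    """Column layout: only the 'age' column is read."""
    ages = columns["age"]
    return sum(ages) / len(ages), len(ages)

row_avg, row_reads = average_age_row_store(rows)
col_avg, col_reads = average_age_column_store(columns)
```

Both functions return the same average, but the row layout reads nine values while the column layout reads three.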

Page 8: H base

COMPONENTS

Page 9: H base

HMASTER

The Master server is responsible for monitoring all RegionServer instances in the cluster and is the interface for all metadata changes. It typically runs on the server that hosts the NameNode.

Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a time without the Master, the Master should be restarted as soon as possible.
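As a rough illustration of what RegionServer failover involves, the sketch below reassigns the regions of a failed server to the surviving ones. It is a toy model only: `reassign_regions`, the server names, and the round-robin policy are assumptions made for the example, not HBase's actual assignment logic.

```python
def reassign_regions(assignments, failed_server):
    """Return a new assignment with the failed server's regions
    spread round-robin over the surviving servers (assumed policy)."""
    survivors = sorted(s for s in assignments if s != failed_server)
    if not survivors:
        raise RuntimeError("no live region servers left")
    new_assignments = {s: list(assignments[s]) for s in survivors}
    for i, region in enumerate(assignments.get(failed_server, [])):
        new_assignments[survivors[i % len(survivors)]].append(region)
    return new_assignments

# Hypothetical cluster state: server name -> regions it serves.
assignments = {
    "rs1": ["region-a", "region-b"],
    "rs2": ["region-c"],
    "rs3": ["region-d", "region-e"],
}

# Simulate rs3 failing; its two regions move to rs1 and rs2.
after = reassign_regions(assignments, "rs3")
```

The input mapping is left unchanged, so the before and after states can be compared.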

Page 10: H base

REGIONSERVERS

A RegionServer is responsible for serving and managing regions; it is like a DataNode for HBase.

RegionServers can be thought of as the counterpart of DataNodes in a Hadoop cluster. They serve client requests for data.

They handle the actual data storage and requests.

Each RegionServer hosts regions, and each region holds a contiguous range of a table's rows.

RegionServers are usually configured to run on the same servers as the HDFS DataNodes. Running a RegionServer on the DataNode server also has the advantage of data locality.
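The idea that each region covers a range of row keys, and that a request goes to whichever server hosts the matching region, can be sketched as follows. This is a toy model under stated assumptions: the `REGIONS` table, the split points, the server names, and the `locate` helper are all hypothetical and do not reflect how the HBase client actually resolves regions.

```python
import bisect

# (start_key, server) pairs sorted by start key.
# An empty start key marks the first region of the table.
REGIONS = [
    ("", "regionserver-1"),
    ("g", "regionserver-2"),
    ("p", "regionserver-3"),
]
START_KEYS = [start for start, _ in REGIONS]

def locate(row_key):
    """Return the server whose region's key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    i = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[i][1]

apple_server = locate("apple")
g_server = locate("g")
zebra_server = locate("zebra")
```

With these made-up split points, keys before "g" go to the first server, keys from "g" up to "p" to the second, and the rest to the third.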

Page 11: H base

ZOOKEEPER

ZooKeeper is open source software that provides a highly reliable, distributed coordination service.

It is the entry point for an HBase system.

It tracks the region servers and where the root region is hosted.

Page 12: H base

API

Interface to HBase

Using these, we can access HBase and perform reads, writes, and other operations on it.

REST, Thrift, and Avro

Thrift is a framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages.

Page 13: H base

QUERIES ?