Apache HBASE
CIS 612
Sunnie Chung
HBase
• Distributed, column-oriented database built on top of Hadoop/HDFS
• Provides low-latency access to single rows from billions of records
• Column-oriented: suited to OLAP-style workloads
  • Best for aggregation
  • High compression rate when a column has few distinct values
• No fixed schema or data types: only column families are declared up front, and values are stored as uninterpreted byte arrays
• Built for wide tables: millions of columns, billions of rows
• Denormalized data
• Master-slave architecture
HBase System Overview
HBase Architecture
HMaster Server
• Plays a role similar to the NameNode in HDFS
• Manages and monitors HBase cluster operations
• Assigns regions to Region Servers
• Handles load balancing and region splitting
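The assignment role described above can be illustrated with a small sketch. It is not HBase's balancer: the round-robin policy, the function name `assign_regions`, and the region and server names are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_regions(regions, servers):
    """Spread regions over servers round-robin (a toy policy, not HBase's balancer)."""
    if not servers:
        raise ValueError("at least one region server is required")
    assignment = defaultdict(list)
    for i, region in enumerate(regions):
        # Region i goes to server i mod len(servers), so counts differ by at most one.
        assignment[servers[i % len(servers)]].append(region)
    return dict(assignment)


# Hypothetical region and server names.
plan = assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs-a", "rs-b"])
print(plan)
```

A real master also has to react to server failures and uneven load; the sketch only shows the basic idea that one process decides which server hosts which region.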
Region Server
• Plays a role similar to a DataNode in HDFS
• Highly scalable
• Handles read/write requests
• Communicates directly with clients
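For a client to talk to a Region Server directly, it first has to work out which region holds a given row. A minimal sketch of that lookup follows, assuming each region covers a contiguous, sorted range of row keys. The start keys, server names, and the `locate` helper are hypothetical, and the real client consults HBase's own metadata rather than a hard-coded list.

```python
import bisect

# Hypothetical layout: each region starts at a row key and runs up to the next start key.
REGION_START_KEYS = ["", "g", "n", "t"]            # sorted; "" marks the start of the table
REGION_SERVERS = ["rs-a", "rs-b", "rs-a", "rs-c"]  # server hosting each region


def locate(row_key):
    """Return (region index, server) for the region whose range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return idx, REGION_SERVERS[idx]


print(locate("apple"))
print(locate("zebra"))
```

Because the start keys are sorted, the lookup is a binary search, which is one reason row key design matters so much for access patterns.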
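Internal Architecture
• Tables are split into Regions
• Store (one per column family in each region)
• MemStore (in-memory write buffer)
• StoreFiles (HFiles), made up of Blocks
• Column Families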
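The relationship between a MemStore and its on-disk files can be sketched as a toy model, assuming the usual write path: writes are buffered in memory and flushed as immutable, sorted files. The class name `Store`, the cell-count threshold, and the single-value-per-row simplification are assumptions; real HBase flushes by size in bytes and stores versioned cells.

```python
class Store:
    """Toy model of one Store: a MemStore plus a list of flushed, sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed: a count of rows, not bytes
        self.memstore = {}                      # in-memory write buffer
        self.store_files = []                   # each file: sorted list of (row_key, value)

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as one immutable, sorted file and clear it."""
        if self.memstore:
            self.store_files.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        """Check the MemStore first, then files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):
            for key, value in store_file:
                if key == row_key:
                    return value
        return None


store = Store(flush_threshold=2)
store.put("b", 1)
store.put("a", 2)   # reaches the threshold and triggers a flush
store.put("a", 3)   # newer value stays in the MemStore
print(store.store_files, store.memstore, store.get("a"))
```

Reads consult the newest data first, so an update that has not yet been flushed shadows the older value on disk.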
Apache HBase Architecture
• HBase is composed of three main components in a master-slave architecture.
• Region Servers serve data for reads and writes.
• Region assignment and DDL operations (creating and deleting tables) are handled by the HBase Master process.
• ZooKeeper, a separate Apache coordination service (not a part of HDFS), maintains live cluster state.
Contd…
Apache HBase storage structure
HBase consists of:
• A set of tables
• Each table has column families and rows
• The row key acts as the primary key; any access to an HBase table uses this key
• Each column qualifier denotes an attribute of the object stored in the cell
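The storage structure above (row key, then column family, then column qualifier, then cell value) can be pictured as nested maps. The sketch below is only a mental model in plain Python, not the HBase client API. The table contents and the helper names `put` and `get` are hypothetical, and cell versions (timestamps) are left out.

```python
# row key -> column family -> column qualifier -> value (bytes)
table = {}


def put(row_key, family, qualifier, value):
    """Store one cell, creating the row and family maps on demand."""
    table.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value


def get(row_key, family, qualifier):
    """Every lookup starts from the row key; missing cells yield None."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)


# Hypothetical rows: qualifiers can differ from row to row within a family.
put("user#42", "info", "name", b"Ada")
put("user#42", "info", "email", b"ada@example.org")
put("user#42", "stats", "logins", b"7")
put("user#43", "info", "name", b"Grace")

print(get("user#42", "info", "name"))
print(get("user#43", "info", "email"))
```

Values are kept as bytes here because HBase does not interpret cell contents; any typing is up to the application.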
HBase HFile and Indexing
Characteristics of HBase
• Fault tolerant
  • Replication across the data center
  • Atomic and strongly consistent row-level operations
  • High availability through automatic failover
  • Automatic sharding and load balancing of tables
Characteristics of HBase
• Fast
  • Near-real-time lookups
  • In-memory caching via the block cache and Bloom filters
  • Server-side processing via filters and coprocessors
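A Bloom filter speeds up lookups by answering "definitely not present" without reading a file. The sketch below shows the general technique only. The class, its parameters, and the SHA-256-based hashing are assumptions for illustration and do not reflect how HBase implements its Bloom filters.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("row-1")
print(bloom.might_contain("row-1"))
```

If `might_contain` returns False for a row key, a reader can skip the corresponding file entirely, which is where the speed-up comes from.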
Applications
• Adobe
• Airbnb uses HBase as part of its Airstream real-time stream computation framework.
• Facebook uses HBase for its messaging platform.
• Flurry
• Imgur uses HBase to power its notifications system.
• Netflix
• Rocket Fuel
• Spotify uses HBase as a base for Hadoop and machine learning jobs.
• Sears
• Yahoo!
Apache ZooKeeper
ZooKeeper
• What is ZooKeeper?
  • Distributed coordination service for distributed applications
  • Like a centralized repository
• Challenges for Distributed Applications
  • Coordination
  • Race conditions
  • Deadlocks
  • Partial failure
  • Inconsistency
• ZooKeeper Goals
  • Serialization
  • Atomicity
  • Reliability
  • Simple API
ZooKeeper Architecture
Introduction to ZooKeeper
• ZooKeeper: a software service for distributed environments that coordinates and configures different machines in a centralized way.
• A change is not considered successful until it has been written to a quorum (a majority) of the servers in the ensemble.
• A leader is elected within the ensemble to order updates and resolve conflicts.
• In HBase, ZooKeeper coordinates and shares state between the Masters and RegionServers.
• Tagline: "Enables highly reliable distributed coordination"
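The quorum rule can be made concrete with a short sketch, assuming a quorum means a strict majority of the ensemble. The function names and the list-of-booleans representation of server health are hypothetical.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def write_succeeds(server_up):
    """server_up holds one boolean per ensemble member (True = reachable).

    A change counts as successful only if a majority can acknowledge it.
    """
    acks = sum(1 for up in server_up if up)
    return acks >= quorum_size(len(server_up))


print(write_succeeds([True, True, False]))   # 2 of 3 reachable
print(write_succeeds([True, False, False]))  # 1 of 3 reachable
```

With three servers the ensemble keeps accepting changes when one is down, but not when two are.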
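ZooKeeper Architecture
• An ensemble normally has an odd number of nodes, so that a majority quorum tolerates the most failures for its size.
• The leader is elected by voting.
• Both the leader and followers accept client connections and serve read operations.
• Write operations are committed only through the leader; followers forward client writes to it.
• Observer nodes (non-voting members) address scaling problems.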
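The preference for an odd number of nodes follows from majority arithmetic, which a few lines make visible. This is a sketch assuming a strict-majority quorum; the function name `tolerated_failures` is made up for the example.

```python
def tolerated_failures(ensemble_size):
    """How many servers can fail while a strict majority is still available."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for n in range(3, 8):
    print(f"ensemble={n} majority={n // 2 + 1} tolerated_failures={tolerated_failures(n)}")
```

Going from 3 to 4 servers, or from 5 to 6, does not increase the number of failures the ensemble survives, so the extra even-numbered server adds cost without adding fault tolerance.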
ZooKeeper Data Model
• ZNodes:
  • Similar to directories in a file system
  • Containers for data and other nodes
  • Store stat metadata (such as version numbers and timestamps) and user data of up to 1 MB
  • Used to store and share configuration information between applications
ZNode Types
• Persistent nodes
• Ephemeral nodes
• Sequential nodes
• Watch: event system for client notification
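The differences between these node types, and the role of watches, can be illustrated with an in-memory toy. It is not the ZooKeeper client API: the class `ZNodeTree`, its methods, and the paths are hypothetical. The sketch assumes ephemeral nodes disappear when their session ends, sequential nodes get a zero-padded counter appended to their name, and a watch fires once. The counter is kept per path prefix here as a simplification.

```python
class ZNodeTree:
    """In-memory toy of znode behaviour; paths are plain strings."""

    def __init__(self):
        self.nodes = {}     # path -> (data, owning session or None)
        self.counters = {}  # path prefix -> next sequence number
        self.watches = {}   # path -> list of one-shot callbacks

    def create(self, path, data=b"", session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("ephemeral znodes need an owning session")
        if sequential:
            seq = self.counters.get(path, 0)
            self.counters[path] = seq + 1
            path = f"{path}{seq:010d}"  # append a 10-digit counter to the name
        self.nodes[path] = (data, session if ephemeral else None)
        return path

    def watch(self, path, callback):
        """Register a one-shot callback fired when the node is deleted."""
        self.watches.setdefault(path, []).append(callback)

    def delete(self, path):
        del self.nodes[path]
        for callback in self.watches.pop(path, []):
            callback(path)

    def close_session(self, session):
        """Ending a session removes every ephemeral node it owns."""
        owned = [p for p, (_, owner) in self.nodes.items() if owner == session]
        for path in owned:
            self.delete(path)


tree = ZNodeTree()
tree.create("/config", b"replicas=3")                      # persistent
server = tree.create("/rs/server-", session="s1",
                     ephemeral=True, sequential=True)      # ephemeral + sequential
events = []
tree.watch(server, events.append)
tree.close_session("s1")                                   # server node vanishes, watch fires
print(server, sorted(tree.nodes), events)
```

This pattern (an ephemeral node per live process plus a watch on it) is a common way to detect that a process has gone away.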
Projects & Tools on Hadoop
• HBase
• Hive
• Pig
• Jaql
• ZooKeeper
• Avro
• UIMA
• Sqoop