Gowtham Rajappan
HDFS – Hadoop Distributed File System, modeled on Google GFS.
Hadoop MapReduce – Similar to Google MapReduce.
HBase – Similar to Google Bigtable.
Master: hadoop01.cselabs.umn.edu
Slaves: hadoop02 – hadoop05.cselabs.umn.edu
You will need a cselabs account to access this cluster.
You can log in to any of these machines from any cs/cselabs machine.
Data is divided into tables.
A table is composed of columns, and columns are grouped into column families.
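One way to picture this layout is as nested maps: table, then row key, then column family, then column qualifier, then value. The sketch below is plain Python rather than the HBase client API; the table name `users`, the families `info` and `stats`, and the helpers `new_table` and `put_cell` are hypothetical names chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory model of an HBase-style table (not the HBase API).
# Layout: table["rows"][row_key][column_family][qualifier] -> value
def new_table(families):
    """Create an empty table restricted to a fixed set of column families."""
    return {
        "families": set(families),
        "rows": defaultdict(lambda: defaultdict(dict)),
    }

def put_cell(table, row_key, column, value):
    """Store a value under 'family:qualifier'; the family must already exist."""
    family, qualifier = column.split(":", 1)
    if family not in table["families"]:
        raise KeyError(f"unknown column family: {family}")
    table["rows"][row_key][family][qualifier] = value

users = new_table(["info", "stats"])
put_cell(users, "row1", "info:name", "alice")
put_cell(users, "row1", "info:email", "alice@example.com")
put_cell(users, "row1", "stats:logins", 3)

print(dict(users["rows"]["row1"]["info"]))
```

In this model the column families are fixed when the table is created, while qualifiers inside a family can be added freely per row, which is the usual way the grouping is described.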
Partitioning:
A table is horizontally partitioned into regions; each region holds a contiguous range of row keys.
Each region is managed by a RegionServer; a single RegionServer may hold multiple regions.
Persistence and data availability:
HBase stores its data in HDFS. It does not replicate RegionServers; it relies on HDFS replication for data availability.
Region data is cached in memory.
Updates and reads are served from the in-memory cache (MemStore).
The MemStore is flushed periodically to HDFS.
A write-ahead log (stored in HDFS) is used for durability of updates.
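The two ideas above, routing a row key to the region that owns its key range and logging an update before buffering it in memory, can be sketched with a toy model. This is an illustration under simplifying assumptions, not how HBase is implemented: the classes `Region` and `Table`, the split keys, and the fallback from the MemStore to flushed files on reads are all hypothetical, and Python lists stand in for files in HDFS.

```python
import bisect

# Toy model only: real HBase regions, RegionServers, and HDFS are far richer.
class Region:
    def __init__(self, start_key):
        self.start_key = start_key   # inclusive lower bound of the key range
        self.wal = []                # stand-in for the write-ahead log in HDFS
        self.memstore = {}           # in-memory buffer of recent updates
        self.store_files = []        # stand-in for files flushed to HDFS

    def put(self, key, value):
        self.wal.append((key, value))    # log first, for durability
        self.memstore[key] = value       # then apply the update in memory

    def flush(self):
        """Write the MemStore out as a sorted file and clear it."""
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for stored in reversed(self.store_files):   # newest flushed file first
            if key in stored:
                return stored[key]
        return None

class Table:
    def __init__(self, split_keys):
        # Regions cover ["", k1), [k1, k2), ... in sorted key order.
        self.starts = [""] + sorted(split_keys)
        self.regions = [Region(s) for s in self.starts]

    def region_for(self, key):
        # Pick the last region whose start key is <= key.
        return self.regions[bisect.bisect_right(self.starts, key) - 1]

table = Table(split_keys=["g", "p"])
table.region_for("apple").put("apple", 1)
table.region_for("kiwi").put("kiwi", 2)
table.region_for("kiwi").flush()
table.region_for("kiwi").put("kiwi", 3)
print(table.region_for("kiwi").start_key, table.region_for("kiwi").get("kiwi"))
```

Because each region covers a sorted, contiguous key range, a binary search over the region start keys is enough to find the owner of any row key.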
The HBase shell provides interactive commands for manipulating the database:
Create/delete tables
Insert/update/read from tables
Manage regions
HBase provides single-row atomic operations:
checkAndPut – similar to test-and-set
checkAndDelete
All row operations are atomic, no matter how many columns are involved.
HBase also provides row-level exclusive locks. You can use these locks to implement single-row transactions.
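The test-and-set behaviour of these operations can be illustrated with a small simulation. This is a sketch of the semantics only, not the HBase client API: the class `Row`, its methods `check_and_put` and `check_and_delete`, and the column name `cf:owner` are hypothetical, and a `threading.Lock` stands in for the row-level lock.

```python
import threading

# Hypothetical single-row store; illustrates compare-and-set semantics only.
class Row:
    def __init__(self):
        self._lock = threading.Lock()   # stands in for the row-level lock
        self._cells = {}                # column name -> value

    def get(self, column):
        with self._lock:
            return self._cells.get(column)

    def check_and_put(self, check_col, expected, put_col, value):
        """Write put_col=value only if check_col currently equals expected.

        expected=None means "the column must not exist yet".
        """
        with self._lock:
            if self._cells.get(check_col) != expected:
                return False
            self._cells[put_col] = value
            return True

    def check_and_delete(self, check_col, expected, del_col):
        """Delete del_col only if check_col currently equals expected."""
        with self._lock:
            if self._cells.get(check_col) != expected:
                return False
            self._cells.pop(del_col, None)
            return True

row = Row()
# Two clients race to claim the row; only the first check succeeds.
first = row.check_and_put("cf:owner", None, "cf:owner", "client-1")
second = row.check_and_put("cf:owner", None, "cf:owner", "client-2")
print(first, second, row.get("cf:owner"))
```

The check and the write happen under one lock, so no other writer can change the checked column between the comparison and the update. That is the property that makes the operation usable as a building block for single-row transactions.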
HBase stores multiple versions of a column in a row. Each version is identified by an integer timestamp.
By default, the system time is used as the version timestamp. However, the user can specify a logical timestamp for versioning.
Each update to a row creates a new version of the specified column.
A version can be accessed or deleted using its timestamp. HBase also allows you to obtain a list of all the versions.
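The versioning rules above can be modelled for a single cell (one row and one column) as a map from timestamp to value. This is a simplified sketch, not the HBase API: the class `VersionedCell` and its methods are hypothetical, and it ignores details such as any limit on the number of versions kept.

```python
import time

# Hypothetical model of one versioned cell (row + column); not the HBase API.
class VersionedCell:
    def __init__(self):
        self._versions = {}   # integer timestamp -> value

    def put(self, value, timestamp=None):
        # Default to system time in milliseconds; callers may pass a logical timestamp.
        ts = int(time.time() * 1000) if timestamp is None else timestamp
        self._versions[ts] = value
        return ts

    def get(self, timestamp=None):
        """Return the value at a timestamp, or the newest value if none is given."""
        if not self._versions:
            return None
        if timestamp is None:
            timestamp = max(self._versions)
        return self._versions.get(timestamp)

    def delete(self, timestamp):
        """Remove one specific version, leaving the others in place."""
        self._versions.pop(timestamp, None)

    def versions(self):
        """All (timestamp, value) pairs, newest first."""
        return sorted(self._versions.items(), reverse=True)

cell = VersionedCell()
cell.put("v1", timestamp=1)   # logical timestamps supplied by the caller
cell.put("v2", timestamp=2)
cell.put("v3", timestamp=3)
cell.delete(2)
print(cell.get(), cell.get(1), cell.versions())
```

Logical timestamps are used in the example so the result does not depend on the clock; omitting `timestamp` falls back to the system time.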
Hadoop Home - http://hadoop.apache.org/
HBase - http://hbase.apache.org/
API - http://hbase.apache.org/apidocs/ http://hadoop.apache.org/