HBase - Computer Science Center
• Overview
• Table layout
• Architecture
• Client API
• Key design
Overview
• NoSQL
• Column oriented
• Versioned
• All rows ordered by row key
• All cells are uninterpreted arrays of bytes
• Auto-sharding
• Distributed
• HDFS based
• Highly fault tolerant
• CP/CAP
• Coordinates: <RowKey, CF:CN, Version>
• Data is grouped by column family
• Null values for free
• No actual deletes (tombstones)
• TTL for cells
Disadvantages
• No secondary indexes out of the box
  • Available through additional tables and coprocessors
• No transactions
  • CAS support, row locks, counters
• Not good for multiple data centers
  • Master-slave replication
• No complex query processing
Table layout
Architecture
Terms
• Region
• Region server
• WAL
• Memstore
• StoreFile/HFile
Architecture
• One master (with backup), many slaves (region servers)
• Very tight integration with ZooKeeper
• Client calls ZK, RS, Master
Master
• Handles schema changes
• Cluster housekeeping operations
• Not a SPOF
• Checks RS status through ZK
Region server
• Stores data in regions
• Region is a container of data
• Does actual data manipulation
• Usually runs alongside an HDFS DataNode
ZooKeeper
• Distributed synchronization service
• File-system-like tree structure that holds keys
• Provides primitives for coordination tasks (e.g. locks, leader election)
Storage
• Handled by region server
• Memstore: sorted write buffer
• Two types of files:
  • WAL (write-ahead log)
  • Data storage (HFiles)
Client connection
• Asks ZK for the address of the -ROOT- region
• From -ROOT-, gets the RS hosting .META.
• From .META., gets the address of the RS serving the row
• Caches all of this data
Write path
• Write to HLog (one per RS)
• Write to Memstore (one per CF per region)
• If the Memstore is full, a flush is requested
• The request is processed asynchronously by another thread
On Memstore flush
• On flush, the Memstore is written to a new file on HDFS
• The last operation sequence number is stored
• Memstore is cleared
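The write path and flush steps can be sketched as a toy in-memory model. This is only an illustration, not HBase code: the class `ToyStore`, its fields, and the entry-count threshold are hypothetical, and the flush here runs synchronously, whereas the slides say the real flush is handled asynchronously by another thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy model (hypothetical, not HBase code): WAL append, sorted Memstore, flush to "store files". */
class ToyStore {
    private final int flushThreshold;                    // assumed limit: entries held before a flush
    private long sequence = 0;                           // sequence number of the last operation
    final List<String> wal = new ArrayList<>();          // append-only log, written first
    TreeMap<String, String> memstore = new TreeMap<>();  // sorted write buffer
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>();
    long lastFlushedSequence = -1;                       // stored on every flush

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        sequence++;
        wal.add(sequence + ":" + key + "=" + value);     // 1. write to the log
        memstore.put(key, value);                        // 2. write to the Memstore
        if (memstore.size() >= flushThreshold) {
            flush();                                     // 3. synchronous here, for simplicity
        }
    }

    void flush() {
        if (memstore.isEmpty()) {
            return;
        }
        storeFiles.add(memstore);                        // the sorted buffer becomes one store file
        lastFlushedSequence = sequence;                  // remember the last operation it contains
        memstore = new TreeMap<>();                      // the Memstore is cleared
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore(2);
        store.put("row-b", "1");
        store.put("row-a", "2");
        System.out.println("store files: " + store.storeFiles);
        System.out.println("last flushed sequence: " + store.lastFlushedSequence);
    }
}
```

Because the Memstore is a sorted map, each flushed file is already ordered by key, which is what later makes merging store files cheap.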
Region splits
• Region is closed
• Creates directory structure for new regions (multithreaded)
• .META. is updated (parent and daughters)
• Master is notified for load balancing
• All steps are tracked in ZK
• Parent cleaned up eventually
Compaction
• Each Memstore flush creates a new store file
• Compaction is a housekeeping mechanism that merges store files
• Comes in two varieties: minor and major
• The type is determined when the compaction check is executed
Minor compaction
• Fast
• Selects files to include
• Params: max size and min number of files
• From the oldest to the newest
Major compaction
• Compact all files into a single file
• Might be promoted from the minor compaction
• Starts periodically (every 24 hours by default)
• Processes tombstone markers and removes the deleted data
HFile format
• Immutable (trailer is written once)
• Trailer has pointers to other blocks
• Indexes store offsets to data, metadata
• Block size is 64 KB by default
• Each block contains a magic header and a number of KeyValues
KeyValue format
Write-Ahead log
• One per region server
• Contains data from several regions
• Stores data sequentially
• Rolled periodically (every hour by default)
• Applied on cluster restart and on server failure
• Log splitting is distributed
Read path
• There is no single index with logical rows
• Data is stored in several files and Memstore per column family
• Gets are the same as scans
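Since a row's data can sit in the Memstore and in several store files at once, a read has to consult all of them. The sketch below is a deliberately simplified toy, not HBase code: `ToyReadPath` is a hypothetical name, and it assumes the sources are ordered newest first and that the first hit wins, whereas HBase merges KeyValues by key and timestamp.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy read path (hypothetical): newest source wins, Memstore first, then store files. */
class ToyReadPath {
    static Optional<String> get(String key,
                                Map<String, String> memstore,
                                List<Map<String, String>> storeFilesNewestFirst) {
        // The Memstore holds the most recent writes, so check it first.
        if (memstore.containsKey(key)) {
            return Optional.of(memstore.get(key));
        }
        // Then walk the store files from the newest to the oldest.
        for (Map<String, String> file : storeFilesNewestFirst) {
            if (file.containsKey(key)) {
                return Optional.of(file.get(key));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> memstore = Map.of("row2", "fresh");
        List<Map<String, String>> files = List.of(
                Map.of("row1", "v2"),                  // newer file
                Map.of("row1", "v1", "row3", "old"));  // older file
        System.out.println(get("row1", memstore, files));
        System.out.println(get("row2", memstore, files));
    }
}
```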
Block cache
• Each region server has an LRU block cache
• Based on priorities:
  1. Single access
  2. Multi access
  3. In-memory (catalog CFs: -ROOT- and .META.)
• Size of cache:
  • number of region servers × heap size × hfile.block.cache.size × 0.85
  • 1 RS with 1 GB heap = 217 MB (with hfile.block.cache.size = 0.25)
  • 20 RS with 8 GB heap = 34 GB
  • 100 RS with 24 GB heap and hfile.block.cache.size = 0.5 ≈ 1 TB
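The three figures above can be reproduced with a few lines of arithmetic. The sketch assumes a block cache fraction of 0.25 for the first two cases, which is the value the quoted numbers imply; the class and method names are illustrative only.

```java
/** Reproduces the block cache sizing arithmetic; names are illustrative. */
class BlockCacheSize {
    static final double USABLE_FRACTION = 0.85;  // the constant factor from the formula

    /** Returns the total cache size in the same unit as heapPerServer. */
    static double cacheSize(int regionServers, double heapPerServer, double blockCacheFraction) {
        return regionServers * heapPerServer * blockCacheFraction * USABLE_FRACTION;
    }

    public static void main(String[] args) {
        // Assumption: 0.25 is the hfile.block.cache.size used for the first two cases.
        System.out.printf("1 RS, 1 GB heap:     %.1f MB%n", cacheSize(1, 1024, 0.25));
        System.out.printf("20 RS, 8 GB heap:    %.1f GB%n", cacheSize(20, 8, 0.25));
        System.out.printf("100 RS, 24 GB heap:  %.1f GB%n", cacheSize(100, 24, 0.5));
    }
}
```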
• Always in cache:
• Catalog tables
• HFiles indexes
• Keys (row keys, CF, TS)
Client API
CRUD
• All operations go through an HTable object
• All operations are atomic per row
• Row locks are available
• Batching is available
• Looks like this:
HTable table = new HTable("table");
table.put(new Put(...));
Put
• Put(byte[] row)
• Put(byte[] row, RowLock rowLock)
• Put(byte[] row, long ts)
• Put(byte[] row, long ts, RowLock rowLock)
• Put add(byte[] family, byte[] qualifier, byte[] value)
• Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
• Put add(KeyValue kv)
• setWriteToWAL()
• heapSize()
Write buffer
• There is a client-side write buffer
• Groups data by region server
• 2 MB by default
• Controlled by:
table.setAutoFlush(false);
table.flushCommits();
What else
• CAS:
boolean checkAndPut(byte[] row, byte[] family,
    byte[] qualifier, byte[] value, Put put) throws IOException
• Batch:
void put(List<Put> puts) throws IOException
Get
• Returns strictly one row
Result res = table.get(new Get(row));
• Can ask if row exists
boolean e = table.exists(new Get(row));
• Filter can be applied
Row lock
• Region servers provide row locks
• Client can ask for an explicit lock:
RowLock lockRow(byte[] row) throws IOException;
void unlockRow(RowLock rl) throws IOException;
• Do not use if you do not have to
Scan
• Iterator over a number of rows
• Start and stop rows can be defined
• Leased for a limited amount of time
• Batching and caching are available
Scan
final Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("colfam1"));
scan.setBatch(5);
scan.setCaching(5);
scan.setMaxVersions(1);
scan.setTimeStamp(0);
scan.setStartRow(Bytes.toBytes(0));
scan.setStopRow(Bytes.toBytes(1000));
ResultScanner scanner = table.getScanner(scan);
for (Result res : scanner) {
    ...
}
scanner.close();
• Caching is for rows
• Batching is for columns
• ResultScanner.next() returns the next set of batched columns (possibly more than one Result per row)
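How caching and batching interact can be estimated with simple arithmetic. The sketch below is an approximation under stated assumptions: every row has the same number of columns, each row is split into Results of at most `batch` columns, and one fetch carries up to `caching` Results (the assumption also made by the formula in HBase: The Definitive Guide). It ignores the extra calls for opening and closing the scanner; the class and method names are hypothetical.

```java
/** Rough estimate of Results and data-carrying fetches for a scan; names are hypothetical. */
class ScanMath {
    /** Each row is split into chunks of at most `batch` columns; one Result per chunk. */
    static long results(long rows, long colsPerRow, long batch) {
        long chunksPerRow = (colsPerRow + batch - 1) / batch;  // ceiling division
        return rows * chunksPerRow;
    }

    /** Fetches that carry data, assuming each one returns up to `caching` Results. */
    static long dataFetches(long results, long caching) {
        return (results + caching - 1) / caching;              // ceiling division
    }

    public static void main(String[] args) {
        long rows = 10, colsPerRow = 20;
        long r = results(rows, colsPerRow, 5);
        System.out.println("Results: " + r + ", data fetches: " + dataFetches(r, 5));
    }
}
```

A small batch keeps each Result small for very wide rows, while a larger caching value trades client memory for fewer round trips.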
Counters
• Any column can be treated as a counter
• Atomic increment without locking
• Can be applied for multiple rows
• Looks like this:
long cnt = table.incrementColumnValue(Bytes.toBytes("20131028"),
    Bytes.toBytes("daily"), Bytes.toBytes("hits"), 1);
HTable pooling
• Creating an HTable instance is expensive
• HTable is not thread safe
• Should be created on start up and closed on application exit
• Provided pool: HTablePool
• Thread safe
• Shares common resources
• HTablePool(Configuration config, int maxSize)
• HTableInterface getTable(String tableName)
• void putTable(HTableInterface table)
• void closeTablePool(String tableName)
Key design
• The only index is row key
• Row keys are sorted
• The key is an uninterpreted array of bytes
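Because keys are compared as raw bytes, numeric ids stored as text do not sort numerically. The sketch below shows the effect and one common workaround, fixed-width zero padding; `RowKeyOrder` and `padded` are hypothetical helpers, and the comparison uses the JDK's `Arrays.compareUnsigned` (Java 9+) as a stand-in for byte-wise key ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

/** Shows byte-wise (lexicographic) ordering of keys; helper names are hypothetical. */
class RowKeyOrder {
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /** Left-pads a non-negative id with zeros so that text order matches numeric order. */
    static String padded(long id, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // As plain text, "10" sorts before "9" because '1' < '9'.
        System.out.println(compare(utf8("10"), utf8("9")) < 0);
        // With fixed-width padding, "0009" sorts before "0010".
        System.out.println(compare(utf8(padded(9, 4)), utf8(padded(10, 4))) < 0);
    }
}
```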
Email example
Approach #1: user id is row key, all messages in one cell
<userId> : <cf> : <col> : <ts> : <messages>
12345 : data : msgs : 1307097848 : ${msg1}, ${msg2}...
• All in one request
• Hard to find anything
• Users with a lot of messages
Approach #2: user id is row key, message id is the column qualifier
<userId> : <cf> : <msgId> : <timestamp> : <email-message>
12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi, ..."
12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..."
12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..."
12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..."
• Can load data on demand
• Find anything only with filter
• Users with a lot of messages
Approach #3: row key is <userId>-<msgId>
<userId> - <msgId>: <cf> : <qualifier> : <timestamp> : <email-message>
12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi, ..."
12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."
• Can load data on demand
• Find message with row key
• Users with a lot of messages
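With the composite key of approach #3, all messages of one user form a contiguous key range, so they can be read with a start and stop row. The sketch below imitates this with a sorted in-memory map instead of an HBase table; `PrefixScan` and `messagesOf` are hypothetical names, and the stop key relies on `.` being the ASCII character right after `-`.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Imitates a start/stop-row scan over composite keys with a sorted map; names are hypothetical. */
class PrefixScan {
    /** Returns all entries whose key starts with "<userId>-". */
    static SortedMap<String, String> messagesOf(TreeMap<String, String> table, String userId) {
        String startRow = userId + "-";  // inclusive
        String stopRow = userId + ".";   // exclusive: '.' (0x2E) directly follows '-' (0x2D)
        return table.subMap(startRow, stopRow);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("12345-5fc38314-e290-ae5da5fc375d", "Hi, ...");
        table.put("12345-725aae5f-d72e-f90f3f070419", "Welcome, and ...");
        table.put("12346-cc6775b3-f249-c6dd2b1a7467", "To Whom It ...");
        System.out.println(messagesOf(table, "12345").keySet());
    }
}
```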
• Go further:
<userId>-<date>-<messageId>-<attachmentId>
<userId>-(MAX_LONG - <date>)-<messageId>-<attachmentId>
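The point of storing MAX_LONG minus the date is that newer messages then sort first in byte order. A minimal sketch, assuming non-negative timestamps encoded as 8 big-endian bytes; `ReverseTimestampKey` and `reversed` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Shows that (Long.MAX_VALUE - timestamp) makes newer entries sort first; names are hypothetical. */
class ReverseTimestampKey {
    /** Encodes Long.MAX_VALUE - timestamp as 8 big-endian bytes (ByteBuffer's default order). */
    static byte[] reversed(long timestamp) {
        return ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE - timestamp).array();
    }

    public static void main(String[] args) {
        byte[] older = reversed(1307097848L);
        byte[] newer = reversed(1307099848L);
        // A negative result means the newer message comes first in byte order.
        System.out.println(Arrays.compareUnsigned(newer, older));
    }
}
```

A scan that starts at the `<userId>-` prefix therefore returns the most recent messages first, without reading the user's whole history.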
Performance
Configuration      MB/s       puts/s
nobuf, noWAL       12 MB/s    11k rows/s
buf:100, noWAL     53 MB/s    50k rows/s
buf:1000, noWAL    48 MB/s    45k rows/s
nobuf, WAL         5 MB/s     4.7k rows/s
buf:100, WAL       11 MB/s    10k rows/s
buf:1000, WAL      31 MB/s    30k rows/s
• 10 GB of data, 10 parallel clients, 1 KB per entry
• HDFS replication = 3
• 3 region servers, 2 CPUs (24 cores), 48 GB RAM, 10 GB for the application
Sources
• HBase: The Definitive Guide (O’Reilly)
• HBase in Action (Manning)
• Official documentation: http://hbase.apache.org/0.94/book.html
• Cloudera articles: http://blog.cloudera.com/blog/category/hbase/