HBase at Flurry
2013-08-20, Dave Latham

DESCRIPTION

Slides from HBase Meetup at Flurry 2013-08-20

TRANSCRIPT

Page 1: HBase at Flurry

HBase at Flurry

2013-08-20, Dave Latham

Page 2: HBase at Flurry

Overview

History
Stats
How We Store Data
Challenges
Mistakes We Made
Tips / Patterns
Future
Moral of the Story

Page 3: HBase at Flurry

History

2008 – Flurry Analytics for Mobile Apps
Sharded MySQL, or HBase!
Launched on 0.18.1 with a 3-node cluster
Great community
Now running 0.94.5 (+ patches)
2 data centers with 2 clusters each
Bidirectional replication

Page 4: HBase at Flurry

Stats – Main cluster

1000 slave nodes per cluster
32 GB RAM, 4 drives (1 or 2 TB), 1 GigE, dual quad-core * 2 HT = 16 procs
DataNode, TaskTracker, RegionServer (11 GB), 5 Mappers, 2 Reducers
~30 tables, 250k regions, 430 TB (after LZO)
2 big tables are about 90% of that
▪ 1 wide table: 3 CF, 4 billion rows, up to 1MM cells per row
▪ 1 tall table: 1 CF, 1 trillion rows, most 1 cell per row

Page 5: HBase at Flurry

Stats – Low latency cluster

12 physical nodes
5 region servers with 20 GB heaps on each
1 table – 8 billion small rows – 500 GB (LZO)
All in block cache (after 20 minute warmup)
100k-1MM QPS – 99.9% reads
2 ms mean, 99% < 10 ms
25 ms GC pause every 40 seconds
Slow after compaction

Page 6: HBase at Flurry

App Layer

DAO for Java apps (sketch below)
Requires:
▪ writeRowIndex / readRowIndex
▪ readKeyValue / writeRowContents
Provides:
▪ save / delete
▪ streamEntities / pagination
▪ MR input formats on entities (rather than Result)
Uses HTable or asynchbase
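
The slide names the codec and persistence methods but not their shapes. A minimal sketch of what the contract might look like, assuming 0.94-era client types (KeyValue, Put); the method names come from the slide, while the generics and signatures are guesses:

```java
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;

// Hypothetical shape of the DAO contract described above. The method
// names come from the slide; the generics and signatures are guesses.
public interface EntityDao<T> {

    // Required: each entity type supplies its own row key / cell codec.
    byte[] writeRowIndex(T entity);            // entity identity -> row key
    T readRowIndex(byte[] rowKey);             // row key -> (id-only) entity
    void readKeyValue(T entity, KeyValue kv);  // one cell -> one field
    Put writeRowContents(T entity);            // all fields -> one Put

    // Provided: generic persistence built on top of the codec.
    void save(T entity);
    void delete(T entity);
    Iterator<T> streamEntities(byte[] startRow, byte[] stopRow);
    List<T> page(byte[] startRow, int pageSize);
}
```

A base implementation would turn each scanned Result back into an entity by feeding its KeyValues through readKeyValue, which is presumably also how the entity-level MapReduce input formats avoid exposing raw Result objects.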

Page 7: HBase at Flurry

Migrations

Change row key format: the DAO supports both formats (sketch below)
1. Create new table
2. Writes to both
3. Migrate existing
4. Validate
5. Reads to new table
6. Write to (only) new table
7. Drop old table
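
A sketch of the dual-write phase (steps 2 and 5), assuming a DAO interface like the hypothetical EntityDao sketched earlier; the wrapper class and cut-over flag are invented for illustration:

```java
import java.util.Iterator;

// Hypothetical wrapper for the migration's dual-write phase: writes go
// to both row key formats, reads stay on the old table until the
// backfill (step 3) has been validated (step 4).
public class DualWriteDao<T> {

    private final EntityDao<T> oldTable;           // legacy row key format
    private final EntityDao<T> newTable;           // new row key format
    private volatile boolean readFromNew = false;  // flipped at step 5

    public DualWriteDao(EntityDao<T> oldTable, EntityDao<T> newTable) {
        this.oldTable = oldTable;
        this.newTable = newTable;
    }

    // Step 2: keep the new table current while old rows are migrated.
    public void save(T entity) {
        oldTable.save(entity);
        newTable.save(entity);
    }

    public void delete(T entity) {
        oldTable.delete(entity);
        newTable.delete(entity);
    }

    // Step 5: once validation passes, reads cut over to the new table
    // (range bounds are assumed to be given in the active format).
    public Iterator<T> streamEntities(byte[] startRow, byte[] stopRow) {
        EntityDao<T> dao = readFromNew ? newTable : oldTable;
        return dao.streamEntities(startRow, stopRow);
    }

    public void cutOverReads() {
        readFromNew = true;
    }
}
```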

Page 8: HBase at Flurry

Challenges – Big Cluster

Bottlenecks (not horizontally scalable):
HMaster (e.g. HLog cleaning falls behind creation [HBASE-9208])
NameNode
▪ Disable table / shutdown => many HDFS files at once
▪ Scan table directory => slow region assignments
ZooKeeper (HBase replication)
JobTracker (heap)
META region

Page 9: HBase at Flurry

Challenges – Big Cluster (cont.)

Too many regions (250k)
Max region size: 256 MB -> 1 GB -> 5 GB
Slow reassignments on failure
Slow hbck recovery
Lots of META queries / big client cache
▪ Soft refs can exacerbate
Slow rolling restarts
More failures (common and otherwise)
Zombie RS

Page 10: HBase at Flurry

Challenges – Big Cluster (cont.)

Latency long tail:
▪ HTable write buffer flush
▪ GC pauses
▪ RegionServer failure
(See "The Tail at Scale" – Jeff Dean, Luiz André Barroso)

Page 11: HBase at Flurry

More Challenges

Shared cluster for MapReduce and live queries
IO-bound requests hog handler threads
Even cached reads get slow
RegionServer falls behind, stays behind
If the cluster goes down, it takes a while to come back

Page 12: HBase at Flurry

More Challenges

HDFS-5042: Completed files lost after power failure
ZOOKEEPER-1277: Servers stop serving when lower 32 bits of zxid roll over
ZOOKEEPER-1731: Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock

Page 13: HBase at Flurry

Mistakes We Made (So You Don’t Have To)

Small region size -> many regions
Nagle's algorithm (sketch below)
Trying to solve a crisis you don't understand (hbck fixSplitParents)
Setting up replication
Custom backup / restore
▪ CopyTable OOM
▪ Verification
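
On the Nagle's item: small HBase RPCs can sit in the TCP stack for tens of milliseconds while Nagle's algorithm waits to coalesce packets. A sketch of the fix using the 0.94-era tcpnodelay properties; the property names are worth verifying against your version's hbase-default.xml:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Disable Nagle's algorithm on both sides of the HBase RPC.
// These are the 0.94-era property names; both defaulted to false.
public class DisableNagle {
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.ipc.client.tcpnodelay", true); // client side
        conf.setBoolean("ipc.server.tcpnodelay", true);       // server side
        return conf;
    }
}
```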

Page 14: HBase at Flurry

Tips / Patterns

Compact data matters (even with compression)
▪ Block cache, network not compressed
Avoid random reads on non-cached tables (duh!)
Write cell fragments, combine at read time to avoid random reads (sketch below)
▪ compact later – coprocessor?
▪ can lead to large rows
▪ probabilistic counter
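
A minimal sketch of the write-fragments pattern, here as a simple counter: writers blindly append a new cell per update instead of doing a read-modify-write, and readers sum the fragments. The 0.94-era client API is assumed; the table layout and class are made up for illustration:

```java
import java.util.UUID;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical write-fragments counter: no random read on the write
// path, combine at read time. Fragments accumulate until something
// (a compaction job, or perhaps a coprocessor) folds them back into
// one cell -- hence the "can lead to large rows" caveat above.
public class FragmentCounter {
    private static final byte[] CF = Bytes.toBytes("f");

    private final HTable table;

    public FragmentCounter(HTable table) {
        this.table = table;
    }

    // Blind write: one new cell per increment, under a unique qualifier
    // so concurrent writers never collide.
    public void add(byte[] row, long delta) throws Exception {
        Put put = new Put(row);
        put.add(CF, Bytes.toBytes(UUID.randomUUID().toString()),
                Bytes.toBytes(delta));
        table.put(put);
    }

    // Combine at read time: one Get, then sum every fragment in the row.
    public long get(byte[] row) throws Exception {
        Result result = table.get(new Get(row).addFamily(CF));
        if (result.isEmpty()) {
            return 0;
        }
        long sum = 0;
        for (KeyValue kv : result.raw()) {
            sum += Bytes.toLong(kv.getValue());
        }
        return sum;
    }
}
```

The same append-then-merge idea generalizes beyond counters to any cell whose fragments can be merged deterministically at read time.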

Page 15: HBase at Flurry

Future

HDFS HA
Snapshots (see how it works with 100k regions on 1000 servers)
2000 node clusters
▪ test those bottlenecks
▪ larger regions, larger HDFS blocks, larger HLogs
More (independent) clusters
Load aware balancing?
Separate RPC priorities for workloads
0.96

Page 16: HBase at Flurry

Moral of the Story

Scaled 1000x and more on the same DB
If you're on the edge, you need to understand your system:
▪ Monitor
▪ Open source
▪ Load test
Know your load: disk or cache (or SSDs?)

Page 17: HBase at Flurry

Questions

And maybe some answers