Storage Infrastructure Using HBase Behind LINE Messages

Posted on 08-Sep-2014


Category: Technology



DESCRIPTION

Slides at hcj13w (http://hcj2013w.eventbrite.com/)

TRANSCRIPT

Storage infrastructure using HBase behind LINE messages

NHN Japan Corp. LINE Server Task Force

Shunsuke Nakamura @sunsuk7tp


To support LINE's users, we have built message storage that is:

• Large scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)


Outline

• About LINE
• LINE & storage requirements
• What we achieved
• Today's topics
  - IDC online migration
  - NN failover
  - Stabilizing LINE message cluster
• Conclusion

LINE - A global messenger powered by NHN Japan -

Devices: 5 different mobile platforms + desktop support


New year 2013 in Japan

3 times traffic explosion; LINE Storage had no problems :)

(plotted per 1 minute)


Number of requests in an HBase cluster: usual peak hours vs. New Year 2013

[Chart: roughly 3x the usual request rate at New Year, annotated "あけおめ!" / "新年好!" (Happy New Year! in Japanese and Chinese)]

LINE on Hadoop: storages for service, backup and log

• For HBase, M/R and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI, log analysis


LINE service requirements

LINE is a…
• Messaging service - should be fast
• Global service - downtime not allowed

But not a simple messaging service:
• Message synchronization between phones & PCs - messages should be kept for a while


LINE’s storage requirements

• HA
• No data loss
• Low latency
• Easy scale-out
• Flexible schema management
• Eventual consistency


Our selection is HBase

• Low latency for a large amount of data
• Linearly scalable
• Relatively lower operating cost
  - Replication by nature
  - Automatic failover
• Data model fits our requirements
  - Semi-structured
  - Timestamp


Stored rows per day in a cluster

[Chart: stored rows per day, in billions/day, on a scale of 2 to 10]


What we achieved with HBase

• No data loss
  - Persistent
  - Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
  - Hundreds of billions of rows
  - Write: ~1 ms
  - Read: 1-10 ms


Many issues we had

• Heterogeneous storages coordination
• IDC online migration
• Flush & compaction storms by "too many HLogs"
• Row & column distribution
• Secondary index
• Region management
  - Load and size balancing
  - RS allocation
  - META region
  - M/R
• Monitoring for diagnostics
• Traffic burst by decommission
• NN problems
• Performance degradation
  - Hotspot problem
  - Timeout burst
  - GC problem
• Client bugs
  - Thread blocking on server failure (HBASE-6364)


Today’s topics

IDC online migration

NN failover

Stabilizing LINE message cluster


IDC online migration

NN failover

Stabilizing LINE message cluster

Why?

•  Move whole HBase clusters and data

•  For better network infrastructure

•  Without downtime


IDC online migration

Before migration: [Diagram: App Server writes to src-HBase only]


IDC online migration

• Write to both clusters (client-level replication)

[Diagram: App Server writes to both src-HBase and dst-HBase]


IDC online migration

• New data: incremental replication
• Old data: bulk migration
• dst's timestamp equals src's

[Diagram: App Server writes to both src-HBase and dst-HBase]

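A minimal sketch of the client-level dual write described above, assuming the HBase 0.94-era Java client API; the DualWriter class and its error handling are illustrative, not LINE's actual code. The key point is that one explicit timestamp is set on the Put, so src and dst end up with identical cells.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

/** Hypothetical sketch: write every message to both clusters with an explicit
 *  timestamp so src and dst stay byte-identical (not LINE's actual code). */
public class DualWriter {
    private final HTable srcTable;  // table in the source IDC
    private final HTable dstTable;  // table in the destination IDC

    public DualWriter(Configuration srcConf, Configuration dstConf, String table) throws Exception {
        this.srcTable = new HTable(srcConf, table);
        this.dstTable = new HTable(dstConf, table);
    }

    public void write(byte[] row, byte[] family, byte[] qualifier, byte[] value) throws Exception {
        long ts = System.currentTimeMillis();   // one timestamp shared by both writes
        Put put = new Put(row);
        put.add(family, qualifier, ts, value);  // explicit ts => dst equals src
        srcTable.put(put);                      // source cluster first
        dstTable.put(put);                      // then destination cluster
    }
}

In practice a failed write to either cluster would have to be queued and retried, which this sketch omits.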

LINE HBase Replicator & BulkMigrator

Replicator is for incremental replication; BulkMigrator is for bulk migration.


LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering

[Diagram: built-in HBase replication pushes from src-HBase to dst-HBase; the LINE HBase Replicator instead pulls from src-HBase and writes to dst-HBase]


LINE HBase Replicator - a simple daemon that replicates local regions:

1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog (sketched below):
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to the dst HBase

• Periodic checkpointing
• Generally, entries are replicated in seconds

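A minimal sketch of step 2 above (filter & convert one HLog entry's edits, then batch Puts to the dst cluster), assuming the HBase 0.94-era client API. HLog reading, checkpointing and throttling are omitted, and EditReplicator / convertRowKey are illustrative names, not LINE's code.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

/** Hypothetical sketch: turn the KeyValues of one WAL edit into Puts for the
 *  destination cluster, preserving timestamps. Not LINE's actual code. */
public class EditReplicator {
    private final HTable dstTable;

    public EditReplicator(HTable dstTable) {
        this.dstTable = dstTable;
    }

    /** Convert the KeyValues of a single HLog entry and batch them to dst. */
    public void replicate(List<KeyValue> edit) throws Exception {
        List<Put> batch = new ArrayList<Put>();
        for (KeyValue kv : edit) {
            if (kv.isDelete()) {
                continue;                                   // example filter: skip deletes
            }
            byte[] dstRow = convertRowKey(kv.getRow());     // rowkey conversion hook
            Put put = new Put(dstRow);
            // keep the original timestamp so dst equals src
            put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
            batch.add(put);
        }
        if (!batch.isEmpty()) {
            dstTable.put(batch);                            // one batched call per HLog entry
        }
    }

    private byte[] convertRowKey(byte[] srcRow) {
        return srcRow;                                      // identity here; real logic is schema-specific
    }
}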

Bulk migration

1. MapReduce between any storages
   - Map task only
   - Read source, write destination
   - Task scheduling problem depends on region allocation
2. Non-MapReduce version (BulkMigrator)
   - Our own implementation
   - HBase → HBase
   - On each RS, scan & batch by region (sketched below)
   - Throughput throttling
   - Slow, but easy to implement and debug

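A minimal sketch of the non-MapReduce approach (scan & batch one region's key range with crude sleep-based throttling), assuming the 0.94-era client API; RegionCopier and its parameters are illustrative, not LINE's actual BulkMigrator.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

/** Hypothetical sketch: copy one region's key range from src to dst in batches,
 *  sleeping between batches as a crude form of throughput throttling. */
public class RegionCopier {
    public void copyRegion(HTable src, HTable dst,
                           byte[] startKey, byte[] endKey,
                           int batchSize, long sleepMillisPerBatch) throws Exception {
        Scan scan = new Scan(startKey, endKey);   // one region's key range
        scan.setCaching(batchSize);               // fetch rows in batches
        ResultScanner scanner = src.getScanner(scan);
        try {
            List<Put> batch = new ArrayList<Put>();
            for (Result row : scanner) {
                Put put = new Put(row.getRow());
                for (KeyValue kv : row.raw()) {
                    // preserve original timestamps on the destination
                    put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
                }
                batch.add(put);
                if (batch.size() >= batchSize) {
                    dst.put(batch);
                    batch.clear();
                    Thread.sleep(sleepMillisPerBatch);  // throttle
                }
            }
            if (!batch.isEmpty()) {
                dst.put(batch);
            }
        } finally {
            scanner.close();
        }
    }
}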

IDC online migration

NN failover

Stabilizing LINE message cluster

Background

• Our HBase has a SPOF: the NameNode
• "Apache Hadoop HA Configuration"
  http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/
• Furthermore, we added Pacemaker
  - Heartbeat can't detect whether the NN is running


Previous: HA-NN (DRBD + VIP + Pacemaker)


NameNode failure in 2012.10


HA-NN failover failed

• Not the NameNode process
• Incorrect leader election during network partitioning
• Complicated configuration
  - Easy to mistake, difficult to control
  - Pacemaker scripting was not straightforward
  - VIP is risky for HDFS
• DRBD split-brain problem
  - Protocol C
  - Unable to re-sync while the service is online


Now: In-house NN failure handling

• Bye-bye, old HA-NN
  - Had to restart whole HBase clusters after NN failover
• Alternative ideas
  - Quorum-based leader election (using ZK)
  - Using an L4 switch
  - Implementing our own AvatarNode
• Chose the safer solution, at the cost of a little downtime


In-house NN failure handling (1)

• rsync with --link-dest periodically


In-house NN failure handling (2)

[Diagram: NameNode failure ("Bomb")]


In-house NN failure handling (3)


IDC online migration

NN failover

Stabilizing LINE message cluster

Stabilizing LINE message cluster

[Diagram: performance and H/W failure handling concerns, including RS GC storms, grouped into four cases]

• Case 1: "Too many HLogs"
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS


Case 1: "Too many HLogs"

• Effect
  - MemStore flush storm
  - Compaction storm
• Cause
  - Different region growth rates
  - Heterogeneous tables on one RS
• Solution
  - Region balancing
  - External flush scheduler (see the sketch below)

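A minimal sketch of what an external flush scheduler could look like, assuming the 0.94-era HBaseAdmin API; the table names and the 10-minute spacing are made up. The idea is to force MemStore flushes one table at a time during off-peak hours so they do not all pile up at once.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/** Hypothetical "external flush scheduler": flush tables one at a time,
 *  spaced out, instead of letting the RS hit the HLog limit and flush
 *  everything at once. Table names and interval are illustrative. */
public class FlushScheduler {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String[] tables = {"message", "inbox"};   // hypothetical table names
        try {
            for (String table : tables) {
                admin.flush(table);                // force a MemStore flush now
                Thread.sleep(10 * 60 * 1000L);     // space flushes 10 minutes apart
            }
        } finally {
            admin.close();
        }
    }
}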

Case 1: Number of HLogs

[Chart: HLog count over time, peak vs. off-peak. Better case: periodic flushes keep the count low. Worse case: repeated forced flushes end in a flush storm.]


Case 2: Hotspot problems

• Effect
  - Excessive GC
  - RS performance degradation (high CPU usage)
• Cause: Get/Scan on
  - a row or column updated too frequently
  - a row with too many columns (+ tombstones)
• Solution
  - Schema and row/column distribution are important (a generic example follows below)
  - Hotspot region isolation

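One generic way to distribute rows, shown only to illustrate the "row/column distribution" point above; the source does not describe LINE's actual schema, so the key layout, class and parameter names here are entirely hypothetical.

import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical illustration of row distribution: prefix the row key with a
 *  small hash bucket so keys that would otherwise hit one region spread
 *  across the key space. Not LINE's actual schema. */
public class RowKeys {
    /** e.g. userId=12345 -> "a3|12345|<reversed ts>" style keys. */
    public static byte[] messageRow(String userId, long timestamp) {
        int bucket = (userId.hashCode() & 0x7fffffff) % 256;  // 256 salt buckets
        String prefix = String.format("%02x", bucket);         // fixed-width salt
        long reversedTs = Long.MAX_VALUE - timestamp;           // newest-first scans
        return Bytes.toBytes(prefix + "|" + userId + "|" + reversedTs);
    }
}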

Case 3: META region workload isolation

• Effect
  1. RS high CPU
  2. Excessive timeouts
  3. META lookup timeouts
• Cause
  - Inefficient exception handling in the HBase client
  - Hotspot region and META on the same RS
• Solution
  - META-only RS (see the sketch below)

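A minimal sketch of moving .META. onto a dedicated region server, assuming the 0.92/0.94-era HBaseAdmin.move() API; the destination server name is hypothetical, and keeping the balancer from moving the region back is not shown.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical one-off tool: ask the master to reassign .META. to a region
 *  server that hosts nothing else. Server name below is made up. */
public class MoveMeta {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] metaRegion = Bytes.toBytes(HRegionInfo.FIRST_META_REGIONINFO.getEncodedName());
        byte[] destServer = Bytes.toBytes("meta-rs1.example.com,60020,1358700000000"); // hypothetical RS
        admin.move(metaRegion, destServer);   // master reassigns .META. to that RS
        admin.close();
    }
}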

Case 4: Region mappings to RS

• Effect
  - Region mappings are not restored on RS restart
  - Some region mappings aren't restored properly after a graceful restart
    (graceful_stop.sh --restart --reload)
• Cause
  - HBase does not support this well
• Solution
  - Periodically dump the mappings and restore them (sketched below)

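A minimal sketch of the "periodic dump and restore" idea, assuming the 0.92/0.94-era client API (HTable.getRegionLocations() and HBaseAdmin.move()); RegionMapping is an illustrative name and persisting the dump to disk is omitted.

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical sketch of dumping and restoring region-to-RS mappings. */
public class RegionMapping {
    /** Dump the current region -> region server mapping for one table. */
    public static NavigableMap<HRegionInfo, ServerName> dump(Configuration conf, String table)
            throws Exception {
        HTable t = new HTable(conf, table);
        try {
            return t.getRegionLocations();
        } finally {
            t.close();
        }
    }

    /** Ask the master to move each region back to the server recorded in a dump. */
    public static void restore(Configuration conf, Map<HRegionInfo, ServerName> dumped)
            throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            for (Map.Entry<HRegionInfo, ServerName> e : dumped.entrySet()) {
                admin.move(Bytes.toBytes(e.getKey().getEncodedName()),
                           Bytes.toBytes(e.getValue().getServerName()));
            }
        } finally {
            admin.close();
        }
    }
}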

Summary

• IDC online migration
  - Without downtime
  - LINE HBase Replicator & BulkMigrator
• NN failover
  - Simple solution for a person saying "What's Hadoop?"
• Stabilizing LINE message cluster
  - Improved response time of RS


Conclusion

We won 100M users by adopting HBase.

LINE Storage is a successful example of a messaging service using HBase.

