Storage Infrastructure Using HBase Behind LINE Messages

Posted on 08-Sep-2014


Category: Technology



DESCRIPTION

Slides at hcj13w (http://hcj2013w.eventbrite.com/)

TRANSCRIPT

Storage infrastructure using HBase behind LINE messages

NHN Japan Corp. LINE Server Task Force

Shunsuke Nakamura @sunsuk7tp


To support LINE's users, we have built message storage that is:

• Large scale (tens of billions of rows/day)
• Responsive (under 10 ms)
• Highly available (dual clusters)


Outline

• About LINE
• LINE & storage requirements
• What we achieved
• Today's topics
  - IDC online migration
  - NN failover
  - Stabilizing LINE message cluster
• Conclusion

LINE - A global messenger powered by NHN Japan -

Devices: 5 different mobile platforms + desktop support


New year 2013 in Japan

3 times traffic explosion; LINE Storage had no problems :)

(plotted per 1 minute)


Number of requests in an HBase cluster: usual peak hours vs. New Year 2013

[Chart: roughly 3x the usual request rate at New Year, annotated "あけおめ!" / "新年好!" (Happy New Year! in Japanese and Chinese)]

LINE on Hadoop: storages for service, backup and log

• For HBase, M/R and log archive
• Bulk migration and ad-hoc analysis
• For HBase and Sharded-Redis
• Collecting Apache and Tomcat logs
• KPI, log analysis


LINE service requirements

LINE is a…
• Messaging service - should be fast
• Global service - downtime not allowed

But not a simple messaging service:
• Message synchronization between phones & PCs - messages should be kept for a while


LINE’s storage requirements

• HA
• No data loss
• Low latency
• Easy scale-out
• Flexible schema management
• Eventual consistency


Our selection is HBase

• Low latency for a large amount of data
• Linearly scalable
• Relatively lower operating cost
  - Replication by nature
  - Automatic failover
• Data model fits our requirements
  - Semi-structured
  - Timestamp


Stored rows per day in a cluster

[Chart: stored rows per day, in billions/day, on a scale of 2 to 10]


What we achieved with HBase

• No data loss
  - Persistent
  - Data replication
• Automatic recovery from server failure
• Reasonable performance for large data sets
  - Hundreds of billions of rows
  - Write: ~1 ms
  - Read: 1-10 ms


Many issues we had

• Heterogeneous storages coordination
• IDC online migration
• Flush & compaction storms by "too many HLogs"
• Row & column distribution
• Secondary index
• Region management
  - Load and size balancing
  - RS allocation
  - META region
  - M/R
• Monitoring for diagnostics
• Traffic burst by decommission
• NN problems
• Performance degradation
  - Hotspot problem
  - Timeout burst
  - GC problem
• Client bugs
  - Thread blocking on server failure (HBASE-6364)


Today’s topics

IDC online migration

NN failover

Stabilizing LINE message cluster


IDC online migration

NN failover

Stabilizing LINE message cluster

Why?

•  Move whole HBase clusters and data

•  For better network infrastructure

•  Without downtime


IDC online migration

Before migration: [Diagram: App Server writes to src-HBase only]


IDC online migration

• Write to both clusters (client-level replication)

[Diagram: App Server writes to both src-HBase and dst-HBase]


IDC online migration

• New data: incremental replication
• Old data: bulk migration
• dst's timestamp equals src's

[Diagram: App Server writes to both src-HBase and dst-HBase]

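A minimal sketch of the client-level dual write described above, assuming the HBase 0.94-era Java client API; the DualWriter class and its error handling are illustrative, not LINE's actual code. The key point is that one explicit timestamp is set on the Put, so src and dst end up with identical cells.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

/** Hypothetical sketch: write every message to both clusters with an explicit
 *  timestamp so src and dst stay byte-identical (not LINE's actual code). */
public class DualWriter {
    private final HTable srcTable;  // table in the source IDC
    private final HTable dstTable;  // table in the destination IDC

    public DualWriter(Configuration srcConf, Configuration dstConf, String table) throws Exception {
        this.srcTable = new HTable(srcConf, table);
        this.dstTable = new HTable(dstConf, table);
    }

    public void write(byte[] row, byte[] family, byte[] qualifier, byte[] value) throws Exception {
        long ts = System.currentTimeMillis();   // one timestamp shared by both writes
        Put put = new Put(row);
        put.add(family, qualifier, ts, value);  // explicit ts => dst equals src
        srcTable.put(put);                      // source cluster first
        dstTable.put(put);                      // then destination cluster
    }
}

In practice a failed write to either cluster would have to be queued and retried, which this sketch omits.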

LINE HBase Replicator & BulkMigrator

Replicator is for incremental replication; BulkMigrator is for bulk migration.


LINE HBase Replicator
• Our own implementation
• Prefer pull to push
• Throughput throttling
• Workload isolation of replicator and RS
• Rowkey conversion and filtering

[Diagram: built-in HBase replication pushes from src-HBase to dst-HBase; the LINE HBase Replicator instead pulls from src-HBase and writes to dst-HBase]


LINE HBase Replicator - a simple daemon that replicates local regions:

1. HLogTracker reads a checkpoint and selects the next HLog.
2. For each entry in the HLog (sketched below):
   1. Filter & convert the HLog.Entry
   2. Create Puts and batch them to the dst HBase

• Periodic checkpointing
• Generally, entries are replicated in seconds

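A minimal sketch of step 2 above (filter & convert one HLog entry's edits, then batch Puts to the dst cluster), assuming the HBase 0.94-era client API. HLog reading, checkpointing and throttling are omitted, and EditReplicator / convertRowKey are illustrative names, not LINE's code.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

/** Hypothetical sketch: turn the KeyValues of one WAL edit into Puts for the
 *  destination cluster, preserving timestamps. Not LINE's actual code. */
public class EditReplicator {
    private final HTable dstTable;

    public EditReplicator(HTable dstTable) {
        this.dstTable = dstTable;
    }

    /** Convert the KeyValues of a single HLog entry and batch them to dst. */
    public void replicate(List<KeyValue> edit) throws Exception {
        List<Put> batch = new ArrayList<Put>();
        for (KeyValue kv : edit) {
            if (kv.isDelete()) {
                continue;                                   // example filter: skip deletes
            }
            byte[] dstRow = convertRowKey(kv.getRow());     // rowkey conversion hook
            Put put = new Put(dstRow);
            // keep the original timestamp so dst equals src
            put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
            batch.add(put);
        }
        if (!batch.isEmpty()) {
            dstTable.put(batch);                            // one batched call per HLog entry
        }
    }

    private byte[] convertRowKey(byte[] srcRow) {
        return srcRow;                                      // identity here; real logic is schema-specific
    }
}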

Bulk migration

1. MapReduce between any storages
   - Map task only
   - Read source, write destination
   - Task scheduling problem depends on region allocation
2. Non-MapReduce version (BulkMigrator)
   - Our own implementation
   - HBase → HBase
   - On each RS, scan & batch by region (sketched below)
   - Throughput throttling
   - Slow, but easy to implement and debug

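A minimal sketch of the non-MapReduce approach (scan & batch one region's key range with crude sleep-based throttling), assuming the 0.94-era client API; RegionCopier and its parameters are illustrative, not LINE's actual BulkMigrator.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

/** Hypothetical sketch: copy one region's key range from src to dst in batches,
 *  sleeping between batches as a crude form of throughput throttling. */
public class RegionCopier {
    public void copyRegion(HTable src, HTable dst,
                           byte[] startKey, byte[] endKey,
                           int batchSize, long sleepMillisPerBatch) throws Exception {
        Scan scan = new Scan(startKey, endKey);   // one region's key range
        scan.setCaching(batchSize);               // fetch rows in batches
        ResultScanner scanner = src.getScanner(scan);
        try {
            List<Put> batch = new ArrayList<Put>();
            for (Result row : scanner) {
                Put put = new Put(row.getRow());
                for (KeyValue kv : row.raw()) {
                    // preserve original timestamps on the destination
                    put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
                }
                batch.add(put);
                if (batch.size() >= batchSize) {
                    dst.put(batch);
                    batch.clear();
                    Thread.sleep(sleepMillisPerBatch);  // throttle
                }
            }
            if (!batch.isEmpty()) {
                dst.put(batch);
            }
        } finally {
            scanner.close();
        }
    }
}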

IDC online migration

NN failover

Stabilizing LINE message cluster

Background

• Our HBase has a SPOF: the NameNode
• "Apache Hadoop HA Configuration"
  http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/
• Furthermore, we added Pacemaker
  - Heartbeat can't detect whether the NN is running


Previous: HA-NN (DRBD + VIP + Pacemaker)


NameNode failure in 2012.10


HA-NN failover failed

• Not the NameNode process
• Incorrect leader election during network partitioning
• Complicated configuration
  - Easy to mistake, difficult to control
  - Pacemaker scripting was not straightforward
  - VIP is risky for HDFS
• DRBD split-brain problem
  - Protocol C
  - Unable to re-sync while the service is online


Now: In-house NN failure handling

• Bye-bye, old HA-NN
  - Had to restart whole HBase clusters after NN failover
• Alternative ideas
  - Quorum-based leader election (using ZK)
  - Using an L4 switch
  - Implementing our own AvatarNode
• Chose the safer solution, at the cost of a little downtime


In-house NN failure handling (1)

• rsync with --link-dest periodically


In-house NN failure handling (2)

[Diagram: NameNode failure ("Bomb")]


In-house NN failure handling (3)


IDC online migration

NN failover

Stabilizing LINE message cluster

Stabilizing LINE message cluster

[Diagram: performance and H/W failure handling concerns, including RS GC storms, grouped into four cases]

• Case 1: "Too many HLogs"
• Case 2: Hotspot problems
• Case 3: META region workload isolation
• Case 4: Region mappings to RS


Case 1: "Too many HLogs"

• Effect
  - MemStore flush storm
  - Compaction storm
• Cause
  - Different region growth rates
  - Heterogeneous tables on one RS
• Solution
  - Region balancing
  - External flush scheduler (see the sketch below)

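A minimal sketch of what an external flush scheduler could look like, assuming the 0.94-era HBaseAdmin API; the table names and the 10-minute spacing are made up. The idea is to force MemStore flushes one table at a time during off-peak hours so they do not all pile up at once.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

/** Hypothetical "external flush scheduler": flush tables one at a time,
 *  spaced out, instead of letting the RS hit the HLog limit and flush
 *  everything at once. Table names and interval are illustrative. */
public class FlushScheduler {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        String[] tables = {"message", "inbox"};   // hypothetical table names
        try {
            for (String table : tables) {
                admin.flush(table);                // force a MemStore flush now
                Thread.sleep(10 * 60 * 1000L);     // space flushes 10 minutes apart
            }
        } finally {
            admin.close();
        }
    }
}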

Case 1: Number of HLogs

[Chart: HLog count over time, peak vs. off-peak. Better case: periodic flushes keep the count low. Worse case: repeated forced flushes end in a flush storm.]


Case 2: Hotspot problems

• Effect
  - Excessive GC
  - RS performance degradation (high CPU usage)
• Cause: Get/Scan on
  - a row or column updated too frequently
  - a row with too many columns (+ tombstones)
• Solution
  - Schema and row/column distribution are important (a generic example follows below)
  - Hotspot region isolation

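One generic way to distribute rows, shown only to illustrate the "row/column distribution" point above; the source does not describe LINE's actual schema, so the key layout, class and parameter names here are entirely hypothetical.

import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical illustration of row distribution: prefix the row key with a
 *  small hash bucket so keys that would otherwise hit one region spread
 *  across the key space. Not LINE's actual schema. */
public class RowKeys {
    /** e.g. userId=12345 -> "a3|12345|<reversed ts>" style keys. */
    public static byte[] messageRow(String userId, long timestamp) {
        int bucket = (userId.hashCode() & 0x7fffffff) % 256;  // 256 salt buckets
        String prefix = String.format("%02x", bucket);         // fixed-width salt
        long reversedTs = Long.MAX_VALUE - timestamp;           // newest-first scans
        return Bytes.toBytes(prefix + "|" + userId + "|" + reversedTs);
    }
}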

Case 3: META region workload isolation

• Effect
  1. RS high CPU
  2. Excessive timeouts
  3. META lookup timeouts
• Cause
  - Inefficient exception handling in the HBase client
  - Hotspot region and META on the same RS
• Solution
  - META-only RS (see the sketch below)

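A minimal sketch of moving .META. onto a dedicated region server, assuming the 0.92/0.94-era HBaseAdmin.move() API; the destination server name is hypothetical, and keeping the balancer from moving the region back is not shown.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical one-off tool: ask the master to reassign .META. to a region
 *  server that hosts nothing else. Server name below is made up. */
public class MoveMeta {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        byte[] metaRegion = Bytes.toBytes(HRegionInfo.FIRST_META_REGIONINFO.getEncodedName());
        byte[] destServer = Bytes.toBytes("meta-rs1.example.com,60020,1358700000000"); // hypothetical RS
        admin.move(metaRegion, destServer);   // master reassigns .META. to that RS
        admin.close();
    }
}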

Case 4: Region mappings to RS

• Effect
  - Region mappings are not restored on RS restart
  - Some region mappings aren't restored properly after a graceful restart
    (graceful_stop.sh --restart --reload)
• Cause
  - HBase does not support this well
• Solution
  - Periodically dump the mappings and restore them (sketched below)

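A minimal sketch of the "periodic dump and restore" idea, assuming the 0.92/0.94-era client API (HTable.getRegionLocations() and HBaseAdmin.move()); RegionMapping is an illustrative name and persisting the dump to disk is omitted.

import java.util.Map;
import java.util.NavigableMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

/** Hypothetical sketch of dumping and restoring region-to-RS mappings. */
public class RegionMapping {
    /** Dump the current region -> region server mapping for one table. */
    public static NavigableMap<HRegionInfo, ServerName> dump(Configuration conf, String table)
            throws Exception {
        HTable t = new HTable(conf, table);
        try {
            return t.getRegionLocations();
        } finally {
            t.close();
        }
    }

    /** Ask the master to move each region back to the server recorded in a dump. */
    public static void restore(Configuration conf, Map<HRegionInfo, ServerName> dumped)
            throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            for (Map.Entry<HRegionInfo, ServerName> e : dumped.entrySet()) {
                admin.move(Bytes.toBytes(e.getKey().getEncodedName()),
                           Bytes.toBytes(e.getValue().getServerName()));
            }
        } finally {
            admin.close();
        }
    }
}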

Summary

• IDC online migration
  - Without downtime
  - LINE HBase Replicator & BulkMigrator
• NN failover
  - Simple solution for a person saying "What's Hadoop?"
• Stabilizing LINE message cluster
  - Improved response time of RS


Conclusion

We won 100M users by adopting HBase.

LINE Storage is a successful example of a messaging service using HBase.

