Storage infrastructure using HBase behind LINE messages


Upload: line-corporation-tech-unit

Post on 08-Sep-2014


DESCRIPTION

Slides at hcj13w (http://hcj2013w.eventbrite.com/)

TRANSCRIPT

Page 1: Storage infrastructure using HBase behind LINE messages
Page 2: Storage infrastructure using HBase behind LINE messages

Storage infrastructure using HBase behind LINE messages

NHN Japan Corp. LINE Server Task Force

Shunsuke Nakamura @sunsuk7tp

13.1.21 2 Hadoop  Conference  Japan  2013  Winter

Page 3: Storage infrastructure using HBase behind LINE messages

To support LINE’s users, we have built message storage that is

•  Large scale (tens of billions of rows/day)
•  Responsive (under 10 ms)
•  Highly available (dual clusters)

Page 4: Storage infrastructure using HBase behind LINE messages

Outline

•  About LINE
•  LINE & Storage requirements
•  What we achieved
•  Today’s topics
   –  IDC online migration
   –  NN failover
   –  Stabilizing LINE message cluster
•  Conclusion

Page 5: Storage infrastructure using HBase behind LINE messages

LINE - A global messenger powered by NHN Japan -

Devices: 5 different mobile platforms + Desktop support

Page 6: Storage infrastructure using HBase behind LINE messages


Page 7: Storage infrastructure using HBase behind LINE messages


Page 8: Storage infrastructure using HBase behind LINE messages
Page 9: Storage infrastructure using HBase behind LINE messages

New year 2013 in Japan

3 times traffic explosion. LINE Storage had no problems :)

[Chart: number of requests in an HBase cluster, plotted per minute, comparing usual peak hours with New Year 2013 (x3). Annotations: あけおめ! 新年好! ("Happy New Year!" in Japanese and Chinese)]

Page 10: Storage infrastructure using HBase behind LINE messages

LINE on Hadoop

Storages for service, backup and log

•  For HBase, M/R and log archive
•  Bulk migration and ad-hoc analysis
•  For HBase and Sharded-Redis
•  Collecting Apache and Tomcat logs
•  KPI, Log analysis

Page 11: Storage infrastructure using HBase behind LINE messages


Page 12: Storage infrastructure using HBase behind LINE messages

LINE service requirements

LINE is a…
•  Messaging Service - Should be fast
•  Global Service - Downtime not allowed

But, not a Simple Messaging Service.
•  Message synchronization b/w phone & PCs
   –  Messages should be kept for a while.

Page 13: Storage infrastructure using HBase behind LINE messages

LINE’s storage requirements

•  HA
•  No data loss
•  Low latency
•  Easy scale-out
•  Flexible schema management
•  Eventual consistency

Page 14: Storage infrastructure using HBase behind LINE messages

Our selection is HBase

•  Low latency for large amounts of data
•  Linearly scalable
•  Relatively lower operating cost
   –  Replication by nature
   –  Automatic failover
•  Data model fits our requirements
   –  Semi-structured
   –  Timestamp

Page 15: Storage infrastructure using HBase behind LINE messages

Stored rows per day in a cluster

[Chart: stored rows per day in a cluster; y-axis in billions/day, ticks from 2 to 10]

Page 16: Storage infrastructure using HBase behind LINE messages

What we achieved with HBase

•  No data loss
   –  Persistent
   –  Data replication
•  Automatic recovery from server failure
•  Reasonable performance for large data sets
   –  Hundreds of billions of rows
   –  Write: ~ 1 ms
   –  Read: 1 ~ 10 ms

Page 17: Storage infrastructure using HBase behind LINE messages

Many issues we had

•  Heterogeneous storages coordination
•  IDC online migration
•  Flush & Compaction Storms by “too many HLogs”
•  Row & Column distribution
•  Secondary Index
•  Region Management
   –  load, size balancing
   –  RS Allocation
   –  META region
   –  M/R
•  Monitoring for diagnostics
•  Traffic burst by decommission
•  NN problems
•  Performance degradation
   –  hotspot problem
   –  timeout burst
   –  GC problem
•  Client bugs
   –  Thread Blocking on server failure (HBASE-6364)

Page 18: Storage infrastructure using HBase behind LINE messages

Today’s topics

IDC online migration

NN failover

Stabilizing LINE message cluster


Page 19: Storage infrastructure using HBase behind LINE messages

IDC online migration
NN failover
Stabilizing LINE message cluster

Page 20: Storage infrastructure using HBase behind LINE messages

Why?

•  Move whole HBase clusters and data

•  For better network infrastructure

•  Without downtime


Page 21: Storage infrastructure using HBase behind LINE messages

IDC online migration

Before migration

[Diagram: App Server, src-HBase, dst-HBase; one write arrow]

Page 22: Storage infrastructure using HBase behind LINE messages

IDC online migration

•  Write to both (client-level replication)

[Diagram: App Server, src-HBase, dst-HBase; two write arrows]

Page 23: Storage infrastructure using HBase behind LINE messages

IDC online migration

•  New data: Incremental replication
•  Old data: Bulk migration
•  Timestamps in dst equal those in src

[Diagram: App Server, src-HBase, dst-HBase; two write arrows]
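Why matching timestamps matter can be illustrated with a small model. The sketch below is not HBase code: `VersionedStore` is a hypothetical in-memory stand-in for a table in which, as in HBase, a cell version is identified by row, column, and timestamp, so replaying a write with the same timestamp is idempotent and a read returns the newest version. Under that assumption, bulk migration of old data and incremental replication of new data can arrive in any order and the destination still converges to the source.

```python
class VersionedStore:
    """Hypothetical in-memory model of a timestamp-versioned table."""

    def __init__(self):
        # (row, column) -> {timestamp: value}
        self._cells = {}

    def put(self, row, column, value, ts):
        # Writing the same (row, column, ts) twice overwrites itself.
        self._cells.setdefault((row, column), {})[ts] = value

    def get(self, row, column):
        versions = self._cells.get((row, column))
        if not versions:
            return None
        return versions[max(versions)]  # newest timestamp wins

    def dump(self):
        for (row, column), versions in self._cells.items():
            for ts, value in versions.items():
                yield row, column, value, ts


src, dst = VersionedStore(), VersionedStore()

# Old data exists only in the source cluster.
src.put("user1", "msg", "hello", ts=100)

# A newer write reaches both clusters (dual write or incremental replication).
for store in (src, dst):
    store.put("user1", "msg", "hello again", ts=200)

# Bulk migration copies the old cells later, keeping the source timestamps.
for row, column, value, ts in list(src.dump()):
    dst.put(row, column, value, ts)

latest = dst.get("user1", "msg")
```

Although the old cell is copied after the newer write, the destination still returns the newer value, and re-running the bulk copy changes nothing.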

Page 24: Storage infrastructure using HBase behind LINE messages

LINE HBase Replicator & BulkMigrator

•  Replicator is for incremental replication
•  BulkMigrator is for bulk migration

Page 25: Storage infrastructure using HBase behind LINE messages

LINE HBase Replicator

•  Our own implementation
•  Prefer pull to push
•  Throughput throttling
•  Workload isolation of replicator and RS
•  Rowkey conversion and filtering

[Diagram: HBase Replicator (push, src-HBase to dst-HBase) versus LINE HBase Replicator (pull, src-HBase to dst-HBase)]

Page 26: Storage infrastructure using HBase behind LINE messages

LINE HBase Replicator - A simple daemon to replicate local regions -

1.  HLogTracker reads a checkpoint (ckpt) and selects the next HLog.
2.  For each entry in the HLog:
    1.  Filter & convert the HLog.Entry
    2.  Create Puts and batch them to dst HBase

•  Periodic checkpointing
•  Generally, entries are replicated in seconds
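The daemon's loop can be sketched as follows. This is a simplified model, not the LINE implementation: HLogs are plain Python lists of `(rowkey, value, timestamp)` tuples, the checkpoint is a dictionary, and `convert`, `replicate`, and `batch_size` are hypothetical names. For brevity the checkpoint is saved after every batch rather than on a timer.

```python
def convert(entry):
    """Hypothetical filter and rowkey conversion; returns None to drop an entry."""
    rowkey, value, ts = entry
    if rowkey.startswith("tmp:"):
        return None
    return ("v2:" + rowkey, value, ts)


def replicate(hlogs, checkpoint, dst, batch_size=2):
    """Replay HLog entries after the checkpoint into dst (a list of Puts)."""
    log_index, offset = checkpoint["log"], checkpoint["offset"]
    while log_index < len(hlogs):
        entries = hlogs[log_index]
        while offset < len(entries):
            chunk = entries[offset:offset + batch_size]
            puts = [p for p in map(convert, chunk) if p is not None]
            dst.extend(puts)                    # batched write to the destination
            offset += len(chunk)
            checkpoint.update(log=log_index, offset=offset)
        log_index, offset = log_index + 1, 0    # select the next HLog
        checkpoint.update(log=log_index, offset=offset)
    return checkpoint


hlogs = [
    [("a", "1", 10), ("tmp:x", "2", 11), ("b", "3", 12)],
    [("c", "4", 13)],
]
dst = []
ckpt = replicate(hlogs, {"log": 0, "offset": 0}, dst)
```

If the daemon crashes between a batch write and the next checkpoint, a restart may replay some entries. Carrying the source timestamp on each Put should keep such replays harmless.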

Page 27: Storage infrastructure using HBase behind LINE messages

Bulk migration

1.  MapReduce between any storages
    –  Map task only
    –  Read source, write destination
    –  Task scheduling problem depends on region allocation
2.  Non MapReduce version (BulkMigrator)
    –  Our own implementation
    –  HBase → HBase
    –  On each RS, scan & batch by a region
    –  Throughput throttling
    –  Slow, but easy to implement and debug
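A per-region scan-and-batch copy with throttling might look like the sketch below. It is an illustration under stated assumptions, not the BulkMigrator itself: a region is modeled as a list of rows, `write_batch` stands in for a batched write to the destination cluster, and `max_rows_per_sec` is a hypothetical throttle setting. The clock and sleep functions are injectable so the pacing logic can be exercised without waiting.

```python
import time


def migrate_region(rows, write_batch, batch_size=100, max_rows_per_sec=1000,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy one region's rows in batches, pacing to max_rows_per_sec."""
    start = clock()
    sent = 0
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        write_batch(batch)
        sent += len(batch)
        # Earliest time at which `sent` rows are allowed to have been written.
        earliest = start + sent / max_rows_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return sent


# Usage with a fake clock: 250 rows at 100 rows/s takes 2.5 simulated seconds.
fake_now = [0.0]
written = []
copied = migrate_region(
    rows=list(range(250)),
    write_batch=written.extend,
    batch_size=100,
    max_rows_per_sec=100,
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
```

Pacing against the total rows sent, rather than sleeping a fixed amount per batch, lets slow writes eat into the delay instead of adding to it.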

Page 28: Storage infrastructure using HBase behind LINE messages

IDC online migration
NN failover
Stabilizing LINE message cluster

Page 29: Storage infrastructure using HBase behind LINE messages

Background

•  Our HBase has a SPOF: NameNode
•  “Apache Hadoop HA Configuration” http://blog.cloudera.com/blog/2009/07/hadoop-ha-configuration/
•  Furthermore, added Pacemaker
   –  Heartbeat can’t detect whether NN is running

Page 30: Storage infrastructure using HBase behind LINE messages

Previous: HA-NN

DRBD + VIP + Pacemaker

Page 31: Storage infrastructure using HBase behind LINE messages

NameNode failure in 2012.10


Page 32: Storage infrastructure using HBase behind LINE messages

HA-NN failover failed

•  Not NameNode process
•  Incorrect leader election during a network partition
•  Complicated configuration
   –  Easy to get wrong, difficult to control
   –  Pacemaker scripting was not straightforward
   –  VIP is risky to HDFS
•  DRBD split-brain problem
   –  Protocol C
   –  Unable to re-sync while service is online

Page 33: Storage infrastructure using HBase behind LINE messages

Now: In-house NN failure handling

•  Bye-bye old HA-NN
   –  Had to restart whole HBase clusters after NN failover
•  Alternative ideas
   –  Quorum-based leader election (Using ZK)
   –  Using L4 switch
   –  Implement our own AvatarNode
•  Safer solution in exchange for a little downtime

Page 34: Storage infrastructure using HBase behind LINE messages

In-house NN failure handling (1)

rsync with --link-dest periodically
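`--link-dest=DIR` makes rsync hard-link files that are unchanged relative to DIR, so each periodic run yields a complete snapshot directory while only changed files consume new space. The slide's diagram did not survive extraction, so what is copied and where is an assumption here: the sketch below assumes the NameNode metadata directory is snapshotted into timestamped directories, and the paths and the `build_rsync_command` helper are hypothetical. It only builds the command line; it does not run rsync.

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_rsync_command(source_dir, backup_root, now, previous=None):
    """Return (argv, snapshot_path) for one incremental snapshot run."""
    snapshot = PurePosixPath(backup_root) / now.strftime("%Y%m%d-%H%M%S")
    argv = ["rsync", "-a"]
    if previous is not None:
        # Unchanged files become hard links into the previous snapshot.
        argv.append(f"--link-dest={previous}")
    # Trailing slash: copy the directory's contents, not the directory itself.
    argv += [str(PurePosixPath(source_dir)) + "/", str(snapshot)]
    return argv, str(snapshot)


# Hypothetical paths, for illustration only.
argv, snap = build_rsync_command(
    source_dir="/data/nn/name",
    backup_root="/backup/nn",
    now=datetime(2013, 1, 21, 12, 0, 0),
    previous="/backup/nn/20130121-115000",
)
```

The returned snapshot path would be passed as `previous` on the next run, so every snapshot links against its predecessor.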

Page 35: Storage infrastructure using HBase behind LINE messages

In-house NN failure handling (2)

[Diagram: Bomb]

Page 36: Storage infrastructure using HBase behind LINE messages

In-house NN failure handling (3)


Page 37: Storage infrastructure using HBase behind LINE messages

IDC online migration
NN failover
Stabilizing LINE message cluster

Page 38: Storage infrastructure using HBase behind LINE messages

Stabilizing LINE message cluster

Areas: Performance, RS GC Storm, H/W Failure Handling

•  Case 1: “Too many HLogs”
•  Case 2: Hotspot problems
•  Case 3: META region workload isolation
•  Case 4: Region mappings to RS

Page 39: Storage infrastructure using HBase behind LINE messages

Case1: “Too many HLogs”

•  Effect
   –  MemStore flush storm
   –  Compaction storm
•  Cause
   –  Regions grow at different rates
   –  Heterogeneous tables in a RS
•  Solution
   –  Region balancing
   –  External flush scheduler
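HBase cannot remove an HLog until every region with edits in it has flushed, and once the number of HLogs passes the configured limit (`hbase.regionserver.maxlogs`) the RegionServer forces flushes of the regions holding the oldest edits. An external scheduler can get ahead of this by flushing those regions on its own schedule. The sketch below shows only the selection step, under assumptions: each region reports the ID of the oldest HLog in which it still has unflushed edits, and `pick_regions_to_flush`, `target_logs`, and the region names are hypothetical.

```python
def pick_regions_to_flush(oldest_log_by_region, current_log, target_logs):
    """Pick regions whose flush lets HLogs older than the target window go.

    oldest_log_by_region: region name -> ID of the oldest HLog that still
        holds unflushed edits for that region (IDs increase over time).
    current_log: ID of the HLog currently being written.
    target_logs: number of most recent HLogs we are willing to retain.
    """
    cutoff = current_log - target_logs + 1   # oldest HLog ID we want to keep
    pinned = [(log_id, region)
              for region, log_id in oldest_log_by_region.items()
              if log_id < cutoff]
    # Flush the regions pinning the oldest logs first.
    return [region for _, region in sorted(pinned)]


state = {"msg,0001": 3, "msg,0002": 17, "profile,0001": 9, "msg,0003": 20}
to_flush = pick_regions_to_flush(state, current_log=20, target_logs=8)
```

In this example the slowly written regions are the ones pinning old HLogs, which matches the cause listed above: regions that grow at different rates sharing one RS.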

Page 40: Storage infrastructure using HBase behind LINE messages

Case1: Number of HLogs

[Chart: number of HLogs across peak and off-peak hours, comparing a worse case and a better case. Annotations: not flushed, forced flush, flush storm, periodic flush.]

Page 41: Storage infrastructure using HBase behind LINE messages

Case2: Hotspot problems

•  Effect
   –  Excessive GC
   –  RS performance degradation (High CPU usage)
•  Cause
   –  Get/Scan:
      •  Row or column updated too frequently
      •  Row which has too many columns (+ tombstones)
•  Solution
   –  Schema and row/column distribution are important
   –  Hotspot region isolation
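One common way to spread a hot key range across regions is to prefix each rowkey with a short hash-derived salt. The deck does not say which technique LINE used, so the following is a generic sketch, with `salted_key` and the bucket count as assumptions. Salting spreads many different keys; it does not help when a single row is itself the hotspot, which is where schema changes or region isolation come in.

```python
import hashlib
from collections import Counter


def salted_key(rowkey: str, buckets: int = 16) -> str:
    """Prefix rowkey with a stable, hash-derived bucket number."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{bucket:02d}:{rowkey}"


# Sequential IDs would otherwise sort next to each other; salting scatters them.
keys = [salted_key(f"user{n:08d}") for n in range(10_000)]
per_bucket = Counter(k.split(":", 1)[0] for k in keys)
```

The trade-off is on the read side: a range scan over the original key order has to fan out to every bucket.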

Page 42: Storage infrastructure using HBase behind LINE messages

Case3: META region workload isolation

•  Effect
   1.  RS high CPU
   2.  Excessive timeout
   3.  META lookup timeout
•  Cause
   –  Inefficient exception handling of HBase client
   –  Hotspot region and META in same RS
•  Solution
   –  META only RS

Page 43: Storage infrastructure using HBase behind LINE messages

Case4: Region mappings to RS

•  Effect
   –  Region mapping is not restored on RS restart
   –  Some region mappings aren’t restored properly after graceful restart
      •  graceful_stop.sh --restart --reload
•  Cause
   –  HBase does not support it well
•  Solution
   –  Periodically dump the mapping and restore it
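A dump-and-restore of region assignments reduces to saving a region-to-server map and later diffing it against the live assignment. The sketch below assumes the mapping has already been collected as a dictionary (how it is collected is outside the sketch), and `plan_moves` is a hypothetical helper. Each planned move is rendered as an HBase shell `move` command, which takes an encoded region name and a server name. The region names, hosts, ports, and start codes are made up.

```python
import json


def plan_moves(saved, current, live_servers):
    """Return (encoded_region, server_name) moves that restore `saved`.

    saved / current: encoded region name -> hostname.
    live_servers: hostname -> full server name ("host,port,startcode"),
        which changes on restart, so it is resolved at restore time.
    """
    moves = []
    for region, host in sorted(saved.items()):
        if region not in current:      # region was split, merged, or dropped
            continue
        if current[region] != host and host in live_servers:
            moves.append((region, live_servers[host]))
    return moves


# Dump: persist the mapping periodically (here, to a JSON string).
dumped = json.dumps({"a1b2": "rs1", "c3d4": "rs2", "e5f6": "rs2"})

# Restore: after rs2 restarted, its regions are sitting on rs1.
current = {"a1b2": "rs1", "c3d4": "rs1", "e5f6": "rs1"}
live = {"rs1": "rs1,60020,1358730000000", "rs2": "rs2,60020,1358739999999"}
moves = plan_moves(json.loads(dumped), current, live)
shell_lines = [f"move '{region}', '{server}'" for region, server in moves]
```

Saving hostnames rather than full server names is deliberate: the start code in a server name changes whenever the RS restarts, so a dump keyed on it would never match.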

Page 44: Storage infrastructure using HBase behind LINE messages

Summary

•  IDC online migration
   –  Without downtime
   –  LINE HBase Replicator & BulkMigrator
•  NN failover
   –  Simple solution for a person saying “What’s Hadoop?”
•  Stabilizing LINE message cluster
   –  Improved response time of RS

Page 45: Storage infrastructure using HBase behind LINE messages

Conclusion

We won 100M users with HBase

LINE Storage is a successful example of a messaging service using HBase

Page 46: Storage infrastructure using HBase behind LINE messages