Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster
Masayasu “Mas” Suzuki, Shinji Nagasaka, Takanari Tamesue (Sony Corporation)

Posted on 16-Apr-2017

Page 1: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

Masayasu “Mas” Suzuki

Shinji Nagasaka Takanari Tamesue

Sony Corporation

Page 2: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

2

Who we are, and why we chose HBase/Phoenix

We are DevOps members from Sony’s News Suite team – http://socialife.sony.net/

HBase/Phoenix was chosen because of
– Scalability,
– SQL compatibility, and
– secondary indexing support

Page 3: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

3

Our use case

Sony News Suite Server Architecture:

End users connect over HTTP (via the Internet) to the Application Server. Within it, the EventHandler issues SQL (READ) queries against HBase/Phoenix, while the Fetcher retrieves content from outside content providers over HTTP and issues SQL (WRITE) queries.

The main use case is caching content temporarily

Page 4: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

4

Basic test design

Query response time is measured between the application server and HBase/Phoenix (shown in red on the original slide diagram)

Query read/write ratio is 6 to 1

12 different types of queries using eight separate indexes

Page 5: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

5

Table schema

A table with 1.2 billion records was created. Each record is around 1.0 KB

– Raw data is around 1.7 KB each
– Gzip is used to compress column pt, which brings the total down to around 1.0 KB

id is the primary key

– Two MD5-hashed values are concatenated to create id
  • Example: df461a2bda4002aaaa8117d4e43ee737_cfcd208495d565ef66e7dff9f98764da

Columns and sample data (1.2 billion records in total):

id CHAR(65) | ai VARCHAR | ao VARCHAR | b DECIMAL | c DECIMAL | cl CHAR(5) | lg CHAR(2) | lw DECIMAL | u DECIMAL | pt VARBINARY
1adf…       | TR         | DSATE...   | 82122...  | 9071.9    | true       | es         | 823.199    | 0.1243    | (binary)
9d0a…       | FB         | Adad...    | 54011…    | 122114.5  | true       | ja         | 23.632     | 5.22      | (binary)
c5ae...     | KW         | 4 of …     | 20011…    | 3253.55   | false      | fr         | 0.343      | 2.77      | (binary)
ea4a...     | AB         | p7mj…      | 67691…    | 8901.0    | true       | en         | 76.21      | 23.11     | (binary)
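The 65-character id above (two concatenated MD5 digests) could be built as sketched below; the exact inputs hashed in production are not stated in the deck, so the two parts are treated as opaque strings here:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class CompositeKey {
    // Hex-encode the MD5 digest of a string (32 lowercase hex chars).
    static String md5Hex(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            // Left-pad to 32 chars in case the digest has leading zero bytes
            return String.format("%032x", new BigInteger(1, d));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }

    // Two digests joined by "_" give 32 + 1 + 32 = 65 chars, matching CHAR(65).
    static String makeId(String part1, String part2) {
        return md5Hex(part1) + "_" + md5Hex(part2);
    }
}
```

Uniformly distributed hex keys like this are what make the even split points on the next slide possible.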

Page 6: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

6

Split points

Because it was impossible to store all 1.2 billion records on a single node, we manually split the table by defining the split points

Split points were set so that each divided block, or region file, would be nearly equal in size

– This was possible because we knew
  a. the exact range of our primary keys, and
  b. that the hashed values of our primary keys would be uniformly distributed

CREATE TABLE IF NOT EXISTS TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC (
  id CHAR(65) NOT NULL,
  ai VARCHAR, ao VARCHAR, b DECIMAL, c DECIMAL,
  cl CHAR(5), lg CHAR(2), lw DECIMAL, u DECIMAL, p_t VARBINARY,
  CONSTRAINT my_pk PRIMARY KEY ( id )
)
COMPRESSION='LZ4', VERSIONS='1', MAX_FILESIZE=26843545600
SPLIT ON ( '0148','0290','03d8','0520','0668','07b0','08f8','0a40', …,'fef8' );
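The deck does not show how the split points were generated; one plausible reconstruction, given uniformly distributed hex keys, is to divide the first two bytes of the key space (0x0000–0xFFFF) into nearly equal ranges. With 200 regions and step = ceil(65536 / 200) = 0x148, this reproduces the values on the slide ('0148', '0290', …, 'fef8'):

```java
import java.util.ArrayList;
import java.util.List;

class SplitPoints {
    // N regions need N-1 split points over the 2-byte hex prefix space.
    static List<String> generate(int regions) {
        int step = (int) Math.ceil(65536.0 / regions);
        List<String> points = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            points.add(String.format("%04x", i * step));
        }
        return points;
    }
}
```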

Page 7: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

7

Distribution of region files per RegionServer

If split points can be evenly set, then data allocation can be evened out

[Chart: total data size per node across the 200 RegionServers; different colors denote different tables]

Page 8: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

8

Queries

Ratio of R/W queries is 6 to 1

Constants (e.g. 228343239, the value of b in the first example below) were randomly generated to simulate the current production environment

Sample READ queries:

SELECT id FROM TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC WHERE b=228343239 AND cl='false';

SELECT id FROM TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC WHERE ai='AB' AND cl='false' AND c>0 AND c<1417648603068;

Sample WRITE query:

/* Written as a Java PreparedStatement */
UPSERT INTO TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC (id,p_t,c,lw,u) VALUES (?,?,?,?,?)

Page 9: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

9

Queries – Details

Query No. | Name   | Read/Write | Percentage generated | Description                                         | Randomly generated part
1         | Id     | READ       | 25%                  | Search using primary key                            | Id (primary key)
2         | IdCnt  | READ       | 10%                  | Count using primary key                             | Id (primary key)
3         | IdOr   | READ       | 10%                  | Search using “OR” of ten primary keys               | Id (primary key)
4         | AiAoU  | READ       | 5%                   | Search using columns Ai, Ao, and U                  | Ai, Ao, U
5         | AiCCl  | READ       | 5%                   | Search using columns Ai, C, and Cl                  | Ai, C, Cl
6         | AiLwCl | READ       | 5%                   | Search using columns Ai, Lw, and Cl                 | Ai, Lw, Cl
7         | AiULg  | READ       | 5%                   | Search using columns Ai, U, and Lg                  | Ai, U, Lg
8         | BCl    | READ       | 5%                   | Search using columns B and Cl                       | B, Cl
9         | BLg    | READ       | 5%                   | Search using columns B and Lg                       | B, Lg
10        | CLg    | READ       | 5%                   | Search using columns C and Lg                       | C, Lg
11        | LwLg   | READ       | 5%                   | Search using columns Lw and Lg                      | Lw, Lg
12        | PtCLwU | WRITE      | 15%                  | Upsert binary data Pt and upsert columns C, Lw, U   | Id (primary key), Pt, C, Lw, U
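The actual load generator is not shown in the deck; as an illustration, the mix above could be reproduced client-side with weighted random selection over the percentages in the table:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class QueryMix {
    // Query-type weights from the table; they sum to 100.
    static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
    static {
        WEIGHTS.put("Id", 25);    WEIGHTS.put("IdCnt", 10);
        WEIGHTS.put("IdOr", 10);  WEIGHTS.put("AiAoU", 5);
        WEIGHTS.put("AiCCl", 5);  WEIGHTS.put("AiLwCl", 5);
        WEIGHTS.put("AiULg", 5);  WEIGHTS.put("BCl", 5);
        WEIGHTS.put("BLg", 5);    WEIGHTS.put("CLg", 5);
        WEIGHTS.put("LwLg", 5);   WEIGHTS.put("PtCLwU", 15); // the only write
    }

    // Pick one query type at random, proportionally to its weight.
    static String pick(Random rnd) {
        int r = rnd.nextInt(100);
        for (Map.Entry<String, Integer> e : WEIGHTS.entrySet()) {
            r -= e.getValue();
            if (r < 0) return e.getKey();
        }
        throw new IllegalStateException("weights must sum to 100");
    }
}
```

With 85% reads and 15% writes, this matches the roughly 6-to-1 read/write ratio stated earlier.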

Page 10: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

10

Secondary indexes

The following eight indexes were created

The eight indexes are designed to be orthogonal

Split points were manually set for the index tables so that each region file would be similar in size

Index No. | Name   | Index type         | Description
1         | AiAoU  | CHAR/CHAR/DECIMAL  | For use in search using columns Ai, Ao, and U
2         | AiCCl  | CHAR/DECIMAL/CHAR  | For use in search using columns Ai, C, and Cl
3         | AiLwCl | CHAR/DECIMAL/CHAR  | For use in search using columns Ai, Lw, and Cl
4         | AiULg  | CHAR/DECIMAL/CHAR  | For use in search using columns Ai, U, and Lg
5         | BCl    | DECIMAL/CHAR       | For use in search using columns B and Cl
6         | BLg    | DECIMAL/CHAR       | For use in search using columns B and Lg
7         | CLg    | DECIMAL/CHAR       | For use in search using columns C and Lg
8         | LwLg   | DECIMAL/CHAR       | For use in search using columns Lw and Lg

Page 11: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

11

Test environment

HBase cluster:

– 3 Zookeepers (3 x m3.xlarge): Zookeeper 1–3
– 3 HMasters (3 x m3.xlarge): Main, Secondary, Secondary Backup
– 200 RegionServers, each with local disk (199 x r3.xlarge, plus 1 x c4.8xlarge housing SYSTEM.CATALOG, the metadata table for the Phoenix plug-in)
– 100 clients (100 x c4.xlarge): Client 1–100

Page 12: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

12

Tools used

Tools were especially useful for
– Pinpointing the bottlenecks in resource usage
– Determining when and where an error occurred within the cluster
– Verifying the effect of solutions applied
– Managing multiple nodes seamlessly without having to manage them separately

Tools used | Purpose
(image)    | Analysis of resource usage per AWS instance (ex. CPU usage, network traffic, disk utilization, Java stats)
(image)    | Analysis of status of HBase and Hadoop layers (ex. number of regions, store files, requests)
(image)    | Analysis of distribution of each HBase table over the cluster (ex. number and size of region files per node)
Fabric     | Remotely control multiple nodes via SSH

Page 13: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

13

Performance test apparatus & results

Test apparatus:

– Number of records: 1.2 billion records (1 KB each)
– Number of indexes: 8 orthogonal indexes
– Servers: 3 Zookeepers (Zookeeper 3.4.5, m3.xlarge x 3); 3 HMaster servers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, m3.xlarge x 3); 200 RegionServers (Hadoop 2.5.0, HBase 0.98.6, Phoenix 4.3.0, r3.xlarge x 199, c4.8xlarge x 1)
– Clients: 100 x c4.xlarge

Test results:

– Number of queries: 51,053 queries/sec
– Response time (average): 46 ms

Page 14: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

14

Cost

Total: $325,236 (per year, “All Upfront” pricing)

This is a preliminary setup! – There is room for further spec/cost optimization

Node Type                                                      | Instance Type | Quantity | Cost (per year)
HBase: ZooKeeper                                               | m3.xlarge     | 3        | $4,284
Hadoop: Name Node / HBase: HMaster                             | m3.xlarge     | 3        | $4,284
Hadoop: Data Node / HBase: RegionServer                        | r3.xlarge     | 199      | $307,455
HBase: RegionServer (for housing meta table SYSTEM.CATALOG)    | c4.8xlarge    | 1        | $9,213

Page 15: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

15

Five major tips to maximize performance using HBase/Phoenix

Ordered by effectiveness

Page 16: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

16

Tips 1 – Use SQL hint clause when using an index

[Chart: response time [ms] per query type (queries using the primary key, the write query, and queries using an index) over elapsed time [hours], without the hint clause]

[Chart: the same measurement with the hint clause — for queries using an index, performance improved by 6 times]

Page 17: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

17

Tips 1 – Use SQL hint clause when using an index

Major possible cause (yet to be verified)

– When an index is used, an extra RPC is issued to verify the latest metadata/statistics
– Using a hint clause may reduce this RPC (still a hypothesis)

Other possible solutions

– Changing “UPDATE_CACHE_FREQUENCY” (available from Phoenix 4.7) may resolve this issue (we have not tried this yet)

From the Phoenix website (https://phoenix.apache.org/#Altering): “When a SQL statement is run which references a table, Phoenix will by default check with the server to ensure it has the most up to date table metadata and statistics. This RPC may not be necessary when you know in advance that the structure of a table may never change.”
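As a sketch of what both workarounds look like in Phoenix SQL (the secondary-index name IDX_BCL below is hypothetical; only the long table name comes from the deck):

```sql
-- Force a specific secondary index with a hint clause:
SELECT /*+ INDEX(TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC IDX_BCL) */ id
FROM TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC
WHERE b=228343239 AND cl='false';

-- From Phoenix 4.7, the metadata-check RPC can instead be throttled per table
-- (the value is the minimum interval, in milliseconds, between cache refreshes):
ALTER TABLE TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC
  SET UPDATE_CACHE_FREQUENCY = 900000;
```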

Page 18: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

18

Tips 2 – Use memories aggressively

In the early stages of our testing, disk utilization and iowait on the RegionServers were extremely high

[Charts: disk utilization and iowait over the test period]

Page 19: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

19

Tips 2 – Use memories aggressively

The issue was most critical during major compaction and index creation

Initially, we thought we had enough memory

– Total size of data (including all tables/indexes and mirrored data in the Hadoop layer): more than 1,360 GB
– Total available memory combined on the RegionServers (at the time): around 1,500 GB (m3.2xlarge (30 GiB) x 50 nodes)

But this left very little margin for computation-intensive tasks

We decided to allocate memory of at least 3 times the size of the data, for added protection and performance (this has worked thus far)

Page 20: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

20

Tips 3 – Manually split the region file but don’t over split them

A single table is too big to be placed on and managed by a single node

We wanted to know whether we should split in a finer or a coarser way

Page 21: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

21

Tips 3 – Manually split the region file but don’t over split them

Comparison between 200 and 4002 split points – 200 RegionServers were used in both cases

Don’t over split region files

[Charts: volume processed [queries/sec] and response time [ms] vs. elapsed time [H], SplitPoint = 200 vs. SplitPoint = 4002 — the 200-split configuration performed better]

Page 22: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

22

Tips 4 – Scale-out instead of scale-up

Comparison of RegionServers running c3.4xlarge and c3.8xlarge

– c3.8xlarge is twice the spec of c3.4xlarge
– The combined computing power of “100 nodes of c3.4xlarge” equals that of “50 nodes of c3.8xlarge”, but the former scores better

[Charts: volume processed [queries/sec] and response time [ms] vs. elapsed time [H], c3.4xlarge x 100 vs. c3.8xlarge x 50]

Scale-out!

Page 23: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

23

Tips 5 – Avoid running power-intensive tasks simultaneously

For example, do not run major compaction together with index creation

Also, the performance impact of major compaction can be lessened by running it in smaller units
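"Smaller units" here can mean compacting one table at a time; a sketch from the HBase shell (the table name is the deck's, index-table names would follow the same pattern):

```
hbase shell
> major_compact 'TBL_1200M_IDX_LZ4_VER1_SPLT200_PTBIN_INT2DEC'
# wait for the compaction to finish before issuing major_compact
# for the next table or index table
```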

Major compaction for nine tables done simultaneously: 26,142 queries/sec processed, 91 ms response time

Major compaction for nine tables done separately: 29,980 queries/sec processed, 80 ms response time

Running compactions separately gave a 13% increase in volume processed and 9% faster responses

Page 24: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

24

Items of very limited or no success

Page 25: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

25

First and foremost

Please understand that these are lessons learned through our tests in our environment

Any one or all of these items may prove useful in your environment

Page 26: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

26

Items of limited success – Changing GC algorithm

The RegionServers’ GC algorithm was changed and tested

Performance is more even with G1

Performance of G1 is, on average, 2% lower than CMS

[Charts: volume processed [queries/sec] and response time [ms] vs. elapsed time [H], CMS vs. G1]
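The deck does not list the exact JVM options used; a minimal sketch of how the two collectors are typically selected for RegionServers in hbase-env.sh:

```
# hbase-env.sh — pick one GC; the tuning values shown are illustrative.
# CMS:
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
# G1:
# export HBASE_OPTS="$HBASE_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=100"
```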

Page 27: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

27

Items of limited success – Changing Java heap size

The RegionServers’ Java heap size was changed and tested

Maximum physical memory is 30.5 GiB (r3.xlarge)

When the heap was set to 26.0 GB, the system crashed after five hours

[Charts: volume processed [queries/sec] and response time [ms] vs. elapsed time [H], JavaHeap = 20.5 GB / 23.0 GB / 26.0 GB]
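A sketch of how such a heap setting is pinned for RegionServers (the deck does not show its configuration; the 20.5 GB value is the stable setting from the test above):

```
# hbase-env.sh — fix the RegionServer heap, leaving headroom out of the
# 30.5 GiB of physical memory on r3.xlarge (26 GB crashed in our test).
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms20500m -Xmx20500m"
```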

Page 28: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

28

Items of limited success – Changing disk file format

The RegionServers’ disk file system was changed and tested

The newer xfs tends to score slightly better when compared at its highs

[Charts: volume processed [queries/sec] and response time [ms] vs. elapsed time [H], ext4 vs. xfs]

Page 29: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

29

Closing comments

Page 30: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

30

Five major tips to maximize performance on HBase/Phoenix

Ordered by effectiveness (most effective at the very top)

Tips 1. Use SQL hint clause when using a secondary index
– An extra RPC is issued when the client runs a SQL statement that uses a secondary index; using a SQL hint clause can mitigate this
– From Ver. 4.7, changing “UPDATE_CACHE_FREQUENCY” may also work (we have yet to test this)

Tips 2. Use memories aggressively
– A memory-rich node should be selected for use in RegionServers so as to minimize disk access

Tips 3. Manually split the region file if you can, but never over split them

Tips 4. Scale-out instead of scale-up
– More nodes running in parallel yield better results than fewer but more powerful nodes running in parallel

Tips 5. Avoid running power-intensive tasks simultaneously
– As an example, running major compaction and index creation simultaneously should be avoided

Page 31: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

31

Special Thanks

Takafumi Suzuki – Thank you very much for the countless and invaluable discussions – We owe the success of this project to you!

Thank you very much!

Page 32: Five major tips to maximize performance on a 200+ SQL HBase/Phoenix cluster

“Sony” is a registered trademark of Sony Corporation.

Names of Sony products and services are the registered trademarks and/or trademarks of Sony Corporation or its Group companies.

Other company names and product names are the registered trademarks and/or trademarks of the respective companies.