April 2014 HUG : Apache Phoenix

TRANSCRIPT

Page 1: April 2014 HUG : Apache Phoenix

© Hortonworks Inc. 2011

Apache Phoenix – SQL skin over HBase

Jeffrey Zhong ([email protected], [email protected])

Page 2: April 2014 HUG : Apache Phoenix


Overview

• What is Phoenix?
• Major Phoenix Features
• Futures
• Phoenix In Action
• Summary

Architecting the Future of Big Data

Page 3: April 2014 HUG : Apache Phoenix


What is Phoenix?

• SQL skin for HBase, originally developed at Salesforce.com and now an Apache Incubator project
• Targets low-latency queries over HBase data
• Query engine transforms SQL into native HBase APIs (put, delete, parallel scans) instead of MapReduce
• Delivered as a fat JDBC driver (client)
• Supports features not provided by HBase: secondary indexing, multi-tenancy, a simple hash join, and more

Page 4: April 2014 HUG : Apache Phoenix

Phoenix Semantics Support


Feature            Supported?
UPSERT / DELETE    Yes
SELECT             Yes
WHERE / HAVING     Yes
GROUP BY           Yes
ORDER BY           Yes
LIMIT              Yes
Views              Yes
JOIN               Yes (introduced in 4.0, limited to hash joins)
Transactions       No

Page 5: April 2014 HUG : Apache Phoenix


Why Phoenix?

• Leverage existing tooling: SQL clients
• Frees you from writing large amounts of code to do simple things:

SELECT COUNT(*) FROM WEB_STAT WHERE HOST='EU' AND CORE > 35 GROUP BY DOMAIN;

• Performance optimizations are transparent to the user: Phoenix breaks up queries into multiple scans and runs them in parallel. For aggregate queries, coprocessors perform partial aggregation on the local region server and return only the relevant data to the client.
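The scatter/gather idea above can be sketched in Python: split the rows into partitions, let each worker compute a partial filtered GROUP BY count (standing in for the server-side coprocessor), and merge the partials on the client. This is an illustrative toy, not Phoenix's actual implementation; the sample rows and the `parallel_group_count` helper are invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy rows: (host, domain, core), mimicking the WEB_STAT query above.
ROWS = [
    ("EU", "apache.org", 50), ("EU", "apache.org", 10),
    ("EU", "hortonworks.com", 90), ("NA", "apache.org", 80),
]

def partial_aggregate(partition):
    """'Server side': filter and partially aggregate one region's rows."""
    counts = Counter()
    for host, domain, core in partition:
        if host == "EU" and core > 35:
            counts[domain] += 1
    return counts

def parallel_group_count(rows, ways=2):
    """'Client side': run partitions in parallel, then merge partial results."""
    chunk = max(1, len(rows) // ways)
    partitions = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=ways) as pool:
        for partial in pool.map(partial_aggregate, partitions):
            merged.update(partial)
    return dict(merged)
```

Only the small per-group partial counts cross the wire in the merge step, which is the point of doing the aggregation server-side.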

Page 6: April 2014 HUG : Apache Phoenix

Phoenix Query Optimization

0: jdbc:phoenix:localhost> explain SELECT count(*) FROM WEB_STAT WHERE HOST='EU' and CORE > 35 GROUP BY DOMAIN;
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 32-WAY RANGE SCAN OVER WEB_STAT ['EU'] |
| SERVER FILTER BY USAGE.CORE > 35 |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [DOMAIN] |
| CLIENT MERGE SORT |
+------------+


CREATE TABLE IF NOT EXISTS WEB_STAT (
    HOST CHAR(2) NOT NULL,
    DOMAIN VARCHAR NOT NULL,
    FEATURE VARCHAR NOT NULL,
    DATE DATE NOT NULL,
    USAGE.CORE BIGINT,
    USAGE.DB BIGINT,
    STATS.ACTIVE_VISITOR INTEGER
    CONSTRAINT PK PRIMARY KEY (HOST, DOMAIN, FEATURE, DATE)
);

SELECT count(*) FROM WEB_STAT WHERE HOST='EU' and CORE > 35 GROUP BY DOMAIN;

WEB_STAT Table Schema

Page 7: April 2014 HUG : Apache Phoenix


Major Features In Phoenix

• DDL support: CREATE/DROP/ALTER TABLE for adding/removing columns
• Extending the schema at query time: Dynamic Columns
• Salting
• Mapping to an existing HBase table
• DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows
• Secondary Indexes to improve performance for queries on non-row-key columns (still maturing)
• Multi-Tenancy (available in Phoenix 3.0/4.0)
• Limited Hash Join (available in Phoenix 3.0/4.0)
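The hash join mentioned above follows the classic algorithm: build an in-memory hash table over one side, then probe it with the other. A minimal Python sketch of the idea (the function and sample data are invented for illustration, not Phoenix code):

```python
def hash_join(left, right, left_key, right_key):
    """Build a hash table over the right side, then probe with the left side."""
    table = {}
    for row in right:
        table.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        # Each probe is an O(1) lookup instead of a scan of the right side.
        for match in table.get(row[left_key], []):
            joined.append({**row, **match})
    return joined
```

This is why the join is "limited": the build side must fit in memory, which is the main constraint of a pure hash join.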

Page 8: April 2014 HUG : Apache Phoenix

Phoenix Futures

• Improved secondary indexing: tolerant of region split/merge and RegionServer failures
• Improved JOIN support
• Transaction support
• Improved Phoenix / Hive interoperability
• More at http://phoenix.incubator.apache.org/roadmap.html

Page 9: April 2014 HUG : Apache Phoenix

Mapping an existing HBase Table


• create 't1', {NAME => 'f1', VERSIONS => 3}
  – put 't1', 'r1', 'f1:col1', 'val1'
  – put 't1', 'r2', 'f1:col2', 'val2'

• Mapping t1 into a Phoenix table
  – Phoenix stores its own metadata in the SYSTEM.CATALOG table, so you need to create a Phoenix table or view to map the existing HBase table
  – By default, Phoenix upper-cases unquoted identifiers, so it is better practice to always use double quotes

• create table "t1" (myPK VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);

0: jdbc:phoenix:localhost> select * from "t1";
+------------+------------+
|    MYPK    |    col1    |
+------------+------------+
| r1         | val1       |
| r2         | null       |
+------------+------------+
2 rows selected (0.049 seconds)

0: jdbc:phoenix:localhost> select * from t1;
Error: ERROR 1012 (42M03): Table undefined. tableName=T1 (state=42M03,code=1012)

Page 10: April 2014 HUG : Apache Phoenix


Changes Behind the Scenes of Mapping


• Metadata is inserted into the SYSTEM.CATALOG table

0: jdbc:phoenix:localhost> select table_name, column_name, table_type from system.catalog where table_name='t1';
+------------+-------------+------------+
| TABLE_NAME | COLUMN_NAME | TABLE_TYPE |
+------------+-------------+------------+
| t1         | null        | u          |
| t1         | MYPK        | null       |
| t1         | col1        | null       |
+------------+-------------+------------+

• An empty cell is created for each row. It is used to enforce PRIMARY KEY constraints, because HBase doesn't store cells with NULL values.

hbase(main):023:0> scan 't1'
ROW    COLUMN+CELL
 r1    column=f1:_0, timestamp=1397527184229, value=
 r1    column=f1:col1, timestamp=1397527184229, value=val1
 r2    column=f1:_0, timestamp=1397527197205, value=
 r2    column=f1:col2, timestamp=1397527197205, value=val2
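The role of that empty `_0` cell can be modeled with a toy in-memory store: NULL cells are never written, so without a marker cell a row whose non-PK columns are all NULL would simply disappear. A hypothetical Python sketch (the `upsert`/`select_all` helpers are invented for illustration):

```python
# Toy HBase-like store: {row_key: {column: value}}. Like HBase, it has no
# schema and simply does not store cells whose value is NULL.
store = {}

def upsert(row_key, columns):
    """Phoenix-style upsert: always write an empty marker cell so the row
    exists even when every non-PK column is NULL."""
    cells = {col: val for col, val in columns.items() if val is not None}
    cells["f1:_0"] = ""  # the empty marker cell keeps the row visible
    store.setdefault(row_key, {}).update(cells)

def select_all():
    """A row is returned as long as its marker cell exists."""
    return sorted(store)
```

Without the `f1:_0` marker, upserting a row with only NULL column values would write nothing at all, and the primary key would be lost.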

Page 11: April 2014 HUG : Apache Phoenix


Mapping an existing HBase Table – Cont.

• The way the bytes were serialized in the existing table must match the way Phoenix serializes them. Refer to the Phoenix data types reference (http://phoenix.incubator.apache.org/language/datatypes.html).

Page 12: April 2014 HUG : Apache Phoenix

Dynamic Columns - Extend Schema During Query

• HBase can create new columns (qualifiers) after a table is created. In Phoenix, a subset of columns may be specified at table create time, while the rest can be surfaced at query time through dynamic columns.
  – In the previous table mapping, we only mapped one column, "f1"."col1":

create table "t1" (myPK VARCHAR PRIMARY KEY, "f1"."col1" VARCHAR);

  – In order to get data from col2, we can do:

0: jdbc:phoenix:localhost> select * from "t1"("f1"."col2" VARCHAR);
+------------+------------+------------+
|    MYPK    |    col1    |    col2    |
+------------+------------+------------+
| r1         | val1       | null       |
| r2         | null       | val2       |
+------------+------------+------------+
2 rows selected (0.065 seconds)

Page 13: April 2014 HUG : Apache Phoenix

Secondary Index

• Index data is stored in a separate HBase table, located on different region servers than the data table.
• Two types of secondary index:

Immutable Indexes
  – Target tables whose rows are immutable once written
  – When new rows are inserted, updates are sent to the data table and then to the index table
  – The client handles failures

Mutable Indexes


Page 14: April 2014 HUG : Apache Phoenix


Phoenix Secondary Index – Cont.

Mutable Indexes
  – Implemented through coprocessors
  – Aborts the region server when an index update fails (this can be changed with a custom IndexFailurePolicy)

Courtesy of Jesse Yates, from the SF HBase User Group slides

Page 15: April 2014 HUG : Apache Phoenix


Phoenix Secondary Index – Cont.

• Index Creation
  – The same statement creates both types of index: immutable indexes are created for tables created with "IMMUTABLE_ROWS=true"; otherwise mutable indexes are created
  – DDL statement:

CREATE INDEX <index_name> ON <table_name> (<columns_to_index> ...) INCLUDE (<columns_to_cover> ...);

  – Example:

create index "t1_index" on "t1" ("f1"."col1");

  – Verify the index will be used:

0: jdbc:phoenix:localhost> explain select * from "t1" where "f1"."col1"='val1';
+------------+
|    PLAN    |
+------------+
| CLIENT PARALLEL 1-WAY RANGE SCAN OVER t1_index ['val1'] |
+------------+
1 row selected (0.037 seconds)

Page 16: April 2014 HUG : Apache Phoenix

Phoenix Secondary Index – Cont.

• How Index Data Is Stored

hbase(main):008:0> scan 't1_index'
ROW             COLUMN+CELL
 \x00r2         column=0:_0, timestamp=1397611429248, value=
 val1\x00r1     column=0:_0, timestamp=1397611429248, value=

The index row key is the concatenation of the indexed column values, delimited by a zero byte, ending with the data table's primary key. If you define covered columns, you will also see cells with their values in the index table.
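That encoding rule can be sketched in Python. This is a simplified model for illustration; the helper is invented and ignores details such as type-specific serialization:

```python
def index_row_key(indexed_values, data_row_key):
    """Concatenate the indexed column values, delimited by a zero byte,
    and end with the data table's primary key. NULL values become empty."""
    parts = [v if v is not None else b"" for v in indexed_values]
    return b"\x00".join(parts) + b"\x00" + data_row_key
```

For the scan output above: r1 (col1 = 'val1') maps to b"val1\x00r1", and r2 (col1 is NULL) maps to b"\x00r2", which is why the rows sort in that order.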

Page 17: April 2014 HUG : Apache Phoenix

Salted Table

• Salting prevents HBase region server hotspotting when row keys increase monotonically. Phoenix provides a way to salt the row key with a leading salt byte at table creation time.

For optimal performance, the number of salt buckets should match the number of region servers.


CREATE TABLE table (a_key VARCHAR PRIMARY KEY, a_col VARCHAR) SALT_BUCKETS = 20;
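A toy model of the idea: derive a stable one-byte salt from a hash of the row key and prepend it, so sequential keys scatter across buckets instead of hammering one region. The hash below is an arbitrary stand-in for illustration, not the hash Phoenix actually uses:

```python
SALT_BUCKETS = 20  # matches the SALT_BUCKETS = 20 in the DDL above

def salt_byte(row_key: bytes, buckets: int = SALT_BUCKETS) -> int:
    """Stable bucket number derived from a simple polynomial hash of the key."""
    h = 0
    for b in row_key:
        h = (h * 31 + b) & 0x7FFFFFFF
    return h % buckets

def salted_key(row_key: bytes) -> bytes:
    """Prepend the one-byte salt; the original key is preserved after it."""
    return bytes([salt_byte(row_key)]) + row_key
```

Because the salt is deterministic, point lookups still work (recompute the salt from the key), but range scans must now fan out across all buckets, which is the trade-off of salting.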

Page 18: April 2014 HUG : Apache Phoenix


Resources

• Apache Phoenix home page
  – http://phoenix.incubator.apache.org/index.html
• Mailing lists
  – http://phoenix.incubator.apache.org/mailing_list.html
• Latest release
  – Phoenix 3.0 for HBase 0.94.*, Phoenix 4.0 for HBase 0.98.1+ (http://phoenix.incubator.apache.org/download.html)
  – HDP (Hortonworks Data Platform) 2.1 will ship Phoenix 4.0

Page 19: April 2014 HUG : Apache Phoenix

Try It Yourself

• Load the sample data:
./psql.py localhost ../examples/WEB_STAT.sql ../examples/WEB_STAT.csv

• Start the SQL client:
./sqlline.py localhost

• Run the performance test:
./performance.py localhost 10000


This assumes the HBase ZooKeeper quorum string is "localhost" and you are in the bin folder of the installation.

Page 20: April 2014 HUG : Apache Phoenix


Summary

• Phoenix vs. HBase native APIs: as a rule of thumb, use Phoenix as your HBase client whenever possible, because Phoenix provides easy-to-use APIs and performance optimizations.

Page 21: April 2014 HUG : Apache Phoenix

Questions? Comments?