Apache Phoenix: Past, Present and Future of SQL over HBase


Upload: enissoz

Post on 20-Jan-2017


TRANSCRIPT

Page 1: Apache Phoenix: Past, Present and Future of SQL over HBase

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Apache Phoenix and HBase: Past, Present and Future of SQL over HBase

Enis Soztutar ([email protected])

Page 2

About Me

Enis Soztutar

Committer and PMC member in Apache HBase, Phoenix, and Hadoop

HBase/Phoenix team @Hortonworks

Twitter @enissoz

Disclaimer: Not a SQL expert!

Page 3

Outline

PART I – The Past (a.k.a. all the existing stuff)
• Phoenix: the basics
• Architecture
• Overview of existing Phoenix features

PART II – The Present (a.k.a. all the recent stuff)
• Look at recent releases
• Transactions
• Phoenix Query Server
• Other features

PART III – The Future (a.k.a. all the upcoming stuff)
• Calcite integration
• Phoenix – Hive

Page 4

Part I – The Past

All the existing stuff!

Page 5

Obligatory Slide - Who uses Phoenix

Page 6

Phoenix – The Basics

• Hope everybody is familiar with HBase
  • Otherwise you are in the wrong talk!

• What is wrong with pure HBase?
  • HBase is a powerful, flexible and extensible “engine”
  • Too low level
  • Have to write Java code to do anything!

• Phoenix is a relational layer over HBase
  • Also described as a SQL skin
  • Looking more and more like a generic SQL engine

• Why not Hive / Spark SQL / other SQL-over-Hadoop?
  • OLTP versus OLAP
  • As fast as HBase: 1 ms queries, 10K-1M qps

Page 7

Why SQL?

Page 8

From CDK Global slides: https://phoenix.apache.org/presentations/StrataHadoopWorld.pdf

Page 9

HBase Architecture

[Diagram: an Application with the HBase client; a ZooKeeper Quorum; three RegionServers, each on a DataNode]
• RegionServer 1: T:foo, region:c; T:bar, region:14; T:foo, region:d
• RegionServer 2: T:foo, region:a; T:bar, region:54; T:foo, region:t
• RegionServer 3: T:bar, region:32; T:foo, region:k

Page 10

Phoenix Architecture

[Diagram: an Application with the Phoenix client / JDBC on top of the HBase client; a ZooKeeper Quorum; three RegionServers, each on a DataNode, each with a Phoenix RPC endpoint and "px" markers]
• RegionServer 1: T:foo, region:c; T:bar, region:14; T:foo, region:d
• RegionServer 2: T:foo, region:c; T:bar, region:54; T:foo, region:t
• RegionServer 3: T:SYSTEM.CATALOG; T:bar, region:32; T:foo, region:k

Page 11

Phoenix Goodies

SQL DataTypes
Schemas / DDL / HBase table properties
Composite Types (Composite Primary Key)
Map existing HBase tables
Write from HBase, read from Phoenix
Salting
Parallel Scan
Skip scan
Filter push down
Statistics Collection / Guideposts

Page 12

DDL Example

CREATE TABLE IF NOT EXISTS METRIC_RECORD (
    METRIC_NAME VARCHAR,
    HOSTNAME VARCHAR,
    SERVER_TIME UNSIGNED_LONG NOT NULL,
    METRIC_VALUE DOUBLE,
    …
    CONSTRAINT pk PRIMARY KEY (METRIC_NAME, HOSTNAME, SERVER_TIME))
    DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY'
    SPLIT ON ('a', 'k', 'm');

Page 13

HBASE ROW KEY: METRIC_NAME, HOSTNAME, SERVER_TIME
OTHER COLUMNS: METRIC_VALUE

METRIC_NAME                    HOSTNAME               SERVER_TIME  METRIC_VALUE
Regionserver.readRequestCount  cn011.hortonworks.com  1396743589   92045759
Regionserver.readRequestCount  cn011.hortonworks.com  1396767589   93051916
Regionserver.readRequestCount  cn011.hortonworks.com  ….           …
Regionserver.readRequestCount  cn012.hortonworks.com  1396743589
…..                            …                      …            …
Regionserver.wal.bytesWritten  cn011.hortonworks.com
Regionserver.wal.bytesWritten  ….                     ….           …

SORT ORDER: rows run top to bottom in row key order, with the key columns compared left to right.
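A minimal Python sketch of the sort order shown above. The key encoding is a simplification assumed for illustration (strings terminated by a zero byte, the time as 8 big-endian bytes), and the rows are made up; it is not Phoenix's actual serialization code.

```python
# Hypothetical rows, loosely modeled on the METRIC_RECORD example.
rows = [
    ("Regionserver.wal.bytesWritten", "cn011.hortonworks.com", 1396743589),
    ("Regionserver.readRequestCount", "cn012.hortonworks.com", 1396743589),
    ("Regionserver.readRequestCount", "cn011.hortonworks.com", 1396767589),
    ("Regionserver.readRequestCount", "cn011.hortonworks.com", 1396743589),
]


def row_key(metric_name, hostname, server_time):
    """Assumed, simplified composite key: each string is followed by a zero
    byte separator and the unsigned time is 8 big-endian bytes."""
    return (metric_name.encode() + b"\x00"
            + hostname.encode() + b"\x00"
            + server_time.to_bytes(8, "big"))


# HBase keeps rows in byte order of the row key, so sorting by the encoded
# key reproduces the METRIC_NAME, HOSTNAME, SERVER_TIME ordering.
stored = sorted(rows, key=lambda r: row_key(*r))
for r in stored:
    print(r)
```

Because the leading key column dominates the order, all rows for one metric sit next to each other, which is what makes prefix scans on METRIC_NAME cheap.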

Page 14

Parallel Scan

SELECT * FROM METRIC_RECORD;

CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD

[Diagram: the Client issues four scans in parallel, one per region (Region1 to Region4), across RS 1, RS 2 and RS 3]

Page 15

Filter push down

SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7;

CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > DATE '2016-04-06 09:09:05.978'

[Diagram: the Client issues four scans in parallel, one per region (Region1 to Region4), across RS 1, RS 2 and RS 3; each scan applies a server-side filter]

Page 16

Skip Scan

SELECT * FROM METRIC_RECORD WHERE METRIC_NAME LIKE 'abc%' AND HOSTNAME IN ('host1', 'host2');

CLIENT 1-CHUNK PARALLEL 1-WAY SKIP SCAN ON 2 RANGES OVER METRIC_RECORD ['abc','host1'] - ['abd','host2']

[Diagram: the Client runs a skip scan over Region1 to Region4 on RS 1, RS 2 and RS 3]
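A rough Python sketch of the skip scan idea for the query above: instead of reading every row in the 'abc' range, the scanner seeks forward to the next key that can still match. The data, the `skip_scan` helper and its seek counter are hypothetical; Phoenix implements this inside an HBase filter, not like this.

```python
import bisect


def skip_scan(sorted_keys, prefix, hosts):
    """Return keys whose name starts with `prefix` and whose host is in
    `hosts`, seeking with bisect instead of visiting every key."""
    hosts = sorted(hosts)
    out, seeks = [], 1
    i = bisect.bisect_left(sorted_keys, (prefix,))  # initial seek
    while i < len(sorted_keys) and sorted_keys[i][0].startswith(prefix):
        name, host = sorted_keys[i][0], sorted_keys[i][1]
        if host in hosts:
            out.append(sorted_keys[i])
            i += 1
            continue
        # Seek to the next wanted host under this name, or to the next name.
        j = bisect.bisect_left(hosts, host)
        target = (name, hosts[j]) if j < len(hosts) else (name + "\x00",)
        i = bisect.bisect_left(sorted_keys, target, i)
        seeks += 1
    return out, seeks


# Made-up (METRIC_NAME, HOSTNAME, SERVER_TIME) keys in row key order.
keys = sorted([
    ("aaa", "host1", 1),
    ("abc1", "host0", 1),
    ("abc1", "host1", 1),
    ("abc1", "host1", 2),
    ("abc1", "host3", 1),
    ("abc2", "host2", 1),
    ("abd", "host1", 1),
])
matches, seeks = skip_scan(keys, "abc", ["host1", "host2"])
print(matches, seeks)
```

The payoff grows with the number of non-matching rows between wanted hosts, since each gap costs one seek rather than one read per row.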

Page 17

TopN

SELECT * FROM METRIC_RECORD WHERE SERVER_TIME > NOW() - 7 ORDER BY HOSTNAME LIMIT 5;

CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > …
    SERVER TOP 5 ROWS SORTED BY [HOSTNAME]
CLIENT MERGE SORT

[Diagram: the Client issues four scans in parallel, one per region (Region1 to Region4), across RS 1, RS 2 and RS 3; each scan sorts by HOSTNAME and returns only 5 rows]
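The plan above can be sketched in Python as follows: each region returns its own sorted top 5, and the client merge-sorts the partial results. The region contents and function names are made up for illustration; this is a model of the plan, not Phoenix code.

```python
import heapq
from itertools import islice


def server_top_n(region_rows, n, key):
    """Server side: keep only the n smallest rows, returned in sorted order."""
    return heapq.nsmallest(n, region_rows, key=key)


def client_merge_sort(per_region, n, key):
    """Client side: merge the already-sorted partial results, keep n rows."""
    return list(islice(heapq.merge(*per_region, key=key), n))


def by_host(row):
    return row[0]


# Hypothetical (HOSTNAME, METRIC_VALUE) rows spread over four regions.
regions = [
    [("cn07", 1.0), ("cn02", 2.0), ("cn09", 3.0)],
    [("cn01", 4.0), ("cn08", 5.0)],
    [("cn05", 6.0), ("cn03", 7.0), ("cn06", 8.0)],
    [("cn04", 9.0)],
]
partials = [server_top_n(r, 5, by_host) for r in regions]
top5 = client_merge_sort(partials, 5, by_host)
print(top5)
```

No region ever ships more than 5 rows, so the client handles at most 5 rows per scan regardless of table size.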

Page 18

Aggregation

SELECT METRIC_NAME, HOSTNAME, AVG(METRIC_VALUE)
FROM METRIC_RECORD
WHERE SERVER_TIME > NOW() - 7
GROUP BY METRIC_NAME, HOSTNAME
ORDER BY METRIC_NAME, HOSTNAME;

CLIENT 4-CHUNK PARALLEL 1-WAY FULL SCAN OVER METRIC_RECORD
    SERVER FILTER BY SERVER_TIME > …
    SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [METRIC_NAME, HOSTNAME]
CLIENT MERGE SORT

[Diagram: the Client issues four scans in parallel, one per region (Region1 to Region4), across RS 1, RS 2 and RS 3; each scan returns only data aggregated by METRIC_NAME, HOSTNAME]
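A small Python sketch of why server-side aggregation still gives a correct AVG when one group spans several regions: each region returns a partial (sum, count) per group and the client combines them. The partial-state layout is an assumption made for illustration, and the rows are invented.

```python
from collections import defaultdict


def server_aggregate(region_rows):
    """Per region: one partial (sum, count) per (METRIC_NAME, HOSTNAME)."""
    partial = defaultdict(lambda: [0.0, 0])
    for metric, host, value in region_rows:
        acc = partial[(metric, host)]
        acc[0] += value
        acc[1] += 1
    return sorted((key, tuple(acc)) for key, acc in partial.items())


def client_merge(partials):
    """Client: add up the partial states, then finish AVG = sum / count."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for key, (s, c) in part:
            total[key][0] += s
            total[key][1] += c
    return [(key, s / c) for key, (s, c) in sorted(total.items())]


# Hypothetical rows; group ("m", "h1") is split across two regions.
region1 = [("m", "h1", 10.0), ("m", "h1", 20.0)]
region2 = [("m", "h1", 30.0), ("m", "h2", 4.0)]
result = client_merge([server_aggregate(region1), server_aggregate(region2)])
print(result)
```

Averaging the per-region averages instead (15.0 and 30.0) would give 22.5 for ("m", "h1"), which is why the sketch carries sums and counts rather than finished averages.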

Page 19

Joins and subqueries in Phoenix

Grammar
• Inner, Left, Right, Full outer join, Cross join
• Semi-join / Anti-join

Algorithms
• Hash-join, sort-merge join
• Hash-join table is computed and pushed to each regionserver from client

Optimizations
• Predicate push-down
• PK-to-FK join optimization
• Global index with missing columns
• Correlated query rewrite
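The hash-join bullet above can be pictured with a short Python sketch: the client builds a hash table from the smaller side once, and every region probes it locally while scanning its own rows. Table contents and helper names are hypothetical; only an inner join is shown.

```python
def build_hash_table(small_rows, key_index):
    """Client side: hash the smaller table on its join column."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key_index], []).append(row)
    return table


def region_probe(region_rows, key_index, hash_table):
    """Region side: probe the shipped hash table while scanning local rows."""
    return [big + small
            for big in region_rows
            for small in hash_table.get(big[key_index], [])]


# Hypothetical small table (HOSTNAME, RACK) and a big table split by region.
hosts = [("cn011", "rack1"), ("cn012", "rack2")]
metric_regions = [
    [("cn011", 92045759), ("cn013", 17)],
    [("cn012", 93051916)],
]
shipped = build_hash_table(hosts, 0)  # built once, sent to every region
joined = [row for region in metric_regions
          for row in region_probe(region, 0, shipped)]
print(joined)
```

The whole hash table has to fit on every region server, which is one way to read the later remark that very big joins are not a strength.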

Page 20

Joins and subqueries in Phoenix

Phoenix can execute most of the TPC-H queries!
No nested loop join
With Calcite support, more improvements soon
No statistics-guided join selection yet
Not very good at executing very big joins

• No generic YARN / Tez execution layer
• But Hive / Spark support for generic DAG execution

Page 21

Secondary Indexes

HBase table is a sorted map
• Everything in HBase is sorted in primary key order
• Full or partial scans in sort order are very efficient in HBase
• Sort data differently with secondary index dimensions

Two types
• Global index
• Local index

Query
• Indexes are “covered”
• Indexes are automatically selected from queries
• Only covered columns are returned from index without going back to data table

Page 22

Global and Local Index

Global Index
• A single instance for all table data in a different sort order
• A different HBase table per index
• Optimized for read-heavy use cases
• Can be one edit “behind” actual primary data
• Indexes on transactional tables have ACID guarantees
• Different consistency / durability for mutable / immutable tables

Local Index
• Multiple mini-instances per region
• Uses the same HBase table, different column family (cf)
• Optimized for write-heavy use cases
• Atomic commit and visibility (coming soon)
• Queries have to ask all regions for relevant data from index

Page 23

Part II – The Present

All the recent stuff!

Page 24

Release Note Highlights

4.4
• Functional Indexes
• UDFs
• Query Server
• UNION ALL
• MR Index Build
• Spark Integration
• Date built-in functions

4.5
• Client-side per-statement metrics
• SELECT without FROM
• ALTER TABLE with VIEWS
• Math and Array built-in functions

Page 25

Release Note Highlights

4.6
• ROW_TIMESTAMP for HBase native timestamps
• Support for correlate variable
• Support for un-nesting arrays
• Web-app for visualizing trace info (alpha)

4.7
• Transaction support
• Enhanced secondary index consistency guarantees
• Statistics improvements
• Perf improvements

Page 26

Row Timestamps

A pseudo-column for HBase native timestamps (versions)
Enables setting and querying cell timestamps
Perfect for time-series use cases

• Combine with FIFO / Date Tiered Compaction policies
• And HBase scan file pruning based on min-max ts for very efficient scans

CREATE TABLE METRICS_TABLE (
    CREATED_DATE DATE NOT NULL,
    METRIC_ID CHAR(15) NOT NULL,
    METRIC_VALUE BIGINT
    CONSTRAINT PK PRIMARY KEY (CREATED_DATE ROW_TIMESTAMP, METRIC_ID))
    SALT_BUCKETS = 8;
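The SALT_BUCKETS = 8 option in the DDL above matters because the key leads with a timestamp, which would otherwise send all new writes to one region. A rough Python sketch of the idea follows; `zlib.crc32` is only a stand-in hash chosen for this illustration (the hash Phoenix really uses is not shown in the slides), and the timestamps are made up.

```python
import zlib

SALT_BUCKETS = 8


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the key with one salt byte derived from a hash of the key.
    crc32 is an assumed stand-in, not necessarily Phoenix's hash."""
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Monotonically increasing timestamps, as in a time-series table.
timestamps = [1396743589 + i for i in range(100)]
keys = [salted_key(ts.to_bytes(8, "big")) for ts in timestamps]

# Consecutive timestamps no longer share one key prefix, so writes spread
# over the buckets instead of hitting a single hot region.
buckets_used = {k[0] for k in keys}
print(sorted(buckets_used))
```

The trade-off is on the read side: a range scan over time now has to visit every bucket and merge the results, because neighbouring timestamps live under different salt bytes.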

Page 27

Transactions

Uses Tephra
Snapshot isolation semantics
Completely optional

• Can be enabled per-table (TRANSACTIONAL=true)
• Transactional and non-transactional tables can live side by side

Transactions see their own uncommitted data
Released in 4.7, will GA in 5.0
Optimistic Concurrency Control

• No locking for rows
• Transactions have to roll back and undo their writes in case of conflict
• Cost of conflict is higher
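A toy Python model of the optimistic concurrency control bullets above: writes take no row locks, and conflicts are only detected at commit time by comparing the write set against rows committed since the transaction started. The `TxManager` class and its methods are invented for this sketch and are not Tephra's API.

```python
class TxManager:
    """Toy transaction manager with commit-time write-write conflict checks."""

    def __init__(self):
        self.clock = 0
        self.committed = {}  # row key -> commit timestamp of its last writer

    def begin(self):
        self.clock += 1
        return {"start": self.clock, "writes": set()}

    def write(self, tx, row):
        tx["writes"].add(row)  # no row lock is taken here

    def commit(self, tx):
        for row in tx["writes"]:
            if self.committed.get(row, 0) > tx["start"]:
                return False  # conflict: the caller has to undo its writes
        self.clock += 1
        for row in tx["writes"]:
            self.committed[row] = self.clock
        return True


manager = TxManager()
t1, t2 = manager.begin(), manager.begin()
manager.write(t1, "row1")
manager.write(t2, "row1")
first = manager.commit(t1)   # wins: nobody committed row1 since t1 began
second = manager.commit(t2)  # loses: row1 was committed after t2 began
print(first, second)
```

The losing transaction only learns about the conflict after doing all of its work, which is the sense in which the cost of a conflict is higher than with locking.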

Page 28

Tephra Architecture

[Diagram: a Tephra / HBase Client on top of the HBase client; RegionServer 1, RegionServer 2 and RegionServer 3; a ZooKeeper Quorum; Tephra Trx Manager (active) and Tephra Trx Manager (standby)]

Page 29

Transaction Lifecycle

From Tephra presentation: http://www.slideshare.net/alexbaranau/transactions-over-hbase

Page 30

Phoenix Query Server

Similar to HBase REST Server / Hive Server 2
Built on top of Calcite’s Avatica Server with Phoenix bindings
Embeds a Phoenix thick client inside
No client-side sorting / join!
Protobuf-3.0 over HTTP protocol
Has a (thin) JDBC driver
Allows ODBC driver for Phoenix

Page 31

Phoenix architecture revisited (thick client)

[Diagram: an Application with the Phoenix client / JDBC on top of the HBase client; RegionServer 1, RegionServer 2 and RegionServer 3, each shown with T:foo, region:d, a Phoenix RPC endpoint and a "px" marker]

Page 32

Phoenix Query Server (thin client)

[Diagram: an Application with the Phoenix thin client / JDBC; three Phoenix Query Server instances, each with a Phoenix client / JDBC on top of the HBase client; RegionServer 1, RegionServer 2 and RegionServer 3, each shown with T:foo, region:d, a Phoenix RPC endpoint and a "px" marker]

Page 33

Other new features (4.8+)

Shaded client by default. No more library dependency problems!
Phoenix schema mapping to HBase namespace

• Allows using isolation and security features of HBase namespaces
• Standard SQL syntax:

    CREATE SCHEMA FOO;
    USE FOO;

LIMIT / OFFSET
• We already had LIMIT. Now we have OFFSET
• Together with row value constructors, covers most cursor use cases
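A small Python model of the two paging styles mentioned above, run over rows kept in primary key order. `page_by_offset` mimics LIMIT n OFFSET m, and `page_after` mimics a row value constructor predicate of the form (A, B) > (last_a, last_b) followed by LIMIT n. The data and helper names are made up for illustration.

```python
import bisect


def page_by_offset(rows, limit, offset):
    """LIMIT / OFFSET style: skip `offset` rows, then take `limit` rows."""
    return rows[offset:offset + limit]


def page_after(rows, limit, last_key=None):
    """Row value constructor style: resume right after the last key seen."""
    start = 0 if last_key is None else bisect.bisect_right(rows, last_key)
    return rows[start:start + limit]


# Hypothetical unique (METRIC_NAME, HOSTNAME) keys in primary key order.
rows = sorted((f"metric{m}", f"host{h}") for m in range(3) for h in range(4))

offset_pages, keyset_pages, last = [], [], None
for page_no in range(4):
    offset_pages.append(page_by_offset(rows, 3, page_no * 3))
    page = page_after(rows, 3, last)
    keyset_pages.append(page)
    last = page[-1]
print(keyset_pages[1])
```

Both styles return the same pages here; the difference is that the keyset form can seek straight to its starting key, while an offset has to count past every skipped row.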

Page 34

Part III – The Future

All the upcoming stuff!

Page 35

Local Index

• Local Index re-implemented
• Instead of a different table, local index data is now kept within the same data table
• Local index data goes into a different column family
• Index and data are committed together atomically without external transactions
• A number of stability improvements around region splits and merges

Page 36

Calcite Integration

Calcite is a framework for:
• Query parser
• Compiler
• Planner
• Cost-based optimizer

SQL-92 compliant
Based on relational algebra
Cost-based optimizer with default rules + pluggable rules per backend
Used by Hive / Drill / Kylin / Samza, etc.

Page 37

Calcite Integration

Page 38

Phoenix - Hive integration

Hive is a very rich and generic execution engine
Uses Tez + YARN to execute arbitrary DAGs
Hive integration enables big joins and other Hive features
Phoenix DDL with HiveQL
Data insert / update / delete (DML) with HiveQL
Predicate pushdown, salting, partitioning, partition pruning, etc.
Can use secondary indexes as well, since it uses the Phoenix compiler
https://issues.apache.org/jira/browse/PHOENIX-2743

Page 39

Future<Phoenix>

JSON support
TPC-H / MicroStrategy / Tableau queries
Sqoop integration
Support Omid-based transactions
Dogfooding within the Hadoop ecosystem

• Ambari Metrics Service (AMS) uses Phoenix
• YARN will soon use HBase / Phoenix (ATS)

STRUCT type
Improvements to cost-based optimization
Security and other HBase features used from Phoenix
See https://phoenix.apache.org/roadmap.html

Page 40

Further Reference

Even more info on https://phoenix.apache.org
New Features: https://phoenix.apache.org/recent.html
Roadmap: https://phoenix.apache.org/roadmap.html

Get involved in mailing lists
[email protected]
[email protected]

Page 41

Thanks

Q & A