Apache Phoenix Query Server Josh Elser 2016-05-25


Page 1: Apache Phoenix Query Server PhoenixCon2016

Apache Phoenix Query Server
Josh Elser
2016-05-25


2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

About me

• Apache Phoenix Committer

• Apache Calcite Committer and PMC

• Long-time NoSQL developer, re-learning SQL

• ASF Member, Mentor, always-available-dude

Calcite, Apache Calcite, Avatica, Phoenix, and Apache Phoenix are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


Agenda

What?

Why?

How?

Apache Phoenix Query Server


“What” is Apache Phoenix?

Been called many things [1]

– “We put the SQL back in NoSQL!”
– “A SQL skin on HBase”
– “A relational layer on HBase”
– “Online transaction processing and operational analytics for Hadoop”

[1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5

You all quite likely know the rest.


“What” is the Apache Phoenix QueryServer?

A Client-Server abstraction of a JDBC Driver
– Built using Apache Calcite’s Avatica sub-project

A standalone service to be run on each node in a cluster
– An HTTP server
– Configurable serialization mechanism

A new JDBC Driver to use with the QueryServer
– A glorified HTTP client
– A new “sqlline” script

[Figure: Client, PQS, HBase]
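
To make the "new JDBC Driver" point concrete, here is a minimal sketch that assembles a connection string for the thin driver. The URL shape and the default port 8765 come from the Phoenix Query Server documentation as I understand it, not from these slides, so check them against your version; the helper name `thin_jdbc_url` is hypothetical.

```python
def thin_jdbc_url(host: str, port: int = 8765, serialization: str = "PROTOBUF") -> str:
    """Build a connection URL for the thin (Avatica-based) Phoenix JDBC driver.

    Assumption: the thin driver accepts URLs of the form
    jdbc:phoenix:thin:url=http://<host>:<port>;serialization=<PROTOBUF|JSON>
    """
    if serialization not in ("PROTOBUF", "JSON"):
        raise ValueError("serialization must be PROTOBUF or JSON")
    return f"jdbc:phoenix:thin:url=http://{host}:{port};serialization={serialization}"


if __name__ == "__main__":
    # pqs.example.com is a placeholder host name.
    print(thin_jdbc_url("pqs.example.com"))
```

Note that the URL names only the Query Server's HTTP endpoint; nothing about ZooKeeper or HBase appears in it, which is the point of the thin client.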


“What” is Apache Calcite?

SQL Parser
– One SQL implementation usable by everyone

Cost-Based Optimizer
– “Optimizations are easy”

Pluggable Data Sources
– Implement your own SQL engine

Avatica
– Calcite sub-project
– Implements the JDBC-over-HTTP abstraction
– Written to the JDBC spec, not database-specific

The coolest project approximately one person can explain


Agenda

What?

Why?

How?

Apache Phoenix QueryServer


“Why” should I care?

A true “thin” client
– No direct connection to HBase/ZooKeeper/HDFS
– Greatly simplifies definition of a “Phoenix client” (less than 1/10th the size)

Offload computational resources to cluster
– Query Servers run on the cluster
– Not your laptop or an “edge” node

Enables non-Java clients
– The big one

Because it’s friggin’ cool!


“Why” are non-Java clients important?

“Native” bindings in any language
– HTTP clients are easily implemented
– Serialization approaches (often) have cross-language support

Data in HBase is suddenly easily accessible
– Standardized table structure through Phoenix
– Well-defined APIs: Python Database API, Ruby ActiveRecord, etc.

ODBC and BI Tools
– The moonshot.
– The hopes and dreams of users everywhere.

Not everyone wants to use Java.


“Why” didn’t you use …?

HTTP is simple
– “You have multiple versions of Thrift on the classpath”
– “You have to use Protobuf 2.4”

Designed to be stateless
– JDBC doesn’t make this easy
– Can (try to) work around it via Avatica’s wire API

Statelessness makes scaling easier
– Any off-the-shelf HTTP load balancer
– Deploy more Avatica servers to scale up
– Visions of YARN/Slider, Docker, etc

Because portability sucks
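
Because JDBC connections and statements still carry some server-side state, one common way to put several servers behind a balancer is to route every request for a given connection to the same server. The following is only a sketch of that idea under my own assumptions: the function name `pick_server`, the hashing scheme, and the host names are all hypothetical, and a real deployment would more likely rely on the load balancer's own sticky-session feature.

```python
import hashlib
from typing import Sequence


def pick_server(connection_id: str, servers: Sequence[str]) -> str:
    """Map a connection ID to one server, always the same one for the same ID.

    Hashing the ID (rather than using round-robin) keeps all requests for a
    connection on the server that holds that connection's state.
    """
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(connection_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Placeholder endpoints, not real hosts.
    pool = ["http://pqs1:8765", "http://pqs2:8765", "http://pqs3:8765"]
    print(pick_server("conn-1234", pool))
```

A plain modulo mapping like this reshuffles connections whenever the pool size changes, so it suits a fixed pool; growing or shrinking the pool gracefully would need something like consistent hashing.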


Agenda

What?

Why?

How?

Apache Phoenix QueryServer


“How” does it work?

HTTP Server
– Jetty
– Phoenix “thick” Driver

Serialization mechanism
– Protocol Buffers
– JSON

Metrics system
– Dropwizard Metrics
– Apache Hadoop Metrics2

Authentication
– Kerberos via SPNEGO
– HTTP Basic or Digest

The QueryServer itself


“How” does the serialization work?

Google Protocol Buffers (v3)
– “think XML, but smaller, faster, and simpler” [1]
– 110% supported WRT compatibility
– Native bindings in most every popular language
– Clients can use any version of protobuf3 (relies on serialized messages)

JSON
– Nice for testing
– 175% unsupported WRT compatibility
– You will run into issues with mismatched client/server versions

Please, please, please use Protocol Buffers

[1] https://developers.google.com/protocol-buffers/
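
To show why JSON is "nice for testing", here is a small sketch of what a single request body might look like on the wire. The field names (`request`, `connectionId`, `statementId`, `sql`, `maxRowCount`) follow the Avatica JSON reference as I recall it for releases of this era; treat them as assumptions and check the reference for your version. The helper name `prepare_and_execute_body` is hypothetical.

```python
import json


def prepare_and_execute_body(connection_id: str, statement_id: int, sql: str) -> str:
    """Serialize one 'prepareAndExecute' request as JSON text.

    Assumption: the server dispatches on the "request" field, and a
    maxRowCount of -1 means "no limit".
    """
    body = {
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
        "maxRowCount": -1,
    }
    # sort_keys makes the output stable, which is handy when diffing in tests.
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    print(prepare_and_execute_body("conn-1", 1, "SELECT * FROM my_table"))
```

The message is readable and easy to craft by hand, but nothing in it is versioned or schema-checked, which is consistent with the slide's warning about mismatched client/server versions.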


“How” do I make a client?

Choose a language
– Find an HTTP client for that language
– Install Protobuf bindings for that language

Read the Avatica docs [1]
– Tell us when docs are incorrect/lacking/wrong/boring/lame

Write tests

Publish the client
– And tell us!

Sit down and write code

[1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
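
The steps above can be sketched as a very small client. This is an illustration under stated assumptions, not a reference implementation: it uses the JSON serialization only because that needs no generated classes (the slides recommend Protocol Buffers for real clients), the request and response field names are my reading of the Avatica wire API, and `http_transport` and `run_query` are hypothetical names. The transport is passed in as a function so the request sequence can be tested without a server.

```python
import json
import urllib.request
import uuid
from typing import Any, Callable, Dict, List

Transport = Callable[[Dict[str, Any]], Dict[str, Any]]


def http_transport(url: str) -> Transport:
    """Return a function that POSTs one JSON request to an Avatica endpoint."""
    def post(body: Dict[str, Any]) -> Dict[str, Any]:
        data = json.dumps(body).encode("utf-8")
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return post


def run_query(post: Transport, sql: str) -> List[List[Any]]:
    """Open a connection, run one statement, return the first frame of rows."""
    connection_id = str(uuid.uuid4())  # the client chooses the connection ID
    post({"request": "openConnection", "connectionId": connection_id})
    try:
        created = post({"request": "createStatement",
                        "connectionId": connection_id})
        result = post({
            "request": "prepareAndExecute",
            "connectionId": connection_id,
            "statementId": created["statementId"],
            "sql": sql,
            "maxRowCount": -1,
        })
        # Only the first frame is read; larger results would need fetch calls.
        return result["results"][0]["firstFrame"]["rows"]
    finally:
        # Always release server-side state, even if the query failed.
        post({"request": "closeConnection", "connectionId": connection_id})
```

Usage would look like `run_query(http_transport("http://localhost:8765"), "SELECT 1")`, where the URL is a placeholder for a running Query Server. Writing tests against a fake transport first, as the slide suggests, keeps the request ordering honest before any network is involved.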


“How” do I get involved?

Clients that have already been built

“Python PhoenixDB” https://bitbucket.org/lalinsky/python-phoenixdb

“Golang Client” https://github.com/Boostport/avatica


“How” do I find new improvements?

Batch API support (2-3x improvement in writes)

Kerberos client authentication (SPNEGO)

Technology compatibility kit
– Automated testing harness
– Verifies compatibility assertions

Even more documentation

Upcoming changes in Avatica 1.8.0 (as of rc0)
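
One plausible reason a batch API helps writes is that many rows travel in a single HTTP round trip instead of one request per row. The sketch below only illustrates that arithmetic; it does not call the Avatica batch API, the helper name `batches` is hypothetical, and the 2-3x figure on the slide is the speaker's measurement, not something this code reproduces.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(rows: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


if __name__ == "__main__":
    rows = list(range(1000))
    # One request per chunk instead of one request per row.
    print(len(list(batches(rows, 100))), "requests instead of", len(rows))
```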


“How” do I know what’s coming next?

The wire API is steadily changing, but backwards compatibly (yay)
– Need to define compatibility statement (what’s “public api”?)
– Need to validate releases WRT the compatibility statement

Better shaded and non-shaded artifacts
– Only shaded artifacts provided in some cases

Better docs/examples for integrators (build your own “PQS”)

More clients (hopefully)
– With docs, examples, and tests

Things quickly coming down the road


“How” do I get involved?

Provide servers for databases
– A simple project for a specific database

Write some tests
Proofread the docs
Contribute a client
Write a tutorial
Answer questions on Stack Overflow/mailing lists

Carpe diem


Thanks!

Email: [email protected]
Twitter: @josh_elser
Mailing lists:
– Phoenix: [email protected], [email protected]
– Calcite: [email protected]
More info:
https://phoenix.apache.org/server.html
https://calcite.apache.org/avatica/