Apache Phoenix Query Server

Josh Elser
Future of Data, NYC
2016/10/11

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.


Agenda

What?

Why?

How?

Apache Phoenix Query Server


Apache Phoenix

Been called many things [1]

– “We put the SQL back in NoSQL!”
– “A SQL skin on HBase”
– “A relational layer on HBase”
– “Online transaction processing and operational analytics for Hadoop”

Built on HDFS and HBase
– Clients use a JDBC driver
– Lots of server-side “magic” through HBase Coprocessors

A query system capable of both OLAP and OLTP workloads

[1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5


Apache Phoenix Query Server (PQS)

An HTTP abstraction of a JDBC Driver
– Built on Apache Calcite’s Avatica

A standalone service to be run on each node in a cluster
– An HTTP server

A new JDBC Driver
– A new sqlline script



[Figure: architecture diagram with components labeled Apache HBase, PQS, and Phoenix Client]


Apache Calcite

SQL Parser
– One SQL implementation usable by everyone

Cost-Based Optimizer
– “Optimizations are easy”

Pluggable Data Sources
– Implement your own SQL engine

Avatica
– Implements the JDBC-over-HTTP abstraction
– Written to the JDBC spec





Why should I care?

A true “thin” client
– No connection to HBase/ZooKeeper/HDFS
– Greatly simplifies definition of “Phoenix client”

Offload computational resources to cluster
– Query Servers run anywhere with access to the cluster
– Not your laptop or some “edge” node

Enables non-Java clients
– The big one


Non-Java Clients

“Native” bindings in any language
– HTTP clients are easily implemented
– Serialization often provides multi-language support

Data in Phoenix is suddenly easily accessible
– Well-defined APIs in each language
– Python: SQLAlchemy backing a Flask application
– Ruby: Ruby on Rails via ActiveRecord
– C#: Windows application via .NET

ODBC and BI Tools


Why not <insert rpc framework here>?

HTTP is simple
– “You have multiple versions of Thrift on the classpath”
– “You have to use Protobuf 2.5”

Stateless is beautiful
– JDBC doesn’t make this easy
– Can work around it via Avatica’s wire API

Off-the-shelf services
– Pull down any HTTP load balancer
– Deploy more Phoenix Query Servers to scale up





The Tech

HTTP Server
– Jetty
– Phoenix “thick” Driver

Serialization
– Protocol Buffers
– JSON

Metrics
– Dropwizard Metrics
– Apache Hadoop® Metrics2

Authentication
– Kerberos via SPNEGO
– HTTP Basic or Digest

Hadoop is a registered trademark of the Apache Software Foundation


On Serialization

Google Protocol Buffers (v3)
– “think XML, but smaller, faster, and simpler” [1]
– 110% supported WRT compatibility
– Native bindings in most every popular language
– Clients can use any version of protobuf3

JSON
– 110% unsupported WRT compatibility
– You will run into issues with mismatched client/server versions

Please, please, please use Protocol Buffers

[1] https://developers.google.com/protocol-buffers/


Making a client

Choose a language
– Find an HTTP client supported with that language
– Install Protobuf bindings for that language

Read the Avatica docs [1]
– Tell us when docs are incorrect/lacking/wrong/boring/lame

Write some code/tests

Publish the client

Profit: http://calcite.apache.org/avatica/docs/#clients

[1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
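
As a rough illustration of the steps above, the sketch below builds and sends a single Avatica request using only the Python standard library. It uses the JSON serialization purely because it is readable in a short example; the recommendation to use Protocol Buffers for real clients still stands. The request shape (a "request" of "openConnection" plus a "connectionId") follows the Avatica JSON reference as understood here and should be checked against the docs for your server version. The server address and the helper names (PQS_URL, build_open_connection, make_http_request, send) are assumptions made for illustration only.

```python
import json
import urllib.request
import uuid

# Assumed query server address; port 8765 matches the thin-driver URL shown in this deck.
PQS_URL = "http://localhost:8765/"


def build_open_connection(connection_id):
    """Serialize an Avatica 'openConnection' request as UTF-8 JSON bytes.

    The field names are taken from the Avatica JSON reference (an assumption
    to verify against the docs for the server version in use).
    """
    message = {"request": "openConnection", "connectionId": connection_id}
    return json.dumps(message).encode("utf-8")


def make_http_request(url, body):
    """Wrap a serialized Avatica message in an HTTP POST request object."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(url, body, timeout=10.0):
    """POST one message to the query server and decode the JSON response."""
    with urllib.request.urlopen(make_http_request(url, body), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def new_connection_id():
    """Generate a client-side connection identifier (any unique string works here)."""
    return str(uuid.uuid4())
```

With a query server running, a call such as send(PQS_URL, build_open_connection(new_connection_id())) would perform the round trip. A production client would swap the JSON encoding for the Protobuf messages described in [1].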


Current Clients

Java JDBC driver (https://calcite.apache.org/avatica)
Microsoft .NET driver (https://github.com/Azure/hdinsight-phoenix-sharp)
Go-lang Client (https://github.com/Boostport/avatica)
Python Client (https://bitbucket.org/lalinsky/python-phoenixdb)
ODBC Driver (http://www.simba.com/ and https://hortonworks.com)





Involvement

Provide servers for databases
– A simple project for a specific database

Write some tests

Proofread the docs

Contribute a client

Answer questions on mailing lists/StackOverflow


Using the Thin JDBC Driver

New JAR file and URL
– jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
– ${PHOENIX_HOME}/phoenix-$VERSION-thin-client.jar

JDBC APIs hide the rest!

Some caveats
– ARRAY support (CALCITE-1050)
– DatabaseMetaData operations (CALCITE-1308)
– Use Statement#addBatch() and Statement#executeBatch() for maximum perf
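
To make the shape of the thin-driver URL concrete, here is a small sketch that assembles and takes apart a URL of the form shown above. It is plain string handling, not part of Phoenix or Avatica; the function names, the default port, and the restriction to the two serializations named in this deck are assumptions for illustration.

```python
# Prefix and serializations as they appear in this deck; nothing here calls Phoenix itself.
THIN_PREFIX = "jdbc:phoenix:thin:"
SERIALIZATIONS = ("PROTOBUF", "JSON")


def thin_jdbc_url(host, port=8765, serialization="PROTOBUF", scheme="http"):
    """Build a thin-driver JDBC URL pointing at a query server (hypothetical helper)."""
    serialization = serialization.upper()
    if serialization not in SERIALIZATIONS:
        raise ValueError(f"unsupported serialization: {serialization}")
    return f"{THIN_PREFIX}url={scheme}://{host}:{port};serialization={serialization}"


def parse_thin_jdbc_url(jdbc_url):
    """Split a thin-driver JDBC URL into its key=value properties (hypothetical helper)."""
    if not jdbc_url.startswith(THIN_PREFIX):
        raise ValueError("not a Phoenix thin-driver URL")
    properties = {}
    for part in jdbc_url[len(THIN_PREFIX):].split(";"):
        if part:
            # Split on the first '=' only, so the '://' inside the HTTP URL is untouched.
            key, _, value = part.partition("=")
            properties[key] = value
    return properties


print(thin_jdbc_url("localhost"))
# prints: jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
```

The resulting string is what a Java application would pass to its JDBC connection call, with the thin-client JAR on the classpath.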


Thanks!

Email: [email protected]
Twitter: @josh_elser
Mailing lists:
– Phoenix: [email protected], [email protected]
– Calcite: [email protected]
Info:
– https://phoenix.apache.org/server.html
– https://calcite.apache.org/avatica/