Apache Phoenix Query Server
Josh Elser
Future of Data, NYC
2016/10/11
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
Agenda
What?
Why?
How?
Apache Phoenix Query Server
Apache Phoenix
Been called many things [1]
– “We put the SQL back in NoSQL!”
– “A SQL skin on HBase”
– “A relational layer on HBase”
– “Online transaction processing and operational analytics for Hadoop”
Built on HDFS and HBase
– Clients use a JDBC driver
– Lots of server-side “magic” through HBase Coprocessors
A query system capable of both OLAP and OLTP workloads
[1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5
Apache Phoenix Query Server (PQS)
An HTTP abstraction of a JDBC driver
– Built on Apache Calcite’s Avatica
A standalone service to be run on each node in a cluster
– An HTTP server
A new JDBC driver
– A new sqlline script
Apache Phoenix Query Server
[Architecture diagram: Apache HBase, PQS, Phoenix Client]
Apache Calcite
SQL Parser
– One SQL implementation usable by everyone
Cost-Based Optimizer
– “Optimizations are easy”
Pluggable Data Sources
– Implement your own SQL engine
Avatica
– Implements the JDBC-over-HTTP abstraction
– Written to the JDBC spec
Why should I care?
A true “thin” client
– No connection to HBase/ZooKeeper/HDFS
– Greatly simplifies definition of “Phoenix client”
Offload computational resources to cluster
– Query Servers run anywhere with access to the cluster
– Not your laptop or some “edge” node
Enables non-Java clients
– The big one
Non-Java Clients
“Native” bindings in any language
– HTTP clients are easily implemented
– Serialization often provides multi-language support
Data in Phoenix is suddenly easily accessible
– Well-defined APIs in each language
– Python: SQLAlchemy backing a Flask application
– Ruby: Ruby on Rails via ActiveRecord
– C#: Windows application via .NET
ODBC and BI Tools
Why not <insert rpc framework here>?
HTTP is simple
– “You have multiple versions of Thrift on the classpath”
– “You have to use Protobuf 2.5”
Stateless is beautiful
– JDBC doesn’t make this easy
– Can work around it via Avatica’s wire API
Off-the-shelf services
– Pull down any HTTP load balancer
– Deploy more Phoenix Query Servers to scale up
The Tech
HTTP Server
– Jetty
– Phoenix “thick” Driver
Serialization
– Protocol Buffers
– JSON
Metrics
– Dropwizard Metrics
– Apache Hadoop® Metrics2
Authentication
– Kerberos via SPNEGO
– HTTP Basic or Digest
Hadoop is a registered trademark of the Apache Software Foundation
On Serialization
Google Protocol Buffers (v3)
– “think XML, but smaller, faster, and simpler” [1]
– 110% supported WRT compatibility
– Native bindings in almost every popular language
– Clients can use any version of protobuf3
JSON
– 110% unsupported WRT compatibility
– You will run into issues with mismatched client/server versions
Please, please, please use Protocol Buffers
[1] https://developers.google.com/protocol-buffers/
Making a client
Choose a language
– Find an HTTP client supported with that language
– Install Protobuf bindings for that language
Read the Avatica docs [1]
– Tell us when docs are incorrect/lacking/wrong/boring/lame
Write some code/tests
Publish the client
Profit: http://calcite.apache.org/avatica/docs/#clients
[1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
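To make "find an HTTP client" concrete, here is a minimal sketch of the first request a new client would send: opening a connection. It uses Avatica's JSON form only because it is readable without generated classes; the slides recommend Protocol Buffers for real clients, and a server would have to be configured for JSON serialization for this to work. The class name, the `--send` flag, and the `localhost:8765` address are assumptions for illustration, and the request shape follows the Avatica JSON reference as best recalled, so check it against the docs before relying on it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;

public class AvaticaHttpSketch {
    /**
     * Builds the JSON body for an "openConnection" request.
     * The connection ID is chosen by the client; it is not JSON-escaped here,
     * so pass only simple values such as a UUID string.
     */
    public static String openConnectionJson(String connectionId) {
        return "{\"request\":\"openConnection\",\"connectionId\":\"" + connectionId + "\"}";
    }

    /** Wraps a request body in an HTTP POST aimed at the query server. */
    public static HttpRequest buildPost(String serverUrl, String body) {
        return HttpRequest.newBuilder(URI.create(serverUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String body = openConnectionJson(UUID.randomUUID().toString());
        // Assumed address: a query server listening locally on port 8765.
        HttpRequest request = buildPost("http://localhost:8765", body);
        System.out.println(request.method() + " " + request.uri() + " " + body);

        // Only contact a server when explicitly asked (hypothetical flag).
        if (args.length > 0 && args[0].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

A Protocol Buffers client follows the same pattern: serialize a request message, POST the bytes, and parse the response message. Only the body encoding changes.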
Current Clients
Java JDBC driver (https://calcite.apache.org/avatica)
Microsoft .NET driver (https://github.com/Azure/hdinsight-phoenix-sharp)
Go-lang Client (https://github.com/Boostport/avatica)
Python Client (https://bitbucket.org/lalinsky/python-phoenixdb)
ODBC Driver (http://www.simba.com/ and https://hortonworks.com)
Involvement
Provide servers for databases
– A simple project for a specific database
Write some tests
Proofread the docs
Contribute a client
Answer questions on mailing lists/StackOverflow
Using the Thin JDBC Driver
New JAR file and URL
– jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
– ${PHOENIX_HOME}/phoenix-$VERSION-thin-client.jar
JDBC APIs hide the rest!
Some caveats
– ARRAY support (CALCITE-1050)
– DatabaseMetaData operations (CALCITE-1308)
– Use Statement#addBatch() and Statement#executeBatch() for maximum performance
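As a rough illustration of the URL format and the batching advice above, the sketch below builds a thin-driver URL and sends several writes as one batch through standard JDBC calls. The class name, the `EXAMPLE_TABLE(ID, NAME)` table, and the `--connect` flag are hypothetical; actually connecting assumes the thin-client JAR is on the classpath and a query server is reachable at the given address.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ThinClientSketch {
    /** Builds a thin-driver URL in the form shown on the slide. */
    public static String thinUrl(String host, int port, String serialization) {
        return "jdbc:phoenix:thin:url=http://" + host + ":" + port
                + ";serialization=" + serialization;
    }

    /** Queues one UPSERT per name and sends them to the server as a single batch. */
    public static int[] upsertNames(Connection conn, String[] names) throws SQLException {
        // EXAMPLE_TABLE(ID, NAME) is a hypothetical table used only for illustration.
        String sql = "UPSERT INTO EXAMPLE_TABLE (ID, NAME) VALUES (?, ?)";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (int i = 0; i < names.length; i++) {
                stmt.setInt(1, i);
                stmt.setString(2, names[i]);
                stmt.addBatch();              // queue the row locally
            }
            int[] counts = stmt.executeBatch(); // send all queued rows together
            if (!conn.getAutoCommit()) {
                conn.commit();                // commit explicitly when auto-commit is off
            }
            return counts;
        }
    }

    public static void main(String[] args) throws SQLException {
        String url = thinUrl("localhost", 8765, "PROTOBUF");
        System.out.println(url);

        // Only open a connection when explicitly asked (hypothetical flag).
        if (args.length > 0 && args[0].equals("--connect")) {
            try (Connection conn = DriverManager.getConnection(url)) {
                upsertNames(conn, new String[] {"alice", "bob"});
            }
        }
    }
}
```

Batching matters more here than with a direct driver because every unbatched statement is a separate HTTP round trip to the query server.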
Thanks!
Email: [email protected]
Twitter: @josh_elser
Mailing lists:
– Phoenix: [email protected], [email protected]
– Calcite: [email protected]
Info:
– https://phoenix.apache.org/server.html
– https://calcite.apache.org/avatica/