Apache Phoenix Query Server Josh Elser Future of Data, NYC 2016/10/11

Upload: josh-elser

Post on 06-Jan-2017

Page 1: Apache Phoenix Query Server

Apache Phoenix Query Server
Josh Elser
Future of Data, NYC
2016/10/11

Page 2: Apache Phoenix Query Server

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.

Page 3: Apache Phoenix Query Server

Agenda

What?

Why?

How?

Apache Phoenix Query Server

Page 4: Apache Phoenix Query Server

Apache Phoenix

Been called many things [1]

– “We put the SQL back in NoSQL!”
– “A SQL skin on HBase”
– “A relational layer on HBase”
– “Online transaction processing and operational analytics for Hadoop”

Built on HDFS and HBase
– Clients use a JDBC driver
– Lots of server-side “magic” through HBase Coprocessors

A query system capable of both OLAP and OLTP workloads

[1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5

Page 5: Apache Phoenix Query Server

Apache Phoenix Query Server (PQS)

An HTTP abstraction of a JDBC Driver
– Built on Apache Calcite’s Avatica

A standalone service to be run on each node in a cluster
– An HTTP server

A new JDBC Driver
– A new sqlline script

Page 6: Apache Phoenix Query Server

Apache Phoenix Query Server

[Diagram: Apache HBase, PQS, Phoenix Client]

Page 7: Apache Phoenix Query Server

Apache Calcite

SQL Parser
– One SQL implementation usable by everyone

Cost-Based Optimizer
– “Optimizations are easy”

Pluggable Data Sources
– Implement your own SQL engine

Avatica
– Implements the JDBC-over-HTTP abstraction
– Written to the JDBC spec

Page 8: Apache Phoenix Query Server

Agenda

What?

Why?

How?

Apache Phoenix Query Server

Page 9: Apache Phoenix Query Server

Why should I care?

A true “thin” client
– No connection to HBase/ZooKeeper/HDFS
– Greatly simplifies definition of “Phoenix client”

Offload computational resources to cluster
– Query Servers run anywhere with access to the cluster
– Not your laptop or some “edge” node

Enables non-Java clients
– The big one

Page 10: Apache Phoenix Query Server

Non-Java Clients

“Native” bindings in any language
– HTTP clients are easily implemented
– Serialization often provides multi-language support

Data in Phoenix is suddenly easily accessible
– Well-defined APIs in each language
– Python: SQLAlchemy backing a Flask application
– Ruby: Ruby on Rails via ActiveRecord
– C#: Windows application via .NET

ODBC and BI Tools
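
As a rough illustration of "well-defined APIs in each language": Python database code is normally written against the DB-API 2.0 interface (PEP 249), which the python-phoenixdb client also implements. The sketch below uses only DB-API calls. To keep it runnable without a cluster, the standard library's sqlite3 driver is swapped in as a stand-in; the table name and rows are made up for illustration.

```python
import sqlite3


def insert_and_count(conn, rows):
    """Create a small table, insert rows in one batch, return the row count.

    Only DB-API 2.0 calls are used: cursor(), execute(), executemany(),
    commit(), fetchone().
    """
    cur = conn.cursor()
    cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username VARCHAR)")
    # One parameterized statement applied to many rows.
    cur.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM users")
    return cur.fetchone()[0]


# Stand-in connection (assumption: sqlite3 replaces a real Phoenix connection).
# Against a query server the connection would come from the Phoenix client
# library instead, and Phoenix SQL uses UPSERT rather than INSERT.
conn = sqlite3.connect(":memory:")
count = insert_and_count(conn, [(1, "admin"), (2, "guest")])
print(count)
```

Because the function only touches the DB-API surface, swapping the stand-in for a real Phoenix connection should mostly be a matter of changing the connection line and the SQL dialect.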

Page 11: Apache Phoenix Query Server

Why not <insert rpc framework here>?

HTTP is simple
– “You have multiple versions of Thrift on the classpath”
– “You have to use Protobuf 2.5”

Stateless is beautiful
– JDBC doesn’t make this easy
– Can work around it via Avatica’s wire API

Off-the-shelf services
– Pull down any HTTP load balancer
– Deploy more Phoenix Query Servers to scale up
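
One way to picture the load-balancing point, given that JDBC connections carry state: route every request for a given connection to the same query server. The sketch below is not how any particular load balancer works. It assumes that the server that opened a connection should also receive that connection's follow-up requests, and it picks a server by hashing a connection id. The host names and the id are hypothetical.

```python
import hashlib


def pick_server(connection_id, servers):
    """Deterministically map a connection id to one server in the list."""
    digest = hashlib.sha256(connection_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the list size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical query server addresses (8765 is the port shown in the thin-driver URL).
servers = [
    "http://pqs1.example.com:8765",
    "http://pqs2.example.com:8765",
    "http://pqs3.example.com:8765",
]

first = pick_server("conn-42", servers)
second = pick_server("conn-42", servers)
print(first == second)
```

Adding a server to the list changes the mapping for existing connections, so a real deployment would more likely rely on the load balancer's own session-affinity feature.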

Page 12: Apache Phoenix Query Server

Agenda

What?

Why?

How?

Apache Phoenix Query Server

Page 13: Apache Phoenix Query Server

The Tech

HTTP Server
– Jetty
– Phoenix “thick” Driver

Serialization
– Protocol Buffers
– JSON

Metrics
– Dropwizard Metrics
– Apache Hadoop® Metrics2

Authentication
– Kerberos via SPNEGO
– HTTP Basic or Digest

Hadoop is a registered trademark of the Apache Software Foundation
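
For the HTTP Basic option, a client written from scratch has to attach an `Authorization` header to each request. A minimal sketch of building that header follows; the user name and password are placeholders, and how the server is configured to check them is outside this sketch. Basic credentials are only encoded, not encrypted, so this is normally paired with HTTPS.

```python
import base64


def basic_auth_header(username, password):
    """Build the HTTP Basic Authorization header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


# Placeholder credentials for illustration only.
headers = basic_auth_header("alice", "secret")
print(sorted(headers))
```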

Page 14: Apache Phoenix Query Server

On Serialization

Google Protocol Buffers (v3)
– “think XML, but smaller, faster, and simpler” [1]
– 110% supported WRT compatibility
– Native bindings in most every popular language
– Clients can use any version of protobuf3

JSON
– 110% unsupported WRT compatibility
– You will run into issues with mismatched client/server versions

Please, please, please use Protocol Buffers

[1] https://developers.google.com/protocol-buffers/

Page 15: Apache Phoenix Query Server

Making a client

Choose a language
– Find an HTTP client for that language
– Install Protobuf bindings for that language

Read the Avatica docs [1]
– Tell us when docs are incorrect/lacking/wrong/boring/lame

Write some code/tests

Publish the client

Profit: http://calcite.apache.org/avatica/docs/#clients

[1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
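
To give a feel for what "write some code" involves, here is a small sketch of the request side of a client. It uses Avatica's JSON encoding purely because it is readable without generated bindings; the talk's advice is still to use Protocol Buffers for anything real. The message and field names (`openConnection`, `createStatement`, `prepareAndExecute`, `closeConnection`, `connectionId`, `statementId`, `sql`) follow my reading of the Avatica JSON reference and should be checked against the docs for the server version in use; a given version may require further fields. The `post` helper is illustrative and is not called here.

```python
import json
import urllib.request
import uuid


def open_connection(connection_id):
    return {"request": "openConnection", "connectionId": connection_id}


def create_statement(connection_id):
    return {"request": "createStatement", "connectionId": connection_id}


def prepare_and_execute(connection_id, statement_id, sql):
    return {
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
    }


def close_connection(connection_id):
    return {"request": "closeConnection", "connectionId": connection_id}


def post(url, message):
    """POST one message to a query server and decode the JSON reply."""
    data = json.dumps(message).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# The client chooses the connection id; a random UUID is one simple choice.
connection_id = str(uuid.uuid4())
bodies = [
    json.dumps(message)
    for message in (
        open_connection(connection_id),
        create_statement(connection_id),
        close_connection(connection_id),
    )
]
print(len(bodies))
```

With a query server running, a call such as `post("http://localhost:8765/", open_connection(connection_id))` would send the first message; the statement id for `prepare_and_execute` would come from the reply to `createStatement`.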

Page 16: Apache Phoenix Query Server

Current Clients

Java JDBC driver (https://calcite.apache.org/avatica)
Microsoft .NET driver (https://github.com/Azure/hdinsight-phoenix-sharp)
Go-lang Client (https://github.com/Boostport/avatica)
Python Client (https://bitbucket.org/lalinsky/python-phoenixdb)
ODBC Driver (http://www.simba.com/ and https://hortonworks.com)

Page 17: Apache Phoenix Query Server

Involvement

Provide servers for databases
– A simple project for a specific database

Write some tests

Proofread the docs

Contribute a client

Answer questions on mail lists/StackOverflow

Page 18: Apache Phoenix Query Server

Using the Thin JDBC Driver

New JAR file and URL
– jdbc:phoenix:thin:url=http://localhost:8765;serialization=PROTOBUF
– ${PHOENIX_HOME}/phoenix-$VERSION-thin-client.jar

JDBC APIs hide the rest!

Some caveats
– ARRAY support (CALCITE-1050)
– DatabaseMetaData operations (CALCITE-1308)
– Use Statement#addBatch() and Statement#executeBatch() for maximum perf
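
The thin-driver URL is a plain string, so deployment scripts can assemble it from a host, port, and serialization choice. The helper below is a hypothetical convenience, not part of Phoenix; it only reproduces the URL shape shown on this slide and restricts the serialization to the two formats named in the talk.

```python
def thin_jdbc_url(host="localhost", port=8765, serialization="PROTOBUF", scheme="http"):
    """Assemble a Phoenix thin-driver JDBC URL in the shape shown on the slide."""
    if serialization not in ("PROTOBUF", "JSON"):
        raise ValueError("serialization must be PROTOBUF or JSON")
    return f"jdbc:phoenix:thin:url={scheme}://{host}:{port};serialization={serialization}"


url = thin_jdbc_url()
print(url)
```

The resulting string is what a Java program would pass to `DriverManager.getConnection`, with the thin-client JAR on the classpath.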

Page 19: Apache Phoenix Query Server

Thanks!

Email: [email protected]
Twitter: @josh_elser
Mailing lists:
– Phoenix: [email protected], [email protected]
– Calcite: [email protected]
Info: https://phoenix.apache.org/server.html https://calcite.apache.org/avatica/