sql now! how optiq brings the best of sql to nosql data

37
SQL Now! NoSQL Now! San Jose, California August, 2013 Julian Hyde @julianhyde

Upload: julian-hyde

Post on 25-May-2015

2.907 views

Category:

Technology


0 download

DESCRIPTION

A talk given at the August, 2013 NoSQL Now! conference in San Jose, California. 2013 is the year of that SQL meets Hadoop & NoSQL. There is a surprisingly good fit between thirty-something SQL and the upstart millennial data technologies. SQL helps you deliver faster value, improve integration with visualization and analytics tools, and bring your data to a larger audience. Julian Hyde, author of Mondrian and Optiq, and a contributor to Apache Drill and Cascading Lingual, shows how to quickly build a SQL interface to a NoSQL system using the Optiq framework. As a case study, he shows how Optiq has allows the Mondrian in-memory analysis engine (part of Pentaho's business intelligence suite) to leverage the memory and compute power of a cluster and pull data from hybrid SQL and NoSQL sources.

TRANSCRIPT

Page 1: SQL Now! How Optiq brings the best of SQL to NoSQL data

SQL Now!

NoSQL Now!San Jose, California

August, 2013

Julian Hyde @julianhyde

Page 2: SQL Now! How Optiq brings the best of SQL to NoSQL data

About me

• Database: Oracle, Broadbase

• Streaming query: SQLstream

• Open source BI: Mondrian / Pentaho

• Open source SQL: LucidDB, Optiq

• Contributor to Apache Drill, Cascading Lingual

Page 3: SQL Now! How Optiq brings the best of SQL to NoSQL data

http://manning.com/back/

Page 4: SQL Now! How Optiq brings the best of SQL to NoSQL data

http://manning.com/back/

45% off codemlnosql13

Page 5: SQL Now! How Optiq brings the best of SQL to NoSQL data

3 things

1. Modern data challenges need both NoSQL and SQL.

2. Optiq is not a database (and this is a Good Thing).

3. How to use Optiq with your data.

Page 6: SQL Now! How Optiq brings the best of SQL to NoSQL data

A disturbance in the Force

Page 7: SQL Now! How Optiq brings the best of SQL to NoSQL data

SQL has been around avery, very long time

Page 8: SQL Now! How Optiq brings the best of SQL to NoSQL data

NoSQL: trade-offs vs SQL

Gain Lose

Scale out Optimization

Flexibility Data independence

Productivity Central control

Purchase cost Tool integration

Page 9: SQL Now! How Optiq brings the best of SQL to NoSQL data

SQL: key features

• Inspired by Codd’s 1970 “relational database” paper

• Semantics based on relational operators (scan, filter, project, join, agg, union, sort)

• All implementations transform queries

• Optimizer rewrites queries onto available data structures and operators

Page 10: SQL Now! How Optiq brings the best of SQL to NoSQL data

Optiq

Page 11: SQL Now! How Optiq brings the best of SQL to NoSQL data

Optiq design principles

1. Do the stuff that SQL does well

2. Let other systems do what they do well

Page 12: SQL Now! How Optiq brings the best of SQL to NoSQL data

Conventional DB architecture

Page 13: SQL Now! How Optiq brings the best of SQL to NoSQL data

Optiq architecture

Page 14: SQL Now! How Optiq brings the best of SQL to NoSQL data

Examples

Page 15: SQL Now! How Optiq brings the best of SQL to NoSQL data

Example #1: CSV

• Uses CSV adapter (optiq-csv)

• Demo using sqlline

• Easy to run this for yourself:$ git clone https://github.com/julianhyde/optiq-csv$ cd optiq-csv$ mvn install$ ./sqlline

Page 16: SQL Now! How Optiq brings the best of SQL to NoSQL data

!connect jdbc:optiq:model=target/test-classes/model.json admin admin

!tables

!describe emps

SELECT * FROM emps;

EXPLAIN PLAN FOR SELECT * FROM emps;

SELECT depts.name, count(*)FROM emps JOIN depts USING (deptno)GROUP BY depts.name;

model.json:{ version: '1.0', defaultSchema: 'SALES', schemas: [ { name: 'SALES', type: 'custom', factory: 'net.hydromatic.optiq.impl.csv.CsvSchemaFactory', operand: { directory: 'target/test-classes/sales' } } ]}

Page 17: SQL Now! How Optiq brings the best of SQL to NoSQL data

More adapters

Adapters Embedded Planned

CSV Cascading (Lingual) HBase (Phoenix)

JDBC Apache Drill Spark

MongoDB Cassandra

Splunk Mondrian

linq4j

Page 18: SQL Now! How Optiq brings the best of SQL to NoSQL data

SQL on NoSQL

• No schema or schema-on-read

• Optiq MAP columns, dynamic schema

• Nested data (Drill, MongoDB)

• Optiq ARRAY columns

• Source system optimized for transactions & focused reads, not analytic queries

• Optiq cache and materializations

Page 19: SQL Now! How Optiq brings the best of SQL to NoSQL data

• Mongo’s standard “zips” data set with all US zip codes

• Raw ZIPS table with _MAP column

• ZIPS view extracts named fields, including nested latitude and longitude, and converts them to SQL types

Example #2: MongoDB

Page 20: SQL Now! How Optiq brings the best of SQL to NoSQL data

Splunk

• NoSQL database

• Every log file in the enterprise

• A single “table”

• A record for every line in every log file

• A column for every field that exists in any log file

• No schema

Page 21: SQL Now! How Optiq brings the best of SQL to NoSQL data

Example #3:Splunk + MySQL

SELECT p.product_name, COUNT(*) AS cFROM splunk.splunk AS s JOIN mysql.products AS p ON s.product_id = p.product_idWHERE s.action = 'purchase'GROUP BY p.product_nameORDER BY c DESC

Page 22: SQL Now! How Optiq brings the best of SQL to NoSQL data

Expression tree

Page 23: SQL Now! How Optiq brings the best of SQL to NoSQL data

Optimized tree

Page 24: SQL Now! How Optiq brings the best of SQL to NoSQL data

Analytics

Page 25: SQL Now! How Optiq brings the best of SQL to NoSQL data

Interactive analytics on NoSQL?

• NoSQL operational DB (e.g. HBase, MongoDB, Cassandra)

• Analytic queries aggregate over full scan

• Speed-of-thought response (< 5 seconds)

• Data freshness (< 10 minutes)

Page 26: SQL Now! How Optiq brings the best of SQL to NoSQL data

Simple analytics problem

• 100M U.S. census records

• 1KB each record, 100GB total

• 4 SATA3 disks, total 1.2GB/s

• How to count all records in under 5s?

Page 27: SQL Now! How Optiq brings the best of SQL to NoSQL data

Simple analytics problem

• 100M U.S. census records

• 1KB each record, 100GB total

• 4 SATA3 disks, total 1.2GB/s

• How to count all records in under 5s?

Page 28: SQL Now! How Optiq brings the best of SQL to NoSQL data

Simple analytics problem

• 100M U.S. census records

• 1KB each record, 100GB total

• 4 SATA3 disks, total 1.2GB/s

• How to count all records in under 5s?

• Not possible?! It takes 80s just to read the data.

Page 29: SQL Now! How Optiq brings the best of SQL to NoSQL data

Solution: Cheat!

Page 30: SQL Now! How Optiq brings the best of SQL to NoSQL data

Solution: Cheat!

• Compress data

• Column-oriented storage

• Store data in sorted order

• Put data in memory

• Cache previous query results

• Pre-compute (materialize) aggregates

Page 31: SQL Now! How Optiq brings the best of SQL to NoSQL data
Page 32: SQL Now! How Optiq brings the best of SQL to NoSQL data

Mondrian on Optiq on NoSQL (under construction)

Page 33: SQL Now! How Optiq brings the best of SQL to NoSQL data

Hybrid analytic architecture

1. NoSQL source system

2. Access via Optiq SQL

3. Optiq rewrites queries to use materialized data (e.g. aggregate tables)

4. Cached results are treated as “in-memory tables”

5. Materializations offline dynamically as underlying data changes, and go online as they are refreshed

Page 34: SQL Now! How Optiq brings the best of SQL to NoSQL data

Summary

Page 35: SQL Now! How Optiq brings the best of SQL to NoSQL data

3 things (reprise)

1. SQL allows you to reorganize your data and optimize your queries—while still using your NoSQL database for what it does best.

2. Optiq is not a database. It lets you create very powerful federated data architectures.

3. Access your data using Optiq adapters. Write schemas in JSON or use schema SPI. Connect via JDBC.

Page 36: SQL Now! How Optiq brings the best of SQL to NoSQL data

Questions?

Page 37: SQL Now! How Optiq brings the best of SQL to NoSQL data

Thanks!@julianhyde

optiq https://github.com/julianhyde/optiq

optiq-csv https://github.com/julianhyde/optiq-csv

drill http://incubator.apache.org/drill/

lingual http://www.cascading.org/lingual/

mondrian http://mondrian.pentaho.com

blog http://julianhyde.blogspot.com/

book http://manning.com/back/ 45% off code mlnosql13