Phoenix: We put the SQL back in NoSQL (James Taylor, jtaylor@salesforce.com)
TRANSCRIPT
![Page 2: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/2.jpg)
Agenda
- What is Phoenix?
- Why SQL?
- What is next?
- Q&A
![Page 3: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/3.jpg)
What is Phoenix?
- SQL layer on top of HBase
- Delivered as an embedded JDBC driver
- Targets low latency queries over HBase data
- Columns modeled as multi-part row key and key values
- Versioned schema repository
- Query engine transforms SQL into puts, deletes, and scans
- Uses native HBase APIs instead of Map/Reduce
- Brings the computation to the data:
  - Aggregate, insert, delete data through coprocessors
  - Push predicates through custom filters
- 100% Java
- Open source here: https://github.com/forcedotcom/phoenix
![Page 4: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/4.jpg)
Why SQL?
- Broaden HBase adoption
  - Give folks an API they already know
- Reduce the amount of code users need to write

      SELECT TRUNC(date,'DAY'), AVG(cpu_usage)
      FROM web_stat
      WHERE domain LIKE 'Salesforce%'
      GROUP BY TRUNC(date,'DAY')

- Performance optimizations transparent to the user
  - Aggregation
  - Stats gathering
  - Secondary indexing
- Leverage existing tooling
  - SQL client
![Page 5: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/5.jpg)
But I can’t surface x,y,z in SQL…
![Page 8: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/8.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys

      CREATE TABLE web_stat (
          domain VARCHAR NOT NULL,
          feature VARCHAR NOT NULL,
          date DATE NOT NULL,
          usage BIGINT,
          active_visitor INTEGER,
          CONSTRAINT pk PRIMARY KEY (domain, feature, date));
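To make the multi-part row key more concrete, here is a minimal Python sketch of how the three primary key columns might be concatenated into a single sortable HBase row key. The byte layout (a zero byte terminating each VARCHAR, an 8-byte big-endian millisecond timestamp for the DATE) is an assumption for illustration and not necessarily Phoenix's exact format; `encode_row_key` is a hypothetical helper and the sample rows are made up.

```python
import struct
from datetime import datetime, timezone

SEPARATOR = b"\x00"  # assumed terminator for variable-length key parts


def encode_row_key(domain: str, feature: str, date: datetime) -> bytes:
    """Concatenate (domain, feature, date) into one byte string.

    Assumed layout: each VARCHAR is UTF-8 followed by a zero byte, and the
    DATE is an 8-byte big-endian count of milliseconds since the epoch
    (dates before 1970 are not handled in this sketch).
    """
    millis = int(date.timestamp() * 1000)
    return (
        domain.encode("utf-8") + SEPARATOR
        + feature.encode("utf-8") + SEPARATOR
        + struct.pack(">q", millis)
    )


# Made-up sample rows: (domain, feature, date)
rows = [
    ("Salesforce.com", "Login", datetime(2013, 1, 2, tzinfo=timezone.utc)),
    ("Apple.com", "Search", datetime(2013, 1, 1, tzinfo=timezone.utc)),
    ("Salesforce.com", "Login", datetime(2013, 1, 1, tzinfo=timezone.utc)),
]

# Sorting the encoded keys as raw bytes gives the same order as sorting
# the (domain, feature, date) tuples, which is what lets a scan over a
# key prefix find all rows for one domain.
keys = sorted(encode_row_key(*row) for row in rows)
```

The point of the design is that HBase only sorts by raw row key bytes, so the encoding has to preserve the column-by-column ordering declared in the primary key constraint.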
![Page 10: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/10.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang custom function
  - Derive class from ScalarFunction
  - Add annotation to define name, args, and types
  - Implement evaluate method
  - Register function
  - (blog on this coming soon: http://phoenix-hbase.blogspot.com/)
![Page 12: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/12.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
  - Set CURRENT_SCN property on connection to earlier timestamp
  - Queries will see only rows before timestamp
  - Schema in-place at that point in time will be used
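The snapshot behaviour described above rests on HBase keeping timestamped versions of each cell. The following Python sketch models only that idea with an in-memory dictionary; `put` and `read_as_of` are hypothetical helpers, the timestamps are made up, and the schema-versioning part of the slide is not modelled.

```python
from typing import Dict, List, Optional, Tuple

# Each row keeps every version as (timestamp, value), as an HBase cell can.
Table = Dict[str, List[Tuple[int, str]]]


def put(table: Table, row_key: str, timestamp: int, value: str) -> None:
    """Record a new version of the row without overwriting older ones."""
    table.setdefault(row_key, []).append((timestamp, value))


def read_as_of(table: Table, row_key: str, scn: int) -> Optional[str]:
    """Return the newest value written strictly before `scn`, if any."""
    visible = [cell for cell in table.get(row_key, []) if cell[0] < scn]
    if not visible:
        return None
    return max(visible, key=lambda cell: cell[0])[1]


table: Table = {}
put(table, "row1", 100, "v1")
put(table, "row1", 200, "v2")
put(table, "row2", 250, "new row")

old_view = read_as_of(table, "row1", 150)      # "v1": the write at 200 is hidden
current_view = read_as_of(table, "row1", 300)  # "v2"
missing = read_as_of(table, "row2", 150)       # None: row2 did not exist yet
```

Treating the connection timestamp as an exclusive upper bound is an assumption that follows the slide's wording ("rows before timestamp").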
![Page 14: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/14.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
- Nest child entities inside of a row
  - Declare new child entity as nested table
  - Prefix column qualifier of nested entities with: table name + child primary key + child column name
  - Restrict join to be only through parent/child relation
  - Execute query by scanning nested child rows
  - TBD: https://github.com/forcedotcom/phoenix/issues/19
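Since this feature is still marked TBD, the following Python sketch is only one possible reading of the qualifier scheme above. A parent row is modelled as a dictionary of column qualifier to value; the zero-character separator, the helper names, and the `line_item` sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict

SEP = "\x00"  # assumed separator between the parts of a qualifier


def child_qualifier(table: str, child_pk: str, column: str) -> str:
    """Build a qualifier: table name + child primary key + child column name."""
    return SEP.join((table, child_pk, column))


def child_rows(parent_row: Dict[str, str], table: str) -> Dict[str, Dict[str, str]]:
    """Regroup the parent's cells for one nested table into child rows."""
    rows: Dict[str, Dict[str, str]] = defaultdict(dict)
    prefix = table + SEP
    for qualifier, value in parent_row.items():
        if qualifier.startswith(prefix):
            _, child_pk, column = qualifier.split(SEP)
            rows[child_pk][column] = value
    return dict(rows)


# One parent "order" row with two nested line items (made-up data).
order = {"customer": "acme"}
order[child_qualifier("line_item", "1", "sku")] = "A"
order[child_qualifier("line_item", "1", "qty")] = "2"
order[child_qualifier("line_item", "2", "sku")] = "B"

items = child_rows(order, "line_item")
```

Because every child cell lives in the parent's row, reading the children is a scan over one row's qualifiers, which is why the slide restricts joins to the parent/child relation.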
![Page 16: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/16.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
- Nest child entities inside of a row
- Prevent hot spotting on writes
  - “Salt” row key on upsert by mod-ing with cluster size
  - Query for fully qualified key by inserting salt byte
  - Range scan by concatenating results of scan over all possible salt bytes
  - Or alternately
    - Define column used for hash to derive row key prefix
  - TBD: https://github.com/forcedotcom/phoenix/issues/74
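The salting steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the eventual Phoenix design: the store is a plain dictionary, `NUM_BUCKETS` stands in for the "cluster size", CRC32 is an arbitrary choice of hash, and all function names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 4  # assumed stand-in for the "cluster size" on the slide


def salt_byte(row_key: bytes) -> bytes:
    """Derive a one-byte prefix from the key so writes spread over buckets."""
    return bytes([zlib.crc32(row_key) % NUM_BUCKETS])


def upsert(store: Dict[bytes, str], row_key: bytes, value: str) -> None:
    """Write under the salted key."""
    store[salt_byte(row_key) + row_key] = value


def get(store: Dict[bytes, str], row_key: bytes) -> Optional[str]:
    """Point lookup: recompute the salt byte and insert it in front of the key."""
    return store.get(salt_byte(row_key) + row_key)


def range_scan(store: Dict[bytes, str], start: bytes, stop: bytes) -> List[Tuple[bytes, str]]:
    """Scan [start, stop) under every possible salt byte and combine results."""
    results = []
    for bucket in range(NUM_BUCKETS):
        prefix = bytes([bucket])
        for salted_key in sorted(store):
            if prefix + start <= salted_key < prefix + stop:
                results.append((salted_key[1:], store[salted_key]))
    # Sorting restores row key order across buckets; plain concatenation,
    # as on the slide, would return rows grouped by salt byte instead.
    return sorted(results)


# Monotonically increasing keys, the classic cause of write hot spots.
store: Dict[bytes, str] = {}
for day in range(1, 11):
    upsert(store, f"2013-01-{day:02d}".encode(), f"value-{day}")

hits = range_scan(store, b"2013-01-03", b"2013-01-06")
```

The trade-off is visible in `range_scan`: writes no longer pile onto one region, but every range query has to fan out over all salt buckets.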
![Page 18: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/18.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
- Nest child entities inside of a row
- Prevent hot spotting on writes
- Increment atomic counter
  - Surface the HBase put-and-increment functionality through the standard SQL sequence support
  - TBD: https://github.com/forcedotcom/phoenix/issues/18
![Page 20: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/20.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
- Nest child entities inside of a row
- Prevent hot spotting on writes
- Increment atomic counter
- Sample table data
  - Support the standard SQL TABLESAMPLE clause
  - Implement filter that uses a skip next hint
  - Base next key on the table stats “guide posts”
  - TBD: https://github.com/forcedotcom/phoenix/issues/22
![Page 22: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/22.jpg)
But I can’t surface x,y,z in SQL…
- Define multi-part row keys
- Implement my whizz-bang built-in function
- Run snapshot in time queries
- Nest child entities inside of a row
- Prevent hot spotting on writes
- Increment atomic counter
- Sample table data
- Declare columns at query time

      SELECT col1,col2,col3
      FROM my_table(col2 VARCHAR, col3 INTEGER)
      WHERE col3 > 10

  - TBD: https://github.com/forcedotcom/phoenix/issues/9
![Page 23: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/23.jpg)
Conclusion
- Phoenix fits the 80/20 use case rule
- Let us know what you’d like to see added
- Get involved – we need your help!
- Think about how your new feature can be surfaced in SQL
![Page 24: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/24.jpg)
Thank you!
Questions/comments?
![Page 25: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/25.jpg)
Query Processing

    SELECT feature, SUM(txns)
    FROM product_metrics
    WHERE org_id = :1
    AND date >= :2 AND date <= :3
    AND io_time > 100
    GROUP BY feature

Product Metrics HTable
- Row key: ORG_ID, DATE, FEATURE
- Key values: TXNS, IO_TIME, RESPONSE_TIME

How the query is processed
- Scan
  - Start key: ORG_ID (:1) + DATE (:2)
  - End key: ORG_ID (:1) + DATE (:3)
- Filter
  - Filter: IO_TIME > 100
- Aggregation
  - Intercepts scan on region server
  - Builds map of distinct FEATURE values
  - Returns one row per distinct group
  - Client does final merge
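The scan, filter, and aggregation steps for this query can be sketched in Python as follows. This is a simplified model under stated assumptions: each region is a plain list of rows, `region_aggregate` plays the role of the region-server side, `client_merge` the client side, and both names plus the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Row(NamedTuple):
    org_id: str
    date: str      # ISO dates compare correctly as strings
    feature: str
    txns: int
    io_time: int


def region_aggregate(rows: Iterable[Row], org_id: str, start: str, end: str) -> Counter:
    """Runs next to the data: key range check, filter, then partial GROUP BY."""
    partial: Counter = Counter()
    for row in rows:
        in_key_range = row.org_id == org_id and start <= row.date <= end
        if in_key_range and row.io_time > 100:
            partial[row.feature] += row.txns  # map of distinct FEATURE values
    return partial


def client_merge(partials: Iterable[Counter]) -> Dict[str, int]:
    """Final merge on the client: add up the per-region partial sums."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Two regions holding made-up rows.
region_a = [
    Row("org1", "2013-01-01", "Login", 10, 150),
    Row("org1", "2013-01-02", "Search", 5, 90),    # dropped: io_time <= 100
]
region_b = [
    Row("org1", "2013-01-03", "Login", 7, 200),
    Row("org1", "2013-01-03", "Search", 4, 120),
    Row("org2", "2013-01-02", "Login", 99, 300),   # dropped: other org_id
]

result = client_merge(
    region_aggregate(region, "org1", "2013-01-01", "2013-01-03")
    for region in (region_a, region_b)
)
# result == {"Login": 17, "Search": 4}
```

Only one small map per region crosses the network, rather than every matching row, which is the benefit of bringing the computation to the data.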
![Page 26: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/26.jpg)
Phoenix Query Optimizations
- Start/stop key of scan based on AND-ed columns
  - Through SUBSTR, ROUND, TRUNC, LIKE
- Parallelized on client by chunking over start/stop key of scan
- Aggregation on region-servers through coprocessor
  - Inline for GROUP BY over row key ordered columns
  - In memory map per group otherwise
- WHERE clause executed through custom filters
  - Incremental evaluation with early termination
  - Evaluated through byte pointers
- IN and OR over same column (in progress)
  - Becomes batched get or filter with next row hint
- Top N queries (future)
  - Through coprocessor keeping top N rows
- TABLESAMPLE (future)
  - Becomes filter with next row hint
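The "Top N queries" item above is listed as future work, so the following Python sketch is only one way it might look: each region keeps its own top N rows and the client merges those short lists. The function names and sample rows are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[str, int]  # (row key, value being ordered by)


def region_top_n(rows: Iterable[Row], n: int) -> List[Row]:
    """Region side: keep only the n rows with the largest values."""
    return heapq.nlargest(n, rows, key=lambda row: row[1])


def client_top_n(per_region: Iterable[List[Row]], n: int) -> List[Row]:
    """Client side: merge the small per-region lists into the final top n."""
    candidates = [row for rows in per_region for row in rows]
    return heapq.nlargest(n, candidates, key=lambda row: row[1])


# Made-up rows in two regions.
region_a = [("a", 5), ("b", 9), ("c", 1)]
region_b = [("d", 7), ("e", 8), ("f", 2)]

top2 = client_top_n([region_top_n(region_a, 2), region_top_n(region_b, 2)], 2)
# top2 == [("b", 9), ("e", 8)]
```

Each region returns at most N rows, so the client never sees more than N times the number of regions, regardless of table size.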
![Page 27: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/27.jpg)
Phoenix Performance
![Page 28: Phoenix James Taylor jtaylor@salesforce.com We put the SQL back in NoSQL](https://reader036.vdocuments.mx/reader036/viewer/2022062511/55146d63550346414e8b5eef/html5/thumbnails/28.jpg)
Phoenix Performance