BigQuery JavaScript User-Defined Functions by Thomas Park and Felipe Hoffa at Big Data Spain 2014
![Page 1: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/1.jpg)
HANDS ON WITH BIGQUERY JAVASCRIPT UDFS
THOMAS PARK, SOFTWARE ENGINEER - GOOGLE
![Page 2: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/2.jpg)
Hands-on with BigQuery JavaScript User-Defined Functions
Thomas Park, Software Engineer - Google
Felipe Hoffa (@felipehoffa), Developer Advocate - Google
![Page 3: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/3.jpg)
Agenda
I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking
![Page 4: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/4.jpg)
![Page 5: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/5.jpg)
What is BigQuery?
![Page 6: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/6.jpg)
BigQuery: Big Data Analytics in the Cloud
Unrivaled Performance and Scale
● Scan multiple TBs in seconds
● Interactive query performance
● No limits on amount of data
Ease of Use and Adoption
● No administration / provisioning
● Convenience of SQL
● Open interfaces (REST, WebUI, ODBC)
● First 1 TB of data processed per month is free
Advanced “Big Data” Storage
● Familiar database structure
● Easy data management and ACLs
● Fast, atomic imports
![Page 7: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/7.jpg)
Google confidential │ Do not distribute
How many pageviews does Wikipedia have in a month?
SELECT COUNT(*)
FROM [fh-bigquery:wikipedia.wikipedia_views_201308]
https://bigquery.cloud.google.com/table/fh-bigquery:wikipedia.pagecounts_20140602_18
![Page 8: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/8.jpg)
$500 in Cloud Platform credit to launch your idea!
Build. Store. Analyze. On the same infrastructure that powers Google.
Starter Pack: Offer Description
1. Go to cloud.google.com/developers/starterpack
2. Click ‘Apply Now’ and complete the application with promo code: bigdata-spain
3. Start building
![Page 9: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/9.jpg)
![Page 10: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/10.jpg)
Scenario: door access records from a Very Well-Secured Lab where users must badge in to enter or leave.
Images by Connie Zhou. Image by Tod Kurt.
![Page 11: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/11.jpg)
Example: time-series analysis from discrete user action data.
Images by Connie Zhou. Image by Tod Kurt.
![Page 12: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/12.jpg)
|  | user_id | timestamp |
| --- | --- | --- |
| Beep!! 9h: arrive @ lab | thomas | 2014.07.15 09:00 |
| Beep!! 10h: leave to pick up prototype | thomas | 2014.07.15 10:00 |
| Beep!! 10h15: return with prototype | thomas | 2014.07.15 10:15 |
| Beep!! 12h: out for lunch | thomas | 2014.07.15 12:00 |
![Page 13: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/13.jpg)
How can we find out how much time each user spent in the lab?
...where each scan of the user’s access card is represented as a discrete row?
![Page 14: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/14.jpg)
| rownum | user_id | timestamp |
| --- | --- | --- |
| 1 | thomas | 2014.07.15 09:00 |
| 2 | thomas | 2014.07.15 10:00 |
| 3 | thomas | 2014.07.15 10:15 |
| 4 | thomas | 2014.07.15 12:00 |

Rows 1 to 2: 60 minutes. Rows 3 to 4: 105 minutes.
Analyzing data in this format with SQL alone is horrid and painful.
A BigQuery + JS friendly format: one row per user, holding all of that user's timestamps.

| user_id | timestamps |
| --- | --- |
| thomas | [ 09:00, 10:00, 10:15, 12:00, ... ] |
| hoffa | [ 08:10, 11:30, 12:00, 12:15, ... ] |

Producing this format is trivial in BigQuery:
SELECT user_id, NEST(timestamp) AS timestamps
FROM T
GROUP BY user_id;
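To make the reshaping concrete, here is a minimal JavaScript sketch of the shape that the NEST plus GROUP BY step produces. The helper name `nestByUser` and the sample rows are illustrative assumptions, not part of BigQuery, and timestamps are simplified to minutes since midnight.

```javascript
// Hypothetical helper: groups flat scan rows into one record per user,
// mimicking the shape produced by NEST(timestamp) ... GROUP BY user_id.
function nestByUser(rows) {
  const byUser = new Map();
  for (const row of rows) {
    if (!byUser.has(row.user_id)) {
      byUser.set(row.user_id, []);
    }
    byUser.get(row.user_id).push(row.timestamp);
  }
  return Array.from(byUser, ([user_id, timestamps]) => ({ user_id, timestamps }));
}

// Assumed sample data: timestamps as minutes since midnight.
const scans = [
  { user_id: 'thomas', timestamp: 540 },  // 09:00
  { user_id: 'hoffa', timestamp: 490 },   // 08:10
  { user_id: 'thomas', timestamp: 600 },  // 10:00
  { user_id: 'thomas', timestamp: 615 },  // 10:15
  { user_id: 'hoffa', timestamp: 690 },   // 11:30
  { user_id: 'thomas', timestamp: 720 },  // 12:00
];

const nested = nestByUser(scans);
console.log(JSON.stringify(nested));
```

Each element of `nested` corresponds to one input record that a per-user function would receive.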
![Page 15: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/15.jpg)
JS: Total time for each user
// This function is called once for each user
// and receives that user's array of timestamps.
function(record, emit) {
  var total_time = 0;
  // The order of values built by NEST is not guaranteed!
  // Copy and sort numerically to guarantee ascending timestamps.
  var ts = record.timestamps.slice().sort(
      function (a, b) { return a - b; });
  // Loop over (enter, leave) timestamp pairs and sum the intervals.
  for (var i = 0; i < ts.length - 1; i += 2) {
    total_time += (ts[i+1] - ts[i]);
  }
  // Emit total time for this user; field names match the output schema.
  emit({user_id: record.user_id, tot_time: total_time});
}
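A function of this shape can be tried outside BigQuery by passing it a stand-in `emit` callback. The sketch below assumes the input field is named `timestamps`, that the values are numbers (minutes since midnight here), and that a numeric comparator is used for sorting; the collecting callback is a hypothetical test harness, not BigQuery's own.

```javascript
// The per-user function, named so that it can be called locally.
// Assumes `record.timestamps` is an array of numbers.
function totalTime(record, emit) {
  var total_time = 0;
  // Copy, then sort numerically; a sort comparator must return a number.
  var ts = record.timestamps.slice().sort(function (a, b) { return a - b; });
  // Sum the (enter, leave) intervals.
  for (var i = 0; i < ts.length - 1; i += 2) {
    total_time += ts[i + 1] - ts[i];
  }
  emit({ user_id: record.user_id, tot_time: total_time });
}

// Hypothetical stand-in for the emit callback: collect rows in an array.
var emitted = [];
totalTime(
  { user_id: 'thomas', timestamps: [720, 540, 615, 600] },  // deliberately unsorted
  function (row) { emitted.push(row); });

console.log(emitted);
```

With the four scans from the example table, the two intervals of 60 and 105 minutes add up to 165.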
![Page 16: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/16.jpg)
SELECT * FROM js(
  // Input table or query.
  [secret-lab:door_scans.201411],
  // Input columns.
  user_id, timestamps,
  // Output schema.
  "[{name: 'user_id', type:'string'},
    {name: 'tot_time', type:'integer'}]",
  // The function.
  "function(r, emit) { ... emit(...); }")
![Page 17: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/17.jpg)
The JS function: "function(r, emit) { ... emit(...); }"
![Page 18: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/18.jpg)
Input schema (column names only!): user_id, timestamps
![Page 19: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/19.jpg)
Output schema (full declaration): "[{name: 'user_id', type:'string'}, {name: 'tot_time', type:'integer'}]"
![Page 20: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/20.jpg)
Input table or subquery: [secret-lab:door_scans.201411]
![Page 21: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/21.jpg)
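The four arguments of the js() call (input, input columns, output schema, function) can be read as a small contract. The following is a rough local emulation of that contract in plain JavaScript. `runJs` is a hypothetical helper and not a BigQuery API; dropping undeclared output fields and checking only the 'string' and 'integer' types are assumptions of this sketch.

```javascript
// Hypothetical local emulation of the js() contract described above.
function runJs(inputRows, inputColumns, outputSchema, udf) {
  const output = [];
  const emit = (row) => {
    // Assumption: keep only declared output fields and check their types.
    const out = {};
    for (const field of outputSchema) {
      const value = row[field.name];
      const ok = field.type === 'integer'
        ? Number.isInteger(value)
        : typeof value === field.type;
      if (!ok) {
        throw new TypeError('Field ' + field.name + ' is not of type ' + field.type);
      }
      out[field.name] = value;
    }
    output.push(out);
  };
  for (const row of inputRows) {
    // Only the listed input columns are handed to the function.
    const record = {};
    for (const column of inputColumns) {
      record[column] = row[column];
    }
    udf(record, emit);
  }
  return output;
}

const result = runJs(
  [{ user_id: 'thomas', timestamps: [540, 600, 615, 720], badge_color: 'red' }],
  ['user_id', 'timestamps'],
  [{ name: 'user_id', type: 'string' }, { name: 'tot_time', type: 'integer' }],
  function (r, emit) {
    let total = 0;
    for (let i = 0; i < r.timestamps.length - 1; i += 2) {
      total += r.timestamps[i + 1] - r.timestamps[i];
    }
    emit({ user_id: r.user_id, tot_time: total });
  });

console.log(result);
```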
![Page 22: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/22.jpg)
![Page 23: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/23.jpg)
How BigQuery works
Tree Structured Query Dispatch and Aggregation
Each level gets data from lower levels, filters / joins / transforms it, and sends rows up.
Mixer 0: SUM(requests) GROUP BY title, ORDER BY c DESC, LIMIT 10
Mixer 1 (x2): SUM(requests) GROUP BY title
Leaf (x4): SELECT title, requests; WHERE REGEX_MATCH(title, 'pat.*rn'); SUM(requests) GROUP BY title
Distributed Storage
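The tree described above can be sketched in a few lines of JavaScript: leaves filter and pre-aggregate their shard, intermediate mixers merge partial sums, and the root mixer merges, orders, and limits. This is only an illustration of the idea; the function names, the sample shards, and the in-memory `Map` representation are assumptions and say nothing about how BigQuery is implemented.

```javascript
// Leaf: scan a shard, apply the WHERE filter, compute partial sums per title.
function leaf(shard, pattern) {
  const partial = new Map();
  for (const row of shard) {
    if (pattern.test(row.title)) {
      partial.set(row.title, (partial.get(row.title) || 0) + row.requests);
    }
  }
  return partial;
}

// Mixer: merge the partial sums received from the level below.
function mixer(partials) {
  const merged = new Map();
  for (const partial of partials) {
    for (const [title, sum] of partial) {
      merged.set(title, (merged.get(title) || 0) + sum);
    }
  }
  return merged;
}

// Root mixer: merge, then ORDER BY c DESC and LIMIT n.
function root(partials, limit) {
  return Array.from(mixer(partials), ([title, c]) => ({ title, c }))
    .sort((a, b) => b.c - a.c)
    .slice(0, limit);
}

// Assumed sample shards standing in for distributed storage.
const shards = [
  [{ title: 'pattern', requests: 5 }, { title: 'other', requests: 9 }],
  [{ title: 'pattern', requests: 2 }, { title: 'pat-turn', requests: 4 }],
  [{ title: 'pattern', requests: 1 }],
  [{ title: 'pat-turn', requests: 10 }],
];

const leaves = shards.map((shard) => leaf(shard, /pat.*rn/));
const level1 = [mixer(leaves.slice(0, 2)), mixer(leaves.slice(2))];
const top = root(level1, 10);
console.log(top);
```

Because SUM is associative, each level can combine partial results without seeing the raw rows, which is what lets the work spread across many leaves.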
![Page 24: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/24.jpg)
Data for each row is calculated and streamed through a “Row Iterator”
[Diagram: Subquery0 (Row Iterator 0) and Subquery1 (Row Iterator 1) feed a JOIN, whose output is Row Iterator 2]
![Page 25: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/25.jpg)
Can insert JavaScript Functions wherever we have a Row Iterator
[Diagram: the same Subquery0 / Subquery1 / JOIN pipeline, with UDF0 and UDF1 attached to row iterators]
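One way to picture "a UDF wherever there is a row iterator" is with JavaScript generators: every stage consumes an iterator of rows and exposes another one, so a function can be spliced in before or after a join. The sketch below is an analogy only. The names `scan`, `applyUdf`, and `join`, the sample rows, and the choice of where `udf0` and `udf1` sit are all assumptions.

```javascript
// A row iterator is modelled as a generator that yields one row at a time.
function* scan(rows) {
  for (const row of rows) {
    yield row;
  }
}

// Wrap any row iterator with a UDF; each input row may emit zero or more rows.
function* applyUdf(rowIterator, udf) {
  for (const row of rowIterator) {
    const emitted = [];
    udf(row, (out) => emitted.push(out));
    yield* emitted;
  }
}

// A simple hash join over two row iterators, itself exposed as a row iterator.
function* join(left, right, key) {
  const index = new Map();
  for (const row of right) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  }
  for (const row of left) {
    for (const match of index.get(row[key]) || []) {
      yield Object.assign({}, row, match);
    }
  }
}

const subquery0 = scan([{ id: 1, item: 'speakers' }, { id: 2, item: 'cable' }]);
const subquery1 = scan([{ id: 1, qty: 2 }, { id: 1, qty: 3 }]);

// Assumed placement: udf0 transforms rows before the join, udf1 filters after it.
const udf0 = (r, emit) => emit({ id: r.id, item: r.item.toUpperCase() });
const udf1 = (r, emit) => { if (r.qty > 2) emit({ item: r.item, qty: r.qty }); };

const rows = Array.from(
  applyUdf(join(applyUdf(subquery0, udf0), subquery1, 'id'), udf1));
console.log(rows);
```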
![Page 26: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/26.jpg)
Join order item info with web hits info
[Diagram: "SELECT item FROM orders" (Row Iterator 0) and "SELECT query string FROM hits" (Row Iterator 1) feed a JOIN (Row Iterator 2), with UDF0 and UDF1 attached to row iterators]
![Page 27: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/27.jpg)
![Page 28: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/28.jpg)
http://www.store.com/?q=7%2e1+Speakers
Extract and decode query term => “7.1 Speakers”
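The "extract and decode" step is the kind of work that is awkward in SQL and short in JavaScript. A minimal sketch, assuming the raw URL arrives in a field named `url` and the decoded term is emitted as `term` (both names are hypothetical):

```javascript
// Hypothetical helper: pull the `q` parameter out of a URL and decode it.
function extractQueryTerm(url) {
  const match = /[?&]q=([^&#]*)/.exec(url);
  if (match === null) {
    return null;
  }
  // In query strings '+' encodes a space; decodeURIComponent handles %XX escapes.
  return decodeURIComponent(match[1].replace(/\+/g, ' '));
}

// Written in the (row, emit) style used by the functions in this talk.
function decodeHit(row, emit) {
  emit({ term: extractQueryTerm(row.url) });
}

const decoded = [];
decodeHit({ url: 'http://www.store.com/?q=7%2e1+Speakers' }, (r) => decoded.push(r));
console.log(decoded[0].term);  // 7.1 Speakers
```

Once decoded, the term is a plain string that can be compared with the item names from the orders side of the join.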
![Page 29: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/29.jpg)
UDF execution
[Diagram: Subquery0 and Subquery1 feed a JOIN; UDF0 (shown twice) and UDF1 run as User Code behind a process boundary]
![Page 30: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/30.jpg)
![Page 31: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/31.jpg)
Demos:
Ñ
![Page 32: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/32.jpg)
Image: El Hormiguero (Flickr CC)
![Page 33: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/33.jpg)
http://jsfiddle.net/fhoffa/y4pt9s23/
Image: TheVanCats (Flickr CC)
![Page 34: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/34.jpg)
Questions?
News: reddit.com/r/bigquery
Ask: stackoverflow.com
Share: bigqueri.es
Thomas Park
Felipe Hoffa @felipehoffa +FelipeHoffa
Rate us?
http://goo.gl/k3bzdw
![Page 35: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/35.jpg)
Backup slides / screenshots
![Page 36: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/36.jpg)
![Page 37: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/37.jpg)
![Page 38: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/38.jpg)
![Page 39: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/39.jpg)
![Page 40: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/40.jpg)
![Page 41: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/41.jpg)
![Page 42: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014](https://reader035.vdocuments.mx/reader035/viewer/2022070323/55a1e2411a28abfe428b4572/html5/thumbnails/42.jpg)
17th ~ 18th Nov 2014, Madrid (Spain)