BigQuery JavaScript User-Defined Functions by Thomas Park and Felipe Hoffa at Big Data Spain 2014

Uploaded by big-data-spain on 12-Jul-2015

TRANSCRIPT

Page 1: BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at Big Data Spain 2014

HANDS ON WITH BIGQUERY JAVASCRIPT UDFS

THOMAS PARK, SOFTWARE ENGINEER - GOOGLE

Page 2

Hands-on with BigQuery JavaScript User-Defined Functions

Thomas Park, Software Engineer - Google

Felipe Hoffa (@felipehoffa), Developer Advocate - Google

Page 3

Agenda

I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking

Page 4

Agenda

I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking

Page 5

What is BigQuery?

Page 6

BigQuery: Big Data Analytics in the Cloud

Unrivaled Performance and Scale

● Scan multiple TBs in seconds
● Interactive query performance
● No limits on amount of data

Ease of Use and Adoption

● No administration / provisioning
● Convenience of SQL
● Open interfaces (REST, WebUI, ODBC)
● First 1 TB of data processed per month is free

Advanced “Big Data” Storage

● Familiar database structure
● Easy data management and ACLs
● Fast, atomic imports

Page 7

Google confidential │ Do not distribute

How many pageviews does Wikipedia have in a month?

SELECT COUNT(*) FROM [fh-bigquery:wikipedia.wikipedia_views_201308]

https://bigquery.cloud.google.com/table/fh-bigquery:wikipedia.pagecounts_20140602_18

Page 8

Starter Pack

Offer Description

$500 in Cloud Platform credit to launch your idea!

Build. Store. Analyze. On the same infrastructure that powers Google.

1. Go to cloud.google.com/developers/starterpack
2. Click ‘Apply Now’ and complete the application with promo code: bigdata-spain
3. Start building

Page 9

Agenda

I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking

Page 10

Images by Connie Zhou

Scenario: door access records from a Very Well-Secured Lab, where users must badge in to enter or leave.

Image by Tod Kurt

Page 11

Example: time-series analysis from discrete user action data

Page 12

user_id  timestamp
thomas   2014.07.15 09:00   (Beep!! 9h: arrive @ lab)
thomas   2014.07.15 10:00   (Beep!! 10h: leave to pick up prototype)
thomas   2014.07.15 10:15   (Beep!! 10h15: return with prototype)
thomas   2014.07.15 12:00   (Beep!! 12h: out for lunch)

Page 13

How can we find out how much time each user spent in the lab... where each scan of the user’s access card is represented as a discrete row?

Page 14

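rownum  user_id  timestamp
1       thomas   2014.07.15 09:00
2       thomas   2014.07.15 10:00
3       thomas   2014.07.15 10:15
4       thomas   2014.07.15 12:00

60 minutes (row 1 to row 2)

105 minutes (row 3 to row 4)

Analyzing data in this format with SQL is horrid and painful: each interval spans two rows, so every entry scan has to be paired with the exit scan that follows it.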
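As a worked check of the two intervals, write t_1 to t_4 for the timestamps in rows 1 to 4 and pair each entry scan with the following exit scan; the total for thomas over these four scans is then:

```latex
\begin{aligned}
T_{\text{thomas}} &= (t_2 - t_1) + (t_4 - t_3) \\
&= (10{:}00 - 09{:}00) + (12{:}00 - 10{:}15) \\
&= 60 + 105 = 165 \text{ minutes}
\end{aligned}
```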

A BigQuery + JS friendly format: one row per user, holding all of that user’s timestamps.

user_id  timestamps
thomas   [ 09:00, 10:00, 10:15, 12:00, ... ]
hoffa    [ 08:10, 11:30, 12:00, 12:15, ... ]

Producing this format is trivial in BigQuery:

SELECT user_id, NEST(timestamp) AS timestamps FROM T GROUP BY user_id;
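The reshaping that the NEST query performs can be pictured with a small JavaScript sketch. It only models the resulting shape on a few in-memory rows: the helper name `nestByUser`, the `{user_id, timestamp}` row objects, and the use of minutes since midnight are assumptions for illustration, not how BigQuery executes the query. As with NEST, callers should not rely on the order of values inside each array.

```javascript
// Hypothetical helper: groups flat {user_id, timestamp} rows into one row per
// user, mimicking the output shape of
//   SELECT user_id, NEST(timestamp) AS timestamps FROM T GROUP BY user_id
function nestByUser(rows) {
  const byUser = new Map();
  for (const row of rows) {
    if (!byUser.has(row.user_id)) {
      byUser.set(row.user_id, []);
    }
    // Values are appended in input order; downstream code must not depend on it.
    byUser.get(row.user_id).push(row.timestamp);
  }
  // One output row per user, with all of that user's timestamps in an array.
  return Array.from(byUser, ([user_id, timestamps]) => ({ user_id, timestamps }));
}

// Example: door scans as minutes since midnight (09:00 = 540, 08:10 = 490).
const nested = nestByUser([
  { user_id: 'thomas', timestamp: 540 },
  { user_id: 'thomas', timestamp: 600 },
  { user_id: 'hoffa', timestamp: 490 },
  { user_id: 'thomas', timestamp: 615 },
  { user_id: 'thomas', timestamp: 720 },
]);
console.log(JSON.stringify(nested));
```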

Page 15

JS: Total time for each user

// This function will be called once for each user,
// and receives that user's array of timestamps.
function(record, emit) {
  var total_time = 0;
  // The order of values built by NEST is not guaranteed!
  // Sort numerically to guarantee ascending timestamps.
  // (Assumes numeric timestamps; total_time is in the same units.)
  var ts = record.timestamps.sort(function (a, b) { return a - b; });
  // Loop over (entry, exit) timestamp pairs and add up each interval.
  // An unpaired final scan (user still inside) is ignored.
  for (var i = 0; i < ts.length - 1; i += 2) {
    total_time += (ts[i+1] - ts[i]);
  }
  // Emit the total time for this user, using the output schema's field names.
  emit({user_id: record.user_id, tot_time: total_time});
}

Page 16

SELECT * FROM js(
  // Input table or query.
  [secret-lab:door_scans.201411],
  // Input columns.
  user_id, timestamps,
  // Output schema.
  "[{name: 'user_id', type: 'string'}, {name: 'tot_time', type: 'integer'}]",
  // The function.
  "function(r, emit) { ... emit(...); }")

Page 17

The JS function

Page 18

Input schema (column names only!)

Page 19

Output schema (full declaration)

Page 20

Input table or subquery

Page 21
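The four js() arguments (input table, input column names, output schema, function) can be mimicked locally to test a function before running a query. The sketch below is a hypothetical harness, not BigQuery's implementation: `runJsUdf` and its parameters are made-up names, and the output schema is passed as a JavaScript array rather than the string used in the query. It projects each input row down to the listed columns, calls the function once per row with an `emit` callback, and rejects emitted objects whose field names differ from the schema.

```javascript
// Hypothetical local harness modeled on js(table, columns, schema, function).
function runJsUdf(inputRows, inputColumns, outputSchema, udf) {
  const schemaNames = outputSchema.map((field) => field.name).sort();
  const output = [];
  const emit = (outRow) => {
    // Emitted field names must match the declared output schema exactly.
    const names = Object.keys(outRow).sort();
    if (names.join(',') !== schemaNames.join(',')) {
      throw new Error('Emitted fields [' + names + '] do not match schema [' + schemaNames + ']');
    }
    output.push(outRow);
  };
  for (const row of inputRows) {
    // Only the listed input columns are visible to the function.
    const record = {};
    for (const column of inputColumns) {
      record[column] = row[column];
    }
    udf(record, emit);
  }
  return output;
}

// Example: count door scans per user; `badge_color` is a hypothetical extra
// column that the function never sees because it is not listed as an input.
const counts = runJsUdf(
  [{ user_id: 'thomas', timestamps: [540, 600, 615, 720], badge_color: 'red' }],
  ['user_id', 'timestamps'],
  [{ name: 'user_id', type: 'string' }, { name: 'scans', type: 'integer' }],
  (r, emit) => emit({ user_id: r.user_id, scans: r.timestamps.length })
);
console.log(counts);
```

A mismatch such as emitting a `user` field when the schema declares `user_id` throws here, before any query is run.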

Page 22

Agenda

I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking

Page 23

How BigQuery works

Get data from lower levels, filter / join / transform, send rows up

Tree Structured Query Dispatch and Aggregation (diagram, top to bottom):

Mixer 0: SUM(requests) GROUP BY title, ORDER BY c DESC, LIMIT 10
Mixer 1 (two nodes): SUM(requests) GROUP BY title
Leaf (four nodes): WHERE REGEXP_MATCH(title, 'pat.*rn'), SUM(requests) GROUP BY title
Distributed Storage: SELECT title, requests
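The leaf/mixer split can be illustrated with a small in-memory sketch of tree-structured aggregation: each leaf filters its own shard and computes partial SUM(requests) per title, each mixer merges its children's partial sums, and the root (Mixer 0) also sorts and applies the LIMIT. The shard contents, the fan-out, and the function names below are hypothetical; this models the idea on the slide, not BigQuery's execution engine.

```javascript
// Leaf: filter rows from one storage shard and compute partial sums per title.
function leaf(shard, pattern) {
  const partial = new Map();
  for (const { title, requests } of shard) {
    if (pattern.test(title)) {
      partial.set(title, (partial.get(title) || 0) + requests);
    }
  }
  return partial;
}

// Mixer: merge partial sums from child nodes. Addition is associative, so
// partial results can be combined in any grouping without changing the total.
function mixer(children) {
  const merged = new Map();
  for (const partial of children) {
    for (const [title, sum] of partial) {
      merged.set(title, (merged.get(title) || 0) + sum);
    }
  }
  return merged;
}

// Root (Mixer 0): final merge, then ORDER BY c DESC LIMIT n.
function root(children, n) {
  return Array.from(mixer(children), ([title, c]) => ({ title, c }))
    .sort((a, b) => b.c - a.c)
    .slice(0, n);
}

// Hypothetical shards, one per leaf.
const shards = [
  [{ title: 'pattern', requests: 5 }, { title: 'other', requests: 9 }],
  [{ title: 'pattern', requests: 2 }, { title: 'patXrn', requests: 4 }],
  [{ title: 'pattern', requests: 1 }],
  [{ title: 'patXrn', requests: 3 }, { title: 'pa', requests: 7 }],
];
const re = /pat.*rn/;
const leaves = shards.map((shard) => leaf(shard, re));
// Two Mixer 1 nodes, each merging two leaves, feed Mixer 0.
const topTitles = root([mixer(leaves.slice(0, 2)), mixer(leaves.slice(2, 4))], 10);
console.log(topTitles);
```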

Page 24

Data for each row is calculated and streamed through a “Row Iterator”.

(diagram: Subquery0 and Subquery1 stream rows through Row Iterator 0 and Row Iterator 1 into a JOIN, whose output is streamed through Row Iterator 2)

Page 25

JavaScript functions can be inserted wherever we have a Row Iterator.

(diagram: Subquery0 and Subquery1 feed a JOIN through Row Iterators 0 and 1, with the JOIN output on Row Iterator 2; UDF0 and UDF1 are inserted into this row flow)
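Splicing a function into a row iterator can be sketched with JavaScript generators. Each stage pulls rows from the iterator below it; the hypothetical `applyUdf` wraps any iterator, calls a `function(row, emit)` UDF once per input row, and yields whatever it emitted, so one input row may produce zero, one, or many output rows. The names `scan`, `applyUdf`, and `splitItems` are illustrative only, not BigQuery's internal API.

```javascript
// A source iterator standing in for a subquery's Row Iterator.
function* scan(rows) {
  for (const row of rows) {
    yield row;
  }
}

// Wraps any row iterator with a UDF of the form function(row, emit).
function* applyUdf(rowIterator, udf) {
  for (const row of rowIterator) {
    // emit is synchronous, so collect this row's outputs first, then yield them.
    const emitted = [];
    udf(row, (outRow) => emitted.push(outRow));
    yield* emitted;
  }
}

// Example UDF: split a space-separated item list into one row per item.
const splitItems = (row, emit) => {
  for (const item of row.items.split(' ')) {
    emit({ order_id: row.order_id, item });
  }
};

const rows = [...applyUdf(
  scan([{ order_id: 1, items: 'cable speakers' }, { order_id: 2, items: 'tv' }]),
  splitItems
)];
console.log(rows);
```

Because `applyUdf` both consumes and produces an iterator, stages compose: the output of one wrapped iterator can feed a JOIN or another UDF.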

Page 26

Join order item info with web hits info

(diagram: “SELECT item FROM orders” and “SELECT query string FROM hits” feed a JOIN through Row Iterators 0 and 1, with the JOIN output on Row Iterator 2; UDF0 and UDF1 are inserted into this row flow)

Page 27

http://www.store.com/?q=7%2e1+Speakers

Page 28

http://www.store.com/?q=7%2e1+Speakers

Extract and decode query term => “7.1 Speakers”

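The "extract and decode" step above is the kind of per-row transformation a UDF handles well. A minimal sketch, assuming a hypothetical input field `url` and output field `query_term` (neither name comes from the slides), using only a regular expression and the standard `decodeURIComponent`:

```javascript
// Hypothetical UDF in the function(row, emit) style used by js().
// The input field `url` and output field `query_term` are illustrative names.
function extractQueryTerm(row, emit) {
  // Capture the raw value of the `q` parameter, up to the next '&' or '#'.
  var match = /[?&]q=([^&#]*)/.exec(row.url);
  if (!match) {
    return; // No search term in this URL: emit nothing.
  }
  var term;
  try {
    // '+' stands for a space in form-encoded query strings, but
    // decodeURIComponent does not translate it, so replace it first.
    term = decodeURIComponent(match[1].replace(/\+/g, ' '));
  } catch (e) {
    return; // Malformed percent-encoding (URIError): skip this row.
  }
  emit({query_term: term});
}

// Example: the URL from the slide.
extractQueryTerm({url: 'http://www.store.com/?q=7%2e1+Speakers'}, function (out) {
  console.log(out.query_term); // 7.1 Speakers
});
```

Rows without a `q` parameter, or with broken percent-encoding, simply emit nothing instead of failing the whole query.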

Page 29

UDF execution

(diagram: Subquery0 and Subquery1 feed a JOIN, with UDF0 and UDF1 in the query plan; across a process boundary, UDF0 runs as User Code)
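The process boundary can be illustrated with Node.js. In this swapped-in sketch (the slides do not say how BigQuery isolates user code), a hypothetical `runUdfInChildProcess` helper runs UDF source text in a separate `node` process via `child_process.spawnSync`, exchanging rows as JSON over stdin and stdout. A crash in user code ends the child process rather than the caller, and a timeout caps how long the user code may run.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: runs `udfSource` (text of a function(row, emit)) in a
// separate Node.js process and returns the rows it emitted.
function runUdfInChildProcess(udfSource, rows) {
  // Child program: read JSON rows from stdin, apply the UDF, write output as JSON.
  const childProgram = `
    let input = '';
    process.stdin.on('data', (chunk) => { input += chunk; });
    process.stdin.on('end', () => {
      const udf = (${udfSource});
      const out = [];
      for (const row of JSON.parse(input)) {
        udf(row, (r) => out.push(r));
      }
      process.stdout.write(JSON.stringify(out));
    });
  `;
  const result = spawnSync(process.execPath, ['-e', childProgram], {
    input: JSON.stringify(rows),
    encoding: 'utf8',
    timeout: 5000, // Kill runaway user code after 5 seconds.
  });
  if (result.error) throw result.error; // e.g. spawn failure or timeout
  if (result.status !== 0) {
    // The user code failed inside the child; the caller survives and reports it.
    throw new Error('UDF process failed: ' + result.stderr);
  }
  return JSON.parse(result.stdout);
}

// Example: a trivial UDF that doubles a numeric column.
const doubled = runUdfInChildProcess(
  'function (row, emit) { emit({ n: row.x * 2 }); }',
  [{ x: 1 }, { x: 2 }]
);
console.log(doubled);
```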

Page 30

Agenda

I. Background
II. Example: Cross-row intervals
III. Under the hood
IV. Example: Codebreaking

Page 31

Demos:

Ñ

Page 32

Image: El Hormiguero (Flickr CC)

Page 33

http://jsfiddle.net/fhoffa/y4pt9s23/

Image: TheVanCats (Flickr CC)

Page 34

Questions?

News: reddit.com/r/bigquery
Ask: stackoverflow.com

Share: bigqueri.es

Thomas Park
Felipe Hoffa @felipehoffa +FelipeHoffa

Rate us?

http://goo.gl/k3bzdw

Page 35

Backup slides / screenshots

Page 36
Page 37
Page 38
Page 39
Page 40
Page 41
Page 42

17TH ~ 18TH NOV 2014, MADRID (SPAIN)