BigQuery implementation

Google BigQuery - Big data with SQL-like query features, but fast...

Upload: simon-su

Post on 22-Nov-2014

DESCRIPTION

A technical presentation on getting started with Google BigQuery.

TRANSCRIPT

Page 1: BigQuery implementation

Google BigQuery - Big data with SQL-like query features, but fast...

Google BigQuery

Page 2: BigQuery implementation

BigQuery Features

● TB-level data analysis
● Fast mining response
● SQL-like query language
● Multi-dataset interactive support
● Cheap, pay-per-use pricing
● Offline job support

Page 3: BigQuery implementation

Getting Started

Page 4: BigQuery implementation

BigQuery Web UI

https://bigquery.cloud.google.com/

Page 5: BigQuery implementation

BigQuery structure
● Project
● Dataset
● Table
● Job
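The project, dataset and table levels show up directly in the qualified table names used later in this deck, written as [project]:[dataset].[table]. The following is a minimal sketch of splitting such a name into its parts. The helper name parseTableRef is hypothetical, and treating the project prefix as optional is an assumption made for illustration.

```javascript
// Hypothetical helper: split "project:dataset.table" into its parts.
// Assumption: the "project:" prefix is optional, as in "testbq.test".
function parseTableRef(ref) {
  const match = /^(?:([^:]+):)?([^.:]+)\.([^.:]+)$/.exec(ref);
  if (!match) {
    throw new Error('Not a [project]:[dataset].[table] reference: ' + ref);
  }
  return { project: match[1] || null, dataset: match[2], table: match[3] };
}

// Table names taken from the slides.
console.log(parseTableRef('publicdata:samples.shakespeare'));
console.log(parseTableRef('testbq.test'));
```

A name without a project prefix comes back with project set to null, which a caller could replace with its default project.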

Page 6: BigQuery implementation

Hands-on - Import

Page 7: BigQuery implementation

The easy way - Import Wizard

Page 8: BigQuery implementation

Load data into BigQuery from the command line

CSV / JSON → Cloud Storage → BigQuery

Page 9: BigQuery implementation

Load CSV to BigQuery

gsutil cp [source] gs://[bucket-name]
# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB

bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
# bq load project.dataset gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE
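The inline schema passed to bq load above is a comma-separated list of NAME:TYPE pairs. As a rough sketch, the same information can be turned into the array-of-fields layout that a JSON schema file uses. The function name parseInlineSchema is hypothetical, and defaulting an untyped field to STRING is an assumption of this sketch.

```javascript
// Hypothetical helper: convert "NAME:TYPE,NAME:TYPE" into an array of
// { name, type } objects, the layout used by JSON schema files.
function parseInlineSchema(schema) {
  return schema.split(',').map(function (field) {
    const parts = field.trim().split(':');
    // Assumption: a field without an explicit type is treated as STRING.
    return { name: parts[0], type: (parts[1] || 'STRING').toUpperCase() };
  });
}

// Schema string taken from the slide.
const fields = parseInlineSchema('IP:STRING,DNS:STRING,TS:STRING,URL:STRING');
console.log(JSON.stringify(fields, null, 2));
```

Writing the printed JSON to a file would give something usable as the [schema file] argument when the schema is too long to type inline.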

Page 10: BigQuery implementation

Load JSON to BigQuery

bq load --source_format NEWLINE_DELIMITED_JSON \
  [project]:[dataset].[table] [json file] [schema file]

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
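NEWLINE_DELIMITED_JSON means one complete JSON object per line, not a single JSON array. A minimal sketch of producing that layout from an in-memory array follows. The function name toNdjson and the sample records are made up for illustration.

```javascript
// Hypothetical helper: serialize an array of objects as newline-delimited
// JSON, one object per line, with a trailing newline.
function toNdjson(records) {
  return records.map(function (record) {
    return JSON.stringify(record);
  }).join('\n') + '\n';
}

// Made-up sample records, not data from the slides.
const sample = [
  { ip: '10.0.0.1', url: '/index.html' },
  { ip: '10.0.0.2', url: '/about.html' }
];
process.stdout.write(toNdjson(sample));
```

Because JSON.stringify without an indent argument never emits raw newlines, each record is guaranteed to stay on its own line.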

Page 11: BigQuery implementation

Hands-on - Query

Page 12: BigQuery implementation

Web way - Query Console

Page 13: BigQuery implementation

Shell way - bq command

Install the Google Cloud SDK (https://developers.google.com/cloud/sdk/)

Page 14: BigQuery implementation

Shell way - bq command

bq query <sql_query>
# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'

Page 15: BigQuery implementation

BigQuery - Query Language

Page 16: BigQuery implementation

Query syntax
● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT

Query support
Supported functions and operators
● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions

Page 17: BigQuery implementation

Select

select charge_unit,charge_desc,one_charge from testbq.test

+-------------+------------------+------------+
| charge_unit | charge_desc      | one_charge |
+-------------+------------------+------------+
| M           | Billed monthly   | 0          |
| D           | Billed daily     | 0          |
| HH          | Billed hourly    | 0          |
| T           | Billed by minute | 0          |
| SS          | Billed per use   | 1          |
+-------------+------------------+------------+

Page 18: BigQuery implementation

Join

SELECT a.THEID, a.THENAME, b.DESCRIPITON
FROM user01.USER_MST a
LEFT JOIN user01.USER_DETAIL_MST b ON a.THEID = b.THEID
LIMIT 10

+----------+----------------+----------------------------------------+
| a_THEPID | a_THENAME      | b_DESCRIPITON                          |
+----------+----------------+----------------------------------------+
| 2        | About items    | Organize items under Items.            |
| 2        | About items    | Jewels.                                |
| 1        | About partners | Awakening of courage.                  |
| 1        | About partners | Edit the team that carries out quests. |
| 1        | About partners | Several different types                |
+----------+----------------+----------------------------------------+
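To make the LEFT JOIN semantics concrete: every row of the left table is kept, and a left row with several matches on the right appears once per match, which is why the same THEID repeats in the output above. The following in-memory sketch illustrates this. The function leftJoin and the sample rows are hypothetical and say nothing about how BigQuery executes joins.

```javascript
// Hypothetical in-memory left join on a shared key column.
function leftJoin(left, right, key) {
  // Index the right-hand rows by key for quick lookup.
  const index = new Map();
  right.forEach(function (row) {
    if (!index.has(row[key])) index.set(row[key], []);
    index.get(row[key]).push(row);
  });
  const out = [];
  left.forEach(function (l) {
    // A left row with no match is still emitted, paired with null.
    const matches = index.get(l[key]) || [null];
    matches.forEach(function (r) {
      out.push({ left: l, right: r });
    });
  });
  return out;
}

// Made-up rows using the column names from the slide.
const users = [{ THEID: 1, THENAME: 'A' }, { THEID: 2, THENAME: 'B' }, { THEID: 3, THENAME: 'C' }];
const details = [{ THEID: 1, DESCRIPITON: 'x' }, { THEID: 1, DESCRIPITON: 'y' }, { THEID: 2, DESCRIPITON: 'z' }];
console.log(leftJoin(users, details, 'THEID'));
```

In this sample, THEID 1 yields two rows, THEID 2 yields one, and THEID 3 yields one row whose right side is null.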

Page 19: BigQuery implementation

Flatten

SELECT
  fullName,
  age,
  gender,
  citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
  (citiesLived.yearsLived > 1995) AND
  (children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place

+------------+-----+--------+-------------------+
| fullName   | age | gender | citiesLived_place |
+------------+-----+--------+-------------------+
| John Doe   | 22  | Male   | Stockholm         |
| Mike Jones | 35  | Male   | Los Angeles       |
| Mike Jones | 35  | Male   | Washington DC     |
| Mike Jones | 35  | Male   | Portland          |
| Mike Jones | 35  | Male   | Austin            |
+------------+-----+--------+-------------------+
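Conceptually, flattening a repeated field turns one record holding a list into one row per list element, with the other fields copied onto each row. The sketch below shows that idea on plain objects. The function flatten and the sample records are hypothetical, and emitting a single row with null for an empty list is a choice made for this sketch, not a statement about what BigQuery does.

```javascript
// Hypothetical helper: expand a repeated (array) field into one row per element.
function flatten(records, field) {
  const rows = [];
  records.forEach(function (record) {
    const values = record[field] && record[field].length ? record[field] : [null];
    values.forEach(function (value) {
      // Copy the record, replacing the array with a single element.
      rows.push(Object.assign({}, record, { [field]: value }));
    });
  });
  return rows;
}

// Made-up records shaped like the query's table (children is repeated).
const people = [
  { fullName: 'John Doe', children: [{ name: 'Jane', age: 6 }, { name: 'John', age: 15 }] },
  { fullName: 'Mike Jones', children: [] }
];
console.log(flatten(people, 'children'));
```

After flattening, a condition such as children.age > 3 can be checked row by row, since each row carries exactly one child.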

Page 20: BigQuery implementation

Regular Expression

SELECT word, COUNT(word) AS count
FROM publicdata:samples.shakespeare
WHERE (REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;

+-------+-------+
| word  | count |
+-------+-------+
| ne'er | 42    |
| we'll | 35    |
| We'll | 33    |
+-------+-------+
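The pattern in this query asks for two word characters, an apostrophe, then two more word characters. A small sketch of the same filter-count-sort pipeline in plain JavaScript is shown here. The function topMatches and the word list are made up, and the JavaScript regular expression only approximates the engine BigQuery uses.

```javascript
// Two word characters, an apostrophe, two word characters (unanchored).
const pattern = /\w\w'\w\w/;

// Hypothetical helper: count matching words and return the most frequent.
function topMatches(words, limit) {
  const counts = new Map();
  words.forEach(function (word) {
    if (pattern.test(word)) counts.set(word, (counts.get(word) || 0) + 1);
  });
  return Array.from(counts.entries())
    .sort(function (a, b) { return b[1] - a[1]; }) // highest count first
    .slice(0, limit)
    .map(function (entry) { return { word: entry[0], count: entry[1] }; });
}

// Made-up word list, not the Shakespeare sample table.
const words = ["ne'er", "we'll", "ne'er", 'the', "We'll", "o'er"];
console.log(topMatches(words, 3));
```

Note that "o'er" is filtered out because only one character precedes the apostrophe, and "we'll" and "We'll" are counted separately because the grouping is case-sensitive, as in the query output.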

Page 21: BigQuery implementation

Time Function

SELECT
  TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
  COUNT (*) AS revision_count
FROM [publicdata:samples.wikipedia];

+----------------------------+----------------+
| top_revision_time          | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 | 20971          |
| 2002-02-25 15:43:11.000000 | 15955          |
| 2010-01-14 15:52:34.000000 | 3              |
| 2009-12-31 19:29:19.000000 | 3              |
| 2009-12-28 18:55:12.000000 | 3              |
+----------------------------+----------------+
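In this query the timestamp column holds seconds since the Unix epoch, so it is multiplied by 1000000 to get microseconds before formatting. The sketch below mimics that conversion in JavaScript, producing the same YYYY-MM-DD HH:MM:SS.uuuuuu layout seen in the output. The function name formatUtcUsec is hypothetical, and the sketch assumes non-negative inputs.

```javascript
// Hypothetical helper: format microseconds since the Unix epoch as
// "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC. Assumes usec >= 0.
function formatUtcUsec(usec) {
  const millis = Math.floor(usec / 1000);
  // toISOString() gives "YYYY-MM-DDTHH:MM:SS.sssZ"; keep up to the seconds.
  const seconds = new Date(millis).toISOString().slice(0, 19).replace('T', ' ');
  const micros = String(usec % 1000000).padStart(6, '0');
  return seconds + '.' + micros;
}

// 1014652275 seconds since the epoch corresponds to the first row of the output.
console.log(formatUtcUsec(1014652275 * 1000000)); // 2002-02-25 15:51:15.000000
```

Microsecond values of this size still fit exactly in a JavaScript number, so no BigInt is needed for dates in this range.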

Page 22: BigQuery implementation

URL Function

SELECT DOMAIN(repository_homepage) AS user_domain, COUNT(*) AS activity_count
FROM [publicdata:samples.github_timeline]
GROUP BY user_domain
HAVING user_domain IS NOT NULL AND user_domain != ''
ORDER BY activity_count DESC
LIMIT 5;

+-----------------+----------------+
| user_domain     | activity_count |
+-----------------+----------------+
| github.com      | 281879         |
| google.com      | 34769          |
| khanacademy.org | 17316          |
| sourceforge.net | 15103          |
| mozilla.org     | 14091          |
+-----------------+----------------+

Page 23: BigQuery implementation

Hands-on - Programming

Page 24: BigQuery implementation

Prepare

● Prepare a Google Cloud Platform project
● Create a Service Account
● Generate a PEM key from the Service Account's p12 key

Page 25: BigQuery implementation

Google Service Account

web server application vs. service account

Page 26: BigQuery implementation

Prepare Authentications

Convert the p12 key to a pem key:
$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem

Page 27: BigQuery implementation

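Node.js - bigquery module

var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';

// Point the module at the client secret and the keys generated earlier.
bq.init({
  client_secret: '/path-to-client_secret.json',
  privatekey_pem: '/path-to-privatekey.pem',
  key_pem: '/path-to-key.pem'
});

// List the datasets in the project and print the response body.
bq.job.listds(prjId, function(e, r, d) {
  if (e) console.log(e);
  console.log(JSON.stringify(d));
});

To perform operations, call the functions under job through bq.

For the bigquery module, see: https://github.com/peihsinsu/bigquery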

Page 28: BigQuery implementation

Google Drive way - Apps Script

/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var projectId = 'your-project-id'; // placeholder: replace with your own project ID
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
// Start the query, then fetch the first page of results for its job.
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
var rows = queryResults.rows;
// Keep fetching while the response carries a pageToken for the next page.
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}
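The Apps Script snippet on this slide relies on a common paging pattern: request a page, and keep requesting while the response carries a pageToken. The sketch below isolates that loop so it can be run outside Apps Script. The names collectAllRows and fetchPage, and the fake in-memory pages, are hypothetical stand-ins rather than part of any Google API.

```javascript
// Hypothetical helper: gather rows from every page of a paged result.
// fetchPage(token) must return { rows: [...], pageToken: 'next' or undefined }.
function collectAllRows(fetchPage) {
  let page = fetchPage(undefined); // first page: no token yet
  let rows = page.rows || [];
  while (page.pageToken) {
    page = fetchPage(page.pageToken);
    rows = rows.concat(page.rows || []);
  }
  return rows;
}

// Fake paged source used only to exercise the loop.
const pages = {
  start: { rows: [1, 2], pageToken: 'p2' },
  p2: { rows: [3], pageToken: 'p3' },
  p3: { rows: [4, 5] } // no pageToken: last page
};
const allRows = collectAllRows(function (token) {
  return pages[token || 'start'];
});
console.log(allRows);
```

Guarding with `page.rows || []` is a defensive choice in this sketch so that a page without a rows field does not break the concatenation.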