workshop 20140522 bigquery implementation


Upload: simon-su

Post on 10-May-2015


DESCRIPTION

A BigQuery starter guide to loading data in CSV or JSON format, plus a query guide...

TRANSCRIPT

Page 1: Workshop 20140522   BigQuery Implementation

MiTAC MiCloud - Google Cloud Platform Partner @ APAC
2014Q2 BigQuery Workshop

Google BigQuery: big data with SQL-like queries, but fast...

http://goo.gl/XZmqgN

Page 2: Workshop 20140522   BigQuery Implementation

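Before we start:

● This is a hands-on session. If you want to follow along, please open your computer...
● Have you created a GCP project?
● Have you enabled billing?
● Have you installed google_cloud_sdk?
● Wireless AP here:
  ○ Account:
  ○ Password:

[Diagram labels: RESTful, GCE LB, Frontend Services, Backend Services, Data Access, Big Data Access]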

Page 3: Workshop 20140522   BigQuery Implementation

BigQuery is...

● TB-level data analysis
● Fast mining response
● SQL-like query language
● Multi-dataset interactive support
● Cheap and pay by use
● Offline job support

Page 4: Workshop 20140522   BigQuery Implementation

Getting Started

Page 5: Workshop 20140522   BigQuery Implementation

BigQuery Web UI

https://bigquery.cloud.google.com/

Page 6: Workshop 20140522   BigQuery Implementation

BigQuery structure

● Project
● Dataset
● Table
● Job
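The project / dataset / table hierarchy shows up in how tables are named in the commands and queries later in this deck, for example [project]:[dataset].[table] or the shorter testbq.test. A minimal sketch of splitting such a reference into its parts, assuming that simple format only; parseTableRef is a hypothetical helper written for illustration and is not part of any BigQuery client.

```javascript
// Hypothetical helper: split "[project]:[dataset].[table]" into its parts.
// The project part is optional, as in the "testbq.test" references in this deck.
function parseTableRef(ref) {
  const colon = ref.indexOf(':');
  const project = colon === -1 ? null : ref.slice(0, colon);
  const rest = colon === -1 ? ref : ref.slice(colon + 1);
  const dot = rest.indexOf('.');
  if (dot === -1) {
    throw new Error('Expected [dataset].[table] in "' + ref + '"');
  }
  return {
    project: project,
    dataset: rest.slice(0, dot),
    table: rest.slice(dot + 1),
  };
}

console.log(parseTableRef('publicdata:samples.shakespeare'));
console.log(parseTableRef('testbq.test'));
```

When the project is omitted, the bq tool falls back to whatever default project it has been configured with; the sketch simply reports null in that case.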

Page 7: Workshop 20140522   BigQuery Implementation

Hands-on - Import

Page 9: Workshop 20140522   BigQuery Implementation

The easy way - Import Wizard

Page 10: Workshop 20140522   BigQuery Implementation

JCMB_2014.csv Schema

date_time:String,atmospheric_pressure:float,rainfall:float,wind_speed:float,wind_direction:float,surface_temperature:float,relative_humidity:float,solar_flux:float,battery:float
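The inline schema above is a comma-separated list of name:type pairs. A minimal sketch of turning such a string into the JSON array form used by a schema file (like the ./schema.json passed to bq load in the JSON example of this deck); schemaToJson is a hypothetical helper, and upper-casing the type names is an assumption made only for consistency.

```javascript
// Hypothetical helper: convert "name:type,name:type" into a schema-file array.
function schemaToJson(inlineSchema) {
  return inlineSchema.split(',').map(function (field) {
    const parts = field.trim().split(':');
    if (parts.length !== 2 || !parts[0] || !parts[1]) {
      throw new Error('Bad field definition: "' + field + '"');
    }
    // Assumption: type names are normalized to upper case.
    return { name: parts[0], type: parts[1].toUpperCase() };
  });
}

const schema = schemaToJson(
  'date_time:String,atmospheric_pressure:float,rainfall:float'
);
console.log(JSON.stringify(schema, null, 2));
```

Only the first three columns of the JCMB_2014.csv schema are used here to keep the output short.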

Page 11: Workshop 20140522   BigQuery Implementation

Load Data to BigQuery in CMD

CSV / JSON → Cloud Storage → BigQuery

Page 12: Workshop 20140522   BigQuery Implementation

Load CSV to BigQuery

gsutil cp [source] gs://[bucket-name]
# gsutil cp ~/Desktop/log.csv gs://your-bucket/
Copying file:///Users/simonsu/Desktop/log.csv [Content-Type=text/csv]...
Uploading: 4.59 MB/36.76 MB

bq load [project]:[dataset].[table] gs://[bucket]/[csv path] [schema]
# bq load project.dataset gs://your-bucket/log.csv IP:STRING,DNS:STRING,TS:STRING,URL:STRING
Waiting on bqjob_rf4f3f1d9e2366a6_00000142c1bdd36f_1 ... (24s) Current status: DONE

Page 13: Workshop 20140522   BigQuery Implementation

Load JSON to BigQuery

bq load --source_format NEWLINE_DELIMITED_JSON \
  [project]:[dataset].[table] [json file] [schema file]

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest ./sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE

# bq load --source_format NEWLINE_DELIMITED_JSON testbq.jsonTest gs://your-bucket/sample.json ./schema.json
Waiting on bqjob_r7182196a0278f1c6_00000145f940517b_1 ... (39s) Current status: DONE
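NEWLINE_DELIMITED_JSON expects one complete JSON object per line rather than a single JSON array. A minimal sketch of producing such a file body from an array of records; toNdjson is a hypothetical helper, and the sample rows are invented for illustration, borrowing column names from the query examples in this deck.

```javascript
// Hypothetical helper: serialize records as newline-delimited JSON,
// one object per line, with no enclosing array and no separating commas.
function toNdjson(records) {
  return records.map(function (rec) {
    return JSON.stringify(rec);
  }).join('\n') + '\n';
}

// Illustrative rows only.
const rows = [
  { charge_unit: 'M', one_charge: 0 },
  { charge_unit: 'SS', one_charge: 1 },
];
process.stdout.write(toNdjson(rows));
```

Writing this string to a file gives something that could play the role of ./sample.json in the commands above, provided the schema file describes the same fields.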

Page 14: Workshop 20140522   BigQuery Implementation

Hands-on - Query

Page 15: Workshop 20140522   BigQuery Implementation

Web way - Query Console

Page 16: Workshop 20140522   BigQuery Implementation

Shell way - bq command

Install google_cloud_sdk (https://developers.google.com/cloud/sdk/)

Page 17: Workshop 20140522   BigQuery Implementation

Shell way - bq command

bq query <sql_query>
# bq query 'select charge_unit,charge_desc,one_charge from testbq.test'

Page 18: Workshop 20140522   BigQuery Implementation

BigQuery - Query Language

Page 19: Workshop 20140522   BigQuery Implementation

Query syntax

● SELECT
● WITHIN
● FROM
● FLATTEN
● JOIN
● WHERE
● GROUP BY
● HAVING
● ORDER BY
● LIMIT

Query support

Supported functions and operators

● Aggregate functions
● Arithmetic operators
● Bitwise operators
● Casting functions
● Comparison functions
● Date and time functions
● IP functions
● JSON functions
● Logical operators
● Mathematical functions
● Regular expression functions
● String functions
● Table wildcard functions
● URL functions
● Window functions
● Other functions

Page 20: Workshop 20140522   BigQuery Implementation

Select

select charge_unit,charge_desc,one_charge from testbq.test

+-------------+-------------+------------+
| charge_unit | charge_desc | one_charge |
+-------------+-------------+------------+
| M           | 按月計費    | 0          |
| D           | 按日計費    | 0          |
| HH          | 小時計費    | 0          |
| T           | 分計費      | 0          |
| SS          | 按次計費    | 1          |
+-------------+-------------+------------+

Page 21: Workshop 20140522   BigQuery Implementation

Join

SELECT a.order_id,a.sales,b.begin_use_date
FROM testbq.order_master a
LEFT JOIN testbq.order_detail b
ON a.order_id = b.order_id

+------------+---------+-------------------------+
| a_order_id | a_sales | b_begin_use_date        |
+------------+---------+-------------------------+
| OM2003     | D589    | 2011-11-01 17:43:00 UTC |
| OM2004     | D589    | 2011-11-01 09:43:00 UTC |
| OM2005     | D589    | 2011-11-01 17:55:00 UTC |
| OM2006     | D589    | 2011-11-01 17:54:00 UTC |
| OM2007     | D589    | 2011-11-03 16:31:00 UTC |
+------------+---------+-------------------------+
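A LEFT JOIN keeps every row of the left table and fills the right-hand columns with NULL where no matching row exists. A minimal in-memory sketch of that behavior, using invented rows shaped like order_master and order_detail; leftJoin is a hypothetical helper that illustrates the semantics only and says nothing about how BigQuery executes a join.

```javascript
// Hypothetical in-memory LEFT JOIN on a shared key.
function leftJoin(left, right, key, rightField) {
  const out = [];
  left.forEach(function (l) {
    const matches = right.filter(function (r) { return r[key] === l[key]; });
    if (matches.length === 0) {
      // No match: keep the left row, the right-hand column becomes null.
      out.push(Object.assign({}, l, { [rightField]: null }));
    } else {
      matches.forEach(function (r) {
        out.push(Object.assign({}, l, { [rightField]: r[rightField] }));
      });
    }
  });
  return out;
}

// Illustrative rows only.
const orderMaster = [
  { order_id: 'OM2003', sales: 'D589' },
  { order_id: 'OM9999', sales: 'D589' },
];
const orderDetail = [
  { order_id: 'OM2003', begin_use_date: '2011-11-01 17:43:00 UTC' },
];
console.log(leftJoin(orderMaster, orderDetail, 'order_id', 'begin_use_date'));
```

The second order has no detail row, so it still appears in the result with begin_use_date set to null.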

Page 22: Workshop 20140522   BigQuery Implementation

Flatten

SELECT
  fullName,
  age,
  gender,
  citiesLived.place
FROM (FLATTEN([dataset.tableId], children))
WHERE
  (citiesLived.yearsLived > 1995) AND
  (children.age > 3)
GROUP BY fullName, age, gender, citiesLived.place

+------------+-----+--------+-------------------+
| fullName   | age | gender | citiesLived_place |
+------------+-----+--------+-------------------+
| John Doe   | 22  | Male   | Stockholm         |
| Mike Jones | 35  | Male   | Los Angeles       |
| Mike Jones | 35  | Male   | Washington DC     |
| Mike Jones | 35  | Male   | Portland          |
| Mike Jones | 35  | Male   | Austin            |
+------------+-----+--------+-------------------+
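FLATTEN turns a repeated field into one output row per repeated value, copying the parent columns onto each row. A minimal sketch of that idea on plain JavaScript objects; the record is invented for illustration, flatten is a hypothetical helper, and a record with no children simply yields no rows here, which may not match BigQuery's handling of empty repeated fields.

```javascript
// Hypothetical sketch: one output row per element of a repeated field.
function flatten(records, repeatedField) {
  const out = [];
  records.forEach(function (rec) {
    (rec[repeatedField] || []).forEach(function (item) {
      const row = Object.assign({}, rec);
      row[repeatedField] = item; // replace the array with a single element
      out.push(row);
    });
  });
  return out;
}

// Illustrative record only.
const people = [
  {
    fullName: 'Mike Jones',
    age: 35,
    children: [
      { name: 'Earl', age: 10 },
      { name: 'Sam', age: 6 },
    ],
  },
];
console.log(flatten(people, 'children'));
```

After flattening, a condition such as children.age > 3 can be checked row by row, which is what the WHERE clause in the query above relies on.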

Page 23: Workshop 20140522   BigQuery Implementation

Regular Expression

SELECT word, COUNT(word) AS count
FROM publicdata:samples.shakespeare
WHERE (REGEXP_MATCH(word,r'\w\w\'\w\w'))
GROUP BY word
ORDER BY count DESC
LIMIT 3;

+-------+-------+
| word  | count |
+-------+-------+
| ne'er | 42    |
| we'll | 35    |
| We'll | 33    |
+-------+-------+
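The pattern \w\w'\w\w matches two word characters, an apostrophe, and two more word characters anywhere in the string. A minimal sketch of the same filter-and-count in JavaScript on an invented word list; JavaScript's regex engine is not the one BigQuery uses, so treat this as an approximation that happens to agree for this simple pattern.

```javascript
// Matches two word chars, an apostrophe, then two word chars.
const pattern = /\w\w'\w\w/;

// Count matching words and sort by count, highest first
// (the GROUP BY word / ORDER BY count DESC part of the query).
function countMatches(words) {
  const counts = {};
  words.forEach(function (w) {
    if (pattern.test(w)) {
      counts[w] = (counts[w] || 0) + 1;
    }
  });
  return Object.keys(counts)
    .map(function (w) { return { word: w, count: counts[w] }; })
    .sort(function (a, b) { return b.count - a.count; });
}

// Illustrative input only.
console.log(countMatches(["ne'er", "we'll", "ne'er", 'never', "o'er"]));
```

In this input "never" has no apostrophe and "o'er" has only one character before it, so neither is counted.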

Page 24: Workshop 20140522   BigQuery Implementation

Time Function

SELECT TOP (FORMAT_UTC_USEC(timestamp * 1000000), 5) AS top_revision_time,
  COUNT (*) AS revision_count
FROM [publicdata:samples.wikipedia];

+----------------------------+----------------+
| top_revision_time          | revision_count |
+----------------------------+----------------+
| 2002-02-25 15:51:15.000000 | 20971          |
| 2002-02-25 15:43:11.000000 | 15955          |
| 2010-01-14 15:52:34.000000 | 3              |
| 2009-12-31 19:29:19.000000 | 3              |
| 2009-12-28 18:55:12.000000 | 3              |
+----------------------------+----------------+
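The query multiplies timestamp by 1000000 before formatting, which suggests the column holds seconds since the Unix epoch while FORMAT_UTC_USEC works on microseconds. A minimal sketch of that conversion and formatting; formatUtcUsec is a hypothetical stand-in written for illustration, and it assumes a non-negative input.

```javascript
// Hypothetical stand-in for FORMAT_UTC_USEC: microseconds since the
// Unix epoch -> "YYYY-MM-DD HH:MM:SS.uuuuuu" in UTC (non-negative input only).
function formatUtcUsec(usec) {
  const ms = Math.floor(usec / 1000);
  const micros = usec - Math.floor(usec / 1000000) * 1000000;
  // toISOString() gives "YYYY-MM-DDTHH:MM:SS.mmmZ"; keep the date and time parts.
  const iso = new Date(ms).toISOString();
  const datePart = iso.slice(0, 10);
  const timePart = iso.slice(11, 19);
  return datePart + ' ' + timePart + '.' + String(micros).padStart(6, '0');
}

// Seconds since the epoch for 2002-02-25 15:51:15 UTC (month is 0-based).
const seconds = Date.UTC(2002, 1, 25, 15, 51, 15) / 1000;
console.log(formatUtcUsec(seconds * 1000000));
```

The fractional part is always six digits, which is why the results above end in .000000 for whole-second timestamps.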

Page 25: Workshop 20140522   BigQuery Implementation

URL Function

SELECT DOMAIN(repository_homepage) AS user_domain, COUNT(*) AS activity_count
FROM [publicdata:samples.github_timeline]
GROUP BY user_domain
HAVING user_domain IS NOT NULL AND user_domain != ''
ORDER BY activity_count DESC
LIMIT 5;

+-----------------+----------------+
| user_domain     | activity_count |
+-----------------+----------------+
| github.com      | 281879         |
| google.com      | 34769          |
| khanacademy.org | 17316          |
| sourceforge.net | 15103          |
| mozilla.org     | 14091          |
+-----------------+----------------+

Page 26: Workshop 20140522   BigQuery Implementation

Hands-on - Programming

Page 27: Workshop 20140522   BigQuery Implementation

Prepare

● Prepare a Google Cloud Platform project
● Create a Service Account
● Generate key from Service Account p12 key

Page 28: Workshop 20140522   BigQuery Implementation

Google Service Account

Web server application vs. service account

Page 29: Workshop 20140522   BigQuery Implementation

Prepare Authentications

Convert the p12 key to a pem key:

$ openssl pkcs12 -in privatekey.p12 -out privatekey.pem -nocerts
$ openssl rsa -in privatekey.pem -out key.pem

Page 30: Workshop 20140522   BigQuery Implementation

Node.js - bigquery module

var bq = require('bigquery')
  , prjId = 'your-bigquery-project-id';

bq.init({
  client_secret: '/path/to/client_secret.json',
  key_pem: '/path/to/key.pem'
});

bq.job.listds(prjId, function(e,r,d){
  if(e) console.log(e);
  console.log(JSON.stringify(d));
});

To perform operations, call the functions under job through bq.

For the bigquery module, see: https://github.com/peihsinsu/bigquery

Page 31: Workshop 20140522   BigQuery Implementation

Google Drive way - Apps Script

/* Ref: https://developers.google.com/apps-script/advanced/bigquery */
var request = {
  query: 'SELECT TOP(word, 30) AS word, COUNT(*) AS word_count ' +
         'FROM publicdata:samples.shakespeare WHERE LENGTH(word) > 10;'
};
var queryResults = BigQuery.Jobs.query(request, projectId);
var jobId = queryResults.jobReference.jobId;
queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
var rows = queryResults.rows;
while (queryResults.pageToken) {
  queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
    pageToken: queryResults.pageToken
  });
  rows = rows.concat(queryResults.rows);
}
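The Apps Script loop keeps requesting result pages until no pageToken comes back. A minimal sketch of the same pattern with a stubbed page source in place of BigQuery.Jobs.getQueryResults, so it can run in plain Node.js; fetchPage, fetchAllRows, and the page data are all hypothetical.

```javascript
// Hypothetical stub standing in for a paged API call.
const pages = {
  first: { rows: [1, 2], pageToken: 'p2' },
  p2: { rows: [3, 4], pageToken: 'p3' },
  p3: { rows: [5] }, // last page: no pageToken
};
function fetchPage(token) {
  return pages[token || 'first'];
}

// Collect rows from every page, following pageToken until it is absent.
function fetchAllRows() {
  let result = fetchPage();
  let rows = result.rows || [];
  while (result.pageToken) {
    result = fetchPage(result.pageToken);
    rows = rows.concat(result.rows || []);
  }
  return rows;
}

console.log(fetchAllRows());
```

The `|| []` guards are a defensive choice in this sketch so that a page without a rows field does not break the concatenation.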

Page 33: Workshop 20140522   BigQuery Implementation

http://goo.gl/LD4RN4