big data with bigquery, presented at devoxxuk 2014 by javier ramirez from teowaki
DESCRIPTION
Big data is amazing. You can get insights from your users, find interesting patterns and have lots of geek fun. Problem is big data usually means many servers, a complex set up, intensive monitoring and a steep learning curve. All those things cost money. If you don’t have the money, you are losing all the fun. In my talk I show you how you can use Google BigQuery to manage big data from your application using a hosted solution. And you can start with less than $1 per month.TRANSCRIPT
![Page 1: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/1.jpg)
@supercoco9#devoxxBigQuery
Big Data with Google BigQuery
Javier Ramirez@supercoco9https://teowaki.com
![Page 2: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/2.jpg)
@supercoco9#DevoxxBigquery
Managing Big Data with BigQuery
Javier Ramirez
•Writing software since 1996
•Web dev. since 1999 (C++, JAVA, PHP, Ruby, JS...)
•Founder of https://teowaki.com
•Google Developer Expert on the Cloud Platform
![Page 3: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/3.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
DAT
A
DAT
A
![Page 4: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/4.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
SERVER
S
SERVER
S
![Page 5: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/5.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
DEV
OPS
DEV
OPS
![Page 6: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/6.jpg)
@YourTwitterHandle@supercoco9#DevoxxBigquery
BIG
BIG
MONEY
MONEY
![Page 7: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/7.jpg)
bigdata is cool but...
hard to set up and monitor
expensive cluster
not interactive enough
![Page 8: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/8.jpg)
@supercoco9#DevoxxBigquery
bigdata is doing a fullscan to 330MM rows, matching them against a regexp, and getting the result (223MM rows) in just 5 seconds
![Page 9: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/9.jpg)
Google BigQuery
Data analysis as a service
http://developers.google.com/bigquery
![Page 10: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/10.jpg)
Based on “Dremel”
Specifically designed for interactive queries over
petabytes of real-time data
![Page 11: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/11.jpg)
@supercoco9#DevoxxBigquery
Your only worries
•Load data
•Query the dataset
![Page 12: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/12.jpg)
loading data.
You just send the data in
text (or JSON) format
![Page 13: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/13.jpg)
up to 100K inserts per second
in stream mode
![Page 14: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/14.jpg)
It's just SQL
select name from USERS order by date;
select count(*) from users;
select max(date) from USERS;
select sum(total) from ORDERS group by user;
![Page 15: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/15.jpg)
@supercoco9#DevoxxBigquery
Subselect and joins out of the box
SELECT Year, Actor1Name, Actor2Name, Count FROM (SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count, RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rankFROM
(SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode), (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode),
WHERE Actor1Name IS NOT nullAND Actor2Name IS NOT nullGROUP EACH BY 1, 2, 3HAVING Count > 100)
WHERE rank=1ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
![Page 16: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/16.jpg)
@supercoco9#DevoxxBigquery
specific extensions for analytics
withinflattennest
stddev
topfirstlastnth
variance
var_popvar_samp
covar_popcovar_samp
quantiles
correlations
![Page 17: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/17.jpg)
Things you always wanted to try but were too scared to
select count(*) from publicdata:samples.wikipedia where REGEXP_MATCH(title, "[0-9]*") AND wp_namespace = 0;
223,163,387 Query complete (5.6s elapsed, 9.13 GB processed, Cost: 32¢)
![Page 18: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/18.jpg)
columnar storage
https://cookbook.experiencesaphana.com/crm/what-is-crm-on-hana/technology-innovation/row-vs-column-based/
![Page 19: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/19.jpg)
highly distributed execution using a tree
![Page 20: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/20.jpg)
web console screenshot
![Page 21: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/21.jpg)
@supercoco9#DevoxxBigquery
country segmented traffic
![Page 22: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/22.jpg)
@supercoco9#DevoxxBigqueryjavier ramirez @supercoco9 https://teowaki.com
window functions
![Page 23: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/23.jpg)
@supercoco9#DevoxxBigquery
our most active user
![Page 24: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/24.jpg)
@supercoco9#DevoxxBigquery
Worldwide events in the last 36 years
SELECT Year, Actor1Name, Actor2Name, Count FROM (SELECT Actor1Name, Actor2Name, Year, COUNT(*) Count, RANK() OVER(PARTITION BY YEAR ORDER BY Count DESC) rankFROM
(SELECT Actor1Name, Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name < Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode), (SELECT Actor2Name Actor1Name, Actor1Name Actor2Name, Year FROM [gdelt-bq:full.events] WHERE Actor1Name > Actor2Name and Actor1CountryCode != '' and Actor2CountryCode != '' and Actor1CountryCode!=Actor2CountryCode),
WHERE Actor1Name IS NOT nullAND Actor2Name IS NOT nullGROUP EACH BY 1, 2, 3HAVING Count > 100)
WHERE rank=1ORDER BY Year
http://gdeltproject.org/data.html#googlebigquery
![Page 25: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/25.jpg)
![Page 26: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/26.jpg)
![Page 27: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/27.jpg)
SELECT repository_name, repository_language, repository_description, COUNT(repository_name) as cnt,repository_urlFROM github.timelineWHERE type="WatchEvent"AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC("#{yesterday} 20:00:00")AND repository_url IN (
SELECT repository_urlFROM github.timelineWHERE type="CreateEvent"AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('#{yesterday}
20:00:00')AND repository_fork = "false"AND payload_ref_type = "repository"GROUP BY repository_url
)GROUP BY repository_name, repository_language, repository_description, repository_urlHAVING cnt >= 5ORDER BY cnt DESCLIMIT 25
![Page 28: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/28.jpg)
@supercoco9#DevoxxBigquery
![Page 29: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/29.jpg)
![Page 30: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/30.jpg)
![Page 31: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/31.jpg)
@supercoco9#DevoxxBigquery
Automation with Apps Script
●Read from BigQuery
●Create a spreadsheet on Drive
●E-mail it everyday as a PDF
https://developers.google.com/apps-script/
![Page 32: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/32.jpg)
![Page 33: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/33.jpg)
![Page 34: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/34.jpg)
![Page 35: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/35.jpg)
@supercoco9#DevoxxBigquery
bigquery pricing
$26 per stored TB1000000 rows => $0.00416 / month
£0.00243 / month
$5 per processed TB1 full scan = 160 MB
1 count = 0 MB1 full scan over 1 column = 5.4 MB100 GB => $0.05 / month £0.03
AppsScripts is for free
![Page 36: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/36.jpg)
@supercoco9#DevoxxBigquery
£0.054307 / month*
per 1MM rows
*the 1st 1TB every month is free of charge**assumming your rows have web server logs-like info
price per month
![Page 37: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/37.jpg)
@supercoco9#DevoxxBigquery
ig
![Page 38: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/38.jpg)
@YourTwitterHandle#DVXFR14{session hashtag} @supercoco9#devoxxBigquery
THAN
KS!
Javier Ramirez@supercoco9https://teowaki.com
Related links at:
https://teowaki.com/teams/javier-community/link-categories/bigquery-talk
![Page 39: Big Data with BigQuery, presented at DevoxxUK 2014 by Javier Ramirez from teowaki](https://reader033.vdocuments.mx/reader033/viewer/2022060108/554f5a5cb4c905b9508b5344/html5/thumbnails/39.jpg)
@supercoco9#DevoxxBigquery
Thanks / Creative Commons
•Presentation Template — Guillaume LaForge
•The Queen — A prestigious heritage with some inspiration from The Sex Pistols and funny Devoxxians
•Girl with a Balloon — Banksy
•Tube — Michael Keen