google bigquery
TRANSCRIPT
![Page 1: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/1.jpg)
Google BigQuery
![Page 2: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/2.jpg)
About myselfMatthias Feys
work @Datatonic: - big data (with Google Cloud) - machine learning - data visualizations (Tableau/Spotfire)
Google Qualified Cloud Developer
contact:
- @FsMatt- [email protected]
![Page 3: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/3.jpg)
About DatatonicDatatonic is a team of data science experts that help corporations unleash the power of data. They use Google Cloud Platform, data visualisation technologies (like Tableau or Spotfire) and machine learning to build breakthrough solutions. Either as expert advisors, or built as a fully managed solution with support end-to-end (A3S).
Some references:
@teamdatatonic
![Page 4: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/4.jpg)
● What is BigQuery?● How does it scale/work?● How does it compare to:
- NoSQL datastores- MapReduce
● Demo● Pricing Model● Best Practices
This talk
![Page 5: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/5.jpg)
What is BigQuery?
“BigQuery is a fully-managed and cloud-based interactive query service for massive datasets.”
It’s the externalization of Dremel, one of
Google’s core technologies
![Page 6: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/6.jpg)
What is BigQuery? (2)BigQuery Service is available via:
● Web UI (bigquery.cloud.google.com)● console (gcloud)● API (+ client libraries)● external tools (Tableau, Excel, …)● ODBC connector
![Page 7: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/7.jpg)
How Does it Scale?
![Page 8: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/8.jpg)
Dremel Architecture
Data Model/Storage:- Columnar Storage- Nested/Repeated Fields- No Index! -> Single Full Table Scan (from disk)
Query Execution:- Tree Architecture- Using tens of thousands machines over fast Google network (+1Petabit/s)
![Page 9: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/9.jpg)
Columnar Storage
● Traffic minimization:○ only read selected
columns
● Higher Compression Ratio:○ Similar values in the
same column○ From 1:3 → 1:10
![Page 10: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/10.jpg)
Tree Architecture- root server: ->receives query + reads table metadata ->rewrites the query(s) ->sends queries to the next level <-returns final query results
- intermediate servers: ->(similar steps) <-parallel partial aggregation
- leaf servers: ->actually scan (parts) of the table <-send data to intermediate servers
![Page 11: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/11.jpg)
NoSQL Datastore vs. BigQuery?
NoSQL Datastore
● Index based (expected queries)
● Read-write
BigQuery
● Non-index based (ad hoc queries)
● Read-only (append-only)
![Page 12: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/12.jpg)
MapReduce vs. BigQuery?
MapReduce
● High latency● Flexible (complex)
batch processing● Unstructured data
BigQuery
● Low latency ● SQL-like queries
● Structured data
![Page 13: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/13.jpg)
Demo’s
![Page 15: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/15.jpg)
Pricing Model
Category Price Note
Storage Cost $0.020 per GB, per month
Query Cost $5 per TB 1st TB per month is free
![Page 16: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/16.jpg)
Best Practices
![Page 17: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/17.jpg)
Denormalize / Pre-Join Where Possible● Best performance● Only pay for the columns you need● Nested/repeated fields!
Relational Database Design Denormalized Nested/Repeated (JSON)
![Page 18: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/18.jpg)
Table Sharding● You pay for what you read
→ Read less, pay less● Table wildcards allow for easy reading over multiple tables
https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
![Page 19: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/19.jpg)
Optimize for Query vs. Storage CostsCommon Queries?
- Materialized views(save intermediate results in tables)with pre-aggregated data:
→ faster + cheaper queries
- Store data in multiple tables:- table for daily data- table for weekly data- table for monthly data
![Page 20: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/20.jpg)
Narrow the Table ScansYou only pay for the columns you read
Don’t use “SELECT *” !!!
![Page 21: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/21.jpg)
Table DecoratorsOnly way to avoid doing full table scans!
Allows undeleting tables
options:
● snapshot decorator + range decorator● relative value + absolute values
https://cloud.google.com/bigquery/table-decorators
https://cloud.google.com/bigquery/table-decorators
![Page 22: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/22.jpg)
Query optimizationsQuery Plan
https://cloud.google.com/bigquery/query-plan-explanation
![Page 23: Google BigQuery](https://reader031.vdocuments.mx/reader031/viewer/2022021420/586fdcf21a28ab18428b66b7/html5/thumbnails/23.jpg)
Big Data Reference Architecture