![Page 1: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/1.jpg)
BigQuery Basics
![Page 2: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/2.jpg)
Who? Why?
Ido Green, Developer Advocate
plus.google.com/greenido
greenido.wordpress.com
![Page 3: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/3.jpg)
Topics we cover
● BigQuery Overview
● Typical Uses
● Project Hierarchy
  ○ Access Control and Security
  ○ Datasets and Tables
● Tools
● Demos
![Page 4: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/4.jpg)
How does BigQuery fit in the analytics landscape?
● MapReduce-based analysis can be slow for ad hoc queries
● Managing data centers and tuning software takes time and money
● Analytics tools should be services
![Page 5: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/5.jpg)
Why BigQuery?
● Generating big data reports requires expensive servers and skilled database administrators
● Interacting with big data has been expensive, slow and inefficient
● BigQuery changes all that
  ○ Reducing time and expense to query data
![Page 6: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/6.jpg)
What's BigQuery?
● Service for interactive analysis of massive datasets (TBs)
  ○ Query billions of rows: seconds to write, seconds to return
  ○ Uses a SQL-style query syntax
  ○ It's a service, accessed by a RESTful API
● Reliable and secure
  ○ Replicated across multiple sites
  ○ Secured through Access Control Lists
● Scalable
  ○ Store hundreds of terabytes
  ○ Pay only for what you use
● Fast (really)
  ○ Run ad hoc queries on multi-terabyte data sets in seconds
![Page 7: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/7.jpg)
Analyzing large amounts of data... at high speed
demobigquery.appspot.com
![Page 8: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/8.jpg)
Uses
![Page 9: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/9.jpg)
Typical Uses
Analyzing query results using a visualization library such as Google Charts Tools API
![Page 10: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/10.jpg)
Typical Uses
Another way to analyze query results: Google Spreadsheets
  ○ greenido.wordpress.com/2013/12/16/big-query-and-google-spreadsheet-intergration/
  ○ greenido.wordpress.com/2013/07/24/big-query-power-with-javascript/
![Page 11: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/11.jpg)
BigQuery Use Cases
● Log Analysis - Making sense of computer generated records
● Retailer - Using data to forecast product sales
● Ads Targeting - Targeting proper customer sections
● Sensor Data - Collect and visualize ambient data
● Data Mashup - Query terabytes of heterogeneous data
![Page 12: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/12.jpg)
Some Customer Case Studies
Uses BigQuery to hone ad targeting and gain insights into their business
Dashboards using BigQuery to analyze booking and inventory data
Use BigQuery to provide their customers ways to expand game engagement and find new channels for monetization
Used BigQuery, App Engine and the Visualization API to build a business intelligence solution
![Page 13: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/13.jpg)
BigQuery Basic Technical Details
![Page 14: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/14.jpg)
Project Hierarchy
● Project
  ○ All data in BigQuery belongs inside a project
  ○ Set of users, APIs, authentication, billing information and ACLs
● Dataset
  ○ Holds one or more tables
  ○ Lowest access control unit (to which ACLs are applied)
● Table
  ○ Row-column structure that contains actual data
● Job
  ○ Used to start potentially long-running queries
![Page 15: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/15.jpg)
Datasets and Tables
Table name is represented as follows:
● Current Project
  <dataset>.<table name>
● Different Project
  <project>:<dataset>.<table>
  e.g. publicdata:samples.wikipedia
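The naming rule above can be sketched in a few lines of Python. This is only an illustration: `TableRef` and `parse_table_ref` are hypothetical helpers, not part of any BigQuery library, and the sketch assumes a project name without an embedded colon.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    project: Optional[str]  # None means "the current project"
    dataset: str
    table: str


def parse_table_ref(name: str) -> TableRef:
    """Split '<project>:<dataset>.<table>' or '<dataset>.<table>' into parts."""
    project, sep, rest = name.partition(":")
    if not sep:
        # No colon: the name refers to a table in the current project.
        project, rest = None, name
    dataset, dot, table = rest.partition(".")
    if not dot or not dataset or not table:
        raise ValueError(f"expected <dataset>.<table>, got {rest!r}")
    return TableRef(project, dataset, table)


print(parse_table_ref("publicdata:samples.wikipedia"))
print(parse_table_ref("samples.wikipedia"))
```

The second call returns a reference whose `project` is `None`, which matches the "Current Project" form.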
![Page 16: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/16.jpg)
Schema Example
● Table schema for demographic data on name occurrences
  name:string,gender:string,count:integer
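To make the compact schema string concrete, here is a minimal Python sketch that expands it into a list of field descriptions. `schema_from_string` is a hypothetical helper written for this example, and the fallback to `string` for a column without a type is an assumption of the sketch.

```python
import json


def schema_from_string(spec: str) -> list:
    """Turn 'name:type,name:type' into a list of field dictionaries."""
    fields = []
    for pair in spec.split(","):
        name, _, type_ = pair.strip().partition(":")
        # Assumption: a column with no explicit type defaults to string.
        fields.append({"name": name, "type": type_ or "string"})
    return fields


schema = schema_from_string("name:string,gender:string,count:integer")
print(json.dumps(schema, indent=2))
```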
![Page 17: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/17.jpg)
Data Types
● String
  ○ UTF-8 encoded, <64kB
● Integer
  ○ 64 bit signed
● Float
● Boolean
  ○ "true" or "false", case insensitive
● Timestamp
  ○ String format
    ■ YYYY-MM-DD HH:MM:SS[.sssss] [+/-][HH:MM]
  ○ Numeric format (seconds from UNIX epoch)
    ■ 1234567890, 1.234567890123456E9
(*) Max row size: 64kB
Date type is supported as timestamp
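The two timestamp notations describe the same instant. The Python sketch below converts between them using only the standard library; the helper names are hypothetical, the string is treated as UTC (an assumption), and the optional fractional seconds and `[+/-][HH:MM]` offset are not handled.

```python
from datetime import datetime, timezone


def epoch_to_string(seconds: float) -> str:
    """Render seconds since the UNIX epoch as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def string_to_epoch(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS', treating it as UTC (an assumption here)."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


print(epoch_to_string(1234567890))             # 2009-02-13 23:31:30
print(string_to_epoch("2009-02-13 23:31:30"))  # 1234567890
```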
![Page 18: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/18.jpg)
Data Format
BigQuery supports the following formats for loading data:
1. Comma Separated Values (CSV)
2. JSON
  a. BigQuery can load JSON data faster than CSV if your data contains embedded newlines
  b. Supports nested/repeated data fields
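One way to see why embedded newlines are less of a problem in JSON: when each record is written as a single JSON object per line, a newline inside a value is escaped and never splits the record. A minimal Python sketch with made-up sample rows:

```python
import json


def to_newline_delimited_json(rows) -> str:
    """Serialize each row as one JSON object per line."""
    # json.dumps escapes a newline inside a string as the two characters
    # backslash and n, so every record stays on a single physical line.
    return "\n".join(json.dumps(row) for row in rows) + "\n"


# Hypothetical sample data; the first comment contains an embedded newline.
rows = [
    {"name": "Ada", "comment": "first line\nsecond line"},
    {"name": "Grace", "comment": "single line"},
]
payload = to_newline_delimited_json(rows)
print(payload)
```

Reading the payload back line by line with `json.loads` recovers the original values, including the embedded newline.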
![Page 19: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/19.jpg)
Repeated and Nested Fields
Loading data with repeated and nested fields is supported by the JSON data format only
[
  {
    "fields": [
      {
        "mode": "nullable",
        "name": "country",
        "type": "string"
      },
      {
        "mode": "nullable",
        "name": "city",
        "type": "string"
      }
    ],
    "mode": "repeated",
    "name": "location",
    "type": "record"
  },
  ...........
Schema example
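As an illustration of what a row for this schema might look like, the Python sketch below restates the `location` field and checks a sample record against it. The checker `matches_location_field` and the sample row are hypothetical; only the field definition comes from the schema shown above.

```python
import json

# The "location" field from the schema above: a repeated record
# with two nullable string sub-fields.
LOCATION_FIELD = {
    "name": "location",
    "type": "record",
    "mode": "repeated",
    "fields": [
        {"name": "country", "type": "string", "mode": "nullable"},
        {"name": "city", "type": "string", "mode": "nullable"},
    ],
}


def matches_location_field(row: dict) -> bool:
    """Check that row["location"] is a list of records with known sub-fields."""
    allowed = {f["name"] for f in LOCATION_FIELD["fields"]}
    values = row.get(LOCATION_FIELD["name"], [])
    if not isinstance(values, list):
        return False  # a repeated field is written as a JSON array
    return all(isinstance(v, dict) and set(v) <= allowed for v in values)


# Hypothetical row with two locations; a nullable sub-field may be omitted.
row = {"location": [{"country": "Ukraine", "city": "Kyiv"}, {"country": "Israel"}]}
print(matches_location_field(row))
print(json.dumps(row))
```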
![Page 20: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/20.jpg)
Accessing BigQuery
● BigQuery Web browser
  ○ Imports/exports data, runs queries
● bq command line tool
  ○ Performs operations from the command line
● Service API
  ○ RESTful API to access BigQuery programmatically
  ○ Requires authorization by OAuth2
  ○ Google client libraries for Python, Java, JavaScript, PHP, ...
![Page 21: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/21.jpg)
Third-party Tools
Visualization and Business Intelligence
ETL tools for loading data into BigQuery
![Page 22: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/22.jpg)
Example of Visualization Tools
Using commercial visualization tools to graph the query results
![Page 23: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/23.jpg)
Loading Data Using the Web Browser
● Upload from local disk or from Cloud Storage
● Start the Web browser
● Select Dataset
● Create table and follow the wizard steps
![Page 24: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/24.jpg)
Loading Data Using bq Tool
"bq load" command syntax
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV] destination_table data_source_uri table_schema
● If not specified, the default file format is CSV (comma separated values)
● The files can also use newline delimited JSON format
● Schema
  ○ Either a filename or a comma-separated list of column_name:datatype pairs that describe the file format.
● Data source may be on local machine or on Cloud Storage
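For scripted loads, the command line can be assembled in Python before it is run. This sketch only builds the argument list following the syntax on the slide; `build_bq_load_command` is a hypothetical helper, and the dataset, table, and bucket names are placeholders.

```python
import shlex


def build_bq_load_command(destination_table, data_source_uri, table_schema,
                          source_format="CSV"):
    """Return the argument list for a `bq load` call without running it."""
    if source_format not in ("CSV", "NEWLINE_DELIMITED_JSON"):
        raise ValueError(f"unsupported source format: {source_format}")
    return [
        "bq", "load",
        f"--source_format={source_format}",
        destination_table,
        data_source_uri,
        table_schema,
    ]


# Hypothetical dataset, table, and bucket names.
command = build_bq_load_command(
    "mydataset.names",
    "gs://my-bucket/names.csv",
    "name:string,gender:string,count:integer",
)
print(shlex.join(command))
```

With the bq tool installed and authorized, the list could be passed to `subprocess.run(command, check=True)`.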
![Page 25: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/25.jpg)
Load Limitations
● 1,000 import jobs per table per day
● 10,000 import jobs per project per day
● File size (for both CSV and JSON)
  ○ 1GB for compressed file
  ○ 1TB for uncompressed
    ■ 4GB for uncompressed CSV with newlines in strings
● 10,000 files per import job
● 1TB per import job
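A pre-flight check against these limits might look like the Python sketch below. The numbers are the ones listed on this slide and may not match current quotas. The helper `check_load_job` is hypothetical, sizes are assumed to be binary units, and the 4GB rule for CSV with newlines in strings is left out for brevity.

```python
GB = 1024 ** 3
TB = 1024 ** 4

# Limits as listed on the slide (March 2014); current quotas may differ.
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 1 * TB
MAX_COMPRESSED_FILE = 1 * GB
MAX_UNCOMPRESSED_FILE = 1 * TB


def check_load_job(files):
    """files: list of (size_in_bytes, is_compressed). Returns a list of problems."""
    problems = []
    if len(files) > MAX_FILES_PER_JOB:
        problems.append("too many files in one job")
    if sum(size for size, _ in files) > MAX_BYTES_PER_JOB:
        problems.append("job exceeds 1TB in total")
    for index, (size, compressed) in enumerate(files):
        limit = MAX_COMPRESSED_FILE if compressed else MAX_UNCOMPRESSED_FILE
        if size > limit:
            problems.append(f"file {index} exceeds its per-file limit")
    return problems


print(check_load_job([(2 * GB, True), (500 * GB, False)]))
```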
![Page 26: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/26.jpg)
Best Practices
CSV/JSON must be split into chunks less than 1TB
● "split" command with --line-bytes option
● Split to smaller files
  ○ Easier error recovery
  ○ To smaller data unit (day, month instead of year)
● Uploading to Cloud Storage is recommended
Cloud Storage → BigQuery
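The idea behind splitting on line boundaries can be sketched in Python: whole lines are grouped into chunks so that no record is cut in half. This is a simplified illustration, not a reimplementation of the `split` command. `split_by_line_bytes` is a hypothetical helper, and it rejects a line that is longer than the limit.

```python
def split_by_line_bytes(lines, max_bytes):
    """Group whole lines into chunks whose encoded size stays within max_bytes."""
    chunks, current, current_size = [], [], 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if size > max_bytes:
            raise ValueError("a single line is larger than max_bytes")
        if current and current_size + size > max_bytes:
            # The next line would overflow the chunk, so start a new one.
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


lines = ["a,1\n", "bb,2\n", "ccc,3\n", "d,4\n"]
for chunk in split_by_line_bytes(lines, max_bytes=10):
    print(chunk)
```

Smaller chunks also help with error recovery: a failed load only needs the affected chunk to be fixed and retried.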
![Page 27: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/27.jpg)
Best Practices
● Split Tables by Dates
  ○ Minimize cost of data scanned
  ○ Minimize query time
● Upload Multiple Files to Cloud Storage
  ○ Allows parallel upload into BigQuery
● Denormalize your data
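Splitting tables by date usually means one table per day, so a query names only the days it needs. A small Python sketch of that naming scheme follows. The `_YYYYMMDD` suffix, the dataset name, and the helper `daily_table_names` are assumptions made for this example.

```python
from datetime import date, timedelta


def daily_table_names(dataset, prefix, start, end):
    """List one table name per day, e.g. 'logs.events_20140301'."""
    days = (end - start).days + 1
    return [
        f"{dataset}.{prefix}_{(start + timedelta(days=offset)):%Y%m%d}"
        for offset in range(days)
    ]


# Hypothetical dataset and prefix; only the days of interest are listed,
# so a query over them scans three daily tables instead of a full year.
tables = daily_table_names("logs", "events", date(2014, 3, 1), date(2014, 3, 3))
print(tables)
```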
![Page 28: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/28.jpg)
Google I/O Data Sensing
● Start the BigQuery Web browser
● Click on Display Project in the project chooser dialog window
● Enter data-sensing-lab when prompted
● In the dataset data-sensing-lab:io_sensor_data, select the table moscone_io13
● In the New Query box, enter the following query:
  SELECT * FROM [data-sensing-lab:io_sensor_data.moscone_io13] LIMIT 10
● Click Run Query button
● Scroll to see relevant results
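The same preview query can be produced from a script. The Python sketch below builds the query text with the bracketed table syntax used in the demo and wraps it as an argument list for the bq command line tool. `preview_query` is a hypothetical helper and nothing is executed here.

```python
def preview_query(table, limit=10):
    """Build a 'first N rows' query using the bracketed table syntax from the slide."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM [{table}] LIMIT {limit}"


query = preview_query("data-sensing-lab:io_sensor_data.moscone_io13")
print(query)

# The same query as an argument list for the bq command line tool.
command = ["bq", "query", query]
print(command)
```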
![Page 29: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/29.jpg)
Data Structure
● Define table schema when creating table
● Data is stored in per-column structure
● Each column is handled separately and only combined when necessary
Advantage of this data structure:
● No need to set index in advance
● Load only the relevant columns
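A toy Python model can show why per-column storage lets a query touch only the relevant columns. It illustrates the idea only and says nothing about how BigQuery implements it. The rows and the helper `to_columns` are made up for the example.

```python
# Hypothetical records, first stored row by row.
rows = [
    {"name": "Mary", "gender": "F", "count": 10},
    {"name": "Anna", "gender": "F", "count": 20},
    {"name": "John", "gender": "M", "count": 30},
]


def to_columns(rows):
    """Pivot a list of row dictionaries into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


columns = to_columns(rows)

# SELECT SUM(count): a column store reads one list and never touches
# the "name" or "gender" values.
total = sum(columns["count"])
print(total)  # 60
```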
![Page 30: Big Query - Women Techmarkers (Ukraine - March 2014)](https://reader033.vdocuments.mx/reader033/viewer/2022042521/5553b5e4b4c905d9448b4d35/html5/thumbnails/30.jpg)
Questions?
Thank you!