intro to new google cloud technologies: google storage, prediction api, bigquery

44

Upload: chris-schalk

Post on 21-Jan-2015

4.098 views

Category:

Technology


4 download

DESCRIPTION

This is an introductory presentation given at DevFest Madrid 2010 by Google Developer Advocate Chris Schalk. It introduces new Google cloud technologies: Google Storage, Google Prediction API and BigQuery.

TRANSCRIPT

Page 1: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery
Page 2: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Introducción a las nuevas tecnologías de plataforma cloud de Google

APIs de Almacenamiento, Predicción y BigQuery

Chris SchalkSept 23rd, 2010

Page 3: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Agenda

Google Storage for DevelopersPrediction API & BigQuery

Page 4: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage for DevelopersStore your data in Google's cloud

Page 5: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

What Is Google Storage?

Store your data in Google's cloudany format, any amount, any time

You control access to your dataprivate, shared, or public

Access via Google APIs or 3rd party tools/libraries

Page 6: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Sample Use Cases

Static content hostinge.g. static html, images, music, video Backup and recoverye.g. personal data, business records Sharinge.g. share data with your customers Data storage for applicationse.g. used as storage backend for Android, AppEngine, Cloud based apps Storage for Computatione.g. BigQuery, Prediction API

Page 7: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage Benefits

High Performance and Scalability Backed by Google infrastructure

Strong Security and Privacy Control access to your data

Easy to UseGet started fast with Google & 3rd party tools

Page 8: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage Technical Details

RESTful API Verbs: GET, PUT, POST, HEAD, DELETE Resources: identified by URICompatible with S3

Buckets Flat containers

Objects

Any typeSize: 100 GB / object

Access Control for Google Accounts For individuals and groups

Two Ways to Authenticate Requests Sign request using access keys Web browser login

Page 9: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Performance and Scalability

Objects of any type and 100 GB / ObjectUnlimited numbers of objects, 1000s of buckets

All data replicated to multiple US data centersLeveraging Google's worldwide network for data delivery

Only you can use bucket names with your domain names Read-your-writes data consistencyRange Get

Page 10: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Security and Privacy Features

Key-based authenticationAuthenticated downloads from a web browser

Sharing with individualsGroup sharing via Google Groups

Access control for buckets and objectsSet Read/Write/List permissions

Page 11: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Demo

Tools:GS ManagerGSUtil

Upload / Download

Page 12: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage usage within Google

Haiti Relief Imagery USPTO data

Partner Reporting

Google BigQuery

Google Prediction API

Partner Reporting

Page 13: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Some Early Google Storage Adopters

Page 14: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage - Pricing

Storage$0.17/GB/Month

NetworkUpload - $0.10/GBDownload

$0.15/GB Americas / EMEA$0.30/GB APAC

RequestsPUT, POST, LIST - $0.01 / 1000 RequestsGET, HEAD - $0.01 / 10000 Requests

Page 15: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage - Availability

Limited preview in US currently 100GB free storage and network from Google per accountSign up for waitlist at http://code.google.com/apis/storage/

Note: Non US preview available on case-by-case basis

Page 16: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Storage Summary

Store any kind of data using Google's cloud infrastructureEasy to Use APIs

Many available tools and libraries

gsutil, GS Manager3rd party:

Boto, CloudBerry, CyberDuck, JetS3t, and more

Page 17: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google Prediction APIGoogle's prediction engine in the cloud

Page 18: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Introducing the Google Prediction API

Google's sophisticated machine learning technologyAvailable as an on-demand RESTful HTTP web service

Page 19: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

How does it work?

"english" The quick brown fox jumped over the lazy dog.

"english" To err is human, but to really foul things up you need a computer.

"spanish" No hay mal que por bien no venga.

"spanish" La tercera es la vencida.

? To be or not to be, that is the question.

? La fe mueve montañas.

The Prediction APIfinds relevantfeatures in the sample data during training.

The Prediction APIlater searches forthose featuresduring prediction.

Page 20: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

A virtually endless number of applications...

CustomerSentiment

TransactionRisk

SpeciesIdentification

MessageRouting

Legal DocketClassification

SuspiciousActivity

Work RosterAssignment

RecommendProducts

PoliticalBias

UpliftMarketing

EmailFiltering

Diagnostics

InappropriateContent

CareerCounselling

ChurnPrediction

... and many more ...

Page 21: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

A Prediction API ExampleAutomatically categorize and respond to emails by language

Customer: ACME Corp, a multinational organizationGoal: Respond to customer emails in their languageData: Many emails, tagged with their languages�Outcome: Predict language and respond accordingly

Page 22: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Using the Prediction API

1. Upload

2. Train

Upload your training data toGoogle Storage

Build a model from your data

Make new predictions3. Predict

A simple three step process...

Page 23: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Step 1: UploadUpload your training data to Google Storage

Training data: outputs and input features Data format: comma separated value format (CSV)

"english","To err is human, but to really ...""spanish","No hay mal que por bien no venga."...Upload to Google Storagegsutil cp ${data} gs://yourbucket/${data}

Page 24: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Step 2: TrainCreate a new model by training on data

To train a model:

POST prediction/v1.1/training?data=mybucket%2Fmydata

Training runs asynchronously. To see if it has finished:

GET prediction/v1.1/training/mybucket%2Fmydata

{"data":{ "data":"mybucket/mydata", "modelinfo":"estimated accuracy: 0.xx"}}}

Page 25: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Step 3: PredictApply the trained model to make predictions on new data

POST prediction/v1.1/query/mybucket%2Fmydata/predict

{ "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}

Page 26: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Step 3: PredictApply the trained model to make predictions on new data

POST prediction/v1.1/query/mybucket%2Fmydata/predict

{ "data":{ "input": { "text" : [ "J'aime X! C'est le meilleur" ]}}}

{ data : { "kind" : "prediction#output", "outputLabel":"French", "outputMulti" :[ {"label":"French", "score": x.xx} {"label":"English", "score": x.xx} {"label":"Spanish", "score": x.xx}]}}

Page 27: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Step 3: PredictApply the trained model to make predictions on new data

import httplib

header = {"Content-Type" : "application/json"}#...put new data in JSON format in params variableconn = httplib.HTTPConnection("www.googleapis.com")conn.request("POST", "/prediction/v1.1/query/mybucket%2Fmydata/predict", params, header)print conn.getresponse()

Page 28: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Prediction API Capabilities

DataInput Features: numeric or unstructured textOutput: up to hundreds of discrete categories

TrainingMany machine learning techniquesAutomatically selected Performed asynchronously

Access from many platforms:Web app from Google App EngineApps Script (e.g. from Google Spreadsheet)Desktop app

Page 29: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Prediction API v1.1 - new features

Updated SyntaxMulti-category prediction

Tag entry with multiple labelsContinuous Output

Finer grained prediction rankings based on multiple labels Mixed Inputs

Both numeric and text inputs are now supported

Can combine continuous output with mixed inputs

Page 30: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Google BigQueryInteractive analysis of large datasets in Google's cloud

Page 31: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Introducing Google BigQuery

Google's large data adhoc analysis technologyAnalyze massive amounts of data in seconds

Simple SQL-like query language Flexible access

REST APIs, JSON-RPC, Google Apps Script

Page 32: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Why BigQuery?

Working with large data is a challenge

Page 33: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Many Use Cases ...

Spam TrendsDetection

Web Dashboards

Network Optimization

Interactive Tools

Page 34: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Key Capabilities of BigQuery

Scalable: Billions of rowsFast: Response in seconds

Simple: Queries in SQL

Web ServiceRESTJSON-RPCGoogle App Scripts

Page 35: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Using BigQuery

1. Upload

2. Import

Upload your raw data toGoogle Storage

Import raw data into BigQuery table

Perform SQL queries on table

3. Query

Another simple three step process...

Page 36: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Writing Queries

Compact subset of SQLSELECT ... FROM ...WHERE ... GROUP BY ... ORDER BY ...LIMIT ...;

Common functionsMath, String, Time, ...

Statistical approximationsTOPCOUNT DISTINCT

Page 37: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

BigQuery via REST

GET /bigquery/v1/tables/{table name}

GET /bigquery/v1/query?q={query}

Sample JSON Reply:{ "results": { "fields": { [ {"id":"COUNT(*)","type":"uint64"}, ... ] }, "rows": [ {"f":[{"v":"2949"}, ...]}, {"f":[{"v":"5387"}, ...]}, ... ] }}

Also supports JSON-RPC

Page 38: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Security and Privacy

Standard Google AuthenticationClient LoginOAuthAuthSub

HTTPS supportprotects your credentialsprotects your data

Relies on Google Storage to manage access

Page 39: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Large Data Analysis Example

Wikimedia Revision history data from: http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-meta-history.xml.7z

Wikimedia Revision History

Page 40: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Using BigQuery ShellPython DB API 2.0 + B. Clapper's sqlcmdhttp://www.clapper.org/software/python/sqlcmd/

Page 41: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

BigQuery from a Spreadsheet

Page 42: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

BigQuery from a Spreadsheet

Page 43: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Recap

Google StorageHigh speed data storage on Google Cloud

Prediction APIGoogle's machine learning technology able to predict outcomes based on sample data

BigQueryInteractive analysis of very large data setsSimple SQL query language access

Page 44: Intro to new Google cloud technologies: Google Storage, Prediction API, BigQuery

Further info available at:

Google Storage for Developershttp://code.google.com/apis/storage

Prediction APIhttp://code.google.com/apis/prediction

BigQueryhttp://code.google.com/apis/bigquery