Full Stack Analytics on AWS
Ian Meyers, Sr Mgr, AWS Specialist Solution Architecture
June 28th, 2017
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.




Forces and Trends

Cost Optimization
• Licenses
• Hardware
• Data center and operations

Dark Data
• Prematurely discarding data

Agility
• Experimentation (data & tools)
• Democratised Access to Data
• Time-to-first-results
• Terminate failed experiments early

From BI to Data Science
• In-house data science
• From back office to product


Storage is the Gravity for Cloud Applications


Separation of Storage and Compute

Store all your data, forever, at every stage of its lifecycle.
Apply it using the appropriate technology.


Storage is Job #1


Foundations: Storage, Discovery and Lifecycle

Secure, governed, scalable, cheap

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog


AWS Storage Platforms

File: Amazon EFS
Block: Amazon EBS, Amazon EC2 Instance Store
Object: Amazon S3, Amazon Glacier
Data Transfer: AWS Direct Connect, AWS Snowball, ISV Connectors, Amazon Kinesis Firehose, S3 Transfer Acceleration, Storage Gateway


Object storage is foundational

Object: Amazon S3, Amazon Glacier
Services: EC2, Lambda, EMR, Data Pipeline, Kinesis, CloudFront, RDS, DynamoDB, Redshift, Elastic Transcoder
Categories shown: Compute, Analytics, Database, Content Delivery


S3 Data Lifecycle and Events

• Standard: active data
• Standard - Infrequent Access: infrequently accessed data
• Amazon Glacier: archive data

Events: Create, Delete
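The tiering on this slide can be expressed as an S3 lifecycle rule. The sketch below only builds the rule as a plain Python dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; it does not call AWS. The `logs/` prefix and the 30 and 90 day thresholds are assumptions chosen for illustration, not values from the talk.

```python
import json


def build_lifecycle_rule(prefix, ia_after_days=30, glacier_after_days=90,
                         expire_after_days=None):
    """Build one S3 lifecycle rule: Standard -> Standard-IA -> Glacier."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("Glacier transition must come after the Standard-IA transition")
    rule = {
        "ID": f"tier-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optional final step of the lifecycle: delete the object.
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


# Hypothetical prefix; thresholds are illustrative defaults.
lifecycle = {"Rules": [build_lifecycle_rule("logs/")]}
print(json.dumps(lifecycle, indent=2))
```

With boto3 installed and credentials configured, the resulting dict could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration` for a bucket you own.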


Augmenting storage with a data catalog


AWS Glue: Components

Data Catalog

Crawl, store, search metadata in different data stores

Populate in a Hive metastore compliant catalog

Job Execution

Fully managed orchestration & execution of ETL jobs

Server-less execution model – no need to pre-provision resources

Job Authoring

Author, edit, share ETL jobs using your favorite tools

Store, share, re-use ETL code/script with Git integration


Glue Data Catalog

Manage table metadata through a Hive metastore API or Hive SQL. Supported by tools such as Hive, Presto, Spark, etc.

We added a few extensions:
• Search metadata for data discovery
• Connection info – JDBC URLs, credentials
• Classification for identifying and parsing files
• Versioning of table metadata as schemas evolve and other metadata are updated

Populate using Hive DDL, bulk import, or automatically through crawlers.


Crawlers: Auto-Populate Data Catalog

Automatic schema inference:
• Built-in classifiers detect file type and extract schema: record structure and data types.
• Add your own or share with others in the Glue community - It's all Grok and Python.

Auto-detects Hive-style partitions, grouping similar files into one table.

Run crawlers on schedule to discover new data and schema changes.

Serverless – only pay when crawls run.
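To make "Hive-style partitions" concrete: object keys such as `sales/year=2017/month=06/part-0000.csv` carry partition columns as `name=value` path segments. The sketch below is an illustration of that naming convention only, not the crawler's actual algorithm; the keys and the function names are hypothetical.

```python
from collections import defaultdict


def split_partitions(key):
    """Split an object key into (table_path, partition_dict, file_name)."""
    *dirs, file_name = key.split("/")
    table_dirs, partitions = [], {}
    for segment in dirs:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            # A name=value directory is treated as a partition column.
            partitions[name] = value
        elif not partitions:
            # Directories before the first partition segment name the table.
            table_dirs.append(segment)
    return "/".join(table_dirs), partitions, file_name


def group_into_tables(keys):
    """Group keys that share a table path, collecting their partitions."""
    tables = defaultdict(list)
    for key in keys:
        table, partitions, _ = split_partitions(key)
        tables[table].append(partitions)
    return dict(tables)


keys = [
    "sales/year=2017/month=06/part-0000.csv",
    "sales/year=2017/month=07/part-0000.csv",
    "clicks/dt=2017-06-28/events.json",
]
tables = group_into_tables(keys)
print(tables)
```

Both `sales` files end up under one table with two partitions, which is the grouping behaviour the slide describes.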


Crawlers: Automatic Schema Inference

[Diagram: enumerate S3 objects (file 1, file 2, … file N) → identify file type and parse files → semi-structured per-file schema → semi-structured unified schema. Schemas are drawn as trees of int, char, bool, array and struct nodes.]

Custom classifiers: app log parser, metrics parser
System classifiers: JSON parser, CSV parser, Apache log parser
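The step from per-file schemas to a unified schema can be sketched in a few lines. This is a simplified illustration for JSON input, not Glue's implementation: the type names (`int`, `char`, `bool`, `array`, `struct`) are borrowed from the diagram, and falling back to `char` when two files disagree on a type is an assumption made here.

```python
import json


def infer_schema(value):
    """Describe the type of a parsed JSON value as a nested structure."""
    if isinstance(value, bool):          # check bool before int: bool is an int subclass
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "char"
    if isinstance(value, list):
        element = None
        for item in value:
            element = merge(element, infer_schema(item))
        return {"array": element}
    if isinstance(value, dict):
        return {"struct": {k: infer_schema(v) for k, v in value.items()}}
    return None                          # JSON null: type unknown


def merge(a, b):
    """Unify two schemas; None means 'no information yet'."""
    if a is None or a == b:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        if "struct" in a and "struct" in b:
            fields = dict(a["struct"])
            for name, schema in b["struct"].items():
                fields[name] = merge(fields.get(name), schema)
            return {"struct": fields}
        if "array" in a and "array" in b:
            return {"array": merge(a["array"], b["array"])}
    return "char"                        # assumption: conflicting types widen to string


file_1 = '{"id": 1, "tags": ["a"]}'
file_2 = '{"id": 2, "user": {"name": "x", "active": true}}'
unified = merge(infer_schema(json.loads(file_1)), infer_schema(json.loads(file_2)))
print(json.dumps(unified, indent=2))
```

The unified schema contains every field seen in either file, which is the "per-file schema" to "unified schema" step in the diagram.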


Security is Job #0


Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog


Data Access & Authorisation: Give your users easy and secure access

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


AWS implements security at the data level, not tool-by-tool

IAM: Service API Access
• Amazon S3
• Amazon ElastiCache
• Amazon DynamoDB
• Amazon EMR
• Amazon Kinesis
• Amazon Athena


+ storage level support for access logging and audit

• Amazon S3: Access Logging
• AWS CloudTrail: API Logging
• Access Log Analytics: Amazon Athena (http://amzn.to/2tSimHj), Amazon EMR (http://amzn.to/2si6RqS)
• IAM
• Third party ecosystem security tools


Additional S3 Security Practices

Use S3 Bucket policies:
• Restrict access by IP address
• Restrict deletes
• Enforce encryption use
and restrict deletes to require MFA authentication

Use Versioning!!!
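The bucket-policy practices listed here can be sketched as a single policy document. The example below builds the JSON in Python without calling AWS. The bucket name and CIDR range are hypothetical, and the choice of `AES256` as the required encryption header is an assumption; the condition keys (`aws:SourceIp`, `s3:x-amz-server-side-encryption`, `aws:MultiFactorAuthAge`) are standard IAM and S3 condition keys.

```python
import json


def build_bucket_policy(bucket, allowed_cidr):
    """Build a bucket policy with IP, encryption and MFA-delete restrictions."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Restrict access by IP address.
                "Sid": "DenyOutsideAllowedIpRange",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidr}},
            },
            {   # Enforce encryption use on upload.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "AES256"}
                },
            },
            {   # Restrict deletes to requests authenticated with MFA.
                "Sid": "DenyDeleteWithoutMfa",
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": objects_arn,
                "Condition": {"Null": {"aws:MultiFactorAuthAge": "true"}},
            },
        ],
    }


# Hypothetical bucket name and documentation-only CIDR range.
policy = build_bucket_policy("example-data-lake", "203.0.113.0/24")
print(json.dumps(policy, indent=2))
```

A broad IP-based deny applies to administrators as well, so a policy like this is worth testing on a non-production bucket first.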


Encryption Options

AWS Server-side encryption
• AWS managed key infrastructure

AWS Key Management Service
• Automated key rotation & auditing
• Integration with other AWS services

AWS CloudHSM
• Dedicated Tenancy SafeNet Luna SA HSM Device
• Common Criteria EAL4+, NIST FIPS 140-2


Extensible & hybrid crypto integration for AWS services

class myCrypt implements EncryptionMaterialsProvider

Amazon Redshift

On Premises HSM


Data Access & Authorisation: Give your users easy and secure access

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


Kinesis Firehose

Data Access & Authorisation: Give your users easy and secure access

Data Ingestion: Get your data into S3 quickly and securely

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


Data Ingestion into S3


S3 Transfer Acceleration

Uploader → AWS Edge Location → S3 Bucket: Optimized Throughput!

• Typically 50%-400% faster
• Change your endpoint, not your code
• No firewall exceptions or client software required
• 59 global edge locations
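"Change your endpoint, not your code" refers to addressing the bucket through its accelerate hostname instead of the regular one. A minimal sketch of that mapping follows; the function name and bucket name are hypothetical, and acceleration must already be enabled on the bucket for the endpoint to work.

```python
def accelerate_endpoint(bucket):
    """Return the Transfer Acceleration endpoint URL for a bucket."""
    if "." in bucket:
        # Accelerated buckets need DNS-compliant names without periods.
        raise ValueError("Transfer Acceleration requires a bucket name without periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


# Hypothetical bucket name.
endpoint = accelerate_endpoint("example-data-lake")
print(endpoint)
```

Clients then send the same S3 requests to this hostname; nothing else in the upload code has to change.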


How Fast is S3 Transfer Acceleration?

[Chart: time in hours for a 500 GB upload from these edge locations to a bucket in Singapore, comparing Public Internet with S3 Transfer Acceleration. Locations: Rio De Janeiro, Warsaw, New York, Atlanta, Madrid, Virginia, Melbourne, Paris, Los Angeles, Seattle, Tokyo, Singapore.]


Tip: Parallelizing PUTs with Multipart Uploads

• Increase aggregate throughput by parallelizing PUTs on high-bandwidth networks
• Move the bottleneck to the network, where it belongs
• Increase resiliency to network errors; fewer large restarts on error-prone networks
• Performed automatically by the aws-cli and 'TransferManager' modules
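As a rough illustration of what the tooling does when it parallelizes a multipart upload, the sketch below splits an object into parts and hands them to a thread pool. `upload_part` is a stand-in that only reports what it would send, so nothing is transferred; the 8 MiB default part size and the worker count are assumptions. The 5 MiB minimum part size and the 10,000 part limit are S3's documented multipart constraints.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size until the object fits within MAX_PARTS.
    while -(-total_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts, offset, number = [], 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


def upload_part(part):
    """Stand-in for a real UploadPart call; returns (part_number, bytes_sent)."""
    number, _offset, length = part
    return number, length


def parallel_upload(total_size, workers=4):
    """Upload all parts concurrently; results come back in part order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_part, plan_parts(total_size)))


results = parallel_upload(20 * 1024 * 1024)
print(results)
```

Because each part is an independent request, a network error costs one part's worth of retry rather than a restart of the whole object.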


Write Database Changes to S3 with DMS

Full Load:
<schema_name>/<table_name>/LOAD001.csv
<schema_name>/<table_name>/LOAD002.csv

Change Data Capture:
<schema_name>/<table_name>/<time-stamp>.csv
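A downstream job often needs to tell full-load files from change-data-capture files using only the key layout shown on this slide. The sketch below does that under two assumptions: full-load files are named `LOAD` followed by digits, and any other `.csv` under the table prefix is a CDC file. The example timestamp file name is hypothetical, since the slide does not give the timestamp format.

```python
import re

# Assumption: full-load files look like LOAD001.csv, LOAD002.csv, ...
LOAD_FILE = re.compile(r"^LOAD\d+\.csv$")


def classify_dms_key(key):
    """Return schema, table and load phase for a DMS-written S3 key."""
    parts = key.split("/")
    if len(parts) < 3:
        raise ValueError(f"expected <schema>/<table>/<file>, got {key!r}")
    schema, table, file_name = parts[-3:]
    phase = "full_load" if LOAD_FILE.match(file_name) else "cdc"
    return {"schema": schema, "table": table, "phase": phase}


full = classify_dms_key("hr/employees/LOAD001.csv")
change = classify_dms_key("hr/employees/20170628-101500000.csv")  # hypothetical name
print(full, change)
```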


Kinesis Firehose

Data Access & Authorisation: Give your users easy and secure access

Data Ingestion: Get your data into S3 quickly and securely

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


Services: Kinesis Firehose, Athena Query Service, Glue, Machine Learning (predictive analytics), Amazon AI

Data Access & Authorisation: Give your users easy and secure access

Data Ingestion: Get your data into S3 quickly and securely

Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


Glue: Managed ETL

• Serverless job execution
• Visual workflow, or directly edit PySpark transformations
• Monitoring, metrics and notifications


Glue: Managed ETL

• Combine with AWS Lambda and AWS Step Functions for complex data orchestrations

• Automatically maintain data catalog entries

• Track lineage of data over time


Analysing streaming data…


Amazon Kinesis Analytics

• Interact with streaming data in real time using SQL
• Build fully managed and elastic stream processing applications that process data for real-time visualizations and alarms


Amazon Kinesis Analytics – Simple SQL Interface

SELECT STREAM author, count(author) OVER ONE_MINUTE
FROM Tweets
WHERE text LIKE '%#AwsLondonSummit%'
WINDOW ONE_MINUTE AS (PARTITION BY author RANGE INTERVAL '1' MINUTE PRECEDING);
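For readers less familiar with windowed SQL, the query on this slide counts, for each incoming tweet that mentions the hashtag, how many matching tweets the same author posted in the preceding minute. The sketch below reproduces that logic in plain Python over an in-memory list; it is an approximation for illustration, not how Kinesis Analytics executes the query, and it assumes tweets arrive in timestamp order as `(seconds, author, text)` tuples.

```python
from collections import defaultdict, deque


def sliding_counts(tweets, hashtag="#AwsLondonSummit", window_seconds=60):
    """Yield (author, count) for each matching tweet over a sliding window."""
    recent = defaultdict(deque)           # author -> timestamps inside the window
    for timestamp, author, text in tweets:
        if hashtag not in text:           # WHERE text LIKE '%#AwsLondonSummit%'
            continue
        times = recent[author]            # PARTITION BY author
        times.append(timestamp)
        # RANGE INTERVAL '1' MINUTE PRECEDING: drop rows older than the window.
        while times[0] < timestamp - window_seconds:
            times.popleft()
        yield author, len(times)


# Hypothetical sample data: (seconds since start, author, text).
tweets = [
    (0, "ian", "Hello #AwsLondonSummit"),
    (20, "ian", "More from #AwsLondonSummit"),
    (30, "sam", "No hashtag here"),
    (90, "ian", "Back at #AwsLondonSummit"),
]
counts = list(sliding_counts(tweets))
print(counts)
```

The third result drops back to 1 because the first two tweets have left the one-minute window by then.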


Analysing streaming data… and when at rest


Amazon Athena

• No infrastructure or administration
• Zero spin-up time
• Transparent upgrades
• Query data in its raw format
  • AVRO, Text, CSV, JSON, weblogs, AWS service logs
  • Convert to an optimized form like ORC or Parquet for the best performance and lowest cost
• No loading of data, no ETL required
  • Stream data directly from Amazon S3, take advantage of Amazon S3 durability and availability
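"No loading of data" means that a table is only a schema declared over files already in S3. The sketch below generates Hive-style `CREATE EXTERNAL TABLE` DDL of the kind Athena accepts for delimited text. The table name, columns and bucket path are hypothetical, and the helper does no escaping, so it is only suitable for trusted inputs.

```python
def create_external_table_ddl(table, columns, location, delimiter=","):
    """Build CREATE EXTERNAL TABLE DDL for delimited text files in S3."""
    if not location.endswith("/"):
        location += "/"                   # the location should be a folder path
    cols = ",\n  ".join(f"`{name}` {type_}" for name, type_ in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table definition over CSV files.
columns = [("author", "string"), ("text", "string"), ("created_at", "timestamp")]
ddl = create_external_table_ddl("tweets", columns, "s3://example-bucket/tweets")
print(ddl)
```

Running the statement registers metadata only; the CSV files stay where they are and are read at query time.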


Simple Query editor with syntax highlighting and autocomplete

Data Catalog

Query History, Saved Queries, and Catalog Management


Using Amazon Athena with Amazon QuickSight

QuickSight allows you to connect to data from a wide variety of AWS, third-party, and on-premises sources, including Amazon Athena.

• Amazon RDS
• Amazon S3
• Amazon Redshift
• Amazon Athena


Building smarter applications


Add Machine Learning Capabilities

Amazon Machine Learning Service
• Batch and online predictions
• Train using data in S3, RDS and Redshift

Amazon EMR
• Comprehensive machine learning libraries (e.g. Spark MLlib, Anaconda)
• Provision analytics clusters in minutes, autoscale with data volume or query demand


Amazon AI Services

Amazon Polly – Lifelike Text-to-Speech
• 47 voices, 24 languages
• Low-latency, real time

Amazon Rekognition – Image Analysis
• Object and scene detection
• Facial analysis

Amazon Lex – Conversational Engine
• Speech and text recognition
• Enterprise connectors


Let’s hear from Polly..


Facial Analysis with Rekognition

• Demographic Data
• Facial Landmarks
• Sentiment Expressed
• Image Quality (Brightness: 25.84, Sharpness: 160)
• General Attributes


AWS Deep Learning AMI

• Up to ~40k CUDA cores
• Pre-configured CUDA drivers
• Jupyter notebook with Python2, Python3, Anaconda
• CloudFormation Template
• AWS Marketplace – one-click deploy


Scaling Distributed Experiments

• Inception v3 model
• Increasing machines from 1 to 47
• 2x faster than TensorFlow if using more than 10 machines


Example MXNet User | TuSimple | Autonomous Driving


Services: Kinesis Firehose, Athena Query Service, Glue, Machine Learning (predictive analytics), Amazon AI

Data Access & Authorisation: Give your users easy and secure access

Data Ingestion: Get your data into S3 quickly and securely

Processing & Analytics: Use of predictive and prescriptive analytics to gain better understanding

Storage & Catalog: Secure, cost-effective storage in Amazon S3. Robust metadata in AWS Catalog

Protect and Secure: Use entitlements to ensure data is secure and users’ identities are verified


Full Stack Analytics on AWS