Posted on 20-Jul-2020

Page 1: Consuming The DataLake

© 2020, Amazon Web Services, Inc. or its Affiliates.

Data Lake, Reporting, Analytics, Machine Learning

Consuming The DataLake

Page 2

Session’s Focus

[Diagram: data lake architecture, services grouped by layer]

• Data Ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service, AWS DataSync, AWS Transfer for SFTP, Amazon S3 Transfer Acceleration
• Central Storage (scalable, secure, cost-effective): Amazon S3
• Catalog & Search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
• Access & User Interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
• Analytics & Serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
• Manage & Secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch

Page 3

Anti-Pattern

[Diagram: Everything is loaded into a single RDBMS, which serves every Query]

Page 4

Also an Anti-Pattern

[Diagram: the RDBMS is swapped for a Data Lake, but Everything is still loaded into one store, which serves every Query]

Page 5

One tool to rule them all

Page 6

Where do I start?

• Understand your data
  • Data structure, access patterns & characteristics, temperature, cost, size
• Know your audience
  • Business users, data scientists, developers
• Select the right service

Page 7

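Understand your Data

[Chart: data store categories placed by data temperature (Hot data, Warm data, Cold data) against data structure (Low to High). Categories: In-memory, NoSQL, Search, Warehouse, Object, Archival. From hot to cold data, latency and data volume go from low to high, while request rate and cost/GB go from high to low.]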

Page 8

Understand your Data

[Chart: the same axes (data temperature, data structure, latency, data volume, request rate, cost/GB) with an AWS service mapped to each category: In-memory: Amazon ElastiCache; NoSQL: Amazon DynamoDB; Search: Amazon ES; Warehouse: Amazon Redshift; Object: Amazon S3; Archival: Amazon Glacier.]
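The temperature chart can be read as a first-pass lookup from data profile to candidate store. The Python sketch below encodes the slide's category-to-service pairs; the grouping of categories into hot, warm and cold (`TEMPERATURE_TO_CATEGORIES`) and the helper name `candidate_services` are assumptions made for illustration, not an official AWS decision rule.

```python
# Hypothetical sketch: the category -> service pairs come from the slide;
# the temperature grouping below is an assumed, approximate reading of the chart.
CATEGORY_TO_SERVICE = {
    "In-memory": "Amazon ElastiCache",
    "NoSQL": "Amazon DynamoDB",
    "Search": "Amazon ES",
    "Warehouse": "Amazon Redshift",
    "Object": "Amazon S3",
    "Archival": "Amazon Glacier",
}

# Assumption: hot data needs low latency and high request rates; cold data is
# large, rarely read and should be cheap per GB. Object storage spans warm and cold.
TEMPERATURE_TO_CATEGORIES = {
    "hot": ["In-memory", "NoSQL"],
    "warm": ["Search", "Warehouse", "Object"],
    "cold": ["Object", "Archival"],
}


def candidate_services(temperature: str) -> list[str]:
    """Return candidate AWS services for 'hot', 'warm' or 'cold' data."""
    key = temperature.strip().lower()
    if key not in TEMPERATURE_TO_CATEGORIES:
        raise ValueError(f"unknown data temperature: {temperature!r}")
    return [CATEGORY_TO_SERVICE[c] for c in TEMPERATURE_TO_CATEGORIES[key]]


if __name__ == "__main__":
    for temp in ("hot", "warm", "cold"):
        print(temp, "->", candidate_services(temp))
```

The lookup is only a starting point: data structure, request rate and cost per GB still have to be checked against the actual workload.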

Page 9

Who is your audience?

Page 10

Data Visualizer
  Priorities: Creating engaging visual and narrative journeys for analytical solutions
  Needs: Visualization, dashboards, reporting

Data Product Manager
  Priorities: Manages data as a product; ensures freshness and consistency of data; understands lineage and compliance needs; treats data scientists as customers
  Needs: Reports on data quality and errors

DevOps Engineer
  Priorities: Monitoring for reliability; quickly diagnosing deployment or availability issues
  Needs: Ad hoc querying, dashboards

Data Scientist
  Priorities: Makes sense of data, generates and communicates insights to improve or create business processes, and creates predictive ML models to support them
  Needs: Ad hoc querying, robust ML tools

Data Engineer
  Priorities: Builds scalable pipelines; transforms and loads data into structures, complete with metadata, that data scientists can readily consume
  Needs: Ad hoc querying, quick visualization

Business Sponsor
  Priorities: Vetting prioritization and ROI, funding projects, providing ongoing feedback
  Needs: Reporting, dashboards

Page 11

Enabling your Consumers

Dashboards – Reports – Ad-Hoc Analysis – Machine Learning

Page 12

Dashboards - Near Real-time

Visual representation of key metrics that change over time
• Data structure: Low
• Usage: Near real-time visualization
• Data temperature: Hot

Available Services:

AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Elasticsearch Service
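A near real-time dashboard pipeline often starts with a Lambda function consuming a Kinesis data stream. The Python sketch below shows the decoding step: in the Lambda event, each Kinesis record's payload arrives base64-encoded under `kinesis.data`. The JSON payload shape (`metric`, `value`) and the per-batch totals are assumptions; persisting the totals (for example to a DynamoDB table the dashboard reads) is left out.

```python
import base64
import json
from collections import defaultdict


def handler(event, context=None):
    """Sum metric values in one batch of Kinesis records (hypothetical payload schema).

    Assumes each record payload is JSON such as {"metric": "orders", "value": 3}.
    A real function would write the totals to a serving store; that step is omitted.
    """
    totals = defaultdict(float)
    for record in event.get("Records", []):
        # Kinesis record data is base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        totals[payload["metric"]] += float(payload["value"])
    return dict(totals)


if __name__ == "__main__":
    def make_record(obj):
        data = base64.b64encode(json.dumps(obj).encode("utf-8")).decode("ascii")
        return {"kinesis": {"data": data}}

    sample_event = {"Records": [
        make_record({"metric": "orders", "value": 2}),
        make_record({"metric": "orders", "value": 3}),
        make_record({"metric": "errors", "value": 1}),
    ]}
    print(handler(sample_event))  # {'orders': 5.0, 'errors': 1.0}
```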

Page 13

Dashboards – Near Real-time

[Diagram: Data Lake on Amazon S3 (Raw Bucket → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket) → DynamoDB → Web serving layer (EC2, Containers, or Serverless) → Users]
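The ETL step in these diagrams moves data from the raw bucket to the transformed data bucket. As an illustration, the Python sketch below parses raw JSON lines, skips malformed rows, and groups clean records under Hive-style partition prefixes (`year=/month=/day=`), a layout that Athena, Redshift Spectrum and AWS Glue crawlers can recognize as partitions. The field names (`ts`, `user`, `amount`) and the `events` prefix are hypothetical; a real job would run on AWS Glue or Amazon EMR and write files to S3, which is omitted here.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(prefix: str, ts: str) -> str:
    """Return a Hive-style prefix such as 'events/year=2020/month=07/day=20/' (UTC)."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    dt = dt.astimezone(timezone.utc)
    return f"{prefix}/year={dt.year:04d}/month={dt.month:02d}/day={dt.day:02d}/"


def transform(raw_lines, prefix="events"):
    """Clean raw JSON lines (hypothetical schema) and group them by partition prefix."""
    partitions = defaultdict(list)
    for line in raw_lines:
        try:
            raw = json.loads(line)
            key = partition_prefix(prefix, raw["ts"])
            clean = {"ts": raw["ts"], "user": raw.get("user"), "amount": float(raw.get("amount", 0))}
        except (ValueError, KeyError, TypeError):
            continue  # malformed row; a real job would route it to an error location
        partitions[key].append(clean)
    return dict(partitions)


if __name__ == "__main__":
    raw = [
        '{"ts": "2020-07-20T10:00:00+00:00", "user": "a", "amount": "5"}',
        "not json",
        '{"ts": "2020-07-21T01:00:00", "amount": 2}',
    ]
    for key, rows in transform(raw).items():
        print(key, rows)
```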

Page 14

Dashboards + Search

[Diagram: Data Lake on Amazon S3 (Raw Bucket → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket) → DynamoDB → Users; DynamoDB Streams, AWS Lambda and Amazon Kinesis Firehose deliver changes to Amazon Elasticsearch]
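For the Dashboards + Search pattern, changes in DynamoDB can be propagated to the search index through DynamoDB Streams. The Python sketch below turns a batch of stream records (as delivered to a Lambda function) into index or delete actions. It assumes the stream includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`) and handles only a subset of DynamoDB attribute types; the action tuples and helper names are hypothetical, and sending the actions to Elasticsearch (directly or through Kinesis Firehose) is left out.

```python
from decimal import Decimal


def from_dynamodb(attr):
    """Convert one typed DynamoDB attribute value to plain Python.

    Handles S, N, BOOL, NULL, L and M only; other types raise ValueError.
    """
    type_tag, value = next(iter(attr.items()))
    if type_tag == "S":
        return value
    if type_tag == "N":
        number = Decimal(value)
        return int(number) if number == number.to_integral_value() else float(number)
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb(v) for v in value]
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in value.items()}
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def to_index_actions(event):
    """Map stream records to ('index', id, doc) or ('delete', id, None) tuples."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        keys = {k: from_dynamodb(v) for k, v in change["Keys"].items()}
        # Stable document id built from the key attributes, sorted by name.
        doc_id = "|".join(str(keys[name]) for name in sorted(keys))
        if record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
        else:  # INSERT or MODIFY
            doc = {k: from_dynamodb(v) for k, v in change["NewImage"].items()}
            actions.append(("index", doc_id, doc))
    return actions
```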

Page 15

Ad Hoc Analysis

Information sought on an as-needed basis
• Usage: Dynamic data querying
• Data structure: Case-based
• Data temperature: Medium to cold

Available Services:

Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, Amazon Elasticsearch Service
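Ad hoc queries are usually cheaper and faster when they touch only the data they need. Amazon Athena and Redshift Spectrum can skip partitions that a filter on partition columns rules out, and Athena charges by the amount of data scanned. The Python sketch below lists the day partitions a date-range query would need under a Hive-style `year=/month=/day=` layout; the layout and the `partitions_to_scan` name are assumptions for illustration, not part of either service's API.

```python
from datetime import date, timedelta


def partitions_to_scan(prefix: str, start: date, end: date) -> list[str]:
    """List Hive-style day partitions from start to end, inclusive."""
    if end < start:
        raise ValueError("end date is before start date")
    days = (end - start).days + 1
    return [
        f"{prefix}/year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/"
        for d in (start + timedelta(days=i) for i in range(days))
    ]


if __name__ == "__main__":
    # A three-day window crossing a month boundary touches three partitions.
    for p in partitions_to_scan("events", date(2020, 7, 30), date(2020, 8, 1)):
        print(p)
```

In SQL terms, this corresponds to filtering on the partition columns (for example `WHERE year = '2020' AND month = '07'`) rather than only on a timestamp stored inside the files.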

Page 16

Reports and Ad-Hoc Analysis

[Diagram: Data Lake on Amazon S3 (Raw Bucket → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket), queried with Amazon Redshift / Amazon Redshift Spectrum or Amazon Athena, and visualized in Amazon QuickSight]

Page 17

Machine Learning

Data labeled with outcomes to train prediction models
• Usage: Machine learning data preparation
• Data structure: Case-based
• Data temperature: Medium to cold

Available Services:

Amazon EMR, Amazon SageMaker

Page 18

Machine Learning

[Diagram: Data Lake on Amazon S3 (Raw Bucket → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket) → Amazon EMR and Amazon SageMaker → Users]
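Preparing labeled data for training usually includes holding out a validation set. One common technique, sketched below in Python, is to assign each record to a split by hashing a stable identifier, so the assignment does not change when the preparation job reruns on EMR or in a SageMaker notebook. The record fields (`id`, `label`), the 20% default and the function names are assumptions.

```python
import hashlib


def assign_split(record_id: str, validation_fraction: float = 0.2) -> str:
    """Deterministically assign a record id to 'train' or 'validation'."""
    if not 0.0 <= validation_fraction <= 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    return "validation" if bucket < validation_fraction * 2**64 else "train"


def split_labeled(records, id_field="id", label_field="label", validation_fraction=0.2):
    """Split labeled records into (train, validation); unlabeled records are dropped."""
    train, validation = [], []
    for record in records:
        if record.get(label_field) is None:
            continue  # no outcome label, so not usable for supervised training
        split = assign_split(str(record[id_field]), validation_fraction)
        (validation if split == "validation" else train).append(record)
    return train, validation
```

Hashing instead of random sampling keeps a record's split stable as new data arrives, which avoids leaking validation rows into training on a later run.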

Page 19

Reports

Static representations of data rendered at a point in time
• Usage: Point-in-time data extraction
• Data structure: High
• Data temperature: Medium to cold

Available Services:

Amazon Redshift, Amazon Athena, Amazon QuickSight
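Because a report is a point-in-time view, it helps to make the cutoff explicit and store it with the output. The Python sketch below aggregates records up to an `as_of` timestamp into per-group totals; the field names (`ts`, `region`, `amount`) are hypothetical, and in practice the aggregation would more likely run as SQL in Amazon Redshift or Amazon Athena, with Amazon QuickSight presenting the result.

```python
from collections import defaultdict
from datetime import datetime


def snapshot_report(records, as_of, group_field="region", value_field="amount"):
    """Total value_field per group for records timestamped at or before as_of.

    Timestamps are ISO 8601 strings; they and as_of must be consistently naive or aware.
    """
    totals = defaultdict(float)
    for record in records:
        if datetime.fromisoformat(record["ts"]) <= as_of:
            totals[record[group_field]] += float(record[value_field])
    return {"as_of": as_of.isoformat(), "totals": dict(sorted(totals.items()))}


if __name__ == "__main__":
    rows = [
        {"ts": "2020-07-19T09:00:00", "region": "eu", "amount": 10},
        {"ts": "2020-07-19T12:00:00", "region": "us", "amount": 4},
        {"ts": "2020-07-20T08:00:00", "region": "eu", "amount": 7},
    ]
    # {'as_of': '2020-07-19T23:59:59', 'totals': {'eu': 10.0, 'us': 4.0}}
    print(snapshot_report(rows, datetime(2020, 7, 19, 23, 59, 59)))
```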

Page 20

[Diagram: Data Lake on Amazon S3 (Raw Bucket → ETL with Amazon EMR or AWS Glue → Transformed Data Bucket) feeding a Report and Data Mart on Amazon Redshift. Consumers: Data Scientists & Developers (Amazon Redshift or Amazon EMR), BI/BA Engineers, and Business Users (Amazon QuickSight)]

Page 21

Processing & Analytics

[Diagram: processing and analytics services by category]
• Transactional & RDBMS: Amazon DynamoDB (NoSQL DB), Amazon Aurora (Relational Database)
• Batch: Amazon EMR (Hadoop, Spark, Presto), Amazon Redshift (Data Warehouse), Amazon Athena (Query Service), AWS Batch
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Amazon Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• Also shown: Kinesis Streams & Firehose, Amazon ElastiCache, DAX, and the BI & Data Visualization and Predictive categories

Page 22

Services for security and governance

• Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail
• Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, Amazon VPC
• Encryption: AWS Certificate Manager, AWS Key Management Service; encryption at rest, encryption in transit; bring your own keys, HSM support
• Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations

Customers need multiple layers of security, identity and access management, encryption, and compliance to secure their data lake.
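Encryption in transit and at rest can be enforced at the bucket level. The Python sketch below builds an S3 bucket policy that denies requests not made over TLS (condition key `aws:SecureTransport`) and uploads that do not request SSE-KMS (condition key `s3:x-amz-server-side-encryption`). It follows a commonly documented pattern, but the bucket name is hypothetical and the policy should be checked against current AWS documentation and the bucket's default-encryption settings before use; applying it (for example with `aws s3api put-bucket-policy`) is not shown.

```python
import json


def data_lake_bucket_policy(bucket: str) -> dict:
    """Build a bucket policy denying non-TLS access and uploads without SSE-KMS."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Encryption in transit: reject any request not sent over HTTPS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {
                # Encryption at rest: reject uploads that do not ask for SSE-KMS.
                "Sid": "DenyUploadsWithoutKms",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(data_lake_bucket_policy("example-data-lake"), indent=2))
```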

Page 23

AWS for Analytics

• Data movement: AWS Database Migration Service, AWS Snowball, AWS Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Managed Streaming for Kafka
• Data lake infrastructure & management: Amazon S3 & Amazon S3 Glacier, AWS Glue, AWS Lake Formation
• Analytics: Amazon Redshift, Amazon EMR (Spark & Hadoop), Amazon Athena, Amazon Elasticsearch Service, Amazon Kinesis Data Analytics, AWS Glue (Spark & Python)
• Visualization & machine learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Translate, Amazon Transcribe, Deep learning AMIs
• + 10 more

Page 24

The Amazon ML stack: broadest & deepest set of capabilities

• AI services
  • Vision: Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract
  • Speech: Amazon Polly, Amazon Transcribe
  • Language: Amazon Translate, Amazon Comprehend & Amazon Comprehend Medical
  • Chatbots: Amazon Lex
  • Forecasting: Amazon Forecast
  • Recommendations: Amazon Personalize
• ML services: Amazon SageMaker (Build, Train, Deploy)
  • Pre-built algorithms & notebooks; data labeling (Amazon SageMaker Ground Truth); algorithms & models (AWS Marketplace for ML); one-click model training & tuning; reinforcement learning; optimization (Neo); one-click deployment & hosting
• ML frameworks & infrastructure
  • Frameworks, interfaces
  • Infrastructure: Amazon EC2 P3 & P3dn, Amazon EC2 C5, FPGAs, AWS IoT Greengrass, Amazon Elastic Inference

Page 25

Summary

AWS enables you to build sophisticated big data applications
• Retrospective, real-time, predictive

Understand who the user is
• Business users, data scientists & developers

Use the right tool for the job
• Data structure, latency, throughput, access patterns

Leverage AWS managed services
• Scalable/elastic, available, reliable, secure, no/low admin

Page 26

Tens of thousands of data lakes run on AWS across all industries

Page 27

Infrastructure certifications

Global
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report

United States
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA: Educational Privacy Act
• FIPS: Government Security Standards
• FISMA: Federal Information Security Management
• GxP: Quality Guidelines and Regulations
• FFIEC: Financial Institutions Regulation
• HIPAA: Protected Health Information
• ITAR: International Arms Regulations
• MPAA: Protected Media Content
• NIST: National Institute of Standards and Technology
• SEC Rule 17a-4(f): Financial Data Standards
• VPAT/Section 508: Accountability Standards

Asia Pacific
• FISC [Japan]: Financial Industry Information Systems
• IRAP [Australia]: Australian Security Standards
• K-ISMS [Korea]: Korean Information Security
• MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
• My Number Act [Japan]: Personal Information Protection

Europe
• C5 [Germany]: Operational Security Attestation
• Cyber Essentials Plus [UK]: Cyber Threat Protection
• G-Cloud [UK]: UK Government Standards
• IT-Grundschutz [Germany]: Baseline Protection Methodology

Page 28

Thank you!