© 2020, Amazon Web Services, Inc. or its Affiliates.
Data Lake, Reporting, Analytics, Machine Learning
Consuming the Data Lake
Session’s Focus
Central Storage: Amazon S3 (scalable, secure, cost-effective)
Catalog & Search: AWS Glue, Amazon DynamoDB, Amazon Elasticsearch Service
Access & User Interfaces: AWS AppSync, Amazon API Gateway, Amazon Cognito
Manage & Secure: AWS KMS, AWS CloudTrail, AWS IAM, Amazon CloudWatch
Data Ingestion: AWS Snowball, AWS Storage Gateway, Amazon Kinesis Data Firehose, AWS Direct Connect, AWS Database Migration Service, AWS DataSync, AWS Transfer for SFTP, Amazon S3 Transfer Acceleration
Analytics & Serving: Amazon Athena, Amazon EMR, AWS Glue, Amazon Redshift, Amazon DynamoDB, Amazon QuickSight, Amazon Kinesis, Amazon Elasticsearch Service, Amazon Neptune, Amazon RDS
Anti-Pattern
Everything → RDBMS → Query
Also an Anti-Pattern
Everything → Data Lake (in place of the RDBMS) → Query
One tool to rule them all
Where do I start?
• Understand your data
  • Data structure, access patterns & characteristics, temperature, cost, size
• Know your audience
  • Business users, data scientists, developers
• Select the right service
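Understand your Data
[Figure: data store types (In-memory, NoSQL, Search, Warehouse, Object, Archival) placed by data temperature (hot, warm, cold) and data structure (low to high). Moving from hot to cold data, latency rises, data volume grows, request rate falls, and cost per GB falls.]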
Understand your Data
[Figure: data store types placed by data temperature (hot, warm, cold) and data structure (low to high), each labeled with an example AWS service]
• In-memory: Amazon ElastiCache
• NoSQL: Amazon DynamoDB
• Search: Amazon ES
• Warehouse: Amazon Redshift
• Object: Amazon S3
• Archival: Amazon Glacier
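The following sketch turns the "understand your data, then select the right service" idea into a small lookup. It covers the six store categories and the example services named on this slide. The `DataProfile` fields and the decision rules are hypothetical simplifications chosen for illustration. They are not AWS guidance, and they are not the presenter's method.

```python
# Hypothetical service-selection helper. The category -> service mapping comes
# from the "Understand your Data" slide; the profile fields and rule order are
# illustrative assumptions only.
from dataclasses import dataclass

EXAMPLE_SERVICE = {
    "in-memory": "Amazon ElastiCache",
    "nosql": "Amazon DynamoDB",
    "search": "Amazon ES",
    "warehouse": "Amazon Redshift",
    "object": "Amazon S3",
    "archival": "Amazon Glacier",
}

@dataclass
class DataProfile:
    temperature: str        # "hot", "warm", or "cold"
    needs_full_text: bool   # free-text search over documents
    relational_sql: bool    # joins and aggregations over highly structured data
    sub_millisecond: bool   # latency tighter than a disk-backed store offers

def pick_store(p: DataProfile) -> str:
    """Return a store category for a data profile (illustrative rules only)."""
    if p.needs_full_text:
        return "search"
    if p.relational_sql:
        return "warehouse"
    if p.temperature == "hot":
        return "in-memory" if p.sub_millisecond else "nosql"
    if p.temperature == "cold":
        return "archival"
    return "object"  # warm, loosely structured data defaults to the data lake

def suggest_service(p: DataProfile) -> str:
    """Map the chosen category to the example service shown on the slide."""
    return EXAMPLE_SERVICE[pick_store(p)]
```

For example, `suggest_service(DataProfile("hot", False, False, True))` returns `"Amazon ElastiCache"`. Real selection would also weigh cost per GB, data volume and request rate, as the chart's axes suggest.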
Who is your audience?
Data Visualizer
• Priorities: Creating engaging visual and narrative journeys for analytical solutions
• Needs: Visualization, dashboards, reporting
Data Product Manager
• Priorities: Manages data as a product; ensures freshness and consistency of data; understands lineage and compliance needs; treats data scientists as customers
• Needs: Reports – data quality, errors
DevOps Engineer
• Priorities: Monitoring for reliability; quickly diagnosing deployment or availability issues
• Needs: Ad hoc querying, dashboards
Data Scientist
• Priorities: Makes sense of data, generates and communicates insights to improve or create business processes, and creates predictive ML models to support them
• Needs: Ad hoc querying, robust ML tools
Data Engineer
• Priorities: Builds scalable pipelines; transforms and loads data into structures, complete with metadata, that data scientists can readily consume
• Needs: Ad hoc querying, quick visualization
Business Sponsor
• Priorities: Vetting prioritization and ROI, funding projects, providing ongoing feedback
• Needs: Reporting, dashboards
Enabling your Consumers
Dashboards – Reports – Ad-Hoc Analysis – Machine Learning
Dashboards - Near Real-time
Visual representation of key metrics that change over time
• Data structure - Low
• Usage - Near real-time visualization
• Data temperature - Hot
Available Services: AWS Lambda, Amazon DynamoDB, Amazon Kinesis Data Streams, Amazon Elasticsearch Service
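Near-real-time dashboards plot "key metrics that change over time," which usually means aggregating events into fixed intervals. The following sketch sums events into tumbling (fixed, non-overlapping) windows. The event tuple shape and the 60-second default window are assumptions. With the services listed, logic like this might run in AWS Lambda consuming Amazon Kinesis Data Streams records and storing results in Amazon DynamoDB, but that wiring is hypothetical and not shown.

```python
# Minimal tumbling-window aggregation sketch. Event shape and window size are
# illustrative assumptions; no AWS calls are made.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[int, str, float]  # (epoch_seconds, metric_name, value)

def tumbling_sums(events: Iterable[Event], window_s: int = 60) -> Dict[Tuple[int, str], float]:
    """Sum values per metric into fixed windows keyed by (window_start, metric)."""
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, metric, value in events:
        window_start = ts - (ts % window_s)  # align the timestamp to its window
        totals[(window_start, metric)] += value
    return dict(totals)
```

Each window is emitted once, so a dashboard can simply append one point per window and metric.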
Dashboards – Near Real-time
Data Lake (Amazon S3): raw bucket → ETL (Amazon EMR or AWS Glue) → transformed data bucket
Transformed data → Amazon DynamoDB → web serving layer (Amazon EC2, containers, or serverless) → users
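Every pattern in this deck starts with an ETL step that moves data from the raw bucket to the transformed data bucket. On AWS that step would run on Amazon EMR or AWS Glue. The following sketch stands in for it with plain Python so the logic is visible. It parses JSON lines, drops malformed rows, normalizes fields and groups rows under an output key. The field names (`event_time`, `user_id`, `amount`) and the Hive-style `dt=` key prefix are hypothetical.

```python
# Minimal ETL sketch for raw -> transformed data. Plain Python stands in for an
# EMR or Glue job; field names and the output key layout are assumptions.
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List

def transform(raw_lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Parse JSON lines, drop malformed rows, normalize fields, group by output key."""
    out: Dict[str, List[dict]] = defaultdict(list)
    for line in raw_lines:
        try:
            rec = json.loads(line)
            ts = datetime.fromtimestamp(int(rec["event_time"]), tz=timezone.utc)
            row = {
                "user_id": str(rec["user_id"]).strip(),
                "amount": round(float(rec["amount"]), 2),
                "event_time": ts.isoformat(),
            }
        except (ValueError, KeyError, TypeError):
            continue  # a real job might route bad rows to an error location instead
        key = f"transformed/dt={ts:%Y-%m-%d}/part-0000.json"
        out[key].append(row)
    return dict(out)
```

Grouping by date in the output key keeps each day's data under its own prefix. Query engines can then read only the prefixes a query needs.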
Dashboards + Search
Data Lake (Amazon S3): raw bucket → ETL (Amazon EMR or AWS Glue) → transformed data bucket → Amazon DynamoDB → users
Search path: DynamoDB Streams → AWS Lambda → Amazon Kinesis Firehose → Amazon Elasticsearch Service
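In the Dashboards + Search pattern, a Lambda function sits between DynamoDB Streams and Kinesis Firehose. The following sketch shows what that function might do. It flattens DynamoDB's typed attribute values (`{"S": ...}`, `{"N": ...}`) into plain JSON documents and packs them as `{"Data": bytes}` records, the shape Firehose's `PutRecordBatch` accepts. Several assumptions apply. The stream is assumed to be configured to include new images. Only a subset of DynamoDB types is handled. The actual Firehose call is left as a comment with a placeholder stream name, so the sketch runs without AWS credentials.

```python
# Sketch of a DynamoDB Streams -> Firehose transform. Assumes the stream view
# type includes NEW_IMAGE; handles only S, N, BOOL, NULL, M and L attribute types.
import json
from decimal import Decimal
from typing import Any, Dict, List

def from_ddb(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    tag, v = next(iter(value.items()))
    if tag == "S" or tag == "BOOL":
        return v
    if tag == "N":
        d = Decimal(v)  # numbers arrive as strings
        return int(d) if d == d.to_integral_value() else float(d)
    if tag == "NULL":
        return None
    if tag == "M":
        return {k: from_ddb(x) for k, x in v.items()}
    if tag == "L":
        return [from_ddb(x) for x in v]
    raise ValueError(f"unsupported DynamoDB type: {tag}")

def to_firehose_records(event: Dict[str, Any]) -> List[Dict[str, bytes]]:
    """Build {"Data": bytes} records from INSERT and MODIFY stream events."""
    records = []
    for rec in event.get("Records", []):
        if rec.get("eventName") not in ("INSERT", "MODIFY"):
            continue  # REMOVE events have no NewImage; deletes need separate handling
        image = rec["dynamodb"]["NewImage"]
        doc = {k: from_ddb(v) for k, v in image.items()}
        records.append({"Data": json.dumps(doc).encode("utf-8")})
    return records

def handler(event, context):
    records = to_firehose_records(event)
    # Hypothetical wiring (not executed here):
    # boto3.client("firehose").put_record_batch(
    #     DeliveryStreamName="<your-stream>", Records=records)
    return {"prepared": len(records)}
```

Firehose then buffers the records and delivers them to Amazon Elasticsearch Service. The dashboard's search queries therefore see changes shortly after they land in DynamoDB.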
Ad Hoc Analysis
Information sought on an as-needed basis
• Usage - Dynamic data querying
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon Redshift Spectrum, Amazon Athena, Amazon EMR, Amazon Elasticsearch Service
Reports and Ad-Hoc Analysis
Data Lake (Amazon S3): raw bucket → ETL (Amazon EMR or AWS Glue) → transformed data bucket
Query layer: Amazon Athena, Amazon Redshift Spectrum, or Amazon Redshift → Amazon QuickSight
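Ad hoc queries from Amazon Athena or Amazon Redshift Spectrum run directly against S3 data. How much they read depends heavily on how that data is laid out. When objects sit under Hive-style partition prefixes such as `dt=2020-03-01/`, a query that filters on the partition column can skip whole prefixes. This assumes the partitions are registered in the table's catalog. The following sketch only models which objects a date filter would touch. The key layout and the `dt` column name are hypothetical.

```python
# Illustration of partition pruning over Hive-style S3 keys. This models which
# objects a date-filtered query would read; it does not call any AWS service.
from datetime import date
from typing import Iterable, List

def partition_date(key: str) -> date:
    """Extract the dt= value from a key such as 'sales/dt=2020-03-01/part-0.parquet'."""
    for part in key.split("/"):
        if part.startswith("dt="):
            return date.fromisoformat(part[3:])
    raise ValueError(f"no dt= partition in {key!r}")

def pruned_keys(keys: Iterable[str], start: date, end: date) -> List[str]:
    """Keep only objects whose partition date falls in [start, end], inclusive."""
    return [k for k in keys if start <= partition_date(k) <= end]
```

Both Athena and Redshift Spectrum bill by data scanned, so skipping partitions reduces cost as well as latency.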
Machine Learning
Data labeled with outcomes to train prediction models
• Usage - Machine learning data preparation
• Data structure - Case based
• Data temperature - Medium - cold
Available Services: Amazon EMR, Amazon SageMaker
Machine Learning
Data Lake (Amazon S3): raw bucket → ETL (Amazon EMR or AWS Glue) → transformed data bucket → Amazon EMR and Amazon SageMaker → users
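Machine learning data preparation often ends with splitting labeled data into training and validation sets in a format the training algorithm accepts. For CSV input, Amazon SageMaker's built-in XGBoost algorithm expects the label in the first column and no header row. The following sketch produces that layout. The column names, the 80/20 split and the fixed seed are assumptions. Uploading the files to S3 and starting a training job are not shown.

```python
# Sketch of ML data prep: deterministic train/validation split, written as CSV
# with the label first and no header. Names, split ratio, and seed are assumptions.
import csv
import io
import random
from typing import List, Sequence, Tuple

def split_rows(rows: Sequence[dict], label: str, features: Sequence[str],
               train_frac: float = 0.8, seed: int = 42) -> Tuple[str, str]:
    """Return (train_csv, validation_csv) strings, label first, no header row."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle so the split is reproducible
    cut = int(len(shuffled) * train_frac)

    def to_csv(part: List[dict]) -> str:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for r in part:
            writer.writerow([r[label]] + [r[f] for f in features])
        return buf.getvalue()

    return to_csv(shuffled[:cut]), to_csv(shuffled[cut:])
```

The fixed seed keeps the split reproducible across runs, so a model's validation score can be compared fairly with earlier runs.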
Reports
Static representations of data rendered at a point in time
• Usage - Point in time data extraction
• Data structure - High
• Data temperature – Medium - cold
Available Services: Amazon Redshift, Amazon Athena, Amazon QuickSight
Data Lake (Amazon S3): raw bucket → ETL (Amazon EMR or AWS Glue) → transformed data bucket → Amazon Redshift (report and data mart)
• Business users, BI/BA engineers: Amazon QuickSight, Amazon Redshift
• Data scientists & developers: Amazon EMR or Amazon Redshift
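A report is a static representation of data rendered at a point in time. That means the extraction must ignore changes made after the report's cutoff. The following sketch shows that "as of" logic on an in-memory history of record versions: it keeps the latest version of each key at or before the cutoff, then aggregates. In the architecture above this would normally be SQL in Amazon Redshift. The field names (`key`, `updated_at`, `region`, `amount`) are hypothetical.

```python
# Sketch of point-in-time ("as of") extraction followed by a simple aggregate.
# Field names are hypothetical; timestamps are plain integers for brevity.
from typing import Dict, Iterable, List

def as_of(history: Iterable[dict], cutoff: int) -> List[dict]:
    """Latest version of each key with updated_at <= cutoff."""
    latest: Dict[str, dict] = {}
    for row in history:
        if row["updated_at"] > cutoff:
            continue  # changes after the report's point in time are excluded
        prev = latest.get(row["key"])
        if prev is None or row["updated_at"] > prev["updated_at"]:
            latest[row["key"]] = row
    return list(latest.values())

def totals_by_region(snapshot: Iterable[dict]) -> Dict[str, float]:
    """Sum amounts per region for a snapshot."""
    out: Dict[str, float] = {}
    for row in snapshot:
        out[row["region"]] = out.get(row["region"], 0.0) + row["amount"]
    return out
```

Re-running the report with the same cutoff gives the same numbers even after newer data arrives, which is what makes it a static, point-in-time report.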
Processing & Analytics
• Batch: Amazon EMR (Hadoop, Spark, Presto), Amazon Redshift (data warehouse), Amazon Athena (query service), AWS Batch
• Real-time: AWS Lambda, Apache Storm on EMR, Apache Flink on EMR, Spark Streaming on EMR, Amazon Elasticsearch Service, Kinesis Analytics, Kinesis Streams
• Predictive
Transactional & RDBMS: Amazon DynamoDB (NoSQL DB), Amazon Aurora (relational database)
BI & Data Visualization
Streaming ingestion: Kinesis Streams & Firehose
Caching: Amazon ElastiCache, DynamoDB Accelerator (DAX)
Services for security and governance
Compliance: AWS Artifact, Amazon Inspector, AWS CloudHSM, Amazon Cognito, AWS CloudTrail
Security: Amazon GuardDuty, AWS Shield, AWS WAF, Amazon Macie, Amazon VPC
Encryption: AWS Certificate Manager, AWS Key Management Service; encryption at rest, encryption in transit, bring your own keys, HSM support
Identity: AWS IAM, AWS SSO, Amazon Cloud Directory, AWS Directory Service, AWS Organizations
Customers need multiple levels of security, identity and access management, encryption, and compliance to secure their data lake.
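One concrete layer of encryption in transit for a data lake is an S3 bucket policy that denies any request not made over TLS. It does this with the `aws:SecureTransport` condition key. The following sketch generates such a policy as JSON. The bucket name is a placeholder. Applying the policy (console, CLI, or infrastructure as code) and the complementary controls for encryption at rest with AWS KMS are not shown.

```python
# Sketch: build an S3 bucket policy that denies non-TLS requests. The bucket name
# passed in is a placeholder; this only produces the JSON document.
import json

def deny_insecure_transport_policy(bucket: str) -> str:
    arn = f"arn:aws:s3:::{bucket}"
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object in it.
            "Resource": [arn, f"{arn}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    return json.dumps(policy, indent=2)
```

An explicit `Deny` overrides any `Allow` granted elsewhere through IAM, so plain-HTTP access is blocked regardless of who the caller is.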
AWS for Analytics
Data movement: AWS Database Migration Service, AWS Snowball, AWS Snowmobile, Kinesis Data Firehose, Kinesis Data Streams, Managed Streaming for Kafka
Analytics: Amazon Redshift, Amazon EMR (Spark & Hadoop), Amazon Athena, Amazon Elasticsearch Service, Amazon Kinesis Data Analytics, AWS Glue (Spark & Python)
Data lake infrastructure & management: Amazon S3 & Amazon S3 Glacier, AWS Glue, AWS Lake Formation
Visualization & machine learning: Amazon QuickSight, Amazon SageMaker, Amazon Comprehend, Amazon Lex, Amazon Polly, Amazon Rekognition, Amazon Translate, Amazon Transcribe, Deep learning AMIs, + 10 more
The Amazon ML stack
Broadest & deepest set of capabilities
AI services
• Vision: Amazon Rekognition Image, Amazon Rekognition Video, Amazon Textract
• Speech: Amazon Polly, Amazon Transcribe
• Language: Amazon Translate, Amazon Comprehend & Amazon Comprehend Medical
• Chatbots: Amazon Lex
• Forecasting: Amazon Forecast
• Recommendations: Amazon Personalize
ML services: Amazon SageMaker (build, train, deploy)
• Pre-built algorithms & notebooks
• Data labeling (Amazon SageMaker Ground Truth)
• One-click model training & tuning
• Optimization (Neo)
• One-click deployment & hosting
• Reinforcement learning
• Algorithms & models (AWS Marketplace for ML)
ML frameworks & infrastructure
• Frameworks, interfaces
• Infrastructure: Amazon EC2 P3 & P3dn, Amazon EC2 C5, FPGAs, AWS IoT Greengrass, Amazon Elastic Inference
Summary
• AWS enables you to build sophisticated big data applications
  • Retrospective, real-time, predictive
• Understand who the user is
  • Business users, data scientists & developers
• Use the right tool for the job
  • Data structure, latency, throughput, access patterns
• Leverage AWS managed services
  • Scalable/elastic, available, reliable, secure, no/low admin
Tens of thousands of data lakes run on AWS across all industries
Infrastructure certifications
Global
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report
United States
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA: Educational Privacy Act
• FIPS: Government Security Standards
• FISMA: Federal Information Security Management
• GxP: Quality Guidelines and Regulations
• FFIEC: Financial Institutions Regulation
• HIPAA: Protected Health Information
• ITAR: International Arms Regulations
• MPAA: Protected Media Content
• NIST: National Institute of Standards and Technology
• SEC Rule 17a-4(f): Financial Data Standards
• VPAT/Section 508: Accountability Standards
Asia Pacific
• FISC [Japan]: Financial Industry Information Systems
• IRAP [Australia]: Australian Security Standards
• K-ISMS [Korea]: Korean Information Security
• MTCS Tier 3 [Singapore]: Multi-Tier Cloud Security Standard
• My Number Act [Japan]: Personal Information Protection
Europe
• C5 [Germany]: Operational Security Attestation
• Cyber Essentials Plus [UK]: Cyber Threat Protection
• G-Cloud [UK]: UK Government Standards
• IT-Grundschutz [Germany]: Baseline Protection Methodology
Thank you!