simplify big data with aws
TRANSCRIPT
![Page 1: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/1.jpg)
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
@julsimon
Webinar “Salon du Big Data” 02/03/2016
Simplify Big Data with AWS
Julien Simon, Principal Technical Evangelist
![Page 2: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/2.jpg)
Simplify Big Data Processing
ingest / collect
store process / analyze
consume / visualize
Time to Answer (Latency) Throughput
Cost
![Page 3: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/3.jpg)
Collect / Ingest
![Page 4: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/4.jpg)
Types of Data
• Transactional • Database reads & writes (OLTP) • Cache
• Search • Logs • Streams
• File • Log files (/var/log) • Log collectors & frameworks
• Stream • Log records • Sensors & IoT data
Database
File Storage
Stream Storage
A
iOS Android
Web Apps
Logstash
Logg
ing
IoT
Appl
icat
ions
Transactional Data
File Data
Stream Data
Mobile Apps
Search Data
Search
Collect Store Lo
ggin
g Io
T
![Page 5: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/5.jpg)
Store
![Page 6: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/6.jpg)
Stream Storage
A
iOS Android
Web Apps
Logstash
Amazon RDS
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am S
tora
ge
File
Sto
rage
Transactional Data
File Data
Stream Data
Mobile Apps
Search Data
Database
File Storage
Search
Collect Store Lo
ggin
g Io
T Ap
plic
atio
ns
ü
![Page 7: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/7.jpg)
File Storage
A
iOS Android
Web Apps
Logstash
Amazon RDS
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am S
tora
ge
File
Sto
rage
Transactional Data
File Data
Stream Data
Mobile Apps
Search Data
Database
Search
Collect Store Lo
ggin
g Io
T Ap
plic
atio
ns
ü
![Page 8: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/8.jpg)
Database + Search
Tier
A AmazonS3
iOS Android
Web Apps
Logstash
Amazon RDS / Aurora
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am S
tora
ge
File
Sto
rage
Transactional Data
File Data
Stream Data
Mobile Apps
Search Data
Collect Store ü
Logg
ing
IoT
Appl
icat
ions
![Page 9: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/9.jpg)
Database + Search Tier Anti-pattern
RDBMS
Database + Search Tier
Applications
![Page 10: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/10.jpg)
What Data Store Should I Use? Amazon ElastiCache
Amazon DynamoDB
Amazon Aurora
Amazon Elasticsearch
Amazon EMR (HDFS)
Amazon S3 Amazon Glacier
Average latency
ms ms ms, sec ms,sec sec,min,hrs ms,sec,min (~ size)
hrs
Data volume GB GB–TBs (no limit)
GB–TB (64 TB Max)
GB–TB GB–PB (~nodes)
MB–PB (no limit)
GB–PB (no limit)
Item size B-KB KB (400 KB max)
KB (64 KB)
KB (1 MB max)
MB-GB KB-GB (5 TB max)
GB (40 TB max)
Request rate High - Very High
Very High (no limit)
High High Low – Very High
Low – Very High (no limit)
Very Low
Storage cost GB/month
$$ ¢¢ ¢¢ ¢¢
¢ ¢ ¢/10
Durability Low - Moderate
Very High Very High High High Very High Very High
Hot Data Warm Data Cold Data
Hot Data Warm Data Cold Data
![Page 11: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/11.jpg)
Process / Analyze
![Page 12: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/12.jpg)
Analyze A
iOS Android
Web Apps
Logstash
Amazon RDS
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon Redshift
Impala
Pig
Amazon ML
Streaming
AmazonKinesis
AWS Lambda
Amaz
on E
last
ic M
apR
educ
e
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am P
roce
ssin
g Ba
tch
Inte
ract
ive
Logg
ing
Stre
am S
tora
ge
IoT
Appl
icat
ions
File
Sto
rage
Hot
Cold
Warm
Hot
Hot
ML
Transactional Data
File Data
Stream Data
Mobile Apps
Search Data
Collect Store Analyze ü ü
![Page 13: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/13.jpg)
Analysis Tools and Frameworks
Machine Learning • Mahout, Spark ML, Amazon ML
Interactive Analytics • Amazon Redshift, Presto, Impala, Spark
Batch Processing • MapReduce, Hive, Pig, Spark
Stream Processing • Micro-batch: Spark Streaming, KCL, Hive, Pig • Real-time: Storm, AWS Lambda, KCL
Amazon Redshift
Impala
Pig
Amazon Machine Learning
Streaming
AmazonKinesis
AWS Lambda
Amaz
on E
last
ic M
apR
educ
e
Stre
am P
roce
ssin
g Ba
tch
Inte
ract
ive
ML
Analyze
![Page 14: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/14.jpg)
What Data Processing Technology Should I Use? AmazonRedshift
Impala Presto Spark Hive
Query Latency
Low Low Low Low Medium (Tez) – High (MapReduce)
Durability High High High High High
Data Volume 1.6 PB Max
~Nodes ~Nodes ~Nodes ~Nodes
Managed Yes Yes (EMR) Yes (EMR)
Yes (EMR) Yes (EMR)
Storage Native HDFS / S3 HDFS / S3 HDFS / S3 HDFS / S3
SQL Compatibility
High Medium High Low (SparkSQL) Medium (HQL)
Query Latency High
(Low is better)
Medium
![Page 15: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/15.jpg)
Consume / Visualize
![Page 16: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/16.jpg)
Collect Store Analyze Consume
A
iOS Android
Web Apps
Logstash
Amazon RDS
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon Redshift
Impala
Pig
Amazon ML
Streaming
AmazonKinesis
AWS Lambda
Amaz
on E
last
ic M
apR
educ
e
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am P
roce
ssin
g Ba
tch
Inte
ract
ive
Logg
ing
Stre
am S
tora
ge
IoT
Appl
icat
ions
File
Sto
rage
Anal
ysis
& V
isua
lizat
ion
Hot
Cold
Warm
Hot Slow
Hot
ML
Fast
Fast
Transactional Data
File Data
Stream Data
Not
eboo
ks
Predictions
Apps & APIs
Mobile Apps
IDE
Search Data
ETL
Amazon QuickSight
![Page 17: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/17.jpg)
Consume
• Predictions
• Analysis and Visualization
• Notebooks • IDE
• Applications & API
Consume
Anal
ysis
& V
isua
lizat
ion
Amazon QuickSight
Not
eboo
ks
Predictions
Apps & APIs
IDE
Store Analyze Consume ETL
Business users
Data Scientist, Developers
![Page 18: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/18.jpg)
Putting It All Together
![Page 19: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/19.jpg)
Collect Store Analyze Consume
A
iOS Android
Web Apps
Logstash
Amazon RDS
Amazon DynamoDB
Amazon ES
AmazonS3
Apache Kafka
AmazonGlacier
AmazonKinesis
AmazonDynamoDB
Amazon Redshift
Impala
Pig
Amazon ML
Streaming
AmazonKinesis
AWS Lambda
Amaz
on E
last
ic M
apR
educ
e
Amazon ElastiCache
Sear
ch
SQL
NoS
QL
Cac
he
Stre
am P
roce
ssin
g Ba
tch
Inte
ract
ive
Logg
ing
Stre
am S
tora
ge
IoT
Appl
icat
ions
File
Sto
rage
Anal
ysis
& V
isua
lizat
ion
Hot
Cold
Warm
Hot Slow
Hot
ML
Fast
Fast
Amazon QuickSight
Transactional Data
File Data
Stream Data
Not
eboo
ks
Predictions
Apps & APIs
Mobile Apps
IDE
Search Data
ETL
![Page 20: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/20.jpg)
Interactive & Batch Analytics
Producer Amazon S3
Amazon EMR
Hive
Pig
Spark
Amazon ML
process store
Consume
Amazon Redshift
Amazon EMR Presto Impala
Spark
Batch
Interactive
Batch Prediction
Real-time Prediction
![Page 21: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/21.jpg)
Real-time Analytics
Producer Apache Kafka
KCL
AWS Lambda
Spark Streaming
Apache Storm
Amazon SNS
Amazon ML
Notifications
Amazon ElastiCache
(Redis)
Amazon DynamoDB
Amazon RDS
Amazon ES
Alert
App state
Real-time Prediction
KPI
process store
DynamoDB Streams
Amazon Kinesis
![Page 22: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/22.jpg)
Batch Layer
Amazon Kinesis
data
process store
Lambda Architecture
Amazon Kinesis S3 Connector
Amazon S3
Applications
Amazon Redshift
Amazon EMR
Presto
Hive
Pig
Spark answer
Speed Layer
answer
Serving Layer
Amazon ElastiCache
Amazon DynamoDB
Amazon RDS
Amazon ES
answer
Amazon ML
KCL
AWS Lambda
Spark Streaming
Storm
![Page 23: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/23.jpg)
Summary
• Use the right tool for the job • Latency, throughput, access patterns
• Leverage AWS managed services • No/low admin
• Be cost conscious • Big data ≠ big cost
![Page 24: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/24.jpg)
Thank you. Let’s keep in touch!
@aws_actus @julsimon facebook.com/groups/AWSFrance/
AWS User Groups in Paris, Lyon, Nantes, Lille & Rennes
(meetup.com)
March 7-8
AWS Summit May 31st
April 20-22
March 23-24 April 6-7 (Lyon)
April 25
March 16
![Page 25: Simplify Big Data with AWS](https://reader031.vdocuments.mx/reader031/viewer/2022022411/58ed196d1a28ab260c8b46c1/html5/thumbnails/25.jpg)
Customer references & further reading
• Amazon Kinesis: https://aws.amazon.com/solutions/case-studies/supercell/ • Amazon DynamoDB: https://aws.amazon.com/fr/solutions/case-studies/adroll/ • Amazon S3 / Glacier: https://aws.amazon.com/fr/solutions/case-studies/soundcloud/ • Amazon EMR: https://aws.amazon.com/fr/solutions/case-studies/yelp/ • Amazon Aurora: https://aws.amazon.com/fr/rds/aurora/testimonials/ • Amazon Redshift: https://aws.amazon.com/fr/solutions/case-studies/financial-times/ • AWS Lambda: https://aws.amazon.com/fr/solutions/case-studies/nordstrom/ • Many more case studies at https://aws.amazon.com/fr/solutions/case-studies/big-data/
• Whitepaper: “Big Data Analytics Options on AWS” : http://d0.awsstatic.com/whitepapers/Big_Data_Analytics_Options_on_AWS.pdf
• AWS Big Data blog: https://blogs.aws.amazon.com/bigdata