Visiting AWS Seminar (Guro, Gasan, Pangyo) - How to Use Big Data on AWS (김일호...
TRANSCRIPT
![Page 1: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/1.jpg)
Big Data on AWS
김일호, Solutions Architect
09-Nov-2016
![Page 2: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/2.jpg)
Agenda
• AWS Big data building blocks
• AWS Big data platform
– Log data collection & storage
– Introducing Amazon Kinesis
– Data Analytics & Computation
– Collaboration & sharing
![Page 3: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/3.jpg)
AWS Big data building blocks (brief)
![Page 4: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/4.jpg)
Use the right tools
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon Redshift
Amazon Elastic MapReduce
![Page 5: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/5.jpg)
Store anything
Object storage
Scalable
99.999999999% durability
Amazon S3
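A quick worked example of what eleven nines means in practice. This is a sketch that assumes the figure is read as annual per-object durability; the object count of 10 million is an illustrative assumption, not a number from the slide.

```python
from decimal import Decimal

# Durability figure from the slide, read here as a per-object, per-year probability (assumption).
DURABILITY = Decimal("0.99999999999")


def expected_annual_loss(object_count):
    """Expected number of objects lost per year for a given object count."""
    return Decimal(object_count) * (Decimal(1) - DURABILITY)


# Hypothetical bucket holding 10 million objects.
loss = expected_annual_loss(10_000_000)
years_per_lost_object = 1 / loss

print(loss)                   # expected objects lost per year
print(years_per_lost_object)  # average years between single-object losses
```

Under these assumptions the expected loss is one object roughly every 10,000 years.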
![Page 6: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/6.jpg)
Real-time processing
High throughput; elastic
Easy to use
EMR, S3, Redshift, DynamoDB integrations
Amazon Kinesis
![Page 7: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/7.jpg)
NoSQL database
Seamless scalability
Zero admin
Single-digit millisecond latency
Amazon DynamoDB
![Page 8: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/8.jpg)
Relational data warehouse
Massively parallel
Petabyte scale
Fully managed
$1,000/TB/Year
Amazon Redshift
![Page 9: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/9.jpg)
Hadoop/HDFS clusters
Hive, Pig, Impala, HBase
Easy to use; fully managed
On-demand and spot pricing
Tight integration with S3, DynamoDB, and Kinesis
Amazon Elastic MapReduce
![Page 10: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/10.jpg)
HDFS
Amazon Redshift
Amazon RDS
Amazon S3
Amazon DynamoDB
Amazon EMR
Amazon Kinesis
AWS Data Pipeline
Data management
Hadoop ecosystem analytical tools
Data sources
AWS Data Pipeline
![Page 11: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/11.jpg)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 12: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/12.jpg)
Amazon DynamoDB
Amazon RDS
Amazon Redshift
AWS Direct Connect
AWS Storage Gateway
AWS Import/Export
Amazon Glacier
S3
Amazon Kinesis
Amazon EMR
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 13: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/13.jpg)
Amazon EC2
Amazon EMR
Amazon Kinesis
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 14: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/14.jpg)
Amazon Redshift
Amazon DynamoDB
Amazon RDS
S3
Amazon EC2
Amazon EMR
Amazon CloudFront
AWS CloudFormation
AWS Data Pipeline
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 15: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/15.jpg)
The right tools. At the right scale. At the right time.
![Page 16: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/16.jpg)
AWS Big data platform
![Page 17: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/17.jpg)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 18: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/18.jpg)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 19: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/19.jpg)
Collection of Data
• Sources: web servers, application servers, connected devices, mobile phones, etc.
• Aggregation tool: scalable method to collect and aggregate (Flume, Kafka, Kinesis, queue)
• Data sink: reliable and durable destination or destinations
![Page 20: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/20.jpg)
Types of Data Ingest
• Transactional
– Database reads/writes
• File
– Click-stream logs
• Stream
– Click-stream logs
Database
Cloud Storage
Stream Storage
![Page 21: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/21.jpg)
Run your own log collector
Your application
Amazon S3
DynamoDB
Any other data store
Amazon S3
Amazon EC2
![Page 22: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/22.jpg)
Use a Queue
Amazon Simple Queue Service (SQS)
Amazon S3
DynamoDB
Any other data store
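The queue pattern on this slide can be sketched locally. This is a minimal illustration, assuming Python's in-process `queue.Queue` as a stand-in for SQS and a plain list as a stand-in for S3, DynamoDB, or another data store; `run_pipeline` and the event names are hypothetical.

```python
import queue
import threading


def run_pipeline(events, num_workers=2):
    """Send events through a queue and let workers drain it into a shared sink."""
    buffer = queue.Queue()        # stand-in for an SQS queue (assumption)
    sink = []                     # stand-in for S3, DynamoDB, or another store
    sink_lock = threading.Lock()

    def worker():
        while True:
            event = buffer.get()
            if event is None:     # sentinel: no more work for this worker
                buffer.task_done()
                break
            with sink_lock:
                sink.append(event.upper())  # "process" the event and store it
            buffer.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for event in events:          # the application only talks to the queue
        buffer.put(event)
    for _ in threads:
        buffer.put(None)          # one sentinel per worker
    buffer.join()
    for thread in threads:
        thread.join()
    return sorted(sink)


stored = run_pipeline(["click", "view", "purchase"])
print(stored)
```

The producer never calls the data store directly, so workers can be added or removed without changing the application.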
![Page 23: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/23.jpg)
Agency Customer: Video Analytics on AWS
Elastic Load Balancer
Edge Servers on EC2
Workers on EC2
Logs Reports
HDFS Cluster
Amazon Simple Queue Service (SQS)
Amazon Simple Storage Service (S3)
Amazon Elastic MapReduce
![Page 24: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/24.jpg)
Use a tool like Flume, Kafka, Honu, etc.
Flume running on EC2
Amazon S3
Any other data store
HDFS
![Page 25: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/25.jpg)
Stream Storage
Database
Cloud Storage
![Page 26: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/26.jpg)
Why Stream Storage?
• Convert multiple streams into fewer persistent sequential streams
• Sequential streams are easier to process
[Figure: Producers 1 to N each emit records numbered 1 to 4 into Amazon Kinesis or Kafka, which merges them into two ordered streams, Shard or Partition 1 and Shard or Partition 2]
![Page 27: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/27.jpg)
Why Stream Storage?
• Decouple producers and consumers
• Buffer
• Preserve client ordering
• Streaming MapReduce
• Consumer replay / reprocess
[Figure: Producers 1 to N each emit records numbered 1 to 4 into Amazon Kinesis or Kafka, which merges them into Shard or Partition 1 and Shard or Partition 2. Consumer 1 reports Count of Red = 4 and Count of Violet = 4; Consumer 2 reports Count of Blue = 4 and Count of Green = 4]
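The properties listed on this slide (per-producer ordering, counting per partition, consumer replay) can be shown with a toy in-memory log. This is only a sketch: `PartitionedLog` is a hypothetical class, and routing by `producer_id % num_partitions` is an assumption made for illustration, not how Kinesis or Kafka route records.

```python
from collections import defaultdict


class PartitionedLog:
    """Toy stream storage: a fixed set of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, producer_id, record):
        # A producer always maps to the same partition, so its order is preserved.
        index = producer_id % len(self.partitions)
        self.partitions[index].append((producer_id, record))
        return index

    def read(self, partition, offset=0):
        # Reading never removes data, so a consumer can replay from any offset.
        return self.partitions[partition][offset:]


def count_by_producer(records):
    counts = defaultdict(int)
    for producer_id, _record in records:
        counts[producer_id] += 1
    return dict(counts)


log = PartitionedLog(num_partitions=2)
for sequence in range(1, 5):          # each producer emits records 1..4
    for producer_id in range(4):      # four interleaved producers
        log.append(producer_id, sequence)

first_pass = count_by_producer(log.read(0))
replay = count_by_producer(log.read(0))   # same data, read again
producer_0_order = [record for pid, record in log.read(0) if pid == 0]
print(first_pass, producer_0_order)
```

Each partition ends up with four records from each of two producers, mirroring the per-color counts in the figure, and a second read returns the same result.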
![Page 28: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/28.jpg)
Introducing Amazon Kinesis
![Page 29: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/29.jpg)
Introducing Amazon Kinesis
Managed Service for Real-Time Processing of Big Data
[Figure: data sources send records through an AWS endpoint into a stream of Shard 1, Shard 2, ... Shard N that spans three Availability Zones. Four applications read the stream: App. 1 (Aggregate & De-Duplicate), App. 2 (Metric Extraction), App. 3 (Sliding Window Analysis), App. 4 (Machine Learning). Other labels: S3, DynamoDB, Redshift, EMR]
![Page 30: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/30.jpg)
Kinesis Architecture
Amazon Web Services: AZ, AZ, AZ
• Millions of sources producing 100s of terabytes per hour
• Front end: authentication, authorization
• Durable, highly consistent storage replicates data across three data centers (Availability Zones)
• Ordered stream of events supports multiple readers
• Real-time dashboards and alarms
• Machine learning algorithms or sliding window analytics
• Aggregate analysis in Hadoop or a data warehouse
• Aggregate and archive to S3
Inexpensive: $0.028 per million puts
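To put the per-put price in context, here is a small arithmetic sketch. It uses only the $0.028 per million puts quoted on the slide; the rate of 1,000 puts per second is an illustrative assumption, and any other charges (for example per-shard fees) are deliberately left out.

```python
from decimal import Decimal

PRICE_PER_MILLION_PUTS = Decimal("0.028")  # USD, as quoted on the slide


def put_cost(puts_per_second, seconds):
    """Cost in USD of PUT calls alone at a steady rate."""
    total_puts = Decimal(puts_per_second) * Decimal(seconds)
    return total_puts / Decimal(1_000_000) * PRICE_PER_MILLION_PUTS


# Hypothetical workload: 1,000 puts per second for one day.
daily_cost = put_cost(1000, 24 * 60 * 60)
print(daily_cost)
```

At that assumed rate the stream receives 86.4 million puts per day, so the put charge comes to about $2.42 per day.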
![Page 31: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/31.jpg)
Putting data into Kinesis
Managed Service for Ingesting Fast Moving Data
• Streams are made of Shards
– A Kinesis Stream is composed of multiple Shards
– Each Shard ingests up to 1 MB/sec of data, and up to 1000 TPS
– Each Shard emits up to 2 MB/sec of data
– All data is stored for 24 hours
– You scale Kinesis streams by adding or removing Shards
• Simple PUT interface to store data in Kinesis
– Producers use a PUT call to store data in a Stream
– A Partition Key is used to distribute the PUTs across Shards
– A unique Sequence # is returned to the Producer upon a successful PUT call
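How a partition key spreads PUTs across shards can be sketched as follows. Kinesis hashes the partition key with MD5 into a 128-bit integer and picks the shard whose hash key range contains it; the evenly split ranges and the helper names `shard_ranges` and `shard_for_key` below are assumptions for illustration.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(num_shards):
    """Split the hash space into contiguous, roughly equal ranges (assumed layout)."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose hash range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_value <= end:
            return index
    raise ValueError("hash value outside all shard ranges")


ranges = shard_ranges(4)
for key in ["user-1", "user-2", "device-42"]:
    print(key, "-> shard", shard_for_key(key, ranges))
```

Because the mapping depends only on the key, records that share a partition key always land on the same shard, which is what keeps their relative order.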
[Figure: many Producers PUT records into a Kinesis stream made up of Shard 1, Shard 2, Shard 3, Shard 4, ... Shard n]
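The per-shard limits quoted for this slide (1 MB/sec and 1000 TPS in, 2 MB/sec out) suggest a simple sizing rule: provision enough shards to satisfy the tightest limit. A rough sketch, with a hypothetical `shards_needed` helper and made-up workload numbers:

```python
import math

# Per-shard limits as quoted in this deck.
INGEST_MB_PER_SHARD = 1.0
INGEST_RECORDS_PER_SHARD = 1000
EGRESS_MB_PER_SHARD = 2.0


def shards_needed(ingest_mb_per_sec, records_per_sec, egress_mb_per_sec):
    """Smallest shard count that satisfies every per-shard limit."""
    return max(
        1,
        math.ceil(ingest_mb_per_sec / INGEST_MB_PER_SHARD),
        math.ceil(records_per_sec / INGEST_RECORDS_PER_SHARD),
        math.ceil(egress_mb_per_sec / EGRESS_MB_PER_SHARD),
    )


# Hypothetical workload: 5 MB/sec in, 12,000 records/sec, 8 MB/sec read by consumers.
print(shards_needed(5, 12_000, 8))
```

In this example the record rate, not the byte rate, is the limiting factor.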
![Page 32: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/32.jpg)
![Page 33: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/33.jpg)
Building Kinesis Apps
Client library for fault-tolerant, at-least-once, real-time processing
[Figure: a Kinesis stream of Shard 1, Shard 2, Shard 3, Shard 4, ... Shard n, read by KCL Worker 1 through KCL Worker n running on EC2 instances]
• Key streaming application attributes:
– Be distributed, to handle multiple shards
– Be fault tolerant, to handle failures in hardware or software
– Scale up and down as the number of shards increases or decreases
• Kinesis Client Library (KCL) helps with distributed processing:
– Automatically starts a Kinesis Worker for each shard
– Simplifies reading from the stream by abstracting individual shards
– Increases / decreases Kinesis Workers as # of shards changes
– Checkpoints to keep track of a Worker’s location in the stream
– Restarts Workers if they fail
• Use the KCL with Auto Scaling groups
– Automatically add EC2 instances when load increases
– KCL redistributes Workers to use the new EC2 instances
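Checkpointing and at-least-once processing can be illustrated with a small simulation. This is not the KCL API: `process_shard`, the `checkpoints` dictionary, and the `fail_at` switch are all hypothetical names used to show why a restarted worker may process a record twice but never skips one.

```python
def process_shard(records, checkpoints, shard_id, handler,
                  fail_at=None, checkpoint_every=2):
    """Process records from the last checkpoint; optionally crash at index fail_at."""
    start = checkpoints.get(shard_id, 0)
    for index in range(start, len(records)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("worker crashed")
        handler(records[index])
        if (index + 1) % checkpoint_every == 0:
            checkpoints[shard_id] = index + 1   # remember progress periodically
    checkpoints[shard_id] = len(records)


records = ["r0", "r1", "r2", "r3", "r4"]
checkpoints = {}   # stand-in for a durable checkpoint store (assumption)
seen = []

try:
    process_shard(records, checkpoints, "shard-1", seen.append, fail_at=3)
except RuntimeError:
    pass  # a replacement worker takes over the shard

process_shard(records, checkpoints, "shard-1", seen.append)
print(seen)
```

The first worker crashes after handling `r2` but had only checkpointed through `r1`, so the replacement worker resumes at `r2` and handles it a second time. Handlers therefore need to tolerate duplicates.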
![Page 34: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/34.jpg)
Amazon Kinesis: Key Developer Benefits
Easy Administration
Managed service for real-time streaming data collection, processing and analysis. Simply create a new stream, set the desired level of capacity, and let the service handle the rest.
Real-time Performance
Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
High Throughput. Elastic
Seamlessly scale to match your data throughput rate and volume. You can easily scale up to gigabytes per second. The service will scale up or down based on your operational or business needs.
S3, EMR, Storm, Redshift, & DynamoDB Integration
Reliably collect, process, and transform all of your data in real-time & deliver to AWS data stores of choice, with Connectors for S3, Redshift, and DynamoDB.
Build Real-time Applications
Client libraries that enable developers to design and operate real-time streaming data processing applications.
Low Cost
Cost-efficient for workloads of any scale. You can get started by provisioning a small stream, and pay low hourly rates only for what you use.
![Page 35: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/35.jpg)
Customers using Amazon Kinesis
Mobile/Social Gaming
• Deliver continuous/real-time delivery of game insight data by 100’s of game servers
• Custom-built solutions operationally complex to manage, & not scalable
– Delay with critical business data delivery
– Developer burden in building reliable, scalable platform for real-time data ingestion/processing
– Slow-down of real-time customer insights
• Accelerate time to market of elastic, real-time applications, while minimizing operational overhead
Digital Advertising Tech.
• Generate real-time metrics, KPIs for online ad performance for advertisers/publishers
• Store + Forward fleet of log servers, and Hadoop based processing pipeline
– Lost data with Store/Forward layer
– Operational burden in managing reliable, scalable platform for real-time data ingestion/processing
– Batch-driven real-time customer insights
• Generate freshest analytics on advertiser performance to optimize marketing spend, and increase responsiveness to clients
![Page 36: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/36.jpg)
![Page 37: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/37.jpg)
Under NDA
Gaming Analytics with Amazon Kinesis
![Page 38: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/38.jpg)
Digital Ad. Tech Metering with Kinesis
Continuous Ad Metrics Extraction
Incremental Ad. Statistics Computation
Metering Record Archive
Ad Analytics Dashboard
![Page 39: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/39.jpg)
Amazon Kinesis Firehose
![Page 40: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/40.jpg)
Collection of Data
• Sources: web servers, application servers, connected devices, mobile phones, etc.
• Aggregation tool: scalable method to collect and aggregate (Flume, Kafka, Kinesis, queue)
• Data sink: reliable and durable destination or destinations
![Page 41: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/41.jpg)
Cloud Database & Storage
![Page 42: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/42.jpg)
Cloud Database and Storage Tier Anti-pattern
App/Web Tier
Client Tier
Database & Storage Tier = All-in-one?
![Page 43: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/43.jpg)
Cloud Database and Storage Tier — Use the Right Tool for the Job!
App/Web Tier
Client Tier
Data Tier
Database & Storage Tier
Search
Hadoop/HDFS
Cache
Blob Store
SQL NoSQL
![Page 44: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/44.jpg)
Database & Storage Tier
Amazon RDS
Amazon DynamoDB
Amazon ElastiCache
Amazon S3
Amazon Glacier
Amazon CloudSearch
HDFS on Amazon EMR
Cloud Database and Storage Tier — Use the Right Tool for the Job!
![Page 45: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/45.jpg)
What Database and Storage Should I Use?
• Data structure
• Query complexity
• Data characteristics: hot, warm, cold
![Page 46: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/46.jpg)
Data Structure and Query Types vs Storage Technology
• Structured, simple query: NoSQL (Amazon DynamoDB), Cache (Amazon ElastiCache)
• Structured, complex query: SQL (Amazon RDS), Search (Amazon CloudSearch)
• Unstructured, no query: Cloud Storage (Amazon S3, Amazon Glacier)
• Unstructured, custom query: Hadoop/HDFS (Amazon Elastic MapReduce)
Axes: Data Structure Complexity (vertical), Query Structure Complexity (horizontal)
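The four quadrants on this slide amount to a lookup from (data structure, query type) to candidate services. A minimal sketch, where the key strings and the `suggest_storage` helper are hypothetical and the service lists are taken from the slide:

```python
# Quadrants as listed on the slide; key names are illustrative assumptions.
STORAGE_BY_QUADRANT = {
    ("structured", "simple"): ["Amazon DynamoDB", "Amazon ElastiCache"],
    ("structured", "complex"): ["Amazon RDS", "Amazon CloudSearch"],
    ("unstructured", "none"): ["Amazon S3", "Amazon Glacier"],
    ("unstructured", "custom"): ["Amazon Elastic MapReduce"],
}


def suggest_storage(data_structure, query_type):
    """Return the candidate services for one quadrant, or raise for unknown input."""
    try:
        return STORAGE_BY_QUADRANT[(data_structure, query_type)]
    except KeyError:
        raise ValueError(
            f"no quadrant for {data_structure!r} data with {query_type!r} queries"
        ) from None


print(suggest_storage("structured", "complex"))
print(suggest_storage("unstructured", "none"))
```

This only narrows the field; the choice within a quadrant still depends on data temperature.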
![Page 47: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/47.jpg)
What is the Temperature of Your Data?
![Page 48: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/48.jpg)
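[Figure: data stores placed by data temperature and structure. Stores shown: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon Glacier. Vertical axis: Structure (low to high)]
Request rate: high to low
Cost/GB: high to low
Latency: low to high
Data volume: low to high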
![Page 49: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/49.jpg)
What Data Store Should I Use?

| | Amazon ElastiCache | Amazon DynamoDB | Amazon RDS | Amazon CloudSearch | Amazon EMR (HDFS) | Amazon S3 | Amazon Glacier |
|---|---|---|---|---|---|---|---|
| Average latency | ms | ms | ms, sec | ms, sec | sec, min, hrs | ms, sec, min (~ size) | hrs |
| Data volume | GB | GB–TBs (no limit) | GB–TB (3 TB max) | GB–TB | GB–PB (~ nodes) | GB–PB (no limit) | GB–PB (no limit) |
| Item size | B–KB | KB (64 KB max) | KB (~ row size) | KB (1 MB max) | MB–GB | KB–GB (5 TB max) | GB (40 TB max) |
| Request rate | Very High | Very High | High | High | Low – Very High | Low – Very High (no limit) | Very Low (no limit) |
| Storage cost $/GB/month | $$ | ¢¢ | ¢¢ | $ | ¢ | ¢ | ¢ |
| Durability | Low – Moderate | Very High | High | High | High | Very High | Very High |

Data temperature, left to right: Hot Data, Warm Data, Cold Data
![Page 50: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/50.jpg)
Decouple your storage and analysis engine
1. Single Version of Truth
2. Choice of multiple analytics Tools
3. Parallel execution from different teams
4. Lower cost
![Page 51: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/51.jpg)
S3 as a “single source of truth”
Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
S3
![Page 52: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/52.jpg)
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Kinesis
Choose depending upon design
![Page 53: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/53.jpg)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 54: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/54.jpg)
Process
• Answering questions about data
• Questions
– Analytics: Think SQL/data warehouse
– Classification: Think sentiment analysis
– Prediction: Think page-views prediction
– Etc
![Page 55: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/55.jpg)
Processing Frameworks
Generally come in three major types:
• Batch processing
• Stream processing
• Interactive query
![Page 56: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/56.jpg)
Batch Processing
• Take large amount of cold data and ask questions
• Takes minutes or hours to get answers back
Example: Generating hourly, daily, weekly reports
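As a concrete illustration of the hourly and daily report example, the sketch below aggregates a finished set of log lines in one pass. The log format, the sample lines, and the `report` helper are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log lines already at rest (cold data): "<ISO timestamp> <path>".
LOG_LINES = [
    "2016-11-09T10:05:12 /home",
    "2016-11-09T10:47:03 /cart",
    "2016-11-09T11:02:44 /home",
    "2016-11-09T11:15:00 /home",
    "2016-11-10T09:30:21 /checkout",
]


def report(lines, bucket_format):
    """Count requests per time bucket, where the bucket is a strftime pattern."""
    counts = Counter()
    for line in lines:
        timestamp, _path = line.split(" ", 1)
        parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S")
        counts[parsed.strftime(bucket_format)] += 1
    return dict(counts)


hourly = report(LOG_LINES, "%Y-%m-%d %H:00")
daily = report(LOG_LINES, "%Y-%m-%d")
print(hourly)
print(daily)
```

The whole data set is read before any answer appears, which is why batch jobs scale with data size and take minutes or hours on real volumes.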
![Page 57: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/57.jpg)
Process
![Page 58: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/58.jpg)
Stream Processing (AKA Real Time)
• Take small amount of hot data and ask questions
• Takes short amount of time to get your answer back
Example: 1-minute metrics
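Per-minute metrics can be maintained incrementally as each event arrives, instead of rescanning stored data. A minimal sketch using tumbling 60-second windows; `MinuteCounter` and the sample events are hypothetical.

```python
from collections import defaultdict


class MinuteCounter:
    """Keeps a running count per (minute window, metric) as events arrive."""

    def __init__(self):
        self.counts = defaultdict(int)

    def observe(self, epoch_seconds, metric):
        # Align the timestamp to the start of its 60-second window.
        window_start = epoch_seconds - (epoch_seconds % 60)
        self.counts[(window_start, metric)] += 1
        return self.counts[(window_start, metric)]


counter = MinuteCounter()
events = [(0, "click"), (15, "click"), (59, "view"),
          (60, "click"), (61, "click"), (125, "view")]
for timestamp, metric in events:
    counter.observe(timestamp, metric)

print(dict(counter.counts))
```

Each event touches only one small counter, so the answer for the current minute is available as soon as the event is seen.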
![Page 59: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/59.jpg)
Processing Tools
• Batch processing/analytics
– Amazon Redshift
– Amazon EMR: Hive/Tez, Pig, Spark, Impala, Presto, ...
• Stream processing
– Apache Spark Streaming
– Apache Storm (+ Trident)
– Amazon Kinesis client and connector library
![Page 60: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/60.jpg)
Amplab Big Data Benchmark
Scan query, Aggregate query, Join query
https://amplab.cs.berkeley.edu/benchmark/
![Page 61: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/61.jpg)
What Batch Processing Technology Should I Use?

| | Redshift | Impala | Presto | Spark | Hive |
|---|---|---|---|---|---|
| Query Latency (low is better) | Low | Low | Low | Low - Medium | Medium - High |
| Durability | High | High | High | High | High |
| Data Volume | 1.6 PB max | ~ Nodes | ~ Nodes | ~ Nodes | ~ Nodes |
| Managed | Yes | EMR bootstrap | EMR bootstrap | EMR bootstrap | Yes (EMR) |
| Storage | Native | HDFS | HDFS/S3 | HDFS/S3 | HDFS/S3 |
| # of BI Tools | High | Medium | High | Low | High |
![Page 62: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/62.jpg)
What Stream Processing Technology Should I Use?

| | Spark Streaming | Apache Storm + Trident | Kinesis Client Library |
|---|---|---|---|
| Scale/Throughput | ~ Nodes | ~ Nodes | ~ Nodes |
| Data Volume | ~ Nodes | ~ Nodes | ~ Nodes |
| Manageability | Yes (EMR bootstrap) | Do it yourself | EC2 + Auto Scaling |
| Fault Tolerance | Built-in | Built-in | KCL checkpointing |
| Programming languages | Java, Python, Scala | Java, Scala, Clojure | Java, Python |
![Page 63: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/63.jpg)
Amazon Kinesis Analytics
![Page 64: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/64.jpg)
Hadoop based Analysis
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
![Page 65: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/65.jpg)
Your choice of tools on Hadoop/EMR
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
![Page 66: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/66.jpg)
Hadoop based Analysis
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
![Page 67: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/67.jpg)
Hadoop based Analysis
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Spark and Shark
Cloudera Impala
![Page 68: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/68.jpg)
Hadoop is good for
1. Ad hoc query analysis
2. Large unstructured data sets
3. Machine learning and advanced analytics
4. Schema-less data
![Page 69: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/69.jpg)
SQL based Low Latency Analytics on structured data
![Page 70: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/70.jpg)
SQL based processing
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon Redshift
Petabyte-scale columnar data warehouse
![Page 71: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/71.jpg)
SQL based processing for unstructured data
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Pre-processing framework
Petabyte-scale columnar data warehouse
![Page 72: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/72.jpg)
Your choice of BI Tools on the cloud
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Pre-processing framework
![Page 73: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/73.jpg)
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
![Page 74: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/74.jpg)
Collaboration and Sharing insights
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
![Page 75: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/75.jpg)
Sharing results and visualizations
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Web App Server
Visualization tools
![Page 76: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/76.jpg)
Sharing results and visualizations at scale
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Web App Server
Visualization tools
![Page 77: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/77.jpg)
Sharing results and visualizations
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Business Intelligence Tools
Business Intelligence Tools
![Page 78: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/78.jpg)
Geospatial Visualizations
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Business Intelligence Tools
Business Intelligence Tools
GIS tools on Hadoop
GIS tools
Visualization tools
![Page 79: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/79.jpg)
Rinse and Repeat
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Visualization tools
Business Intelligence Tools
Business Intelligence Tools
GIS tools on Hadoop
GIS tools
AWS Data Pipeline
![Page 80: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/80.jpg)
The complete architecture
Amazon SQS
Amazon S3
DynamoDB
Any SQL or NoSQL store
Log aggregation tools
Amazon EMR
Amazon Redshift
Visualization tools
Business Intelligence Tools
Business Intelligence Tools
GIS tools on Hadoop
GIS tools
AWS Data Pipeline
![Page 81: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/81.jpg)
![Page 82: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/82.jpg)
Expanding analytics architecture
![Page 83: 찾아가는 AWS 세미나(구로,가산,판교) - AWS 기반 빅데이터 활용 방법 (김일호 솔루션즈 아키텍트)](https://reader030.vdocuments.mx/reader030/viewer/2022013123/586fe2e81a28ab18428b7d87/html5/thumbnails/83.jpg)
Adding Amazon Kinesis Analytics, Amazon Machine Learning, and Amazon Elasticsearch Service
Amazon Redshift
Amazon Elastic MapReduce
Amazon Glacier
Amazon DynamoDB
Amazon Machine Learning
Amazon Kinesis
Data Warehouse, Semi-structured, NoSQL, Predictive Models
Other Apps, Streaming
Amazon Simple Storage Service
Data Lake, Archive
Log Generator
Creating summary tables from log table
Amazon Elasticsearch Service
AWS Lambda
Amazon Kinesis Analytics
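The "creating summary tables from log table" step on this slide can be sketched with plain SQL. The slide does not say which engine runs it; the example below uses Python's built-in SQLite module as a local stand-in, and the `access_log` and `page_summary` tables and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical raw log table.
conn.execute("CREATE TABLE access_log (ts TEXT, page TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?, ?)",
    [
        ("2016-11-09 10:00:01", "/home", 200),
        ("2016-11-09 10:00:05", "/home", 500),
        ("2016-11-09 10:01:10", "/cart", 200),
        ("2016-11-09 10:02:30", "/home", 200),
    ],
)

# Summary table: one row per page with hit and server-error counts.
conn.execute(
    """
    CREATE TABLE page_summary AS
    SELECT page,
           COUNT(*) AS hits,
           SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS errors
    FROM access_log
    GROUP BY page
    """
)

summary = conn.execute(
    "SELECT page, hits, errors FROM page_summary ORDER BY page"
).fetchall()
print(summary)
```

Dashboards and BI tools can then query the small summary table instead of scanning the raw log table on every request.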