20141021 AWS Cloud Taekwon - Big Data on AWS

Big Data on AWS Markku Lepistö Principal Technology Evangelist @markkulepisto

Upload: amazon-web-services-korea

Post on 26-Jun-2015


Category:

Technology



DESCRIPTION

A presentation by Markku Lepisto, AWS APAC Principal Technology Evangelist.

TRANSCRIPT

Page 1: 20141021 AWS Cloud Taekwon - Big Data on AWS

Big Data on AWS Markku Lepistö Principal Technology Evangelist @markkulepisto

Page 2: 20141021 AWS Cloud Taekwon - Big Data on AWS

Does this Data make me look big?

Page 3: 20141021 AWS Cloud Taekwon - Big Data on AWS

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 4: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 5: 20141021 AWS Cloud Taekwon - Big Data on AWS

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 6: 20141021 AWS Cloud Taekwon - Big Data on AWS

Getting your Data into AWS

Amazon S3

Corporate Data Center

• Console Upload

• FTP

• AWS Import/Export

• S3 API

• Direct Connect

• Storage Gateway

• 3rd Party Commercial Apps

• Tsunami UDP

1
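The "S3 API" route listed above can be sketched as follows. This is a minimal illustration, not the presenter's code: it assumes the boto3 SDK and configured AWS credentials, and the bucket name, the `logs` prefix, and the date-partitioned key layout are hypothetical choices made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_object_key(source: str, local_path: str, when: datetime) -> str:
    """Build a date-partitioned key such as logs/2014/10/21/app.log (assumed layout)."""
    return f"{source}/{when:%Y/%m/%d}/{Path(local_path).name}"


def upload_to_s3(local_path: str, bucket: str, source: str = "logs") -> str:
    """Upload one local file to S3 and return the key it was stored under."""
    import boto3  # imported lazily so the key helper works without the SDK installed

    key = build_object_key(source, local_path, datetime.now(timezone.utc))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

Calling `upload_to_s3("/var/log/app.log", "my-example-bucket")` would store the file under a key for today's date. Partitioning keys by date is one common way to make later batch jobs read only the days they need.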

Page 7: 20141021 AWS Cloud Taekwon - Big Data on AWS

Write directly to a data source

Your application

Amazon S3

DynamoDB

Any other data store

Amazon S3

Amazon EC2

2

Page 8: 20141021 AWS Cloud Taekwon - Big Data on AWS

Zero Admin NoSQL Service

Unlimited Storage

Provisioned Throughput

Consistent <10ms response

Durable on SSD

Services: Database: Amazon DynamoDB

Compute Storage

AWS Global Infrastructure

Database

Networking
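"Provisioned Throughput" means capacity is reserved up front in read and write units. The arithmetic can be sketched as below, assuming the unit sizes from the DynamoDB documentation (one write unit covers a write of up to 1 KB per second, one read unit covers a strongly consistent read of up to 4 KB per second, and eventually consistent reads cost half). These figures are not stated on the slide.

```python
import math

WRITE_UNIT_BYTES = 1024  # assumed from DynamoDB documentation, not from the slide
READ_UNIT_BYTES = 4096   # assumed from DynamoDB documentation, not from the slide


def write_capacity_units(writes_per_second: float, item_size_bytes: int) -> int:
    """Write units needed for a steady write rate at a given item size."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return math.ceil(writes_per_second * units_per_write)


def read_capacity_units(reads_per_second: float, item_size_bytes: int,
                        strongly_consistent: bool = True) -> int:
    """Read units needed; eventually consistent reads cost half as much."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total /= 2
    return math.ceil(total)
```

For example, 100 writes per second of 1,500-byte items rounds each item up to two write units, so the table would need 200 write units.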

Page 9: 20141021 AWS Cloud Taekwon - Big Data on AWS

Queue, pre-process and then write to data source

Amazon Simple Queue Service

(SQS)

Amazon S3

DynamoDB

Any other data store

3
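The queue, pre-process, write pattern can be sketched in a few lines. To stay runnable without an AWS account, this sketch uses Python's in-process `queue.Queue` as a stand-in for SQS and a plain list as a stand-in for the data store; the `user` field and the clean-up rule are hypothetical.

```python
import json
import queue
from typing import Optional


def preprocess(raw: str) -> Optional[dict]:
    """Parse one raw message; return None for malformed or incomplete input."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "user" not in event:
        return None
    event["user"] = str(event["user"]).strip().lower()
    return event


def drain(messages: "queue.Queue[str]", store: list) -> int:
    """Pull every queued message, clean it, and append the good ones to the store."""
    written = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            return written
        event = preprocess(raw)
        if event is not None:
            store.append(event)
            written += 1
```

The queue decouples producers from the writer: producers can burst while the worker drains at its own pace, and bad input is dropped before it reaches the store.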

Page 10: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 11: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 12: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 13: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 14: 20141021 AWS Cloud Taekwon - Big Data on AWS

Aggregate and write to data source

Flume running on EC2

Amazon S3

Any other data store

HDFS

4

Page 15: 20141021 AWS Cloud Taekwon - Big Data on AWS

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Choose depending upon design

Page 16: 20141021 AWS Cloud Taekwon - Big Data on AWS

Courtesy http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html

S3 as a “single source of truth”

S3

Page 17: 20141021 AWS Cloud Taekwon - Big Data on AWS

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 18: 20141021 AWS Cloud Taekwon - Big Data on AWS

Hadoop based Analysis

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Page 19: 20141021 AWS Cloud Taekwon - Big Data on AWS

EMR is Hadoop in the Cloud

What is Amazon Elastic MapReduce (EMR)?

Page 20: 20141021 AWS Cloud Taekwon - Big Data on AWS

EMR Cluster

S3

Put the data into S3

Choose: Hadoop distribution, # of nodes, types of nodes, custom configs, Hive/Pig/etc.

Get the output from S3

Launch the cluster using the EMR console, CLI, SDK, or APIs

You can also store everything in HDFS

How does EMR work?
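EMR runs Hadoop, and the jobs Hadoop executes follow the MapReduce model: map each input record to key/value pairs, group the pairs by key, then reduce each group. The sketch below shows that model as a single-process word count. It is only an illustration of the programming model; it does not use Hadoop or EMR, and all function names are made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))
```

On a real cluster the map and reduce phases run in parallel across nodes, with input read from and output written to S3 or HDFS as the slide describes.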

Page 21: 20141021 AWS Cloud Taekwon - Big Data on AWS

S3

What can you run on EMR…

EMR Cluster

Page 22: 20141021 AWS Cloud Taekwon - Big Data on AWS

SQL based processing

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Pre-processing framework

Petabyte-scale columnar data warehouse

Page 23: 20141021 AWS Cloud Taekwon - Big Data on AWS

Amazon Redshift

• Easily and rapidly analyze petabytes of data

• Fully managed data warehouse service

• Automated deployment and administration

• 1/10th the cost of traditional data warehouses

• < $1000 / Terabyte / year

• Compatible with popular BI tools
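The two pricing claims on this slide can be turned into a quick back-of-the-envelope estimate. The sketch treats "< $1000 / Terabyte / year" as an upper bound and reads "1/10th the cost" as a factor of ten; the resulting traditional-warehouse figure is only what those two slide claims imply together, not a quoted price.

```python
REDSHIFT_MAX_USD_PER_TB_YEAR = 1000  # slide: "< $1000 / Terabyte / year"
TRADITIONAL_COST_FACTOR = 10         # slide: "1/10th the cost of traditional data warehouses"


def redshift_yearly_ceiling_usd(terabytes: float) -> float:
    """Upper bound on yearly Redshift cost implied by the slide."""
    return terabytes * REDSHIFT_MAX_USD_PER_TB_YEAR


def implied_traditional_yearly_usd(terabytes: float) -> float:
    """Traditional warehouse cost implied by combining both slide claims."""
    return redshift_yearly_ceiling_usd(terabytes) * TRADITIONAL_COST_FACTOR
```

For a 50 TB warehouse this gives a ceiling of $50,000 per year on Redshift against an implied $500,000 per year for a traditional system.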

Services: Database: Amazon Redshift

Compute Storage

AWS Global Infrastructure

Database

App Services

Deployment & Administration

Networking

Page 24: 20141021 AWS Cloud Taekwon - Big Data on AWS

Your choice of BI Tools on the cloud

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Pre-processing framework

Page 25: 20141021 AWS Cloud Taekwon - Big Data on AWS

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 26: 20141021 AWS Cloud Taekwon - Big Data on AWS

Collaboration and Sharing insights

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Page 27: 20141021 AWS Cloud Taekwon - Big Data on AWS

Sharing results and visualizations at scale

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Web App Server Visualization tools

Page 28: 20141021 AWS Cloud Taekwon - Big Data on AWS

Rinse and Repeat every day or hour

Page 29: 20141021 AWS Cloud Taekwon - Big Data on AWS

Rinse and Repeat

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Visualization tools

Business Intelligence Tools

Business Intelligence Tools

GIS tools on Hadoop

GIS tools

AWS Data Pipeline

Page 30: 20141021 AWS Cloud Taekwon - Big Data on AWS

The complete architecture

Amazon SQS

Amazon S3

DynamoDB

Any SQL or NoSQL Store

Log Aggregation tools

Amazon EMR

Amazon Redshift

Visualization tools

Business Intelligence Tools

Business Intelligence Tools

GIS tools on Hadoop

GIS tools

AWS Data Pipeline

Page 31: 20141021 AWS Cloud Taekwon - Big Data on AWS

No, it isn’t!

Page 32: 20141021 AWS Cloud Taekwon - Big Data on AWS

What about Real-Time?

Page 33: 20141021 AWS Cloud Taekwon - Big Data on AWS

Faster data is better data

Page 34: 20141021 AWS Cloud Taekwon - Big Data on AWS

HAPPENING NOW! real-time == stream analytics

Page 35: 20141021 AWS Cloud Taekwon - Big Data on AWS

Ingest data streams Store durably

Distribute Scale out

Process as packets flow in

Page 36: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 37: 20141021 AWS Cloud Taekwon - Big Data on AWS

Realtime Analytics in the Cloud

Amazon Kinesis Streaming Data Service

Page 38: 20141021 AWS Cloud Taekwon - Big Data on AWS

Kinesis architecture
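One piece of the Kinesis architecture that explains how a stream distributes and scales out is the mapping from a record's partition key to a shard. Kinesis hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains it. The sketch below assumes the shards split the hash space evenly, which is a simplification; the function names are made up for the example.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as an unsigned 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, shard_count: int) -> int:
    """Index of the shard owning this key, assuming evenly split hash ranges."""
    range_size = HASH_SPACE // shard_count
    # The last shard absorbs the remainder when the space does not divide evenly.
    return min(hash_key(partition_key) // range_size, shard_count - 1)
```

Because the mapping is deterministic, all records with the same partition key (for example one player's ID) land on the same shard and keep their order, while different keys spread across shards.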

Page 39: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 40: 20141021 AWS Cloud Taekwon - Big Data on AWS

Clash of Clans

Page 41: 20141021 AWS Cloud Taekwon - Big Data on AWS

In-game activity

Amazon Kinesis

Kinesis: Real-time data stream of in-game activity

Clash of Clans

Page 42: 20141021 AWS Cloud Taekwon - Big Data on AWS

Kinesis-enabled apps on EC2

In-game activity

Kinesis: Real-time data stream of in-game activity

Multiple Kinesis applications: Dashboards, analytics and storage

Clash of Clans

Real-time clickstream processing app

Amazon Kinesis

Page 43: 20141021 AWS Cloud Taekwon - Big Data on AWS

S3 Aggregate statistics

In-game activity

EC2: In-game engagement trends dashboard

Kinesis: Real-time data stream of in-game activity

Multiple Kinesis applications: Dashboards, analytics and storage

S3 and Glacier: Data storage and long term archival

Clash of Clans

Kinesis-enabled apps on EC2

Real-time clickstream processing app

Amazon Kinesis

Page 44: 20141021 AWS Cloud Taekwon - Big Data on AWS

Business-intelligence user

EC2: In-game engagement trends dashboard

In-game activity

S3 Aggregate statistics

Kinesis: Real-time data stream of in-game activity

Multiple Kinesis applications: Dashboards, analytics and storage

Data Warehouse: BI reporting and interactive queries

S3 and Glacier: Data storage and long term archival

Clash of Clans

Kinesis-enabled apps on EC2

EC2 Data Warehouse

Real-time clickstream processing app

Amazon Kinesis

Page 45: 20141021 AWS Cloud Taekwon - Big Data on AWS

Glacier

EC2 Data Warehouse

Clickstream archive

EC2: In-game engagement trends dashboard

Real-time clickstream processing app

Kinesis: Real-time data stream of in-game activity

Multiple Kinesis applications: Dashboards, analytics and storage

Data Warehouse: BI reporting and interactive queries

S3 and Glacier: Data storage and long term archival

In-game activity

S3

Clash of Clans

Aggregate statistics

Business-intelligence user

Kinesis-enabled apps on EC2

Amazon Kinesis

Page 46: 20141021 AWS Cloud Taekwon - Big Data on AWS

Demo

Sliding Window Analytics Live Dashboard

S3 Storage Redshift Data Warehouse

Kinesis

Website Clickstream logs
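The "sliding window analytics" behind a live dashboard can be sketched as a counter that only remembers recent events. This is a generic illustration, not the demo's code: it assumes clickstream events arrive with non-decreasing timestamps (in seconds) and counts page hits inside the last `window_seconds`. The class and method names are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowCounter:
    """Counts page hits seen within the most recent window of time."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, str]] = deque()  # oldest first
        self._counts: Counter = Counter()

    def add(self, timestamp: float, page: str) -> None:
        """Record one hit, then drop everything that fell out of the window."""
        self._events.append((timestamp, page))
        self._counts[page] += 1
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_page = self._events.popleft()
            self._counts[old_page] -= 1
            if self._counts[old_page] == 0:
                del self._counts[old_page]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """The n most-hit pages in the current window."""
        return self._counts.most_common(n)
```

A consumer reading from the stream would call `add` for each record and the dashboard would poll `top`. Each event is appended once and evicted once, so the cost per event stays constant on average.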

Page 47: 20141021 AWS Cloud Taekwon - Big Data on AWS

AWS Cloud Taekwon

Page 48: 20141021 AWS Cloud Taekwon - Big Data on AWS

Bonus

Internet of Things

Page 49: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 50: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 51: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 52: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 53: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 54: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

          Arduino Uno    Raspberry Pi
CPU       20MHz 8bit     700MHz 32bit
Memory    2 KB           512 MB
Storage   32 KB          SD card

Page 55: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 56: 20141021 AWS Cloud Taekwon - Big Data on AWS

Camera Microphone

Thermometer

Distance

GPS

Gyroscope

Actuator

Relay

Motor

Manipulator

Switch Pressure

Accelerometer

Wheel Propeller

Rotor

Page 57: 20141021 AWS Cloud Taekwon - Big Data on AWS

Challenges

Page 58: 20141021 AWS Cloud Taekwon - Big Data on AWS

Challenges

Thousands – Millions of Devices / Producers

Page 59: 20141021 AWS Cloud Taekwon - Big Data on AWS

Challenges

Thousands – Millions of Devices / Producers

Thousands – Millions of Users / Consumers

Page 60: 20141021 AWS Cloud Taekwon - Big Data on AWS

Distributed

Thousands – Millions of Devices / Producers

Thousands – Millions of Users / Consumers

Page 61: 20141021 AWS Cloud Taekwon - Big Data on AWS

At scale

Thousands – Millions of Devices / Producers

Thousands – Millions of Users / Consumers

Page 62: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 63: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Unlimited Storage – Memory

Unlimited Compute – Logic

Page 64: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 65: 20141021 AWS Cloud Taekwon - Big Data on AWS

Camera Microphone

Thermometer

Distance

GPS

Actuator

Relay

Motor

Manipulator

Switch Pressure

Wheel Propeller

Rotor

Gyroscope Accelerometer

Page 66: 20141021 AWS Cloud Taekwon - Big Data on AWS

Smart Devices

Powered by the Cloud

Page 67: 20141021 AWS Cloud Taekwon - Big Data on AWS

70

Page 68: 20141021 AWS Cloud Taekwon - Big Data on AWS
Page 69: 20141021 AWS Cloud Taekwon - Big Data on AWS

Demo

Page 70: 20141021 AWS Cloud Taekwon - Big Data on AWS

Arduino Yún

Page 71: 20141021 AWS Cloud Taekwon - Big Data on AWS

Raspberry Pi

Page 72: 20141021 AWS Cloud Taekwon - Big Data on AWS

Spark Core

Page 73: 20141021 AWS Cloud Taekwon - Big Data on AWS

Accelerometer

MQTT

Mosquitto MQTT Broker MQTT-Kinesis Bridge

AWS SDK

Amazon Kinesis Real-time Streaming Data Service

AWS APIs

AWS Elastic Beanstalk

Dashboard

Amazon SNS Earthquake

Alerts

Mobile Push
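The processing step in this pipeline, turning accelerometer readings into "earthquake" alerts, can be sketched as a threshold check on the acceleration magnitude. This is a guess at the general idea rather than the demo's implementation: it assumes readings are (x, y, z) tuples in units of g, that a device at rest reads about 1 g, and that the 0.5 g threshold is an arbitrary example value. The `alert` callable stands in for whatever publishes the notification (Amazon SNS in the diagram).

```python
import math
from typing import Callable, Iterable, Tuple

GRAVITY_G = 1.0  # magnitude a resting accelerometer reports, in g


def shake_intensity(x: float, y: float, z: float) -> float:
    """How far the acceleration magnitude deviates from rest, in g."""
    return abs(math.sqrt(x * x + y * y + z * z) - GRAVITY_G)


def process_readings(readings: Iterable[Tuple[float, float, float]],
                     alert: Callable[[str], None],
                     threshold_g: float = 0.5) -> int:
    """Call alert() for every reading above the threshold; return the alert count."""
    alerts = 0
    for x, y, z in readings:
        intensity = shake_intensity(x, y, z)
        if intensity > threshold_g:
            alert(f"Shake detected: {intensity:.2f} g above rest")
            alerts += 1
    return alerts
```

Passing the notifier in as a callable keeps the detection logic testable on its own, for example with `process_readings(readings, print)`.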

Page 74: 20141021 AWS Cloud Taekwon - Big Data on AWS

Demo

Page 75: 20141021 AWS Cloud Taekwon - Big Data on AWS

COLLECT | STORE | ANALYZE | SHARE

Import Export

Glacier

S3 EC2

Redshift DynamoDB

EMR

Data Pipeline

S3 Direct Connect

Kinesis

The AWS Big Data Portfolio

CloudFront

Page 76: 20141021 AWS Cloud Taekwon - Big Data on AWS

AWS Cloud Taekwon