AWS Big Data Solutions: Amazon Redshift, Amazon EMR
-
2014, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Big Data Solutions: Amazon Redshift, Amazon EMR, Amazon DynamoDB
Yifeng Jiang, Solutions Architect, Amazon Data Services Japan
TC-03
-
1. AWS
2.
3. Getting Started with Big Data Services
   Amazon Redshift
   Amazon Elastic MapReduce
   Amazon DynamoDB
4. Practical Deep Dive
-
AWS 40
Regions / Availability Zones / Content Delivery POPs
EC2, Elastic Load Balancing, Auto Scaling
S3, Glacier, EBS, Storage Gateway
RDS, DynamoDB, ElastiCache, Redshift
Kinesis, EMR, Data Pipeline
CloudFront
Virtual Private Cloud, Direct Connect, Route 53
WorkSpaces
SQS, SNS, SES, SWF, Elastic Transcoder, CloudSearch
Management & Administration: CloudWatch, CloudTrail, IAM, Management Console, SDK, CLI
CloudFormation, Elastic Beanstalk, OpsWorks
Ecosystem: Technology Partner / Consulting Partner
-
Big Data services on AWS
Ingestion: Kinesis
Storage: S3, Glacier
DWH: Redshift
NoSQL: DynamoDB
RDB: RDS
Hadoop: Elastic MapReduce
Workflow Management: Data Pipeline
-
AWS
-
1. AWS
2.
3. Getting Started with Big Data Services
   Amazon Redshift
   Amazon Elastic MapReduce
   Amazon DynamoDB
4. Practical Deep Dive
-
4
1.
2.
3.
4. BI API
-
[Architecture diagram. Components: Kinesis, S3, Glacier, RDS, EMR, Redshift, DynamoDB, Data Pipeline. Labels: Data, ETL, Sum, Web app, Analytics, Dashboard.]
-
Web server logs are collected in S3:
1.1.1.1, /login, 20140226000101,
192.168, /home, 20140226011226,
1.1.1.2, /home, 20140226011331,
ETL on EMR writes the result back to S3; it is then loaded into Redshift and queried from BI tools:
USER   PATH    TIMESTAMP
-----------------------------------
USER1  /login  2014-02-26 00:01:01
USER2  /home   2014-02-26 01:13:31
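The timestamp normalization in this ETL step can be sketched in a few lines of Python. This is only an illustration, not the job from the talk: the function name `parse_log_line` and the output field names are hypothetical, only the first three comma-separated fields are read, and the mapping from IP address to user is not shown because the slide does not define it.

```python
from datetime import datetime


def parse_log_line(line):
    """Split 'ip, path, yyyymmddHHMMSS, ...' and normalize the timestamp.

    Hypothetical helper: field order follows the sample log lines above.
    """
    fields = [field.strip() for field in line.split(",")]
    ip, path, raw_ts = fields[0], fields[1], fields[2]
    # Compact 14-digit timestamp -> 'YYYY-MM-DD HH:MM:SS'
    parsed = datetime.strptime(raw_ts, "%Y%m%d%H%M%S")
    return {
        "ip": ip,
        "path": path,
        "timestamp": parsed.strftime("%Y-%m-%d %H:%M:%S"),
    }


if __name__ == "__main__":
    print(parse_log_line("1.1.1.2, /home, 20140226011331,"))
```

In a real pipeline the same transformation would more likely run as a Hive query or a Hadoop Streaming mapper on EMR; the plain function above only shows the per-record logic.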
-
DMP use case
Web logs in S3:
USER1, 20140226000101,
USER2, 20140226011226,
USER1, 20140226011331,
ETL on EMR stores per-user results in DynamoDB, which are served through an API:
USER1: { Interest: [ Car, Home ], ... }
USER2: { Interest: [ Dog, Cat ], }
-
1. AWS
2.
3. Getting Started with Big Data Services
   Amazon DynamoDB
   Amazon Elastic MapReduce
   Amazon Redshift
4. Practical Deep Dive
-
Amazon Redshift
-
Amazon Redshift
Data Warehouse as a Service
Scales from 160 GB to 1.6 PB (MPP)
-
Amazon Redshift
[Diagram: PostgreSQL clients (psql) and BI tools connect over JDBC/ODBC; the nodes are linked by a 10GigE mesh; data is loaded from S3, DynamoDB and EMR.]
-
RDBMS (row-oriented storage)
orderid  name    price
1        Book    100
2        Pen     50
n        Eraser  70

Redshift (column-oriented storage)
orderid  name    price
1        Book    100
2        Pen     50
n        Eraser  70
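The difference between the two layouts can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not how Redshift stores data internally: the helper `to_columns` is hypothetical, and the last row uses `orderid` 3 in place of the slide's `n`.

```python
# Row-oriented layout: each record is stored together (typical RDBMS).
rows = [
    {"orderid": 1, "name": "Book", "price": 100},
    {"orderid": 2, "name": "Pen", "price": 50},
    {"orderid": 3, "name": "Eraser", "price": 70},  # 'n' on the slide; 3 is an assumption
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


# Column-oriented layout: each column is stored together.
columns = to_columns(rows)

# An aggregate over one column only needs to touch that column's values.
total_price = sum(columns["price"])

if __name__ == "__main__":
    print(columns["price"], total_price)
```

Analytic queries that aggregate a few columns over many rows benefit from the second layout because the unrelated columns never have to be read.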
-
Amazon Redshift node types
Node type    CPU               ECU  Memory   Storage        Network
dw2.large    2 virtual cores   7    15 GiB   160 GB (SSD)   0.2 GB/s
dw2.8xlarge  32 virtual cores  104  244 GiB  2.56 TB (SSD)  3.7 GB/s
dw1.xlarge   2 virtual cores   4.4  15 GiB   2 TB (HDD)     0.3 GB/s
dw1.8xlarge  16 virtual cores  35   120 GiB  16 TB (HDD)    2.4 GB/s
-
Loading CSV data from Amazon S3
Connect to Redshift:
psql -d mydb -h YOUR_REDSHIFT_ENDPOINT -p 5439 -U awsuser -W
Run the Redshift COPY command:
COPY customer FROM 's3://data/customer.tbl'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC'
DELIMITER ',' GZIP TIMEFORMAT 'auto';
-
Creating a table in Amazon Redshift
CREATE TABLE nginx (
  remote_addr char(15),
  time timestamp,
  request varchar(255),
  status integer,
  bytes bigint,
  ua varchar
);
-
SQL
SELECT ua, request, COUNT(*)
FROM nginx
GROUP BY ua, request;
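To see what this aggregation returns without a Redshift cluster, the same query can be tried locally. The sketch below uses SQLite from the Python standard library purely as a stand-in for Redshift; the function name `count_requests_by_ua`, the column types and the three sample rows are made up for illustration, and an `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3


def count_requests_by_ua(rows):
    """Run the slide's GROUP BY query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Column names follow the nginx table; types are simplified for SQLite.
    conn.execute(
        "CREATE TABLE nginx (remote_addr TEXT, time TEXT, request TEXT, "
        "status INTEGER, bytes INTEGER, ua TEXT)"
    )
    conn.executemany("INSERT INTO nginx VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT ua, request, COUNT(*) FROM nginx "
        "GROUP BY ua, request ORDER BY ua, request"
    ).fetchall()
    conn.close()
    return result


# Made-up sample rows, not data from the talk.
sample = [
    ("1.1.1.1", "2014-02-26 00:01:01", "/login", 200, 512, "curl"),
    ("1.1.1.2", "2014-02-26 01:13:31", "/home", 200, 1024, "curl"),
    ("1.1.1.1", "2014-02-26 01:14:00", "/home", 200, 1024, "curl"),
]

if __name__ == "__main__":
    for row in count_requests_by_ua(sample):
        print(row)
```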
-
Exporting query results to S3
UNLOAD ('SELECT * FROM nginx')
TO 's3://YOUR_BUCKET/PATH/'
CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC';
-
Tableau + Redshift
-
Amazon Redshift
SQL RDB ETL
-
Amazon Elastic MapReduce
-
Hadoop
Elastic MapReduce
Managed Hadoop on AWS; Hadoop 1, Hadoop 2 and MapR are available
Integrated with CloudWatch and S3
Works with data in S3 and DynamoDB
-
Launching a Hadoop cluster
aws emr create-cluster \
  --name bigdata-handson \
  --ami-version 3.2.1 \
  --applications Name=Hive \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
  --log-uri s3://PATH/TO/LOG/ \
  --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY
-
What each option of aws emr create-cluster specifies:
--name: the name of the Hadoop cluster
--ami-version: the AMI version, which determines the Hadoop version
--applications: the Hadoop applications to install (Hive here)
--instance-groups: the master and core instance groups of the Hadoop cluster
--log-uri: the S3 location for the Hadoop logs
--ec2-attributes: the VPC subnet and the EC2 key pair for the Hadoop cluster
-
aws emr create-cluster \
  --name bigdata-handson \
  --ami-version 3.2.1 \
  --applications Name=Hive \
  --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
  --log-uri s3://PATH/TO/LOG/ \
  --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY \
  --steps Type=HIVE,Name='Hive program',Args=[-f,s3://PATH/TO/QUERY.q] \
  --auto-terminate
-
Run Hadoop jobs on a schedule with cron or Data Pipeline
-
S3
-
S3
Like HDFS, S3 can be used for INPUT and OUTPUT by giving s3:// paths
S3 to S3:
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest s3://YOUR_BUCKET/output/
S3 to HDFS:
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest hdfs:///output/
-
Hive
CREATE EXTERNAL TABLE s3_as_external_table(
  user_id INT,
  movie_id INT,
  rating INT,
  unixtime STRING )
ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION 's3://mybucket/tables/';
-
ETL with Hive
INSERT INTO TABLE table2
SELECT
  column1,
  column2,
  column5
FROM table1;
Drops column3 and column4 of table1
-
EMR Deep Dive: Redshift and EMR
-
Amazon Redshift
RDB SQL BI
-
Amazon Elastic MapReduce
Hadoop MapReduce, Hive, Pig, Hadoop Streaming
Hive SQL / Redshift
-
EMR / Redshift
SQL/Redshift
EMR
-
Amazon Elastic MapReduce
Hadoop
S3
ETL
-
Amazon DynamoDB
-
DynamoDB
NoSQL as a Service
-
DynamoDB
-
No SPOF: data is replicated across 3 facilities
-
Provisioned Read/Write throughput
Read: 1,000 / Write: 100
Read: 500 / Write: 1,000
DB
-
DynamoDB
[Diagram: the client application (client side) accesses the database (service side) through an HTTP API, either directly or via an SDK.]
-
Getting started with DynamoDB
1. Decide the table's keys and indexes
2. Decide the Read/Write throughput
That's it, write your code!
-
DynamoDB
Hash key
Hash key & Range key
-
Hash key: ID-based lookup, like a KVS
An Item is retrieved by its UserId

Users Table
UserId (Hash)  Name   Nicknames          Mail              Address       Interests
aed9d          Bob    [ Rob, Bobby ]     [email protected]  some address  [ Car, Motor Cycle ]
edfg12         Alice  [ Allie ]
a8eesd         Carol  [ Caroline ]
f42aed         Dan    [ Daniel, Danny ]

DynamoDB has no auto_increment; use a UUID or similar for IDs
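Generating such an ID on the client is a one-liner in most languages. The following Python sketch shows one way to do it with the standard `uuid` module; the helper `new_user_item` and the attribute names are hypothetical and simply mirror the Users table above, and writing the item to DynamoDB is left out.

```python
import uuid


def new_user_item(name, nicknames):
    """Build a Users-table item with a client-generated UUID as the hash key."""
    return {
        "UserId": str(uuid.uuid4()),  # random UUID instead of auto_increment
        "Name": name,
        "Nicknames": list(nicknames),
    }


if __name__ == "__main__":
    print(new_user_item("Bob", ["Rob", "Bobby"]))
```

Random IDs also spread items evenly across hash key values, which a sequential counter would not.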
-
Hash key & Range key
-
Battle History
User (Hash)  Timestamp (Range)    Opponent  Result
Alice        2014-02-21 12:21:20  Bob       Lost
Alice        2014-02-21 12:42:01  Bob       Won
Alice        2014-02-24 09:48:00  Dan       Won
Alice        2014-02-25 16:21:11  Charlie   Won

Query: User = Alice, Timestamp within the last 7 days

Your Battle History
Charlie  02-25 16:21  Won!
Dan      02-24 09:48  Won!
Bob      02-21 12:42  Won!
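The access pattern behind this screen, fixing the hash key and restricting the range key, can be modeled in plain Python. This is an in-memory stand-in rather than a DynamoDB call: `query_recent` is a hypothetical helper, the "7" on the slide is assumed to mean the last 7 days, and the value of `now` is chosen only for the example.

```python
from datetime import datetime, timedelta

# Items of the Battle History table: User is the hash key, Timestamp the range key.
BATTLE_HISTORY = [
    {"User": "Alice", "Timestamp": "2014-02-21 12:21:20", "Opponent": "Bob", "Result": "Lost"},
    {"User": "Alice", "Timestamp": "2014-02-21 12:42:01", "Opponent": "Bob", "Result": "Won"},
    {"User": "Alice", "Timestamp": "2014-02-24 09:48:00", "Opponent": "Dan", "Result": "Won"},
    {"User": "Alice", "Timestamp": "2014-02-25 16:21:11", "Opponent": "Charlie", "Result": "Won"},
]


def query_recent(items, user, now, days=7):
    """Return the user's items from the last `days` days, newest first."""
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    # Hash key equality plus a range key condition; the timestamp strings
    # sort chronologically because of their fixed format.
    matched = [
        item for item in items
        if item["User"] == user and item["Timestamp"] >= cutoff
    ]
    return sorted(matched, key=lambda item: item["Timestamp"], reverse=True)


if __name__ == "__main__":
    now = datetime(2014, 2, 28, 12, 30, 0)  # assumed "current time" for the example
    for item in query_recent(BATTLE_HISTORY, "Alice", now):
        print(item["Opponent"], item["Timestamp"], item["Result"])
```

With the real service, the equivalent would be a Query request with a key condition on both keys and descending sort order.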
-
Secondary indexes
Local Secondary Index: an alternative Range key under the same Hash key
Global Secondary Index: an index with a different Hash key
-
Amazon DynamoDB
NoSQL; unlike Redshift, suited to OLTP workloads; no SQL or JOIN
-
1. AWS
2.
3. Getting Started with Big Data Services
   Amazon Redshift
   Amazon Elastic MapReduce
   Amazon DynamoDB
4. Practical Deep Dive
-
AWS and S3
-
[Diagram: S3 together with Elastic MapReduce, DynamoDB and Redshift.]
-
S3 and EMR
Hadoop
-
S3 and EMR
S3
HDFS, S3
-
S3 and EMR
S3
S3
-
S3 and Redshift
Redshift loads data from S3 with COPY:
COPY table_name FROM 's3://hoge'
CREDENTIALS 'aws_access_key_id=hoge;aws_secret_access_key=hoge'
DELIMITER ',';
-
S3 and Redshift
Redshift writes query results to S3 with UNLOAD:
UNLOAD ('SELECT * FROM table_name')
TO 's3://fuga/'
CREDENTIALS 'aws_access_key_id=hoge;aws_secret_access_key=hoge';
-
S3 and Redshift
S3
Redshift, S3
-
S3 and DynamoDB
DynamoDB
-
S3 and DynamoDB
S3
DynamoDB, S3
-
[Diagram: Amazon S3 as the hub connected to Elastic MapReduce, Redshift, EC2, RDS, EBS, CloudFront, Storage Gateway, Elastic Transcoder, Glacier and Data Pipeline.]
-
[Diagram: S3, DynamoDB, RDS, EMR, EC2, Redshift.]