AWS Big Data Solutions: Amazon Redshift, Amazon EMR

©2014, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Big Data Solutions: An Introduction to Amazon Redshift, Amazon EMR, and Amazon DynamoDB. Yifeng Jiang, Solutions Architect, Amazon Data Services Japan. TC-03, Technology Track.

Post on 13-Feb-2017

TRANSCRIPT

  • AWS Big Data Solutions: Amazon Redshift, Amazon EMR, Amazon DynamoDB

    Yifeng Jiang, Solutions Architect, Amazon Data Services Japan

    TC-03

  • Agenda

    1. AWS

    2.

    3. Getting Started with Big Data Services: Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB

    4. Practical Deep Dive

  • AWS 40

    AWS Regions / Availability Zones / Content Delivery POPs

    EC2, Elastic Load Balancing, Auto Scaling

    S3, Glacier, EBS, Storage Gateway

    RDS, DynamoDB, ElastiCache, Redshift

    Kinesis, EMR, Data Pipeline

    CloudFront

    Virtual Private Cloud, Direct Connect, Route 53

    WorkSpaces

    SQS, SNS, SES, SWF, Elastic Transcoder, CloudSearch

    Management & Administration: CloudWatch, CloudTrail, IAM, Management Console, SDK, CLI

    CloudFormation, Elastic Beanstalk, OpsWorks

    Ecosystem: Technology Partner / Consulting Partner

  • Big Data services on AWS

    Ingestion: Kinesis

    Storage: S3, Glacier

    RDB: RDS

    NoSQL: DynamoDB

    DWH: Redshift

    Hadoop: Elastic MapReduce

    Workflow Management: Data Pipeline

  • AWS

  • Agenda

    1. AWS

    2.

    3. Getting Started with Big Data Services: Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB

    4. Practical Deep Dive

  • 4

    1.

    2.

    3.

    4. BI, API

  • [Architecture diagram] Services: Kinesis, S3, Glacier, RDS, EMR, Redshift, DynamoDB, Data Pipeline

    Labels: Data, ETL, Sum, Web app, Analytics, Dashboard

  • Web, S3, ETL (EMR), Redshift, BI

    Web log (S3):

    1.1.1.1, /login, 20140226000101,
    192.168, /home, 20140226011226,
    1.1.1.2, /home, 20140226011331,

    After ETL (EMR, Redshift):

    USER   PATH    TIMESTAMP
    -----------------------------------
    USER1  /login  2014-02-26 00:00:01
    USER2  /home   2014-02-26 01:13:31
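The ETL step in this use case (raw access-log lines in, user/path/timestamp rows out) can be sketched in Python. This is a minimal sketch, not the EMR job from the talk. The `USERS` lookup that maps an address to a user name is a hypothetical stand-in, since the slide does not say how users are resolved, and the field order is assumed from the sample rows.

```python
from datetime import datetime

# Hypothetical lookup; the slides do not show how an address maps to a user.
USERS = {"1.1.1.1": "USER1", "1.1.1.2": "USER2"}

def parse_line(line):
    """Turn 'ip, path, yyyymmddHHMMSS,' into (user, path, datetime)."""
    ip, path, raw_ts = [field.strip() for field in line.split(",")[:3]]
    ts = datetime.strptime(raw_ts, "%Y%m%d%H%M%S")
    # Fall back to the raw address when no user is known.
    return USERS.get(ip, ip), path, ts

rows = [parse_line(line) for line in [
    "1.1.1.1, /login, 20140226000101,",
    "1.1.1.2, /home, 20140226011331,",
]]
for user, path, ts in rows:
    print(user, path, ts.strftime("%Y-%m-%d %H:%M:%S"))
```

At real log volumes the same per-line transform would run as a Hadoop or Hive job on EMR rather than in a single process.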

  • DMP: Web, S3, ETL (EMR), DynamoDB, API

    Input (S3):

    USER1, 20140226000101,
    USER2, 20140226011226,
    USER1, 20140226011331,

    Output (DynamoDB):

    USER1: { Interest: [ Car, Home ], ... }
    USER2: { Interest: [ Dog, Cat ], }

  • Agenda

    1. AWS

    2.

    3. Getting Started with Big Data Services: Amazon DynamoDB, Amazon Elastic MapReduce, Amazon Redshift

    4. Practical Deep Dive

  • Amazon Redshift

  • Amazon Redshift

    Data Warehouse as a Service

    Scales from 160 GB to 1.6 PB (MPP)
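The 160 GB and 1.6 PB endpoints can be related to per-node storage with a little arithmetic. The sketch below assumes that cluster capacity is simply per-node storage times node count, using the storage figures from the node-type slide later in this deck. It ignores compression, replication, and free-space headroom, and `nodes_needed` is a hypothetical helper, not an AWS API.

```python
import math

# Per-node storage in GB, from the node-type slide (decimal units assumed).
NODE_STORAGE_GB = {
    "dw2.large": 160,
    "dw2.8xlarge": 2560,
    "dw1.xlarge": 2000,
    "dw1.8xlarge": 16000,
}

def nodes_needed(data_gb, node_type):
    """Smallest node count whose raw storage covers data_gb."""
    return math.ceil(data_gb / NODE_STORAGE_GB[node_type])

# One dw2.large node covers the low end of the range.
small = nodes_needed(160, "dw2.large")
# 1.6 PB expressed in GB, served by the largest dense-storage node.
large = nodes_needed(1_600_000, "dw1.8xlarge")
print(small, large)
```

Under these assumptions the low end is a single dw2.large node and the high end is a cluster of dw1.8xlarge nodes.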

  • Amazon Redshift

    [Diagram labels] PostgreSQL (psql); BI; JDBC/ODBC; 10GigE Mesh; SQL; N; S3, DynamoDB, EMR

  • RDBMS vs. Redshift

    RDBMS (row-oriented):

    orderid  name    price
    1        Book    100
    2        Pen     50
    n        Eraser  70

    Redshift (column-oriented):

    orderid  name    price
    1        Book    100
    2        Pen     50
    n        Eraser  70
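The difference between the two layouts of this orders table can be illustrated with plain Python data structures. This is only a sketch of the idea, not how either engine stores data on disk. The third `orderid` is written as 3 where the slide uses `n`.

```python
# The same three orders, stored two ways.
rows = [
    {"orderid": 1, "name": "Book", "price": 100},
    {"orderid": 2, "name": "Pen", "price": 50},
    {"orderid": 3, "name": "Eraser", "price": 70},
]

# Column-oriented: one contiguous list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

def total_price_row_store(table):
    # Every full row is visited even though only 'price' is needed.
    return sum(row["price"] for row in table)

def total_price_column_store(table):
    # Only the 'price' column is read.
    return sum(table["price"])

print(total_price_row_store(rows), total_price_column_store(columns))
```

An aggregate over one column touches far less data in the columnar layout, which is why that layout suits analytic queries.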

  • Amazon Redshift node types

    Node type    Virtual cores  ECU  Memory   Storage         Network
    dw2.large    2              7    15 GiB   160 GB (SSD)    0.2 GB/s
    dw2.8xlarge  32             104  244 GiB  2.56 TB (SSD)   3.7 GB/s
    dw1.xlarge   2              4.4  15 GiB   2 TB (HDD)      0.3 GB/s
    dw1.8xlarge  16             35   120 GiB  16 TB (HDD)     2.4 GB/s

  • Load CSV data from Amazon S3

    Connect to Redshift with psql:

    psql -d mydb -h YOUR_REDSHIFT_ENDPOINT -p 5439 -U awsuser -W

    Load into Redshift with COPY:

    COPY customer FROM 's3://data/customer.tbl.'
    CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC'
    DELIMITER ',' GZIP TIMEFORMAT 'auto';
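The COPY options on this slide (a comma delimiter plus GZIP) imply that the files in S3 are gzip-compressed, comma-delimited text. A minimal sketch of producing such a payload with the Python standard library follows. The sample rows are hypothetical, and uploading the bytes to S3 is left out.

```python
import csv
import gzip
import io

def to_gzip_csv(rows):
    """Serialize rows as comma-delimited text, then gzip the result."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter=",", lineterminator="\n")
    writer.writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))

# Hypothetical customer rows; none of the values contain commas or quotes,
# so the csv module emits no quoting here.
payload = to_gzip_csv([
    (1, "Alice", "2014-02-26 00:00:01"),
    (2, "Bob", "2014-02-26 01:13:31"),
])
```

Splitting a large data set into several such files under one S3 prefix lets the load run in parallel across the cluster.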

  • Amazon Redshift: CREATE TABLE

    CREATE TABLE nginx (
      remote_addr char(15),
      time timestamp,
      request varchar(255),
      status integer,
      bytes bigint,
      ua varchar
    );

  • SQL

    SELECT ua, request, COUNT(*)
    FROM nginx
    GROUP BY ua, request;
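For readers less familiar with SQL, the grouping in this query is the same operation as counting distinct (ua, request) pairs. A small Python sketch with hypothetical sample rows:

```python
from collections import Counter

# Hypothetical rows holding the (ua, request) columns of the nginx table.
rows = [
    ("Mozilla/5.0", "GET /home"),
    ("Mozilla/5.0", "GET /home"),
    ("curl/7.30", "GET /login"),
]

# Counting identical pairs mirrors GROUP BY ua, request with COUNT(*).
counts = Counter(rows)
for (ua, request), n in counts.items():
    print(ua, request, n)
```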

  • S3

    UNLOAD ('SELECT * FROM nginx')
    TO 's3://YOUR_BUCKET/PATH/'
    CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC';

  • Tableau + Redshift

  • Amazon Redshift

    SQL, RDB, ETL

  • Amazon Elastic MapReduce

  • Elastic MapReduce

    Hadoop on AWS: Hadoop 1, Hadoop 2, and MapR

    CloudWatch, S3

    S3, DynamoDB

  • Hadoop

    aws emr create-cluster \
      --name bigdata-handson \
      --ami-version 3.2.1 \
      --applications Name=Hive \
      --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
      --log-uri s3://PATH/TO/LOG/ \
      --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY

  • Hadoop cluster options in the command above

    --name: cluster name
    --ami-version: AMI version (selects the Hadoop version)
    --applications: applications to install (Hive)
    --instance-groups: MASTER and CORE node counts and instance types
    --log-uri: S3 location for logs
    --ec2-attributes: VPC subnet and EC2 key pair

  • Hive step with --auto-terminate

    aws emr create-cluster \
      --name bigdata-handson \
      --ami-version 3.2.1 \
      --applications Name=Hive \
      --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.large InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.large \
      --log-uri s3://PATH/TO/LOG/ \
      --ec2-attributes SubnetId=subnet-a06474e6,KeyName=YOUR_KEY \
      --steps Type=HIVE,Name='Hive program',Args=[-f,s3://PATH/TO/QUERY.q] \
      --auto-terminate

  • cron, Data Pipeline, Hadoop

  • S3

  • S3

    Like HDFS, S3 can be used for INPUT and OUTPUT via s3://

    S3 to S3:

    hadoop jar YOUR_JAR.jar \
      --src s3://YOUR_BUCKET/logs/ \
      --dest s3://YOUR_BUCKET/output/

    S3 to HDFS:

    hadoop jar YOUR_JAR.jar \
      --src s3://YOUR_BUCKET/logs/ \
      --dest hdfs:///output/

  • Hive

    CREATE EXTERNAL TABLE s3_as_external_table(
      user_id INT,
      movie_id INT,
      rating INT,
      unixtime STRING )
    ROW FORMAT DELIMITED FIELDS
    TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION 's3://mybucket/tables/';
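The external table definition tells Hive how to read each line of the files under the S3 location: four tab-separated fields, three integers and a string. A rough Python equivalent of that per-line parsing, assuming well-formed input, looks like this. `Rating` and `parse_rating` are illustrative names, not part of Hive.

```python
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int
    unixtime: str  # kept as a string, matching the STRING column

def parse_rating(line):
    """Split one tab-delimited line into the four declared columns."""
    user_id, movie_id, rating, unixtime = line.rstrip("\n").split("\t")
    return Rating(int(user_id), int(movie_id), int(rating), unixtime)

# Hypothetical sample line in the declared layout.
sample = parse_rating("196\t242\t3\t881250949\n")
print(sample)
```

Because the table is EXTERNAL, Hive applies this schema at read time and leaves the files in S3 untouched.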

  • Hive ETL

    INSERT INTO TABLE table2
    SELECT
      column1,
      column2,
      column5
    FROM table1;

    Drops column3 and column4 from table1
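The projection performed by this INSERT ... SELECT can be pictured as keeping a fixed subset of fields from every record. A minimal Python sketch, with hypothetical column values:

```python
KEEP = ("column1", "column2", "column5")

def project(row, keep=KEEP):
    """Return a copy of row containing only the kept columns."""
    return {name: row[name] for name in keep}

# Hypothetical table1 contents.
table1 = [
    {"column1": "a", "column2": "b", "column3": "c", "column4": "d", "column5": "e"},
    {"column1": "f", "column2": "g", "column3": "h", "column4": "i", "column5": "j"},
]
table2 = [project(row) for row in table1]
print(table2)
```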

  • EMR Deep Dive

    Redshift, EMR

  • Amazon Redshift

    RDB, SQL, BI

  • Amazon Elastic MapReduce

    Hadoop: MapReduce, Hive, Pig, Hadoop Streaming

    Hive, SQL, Redshift

  • EMR, Redshift

    SQL/Redshift

    EMR

  • Amazon Elastic MapReduce

    Hadoop, S3, ETL

  • Amazon DynamoDB

  • DynamoDB

    NoSQL as a Service

  • DynamoDB

  • No SPOF: data is replicated across 3 facilities

  • Read and Write throughput

    Read: 1,000  Write: 100

    Read: 500  Write: 1,000

    DB
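DynamoDB expresses Read and Write settings like these as provisioned capacity units. The sketch below uses the sizing rules from the AWS documentation as I understand them: one read unit covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one write unit covers one write per second of an item up to 1 KB. The function names are hypothetical helpers, not SDK calls.

```python
import math

def read_units(reads_per_sec, item_kb, strongly_consistent=True):
    """Read capacity units for a steady read rate at a given item size."""
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

def write_units(writes_per_sec, item_kb):
    """Write capacity units for a steady write rate at a given item size."""
    return writes_per_sec * math.ceil(item_kb / 1)

# A read-heavy table with small items, similar in shape to the first
# example on the slide.
print(read_units(1000, 3), write_units(100, 1))
```

Reads and writes are provisioned independently, so a table can be tuned for a read-heavy or a write-heavy workload without changing the other setting.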

  • DynamoDB

    [Diagram labels] API; SDK; HTTP API; Database; Client Side / Service Side; Client application

  • DynamoDB

    1. Key, Index

    2. Read/Write

    That's it, write your code!

  • DynamoDB

    Hash key

    Hash key & Range key

  • ID, KVS

    UserId, Item

    Users Table

    UserId (Hash)  Name   Nicknames          Mail               Address       Interests
    aed9d          Bob    [ Rob, Bobby ]     [email protected]  some address  [ Car, Motor Cycle ]
    edfg12         Alice  [ Allie ]
    a8eesd         Carol  [ Caroline ]
    f42aed         Dan    [ Daniel, Danny ]

    DynamoDB has no auto_increment ID; use a UUID
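Since the table is keyed only by UserId and there is no auto_increment, the application has to generate the key itself. A minimal in-memory sketch of that pattern follows. The `users` dict merely stands in for the Users table and `put_user` is a hypothetical helper, not a DynamoDB SDK call.

```python
import uuid

users = {}  # stands in for the Users table: UserId (hash key) -> item

def put_user(name, nicknames):
    """Store an item under a freshly generated random UserId."""
    user_id = uuid.uuid4().hex  # random ID instead of auto_increment
    users[user_id] = {"Name": name, "Nicknames": nicknames}
    return user_id

bob_id = put_user("Bob", ["Rob", "Bobby"])
alice_id = put_user("Alice", ["Allie"])
print(users[bob_id]["Name"])
```

Random keys also spread items evenly across hash-key values, which a sequential counter would not.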

  • Hash key & Range key

  • Battle History

    User (Hash)  Timestamp (Range)    Opponent  Result
    Alice        2014-02-21 12:21:20  Bob       Lost
    Alice        2014-02-21 12:42:01  Bob       Won
    Alice        2014-02-24 09:48:00  Dan       Won
    Alice        2014-02-25 16:21:11  Charlie   Won

    Query: User (Alice), Timestamp within 7 days

    Your Battle History

    Charlie  02-25 16:21  Won!
    Dan      02-24 09:48  Won!
    Alice    02-21 12:42  Won!
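A hash-plus-range lookup of this kind can be modeled in memory: the hash key selects one partition, and the range key keeps items inside it sorted so that a time window becomes a slice. The sketch below is an illustration of the access pattern only, not the DynamoDB API. The `now` value is a hypothetical current time.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta

# Hash key -> list of (range key, item), kept sorted by range key.
table = defaultdict(list)

def put(user, timestamp, opponent, result):
    table[user].append((timestamp, {"Opponent": opponent, "Result": result}))
    table[user].sort(key=lambda entry: entry[0])

def query_since(user, since):
    """Items for one hash key whose range key is >= since, newest first."""
    entries = table[user]
    start = bisect_left([ts for ts, _ in entries], since)
    return list(reversed(entries[start:]))

put("Alice", datetime(2014, 2, 21, 12, 21, 20), "Bob", "Lost")
put("Alice", datetime(2014, 2, 21, 12, 42, 1), "Bob", "Won")
put("Alice", datetime(2014, 2, 24, 9, 48, 0), "Dan", "Won")
put("Alice", datetime(2014, 2, 25, 16, 21, 11), "Charlie", "Won")

now = datetime(2014, 2, 26)  # hypothetical "current" time
recent = query_since("Alice", now - timedelta(days=7))
```

Only Alice's partition is examined, which is why queries on a hash key plus a range condition stay fast as the table grows.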

  • Secondary Indexes (+1)

    Local Secondary Index: another Range key, same Hash key

    Global Secondary Index: another Hash key
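The two index types can be pictured as extra lookup structures built over the same items. In the sketch below, the local index keeps the table's hash key and sorts by a different attribute, while the global index groups by a different attribute altogether. This is an in-memory illustration, not the DynamoDB API. Bob's row is a hypothetical addition to the Battle History data.

```python
from collections import defaultdict

items = [
    {"User": "Alice", "Timestamp": "2014-02-21 12:21:20", "Opponent": "Bob", "Result": "Lost"},
    {"User": "Alice", "Timestamp": "2014-02-24 09:48:00", "Opponent": "Dan", "Result": "Won"},
    # Hypothetical row, not from the slides.
    {"User": "Bob", "Timestamp": "2014-02-25 10:00:00", "Opponent": "Dan", "Result": "Won"},
]

def build_index(rows, hash_attr, range_attr):
    """Group rows by hash_attr and sort each group by range_attr."""
    index = defaultdict(list)
    for row in rows:
        index[row[hash_attr]].append(row)
    for bucket in index.values():
        bucket.sort(key=lambda row: row[range_attr])
    return index

# Local secondary index: same hash key (User), different range key (Result).
lsi = build_index(items, "User", "Result")
# Global secondary index: a different hash key (Opponent).
gsi = build_index(items, "Opponent", "Timestamp")
```

With the global index, "who played against Dan" becomes a single-key lookup even though Opponent is not part of the table's primary key.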

  • Amazon DynamoDB

    NoSQL; Redshift, OLTP; SQL, JOIN

  • Agenda

    1. AWS

    2.

    3. Getting Started with Big Data Services: Amazon Redshift, Amazon Elastic MapReduce, Amazon DynamoDB

    4. Practical Deep Dive

  • AWS, S3

  • [Diagram] Elastic MapReduce, DynamoDB, Redshift, S3

  • S3, EMR

    Hadoop

  • S3, EMR

    S3

    HDFS, S3

  • S3, EMR

    S3

  • S3, Redshift

    COPY table_name FROM 's3://hoge'
    CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC'
    DELIMITER ',';

  • S3, Redshift

    UNLOAD ('SELECT * FROM ...')
    TO 's3://fuga/'
    CREDENTIALS 'aws_access_key_id=KEY;aws_secret_access_key=SEC';

  • S3, Redshift

    S3

  • S3, DynamoDB

    DynamoDB

  • S3, DynamoDB

    S3

    DynamoDB, S3

  • Amazon S3

    [Diagram] Services around S3: Elastic MapReduce, Redshift, EC2, RDS, Storage Gateway, EBS, CloudFront, Elastic Transcoder, Glacier, Data Pipeline

  • [Diagram] S3, DynamoDB, RDS, EMR, EC2, Redshift
