AWS Black Belt Tech Series: AWS Data Pipeline

©2014, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Data Pipeline, AWS Black Belt Tech Webinar 2014 (formerly the Meister Series), Yuta Imai, Solutions Architect, Amazon Data Services Japan

Upload: amazon-web-services-japan

Posted on 04-Jul-2015


Category: Technology



DESCRIPTION

AWS Black Belt Tech Webinar 2014 (formerly the Meister Series): AWS Data Pipeline

TRANSCRIPT

1. AWS Data Pipeline. AWS Black Belt Tech Webinar 2014 (formerly the Meister Series). Yuta Imai, Solutions Architect, Amazon Data Services Japan

2. Agenda: 1. … 2. AWS Data Pipeline … 3. AWS Data Pipeline …
3. Agenda: 1. … 2. AWS Data Pipeline … 3. AWS Data Pipeline …
4. AWS services: Technology Partner / Consulting Partner Ecosystem; Management & Administration (CloudWatch, CloudTrail, IAM, Management Console, SDK, CLI); Kinesis, EMR, Data Pipeline; CloudFormation, Elastic Beanstalk, OpsWorks; SQS, SNS, SES, SWF, Elastic Transcoder, CloudSearch; Auto Scaling; S3, Glacier, EBS, Storage Gateway; RDS, DynamoDB, ElastiCache, Redshift; WorkSpaces; EC2, Elastic Load Balancing, CloudFront, Virtual Private Cloud, Direct Connect, Route 53; Regions / Availability Zones / Content Delivery POPs
5. Big Data services on AWS: NoSQL (DynamoDB), DWH (Redshift), Storage (S3, Glacier), RDB (RDS), Hadoop (Elastic MapReduce), Workflow Management (Data Pipeline), Interface, Kinesis
6. (no recoverable text)
7. (no recoverable text)
8. Four points: 1. … 2. … 3. … 4. … (BI, API)
9. [Diagram labels: Web app, Kinesis, S3, Glacier, DynamoDB, RDS, EMR, Data Pipeline, Redshift, Data, ETL, Sum, Analytics, Dashboard]
10. ETL
11. Four points: 1. … 2. … 3. … 4. … (Extraction, Transform, Load)
12. Four points: 1. … 2. … 3. … 4. … (Extraction, Transform, Load: ETL)
13. ETL
14. Agenda: 1. … 2. AWS Data Pipeline … 3. AWS Data Pipeline …
15. ETL
16. ETL
17. ETL across S3, RDS, EMR, Redshift, DynamoDB. [Flowchart: Input data ready? Yes → Run; No → …]
18. AWS Data Pipeline is ETL of things.
19. AWS Data Pipeline: 1. … 2. … 3. … 4. …
20. Pipeline: Data Node: …; Activity: …; Schedule: …; Resource: …; Precondition: …; Action: …
21. Activities: CopyActivity, EmrActivity, HiveActivity, HiveCopyActivity, PigActivity, RedshiftCopyActivity, SqlActivity, ShellCommandActivity
22. Input / Output: S3, SQL, DynamoDB, Redshift. Data Format: CSV …
23. Staging (S3DataNode, SqlDataNode): ShellCommandActivity is off; enable with stage = true, then reference ${INPUTx_STAGING_DIR} and ${OUTPUTx_STAGING_DIR}. HiveActivity is on; reference ${inputx} and ${outputx}, e.g. {"id": "MyHiveActivity", "hiveScript": "INSERT OVERWRITE TABLE ${output1} select * from ${input1};"}. [Diagram label: Table]
24. Preconditions: DynamoDB table …; S3 …; S3 …; Shell …. [Flowchart in a pipeline: S3 key exists? Yes → Copy; No → …]
25. Schedule: Cron (runs at the start of each period) or Time Series (runs at the end of each period); EC2 / EMR start …; period from 15 min to 3 years. [Diagram: Start / Cron1, TS1 / Cron2, Period, TS2 / Cron3, Period]
26. Schedule (2): Backfill … 1 …; CLI --force …; times are UTC in YYYY-MM-DDTHH:MM:SS format; convert with #{inTimeZone(myDateTime,'Asia/Tokyo')}
27. EC2: EC2-Classic, EC2-VPC. EMR: spot instances. Multi-region.
28. (2) Activity: … 20 …; Resource: …; …: EMR 1 … [Diagram: Task 1, Task 2, Task 3]
29. SNS … 1-6 … 3 … [Diagram: Task 1, Task 2, Alert, Alert]
30. GUI …; CSV/TSV …; EMR/EC2 …
31. GUI … JSON. Create, upload, and activate a pipeline with the CLI: ./datapipeline --create pipeline_name --put pipeline_file --activate --force
   {"objects": [{"id": "ActivityId_YYbJV", "schedule": {"ref": "ScheduleId_X8kbH"}, "scriptUri": "s3://mybucket/myscript.sh", "name": "ShellActivity1", "runsOn": {"ref": "ResourceId_5nJIh"}, ...]}
   ./datapipeline validate my-pipeline.json --credential credentials.json --force --id df-0123456789ABCD
32. Pricing (as of 2014/3/19; EC2, S3 and other resources are billed separately). Per activity or precondition per month, on AWS: $1.00 (high frequency), $0.60 (low frequency, once a day or less); on-premises: $2.50 (high frequency), $1.50 (low frequency). Inactive pipeline: $1.00.
33. Example: S3 → Redshift
34. Example: S3 → Redshift. Type: S3DataNode; Directory Path: S3 …; Compression; Data Format; S3 …
35. Example: S3 → Redshift. Type: TSV or CSV
36. Example: S3 → Redshift. Type: RedshiftCopyActivity; Insert Mode: KEEP_EXISTING; Input: S3; Output: Redshift; Runs on: EC2 (… Redshift COPY …)
37. Example: S3 → Redshift. Type: Ec2Resource; Schedule; EC2 …; Terminate After: … (terminate the EC2 instance after RedshiftCopyActivity …)
38. Example: S3 → Redshift. Type: Schedule; Period; Start/End Date Time
39. Example: S3 → Redshift. Type: RedshiftDataNode; TableName; Database (Redshift database)
40. AWS Data Pipeline …
41. AWS Data Pipeline ETL example. Raw web logs:
   1.1.1.1, /login, 20140226000101
   192.168, /home, 20140226011226
   1.1.1.2, /home, 20140226011331
   EMR output:
   USER   PATH    TIMESTAMP
   USER1  /login  2014-02-26 00:01:01
   USER2  /home   2014-02-26 01:13:31
   [Diagram labels: Web, S3, ETL, S3, Redshift, BI]
42. AWS Data Pipeline [Diagram labels: Web, S3, ETL, EMR, S3, Redshift, BI; Data, Resource, Logic, Schedule, Preconditions]
43. Agenda: 1. … 2. AWS Data Pipeline … 3. AWS Data Pipeline …
44. ShellCommandActivity. What is ShellCommandActivity? … S3 … Shell … Ec2Activity, EmrActivity …
45. Task Runner on EC2 (user-managed): java -jar TaskRunner-1.0.jar --config ~/credentials.json --workerGroup=WorkerGroup1 --region=MyRegion --logUri=s3://mybucket/foldername (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-how-task-runner-user-managed.html)
46. Data Pipeline …; AWS Lambda [new!] … S3 …
47. DynamoDB Import/Export: DynamoDB Import/Export … Data Pipeline …
48. Getting Started: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/welcome.html
49. Dataduct (https://github.com/coursera/dataduct): Coursera … Data Pipeline … YAML
50. See also: (BDT303) Construct Your ETL Pipeline with AWS Data Pipeline, Amazon EMR, and Amazon Redshift, http://www.slideshare.net/AmazonWebServices/bdt303-construct-your-etl-pipeline-with-aws-data-pipeline-amazon-emr-and-amazon-redshift-aws-reinvent-2014
51. (no recoverable text)
52. AWS Data Pipeline is ETL of things! ETL … ETL … AWS Data Pipeline … ETL …
53. AWS Data Pipeline … ETL …: http://www.slideshare.net/AmazonWebServices/bdt303-construct-your-etl-pipeline-with-aws-data-pipeline-amazon-emr-and-amazon-redshift-aws-reinvent-2014
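Slides 17 and 24 describe preconditions: an activity runs only when its input is ready, for example when an S3 key exists (Yes → Copy). Data Pipeline evaluates preconditions itself. The Python sketch below only models that check-then-run logic. The helper names `make_key_exists` and `run_when_ready`, the attempt limit, the wait interval, and the "SKIPPED" result are all hypothetical, and S3 is replaced by an in-memory set so the example runs offline.

```python
import time
from typing import Callable, Set


def make_key_exists(bucket: Set[str]) -> Callable[[str], bool]:
    """Build a stand-in for an S3 key-exists check, backed by an in-memory set."""
    def key_exists(key: str) -> bool:
        return key in bucket
    return key_exists


def run_when_ready(precondition: Callable[[], bool],
                   activity: Callable[[], str],
                   max_attempts: int = 3,
                   wait_seconds: float = 0.0) -> str:
    """Run the activity once the precondition holds ("Yes"); otherwise wait and re-check ("No")."""
    for _ in range(max_attempts):
        if precondition():
            return activity()
        time.sleep(wait_seconds)
    return "SKIPPED: precondition never met"


# Usage: the copy runs because the (hypothetical) key is present.
bucket = {"logs/2014-02-26/access.log"}
key_exists = make_key_exists(bucket)
print(run_when_ready(lambda: key_exists("logs/2014-02-26/access.log"), lambda: "COPIED"))
```

Keeping the check separate from the activity mirrors the slide's split between a Precondition object and the Activity it guards.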
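Slide 23 shows a HiveActivity script that refers to staged data as ${input1} and ${output1}. Data Pipeline performs this substitution itself. Purely as an illustration, Python's `string.Template` uses the same `${name}` syntax, so the idea can be sketched as follows. The function name and the table names `staged_logs` and `daily_summary` are hypothetical.

```python
from string import Template
from typing import Dict, List


def render_hive_script(script: str, inputs: List[str], outputs: List[str]) -> str:
    """Replace ${inputN} / ${outputN} with the given table names (1-based, as on the slide)."""
    mapping: Dict[str, str] = {}
    for i, table in enumerate(inputs, start=1):
        mapping[f"input{i}"] = table
    for i, table in enumerate(outputs, start=1):
        mapping[f"output{i}"] = table
    # substitute() raises KeyError for any placeholder that has no mapping.
    return Template(script).substitute(mapping)


script = "INSERT OVERWRITE TABLE ${output1} select * from ${input1};"
print(render_hive_script(script, ["staged_logs"], ["daily_summary"]))
```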
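Slide 25 contrasts the two schedule styles: Cron-style runs at the start of each period, and Time Series-style runs at the end. Assuming a fixed period (the slide gives 15 minutes as the minimum), the run times can be computed as below. This is a hypothetical helper, not a Data Pipeline API. The strings "cron" and "timeseries" are meant to mirror the values of the pipeline's `scheduleType` field.

```python
from datetime import datetime, timedelta
from typing import List


def run_times(start: datetime, period: timedelta, periods: int, style: str) -> List[datetime]:
    """When each of the first `periods` intervals would run.

    "cron":       at the beginning of each interval
    "timeseries": at the end of each interval
    """
    if style not in ("cron", "timeseries"):
        raise ValueError(f"unknown schedule style: {style}")
    if period < timedelta(minutes=15):
        raise ValueError("slide 25 gives 15 minutes as the shortest period")
    offset = timedelta(0) if style == "cron" else period
    return [start + i * period + offset for i in range(periods)]


start = datetime(2014, 2, 26, 0, 0)
for style in ("cron", "timeseries"):
    print(style, [t.strftime("%H:%M") for t in run_times(start, timedelta(hours=1), 3, style)])
```

With an hourly period, the cron-style runs fall at 00:00, 01:00 and 02:00, and the time-series runs fall one period later at 01:00, 02:00 and 03:00.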
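Slide 26 notes that Data Pipeline times are UTC in YYYY-MM-DDTHH:MM:SS form and can be shifted with #{inTimeZone(myDateTime,'Asia/Tokyo')}. The same conversion in Python is sketched below, and it is useful when you choose a start time in UTC. The sketch assumes Tokyo's fixed UTC+9 offset (Japan does not observe daylight saving time) in place of a time-zone database. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%S"                    # YYYY-MM-DDTHH:MM:SS, as on slide 26
TOKYO = timezone(timedelta(hours=9), "JST")   # Asia/Tokyo: UTC+9, no daylight saving time


def utc_to_tokyo(value: str) -> str:
    """Shift a UTC timestamp to Tokyo local time."""
    utc = datetime.strptime(value, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(TOKYO).strftime(FMT)


def tokyo_to_utc(value: str) -> str:
    """Find the UTC time that corresponds to a desired Tokyo local time."""
    local = datetime.strptime(value, FMT).replace(tzinfo=TOKYO)
    return local.astimezone(timezone.utc).strftime(FMT)


print(tokyo_to_utc("2014-02-26T09:00:00"))
```

For example, a job meant to start at 09:00 in Tokyo needs a UTC start time of 00:00 on the same day.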
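The JSON on slide 31 links objects by id through {"ref": …} values (the activity's `schedule` and `runsOn`). A local pre-check that every ref resolves can catch typos before you upload a definition. The sketch below is hypothetical and is not the service's own validation. The schedule and resource objects in the usage are bare stubs, because the slide elides their fields.

```python
import json
from typing import Any, Iterator, List


def iter_refs(value: Any) -> Iterator[str]:
    """Yield every {"ref": ...} target found anywhere inside a pipeline object."""
    if isinstance(value, dict):
        if set(value) == {"ref"}:
            yield value["ref"]
        else:
            for v in value.values():
                yield from iter_refs(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_refs(v)


def dangling_refs(definition: str) -> List[str]:
    """List 'source -> target' pairs whose target id is not defined in the same file."""
    objects = json.loads(definition)["objects"]
    ids = {obj["id"] for obj in objects}
    return [f"{obj['id']} -> {ref}" for obj in objects for ref in iter_refs(obj) if ref not in ids]


activity = {"id": "ActivityId_YYbJV", "schedule": {"ref": "ScheduleId_X8kbH"},
            "scriptUri": "s3://mybucket/myscript.sh", "name": "ShellActivity1",
            "runsOn": {"ref": "ResourceId_5nJIh"}}
# Stub objects: only ids, since the slide does not show their fields.
print(dangling_refs(json.dumps({"objects": [activity, {"id": "ScheduleId_X8kbH"}]})))
```

Here the resource stub is deliberately left out, so the check reports the unresolved `runsOn` reference.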
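The price list on slide 32 reduces to simple arithmetic. Each activity or precondition is billed per month according to where it runs (on AWS or on-premises) and how often it runs (low frequency means once a day or less). Inactive pipelines carry their own fee, and EC2, S3 and other resources are billed separately. The sketch below uses the 2014/3/19 figures from the slide in integer cents, which avoids floating-point rounding. Current prices may differ, and the function name and activity mix are hypothetical.

```python
from typing import Iterable, Tuple

# US cents per activity or precondition per month, as quoted on slide 32 (as of 2014/3/19).
PRICE_CENTS = {
    ("aws", "high"): 100,
    ("aws", "low"): 60,       # low frequency: runs once a day or less
    ("onprem", "high"): 250,
    ("onprem", "low"): 150,
}
INACTIVE_PIPELINE_CENTS = 100


def monthly_fee_cents(items: Iterable[Tuple[str, str]], inactive_pipelines: int = 0) -> int:
    """Data Pipeline fee only; EC2, S3 and other resources are billed separately."""
    active = sum(PRICE_CENTS[(where, frequency)] for where, frequency in items)
    return active + INACTIVE_PIPELINE_CENTS * inactive_pipelines


# Hypothetical mix: one daily copy on AWS plus one hourly on-premises shell activity.
fee = monthly_fee_cents([("aws", "low"), ("onprem", "high")])
print(f"${fee / 100:.2f} per month")
```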
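Slides 33 to 39 configure an S3 → Redshift copy in the GUI. Written as a pipeline definition, the same objects might look like the Python sketch below, which emits JSON intended for the CLI's --put option on slide 31. Several values are hypothetical: the object ids, the bucket path, the cluster id, the database name, the 1-day period, and the `terminateAfter` value. Redshift credentials are omitted. Check the field names against the AWS Data Pipeline object reference before use.

```python
import json
from typing import Any, Dict, List


def ref(obj_id: str) -> Dict[str, str]:
    """Pipeline definitions point at other objects with {"ref": id}."""
    return {"ref": obj_id}


def s3_to_redshift_definition(s3_dir: str, table: str, start: str) -> Dict[str, List[Dict[str, Any]]]:
    """Sketch of the objects from slides 33-39; ids and most values are hypothetical."""
    schedule = ref("DailySchedule")
    objects: List[Dict[str, Any]] = [
        {"id": "DailySchedule", "type": "Schedule",
         "period": "1 day", "startDateTime": start},
        {"id": "InputFormat", "type": "TSV"},
        {"id": "S3Input", "type": "S3DataNode", "schedule": schedule,
         "directoryPath": s3_dir, "compression": "none", "dataFormat": ref("InputFormat")},
        # Hypothetical cluster and database; connection credentials are deliberately omitted.
        {"id": "RedshiftDb", "type": "RedshiftDatabase",
         "clusterId": "my-cluster", "databaseName": "analytics"},
        {"id": "RedshiftOutput", "type": "RedshiftDataNode", "schedule": schedule,
         "tableName": table, "database": ref("RedshiftDb")},
        # The EC2 instance that runs the copy is terminated after this time limit.
        {"id": "CopyRunner", "type": "Ec2Resource", "schedule": schedule,
         "terminateAfter": "2 Hours"},
        {"id": "S3ToRedshiftCopy", "type": "RedshiftCopyActivity", "schedule": schedule,
         "insertMode": "KEEP_EXISTING", "input": ref("S3Input"),
         "output": ref("RedshiftOutput"), "runsOn": ref("CopyRunner")},
    ]
    return {"objects": objects}


definition = s3_to_redshift_definition("s3://mybucket/input/", "access_log", "2014-03-19T00:00:00")
print(json.dumps(definition, indent=2))
```

Each GUI form on the slides (S3DataNode, data format, RedshiftCopyActivity, Ec2Resource, Schedule, RedshiftDataNode) becomes one object. The objects are wired together only through `ref` values.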
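The transformation on slide 41 turns raw log lines (IP, path, compact timestamp) into USER / PATH / TIMESTAMP rows. A minimal Python version of that ETL step is sketched below. The IP-to-user lookup is hypothetical, because the slide does not say how users are resolved. The sketch also assumes that lines without a known user, such as the 192.168 line, are dropped, which matches the slide's output. Note that 20140226000101 parses as 2014-02-26 00:01:01.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def transform(lines: List[str], users: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Turn 'IP, PATH, YYYYMMDDHHMMSS' log lines into (USER, PATH, TIMESTAMP) rows."""
    rows: List[Tuple[str, str, str]] = []
    for line in lines:
        ip, path, raw_ts = (field.strip() for field in line.split(","))
        user = users.get(ip)
        if user is None:
            continue  # assumption: lines whose address maps to no known user are dropped
        ts = datetime.strptime(raw_ts, "%Y%m%d%H%M%S").strftime("%Y-%m-%d %H:%M:%S")
        rows.append((user, path, ts))
    return rows


raw = [
    "1.1.1.1, /login, 20140226000101",
    "192.168, /home, 20140226011226",
    "1.1.1.2, /home, 20140226011331",
]
users = {"1.1.1.1": "USER1", "1.1.1.2": "USER2"}  # hypothetical IP-to-user lookup
for row in transform(raw, users):
    print(*row)
```

In the pipeline on slides 41 and 42 this logic would run on EMR between the two S3 stages, before the load into Redshift.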