Hadoop in the cloud with AWS' EMR


Post on 17-Jan-2015






A quick introduction to, and walkthrough of, the AWS Elastic MapReduce (EMR) service. Part of a larger course at http://bit.ly/get-hadoop


<ul>
<li> 1. Hadoop in the Cloud: AWS Elastic MapReduce. What is EMR? How does EMR compare to Hadoop? Use cases. </li>
<li> 2. EMR is an AWS Service. An AWS review is helpful for understanding EMR; Infiniteskills offers a course: http://bit.ly/learn-aws. AWS is constantly changing and evolving, so check the current documentation: http://aws.amazon.com/documentation/elasticmapreduce/ </li>
<li> 3. EMR Overview. Abstracts out cluster setup &amp; management: integrated provisioning, tooling, debugging, and monitoring; AWS is constantly tuning and optimizing; failed nodes are automatically re-provisioned by AWS. Reduced costs: clusters shut down automatically by default; excellent for sporadic MapReduce needs. Integration with AWS: leverage cost-effective EC2 instances for processing and S3 for storage; monitoring is done via CloudWatch. </li>
<li> 4. EMR Architecture. [Diagram: a Master Instance Group (EC2), a Core Instance Group (EC2 instances with HDFS), a Task Instance Group (EC2 instances), and S3.] The master group controls the cluster. The core group runs the DataNode &amp; TaskTracker daemons. The task group runs tasks; its instances can be added &amp; removed. S3 can be used for data input / output. The master group coordinates core + task activities and manages cluster state. Core + task instances read from and write to S3. </li>
<li> 5. EMR AWS Integration. Datastore pull / push to RDS, DynamoDB, and S3. Derived data can be stored in Redshift (via AWS Data Pipeline) for further post-processing. Data can be pre-processed with Kinesis. </li>
<li> 6. What you give up with EMR. Control: always 2-3 months behind Hadoop releases; cannot use CDH or HDP releases (although MapR is supported). Speed (if you're not an AWS customer). Vendor lock-in. </li>
<li> 7. EMR Use Cases. Already an AWS customer: lots of data in S3 / DynamoDB / RDS. Sporadic MapReduce needs. Proof-of-concepting Hadoop. Ease of use: seamless, near-infinite scale; simple administration. </li>
<li> 8. Hadoop in the Cloud: AWS Elastic MapReduce. What is EMR? How does EMR compare to Hadoop? Benefits &amp; downsides. Use cases. </li>
</ul>
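The master / core / task layout from the architecture slide can be sketched as a cluster request. The following is a minimal illustration, not the course's own code: it only builds a dictionary in the shape that boto3's EMR `run_job_flow` call accepts, and it makes no AWS calls. The bucket name, cluster name, instance type (`m5.xlarge`), release label (`emr-6.15.0`), node counts, and the `cat` / `wc` streaming step are all assumptions chosen for illustration; the role names are the EMR default roles, which may not exist in a given account.

```python
from typing import Any, Dict


def instance_group(name: str, role: str, instance_type: str, count: int) -> Dict[str, Any]:
    """Describe one EMR instance group; role must be MASTER, CORE, or TASK."""
    if role not in ("MASTER", "CORE", "TASK"):
        raise ValueError(f"unknown instance role: {role}")
    return {
        "Name": name,
        "InstanceRole": role,
        "InstanceType": instance_type,  # assumed instance type, pick your own
        "InstanceCount": count,
        "Market": "ON_DEMAND",
    }


def build_cluster_request(bucket: str, core_nodes: int = 2, task_nodes: int = 4) -> Dict[str, Any]:
    """Build a run_job_flow-style request mirroring the three instance groups."""
    groups = [
        # Master group: controls the cluster and tracks its state.
        instance_group("master", "MASTER", "m5.xlarge", 1),
        # Core group: holds HDFS data and also runs tasks.
        instance_group("core", "CORE", "m5.xlarge", core_nodes),
    ]
    if task_nodes > 0:
        # Task group: compute only, so it can be added or removed freely.
        groups.append(instance_group("task", "TASK", "m5.xlarge", task_nodes))

    return {
        "Name": "example-emr-cluster",   # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",    # assumed release label
        "LogUri": f"s3://{bucket}/logs/",
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "Instances": {
            "InstanceGroups": groups,
            # False: the cluster shuts down once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "streaming-example",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Input is read from S3 and output is written back to S3.
                    "Args": [
                        "hadoop-streaming",
                        "-input", f"s3://{bucket}/input/",
                        "-output", f"s3://{bucket}/output/",
                        "-mapper", "cat",
                        "-reducer", "wc",
                    ],
                },
            }
        ],
    }


if __name__ == "__main__":
    request = build_cluster_request("my-example-bucket")
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["InstanceCount"])
```

With credentials and the roles in place, a dictionary like this could be passed to `boto3.client("emr").run_job_flow(**request)`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` reflects the "clusters shut down automatically" point from the overview slide, and leaving out the task group (`task_nodes=0`) still yields a working master + core cluster.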

