(SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014
DESCRIPTION
Amazon Elastic MapReduce is one of the largest Hadoop operators in the world. Since its launch five years ago, AWS customers have launched more than 5.5 million Hadoop clusters. In this talk, we introduce you to Amazon EMR design patterns such as using Amazon S3 instead of HDFS, taking advantage of both long-lived and short-lived clusters, and other Amazon EMR architectural patterns. We talk about how to scale your cluster up or down dynamically and introduce you to ways you can fine-tune your cluster. We also share best practices to keep your Amazon EMR cluster cost-efficient.

TRANSCRIPT
![Page 1: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/1.jpg)
November 13th, 2014 | Las Vegas, NV
Ian Meyers, Amazon Web Services
![Page 2: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/2.jpg)
Diagram: the AWS platform (AWS Global Infrastructure; Compute; Storage; Database; App Services; Deployment & Administration; Networking; Analytics)
Amazon Elastic MapReduce (Amazon EMR)
Managed, elastic Hadoop (1.x & 2.x) cluster
Integrates with Amazon S3, Amazon DynamoDB, Amazon Kinesis and Amazon Redshift
Installs Storm, Spark, Presto, Hive, Pig, Impala and end-user tools automatically
Native support for Spot Instances
Integrated HBase NoSQL database
![Page 3: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/3.jpg)
![Page 4: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/4.jpg)
![Page 5: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/5.jpg)
![Page 6: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/6.jpg)
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop
--keyword-config-file: merge the values in a new configuration file into the existing one
--keyword-key-value: override individual values (e.g. --mapred-key-value)

| Configuration File Name | Configuration File Keyword | File Name Shortcut | Key-Value Pair Shortcut |
|---|---|---|---|
| core-site.xml | core | C | c |
| hdfs-site.xml | hdfs | H | h |
| mapred-site.xml | mapred | M | m |
| yarn-site.xml | yarn | Y | y |
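As a minimal sketch of how the key-value shortcuts combine, the hypothetical helper below (not part of EMR) builds the comma-separated string that would follow `--args` for the configure-hadoop bootstrap action. It assumes the lowercase shortcuts are passed with a leading dash (e.g. `-m`); the property value `4` is purely illustrative.

```python
# Hypothetical helper (not part of EMR): turns a dict of Hadoop settings into
# the comma-separated --args string for the configure-hadoop bootstrap action,
# using the key-value shortcuts from the table above.
KEY_VALUE_SHORTCUTS = {
    "core": "-c",    # core-site.xml
    "hdfs": "-h",    # hdfs-site.xml
    "mapred": "-m",  # mapred-site.xml
    "yarn": "-y",    # yarn-site.xml
}


def configure_hadoop_args(keyword: str, settings: dict[str, str]) -> str:
    """Build e.g. '-m,key1=value1,-m,key2=value2' to override individual values."""
    flag = KEY_VALUE_SHORTCUTS[keyword]
    parts: list[str] = []
    for key, value in settings.items():
        parts.extend([flag, f"{key}={value}"])
    return ",".join(parts)


if __name__ == "__main__":
    # Illustrative value only: allow 4 concurrent map tasks per TaskTracker
    print(configure_hadoop_args("mapred", {"mapred.tasktracker.map.tasks.maximum": "4"}))
```

The key-value form is the safer choice for one-off tweaks, since it overrides only the named properties instead of merging a whole file.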
![Page 7: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/7.jpg)
Set the number of mappers per TaskTracker
Useful for map tasks with a small memory footprint
More work done per instance
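The benefit is easiest to see with a rough memory budget. This sketch assumes each map task runs in its own JVM with a fixed heap and ignores daemon and OS overhead; the 12,000 MB node budget is a hypothetical figure.

```python
def map_slots(task_memory_mb: int, heap_per_map_mb: int) -> int:
    """Number of map tasks whose heaps fit in the memory set aside for tasks."""
    if heap_per_map_mb <= 0:
        raise ValueError("heap per map task must be positive")
    return task_memory_mb // heap_per_map_mb


# Hypothetical node with 12,000 MB available to map tasks
print(map_slots(12000, 1024))  # 11
print(map_slots(12000, 512))   # 23: halving the heap roughly doubles the slots
```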
![Page 8: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/8.jpg)
Set HDFS block size to 1MB
Useful for smaller files when HDFS is used
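A small block size helps because, in the common case where one input split equals one HDFS block, the number of blocks a file occupies sets how many mappers can read it in parallel. The sketch below assumes that one-split-per-block model and compares 1MB blocks with 64MB blocks (the Hadoop 1 default) for a hypothetical 10MB file.

```python
import math


def input_splits(file_size_mb: float, block_size_mb: float) -> int:
    """Blocks (and hence default input splits) a single file occupies in HDFS."""
    return max(1, math.ceil(file_size_mb / block_size_mb))


print(input_splits(10, 64))  # 1: one mapper processes the whole 10MB file
print(input_splits(10, 1))   # 10: ten mappers can process it in parallel
```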
![Page 9: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/9.jpg)
Reuse mappers (JVM reuse)
Mapper startup time: ~2-20 seconds
Useful for jobs with a large number of mappers
Mappers must be “clean” after each run (relevant for Java)
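The arithmetic behind mapper reuse: without reuse every task pays the JVM start-up cost; with reuse (in Hadoop 1, the `mapred.job.reuse.jvm.num.tasks` property) that cost is paid once per JVM. This sketch assumes a constant 10 s start-up, picked from the 2-20 s range above, and reports time summed over all tasks rather than wall-clock time.

```python
import math


def jvm_startup_seconds(num_tasks: int, startup_s: float, tasks_per_jvm: int = 1) -> float:
    """Total JVM start-up time summed over all tasks (not wall-clock time)."""
    jvm_count = math.ceil(num_tasks / tasks_per_jvm)
    return jvm_count * startup_s


print(jvm_startup_seconds(1000, 10))                    # 10000 (no reuse)
print(jvm_startup_seconds(1000, 10, tasks_per_jvm=10))  # 1000 (each JVM runs 10 tasks)
```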
![Page 10: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/10.jpg)
Configure process heap size, Java opts, and allow for replacing the hadoop-user-env.sh
Hadoop 1
Hadoop 2
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons --args --{namenode}-heap-size=2048,--{namenode}-opts=-XX:GCTimeRatio=19
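The `-XX:GCTimeRatio=N` option in the example sets a goal (not a hard limit) for the HotSpot throughput collector: spend at most 1/(1+N) of total time in garbage collection. A quick check of what the value 19 means:

```python
def gc_time_goal(gc_time_ratio: int) -> float:
    """Target fraction of total time spent in GC for -XX:GCTimeRatio=N: 1 / (1 + N)."""
    return 1.0 / (1 + gc_time_ratio)


print(f"{gc_time_goal(19):.0%}")  # 5%  (the value used in the example above)
print(f"{gc_time_goal(99):.0%}")  # 1%  (HotSpot's default for the parallel collector)
```

So `GCTimeRatio=19` relaxes the default 1% target to 5%, which typically lets the collector keep a smaller heap footprint.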
![Page 11: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/11.jpg)
![Page 12: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/12.jpg)
![Page 13: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/13.jpg)
Diagram: EMRFS and HDFS on Amazon EMR; file data in Amazon S3; processed-files registry in Amazon DynamoDB
![Page 14: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/14.jpg)
![Page 15: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/15.jpg)
![Page 16: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/16.jpg)
![Page 17: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/17.jpg)
![Page 18: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/18.jpg)
![Page 19: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/19.jpg)
![Page 20: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/20.jpg)
≈60sec * 15MB 1GB
![Page 21: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/21.jpg)
![Page 22: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/22.jpg)
aws emr add-steps --cluster-id <cluster>
--steps Name=GroupSmallFiles,
Type=CUSTOM_JAR,
Jar=/home/hadoop/lib/emr-s3distcp-1.0.jar,
Args=[--src,s3://myawsbucket/cf,
--dest,hdfs:///local,
--groupBy,.*(i-\w.log).*,
--targetSize,128…
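To make the `--groupBy`/`--targetSize` behaviour concrete, here is a simplified local model (not S3DistCp itself): files whose names share the same regex capture group are concatenated, and each group is split into output chunks of at most `targetSize` MB. The file names, sizes and the regex are hypothetical, and ignoring non-matching files is a choice of this sketch.

```python
import re
from collections import defaultdict

MB = 1024 * 1024


def group_files(sizes: dict[str, int], pattern: str, target_mb: int) -> dict[str, list[list[str]]]:
    """Simplified model of S3DistCp --groupBy/--targetSize.

    Files whose names share the same first regex capture group are grouped,
    then packed in name order into output chunks of at most target_mb
    (a single file larger than the target gets a chunk to itself).
    Files that do not match the pattern are ignored in this sketch.
    """
    regex = re.compile(pattern)
    groups: dict[str, list[str]] = defaultdict(list)
    for name in sorted(sizes):
        match = regex.match(name)
        if match:
            groups[match.group(1)].append(name)

    result: dict[str, list[list[str]]] = {}
    for key, names in groups.items():
        chunks: list[list[str]] = [[]]
        chunk_bytes = 0
        for name in names:
            if chunks[-1] and chunk_bytes + sizes[name] > target_mb * MB:
                chunks.append([])  # start a new output file
                chunk_bytes = 0
            chunks[-1].append(name)
            chunk_bytes += sizes[name]
        result[key] = chunks
    return result


# Hypothetical log files, grouped by the instance ID embedded in each name
sizes = {
    "cf/2014-11-13-00.i-aaa.log": 40 * MB,
    "cf/2014-11-13-01.i-aaa.log": 50 * MB,
    "cf/2014-11-13-02.i-aaa.log": 60 * MB,
    "cf/2014-11-13-00.i-bbb.log": 30 * MB,
}
for key, chunks in group_files(sizes, r".*\.(i-\w+)\.log", 128).items():
    print(key, chunks)
```

Four small inputs become three outputs here; on real data with thousands of tiny log files the reduction in file count (and therefore in map tasks) is far larger.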
![Page 23: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/23.jpg)
![Page 24: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/24.jpg)
| Algorithm | % Space Remaining | Encoding Speed | Decoding Speed |
|---|---|---|---|
| GZIP | 13% | 21MB/s | 118MB/s |
| LZO | 20% | 135MB/s | 410MB/s |
| Snappy | 22% | 172MB/s | 409MB/s |
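The table translates directly into a size-versus-time estimate. This sketch assumes the speeds are throughput measured on uncompressed data for a single stream, and uses 1 GB (1024 MB) of input as an example.

```python
# (fraction of space remaining, encoding MB/s, decoding MB/s), from the table above
CODECS = {
    "GZIP": (0.13, 21, 118),
    "LZO": (0.20, 135, 410),
    "Snappy": (0.22, 172, 409),
}


def estimate(codec: str, input_mb: float) -> tuple[float, float, float]:
    """Return (compressed MB, seconds to encode, seconds to decode)."""
    remaining, encode_mb_s, decode_mb_s = CODECS[codec]
    return input_mb * remaining, input_mb / encode_mb_s, input_mb / decode_mb_s


for name in CODECS:
    size_mb, enc_s, dec_s = estimate(name, 1024)
    print(f"{name}: {size_mb:.0f} MB, encode {enc_s:.1f} s, decode {dec_s:.1f} s")
```

For 1 GB, GZIP leaves about 133 MB but needs roughly 49 s to encode, against about 7.6 s for LZO and 6 s for Snappy: GZIP minimises storage and transfer, LZO and Snappy minimise CPU time.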
![Page 25: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/25.jpg)
--outputCodec,lzo
![Page 26: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/26.jpg)
![Page 27: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/27.jpg)
Diagram: Amazon EMR cluster with a core instance group (HDFS) and a task instance group, alongside Amazon S3
![Page 28: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/28.jpg)
HUGE Benefit!!
![Page 29: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/29.jpg)
![Page 30: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/30.jpg)
Diagram: two Amazon EMR clusters sharing data in Amazon S3
![Page 31: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/31.jpg)
![Page 32: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/32.jpg)
![Page 33: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/33.jpg)
![Page 34: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/34.jpg)
![Page 35: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/35.jpg)
S3DistCp
![Page 36: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/36.jpg)
![Page 37: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/37.jpg)
S3DistCp
![Page 38: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/38.jpg)
![Page 39: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/39.jpg)
![Page 40: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/40.jpg)
![Page 41: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/41.jpg)
Diagram: Amazon EMR, HDFS and Pig
![Page 42: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/42.jpg)
Hive 0.13.1
• Support for ORC
• Window functions
• Decimal types
• TRUNCATE command
• Better optimiser (less need for hinting)

Pig 0.12.0
• Streaming UDFs not written in Java
• Native support for Avro
• Native support for Parquet
• Improved data types

Impala 1.1
• In-memory SQL engine
• Support for HBase tables
• Support for Parquet, a column-oriented file format
• Query and interactive shells

HBase 0.94.18
• Database snapshotting
• Improved read caching and seek optimisation
• Improved transactions
![Page 43: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/43.jpg)
Amazon EMR Integration with Amazon Kinesis
Read data directly into Hive, Pig, Streaming and Cascading from Kinesis streams
No intermediate data persistence required
Simple way to introduce real-time sources into batch-oriented systems
Multi-application support and automatic checkpointing
![Page 44: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/44.jpg)
DROP TABLE call_data_records;
CREATE TABLE call_data_records (
  start_time bigint,
  end_time bigint,
  phone_number STRING,
  carrier STRING,
  recorded_duration bigint,
  calculated_duration bigint,
  lat double,
  long double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
STORED BY 'com.amazon.emr.kinesis.hive.KinesisStorageHandler'
TBLPROPERTIES("kinesis.stream.name"="TestAggregatorStream");
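`ROW FORMAT DELIMITED FIELDS TERMINATED BY ","` means each Kinesis record is expected to be one comma-separated line whose fields map, in order, onto the declared columns. The sketch below models that mapping locally; the sample record is hypothetical, and raising on a bad field count is a deliberate simplification.

```python
from typing import Any, Callable

# Column names and Hive types from the CREATE TABLE statement above
# (bigint -> int, STRING -> str, double -> float)
SCHEMA: list[tuple[str, Callable[[str], Any]]] = [
    ("start_time", int),
    ("end_time", int),
    ("phone_number", str),
    ("carrier", str),
    ("recorded_duration", int),
    ("calculated_duration", int),
    ("lat", float),
    ("long", float),
]


def parse_record(line: str) -> dict[str, Any]:
    """Split one comma-delimited Kinesis record into typed columns.

    Stricter than Hive's delimited format, which yields NULL for missing or
    malformed values instead of raising.
    """
    fields = line.rstrip("\n").split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


# Hypothetical record
print(parse_record("1415836800,1415836860,555-0100,ExampleCarrier,60,60,36.11,-115.17"))
```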
![Page 45: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/45.jpg)
![Page 46: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/46.jpg)
![Page 47: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/47.jpg)
![Page 48: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/48.jpg)
![Page 49: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/49.jpg)
![Page 50: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/50.jpg)
![Page 51: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/51.jpg)
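| EC2 Instance | Map Tasks | Reduce Tasks |
|---|---|---|
| m1.small | 2 | 1 |
| m1.large | 3 | 1 |
| m1.xlarge | 8 | 3 |
| m2.xlarge | 3 | 1 |
| m2.2xlarge | 6 | 2 |
| m2.4xlarge | 14 | 4 |
| m3.xlarge | 6 | 1 |
| m3.2xlarge | 12 | 3 |
| cg1.4xlarge | 12 | 3 |
| cc2.8xlarge | 24 | 6 |
| c3.4xlarge | 24 | 6 |
| hi1.4xlarge | 24 | 6 |
| hs1.8xlarge | 24 | 6 |
| cr1.8xlarge & c3.8xlarge | 48 | 12 |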
Chart: Memory (GB), Mappers*, Reducers*, CPU (ECU units) and Local Storage (GB) per instance type (log scale)
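The per-instance slot counts multiply out to cluster-wide concurrency. This sketch uses a subset of the table and assumes every node counted runs a TaskTracker (core or task nodes, not the master); it also computes how many "waves" of map tasks a job needs when all slots stay busy.

```python
import math

# Map and reduce slots per instance (a subset of the table above)
SLOTS = {
    "m1.large": (3, 1),
    "m1.xlarge": (8, 3),
    "m3.xlarge": (6, 1),
    "m3.2xlarge": (12, 3),
    "c3.4xlarge": (24, 6),
    "c3.8xlarge": (48, 12),
}


def cluster_slots(instance_type: str, nodes: int) -> tuple[int, int]:
    """Concurrent (map, reduce) tasks across `nodes` task-running instances."""
    maps, reduces = SLOTS[instance_type]
    return maps * nodes, reduces * nodes


def map_waves(total_map_tasks: int, instance_type: str, nodes: int) -> int:
    """Rounds of map tasks needed when every slot is kept busy."""
    return math.ceil(total_map_tasks / cluster_slots(instance_type, nodes)[0])


print(cluster_slots("m1.xlarge", 11))   # (88, 33)
print(map_waves(150, "m1.xlarge", 11))  # 2
```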
![Page 52: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/52.jpg)
![Page 53: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/53.jpg)
![Page 54: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/54.jpg)
| Instance | Cost / Map Task | Cost / Reduce Task |
|---|---|---|
| m1.large | $0.08 | $0.15 |
| m1.xlarge | $0.06 | $0.15 |
| m3.xlarge | $0.04 | $0.07 |
| m3.2xlarge | $0.04 | $0.07 |
![Page 55: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/55.jpg)
| Instance | Cost / Map Task | Cost / Reduce Task |
|---|---|---|
| c1.medium | $0.13 | $0.13 |
| c1.xlarge | $0.35 | $0.70 |
| c3.xlarge | $0.05 | $0.11 |
| c3.2xlarge | $0.05 | $0.11 |
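Putting the two cost tables side by side makes the comparison mechanical. This sketch simply copies the per-task costs from the slides (USD) and returns the cheapest instance types for map or reduce work; ties are all returned, sorted by name.

```python
# Per-task costs from the two tables above: instance -> (map task, reduce task), USD
COST_PER_TASK = {
    "m1.large": (0.08, 0.15),
    "m1.xlarge": (0.06, 0.15),
    "m3.xlarge": (0.04, 0.07),
    "m3.2xlarge": (0.04, 0.07),
    "c1.medium": (0.13, 0.13),
    "c1.xlarge": (0.35, 0.70),
    "c3.xlarge": (0.05, 0.11),
    "c3.2xlarge": (0.05, 0.11),
}


def cheapest(kind: str) -> list[str]:
    """Instance types with the lowest cost for 'map' or 'reduce' tasks."""
    idx = {"map": 0, "reduce": 1}[kind]
    best = min(cost[idx] for cost in COST_PER_TASK.values())
    return sorted(name for name, cost in COST_PER_TASK.items() if cost[idx] == best)


print(cheapest("map"))     # ['m3.2xlarge', 'm3.xlarge']
print(cheapest("reduce"))  # ['m3.2xlarge', 'm3.xlarge']
```

On these numbers the newer m3 generation is cheapest per task for both phases, and c1.xlarge is the most expensive by a wide margin.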
![Page 56: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/56.jpg)
![Page 57: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/57.jpg)
![Page 58: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/58.jpg)
Estimated number of nodes:

$$\text{Estimated nodes} = \frac{\text{Total tasks} \times \text{Time to process sample files}}{\text{Per-instance task capacity} \times \text{Desired processing time}}$$
![Page 59: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/59.jpg)
1. Estimate the number of tasks your job requires.
Example: 150
2. Pick an instance type and note the number of tasks it can run in parallel.
Example: m1.xlarge, with a capacity of 8 tasks per instance
![Page 60: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/60.jpg)
3. Pick some sample data files to run a test workload. The number of sample files should equal the per-instance task capacity from step 2.
Example: 8 files selected for the sample test
![Page 61: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/61.jpg)
4. Run an Amazon EMR cluster with a single core node and process the sample files from step 3. Note the time taken to process this dataset.
Example: 3 min to process 8 files
![Page 62: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/62.jpg)
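Estimated number of nodes:

$$\text{Estimated nodes} = \frac{\text{Total tasks} \times \text{Time to process sample files}}{\text{Per-instance task capacity} \times \text{Desired processing time}} = \frac{150 \times 3\,\text{min}}{8 \times 5\,\text{min}} = 11.25 \approx 11 \text{ m1.xlarge}$$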
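The sizing formula is simple enough to script. This sketch reproduces the worked example (150 tasks, 3 min for 8 sample files, capacity 8, 5 min target) under the formula's own assumption that processing time scales linearly with tasks per slot.

```python
import math


def estimated_nodes(total_tasks: int, sample_minutes: float,
                    task_capacity: int, desired_minutes: float) -> float:
    """(total tasks * time to process sample) / (task capacity * desired time)."""
    return (total_tasks * sample_minutes) / (task_capacity * desired_minutes)


raw = estimated_nodes(150, 3, 8, 5)
print(raw)             # 11.25
print(math.ceil(raw))  # 12
```

The slide rounds 11.25 down to 11 nodes; rounding up to 12 is the conservative choice if the 5-minute target must be met under this linear model.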
![Page 63: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/63.jpg)
![Page 64: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/64.jpg)
![Page 65: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/65.jpg)
Diagram: Amazon EMR cluster with a master instance group and a core instance group
Core nodes run TaskTrackers (compute) and DataNodes (HDFS)
![Page 66: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/66.jpg)
Can add core nodes:
More HDFS space
More CPU/memory
![Page 67: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/67.jpg)
Can’t remove core nodes because of HDFS
![Page 68: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/68.jpg)
Task nodes:
Run TaskTrackers
No HDFS
Read from core node HDFS
![Page 69: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/69.jpg)
Can add task nodes
![Page 70: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/70.jpg)
More CPU power
More memory
![Page 71: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/71.jpg)
You can remove task nodes when processing is completed
![Page 72: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/72.jpg)
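The resize rules from these slides can be captured in a few lines. This is a simplified model of the 2014-era behaviour described above, not an AWS API: core groups may only grow because they hold HDFS data, while task groups may grow or shrink. In practice a resize would be requested with `aws emr modify-instance-groups`; the `InstanceGroup` class here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InstanceGroup:
    role: str   # "CORE" or "TASK"
    count: int


def can_resize(group: InstanceGroup, new_count: int) -> bool:
    """Apply the resize rules from the slides (simplified, 2014-era behaviour)."""
    if new_count < 0:
        return False
    if group.role == "CORE":
        # Core nodes store HDFS blocks, so the group may grow but not shrink
        return new_count >= group.count
    if group.role == "TASK":
        # Task nodes hold no HDFS data and can be added or removed freely
        return True
    raise ValueError(f"unsupported role: {group.role!r}")


core, task = InstanceGroup("CORE", 3), InstanceGroup("TASK", 4)
print(can_resize(core, 5), can_resize(core, 2))   # True False
print(can_resize(task, 10), can_resize(task, 0))  # True True
```

This is why the pattern on the slides keeps HDFS on a stable core group and puts elastic, short-lived capacity in the task group.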
![Page 73: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/73.jpg)
![Page 74: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/74.jpg)
![Page 75: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/75.jpg)
Amazon CloudWatch
![Page 76: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/76.jpg)
![Page 77: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/77.jpg)
![Page 78: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/78.jpg)
![Page 79: (SDD401) Amazon Elastic MapReduce Deep Dive and Best Practices | AWS re:Invent 2014](https://reader034.vdocuments.mx/reader034/viewer/2022052507/5591b0721a28ab26518b46bf/html5/thumbnails/79.jpg)