Introduction to YARN Apps
![Page 1: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/1.jpg)
Intro to YARN Apps Sandy Ryza
![Page 2: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/2.jpg)
Introduction
• What’s YARN?
• YARN apps
• Building YARN apps
![Page 3: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/3.jpg)
The OS analogy
Traditional Operating System
Storage: File System
Execution/Scheduling: Processes/Kernel
Scheduler
![Page 4: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/4.jpg)
The OS analogy
Hadoop
Storage: Hadoop Distributed File System (HDFS)
Execution/Scheduling: YARN!
![Page 5: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/5.jpg)
Goal: Multitenancy
• Different types of applications on the same cluster
• Different users and organizations on the same cluster
![Page 6: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/6.jpg)
ResourceManager (RM)
• Central service that tracks
  o Nodes
    § Resources
  o Applications
  o Containers
• Houses scheduler, which is in charge of all container placement decisions
![Page 7: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/7.jpg)
NodeManager (NM)
• One on every node
• Launches container processes
• Enforces resource allocations
• Monitors liveness
![Page 8: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/8.jpg)
Application Master (AM)
• User/application code
• Every application instance has one
• Runs inside a container on the cluster
• Requests resources from ResourceManager
![Page 9: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/9.jpg)
YARN
[Diagram: a Client and the JobHistoryServer alongside the ResourceManager and two NodeManagers, whose containers run a Map Task, the Application Master, and a Reduce Task]
![Page 10: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/10.jpg)
Processing Frameworks / YARN apps
• MapReduce
  o Batch processing, fault tolerant
• Impala
  o Low latency SQL on Hadoop
• Spark
  o Load data into memory, great for iterative algorithms
• Storm
  o Stream processing
![Page 11: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/11.jpg)
YARN app models
• Application master (AM) per job
• Simplest for batch
• Used by MapReduce
![Page 12: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/12.jpg)
YARN app models
• Application master per session
• Runs multiple jobs on behalf of the same user
• Recently added in Tez
• Spark interactive mode
![Page 13: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/13.jpg)
YARN app models
• Singleton AM as permanent service
• Always on, waits around for jobs to come in
• Used for Impala
![Page 14: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/14.jpg)
YARN/MR Scheduling
• ResourceManager (Fair Scheduler): decides which jobs to give resources to
• MapReduce Application Master: decides which tasks to give resources to within a job
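The split of responsibilities above can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyAppMaster`, `pickJob`, and `schedule` are hypothetical names, not YARN APIs, and the "fair" policy is reduced to "give the next container to the job with the fewest running containers", which is far simpler than the real Fair Scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model of two-level scheduling; none of these classes are YARN APIs.
class TwoLevelSchedulingSketch {

    // Stand-in for an application master: it alone orders the tasks of its job.
    static class ToyAppMaster {
        final String job;
        final Deque<String> pendingTasks = new ArrayDeque<>();
        int running = 0;

        ToyAppMaster(String job, List<String> tasks) {
            this.job = job;
            pendingTasks.addAll(tasks);
        }

        // Level 2: the AM decides which of its tasks uses the granted container.
        String assignContainer() {
            running++;
            return job + "/" + pendingTasks.poll();
        }
    }

    // Level 1: a simplified fair policy that picks the job with the fewest
    // running containers among jobs that still have pending tasks.
    static ToyAppMaster pickJob(List<ToyAppMaster> jobs) {
        ToyAppMaster best = null;
        for (ToyAppMaster am : jobs) {
            if (am.pendingTasks.isEmpty()) continue;
            if (best == null || am.running < best.running) best = am;
        }
        return best;
    }

    // Hands out containers one at a time and records which task got each one.
    static List<String> schedule(List<ToyAppMaster> jobs, int containers) {
        List<String> placements = new ArrayList<>();
        for (int i = 0; i < containers; i++) {
            ToyAppMaster am = pickJob(jobs);
            if (am == null) break; // no job has pending work
            placements.add(am.assignContainer());
        }
        return placements;
    }

    public static void main(String[] args) {
        List<ToyAppMaster> jobs = Arrays.asList(
            new ToyAppMaster("jobA", Arrays.asList("map0", "map1", "reduce0")),
            new ToyAppMaster("jobB", Arrays.asList("map0")));
        System.out.println(schedule(jobs, 4));
    }
}
```

The scheduler never looks inside a job's task queue, and the AM never sees other jobs; that separation is the point of the sketch.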
![Page 15: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/15.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
![Page 16: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/16.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
I want 2 containers with 1024 MB and 1 core each
![Page 17: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/17.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
Noted
![Page 18: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/18.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
I’m still here
![Page 19: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/19.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
I’ll reserve some space on node1 for AM1
![Page 20: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/20.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
Got anything for me?
![Page 21: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/21.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
Here’s a security token to let you launch a container on Node 1
![Page 22: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/22.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
Hey, launch my container with this shell command
![Page 23: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/23.jpg)
Scheduling on Hadoop
ResourceManager
Application Master 1
Application Master 2
Node 1 Node 2 Node 3
Container
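The exchange walked through on the preceding slides can be approximated with a small simulation. It assumes the captions map onto three steps: an AM files a request, a node heartbeat lets the ResourceManager reserve space, and the AM's next heartbeat returns a token it presents to the node. All classes below (`ToyResourceManager`, `ToyNodeManager`, `Grant`) are hypothetical stand-ins, not YARN APIs, and the token is a plain string rather than a real security token.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy walk-through of the request/heartbeat/launch exchange.
// Every class here is a hypothetical stand-in, not a YARN API.
class SchedulingWalkthroughSketch {

    // What the RM hands back: a container location plus a token for that node.
    static class Grant {
        final String node;
        final String token;
        Grant(String node, String token) { this.node = node; this.token = token; }
    }

    static class ToyResourceManager {
        private final Map<String, Integer> pendingByApp = new HashMap<>();
        private final Map<String, List<Grant>> grantedByApp = new HashMap<>();
        private int nextToken = 0;

        // "I want N containers": the RM only records the request ("Noted").
        void request(String app, int containers) {
            pendingByApp.merge(app, containers, Integer::sum);
        }

        // Node heartbeat ("I'm still here"): the RM reserves free slots on
        // that node for apps with outstanding requests.
        void nodeHeartbeat(String node, int freeSlots) {
            for (Map.Entry<String, Integer> e : pendingByApp.entrySet()) {
                while (freeSlots > 0 && e.getValue() > 0) {
                    String token = "token-" + (nextToken++) + "@" + node;
                    grantedByApp.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                                .add(new Grant(node, token));
                    e.setValue(e.getValue() - 1);
                    freeSlots--;
                }
            }
        }

        // AM heartbeat ("Got anything for me?"): returns grants made since the last call.
        List<Grant> amHeartbeat(String app) {
            List<Grant> grants = grantedByApp.remove(app);
            return grants == null ? new ArrayList<>() : grants;
        }
    }

    static class ToyNodeManager {
        final String node;
        final List<String> running = new ArrayList<>();
        ToyNodeManager(String node) { this.node = node; }

        // The AM presents its token and a shell command; the NM starts the process.
        boolean launch(Grant grant, String command) {
            if (!grant.token.endsWith("@" + node)) return false; // token was issued for another node
            running.add(command);
            return true;
        }
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        ToyNodeManager node1 = new ToyNodeManager("node1");

        rm.request("AM1", 2);                             // AM1 asks for two containers
        System.out.println(rm.amHeartbeat("AM1").size()); // nothing has been granted yet
        rm.nodeHeartbeat("node1", 1);                     // node1 reports one free slot
        for (Grant g : rm.amHeartbeat("AM1")) {           // AM1 collects its grant
            node1.launch(g, "sleep 1000");
        }
        System.out.println(node1.running);
    }
}
```

Note that allocation is asynchronous in this model too: the request call returns nothing, and the AM only learns about a container on a later heartbeat.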
![Page 24: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/24.jpg)
Should you build a YARN app?
• MapReduce can’t run arbitrary DAGs?
  o Use Spark
![Page 25: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/25.jpg)
Should you build a YARN app?
• MapReduce can’t store data in memory?
  o Use Spark
![Page 26: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/26.jpg)
Should you build a YARN app?
• Iterative processing?
  o Use Spark
![Page 27: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/27.jpg)
Should you build a YARN app?
• Have an existing distributed app that runs all tasks at once?
  o Use distributed shell
![Page 28: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/28.jpg)
When to build a YARN app
• Allocating and releasing containers dynamically
• Weird scheduling requirements
  o Gang
  o Complex locality
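As one illustration of a gang requirement, an application might hold containers until the whole group is available and only then start any task. The sketch below is a hypothetical helper, not a YARN API: container IDs are plain strings, and `onContainersAllocated` merely mirrors the name of the callback an AM would receive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy gang-scheduling helper; nothing in this class is a YARN API.
class GangSchedulingSketch {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();
    private final List<String> launched = new ArrayList<>();

    GangSchedulingSketch(int gangSize) { this.gangSize = gangSize; }

    // Called whenever more containers are handed over.
    // Returns true once the whole gang has been started.
    boolean onContainersAllocated(List<String> containers) {
        if (!launched.isEmpty()) return true; // gang already running; extras are ignored here
        held.addAll(containers);
        if (held.size() < gangSize) return false; // keep waiting, start nothing yet
        launched.addAll(held.subList(0, gangSize)); // start exactly gangSize tasks together
        held.clear(); // surplus containers are dropped; a real AM would release them
        return true;
    }

    List<String> launched() { return launched; }

    public static void main(String[] args) {
        GangSchedulingSketch gang = new GangSchedulingSketch(3);
        System.out.println(gang.onContainersAllocated(Arrays.asList("c1", "c2")));
        System.out.println(gang.onContainersAllocated(Arrays.asList("c3")));
        System.out.println(gang.launched());
    }
}
```

Holding containers idle while waiting wastes cluster capacity, which is one reason this kind of logic lives in a custom application rather than in a general-purpose framework.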
![Page 29: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/29.jpg)
What YARN does for you
• Deploys your bits
• Runs your processes
• Monitors your processes
• Kills your processes when they misbehave
![Page 30: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/30.jpg)
What YARN does not do for you
• Communication between your processes
![Page 31: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/31.jpg)
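AMRMClientAsync
// Callbacks invoked as the ResourceManager responds to heartbeats.
CallbackHandler handler = new CallbackHandler() {
  public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      startTask(container);
    }
  }
  // [... more methods]
};
// Heartbeat to the ResourceManager every 1000 ms.
AMRMClientAsync amClient = AMRMClientAsync.createAMRMClientAsync(1000, handler);
// The client is a service: it must be initialized with a configuration and
// started before registering (omitted on the slide).
amClient.registerApplicationMaster(NetUtils.getHostName(), -1, "");
// Ask for one container with 1024 MB and 1 core, preferring node1/node2 or rack1.
amClient.addContainerRequest(
  new ContainerRequest(
    Resource.newInstance(1024, 1),
    new String[] {"node1", "node2"}, new String[] {"rack1"},
    Priority.newInstance(2)));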
![Page 32: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/32.jpg)
NMClientAsync
// Callbacks invoked as NodeManagers report container lifecycle events.
CallbackHandler nmHandler = new CallbackHandler() {
  // [... listen for containers stopped and started]
};
NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);
![Page 33: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/33.jpg)
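Launching Containers
public void startContainer(Container container) {
  // Describe what the NodeManager should set up and run in the container.
  ContainerLaunchContext launchContext =
    ContainerLaunchContext.newInstance(
      localResources,
      environment,
      Arrays.asList("sleep 1000"),  // shell command(s) to execute
      serviceData,
      tokens,
      acls);
  // Ask the container's NodeManager to start it; the result arrives via nmHandler.
  nmClient.startContainerAsync(container, launchContext);
}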
![Page 34: Introduction to YARN Apps](https://reader033.vdocuments.mx/reader033/viewer/2022042608/55d49c95bb61ebb42c8b458e/html5/thumbnails/34.jpg)
Local resources
[Diagram: file.txt is stored in HDFS and a copy is placed on each of two nodes, each running two containers]
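The idea in the diagram, a file fetched from HDFS onto a node and then available to the containers running there, can be sketched with a toy per-node cache. This assumes the copy is made at most once per node and shared by that node's containers; `ToyHdfs` and `ToyNode` are hypothetical stand-ins, and the real localization rules (visibility, timestamps, cleanup) are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of resource localization; none of these classes are YARN or HDFS APIs.
class LocalResourceSketch {

    // Stand-in for HDFS: maps a path to file contents and counts remote reads.
    static class ToyHdfs {
        final Map<String, String> files = new HashMap<>();
        int reads = 0;

        String read(String path) {
            reads++;
            return files.get(path);
        }
    }

    // Stand-in for a node: keeps local copies of files its containers need.
    static class ToyNode {
        private final ToyHdfs hdfs;
        private final Map<String, String> localCache = new HashMap<>();

        ToyNode(ToyHdfs hdfs) { this.hdfs = hdfs; }

        // Called before a container starts: returns the local copy,
        // fetching from the remote store only if this node lacks one.
        String localize(String path) {
            return localCache.computeIfAbsent(path, hdfs::read);
        }
    }

    public static void main(String[] args) {
        ToyHdfs hdfs = new ToyHdfs();
        hdfs.files.put("file.txt", "hello");
        ToyNode node1 = new ToyNode(hdfs);
        ToyNode node2 = new ToyNode(hdfs);

        // Two containers per node each need file.txt.
        node1.localize("file.txt");
        node1.localize("file.txt");
        node2.localize("file.txt");
        node2.localize("file.txt");
        System.out.println("remote reads: " + hdfs.reads);
    }
}
```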