Apache Hadoop YARN - Hortonworks Meetup Presentation

Uploaded by hortonworks on 25-May-2015

TRANSCRIPT

Page 1: Apache Hadoop YARN - Hortonworks Meetup Presentation

Apache Hadoop YARN

Page 2

A Cursory Look At The Architecture

© Hortonworks Inc. 2012. Confidential and Proprietary.

Page 3

Global Scheduler (ResourceManager)

• Pure resource arbitration
• Multiple resource dimensions
  – <priority, data-locality, memory, cpu, …>
• Built-in support for data locality
  – Node, rack, etc.
  – Unique to YARN
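The scheduler's job of arbitrating multi-dimensional resources with node and rack awareness can be illustrated with a toy placement function. This is a minimal sketch, not the ResourceManager's scheduler: the `Resource`, `Node` and `pick_node` names, the two dimensions (`memory_mb`, `vcores`), and the fixed node, then rack, then anywhere fallback order are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector; the dimensions are illustrative."""
    memory_mb: int
    vcores: int

    def fits_in(self, free: "Resource") -> bool:
        # An ask fits only if *every* dimension fits, not just memory.
        return self.memory_mb <= free.memory_mb and self.vcores <= free.vcores


@dataclass
class Node:
    host: str
    rack: str
    free: Resource


def pick_node(ask: Resource, preferred_host: str, nodes: list) -> Optional[tuple]:
    """Place one ask: try the requested node, then its rack, then anywhere."""
    candidates = [n for n in nodes if ask.fits_in(n.free)]
    preferred_rack = next((n.rack for n in nodes if n.host == preferred_host), None)
    for n in candidates:
        if n.host == preferred_host:
            return n.host, "node-local"
    for n in candidates:
        if n.rack == preferred_rack:
            return n.host, "rack-local"
    if candidates:
        return candidates[0].host, "off-switch"
    return None  # nothing fits in every dimension


# Hypothetical three-node cluster.
nodes = [
    Node("h1", "r1", Resource(memory_mb=1024, vcores=4)),  # preferred, but short on memory
    Node("h2", "r1", Resource(memory_mb=4096, vcores=2)),
    Node("h3", "r2", Resource(memory_mb=8192, vcores=8)),
]
print(pick_node(Resource(memory_mb=2048, vcores=1), "h1", nodes))  # → ('h2', 'rack-local')
```

The point is only that an ask must fit in every dimension and that locality is tried from narrowest to widest.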

Page 4

Scheduler Concepts

• Input from AM(s) is a dynamic list of ResourceRequests
  – <resource-name, resource-capability>
  – Resource name: (hostname / rackname / any)
  – Resource capability: (memory, cpu, …)
  – Essentially an inverted <name, capability> request map from AM to RM
  – No notion of tasks!

• Output: Container
  – Resource(s) grant on a specific machine
  – Verifiable grant
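A "verifiable grant" means a NodeManager can check that a container it is asked to launch was really issued by the RM, for that machine and that size. One way to get that property is a MAC over the grant's fields using a key shared by the RM and the NMs; the sketch below assumes exactly that. The field set, `SHARED_KEY`, and the `rm_grant`/`nm_verify` functions are hypothetical, and this is not YARN's container-token format.

```python
import hashlib
import hmac
from dataclasses import dataclass, replace

# Hypothetical secret shared by the RM and every NM (an assumption of this sketch).
SHARED_KEY = b"rm-nm-shared-secret"


@dataclass(frozen=True)
class Container:
    """A resource grant on one specific machine, plus the RM's signature over it."""
    container_id: str
    host: str
    memory_mb: int
    vcores: int
    signature: bytes = b""

    def payload(self) -> bytes:
        # Every field except the signature is covered by the MAC.
        return f"{self.container_id}|{self.host}|{self.memory_mb}|{self.vcores}".encode()


def _sign(c: Container) -> bytes:
    return hmac.new(SHARED_KEY, c.payload(), hashlib.sha256).digest()


def rm_grant(container_id: str, host: str, memory_mb: int, vcores: int) -> Container:
    """RM side: issue a grant and sign it."""
    unsigned = Container(container_id, host, memory_mb, vcores)
    return replace(unsigned, signature=_sign(unsigned))


def nm_verify(c: Container, my_host: str) -> bool:
    """NM side: accept only grants for this machine whose signature checks out."""
    return c.host == my_host and hmac.compare_digest(_sign(c), c.signature)


grant = rm_grant("container_01", "h2", 2048, 1)
tampered = replace(grant, memory_mb=8192)  # an AM trying to inflate its grant
print(nm_verify(grant, "h2"), nm_verify(tampered, "h2"), nm_verify(grant, "h3"))  # → True False False
```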

Page 5

Scheduling Walkthrough

MapReduce job with 2 maps and 1 reduce
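For a job of 2 maps and 1 reduce, the AM's input to the scheduler is the inverted <name, capability> map described under Scheduler Concepts. A minimal sketch of how per-task preferences collapse into that map, assuming a hypothetical three-host, two-rack cluster, hypothetical split locations, and arbitrary priority values and container sizes:

```python
from collections import Counter

ANY = "*"

# Hypothetical cluster layout (host -> rack).
RACK_OF = {"h1": "r1", "h2": "r1", "h3": "r2"}
MAP_PRIORITY, REDUCE_PRIORITY = 20, 10  # arbitrary priority labels


def build_asks(tasks):
    """Collapse per-task preferences into (priority, resource-name, memory_mb) -> count."""
    asks = Counter()
    for priority, memory_mb, hosts in tasks:
        for host in hosts:
            asks[(priority, host, memory_mb)] += 1
        for rack in {RACK_OF[h] for h in hosts}:  # one rack entry per task, not per replica
            asks[(priority, rack, memory_mb)] += 1
        asks[(priority, ANY, memory_mb)] += 1  # every task also counts toward '*'
    return asks


job = [
    (MAP_PRIORITY, 1024, ["h1", "h2"]),  # map 0: split replicated on h1 and h2 (both on r1)
    (MAP_PRIORITY, 1024, ["h3"]),        # map 1: split on h3 (r2)
    (REDUCE_PRIORITY, 2048, []),         # reduce: no locality preference
]
for key, count in sorted(build_asks(job).items()):
    print(key, count)
```

The map-priority `*` entry holds 2 and the `r1` entry holds 1, because map 0 is counted once per rack even though two of its replicas sit on r1. Nothing in the table tells the RM that the h1 and h2 entries describe the same task.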

Page 6

Scheduling Walkthrough

Container allocation on r22/h2121:

Page 7

Scheduling Walkthrough

Container allocation on r11/h1010:
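The two allocation slides show the RM granting containers on specific hosts and consuming the matching entries of the request table. A minimal sketch of that bookkeeping, assuming that a node-local grant decrements the host, rack and `*` entries, a rack-local grant decrements the rack and `*` entries, and an off-switch grant decrements only `*`. The hosts, racks, counts and the `record_allocation` name are hypothetical; they are not the hosts from the slides.

```python
from collections import Counter

ANY = "*"


def record_allocation(asks, priority, host, rack, level):
    """Consume outstanding asks after one container is granted at `level` locality."""
    names = {
        "node-local": [host, rack, ANY],
        "rack-local": [rack, ANY],
        "off-switch": [ANY],
    }[level]
    for name in names:
        if asks[(priority, name)] > 0:
            asks[(priority, name)] -= 1


# Hypothetical outstanding map asks: (priority, resource-name) -> count.
asks = Counter({(20, "h1"): 1, (20, "h2"): 1, (20, "r1"): 1,
                (20, "h3"): 1, (20, "r2"): 1, (20, ANY): 2})
record_allocation(asks, 20, "h3", "r2", "node-local")
record_allocation(asks, 20, "h2", "r1", "node-local")
print(dict(asks))
```

After both grants the `*` entry is 0 but `h1` still reads 1: since the RM has no notion of tasks, it cannot know that h1 was an alternative for the map that just landed on h2, so in this model it is left to the AM to correct its asks on a later allocate call.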

Page 8

Writing Custom Applications

• Grand total of 3 protocols
  – ClientRMProtocol
    – Protocol between client & RM for launching applications
    – submitApplication
  – AMRMProtocol
    – Protocol between AM & RM for resource allocation
    – registerApplication / allocate / finishApplication
  – ContainerManagerProtocol
    – Protocol between AM & NM for container start/stop
    – startContainer / stopContainer
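The three protocols fit together in a fixed order: a client submits the application, the AM registers and calls allocate until it has its containers, launches and stops work on them through the NMs, and finally calls finishApplication. A minimal sketch of that sequence using in-memory fakes. The snake_case method names mirror the calls listed on the slide but are hypothetical stand-ins, not the Hadoop Java interfaces; the toy RM grants one host per allocate call, and the AM function is called directly where YARN would have the RM launch the AM.

```python
class FakeRM:
    """Toy stand-in for the RM side of the client and AM protocols (hypothetical)."""

    def __init__(self, free_hosts):
        self.free_hosts = list(free_hosts)
        self.calls = []

    def submit_application(self, name):  # client -> RM
        self.calls.append("submitApplication")
        return f"application_{name}"

    def register_application(self, app_id):  # AM -> RM
        self.calls.append("registerApplication")

    def allocate(self, app_id, wanted):  # AM -> RM; this toy grants one host per call
        self.calls.append("allocate")
        if wanted > 0 and self.free_hosts:
            return [self.free_hosts.pop(0)]
        return []

    def finish_application(self, app_id):  # AM -> RM
        self.calls.append("finishApplication")


class FakeNM:
    """Toy stand-in for the NM side of the container-manager protocol."""

    def __init__(self):
        self.log = []

    def start_container(self, host, command):  # AM -> NM
        self.log.append(("start", host, command))

    def stop_container(self, host):  # AM -> NM
        self.log.append(("stop", host))


def run_application_master(rm, nm, app_id, num_containers, command):
    rm.register_application(app_id)
    granted = []
    while len(granted) < num_containers:
        new = rm.allocate(app_id, num_containers - len(granted))
        if not new:  # this toy RM returns nothing only when it is out of hosts
            raise RuntimeError("no capacity left")
        granted.extend(new)
    for host in granted:
        nm.start_container(host, command)
    for host in granted:  # a real AM would wait for the work to complete first
        nm.stop_container(host)
    rm.finish_application(app_id)
    return granted


rm, nm = FakeRM(["h1", "h2", "h3"]), FakeNM()
app_id = rm.submit_application("demo")  # in YARN the RM would now launch the AM itself
print(run_application_master(rm, nm, app_id, 2, "echo hello"))
print(rm.calls)
```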

Page 9

API improvements

• Overload of the ‘*’ entry
• Release / reject containers
• Ask for specific nodes/racks (only)
• Don’t give me containers on these racks/nodes
• Single client thread allowed to request containers
• Overloaded allocate call
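Two of the requested capabilities, asking for specific nodes or racks only and refusing certain racks or nodes, amount to filtering the scheduler's candidate hosts. A minimal sketch of those semantics, assuming a `strict` flag that disables fallback to other locations and a `blacklist` that may hold host or rack names; `eligible_hosts` and its parameters are hypothetical and are not taken from the YARN API.

```python
def eligible_hosts(rack_of, wanted, strict, blacklist):
    """Hosts the scheduler may use for one ask.

    rack_of:   host -> rack for the whole cluster
    wanted:    host and/or rack names the AM asked for
    strict:    if True, never fall back to other locations ("only")
    blacklist: host and/or rack names the AM refuses ("don't give me ...")
    """
    allowed = {h: r for h, r in rack_of.items() if h not in blacklist and r not in blacklist}
    local = [h for h, r in allowed.items() if h in wanted or r in wanted]
    if local or strict:
        return sorted(local)
    return sorted(allowed)  # relaxed: anywhere not blacklisted


# Hypothetical four-host, two-rack cluster.
cluster = {"h1": "r1", "h2": "r1", "h3": "r2", "h4": "r2"}
print(eligible_hosts(cluster, {"h1"}, strict=True, blacklist=set()))    # → ['h1']
print(eligible_hosts(cluster, {"h1"}, strict=True, blacklist={"h1"}))   # → []
print(eligible_hosts(cluster, {"h1"}, strict=False, blacklist={"r1"}))  # → ['h3', 'h4']
```

Keeping "only" and "never here" as separate inputs lets an AM combine them, for example asking strictly for one rack while excluding a known-bad node on it.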

Page 10

Recent advancements

• Tools for debugging AMs
  – Unmanaged AM

• Generic AM
  – Utility libraries for writing YARN applications
  – YARN-103, YARN-29

• YARN project split and how multiple versions of MapReduce can coexist.

Page 11

Roadmap

• MapReduce container reuse
• RM restart capability
• Multi-resource scheduling
• Generic application history server

Page 12

Questions?

Thank You!
