Resource Management in Hadoop

Upload: apj123

Post on 21-Dec-2015

TRANSCRIPT

Page 1: Resource Management in Hadoop
Page 2: Resource Management in Hadoop

Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Hadoop. Resource management.
Alexey Filanovskiy
Cloudera certified developer

Page 3: Resource Management in Hadoop

YARN and MRv2. General architecture.

Page 4: Resource Management in Hadoop

YARN and MRv2. Problem of MRv1

One coordinator for all MR jobs (JobTracker).
- The cluster scales only up to 3000 nodes
- We want HA for the JobTracker
- Hardware resources of the cluster are used inefficiently (separate map and reduce slots)
- Desire to federate different components in one cluster (not only MR but also Impala, for example)

Page 5: Resource Management in Hadoop

YARN. Main idea #1.

Scalability of the JobTracker. Split it into two components:
- Resource Manager: handles cluster resources (CPU, RAM…). One per cluster.
- Application Master: coordinates a dedicated MR job. One per MR job.

Page 6: Resource Management in Hadoop

YARN. Main idea #2.

Move from the slot-based approach to resource management to a physical-world approach (memory, CPU, disk).
Determine the amount of resources that each process can use on each node (for example, Impala can use 4 cores and 16 GB RAM, MapReduce 12 cores and 32 GB RAM).
Each map or reduce task is given a dedicated amount of RAM, cores, weight for IO operations…
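A minimal sketch of the resource-based idea, assuming the per-node budgets from the example above. The Resource type, the containers_that_fit helper and the task size (1 core, 2 GB) are hypothetical and only illustrate the arithmetic; they are not YARN APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    vcores: int
    memory_gb: int


def containers_that_fit(budget: Resource, request: Resource) -> int:
    """Number of containers of size `request` that fit into `budget`.

    The scarcer of the two resources (cores or memory) sets the limit.
    """
    return min(budget.vcores // request.vcores,
               budget.memory_gb // request.memory_gb)


# Per-node budgets taken from the slide's example.
impala_budget = Resource(vcores=4, memory_gb=16)
mapreduce_budget = Resource(vcores=12, memory_gb=32)

# Hypothetical size of one map or reduce task.
task = Resource(vcores=1, memory_gb=2)

mr_tasks_per_node = containers_that_fit(mapreduce_budget, task)
print(mr_tasks_per_node)  # limited by cores: 12
```

With a larger task (1 core, 4 GB) memory becomes the limit and only 8 tasks fit, which a fixed number of slots could not express.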

Page 7: Resource Management in Hadoop

YARN.
YARN: Yet Another Resource Negotiator

Page 8: Resource Management in Hadoop

YARN. Running jobs. Advanced

Page 9: Resource Management in Hadoop

Scheduler

Page 10: Resource Management in Hadoop

YARN. Running jobs. Advanced

Let’s zoom in

Page 11: Resource Management in Hadoop

YARN. Running jobs. Advanced

The scheduler determines the queue of MR jobs and their starting order

Page 12: Resource Management in Hadoop

Scheduler

There are 3 main schedulers available for CDH:
1) FIFO
2) Fair Scheduler
3) Capacity Scheduler

Page 13: Resource Management in Hadoop

FIFO Scheduler

Page 14: Resource Management in Hadoop

FIFO (first in, first out) Scheduler

Like a queue in real life. First come, first served!

Page 15: Resource Management in Hadoop

DEMO

Page 16: Resource Management in Hadoop

FIFO (first in, first out) Scheduler. Demo
- 15 MR applications were started one by one. The graph below clearly shows the behavior of the FIFO scheduler.

- The first 6 applications occupied all available containers (MR slots). The other applications went to the pending pool.

Page 17: Resource Management in Hadoop

FIFO (first in, first out) Scheduler. Demo
- 15 MR applications were started one by one. The graph below clearly shows the behavior of the FIFO scheduler.

- The first 6 applications occupied all available containers (MR slots). The other applications went to the pending pool (9 in total).

- When some jobs finished, the 9 jobs from the pending pool shared the freed resources among themselves

Page 18: Resource Management in Hadoop

FIFO (first in, first out) Scheduler. Demo
- 15 MR applications were started one by one. The graph below clearly shows the behavior of the FIFO scheduler.

- The first 6 applications occupied all available containers (MR slots). The other applications went to the pending pool.

- When some jobs finished, the 9 jobs from the pending pool shared the freed resources among themselves

- After this we started another 5 MR applications. They went to the pending pool, but resources that were released went to the applications that had already started. The new ones were still pending

Page 19: Resource Management in Hadoop

FIFO (first in, first out) Scheduler. Demo. Whole picture
- The first 6 applications occupied all available containers (MR slots). The other applications went to the pending pool.

- When some jobs finished, the 9 jobs from the pending pool shared the freed resources among themselves

- After this we started another 5 MR applications. They went to the pending pool, but resources that were released went to the applications that had already started. The new ones were still pending
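The first step of the demo can be sketched as a small simulation. This is only an illustration of first in, first out allocation: the cluster size (12 containers) and the per-application demand (2 containers) are assumed values chosen so that 6 of the 15 applications fill the cluster, and fifo_allocate is a hypothetical helper, not part of YARN.

```python
def fifo_allocate(total_containers, demands):
    """Grant containers strictly in submission order.

    demands: list of (app_name, containers_wanted) in submission order.
    Returns (running, pending): a dict of granted containers per app
    and the list of apps that received nothing yet.
    """
    free = total_containers
    running, pending = {}, []
    for app, wanted in demands:
        granted = min(wanted, free)
        if granted > 0:
            running[app] = granted
            free -= granted
        else:
            # The cluster is full: every later application waits.
            pending.append(app)
    return running, pending


# 15 applications submitted one by one, each asking for 2 containers (assumed).
demands = [(f"app{i:02d}", 2) for i in range(1, 16)]
running, pending = fifo_allocate(12, demands)
print(len(running), "running,", len(pending), "pending")
```

Applications submitted later, such as the 5 extra ones in the demo, would simply be appended to the end of demands and stay pending until earlier applications release containers.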

Page 20: Resource Management in Hadoop

Fair Scheduler

Page 21: Resource Management in Hadoop

Fair Scheduler

Everything should be fair!

Page 22: Resource Management in Hadoop

Fair Scheduler. Main concepts

1. All hardware resources are shared by all applications based on a config file (some policies)
2. The portion of hardware resources dedicated to each job is determined by its queue
3. Each application is placed in some queue
4. If the queue is not specified explicitly, the application is put in the default queue (the parameter yarn.scheduler.fair.allow-undeclared-pools should be set to false)
5. When there is a single job running, that job uses the entire cluster.
6. When other jobs are submitted, task slots that free up are assigned to the new jobs, so that each job gets roughly the same amount of CPU time.
7. The Fair Scheduler arose out of Facebook’s need to share its data warehouse between multiple users

“Better to see once than hear 100 times” (C)

http://stackoverflow.com/questions/13842241/can-we-use-both-fair-scheduler-and-capacity-scheduler-in-the-same-hadoop-cluster

Page 23: Resource Management in Hadoop

Fair Scheduler. Ways to specify queues

1) Log on to the operating system as the required user. For example, to run a job in the root.hdfs queue you can log on as the hdfs Linux user:
    [hdfs@tvpbdaacn01 ~]$ id
    uid=201(hdfs) gid=123(hadoop) groups=123(hadoop),1001(oinstall),1003(hdfs)
2) Specify the “magic” parameter -Dmapred.job.queue.name when running the MR job. For example, the following runs a job in the root.hdfs queue as the yarn user:
    -bash-4.1$ id
    uid=492(yarn) gid=490(yarn) groups=490(yarn)
    -bash-4.1$ hadoop jar hadoop-mapreduce-examples-2.3.0-cdh5.0.0.jar teragen -Dmapred.map.tasks=800 -Dmapred.job.queue.name=root.hdfs 1000000000 /tmp/test2
3) Specify the target queue name in some UI (Hue, for example). This HQL will use the root.hdfs queue

Page 24: Resource Management in Hadoop

DEMO

Page 25: Resource Management in Hadoop

Fair Scheduler. Main concepts

In our example we have 4 queues: root, hdfs, someuser, default
- All queues except hdfs have equal weight.
- There are no limits, only weights

Page 26: Resource Management in Hadoop

Fair Scheduler. Main concepts
- We run an MR job placed in the “root.someuser” pool
- It takes all available CPU resources, because it is the only job in the cluster

Page 27: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After this we run another MR job in the “root.root” pool
- The two jobs divide the CPU resources into two equal parts

Page 28: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After this we run another (third) MR job in the “root.hdfs” pool
- The jobs divide the CPU resources according to the config file: the hdfs pool takes half of the resources, root and someuser a quarter each

Page 29: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After this we dynamically changed the weight of the “root.hdfs” pool (increased it to 3)

Page 30: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After this we dynamically changed the weight of the “root.hdfs” pool (increased it to 3)
- The workload is redistributed automatically!
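The shares seen in the demo steps so far follow from simple weight arithmetic, sketched below. The initial hdfs weight of 2 is an inference from the half / quarter / quarter split (the slides do not state it), and weighted_shares is a hypothetical helper, not a Fair Scheduler API.

```python
from fractions import Fraction


def weighted_shares(weights, active):
    """Split the cluster among the active pools in proportion to their weights."""
    total = sum(weights[pool] for pool in active)
    return {pool: Fraction(weights[pool], total) for pool in active}


# Assumed weights: hdfs counts double, all other pools are equal.
weights = {"root": 1, "hdfs": 2, "someuser": 1, "default": 1}

step1 = weighted_shares(weights, ["someuser"])                  # single job
step2 = weighted_shares(weights, ["someuser", "root"])          # second job
step3 = weighted_shares(weights, ["someuser", "root", "hdfs"])  # third job

weights["hdfs"] = 3  # the dynamic weight change from the demo
step4 = weighted_shares(weights, ["someuser", "root", "hdfs"])

for step in (step1, step2, step3, step4):
    print({pool: str(share) for pool, share in step.items()})
```

Only pools with running jobs take part in the split, which is why a single job gets the whole cluster and an idle default pool does not reduce anyone's share.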

Page 31: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After some time we limited the number of CPUs for the someuser pool (11 cores maximum)
- The other pools took the released resources

Page 32: Resource Management in Hadoop

Fair Scheduler. Main concepts
- After some time we limited the number of CPUs for the someuser pool
- The other pools took the released resources
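A rough sketch of how a maximum limit interacts with weights: a pool that would exceed its cap is pinned to the cap, and what it gives up is split among the remaining pools by weight. The 100-core cluster size is an assumed value (the slides do not give one), capped_fair_shares is a hypothetical helper, and the real Fair Scheduler computes shares in a more elaborate way.

```python
def capped_fair_shares(total, weights, max_caps):
    """Weighted fair split of `total` where some pools have a maximum."""
    shares = {}
    remaining = float(total)
    active = dict(weights)
    while active:
        weight_sum = sum(active.values())
        # Pools whose proportional share would exceed their cap.
        over = [pool for pool, w in active.items()
                if pool in max_caps and remaining * w / weight_sum > max_caps[pool]]
        if not over:
            # Nobody hits a cap: plain weighted split of what is left.
            for pool, w in active.items():
                shares[pool] = remaining * w / weight_sum
            break
        for pool in over:
            # Pin the pool to its cap and release the rest to the others.
            shares[pool] = float(max_caps[pool])
            remaining -= max_caps[pool]
            del active[pool]
    return shares


# Weights after the change in the demo; someuser is limited to 11 cores.
shares = capped_fair_shares(
    total=100,  # assumed cluster size in cores
    weights={"hdfs": 3, "root": 1, "someuser": 1},
    max_caps={"someuser": 11},
)
print(shares)
```

Without the cap someuser would receive 20 of the 100 cores; with it, the 9 cores it gives up are shared by hdfs and root in their 3:1 weight ratio.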

Page 33: Resource Management in Hadoop

CapacityScheduler

Page 34: Resource Management in Hadoop

CapacityScheduler. Main concepts
1. The CapacityScheduler is designed to allow sharing a large cluster while giving each organization a minimum capacity guarantee.
2. The central idea is that the available resources in the Hadoop Map-Reduce cluster are partitioned among multiple organizations who collectively fund the cluster based on computing needs.
3. The Capacity Scheduler from Yahoo offers similar functionality to the Fair Scheduler but takes a somewhat different philosophy.
4. In the Capacity Scheduler, you define a number of named queues. Each queue has a configurable number of map and reduce slots. The scheduler gives each queue its capacity when it contains jobs, and shares any unused capacity between the queues.
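Point 4 can be sketched as a two-phase allocation: first every queue receives up to its guaranteed capacity, then unused capacity is handed to queues that still have demand. The queue names, capacities and demands below are hypothetical, and giving each spare slot to the queue that is lowest relative to its guarantee is a simplifying assumption, not the exact CapacityScheduler algorithm.

```python
from fractions import Fraction


def capacity_allocate(total_slots, guaranteed, demand):
    """Guarantee each queue its capacity, then share unused slots."""
    # Phase 1: every queue gets what it asks for, up to its guarantee.
    alloc = {q: min(demand.get(q, 0), g) for q, g in guaranteed.items()}
    spare = total_slots - sum(alloc.values())
    # Phase 2: hand out spare slots one at a time.
    while spare > 0:
        hungry = [q for q in guaranteed if alloc[q] < demand.get(q, 0)]
        if not hungry:
            break
        # Pick the queue using the smallest fraction of its guarantee.
        q = min(hungry, key=lambda name: Fraction(alloc[name], guaranteed[name]))
        alloc[q] += 1
        spare -= 1
    return alloc


# Hypothetical queues on a 100-slot cluster.
guaranteed = {"analytics": 50, "etl": 30, "adhoc": 20}
demand = {"analytics": 10, "etl": 60, "adhoc": 40}
alloc = capacity_allocate(100, guaranteed, demand)
print(alloc)
```

Here analytics needs only 10 of its 50 guaranteed slots, so the 40 unused slots go to etl and adhoc; when every queue asks for more than its guarantee, each one gets exactly its guaranteed capacity.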

http://stackoverflow.com/questions/13842241/can-we-use-both-fair-scheduler-and-capacity-scheduler-in-the-same-hadoop-cluster

Page 35: Resource Management in Hadoop

Page 36: Resource Management in Hadoop