Cluster Management



    Hadoop Admin:

    Index:

- Responsibilities of Hadoop Admin.
- Building Single Node Cluster.
- Building Multi Node Cluster.
- Commissioning and Decommissioning Nodes in Cluster.
- Challenges of running Hadoop Cluster.
- System Log Files.
- Hadoop Admin Commands.

    Hadoop Admin Responsibilities:

- Responsible for implementation and ongoing administration of the Hadoop infrastructure.

- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.

- Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.

- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, Dell OpenManage and other tools.

- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.

- Screening Hadoop cluster job performance and capacity planning.

- Monitoring Hadoop cluster connectivity and security.

- Managing and reviewing Hadoop log files.

- File system management and monitoring.

- HDFS support and maintenance.

- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.

- Collaborating with application teams to install operating system and Hadoop updates, patches and version upgrades when required.

- Point of contact for vendor escalation.


The NameNode and the Secondary NameNode

- Stores the filesystem meta information (directory structure, names, attributes and file localization) and ensures that blocks are properly replicated in the cluster.
- It needs a lot of memory (memory bound).

The DataNode

- Manages the state of an HDFS node and interacts with its blocks.
- Needs a lot of I/O for processing and data transfer (I/O bound).

The critical components in this architecture are the NameNode and the Secondary NameNode.

How HDFS manages its files

HDFS is optimized for the storage of large files. You write the file once and access it many times. In HDFS, a file is split into several blocks. Each block is asynchronously replicated in the cluster. Therefore, the client sends its files once and the cluster takes care of replicating its blocks in the background.

A block is a contiguous area, a blob of data on the underlying filesystem; its default size is 64 MB but it can be extended to 128 MB or even 256 MB.
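If you want a larger default block size cluster-wide, a minimal sketch for Hadoop 1.x is to set the dfs.block.size property (value in bytes) in conf/hdfs-site.xml; the 128 MB figure below is only an example:

<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <description>Default block size of 128 MB (128 * 1024 * 1024 bytes).</description>
</property>

Only newly written files pick up the new default; existing files keep the block size they were written with.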


The fsimage file, which contains the filesystem metadata.

The edits file, which contains a list of modifications performed on the content of fsimage.

The in-memory image is the merge of those two files.

When the NameNode starts, it first loads fsimage and then applies the content of edits on it to recover the latest state of the filesystem.

An issue is that over time, the edits file keeps growing indefinitely and ends up:

- consuming all disk space
- slowing down restarts

The Secondary NameNode's role is to avoid this issue by regularly merging edits with fsimage, thus pushing a new fsimage and resetting the content of edits. The trigger for this compaction process is configurable (see the sketch below). It can be:

- The number of transactions performed on the cluster
- The size of the edits file
- The elapsed time since the last compaction
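A minimal sketch of the two knobs Hadoop 1.x exposes for this in conf/core-site.xml (fs.checkpoint.period in seconds, fs.checkpoint.size in bytes; the values shown are the usual defaults of one hour and 64 MB):

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>Number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
  <description>A checkpoint is triggered early once the edits file reaches this size, regardless of the period.</description>
</property>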

The following formula can be applied to know how much memory a NameNode needs:

<Needed memory> = <total storage size in the cluster in M...
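The rest of the formula is cut off above; as a rough, hedged illustration only, a common rule of thumb is on the order of 1 GB of NameNode heap per million HDFS objects (blocks, files, directories). For example, 200 TB of raw cluster storage with 64 MB blocks and a replication factor of 3:

$ echo $(( (200 * 1024 * 1024 / 64) / 3 ))   # raw storage in MB / block size in MB / replication factor
1092266                                       # roughly 1.1 million distinct blocks, i.e. about 1-2 GB of heap

Treat this as an order-of-magnitude estimate, since the NameNode also tracks files and directories, not just blocks.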


Building Single Node Cluster:

[Figure: cluster of machines at Yahoo!]

Prerequisites

Install Java (version 1.6 or higher).

Adding a dedicated Hadoop system user


We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps to separate the Hadoop installation from other software applications.

$ sudo addgroup Naresh
$ sudo adduser --ingroup Naresh Srinu

This will add the user Srinu and the group Naresh to your local machine.
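An optional sanity check that the account and group exist (a minimal sketch; the numeric IDs below are only sample output and will differ on your machine):

$ id Srinu
uid=1001(Srinu) gid=1001(Naresh) groups=1001(Naresh)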

Configuring SSH (Configuring Key Based Login)

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the Srinu user.

1) We have to generate an SSH key for the Srinu user.

user@ubuntu:~$ su - Srinu
Srinu@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Srinu/.ssh/id_rsa):
Created directory '/home/Srinu/.ssh'.
Your identification has been saved in /home/Srinu/.ssh/id_rsa.
Your public key has been saved in /home/Srinu/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 Srinu@ubuntu
The key's randomart image is:
[...snipp...]
Srinu@ubuntu:~$

The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key


without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).

2) We have to enable SSH access to your local machine with this newly created key.

Srinu@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The final step is to test the SSH setup by connecting to your local machine with the Srinu user. The step is also needed to save your local machine's host key fingerprint to the Srinu user's known_hosts file.

Srinu@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
Srinu@ubuntu:~$

Disabling IPv6

One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realized that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary.

To disable IPv6 on Ubuntu, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
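To apply the change without waiting for a reboot, and to confirm it took effect (a minimal check; a value of 1 means IPv6 is disabled):

$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1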


alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Configuration

hadoop-env.sh

The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK/JRE 6 directory.

Change

conf/hadoop-env.sh
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

To

conf/hadoop-env.sh


# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
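Optionally confirm that the path you set really points at a working JDK (a minimal check; use whatever path you put into JAVA_HOME above):

$ /usr/lib/jvm/java-6-sun/bin/java -version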

conf/*-site.xml

In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens to, etc. Our setup will use Hadoop's Distributed File System, HDFS, even though our little "cluster" only contains our single local machine.

You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter, which you must change to a directory of your choice. We will use the directory /app/hadoop/tmp.

Now we create the directory and set the required ownerships and permissions:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown Srinu:Naresh /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp

Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.

In file conf/core-site.xml:

conf/core-site.xml


<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>


</property>

Formatting HDFS via the NameNode

Srinu@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

Srinu@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=Srinu,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-Srinu/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Srinu@ubuntu:/usr/local/hadoop$

Starting Single Node Cluster

Srinu@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

This will start up a NameNode, DataNode, JobTracker and a TaskTracker on your machine. The output will look like this:


Srinu@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-tasktracker-ubuntu.out
Srinu@ubuntu:/usr/local/hadoop$

Checking the running Java processes with jps:

Srinu@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode

If there are any errors, examine the log files in the /logs/ directory.
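You can also sanity-check the daemons through Hadoop's built-in web interfaces; these are the stock Hadoop 0.20/1.x ports, so adjust them if your distribution changes the defaults:

http://localhost:50070/   - NameNode web UI (HDFS health, live DataNodes)
http://localhost:50030/   - JobTracker web UI (running and completed jobs)
http://localhost:50060/   - TaskTracker web UI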

Stopping Single Node Cluster

Srinu@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Srinu@ubuntu:/usr/local/hadoop$


Building Multi Node Cluster:

From two single-node clusters to a multi-node cluster: We will build a multi-node cluster using two Ubuntu boxes. The best way to do this for starters is to install, configure and test a "local" Hadoop setup for each of the two Ubuntu boxes, and in a second step to "merge" these two single-node clusters into one multi-node cluster in which one Ubuntu box will become the designated master (but also act as a slave with regard to data storage and processing), and the other box will become only a slave.


Prerequisites:

Configuring single-node clusters first: It is recommended that you use the same settings (e.g., installation locations and paths) on both machines, or otherwise you might run into problems later when we migrate the two machines to the final multi-node cluster setup. Now that you have two single-node clusters up and running, we will modify the Hadoop configuration to make one Ubuntu box the "master" (which will also act as a slave) and the other Ubuntu box a "slave".

Networking the nodes:

The easiest way is to put both machines in the same network with regard to hardware and software configuration, for example connect both machines via a single hub or switch and configure the network interfaces to use a common network such as 192.168.0.x/24.


To make it simple, we will assign the IP address 192.168.0.1 to the master machine and 192.168.0.2 to the slave machine. Update /etc/hosts on both machines with the following lines:

/etc/hosts (for master and slave)
192.168.0.1    master
192.168.0.2    slave

SSH Access:

The Srinu user on the master (aka Srinu@master) must be able to connect:

1) To its own user account on the master, i.e. ssh master in this context and not necessarily ssh localhost.
2) To the Srinu user account on the slave (aka Srinu@slave) via a password-less SSH login.

You just have to add Srinu@master's public SSH key (which should be in $HOME/.ssh/id_rsa.pub) to the authorized_keys file of Srinu@slave (in this user's $HOME/.ssh/authorized_keys). You can do this manually or use:

Srinu@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub Srinu@slave

This command will prompt you for the login password for user Srinu on slave, then copy the public SSH key for you, creating the correct directory and fixing the permissions as necessary.

The final step is to test the SSH setup by connecting with user Srinu from the master to the user account Srinu on the slave. The step is also needed to save slave's host key fingerprint to Srinu@master's known_hosts file.

So, connecting from master to master...


Srinu@master:~$ ssh master
The authenticity of host 'master (192.168.0.1)' can't be established.
RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (RSA) to the list of known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686
Srinu@master:~$

And from master to slave.

Srinu@master:~$ ssh slave
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave' (RSA) to the list of known hosts.
Ubuntu 10.04
Srinu@slave:~$

Hadoop Cluster

We will see how to configure one Ubuntu box as a master node and the other Ubuntu box as a slave node. The master node will also act as a slave because we only have two machines available in our cluster but still want to spread data storage and processing to multiple machines.


The master node will run the "master" daemons for each layer: the NameNode for the HDFS storage layer, and the JobTracker for the MapReduce processing layer.


Both machines will run the "slave" daemons, the DataNode and the TaskTracker, respectively (the primary NameNode and the JobTracker will be started on the same machine if you run bin/start-all.sh).

To start the daemons individually:

bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]

Again, the machine on which bin/start-dfs.sh is run will become the primary NameNode.

On master, update conf/masters so that it looks like this:

conf/masters (on master)
master

conf/slaves (master only)

The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will be run. We want both the master box and the slave box to act as Hadoop slaves because we want both of them to store and process data.

On master, update conf/slaves so that it looks like this:

conf/slaves (on master)
master
slave


The conf/slaves file on master is used only by scripts like bin/start-dfs.sh or bin/stop-dfs.sh. For example, if you want to add DataNodes on the fly you can "manually" start the DataNode daemon on a new slave machine via bin/hadoop-daemon.sh start datanode (see the sketch below). Using the conf/slaves file on the master simply helps you to make "full" cluster restarts easier.
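As an illustration of that "on the fly" approach (a hedged sketch; "newslave" is a placeholder hostname, and the commands are run on the new box itself, assuming Hadoop is installed under the same path there):

Srinu@newslave:/usr/local/hadoop$ bin/hadoop-daemon.sh start datanode
Srinu@newslave:/usr/local/hadoop$ bin/hadoop-daemon.sh start tasktracker

Remember to also add the new hostname to conf/slaves on master so it is picked up by the next full cluster restart.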

conf/*-site.xml (all machines)

We must change the configuration files conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml on ALL machines as follows.

First, we have to change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host and port. In our case, this is the master machine.

conf/core-site.xml (on all machines)

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

conf/mapred-site.xml (on all machines)


Second, we have to change the mapred.job.tracker parameter (in conf/mapred-site.xml), which specifies the JobTracker (MapReduce master) host and port. Again, this is the master in our case.

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Third, we change the dfs.replication parameter (in conf/hdfs-site.xml), which specifies the default block replication. It defines how many machines a single file should be replicated to before it becomes available. If you set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), you will start seeing a lot of "(Zero targets found, forbidden1.size=1)" type errors in the log files.

The default value of dfs.replication is 3. However, we have only two nodes available, so we set dfs.replication to 2.


conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

Formatting the NameNode

To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable on the NameNode), run the command:

Srinu@master:/usr/local/hadoop$ bin/hadoop namenode -format
... INFO dfs.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
Srinu@master:/usr/local/hadoop$

The HDFS name table is stored on the NameNode's (here: master) local filesystem in the directory specified by dfs.name.dir. The name table is used by the NameNode to store tracking and coordination information for the DataNodes.


Starting the multi-node cluster

Starting the cluster is performed in two steps.

1. We begin with starting the HDFS daemons: the NameNode daemon is started on master, and DataNode daemons are started on all slaves (here: master and slave).
2. Then we start the MapReduce daemons: the JobTracker is started on master, and TaskTracker daemons are started on all slaves (here: master and slave).

HDFS daemons

Run the command bin/start-dfs.sh on the machine you want the (primary) NameNode to run on. This will bring up HDFS with the NameNode running on the machine you ran the previous command on, and DataNodes on the machines listed in the conf/slaves file. In our case, we will run bin/start-dfs.sh on master:

Java processes running on master after bin/start-dfs.sh

Srinu@master:/usr/local/hadoop$ jps
14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$

Java processes running on slave after bin/start-dfs.sh

Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15616 Jps
Srinu@slave:/usr/local/hadoop$


MapReduce daemons

Run the command bin/start-mapred.sh on the machine you want the JobTracker to run on; in our case, that is also master.

Java processes running on master after bin/start-mapred.sh

Srinu@master:/usr/local/hadoop$ jps
16017 Jps
14799 NameNode
15686 TaskTracker
14880 DataNode
15596 JobTracker
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$

Java processes running on slave after bin/start-mapred.sh

Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15897 TaskTracker
16284 Jps
Srinu@slave:/usr/local/hadoop$
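Stopping the multi-node cluster mirrors the start-up, just in reverse order; a minimal sketch using the stock Hadoop 1.x scripts, run on master:

Srinu@master:/usr/local/hadoop$ bin/stop-mapred.sh   # stops the JobTracker and all TaskTrackers
Srinu@master:/usr/local/hadoop$ bin/stop-dfs.sh      # stops the NameNode and all DataNodes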


Commissioning and Decommissioning Nodes in a Hadoop Cluster

One of the most attractive features of the Hadoop framework is its use of commodity hardware. However, this also leads to frequent DataNode crashes in a Hadoop cluster. Another striking feature of the Hadoop framework is its ease of scaling in step with the rapid growth in data volume.


Each machine to be decommissioned is added to an exclude file, one hostname per line; this prevents it from connecting to the NameNode. The content of the /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt file is shown below.

slave2.in
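The NameNode only consults such a file if it is pointed at it; a minimal sketch of the usual wiring in conf/hdfs-site.xml (dfs.hosts.exclude is the standard HDFS property; the path matches the example file above):

<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt</value>
  <description>Full path to a file listing hosts that are not permitted to connect to the NameNode; DataNodes listed here will be decommissioned.</description>
</property>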

Step 4: Force configuration reload

Run the command "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" without the quotes.

$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

This will force the NameNode to re-read its configuration, including the newly updated 'excludes' file. It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines which are scheduled to remain active.

On slave2.in, check the jps command output. After some time, you will see that the DataNode process has shut down automatically.

Step 5: Shutdown nodes

After the decommission process has been completed, the decommissioned hardware can be safely shut down for maintenance. Run the report command to dfsadmin to check the status of the decommission. The following command will describe the status of the decommissioned node and the nodes connected to the cluster.

$ $HADOOP_HOME/bin/hadoop dfsadmin -report

Step 6: Edit the excludes file again

Once the machines have been decommissioned, they can be removed from the 'excludes' file. Running $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the cluster after the maintenance has been completed, or when additional capacity is needed in the cluster again, etc.

Special Note: If the above process is followed and the TaskTracker process is still running on the node, it needs to be shut down. One way is to disconnect the machine as


we did in the above steps. The master will recognize this automatically and declare the node dead. There is no need to follow the same process for removing the TaskTracker, because it is not as crucial as the DataNode; the DataNode contains the data that you want to remove safely without any loss.

The TaskTracker can be started or shut down on the fly by the following command at any point of time.
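A hedged sketch of such a command, using the standard Hadoop 1.x daemon script on the node in question:

$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker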

Challenges of running a Hadoop Cluster

The main challenge in running a Hadoop cluster comes from maintenance itself. We will point out some of the common problems we face every day.


1. Replacing/upgrading hard drives. You have to be careful while scaling.

2. Commissioning and decommissioning is fairly simple, but still keep an eye on the *-site.xml files.

3. Performance issues as your data grows. They could be at the network level, IO level, disk level or application level.

4. There are open-source tools like Nagios and Ganglia, or enterprise solutions like Bright Computing, for better monitoring capabilities. You have to research what fits your cluster.

5. Once again, it is better to have a clear understanding of each and every parameter in mapred-site.xml and hdfs-site.xml. If you just want to performance-tune and configure the cluster, fully understanding how each parameter in mapred-default.xml and core-default.xml impacts your jobs is critical. There have been changes in the names of properties over releases, and not taking them into account is a common mistake for people new to the domain. Failing hardware (disks mainly) is very common.

6. Typically a large Hadoop cluster's main job is replacing hardware, especially hard drives. Software management is not that much work after the initial setup. A reboot resolves most of the problems.


Largest Cluster in the World

The largest publicly known Hadoop clusters are Yahoo!'s 4000-node cluster, followed by Facebook's 2300-node cluster [1].

Yahoo! has lots of Hadoop nodes, but they are organized under different clusters and used for different purposes (a significant number of these clusters are research clusters).

The current JobTracker and NameNode actually don't scale that well to that many nodes (they have lots of concurrency issues).


The statistics files are named:

<hostname>_<epoch of jobtracker start>_<job id>_<job name>

Standard Error

These logs are created by each TaskTracker. They contain information written to standard error (stderr) captured when a task attempt is run. These logs can be used for debugging. For example, a developer can include System.err.println("some useful information") calls in the job code. The output will appear in the standard error files.

The parent directory name for these logs is constructed as follows:

/var/log/hadoop/userlogs/attempt_<job id>_<map or reduce>_<attempt id>

where <job id> is the ID of the job that this attempt is doing work for, <map or reduce> is either "m" if the task attempt was a mapper or "r" if the task attempt was a reducer, and <attempt id> is the ID of the task attempt.

For example:

/var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0

These logs are rotated according to the mapred.userlog.retain.hours property. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
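A hedged housekeeping example along those lines (the 7-day threshold and the log path are assumptions; adjust both to your retention policy, and make sure nothing listed belongs to a running job before deleting):

$ find /var/log/hadoop/userlogs -maxdepth 1 -type d -name 'attempt_*' -mtime +7 -print
# once reviewed, archive or remove them, e.g.:
$ find /var/log/hadoop/userlogs -maxdepth 1 -type d -name 'attempt_*' -mtime +7 -exec rm -r {} +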

1) HADOOP NAMENODE COMMANDS

- hadoop namenode -format : Format the HDFS filesystem from the NameNode
- hadoop namenode -upgrade : Upgrade the NameNode
- start-dfs.sh : Start HDFS daemons


- stop-dfs.sh : Stop HDFS daemons
- start-mapred.sh : Start MapReduce daemons
- stop-mapred.sh : Stop MapReduce daemons
- hadoop namenode -recover -force : Recover NameNode metadata after a cluster failure (may lose data)

2) HADOOP FSCK COMMANDS

- hadoop fsck : Filesystem check on HDFS
- hadoop fsck -files : Display files during the check
- hadoop fsck -files -blocks : Display files and blocks during the check
- hadoop fsck -files -blocks -locations : Display files, blocks and their locations during the check
- hadoop fsck -files -blocks -locations -racks : Display the network topology for DataNode locations
- hadoop fsck -delete : Delete corrupted files
- hadoop fsck -move : Move corrupted files to the lost+found directory

3) HADOOP JOB COMMANDS

- hadoop job -submit <job-file> : Submit the job
- hadoop job -status <job-id> : Print job status and completion percentage


- hadoop job -list all : List all jobs
- hadoop job -list-active-trackers : List all available TaskTrackers
- hadoop job -set-priority <job-id> <priority> : Set priority for a job. Valid priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
- hadoop job -kill-task <task-id> : Kill a task
- hadoop job -history : Display job history including job details, failed and killed jobs

4) HADOOP DFSADMIN COMMANDS

- hadoop dfsadmin -report : Report filesystem info and statistics
- hadoop dfsadmin -metasave file.txt : Save the NameNode's primary data structures to file.txt
- hadoop dfsadmin -setQuota 10 quotatest : Set the Hadoop directory quota to only 10 files
- hadoop dfsadmin -clrQuota quotatest : Clear the Hadoop directory quota
- hadoop dfsadmin -refreshNodes : Re-read the hosts and exclude files to update which DataNodes are allowed to connect to the NameNode; mostly used to commission or decommission nodes
- hadoop fs -count -q mydir : Check the quota space on the directory mydir
- hadoop dfsadmin -setSpaceQuota 100 mydir : Set the space quota to 100 on the HDFS directory named mydir


- hadoop dfsadmin -clrSpaceQuota mydir : Clear the space quota on an HDFS directory
- hadoop dfsadmin -saveNamespace : Back up the metadata (fsimage and edits); put the cluster in safe mode before running this command

5) HADOOP SAFEMODE COMMANDS

The following dfsadmin commands make the cluster enter or leave safe mode, which is also called maintenance mode. In this mode, the NameNode does not accept any changes to the namespace; it does not replicate or delete blocks.

- hadoop dfsadmin -safemode enter : Enter safe mode
- hadoop dfsadmin -safemode leave : Leave safe mode
- hadoop dfsadmin -safemode get : Get the current safe mode status
- hadoop dfsadmin -safemode wait : Wait until HDFS finishes data block replication

6) HADOOP CONFIGURATION FILES

- core-site.xml : Parameters for the entire Hadoop cluster
- hdfs-site.xml : Parameters for HDFS and its clients
- mapred-site.xml : Parameters for MapReduce and its clients
- masters : Host machines for the secondary NameNode
- slaves : List of slave hosts


7) HADOOP MRADMIN COMMANDS

- hadoop mradmin -safemode get : Check JobTracker status
- hadoop mradmin -refreshQueues : Reload MapReduce queue configuration
- hadoop mradmin -refreshNodes : Reload active TaskTrackers
- hadoop mradmin -refreshServiceAcl : Force the JobTracker to reload the service ACLs
- hadoop mradmin -refreshUserToGroupsMappings : Force the JobTracker to reload user-to-group mappings

8) HADOOP BA