Cluster Management



    Hadoop Admin:

    Index:

- Responsibilities of Hadoop Admin.
- Building Single Node Cluster.
- Building Multi Node Cluster.
- Commissioning and Decommissioning Nodes in Cluster.
- Challenges of running Hadoop Cluster.
- System Log Files.
- Hadoop Admin Commands.

    Hadoop Admin Responsibilities:

- Responsible for implementation and ongoing administration of the Hadoop infrastructure.

- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.

- Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.

- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, Dell OpenManage and other tools.

- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.

- Screening Hadoop cluster job performance and capacity planning.

- Monitoring Hadoop cluster connectivity and security.

- Managing and reviewing Hadoop log files.

- File system management and monitoring.

- HDFS support and maintenance.

- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.

- Collaborating with application teams to install operating system and Hadoop updates, patches and version upgrades when required.

- Point of contact for vendor escalation.


The NameNode and the Secondary NameNode

- Stores the filesystem meta information (directory structure, names, attributes and file localization) and ensures that blocks are properly replicated in the cluster.
- It needs a lot of memory (memory bound).

The DataNode

- Manages the state of an HDFS node and interacts with its blocks.
- Needs a lot of I/O for processing and data transfer (I/O bound).

The critical components in this architecture are the NameNode and the Secondary NameNode.

How HDFS manages its files

HDFS is optimized for the storage of large files. You write the file once and access it many times. In HDFS, a file is split into several blocks. Each block is asynchronously replicated in the cluster. Therefore, the client sends its files once and the cluster takes care of replicating its blocks in the background.

A block is a contiguous area, a blob of data on the underlying filesystem; its default size is 64 MB but it can be extended to 128 MB or even 256 MB.
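If you want a larger default block size cluster-wide, a minimal sketch for Hadoop 1.x is to set the dfs.block.size property (value in bytes) in conf/hdfs-site.xml; the 128 MB figure below is only an example:

<property>
  <name>dfs.block.size</name>
  <value>134217728</value>
  <description>Default block size of 128 MB (128 * 1024 * 1024 bytes).</description>
</property>

Only newly written files pick up the new default; existing files keep the block size they were written with.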


The fsimage file, which contains the filesystem metadata.

The edits file, which contains a list of modifications performed on the content of fsimage.

The in-memory image is the merge of those two files.

When the NameNode starts, it first loads fsimage and then applies the content of edits on it to recover the latest state of the filesystem.

An issue is that over time, the edits file keeps growing indefinitely and ends up:

- consuming all disk space
- slowing down restarts

The Secondary NameNode's role is to avoid this issue by regularly merging edits with fsimage, thus pushing a new fsimage and resetting the content of edits. The trigger for this compaction process is configurable (see the sketch below). It can be:

- The number of transactions performed on the cluster
- The size of the edits file
- The elapsed time since the last compaction
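A minimal sketch of the two knobs Hadoop 1.x exposes for this in conf/core-site.xml (fs.checkpoint.period in seconds, fs.checkpoint.size in bytes; the values shown are the usual defaults of one hour and 64 MB):

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>Number of seconds between two periodic checkpoints.</description>
</property>
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
  <description>A checkpoint is triggered early once the edits file reaches this size, regardless of the period.</description>
</property>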

The following formula can be applied to know how much memory a NameNode needs:

<Needed memory> = <total storage size in the cluster in M...
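The rest of the formula is cut off above; as a rough, hedged illustration only, a common rule of thumb is on the order of 1 GB of NameNode heap per million HDFS objects (blocks, files, directories). For example, 200 TB of raw cluster storage with 64 MB blocks and a replication factor of 3:

$ echo $(( (200 * 1024 * 1024 / 64) / 3 ))   # raw storage in MB / block size in MB / replication factor
1092266                                       # roughly 1.1 million distinct blocks, i.e. about 1-2 GB of heap

Treat this as an order-of-magnitude estimate, since the NameNode also tracks files and directories, not just blocks.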


Building Single Node Cluster:

[Figure: cluster of machines at Yahoo!]

Prerequisites

Install Java (version 1.6 or higher).

Adding a dedicated Hadoop system user


We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps to separate the Hadoop installation from other software applications.

$ sudo addgroup Naresh
$ sudo adduser --ingroup Naresh Srinu

This will add the user Srinu and the group Naresh to your local machine.
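An optional sanity check that the account and group exist (a minimal sketch; the numeric IDs below are only sample output and will differ on your machine):

$ id Srinu
uid=1001(Srinu) gid=1001(Naresh) groups=1001(Naresh)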

Configuring SSH (Configuring Key Based Login)

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the Srinu user.

1) We have to generate an SSH key for the Srinu user.

user@ubuntu:~$ su - Srinu
Srinu@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Srinu/.ssh/id_rsa):
Created directory '/home/Srinu/.ssh'.
Your identification has been saved in /home/Srinu/.ssh/id_rsa.
Your public key has been saved in /home/Srinu/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 Srinu@ubuntu
The key's randomart image is:
[...snipp...]
Srinu@ubuntu:~$

The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key


without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).

2) We have to enable SSH access to your local machine with this newly created key.

Srinu@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The final step is to test the SSH setup by connecting to your local machine with the Srinu user. The step is also needed to save your local machine's host key fingerprint to the Srinu user's known_hosts file.

Srinu@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
Srinu@ubuntu:~$

Disabling IPv6

One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realized that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary.

To disable IPv6 on Ubuntu, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
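To apply the change without waiting for a reboot, and to confirm it took effect (a minimal check; a value of 1 means IPv6 is disabled):

$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
1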


alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

Configuration

hadoop-env.sh

The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK/JRE 6 directory.

Change

conf/hadoop-env.sh
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

To

conf/hadoop-env.sh


# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
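Optionally confirm that the path you set really points at a working JDK (a minimal check; use whatever path you put into JAVA_HOME above):

$ /usr/lib/jvm/java-6-sun/bin/java -version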

conf/*-site.xml

In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens to, etc. Our setup will use Hadoop's Distributed File System, HDFS, even though our little "cluster" only contains our single local machine.

You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter, which you must change to a directory of your choice. We will use the directory /app/hadoop/tmp.

Now we create the directory and set the required ownerships and permissions:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown Srinu:Naresh /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp

Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.

In file conf/core-site.xml:

conf/core-site.xml


<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

conf/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>


</property>

Formatting HDFS via the NameNode

Srinu@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

Srinu@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=Srinu,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-Srinu/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Srinu@ubuntu:/usr/local/hadoop$

Starting Single Node Cluster

Srinu@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

This will start up a NameNode, DataNode, JobTracker and a TaskTracker on your machine. The output will look like this:


Srinu@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-tasktracker-ubuntu.out
Srinu@ubuntu:/usr/local/hadoop$

Checking the running Java processes with jps:

Srinu@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode

If there are any errors, examine the log files in the /logs/ directory.
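You can also sanity-check the daemons through Hadoop's built-in web interfaces; these are the stock Hadoop 0.20/1.x ports, so adjust them if your distribution changes the defaults:

http://localhost:50070/   - NameNode web UI (HDFS health, live DataNodes)
http://localhost:50030/   - JobTracker web UI (running and completed jobs)
http://localhost:50060/   - TaskTracker web UI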

Stopping Single Node Cluster

Srinu@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Srinu@ubuntu:/usr/local/hadoop$


Building Multi Node Cluster:

From two single-node clusters to a multi-node cluster: We will build a multi-node cluster using two Ubuntu boxes. The best way to do this for starters is to install, configure and test a "local" Hadoop setup for each of the two Ubuntu boxes, and in a second step to "merge" these two single-node clusters into one multi-node cluster in which one Ubuntu box will become the designated master (but also act as a slave with regard to data storage and processing), and the other box will become only a slave.


Prerequisites:

Configuring single-node clusters first: It is recommended that you use the same settings (e.g., installation locations and paths) on both machines, or otherwise you might run into problems later when we migrate the two machines to the final multi-node cluster setup. Now that you have two single-node clusters up and running, we will modify the Hadoop configuration to make one Ubuntu box the "master" (which will also act as a slave) and the other Ubuntu box a "slave".

Networking the nodes:

The easiest way is to put both machines in the same network with regard to hardware and software configuration, for example connect both machines via a single hub or switch and configure the network interfaces to use a common network such as 192.168.0.x/24.


To make it simple, we will assign the IP address 192.168.0.1 to the master machine and 192.168.0.2 to the slave machine. Update /etc/hosts on both machines with the following lines:

/etc/hosts (for master and slave)
192.168.0.1    master
192.168.0.2    slave

SSH Access:

The Srinu user on the master (aka Srinu@master) must be able to connect:

1) To its own user account on the master, i.e. ssh master in this context and not necessarily ssh localhost.
2) To the Srinu user account on the slave (aka Srinu@slave) via a password-less SSH login.

You just have to add Srinu@master's public SSH key (which should be in $HOME/.ssh/id_rsa.pub) to the authorized_keys file of Srinu@slave (in this user's $HOME/.ssh/authorized_keys). You can do this manually or use:

Srinu@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub Srinu@slave

This command will prompt you for the login password for user Srinu on slave, then copy the public SSH key for you, creating the correct directory and fixing the permissions as necessary.

The final step is to test the SSH setup by connecting with user Srinu from the master to the user account Srinu on the slave. The step is also needed to save slave's host key fingerprint to Srinu@master's known_hosts file.

So, connecting from master to master...


Srinu@master:~$ ssh master
The authenticity of host 'master (192.168.0.1)' can't be established.
RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (RSA) to the list of known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686
Srinu@master:~$

And from master to slave.

Srinu@master:~$ ssh slave
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave' (RSA) to the list of known hosts.
Ubuntu 10.04
Srinu@slave:~$

Hadoop Cluster

We will see how to configure one Ubuntu box as a master node and the other Ubuntu box as a slave node. The master node will also act as a slave because we only have two machines available in our cluster but still want to spread data storage and processing to multiple machines.


The master node will run the "master" daemons for each layer: the NameNode for the HDFS storage layer, and the JobTracker for the MapReduce processing layer.


Both machines will run the "slave" daemons, the DataNode and the TaskTracker, respectively (the primary NameNode and the JobTracker will be started on the same machine if you run bin/start-all.sh).

To start the daemons individually:

bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]

Again, the machine on which bin/start-dfs.sh is run will become the primary NameNode.

On master, update conf/masters so that it looks like this:

conf/masters (on master)
master

conf/slaves (master only)

The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will be run. We want both the master box and the slave box to act as Hadoop slaves because we want both of them to store and process data.

On master, update conf/slaves so that it looks like this:

conf/slaves (on master)
master
slave


The conf/slaves file on master is used only by scripts like bin/start-dfs.sh or bin/stop-dfs.sh. For example, if you want to add DataNodes on the fly you can "manually" start the DataNode daemon on a new slave machine via bin/hadoop-daemon.sh start datanode (see the sketch below). Using the conf/slaves file on the master simply helps you to make "full" cluster restarts easier.
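As an illustration of that "on the fly" approach (a hedged sketch; "newslave" is a placeholder hostname, and the commands are run on the new box itself, assuming Hadoop is installed under the same path there):

Srinu@newslave:/usr/local/hadoop$ bin/hadoop-daemon.sh start datanode
Srinu@newslave:/usr/local/hadoop$ bin/hadoop-daemon.sh start tasktracker

Remember to also add the new hostname to conf/slaves on master so it is picked up by the next full cluster restart.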

conf/*-site.xml (all machines)

We must change the configuration files conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml on ALL machines as follows.

First, we have to change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host and port. In our case, this is the master machine.

conf/core-site.xml (on all machines)

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

conf/mapred-site.xml (on all machines)


Second, we have to change the mapred.job.tracker parameter (in conf/mapred-site.xml), which specifies the JobTracker (MapReduce master) host and port. Again, this is the master in our case.

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Third, we change the dfs.replication parameter (in conf/hdfs-site.xml), which specifies the default block replication. It defines how many machines a single file should be replicated to before it becomes available. If you set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), you will start seeing a lot of "(Zero targets found, forbidden1.size=1)" type errors in the log files.

The default value of dfs.replication is 3. However, we have only two nodes available, so we set dfs.replication to 2.


conf/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

Formatting the NameNode

To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable on the NameNode), run the command:

Srinu@master:/usr/local/hadoop$ bin/hadoop namenode -format
... INFO dfs.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
Srinu@master:/usr/local/hadoop$

The HDFS name table is stored on the NameNode's (here: master) local filesystem in the directory specified by dfs.name.dir. The name table is used by the NameNode to store tracking and coordination information for the DataNodes.


Starting the multi-node cluster

Starting the cluster is performed in two steps.

1. We begin with starting the HDFS daemons: the NameNode daemon is started on master, and DataNode daemons are started on all slaves (here: master and slave).
2. Then we start the MapReduce daemons: the JobTracker is started on master, and TaskTracker daemons are started on all slaves (here: master and slave).

HDFS daemons

Run the command bin/start-dfs.sh on the machine you want the (primary) NameNode to run on. This will bring up HDFS with the NameNode running on the machine you ran the previous command on, and DataNodes on the machines listed in the conf/slaves file. In our case, we will run bin/start-dfs.sh on master:

Java processes running on master after bin/start-dfs.sh

Srinu@master:/usr/local/hadoop$ jps
14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$

Java processes running on slave after bin/start-dfs.sh

Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15616 Jps
Srinu@slave:/usr/local/hadoop$


MapReduce daemons

Run the command bin/start-mapred.sh on the machine you want the JobTracker to run on; in our case, that is also master.

Java processes running on master after bin/start-mapred.sh

Srinu@master:/usr/local/hadoop$ jps
16017 Jps
14799 NameNode
15686 TaskTracker
14880 DataNode
15596 JobTracker
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$

Java processes running on slave after bin/start-mapred.sh

Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15897 TaskTracker
16284 Jps
Srinu@slave:/usr/local/hadoop$
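Stopping the multi-node cluster mirrors the start-up, just in reverse order; a minimal sketch using the stock Hadoop 1.x scripts, run on master:

Srinu@master:/usr/local/hadoop$ bin/stop-mapred.sh   # stops the JobTracker and all TaskTrackers
Srinu@master:/usr/local/hadoop$ bin/stop-dfs.sh      # stops the NameNode and all DataNodes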


Commissioning and Decommissioning Nodes in a Hadoop Cluster

One of the most attractive features of the Hadoop framework is its use of commodity hardware. However, this also leads to frequent DataNode crashes in a Hadoop cluster. Another striking feature of the Hadoop framework is its ease of scaling in step with the rapid growth in data volume.


Each machine to be decommissioned is added to an exclude file, one hostname per line; this prevents it from connecting to the NameNode. The content of the /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt file is shown below.

slave2.in
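The NameNode only consults such a file if it is pointed at it; a minimal sketch of the usual wiring in conf/hdfs-site.xml (dfs.hosts.exclude is the standard HDFS property; the path matches the example file above):

<property>
  <name>dfs.hosts.exclude</name>
  <value>/home/hadoop/hadoop-1.2.1/hdfs_exclude.txt</value>
  <description>Full path to a file listing hosts that are not permitted to connect to the NameNode; DataNodes listed here will be decommissioned.</description>
</property>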

Step 4: Force configuration reload

Run the command "$HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes" without the quotes.

$ $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

This will force the NameNode to re-read its configuration, including the newly updated 'excludes' file. It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines which are scheduled to remain active.

On slave2.in, check the jps command output. After some time, you will see that the DataNode process has shut down automatically.

Step 5: Shutdown nodes

After the decommission process has been completed, the decommissioned hardware can be safely shut down for maintenance. Run the report command to dfsadmin to check the status of the decommission. The following command will describe the status of the decommissioned node and the nodes connected to the cluster.

$ $HADOOP_HOME/bin/hadoop dfsadmin -report

Step 6: Edit the excludes file again

Once the machines have been decommissioned, they can be removed from the 'excludes' file. Running $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the cluster after the maintenance has been completed, or when additional capacity is needed in the cluster again, etc.

Special Note: If the above process is followed and the TaskTracker process is still running on the node, it needs to be shut down. One way is to disconnect the machine as


we did in the above steps. The master will recognize this automatically and declare the node dead. There is no need to follow the same process for removing the TaskTracker, because it is not as crucial as the DataNode; the DataNode contains the data that you want to remove safely without any loss.

The TaskTracker can be started or shut down on the fly by the following command at any point of time.
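A hedged sketch of such a command, using the standard Hadoop 1.x daemon script on the node in question:

$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker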

Challenges of running a Hadoop Cluster

The main challenge in running a Hadoop cluster comes from maintenance itself. We will point out some of the common problems we face every day.


1. Replacing/upgrading hard drives. You have to be careful while scaling.

2. Commissioning and decommissioning is fairly simple, but still keep an eye on the *-site.xml files.

3. Performance issues as your data grows. They could be at the network level, IO level, disk level or application level.

4. There are open-source tools like Nagios and Ganglia, or enterprise solutions like Bright Computing, for better monitoring capabilities. You have to research what fits your cluster.

5. Once again, it is better to have a clear understanding of each and every parameter in mapred-site.xml and hdfs-site.xml. If you just want to performance-tune and configure the cluster, fully understanding how each parameter in mapred-default.xml and core-default.xml impacts your jobs is critical. There have been changes in the names of properties over releases, and not taking them into account is a common mistake for people new to the domain. Failing hardware (disks mainly) is very common.

6. Typically a large Hadoop cluster's main job is replacing hardware, especially hard drives. Software management is not that much work after the initial setup. A reboot resolves most of the problems.


Largest Cluster in the World

The largest publicly known Hadoop clusters are Yahoo!'s 4000-node cluster, followed by Facebook's 2300-node cluster [1].

Yahoo! has lots of Hadoop nodes, but they are organized under different clusters and used for different purposes (a significant number of these clusters are research clusters).

The current JobTracker and NameNode actually don't scale that well to that many nodes (they have lots of concurrency issues).


The statistics files are named:

<hostname>_<epoch of jobtracker start>_<job id>_<job name>

Standard Error

These logs are created by each TaskTracker. They contain information written to standard error (stderr) captured when a task attempt is run. These logs can be used for debugging. For example, a developer can include System.err.println("some useful information") calls in the job code. The output will appear in the standard error files.

The parent directory name for these logs is constructed as follows:

/var/log/hadoop/userlogs/attempt_<job id>_<map or reduce>_<attempt id>

where <job id> is the ID of the job that this attempt is doing work for, <map or reduce> is either "m" if the task attempt was a mapper or "r" if the task attempt was a reducer, and <attempt id> is the ID of the task attempt.

For example:

/var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0

These logs are rotated according to the mapred.userlog.retain.hours property. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
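A hedged housekeeping example along those lines (the 7-day threshold and the log path are assumptions; adjust both to your retention policy, and make sure nothing listed belongs to a running job before deleting):

$ find /var/log/hadoop/userlogs -maxdepth 1 -type d -name 'attempt_*' -mtime +7 -print
# once reviewed, archive or remove them, e.g.:
$ find /var/log/hadoop/userlogs -maxdepth 1 -type d -name 'attempt_*' -mtime +7 -exec rm -r {} +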

1) HADOOP NAMENODE COMMANDS

- hadoop namenode -format : Format the HDFS filesystem from the NameNode
- hadoop namenode -upgrade : Upgrade the NameNode
- start-dfs.sh : Start HDFS daemons


- stop-dfs.sh : Stop HDFS daemons
- start-mapred.sh : Start MapReduce daemons
- stop-mapred.sh : Stop MapReduce daemons
- hadoop namenode -recover -force : Recover NameNode metadata after a cluster failure (may lose data)

2) HADOOP FSCK COMMANDS

- hadoop fsck : Filesystem check on HDFS
- hadoop fsck -files : Display files during the check
- hadoop fsck -files -blocks : Display files and blocks during the check
- hadoop fsck -files -blocks -locations : Display files, blocks and their locations during the check
- hadoop fsck -files -blocks -locations -racks : Display the network topology for DataNode locations
- hadoop fsck -delete : Delete corrupted files
- hadoop fsck -move : Move corrupted files to the lost+found directory

3) HADOOP JOB COMMANDS

- hadoop job -submit <job-file> : Submit the job
- hadoop job -status <job-id> : Print job status and completion percentage


- hadoop job -list all : List all jobs
- hadoop job -list-active-trackers : List all available TaskTrackers
- hadoop job -set-priority <job-id> <priority> : Set priority for a job. Valid priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
- hadoop job -kill-task <task-id> : Kill a task
- hadoop job -history : Display job history including job details, failed and killed jobs

4) HADOOP DFSADMIN COMMANDS

- hadoop dfsadmin -report : Report filesystem info and statistics
- hadoop dfsadmin -metasave file.txt : Save the NameNode's primary data structures to file.txt
- hadoop dfsadmin -setQuota 10 quotatest : Set the Hadoop directory quota to only 10 files
- hadoop dfsadmin -clrQuota quotatest : Clear the Hadoop directory quota
- hadoop dfsadmin -refreshNodes : Re-read the hosts and exclude files to update which DataNodes are allowed to connect to the NameNode; mostly used to commission or decommission nodes
- hadoop fs -count -q mydir : Check the quota space on the directory mydir
- hadoop dfsadmin -setSpaceQuota 100 mydir : Set the space quota to 100 on the HDFS directory named mydir


- hadoop dfsadmin -clrSpaceQuota mydir : Clear the space quota on an HDFS directory
- hadoop dfsadmin -saveNamespace : Back up the metadata (fsimage and edits); put the cluster in safe mode before running this command

5) HADOOP SAFEMODE COMMANDS

The following dfsadmin commands make the cluster enter or leave safe mode, which is also called maintenance mode. In this mode, the NameNode does not accept any changes to the namespace; it does not replicate or delete blocks.

- hadoop dfsadmin -safemode enter : Enter safe mode
- hadoop dfsadmin -safemode leave : Leave safe mode
- hadoop dfsadmin -safemode get : Get the current safe mode status
- hadoop dfsadmin -safemode wait : Wait until HDFS finishes data block replication

6) HADOOP CONFIGURATION FILES

- core-site.xml : Parameters for the entire Hadoop cluster
- hdfs-site.xml : Parameters for HDFS and its clients
- mapred-site.xml : Parameters for MapReduce and its clients
- masters : Host machines for the secondary NameNode
- slaves : List of slave hosts


7) HADOOP MRADMIN COMMANDS

- hadoop mradmin -safemode get : Check JobTracker status
- hadoop mradmin -refreshQueues : Reload MapReduce queue configuration
- hadoop mradmin -refreshNodes : Reload active TaskTrackers
- hadoop mradmin -refreshServiceAcl : Force the JobTracker to reload the service ACLs
- hadoop mradmin -refreshUserToGroupsMappings : Force the JobTracker to reload user-to-group mappings

8) HADOOP BA