Hadoop Admin:
Index:
- Responsibilities of Hadoop Admin.
- Building Single Node Cluster.
- Building Multi Node Cluster.
- Commissioning and Decommissioning Nodes in Cluster.
- Challenges of running Hadoop Cluster.
- System Log Files.
- Hadoop Admin Commands.
Hadoop Admin Responsibilities:
- Responsible for implementation and ongoing administration of Hadoop infrastructure.
- Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users.
- Cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Cloudera Manager Enterprise, Dell Open Manage and other tools.
- Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
- Screen Hadoop cluster job performances and capacity planning.
- Monitor Hadoop cluster connectivity and security.
- Manage and review Hadoop log files.
- File system management and monitoring.
- HDFS support and maintenance.
- Diligently teaming with the infrastructure, network, database, application and business intelligence teams to guarantee high data quality and availability.
- Collaborating with application teams to install operating system and Hadoop updates, patches and version upgrades when required.
- Point of Contact for vendor escalation.
The NameNode and the Secondary NameNode
The NameNode stores the filesystem meta information (directory structure, names, attributes and file localization) and ensures that blocks are properly replicated in the cluster. It needs a lot of memory (memory bound).
The DataNode
The DataNode manages the state of an HDFS node and interacts with its blocks. It needs a lot of I/O for processing and data transfer (I/O bound).
The critical components in this architecture are the NameNode and the Secondary NameNode.
How HDFS manages its files
HDFS is optimized for the storage of large files. You write the file once and access it many times. In HDFS, a file is split into several blocks. Each block is asynchronously replicated in the cluster. Therefore, the client sends its files once and the cluster takes care of replicating its blocks in the background.
A block is a contiguous area, a blob of data on the underlying filesystem; its default size is 64 MB but it can be extended to 128 MB or even 256 MB.
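Because the cluster, not the client, manages replication, you can inspect how a stored file was split into blocks with the fsck tool. A minimal sketch (the path /user/Srinu/big.log is only an illustration, not taken from this document):
$ hadoop fsck /user/Srinu/big.log -files -blocks -locations
This lists each block of the file, its size, and the DataNodes holding each replica.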
a"simagefile which contains the filesystem metadata
The editsfile which contains a list of modifications performed on the content
of"simage.
The in memory image is the merge of those two files.
When the +ame+ode starts" it first loads"simageand then applies the content
of editson it to recover the latest state of the filesystem.
An issue would be that over time" the edits file keeps growing undefinitely and ends up
by1
consuming all disk space
slowdown restarts
TheSecondary NameNoderole is to avoid this issue by regularly
merging editswith"simage" thus pushing a new"simageand resetting the content
of edits#The trigger for this compaction process is configurable. 0t can be1
The number of transactions performed on the cluster
The si?e of the edits file
The elapsed time since the last compaction
T$e "ollo%ing "ormula can be applied to kno% $o% muc$ memory a +ame+odeneeds&
B+eeded memoryC D Btotal storage si?e in the cluster in (
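On Hadoop 1.x you can inspect, and force, this compaction from the command line. A small sketch, assuming the secondarynamenode command options available in that release:
$ hadoop secondarynamenode -geteditsize      # report the current size of the edits file
$ hadoop secondarynamenode -checkpoint force # merge edits into fsimage immediately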
Building Single Node Cluster:
Cluster of Machines at Yahoo!
Prerequisites
Install Java (Java 1.6).
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop. While that is not required, it is recommended because it helps to separate the Hadoop installation from other software applications.
$ sudo addgroup Naresh
$ sudo adduser --ingroup Naresh Srinu
This will add the user Srinu and the group Naresh to your local machine.
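A quick way to confirm the account and group were created as intended (standard Linux commands, not part of the original steps):
$ id Srinu       # should show Naresh as the primary group
$ groups Srinu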
Configuring SSH (Configuring Key Based Login)
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the Srinu user.
1) We have to generate an SSH key for the Srinu user.
user@ubuntu:~$ su - Srinu
Srinu@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/Srinu/.ssh/id_rsa):
Created directory '/home/Srinu/.ssh'.
Your identification has been saved in /home/Srinu/.ssh/id_rsa.
Your public key has been saved in /home/Srinu/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 Srinu@ubuntu
The key's randomart image is:
[...snipp...]
Srinu@ubuntu:~$
The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key
without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).
2) We have to enable SSH access to your local machine with this newly created key.
Srinu@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
The final step is to test the SSH setup by connecting to your local machine with the Srinu user. The step is also needed to save your local machine's host key fingerprint to the Srinu user's known_hosts file.
Srinu@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
Srinu@ubuntu:~$
"isabling #v%
-ne problem with 0'v; on 7buntu is that usingE.E.E.Efor the various networking
related Hadoop configuration options will result in Hadoop binding to the 0'v;
addresses of my 7buntu box. 0n my case" 0 reali?ed that there/s no practical point in
enabling 0'v; on a box when you are not connected to any 0'v; network. Hence" 0
simply disabled 0'v; on my 7buntu machine. @our mileage may vary.
To disable 0'v; on 7buntu" open5etc5sysctl.confin the editor of your choice and add
the following lines to the end of the file1
B disable ip!6
netip!6con7alldisable'ip!6 E 5
netip!6con7de7aultdisable'ip!6 E 5netip!6con7lodisable'ip!6 E 5
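You can then reload the settings and verify that IPv6 is really disabled; a value of 1 means disabled:
$ sudo sysctl -p
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6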
alias hls="fs -ls"
# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
Configuration
hadoop-env.sh
The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK/JRE 6 directory.
Change
conf/hadoop-env.sh
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
To
conf/hadoop-env.sh
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
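A quick sanity check that the new JAVA_HOME points at a working JDK (the paths assume the installation locations used in this tutorial):
$ source /usr/local/hadoop/conf/hadoop-env.sh
$ $JAVA_HOME/bin/java -version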
conf/*-site.xml
In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens to, etc. Our setup will use Hadoop's Distributed File System, HDFS, even though our little "cluster" only contains our single local machine.
You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter; this parameter you must change to a directory of your choice. We will use the directory /app/hadoop/tmp.
Now we create the directory and set the required ownerships and permissions:
$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown Srinu:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp
Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.
In file conf/core-site.xml:
conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
Formatting HDFS via the NameNode.
Srinu@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
Srinu@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=Srinu,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-Srinu/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
Srinu@ubuntu:/usr/local/hadoop$
Starting Single Node Cluster.
Srinu@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
This will start up a Namenode, a Datanode, a Jobtracker and a Tasktracker on your machine. The output will look like this:
Srinu@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-Srinu-tasktracker-ubuntu.out
Srinu@ubuntu:/usr/local/hadoop$
JPS
Srinu@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
If there are any errors, examine the log files in the /logs/ directory.
Stopping Single Node Cluster.
Srinu@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Srinu@ubuntu:/usr/local/hadoop$
Building Multi Node Cluster:
From two single-node clusters to a multi-node cluster: We will build a multi-node cluster using two Ubuntu boxes. The best way to do this for starters is to install, configure and test a "local" Hadoop setup for each of the two Ubuntu boxes, and in a second step to "merge" these two single-node clusters into one multi-node cluster in which one Ubuntu box will become the designated master (but also act as a slave with regard to data storage and processing), and the other box will become only a slave.
Prerequisites:
Configuring single-node clusters first: It is recommended that you use the "same settings" (e.g., installation locations and paths) on both machines, or otherwise you might run into problems later when we migrate the two machines to the final multi-node cluster setup. Now that you have two single-node clusters up and running, we will modify the Hadoop configuration to make one Ubuntu box the "master" (which will also act as a slave) and the other Ubuntu box a "slave".
Mapping the nodes:
The easiest way is to put both machines in the same network with regard to hardware and software configuration, for example connect both machines via a single hub or switch and configure the network interfaces to use a common network such as 192.168.0.x/24.
To make it simple, we will assign the IP address 192.168.0.1 to the master machine and 192.168.0.2 to the slave machine. Update /etc/hosts on both machines with the following lines:
/etc/hosts (for master and slave)
192.168.0.1    master
192.168.0.2    slave
SSH Access:
The Srinu user on the master (aka Srinu@master) must be able to connect:
1) To its own user account on the master, i.e. ssh master in this context and not necessarily ssh localhost.
2) To the Srinu user account on the slave (aka Srinu@slave) via a password-less SSH login.
You just have to add the Srinu@master's public SSH key (which should be in $HOME/.ssh/id_rsa.pub) to the authorized_keys file of Srinu@slave (in this user's $HOME/.ssh/authorized_keys). You can do this manually or use:
Srinu@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub Srinu@slave
This command will prompt you for the login password for user Srinu on slave, then copy the public SSH key for you, creating the correct directory and fixing the permissions as necessary.
The final step is to test the SSH setup by connecting with user Srinu from the master to the user account Srinu on the slave. The step is also needed to save slave's host key fingerprint to the Srinu@master's known_hosts file.
So, connecting from master to master...
Srinu@master:~$ ssh master
The authenticity of host 'master (192.168.0.1)' can't be established.
RSA key fingerprint is 3b:21:b3:c0:21:5c:7c:54:2f:1e:2d:96:79:eb:7f:95.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master' (RSA) to the list of known hosts.
Linux master 2.6.20-16-386 #2 Thu Jun 7 20:16:13 UTC 2007 i686
Srinu@master:~$
And from master to slave.
Srinu@master:~$ ssh slave
The authenticity of host 'slave (192.168.0.2)' can't be established.
RSA key fingerprint is 74:d7:61:86:db:86:8f:31:90:9c:68:b0:13:88:52:72.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave' (RSA) to the list of known hosts.
Ubuntu 10.04
Srinu@slave:~$
Hadoop Cluster
We will see how to configure one Ubuntu box as a master node and the other Ubuntu box as a slave node. The master node will also act as a slave because we only have two machines available in our cluster but still want to spread data storage and processing to multiple machines.
The master node will run the "master" daemons for each layer: the NameNode for the HDFS storage layer, and the JobTracker for the MapReduce processing layer.
respectively (the primary NameNode and the JobTracker will be started on the same machine if you run bin/start-all.sh).
To start the daemons individually:
bin/hadoop-daemon.sh start [namenode | secondarynamenode | datanode | jobtracker | tasktracker]
Again, the machine on which bin/start-dfs.sh is run will become the primary NameNode.
On master, update conf/masters so that it looks like this:
conf/masters (on master)
master
conf/slaves (master only)
The conf/slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will be run. We want both the master box and the slave box to act as Hadoop slaves because we want both of them to store and process data.
On master, update conf/slaves so that it looks like this:
conf/slaves (on master)
master
slave
The conf/slaves file on master is used only by scripts like bin/start-dfs.sh or bin/stop-dfs.sh. For example, if you want to add DataNodes on the fly, you can "manually" start the DataNode daemon on a new slave machine via bin/hadoop-daemon.sh start datanode, as sketched below. Using the conf/slaves file on the master simply helps you to make "full" cluster restarts easier.
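A sketch of that on-the-fly procedure, assuming the new box is already configured like the other slaves; the hostname newslave is purely illustrative:
Srinu@newslave:/usr/local/hadoop$ bin/hadoop-daemon.sh start datanode
# then add "newslave" to conf/slaves on master so future bin/start-dfs.sh runs include it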
conf/*-site.xml (all machines)
We must change the configuration files conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml on ALL machines as follows.
First, we have to change the fs.default.name parameter (in conf/core-site.xml), which specifies the NameNode (the HDFS master) host and port. In our case, this is the master machine.
conf/core-site.xml (on all machines)
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
conf/mapred-site.xml (on all machines)
&econd" we have to change themapred.jo.trac!erparameter 6inconf5mapredsite.xml3"
which specifies theJobTracker6(apReduce master3 host and port. Again" this is
themasterin our case.
Vpropert#L
VnameLmapred%ob%trackerVnameL
V!alueLmaster:*+,V!alueL
VdescriptionLThe host and port that the apeduce Job trac"er runs
at 7 MlocalMU then Jobs are run inAprocess as a single map
and reduce tas"
VdescriptionL
Vpropert#L
Third" we change thedfs.replicationparameter 6inconf5hdfssite.xml3 which specifies
the default block replication. 0t defines how many machines a single file should be
replicated to before it becomes available. 0f you set this to a value higher than the
number of available slave nodes 6more precisely" the number of $ata+odes3" you will
start seeing a lot of K6ero targets found" forbidden2.si?eD23L type errors in the log files.
The default value ofdfs.replicationis9. However" we have only two nodes available" so
we setdfs.replicationto4.
conf/hdfs-site.xml (on all machines)
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
Formatting the NameNode
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable on the NameNode), run the command:
Srinu@master:/usr/local/hadoop$ bin/hadoop namenode -format
... INFO dfs.Storage: Storage directory /app/hadoop/tmp/dfs/name has been successfully formatted.
Srinu@master:/usr/local/hadoop$
The HDFS name table is stored on the NameNode's (here: master) local filesystem in the directory specified by dfs.name.dir. The name table is used by the NameNode to store tracking and coordination information for the DataNodes.
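You can see the fsimage and edits files discussed earlier inside that directory. A sketch, assuming the hadoop.tmp.dir chosen above and the Hadoop 1.x on-disk layout:
Srinu@master:/usr/local/hadoop$ ls -l /app/hadoop/tmp/dfs/name/current/
# expect files such as: fsimage, edits, fstime, VERSION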
Starting the multi-node cluster
Starting the cluster is performed in two steps.
1. We begin with starting the HDFS daemons: the NameNode daemon is started on master, and DataNode daemons are started on all slaves (here: master and slave).
2. Then we start the MapReduce daemons: the JobTracker is started on master, and TaskTracker daemons are started on all slaves (here: master and slave).
HDFS daemons
Run the command bin/start-dfs.sh on the machine you want the (primary) NameNode to run on. This will bring up HDFS with the NameNode running on the machine you ran the previous command on, and DataNodes on the machines listed in the conf/slaves file. In our case, we will run bin/start-dfs.sh on master:
Java processes running on master after bin/start-dfs.sh
Srinu@master:/usr/local/hadoop$ jps
14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$
Java processes running on slave after bin/start-dfs.sh
Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15616 Jps
Srinu@slave:/usr/local/hadoop$
Java processes running on master after bin/start-mapred.sh
Srinu@master:/usr/local/hadoop$ jps
16017 Jps
14799 NameNode
15686 TaskTracker
14880 DataNode
15596 JobTracker
14977 SecondaryNameNode
Srinu@master:/usr/local/hadoop$
Java processes running on slave after bin/start-mapred.sh
Srinu@slave:/usr/local/hadoop$ jps
15183 DataNode
15897 TaskTracker
16284 Jps
Srinu@slave:/usr/local/hadoop$
Commissioning and Decommissioning Nodes in a Hadoop Cluster:
One of the most attractive features of the Hadoop framework is its utilization of commodity hardware. However, this leads to frequent DataNode crashes in a Hadoop cluster. Another striking feature of the Hadoop framework is the ease of scaling in accordance with the rapid growth in data volume.
the NameNode. The content of the /home/hadoop/hadoop-1.2.1/hdfs_exclude.txt file is shown below.
slave2.in
Ste %" &orce configuration reload
Run the command ,HAD%/HM0(bin(!adoop dfsadmin -refres!Nodes,'ithout the uotes
J JHA$--'IH-(,5bin5hadoop dfsadmin refresh+odes
This will force the +ame+ode to reread its configuration" including the newly updated
Oexcludes/ file. 0t will decommission the nodes over a period of time" allowing time for
each nodeSs blocks to be replicated onto machines which are scheduled to remainactive.
-n slave4.in" check the ps command output. After some time" you will see the
$ata+ode process is shutdown automatically.
Ste *" Shutdo'n nodes
After the decommission process has been completed" the decommissioned hardware
can be safely shut down for maintenance. Run the report command to dfsadmin to
check the status of decommission. The following command will describe the status of
the decommission node and the connected nodes to the cluster.
$ JHA$--'IH-(,5bin5hadoop dfsadmin report
Ste +" ,dit ecludes file again
-nce the machines have been decommissioned" they can be removed from the
Oexcludes/ file. Running JHA$--'IH-(,5bin5hadoop dfsadmin
refresh+odes again will read the excludes file back into the +ame+odeU allowing the
$ata+odes to reoin the cluster after the maintenance has been completed" or
additional capacity is needed in the cluster again" etc.
&pecial +ote1 0f the above process is followed and the tasktracker process is still
running on the node" it needs to be shut down. -ne way is to disconnect the machine as
we did in the above steps. The Master will recognize the process automatically and will declare it as dead. There is no need to follow the same process for removing the tasktracker, because it is NOT as crucial as the DataNode. The DataNode contains the data that you want to remove safely without any loss of data.
The tasktracker can be run/shut down on the fly by the following command at any point of time.
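A common way to do this on Hadoop 1.x is the per-node daemon script, shown here as a sketch rather than necessarily the exact command the author had in mind:
$ $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker
$ $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker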
Challenges of running a Hadoop Cluster:
The main challenge in running a Hadoop cluster comes from maintenance itself. We will point out some of the common problems we face every day.
1. Replacing/upgrading hard drives. You have to be careful while scaling.
2. Commissioning and decommissioning is fairly simple, but still keep an eye on the *-site.xml files.
3. Performance issues as your data grows. They could be at the network level, I/O level, disk level or application level.
4. There are open-source tools like Nagios and Ganglia, or enterprise solutions like Bright Computing, for better monitoring capabilities. You have to research them.
5. Once again, it is better to have a clear understanding of each and every parameter of mapred-site.xml and hdfs-site.xml. If you just want to performance-tune and configure the cluster, I would say that fully understanding how each parameter in mapred-default.xml and core-default.xml impacts your jobs is critical. There have been changes in property names across releases, and not taking them into account is a common mistake for people new to the domain. Failing hardware (mainly disks) is very common.
6. Typically a large Hadoop cluster's main job is replacing hardware, especially hard drives. Software management is not that much work after the initial setup. A reboot resolves most of the problems.
Largest Cluster in the World.
The largest publicly known Hadoop clusters are Yahoo!'s 4000-node cluster followed by Facebook's 2300-node cluster [1].
Yahoo! has lots of Hadoop nodes, but they are organized under different clusters and are used for different purposes (a significant number of these clusters are research clusters). The current JobTracker and NameNode actually do not scale that well to that many nodes (they have lots of concurrency issues).
The statistics files are named:
<hostname>_<epoch of jobtracker start>_<job id>_<job name>
Standard Error
These logs are created by each tasktracker. They contain information written to standard error (stderr) captured when a task attempt is run. These logs can be used for debugging. For example, a developer can include System.err.println("some useful information") calls in the job code. The output will appear in the standard error files.
The parent directory name for these logs is constructed as follows:
/var/log/hadoop/userlogs/attempt_<job id>_<map or reduce>_<attempt id>
where <job id> is the ID of the job that this attempt is doing work for, <map or reduce> is either "m" if the task attempt was a mapper, or "r" if the task attempt was a reducer, and <attempt id> is the ID of the task attempt.
For example:
/var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0
These logs are rotated according to the mapred.userlog.retain.hours property. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
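To actually read one of these logs, you can list the userlogs directory and tail the stderr file of a finished attempt. A sketch, assuming the Hadoop 1.x layout and using the example attempt ID from above:
$ ls /var/log/hadoop/userlogs/
$ tail -n 50 /var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0/stderr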
1) H"'' N(N'" '((N"S
3ommand Description
hadoop namenode
A7ormat Format HDFS fles#stem 7rom Namenode
hadoop namenode
AupgradeCpgrade the NameNode
startAd7ssh Start HDFS Daemons
-
7/26/2019 ClusterManagement (2)
35/38
stopAd7ssh Stop HDFS Daemons
startAmapredsh Start apeduce Daemons
stopAmapredsh Stop apeduce Daemons
hadoop namenode
Areco!er A7orce
eco!er namenode metadata a7ter a cluster
7ailure &ma# lose data(
2) HADOOP FSCK COMMANDS
Command | Description
hadoop fsck / | Filesystem check on HDFS
hadoop fsck / -files | Display files during check
hadoop fsck / -files -blocks | Display files and blocks during check
hadoop fsck / -files -blocks -locations | Display files, blocks and their locations during check
hadoop fsck / -files -blocks -locations -racks | Display network topology for DataNode locations
hadoop fsck / -delete | Delete corrupted files
hadoop fsck / -move | Move corrupted files to lost+found directory
3) HADOOP JOB COMMANDS
Command | Description
hadoop job -submit <job-file> | Submit the job
hadoop job -status <job-id> | Print job status and completion percentage
hadoop job -list all | List all jobs
hadoop job -list-active-trackers | List all available TaskTrackers
hadoop job -set-priority <job-id> <priority> | Set priority for a job. Valid priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
hadoop job -kill-task <task-id> | Kill a task
hadoop job -history | Display job history including job details, failed and killed jobs
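A short usage sketch combining a few of these commands; the job ID is illustrative, not from this document:
$ hadoop job -list all
$ hadoop job -status job_200908190029_0001
$ hadoop job -set-priority job_200908190029_0001 LOW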
4) HADOOP DFSADMIN COMMANDS
Command | Description
hadoop dfsadmin -report | Report filesystem info and statistics
hadoop dfsadmin -metasave file.txt | Save namenode's primary data structures to file.txt
hadoop dfsadmin -setQuota 10 quotatest | Set Hadoop directory quota to only 10 files
hadoop dfsadmin -clrQuota quotatest | Clear Hadoop directory quota
hadoop dfsadmin -refreshNodes | Read hosts and exclude files to update which datanodes are allowed to connect to the namenode. Mostly used to commission or decommission nodes
hadoop fs -count -q mydir | Check quota space on directory mydir
hadoop dfsadmin -setSpaceQuota 100 mydir | Set space quota to 100 on HDFS directory named mydir
hadoop dfsadmin -clrSpaceQuota mydir | Clear space quota on an HDFS directory
hadoop dfsadmin -saveNamespace | Back up metadata (fsimage & edits). Put the cluster in safe mode before this command
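A sketch of the quota-related commands in sequence, on an assumed test directory /quotatest:
$ hadoop fs -mkdir /quotatest
$ hadoop dfsadmin -setQuota 10 /quotatest
$ hadoop fs -count -q /quotatest
$ hadoop dfsadmin -clrQuota /quotatest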
5) HADOOP SAFEMODE COMMANDS
The following dfsadmin commands help the cluster to enter or leave safe mode, which is also called maintenance mode. In this mode, the Namenode does not accept any changes to the namespace; it does not replicate or delete blocks.
Command | Description
hadoop dfsadmin -safemode enter | Enter safe mode
hadoop dfsadmin -safemode leave | Leave safe mode
hadoop dfsadmin -safemode get | Get the status of safe mode
hadoop dfsadmin -safemode wait | Wait until HDFS finishes data block replication
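A sketch of combining safe mode with the metadata backup mentioned in the dfsadmin table above:
$ hadoop dfsadmin -safemode enter
$ hadoop dfsadmin -saveNamespace
$ hadoop dfsadmin -safemode leave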
6) HADOOP CONFIGURATION FILES
File | Description
core-site.xml | Parameters for the entire Hadoop cluster
hdfs-site.xml | Parameters for HDFS and its clients
mapred-site.xml | Parameters for MapReduce and its clients
masters | Host machines for the secondary Namenode
slaves | List of slave hosts
7) HADOOP MRADMIN COMMANDS
Command | Description
hadoop mradmin -safemode get | Check job tracker status
hadoop mradmin -refreshQueues | Reload MapReduce configuration
hadoop mradmin -refreshNodes | Reload active TaskTrackers
hadoop mradmin -refreshServiceAcl | Force jobtracker to reload service ACL
hadoop mradmin -refreshUserToGroupsMappings | Force jobtracker to reload user group mappings
8) HADOOP BA...