
CDH3 Pseudo Installation on Ubuntu

1) Do not create a user named hadoop; it will conflict with the hadoop group and users that the CDH3 packages create, causing installation issues.

2) Install Java

Copy the Java JDK self-extracting installer from the desktop to /usr/local and run it:

$ sudo cp jdk-6u30-linux-x** /usr/local

$ cd /usr/local

$ sudo sh jdk-6u30-linux-x**
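The installer unpacks into /usr/local/jdk1.6.0_30 (the path used in steps 5 and 7). A quick check that it extracted correctly:

$ /usr/local/jdk1.6.0_30/bin/java -version   # should report java version "1.6.0_30"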

3) Install CDH3 package

Go to http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH3/CDH3u6/CDH3-Installation-Guide/CDH3-Installation-Guide.html

Click on Installing CDH3 on Ubuntu and Debian Systems.

Click on "this link for a Maverick system" to download the repository package.

Install it with the GDebi package installer, or save it and issue the commands below:

$ sudo dpkg -i Downloads/cdh3-repository_1.0_all.deb

$ sudo apt-get update

4) Install Hadoop

$ apt-cache search hadoop   # should list all available Hadoop packages

$ sudo apt-get install hadoop-0.20 hadoop-0.20-native

$ sudo apt-get install hadoop-0.20-<daemon type>   # once per daemon; the full set is shown below
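In a pseudo-distributed setup all five daemons run on one machine, so the line above expands to the following (the package names match the init scripts used in step 13):

$ sudo apt-get install hadoop-0.20-namenode hadoop-0.20-secondarynamenode hadoop-0.20-jobtracker hadoop-0.20-datanode hadoop-0.20-tasktracker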

5) Set Java and Hadoop Home

Edit ~/.bashrc ($ gedit ~/.bashrc) and add:

# Set Hadoop-related environment variables

export HADOOP_HOME=/usr/lib/hadoop


export PATH=$PATH:/usr/lib/hadoop/bin

# Set JAVA_HOME

export JAVA_HOME=/usr/local/jdk1.6.0_30

export PATH=$PATH:/usr/local/jdk1.6.0_30/bin

Close the terminal, open a new one (or run: source ~/.bashrc), and test:

echo $JAVA_HOME

echo $HADOOP_HOME

6) Add the dedicated hdfs and mapred users (created by the CDH3 packages) to the hadoop group

$ sudo gpasswd -a hdfs hadoop

$ sudo gpasswd -a mapred hadoop
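To confirm the group membership took effect:

$ groups hdfs   # should list: hdfs hadoop
$ groups mapred   # should list: mapred hadoop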

7) Configuration

$ cd /usr/lib/hadoop/conf

Set JAVA_HOME in hadoop-env.sh:

$ sudo gedit hadoop-env.sh

export JAVA_HOME=/usr/local/jdk1.6.0_30

8) core-site.xml

<property>

<name>hadoop.tmp.dir</name>

<value>/usr/lib/hadoop/tmp</value>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:8020</value>

</property>

$ sudo mkdir /usr/lib/hadoop/tmp

$ sudo chmod 750 /usr/lib/hadoop/tmp/

$ sudo chown hdfs:hadoop /usr/lib/hadoop/tmp/
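The property elements in this step and the next two must sit inside the file's <configuration> element. A minimal sketch of the complete /usr/lib/hadoop/conf/core-site.xml, assuming the file is otherwise empty:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/lib/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>

hdfs-site.xml (step 9) and mapred-site.xml (step 10) take the same wrapper.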

9) hdfs-site.xml


<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<property>

<name>dfs.name.dir</name>

<value>/storage/name</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/storage/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

$ sudo mkdir /storage

$ sudo chmod 775 /storage/

$ sudo chown hdfs:hadoop /storage/
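The name and data directories themselves do not need to exist yet: namenode -format creates dfs.name.dir and the datanode creates dfs.data.dir on first start, which works here because hdfs owns /storage. They can also be pre-created by hand:

$ sudo mkdir /storage/name /storage/data
$ sudo chown hdfs:hadoop /storage/name /storage/data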

10) mapred-site.xml

<property>

<name>mapred.job.tracker</name>

<value>hdfs://localhost:8021</value>

</property>

<property>

<name>mapred.system.dir</name>

<value>/mapred/system</value>

</property>

<property>

<name>mapred.local.dir</name>

<value>/mapred/local</value>

</property>

<property>

<name>mapred.temp.dir</name>

<value>/mapred/temp</value>


</property>

$ sudo mkdir /mapred

$ sudo chmod 775 /mapred

$ sudo chown mapred:hadoop /mapred

11) User Assignment

export HADOOP_NAMENODE_USER=hdfs

export HADOOP_SECONDARYNAMENODE_USER=hdfs

export HADOOP_DATANODE_USER=hdfs

export HADOOP_JOBTRACKER_USER=mapred

export HADOOP_TASKTRACKER_USER=mapred

12) Format namenode

$ cd /usr/lib/hadoop/bin/

$ sudo -u hdfs hadoop namenode -format

The output must contain a "successfully formatted" message; otherwise, check the reported error and correct it before continuing.

13) Start Daemons

$ sudo /etc/init.d/hadoop-0.20-namenode start

$ sudo /etc/init.d/hadoop-0.20-secondarynamenode start

$ sudo /etc/init.d/hadoop-0.20-jobtracker start

$ sudo /etc/init.d/hadoop-0.20-datanode start

$ sudo /etc/init.d/hadoop-0.20-tasktracker start

Check /var/log/hadoop-0.20 for errors from each daemon.

Check that all ports are listening:

$ netstat -ptlen
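The running daemons can also be listed with jps from the JDK (run via sudo so every daemon's JVM is visible; the full path is used because the JDK is not on root's PATH):

$ sudo /usr/local/jdk1.6.0_30/bin/jps   # should show NameNode, SecondaryNameNode, JobTracker, DataNode, TaskTracker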

14) Check the web UIs

localhost:50070 - NameNode (HDFS admin) web UI

localhost:50030 - JobTracker (MapReduce) web UI
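As a final smoke test, run a sample MapReduce job; a sketch, assuming the examples jar shipped with the hadoop-0.20 package (the exact jar name can vary by release):

$ sudo -u hdfs hadoop jar /usr/lib/hadoop/hadoop-examples*.jar pi 2 100   # estimates pi; completing without errors confirms HDFS and MapReduce work end to end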