Requirements:

Java

See how to install the latest version here

A dedicated Hadoop user

Add a new Hadoop user and group that will access all the Hadoop files. This is the user in whose directory Hadoop will be installed, and who will communicate with the local machine over SSH.

sudo addgroup hadoop

sudo adduser --ingroup hadoop hduser

Chaos

In my first Hadoop installation I did not create a separate Hadoop user, and only afterwards did I understand why one was necessary. Creating a separate user helps tremendously with file permissions, security and backups.

I did not have a separate user, and so my normal user suffered with my first execution run on Hadoop. All my file permissions changed, and on booting I was only able to log into my normal account, after which I could not do anything. The only screen I could see was my desktop, and it was barren. All the files, disks, profile settings and directories were nowhere to be seen. I could not even access the terminal. The on-screen buttons such as the Unity launcher, network, battery, and time and date settings appeared disabled.

My, just a moment ago, perfectly configured laptop (or so I thought) was rendered unusable. I do not know the exact cause of this, but the only place where I deviated from the Installation Manual was in not creating a separate user, and that is why I emphasise creating one.

Terminal is king

The terminal ALWAYS comes to the rescue. That's why I love Linux so much. No matter how badly you think you have screwed up, and even when the only way out seems to be a fresh installation (which also comes free :D), the terminal always provides a solution. I cannot stress the importance and utility of the terminal enough, even if I were to talk about it every single day.

So, here is what I did. I opened a terminal using Ctrl+Alt+F7. I checked my profile settings, and using ls I could see that my files were indeed still present on my computer, even though I could not view them. Then I figured out the problem: all the files and settings belonging to my normal user had been shifted from my home directory into /, one level up the directory hierarchy. I do not know why this happened. I could see the sidgan folder, which belongs to my normal user, and sure enough it was completely empty. From here the solution was trivial: all I had to do was move the entire directory structure one level back down, with one simple command:

mv source dest

I breathed a sigh of relief and could see my splendid laptop back to its normal state with all its configuration files intact.

SSH

In order to access the different nodes, Hadoop uses SSH. All of this is done while logged in as the hduser account. Simply generate an SSH key for hduser.

One important thing to keep in mind is not to set a passphrase on the hduser key, because Hadoop interacts with its nodes over SSH and the user would otherwise have to enter the password every single time. This is not practical, so it is better not to set one in the first place.

After generating the SSH key for hduser, the next step is to enable SSH access to the local machine (that is, the machine of the normal user, in my case sidgan) with this key, as sketched below.
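A minimal sketch of these two steps, run as hduser (the empty -P "" creates the key without a passphrase, as discussed above, and appending the public key to hduser's own authorized_keys is what enables the local login):

# generate a passwordless RSA key for hduser
ssh-keygen -t rsa -P ""
# authorise that key for SSH logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys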

Test the SSH setup once by establishing a connection from hduser to the local machine:

ssh localhost

IPv6 and IPv4

Disable IPv6 only for Hadoop by adding the following line to conf/hadoop-env.sh:

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Installation

Download Hadoop and extract its contents.

tar -xzf hadoop-1.0.3.tar.gz

It should be placed in the hduser directory.

Change the owner of all the files to the hduser user and the hadoop group.

sudo chown -R hduser:hadoop hadoop-1.0.3
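The start and stop commands later in this post refer to /usr/local/hadoop; assuming that layout (the target path is taken from those commands), the extracted directory also has to be moved there:

# move the extracted directory to the path used by the start/stop commands below
sudo mv hadoop-1.0.3 /usr/local/hadoop
sudo chown -R hduser:hadoop /usr/local/hadoop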

Configuration

Step 1: Update .bashrc file

Since the hduser user will be accessing the Hadoop installation, it makes sense to update hduser's .bashrc file. The lines suggested in the Installation Manual have to be appended at the end of that .bashrc file.
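The exact contents come from the Installation Manual; as a minimal sketch, using the paths that appear elsewhere in this post, they amount to something like:

# Hadoop environment for hduser (paths follow the rest of this post)
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-sun
export PATH=$PATH:$HADOOP_HOME/bin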

Step 2: Update environment variables

The JAVA_HOME path must be changed to the Sun JDK or JRE directory.

export JAVA_HOME=/usr/lib/jvm/java-7-sun

This is done for the hduser.

Step 3: HDFS

The Hadoop Distributed File System (HDFS) is where Hadoop stores all its data. Locally it is backed by the directory configured as hadoop.tmp.dir in core-site.xml below, which has to exist before Hadoop is started.
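Given the hadoop.tmp.dir value used in core-site.xml below, creating that directory and handing it to hduser would look roughly like this (the chmod 750 is a common precaution, not a requirement from this post):

# local directory backing HDFS, matching hadoop.tmp.dir in core-site.xml
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp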

Step 4: Update hadoop-env.sh

Uncomment the line export JAVA_HOME=/usr/lib/jvm/java-X-sun, where X is the version number.

Step 5: Update conf/core-site.xml

Add the following lines within the configuration tags.

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Step 6: Update conf/mapred-site.xml

Add the following lines within the configuration tags.

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

Step 7: Update conf/hdfs-site.xml

Add the following lines within the configuration tags.

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

Start Hadoop

Run the following commands as the Hadoop user, hduser. The first command formats the NameNode; it needs to be run only once, when the cluster is first set up, since formatting erases whatever is already in HDFS:

/usr/local/hadoop/bin/hadoop namenode -format

/usr/local/hadoop/bin/start-all.sh
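One quick way to check that the daemons actually came up is jps, which ships with the JDK (not the JRE) and lists the running Java processes:

jps

On a single-node setup it should report NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.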


Stop Hadoop

Run the following command as the Hadoop user, hduser:

/usr/local/hadoop/bin/stop-all.sh



