Hadoop Installation
Requirements:
Java
See how to install the latest version here
A dedicated Hadoop user
Add a new Hadoop user and group that will have access to all the Hadoop files. This is the user in whose directory Hadoop will be installed, and who will communicate via SSH with the local user.
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Chaos
In my first Hadoop installation I did not create a separate Hadoop user, and only then did I understand why a separate user was necessary. Creating a separate user helps tremendously in terms of file permissions, security and backups.
Because I did not have a separate user, my normal user suffered with my first execution run on Hadoop. All my file permissions changed, and on booting I was only able to log into my normal account, after which I could not do anything. The only screen I could see was my desktop, but it was barren. All the files, disks, profile settings and directories were nowhere to be seen. I could not even access the terminal. The on-screen buttons like Unity, network, battery, and the time and date settings appeared disabled.
My laptop, perfectly configured just a moment ago (or so I thought), was rendered unusable. I do not know the exact cause of this, but the only place where I deviated from the Installation Manual was in not creating a separate user, and that is why I emphasise creating one.
Terminal is king
The terminal ALWAYS comes to the rescue. That's why I love Linux so much. No matter how badly you think you have screwed up, and even when the only possible way out seems to be a fresh installation (which, too, comes free :D), the terminal always provides a solution. I cannot overstate the importance and utility of the terminal, even if I were to talk about it every single day.
So, here is what I did. I opened my terminal using Ctrl+Alt+F7. I checked my profile settings, and using ls I could see that, sure enough, my files were still present on my computer, but I could not view them. I figured out the problem: all the files and settings of my normal user had been shifted from my home directory into /, hence one level up the directory hierarchy. I do not know why this happened. I could see the sidgan folder, which is my normal user, within home, and sure enough it was completely empty. From here on, the solution seemed trivial: all I had to do was move the entire directory structure one level down. This I did with one simple command:
mv source dest
I breathed a sigh of relief and could see my splendid laptop back to its normal state with all its configuration files intact.
SSH
In order to access the different nodes, Hadoop uses SSH. All of this is done while logged into the hduser account.
Simply generate an SSH key for hduser. One important thing to keep in mind is to not enter any passphrase for the hduser key, because Hadoop interacts with the nodes over SSH constantly, and the user would otherwise have to input the password each time. This is not practical, so it is a better idea to not set a password in the first place.
After generating the SSH key for hduser, the next step is to enable access to the machine, i.e. the normal user, in my case sidgan.
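As a sketch, the usual OpenSSH commands for these two steps look like this, run as hduser (the key type and file locations are the OpenSSH defaults):
su - hduser
ssh-keygen -t rsa -P "" -f $HOME/.ssh/id_rsa              # empty passphrase, as advised above
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys   # authorize the key for connections to localhost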
Test the SSH setup once by establishing a connection between the normal user and hduser with:
ssh localhost
IPv6 and IPv4
Disable IPv6 only for Hadoop by adding the given line to conf/hadoop-env.sh:
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
Installation
Download Hadoop and extract its contents.
tar -xzf hadoop-1.0.3.tar.gz
It should be placed in the hduser directory. Change the owner of all files to the hduser user and the hadoop group.
sudo chown -R hduser:hadoop hadoop-1.0.3
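Note that the start and stop commands later in this guide assume Hadoop lives under /usr/local/hadoop, so as a sketch (the paths are assumptions, adjust them to your layout) the full sequence could be:
cd /usr/local
sudo tar -xzf /path/to/hadoop-1.0.3.tar.gz   # tarball location is an assumption
sudo mv hadoop-1.0.3 hadoop                  # rename so the path matches the later commands
sudo chown -R hduser:hadoop hadoop           # hand ownership to the Hadoop user and group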
Configuration
Step 1: Update the .bashrc file
Since the hduser user will be accessing the Hadoop installation, it makes sense to update the .bashrc file of hduser. The following lines, as suggested in the Installation Manual, have to be appended at the end of the .bashrc file.
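The manual's exact lines are not reproduced here; as a minimal sketch, assuming Hadoop is installed in /usr/local/hadoop and Sun Java 7 is used (both paths are assumptions), the additions typically look like:
# Hadoop-related environment variables (paths are assumptions)
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-sun
# Make the Hadoop binaries available on the PATH
export PATH=$PATH:$HADOOP_HOME/bin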
Step 2: Update environment variables
The JAVA_HOME path must be changed to the Sun JDK or JRE directory:
export JAVA_HOME=/usr/lib/jvm/java-7-sun
This is done for the hduser.
Step 3: HDFS
The Hadoop Distributed File System (HDFS) is where Hadoop stores all of its data, backed by a directory on the local disk.
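As a sketch, assuming /app/hadoop/tmp as that local directory (the path is an assumption; it must match the hadoop.tmp.dir value used in core-site.xml below):
sudo mkdir -p /app/hadoop/tmp            # local directory backing HDFS
sudo chown hduser:hadoop /app/hadoop/tmp # owned by the Hadoop user and group
sudo chmod 750 /app/hadoop/tmp           # keep other users out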
Step 4: Update hadoop-env.sh
Uncomment the line export JAVA_HOME=/usr/lib/jvm/java-X-sun, where X is the version number.
Step 5: Update conf/core-site.xml
Add the following lines within the configuration tags.
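The original snippet is not reproduced here; as a sketch for a single-node Hadoop 1.x setup (the port 54310 and the /app/hadoop/tmp path are assumptions):
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>Base for Hadoop's temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>URI of the default file system (the NameNode).</description>
</property>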
Step 6: Update conf/mapred-site.xml
Add the following lines within the configuration tags.
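Again as a sketch (the port 54311 is an assumption):
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>Host and port of the MapReduce JobTracker.</description>
</property>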
Step 7: Update conf/hdfs-site.xml
Add the following lines within the configuration tags.
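As a sketch for a single-node setup:
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication; 1 suffices on a single node.</description>
</property>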
Start Hadoop
Run the following commands as the hadoop user, hduser:
/usr/local/hadoop/bin/hadoop namenode -format
/usr/local/hadoop/bin/start-all.sh
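To check that the daemons actually came up, Java's jps tool lists the running JVM processes; on a healthy single-node Hadoop 1.x setup you would expect to see NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:
jps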
Stop Hadoop
Run the following commands as the hadoop user, hduser:
/usr/local/hadoop/bin/stop-all.sh