Install Hadoop Cluster

Purpose

The purpose of this tutorial is to show how to install a Hadoop cluster with 1 NameNode and 2 DataNodes.
Hadoop was installed on 3 CentOS VMs (KVM), and this tutorial is a merge of 3 others, which you can find on the internet.

Steps

Each command is prefixed with a Bash prompt that shows where it runs, who runs it, and with which privilege.

Pay attention: [root@master.hadoop.local.tld ~]#

  • User: root
  • Host: master.hadoop.local.tld
  • Privilege: # (root shell); $ means an unprivileged user

Actions to do on all nodes and master

  • Ensure the firewall is stopped and disabled on the master and on all nodes (a quick check follows below)
[root@master.hadoop.local.tld ~]# systemctl stop firewalld
[root@master.hadoop.local.tld ~]# systemctl disable firewalld
[root@node1.hadoop.local.tld ~]# systemctl stop firewalld
[root@node1.hadoop.local.tld ~]# systemctl disable firewalld
[root@node2.hadoop.local.tld ~]# systemctl stop firewalld
[root@node2.hadoop.local.tld ~]# systemctl disable firewalld
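As an optional sanity check, confirm on each host that firewalld is no longer active and will not start at boot:
[root@master.hadoop.local.tld ~]# systemctl is-active firewalld
[root@master.hadoop.local.tld ~]# systemctl is-enabled firewalld
The first command should report inactive and the second disabled; repeat the check on node1 and node2.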
  • Install JDK on all nodes
[root@master.hadoop.local.tld ~]# rpm -Uvh /root/jdk-8u131-linux-x64.rpm
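To confirm the JDK installed correctly, an optional version check:
[root@master.hadoop.local.tld ~]# java -version
It should report a 1.8.0 release matching the installed RPM; the Oracle RPM should also create /usr/java/default, which is used as JAVA_HOME later. Repeat on node1 and node2.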
  • Set up FQDN on all nodes
[root@master.hadoop.local.tld ~]# vi /etc/hosts
192.168.122.201 master.hadoop.local.tld
192.168.122.202 node1.hadoop.local.tld
192.168.122.203 node2.hadoop.local.tld
[root@node1.hadoop.local.tld ~]# vi /etc/hosts
192.168.122.202 node1.hadoop.local.tld
192.168.122.203 node2.hadoop.local.tld
192.168.122.201 master.hadoop.local.tld
[root@node2.hadoop.local.tld ~]# vi /etc/hosts
192.168.122.203 node2.hadoop.local.tld
192.168.122.202 node1.hadoop.local.tld
192.168.122.201 master.hadoop.local.tld
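Before going further, it is worth checking that every FQDN resolves on every host:
[root@master.hadoop.local.tld ~]# getent hosts master.hadoop.local.tld node1.hadoop.local.tld node2.hadoop.local.tld
Each name should resolve to its 192.168.122.x address from /etc/hosts; repeat the same check on node1 and node2.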
  • Add hadoop user on all nodes
[root@master.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
[root@master.hadoop.local.tld ~]# passwd hadoop
[root@node1.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
[root@node1.hadoop.local.tld ~]# passwd hadoop
[root@node2.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
[root@node2.hadoop.local.tld ~]# passwd hadoop
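An optional check that the account exists with the intended home directory (Hadoop itself will be unpacked there in a later step):
[root@master.hadoop.local.tld ~]# id hadoop
[root@master.hadoop.local.tld ~]# ls -ld /opt/hadoop
/opt/hadoop should exist and be owned by hadoop:hadoop; repeat on node1 and node2.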
  • Generate an SSH key on the master and on all nodes, and copy it to each one (a connectivity check follows below)
[root@master.hadoop.local.tld ~]# su - hadoop
[hadoop@master.hadoop.local.tld ~]$ ssh-keygen -t rsa
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node1.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node2.hadoop.local.tld
[root@node1.hadoop.local.tld ~]# su - hadoop
[hadoop@node1.hadoop.local.tld ~]$ ssh-keygen -t rsa
[hadoop@node1.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master.hadoop.local.tld
[hadoop@node1.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node1.hadoop.local.tld
[hadoop@node1.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node2.hadoop.local.tld
[root@node2.hadoop.local.tld ~]# su - hadoop
[hadoop@node2.hadoop.local.tld ~]$ ssh-keygen -t rsa
[hadoop@node2.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master.hadoop.local.tld
[hadoop@node2.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node1.hadoop.local.tld
[hadoop@node2.hadoop.local.tld ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@node2.hadoop.local.tld
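To make sure passwordless SSH really works (the start-dfs.sh and start-yarn.sh scripts depend on it), run a remote command from the master as the hadoop user:
[hadoop@master.hadoop.local.tld ~]$ ssh node1.hadoop.local.tld hostname
[hadoop@master.hadoop.local.tld ~]$ ssh node2.hadoop.local.tld hostname
Each command should print the remote hostname without asking for a password.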
  • Download hadoop on all nodes
[hadoop@master.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node1.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node2.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz

* Ensure you are in /opt/hadoop (the hadoop user's home directory) before extracting *

[hadoop@master.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
[hadoop@node1.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
[hadoop@node2.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
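Because --strip-components=1 drops the top-level hadoop-2.8.0 directory, the distribution is unpacked straight into /opt/hadoop. An optional check that the layout is as expected:
[hadoop@master.hadoop.local.tld ~]$ ls /opt/hadoop
You should see bin, etc, sbin and share among the entries; repeat on node1 and node2.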
  • Edit .bash_profile on all nodes
[hadoop@master.hadoop.local.tld ~]$ vi .bash_profile
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@node1.hadoop.local.tld ~]$ vi .bash_profile
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@node2.hadoop.local.tld ~]$ vi .bash_profile
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
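After saving .bash_profile, load it and check that the Java and Hadoop settings are picked up (an optional check on each host):
[hadoop@master.hadoop.local.tld ~]$ source .bash_profile
[hadoop@master.hadoop.local.tld ~]$ echo $JAVA_HOME
[hadoop@master.hadoop.local.tld ~]$ hadoop version
hadoop version should report Hadoop 2.8.0; if it complains about JAVA_HOME, re-check the JDK installation and the exports above.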

Actions to do on Master ONLY

  • Create some data dirs on Master
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/namenode
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/namesecondary
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/local
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/log
  • Edit core-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.local.tld:9000/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
  • Edit hdfs-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
  • Edit mapred-site.xml
[hadoop@master.hadoop.local.tld ~]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master.hadoop.local.tld:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master.hadoop.local.tld:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user/app</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Djava.security.egd=file:/dev/../dev/urandom</value>
  </property>
</configuration>
  • Edit yarn-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.hadoop.local.tld</value>
  </property>
  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:/opt/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:/opt/hadoop/yarn/log</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <!-- port matched to fs.defaultFS (9000) in core-site.xml -->
    <value>hdfs://master.hadoop.local.tld:9000/var/log/hadoop-yarn/apps</value>
  </property>
</configuration>
  • Edit hadoop-env.sh
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/default/
  • Edit slaves
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/slaves
node1.hadoop.local.tld
node2.hadoop.local.tld
  • Format namenode
[hadoop@master.hadoop.local.tld ~]$ hdfs namenode -format
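If the format succeeds, the NameNode metadata directory is populated. An optional look inside confirms it:
[hadoop@master.hadoop.local.tld ~]$ ls /opt/hadoop/hdfs/namenode/current
You should see a VERSION file and an initial fsimage; the format command itself should also have logged that the storage directory was successfully formatted.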

Actions to do on Nodes

  • Create some data dirs on Nodes
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/local
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/log
[hadoop@node2.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
[hadoop@node2.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/local
[hadoop@node2.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/log
  • Copy etc from master
[hadoop@node1.hadoop.local.tld ~]$ scp -r master.hadoop.local.tld:etc .
[hadoop@node2.hadoop.local.tld ~]$ scp -r master.hadoop.local.tld:etc .
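Because the hadoop user's home is /opt/hadoop on every host, this copies the whole configuration tree into place. An optional check that the nodes now point at the master:
[hadoop@node1.hadoop.local.tld ~]$ grep -A1 fs.defaultFS etc/hadoop/core-site.xml
The output should contain hdfs://master.hadoop.local.tld:9000/; repeat on node2.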

Actions to do on Master after everything else

  • Start the cluster from the Master (verification steps follow below)
[hadoop@master.hadoop.local.tld ~]$ source .bash_profile
[hadoop@master.hadoop.local.tld ~]$ start-all.sh
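To check that the cluster actually came up, list the running Java daemons on each host and ask HDFS and YARN for their view of the cluster. Optionally, run one of the bundled example jobs (the jar path below assumes the default layout of the hadoop-2.8.0 tarball):
[hadoop@master.hadoop.local.tld ~]$ jps
[hadoop@node1.hadoop.local.tld ~]$ jps
[hadoop@master.hadoop.local.tld ~]$ hdfs dfsadmin -report
[hadoop@master.hadoop.local.tld ~]$ yarn node -list
[hadoop@master.hadoop.local.tld ~]$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar pi 2 10
On the master, jps should list NameNode, SecondaryNameNode and ResourceManager; on node1 and node2 it should list DataNode and NodeManager. hdfs dfsadmin -report should show 2 live datanodes and yarn node -list should show 2 node managers. The web UIs are also reachable on the default Hadoop 2.x ports: http://master.hadoop.local.tld:50070 (NameNode) and http://master.hadoop.local.tld:8088 (ResourceManager).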
