Hadoop Cluster
Purpose
The purpose of this tutorial is to show how to install a Hadoop cluster with 1 NameNode and 2 DataNodes.
Hadoop was installed on 3 CentOS VMs (KVM), and this tutorial is a merge of 3 others that you can find on the internet.
Steps
Each command is prefixed with a Bash prompt that shows where it is running, who is running it, and with what privilege.
Pay attention: [root@master.hadoop.local.tld ~]#
- User
root
- Host
master.hadoop.local.tld
- Privilege
#
Actions to do on the master and all nodes
- Ensure the firewall is stopped
[root@master.hadoop.local.tld ~]# systemctl stop firewalld
[root@node1.hadoop.local.tld ~]# systemctl stop firewalld
[root@node2.hadoop.local.tld ~]# systemctl stop firewalld
- Install the JDK on all nodes
[root@master.hadoop.local.tld ~]# rpm -Uvh /root/jdk-8u131-linux-x64.rpm
Run the same command on node1 and node2.
- Set up the FQDNs on all nodes
Every machine needs all three entries in /etc/hosts so the master and the nodes can resolve each other:
[root@master.hadoop.local.tld ~]# vi /etc/hosts
192.168.122.201 master.hadoop.local.tld
192.168.122.202 node1.hadoop.local.tld
192.168.122.203 node2.hadoop.local.tld
Repeat the same edit on node1 and node2.
- Add the hadoop user on all nodes
[root@master.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
[root@node1.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
[root@node2.hadoop.local.tld ~]# useradd -d /opt/hadoop hadoop
- Generate a key on the master and all nodes and copy it to each one (see the sketch after these commands)
[root@master.hadoop.local.tld ~]# su - hadoop
[root@node1.hadoop.local.tld ~]# su - hadoop
[root@node2.hadoop.local.tld ~]# su - hadoop
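A typical way to generate and distribute the keys, run as the hadoop user on each machine (the exact ssh-keygen flags are an assumption, not from the original):
[hadoop@master.hadoop.local.tld ~]$ ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id master.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id node1.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id node2.hadoop.local.tld
Repeat on node1 and node2 so the hadoop user can SSH from any machine to any other without a password.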
- Download Hadoop on all nodes
[hadoop@master.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node1.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node2.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
* Ensure you are in /opt/hadoop *
[hadoop@master.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
[hadoop@node1.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
[hadoop@node2.hadoop.local.tld ~]$ tar --strip-components=1 -zxvf hadoop-2.8.0.tar.gz
- Edit .bash_profile on all nodes (the contents to append are shown after the commands)
[hadoop@master.hadoop.local.tld ~]$ vi .bash_profile
[hadoop@node1.hadoop.local.tld ~]$ vi .bash_profile
[hadoop@node2.hadoop.local.tld ~]$ vi .bash_profile
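Append the following on each machine. The exact values are an assumption based on this layout: the JDK RPM creates the /usr/java/default symlink, and Hadoop is unpacked in /opt/hadoop.
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin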
Actions to do on the master ONLY
- Create the NameNode data dir on the master
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/namenode
- Edit core-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/core-site.xml
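A minimal core-site.xml for this cluster; the port is an assumption (9000 is the common default in Hadoop 2.x setups):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.local.tld:9000/</value>
  </property>
</configuration>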
- Edit hdfs-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/hdfs-site.xml
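A minimal hdfs-site.xml matching the directories created in this tutorial; the replication factor of 2 is an assumption that fits the 2 DataNodes (the datanode property is unused on the master but gets copied to the nodes later):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///opt/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///opt/hadoop/hdfs/datanode</value>
  </property>
</configuration>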
- Edit mapred-site.xml
[hadoop@master.hadoop.local.tld ~]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/mapred-site.xml
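A minimal mapred-site.xml; running MapReduce on YARN is the standard choice for Hadoop 2.x:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>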
- Edit yarn-site.xml
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/yarn-site.xml
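A minimal yarn-site.xml; pointing the ResourceManager at the master is an assumption consistent with this tutorial's layout:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.hadoop.local.tld</value>
  </property>
</configuration>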
- Edit hadoop-env.sh
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/hadoop-env.sh
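Set JAVA_HOME explicitly; /usr/java/default is an assumption based on the symlink the JDK RPM creates:
# The java implementation to use.
export JAVA_HOME=/usr/java/default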
- Edit slaves
[hadoop@master.hadoop.local.tld ~]$ vi etc/hadoop/slaves
node1.hadoop.local.tld
node2.hadoop.local.tld
- Format the namenode
[hadoop@master.hadoop.local.tld ~]$ hdfs namenode -format
Actions to do on the nodes
- Create the DataNode data dir on the nodes
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
[hadoop@node2.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
- Copy etc from the master
[hadoop@node1.hadoop.local.tld ~]$ scp -r master.hadoop.local.tld:etc .
[hadoop@node2.hadoop.local.tld ~]$ scp -r master.hadoop.local.tld:etc .
Actions to do on the master after everything above
- Start the cluster from the master
[hadoop@master.hadoop.local.tld ~]$ source .bash_profile
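The commands below are a sketch of the usual Hadoop 2.x start sequence (start-dfs.sh and start-yarn.sh live in $HADOOP_HOME/sbin, which .bash_profile put on the PATH); jps verifies that the daemons are running:
[hadoop@master.hadoop.local.tld ~]$ start-dfs.sh
[hadoop@master.hadoop.local.tld ~]$ start-yarn.sh
[hadoop@master.hadoop.local.tld ~]$ jps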