Hadoop Cluster
Purpose
This tutorial teaches how to install a Hadoop cluster with 1 NameNode and 2 DataNodes. Hadoop was installed on 3 CentOS VMs (KVM), and this tutorial is a merge of 3 others, which you can find on the internet.
Link1 Link2 Link3
Steps
Each command is prefixed with a Bash prompt that shows where it is running, who is running it, and with which privilege.
Pay attention: [root@master.hadoop.local.tld ~]#
User: root
Host: master.hadoop.local.tld
Privilege: # (privileged)
Actions to do on all nodes and master
Ensure the firewall is stopped on the master and on both nodes.
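On CentOS 7 the following should do it, assuming firewalld is the active firewall; run the same two commands on the master, node1 and node2:
[root@master.hadoop.local.tld ~]# systemctl stop firewalld
[root@master.hadoop.local.tld ~]# systemctl disable firewalld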
Add all three hosts to /etc/hosts on each machine. On the master:
192.168.122.201 master.hadoop.local.tld
192.168.122.202 node1.hadoop.local.tld
192.168.122.203 node2.hadoop.local.tld
On node1:
192.168.122.202 node1.hadoop.local.tld
192.168.122.203 node2.hadoop.local.tld
192.168.122.201 master.hadoop.local.tld
On node2:
192.168.122.203 node2.hadoop.local.tld
192.168.122.202 node1.hadoop.local.tld
192.168.122.201 master.hadoop.local.tld
Add hadoop user on all nodes
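A minimal sketch of this step (choose the password yourself); repeat on every machine, including the master, since the SSH step below assumes the hadoop user exists there as well:
[root@master.hadoop.local.tld ~]# useradd hadoop
[root@master.hadoop.local.tld ~]# passwd hadoop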
Generate an SSH key on all nodes and on the master, and copy it to each one.
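One way to do it, assuming passwordless SSH between the hadoop users is the goal, is ssh-keygen plus ssh-copy-id; shown here from the master, to be repeated from node1 and node2:
[hadoop@master.hadoop.local.tld ~]$ ssh-keygen -t rsa
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id hadoop@master.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id hadoop@node1.hadoop.local.tld
[hadoop@master.hadoop.local.tld ~]$ ssh-copy-id hadoop@node2.hadoop.local.tld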
Download Hadoop on the master and on both nodes.
[hadoop@master.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node1.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
[hadoop@node2.hadoop.local.tld ~]$ curl -O http://ftp.unicamp.br/pub/apache/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
Ensure you are in /opt/hadoop.
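The unpacking of the archive is not shown explicitly; a minimal sketch, assuming the tarball is in the current directory and the distribution contents should end up directly under /opt/hadoop owned by the hadoop user (repeat on node1 and node2):
[root@master.hadoop.local.tld ~]# mkdir -p /opt/hadoop
[root@master.hadoop.local.tld ~]# tar xzf hadoop-2.8.0.tar.gz -C /opt/hadoop --strip-components=1
[root@master.hadoop.local.tld ~]# chown -R hadoop:hadoop /opt/hadoop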
Edit .bash_profile on the master and on all nodes; the same block is added on each machine:
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
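To pick up the new variables without logging out again, the profile can be re-sourced and the installation checked (this assumes the archive has already been unpacked into /opt/hadoop as above):
[hadoop@master.hadoop.local.tld ~]$ source ~/.bash_profile
[hadoop@master.hadoop.local.tld ~]$ hadoop version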
Actions to do on Master ONLY
Create some data dirs on Master
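The exact directories are not spelled out here, but matching the dfs.namenode.name.dir and dfs.namenode.checkpoint.dir values used in hdfs-site.xml below, they would be:
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/namenode
[hadoop@master.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/namesecondary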
core-site.xml (under /opt/hadoop/etc/hadoop/ in Hadoop 2.8):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.hadoop.local.tld:9000/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:/opt/hadoop/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master.hadoop.local.tld:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master.hadoop.local.tld:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user/app</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Djava.security.egd=file:/dev/../dev/urandom</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master.hadoop.local.tld</value>
  </property>
  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:/opt/hadoop/yarn/local</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:/opt/hadoop/yarn/log</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>hdfs://master.hadoop.local.tld:8020/var/log/hadoop-yarn/apps</value>
  </property>
</configuration>
hadoop-env.sh:
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/default/
slaves:
node1.hadoop.local.tld
node2.hadoop.local.tld
Actions to do on Nodes
Create some data dirs on Nodes
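Again the exact directories are not listed, but matching dfs.datanode.data.dir, yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs from the configuration above, they would be the following (run on node1 and again on node2):
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/hdfs/datanode
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/local
[hadoop@node1.hadoop.local.tld ~]$ mkdir -p /opt/hadoop/yarn/log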
Actions to do on the master after all of the above
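The usual closing steps, which the tutorial does not show explicitly, are to copy the configuration from the master to the nodes, format HDFS once, start the daemons, and check the running Java processes; a sketch assuming the layout above:
[hadoop@master.hadoop.local.tld ~]$ scp /opt/hadoop/etc/hadoop/* node1.hadoop.local.tld:/opt/hadoop/etc/hadoop/
[hadoop@master.hadoop.local.tld ~]$ scp /opt/hadoop/etc/hadoop/* node2.hadoop.local.tld:/opt/hadoop/etc/hadoop/
[hadoop@master.hadoop.local.tld ~]$ hdfs namenode -format
[hadoop@master.hadoop.local.tld ~]$ start-dfs.sh
[hadoop@master.hadoop.local.tld ~]$ start-yarn.sh
[hadoop@master.hadoop.local.tld ~]$ jps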