Overview
Following the Hadoop build guide, I will share how to install Hadoop in cluster (fully distributed) mode.
Extracting the Binary File
Find the desired version of Hadoop at https://hadoop.apache.org/releases.html. Instead of using a Hadoop release version, I installed using a binary built from the source of the latest development branch (trunk), Hadoop 3.4. The commands below show the equivalent steps using the 3.3.4 release tarball as an example. Reference: How to build Hadoop 3.4
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
tar -zxvf hadoop-3.3.4.tar.gz
sudo cp -r hadoop-3.3.4 /usr/local/hadoop
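The downloaded tarball can also be verified against the SHA-512 checksum Apache publishes alongside each release (the checksum URL below assumes the standard Apache download layout):

```shell
# Download the SHA-512 checksum published next to the release tarball
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz.sha512
# Compare against the local file; prints "hadoop-3.3.4.tar.gz: OK" on success
sha512sum -c hadoop-3.3.4.tar.gz.sha512
```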
Setting Environment Variables: ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export ZOOKEEPER_HOME=/usr/local/zookeeper
export HBASE_HOME=/usr/local/hbase
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HIVE_HOME=/usr/local/hive
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin
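After editing ~/.bashrc, reload it in the current session and confirm the binaries resolve before moving on:

```shell
# Reload the shell configuration in the current session
source ~/.bashrc
# Confirm Hadoop is on the PATH and reports the expected version
hadoop version
# Confirm the Java runtime matches JAVA_HOME
java -version
```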
Configuration Changes
HDFS Configuration
Add the following to core-site.xml.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://ubuntu01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
Add the following to hdfs-site.xml.
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/dfs/nn</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/dfs/dn</value>
</property>
</configuration>
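The name and data directories referenced above must exist before the namenode is formatted and the daemons are started. A minimal sketch, assuming the daemons run as root as in this guide:

```shell
# On the master node: directory for namenode metadata (dfs.namenode.name.dir)
sudo mkdir -p /dfs/nn
# On each worker node: directory for datanode blocks (dfs.datanode.data.dir)
sudo mkdir -p /dfs/dn
# Make sure the user running the Hadoop daemons owns the directories
sudo chown -R root:root /dfs
```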
Add the following content to hadoop-env.sh.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Register the worker hostnames in $HADOOP_CONF_DIR/workers.
ubuntu02
ubuntu03
ubuntu04
ubuntu05
ubuntu06
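Once the configuration files are edited on the master, they need to reach every worker. A sketch using scp, assuming the same /usr/local/hadoop layout on every node and passwordless SSH (set up in the Key Distribution step):

```shell
# Copy the configured Hadoop conf directory to each worker node
for host in ubuntu02 ubuntu03 ubuntu04 ubuntu05 ubuntu06; do
  scp -r /usr/local/hadoop/etc/hadoop "root@${host}:/usr/local/hadoop/etc/"
done
```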
YARN Configuration
Add the following to yarn-site.xml.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>ubuntu01:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>ubuntu01:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>ubuntu01:8040</value>
</property>
</configuration>
Key Distribution
Run the following command on the master node and press Enter at each prompt to accept the defaults.
ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:D7CPU5pl7TUrMINhMzsO/ZVEXOE2YI73M/HCcDruhAQ root@yxxxxx
The key's randomart image is:
+---[RSA 3072]----+
| .+.o. |
| =.o |
| E . = * |
Copy the generated RSA public key under ~/.ssh/ to ~/.ssh/authorized_keys on all nodes (including the master node).
cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADSxxxxxxxx...mmn8= root@yxxxxxx
Create the ~/.ssh folder on all nodes. (For the master node, the ~/.ssh folder should already have been created during the keygen process.) Open an editor and add the master's public key copied above to ~/.ssh/authorized_keys.
mkdir -p ~/.ssh
vim ~/.ssh/authorized_keys
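After distributing the key, it is worth confirming that passwordless login works to every node before starting the cluster. The chmod values are the standard permissions sshd requires on ~/.ssh; the hostnames come from the workers list above:

```shell
# authorized_keys must not be group/world writable or sshd ignores it
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# Each command should print the remote hostname without a password prompt
for host in ubuntu02 ubuntu03 ubuntu04 ubuntu05 ubuntu06; do
  ssh -o BatchMode=yes "root@${host}" hostname
done
```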
Namenode Format
Execute the following command on the master node (ubuntu01).
hdfs namenode -format
Starting the Hadoop Cluster
Running the following command will start the namenode, secondary namenode, and resource manager on the master node ubuntu01, and the datanode and node manager on the worker nodes ubuntu02 through ubuntu06.
start-all.sh
Check the running Hadoop processes with the jps command.
For the master node, you should see something like this:
1693816 ResourceManager
1702488 SecondaryNameNode
1703408 Jps
1701958 NameNode
For worker nodes, you should see something like this:
1703882 DataNode
1704983 Jps
1704373 NodeManager
Checking the Web UI
Access the Namenode Web UI at http://ubuntu01:9870 to verify that the namenode and datanodes are running properly.
Access the Resource Manager Web UI at http://ubuntu01:8088 to check the status of the resource manager and node managers.
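As a final check, a small job can be submitted to verify HDFS and YARN end to end. A sketch using the examples jar shipped with Hadoop (adjust the jar version to match your installed release):

```shell
# Write and read back a file on HDFS
hdfs dfs -mkdir -p /tmp/smoke
echo "hello hadoop" | hdfs dfs -put - /tmp/smoke/hello.txt
hdfs dfs -cat /tmp/smoke/hello.txt
# Run the bundled pi estimator on YARN
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 10
```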