How to Build Hadoop 3.4 (Ubuntu 22.04)

Overview

Hadoop is such a large piece of software that even setting up its build environment is difficult. The official Hadoop build guide describes the process in detail. It covers building on Linux, CentOS, macOS, and Windows, but it recommends building with Docker. You can set up a build environment directly on a physical Linux machine, but reproducing that environment later is hard. The build is also sensitive to software versions, so you may have to downgrade software that already works well on that machine. Setting up the build environment in a Docker container avoids both problems and saves considerable time and effort. This post records how to do it.

Install Docker

sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
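
The echo command above can be hard to read because of the nested command substitutions. As a rough sketch, the helper below (docker_repo_line is a hypothetical name, not part of Docker's tooling) shows the APT source entry that ends up in /etc/apt/sources.list.d/docker.list once the architecture and release codename are filled in.

```shell
# Hypothetical helper for illustration: build the APT source entry for
# Docker's repository from an architecture and an Ubuntu release codename.
docker_repo_line() {
  local arch="$1" codename="$2"
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}

# Ubuntu 22.04 has the codename "jammy"; amd64 is assumed here.
docker_repo_line amd64 jammy
```

On a real machine the two arguments would come from dpkg --print-architecture and lsb_release -cs, as in the original command.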

Verify Docker Installation

sudo docker run hello-world

If you see the following message, the installation was successful.

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
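
If this check should run from a script instead of being read by eye, a minimal sketch is to search the output for the greeting line. The function name docker_hello_ok is made up for this example.

```shell
# Hypothetical helper: succeed only if the text on stdin contains the
# greeting printed by the hello-world image.
docker_hello_ok() {
  grep -q 'Hello from Docker!'
}

# Usage (needs a working Docker installation, so it is left as a comment):
#   sudo docker run --rm hello-world | docker_hello_ok && echo "Docker OK"
```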

Setting Up the Hadoop Build Environment

Clone the trunk branch from the official Hadoop GitHub repository, then run sudo ./start-build-env.sh in the cloned directory. The script builds a Docker image that contains the build toolchain and opens a shell in a container started from that image.

git clone https://github.com/apache/hadoop.git
cd hadoop
sudo ./start-build-env.sh

If the build environment is set up successfully, the following text will appear in the terminal.

Successfully built 147e63abcbef
Successfully tagged hadoop-build-1000:latest

 _   _           _                    ______
| | | |         | |                   |  _  \
| |_| | __ _  __| | ___   ___  _ __   | | | |_____   __
|  _  |/ _` |/ _` |/ _ \ / _ \| '_ \  | | | / _ \ \ / /
| | | | (_| | (_| | (_) | (_) | |_) | | |/ /  __/\ V /
\_| |_/\__,_|\__,_|\___/ \___/| .__/  |___/ \___| \_(_)
                              | |
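
Once the prompt appears, it can be worth confirming that the tools used in the next step are actually on the PATH inside the container. The following is only a sketch; missing_commands is a hypothetical helper, and the list of commands to check (java, mvn) is an assumption based on the build command used in this post.

```shell
# Hypothetical helper: report every required command that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}

# Usage inside the build container:
#   missing_commands java mvn && echo "toolchain ready"
```

The same function can be run on the host with git and docker as arguments before cloning the repository.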

Build Hadoop

The following command generates the source and binary distributions.

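# sudo typically resets the environment (env_reset), so an exported MAVEN_OPTS may not reach mvn.
# Passing it on the sudo command line, like JAVA_HOME, makes sure the heap settings apply.
sudo JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 MAVEN_OPTS="-Xms256m -Xmx1536m" mvn package -Pdist,src -DskipTests -Dtar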

Adjust the build options as needed. The official build guide lists these variants:

Building distributions:

Create binary distribution without native code and without Javadocs:

  $ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

Create binary distribution with native code:

  $ mvn package -Pdist,native -DskipTests -Dtar

Create source distribution:

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code:

  $ mvn package -Pdist,native,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site)

  $ mvn site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site
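
The distribution variants above differ only in which profiles go into -P and whether Javadoc is skipped. As an illustrative sketch, the helper below assembles the mvn arguments from a few flags. The function name hadoop_mvn_args and its flag names are invented for this example; only the resulting Maven arguments come from the list above.

```shell
# Hypothetical helper: print the mvn arguments for a distribution build.
# Flags (names are assumptions made for this sketch):
#   --native      include native code
#   --src         also build the source tarball
#   --no-javadoc  skip Javadoc generation
hadoop_mvn_args() {
  local profiles="dist" extra="" flag
  for flag in "$@"; do
    case "$flag" in
      --native)     profiles="$profiles,native" ;;
      --src)        profiles="$profiles,src" ;;
      --no-javadoc) extra=" -Dmaven.javadoc.skip=true" ;;
      *) echo "unknown flag: $flag" >&2; return 2 ;;
    esac
  done
  echo "package -P$profiles -DskipTests -Dtar$extra"
}

# Prints: package -Pdist,native,src -DskipTests -Dtar
hadoop_mvn_args --native --src
```

A call such as mvn $(hadoop_mvn_args --src) would then run the same build as the command used in this post; the substitution is left unquoted on purpose so that the shell splits it into separate arguments.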

If the build succeeds, Maven prints BUILD SUCCESS as shown below. On my laptop the build took about 23 minutes.

[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  3.600 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 25.185 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.024 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [  4.816 s]
[INFO] Apache Hadoop OBS support .......................... SUCCESS [ 21.104 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  3.470 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.016 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  23:13 min
[INFO] Finished at: 2022-12-25T02:17:47Z
[INFO] ------------------------------------------------------------------------
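
Because the build runs for a long time, it can be convenient to save the log and check the result afterwards. The sketch below classifies a Maven log read from standard input; maven_build_status is a hypothetical name, and build.log in the usage comment is an assumed file name.

```shell
# Hypothetical helper: classify a Maven log read from stdin by looking for
# the BUILD SUCCESS or BUILD FAILURE banner that Maven prints at the end.
maven_build_status() {
  local log
  log=$(cat)
  case "$log" in
    *"BUILD SUCCESS"*) echo "success" ;;
    *"BUILD FAILURE"*) echo "failure" ;;
    *)                 echo "unknown" ;;
  esac
}

# Usage, assuming the build output was saved with "... 2>&1 | tee build.log":
#   maven_build_status < build.log
```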

The build output is in the hadoop/hadoop-dist/target folder. The binary distribution, hadoop-3.4.0-SNAPSHOT.tar.gz, is the file you use to install Hadoop.

285f7027d3f:~/hadoop/hadoop-dist/target$ ll
total 662864
drwxr-xr-x  9 root       root            4096 Dec 25 02:17 ./
drwxr-xr-x  3 youngjukim youngjukim      4096 Dec 24 17:05 ../
drwxr-xr-x  2 root       root            4096 Dec 25 02:16 antrun/
drwxr-xr-x  3 root       root            4096 Dec 25 02:16 classes/
drwxr-xr-x 10 root       root            4096 Dec 25 02:16 hadoop-3.4.0-SNAPSHOT/
-rw-r--r--  1 root       root        37263892 Dec 25 01:54 hadoop-3.4.0-SNAPSHOT-src.tar.gz
-rw-r--r--  1 root       root       641461679 Dec 25 02:17 hadoop-3.4.0-SNAPSHOT.tar.gz
drwxr-xr-x  2 root       root            4096 Dec 25 02:10 hadoop-tools-deps/
drwxr-xr-x  3 root       root            4096 Dec 25 02:16 maven-shared-archive-resources/
-rw-r--r--  1 root       root              30 Dec 25 02:16 .plxarc
drwxr-xr-x  3 root       root            4096 Dec 25 02:16 test-classes/
drwxr-xr-x  2 root       root            4096 Dec 25 02:16 test-dir/
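
The target folder holds both a source tarball and a binary tarball with similar names. As a small sketch, the helper below picks out the binary one; find_binary_tarball is a hypothetical name, and the /opt destination in the usage comment is only an assumed install location.

```shell
# Hypothetical helper: print the path of the binary distribution tarball in
# the given directory, skipping the "-src" source tarball.
# Returns 1 if no binary tarball is found.
find_binary_tarball() {
  local dir="$1" f
  for f in "$dir"/hadoop-*.tar.gz; do
    [ -e "$f" ] || continue            # glob matched nothing
    case "$f" in
      *-src.tar.gz) continue ;;        # skip the source distribution
    esac
    echo "$f"
    return 0
  done
  return 1
}

# Usage (destination directory is an assumption):
#   sudo tar -xzf "$(find_binary_tarball hadoop-dist/target)" -C /opt
```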
