How to Install Hive 2.3.9 (Non-Secure) on Ubuntu

Prerequisites

Servers

number  hostname  role
1       latte01   master, namenode, resource-manager, mysql
2       latte03   worker, datanode, nodemanager
3       latte03   worker, datanode, nodemanager

Hadoop Installation

In this post, we will explore a setup that uses HDFS as Hive's storage and MapReduce as the execution engine. Hadoop must be installed beforehand.

For installation instructions, refer to the Hadoop non-secure installation guide. Note that in this post, we will use version 2.10.2 instead of version 3.3.4 mentioned in the guide.

MySQL Installation

Hive requires an RDBMS for its metastore. We will use MySQL, so install it by following the installation steps in the MySQL SQL Basics post. I installed MySQL on the latte01 server. Once the installation is complete, verify that MySQL is running with the following command.

sudo service mysql status

Hive Installation

The following installation process was performed while logged in as the root user.

Download and Extract Binary

Download the Hive binary on the master server (server 1), extract it, and copy it to an appropriate location. I copied the Hive files to /usr/local/hive.

wget https://dlcdn.apache.org/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
tar -zxvf apache-hive-2.3.9-bin.tar.gz
cp -r apache-hive-2.3.9-bin/ /usr/local/hive
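
It can be worth checking the downloaded archive against the SHA-256 digest published with the release before relying on it. The following is a minimal sketch, not part of the original steps: the function name verify_sha256 is hypothetical, and I assume you copy the expected digest by hand from the Apache download page.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 <file> <expected-hex-digest>
verify_sha256() {
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example (replace the placeholder with the digest published by Apache):
# verify_sha256 apache-hive-2.3.9-bin.tar.gz "<expected-sha256>"
```

The function returns a non-zero status on a mismatch, so it can gate the extraction step with &&.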

Install JDBC Driver

wget https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.33/mysql-connector-j-8.0.33.jar
cp mysql-connector-j-8.0.33.jar /usr/local/hive/lib/

Modify Configuration

If you check the /usr/local/hive/conf folder where we installed Hive, you can see that most configuration files ship as .template files.

root@latte01:/usr/local/hive/conf# ls
beeline-log4j2.properties.template  hive-env.sh.template		  hive-log4j2.properties.template  llap-cli-log4j2.properties.template	   parquet-logging.properties
hive-default.xml.template	    hive-exec-log4j2.properties.template  ivysettings.xml		   llap-daemon-log4j2.properties.template

Modify hive-site.xml

Since the hive-site.xml file does not exist during initial installation, copy the hive-default.xml.template file to create hive-site.xml.

cp hive-default.xml.template hive-site.xml
vim hive-site.xml

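Find the following properties in hive-site.xml and set their values. Replace the user name and password placeholders with your MySQL credentials.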

hive-site.xml
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://latte01:3306/hive?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>MySQL connection user name</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>MySQL connection password</value>
    <description>password to use against metastore database</description>
  </property>
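
Because hive-site.xml is copied from a very long template, it is easy to edit the wrong occurrence of a property. The sketch below reads a value back from the file so you can confirm the edit. The helper name get_hive_property is hypothetical, and it assumes each <name> and <value> sits on a single line as in the snippet above; it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML config file.
# Usage: get_hive_property <property-name> <path-to-xml>
get_hive_property() {
  awk -v key="$1" '
    # Remember that we have seen the requested <name> line.
    index($0, "<name>" key "</name>") { found = 1; next }
    # Print the first <value> line that follows it, without the tags.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
    # Stop at the end of the property if it had no <value> line.
    found && /<\/property>/ { exit }
  ' "$2"
}

# Example:
# get_hive_property javax.jdo.option.ConnectionURL /usr/local/hive/conf/hive-site.xml
```

If the property has an empty or missing value, the function prints nothing.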

Add the following properties at the bottom of hive-site.xml, inside the <configuration> element.

Without them, Hive fails with an error such as Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D.

hive-site.xml
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>

Register HIVE_HOME

hive-env.sh
HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin
export PATH
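
To confirm that the PATH change took effect in the current shell, a quick check like the one below can help. This is only a sketch: the function name check_on_path is hypothetical, and the command names in the example are the ones used later in this post.

```shell
# Hypothetical check: report whether each given command resolves on PATH.
# Returns non-zero if at least one command is missing.
check_on_path() {
  missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found: $cmd"
    else
      echo "missing: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# Example, after loading the settings above into the current shell:
# check_on_path hadoop hive schematool
```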

Initialize Metastore Schema

schematool -initSchema -dbType mysql -verbose
root@latte01:/usr/local/hive/conf# schematool -initSchema -dbType mysql -verbose

Initialization script hive-schema-2.3.0.mysql.sql
Connecting to jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
Connected to: MySQL (version 8.0.34-0ubuntu0.22.04.1)
Driver: MySQL Connector/J (version mysql-connector-j-8.0.33 (Revision: 7d6b0800528b6b25c68b52dc10d6c1c8429c100c))
Transaction isolation: TRANSACTION_READ_COMMITTED
0: jdbc:mysql://localhost:3306/hive> !autocommit on
Autocommit status: true
0: jdbc:mysql://localhost:3306/hive> /*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */
No rows affected (0.007 seconds)
0: jdbc:mysql://localhost:3306/hive> /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */
No rows affected (0.001 seconds)
0: jdbc:mysql://localhost:3306/hive> /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */
...

0: jdbc:mysql://localhost:3306/hive> !closeall
Closing: 0: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
beeline>
beeline> Initialization script completed
schemaTool completed
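
After initialization, schematool's -info option can be used to read back the schema version recorded in the metastore. The wrapper below is a sketch; the function name show_metastore_schema_info is hypothetical, and the guard only exists so the snippet fails with a readable message when Hive is not on PATH.

```shell
# Sketch: ask schematool for the schema details stored in the metastore.
show_metastore_schema_info() {
  if command -v schematool >/dev/null 2>&1; then
    schematool -dbType mysql -info
  else
    echo "schematool not found on PATH; check HIVE_HOME and PATH"
    return 1
  fi
}

# Example:
# show_metastore_schema_info
```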

Running Hive

Execute the following command to start the Hive shell.

/usr/local/hive/bin/hive

hive> show schemas;
OK
default
Time taken: 0.461 seconds, Fetched: 1 row(s)

To run HiveServer2 in the background, do the following. Both standard output and standard error are written to hive.log.

nohup hiveserver2 > hive.log 2>&1 &
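
HiveServer2 takes a while to start listening, so connecting immediately can fail. The sketch below polls the Thrift port before opening a Beeline session. The helper name wait_for_port is hypothetical, it relies on bash being installed for the /dev/tcp pseudo-device, and port 10000 is assumed to be the HiveServer2 port as in the Hue configuration later in this post.

```shell
# Hypothetical helper: poll until host:port accepts TCP connections or the
# timeout (in seconds, default 60) expires.
# Usage: wait_for_port <host> <port> [timeout]
wait_for_port() {
  host="$1"
  port="$2"
  timeout="${3:-60}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>.
    if bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
      echo "port $port on $host is open"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out waiting for $host:$port" >&2
  return 1
}

# Example: wait up to two minutes, then connect with Beeline as root.
# wait_for_port localhost 10000 120 && beeline -u jdbc:hive2://localhost:10000 -n root
```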

Installing Hue Development Version

You can run queries and check results in the Hive shell, but a shell is not a convenient interface for everyday use. With a tool called Hue, you can run Hive queries and view the results in a web browser.

Install OS-Dependent Packages

sudo apt-get install git ant gcc g++ libffi-dev libkrb5-dev libmysqlclient-dev libsasl2-dev libsasl2-modules-gssapi-mit libsqlite3-dev libssl-dev libxml2-dev libxslt-dev make maven libldap2-dev python3-dev  python-setuptools libgmp3-dev libbz2-dev

sudo apt-get install python3.8-dev python3-distutils


curl -sL https://deb.nodesource.com/setup_14.x | sudo bash -
sudo apt-get install -y nodejs

Install Python 3.8.18

wget https://www.python.org/ftp/python/3.8.18/Python-3.8.18.tgz
tar -zxvf Python-3.8.18.tgz
cd Python-3.8.18/
sudo ./configure --enable-optimizations
sudo make altinstall

Build Hue

cd /root/work
git clone https://github.com/cloudera/hue.git
cd hue
export PYTHON_VER=python3.8
export ROOT=/usr/local/hue
make apps

Create MySQL Hue User

mysql -u root -p
set global validate_password.policy=LOW;
set global validate_password.length=6;
create database hue;
create user 'hue'@'%' identified by 'xxxxxxxx';
grant all privileges on *.* to 'hue'@'%';

Modify Hue INI File

hue/desktop/conf/pseudo-distributed.ini
[[database]]
host=localhost
port=3306
engine=mysql
user=hue
password=xxxxxxxx
name=hue


[beeswax]
  hive_server_host=localhost
  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000

Start Dev Server

./build/env/bin/hue migrate
./build/env/bin/hue runserver 0:8000

When Hue opens a Hive session, it may report the following error.

Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate admin

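If the above error occurs, add the following proxy-user properties to Hadoop's core-site.xml and restart Hadoop. They allow the root, hive, and hue users to impersonate other users from any host.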

core-site.xml
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hive.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hive.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
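
If a full restart is inconvenient, Hadoop also has admin commands that reload the proxy-user settings on a running NameNode and ResourceManager. The sketch below wraps them; the function name refresh_proxyuser_config is hypothetical, and whether a refresh alone is enough in your cluster is an assumption I have not verified for this setup, so fall back to a restart if the error persists.

```shell
# Sketch: reload proxy-user (impersonation) settings without a full restart.
# Assumes core-site.xml has already been updated on the master node.
refresh_proxyuser_config() {
  if ! command -v hdfs >/dev/null 2>&1 || ! command -v yarn >/dev/null 2>&1; then
    echo "hdfs or yarn not found on PATH"
    return 1
  fi
  # Reload on the NameNode first, then on the ResourceManager.
  hdfs dfsadmin -refreshSuperUserGroupsConfiguration &&
    yarn rmadmin -refreshSuperUserGroupsConfiguration
}

# Example:
# refresh_proxyuser_config
```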

Open http://latte01:8000/ in a web browser to access Hue.
