1 Before you start
Hadoop must be installed on your system before installing Hive. Let us verify the Hadoop installation using the following command:
$ hadoop version
Hadoop 2.8.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 66c47f2a01ad9637879e95f80c41f798373828fb
Compiled by jdu on 2017-10-19T20:39Z
Compiled with protoc 2.5.0
From source with checksum dce55e5afe30c210816b39b631a53b1d
This command was run using /home/hadoop/hadoop/share/hadoop/common/hadoop-common-2.8.2.jar
1.1 Download hive
After configuring Hadoop successfully on your Linux system, let's start the Hive setup. First, download the Hive binary release and extract the archive using the following commands.
$ cd /home/hadoop
$ wget http://archive.apache.org/dist/hive/hive-2.3.2/apache-hive-2.3.2-bin.tar.gz
$ tar xzf apache-hive-2.3.2-bin.tar.gz
$ mv apache-hive-2.3.2-bin hive
1.2 Set Environment Variables
Configure your environment variables to use Hive. Edit /home/hadoop/.bash_profile and add the following lines:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_PREFIX=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
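After editing the profile, it is worth a quick sanity check that the variables took effect. A minimal sketch, assuming the paths used above:

```shell
# Re-apply the exports from .bash_profile (or run: source ~/.bash_profile),
# then verify that Hive's bin directory is on the PATH.
export HIVE_HOME=/home/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
echo "$PATH" | grep -q "$HIVE_HOME/bin" && echo "PATH OK"
```

If the check prints nothing, the exports were not picked up by the current shell.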
1.3 Configure HDFS for Hive
Before running Hive you need to create the warehouse directory in HDFS and make it group-writable (chmod g+w) before creating any table in Hive. Use the following commands.
$ hdfs dfs -mkdir -p /user/hive/warehouse
$ hdfs dfs -chmod g+w /user/hive/warehouse
2 Hive metastore
Configuring the metastore means telling Hive where its metadata database is stored. Every Hive installation needs a metastore service, where it stores metadata; it is implemented using tables in a relational database. By default, Hive uses the embedded Derby database. Derby provides single-process storage, so with Derby only one Hive CLI instance can run at a time. This is fine when running Hive on a personal machine or for developer tasks, but to use it on a cluster, MySQL or a similar relational database is required.
When you run a Hive query with the default Derby database, you will find that your current directory now contains a new sub-directory, metastore_db; the metastore is created there if it doesn't already exist. The property of interest here is javax.jdo.option.ConnectionURL. Its default value is jdbc:derby:;databaseName=metastore_db;create=true, which specifies that you are using embedded Derby as your Hive metastore and that the location of the metastore is metastore_db.
2.1 Configure hive-site.xml
- Copy the hive-default.xml template as hive-site.xml:
$ cp $HOME/hive/conf/hive-default.xml.template $HOME/hive/conf/hive-site.xml
- Set the default properties for tmpdir and user name at the top of $HOME/hive/conf/hive-site.xml:
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/${user.name}/java</value>
</property>
<property>
  <name>system:user.name</name>
  <value>${user.name}</value>
</property>
If you miss setting these values you will get an exception when running Hive: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
- We can also configure the directory where Hive stores table data. Here the warehouse location is set to /user/${user.name}/warehouse (i.e. /user/hadoop/warehouse for the hadoop user), as specified in hive-site.xml:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/${user.name}/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
- Notice this location points to HDFS, so it must exist before you create any database; create the warehouse directory for Hive in HDFS:
$ hadoop fs -mkdir warehouse
$ hdfs dfs -ls -R /
drwx-wx-wx   - deister supergroup          0 2018-02-10 23:31 /tmp
drwx-wx-wx   - deister supergroup          0 2018-02-10 23:31 /tmp/hive
drwx------   - deister supergroup          0 2018-02-10 23:33 /tmp/hive/deister
drwxr-xr-x   - deister supergroup          0 2018-02-10 23:28 /user
drwxr-xr-x   - deister supergroup          0 2018-02-10 23:32 /user/deister
drwxr-xr-x   - deister supergroup          0 2018-02-10 23:32 /user/deister/workspace
2.2 Derby metastore
If the Derby database is not installed on your system, download, install and configure it:
- Download Derby:
$ wget http://archive.apache.org/dist/db/derby/db-derby-10.14.1.0/db-derby-10.14.1.0-bin.tar.gz
$ tar xzf db-derby-10.14.1.0-bin.tar.gz
$ mv db-derby-10.14.1.0-bin db-derby
- Configure environment variables by editing $HOME/.bash_profile:
export DERBY_INSTALL=/home/hadoop/db-derby
export DERBY_HOME=/home/hadoop/db-derby
export PATH=$DERBY_HOME/bin:$PATH
- Create the Derby metastore using the following command:
$ schematool -dbType derby -initSchema
Metastore connection URL:        jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
- As you are setting up an embedded Derby metastore database, use the property below as the JDBC URL in your hive-site.xml (/home/hadoop/hive/conf/hive-site.xml):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
2.3 MySQL metastore
- Create a metastore database in the MySQL server:
$ mysql
mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';
- Add/edit the following lines in your hive-site.xml (/usr/local/opt/hive/libexec/conf/hive-site.xml). Note that the MySQL JDBC connector jar must also be present in Hive's lib directory so that the driver class below can be loaded.
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>false</value>
</property>
3 Running hive
You can start hive by running
$ hive
...
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.3.1/libexec/lib/hive-common-2.3.1.jar!/hive-log4j2.properties Async: true
hive>
Or in debug mode
$ hive -hiveconf hive.root.logger=DEBUG,console
Once connected to Hive, you can run a command to check that it is running OK.
hive> show databases;
OK
default
Time taken: 9.579 seconds, Fetched: 1 row(s)
3.1 Create a database
Now, create a test database, then list contents of HDFS
hive> create database test;
OK
Time taken: 0.399 seconds
To make the Hive CLI (command line interface) show the current database, type:
set hive.cli.print.current.db=true;
hive (default)>
To make this feature persistent, edit hive-site.xml
and set the property
<property>
  <name>hive.cli.print.current.db</name>
  <value>true</value>
</property>
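Alternatively, since the Hive CLI also executes $HOME/.hiverc at startup, the setting can be persisted per-user without editing hive-site.xml. A small sketch:

```shell
# Append the setting to ~/.hiverc; the Hive CLI runs this file on startup,
# so the current database will be shown in every new session.
echo 'set hive.cli.print.current.db=true;' >> "$HOME/.hiverc"
```

This is handy when you do not have write access to the Hive configuration directory.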
You can see test.db under the warehouse directory:
$ hdfs dfs -ls -R /
drwx-wx-wx - deister supergroup 0 2018-02-10 23:31 /tmp
drwx-wx-wx - deister supergroup 0 2018-02-10 23:31 /tmp/hive
drwx------ - deister supergroup 0 2018-02-10 23:35 /tmp/hive/deister
drwx------ - deister supergroup 0 2018-02-10 23:35 /tmp/hive/deister/3db7219f-d599-46be-b195-163f93374c8c
drwx------ - deister supergroup 0 2018-02-10 23:35 /tmp/hive/deister/3db7219f-d599-46be-b195-163f93374c8c/_tmp_space.db
drwxr-xr-x - deister supergroup 0 2018-02-10 23:28 /user
drwxr-xr-x - deister supergroup 0 2018-02-10 23:37 /user/deister
drwxr-xr-x - deister supergroup 0 2018-02-10 23:37 /user/deister/warehouse
drwxr-xr-x - deister supergroup 0 2018-02-10 23:37 /user/deister/warehouse/test.db
drwxr-xr-x - deister supergroup 0 2018-02-10 23:32 /user/deister/workspace
4 Troubleshooting
4.1 Can't start Hive because Hadoop is in safe mode
If Hadoop is in safe mode, Hive will not be able to start, throwing an exception like:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/hadoop/
To take Hadoop out of safe mode, type:
$ hdfs dfsadmin -safemode leave
Safe mode is OFF