- Authors
- Name
Overview
Learn how to query HBase data using Hive.
- Preparation
- How to Create a Hive-HBase Table!
- Converting an HBase Table to a Hive-HBase Table
- Reference
Preparation
There is an existing Hive table called u_data. u_data is part of the dataset provided by MovieLens, and you can find how to create this data at https://www.youngju.dev/blog/202211/hive_query_exampl.
The data schema is as follows.
hive> describe u_data;
OK
userid int
movieid int
rating int
unixtime timestamp
Time taken: 0.442 seconds, Fetched: 4 row(s)
How to Create a Hive-HBase Table!
Creating a Table
hive> CREATE TABLE hbase_u_data(key int, value int)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val");
OK
Time taken: 2.292 seconds
As shown below, you can see that the table has been created in Hive.
hive> show tables;
OK
hbase_u_data
Furthermore, you can also confirm that the table has been created in HBase.
hbase:001:0> list
TABLE
hbase_u_data
hbase:002:0> desc 'hbase_u_data';
Table hbase_u_data is ENABLED
hbase_u_data, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf1', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_V
ERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACH
E => 'true', BLOCKSIZE => '65536 B (64KB)', METADATA => {'EVICT_BLOCKS_ON_CLOSE' => 'false', 'NEW_VERSION_BEHAVIOR' =>
'false', 'CACHE_DATA_ON_WRITE' => 'false', 'CACHE_INDEX_ON_WRITE' => 'false', 'CACHE_BLOOMS_ON_WRITE' => 'false', 'PR
EFETCH_BLOCKS_ON_OPEN' => 'false', 'CACHE_DATA_IN_L1' => 'false'}}
1 row(s)
Quota is disabled
Took 0.0953 seconds
Inserting Data
Insert data from the Hive table into the HBase table.
hive> INSERT OVERWRITE TABLE hbase_u_data SELECT userid, rating FROM u_data;
After inserting data from Hive to HBase, the number of rows may differ because Hive supports duplicate keys while HBase only supports unique keys.
hive> select count(*) from hbase_u_data;
OK
943
Time taken: 18.137 seconds, Fetched: 1 row(s)
Converting an HBase Table to a Hive-HBase Table
You can also convert an existing usertable in HBase into a Hive-HBase table that can be used from Hive.
Below is the usertable that exists only in HBase.
hbase:004:0> desc "usertable";
Table usertable is ENABLED
usertable, {TABLE_ATTRIBUTES => {METADATA => {'hbase.store.file-tracker.impl' => 'DEFAULT'}}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'family', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BL
OCK_ENCODING => 'NONE', COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE
=> '65536 B (64KB)', REPLICATION_SCOPE => '0'}
Convert HBase table to Hive-HBase table
Enter the currently existing HBase table name as the value for "hbase.table.name".
hive> CREATE EXTERNAL TABLE hbase_usertable(key string, value string)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = "family:field9")
> TBLPROPERTIES("hbase.table.name" = "usertable")
Time taken: 1.651 seconds
You can confirm that a table named hbase_usertable has been created in Hive.
hive> desc hbase_usertable;
OK
key string
value string
Check the number of rows to verify that the row count of the original HBase table matches the Hive-HBase table's row count to confirm it was created correctly.
hive> select count(distinct key) from hbase_usertable;
Total MapReduce CPU Time Spent: 4 minutes 42 seconds 330 msec
OK
1000
After this, if you insert data into this table from either Hive or HBase, the data can be queried from both Hive and HBase.
For more diverse use cases, refer to https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration.