Skip to content
Published on

How to Use HBase YCSB (Yahoo Cloud Serving Benchmark)

Authors
  • Name
    Twitter

Overview

There are times when you want to push data into HBase to measure the maximum performance of an HBase cluster. Or you may want to insert dummy data into HBase. For these cases, there is a tool called YCSB that I would like to introduce.

Installation

Install Maven

To build YCSB from source code, Maven version 3.x or higher is required. Install it with the following command and then check the Maven version. In my case, I used Maven 3.6.3.

install_maven
sudo apt install maven
mvn -version

Git Clone

Clone the project from YCSB-github.

Maven Build

build_maven
cd YCSB
mvn clean package

Install Python 2.7

YCSB is designed to run on Python 2.xx. Therefore, set up a virtualenv that can run Python 2.7 and activate it.

install-python2.7
pip install virtualenv
virtualenv py27 --python=python2.7
source py27/bin/activate

Create a Table in HBase

Refer to the YCSB HBase2 documentation to create a table called usertable in HBase beforehand.

hbase-shell
hbase:001:0> n_splits = 50
=> 50
hbase:002:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
2022-11-19 12:29:26,454 INFO  [RPCClient-NioEventLoopGroup-1-1] Configuration.deprecation (Configuration.java:logDeprecation(1394)) - hbase.client.pause.cqtbe is deprecated. Instead, use hbase.client.pause.server.overloaded
Created table usertable2022-11-19 12:29:30,765 INFO  [RPCClient-NioEventLoopGroup-1-2] client.AsyncHBaseAdmin (RawAsyncHBaseAdmin.java:onFinished(2569)) - Operation: CREATE, Table Name: default:usertable completed

Took 4.7173 seconds
=> Hbase::Table - usertable

YCSB Folder

YCSB requires hbase-site.xml to run benchmarks, so create a new folder for this purpose. Then go to one of the servers in the HBase cluster, copy the ${HBASE_HOME}/conf/hbase-site.xml file, and paste it into this folder. You can also use scp to copy it.

mkdir
/YCSB$ mkdir youngju-hbase
/YCSB$ cd youngju-hbase/
/YCSB/youngju-hbase$ vim hbase-site.xml
hbase-site.xml
root@ubuntu01:~# cat /usr/local/hbase/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
	  <name>hbase.zookeeper.quorum</name>
	  <value>ubuntu01,ubuntu02,ubuntu03</value>
  </property>
  <property>
  <name>hbase.wal.provider</name>
  <value>filesystem</value>
</property>
</configuration>

Run YCSB

Run bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p from the YCSB home directory, and 1000 records will be loaded as shown below. Make sure the Python 2.7 virtual environment is activated so the Python script runs without issues.

load-YCSB
(py27) YCSB$ bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN]  Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG]  Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
YCSB Client 0.18.0-SNAPSHOT

Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 11783
[OVERALL], Throughput(ops/sec), 84.86803021301876
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 10
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.08486803021301877
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 12
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.10184163625562251
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 22
[TOTAL_GC_TIME_%], Time(%), 0.18670966646864126
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 4844.5
[CLEANUP], MinLatency(us), 13
[CLEANUP], MaxLatency(us), 9679
[CLEANUP], 95thPercentileLatency(us), 9679
[CLEANUP], 99thPercentileLatency(us), 9679
[INSERT], Operations, 1000
[INSERT], AverageLatency(us), 9895.29
[INSERT], MinLatency(us), 4144
[INSERT], MaxLatency(us), 1103871
[INSERT], 95thPercentileLatency(us), 14295
[INSERT], 99thPercentileLatency(us), 23871
[INSERT], Return=OK, 1000

Next, change only the load part to run and execute bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p. This will perform operations such as PUT, READ, UPDATE, etc., as shown below.

YCSB-run
(py27) YCSB$ bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN]  Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG]  Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
YCSB Client 0.18.0-SNAPSHOT

Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 10020
[OVERALL], Throughput(ops/sec), 99.8003992015968
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 14
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.13972055888223553
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 17
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.16966067864271456
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 31
[TOTAL_GC_TIME_%], Time(%), 0.3093812375249501
[READ], Operations, 485
[READ], AverageLatency(us), 8099.581443298969
[READ], MinLatency(us), 2850
[READ], MaxLatency(us), 197247
[READ], 95thPercentileLatency(us), 25519
[READ], 99thPercentileLatency(us), 49439
[READ], Return=OK, 485
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 2581.5
[CLEANUP], MinLatency(us), 9
[CLEANUP], MaxLatency(us), 5155
[CLEANUP], 95thPercentileLatency(us), 5155
[CLEANUP], 99thPercentileLatency(us), 5155
[UPDATE], Operations, 515
[UPDATE], AverageLatency(us), 10536.794174757282
[UPDATE], MinLatency(us), 4048
[UPDATE], MaxLatency(us), 226815
[UPDATE], 95thPercentileLatency(us), 35359
[UPDATE], 99thPercentileLatency(us), 66687
[UPDATE], Return=OK, 515

Summary

YCSB is a useful tool that can perform benchmarks on HBase 1.x, 2.x, and 3.x. Although I have not tried it, it also supports performance testing for various databases such as Redis, DynamoDB, Elasticsearch, Cassandra, and MongoDB. In real-world scenarios, no one knows what data will come in or in what pattern, so you should not rely solely on YCSB results. In particular, HBase can experience hotspots, and if requests are excessively concentrated on a specific region server, the overall QPS will decrease. For a detailed look at HBase architecture and row-key design, refer to Cho Daehyeop's post on HBase and Google's Bigtable Architecture.