Overview
There are times when you want to push data into HBase to measure the maximum performance of an HBase cluster, or simply to fill it with dummy data. For both cases, I would like to introduce a tool called YCSB (Yahoo! Cloud Serving Benchmark).
Installation
Install Maven
Building YCSB from source requires Maven 3.x. Install it with the following command, then check the installed version. In my case, I used Maven 3.6.3.
sudo apt install maven
mvn -version
Git Clone
Clone the project from the YCSB GitHub repository.
git clone https://github.com/brianfrankcooper/YCSB.git
Maven Build
cd YCSB
mvn clean package
Install Python 2.7
The bin/ycsb launcher script is written for Python 2.x. Therefore, create a virtualenv that runs Python 2.7 and activate it.
pip install virtualenv
virtualenv py27 --python=python2.7
source py27/bin/activate
Create a Table in HBase
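Following the YCSB HBase2 documentation, create a table called usertable in HBase beforehand. The commands below pre-split the table at 50 split points so that rows are distributed across regions from the start.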
hbase:001:0> n_splits = 50
=> 50
hbase:002:0> create 'usertable', 'family', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
2022-11-19 12:29:26,454 INFO [RPCClient-NioEventLoopGroup-1-1] Configuration.deprecation (Configuration.java:logDeprecation(1394)) - hbase.client.pause.cqtbe is deprecated. Instead, use hbase.client.pause.server.overloaded
Created table usertable
2022-11-19 12:29:30,765 INFO [RPCClient-NioEventLoopGroup-1-2] client.AsyncHBaseAdmin (RawAsyncHBaseAdmin.java:onFinished(2569)) - Operation: CREATE, Table Name: default:usertable completed
Took 4.7173 seconds
=> Hbase::Table - usertable
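The SPLITS expression in the create command is easier to reason about once the split points are written out. The following is a minimal Python sketch that mirrors the Ruby expression from the HBase shell (integer division, i running from 1 to n_splits). The function name ycsb_split_points is hypothetical and only used for illustration.

```python
def ycsb_split_points(n_splits, lo=1000, hi=9999):
    """Mirror the shell expression "user#{1000+i*(9999-1000)/n_splits}".

    Ruby's / on integers is integer division, which is // in Python.
    """
    return ["user{}".format(lo + i * (hi - lo) // n_splits)
            for i in range(1, n_splits + 1)]


splits = ycsb_split_points(50)
print(splits[:3])        # ['user1179', 'user1359', 'user1539']
print(splits[-1])        # user9999
print(len(splits) + 1)   # 51 regions: one more than the number of split points
```

With n_splits = 50 the table starts with 51 regions, each covering roughly 180 consecutive userNNNN prefixes.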
YCSB Folder
YCSB needs the cluster's hbase-site.xml on its classpath to run benchmarks, so create a new folder for it. Then log in to one of the servers in the HBase cluster and copy ${HBASE_HOME}/conf/hbase-site.xml into this folder, either by pasting its contents or with scp.
/YCSB$ mkdir youngju-hbase
/YCSB$ cd youngju-hbase/
/YCSB/youngju-hbase$ vim hbase-site.xml
root@ubuntu01:~# cat /usr/local/hbase/conf/hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://ubuntu01:9000/hbase</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>ubuntu01,ubuntu02,ubuntu03</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
</configuration>
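Before running a benchmark, it can help to confirm that the copied file really contains the ZooKeeper quorum of the target cluster. The following is a small sketch using only the Python standard library. The helper name read_hadoop_config and the embedded SAMPLE string are assumptions for illustration; in practice you would read youngju-hbase/hbase-site.xml from disk instead.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the file shown above, embedded so the sketch is self-contained.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ubuntu01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>ubuntu01,ubuntu02,ubuntu03</value>
  </property>
</configuration>
"""


def read_hadoop_config(xml_text):
    """Return the <property> entries of a Hadoop-style config as a dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}


config = read_hadoop_config(SAMPLE)
quorum = config["hbase.zookeeper.quorum"].split(",")
print(quorum)  # ['ubuntu01', 'ubuntu02', 'ubuntu03']
```

The hostnames in the quorum must be resolvable from the machine that runs YCSB, otherwise the client cannot reach the cluster.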
Run YCSB
Run bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family from the YCSB home directory, and 1000 records will be loaded as shown below. Make sure the Python 2.7 virtual environment is activated so that the Python launcher script runs without issues.
(py27) YCSB$ bin/ycsb load hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN] Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG] Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -load
YCSB Client 0.18.0-SNAPSHOT
Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 11783
[OVERALL], Throughput(ops/sec), 84.86803021301876
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 10
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.08486803021301877
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 12
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.10184163625562251
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 22
[TOTAL_GC_TIME_%], Time(%), 0.18670966646864126
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 4844.5
[CLEANUP], MinLatency(us), 13
[CLEANUP], MaxLatency(us), 9679
[CLEANUP], 95thPercentileLatency(us), 9679
[CLEANUP], 99thPercentileLatency(us), 9679
[INSERT], Operations, 1000
[INSERT], AverageLatency(us), 9895.29
[INSERT], MinLatency(us), 4144
[INSERT], MaxLatency(us), 1103871
[INSERT], 95thPercentileLatency(us), 14295
[INSERT], 99thPercentileLatency(us), 23871
[INSERT], Return=OK, 1000
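The [OVERALL] throughput in the report is simply the number of operations divided by the run time. As a quick check, this sketch recomputes it from the [INSERT] operation count and the [OVERALL] run time printed above; the function name is hypothetical.

```python
def throughput_ops_per_sec(operations, runtime_ms):
    # Operations per second; the 1000.0 keeps the division in floating point
    # on Python 2 as well as Python 3.
    return 1000.0 * operations / runtime_ms


load_throughput = throughput_ops_per_sec(1000, 11783)
print(round(load_throughput, 2))  # 84.87

# Time spent inside the 1000 inserts, from AverageLatency(us) = 9895.29.
insert_seconds = 1000 * 9895.29 / 1e6
print(round(insert_seconds, 2))  # 9.9
```

About 9.9 of the 11.8 seconds were spent inside the inserts themselves, so in this run the throughput was bounded mostly by per-operation latency rather than by client overhead.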
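Next, change only the load part of the command to run and execute bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family. This runs the transaction phase of the workload against the loaded data; with workloada it issues a roughly even mix of READ and UPDATE operations, as shown below.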
(py27) YCSB$ bin/ycsb run hbase2 -P workloads/workloada -cp youngju-hbase/ -p table=usertable -p columnfamily=family
[WARN] Running against a source checkout. In order to get our runtime dependencies we'll have to invoke Maven. Depending on the state of your system, this may take ~30-45 seconds
[DEBUG] Running 'mvn -pl site.ycsb:hbase2-binding -am package -DskipTests dependency:build-classpath -DincludeScope=compile -Dmdep.outputFilterFile=true'
java -cp youngju-hbase/:/home/youngju/work/YCSB/hbase2/conf:/home/youngju/work/YCSB/hbase2/target/hbase2-binding-0.18.0-SNAPSHOT.jar:/home/youngju/.m2/repository/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/home/youngju/.m2/repository/commons-logging/commons-logging/1.2/commons-logging-1.2.jar:/home/youngju/.m2/repository/org/apache/yetus/audience-annotations/0.5.0/audience-annotations-0.5.0.jar:/home/youngju/.m2/repository/com/github/stephenc/findbugs/findbugs-annotations/1.3.9-1/findbugs-annotations-1.3.9-1.jar:/home/youngju/.m2/repository/org/hdrhistogram/HdrHistogram/2.1.4/HdrHistogram-2.1.4.jar:/home/youngju/.m2/repository/log4j/log4j/1.2.17/log4j-1.2.17.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.4/jackson-mapper-asl-1.9.4.jar:/home/youngju/.m2/repository/org/apache/hbase/hbase-shaded-client/2.2.3/hbase-shaded-client-2.2.3.jar:/home/youngju/.m2/repository/org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar:/home/youngju/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.4/jackson-core-asl-1.9.4.jar:/home/youngju/work/YCSB/core/target/core-0.18.0-SNAPSHOT.jar site.ycsb.Client -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
Command line: -db site.ycsb.db.hbase2.HBaseClient2 -P workloads/workloada -p table=usertable -p columnfamily=family -t
YCSB Client 0.18.0-SNAPSHOT
Loading workload...
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Starting test.
DBWrapper: report latency for each error is false and specific error codes to track for latency are: []
[OVERALL], RunTime(ms), 10020
[OVERALL], Throughput(ops/sec), 99.8003992015968
[TOTAL_GCS_PS_Scavenge], Count, 2
[TOTAL_GC_TIME_PS_Scavenge], Time(ms), 14
[TOTAL_GC_TIME_%_PS_Scavenge], Time(%), 0.13972055888223553
[TOTAL_GCS_PS_MarkSweep], Count, 1
[TOTAL_GC_TIME_PS_MarkSweep], Time(ms), 17
[TOTAL_GC_TIME_%_PS_MarkSweep], Time(%), 0.16966067864271456
[TOTAL_GCs], Count, 3
[TOTAL_GC_TIME], Time(ms), 31
[TOTAL_GC_TIME_%], Time(%), 0.3093812375249501
[READ], Operations, 485
[READ], AverageLatency(us), 8099.581443298969
[READ], MinLatency(us), 2850
[READ], MaxLatency(us), 197247
[READ], 95thPercentileLatency(us), 25519
[READ], 99thPercentileLatency(us), 49439
[READ], Return=OK, 485
[CLEANUP], Operations, 2
[CLEANUP], AverageLatency(us), 2581.5
[CLEANUP], MinLatency(us), 9
[CLEANUP], MaxLatency(us), 5155
[CLEANUP], 95thPercentileLatency(us), 5155
[CLEANUP], 99thPercentileLatency(us), 5155
[UPDATE], Operations, 515
[UPDATE], AverageLatency(us), 10536.794174757282
[UPDATE], MinLatency(us), 4048
[UPDATE], MaxLatency(us), 226815
[UPDATE], 95thPercentileLatency(us), 35359
[UPDATE], 99thPercentileLatency(us), 66687
[UPDATE], Return=OK, 515
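When you compare several runs, reading these reports by eye gets tedious. The sketch below is one way to turn the "[SECTION], Metric, value" lines into a nested dictionary. The parser is my own illustration and not part of YCSB, and the REPORT string is just a few lines copied from the output above.

```python
REPORT = """\
[OVERALL], RunTime(ms), 10020
[OVERALL], Throughput(ops/sec), 99.8003992015968
[READ], Operations, 485
[READ], 99thPercentileLatency(us), 49439
[UPDATE], Operations, 515
[UPDATE], 99thPercentileLatency(us), 66687
"""


def parse_ycsb_summary(text):
    """Parse '[SECTION], Metric, value' lines; every other line is skipped."""
    metrics = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].startswith("["):
            continue
        section, name, value = parts
        try:
            number = float(value)
        except ValueError:
            continue  # not a numeric metric line
        metrics.setdefault(section.strip("[]"), {})[name] = number
    return metrics


metrics = parse_ycsb_summary(REPORT)
reads = metrics["READ"]["Operations"]
updates = metrics["UPDATE"]["Operations"]
print(reads / (reads + updates))  # 0.485
```

The read share of 0.485 matches the even read/update mix seen in the report, with the small deviation coming from random operation selection over only 1000 operations.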
Summary
YCSB is a useful tool for benchmarking HBase 1.x, 2.x, and 3.x. Although I have not tried them, it also supports performance testing for various other databases such as Redis, DynamoDB, Elasticsearch, Cassandra, and MongoDB. In production, however, you cannot know in advance what data will arrive or in what pattern, so you should not rely solely on YCSB results. In particular, HBase can suffer from hotspots: when requests are excessively concentrated on a specific region server, overall QPS drops. For a detailed look at HBase architecture and row-key design, refer to Cho Daehyeop's post on HBase and Google's Bigtable Architecture.
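To make the hotspot point concrete, here is a small sketch that maps row keys to regions using the same split points as the usertable created earlier. It assumes the usual HBase rule that a region covers the half-open range from its start key up to, but not including, the next split point, and that keys compare lexicographically. The key sets sequential and spread are made-up examples, not YCSB output.

```python
from bisect import bisect_right
from collections import Counter

# Same split points as the pre-split usertable (50 split points, 51 regions).
split_points = ["user{}".format(1000 + i * (9999 - 1000) // 50)
                for i in range(1, 51)]


def region_index(row_key, splits):
    # A key equal to a split point belongs to the region that starts there,
    # hence bisect_right rather than bisect_left.
    return bisect_right(splits, row_key)


# 100 consecutive keys versus 100 keys spaced across the key range.
sequential = ["user{}".format(5000 + i) for i in range(100)]
spread = ["user{}".format(1000 + i * 89) for i in range(100)]

seq_regions = Counter(region_index(k, split_points) for k in sequential)
spread_regions = Counter(region_index(k, split_points) for k in spread)

print(len(seq_regions))     # 1: every sequential key lands in the same region
print(len(spread_regions))  # many regions share the load
```

All 100 sequential keys fall into a single region, so a single region server would absorb the whole burst, while the spaced keys are shared among most of the regions. This is the kind of access pattern a synthetic benchmark may not reproduce.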