HBase Row Key Design

Background

Row key design matters a great deal in HBase. A table is split into regions by row key range, so a poorly chosen key prefix funnels reads and writes into a small number of regions (hot spots) and degrades performance significantly. One way to prevent this is to place a salt at the very beginning of the row key so that rows are spread across different regions.

For example, suppose you store messages with a row key of the form send date:send time:message_id. Messages sent at around the same time share almost the same prefix, so the following rows all fall into the same region and are handled by the same region server, which degrades performance.

230611:063031:1231231
230611:063032:1231232
230611:063032:1231233
230611:063033:1231234
230611:063033:1231235
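
To make the hot spot concrete, here is a minimal sketch that mimics how a row key is matched to a region by comparing it against sorted region boundaries. The split points in SPLITS and the helper region_for are hypothetical and exist only for this illustration; a real cluster chooses its own boundaries.

```python
from bisect import bisect_right

# Hypothetical region boundaries: region 0 holds keys below SPLITS[0],
# region i holds keys in [SPLITS[i-1], SPLITS[i]), and so on.
SPLITS = ["230401", "230501", "230601", "230701"]


def region_for(row_key: str, splits=SPLITS) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect_right(splits, row_key)


keys = [
    "230611:063031:1231231",
    "230611:063032:1231232",
    "230611:063032:1231233",
    "230611:063033:1231234",
    "230611:063033:1231235",
]

# Every key starts with the same date, so they all map to one region.
regions = {region_for(k) for k in keys}
print(regions)  # {3}
```

All five keys sort between "230601" and "230701", so a single region receives every write until the date rolls over into the next range.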

What if we put the message_id at the front?

1231231:230611:063031
1231232:230611:063032
1231233:230611:063032
1231234:230611:063033
1231235:230611:063033

This has the same problem: message_id increases sequentially, so new writes still concentrate on a single region. A good way to prevent this is to prefix the row key with a salt taken from a well-distributed hash value.

With the salt added, the row key structure becomes salt:send date:send time:message_id. The salt is the output of a hash function applied to another key field. Hash output has a fixed length and is effectively random with respect to the order of its inputs, which helps spread rows evenly across regions.

Among commonly used hash functions such as SHA-1, SHA-256, and MD5, let's use MD5. Using MD5_function(message_id) as the salt, the row keys would look like this:

8D4646EB2D7067126EB08ADB0672F7BB:230611:063031:1231231
715782C59C0561E9B6CE0F3D522C32F1:230611:063032:1231232
57F962C03EF3526EC6E95CEB50785C4C:230611:063032:1231233
8B353D5CC07E13577608711F4602FCB7:230611:063033:1231234
430EDB0C535BF08174E122EFECFA711D:230611:063033:1231235
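
Row keys of this shape could be built with a sketch like the following. The function name salted_row_key is hypothetical, and hashing the decimal string of message_id as UTF-8 bytes is an assumption: the text does not say how message_id is encoded before hashing, so the digests produced here may differ from the values listed above.

```python
import hashlib


def salted_row_key(send_date: str, send_time: str, message_id: int) -> str:
    """Build salt:send_date:send_time:message_id with an MD5-based salt."""
    # Assumption: the salt is the MD5 of message_id's decimal string.
    digest = hashlib.md5(str(message_id).encode("utf-8")).hexdigest()
    salt = digest.upper()  # 32 hex characters, always the same length
    return f"{salt}:{send_date}:{send_time}:{message_id}"


key = salted_row_key("230611", "063031", 1231231)
print(key)
```

Because the salt is derived from message_id rather than generated randomly, a reader that knows the send date, send time, and message_id can recompute the full row key and fetch the row directly.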

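Because the prefixes are no longer sequential, consecutive messages are scattered across different regions, and therefore across different region servers. Load is balanced over the cluster instead of concentrating on one server, which improves overall throughput. The trade-off is that rows are no longer stored in time order, so a scan over a date range can no longer read a single contiguous key range.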
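
As a rough check of how evenly an MD5 prefix spreads sequential IDs, the sketch below counts how many of 10,000 consecutive message_id values fall into each of 16 buckets, one per leading hex character. Treating each leading hex character as its own pre-split region is an assumption made for illustration, as is hashing the decimal string of message_id; region_bucket is a hypothetical helper.

```python
import hashlib
from collections import Counter


def region_bucket(message_id: int) -> str:
    """Return the first hex character of the MD5 salt for message_id."""
    # Assumption: one hypothetical region per leading hex character (16 total).
    return hashlib.md5(str(message_id).encode("utf-8")).hexdigest()[0]


# Sequential IDs, like the ones that caused a hot spot without a salt.
counts = Counter(region_bucket(i) for i in range(1231231, 1231231 + 10000))

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

With a uniform hash each bucket should receive about 625 of the 10,000 rows, so no single region absorbs the whole write stream.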