Redis Cluster mutli-key command optimisation with hash tags

Redis Cluster allows a Redis installation to automatically partition its dataset across multiple nodes. Keys are partitioned using the concept of hash slots, of which there are 16384 in total, distributed across the cluster's nodes. For example, if the cluster consists of three nodes, the first node contains hash slots 0 to 5500, the second contains hash slots 5501 to 11000 and the third node contains hash slots 11001 to 16383. To determine a key's hash slot, we take the CRC16 of the key's modulo 16384.

One implication of the way keys are distributed across nodes is that operations that target multiple keys, such as MSET or MGET, issue more Redis commands than they would on a single Redis node setup.

To demonstrate, we will store the world's cities' populations on a Redis cluster. We will then retrieve the populations of all cities in Germany and sum them up to calculate the country's population. The supporting Python code is available on github and depends on the redis-py-cluster library.

First, let's setup a Redis cluster. There are several ways to run a Redis cluster, for example with Docker. Here, we download and build Redis from source, assuming a *nix system with make. After building Redis, we set up the create-cluster utility for a cluster consisting of six master and six slave Redis nodes.

      $ ./setup-redis-cluster.sh
      [...]
      [OK] All 16384 slots covered.
    

We can now import city population data into the running cluster. One way to achieve this is by issuing one SET command per key. The keys' format is <country>_<city>, for example France_Paris.

      $ ./measure.sh set.py
      OK
      OK
      OK
      OK
      OK
      OK

      real 0m6.439s
      user 0m2.978s
      sys  0m0.863s
      4460
      4465
      4439
      4508
      4314
      4436
    

The time command is used as a crude measure of the execution time of the Python scripts that issue the Redis commands. The last six lines of the measure.sh script's output are total_commands_processed, as reported by each of the cluster's six master nodes. In this first attempt, we can see that key distribution is even across the cluster's nodes and that we've issued more than 4000 SET commands per node. This is consistent with our expectations of Redis Cluster's key hashing algorithm.

Insertion can be optimised by reducing the number of issued commands with the use of MSET. Calling the redis-py-cluster library's mset function with multiple keys doesn't have the desired effect. Its implementation iterates over the keys provided to it and issues a SET command for each. That's because a Redis Cluster node will only accept multi-key commands where all the keys hash to the same slot. To address this, Redis Cluster provides the concept of hash tags, according to which, if there is a substring between {} brackets in a key, only what is inside the string is hashed. An example of this in our demo application would be {France}_Paris. By surrounding the country section of each key in curly braces, we are effectively storing them in the same slot, which means that we can store an entire group by calling MSET once. That is what the mset.py script does.

      $ ./measure.sh mset.py
      OK
      OK
      OK
      OK
      OK
      OK

      real 0m0.844s
      user 0m0.563s
      sys  0m0.097s
      31
      32
      37
      27
      26
      29
    

The demo application's mset.py script issues less Redis commands and executes faster than set.py.

Now suppose we want to determine Germany's population by fetching all of its cities' populations and summing them up. The demo application provides two scripts for doing this: get.py which issues one GET command per city and mget.py which uses one MGET command to fetch all of the cities' populations.

      $ ./measure.sh get.py
      OK
      OK
      OK
      OK
      OK
      OK
      Population: 57857061

      real 0m0.729s
      user 0m0.483s
      sys  0m0.096s
      295
      257
      262
      262
      278
      272
    
      $ ./measure.sh mget.py
      OK
      OK
      OK
      OK
      OK
      OK
      Population: 57857061

      real 0m0.285s
      user 0m0.242s
      sys  0m0.030s
      5
      2
      3
      3
      3
      3
    

Similar to when the cluster was populated, the MGET version is faster and issues less commands than its GET counterpart.

This technique can improve the performance of applications where groups of keys are frequently accessed together. However, it is worth noting that by grouping keys using hash tags, we are effectively overriding Redis Cluster's key hashing algorithm and potentially making keys distributed less evenly across nodes, which can cause certain nodes to use more memory than others.

This article was inspired by work we've done with my colleagues from the Supply team at loveholidays.com.

George Malamidis, 2021-04-24

References / Links

Contact