Redis Cluster allows a Redis installation to automatically partition its dataset across multiple nodes. Keys are partitioned using the concept of hash slots, of which there are 16384 in total, distributed across the cluster's nodes. For example, if the cluster consists of three nodes, the first node contains hash slots 0 to 5500, the second contains hash slots 5501 to 11000 and the third node contains hash slots 11001 to 16383. To determine a key's hash slot, we take the CRC16 of the key modulo 16384.
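The slot calculation can be sketched in a few lines of Python. This is an illustrative reimplementation, not code from the demo application; Redis Cluster uses the CRC16-CCITT (XMODEM) variant, whose reference value for the string 123456789 is 0x31C3 according to the cluster specification.

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant specified by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots."""
    return crc16(key) % 16384

# Reference value from the Redis Cluster specification.
assert crc16(b"123456789") == 0x31C3
```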
One implication of the way keys are distributed across nodes is that operations targeting multiple keys, such as MSET or MGET, issue more Redis commands than they would on a single-node Redis setup.
To demonstrate, we will store the world's cities' populations on a Redis cluster. We will then retrieve the populations of all cities in Germany and sum them up to calculate the country's population. The supporting Python code is available on GitHub and depends on the redis-py-cluster library.
First, let's set up a Redis cluster. There are several ways to run one, for example with Docker. Here, we download and build Redis from source, assuming a *nix system with make. After building Redis, we use the create-cluster utility to set up a cluster consisting of six master and six slave Redis nodes.
$ ./setup-redis-cluster.sh
[...]
[OK] All 16384 slots covered.
We can now import city population data into the running cluster. One way to achieve this is by issuing one SET command per key. The keys' format is <country>_<city>, for example France_Paris.
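A minimal sketch of what a set.py-style loader might look like, assuming redis-py-cluster's RedisCluster client, a node reachable on 127.0.0.1:30001 and a hypothetical three-key sample of the data (the real script and data set differ):

```python
# Hypothetical sample of the city population data.
POPULATIONS = {
    ("France", "Paris"): 2148000,
    ("Germany", "Berlin"): 3769000,
    ("Germany", "Munich"): 1472000,
}

def make_key(country: str, city: str) -> str:
    """Keys follow the <country>_<city> format."""
    return f"{country}_{city}"

def load(client, populations) -> None:
    # One SET per key; the client routes each command to the node
    # that owns the key's hash slot.
    for (country, city), population in populations.items():
        client.set(make_key(country, city), population)

if __name__ == "__main__":
    from rediscluster import RedisCluster  # pip install redis-py-cluster
    client = RedisCluster(
        startup_nodes=[{"host": "127.0.0.1", "port": "30001"}],
        decode_responses=True,
    )
    load(client, POPULATIONS)
```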
$ ./measure.sh set.py
OK OK OK OK OK OK
real 0m6.439s
user 0m2.978s
sys 0m0.863s
4460 4465 4439 4508 4314 4436
The time command is used as a crude measure of the execution time of the Python scripts that issue the Redis commands. The last six lines of the measure.sh script's output are the total_commands_processed counts, as reported by each of the cluster's six master nodes. In this first attempt, we can see that keys are distributed evenly across the cluster's nodes and that we've issued more than 4000 SET commands per node. This is consistent with our expectations of Redis Cluster's key hashing algorithm.
Insertion can be optimised by reducing the number of issued commands with the use of MSET. However, calling the redis-py-cluster library's mset function with multiple keys doesn't have the desired effect: its implementation iterates over the keys provided to it and issues a SET command for each. That's because a Redis Cluster node will only accept multi-key commands where all the keys hash to the same slot. To address this, Redis Cluster provides the concept of hash tags: if a key contains a substring between {} brackets, only what is inside the brackets is hashed. An example of this in our demo application would be {France}_Paris. By surrounding the country section of each key in curly braces, we effectively store all of a country's keys in the same slot, which means we can store an entire group by calling MSET once. That is what the mset.py script does.
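The tag-aware slot calculation can be sketched as follows. This is an illustrative reimplementation of the rule from the Redis Cluster specification, not part of any client library:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def hash_tag(key: str) -> str:
    """Return the {tag} contents if present and non-empty, else the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # ignore empty tags like "{}"
            return key[start + 1:end]
    return key

def key_slot(key: str) -> int:
    return crc16(hash_tag(key).encode()) % 16384

# All keys sharing the {France} tag land in the same slot...
assert key_slot("{France}_Paris") == key_slot("{France}_Marseille")
# ...which is the slot of the tag itself.
assert key_slot("{France}_Paris") == key_slot("France")
```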
$ ./measure.sh mset.py
OK OK OK OK OK OK
real 0m0.844s
user 0m0.563s
sys 0m0.097s
31 32 37 27 26 29
The demo application's mset.py script issues fewer Redis commands and executes faster than set.py.
Now suppose we want to determine Germany's population by fetching all of its cities' populations and summing them up. The demo application provides two scripts for doing this: get.py, which issues one GET command per city, and mget.py, which uses a single MGET command to fetch all of the cities' populations.
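A sketch of the mget.py approach, assuming the hash-tagged key scheme from above and redis-py-cluster's RedisCluster client; the city list and port are illustrative, not the demo's actual values:

```python
# Hypothetical subset of Germany's cities; the demo uses the full data set.
GERMAN_CITIES = ["Berlin", "Hamburg", "Munich"]

def country_keys(country: str, cities) -> list:
    """All keys share the {country} hash tag, so they hash to one slot."""
    return [f"{{{country}}}_{city}" for city in cities]

def population(client, country: str, cities) -> int:
    # A single MGET is valid here because every key hashes to the same slot.
    values = client.mget(country_keys(country, cities))
    return sum(int(v) for v in values if v is not None)

if __name__ == "__main__":
    from rediscluster import RedisCluster  # pip install redis-py-cluster
    client = RedisCluster(
        startup_nodes=[{"host": "127.0.0.1", "port": "30001"}],
        decode_responses=True,
    )
    print("Population:", population(client, "Germany", GERMAN_CITIES))
```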
$ ./measure.sh get.py
OK OK OK OK OK OK
Population: 57857061
real 0m0.729s
user 0m0.483s
sys 0m0.096s
295 257 262 262 278 272
$ ./measure.sh mget.py
OK OK OK OK OK OK
Population: 57857061
real 0m0.285s
user 0m0.242s
sys 0m0.030s
5 2 3 3 3 3
As when the cluster was populated, the MGET version is faster and issues fewer commands than its GET counterpart.
This technique can improve the performance of applications where groups of keys are frequently accessed together. However, it is worth noting that by grouping keys with hash tags we are effectively overriding Redis Cluster's key hashing algorithm, potentially distributing keys less evenly across nodes, which can cause certain nodes to use more memory than others.
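The skew is easy to see by counting distinct slots with and without tags. A compact sketch, using the CRC16 variant specified by Redis Cluster and a hypothetical sample of keys:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Tag-aware slot calculation: hash only a non-empty {...} section if present.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

cities = {"France": ["Paris", "Lyon"], "Germany": ["Berlin", "Munich"]}
plain = {key_slot(f"{c}_{city}") for c, cs in cities.items() for city in cs}
tagged = {key_slot(f"{{{c}}}_{city}") for c, cs in cities.items() for city in cs}

# Tagged keys collapse onto at most one slot per country, so a country's
# entire data set is pinned to a single node.
assert len(tagged) <= len(cities)
print(len(plain), "slots without tags;", len(tagged), "slots with tags")
```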
This article was inspired by work we've done with my colleagues from the Supply team at loveholidays.com.
George Malamidis, 2021-04-24