1) The user u1 will end up connecting to all nodes. Thus as you scale by adding an "E", every node ends up with the same number of connections. Example: let's say A, B, C, D each have 5k connections open and you add "E". Now "E" also has 5k connections. At that point you are limited by the number of requests per second, and no amount of hardware can save you.
2) Let's say B and B1 die. (It happens.) Now your system is completely out. All 5k connections are blocked. It would be nice if A,C,D,E continue to run.
I usually get around things like this by putting an index server between user u1 and A, B, C, D. When u1 starts retrieving keys, the system needs to be designed such that u1 only connects to a single node for its keys.
How do you get around 1,2? Do other people use index servers?
1) The number of connections is not very relevant. You can set a max limit if you want, as with the redis-rb client, which will close old connections before opening new ones if the limit set by the user is less than the number of master nodes. However this is not really advisable, because an idle connection is just an entry in a data structure of the kernel, so it is not an issue for every client running in a given box to hold connections to multiple nodes.
2) The limited partition resistance in Redis Cluster is a fundamental design trade-off: it means there is no need for a "merge" operation, and that at any given moment only one node is responsible for a given key. I think that in the case of Redis, most of the time less partition tolerance in favour of a simpler consistency model is what users need.
EDIT: a clarification on "1": what I mean is that every connection is active only for the subset of keys its node holds, so the traffic of every client is split across all its connections. Similarly, every node of the cluster performs only a share of the work proportional to the number of hash slots it holds out of the full set of 16384 hash slots.
EDIT 2: it is not clear from your point "2" whether you understand that you can have B2, B3, B4, ... BN if you want.
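The connection-cap behavior mentioned in point 1 can be sketched as a toy LRU-capped pool. This is an illustration of the idea only, not the actual redis-rb code; the class name, the `connect` factory, and the demo connection type are all made up:

```python
from collections import OrderedDict

class CappedConnectionPool:
    """Toy LRU-capped pool: if the limit is smaller than the number of
    nodes, the least recently used connection is closed before a new
    one is opened. Illustrative sketch only, not real client code."""

    def __init__(self, max_connections, connect):
        self.max_connections = max_connections
        self.connect = connect           # factory: node -> connection
        self.conns = OrderedDict()       # node -> connection, LRU first

    def get(self, node):
        if node in self.conns:
            self.conns.move_to_end(node)     # mark most recently used
            return self.conns[node]
        if len(self.conns) >= self.max_connections:
            _, oldest = self.conns.popitem(last=False)
            oldest.close()                   # evict least recently used
        conn = self.connect(node)
        self.conns[node] = conn
        return conn

# demo with a dummy connection type (made up for illustration)
closed = []
class DummyConn:
    def __init__(self, node): self.node = node
    def close(self): closed.append(self.node)

pool = CappedConnectionPool(2, DummyConn)
for node in ["A", "B", "A", "C"]:
    pool.get(node)   # with a cap of 2, node B's connection gets evicted
```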
The redis cluster client in your programming language knows the entire current cluster state. It automatically places (and reads) keys directly on (and from) the correct instances. Direct reads also reduce the latency found in some other cluster implementations, where your only option is to proxy requests. You as an application programmer never have to worry about multiple servers.
When your redis cluster client connects to the cluster, the redis client reads the current key slot to server map, and uses that for every action (with updates as necessary as failures/promotions happen).
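The slot map the client reads is keyed by Redis Cluster's hash-slot function, which the cluster spec defines as CRC16 of the key (XMODEM variant: polynomial 0x1021, initial value 0x0000) modulo 16384, using only the hash-tag part of the key if one is present. A minimal Python sketch:

```python
def crc16(data: bytes) -> int:
    # CRC16/XMODEM as used by Redis Cluster (poly 0x1021, init 0x0000)
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # If the key contains a non-empty {hash tag}, only the tag is hashed
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384
```

Keys sharing a hash tag, e.g. `{user1000}.following` and `{user1000}.followers`, map to the same slot, which is what keeps multi-key operations possible in a cluster.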
Let's say B and B1 die.
Good example, but you should probably be running with, at a minimum, B, B1, and B2 (and three entire copies of A, C, and D as well). You have to design your deployments in concert with how your underlying architecture works. Redis cluster instances do not have to take up an entire machine: run a couple of masters and a couple of replicas per host. Running a cluster with 4 masters means you should run 8 replicas (minimum). See the math section of https://github.com/mattsta/redis-cluster-playground for more examples.
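The "couple masters and a couple replicas per host" idea can be sketched as a toy placement function that spreads masters round-robin across hosts and never puts a replica on the same host as its master. The function, names, and scheme are illustrative only, not a real Redis deployment tool:

```python
def place(masters, replicas_per_master, hosts):
    """Toy anti-affinity placement: masters round-robin across hosts,
    each replica on a host different from its master's host."""
    placement = {}  # instance name -> host
    for i, m in enumerate(masters):
        placement[m] = hosts[i % len(hosts)]
        # candidate hosts for this master's replicas exclude its own host
        candidates = [h for h in hosts if h != placement[m]]
        for r in range(replicas_per_master):
            placement[f"{m}-replica{r + 1}"] = candidates[(i + r) % len(candidates)]
    return placement

# 4 masters, 2 replicas each, spread over 4 hosts -> 12 instances total
layout = place(["A", "B", "C", "D"], 2, ["h1", "h2", "h3", "h4"])
```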
It's worth stepping back and defining "Cluster" too. A "database cluster" has no meaning by itself. It doesn't specify redundancy, how availability works, or how replication works. A Riak cluster has nothing to do with a Redis cluster. You have to learn the operations on each type as you go.
(Redis is single-threaded, anyways. You probably don't want to run production and staging on the same Redis process.)
Pretty much every other database on the planet has database names.
I LOVE Redis but this is just a bonkers limitation IMHO...
More generally, you can always use key-prefixes if you want a name-based way to segregate data in the same redis+db instance.
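The key-prefix approach can be sketched as a thin wrapper that prepends a namespace to every key. This is a hand-rolled illustration (the class names are made up, and a dict stands in for a real redis client), not a feature of any particular client library:

```python
class NamespacedClient:
    """Prefix every key with a namespace so several logical datasets
    can share one Redis database. Illustrative sketch only."""

    def __init__(self, client, namespace):
        self.client = client             # anything exposing get/set
        self.prefix = namespace + ":"

    def set(self, key, value):
        self.client.set(self.prefix + key, value)

    def get(self, key):
        return self.client.get(self.prefix + key)

class DictStub:
    # stand-in for a real redis client, for demonstration only
    def __init__(self): self.data = {}
    def set(self, k, v): self.data[k] = v
    def get(self, k): return self.data.get(k)

# two "databases" sharing one store, isolated by prefix
store = DictStub()
staging = NamespacedClient(store, "staging")
prod = NamespacedClient(store, "prod")
staging.set("counter", 1)
prod.set("counter", 2)
```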
This idea may of course be contrary to the expected use of Lua scripts in Redis, and just an attempt by the RDBMS part of my brain to treat Lua scripts as stored procedures.
The big benefit to this is when balancing write-heavy workloads, when the slave would not be getting its share of the total load.
I believe this could be accomplished by running a second DB on each machine as a slave to the next in the cycle, and that's what I'll probably try when we scale up to needing a cluster. But having it be a supported scenario (or facilitated somehow) would make me feel much better.
(edit: Just a feature request of course, exciting to see this release! Thanks for all your hard work antirez!)
Something like:
SYNC 3
SET key value
[replies with success only after it's ACK'd on three cluster instances.]

Redis instead has a master-replica setup that usually can't survive severe partitions. In general, only the side of the partition with the majority of masters survives, assuming there is at least one replica for every master that is not in the majority partition.
In practical terms this means that a few nodes going away usually results in the cluster still being available (unless you are really unlucky and all the replicas for a given hash slot go down at the same time), but that under severe partitions the system is likely to be unavailable.
This tradeoff lets Redis Cluster work without help from the application and without merge operations: that is very desirable because of the kind of big, complex-to-merge data structures (millions of elements are common) that a single key can hold in Redis.
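The survival rule described above can be sketched as a small predicate: a side of a partition stays available only if it holds a majority of the masters and, for every master stranded on the other side, at least one replica it can promote. The function and the data shapes are illustrative only, not the actual failover logic:

```python
def side_survives(side, all_masters, replicas):
    """Toy model of the partition rule above.
    side: set of node names on this side of the partition.
    replicas: dict mapping master name -> list of its replica names."""
    masters_here = [m for m in all_masters if m in side]
    if len(masters_here) * 2 <= len(all_masters):
        return False  # no majority of masters: failover cannot proceed
    for m in all_masters:
        if m not in side and not any(r in side for r in replicas[m]):
            return False  # this master's hash slots are unreachable
    return True

masters = ["A", "B", "C"]
reps = {"A": ["A1"], "B": ["B1"], "C": ["C1"]}
```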
Some variant of clustering is a very common feature among many different database platforms.
Redis adding clustering does not make it a step towards turning it into Riak any more than it is a step towards turning it into MySQL [1]