Redisson RReadWriteLock – None of slaves were synced under load. If I disable checkLockSyncedSlaves, what are the real trade‑offs?

Summary

Under high traffic, Redisson's RReadWriteLock fails with the error "None of slaves were synced". Disabling checkLockSyncedSlaves skips the replica-sync check and so lowers lock acquisition latency, but a lock that exists only on the master can be lost if that master fails over before the lock is replicated.

Root Cause

Redis replication is asynchronous: a master can acknowledge a write before any replica has received it. With checkLockSyncedSlaves enabled, Redisson waits after a lock write for replicas to confirm it, and reports this error when no replica confirms within slavesSyncTimeout. Likely contributing factors:

High traffic and load on the Redis Cluster, which increases replication lag
A slavesSyncTimeout that is too short for the replication lag actually observed
A required replica count that the cluster cannot satisfy under load

Why This Happens in Real Systems

This is the usual trade-off between consistency and availability. Redis Cluster replicates asynchronously to keep latency low and stay available, so it does not guarantee strong consistency. The checkLockSyncedSlaves flag buys back some safety for locks by waiting for replicas, at the cost of higher latency and lower throughput.

Real-World Impact

The real-world impact of this issue includes:

Lock acquisition latency: with the check enabled, every acquisition waits for replica acknowledgement, which slows each request that needs the lock
Lock loss during failover: with the check disabled, a lock that was never replicated disappears when a replica is promoted, so two clients can hold the same lock and corrupt shared data
Reduced throughput: with the check enabled, slower acquisitions and failed attempts under load reduce the number of operations the system completes

Example or Code

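import org.redisson.Redisson;
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

Config config = new Config();
// checkLockSyncedSlaves is set on the top-level Config, not on the cluster servers config
config.setCheckLockSyncedSlaves(false);
config.useClusterServers()
    .addNodeAddress("redis://localhost:6379");
RedissonClient redisson = Redisson.create(config);
RReadWriteLock lock = redisson.getReadWriteLock("myLock");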
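
For comparison, the alternative to disabling the check is to keep it on and give replicas longer to acknowledge. The following is a minimal sketch, assuming a Redisson version whose Config exposes setSlavesSyncTimeout (in milliseconds). The 3000 ms value, the class name SyncedLockConfig, and the node address are placeholders for illustration, not recommendations.

```java
import org.redisson.Redisson;
import org.redisson.api.RReadWriteLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

public class SyncedLockConfig {
    public static void main(String[] args) {
        Config config = new Config();
        // Keep the replica-sync check enabled.
        config.setCheckLockSyncedSlaves(true);
        // Assumed value: allow replicas up to 3 s to confirm the lock write.
        config.setSlavesSyncTimeout(3000);
        config.useClusterServers()
              .addNodeAddress("redis://localhost:6379"); // placeholder address

        RedissonClient redisson = Redisson.create(config);
        try {
            RReadWriteLock rwLock = redisson.getReadWriteLock("myLock");
            rwLock.writeLock().lock();
            try {
                // critical section protected by the write lock
            } finally {
                rwLock.writeLock().unlock();
            }
        } finally {
            redisson.shutdown();
        }
    }
}
```

A longer timeout trades fewer sync failures for a higher worst-case acquisition time, so the value is best chosen from measured replication lag rather than guessed.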

How Senior Engineers Fix It

Senior engineers fix this issue by:

Tuning slavesSyncTimeout: raising it far enough to cover normal replication lag, but not so far that a slow replica stalls every lock acquisition
Configuring the required replica count: requiring enough replicas to acknowledge the write before the lock counts as acquired
Implementing retry mechanisms: retrying failed acquisitions with a bounded number of attempts and a backoff, and treating a lock as possibly lost after a failover
Monitoring and analyzing system performance: tracking replication lag, lock acquisition latency, and sync failures so that configuration changes are based on measurements
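
The retry item above can be sketched with the standard java.util.concurrent.locks.Lock interface, which Redisson's RLock extends. This is only an illustration under stated assumptions: LockRetry and withLock are hypothetical names, the linear backoff is an arbitrary choice, and it assumes a sync failure surfaces as an unchecked exception from tryLock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

class LockRetry {
    /**
     * Runs action while holding lock, trying to acquire it up to maxAttempts times.
     * Each attempt waits at most waitMs; failed attempts back off linearly.
     */
    static <T> T withLock(Lock lock, int maxAttempts, long waitMs, Callable<T> action)
            throws Exception {
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            boolean acquired = false;
            try {
                acquired = lock.tryLock(waitMs, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // Assumption: client-side acquisition errors arrive as unchecked exceptions.
                lastFailure = e;
            }
            if (acquired) {
                try {
                    return action.call();
                } finally {
                    lock.unlock();
                }
            }
            if (attempt < maxAttempts) {
                Thread.sleep(waitMs * attempt); // linear backoff before the next attempt
            }
        }
        throw new IllegalStateException(
                "could not acquire lock after " + maxAttempts + " attempts", lastFailure);
    }
}
```

With Redisson, a call might look like LockRetry.withLock(rwLock.writeLock(), 3, 200, () -> doWork()), where doWork is a placeholder. Retrying only covers acquisition failures; it does not detect a lock that was lost during a failover while the action was running.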

Why Juniors Miss It

Juniors may miss this issue due to:

Lack of understanding of distributed systems: not yet seeing that a lock is only as safe as the replication behind it
Insufficient experience with Redis Cluster: not knowing that replication is asynchronous or which Redisson settings control the sync check
Overlooking the importance of replication: treating a master's acknowledgement as durable and turning off checkLockSyncedSlaves only to make the error go away