
# Raft Consensus: What Happens to Client Requests During Leadership Elections?

## Summary  
In Raft consensus, client requests can only be processed by an elected leader. During leadership elections—triggered by leader failure or network partitions—the cluster cannot service write requests. Requests arriving at this time are **neither processed nor persisted**, causing **client-facing errors** and requiring **explicit retry logic**. Raft provides mechanisms to redirect clients to the new leader once elected, but temporary unavailability is inherent to the protocol.

## Root Cause  
The fundamental issue stems from Raft's leader-centric design:

```go
type RaftState int

const (
    Follower RaftState = iota
    Candidate
    Leader
)
```

During elections:

  1. No leader exists as nodes transition between Candidate and Leader states
  2. Followers reject client requests immediately
  3. Former leaders that lost quorum step down and refuse requests
  4. Majority quorum isn’t achieved during vote-splitting scenarios

Election timeouts compound this: they are typically 150-300 ms, but can stretch further during network issues.

## Why This Happens in Real Systems

Three systemic realities create these scenarios:

- **Node Failures**: crashed leaders force elections
- **Network Partitions**: isolated nodes trigger unnecessary elections
- **Scaling Events**: adding or removing nodes changes quorum calculations

Additionally:

- Election timeouts trade availability for liveness guarantees
- Split votes temporarily paralyze the cluster until a retry succeeds
- Clock drift extends unstable periods
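The quorum arithmetic behind the scaling point above is simple majority math; a small sketch:

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting members.
// Note that a 4-node cluster still needs 3 votes, so it tolerates no
// more failures than a 3-node cluster: even-sized clusters add cost
// without adding fault tolerance.
func quorum(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("cluster=%d quorum=%d tolerates=%d\n", n, quorum(n), n-quorum(n))
	}
}
```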

## Real-World Impact

The practical consequences include:

- **Temporary Unavailability**: requests fail during the election window
- **Increased Latency**: client retries compound during election storms
- **Data Staleness**: read-your-writes consistency cannot be guaranteed
- **Cascading Failures**: high client retry volume overloads nodes

```
# Example client error from etcd (Raft implementation)
Error: rpc error: code = Unavailable desc = no leader
```

## Example Code

Here’s a real-world handling pattern:

```go
func (n *Node) HandleClientRequest(req Request) (Response, error) {
    // Reject immediately if this node is not the leader
    if n.state != Leader {
        return nil, errors.New("not leader")
    }

    // Append to the Raft log only as leader; the entry is applied
    // once a majority of followers acknowledge it
    res, err := n.appendLog(req)
    return res, err
}

// Client retry logic (exponential backoff)
func retryRequest(req Request) (Response, error) {
    for attempt := 0; attempt < maxRetries; attempt++ {
        res, err := sendToLeader(req)
        if err == nil {
            return res, nil
        }
        time.Sleep(exponentialBackoff(attempt))
    }
    return nil, errors.New("request failed after retries")
}
```

Critical components:

  1. Immediate error return from non-leaders
  2. Client-side backoff logic
  3. Leader discovery hooks in error responses

## How Senior Engineers Fix It

Strategies to mitigate impact:

- **Graceful Leadership Transfer**: `raft.LeaderTransfer(targetID)` proactively hands off leadership before shutdown
- **Client Redirection**: include the probable leader address in error responses (e.g. `CurrentLeader: 10.5.0.3`)
- **Pre-Vote Phase**: prevent rejoining, disrupted nodes from triggering needless elections
- **Tunable Timeouts**: adjust election timers based on network RTT
- **Idempotency Tokens**: allow safe client retries without data duplication
- **Health Checks**: use application-layer checks to filter unhealthy nodes
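The idempotency-token strategy is often implemented as a server-side dedup table keyed by a client-supplied request ID; a minimal sketch (types and names are illustrative, and a production version would also bound the table's size):

```go
package main

import "fmt"

// DedupTable caches replies by request ID so a retried request
// (e.g. resent after a leader change) replays the original result
// instead of being applied twice.
type DedupTable struct {
	seen map[string]string // request ID -> cached reply
}

func NewDedupTable() *DedupTable {
	return &DedupTable{seen: make(map[string]string)}
}

// Apply runs op only the first time a given request ID is seen.
func (d *DedupTable) Apply(reqID string, op func() string) string {
	if reply, ok := d.seen[reqID]; ok {
		return reply // duplicate retry: replay the cached reply
	}
	reply := op()
	d.seen[reqID] = reply
	return reply
}

func main() {
	d := NewDedupTable()
	calls := 0
	op := func() string { calls++; return "ok" }

	fmt.Println(d.Apply("req-1", op)) // first attempt executes op
	fmt.Println(d.Apply("req-1", op)) // retry replays the cached reply
	fmt.Println(calls)                // op ran only once
}
```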

## Why Juniors Miss It

Common oversight patterns:

- **Assuming always-on leadership**: "Why would there ever be no leader?"
- **Underestimating election frequency**: not testing network-partition scenarios
- **Lacking retry logic**: treating temporary errors as permanent failures
- **Ignoring implementation nuances**: using raw Raft instead of production-ready libraries (like etcd's Raft)
- **Misunderstanding quorum**: assuming the cluster stays operational during any node loss

Ironically, attempting to circumvent leader checks (“just write to followers!”) violates consensus guarantees and introduces data corruption risks.

**Key Insight**: Election gaps aren’t bugs—they’re safety mechanisms. Robust systems design expects and mitigates them.
