# Raft Consensus: What Happens to Client Requests During Leadership Elections?
## Summary
In Raft consensus, client requests can only be processed by an elected leader. During leadership elections—triggered by leader failure or network partitions—the cluster cannot service write requests. Requests arriving at this time are **neither processed nor persisted**, causing **client-facing errors** and requiring **explicit retry logic**. Raft provides mechanisms to redirect clients to the new leader once elected, but temporary unavailability is inherent to the protocol.
## Root Cause
The fundamental issue stems from Raft's leader-centric design:
```go
type RaftState int

const (
	Follower RaftState = iota
	Candidate
	Leader
)
```

During elections:

- No leader exists while nodes transition between `Candidate` and `Leader` states
- Followers reject client requests immediately
- Former leaders that lost quorum step down and refuse requests
- A majority quorum is not reached during vote-splitting scenarios
Election timeouts compound the problem: they are typically randomized in the 150-300 ms range, but can stretch further during network issues.
## Why This Happens in Real Systems
Three systemic realities create these scenarios:
- Node Failures: Crashed leaders force elections
- Network Partitions: Isolated nodes trigger unnecessary elections
- Scaling Events: Adding/removing nodes disrupts quorum calculations
Additionally:
- Election timeouts trade availability for liveness guarantees
- Split-brain scenarios temporarily paralyze the cluster
- Clock drift extends unstable periods
## Real-World Impact
The practical consequences include:
- Temporary Unavailability: Requests fail during election window
- Increased Latency: Client retries compound during election storms
- Data Staleness: Read-your-writes consistency cannot be guaranteed
- Cascading Failures: High client retry volume overloads nodes
```
# Example client error from etcd (a Raft implementation)
Error: rpc error: code = Unavailable desc = no leader
```
## Example or Code
Here’s a real-world handling pattern:
```go
// HandleClientRequest accepts a client request only when this
// node currently believes it is the leader.
func (n *Node) HandleClientRequest(req Request) (Response, error) {
	// Leadership check: followers and candidates reject immediately.
	if n.state != Leader {
		return nil, errors.New("not leader")
	}
	// Append to the Raft log; the entry is applied once a
	// majority of the cluster has replicated it.
	resp, err := n.appendLog(req)
	return resp, err
}

// retryRequest retries with exponential backoff instead of
// treating a transient "no leader" error as permanent.
func retryRequest(req Request) (Response, error) {
	var err error
	for attempt := 0; attempt < maxRetries; attempt++ {
		var res Response
		res, err = sendToLeader(req)
		if err == nil {
			return res, nil
		}
		time.Sleep(exponentialBackoff(attempt))
	}
	return nil, fmt.Errorf("request failed after %d retries: %w", maxRetries, err)
}
```
Critical components:
- Immediate error return from non-leaders
- Client-side backoff logic
- Leader discovery hooks in error responses
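The leader-discovery hook can be sketched as a custom error type that carries a leader hint, so a client redirects instead of guessing. The `NotLeaderError` type and its field names are illustrative, not from a specific library:

```go
package main

import (
	"errors"
	"fmt"
)

// NotLeaderError rejects a request while telling the client
// where the probable leader is.
type NotLeaderError struct {
	CurrentLeader string // may be empty mid-election
}

func (e *NotLeaderError) Error() string {
	if e.CurrentLeader == "" {
		return "not leader: no leader elected yet"
	}
	return fmt.Sprintf("not leader: try %s", e.CurrentLeader)
}

func main() {
	err := error(&NotLeaderError{CurrentLeader: "10.5.0.3:2379"})

	// The client unwraps the hint and retargets its next attempt.
	var nle *NotLeaderError
	if errors.As(err, &nle) && nle.CurrentLeader != "" {
		fmt.Println("redirecting to", nle.CurrentLeader)
	}
}
```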
## How Senior Engineers Fix It
Strategies to mitigate impact:
- Graceful Leadership Transfer: `raft.LeaderTransfer(targetID)` performs a proactive handoff before a planned shutdown
- Client Redirection: include the probable leader's address in error responses (`CurrentLeader: 10.5.0.3`)
- Pre-Vote Phase: prevent partitioned or restarted nodes from triggering needless elections
- Tunable Timeouts: adjust election timers based on network RTT
- Idempotency Tokens: allow safe client retries without data duplication
- Health Checks: use application-layer checks to filter unhealthy nodes
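The idempotency-token strategy can be sketched as a state machine that remembers which tokens it has already applied, so a retry after an election cannot double-apply a write. All names here are illustrative:

```go
package main

import "fmt"

// Request carries a client-chosen idempotency token.
type Request struct {
	Token string
	Value int
}

// StateMachine caches the result of each token's first
// application, making duplicate deliveries harmless.
type StateMachine struct {
	applied map[string]int // token -> result of first application
	total   int
}

func (sm *StateMachine) Apply(req Request) int {
	if res, ok := sm.applied[req.Token]; ok {
		return res // duplicate: return cached result, do not re-apply
	}
	sm.total += req.Value
	sm.applied[req.Token] = sm.total
	return sm.total
}

func main() {
	sm := &StateMachine{applied: map[string]int{}}
	req := Request{Token: "req-1", Value: 5}
	fmt.Println(sm.Apply(req)) // first apply mutates state
	fmt.Println(sm.Apply(req)) // retry returns the same result, state unchanged
}
```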
## Why Juniors Miss It
Common oversight patterns:
- Assuming always-on leadership: “Why would there ever be no leader?”
- Underestimating election frequency: Not testing network partition scenarios
- Lacking retry logic: Treating temporary errors as permanent failures
- Ignoring implementation nuances: Using raw Raft instead of production-ready libs (like etcd’s Raft)
- Misunderstanding quorum: Assuming cluster stays operational during node loss
Ironically, attempting to circumvent leader checks (“just write to followers!”) violates consensus guarantees and introduces data corruption risks.
**Key Insight**: Election gaps aren't bugs; they're safety mechanisms. Robust system designs expect and mitigate them.