Why rapid SMS resend clicks lock users out and how to prevent it

Summary

A developer attempting to verify their account through SMS/mobile validation triggered an automated rate-limiting mechanism by repeatedly clicking the “resend message” button. The clicks were meant to work around a perceived delay (the messages were being caught by the phone’s spam filter), but they resulted in a temporary or permanent blacklist of the user’s phone number/identity within the verification service. Even after whitelisting the sender, the user remained unable to receive codes, which indicates the block is enforced at the application or service-provider layer, not on the device.

Root Cause

The failure is a textbook case of abuse-protection logic (rate limiting of the kind used against automated attacks) being triggered by ordinary human error. The specific triggers were:

  • Rapid-fire requests: Multiple “resend” requests within a short time window violated the service’s anti-enumeration and anti-spam thresholds.
  • Velocity Checks: Security systems treat high-frequency requests from a single identifier (phone number/IP) as a possible automated attack, such as an attempt to intercept or guess a 2FA code.
  • Stateful Blocking: Once the threshold was breached, the system moved the identifier into a “blocked” state in the database or a distributed cache (like Redis) to preserve system resources and limit SMS gateway costs.
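
The stateful block described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: `BlockList` is a hypothetical in-memory stand-in for a cache whose keys expire, and the 900-second TTL is an arbitrary assumption. It shows why whitelisting the sender on the phone changes nothing: the block lives on the server and lifts only when its timer runs out or someone clears it.

```python
import time


class BlockList:
    """Hypothetical in-memory stand-in for a cache with expiring keys."""

    def __init__(self, clock=time.monotonic):
        self._expires_at = {}  # identifier -> time at which the block lifts
        self._clock = clock    # injectable so the demo needs no real waiting

    def block(self, identifier, ttl_seconds):
        self._expires_at[identifier] = self._clock() + ttl_seconds

    def seconds_remaining(self, identifier):
        expires = self._expires_at.get(identifier)
        if expires is None:
            return 0.0
        remaining = expires - self._clock()
        if remaining <= 0:
            del self._expires_at[identifier]  # the block has expired
            return 0.0
        return remaining

    def is_blocked(self, identifier):
        return self.seconds_remaining(identifier) > 0


# Demo with a fake clock (the TTL value is an assumption for illustration)
fake_now = [0.0]
blocks = BlockList(clock=lambda: fake_now[0])
blocks.block("phone:+123456789", ttl_seconds=900)
print(blocks.is_blocked("phone:+123456789"))  # True: block is active

fake_now[0] = 901.0
print(blocks.is_blocked("phone:+123456789"))  # False: block has expired
```

A permanent blacklist is the same structure without the expiry, which is why it needs a manual override path.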

Why This Happens in Real Systems

In large-scale distributed systems, security is rarely about a single firewall; it is about defense in depth. Rate limiting on verification endpoints serves several purposes at once:

  • Cost Protection: SMS providers charge per message. A malicious actor running an SMS pumping attack, or even a confused user hammering the resend button, can run up significant costs.
  • Resource Exhaustion: Every verification attempt consumes compute, database IO, and network bandwidth. Rate limiting ensures one user cannot degrade the experience for everyone else.
  • Security Integrity: Rapid requests are often a precursor to credential stuffing or OTP brute-forcing. Automated systems are tuned to be “pessimistic”: they assume a high-frequency requester is an attacker until proven otherwise.
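
To make the cost argument concrete, here is a small worked calculation. Every figure in it is hypothetical (the per-message price, the request volume, and the number of targeted phone numbers are assumptions chosen only to show the arithmetic), and `billable_messages` is an illustrative helper, not part of any provider's API.

```python
# All figures are hypothetical and exist only to illustrate the arithmetic.
PRICE_PER_SMS_CENTS = 5  # assumed price: 5 cents per message


def billable_messages(requests, distinct_numbers, per_number_limit=None):
    """Messages actually sent, with or without a per-number hourly limit."""
    if per_number_limit is None:
        return requests  # no limit: every request becomes a paid message
    return min(requests, distinct_numbers * per_number_limit)


def cost_cents(messages):
    return messages * PRICE_PER_SMS_CENTS


# A script firing 10,000 resend requests in an hour across 100 numbers
unthrottled = billable_messages(10_000, distinct_numbers=100)
throttled = billable_messages(10_000, distinct_numbers=100, per_number_limit=3)

print(cost_cents(unthrottled) / 100)  # 500.0 dollars without a limit
print(cost_cents(throttled) / 100)    # 15.0 dollars with 3 per number per hour
```

Under these assumptions the limit cuts the bill by a factor of more than thirty, which is why providers accept the occasional locked-out legitimate user as a trade-off.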

Real-World Impact

  • User Friction: Legitimate users are locked out of critical onboarding flows, leading to high churn rates during the most sensitive part of the lifecycle.
  • Operational Overhead: Increased volume of support tickets (as seen in this case) requiring manual intervention from Customer Success or Security teams.
  • Reputational Damage: If the “cool-down” period is opaque or excessively long, users perceive the platform as unreliable or broken.

Example or Code

import time

class RateLimiter:
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.attempts = {}

    def is_blocked(self, user_id):
        now = time.time()
        user_history = self.attempts.get(user_id, [])

        # Filter attempts within the sliding window
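        user_history = [t for t in user_history if now - t < self.window]

        # Blocked once the number of recent attempts reaches the limit
        if len(user_history) >= self.limit:
            return True
        return False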

    def record_attempt(self, user_id):
        if user_id not in self.attempts:
            self.attempts[user_id] = []
        self.attempts[user_id].append(time.time())

# Simulation of the user's mistake
limiter = RateLimiter(limit=3, window=3600) # 3 attempts per hour
user_id = "phone:+123456789"

for i in range(5):
    if limiter.is_blocked(user_id):
        print(f"Attempt {i+1}: Access Denied (Blacklisted)")
    else:
        limiter.record_attempt(user_id)
        print(f"Attempt {i+1}: Code Sent")

How Senior Engineers Fix It

Senior engineers don’t just “unblock” the user; they design resilient verification flows:

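  • Exponential Backoff (Client-Side): Implement UI logic that disables the “resend” button and forces a mandatory wait time (e.g., 1m, 5m, 15m, 1h) that increases with every resend, and show the remaining time to the user. The server should enforce the same schedule, since client-side checks alone are easy to bypass.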
  • Graceful Degradation: Instead of a hard block, introduce secondary verification channels (Email, TOTP, or Authenticator apps) once the SMS threshold is reached.
  • Observability & Tooling: Build internal Admin Dashboards that allow Support Engineers to view the specific reason for a block (e.g., “SMS_RATE_LIMIT_EXCEEDED”) and provide a controlled “manual override” mechanism.
  • Circuit Breakers: Implement system-wide circuit breakers to protect the SMS gateway from being overwhelmed by coordinated attacks.
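
The escalating wait and the explicit block reason can be combined in one server-side check. The sketch below is one possible shape, under stated assumptions: `ResendPolicy`, `COOLDOWNS`, and the response fields are hypothetical names, the schedule mirrors the 1m/5m/15m/1h example, and state is kept in a plain dictionary rather than a shared store. The point is that every response tells the client how long to wait, so the UI can disable the button and show a countdown instead of failing silently.

```python
import math
import time

# Hypothetical schedule in seconds, mirroring the 1m / 5m / 15m / 1h example
COOLDOWNS = [60, 300, 900, 3600]


class ResendPolicy:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._state = {}  # identifier -> (codes sent so far, time of last send)

    def request_code(self, identifier):
        now = self._clock()
        count, last_sent = self._state.get(identifier, (0, None))

        if last_sent is not None:
            # The wait after the Nth send is the Nth entry, capped at the last
            wait = COOLDOWNS[min(count - 1, len(COOLDOWNS) - 1)]
            remaining = last_sent + wait - now
            if remaining > 0:
                return {
                    "sent": False,
                    "reason": "SMS_RATE_LIMIT_EXCEEDED",
                    "retry_after": math.ceil(remaining),
                }

        self._state[identifier] = (count + 1, now)
        next_wait = COOLDOWNS[min(count, len(COOLDOWNS) - 1)]
        return {"sent": True, "reason": None, "retry_after": next_wait}


# Demo with a fake clock so no real waiting is needed
fake_now = [0.0]
policy = ResendPolicy(clock=lambda: fake_now[0])

print(policy.request_code("phone:+123456789"))  # sent, retry_after 60
fake_now[0] = 10.0
print(policy.request_code("phone:+123456789"))  # refused, retry_after 50
fake_now[0] = 60.0
print(policy.request_code("phone:+123456789"))  # sent, retry_after 300
```

Because a refusal carries a machine-readable reason, the same value can be logged and surfaced in a support dashboard, and a client that has been refused can be offered an alternative channel instead of a dead end.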

Why Juniors Miss It

  • Focusing on the Symptom: A junior engineer might focus on the user’s phone settings (spam folder/whitelisting) rather than the server-side state.
  • Ignoring the “Why” of Security: They often view rate limiting as a nuisance to be bypassed rather than a critical security and cost-control feature.
  • Linear Thinking: They assume if an action failed once, retrying immediately is the logical next step, failing to account for stateful, time-dependent security policies.
