Supervisor usage advice

Summary

This write-up examines the risks of replacing Supervisor with a custom infinite-loop Bash script that runs Laravel’s schedule:run command every minute. The script looks simple, but it introduces subtle reliability and operational hazards that become significant in real production systems.

Root Cause

The core issue is relying on a hand-rolled process manager instead of a battle‑tested, fault‑tolerant supervisor designed to keep long‑running processes healthy.

Key contributing factors include:

  • No automatic restart on failure: set -e makes the script exit on the first failing command, and nothing brings it back
  • No memory or resource monitoring
  • No logging rotation or structured output handling
  • No protection against runaway processes
  • No built‑in backoff or throttling
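
The restart point is easy to check in isolation. The following is a minimal sketch, not the original script: flaky_task is a hypothetical stand-in for php artisan schedule:run that fails on its third call, and the loop runs in a child bash so the exit does not take the calling shell with it.

```shell
#!/usr/bin/env bash
# Sketch: show that `set -e` ends the loop on the first failure
# instead of restarting it. `flaky_task` is a hypothetical stand-in
# for `php artisan schedule:run`.

run_loop_with_set_e() {
    # Child bash, so the early exit stays contained.
    bash -c '
        set -e
        flaky_task() {
            # Succeeds on calls 1 and 2, fails on call 3.
            [ "$1" -lt 3 ]
        }
        i=0
        while true
        do
            i=$((i + 1))
            echo "iteration $i"
            flaky_task "$i"
            if [ "$i" -ge 5 ]; then
                break
            fi
        done
        echo "loop finished normally"
    '
}

status=0
output=$(run_loop_with_set_e) || status=$?
echo "$output"
echo "exit status: $status"
```

The loop stops at the third iteration with a non-zero exit status and never reaches the "loop finished normally" line. In the real script, that means a single failed schedule:run ends scheduling until someone notices and starts the script again.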

Why This Happens in Real Systems

Engineers often underestimate how fragile long-running shell loops can be. Real systems experience:

  • Transient failures (network hiccups, PHP segfaults, OOM kills)
  • Environment drift (updated PHP binaries, changed paths)
  • Unexpected output that breaks loops or pipes
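  • Orphaned or zombie child processes if the loop backgrounds work or is killed mid-run
  • Timing drift: each cycle lasts the task’s run time plus 60 seconds, so runs slip off the minute boundary and a minute is eventually skipped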

These issues accumulate silently until the scheduler stops running entirely.
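
The drift can be made concrete with a little arithmetic. This is a sketch under simple assumptions: every run takes the same number of seconds, and both function names are hypothetical helpers, not part of Laravel or Bash.

```shell
#!/usr/bin/env bash
# Sketch: quantify the drift of a "run task, then sleep 60" loop.

# drift_after RUNS TASK_SECONDS
# Each cycle lasts TASK_SECONDS + 60, so the loop falls behind the
# wall clock by TASK_SECONDS per iteration.
drift_after() {
    local runs=$1 task_seconds=$2
    echo $(( runs * task_seconds ))
}

# seconds_until_next_minute EPOCH_SECONDS
# How long to sleep so the next run starts on a minute boundary,
# which is what cron gives you for free.
seconds_until_next_minute() {
    local now=$1
    echo $(( 60 - now % 60 ))
}

echo "drift after 60 runs of a 2s task: $(drift_after 60 2)s"
echo "sleep needed at second 125: $(seconds_until_next_minute 125)s"
```

With a 2-second task, 60 iterations put the loop 120 seconds behind, so two one-minute slots were never evaluated. Any task scheduled for exactly those minutes did not run.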

Real-World Impact

Teams relying on DIY loops often encounter:

  • Missed scheduled tasks (backups, billing cycles, cleanup jobs)
  • Silent failures with no alerting
  • High CPU usage if the loop spins unexpectedly
  • Memory leaks from PHP or the shell process
  • Operational confusion during deploys or restarts

In production, these failures can cascade into:

  • Stale caches
  • Unsent emails
  • Failed invoices
  • Data corruption from skipped maintenance tasks
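
One way to turn a silent failure into a visible one is a heartbeat check. The sketch below assumes the scheduler wrapper writes the current epoch time to a file after each successful run; record_heartbeat, heartbeat_is_stale, and the 180-second threshold are all hypothetical and chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: detect a scheduler that has stopped running.

# record_heartbeat FILE
# Intended to be called after each successful scheduler run.
record_heartbeat() {
    date +%s > "$1"
}

# heartbeat_is_stale FILE MAX_AGE_SECONDS [NOW_EPOCH]
# Returns 0 (stale) if FILE is missing or its timestamp is older
# than MAX_AGE_SECONDS; returns 1 (fresh) otherwise.
heartbeat_is_stale() {
    local file=$1 max_age=$2 now=${3:-$(date +%s)}
    [ -r "$file" ] || return 0
    local last
    last=$(cat "$file")
    [ $(( now - last )) -gt "$max_age" ]
}

# Demo with a temporary heartbeat file.
hb=$(mktemp)
record_heartbeat "$hb"
if heartbeat_is_stale "$hb" 180; then
    echo "ALERT: scheduler heartbeat is stale"
else
    echo "scheduler heartbeat is fresh"
fi
rm -f "$hb"
```

The check itself has to run somewhere independent of the loop it watches, for example from a monitoring agent, otherwise it dies along with the scheduler.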

Example or Code

Below is the user-provided loop, shown exactly as executable code:

#!/usr/bin/env bash
set -e
while true
do
    php artisan schedule:run
    sleep 60
done

How Senior Engineers Fix It

Experienced engineers avoid reinventing process management. They use Supervisor, systemd, or Docker restart policies with health checks because, between them, these tools provide:

  • Automatic restarts on crash or exit
  • Configurable backoff strategies
  • Resource limits (memory, CPU)
  • Structured logging
  • Process isolation
  • Graceful shutdown handling
  • Monitoring hooks for alerts

They also:

  • Run schedule:run via cron every minute, which is perfectly valid
  • Or run a queue worker under Supervisor and keep the scheduler under cron

The key is using mature, observable, self-healing infrastructure.
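
As a rough illustration of that split, the sketch below prints a crontab entry for the scheduler and a Supervisor program section for a queue worker. The application path, program name, log locations, and the numeric values are assumptions made for the example; the Supervisor keys and the queue:work flags are standard options, but the right values depend on the application.

```shell
#!/usr/bin/env bash
# Sketch: generate the two pieces of configuration.
# APP_DIR is a hypothetical install path.
APP_DIR=${APP_DIR:-/var/www/app}

# scheduler_cron_line APP_DIR
# Crontab entry that runs the scheduler every minute and keeps its output.
scheduler_cron_line() {
    printf '* * * * * cd %s && php artisan schedule:run >> %s/storage/logs/scheduler.log 2>&1\n' "$1" "$1"
}

# queue_worker_supervisor_conf APP_DIR
# Supervisor program section for a queue worker with restarts and log rotation.
queue_worker_supervisor_conf() {
    cat <<EOF
[program:laravel-worker]
command=php $1/artisan queue:work --sleep=3 --tries=3
directory=$1
autostart=true
autorestart=true
startretries=3
stopwaitsecs=3600
redirect_stderr=true
stdout_logfile=$1/storage/logs/worker.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=5
EOF
}

scheduler_cron_line "$APP_DIR"
queue_worker_supervisor_conf "$APP_DIR"
```

The first line of output would go into the deploying user's crontab and the rest into a file under Supervisor's configuration directory. Cron supplies the minute alignment, while Supervisor supplies the restart and log-rotation behavior the loop lacks.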

Why Juniors Miss It

Junior engineers often:

  • Focus on “it works on my machine” rather than long-term reliability
  • Underestimate how often processes fail in production
  • Assume while true is equivalent to a real process manager
  • Don’t consider logging, monitoring, or restart semantics
  • Haven’t yet experienced the pain of silent failures at scale

They see simplicity; seniors see operational risk.
