Supervisor usage advice

Summary

This write-up examines the risks of replacing Supervisor with a custom infinite-loop Bash script that runs Laravel’s schedule:run command every minute. The script looks simple, but it introduces reliability and operational hazards that become significant in production.

Root Cause

The core issue is relying on a hand-rolled process manager instead of a battle‑tested, fault‑tolerant supervisor designed to keep long‑running processes healthy.

Key contributing factors include:

No automatic restart on failure: set -e makes the script exit on the first non-zero status, and nothing starts it again
No memory or resource monitoring
No logging rotation or structured output handling
No protection against runaway processes
No built‑in backoff or throttling
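To make "backoff" concrete, here is a minimal sketch of a capped exponential delay between restarts. It is only an illustration of the idea: the function name backoff_delay and its defaults are assumptions, and it is not how Supervisor or systemd compute their delays.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds to wait before restart number $1.
# $2 = base delay in seconds (default 1), $3 = upper cap (default 60)
backoff_delay() {
    local attempt=$1 base=${2:-1} cap=${3:-60}
    local delay=$base i=1
    # Double the delay once per previous attempt, stopping at the cap.
    while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
        delay=$(( delay * 2 ))
        i=$(( i + 1 ))
    done
    if [ "$delay" -gt "$cap" ]; then
        delay=$cap
    fi
    echo "$delay"
}

for attempt in 1 2 3 4 5 6 7 8; do
    echo "restart $attempt: wait $(backoff_delay "$attempt")s"
done
# waits printed: 1s 2s 4s 8s 16s 32s 60s 60s
```

The bare loop has nothing like this: it either exits for good or, if error handling is removed, retries at a fixed pace regardless of how often the command fails.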

Why This Happens in Real Systems

Engineers often underestimate how fragile long-running shell loops can be. Real systems experience:

Transient failures (network hiccups, PHP segfaults, OOM kills)
Environment drift (updated PHP binaries, changed paths)
Unexpected output that breaks loops or pipes
Zombie processes when child processes aren’t reaped
Timing drift: each iteration lasts the command’s run time plus the 60-second sleep, so runs slide off the minute boundary and eventually skip a minute

These issues accumulate silently until the scheduler stops running entirely.
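The timing problem can be checked with plain arithmetic. The sketch below assumes schedule:run takes a constant 5 seconds (an illustrative figure, not a measurement) and lists the wall-clock minutes in which no run starts. Since the scheduler decides what is due based on the current time, a task scheduled for a skipped minute would most likely not run.

```bash
#!/usr/bin/env bash
# Print every wall-clock minute in which no run would start.
# $1 = assumed run time of schedule:run in seconds
# $2 = number of loop iterations to simulate
skipped_minutes() {
    local run_seconds=$1 iterations=$2
    local period=$(( run_seconds + 60 ))   # run time plus the fixed sleep
    local k=1 prev_minute=0 minute m
    while [ "$k" -lt "$iterations" ]; do
        # Iteration k starts k * period seconds after the first run.
        minute=$(( k * period / 60 ))
        # Any minute strictly between two consecutive runs was never visited.
        m=$(( prev_minute + 1 ))
        while [ "$m" -lt "$minute" ]; do
            echo "$m"
            m=$(( m + 1 ))
        done
        prev_minute=$minute
        k=$(( k + 1 ))
    done
}

skipped_minutes 5 30
# prints 12 and 25: roughly one missed minute every 13 minutes
```

Cron does not have this problem because it fires on the minute boundary regardless of how long the previous run took.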

Real-World Impact

Teams relying on DIY loops often encounter:

Missed scheduled tasks (backups, billing cycles, cleanup jobs)
Silent failures with no alerting
High CPU usage if the loop spins unexpectedly
Unbounded memory use by a scheduled PHP task, since nothing enforces a limit
Operational confusion during deploys or restarts

In production, these failures can cascade into:

Stale caches
Unsent emails
Failed invoices
Data corruption from skipped maintenance tasks

Example or Code

Below is the user-provided loop, reproduced as written:

#!/usr/bin/env bash
set -e
while true
do
    php artisan schedule:run
    sleep 60
done
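
The effect of set -e is easy to demonstrate. The sketch below keeps the shape of the loop but makes two changes so that it finishes immediately: php artisan schedule:run is replaced by a hypothetical stand-in, fake_schedule_run, that fails on its third call, and the sleep is dropped with the loop capped at five iterations.

```bash
#!/usr/bin/env bash
# Same structure as the loop above, with a stand-in command and no sleep.
loop_script='
set -e
# Hypothetical stand-in for `php artisan schedule:run`: fails on call 3.
fake_schedule_run() { [ "$1" -ne 3 ]; }
for i in 1 2 3 4 5; do
    echo "iteration $i"
    fake_schedule_run "$i"
done
echo "loop finished"
'

# Run it in a child shell so its exit does not take this script down too.
status=0
output=$(bash -c "$loop_script") || status=$?

echo "$output"
echo "exit status: $status"
# prints iteration 1, 2 and 3, then "exit status: 1"
```

Iterations 4 and 5 never happen and "loop finished" is never printed. In the real script, one failing schedule:run ends scheduling until someone notices and restarts it by hand.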

How Senior Engineers Fix It

Experienced engineers avoid reinventing process management. They use Supervisor, systemd, or Docker restart policies because these tools provide, to varying degrees:

Automatic restarts on crash or exit
Configurable backoff strategies
Resource limits (memory, CPU)
Structured logging
Process isolation
Graceful shutdown handling
Monitoring hooks for alerts
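
As an illustration of several of these properties, the sketch below prints a Supervisor program stanza for a long-running Laravel process, here a queue worker. The helper name render_worker_conf, the program name, the paths, the user, and the numeric values are all placeholders chosen for the example. The keys themselves are standard Supervisor settings.

```bash
#!/usr/bin/env bash
# Print a Supervisor program stanza for a Laravel queue worker.
# $1 = application directory, $2 = Unix user to run as (both placeholders)
render_worker_conf() {
    local app_dir=$1 run_user=$2
    cat <<EOF
[program:laravel-worker]
command=php $app_dir/artisan queue:work --tries=3
directory=$app_dir
user=$run_user
; start with supervisord and restart whenever the process exits
autostart=true
autorestart=true
startretries=3
; allow up to 60 seconds for a graceful stop before killing the process
stopwaitsecs=60
; capture output in a size-limited, rotated log file
redirect_stderr=true
stdout_logfile=$app_dir/storage/logs/worker.log
stdout_logfile_maxbytes=10MB
stdout_logfile_backups=5
EOF
}

render_worker_conf /var/www/app www-data
```

The output would typically be saved in Supervisor's configuration directory (often /etc/supervisor/conf.d/ on Debian-based systems, though this varies) and loaded with supervisorctl reread followed by supervisorctl update.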

They also:

Run schedule:run via cron every minute, which is the setup the Laravel documentation describes
Run long-lived processes such as queue workers under Supervisor, and keep the scheduler under cron

The key is using mature, observable, self-healing infrastructure.
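
For the cron route, a single crontab line is enough. The sketch below is one way to add it without creating duplicates: a small filter that reads an existing crontab on stdin and prints it with exactly one scheduler entry. The function name with_scheduler_entry, the application path, and the backup script in the demo are hypothetical.

```bash
#!/usr/bin/env bash
# Read an existing crontab on stdin and print it with exactly one
# scheduler entry for the given application directory.
# $1 = application directory (placeholder)
with_scheduler_entry() {
    local app_dir=$1
    local entry="* * * * * cd $app_dir && php artisan schedule:run >> /dev/null 2>&1"
    # Drop any previous copy of the entry, then append it once.
    # grep exits non-zero when it prints nothing, hence the "|| true".
    grep -vxF -e "$entry" || true
    echo "$entry"
}

# Demo with a made-up existing crontab containing one unrelated job.
printf '0 3 * * * /usr/local/bin/backup.sh\n' | with_scheduler_entry /var/www/app

# To apply it to the current user's crontab (not executed here):
#   crontab -l 2>/dev/null | with_scheduler_entry /var/www/app | crontab -
```

Because the filter removes an identical line before appending it, running it again from a deploy script leaves the crontab unchanged.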

Why Juniors Miss It

Junior engineers often:

Focus on “it works on my machine” rather than long-term reliability
Underestimate how often processes fail in production
Assume while true is equivalent to a real process manager
Don’t consider logging, monitoring, or restart semantics
Haven’t yet experienced the pain of silent failures at scale

They see simplicity; seniors see operational risk.