Listener lifetime management with epoll

Summary

This write-up examines a subtle but common lifetime-management failure when using epoll with a pointer stored in the event data (epoll_event.data.ptr). The core issue is that the file descriptor and the dynamically allocated object tied to it can fall out of sync, leading to dangling pointers, double frees, or memory leaks unless a single, explicit ownership model governs both.

Root Cause

The root cause is the lack of a single, authoritative owner for the dynamically allocated object associated with the file descriptor. When cleanup paths diverge, the system can:

  • Delete the object but forget to close the file descriptor
  • Close the file descriptor but forget to delete the object
  • Close the descriptor in multiple places, so the second close() either fails with EBADF or, worse, closes an unrelated descriptor that has since reused the same number
  • Leave the pointer in the epoll instance after the object is freed

The underlying problem is ambiguous ownership and unclear lifecycle boundaries.

Why This Happens in Real Systems

Real systems often evolve organically, and engineers add cleanup logic in multiple places. This leads to:

  • Multiple code paths that can close the same fd
  • Event-driven complexity, where callbacks run at unpredictable times
  • Implicit assumptions about who “owns” the pointer
  • Asynchronous teardown, where events already returned by epoll_wait may still refer to a connection that was torn down earlier in the same batch

When the system grows, these assumptions break.
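
The asynchronous-teardown case can be handled by splitting teardown into two phases: deregister and mark the object immediately, but free it only after the current batch of events has been processed. The following is a minimal sketch under assumptions, not a definitive implementation: Conn, Reaper, request_close, and poll_once are hypothetical names introduced for illustration, and the code assumes Linux and a single-threaded event loop.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical connection type for this sketch.
struct Conn {
    int fd;
    bool closing;  // set once teardown has been requested
    explicit Conn(int fd) : fd(fd), closing(false) {}
};

// Collects teardown requests while a batch of events is being processed.
struct Reaper {
    std::vector<Conn*> pending;

    // Phase 1: deregister and mark, but do not free yet. Safe to call twice.
    void request_close(int ep, Conn* c) {
        if (c->closing) return;
        c->closing = true;
        epoll_ctl(ep, EPOLL_CTL_DEL, c->fd, nullptr);
        pending.push_back(c);
    }

    // Phase 2: free everything once no event in the batch can reference it.
    std::size_t flush() {
        std::size_t n = pending.size();
        for (Conn* c : pending) {
            close(c->fd);
            delete c;
        }
        pending.clear();
        return n;
    }
};

// Handles one batch. `handle` may request teardown of any connection,
// including ones whose events appear later in the same batch.
template <typename Handler>
int poll_once(int ep, Reaper& reaper, Handler&& handle, int timeout_ms) {
    epoll_event events[64];
    int n = epoll_wait(ep, events, 64, timeout_ms);
    for (int i = 0; i < n; ++i) {
        auto* c = static_cast<Conn*>(events[i].data.ptr);
        assert(c != nullptr);      // every registration carries a Conn*
        if (c->closing) continue;  // stale event: object still alive, just skipped
        handle(*c, events[i].events);
    }
    reaper.flush();  // nothing in this batch can reference the objects any more
    return n;
}
```

Because the object outlives the batch, a stale event finds a live Conn with closing set instead of freed memory.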

Real-World Impact

When fd/object lifetimes diverge, the system may experience:

  • Memory leaks (object never deleted)
  • Use-after-free (epoll still holds a pointer to freed memory)
  • Double close() or double delete
  • Spurious events on stale descriptors
  • Hard-to-reproduce crashes under load

These failures often appear only in production, under concurrency and high churn.
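
The double close() case is easy to underestimate because the second call does not always fail harmlessly. Descriptor numbers are reused, so a stale close() can hit a descriptor that belongs to someone else. The sketch below demonstrates this in a single thread; stale_close_hits_new_fd is a hypothetical helper written for this illustration, and it relies on dup() returning the lowest-numbered free descriptor.

```cpp
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Returns true if a stale second close() closed an unrelated descriptor
// that happened to reuse the same number. Single-threaded demo only.
bool stale_close_hits_new_fd() {
    int p[2];
    if (pipe(p) == -1) return false;

    int stale = p[0];
    int rc = close(stale);   // cleanup path 1: the legitimate close
    assert(rc == 0);
    (void)rc;

    int fresh = dup(p[1]);   // unrelated descriptor opened afterwards
    if (fresh != stale) {    // number was not reused; nothing to demonstrate
        if (fresh >= 0) close(fresh);
        close(p[1]);
        return false;
    }

    close(stale);            // cleanup path 2: the "double close" hits `fresh`

    // `fresh` is now invalid even though its owner never closed it.
    bool victim_closed = (write(fresh, "x", 1) == -1 && errno == EBADF);
    close(p[1]);
    return victim_closed;
}
```

In a multi-threaded server the reused number may belong to another client's socket, which is why this class of bug tends to surface only under load.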

Example

Below is a minimal example of a safe teardown pattern using RAII‑style ownership:

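#include <sys/epoll.h>
#include <unistd.h>
#include <cerrno>

struct Connection {
    int fd;
    // ...
    explicit Connection(int fd) : fd(fd) {}
    // Non-copyable: a copy would close the same fd a second time.
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
    ~Connection() {
        close(fd);
    }
};

// Takes ownership of fd. Returns false if registration fails.
bool register_fd(int ep, int fd) {
    auto* conn = new Connection(fd);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.ptr = conn;  // epoll stores the pointer; it does not own it
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) == -1) {
        delete conn;     // registration failed: destructor closes fd
        return false;
    }
    return true;
}

void handle_event(int ep, epoll_event& ev) {
    auto* conn = static_cast<Connection*>(ev.data.ptr);
    char buf[4096];
    ssize_t n = read(conn->fd, buf, sizeof(buf));
    // Tear down on EOF or on a real error; EAGAIN and EINTR mean "try later".
    if (n == 0 || (n == -1 && errno != EAGAIN && errno != EINTR)) {
        // Deregister first so epoll stops referencing conn, then destroy it.
        epoll_ctl(ep, EPOLL_CTL_DEL, conn->fd, nullptr);
        delete conn; // fd closed in destructor
    }
}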

This ensures one object owns the fd, and closing the fd is always tied to deleting the object.
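
The single-owner rule can also be pushed into the type system so that accidental copies fail to compile. What follows is a minimal sketch, assuming C++14; UniqueFd is a hypothetical helper introduced here, not a standard library type, and the Connection shown is an illustrative variant that holds one.

```cpp
#include <unistd.h>
#include <cassert>
#include <type_traits>
#include <utility>

// Move-only owner of a file descriptor (hypothetical helper).
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) noexcept : fd_(fd) {}
    UniqueFd(UniqueFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) {
            reset();
            fd_ = std::exchange(other.fd_, -1);
        }
        return *this;
    }
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;
    ~UniqueFd() { reset(); }

    int get() const noexcept { return fd_; }

    // Closes the descriptor at most once; later calls are no-ops.
    void reset() noexcept {
        if (fd_ >= 0) {
            close(fd_);
            fd_ = -1;
        }
    }

private:
    int fd_;
};

// A connection that holds a UniqueFd needs no destructor of its own and
// becomes non-copyable automatically.
struct Connection {
    UniqueFd fd;
    explicit Connection(UniqueFd owned) : fd(std::move(owned)) {
        assert(fd.get() >= 0 && "Connection requires an open descriptor");
    }
};

static_assert(!std::is_copy_constructible<Connection>::value,
              "copying a Connection would duplicate fd ownership");
```

Moving a UniqueFd leaves the source holding -1, so exactly one object is ever responsible for calling close().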

How Senior Engineers Fix It

Experienced engineers enforce strict ownership rules:

  • One object owns the fd, and that object is responsible for closing it
  • epoll never owns memory; it only stores a pointer
  • Deletion always implies fd closure, never the other way around
  • EPOLL_CTL_DEL is always called before deletion
  • No raw new/delete — use smart pointers or RAII wrappers
  • Centralized teardown logic, never scattered cleanup paths

A common best practice is:

“The object owns the fd. epoll only references the object.”
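
One way to combine these rules is a table that owns every connection through std::unique_ptr and exposes a single teardown function. This is a sketch under assumptions rather than a definitive design: ConnectionTable and its add/remove methods are hypothetical names, and the code assumes C++14, Linux, and a single-threaded event loop.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <utility>

struct Connection {
    int fd;
    explicit Connection(int fd) : fd(fd) {}
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
    ~Connection() { close(fd); }  // deletion always implies fd closure
};

// Hypothetical owner of all connections registered with one epoll instance.
class ConnectionTable {
public:
    explicit ConnectionTable(int ep) : ep_(ep) {}

    // Takes ownership of fd. Returns a non-owning pointer, or nullptr if
    // registration fails (in which case fd has been closed).
    Connection* add(int fd) {
        assert(fd >= 0);
        auto conn = std::make_unique<Connection>(fd);
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.ptr = conn.get();  // epoll only references the object
        if (epoll_ctl(ep_, EPOLL_CTL_ADD, fd, &ev) == -1) {
            return nullptr;        // conn goes out of scope here and closes fd
        }
        Connection* raw = conn.get();
        conns_.emplace(fd, std::move(conn));
        return raw;
    }

    // The only teardown path: deregister first, then destroy the object.
    // Returns false if fd is not (or no longer) in the table.
    bool remove(int fd) {
        auto it = conns_.find(fd);
        if (it == conns_.end()) return false;
        epoll_ctl(ep_, EPOLL_CTL_DEL, fd, nullptr);
        conns_.erase(it);          // unique_ptr deletes Connection, closing fd
        return true;
    }

    std::size_t size() const { return conns_.size(); }

private:
    int ep_;
    std::unordered_map<int, std::unique_ptr<Connection>> conns_;
};
```

Because remove() looks the descriptor up before acting, a second teardown request for the same connection is a no-op instead of a double close or double delete.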

Why Juniors Miss It

Juniors often miss this because:

  • They assume epoll “manages” the pointer
  • They treat fd and object lifetimes as independent
  • They clean up in multiple places without a unified model
  • They underestimate how often teardown paths race in event-driven systems
  • They rely on raw pointers instead of RAII

The mistake is subtle, but the consequences are severe. Senior engineers learn to treat lifetime management as a first-class design problem, not an afterthought.
