Summary
Multiple threads writing to a shared non-blocking TCP socket is not inherently thread-safe. Although each send() is a single system call, the kernel does not guarantee that messages from different threads reach the stream as contiguous units: in non-blocking mode a send() may write only part of a buffer, and uncoordinated retries from different threads can interleave their bytes. Non-blocking mode does not change the threading semantics—it only affects blocking behavior. For reliable, atomic message delivery, synchronization must be implemented by the application, not relied upon from the kernel.
Root Cause
The kernel ensures that each individual send() system call is atomic from the caller's perspective: the bytes accepted by a single call are placed into the socket's send buffer contiguously, not interleaved with bytes from other threads' calls. However, per-call atomicity does not serialize a logical message that spans multiple calls—once a write is partial and must be retried, retries from different threads can interleave.
- Partial writes: In non-blocking mode, send() may return fewer bytes than requested, or fail with EAGAIN/EWOULDBLOCK. If threads retry the unsent remainder without coordination, their data can interleave.
- Buffer overlap: If threads pass overlapping memory regions (e.g., a shared buffer with offsets), the kernel's per-syscall atomicity does not protect against application-level races on the buffer contents.
- Kernel queue behavior: The TCP stack’s write buffer is shared; concurrent writes are placed into the buffer, but ordering is not guaranteed across threads unless synchronized.
Why This Happens in Real Systems
In production systems, this issue arises because:
- Performance optimization: Engineers often share sockets across threads for throughput, assuming the kernel handles concurrency. However, kernel guarantees are limited to syscall atomicity, not cross-thread serialization.
- Non-blocking semantics: Non-blocking sockets increase throughput but expose partial writes, requiring careful retry logic. Without locks or atomic operations, data from multiple threads can interleave.
- TCP stream nature: TCP is a byte stream; the kernel does not insert message boundaries. If multiple threads write to the same stream without synchronization, the application sees a combined stream, not isolated messages.
Real-World Impact
- Data corruption: Interleaved data can break application-level protocols (e.g., JSON, protobuf) if messages are concatenated incorrectly.
- Debugging complexity: Symptoms appear as sporadic parsing errors or corrupted logs, making root-cause analysis difficult.
- Performance degradation: Over-synchronization (e.g., locking every send) can reduce concurrency benefits, while under-synchronization leads to retransmissions or retries.
- Resource exhaustion: On high-load systems, uncoordinated writes may cause buffer overflows, leading to increased latency or connection resets.
Example or Code
Below is a minimal example showing two threads writing to a non-blocking TCP socket without synchronization. This illustrates potential interleaving and partial writes.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 8080
#define BUF_SIZE 1024
int sockfd;
void *writer(void *arg) {
char *msg = (char *)arg;
int len = strlen(msg);
int sent = 0;
while (sent < len) {
ssize_t n = send(sockfd, msg + sent, len - sent, 0);
if (n > 0) {
sent += n;
} else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
usleep(1000); // crude sleep-and-retry; real code should wait for writability with poll/epoll
} else {
perror("send failed");
break;
}
}
return NULL;
}
int main() {
// Setup non-blocking socket (simplified)
sockfd = socket(AF_INET, SOCK_STREAM, 0);
fcntl(sockfd, F_SETFL, fcntl(sockfd, F_GETFL, 0) | O_NONBLOCK);
// Assume connection is established; in real code, handle connect with non-blocking
struct sockaddr_in addr = {0};
addr.sin_family = AF_INET;
addr.sin_port = htons(PORT);
inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
connect(sockfd, (struct sockaddr*)&addr, sizeof(addr));
// Two threads writing different messages
pthread_t t1, t2;
char msg1[] = "Hello from thread 1\n";
char msg2[] = "Hello from thread 2\n";
pthread_create(&t1, NULL, writer, msg1);
pthread_create(&t2, NULL, writer, msg2);
pthread_join(t1, NULL);
pthread_join(t2, NULL);
close(sockfd);
return 0;
}
In this code, if both threads write simultaneously, the byte stream received by the server may interleave (e.g., “Hello from thre” + “Hello from thre” + “ad 1\n” + “ad 2\n”) depending on timing and partial writes. This demonstrates that kernel serialization of whole messages is not guaranteed.
How Senior Engineers Fix It
Senior engineers address this by applying explicit synchronization to ensure atomic writes and ordered data:
- Mutex-guarded writes: Use a mutex or semaphore to guard the socket or a shared send buffer. Each thread acquires the lock, composes the full message, and loops on send() until every byte is written, without releasing the lock mid-message.
- Thread-specific buffers: Design each thread to write to its own buffer, then use a single dedicated writer thread to drain buffers to the socket. This decouples concurrency from I/O.
- Connection per thread: Instead of sharing a socket, create one socket per thread (if architecture allows). This avoids contention entirely and leverages TCP’s own flow control.
- Higher-level abstractions: Use libraries like libevent or Boost.Asio with io_uring for managed concurrency, or message queues (e.g., ZeroMQ) to handle serialization.
- Non-blocking with poll/epoll: For high-performance servers, use an event loop (epoll) with a single thread managing multiple sockets, avoiding multi-threaded writes to the same socket.
Key takeaway: Never assume kernel serialization—design for explicit coordination in the application layer.
Why Juniors Miss It
Juniors often overlook these nuances due to:
- Misinterpretation of “atomic”: Confusing per-syscall atomicity with cross-thread safety. They may think calling send() alone prevents interleaving.
- Testing in low-load scenarios: In development, with low traffic, partial writes or interleaving rarely manifest, giving a false sense of security.
- Over-reliance on non-blocking mode: Assuming non-blocking sockets have built-in concurrency controls, while they only change blocking behavior.
- Lack of deep OS knowledge: Not understanding that TCP streams are unbounded and kernel buffers don’t provide message boundaries.
- Premature optimization: Sharing sockets for performance without implementing locks, leading to race conditions in production.
Key takeaway: Always verify threading guarantees with load testing and consult POSIX/Linux man pages (e.g., send(2)) for precise semantics.