Volatile Variable Coherence

Summary

This postmortem analyzes a subtle concurrency anomaly involving a volatile variable in C++ on x86 and ARM architectures. The core question: Can two threads each write a value to a volatile variable and then fail to observe their own writes—resulting in printing neither “1” nor “2”?
Short answer: Yes, this can happen, because volatile does not guarantee atomicity, ordering, or coherence across threads.

Root Cause

The failure arises from misinterpreting what volatile means in C++.
volatile only forces the compiler to perform every access to the variable as written; it suppresses certain optimizations, such as caching the value in a register. It does not provide:

  • Atomicity
  • Visibility guarantees across threads
  • Ordering constraints
  • Cache coherence semantics
  • Happens-before relationships

On both x86 and ARM, threads may:

  • Read stale values
  • Reorder loads/stores (especially on ARM)
  • Observe writes out of order
  • Race on the same memory location without synchronization

Because the racing accesses make the program's behavior undefined, both threads can legally:

  • Write their value
  • Immediately read a stale cached value (e.g., still 0)
  • Skip printing entirely

Why This Happens in Real Systems

Real-world CPUs and compilers aggressively optimize memory operations. Without synchronization primitives, all of the following can occur:

  • Store buffers delay visibility of writes
  • Load speculation reads old values
  • Weak memory ordering (ARM) allows reordering of loads/stores
  • Compiler reordering is allowed because volatile is not a synchronization primitive
  • Data races produce undefined behavior in C++

In short: volatile is not a concurrency tool.

Real-World Impact

Systems relying on volatile for synchronization often experience:

  • Lost updates
  • Missed signals
  • Spurious failures
  • Heisenbugs that disappear under debugging
  • Architecture-dependent behavior (works on x86, breaks on ARM)

These failures are notoriously hard to reproduce and diagnose.

Example

Below is a minimal example showing the problematic pattern:

#include <cstdio>

volatile int i = 0;  // BROKEN: volatile provides no synchronization

void thread1() {
    i = 1;
    if (i == 1) printf("1\n");  // may not execute: the read races with thread2's write
}

void thread2() {
    i = 2;
    if (i == 2) printf("2\n");
}

This code has a data race, making the behavior undefined.
Undefined behavior includes:

  • Only “1” printed
  • Only “2” printed
  • Both printed
  • Neither printed
  • Corrupted values
  • Anything else the machine feels like doing

How Senior Engineers Fix It

Experienced engineers avoid volatile for synchronization entirely. They use:

  • std::atomic
  • Memory order constraints
  • Mutexes or condition variables
  • Fences when absolutely necessary

Correct version:

#include <atomic>
#include <cstdio>

std::atomic<int> i{0};

void thread1() {
    i.store(1, std::memory_order_release);
    if (i.load(std::memory_order_acquire) == 1) printf("1\n");
}

void thread2() {
    i.store(2, std::memory_order_release);
    if (i.load(std::memory_order_acquire) == 2) printf("2\n");
}

This guarantees:

  • Coherence
  • Visibility
  • Ordering
  • Well-defined behavior

Because every atomic object has a single total modification order, the thread whose store lands later in that order must read back its own value, so at least one thread is now guaranteed to print.

Why Juniors Miss It

Junior developers often assume:

  • Volatile means “thread-safe” (it does not)
  • x86’s strong memory model applies everywhere (ARM breaks this assumption)
  • Visibility and ordering are automatic (they are not)
  • Data races are harmless (they are undefined behavior)
  • The compiler won’t reorder things that “look sequential” (it will)

They also tend to test only on x86, where the strong memory model hides many bugs until the code runs on ARM or in production.

