Volatile Variable Coherence

Summary

This postmortem analyzes a subtle concurrency anomaly involving a volatile variable in C++ on x86 and ARM architectures. The core question: Can two threads each write a value to a volatile variable and then fail to observe their own writes—resulting in printing neither “1” nor “2”?
Short answer: Yes, this can happen, because volatile does not guarantee atomicity, ordering, or coherence across threads.

Root Cause

The failure arises from misinterpreting what volatile means in C++.
volatile only forces the compiler to perform every access to the variable as written; it suppresses certain optimizations, such as caching the value in a register. It does not provide:

  • Atomicity
  • Visibility guarantees across threads
  • Ordering constraints
  • Cache coherence semantics
  • Happens-before relationships

On both x86 and ARM, threads may:

  • Read stale values
  • Reorder loads/stores (especially on ARM)
  • Observe writes out of order
  • Race on the same memory location without synchronization

Because the racing accesses make the program's behavior undefined, both threads can legally:

  • Write their value
  • Immediately read a stale cached value (e.g., still 0)
  • Skip printing entirely

Why This Happens in Real Systems

Real-world CPUs and compilers aggressively optimize memory operations. Without synchronization primitives, all of the following can occur:

  • Store buffers delay visibility of writes
  • Load speculation reads old values
  • Weak memory ordering (ARM) allows reordering of loads/stores
  • Compiler reordering is allowed because volatile is not a synchronization primitive
  • Data races produce undefined behavior in C++

In short: volatile is not a concurrency tool.

Real-World Impact

Systems relying on volatile for synchronization often experience:

  • Lost updates
  • Missed signals
  • Spurious failures
  • Heisenbugs that disappear under debugging
  • Architecture-dependent behavior (works on x86, breaks on ARM)

These failures are notoriously hard to reproduce and diagnose.

Example

Below is a minimal example showing the problematic pattern:

#include <cstdio>

volatile int i = 0;  // BROKEN: volatile provides no synchronization

void thread1() {
    i = 1;
    if (i == 1) printf("1\n");  // may not execute: the read races with thread2's write
}

void thread2() {
    i = 2;
    if (i == 2) printf("2\n");
}

This code has a data race, making the behavior undefined.
Undefined behavior includes:

  • Only “1” printed
  • Only “2” printed
  • Both printed
  • Neither printed
  • Corrupted values
  • Anything else the machine feels like doing

How Senior Engineers Fix It

Experienced engineers avoid volatile for synchronization entirely. They use:

  • std::atomic
  • Memory order constraints
  • Mutexes or condition variables
  • Fences when absolutely necessary

Correct version:

#include <atomic>
#include <cstdio>

std::atomic<int> i{0};

void thread1() {
    i.store(1, std::memory_order_release);
    if (i.load(std::memory_order_acquire) == 1) printf("1\n");
}

void thread2() {
    i.store(2, std::memory_order_release);
    if (i.load(std::memory_order_acquire) == 2) printf("2\n");
}

This guarantees:

  • Coherence
  • Visibility
  • Ordering
  • Well-defined behavior

Because every atomic object has a single total modification order, the thread whose store lands later in that order must read back its own value, so at least one thread is now guaranteed to print.

Why Juniors Miss It

Junior developers often assume:

  • Volatile means “thread-safe” (it does not)
  • x86’s strong memory model applies everywhere (ARM breaks this assumption)
  • Visibility and ordering are automatic (they are not)
  • Data races are harmless (they are undefined behavior)
  • The compiler won’t reorder things that “look sequential” (it will)

They also tend to test only on x86, where the strong memory model hides many bugs until the code runs on ARM or in production.

