Why ARM Cortex-M RTOS Context Switches Fail and How to Fix Them

Summary

A custom RTOS context switch implementation on an ARM Cortex-M (STM32) failed to execute task switching, resulting in a HardFault or system hang. The failure stemmed from incorrect manual stack frame initialization and improper usage of ARM Inline Assembly within a PendSV_Handler. While the logic for swapping pointers was present, the state of the CPU registers and the alignment of the stack pointer did not meet the strict hardware requirements for an exception return.

Root Cause

The failure was caused by three primary technical oversights:

  • Incomplete Stack Frame Initialization: When manually creating a stack for a new task, the developer failed to account for the Hardware-Saved Stack Frame. On Cortex-M, when an exception occurs, the hardware automatically pushes xPSR, PC, LR, R12, R3, R2, R1, and R0 onto the stack. The code only initialized xPSR and PC, leaving the rest of the frame containing garbage data.
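  • Stack Pointer Misalignment: The ARM procedure call standard (AAPCS) requires the stack pointer to be 8-byte aligned at public interfaces, and a Cortex-M exception frame is expected to sit on an 8-byte boundary. The manual decrementing of the stack pointer (sp_Blink_1--) moves it one 4-byte word at a time (assuming a uint32_t pointer), so it preserves the word alignment that LDMIA and STMDB need but does not guarantee 8-byte alignment. A task restored from such a frame starts with a misaligned stack, and compiled code that assumes 8-byte alignment can then misbehave.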
  • Inline Assembly Constraint Violations: The __ASM block used the "m" constraint for complex pointer arithmetic and multi-step register loading. An "m" operand lets the compiler choose the addressing mode and any scratch registers it needs, so the generated code did not reliably map the C variables to the registers the hand-written context switch expected.
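
The alignment point can be checked with plain pointer arithmetic on a host machine. The following is a minimal sketch, not the original code: the helper names is_8_byte_aligned, align_down_8, and demo_alignment and the demo_stack array are hypothetical, and the aligned attribute assumes GCC or Clang. It shows that pushing one word keeps 4-byte alignment but loses 8-byte alignment, and that rounding down restores it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the original code. */
int is_8_byte_aligned(const void *p) {
    return ((uintptr_t)p & 7u) == 0;
}

uint32_t *align_down_8(uint32_t *p) {
    /* Clear the low three address bits to reach the previous 8-byte boundary. */
    return (uint32_t *)((uintptr_t)p & ~(uintptr_t)7u);
}

/* 64 words = 256 bytes, so both the base and the end are 8-byte aligned. */
uint32_t demo_stack[64] __attribute__((aligned(8)));

void demo_alignment(void) {
    uint32_t *sp = &demo_stack[64];   /* initial top of a full-descending stack */
    assert(is_8_byte_aligned(sp));

    sp--;                             /* one 4-byte word: now only 4-byte aligned */
    assert(!is_8_byte_aligned(sp));

    sp = align_down_8(sp);            /* skip one more word to realign */
    assert(sp == &demo_stack[62]);
}
```

Rounding down costs at most one padding word, which is usually cheaper than tracking whether the number of pushed words was even.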

Why This Happens in Real Systems

In embedded systems, this type of failure is common when moving from Bare-Metal programming to Real-Time Operating Systems (RTOS).

  • Implicit Hardware Behavior: Developers often treat the CPU as a simple execution engine, forgetting that the hardware performs invisible “magic” (like auto-stacking) during interrupts.
  • Compiler Optimization Aggression: Modern compilers (like GCC or ARMCC) do not guarantee that a variable stays in a specific memory location or register unless explicitly told. In an inline assembly block, if the Clobber List is not perfectly defined, the compiler might overwrite a register that the assembly code is currently using to store a vital pointer.
  • Memory Corruption Silently Cascading: A stack misalignment might not crash the system immediately. It may work for several cycles until a specific instruction (like a floating-point operation or a 64-bit load) requires strict alignment, causing a delayed HardFault that is difficult to trace back to the original context switch.

Real-World Impact

  • System Non-Determinism: The most dangerous impact is a system that works 99% of the time but crashes under specific timing conditions.
  • HardFault Loops: The processor enters a loop of exception handling where it cannot recover, effectively bricking the device until a hardware reset.
  • Heisenbugs: Because the bug is tied to the exact state of the stack and register file, adding printf or debugger breakpoints changes the timing and memory layout, making the bug “disappear” during investigation.
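
Because intrusive debugging perturbs this kind of failure, a passive check can be easier to trust. One option is stack watermarking: fill each task stack with a known pattern before the task first runs, then later count how many words still hold the pattern. The sketch below is an assumption-laden illustration; STACK_FILL_PATTERN, stack_paint, and stack_unused_words are hypothetical names, and the stack is assumed to be full-descending, as on Cortex-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu   /* hypothetical marker value */

/* Fill a task stack with the marker. Call this before the task first runs,
   never on a stack that is currently in use. */
void stack_paint(uint32_t *stack_base, size_t words) {
    assert(stack_base != NULL);
    for (size_t i = 0; i < words; i++) {
        stack_base[i] = STACK_FILL_PATTERN;
    }
}

/* Count marker words starting at the lowest address. The stack grows
   downward, so the low end is the last part to be overwritten. */
size_t stack_unused_words(const uint32_t *stack_base, size_t words) {
    assert(stack_base != NULL);
    size_t unused = 0;
    while (unused < words && stack_base[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

A result of zero suggests the stack was used to its limit and may have overflowed. The count is only a heuristic, since a task could legitimately store the marker value or skip over words without writing them.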

Example or Code

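/* Correcting the Stack Initialization for Cortex-M */
#include <stdint.h>

/* stack_top: one word past the highest address of the task's stack array.
   Returns the initial stack pointer (PSP) to store for the task. */
uint32_t *Initialize_Task_Stack(uint32_t *stack_top, void (*task_func)(void)) {
    // Round down to 8 bytes, then reserve 16 words: 8 hardware-saved + 8 software-saved
    uint32_t *sp = (uint32_t *)((uintptr_t)stack_top & ~(uintptr_t)7u) - 16;

    // 1. Hardware-saved registers (this frame starts at sp[8], which is 8-byte aligned)
    sp[15] = 0x01000000u;             // xPSR (Thumb bit must be set!)
    sp[14] = (uint32_t)(uintptr_t)task_func & ~1u; // PC (bit 0 cleared; Thumb state comes from xPSR)
    sp[13] = 0xFFFFFFFDu;             // LR the task starts with; only used if task_func returns, which it must not
    sp[12] = 0;                       // R12
    sp[11] = 0;                       // R3
    sp[10] = 0;                       // R2
    sp[9]  = 0;                       // R1
    sp[8]  = 0;                       // R0

    // 2. Software-saved registers (R4-R11)
    // These are the registers we manually push/pop in PendSV.
    for (int i = 0; i < 8; i++) {
        sp[i] = 0;                    // R4..R11
    }

    // The new SP points to the start of this block.
    return sp;
}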

How Senior Engineers Fix It

  • Isolate Assembly: Senior engineers rarely use complex inline assembly (__ASM volatile). Instead, they write context switch logic in a separate .s (Assembly) file. This prevents the C compiler from interfering with register allocations and allows for much cleaner debugging.
  • Strict Alignment Enforcement: They use __attribute__((aligned(8))) on all stack arrays and size them in multiples of 8 bytes, so that both the base and the initial top of every stack meet the alignment requirement at the definition level.
  • Verification via Trace: Instead of stepping through code manually (which is nearly impossible during a context switch), they use instruction trace (ETM), instrumentation trace (ITM), or Stack Watermarking to observe how the stack pointer moves over time.
  • Formal Stack Frame Modeling: They create a struct that mirrors the hardware stack frame to ensure that the offsets for R0, PC, etc., are always mathematically correct and readable.
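
The stack frame modeling idea can be sketched as follows. This is an illustration under stated assumptions, not the original author's code: the type and function names (hw_frame_t, task_frame_t, task_frame_init) are hypothetical, the layout assumes the basic 8-word frame without FPU state, and the PendSV handler is assumed to save R4-R11 directly below the hardware frame. Addresses are passed as plain uint32_t values so the sketch also compiles on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers stacked by hardware on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} hw_frame_t;

/* Full saved context: software-saved R4-R11 sit below the hardware frame. */
typedef struct {
    uint32_t r4_r11[8];
    hw_frame_t hw;
} task_frame_t;

/* Let the compiler verify the offsets instead of trusting magic indices. */
_Static_assert(sizeof(hw_frame_t) == 32, "hardware frame is 8 words");
_Static_assert(offsetof(hw_frame_t, pc) == 24, "PC is the 7th word");
_Static_assert(offsetof(hw_frame_t, xpsr) == 28, "xPSR is the 8th word");
_Static_assert(sizeof(task_frame_t) == 64, "full context is 16 words");

/* Build an initial context just below stack_top and return it; its address
   is the initial PSP. entry_addr and return_addr are hypothetical parameters:
   the task entry point and the address reached if the task ever returns. */
task_frame_t *task_frame_init(uint32_t *stack_top,
                              uint32_t entry_addr,
                              uint32_t return_addr) {
    uintptr_t top = (uintptr_t)stack_top & ~(uintptr_t)7u;  /* 8-byte align */
    task_frame_t *f = (task_frame_t *)(top - sizeof(task_frame_t));

    memset(f, 0, sizeof *f);          /* R0-R3, R12, R4-R11 start at zero */
    f->hw.pc   = entry_addr & ~1u;    /* bit 0 cleared defensively */
    f->hw.lr   = return_addr;
    f->hw.xpsr = 0x01000000u;         /* Thumb bit set */
    return f;
}
```

With named fields, a mistake such as writing PC into the LR slot becomes visible in review, and the static assertions fail the build if padding or a reordered member shifts an offset.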

Why Juniors Miss It

  • Focus on Logic, Not Hardware: Juniors focus on “Does my pointer swap work?” whereas senior engineers ask “Does the hardware state match the pointer?”
  • Misunderstanding the Stack: There is a common misconception that the stack is just a piece of memory; they miss the fact that the CPU architecture dictates the layout of that memory during exceptions.
  • The “Black Box” Trap: Juniors often treat the PendSV_Handler as a black box, assuming that if the code looks like the textbook example, it must work. They fail to realize that compiler-generated prologue/epilogue code can invalidate their manual assembly assumptions.
