Stack Corruption in 16-Bit Bootloaders: pusha/popa and BIOS Interrupts

Summary

During the development of a 16-bit bootloader, a critical bug was identified: the print function failed to output the intended string and instead printed a single character or truncated output, depending on the state of the stack. The issue was traced to stack corruption involving pusha and popa around the BIOS interrupt int 0x10 in Real Mode. This postmortem explores why “correct-looking” assembly code fails in low-level environments.

Root Cause

The root cause is a mismatch between the stack state that popa expects and the stack state it actually finds after the BIOS interrupt int 0x10 has run, so registers are restored from the wrong words.

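The pusha/popa Trap: The developer used pusha to save all general-purpose registers before the loop and popa to restore them after it. popa reads eight words back from ss:sp, so it restores correct values only if sp and those 16 bytes are exactly as pusha left them.
Stack Pointer Shift: The mov sp, 0x7C00 instruction sets the stack to grow downwards from the bootloader start address, but only if ss is also set (typically to 0). sp is just an offset; with an uninitialized ss, the stack sits wherever the BIOS happened to leave it.
Interrupt Side Effects: int 0x10 (teletype output when ah = 0x0E) pushes FLAGS, CS, and IP onto the caller’s stack, and the BIOS handler then runs on that same stack. The call also takes inputs in registers beyond al: bh (page number) and, in graphics modes, bl (color).
The “S” Symptom: When popa executes, if the stack pointer (sp) no longer matches its value after pusha, or if the saved words have been overwritten, the registers are restored with garbage data. Specifically, the si register (which holds the string pointer) is loaded with a value that points to a memory location containing only the character ‘s’ or to an invalid address, causing the loop to terminate prematurely.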
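
To make the popa dependency concrete, the following sketch (NASM syntax, written for illustration and not taken from the original bootloader) walks through the 16-byte frame that pusha leaves on the stack. It assumes ss:sp already points at valid, writable memory.

    pusha               ; SP -= 16; pushes AX, CX, DX, BX, original SP, BP, SI, DI
    mov bp, sp          ; BP-based addressing defaults to SS, so [bp+n] reads the frame
    mov ax, [bp+2]      ; AX = saved SI (saved DI is at [bp+0])
    mov bx, [bp+14]     ; BX = saved AX (pushed first, so it sits at the highest address)
    popa                ; Pops DI, SI, BP, discards the saved SP word, then BX, DX, CX, AX

popa restores from wherever ss:sp points at that moment. If sp is off by even one word, every register receives its neighbor’s value, so si would be loaded with the saved di or bp instead of the string pointer.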

Why This Happens in Real Systems

In modern operating systems, the stack is managed by a robust kernel with guard pages and strict abstractions. In Real Mode (16-bit), you are operating in a “wild west” environment:

No Memory Protection: Any instruction can overwrite any memory address, including the stack.
BIOS Variability: BIOS interrupt handlers differ between vendors and are not guaranteed to be “transparent.” They run on the caller’s stack, so a stack placed too close to the code or data segments can overwrite them or be overwritten. Some BIOSes are also reported to clobber registers; for example, Ralf Brown’s Interrupt List notes that bp can be destroyed when teletype output scrolls the screen.
Segment/Offset Confusion: In 16-bit mode, memory is addressed via segment:offset, and the physical address is segment * 16 + offset. If ds is never initialized, or is modified during an interrupt, the lodsb instruction (which reads from ds:si) will fetch data from the wrong physical memory location.
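
The segment:offset arithmetic can be sketched as follows. This is a minimal illustration in NASM syntax, assuming the BIOS loaded the boot sector at physical address 0x7C00; the labels start and msg and the string contents are hypothetical.

    bits 16
    org 0x7C00              ; Assumption: labels are assembled as offsets from 0x7C00

start:
    xor ax, ax
    mov ds, ax              ; DS = 0x0000, so DS:SI resolves to 0x0000 * 16 + SI
    cld                     ; Clear the direction flag so lodsb steps SI forward
    mov si, msg             ; msg assembles to 0x7C00 + its position in the sector
    lodsb                   ; AL = byte at physical address 0x0000 * 16 + msg
.hang:
    hlt
    jmp .hang

msg: db "s", 0              ; Hypothetical string for illustration

The equivalent convention is org 0 with ds = 0x07C0, since 0x07C0 * 16 = 0x7C00. Mixing the two is a common way to read the wrong bytes: with org 0x7C00 and ds left at 0x07C0, a label at offset 0x7C10 would resolve to 0x07C0 * 16 + 0x7C10 = 0xF810.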

Real-World Impact

Silent Failures: The code does not crash with a “Segmentation Fault”; it simply produces incorrect output, making it incredibly difficult to debug without a hardware debugger or QEMU logs.
Unpredictable Bootstrapping: A bootloader that prints the wrong characters might appear to work on one machine but fail on another due to different BIOS implementations, leading to non-portable software.
System Instability: If the stack pointer is corrupted, the next ret instruction pops a garbage return address and jumps to it, leading to a complete system hang or a reboot loop.

Example or Code

The faulty logic involves saving the entire state when only specific registers are needed, creating a dependency on the stack integrity.

print:
    pusha           ; Save ALL registers (AX, CX, DX, BX, SP, BP, SI, DI)
    mov ah, 0x0E    ; BIOS Teletype function
.loop:
    lodsb           ; Loads [ds:si] into AL and increments SI
    test al, al     ; Check for null terminator
    jz .done
    int 0x10        ; Call BIOS
    jmp .loop
.done:
    popa            ; RESTORES registers, but SI might be corrupted if stack shifted
    ret

How Senior Engineers Fix It

Senior engineers keep the saved register set minimal and explicit, in the spirit of the Principle of Least Privilege. Instead of saving everything, save only what the routine actually modifies.

Selective Pushing: Instead of pusha, push only the registers the routine actually modifies, such as ax and si. This reduces the “surface area” for stack corruption, although the routine still depends on a valid stack.
Stack Isolation: Set ss:sp explicitly so the stack occupies a known-free memory region (for example ss = 0 and sp = 0x7C00, which grows downward below the boot sector), and set the data segment (ds) explicitly so that lodsb does not fetch from the wrong segment.
Explicit Register Management: Avoid relying on the side effects of popa. If you need si to remain constant for a caller, manually save and restore it.

Corrected Implementation Pattern:

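print:
    push ax         ; Only save what we actually change
    push bx
    push si
    mov ah, 0x0E    ; BIOS teletype function
    mov bx, 0x0007  ; BH = page 0, BL = color (used only in graphics modes)
.loop:
    lodsb           ; AL = [ds:si], SI advances (assumes the direction flag is clear)
    test al, al     ; Check for null terminator
    jz .done
    int 0x10        ; Call BIOS
    jmp .loop
.done:
    pop si          ; Restore in reverse order of the pushes
    pop bx
    pop ax
    ret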
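
For context, here is a minimal sketch of a complete boot sector built around a print routine in this selective-save style. It is an illustration under stated assumptions, not the original bootloader: the labels start and msg, the message text, and the choice of ss = 0 with sp = 0x7C00 are all assumptions. The routine saves bx because int 0x10 with ah = 0x0E takes the page number in bh.

    bits 16
    org 0x7C00

start:
    cli                 ; Conservative: no interrupts while SS:SP is being changed
    xor ax, ax
    mov ds, ax          ; DS = 0 to match org 0x7C00
    mov ss, ax          ; SS = 0
    mov sp, 0x7C00      ; First push lands at 0x0000:0x7BFE, just below the boot sector
    sti
    cld                 ; lodsb must step SI forward

    mov si, msg
    call print
.hang:
    hlt
    jmp .hang

print:
    push ax
    push bx
    push si
    mov ah, 0x0E        ; BIOS teletype function
    mov bx, 0x0007      ; BH = page 0, BL = color (graphics modes only)
.loop:
    lodsb
    test al, al
    jz .done
    int 0x10
    jmp .loop
.done:
    pop si
    pop bx
    pop ax
    ret

msg: db "stack ok", 13, 10, 0   ; Hypothetical message

    times 510 - ($ - $$) db 0   ; Pad to 510 bytes
    dw 0xAA55                   ; Boot signature

Assuming NASM and QEMU are installed, this could be assembled with nasm -f bin boot.asm -o boot.bin and run with qemu-system-i386 -drive format=raw,file=boot.bin (boot.asm and boot.bin are placeholder file names).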

Why Juniors Miss It

Over-reliance on “Safe” Patterns: Juniors often see pusha/popa as a “magic bullet” to make functions re-entrant or safe, not realizing it creates a heavy dependency on the stack’s integrity.
Abstracted Thinking: Most programmers are used to high-level languages where the stack is an abstraction. They struggle to visualize the physical movement of the Stack Pointer during a BIOS interrupt.
Debugging Tool Gap: Juniors often debug using printf or high-level debuggers. In bootloader development, you must debug using register inspection (e.g., gdb with QEMU), which requires understanding exactly what si and sp hold after every instruction.