Handling Zero‑Size Resize Crashes in Vulkan GLFW X11 Applications

Summary

A graphics application using Vulkan, GLFW, and the X11 windowing system became progressively unstable during window resizing. The failures began as intermittent failed_create_swapchain errors and escalated to an X11 BadValue error, at which point the application process was terminated (Xlib's default error handler exits the process on a protocol error). The root cause is a failure to handle the zero-extent state (minimization or rapid resizing), compounded by a resource leak involving the old-swapchain mechanism.

Root Cause

The failure is driven by three interlocking issues:

  • Zero-Dimension Extents: During rapid resizing or minimization, the window dimensions can momentarily become 0x0. The Vulkan specification requires both the width and height of a swapchain's imageExtent to be non-zero, so passing a zero extent is invalid usage: the driver may return an error, and nothing guarantees it will fail cleanly.
  • Improper Old Swapchain Management: The code attempts to use swapchainBuilder.set_old_swapchain(swapChain). This is a valid optimization, but Vulkan retires the swapchain passed as oldSwapchain even if creation of the new one fails. The retired swapchain can no longer be presented to, yet it still has to be destroyed by the application. Code that does not account for this either destroys the previous swapchain prematurely or carries a “zombie” handle into the next attempt.
  • Resource Leak/Race Condition: The error log shows Failed to create swapchain followed by the destruction of the existing swapchain, so after a failed attempt the application may be left holding an invalidated handle. The later X11 BadValue error indicates that a request reached the X server (likely via GLFW or the Vulkan surface extension) with an out-of-range parameter, most plausibly an invalid size or a handle already invalidated by a previous failed attempt.

Why This Happens in Real Systems

In production environments, window-system events and GPU-side state change asynchronously, so the application never observes a single, consistent snapshot of the window.

  • Event Buffering: The X server and window managers (for example KWin on a KDE/X11 desktop) buffer resize events. The application might receive a “resize to 0” event that the user never intended, but the software must handle it.
  • Driver State Machines: Graphics drivers maintain complex internal states. If an application fails to clean up a VkSwapchainKHR correctly before attempting to create a new one with the “old swapchain” flag, the driver’s internal state machine can become desynchronized.
  • Window Manager Latency: There is a temporal gap between the window being resized and the surface capabilities being updated. If the application queries capabilities and immediately acts on them without checking for zero-values, it hits a race condition.
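The capability race described above can be made concrete with a small sketch. It is not the application's actual code: Extent2D and SurfaceCaps are stand-ins for VkExtent2D and the extent fields of VkSurfaceCapabilitiesKHR so the example compiles without the Vulkan headers, and chooseExtent is a hypothetical helper name. The idea is to treat "no drawable area" as an ordinary result rather than as an error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-ins for VkExtent2D and the extent fields of VkSurfaceCapabilitiesKHR.
struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

struct SurfaceCaps {
    Extent2D currentExtent;
    Extent2D minImageExtent;
    Extent2D maxImageExtent;
};

// Vulkan reports 0xFFFFFFFF in currentExtent when the surface size is
// determined by the swapchain rather than by the window system.
constexpr std::uint32_t kUndefinedExtent = 0xFFFFFFFFu;

// Hypothetical helper: returns the extent to request, or std::nullopt when
// the surface currently has no drawable area and creation should be skipped.
std::optional<Extent2D> chooseExtent(const SurfaceCaps& caps, Extent2D framebuffer) {
    // std::clamp requires lo <= hi; well-formed capabilities satisfy this.
    assert(caps.minImageExtent.width <= caps.maxImageExtent.width);
    assert(caps.minImageExtent.height <= caps.maxImageExtent.height);

    Extent2D chosen = caps.currentExtent;
    if (chosen.width == kUndefinedExtent) {
        // The surface lets us pick: clamp the framebuffer size into range.
        chosen.width = std::clamp(framebuffer.width,
                                  caps.minImageExtent.width,
                                  caps.maxImageExtent.width);
        chosen.height = std::clamp(framebuffer.height,
                                   caps.minImageExtent.height,
                                   caps.maxImageExtent.height);
    }

    // Zero in either dimension means "minimized or mid-resize": do not create.
    if (chosen.width == 0 || chosen.height == 0) {
        return std::nullopt;
    }
    return chosen;
}
```

A caller would query the capabilities immediately before each creation attempt and simply skip the attempt when the helper returns std::nullopt, retrying on the next resize event.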

Real-World Impact

  • Application Crashes: Instead of a smooth resize, the end user sees the application window vanish as the process is terminated.
  • System Instability: In extreme cases involving X11, invalid opcodes or BadValue errors can lead to ghost windows or hanging window managers, forcing a user to restart their entire desktop session.
  • Degraded UX: Intermittent failures during resizing make the software feel “unpolished” and unreliable.

Example Code

// The flawed logic in the provided snippet
void VulkanWindow::createSwapChain(uint32_t &width, uint32_t &height) {
    // ... (fetching capabilities)

    // BUG: maxImageExtent is used without a zero check; if it is 0x0
    // (minimization), this proceeds to invalid creation
    width = capabilities.maxImageExtent.width;
    height = capabilities.maxImageExtent.height;

    // BUG: Passing an old swapchain that might already be invalid/destroyed
    if (swapChain != VK_NULL_HANDLE) {
        swapchainBuilder.set_old_swapchain(swapChain);
    }

    auto swapchainResult = swapchainBuilder.build();

    if (!swapchainResult.has_value()) {
        // BUG: Destroying the old swapchain here without ensuring the
        // new one actually exists can leave the application without any valid swapchain handle.
        if (swapChain) vkDestroySwapchainKHR(nri.getDevice(), swapChain, nullptr);
        // ...
    }
}

How Senior Engineers Fix It

A senior engineer implements defensive programming and state validation:

  • Zero-Extent Guard: Immediately check if width or height is zero. If so, the application should pause rendering and wait for the next frame/resize event rather than attempting to create a swapchain.
  • Atomic Swapchain Re-creation: Ensure the old swapchain is only destroyed after the new swapchain is successfully created and validated.
  • Strict Capability Validation: Instead of blindly using maxImageExtent, ensure the requested dimensions fall within the minImageExtent and maxImageExtent provided by the hardware.
  • Error Recovery Path: If swapchain creation fails, implement a fallback that clears all associated image views and framebuffers before attempting a retry, rather than just trying to reuse the broken state.
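The guard and the re-creation ordering from the list above might be sketched as follows. This is a minimal model under stated assumptions, not the application's code: Handle stands in for VkSwapchainKHR (with 0 playing the role of VK_NULL_HANDLE), and SwapchainSlot, CreateFn, and DestroyFn are hypothetical names wrapping whatever actually creates and destroys the swapchain. One point the sketch has to take a position on: since Vulkan retires oldSwapchain even when creation fails, the failure branch drops the old handle instead of keeping it for reuse.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Stand-in for VkSwapchainKHR; 0 plays the role of VK_NULL_HANDLE.
using Handle = std::uint64_t;
constexpr Handle kNullHandle = 0;

struct Extent2D {
    std::uint32_t width;
    std::uint32_t height;
};

// Hypothetical callbacks wrapping the real create/destroy calls.
// The create callback receives the extent and the old handle and
// returns std::nullopt when creation fails.
using CreateFn = std::function<std::optional<Handle>(Extent2D, Handle)>;
using DestroyFn = std::function<void(Handle)>;

enum class RecreateResult { Created, SkippedZeroExtent, Failed };

class SwapchainSlot {
public:
    SwapchainSlot(CreateFn create, DestroyFn destroy)
        : create_(std::move(create)), destroy_(std::move(destroy)) {}
    SwapchainSlot(const SwapchainSlot&) = delete;
    SwapchainSlot& operator=(const SwapchainSlot&) = delete;
    ~SwapchainSlot() {
        if (current_ != kNullHandle) destroy_(current_);
    }

    Handle handle() const { return current_; }

    RecreateResult recreate(Extent2D extent) {
        // Zero-extent guard: keep the current swapchain untouched and let
        // the caller retry on a later resize event.
        if (extent.width == 0 || extent.height == 0) {
            return RecreateResult::SkippedZeroExtent;
        }

        const Handle old = current_;
        const std::optional<Handle> created = create_(extent, old);

        // Publish the new handle (or null on failure) before destroying the
        // old one, so the slot never holds a destroyed handle. Real code
        // would also wait until the GPU has finished using the old images.
        current_ = created.value_or(kNullHandle);
        if (old != kNullHandle) destroy_(old);

        return created ? RecreateResult::Created : RecreateResult::Failed;
    }

private:
    CreateFn create_;
    DestroyFn destroy_;
    Handle current_ = kNullHandle;
};
```

After a Failed result the slot holds the null handle, so the next attempt starts from a clean state instead of passing a stale handle back as the old swapchain. Dependent image views and framebuffers would be released by the caller at the same point.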

Why Juniors Miss It

  • Happy Path Bias: Juniors often write code assuming the window will always have a valid, positive size. They test by dragging the window corner, not by minimizing it or resizing it at extreme speeds.
  • API Misunderstanding: They see set_old_swapchain as a “magic” optimization provided by the API and don’t realize that it increases the complexity of the error-handling state machine.
  • Ignoring Edge Cases: They treat VK_ERROR_OUT_OF_DATE_KHR or failed_create_swapchain as fatal errors rather than expected lifecycle events that require specific cleanup protocols.
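Treating these results as lifecycle events rather than fatal errors could look like the sketch below. The enumerators are stand-ins for the corresponding VkResult values (VK_SUCCESS, VK_SUBOPTIMAL_KHR, VK_ERROR_OUT_OF_DATE_KHR, VK_ERROR_SURFACE_LOST_KHR, VK_ERROR_DEVICE_LOST) so the example builds without the Vulkan headers. The classify function and the NextStep policy are illustrative assumptions, one reasonable mapping rather than a prescribed one.

```cpp
#include <cassert>

// Stand-ins for the VkResult values an acquire or present call can return.
enum class PresentResult {
    Success,      // VK_SUCCESS
    Suboptimal,   // VK_SUBOPTIMAL_KHR
    OutOfDate,    // VK_ERROR_OUT_OF_DATE_KHR
    SurfaceLost,  // VK_ERROR_SURFACE_LOST_KHR
    DeviceLost    // VK_ERROR_DEVICE_LOST
};

// What the render loop should do next (hypothetical policy).
enum class NextStep { Continue, RecreateSwapchain, RecreateSurface, Abort };

constexpr NextStep classify(PresentResult result) {
    switch (result) {
        case PresentResult::Success:
            return NextStep::Continue;
        case PresentResult::Suboptimal:
            // The image was still usable; rebuild at a convenient point.
            return NextStep::RecreateSwapchain;
        case PresentResult::OutOfDate:
            // Expected after a resize: rebuild before rendering again.
            return NextStep::RecreateSwapchain;
        case PresentResult::SurfaceLost:
            // The surface itself is gone; the swapchain must go with it.
            return NextStep::RecreateSurface;
        case PresentResult::DeviceLost:
            return NextStep::Abort;
    }
    return NextStep::Abort;  // Unknown values are treated as fatal.
}
```

Only the last branch ends the program; every other result feeds back into the normal resize path.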
