Summary
This issue concerns PCIe bus enumeration on a bare-metal system using a Cix Sky1 SoC. Reading the Vendor ID of a PCI-to-PCI bridge with no devices plugged in locks up the CPU. The same read does not lock up on x86 PCIe buses or in QEMU. The goal is to determine whether software must take any steps before reading configuration space to detect whether a device is present.
Root Cause
The root cause most likely lies in the specifics of the PCIe host bridge implementation on the Cix Sky1 SoC. Possible causes include:
- Configuration request stalling: The PCIe spec lets a device that is still initializing answer a configuration request with Configuration Request Retry Status (CRS). Unless CRS Software Visibility is enabled, the root complex reissues the request itself, so the CPU's read can stall for as long as the retries continue.
- Device Tree limitations: The Device Tree provided by the firmware may not contain all the information needed for proper PCIe enumeration.
- Lack of PCIe capabilities: The absence of an overarching root bus, and of PCIe capabilities such as CRS Software Visibility, may contribute to the issue because software then has no standard way to see a retry instead of a stall.
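To make the CRS point concrete, the following is a minimal C sketch of how software could check for CRS Software Visibility and interpret a Vendor ID read once it is enabled. The offsets and bit positions follow the standard PCI Express Capability layout (Root Control at offset 0x1C, Root Capabilities at offset 0x1E). The function names and `enum probe_result` are hypothetical, and whether the Sky1 root port advertises this capability at all is not confirmed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets relative to the start of the PCI Express Capability (ID 0x10). */
#define PCIE_CAP_ROOT_CONTROL   0x1Cu
#define PCIE_CAP_ROOT_CAPS      0x1Eu

#define ROOT_CAPS_CRS_SV        0x0001u /* Root Capabilities bit 0: CRS Software Visibility */
#define ROOT_CONTROL_CRS_SV_EN  0x0010u /* Root Control bit 4: CRS Software Visibility Enable */

/* Hypothetical result type for a single Vendor ID probe. */
enum probe_result { PROBE_ABSENT, PROBE_RETRY, PROBE_PRESENT };

/* True if the root port reports that it can expose CRS to software. */
static bool root_port_has_crs_sv(uint16_t root_caps) {
    return (root_caps & ROOT_CAPS_CRS_SV) != 0;
}

/* Returns the Root Control value with CRS Software Visibility enabled. */
static uint16_t root_control_enable_crs_sv(uint16_t root_control) {
    return (uint16_t)(root_control | ROOT_CONTROL_CRS_SV_EN);
}

/* With CRS Software Visibility enabled, a Vendor ID read that received CRS
 * completes with 0x0001 instead of being retried by the root complex. */
static enum probe_result classify_vendor_id(uint16_t vendor_id, bool crs_sv_enabled) {
    if (vendor_id == 0xFFFFu)
        return PROBE_ABSENT;
    if (crs_sv_enabled && vendor_id == 0x0001u)
        return PROBE_RETRY;
    return PROBE_PRESENT;
}
```

A caller would read Root Capabilities, write back the updated Root Control value if the bit is set, and treat `PROBE_RETRY` as "poll again later, with a timeout" rather than as a device.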
Why This Happens in Real Systems
This issue occurs in real systems due to the complexity of PCIe bus enumeration and the variety of hardware implementations. Some key factors include:
- Hardware-specific quirks: Different SoCs and PCIe host bridges may have unique requirements or behaviors.
- Firmware and software interactions: The interaction between the firmware, Device Tree, and software can affect PCIe enumeration.
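- Lack of standardization: The PCIe spec does not cover every scenario. In particular, how a host bridge reports a failed or timed-out configuration access to the CPU (all-ones data, a bus error, or a stall) varies between implementations, which leads to inconsistencies across systems.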
Real-World Impact
The real-world impact of this issue includes:
- System crashes: The CPU lockup can cause system crashes or freezes.
- Enumeration failures: The inability to properly enumerate PCIe devices can lead to device detection failures and system instability.
- Development challenges: The complexity of debugging and resolving this issue can slow down development and increase costs.
Example or Code
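#include <stdint.h>

// Platform-specific configuration read; on the Sky1 this is the call that locks up.
uint16_t getVendorID(uint8_t bus, uint8_t device, uint8_t function);

void checkDevice(uint8_t bus, uint8_t device) {
    uint8_t function = 0;
    uint16_t vendorID = getVendorID(bus, device, function);
    if (vendorID == 0xFFFF) return; // Device doesn't exist
    // ... device does exist, enumerate it
}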
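The probe above assumes that reading an empty slot returns 0xFFFF. A common precaution in host-bridge drivers is to confirm that the link below a root port is up before issuing any configuration read to its secondary bus. The sketch below shows one way to do that using the standard Link Capabilities and Link Status registers of the PCI Express Capability. It assumes the root port's own registers stay readable while the link is down, which the Sky1 documentation would need to confirm. The names `enum link_state` and `root_port_link_state` are hypothetical.

```c
#include <stdint.h>

/* Offsets relative to the start of the PCI Express Capability (ID 0x10). */
#define PCIE_CAP_LINK_CAPS      0x0Cu      /* 32-bit Link Capabilities */
#define PCIE_CAP_LINK_STATUS    0x12u      /* 16-bit Link Status */

#define LINK_CAPS_DLL_REPORT    (1u << 20) /* Data Link Layer Link Active Reporting Capable */
#define LINK_STATUS_DLL_ACTIVE  (1u << 13) /* Data Link Layer Link Active */

/* Hypothetical tri-state: UNKNOWN means the port cannot report link state
 * through the standard register, so a controller-specific check is needed. */
enum link_state { LINK_DOWN, LINK_UP, LINK_UNKNOWN };

/* Decide from the root port's register values whether the downstream link is up. */
static enum link_state root_port_link_state(uint32_t link_caps, uint16_t link_status) {
    if (!(link_caps & LINK_CAPS_DLL_REPORT))
        return LINK_UNKNOWN;
    return (link_status & LINK_STATUS_DLL_ACTIVE) ? LINK_UP : LINK_DOWN;
}
```

In this scheme the enumerator would call `getVendorID` for the secondary bus only when the result is `LINK_UP`, and would treat `LINK_DOWN` as "no device" without touching the bus.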
How Senior Engineers Fix It
Senior engineers can fix this issue by:
- Consulting hardware documentation: Thoroughly reviewing the SoC and PCIe host bridge documentation to understand specific requirements.
- Analyzing firmware and software interactions: Examining the firmware, Device Tree, and software interactions to identify potential issues.
- Implementing workarounds or patches: Developing workarounds or patches to address hardware-specific quirks or limitations.
- Using alternative enumeration methods: Exploring alternative sources of platform description, such as ACPI tables instead of Device Tree, where the firmware provides them.
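One workaround of this kind is a timing guard. The PCIe spec requires software to wait at least 100 ms after a conventional reset (or, for ports faster than 5 GT/s, after link training completes) before sending a configuration request to the device below a port. The sketch below is a minimal illustration of that check. The function name and the millisecond timestamps are assumptions; on real hardware they would come from a platform timer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimum wait before the first configuration request, per the PCIe spec. */
#define PCIE_POST_RESET_DELAY_MS 100u

/* reference_ms: time at which reset ended (or link training completed).
 * now_ms: current time from the same monotonic clock.
 * Returns true once the mandatory delay has elapsed. */
static bool config_request_allowed(uint64_t reference_ms, uint64_t now_ms) {
    if (now_ms < reference_ms)
        return false; /* clock went backwards or reference is in the future */
    return (now_ms - reference_ms) >= PCIE_POST_RESET_DELAY_MS;
}
```

The delay alone does not make a probe safe on a port with no device attached, so it complements a link-state check rather than replacing one.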
Why Juniors Miss It
Junior engineers may miss this issue due to:
- Lack of experience with PCIe bus enumeration: Inadequate understanding of the complexities involved in PCIe bus enumeration.
- Insufficient knowledge of hardware-specific quirks: Limited familiarity with the specific SoC and PCIe host bridge being used.
- Overreliance on standard documentation: Relying too heavily on standard documentation, such as the PCIe spec, without considering hardware-specific requirements.
- Inadequate testing and debugging: Failing to thoroughly test and debug the system, leading to overlooked issues.