Sharing peripheral between two MCUs

Summary

We encountered a recurring field failure in which temperature readings from a single DS18B20 1-Wire sensor were corrupted or unavailable when the sensor was shared between two NXP MCUs using a manual software arbitration scheme (a UART handshake). The root cause was the lack of hardware arbitration and synchronization on the shared 1-Wire bus. The failure manifested as bus contention, leading to data corruption and sporadic sensor lockups. The incident caused 24 hours of downtime for the thermal monitoring system. The permanent fix was a hardware multiplexer controlled by mutual-exclusion logic, which ensures exclusive bus access.

Root Cause

The primary failure was the naive implementation of a shared peripheral via a software-only handshake between two masters. The system utilized two NXP LPC series MCUs connected to a single DS18B20 temperature sensor. The MCUs communicated via UART to decide “who goes next,” but the physical 1-Wire bus lacked electrical isolation or arbitration.

  • Bus Contention: Because the shared pins were push-pull, whenever one MCU drove the 1-Wire line high while the other drove it low (for example, during overlapping reset or write slots), the two output drivers fought each other. The line settled at an intermediate voltage, causing logic-level interpretation errors, and the overlapping time slots corrupted the data seen by the sensor.
  • Race Conditions: The UART handshake introduced latency. If MCU A sent a “request” but MCU B had already initiated a reset sequence, both would drive the line, leading to a deadlock or “floating bus” state.
  • Lack of Open-Drain Management: The MCUs were configured with push-pull outputs for the shared GPIO. 1-Wire requires open-drain drivers with an external pull-up resistor even with a single master; with two masters on the line, push-pull outputs turn any overlap into a driver-versus-driver short. This configuration was missing.
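
The difference between push-pull and open-drain sharing can be made concrete with a small host-side model. This is a minimal sketch, not firmware: the types drive_t and line_t and the function resolve_line are hypothetical names introduced here, and the model ignores analog effects such as rise time and drive strength.

```c
#include <assert.h>
#include <stddef.h>

/* How a single pin is driving the shared line (hypothetical model type). */
typedef enum { DRIVE_HIZ, DRIVE_LOW, DRIVE_HIGH } drive_t;

/* Resulting state of the line (hypothetical model type). */
typedef enum { LINE_LOW, LINE_HIGH, LINE_CONTENTION } line_t;

/*
 * Resolve the state of one wire shared by n pins, with a pull-up present.
 * - A pin driving high while another drives low is a short: contention.
 * - Otherwise any low driver wins (wired-AND behaviour).
 * - With no low driver, the pull-up (or a high driver) yields high.
 */
line_t resolve_line(const drive_t *pins, size_t n)
{
    assert(pins != NULL || n == 0);

    int any_low = 0;
    int any_high = 0;
    for (size_t i = 0; i < n; i++) {
        if (pins[i] == DRIVE_LOW)  any_low = 1;
        if (pins[i] == DRIVE_HIGH) any_high = 1;
    }

    if (any_low && any_high) {
        return LINE_CONTENTION;
    }
    return any_low ? LINE_LOW : LINE_HIGH;
}
```

In this model, two open-drain masters can only ever produce DRIVE_LOW or DRIVE_HIZ, so LINE_CONTENTION is unreachable; two push-pull masters reach it as soon as their transactions overlap.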

Why This Happens in Real Systems

This specific failure pattern is pervasive in embedded systems engineering, particularly in distributed embedded architectures. It arises from the assumption that software coordination is sufficient to manage physical hardware resources.

  • Resource Constraints: In cost-sensitive products, adding multiplexers (like the 74HC4051) or bus switches (like the TS5A3357) adds BOM cost and board space, leading engineers to attempt software-only solutions.
  • Over-reliance on AI/LLM Code Generation: Searches and code assistants often yield “hallucinated” schematics or non-standard solutions. LLMs frequently generate code for I2C arbitration without accounting for the specific physical layer constraints of protocols like 1-Wire.
  • Protocol Assumptions: I2C has built-in multi-master arbitration (each master monitors SDA on the wired-AND bus and backs off when it sends a 1 but reads back a 0), but 1-Wire and UART do not. Engineers often port strategies valid for I2C to 1-Wire, resulting in failure.
  • Asymmetric Boot Sequences: If one MCU resets faster or enters a bootloader, it may unintentionally lock the shared bus, preventing the other MCU from initializing its peripherals.
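
To illustrate why I2C strategies do not carry over, the following sketch models I2C-style bitwise arbitration on a wired-AND line. It is a simplified model under stated assumptions, not a driver: the function name arbitrate is hypothetical, both masters are assumed to start at the same instant, and clock synchronization is ignored. 1-Wire defines no equivalent mechanism for masters, so nothing like this is available on the shared sensor line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Two masters transmit len bytes, MSB first, on a wired-AND data line.
 * The line carries the AND of both bits. A master that sends 1 but reads
 * back 0 has lost arbitration and must stop driving.
 *
 * Returns 0 if master A wins, 1 if master B wins, and -1 if the two
 * messages are identical (no master loses within len bytes).
 */
int arbitrate(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            int a_bit = (a[i] >> bit) & 1;
            int b_bit = (b[i] >> bit) & 1;
            int line  = a_bit & b_bit;   /* any 0 pulls the line low */

            if (a_bit != line) return 1; /* A sent 1, saw 0: A backs off */
            if (b_bit != line) return 0; /* B sent 1, saw 0: B backs off */
        }
    }
    return -1;
}
```

The winner's message is never corrupted, which is the property a UART handshake layered over a push-pull 1-Wire line cannot provide.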

Real-World Impact

The impact of this design flaw extends beyond simple read errors. In a production environment involving thermal monitoring for NXP-based industrial controllers, the consequences are severe.

  • System Instability: The temperature sensor feeds data to a PID loop controlling a fan array. Corrupted data caused the PID loop to oscillate, leading to overheating and eventual thermal shutdown of the MCUs.
  • Increased Debugging Time: Symptoms were intermittent and non-deterministic, making reproduction difficult. Without a logic analyzer, the UART handshake appeared to work, obscuring the physical layer issue.
  • Hardware Degradation: Continuous bus contention stressed the GPIO drivers of both MCUs, potentially shortening the lifespan of the I/O pins due to excessive current sinking.
  • Missed SLAs: The downtime required a physical power cycle to reset the sensors, violating the 99.9% uptime SLA for the monitoring infrastructure.

Example or Code

The following C code demonstrates the flawed software arbitration used in the original firmware. This logic runs on both MCUs (e.g., NXP LPC800 series). It attempts to negotiate access via UART before touching the GPIO, but it fails to prevent physical bus contention.

#include <stdint.h>
#include "board.h"

// Mock definitions for the scenario
#define SHARED_PIN_PORT 0
#define SHARED_PIN_NUM  12
#define UART_PORT       0

// Shared resource state
volatile uint8_t bus_owner = 0; // 0: None, 1: MCU_A, 2: MCU_B

void delay_us(uint32_t us) {
    // Placeholder for timer delay
}

// Simulated 1-Wire reset pulse
void one_wire_reset() {
    // CRITICAL FLAW: Push-pull drive without checking other master
    Chip_GPIO_SetPinOutLow(LPC_GPIO, SHARED_PIN_PORT, SHARED_PIN_NUM);
    delay_us(480); // Drive low for reset
    Chip_GPIO_SetPinOutHigh(LPC_GPIO, SHARED_PIN_PORT, SHARED_PIN_NUM);
    delay_us(100); // Release and wait for presence pulse
}

// Software arbitration logic
void acquire_temperature_data() {
    uint8_t my_id = 1; // Assume MCU A

    // 1. Send request via UART to other MCU
    Chip_UART_SendByte(LPC_USART, (my_id << 4) | 0x01); // Request bus

    // 2. Wait for grant (naive implementation - no timeout!)
    while (1) {
        // Parentheses are required: == binds tighter than |
        if (Chip_UART_ReadByte(LPC_USART) == ((my_id << 4) | 0x02)) {
            bus_owner = my_id;
            break; 
        }
    }

    // 3. Perform 1-Wire operations
    // PROBLEM: If MCU B is in bootloader or hung, it holds the line, 
    // or if timing aligns, both drive the line.
    one_wire_reset(); 

    // 4. Release bus
    bus_owner = 0;
    Chip_UART_SendByte(LPC_USART, (my_id << 4) | 0x03); // Release bus
}
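
The unbounded wait in step 2 can be replaced with a bounded one. The sketch below is one possible shape, written as a pure polling step so it can be exercised on a host; the names grant_wait_t, wait_result_t, and grant_wait_poll are hypothetical, and a free-running millisecond tick is assumed to exist on the target. A timeout only removes the hang; it does nothing to prevent electrical contention on the 1-Wire line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { WAIT_PENDING, WAIT_GRANTED, WAIT_TIMED_OUT } wait_result_t;

/* State for one grant request (hypothetical helper type). */
typedef struct {
    uint32_t start_ms;    /* tick value when the request was sent */
    uint32_t timeout_ms;  /* maximum time to wait for the grant   */
    uint8_t  expected;    /* grant byte, e.g. (my_id << 4) | 0x02 */
} grant_wait_t;

/*
 * Evaluate one polling step.
 * now_ms    : current tick value
 * have_byte : true if a UART byte was received during this step
 * byte      : the received byte (ignored when have_byte is false)
 *
 * Unsigned subtraction keeps the elapsed-time check correct when the
 * tick counter wraps around.
 */
wait_result_t grant_wait_poll(const grant_wait_t *w, uint32_t now_ms,
                              bool have_byte, uint8_t byte)
{
    assert(w != NULL);

    if ((uint32_t)(now_ms - w->start_ms) >= w->timeout_ms) {
        return WAIT_TIMED_OUT;
    }
    if (have_byte && byte == w->expected) {
        return WAIT_GRANTED;
    }
    return WAIT_PENDING;
}
```

The caller would loop while the result is WAIT_PENDING and skip the 1-Wire transaction entirely on WAIT_TIMED_OUT, rather than touching the bus without a grant.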

How Senior Engineers Fix It

Senior engineers approach this by enforcing hardware abstraction and physical layer isolation. Software arbitration is treated as a fallback, not a primary mechanism.

  • Hardware Multiplexing: The industry-standard solution is a bus switch or multiplexer (e.g., 74HC4051 or TS5A3166). The common 1-Wire line is routed through the switch. A GPIO pin from the “master” MCU controls the enable line. Only the active MCU enables the path to the sensor; the other MCU keeps its pin in high-impedance (input) mode.
  • Tri-State Buffering: If a multiplexer is too costly, senior engineers use tri-state buffers. Each MCU drives a buffer input, and the buffer enable is controlled by the opposing MCU’s signal or a dedicated arbitration logic.
  • Protocol Enforcement: If software arbitration must be used, implement a robust token-passing protocol with strict timeouts and a hardware watchdog. If the token isn’t released within a fixed timeout (X milliseconds), the system forces a reset of the shared bus line via a dedicated “kill” line.
  • Open-Drain Configuration: Ensure all shared GPIOs are configured as Open-Drain (or Wired-AND) with a strong external pull-up resistor (typically 4.7kΩ for 1-Wire). This allows multiple MCUs to pull the line low without fighting each other’s drivers.
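
As an illustration of the open-drain point, here is a sketch of a reset pulse that never drives the line high and that checks for the presence pulse. Everything named here is an assumption for illustration: ow_hal_t is a hypothetical abstraction over GPIO and timer calls, fake_bus_t is a host-side stand-in for the sensor, and the 480/70/410 µs values are commonly used standard-speed timings that should be confirmed against the DS18B20 datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical HAL: on target these would wrap GPIO and timer calls. */
typedef struct {
    void (*drive_low)(void *ctx);              /* output mode, level low */
    void (*release)(void *ctx);                /* input (hi-Z); pull-up raises line */
    bool (*read)(void *ctx);                   /* sample line level */
    void (*delay_us)(void *ctx, uint32_t us);
    void *ctx;
} ow_hal_t;

/* Reset pulse with presence detection; true if a device answered. */
bool ow_reset(const ow_hal_t *hal)
{
    assert(hal != NULL);
    hal->drive_low(hal->ctx);
    hal->delay_us(hal->ctx, 480);        /* reset low time */
    hal->release(hal->ctx);              /* never drive high */
    hal->delay_us(hal->ctx, 70);         /* wait into the presence window */
    bool present = !hal->read(hal->ctx); /* low means a device is pulling */
    hal->delay_us(hal->ctx, 410);        /* let the slot finish */
    return present;
}

/* Host-side fake bus, only so the sketch can be exercised off-target. */
typedef struct {
    bool     sensor_present;
    bool     master_low;
    bool     saw_reset;
    uint32_t low_us;
    uint32_t released_us;
} fake_bus_t;

static void fake_drive_low(void *ctx)
{
    fake_bus_t *b = ctx;
    b->master_low = true;
    b->low_us = 0;
}

static void fake_release(void *ctx)
{
    fake_bus_t *b = ctx;
    b->saw_reset = b->master_low && b->low_us >= 480;
    b->master_low = false;
    b->released_us = 0;
}

static void fake_delay_us(void *ctx, uint32_t us)
{
    fake_bus_t *b = ctx;
    if (b->master_low) b->low_us += us; else b->released_us += us;
}

static bool fake_read(void *ctx)
{
    fake_bus_t *b = ctx;
    if (b->master_low) return false;
    /* Assumed model: sensor pulls low from 30 us to 150 us after release. */
    bool pulse = b->sensor_present && b->saw_reset &&
                 b->released_us >= 30 && b->released_us < 150;
    return !pulse;
}

/* Build a HAL that talks to the fake bus. */
ow_hal_t fake_hal(fake_bus_t *bus)
{
    ow_hal_t hal = { fake_drive_low, fake_release, fake_read, fake_delay_us, bus };
    return hal;
}
```

A false return from ow_reset gives the firmware an early, explicit signal that the sensor is absent or the line is held, instead of silently reading garbage.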

Why Juniors Miss It

Junior engineers often overlook this issue due to a lack of experience with multi-master physical layer realities.

  • Logic-Level Thinking: Juniors often visualize circuits as ideal logic levels (0s and 1s) rather than physical electrical signals. They forget that two push-pull pins driving “High” and “Low” on the same wire at the same time create a short circuit between the two output drivers.
  • Ignoring Latency: In the example code, the UART handshake seems immediate. Juniors often fail to account for interrupt latency or task switching, which can cause the two MCUs to be “out of sync” by milliseconds—enough to cause a collision.
  • Overconfidence in Digital Protocols: There is a misconception that all serial protocols handle multi-master arbitration gracefully. 1-Wire is inherently single-master unless hardware support is added.
  • Simulation vs. Reality: Testing in a simulator (like Proteus) rarely shows bus contention failures because the simulator models ideal logic. The failure only appears on real hardware with capacitance and timing skew.