Summary
The core issue arises from misunderstanding the Provisioning Protocol state machine when using Amazon FreeRTOS on ESP32. The SaveNetworkReq command is intended to persist network credentials for an existing, scanned network identified by a unique index and BSSID. It is not a mechanism for creating a network entry from manually typed details when the network was never discovered by a scan.
Attempting to craft a “manual” SaveNetworkReq by guessing indices or filling fields with zeros leads to protocol validation failure or state machine corruption. This results in either an immediate rejection of the request or a silent failure where the device connects (if the radio stack accepts the credentials directly) but the Provisioning Service fails to update its state or notify the application, causing the mobile app to hang waiting for a response that will never arrive.
Root Cause
The root cause is a mismatch between the Provisioning Protocol expectations and the application’s manual input strategy.
- Strict Validation: The ProvisioningService on the ESP32 validates incoming SaveNetworkReq messages against the currently cached list of scanned networks. It expects the index field to correspond to a valid entry in the wifiScanList.
- Hardcoded Dependencies: The internal logic likely couples the index to a specific bssid and ssid. When you inject a manual SSID with an index that is out of bounds (e.g., index 0 if the scan list was empty) or a BSSID of all zeros, the validation check fails.
- Asynchronous Disconnect: The NetworkTX characteristic is the designated channel for Acknowledgments (ACKs) and Status Updates. If the request fails validation silently (internal error), the NetworkTX write callback is never triggered, leaving the mobile app waiting indefinitely.
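The validation described in this list can be sketched in a few lines of C. This is a minimal illustration, not the library's code: ScanEntry, SaveStatus and validate_save_request are hypothetical names, and the real service may structure its checks differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BSSID_LEN 6

/* Hypothetical cached scan entry; not the library's real type. */
typedef struct {
    char    ssid[33];
    uint8_t bssid[BSSID_LEN];
} ScanEntry;

typedef enum {
    SAVE_OK = 0,
    SAVE_ERR_INDEX,  /* index does not refer to a cached scan result */
    SAVE_ERR_BSSID   /* BSSID differs from the one seen during the scan */
} SaveStatus;

/* The index must fall inside the cached list, and the BSSID in the request
 * must match the entry stored at that index. */
static SaveStatus validate_save_request(const ScanEntry *list, size_t count,
                                        size_t index, const uint8_t *bssid)
{
    assert(bssid != NULL);

    if (list == NULL || index >= count) {
        return SAVE_ERR_INDEX;
    }
    if (memcmp(list[index].bssid, bssid, BSSID_LEN) != 0) {
        return SAVE_ERR_BSSID;
    }
    return SAVE_OK;
}
```

With an empty scan list every index is rejected, and with a populated list an all-zero BSSID fails the comparison. Those are the two failure paths a hand-built request tends to hit.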
Why This Happens in Real Systems
In real-world IoT provisioning, the security model relies on verified discovery.
- BSSID Validation: To prevent connecting to “Evil Twin” access points, the provisioning service often locks the configuration to the specific BSSID (MAC address) scanned earlier.
- List-Based Indexing: Embedded systems often use fixed-size arrays for scan results. Manually injecting a network requires a dynamic push operation to this array. The default Amazon FreeRTOS provisioning logic is typically read-only regarding this list; it expects you to select from it, not modify it.
- Race Conditions: If you send a malformed request that partially triggers the Wi-Fi stack (e.g., wifi_connect API) but fails the provisioning layer, the device connects but falls back to a default “connected” state without the provisioner knowing. This breaks the idempotency of the provisioning flow.
Real-World Impact
- Poor User Experience: The mobile app hangs or displays a “Connection Failed” error after the device has actually successfully connected to the Wi-Fi.
- State Desynchronization: The device is online, but the companion app thinks it is still provisioning. This often leads to the user force-closing the app and restarting the process unnecessarily.
- Edge Case Blindness: This bug specifically affects enterprise or hidden SSID environments, which are high-value deployment scenarios for IoT devices.
- Debugging Difficulty: Because the Wi-Fi link comes up (layers 1 to 3) but the app is never notified (application layer), the symptom looks like a BLE GATT error rather than a flaw in the Wi-Fi provisioning logic.
Example or Code
To support manual SSID entry in Amazon FreeRTOS, you cannot rely on the default SaveNetworkReq used for scanned lists. You must either implement custom handling logic or call the provisioning library’s SaveNetwork function directly with a populated configuration structure, bypassing the index validation.
Here is how the structure should be populated for a manual entry, assuming you are calling the internal API or handling the payload manually:
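/*
 * PROVISIONING_APP_NETWORK_DATA structure example.
 * This is what the payload must map to.
 * Member names below follow this write-up; confirm the exact member names of
 * WIFINetworkProfile_t in the Wi-Fi header of your FreeRTOS release.
 */
#include <string.h>

static const char manualSsid[] = "MyHiddenNetwork";
static const char manualPassword[] = "MySecretPassword";
WIFINetworkProfile_t manualNetworkProfile;

/* 0. Zero the struct so unused bytes and fields are well defined */
memset(&manualNetworkProfile, 0, sizeof(manualNetworkProfile));
/* 1. SSID: Copy the manual string (its length must fit the ssid buffer) */
memcpy(manualNetworkProfile.ssid, manualSsid, strlen(manualSsid));
manualNetworkProfile.ssidLength = strlen(manualSsid);
/* 2. Password: Copy the password (its length must fit the password buffer) */
memcpy(manualNetworkProfile.password, manualPassword, strlen(manualPassword));
manualNetworkProfile.passwordLength = strlen(manualPassword);
/* 3. Security Type: CRITICAL. Must match the actual network security (WPA2, WPA3, etc.) */
manualNetworkProfile.security = eWiFiSecurityWPA2;
/* 4. BSSID: Set to null/00:00:00:00:00:00 if unknown, but the provisioning library
      handling logic must support "Unknown BSSID" mode. */
memset(manualNetworkProfile.bssid, 0, sizeof(manualNetworkProfile.bssid));
/* 5. Index: If calling SaveNetwork directly, you might pass an index,
      but if the library expects a scan list, this is the point of failure. */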
Protocol Level Payload (If manually encoding the GATT characteristic):
If you are manually constructing the byte stream written to the characteristic that carries SaveNetworkReq, the standard structure usually looks like this (simplified):
// Conceptual representation of the byte buffer for SaveNetworkReq
// [Index (1 byte)] [SSID Len (1 byte)] [SSID (N bytes)] [BSSID (6 bytes)] [Sec Type (1 byte)] [Pass Len (1 byte)] [Pass (M bytes)]
// INCORRECT Manual Attempt (Causes the failure):
// Index: 1 (Assuming one item, but list is empty/unknown)
// SSID: "Hidden"
// BSSID: 00:00:00:00:00:00
// Result: Validation Error (Index out of bounds or BSSID mismatch)
// CORRECT Logic (If supported by the specific library version):
// The library must support a "Manual Entry" opcode or a specific Index value (e.g., 0xFF)
// that signals "Parse following fields as raw data, not index lookup".
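To make the conceptual layout above concrete, here is a minimal parser sketch for it. It assumes exactly the simplified field order shown in the comment and a hypothetical sentinel index of 0xFF for manual entry. The actual wire format of your library version may use a different encoding, and SaveNetworkReq and parse_save_network_req are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MANUAL_ENTRY_INDEX 0xFFu /* hypothetical sentinel for "manual entry" */
#define MAX_SSID_LEN       32u
#define MAX_PASS_LEN       64u
#define BSSID_LEN          6u

typedef struct {
    uint8_t index;
    bool    manual;              /* true when index == MANUAL_ENTRY_INDEX */
    uint8_t ssid[MAX_SSID_LEN];
    uint8_t ssid_len;
    uint8_t bssid[BSSID_LEN];
    uint8_t security;
    uint8_t pass[MAX_PASS_LEN];
    uint8_t pass_len;
} SaveNetworkReq;

/* Parses [Index][SSID Len][SSID][BSSID][Sec Type][Pass Len][Pass].
 * Returns false on truncated input, trailing bytes, or over-long fields. */
static bool parse_save_network_req(const uint8_t *buf, size_t len,
                                   SaveNetworkReq *out)
{
    size_t pos = 0;

    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (len < 2u) {
        return false;
    }
    out->index = buf[pos++];
    out->manual = (out->index == MANUAL_ENTRY_INDEX);
    out->ssid_len = buf[pos++];
    if (out->ssid_len == 0u || out->ssid_len > MAX_SSID_LEN) {
        return false;
    }
    /* Remaining bytes must hold SSID, BSSID, security type and pass length. */
    if (len - pos < (size_t)out->ssid_len + BSSID_LEN + 2u) {
        return false;
    }
    memcpy(out->ssid, &buf[pos], out->ssid_len);
    pos += out->ssid_len;
    memcpy(out->bssid, &buf[pos], BSSID_LEN);
    pos += BSSID_LEN;
    out->security = buf[pos++];
    out->pass_len = buf[pos++];
    if (out->pass_len > MAX_PASS_LEN || len - pos != out->pass_len) {
        return false;
    }
    memcpy(out->pass, &buf[pos], out->pass_len);
    return true;
}
```

Every length is checked before any copy, so a malformed request is rejected at the parsing step instead of reaching the Wi-Fi stack half-validated.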
How Senior Engineers Fix It
Senior engineers do not try to trick the provisioner. They modify the firmware behavior or the protocol implementation to explicitly support manual entry.
- Modify the Provisioning Service: Extend the ProvisioningService to handle a specific “Magic Index” (e.g., 0xFE). If the SaveNetworkReq contains this index, the firmware skips the “Lookup in Scan List” step and proceeds directly to “Save to Flash”.
- Direct API Injection: Instead of sending the data over BLE and letting the provisioning task handle it, the application task can receive the manual SSID/Pass via a custom characteristic, and then directly call WIFI_ConnectAP() or the underlying provisioning_save_network() function.
- UUID Routing: Create a separate Custom GATT Characteristic (e.g., ManualConfigUUID). Write the SSID/Pass to this. The firmware parses this, fills the WIFINetworkProfile_t struct, and saves it to slot 0 or the next available index in the NVS (Non-Volatile Storage).
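The “Magic Index” option could look roughly like the sketch below. Every name here (Provisioner, Profile, handle_save, MANUAL_INDEX) is hypothetical, and the assignment to p->saved stands in for whatever flash or NVS write your firmware performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MANUAL_INDEX  0xFEu /* hypothetical magic index from the text */
#define SCAN_CAPACITY 4u
#define SSID_MAX      32u
#define PASS_MAX      64u

typedef struct {
    char    ssid[SSID_MAX + 1u];
    char    pass[PASS_MAX + 1u];
    uint8_t bssid[6];
    bool    bssid_known;
} Profile;

typedef struct {
    Profile scan[SCAN_CAPACITY]; /* cached scan results */
    size_t  scan_count;
    Profile saved;               /* stand-in for the flash/NVS slot */
    bool    has_saved;
} Provisioner;

typedef enum { RESP_SAVED = 0, RESP_BAD_INDEX, RESP_BAD_FIELD } Resp;

static Resp handle_save(Provisioner *p, uint8_t index,
                        const char *ssid, const char *pass)
{
    Profile cand;
    size_t  pass_len;

    assert(p != NULL && ssid != NULL && pass != NULL);
    memset(&cand, 0, sizeof(cand));

    pass_len = strlen(pass);
    if (pass_len > PASS_MAX) {
        return RESP_BAD_FIELD;
    }

    if (index == MANUAL_INDEX) {
        /* Manual path: skip the scan-list lookup, take the SSID as typed. */
        size_t ssid_len = strlen(ssid);
        if (ssid_len == 0u || ssid_len > SSID_MAX) {
            return RESP_BAD_FIELD;
        }
        memcpy(cand.ssid, ssid, ssid_len); /* cand is zeroed, stays terminated */
        cand.bssid_known = false;
    } else {
        /* Normal path: the index must point into the cached scan list. */
        if (index >= p->scan_count) {
            return RESP_BAD_INDEX;
        }
        cand = p->scan[index];
        cand.bssid_known = true;
    }

    memcpy(cand.pass, pass, pass_len);
    cand.pass[pass_len] = '\0';
    p->saved = cand;
    p->has_saved = true;
    return RESP_SAVED;
}
```

The manual branch is explicit and separately validated, so the scanned-network path keeps its strict index check unchanged.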
Recommended Fix Strategy:
Intercept the manual entry at the application layer. Do not try to send a fake SaveNetworkReq. Instead, implement a custom BLE handler that accepts SSID and Password separately, constructs the WIFINetworkProfile_t struct locally on the ESP32, and calls the provisioning API to save it.
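A minimal sketch of such a handler follows. The point it illustrates is that every code path ends in a status notification, so the app is never left waiting. SaveFn and NotifyFn are hypothetical callbacks standing in for the real storage call and the BLE notification, and fake_save, failing_save and record_notify are test doubles, not library functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SSID_MAX 32u
#define PASS_MAX 64u

enum { STATUS_OK = 0, STATUS_INVALID = 1, STATUS_SAVE_FAILED = 2 };

typedef struct {
    char    ssid[SSID_MAX + 1u];
    uint8_t ssid_len;
    char    pass[PASS_MAX + 1u];
    uint8_t pass_len;
} ManualProfile;

typedef bool (*SaveFn)(const ManualProfile *profile); /* persists the profile */
typedef void (*NotifyFn)(uint8_t status);             /* reports to the app */

/* Builds the profile locally, saves it, and always notifies the result. */
static uint8_t on_manual_config(const char *ssid, const char *pass,
                                SaveFn save, NotifyFn notify)
{
    ManualProfile p;
    uint8_t status = STATUS_OK;
    size_t  ssid_len = (ssid != NULL) ? strlen(ssid) : 0u;
    size_t  pass_len = (pass != NULL) ? strlen(pass) : 0u;

    assert(save != NULL && notify != NULL);
    memset(&p, 0, sizeof(p));

    if (ssid_len == 0u || ssid_len > SSID_MAX || pass_len > PASS_MAX) {
        status = STATUS_INVALID;
    } else {
        memcpy(p.ssid, ssid, ssid_len);
        p.ssid_len = (uint8_t)ssid_len;
        if (pass_len > 0u) {
            memcpy(p.pass, pass, pass_len);
        }
        p.pass_len = (uint8_t)pass_len;
        if (!save(&p)) {
            status = STATUS_SAVE_FAILED;
        }
    }

    notify(status); /* success and failure paths both end here */
    return status;
}

/* Test doubles for the storage call and the BLE notification. */
static int     g_notify_count = 0;
static uint8_t g_last_status  = 0xFFu;

static bool fake_save(const ManualProfile *profile)
{
    return profile->ssid_len > 0u;
}

static bool failing_save(const ManualProfile *profile)
{
    (void)profile;
    return false;
}

static void record_notify(uint8_t status)
{
    g_notify_count++;
    g_last_status = status;
}
```

Passing the save and notify steps as function pointers keeps the handler testable on a host machine without a radio or BLE stack.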
Why Juniors Miss It
- Treats the Protocol as a Black Box: Juniors see SaveNetworkReq in the documentation and assume it’s a generic “Save any network” command. They fail to read the fine print regarding the Index and BSSID requirements.
- Focus on Wi-Fi, Ignore Provisioning State: They assume that if the Wi-Fi connects, the request was valid. They do not realize the Provisioning Service is a separate layer that requires its own specific acknowledgments.
- Lack of BLE/GATT Awareness: They miss that the NetworkTX characteristic is likely set up with “Write Without Response” or has notification logic that depends on a specific server-side event. If the server (ESP32) doesn’t trigger the event, the client (App) never gets the notification.
- Zeros as Nulls: They assume filling unused fields (BSSID, Index) with zeros is safe. In embedded C, a zero index often points to the start of an array, and a zero BSSID might be a valid value for some specific APs, leading to unpredictable logic paths.
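One way to avoid the “Zeros as Nulls” trap is to carry an explicit presence flag instead of overloading an all-zero value. The sketch below is illustrative only: BssidPin and bssid_acceptable are hypothetical names, not part of any FreeRTOS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BSSID_LEN 6

/* "No BSSID" is an explicit flag, never an all-zero address. */
typedef struct {
    bool    has_bssid;
    uint8_t bssid[BSSID_LEN];
} BssidPin;

/* Returns true if the access point that was seen satisfies the pin. */
static bool bssid_acceptable(const BssidPin *pin, const uint8_t seen[BSSID_LEN])
{
    assert(pin != NULL && seen != NULL);

    if (!pin->has_bssid) {
        return true; /* no pin requested: any AP with the right SSID will do */
    }
    return memcmp(pin->bssid, seen, BSSID_LEN) == 0;
}
```

With the flag in place, a zeroed BSSID that is marked as present is compared like any other address, so it can no longer be mistaken for “unknown”.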