Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FreeRTOS stuck after NVS write after BLE connection error on ESP32S2 (not on ESP32) (IDFGH-9454) #10821

Closed
3 tasks done
QuentinFarizon opened this issue Feb 22, 2023 · 5 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@QuentinFarizon
Copy link

QuentinFarizon commented Feb 22, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

v5.0

Operating System used.

Windows

How did you build your project?

Command line with Make

If you are using Windows, please specify command line type.

None

Development Kit.

ESP32S3-DevkitM1

Power Supply used.

USB

What is the expected behavior?

Here is my scenario :

In the main, I start by connecting to a BLE peripheral
I am using a standard Nimble stack, after connection it starts service discovery.
I am shutting down my peripheral after connection but before service discovery finished, so I am called in blecent_on_disc_complete with an error status, on which I call ble_gap_terminate and write a log on NVS.

I agree that storing on NVS on a Nimble callstack can be considered dangerous, but I am expecting either :

  • that freeRTOS keeps running normally (this is the case with the same project on ESP32)
  • a crash
  • a watchdog (either interrupt WDT or task WDT)

Instead, the system is completely frozen, nothing output, and doesn't restart. In particular I have a simple task that just display a log each second, that should print indefinitely, and here it doesn't print anything.

static void
blecent_on_disc_complete(const struct peer *peer, int status, void *arg)
{
    if (status != 0) {
        /* Service discovery failed.  Terminate the connection. */
        MODLOG_DFLT(ERROR, "Error: Service discovery failed; status=%d "
                    "conn_handle=%d\n", status, peer->conn_handle);
        ble_gap_terminate(peer->conn_handle, BLE_ERR_REM_USER_CONN_TERM);
        store_connect_log(NULL, "Service discovery failed", status); // this function calls nvs_open_from_partition/nvs_set_str/nvs_commit/nvs_close
        return;
    }
    (...)
}

What is the actual behavior?

< HERE I POWER DOWN THE PERIPHERAL after connection successful >

I (09:57:01.482) DEBUG_COUNTER: Counting seconds 30
I (09:58:01.482) DEBUG_COUNTER: Counting seconds 31
I (09:59:01.482) DEBUG_COUNTER: Counting seconds 32
I (09:59:02.311) NimBLE: connection params update event; conn_handle=1 status=0

D (09:59:02.312) NimBLE: handle=1 our_ota_addr_type=0 our_ota_addr=dc:54:75:c8:ac:4e 
D (09:59:02.316) NimBLE: our_id_addr_type=0 our_id_addr=dc:54:75:c8:ac:4e 
D (09:59:02.323) NimBLE: peer_ota_addr_type=1 peer_ota_addr=f5:c5:dc:ca:13:c6 
D (09:59:02.331) NimBLE: peer_id_addr_type=1 peer_id_addr=f5:c5:dc:ca:13:c6 
D (09:59:02.338) NimBLE: conn_itvl=120 conn_latency=0 supervision_timeout=400 encrypted=0 authenticated=0 bonded=0
I (09:59:02.348) NimBLE: 

E (09:59:02.352) NimBLE: Error: Service discovery failed; status=7 conn_handle=1

I (09:59:02.360) NimBLE: GAP procedure initiated: terminate connection; conn_handle=1 hci_reason=19

D (09:59:02.370) nvs: nvs_open_from_partition logs 1
I (09:59:02.375) logs: Successfully opened NVS partition; handle=7
I (09:59:02.382) logs: Serialized log = {"type":"LINK_SCAN","unixTimestamp":1677059942,"pcbMacAddress":"dc:54:75:c8:ac:4c","data":{"error":{"message":"Service discovery failed","code":7}}}
D (09:59:02.399) nvs: nvs_set_str SPQDGXPMQLXGMEX {"type":"LINK_SCAN","unixTimestamp":1677059942,"pcbMacAddress":"dc:54:75:c8:ac:4c","data":{"error":{"message":"Service discovery failed","code":7}}}
I (09:59:02.419) logs: Wrote log with key SPQDGXPMQLXGMEX
D (09:59:02.423) nvs: nvs_close 7
I (09:59:02.426) NimBLE: disconnect; reason=520 
D (09:59:02.431) NimBLE: handle=1 our_ota_addr_type=0 our_ota_addr=dc:54:75:c8:ac:4e 
D (09:59:02.439) NimBLE: our_id_addr_type=0 our_id_addr=dc:54:75:c8:ac:4e 
D (09:59:02.446) NimBLE: peer_ota_addr_type=1 peer_ota_addr=f5:c5:dc:ca:13:c6 
I (09:59:02.452) NimBLE_BLE_CENT: Connection established retry 128
D (09:59:02.454) NimBLE: peer_id_addr_type=1 peer_id_addr=f5:c5:dc:ca:13:c6 
D (09:59:02.468) NimBLE: conn_itvl=120 conn_latency=0 supervision_timeout=400 encrypted=0 authenticated=0 bonded=0
I (09:59:02.478) NimBLE: 

I (09:59:02.481) NimBLE_BLE_CENT: Removed peer

Steps to reproduce.

  • connect successful
  • power down peripheral before service discovery
  • on discovery failed callback, terminate BLE connection, write on NVS

Debug Logs.

No response

More Information.

I do not reproduce on ESP32 chip.
I do not reproduce if I comment out the writing to NVS

@QuentinFarizon QuentinFarizon added the Type: Bug bugs in IDF label Feb 22, 2023
@github-actions github-actions bot changed the title FreeRTOS stuck after NVS write after BLE connection error on ESP32S2 (not on ESP32) FreeRTOS stuck after NVS write after BLE connection error on ESP32S2 (not on ESP32) (IDFGH-9454) Feb 22, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Feb 22, 2023
@rahult-github
Copy link
Collaborator

Hi @QuentinFarizon ,

Is it possible for you to please share the sample code where nvs related operations are done (mostly sequence and usage of nvs related APIs inside the store_connect_log function )? We can use the same at our end to reproduce this .

@QuentinFarizon
Copy link
Author

Sure, here is the implementation, it's a bit complicated because I use ArduinoJson to serialize the stored log.

#include <libraries/ArduinoJson-v6.18.5.h>

static const char *TAG = "logs";

#define MAX_SERIALIZED_LOG_SIZE 256

esp_err_t store_serialized_log(nvs_handle_t logs_nvs_handle, char* serialized_log)
{
    char key[NVS_KEY_NAME_MAX_SIZE] = {0};
    for (int i =0; i<NVS_KEY_NAME_MAX_SIZE - 1; i++) {
        key[i] = (esp_random() % 26) + 'A';
    }
    key[NVS_KEY_NAME_MAX_SIZE - 1] = '\0';

    esp_err_t rc = nvs_set_str(logs_nvs_handle, key, serialized_log);
    if (rc != 0) {
        ESP_LOGE(TAG, "Failed to write log to NVS; rc=%d\n", rc);
        return rc;
    }
    ESP_LOGI(TAG, "Wrote log with key %s", key);
    rc = nvs_commit(logs_nvs_handle);
    if (rc != 0) {
        ESP_LOGE(TAG, "Failed to commit NVS; rc=%d\n", rc);
        return rc;
    }
    return 0;
}

static void store_link_log(JsonDocument& doc)
{
    nvs_handle_t logs_nvs_handle;
    esp_err_t rc = nvs_open_from_partition(LOGS_NVS_PARTITION_NAME, LOGS_NVS_NAMESPACE, NVS_READWRITE, &logs_nvs_handle);
    if (rc != 0) {
        ESP_LOGE(TAG, "Failed to open NVS partition; rc=0x%x\n", rc);
        return;
    }
    ESP_LOGI(TAG, "Successfully opened NVS partition; handle=%lu", logs_nvs_handle);

    char serialized_log[MAX_SERIALIZED_LOG_SIZE];
    serializeJson(doc, serialized_log);
    ESP_LOGI(TAG, "Serialized log = %s", serialized_log);
    store_serialized_log(logs_nvs_handle, serialized_log);

    nvs_close(logs_nvs_handle);
}

static void fill_link_log_timestamp(JsonDocument& doc, int32_t offsetInSeconds = 0)
{
    time_t now;
    time(&now);
    doc["unixTimestamp"] = now + offsetInSeconds;
}

static void fill_link_log_mac_address(JsonDocument& doc)
{
    uint8_t mac[6] = {0};
    esp_read_mac(mac, ESP_MAC_WIFI_STA);
    char link_mac_address_str[6 * 2 + 5 + 1];
    sprintf(link_mac_address_str, "%02x:%02x:%02x:%02x:%02x:%02x", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    doc["pcbMacAddress"] = link_mac_address_str;
}

void store_connect_log(char* connected_mac_address, char* error_message, int error_code)
{
    StaticJsonDocument<MAX_SERIALIZED_LOG_SIZE> doc;
    doc["type"] = "LINK_SCAN";
    fill_link_log_timestamp(doc);
    fill_link_log_mac_address(doc);

    if (connected_mac_address) {
        ESP_LOGI(TAG, "Connected mac address: %s", connected_mac_address);
        doc["data"]["connected"][0] = connected_mac_address;
    }
    if (error_message) {
        doc["data"]["error"]["message"] = error_message;
        doc["data"]["error"]["code"] = error_code;
    }

    store_link_log(doc);
}

@RoshanESP
Copy link
Collaborator

RoshanESP commented Feb 23, 2023

Hi @QuentinFarizon,
I took your code and tried to reproduce the issue on my side.
However, I observed everything being normal and didn't find the freeRTOS stuck.
I am attaching the success log and the patch for your reference.
blecent_log.txt
blecent_patch.txt
However, I would like to know :

  1. What is the difference in this patch and your code.
  2. Can you also share the code where you are printing indefinitely.

@RoshanESP
Copy link
Collaborator

Hi @QuentinFarizon,
Do you have any further updates on the queries mentioned above?

@RoshanESP
Copy link
Collaborator

Hi @QuentinFarizon , Closing this issue, feel free to open if you have any updates.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new labels Aug 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

4 participants