Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSL - Internal error (eg, unexpected failure in lower-level module) #9995

Closed
1 task done
hitecSmartHome opened this issue Jul 8, 2024 · 68 comments
Closed
1 task done
Labels
Status: Awaiting triage Issue is waiting for triage

Comments

@hitecSmartHome
Copy link

hitecSmartHome commented Jul 8, 2024

Board

ESP32-Wrover

Device Description

Hardware Configuration

Version

v2.0.17

IDE Name

PlatformIO

Operating System

Windows10

Flash frequency

80

PSRAM enabled

yes

Upload speed

115200

Description

SOMETIMES I got an SSL internal error when sending websocket frames. It mostly working and the packets are written fine. No matter the size because I'm chunking the data.

Sketch

void HsHWebsocket::send(const char* message, int messageLength) {
    if (!connected || messageLength <= 0) return;

    uint8_t headerSize;

    // Calculate header size
    if (messageLength < 126) {
        headerSize = 2;
    } else if (messageLength < 0xFFFF) {
        headerSize = 4;
    } else {
        headerSize = 10;
    }

    // Add mask key size if client
    headerSize += 4;

    // Generate mask key
    for (uint8_t i = 0; i < 4; ++i) {
        maskKey[i] = random(0x00, 0xFF);
    }

    // Create header
    fixedSendBuffer[0] = FIN | OPCODE_TEXT;
    if (messageLength < 126) {
        fixedSendBuffer[1] = 0x80 | messageLength;
    } else if (messageLength < 0xFFFF) {
        fixedSendBuffer[1] = 0x80 | 126;
        fixedSendBuffer[2] = (messageLength >> 8) & 0xFF;
        fixedSendBuffer[3] = messageLength & 0xFF;
    } else {
        fixedSendBuffer[1] = 0x80 | 127;
        fixedSendBuffer[2] = (messageLength >> 56) & 0xFF;
        fixedSendBuffer[3] = (messageLength >> 48) & 0xFF;
        fixedSendBuffer[4] = (messageLength >> 40) & 0xFF;
        fixedSendBuffer[5] = (messageLength >> 32) & 0xFF;
        fixedSendBuffer[6] = (messageLength >> 24) & 0xFF;
        fixedSendBuffer[7] = (messageLength >> 16) & 0xFF;
        fixedSendBuffer[8] = (messageLength >> 8) & 0xFF;
        fixedSendBuffer[9] = messageLength & 0xFF;
    }
    memcpy(fixedSendBuffer + headerSize - 4, maskKey, 4);

    // Send header
    if (client.write(fixedSendBuffer, headerSize) != headerSize) {
        log_e("Failed to send header\n");
        return;
    }

    // Send payload in chunks if necessary
    size_t chunkSize = MAX_FRAME_SIZE - headerSize;  // Adjust chunk size based on buffer size and header size
    size_t offset = 0;

    while (offset < messageLength) {
        size_t bytesToSend = min(chunkSize, messageLength - offset);

        // Mask the payload in chunks and send
        for (size_t i = 0; i < bytesToSend; ++i) {
            fixedSendBuffer[headerSize + i] = message[offset + i] ^ maskKey[(offset + i) % 4];
        }

        if (client.write(fixedSendBuffer + headerSize, bytesToSend) != bytesToSend) {
            log_e("Failed to send payload chunk\n");
            return;
        }

        offset += bytesToSend;
    }
}

Debug Message

[ 24635][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[ 24647][I][HsHWebsocketSend.cpp:122] sendPong(): Sending Pong

[ 24652][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 6 bytes...

[ 24655][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 675 bytes...

[ 24669][V][ssl_client.cpp:381] send_ssl_data(): Handling error -27648

[ 24676][E][ssl_client.cpp:37] _handle_error(): [send_ssl_data():382]: (-27648) SSL - Internal error (eg, unexpected                                                    failure in lower-level module)

[ 24689][E][ssl_client.cpp:37] _handle_error(): [data_to_read():361]: (-76) UNKNOWN ERROR CODE (004C)

[ 24690][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24705][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[ 24709][E][HsHWebsocketSend.cpp:65] send(): Failed to send payload chunk



[ 24725][E][HsHWebsocketSend.cpp:48] send(): Failed to send header

Other Steps to Reproduce

Using WifiClientSecure

WiFiClientSecure secureClient;
WiFiClient normalClient;
WiFiClient& client = secureClient;

void HsHWebsocket::setClient() {
    if (port == 443) {
        secureClient.setInsecure();
        client = (WiFiClient&)secureClient;
        log_i("Using secure client");
    }
    log_i("Using normal client");
    client = normalClient;
}

I have checked existing issues, online documentation and the Troubleshooting Guide

  • I confirm I have checked existing issues, online documentation and Troubleshooting guide.
@hitecSmartHome hitecSmartHome added the Status: Awaiting triage Issue is waiting for triage label Jul 8, 2024
@hitecSmartHome
Copy link
Author

Sometimes there is that debug log

[165950][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 6 bytes...

[165959][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 444 bytes...

[165971][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[165980][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 444 bytes...

[165992][V][ssl_client.cpp:369] send_ssl_data(): Writing HTTP request with 8 bytes...

[165996][E][ssl_client.cpp:37] _handle_error(): [data_to_read():361]: (-30592) SSL - A fatal alert message was received from our peer

[166012][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[166022][V][ssl_client.cpp:381] send_ssl_data(): Handling error -78

[166028][E][ssl_client.cpp:37] _handle_error(): [send_ssl_data():382]: (-78) UNKNOWN ERROR CODE (004E)

[166037][V][ssl_client.cpp:321] stop_ssl_socket(): Cleaning SSL connection.

[166044][E][HsHWebsocketSend.cpp:48] send(): Failed to send header

@hitecSmartHome
Copy link
Author

There is even a crash sometimes

assert failed: 0x401dba51

Backtrace: 0x4008381e:0x3ffcda10 0x401d6501:0x3ffcda30 0x4008fc4a:0x3ffcda50 0x401dba51:0x3ffcda90 0x401db3a8:0x3ffcdab0 0x400846a7:0x3ffcdad0 0x4008fc7d:0x3ffcdaf0 0x400d7bd9:0x3ffcdb10 0x40268d12:0x3ffcdb30 0x400d7c51:0x3ffcdb50 0x4012161f:0x3ffcdb70 0x40121629:0x3ffcdbf0 0x400d4a3e:0x3ffcdc10 0x4018d235:0x3ffcdc30 0x40189e5e:0x3ffcdc80 0x40189e85:0x3ffcdca0

ELF file SHA256: e485fe8d165cb361

Rebooting...

DECODED

Decoding stack results
0x4008381e: panic_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c line 408
0x401d6501: esp_register_shutdown_handler at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c line 60
0x4008fc4a: __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\assert.c line 47
0x401dba51: tlsf_malloc at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf_block_functions.h line 126
0x401db3a8: multi_heap_malloc_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c line 191
0x400846a7: heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c line 382
0x4008fc7d: free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\heap.c line 39
0x400d7bd9: ArduinoJson::V704PB22::detail::DefaultAllocator::deallocate(void*) at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/Allocator.hpp line 31
0x40268d12: ArduinoJson::V704PB22::detail::StringPool::dereference(char const*, ArduinoJson::V704PB22::Allocator*) at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/StringPool.hpp line 88
0x400d7c51: ArduinoJson::V704PB22::detail::ResourceManager::~ResourceManager() at lib/ArduinoJson-7.x/src/ArduinoJson/Memory/ResourceManager.hpp line 26
0x4012161f: Thermostat:: ::operator()(void) const at lib/ArduinoJson-7.x/src/ArduinoJson/Object/MemberProxy.hpp line 32
0x40121629: Thermostat:: ::operator()(void) const at src/Components/Thermostat/Thermostat.cpp line 156
0x400d4a3e: std::function ::operator()() const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x4018d235: Sys::handleIntervals() at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_iterator.h line 783
0x40189e5e: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0/new line 169
0x40189e85: std::_Function_handler  ::  >::_M_invoke(const std::_Any_data &, TmParams *&&) at src/System/System.cpp line 314

TLS implementation has serious faults it seems.

@hitecSmartHome
Copy link
Author

Random crashes when i call client.stop()

==================== CURRENT THREAD STACK =====================
#0  0x40083821 in panic_abort (details=0x3ffc5900 "assert failed: 0x401d3ac1") at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c:408
#1  0x401cc6c4 in esp_system_abort (details=0x3ffc5900 "assert failed: 0x401d3ac1") at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c:137
#2  0x4008fc4d in __assert_func (file=<optimized out>, line=0, func=0x0, expr=0x0) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components
ewlib\assert.c:47
#3  0x401d3ac4 in tlsf_free (tlsf=0x3f80383c, ptr=<optimized out>) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_tlsf.c:965
#4  0x401d341b in multi_heap_free_impl (p=0x3f82cbbc, heap=0x3f803828) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c:212
#5  multi_heap_free_impl (heap=0x3f803828, p=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c:200
#6  0x400846aa in heap_caps_free (ptr=<optimized out>) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c:382
#7  0x40085d58 in esp_mbedtls_mem_free (ptr=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port\esp_mem.c:46
#8  0x40200f85 in mbedtls_free (ptr=0x3f82cbbc) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\platform.c:54
#9  0x4020aa1b in mbedtls_ssl_free (ssl=0x3f80485c) at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_tls.c:6749
#10 0x401b409e in stop_ssl_socket (ssl_client=0x3f804858, rootCABuff=<optimized out>, cli_cert=0x0, cli_key=0x0) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp:336
#11 0x401b3603 in WiFiClientSecure::stop (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:98
#12 0x401b36ec in WiFiClientSecure::available (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:248
#13 0x401b3657 in WiFiClientSecure::read (this=0x3ffb38a0 <serverRouter+12>, buf=0x3ffc5a80 "", size=0) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:213
#14 0x40263a1d in WiFiClientSecure::connected (this=0x3ffb38a0 <serverRouter+12>) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:257
#15 0x4014928f in HsHWebsocket::handleWebSocket (this=0x3ffb3894 <serverRouter>) at src/HsHWebsockets/HandleWebsocket.cpp:132
#16 0x4014b37f in <lambda(void*)>::operator() (__closure=0x0, param=0x3ffb3894 <serverRouter>) at src/HsHWebsockets/HsHWebsocketConnection.cpp:37
#17 <lambda(void*)>::_FUN(void *) () at src/HsHWebsockets/HsHWebsocketConnection.cpp:39

@me-no-dev
Copy link
Member

looks like memory issue. Both exceptions happen when dealing with memory. Also the code you provided is only illustration and not minimal code that we can actually compile and reproduce.

Have you tried 3.x? 2.x is EOL (even if we release 2.0.18 with the very last ESP-IDF 4.4.8)

@hitecSmartHome
Copy link
Author

Unfortunatelly I'm stuck at 2.0.17 because I use Arduino as a component of IDF with PlatformIO.
I need the full power of the menuconfig so I can adjust the external ram options to my needs.

Indeed it is a memory issue which happens when I call client.stop(). My app constantly sending messages and randomly crashes. Probably because when it calls client.stop() it is still in a sending state and wants to use client. I'm using a semaphore to protect the WiFiClient secure pointer but it did not work.

@hitecSmartHome
Copy link
Author

It seems to me so far that not the client.stop() call was the bad guys but the client.connected() call. I removed this from my project and no crash so far. Will test further

@hitecSmartHome
Copy link
Author

My test cpus are up and running since 16 hour and 49 min without the client.connected() call. Looks like it does not like concurrent calls from other threads

@me-no-dev
Copy link
Member

client.connected() goes through read() with size 0 to update the client's _connected variable, which goes through available(), which can in particular case call stop(), which would update the _connected variable to false, that can then be returned by connected(). So far it makes sense. What is your application doing with connected()? You mention multiple threads?

@me-no-dev
Copy link
Member

maybe connected() should read() only if _connected is true. Can you try that?

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}

@hitecSmartHome
Copy link
Author

My application is a websocket client implementation written in top of the WifiClientSecure and WiFiClient classes because the ArduinoWebsockets library and the native IDF ws client library ( on idf 4.4.7 ) is sucks. They both causing crashes or slow.
This websocket client tries to be connected to the server at all times. For this to work I attempted to check for connectivity on every iteration with this call. client.connected(). Which I tought is used for exactly this purpose. I wanted to initiate a reconnection after x sec if the client is not connected. Instead I turned to an other approach where the application ping-ponging with the server and if we can't send a messgae ( client.write() fails ) we initiate the reconnection.

Really interesting design choise for client.connected(). I really just wanted to know if we are still connected or not.

maybe connected() should read() only if _connected is true. Can you try that?

I can try that yes. I have other processors to test it along with the rest.
I can upload my ws client implementation so you can take a look if you want to see the use case but basically this is the function which runs in a loop.

void HsHWebsocket::handle() {
    if (!isConnected()) {
        disconnect();
        connect();
        return;
    }

    if (!client.available()) { return; }

    uint8_t opcode;
    uint64_t payloadLength;
    bool fin, masked;

    if (!readFrameHeader(opcode, payloadLength, fin, masked)) {
        return;
    }

    if (masked && !fragmentedMessage) {
        if (!readMaskingKey(maskKey)) {
            return;
        }
    }

    if (!readPayloadData(payloadLength, masked, maskKey)) {
        return;
    }

    if (fin) {
        processFrame(opcode);
    } else {
        fragmentedMessage = true;
    }
}

In the isConnected() call there was the client.connected() call like this

bool HsHWebsocket::isConnected() {
    return connected && client.connected();
}

This is instead is like this

bool HsHWebsocket::isConnected() {
    return connected;
}

I have an internal bool variable for tracking the connected state.

@hitecSmartHome
Copy link
Author

maybe connected() should read() only if _connected is true. Can you try that?

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}

Should I replace this function entierly?

uint8_t WiFiClient::connected()
{
    if (_connected) {
        uint8_t dummy;
        int res = recv(fd(), &dummy, 0, MSG_DONTWAIT);
        // avoid unused var warning by gcc
        (void)res;
        // recv only sets errno if res is <= 0
        if (res <= 0){
          switch (errno) {
              case EWOULDBLOCK:
              case ENOENT: //caused by vfs
                  _connected = true;
                  break;
              case ENOTCONN:
              case EPIPE:
              case ECONNRESET:
              case ECONNREFUSED:
              case ECONNABORTED:
                  _connected = false;
                  log_d("Disconnected: RES: %d, ERR: %d", res, errno);
                  break;
              default:
                  log_i("Unexpected: RES: %d, ERR: %d", res, errno);
                  _connected = true;
                  break;
          }
        } else {
          _connected = true;
        }
    }
    return _connected;
}

@me-no-dev
Copy link
Member

But you said you use ClientSecure. Why post the function of Client? You need to change the WiFiClientSecure::connected()

@hitecSmartHome
Copy link
Author

Oh sorry. The implementation is written so it can use both. I happened to click on the WifiClient and not the WiFiClientSecure.
There is a listen() method which takes in either a URL in a string format or an ip address. The library attempts to determine if it is a secure or a non secure connection and using either client or clientSecure depending on the http method.

WiFiClientSecure secureClient;
WiFiClient normalClient;
WiFiClient& client = secureClient;

void HsHWebsocket::setClient() {
    if (isSecure()) {
        secureClient.setInsecure(); // Development mode
        client = (WiFiClient&)secureClient;
        return;
    }
    client = normalClient;
}

@me-no-dev
Copy link
Member

The crash was in client secure, so I looked into that and say that it reads always, even if already known to be disconnected. WiFiClient::connected() does the same thing

@hitecSmartHome
Copy link
Author

Really strange that it can call stop() when I only want to check if it is connected or not. :D Will check

@me-no-dev
Copy link
Member

just change it to

uint8_t WiFiClientSecure::connected() {
  if (_connected) {
    uint8_t dummy = 0;
    read(&dummy, 0);
  }
  return _connected;
}

@hitecSmartHome
Copy link
Author

Changed, testing.

@me-no-dev
Copy link
Member

Really strange that it can call stop() when I only want to check if it is connected or not.

It's because it does not really directly check if connected, instead it tries to read 0 bytes from the client, to indirectly determine the state of the connection (WiFiClient does the same) and in some cases it should close the connection, so stop() is called. Issue is that it was called multiple times

@hitecSmartHome
Copy link
Author

Crashed.

PC: 0x4020c044
EXCVADDR: 0x00000000

Decoding stack results
0x4020c041: mbedtls_ssl_read_record at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_msg.c line 3427
0x4020c302: mbedtls_ssl_read at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_msg.c line 5236
0x401b6a7e: data_to_read(sslclient_context*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp line 356
0x401b6075: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 246
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103
0x40154764: HsHWebsocket::sendText(char const*, int) at src/HsHWebsocket/Send.cpp line 113
0x4013f955: HsHServerRouter::reply(ArduinoJson::V704PB22::JsonDocument, bool) at src/HsHServerRouter/Handlers.cpp line 15
0x4013f9e3: HsHServerRouter::redirect(char const*, char const*, char const*) at src/HsHServerRouter/Handlers.cpp line 27
0x401747dd: HsHServer::sendEvent(char const*, ArduinoJson::V704PB22::JsonDocument&) at src/Server/Event.cpp line 66
0x40121ef6: Thermostat:: ::operator()(void) const at src/Components/Thermostat/Thermostat.cpp line 156
0x40121f05: std::_Function_handler   >::_M_invoke(const std::_Any_data &) at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 151
0x400d4732: std::function ::operator()() const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x40192235: Sys::handleIntervals() at src/System/SystemInterval.cpp line 55
0x4018bbda: Sys::loop() at src/System/System.cpp line 549
0x4018bc01: systemTask(void*) at src/System/System.cpp line 558

@me-no-dev
Copy link
Member

are you sure you replaced the proper file? connected should have shown line 258

@me-no-dev
Copy link
Member

C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257

@me-no-dev
Copy link
Member

maybe clean the build, since you use PIO

@hitecSmartHome
Copy link
Author

are you sure you replaced the proper file? connected should have shown line 258

Yeah,

image

@hitecSmartHome
Copy link
Author

Cleaned it, trying again.

@hitecSmartHome
Copy link
Author

I can trigger this crash when the server rapidly sending messages to the esp32.
I have a range slider on the server's frontend. I can move it rapidly and it crashes eventually.
It can crash without me rapidly moving the range slider hovewer, but it causes it to crash much faster.
It does not crash at all no matter how crazy I move the slider if this call is commented out.

@hitecSmartHome
Copy link
Author

Cleaned, uploaded again ( via OTA ), crashed again.

Decoding stack results
0x4008381e: panic_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\panic.c line 408
0x401d1125: esp_system_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system\esp_system.c line 137
0x4008fc4a: __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib\assert.c line 47
0x401d6675: tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf_block_functions.h line 90
0x401d5fcc: multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\multi_heap.c line 212
0x400846a7: heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap\heap_caps.c line 382
0x40085d55: esp_mbedtls_mem_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port\esp_mem.c line 46
0x40203b3a: mbedtls_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\platform.c line 54
0x4020dab4: mbedtls_ssl_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library\ssl_tls.c line 6761
0x401b6a37: stop_ssl_socket(sslclient_context*, char const*, char const*, char const*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp line 336
0x401b5f9c: WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 98
0x401b6085: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 248
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103
0x401411e1: HsHServerRouter::connected() at src/HsHServerRouter/HsHServerRouter.cpp line 125
0x4013f969: HsHServerRouter::redirect(char const*, char const*, char const*) at src/HsHServerRouter/Handlers.cpp line 21
0x401747dd: HsHServer::sendEvent(char const*, ArduinoJson::V704PB22::JsonDocument&) at src/Server/Event.cpp line 66
0x400f02f1: Heater::sendSync() at src/Components/Heater/Utils.cpp line 53
0x400ec219: std::_Function_handler   >::_M_invoke(const std::_Any_data &, int &&, int &&) at src/Components/Heater/Heater.cpp line 11
0x4019848a: std::function ::operator()(int, int) const at c:\users\pc\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/std_function.h line 260
0x4019859d: Time::emitSecChange(int, int) at src/Time/TimeEvents.cpp line 69
0x401985bd: Time::monitorSecChange() at src/Time/TimeEvents.cpp line 331
0x40198721: Time::handleEvents() at src/Time/TimeEvents.cpp line 369
0x40197522: Time::loop() at src/Time/Time.cpp line 211
0x401a0968: loop() at src/main.cpp line 95
0x401c1289: loopTask(void*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp line 50

Always these lines

0x401b5f9c: WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 98
0x401b6085: WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 248
0x401b5ff0: WiFiClientSecure::read(unsigned char*, unsigned int) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 213
0x40266629: WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp line 257
0x4026478f: HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp line 103

@me-no-dev
Copy link
Member

what is also interesting is that even if you comment the call, after isConnected you call client.available(), which should have ended the same way, but you say it does not crash...

@hitecSmartHome
Copy link
Author

This is the basic idea for the usecase

#include <Arduino.h>
#include <HsHWebsocket/HsHWebsocket.h>

#define SERVER_URL "https://echo.websocket.org/"

void setup() {
  Serial.begin(115200);

  webSocket.onConnect([]() {
    Serial.println("WS - Connected");
  });

  webSocket.onDisconnect([]() {
    Serial.println("WS - Disconnected");
  });

  webSocket.onFrame([](char* data, size_t length) {
    Serial.printf("WS - Received: %s\n", data);
  });

  webSocket.onError([](WsError error) {
    Serial.printf("WS - Error: %d. Message: %s\n", error.code, error.message);
  });

  webSocket.listen(SERVER_URL);
  // webSocket.listen(SERVER_URL, "/path");
  // webSocket.listen(IPAddress(192, 168, 0, 1), 80, "/path");
  // webSocket.listen(IPAddress(192, 168, 0, 1), 80);
}

void loop() {
  vTaskDelay(1);
}

@me-no-dev
Copy link
Member

Try this:

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
            xSemaphoreGive(sendMutex);
        }
    }
}

void HsHWebsocket::sendText(const char* message) {
    sendText(message, strlen(message));
}

@me-no-dev
Copy link
Member

Your handle() should be guarded by the same semaphore.

@hitecSmartHome
Copy link
Author

If I guard the handle with the same semaphore my esp is frozen. IDK why. I tried that. Will check your suggestions!

@hitecSmartHome
Copy link
Author

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
            xSemaphoreGive(sendMutex);
        }
    }
}

This would never give the semaphore if the message is either empty or the server is not connected. It would stuck

@me-no-dev
Copy link
Member

ahh sorry..... typo. xSemaphoreGive(sendMutex); should be one level above

@me-no-dev
Copy link
Member

void HsHWebsocket::sendText(const char* message, int messageLength) {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) == pdTRUE) {
        if (isConnected() && message && messageLength) {
            send(message, messageLength);
        }
        xSemaphoreGive(sendMutex);
    }
}

@me-no-dev
Copy link
Member

void HsHWebsocket::handle() {
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) != pdTRUE) {
        return;
    }
    if (!isConnected()) {
        disconnect();
        connect();
        goto return_sem;
    }

    if (!client.available()) { goto return_sem; }

    uint8_t opcode;
    uint64_t payloadLength;
    bool fin, masked;

    if (!readFrameHeader(opcode, payloadLength, fin, masked)) {
        goto return_sem;
    }

    if (masked && !fragmentedMessage) {
        if (!readMaskingKey(maskKey)) {
            goto return_sem;
        }
    }

    if (!readPayloadData(payloadLength, masked, maskKey)) {
        goto return_sem;
    }

    if (fin) {
        processFrame(opcode);
    } else {
        fragmentedMessage = true;
    }
return_sem:
    xSemaphoreGive(sendMutex);
}

@me-no-dev
Copy link
Member

You might also want to replace all vTaskDelay(1); with vTaskDelay(2); to ensure that there will be at least one tick for tasks to be switched

@hitecSmartHome
Copy link
Author

hm... Why is it necessary to be one tick? I use vTaskDelay(1); for all my tasks. ( I have a lot of tasks ) Is this bad practise?
Really interesting that you are using goto.

@hitecSmartHome
Copy link
Author

Thank you very much for the suggestions. Will put client.connected() back and implement your changes and test it.

@me-no-dev
Copy link
Member

Why is it necessary to be one tick?

As far as I understand it, vTaskDelay(1); will delay until the next tick, which could happen to be so close that it does not give time for tasks to switch. At least something like that was explained to me. In Arduino 1 tick is 1ms, so your task will never delay more than 2ms if vTaskDelay(2); is used

@hitecSmartHome
Copy link
Author

I have added the semaphore to my handle function and as it is expected, it froze the task. Never letting the send to complete.

@me-no-dev
Copy link
Member

is sendText called anywhere from within handle()?

@hitecSmartHome
Copy link
Author

No but sendpong() method is called from handle which also uses the send mutex to protect the client access when writing.

@hitecSmartHome
Copy link
Author

Will try to place it outside of handle()

@me-no-dev
Copy link
Member

yeah, taking and already taken mutex will freeze the task

@me-no-dev
Copy link
Member

you could remove it from sendpong if it's not called from anywhere else. It would be already protected by handle

@hitecSmartHome
Copy link
Author

case 0x09:  // Ping
            #if WS_DEBUG
                printf("Received ping frame.\n");
            #endif
            if (pingCallback) {
                pingCallback();
            }
            shouldSendPong = true;
            //sendPong();
            break;

and checking it before handle takes the mutex

void HsHWebsocket::handle() {
    if( shouldSendPong ){
        shouldSendPong = false;
        sendPong();
    }
    if (xSemaphoreTake(sendMutex, portMAX_DELAY) != pdTRUE) {
        return;
    }
//...

@hitecSmartHome
Copy link
Author

you could remove it from sendpong if it's not called from anywhere else. It would be already protected by handle

Yeah, but I plan to make it public so users can send a pong if they want. I did not think this trought yet... Will check it like this first.

@hitecSmartHome
Copy link
Author

It crashes almost imidiately with the changes.

assert failed: 0x401d65cd
  #0  0x401d65cd in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)



Backtrace: 0x4008381e:0x3ffc58e0 0x401d107d:0x3ffc5900 0x4008fc4a:0x3ffc5920 0x401d65cd:0x3ffc5960 0x401d5f24:0x3ffc5980 0x400846a7:0x3ffc59a0 0x40085d55:0x3ffc59c0 0x40203a92:0x3ffc59e0 0x4020da0c:0x3ffc5a00 0x401b698f:0x3ffc5a20 0x401b5ef4:0x3ffc5a40 0x401b5fdd:0x3ffc5a60 0x401b5f48:0x3ffc5a80 0x40266581:0x3ffc5aa0 0x402646e7:0x3ffc5ad0 0x4014ce6f:0x3ffc5af0 0x4014a1a1:0x3ffc5b20
  #0  0x401d107d in esp_system_abort at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\esp_system/esp_system.c:137
  #1  0x4008fc4a in __assert_func at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\newlib/assert.c:47
  #2  0x401d65cd in tlsf_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_tlsf.c:965 (discriminator 1)
  #3  0x401d5f24 in multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:212
      (inlined by) multi_heap_free_impl at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/multi_heap.c:200
  #4  0x400846a7 in heap_caps_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\heap/heap_caps.c:382
  #5  0x40085d55 in esp_mbedtls_mem_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\port/esp_mem.c:46
  #6  0x40203a92 in mbedtls_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/platform.c:54        
  #7  0x4020da0c in mbedtls_ssl_free at C:\Users\Pc\.platformio\packages\framework-espidf@3.40407.0\components\mbedtls\mbedtls\library/ssl_tls.c:6761   
  #8  0x401b698f in stop_ssl_socket(sslclient_context*, char const*, char const*, char const*) at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/ssl_client.cpp:336
  #9  0x401b5ef4 in WiFiClientSecure::stop() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:98
  #10 0x401b5fdd in WiFiClientSecure::available() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:248
  #11 0x401b5f48 in WiFiClientSecure::read(unsigned char*, unsigned int) at 
C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:213
  #12 0x40266581 in WiFiClientSecure::connected() at C:/Users/Pc/.platformio/packages/framework-arduinoespressif32/libraries/WiFiClientSecure/src/WiFiClientSecure.cpp:257
  #13 0x402646e7 in HsHWebsocket::isConnected() at src/HsHWebsocket/Utils.cpp:103
  #14 0x4014ce6f in HsHWebsocket::handle() at src/HsHWebsocket/Handle.cpp:11  #15 0x4014a1a1 in HsHWebsocket::makeTask()::{lambda(void*)#1}::_FUN(void*) at src/HsHWebsocket/Connect.cpp:8
      (inlined by) _FUN at src/HsHWebsocket/Connect.cpp:13





ELF file SHA256: 5f115744b2ec6e08

Rebooting...

@me-no-dev
Copy link
Member

it breaks here: https://github.com/espressif/esp-idf/blob/release/v4.4/components/heap/heap_tlsf.c#L965
which means as we suspected that it was already freed. But how does it happen if now code is even protected by mutex. Any chance some other public function is left unguarded and touches the client?

@hitecSmartHome
Copy link
Author

Nop.

public:
    using Callback = std::function<void()>;
    using FrameCallback = std::function<void(char*, size_t)>;
    using ErrorCallback = std::function<void(WsError)>;
    using BinaryFrameCallback = std::function<void(const uint8_t*, size_t)>;

    HsHWebsocket();
    ~HsHWebsocket();

    void listen(const char* url, const char* path = "/");
    void listen(IPAddress ip, uint16_t port, const char* path = "/");

    /* Events */
    void onConnect(Callback callback);
    void onDisconnect(Callback callback);
    void onPing(Callback callback);
    void onPong(Callback callback);
    void onError(ErrorCallback callback);
    void onFrame(FrameCallback callback);
    void onBinaryFrame(BinaryFrameCallback callback);

    /* Utils */
    void addHeader(const char* key, const char* value);
    bool isConnected();

    /* Send */
    void sendText(const char* message, int messageLength);
    void sendText(const char* message);

   private:

Also I'm checking if it is already initialised with a flag so it can't be initialised twice.

@hitecSmartHome
Copy link
Author

Oh wait a minute. isConnected() is public. Will check.

@hitecSmartHome
Copy link
Author

Yeah, isConnected() is used outside. The tasks that are using the websocket implementation are also checking if we are connected or not in various situations.

bool HsHWebsocket::isConnected() {
    return client.connected() && connected;
}

@hitecSmartHome
Copy link
Author

But if i protect client.connected() with the same mutex, it will also trigger a lock.

@hitecSmartHome
Copy link
Author

client should be protected internally....

@me-no-dev
Copy link
Member

maybe, but for now you need to figure out the locks and make sure that you do not call it concurently

@me-no-dev
Copy link
Member

maybe protect isConnected and expand it when inside protected area, or create private _isConnected that is unprotected and you can use inside protected areas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting triage Issue is waiting for triage
Projects
None yet
Development

No branches or pull requests

3 participants