Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cant connect to broker when using esp8266 version 3.1.1 #292

Closed
nusabot-iot opened this issue Jan 25, 2023 · 9 comments
Closed

Cant connect to broker when using esp8266 version 3.1.1 #292

nusabot-iot opened this issue Jan 25, 2023 · 9 comments

Comments

@nusabot-iot
Copy link

Using Arduino IDE, and install board version 3.1.1.
and call connect API, always return false.

it works when using version 3.0.x

@mp-se
Copy link

mp-se commented Jan 26, 2023

I have the same issue when trying to upgrade to 3.1.1

[update] This sample code works on the previous version but return error -9 on the v3.1.1 version of the esp8266 arduino fwk.

WiFiClient _wifi;
MQTTClient mqtt(128);

mqtt.begin("myhost", 1883, _wifi);
mqtt.connect("test", "", "");

if (mqtt.publish("test/topic", "value")) {
Serial.printf("MQTT publish successful" CR);
} else {
Serial.printf("MQTT publish failed with error %d" CR, mqtt.lastError());
}

mqtt.disconnect();

@mp-se
Copy link

mp-se commented Jan 26, 2023

Here are some dumps with debugging on for the two frameworks: It looks like it fails to get the ACK due to timeout when sending the initial data to the server (my first thoughts)

These were the flags I added.
-D DEBUG_ESP_HTTP_CLIENT
-D DEBUG_ESP_HTTP_SERVER
-D DEBUG_ESP_PORT=Serial
-D DEBUG_ESP_WIFI
-D DEBUG_ESP_SSL
-D DEBUG_ESP_CORE

Working one (older fwk)

.---- start testing for mqtt bug
[hostByName] Host: 192.168.1.16 is a IP!
:ref 1
:wr 18 0
:wrc 18 18 0
:ack 18
:rn 4
:rd 1, 4, 0
:rdi 4, 1
:rd 1, 4, 1
:rdi 3, 1
:rd 2, 4, 2
:rdi 2, 2
:c0 2, 4
:wr 19 0
:wrc 19 19 0
MQTT publish successful
:wr 2 0
:wrc 2 2 0
:ack 19
:ack 2
:rcl pb=0 sz=-1
:abort
:ur 1
:dsrcv 0
:del
---- end testing for mqtt bug

Failing (3.1.1)

.---- start testing for mqtt bug
[hostByName] Host: 192.168.1.16 is IP!
:ref 1
:wr 18 0
:wrc 18 18 0
:wustmo
:close
MQTT publish failed with error -9
:ur 1
:dsrcv 0
:del
---- end testing for mqtt bug

@home-s-dev
Copy link

I had the same problem, just with esp8266 core 3.1.1 (and not with 3.1.0 or previous 3.0.x)

The problem seems related to the change in delay() coming with ESP8266 core ver 3.1.1
Core

I'm not 100% sure, but changing https://github.com/256dpi/arduino-mqtt/blob/master/src/MQTTClient.cpp#L72 from delay(0) to delay(1) can fix. No other occurrence of "delay" is present in the code...

I hope this can help.

@mp-se
Copy link

mp-se commented Jan 29, 2023

You were spot on. Changing delay(0) to delay(1) fixes the problem with running under 3.1.1. I will do some more tests and issue a PR for this later on.

@mp-se
Copy link

mp-se commented Jan 29, 2023

I did some more testing. Trying to use delayMicroseconds instead of delay but that didnt work. I'm wondering if the issue is connected to timing and reading the result from the initial connection.

Changing from delay(1) to yield() also works. I'm not sure which option is best in this case.

Changing a behaviors' in the arduino core like this would probably break more libraries, so I'm curious if this was intentional or they introduced a new bug.

@home-s-dev
Copy link

If I have pieced together correctly, old version of delay(0) results in calling __esp_delay(0)

extern "C" void __esp_delay(unsigned long ms) {
    if (ms) {
        os_timer_setfn(&delay_timer, (os_timer_func_t*)&delay_end, 0);
        os_timer_arm(&delay_timer, ms, ONCE);
    }
    else {
        esp_schedule();
    }
    esp_suspend();
    if (ms) {
        os_timer_disarm(&delay_timer);
    }
}

and eventually in esp_schedule() followed by esp_suspend().

On the contrary, new version of delay(0) requires more steps with overload versions of esp_delay and esp_try_delay.
In this last function, the argument passed to esp_delay (alias __esp_delay) can be modified according to recurrent scheduled functions (compute_scheduled_recurrent_grain):

bool esp_try_delay(const uint32_t start_ms, const uint32_t timeout_ms, const uint32_t intvl_ms) {
    uint32_t expired = millis() - start_ms;
    if (expired >= timeout_ms) {
        return true; // expired
    }

    // compute greatest chunked delay with respect to scheduled recurrent functions
    uint32_t grain_ms = std::gcd(intvl_ms, compute_scheduled_recurrent_grain());

    // recurrent scheduled functions will be called from esp_delay()->esp_suspend()
    esp_delay(grain_ms > 0 ?
        std::min((timeout_ms - expired), grain_ms):
        (timeout_ms - expired));

    return false; // expiration must be checked again
}

This manipulation can lead to a final call of __esp_delay with a non-zero argument.
In this scenario, esp_schedule is not triggered.

The delayMicroseconds function does not yield to other tasks, so I assume it can't be a valid option.

On the other hand, yield (alias __yield) seems very similar to former delay(0)

extern "C" void __yield() {
    if (cont_can_suspend(g_pcont)) {
        esp_schedule();
        esp_suspend_within_cont();
    }
    else {
        panic();
    }
}

Sorry for the long post.

@mp-se
Copy link

mp-se commented Feb 1, 2023

If I have pieced together correctly, old version of delay(0) results in calling __esp_delay(0)

extern "C" void __esp_delay(unsigned long ms) {
    if (ms) {
        os_timer_setfn(&delay_timer, (os_timer_func_t*)&delay_end, 0);
        os_timer_arm(&delay_timer, ms, ONCE);
    }
    else {
        esp_schedule();
    }
    esp_suspend();
    if (ms) {
        os_timer_disarm(&delay_timer);
    }
}

and eventually in esp_schedule() followed by esp_suspend().

On the contrary, new version of delay(0) requires more steps with overload versions of esp_delay and esp_try_delay. In this last function, the argument passed to esp_delay (alias __esp_delay) can be modified according to recurrent scheduled functions (compute_scheduled_recurrent_grain):

bool esp_try_delay(const uint32_t start_ms, const uint32_t timeout_ms, const uint32_t intvl_ms) {
    uint32_t expired = millis() - start_ms;
    if (expired >= timeout_ms) {
        return true; // expired
    }

    // compute greatest chunked delay with respect to scheduled recurrent functions
    uint32_t grain_ms = std::gcd(intvl_ms, compute_scheduled_recurrent_grain());

    // recurrent scheduled functions will be called from esp_delay()->esp_suspend()
    esp_delay(grain_ms > 0 ?
        std::min((timeout_ms - expired), grain_ms):
        (timeout_ms - expired));

    return false; // expiration must be checked again
}

This manipulation can lead to a final call of __esp_delay with a non-zero argument. In this scenario, esp_schedule is not triggered.

The delayMicroseconds function does not yield to other tasks, so I assume it can't be a valid option.

On the other hand, yield (alias __yield) seems very similar to former delay(0)

extern "C" void __yield() {
    if (cont_can_suspend(g_pcont)) {
        esp_schedule();
        esp_suspend_within_cont();
    }
    else {
        panic();
    }
}

Sorry for the long post.

Great investigation, you really traced the code. The question is if this change is a bug in the new arduino framework and it will be fixed in future versions or not. I would regard this as a bug in the arduino framework since it changes the behaviour and is not backwards compatible.

@256dpi
Copy link
Owner

256dpi commented Feb 4, 2023

Thank you all for the detailed investigation into this issue!

I just released a new version (2.5.1) of the library that increases the delay to 1ms which hopefully results in a proper scheduler yield on all RTOS platforms.

I refrained from adding compiler conditionals to check the existence of "yield()" to keep the code simple for the time being. But, if this does not fix the issue or causes others, we can look into adding those as necessary.

@256dpi
Copy link
Owner

256dpi commented Feb 5, 2023

I'm closing this. Please reopen if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants