
aws_iot_shadow_yield() does not account for next pingtime when passing timer to iot_tls_read #38

Closed
tim-nordell-nimbelink opened this issue Aug 1, 2016 · 11 comments

Comments

@tim-nordell-nimbelink

I'm assuming the iot_tls_read(...) call should block until it receives any amount of data or until the passed timer expires. If this assumption is correct, then the _aws_iot_mqtt_internal_yield(...) function needs to be updated as suggested below. (The current sample mbedtls implementation does not have this behavior, which I'm guessing is a bug; the fix for it is included below in order to expose the other bug.)

The _aws_iot_mqtt_internal_yield(...) function invokes the iot_tls_read(...) call (via aws_iot_mqtt_internal_cycle_read(...)) with the full timeout value; it should instead pass the lesser of the time until the next ping must be sent and the overall timeout. To demonstrate, fix the mbedtls implementation to block for the full time, and set the overall timeout for the yield function to a value longer than the interval within which the unit must send another ping to the remote. Here's a patch that exposes the issue:

diff --git a/platform/linux/mbedtls/network_mbedtls_wrapper.c b/platform/linux/mbedtls/network_mbedtls_wrapper.c
index a8a3486..82c2e76 100644
--- a/platform/linux/mbedtls/network_mbedtls_wrapper.c
+++ b/platform/linux/mbedtls/network_mbedtls_wrapper.c
@@ -309,7 +309,7 @@ IoT_Error_t iot_tls_read(Network *pNetwork, unsigned char *pMsg, size_t len, Tim
                ret = mbedtls_ssl_read(&(tlsDataParams->ssl), pMsg, len);
                if(ret >= 0) { /* 0 is for EOF */
                        rxLen += ret;
-               } else if(ret != MBEDTLS_ERR_SSL_WANT_READ) {
+               } else if(ret != MBEDTLS_ERR_SSL_WANT_READ && ret != MBEDTLS_ERR_SSL_TIMEOUT) {
                        isErrorFlag = true;
                }

diff --git a/samples/linux/shadow_sample/shadow_sample.c b/samples/linux/shadow_sample/shadow_sample.c
index 9a0b270..3bd10dc 100644
--- a/samples/linux/shadow_sample/shadow_sample.c
+++ b/samples/linux/shadow_sample/shadow_sample.c
@@ -230,7 +230,7 @@ int main(int argc, char **argv) {

        // loop and publish a change in temperature
        while(NETWORK_ATTEMPTING_RECONNECT == rc || NETWORK_RECONNECTED == rc || SUCCESS == rc) {
-               rc = aws_iot_shadow_yield(&mqttClient, 200);
+               rc = aws_iot_shadow_yield(&mqttClient, 60000);
                if(NETWORK_ATTEMPTING_RECONNECT == rc) {
                        sleep(1);
                        // If the client is attempting to reconnect we will skip the rest of the loop.

The resulting behavior is that the client keeps disconnecting due to not sending a ping, and it must reconnect each time the 60 second timer elapses.
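The fix being suggested boils down to clamping the read timeout. A minimal sketch, assuming a helper with an illustrative name (this is not the SDK's actual API): the timeout handed to iot_tls_read(...) should be the lesser of the yield time remaining and the time until the next ping is due.

```c
#include <stdint.h>

/* Hypothetical sketch of the suggested fix: instead of handing the whole
 * yield timeout to iot_tls_read(), clamp it to the time remaining before
 * the next PINGREQ must go out.  Names are illustrative, not the SDK's. */
static uint32_t clamp_read_timeout_ms(uint32_t yield_remaining_ms,
                                      uint32_t ping_due_in_ms)
{
    /* Return the lesser of the two, so a long yield cannot starve the
     * keep-alive ping. */
    return (ping_due_in_ms < yield_remaining_ms) ? ping_due_in_ms
                                                 : yield_remaining_ms;
}
```

With a 60000 ms yield and a ping due in 30000 ms, the read would block at most 30000 ms, leaving the yield loop free to send the PINGREQ on time.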

@tim-nordell-nimbelink tim-nordell-nimbelink changed the title aws_iot_shadow_yield() does not send pings while waiting in mbedtls_read aws_iot_shadow_yield() does not account for next pingtime when passing timer to iot_tls_read Aug 1, 2016
@chaurah
Contributor

chaurah commented Aug 1, 2016

Hi Tim,
The read function is slightly messy because it doesn't actually use the timer that is passed in as the tls command timeout. Instead, the read timeout is set at the end of the connect function here.

We were seeing unexpected behavior when we tried to set this on every iteration of the loop in the tls_read function. That is something we already expect to fix in a future release. You can enable the commented line here to get the behavior you are looking for, but it's not well tested on our end at the moment.

All of that being said, I am not entirely sure I understood your post correctly. You were trying to test the blocking behavior of the tls_read call, is that correct? Was the above information useful?

Rahul

@tim-nordell-nimbelink
Author

Rahul -

No, in the port to the embedded platform I'm working on (STM32F401 Nucleo), I had implemented the iot_tls_read(...) call to either:

  1. Return upon receiving any data
  2. Return upon timer expiration
  3. Return upon TLS error

I noticed when running the shadow example on the port that if I bumped the yield time to a large number, such as 60 seconds, the library ends up losing the connection to the primary server. This is because the library never sends a ping command: the iot_tls_read(...) call waits for one of the three conditions above for the full 60 seconds, even though the remote server expected a ping within a shorter interval. (It should have set the timeout passed to iot_tls_read(...) to the time until the next scheduled PINGREQ command.)
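The three return conditions above can be captured as a small decision routine. This is a sketch of the port's behavior under the stated assumptions; the function name and signature are illustrative, not the SDK's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative logic for the three return conditions described above: a
 * port's iot_tls_read() keeps polling until it has read some data, the
 * caller's timer expires, or the TLS layer reports a hard error. */
static bool tls_read_should_keep_waiting(size_t bytes_received,
                                         bool timer_expired,
                                         bool tls_error)
{
    if (tls_error)          return false;  /* condition 3: TLS error      */
    if (bytes_received > 0) return false;  /* condition 1: got some data  */
    return !timer_expired;                 /* condition 2: timer expired  */
}
```

A read honoring these conditions for the full 60 second yield timeout is exactly what keeps the library from interleaving a PINGREQ.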

If the iot_tls_read(...) function is not supposed to use the timer at all... then the documentation should be updated.

Tim

@tim-nordell-nimbelink
Author

(The diff above made the sample TLS implementation respect the timer so that the bug can be exposed in the main codebase, since the one bug is hiding the other.)

@chaurah
Contributor

chaurah commented Aug 2, 2016

Hi Tim,
The iot_tls_read function does use the timer. It's just not passing the timer to the underlying TLS layer directly; instead it uses it in the loop that calls the TLS layer's read.

In your particular case, yes, you are correct that the ping timer value is the one that should be used as the timeout. Let me run this by the rest of the team to see how best to handle this case. At the very least, a documentation update pointing this out is required.

Thank you for pointing out this issue. Please let us know if you have any further suggestions.

Rahul

@rongsaws

rongsaws commented Aug 19, 2016

This is by design. See the comment below extracted from the code - https://github.com/aws/aws-iot-device-sdk-embedded-C/blob/master/src/aws_iot_mqtt_client_yield.c#L165-L168

Yield() must be called at a rate faster than the keepalive interval. It must also be called at a rate faster than the incoming message rate as this is the only way the client receives processing time to manage incoming messages.

Documentation should be updated to call this out clearly, so that yield() is called frequently enough and the wait time passed to aws_iot_shadow_yield() is shorter than the keep-alive timeout.

@tim-nordell-nimbelink
Author

tim-nordell-nimbelink commented Aug 19, 2016

In this case, the function has control for the entire keep alive interval with additional time beyond this so it's not exactly a rate thing. One would expect that the function would do the right thing in the case that it holds onto control of the thread for longer than the keep alive interval.

(If this should never keep control beyond the keep alive interval by design, the function could truncate the amount of time it holds control to that amount. Another way of looking at this function is that one is saying "do what you need to do for X seconds" - it should properly manage the internal state to do this, including returning early if it really needs another external invocation, but ideally it would just keep doing its thing internally until the X seconds have elapsed.)

@rongsaws

The keep-alive interval is controlled through pClient->pingTimer, which is a timer independent of the yield timeout. As long as the yield wait time is small enough (compared to keepAliveInterval), the keep alive message rate will be somewhat constant.
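The invariant here can be sketched as a small piece of timer logic. The function below is an illustrative stand-in for the SDK's pingTimer expiry check, not its actual API: if the yield wait is W ms, the ping check can run up to W ms late, so W must stay well under the keep-alive interval K.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: the ping check runs once per yield cycle, so with a
 * yield wait of W ms the check can be up to W ms late.  ping_is_due() is an
 * illustrative stand-in for the SDK's pingTimer expiry check. */
static bool ping_is_due(uint32_t now_ms, uint32_t last_ping_ms,
                        uint32_t keepalive_ms)
{
    /* Unsigned subtraction keeps this correct across tick wraparound. */
    return (now_ms - last_ping_ms) >= keepalive_ms;
}
```

With a 60000 ms yield wait and a 30000 ms keep-alive interval, the check cannot run until long after the ping was due, which is the disconnect Tim observed.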

@tim-nordell-nimbelink
Author

Regardless, a yield() style function should be able to maintain its internal state when it's given ample timeout periods, such as minutes at a time.

(The "rate" wording in the documentation would normally lead one to conclude that it has to be run within that amount of time so that messages get through, not to keep the client functional - e.g. no gaps in control to the yield function greater than the keep alive interval or the incoming message rate. In this case, the function has control of the thread within that "rate", so it should just work.)

@rongsaws

I agree that the API documentation should be improved. I think the right strategy for calling aws_iot_mqtt_yield() is to call it as often as possible, from, say, the application's main event loop. To prevent it from exhausting the CPU in the case where the main loop doesn't have a blocking wait mechanism, the timeout_ms argument is there to prevent busy looping by adding a blocking wait on network receive for a maximum length given by timeout_ms. If the app's main event loop has some kind of blocking wait built in, one could specify 0 for timeout_ms when calling aws_iot_mqtt_yield(), as long as it's called frequently enough.

(Note, of course, a 0 or extremely small yield timeout could lead to the issue you reported in #35, to which I believe I already have a solution. The idea basically is to use a separate timer - a packet receiving timer - to time the packet reads from the 2nd read here and onwards. The packet receiving timer is the maximum time we will wait until a full MQTT packet is received; it's a constant timer, so it won't be affected by whatever time the yield cycle has remaining.)
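The packet receiving timer idea can be sketched as follows. The constant and the function name are assumptions for illustration only; this is not the SDK's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the "packet receiving timer" idea: the first read of a yield
 * cycle blocks for whatever time the cycle has left, but once the first
 * byte of an MQTT packet has arrived, the remaining reads are timed by a
 * fixed per-packet timeout so a tiny yield budget cannot truncate a
 * packet mid-read.  The constant and names are illustrative assumptions. */
#define PACKET_RECEIVE_TIMEOUT_MS 2000u

static uint32_t next_read_timeout_ms(bool first_read_of_packet,
                                     uint32_t yield_remaining_ms)
{
    /* First read: bounded by the yield cycle's remaining time. */
    if (first_read_of_packet) {
        return yield_remaining_ms;
    }
    /* Subsequent reads of the same packet: fixed timer, independent of
     * how little time the yield cycle has left. */
    return PACKET_RECEIVE_TIMEOUT_MS;
}
```

The design choice is that a packet, once started, is read to completion on its own clock, so a small timeout_ms only bounds how long yield waits for a packet to begin.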

@tim-nordell-nimbelink
Author

tim-nordell-nimbelink commented Aug 19, 2016

The only point I disagree with is what the permitted range of the timeout_ms parameter should be, by design, when invoking the ..._yield() call.

For it to be friendly to folks using the library (without their needing to know specific quirks of how it operates), it should accept an arbitrary timeout and not let the parameter knock the library's internal state out of whack. E.g.

while(1)
{
   aws_iot_mqtt_yield(..., 100);
}

should be just as valid as:

while(1)
{
    aws_iot_mqtt_yield(..., 60000);
}

but currently they are not equivalent despite the second case having control of the CPU at the time the next keep alive should be sent.

(Or in replacing the above with aws_iot_shadow_yield(...).)

I digress.

@rongsaws

That part I have already agreed with you on :) . We could either improve the documentation or throw an error if the given timeout_ms is greater than the keep alive interval. In reality, the keep alive interval is generally much greater than timeout_ms for cost and efficiency reasons.

@chaurah chaurah added the bug label Nov 19, 2016
tgsong pushed a commit to tgsong/aws-iot-device-sdk-embedded-C that referenced this issue Nov 13, 2020
Implement http deinit function