UIP Code Retransmission Timeout problem #167

Closed
nielsonm236 opened this issue Apr 2, 2023 · 2 comments

@nielsonm236
Owner

While investigating Issue 159 I came back to a very old problem that I have looked into before and thought I had resolved, or at least mitigated. As code was added and the processor became busier I would see an increasing number of retransmissions of Browser packets and some Browser corruption. Adjustments to the repetition rate of the "periodic_service" call seemed to improve the situation, but those adjustments were gradually degrading MQTT performance.

I've finally determined what was going wrong, and the source of the issue is surprising. There is a problem in the "timer" functionality of the publicly sourced uip.c code. This problem is as follows:

  • The "timer" value used in the uip.c code to track timing of SYNACK responses and to determine when a retransmission is required is implemented incorrectly. The "timer" increments or decrements (depending on conditions) with every call of the uip code. The rate at which that code is called is determined by the "periodic_timer_expired" function which calls the "periodic_service" function. The periodic_timer_expired interval has been varied in code from 20ms to 80ms as code has been altered in an effort to provide the best MQTT performance while minimizing HTML retransmissions.

Why is this a problem?

  • This is a problem because the uip code uses the periodic_service call rate as its timebase. The biggest issue is that the RTO (Retransmission Timeout) value is set to "3" and is supposed to be specified in seconds. RTO is initialized to "3", per typical TCP implementations, in the uipopt.h file. But in the uip code it is compared to a timer "countdown" that runs at the periodic_service interval, NOT at 1 second intervals. This causes very frequent retransmissions of TCP segments, which generates more traffic and, in turn, more retransmissions. Over time I had been increasing the periodic_service call interval from 20ms to 80ms to decrease the retransmission problems. But I still had problems when I would operate my browser over a weak WIFI connection.
  • There are a few other timeouts that are specified assuming 1 second intervals, but the RTO timeout is the main issue.
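
To put numbers on the mismatch described above, here is a minimal sketch of the arithmetic. It assumes the countdown loses exactly one tick per periodic_service call; `effective_rto_ms` and `RTO_TICKS` are hypothetical names used only for illustration and are not part of the uip code.

```c
#include <assert.h>
#include <stdint.h>

/* RTO value of 3, as the issue describes it being initialized in uipopt.h.
 * RTO_TICKS is an illustrative name, not the real macro. */
#define RTO_TICKS 3u

/* Hypothetical helper: wall-clock time before a retransmission fires when
 * the countdown is decremented once per call of the periodic routine. */
uint32_t effective_rto_ms(uint32_t rto_ticks, uint32_t call_interval_ms)
{
    assert(call_interval_ms > 0u);
    return rto_ticks * call_interval_ms;
}
```

Under that assumption, `effective_rto_ms(RTO_TICKS, 20)` is 60 ms and `effective_rto_ms(RTO_TICKS, 80)` is 240 ms, while the intended 1 second timebase (`effective_rto_ms(RTO_TICKS, 1000)`) gives 3000 ms. Even at the 80ms setting the timeout is more than ten times shorter than intended, which is consistent with retransmissions still showing up on a slow link.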

So you might think "well the periodic_service routine should be called at 1 second intervals to match the way the uip timer code was written".

  • NO. Then we would only transmit TCP segments at 1 second intervals (at best), clearly not what is wanted for MQTT OR HTML.

The fix:

  • I've modified the uip code so that it can be called at 20ms intervals to maximize MQTT performance by maximizing the rate at which MQTT messages can be transacted. I've also added a timing function within the uip code so that its "timer" operates at 1 second intervals regardless of the periodic_service call rate.
  • This is a simple change, but it required accepting the idea that such a basic error was present in the public uip code. It is surprising to me that the uip code worked at all given this revelation, but it explains why I've had so many HTML and MQTT timing issues each time the code has been altered.
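
The issue does not show the actual change, so the following is only a sketch of one way the idea could look: the periodic routine still runs every 20ms, but the retransmission countdown is decremented only when a full second has accumulated. All names here (`second_gate_t`, `second_gate_tick`, `rto_countdown_step`) are hypothetical and do not come from the uip code or from the project.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state for deriving a 1 second tick from a faster call rate. */
typedef struct {
    uint32_t call_interval_ms; /* how often the periodic routine runs */
    uint32_t elapsed_ms;       /* time accumulated since the last 1 s tick */
} second_gate_t;

void second_gate_init(second_gate_t *g, uint32_t call_interval_ms)
{
    assert(call_interval_ms > 0u && call_interval_ms <= 1000u);
    g->call_interval_ms = call_interval_ms;
    g->elapsed_ms = 0u;
}

/* Call once per periodic pass. Returns 1 when a full second has elapsed. */
int second_gate_tick(second_gate_t *g)
{
    g->elapsed_ms += g->call_interval_ms;
    if (g->elapsed_ms >= 1000u) {
        g->elapsed_ms -= 1000u; /* keep the remainder so no time is lost */
        return 1;
    }
    return 0;
}

/* Hypothetical retransmission countdown: decrements only on 1 s ticks.
 * Returns 1 on a tick where the countdown is at zero (time to retransmit). */
int rto_countdown_step(second_gate_t *g, uint8_t *timer)
{
    if (!second_gate_tick(g)) {
        return 0; /* fast pass: other per-call work can still run at 20ms */
    }
    if (*timer > 0u) {
        --*timer;
    }
    return *timer == 0u;
}
```

With `second_gate_init(&g, 20)` and a countdown starting at 3, `rto_countdown_step` first returns 1 on the 150th call, which is 3 seconds of wall-clock time, while the routine itself keeps running every 20ms. Accumulating milliseconds rather than counting calls means the gate keeps working if the call interval is changed again later.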

I've been testing for a few days now and only rarely see a retransmission timeout, typically when I deliberately connect to a weak WIFI signal. For the general case performance is greatly improved. I want to test further because I am still seeing an occasional "connection reset" error when doing code upgrades over ethernet. In that instance the upload is working fine, but the browser repaint after upload completion seems to get interrupted. That issue has also always been present and may be due to a different problem.

@nielsonm236
Owner Author

Testing has gone well so I think I might go ahead with a release. In my systems I'm very pleased with how well this is working.

I may have been too harsh with regard to the uip.c code. If I consider that it was really designed for HTML interaction (not MQTT), and may have been designed for environments where there was a LOT more RAM available so that entire HTML transactions fit in a single segment (not broken up into many segments as I have to do), then maybe 1 transaction per second would be acceptable. We don't have that: we must handle MQTT transactions on the order of tens of milliseconds, and due to RAM limitations even simple webpages are transmitted as multiple segments. Enough said ... it is fixed now.

I haven't been able to recreate the "connection reset" error. But I think it is still there. I will open a new issue to track that.

nielsonm236 added a commit that referenced this issue Apr 16, 2023
Addressed Issues #159, #163, #166, #167, #170 and several documentation changes
@nielsonm236
Owner Author

Addressed in release 20230416 1116
