Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(freertos): Made select function non-blocking on Linux target (IDFGH-13569) #14457

Merged
merged 1 commit into from
Sep 14, 2024

Conversation

snake-4
Copy link
Contributor

@snake-4 snake-4 commented Aug 28, 2024

The select function wrapper was rewritten to be non-blocking on Linux systems, as it was stealing all the CPU time from lower priority tasks when called from a higher priority task. This is because the FreeRTOS scheduler does not know that the task thread is sleeping during the system call.

This issue manifests all "slow" system calls on the Linux target, but handling the case of select fixes the problems for most ESP-IDF components.

The FreeRTOS POSIX port documentation lists this as a known issue, so user code is responsible handling this case if other system calls are used, even if unknowingly.

This closes the GH issue #14395 "select() blocks the FreeRTOS scheduler on Linux target".

Additional Notes

  1. I'd suggest adding a note and a reference to this section of the FreeRTOS documentation to the ESP-IDF documentation.
  2. The fix can be confirmed by running the code in my previous bug report select() blocks the FreeRTOS scheduler on Linux target (IDFGH-13498) #14395.

Copy link

github-actions bot commented Aug 28, 2024

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello snake-4, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

🖊️ Please also make sure you have read and signed the Contributor License Agreement for this project.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against 8a39db3

@snake-4 snake-4 force-pushed the linux-select-fix branch 2 times, most recently from 12b141f to 1cc8e5c Compare August 28, 2024 11:28
@snake-4 snake-4 changed the title fix(freertos): Made select function non-blocking on Linux target fix(freertos)!: Made select function non-blocking on Linux target Aug 28, 2024
@espressif-bot espressif-bot added the Status: Opened Issue is new label Aug 28, 2024
@github-actions github-actions bot changed the title fix(freertos)!: Made select function non-blocking on Linux target fix(freertos)!: Made select function non-blocking on Linux target (IDFGH-13569) Aug 28, 2024
@snake-4 snake-4 changed the title fix(freertos)!: Made select function non-blocking on Linux target (IDFGH-13569) fix(freertos): Made select function non-blocking on Linux target (IDFGH-13569) Aug 28, 2024
Copy link
Contributor

@0xjakob 0xjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @snake-4, this looks like a great improvement!

The only question I have here is how did you determine the number of 10 ticks for waiting? Is there a reasoning why it's exactly 10 or could it be something else, too?

Copy link
Collaborator

@david-cermak david-cermak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@snake-4
Copy link
Contributor Author

snake-4 commented Sep 4, 2024

Hi @snake-4, this looks like a great improvement!

The only question I have here is how did you determine the number of 10 ticks for waiting? Is there a reasoning why it's exactly 10 or could it be something else, too?

There's no specific reason why it's 10 but it has to be at least 1 tick to let the FreeRTOS scheduler know that lower priority tasks may run. Although, at 1 tick delay, it would take up 50% of the simulator's CPU time (not the actual host CPU time). A delay of 10 ticks seemed like a good trade-off, considering the select timeout resolution is affected by it as well.

I'll refer to threads that are not scheduled by the FreeRTOS scheduler as raw threads and FreeRTOS tasks as just tasks below.

The reason why polling is used is because while it is possible to signal raw threads from tasks (using pthread), other way around is not possible as a FreeRTOS semaphore would have to be used. The simulator does not expect the FreeRTOS APIs to be called from raw threads and doing so will cause issues as mentioned here.

A method for achieving what I've said above could be done by modifying the FreeRTOS scheduler's internal state to immediately schedule the task waiting on IO upon completion, but this approach would require modifying the simulator's source code.

@snake-4
Copy link
Contributor Author

snake-4 commented Sep 4, 2024

As per my original post, all blocking host system calls (except select) will be considered as "work" by the FreeRTOS scheduler. This includes usleep as well.

The select function wrapper was rewritten to be non-blocking
on Linux systems, as it was stealing all the CPU time
from lower priority tasks when called from a higher priority task.
This is because the FreeRTOS scheduler does not know
that the task thread is sleeping during the system call.

This issue manifests all "slow" system calls on the Linux target,
but handling the case of select fixes the problems for most ESP-IDF components.

The FreeRTOS POSIX port documentation lists this as a known issue,
so user code is responsible handling this case if other system calls are used,
even if unknowingly.

This closes GH issue espressif#14395 "select() blocks the FreeRTOS scheduler on Linux target"
@david-cermak
Copy link
Collaborator

Thanks for explaining 👍 and for the updates.

will add this PR to our internal pipeline to be merged it to master (after all internal checks pass, which may take some time)
Thank you again for your contribution!

@espressif-bot espressif-bot added Status: Reviewing Issue is being reviewed Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Opened Issue is new Status: Reviewing Issue is being reviewed labels Sep 6, 2024
@espressif-bot espressif-bot merged commit 8a39db3 into espressif:master Sep 14, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants