[BUG] uORB framework hangs when error code 0 is returned by function #15787

Closed
1 task done
linguini1 opened this issue Feb 8, 2025 · 6 comments
Labels
  • Arch: arm - Issues related to ARM (32-bit) architecture
  • Area: OS Components - OS Components issues
  • OS: Linux - Issues related to Linux (building system, etc)
  • Type: Bug - Something isn't working

Comments

@linguini1
Contributor

linguini1 commented Feb 8, 2025

Description / Steps to reproduce the issue

When a driver-level implementation of the selftest function returns a 0 (OK) error code, the uORB framework hangs indefinitely.

Steps to reproduce:

  • Create a stub for the selftest function
  • Return a negated error code from the selftest function unconditionally
  • Call self-test through application layer using orb_ioctl(fd, SNIOC_SELFTEST, 0)
  • Observe failure with return of -1
  • Change lower half selftest to return 0
  • Observe application layer call hangs indefinitely

I observed this issue while writing a self-test function for the LSM6DSO32 IMU. When the selftest fails everything is fine, but when it passes the function hangs forever. I confirmed this by adding a log statement at the exit point and changing the return code to -EACCES just before the function returns: the log statement is executed and the call does not hang. It does hang if I remove the return code override and let the function return 0.

/* Other code above ...*/
early_ret:
  nxmutex_unlock(&dev->devlock);
  /* This gets printed and -1 shown by orb_ioctl when returning -EACCES */
  sninfo("Finished selftest");
  return -EACCES;
  /* Commenting out the above line and replacing with 'return 0' results in hang */

On which OS does this issue occur?

[OS: Linux]

What is the version of your OS?

Linux 6.13.1-arch1-1 #1 SMP PREEMPT_DYNAMIC GNU/Linux

NuttX Version

master

Issue Architecture

[Arch: arm]

Issue Area

[Area: OS Components]

Host information

No response

Verification

  • I have verified before submitting the report.
linguini1 added the "Type: Bug" label on Feb 8, 2025
github-actions bot added the "Arch: arm", "Area: OS Components", and "OS: Linux" labels on Feb 8, 2025
@Donny9
Contributor

Donny9 commented Feb 8, 2025

@linguini1 Hello:

> I observed this issue while writing a self-test function for the LSM6DSO32 IMU. When the selftest fails everything is fine, but when it passes the function hangs forever. I confirmed this by adding a log statement at the exit point and changing the return code to -EACCES just before the function returns: the log statement is executed and the call does not hang. It does hang if I remove the return code override and let the function return 0.

The process of the selftest is quite straightforward: orb_ioctl -> ioctl -> sensor_ioctl -> selftest. Have you enabled SENSORS_RPMSG for multicore control? If not, when your upper layer calls orb_ioctl, is there a possibility that the file descriptor (fd) passed in is incorrect?
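To make the expected behaviour of that call chain concrete, here is a minimal, self-contained model of it. This is only a sketch: every `mock_*` name is hypothetical and none of this is the actual NuttX code. It assumes the lower half returns 0 or a negated errno, the upper half forwards that value unchanged, and the application-facing wrapper follows the usual ioctl convention.

```c
#include <errno.h>

/* Hypothetical stand-in for a lower-half selftest: returns 0 on success
 * or a negated errno value on failure, like the stub described in this
 * issue.
 */

typedef int (*mock_selftest_t)(void);

int mock_selftest_pass(void)
{
  return 0;
}

int mock_selftest_fail(void)
{
  return -EACCES;
}

/* Hypothetical upper-half dispatch: forwards the lower-half result as is. */

int mock_sensor_ioctl(mock_selftest_t selftest)
{
  return selftest();
}

/* Hypothetical application-facing wrapper (assumption: ioctl convention).
 * A negative result becomes -1 with errno set; anything else is returned
 * unchanged.
 */

int mock_orb_ioctl(mock_selftest_t selftest)
{
  int ret = mock_sensor_ioctl(selftest);

  if (ret < 0)
    {
      errno = -ret;
      return -1;
    }

  return ret;
}
```

In this model a passing selftest simply yields 0 at the application layer and a failing one yields -1 with errno set to EACCES, so nothing in the chain itself treats 0 specially.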

> On a side note, orb_ioctl(fd, SNIOC_RESET, 0) returning -1 regardless of the error returned by the lowerhalf function is not documented anywhere I could find. I feel it should return a positive error code indicating the error that occurred, like ioctl does.

"orb_ioctl" is actually equivalent to "ioctl" in that it requires using errno to determine specific error codes. However, we have internally modified it to directly return errno. I will push this pull request (PR) shortly.
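For reference, the errno convention can be illustrated with plain POSIX `ioctl`. The sketch below is not the pending change to `orb_ioctl`; `ioctl_errno` is a hypothetical helper name, and it only shows how a caller can recover the specific error code when the call itself reports -1.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical helper: issue an ioctl and return 0 on success, or the
 * positive errno value when ioctl() reports failure with -1.
 */

int ioctl_errno(int fd, unsigned long request)
{
  if (ioctl(fd, request, 0) < 0)
    {
      /* ioctl() returned -1; the specific cause is in errno */

      return errno;
    }

  return 0;
}
```

Called with an invalid descriptor such as -1, the helper returns EBADF, whereas the raw `ioctl` call would only return -1.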

@linguini1
Contributor Author

> The process of the selftest is quite straightforward: orb_ioctl -> ioctl -> sensor_ioctl -> selftest. Have you enabled SENSORS_RPMSG for multicore control? If not, when your upper layer calls orb_ioctl, is there a possibility that the file descriptor (fd) passed in is incorrect?

I did take a look at this through the call graph and I'm unsure where this would cause an issue as well. I added some log statements to the ioctl implementation of sensor.c to see the value of ret once the lowerhalf selftest was called. The return value was what I expected. If I change the value of ret within the sensor.c ioctl implementation, the hang also stops. This is confusing me because I really don't see that value used anywhere else.

I know the file descriptor is valid because other ioctl calls that pass through to the lower-half control function work without issue. I do not have SENSORS_RPMSG enabled, and I am working on a single core right now.

> "orb_ioctl" is actually equivalent to "ioctl" in that it requires using errno to determine specific error codes. However, we have internally modified it to directly return errno. I will push this pull request (PR) shortly.

It's been a long day and it appears I forgot how ioctl works! I think setting errno is fine then, especially since it adheres more closely to ioctl.

@Donny9
Contributor

Donny9 commented Feb 8, 2025

> I did take a look at this through the call graph and I'm unsure where this would cause an issue as well. I added some log statements to the ioctl implementation of sensor.c to see the value of ret once the lowerhalf selftest was called. The return value was what I expected. If I change the value of ret within the sensor.c ioctl implementation, the hang also stops. This is confusing me because I really don't see that value used anywhere else.

@linguini1
Do you mean that the first call to selftest succeeds, but all subsequent ioctl calls hang? Is it because the lock cannot be acquired, or is it some other issue? Where is the task waiting? I have reviewed your driver code and it doesn't seem to have an asymmetric lock issue.
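For readers unfamiliar with the term, an asymmetric lock issue is a code path that takes a lock but returns without releasing it, so the next caller blocks forever. The sketch below only illustrates that class of bug, which was ruled out here; it is not a diagnosis of this issue. All names are hypothetical, and a simple flag with a non-blocking try-lock stands in for a real mutex so that the second call reports -EBUSY instead of hanging.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device with a flag standing in for a mutex */

struct mock_dev
{
  bool locked;
};

/* Non-blocking lock: returns -EBUSY where a real mutex would block */

static int mock_trylock(struct mock_dev *dev)
{
  if (dev->locked)
    {
      return -EBUSY;
    }

  dev->locked = true;
  return 0;
}

static void mock_unlock(struct mock_dev *dev)
{
  dev->locked = false;
}

/* Asymmetric: the early return on failure skips the unlock */

int selftest_asymmetric(struct mock_dev *dev, bool fail)
{
  int ret = mock_trylock(dev);

  if (ret < 0)
    {
      return ret;
    }

  if (fail)
    {
      return -EIO; /* Lock is still held here */
    }

  mock_unlock(dev);
  return 0;
}

/* Symmetric: every path after the lock leaves through one exit label */

int selftest_symmetric(struct mock_dev *dev, bool fail)
{
  int ret = mock_trylock(dev);

  if (ret < 0)
    {
      return ret;
    }

  if (fail)
    {
      ret = -EIO;
      goto early_ret;
    }

  ret = 0;

early_ret:
  mock_unlock(dev);
  return ret;
}
```

The symmetric variant uses the same single-exit `early_ret` shape as the driver snippet in the original report, which is why that code does not leak the lock.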

@linguini1
Contributor Author

> @linguini1 Do you mean that the first call to selftest succeeds, but all subsequent ioctl calls hang?

No, I mean the call to selftest only succeeds if it returns a non-zero error code. I cannot make any subsequent ioctl call due to the hanging. The hanging only seems to occur when 0 is returned from the selftest function.

The hanging doesn't occur in my implementation of selftest, though. I went to line 878 of sensor.c where the lowerhalf selftest is called, and added ret = -EINVAL on the next line. If I do this, orb_ioctl doesn't hang even if my lowerhalf selftest call returns 0. I cannot figure out why this would be because other ioctl commands in sensor.c return 0 with no problems. I will share an example code snippet that I have been using shortly.

@Donny9
Contributor

Donny9 commented Feb 8, 2025

> No, I mean the call to selftest only succeeds if it returns a non-zero error code. I cannot make any subsequent ioctl call due to the hanging. The hanging only seems to occur when 0 is returned from the selftest function.
>
> The hanging doesn't occur in my implementation of selftest, though. I went to line 878 of sensor.c where the lowerhalf selftest is called, and added ret = -EINVAL on the next line. If I do this, orb_ioctl doesn't hang even if my lowerhalf selftest call returns 0. I cannot figure out why this would be because other ioctl commands in sensor.c return 0 with no problems. I will share an example code snippet that I have been using shortly.

This is very strange. You can add some logging in file_vioctl to see what happens in the subsequent process. If the system hangs, does it manifest as a crash? Are there any logs available?

@linguini1
Contributor Author

I am not sure what changed, but this is working now. I will keep an eye on this because I encountered the same issue with the LIS2MDL driver I wrote, but I will test further before re-raising an issue so as not to waste anyone's time. Thank you @Donny9 for the help, and my apologies!


2 participants