[libbladeRF] Device hangs after repeated open/stream/close #408
Comments
This issue seems to be only on OSX, based on my current testing. I'm currently trying to determine if we can better handle failed canceled transfers by forcibly entering the "Stream done" state and continuing to shut down:
However, this is yielding segfaults; more digging and review of the code paths in this condition are required.
Just wanted to chime in and say that I was also seeing this on OSX. I have been experimenting with creating a tool similar to …
@staticfloat I can now reproduce this on all OSes via the libbladeRF_repeated_stream test program. I did not see this in the past, but it now happens even when I revert to very early changesets that I do not recall exhibiting it. After the upcoming RC1 I'm going to try to dive more into this. This is what causes the lockup in the DC calibration table generation. For the time being, I am working around it by using a single stream (not repeatedly opening/closing) and using timestamps to determine when to receive samples post-retune. That approach may work well for you.
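As a rough illustration of the timestamp approach mentioned above, the arithmetic for deciding which samples to keep after a retune might look like the following. This is only a sketch: the function names and the settling-time parameter are hypothetical, and it assumes timestamps count samples at the configured sample rate. It does not call libbladeRF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: first timestamp at which samples are assumed to be
 * valid after a retune issued at retune_ts. settle_us is an assumed
 * settling time, not a value taken from libbladeRF. */
uint64_t first_valid_timestamp(uint64_t retune_ts, uint32_t sample_rate_hz,
                               uint32_t settle_us)
{
    /* Round up so no sample from inside the settling window is kept. */
    uint64_t settle_samples =
        ((uint64_t)sample_rate_hz * settle_us + 999999u) / 1000000u;
    return retune_ts + settle_samples;
}

/* Hypothetical helper: how many leading samples of a buffer that starts at
 * buf_ts should be dropped because they precede valid_ts. */
size_t samples_to_discard(uint64_t buf_ts, size_t buf_len, uint64_t valid_ts)
{
    if (buf_ts >= valid_ts) {
        return 0;
    }
    uint64_t gap = valid_ts - buf_ts;
    return gap >= buf_len ? buf_len : (size_t)gap;
}
```

With a single long-running stream, each received buffer's timestamp would be checked this way instead of tearing the stream down and restarting it around every retune.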
This code is a re-designed set of DC calibration functionality built atop libbladeRF. This functionality provides the ability to:
- Run LMS6002D DC calibrations per module (or all at once)
- Search for nominal RX/TX DC correction values for a single frequency or a range of frequencies.
It avoids inducing issue #408 by utilizing a single stream, rather than repeatedly starting and stopping streams. No TX thread is used, as this is unnecessary; it is sufficient to flush 0+0j samples through the TX path at the start of calibration and allow the DAC to remain held in this state as the FPGA (intentionally) underruns.
…transfer is called on one transfer, all transfers on the same endpoint are cancelled. Calling libusb_cancel_transfer on additional transfers on the same endpoint _may_ result in macOS returning kIOReturnAborted. The net result is that the bladeRF can no longer be communicated with. The fix is to call libusb_cancel_transfer only once in cancel_all_transfers and set the appropriate state for all transfers. Refer to the libusb_cancel_transfer documentation for more details.
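The "cancel once, mark all" idea described above can be sketched as follows. This is a simplified model, not the actual libbladeRF change: the state enum, the struct, and the cancel_fn callback (which stands in for libusb_cancel_transfer) are hypothetical, and it covers only the case where a single cancel request takes down every transfer on the endpoint.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transfer states; not libbladeRF's actual enum. */
enum xfer_state { XFER_IDLE, XFER_IN_FLIGHT, XFER_CANCEL_PENDING };

struct xfer {
    enum xfer_state state;
};

/* Hypothetical stand-in for libusb_cancel_transfer():
 * returns 0 on success, nonzero on failure. */
typedef int (*cancel_fn)(struct xfer *x, void *ctx);

/* Issue a single cancel request for the endpoint, then mark every
 * in-flight transfer as awaiting its cancellation callback.
 * Returns the number of transfers marked, or -1 if the request failed. */
int cancel_all_transfers_once(struct xfer *xfers, size_t n,
                              cancel_fn cancel, void *ctx)
{
    bool requested = false;
    int marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (xfers[i].state != XFER_IN_FLIGHT) {
            continue;
        }
        if (!requested) {
            if (cancel(&xfers[i], ctx) != 0) {
                return -1;
            }
            requested = true;
        }
        xfers[i].state = XFER_CANCEL_PENDING;
        marked++;
    }
    return marked;
}

/* Stub modelling the behaviour described above: the first cancel request
 * succeeds, any further request on the same endpoint fails. ctx points to
 * an int call counter. */
int macos_like_cancel(struct xfer *x, void *ctx)
{
    int *calls = ctx;
    (void)x;
    return (*calls)++ == 0 ? 0 : -1;
}
```

Run against the stub, the loop makes exactly one cancel call no matter how many transfers are in flight, so the failing second call is never reached.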
It has been reported that running a long series of repeated "open, stream samples, close" operations can cause the device to hang. It is generally not possible to open the device after this, as it's left in the "running" state.
I've used the following scripts to reproduce this, and see the fault after roughly a thousand iterations:
rx_test.sh:
test_script.txt:
The hangup occurs when we request the device to shut down.
When ending the stream, we tell the async handler to stop submitting new transfers and to denote when all pending transfers have come back (either with data, or with the error code resulting from us requesting that pending transfers be canceled).
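The shutdown handshake described above can be modelled roughly as below. This is a sketch only: the state names and the in-flight counter are hypothetical and do not mirror libbladeRF's internals.

```c
#include <stddef.h>

/* Hypothetical stream states for illustration. */
enum stream_state { STREAM_RUNNING, STREAM_SHUTTING_DOWN, STREAM_DONE };

struct stream {
    enum stream_state state;
    size_t in_flight; /* transfers submitted but not yet returned */
};

/* Ask the stream to stop: no new transfers are submitted from here on. */
void stream_request_shutdown(struct stream *s)
{
    if (s->state == STREAM_RUNNING) {
        s->state = (s->in_flight == 0) ? STREAM_DONE : STREAM_SHUTTING_DOWN;
    }
}

/* Called once per returned transfer, whether it carried data or came back
 * cancelled. Returns 1 if the caller should resubmit it, 0 otherwise. */
int stream_on_transfer_returned(struct stream *s)
{
    if (s->state == STREAM_RUNNING) {
        return 1; /* resubmitted, so it stays in flight */
    }
    if (s->in_flight > 0) {
        s->in_flight--;
    }
    if (s->in_flight == 0) {
        s->state = STREAM_DONE;
    }
    return 0;
}
```

In this model the stream only reaches the done state once every pending transfer has come back. If the callbacks for cancelled transfers never arrive, the counter never reaches zero and the waiter blocks, which matches the hang reported here.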
However, the below log shows that an unexpected libusb failure occurs when requesting that transfers be canceled. Furthermore, the "Wait on state change failed" message indicates that none of the pending transfers are completing.
In moving forward with this, I propose the following tasks: