CUDA Accelerated FFT #1166
Comments
I have tried a GitHub project called radio-core, which uses CUDA acceleration for broadcast FM demodulation. It seems to make a significant impact in reducing system load.
Also, I’m using a HackRF, so there is a lot of data being processed. From what I understand, the HackRF internal filters only work properly with a sample rate of over 10 MHz (although I might have misunderstood).
But if I’m being honest, my main motivation is probably “well, the SDR software I use has GPU acceleration, look at my cool setup with all of this compute power!”.
That's what I meant: GPU acceleration just for the sake of it... I am fine with breaking SDRangel for this. I don't care; I have an NVIDIA graphics card, which by the way I think is superior to other graphics cards.
@f4exb Actually, I was just looking at VkFFT, which I hadn’t seen before. If it doesn’t have any breaking bugs, then maybe it could be useful for people with AMD cards as well? Also, I hope I’m not asking too many questions, but other than the waterfall, is there anything else that uses FFTW? I’m fairly new to this field, but I’m enjoying studying the code to see how things work.
There are a couple of other plugins that use FFTs as well, but all the code does this indirectly via the FFTEngine and FFTFactory classes. So what you probably want to look at is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system and, if not, an FFTWEngine, or something along those lines. See sdrbase/dsp/fftwengine.h, kissengine.h (which is an alternative to FFTW) and FFTFactory.cpp. It looks like it should be fairly straightforward to drop a different implementation in there. While I doubt it will be of much benefit to existing plugins, high-performance FFT and IFFT could be useful for OFDM modems in the future.
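To make the factory idea above concrete, here is a minimal, self-contained sketch. Every name in it (`SketchFFTEngine`, `SketchCpuEngine`, `SketchGpuEngine`, `makeEngine`) is hypothetical, and the real FFTEngine and FFTFactory interfaces in sdrbase/dsp may look quite different. A naive DFT stands in for FFTW and the "GPU" engine is only a placeholder that reuses it, so the sketch shows the shape of the dispatch and nothing about performance.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Complex = std::complex<float>;

// Hypothetical stand-in for the abstract engine interface.
class SketchFFTEngine {
public:
    virtual ~SketchFFTEngine() = default;
    virtual std::string name() const = 0;
    virtual std::vector<Complex> forward(const std::vector<Complex>& in) = 0;
};

// CPU fallback. A naive O(n^2) DFT stands in for FFTW so that the sketch
// needs no external library.
class SketchCpuEngine : public SketchFFTEngine {
public:
    std::string name() const override { return "cpu"; }
    std::vector<Complex> forward(const std::vector<Complex>& in) override {
        const std::size_t n = in.size();
        const double twoPi = 2.0 * std::acos(-1.0);
        std::vector<Complex> out(n);
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> acc(0.0, 0.0);
            for (std::size_t t = 0; t < n; ++t) {
                const double angle = -twoPi * static_cast<double>(k * t) / static_cast<double>(n);
                acc += std::complex<double>(in[t].real(), in[t].imag()) * std::polar(1.0, angle);
            }
            out[k] = Complex(static_cast<float>(acc.real()), static_cast<float>(acc.imag()));
        }
        return out;
    }
};

// Placeholder for a GPU-backed engine. A real one would call into VkFFT or
// cuFFT here; this one reuses the CPU path so the sketch stays runnable.
class SketchGpuEngine : public SketchCpuEngine {
public:
    std::string name() const override { return "gpu-placeholder"; }
};

// Factory: hand out the GPU engine only when the caller says it is usable.
std::unique_ptr<SketchFFTEngine> makeEngine(bool gpuUsable) {
    if (gpuUsable) {
        return std::make_unique<SketchGpuEngine>();
    }
    return std::make_unique<SketchCpuEngine>();
}
```

Callers only ever hold a `SketchFFTEngine` pointer, so plugins would not need to change when a new backend is added.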
@srcejon Awesome, I saw a reference to an alternative FFT engine in the source files, but I didn’t quite know where it fit into the rest of things. I went ahead and forked the repo, and I’m running some benchmarks on VkFFT now as well.
The "switch" is in https://github.com/f4exb/sdrangel/blob/master/sdrbase/dsp/fftengine.cpp and is based on global defines set in the CMakeLists.txt in sdrbase: https://github.com/f4exb/sdrangel/blob/master/sdrbase/CMakeLists.txt#L9 For now it checks if [...] I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW by some compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on Raspberry Pi.
Ideally, I would have thought it should be a runtime decision rather than a compile-time one, so that binary releases can use Vulkan etc. if they are available, but can still fall back to FFTW if not. We don't really want to do multiple builds. That's assuming the list of new dependencies isn't problematic.
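A runtime decision of this kind could be sketched as an ordered list of candidates, each with an availability probe, ending in a guaranteed fallback. The names `BackendCandidate` and `chooseBackend` are made up for illustration, and the probes are left as callbacks because how one would actually detect a working Vulkan or CUDA setup is outside what this thread establishes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one candidate FFT backend.
struct BackendCandidate {
    std::string name;
    // Runtime probe supplied by the caller; returns true if the backend
    // can be used on this machine.
    std::function<bool()> isAvailable;
};

// Return the first candidate whose probe succeeds, otherwise the fallback.
std::string chooseBackend(const std::vector<BackendCandidate>& preferred,
                          const std::string& fallback) {
    assert(!fallback.empty()); // the fallback must always exist
    for (const auto& candidate : preferred) {
        if (candidate.isAvailable && candidate.isAvailable()) {
            return candidate.name;
        }
    }
    return fallback;
}
```

With this shape a single binary can ship all backends and pick one at start-up, at the cost of carrying the extra dependencies in every build.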
Info taken from the latest KrakenSDR update: “investigation into the possibility of using the GPU on the Pi 4 to compute the FFTs required in our algorithms faster via Vulkan and VkFFT. Long story short, for larger FFTs it seems that the Pi 4 GPU is capable of about a 2x speedup. However, an issue is that the Pi 4 Vulkan implementation is very new, and in its current state is missing an important feature relating to memory transfer. Without this feature, there is a need to perform unnecessary memory transfers and this brings us back to a 1x speedup. But we have considered that even without any speedup, using the GPU essentially provides us with another computational core which may still be of use as it frees up the CPU cores for other tasks.”
Has anyone cracked this nut?
I have just tried adding the CUDA version of VkFFT and, at the moment, it looks much slower than FFTW. That could be because I've done something wrong, but it is probably because we're just performing a single FFT serially, and there's too much overhead in getting the data in and out of the GPU.
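The overhead argument can be illustrated with a toy cost model: if each GPU call pays a fixed cost for transfers and launch, that cost is spread over however many FFTs are submitted together. The function name and the figures in the usage note are invented for illustration and are not measurements from SDRangel or VkFFT.

```cpp
#include <cassert>

// Toy model: average time per FFT when a fixed per-call overhead
// (transfers plus launch) is shared by batchSize transforms.
double costPerFftMicros(double fixedOverheadUs, double computePerFftUs, int batchSize) {
    assert(batchSize > 0);
    return fixedOverheadUs / batchSize + computePerFftUs;
}
```

For example, with a hypothetical 100 µs of overhead and 5 µs of compute, one FFT per call costs 105 µs each, while a batch of 20 brings that down to 10 µs each. A CPU FFT that takes less than the per-call overhead will always win in the one-at-a-time case.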
Released in v7.15.3
Is your feature request related to a problem? Please describe.
Large FFT with large overlap is too slow
Describe the solution you'd like
I would like the FFT to run on my NVIDIA GPU, just for the sake of it!
Describe alternatives you've considered
FFTW sucks!
Additional context
For wider GPU support this could be considered: https://github.com/DTolm/VkFFT