CUDA Accelerated FFT #1166

f4exb · 2022-02-25T16:23:39Z

Is your feature request related to a problem? Please describe.
Large FFT with large overlap is too slow

Describe the solution you'd like
I would like the FFT to run on my NVIDIA GPU and just for the sake of it!

Describe alternatives you've considered
FFTW sucks!

Additional context
For wider GPU support this could be considered: https://github.com/DTolm/VkFFT

Jcwscience · 2022-02-25T16:43:11Z

I have tried a GitHub project called radio-core, which uses CUDA acceleration for broadcast FM demodulation. It seems to significantly reduce system load.

Jcwscience · 2022-02-25T16:47:19Z

Also, I’m using a HackRF, so there is a lot of data being processed. From what I understand, the HackRF's internal filters only work properly at sample rates above 10 MHz (although I might have misunderstood).

Jcwscience · 2022-02-25T16:51:20Z

But if I’m being honest, my main motivation is probably “well, the SDR software I use has GPU acceleration, look at my cool setup with all of this compute power!”.

f4exb · 2022-02-25T20:57:55Z

That's what I meant: GPU acceleration just for the sake of it....

I am alright to break SDRangel for this. I don't care; I have an NVIDIA graphics card, which btw I think is superior to other graphics cards.

Jcwscience · 2022-02-25T21:15:25Z

@f4exb actually I was just looking at VkFFT; I hadn’t seen it before. If it doesn’t have any breaking bugs, then maybe it could be useful for people with AMD cards as well? Also, I hope I’m not asking too many questions, but other than the waterfall, is there anything else that uses FFTW? I’m fairly new to this field entirely, but I’m enjoying studying the code to see how things work.

srcejon · 2022-02-25T21:55:22Z

There are a couple of other plugins that use FFTs as well, but all the code does this indirectly via the FFTEngine and FFTFactory classes.

So what you probably want to look at is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

See sdrbase/dsp/fftwengine.h, kissengine.h (which is an alternative to FFTW) and FFTFactory.cpp. It looks like it should be fairly straightforward to drop a different implementation in there.

While I doubt it will be of much benefit to existing plugins - high performance FFT and IFFT could be useful for OFDM modems in the future.
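The factory approach described above could be sketched roughly as follows. This is a minimal illustration only: the classes here are simplified stand-ins that merely borrow the names mentioned in this thread, not SDRangel's actual interfaces, and `createEngine` and its `gpuUsable` parameter are hypothetical.

```cpp
#include <memory>
#include <string>

// Simplified stand-in for the FFTEngine base class (not the real interface).
class FFTEngine {
public:
    virtual ~FFTEngine() = default;
    virtual std::string name() const = 0;
};

// Stand-in for the existing CPU implementation.
class FFTWEngine : public FFTEngine {
public:
    std::string name() const override { return "FFTW"; }
};

// Stand-in for the proposed GPU implementation, a subclass of FFTEngine.
class VkFFTEngine : public FFTEngine {
public:
    std::string name() const override { return "VkFFT"; }
};

// Hypothetical factory function: return the GPU engine when it is
// applicable for the current system, otherwise the FFTW engine.
std::unique_ptr<FFTEngine> createEngine(bool gpuUsable) {
    if (gpuUsable) {
        return std::make_unique<VkFFTEngine>();
    }
    return std::make_unique<FFTWEngine>();
}
```

Because callers only hold an `FFTEngine` pointer, existing plugins would not need to know which implementation they were handed.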

Jcwscience · 2022-02-25T21:59:09Z

@srcejon Awesome, I saw a reference to an alternative FFT engine in the source files, but I didn’t quite know where it fit into the rest of things. I went ahead and forked the repo, and I’m running some benchmarks on VkFFT now as well.

f4exb · 2022-02-26T06:48:37Z

> So what you probably want to look at is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

The "switch" is in https://github.com/f4exb/sdrangel/blob/master/sdrbase/dsp/fftengine.cpp and is based on global defines set in the CMakeLists.txt in sdrbase: https://github.com/f4exb/sdrangel/blob/master/sdrbase/CMakeLists.txt#L9 For now it checks whether libfftw3f is available, which determines the choice between FFTW (-DUSE_FFTW) and an internal KISS FFT (-DUSE_KISSFFT).

I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via some compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on Raspberry Pi.
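A compile-time switch with a third option might look something like the sketch below. `USE_FFTW` and `USE_KISSFFT` are the defines named above; `USE_VKFFT` and the function `selectedFftBackend` are hypothetical names used only for illustration, and the real code in fftengine.cpp may be organized differently.

```cpp
#include <string>

// Report which FFT backend was selected at compile time.
// USE_VKFFT is a hypothetical define; it is only honored when set
// explicitly, so builds without it keep using FFTW or KISS FFT.
inline std::string selectedFftBackend() {
#if defined(USE_VKFFT)
    return "VkFFT";
#elif defined(USE_FFTW)
    return "FFTW";
#elif defined(USE_KISSFFT)
    return "KissFFT";
#else
    return "none";  // no backend define was passed to the compiler
#endif
}
```

Building with `-DUSE_FFTW` and without the hypothetical `-DUSE_VKFFT` keeps the FFTW fallback, which would suit systems such as the Raspberry Pi where Vulkan may bring little benefit.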

srcejon · 2022-02-26T19:41:52Z

> I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via some compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on Raspberry Pi.

Ideally, I would have thought it should be a runtime decision rather than a compile-time one, so that binary releases can use Vulkan etc. if available but can still fall back to FFTW if not. We don't really want to do multiple builds. That's assuming the list of new dependencies isn't problematic.
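One possible shape for such a runtime decision is to probe candidate backends in order of preference and fall back when none is usable. The `Backend` struct, its `probe` callback and `pickBackend` are all hypothetical; how a real probe would detect Vulkan or CUDA support is left open here.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of an optional backend: a name plus a probe
// that reports whether the backend is usable on the running machine.
struct Backend {
    std::string name;
    std::function<bool()> probe;
};

// Return the first candidate whose probe succeeds, or the fallback name
// when no optional backend is usable.
std::string pickBackend(const std::vector<Backend>& candidates,
                        const std::string& fallback) {
    for (const auto& candidate : candidates) {
        if (candidate.probe && candidate.probe()) {
            return candidate.name;
        }
    }
    return fallback;
}
```

With this arrangement a single binary could ship with every backend compiled in and decide at startup which one to use.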

alphafox02 · 2022-03-04T20:29:27Z

Info taken from latest KrakenSDR update

“investigation into the possibility of using the GPU on the Pi 4 to compute the FFTs required in our algorithms faster via Vulkan and VkFFT. Long story short, for larger FFTs it seems that the Pi 4 GPU is capable of about a 2x speedup. However, an issue is that the Pi 4 Vulkan implementation is very new, and in its current state is missing an important feature relating to memory transfer. Without this feature, there is a need to perform unnecessary memory transfers and this brings us back to a 1x speedup. But we have considered that even without any speedup, using the GPU essentially provides us with another computational core which may still be of use as it frees up the CPU cores for other tasks.”

savagesmc · 2023-08-06T02:38:35Z

Has anyone cracked this nut?

srcejon · 2023-08-07T11:33:46Z

Have just tried adding the CUDA version of VkFFT - and at the moment, it looks much slower than FFTW. Could be because I've done something wrong - but probably because we're just performing a single FFT serially, and there's too much overhead in getting it in and out of the GPU.
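The overhead explanation suggested above can be pictured with a very simple cost model, sketched below. It is an assumption made for illustration, not a measurement of SDRangel or VkFFT: each GPU submission is taken to pay one fixed cost (transfers and launch) plus a per-FFT compute cost, and `amortizedCostPerFft` is a hypothetical helper.

```cpp
#include <stdexcept>

// Illustrative model only: average cost per FFT when batchSize transforms
// share a single fixed transfer/launch overhead. Units are arbitrary but
// must be the same for both cost arguments.
double amortizedCostPerFft(double fixedOverhead, double computePerFft,
                           int batchSize) {
    if (batchSize <= 0) {
        throw std::invalid_argument("batchSize must be positive");
    }
    return (fixedOverhead + computePerFft * batchSize) / batchSize;
}
```

Under this model a batch size of 1, which corresponds to performing a single FFT serially, pays the whole fixed overhead on every transform, whereas larger batches spread it out. Whether batching is practical for the existing plugins is a separate question.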

f4exb · 2023-08-20T21:49:31Z

Released in v7.15.3

f4exb added the enhancement label Feb 25, 2022

srcejon self-assigned this Aug 7, 2023

srcejon mentioned this issue Aug 14, 2023 in GPU FFT and simple profiler #1779 (Merged)

f4exb closed this as completed Aug 20, 2023

f4exb added this to the v7.15.3 milestone Aug 20, 2023
