CUDA Accelerated FFT #1166

Closed
f4exb opened this issue Feb 25, 2022 · 13 comments

@f4exb
Owner

f4exb commented Feb 25, 2022

Is your feature request related to a problem? Please describe.
Large FFT with large overlap is too slow

Describe the solution you'd like
I would like the FFT to run on my NVIDIA GPU and just for the sake of it!

Describe alternatives you've considered
FFTW sucks!

Additional context
For wider GPU support this could be considered: https://github.com/DTolm/VkFFT

@Jcwscience

I have tried a GitHub project called radio-core, which uses CUDA acceleration for broadcast FM demodulation. It seems to make a significant impact in reducing system load.

@Jcwscience

Also, I’m using a HackRF, so there is a lot of data being processed. From what I understand, the HackRF internal filters only work properly with a sample rate of over 10 MHz (although I might have misunderstood).

@Jcwscience

But if I’m being honest, my main motivation is probably “well, the SDR software I use has GPU acceleration, look at my cool setup with all of this compute power!”.

@f4exb
Owner Author

f4exb commented Feb 25, 2022

That's what I meant: GPU acceleration just for the sake of it....

I am alright to break SDRangel for this. I don't care: I have an NVIDIA graphics card, which btw I think is superior to other graphics cards.

@Jcwscience

@f4exb actually I was just looking at VkFFT, which I hadn’t seen before. If it doesn’t have any breaking bugs, then maybe it could be useful for people with AMD cards as well? Also, I hope I’m not asking too many questions, but other than the waterfall, is there anything else that uses FFTW? I’m fairly new to this field entirely but I’m enjoying studying the code to see how things work.

@srcejon
Collaborator

srcejon commented Feb 25, 2022

There are a couple of other plugins that use FFTs as well, but all the code does this indirectly via the FFTEngine and FFTFactory classes.

So what you probably want to look at, is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

See sdrbase/dsp/fftwengine.h, kissengine.h (which is an alternative to FFTW) and FFTFactory.cpp. It looks like it should be fairly straightforward to drop a different implementation in there.

While I doubt it will be of much benefit to existing plugins - high performance FFT and IFFT could be useful for OFDM modems in the future.
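
As a rough illustration of that suggestion, here is a minimal C++ sketch of a factory choosing between two engine subclasses. All names (`FFTEngineSketch`, `CpuEngineSketch`, `GpuEngineSketch`, `createEngine`, the `gpuUsable` flag) are hypothetical and are not SDRangel's actual FFTEngine or FFTFactory interface. Both engines run the same naive DFT on the CPU as a placeholder for real FFTW or VkFFT calls.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

using Samples = std::vector<std::complex<float>>;

// Naive O(n^2) forward DFT, used only as a stand-in for a real FFT library.
inline void naiveDft(Samples& data) {
    const std::size_t n = data.size();
    const float twoPi = 6.28318530717958647692f;
    Samples out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<float> acc(0.0f, 0.0f);
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce k*j modulo n before converting to float to limit rounding error.
            const float angle = -twoPi * static_cast<float>((k * j) % n) / static_cast<float>(n);
            acc += data[j] * std::polar(1.0f, angle);
        }
        out[k] = acc;
    }
    data = out;
}

// Hypothetical base class; the real FFTEngine interface may look different.
class FFTEngineSketch {
public:
    virtual ~FFTEngineSketch() = default;
    virtual std::string name() const = 0;
    virtual void transform(Samples& data) = 0;  // in-place forward transform
};

// Stand-in for a CPU engine such as an FFTW-backed one.
class CpuEngineSketch : public FFTEngineSketch {
public:
    std::string name() const override { return "cpu"; }
    void transform(Samples& data) override { naiveDft(data); }
};

// Stand-in for a GPU engine; a real one would upload, dispatch and download here.
class GpuEngineSketch : public FFTEngineSketch {
public:
    std::string name() const override { return "gpu-placeholder"; }
    void transform(Samples& data) override { naiveDft(data); }
};

// The factory hides the choice: callers only ever see the base class.
inline std::unique_ptr<FFTEngineSketch> createEngine(bool gpuUsable) {
    if (gpuUsable) {
        return std::make_unique<GpuEngineSketch>();
    }
    return std::make_unique<CpuEngineSketch>();
}
```

Because calling code only holds a `std::unique_ptr<FFTEngineSketch>`, plugins would not need to know which backend was picked.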

@Jcwscience

@srcejon Awesome, I saw reference to an alternative fft engine in the source files, but I didn’t quite know where it fit into the rest of things. I went ahead and forked the repo and I’m running some benchmarks on VkFFT now as well.

@f4exb
Owner Author

f4exb commented Feb 26, 2022

> So what you probably want to look at, is modifying the FFTFactory to return a VkFFTEngine (which would be a subclass of FFTEngine) if it is applicable for the current system - if not, return an FFTWEngine - or something along those lines.

The "switch" is in https://github.com/f4exb/sdrangel/blob/master/sdrbase/dsp/fftengine.cpp and is based on global defines set in the CMakeLists.txt in sdrbase (https://github.com/f4exb/sdrangel/blob/master/sdrbase/CMakeLists.txt#L9). For now it checks whether libfftw3f is available, which determines the choice between FFTW (-DUSE_FFTW) and an internal KISS FFT (-DUSE_KISSFFT).

I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via a compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on a Raspberry Pi.
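
A compile-time switch with a third option could look roughly like the sketch below. `USE_FFTW` is the define mentioned above; `USE_VKFFT` and the function names are hypothetical, and the sketch assumes the built-in KISS FFT engine is always compiled in as a last resort, which may not match the current CMake logic.

```cpp
#include <string>
#include <vector>

// Backends compiled into this build, in order of preference.
// USE_VKFFT is a hypothetical define; USE_FFTW is the existing one.
inline std::vector<std::string> compiledBackends() {
    std::vector<std::string> names;
#ifdef USE_VKFFT
    names.push_back("vkfft");
#endif
#ifdef USE_FFTW
    names.push_back("fftw");
#endif
    // Assumption for this sketch: the internal KISS FFT is always available.
    names.push_back("kissfft");
    return names;
}

// The first compiled-in backend is the default.
inline std::string defaultBackend() {
    return compiledBackends().front();
}
```

Building with neither `-DUSE_VKFFT` nor `-DUSE_FFTW` leaves only the internal engine, so a platform without Vulkan or FFTW still gets a working build.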

@srcejon
Collaborator

srcejon commented Feb 26, 2022

> I highly recommend inserting VkFFT as a third option, keeping the other two, and keeping the option to fall back to FFTW via a compilation switch. On some systems Vulkan may not be available, or may have little or no advantage over FFTW, e.g. on a Raspberry Pi.

Ideally, I would have thought it should be a runtime decision rather than a compile-time one, so that binary releases can use Vulkan etc. if they are available, but can still fall back to FFTW if not. We don't really want to do multiple builds. That's assuming the list of new dependencies isn't problematic.
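
A runtime decision could be sketched as an ordered list of candidates, each with a probe that reports whether the backend is usable on the current machine. `BackendCandidate`, `pickBackend` and the probe callbacks are hypothetical names for illustration; a real probe would have to check for a Vulkan or CUDA device, which is not shown here.

```cpp
#include <functional>
#include <string>
#include <vector>

// One candidate backend plus a check of whether it can be used right now.
struct BackendCandidate {
    std::string name;
    std::function<bool()> probe;  // returns true if the backend is usable
};

// Returns the first candidate whose probe succeeds, or `fallback` if none do.
// A probe that throws is treated the same as one that returns false.
inline std::string pickBackend(const std::vector<BackendCandidate>& candidates,
                               const std::string& fallback) {
    for (const auto& candidate : candidates) {
        try {
            if (candidate.probe && candidate.probe()) {
                return candidate.name;
            }
        } catch (...) {
            // Ignore the failure and move on to the next candidate.
        }
    }
    return fallback;
}
```

With this shape a single binary can list the GPU backend first and still end up on the CPU engine when the probe fails, which avoids shipping multiple builds.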

@alphafox02

Info taken from latest KrakenSDR update

“investigation into the possibility of using the GPU on the Pi 4 to compute the FFTs required in our algorithms faster via Vulkan and VkFFT. Long story short, for larger FFTs it seems that the Pi 4 GPU is capable of about a 2x speedup. However, an issue is that the Pi 4 Vulkan implementation is very new, and in its current state is missing an important feature relating to memory transfer. Without this feature, there is a need to perform unnecessary memory transfers and this brings us back to a 1x speedup. But we have considered that even without any speedup, using the GPU essentially provides us with another computational core which may still be of use as it frees up the CPU cores for other tasks.”

@savagesmc

Has anyone cracked this nut?

@srcejon srcejon self-assigned this Aug 7, 2023
@srcejon
Collaborator

srcejon commented Aug 7, 2023

Have just tried adding the CUDA version of VkFFT - and at the moment, it looks much slower than FFTW. Could be because I've done something wrong - but probably because we're just performing a single FFT serially, and there's too much overhead in getting it in and out of the GPU.
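
One way to check whether per-call overhead dominates is to time the complete call for each engine (upload, transform and download together) at the FFT sizes actually in use. The helper below is a generic sketch for that; `averageMicros` is a hypothetical name and is not part of SDRangel or VkFFT.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Average wall-clock time in microseconds per call of `work`,
// measured over `iterations` back-to-back calls.
inline double averageMicros(const std::function<void()>& work, std::size_t iterations) {
    if (iterations == 0) {
        return 0.0;  // nothing was run, so report zero rather than divide by zero
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Passing a lambda that wraps one full transform on each engine gives two directly comparable per-call figures. If the GPU figure barely changes with FFT size, that would point to fixed transfer and dispatch cost rather than the transform itself.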

@f4exb
Owner Author

f4exb commented Aug 20, 2023

Released in v7.15.3

@f4exb f4exb closed this as completed Aug 20, 2023
@f4exb f4exb added this to the v7.15.3 milestone Aug 20, 2023