Performance relative to CuPy and on Jetson natively #748
Replies: 1 comment
Hi @dzeleznikar, that comment was referring to the syntax comparison here: https://nvidia.github.io/MatX/basics/matlabpython.html We do not have a performance comparison because the results vary so much with the workload, but as a general rule of thumb we expect MatX to be faster than CuPy for all workloads. If it's not, that's a bug and should be fixed. A good example is here.
FFT performance is heavily dependent on the size of the FFT and the number of batches. In general, CuPy and MatX won't differ much there because both should launch the exact same cuFFT kernel under the same circumstances. At that point it becomes more of a comparison across libraries and hardware, as you pointed out.
I'd be happy to run some tests on an Orin with a specific size in MatX and CuPy, but be warned that MatX should not outperform CuPy by much in a synthetic test like that. If your real workload does a lot more than just FFTs, then I would expect MatX to outperform CuPy. Is your MATLAB code available for viewing?
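For anyone who wants to try the size and batch dependence on their own board, here is a minimal timing sketch for the CuPy side only. It is an illustration under stated assumptions, not an official MatX or CuPy benchmark: the helper name `time_batched_fft` and the sizes in the loop are made up for this example, and the script falls back to NumPy on the CPU when no usable GPU is found. A MatX version would need to time the same size and batch count for the numbers to be comparable.

```python
import time

# Prefer CuPy on a GPU; fall back to NumPy so the sketch still runs on a CPU.
try:
    import cupy as xp

    if xp.cuda.runtime.getDeviceCount() < 1:
        raise RuntimeError("no CUDA device found")

    def _sync():
        # GPU work is asynchronous, so wait for it before reading the clock.
        xp.cuda.Stream.null.synchronize()
except Exception:
    import numpy as xp

    def _sync():
        # NumPy runs synchronously on the CPU; nothing to wait for.
        pass


def time_batched_fft(n, batch, repeats=10):
    """Return the best wall-clock time in seconds for one batched 1D FFT.

    Hypothetical helper for illustration: `n` is the FFT length and
    `batch` is the number of independent transforms done in one call.
    """
    shape = (batch, n)
    x = (xp.random.random(shape) + 1j * xp.random.random(shape)).astype(xp.complex64)

    # Warm-up call so one-time setup cost (e.g. plan creation) is not timed.
    xp.fft.fft(x, axis=-1)
    _sync()

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        xp.fft.fft(x, axis=-1)
        _sync()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Example sizes only; substitute the size and batch count of the real workload.
    for n, batch in [(1024, 1), (1024, 1024), (65536, 16)]:
        ms = time_batched_fft(n, batch) * 1e3
        print(f"n={n:>6} batch={batch:>5} best={ms:.3f} ms")
```

The synchronize call before stopping the clock matters on the GPU path: without it the timer would mostly measure kernel launch overhead rather than the FFT itself.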
I was reading the comments on this old YC post about MatX (https://news.ycombinator.com/item?id=37756281) and saw a reference to "comparison to numPy/cuPy, and we do have a table showing the comparison in the docs." However, I could not find the comparison table showing "between MatX and cuPy we see a 3-4x performance difference on average." Is it hiding somewhere, or was it removed in a newer version?
I'm primarily interested in what the Jetson Orin board can do as an alternative to an FPGA platform, and some benchmarks here might help. The only public results I could find are things like https://openbenchmarking.org/result/2409108-NE-JETSONORI73. It sounds like that one leverages VkFFT, an open-source alternative to cuFFT, and shows a performance comparison between the NVIDIA A100 and AMD MI250 when using VkFFT vs cuFFT vs rocFFT. It would be really interesting to see how performance compares on a Jetson Orin with MatX and/or CuPy, since these seem like much more approachable paths to developing with CUDA for my use cases, where I have MATLAB code today.