Hi, do you have any plans to support a ROCm backend for AMD GPUs? (ROCm is AMD's equivalent of CUDA.)
Or would you be interested in a PR?
It would enable assembly-level optimizations that are not possible with the OpenCL backend.
I've been involved with ROCm backend development for TVM, which is basically Halide tailored for deep learning inference. Their ROCm backend compiles high-level IR in Python to optimized GPU code using LLVM's AMDGPU backend.
Although their backend is still very much preliminary (the runtime was fixed only within the past week, and special math function support is still in progress), the performance of the generated code is quite decent: without any AMD-specific optimization, their sgemm kernel already achieves 5200 GFLOPS on an 8 TFLOPS card (see here) and 7740 GFLOPS on a 12.5 TFLOPS card. Their HWCW layout convolution kernel already achieves performance on par with their OpenCL backend.
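To put those sgemm numbers in perspective, here is a quick sketch of what they come to as a fraction of peak. It assumes the 8 and 12.5 TFLOPS figures are the cards' theoretical single-precision peaks; the function name is just something I made up for illustration.

```python
def fraction_of_peak(achieved_gflops: float, peak_tflops: float) -> float:
    """Return achieved throughput as a fraction of the card's peak.

    Assumption: peak_tflops is the theoretical peak, and 1 TFLOPS = 1000 GFLOPS.
    """
    return achieved_gflops / (peak_tflops * 1000.0)


# Figures quoted above: (achieved GFLOPS, peak TFLOPS).
results = {
    "8 TFLOPS card": (5200.0, 8.0),
    "12.5 TFLOPS card": (7740.0, 12.5),
}

for card, (achieved, peak) in results.items():
    print(f"{card}: {fraction_of_peak(achieved, peak):.1%} of peak")
```

So under that assumption both cards land at a bit over 60% of peak before any AMD-specific tuning.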
My real passion is in imaging, or visual computing in general (not deep learning per se), so I'd love to see ROCm support in Halide as well. Please note that I don't work for AMD and have no affiliation with it. I just prefer ROCm's open ecosystem to NVIDIA's platform.
Thanks