RDNA3 not being utilised to its full potential #5
I forgot to add:
VkFFT is the fastest, so there is likely no real reason to try other FFT backends unless you want to (see https://github.com/amd/openmm-hip#fft-backends).
In my experience, only large cases like amber20-cellulose (400k atoms) and amber20-stmv (1M atoms) reflect the relative performance of different GPUs, i.e. performance scales with more compute units, higher frequency, etc. Smaller cases scale worse: the latency of launching kernels, scheduling work groups on the GPU, etc. is sometimes higher than the kernels' actual work.
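To illustrate why small systems hide the difference between GPUs, here is a rough toy model. Every number in it (kernel count per step, launch overhead, throughput) is a made-up assumption for illustration, not a measurement of OpenMM or of any real GPU.

```python
# Toy model: time per MD step = fixed per-kernel overhead + work that scales with atoms.
# All constants below are hypothetical and chosen only to show the trend.

def step_time_us(n_atoms, n_kernels=50, launch_overhead_us=5.0, atoms_per_us=2000.0):
    """Estimated time of one step in microseconds."""
    fixed = n_kernels * launch_overhead_us   # launch/scheduling latency, independent of GPU size
    work = n_atoms / atoms_per_us            # actual kernel work, shrinks on a faster GPU
    return fixed + work

def speedup(n_atoms, gpu_factor):
    """Speedup of a GPU with gpu_factor times the throughput, same fixed overhead."""
    slow = step_time_us(n_atoms)
    fast = step_time_us(n_atoms, atoms_per_us=2000.0 * gpu_factor)
    return slow / fast

if __name__ == "__main__":
    for n in (20_000, 400_000, 1_000_000):
        print(n, round(speedup(n, 1.5), 3))
```

In this sketch a GPU with 1.5x the throughput barely shows any gain on the 20k-atom case, and gets closer to (but never reaches) 1.5x as the system grows.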
Yeah, this dependency is not installed with openmm automatically as it's used only for these benchmarks. You can try to install it with
I haven't run OpenMM on RDNA3, but I saw that the HIP compiler generates dual-issue instructions. I just wouldn't expect too much, as not every pair of instructions in every kernel can be encoded that way.
That would be great. OpenMM (and OpenMM-HIP) has quite simple building instructions, I hope they'll work for you without issues.
Sad. Anyway, thanks for benchmarking. I hope you'll get a chance to run amber20 tests on this and other GPUs.
I'm not aware of it, do you know any details? For example, dual issue is something the compiler does when generating code; it does it, but I can't say how effectively. Perhaps the suggestion about drivers was for games? Shaders are basically compiled by the driver's compiler, unlike ROCm, where the compiler is part of the ROCm distribution.
I ran through a variety of FAH projects with the 7900xtx, and it is consistently 15-20% faster than the 6900xt. The 6900xt folds at 2.3 GHz or so, the 7900xtx at 2.95-3 GHz. The 7900xtx has much higher clocks and also more CUs (96 vs 80), which would be utilised by OpenCL/OpenMM regardless. But then again, the 7900xtx has shader clocks (2.2 GHz or something). So the increase we see right now might be due to the CU count going from 80 to 96, which kinda makes sense. But those CUs also have more resources in themselves, hence the crazy increase in FLOPS. Even ignoring those FLOPS, the 7900xtx should be much faster than the 6900xt.
And I understand we need large systems for any high-end GPU. NVIDIA has a similar issue, but they worked out their CUDA thingy quite well, and their cards are still crazy fast even with relatively low atom counts. They saw quite a jump in FAH performance going from Turing to Ampere, and then more progress with Ada. Obviously nothing close to what their CEO tells everyone in the slides, but still.
Regardless of that, I saw a tremendous perf increase going from OpenCL to HIP, and that is across a lot of AMD GPUs. Hopefully things start moving with a HIP FahCore :)
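As a back-of-the-envelope check of the numbers above, this sketch multiplies the CU ratio by the clock ratio. It assumes performance scales linearly with both, which is a simplification; the CU counts and clocks are the ones quoted in this comment, and `naive_scaling` is a hypothetical helper name.

```python
# Naive estimate: throughput ~ (number of CUs) x (clock). Ignores dual issue,
# memory bandwidth, launch latency and everything else.

def naive_scaling(cu_new, cu_old, clock_new_ghz, clock_old_ghz):
    """Expected speedup if performance scaled linearly with CUs and clock."""
    return (cu_new / cu_old) * (clock_new_ghz / clock_old_ghz)

# 7900xtx (96 CUs) vs 6900xt (80 CUs, ~2.3 GHz while folding)
with_reported_clock = naive_scaling(96, 80, 2.95, 2.3)  # using the ~2.95 GHz reported clock
with_shader_clock = naive_scaling(96, 80, 2.2, 2.3)     # using the ~2.2 GHz shader clock guess
cu_only = 96 / 80                                       # CU count alone

if __name__ == "__main__":
    print(round(with_reported_clock, 2), round(with_shader_clock, 2), round(cu_only, 2))
```

With the reported clock the naive estimate is roughly 1.5x, well above the observed 15-20%. With the shader clock guess, or with the CU ratio alone, it lands close to what was measured, which fits the idea that the gain comes mostly from the extra CUs.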
I know that's not 7900xtx, but here is Radeon 7 running amber20:
Is this project even alive?
As far as I understand, HIP works as a plugin, and those interested can build OpenMM/HIP environments within conda and build what they want. This is on Linux.
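A quick way to check whether the plugin was picked up in a given conda environment is to list the platforms OpenMM reports. This is a minimal sketch: it assumes the `openmm` Python package is importable and that the plugin registers itself under the name "HIP"; `list_platforms` is a hypothetical helper.

```python
def list_platforms():
    """Return the names of the platforms OpenMM can see, or [] if openmm is missing."""
    try:
        import openmm
    except ImportError:
        return []
    return [
        openmm.Platform.getPlatform(i).getName()
        for i in range(openmm.Platform.getNumPlatforms())
    ]

if __name__ == "__main__":
    names = list_platforms()
    print(names)
    # "HIP" as the platform name is an assumption here
    print("HIP available:", "HIP" in names)
```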
Hi,
I'm nearly done testing all of my AMD GPUs, comparing them between OpenCL and HIP environments, and today it was the 7900xtx's turn. Here are the results and comparison vs 6900xt:
6900xt MBA (23.04 TFLOPS)
Not much of an improvement over the 6900xt. I'll try to get AMD's attention to this.
I'll post the rest of the GPU test results in the other hip/openmm area, most likely on Monday.
The conda env build was standard. I don't know how to play around with FFT backends, but I don't think that would change the outcome much compared to VkFFT.