katago 1.14.0 TRT plan cache boots significantly slower than 1.13.2 #879
Comments
I do not currently have access to these GPUs. Could you please try commenting out this line and see if it makes a difference? Otherwise it might be a bug in TensorRT itself.
1.13-trt8.5-benchmark.txt
Maybe TensorRT 8.6 itself causes this: 1) KataGo takes longer to start, and 2) the KataGo benchmark is worse with TensorRT 8.6. cc: @lightvector
1.14-trt8.5-benchmark.txt I am not sure whether this is a TensorRT 8.6 issue or something in my environment. If anyone else observes that the benchmark or boot time has become slower, I suggest we make 1.14 able to run with TensorRT 8.5, so I have made a PR for this purpose:
Do you have single-GPU performance numbers? And what's your CUDA version? Anyway, I'll submit a proper patch for TensorRT 8.5 compatibility very soon.
1.14-trt-8.5-single-RTX3060Laptop-benchmark.txt I am now using another machine that has only a single RTX 3060 Laptop GPU. The benchmark shows that v/s and nn/s are nearly the same, but loading is still slower with TensorRT 8.6 than with 8.5 (7 seconds vs. 4 seconds). The environment is:
I've submitted the patch. In the meantime, you can build KataGo with TensorRT 9.2 (available here instead of the website), which fixes the problem and improves performance and compatibility.
I also tested katago-1.13.2 and katago-1.14 on trt-8.6.1 in my environment and found no significant difference between them: both take about 18-20 seconds from boot to GTP ready. Anyway, thanks to @hyln9 for the PR that makes the current main branch compatible with trt-8.5: https://github.com/lightvector/KataGo/pull/882 My environment: loading weight: 18b
@TTXS123OK Are you testing with the plan cache or the timing cache? The key factor is the TensorRT version, not the KataGo version.
Here are the test environments for comparing the boot time of 1.13.2 and 1.14.0 when hitting the plan cache (that is, the plan cache files already exist).
For 1.13.2: using TensorRT 8.5.2
For 1.14.0: using TensorRT 8.6.1
The loading weight is: 18b
On a machine with 5 RTX 3080 cards, 1.14.0 takes 40 seconds to boot to GTP ready, while 1.13.2 needs only 17 seconds.
On a machine with 8 RTX 4070 cards, 1.14.0 takes 63 seconds to boot to GTP ready, while 1.13.2 needs only 26 seconds.
I also tried different weights and different machines; 1.14.0 generally boots much slower than 1.13.2.
Have you, @lightvector or @hyln9, observed this? Thanks!
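For anyone who wants to reproduce the boot-to-GTP-ready numbers, here is a minimal sketch of one way to time it. This is not how the figures above were measured. It assumes the engine only answers its first GTP command (`name`) after initialization, including loading the plan cache, has finished. The function name `time_to_gtp_ready` is made up for illustration.

```python
import subprocess
import time


def time_to_gtp_ready(cmd):
    """Launch a GTP engine and return the seconds until it answers `name`.

    Assumption: the engine replies to its first GTP command only after
    it has finished initializing.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # discard log output so it cannot block
        text=True,
    )
    try:
        # Queue the first command right away; it is answered once the
        # engine starts reading stdin.
        proc.stdin.write("name\n")
        proc.stdin.flush()
        while True:
            line = proc.stdout.readline()
            if line == "":
                raise RuntimeError("engine exited before answering")
            if line.startswith("="):  # successful GTP responses start with '='
                return time.monotonic() - start
    finally:
        # Ask the engine to quit; ignore errors if it has already exited.
        try:
            proc.stdin.write("quit\n")
            proc.stdin.close()
        except OSError:
            pass
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

Usage would look something like `time_to_gtp_ready(["katago", "gtp", "-model", "model.bin.gz", "-config", "gtp.cfg"])`, where the model and config paths are placeholders for your own files. Run it once to populate the plan cache and time the second run, so that both versions are compared on a cache hit.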