katago 1.14.0 TRT plan cache boots significantly slower than 1.13.2 #879

Closed
kinfkong opened this issue Jan 1, 2024 · 8 comments · Fixed by #882

Comments

Contributor

kinfkong commented Jan 1, 2024

Here is the test environment for comparing the boot time of 1.13.2 and 1.14.0 when hitting the plan cache (that is, the plan cache files already exist).

For 1.13.2: using TensorRT 8.5.2
For 1.14.0: using TensorRT 8.6.1
The loaded weights are: 18b

On a machine with 5 RTX 3080 cards, 1.14.0 takes 40 seconds to boot to GTP ready, while 1.13.2 needs only 17 seconds.
On a machine with 8 RTX 4070 cards, 1.14.0 takes 63 seconds to boot to GTP ready, while 1.13.2 needs only 26 seconds.

I also tried different weights and different machines; 1.14.0 generally boots much slower than 1.13.2.

Have you @lightvector or @hyln9 observed this? Thanks!
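
For anyone who wants to reproduce the boot-time comparison, here is a minimal sketch of one way to time it. It is not the method used for the numbers above. It starts a GTP engine, immediately sends the standard GTP `name` command, and measures how long it takes until the first response line (starting with `=` or `?`) arrives, which approximates "boot to GTP ready". The function name and the file names in the usage comment are placeholders.

```python
import subprocess
import time


def time_to_first_gtp_response(cmd, gtp_command="name"):
    """Start a GTP engine and return the seconds until it answers one command.

    The command is written to stdin right away; the engine can only answer
    once it has finished loading, so the elapsed time approximates boot time.
    """
    start = time.monotonic()
    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # discard log output so the pipe cannot fill up
        text=True,
    ) as proc:
        try:
            proc.stdin.write(gtp_command + "\n")
            proc.stdin.flush()
            for line in proc.stdout:
                # GTP responses start with "=" (success) or "?" (failure).
                if line.startswith(("=", "?")):
                    return time.monotonic() - start
            raise RuntimeError("engine exited before answering")
        finally:
            proc.kill()


# Hypothetical usage (paths are placeholders, adjust to your setup):
# seconds = time_to_first_gtp_response(
#     ["./katago", "gtp", "-model", "18b.bin.gz", "-config", "gtp.cfg"]
# )
# print(f"boot to first GTP response: {seconds:.1f}s")
```

Run it once to make sure the plan cache files exist, then time a second run so that only the cache-hit path is measured.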

Contributor

hyln9 commented Jan 1, 2024

I do not currently have access to these GPUs.

Could you please try commenting out this line and see if it makes a difference?

Otherwise it might be a bug in TensorRT itself.

Contributor Author

kinfkong commented Jan 1, 2024

1.13-trt8.5-benchmark.txt
1.14-trt8.6-benchmark.txt

@hyln9

  1. Commenting out that line does not help. The boot time is the same with or without that line.
  2. I attached the benchmark logs for katago 1.13 + trt 8.5 and katago 1.14 + trt 8.6. The benchmark logs show more detail; could you check the timestamps in them for any hints? Thanks.
  3. The benchmark also shows a performance drop of about 10% in 1.14.

Maybe trt 8.6 itself causes both problems: 1) katago takes longer to start, and 2) the katago benchmark is worse with trt 8.6.

cc: @lightvector
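
To make the timestamp check in the attached logs less manual, the sketch below lists the largest time gaps between consecutive log lines. It assumes that each log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp; that format is an assumption here and should be adjusted to whatever the attached logs actually contain. Lines without a leading timestamp are skipped, and any timezone suffix is ignored.

```python
import re
from datetime import datetime

# Assumed line prefix, e.g. "2024-01-01 10:00:00+0800: message".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def largest_gaps(lines, top=5):
    """Return the `top` largest (gap_seconds, line) pairs.

    Each gap is the time between a timestamped line and the previous
    timestamped line, reported together with the later of the two lines.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # not a timestamped line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            gaps.append(((stamp - previous).total_seconds(), line.rstrip("\n")))
        previous = stamp
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]


# Hypothetical usage with one of the attached files:
# with open("1.14-trt8.6-benchmark.txt", encoding="utf-8") as log:
#     for seconds, line in largest_gaps(log):
#         print(f"{seconds:6.0f}s before: {line}")
```

Comparing the top gaps of the two logs side by side should show which startup step accounts for the extra time.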

Contributor Author

kinfkong commented Jan 1, 2024

1.14-trt8.5-benchmark.txt
I slightly modified the code to make 1.14 compatible with trt 8.5. Here is the benchmark log.
Everything seems fine when running katago 1.14 with trt 8.5.

I am not sure whether this is a trt 8.6 issue or something in my environment. If anyone else also observes slower benchmarks or boot times, I suggest we make 1.14 able to run with trt 8.5.

So I made a PR for this purpose:
#880

Contributor

hyln9 commented Jan 1, 2024

Do you have single GPU performance numbers? And what's your CUDA version?

Anyway I'll submit a proper patch for TensorRT 8.5 compatibility very soon.

Contributor Author

kinfkong commented Jan 1, 2024

1.14-trt-8.5-single-RTX3060Laptop-benchmark.txt
1.14-trt-8.6-single-RTX3060Laptop-benchmark.txt

I have now tested on another machine, which has only a single RTX 3060 Laptop GPU.

The benchmark shows that v/s and nn/s are nearly the same, but loading is still slower with trt 8.6 than with trt 8.5 (7 seconds vs 4 seconds).

The environment is:
ubuntu 22.04
cuda 12.1 (both for running and compiling)

@hyln9

Contributor

hyln9 commented Jan 6, 2024

I've submitted the patch.

In the meantime, you can build KataGo with TensorRT 9.2 (available here instead of the website), which fixes the problem and improves performance and compatibility.

Contributor

TTXS123OK commented Jan 7, 2024

> Here is the testing environments for comparing the booting time for 1.13.2 and 1.14.0 when hitting the plan cache (That means the plan cache files already exist).
>
> For 1.13.2: using TensorRT 8.5.2
> For 1.14.0: using TensorRT 8.6.1
> The loading weight is: 18b
>
> In a 5 cards of RTX3080 machine, it takes 40 seconds for 1.14.0 to boot to GTP ready, while 1.13.2 just need 17 seconds
> In a 8 cards of RTX4070 machine, it takes 63 seconds for 1.14.0 to boot to GTP ready, while 1.13.2 just need 26 seconds.
>
> I also try with different weights and different machines, 1.14.0 is generally boots much slower than 1.13.2
>
> Have you @lightvector or @hyln9 observed this? Thanks!

I also tested katago-1.13.2 and katago-1.14 on trt-8.6.1 in my environment. However, I found no significant difference between 1.13.2 and 1.14. Both take about 18-20 seconds from boot to GTP ready.

Anyway, thanks to @hyln9 for the PR that adds trt-8.5 compatibility to the current main branch: https://github.com/lightvector/KataGo/pull/882

My environment:
1 card 2080Ti
ubuntu 20.04
cuda 12.0

loading weight:18b
katago-1.13.2 git version: 48ec6e7
katago-1.14 git version: 4334a0f
using a katago config generated by `./katago genconfig -model 18b.bin.gz` with default options

Contributor Author

kinfkong commented Jan 9, 2024

@TTXS123OK are you testing with the plan cache or the timing cache?
katago-1.13.2 and katago 1.14 show no difference in boot time when both use trt 8.6.
However, they differ when one uses trt 8.5 and the other uses trt 8.6.

The key factor is the trt version, not the katago version.
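
As a quick sanity check on the numbers reported in this thread, the short sketch below computes the slowdown factor for each reported pair of boot times. The timings are copied from the comments above; the labels and the helper name `slowdown` are only illustrative.

```python
# (label, seconds with trt 8.5, seconds with trt 8.6), as reported in this thread
reported = [
    ("5x RTX 3080, katago 1.13.2 + trt 8.5.2 vs 1.14.0 + trt 8.6.1", 17, 40),
    ("8x RTX 4070, katago 1.13.2 + trt 8.5.2 vs 1.14.0 + trt 8.6.1", 26, 63),
    ("1x RTX 3060 Laptop, katago 1.14 with trt 8.5 vs trt 8.6", 4, 7),
]


def slowdown(baseline_seconds, slower_seconds):
    """Return how many times longer the slower boot takes than the baseline."""
    return slower_seconds / baseline_seconds


for label, fast, slow in reported:
    print(f"{label}: {fast}s -> {slow}s ({slowdown(fast, slow):.2f}x)")
```

The multi-GPU machines come out at roughly 2.35x and 2.42x, and the single laptop GPU at 1.75x, so the slowdown appears in every trt 8.5 vs trt 8.6 pair reported here.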
