katago 1.14.0 TRT plan cache boots significantly slower than 1.13.2 #879

kinfkong · 2024-01-01T14:56:49Z

Here are the test environments used to compare the boot time of 1.13.2 and 1.14.0 when the plan cache is hit (that is, the plan cache files already exist).

For 1.13.2: using TensorRT 8.5.2
For 1.14.0: using TensorRT 8.6.1
Weights loaded: 18b

On a machine with 5 RTX 3080 cards, 1.14.0 takes 40 seconds to boot to GTP ready, while 1.13.2 needs only 17 seconds.
On a machine with 8 RTX 4070 cards, 1.14.0 takes 63 seconds to boot to GTP ready, while 1.13.2 needs only 26 seconds.

I also tried different weights and different machines; 1.14.0 generally boots much slower than 1.13.2.

Have you @lightvector or @hyln9 observed this? Thanks!
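
For anyone who wants to reproduce the "boot to GTP ready" measurement above, here is a minimal sketch of one way to time it. It is not how the numbers in this report were taken. It assumes the engine only answers GTP commands once it has finished loading, so the time until the first reply to the standard GTP `name` command approximates the boot time. The helper name `time_to_gtp_ready` and the default timeout are hypothetical.

```python
import subprocess
import threading
import time


def time_to_gtp_ready(cmd, timeout=600.0):
    """Start a GTP engine and return the seconds until it answers `name`.

    Assumption: the engine replies to GTP commands only after startup
    (model loading, plan cache loading) has finished.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # discard startup logging
        text=True,
    )
    # Kill the engine if it never answers, so readline() cannot block forever.
    timer = threading.Timer(timeout, proc.kill)
    timer.start()
    try:
        proc.stdin.write("name\n")
        proc.stdin.flush()
        while True:
            line = proc.stdout.readline()
            if line == "":
                raise RuntimeError("engine exited or timed out before answering")
            if line.startswith("="):  # a successful GTP reply starts with '='
                return time.monotonic() - start
    finally:
        timer.cancel()
        if proc.poll() is None:
            try:
                proc.stdin.write("quit\n")
                proc.stdin.flush()
            except OSError:
                pass
            try:
                proc.wait(timeout=10)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
```

A call could look like `time_to_gtp_ready(["./katago", "gtp", "-model", "18b.bin.gz", "-config", "gtp.cfg"])`, where the model and config file names are placeholders. Run it once to populate the plan cache and time a later run, since only the cached case is being compared here.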

hyln9 · 2024-01-01T15:22:16Z

I do not currently have access to these GPUs.

Could you please try commenting out this line and see if it makes a difference?

Otherwise it might be a bug in TensorRT itself.

kinfkong · 2024-01-01T16:45:34Z

1.13-trt8.5-benchmark.txt
1.14-trt8.6-benchmark.txt

@hyln9

Commenting out that line does not help: the boot time is the same with or without it.
I attached the benchmark logs for katago 1.13 + trt 8.5 and katago 1.14 + trt 8.6. The benchmark log shows more detail; could you check the timestamps in the logs for any hints? Thanks.
The benchmark also shows about a 10% performance drop in 1.14.

Maybe trt 8.6 itself is the cause of both: 1) katago takes longer to start, and 2) the katago benchmark is worse on trt 8.6.

cc: @lightvector
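
One way to look for hints in the log timestamps is to list the largest gaps between consecutive log lines. The sketch below assumes each relevant line starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS`, optionally followed by a UTC offset such as `+0800`, then `: ` and the message. That format is an assumption about the attached logs, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed line format: "2024-01-01 16:45:34+0800: message"
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:[+-]\d{4})?: (.*)$")


def parse_events(lines):
    """Return (timestamp, message) pairs for lines that carry a timestamp."""
    events = []
    for line in lines:
        match = STAMP.match(line.rstrip("\n"))
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((when, match.group(2)))
    return events


def largest_gaps(lines, top=3):
    """Return the `top` longest pauses as (seconds, message before, message after)."""
    events = parse_events(lines)
    gaps = [
        ((later - earlier).total_seconds(), before, after)
        for (earlier, before), (later, after) in zip(events, events[1:])
    ]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

Calling `largest_gaps(open("1.14-trt8.6-benchmark.txt"))` and doing the same for the trt 8.5 log would show which startup step accounts for the extra time, provided the logs match the assumed format. The UTC offset is ignored, which is fine as long as it does not change within one log.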

kinfkong · 2024-01-01T17:42:51Z

1.14-trt8.5-benchmark.txt
I slightly modified the code to make 1.14 compatible with trt 8.5. Here is the benchmark log.
Everything seems fine when running katago 1.14 on trt 8.5.

I am not sure whether this is an issue with trt 8.6 or with my environment. If anyone else observes slower benchmark results or boot times, I suggest we make 1.14 able to run on trt 8.5.

So I made a PR for this purpose:
#880

hyln9 · 2024-01-01T17:55:31Z

Do you have single GPU performance numbers? And what's your CUDA version?

Anyway I'll submit a proper patch for TensorRT 8.5 compatibility very soon.

kinfkong · 2024-01-01T21:02:24Z

1.14-trt-8.5-single-RTX3060Laptop-benchmark.txt
1.14-trt-8.6-single-RTX3060Laptop-benchmark.txt

Now I am using another machine, which has only a single RTX 3060 Laptop GPU.

The benchmark shows that v/s and nn/s are nearly the same, but loading is still slower with trt 8.6 than with trt 8.5 (7 seconds vs 4 seconds).

The environment is:
ubuntu 22.04
cuda 12.1 (both for running and compiling)

@hyln9

hyln9 · 2024-01-06T11:14:30Z

I've submitted the patch.

In the meantime, you can build KataGo with TensorRT 9.2 (available here instead of the website), which fixes the problem and improves performance and compatibility.

TTXS123OK · 2024-01-07T09:27:54Z

> Here is the testing environments for comparing the booting time for 1.13.2 and 1.14.0 when hitting the plan cache (That means the plan cache files already exist).
>
> For 1.13.2: using TensorRT 8.5.2
> For 1.14.0: using TensorRT 8.6.1
> The loading weight is: 18b
>
> In a 5 cards of RTX3080 machine, it takes 40 seconds for 1.14.0 to boot to GTP ready, while 1.13.2 just need 17 seconds
> In a 8 cards of RTX4070 machine, it takes 63 seconds for 1.14.0 to boot to GTP ready, while 1.13.2 just need 26 seconds.
>
> I also try with different weights and different machines, 1.14.0 is generally boots much slower than 1.13.2
>
> Have you @lightvector or @hyln9 observed this? Thanks!

I also tested katago-1.13.2 and katago-1.14 on trt-8.6.1 in my environment. However, I found no significant difference between 1.13.2 and 1.14. They both take about 18-20 seconds from boot to GTP ready.

Anyway, thanks to @hyln9 for the PR restoring trt-8.5 compatibility on the current main branch: https://github.com/lightvector/KataGo/pull/882

My environment:
1 card 2080Ti
ubuntu 20.04
cuda 12.0

weights loaded: 18b
katago-1.13.2 git version: 48ec6e7
katago-1.14 git version: 4334a0f
using the katago config generated by `./katago genconfig -model 18b.bin.gz` with default options

kinfkong · 2024-01-09T05:29:37Z

@TTXS123OK are you testing with the plan cache or the timing cache?
katago 1.13.2 and katago 1.14 show no difference in boot time when both use trt 8.6.
However, they differ when one uses trt 8.5 and the other uses trt 8.6.

The key factor is the trt version, not the katago version.
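
To put the boot times reported in this thread side by side, the short sketch below computes the slowdown of trt 8.6 relative to trt 8.5 for each machine. The numbers are the ones quoted above; the function name and table layout are only illustrative. The first two rows also differ in katago version (1.13.2 vs 1.14.0), while the last row uses katago 1.14 on both sides.

```python
# (machine, seconds with TensorRT 8.5, seconds with TensorRT 8.6), as reported in this thread.
REPORTS = [
    ("5x RTX 3080", 17, 40),         # katago 1.13.2 + trt 8.5 vs katago 1.14.0 + trt 8.6
    ("8x RTX 4070", 26, 63),         # katago 1.13.2 + trt 8.5 vs katago 1.14.0 + trt 8.6
    ("1x RTX 3060 Laptop", 4, 7),    # katago 1.14 on both trt versions
]


def slowdown(trt85_seconds, trt86_seconds):
    """Return how many times longer the trt 8.6 boot takes than the trt 8.5 boot."""
    return trt86_seconds / trt85_seconds


for machine, t85, t86 in REPORTS:
    print(f"{machine}: {slowdown(t85, t86):.2f}x slower (+{t86 - t85} s)")
```

The ratios come out to roughly 2.35x, 2.42x and 1.75x. The single-GPU row, where only the trt version changes, still shows a clear slowdown, which is consistent with the trt version being the deciding factor.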

kinfkong mentioned this issue Jan 1, 2024: Make Katago 1.14 compatible with TRT 8.5 #880 (Closed)

hyln9 mentioned this issue Jan 5, 2024: Restore TensorRT 8.5 support #882 (Merged)

lightvector closed this as completed in #882 Mar 3, 2024

Porkepix mentioned this issue Mar 11, 2024: katago 1.14.1 Homebrew/homebrew-core#165695 (Merged)
