
[Compile]Error compiling with TENSORRT on CUDA 12.2 #55016

Closed
engineer1109 opened this issue Jun 30, 2023 · 6 comments
Assignees
Labels
NVIDIA · PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc) · status/close (closed) · type/bug-report (bug report)

Comments

@engineer1109
Contributor

Describe the Bug

Current branch: develop, commit 12a296c
cmake .. -DWITH_CUSTOM_DEVICE=ON -DWITH_GPU=ON -DWITH_TENSORRT=ON

CUDA 12.2, installed via apt
TensorRT 8.6.1.6, installed via apt (libnvinfer-dev, libnvinfer-plugin-dev)

A large number of identical compile errors appear:

/media/wjl/D2/github/fork/Paddle/paddle/fluid/inference/tensorrt/plugin/many_emb_layernorm_kernel.cu(83): error: no instance of function template "cuda::std::__4::plus<void>::operator()" matches the argument list
            argument types are: (cub::CUB_200101_750_NS::KeyValuePair<float, float>, cub::CUB_200101_750_NS::KeyValuePair<float, float>)
            object type is: cub::CUB_200101_750_NS::Sum
      threadData = pairSum(threadData, kvp<T>(rldval, rldval * val));
                   ^
          detected during:
            instantiation of "void paddle::inference::tensorrt::plugin::embLayerNormKernel_2<T,TPB>(int32_t, const int32_t *, const int32_t *, const float *, const float *, const T *, const T *, int32_t, int32_t, T *) [with T=float, TPB=256U]" at line 279
            instantiation of "int32_t paddle::inference::tensorrt::plugin::embSkipLayerNorm_2(cudaStream_t, int32_t, int32_t, int32_t, const int32_t *, const int32_t *, int32_t, const float *, const float *, const T *, const T *, int32_t, int32_t, T *) [with T=float]" at line 365

cub::Sum is the problem.

Additional Supplementary Information

No response

@paddle-bot paddle-bot bot added the PFCC (Paddle Framework Contributor Club, https://github.com/PaddlePaddle/community/tree/master/pfcc) label Jun 30, 2023
@ForFishes
Member

Hello, are you using the official image? Could you try compiling with the officially released Docker image? https://www.paddlepaddle.org.cn/documentation/docs/zh/install/docker/linux-docker.html

@engineer1109
Contributor Author

@ForFishes No image was used. The source of the problem is that CUDA 12.1 works fine while CUDA 12.2 does not.

@paddle-bot paddle-bot bot added the status/following-up label and removed the status/new-issue label Jul 4, 2023
@jeng1220
Collaborator

nvbugs 4202615

@jeng1220
Collaborator

jeng1220 commented Jul 20, 2023

cub::Sum is `__host__ __device__ __forceinline__ T cub::Sum::operator()`.
The quickest workaround is to replace the failing call with:
// threadData = pairSum(threadData, kvp(rldval, rldval * val));
threadData.key += rldval;
threadData.value += rldval * val;

For now, the problem appears to be caused by the new cub::Sum reusing ::cuda::std::plus<>:
https://github.com/NVIDIA/cub/blob/main/cub/thread/thread_operators.cuh#L79

The old cub implemented Sum itself:
https://github.com/NVIDIA/cub/blob/2.0.X/cub/thread/thread_operators.cuh#L97C1-L106C3

Since cub is a low-level primitive, warp_reduce, block_reduce, and device_reduce are also affected,
so fixing a single call site will not solve the problem.

@jeng1220
Collaborator

jeng1220 commented Jul 21, 2023

@engineer1109 ,
The problem should now be fixed. If everything looks good, please close this Issue.

cqulilujia pushed a commit to cqulilujia/Paddle that referenced this issue Jul 24, 2023
@jeng1220
Collaborator

@engineer1109 ,
Since the problem has been fixed, I am closing this Issue. If you still run into the problem, please reopen the Issue here.

@paddle-bot paddle-bot bot added the status/close label and removed the status/following-up label Jul 27, 2023
wz1qqx pushed a commit to wz1qqx/Paddle that referenced this issue Jul 31, 2023
jinjidejinmuyan pushed a commit to jinjidejinmuyan/Paddle that referenced this issue Aug 30, 2023

3 participants