[PaddlePaddle Hackathon 5 No.48] GPU performance optimization of the StridedCopyKernel operator, part 1 #58033
Your PR was submitted successfully. Thank you for your contribution to the open source project!
LGTM
@wanghuancoder How do I invoke this operator?
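Oh! I took a look at this. The kernel is currently used only internally, so there is no way to call it on its own to measure its performance. Since the algorithm is the same as the one used for Contiguous, I'll merge it directly.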
This reverts commit a9e4b68.
This PR has a problem that crashes the PaddleDetection develop branch: python tools/train.py -c configs/rtdetr/rtdetr_hgnetv2_l_6x_coco.yml -o worker_num=16 LearningRate.base_lr=0.0001 log_iter=1 use_gpu=True save_dir=./test_tipc/output/rtdetr_hgnetv2_l_6x_coco/benchmark_train/norm_train_gpus_5_autocast_fp32 epoch=1 pretrain_weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_l_6x_coco.pdparams TrainReader.batch_size=16 filename=rtdetr_hgnetv2_l_6x_coco TrainReader.shuffle=False --enable_ce=True
@wanghuancoder I'll finish the fix this weekend.
@wanghuancoder After building and installing PaddleDetection, I ran the command you gave: python tools/train.py -c configs/rtdetr/rtdetr_hgnetv2_l_6x_coco.yml -o worker_num=16 LearningRate.base_lr=0.0001 log_iter=1 use_gpu=True save_dir=./test_tipc/output/rtdetr_hgnetv2_l_6x_coco/benchmark_train/norm_train_gpus_5_autocast_fp32 epoch=1 pretrain_weights=https://bj.bcebos.com/v1/paddledet/models/rtdetr_hgnetv2_l_6x_coco.pdparams TrainReader.batch_size=16 filename=rtdetr_hgnetv2_l_6x_coco TrainReader.shuffle=False --enable_ce=True
…dle#58230) This reverts commit a9e4b68.
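@wanghuancoder Hello. Regarding the problem you mentioned, where StridedCopyKernel crashes the PaddleDetection develop branch: it is most likely caused by the thread configuration parameters going out of bounds. I have a preliminary fix, but it still needs to be tested against that problem. I cannot run the test command you provided directly; it may depend on a specific environment. Feel free to contact me if you need anything.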
PR types
Performance optimization
PR changes
OPs
Description
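Currently, both ContiguousKernel and StridedCopyKernel compute each element's data offset from its numel index (the linear element index). This requires a for loop over the tensor dimensions for every element, so the offset computation is inefficient and kernel performance suffers.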
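For illustration, here is a minimal host-side C++ sketch of the per-element offset computation described above. It is not the actual Paddle kernel code: the names `OffsetFromLinearIndex` and `StridedCopy` are hypothetical, and the sketch assumes a row-major logical layout with strides expressed in elements. It only shows why every element pays for a loop over the dimensions with a modulo and a division per step.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not Paddle's API): map a linear element index
// (0 <= linear < numel) to a storage offset by peeling off one
// coordinate per dimension, innermost dimension first.
int64_t OffsetFromLinearIndex(int64_t linear,
                              const std::vector<int64_t>& dims,
                              const std::vector<int64_t>& strides) {
  int64_t offset = 0;
  for (int d = static_cast<int>(dims.size()) - 1; d >= 0; --d) {
    offset += (linear % dims[d]) * strides[d];  // coordinate along dim d
    linear /= dims[d];                          // move to the next outer dim
  }
  return offset;
}

// Hypothetical reference copy: gather a strided source view into a dense
// row-major destination. On a GPU, each thread would handle one value of i.
std::vector<float> StridedCopy(const std::vector<float>& src,
                               const std::vector<int64_t>& dims,
                               const std::vector<int64_t>& strides) {
  int64_t numel = 1;
  for (int64_t d : dims) numel *= d;
  std::vector<float> dst(numel);
  for (int64_t i = 0; i < numel; ++i) {
    dst[i] = src[OffsetFromLinearIndex(i, dims, strides)];
  }
  return dst;
}
```

As a usage example, viewing the 2x3 row-major buffer `{0, 1, 2, 3, 4, 5}` as its 3x2 transpose means calling `StridedCopy(src, {3, 2}, {1, 3})`, which yields `{0, 3, 1, 4, 2, 5}`. The cost of the inner loop grows with the tensor rank, which is the overhead this PR aims to reduce.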