【MetaX】Merge Metax's modifications to mxmaca/2.6 branch #68534

idontkonwher · 2024-09-30T02:33:19Z

PR Category

Environment Adaptation

PR Types

Others

Description

Open source the modifications made by Metax to the Paddle repository when adapting to the MXMACA software stack.

* fix windows bug * fix windows bug * fix windows bug * fix windows bug * fix windows bug * fix windows bug * Update inference_lib.cmake

…#60324) Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>

…addle#60184) * fix weight-only quant kernel error for n div 64 !=0 * code style fix

…0208) (PaddlePaddle#60495) * fix chunk allocator posix_memalign return value check;test=develop * fix chunk allocator posix_memalign return value check;test=develop * fix chunk allocator posix_memalign return value check;test=develop

…le#60545)

…e#60620) * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop

* update 2023 security advisory, test=document_fix * update pdsa-2023-019, test=document_fix

…EADME (PaddlePaddle#60786) * [Dy2St][2.6] Disable `test_transformer` on release/2.6 and update README * [Docs] Update latest release version in README (PaddlePaddle#60691) * restore order

PaddlePaddle#60829) (PaddlePaddle#60875)

* Fix set value grad (PaddlePaddle#59034) * first fix the UT * fix set value grad * polish code * add static mode backward test * always has input valuetensor * add dygraph test * Fix shape error in combined-indexing setitem (PaddlePaddle#60447) * add ut * fix shape error in combine-indexing * fix ut * Set value with scalar (PaddlePaddle#60452) * set_value with scalar * fix ut * remove test_pir * remove one test since 2.6 not support uint8-add

…addlePaddle#60616) (PaddlePaddle#60772)

…ddlePaddle#61067)

…le#61338) * fix draw security problem

…1337)

This uses shlex for safe command parsing to fix arbitrary code injection Co-authored-by: ndren <andreien@proton.me>

…ePaddle#61382) * OS Command Injection prune_by_memory_estimation fix * Fix StyleCode

…dlePaddle#61398) * fix security problem for run_cmd

…Paddle#61388) * fix download security problem

…addlePaddle#60774) (PaddlePaddle#61045) Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>

* fix issue 60092 * update * update * update

* fix unique kernel, row to num_out

…ddlePaddle#61586)

* remove _wget * remove _wget * remove wget test

(cherry picked from commit fe4655e86b92f5053fa886af49bf199307960a05) Change-Id: I35003420292359f8a41b19b7ca2cbaae17dc5b45 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…move ldg up. (cherry picked from commit a7cb0ed275a3488f79445ef31456ab6560e9de43) Change-Id: Ia89df4e5a26de64baae4152837d2ce3076c56df1 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…stDivMod;3.move ldg up. (cherry picked from commit 4fb857655d09f55783d9445b91a2d953ed14d0b8) Change-Id: I7df7f3af7b4615e5e96d33b439e5276be6ddb732 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

(cherry picked from commit 333cba7aca1edf7a0e87623a0e55e230cd1e9451) Change-Id: Ic808d42003677ed543621eb22a797f0ab7751baa Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…onzero and masked_select (forward only) OP. (cherry picked from commit c907b40eb3f9ded6ee751e522c2a97a353ac93bd) Change-Id: I7f4845405e64e7599134a8c497f464ac04dead88 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

1. 256 Blocksize launch for small shape inputgrad; 2. FastDivMod in inputgrad and filtergrad; 3. shared memory to put output_grad_data in small shape. (cherry picked from commit f9f29bf7b8d929fb95eb1153a79d8a6b96d5b6d2) Change-Id: I1a3818201784031dbedc320286ea5f4802dbb6b1 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…iple tensors. (cherry picked from commit 3bd200f262271a333b3947326442b86af7fb6da1) Change-Id: I57c94cc5e709be8926e1b21da14b653cb18eabc3 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…nto multiple tensors." This reverts commit 3bd200f262271a333b3947326442b86af7fb6da1. (cherry picked from commit 86ed8adaa8c20d3c824eecb0ee1e10d365bcea37) Change-Id: I5b8b7819fdf99255c65fe832d5d77f8e439bdecb Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

(cherry picked from commit cddb01a83411c45f68363248291c0c4685e60b24) Change-Id: Ie106ff8d65c21a8545c40636f021b73f3ad84587 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

(cherry picked from commit 07ea3acf347fda434959c8c9cc3533c0686d1836) Change-Id: Id7a727fd18fac4a662f8af1bf6c6b5ebc6233c9f Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

Use tmp to store ldg data in the loop so calculate and ldg time can fold each other. (cherry picked from commit 7ddab49d868cdb6deb7c3e17c5ef9bbdbab86c3e) Change-Id: I46399594d1d7f76b78b9860e483716fdae8fc7d6 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…ed memory and making single thread do more tasks. (cherry picked from commit 631ffdda2847cda9562e591dc87b3f529a51a978) Change-Id: Ie9ffdd872ab06ff34d4daf3134d6744f5221e41e Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

1.LayerNormBackward: remove if statement, now will always loop VPT times for ldg128 in compiler, bool flag to control if write action will be taken or not; 2.ContiguousCaseOneFunc: tmp saving division result for less division (cherry picked from commit 422d676507308d26f6107bed924424166aa350d3) Change-Id: I37aab7e2f97ae6b61c0f50ae4134f5eb1743d429 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

Set BlockDim.z to make blockSize always be 512, each block can handle several batches. Then all threads will loop 4 times for better performance. (cherry picked from commit 7550c90ca29758952fde13eeea74857ece41908b) Change-Id: If24de87a0af19ee07e29ac2e7e237800f0181148 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…ange it to 64 warp reduce. (cherry picked from commit a346af182b139dfc7737e5f6473dc394b21635d7) Change-Id: I6c8d8105fd77947c662e6d22a0d15d7bad076bde Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

Might have lossdiff with old optimization without atomicAdd. (cherry picked from commit 80b0bcaa9a307c94dbeda658236fd75e104ccccc) Change-Id: I4a7c4ec2a0e885c2d581dcebc74464830dae7637 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

(cherry picked from commit cc421d7861c359740de0d2870abcfde4354d8c71) Change-Id: I55c049e951f93782af1c374331f44b521ed75dfe Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

…oat16>. Change-Id: I5788c73a9c45f65e60ed5a88d16a473bbb888927

Change-Id: I8b34f02958ddccb3467f639daaac8044022f3d34

Change-Id: I77730da567903f43ef7a9992925b90ed4ba179c7

Change-Id: I1b7eb58e7959daff8660ce7889ba390cdfae0c1a

Change-Id: I94d422c969bdb83ad74262e03efe38ca85ffa673

Change-Id: I8ece364d926596a40f42d973190525d9b8224d99

paddle-bot · 2024-09-30T02:33:23Z

你的PR提交成功，感谢你对开源项目的贡献!
请关注后续CI自动化测试结果，详情请参考Paddle-CI手册。
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

CLAassistant · 2024-09-30T02:33:27Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

zequn yang seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

risemeup1 and others added 30 commits December 26, 2023 17:15

fix windows bug for common lib (PaddlePaddle#60308)

1b696a1

* fix windows bug * fix windows bug * fix windows bug * fix windows bug * fix windows bug * fix windows bug * Update inference_lib.cmake

[Dy2St] Disable test_bert on CPU (PaddlePaddle#60173) (PaddlePaddle…

a4cd847

…#60324) Co-authored-by: gouzil <66515297+gouzil@users.noreply.github.com>

[Cherry-pick] fix weight quant kernel bug when n div 64 != 0 (PaddleP…

20d3558

…addle#60184) * fix weight-only quant kernel error for n div 64 !=0 * code style fix

tile (PaddlePaddle#60261)

203754e

update 2023 security advisory, test=document_fix (PaddlePaddle#60532)

83ce809

fix fleetutil get_online_pass_interval bug2; test=develop (PaddlePadd…

ae2e588

…le#60545)

fix fused_rope diff (PaddlePaddle#60217) (PaddlePaddle#60593)

97b65c7

[cherry-pick]fix fleetutil get_online_pass_interval bug3 (PaddlePaddl…

bbc13eb

…e#60620) * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop * fix fleetutil get_online_pass_interval bug3; test=develop

[cherry-pick]update pdsa-2023-019 (PaddlePaddle#60649)

ccdf528

* update 2023 security advisory, test=document_fix * update pdsa-2023-019, test=document_fix

[Dy2St][2.6] Disable test_grad on release/2.6 (PaddlePaddle#60662)

e50f43e

fix bug of ci (PaddlePaddle#59926) (PaddlePaddle#60785)

7b0d2e9

[Dy2St][2.6] Disable test_transformer on release/2.6 and update R…

e738f49

…EADME (PaddlePaddle#60786) * [Dy2St][2.6] Disable `test_transformer` on release/2.6 and update README * [Docs] Update latest release version in README (PaddlePaddle#60691) * restore order

[Dy2St][2.6] Increase test_transformer and test_mobile_net ut time (

d788e9b

PaddlePaddle#60829) (PaddlePaddle#60875)

[cherry-pick] This PR enable offset of generator for custom device. (P…

0f732a5

…addlePaddle#60616) (PaddlePaddle#60772)

fix core dump when fallback gather_nd_grad and MemoryAllocateHost (Pa…

ac1702b

…ddlePaddle#61067)

fix qat tests (PaddlePaddle#61211) (PaddlePaddle#61284)

ff119d0

[Security] fix draw security problem (PaddlePaddle#61161) (PaddlePadd…

aeaa0ca

…le#61338) * fix draw security problem

fix _decompress security problem (PaddlePaddle#61294) (PaddlePaddle#6…

0227a0d

…1337)

Fix CVE-2024-0521 (PaddlePaddle#61032) (PaddlePaddle#61287)

f99d4f2

This uses shlex for safe command parsing to fix arbitrary code injection Co-authored-by: ndren <andreien@proton.me>

[Security] fix security problem for prune_by_memory_estimation (Paddl…

af9b8c5

…ePaddle#61382) * OS Command Injection prune_by_memory_estimation fix * Fix StyleCode

[Security] fix security problem for run_cmd (PaddlePaddle#61285) (Pad…

9cd0c91

…dlePaddle#61398) * fix security problem for run_cmd

[Security] fix download security problem (PaddlePaddle#61162) (Paddle…

5f3bbeb

…Paddle#61388) * fix download security problem

check eval for security (PaddlePaddle#61389)

60325a1

[cherry-pick] adapt c_embedding to phi namespace for custom devices (P…

0ccb9cb

…addlePaddle#60774) (PaddlePaddle#61045) Co-authored-by: Tian <121000916+SylarTiaNII@users.noreply.github.com>

[CherryPick] Fix issue 60092 (PaddlePaddle#61427)

f025385

* fix issue 60092 * update * update * update

Fix unique (PaddlePaddle#60840) (PaddlePaddle#61044)

3452e61

* fix unique kernel, row to num_out

cinn(py-dsl): skip eval string in python-dsl (PaddlePaddle#61380) (Pa…

a37f6fb

…ddlePaddle#61586)

remove _wget (PaddlePaddle#61356) (PaddlePaddle#61569)

9250f66

* remove _wget * remove _wget * remove wget test

Zhao Wu and others added 25 commits August 28, 2024 12:27

improve FilterBBoxes

df38d3c

(cherry picked from commit fe4655e86b92f5053fa886af49bf199307960a05) Change-Id: I35003420292359f8a41b19b7ca2cbaae17dc5b45 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve deformable_conv_grad op:1.adaptive block size;2.FastDivMod;3.…

cb28f43

…move ldg up. (cherry picked from commit a7cb0ed275a3488f79445ef31456ab6560e9de43) Change-Id: Ia89df4e5a26de64baae4152837d2ce3076c56df1 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve ModulatedDeformableIm2colGpuKernel:1.adaptive block size;2.Fa…

bdf6be4

…stDivMod;3.move ldg up. (cherry picked from commit 4fb857655d09f55783d9445b91a2d953ed14d0b8) Change-Id: I7df7f3af7b4615e5e96d33b439e5276be6ddb732 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve KeBNBackwardData:replace 1.0/sqrt with rsqrt

3d67920

(cherry picked from commit 333cba7aca1edf7a0e87623a0e55e230cd1e9451) Change-Id: Ic808d42003677ed543621eb22a797f0ab7751baa Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

Improve CheckFiniteAndUnscaleKernel by splitting the kernel into mult…

2f6170b

…iple tensors. (cherry picked from commit 3bd200f262271a333b3947326442b86af7fb6da1) Change-Id: I57c94cc5e709be8926e1b21da14b653cb18eabc3 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve ScatterInitCUDAKernel and ScatterCUDAKernel

add2987

(cherry picked from commit cddb01a83411c45f68363248291c0c4685e60b24) Change-Id: Ie106ff8d65c21a8545c40636f021b73f3ad84587 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

fix bugs and make the code easier to read

7b7b153

(cherry picked from commit 07ea3acf347fda434959c8c9cc3533c0686d1836) Change-Id: Id7a727fd18fac4a662f8af1bf6c6b5ebc6233c9f Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve KeMatrixTopK:1.fix private memory;2.modify max grid size;3.ch…

b756d63

…ange it to 64 warp reduce. (cherry picked from commit a346af182b139dfc7737e5f6473dc394b21635d7) Change-Id: I6c8d8105fd77947c662e6d22a0d15d7bad076bde Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

Modify LayerNorm Optimization

37d4165

Might have lossdiff with old optimization without atomicAdd. (cherry picked from commit 80b0bcaa9a307c94dbeda658236fd75e104ccccc) Change-Id: I4a7c4ec2a0e885c2d581dcebc74464830dae7637 Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

improve roi_align op:1.adaptive block size;2.FastDivMod.

29999eb

(cherry picked from commit cc421d7861c359740de0d2870abcfde4354d8c71) Change-Id: I55c049e951f93782af1c374331f44b521ed75dfe Signed-off-by: m00891 <Zequn.Yang@metax-tech.com>

add workaround for parameters dislocation when calling BatchedGEMM<fl…

b37090c

…oat16>. Change-Id: I5788c73a9c45f65e60ed5a88d16a473bbb888927

fix McFlashAttn string

531f3c7

Change-Id: I8b34f02958ddccb3467f639daaac8044022f3d34

[C500-27046] fix wb issue

d180d30

Change-Id: I77730da567903f43ef7a9992925b90ed4ba179c7

Support compiling external ops

5235601

Change-Id: I1b7eb58e7959daff8660ce7889ba390cdfae0c1a

Merge "Support compiling external ops" into mx/dev-2.6.0

cd89fd8

support flash attn varlen api and support arm build

f7eb629

Change-Id: I94d422c969bdb83ad74262e03efe38ca85ffa673

Add a copyright notice

ee123e0

Change-Id: I8ece364d926596a40f42d973190525d9b8224d99

Modify some third-party dependency addresses to public network addresses

898a221

idontkonwher changed the title ~~【Metax】Merge Metax's modifications to mxmaca/2.6 branch~~ 【MetaX】Merge Metax's modifications to mxmaca/2.6 branch Sep 30, 2024

xiaoguoguo626807 approved these changes Sep 30, 2024

View reviewed changes

xiaoguoguo626807 merged commit b102bc4 into PaddlePaddle:release-mxmaca/2.6 Sep 30, 2024
0 of 27 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

【MetaX】Merge Metax's modifications to mxmaca/2.6 branch #68534

【MetaX】Merge Metax's modifications to mxmaca/2.6 branch #68534

idontkonwher commented Sep 30, 2024

paddle-bot bot commented Sep 30, 2024

CLAassistant commented Sep 30, 2024

【MetaX】Merge Metax's modifications to mxmaca/2.6 branch #68534

【MetaX】Merge Metax's modifications to mxmaca/2.6 branch #68534

Conversation

idontkonwher commented Sep 30, 2024

PR Category

PR Types

Description

paddle-bot bot commented Sep 30, 2024

CLAassistant commented Sep 30, 2024