[Paddle-ASP] Support sharding training for Nvidia's ASP (2:4 sparsity) functionality #37725

Merged: 11 commits merged into PaddlePaddle:develop on Jan 6, 2022

Conversation

@minghaoBD (Contributor) commented on Nov 30, 2021

PR types

Bug fixes

PR changes

Others

Describe

Nvidia has implemented 2:4 sparsity support in PaddlePaddle, including fleet distributed training. However, when training with the sharding strategy (the model-parallel paradigm in PaddlePaddle), GPU:0 always runs out of memory (OOM) while the other GPUs appear normal.

After this fix, developers should pass the argument sharding=True when calling sparsity.prune_model() under the sharding strategy. Otherwise the APIs behave exactly as before.
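
For context, a minimal usage sketch under the sharding strategy. The network, optimizer, and sharding_configs values are illustrative placeholders; only the new prune_model(..., sharding=True) flag comes from this PR:

```python
import paddle
from paddle.distributed import fleet
from paddle.static import sparsity

paddle.enable_static()
fleet.init(is_collective=True)

main_prog = paddle.static.Program()
startup_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, startup_prog):
    x = paddle.static.data(name='x', shape=[None, 128], dtype='float32')
    y = paddle.static.data(name='y', shape=[None, 32], dtype='float32')
    out = paddle.static.nn.fc(x, size=32)
    loss = paddle.mean(paddle.square(out - y))

    strategy = fleet.DistributedStrategy()
    strategy.sharding = True  # the model-parallel paradigm this PR targets
    strategy.sharding_configs = {"sharding_degree": 2, "segment_broadcast_MB": 32}

    optimizer = paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9)
    optimizer = sparsity.decorate(optimizer)  # attach 2:4 masking to updates
    optimizer = fleet.distributed_optimizer(optimizer, strategy)
    optimizer.minimize(loss)

exe = paddle.static.Executor()
exe.run(startup_prog)

# The flag added by this PR: under sharding, each rank prunes on its own
# device instead of every process defaulting to GPU:0 (the OOM cause above).
sparsity.prune_model(main_prog, sharding=True)
```

Launched with python -m paddle.distributed.launch as usual; without sharding, the call stays sparsity.prune_model(main_prog) and nothing changes.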

@paddle-bot-old (bot) commented on Nov 30, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old (bot) commented

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@qingqing01 (Contributor) left a comment

Please add UT

```diff
@@ -150,7 +155,8 @@ def prune_model(main_program=None,
                 n=2,
                 m=4,
                 mask_algo='mask_1d',
-                with_mask=True):
+                with_mask=True,
+                sharding=False):
```
Contributor (inline comment on the diff above):

Could we let users pass in a place directly? Would that be easier to understand?

Contributor Author (reply):

It is indeed easier to understand, but we would need to give users extra documentation. Moreover, even with that documentation, there is still a recurring risk of the place being set incorrectly, which leads to bugs. My take is that resolving the place inside prune_model makes the code more robust.
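
To illustrate the trade-off being discussed, a hypothetical sketch of resolving the device inside prune_model rather than taking a place argument (this is not the PR's actual code; FLAGS_selected_gpus is the env var set by paddle.distributed.launch):

```python
import os
import paddle

def _resolve_prune_place(sharding=False):
    # Hypothetical helper: derive the pruning device internally so callers
    # never pass a place. Under sharding, use the GPU the launcher assigned
    # to this rank; otherwise fall back to the current process's device.
    if sharding:
        gpu_id = int(os.environ.get("FLAGS_selected_gpus", "0"))
    else:
        gpu_id = paddle.distributed.ParallelEnv().dev_id
    return paddle.CUDAPlace(gpu_id)
```

Keeping this logic inside prune_model means a mis-set place cannot silently load every rank's masks onto GPU:0, which is the robustness argument above.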

@paddle-bot-old (bot) commented on Dec 9, 2021

Sorry to inform you that 13083b1's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@minghaoBD (Contributor Author) commented

> Please add UT

Added tests for optimizer compatibility and modified the prune_model API.
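
For reference, a minimal sketch of what an optimizer-compatibility test can look like (assumes a GPU build; the test name, network, and fetch logic are illustrative, not the PR's actual tests):

```python
import unittest
import numpy as np
import paddle
from paddle.static import sparsity

class TestASPOptimizerCompatibility(unittest.TestCase):
    def test_momentum(self):
        # Build a tiny static-graph model, decorate the optimizer with ASP,
        # prune, then run one step to confirm the masked update still works.
        paddle.enable_static()
        main, startup = paddle.static.Program(), paddle.static.Program()
        with paddle.static.program_guard(main, startup):
            x = paddle.static.data(name='x', shape=[None, 32], dtype='float32')
            y = paddle.static.data(name='y', shape=[None, 32], dtype='float32')
            out = paddle.static.nn.fc(x, size=32)
            loss = paddle.mean(paddle.square(out - y))
            opt = sparsity.decorate(
                paddle.optimizer.Momentum(learning_rate=0.01, momentum=0.9))
            opt.minimize(loss, startup_program=startup)
        exe = paddle.static.Executor(paddle.CUDAPlace(0))
        exe.run(startup)
        sparsity.prune_model(main)  # single-device path: sharding defaults to False
        feed = {'x': np.random.rand(4, 32).astype('float32'),
                'y': np.random.rand(4, 32).astype('float32')}
        exe.run(main, feed=feed, fetch_list=[loss])

if __name__ == '__main__':
    unittest.main()
```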

@wanghaoshuang (Contributor) left a comment

LGTM.

@wanghaoshuang (Contributor) commented

Please update the PR title to make it easier to search for and manage your work later.

@minghaoBD changed the title from "Asp sharding" to "[Paddle-ASP]Asp sharding" on Dec 31, 2021
@minghaoBD (Contributor Author) commented

> Please update the PR title to make it easier to search for and manage your work later.

Done, thanks

@JZ-LIANG (Contributor) left a comment

LGTM for sharding

@XiaoguangHu01 (Contributor) left a comment

LG API

@TCChenlong (Contributor) left a comment

LGTM

@wanghaoshuang merged commit aec6e8a into PaddlePaddle:develop on Jan 6, 2022
@minghaoBD changed the title from "[Paddle-ASP]Asp sharding" to "[Paddle-ASP] Support sharding training for Nvidia's ASP (2:4 sparsity) functionality" on Jan 6, 2022
@minghaoBD deleted the asp_sharding branch on Jan 6, 2022 at 03:30