Add dygraph sharding stage2 #37707

Merged

@Baibaifan Baibaifan commented Nov 30, 2021

PR types

New features

PR changes

Others

Describe

Add dygraph sharding stage2

import paddle
from paddle.distributed import fleet
from paddle.distributed.fleet.meta_optimizers.dygraph_optimizer.sharding_optimizer_stage2 import ShardingOptimizerStage2
from paddle.distributed.fleet.meta_parallel.sharding.sharding_stage2 import ShardingStage2

fleet.init(is_collective=True)
group = paddle.distributed.new_group([0, 1])

# wrap model & optimizer
# (`model_class`, `optimizer` and `data` are placeholders supplied by the user)
model = model_class(...)
oss_optimizer = ShardingOptimizerStage2(params=model.parameters(), optim=optimizer, group=group)
model = ShardingStage2(model, oss_optimizer, group=group)

# use optimizer as normal
img, label = data
label.stop_gradient = True
img.stop_gradient = True
out = model(img)

loss = paddle.nn.functional.cross_entropy(input=out, label=label)
loss.backward()  # compute gradients before the optimizer step
oss_optimizer.step()
oss_optimizer.clear_grad()
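
For readers new to the idea, the sketch below shows in plain Python one way a sharded optimizer could assign parameters to ranks, so that each rank keeps optimizer state and reduced gradients only for its own share. It is an illustration under assumptions, not the code in this PR: the greedy size-balancing rule, the `partition_params` helper, and the parameter names and sizes are all hypothetical.

```python
from typing import Dict, List


def partition_params(param_sizes: Dict[str, int], world_size: int) -> List[List[str]]:
    """Hypothetical helper: greedily assign each parameter to the least-loaded rank."""
    shards: List[List[str]] = [[] for _ in range(world_size)]
    loads = [0] * world_size  # number of elements currently owned by each rank
    # Place the largest parameters first so the greedy choice balances better.
    for name, size in sorted(param_sizes.items(), key=lambda kv: kv[1], reverse=True):
        rank = loads.index(min(loads))
        shards[rank].append(name)
        loads[rank] += size
    return shards


# Made-up parameter names and element counts, for illustration only.
sizes = {
    "embedding.weight": 1000,
    "fc1.weight": 400,
    "fc1.bias": 20,
    "fc2.weight": 400,
    "fc2.bias": 10,
}
shards = partition_params(sizes, world_size=2)
for rank, names in enumerate(shards):
    print(rank, names)
```

With an assignment like this, each rank would run the optimizer update only for the parameters it owns and then broadcast the updated values to the other ranks.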

1. Accuracy test
[Figure: dp2 fp32 - sharding stage2 fp32]
[Figure: dp2 fp32 - sharding stage2 fp16]

2. Performance test
stage2 buffer size = 8 MB
GPT-3 model with 0.31B parameters
gbs=16, mbs=2: dp (9412 tokens/s); stage1 (11440 tokens/s); stage2 (11992 tokens/s)

GPT-3 model with 0.31B parameters
gbs=4, mbs=2: dp (9723 tokens/s); stage1 (10244 tokens/s); stage2 (10451 tokens/s)
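
To put the throughput figures above in relative terms, the short script below computes each sharding stage's gain over the data-parallel (dp) baseline. The numbers are copied from this description; the `speedup_over_dp` helper and the dictionary layout are only illustrative.

```python
# Throughput in tokens/s, as reported in this PR description.
throughput = {
    "gbs=16, mbs=2": {"dp": 9412, "stage1": 11440, "stage2": 11992},
    "gbs=4, mbs=2": {"dp": 9723, "stage1": 10244, "stage2": 10451},
}


def speedup_over_dp(results):
    """Return the fractional throughput gain of each non-dp entry over dp."""
    base = results["dp"]
    return {name: value / base - 1.0 for name, value in results.items() if name != "dp"}


for config, results in throughput.items():
    for name, gain in speedup_over_dp(results).items():
        print(f"{config}: {name} is {gain:.1%} faster than dp")
```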

3. GPU memory test
[Figure: GPT-3 0.31B sharding stage1 fp32]
[Figure: GPT-3 0.31B sharding stage2 fp32]
Peak memory: stage1 6899, stage2 6353. Stage2 reduces the peak by 546 MB, an improvement of about 8%.
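
The reported saving can be checked with a few lines of arithmetic. The peak values are taken from the line above; the `relative_reduction` helper is only an illustrative name.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


stage1_peak = 6899  # reported stage1 peak memory
stage2_peak = 6353  # reported stage2 peak memory
saved = stage1_peak - stage2_peak
ratio = relative_reduction(stage1_peak, stage2_peak)
print(f"saved {saved}, i.e. {ratio:.1%} of the stage1 peak")
```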

paddle-bot-old bot commented Nov 30, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@paddle-bot-old
Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.

@XieYunshen XieYunshen left a comment

LGTM for set_tests_properties(test_dygraph_sharding_stage2 PROPERTIES TIMEOUT 120)

@ForFishes ForFishes left a comment

LGTM

@Baibaifan Baibaifan merged commit 20e1977 into PaddlePaddle:develop Dec 2, 2021
Zjq9409 pushed a commit to Zjq9409/Paddle that referenced this pull request Dec 10, 2021