Decouple the program desc from batch_size in Transformer. #783
Conversation
… fix-transformer-batchsize-dev
```diff
@@ -273,6 +301,9 @@ def main():
     trg_idx2word = paddle.dataset.wmt16.get_dict(
         "de", dict_size=ModelHyperParams.trg_vocab_size, reverse=True)
+    # Append the <pad> token since the dict provided by dataset.wmt16 does
+    # not include it.
+    trg_idx2word[ModelHyperParams.trg_pad_idx] = "<pad>"
```
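For context, the inference code converts predicted token indices back to words through trg_idx2word, so every index that can appear in the output, including the pad index, needs an entry. A minimal, self-contained sketch of the failure this change avoids (the toy dict and indices are hypothetical stand-ins for the real wmt16 dict):

```python
# Toy stand-in for paddle.dataset.wmt16.get_dict("de", ..., reverse=True),
# which returns {index: word} without an entry for <pad>.
trg_pad_idx = 0                          # hypothetical pad index
trg_idx2word = {1: "hallo", 2: "welt"}   # hypothetical dict contents

# The fix: register the pad index explicitly, as the PR does with
# trg_idx2word[ModelHyperParams.trg_pad_idx] = "<pad>".
trg_idx2word[trg_pad_idx] = "<pad>"

# Decoding a padded prediction now works instead of raising KeyError.
pred_ids = [1, 2, trg_pad_idx]
print(" ".join(trg_idx2word[idx] for idx in pred_ids))  # hallo welt <pad>
```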
Please fix this in the next PR.
Got it.
```diff
@@ -138,12 +144,14 @@ def test(exe):
     test_avg_costs = []
     for batch_id, data in enumerate(val_data()):
         if len(data) != TrainTaskConfig.batch_size:
```
Please fix this; validation should not simply skip batches smaller than batch_size.
Done. Refined the validation to use global statistics instead of per-batch averages.
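A minimal, self-contained sketch of what "use the global statistics" can look like: accumulate a summed cost and a token count per batch, then divide once at the end, so batches smaller than TrainTaskConfig.batch_size are weighted correctly instead of being skipped (the per-batch numbers below are hypothetical):

```python
# Hypothetical (summed_cost, token_count) pairs, one per validation
# batch; the last batch is shorter than the configured batch size.
batch_stats = [(12.6, 40), (9.3, 32), (2.1, 7)]

# Global statistics: sum first, divide once at the end.
total_cost = sum(cost for cost, _ in batch_stats)
total_tokens = sum(num for _, num in batch_stats)
global_avg_cost = total_cost / total_tokens  # every token counts equally

# Averaging per-batch averages would over-weight the short batch:
naive_avg = sum(c / n for c, n in batch_stats) / len(batch_stats)
print(global_avg_cost, naive_avg)
```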
LGTM
Decouple the program desc from batch_size in Transformer. The inference program has been validated to generate the same sentences for different batch sizes.
It relies on PaddlePaddle/Paddle#9008.
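The cross-batch-size validation mentioned above can be expressed as a small check like the following sketch; infer_sentences is a hypothetical wrapper that runs the compiled inference program on a list of source sentences with a given batch size and returns the generated target sentences in order:

```python
def check_batch_size_invariance(sentences, infer_sentences):
    # Reference output from the smallest batch size.
    ref = infer_sentences(sentences, batch_size=1)
    for bs in (2, 4, 8):
        out = infer_sentences(sentences, batch_size=bs)
        # With the program desc decoupled from batch_size, the generated
        # sentences must not change when only the batch size changes.
        assert out == ref, "generation differs at batch_size=%d" % bs
    print("Generated sentences are identical across batch sizes.")
```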