Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ErnieDoc P0] add PretrainedConfig and unit test #5210

Merged
merged 3 commits into from
Mar 14, 2023

Conversation

ZwhElliott
Copy link
Contributor

PR types

PR changes

Description

@paddle-bot
Copy link

paddle-bot bot commented Mar 13, 2023

Thanks for your contribution!

@@ -75,6 +70,8 @@ class ErnieDocTokenizer(ErnieTokenizer):
"ernie-doc-base-zh": {"do_lower_case": True},
}

# max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里缺少了预训练模型对应的PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES,导致它的max_model_input_size继承了ErnieTokenizer。导致了test_pretrained_model_lists无法通过。
但是我在文档中并未找到 "ernie-doc-base-zh"这个模型的POSITIONAL_EMBEDDINGS_SIZES,所以尚未添加。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里对齐configurations里面的max_position_embeddings即可

Copy link
Collaborator

@sijunhe sijunhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d_model还是不要改成hidden_size,维持原样.
PR质量挺高的,稍微修改一下即可

@@ -75,6 +70,8 @@ class ErnieDocTokenizer(ErnieTokenizer):
"ernie-doc-base-zh": {"do_lower_case": True},
}

# max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里对齐configurations里面的max_position_embeddings即可

class ErnieTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

tokenizer_class = ErnieDocTokenizer
# fast_tokenizer_class = ErnieFastTokenizer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

没有fast tokenizer可以删掉

@codecov
Copy link

codecov bot commented Mar 14, 2023

Codecov Report

Merging #5210 (75eb5f4) into develop (c7395fb) will increase coverage by 0.75%.
The diff coverage is 100.00%.

❗ Current head 75eb5f4 differs from pull request most recent head d9f1207. Consider uploading reports for the commit d9f1207 to get more accurate results

@@             Coverage Diff             @@
##           develop    #5210      +/-   ##
===========================================
+ Coverage    50.93%   51.69%   +0.75%     
===========================================
  Files          461      467       +6     
  Lines        65731    66629     +898     
===========================================
+ Hits         33481    34444     +963     
+ Misses       32250    32185      -65     
Impacted Files Coverage Δ
paddlenlp/transformers/__init__.py 100.00% <100.00%> (ø)
paddlenlp/transformers/ernie_doc/configuration.py 100.00% <100.00%> (ø)
paddlenlp/transformers/ernie_doc/modeling.py 96.55% <100.00%> (+77.18%) ⬆️
paddlenlp/transformers/ernie_doc/tokenizer.py 89.47% <100.00%> (+2.51%) ⬆️

... and 25 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Comment on lines 303 to 305
std=self.initializer_range
if hasattr(self, "initializer_range")
else self.ernie_doc.config["initializer_range"],
else self.ernie_doc.initializer_range,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std=self.config.initializer_range

Copy link
Collaborator

@sijunhe sijunhe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@sijunhe sijunhe merged commit 9590300 into PaddlePaddle:develop Mar 14, 2023
@ZwhElliott ZwhElliott deleted the dth-039 branch March 17, 2023 03:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants