[ErnieDoc P0] add PretrainedConfig and unit test #5210
Conversation
Thanks for your contribution!
@@ -75,6 +70,8 @@ class ErnieDocTokenizer(ErnieTokenizer):
    "ernie-doc-base-zh": {"do_lower_case": True},
}

# max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
The PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES for the pretrained model is missing here, so its max_model_input_sizes is inherited from ErnieTokenizer, which causes test_pretrained_model_lists to fail.
However, I could not find the POSITIONAL_EMBEDDINGS_SIZES for the "ernie-doc-base-zh" model in the documentation, so I have not added it yet.
Just align this with max_position_embeddings in the configurations.
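A minimal sketch of the suggested fix, following the PaddleNLP convention that a tokenizer's max_model_input_sizes maps each pretrained model name to its max_position_embeddings. The class layout mirrors the diff above, but the numeric sizes here are illustrative placeholders, not the confirmed values for these checkpoints:

```python
# Sketch: give ErnieDocTokenizer its own positional-embedding size table
# instead of inheriting ErnieTokenizer's. Size values are placeholders.

class ErnieTokenizer:
    # Parent table only covers plain ERNIE checkpoints, so a subclass
    # that inherits it fails test_pretrained_model_lists.
    max_model_input_sizes = {"ernie-1.0": 513}  # illustrative value


class ErnieDocTokenizer(ErnieTokenizer):
    pretrained_init_configuration = {
        "ernie-doc-base-zh": {"do_lower_case": True},
    }
    # Aligned with max_position_embeddings from the model configuration,
    # as the reviewer suggests (placeholder value here).
    max_model_input_sizes = {"ernie-doc-base-zh": 512}


# The invariant behind test_pretrained_model_lists: every configured
# model name has a matching entry in max_model_input_sizes.
assert set(ErnieDocTokenizer.pretrained_init_configuration) == set(
    ErnieDocTokenizer.max_model_input_sizes
)
```

With the subclass table in place, "ernie-doc-base-zh" no longer falls through to the parent's unrelated entries.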
Please keep d_model as it is rather than renaming it to hidden_size.
The PR quality is quite high; only minor changes are needed.
class ErnieTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    tokenizer_class = ErnieDocTokenizer
    # fast_tokenizer_class = ErnieFastTokenizer
There is no fast tokenizer, so this line can be deleted.
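For context, the tester-mixin pattern the test class above follows can be sketched self-containedly. The mixin and tokenizer names below are generic stand-ins, not PaddleNLP's actual classes; the point is that a subclass only sets the attributes it supports, and an unsupported slot (like a missing fast tokenizer) is simply omitted rather than left commented out:

```python
import unittest


class TokenizerTesterMixin:
    """Generic sketch of a shared test mixin parameterized by class attrs."""

    tokenizer_class = None  # subclasses must set this

    def test_tokenizer_class_is_set(self):
        # Shared check inherited by every concrete test class.
        self.assertIsNotNone(self.tokenizer_class)


class DummyTokenizer:
    """Stand-in tokenizer so the sketch runs without PaddleNLP."""

    def tokenize(self, text):
        return text.split()


class DummyTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
    # Only the supported attribute is declared; no dead placeholder lines.
    tokenizer_class = DummyTokenizer

    def test_tokenize(self):
        self.assertEqual(self.tokenizer_class().tokenize("a b"), ["a", "b"])
```

Running the suite exercises both the mixin's shared test and the concrete one.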
Codecov Report
@@ Coverage Diff @@
## develop #5210 +/- ##
===========================================
+ Coverage 50.93% 51.69% +0.75%
===========================================
Files 461 467 +6
Lines 65731 66629 +898
===========================================
+ Hits 33481 34444 +963
+ Misses 32250 32185 -65
... and 25 files with indirect coverage changes
  std=self.initializer_range
  if hasattr(self, "initializer_range")
- else self.ernie_doc.config["initializer_range"],
+ else self.ernie_doc.initializer_range,
std=self.config.initializer_range
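A sketch of why the suggested form is cleaner, assuming a PretrainedConfig-style object that always carries initializer_range. The class names and the default value below are illustrative, not the actual PaddleNLP implementation:

```python
# Sketch: once every model holds a config object, weight-init code can
# read std=self.config.initializer_range directly instead of probing the
# instance with hasattr() and falling back to a dict lookup. Names and
# the 0.02 default are illustrative.

class PretrainedConfig:
    def __init__(self, initializer_range=0.02):
        self.initializer_range = initializer_range


class ErnieDocModel:
    def __init__(self, config):
        self.config = config

    def init_std(self):
        # Old style (fragile, two possible sources of truth):
        #   std = self.initializer_range if hasattr(self, "initializer_range")
        #         else self.ernie_doc.config["initializer_range"]
        # Suggested style: one canonical source of truth on the config.
        return self.config.initializer_range


model = ErnieDocModel(PretrainedConfig(initializer_range=0.02))
print(model.init_std())  # 0.02
```

The single attribute access also fails loudly (AttributeError) if the config is malformed, instead of silently picking a different fallback value.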
lgtm!
PR types
PR changes
Description