【Hackathon 4th No.102】Add support for a new model architecture to AutoConverter: AlbertModel #5626
Conversation
…nto albert_auto_converter
Thanks for your contribution!
Codecov Report

```diff
@@            Coverage Diff             @@
##           develop    #5626      +/-   ##
===========================================
+ Coverage    59.58%   59.59%   +0.01%
===========================================
  Files          483      483
  Lines        68102    68121      +19
===========================================
+ Hits         40581    40600      +19
  Misses       27521    27521
```
I just looked at why PaddleNLP-CI is failing; it seems a different test case is the one that errored?
Yes, you can ignore that for now.
The transformers source also handles AlbertForMaskedLM and AlbertForPretraining in the same way.
```diff
@@ -357,6 +358,118 @@ class AlbertPretrainedModel(PretrainedModel):
     pretrained_init_configuration = ALBERT_PRETRAINED_INIT_CONFIGURATION
     pretrained_resource_files_map = ALBERT_PRETRAINED_RESOURCE_FILES_MAP

+    @classmethod
+    def _get_name_mappings(cls, config: AlbertConfig) -> List[StateDictNameMapping]:
+        mappings: list[StateDictNameMapping] = []
```
This variable is defined here but never used, so it can be removed.
# ("AlbertForMaskedLM",), TODO: need to tie weights | ||
# ("AlbertForPretraining",), TODO: need to tie weights | ||
("AlbertForMultipleChoice",), | ||
# ("AlbertForQuestionAnswering",), TODO: transformers NOT add the last pool layer before qa_outputs |
The pooler mapping can be controlled via `architectures`.
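For illustration, a minimal sketch of how the mapping list could be gated on the checkpoint's `architectures`; the helper name, the mapping-triple layout, and the specific check below are assumptions for this sketch, not the code merged in this PR:

```python
from typing import List, Optional


def build_albert_backbone_mappings(architectures: Optional[List[str]]) -> List[List[str]]:
    """Hypothetical helper: build [torch_name, paddle_name, action?] triples for AlbertModel."""
    mappings = [
        ["embeddings.word_embeddings.weight", "embeddings.word_embeddings.weight"],
        ["embeddings.position_embeddings.weight", "embeddings.position_embeddings.weight"],
        # ... per-layer encoder mappings elided ...
    ]

    # transformers' AlbertForQuestionAnswering builds its backbone with
    # add_pooling_layer=False, so its checkpoint has no pooler weights.
    # Skip the pooler mapping when that architecture is declared in the config.
    if not architectures or "AlbertForQuestionAnswering" not in architectures:
        mappings.extend(
            [
                ["pooler.weight", "pooler.weight", "transpose"],
                ["pooler.bias", "pooler.bias"],
            ]
        )
    return mappings
```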
# ("AlbertForMaskedLM",), TODO: need to tie weights | ||
# ("AlbertForPretraining",), TODO: need to tie weights |
Once #5623 is merged, tie_weights can be used for these.
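For context, a toy Paddle sketch of what weight tying means for the conversion: the vocabulary projection reuses the embedding matrix, so no separate decoder weight has to exist (or be mapped) in the state dict. This is illustrative only, not the real Albert head in either library:

```python
import paddle
import paddle.nn as nn


class TinyTiedLMHead(nn.Layer):
    """Toy example: the vocabulary projection shares the input embedding weights."""

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, hidden_size)
        self.bias = self.create_parameter([vocab_size], is_bias=True)

    def forward(self, hidden_states: paddle.Tensor) -> paddle.Tensor:
        # Reuse the embedding matrix ([vocab, hidden]) as the output projection,
        # so the checkpoint never contains a separate decoder weight.
        return paddle.matmul(hidden_states, self.embeddings.weight, transpose_y=True) + self.bias


head = TinyTiedLMHead()
logits = head(paddle.randn([2, 4, 16]))  # -> shape [2, 4, 100]
```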
In transformers, `AlbertForQuestionAnswering` is defined as:

```python
class AlbertForQuestionAnswering(AlbertPreTrainedModel):
    _keys_to_ignore_on_load_unexpected = [r"pooler"]

    def __init__(self, config: AlbertConfig):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.albert = AlbertModel(config, add_pooling_layer=False)  # note this line
        self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels)

        # Initialize weights and apply final processing
        self.post_init()
```

and the config of its tiny test checkpoint is:

```json
{
  "_name_or_path": "tiny_models/albert/AlbertForQuestionAnswering",
  "architectures": [
    "AlbertForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 2,
  "classifier_dropout_prob": 0.1,
  "embedding_size": 128,
  "eos_token_id": 3,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 36,
  "initializer_range": 0.02,
  "inner_group_num": 1,
  "intermediate_size": 37,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "albert",
  "num_attention_heads": 6,
  "num_hidden_groups": 6,
  "num_hidden_layers": 6,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.28.0.dev0",
  "type_vocab_size": 16,
  "vocab_size": 30000
}
```

whereas in paddlenlp, `AlbertModel` builds the pooler like this:

```python
class AlbertModel(AlbertPretrainedModel):
    def __init__(self, config: AlbertConfig):
        ...
        if config.add_pooling_layer:  # note this line
            self.pooler = nn.Linear(config.hidden_size, config.hidden_size)
            self.pooler_activation = nn.Tanh()
        else:
            self.pooler = None
            self.pooler_activation = None
        self.init_weights()
```

Used like […] this is generally not a problem, because the model's […]. Here I made the pooler mapping conditional on `add_pooling_layer`:

```python
if config.add_pooling_layer:
    model_mappings.extend(
        [
            ["pooler.weight", "pooler.weight", "transpose"],
            ["pooler.bias", "pooler.bias"],
        ]
    )
```

Could you take a look and check whether handling it this way works? Thanks!
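As a side note, here is a small self-contained sketch of what applying such `[torch_name, paddle_name, action?]` triples amounts to during conversion; the helper below is an illustrative assumption, not PaddleNLP's actual StateDictNameMapping logic:

```python
import numpy as np


def apply_name_mappings(torch_state_dict, mappings):
    """Toy converter: each mapping is [torch_name, paddle_name] or
    [torch_name, paddle_name, "transpose"]."""
    paddle_state_dict = {}
    for mapping in mappings:
        torch_name, paddle_name = mapping[0], mapping[1]
        action = mapping[2] if len(mapping) > 2 else None
        tensor = torch_state_dict[torch_name]
        if action == "transpose":
            # torch stores nn.Linear weights as [out, in]; paddle expects [in, out].
            tensor = tensor.T
        paddle_state_dict[paddle_name] = tensor
    return paddle_state_dict


# Dummy pooler weights (hidden_size = 4) standing in for a real checkpoint.
torch_weights = {
    "pooler.weight": np.arange(16, dtype="float32").reshape(4, 4),
    "pooler.bias": np.zeros(4, dtype="float32"),
}
pooler_mappings = [
    ["pooler.weight", "pooler.weight", "transpose"],
    ["pooler.bias", "pooler.bias"],
]
converted = apply_name_mappings(torch_weights, pooler_mappings)
print(converted["pooler.weight"].shape)  # (4, 4), transposed relative to the source
```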
OK.
You'll need to merge develop into this branch first.
…nto albert_auto_converter
Done, please review. Thanks!
No problem.
lgtm
PR types
New features
PR changes
APIs
Description
【Hackathon 4th No.102】Add support for a new model architecture to AutoConverter: AlbertModel
The Hackathon 4th No.102 task covers five models. I plan to submit a separate PR for each model; this PR handles the albert model.
Test checkpoint: hf-internal-testing/tiny-random-AlbertModel
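For reference, a conversion smoke test against this checkpoint might look roughly like the sketch below; the keyword arguments `from_hf_hub` and `convert_from_torch` are assumptions about the loading API, so check the repo's existing AutoConverter tests for the canonical call:

```python
from paddlenlp.transformers import AlbertModel

# Hypothetical smoke test: download the tiny HF checkpoint and let the
# converter map its torch state dict onto the Paddle AlbertModel.
# `from_hf_hub` / `convert_from_torch` are assumed kwargs, not verified here.
model = AlbertModel.from_pretrained(
    "hf-internal-testing/tiny-random-AlbertModel",
    from_hf_hub=True,
    convert_from_torch=True,
)
model.eval()
```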
A few issues to flag:
- The AlbertForQuestionAnswering model cannot be tested yet: the transformers version of this model differs slightly from paddlenlp's. The transformers albert does not add a pooling layer, while the paddle one does, so the name mapping cannot be built. How should this be handled?
- AlbertForMaskedLM and AlbertForPretraining (following the BertModel approach) likewise cannot be tested yet.

@wj-Mcat please review, thanks! :)