Loading from checkpoints re-downloads pre-trained BERT model #9236
-
I am defining a simple multi-class BERT classification model and training it with pytorch-lightning. The code is in https://colab.research.google.com/drive/1os9mz7w7gmLBL_ZDvZ9K1saz9UA3rmD7?usp=sharing under `class BertForMulticlassSequenceClassification(BertPreTrainedModel)`. The issue is that after training, when I load the classifier with `model = ClassTaggerModel.load_from_checkpoint(checkpoint_file)`, the pre-trained BERT model is downloaded again.
The reason is probably something in how the model is constructed, but I am not sure how to avoid the re-download.
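The notebook code isn't reproduced in the thread, but judging from the reply below, the pattern that causes the re-download is a `from_pretrained` call inside the LightningModule's `__init__`. A minimal sketch of that pattern (the class names, backbone, and `bert-base-cased` here are illustrative assumptions, not the actual notebook code):

```python
from pytorch_lightning import LightningModule
from transformers import BertForSequenceClassification


class ClassTaggerModel(LightningModule):
    def __init__(self, num_classes: int):
        super().__init__()
        # from_pretrained() runs inside __init__, so it runs again whenever the
        # module is re-instantiated, including by load_from_checkpoint(); that
        # is what triggers the repeated download of the BERT weights.
        self.model = BertForSequenceClassification.from_pretrained(
            "bert-base-cased", num_labels=num_classes
        )
```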
-
It's because Lightning instantiates the LightningModule and then loads the weights with `load_from_checkpoint`; since you call `HFModel.from_pretrained` in `__init__`, the pre-trained weights are downloaded every time the module is constructed. There is a way around this:

```python
class HFLightningModule(LightningModule):
    def __init__(self, ..., model_name=None):
        super().__init__()
        if model_name is not None:
            # training: start from the pre-trained weights
            self.model = HFModel.from_pretrained(model_name, ...)
        else:
            # restoring from a checkpoint: only build the architecture;
            # load_from_checkpoint fills in the trained weights afterwards
            self.model = HFModel(config, num_classes)


model = HFLightningModule(..., model_name='bert-base-cased')
trainer.fit(model, ...)
model = HFLightningModule.load_from_checkpoint(...)
```

Although there might be a better solution.
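For a self-contained version of that idea, here is a minimal sketch (my own, not code from the thread) that assumes `transformers`' `BertConfig`/`BertForSequenceClassification` as the backbone. It deviates slightly from the snippet above: the model name stays in the saved hyperparameters so the matching architecture can always be rebuilt, and a `pretrained` flag decides whether the heavy weight download happens; only the small config file is fetched in both cases.

```python
from pytorch_lightning import LightningModule
from transformers import BertConfig, BertForSequenceClassification


class HFLightningModule(LightningModule):
    def __init__(self, model_name: str, num_classes: int, pretrained: bool = True):
        super().__init__()
        self.save_hyperparameters()  # stored in the checkpoint
        # Fetches only the config (a few KB), never the model weights.
        config = BertConfig.from_pretrained(model_name, num_labels=num_classes)
        if pretrained:
            # Fresh training run: load the pre-trained BERT weights once.
            self.model = BertForSequenceClassification.from_pretrained(
                model_name, config=config
            )
        else:
            # Restoring from a checkpoint: build a randomly initialized model
            # with the same architecture; load_from_checkpoint then overwrites
            # its weights with the ones stored in the checkpoint.
            self.model = BertForSequenceClassification(config)

    def forward(self, **inputs):
        return self.model(**inputs)


# Training: downloads bert-base-cased the first time, then uses the cache.
model = HFLightningModule("bert-base-cased", num_classes=5, pretrained=True)
# trainer.fit(model, datamodule)  # trainer/datamodule set up elsewhere

# Restoring: pretrained=False overrides the saved hyperparameter, so __init__
# skips from_pretrained and only the checkpoint weights are loaded.
model = HFLightningModule.load_from_checkpoint(
    "path/to/checkpoint.ckpt", pretrained=False  # placeholder path
)
```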