Load model from check-points #2746

Open
Jakobhenningjensen opened this issue Jun 12, 2024 · 3 comments

@Jakobhenningjensen

I have a training run that pushes to MyRepo/Model-train at each logging interval during the training phase, i.e.

    training_args = SentenceTransformerTrainingArguments(
        .
        .
        load_best_model_at_end=True,
        push_to_hub=True,
        push_to_hub_organization="MyRepo",
        push_to_hub_model_id="Model-train",
        push_to_hub_token=os.environ["HUGGING_FACE_API_TOKEN"],
    )

When the training was done, model.push_to_hub() failed (my fault), so I wanted to just load the model from my checkpoint and push that.

If I try to do SentenceTransformer("MyRepo/Model-train"), then I get "No sentence-transformers model found with name ... Creating a new one with mean pooling."

Is there anything I can do? Or do I just need to retrain and wait for 18 hours? 😬

If I go to the repo, I see the following files:
[screenshot of the repository file listing]

@Jakobhenningjensen (Author) commented Jun 12, 2024

I'm currently trying to do it manually, like

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Transformer

model = Transformer(path)
pooling_model = Pooling(model.get_word_embedding_dimension(), "mean")
sbert = SentenceTransformer(modules=[model, pooling_model])

but I'm not entirely sure if that is correct, i.e. whether "mean" pooling was used or "cls" (the model does work fine, though).
Note: I don't want to continue the training; I actually just want to load the checkpoint and push it to the Hub.

(would you be interested in me making a PR for a SentenceTransformer.from_pretrained?)
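
For completeness, pushing the manually assembled model from the snippet above would look roughly like this (a sketch; the repo id and token environment variable are the ones from the training arguments above):

    import os

    # sketch: push the manually assembled model from the snippet above,
    # reusing the repo id and token env var from the training arguments
    sbert.push_to_hub("MyRepo/Model-train", token=os.environ["HUGGING_FACE_API_TOKEN"])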

@ganeshkrishnan1

You can directly load the SentenceTransformer with a path:

    # word_embedding_model = models.Transformer('mixedbread-ai/mxbai-embed-large-v1')
    # pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
    # model_kwargs = {'device': 'cuda', 'attn_implementation': 'flash_attention_2'}
    # model = SentenceTransformer(modules=[word_embedding_model, pooling_model], device="cuda", model_kwargs=model_kwargs)

    model = SentenceTransformer("./aihello-sbert-combined-model/checkpoint-200000/")

@tomaarsen (Collaborator)

> I'm currently trying to do it manually, like
>
>     from sentence_transformers import SentenceTransformer
>     from sentence_transformers.models import Pooling, Transformer
>
>     model = Transformer(path)
>     pooling_model = Pooling(model.get_word_embedding_dimension(), "mean")
>     sbert = SentenceTransformer(modules=[model, pooling_model])
>
> but I'm not entirely sure if that is correct, i.e. whether "mean" pooling was used or "cls" (the model does work fine, though). Note: I don't want to continue the training; I actually just want to load the checkpoint and push it to the Hub.

If the base model is not yet a SentenceTransformer model, then this (i.e., mean pooling) is indeed equivalent. Like @ganeshkrishnan1 rightly points out, you can also use a path, and even if it's a non-ST model, it'll automatically add the mean pooling. Otherwise, it'll add whatever modules (including pooling) were specified in the configuration files.
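
As an illustration of that fallback (a minimal sketch; bert-base-uncased stands in for any non-ST base model):

    from sentence_transformers import SentenceTransformer

    # loading a plain transformers model triggers the "No sentence-transformers
    # model found ... Creating a new one with mean pooling." warning and builds
    # a Transformer module followed by a mean Pooling module
    model = SentenceTransformer("bert-base-uncased")
    print(model)  # prints the module stack, including the Pooling configuration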

> (would you be interested in me making a PR for a SentenceTransformer.from_pretrained?)

I don't think so; my intention is that the SentenceTransformer constructor performs similarly to e.g. AutoModel.from_pretrained in transformers. Adding a SentenceTransformer.from_pretrained would then not add much, and it would be a bit confusing for users.

As for your original question: I'm not super familiar with the automatic push_to_hub via the training arguments, but I think it depends on what your hub_strategy is. If you use checkpoint or all_checkpoints, then you should be able to continue training from the last checkpoint if you get a crash. Otherwise, by default (i.e. every_save), I think it only uploads the model, config, tokenizer, and model card, but not the trainer state.
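
For reference, a sketch of what that configuration could look like (hub_strategy is inherited from transformers' TrainingArguments; the output_dir and repo id here are placeholders):

    from sentence_transformers import SentenceTransformerTrainingArguments

    training_args = SentenceTransformerTrainingArguments(
        output_dir="output/model-train",  # placeholder
        push_to_hub=True,
        hub_model_id="MyRepo/Model-train",
        # "checkpoint" also uploads the latest checkpoint so training can be
        # resumed after a crash; "all_checkpoints" pushes every checkpoint;
        # the default "every_save" pushes only the model files on each save
        hub_strategy="checkpoint",
    )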

In short, in your case I think you can just do

model = SentenceTransformer("MyRepo/Model-train")
model.push_to_hub("MyRepo/Model-train")

but it depends on the base model. If it was:

  1. not a ST model, or
  2. an ST model with mean pooling (check the 1_Pooling/config.json file)

then you're good to go with this.
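
To check the pooling mode without loading the model, you can inspect that file directly (a sketch using huggingface_hub; the active mode is whichever pooling_mode_* flag is true):

    import json
    from huggingface_hub import hf_hub_download

    # fetch the pooling configuration from the Hub repo
    path = hf_hub_download("MyRepo/Model-train", "1_Pooling/config.json")
    with open(path) as f:
        pooling_config = json.load(f)

    # e.g. pooling_mode_mean_tokens / pooling_mode_cls_token
    print({key: value for key, value in pooling_config.items() if key.startswith("pooling_mode")})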

- Tom Aarsen
