Load model from check-points #2746

Open
Jakobhenningjensen opened this issue Jun 12, 2024 · 3 comments

@Jakobhenningjensen

I have a training run that pushes to MyRepo/Model-train at each logging interval during the training phase, i.e.

    training_args = SentenceTransformerTrainingArguments(
        .
        .
        load_best_model_at_end=True,
        push_to_hub=True,
        push_to_hub_organization="MyRepo",
        push_to_hub_model_id="Model-train",
        push_to_hub_token=os.environ["HUGGING_FACE_API_TOKEN"],
    )

When the training was done, model.push_to_hub() failed (my fault), so I wanted to just load the model from my checkpoint and push that.

If I try to do SentenceTransformer("MyRepo/Model-train"), then I get "No sentence-transformers model found with name ... Creating a new one with mean pooling."

Is there anything I can do? Or do I just need to retrain and wait for 18 hours? 😬

If I go to the repo, I see the following files:
[screenshot of the repository file listing]

@Jakobhenningjensen (Author) commented Jun 12, 2024

I'm currently trying to do it manually, like

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Transformer

model = Transformer(path)
pooling_model = Pooling(model.get_word_embedding_dimension(), "mean")
sbert = SentenceTransformer(modules=[model, pooling_model])

but I'm not entirely sure if that is correct, i.e. whether "mean" pooling was used or "cls" (the model does work fine, though).
Note: I don't want to continue the training; I actually just want to load the checkpoint and push it to the Hub.

(would you be interested in me making a PR for a SentenceTransformer.from_pretrained?)
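
For completeness, pushing the manually assembled model from the snippet above would look roughly like this (a sketch; the repo id and token environment variable are the ones from the training arguments above):

    import os

    # sketch: push the manually assembled model from the snippet above,
    # reusing the repo id and token env var from the training arguments
    sbert.push_to_hub("MyRepo/Model-train", token=os.environ["HUGGING_FACE_API_TOKEN"])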

@ganeshkrishnan1

You can directly load the SentenceTransformer with a path:

    # word_embedding_model = models.Transformer('mixedbread-ai/mxbai-embed-large-v1')
    # pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
    # model_kwargs = {'device': 'cuda', 'attn_implementation': 'flash_attention_2'}
    # model = SentenceTransformer(modules=[word_embedding_model, pooling_model], device="cuda", model_kwargs=model_kwargs)

    model = SentenceTransformer("./aihello-sbert-combined-model/checkpoint-200000/")

@tomaarsen (Collaborator)

> I'm currently trying to do it manually, like
>
>     from sentence_transformers import SentenceTransformer
>     from sentence_transformers.models import Pooling, Transformer
>
>     model = Transformer(path)
>     pooling_model = Pooling(model.get_word_embedding_dimension(), "mean")
>     sbert = SentenceTransformer(modules=[model, pooling_model])
>
> but I'm not entirely sure if that is correct, i.e. whether "mean" pooling was used or "cls" (the model does work fine, though). Note: I don't want to continue the training; I actually just want to load the checkpoint and push it to the Hub.

If the base model is not yet a SentenceTransformer model, then this (i.e., mean pooling) is indeed equivalent. Like @ganeshkrishnan1 rightly points out, you can also use a path, and even if it's a non-ST model, it'll automatically add the mean pooling. Otherwise, it'll add whatever modules (including pooling) were specified in the configuration files.
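
As an illustration of that fallback (a minimal sketch; bert-base-uncased stands in for any non-ST base model):

    from sentence_transformers import SentenceTransformer

    # loading a plain transformers model triggers the "No sentence-transformers
    # model found ... Creating a new one with mean pooling." warning and builds
    # a Transformer module followed by a mean Pooling module
    model = SentenceTransformer("bert-base-uncased")
    print(model)  # prints the module stack, including the Pooling configuration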

> (would you be interested in me making a PR for a SentenceTransformer.from_pretrained?)

I don't think so; my intention is that the SentenceTransformer constructor performs similarly to e.g. AutoModel.from_pretrained in transformers. Adding a SentenceTransformer.from_pretrained would then not add much, and it would be a bit confusing for users.

As for your original question: I'm not super familiar with the automatic push_to_hub via the training arguments, but I think it depends on what your hub_strategy is. If you use checkpoint or all_checkpoints, then you should be able to continue training from the last checkpoint if you get a crash. Otherwise, by default (i.e. every_save), I think it only uploads the model, config, tokenizer, and model card, but not the trainer state.
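
For reference, a sketch of what that configuration could look like (hub_strategy is inherited from transformers' TrainingArguments; the output_dir and repo id here are placeholders):

    from sentence_transformers import SentenceTransformerTrainingArguments

    training_args = SentenceTransformerTrainingArguments(
        output_dir="output/model-train",  # placeholder
        push_to_hub=True,
        hub_model_id="MyRepo/Model-train",
        # "checkpoint" also uploads the latest checkpoint so training can be
        # resumed after a crash; "all_checkpoints" pushes every checkpoint;
        # the default "every_save" pushes only the model files on each save
        hub_strategy="checkpoint",
    )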

In short, in your case I think you can just do

model = SentenceTransformer("MyRepo/Model-train")
model.push_to_hub("MyRepo/Model-train")

but it depends on the base model. If it was:

  1. not a ST model, or
  2. an ST model with mean pooling (check the 1_Pooling/config.json file)

then you're good to go with this.
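
To check the pooling mode without loading the model, you can inspect that file directly (a sketch using huggingface_hub; the active mode is whichever pooling_mode_* flag is true):

    import json
    from huggingface_hub import hf_hub_download

    # fetch the pooling configuration from the Hub repo
    path = hf_hub_download("MyRepo/Model-train", "1_Pooling/config.json")
    with open(path) as f:
        pooling_config = json.load(f)

    # e.g. pooling_mode_mean_tokens / pooling_mode_cls_token
    print({key: value for key, value in pooling_config.items() if key.startswith("pooling_mode")})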

- Tom Aarsen
