Should I do 'accelerator.prepare()' again after 'accelerator.unwrap_model()'? #948

Closed
DtYXs opened this issue Dec 29, 2022 · 11 comments

@DtYXs

DtYXs commented Dec 29, 2022

For multi-gpu training:
model = accelerator.prepare(model)
I want to save ckpt during training. Then I do this:
model = accelerator.unwrap_model(model)
pipeline = Pipeline(model=model)
pipeline.save_pretrained(...)
And I want to continue training. Should I do model = accelerator.prepare(model) again after saving?

@pacman100
Contributor

Hello,

below is the code structure to follow:

model = accelerator.prepare(model)

for epoch in range(num_epochs):
    # train loop
    
    # eval loop
    
    # saving checkpoint
    unwrapped_model = accelerator.unwrap_model(model)
    
    pipeline = Pipeline(model=unwrapped_model)
    pipeline.save_pretrained(...)

    # alternatively you can save using `unwrapped_model.save_pretrained(...)`

Let us know if that solves your query. For such queries in future, please use https://discuss.huggingface.co/c/accelerate/18
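
As an illustration only (not part of the original reply): a minimal runnable sketch of that structure, using a toy torch.nn.Linear model and torch.save in place of the Pipeline / save_pretrained calls, which depend on your actual setup. Note that prepare() is called once and is not repeated after saving.

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# toy model and data so the loop actually runs; swap in your real model/pipeline
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

num_epochs = 2
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in dataloader:
        loss = torch.nn.functional.cross_entropy(model(inputs), labels)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

    # checkpoint at the end of each epoch; training simply continues afterwards
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        unwrapped_model = accelerator.unwrap_model(model)
        torch.save(unwrapped_model.state_dict(), f"checkpoint-epoch-{epoch}.pt")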

@DtYXs
Author

DtYXs commented Dec 29, 2022

Thanks for your helpful reply.
Does the operation that produces unwrapped_model affect model itself?
Actually, in my code, if I continue training the original model after doing this, there are problems. I want to know whether unwrapped_model is completely independent of model, and if not, how I can save a checkpoint multiple times without breaking the training process, rather than saving only once after training has finished.
In short, I want to save checkpoints multiple times during training, but accelerator.unwrap_model(model) does not seem to work well for that.

@pacman100
Contributor

Hello @DtYXs, can you please provide a minimal reproducible example of the issue that you are facing?

@miquelmarti

miquelmarti commented Jan 18, 2023

I'm having problems with this too. With fp16 mixed precision and no distributed training, after I use unwrap_model to save a checkpoint at the end of an epoch (running close to max memory usage), I get an OOM immediately on the next forward pass of training. If I don't unwrap, there is no OOM. I'm looking into it, but I'm pretty sure the culprit is around here:

original_forward = model.__dict__.pop("_original_forward", None)

unwrap_model seems to affect the passed model in place, removing the autocast (and fp32 output conversion) wrapper from its forward pass.
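
A quick way to observe this in-place effect (my own check, not from the thread; it assumes a GPU, since fp16 mixed precision requires one, and the accelerate version discussed here, where the wrapper is removed by default):

import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")
model = accelerator.prepare(torch.nn.Linear(4, 4))

forward_before = model.forward
unwrapped = accelerator.unwrap_model(model)  # default strips the autocast/fp32 wrapper
forward_after = model.forward

# Prints False if the wrapper was removed in place: the original model object now has
# a different forward, so later training steps run without the mixed-precision wrapper.
print(forward_before is forward_after)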

@sgugger
Collaborator

sgugger commented Jan 18, 2023

Yes, it does; this was added recently by @muellerzr.
Maybe we should add a flag to unwrap_model so the user can choose whether the model hooks are removed or not.

@miquelmarti

One would expect them to be removed on the returned object, but never on the object you pass to the function, no?

@sgugger
Collaborator

sgugger commented Jan 18, 2023

It's tricky since a model is a reference type, and we can't really create a whole copy of the model.

@muellerzr
Collaborator

@miquelmarti what Sylvain already describes is actually supported. Just do:

model = accelerator.unwrap_model(model, keep_fp32_wrapper=True)

Will this work for you? :)
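
Dropping that into the kind of loop sketched earlier in this thread (illustrative only, with a plain state_dict save standing in for save_pretrained), the per-epoch checkpoint becomes:

    # inside the epoch loop, after the train/eval steps
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        # keep_fp32_wrapper=True returns the underlying model for saving but leaves the
        # mixed-precision forward wrapper on `model`, so training continues unchanged
        unwrapped_model = accelerator.unwrap_model(model, keep_fp32_wrapper=True)
        torch.save(unwrapped_model.state_dict(), f"checkpoint-epoch-{epoch}.pt")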

@miquelmarti

miquelmarti commented Jan 19, 2023

I understand, and yes, this flag does work for me. However, the behaviour before the change was equivalent to setting the flag to True, and IMO that should still be the default so that unwrapping the model does not affect subsequent calls to forward. I think train -> checkpoint the unwrapped model -> keep training is probably the most common use of unwrap_model. If there is a good reason for it not to default to True, a warning in the documentation would probably save people who are upgrading, or who have expectations similar to mine, some debugging time :')

EDIT: I guess #858 is the reason the hooks are removed by default. If I keep the hook via the flag, I hit the same problem described in that issue when I do intermediate checkpoints; if I don't, training cannot continue in the same way. There should be a way to allow for both.
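
One possible compromise, sketched under the assumption that the wrapper only patches forward and not the weights (so state_dict-based checkpoints are unaffected by keeping it):

# intermediate checkpoints: keep the wrapper so training continues unchanged
unwrapped = accelerator.unwrap_model(model, keep_fp32_wrapper=True)
torch.save(unwrapped.state_dict(), f"checkpoint-epoch-{epoch}.pt")

# final export after training: drop the wrapper so the returned model is a plain module
final_model = accelerator.unwrap_model(model)  # keep_fp32_wrapper defaults to False in this version
torch.save(final_model.state_dict(), "final.pt")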

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@JeS24

JeS24 commented Jul 15, 2024

@muellerzr @sgugger Was any consensus reached on what to do here? I am facing the same dilemma as @miquelmarti and would prefer that unwrapping and checkpointing the model (using save_pretrained()) does not impact its ability to continue training.
