🐯 Fix LigerKernel for SFTTrainer #2940
Conversation
```diff
@@ -173,7 +173,6 @@ def __init__(
         )
         if isinstance(model, str):
             model = self._create_model_from_path(model, args)
-        self.use_liger = is_liger_kernel_available() and isinstance(model, AutoLigerKernelForCausalLM)
```
What if the model used is already a liger model (and args.use_liger = False)?
Good point, but as recommended by the Liger maintainer, we shouldn't be passing Liger models at init (just patched via the config).
Should we deprecate passing the Liger model to the trainer, or would you prefer an alternative?
It seems that this command doesn't achieve what it's supposed to:

```python
>>> from liger_kernel.transformers import AutoLigerKernelForCausalLM
>>> model = AutoLigerKernelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
Applied Liger kernels to Qwen2
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
>>> isinstance(model, AutoLigerKernelForCausalLM)
False
```
I don't know of an easy way to test whether a model is Liger-patched (perhaps @ByronHsu does?).
Anyway, with your change you can still pass a Liger model to the trainer, but you'll need to specify `use_liger=True`. Which sounds good to me.
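For what it's worth, one ad-hoc heuristic (not an official Liger API, just a sketch) is to scan the instantiated submodules for classes defined in the `liger_kernel` package, since the patching swaps stock layers for Liger ones:

```python
def looks_like_liger_model(model) -> bool:
    """Heuristic check: True if any submodule's class comes from the
    liger_kernel package. Not an official API; patching that only replaces
    forward functions (rather than layer classes) would not be detected."""
    return any(
        type(module).__module__.startswith("liger_kernel")
        for module in model.modules()
    )
```

This relies on `apply_liger_kernel_to_*` replacing layer classes (e.g. `LigerRMSNorm`) before instantiation, so it is best treated as a debugging aid rather than a robust test.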
This line doesn't work when a Liger Model is converted to PEFT before passing into the trainer. It does not respect args.use_liger either.
> Good point, but as recommended by the Liger maintainer, we shouldn't be passing Liger models at the init (just patched via the config)
> Should we deprecate passing the Liger model to the trainer or would you prefer an alternative?
Deprecating it might cause some problems. LoRA+ requires the model instance to be created beforehand (to create the optimizer). The flag `use_liger` does not convert a PEFT-wrapped model to a Liger model.
```python
model = get_peft_model(model, lora_config)
optimizer = create_loraplus_optimizer(
    model=model,
    optimizer_cls=torch.optim.AdamW,
    lr=lr,
    eps=eps,
    betas=betas,
    weight_decay=weight_decay,
    loraplus_lr_ratio=loraplus_lr_ratio,
)
```
> The flag `use_liger` does not convert PEFT wrapped model to liger model.
In `sft_trainer.py`:
```python
if args.use_liger:
    if not is_liger_kernel_available():
        raise ImportError("Please install Liger-kernel for use_liger=True")
    model = AutoLigerKernelForCausalLM.from_pretrained(model_path, **model_init_kwargs)
else:
    model = AutoModelForCausalLM.from_pretrained(model_path, **model_init_kwargs)
return model
```
Also, many VLMs can't be loaded with `AutoLigerKernelForCausalLM`. For example, monkey patching with `apply_liger_kernel_to_qwen2_vl()` is required for Qwen2-VL.
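As a sketch of that workaround (assuming `liger_kernel.transformers` exposes `apply_liger_kernel_to_qwen2_vl`, and using an example checkpoint name not taken from this thread):

```python
from liger_kernel.transformers import apply_liger_kernel_to_qwen2_vl
from transformers import Qwen2VLForConditionalGeneration

# Monkey-patch the Qwen2-VL modeling code with Liger kernels.
# This must happen BEFORE from_pretrained(), since patching swaps the
# layer classes that from_pretrained() will instantiate.
apply_liger_kernel_to_qwen2_vl()

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct"  # example checkpoint, for illustration only
)
```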
@kashif @qgallouedec @lewtun, I think Liger models are now supported natively in transformers if the …
I think so too... we will need to pin the transformers version, but yes, it should be a better solution.
Thanks @edbeeching!
After checking, it seems like …
It was introduced in 4.45.
Thanks for the pointer to … If yes, feel free to merge if I'm offline :)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
What does this PR do?
Fixes an issue where `use_liger=True` threw an error in the loss computation because the fix in #2874 was partially reverted in this line of #2890. Without this change, one gets errors in the loss computation.

Command to test:

```shell
trl sft --model_name_or_path Qwen/Qwen2.5-0.5B --dataset_name trl-lib/Capybara --output_dir Qwen2.5-0.5B-SFT --use_liger true
```
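The same check can be run from Python rather than the CLI (a sketch only; it assumes the `use_liger` field on `SFTConfig` as used in this PR, and it downloads the model and dataset when run):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-0.5B-SFT",
    use_liger=True,  # field name as in this PR; later versions may rename it
)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # passed as a string so the trainer loads it
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```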
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.