Mistral 7b conversion script #8052
Conversation
Force-pushed 57d578c to 91ac647
Force-pushed a8516bf to c1b0eb0
Force-pushed 4a4e725 to 29c543a
Could you please keep the conversion script consistent with the other ones? In particular, ideally we keep these aligned across scripts. The Llama2 conversion script has a slightly different flavour, but I believe we should follow the "majority" style for new scripts (and align Llama2 at some point).
# Rename the 'model.' prefix to 'model.module.' so the keys match the wrapped module.
for key in keys:
    checkpoint['state_dict'][key.replace('model.', 'model.module.', 1)] = checkpoint['state_dict'].pop(key)

model = load_model(MegatronGPTModel, checkpoint, strict=False, trainer=trainer)
Function load_model is a bit convoluted, can you avoid using it? I recommend instantiating the model with something like:
model = MegatronGPTModel(cfg, trainer)
missing_keys, unexpected_keys = model.load_state_dict(hf_state_dict, strict=False)
See also #7977.
Do you know if model.load_state_dict with strict=False verifies whether any weights were loaded at all? I'm not implying that load_model performs this check; rather, I want to understand whether this is something we care about doing at this stage of the script.
That PR #7977 has been merged. Could you please replace load_model with load_state_dict_helper? The former is really cluttered and unnecessarily saves two tokenizers -- the one stored in tokenizer.tokenizer_model is not needed.
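For reference, a minimal sketch of what the replacement could look like; the helper's import path and exact signature here are my assumptions, so please check #7977 for the actual interface:
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
from nemo.collections.nlp.parts.utils_funcs import load_state_dict_helper  # assumed location of the helper from #7977

# nemo_config and trainer stand for the config/Trainer built earlier in the script (names illustrative);
# checkpoint['state_dict'] holds the converted weights, already renamed to NeMo parameter names.
model = load_state_dict_helper(MegatronGPTModel, nemo_config, trainer, checkpoint['state_dict'])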
And as for strict=False, there are checks for the missing_keys and unexpected_keys lists in load_state_dict_helper that will complain if any expected weights are not loaded, or are superfluous.
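In plain PyTorch terms the check amounts to roughly the sketch below (cfg, trainer and hf_state_dict are the placeholders from the earlier snippet):
from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel

model = MegatronGPTModel(cfg, trainer)
# strict=False never raises on its own; it only reports what failed to match.
missing_keys, unexpected_keys = model.load_state_dict(hf_state_dict, strict=False)
# So the script has to complain explicitly:
if missing_keys:
    raise RuntimeError(f"Expected weights were not loaded: {missing_keys}")
if unexpected_keys:
    raise RuntimeError(f"Checkpoint contains weights the model does not expect: {unexpected_keys}")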
As for architecture, Mistral seems to heavily borrow from Llama2. Would it be possible to just extend the Llama2 conversion script to cover the HF Mistral checkpoint?
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.
Hi @janekl, thanks for your comments! I'm leaning towards merging this as is and, once we settle on how we want to refactor the import scripts, doing the actual refactor. I think you are right that it shares a lot with the Llama script, and once the PRs you mention have been approved and merged we can follow the appropriate guidelines here as well.
Force-pushed 4dc48ae to fab45fc
Dismissed alerts on scripts/nlp_language_modeling/convert_nemo_mistral_7b_to_hf.py
From mistral checkpoint not hf. Pending: support for block-diagonal attention mask. Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
for more information, see https://pre-commit.ci Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Force-pushed fab45fc to 1a0e698
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Force-pushed 1a0e698 to 763fbb2
jenkins
jenkins
I think that adding a Jenkins conversion test for a small Mistral model would be really desirable. Examples are here (lines 391 to 419 at f10d694).
cc @ericharper
Sorry for replying late. That's fine, we can polish it later; there is a consistency PR open here: #8192. In any case, please see my comments in the two other threads. We really need to test the conversion analogously to the other scripts, otherwise we won't have any confidence in refactoring work later. I think @ericharper should support this extra effort.
LGTM. Thanks!
* Import script for mistral-7b. From mistral checkpoint not hf. Pending: support for block-diagonal attention mask.
* add window_size to nemo_config.
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Switch from Mistral checkpoint to HF-Mistral.
* Force lowercase when checking for normalization type.
* NeMo-Mistral-7B to HF-Mistral-7B.
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
This reverts commit 9944304.
What does this PR do?
Adds a conversion script for Mistral-7B to NeMo.
Collection: nlp/language_modeling
Changelog
Usage
# Add a code snippet demonstrating how to use this
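For example, the NeMo-to-HF direction might be invoked roughly as below (the flag names are placeholders, not the script's confirmed arguments; check its --help):
# python scripts/nlp_language_modeling/convert_nemo_mistral_7b_to_hf.py --in-file <path_to_nemo_checkpoint> --out-file <path_to_hf_output>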
Jenkins CI
To run Jenkins, a NeMo User with write access must comment jenkins on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information