One question about the permute function code in permute_qkv.py #89

Open
drxmy opened this issue Nov 24, 2023 · 2 comments

@drxmy

drxmy commented Nov 24, 2023

I am trying to convert a Baichuan2 Megatron checkpoint to Hugging Face (HF) format. While reading the code, I could not understand this part:

def permute(x):
    if revert:
        return x.view(head_dim//2, 2, dim).transpose(0, 1).reshape(head_dim, dim)
    return x.view(2, head_dim//2, dim).transpose(0, 1).reshape(head_dim, dim)

Why head_dim//2?
I would really appreciate it if someone could explain this.

@martinjaggi
Contributor

could you be more precise on the question, and which code file and model architecture you're referring to?

@drxmy
Author

drxmy commented Jan 3, 2024

> could you be more precise on the question, and which code file and model architecture you're referring to?

I trained a Baichuan2 model (https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) with Megatron-LM and want to convert it back to Hugging Face format.

I don't understand this part of the code, which I ran into while trying to add support for Baichuan2:
https://github.com/epfLLM/Megatron-LLM/blob/1b06b129fa463b7bfce88ef49e2082f8df00c7fa/weights_conversion/utils/permute_qkv.py#L15C3-L18C82
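
A note on the `permute` snippet quoted in the first comment. Below is a minimal sketch of what its two branches do to the rows of one head, using plain Python lists of row labels in place of tensors. The helper name `permute_rows` is hypothetical and not part of the repository. One plausible reading, assuming the permutation exists to switch between rotary-embedding layouts: rotary embeddings rotate the `head_dim` features of a head in `head_dim//2` pairs. One layout stores the two members of a pair next to each other, while the other stores them `head_dim//2` rows apart. Viewing the rows as `(2, head_dim//2)` or `(head_dim//2, 2)` and transposing converts between the two.

```python
def permute_rows(rows, revert=False):
    """Pure-Python analogue of the quoted permute, acting on a list of row labels.

    Hypothetical helper for illustration only; the real code operates on a
    (head_dim, dim) weight tensor with view/transpose/reshape.
    """
    head_dim = len(rows)
    assert head_dim % 2 == 0, "head_dim must be even so rows can be paired"
    half = head_dim // 2
    if revert:
        # view(half, 2).transpose(0, 1): rows at even positions first,
        # then rows at odd positions (adjacent pairs -> two halves).
        return rows[0::2] + rows[1::2]
    # view(2, half).transpose(0, 1): interleave the first half with the
    # second half (two halves -> adjacent pairs).
    out = []
    for first, second in zip(rows[:half], rows[half:]):
        out += [first, second]
    return out


rows = list(range(8))                               # head_dim = 8
interleaved = permute_rows(rows)                    # [0, 4, 1, 5, 2, 6, 3, 7]
restored = permute_rows(interleaved, revert=True)   # [0, 1, 2, 3, 4, 5, 6, 7]
print(interleaved, restored)
```

With `head_dim = 8`, the non-revert branch pairs row `i` with row `i + 4`, and the revert branch undoes that exactly, so the two branches are inverses of each other.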
