
feat/llama-2 examples #319

Merged: 11 commits into axolotl-ai-cloud:main on Aug 3, 2023

Conversation

mhenrichsen
Collaborator

Example of QLoRA training for Llama-2 7B.
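For anyone reading the thread without opening the example config, here is a minimal sketch of what a QLoRA setup for Llama-2 7B amounts to, written directly against transformers, peft, and bitsandbytes rather than the axolotl YAML; the hyperparameters and target modules are illustrative placeholders, not necessarily the values committed in this PR:

```python
# Rough sketch of the QLoRA pieces: a 4-bit quantized, frozen base model plus
# small trainable LoRA adapters. Values here are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The point of the 4-bit base plus LoRA adapters is that only the small adapter matrices carry gradients, which is what lets a 7B model train on a single 24 GB card.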

@winglian
Collaborator


Could you run this through the pre-commit hook, please? Thank you!

@mhenrichsen changed the title from "feat/qlora llama 2" to "feat/llama-2 examples" on Jul 23, 2023
@mhenrichsen
Collaborator Author

@winglian Fixed. Also added a LoRA example.

@ssmi153
Contributor

ssmi153 commented Jul 23, 2023

Do these config files work out of the box without the changes listed in #294, or will we need to wait for the suggested changes in that issue to be implemented?

@philpax
Contributor

philpax commented Jul 25, 2023

I have successfully trained a Llama-2 7B QLoRA on a 3090 using this and it seems to work. Thanks for this!

@Layoric

Layoric commented Jul 25, 2023

I've also trained Llama-2 7B and 13B with this; both show good improvements 👍

@NanoCode012
Collaborator

According to the Llama-2 page, they recommend adding the PAD token to the special tokens config. I think this is easy to add. We can use the one that axolotl hardcodes, [PAD].

However, I'm unclear whether point 2 within #294 is necessary.

@mhenrichsen
Collaborator Author

> According to the Llama-2 page, they recommend adding the PAD token to the special tokens config. I think this is easy to add. We can use the one that axolotl hardcodes, [PAD].
>
> However, I'm unclear whether point 2 within #294 is necessary.

@NanoCode012
According to this, the pad token should be `<pad>`, or am I reading that wrong?
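Whichever literal string is chosen, the mechanics are the same: add it as a special token and resize the embedding matrix. A minimal sketch with plain transformers (the `<pad>` string and model path here are placeholders, not necessarily what the config ends up using):

```python
# Sketch of the underlying mechanics, independent of whether the literal token
# is "<pad>" or "[PAD]"; the string and path below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Llama's tokenizer ships without a pad token, so register one as a special token...
num_added = tokenizer.add_special_tokens({"pad_token": "<pad>"})

# ...and grow the embedding matrix so the new token id has a row to point at.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

print(tokenizer.pad_token, tokenizer.pad_token_id)
```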

@mhenrichsen
Collaborator Author

mhenrichsen commented Jul 26, 2023

Added the pad token. Not sure if it made a difference for training, but can confirm inference still works.


```
### Instruction:
What is the meaning of life?

### Response:
The ultimate meaning or purpose of human existence, commonly known as "the meaning of life," can vary significantly between individuals and cultures. However, some general themes or ideas about its significance have emerged throughout history.

One common interpretation of the meaning of life is to find personal happiness and fulfillment by pursuing goals and realizing one's potential. This could involve cultivating relationships, engaging in work or activities that bring satisfaction, or contributing positively to society. Another interpretation suggests that life has no particular end goal but rather serves as a journey or series of experiences, wherein each stage brings its own unique value and lessons. This perspective emphasizes the importance of embracing and savoring each moment while taking advantage of opportunities for growth and development.

A third view holds that the meaning of life lies within spirituality, religion, or faith. For many people, finding inner peace and connection with something greater than oneself provides a deeper understanding of the purpose of existence. Ultimately, how one defines or perceives the meaning of life often depends on individual values, belief systems, and personal preferences.
```
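For reference, a rough sketch of how an inference check like the one above can be run with the trained adapter using transformers and peft; the adapter directory, generation settings, and prompt wrapping are assumptions, with the prompt following the Alpaca-style `### Instruction:` / `### Response:` format shown:

```python
# Load the base model, attach the trained LoRA/QLoRA adapter, and generate a
# response for an Alpaca-style prompt. Paths and settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "meta-llama/Llama-2-7b-hf"
adapter_dir = "./qlora-out"  # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

prompt = "### Instruction:\nWhat is the meaning of life?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Strip the prompt tokens so only the generated response is printed.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```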

@buzzCraft

buzzCraft commented Jul 26, 2023

7B works fine on an ml.g5.12xlarge SageMaker instance (with the QLoRA settings).

@ssmi153
Contributor

ssmi153 commented Jul 28, 2023

I've successfully trained Llama-2 13B with the suggested QLoRA configuration and it worked well. I'm having some trouble with the 70B model, though. It looks like our xformers attention monkeypatch doesn't like the grouped-query attention (GQA) that the 70B model uses. I also got errors trying to run it with FlashAttention instead of xformers attention.

One other comment: the way we've added the pad token at the moment sets both the new pad token and `<unk>` to token id 0 in the tokenizer. I doubt that this causes any real issues in practice, but I noticed it in the debug output.

All in all, though, the new configs work for the smaller models, which is great!

(For reference, here's the stack trace from the 70B training attempt. I just tried this again with the latest Docker container on RunPod and got a slightly different error to the old one.)
```
[2023-07-30 11:32:17,550] [INFO] [axolotl.scripts.train:219] [PID:1060] loading tokenizer... /external/models/meta-llama_Llama-2-70b-hf
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2671, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 810, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 690, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 686, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_xformers.py", line 48, in xformers_forward
    .view(bsz, q_len, self.num_heads, self.head_dim)
RuntimeError: shape '[1, 4096, 64, 128]' is invalid for input of size 4194304
```
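For what it's worth, the numbers in that RuntimeError line up with a GQA mismatch: Llama-2 70B projects keys/values onto 8 KV heads rather than the 64 query heads the patch assumes. A quick back-of-the-envelope check (head counts taken from the published 70B config; the variable names are just for illustration):

```python
# Why the numbers in the RuntimeError are consistent with GQA: Llama-2 70B uses
# 64 query heads but only 8 key/value heads (head_dim 128), so the k/v projection
# output is much smaller than a reshape written for standard MHA expects.
bsz, q_len, head_dim = 1, 4096, 128
num_heads, num_key_value_heads = 64, 8  # Llama-2 70B config values

kv_elems = bsz * q_len * num_key_value_heads * head_dim
expected_by_patch = bsz * q_len * num_heads * head_dim

print(kv_elems)           # 4194304  -> matches "input of size 4194304"
print(expected_by_patch)  # 33554432 -> what view(bsz, q_len, 64, 128) would need
# The xformers monkeypatch reshapes the key/value states with self.num_heads (64),
# which only holds when num_key_value_heads == num_heads, i.e. when there is no GQA.
```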

@ssmi153
Contributor

ssmi153 commented Jul 31, 2023

Further to my comment about the 70B model, this looks very similar to what people are experiencing on FastChat: lm-sys/FastChat#2075. It looks like someone over there got it working by updating one of their dependencies; I'm asking for more info at the moment. (Would people prefer I move this into its own issue rather than mucking up the pull request?)

@NanoCode012 merged commit dc71d88 into axolotl-ai-cloud:main on Aug 3, 2023
@tmm1
Collaborator

tmm1 commented Aug 3, 2023

Yes, it would be helpful to have a new issue for 70B-specific problems.

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* qlora llama-2

* qlora llama-2

* linting

* readme

* lora added

* linting

* change group_by_length

* 13b fitting on 24gb

* grouped lengths true

* add pad token

* change out dir

---------

Co-authored-by: Mads Henrichsen <mads@Brbar-tilhrende-Mads.local>
djsaunde pushed a commit that referenced this pull request Dec 17, 2024