
feat/llama-2 examples #319

Merged: 11 commits into axolotl-ai-cloud:main on Aug 3, 2023

Conversation

mhenrichsen
Collaborator

Example of QLoRA training for Llama-2 7B.
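For anyone reading the thread without opening the example config, here is a minimal sketch of what a QLoRA setup for Llama-2 7B amounts to, written directly against transformers, peft, and bitsandbytes rather than the axolotl YAML; the hyperparameters and target modules are illustrative placeholders, not necessarily the values committed in this PR:

```python
# Rough sketch of the QLoRA pieces: a 4-bit quantized, frozen base model plus
# small trainable LoRA adapters. Values here are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization of the frozen base weights (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections; only these are trained.
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The point of the 4-bit base plus LoRA adapters is that only the small adapter matrices carry gradients, which is what lets a 7B model train on a single 24 GB card.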

@winglian
Collaborator


Could you run this through the pre-commit hook, please? Thank you!

@mhenrichsen changed the title from "feat/qlora llama 2" to "feat/llama-2 examples" on Jul 23, 2023
@mhenrichsen
Collaborator Author

@winglian Fixed. Also added a LoRA example.

@ssmi153
Contributor

ssmi153 commented Jul 23, 2023

Do these config files work out of the box without the changes listed in #294, or will we need to wait for the suggested changes in that issue to be implemented?

@philpax
Contributor

philpax commented Jul 25, 2023

I have successfully trained a Llama-2 7B QLoRA on a 3090 using this and it seems to work. Thanks for this!

@Layoric

Layoric commented Jul 25, 2023

I've also trained Llama-2 7B and 13B with this; both show good improvements 👍

@NanoCode012
Collaborator

According to the Llama-2 page, they recommend adding the PAD token to the special tokens config. I think this is easy to add. We can use the one that axolotl hardcodes, [PAD].

However, I'm unclear whether point 2 within #294 is necessary.

@mhenrichsen
Collaborator Author

> According to the Llama-2 page, they recommend adding the PAD token to the special tokens config. I think this is easy to add. We can use the one that axolotl hardcodes, [PAD].
>
> However, I'm unclear whether point 2 within #294 is necessary.

@NanoCode012
According to this, the pad token should be `<pad>`, or am I reading that wrong?
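Whichever literal string is chosen, the mechanics are the same: add it as a special token and resize the embedding matrix. A minimal sketch with plain transformers (the `<pad>` string and model path here are placeholders, not necessarily what the config ends up using):

```python
# Sketch of the underlying mechanics, independent of whether the literal token
# is "<pad>" or "[PAD]"; the string and path below are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Llama's tokenizer ships without a pad token, so register one as a special token...
num_added = tokenizer.add_special_tokens({"pad_token": "<pad>"})

# ...and grow the embedding matrix so the new token id has a row to point at.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

print(tokenizer.pad_token, tokenizer.pad_token_id)
```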

@mhenrichsen
Collaborator Author

mhenrichsen commented Jul 26, 2023

Added the pad token. Not sure if it made a difference for training, but can confirm inference still works.


```
### Instruction:
What is the meaning of life?

### Response:
The ultimate meaning or purpose of human existence, commonly known as "the meaning of life," can vary significantly between individuals and cultures. However, some general themes or ideas about its significance have emerged throughout history.

One common interpretation of the meaning of life is to find personal happiness and fulfillment by pursuing goals and realizing one's potential. This could involve cultivating relationships, engaging in work or activities that bring satisfaction, or contributing positively to society. Another interpretation suggests that life has no particular end goal but rather serves as a journey or series of experiences, wherein each stage brings its own unique value and lessons. This perspective emphasizes the importance of embracing and savoring each moment while taking advantage of opportunities for growth and development.

A third view holds that the meaning of life lies within spirituality, religion, or faith. For many people, finding inner peace and connection with something greater than oneself provides a deeper understanding of the purpose of existence. Ultimately, how one defines or perceives the meaning of life often depends on individual values, belief systems, and personal preferences.
```
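For reference, a rough sketch of how an inference check like the one above can be run with the trained adapter using transformers and peft; the adapter directory, generation settings, and prompt wrapping are assumptions, with the prompt following the Alpaca-style `### Instruction:` / `### Response:` format shown:

```python
# Load the base model, attach the trained LoRA/QLoRA adapter, and generate a
# response for an Alpaca-style prompt. Paths and settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "meta-llama/Llama-2-7b-hf"
adapter_dir = "./qlora-out"  # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

prompt = "### Instruction:\nWhat is the meaning of life?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Strip the prompt tokens so only the generated response is printed.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```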

@buzzCraft

buzzCraft commented Jul 26, 2023

7B works fine on an ml.g5.12xlarge SageMaker instance (with the QLoRA settings).

@ssmi153
Contributor

ssmi153 commented Jul 28, 2023

I've successfully trained Llama-2 13B with the suggested QLoRA configuration and it worked well. I'm having some trouble with the 70B model, though. It looks like our xformers attention monkeypatch doesn't like the grouped-query attention (GQA) that the 70B model uses. I also got errors trying to run it with FlashAttention instead of xformers attention.

One other comment: the way we've added the pad token at the moment sets both the new pad token and `<unk>` to token id 0 in the tokenizer. I doubt that this causes any real issues in practice, but I noticed it in the debug output.

All in all, though, the new configs work for the smaller models, which is great!

(For reference, here's the stack trace from the 70B training attempt. I just tried this again with the latest Docker container on RunPod and got a slightly different error to the old one.)
```
[2023-07-30 11:32:17,550] [INFO] [axolotl.scripts.train:219] [PID:1060] loading tokenizer... /external/models/meta-llama_Llama-2-70b-hf
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2671, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 581, in forward
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 810, in forward
    outputs = self.model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 690, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 686, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 413, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_xformers.py", line 48, in xformers_forward
    .view(bsz, q_len, self.num_heads, self.head_dim)
RuntimeError: shape '[1, 4096, 64, 128]' is invalid for input of size 4194304
```
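For what it's worth, the numbers in that RuntimeError line up with a GQA mismatch: Llama-2 70B projects keys/values onto 8 KV heads rather than the 64 query heads the patch assumes. A quick back-of-the-envelope check (head counts taken from the published 70B config; the variable names are just for illustration):

```python
# Why the numbers in the RuntimeError are consistent with GQA: Llama-2 70B uses
# 64 query heads but only 8 key/value heads (head_dim 128), so the k/v projection
# output is much smaller than a reshape written for standard MHA expects.
bsz, q_len, head_dim = 1, 4096, 128
num_heads, num_key_value_heads = 64, 8  # Llama-2 70B config values

kv_elems = bsz * q_len * num_key_value_heads * head_dim
expected_by_patch = bsz * q_len * num_heads * head_dim

print(kv_elems)           # 4194304  -> matches "input of size 4194304"
print(expected_by_patch)  # 33554432 -> what view(bsz, q_len, 64, 128) would need
# The xformers monkeypatch reshapes the key/value states with self.num_heads (64),
# which only holds when num_key_value_heads == num_heads, i.e. when there is no GQA.
```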

@ssmi153
Contributor

ssmi153 commented Jul 31, 2023

Further to my comment about the 70B model, this looks very similar to what people are experiencing on FastChat: lm-sys/FastChat#2075. It looks like someone over there got it working by updating one of their dependencies; I'm asking for more info at the moment. (Would people prefer I move this into its own issue rather than mucking up the pull request?)

@NanoCode012 merged commit dc71d88 into axolotl-ai-cloud:main on Aug 3, 2023
@tmm1
Collaborator

tmm1 commented Aug 3, 2023

Yes, it would be helpful to have a new issue for 70B-specific problems.

mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* qlora llama-2

* qlora llama-2

* linting

* readme

* lora added

* linting

* change group_by_length

* 13b fitting on 24gb

* grouped lengths true

* add pad token

* change out dir

---------

Co-authored-by: Mads Henrichsen <mads@Brbar-tilhrende-Mads.local>
djsaunde pushed a commit that referenced this pull request Dec 17, 2024