Commit f5994fc

Merge pull request #350 from tmm1/group-len-false-examples

set `group_by_length` to false in all examples

tmm1 authored Aug 9, 2023
2 parents 8e3a0f5 + bbe633b

Showing 6 changed files with 8 additions and 6 deletions.
4 changes: 3 additions & 1 deletion README.md

```diff
@@ -426,7 +426,9 @@ save_safetensors:
 # whether to mask out or include the human's prompt from the training labels
 train_on_inputs: false
-# don't use this, leads to wonky training (according to someone on the internet)
+# group similarly sized data to minimize padding
+# may be slower to start, as it must download and sort the entire dataset
+# note that training loss may have an oscillating pattern with this enabled
 group_by_length: false
 # Whether to use gradient checkpointing https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
```
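A minimal sketch of the behavior the README comments describe, assuming (as with the Hugging Face Trainer that axolotl builds on) that `group_by_length` draws each batch from length-sorted indices. The helper names below are hypothetical simplifications for illustration, not axolotl's actual code:

```python
def length_grouped_batches(lengths, batch_size):
    """Batch example indices by ascending length to minimize padding."""
    indices = sorted(range(len(lengths)), key=lambda i: lengths[i])
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]

def padding_waste(lengths, batches):
    """Pad tokens added when each batch is padded to its longest example."""
    return sum(max(lengths[i] for i in batch) * len(batch) - sum(lengths[i] for i in batch)
               for batch in batches)

lengths = [5, 100, 7, 98, 6, 99]        # token counts per example
grouped = length_grouped_batches(lengths, batch_size=2)
naive = [[0, 1], [2, 3], [4, 5]]        # dataset order, no grouping
print(padding_waste(lengths, grouped))  # → 93
print(padding_waste(lengths, naive))    # → 279
```

Batch difficulty tends to correlate with sequence length, so drawing batches in sorted order plausibly explains the oscillating-loss pattern the README warns about.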
2 changes: 1 addition & 1 deletion examples/cerebras/qlora.yml

```diff
@@ -35,7 +35,7 @@ torchdistx_path:
 lr_scheduler: cosine
 learning_rate: 0.0002
 train_on_inputs: false
-group_by_length: true
+group_by_length: false
 bf16: true
 fp16: false
 tf32: true
```
2 changes: 1 addition & 1 deletion examples/gptj/qlora.yml

```diff
@@ -32,7 +32,7 @@ torchdistx_path:
 lr_scheduler: cosine
 learning_rate: 0.0001
 train_on_inputs: false
-group_by_length: true
+group_by_length: false
 bf16: true
 fp16: false
 tf32: true
```
2 changes: 1 addition & 1 deletion examples/llama-2/lora.yml

```diff
@@ -38,7 +38,7 @@ lr_scheduler: cosine
 learning_rate: 0.0002
 
 train_on_inputs: false
-group_by_length: true
+group_by_length: false
 bf16: true
 fp16: false
 tf32: false
```
2 changes: 1 addition & 1 deletion examples/llama-2/qlora.yml

```diff
@@ -39,7 +39,7 @@ lr_scheduler: cosine
 learning_rate: 0.0002
 
 train_on_inputs: false
-group_by_length: true
+group_by_length: false
 bf16: true
 fp16: false
 tf32: false
```
2 changes: 1 addition & 1 deletion examples/openllama-3b/qlora.yml

```diff
@@ -34,7 +34,7 @@ torchdistx_path:
 lr_scheduler: cosine
 learning_rate: 0.0002
 train_on_inputs: false
-group_by_length: true
+group_by_length: false
 bf16: true
 fp16: false
 tf32: true
```
