This directory contains recipes for pre-training and fine-tuning large language models (LLMs) using NeMo.
A recipe in NeMo is a Python file that defines a complete configuration for training or fine-tuning an LLM. Each recipe typically includes:
- Model configuration: Defines the architecture and hyperparameters of the LLM.
- Training configuration: Specifies settings for the PyTorch Lightning Trainer, including distributed training strategies.
- Data configuration: Sets up the data pipeline, including batch sizes and sequence lengths.
- Optimization configuration: Defines the optimizer and learning rate schedule.
- Logging and checkpointing configuration: Specifies how to save model checkpoints and log training progress.
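For illustration, the sketch below shows the typical shape of such a recipe written as a NeMo-Run factory. It is a minimal, illustrative example: the name `my_llama3_8b` and the concrete values are placeholders, and helper/config names may differ between NeMo versions; refer to the actual recipe files in this directory (e.g. `llama3_8b.py`) for the authoritative configurations.

```python
import nemo_run as run
from nemo import lightning as nl
from nemo.collections import llm
from nemo.collections.llm.recipes.optim.adam import distributed_fused_adam_with_cosine_annealing


# Registers the recipe so it can be selected with `--factory my_llama3_8b` on the CLI.
@run.cli.factory(target=llm.pretrain, name="my_llama3_8b")
def pretrain_recipe(num_nodes: int = 1, num_gpus_per_node: int = 8) -> run.Partial:
    return run.Partial(
        llm.pretrain,
        # Model configuration: architecture and hyperparameters.
        model=run.Config(llm.LlamaModel, config=run.Config(llm.Llama3Config8B)),
        # Training configuration: PyTorch Lightning Trainer with a distributed (Megatron) strategy.
        trainer=run.Config(
            nl.Trainer,
            accelerator="gpu",
            devices=num_gpus_per_node,
            num_nodes=num_nodes,
            max_steps=1000,
            strategy=run.Config(nl.MegatronStrategy, tensor_model_parallel_size=1),
            plugins=run.Config(nl.MegatronMixedPrecision, precision="bf16-mixed"),
        ),
        # Data configuration: batch sizes and sequence length (mock data for illustration).
        data=run.Config(llm.MockDataModule, seq_length=8192, global_batch_size=512, micro_batch_size=1),
        # Optimization configuration: optimizer and learning-rate schedule.
        optim=distributed_fused_adam_with_cosine_annealing(max_lr=3e-4),
        # Logging and checkpointing configuration.
        log=run.Config(nl.NeMoLogger),
        resume=run.Config(nl.AutoResume),
    )
```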
See CONFIGURATION-HIERARCHY.md for an extensive list of parameters and features available in the recipe modules outlined above.
Recipes are designed to be modular and extensible, allowing users to easily customize settings for their specific use cases.
You can use these recipes via the NeMo CLI (provided by NeMo-Run):

```bash
nemo llm <task> --factory <recipe_name>
```

Where:

- `<task>` is either `pretrain` or `finetune`.
- `<recipe_name>` is the name of the recipe (e.g. `llama3_8b`).

For example:

```bash
nemo llm pretrain --factory llama3_8b
```
**Important:** When launching the recipes with multiple processes (i.e., on multiple GPUs), add the `-y` option to the command to avoid user confirmation prompts, for example:

```bash
nemo llm pretrain --factory llama3_8b -y
```
You can override any parameter in the recipe:

```bash
nemo llm pretrain --factory llama3_8b trainer.max_steps=2000
```
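Recipes can also be built and launched directly from Python with NeMo-Run. The snippet below is a minimal sketch for a single-node local run; the executor settings are illustrative, and other executors (e.g. for Slurm clusters) are available in NeMo-Run.

```python
import nemo_run as run
from nemo.collections import llm

# Build the recipe's default configuration, then override any field before launching.
recipe = llm.llama3_8b.pretrain_recipe(num_nodes=1, num_gpus_per_node=8)
recipe.trainer.max_steps = 2000  # equivalent to trainer.max_steps=2000 on the CLI

# Launch locally via torchrun with one task per GPU.
executor = run.LocalExecutor(ntasks_per_node=8, launcher="torchrun")
run.run(recipe, executor=executor)
```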
For more details on running recipes, see pre-train.
See ADD-RECIPE.md for instructions on how to add a new recipe.