Jwilber/bionemo example small updates (NVIDIA#561)
Went through the example repo over break and found a few very small
nitpicks:

- Minor typos in the README.
- The finetuning CLI script in the README had the wrong path.
- Updated the TensorBoard logging dir in `pretrain_script` to match the attribute name and the finetune script's structure.

---------

Signed-off-by: Jared Wilber <jwilber@nvidia.com>
jwilber authored Jan 6, 2025
1 parent d34307b commit 0640396
Showing 2 changed files with 10 additions and 10 deletions.
18 changes: 9 additions & 9 deletions sub-packages/bionemo-example_model/README.md
@@ -2,9 +2,9 @@

# Introduction

-This is a minimalist package containing an example model that makes use of bionemo2 and nemo conventions. It contains the necessary models, dataloaders, datasets, and custom loss fucntions. The referenced classes and function are in `bionemo.example_model.lightning.lightning_basic`.
+This is a minimalist package containing an example model that makes use of bionemo2 and nemo conventions. It contains the necessary models, dataloaders, datasets, and custom loss functions. The referenced classes and functions are in `bionemo.example_model.lightning.lightning_basic`.

-This tutorial demonstrates the creation of a simple MNIST model. This should be run in a BioNeMo container. The BioNeMo Framework container can run in a brev.dev launchable: [![ Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy?launchableID=env-2pPDA4sJyTuFf3KsCv5KWRbuVlU). It takes about 10 minutes to deploy this notebook as a Launchable. As of this writing, we are working on a free tier so a credit card may be required. You can reach out to your NVIDIA rep for credit. Notebooks and a shell interface can be launced by clicking `Open Notebook`. (Note: This links to the nightly release and may be out of sync with these docs.)
+This tutorial demonstrates the creation of a simple MNIST model. This should be run in a BioNeMo container. The BioNeMo Framework container can run in a brev.dev launchable: [![ Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy?launchableID=env-2pPDA4sJyTuFf3KsCv5KWRbuVlU). It takes about 10 minutes to deploy this notebook as a Launchable. As of this writing, we are working on a free tier so a credit card may be required. You can reach out to your NVIDIA rep for credit. Notebooks and a shell interface can be launched by clicking `Open Notebook`. (Note: This links to the nightly release and may be out of sync with these docs.)


For this tutorial, we will reuse elements from the BioNeMo example_model package.
@@ -26,10 +26,10 @@ Loss functions used here are `MSELossReduction` and `ClassifierLossReduction`. T

# Datasets and Datamodules

-Datasets used for model training must be compatible with Megatron datasets. To enable this, the output of a given index and epoch must be deterministic. However, we may wish to have a different ordering in every epoch. To enable this, the items in the dataset should be accessible by both the epoch and the index. This can be done by accessing elements of the dataset with `EpochIndex` from `bionemo.core.data.multi_epoch_dataset`. A simple way of doing this is to wrap a dataset with `IdentityMultiEpochDatasetWrapper` imported from `bionemo.core.data.multi_epoch_dataset`. In this example, in in `bionemo.example_model.lightning.lightning_basic`, we use a custom dataset `MNISTCustomDataset` that wraps the `__getitem__` method of the MNIST dataset such that it return a dict instead of a Tuple or tensor. The `MNISTCustomDataset` returns elements of type `MnistItem`, which is a `TypedDict`.
+Datasets used for model training must be compatible with Megatron datasets. To enable this, the output of a given index and epoch must be deterministic. However, we may wish to have a different ordering in every epoch. To enable this, the items in the dataset should be accessible by both the epoch and the index. This can be done by accessing elements of the dataset with `EpochIndex` from `bionemo.core.data.multi_epoch_dataset`. A simple way of doing this is to wrap a dataset with `IdentityMultiEpochDatasetWrapper` imported from `bionemo.core.data.multi_epoch_dataset`. In this example, in `bionemo.example_model.lightning.lightning_basic`, we use a custom dataset `MNISTCustomDataset` that wraps the `__getitem__` method of the MNIST dataset such that it returns a dict instead of a Tuple or tensor. The `MNISTCustomDataset` returns elements of type `MnistItem`, which is a `TypedDict`.

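To make the `(epoch, index)` contract concrete, a minimal, self-contained sketch might look like the following. The `EpochIndex` and `Item` classes here are local stand-ins for the real `EpochIndex` and `MnistItem`; in practice you would simply wrap an existing dataset with `IdentityMultiEpochDatasetWrapper`.

```
from typing import NamedTuple, TypedDict

import torch


class EpochIndex(NamedTuple):
    """Local stand-in for bionemo.core.data.multi_epoch_dataset.EpochIndex."""
    epoch: int
    idx: int


class Item(TypedDict):
    """Analogous to MnistItem: items are returned as a dict, not a tuple."""
    data: torch.Tensor
    label: torch.Tensor


class ToyMultiEpochDataset:
    """Deterministic in (epoch, idx): the same EpochIndex always yields the same item."""

    def __init__(self, images: torch.Tensor, labels: torch.Tensor):
        self.images, self.labels = images, labels

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, index: EpochIndex) -> Item:
        # The epoch is available (e.g. for per-epoch augmentation), but this toy ignores it,
        # which is what identity-wrapping a plain dataset amounts to.
        return Item(data=self.images[index.idx], label=self.labels[index.idx])


ds = ToyMultiEpochDataset(torch.randn(8, 1, 28, 28), torch.arange(8))
# The same (epoch, idx) lookup is reproducible across epochs.
assert torch.equal(ds[EpochIndex(epoch=0, idx=3)]["data"], ds[EpochIndex(epoch=5, idx=3)]["data"])
```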

-In the data module/data loader class, it is necessary to have a data_sampler method to shuffle the data and that allows the sampler to be used with Megatron. This is a nemo2 peculiarity. A `nemo.lightning.pytorch.plugins.MegatronDataSampler` is the best choice. It sets up the capability to utilize micro-batching and gradient accumulation. It is also the place where the global batch size is constructed.
+In the data module/data loader class, it is necessary to have a `data_sampler` attribute to shuffle the data and to allow the sampler to be used with Megatron. This is a nemo2 peculiarity. A `nemo.lightning.pytorch.plugins.MegatronDataSampler` is the best choice. It sets up the capability to utilize micro-batching and gradient accumulation. It is also the place where the global batch size is constructed.

Also the sampler will not shuffle your data. So you need to wrap your dataset in a dataset shuffler that maps sequential IDs to random IDs in your dataset. This can be done with `MultiEpochDatasetResampler` from `bionemo.core.data.multi_epoch_dataset`.

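As a rough sketch of that pattern, a data module exposing a `data_sampler` attribute might look like this; the keyword arguments passed to `MegatronDataSampler` are assumptions and may differ between NeMo versions.

```
import lightning.pytorch as pl
from nemo.lightning.pytorch.plugins import MegatronDataSampler
from torch.utils.data import DataLoader, Dataset


class SketchDataModule(pl.LightningDataModule):
    def __init__(self, train_dataset: Dataset, micro_batch_size: int = 4, global_batch_size: int = 8):
        super().__init__()
        # train_dataset would typically be MultiEpochDatasetResampler(IdentityMultiEpochDatasetWrapper(...)).
        self.train_dataset = train_dataset
        self.micro_batch_size = micro_batch_size
        # Megatron reads batching information from this attribute rather than from the DataLoader.
        # Keyword names here are assumptions for illustration.
        self.data_sampler = MegatronDataSampler(
            seq_len=28 * 28,
            micro_batch_size=micro_batch_size,
            global_batch_size=global_batch_size,
        )

    def train_dataloader(self) -> DataLoader:
        return DataLoader(self.train_dataset, batch_size=self.micro_batch_size, num_workers=0)
```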
@@ -75,7 +75,7 @@ Similarly, `ExampleFineTuneConfig` extends `ExampleGenericConfig` for finetuning

# Training Module

-It is helfpul to have a training module that inherits from `lightning.pytorch.LightningModule` which organizes the model architecture, training, validation, and testing logic while abstracting away boilerplate code, enabling easier and more scalable training. This wrapper can be used for all model and loss combinations specified in the config.
+It is helpful to have a training module that inherits from `lightning.pytorch.LightningModule` which organizes the model architecture, training, validation, and testing logic while abstracting away boilerplate code, enabling easier and more scalable training. This wrapper can be used for all model and loss combinations specified in the config.
In `bionemo.example_model.lightning.lightning_basic`, we define `BionemoLightningModule`.

In this example, `training_step`, `validation_step`, and `predict_step` define the training, validation, and prediction loops, which are independent of the forward method. In nemo:
@@ -88,7 +88,7 @@ In this example, `training_step`, `validation_step`, and `predict_step` define t

Additionally, during these steps, we log the validation, testing, and training loss. This is done similarly to https://lightning.ai/docs/torchmetrics/stable/pages/lightning.html. These logs can then be exported to wandb, or other metric viewers. For more complicated tracking, it may be necessary to use pytorch callbacks: https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html.

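The logging calls themselves are plain Lightning. A minimal sketch of such steps (not the Megatron-aware `BionemoLightningModule`) might look like this:

```
import lightning.pytorch as pl
import torch
import torch.nn.functional as F


class LoggingSketch(pl.LightningModule):
    """Illustrates self.log in training/validation steps; a simplified stand-in only."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        loss = F.cross_entropy(self.layer(batch["data"].flatten(1)), batch["label"])
        self.log("train_loss", loss, prog_bar=True)  # picked up by TensorBoard/W&B/CSV loggers
        return loss

    def validation_step(self, batch, batch_idx):
        loss = F.cross_entropy(self.layer(batch["data"].flatten(1)), batch["label"])
        self.log("val_loss", loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```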
-Further `loss_reduction_class()`, `training_loss_reduction()`, `validation_loss_reduction(),` and` test_loss_reduction()` are defined based on what's in the config. Additionally, `configure_model()` is definated based on the config.
+Further, `loss_reduction_class()`, `training_loss_reduction()`, `validation_loss_reduction()`, and `test_loss_reduction()` are defined based on what's in the config. Additionally, `configure_model()` is defined based on the config.

# Training the models
In `bionemo.example_model.lightning.lightning_basic`, a `checkpoint_callback` variable is defined. This enables .nemo file-like checkpointing.
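As a rough sketch, such a callback is typically a `ModelCheckpoint`; the import path and arguments below are assumptions, so check `lightning_basic` for the actual `checkpoint_callback` definition.

```
# Assumed import path and arguments -- see lightning_basic for the real checkpoint_callback.
from nemo.lightning.pytorch.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    save_last=True,           # keep the latest checkpoint
    monitor="val_loss",       # track validation loss
    save_top_k=1,             # keep only the best checkpoint
    every_n_train_steps=25,   # checkpointing frequency
)
```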
@@ -99,7 +99,7 @@ We specify a training strategy of type `nemo.lightning.MegatronStrategy`. This s

We specify a trainer of type `nemo.lightning.Trainer`, which is an extension of the pytorch lightning trainer. This is where the devices, validation intervals, maximal steps, maximal number of epochs, and how frequently to log are specified.

-we specify a nemo-logger. We can set TensorBoard and WandB logging, along with extra loggers. Here, we specify a `CSVLogger` from lightning.pytorch.loggers.
+We specify a nemo-logger. We can set TensorBoard and WandB logging, along with extra loggers. Here, we specify a `CSVLogger` from lightning.pytorch.loggers.

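Assembled together, these pieces might look roughly like the sketch below; the logger assembly mirrors the `run_pretrain` snippet further down, while the trainer and strategy arguments are illustrative assumptions.

```
from pathlib import Path

from lightning.pytorch.loggers import CSVLogger, TensorBoardLogger
from nemo import lightning as nl

save_dir = Path("results") / "pretrain_example"

trainer = nl.Trainer(
    devices=1,
    max_steps=200,           # maximal number of steps
    val_check_interval=50,   # validation interval
    log_every_n_steps=10,    # logging frequency
    strategy=nl.MegatronStrategy(tensor_model_parallel_size=1, pipeline_model_parallel_size=1),
)

nemo_logger = nl.NeMoLogger(
    log_dir=str(save_dir),
    name="pretrain_example",
    tensorboard=TensorBoardLogger(save_dir=save_dir, name="pretrain_example"),
    ckpt=checkpoint_callback,  # the ModelCheckpoint sketched above (defined in lightning_basic in the real script)
    extra_loggers=[CSVLogger(save_dir / "logs", name="pretrain_example")],
)
```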
We can now proceed to training. The first pre-training script is `bionemo/example_model/training_scripts/pretrain_mnist.py`.

@@ -109,7 +109,7 @@ This script will print out the location of the final model: <pretrain_directory>

Then we can run a finetuning script:
```
-python src/bionemo/example_model/training_scripts/training_scripts/finetune_mnist.py ---pretrain_ckpt_dirpath <pretrain_directory>
+python src/bionemo/example_model/training_scripts/finetune_mnist.py --pretrain_ckpt_dirpath <pretrain_directory>
```

A nuance here is that in the config file, we specify the initial checkpoint path, along with which keys to skip. In the previous model checkpoint, we did not have a head labelled "digit_classifier", so we specify it as a head to be skipped.
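A hedged sketch of that configuration, assuming the field names `initial_ckpt_path` and `initial_ckpt_skip_keys_with_these_prefixes` (check `ExampleFineTuneConfig` for the exact attributes):

```
from bionemo.example_model.lightning.lightning_basic import ExampleFineTuneConfig

# Field names are assumptions for illustration; consult the config class for the real ones.
config = ExampleFineTuneConfig(
    initial_ckpt_path="<pretrain_directory>",                          # checkpoint to start from
    initial_ckpt_skip_keys_with_these_prefixes={"digit_classifier"},   # new head, absent from the checkpoint
)
```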
@@ -121,4 +121,4 @@ Finally, we can run a classification task with
python src/bionemo/example_model/training_scripts/predict_mnist.py --finetune_dir <finetune_dir>
```

-The results can be viewed with TensorBoardLogger if that is configured, or as a CSV file created by the CSVLogger.
+The results can be viewed with TensorBoardLogger if that is configured, or as a CSV file created by the `CSVLogger`.
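If only the `CSVLogger` is configured, the metrics file can be inspected directly; the path below is a placeholder that depends on the logger's `save_dir`, `name`, and version.

```
import pandas as pd

# Lightning's CSVLogger writes metrics.csv under <save_dir>/<name>/version_<n>/;
# adjust the placeholder path below to your run.
metrics = pd.read_csv("<finetune_dir>/logs/<experiment_name>/version_0/metrics.csv")
print(metrics.tail())
```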
@@ -44,7 +44,7 @@ def run_pretrain(name: str, directory_name: str):
    nemo_logger = NeMoLogger(
        log_dir=str(save_dir),
        name=name,
-       tensorboard=TensorBoardLogger(save_dir=directory_name, name=name),
+       tensorboard=TensorBoardLogger(save_dir=save_dir, name=name),
        ckpt=checkpoint_callback,
        extra_loggers=[CSVLogger(save_dir / "logs", name=name)],
    )
