finetune nvidia/parakeet-tdt-1.1b results in out of memory even with lower batch size #10085

Open
sankar-mukherjee opened this issue Aug 8, 2024 · 1 comment

sankar-mukherjee commented Aug 8, 2024

I am trying to finetune the nvidia/parakeet-tdt-1.1b model following the instructions below on a g5.12xlarge instance with 4 GPUs, each with 24 GB of memory.

https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/asr/configs.html#fine-tuning-configurations

First I created a Docker container, and inside the container I am running finetune.sh. I get an OutOfMemoryError before training even starts. I have tried reducing batch_size to 16, 8, 4, and 2, as well as reducing the max_duration of the audio files to 20, 10, and 5 seconds. None of these succeeds. Can anyone help me?

Dockerfile

# Use the specified base image
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:24.01-py3
FROM ${FROM_IMAGE_NAME}

# Set the working directory
WORKDIR /ASR

# Expose port 8000 for external communication
EXPOSE 8000

# Install system dependencies
RUN apt-get update && apt-get install -y screen libsndfile1 ffmpeg libsox-dev gfortran

# Install Cython (needed for NeMo)
RUN pip install Cython

# Clone the specified branch of the pytorch-lightning repository and install it
RUN git clone -b bug_fix https://github.com/athitten/pytorch-lightning.git && \
    cd pytorch-lightning && \
    PACKAGE_NAME=pytorch pip install -e .
RUN git clone https://github.com/NVIDIA/TransformerEngine.git && \
    cd TransformerEngine && \
    git fetch origin 8c9abbb80dba196f086b8b602a7cf1bce0040a6a && \
    git checkout FETCH_HEAD && \
    git submodule init && git submodule update && \
    NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .
# Copy and install Python dependencies from requirements.txt if necessary
COPY requirements.txt .
RUN pip install -r requirements.txt
# RUN pip uninstall -y huggingface_hub && \
#     pip install huggingface_hub==0.22.0

# Set environment variables for NVIDIA
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NEMO_CACHE_DIR /efs/smukherjee/ASR/cached_models

# Copy the rest of your application code
COPY . .

requirements.txt

nemo_toolkit[all]
transformers
huggingface-hub==0.23.2
seaborn

finetune.sh

#!/usr/bin/env bash

export HF_HOME='/efs/smukherjee/ASR/cached_models/'
export HYDRA_FULL_ERROR=1

python /efs/smukherjee/NeMo/examples/asr/speech_to_text_finetune.py \
    --config-path=/efs/smukherjee/NeMo/examples/asr/conf/asr_finetune \
    --config-name=speech_to_text_finetune \
    model.train_ds.manifest_filepath="/efs/smukherjee/ASR/data/train_finetune_dataset_raw.json" \
    model.validation_ds.manifest_filepath="/efs/smukherjee/ASR/data/val_finetune_dataset_raw.json" \
    model.train_ds.max_duration=5 \
    model.train_ds.batch_size=2 \
    model.validation_ds.batch_size=2 \
    model.tokenizer.update_tokenizer=False \
    trainer.devices=-1 \
    trainer.accelerator='gpu' \
    trainer.max_epochs=50 \
    exp_manager.exp_dir="/efs/smukherjee/ASR/output/finetune" \
    +model.joint.fused_batch_size=1 \
    +init_from_pretrained_model="nvidia/parakeet-tdt-1.1b"

Error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 21.98 GiB of which 18.50 MiB is free. Process 6411 has 21.05 GiB memory in use. Process 7096 has 302.00 MiB memory in use. Process 7094 has 302.00 MiB memory in use. Process 7095 has 302.00 MiB memory in use. Of the allocated memory 20.38 GiB is allocated by PyTorch, and 219.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Epoch 0:   0%|          | 0/16093 [00:13<?, ?it/s]
sankar-mukherjee added the bug label on Aug 8, 2024

nithinraok (Collaborator) commented Aug 22, 2024

Have you tried loading the 1.1b model using

from nemo.collections.asr.models import ASRModel
model = ASRModel.from_pretrained('nvidia/parakeet-tdt-1.1b')

and checking the memory usage? You would need roughly twice this size initially, since you are fine-tuning from an existing model. The memory usage at that point should tell you whether you have any memory left to train.
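
A minimal sketch of that check, assuming a single visible GPU; the torch.cuda calls and the printed format are illustrative additions, not part of the original suggestion:

import torch
from nemo.collections.asr.models import ASRModel

# Load the pretrained checkpoint and move it onto the GPU.
model = ASRModel.from_pretrained('nvidia/parakeet-tdt-1.1b')
model = model.cuda()

# Report how much of the 24 GB card the weights alone consume;
# per the comment above, fine-tuning needs roughly double this.
allocated_gb = torch.cuda.memory_allocated() / 1024 ** 3
reserved_gb = torch.cuda.memory_reserved() / 1024 ** 3
print(f"allocated: {allocated_gb:.2f} GiB, reserved: {reserved_gb:.2f} GiB")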
