ML Training PyTorch MIVisionX

Kiriti Gowda edited this page Jan 11, 2021 · 1 revision

ML Training - PyTorch with RALI

To run the training example, either build your own Docker image or pull an existing one:

sudo docker pull lakshmikumar/rocm3.7_tf1.15_mivisionx:v0.3

The image contains all the prerequisites, with MIVisionX already built and installed.

  • Step 1: Go into the rali_pybind/example folder.
cd MIVisionX/rali/rali_pybind/example/tf_petsTrainingExample
  • Step 2: Edit download_and_preprocess_dataset.sh so that it points to the correct dataset path. Replace lines 16-17 of the script with the following:
vim download_and_preprocess_dataset.sh
Line 16: python3.6 dataset_tools/create_pet_tf_record.py --data_dir=/root/MIVisionX/rali/rali_pybind/example/tf_petsTrainingExample --output_dir=/root/MIVisionX/rali/rali_pybind/example/tf_petsTrainingExample/tfr
Line 17: cd /root/MIVisionX/rali/rali_pybind/example/tf_petsTrainingExample/tfr
  • Step 3: For the first run in the container, leave train_withRALI_withTFRecordReader.py as is. For subsequent runs in the same container, replace line 24 of that script with the following so the dataset is not downloaded and preprocessed again:
DATASET_DOWNLOAD_AND_PREPROCESS = False
  • Step 4: Add the MIVisionX and RPP libraries to the library path
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/mivisionx/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/rocm/rpp/lib/
  • Step 5: Run the training script
python3.6 train_withRALI_withTFRecordReader.py
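
Steps 3 and 4 are easy to forget on repeat runs, so they can be scripted. The following is a minimal sketch, not part of MIVisionX: the helper names `extend_library_path` and `disable_dataset_download` are hypothetical, and the second helper assumes the flag on line 24 is written as a plain `DATASET_DOWNLOAD_AND_PREPROCESS = <value>` assignment at the start of a line.

```python
import re

# Library directories taken from Step 4 above.
RALI_LIB_DIRS = ("/opt/rocm/mivisionx/lib/", "/opt/rocm/rpp/lib/")


def extend_library_path(current, extra_dirs=RALI_LIB_DIRS):
    """Return an LD_LIBRARY_PATH value with each extra directory appended once.

    Hypothetical helper: directories already present (with or without a
    trailing slash) are not added a second time.
    """
    parts = [p for p in current.split(":") if p]
    seen = {p.rstrip("/") for p in parts}
    for directory in extra_dirs:
        key = directory.rstrip("/")
        if key not in seen:
            parts.append(directory)
            seen.add(key)
    return ":".join(parts)


def disable_dataset_download(script_text):
    """Set DATASET_DOWNLOAD_AND_PREPROCESS to False in the script source.

    Hypothetical helper: assumes the flag is a simple assignment that starts
    at the beginning of a line. Text without such a line is returned unchanged.
    """
    pattern = r"^(DATASET_DOWNLOAD_AND_PREPROCESS[ \t]*=[ \t]*).*$"
    return re.sub(pattern, r"\g<1>False", script_text, flags=re.MULTILINE)
```

One way to use these: read train_withRALI_withTFRecordReader.py, write back the result of `disable_dataset_download`, then start the training with `subprocess.run(["python3.6", "train_withRALI_withTFRecordReader.py"], env=env)`, where `env` is a copy of `os.environ` whose `LD_LIBRARY_PATH` entry was passed through `extend_library_path`. The path is handed to a child process because the dynamic loader reads `LD_LIBRARY_PATH` when a process starts, so changing it inside an already running Python interpreter does not affect how that interpreter resolves shared libraries.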