TensorFlow Use Case
CaT's repository contains the folder experiments/cat-tf with the necessary scripts to install and run the experiments performed with the TensorFlow application.
Next, we detail how to download and prepare the ImageNet dataset and how to use CaT to trace the training of the LeNet model.
- Go to the ImageNet dataset directory:
  cd datasets/imagenet/imagenet2012
- Download the training and validation images from the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset:
  - Training images (Task 1 & 2): ILSVRC2012_img_train.tar, 138 GB
  - Validation images (all tasks): ILSVRC2012_img_val.tar, 6.3 GB
- Go to the ImageNet dataset directory:
  cd datasets/imagenet
- Convert the dataset images to the TFRecord format:
  ./imagenet_to_tfrecord.sh -e
  When finished, there should be a directory "imagenet2012/tf_records" containing 1152 TFRecord files (1024 for training and 128 for validation), occupying approximately 144 GiB.
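The file count stated above can be checked after the conversion finishes. The following is a minimal sketch, not part of CaT's scripts: `check_tfrecord_count` is a hypothetical helper, and it assumes the tf_records directory holds nothing but TFRecord files.

```shell
# check_tfrecord_count DIR EXPECTED
# Hypothetical helper: succeeds only when DIR contains exactly EXPECTED
# regular files (counted recursively). Assumes DIR holds only TFRecord files.
check_tfrecord_count() {
    dir=$1
    expected=$2
    # wc may pad its output with spaces on some systems, so strip them.
    actual=$(find "$dir" -type f | wc -l | tr -d '[:space:]')
    echo "found $actual files under $dir (expected $expected)"
    [ "$actual" -eq "$expected" ]
}

# Example (run from datasets/imagenet after the conversion):
#   check_tfrecord_count imagenet2012/tf_records 1152
```

The function returns a non-zero status on a mismatch, so it can guard a later step such as `check_tfrecord_count imagenet2012/tf_records 1152 && echo OK`.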
To use the Imagenette dataset (imagenette2) instead:
- Go to the ImageNet dataset directory:
  cd datasets/imagenet/
- Download the dataset by running the following command:
  ./download_imagenette.sh
- Convert the dataset images to the TFRecord format:
  ./imagenet_to_tfrecord.sh -s
  When finished, there should be a directory "imagenette2/tf_records" containing 1152 TFRecord files (1024 for training and 128 for validation), occupying approximately 1.1 GiB.
The train-official-model.sh script lets you select and train a TensorFlow model (ResNet-50, AlexNet, or LeNet) on the ImageNet dataset.
Note: Do not forget to update the file "whitelist.txt" with the correct path to the ImageNet dataset.
- To specify the model to train, use the flag -m [model]:
  - resnet for training the ResNet-50 model
  - alexnet for training the AlexNet model
  - lenet for training the LeNet model
- To process the Imagenette dataset (imagenette2), use the flag -s. By default, the script uses the original ImageNet dataset (imagenet2012).
- To specify the batch size, use the flag -b [batch_size].
- To specify the number of epochs, use the flag -e [number_epochs].
- To specify the number of GPUs, use the flag -g [number_gpus].
- To specify the deployment, use the flag -d [deployment]:
  - vanilla for training without tracing
  - catbpf for tracing the training with the CatBpf tracer
  - catstrace for tracing the training with the CatStrace tracer
./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catbpf
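Since a training run can take a long time, it may help to check the -m and -d values before launching. The sketch below is an illustration only: `validate_args` is a hypothetical helper, not part of train-official-model.sh, and it accepts exactly the model and deployment names listed above.

```shell
# validate_args MODEL DEPLOYMENT
# Hypothetical pre-flight check: returns 0 only when both values are among
# those documented for the -m and -d flags.
validate_args() {
    case "$1" in
        resnet|alexnet|lenet) ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
    case "$2" in
        vanilla|catbpf|catstrace) ;;
        *) echo "unknown deployment: $2" >&2; return 1 ;;
    esac
}

# Example:
#   validate_args lenet catbpf && ./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catbpf
```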
The results are saved to the cat-tf/results directory:
$ tree results
results/
└── lenet-bs64-ep2-catbpf-2021_10_14-01_07
    ├── catbpf-log-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt   <- catbpf log
    ├── dstat-2021_10_14-01_07.csv                              <- dstat output
    ├── info-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt         <- experiment information
    ├── iostat-2021_10_14-01_07.csv                             <- iostat output
    ├── log-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt          <- tensorflow output
    ├── nvidia-smi-2021_10_14-01_07.csv                         <- nvidia-smi output
    └── trace-lenet-bs64-ep2-catbpf-2021_10_14-01_07.json       <- catbpf trace
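The listing above suggests that each result directory is named model-bs[batch_size]-ep[number_epochs]-deployment-timestamp, and that the trace file reuses that name. The sketch below encodes this pattern. It is inferred from the single example shown, not taken from the script itself, and both function names are hypothetical. Whether other deployments produce a trace file with the same name is not confirmed here.

```shell
# result_dir_name MODEL BATCH_SIZE EPOCHS DEPLOYMENT TIMESTAMP
# Builds a directory name following the pattern seen in the listing.
result_dir_name() {
    printf '%s-bs%s-ep%s-%s-%s\n' "$1" "$2" "$3" "$4" "$5"
}

# trace_path RESULT_DIR
# Prints the expected location of the JSON trace inside RESULT_DIR.
trace_path() {
    dir=${1%/}    # drop a trailing slash, if any
    printf '%s/trace-%s.json\n' "$dir" "$(basename "$dir")"
}

# Example:
#   trace_path "results/$(result_dir_name lenet 64 2 catbpf 2021_10_14-01_07)"
```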
The CaT prototype was used to capture TensorFlow’s interactions with the storage medium while reading the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset during the training of the LeNet CNN model for 20 epochs, with a batch size of 64, on a single GPU. One command was run per deployment:
./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d vanilla
./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catbpf
./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catstrace
Each experiment was performed twice (i.e., 2 runs for each deployment).
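The full set of runs (three deployments, two runs each) can be written as a loop. The sketch below only prints the six command lines so they can be reviewed first. `plan_runs` is a hypothetical helper, and the order in which the original runs were executed is not stated, so the ordering here is an assumption.

```shell
# plan_runs
# Prints one command line per run: 3 deployments x 2 runs = 6 lines.
# The trailing "# run N of 2" is a shell comment and is ignored on execution.
plan_runs() {
    for deployment in vanilla catbpf catstrace; do
        for run in 1 2; do
            echo "./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d $deployment  # run $run of 2"
        done
    done
}

# Example: review the plan, then execute it from the directory that
# contains train-official-model.sh:
#   plan_runs
#   plan_runs | sh
```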