TensorFlow Use Case

Installation and evaluation steps

CaT's repository contains the folder experiments/cat-tf with the necessary scripts to install and run the experiments performed with the TensorFlow application.

Next, we detail how to download and prepare the ImageNet dataset and how to use CaT to trace the training of the LeNet model.

Download and prepare the dataset

Go to the ImageNet dataset directory: cd datasets/imagenet
Download the training and validation images from the ImageNet Large Scale Visual Recognition Challenge 2012 dataset:
- Training images (Task 1 & 2) - ILSVRC2012_img_train.tar - 138GB
- Validation images (all tasks) - ILSVRC2012_img_val.tar - 6.3GB
Convert dataset images to the TFRecord format: ./imagenet_to_tfrecord.sh
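Before starting the conversion, it can help to confirm that both archives are in place. The following is a minimal sketch, not part of the repository: the helper name check_archives is hypothetical, and it assumes the archives are stored in the directory passed as the first argument (the current directory by default). Where imagenet_to_tfrecord.sh actually expects them is not stated here, so adjust the path as needed.

```shell
#!/bin/sh
# Hypothetical helper: verify that both ILSVRC2012 archives exist.
# Assumption: the archives live in the directory given as $1 (default: ".").
check_archives() {
    dir="${1:-.}"
    missing=0
    for archive in ILSVRC2012_img_train.tar ILSVRC2012_img_val.tar; do
        if [ ! -f "$dir/$archive" ]; then
            echo "Missing: $dir/$archive" >&2
            missing=1
        fi
    done
    # Returns 0 only when both archives were found.
    return "$missing"
}
```

Possible usage: check_archives . && ./imagenet_to_tfrecord.sh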

When finished, there should be a directory named "tf_records" containing 1152 TFRecord files (1024 for training and 128 for validation), occupying approximately 144 GiB.
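The expected file count can be checked with a short script. This is a sketch under assumptions: the helper names count_files and check_tfrecords are hypothetical, and the check counts every regular file directly inside the directory because the file naming scheme is not described here. It relies on the -maxdepth option of find, which GNU and BSD find both provide.

```shell
#!/bin/sh
# Hypothetical helper: count regular files directly inside a directory.
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Hypothetical helper: compare the count with the expected total of
# 1152 files (1024 for training + 128 for validation).
check_tfrecords() {
    dir="$1"
    expected=$((1024 + 128))
    actual=$(count_files "$dir")
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual files in $dir"
    else
        echo "Mismatch: found $actual files in $dir, expected $expected" >&2
        return 1
    fi
}
```

Possible usage: check_tfrecords tf_records && du -sh tf_records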

How to run

Script "train-official-model.sh"

This script selects and trains a TensorFlow model (ResNet-50, AlexNet, or LeNet) on the ImageNet dataset.

Options:

To specify the model to train, use the flag -m [model]:
- resnet for training the ResNet-50 model
- alexnet for training the AlexNet model
- lenet for training the LeNet model
To specify the batch size, use the flag -b [batch_size].
To specify the number of epochs, use the flag -e [number_epochs].
To specify the number of GPUs, use the flag -g [number_gpus].
To specify the deployment, use the flag -d [deployment]:
- vanilla for training without tracing
- catbpf for tracing the training with the CatBpf tracer
- catstrace for tracing the training with the CatStrace tracer
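Because a training run can take a long time, it may be worth validating the arguments before launching it. The sketch below is not the repository's script: parse_flags is a hypothetical wrapper that uses the POSIX getopts builtin, with flag letters taken from the evaluation commands in this document (-m, -b, -e, -g, -d), and it only checks the model and deployment names listed above.

```shell
#!/bin/sh
# Hypothetical pre-flight validator; it does not start any training.
parse_flags() {
    model="" batch_size="" epochs="" gpus="" deployment=""
    OPTIND=1   # reset so the function can be called more than once
    while getopts "m:b:e:g:d:" opt; do
        case "$opt" in
            m) model=$OPTARG ;;
            b) batch_size=$OPTARG ;;
            e) epochs=$OPTARG ;;
            g) gpus=$OPTARG ;;
            d) deployment=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # Accept only the model names documented for the script.
    case "$model" in
        resnet|alexnet|lenet) ;;
        *) echo "Unknown model: '$model'" >&2; return 2 ;;
    esac
    # Accept only the documented deployments.
    case "$deployment" in
        vanilla|catbpf|catstrace) ;;
        *) echo "Unknown deployment: '$deployment'" >&2; return 2 ;;
    esac
    echo "model=$model batch_size=$batch_size epochs=$epochs gpus=$gpus deployment=$deployment"
}
```

Possible usage: parse_flags -m lenet -b 64 -e 20 -g 1 -d catbpf prints the parsed values, while an unknown model or deployment returns a non-zero status.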

Example:

./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catbpf

Results:

The results are saved to the cat-tf/results directory:

$ tree results
results/
└── lenet-bs64-ep2-catbpf-2021_10_14-01_07
    ├── catbpf-log-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt   <- catbpf log
    ├── dstat-2021_10_14-01_07.csv                              <- dstat output
    ├── info-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt         <- experiment information
    ├── iostat-2021_10_14-01_07.csv                             <- iostat output
    ├── log-lenet-bs64-ep2-catbpf-2021_10_14-01_07.txt          <- tensorflow output
    ├── nvidia-smi-2021_10_14-01_07.csv                         <- nvidia-smi output
    └── trace-lenet-bs64-ep2-catbpf-2021_10_14-01_07.json       <- catbpf trace
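When comparing several runs, the experiment parameters can be recovered from the directory name. The sketch below assumes the pattern inferred from the listing above (model, bs followed by the batch size, ep followed by the number of epochs, deployment, date, and time, separated by hyphens). This pattern is an inference from one example, not a documented format, and parse_result_dir is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: split a result directory name into its fields.
# Assumed pattern: <model>-bs<batch>-ep<epochs>-<deployment>-<date>-<time>
parse_result_dir() {
    name=$(basename "$1")
    old_ifs=$IFS
    IFS=-
    # Word-split the name on hyphens into positional parameters.
    set -- $name
    IFS=$old_ifs
    echo "model=$1 batch_size=${2#bs} epochs=${3#ep} deployment=$4 date=$5 time=$6"
}
```

For the directory shown above, parse_result_dir results/lenet-bs64-ep2-catbpf-2021_10_14-01_07 prints model=lenet batch_size=64 epochs=2 deployment=catbpf date=2021_10_14 time=01_07.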

Content-aware Tracers Evaluation

The CaT prototype was used to capture TensorFlow’s interactions with the storage medium while reading the ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012) dataset during the training of the LeNet CNN model for 20 epochs, with a batch size of 64 and a single GPU.

For vanilla deployment (without tracing):

./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d vanilla

For CatBpf deployment (tracing with CatBpf):

./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catbpf

For CatStrace deployment (tracing with CatStrace):

./train-official-model.sh -m lenet -b 64 -e 20 -g 1 -d catstrace

Each experiment was performed twice (i.e., 2 runs for each deployment).
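The full set of six runs (three deployments, two runs each) can be scripted as a loop. This is a sketch under assumptions: run_all_deployments and the TRAIN_SCRIPT variable are hypothetical names, and the order of the runs is arbitrary, since the order used in the original evaluation is not stated here.

```shell
#!/bin/sh
# Path to the training script; overridable so the loop can be dry-run.
TRAIN_SCRIPT="${TRAIN_SCRIPT:-./train-official-model.sh}"

# Hypothetical helper: run each deployment twice with the evaluation settings
# (LeNet, batch size 64, 20 epochs, 1 GPU). Stops at the first failing run.
run_all_deployments() {
    for deployment in vanilla catbpf catstrace; do
        for run in 1 2; do
            echo "Run $run of 2 for deployment: $deployment" >&2
            "$TRAIN_SCRIPT" -m lenet -b 64 -e 20 -g 1 -d "$deployment" || return 1
        done
    done
}
```

Possible usage: call run_all_deployments from the directory that contains train-official-model.sh. Each run should then produce its own timestamped directory under results.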
