This repository provides code to train and use neural networks on compressed JPEG images. No pre-trained weights are or will be made available. The article associated with this repository is available here.
This implementation relies on the jpeg2dct module from the Uber research team. The SSD used in this repository was taken from this repository and then modified.
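As an illustration of what jpeg2dct provides, the module exposes the quantized DCT coefficients of a JPEG image directly as numpy arrays (a minimal sketch; the file name is a placeholder):

# Minimal sketch of reading DCT coefficients with jpeg2dct
# ("image.jpg" is a placeholder file name).
from jpeg2dct.numpy import load, loads

# Quantized DCT coefficients of the Y, Cb and Cr channels.
dct_y, dct_cb, dct_cr = load("image.jpg")
print(dct_y.shape, dct_cb.shape, dct_cr.shape)

# The same can be done from an in-memory JPEG buffer.
with open("image.jpg", "rb") as f:
    dct_y, dct_cb, dct_cr = loads(f.read())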
All the networks proposed in this repository are modified versions of the following three architectures:
- Installation
- Training
- Prediction
- Classification (ImageNet)
- Detection (PascalVOC)
- Detection (MS-COCO)
- Method limitations
The provided code can be used directly or installed as a package.
The provided examples assume the use of a virtual environment. The following steps set up this environment:
# Making virtualenv
mkdir .venv
cd .venv
python3 -m venv jpeg_deep
source jpeg_deep/bin/activate
cd ..
# Optional: upgrade pip
pip install --upgrade pip
# Installing all the dependencies (the code was tested with the specified versions on Python 3)
pip install tensorflow-gpu==1.14.0
pip install keras==2.2.4
pip install pillow
pip install opencv-python
pip install jpeg2dct
pip install albumentations
pip install tqdm
pip install bs4
pip install cython
pip install pycocotools
pip install matplotlib
pip install lxml
# Pinned version: avoids a bug when loading saved weights
pip install h5py==2.10.0
If you want to use the provided networks in other projects, you can also install the code as a package.
# Optionally activate a virtual environment before running this command
python setup.py install
The training uses a system of configuration files and experiments. This system helps keep track of the parameters of a given run. At the start of training, an experiment folder is created containing copies of the configuration file, the weights and the logs. The example configuration files provided are the ones used for the different trainings in the paper.
To simplify deployment on different machines, the following variables need to be defined (see the Classification/Detection sections for details on the dataset paths):
# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>
export DATASET_PATH_VAL=<path_to_validation_directory>
export DATASET_PATH_TEST=<path_to_test_directory>
# Setting the directory where the experiment folder will be created
export EXPERIMENTS_OUTPUT_DIRECTORY=<path_to_output_directory>
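As a minimal sketch (not the repository's actual code), the configuration files can then resolve these variables at run time:

# Hypothetical sketch: resolving the dataset directories from the
# environment variables exported above.
import os

paths = {
    "train": os.environ["DATASET_PATH_TRAIN"],
    "val": os.environ["DATASET_PATH_VAL"],
    "test": os.environ["DATASET_PATH_TEST"],
    "output": os.environ["EXPERIMENTS_OUTPUT_DIRECTORY"],
}
for name, path in paths.items():
    if not os.path.isdir(path):
        raise ValueError("{} directory does not exist: {}".format(name, path))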
Once you have defined all the variables and modified the config files to your needs, simply run the following command:
python scripts/training.py -c <config_dir_path> --no-horovod
The config file in the <config_dir_path> needs to be named "config.py" for the script to run correctly.
For more details on classification training on the ImageNet dataset, refer to this section; for training on the Pascal VOC dataset, refer to this section; and for training on the MS-COCO dataset, refer to this section.
The training script supports the use of Horovod. Be aware that using Horovod may require modifying some of the parameters. I recommend reading these articles for more details on multi-GPU training: 1, 2 and 3.
I highly recommend training on multiple GPUs for classification on ImageNet given the size of the dataset. An example file for training with Horovod using SLURM is provided: jpeg_deep.sl.
cd slurm
sbatch jpeg_deep.sl
This script is given as an example and will probably not work for your setup as-is; refer to the original Horovod repository for more details on how to get it working.
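For reference, the standard Horovod/Keras boilerplate that such a distributed training relies on looks roughly like this (a sketch of the usual Horovod API, not the repository's actual training script):

# Sketch of the standard Horovod + Keras setup (not the repository's
# actual training script).
import tensorflow as tf
import horovod.keras as hvd
from keras import backend as K
from keras.optimizers import SGD

hvd.init()

# Pin each process to a single GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
K.set_session(tf.Session(config=config))

# Scale the learning rate by the number of workers and wrap the optimizer.
opt = hvd.DistributedOptimizer(SGD(lr=0.01 * hvd.size(), momentum=0.9))

callbacks = [
    # Broadcast the initial weights from rank 0 so all workers start identical.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
# model.compile(optimizer=opt, ...) and model.fit_generator(..., callbacks=callbacks)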
No pre-trained weights are or will be made available. To run this section, you will have to retrain the networks from scratch.
Displaying the results can be done using the prediction.py script. To use it, you first have to run a training for at least one epoch (prediction presupposes that you have an experiment folder).
Prediction is done on the test set. To use a different dataset, modify the config_temp.py file in the generated experiment folder.
Then, simply run the following command:
python scripts/prediction.py <experiment_path> <weights_path>
For the VGG16-based classifiers: the prediction script uses the test generator specified in the config file to get the data. If you use the one provided in the example configuration files, you will need the fully convolutional version of the network for prediction (test images are bigger than training images, as in the original VGG paper). The fully convolutional network can be created using the classification2ssd.py script:
# Create the .h5 for vgg16
python scripts/classification2ssd.py vgg16 <weights_path>
# Create the .h5 for vgg_dct
python scripts/classification2ssd.py vgg16 <weights_path> -dct
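The idea behind this conversion is the standard dense-to-convolution trick: the fully connected layers of VGG16 are replaced by equivalent convolutions so the network accepts larger inputs. A generic Keras sketch of the technique (illustrative only, not the classification2ssd.py script itself):

# Generic sketch of converting VGG16's dense head into convolutions
# (illustrative only, not the classification2ssd.py script itself).
from keras.applications.vgg16 import VGG16

dense_model = VGG16(weights=None, include_top=True)

# A Dense(4096) layer applied to a 7x7x512 feature map is equivalent to a
# Conv2D(4096, (7, 7)) layer, so its kernel can simply be reshaped.
fc1_kernel, fc1_bias = dense_model.get_layer("fc1").get_weights()
conv_kernel = fc1_kernel.reshape(7, 7, 512, 4096)
# A fully convolutional model would then use Conv2D(4096, (7, 7)) with
# weights [conv_kernel, fc1_bias], and similarly 1x1 convolutions for fc2
# and the prediction layer.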
We also provide a way to test the speed of the trained networks, using the prediction_time.py script.
To test the speed of a network, a batch of data is preloaded into memory, prediction is run over this batch P times, and this whole procedure is repeated N times. The results are then averaged. Loading weights is optional.
python scripts/prediction_time.py <experiment_path> -nr 10 -w <weights_path>
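The measurement boils down to the following loop (a simplified sketch of the idea, not the prediction_time.py internals; the model and batch are placeholders):

# Simplified sketch of the timing procedure (the model and batch are placeholders).
import time
import numpy as np
from keras.applications.vgg16 import VGG16

model = VGG16(weights=None)                                # any Keras model works here
batch = np.random.rand(8, 224, 224, 3).astype("float32")   # preloaded batch, batch size 8

N, P = 10, 100                                             # N repetitions of P predictions
timings = []
for _ in range(N):
    start = time.perf_counter()
    for _ in range(P):
        model.predict(batch)
    timings.append((time.perf_counter() - start) / P)

print("average FPS: {:.1f}".format(batch.shape[0] / np.mean(timings)))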
The trained networks can be evaluated using the following script:
python scripts/evaluate.py <experiment_path> <weights_path>
This script can also generate files for submission to the Pascal VOC and MS-COCO evaluation servers:
python scripts/evaluate.py <experiment_path> <weights_path> -s -o <output_path>
The tables below show the accuracy obtained, compared with the state of the art. All the presented results are on the validation set. All FPS values were measured on an NVIDIA GTX 1080 using the prediction_time.py script, with a batch size of 8.
Official Networks | top-1 | top-5 | FPS |
---|---|---|---|
VGG16 | 73.0 | 91.2 | N/A |
VGG-DCT | 42.0 | 66.9 | N/A |
ResNet50 | 75.78 | 92.65 | N/A |
LC-RFA | 75.92 | 92.81 | N/A |
LC-RFA-Thinner | 75.39 | 92.57 | N/A |
Deconvolution-RFA | 76.06 | 92.02 | N/A |
VGG-based Networks (our trainings) | top-1 | top-5 | FPS |
---|---|---|---|
VGG16 | 71.9 | 90.8 | 267 |
VGG-DCT | 65.5 | 86.4 | 553 |
VGG-DCT Y | 62.6 | 84.6 | 583 |
VGG-DCT Deconvolution | 65.9 | 86.7 | 571 |
ResNet50-based Networks (our trainings) | top-1 | top-5 | FPS |
---|---|---|---|
ResNet50 | 74.73 | 92.33 | 324 |
LC-RFA | 74.82 | 92.58 | 318 |
LC-RFA Y | 73.25 | 91.40 | 329 |
LC-RFA-Thinner | 74.62 | 92.33 | 389 |
LC-RFA-Thinner Y | 72.48 | 91.04 | 395 |
Deconvolution-RFA | 74.55 | 92.39 | 313 |
The dataset can be downloaded here. Choose the version that suits your needs; I used the 2012 (Object Detection) data.
Once the data is downloaded, it should be stored following this tree in order to use the provided generators (as long as you have separate train and validation folders you should be fine):
imagenet
|
|_ train
| |_ n01440764
| |_ n01443537
| |_ ...
|
|_ validation
|_ n01440764
|_ n01443537
|_ ...
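A quick way to check that the layout matches the tree above (an illustrative sketch; the root path is a placeholder):

# Illustrative sanity check of the ImageNet layout drawn above
# ("imagenet" is a placeholder for the actual dataset root).
import os

root = "imagenet"
for split in ("train", "validation"):
    synsets = [d for d in os.listdir(os.path.join(root, split)) if d.startswith("n")]
    print("{}: {} synset folders".format(split, len(synsets)))  # expect 1000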
Then you just need to set the configuration files to fit your needs and follow the procedure described in the training section. Keep in mind that the provided configuration files were used in a distributed training, hence the hyper-parameters fit this particular setting. If you don't train that way, you'll need to change them.
Also, the system variables should be set to the ImageNet folder (if you use the provided config files):
# Setting the main dirs for the training datasets
export DATASET_PATH_TRAIN=<path_to_train_directory>/imagenet
export DATASET_PATH_VAL=<path_to_validation_directory>/imagenet
export DATASET_PATH_TEST=<path_to_test_directory>/imagenet
Results for training on the Pascal VOC dataset are presented below. Networks were trained either on the 2007 train/val set (07) or on the 2007+2012 train/val sets (07+12), and evaluated on the 2007 test set.
Official Networks | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300 | 68.0 | 74.3 | N/A |
SSD300 DCT | 39.2 | 47.8 | N/A |
VGG-based Networks (our trainings) | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300 | 65.0 | 74.0 | 102 |
SSD300 DCT | 48.9 | 60.0 | 262 |
SSD300 DCT Y | 50.7 | 59.8 | 278 |
SSD300 DCT Deconvolution | 38.4 | 53.5 | 282 |
ResNet50-based Networks (our trainings) | mAP (07) | mAP (07+12) | FPS |
---|---|---|---|
SSD300-Resnet50 (retrained) | 61.3 | 73.1 | 108 |
SSD300 DCT LC-RFA | 61.7 | 70.7 | 110 |
SSD300 DCT LC-RFA Y | 62.1 | 71.0 | 109 |
SSD300 DCT LC-RFA-Thinner | 58.5 | 67.5 | 176 |
SSD300 DCT LC-RFA-Thinner Y | 60.6 | 70.2 | 174 |
SSD300 DCT Deconvolution-RFA | 54.7 | 68.8 | 104 |
The data can be downloaded from the official website.
After downloading, you should have directories following this structure:
VOCdevkit
|
|_ VOC2007
| |_ Annotations
| |_ ImageSets
| |_ JPEGImages
| |_ ...
|
|_ VOC2012
|_ Annotations
|_ ImageSets
|_ JPEGImages
|_ ...
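The Annotations folders contain one XML file per image. A minimal sketch of parsing one of them with BeautifulSoup, which the installation above pulls in via bs4 and lxml (the file path is a placeholder, and this is not the repository's parsing code):

# Minimal sketch of reading a Pascal VOC annotation file with BeautifulSoup
# (the path is a placeholder).
from bs4 import BeautifulSoup

with open("VOCdevkit/VOC2007/Annotations/000001.xml") as f:
    soup = BeautifulSoup(f, "lxml")

for obj in soup.find_all("object"):
    name = obj.find("name").text
    box = obj.find("bndbox")
    xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
    xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
    print(name, (xmin, ymin, xmax, ymax))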
Then you just need to set the configuration files to fit your needs and follow the procedure described in the training section. The hyper-parameters provided for this training were not used in a parallel setting.
Also, the system variable should be set to the Pascal VOC folder (if you use the provided config files):
# Setting the main dirs for the training datasets
export DATASET_PATH=<path_to_directory>/VOCdevkit
Results for training on the MS-COCO dataset are presented below.
Official Networks | mAP 0.5:0.95 |
---|---|
SSD300 | 23.2 |
VGG-based Networks (our trainings) | mAP 0.5:0.95 |
---|---|
SSD300 | 24.5 |
SSD300 DCT | 14.3 |
SSD300 DCT Y | 14.4 |
SSD300 DCT Deconvolution | 13.5 |
ResNet50-based Networks (our trainings) | mAP 0.5:0.95 |
---|---|
SSD300-Resnet50 (retrained) | 26.8 |
SSD300 DCT LC-RFA | 25.8 |
SSD300 DCT LC-RFA Y | 25.2 |
SSD300 DCT LC-RFA-Thinner | 25.4 |
SSD300 DCT LC-RFA-Thinner Y | 24.6 |
SSD300 DCT Deconvolution-RFA | 25.9 |
The data can be downloaded from the official website.
After downloading, you should have directories following this structure:
mscoco
|
|_ annotations
| |_ captions_train2014.json
| |_ instances_train2017.json
| |_ person_keypoints_val2017.json
| |_ ...
|
|_ train2017
| |_ 000000110514.jpg
| |_ ...
|
|_ val2017
| |_ ...
|
|_ test2017
|_ ...
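The annotation JSON files are the ones pycocotools (installed above) works with. A minimal sketch of loading them (paths are placeholders, not the repository's generator code):

# Minimal sketch of loading the MS-COCO annotations with pycocotools
# (paths are placeholders, this is not the repository's generator code).
from pycocotools.coco import COCO

coco = COCO("mscoco/annotations/instances_train2017.json")
img_ids = coco.getImgIds()
print("{} training images".format(len(img_ids)))

# Boxes for the first image: [x, y, width, height] plus a category id.
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[0]))
for ann in anns:
    print(coco.loadCats(ann["category_id"])[0]["name"], ann["bbox"])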
Then you just need to set the configuration files to fit your needs and follow the procedure described in the training section. The hyper-parameters provided for this training were not used in a parallel setting.
Also, the system variable should be set to the mscoco folder (if you use the provided config files):
# Setting the main dirs for the training datasets
export DATASET_PATH=<path_to_directory>/mscoco
I know from experience that diving into someone else's code to adapt it to your own project is often hard and confusing at first. To help you if you ever want to toy with the code, built-in documentation is provided. It uses a modified version of the Keras documentation generator (here).
To generate the documentation:
pip install mkdocs
cd docs
python autogen.py
To display the documentation:
# From root of the repository
mkdocs serve
The presented method has some limitations, especially for general-purpose deployment. The two main issues I see are described hereafter.
Resizing images in the RGB domain is straightforward, whereas resizing in the DCT domain is more complicated. Although theoretically doable, methods for doing so are not implemented here. The following articles explore the possibility of resizing images directly in the frequency domain:
- On Resizing Images In The DCT Domain
- Image Resizing In The Discrete Cosine Transform Domain
- Fast Image Resizing in Discrete Cosine Transform Domain with Spatial Relationship between DCT Block and its Sub-Blocks
- Design and Analysis of an Image Resizing Filter in the Block-DCT Domain
For classification, the impact is limited as long as the images are about the same size as the original training images, because the network can be made fully convolutional. For detection, it is a bit more complicated: the SSD in the presented implementation does not scale well (although it theoretically should), due to the original design of the network and the need for padding layers. I intend to test a modified version of the network if I find the time.
The second limitation concerns training. Data augmentation has to be carried out in the RGB domain, so the data-augmentation pipeline is the following: JPEG => RGB => data augmentation => JPEG => compressed input. This slows training down.
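As an illustration, a sketch of that pipeline using the dependencies installed above (the chosen augmentation and file name are placeholders):

# Sketch of the JPEG => RGB => augmentation => JPEG => DCT pipeline
# (the augmentation and file name are placeholders).
import cv2
from albumentations import HorizontalFlip
from jpeg2dct.numpy import loads

# JPEG => RGB
rgb = cv2.cvtColor(cv2.imread("image.jpg"), cv2.COLOR_BGR2RGB)

# Data augmentation in the RGB domain.
augmented = HorizontalFlip(p=1.0)(image=rgb)["image"]

# RGB => JPEG (re-encoding) => compressed (DCT) input.
_, jpeg_bytes = cv2.imencode(".jpg", cv2.cvtColor(augmented, cv2.COLOR_RGB2BGR))
dct_y, dct_cb, dct_cr = loads(jpeg_bytes.tobytes())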
- Use correct path for the PASCAL VOC and MS COCO datasets
- Set correct descriptions for all the config files