This code has three parts:
- Feature extraction
- Similarity matrix generation
- Variety contribution ratio calculation
Add your images to a directory called database/, so that the repository looks like this:
├── src/ # Source files
├── cache/ # Generated on runtime for feature extraction file
├── models/ # Containing all the model training files
├── README.md # Intro to the repo
└── database/ # Directory of all your images
All your images should be put into database/.
Inside that directory, each image class should have its own subdirectory, and the images belonging to that class should be put into that subdirectory.
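As a quick sanity check of this layout before running feature extraction, a minimal sketch along the following lines can list each class directory and its image count. The helper name count_images_per_class and the set of accepted file extensions are assumptions made for illustration; they are not part of this repository.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your own dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {class_name: image_count} for each sub-directory of root.

    Hypothetical helper: files placed directly in root (outside any
    class directory) are ignored, as are non-image files.
    """
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    # Only report if the database/ directory exists in the working directory.
    if Path("database").is_dir():
        for name, n in count_images_per_class("database").items():
            print(f"{name}: {n} images")
```

A class that reports zero images, or an expected class that does not appear at all, usually means the images were placed at the wrong directory level.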
Once the environment is set up and the folders are arranged as described, start feature extraction by running python resnet.py
Once the above code has run, look in the cache/ directory, where you will find the extracted features file. The same file is used in the next step.
After the feature extraction step is completed, the similarity matrix and variety can be generated by running the DSI class from the file DSI - Similarity matrix and Variety contribution ratio calculation.py
The similarity matrix and variety generation is followed by the code that calculates the variety contribution ratio and removes redundant images for dataset optimization.
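To give a feel for what a similarity matrix over extracted features looks like, here is a minimal dependency-free sketch. It assumes cosine similarity as the metric and uses a simple threshold rule to flag near-duplicates; both are assumptions for illustration. It is not the DSI class, and it does not compute the variety contribution ratio, so refer to the actual file for the method used in this work. The function names and the threshold value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(features):
    """Pairwise similarity matrix as a list of lists (n x n)."""
    n = len(features)
    return [
        [cosine_similarity(features[i], features[j]) for j in range(n)]
        for i in range(n)
    ]


def find_redundant(sim, threshold=0.95):
    """Greedy filter: keep an image unless it is too similar to a kept one.

    The threshold is a hypothetical value chosen for illustration.
    Returns (kept_indices, redundant_indices).
    """
    kept, redundant = [], []
    for i in range(len(sim)):
        if any(sim[i][k] >= threshold for k in kept):
            redundant.append(i)
        else:
            kept.append(i)
    return kept, redundant


# Toy feature vectors: the second points in the same direction as the first.
features = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
sim = similarity_matrix(features)
kept, redundant = find_redundant(sim)
```

With real ResNet features, the nested loops would normally be replaced by a vectorized matrix product, since the matrix grows quadratically with the number of images.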
Original dataset credits are to their respective authors:
- A. Khosla, N. Jayadevaprakash, B. Yao, F.-F. Li, Novel dataset for fine-grained image categorization: Stanford dogs, in: Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC), Vol. 2, 2011.
- Nilsback, Maria-Elena, and Andrew Zisserman. "Automated flower classification over a large number of classes." 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing. IEEE, 2008.
Feature extraction is based on Po-Chih Huang's CBIR system, which uses ResNet features.
If you want to cite the entire work, Dataset Structural Index: Leveraging a machine's perspective towards visual data, please make sure to include the full citation as follows:
@article{parikh2021dataset,
title={Dataset Structural Index: Leveraging a machine's perspective towards visual data},
author={Parikh, Dishant},
journal={arXiv preprint arXiv:2110.04070},
year={2021}
}
Dishant Parikh | DishantP