This Python code implements a Bag of Visual Words (BoVW) model for image classification. It performs the following tasks:
- Data exploration and visualization
- Dataset preprocessing
- Keypoint extraction using SIFT
- Building a codebook using K-means clustering
- Generating sparse vectors using vector quantization and tf-idf weighting
- Image search functionality based on cosine similarity
- Python 3.x
- OpenCV (opencv-python and opencv-contrib-python)
- NumPy
- Matplotlib
- scikit-learn
Functions like display_sample_images, plot_image_histogram, plot_image_sizes, plot_average_color_distribution, plot_class_distribution, and plot_image_sharpness_distribution are defined to explore and visualize the dataset.
The images are converted to grayscale and resized to lower dimensions to reduce the computational cost.
- Keypoints are detected in each image using the SIFT (Scale-Invariant Feature Transform) algorithm.
- SIFT descriptors are extracted from images to capture local features.
- Keypoints are overlaid on the original images to visualize their distribution.
- SIFT descriptors extracted from images are used to build a codebook.
- K-means clustering is performed on the descriptors to group them into clusters.
- Each cluster centroid represents a visual word in the codebook.
- The codebook obtained from K-means clustering is saved using joblib for later use.
- This allows the codebook to be reused without recomputation, saving time and resources.
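Building and persisting the codebook could look roughly like this. It is a sketch under assumptions: build_codebook, the vocabulary size of 8, and the file name codebook.joblib are illustrative choices, and random arrays stand in for real SIFT descriptors.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_list, n_words=8, seed=0):
    """Stack the descriptors of all images and cluster them with K-means.
    Each row of the returned array is one centroid, i.e. one visual word."""
    all_descriptors = np.vstack(descriptor_list)
    kmeans = KMeans(n_clusters=n_words, n_init=10, random_state=seed)
    kmeans.fit(all_descriptors)
    return kmeans.cluster_centers_

# Random stand-ins for the per-image SIFT descriptor arrays (4 images, 50 descriptors each)
rng = np.random.default_rng(0)
fake_descriptors = [rng.random((50, 128)) for _ in range(4)]
codebook = build_codebook(fake_descriptors)

# Save the codebook once, then reload it later instead of re-running K-means
codebook_path = os.path.join(tempfile.mkdtemp(), "codebook.joblib")
joblib.dump(codebook, codebook_path)
restored = joblib.load(codebook_path)
```

In practice the vocabulary is usually much larger than 8 words; the small value here only keeps the example fast.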
- Vector quantization maps visual feature descriptors to visual words based on the codebook.
- Frequency vectors are created for each image by counting the occurrences of visual words.
- Tf-idf (Term Frequency-Inverse Document Frequency) weighting is applied to the frequency vectors so that visual words appearing in many images count less than rarer, more distinctive ones.
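The quantization and weighting steps can be illustrated with plain NumPy. The function names below are hypothetical, and the idf formula log(N / df) is one common variant; the project's exact weighting may differ. A tiny 2-D codebook is used so the result is easy to check by hand.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Return, for each descriptor, the index of its nearest visual word."""
    # Squared Euclidean distance between every descriptor and every centroid
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def frequency_vector(descriptors, codebook):
    """Count how often each visual word occurs in one image."""
    words = quantize(descriptors, codebook)
    return np.bincount(words, minlength=len(codebook)).astype(float)

def tfidf(freqs):
    """Weight an (n_images, n_words) count matrix by idf = log(N / df)."""
    n_images = freqs.shape[0]
    df = np.count_nonzero(freqs, axis=0)          # images containing each word
    idf = np.log(n_images / np.maximum(df, 1))    # guard against unused words
    return freqs * idf

# Toy example: 3 visual words in 2-D, two images
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
image_a = np.array([[0.1, 0.2], [9.5, 10.1], [0.3, -0.1]])
image_b = np.array([[0.0, 0.1], [0.2, 9.7]])

freqs = np.vstack([frequency_vector(d, codebook) for d in (image_a, image_b)])
weights = tfidf(freqs)
```

Here word 0 occurs in both images, so its idf is log(2/2) = 0 and it drops out of both vectors, while words 1 and 2 each keep a weight of log 2.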
- A search function calculates cosine similarity between tf-idf weighted vectors to perform an image search.
- Top-K similar images are identified based on their cosine similarity scores to the search image.
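A cosine-similarity search over the tf-idf vectors might be sketched as below. Note the assumptions: this version takes the vectors as an explicit argument and skips the query image itself, whereas the project's search function takes only an index and may handle the query differently.

```python
import numpy as np

def search(index, vectors, top_k=3):
    """Return (image_index, cosine_similarity) pairs for the top_k images
    most similar to vectors[index], excluding the query image itself."""
    query = vectors[index]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
    sims = vectors @ query / np.maximum(norms, 1e-12)  # avoid division by zero
    order = np.argsort(-sims, kind="stable")           # highest similarity first
    order = order[order != index][:top_k]
    return [(int(i), float(sims[i])) for i in order]

# Toy tf-idf matrix: image 1 is nearly identical to image 0
vectors = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
results = search(0, vectors, top_k=2)
```

Because cosine similarity depends only on direction, images with different numbers of keypoints remain comparable without further normalization.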
To perform an image search:
search(index)
where index is the index of the image in the dataset that you want to search with.