DISCLAIMER. This repository is no longer maintained. Please refer to our framework Ducho for an improved and more recent version of this project.
This repository provides a Python implementation to extract multimodal features from images and texts, either high-level ones from pretrained deep learning models (e.g., CNNs-extracted embeddings), or low-level ones (e.g., color and shape).
List of publications that used the code from this repository:
- A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems (accepted at CVFAD@CVPR2021)
- V-Elliot: Design, Evaluate and Tune Visual Recommender Systems (accepted at RecSys2021)
- Leveraging Content-Style Item Representation for Visual Recommendation (accepted at ECIR2022)
- Reshaping Graph Recommendation with Edge Graph Collaborative Filtering and Customer Reviews (accepted at DL4SR@CIKM2022)
The list will be constantly updated. If any of your works is missing, please contact me (daniele.malitesta@poliba.it)!
To begin with, please make sure your system has these installed:
- Python 3.6.8
- CUDA 10.1
- cuDNN 7.6.4
Then, install all required Python dependencies with the command:
pip install -r requirements.txt
Finally, you are supposed to structure the dataset folders in the following way:
To classify images and extract visual features from them, please run the following script:
python classify_extract_visual.py \
--gpu <gpu-id> \
--dataset <dataset-name> \
--model_name <list-of-cnns> \
--cnn_output_name <list-of-output-names-for-each-cnn> \
--cnn_output_shape <list-of-output-shapes-for-each-cnn> \
--cnn_output_split <whether-to-store-separately-output-features-or-not> \
--category_dim <dimension-for-dimensionality-reduction> \
--print_each <print-status-each>
The input parameters model_name, cnn_output_name, and cnn_output_shape are lists of values that must correspond index by index, e.g., model_name[0] --> VGG19, cnn_output_name[0] --> fc2, cnn_output_shape[0] --> (). Setting the output shape to () means no reshape is performed after extraction.
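The index-wise correspondence described above can be sketched as follows. This is only an illustration, not the repository's actual argument-handling code: the function name pair_cnn_settings is hypothetical, and the second entry (ResNet50, avg_pool) is an assumed example value.

```python
def pair_cnn_settings(model_names, output_names, output_shapes):
    """Pair the i-th entries of the three lists; fail if their lengths differ."""
    if not (len(model_names) == len(output_names) == len(output_shapes)):
        raise ValueError(
            "model_name, cnn_output_name, and cnn_output_shape "
            "must have the same length"
        )
    return list(zip(model_names, output_names, output_shapes))


settings = pair_cnn_settings(
    ["VGG19", "ResNet50"],  # model_name (second value is an assumed example)
    ["fc2", "avg_pool"],    # cnn_output_name
    [(), ()],               # cnn_output_shape: () means no reshape
)
for model, layer, shape in settings:
    print(model, layer, shape)
```

Checking the lengths up front makes a mismatched argument list fail immediately rather than silently dropping the unmatched entries.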
The following dimensionality reduction technique is available:
- Principal Component Analysis (PCA)
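As a rough illustration of what PCA does to a feature matrix, here is a minimal NumPy sketch based on the singular value decomposition. It is an assumption for explanatory purposes: the function name pca_reduce is hypothetical, the random matrix stands in for real extracted features, and the repository may rely on a library implementation instead.

```python
import numpy as np


def pca_reduce(features, dim):
    """Project (n_samples, n_features) data onto its top `dim` principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


rng = np.random.default_rng(0)
features = rng.normal(size=(10, 8))  # stand-in for extracted CNN features
reduced = pca_reduce(features, dim=3)
print(reduced.shape)
```

Here dim plays the role of the value passed to --category_dim.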
The script will generate three output files, namely:
- a csv file with the classification outcomes for the input images and the adopted model
- cnn_features_<model_name>_<output_name>.npy, a npy file with the extracted features for the input images, the adopted model, and extraction layer
- cnn_features_<model_name>_<output_name>_pca<dim>.npy, a npy file with the extracted features for the input images, the adopted model, extraction layer, and reduction dimension.
N.B. Depending on how you set the argument --cnn_output_split, you may store a single numpy array (see above) or separate numpy arrays, one for each extracted visual feature (in this case, they will be stored in the directory cnn_features_<model_name>_<output_name>/ or cnn_features_<model_name>_<output_name>_pca<dim>/).
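The difference between the two storage modes can be sketched as below. This is a hedged illustration rather than the script's real logic: the function name save_features is hypothetical, and naming each per-item file after its row index is an assumption, since the actual file names inside the directory are not documented here.

```python
import os
import tempfile

import numpy as np


def save_features(features, out_dir, model_name, output_name, split):
    """Save features as one .npy file, or one file per item when `split` is True."""
    base = f"cnn_features_{model_name}_{output_name}"
    if not split:
        path = os.path.join(out_dir, base + ".npy")
        np.save(path, features)
        return [path]
    folder = os.path.join(out_dir, base)
    os.makedirs(folder, exist_ok=True)
    paths = []
    for idx, vector in enumerate(features):
        # Hypothetical per-item naming: the row index is used as the file name.
        path = os.path.join(folder, f"{idx}.npy")
        np.save(path, vector)
        paths.append(path)
    return paths


features = np.zeros((3, 4))  # stand-in for three extracted feature vectors
with tempfile.TemporaryDirectory() as tmp:
    single = save_features(features, tmp, "VGG19", "fc2", split=False)
    separate = save_features(features, tmp, "VGG19", "fc2", split=True)
    print(len(single), len(separate))
```

Storing one file per item lets a downstream model load only the features it needs, at the cost of many small files.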
To extract textual features from texts, please run the following script:
python extract_textual.py \
--gpu <gpu-id> \
--dataset <dataset-name> \
--model_name <list-of-textual-encoders> \
--text_output_split <whether-to-store-separately-output-features-or-not> \
--column <column-to-encode> \
--print_each <print-status-each>
Please refer to SentenceTransformers for the list of available pre-trained models.
The script will generate the following output:
- a npy file with the extracted features for the input texts and the adopted model
N.B. Depending on how you set the argument --text_output_split, you may store a single numpy array (see above) or separate numpy arrays, one for each extracted textual feature (in this case, they will be stored in the directory text_features_<model_name>/).
This section refers to the novel metric visual diversity (VisDiv), proposed in our paper A Study on the Relative Importance of Convolutional Neural Networks in Visually-Aware Recommender Systems.
To calculate the VisDiv, please run the following script:
python evaluate_visual_profile.py \
--dataset <dataset-name> \
--image_feat_extractors <list-of-image-feature-extractors> \
--visual_recommenders <list-of-visual-recommenders> \
--top_k <top-k-to-calculate-visdiv-on> \
--save_plots <whether-to-save-the-output-plots>
To run, the script requires the folder with the obtained recommendation results. It must be formatted in the following way:
where each tsv file refers to the recommendation lists produced by the best performing configuration for each visual recommender.
The script will generate the following outputs:
- a set of pdf files with the t-SNE graphical representation of the VisDiv for each user
- ./plots/<dataset-name>_<top-k>/<visual-recommender>/<image-feature-extractor>/all_users_stats.csv, a csv file storing all VisDiv values for each user
- ./plots/<dataset-name>_<top-k>/<visual-recommender>/<image-feature-extractor>/final_stats.out, a txt file storing the final statistics for the VisDiv metric
Daniele Malitesta (daniele.malitesta@poliba.it)