added notebooks and scripts to reranking using graph features (catboo… #146
I'm adding scripts to run the reranking pipeline using graph features (numerical, textual, and embedding features). There are also Jupyter notebook alternatives. Note: for now, the scripts only train the model and do not rerank. Thus, converting the `.ipynb` files and running them from top to bottom will do all of the training, reranking, and feature importance. Remember to change the configurations before running the `.ipynb` files. Alternatively, you can train the models with the scripts, then load them into the `.ipynb` file and gather the reranking results.

In addition to the reranking pipeline, I added `graph_features_preparation.py`, which prepares the dataframe with graph features and publishes it to HuggingFace (T5-large-ssm, T5-xl-ssm).

The added notebooks and scripts and their functionality are as follows:
- `graph_features_preparation.py`: prepares the dataset with graph features and publishes it to HF
- `linear_regression.ipynb`: notebook to train linear regression and rerank with the dataset above (can be run from top to bottom)
- `train_linear_regression.py`: script to train linear regression using the dataset above
- `catboost_features.ipynb`: notebook to train CatBoost and rerank with the dataset above (can be run from top to bottom)
- `train_catboost_regressor.py`: script to train CatBoost using the dataset above
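
Since the scripts stop after training, here is a rough sketch of what the reranking step could look like once a trained model has been loaded. It is not the code from the notebooks: the record layout (`question`, `answer`, `features`), the `rerank` helper, and the `score_fn` callable are all hypothetical names chosen for illustration, with `score_fn` standing in for whatever predict call the trained regressor exposes.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: one row per (question, candidate answer) pair.
Candidate = Dict[str, object]


def rerank(
    candidates: Sequence[Candidate],
    score_fn: Callable[[List[float]], float],
) -> Dict[str, List[str]]:
    """Group candidates by question and order each group by predicted score."""
    grouped: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for row in candidates:
        # score_fn is an assumption: any callable mapping a feature vector to a score.
        score = score_fn(row["features"])
        grouped[row["question"]].append((score, row["answer"]))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return {
        question: [
            answer
            for _, answer in sorted(pairs, key=lambda p: p[0], reverse=True)
        ]
        for question, pairs in grouped.items()
    }
```

In practice `score_fn` would wrap the trained linear regression or CatBoost model, and the top-ranked answer per question would then be compared against the gold answer to gather reranking results.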