Publication: In preparation
Citation: Flores et al. 2025
compound_metadata.csv: Contains compound type information for a few compounds (e.g., whether they are amino acids)
score_metadata.csv: Contains the metadata for each spectral similarity score
sample_metadata.csv: Contains the metadata for each sample
model.RDS: The trained ensemble model with all scores
reduced_model.RDS: The trained ensemble model with the top 6 performing scores
BinSizes.csv: The number of candidate molecules per sample and retention index bin
FP_FN_Ranks.txt: Full model and reduced model predictions on the testing dataset
reduced_test_pred.RDS: An R object with the reduced model predictions on the testing dataset
test_pred.RDS: An R object with the full model predictions on the testing dataset
TP_Ranks.txt: Rankings of the true positive per sample and retention index bin for the top 6 scores, the full model, and the reduced model
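The .RDS and .csv files listed above can be read with base R. The following is a minimal sketch, not part of the study code: `read_if_present` is a hypothetical helper, and the paths assume the files sit in the current working directory. The class and structure of the loaded objects are not documented here, so the sketch only inspects them.

```r
# Hypothetical helper (not part of this repository): read an .RDS or .csv
# file if it exists, otherwise return NULL with a message.
read_if_present <- function(path) {
  if (!file.exists(path)) {
    message("Not found, skipping: ", path)
    return(NULL)
  }
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    readRDS(path)
  } else if (ext == "csv") {
    read.csv(path, stringsAsFactors = FALSE)
  } else {
    stop("Unsupported file type: ", path)
  }
}

# Assumption: the files are in the current working directory.
reduced_model <- read_if_present("reduced_model.RDS")
score_metadata <- read_if_present("score_metadata.csv")

# Inspect whatever was loaded; the object types are not assumed.
if (!is.null(reduced_model)) print(class(reduced_model))
if (!is.null(score_metadata)) str(score_metadata)
```

The same helper should work for model.RDS, test_pred.RDS, reduced_test_pred.RDS, and the other .csv files, since it relies only on `readRDS` and `read.csv`.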
Note: All other data used in this study is too large for a GitHub repository and can be found here:
build_dataset.R: Extracts the molecule information needed for this study from the downloaded data
ensemble_model.R: Code to build the ensemble model after running build_dataset.R
false_positive_&_false_negative: Extracts all needed information about false positives and false negatives after running the ensemble model
top_N.R: Compares the true positive rankings of the built models and the top 6 scores
plots.R: Generates all visualizations of results for this study
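The descriptions above imply a run order: build_dataset.R first, then ensemble_model.R, then the analysis scripts. A minimal sketch of running scripts in sequence follows. `run_in_order` is a hypothetical helper, the position of top_N.R and plots.R in the order is an assumption, and the false positive / false negative script is left out because its file name is listed without an extension. Each script is sourced in a fresh environment, which assumes the scripts pass results through files rather than shared objects.

```r
# Hypothetical helper (not part of this repository): source each script in
# the given order and stop with an error if one is missing.
run_in_order <- function(scripts, dir = ".") {
  ran <- character(0)
  for (s in scripts) {
    path <- file.path(dir, s)
    if (!file.exists(path)) {
      stop("Script not found: ", path)
    }
    message("Running ", s)
    # Assumption: scripts communicate through files, so a fresh
    # environment per script is sufficient.
    source(path, local = new.env())
    ran <- c(ran, s)
  }
  invisible(ran)
}

# Assumed order, based on the file descriptions.
scripts <- c("build_dataset.R", "ensemble_model.R", "top_N.R", "plots.R")

# Uncomment to run from the directory that contains the scripts:
# run_in_order(scripts)
```

If the scripts turn out to depend on objects left in the global environment by an earlier script, replacing `local = new.env()` with `local = FALSE` would source them all into the global environment instead.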