GitHub - PNNL-m-q/metabolomics_ensemble_score

Ensemble GC-MS Spectral Similarity Score

Publication: In preparation

Citation: Flores et al. 2025

Metadata

compound_metadata.csv: Contains compound type information for a few compounds (e.g., whether they are amino acids)
score_metadata.csv: Contains metadata for each spectral similarity score
sample_metadata.csv: Contains metadata for each sample

Models

Result_Data

BinSizes.csv: The number of candidate molecules per sample and retention index bin
FP_FN_Ranks.txt: Full model and reduced model predictions on the testing dataset
reduced_test_pred.RDS: An R object with the reduced model predictions on the testing dataset
test_pred.RDS: An R object with the full model predictions on the testing dataset
TP_Ranks.txt: Rankings of the true positive per sample and retention index bin for the top 6 scores, the full model, and the reduced model

Note: All other data used in this study are too large for a GitHub repository and can be found here: https://data.pnnl.gov/group/nodes/dataset/33302
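
Example: The "candidates per sample and retention index bin" and "true positive ranking" quantities described above can be illustrated with a small sketch. This is not the code used in this repository (the study scripts are written in R); it is a minimal Python illustration under stated assumptions. The record layout, the compound names, the score values, and the function name summarize_bins are all hypothetical, and it assumes that a higher score means a better match.

```python
from collections import defaultdict

# Hypothetical candidate records, NOT taken from the study data:
# (sample, retention_index_bin, compound, score, is_true_positive)
candidates = [
    ("S1", 1, "alanine", 0.91, True),
    ("S1", 1, "glycine", 0.95, False),
    ("S1", 1, "serine", 0.40, False),
    ("S1", 2, "glucose", 0.88, True),
    ("S1", 2, "fructose", 0.75, False),
]


def summarize_bins(records):
    """Return {(sample, ri_bin): (bin_size, true_positive_rank)}.

    bin_size is the number of candidates in the bin; true_positive_rank is
    the 1-based position of the true positive when candidates are sorted by
    score, or None if the bin has no true positive.
    """
    bins = defaultdict(list)
    for sample, ri_bin, _compound, score, is_tp in records:
        bins[(sample, ri_bin)].append((score, is_tp))

    summary = {}
    for key, members in bins.items():
        # Assumption: higher score = better match, so sort descending.
        ordered = sorted(members, key=lambda m: m[0], reverse=True)
        tp_rank = next(
            (i for i, (_, is_tp) in enumerate(ordered, start=1) if is_tp),
            None,
        )
        summary[key] = (len(members), tp_rank)
    return summary


summary = summarize_bins(candidates)
for (sample, ri_bin), (size, rank) in sorted(summary.items()):
    print(sample, ri_bin, size, rank)
```

In this toy input, the first bin holds three candidates and the true positive is outscored by one of them, so it ranks second; the second bin holds two candidates and the true positive ranks first. Tied scores are not treated specially here.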

Scripts

build_dataset.R: Extracts all molecule information needed for this study after downloading the data from https://data.pnnl.gov/group/nodes/dataset/33302
ensemble_model.R: Code to build the ensemble model after running build_dataset.R
false_positive_&_false_negative: Extracts all needed information about false positives and false negatives after running the ensemble model
top_N.R: Compares the true positive rankings of the built models and the top 6 scores

Visualization
