Exclude extra columns in labels df (TrainConfig) and filepaths df (PredictConfig) #186

ejm714 · 2022-05-11T17:49:13Z

For the labels csv, we require "filepath" and "label" and support the optional columns "split" and/or "site". If a labels csv contains extra columns, we should exclude them, as they can lead to unintended consequences.

For example, if there is a column with "species_" prefix, e.g. "species_VE", this column will get included in the species list (which results from label -> species -> get dummies -> filter columns with "species_"). By explicitly excluding extra columns, we avoid this bug.
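To make the failure mode concrete, here is a minimal sketch in plain pandas. It is not the project's actual code: the helper `species_columns`, the `ALLOWED_COLUMNS` list, and the sample data are hypothetical stand-ins for the label -> species -> get dummies -> filter steps described above.

```python
import pandas as pd

# Hypothetical labels csv contents with one stray column, "species_VE".
labels = pd.DataFrame(
    {
        "filepath": ["a.mp4", "b.mp4", "c.mp4"],
        "label": ["elephant", "leopard", "elephant"],
        "species_VE": [1, 0, 1],
    }
)

# Assumed whitelist: the required columns plus the optional ones named above.
ALLOWED_COLUMNS = ["filepath", "label", "split", "site"]


def species_columns(df: pd.DataFrame) -> list:
    """Rename label -> species, one-hot encode it, keep "species_" columns."""
    renamed = df.rename(columns={"label": "species"})
    dummies = pd.get_dummies(renamed, columns=["species"])
    return [col for col in dummies.columns if col.startswith("species_")]


# Without filtering, the stray column passes the prefix filter.
leaky = species_columns(labels)

# Dropping everything outside the whitelist first avoids that.
filtered = labels[[col for col in labels.columns if col in ALLOWED_COLUMNS]]
clean = species_columns(filtered)

print(leaky)
print(clean)
```

In this sketch `leaky` contains "species_VE" alongside the two real species columns, while `clean` contains only the columns derived from "label".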

Similarly for predict, we should exclude any columns other than "filepath", as extra columns interfere with indexing in the dataloader.
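The predict-side filtering could look roughly like the sketch below. The helper name `keep_filepath_only` and the sample "notes" column are hypothetical, and the sketch only shows the column selection, not how the dataloader indexes the frame.

```python
import pandas as pd

# Hypothetical filepaths csv contents with an extra "notes" column.
filepaths = pd.DataFrame(
    {
        "filepath": ["a.mp4", "b.mp4"],
        "notes": ["blurry", "night"],
    }
)


def keep_filepath_only(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame holding only the "filepath" column."""
    if "filepath" not in df.columns:
        raise ValueError('filepaths csv must contain a "filepath" column')
    # Double brackets keep the result a DataFrame rather than a Series.
    return df[["filepath"]]


predict_df = keep_filepath_only(filepaths)
print(list(predict_df.columns))
```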

github-actions · 2022-05-11T17:52:51Z

🚀 Deployed on https://deploy-preview-186--silly-keller-664934.netlify.app

pjbull

Looks great!

exclude extra columns so these do not get one hot encoded or used

cd476e5

ejm714 requested a review from pjbull May 11, 2022 17:49

pjbull approved these changes May 11, 2022


pjbull merged commit 957f936 into master May 11, 2022

pjbull deleted the exclude-extra-cols branch May 11, 2022 18:28
