Exclude extra columns in labels df (TrainConfig) and filepaths df (PredictConfig) #186

Merged
merged 1 commit into from
May 11, 2022

ejm714
Collaborator

@ejm714 ejm714 commented May 11, 2022

The labels CSV requires "filepath" and "label" columns and supports optional "split" and/or "site" columns. If a labels CSV contains any other columns, we should exclude them, since they can lead to unintended consequences.

For example, a column with a "species_" prefix, e.g. "species_VE", gets included in the species list (which results from label -> species -> get dummies -> filter columns with "species_"). Explicitly excluding extra columns avoids this bug.
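To illustrate the failure mode, here is a minimal pandas sketch. It is not the code from this PR: the `species_columns` and `exclude_extra_columns` helpers, the `ALLOWED` list, and the sample data are all hypothetical, and the pipeline is only approximated from the description above.

```python
import pandas as pd

# Hypothetical labels table with an extra column that happens to start with "species_".
labels = pd.DataFrame(
    {
        "filepath": ["a.mp4", "b.mp4", "c.mp4"],
        "label": ["elephant", "leopard", "elephant"],
        "species_VE": [0, 1, 0],  # extra column, not a real label
    }
)

# Assumed set of supported columns, taken from the PR description.
ALLOWED = ["filepath", "label", "split", "site"]


def species_columns(df: pd.DataFrame) -> list:
    """Approximate label -> species -> get_dummies -> filter on the "species_" prefix."""
    renamed = df.rename(columns={"label": "species"})
    dummies = pd.get_dummies(renamed, columns=["species"])
    return sorted(c for c in dummies.columns if c.startswith("species_"))


def exclude_extra_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the supported columns that are actually present."""
    return df[[c for c in ALLOWED if c in df.columns]]


buggy = species_columns(labels)  # the extra column leaks into the species list
fixed = species_columns(exclude_extra_columns(labels))  # only real species remain
```

With the extra column left in place, `buggy` contains "species_VE" alongside the real species; after filtering, `fixed` holds only the columns derived from "label".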

Similarly for predict, we should exclude any columns other than "filepath" from the filepaths CSV, as extra columns interfere with indexing in the dataloader.
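The predict-side filtering could look roughly like the sketch below. The `keep_only_filepath` helper and the sample "notes" column are hypothetical, and this only shows the column selection, not how the dataloader indexes the result.

```python
import pandas as pd

# Hypothetical filepaths table with an extra, unsupported column.
filepaths = pd.DataFrame(
    {"filepath": ["a.mp4", "b.mp4"], "notes": ["blurry", "night"]}
)


def keep_only_filepath(df: pd.DataFrame) -> pd.DataFrame:
    """Return a frame containing just the "filepath" column."""
    if "filepath" not in df.columns:
        raise ValueError('filepaths CSV must contain a "filepath" column')
    return df[["filepath"]]


cleaned = keep_only_filepath(filepaths)
```

Selecting with a list (`df[["filepath"]]`) keeps the result a DataFrame rather than a Series, so downstream code sees the same type with a single column.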

@ejm714 ejm714 requested a review from pjbull May 11, 2022 17:49
Member

@pjbull pjbull left a comment

Looks great!

@pjbull pjbull merged commit 957f936 into master May 11, 2022
@pjbull pjbull deleted the exclude-extra-cols branch May 11, 2022 18:28