export v2: metadata ID column not used if strain column already exists in the metadata #1262

Closed
joverlee521 opened this issue Jul 25, 2023 · 2 comments · Fixed by #1261
Labels
bug Something isn't working

Comments

@joverlee521
Contributor

joverlee521 commented Jul 25, 2023

Reviewing #1261 and the related nextstrain/mpox#161 made me realize that augur export v2 does not use the provided metadata ID column if "strain" already exists as a column in the metadata.

Looking at the outputs of the monkeypox CI run for the PR, when the "strain" and "accession" columns have different values, the final Auspice JSON will be missing attributes from the metadata. For example, the sequence MPXV_USA_2022_FL001 is missing the "region", "country" and "host" attributes even though they are available in the example metadata.tsv.
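
To check an exported dataset for this symptom, one option is to walk the tree and list the tips that lack the expected attributes. This is only a minimal sketch: it assumes each node is a dict with a `name`, a `node_attrs` dict and an optional `children` list, and both the function name `tips_missing_attrs` and the inline example tree are hypothetical (hand-written, not real export output).

```python
def tips_missing_attrs(node, expected):
    """Return {tip name: [missing attribute names]} for all tips under `node`."""
    missing = {}
    children = node.get("children", [])
    if not children:  # no children, so this node is a tip
        attrs = node.get("node_attrs", {})
        absent = [key for key in expected if key not in attrs]
        if absent:
            missing[node["name"]] = absent
    for child in children:
        missing.update(tips_missing_attrs(child, expected))
    return missing


# Hypothetical tree for illustration only.
example_tree = {
    "name": "NODE_0000000",
    "node_attrs": {},
    "children": [
        {
            "name": "tip_A",
            "node_attrs": {
                "region": {"value": "North America"},
                "country": {"value": "USA"},
                "host": {"value": "Homo sapiens"},
            },
        },
        {"name": "tip_B", "node_attrs": {}},
    ],
}

report = tips_missing_attrs(example_tree, ["region", "country", "host"])
print(report)
```

For a real dataset, the same function could be given the tree loaded from the exported JSON with `json.load`, assuming the tree sits under a single top-level key.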


The issue comes from how export v2 adds metadata to the tree node_attrs.

The metadata gets added using the hard-coded "strain" field:

augur/augur/export_v2.py

Lines 1018 to 1024 in 9ef4711

# first pass: metadata
for node in metadata.values():
    if node["strain"] in node_attrs:  # i.e. this node name is in the tree
        for key, value in node.items():
            corrected_key = update_deprecated_names(key)
            node_attrs[node["strain"]][corrected_key] = value
            metadata_names.add(corrected_key)

However, the metadata ID column only gets assigned to the "strain" field if "strain" does not already exist in the metadata:

augur/augur/export_v2.py

Lines 1081 to 1083 in 9ef4711

for strain in metadata_file.keys():
    if "strain" not in metadata_file[strain]:
        metadata_file[strain]["strain"] = strain
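
The interaction between the two snippets can be reproduced with plain dicts. This is a minimal sketch, not augur itself: the IDs and values are made up, `update_deprecated_names` is replaced by a stand-in that returns names unchanged, and it assumes the tree node names match the values of the metadata ID column ("accession" here).

```python
def update_deprecated_names(name):
    # Stand-in for augur's helper of the same name; assumed here to leave
    # these column names unchanged.
    return name


# Metadata indexed by the ID column ("accession"). The "strain" column
# already exists and holds a different value. All values are made up.
metadata = {
    "ACC_0001": {"accession": "ACC_0001", "strain": "sample_one", "country": "USA"},
}

# Second snippet: "strain" already exists, so it is never set to the ID.
for strain in metadata.keys():
    if "strain" not in metadata[strain]:
        metadata[strain]["strain"] = strain

# Assumption: tree nodes are named by the ID column values.
node_attrs = {"ACC_0001": {}}
metadata_names = set()

# First snippet: the lookup uses the hard-coded "strain" value, which does
# not match any node name, so nothing is copied onto the node.
for node in metadata.values():
    if node["strain"] in node_attrs:
        for key, value in node.items():
            corrected_key = update_deprecated_names(key)
            node_attrs[node["strain"]][corrected_key] = value
            metadata_names.add(corrected_key)

print(node_attrs)
print(metadata_names)
```

Under these assumptions the node ends up with no metadata attributes at all, which matches the missing "region", "country" and "host" values described above.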


Potential fix: 9741279
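
As an illustration of one possible direction (not necessarily what the linked commit does), the lookup could use the metadata index key instead of the "strain" value. The sketch below assumes the metadata dict is keyed by the ID column value; the function `attach_metadata`, its `rename` parameter and the sample data are all hypothetical.

```python
def attach_metadata(metadata, node_attrs, rename=lambda key: key):
    """Copy each metadata row onto the tree node whose name equals the row's ID.

    `rename` stands in for a key-correcting helper such as
    update_deprecated_names; by default it returns keys unchanged.
    """
    metadata_names = set()
    for node_id, row in metadata.items():
        if node_id in node_attrs:  # i.e. this ID is a node name in the tree
            for key, value in row.items():
                corrected_key = rename(key)
                node_attrs[node_id][corrected_key] = value
                metadata_names.add(corrected_key)
    return metadata_names


# Made-up data: the "strain" value differs from the ID used in the tree.
metadata = {
    "ACC_0001": {"accession": "ACC_0001", "strain": "sample_one", "country": "USA"},
}
node_attrs = {"ACC_0001": {}}

names = attach_metadata(metadata, node_attrs)
print(node_attrs["ACC_0001"])
```

Matching on the dict key means the result no longer depends on whether a "strain" column happens to exist in the metadata.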

joverlee521 added the bug label Jul 25, 2023
@jameshadfield
Member

This bug is also present in augur export v1. I don't think we should be backporting improvements there, so I would favor removing --metadata-id-columns from v1 rather than fixing the bug. In fact, I think we should put it on the roadmap to remove augur export v1 entirely, and then ~symlink augur export to augur export v2.

@victorlin
Member

@jameshadfield re: export v1, see #1266.
