Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

export v2: Add --metadata-columns option #1384

Merged
merged 5 commits into from
Feb 8, 2024

Conversation

joverlee521
Copy link
Contributor

@joverlee521 joverlee521 commented Jan 6, 2024

Description of proposed changes

Allows users to specify additional metadata columns to export as
node attributes that are not specified as coloring options via
the existing --color-by-metadata option or the Auspice config JSON.
This allows the metadata columns to be visible in the tree in Auspice
without polluting the color-by options.

Related issue(s)

Resolves #1383

Checklist

  • Checks pass
  • If making user-facing changes, add a message in CHANGES.md summarizing the changes in this PR

Copy link

codecov bot commented Jan 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (abc86a8) 66.71% compared to head (32c9740) 66.96%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1384      +/-   ##
==========================================
+ Coverage   66.71%   66.96%   +0.24%     
==========================================
  Files          69       69              
  Lines        7322     7341      +19     
  Branches     1799     1804       +5     
==========================================
+ Hits         4885     4916      +31     
+ Misses       2169     2154      -15     
- Partials      268      271       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

augur/export_v2.py Outdated Show resolved Hide resolved
@jameshadfield
Copy link
Member

jameshadfield commented Jan 7, 2024

Nice one Jover - the overall aim of this work is something I've wanted for a long time. I haven't tested the code, just skimmed through.

These columns will not be used as coloring options in Auspice but will be visible in the tree

Here do you mean visible in the modal when (shift-)clicking on a branch/tip? It'd be good to surface them as sidebar filtering options too, but that'll take an Auspice change I think. My understanding of the linked issue (and from discussions with others) is that datasets are adding unnecessary colorings in order to expose metadata as filtering options.

@joverlee521
Copy link
Contributor Author

Here do you mean visible in the modal when (shift-)clicking on a branch/tip?

Yes, I mean they will only be visible on the modal on click.

It'd be good to surface them as sidebar filtering options too, but that'll take an Auspice change I think. My understanding of the linked issue (and from discussions with others) is that datasets are adding unnecessary colorings in order to expose metadata as filtering options.

Yup, I think this will require changes in Auspice. My understanding is that currently filters must be defined in the Auspice JSON under meta.filters. Even colorings are not automatically available as filters (nextstrain/auspice#1251).

Copy link
Member

@victorlin victorlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality is great! Suggestions below are non-blocking (except changelog request).

tests/functional/export_v2/cram/metadata-columns.t Outdated Show resolved Hide resolved
augur/export_v2.py Outdated Show resolved Hide resolved
augur/export_v2.py Outdated Show resolved Hide resolved
augur/export_v2.py Outdated Show resolved Hide resolved
Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows users to specify additional metadata columns to export as node attributes that are not specified as coloring options via the existing --color-by-metadata option or the Auspice config JSON. This allows the metadata columns to be visible in the tree in Auspice without polluting the color-by options.

This behaviour is already possible by specifying the (TSV metadata column names) in the auspice config's filters array. There are side effects beyond those implemented in this PR, as each key in filters will be shown as a (collapsed) section in the footer, but the (big) upside is that the metadata is searchable.

I want to change the general behaviour here a bit, but I think we are better to sketch the big direction out first before merging any PRs. I'm envisioning something like the following, but it's not been completely thought through:

  1. Auspice uses all node_attrs as sidebar filtering options (and maybe make branch labels searchable too, but that's complicated).
  2. The filters array (in the dataset JSON) is reserved for the items listed in the footer. I see this functionality as less important now that we have sidebar filtering.
  3. We have some way of exporting metadata (TSV) fields without them being colourings. Maybe that's this PR?

But maybe just doing (1) is enough? Thoughts?

@@ -877,6 +887,9 @@ def register_parser(parent_subparsers):
help="delimiters to accept when reading a metadata file. Only one delimiter will be inferred.")
optional_inputs.add_argument('--metadata-id-columns', default=DEFAULT_ID_COLUMNS, nargs="+",
help="names of possible metadata columns containing identifier information, ordered by priority. Only one ID column will be inferred.")
optional_inputs.add_argument('--metadata-columns', nargs="+",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any functionality we support via command line arguments should be available via the auspice-config JSON as well, as the former is a subset of the latter's capabilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this flag relates to the metadata input/export but not really Auspice control settings. Not sure this fits in the auspice-config JSON?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it as controlling what data is exported, similar to filters & colours. The (original) intention when we started adding command line arguments for options in the config JSON was for auspice-config to be all you need to control what data the JSON contains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. I'll move up the flag to be part of the config args group too then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auspice config schema update and changes to support the new property added in bdb2b2c

@joverlee521
Copy link
Contributor Author

I want to change the general behaviour here a bit, but I think we are better to sketch the big direction out first before merging any PRs.

  1. Auspice uses all node_attrs as sidebar filtering options (and maybe make branch labels searchable too, but that's complicated).
  2. The filters array (in the dataset JSON) is reserved for the items listed in the footer. I see this functionality as less important now that we have sidebar filtering.
  3. We have some way of exporting metadata (TSV) fields without them being colourings. Maybe that's this PR?

But maybe just doing (1) is enough? Thoughts?

I think (1) is definitely needed. I think this will require some design on filtering for continuous value fields?

On (2), I think we can even deprecate the footer filters. I definitely only use the sidebar filtering and have not looked at footer filters in a while.

I think even if we implement (1), we still need some way to support (3). I think it would be nice to fully detangle exported metadata from coloring options. I would want to be more explicit about what metadata fields are exported instead of letting it stay a side effect of other Auspice configurations.

Allows users to specify additional metadata columns to export as
node attributes that are not specified as coloring options via
the existing `--color-by-metadata` option or the Auspice config JSON.
This allows the metadata columns to be visible in the tree in Auspice
without polluting the color-by options.
@joverlee521
Copy link
Contributor Author

Rebased to fix merge conflicts in changelog.

@joverlee521
Copy link
Contributor Author

@jameshadfield since (1) was implemented in nextstrain/auspice#1743, do you still have feedback here?

@jameshadfield
Copy link
Member

@jameshadfield since (1) was implemented in nextstrain/auspice#1743, do you still have feedback here?

On (2), I think we can even deprecate the footer filters. I definitely only use the sidebar filtering and have not looked at footer filters in a while.

My interpretation of the deprecation here was to deprecate the auspice UI, but keep the augur export filters setting and use this as the way to add metadata that won't be a coloring. As in "what data would you like to include for filtering in auspice". But would you prefer to add --metadata-columns and remove --filters entirely?

(If we do add --metadata-columns, I do think it needs to be a setting in the auspice-config as well.)

@joverlee521
Copy link
Contributor Author

My interpretation of the deprecation here was to deprecate the auspice UI, but keep the augur export filters setting and use this as the way to add metadata that won't be a coloring. As in "what data would you like to include for filtering in auspice". But would you prefer to add --metadata-columns and remove --filters entirely?

(If we do add --metadata-columns, I do think it needs to be a setting in the auspice-config as well.)

I'd prefer to add --metadata-columns. The --filters flag does not currently exist, so this would just be deprecating the filters key in the auspice config JSON.

For me,--filters does not tell me if the data is coming from the metadata TSV. I'd be confused about whether or not I need to include node-data JSON fields to use them as filters as well.

Since the new `--metadata-columns` flag controls
what data is available in the final Auspice JSON,
it should have a matching option in the Auspice
config JSON setting.

This also moves the flag to the "DISPLAY CONFIGURATION"
group to match the other Auspice config controlled
settings. As per the existing pattern, the command
line flag overrules what is set in the config file.

Based on feedback from @jameshadfield
#1384 (comment)
@joverlee521 joverlee521 requested a review from a team February 7, 2024 21:53
@huddlej
Copy link
Contributor

huddlej commented Feb 7, 2024

On (2), I think we can even deprecate the footer filters. I definitely only use the sidebar filtering and have not looked at footer filters in a while.

I'd vote to please keep the footer filters, if possible. I still use them regularly and find them helpful for quickly picking from a longer list of options (e.g., year-month values for flu, clades, regions, etc.). It's generally been helpful to have a way to see in a single view what the filter fields are and the specific values within each field.

Copy link
Member

@jameshadfield jameshadfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't re-reviewed everything but the changes in bdb2b2c look good by inspection

@joverlee521
Copy link
Contributor Author

I'd vote to please keep the footer filters, if possible. I still use them regularly and find them helpful for quickly picking from a longer list of options (e.g., year-month values for flu, clades, regions, etc.). It's generally been helpful to have a way to see in a single view what the filter fields are and the specific values within each field.

@huddlej No worries, this PR doesn't touch the footer filters! Although this reminded me that the default footer filters will likely only be a subset of available filter fields since it currently does not include the additional metadata columns!

augur/augur/export_v2.py

Lines 539 to 542 in 4bf72f8

# if not specified in the config, include all boolean and categorical colorbys
potentials = {coloring["key"] for coloring in data_json['meta']["colorings"]
if coloring["type"] != "continuous" and coloring["key"] != 'gt'}
data_json['meta']['filters'] = list(potentials)

@joverlee521
Copy link
Contributor Author

Merging this PR as-is, will update the footer filters separately (if desired).

@joverlee521 joverlee521 merged commit e4353b4 into master Feb 8, 2024
20 checks passed
@joverlee521 joverlee521 deleted the export-additional-metadata branch February 8, 2024 17:54
joverlee521 added a commit that referenced this pull request Feb 16, 2024
Fixing changelog update made in #1384.

The PR was actually released as part of v24.2.0, not v 24.1.0.
@joverlee521 joverlee521 mentioned this pull request Feb 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[export] Decouple colorings and exported metadata
4 participants