Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering does not work as expected if internal traits are defined #1275

Open
jameshadfield opened this issue Jan 25, 2021 · 0 comments
Open
Labels
bug Something isn't working proposal Proposals that warrant further discussion

Comments

@jameshadfield
Copy link
Member

jameshadfield commented Jan 25, 2021

Context

Currently, when filtering by a trait (e.g. SARS-CoV-2 filtered to clade 20I/501Y.v1) we determine node visibility via the following steps

  1. Find the set of tips X which match the filter (in this case, X is tips annotated with clade_membership = 20I/501Y.V1)
  2. Postorder traversal to add the ancestral nodes of these tips to the selection, X. This will always end up visiting and including the root, i.e. rootNode \in X.
  3. Preorder traversal to find the MRCA of X and remove nodes from X which are above this. The tree can then be zoomed to this (MRCA) node via the "zoom to selected button".

Importantly, this approach does not take into account traits annotated on internal nodes, should they exist. Most of the time this works well, but it there are situations where it does not.

Example dataset

The bugs described here use the (contrived) dataset available at https://nextstrain.org/staging/test/internal-trait-filters

Bug 1

Given a trait whose values are known and annotated across the tree, because filtering does not examine internal node states the filtering may be wrong. For example, trait1 is annotated across the tree and defines a basal state (blue) which has been repeatedly lost. The filtering should include all blue nodes (i.e. those annotated with trait1=basal), however it only includes the single tip with this annotation.

image
Bug 1: filtering to basal (RHS) should make all blue branches visible, not just the tip

Bug 2

Similar to 🐛 1, but a case where our filtering algorithm includes nodes which it should not.

image
Bug 2: Filtering to trait2 = tip (RHS) should only show the blue (terminal) nodes, not the entire tree

Bug 3

It is impossible to filter to a trait which only occurs on internal nodes. In the example dataset here, you cannot filter to trait2=internal (yellow), despite it occuring on the tree. Relatedly, this value does not appear in the legend.

Relationship with genotype filtering (PR #1265)

Genotype filtering (currently only available in PR #1265) operates differently, because (a) the internal states are known and (b) the data structure for storing these states across the tree is different to other node traits.

Posible solution

Consider internal node states during filtering. Note that there are many traits where we do not know internal states, and in these cases our current approach is the appropriate one.

@jameshadfield jameshadfield added bug Something isn't working proposal Proposals that warrant further discussion low priority labels Jan 25, 2021
jameshadfield added a commit that referenced this issue Jan 30, 2024
The sidebar filtering now surfaces all valid node-attrs defined across
the (terminal) nodes. URL queries (`?f_${attrName}=value1,value2,...`)
also work for all known attrs. Attributes which are known to be
continuous (via a colorings definition) are excluded, as the filtering
UI is not (yet) able to handle these; if a non-coloring continuous
attribute is set on the nodes then this will end up as a multitude of
numerical options in the sidebar.

As part of this implementation we have removed `stateCountAttrs` from
redux state and improved the validation of (filtering) URL queries.

The behaviour of filtering, and the restriction to collecting attributes
from terminal nodes only, is unchanged. See #1275 for more context.

Closes #1251
jameshadfield added a commit that referenced this issue Jan 30, 2024
The sidebar filtering now surfaces all valid node-attrs defined across
the (terminal) nodes. URL queries (`?f_${attrName}=value1,value2,...`)
also work for all known attrs. Attributes which are known to be
continuous (via a colorings definition) are excluded, as the filtering
UI is not (yet) able to handle these; if a non-coloring continuous
attribute is set on the nodes then this will end up as a multitude of
numerical options in the sidebar.

As part of this implementation we have removed `stateCountAttrs` from
redux state and improved the validation of (filtering) URL queries.

The behaviour of filtering, and the restriction to collecting attributes
from terminal nodes only, is unchanged. See #1275 for more context.

Closes #1251
jameshadfield added a commit that referenced this issue Feb 2, 2024
The sidebar filtering now surfaces all valid node-attrs defined across
the (terminal) nodes. URL queries (`?f_${attrName}=value1,value2,...`)
also work for all known attrs. Attributes which are known to be
continuous (via a colorings definition) are excluded, as the filtering
UI is not (yet) able to handle these; if a non-coloring continuous
attribute is set on the nodes then this will end up as a multitude of
numerical options in the sidebar.

As part of this implementation we have removed `stateCountAttrs` from
redux state and improved the validation of (filtering) URL queries.

The behaviour of filtering, and the restriction to collecting attributes
from terminal nodes only, is unchanged. See #1275 for more context.

Closes #1251
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working proposal Proposals that warrant further discussion
Projects
No open projects
Status: Backlog
Development

No branches or pull requests

1 participant