Allow users to specify arbitrary branch & clade labels #728
Commits on Apr 11, 2023
-
Allow branch labels in node-data JSONs
Previously, branch labels could not be specified in data passed to `augur export v2`, except for two "special cases":
(i) AA mutations (stored in node-data-json -> nodes) would create branch labels "aa", if applicable.
(ii) `clade_annotation` (stored in node-data-json -> nodes) was interpreted as the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level key `branches` as described in [1] and the test data added here [2]. This data is exported in the appropriate format for Auspice (unchanged). This paves the way for pipelines to define a range of branch labels for export. Currently the only usable key within each branch's dict is 'labels'.

If a branch label (via node-data-json -> branches -> node_name -> labels) is provided for 'aa' or 'clade' then this will overwrite the values generated above (i, ii).

A side-effect of this work is that the requirement for node-data JSONs to specify "nodes" has been relaxed (see [2] for an example); however, if neither "nodes" nor "branches" is defined then we raise a validation error.

[1] #720
[2] ./tests/functional/export_v2/branch-labels.json
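As a rough illustration of the structure described above, the following sketch builds a node-data dict with a top-level `branches` key, checks the relaxed "nodes or branches" requirement, and lets provided labels overwrite generated ones. The node names, label values, and the helper functions `validate_top_level` and `merge_labels` are hypothetical and are not augur's actual implementation.

```python
import json

# Hypothetical node-data content; node names and label values are made up.
node_data = {
    "branches": {
        "NODE_0000001": {"labels": {"clade": "A.1", "custom": "example label"}},
        "NODE_0000002": {"labels": {"aa": "example aa label"}},
    }
    # No "nodes" key: permitted as long as "branches" is present.
}


def validate_top_level(data):
    """Raise ValueError if neither "nodes" nor "branches" is defined."""
    if "nodes" not in data and "branches" not in data:
        raise ValueError('node-data must define "nodes" and/or "branches"')


def merge_labels(generated, provided):
    """Labels provided via branches -> node_name -> labels win over generated ones."""
    return {**generated, **provided}


validate_top_level(node_data)
provided = node_data["branches"]["NODE_0000001"]["labels"]
# Pretend the special cases (i, ii) generated these labels for the same branch.
generated = {"aa": "generated aa label", "clade": "generated clade label"}
print(json.dumps(merge_labels(generated, provided), indent=2))
```

In this sketch the provided 'clade' label replaces the generated one, while the generated 'aa' label survives because no 'aa' label was provided for that branch.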
Commit 90d1a5f
-
[clades] export labels as specific branch labels
Previously, clade membership (i.e. the coloring) and the branch labels defining the root of the clade were defined via:
<OUTPUT_NODE_DATA> → nodes → <node_name> → clade_membership, and
<OUTPUT_NODE_DATA> → nodes → <node_name> → clade_annotation.
`augur export` would then convert the clade_annotation into a branch label named 'clade'.

Here we change the format of the OUTPUT_NODE_DATA written by `augur clades` so that the membership and labels are now stored via:
<OUTPUT_NODE_DATA> → nodes → <node_name> → clade_membership, and
<OUTPUT_NODE_DATA> → branches → <node_name> → labels → clade.
The previous commit modified augur export to handle this format.

Augur pipelines should be fully backwards compatible as long as a new major version of augur is released, as we ensure that node-data files are created by the same augur (major) version. Scripts which relied on the format of this node-data file may be affected.

Note that we keep the key 'clade_membership' deliberately: it is used in auspice-config JSONs and auspice URLs, so changing it would cause lots of downstream issues for a minimal syntax improvement. (The `clade_annotation` key name was never exported in auspice JSONs.)

This commit paves the way for allowing custom key names.
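For scripts that relied on the old layout, the change of format can be pictured with a small conversion sketch. The function `convert_clade_annotations` and the node names are hypothetical; this only illustrates the two layouts named in the commit message and is not code from augur.

```python
def convert_clade_annotations(old):
    """Move nodes -> <name> -> clade_annotation
    to branches -> <name> -> labels -> clade, keeping clade_membership in place."""
    new = {"nodes": {}, "branches": {}}
    for name, attrs in old.get("nodes", {}).items():
        attrs = dict(attrs)  # copy so the input is left untouched
        annotation = attrs.pop("clade_annotation", None)
        if attrs:
            new["nodes"][name] = attrs
        if annotation is not None:
            new["branches"][name] = {"labels": {"clade": annotation}}
    return new


# Made-up example in the old layout.
old_format = {
    "nodes": {
        "NODE_1": {"clade_membership": "A", "clade_annotation": "A"},
        "NODE_2": {"clade_membership": "A"},
    }
}
new_format = convert_clade_annotations(old_format)
print(new_format["branches"])
```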
Commit 5ba7cf1
-
[clades] allow custom membership / label names
The arguments for custom membership and label names shouldn't be needed in most cases, but are really useful for pipelines which run `augur clades` multiple times (e.g. nCoV's emerging lineages). This will allow _n_ node-data files to be passed to `augur export`, resulting in _n_ colorings and labels. (Currently you need multiple extra steps: the node-data JSON needs to have its key names changed, and then you need to manually set branch labels in the auspice JSON.)
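To see why distinct names matter, consider a sketch that merges the node-data from two runs. It assumes each run wrote its membership and label under its own name; the key name `emerging_lineage`, the node names, and the helper `merge_node_data` are hypothetical and do not reflect how augur merges node-data internally.

```python
def merge_node_data(*datasets):
    """Merge node-data dicts; keys from later datasets win on collision."""
    merged = {"nodes": {}, "branches": {}}
    for data in datasets:
        for name, attrs in data.get("nodes", {}).items():
            merged["nodes"].setdefault(name, {}).update(attrs)
        for name, branch in data.get("branches", {}).items():
            labels = merged["branches"].setdefault(name, {}).setdefault("labels", {})
            labels.update(branch.get("labels", {}))
    return merged


# Two made-up runs: one with the default names, one with custom names.
run_default = {
    "nodes": {"NODE_1": {"clade_membership": "A"}},
    "branches": {"NODE_1": {"labels": {"clade": "A"}}},
}
run_custom = {
    "nodes": {"NODE_1": {"emerging_lineage": "X"}},
    "branches": {"NODE_1": {"labels": {"emerging_lineage": "X"}}},
}
merged = merge_node_data(run_default, run_custom)
print(merged["nodes"]["NODE_1"])
```

Because the two runs use different key names, both colorings and both labels survive the merge instead of one overwriting the other.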
Commit fd88aa7
-
Commit 007cb47
-
[clades] allow node-data nodes to be a subset of tree nodes
Our current implementation of read_node_data requires that every node in the tree is specified in the (merged) node_data files. For mutations this is overkill -- many nodes don't have mutations, and node_data JSONs shouldn't have to specify things like `"node_name": {"muts": []}`. Allowing a subset may well be the general behaviour we want, but I didn't want to modify the read_node_data function, which sees extensive use. A welcome side effect of these changes is that we no longer have to supply both nuc and aa_muts.
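The idea of treating absent nodes as having no mutations can be sketched as below. The helper `node_mutations` is hypothetical, and the assumption that `aa_muts` is a dict keyed by gene is made for illustration only; this is not the code used by augur clades.

```python
def node_mutations(node_data, node_name):
    """Return a node's mutations, defaulting to "none" when the node
    (or either kind of mutation) is absent from the node-data."""
    node = node_data.get("nodes", {}).get(node_name, {})
    return {
        "muts": node.get("muts", []),
        "aa_muts": node.get("aa_muts", {}),
    }


# Made-up node-data covering only a subset of the tree's nodes.
node_data = {"nodes": {"NODE_1": {"muts": ["A123T"]}}}
print(node_mutations(node_data, "NODE_1"))
print(node_mutations(node_data, "NODE_2"))
```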
Commit 4316a7d
-
[clades] tests for clades set at the root node
See comments in tests/functional/clades.t. Also adds / updates comments and docstrings where I noticed problems while working through the code relating to these tests.
Commit 22e2444
-
[clades] suppress unused --references arg
Workflows may be using this, so I elected to hide it rather than remove it (and to warn people that it's a no-op if they do happen to be using it).
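One common way to hide an argument while still accepting it is `argparse.SUPPRESS`, sketched below. Whether augur does exactly this is an assumption; the parser name, the `parse` helper, and the warning text are made up for illustration.

```python
import argparse
import warnings

# Hypothetical stand-in parser, not augur's real one.
parser = argparse.ArgumentParser(prog="clades-sketch")
# help=argparse.SUPPRESS keeps the argument out of --help but still accepts it.
parser.add_argument("--references", help=argparse.SUPPRESS)


def parse(argv):
    args = parser.parse_args(argv)
    if args.references is not None:
        warnings.warn("--references is unused and has no effect")
    return args


args = parse([])
print(args.references)
```

Existing invocations that pass `--references` keep working in this sketch; they only trigger a warning.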
Commit 0cb841d
-
[clades] improve reference sequence parsing
This function had a few subtle bugs, which are fixed here. The warning message is also improved to explain how this may affect clade inference. Note that the presence of sequences on nodes other than the root is not considered by augur clades.
Commit 0aaf6a7
-
[clades] catch error where pos is beyond ref length
We could check all of these up-front instead of exiting upon the first error, and such a check should be part of validation within augur clades, but this commit is a simple solution to fix a reported bug.
Closes #965
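The kind of guard described here might look like the sketch below. The function name `reference_state` and the 1-based position convention are assumptions made for illustration, not details taken from the commit.

```python
def reference_state(reference, position):
    """Return the reference state at a 1-based position (assumed convention),
    raising a clear error if the position lies outside the reference."""
    if not 1 <= position <= len(reference):
        raise ValueError(
            f"position {position} is beyond the reference length ({len(reference)})"
        )
    return reference[position - 1]


print(reference_state("ACGT", 4))
```

Raising on the first bad position mirrors the "simple solution" in the message above; collecting all out-of-range positions before exiting would be the up-front alternative it mentions.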
Commit a356a9e
-
Commit 2c6b662
-
[clades] warnings for unfound clades
A fatal error is raised if no clades are defined, but a defined clade which is not found on the tree only produces a warning. Suggested in #735.
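The split between the fatal case and the warning case can be sketched as follows. The helper `check_clades_found`, its arguments, and the message texts are hypothetical, not augur's actual code.

```python
import warnings


def check_clades_found(defined_clades, clades_on_tree):
    """Fail if no clades are defined; only warn for defined clades missing from the tree."""
    if not defined_clades:
        raise ValueError("no clades were defined")
    missing = [clade for clade in defined_clades if clade not in clades_on_tree]
    for clade in missing:
        warnings.warn(f"clade {clade!r} was not found on the tree")
    return missing


missing = check_clades_found(["A", "B"], {"A", "B"})
print(missing)
```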
Commit 40e549d
-
[clades] check for multiple mutations at same pos
Multiple mutations at the same position on a single branch are now a fatal error. Previous behaviour was to overwrite such mutations when parsing. Suggested by #735.
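A minimal sketch of such a check is shown below. It assumes mutations are written as strings like `A123T` (one-character state, position, one-character state); the function name `parse_branch_mutations` is hypothetical.

```python
def parse_branch_mutations(muts):
    """Map position -> derived state, refusing repeated positions on one branch."""
    by_position = {}
    for mut in muts:
        position = int(mut[1:-1])  # strip the leading and trailing state characters
        if position in by_position:
            raise ValueError(
                f"multiple mutations at position {position} on a single branch"
            )
        by_position[position] = mut[-1]
    return by_position


print(parse_branch_mutations(["A123T", "C456G"]))
```

With the previous overwrite behaviour, a list such as `["A123T", "T123G"]` would have kept only the last entry for position 123; here it raises instead.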
Commit e5cfc3a
Commits on May 4, 2023
-
Merge pull request #1199 from nextstrain/clade-fixes
Multiple improvements to augur clades
Commit dd318ba