Grab bag of improvements to augur clades #735

Open
1 of 6 tasks
jameshadfield opened this issue Jun 9, 2021 · 0 comments
Labels
enhancement New feature or request

jameshadfield (Member) commented Jun 9, 2021

Context
This issue collects a list of minor oddities, bugs and gotchas I discovered while working on augur clades in PR #728.

  • Despite the help indicating that nucleotide and/or amino-acid mutations are required, the node-data JSONs, when combined, must contain both muts and aa_muts keys for each node because the augur clades code assumes their existence. Only one should be required.
  • Every node in the tree must have a corresponding entry in a node-data JSON, even if it has no mutations (this is asserted in NodeReader). We should allow nodes without information to be missing from node-data JSONs.
  • A single branch can define multiple mutations at the same position without an error being thrown, but each mutation silently overwrites the previous one, producing unexpected results. We should detect cases such as these and exit with an error.
  • #-prefixed lines in the clades TSV are not interpreted as comments; they're actually read as potentially valid clade definitions! You can mostly get away with this, as they define nonsensical clades and thus don't cause any errors. I suggest we add comment='#' to the pd.read_csv call that reads the clades TSV.
  • Related to ☝️, we should print a warning for each clade (in the supplied TSV) that isn't found in the tree.
  • Because of how augur clades behaves, if multiple nodes carry the clade-defining mutations (i.e. the clade is polyphyletic), we only annotate the clade on the biggest monophyly. We should warn when situations like this arise, or allow this restriction to be relaxed. I expect it'll become common to want to define "clades" via a small set of constellation nCoV mutations, and to expect polyphyletic colourings in Auspice.
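The comment='#' suggestion can be mimicked without pandas to show the intended effect. The sketch below is a hypothetical stdlib helper, not augur's actual parser; it assumes the clades TSV has clade, gene, site and alt columns, and follows pandas' documented comment behaviour of dropping everything from # to the end of a line and skipping lines that end up empty.

```python
import csv
import io


def read_clade_definitions(handle):
    """Parse a clades TSV while ignoring '#' comments (hypothetical helper).

    Assumes columns named clade, gene, site and alt. Returns a dict mapping
    each clade name to a list of (gene, site, alt) tuples.
    """
    # Drop everything from '#' onwards, like pandas' comment='#'.
    cleaned = (line.split("#", 1)[0].rstrip() for line in handle)
    # Lines that were entirely comments are now empty: skip them.
    non_empty = (line for line in cleaned if line.strip())
    reader = csv.DictReader(non_empty, delimiter="\t")
    clades = {}
    for row in reader:
        clades.setdefault(row["clade"], []).append(
            (row["gene"], int(row["site"]), row["alt"])
        )
    return clades


if __name__ == "__main__":
    example = io.StringIO(
        "clade\tgene\tsite\talt\n"
        "# this line used to be parsed as a clade definition\n"
        "A\tnuc\t100\tT\n"
    )
    print(read_clade_definitions(example))
```

With pandas itself, the equivalent is simply passing comment='#' to pd.read_csv, as suggested above.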
jameshadfield added the enhancement (New feature or request) label Jun 9, 2021
jameshadfield added a commit that referenced this issue Apr 11, 2023
A fatal error is raised if no clades are defined, but if a clade is not
found on the tree it's only a warning.
Suggested in #735
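The behaviour described in this commit (fatal error when no clades are defined, warning when a defined clade is missing from the tree) might look roughly like the following. This is a hypothetical sketch, not the code from the commit; check_clades, CladeDefinitionError and the warn callback are illustrative names.

```python
import sys


class CladeDefinitionError(Exception):
    """Hypothetical error type for fatal problems with clade definitions."""


def _warn_to_stderr(message):
    print(message, file=sys.stderr)


def check_clades(defined_clades, clades_found_on_tree, warn=_warn_to_stderr):
    """Fail if no clades are defined; warn about clades absent from the tree.

    defined_clades: clade names parsed from the clades TSV.
    clades_found_on_tree: clade names assigned to at least one node.
    Returns the sorted list of defined clades that were not found.
    """
    if not defined_clades:
        # Nothing to annotate at all: treat this as a fatal error.
        raise CladeDefinitionError("No clades were defined in the clades TSV")
    missing = sorted(set(defined_clades) - set(clades_found_on_tree))
    for clade in missing:
        # A missing clade is suspicious but not fatal, so only warn.
        warn(f"WARNING: clade {clade!r} is defined but was not found on the tree")
    return missing
```

For example, check_clades(["A", "B"], {"A"}) prints one warning about clade 'B' to stderr and returns ["B"]. Passing warn as a parameter keeps the sketch easy to test without capturing stderr.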
jameshadfield added a commit that referenced this issue Apr 11, 2023
Multiple mutations at the same position on a single branch are now a
fatal error. Previous behaviour was to overwrite such mutations when
parsing. Suggested by #735.
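Detecting the duplicate-position case could look something like this. It is a hypothetical sketch, not the committed code: it assumes each branch's mutations arrive as strings of the form A123T (reference, position, alternate), as in node-data muts lists, and parse_branch_mutations and DuplicatePositionError are invented names.

```python
import re

# Reference base/residue, integer position, alternate base/residue.
# '*' (stop) and '-' (gap) are allowed on either side.
MUTATION = re.compile(r"^([A-Z*\-])(\d+)([A-Z*\-])$")


class DuplicatePositionError(Exception):
    """Hypothetical error raised when one branch mutates a position twice."""


def parse_branch_mutations(node_name, mutations):
    """Return {position: (ref, alt)} for one branch, failing on duplicates.

    Instead of letting a later mutation silently overwrite an earlier one
    at the same position, raise an error naming both mutations.
    """
    parsed = {}
    for mut in mutations:
        match = MUTATION.match(mut)
        if match is None:
            raise ValueError(f"Unparseable mutation {mut!r} on node {node_name!r}")
        ref, pos, alt = match.groups()
        pos = int(pos)
        if pos in parsed:
            prev_ref, prev_alt = parsed[pos]
            raise DuplicatePositionError(
                f"Node {node_name!r} has multiple mutations at position {pos}: "
                f"{prev_ref}{pos}{prev_alt} and {mut}"
            )
        parsed[pos] = (ref, alt)
    return parsed
```

Raising on the first duplicate rather than collecting all of them keeps the sketch short; a real implementation might prefer to report every offending branch before exiting.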
Projects
No open projects
Status: Backlog
Development

No branches or pull requests

1 participant