Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use gene reference files to generate E gene trees #48

Merged
merged 8 commits into from
May 13, 2024
Merged

Conversation

j23414
Copy link
Contributor

@j23414 j23414 commented May 13, 2024

Description of proposed changes

Following up on the Generating gene reference files PR, use the E gene reference files to create E gene trees.

Visual summary (view whole pipeline plan so far)

Related issue(s)

Checklist

Links to resulting trees below

genome E gene
all all/genome all/E
denv1 denv1/genome denv1/E
denv2 denv2/genome denv2/E
denv3 denv3/genome denv3/E
denv4 denv4/genome denv4/E

A manual run of the dengue serotype 4 E gene tree might look like:

# from dengue repo top level
nextstrain build phylogenetic auspice/dengue_denv4_E.json

Until we have a set of E gene serotype and genotype defining mutations,
skip the augur clades call for E gene builds. This was originally discussed
for commit:

2f48ddb

Used in measles:

nextstrain/measles@ab92a1b

And diagrammed in:

#17 (comment)
Add ncbi_serotype annotation to phylogeny, since this can be used as a
source-of-truth when adding E gene defining serotypes in subsequent PRs.
@j23414 j23414 requested a review from a team May 13, 2024 16:24
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for breaking down the steps @j23414! I've left some non-blocking comments for consideration, but this looks good to merge to me 👍

@@ -42,7 +42,7 @@ rule prepare_auspice_config:
output:
auspice_config="results/config/{gene}/auspice_config_{serotype}.json",
params:
replace_clade_key="clade_membership",
replace_clade_key=lambda wildcard: r"clade_membership" if wildcard.gene in ['genome'] else r"nextclade_subtype",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the clade_membership is automatically used to create the clade branch label in augur export, this change means the E gene build will not have the automatic clade branch label.

It's possible to create custom branch labels since nextstrain/augur#728, but this uses augur clades to create the labels and the workflow is skipping augur clades for the E gene build 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the nextclade_subtype field is being added through augur traits, maybe augur traits should be updated to support adding branch labels as well...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! I'm currently planning to explore serotype and genotype-defining E mutations (which would create clade_membership for E gene builds) in a future PR.

But if that doesn't work out, I'll explore "create custom branch labels ..." route you've referenced.

@@ -42,7 +42,7 @@ rule prepare_auspice_config:
output:
auspice_config="results/config/{gene}/auspice_config_{serotype}.json",
params:
replace_clade_key="clade_membership",
replace_clade_key=lambda wildcard: r"clade_membership" if wildcard.gene in ['genome'] else r"nextclade_subtype",
replace_clade_title=lambda wildcard: r"Serotype" if wildcard.serotype in ['all'] else r"DENV genotype",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To mirror the title NCBI serotype added below, maybe the default clade_membership title should be updated to Nextstrain serotype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for flagging! I'm considering renaming NCBI serotype toSerotype (NCBI) to better match naming convention used in measle's Genotype (NCBI) title. Then, I agree that the default title for clade_membership should be renamed from Serotype to Serotype (Nextstrain).

To recap:

  • NCBI serotype -> Serotype (NCBI): indicating that denv1-4 assignment is based on NCBI GenBank record annotation
  • Serotype -> Serotype (Nextstrain): indicating that denv1-4 assignment is based on augur clades call using full-genome-level-serotype-defining amino acid mutations
  • Nextclade genotype -> Genotype (Nextclade): indicating genotype level assignment within the serotype (e.g. DENV1/S) based on Nextclade call

This naming adjustment leaves space for a potential Genotype (NCBI) if we develop a script for parsing genotype annotations from the GenBank data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, actually I think I'll merge this pull request as it is, and handle the renaming mentioned above in a later PR. That way, we can address it along with fixing this issue: #41.

@j23414 j23414 merged commit 5365ef2 into main May 13, 2024
41 checks passed
@j23414 j23414 deleted the gene-phylogeny branch May 13, 2024 21:58
This was referenced May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants