Start from actually-aligned sequences
Use aligned sequences as the aligned sequences input, rather than pass
off unaligned sequences as the aligned sequences input.

This should be inconsequential to workflow behaviour or results, but it
makes the config a bit more straightforward and less confusing.

In a quick dig thru history, it seems like ncov-ingest's
aligned.fasta.xz was not _quite_ available when we first switched our
profiles to use an "aligned" input instead of a "sequences" input.  The
original use of "aligned" with unaligned sequences was driven by run
time concerns and related to the move of the filtering step after the
subsampling step and move of the "preprocess" steps from this workflow
(ncov) to ncov-ingest.¹

Resolves <#1054>.

¹ <#814>
  <#823>
tsibley committed Apr 6, 2023
1 parent eff4695 commit 1376d82
Showing 3 changed files with 3 additions and 9 deletions.
4 changes: 1 addition & 3 deletions nextstrain_profiles/nextstrain-country/builds.yaml
@@ -25,12 +25,10 @@ include_hcov19_prefix: True
 files:
   description: "nextstrain_profiles/nextstrain-country/nextstrain_description.md"
 
-# Note: unaligned sequences are provided as "aligned" sequences to avoid an initial full-DB alignment
-# as we re-align everything after subsampling.
 inputs:
   - name: gisaid
     metadata: "s3://nextstrain-ncov-private/metadata.tsv.gz"
-    aligned: "s3://nextstrain-ncov-private/sequences.fasta.xz"
+    aligned: "s3://nextstrain-ncov-private/aligned.fasta.xz"
     skip_sanitize_metadata: true
 
 # Define locations for which builds should be created.
4 changes: 1 addition & 3 deletions nextstrain_profiles/nextstrain-gisaid/builds.yaml
@@ -25,12 +25,10 @@ include_hcov19_prefix: True
 files:
   description: "nextstrain_profiles/nextstrain-gisaid/nextstrain_description.md"
 
-# Note: unaligned sequences are provided as "aligned" sequences to avoid an initial full-DB alignment
-# as we re-align everything after subsampling.
 inputs:
   - name: gisaid
     metadata: "s3://nextstrain-ncov-private/metadata.tsv.gz"
-    aligned: "s3://nextstrain-ncov-private/sequences.fasta.xz"
+    aligned: "s3://nextstrain-ncov-private/aligned.fasta.xz"
     skip_sanitize_metadata: true
 
 # Define locations for which builds should be created.
4 changes: 1 addition & 3 deletions nextstrain_profiles/nextstrain-open/builds.yaml
@@ -25,12 +25,10 @@ include_hcov19_prefix: False
 files:
   description: "nextstrain_profiles/nextstrain-open/nextstrain_description.md"
 
-# Note: unaligned sequences are provided as "aligned" sequences to avoid an initial full-DB alignment
-# as we re-align everything after subsampling.
 inputs:
   - name: open
     metadata: "s3://nextstrain-data/files/ncov/open/metadata.tsv.gz"
-    aligned: "s3://nextstrain-data/files/ncov/open/sequences.fasta.xz"
+    aligned: "s3://nextstrain-data/files/ncov/open/aligned.fasta.xz"
     skip_sanitize_metadata: true
 
 # Define locations for which builds should be created.
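The commit distinguishes unaligned sequences.fasta.xz from aligned.fasta.xz as the value of the "aligned" input. As a rough illustration only (this is not part of the ncov workflow, and the helper names `read_fasta` and `looks_aligned` are hypothetical), one quick sanity check on whether a FASTA file could be an alignment is that every record has the same length. Equal length is a necessary condition, not proof of alignment.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def looks_aligned(lines: Iterable[str]) -> bool:
    """Return True if every record has the same sequence length.

    Hypothetical helper: equal length is necessary for an alignment
    but does not prove that the sequences were actually aligned.
    """
    lengths = {len(seq) for _, seq in read_fasta(lines)}
    return len(lengths) <= 1


if __name__ == "__main__":
    aligned = [">a", "ACGT", ">b", "AC-T"]
    unaligned = [">a", "ACGT", ">b", "ACG"]
    print(looks_aligned(aligned), looks_aligned(unaligned))
```

For an xz-compressed file, the lines could be read with the standard library's `lzma.open(path, "rt")` and passed to `looks_aligned`; for a full-database file this streams every record, so it may take a while.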
