Merge Staging for release of 2.4.0 #584

Merged
merged 27 commits into from
Apr 24, 2024

pchaumeil
Collaborator

No description provided.

pchaumeil and others added 27 commits November 23, 2023 11:55
This commit fixes a few bugs:
- #540: Empty files are skipped during the Mash sketch step;
they are then caught in the Prodigal step and returned as Unclassified.
- #549: `--force` has been modified to deal with #540.
- Prodigal was not returning empty files as failed genomes; it was only skipping them.
These genomes are now returned in the summary file and flagged as Unclassified.
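
A minimal sketch of the idea behind the empty-file handling described above, not the GTDB-Tk implementation: input paths are split into usable and empty files so that the empty ones can still be listed in the summary as Unclassified. The function name `split_empty_genomes` is hypothetical.

```python
import os
from typing import Iterable, List, Tuple


def split_empty_genomes(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split genome files into (usable, empty) by file size.

    Hypothetical helper for illustration only. Empty files are kept in a
    separate list instead of being silently dropped, so a caller can
    still report them (for example as Unclassified).
    """
    usable: List[str] = []
    empty: List[str] = []
    for path in paths:
        if os.path.getsize(path) == 0:
            empty.append(path)
        else:
            usable.append(path)
    return usable, empty
```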
`package ~= x.y` is equivalent to (`package >= x.y`, `package == x.*`); with three components, `package ~= x.y.z` pins to `x.y.*` instead. This prevents the wrong numpy version (2.x) from being installed, such as in #467 (which is not yet properly fixed and should NOT have been closed).

See: https://peps.python.org/pep-0440/#compatible-release
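
A small sketch of the PEP 440 compatible-release rule linked above, covering only plain numeric versions (no pre-release, post-release, or dev suffixes). The helper name `expand_compatible_release` is hypothetical; it expands `~= V` into the equivalent pair of clauses by dropping the last release segment and appending `.*`.

```python
from typing import Tuple


def expand_compatible_release(version: str) -> Tuple[str, str]:
    """Expand '~= version' into its two equivalent PEP 440 clauses.

    Illustration only: handles simple dotted numeric versions.
    """
    parts = version.split(".")
    if len(parts) < 2:
        # PEP 440 does not allow '~=' with a single-segment version.
        raise ValueError("~= requires at least two release segments")
    prefix = ".".join(parts[:-1])
    return f">= {version}", f"== {prefix}.*"


# Two segments constrain the major version; three constrain major.minor.
print(expand_compatible_release("1.24"))    # ('>= 1.24', '== 1.*')
print(expand_compatible_release("1.24.3"))  # ('>= 1.24.3', '== 1.24.*')
```

The version numbers here are placeholders and are not taken from the project's `setup.py`.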
fix(setup.py): use ~= to constrain major version
In some cases, when the three classify steps are run independently, a genome may be filtered out in the alignment step.
However, it is still present in the ANI screening from the classify step and can have an ANI > 95% (this can happen with partial genomes, where AF can still be high).
GTDB-Tk would try to report it twice in the summary file and would return an error. Instead, we now report it as classified by ANI,
but with a warning from the alignment step (MSA < 10%).
skani should reduce the number of such cases, as it keeps AF low for partial genomes.
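
The behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the function name, the row keys, and the warning text are all hypothetical. Each ANI-classified genome gets exactly one summary row, and a warning is attached when the genome was also filtered out during alignment.

```python
from typing import Dict, Iterable, List

# Hypothetical warning text; the real message used by GTDB-Tk may differ.
ALIGN_WARNING = "Filtered out in alignment step (MSA < 10%)"


def build_ani_rows(ani_classified: Dict[str, str],
                   filtered_in_align: Iterable[str]) -> List[Dict[str, str]]:
    """Return one summary row per ANI-classified genome.

    A genome that was also filtered out in the alignment step is still
    reported once, classified by ANI, with an alignment warning.
    """
    filtered = set(filtered_in_align)
    rows = []
    for genome, taxonomy in ani_classified.items():
        rows.append({
            "user_genome": genome,
            "classification": taxonomy,
            "warnings": ALIGN_WARNING if genome in filtered else "N/A",
        })
    return rows
```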
In the generated summary.tsv files, several columns have been renamed for clarity and consistency. The following columns have been affected:
- fastani_reference column has been renamed to closest_genome_reference.
- fastani_reference_radius column has been renamed to closest_genome_reference_radius.
- fastani_taxonomy column has been renamed to closest_genome_taxonomy.
- fastani_ani column has been renamed to closest_genome_ani.
- fastani_af column has been renamed to closest_genome_af.
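
For downstream scripts that still expect the old names (or need to upgrade old files), the renames listed above can be expressed as a mapping. The sketch below is a hypothetical helper, not part of GTDB-Tk; it rewrites only the header line of a tab-separated summary and leaves the data rows untouched.

```python
# Old column name -> new column name, as listed in the renames above.
RENAMED_COLUMNS = {
    "fastani_reference": "closest_genome_reference",
    "fastani_reference_radius": "closest_genome_reference_radius",
    "fastani_taxonomy": "closest_genome_taxonomy",
    "fastani_ani": "closest_genome_ani",
    "fastani_af": "closest_genome_af",
}


def upgrade_summary_header(tsv_text: str) -> str:
    """Rename old fastani_* columns in the header of a summary TSV.

    Hypothetical helper: only the first line is modified; columns that
    are not in the mapping are kept as-is.
    """
    header, sep, rest = tsv_text.partition("\n")
    new_header = "\t".join(
        RENAMED_COLUMNS.get(col, col) for col in header.split("\t")
    )
    return new_header + sep + rest
```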
The description of the `--genes` flag is more explicit and mentions that the input is predicted proteins.
The summary files are still produced even if all genomes fail the Prodigal step.
update md5sum, changelog, and announcement.
# Conflicts:
#	gtdbtk/ani_screen.py
#	gtdbtk/main.py
#	gtdbtk/markers.py
@pchaumeil pchaumeil merged commit 59609e2 into master Apr 24, 2024