No such file or directory: './identify/*.failed_genomes.tsv' #475

Closed
lfenske-93 opened this issue Feb 21, 2023 · 6 comments
Labels
error Help required for a GTDB-Tk error.

Comments

@lfenske-93
Contributor

Hi again 😅

Sorry, but ani_screen still doesn't seem to work for me. When I run GTDB-Tk with --skip_ani_screen everything works fine, but as soon as I try the new mode I run into errors:

[2023-02-21 15:42:16] ERROR: Uncontrolled exit resulting from an unexpected error.
================================================================================ 
EXCEPTION: FileNotFoundError 
MESSAGE: [Errno 2] No such file or directory: './identify/SAMD00000667.failed_genomes.tsv' 

I'm running GTDB-Tk via conda inside a Nextflow workflow, so Nextflow terminates as soon as the error occurs.

Running it without Nextflow ends with:

[2023-02-21 14:57:09] INFO: Creating Mash sketch file: human/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2023-02-21 14:57:09] ERROR: Error generating Mash sketch:                      
[2023-02-21 14:57:09] ERROR: Controlled exit resulting from an unrecoverable error or warning.

Any idea what could help?
Greetings,
Linda

@lfenske-93 lfenske-93 added the error Help required for a GTDB-Tk error. label Feb 21, 2023
@ohickl

ohickl commented Feb 21, 2023

I have the same problem. For me, creating the conda env with mash=2.2.2 specified as a dependency fixed the "Error generating Mash sketch:" part. It might be a compatibility problem with Mash 2.3, which is installed by default, as the log also used to say:

... INFO: Using Mash version unknown

in the log. Now it is:

... INFO: Using Mash version 2.2.2

Now I also get:

================================================================================
EXCEPTION: FileNotFoundError
  MESSAGE: [Errno 2] No such file or directory: '.../GTDB/identify/gtdbtk.failed_genomes.tsv'
________________________________________________________________________________

The command used is:

export GTDBTK_DATA_PATH=".../gtdbtk/release207_v2"
export PYTHONPATH=$CONDA_PREFIX/lib/python3.7/site-packages
gtdbtk classify_wf --genome_dir .../bins/bin.1/ \
                   --extension fasta \
                   --mash_db .../gtdbtk/mash_db \
                   --out_dir .../bins/bin.1/GTDB \
                   --cpus 32 \
                   --pplacer_cpus 4
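
The "Using Mash version unknown" line above suggests the version lookup itself was failing in that environment. A minimal Python sketch for checking which Mash version an environment exposes before starting a run. It assumes that `mash --version` prints a bare version string such as 2.2.2; the helper names are hypothetical and not part of GTDB-Tk.

```python
import re
import shutil
import subprocess


def parse_mash_version(text):
    """Return the first dotted version number found in text, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    return match.group(0) if match else None


def installed_mash_version():
    """Return the version reported by the mash on PATH, or None if unavailable."""
    if shutil.which("mash") is None:
        return None
    # Assumption: `mash --version` writes the version to stdout or stderr.
    proc = subprocess.run(["mash", "--version"], capture_output=True, text=True)
    return parse_mash_version(proc.stdout + proc.stderr)


if __name__ == "__main__":
    print("Mash version:", installed_mash_version() or "unknown")
```

Running this inside the activated conda env shows whether the pinned mash=2.2.2 is the binary that is actually picked up.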

@pchaumeil
Collaborator

Hello,

To reproduce the error on my side, could you please provide both the subset of genomes and the command lines you are running when you get this error?

Thanks

@lfenske-93
Contributor Author

Hi,

Downgrading Mash to v2.2.2 did indeed work around the first error, but the second error still occurs.
I also saw that the genomes are definitely classified: gtdbtk.bac120.ani_summary.tsv, with all the results, is present under classify/ani_screen.

The error still occurs, though. It probably happens because the run is effectively complete after the ANI classification, so the files from the later steps are never created.

Here are the two genomes I tried and the command executed was:

gtdbtk classify_wf --genome_dir /shared/test/ --extension .gz --mash_db /shared/test/db/ --out_dir out/

SAMD00002791.fna.gz
SAMD00090154.fna.gz

Greetings,
Linda

@Sidduppal

Sidduppal commented Feb 23, 2023

I'm getting the same error: *failed_genomes.tsv not found.

@pchaumeil
Collaborator

@lfenske-93, thanks for the genomes.
I have run the command and found the issue.
As mentioned above, this bug appears because all genomes were classified in the ANI screen step, so the classify pipeline had effectively already completed.
We are currently working on a patch to fix this issue and the Mash v2.3 issue.
A new version of GTDB-Tk will be released soon.
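
The failure mode described here can be illustrated with a short Python sketch: when every genome is resolved by the ANI screen, the later steps never write their failed_genomes.tsv, so any code that opens that file unconditionally raises FileNotFoundError. The function below is a hypothetical illustration of guarding such a read, not the actual GTDB-Tk patch, and it assumes the genome ID is the first tab-separated field on each line.

```python
import os
import tempfile


def read_failed_genomes(path):
    """Return genome IDs from a failed-genomes file, or [] if it was never written."""
    if not os.path.exists(path):
        # No genome reached the step that writes this file, so none failed in it.
        return []
    with open(path) as handle:
        # Assumption: the genome ID is the first tab-separated field on each line.
        return [line.split("\t")[0].strip() for line in handle if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        missing = os.path.join(out_dir, "identify", "gtdbtk.failed_genomes.tsv")
        print(read_failed_genomes(missing))
```

Treating a missing file as "no failures" is one possible design choice; a pipeline could equally skip the summary step altogether when the earlier step was not run.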

pchaumeil added a commit that referenced this issue Feb 24, 2023
@pchaumeil
Collaborator

A new version of GTDB-Tk (v2.2.4) has been released to fix the *failed_genomes.tsv issue.

Unfortunately, we were unable to reproduce the Mash 2.3 error on our end. We tried a few different environments and it seems to work fine:

[2023-02-24 15:17:32] INFO: GTDB-Tk v2.2.3
[2023-02-24 15:17:32] INFO: gtdbtk classify_wf --genome_dir /tmp/test_genomes --extension .gz --out_dir /tmp/test_genomes_result2 --mash_db /tmp/gtdbtkmashdb2 --cpus 40
[2023-02-24 15:17:32] INFO: Using GTDB-Tk reference data version r207: /srv/db/gtdbtk/official/release207_v2
[2023-02-24 15:17:33] INFO: Loading reference genomes.
[2023-02-24 15:17:33] INFO: Using Mash version 2.3
[2023-02-24 15:17:33] INFO: Creating Mash sketch file: /tmp/test_genomes_result2/classify/ani_screen/intermediate_results/mash/gtdbtk.user_query_sketch.msh
[2023-02-24 15:17:33] INFO: Completed 2 genomes in 0.28 seconds (7.19 genomes/second).
[2023-02-24 15:17:33] INFO: Creating Mash sketch file: /tmp/gtdbtkmashdb2.msh
[2023-02-24 15:24:24] INFO: Completed 65,703 genomes in 6.86 minutes (9,582.53 genomes/minute).
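
Because the *failed_genomes.tsv fix shipped in v2.2.4, it can help to confirm which GTDB-Tk version produced a given log. A small Python sketch that extracts the version from a header line of the form shown above and compares it with 2.2.4; the function names are hypothetical.

```python
import re

# Release announced in this thread as fixing the failed_genomes.tsv error.
FIXED_IN = (2, 2, 4)


def gtdbtk_version_from_log(line):
    """Extract (major, minor, patch) from a log line like 'INFO: GTDB-Tk v2.2.3'."""
    match = re.search(r"GTDB-Tk v(\d+)\.(\d+)\.(\d+)", line)
    return tuple(int(part) for part in match.groups()) if match else None


def has_failed_genomes_fix(line):
    """True if the logged version is at least FIXED_IN, None if no version is found."""
    version = gtdbtk_version_from_log(line)
    return None if version is None else version >= FIXED_IN


if __name__ == "__main__":
    print(has_failed_genomes_fix("[2023-02-24 15:17:32] INFO: GTDB-Tk v2.2.3"))
```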
