Possible issue with filter --save-predictable-norank and merged taxids #80

Closed
4 tasks done
standage opened this issue Apr 10, 2023 · 2 comments

@standage

Prerequisites

  • make sure you're using the latest version by running taxonkit version
  • read the usage

Describe your issue

I've had the equivalent of the following code in pytaxonkit's test suite for a while now.

echo -e "131567\n2\n1224\n1236\n91347\n543\n561\n562\n2605619\n10239\n2731341\n2731360\n2731618\n2731619\n28883\n10699\n196894\n1327037\n" \
    | taxonkit filter --threads 1 --equal-to species --lower-than species --save-predictable-norank

In recent weeks this test started causing CI failures—the command would just hang for hours. Only today have I had a chance to track down the issue. After a bit of trial and error, I discovered (with the taxonkit lineage command) that a few of these taxids had been merged.

echo -e "131567\n2\n1224\n1236\n91347\n543\n561\n562\n2605619\n10239\n2731341\n2731360\n2731618\n2731619\n28883\n10699\n196894\n1327037\n" \
    | taxonkit lineage
14:50:33.260 [WARN] taxid 28883 was merged into 2731619
14:50:33.260 [WARN] taxid 10699 was merged into 2731619
14:50:33.260 [WARN] taxid 196894 was merged into 2788787
131567  cellular organisms
2       cellular organisms;Bacteria
1224    cellular organisms;Bacteria;Pseudomonadota
1236    cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria
91347   cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales
543     cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae
561     cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia
562     cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli
2605619 cellular organisms;Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;Escherichia coli;Escherichia coli O16:H48
10239   Viruses
2731341 Viruses;Duplodnaviria
2731360 Viruses;Duplodnaviria;Heunggongvirae
2731618 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota
2731619 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
28883   Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
10699   Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes
196894  Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;unclassified Caudoviricetes
1327037 Viruses;Duplodnaviria;Heunggongvirae;Uroviricota;Caudoviricetes;unclassified Caudoviricetes;Croceibacter phage P2559Y
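
The merge warnings above follow a regular pattern, so one way to collect the old-to-new taxid mapping is to parse them out of the log text. This is a minimal sketch, assuming the warnings have already been captured into a string (for example from the command's stderr, which is an assumption here). The helper name `parse_merged_warnings` is hypothetical and is not part of taxonkit or pytaxonkit.

```python
import re

# Matches lines such as:
#   14:50:33.260 [WARN] taxid 28883 was merged into 2731619
MERGED_RE = re.compile(r"\[WARN\] taxid (\d+) was merged into (\d+)")


def parse_merged_warnings(log_text):
    """Return {old_taxid: new_taxid} for every merge warning in log_text."""
    return {int(old): int(new) for old, new in MERGED_RE.findall(log_text)}


# The three warnings reported in this issue.
log = """\
14:50:33.260 [WARN] taxid 28883 was merged into 2731619
14:50:33.260 [WARN] taxid 10699 was merged into 2731619
14:50:33.260 [WARN] taxid 196894 was merged into 2788787
"""

print(parse_merged_warnings(log))
# → {28883: 2731619, 10699: 2731619, 196894: 2788787}
```

Lines that do not match the pattern, such as the lineage rows themselves, are ignored.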

When I dropped or replaced the merged taxids, the problem went away and I got the expected answer.

echo -e "131567\n2\n1224\n1236\n91347\n543\n561\n562\n2605619\n10239\n2731341\n2731360\n2731618\n2731619\n2788787\n1327037" \
    | taxonkit filter --threads 1 --equal-to species --lower-than species --save-predictable-norank
562
2605619
1327037
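
The manual fix above (dropping 28883 and 10699, which both merged into the already-present 2731619, and replacing 196894 with 2788787) can be sketched as a small preprocessing step for a test suite. This is only an illustration of the workaround, not how taxonkit handles merged taxids internally. The function name `replace_merged` is hypothetical, and the mapping is copied from the warnings reported in this issue.

```python
def replace_merged(taxids, merged):
    """Map merged taxids to their replacements, dropping duplicates in order."""
    seen = set()
    result = []
    for taxid in taxids:
        current = merged.get(taxid, taxid)  # unchanged if not merged
        if current not in seen:
            seen.add(current)
            result.append(current)
    return result


# Taxids from the original failing command.
original = [
    131567, 2, 1224, 1236, 91347, 543, 561, 562, 2605619,
    10239, 2731341, 2731360, 2731618, 2731619,
    28883, 10699, 196894, 1327037,
]

# Old -> new mapping, taken from the [WARN] lines in this issue.
merged = {28883: 2731619, 10699: 2731619, 196894: 2788787}

cleaned = replace_merged(original, merged)
print("\n".join(str(t) for t in cleaned))
```

The resulting list matches the taxids passed to the working command: the two taxids merged into 2731619 collapse into the existing entry, and 196894 becomes 2788787.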

Can you confirm that taxonkit filter is choking on merged taxids here?

  • describe the problem
  • provide a reproducible example

@shenwei356
Owner

It's a bug~ The filter did not check merged/deleted taxids ...

@standage
Author

That fixes it. Thanks!
