Invalid Taxid while Mapping #50

Open
arpit20328 opened this issue Sep 11, 2024 · 7 comments

Comments

@arpit20328

arpit20328 commented Sep 11, 2024

12:31:27.687 [INFO] deciding the existence of a reference:
12:31:27.687 [INFO] preset profiling mode: 3
12:31:27.687 [INFO] minimum number of reads per reference chunk: 50
12:31:27.687 [INFO] minimum number of uniquely matched reads: 20
12:31:27.687 [INFO] minimum proportion of matched reference chunks: 0.800000
12:31:27.687 [INFO] maximum standard deviation of relative depths of all chunks: 2.000000
12:31:27.687 [INFO]
12:31:27.687 [INFO] minimum number of high-confidence uniquely matched reads: 5
12:31:27.687 [INFO] minimum query coverage of high-confidence uniquely matched reads: 0.750000
12:31:27.687 [INFO] minimum proportion of high-confidence uniquely matched reads: 0.100000
12:31:27.687 [INFO]
12:31:27.687 [INFO] taxonomy data:
12:31:27.687 [INFO] taxdump directory: /home/arpit/clark/CLARKV1.3.0.0/DIR_DB/taxonomy
12:31:27.687 [INFO] mapping reference IDs to TaxIds: [cut_1_3_nucl_gb_accession2taxid]
12:31:27.687 [INFO]
12:31:27.687 [INFO] reporting:
12:31:27.687 [INFO] default format : outfile
12:31:27.687 [INFO] -------------------- [main parameters] --------------------
12:31:27.687 [INFO]
12:31:27.687 [INFO] stage 1/4: counting matches and unique matches for filtering out low-confidence references
12:31:27.687 [INFO] parsing file: search.kmcp@db1.kmcp.tsv.gz
12:31:27.724 [ERRO] unknown taxid for NEW_KRAKEN_DATABASE, please check taxid mapping file(s)

@arpit20328
Author

I have ref fasta file as:

NZ_CP014581.1
AGCCGGTCGCCAACTTCAGCTCTCGTACATCCGGAGTCCAGCAGCGAGACGATCAGCTCC
TGATCCGTTGTTATGACGCCGTCCGATTCACACTTTCGCATGCGAAATTTTACTCACTGT
TCGACCGTACTGTAATTTTGTACGGAGTGCGCGCAGCGGCGCAGATATGAAAGACCTTCC
GACTGCAGCAGTCCTGTCCATCCGGACCAGAAGGGCCGTCGTTCCCGCTCGACTCTACTT
GACCAAGCCGGGCCTGACTCGCGGCGTCAGCAGCCACGAAACCTCGCCTTTCGGGCACTT
CGAACGCACTCAAACACGAGCGATTCTAGCCGAACCACGTAGCCCGATCCAGGGGGATAT
CGGACTTATTTCGCACCATCTATTGACCGTCAACATTTCTAATAATAATATCCGTTGCAC
GTGACCACACGTCACCCATCAAAAATCAATGGACCGTCAATAACCGGAGCGTCTCATGTC
CAACACCGTGCAATCGCGCATCTACCAGGCCCTTACCGAACTGATTCCGAATCTTGGAAA

My taxid file is like

A00001.1 10641
A00002.1 9913
A00003.1 9913
A00004.1 32630
A00005.1 32630
A00006.1 32630
A00008.1 32630
A00009.1 32630
A00010.1 32630
A00011.1 32630

What exactly is the problem? Should the FASTA header be a taxid rather than an accession number?

@arpit20328
Author

Is the taxid file not allowed to contain taxids that are absent from reference.fasta?

@shenwei356
Owner

The file is for mapping reference IDs, not sequence IDs, to TaxIds. cut_1_3_nucl_gb_accession2taxid appears to be a file mapping sequence accessions to TaxIds.

unknown taxid for NEW_KRAKEN_DATABASE, please check taxid mapping file(s)

I worry about the index now. This means there's a reference genome named NEW_KRAKEN_DATABASE.
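To make the mismatch concrete, here is a minimal sketch of checking which reference IDs have no entry in a taxid mapping file. It assumes the mapping is a plain two-column, whitespace-separated file like the excerpt above; `parse_taxid_map` and `find_unmapped` are hypothetical helper names for illustration, not part of KMCP. The reference IDs are passed in directly, so nothing here depends on the layout of the search result file.

```python
from typing import Dict, Iterable, List


def parse_taxid_map(lines: Iterable[str]) -> Dict[str, str]:
    """Parse two-column (ID, TaxId) lines separated by whitespace.

    Assumption: the column layout matches the excerpt posted in this thread.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        mapping[fields[0]] = fields[1]
    return mapping


def find_unmapped(reference_ids: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return the reference IDs that have no TaxId in the mapping, sorted."""
    return sorted({rid for rid in reference_ids if rid not in mapping})


# First rows of the taxid file posted above (keyed by sequence accession).
taxid_lines = [
    "A00001.1 10641",
    "A00002.1 9913",
    "A00003.1 9913",
]
mapping = parse_taxid_map(taxid_lines)

# The reference ID reported in the error message.
missing = find_unmapped(["NEW_KRAKEN_DATABASE"], mapping)
print(missing)  # ['NEW_KRAKEN_DATABASE']
```

Because the mapping is keyed by sequence accession while the index knows the reference as NEW_KRAKEN_DATABASE, the lookup comes back empty, which is consistent with the error in the log.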

@arpit20328
Author

@shenwei356 Can you provide a KMCP index and its database (.fna) file for bacterial identification? My data is paired-end cell-free DNA FASTQ data.

@shenwei356
Owner

shenwei356 commented Sep 11, 2024

As mentioned here, there's one: https://1drv.ms/f/s!Ag89cZ8NYcqtlgAtJo--uKPNVT4t?e=KEDFrc

You can also try other tools, like mOTUs3, MetaPhlAn4, and sylph, if you just need bacterial identification.

@arpit20328
Author

So I have to download

image

along with the taxdump files, as follows?

image

And the taxid file, as follows?

image

@shenwei356
Owner

The choice of taxdump files and taxid file is up to you.
