Add name2taxid function from taxonkit #6146

Merged Aug 14, 2024, 38 commits
Showing changes from 5 commits

Commits
dda83a8
Add name2taxid function from taxonkit
SantaMcCloud Jul 13, 2024
698d432
rename file
SantaMcCloud Jul 13, 2024
28d1db1
rename file
SantaMcCloud Jul 13, 2024
306860f
rename file and change a param
SantaMcCloud Jul 13, 2024
d3876fb
a little fix
SantaMcCloud Jul 13, 2024
855fae6
change the test such that files will be compared
SantaMcCloud Jul 23, 2024
0af783a
make the name options a bit clear
SantaMcCloud Jul 31, 2024
022abb6
did an example for the show rank option
SantaMcCloud Jul 31, 2024
45b2a87
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Jul 31, 2024
f244cdb
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Jul 31, 2024
7d875e8
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Jul 31, 2024
8ae7dbc
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Jul 31, 2024
ae067f4
test for fix
SantaMcCloud Jul 31, 2024
1be6b8f
test commit
SantaMcCloud Jul 31, 2024
55ef8d0
Merge branch 'main' into taxonkit
SantaMcCloud Jul 31, 2024
ea0ae5e
Merge branch 'taxonkit' of https://github.com/SantaMcCloud/tools-iuc …
SantaMcCloud Jul 31, 2024
fb592ac
fix test
SantaMcCloud Aug 1, 2024
6cfc298
fix test
SantaMcCloud Aug 1, 2024
f50f0b3
commit to fix format problems
SantaMcCloud Aug 1, 2024
380719b
Update taxonkit_name2taxid.xml
SantaMcCloud Aug 1, 2024
65b1996
format fix maybe
SantaMcCloud Aug 1, 2024
b2f8360
delet file for reseting it on github
SantaMcCloud Aug 1, 2024
f93386a
add the deleted file to see if the format is fixes now
SantaMcCloud Aug 1, 2024
03dc04b
Update taxonkit_name2taxid.xml
SantaMcCloud Aug 1, 2024
5e4af94
foramted
SantaMcCloud Aug 1, 2024
12363c9
Update tools/taxonkit/taxonkit_name2taxid.xml
bgruening Aug 8, 2024
2fd395a
change such that the newest version will be dowanloaded and unpacked …
SantaMcCloud Aug 9, 2024
8fe321e
change but fomrat problem
SantaMcCloud Aug 9, 2024
5341887
fix file format
SantaMcCloud Aug 9, 2024
d96c35d
Apply suggestions from code review
bgruening Aug 9, 2024
ccd125e
Delete tools/taxonkit/test-data/test-db/names.dmp
SantaMcCloud Aug 14, 2024
d44f3dd
Delete tools/taxonkit/test-data/test-db/delnodes.dmp
SantaMcCloud Aug 14, 2024
214cc69
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Aug 14, 2024
de7b816
Update tools/taxonkit/taxonkit_name2taxid.xml
SantaMcCloud Aug 14, 2024
7a9c364
revert deleting
SantaMcCloud Aug 14, 2024
49663eb
change value names
SantaMcCloud Aug 14, 2024
3d91034
fix test
SantaMcCloud Aug 14, 2024
e2f9f64
now fixed
SantaMcCloud Aug 14, 2024
18 changes: 18 additions & 0 deletions tools/taxonkit/.shed.yml
@@ -0,0 +1,18 @@
name: taxonkit
owner: iuc
description: TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit
homepage_url: https://bioinf.shenwei.me/taxonkit/
long_description: |
  TaxonKit is a set of tools for analyzing and manipulating taxonomic data. It includes utilities for converting metagenomic profile tables to CAMI format, among other functionalities.
remote_repository_url: https://github.com/shenwei356/taxonkit
categories:
  - Metagenomics
type: unrestricted
auto_tool_repositories:
  name_template: "{{ tool_id }}"
  description_template: "Wrapper for TaxonKit function: {{ tool_name }}."
suite:
  name: "suite_taxonkit"
  description: "A suite of tools that brings the TaxonKit project into Galaxy."
  long_description: |
    TaxonKit is a set of tools for analyzing and manipulating taxonomic data, including converting metagenomic profile tables to CAMI format.
22 changes: 22 additions & 0 deletions tools/taxonkit/macros.xml
@@ -0,0 +1,22 @@
<macros>
<xml name="requirements">
<requirements>
<requirement type="package" version="@TOOL_VERSION@">taxonkit</requirement>
<yield/>
</requirements>
</xml>
<token name="@TOOL_VERSION@">0.17.0</token>
<token name="@VERSION_SUFFIX@">0</token>
<token name="@PROFILE@">21.05</token>
<xml name="biotools">
<xrefs>
<xref type="bio.tools">taxonkit</xref>
</xrefs>
</xml>
<xml name="citations">
<citations>
<citation type="doi">10.1016/j.jgg.2021.03.006</citation>
<yield/>
</citations>
</xml>
</macros>
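
For readers less familiar with Galaxy macro tokens, the following sketch illustrates how the tokens defined in macros.xml combine into the tool's `version` attribute. It uses plain string substitution as a simplified stand-in, not Galaxy's actual macro engine, and the helper name `expand_tokens` is hypothetical.

```python
# Token values copied from macros.xml above.
TOKENS = {
    "@TOOL_VERSION@": "0.17.0",
    "@VERSION_SUFFIX@": "0",
    "@PROFILE@": "21.05",
}


def expand_tokens(template: str, tokens: dict[str, str]) -> str:
    """Replace every token name in the template with its value (hypothetical helper)."""
    for name, value in tokens.items():
        template = template.replace(name, value)
    return template


# The version attribute as written in taxonkit_name2taxid.xml.
version = expand_tokens("@TOOL_VERSION@+galaxy@VERSION_SUFFIX@", TOKENS)
print(version)  # 0.17.0+galaxy0
```

Bumping `@VERSION_SUFFIX@` alone changes only the part after `+galaxy`, which is why the wrapper revision and the upstream TaxonKit version are kept in separate tokens.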
127 changes: 127 additions & 0 deletions tools/taxonkit/taxonkit_name2taxid.xml
@@ -0,0 +1,127 @@
<tool id="name2taxid" name="Name2taxid" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
<description>Convert taxon names to TaxIds</description>
<macros>
<import>macros.xml</import>
</macros>
<expand macro="biotools"/>
<expand macro="requirements" />
<command detect_errors="exit_code">
<![CDATA[

mkdir -p ../home/.taxonkit &&

#if $data.is_select == 'his':
#for $f in $data.files:
ln -s '$f' '../home/.taxonkit/$f.element_identifier' &&
#end for
#else:
ln -s '$ncbi.fields.path/names.dmp' '../home/.taxonkit/names.dmp' &&
ln -s '$ncbi.fields.path/merged.dmp' '../home/.taxonkit/merged.dmp' &&
ln -s '$ncbi.fields.path/nodes.dmp' '../home/.taxonkit/nodes.dmp' &&
ln -s '$ncbi.fields.path/delnodes.dmp' '../home/.taxonkit/delnodes.dmp' &&
#end if

taxonkit name2taxid
--name-field $name_field
$sci_name
$show_rank
'$input'
> '$output'
]]>
</command>
<inputs>
<param name="input" type="data" format="tabular" label="Input file"
help="Any TSV file that contains NCBI taxon names. A plain .txt file also works, but it must contain only one name per row." />
<param argument="--name-field" type="data_column" data_ref="input" label="Select column with the names"
help="Select the column that contains the names" />
<param argument="--sci-name" type="boolean" falsevalue="" truevalue="--sci-name" checked="false" label="Only search scientific names"/>
<param argument="--show-rank" type="boolean" falsevalue="" truevalue="--show-rank" checked="false" label="Show rank" />
Contributor:
Can you specify what this shows? Does it include the rank like this: _g? Maybe show a small example in the help.

<conditional name="data">
Member:
The indentation is really off here ... can you please use https://github.com/galaxyproject/galaxy-language-server and its autoformat option?

Contributor Author:
I think it should be fixed now, but I'm not sure. It looks better now, at least at the position you tagged.

Member:
It's broken again. Please use the https://github.com/galaxyproject/galaxy-language-server plugin; it will help you a lot and also fixes all the formatting for you.

Contributor Author:
I used this add-on, but it does nothing on my side... I saved the file, at which point it should have auto-formatted it, but it didn't. Also, it seems the file I uploaded was never the file shown here with the broken format. The format was always correct on my side but always shown as broken here on GitHub, which is strange. I have now edited it manually on GitHub in the hope that the format is correct!

Member:
Ctrl+Shift+P and then format the document and "Galaxy tool: sort the attributes of all...".

You are using TABS and should use 4-spaces. Uploading the document does not change the content.

Contributor Author:
Okay, it should be done now with 5e4af94.

Sorry for this stupid thing!
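
Editorial note on the thread above: the reviewer's point is that the file content itself contained tab characters, so re-uploading could not change how it rendered. As a rough illustration only (the galaxy-language-server formatter recommended in the thread does considerably more), a minimal sketch that replaces leading tabs with 4 spaces might look like this. The function name `tabs_to_spaces` is hypothetical.

```python
def tabs_to_spaces(text: str, width: int = 4) -> str:
    """Replace each leading tab with `width` spaces, leaving the rest of the line untouched."""
    fixed_lines = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip("\t")
        leading_tabs = len(line) - len(stripped)
        fixed_lines.append(" " * (width * leading_tabs) + stripped)
    return "".join(fixed_lines)


# A tab-indented fragment resembling the wrapper's <inputs> section.
sample = "<inputs>\n\t<conditional name=\"data\">\n\t\t<param name=\"is_select\"/>\n"
print(tabs_to_spaces(sample))
```

Only leading tabs are converted, so a tab inside an attribute value or a CDATA block would be left as it is.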

<param name="is_select" type="select" label="Use either a data manager (DM) or history files for the NCBI database">
<option value="dm">Data manager</option>
<option value="his">History</option>
</param>
<when value="dm">
<param name="ncbi" type="select" label="NCBI database"
help="Choose NCBI database version" >
<options from_data_table="ncbi_taxonomy">
<validator message="No NCBI database is available" type="no_options"/>
</options>
</param>
</when>
<when value="his">
<param name="files" format="tabular" type="data" multiple="true"
label="Input .dmp files"
help="Provide the following NCBI taxonomy .dmp files: nodes.dmp, names.dmp, delnodes.dmp and merged.dmp. You can get them by downloading **ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz** and unpacking the archive." />
</when>
</conditional>
</inputs>
<outputs>
<data name="output" format="tabular" label="Names2taxID" />
</outputs>
<tests>
<test>
Contributor:
Can you add a test for the rank option?

Contributor Author:
There is no way I can add this, since I would need specific lines from the original database and I don't know how they work together to produce the rank output. I did add an example in the help section to show how the output should look if you use a complete database.

<param name="input" value="name2taxid_test1.tsv" ftype="tabular"/>
<param name="name_field" value="1" />
<conditional name="data">
<param name="is_select" value="dm"/>
<param name="ncbi" value="test-db-tox" />
</conditional>
<output name="output">
<assert_contents>
<has_text text="9606" n="1"/>
<has_text text="349741" n="1"/>
</assert_contents>
</output>
</test>
<test>
<param name="input" value="name2taxid_test2.tsv" ftype="tabular" />
<conditional name="data">
<param name="is_select" value="dm"/>
<param name="ncbi" value="test-db-tox" />
</conditional>
<param name="name_field" value="2" />
<output name="output">
<assert_contents>
<has_text text="test" n="4"/>
<has_text text="Akkermansia muciniphila ATCC BAA-835" n="1"/>
<has_text text="Akkermansia muciniphila" n="2"/>
<has_text text="349741" n="1"/>
</assert_contents>
</output>
</test>
<test>
<param name="input" value="name2taxid_test3.txt" ftype="tabular"/>
<param name="name_field" value="1" />
<conditional name="data">
<param name="is_select" value="his"/>
<param name="files" value="test-db/nodes.dmp,test-db/merged.dmp,test-db/names.dmp,test-db/delnodes.dmp" ftype="tabular"/>
</conditional>
<param name="sci_name" value="true"/>
<param name="show_rank" value="true"/>
<output name="output">
Contributor:
Can you please compare against the output generated by the tool?

Contributor Author:
Done with 855fae6

<assert_contents>
<has_text text="Drosophila" n="3"/>
<has_text text="32281" n="1"/>
</assert_contents>
</output>
</test>
</tests>
<help>
<![CDATA[

This tool converts NCBI taxon names to their TaxIds. Provide a TSV or TXT file and select the column that contains the names.

.. class:: infomark

Example

::

    Homo sapiens
    Akkermansia muciniphila ATCC BAA-835
    Akkermansia muciniphila
    Mouse Intracisternal A-particle
]]>
</help>
<expand macro="citations" />
</tool>
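
The command block in the wrapper above symlinks the four taxonomy dump files into a `.taxonkit` directory before calling `taxonkit name2taxid`. The sketch below mirrors that staging step in Python so the logic can be tried outside Galaxy. It is an illustration under assumptions, not part of the wrapper: the function name `stage_taxdump` is hypothetical, and the check for missing files is an addition that the `ln -s` calls themselves do not perform.

```python
import os
import tempfile

# The four files the wrapper links from the data table path or from the history.
REQUIRED = ("names.dmp", "nodes.dmp", "merged.dmp", "delnodes.dmp")


def stage_taxdump(source_dir: str, taxonkit_dir: str) -> list[str]:
    """Symlink the taxonomy dump files into taxonkit_dir, mirroring the ln -s calls."""
    os.makedirs(taxonkit_dir, exist_ok=True)  # same role as `mkdir -p`
    linked = []
    for filename in REQUIRED:
        source = os.path.join(source_dir, filename)
        if not os.path.isfile(source):
            # Extra safety check, not present in the wrapper.
            raise FileNotFoundError(f"missing taxonomy file: {source}")
        target = os.path.join(taxonkit_dir, filename)
        os.symlink(source, target)
        linked.append(target)
    return linked


# Demo with empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    db_dir = os.path.join(workdir, "db")
    os.makedirs(db_dir)
    for filename in REQUIRED:
        open(os.path.join(db_dir, filename), "w").close()
    links = stage_taxdump(db_dir, os.path.join(workdir, "home", ".taxonkit"))
    staged_ok = all(os.path.islink(link) for link in links)
    print(sorted(os.path.basename(link) for link in links))
```

Failing early on a missing file gives a clearer error than a dangling symlink, which would only surface once TaxonKit tries to read the database.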
4 changes: 4 additions & 0 deletions tools/taxonkit/test-data/name2taxid_test1.tsv
@@ -0,0 +1,4 @@
Homo sapiens
Akkermansia muciniphila ATCC BAA-835
Akkermansia muciniphila
Mouse Intracisternal A-particle
4 changes: 4 additions & 0 deletions tools/taxonkit/test-data/name2taxid_test2.tsv
@@ -0,0 +1,4 @@
test Homo sapiens
test Akkermansia muciniphila ATCC BAA-835
test Akkermansia muciniphila
test Mouse Intracisternal A-particle
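
The second test sets `name_field` to 2 because the names in this file sit in the second column. As a hedged illustration of what a 1-based column selection means, the sketch below extracts that column in Python. It assumes the two columns are tab-separated (the tabs are not visible in the diff view), and `extract_names` is a hypothetical helper, not TaxonKit code.

```python
import csv
import io


def extract_names(tsv_text: str, name_field: int) -> list[str]:
    """Return the values in the 1-based column `name_field` of tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [row[name_field - 1] for row in reader if row]


# Assumed layout of name2taxid_test2.tsv: a tab between "test" and the name.
test2 = (
    "test\tHomo sapiens\n"
    "test\tAkkermansia muciniphila ATCC BAA-835\n"
    "test\tAkkermansia muciniphila\n"
    "test\tMouse Intracisternal A-particle\n"
)
print(extract_names(test2, name_field=2))
```

With `name_field=1` the same call would return the filler value `test` four times, which is what the test's `has_text text="test" n="4"` assertion counts in the output.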
1 change: 1 addition & 0 deletions tools/taxonkit/test-data/name2taxid_test3.txt
@@ -0,0 +1 @@
Drosophila
1 change: 1 addition & 0 deletions tools/taxonkit/test-data/ncbi_taxonomy.loc.test
@@ -0,0 +1 @@
test-db-tox Test Database ${__HERE__}/test-db
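
The tests select the database with `value="test-db-tox"`, and the command block then reads `$ncbi.fields.path`. The sketch below shows how such a row could map to those fields. It rests on several assumptions: the columns are tab-separated and ordered value, name, path; `${__HERE__}` resolves to the directory containing the .loc file; and `parse_loc_line` is a hypothetical helper rather than Galaxy's own loader.

```python
import os


def parse_loc_line(line: str, here: str) -> dict[str, str]:
    """Split one tab-separated .loc row into value, name and path (assumed column order)."""
    value, name, path = line.rstrip("\n").split("\t")
    return {
        "value": value,
        "name": name,
        "path": path.replace("${__HERE__}", here),
    }


# Assumes tab-separated columns; "/tmp/test-data" is a placeholder directory.
row = parse_loc_line("test-db-tox\tTest Database\t${__HERE__}/test-db\n", here="/tmp/test-data")
print(row["path"])
# The wrapper appends the file name to the path field, e.g. for names.dmp:
print(os.path.join(row["path"], "names.dmp"))
```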