Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the profile2cami tool, a component of the TaxonKit suite. #6085

Merged
merged 11 commits into from
Jul 26, 2024

Conversation

Alby-Git
Copy link
Contributor

Tool details for profile2cami

  • Name: profile2cami
  • Description: A utility from the taxonkit suite for converting taxonomic profiles into the CAMI format.
  • Homepage: profile2cami
  • License: License

Checklist for submission

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

Thank you for your consideration.

<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." />
</inputs>
<outputs>
<data name="cami_output" format="tsv" label="Converted CAMI Output" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a proper CAMI format to Galaxy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taxonkit serves as an intermediary step in my project to prepare data for the upcoming CAMI Opal Tool, which requires the CAMI Profiling Bioboxes format as input. Within the scope of my project, implementing the CAMI format isn't necessary, but it could be beneficial for broader applications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is well-defined and could become a standard (https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd) probably a good start for me to learn how to add a new format. But as table it works just fine ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this file .loc.test?

Copy link
Contributor Author

@Alby-Git Alby-Git Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested renaming the file; unfortunately, I think Galaxy needs a .loc file to find the content.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to shrink this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally! Reduced the size a bit; I hope that is sufficient.

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome quality! Just a few tiny questions. Thanks

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. In generally, I am just a bit confused about the recomputation of the abundance. Basic: 99.999999999999986; with percentage: 1.000000000000000; that could just be NumPy rounding issues, but why does recompute not consider the deleted taxon ?

<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." />
</inputs>
<outputs>
<data name="cami_output" format="tsv" label="Converted CAMI Output" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is well-defined and could become a standard (https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd) probably a good start for me to learn how to add a new format. But as table it works just fine ...

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with these small changes it is ready. But when I tested it, I realised, that the results seem not to comply with the CAMI satetment: The percentages given for all taxa from the same rank should sum up to <= 100%
https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd
However, the tools seems to sum up percentages when lower ranks are reported, this is not correct. I need to inverstigate, but if the tools is indeed wrong, I think we should not merge it ...

@@ -0,0 +1,18 @@
name: taxonkit
owner: iuc
description: Tools for metagenomic profiling and analysis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not used for profiling and specific to NCBI TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit something like that I guess

@@ -0,0 +1,17 @@
<macros>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add bio.tool xml: https://bio.tools/taxonkit

--data-dir '${taxonomy.fields.path}'
--abundance-field '${abundance_field}'
--taxid-field '${taxid_field}'
#if $percentage:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

]]>
</command>
<inputs>
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." />
<param name="input_file" type="data" format="txt" label="Input Profile File" help="A tab-delimited profile file with TaxId and abundance columns." />

</command>
<inputs>
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." />
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="To have actual human-readable taxon names in the standardised output">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="To have actual human-readable taxon names in the standardised output">
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="This NCBI database is used to map human-readable taxon names to TaxId's.">


Profile2CAMI is a tool for converting metagenomic profile tables to CAMI format.

**Inputs**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All info that is given in the input parameters can be removed from the help

<param name="keep_zero" type="boolean" value="false" label="Keep Zero Abundances" help="Check to keep taxons with abundance of zero." />
<param name="sample_id" type="text" value="" label="Sample ID" help="Optional sample ID to include in the result file." />
<param name="taxonomy_id" type="text" value="" label="Taxonomy ID" help="Optional taxonomy ID to include in the result file." />
<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use a type="select" and multiple="true" param for this ´, like this one:

         <param name="show_rank" type="select" multiple="true" label="Show Ranks">
            <option value="superkingdom">Superkingdom</option>
            <option value="phylum">Phylum</option>
            <option value="class">Class</option>
            <option value="order">Order</option>
            <option value="family">Family</option>
            <option value="genus">Genus</option>
            <option value="species">Species</option>
            <option value="strain">Strain</option>
        </param>

Make all options

@paulzierep
Copy link
Contributor

When investigating further, I think this tool does currently not work as expected: shenwei356/taxonkit#99
Before they look into the issue, I would wait with merging it ...

@shenwei356
Copy link

Hi guys, a new TaxonKit version is ready for this: bioconda/bioconda-recipes#48917

<!-- Locations of taxonomy data downloaded from NCBI -->
<table name="ncbi_taxonomy" comment_char="#">
<columns>value, name, path</columns>
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc" />
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.test" />

@@ -0,0 +1,2 @@
#value name path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this file should be renamed to .loc.test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening thank you for your review. I hope the changes now align with your suggestions.

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM!

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this changes (only cosmetics) its good for merge from my side

<!-- Locations of taxonomy data downloaded from NCBI -->
<table name="ncbi_taxonomy" comment_char="#">
<columns>value, name, path</columns>
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.sample" />
Copy link
Contributor

@paulzierep paulzierep Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.sample" />
<file path="tool-data/ncbi_taxonomy.loc" />

afaik, the ${__HERE__} variable is only used in tests and the real data will be stored in the tool-data folder, @bgruening can you confirm that ?
Also the path should point to the .loc file not the sample.

@@ -0,0 +1,2 @@
#value name path
test-db-tox "Test Database" ${__HERE__}/test-db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test-db-tox "Test Database" ${__HERE__}/test-db
test-db-tox "Test Database" tool-deta/test-db

Same as the other comment, although the real path does not really matter, since the DM sets it

@@ -0,0 +1,2 @@
#value name path
test-db-tox "Test Database" ${__HERE__}/test-db
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test-db-tox "Test Database" ${__HERE__}/test-db

Line should be removed. The .sample files and their content would be used.

Copy link
Contributor

@paulzierep paulzierep Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed or commented ?, in most other tools there is a comment showing how it looks like

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented is also fine .. an example is a good thing to have

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small change required

#if $taxonomy_id:
-t '${taxonomy_id}'
#end if
--show-rank '${ranks}'
Copy link
Contributor

@paulzierep paulzierep Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put this in an if condition, so that if nothing is selected, all ranks are used as is the default by the tool ? also, please adjust the test for it, that's more convenient for the user

@paulzierep
Copy link
Contributor

Good to merge from my side @bgruening @bernt-matthias ?

@bgruening bgruening merged commit 695ea58 into galaxyproject:main Jul 26, 2024
11 checks passed
nilchia pushed a commit to pavanvidem/tools-iuc that referenced this pull request Aug 24, 2024
…xyproject#6085)

* Added the profile2cami tool, a component of the TaxonKit suite.

* Renamed shed.yml -> .shed.yml

* Shrinked delnodes.dmp 684kb -> 150kb

* Renamed test.loc -> ncbi_taxonomy.loc

* Adjusted taxonkit, removed folder profile2cami, added suite in .shed.yml

* Updated Taxonkit to version 0.17.0 - worked on issues.

* Renamed taxonomy.loc -> taxonomy.loc.test

* Adjusted test, adjusted loc.sample

* Try to fix linting error

* Resolving linting error

* Adjusted the output label.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants