Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool wrapper <options from_dataset> with undocumented behavior #14324

Closed
wm75 opened this issue Jul 13, 2022 · 8 comments
Closed

tool wrapper <options from_dataset> with undocumented behavior #14324

wm75 opened this issue Jul 13, 2022 · 8 comments

Comments

@wm75
Copy link
Contributor

wm75 commented Jul 13, 2022

https://docs.galaxyproject.org/en/latest/dev/schema.html#from-dataset doesn not really talk about which kind of format the dataset from which to populate the dropdown should have. Since you populate the dropdown from columns of the dataset, at a minimum those must be defined somehow.
I tested this today with input of format="csv" and it turns out that you never get any options in that case.
Looking into the source code I found that

self.separator = elem.get('separator', '\t')

describes an undocumented attribute "separator", and when I include this in the options like so:
<options from_dataset="my_input_dataset" separator=","> then Galaxy will happily populate my dropdown with the selected columns of a csv.

For my specific usecase this is good enough, but:

  1. I'm feeling bad using an undocumented feature (or at least I think it's undocumented)
  2. hard-coding the separator like this does seem strange when Galaxy, for columnar datatypes, should already know about the separator

So is it just oversight that the separator attribute isn't documented and it can be used safely?
Should Galaxy introspect the separator instead?

@bernt-matthias
Copy link
Contributor

Docs seem missing. Alternatively it should be possible to use filters, which are better documented.

@bernt-matthias
Copy link
Contributor

Can you check if we have a tool test for this?

@wm75
Copy link
Contributor Author

wm75 commented Jul 13, 2022

There don't seem to be any tests here and I don't find an example in tools-iuc either. Everything seems to be using tabular input.

Filters? How? With something like <filter type="multiple_splitter" column="4" separator=","/>?
That might work (but haven't tested). Wouldn't call it well-defined though because what's column number whatever in a csv file then (it doesn't have just one column does it?)

@bernt-matthias
Copy link
Contributor

There are some tests here test/functional/tools/filter_multiple_splitter.xml

Wouldn't call it well-defined though because what's column number whatever in a csv file then (it doesn't have just one column does it?)

My expectation would be:

  • The <options from_dataset> will try to split on tab (which will produce a single column) ..
  • The filter when applied to column 1 (or 0?) can then be used to split at ,.

Sill, you are absolutely right, this needs documentation.

Btw. does the xml linter complain about the separator attribute. It seems to be missing in the xsd file?

@wm75
Copy link
Contributor Author

wm75 commented Jul 13, 2022

Btw. does the xml linter complain about the separator attribute. It seems to be missing in the xsd file?

Ah, that's a good point. Had forgotten linting before and, yes, you're right it's failing as you expected.

@wm75
Copy link
Contributor Author

wm75 commented Jul 13, 2022

My expectation would be:

The will try to split on tab (which will produce a single column) ..
The filter when applied to column 1 (or 0?) can then be used to split at ,.

Alright, that's exactly what it does. Unfortunately the result is not very helpful for my usecase since (again, exactly as explained in the tests) it splits every line into mutliple (1-column) options. So the resulting options list consists of all elements of the original table, but I really only want those from one particular column.

@bernt-matthias
Copy link
Contributor

@wm75 .. have a look here #14332 .. I targeted 22.01

@bernt-matthias
Copy link
Contributor

We can still think of getting the separator from the datatype.. But I would still keep the separator attribute, since I could also imagine cases where txt files are used.

Could you open a new issue if you agree.

@mvdbeek mvdbeek closed this as completed in d642ce3 Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants