Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parameters format #39

Closed
nservant opened this issue Oct 14, 2019 · 16 comments
Closed

parameters format #39

nservant opened this issue Oct 14, 2019 · 16 comments

Comments

@nservant
Copy link
Collaborator

Thanks! It's one of those unwritten rules that we've discussed and implemented to various degrees in some of the main pipelines but haven't had the time to officially document 😅 You could change the other parameters too? If it helps, if things do change then I'll have to update numerous pipelines! I've also updated the template today to change the last remaining parameter to this format...
https://github.com/nf-core/tools/pull/425/files#diff-4ac32a78649ca5bdd8e0ba38b7006a1eR13

Originally posted by @drpatelh in #37

@nservant
Copy link
Collaborator Author

@drpatelh , even in the ChIP-seq or ATAC-seq projects, all parameters do not have the same format.
Should I wait a bit before changing all my parameters :) ?

@drpatelh
Copy link
Member

They should be! The only parameter is maxMultiqcEmailFileSize which I have now updated in the template (see nf-core/tools#425). Which ones have you spotted?

@nservant
Copy link
Collaborator Author

What about --bwa_index, --gene_bed, --tss_bed, --macs_gsize, --mito_name , etc. ?

@drpatelh
Copy link
Member

Ah, sorry. Maybe I didn't explain properly. Any parameters that require a user input e.g. (file path, integer etc) are in snake_case format and any that are boolean switches are in camelCase format. Maybe we can use this thread for discussion.

@ewels @apeltzer What do you think about this? This is something we initially discussed way back in 2 years ago Barcelona at the NF conference. Quite a few pipelines are now adopting this approach e.g. rnaseq (mostly I think), chipseq, smrnaseq and a few others. I have also change the template yesterday for consistency. If you both agree we can create an issue on the website to add some docs. Mainly so it's not lost in the ether again 😅

@nservant
Copy link
Collaborator Author

Ah !! ok, I didn't get it. This is up to you. I will update according to your best practice.
If I can just make a comment, my feeling is that choosing a single format for all variables is much more convenient (and easy to follow for new developers).

@ewels
Copy link
Member

ewels commented Oct 15, 2019

My feeling is also that it would be simpler to just have one and stick to it (my preference would be snake_case). I know we discussed this before, I don't remember why we wanted both..

@drpatelh
Copy link
Member

Its easier to distinguish parameter type? Personally, I think its easier to work out what a parameter is doing i.e. does it require some sort of input or is it just a switch. I think you probably done this subconsciously for the NGI pipelines because they had parameters like saveAlignedIntermediates, saveReference, singleEnd that were boolean and then others that werent e.g. bwa_index.

@ewels
Copy link
Member

ewels commented Oct 15, 2019

But surely users should know what parameters do before using them..? 😉 And we should have parameter validation pretty soon. And personally I find it harder to remember which parameters use which style.

I think that the discrepancy in the old pipelines was more down to who wrote the code - me or @Hammarn 😉

@drpatelh
Copy link
Member

So the beauty of having the two styles is that you can pretty much figure out what a parameter is doing before you know what it's doing or even before using it!

Ok. Lets put it to the test! Probably not the best example because you know what most of these parameters are doing already (:wink:). If you had to look through them without knowing anything about what they are doing doesn't it help to figure out which ones require an input and which ones don't? To add to that the case distinction allows the boolean ones to be pretty much self-explanatory. Just gives some visual partitioning to a massive parameter set that would otherwise all look the same 😸

https://github.com/drpatelh/chipseq/blob/b2fdfd82b76262d57478a81913298189b983badf/main.nf#L22

@ewels
Copy link
Member

ewels commented Oct 15, 2019

Why not use the [type] notation in the help docs, as already done for --clip_r1 and others? More explicit, more consistent and easier for beginners 😉

--single_end [bool]            Specifies that the input is single-end reads
--seq_center [str]             Sequencing center information to be added to read group of BAM files

Or maybe [flag] for the bools to make it clearer. Or nothing for those, but [str] for anything with an argument... But you get my point anyway.

@ewels
Copy link
Member

ewels commented Oct 15, 2019

x-ref nf-core/tools#426

@drpatelh
Copy link
Member

Ok. Having thought about it a little, I'm coming round to using snake_case for all parameters. It is probably best for standardisation. As you mentioned, we need to consistently enforce the use of [type] notation in main.nf and in usage.md because most people won't look at the main script.

Let's agree on this so I can update the template to reflect this and we will need to add some docs to the website too. I have a few pipelines that are ripe for release now and so I can make the update sooner rather than later.

@nf-core/core are we all in agreement? Speak now or forever hold your peace 😉

@ewels
Copy link
Member

ewels commented Oct 16, 2019

Nice! This should go into the guidelines that @maxulysse is starting to draft. He maybe has strong views on the issue?

Note that we should be auto-generating the pipeline help text from the parameters settings file soon (nf-core/hlatyping#61), so that will make that part much easier. And we can / should also lint the documentation from that (nf-core/tools#111), so can also enforce standardisation there.

@apeltzer
Copy link
Member

Snake case is fine by me - as long as we all adhere to it and make things as the automated linting by @ewels possible ;-)

@drpatelh
Copy link
Member

Ive made some headway with this for the atacseq and chipseq pipelines:
https://github.com/drpatelh/nf-core-atacseq/blob/7e2c171753bebe093e129d2163041cc54b64435e/main.nf#L21-L87

Any comments? Also changed --design to --input for standardisation.

@ewels
Copy link
Member

ewels commented Oct 18, 2019

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants