Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update of nf-core/modules file structure #1159

Closed
KevinMenden opened this issue Jul 5, 2021 · 11 comments · Fixed by #1171
Closed

Update of nf-core/modules file structure #1159

KevinMenden opened this issue Jul 5, 2021 · 11 comments · Fixed by #1171

Comments

@KevinMenden
Copy link
Contributor

KevinMenden commented Jul 5, 2021

As discussed on Slack downstream of here: https://nfcore.slack.com/archives/CE5LG7WMB/p1625484224149900

We want to update the file structure of nf-core/modules and modules in pipelines to better work with external module repositories and to allow for multiple versions of the same tool being installed in a nf-core pipeline.

nf-core/modules

For nf-core/modules, the structure should change to

modules/tool/subtool

So essentially replacing software with modules.
Custom tools, that are not bioinformatic software, will be blaced in the folder modules/custom/mycustomtool

nf-core pipelines

For pipelines, the structure will change into:

nf-core/rnaseq/modules/nf-core/modules/bedtools/intersect/short_sha/

So that every module is stored with the short_sha, which allows users to have the same modules twice, with different sha sums (i.e. versions).

Modules from external repos can then be installed into:

nf-core/rnaseq/modules/username/reponame/external_tool/external_subtool/short_sha
@ewels
Copy link
Member

ewels commented Jul 5, 2021

Yup, exactly! 👍🏻 And that we want to add to modules.json the repo alongside the git sha.. (even though I guess this is already clear from the folder structure, but still).

@grst
Copy link
Member

grst commented Jul 6, 2021

For pipelines, the structure will change into:

nf-core/rnaseq/modules/nf-core/modules/bedtools/intersect/short_sha/

I'm all for adding that information to modules.json, but is it really necessary to have that clunky path?

  • Is there any legitimate reason to have two versions of the same module in a single pipeline? (I don't think any other package manager would allow that anyway. If one really, really needs that I don't think it would be too much to ask to manually copy one module folder to custom)
  • The import paths in the .nf files will get clunky, too. Also, they need to be updated every time the modules are updated (Maybe tools could do that, but it feels like an over-complication)
  • The readability of pipeline suffers. I already find nf-core DSL2 pipelines already quite hard to read (information is scattered among many places) and it will make matters worse if modules are in some cryptic sha directory (consider an outstander trying to read the code). I find it somewhat paradox, that we on the one hand strive for best-practice pipelines, on the other hand, I can't reasonably tell my master student to learn nextflow by reading the rnaseq pipeline. Over-generalization negatively impacts readability.

@KevinMenden
Copy link
Contributor Author

I agree (I think we all do) that two versions in the same pipeline is quite an edge case. And personally I'm also not sure whether we need to support that at all costs as one could also just make a local module or copy it manually, as Gregor said.
So personally I agree that this is not really needed and we have it in the modules.json.

For the rest of the file path though, this is to make the modules tool suite more compatible with external package repositories (I'm thinking a bit like conda channels), and this could be quite a nice think to have down the line.

The import paths get more clunky, yep. The SHA sum is actually the most annoying part of this in my opinion, as you will definitely have to copy that one. So I think if we wouldn't have that, it would be fine/okay.

Absolutely agree on the readability. When I started with DSL2, I was completely lost, and I liked DSL1 pipelines much more in terms of readability. The main.nf file was large, but at least you could find everything in there. This is definitely a big downside of DSL2. (That's also why I wanted to write a bit of documentation about the pipeline structure)
But I guess that's just what you get with a more modular architecture, and there are tons of positive aspects of course. But yeah I fully agree with you that we should always try to think about what the benefits are of adding some feature, and whether they outweigh the added complexity.

@grst
Copy link
Member

grst commented Jul 6, 2021

For the rest of the file path though, this is to make the modules tool suite more compatible with external package repositories (I'm thinking a bit like conda channels), and this could be quite a nice think to have down the line.

I like that idea, and you can still have that, the module just needs to have a unique name across repos/"channels". Actually, this is also how it works with conda.

But I guess that's just what you get with a more modular architecture, and there are tons of positive aspects of course.

I absolutely agree that overall, maintainability strongly benefits from DSL2. We should strive for some balance between readability <-> generalization, though. Maybe it's worth discussing in slack or in a separate issue if and how we can improve readability in general.

@ewels
Copy link
Member

ewels commented Jul 6, 2021

The "multiple versions of the same module" thing got in because a few people fought for it fairly hard in Slack a few times. But I suspect mostly due to thinking about edge cases rather than a lot of direct need. I hadn't really thought about the fact that older versions could be managed by copying in as a local module if needed. I'm happy with that solution.

So yeah, I'm fine with dropping the git sha from the file path 👍🏻

@grst to clarify - the first part of the path is actually the repo name. So without the git sha it becomes:

modules/nf-core/modules/bedtools/intersect/

in other words (subtool not always needed):

modules/git_user/repo_name/tool/subtool/

I don't think that I want to shorten it any more than this really. I suspect that modules with duplicate names will be pretty common, as people develop in their own modules repo before shipping to nf-core for example. I also think that having username/repo makes the source of modules easier to find.

@drpatelh
Copy link
Member

drpatelh commented Jul 6, 2021

Agree with keeping the path as intuitive as possible when installing and importing in workflows. I somehow overlooked the latter during the frantic discussions yesterday but thank you for being the voice of reason here @grst !

In terms of simplicity, I am not sure we can do much more to make DSL2 pipelines as readable as possible. I think we have now found an obvious and standardised home for all of these different components i.e. modules, subworkflows, workflows both local and from nf-core. Findability was also one of my sticking points with DSL2 compared to DSL1 but we would always have to compromise on this for the flexibility we gain.

@antunderwood
Copy link
Contributor

I agree with @grst that having the benefit of the modularity of DSL2 has a slightly negative impact on readability. Would it be helpful to have a figure showing the tree structure diagrammatically and annotating it with the location of key DSL2 features - modules, workflows etc?

@antunderwood
Copy link
Contributor

I agree with @grst that having the benefit of the modularity of DSL2 has a slightly negative impact on readability. Would it be helpful to have a figure showing the tree structure diagrammatically and annotating it with the location of key DSL2 features - modules, workflows etc?

I wouldn't mind having a go working with @drpatelh to make sure the info is accurate

@KevinMenden
Copy link
Contributor Author

KevinMenden commented Jul 6, 2021

I think a figure would be great!
We have been adding documentation about the pipeline structure here (now outdated of course): https://nf-co.re/developers/adding_pipelines#nf-core-pipeline-structure

But that's actually a bit hard to find and I don't think many people read this. With nf-core pipelines getting gradually more complex I really think better documentation about all the bits and pieces that make up these pipelines now would be very valuable to newcomers.

@KevinMenden
Copy link
Contributor Author

KevinMenden commented Jul 7, 2021

Okay the changes have been made to nf-core/modules, now we need to adapt nf-core/tools. I will make a check list here for the different commands that have to be adapted. @ErikDanielsson it's probably best if we edit this comment here and mention what we are working on so we don't do the same. I'll start with the lint command.

  • nf-core modules install (assigned to @ErikDanielsson)
  • nf-core modules remove(assigned to @ErikDanielsson)
  • nf-core modules lint (assigned to @KevinMenden )
  • nf-core modules list (assigned to @ErikDanielsson)
  • nf-core modules bump-versions
  • nf-core modules create-test-yml
  • nf-core modules create

Other things

  • Update pipeline template in `nf-core/tools``

@KevinMenden
Copy link
Contributor Author

Okay closing this issue as the refactoring has been done both in nf-core/modules and nf-core/tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants