Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download: Get DSL2 singularity containers #832

Merged
merged 33 commits into from
Feb 8, 2021

Conversation

ewels
Copy link
Member

@ewels ewels commented Jan 20, 2021

With DSL2, we have multiple containers and they are specified within module files. These are no longer found by nextflow config and therefore nf-core download doesn't see them (#818).

This PR adds several new bits of functionality to nf-core download for singularity containers:

Finding DSL2 containers

The tool now scrapes any .nf files in modules/ recursively, and parses them to look for a lines that look like this:

container = "xxx"

If multiple matches are found in a file, any that start with http are prioritised and the first match after that is used.

NB: Scraping code only handles simple strings and will fail with anything more complicated (e.g. Sarek). This kind of thing needs to be resolved by Nextflow and so requires nextflow config or similar to work. I can't see any way around this in the scope of this PR.

Direct image downloads

The code can now directly download images if we have a URL. If the container name starts with http then we download this directly in Python. If not, then we pass to a singularity pull command as before. Python downloads are made pretty with a nice progress bar etc.

Caching downloads

Finally, the code now checks for the NXF_SINGULARITY_CACHEDIR environment variable and saves singularity images there first if set, before copying to the archive. This speeds things up massively if running a few times and will be increasingly important with shared images across multiple DSL2 pipelines. If the env var is not set then the files are saved directly to the archive, but a log message with a tip about it is shown.

Still to do:

  • Update the downloaded pipeline code so that it knows where to find the bundled singularity images (nf-core download: customise config to include path to singularity image #464)
  • Check that auto-generated image names from https downloads are the same as those made by Nextflow
  • Figure out how to have nested Progress bars with different columns in Rich.
    • Have a global progress bar showing how far through all containers we are, then a transient progress bar for each download.
  • More testing (check DSL1 pipelines still work)

Note 1: This PR does not fix #513, but I suspect that this will be less of a problem as all pipelines transition to DSL2 so I think that we can probably ignore that issue now.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

ewels added 2 commits January 21, 2021 00:01
Also add support for direct downloads of https container URLs
@ewels ewels added WIP Work in progress command line tools Anything to do with the cli interfaces DSL2 labels Jan 20, 2021
@codecov
Copy link

codecov bot commented Jan 20, 2021

Codecov Report

Merging #832 (ded8339) into dev (be0a144) will decrease coverage by 2.84%.
The diff coverage is 43.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #832      +/-   ##
==========================================
- Coverage   77.72%   74.87%   -2.85%     
==========================================
  Files          22       22              
  Lines        2496     2695     +199     
==========================================
+ Hits         1940     2018      +78     
- Misses        556      677     +121     
Impacted Files Coverage Δ
nf_core/utils.py 90.65% <ø> (-0.06%) ⬇️
nf_core/download.py 63.41% <40.09%> (-26.59%) ⬇️
nf_core/schema.py 86.47% <70.58%> (-0.84%) ⬇️
nf_core/__main__.py 65.26% <83.33%> (+0.40%) ⬆️
nf_core/lint/schema_lint.py 81.81% <100.00%> (+0.86%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be0a144...e4a397e. Read the comment docs.

ewels added 3 commits January 21, 2021 22:04
Remove some code duplication, improve logic for cache dirs.
Now have two progress bars - one showing how far through the images we are and one for the specific image download.
Customises the rich progress bar rendering to allow different output fields for the different types of task.
ewels added 8 commits January 21, 2021 23:14
Tells tool to overwrite any existing files it finds.
* Make the different operations go in order (copy cache first, then downloads, then pulls)
* Refactored the confusing function that copied cached files and returned file paths
* Improved progress bars with better, changing, description text
To prevent accidentally using partial broken downloads, use an additional .partial file extension whilst download is running and rename the file to remove this when complete.
@ewels
Copy link
Member Author

ewels commented Jan 25, 2021

Simultaneous downloads are awesome, but now ctrl-c doesn't cleanly exit. Any ideas?

@ewels ewels force-pushed the download-dsl2-containers branch from 31c7fdd to 7dd36f0 Compare January 31, 2021 22:15
@maxulysse
Copy link
Member

Still thinking about complicated cases.
(Sarek obviously).
I can make the list of containers more clear, so that it's easy to read, but we probably don't need to download all containers all the time.
(I'm mainly thinking here for the annotation, I don't need to download the huge human containers if I'm only working with mouse for example)
Any way to control that?

@ewels
Copy link
Member Author

ewels commented Feb 1, 2021

I don't think that this will change anything for Sarek. nf-core download currently ignores those containers, right? I think that will still be the case after this PR unless you move them into DSL2 module files that have exactly the right formatting.

@maxulysse
Copy link
Member

OK, we'll figure it out then.

@ewels ewels linked an issue Feb 1, 2021 that may be closed by this pull request
@ewels ewels marked this pull request as ready for review February 1, 2021 21:40
@ewels ewels removed the WIP Work in progress label Feb 1, 2021
@ewels ewels force-pushed the download-dsl2-containers branch from 90ee064 to 6e943dc Compare February 1, 2021 21:46
@ewels
Copy link
Member Author

ewels commented Feb 1, 2021

Ok, done a load more testing and bugfixing, written some docs, hopefully should be about ready to go now I think.

@drpatelh / @maxulysse would be great if you guys could give it another test! I rewrote how the logging works for the Singularity pull containers so now that's beautiful as well 🤩 (and you know that something is actually happening..)

Phil

@ewels
Copy link
Member Author

ewels commented Feb 1, 2021

Singularity image pull demo:

singularity_pull.mp4

@maxulysse
Copy link
Member

Trying the latest version now.
The Downloading singularity images progress bar is working for me now.

@maxulysse
Copy link
Member

Now, my only issue is to figure out how not to download every modules...

I don't really care if people download both the gatk and the gatk_spark modules, it won't take that much space.
(or even all the modules that people won't use because sarek has a lot of choice...)

But the extra containers for annotation are something else.
So currently my plan is to write the declaration for said containers in a way that is not recognized by tools, and get people to download separately.

Sounds good to you?

@ewels
Copy link
Member Author

ewels commented Feb 2, 2021

Sounds great!

Copy link
Member

@maxulysse maxulysse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it and works well

@ewels
Copy link
Member Author

ewels commented Feb 3, 2021

Just waiting on @KevinMenden to be happy now 👍🏻

@KevinMenden
Copy link
Contributor

I'm happy, just had this weird error yesterday 🤔 but that wasn't breaking anything. Just didn't show the log for some reason.
Otherwise it worked like a charm 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
command line tools Anything to do with the cli interfaces high-priority
Projects
None yet
4 participants