Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download: Seqera containers Patch 2 #3293

Merged
merged 11 commits into from
Dec 2, 2024

Conversation

MatthiasZepper
Copy link
Member

@MatthiasZepper MatthiasZepper commented Nov 22, 2024

The uncoordinated adoption of Seqera Containers in modules before the associated YAML could be devised and supported by tools has broken 'nf-core download' (#3179). #3182 topped up the associated test to make pipeline developers aware, so they could patch the modules with Biocontainer URIs to restore offline support.

With #3244 (Patch 1), I added minimal support for Seqera Containers, but had to include a bypass in the prioritization function to support the Seqera Container's odd http:// paths, that directly deeplink into the last layer of the Docker OCI images.

This change, however, also bypassed the deduplication routine, such that a race condition could arise, if a pipeline used the same container image in multiple modules and the download was performed with multiple processes in parallel. This led to bug #3285 and an issue with the release PR of fastquorum, that is preventable by limiting the number of processes to 1.

This PR (Patch2) now fixes said bug. Additionally, it adds basic support for native Singularity images specified with oras:// paths, such that the Seqera Container http:// paths can be phased out again.

This is desirable, because for every Seqera Container http:// path module, the associated Docker image is downloaded/converted as well, but never used. For a medium sized pipeline, the tool therefore downloads a dozen or more useless Docker container images and since the conversion to Singularity is slow, this significantly bogs down nf-core download.

The new dedicated Seqera Container consolidation functionality is unfortunately flaky and may be too aggressive, in case multiple revisions of a pipeline are downloaded at the same time. While this is far from ideal, I prefer that issue over a pile of unused containers being downloaded each time. Only more verbose container URIs for Seqera Containers could prevent this from happening, but I think the changes to see that implemented are low.

Ultimately, we should either stick with Bioconda or implement the new container YAML format.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!

Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 70.37037% with 8 lines in your changes missing coverage. Please review.

Project coverage is 75.67%. Comparing base (800fd8d) to head (d72667c).
Report is 18 commits behind head on dev.

Files with missing lines Patch % Lines
nf_core/pipelines/download.py 70.37% 8 Missing ⚠️
Additional details and impacted files

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mirpedrol mirpedrol added this to the 3.1 milestone Nov 27, 2024
@MatthiasZepper MatthiasZepper force-pushed the seqera_containers_patch2 branch from 2eca65c to 66ffaf1 Compare November 29, 2024 12:09
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
@MatthiasZepper MatthiasZepper merged commit 3c17e81 into nf-core:dev Dec 2, 2024
88 checks passed
@MatthiasZepper MatthiasZepper deleted the seqera_containers_patch2 branch December 2, 2024 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants