Offline support for Wave in Nextflow #323

Open
ewels opened this issue Nov 7, 2023 · 6 comments

@ewels

ewels commented Nov 7, 2023

Wave in Nextflow is beautifully simple - no need to define container URIs, just the conda package names and we get everything for free. However, for wide adoption (or at least, adoption in @nf-core), we need to support offline usage of pipelines.

For offline work, the process is typically as follows:

  • On an online system:
    • Download Nextflow + required plugins
    • Download pipeline + configs etc
    • Download container images
  • Transfer to an offline system
  • Run

This hinges on Nextflow checking the local container cache (e.g. NXF_SINGULARITY_CACHE) for images before attempting to download them. Singularity image filenames are predictable, so it's easy for us to wrap download functionality into tooling like nf-core download and make sure the images are available.
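
To illustrate why predictable filenames matter, here is a minimal Python sketch of the kind of cache check a download tool could do before going offline. The naming rule (drop the URI scheme, replace `/` and `:` with `-`, append `.img`) is an assumption that only approximates what Nextflow does for Singularity, and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def cache_filename(image_uri: str) -> str:
    """Map a container URI to a flat cache filename (assumed naming rule)."""
    # Drop a scheme such as docker:// if one is present.
    name = image_uri.split("://", 1)[-1]
    # Flatten path and tag separators so the result is a single filename.
    return name.replace("/", "-").replace(":", "-") + ".img"


def is_cached(image_uri: str, cache_dir: Optional[str] = None) -> bool:
    """Return True if the image file already exists in the local cache."""
    # Fall back to NXF_SINGULARITY_CACHE, then the current directory.
    cache_dir = cache_dir or os.environ.get("NXF_SINGULARITY_CACHE", ".")
    return (Path(cache_dir) / cache_filename(image_uri)).is_file()
```

The point is that the filename depends only on the URI, so the check needs no network access as long as the URI itself is known.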

However, this assumption breaks with Wave. Currently, Nextflow needs to reach out to the Wave service (online) to find out the container URI and resulting local cache filename. So without an internet connection, it doesn't know where to check locally.

As I see it, we have two options:

  • We give functionality to the Nextflow Wave plugin to be able to figure out container URIs within plugin logic, therefore working offline. This would mean that the container URIs could be built offline and everything would work.
    • Pros: Avoids pinging the Wave service by default when local caches are available. Less stress on the Wave service and more robust in case of downtime.
    • Cons: Potentially lots of work, some features such as auth strings will not work offline.
  • We put the onus on @nf-core instead, building functionality into nf-core download to write container URIs to a Nextflow config file, fetch the container images, and bundle this config with the pipeline somehow so that it works without further configuration by the users.
    • Pros: Likely nothing to do on the Wave / Nextflow side 👀
    • Cons: Less flexible and generic, (mostly) specific to nf-core
@edmundmiller

edmundmiller commented Nov 7, 2023

I think nextflow inspect does that:

$ nextflow inspect main.nf -profile local

{
    "processes": [
        {
            "name": "r2_CELL_CYCLE_SCORING_AND_PCA",
            "container": "wave.seqera.io/wt/4fc019059a1f/wave/build:create_objects--c32b27bc3124db00"
        },
        ...

So we just hook nextflow inspect into nf-core download. When they're running nf-core download, they should have an internet connection, right? Worst case, we export the containers on release and commit the JSON updates to the repos!
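
A rough sketch of what that hook could look like, in Python: parse the JSON that nextflow inspect prints and emit a config snippet that pins each process to its resolved container. The JSON shape is taken from the output above; the `withName` layout of the generated config and the function name `inspect_to_config` are assumptions, not something nf-core download does today.

```python
import json


def inspect_to_config(inspect_json: str) -> str:
    """Render `nextflow inspect` JSON as a config that pins containers."""
    processes = json.loads(inspect_json)["processes"]
    lines = ["process {"]
    for proc in processes:
        # One selector per process, so no Wave lookup is needed at run time.
        lines.append(f"    withName: '{proc['name']}' {{")
        lines.append(f"        container = '{proc['container']}'")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


# Example input copied from the inspect output shown above.
example = """
{"processes": [{"name": "r2_CELL_CYCLE_SCORING_AND_PCA",
                "container": "wave.seqera.io/wt/4fc019059a1f/wave/build:create_objects--c32b27bc3124db00"}]}
"""
print(inspect_to_config(example))
```

If I remember correctly, nextflow inspect can also write config-formatted output itself via a format option, in which case this conversion step may not be needed at all.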

@ewels

ewels commented Nov 7, 2023

Yeah exactly, that's essentially my option 2 - fetch the container URIs at the point of download (or release) and have an associated config file that specifies the container URIs.

It basically means that offline users won't be using Wave at all; it's just a regular Nextflow run with containers as usual. Maybe this is the best solution. My main issue with it is that it forces people to use nf-core download.

@pditommaso

I'm inclined toward option 2 too. The nextflow inspect command was designed with this possibility in mind.

@edmundmiller

> It basically means that offline users won't be using Wave at all, it's just a regular Nextflow run with containers as usual, but maybe this is the best solution. My main issue with it is that it forces people to use nf-core download.

Would users need to use Wave at all, besides checking whether an image has been created? I was having an issue where it returned the image name before the image had even been built (i.e. quay.io/nf-core/modules/bowtie:bowtie-1.3.0_samtools-1.16.1--82705d624eee2198). So it should be able to go out and look for that image (I'm guessing right now it's auth-ing with the repo through Tower Platform).

But if we could tweak the behavior slightly (it might already be this):

  1. Check if the image repo is public
  2. If the repo is private, auth through platform, and then try to download.
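
In code, that tweak would look roughly like the Python sketch below. All three callables (`is_public`, `get_token`, `pull`) are hypothetical placeholders for whatever the registry and platform clients actually provide; only the order of the checks is the point.

```python
from typing import Callable, Optional


def fetch_image(
    image: str,
    is_public: Callable[[str], bool],
    get_token: Callable[[str], str],
    pull: Callable[[str, Optional[str]], str],
) -> str:
    """Pull anonymously from public repos; authenticate first for private ones."""
    if is_public(image):
        # Step 1: public repo, so no credentials are required.
        return pull(image, None)
    # Step 2: private repo, so obtain a token (e.g. via the platform) and pull.
    token = get_token(image)
    return pull(image, token)
```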

@edmundmiller

What if we ran nextflow inspect in the pipelines' CI on release and generated a containers.json?

Not every commit would be reproducible, but the releases would work with nf-core download.

I think that's a good compromise. It would vastly simplify the container downloading logic in nf-core download.
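
As a sketch of how much simpler the download side could get, the Python below reads such a containers.json and returns the unique images to fetch. It assumes the file has the same shape as the nextflow inspect output shown earlier in this thread; the function name is hypothetical.

```python
import json
from typing import List


def images_to_download(containers_json: str) -> List[str]:
    """Return the unique container URIs listed in a containers.json document."""
    processes = json.loads(containers_json)["processes"]
    unique: List[str] = []
    for proc in processes:
        uri = proc.get("container")
        # Skip processes without a container and images already listed,
        # since several processes may share one image.
        if uri and uri not in unique:
            unique.append(uri)
    return unique
```

The tool would then only need to loop over this list and pull each image into the cache, with no per-module container logic.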

@edmundmiller

seqeralabs/nf-aggregate#43 Basically this 😆

edmundmiller added a commit to nf-core/modules that referenced this issue Nov 7, 2024
* build: Add wave

* build: Set strategy to dockerfile, conda then container

* refactor: Remove container

* build: Add a repo to push to

* ci(wave): Add wave build

https://github.com/nodejs/docker-node/blob/3c4fa6daf06a4786d202f2f610351837806a0380/.github/workflows/build-test.yml#L29

* ci(wave): Switch to all_changed_files

* ci(wave): Only look for environment.ymls

* dummy: Change env

* ci(wave): Remove raw format

* ci(wave): Try a bunch of different things at once

* ci(wave): Remove redundant fromJson and wrap in an array

* ci(wave): I have no idea what I'm doing

* ci(wave): Wrap it

* ci(wave): Found an example

https://github.com/tj-actions/changed-files/blob/main/.github/workflows/matrix-test.yml

* ci(wave): Maybe quotes?

* ci(wave): That'll do it

* ci(wave): Fix wave install

* ci(wave): Hard code an image

* ci(wave): Add secrets

* feat: Try a different files structure

* ci(wave): First stab at building singularity images

* fixup! feat: Try a different files structure

* ci(wave): Add profile to matrix

* ci(wave): Give up on fancy substitution

* ci(wave): Add await

Co-authored-by: ewels <ewels@users.noreply.github.com>

* ci(wave): Switch to quay

* test(wave): Add freeze and update build repo

* refactor(wave): What happens if I add a container?

* refactor(wave): Have both bowtie modules use the same env

For the sake of demonstration

* test: Cut out using wave on tests

* refactor: What happens if we use the singularity one?

* refactor: Keep container directives for offline download

seqeralabs/wave#323

* feat: Try new singularity OCI setting

nextflow-io/nextflow@f5362a7

* build: Update container name

Guess #4327 broke that

* chore: Bump wave-cli version

* ci: Install runc

* ci: Switch to singularityhub action

nextflow-io/nextflow#4543

* ci: Install new singularity manually

Why that action tries to build from source, idk.

* ci: Install dependencies for singularity

* ci: runc => crun

* ci: Fix cgroup error

https://blog.misharov.pro/2021-05-16/systemd-github-actions

* ci: That'll do it

* ci: Remove Dockerfile

We'll have a separate action for this I think

* ci: Update name

* ci: Push to the correct repos

* ci: Remove OCI stuff

* ci: Need a full URL

* ci: Fix // in container name

* ci: Remove push

Build once, renovate should bump the images automagically

* build: Add containers back

* ci: Add cache repos

Idk what this does exactly

* ci: Change registry name to use _

Because "build" is an API endpoint on quay.io.

So `bowtie/build` doesn't work.

Other plus is this matches the conda env name.

* build: / => _ in container name

* Try ociAutoPull

* chore: Add renovate comments to samtools

Just to trigger wave build

* test: Add ociAutoPull to nf-test

* ci: Bump wave version

* chore: Bump containers with new wave version

Not sure why that's happening...

* build: Update to use commity.wave.seqera.io

* ci: Bump wave-cli to 1.4.1

* ci: Try apptainer

* ci: Remove build-repo to see what happens

* build: Bump Nextflow version requirement

* fix: Get rid of the environment name?

Maybe this will get the auto generated tag?

* ci: Bump action versions

* ci: Try name-strategy tagPrefix

seqeralabs/wave-cli@269df0e

* ci: Remove singularity build for now

* ci: Try imageSuffix

* ci: Try none

* ci: What is the bowtie container name

* ci: Remove --name-strategy

* style: Add back in container elvis operator

* ci: Remove cache repo

* Revert "build: Bump Nextflow version requirement"

This reverts commit 69e1ea5.

* Revert "test: Add ociAutoPull to nf-test"

This reverts commit 5a3d546.

* test(#6505): Snapshot the versions contents, not the hash

* ci(#6505): Update version snapshot after building containers

* test(#6505): Attempt a topic channel with tests

askimed/nf-test#258

* chore: Bump to 1.5.0

* fix: Remove shard and filter on test bumping

* build: Bump images to match environment

* ci: Fix nf-test setup

* ci: Remove snapshot bumping

* build: Fix containers in bowtie

---------

Co-authored-by: ewels <ewels@users.noreply.github.com>