Download: Refactor CLI commands and introduce --download-configs as well as --container-library. #2336

Merged
merged 14 commits into from
Jun 27, 2023
Changes from 13 commits
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -25,6 +25,21 @@

- Introduce a `--tower` flag for `nf-core download` to obtain pipelines in an offline format suited for [seqeralabs® Nextflow Tower](https://cloud.tower.nf/) ([#2247](https://github.com/nf-core/tools/pull/2247)).
- Refactored the CLI for `--singularity-cache` in `nf-core download` from a flag to an argument. The prior options were renamed to `amend` (container images are only saved in the `$NXF_SINGULARITY_CACHEDIR`) and `copy` (a copy of the image is saved with the download). `remote` was newly introduced and allows providing a table of contents of a remote cache via an additional argument `--singularity-cache-index` ([#2247](https://github.com/nf-core/tools/pull/2247)).
- Refactored the CLI parameters related to container images. Although downloading images other than those of the Singularity/Apptainer container system is not supported for the time being, a generic name for the parameters seemed preferable. So the new parameter `--singularity-cache-index` introduced in [#2247](https://github.com/nf-core/tools/pull/2247) has been renamed to `--container-cache-index` prior to release ([#2336](https://github.com/nf-core/tools/pull/2336)).
- To address issue [#2311](https://github.com/nf-core/tools/issues/2311), a new parameter `--container-library` was created that allows specifying the container library (registry) from which container images in OCI format (Docker) should be pulled ([#2336](https://github.com/nf-core/tools/pull/2336)).

#### Updated CLI parameters

| Old parameter | New parameter |
| --------------------- | ---------------------------------------------- |
| new parameter | `-d` / `--download-configuration` |
| new parameter | `-t` / `--tower` |
| `-c` / `--container` | `-s` / `--container-system <VALUE>`            |
| new parameter | `-l` / `--container-library <VALUE>` |
| `--singularity-cache` | `-u` / `--container-cache-utilisation <VALUE>` |
| new parameter | `-i` / `--container-cache-index <VALUE>` |

_In addition, `-r` / `--revision` has been changed to a parameter that can be provided multiple times so several revisions can be downloaded at once._
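
To illustrate how the renamed parameters combine, here is a small Python sketch that assembles an `nf-core download` command line. The helper names `build_download_command` and `render` are hypothetical and not part of nf-core/tools. Passing the pipeline name as a positional argument is an assumption, and the values in the usage note are placeholders. Only the long flag names are taken from the table above.

```python
import shlex


def build_download_command(pipeline, revisions=(), container_system=None,
                           container_libraries=(), cache_utilisation=None,
                           cache_index=None):
    """Assemble an argument list for `nf-core download` (hypothetical helper)."""
    cmd = ["nf-core", "download", pipeline]
    # `--revision` can be provided multiple times, once per revision.
    for revision in revisions:
        cmd += ["--revision", revision]
    if container_system:
        cmd += ["--container-system", container_system]
    # `--container-library` is declared with `multiple=True`, so it repeats too.
    for library in container_libraries:
        cmd += ["--container-library", library]
    if cache_utilisation:
        cmd += ["--container-cache-utilisation", cache_utilisation]
    if cache_index:
        cmd += ["--container-cache-index", cache_index]
    return cmd


def render(cmd):
    """Render the argument list as a shell-quoted string for display."""
    return shlex.join(cmd)
```

For instance, `render(build_download_command("rnaseq", revisions=["<REV_A>", "<REV_B>"], container_system="singularity"))` would produce one command that requests two revisions at once; the revision placeholders must be replaced by real release tags.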

### Linting

26 changes: 15 additions & 11 deletions README.md
@@ -343,17 +343,19 @@ You can run the pipeline by simply providing the directory path for the `workflo
nextflow run /path/to/download/nf-core-rnaseq-dev/workflow/ --input mydata.csv --outdir results # usual parameters here
```

> Note that if you downloaded singularity images, you will need to use `-profile singularity` or have it enabled in your config file.
> Note that if you downloaded Singularity container images, you will need to use `-profile singularity` or have it enabled in your config file.

### Downloaded nf-core configs

The pipeline files are automatically updated (`params.custom_config_base` is set to `../configs`), so that the local copy of institutional configs is available when running the pipeline.
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs). This option is not available when downloading a pipeline for use with [Nextflow Tower](#adapting-downloads-to-nextflow-tower) because the application manages all configurations separately.
So using `-profile <NAME>` should work if available within [nf-core/configs](https://github.com/nf-core/configs).

### Downloading singularity containers
> ⚠️ This option is not available when downloading a pipeline for use with [Nextflow Tower](#adapting-downloads-to-nextflow-tower) because the application manages all configurations separately.

If you're using Singularity, the `nf-core download` command can also fetch the required Singularity container images for you.
To do this, select `singularity` in the prompt or specify `--container singularity` in the command.
### Downloading Apptainer containers

If you're using [Singularity](https://apptainer.org) (Apptainer), the `nf-core download` command can also fetch the required container images for you.
To do this, select `singularity` in the prompt or specify `--container-system singularity` in the command.
Your archive / target output directory will then also include a separate folder `singularity-containers`.

The downloaded workflow files are again edited to add the following line to the end of the pipeline's `nextflow.config` file:
@@ -372,9 +374,9 @@ We highly recommend setting the `$NXF_SINGULARITY_CACHEDIR` environment variable
If found, the tool will fetch the Singularity images to this directory first before copying to the target output archive / directory.
Any images previously fetched will be found there and copied directly - this includes images that may be shared with other pipelines or previous pipeline version downloads or download attempts.

If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or cli options `--singularity-cache amend`. This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory. The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.
If you are running the download on the same system where you will be running the pipeline (eg. a shared filesystem where Nextflow won't have an internet connection at a later date), you can choose to _only_ use the cache via a prompt or the CLI option `--container-cache-utilisation amend`. This instructs `nf-core download` to fetch all Singularity images to the `$NXF_SINGULARITY_CACHEDIR` directory but does _not_ copy them to the workflow archive / directory. The workflow config file is _not_ edited. This means that when you later run the workflow, Nextflow will just use the cache folder directly.

If you are downloading a workflow for a different system, you can provide information about its image cache to `nf-core download`. To avoid unnecessary container image downloads, choose `--singularity-cache remote` and provide a list of already available images as plain text file to `--singularity-cache-index my_list_of_remotely_available_images.txt`. To generate this list on the remote system, run `find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt`. The tool will then only download and copy images into your output directory, which are missing on the remote system.
If you are downloading a workflow for a different system, you can provide information about the contents of its image cache to `nf-core download`. To avoid unnecessary container image downloads, choose `--container-cache-utilisation remote` and provide a list of already available images as a plain text file to `--container-cache-index my_list_of_remotely_available_images.txt`. To generate this list on the remote system, run `find $NXF_SINGULARITY_CACHEDIR -name "*.img" > my_list_of_remotely_available_images.txt`. The tool will then download and copy into your output directory only those images that are missing on the remote system.
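
The filtering against a remote cache index can be pictured with the following minimal Python sketch. It assumes that index entries are matched by file name only, which the text above does not confirm. The functions `read_cache_index` and `images_still_needed` are hypothetical illustrations, not the tool's actual implementation.

```python
from pathlib import PurePosixPath


def read_cache_index(lines):
    """Collect image file names from index lines (one path per line)."""
    names = set()
    for line in lines:
        entry = line.strip()
        if entry:  # skip blank lines
            # Assumption: only the file name matters, not the remote directory.
            names.add(PurePosixPath(entry).name)
    return names


def images_still_needed(required_images, remote_names):
    """Return the required image file names missing remotely, keeping their order."""
    return [name for name in required_images if name not in remote_names]
```

An index produced by the `find` command above could be read with `read_cache_index(open("my_list_of_remotely_available_images.txt"))`, after which only the names returned by `images_still_needed` would have to be downloaded.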

#### How the Singularity image downloads work

@@ -385,15 +387,15 @@ The Singularity image download finds containers using two methods:
2. It scrapes any files it finds with a `.nf` file extension in the workflow `modules` directory for lines
that look like `container = "xxx"`. This is the typical method for DSL2 pipelines, which have one container per process.

Some DSL2 modules have container addresses for docker (eg. `biocontainers/fastqc:0.11.9--0`) and also URLs for direct downloads of a Singularity continaer (eg. `https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0`).
Some DSL2 modules have container addresses for docker (eg. `biocontainers/fastqc:0.11.9--0`) and also URLs for direct downloads of a Singularity container (eg. `https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0`).
Where both are found, the download URL is preferred.
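
As a rough illustration of the scraping step, the Python sketch below finds lines that look like `container = "xxx"` and prefers a direct download URL when one is present. The regular expression and the function names `find_containers` and `prefer_download_url` are assumptions made for this example; the pattern used by the tool itself may well be more elaborate.

```python
import re

# Assumed pattern for lines such as: container = "biocontainers/fastqc:0.11.9--0"
CONTAINER_LINE = re.compile(r"""container\s*=\s*["']([^"']+)["']""")


def find_containers(text):
    """Return every quoted value assigned to `container` in the given text."""
    return CONTAINER_LINE.findall(text)


def prefer_download_url(candidates):
    """Pick a direct download URL if any candidate has one, else the first entry."""
    urls = [c for c in candidates if c.startswith("http")]
    if urls:
        return urls[0]
    return candidates[0] if candidates else None
```

Applied to a module that lists both `biocontainers/fastqc:0.11.9--0` and the matching `https://depot.galaxyproject.org/singularity/...` address, `prefer_download_url` would select the URL, mirroring the preference described above.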

Once a full list of containers is found, they are processed in the following order:

1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache amend` specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--singularity-cache copy` is specified, they are copied to the output directory
1. If the target image already exists, nothing is done (eg. with `$NXF_SINGULARITY_CACHEDIR` and `--container-cache-utilisation amend` specified)
2. If found in `$NXF_SINGULARITY_CACHEDIR` and `--container-cache-utilisation copy` is specified, they are copied to the output directory
3. If they start with `http` they are downloaded directly within Python (default 4 at a time, you can customise this with `--parallel-downloads`)
4. If they look like a Docker image name, they are fetched using a `singularity pull` command
4. If they look like a Docker image name, they are fetched using a `singularity pull` command. Choose the container libraries (registries) queried by providing one or multiple `--container-library` parameter(s). For example, if you call `nf-core download` with `-l quay.io -l ghcr.io -l docker.io`, every image will be pulled from `quay.io` unless an error is encountered. Subsequently, `ghcr.io` and then `docker.io` will be queried for any image that has failed before.
Contributor

That's a cool idea. You can also try to read this from /etc/containers/registries.conf if you really wanted.

Member Author

I am open to improvements, but I felt that reading various config files (e.g. also the various Nextflow configs) was overengineering considering how desperately the next tools release is anticipated? If I recall correctly, it was you who rightfully pointed out via personal message that I was not really focussing enough on the essentials of that feature?

Contributor

Yup I agree that's not important! I was just adding a comment because I thought you might be interested.

Member Author

I am interested for sure, but ironically I have neither a /etc/containers/registries.conf nor a $HOME/.config/containers/registries.conf on my system. I presume that this is GNU/Linux specific?

- This requires Singularity/Apptainer to be installed on the system and is substantially slower
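
The fallback order for several `--container-library` values can be sketched in Python as follows. This is only a model of the behaviour described in step 4: `pull_with_fallback` is a hypothetical function, and `pull` stands in for whatever performs the actual `singularity pull`; it is assumed to raise `RuntimeError` on failure.

```python
def pull_with_fallback(images, libraries, pull):
    """Try each library in order; only images that failed earlier are retried.

    `pull(library, image)` is a caller-supplied callable (an assumption of this
    sketch) that raises RuntimeError when the image cannot be fetched.
    Returns a dict mapping image -> library used, and a list of failed images.
    """
    pending = list(images)
    pulled_from = {}
    for library in libraries:
        still_failing = []
        for image in pending:
            try:
                pull(library, image)
                pulled_from[image] = library
            except RuntimeError:
                still_failing.append(image)
        pending = still_failing
        if not pending:
            break  # nothing left to retry with the remaining libraries
    return pulled_from, pending
```

With `libraries=["quay.io", "ghcr.io", "docker.io"]`, every image is first requested from `quay.io`, and the later libraries are queried only for images that have not been pulled successfully so far.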

Note that compressing many GBs of binary files can be slow, so specifying `--compress none` is recommended when downloading Singularity images that are copied to the output directory.
@@ -406,6 +408,8 @@ If the download speeds are much slower than your internet connection is capable

Subsequently, the `*.git` folder can be moved to its final destination and linked with a pipeline in _Tower_ using the `file:/` prefix.

> 💡 Pipelines downloaded with the `--tower` flag can also be run without access to Tower: `nextflow run -r 2.5 file:/path/to/pipelinedownload.git`. Downloads in this format allow you to include multiple revisions of a pipeline in a single file, but require that the revision (e.g. `-r 2.5`) is always explicitly specified.

## Pipeline software licences

Sometimes it's useful to see the software licences of the tools used in a pipeline.
59 changes: 48 additions & 11 deletions nf_core/__main__.py
@@ -11,6 +11,7 @@
import rich_click as click

from nf_core import __version__
from nf_core.download import DownloadError
from nf_core.modules.modules_repo import NF_CORE_MODULES_REMOTE
from nf_core.utils import check_if_outdated, rich_force_colors, setup_nfcore_dir

@@ -68,6 +69,20 @@
rich.traceback.install(console=stderr, width=200, word_wrap=True, extra_lines=1)


# Define exceptions for which no traceback should be printed,
# because they are actually preliminary, but intended program terminations.
# (Custom exceptions are cleaner than `sys.exit(1)`, which we used before)
def selective_traceback_hook(exctype, value, traceback):
    if exctype in {DownloadError}:  # extend set as needed
        log.error(value)
    else:
        # print the colored traceback for all other exceptions with rich as usual
        stderr.print(rich.traceback.Traceback.from_exception(exctype, value, traceback))


sys.excepthook = selective_traceback_hook
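
The selective hook can be tried in isolation with the standard library only. The sketch below is not the code from this PR: `IntendedExit` is a hypothetical stand-in for `DownloadError`, `traceback.print_exception` replaces the rich traceback, and `make_selective_hook` and `demo` are names invented for this example. The hook is called directly instead of being installed as `sys.excepthook`, so that both branches can be observed without terminating the interpreter.

```python
import io
import logging
import traceback


class IntendedExit(Exception):
    """Hypothetical stand-in for an intended termination such as DownloadError."""


def make_selective_hook(logger, quiet_types, stream):
    """Return an excepthook that logs `quiet_types` and prints tracebacks otherwise."""

    def hook(exctype, value, tb):
        if exctype in quiet_types:  # exact type match, subclasses are not covered
            logger.error(value)
        else:
            traceback.print_exception(exctype, value, tb, file=stream)

    return hook


def demo():
    """Call the hook with both kinds of exception and return the captured output."""
    logged, printed = io.StringIO(), io.StringIO()
    logger = logging.getLogger("selective-hook-demo")
    logger.propagate = False
    logger.setLevel(logging.ERROR)
    logger.handlers = [logging.StreamHandler(logged)]
    hook = make_selective_hook(logger, {IntendedExit}, printed)
    hook(IntendedExit, IntendedExit("download failed"), None)
    try:
        raise ValueError("unexpected")
    except ValueError as err:
        hook(type(err), err, err.__traceback__)
    return logged.getvalue(), printed.getvalue()
```

One design point worth noting: membership in a set of types matches exact classes only, so a subclass of a listed exception would still get a full traceback unless the check were changed to `issubclass`.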


def run_nf_core():
    # print nf-core header if environment variable is not set
    if os.environ.get("_NF_CORE_COMPLETE") is None:
@@ -222,19 +237,37 @@ def launch(pipeline, id, revision, command_only, params_in, params_out, save_all
@click.option("-f", "--force", is_flag=True, default=False, help="Overwrite existing files")
@click.option("-t", "--tower", is_flag=True, default=False, help="Download for seqeralabs® Nextflow Tower")
@click.option(
"-c", "--container", type=click.Choice(["none", "singularity"]), help="Download software container images"
"-d",
"--download-configuration",
is_flag=True,
default=False,
help="Include configuration profiles in download. Not available with `--tower`",
)
# -c changed to -s for consistency with other --container arguments, where it is always the first letter of the last word.
# Also -c might be used instead of -d for config in a later release, but reusing params for different options in two subsequent releases might be too error-prone.
@click.option(
"-s",
"--singularity-cache",
"--container-system",
type=click.Choice(["none", "singularity"]),
help="Download container images of required software.",
)
@click.option(
"-l",
"--container-library",
multiple=True,
help="Container registry/library or mirror to pull images from.",
)
@click.option(
"-u",
"--container-cache-utilisation",
type=click.Choice(["amend", "copy", "remote"]),
help="Utilize the 'singularity.cacheDir' in the download process, if applicable.",
help="Utilise a `singularity.cacheDir` in the download process, if applicable.",
)
@click.option(
"-i",
"--singularity-cache-index",
"--container-cache-index",
type=str,
help="List of images already available in a remote 'singularity.cacheDir', imposes --singularity-cache=remote",
help="List of images already available in a remote `singularity.cacheDir`.",
)
@click.option("-p", "--parallel-downloads", type=int, default=4, help="Number of parallel image downloads")
def download(
@@ -244,9 +277,11 @@ def download(
compress,
force,
tower,
container,
singularity_cache,
singularity_cache_index,
download_configuration,
container_system,
container_library,
container_cache_utilisation,
container_cache_index,
parallel_downloads,
):
"""
@@ -264,9 +299,11 @@ def download(
compress,
force,
tower,
container,
singularity_cache,
singularity_cache_index,
download_configuration,
container_system,
container_library,
container_cache_utilisation,
container_cache_index,
parallel_downloads,
)
dl.download_workflow()