Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download: Improve container detection in configs, specifically the container parameter in process scope #2346

Merged

Conversation

MatthiasZepper
Copy link
Member

@MatthiasZepper MatthiasZepper commented Jun 28, 2023

When downloading a pipeline with --container-system singularity , tools will search for the respective containers at two different places:

  1. In the modules (DSL2 syntax)
  2. In the configs (Process scope)

While the module parsing was quite advanced, the config parsing assumed plain container definitions, which doesn't hold true e.g. for differentialabundance 1.2.0:

Therefore, I have now broken up the function find_container_images(). The part for cleaning the raw DSL2 matches was moved to a new helper function rectify_raw_container_matches() that is now also applied to the config matches.

Just a draft for now, tests etc. will be added tomorrow.

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

@MatthiasZepper MatthiasZepper self-assigned this Jun 28, 2023
@MatthiasZepper MatthiasZepper added this to the 2.9 milestone Jun 28, 2023
@MatthiasZepper MatthiasZepper added the bug Something isn't working label Jun 28, 2023
@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Merging #2346 (cdf620e) into dev (1d7c39c) will increase coverage by 0.06%.
The diff coverage is 89.85%.

@@            Coverage Diff             @@
##              dev    #2346      +/-   ##
==========================================
+ Coverage   72.69%   72.76%   +0.06%     
==========================================
  Files          78       78              
  Lines        8864     8889      +25     
==========================================
+ Hits         6444     6468      +24     
- Misses       2420     2421       +1     
Impacted Files Coverage Δ
nf_core/download.py 59.47% <89.85%> (+1.24%) ⬆️

…r cleaning any container declarations from the config file.
with open(file_path, "r") as fh:
# Look for any lines with `container = "xxx"`
search_space = fh.read()
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would adding the newline solve #2338 ?

Suggested change
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S)
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']\n", search_space, re.S)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that I haven't tested the full code, only this regex with an example module :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, not because there are modules where the DSL2 is wrapped into curly braces, so the quote is followed by a closing bracket. There are also modules with escaped quotes inside unescaped quotes that need to be matched.

After many hours, I realized that capturing the quote and using exactly this instead of an either or notation [\"\'] is the only thing that prevented abridged matches.

 container_regex = re.compile(
                            r"container\s+[\s{}=$]*(?P<quote>[\'\"])(?P<param>(?:.(?!\1))*.?)\1[\s}]*",
                            re.DOTALL
                        )

local_module_findings = re.findall(container_regex, search_space)

I have now tested about 20+ pipelines in multiple revisions, and it seems that will work for all modules and configs in there. There are still some containers not picked up (e.g. in Sarek 2.5), but this is not due to the regex, but due to the downstream logic how to prioritize https:// direct download container addresses over those that would need to be pulled with Singularity.

Maybe we should in the long-term either have a nf-core container API or include a small sqlite-Database with curated container addresses for all revisions into tools?

@mirpedrol
Copy link
Member

I've tested downloading differentialabundance locally, and it seems to work and download the overridden container 👍

@MatthiasZepper MatthiasZepper force-pushed the ContainerErrorDifferentialAbundance branch from c5dc899 to ccbb131 Compare June 29, 2023 20:45
@MatthiasZepper MatthiasZepper marked this pull request as ready for review June 29, 2023 22:08
Copy link
Member

@mirpedrol mirpedrol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants