-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Download: Improve container detection in configs, specifically the container parameter in process scope #2346
Download: Improve container detection in configs, specifically the container parameter in process scope #2346
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #2346 +/- ##
==========================================
+ Coverage 72.69% 72.76% +0.06%
==========================================
Files 78 78
Lines 8864 8889 +25
==========================================
+ Hits 6444 6468 +24
- Misses 2420 2421 +1
|
…r cleaning any container declarations from the config file.
nf_core/download.py
Outdated
with open(file_path, "r") as fh: | ||
# Look for any lines with `container = "xxx"` | ||
search_space = fh.read() | ||
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would adding the newline solve #2338 ?
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']", search_space, re.S) | |
module_container = re.findall(r"container\s*[\"\']([^\"]*)[\"\']\n", search_space, re.S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that I haven't tested the full code, only this regex with an example module :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, not because there are modules where the DSL2 is wrapped into curly braces, so the quote is followed by a closing bracket. There are also modules with escaped quotes inside unescaped quotes that need to be matched.
After many hours, I realized that capturing the quote and using exactly this instead of an either or notation [\"\']
is the only thing that prevented abridged matches.
container_regex = re.compile(
r"container\s+[\s{}=$]*(?P<quote>[\'\"])(?P<param>(?:.(?!\1))*.?)\1[\s}]*",
re.DOTALL
)
local_module_findings = re.findall(container_regex, search_space)
I have now tested about 20+ pipelines in multiple revisions, and it seems that will work for all modules and configs in there. There are still some containers not picked up (e.g. in Sarek 2.5), but this is not due to the regex, but due to the downstream logic how to prioritize https:// direct download container addresses over those that would need to be pulled with Singularity.
Maybe we should in the long-term either have a nf-core container API or include a small sqlite-Database with curated container addresses for all revisions into tools?
I've tested downloading differentialabundance locally, and it seems to work and download the overridden container 👍 |
…hits with escape sequences.
…ication needs improvements still.
c5dc899
to
ccbb131
Compare
…r improvement over previous performance, but still...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!!
When downloading a pipeline with
--container-system singularity
, tools will search for the respective containers at two different places:While the module parsing was quite advanced, the config parsing assumed plain container definitions, which doesn't hold true e.g. for differentialabundance 1.2.0:
Therefore, I have now broken up the function
find_container_images()
. The part for cleaning the raw DSL2 matches was moved to a new helper functionrectify_raw_container_matches()
that is now also applied to the config matches.Just a draft for now, tests etc. will be added tomorrow.
PR checklist
CHANGELOG.md
is updateddocs
is updated