-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Container code #5250
Container code #5250
Conversation
43c02ae
to
ba21194
Compare
64a5f82
to
0016e90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments below.
@@ -68,11 +84,89 @@ def validate_label_uniqueness(ctx, _, value): | |||
return value | |||
|
|||
|
|||
ON_CONTAINER = OverridableOption( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if "on_container" should be a separate option. Maybe instead of having the "ON_COMPUTER" option we call it "CODE_LOCATION" and provide 3 choices: on a computer, on a container, local. Something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that these three cases are related; the downside of your suggestion is that we would need to break backwards compatibility.
At this stage I would rather vote for a separate option that is turned off by default.
__all__ = ('ContainerCode',) | ||
|
||
|
||
class ContainerCode(Code): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I am not convinced anymore whether creating a new ContainerCode
class is better than adding a few more argument to the standard Code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know yet how exactly we want to represent all the container metadata, so I think going with a subclass is the right way for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I am not convinced anymore whether creating a new ContainerCode class is better than adding a few more argument to the standard Code
Would you mind sharing why? That route has been tried multiple times (local and remote for Code
, mesh and explicit for KpointsData
) and it always creates a lot of confusion, both in the code and for the user.
82575eb
to
2821540
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @unkcpz - most of this looks good to me, and is in line with what we discussed.
just a couple of comments, mostly on naming
aiida/orm/__init__.py
Outdated
@@ -47,6 +47,7 @@ | |||
'Comment', | |||
'Computer', | |||
'ComputerEntityLoader', | |||
'ContainerCode', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative name: Container
and ContainerizedCode
I don't have strong preferences for any of these options though, also ContainerCode
is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I'd go for ContainerCode: it makes it clear it's a Code (actually it's a subclass) and it's a bit shorter than ContainerizedCode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally definitely keep the Code
suffix. Also I think ContainerizedCode
is more "correct" and therefore for me clearer. I usually don't mind names that are a bit longer if that makes them clearer. But I could also live with ContainerCode
if others prefer this.
def __init__( | ||
self, | ||
computer=None, | ||
cmd_tmpl=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think abbreviations are necessary here.
command_template
is perfectly fine and easier to read
|
||
self.set_attribute('container_exec_path', container_exec_path) | ||
|
||
def get_container_exec_path(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose container_command
, in analogy to https://docs.docker.com/engine/reference/run/
__all__ = ('ContainerCode',) | ||
|
||
|
||
class ContainerCode(Code): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know yet how exactly we want to represent all the container metadata, so I think going with a subclass is the right way for now.
CONTAINER_CMDLINE_TMPL = OverridableOption( | ||
'--container-cmdline-tmpl', | ||
prompt='container cmdline params tmpl', | ||
default='sarus run --mount=src=$PWD,dst=/workir,type=bind --workdir /workdir {image}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this should default to singularity instead, since it is way more popular.
I would perhaps call this container-engine-command
(I don't think "template" needs to be part of the name; we also don't call the work directory "template")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is a typo! the first time you say /workir instead of /workdir
ON_COMPUTER = OverridableOption( | ||
'--on-computer/--store-in-db', | ||
is_eager=False, | ||
default=True, | ||
cls=InteractiveOption, | ||
required_fn=is_not_on_container, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also originally thinking that the three options are mutually exclusive, but you're right - we could have containerised code that runs both 'on computer' or 'in-db'. @unkcpz did you test the two cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this in-db
? it is the other way to say for the local code, no? So either we on_computer or 'not_on_computer' which is local (in-db?) or we on_container?
aiida/common/escaping.py
Outdated
@@ -40,6 +40,10 @@ def escape_for_bash(str_to_escape): | |||
|
|||
str_to_escape = str(str_to_escape) | |||
|
|||
# dont espace when string contain $ symbol, usually this is for ENV variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this fix is way too far-reaching; it will affect any string that contains a dollar sign, not just the one we are looking at.
Can you provide a link to the issue that @giovannipizzi mentioned where they discuss the pros/cons of using single/double quotes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is too much...
Also note the spelling dont
-> don't
, espace
-> escape
The issue is #4836
Shall we add an optional parameter to escape_for_bash
, use_double_quotes=False
? If True, one should do (untested)
escaped_quotes = str_to_escape.replace('"', '''"'"'"''')
return f'"{escaped_quotes}"'
(obvious, no? :-P)
Then, the question however becomes: "who decides which quotes to use?"
Option 1: the new string container-engine-command
is escaped with double quotes, the rest with single quotes (backward compatibility). However, this does not address #4836 and it will be complex for users to remember which is quoted how.
Option 2: we also ask one more thing to users: use double quotes instead of single quotes for escaping
(default false). Is this too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would vote for option 2. But this prompt setup question should be in the computer setup stage just after the scheduler setting there, I assume.
aiida/orm/utils/loaders.py
Outdated
return ContainerCode | ||
|
||
@classmethod | ||
def _get_query_builder_label_identifier(cls, identifier, classes, operator='==', project='*'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you changing something in this method?
In a quick browse I didn't see anything specific to the container code (but I might be wrong).
You might be able to subclass CodeEntityLoader
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described above, if we simply make the entry point core.code.containerized
then load_code
can be used and none of this extra code is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @unkcpz - most of this looks good to me, and is in line with what we discussed.
just a couple of comments, mostly on naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some additional comments to be taken into account.
One important thing: before merging it would be important to test this (both ContainerCode in mode ON_COMPUTER and in mode STORE_AND_UPLOAD) to check it actually works. In can give you a path of a pw code I already prepared in sarus on eiger, if you want to try. You could prepare a simple container with python in it and check that you can upload a python script and run it in mode STORE_AND_UPLOAD.
Actually, it would be good to add integration tests where we actually run one of these codes among the tests? Using docker
without MPI as an example? I feel (but I'm not sure) that there should be already examples of this for the add
code, and docker should be already available in the GitHub actions.
CONTAINER_CMDLINE_TMPL = OverridableOption( | ||
'--container-cmdline-tmpl', | ||
prompt='container cmdline params tmpl', | ||
default='sarus run --mount=src=$PWD,dst=/workir,type=bind --workdir /workdir {image}', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is a typo! the first time you say /workir instead of /workdir
ON_COMPUTER = OverridableOption( | ||
'--on-computer/--store-in-db', | ||
is_eager=False, | ||
default=True, | ||
cls=InteractiveOption, | ||
required_fn=is_not_on_container, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also originally thinking that the three options are mutually exclusive, but you're right - we could have containerised code that runs both 'on computer' or 'in-db'. @unkcpz did you test the two cases?
aiida/common/escaping.py
Outdated
@@ -40,6 +40,10 @@ def escape_for_bash(str_to_escape): | |||
|
|||
str_to_escape = str(str_to_escape) | |||
|
|||
# dont espace when string contain $ symbol, usually this is for ENV variable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is too much...
Also note the spelling dont
-> don't
, espace
-> escape
The issue is #4836
Shall we add an optional parameter to escape_for_bash
, use_double_quotes=False
? If True, one should do (untested)
escaped_quotes = str_to_escape.replace('"', '''"'"'"''')
return f'"{escaped_quotes}"'
(obvious, no? :-P)
Then, the question however becomes: "who decides which quotes to use?"
Option 1: the new string container-engine-command
is escaped with double quotes, the rest with single quotes (backward compatibility). However, this does not address #4836 and it will be complex for users to remember which is quoted how.
Option 2: we also ask one more thing to users: use double quotes instead of single quotes for escaping
(default false). Is this too much?
aiida/engine/daemon/execmanager.py
Outdated
@@ -86,7 +87,7 @@ def upload_calculation( | |||
computer = node.computer | |||
|
|||
codes_info = calc_info.codes_info | |||
input_codes = [load_node(_.code_uuid, sub_classes=(Code,)) for _ in codes_info] | |||
input_codes = [load_node(_.code_uuid, sub_classes=(Code, ContainerCode)) for _ in codes_info] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really needed since ContainerCode is a subclass? (maybe it is, I didn't check the underlying code). Same question in at least another point below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether it is a sub-class in Python doesn't matter. What matters is the structuring of the entry points, because that is what the query builder uses. So as long as Code = core.code
and ContainerCode = core.code.container
, then querying for all Code
's will also match all ContainerCode
's, unless subclassing
is explicitly turned off. Note that the load_node
call here should just be replaced with load_code
which is the standard for querying for codes.
@@ -710,6 +711,12 @@ def presubmit(self, folder: Folder) -> CalcInfo: | |||
this_argv = [this_code.get_execname() | |||
] + (code_info.cmdline_params if code_info.cmdline_params is not None else []) | |||
|
|||
# set this_argv only for container code | |||
if isinstance(this_code, ContainerCode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is completely overriding the behaviour above? I.e., this e.g. is ignoring the behaviour of this_withmpi
and not putting the mpi_args
, that I think is incorrect?
"""Get image name""" | ||
return self.get_attribute('image', '') | ||
|
||
def container_cmd_params(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's check Leo's suggestion for the name, above (I liked it). In any case, let's use the same name for the function and the attribute, to avoid confusion in users when they call it via the API or when they query for it via the QueryBuilder. Finally, for consistency, let's add get_
in front. So, e.g., container_engine_command
for the attribute and get_container_engine_command
for the method
@@ -20,6 +20,7 @@ def test_escape_for_bash(): | |||
["string with a ' single quote", """'string with a '"'"' single quote'"""], | |||
[1, "'1'"], | |||
[2.0, "'2.0'"], | |||
['$PWD', '$PWD'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be adapted according to my comment above (and testing both single-quoted escaping, i.e. now, and double-quoted escaping - essentially copying the tests an swapping all single quote with double quotes and vice versa
tests/engine/processes/calcjobs/test_calc_job/test_container_code.sh
Outdated
Show resolved
Hide resolved
There are already quite a few reviews, but I added @sphuber if he wants to have an eye (not critical if he doesn't have time) since there are quite a few things to be careful about (entry point names, subclassing, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @unkcpz . I have given this a quick read, but I haven't really gone into depth of the ContainerCode
implementation for now, since I haven't been present during the meetings. So I trust the implementation is ok.
The main problem that I see is that we are making the already complex verdi code setup
even more complex. The fact that this endpoint is used for three different classes feels a bit problematic. We have already had plenty of problems with the command merely supporting local and remote codes, but I feel this will increase the complexity signifcantly. Now I am not sure if we can or should address this now, but I think that in the future an approach more in the sense of the transport commands, where each plugin can provide their own CLI subcommand, would be the way to go. That solution is not without its own problems for sure, but I think that will be better in the end.
aiida/orm/__init__.py
Outdated
@@ -47,6 +47,7 @@ | |||
'Comment', | |||
'Computer', | |||
'ComputerEntityLoader', | |||
'ContainerCode', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally definitely keep the Code
suffix. Also I think ContainerizedCode
is more "correct" and therefore for me clearer. I usually don't mind names that are a bit longer if that makes them clearer. But I could also live with ContainerCode
if others prefer this.
__all__ = ('ContainerCode',) | ||
|
||
|
||
class ContainerCode(Code): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I am not convinced anymore whether creating a new ContainerCode class is better than adding a few more argument to the standard Code
Would you mind sharing why? That route has been tried multiple times (local and remote for Code
, mesh and explicit for KpointsData
) and it always creates a lot of confusion, both in the code and for the user.
@@ -222,3 +236,4 @@ def is_local(self): | |||
class CodeType(enum.Enum): | |||
STORE_AND_UPLOAD = 'store in the db and upload' | |||
ON_COMPUTER = 'on computer' | |||
ON_CONTAINER = 'on container' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ON_CONTAINER = 'on container' | |
ON_CONTAINER = 'on container' |
Would go with CONTAINERIZED
or CONTAINER
here.
aiida/orm/utils/loaders.py
Outdated
return ContainerCode | ||
|
||
@classmethod | ||
def _get_query_builder_label_identifier(cls, identifier, classes, operator='==', project='*'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described above, if we simply make the entry point core.code.containerized
then load_code
can be used and none of this extra code is needed
@unkcpz What is the status of this? I guess the 2.0 release candidate is due by Jan 15th, would be good to get this in? |
Thanks for headup @ltalirz, I am cleaning up the commits and working on this today. |
7df6adb
to
15de2d6
Compare
Got the error in CI https://github.com/aiidateam/aiida-core/runs/4765450356?check_suite_focus=true:
It only appears in this branch, not sure which file change causing this. I can reproduce it in my local machine. EDIT: By rollback |
Another issue that block me to finish this PR is that to using docker as an example to configure the
But
This also happened if we want (and we should) to keep on using the I am thinking maybe it is always safe to put the command without containized code after
When switch to containerized code we have:
|
c7bfe0a
to
52bbeab
Compare
I think you are correct that you will need extra quotes around the command that is to be run inside the container. You don't necessarily need to remove the quotes around the individual components - one could also escape them as in
if that's easier (isn't there a way in AiiDA to do this automatically?) |
Hm... upon closer look: what about the |
Thanks @ltalirz. Yes, I saw that, there are ways to pass variables instead of using redirections
However, if I understand it correctly, it is defined in the calcjob plugins how to pass the input, either use |
Yes, sorry, forget what I said (I was a bit tired already ;-) ). I think it is the right approach to have the directory mounted and, from thereon, make sure that the stdout/stderr redirection to files happens inside the container. |
for more information, see https://pre-commit.ci
use `container-engine-command`, `is_containerized`, `container_command`, `ContainerizedCode`.
ef0454d
to
8ffb104
Compare
7a87387
to
5bf1a9b
Compare
for more information, see https://pre-commit.ci
@unkcpz are you still waiting for something in particular with this PR? |
Yes, I am waiting for #5350 get merged. More commits are now based on this PR and in my forked repo. |
Hi @ltalirz, thanks for the head up. I have other stuff in my hands and already have containerized code PR to finish in my schedule but after Thursday this week. Will take some time to focus on this then. |
@sphuber yes. I close it. Thanks for head up. |
There are the following major changes in this PR:
$
contained in the string, it will not be escaped in single quotes.Since the container set is now included in the setting of the container code, which I think for the production using this should be separated, the implementation of the
verdi code setup --on-container
interface is a bit of a mess, all the options are in thecode setup
subcommand.