`SandboxFolder`: remove option to create in profile repository #5496
@giovannipizzi do you maybe still know what the original reasoning was to having the sandbox folder in a subpath of |
In general you want to ensure the sandbox is on the same filesystem (and /tmp is often on another one), as this can guarantee an atomic move of the files once you want to transfer them from the sandbox to the actual repository. This was needed before, and is needed inside the disk-objectstore. For the sandbox of AiiDA, (possibly?) this might not be needed anymore (but I'm not sure). Note also that we need to check which method we use to move files from the sandbox to the repository: some that guarantee atomicity don't work (= fail) between different filesystems (I think it's os.rename, but I'd need to double check). Even if the limitation is not there, also note that people might put the repository on a large filesystem, while TMP might be on a different one (even in RAM, or a smaller FS). So I could imagine some rare but possible situations (I actually had one when trying to migrate an archive: I had to set an ENV VAR to change the location of the TMP folder) where there's not enough space on the TMP partition and the process fails (in this case adding large files to a node), even if there is a lot of empty space in the partition of the repository. So I'm a bit hesitant. Maybe we should have an optional key for the tmp location, and by default, if missing, we keep the old (current) behaviour?
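To make the same-filesystem concern concrete, here is a minimal sketch, assuming plain local paths. The helpers `same_filesystem` and `move_into_repository` are hypothetical names for illustration and are not part of AiiDA. On POSIX systems `os.rename` and `os.replace` typically fail with `EXDEV` when source and destination are on different filesystems, so the sketch falls back to a non-atomic copy in that case.

```python
import errno
import os
import shutil
import tempfile


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_into_repository(source: str, destination: str) -> bool:
    """Move ``source`` to ``destination``; return True if the move was atomic.

    Hypothetical helper: tries an atomic rename first and only falls back
    to copy-and-delete when the rename crosses a filesystem boundary.
    """
    try:
        os.replace(source, destination)
        return True
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device move: not atomic, and needs free space on the target.
        shutil.copy2(source, destination)
        os.remove(source)
        return False
```

Checking `same_filesystem(sandbox_dir, repository_dir)` up front would show whether the atomic path can be taken at all.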
This is no longer needed, because we no longer use an atomic move. The reason is that we no longer assume that the file repository interface can use move operations. The files are therefore always copied through streams.
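As a rough illustration of copying through streams instead of moving, the sketch below uses only the standard library. The function name `copy_stream` and the chunk size are assumptions for the example; this is not the file repository interface of AiiDA.

```python
import io
import shutil
from typing import BinaryIO


def copy_stream(source: BinaryIO, target: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Copy all bytes from ``source`` to ``target`` in chunks.

    Works for any pair of binary file-like objects, so it makes no
    assumption that source and target share a filesystem.
    """
    shutil.copyfileobj(source, target, length=chunk_size)


# Usage with in-memory streams standing in for sandbox and repository files.
source = io.BytesIO(b'example content')
target = io.BytesIO()
copy_stream(source, target)
```

Because only `read` and `write` are needed, the same call works whether the target is a local file or some other writable binary stream.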
Indeed it is
It is ok to keep it I guess, I just think it shouldn't be hijacking the |
I think you mean, you agree what I'd already said should happen in #5330 😉
No, it should be a configuration option on the profile, with a default, not hard-coded |
Fine by me. It was never configurable before, but I have added it.
Cool cheers, the point is that it was always in an "accessible" place before, where you could inspect it if necessary, but now it will not be.
I'd also note that "cleaning" of sandbox folders is not particularly reliable, since a folder is only deleted on (Python) garbage collection. So if the process doesn't stop cleanly, it won't be removed.
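One way to make cleanup less dependent on garbage collection is an explicit context manager, sketched below under the assumption that the sandbox is a plain temporary directory. The name `temporary_sandbox` is hypothetical and this is not how AiiDA implements it. Note that this still cannot help if the process is killed outright, since the `finally` block never runs in that case.

```python
import contextlib
import os
import shutil
import tempfile
from typing import Iterator, Optional


@contextlib.contextmanager
def temporary_sandbox(parent: Optional[str] = None) -> Iterator[str]:
    """Yield a fresh temporary directory and remove it on exit.

    The directory is deleted when the ``with`` block ends, including when
    the block raises, instead of whenever the object is garbage collected.
    """
    path = tempfile.mkdtemp(dir=parent)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```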
Note, the goal (in this PR or another) should be to fully remove from
and place it on the |
This was also my argument indeed to use that as a default. I personally have a lot of junk in the
Agree, happy to create a follow-up PR to do this. @chrisjsewell can you also review the code changes perhaps?
For the junk in the sandbox - probably we should add the cleaning of the sandbox to the |
@giovannipizzi See #5514
Just commenting to mention that #5514 turned out to be more complex than anticipated, mainly because when doing the maintenance there is no easy way to offer safety for other profiles that use the same folder. You can check the issue for more details, but for this feature, would we agree to move forward even without that functionality, or is there anything that needs to be adapted?
I think we should move ahead with this and leave the advanced functionality for #5514. These changes are not making things worse. The sandbox folder is currently not cleaned automatically either, nor is there a CLI command that does it.
The `SandboxFolder` by default would create the temporary directory in the `sandbox` subfolder of the repository directory, defined by the `repository_uri` key in the profile configuration. However, this folder path should mainly serve to designate the location of the file repository of the storage backend. So far, the sandbox folder hijacking the same key wasn't a problem, as there was only one storage backend implementation. However, when we allow other storage backend implementations, having to keep this config key just for the sandbox folder doesn't make sense. Additionally, the `SandboxFolder` can and should be a generic utility that is not tightly coupled to the concept of a `Profile`, as it was by directly using it in the constructor.

The original reason for storing the sandbox folder in the folder of the file repository is that this path is configurable, so users could define a location on a file system with plenty of space available, or on a file system that is in any other way better suited than the default temporary directory of the operating system. To keep the possibility for this configuration, the `SandboxFolder` constructor now takes the optional `filepath` argument. If defined, the sandbox folder will be created in this directory. By default, the folder will be created in the default temporary directory of the operating system. The built-in `tempfile.mkdtemp` function is used for this purpose.
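The behaviour described in this commit message can be sketched roughly as follows. This is an illustrative stand-in, not the actual `SandboxFolder` code: the class name `SandboxFolderSketch`, the `abspath` attribute, the `erase` method, and the choice to create a missing parent directory are all assumptions made for the example. Only the optional `filepath` argument and the use of `tempfile.mkdtemp` come from the message itself.

```python
import os
import shutil
import tempfile
from typing import Optional


class SandboxFolderSketch:
    """Illustrative stand-in, not the AiiDA ``SandboxFolder`` class."""

    def __init__(self, filepath: Optional[str] = None):
        if filepath is not None:
            # Assumption: create the parent directory if it does not exist yet.
            os.makedirs(filepath, exist_ok=True)
        # ``dir=None`` makes ``mkdtemp`` use the OS default temporary directory.
        self.abspath = tempfile.mkdtemp(dir=filepath)

    def erase(self) -> None:
        """Remove the sandbox directory and everything inside it."""
        shutil.rmtree(self.abspath, ignore_errors=True)
```

Since the class takes a plain path rather than a profile, it stays usable without any loaded configuration.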
I think it is all good in general, I just have a couple of minor comments / questions / checks.
"""Tests for `verdi import`."""
# pylint: disable=redefined-outer-name
"""Tests for `verdi archive import`."""
from click.exceptions import BadParameter
from click.testing import CliRunner
import pytest
Referring to the whole file, not just these imports: Oh wow, github really goes haywire when you change a couple of indentations 😅. Why did you change this here? I don't see any modifications related to the sandbox folder feature, just the change from a class with an `init` to methods using the `run_cli_command`, am I missing anything?
Yeah, this is just a refactoring to use the pytest fixtures. I think that since I removed the class and just made them normal functions, the deindent makes almost every line seem changed. There are no functional changes though, just refactoring.
Ok, I see. In the future maybe it would be good to separate the commits in which we do this kind of refactoring into a different PR, especially if it is not using the feature changed in the PR. I hope it doesn't sound too nagging, it's just that it is a bit more mentally costly to be going over this without being clear on the type of changes I'm supposed to be looking for, more so since github is not properly identifying the changes.
Good point, it's just that I came across it and it was so tempting to refactor it as it made it so much more succinct. But I agree that it is probably best left to separate refactor PRs.
This config option can be used to define the directory in which all sandbox folders will be created. This can be useful if the default temporary directory of the operating system that would otherwise be used is not suited, for example because it is on a slower file system or has insufficient free space. The option is passed to the `SandboxFolder` constructor whenever it is used by AiiDA's internal code. Since the option requires the presence of a configuration, the retrieval of the option is added as high up in the API as possible. This is in order to not couple components to the configuration that should remain independent.
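The layering described here, where only the top-level caller reads the configuration, could look roughly like the sketch below. All names are hypothetical: `create_sandbox`, `run_with_sandbox`, and the option key `sandbox_dir` are placeholders, and a plain mapping stands in for the real configuration object.

```python
import os
import tempfile
from typing import Mapping, Optional


def create_sandbox(filepath: Optional[str] = None) -> str:
    """Low-level helper: knows nothing about any configuration."""
    return tempfile.mkdtemp(dir=filepath)


def run_with_sandbox(config: Mapping[str, str]) -> str:
    """High-level entry point: the only place that reads the configuration."""
    # 'sandbox_dir' is a hypothetical option name used for illustration only.
    filepath = config.get('sandbox_dir')
    # A missing option yields None, so the OS default temporary directory is used.
    return create_sandbox(filepath)
```

Keeping the lookup in the outer function means the helper can be reused and tested without a loaded configuration.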
This looks good to me, @sphuber don't know if you were expecting another review, otherwise maybe set a deadline for anybody else to take a look or hold their peace.
Cheers @ramirezfranciscof. More reviews would have been great, but it has been open for almost three weeks with plenty of invitations and has had no response, so I will go ahead and merge this.
Fixes #5494

The `SandboxFolder` by default would create the temporary directory in the `sandbox` subfolder of the repository directory, defined by the `repository_uri` key in the profile configuration. However, this folder path should mainly serve to designate the location of the file repository of the storage backend. So far the sandbox folder hijacking the same key wasn't a problem as there was only one storage backend implementation. However, when we allow other storage backend implementations, having to keep this config key just for the sandbox folder doesn't make sense. It would be more logical to always create the sandbox folders in the temporary directory created through the builtin method `tempfile.mkdtemp`.