AiiDA configuration file sometimes gets erased. #4604
Comments
The problem is this line, which results in the configuration file being overwritten whenever it is successfully read using
This change was introduced in 2f9224a (February 2020). @greschd asked whether the addition of If it was necessary for tests to pass, perhaps the tests need to be amended? |
Might be related to #4523, insofar as that can lead to "inconsistent" config states - and maybe that leads to storing it over and over again (just a hunch, could be wrong). |
It looks like it only needs to be stored if |
No, the reason is that the line before it migrates the config if necessary, but only in memory. The file on disk remains in its old form. The line was added to persist the changes to disk; otherwise the migration would be performed each time the config file was read. What we should do instead is write to disk only if an actual migration was performed. I am not convinced, however, that this is the source of the bug described by @yakutovicha. The problem you describe merely explains why the config file is written each time it is read. It doesn't necessarily explain how it can be accidentally deleted. |
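The "write only if a migration was performed" idea can be sketched as follows. This is a minimal illustration and not AiiDA's actual code: `CURRENT_VERSION`, `migrate` and `load_config` are hypothetical names, and the config is assumed to be a JSON file with a `version` key.

```python
import json
from pathlib import Path

CURRENT_VERSION = 3  # hypothetical schema version, for illustration only


def migrate(config: dict) -> dict:
    """Return a copy of ``config`` upgraded to CURRENT_VERSION (hypothetical migration)."""
    migrated = dict(config)
    migrated["version"] = CURRENT_VERSION
    return migrated


def load_config(path: Path) -> dict:
    """Read the config and write it back only if a migration changed it."""
    config = json.loads(path.read_text(encoding="utf-8"))
    if config.get("version", 0) < CURRENT_VERSION:
        config = migrate(config)
        # Persist once, so the migration is not repeated on every read.
        path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

With this shape, reading an up-to-date file never touches it on disk, so its timestamp stays unchanged between reads.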
I think it's quite probably the source of the bug. As already mentioned, it may be something in Additionally, this aiida-core/aiida/manage/configuration/config.py Lines 391 to 394 in 02248cf
should be changed to

    if md5_from_filelike(handle) != md5_file(self.filepath):
        self._backup(self.filepath)
        shutil.copy(handle.name, self.filepath)

i.e. you only need to perform the copy if the hashes are different |
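For readers without the AiiDA source at hand, the same "copy only if the hashes differ" check can be sketched with the standard library alone. The helper names `md5_of` and `copy_if_changed` and the `.bak` backup naming are assumptions made for this example; they are not the helpers used in the snippet above.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def copy_if_changed(source: Path, target: Path) -> bool:
    """Copy ``source`` over ``target`` only when their contents differ.

    Returns True if a copy was performed.
    """
    if target.exists() and md5_of(source) == md5_of(target):
        return False  # identical content: leave the file and its timestamp alone
    if target.exists():
        # Keep a backup of the previous version (hypothetical naming scheme).
        shutil.copy(target, target.with_name(target.name + ".bak"))
    shutil.copy(source, target)
    return True
```

This reduces the number of writes, but the copy itself can still be interrupted, so it narrows the window for the bug rather than closing it.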
Ok, could you prepare a PR for this? |
While this is technically true, people might expect the file to be written if they call |
I have opened #4605 to make sure the config only gets written to disk after loading if a migration was performed. This solves one problem but, as I said before, it is not the actual cause of the bug reported here. Of course, many more unnecessary writes of the file increase the chance of the bug happening, but the actual cause has to be sought elsewhere. A good candidate is the use of |
@yakutovicha Could you perhaps clarify what "erased" means here? If it is the latter, the problem might instead be in the rather broad aiida-core/aiida/manage/configuration/config.py Lines 47 to 52 in 02248cf
P.S. We might need an "exponential backoff mechanism" for reading the config file ;-D |
I wonder why you think this is a broad exception being caught? This is the standard way of catching a file that does not exist or cannot be read. Do you mean that instead of also catching the latter case, we should only be checking if the file does not already exist? There we will have similar problems with multiple processes racing to be the first to create it. |
Maybe yes? |
Maybe it is indeed better to change this to |
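The referenced lines of config.py are not reproduced in this thread, so the following is only a sketch of the distinction being discussed: treat "file does not exist" separately from "file cannot be read", and create the default file in exclusive mode so that two racing processes cannot both create it. `DEFAULT_CONFIG` and `read_or_create` are hypothetical names.

```python
import copy
import json
from pathlib import Path

DEFAULT_CONFIG = {"profiles": {}}  # hypothetical default content


def read_or_create(path: Path) -> dict:
    """Read the config; create a default one only if the file is truly missing."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        pass  # other OSErrors (e.g. permission denied) propagate to the caller

    try:
        # Mode "x" fails if another process created the file in the meantime.
        with path.open("x", encoding="utf-8") as handle:
            json.dump(DEFAULT_CONFIG, handle, indent=4)
    except FileExistsError:
        # Lost the race: another process created the file first, so read theirs.
        return json.loads(path.read_text(encoding="utf-8"))
    return copy.deepcopy(DEFAULT_CONFIG)
```

Note that an existing but empty or half-written file raises a JSON decoding error here instead of being silently replaced by a default, and a process that loses the race may still observe a partially written file.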
Randomly I was just reading this |
Unless we really think this is necessary, I think it is worth trying to avoid having to make |
Yeh I wasn't suggesting making a dependency lol, it was just to note an example of how others have tackled this atomicity |
I see, thanks for the link 👍 Taking a look at the implementation does make an important point though: doing this properly across platforms is not trivial. So it does not make sense to pretend we can implement a quick version ourselves that covers all corner cases. The question is what level of security we need though. I was thinking of the following simple algorithm for writing the content of
I think this hopefully would safeguard us from many but not all potential problems. If the current writing caused the bug reported here, this change may solve that, but I don't think we can say that for sure. |
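The algorithm proposed in the comment above is not preserved in this thread. One common pattern for making a write closer to atomic, which may or may not match what was proposed, is to write to a temporary file in the same directory and then rename it over the target. The function name `write_atomically` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path: Path, content: str) -> None:
    """Write ``content`` to ``path`` via a temporary file and a rename."""
    # Create the temporary file in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # push data to disk before the rename
        os.replace(tmp_name, path)  # replaces the target in one step
    except BaseException:
        # On any failure, including Ctrl+C, remove the temporary file and keep the old config.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process is interrupted before the rename, the original file is left untouched. As noted in the thread, this does not cover every corner case on every platform.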
I have opened a PR linked above that makes the writing to disk of the configuration file more atomic. As mentioned, I am not sure that this solves this bug. @yakutovicha how often did the bug occur? Would it be feasible to use this branch in your environment to see if it still happens? Proving a negative is difficult but I am not sure how else to test this or to confirm whether this PR can close this issue. |
@csadorf once had it. Additionally, when running a course for 14 groups (14 AiiDAlab accounts) I had two cases in 4 days. This is all I can say now. @sphuber I am not sure you've already taken care of it, but just in case: did you make sure the timestamp doesn't change when you run |
Yes, this was already fixed in PR #4605 and will be released with @yakutovicha could you still answer the question of what you mean by "erased": is the entire file removed, or is the content simply partially or fully removed? |
ah, sorry, missed that question. In my case, the file remained there, but it was empty. |
Oh, that certainly makes sense: it could be shutil.copy then (overwriting rather than completely deleting). @ramirezfranciscof was this the same for you? |
And just to be sure: when you say "empty" do you mean "completely empty" or "a configuration file with no configured profiles"? |
extremely empty. |
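An existing but empty file is what an interrupted overwrite would leave behind: opening a file for writing truncates it before any new content arrives. The sketch below simulates a Ctrl+C between the open and the first write. It only illustrates the mechanism; the thread does not establish that this is what happened in AiiDA.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.json"
    config.write_text('{"profiles": {}}', encoding="utf-8")

    try:
        with config.open("w", encoding="utf-8") as handle:
            # The file was truncated the moment it was opened.
            raise KeyboardInterrupt  # simulate Ctrl+C before any data is written
    except KeyboardInterrupt:
        pass

    size_after_interrupt = config.stat().st_size  # 0: the file exists but is empty
```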
This was potentially addressed by #4607 so closing for now. Please reopen if the problem persists |
Describe the bug
The aiida configuration file sometimes gets erased when the machine where AiiDA is running is overloaded. More specifically, if one calls verdi and then, while it is running, stops it using Ctrl+C, there is a high risk of losing the config file.
Additional note: every time the verdi command is called, the config file timestamp changes, which suggests that the file gets overwritten.
Expected behaviour
The verdi command should not change the config file unless it is verdi config or a version update.