Running several verdi run processes independently crashes due to UniquenessError when creating the auto groups #997
Comments
The problem is that @lekah was launching his calculations through a function that was called with The core problem is that the |
Probably the solution is to document that AiiDA is not 'thread-safe'.
@giovannipizzi do you have any suggestion on where to put this in the documentation?
Mmm... I'm not sure... I think this goes in hand with all the discussion we had about checking all other thread/process-safety (e.g. when storing attributes, extras, ...). I don't remember if we said this can happen for the 1.0? Anyway, thinking back to this after a few months, I think that if we manage to fix the |
Small script to create 30 nodes "quickly":

    #!/usr/bin/env runaiida
    import os
    import time
    from datetime import datetime
    from aiida.orm import Data

    N = 30
    # Print the PID so the two parallel runs can be told apart
    current_pid = os.getpid()
    print(current_pid)
    nodes = []
    for i in range(N):
        time.sleep(0.1)
        # Storing a node is what triggers the autogroup creation
        data = Data().store()
        nodes.append(data.pk)
    print(nodes)

You can save the script above in a file, e.g. spawn.py, and run it twice with ( ./spawn.py & ) ; ( ./spawn.py & ) so that the two copies run in parallel. The second will crash with a traceback similar to the one in the issue description.
This also removes an overzealous isinstance check and moves additional checks into a cached function that is run only when storing the very first node (the first one that needs to be put in an autogroup), making the storing of nodes faster (although timings oscillate, so it's hard to estimate exactly by how much). Also, added logic to allow for concurrent creation of multiple groups (and a test). This fixes aiidateam#997
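To illustrate the "cached function that is run only when storing the very first node" idea, here is a minimal sketch using `functools.lru_cache`. It is not the code from the commit: `autogroup_checks_passed`, `store_node` and the group name are hypothetical names chosen for the example, and the check itself is a placeholder.

```python
from functools import lru_cache

calls = []  # records how often the (hypothetical) expensive checks really run


@lru_cache(maxsize=None)
def autogroup_checks_passed(group_name):
    """Hypothetical one-off validation; the result is cached per group name."""
    calls.append(group_name)
    return bool(group_name.strip())


def store_node(group_name):
    """Hypothetical stand-in for storing a node into an autogroup."""
    # Only the first call per group name pays for the checks;
    # later calls are answered from the cache.
    if not autogroup_checks_passed(group_name):
        raise ValueError("invalid autogroup name")
    return True


for _ in range(30):
    store_node("autogroup-1")
```

With this pattern the checks run once for the first stored node and the remaining 29 calls hit the cache, which is the kind of saving the commit message describes.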
I found a case of a calculation going into SUBMISSIONFAILED with the following log-message:
*** 1739620 [Li27Na27C27O81-1000-branching-19-NVE-5-0]: SUBMISSIONFAILED
*** Scheduler output: N/A
*** Scheduler errors: N/A
*** 1 LOG MESSAGES:
+-> ERROR at 2017-12-17 20:29:05.260324+00:00
| Submission of calc 1739620 failed, check also the log file! Traceback: Traceback (most recent call last):
| File "/home/kahle/git/AiiDA/aiida_core-screening/aiida/daemon/execmanager.py", line 623, in submit_calc
| remotedata.store()
| File "/home/kahle/git/AiiDA/aiida_core-screening/aiida/orm/implementation/general/node.py", line 1571, in store
| name=group_name, type_string=VERDIAUTOGROUP_TYPE)[0]
| File "/home/kahle/git/AiiDA/aiida_core-screening/aiida/orm/implementation/general/group.py", line 149, in get_or_create
| bla = cls(*args, **kwargs).store(), True
| File "/home/kahle/git/AiiDA/aiida_core-screening/aiida/orm/implementation/django/group.py", line 127, in store
| raise UniquenessError("A group with the same name (and of the "
| UniquenessError: A group with the same name (and of the same type) already exists, unable to store
I suspect that the get_or_create method of Group is not safe in a multi-threaded or multi-process environment, because the underlying implementation is not an 'INSERT IF' construct but a query followed by a separate store. Two processes can both find no group and then both try to create it, which violates the uniqueness constraint.
We need to make the get_or_create method thread-safe.
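One common way to make such a method safe is to attempt the insert first and rely on the database's unique constraint, falling back to a query only if the insert is rejected. The sketch below shows the pattern with the standard-library `sqlite3` module; it is not AiiDA's implementation. The table `db_group`, the function `get_or_create_group` and the type string `example.autogroup` are assumptions made for the example.

```python
import sqlite3


def get_or_create_group(conn, name, type_string):
    """Return (group_id, created) without a query-then-insert race."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO db_group (name, type_string) VALUES (?, ?)",
                (name, type_string),
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Someone else created the group first: fetch the existing row
        row = conn.execute(
            "SELECT id FROM db_group WHERE name = ? AND type_string = ?",
            (name, type_string),
        ).fetchone()
        return row[0], False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_group ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " type_string TEXT NOT NULL,"
    " UNIQUE (name, type_string))"  # the constraint does the arbitration
)

first = get_or_create_group(conn, "autogroup-1", "example.autogroup")
second = get_or_create_group(conn, "autogroup-1", "example.autogroup")
```

Because the uniqueness check happens inside the database, the loser of a race gets an `IntegrityError` that is handled locally instead of an unhandled UniquenessError, and both callers end up with the same group.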
Also, what is the point of grouping submitted calculations? Is that a necessary feature?