Make the `execmanager.retrieve_calculation` idempotent'ish #3142
The `retrieve_calculation` function would raise an exception if called multiple times for the same calculation, which can happen if the runner was interrupted the first time it was working on it, for example due to a daemon shutdown. The reason is that, the second time around, adding the `retrieved` folder data node raises a uniqueness exception, because there can only be one output with the same label.

Note that full idempotency is impossible, but this change should make the problem a lot less likely to occur. The idea is to delay the actual attaching of the retrieved folder data node to the last possible moment. This way, if the method is called again and the folder is already there, we can be reasonably sure that the files were already retrieved successfully and we simply return, making the call a no-op.

The check is done at the beginning of the function, using the `LinkManager.first()` call to see whether the output node already exists. If the node exists, the retrieve function has apparently been called before and reached the end of the function, where it adds the retrieved folder. This means all the files were already successfully retrieved, so we can safely skip.

The `LinkManager.first()` method is adapted so that, instead of raising a `ValueError` when no entry is found at all, it simply returns `None`. This is more consistent with the behavior of `QueryBuilder.first()`.
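The guard described above can be illustrated with a minimal, self-contained sketch. It does not use the real AiiDA classes: `ToyCalculation`, `first_output`, `add_output` and `retrieve_sketch` are hypothetical names made up for illustration, and the uniqueness rule is simulated with a plain dictionary.

```python
class ToyCalculation:
    """Hypothetical stand-in for a calculation node with labelled outputs."""

    def __init__(self):
        self.outputs = {}

    def first_output(self, label):
        # Returns None when no output with this label is attached.
        return self.outputs.get(label)

    def add_output(self, label, node):
        # Simulates the uniqueness constraint on output labels.
        if label in self.outputs:
            raise ValueError(f"output with label {label!r} already exists")
        self.outputs[label] = node


def retrieve_sketch(calc, fetch_files):
    """Return True if files were retrieved, False if the call was a no-op."""
    # Guard at the start: an attached folder means an earlier call finished.
    if calc.first_output("retrieved") is not None:
        return False
    # The retrieval may be interrupted here; nothing is attached yet.
    files = fetch_files()
    # Attaching the output is delayed to the very last step.
    calc.add_output("retrieved", files)
    return True
```

Calling `retrieve_sketch` twice on the same `ToyCalculation` does the work once and skips the second time. If `fetch_files` raises part-way through, nothing has been attached, so a later retry starts from scratch instead of hitting the uniqueness error.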
19e3e84 to 81de7b2
Ok - ideally, the final solution would be to make `add_incoming` an atomic operation (wrapping it in a transaction?). I think we can approve this; could you add a note to check/improve this in the issue where we should address atomicity, transactions, etc.?
I also agree with returning `None` in `first()`; this is consistent with SQLAlchemy's `first()` method, as opposed to `one()`, which raises instead.
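As a rough illustration of what "wrapping it in a transaction" could buy, here is a small sketch using Python's standard `sqlite3` module rather than AiiDA's actual storage layer. The table layout and the helper names (`make_db`, `attach_output`, `count_nodes`) are assumptions made up for this example; the point is only that the node and its link are stored together or not at all.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a unique (calc_id, label) constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, kind TEXT)")
    conn.execute(
        "CREATE TABLE links (calc_id INTEGER, label TEXT, node_id INTEGER, "
        "UNIQUE (calc_id, label))"
    )
    conn.commit()
    return conn


def attach_output(conn, calc_id, label):
    """Store a node and its link in one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute("INSERT INTO nodes (kind) VALUES ('folder')")
            conn.execute(
                "INSERT INTO links (calc_id, label, node_id) VALUES (?, ?, ?)",
                (calc_id, label, cur.lastrowid),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate label: the node insert above was rolled back as well.
        return False


def count_nodes(conn):
    """Return the number of stored nodes."""
    return conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
```

In this sketch a second `attach_output(conn, 1, "retrieved")` fails on the unique constraint and leaves no orphaned node behind, because both inserts belong to the same transaction.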
Fixes #3141
The `retrieve_calculation` would cause an exception if called multiple times for the same calculations, which can happen if the first time that the runner was working on it got interrupted, for example due to a daemon shutdown. The reason is that the second time around the adding of the `retrieved` folder data node will raise a uniqueness exception, because there can only be one output with the same label.

Note that full idempotency is impossible, but this change should make the problem a lot less likely to occur. The idea is to delay the actual attaching of the retrieved folder data node to the last moment possible. This way, if the method is called again and the folder is already there, we can be reasonably sure that the files were already retrieved successfully and we simply return, leaving the call a no-op. This is done in the beginning of the function to check if the output node already exists using the `LinkManager.first()` call.

The `LinkManager.first()` method is adapted to, instead of raising a `ValueError` when no entry is found at all, simply return `None`. This is more consistent with the behavior of `QueryBuilder.first()`.
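The difference between the two `first()` behaviours for the empty case can be shown with two toy helpers. These are not the `LinkManager` implementation; `first_strict` and `first_lenient` are hypothetical functions that only contrast raising on "no entry" with returning `None`.

```python
def first_strict(entries):
    """Old-style behaviour: raise ValueError when there is no entry."""
    entries = list(entries)
    if not entries:
        raise ValueError("no entries found")
    return entries[0]


def first_lenient(entries):
    """Adapted behaviour: return None when there is no entry."""
    return next(iter(entries), None)
```

With the lenient variant a caller can write `if first_lenient(outputs) is not None: return` as a guard, without a `try`/`except` around the lookup.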