-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Protocols: Move all static work chain inputs to protocol #902
Conversation
3eaed59
to
8ebeb75
Compare
I realize that this is always going to be opinion-based, but I actually feel the opposite is true. I find the 2-space indentation very confusing and makes it difficult to see what level of nesting I am looking at. Why would we decide that 4 spaces for code is correct, but not for the configuration? Anyway, since this distracts from this PR and also makes the actual changes to the configs more difficult to spot, could you please remove this for now? Happy to discuss this in a separate issue/PR and if others users also really prefer 2-space, then I won't stop it. Once the tests are then also running, I will review this. Thanks! |
8ebeb75
to
c9c3733
Compare
Sure! My preference for 2-space indentations may also be due to the fact that VScode adds vertical lines for the indentation in YAML files. The more compact 2-space indentations make it easier to see settings at a glance for me. But it's not critical.
Edit: Nevermind, that was kind of unnecessary anyways! |
c9c3733
to
4a61d66
Compare
My editor also shows the tab-lines, but with 2-spaces, it just all merges into one big blob of lines 😅 |
This may be related to the fact that you use such ridiculously small fonts that only your Dutch-elvish eyes can read 👀 |
b8e7a46
to
fe8938a
Compare
Alright, I also added the restart inputs explicitly, and switched to using |
Before going into the review, two points that came to mind:
|
Thanks for your thoughts, @sphuber!
That's definitely true. We should probably try to report such dynamically set inputs so the user can more easily find them.
Another good point! I would argue beginners are now immediately introduced to the I'm all for making it super duper clear that the using the
Are there? In the QE docs on the https://www.quantum-espresso.org/Doc/INPUT_PW.html#idm65 The
Haven't looked at these in the context of this PR, but happy to make it more "holistic", i.e. try to implement this principle downstream. Will give a crack at this tomorrow and report back! |
One additional thing to note here is that we should really try and come at this from the perspective of the user, and perhaps it would be good to write down as many use cases as we can think of. |
Ok, I'm on board. Let's go in that direction then that protocols should be as complete as possible and
As I understood it, this is to restart an interrupted NSCF calculation. But reading the docs again, it is a bit ambiguous, and thinking about it now, I am not sure if an NSCF calculation could be performed partially and shutdown cleanly and then restarted to perform the rest. Maybe it can actually do part of all the k-points and then write to disk, and you can restart to do the final set of k-points? Would be interesting to test.
Fully agree, but that is the tricky point here. For novice users, you want to do as much as possible in an automatic way, but also correct inputs that are most likely incorrect. This is where it becomes difficult though, because sometimes there are expert users who do something on purpose and there is nothing more frustrating than the code overriding your desired behavior and not allowing to change it. But in that sense, maybe the |
fe8938a
to
a341ff7
Compare
Well, I stand corrected, you actually can ^^. And this will indeed only work when I'm still conflicted on what to do with this code though: aiida-quantumespresso/src/aiida_quantumespresso/workflows/pw/base.py Lines 247 to 252 in 5cae75f
Let me go over some use cases to discuss this change and if we want to keep it. Use case ALet me first repeat here the use case you mentioned here that was the reason for adding this code:
Example run of use case.builder = PwBaseWorkChain.get_builder_from_protocol(
code=pw_code,
structure=structure,
)
builder.clean_workdir = orm.Bool(False)
builder.max_iterations = orm.Int(1)
builder.pw.parameters['CONTROL']['max_seconds'] = 20
wc_node = submit(builder)
while not wc_node.is_finished:
time.sleep(5)
parent_folder = wc_node.called_descendants[-1].outputs.remote_folder
restart_builder = wc_node.get_builder_restart()
restart_builder.pw.parent_folder = parent_folder
submit(restart_builder) A couple of comments here:
Without the code that sets the
A bit tedious perhaps, but not insurmountable. And the validation warning implemented on the Use case BThe user has successfully completed a By adding the code above,
I can probably think of others. The user could of course explicitly set Use case CMany work chains rely on restarts from previous steps in the work chain. The example that we try to fix in this PR is the one raised by @unkcpz (#877), where the restart for the band structure calculation suddenly needs to explicitly set ConclusionAlthough the addition of the code above in 0aba276 makes it slightly more straightforward to restart from interrupted runs, it complicates many other use cases in a manner that can be confusing and frustrating for the user. We could think of ways to restrict when the I think we should:
Side tangent: trying to set the restart_builder.pw.parameters['CONTROL']['restart_mode'] = 'restart' since the node is the same one as the already run In [1]: d = Dict({'a': 1})
...: d['b'] = 2
...: print(d.get_dict())
{'a': 1, 'b': 2}
In [2]: d.store()
...: d['c'] = 3
...: print(d.get_dict())
---------------------------------------------------------------------------
ModificationNotAllowed Traceback (most recent call last)
Cell In[2], line 2
1 d.store()
----> 2 d['c'] = 3
3 print(d.get_dict())
File ~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/data/dict.py:71, in Dict.__setitem__(self, key, value)
70 def __setitem__(self, key, value):
---> 71 self.base.attributes.set(key, value)
File ~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/attributes.py:118, in NodeAttributes.set(self, key, value)
110 def set(self, key: str, value: Any) -> None:
111 """Set an attribute to the given value.
112
113 :param key: name of the attribute
(...)
116 :raise aiida.common.ModificationNotAllowed: if the entity is stored
117 """
--> 118 self._node._check_mutability_attributes([key]) # pylint: disable=protected-access
119 self._backend_node.set_attribute(key, value)
File ~/envs/aiida-dev/code/aiida-core/aiida/orm/nodes/node.py:209, in Node._check_mutability_attributes(self, keys)
202 """Check if the entity is mutable and raise an exception if not.
203
204 This is called from `NodeAttributes` methods that modify the attributes.
205
206 :param keys: the keys that will be mutated, or all if None
207 """
208 if self.is_stored:
--> 209 raise exceptions.ModificationNotAllowed('the attributes of a stored entity are immutable')
ModificationNotAllowed: the attributes of a stored entity are immutable But trying to change nested content of the In [3]: d = Dict({'a': {'b': None, 'c': None}})
...: d['a']['b'] = 2
...: print(d.get_dict())
{'a': {'b': 2, 'c': None}}
In [4]: d.store()
...: d['a']['c'] = 3
...: print(d.get_dict())
{'a': {'b': 2, 'c': None}} |
Thanks for the detailed writeup @mbercx , this is super important and useful. Let me address your points, starting with the last:
I definitely remember writing about this in an issue and basically coming to the conclusion that it will be very difficult to come up with a solution for this. Essentially what is happening with the following code: restart_builder.pw.parameters['CONTROL']['restart_mode'] = 'restart' First def __getitem__(self, key):
try:
return self.base.attributes.get(key)
except AttributeError as exc:
raise KeyError from exc this will get the attribute from the database with that name and return it. The return type is a plain Python dictionary, let's call it The only "solution" I see here is to have Then, as for your other points. I see how helping making one use-case easier, may hamstring another user. Maybe it is true that in the end, it is quite complicated and we cannot warn/prevent for all use-cases and will have to leave some up to documentation.
Don't see a problem with that, fine to open a PR for this.
👍
Fine, but I will be sending all complaints of people, trying to restart and the calculation failing again because it is restarting from scratch, to you 😄 |
Hehe, agreed! I think there is more risk of someone using the Regarding the documentation, I'm always sort of paralysed when I start trying to add something to the docs, because there are so many things I would change I feel I would go down a documentation rabbit hole that would cause me sleep deprivation. So I've opened an issue for this instead: I think we should have a meeting to discuss the documentation again, also in light of this decision:
EDIT: on the following:
I see, quite tricky indeed... I understand a solution is not simple, but I think we should thinking about doing so for stored nodes. It's quite an insidious bug that this passes without warning/error. Let me try and dig up some issues on this. EDIT: Couldn't find an issue for this, so opened aiidateam/aiida-core#5970 |
8fb3cc1
to
e7a018f
Compare
CELL: | ||
press_conv_thr: 0.5 | ||
CONTROL: | ||
calculation: scf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not ask questions how a pressure convergence condition was the only part of the "scf" step of the PwRelaxWorkChain
protocol. 😅
e7a018f
to
89a06f6
Compare
PwBandsWorkChain
: Move inputs to protocol
@sphuber OP message updated for new commit message, tests adapted. Ready for review! |
2e3cce5
to
969d559
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mbercx . Just a single request for change, then we can merge.
That being said, I do think it is very important we add as fast as possible some minimal documentation that at the very least states that the workchains do not automatically set required defaults when launched directly and they can fail, and that therefore they are recommended to use the get_builder_from_protocol
. I am pretty sure that there will be quite some users that do not use the protocols and will not set all the proper defaults because the workchains used to do that.
CONTROL: | ||
calculation: bands | ||
ELECTRONS: | ||
diagonalization: paro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value was added in QE 6.6? Since that is the oldest version we support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yessir, see
@@ -229,7 +229,6 @@ def run_scf(self): | |||
inputs.metadata.call_link_label = 'scf' | |||
inputs.pw.structure = self.ctx.current_structure | |||
inputs.pw.parameters = inputs.pw.parameters.get_dict() | |||
inputs.pw.parameters.setdefault('CONTROL', {})['calculation'] = 'scf' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are removing this, maybe we move line 231 to the if self.ctx.current_number_of_bands:
block, since then we only unwrap-and-wrap the parameters if we really need to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to still add this change @mbercx ? Then this can be merged
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this comment somehow. On it! 🚀
7be29e6
to
bd9c01d
Compare
bd9c01d
to
a193fa1
Compare
Several "default" inputs of the `PwBandsWorkChain` are set inside one of the steps of the work chain, which: 1. Can be confusing for the user, since the expected inputs are not there in the input files. 2. Can be frustrating for the user when a different value is desirable, for a use case that may not immediately be obvious. 3. Means default values are specified _both_ in the work chain logic and protocol, making it more difficult to get a clear overview of the input parameters. Here we move the specification of the default inputs to the protocol file of the `PwBandsWorkChain` (`bands.yaml`). We also explicitly set the restart inputs to make sure the `bands` calculations restarts from the charge density by default. Since the default protocol now correctly sets the calculation type to `bands`, the `validate_inputs` validator of the `PwCalculation` will raise a warning because a `parent_folder` has not been initially provided. Hence, we set the `validate_inputs_base` validator for the `pw` port of the `bands` namespace, as is also done for e.g. the `nscf` of the `PdosWorkchain`.
This is already the default in QE anyways, and is specified as such in the protocol.
a193fa1
to
6ecb6bd
Compare
Agreed. I would love to give the documentation some love soon. Will try to find the time later this week! |
Fixes #830
Some "default" inputs are set inside one of the steps of several work chains, which:
Here we move the specification of all static default inputs to the corresponding protocol file. Additionally:
bands
, thevalidate_inputs
validator of thePwCalculation
will raise a warning because aparent_folder
has not been initially provided. Hence, we set thevalidate_inputs_base
validator for thepw
port of thebands
namespace, as is also done for e.g. thenscf
of thePdosWorkchain
.base_final_scf
part of thePwRelaxWorkChain
protocol, that erroneously specifiedCELL.press_conv_thr
.PwBaseWorkChain.set_max_seconds
toPwBaseWorkChain.prepare_process
. Having a separate method for these few lines of code seemed unnecessary, especially since it's the only step that is executed inprepare_process
.RestartType.FULL
in case theparent_folder
input is provided for thePwBaseWorkChain
. There are simply too many different use cases of restarting I think it's difficult if not impossible to consider all of them, and in several known use cases this code addition would cause more harm than good.