Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProcessBuilder: fix nested namespaces stay ProcessBuilderNamespace #4983

Merged
merged 2 commits into from
Jun 21, 2021

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jun 10, 2021

Fixes #4630

When a ProcessBuilder is constructed from a PortNamespace each
PortNamespace is converted into a ProcessBuilderNamespace including
the nested PortNamespaces. This guarantees that even nested port
namespaces have the same autocompletion and automatic validation when
assigning values as the top level namespace.

There was a bug however that when assigning a dictionary to a namespace
of the builder, that namespace would lose its ProcessBuilderNamespace
type and become a plain dictionary. Subsequent assignments to ports
within this namespace would then no longer be validated.

The source of the bug was in the __setattr__ of ProcessBuilderNamespace
which would simply assign the value that is being passed to the internal
_data container, losing the ProcessBuilderNamespace class if the
value is a plain dictionary. The solution is to, in the case where an
attribute that is being set corresponds to a PortNamespace, to first
assign a new ProcessBuilderNamespace in the internal _data container
for that given attribute, and then iterate over the contents of the
value dictionary being assigned and call setattr for that on the new
ProcessBuilderNamespace. By calling setattr again, the process
described above is applied recursively, as it should.

This fix also solves another bug that was a direct result of the problem
described above. When passing a dictionary to _update, any existing
content would be fully overwritten instead of merging the existing and
new dictionary, which is what the behavior of update should be, just as
for the normal update method of plain Python dictionaries.

@sphuber sphuber requested a review from giovannipizzi June 10, 2021 20:42
@sphuber
Copy link
Contributor Author

sphuber commented Jun 10, 2021

@giovannipizzi I think this should fix the problem of the original issue and also indirectly fixes the problem you were experiencing. Would be good if you could verify this for your explicit test case.

@codecov
Copy link

codecov bot commented Jun 10, 2021

Codecov Report

Merging #4983 (ef310f1) into develop (ca91040) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4983      +/-   ##
===========================================
+ Coverage    80.12%   80.12%   +0.01%     
===========================================
  Files          515      515              
  Lines        36676    36692      +16     
===========================================
+ Hits         29382    29397      +15     
- Misses        7294     7295       +1     
Flag Coverage Δ
django 74.59% <100.00%> (+0.04%) ⬆️
sqlalchemy 73.52% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/engine/processes/builder.py 92.67% <100.00%> (+1.27%) ⬆️
aiida/transports/plugins/local.py 81.41% <0.00%> (-0.25%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca91040...ef310f1. Read the comment docs.

@giovannipizzi
Copy link
Member

Thanks @sphuber for the quick fix!
Unfortunately it's still not enough - if I run (after the example in issue #4630)
builder._update({'base': {'pw': {'settings': orm.Dict(dict={})}}})

I get this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-904b1e35e7c0> in <module>
----> 1 builder._update({'base': {'pw': {'settings': orm.Dict(dict={})}}})

~/.virtualenvs/aiida-prod/codes/aiida-core/aiida/engine/processes/builder.py in _update(self, *args, **kwds)
    148             for key, value in args[0].items():
    149                 if isinstance(value, collections.abc.Mapping):
--> 150                     self[key].update(value)
    151                 else:
    152                     self.__setattr__(key, value)

/usr/lib/python3.7/_collections_abc.py in update(*args, **kwds)
    839             if isinstance(other, Mapping):
    840                 for key in other:
--> 841                     self[key] = other[key]
    842             elif hasattr(other, "keys"):
    843                 for key in other.keys():

~/.virtualenvs/aiida-prod/codes/aiida-core/aiida/engine/processes/builder.py in __setitem__(self, item, value)
    124 
    125     def __setitem__(self, item, value):
--> 126         self.__setattr__(item, value)
    127 
    128     def __delitem__(self, item):

~/.virtualenvs/aiida-prod/codes/aiida-core/aiida/engine/processes/builder.py in __setattr__(self, attr, value)
     92                 validation_error = port.validate(value)
     93                 if validation_error:
---> 94                     raise ValueError(f'invalid attribute value {validation_error.message}')
     95 
     96             # If the attribute that is being set corresponds to a port that is a ``PortNamespace`` we need to make sure

ValueError: invalid attribute value required value was not provided for 'code'

I can't debug more now (it's late :-) ) but it seems it's trying to "remove" the code and so validation fails?

@sphuber sphuber force-pushed the fix/4630/process-builder-update branch from 09fe198 to 5ca57ed Compare June 11, 2021 06:06
@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2021

Apologies @giovannipizzi . It should be fixed now. I added a test that replicates your problem and tested the exact same example and I think it does the correct thing now.

giovannipizzi
giovannipizzi previously approved these changes Jun 11, 2021
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This indeed works and seems ok!

Only one comment (on the commit message, and on the docstring of _update, where it says that the method functions just as collections.abc.MutableMapping.update).

This is not true. Taking the example from your test, if you run this in python:

a = {'nested': {'namespace': {'int': 1, 'float': 2.0}}}
a.update({'nested': {'namespace': {'int': 5}}})

now a will be {'nested': {'namespace': {'int': 5}}}, and not {'nested': {'namespace': {'int': 5, 'float': 2.0}}} (what our _update does).

That is, python will not "merge" recursively but update only at the first nesting level.
I think what python does is OK, and on the other hand our merging is OK because this is what we want to do in most of the cases (or at least most of the cases I've encountered).

Maybe a better name would have been _merge but I wouldn't change this now (probably).
Rather, I would correct the docstring to say "this is NOT like the update method of a python dictionary" and explain why with an example (e.g. the one above).

I'm approving anyway for now, in case you want to merge this anyway and move this task to a different issue, but maybe it's better if this is done here as well?

@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2021

I'm approving anyway for now, in case you want to merge this anyway and move this task to a different issue, but maybe it's better if this is done here as well?

I will fix it here before merging. No need to reapprove since I am just fixing docstring.

@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2021

Thinking about it some more though, what you originally reported as a bug then was actually correct behavior. You called update with a nested dictionary and it behaved as Mapping.update but you expected an automatic merge. I am starting to think whether we should actually revert that particular change that I made and instead add a _merge method that does what you wanted to happen. Then users can choose what they want and _update actually does what one would expect. What do you think?

@giovannipizzi
Copy link
Member

I see, and I'm ok - the only question is: how do we express this when specifying a namespace? What I really need to do in the common-workflows EOS is this:

    inputs = {
        'structure': structure,
        #'scale_factors': orm.List(list=[0.96, 0.98, 1, 1.02, 1.04]),
        'generator_inputs': {  # code-agnostic inputs for the relaxation
            'engines': engines,
            'protocol': 'precise',
            'relax_type': RelaxType.NONE,
            'electronic_type': ElectronicType.METAL,
            'spin_type': SpinType.NONE,
        },
        'sub_process_class': sub_process_cls_name,
        'sub_process' : {  # optional code-dependent overrides
            'base': {
                'pw': {
                    'settings' : orm.Dict(dict= {
                        'cmdline': ['-nk', '18'],
                    })
                }
            }
        }
    }

and I really want to only replace settings, not the rest. But who's going to decide if specifying the sub_process will update or merge? We let each implementation (in this case EOS) decide what's the behaviour? Or can the user decide when submitting, in some way (I don't know how)

@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2021

But who's going to decide if specifying the sub_process will update or merge? We let each implementation (in this case EOS) decide what's the behaviour? Or can the user decide when submitting, in some way (I don't know how)

It is the generic EquationOfStateWorkChain that decides and it currently calls _update. So this is a choice we can make at the generic workchain level and all implementations will then behave the same. I would say that intuitively (at least how this input namespace was designed) is to have it function as a merge and not an update. So I would simply change the PR as I described and then we update aiida-common-workflows to use _merge instead of _update.

@sphuber sphuber force-pushed the fix/4630/process-builder-update branch 2 times, most recently from 37bd083 to e5bd506 Compare June 11, 2021 12:12
@sphuber
Copy link
Contributor Author

sphuber commented Jun 11, 2021

@giovannipizzi since we are actually changing the code after first review, I pushed without temporarily unchecking the setting to not invalidate your review. Would be good if you can give a quick look to see if it is still ok. Since the reasons for the fixes are clearer now, I have also split them into two separate commits. One is really fixing a bug and the second is just adding the _merge method which is just a feature.

giovannipizzi
giovannipizzi previously approved these changes Jun 11, 2021
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, even if I'm wondering if it's a good idea to keep _recursive_merge as a method of the class - since we're using private methods _merge and _update that should actually be used by the user, I think finding _recursive_merge will be quite confusing (actually it might be more intuitive to think the actual method is _recursive_merge) on which one to use. I'd prefer to move it as a module function so it's not visibile in the class namespace.

If you don't agree just merge, otherwise can you please move out?

@giovannipizzi
Copy link
Member

One comment on these private methods, that I just realised. Since we just don't want to occupy the namespace, maybe the better way to do it (while at the same time avoiding to use private methods) is to append and underscore, rather than prepending? So merge_, and update_, similarly to e.g. what we do for the outline or sqlalchemy does for some methods.

This would avoid asking users to use private methods, and should be OK because (IIRC) also appended underscores are not valid for port names.

What do you think? If you agree, maybe we move to an issue? (This will have to be introduced with a deprecation pattern - or even without deprecating, we can keep the private method, and just define the new update_ that just calls the internal private _update).

@giovannipizzi
Copy link
Member

One final thing, more related to this PR - when would be the right moment to "release" this? We could do 1.6.4 but it's a new feature. And I think we don't want to have a 1.7? So we need to wait, I guess?

@giovannipizzi
Copy link
Member

Hi @sphuber - beside the update_ vs _update, I was thinking even more about this.
The usecase can become even more general. E.g. what if I want only to update one field inside the Settings or Parameters Dict? What if I want to append to a list in it? What if I want to remove a field in a Dict?

I guess having the update and merge functionality is good anyway, but we might want to have more flexibility for the common workflows. Just mentioning it here (and maybe we try to merge this in the meantime), but we can follow up in the corresponding issue on the common workflows repo.

@chrisjsewell
Copy link
Member

chrisjsewell commented Jun 15, 2021

The usecase can become even more general. E.g.

Yeh just to add my two-cents, I really think the whole AttributeDict approach is a little problematic, since Python is really not set up to handle them properly (as opposed to e.g. JavaScript where this is a built-in feature of the language).
They are definitely useful in the IPython context, where you can get dynamic autocompletion (so I completely see why they have been used), but for "internal" APIs, they can get pretty messy (there are so many corner cases to consider and are horrible for type checking).
(I literally added the AiiDA implementation to another one of my packages, but then ended up removing it lol executablebooks/markdown-it-py#66)

Anyway, its very unlikely we would reverse course now, but we should definitely be wary about implementing any more in the future.

@chrisjsewell
Copy link
Member

chrisjsewell commented Jun 15, 2021

Other than my word of warning though, yeh cheers I think the fix is probably as good as you can get

@sphuber
Copy link
Contributor Author

sphuber commented Jun 16, 2021

Quickly responding to @giovannipizzi comments:

  • I agree that it is better to move _recursive_merge should just be a static function, will change this
  • Having underscore at the end might indeed be better, but it is the first time we add this in out API. Maybe we leave this change for a later time since anyway we would have to deprecate _update which already exists.
  • Regarding release: we are merging this now in develop but if important we can cherry-pick this on a release 1.6.x branch. Honestly, having a feature in a patch release is not a big deal. It doesn't break anything. So I would just add it to 1.6.5 together with any other important fixes we want to backport.
  • Finally, whether to extend the functionality to the internals of nodes, I don't think we should do that. The contents of the builder stop at the leaf values. The builder has no business going inside the nodes, even if that also happens to be a mapping such as a Dict node. The problem you described with relation to aiida-common-workflows is a problem specific to that use case and should be solved there, I find.

@sphuber sphuber force-pushed the fix/4630/process-builder-update branch 2 times, most recently from 309b98a to 1f70265 Compare June 18, 2021 12:02
@sphuber sphuber requested a review from giovannipizzi June 18, 2021 12:02
Copy link
Member

@giovannipizzi giovannipizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks!

(Note that there is no test for the keyword use of _merge, as Codecov points out - if you have time you could copy-paste the test - in this case feel free to just consider me approving that test addition!

@sphuber sphuber force-pushed the fix/4630/process-builder-update branch from 1f70265 to dc734d9 Compare June 21, 2021 07:56
sphuber added 2 commits June 21, 2021 10:01
…ce` class

When a `ProcessBuilder` is constructed from a `PortNamespace` each
`PortNamespace` is converted into a `ProcessBuilderNamespace` including
the nested `PortNamespace`s. This guarantees that even nested port
namespaces have the same autocompletion and automatic validation when
assigning values as the top level namespace.

There was a bug however that when assigning a dictionary to a namespace
of the builder, that namespace would lose its `ProcessBuilderNamespace`
type and become a plain dictionary. Subsequent assignments to ports
within this namespace would then no longer be validated.

The source of the bug was in the `__setattr__` of `ProcessBuilderNamespace`
which would simply assign the value that is being passed to the internal
`_data` container, losing the `ProcessBuilderNamespace` class if the
value is a plain dictionary. The solution is to, in the case where an
attribute that is being set corresponds to a `PortNamespace`, to first
assign a new `ProcessBuilderNamespace` in the internal `_data` container
for that given attribute, and then iterate over the contents of the
value dictionary being assigned and call `setattr` for that on the new
`ProcessBuilderNamespace`. By calling `setattr` again, the process
described above is applied recursively, as it should.
The class already had the `_update` method, but like the normal `update`
method of a mapping, this will not recursively merge the contents of
nested dictionaries with any existing values. Anything below the top
level will simply get overwritten.

However, often one wants to merge the contents of the dictionary with
the existing namespace structure only overriding leaf nodes. To this end
the `_merge` method is added which recursively merges the content of the
new dictionary with the existing content.
@sphuber sphuber force-pushed the fix/4630/process-builder-update branch from dc734d9 to ef310f1 Compare June 21, 2021 08:02
@sphuber sphuber merged commit 97a7fa9 into aiidateam:develop Jun 21, 2021
@sphuber sphuber deleted the fix/4630/process-builder-update branch June 21, 2021 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants