Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent change from #263 breaks my workchain #267

Closed
danielhollas opened this issue Apr 15, 2023 · 20 comments
Closed

Recent change from #263 breaks my workchain #267

danielhollas opened this issue Apr 15, 2023 · 20 comments
Assignees

Comments

@danielhollas
Copy link

While testing my workchain with plumpy 0.21.6 for the recent boolean fix in #265, I noticed my workchain is broken in another way now (not present in version 0.21.5). I verified that the breakage indeed comes from the second feature introduced in 0.21.6 in #263.

Specifically, my workchain excepts right after submission, during the input port validation. Here's a detailed traceback.

Traceback
/opt/conda/lib/python3.9/site-packages/plumpy/ports.py in validate_ports(self, port_values, breadcrumbs)
    722         """
    723         for name, port in self._ports.items():
--> 724             validation_error = port.validate(port_values.pop(name, UNSPECIFIED), breadcrumbs)
    725             if validation_error:
    726                 return validation_error

/opt/conda/lib/python3.9/site-packages/plumpy/ports.py in validate(self, port_values, breadcrumbs)
    648 
    649         # In all other cases, validate all input ports explicitly specified in this port namespace
--> 650         validation_error = self.validate_ports(port_values, breadcrumbs_local)
    651         if validation_error:
    652             return validation_error

/opt/conda/lib/python3.9/site-packages/plumpy/ports.py in validate_ports(self, port_values, breadcrumbs)
    722         """
    723         for name, port in self._ports.items():
--> 724             validation_error = port.validate(port_values.pop(name, UNSPECIFIED), breadcrumbs)
    725             if validation_error:
    726                 return validation_error

/opt/conda/lib/python3.9/site-packages/plumpy/ports.py in validate(self, port_values, breadcrumbs)
    664                 message = self.validator(port_values_clone)  # type: ignore # pylint: disable=not-callable
    665             else:
--> 666                 message = self.validator(port_values_clone, self)  # pylint: disable=not-callable
    667             if message is not None:
    668                 assert isinstance(message, str), \

/opt/conda/lib/python3.9/site-packages/aiida/engine/processes/calcjobs/calcjob.py in validate_calc_job(inputs, ctx)
     68 
     69     code = inputs.get('code', None)
---> 70     computer_from_code = code.computer
     71     computer_from_metadata = inputs.get('metadata', {}).get('computer', None)
     72 

AttributeError: 'NoneType' object has no attribute 'computer'

The error is strange because it comes from calcjob input port validation for input port that should not exist.
Specifically, in my workchain I am calling a aiida-orca calcjob twice in different contexts, so I am exposing the calcjob inputs into two different namespaces. However, I am excluding the code node to prevent duplication, and specify it manually on the workchain itself, and then manually passing it to the calcjob upon submission within the workchain.

Here's the workflow

https://github.com/ispg-group/aiidalab-ispg/blob/7bbff87cb1576d43aca8921cfd34825963f3e159/workflows/aiidalab_atmospec_workchain/__init__.py#L93

The relevant snippets

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(
            OrcaBaseWorkChain, namespace="opt", exclude=["orca.structure", "orca.code"]
        )
        spec.expose_inputs(
            OrcaBaseWorkChain, namespace="exc", exclude=["orca.structure", "orca.code"]
        )
        spec.input("structure", valid_type=(StructureData, TrajectoryData))
        spec.input("code", valid_type=Code)
        .
        .
        .
   def optimize(self):
        """Optimize geometry"""
        inputs = self.exposed_inputs(
            OrcaBaseWorkChain, namespace="opt", agglomerate=False
        )
        inputs.orca.structure = self.inputs.structure
        inputs.orca.code = self.inputs.code

        calc_opt = self.submit(OrcaBaseWorkChain, **inputs)
        calc_opt.label = "optimization"
        return ToContext(calc_opt=calc_opt)

I believe this is a valid approach but please correct me if I am wrong.

I was quite puzzled as to why this should be broken by #263 because I am not explicitly using dynamic namespaces here, but I guess they are used internally.

Here's what seems to happen. The get_port function that was modified in #263 originally returned ValueError for the code nodes that were not exposed as inputs, but that was fine because in that case the validation was skipped, see

https://github.com/aiidateam/aiida-core/blob/9de3f90a157ec7c9f99c1f17f3ca3548f6d96bba/aiida/engine/processes/calcjobs/calcjob.py#L54

But now it seems that the missing port is somehow created dynamically, but with None value which trips the validation code.

Note that as a somewhat separate issue, the uncought exception is particularly confusing, because the code is not guarded against the None value, even though it clearly permits it in just the previous line.

https://github.com/aiidateam/aiida-core/blob/9de3f90a157ec7c9f99c1f17f3ca3548f6d96bba/aiida/engine/processes/calcjobs/calcjob.py#L69


Not sure if this is something to be fixed in plumpy or aiida-core but I am creating the issue here since the root cause was a plumpy change in #263.

CC @sphuber

NOTE: In my actual application, I am not calling the abovementioned workchain directly, but from within another, master workchain. Not sure if that is pertinent to the issue.

https://github.com/ispg-group/aiidalab-ispg/blob/7bbff87cb1576d43aca8921cfd34825963f3e159/workflows/aiidalab_atmospec_workchain/__init__.py#L341

@sphuber
Copy link
Collaborator

sphuber commented Apr 15, 2023

Thanks for the detailed report @danielhollas . I am a bit confused as well. I am pretty sure I know where the error comes from, however, I don't understand how 0.21.6 would have broken that. Is it possible that it didn't show up with 0.21.5 simply because the other bug that was fixed in 0.21.6 hit before the issue you report here?

The error message you see comes from the validator that is set on the top-level input namespace of the base CalcJob
https://github.com/aiidateam/aiida-core/blob/9de3f90a157ec7c9f99c1f17f3ca3548f6d96bba/aiida/engine/processes/calcjobs/calcjob.py#L42

The code input is not required (on the port declaration itself), but the validator will check that it has been defined if there is also no remote_folder input, which would be a calcjob import run and is the only case where a code input is not necessary. In your workchain, you exclude the code input, but the validator on the CalcJob top-level inputs remains. This is a problem that has cropped up before and we don't have a better solution for this now.

The design problem boils down to this:

  • We want to allow validators that are "multi-port", since some validation rules for ports are dependent on the values for other ports
  • We want ports to be exposable to allow easy composition of workchains
  • When exposing, it is sometimes useful to be able to exclude certain ports so they can be added once in a different place in the namespace
  • Validators on exposed port(namespace)s can be invalidated if the namespace they are contained within is exposed with certain ports excluded

The problem is that it is impossible (practically) to automatically correct validators, or even detect if they would be affected by its namespace being exposed with only part of its ports, which would allow us maybe to disable it.

For now, the pragmatic solution has been for validators to account for ports being disabled. The CalcJob.validator does this in fact, where it checks whether the code and metadata.computer ports are actually still there:https://github.com/aiidateam/aiida-core/blob/main/aiida/engine/processes/calcjobs/calcjob.py#L54-L59
If not, it will skip the validation.

I think the conclusion then, is that there simply is a bug in the CalcJob validator:https://github.com/aiidateam/aiida-core/blob/9de3f90a157ec7c9f99c1f17f3ca3548f6d96bba/aiida/engine/processes/calcjobs/calcjob.py#L69
It makes no sense to try and get the code input and catch the exception if it doesn't exist and set it to None, to only proceed and attempt to get the computer, which will always result in the AttributeError that you hit. I think in this case, there should also be a check that the namespace still has a code input, and only then try to get it from the values. Otherwise, skip validation.

I will transfer the issue and open a PR in aiida-core to see if that fixes it

@sphuber sphuber transferred this issue from aiidateam/plumpy Apr 15, 2023
@sphuber sphuber self-assigned this Apr 15, 2023
@danielhollas
Copy link
Author

Is it possible that it didn't show up with 0.21.5 simply because the other bug that was fixed in 0.21.6 hit before the issue you report here?

I am sure because I have patched my workflow against the boolean issue when testing this. Also there really were not many other changes between 0.21.3 (which I was using until now) and 0.21.6.

@danielhollas
Copy link
Author

The problem is that it is impossible (practically) to automatically correct validators, or even detect if they would be affected by its namespace being exposed with only part of its ports, which would allow us maybe to disable it.

Yeah, that's tough.

I should maybe clarify since it is not clear from the traceback, that in my particular example, the error actually is triggered when I am submitting the workchain, not the CalcJob itself. So it looks like the input ports are validated recursively when the workchain is submitted, which is where the problem is because at that point not all the CalcJob ports are present. But I assume the same validator is then run again when the actual CalcJob is submitted.

@sphuber
Copy link
Collaborator

sphuber commented Apr 16, 2023

I should maybe clarify since it is not clear from the traceback, that in my particular example, the error actually is triggered when I am submitting the workchain, not the CalcJob itself. So it looks like the input ports are validated recursively when the workchain is submitted, which is where the problem is because at that point not all the CalcJob ports are present.

This is by design though and therefore expected. That is the reason the exclude key was added to expose_outputs to remove ports that won't be defined at launch time of the wrapping workchain.

But I assume the same validator is then run again when the actual CalcJob is submitted.

Indeed, but then it will be validated independent of the context of the wrapping workchain. So if the wrapper excluded a certain port, when you launch the actual sub class in that workchain, the original spec is used including the port that was excluded.

@sphuber
Copy link
Collaborator

sphuber commented Apr 17, 2023

I had a closer look and am not sure why your example is failing. I created this MWE that should reproduce your use-case:

#!/usr/bin/env python
from plumpy import Process


class SubProcess(Process):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('a')
        spec.input('b')
        spec.inputs.validator = cls.validate_inputs

    @classmethod
    def validate_inputs(cls, value, ctx):
        try:
            ctx.get_port('a')
        except ValueError:
            print('a not in ctx')
            return None

        if not isinstance('a', str):
            return f'input `a` should be a str, got: {type(str)}'


class MasterProcess(Process):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(SubProcess, 'sub', exclude=('a',))
        spec.input('a')


print(MasterProcess.spec())
inputs = {
    'a': 'test',
    'sub': {
        'b': 'alt'
    }
}

print('VALIDATION:', MasterProcess.spec().inputs.validate(inputs))

It passes validation just fine and prints:

{
    "inputs": {
        "_attrs": {
            "default": [],
            "dynamic": false,
            "help": null,
            "required": "True",
            "valid_type": "None"
        },
        "a": {
            "name": "a",
            "required": "True"
        },
        "sub": {
            "_attrs": {
                "default": [],
                "dynamic": false,
                "help": null,
                "required": "True",
                "valid_type": "None"
            },
            "b": {
                "name": "b",
                "required": "True"
            }
        }
    },
    "outputs": {
        "_attrs": {
            "default": [],
            "dynamic": false,
            "help": null,
            "required": "True",
            "valid_type": "None"
        }
    }
}
a not in ctx
VALIDATION: None

Now as you already noticed, it probably has to do that get_port will now create the port on the fly if the namespace is dynamic. In my example, it clearly is not dynamic and so the code works as expected. Looking at your code that also seems to be the case, but maybe it is not? It would be very useful if you could run print(OrcaWignerSpectrumWorkChain.spec()) to make sure those exposed namespaces aren't dynamic after all. My suspicion is that they are, because otherwise I wouldn't be able to explain the behavior.

@sphuber
Copy link
Collaborator

sphuber commented Apr 17, 2023

@danielhollas On another note, we had planned to release v2.3.0 today. Since this change was introduced in plumpy and the latest aiida-core release specified 0.21.3 that would also automatically install the latest plumpy, it is not v2.3 that is really breaking this. I will therefore go ahead and release anyway. We can then continue investigating this and release a patch for plumpy or aiida-core as well, as needed.

@danielhollas
Copy link
Author

danielhollas commented Apr 17, 2023

Here's the output of print(OrcaWignerSpectrumWorkChain.spec()). Indeed the namespace is dynamic.

{
    "inputs": {
        "_attrs": {
            "default": [],
            "dynamic": false,
            "help": null,
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
        },
        "code": {
            "name": "code",
            "non_db": "False",
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.code.installed.InstalledCode'>"
        },
        "exc": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": null,
                "required": "True",
                "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
            },
            "clean_workdir": {
                "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcd0afa29d0>",
                "help": "If `True`, work directories of all called calculation jobs will be cleaned at the end of execution.",
                "name": "clean_workdir",
                "non_db": "False",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
            },
            "handler_overrides": {
                "help": "Mapping where keys are process handler names and the values are a dictionary, where each dictionary can define the ``enabled`` and ``priority`` key, which can be used to toggle the values set on the original process handler declaration.",
                "name": "handler_overrides",
                "non_db": "False",
                "required": "False",
                "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
            },
            "max_iterations": {
                "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcd0c73ca60>",
                "help": "Maximum number of iterations the work chain will restart the process to finish successfully.",
                "name": "max_iterations",
                "non_db": "False",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.int.Int'>"
            },
            "metadata": {
                "_attrs": {
                    "default": [],
                    "dynamic": false,
                    "help": null,
                    "required": "False",
                    "valid_type": "None"
                },
                "call_link_label": {
                    "default": "CALL",
                    "help": "The label to use for the `CALL` link if the process is called by another process.",
                    "name": "call_link_label",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "description": {
                    "help": "Description to set on the process node.",
                    "name": "description",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "label": {
                    "help": "Label to set on the process node.",
                    "name": "label",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "store_provenance": {
                    "default": "True",
                    "help": "If set to `False` provenance will not be stored in the database.",
                    "name": "store_provenance",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                }
            },
            "orca": {
                "_attrs": {
                    "default": [],
                    "dynamic": true,
                    "help": null,
                    "required": "True",
                    "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
                },
                "file": {
                    "_attrs": {
                        "default": [],
                        "dynamic": true,
                        "help": "Additional input files like gbw or hessian",
                        "required": "False",
                        "valid_type": "<class 'aiida.orm.nodes.data.singlefile.SinglefileData'>"
                    }
                },
                "metadata": {
                    "_attrs": {
                        "default": [],
                        "dynamic": false,
                        "help": null,
                        "required": "False",
                        "valid_type": "None"
                    },
                    "call_link_label": {
                        "default": "CALL",
                        "help": "The label to use for the `CALL` link if the process is called by another process.",
                        "name": "call_link_label",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'str'>"
                    },
                    "computer": {
                        "help": "When using a \"local\" code, set the computer on which the calculation should be run.",
                        "name": "computer",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'aiida.orm.computers.Computer'>, <class 'NoneType'>)"
                    },
                    "description": {
                        "help": "Description to set on the process node.",
                        "name": "description",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "dry_run": {
                        "default": "False",
                        "help": "When set to `True` will prepare the calculation job for submission but not actually launch it.",
                        "name": "dry_run",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'bool'>"
                    },
                    "label": {
                        "help": "Label to set on the process node.",
                        "name": "label",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "options": {
                        "_attrs": {
                            "default": [],
                            "dynamic": false,
                            "help": null,
                            "required": "False",
                            "valid_type": "None"
                        },
                        "account": {
                            "help": "Set the account to use in for the queue on the remote computer",
                            "name": "account",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "additional_retrieve_list": {
                            "help": "List of relative file paths that should be retrieved in addition to what the plugin specifies.",
                            "name": "additional_retrieve_list",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'list'>, <class 'tuple'>, <class 'NoneType'>)"
                        },
                        "append_text": {
                            "default": "",
                            "help": "Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution",
                            "name": "append_text",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "custom_scheduler_commands": {
                            "default": "",
                            "help": "Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the `prepend_text` is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command",
                            "name": "custom_scheduler_commands",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "environment_variables": {
                            "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcd0afa20d0>",
                            "help": "Set a dictionary of custom environment variables for this calculation",
                            "name": "environment_variables",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'dict'>"
                        },
                        "environment_variables_double_quotes": {
                            "default": "False",
                            "help": "If set to True, use double quotes instead of single quotes to escape the environment variables specified in ``environment_variables``.",
                            "name": "environment_variables_double_quotes",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        },
                        "import_sys_environment": {
                            "default": "True",
                            "help": "If set to true, the submission script will load the system environment variables",
                            "name": "import_sys_environment",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        },
                        "input_filename": {
                            "default": "aiida.inp",
                            "name": "input_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "max_memory_kb": {
                            "help": "Set the maximum memory (in KiloBytes) to be asked to the scheduler",
                            "name": "max_memory_kb",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                        },
                        "max_wallclock_seconds": {
                            "help": "Set the wallclock in seconds asked to the scheduler",
                            "name": "max_wallclock_seconds",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                        },
                        "mpirun_extra_params": {
                            "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcd0afa2d30>",
                            "help": "Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] ... exec.x",
                            "name": "mpirun_extra_params",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'list'>, <class 'tuple'>)"
                        },
                        "output_filename": {
                            "default": "aiida.out",
                            "name": "output_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "parser_name": {
                            "default": "orca_base_parser",
                            "name": "parser_name",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "prepend_text": {
                            "default": "",
                            "help": "Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution",
                            "name": "prepend_text",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "priority": {
                            "help": "Set the priority of the job to be queued",
                            "name": "priority",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "qos": {
                            "help": "Set the quality of service to use in for the queue on the remote computer",
                            "name": "qos",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "queue_name": {
                            "help": "Set the name of the queue on the remote computer",
                            "name": "queue_name",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "rerunnable": {
                            "help": "Determines if the calculation can be requeued / rerun.",
                            "name": "rerunnable",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'bool'>, <class 'NoneType'>)"
                        },
                        "resources": {
                            "help": "Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.",
                            "name": "resources",
                            "non_db": "True",
                            "required": "True",
                            "valid_type": "<class 'dict'>"
                        },
                        "scheduler_stderr": {
                            "default": "_scheduler-stderr.txt",
                            "help": "Filename to which the content of stderr of the scheduler is written.",
                            "name": "scheduler_stderr",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "scheduler_stdout": {
                            "default": "_scheduler-stdout.txt",
                            "help": "Filename to which the content of stdout of the scheduler is written.",
                            "name": "scheduler_stdout",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "stash": {
                            "_attrs": {
                                "default": [],
                                "dynamic": false,
                                "help": "Optional directives to stash files after the calculation job has completed.",
                                "required": "False",
                                "valid_type": "None"
                            },
                            "source_list": {
                                "help": "Sequence of relative filepaths representing files in the remote directory that should be stashed.",
                                "name": "source_list",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'tuple'>, <class 'list'>, <class 'NoneType'>)"
                            },
                            "stash_mode": {
                                "help": "Mode with which to perform the stashing, should be value of `aiida.common.datastructures.StashMode`.",
                                "name": "stash_mode",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                            },
                            "target_base": {
                                "help": "The base location to where the files should be stashd. For example, for the `copy` stash mode, this should be an absolute filepath on the remote computer.",
                                "name": "target_base",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                            }
                        },
                        "submit_script_filename": {
                            "default": "_aiidasubmit.sh",
                            "help": "Filename to which the job submission script is written.",
                            "name": "submit_script_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "withmpi": {
                            "default": "False",
                            "name": "withmpi",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        }
                    },
                    "store_provenance": {
                        "default": "True",
                        "help": "If set to `False` provenance will not be stored in the database.",
                        "name": "store_provenance",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'bool'>"
                    }
                },
                "monitors": {
                    "_attrs": {
                        "default": [],
                        "dynamic": true,
                        "help": "Add monitoring functions that can inspect output files while the job is running and decide to prematurely terminate the job.",
                        "required": "False",
                        "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
                    }
                },
                "parameters": {
                    "help": "Input parameters to generate the input file.",
                    "name": "parameters",
                    "non_db": "False",
                    "required": "True",
                    "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
                },
                "remote_folder": {
                    "help": "Remote directory containing the results of an already completed calculation job without AiiDA. The inputs should be passed to the `CalcJob` as normal but instead of launching the actual job, the engine will recreate the input files and then proceed straight to the retrieve step where the files of this `RemoteData` will be retrieved as if it had been actually launched through AiiDA. If a parser is defined in the inputs, the results are parsed and attached as output nodes as usual.",
                    "name": "remote_folder",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "(<class 'aiida.orm.nodes.data.remote.base.RemoteData'>, <class 'NoneType'>)"
                },
                "settings": {
                    "help": "Additional input parameters",
                    "name": "settings",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
                },
                "store_gbw": {
                    "default": "<function OrcaCalculation.define.<locals>.<lambda> at 0x7fcd0afa2820>",
                    "help": "Store gbw file as SinglefileData output node.",
                    "name": "store_gbw",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
                }
            }
        },
        "metadata": {
            "_attrs": {
                "default": [],
                "dynamic": false,
                "help": null,
                "required": "False",
                "valid_type": "None"
            },
            "call_link_label": {
                "default": "CALL",
                "help": "The label to use for the `CALL` link if the process is called by another process.",
                "name": "call_link_label",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'str'>"
            },
            "description": {
                "help": "Description to set on the process node.",
                "name": "description",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "label": {
                "help": "Label to set on the process node.",
                "name": "label",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "store_provenance": {
                "default": "True",
                "help": "If set to `False` provenance will not be stored in the database.",
                "name": "store_provenance",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'bool'>"
            }
        },
        "nwigner": {
            "default": "<function OrcaWignerSpectrumWorkChain.define.<locals>.<lambda> at 0x7fcd0b8bf820>",
            "name": "nwigner",
            "non_db": "False",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.int.Int'>"
        },
        "opt": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": null,
                "required": "True",
                "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
            },
            "clean_workdir": {
                "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcd0afa29d0>",
                "help": "If `True`, work directories of all called calculation jobs will be cleaned at the end of execution.",
                "name": "clean_workdir",
                "non_db": "False",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
            },
            "handler_overrides": {
                "help": "Mapping where keys are process handler names and the values are a dictionary, where each dictionary can define the ``enabled`` and ``priority`` key, which can be used to toggle the values set on the original process handler declaration.",
                "name": "handler_overrides",
                "non_db": "False",
                "required": "False",
                "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
            },
            "max_iterations": {
                "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcd0c73ca60>",
                "help": "Maximum number of iterations the work chain will restart the process to finish successfully.",
                "name": "max_iterations",
                "non_db": "False",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.int.Int'>"
            },
            "metadata": {
                "_attrs": {
                    "default": [],
                    "dynamic": false,
                    "help": null,
                    "required": "False",
                    "valid_type": "None"
                },
                "call_link_label": {
                    "default": "CALL",
                    "help": "The label to use for the `CALL` link if the process is called by another process.",
                    "name": "call_link_label",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "description": {
                    "help": "Description to set on the process node.",
                    "name": "description",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "label": {
                    "help": "Label to set on the process node.",
                    "name": "label",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "store_provenance": {
                    "default": "True",
                    "help": "If set to `False` provenance will not be stored in the database.",
                    "name": "store_provenance",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                }
            },
            "orca": {
                "_attrs": {
                    "default": [],
                    "dynamic": true,
                    "help": null,
                    "required": "True",
                    "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
                },
                "file": {
                    "_attrs": {
                        "default": [],
                        "dynamic": true,
                        "help": "Additional input files like gbw or hessian",
                        "required": "False",
                        "valid_type": "<class 'aiida.orm.nodes.data.singlefile.SinglefileData'>"
                    }
                },
                "metadata": {
                    "_attrs": {
                        "default": [],
                        "dynamic": false,
                        "help": null,
                        "required": "False",
                        "valid_type": "None"
                    },
                    "call_link_label": {
                        "default": "CALL",
                        "help": "The label to use for the `CALL` link if the process is called by another process.",
                        "name": "call_link_label",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'str'>"
                    },
                    "computer": {
                        "help": "When using a \"local\" code, set the computer on which the calculation should be run.",
                        "name": "computer",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'aiida.orm.computers.Computer'>, <class 'NoneType'>)"
                    },
                    "description": {
                        "help": "Description to set on the process node.",
                        "name": "description",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "dry_run": {
                        "default": "False",
                        "help": "When set to `True` will prepare the calculation job for submission but not actually launch it.",
                        "name": "dry_run",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'bool'>"
                    },
                    "label": {
                        "help": "Label to set on the process node.",
                        "name": "label",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "options": {
                        "_attrs": {
                            "default": [],
                            "dynamic": false,
                            "help": null,
                            "required": "False",
                            "valid_type": "None"
                        },
                        "account": {
                            "help": "Set the account to use in for the queue on the remote computer",
                            "name": "account",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "additional_retrieve_list": {
                            "help": "List of relative file paths that should be retrieved in addition to what the plugin specifies.",
                            "name": "additional_retrieve_list",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'list'>, <class 'tuple'>, <class 'NoneType'>)"
                        },
                        "append_text": {
                            "default": "",
                            "help": "Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution",
                            "name": "append_text",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "custom_scheduler_commands": {
                            "default": "",
                            "help": "Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the `prepend_text` is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command",
                            "name": "custom_scheduler_commands",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "environment_variables": {
                            "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcd0afa20d0>",
                            "help": "Set a dictionary of custom environment variables for this calculation",
                            "name": "environment_variables",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'dict'>"
                        },
                        "environment_variables_double_quotes": {
                            "default": "False",
                            "help": "If set to True, use double quotes instead of single quotes to escape the environment variables specified in ``environment_variables``.",
                            "name": "environment_variables_double_quotes",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        },
                        "import_sys_environment": {
                            "default": "True",
                            "help": "If set to true, the submission script will load the system environment variables",
                            "name": "import_sys_environment",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        },
                        "input_filename": {
                            "default": "aiida.inp",
                            "name": "input_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "max_memory_kb": {
                            "help": "Set the maximum memory (in KiloBytes) to be asked to the scheduler",
                            "name": "max_memory_kb",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                        },
                        "max_wallclock_seconds": {
                            "help": "Set the wallclock in seconds asked to the scheduler",
                            "name": "max_wallclock_seconds",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                        },
                        "mpirun_extra_params": {
                            "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcd0afa2d30>",
                            "help": "Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] ... exec.x",
                            "name": "mpirun_extra_params",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'list'>, <class 'tuple'>)"
                        },
                        "output_filename": {
                            "default": "aiida.out",
                            "name": "output_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "parser_name": {
                            "default": "orca_base_parser",
                            "name": "parser_name",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "prepend_text": {
                            "default": "",
                            "help": "Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution",
                            "name": "prepend_text",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "priority": {
                            "help": "Set the priority of the job to be queued",
                            "name": "priority",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "qos": {
                            "help": "Set the quality of service to use in for the queue on the remote computer",
                            "name": "qos",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "queue_name": {
                            "help": "Set the name of the queue on the remote computer",
                            "name": "queue_name",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                        },
                        "rerunnable": {
                            "help": "Determines if the calculation can be requeued / rerun.",
                            "name": "rerunnable",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "(<class 'bool'>, <class 'NoneType'>)"
                        },
                        "resources": {
                            "help": "Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.",
                            "name": "resources",
                            "non_db": "True",
                            "required": "True",
                            "valid_type": "<class 'dict'>"
                        },
                        "scheduler_stderr": {
                            "default": "_scheduler-stderr.txt",
                            "help": "Filename to which the content of stderr of the scheduler is written.",
                            "name": "scheduler_stderr",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "scheduler_stdout": {
                            "default": "_scheduler-stdout.txt",
                            "help": "Filename to which the content of stdout of the scheduler is written.",
                            "name": "scheduler_stdout",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "stash": {
                            "_attrs": {
                                "default": [],
                                "dynamic": false,
                                "help": "Optional directives to stash files after the calculation job has completed.",
                                "required": "False",
                                "valid_type": "None"
                            },
                            "source_list": {
                                "help": "Sequence of relative filepaths representing files in the remote directory that should be stashed.",
                                "name": "source_list",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'tuple'>, <class 'list'>, <class 'NoneType'>)"
                            },
                            "stash_mode": {
                                "help": "Mode with which to perform the stashing, should be value of `aiida.common.datastructures.StashMode`.",
                                "name": "stash_mode",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                            },
                            "target_base": {
                                "help": "The base location to where the files should be stashd. For example, for the `copy` stash mode, this should be an absolute filepath on the remote computer.",
                                "name": "target_base",
                                "non_db": "True",
                                "required": "False",
                                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                            }
                        },
                        "submit_script_filename": {
                            "default": "_aiidasubmit.sh",
                            "help": "Filename to which the job submission script is written.",
                            "name": "submit_script_filename",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'str'>"
                        },
                        "withmpi": {
                            "default": "False",
                            "name": "withmpi",
                            "non_db": "True",
                            "required": "False",
                            "valid_type": "<class 'bool'>"
                        }
                    },
                    "store_provenance": {
                        "default": "True",
                        "help": "If set to `False` provenance will not be stored in the database.",
                        "name": "store_provenance",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "<class 'bool'>"
                    }
                },
                "monitors": {
                    "_attrs": {
                        "default": [],
                        "dynamic": true,
                        "help": "Add monitoring functions that can inspect output files while the job is running and decide to prematurely terminate the job.",
                        "required": "False",
                        "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
                    }
                },
                "parameters": {
                    "help": "Input parameters to generate the input file.",
                    "name": "parameters",
                    "non_db": "False",
                    "required": "True",
                    "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
                },
                "remote_folder": {
                    "help": "Remote directory containing the results of an already completed calculation job without AiiDA. The inputs should be passed to the `CalcJob` as normal but instead of launching the actual job, the engine will recreate the input files and then proceed straight to the retrieve step where the files of this `RemoteData` will be retrieved as if it had been actually launched through AiiDA. If a parser is defined in the inputs, the results are parsed and attached as output nodes as usual.",
                    "name": "remote_folder",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "(<class 'aiida.orm.nodes.data.remote.base.RemoteData'>, <class 'NoneType'>)"
                },
                "settings": {
                    "help": "Additional input parameters",
                    "name": "settings",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
                },
                "store_gbw": {
                    "default": "<function OrcaCalculation.define.<locals>.<lambda> at 0x7fcd0afa2820>",
                    "help": "Store gbw file as SinglefileData output node.",
                    "name": "store_gbw",
                    "non_db": "False",
                    "required": "False",
                    "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
                }
            }
        },
        "optimize": {
            "default": "<function OrcaWignerSpectrumWorkChain.define.<locals>.<lambda> at 0x7fcd0b8bff70>",
            "name": "optimize",
            "non_db": "False",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
        },
        "structure": {
            "name": "structure",
            "non_db": "False",
            "required": "True",
            "valid_type": "(<class 'aiida.orm.nodes.data.structure.StructureData'>, <class 'aiida.orm.nodes.data.array.trajectory.TrajectoryData'>)"
        },
        "wigner_low_freq_thr": {
            "default": "<function OrcaWignerSpectrumWorkChain.define.<locals>.<lambda> at 0x7fcd0af48550>",
            "name": "wigner_low_freq_thr",
            "non_db": "False",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.float.Float'>"
        }
    },
    "outline": [
        {
            "if(should_optimize)": [
                "optimize(Optimize geometry)",
                "inspect_optimization(Check whether optimization succeeded)"
            ]
        },
        "excite(Calculate excited states for a single geometry)",
        "inspect_excitation(Check whether excitation succeeded)",
        {
            "if(should_run_wigner)": [
                "wigner_sampling",
                "wigner_excite",
                "inspect_wigner_excitation(Check whether all wigner excitations succeeded)"
            ]
        },
        "results(Expose results from child workchains)"
    ],
    "outputs": {
        "_attrs": {
            "default": [],
            "dynamic": false,
            "help": null,
            "required": "True",
            "valid_type": "None"
        },
        "relaxed_structure": {
            "name": "relaxed_structure",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.structure.StructureData'>"
        },
        "single_point_tddft": {
            "help": "Output parameters from a single-point TDDFT calculation",
            "name": "single_point_tddft",
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
        },
        "wigner_tddft": {
            "help": "Output parameters from all Wigner TDDFT calculation",
            "name": "wigner_tddft",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.list.List'>"
        }
    }
}

Indeed it seems that the namespace is dynamic.

@danielhollas
Copy link
Author

Hmm, the dynamic namespace seems to come out from the OrcaBaseWorkChain for some reason.

https://github.com/pzarabadip/aiida-orca/blob/e267e8b1d875009bac755f8ab76fd4effe1cf218/aiida_orca/workchains/base.py#L19

Output from w = OrcaWorkflowFactory('orca.base') ; print(w.spec().inputs)

{
    "_attrs": {
        "default": [],
        "dynamic": false,
        "help": null,
        "required": "True",
        "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
    },
    "clean_workdir": {
        "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcb2efcbdc0>",
        "help": "If `True`, work directories of all called calculation jobs will be cleaned at the end of execution.",
        "name": "clean_workdir",
        "non_db": "False",
        "required": "False",
        "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
    },
    "handler_overrides": {
        "help": "Mapping where keys are process handler names and the values are a dictionary, where each dictionary can define the ``enabled`` and ``priority`` key, which can be used to toggle the values set on the original process handler declaration.",
        "name": "handler_overrides",
        "non_db": "False",
        "required": "False",
        "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
    },
    "max_iterations": {
        "default": "<function BaseRestartWorkChain.define.<locals>.<lambda> at 0x7fcb2f0b0790>",
        "help": "Maximum number of iterations the work chain will restart the process to finish successfully.",
        "name": "max_iterations",
        "non_db": "False",
        "required": "False",
        "valid_type": "<class 'aiida.orm.nodes.data.int.Int'>"
    },
    "metadata": {
        "_attrs": {
            "default": [],
            "dynamic": false,
            "help": null,
            "required": "False",
            "valid_type": "None"
        },
        "call_link_label": {
            "default": "CALL",
            "help": "The label to use for the `CALL` link if the process is called by another process.",
            "name": "call_link_label",
            "non_db": "True",
            "required": "False",
            "valid_type": "<class 'str'>"
        },
        "description": {
            "help": "Description to set on the process node.",
            "name": "description",
            "non_db": "True",
            "required": "False",
            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
        },
        "label": {
            "help": "Label to set on the process node.",
            "name": "label",
            "non_db": "True",
            "required": "False",
            "valid_type": "(<class 'str'>, <class 'NoneType'>)"
        },
        "store_provenance": {
            "default": "True",
            "help": "If set to `False` provenance will not be stored in the database.",
            "name": "store_provenance",
            "non_db": "True",
            "required": "False",
            "valid_type": "<class 'bool'>"
        }
    },
    "orca": {
        "_attrs": {
            "default": [],
            "dynamic": true,
            "help": null,
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
        },
        "code": {
            "help": "The `Code` to use for this job. This input is required, unless the `remote_folder` input is specified, which means an existing job is being imported and no code will actually be run.",
            "name": "code",
            "non_db": "False",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.code.abstract.AbstractCode'>, <class 'NoneType'>)"
        },
        "file": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": "Additional input files like gbw or hessian",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.singlefile.SinglefileData'>"
            }
        },
        "metadata": {
            "_attrs": {
                "default": [],
                "dynamic": false,
                "help": null,
                "required": "False",
                "valid_type": "None"
            },
            "call_link_label": {
                "default": "CALL",
                "help": "The label to use for the `CALL` link if the process is called by another process.",
                "name": "call_link_label",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'str'>"
            },
            "computer": {
                "help": "When using a \"local\" code, set the computer on which the calculation should be run.",
                "name": "computer",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'aiida.orm.computers.Computer'>, <class 'NoneType'>)"
            },
            "description": {
                "help": "Description to set on the process node.",
                "name": "description",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "dry_run": {
                "default": "False",
                "help": "When set to `True` will prepare the calculation job for submission but not actually launch it.",
                "name": "dry_run",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'bool'>"
            },
            "label": {
                "help": "Label to set on the process node.",
                "name": "label",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "options": {
                "_attrs": {
                    "default": [],
                    "dynamic": false,
                    "help": null,
                    "required": "False",
                    "valid_type": "None"
                },
                "account": {
                    "help": "Set the account to use in for the queue on the remote computer",
                    "name": "account",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "additional_retrieve_list": {
                    "help": "List of relative file paths that should be retrieved in addition to what the plugin specifies.",
                    "name": "additional_retrieve_list",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'list'>, <class 'tuple'>, <class 'NoneType'>)"
                },
                "append_text": {
                    "default": "",
                    "help": "Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution",
                    "name": "append_text",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "custom_scheduler_commands": {
                    "default": "",
                    "help": "Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the `prepend_text` is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command",
                    "name": "custom_scheduler_commands",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "environment_variables": {
                    "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcb2d9065e0>",
                    "help": "Set a dictionary of custom environment variables for this calculation",
                    "name": "environment_variables",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'dict'>"
                },
                "environment_variables_double_quotes": {
                    "default": "False",
                    "help": "If set to True, use double quotes instead of single quotes to escape the environment variables specified in ``environment_variables``.",
                    "name": "environment_variables_double_quotes",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                },
                "import_sys_environment": {
                    "default": "True",
                    "help": "If set to true, the submission script will load the system environment variables",
                    "name": "import_sys_environment",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                },
                "input_filename": {
                    "default": "aiida.inp",
                    "name": "input_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "max_memory_kb": {
                    "help": "Set the maximum memory (in KiloBytes) to be asked to the scheduler",
                    "name": "max_memory_kb",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                },
                "max_wallclock_seconds": {
                    "help": "Set the wallclock in seconds asked to the scheduler",
                    "name": "max_wallclock_seconds",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                },
                "mpirun_extra_params": {
                    "default": "<function CalcJob.define.<locals>.<lambda> at 0x7fcb2d1afee0>",
                    "help": "Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] ... exec.x",
                    "name": "mpirun_extra_params",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'list'>, <class 'tuple'>)"
                },
                "output_filename": {
                    "default": "aiida.out",
                    "name": "output_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "parser_name": {
                    "default": "orca_base_parser",
                    "name": "parser_name",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "prepend_text": {
                    "default": "",
                    "help": "Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution",
                    "name": "prepend_text",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "priority": {
                    "help": "Set the priority of the job to be queued",
                    "name": "priority",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "qos": {
                    "help": "Set the quality of service to use in for the queue on the remote computer",
                    "name": "qos",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "queue_name": {
                    "help": "Set the name of the queue on the remote computer",
                    "name": "queue_name",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "rerunnable": {
                    "help": "Determines if the calculation can be requeued / rerun.",
                    "name": "rerunnable",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'bool'>, <class 'NoneType'>)"
                },
                "resources": {
                    "help": "Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.",
                    "name": "resources",
                    "non_db": "True",
                    "required": "True",
                    "valid_type": "<class 'dict'>"
                },
                "scheduler_stderr": {
                    "default": "_scheduler-stderr.txt",
                    "help": "Filename to which the content of stderr of the scheduler is written.",
                    "name": "scheduler_stderr",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "scheduler_stdout": {
                    "default": "_scheduler-stdout.txt",
                    "help": "Filename to which the content of stdout of the scheduler is written.",
                    "name": "scheduler_stdout",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "stash": {
                    "_attrs": {
                        "default": [],
                        "dynamic": false,
                        "help": "Optional directives to stash files after the calculation job has completed.",
                        "required": "False",
                        "valid_type": "None"
                    },
                    "source_list": {
                        "help": "Sequence of relative filepaths representing files in the remote directory that should be stashed.",
                        "name": "source_list",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'tuple'>, <class 'list'>, <class 'NoneType'>)"
                    },
                    "stash_mode": {
                        "help": "Mode with which to perform the stashing, should be value of `aiida.common.datastructures.StashMode`.",
                        "name": "stash_mode",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "target_base": {
                        "help": "The base location to where the files should be stashd. For example, for the `copy` stash mode, this should be an absolute filepath on the remote computer.",
                        "name": "target_base",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    }
                },
                "submit_script_filename": {
                    "default": "_aiidasubmit.sh",
                    "help": "Filename to which the job submission script is written.",
                    "name": "submit_script_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "withmpi": {
                    "default": "False",
                    "name": "withmpi",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                }
            },
            "store_provenance": {
                "default": "True",
                "help": "If set to `False` provenance will not be stored in the database.",
                "name": "store_provenance",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'bool'>"
            }
        },
        "monitors": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": "Add monitoring functions that can inspect output files while the job is running and decide to prematurely terminate the job.",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
            }
        },
        "parameters": {
            "help": "Input parameters to generate the input file.",
            "name": "parameters",
            "non_db": "False",
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
        },
        "remote_folder": {
            "help": "Remote directory containing the results of an already completed calculation job without AiiDA. The inputs should be passed to the `CalcJob` as normal but instead of launching the actual job, the engine will recreate the input files and then proceed straight to the retrieve step where the files of this `RemoteData` will be retrieved as if it had been actually launched through AiiDA. If a parser is defined in the inputs, the results are parsed and attached as output nodes as usual.",
            "name": "remote_folder",
            "non_db": "False",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.remote.base.RemoteData'>, <class 'NoneType'>)"
        },
        "settings": {
            "help": "Additional input parameters",
            "name": "settings",
            "non_db": "False",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.dict.Dict'>, <class 'NoneType'>)"
        },
        "store_gbw": {
            "default": "<function OrcaCalculation.define.<locals>.<lambda> at 0x7fcb2d906b80>",
            "help": "Store gbw file as SinglefileData output node.",
            "name": "store_gbw",
            "non_db": "False",
            "required": "False",
            "valid_type": "<class 'aiida.orm.nodes.data.bool.Bool'>"
        },
        "structure": {
            "help": "Input structure",
            "name": "structure",
            "non_db": "False",
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.structure.StructureData'>"
        }
    }
}

@danielhollas
Copy link
Author

@sphuber here's a modified MWE that triggers the error. The substantive change is that SubProcess subclasses CalcJob instead of Process. This change for some mysterious reason changes the namespaces from expose_inputs to dynamic. (note that I also had to change/fix the validator to see the problem)

#!/usr/bin/env python
from plumpy import Process
from aiida.engine import CalcJob


class SubProcess(CalcJob):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('a')
        spec.input('b')
        spec.inputs.validator = cls.validate_inputs

    @classmethod
    def validate_inputs(cls, inputs, ctx):
        try:
            ctx.get_port('a')
        except ValueError:
            print('a not in ctx')
            return None

        if 'a' not in inputs:
            return f'input `a` is missing!'
        if 'b' not in inputs:
            return f'input `b` is missing!'

        a = value['a']
        if not isinstance(a, str):
            return f'input `a` should be a str, got: {type(a)}'

        b = value['b']
        if not isinstance(b, str):
            return f'input `b` should be a str, got: {type(b)}'

class MasterProcess(Process):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.expose_inputs(SubProcess, 'sub', exclude=('a',))
        spec.input('a')

print(MasterProcess.spec().inputs)
inputs = {
    'a': 'test',
    'sub': {
        'b': 'alt'
    }
}

print('VALIDATION:', MasterProcess.spec().inputs.validate(inputs))

Here's the output

{
    "_attrs": {
        "default": [],
        "dynamic": false,
        "help": null,
        "required": "True",
        "valid_type": "None"
    },
    "a": {
        "name": "a",
        "required": "True"
    },
    "sub": {
        "_attrs": {
            "default": [],
            "dynamic": true,
            "help": null,
            "required": "True",
            "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
        },
        "b": {
            "name": "b",
            "non_db": "False",
            "required": "True"
        },
        "code": {
            "help": "The `Code` to use for this job. This input is required, unless the `remote_folder` input is specified, which means an existing job is being imported and no code will actually be run.",
            "name": "code",
            "non_db": "False",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.code.abstract.AbstractCode'>, <class 'NoneType'>)"
        },
        "metadata": {
            "_attrs": {
                "default": [],
                "dynamic": false,
                "help": null,
                "required": "False",
                "valid_type": "None"
            },
            "call_link_label": {
                "default": "CALL",
                "help": "The label to use for the `CALL` link if the process is called by another process.",
                "name": "call_link_label",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'str'>"
            },
            "computer": {
                "help": "When using a \"local\" code, set the computer on which the calculation should be run.",
                "name": "computer",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'aiida.orm.computers.Computer'>, <class 'NoneType'>)"
            },
            "description": {
                "help": "Description to set on the process node.",
                "name": "description",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "dry_run": {
                "default": "False",
                "help": "When set to `True` will prepare the calculation job for submission but not actually launch it.",
                "name": "dry_run",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'bool'>"
            },
            "label": {
                "help": "Label to set on the process node.",
                "name": "label",
                "non_db": "True",
                "required": "False",
                "valid_type": "(<class 'str'>, <class 'NoneType'>)"
            },
            "options": {
                "_attrs": {
                    "default": [],
                    "dynamic": false,
                    "help": null,
                    "required": "False",
                    "valid_type": "None"
                },
                "account": {
                    "help": "Set the account to use in for the queue on the remote computer",
                    "name": "account",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "additional_retrieve_list": {
                    "help": "List of relative file paths that should be retrieved in addition to what the plugin specifies.",
                    "name": "additional_retrieve_list",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'list'>, <class 'tuple'>, <class 'NoneType'>)"
                },
                "append_text": {
                    "default": "",
                    "help": "Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution",
                    "name": "append_text",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "custom_scheduler_commands": {
                    "default": "",
                    "help": "Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the `prepend_text` is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command",
                    "name": "custom_scheduler_commands",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "environment_variables": {
                    "default": "<function CalcJob.define.<locals>.<lambda> at 0x7f9c8bcb5670>",
                    "help": "Set a dictionary of custom environment variables for this calculation",
                    "name": "environment_variables",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'dict'>"
                },
                "environment_variables_double_quotes": {
                    "default": "False",
                    "help": "If set to True, use double quotes instead of single quotes to escape the environment variables specified in ``environment_variables``.",
                    "name": "environment_variables_double_quotes",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                },
                "import_sys_environment": {
                    "default": "True",
                    "help": "If set to true, the submission script will load the system environment variables",
                    "name": "import_sys_environment",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                },
                "input_filename": {
                    "help": "Filename to which the input for the code that is to be run is written.",
                    "name": "input_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "max_memory_kb": {
                    "help": "Set the maximum memory (in KiloBytes) to be asked to the scheduler",
                    "name": "max_memory_kb",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                },
                "max_wallclock_seconds": {
                    "help": "Set the wallclock in seconds asked to the scheduler",
                    "name": "max_wallclock_seconds",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'int'>, <class 'NoneType'>)"
                },
                "mpirun_extra_params": {
                    "default": "<function CalcJob.define.<locals>.<lambda> at 0x7f9c8bcb55e0>",
                    "help": "Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] ... exec.x",
                    "name": "mpirun_extra_params",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'list'>, <class 'tuple'>)"
                },
                "output_filename": {
                    "help": "Filename to which the content of stdout of the code that is to be run is written.",
                    "name": "output_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "parser_name": {
                    "help": "Set a string for the output parser. Can be None if no output plugin is available or needed",
                    "name": "parser_name",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "prepend_text": {
                    "default": "",
                    "help": "Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution",
                    "name": "prepend_text",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "priority": {
                    "help": "Set the priority of the job to be queued",
                    "name": "priority",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "qos": {
                    "help": "Set the quality of service to use in for the queue on the remote computer",
                    "name": "qos",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "queue_name": {
                    "help": "Set the name of the queue on the remote computer",
                    "name": "queue_name",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                },
                "rerunnable": {
                    "help": "Determines if the calculation can be requeued / rerun.",
                    "name": "rerunnable",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "(<class 'bool'>, <class 'NoneType'>)"
                },
                "resources": {
                    "help": "Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.",
                    "name": "resources",
                    "non_db": "True",
                    "required": "True",
                    "valid_type": "<class 'dict'>"
                },
                "scheduler_stderr": {
                    "default": "_scheduler-stderr.txt",
                    "help": "Filename to which the content of stderr of the scheduler is written.",
                    "name": "scheduler_stderr",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "scheduler_stdout": {
                    "default": "_scheduler-stdout.txt",
                    "help": "Filename to which the content of stdout of the scheduler is written.",
                    "name": "scheduler_stdout",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "stash": {
                    "_attrs": {
                        "default": [],
                        "dynamic": false,
                        "help": "Optional directives to stash files after the calculation job has completed.",
                        "required": "False",
                        "valid_type": "None"
                    },
                    "source_list": {
                        "help": "Sequence of relative filepaths representing files in the remote directory that should be stashed.",
                        "name": "source_list",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'tuple'>, <class 'list'>, <class 'NoneType'>)"
                    },
                    "stash_mode": {
                        "help": "Mode with which to perform the stashing, should be value of `aiida.common.datastructures.StashMode`.",
                        "name": "stash_mode",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    },
                    "target_base": {
                        "help": "The base location to where the files should be stashd. For example, for the `copy` stash mode, this should be an absolute filepath on the remote computer.",
                        "name": "target_base",
                        "non_db": "True",
                        "required": "False",
                        "valid_type": "(<class 'str'>, <class 'NoneType'>)"
                    }
                },
                "submit_script_filename": {
                    "default": "_aiidasubmit.sh",
                    "help": "Filename to which the job submission script is written.",
                    "name": "submit_script_filename",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'str'>"
                },
                "withmpi": {
                    "default": "False",
                    "help": "Set the calculation to use mpi",
                    "name": "withmpi",
                    "non_db": "True",
                    "required": "False",
                    "valid_type": "<class 'bool'>"
                }
            },
            "store_provenance": {
                "default": "True",
                "help": "If set to `False` provenance will not be stored in the database.",
                "name": "store_provenance",
                "non_db": "True",
                "required": "False",
                "valid_type": "<class 'bool'>"
            }
        },
        "monitors": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": "Add monitoring functions that can inspect output files while the job is running and decide to prematurely terminate the job.",
                "required": "False",
                "valid_type": "<class 'aiida.orm.nodes.data.dict.Dict'>"
            }
        },
        "remote_folder": {
            "help": "Remote directory containing the results of an already completed calculation job without AiiDA. The inputs should be passed to the `CalcJob` as normal but instead of launching the actual job, the engine will recreate the input files and then proceed straight to the retrieve step where the files of this `RemoteData` will be retrieved as if it had been actually launched through AiiDA. If a parser is defined in the inputs, the results are parsed and attached as output nodes as usual.",
            "name": "remote_folder",
            "non_db": "False",
            "required": "False",
            "valid_type": "(<class 'aiida.orm.nodes.data.remote.base.RemoteData'>, <class 'NoneType'>)"
        }
    }
}
VALIDATION: Error occurred validating port 'inputs.sub': input `a` is missing!

@danielhollas
Copy link
Author

Regardless of why the input namespace is dynamic, this anyway should be fixed right?
From a design point of view, it seems a bit strange that a function called get_port would have such a big sideeffect of creating new dynamic ports. Should that be handled somewhere more up the stack, or via another function like get_port_or_create_dynamic.

I looked at other usage of get_port in aiida-core and the only other one is in exposed_inputs.

https://github.com/aiidateam/aiida-core/blob/fce1cd6d7da2b0241830cab30a83eb5ab978a16d/aiida/engine/processes/process.py#L958

I wonder if other unintended side effects are lurking there?

@sphuber sphuber transferred this issue from aiidateam/aiida-core Apr 18, 2023
@unkcpz
Copy link
Member

unkcpz commented Apr 18, 2023

Hi both, I am reviewing #268 but I am still confused about what actually happened in @danielhollas's case. The MWE is clearly shows the problem comes when the validator calls the get_port and creates the excluded port as dynamic. But which is such a validator in @danielhollas's process, it calls get_port explicitly?

@danielhollas
Copy link
Author

@unkcpz the particular validator that triggered this is this CalcJob validator in aiida-core.

https://github.com/aiidateam/aiida-core/blob/9de3f90a157ec7c9f99c1f17f3ca3548f6d96bba/aiida/engine/processes/calcjobs/calcjob.py#L54

@sphuber
Copy link
Collaborator

sphuber commented Apr 18, 2023

The problem comes from the following "conflicting" design requirements:

  • We want to allow validators that are "multi-port", since some validation rules for ports are dependent on the values for other ports
  • We want ports to be exposable to allow easy composition of workchains
  • When exposing, it is sometimes useful to be able to exclude certain ports so they can be added once in a different place in the namespace
  • Validators on exposed port(namespace)s can be invalidated if the namespace they are contained within is exposed with certain ports excluded

So, since we know that people will likely exclude certain ports of a CalcJob when exposing it, we can no longer assume in the CalcJob.validate_inputs validator that all ports are still there when it is called. To workaround this, I changed validators to have the PortNamespace itself be passed in as the ctx argument, the last argument of the validator. This can then be used to see if the target port is still there. If it isn't, it means whoever exposed that namespace, removed that port, and so we can no longer validate it. This is why these ctx.get_port calls are in the validator.

Now the problem was that the change I added to make dynamic output namespaces act recursively, it would also automatically recreate the input port that was excluded, and so now the validator would think it should validate it again, but of course no value was provided and so it failed.

Hope that helps clear things up a bit

@danielhollas
Copy link
Author

@sphuber Thank you for the plumpy PR, I am just looking at it.

Unrelated to that, did you manage to figure out why the namespace exposed from a CalcJob class is dynamic?

Also, I think it would be nice to cleanup the CalcJob validator, since I believe there is a logic error at the beginning.

    try:
        ctx.get_port('code')
        ctx.get_port('metadata.computer')
    except ValueError:
        # If the namespace no longer contains the `code` or `metadata.computer` ports we skip validation
        return None

This looks like the validator is never run, unless I provide both 'code' and 'metadata.computer', which is often redundant. I believe the validator should only be skipped if the 'code' post has been excluded, am I missing something?

@sphuber
Copy link
Collaborator

sphuber commented Apr 18, 2023

Unrelated to that, did you manage to figure out why the namespace exposed from a CalcJob class is dynamic?

No, that I haven't looked at yet, since, as you pointed out, that is not the real problem. Might still have a look later, but don't think it is that important for now.

This looks like the validator is never run, unless I provide both 'code' and 'metadata.computer', which is often redundant.

Not quite. The ctx is the actual PortNamespace, so that code is checking whether the ports are still there, not the values that are passed for it in the inputs. I think the check should still be there because the purpose is to make sure that if both the code and metadata.computer are passed in the inputs, then they should be commensurate. But if either of them are no longer part of the input namespace, then there really isn't anything to validate.

That being said, there is still a minor inconsistency (the line doing computer_from_code = code.computer after we get the code knowing it may be None), so I might still open a PR to address it.

@danielhollas
Copy link
Author

Not quite. The ctx is the actual PortNamespace, so that code is checking whether the ports are still there, not the values that are passed for it in the inputs.

Right, fair enough but...

I think the check should still be there because the purpose is to make sure that if both the code and metadata.computer are passed in the inputs, then they should be commensurate. But if either of them are no longer part of the input namespace, then there really isn't anything to validate.

That's not quite true though, there are a bunch of other validations that should be run (e.g. is the computer stored, are the scheduler parameters correct) if code and code.computer are provided.

@unkcpz
Copy link
Member

unkcpz commented Apr 18, 2023

Thanks, both! clear to me now.

Unrelated to that, did you manage to figure out why the namespace exposed from a CalcJob class is dynamic?

I think I have some clue one this. I try to run @sphuber's MWE and got different output as below.

        "sub": {
            "_attrs": {
                "default": [],
                "dynamic": true,
                "help": null,
                "required": "True",
                "valid_type": "<class 'aiida.orm.nodes.data.data.Data'>"
            },
            "b": {
                "is_metadata": "False",
                "name": "b",
                "non_db": "False",
                "required": "True"
            },
            "metadata": {
                "_attrs": {
                    "default": [],

Then I realize I use the Process class from aiida-core while in the in the case above it is the Process from plumpy. The valid_type defined in https://github.com/aiidateam/aiida-core/blob/fce1cd6d7da2b0241830cab30a83eb5ab978a16d/aiida/engine/processes/process.py#L119 leads to the discrepancy. Afterward, the code in

plumpy/src/plumpy/ports.py

Lines 396 to 397 in 485b196

if valid_type is not None:
self.dynamic = True
will make it a dynamic port.

So to me, this might be the direct root of the issue, the expose_inputs has no doubt to be a namespace and therefore make no sense to have a valid_type of orm.Data anymore.

@unkcpz
Copy link
Member

unkcpz commented Apr 20, 2023

Why the heck should setting a valid type automatically make the namespace dynamic??

This is asked by @danielhollas offline. @sphuber give me the reason for the initial design, can you put a quick comment here? It makes a lot of sense to me so I approve the PR for this issue.

@sphuber
Copy link
Collaborator

sphuber commented Apr 20, 2023

The point is that if you set a valid_type on a PortNamespace, for example, you say "this namespace accepts str objects", then it makes no sense to still refuse them because the namespace is not dynamic. In my mind, if you say "this namespace accepts Data nodes" then that clearly signals that you accept any port there as long as its type is Data. Otherwise, you would simply define explicit ports and set the valid_type on those ports.

@sphuber
Copy link
Collaborator

sphuber commented Apr 20, 2023

Fixed by #268

@sphuber sphuber closed this as completed Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants