Hubbard data structure for handling newest QuantumESPRESSO versions (>=`7.1`) #849

bastonero · 2022-10-11T15:27:02Z

QuantumESPRESSO implements a new syntax for Hubbard input starting from v7.1 onward.

The pw.x now needs a new CARD in the input file.

We introduce the following new modules:

common.hubbard: it contains a code, structure agnostic data class describing Hubbard information, such as parameters, projectors, formulation.
data.hubbard_structure: it contains a new code agnostic aiida DataType, the HubbardStructureData, for handling Hubbard information along with its associated StructureData.
utils.hubbard: a specific, dedicated QuantumESPRESSO utility module for handling and manipulating HubbardStructureData.

The new data class and aiida data type implement Hubbard information in an agnostic manner, meaning they can be shared among other codes, and they have no constraints for our particular plugin implementation. This is handled through the new utils.hubbard module.

…esso into feature/data/hubbard

…ndard`-`background`, and others

mbercx · 2022-10-12T13:18:08Z

Thanks a lot @bastonero! A couple of comments off the cuff:

The class has lots of methods: shall we put parts in one or more Mixin?

Would these methods also be added to a different class? Some of the methods (e.g. atomic_distance) definitely seem to be more general. I guess these could also be added to the StructureData class. Another suggestion might be to think of which methods the user should really need and hide the others.

Decide upon some final/semi-final name for the variables.

I think having the input of Cristiano and Iurii (maybe also Eric?) would be valuable here. I'll try to already set up an environment where they can start playing around with this class to give some feedback.

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

If the order is important, are there any alternatives for the internal data structure?

How much to be restrictive in the input checks?

I guess it would be good to have some basic validation, so nothing breaks when they start to run with the structure. Have to check the specific methods though.

The class is initialised from a StructureData object - more convenient to let this more "free"?

Good question. I'm considering if:

We should not have the same API as the constructor of the StructureData class (which I don't particularly love), just for consistency, which then calls the constructor of StructureData with super(). Pro for this is consistency for the users.
Alternatively develop a "better" API here already. I like the idea of having a generic constructor that doesn't allow for inputs like pymatgen, structure_data etc, and then have class methods like from_pymatgen, from_structure_data etc. I think this may be more clear for users?

That said, I'm also wondering what you consider to be more "free" ^^

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

Typically, the questions you want to ask whether thinking database or repository are:

Do I want to query for this?
How much data is it.

Considering the answer to the first question is probably "no" (or I'm missing a use case?), and the second is: potentially big, I'd lean towards storage in the repository. As for the format: YAML seems fine, but I guess we should consider some other options (how fast is the (de)serialization, is there a native format that QE supports so our storage is consistent, ...).

Shall we then finally put the occupations as well? In case, to discuss how to store this.

We'd need these for the project we have in mind, no? (i.e. ML for @muhrin) Format we should indeed discuss.

sphuber · 2022-10-12T22:18:04Z

Thanks a lot @bastonero .

The class has lots of methods: shall we put parts in one or more Mixin?

After a quick look, I definitely feel that all the *_quantumespresso_* methods should be factored out. Not really to just reduce the number of methods on the class for clarity, but since they don't really belong on this class. As we discussed before, I think this class should be a code-agnostic generic data container for Hubbard parameters. There should then be utility functions that can format that data in the format required by a particular. PwCalculation would then use the implementation you currently have, but it would allow another plugin to use the data class as well and simply implement their own "formatters".

Decide upon some final/semi-final name for the variables

Any particular ones that you had doubts about?

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

The order of the list should be guaranteed when going to and from the database. So that should not be a problem. In principle I don't see a problem with this structure.

How much to be restrictive in the input checks?

I would be really restrictive and have tight checks. It would be really good if you define a very clear "schema" of what data it can the class can store. For starters, it should be very clear exactly what the valid data structure of hubbard_parameters can be. Currently, you would have to parse the validation in _set_hubbard_parameters to figure out the format. You could think of looking into pydantic and define a dataclass that can simply the validation for you, and then you just consume that and if valid store on the attributes.

The class is initialised from a StructureData object - more convenient to let this more "free"?

I agree with @mbercx , we are more or less stuck with the current interface of StructureData, but it is not the best. I think the core quantities that should be defined by a user are cell, pbc and sites (where sites would be of form sites=[['Si', [0, 0, 0]], ['Si', [1, 1, 1]]]). So I would make it something like:

class HubbardStructureData(StructureData):
    """Structure data containing code independent info on Hubbard parameters."""

    def __init__(
        self,
        cell: list[list[float]],
        pbc: list[bool],
        sites: list[list[str, list[float]]]
        hubbard_parameters: list = None, 
        hubbard_projectors: str = "ortho-atomic",
    ):
        self.cell = cell
        self.pbc = pbc
        self.hubbard_parameters = hubbard_parameters
        self.hubbard_projectors = hubbard_projectors
        self.sites = sites

    @property
    def sites(self):
        return super().sites

    @sites.setter
    def sites(self, values):
        for kind_name, position in values:
            self.append_atom(name=kind_name, position=position)

I haven't tested this, but I think you get the gist.

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

If you don't think a user would ever query for this, yes, probably go with the repo. But I think it might actually have very queryable data no? Also, shouldn't typically be larger than storing all positions of a structure with a medium amount of atoms and those are stored in database. So might just keep it there for now.

Shall we then finally put the occupations as well? In case, to discuss how to store this.

Which occupations?

bastonero · 2022-10-13T09:12:50Z

Thank you @mbercx and @sphuber for the comments.

The class has lots of methods: shall we put parts in one or more Mixin?

allow another plugin to use the data class as well and simply implement their own "formatters".

Thus you mean we should define a HubbardBaseData, from which a particular plugin would subclass it adding its methods?

Decide upon some final/semi-final name for the variables

Any particular ones that you had doubts about?

Me and @mbercx discussed yesterday about this. We will make try Iurii and Cristiano to see if the choices seem appropriate with what is used in the DFT community.

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

The order of the list should be guaranteed when going to and from the database. So that should not be a problem. In principle I don't see a problem with this structure.

I was thinking that having list of list could make queries hard? Better having dictionaries?

The class is initialised from a StructureData object - more convenient to let this more "free"?

To comment and recap again upon this. The issue right now is that the StructureData is in aiida.core, meaning any change would require the update of aiida.core, which is not desirable. So, from v.3.x the base DataTypes are going to be moved in a separate plugin.
Now, what is then a good structure for the StructureData? Let's suppose we can try to explore in this branch. I think @sphuber has already pointed out the main attributes: cell, pbc and sites.
The last one though is crucial. In fact, it would be desirable to specify for each site some extra property. So ideally one would have a list of dictionaries, each one specifying at least the basic inputs, e.g. symbol(s), position and 'mass(es', plus additional ones that in the future aiida.data (or whatever is going to be) can be standardised.

As an example:

cell = [[1,0,0],[0,1,0],[0,0,1]]
pbc = [True,True,True]
sites = [
  {
    'symbol':'Fe',
    'position':[0,0,0],
    'mass':70,
    'spin':1, # to discuss eventually
    'hubbard_parameters': {
      'manifold':'3d',
      'neighbour_index':0,
      'neighbour_manifold:'3d',
      'value':5.0,
      'translation_vector':[0,0,0],
      'type':'dudarev',
     },
    'charge':None,
    'extra_property_1':value_1,
    'extra_property_2':value_2,
   # ...
  },
  {
    'symbol':'Fe',
    'position':[0.5,0,0],
    'mass':70,
    'spin':-1,
    'hubbard_parameters': {
      'manifold':'3d',
      'neighbour_index':1,
      'neighbour_manifold:'3d',
      'value':5.0,
      'translation_vector':[0,0,0],
      'type':'dudarev',
     },
    'charge':+4,
    'extra_property_1':value_1,
    'extra_property_2':value_2,
   # ...
  },
 # ...
]

This structure would also allow for automatic generation of the infamous kind names, which it seems we do not want anymore. But still, it would be possible, just by putting an extra property in case some scenario would eventually need it.

From here, we can then move to the next question:

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

If you don't think a user would ever query for this, yes, probably go with the repo. But I think it might actually have very queryable data no? Also, shouldn't typically be larger than storing all positions of a structure with a medium amount of atoms and those are stored in database. So might just keep it there for now.

Would the proposed structure for the new possible StructureData be easily queryable? I remember @giovannipizzi saying that having the sites as they are right now makes the queries hard, e.g. to find some atom symbols (?).

Shall we then finally put the occupations as well? In case, to discuss how to store this.

Which occupations?
The occupation matrices for the orbital projections on the Hubbard manifold.

If this new structure of the data could work, then we could think to go for this completely new StructureData, while avoiding subclassing (as we are doing right now with this new data in the branch) something that will need changes anyways. In the meantime, for deprecation purposes, we can think of setting a method to get the "traditional" StructureData in order to mediate the transition.

bastonero · 2022-10-13T09:55:05Z

I pin here @giovannipizzi, @sphuber, @mbercx, @muhrin, @zooks97 for opinions and comments (please, just see the above comment, or the entire discussion - it is rather urgent).

I had a look into the implementation of the current StructureData. What I am proposing is essentially the same that is already there, except the fact that symbols and masses are defined now in the infamous Kind class. I guess that this was done to avoid some repetition of data, which would still be small in most cases.

I think that, if the above proposal is what is actually desirable and we would agree, then we can momentarily "copy/paste" the entire class from aiida-core, remove the Kind class, and extend the Site class as mentioned before (with the due changes all over the class).

…ndard`-`background`, and others

pyproject.toml

…iida-quantumespresso into feature/data/hubbard

…ndard`-`background`, and others

bastonero · 2023-03-21T12:47:05Z