Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hubbard data structure for handling newest QuantumESPRESSO versions (>=7.1) #849

Merged
merged 45 commits into from
Apr 6, 2023

Conversation

bastonero
Copy link
Collaborator

@bastonero bastonero commented Oct 11, 2022

QuantumESPRESSO implements a new syntax for Hubbard input starting from v7.1 onward.

The pw.x now needs a new CARD in the input file.

We introduce the following new modules:

  • common.hubbard: it contains a code, structure agnostic data class describing Hubbard information, such as parameters, projectors, formulation.
  • data.hubbard_structure: it contains a new code agnostic aiida DataType, the HubbardStructureData, for handling Hubbard information along with its associated StructureData.
  • utils.hubbard: a specific, dedicated QuantumESPRESSO utility module for handling and manipulating HubbardStructureData.

The new data class and aiida data type implement Hubbard information in an agnostic manner, meaning they can be shared among other codes, and they have no constraints for our particular plugin implementation. This is handled through the new utils.hubbard module.

@bastonero bastonero requested review from sphuber and mbercx October 11, 2022 15:27
@bastonero bastonero self-assigned this Oct 12, 2022
@mbercx
Copy link
Member

mbercx commented Oct 12, 2022

Thanks a lot @bastonero! A couple of comments off the cuff:

The class has lots of methods: shall we put parts in one or more Mixin?

Would these methods also be added to a different class? Some of the methods (e.g. atomic_distance) definitely seem to be more general. I guess these could also be added to the StructureData class. Another suggestion might be to think of which methods the user should really need and hide the others.

Decide upon some final/semi-final name for the variables.

I think having the input of Cristiano and Iurii (maybe also Eric?) would be valuable here. I'll try to already set up an environment where they can start playing around with this class to give some feedback.

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

If the order is important, are there any alternatives for the internal data structure?

How much to be restrictive in the input checks?

I guess it would be good to have some basic validation, so nothing breaks when they start to run with the structure. Have to check the specific methods though.

The class is initialised from a StructureData object - more convenient to let this more "free"?

Good question. I'm considering if:

  • We should not have the same API as the constructor of the StructureData class (which I don't particularly love), just for consistency, which then calls the constructor of StructureData with super(). Pro for this is consistency for the users.
  • Alternatively develop a "better" API here already. I like the idea of having a generic constructor that doesn't allow for inputs like pymatgen, structure_data etc, and then have class methods like from_pymatgen, from_structure_data etc. I think this may be more clear for users?

That said, I'm also wondering what you consider to be more "free" ^^

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

Typically, the questions you want to ask whether thinking database or repository are:

  • Do I want to query for this?
  • How much data is it.

Considering the answer to the first question is probably "no" (or I'm missing a use case?), and the second is: potentially big, I'd lean towards storage in the repository. As for the format: YAML seems fine, but I guess we should consider some other options (how fast is the (de)serialization, is there a native format that QE supports so our storage is consistent, ...).

Shall we then finally put the occupations as well? In case, to discuss how to store this.

We'd need these for the project we have in mind, no? (i.e. ML for @muhrin) Format we should indeed discuss.

@sphuber
Copy link
Contributor

sphuber commented Oct 12, 2022

Thanks a lot @bastonero .

The class has lots of methods: shall we put parts in one or more Mixin?

After a quick look, I definitely feel that all the *_quantumespresso_* methods should be factored out. Not really to just reduce the number of methods on the class for clarity, but since they don't really belong on this class. As we discussed before, I think this class should be a code-agnostic generic data container for Hubbard parameters. There should then be utility functions that can format that data in the format required by a particular. PwCalculation would then use the implementation you currently have, but it would allow another plugin to use the data class as well and simply implement their own "formatters".

Decide upon some final/semi-final name for the variables

Any particular ones that you had doubts about?

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

The order of the list should be guaranteed when going to and from the database. So that should not be a problem. In principle I don't see a problem with this structure.

How much to be restrictive in the input checks?

I would be really restrictive and have tight checks. It would be really good if you define a very clear "schema" of what data it can the class can store. For starters, it should be very clear exactly what the valid data structure of hubbard_parameters can be. Currently, you would have to parse the validation in _set_hubbard_parameters to figure out the format. You could think of looking into pydantic and define a dataclass that can simply the validation for you, and then you just consume that and if valid store on the attributes.

The class is initialised from a StructureData object - more convenient to let this more "free"?

I agree with @mbercx , we are more or less stuck with the current interface of StructureData, but it is not the best. I think the core quantities that should be defined by a user are cell, pbc and sites (where sites would be of form sites=[['Si', [0, 0, 0]], ['Si', [1, 1, 1]]]). So I would make it something like:

class HubbardStructureData(StructureData):
    """Structure data containing code independent info on Hubbard parameters."""

    def __init__(
        self,
        cell: list[list[float]],
        pbc: list[bool],
        sites: list[list[str, list[float]]]
        hubbard_parameters: list = None, 
        hubbard_projectors: str = "ortho-atomic",
    ):
        self.cell = cell
        self.pbc = pbc
        self.hubbard_parameters = hubbard_parameters
        self.hubbard_projectors = hubbard_projectors
        self.sites = sites

    @property
    def sites(self):
        return super().sites

    @sites.setter
    def sites(self, values):
        for kind_name, position in values:
            self.append_atom(name=kind_name, position=position)

I haven't tested this, but I think you get the gist.

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

If you don't think a user would ever query for this, yes, probably go with the repo. But I think it might actually have very queryable data no? Also, shouldn't typically be larger than storing all positions of a structure with a medium amount of atoms and those are stored in database. So might just keep it there for now.

Shall we then finally put the occupations as well? In case, to discuss how to store this.

Which occupations?

@bastonero
Copy link
Collaborator Author

bastonero commented Oct 13, 2022

Thank you @mbercx and @sphuber for the comments.

The class has lots of methods: shall we put parts in one or more Mixin?

allow another plugin to use the data class as well and simply implement their own "formatters".

Thus you mean we should define a HubbardBaseData, from which a particular plugin would subclass it adding its methods?

Decide upon some final/semi-final name for the variables

Any particular ones that you had doubts about?

Me and @mbercx discussed yesterday about this. We will make try Iurii and Cristiano to see if the choices seem appropriate with what is used in the DFT community.

Decide upon final data structure of hubbard parameters (by now, it is a list of list, and the order is crucial within list)

The order of the list should be guaranteed when going to and from the database. So that should not be a problem. In principle I don't see a problem with this structure.

I was thinking that having list of list could make queries hard? Better having dictionaries?

The class is initialised from a StructureData object - more convenient to let this more "free"?

To comment and recap again upon this. The issue right now is that the StructureData is in aiida.core, meaning any change would require the update of aiida.core, which is not desirable. So, from v.3.x the base DataTypes are going to be moved in a separate plugin.
Now, what is then a good structure for the StructureData? Let's suppose we can try to explore in this branch. I think @sphuber has already pointed out the main attributes: cell, pbc and sites.
The last one though is crucial. In fact, it would be desirable to specify for each site some extra property. So ideally one would have a list of dictionaries, each one specifying at least the basic inputs, e.g. symbol(s), position and 'mass(es', plus additional ones that in the future aiida.data (or whatever is going to be) can be standardised.

As an example:

cell = [[1,0,0],[0,1,0],[0,0,1]]
pbc = [True,True,True]
sites = [
  {
    'symbol':'Fe',
    'position':[0,0,0],
    'mass':70,
    'spin':1, # to discuss eventually
    'hubbard_parameters': {
      'manifold':'3d',
      'neighbour_index':0,
      'neighbour_manifold:'3d',
      'value':5.0,
      'translation_vector':[0,0,0],
      'type':'dudarev',
     },
    'charge':None,
    'extra_property_1':value_1,
    'extra_property_2':value_2,
   # ...
  },
  {
    'symbol':'Fe',
    'position':[0.5,0,0],
    'mass':70,
    'spin':-1,
    'hubbard_parameters': {
      'manifold':'3d',
      'neighbour_index':1,
      'neighbour_manifold:'3d',
      'value':5.0,
      'translation_vector':[0,0,0],
      'type':'dudarev',
     },
    'charge':+4,
    'extra_property_1':value_1,
    'extra_property_2':value_2,
   # ...
  },
 # ...
]

This structure would also allow for automatic generation of the infamous kind names, which it seems we do not want anymore. But still, it would be possible, just by putting an extra property in case some scenario would eventually need it.

From here, we can then move to the next question:

The hubbard_parameters could be a very long list of info: shall we dump them into e.g. yaml files or similar (e.g. into the repository)?

If you don't think a user would ever query for this, yes, probably go with the repo. But I think it might actually have very queryable data no? Also, shouldn't typically be larger than storing all positions of a structure with a medium amount of atoms and those are stored in database. So might just keep it there for now.

Would the proposed structure for the new possible StructureData be easily queryable? I remember @giovannipizzi saying that having the sites as they are right now makes the queries hard, e.g. to find some atom symbols (?).

Shall we then finally put the occupations as well? In case, to discuss how to store this.

Which occupations?
The occupation matrices for the orbital projections on the Hubbard manifold.

If this new structure of the data could work, then we could think to go for this completely new StructureData, while avoiding subclassing (as we are doing right now with this new data in the branch) something that will need changes anyways. In the meantime, for deprecation purposes, we can think of setting a method to get the "traditional" StructureData in order to mediate the transition.

@bastonero
Copy link
Collaborator Author

I pin here @giovannipizzi, @sphuber, @mbercx, @muhrin, @zooks97 for opinions and comments (please, just see the above comment, or the entire discussion - it is rather urgent).

I had a look into the implementation of the current StructureData. What I am proposing is essentially the same that is already there, except the fact that symbols and masses are defined now in the infamous Kind class. I guess that this was done to avoid some repetition of data, which would still be small in most cases.

I think that, if the above proposal is what is actually desirable and we would agree, then we can momentarily "copy/paste" the entire class from aiida-core, remove the Kind class, and extend the Site class as mentioned before (with the due changes all over the class).

@mbercx mbercx force-pushed the feature/data/hubbard branch from c04c5f0 to d1e2808 Compare October 19, 2022 14:00
pyproject.toml Outdated Show resolved Hide resolved
@mbercx mbercx force-pushed the feature/data/hubbard branch from 8350849 to ed2b1d4 Compare October 20, 2022 07:30
@mbercx mbercx force-pushed the feature/data/hubbard branch from ed2b1d4 to ae1dbe8 Compare October 20, 2022 08:03

class Hubbard(HubbardProjectors):
"""Class for complete description of Hubbard interactions."""
hubbard_parameters: List[HubbardParameters] # only `parameters`?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here: hubbard_parameters or just hubbard? Or something else?

neighbour_index: conint(strict=True)
neighbour_manifold: constr(strip_whitespace=True, to_lower=True, min_length=2, max_length=5)
translation_vector: List[conint(strict=True)]
hubbard_value: float # just `value`?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we validate with PositiveFloat? Or too restrictive?

Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bastonero . Great PR, really clean and understandable code. Just some minor suggestions and questions.

src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
tests/common/test_hubbard.py Outdated Show resolved Hide resolved
tests/utils/test_hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/common/hubbard.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/utils/hubbard.py Outdated Show resolved Hide resolved
tests/common/__initi__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two more minor comments that I just came across. But I think this is good to go. @bastonero could you please update the branch, fix the failing docs build and write a final message for the commit (you can just update the original PR description with a description that is up to date). Then just make it a real PR from draft and I will approve and merge

src/aiida_quantumespresso/common/hubbard.py Show resolved Hide resolved
src/aiida_quantumespresso/data/hubbard_structure.py Outdated Show resolved Hide resolved
@bastonero bastonero changed the title Hubbard data type for handling new QE version and beyond Hubbard data type for handling QE v >= 7.1 Apr 5, 2023
@bastonero bastonero changed the title Hubbard data type for handling QE v >= 7.1 Hubbard data structure for handling newest QuantumESPRESSO versions (v.>=7.1) Apr 5, 2023
@bastonero bastonero changed the title Hubbard data structure for handling newest QuantumESPRESSO versions (v.>=7.1) Hubbard data structure for handling newest QuantumESPRESSO versions (>=7.1) Apr 5, 2023
@bastonero bastonero marked this pull request as ready for review April 5, 2023 12:19
@sphuber sphuber merged commit 355020c into main Apr 6, 2023
@sphuber sphuber deleted the feature/data/hubbard branch April 6, 2023 10:04
@sphuber
Copy link
Contributor

sphuber commented Apr 6, 2023

Fantastic, thanks a lot @bastonero !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants