Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve missing AtomGroup property error messages #2565

Closed
zemanj opened this issue Feb 28, 2020 · 11 comments · Fixed by #2636
Closed

Improve missing AtomGroup property error messages #2565

zemanj opened this issue Feb 28, 2020 · 11 comments · Fixed by #2636

Comments

@zemanj
Copy link
Member

zemanj commented Feb 28, 2020

Expected behavior
When I try to access an AtomGroup property that is generally valid but the corresponding data is missing from the underlying topology/trajectory, I expect a NoDataError with a message stating that the data I am requesting is not available from the topology/trajectory.

Actual behavior
When I load a topology that doesn't contain information about, e.g., charges, and try to access that AtomGroup property, an AttributeError: AtomGroup has no attribute charges is raised.

Code to reproduce the behavior

import MDAnalysis as mda
from MDAnalysis.tests.datafiles import GRO
u = mda.Universe(GRO)
u.atoms.charges

Version of MDAnalysis
Current 0.20.1 dev branch with Python 3.6 under Fedora 31
Should be version-independent though.

Detailed problem description
The AttributeError still makes sense if one tries to access an attribute that is generally unknown to Atom or AtomGroup. Thus, one needs to distinguish between attributes which could be there if the respective topology/trajectory info was there, and others that generally don't exist.
Furthermore, there are derived attributes such as fragindices, which are constructed from the topology's bond information. Thus, the error message for such derived attributes should state the missing information from which they are derived. The same is true for methods that require certain data, so that trying to access AtomGroup.center_of_mass() should raise a NoDataError due to missing information about masses.

Some examples: in the code example given above

  • u.atoms.boogiewoogie should raise an AttributeError
  • u.atoms.charges should raise a NoDataError due to missing charge information
  • u.atoms.fragindices should raise a NoDataError due to missing bond information
  • u.atoms.unwrap() should raise a NoDataError due to missing bond information
  • u.atoms.total_charge() should raise a NoDataError due to missing charge information
  • ...

Proposed solution

  1. Create a dictionary with all known AtomGroup attributes as keys and the corresponding topology/trajectory information which is required for the attribute to exists.
  2. Do the same for all known Atom attributes.
  3. In the __getattr__() methods of Atom and AtomGroup, check whether the requested attribute is present in the respective dictionary's keys. If (and only if) that is the case, raise a NoDataError with an error message clarifying the problem. Otherwise, do nothing so that an AttributeError is automatically raised whenever a truly unknown attribute is accessed.
  4. Write tests which cover all possible outcomes.

Regarding points 1 and 2, this involves parsing the transplants of AtomGroup and probably also those of Atom. Thanks to @richardjgowers we know how to automate the process so that the bookkeeping is done automatically, see his comment below.

@Naimish240
Copy link

Hey @zemanj ! I'll get onto it. Thanks!

@zemanj
Copy link
Member Author

zemanj commented Feb 29, 2020

Hi @Naimish240, thank you for taking this on! Please see also the discussion in #2536, there are some hints regarding how this can be handled properly.

@shfrz
Copy link
Contributor

shfrz commented Feb 29, 2020

Hi @zemanj, I have submitted a PR, please check it out. Thank you!

@AnshulAngaria
Copy link

Hi @zemanj, I would like to work on this. I want to know whether the error should be generalised to cover many attributes or just raise a NoDataError every time it comes across an unknown attribute with a better error message?

@zemanj
Copy link
Member Author

zemanj commented Mar 1, 2020

@shfrz @AnshulAngaria I appreciate your efforts. However, I have to say that I don't think it's good practice to "highjack" an issue another person is already working on (@Naimish240 in this case, see above). Nevertheless, I'll review your PRs but keep in mind that if @Naimish240 comes up with a proper solution in a timely manner, this is the one that will be merged.

If you think you have a good solution for an issue someone else is already about to work on, you should discuss this with him/her before moving ahead.

@zemanj
Copy link
Member Author

zemanj commented Mar 1, 2020

I edited my initial issue comment and added a detailed description of the problem. Furthermore, I sketched what I think a possible solution might look like.

@richardjgowers
Copy link
Member

@zemanj this sounds very good. The dictionaries don't need to be created by hand, (and shouldn't), instead they can get registered when the class is defined.

This sort of magic is already in use with Universe.add_TopologyAttr, where u.add_TopologyAttr('bfactors') knows to use the BFactors class.

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/core/universe.py#L805

So what would need doing is here:

class _TopologyAttrMeta(type):

You would need to add a scan through the .transplants dictionary, the .items() attribute will be unbound functions, take the method names and map them to an error message relating to the TopologyAttr class... so somewhere once the module is imported there will be an automatically generated dictionary that looks like:

_TOPOLOGY_ERRORS = {
  'center_of_mass': 'This requires masses',
}

That can be used for making nice NoDataErrors

@shfrz
Copy link
Contributor

shfrz commented Mar 3, 2020

Hi @richardjgowers, this seems an elegant approach. I would love to implement that. But let's see what @Naimish240 comes up with first.
Thanks :)

@zemanj
Copy link
Member Author

zemanj commented Mar 3, 2020

@richardjgowers That sounds exactly like the solution I was looking for! As far as I can see, the only downside is that this machinery won't be on-demand (cached) but executed on import.
Ah, one more thing: One single dict might not suffice if we want to have this working also for other objects such as residues, segments, and their respective groups. Anyway, we certainly need to distinguish between Atom and AtomGroup errors.

@zemanj
Copy link
Member Author

zemanj commented Mar 10, 2020

Ping @Naimish240
Are you still interested in working on this?

@zemanj
Copy link
Member Author

zemanj commented Mar 12, 2020

As @Naimish240 seems to be unavailable, someone else may give this a go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment