Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface Update v0.4.x #106

Closed
cortner opened this issue Jul 1, 2024 · 40 comments
Closed

Interface Update v0.4.x #106

cortner opened this issue Jul 1, 2024 · 40 comments

Comments

@cortner
Copy link
Member

cortner commented Jul 1, 2024

The following document summarizes a proposed interface update for 0.4.x. Please provide comments by 8 July, after which I will start a new PR that implements this. If you plan to provide comments but after 8 July just let me know here please.

https://hackmd.io/@TQNVbm-8R6mxyDAnh7mzAA/HJgCF-xP0

Short summary

EDIT - just to be clear - we are only talking about 0.4.x. Further evolutions are likely. This need not be the final iteration.

@cortner
Copy link
Member Author

cortner commented Jul 1, 2024

My sense is that the MAIN DECISION to be made is whether the reference implementation should become the de facto standard to be used by this community or just an example implementation that nobody should use. The current proposal is to demote it to an example implementation. Hearing opinions on this point would be particularly welcome - in favour or against either direction.

@rkurchin
Copy link
Collaborator

rkurchin commented Jul 1, 2024

I'll start by saying (again, but more publicly) a big THANK YOU 🙏 to @cortner for spearheading this!

As I've said privately, I'm generally in agreement with everything and have no major comments, certainly nothing blocking. Regarding the example vs. standard, I've started an attempt to organize some thoughts below (if others chime in I can attempt to summarize other thoughts in edits, too). Personally, I lean towards the "example implementation" option; however, I also think it's unlikely we'll be able to fully "enforce" either option (nor should we) and the likely steady-state will have components of both...

In favor of "de facto standard" / against "example implementation"

  • likely maximally simplifies interoperability
  • this approach will be familiar to folks coming from e.g. ASE

In favor of "example implementation" / against "de facto standard"

  • it is likely (inevitable?) that for reasons of performance/convenience, packages will utilize different internal representations of structure anyway; in a sense the whole point of a Julian interface is to enable this while still smoothing out interoperability challenges
  • (related to above point) this way probably makes it easier for packages to adopt AtomsBase support later on in their development

@rashidrafeek
Copy link
Contributor

Thank you for this. These updates looks good.

In the document, I was confused about a few things:

  • What does particle_id return? Is it supposed to return a Type like ChemicalElement or something else? While explaining this, another function named particle_type is being used which seems to return the particle type. Was this a typo? Is particle_type and particle_id the same?
  • atomic_number, atomic_symbol, atomic_mass are still part of the primary interface right?

@rashidrafeek
Copy link
Contributor

rashidrafeek commented Jul 1, 2024

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

Isn't it better to just use the singular form for the system as well, similar to the current definition? Just a suggestion.

@cortner
Copy link
Member Author

cortner commented Jul 1, 2024

Thanks for pointing out inconsistencies. I'll clear those up asap and then respond to the other point.

@cortner
Copy link
Member Author

cortner commented Jul 2, 2024

  • What does particle_id return? Is it supposed to return a Type like ChemicalElement or something else?

yes. ChemicalElement is the strong recommendation, but it could be something else and this is up to the implementer.

While explaining this, another function named particle_type is being used which seems to return the particle type. Was this a typo? Is particle_type and particle_id the same?

I changed everything to _id. There was a discussion about these two names and I couldn't find it. I have a vague memory we may have said that _type interferes with the technical type in the Julia language hence we switched to _id. @mfherbst - I remember you proposed a change of name, can you remember?

  • atomic_number, atomic_symbol, atomic_mass are still part of the primary interface right?

yes. I propose to provide fallback implementations, as described in the document, with functions such as

atomic_number(...) = atomic_number(particle_id(...))
atomic_symbol(...) = atomic_symbol(particle_id(...))
atomic_mass(...) = mass(...)

atomic_mass could be deprecated (I have no view on this), the other two should probably stay.

@cortner
Copy link
Member Author

cortner commented Jul 2, 2024

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

yes, but personally I don't see why this should be a problem. If I'm missing something please do point it out.

Isn't it better to just use the singular form for the system as well, similar to the current definition? Just a suggestion.

I personally find it very hard to read position(sys) and intuit that I am getting a list of positions rather than a single position. That said, this is a point where I feel I worked to convince others to agree with me rather than there was a general desire to change this.

I stand by my point and will continue to argue against allowing position(sys). However, I would also be happy with

position.(sys)
position(sys, :)     #  <=  this is maybe my favourite now.

Another advantage of positions, velocities etc (as Joe Greener pointed out) is that this makes the list of positions a system property but a single position a particle property.

For avoidance of any doubt: It is 100% ok to continue to debate this before making a final decision.

@rashidrafeek
Copy link
Contributor

I saw that a plural form for each setter function is added. Wouldn't defining singular as well as plural form of interface functions lead to numerous functions in the namespace? There seems to be plural forms defined for the primary interface functions position, mass and particle_id as positions, masses and particle_ids. Shoudn't there also be, atomic_numbers, atomic_symbols etc. If the interface is extended later on, with charge, velocity, momentum etc., again plural forms for these would need to be added. Wouldn't this lead unnecessary pollution of namespace.

yes, but personally I don't see why this should be a problem. If I'm missing something please do point it out.

The main problem that I feel is it doesn't feel natural while coding (maybe not a concrete reason). For example, I frequently interchange between using a subset of positions (currently position(sys, indices)) or all positions (currently position(sys)). Requiring to add an additional s just for the second case doesn't seem natural to me. Anyway, I am okay with either of the options.

I personally find it very hard to read position(sys) and intuit that I am getting a list of positions rather than a single position.

Can't the position be thought of as a single point in the configuration space, or for an arbitrary property, a single point in the N-dimensional space of whichever property we are looking at :)

@cortner
Copy link
Member Author

cortner commented Jul 2, 2024

So it just seems to come down to different people having different mental pathways. That's always unfortunate and makes it difficult to decide. Maybe this change needs to be postponed.

I would love to hear more opinions from others.

@jgreener64
Copy link
Collaborator

I had it in my head that position would always return one position and positions multiple, which would give a nice justification of different functions giving different return types. However as Rashid points out the position(sys, indices) case also returns multiple positions, so I can see some confusion arising if we introduce the plural case just for positions(sys).

The only firm opinion I have is that position.(sys) is probably not the way to go, as it makes it hard to dispatch with a custom system type.

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

Thanks everyone for the comments so far. As I start working on the PR we can still discuss. For now I won't make the singular/plural change yet.

@mfherbst
Copy link
Member

mfherbst commented Jul 8, 2024

Let me start by also issuing a big thank you to @cortner. I think as part of this discussion my general idea of what AtomsBase is for has already sharpened a lot.

I am also in general agreement with the proposal modulo some of the details pointed out for discussion in the md document and above.

Example versus reference implementation

I think we as an ecosystem will long-term need a standard implementation to point people to when they want to expose their data to AtomsBase via conversion.
To give one concrete use case example: in AtomsIO I have multiple use cases where the internal data representation from a package is not compatible with AtomsBase and the only way to integrate it with AtomsBase is a convert(AbstractSystem, data).
However, I don't think it needs to be decided now which implementation this should be and it does not need to be the implementation in this package. Rather I think of the implementations here as references for others to get inspiration and test their code against.

Change of name particle_id

I remember you proposed a change of name, can you remember?

I think type sounds more physics-y than _id, that's probably why I proposed _type over _id. But honestly I don't have a strong opinion.

atomic_mass versus mass

I vote to deprecate atomic_mass.

singular versus plural

Regarding the different mental pictures between singular and plural. Maybe the compromise here is to make things explicit. In that sense I like the idea of only allowing position(sys, :) over position(sys) and positions(sys). If we feel this is too explicit it is always easier to add sugar than to remove it later (because people will start using all variants).

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

long-term need a standard implementation ... I don't think it needs to be decided now which implementation this should be

I agree with this.

Change of name particle_id

I'll add a poll below.

singular versus plural

based on discussion so far I think what you write may be the best way forward. I will add another poll.

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

POLL: particle_id vs particle_type to return the category (e.g. chemical element) of a particle.

particle_id : 👍
particle_type : 👎

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

POLL: do you agree to deprecate atomic_mass (👍 vs 👎 )

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

POLL: do you agree to

  • deprecate position(sys) and to enforce the use of position(sys, :)
  • not introduce positions(sys)

This this motion is defeated then this discussion will be postponed to v0.5.x.

@swyant
Copy link

swyant commented Jul 8, 2024

After using LAMMPS a lot, particle_id is a bit confusing for me because I would actually expect to return an integer, since that's how atom_id's are used in LAMMPS. So I prefer particle_type. There's a risk that someone might expect a DataType to be returned when using particle_type, but as long as the documentation is clear, this shouldn't be an issue.

As for atomic_mass I think it's fine to deprecate it in favor of mass for better generalization. However, I still think there should be a default fallback for ChemicalElement's, i.e.
atomic_mass(p::ChemicalElement) = mass(p).

@swyant
Copy link

swyant commented Jul 8, 2024

For the position proposal, why not a slight variant to what you propose where:

  • deprecate position(sys) so that position always requires two arguments, i.e. position(sys,i) or positions(sys,:)
  • A default fallback positions(sys) = position(sys,:) that allows for a simple interface to the system-level property of all positions of a given system.

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

Re I'd vs type. We don't have to rush this. Is there a better terminology? I think both are problematic with type maybe Margo ally preferrable?

  • species
  • category

?

@cortner
Copy link
Member Author

cortner commented Jul 8, 2024

For the position proposal, why not a slight variant to what you propose where:

  • deprecate position(sys) so that position always requires two arguments, i.e. position(sys,i) or positions(sys,:)
  • A default fallback positions(sys) = position(sys,:) that allows for a simple interface to the system-level property of all positions of a given system.

Thanks for pointing this out. It is actually exactly what I wanted, but you are right I didn't highlight that positions is now just sugar. A few points made above suggest it isn't entirely obvious what the right approach is. There is some value in enforcing a uniform style.

@mfherbst
Copy link
Member

mfherbst commented Jul 8, 2024

Is there a better terminology?

@niklasschmitz also suggested "particle_kind" as an option. Sounds like type but rings different bells for CS people.

There is some value in enforcing a uniform style.

Agree !

@cortner
Copy link
Member Author

cortner commented Jul 9, 2024

Regarding id, type, kind, category, species: I think the term we choose has to capture the distinction between types of atoms rather than the distinctions between - say - atoms, electrons, etc.

@rashidrafeek
Copy link
Contributor

Re I'd vs type. We don't have to rush this. Is there a better terminology? I think both are problematic with type maybe Margo ally preferrable?

  • species
  • category

Doesn't species capture the functionality better? Currently, we have species_type which returns the type of atom/particle.

@cortner
Copy link
Member Author

cortner commented Jul 10, 2024

I like species; one doesn't even need particle_species or species_type; just species by itself should be fine?

@cortner
Copy link
Member Author

cortner commented Jul 10, 2024

For the chemists here : how does species or particle_species work in relation to the concept of "chemical species" (cf. discussions about how to define chemical element in a way that incorporates isotopes and various other things ....)

@mfherbst
Copy link
Member

how does species or particle_species work in relation to the concept of "chemical species"

I think for the standard use case it works fine. I think only for coarse graining and similar it can create conceptional clashes.

@jgreener64 might have some insight on the potential language in this case.

@cortner
Copy link
Member Author

cortner commented Jul 10, 2024

I would personally be quite comfortable with "species" for a coarse-grained particle. But would be good to hear more views.

@jgreener64
Copy link
Collaborator

species sounds okay to me, as does type. I agree that id might imply an integer index. Avoiding atom and chemical in function names would be sensible if we want to indicate that particles don't have to be atoms (for situations like coarse graining).

@tjjarvinen
Copy link
Collaborator

For the chemists here : how does species or particle_species work in relation to the concept of "chemical species" (cf. discussions about how to define chemical element in a way that incorporates isotopes and various other things ....)

It is a bit problematic. Chemistry definition can lead to confusion, as chemical species can mean more than just "atom type". So, I would prever particle_type over it.

@cortner
Copy link
Member Author

cortner commented Jul 11, 2024

I don't see the issue - sounds consistent to me to use the term species here.

EDIT: @tjjarvinen -- would be great if you can substantiate your concern. I can then poll people to decide.

@tjjarvinen
Copy link
Collaborator

It is not a big issue.

It just feels a bit out of context to speak about species, when we are talking about atom types. I somehow expect that when species is used, we are discussing about larger objects like molecules, radicals etc. I guess that this would be the initial reaction of other chemists too.

@rkurchin
Copy link
Collaborator

I would posit that that could actually be a positive thing – since people might conceivably be doing coarse-grained simulations where a single "particle" could mean a radical, etc. For me, "species" as a single atom is also okay, though, but I recognize it might read differently to different people.

@rashidrafeek
Copy link
Contributor

rashidrafeek commented Jul 13, 2024

Yes. Actually reading the wikipedia page makes me think that this term is even more apt for our use case. There a classification of "Chemical species" is made as:

Types of Chemical Species:

  • Atomic species
  • Molecular species
  • Ionic species
  • Radical species

which seems to match what we want.

@cortner
Copy link
Member Author

cortner commented Jul 13, 2024

Based on the above discussion can I suggest a quick poll please?

  • particle_type -> 😄
  • particle_species -> 🚀
  • species -> 👀

@cortner
Copy link
Member Author

cortner commented Jul 16, 2024

Thanks everybody. I'll continue with the work and hopefully open the PR soon.

@cortner
Copy link
Member Author

cortner commented Aug 17, 2024

I invite everyone who has participated in this discussion to look at the PR - #107

@cortner
Copy link
Member Author

cortner commented Aug 17, 2024

As I'm wrapping up this PR, I have just a final comment that is relevant to a more general discussion. We made the decision to "hide" the implementations and make them just example implementations. I think in the long term this is not a good solution. If we can all agree on a single or very small set of implementations that cover 99% of use-cases but remain highly performant, then this would be a big advantage to this community.

So for the next version 0.5 I would like to propose to turn this around entirely:

  • make AtomsBase the home for the reference implementations
  • make a new package AtomsBaseCore that becomes the interface package.

@mfherbst
Copy link
Member

mfherbst commented Aug 19, 2024

I agree with you, but I would suggest a different name. How about calling AtomsBaseCore AbstractSystems instead, because that's all it does, providing the AbstractSystem interface ?

On the AtomsBase 0.5 side: Why don't we make DecoratedParticles the reference implementation ? It seems to cover most of the requirements ?

In fact this actually poses an important point. Maybe it is a good idea to actually not "hide" the implementation in 0.4, but rather directly move the interface part out into a different package if this is what we want to do in the long run. At least for me this would make the transition 0.3 -> 0.4 -> 0.5 smoother and associated with less code change headaches.

What are other people's thoughts ?

@cortner
Copy link
Member Author

cortner commented Aug 19, 2024

We can also agree for now to still export the reference implementations but revisit this discussion for 0.5. I don't see us solving this immediately. Can be part of the Molssi workshop discussion.

@mfherbst
Copy link
Member

Closing as #107 merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants