Deprecations on all components to disappear or change in Topology Refactor #599

dotsdl · 2015-12-27T20:34:20Z

richardjgowers · 2015-12-27T21:18:29Z

Ahh this is a good point. We'll have to start a list on the wiki or something... once everything is finalised

orbeckst · 2015-12-31T07:11:22Z

Wiki/Issue363-Changes by @richardjgowers :

Is this a second big API break? In particular, the fact that we now need an explicit .atoms to get at positions is probably going to break code, isn't it?
Also, is it clear what ResidueGroup.atoms.positions (ResidueGroup and SegmentGroup: methods that return arrays of length len(group.atoms) #411) is returning?

richardjgowers · 2015-12-31T10:58:30Z

@orbeckst Yes. If you're using Residues, ResidueGroups, Segments or SegmentGroups as AtomGroups (accessing atom-level properties such as positions off them), then yes, it will break things.

WRT #411, an XGroup property now always returns something of len(XGroup), the exceptions being .atoms, .residues and .segments, which return a different XGroup.

RG.atoms returns a set* of the Atoms in that ResidueGroup
RG.atoms.positions returns all the positions of all the atoms in the RG

I think we made all level shifts (.atoms, .residues and .segments) return sets

kain88-de · 2015-12-31T11:45:21Z

@richardjgowers @dotsdl the changes seem very nice. For the changes page, could you maybe create some real-code examples of how things will change from the current topology model to the new model? I also wouldn't mind if you started doing that in a notebook that you upload and link somewhere (a gist or maybe already as a new blog post).

dotsdl · 2016-01-04T20:02:36Z

I think we made all level shifts (.atoms, .residues and .segments) return sets

Correct. Doing

AG.atoms

will give an AtomGroup containing only one instance of each atom represented within AG in atom index order. The same goes for:

RG.atoms
SG.atoms

The same principle applies for .residues and .segments from any Group. You always get what amounts to a set.
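To make the "you always get a set" rule concrete, here is a toy model in which a group is only a tuple of integer indices into a fixed 4-atom, 2-residue topology. The classes ToyAtomGroup and ToyResidueGroup and the RESIDUE_OF_ATOM table are hypothetical stand-ins, not the MDAnalysis API; the only point is that every level shift, including AG.atoms, returns unique elements in ascending index order.

```python
# Hypothetical toy topology: atom index -> residue index (4 atoms, 2 residues).
RESIDUE_OF_ATOM = [0, 0, 1, 1]


def ordered_unique(indices):
    """Unique indices in ascending index order, i.e. an 'ordered set'."""
    return tuple(sorted(set(indices)))


class ToyAtomGroup:
    def __init__(self, indices):
        self.indices = tuple(indices)

    @property
    def atoms(self):
        # Same-level shift still deduplicates and sorts by atom index.
        return ToyAtomGroup(ordered_unique(self.indices))

    @property
    def residues(self):
        return ToyResidueGroup(
            ordered_unique(RESIDUE_OF_ATOM[i] for i in self.indices))


class ToyResidueGroup:
    def __init__(self, indices):
        self.indices = tuple(indices)

    @property
    def atoms(self):
        # Every atom belonging to any residue in the group, once, in atom order.
        wanted = set(self.indices)
        return ToyAtomGroup(
            ordered_unique(a for a, r in enumerate(RESIDUE_OF_ATOM) if r in wanted))

    @property
    def residues(self):
        return ToyResidueGroup(ordered_unique(self.indices))


ag = ToyAtomGroup([2, 0, 2, 1])
print(ag.atoms.indices)     # (0, 1, 2)
print(ag.residues.indices)  # (0, 1)
rg = ToyResidueGroup([1, 0, 1])
print(rg.atoms.indices)     # (0, 1, 2, 3)
```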

richardjgowers · 2016-01-05T12:35:35Z

I've been thinking about this, and it's a little complicated to add in deprecation warnings, but I think it's possible. So for example, RG.masses currently just inherits the AG.masses method, so you can't actually decorate the RG version as it doesn't exist as its own object.

But if we made a DeprecatedAtomGroup class, we could then deprecate everything...

class AtomGroup:
    # mostly unchanged
    pass

class DeprecatedAtomGroup(AtomGroup):
    # add in some magic here to give deprecation warnings,
    # maybe even hacking into __getattribute__...
    pass

class Residue(DeprecatedAtomGroup):
    pass

class ResidueGroup(DeprecatedAtomGroup):
    pass

orbeckst · 2016-01-05T18:21:26Z

On 5 Jan, 2016, at 05:35, Richard Gowers wrote:

I've been thinking about this, and it's a little complicated to add in deprecation warnings, but I think it's possible. So for example, RG.masses currently just inherits the AG.masses method, so you can't actually decorate the RG version as it doesn't exist as its own object.

But if we made a DeprecatedAtomGroup class, we could then deprecate everything...

There's still @kain88-de's valid question as to whether we should deprecate if we cannot yet provide an alternative.

I think the answer is that this will be a second API break and we need to think about how to mitigate this. The last break already annoyed users, as I have been told.

orbeckst · 2016-01-05T18:27:38Z

Question about

I think we made all level shifts (.atoms, .residues and .segments) return sets

Are these groups unordered or ordered, i.e. like real sets (random order) or lists with unique elements in defined order?

richardjgowers · 2016-01-05T18:54:32Z

I don't think @dotsdl and I really 100% decided on the set idea, it just seemed like the neatest solution.

So, consider a trivial system with 4 atoms and 2 residues:

(a1, r1)
(a2, r1)
(a3, r2)
(a4, r2)

I think upshifts should (and currently do?) return a set:

AtomGroup(a1, a2, a3).residues = RG(r1, r2)

AtomGroup(a1, a3, a2).residues = RG(r1, r2)

It's the downshifts that are a little confusing. This one seems clear:

RG(r1, r2).atoms = AG(a1, a2, a3, a4)

But others are more confusing:

RG(r1, r2, r1).atoms = AG(a1, a2, a3, a4) or AG(a1, a2, a3, a4, a1, a2)

Or more generically:

RG.atoms = set(R.atoms for R in RG) or [R.atoms for R in RG]
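The two candidate downshift rules can be written out directly for the 4-atom, 2-residue system above. This sketch uses plain lists and dictionaries in place of real groups (the names are illustrative only) and shows where they diverge: RG(r1, r2, r1).atoms.

```python
# Toy topology from the comment above: residue -> its atoms, in atom order.
ATOMS_OF_RESIDUE = {"r1": ["a1", "a2"], "r2": ["a3", "a4"]}
ATOM_ORDER = ["a1", "a2", "a3", "a4"]  # atom index order


def atoms_as_set(residues):
    """set(R.atoms for R in RG): unique atoms, sorted into atom index order."""
    present = {a for r in residues for a in ATOMS_OF_RESIDUE[r]}
    return [a for a in ATOM_ORDER if a in present]


def atoms_as_concatenation(residues):
    """[R.atoms for R in RG], flattened: caller's order, repeats kept."""
    return [a for r in residues for a in ATOMS_OF_RESIDUE[r]]


rg = ["r1", "r2", "r1"]
print(atoms_as_set(rg))            # ['a1', 'a2', 'a3', 'a4']
print(atoms_as_concatenation(rg))  # ['a1', 'a2', 'a3', 'a4', 'a1', 'a2']
```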

None of this will require much work to change on the new branch, so a discussion is welcome!

orbeckst · 2016-01-05T19:23:17Z

The current implementation does something like

RG.atoms = set(R.atoms for R in RG)

I also wondered if you would want duplicate atoms for RG(r1, r2, r1).atoms, and I am tending towards the existing behavior, which is also semantically clearer: "Give me the atoms contained in these residues". If you want specific atom orderings, then you need to use + or select_atoms(sel1, sel2, sel1). However: discuss!

EDIT P.S.: If sets, then I would want ordered sets, i.e. atoms in ascending atom index order.

dotsdl · 2016-01-05T19:39:45Z

Are these groups unordered or ordered, i.e. like real sets (random order) or lists with unique elements in defined order?

They are always ordered by {atom,residue,segment} index.

dotsdl · 2016-01-05T19:47:23Z

To elaborate further on the why...

Making *Group.atoms, *Group.residues, and *Group.segments always yield ordered sets, no matter the level coming from or going to, is the simplest rule that also accommodates an intuitive use. If one wants the resids of each atom in an AtomGroup, one can already get that with:

AG.resids

If one wants the resids of the residues represented among the atoms in the AtomGroup, one can then do:

AG.residues.resids

Likewise, if one wants an AtomGroup with no duplicates, that's easy with:

AG.atoms

The point is, making .atoms, .residues, .segments always yield ordered sets is a simple rule that gains functionality without losing any.
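The three idioms above differ mainly in length: AG.resids has one entry per atom, AG.residues.resids one per unique residue, and AG.atoms drops duplicate atoms. A plain-list sketch with made-up resids (the helper names are hypothetical, not library functions):

```python
# Hypothetical toy data: resid of each atom, indexed by atom index.
RESID_OF_ATOM = [10, 10, 11, 11, 12]


def resids(atom_indices):
    """Like AG.resids: one value per atom in the group, duplicates included."""
    return [RESID_OF_ATOM[i] for i in atom_indices]


def residues_resids(atom_indices):
    """Like AG.residues.resids: one value per unique residue.

    Sorting resids assumes they increase with residue index, as in this toy data.
    """
    return sorted(set(resids(atom_indices)))


def unique_atoms(atom_indices):
    """Like AG.atoms under the ordered-set rule: no duplicates, index order."""
    return sorted(set(atom_indices))


ag = [3, 0, 1, 3]
print(resids(ag))           # [11, 10, 10, 11]
print(residues_resids(ag))  # [10, 11]
print(unique_atoms(ag))     # [0, 1, 3]
```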

dotsdl · 2016-01-05T19:50:54Z

TLDR: Any other scheme probably requires different rules for what you get depending on which level you are coming from, which gets confusing and unintuitive real fast. We considered not using sets for about a day before we realized other schemes were hard to explain even to ourselves.

orbeckst · 2016-01-05T19:52:49Z

On 5 Jan, 2016, at 12:47, David Dotson wrote:

Likewise, if one wants an AtomGroup with no duplicates, that's easy with:

AG.atoms

The point is, making .atoms, .residues, .segments always yield ordered sets is a simple rule that also affords functionality that is gained, but not lost anywhere.

I am not so sure I like the fact that you can have AG != AG.atoms... At the moment:

ag = u.atoms[[0,1,0]]
ag.atoms == ag

You might break code in rather subtle ways.
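A small illustration of how the change could break code silently, using plain lists and invented masses rather than a real Universe: anything that computed a quantity from g.atoms (here a total mass) gives the same answer as g today, but a different one once .atoms deduplicates. No error is raised in either case.

```python
MASSES = [12.011, 1.008, 15.999]  # hypothetical per-atom masses by atom index


def total_mass(indices):
    return sum(MASSES[i] for i in indices)


ag = [0, 1, 0]                  # like u.atoms[[0, 1, 0]]
atoms_today = list(ag)          # current behaviour: ag.atoms == ag
atoms_as_set = sorted(set(ag))  # proposed rule 1: unique atoms only

print(len(atoms_today), len(atoms_as_set))         # 3 2
print(total_mass(atoms_today) == total_mass(ag))   # True
print(total_mass(atoms_as_set) == total_mass(ag))  # False
```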

orbeckst · 2016-01-05T19:57:09Z

On 5 Jan, 2016, at 12:50, David Dotson wrote:

TLDR: Any other scheme requires probably different rules for what you get depending on which level you are coming from, which gets confusing and unintuitive real fast. We considered not using sets for about a day before we realized other schemes were hard to explain even to ourselves.

Fair enough, but I think the discussion here really shows that we have to be careful how we introduce this and how we prepare users. It's all good to be excited about a sleek new implementation (which is great, don't get me wrong!) but you also have a responsibility towards the user base, and most of them hate having to debug code that "used to work". Thus, it should be very clear what changes (and the wiki page is a starting point). I also liked the idea of writing a blog post before a merge, something like a vision for what needs to be changed and why and explain some of the rationale.

dotsdl · 2016-01-05T20:05:21Z

I am not so sure I like the fact that you can have AG != AG.atoms... At the moment:

ag = u.atoms[[0,1,0]]
ag.atoms == ag

You might break code in rather subtle ways.

What is the point of ag.atoms if it gives you exactly the same object as simply ag? I get that there's probably a lot of code that uses these semantics, but I think it makes about as much sense as if a numpy array had a .array property that just pointed to itself.

orbeckst · 2016-01-05T21:26:46Z

Because the underlying principle was "everything is an AtomGroup", it was very convenient to use either the object itself (let's call it g) or g.atoms if you were not sure that g was actually a base AtomGroup and not e.g. a ResidueGroup. It made it much more convenient to use groups at different levels of the hierarchy.

What you are proposing for .atoms changes the meaning quite dramatically, from "these are the atoms as we have them recorded and arranged in this group" to "unique atoms in the group".

I am not saying that we can't change this but one has to be a bit sensitive to current usage.

kain88-de · 2016-01-05T22:04:38Z

How do you say an atom is unique? Do you go with the ID of the atom?

I still think it would be very nice to have an example where old code has to be changed or would be broken by the changes. That would make it easier to see how the changes would affect users.

richardjgowers · 2016-01-05T22:23:43Z

@kain88-de yeah, an Atom is just an index. Or rather, all Atoms/Residues/Segments are uniquely defined according to (Universe, Level (A/R/S), Index). The class defines the first two, so within the class you just worry about indices.
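The identity rule described here can be modelled as a hashable key of (universe, level, index). The dataclass below is a hypothetical illustration, with an integer standing in for the Universe object; it shows that two handles to the same atom compare equal and collapse in a set, while the same index at a different level does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentKey:
    universe_id: int  # stand-in for the Universe object
    level: str        # "A", "R" or "S"
    index: int


a = ComponentKey(universe_id=1, level="A", index=5)
b = ComponentKey(universe_id=1, level="A", index=5)
c = ComponentKey(universe_id=1, level="R", index=5)
print(a == b, a == c)  # True False
print(len({a, b, c}))  # 2
```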

dotsdl · 2016-01-05T23:06:35Z

@orbeckst I think there are a few possible options for this whole scheme that we can debate, to help focus the discussion:

1. Everything gives a set

Same rule for everything, in which a call to .atoms, .residues, .segments always gives a group of unique elements represented at that level.

Call	Gives
`AtomGroup.atoms`	ordered `AtomGroup` of unique atoms represented in `AtomGroup`
`AtomGroup.residues`	ordered `ResidueGroup` of unique residues represented in `AtomGroup`
`AtomGroup.segments`	ordered `SegmentGroup` of unique segments represented in `AtomGroup`
`ResidueGroup.atoms`	ordered `AtomGroup` of unique atoms represented in `ResidueGroup`
`ResidueGroup.residues`	ordered `ResidueGroup` of unique residues represented in `ResidueGroup`
`ResidueGroup.segments`	ordered `SegmentGroup` of unique segments represented in `ResidueGroup`
`SegmentGroup.atoms`	ordered `AtomGroup` of unique atoms represented in `SegmentGroup`
`SegmentGroup.residues`	ordered `ResidueGroup` of unique residues represented in `SegmentGroup`
`SegmentGroup.segments`	ordered `SegmentGroup` of unique segments represented in `SegmentGroup`

2. Everything gives a set except for the same level

Same as 1, with the exception that at the same level you get back the same object. Calling ResidueGroup.atoms gets you a set always, whereas AtomGroup.atoms does not.

Call	Gives
`AtomGroup.atoms`	`AtomGroup` identical to the calling `AtomGroup`
`AtomGroup.residues`	ordered `ResidueGroup` of unique residues represented in `AtomGroup`
`AtomGroup.segments`	ordered `SegmentGroup` of unique segments represented in `AtomGroup`
`ResidueGroup.atoms`	ordered `AtomGroup` of unique atoms represented in `ResidueGroup`
`ResidueGroup.residues`	`ResidueGroup` identical to the calling `ResidueGroup`
`ResidueGroup.segments`	ordered `SegmentGroup` of unique segments represented in `ResidueGroup`
`SegmentGroup.atoms`	ordered `AtomGroup` of unique atoms represented in `SegmentGroup`
`SegmentGroup.residues`	ordered `ResidueGroup` of unique residues represented in `SegmentGroup`
`SegmentGroup.segments`	`SegmentGroup` identical to the calling `SegmentGroup`

3. Order and repeat elements preserved as much as possible

Adds a unique method/property to each *Group class that returns the group as a set.

At the atom level, calling .atoms, .residues, or .segments gets a group with an element for each atom.

At the residue level, calling .atoms gives an AtomGroup with repeats if the calling ResidueGroup itself had repeats, and atoms are given in the same order as the residues in the ResidueGroup. Calling .residues and .segments gets a group with an element for each residue.

At the segment level, calling .residues gives a ResidueGroup with repeats if the calling SegmentGroup itself had repeats, and residues are given in the same order as the segments in the SegmentGroup. Calling .atoms gives an AtomGroup of the atoms represented by each residue in residue order in each segment in segment order in the calling SegmentGroup, including repeats. Calling .segments gets a group with an element for each segment.

Call	Gives
`AtomGroup.atoms`	`AtomGroup` identical to the calling `AtomGroup`
`AtomGroup.residues`	`ResidueGroup` of `len(AtomGroup)` giving the residue for each atom
`AtomGroup.segments`	`SegmentGroup` of `len(AtomGroup)` giving the segment for each atom
`ResidueGroup.atoms`	`AtomGroup` giving the atoms for each residue in the calling `ResidueGroup`'s order
`ResidueGroup.residues`	`ResidueGroup` identical to the calling `ResidueGroup`
`ResidueGroup.segments`	`SegmentGroup` of `len(ResidueGroup)` giving the segment for each residue
`SegmentGroup.atoms`	`AtomGroup` giving the atoms for each segment in the calling `SegmentGroup`'s order
`SegmentGroup.residues`	`ResidueGroup` giving the residues for each segment in the calling `SegmentGroup`'s order
`SegmentGroup.segments`	`SegmentGroup` identical to the calling `SegmentGroup`

4. Upward shifts give sets, downward shifts give list comprehensions

Similar to 3, except that giving AtomGroup.residues with len(AG) is a little silly, so use 1's rule for upward shifts.

Adds unique, which returns the group as a set.

1's downwards rules can be done using ResidueGroup.unique.atoms (but 1 can't really recreate the downward shifts here).

Call	Gives
`AtomGroup.atoms`	`AtomGroup` back again (`AtomGroup.atoms == AtomGroup`)
`AtomGroup.unique`	`set(AtomGroup)`
`AtomGroup.residues`	`set([Atom.residue for Atom in AG])`
`AtomGroup.segments`	`set([Atom.segment for Atom in AG])`
`ResidueGroup.atoms`	`concatenate([Residue.atoms for Residue in RG])`
`ResidueGroup.unique`	`set(ResidueGroup)`
`ResidueGroup.residues`	`ResidueGroup`
`ResidueGroup.segments`	`set([R.segment for R in ResidueGroup])`
`SegmentGroup.atoms`	`concatenate([S.atoms for S in SegmentGroup])`
`SegmentGroup.unique`	`set(SegmentGroup)`
`SegmentGroup.residues`	`concatenate([S.residues for S in SegmentGroup])`
`SegmentGroup.segments`	`SegmentGroup`
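Option 4 on the same 4-atom, 2-residue toy system, with plain lists standing in for groups (all names here are illustrative). Upward shifts deduplicate, downward shifts concatenate in the caller's order, and applying unique first recovers option 1's downward result, which is the ResidueGroup.unique.atoms route mentioned above.

```python
RESIDUE_OF_ATOM = {"a1": "r1", "a2": "r1", "a3": "r2", "a4": "r2"}
ATOMS_OF_RESIDUE = {"r1": ["a1", "a2"], "r2": ["a3", "a4"]}
RESIDUE_ORDER = ["r1", "r2"]


def unique(group, order):
    """XGroup.unique: duplicates dropped, sorted into index order."""
    present = set(group)
    return [x for x in order if x in present]


def ag_residues(ag):
    """Upward shift: set([Atom.residue for Atom in AG]), in residue order."""
    return unique([RESIDUE_OF_ATOM[a] for a in ag], RESIDUE_ORDER)


def rg_atoms(rg):
    """Downward shift: concatenate([R.atoms for R in RG]), repeats kept."""
    return [a for r in rg for a in ATOMS_OF_RESIDUE[r]]


ag = ["a3", "a1", "a3"]
rg = ["r2", "r1", "r2"]
print(ag_residues(ag))                      # ['r1', 'r2']
print(rg_atoms(rg))                         # ['a3', 'a4', 'a1', 'a2', 'a3', 'a4']
print(rg_atoms(unique(rg, RESIDUE_ORDER)))  # ['a1', 'a2', 'a3', 'a4']
```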

dotsdl · 2016-01-05T23:07:31Z

I'm an advocate for 1, although the others can technically work.

jbarnoud · 2016-01-06T10:58:23Z

The main difference between 1 and 2 is that, in 1, AtomGroup.atoms removes duplicates, while 2 keeps them. Am I right?

To what extent are AtomGroups mutable? Indeed, the other difference I see is that in 1 AtomGroup.atoms returns a copy, while in 2 it does not.

I foresee less surprise with 1, as we can say that these properties always return a new object and that they always deduplicate elements.

I am curious about 3. Is it possible to get (even not directly), with 1, what 3 would output?

richardjgowers · 2016-01-06T11:05:19Z

@jbarnoud yes I think you're correct about 1 vs 2

What I don't like in 1 is that AG.atoms doesn't explicitly say it's a set, I'd much rather that AG.atoms == AG and then add AG.unique == set(AG)

I think I'd like to invent 4, which is identical to 3 except AG.residues returns a set (and AG.segments, RG.segments)

tylerjereddy · 2016-01-06T12:14:06Z

I'll admit to only skimming this issue, but do let me know if I need to write any lib2to3 fixers for API changes. Extending ten2eleven to provide warning messages where the deprecation architecture is complicated or philosophically not correct might be an idea.

hainm · 2016-02-13T00:31:36Z

And what does segment do? Why not only atoms and residues?

orbeckst · 2016-02-13T00:33:57Z

On 12 Feb, 2016, at 17:31, Hai Nguyen wrote:

And what does segment do? Why not only atoms and residues?

Larger units, such as, say, actual molecules.

Atoms are real, molecules are real, residues are convenient ;-)

(But of course, the whole solvent or the membrane could also conveniently be grouped as a segment, so that's why we are not calling it a molecule... that's what fragments are for, but they don't fit into the hierarchy.)

dotsdl · 2016-02-13T06:37:18Z

@hainm our new topology system is more performant, both in speed and memory usage, and more flexible (no hardcodes of attributes) than either the current MDAnalysis topology system or that of mdtraj. There is no reason to regress on this front. The new system also preserves much of the existing intuitive API of MDAnalysis that works quite well, but internally eliminates the possibility of staleness along with the performance benefits.

The only things that need deciding are what attribute access down the hierarchy returns. See the conversation earlier and my gist on the question.

dotsdl · 2016-02-13T06:44:09Z

@orbeckst

So to sidestep the ambiguity of accessing lower-level attributes from a single object (Segment, Residue), we've so far left out properties for these. That means that there is no Segment.names or Residue.names, even though there is a SegmentGroup.names and ResidueGroup.names.

Instead, accessing names from a Segment can be done with either Segment.atoms.names or Segment.residues.names. Accessing names from a Segment in this way requires one to be explicit in the level desired, which under my proposal would yield iterables structured according to the level used.

Is this a satisfying resolution? In short, we leave out lower-level properties from Residue and Segment, requiring explicit level specification. This is internally consistent with the idea that what is returned is an iterable of equal length to the calling object, since Residue and Segment aren't iterables at all.

dotsdl · 2016-02-13T06:50:00Z

Ah, should have re-read your summary:

Thus I propose to resolve any ambiguous cases by referring to properties in the next lower tier in the hierarchy.

You're in favor of making Segment behave roughly like ResidueGroup and Residue behave roughly like AtomGroup, then? I think that's my reading of it. I think it's doable, but would love to hear what @richardjgowers thinks.

hainm · 2016-02-13T06:52:11Z

@hainm our new topology system is more performant, both in speed and memory usage, and more flexible (no hardcodes of attributes) than either the current MDAnalysis topology system or that of mdtraj. There is no reason to regress on this front. The new system also preserves much of the existing intuitive API of MDAnalysis that works quite well, but internally eliminates the possibility of staleness along with the performance benefits.

Yeah, I'm only commenting based on what I've seen in the MDA and mdtraj tutorials, so I don't know what users want. Can you give a detailed example of the new topology? Will the new design be widely used? How often will it be used? I am afraid that we are introducing many ideas that end up never being used (I threw away half of the pytraj code that I wrote over half a year to get simplicity :D).

hainm · 2016-02-13T06:58:16Z

For example: how often do people use the '_setter' method for SegmentGroup? If this design is for topology editing, shouldn't we let users use a more dedicated program?

dotsdl · 2016-02-13T06:59:14Z

Can you propose a detail example about new topology?

The new topology system works under the hood. Most users won't notice it, save for the performance gains and the API breaks here and there (which we are discussing now, in this very thread, and deprecating ahead of time what we can).

See the wiki for some details on what's changed and why. More will be written about it as we get closer to the merge of issue-363 into develop. You can give it a spin if you like, too. Most things work already.

dotsdl · 2016-02-13T07:00:43Z

For example: how often people use '_setter' method for SegmentGroup? If this design is for Topology editit, should let user to use another more-dedicated program?

Huh? I have no idea what you're asking. At any rate, this thread is pretty focused on a particular question, and I'd rather we keep it focused on that question until it's resolved. You are welcome to have a look at the issue-363 branch yourself to see what's going on there.

hainm · 2016-02-13T07:01:00Z

thanks.

sorry to dilute this issue.

richardjgowers · 2016-02-13T10:00:05Z

Hey

I think the nested returns are new since we last spoke, but I think they make sense. The only problem I see with them is that some atom properties (mass and charge) don't work like that, so it's not completely smooth: RG.atomprop isn't always the same shape (but it always has the same len, i.e. shape[0]).

orbeckst · 2016-02-14T06:41:52Z

@dotsdl ,

You're in favor of making Segment behave roughly like ResidueGroup and Residue behave roughly like AtomGroup, then? I think that's my reading of it. I think it's doable, but would love to hear what @richardjgowers thinks.

Yes, together with your proposal to use nested lists (or iterables) where appropriate.

It is true that this breaks orthogonality (as @richardjgowers pointed out) but I think we can gain more in functionality. At a minimum we do guarantee that for plural attributes one always gets an iterable of len(group), with the items either being aggregated (charge, mass, ...) or being iterables themselves.
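The len(group) guarantee can be shown with a two-residue toy ResidueGroup built from plain lists (hypothetical data and helper names): masses are aggregated to one number per residue, names stay nested as one list per residue, and both results have one entry per residue even though their shapes differ.

```python
# Hypothetical toy ResidueGroup: each residue is a list of (atom name, mass).
RESIDUE_ATOMS = [
    [("C", 12.011), ("H", 1.008)],
    [("O", 15.999), ("H", 1.008), ("H", 1.008)],
]


def rg_masses(residues):
    """Aggregated attribute: one total mass per residue."""
    return [sum(mass for _, mass in atoms) for atoms in residues]


def rg_names(residues):
    """Nested attribute: one list of atom names per residue."""
    return [[name for name, _ in atoms] for atoms in residues]


masses = rg_masses(RESIDUE_ATOMS)
names = rg_names(RESIDUE_ATOMS)
print(len(masses), len(names))  # 2 2
print(names)                    # [['C', 'H'], ['O', 'H', 'H']]
```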

dotsdl · 2016-03-11T02:18:47Z

@orbeckst given your comment here, I take it this means we're fine with leaving out things like Residue.names that go downward in the hierarchy? That is, getting an atom-level property from a residue means doing e.g. Residue.atoms.names? If so, I'm in favor and will move forward with this (actually, I think we're already there...).

orbeckst · 2016-03-11T14:53:33Z

Yes.

dotsdl · 2016-03-11T18:00:34Z

Excellent. There's a fairly clear path forward then. Will keep hammering to get everything in the right place. As I said, I think we're at least 90% there already.

mnmelo · 2016-03-11T18:04:31Z

I'm also cool with that, but what is then the recommended way to have a list of all residue names? (Unduplicated, one name per residue)

dotsdl · 2016-03-11T18:06:42Z

@mnmelo no matter what group you're using to start with, you'd do grp.residues.resnames.

mnmelo · 2016-03-11T18:09:19Z

Ok, suspected as much. Cool with me.

richardjgowers · 2016-03-11T19:12:37Z

@dotsdl if you had a RG, doesn't RG.residues just return itself? I think grp.residues.unique is the most foolproof way?

dotsdl · 2016-03-11T19:23:50Z

@richardjgowers ah yeah, fair enough. Doing grp.residues will give you a ResidueGroup formed according to table (4).

@mnmelo if you want resnames for each residue but with absolutely no duplicate residues, you would do grp.residues.unique.resnames to get this.
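A toy version of grp.residues.unique.resnames for a ResidueGroup that contains a repeated residue, using integer residue indices and a made-up resname table (not the real API). Under table (4), RG.residues is the group itself, so the repeat survives until unique is applied. Note that the result is one name per unique residue, not a set of distinct names: two different residues can both be ALA.

```python
RESNAMES = ["ALA", "GLY", "ALA"]  # hypothetical resname of residues 0, 1, 2


def unique(indices):
    """XGroup.unique: duplicate elements dropped, index order."""
    return sorted(set(indices))


rg = [2, 0, 2]  # a ResidueGroup with residue 2 repeated
residues = rg   # table (4): ResidueGroup.residues is the group itself
print([RESNAMES[i] for i in residues])          # ['ALA', 'ALA', 'ALA']
print([RESNAMES[i] for i in unique(residues)])  # ['ALA', 'ALA']
```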

dotsdl · 2016-03-30T04:30:46Z

Once #698 is merged, I'll fill in the checklist and we'll close. :D

dotsdl added Priority-High usability Component-Core labels Dec 27, 2015

dotsdl self-assigned this Dec 27, 2015

dotsdl added this to the 0.13 milestone Dec 27, 2015

orbeckst mentioned this issue Dec 28, 2015

new 0.13.0 release #600

Closed

dotsdl modified the milestones: 0.14, 0.14.1 - Bugfixes Feb 18, 2016

dotsdl mentioned this issue Feb 18, 2016

All deprecations on AtomGroup and friends getters and setters. #698

Merged

orbeckst mentioned this issue Mar 1, 2016

topology API semantics test case #703

Closed

dotsdl added a commit that referenced this issue Mar 28, 2016

Accessing topologyattrs always gives iterable of same length as caller.

ada354e

This in accordance with the consensus on #599 (#599 (comment)).

dotsdl closed this as completed Apr 11, 2016

dotsdl mentioned this issue Jan 16, 2017

Residue & Segment slicing. Residue.select_atoms? #1163

Closed
