Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading chainID info from prmtop amber topologies. #4007

Merged
merged 5 commits into from
Sep 3, 2023

Conversation

pgbarletta
Copy link
Contributor

@pgbarletta pgbarletta commented Jan 23, 2023

Hi!

Would something like this be of interest? While tleap doesn't add chainID info, parmed does and I though I would be nice to support it as well.

Changes made in this Pull Request:

  • Add get_fmt() to TOPParser.parsesection_mapper() to support .prmtop comments like the following:
%FLAG RESIDUE_CHAINID
%COMMENT Residue chain ID (chainId) read from PDB file; DIMENSION(NRES)
%FORMAT(20a4)
  • Add parse_chainids() to TOPParser to try to parse chainID information from .prmtop topology files.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Jan 23, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c2e6df9) 93.39% compared to head (2c6513b) 93.40%.

Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #4007     +/-   ##
==========================================
  Coverage    93.39%   93.40%             
==========================================
  Files          170      184     +14     
  Lines        22224    23354   +1130     
  Branches      4065     4071      +6     
==========================================
+ Hits         20757    21814   +1057     
- Misses         951     1024     +73     
  Partials       516      516             
Files Changed Coverage Δ
package/MDAnalysis/topology/TOPParser.py 100.00% <100.00%> (ø)

... and 14 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@orbeckst
Copy link
Member

@lilyminium would you have some bandwidth to have a look at this topology parser enhancement?

Alternatively, can you recommend someone else to review — if so, please pass the buck and ping them ;-).

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple initial comments, my major blocker here is that, as far as I can tell, RESIDUE_CHAINID isn't a canonical flag for the parm7 format. I'd like to ask for docs on this format spec addition.

package/MDAnalysis/topology/TOPParser.py Outdated Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Outdated Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should have been a blocking review sorry.

@pgbarletta
Copy link
Contributor Author

Here's the parmed function that add's the RESIDUE_CHAINID flag.

IDK if Amber developers consider it canonical, but parmed writes it, so I guess it can be relied upon.

@IAlibay
Copy link
Member

IAlibay commented Jul 11, 2023

Could one of the other @MDAnalysis/coredevs weigh in here please? I'm rather on the fence on adding fields being introduced by parmed that aren't directly in the spec, I could go either way here.

@orbeckst
Copy link
Member

I am not an AMBER user myself so I don't know how important it is for typical use to support ParmEd extensions to the format. Could someone with AMBER experience comment?

@pgbarletta is the parm7 format extensible, i.e. does it allow for additional/optional flags? Does it state what should happen if an optional flag is encountered?

@pgbarletta
Copy link
Contributor Author

@pgbarletta is the parm7 format extensible, i.e. does it allow for additional/optional flags? Does it state what should happen if an optional flag is encountered?

Parmed does provide functions for the addition/removal of %FLAGs. This is the most up to date (that is, 2004) reference for the format (according to the official site) which already shows modifications to support conversion from CHARMM parameters.

If we decide to move forward, I should modify some one of the .prmtop topologies from the tests so I can test this feature. Otherwise, as @IAlibay, I could either move forward or scrap it. I just thought it would be a nice addition since amber users are left with no chainID info when reading from .rst7 and .prmtop .

@orbeckst
Copy link
Member

Thank you @pgbarletta . The PDF reference says

  • Any new SECTION should be added to the end of the topology file to avoid conflicts with order-dependent parsers.
  • ...
  • Avoid modifying if possible. Consider if this new section or change is truly necessary and belongs in the prmtop.

I read this as "it is allowed to extend the PRMTOP" so that we may have optional %FLAG SECTION.

Given that ParmEd produced files are probably still common, I am supporting parsing an optional %FLAG RESIDUE_CHAINID field, provided there are no problems when it is absent.

@github-actions
Copy link

github-actions bot commented Jul 13, 2023

Linter Bot Results:

Hi @pgbarletta! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ✅ Passed

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/6060116484/job/16443944129


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

@pgbarletta
Copy link
Contributor Author

Thank you @pgbarletta . The PDF reference says
...

Oops, sorry, missed that.

Modified the ache.prmtop data file and the corresponding test, so the feature is tested as well.
If tests go well, I'll add it to the changelog and re-request a review.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My primary concern is with testing: Can you leave the old test file as is and make a copy with the additions. Then use the new one to run additional tests for the new features. We need to know that files without the optional parts also work. Can you change the new test file so that there are multiple segments (even if it makes little sense for the actual system) but so that multi-chains is also tested.

For other things see inline comments — please reply there. Thanks!

package/CHANGELOG Outdated Show resolved Hide resolved
Comment on lines 155 to 156
can be added with the addPDB command from parmed:
https://parmed.github.io/ParmEd/html/parmed.html#addpdb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use named links instead of bare links (see example above with

`PARM parameter/topology file specification`_

State what happens when %RESIDUE_CHAINID is not present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
x = FORTRANReader(y)

def get_fmt(file):
if (line := next(file))[:7] == "%FORMAT":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the syntax (line := ...) do, i.e., (x := y) ??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is equivalent to

line = next(file)
if line[:7] == "%FORMAT":
   ...

I hadn't see C-style "assignment inside conditionals" in Python. In which version of Python was this introduced?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find the old-school Python two-line syntax clearer but I'll concede that this is a matter of personal style. More important is if this syntax is supported by all our Python versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the 'walrus operator', added on 3.8 and the minimum supported one is 3.9, so we'd be good. Anyways, I ended up dropping it.

Comment on lines 666 to 668
attr : :class:`Segids`
A :class:`Segids` instance containing the chainID of each residue
as defined in the parm7 file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really return a segids instance?? It looks as if this just returns an array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad mistake. Fixed.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
Comment on lines 1 to 6
%VERSION VERSION_STAMP = V0001.000 DATE = 07/25/06 09:16:39
%FLAG TITLE
%FORMAT(20a4)
NALA
%FLAG POINTERS
%FORMAT(10I8)
%VERSION VERSION_STAMP = V0001.000 DATE = 07/13/23 13:12:57
%FLAG TITLE
%FORMAT(20a4)
NALA
%FLAG POINTERS
%FORMAT(10I8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that the new version does not have full white-space padded lines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a tell of which software was used to write the .prmtop. tleap uses fortran format writing, while parmed writes only the character that needs.

%COMMENT If present: %FLAG RESIDUE_ICODE, %FORMAT(20a4)
%FORMAT(20I4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14
%FLAG RESIDUE_CHAINID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep the old test file and make a new one with the additions. Then test both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pgbarletta pgbarletta force-pushed the add_top_segid branch 4 times, most recently from d1ddbcb to 2a0bb02 Compare July 21, 2023 09:48
@pgbarletta
Copy link
Contributor Author

  • I added TestPRMmultiParser to test the new multi_ache.prmtop and restored TestPRMParser and its tested file ache.prmtop. I copied TestPRMParser to make TestPRMmultiParser so there's a lot of overlap between the tests that I think are a bit of a waste. If you're ok with it, I'd like to remove the tests on atom_i and atom_zero.
  • Regarding errors in the RESIDUE_CHAINID flag: I decided that giving a proper warning and just reading the file as if there was no RESIDUE_CHAINID info was the right thing to do, mainly because this is an optional flag and I don't think a field that is optional should prevent the user from doing what they want. Let me know if it's ok with you.

@pgbarletta pgbarletta requested a review from orbeckst July 21, 2023 10:17
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick initial comments - I'll have a proper review over the weekend.

attrs["segids"] = Segids(segids)
attrs["ChainIDs"] = ChainIDs(chainids)
n_segs = len(segids)
except (KeyError, ValueError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this, the default option is that the optional attribute (which isn't going to be present in most PARM7 files), is the option you hope will happen and then throw an error in all other cases?

I'm not very keen on this, since it means that we have to eat up the cost of throwing an error for 99% of the time. Is there a better way to reformat this so we don't have to deal with the error throwing / catching cost in most cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Using ifs now.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
@@ -0,0 +1,4506 @@
%VERSION VERSION_STAMP = V0001.000 DATE = 07/20/23 21:22:41
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

304 kb for a new optional case is quite a lot. Can we bz2 this file please?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -426,6 +426,8 @@
PFncdf_Top = (_data_ref / 'Amber/posfor.top').as_posix()
PFncdf_Trj = (_data_ref / 'Amber/posfor.ncdf').as_posix()

PRMmulti = (_data_ref / "Amber/multi_ache.prmtop").as_posix()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRMmulti doesn't really give me a sense of what is in this file. Can we either rename it to "PRM_chainids" or at the very least have a comment explaining what this file contains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 158 to 168
"names",
"types",
"type_indices",
"charges",
"masses",
"resnames",
"bonds",
"angles",
"dihedrals",
"impropers",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changed here? If it's just a reformatting thing, could you revert this? It's a lot of extra noise on the diff that will make git blaming difficult if it ever needs to happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I can revert that if you want, but Darker asks for it. If the idea was to go towards flake8/black compliance, then stuff like this is gonna happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the text of the darker bot, if it's not a flake8 failure then it's ok to ignore.

The flake8 failures are there but don't need addressing because we don't enforce changes on datafiles.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I misread that. I thought it was my new TestPRMChainidParser class, not the old TestPRMParser. It's fixed now.

@@ -201,6 +210,76 @@ class TestPRMParser(TOPBase):
expected_elems = None


class TestPRMmultiParser(TOPBase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, multi is a bit hard to know what it does, so maybe add an issue / PR number and a comment on what it's checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
@IAlibay
Copy link
Member

IAlibay commented Jul 31, 2023

@pgbarletta we are changing the way in which future contributions to MDAnalysis are made. Could you please confirm that you agree to releasing this code under the terms of the LGPLv2.1 and that your contribution also adheres to the developer certificate of origion?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick feedback, sorry this is taking so long.

@orbeckst can I ask you for a review here please?

package/CHANGELOG Outdated Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/datafiles.py Outdated Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smaller questions + what @IAlibay said.

Comment on lines 30 to 31
* Add support for reading chainID info from prmtop amber topologies
(PR #4007)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not resolved because your editor inserted a <TAB> instead of spaces, try

Suggested change
* Add support for reading chainID info from prmtop amber topologies
(PR #4007)
* Add support for reading chainID info from prmtop amber topologies
(PR #4007)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation is still incorrect. If you commit my suggestion, it should be fixed.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
@pgbarletta
Copy link
Contributor Author

@pgbarletta we are changing the way in which future contributions to MDAnalysis are made. Could you please confirm that you agree to releasing this code under the terms of the LGPLv2.1 and that your contribution also adheres to the developer certificate of origion?

Yes I release the code under LGPLv2.1 terms and yes my contribution adheres to the developer certificate of origin.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You addressed all the major things but it looks as if some of the smaller ones fell through the cracks — see comments please. Just the minor issues and then it's good from my side.

EDIT: The test is a major request that still needs to be added, see uncovered lines https://app.codecov.io/gh/MDAnalysis/mdanalysis/pull/4007/blob/package/MDAnalysis/topology/TOPParser.py#L322

Comment on lines 30 to 31
* Add support for reading chainID info from prmtop amber topologies
(PR #4007)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation is still incorrect. If you commit my suggestion, it should be fixed.

package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
package/MDAnalysis/topology/TOPParser.py Show resolved Hide resolved
@orbeckst
Copy link
Member

Some runners timed out. I restarted one but stupid GH does not let me restart the other one in parallel.

@pgbarletta pgbarletta force-pushed the add_top_segid branch 4 times, most recently from f601f14 to 10f89be Compare August 16, 2023 14:22
@IAlibay
Copy link
Member

IAlibay commented Aug 16, 2023

Sorry about the delay @pgbarletta - Just been a bit stalled by the current release. I'm going to try to get a re-review in this evening.

@pgbarletta
Copy link
Contributor Author

Sorry about the delay @pgbarletta - Just been a bit stalled by the current release. I'm going to try to get a re-review in this evening.

No worries, I just switched continents and I've been dealing with that as well.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So sorry about the delay here @pgbarletta - I'm finally out of release mode so gettting this merged is now my priority.

On my end of things, there's only 2 things left and we should be good to go.

@orbeckst can I ask you for a re-review please?

@@ -162,6 +170,8 @@ class TOPParser(TopologyReaderBase):
warns users that chamber-style topologies are not currently supported
.. versionchanged:: 2.0.0
no longer guesses elements if missing
.. versionchanged:: 2.6.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. versionchanged:: 2.6.0
.. versionchanged:: 2.7.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just this and I think we're good (at least on my side of things)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, sorry. Fixed.

@@ -85,6 +88,23 @@ def test_impropers_atom_counts(self, filename):
assert len(u.atoms[[self.atom_i]].impropers) == \
self.expected_n_i_impropers

def test_chainIDs(self, filename):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than have this in a parent class that will be only used by one child class, could we just put this in the child class only?

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me.

@orbeckst could I ask you for a re-review please?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looking good! Thank you @pgbarletta !!

Comment on lines +336 to +340
expected_chainIDs = np.array(
[
"A",
"A",
"A",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this "one line for each element" black's doing? ... Not my favorite code aesthetic (because it makes it harder to see context) but I accept it as your preferred choice.

So don't worry about my comment, I am just venting — I actually recognize the upsides to auto-formatters, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is one of those bits that really kills readability and intent in code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't even understand the rationale when it seems perfectly capable to organize other arrays neatly fitted in a line. ... Aaaaanyways... I am trying out black in MDPOW, just to see what it feels like and if it's really as life-altering, time-saving as @RMeli says it is ;-).

@orbeckst orbeckst self-assigned this Sep 3, 2023
@orbeckst orbeckst merged commit 2acd594 into MDAnalysis:develop Sep 3, 2023
19 of 21 checks passed
@orbeckst
Copy link
Member

orbeckst commented Sep 3, 2023

Thank you @pgbarletta — it took a while (sorry) but it's finally merged! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants