fix header length in dcd #1767

kain88-de · 2018-02-02T08:31:32Z

Changes made in this Pull Request:

The remark information is now consistent. Previously the header declared that the title block is 164 bytes long while we actually wrote 244 bytes.
DCDReader now uses the istart information to set the correct time.

jbarnoud · 2018-02-02T08:50:51Z

Would there be a way to test that?

kain88-de · 2018-02-02T08:53:04Z

I asked @tclick to test out this branch. Once we know it works I can write a test to ensure consistency. We might also be able to rewrite the readdcd.h code to accept variable length headers.

tclick · 2018-02-02T09:53:31Z

I tested this using the CHARMM script, and it works. However, the thermodynamic calculations differ considerably from the original trajectory. This is due to line 403 in coordinates/DCD.py. If you change istart=0) to istart=1), then the previous and the current trajectory outputs match.

kain88-de · 2018-02-02T10:12:00Z

Do DCDs always start at 1? Should we instead make this an option of the write method and let the user decide?

tclick · 2018-02-02T10:43:38Z

In MDA 0.10.0, one was allowed to select the start time. You eventually dropped it, but AFAIK, CHARMM starts at one; I cannot say for NAMD, but VMD writes out a DCD starting at 0. That's why the thermodynamic output matched between your updated version and VMD. By allowing the user to select the start time, then s/he can decide where they want to begin the time.

kain88-de · 2018-02-02T10:51:37Z

@tclick I will make it user selectable and set the default to 1. We claim to write CHARMM files anyway, so it is better if we follow its convention.

kain88-de · 2018-02-02T13:57:01Z

@tclick can you test again?

I now set the remarks section to be 240 chars long by default (instead of the 160 I had before). I also added an option to set istart (default = 1) and added a unit test to check that we store the remark information correctly in the DCD files. This option requires the fewest changes and keeps the long comment strings we decided to support when we switched to libdcd. I would like a report on whether this works, though. If there are any problems I will switch back to 160 chars.

tylerjereddy · 2018-02-03T01:05:29Z

@kain88-de I see the review request -- will probably be Sunday at the earliest, though I'm guessing you're already fresher on the details than I. I do remember this being pretty fragile when we were porting to Python 3.

tclick · 2018-02-03T05:11:42Z

I am happy to report that the fix works. Thanks for the change.

kain88-de · 2018-02-03T08:28:20Z

@tylerjereddy I'm done now. I would appreciate it if you could have a look over the weekend.

kain88-de · 2018-02-03T08:40:26Z

@tclick this PR should have another nice change for you now. The DCDReader now checks the istart variable in the DCD header to set the correct time. Before we always started from 0. A side effect of this is that I changed the default of istart back to 0 so that the DCDWriter starts writing from time 0, which I think is consistent with the rest of MDAnalysis.

tclick · 2018-02-03T08:56:52Z

@kain88-de, that's fine. Knowing that I should set `istart=1` for CHARMM is perfectly okay. I understand the need for consistency. Besides, NAMD may honestly start at 0 (although I'm not as familiar with NAMD). You might want to put a note in the docs about this difference, so users can set `istart=1` when they want to use CHARMM for further work. I appreciate the work on this; in my current project, I moved away from using CHARMM for certain features because of this bug, and I also realized that the results could be accomplished purely in Python with MDA and numpy (albeit possibly a little slower but with fewer external dependencies).

A DCD has two byte positions to store the length of the remark section. The first holds the number of bytes in the remark plus 4. The second holds the number of 80-character sections. These two numbers have to match. So far we wrote that we use 164 bytes and three 80-character blocks, but 164 != 80*3 + 4. We have now corrected that and correctly state that we write 244 bytes in the remark section. We never noticed this because our reader ignores the byte length and looks only at the number of blocks. As a side note, DCD remarks always have to be a multiple of 80 characters in length.
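
The consistency rule described above can be sketched in a few lines of Python. This is a minimal illustration, not code from MDAnalysis: the helper name `remark_block_size` is hypothetical, and it encodes only what is stated above (80 characters per remark line plus 4 bytes for the line count).

```python
TITLE_LINE_LENGTH = 80   # each remark line is 80 characters
COUNT_FIELD_SIZE = 4     # the int32 that stores the number of remark lines


def remark_block_size(n_titles):
    """Byte length that must be declared for a remark block with n_titles lines.

    Hypothetical helper for illustration only.
    """
    return TITLE_LINE_LENGTH * n_titles + COUNT_FIELD_SIZE


declared_before = 164             # what the old writer announced
actual = remark_block_size(3)     # what three 80-character lines require

# The two values disagree, which is the inconsistency fixed here.
print(declared_before, actual, declared_before == actual)
```

Note that 164 is exactly what two remark lines would require (80*2 + 4), whereas the writer emits three.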

Programs that use DCD have different conventions for istart, the starting frame of the trajectory. Because of this we now allow the user to set it.

richardjgowers · 2018-02-04T20:37:12Z

testsuite/MDAnalysisTests/formats/test_libdcd.py

+    with open(testfile, 'rb') as fh:
+        header_bytes = fh.read()
+    # check for magic number
+    assert struct.unpack('i', header_bytes[:4])[0] == 84


It's been a while since I had to reverse engineer Fortran binary, but this looks right: the magic numbers are the length in bytes of the arrays.

Yes, the magic number is the length of the data block. It looks like every data block in the DCD file is marked with its length at both the beginning and the end. As far as I can see, the header of a DCD file has 3 to 4 blocks:

- 84 bytes: time information like dt, nsavc, ...
- x * 80 + 4 bytes: remarks
- 4 bytes: natoms
- xx bytes: fixed atoms (may not be written)
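
The block layout listed above can be illustrated with a small Python sketch that builds a synthetic header in memory and reads it back. It is not the libdcd reader: the helper names `wrap` and `read_records` are hypothetical, little-endian 4-byte markers are assumed, and the payload contents are placeholders chosen only to have the right lengths.

```python
import io
import struct


def wrap(payload):
    """Frame a payload Fortran-style: <int32 length><payload><int32 length>."""
    marker = struct.pack("<i", len(payload))  # little-endian is an assumption
    return marker + payload + marker


def read_records(stream, count):
    """Read `count` framed records and check that both length markers agree."""
    records = []
    for _ in range(count):
        (head,) = struct.unpack("<i", stream.read(4))
        payload = stream.read(head)
        (tail,) = struct.unpack("<i", stream.read(4))
        if head != tail:
            raise ValueError(f"marker mismatch: {head} != {tail}")
        records.append(payload)
    return records


n_titles = 3
# Placeholder payloads with the lengths given in the list above.
time_block = b"CORD" + struct.pack("<20i", *([0] * 20))            # 84 bytes
remark_block = struct.pack("<i", n_titles) + b" " * (80 * n_titles)  # 244 bytes
natoms_block = struct.pack("<i", 10)                                # 4 bytes

header = wrap(time_block) + wrap(remark_block) + wrap(natoms_block)
records = read_records(io.BytesIO(header), 3)
lengths = [len(r) for r in records]
print(lengths)
```

The first four bytes of `header` decode to 84, which is the same check the test in the diff performs on a real file.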

tylerjereddy

An end user has confirmed the validity of the fixes here and @kain88-de has carefully designed some low-level tests, plus the unit test suite passes, so I'm quite happy with this PR.

I added very minor comments and asked for clarification where test suites outside of the "DCD theme" had to be adjusted, but I suspect @kain88-de will simply confirm that in all such cases the modifications were related to test set-ups that depended on, e.g., DCD start frame initialization.

If another core dev agrees with the above, then +1 to merge.

tylerjereddy · 2018-02-05T01:39:58Z

package/CHANGELOG

@@ -26,6 +26,8 @@ Fixes
    (Issue #1759)
  * AtomGroup.dimensions now strictly returns a copy (Issue #1582)
  * lib.distances.transform_StoR now checks input type (Issue #1699)
+  * libdcd now write correct length of remark section (Issue #1701)


write -> writes

tylerjereddy · 2018-02-05T01:40:49Z

package/CHANGELOG

@@ -26,6 +26,8 @@ Fixes
    (Issue #1759)
  * AtomGroup.dimensions now strictly returns a copy (Issue #1582)
  * lib.distances.transform_StoR now checks input type (Issue #1699)
+  * libdcd now write correct length of remark section (Issue #1701)
+  * DCDReader now reports to correct time based on istart information (PR #1767)


to -> the

tylerjereddy · 2018-02-05T01:45:08Z

testsuite/MDAnalysisTests/analysis/test_rms.py

-        return [[0, 0, 0, 0, 0],
-                [49,   49,   4.6997, 1.9154, 2.7139]]
+        return [[0, 1000, 0, 0, 0],
+                [49,   1049,   4.6997, 1.9154, 2.7139]]


Can you clarify why results in the test_rms suite are changing because of subtle changes in DCD handling machinery?

i.e., if DCD test file is being used to seed the tests, that might make sense

This is the changed time. The DCD file doesn't start at time point 0 but rather at time 1000. We have been treating this wrong for some time.
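
The changed column in this diff can be reproduced with a simple offset model. This is an illustration only: the formula `(frame + istart) * dt` and the values `ISTART = 1000` and `DT = 1.0` are assumptions inferred from the numbers in the diff, not taken from the DCDReader source, and the function name `frame_time` is hypothetical. How istart maps to time was revisited later in #1819 and #1832.

```python
def frame_time(frame, istart, dt):
    """Time of a frame under the assumed model: shift the frame index by istart."""
    return (frame + istart) * dt


ISTART = 1000   # assumed start value of the test DCD, inferred from the diff
DT = 1.0        # assumed time between frames, inferred from the diff

# First and last frame of the test, before and after honouring istart.
old_times = [frame_time(f, 0, DT) for f in (0, 49)]
new_times = [frame_time(f, ISTART, DT) for f in (0, 49)]
print(old_times, new_times)
```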

tylerjereddy · 2018-02-05T01:50:59Z

testsuite/MDAnalysisTests/coordinates/test_memory.py

@@ -75,6 +75,8 @@ def reader(self, trajectory):

    def iter_ts(self, i):
        ts = self.universe.trajectory[i]
+        # correct time because memory reader doesn't read the correct time
+        ts.time = ts.frame * self.dt


Can you briefly clarify why DCD change is impacting memory reader here?

The memory reader tests are based on the DCD file and assumed it started from 0. This code restores that assumption. The memory reader can't actually deal with a trajectory that doesn't start at 0 at the moment. See the new issue #1769.

kain88-de · 2018-02-05T14:10:10Z

package/MDAnalysis/lib/formats/include/readdcd.h

  fio_write_int32(fd, 3); /* the number of 80 character title strings */

  strncpy(title_string, remarks, 240);
+  // Enforce null-termination for long remark strings.
+  // Not a problem for MDAnalysis but maybe for other readers.
+  title_string[239] = '\0';


I noticed we have a problem with null-terminating remark strings if they are longer than 240 characters. This isn't an issue for writing! It might cause issues for reading with external libraries. Our own Cython code seems to handle a missing null terminator just fine. There is almost no way for us to test this. The Cython library converts the char array into a proper Python string internally. We never get to see the raw char buffer in Python.

I can again do a test of the expected byte. But nothing more.

I adjusted the tests and docs for this change. This way we can be sure that any DCD written by MDAnalysis doesn't trip over another C library reading DCD.
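
A byte-level check of the kind mentioned above could look like the following Python sketch. It does not call readdcd.h: `strncpy_like` is a hypothetical stand-in that mimics how C's `strncpy` behaves for a source without embedded NUL bytes, so the example only illustrates why the explicit terminator is needed.

```python
TITLE_LEN = 240  # size of the title buffer, as in the diff above


def strncpy_like(src, n):
    """Mimic C strncpy for a source without embedded NUL bytes.

    Copies at most n bytes and zero-pads a shorter source. If the source
    has n or more bytes, no terminator is written.
    """
    return bytearray(src[:n].ljust(n, b"\0"))


long_remark = b"x" * 300                 # longer than the buffer
title = strncpy_like(long_remark, TITLE_LEN)
unterminated = 0 not in title            # the copy filled the whole buffer

# The fix from the diff: force the last byte to be a terminator.
title[TITLE_LEN - 1] = 0
print(unterminated, title[-1])
```

A remark shorter than 240 bytes is already zero-padded by the copy, so the extra assignment only changes the result for over-long remarks.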

kain88-de mentioned this pull request Feb 2, 2018

DCD Headers #1701

Closed

kain88-de requested a review from tylerjereddy February 2, 2018 08:53

kain88-de force-pushed the fix-dcd-witer branch from 10df173 to d257429 Compare February 2, 2018 13:58

kain88-de force-pushed the fix-dcd-witer branch from 772c1bd to f374de1 Compare February 3, 2018 08:24

kain88-de mentioned this pull request Feb 3, 2018

Dcd use istart information #1768

Closed

4 tasks

kain88-de force-pushed the fix-dcd-witer branch from 94d6403 to 528f80d Compare February 3, 2018 08:38

kain88-de added 2 commits February 3, 2018 16:03

DCDwriter now has a option to set istart

fd68f60

Programs that use DCD have different conventions for istart, the starting frame of the trajectory. Because of this we now allow the user to set it.

kain88-de force-pushed the fix-dcd-witer branch from 528f80d to 862ca4b Compare February 3, 2018 15:03

richardjgowers reviewed Feb 4, 2018

tylerjereddy approved these changes Feb 5, 2018

kain88-de force-pushed the fix-dcd-witer branch from 655bbed to 0955fb9 Compare February 5, 2018 08:02

richardjgowers approved these changes Feb 5, 2018

kain88-de commented Feb 5, 2018

kain88-de and others added 4 commits February 6, 2018 21:02

use istart information in dcd to set correct time

adaf9d9

update changelog

b307e73

Update test_dcd.py

75b69bc

proper null termination of remark string

2c55dcd

this might cause issues in other c libraries otherwise. document and fix null termination

kain88-de force-pushed the fix-dcd-witer branch from 65c2dc7 to 2c55dcd Compare February 6, 2018 20:02

kain88-de merged commit 2703b38 into develop Feb 7, 2018

kain88-de deleted the fix-dcd-witer branch February 7, 2018 16:20

kain88-de mentioned this pull request Mar 18, 2018

fix DCDReader istart (#1819) #1832

Merged

4 tasks

orbeckst added a commit that referenced this pull request Mar 23, 2018

updated CHANGELOG for #1819

6412516

- fixes #1819 - see PR #1832 and #1767 for discussions on DCD and istart; see also https://github.com/MDAnalysis/mdanalysis/wiki/FileFormats
