Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix header length in dcd #1767

Merged
merged 6 commits into from
Feb 7, 2018
Merged

fix header length in dcd #1767

merged 6 commits into from
Feb 7, 2018

Conversation

kain88-de
Copy link
Member

@kain88-de kain88-de commented Feb 2, 2018

Fixes #1701

Changes made in this Pull Request:

  • the remark information is now consistent. Before we said that the title block is 164 bytes while we wrote 244.
  • DCDReader uses istart information to set the correct time.

PR Checklist

  • make istart user selectable and default to 1
  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@kain88-de kain88-de mentioned this pull request Feb 2, 2018
@jbarnoud
Copy link
Contributor

jbarnoud commented Feb 2, 2018

Would there be a way to test that?

@kain88-de
Copy link
Member Author

I asked @tclick to test out this branch. Once we know it works I can write a test to ensure consistency. We might also be able to rewrite the readdcd.h code to accept variable length headers.

@tclick
Copy link

tclick commented Feb 2, 2018

I tested this using the CHARMM script, and it works. However, the thermodynamic calculations differ considerably from the original trajectory. This is due to line 403 in coordinates/DCD.py. If you change istart=0) to istart=1), then the previous and the current trajectory outputs match.

@kain88-de
Copy link
Member Author

Do DCD's always start at 1? Should we instead make this an option of the write method to let the user decide?

@tclick
Copy link

tclick commented Feb 2, 2018

In MDA 0.10.0, one was allowed to select the start time. You eventually dropped it, but AFAIK, CHARMM starts at one; I cannot say for NAMD, but VMD writes out a DCD starting at 0. That's why the thermodynamic output matched between your updated version and VMD. By allowing the user to select the start time, then s/he can decide where they want to begin the time.

@kain88-de
Copy link
Member Author

@tclick I will make it user selectable and set the default to 1. We anyway pretend to write CHARMM files so it's better if we try to follow it.

@kain88-de
Copy link
Member Author

@tclick can you test again.

I now set the remarks section to be 240 chars long by default (instead of 160 I had before). I also added an option to set istart (default = 1) and added a unit test to check if we store the remark information correctly in the DCD files. This option requires the least changes and keeps the long comments strings we decided to support when we the switched to libdcd. I would like a report if this works though. If there are any problems I switch back to 160 chars.

@tylerjereddy
Copy link
Member

@kain88-de I see the review request -- will probably be Sunday at the earliest, though I'm guessing you're already fresher on the details than I. I do remember this being pretty fragile when we were porting to Python 3.

@tclick
Copy link

tclick commented Feb 3, 2018

I am happy to report that the fix works. Thanks for the change.

@kain88-de
Copy link
Member Author

@tylerjereddy I'm done now. I appreciate if you have a look on the weekend.

@kain88-de
Copy link
Member Author

@tclick this PR should have another nice change for you now. The DCDReader now checks the istart variable in the DCD header to set the correct time. Before we always started from 0. A side effect of this is that I changed the default of istart back to 0 so that the DCDWriter starts writing from time 0, which I think is consistent with the rest of MDAnalysis.

@tclick
Copy link

tclick commented Feb 3, 2018 via email

A DCD has two byte positions to store the length of the remark section. First a
it reads number of bytes in the remark plus 4. Afterwards it has the number of
80 character sections. These two numbers have to match. So far we wrote first
that we use 164 chars and three 80 character blocks, 164 != 80*3 + 4

Now we corrected that and state correctly we write 244 bytes in the remark
section. We never noticed that because our reader ignores the information about
the byte length on looks only at the number of blocks.

As a side note here. DCD remarks always have have to be a multiple of 80 in
length.
Programs who use DCD have different conventions for istart, the starting frame
of the trajectory. Because of this we now allow it to be set by the user.
with open(testfile, 'rb') as fh:
header_bytes = fh.read()
# check for magic number
assert struct.unpack('i', header_bytes[:4])[0] == 84
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I had to reverse engineer Fortran binary, but this looks right-
the magic numbers are the length in bytes of the arrays

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the magic number is the length of the data block. It looks like all data blocks in the DCD file are marked with numbers of the corresponding length at the beginning and end. As I can see the header of a DCD file has 3-4 blocks.

  1. 84 bytes time information like dt, nsavc, ...
  2. x * 80+4 bytes remarks
  3. 4 bytes natoms
  4. xx bytes fixed atoms (may not be written)

Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An end user has confirmed the validity of the fixes here and @kain88-de has carefully designed some low-level tests, plus the unit test suite passes, so I'm quite happy with this PR.

I added very minor comments & asked for clarification where test suites outside of the "DCD theme" had to be adjusted, but I suspect @kain88-de will simply confirm that in all such cases the modifications were related to test set-ups that depended on i.e., DCD start frame initialization.

If another core dev agrees with the above, then +1 to merge.

@@ -26,6 +26,8 @@ Fixes
(Issue #1759)
* AtomGroup.dimensions now strictly returns a copy (Issue #1582)
* lib.distances.transform_StoR now checks input type (Issue #1699)
* libdcd now write correct length of remark section (Issue #1701)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write -> writes

@@ -26,6 +26,8 @@ Fixes
(Issue #1759)
* AtomGroup.dimensions now strictly returns a copy (Issue #1582)
* lib.distances.transform_StoR now checks input type (Issue #1699)
* libdcd now write correct length of remark section (Issue #1701)
* DCDReader now reports to correct time based on istart information (PR #1767)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to -> the

return [[0, 0, 0, 0, 0],
[49, 49, 4.6997, 1.9154, 2.7139]]
return [[0, 1000, 0, 0, 0],
[49, 1049, 4.6997, 1.9154, 2.7139]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why results in the test_rms suite are changing because of subtle changes in DCD handling machinery?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., if DCD test file is being used to seed the tests, that might make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the changed time. The DCD file doesn't start at time point 0 but rather at time 1000. We have been treating this wrong for some time.

@@ -75,6 +75,8 @@ def reader(self, trajectory):

def iter_ts(self, i):
ts = self.universe.trajectory[i]
# correct time because memory reader doesn't read the correct time
ts.time = ts.frame * self.dt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly clarify why DCD change is impacting memory reader here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the memory reader tests are based on the DCD file and assumed it started from 0. This code restores that assumption. The memory reader can't actually deal with a trajectory that doesn't start at 0 at the moment. See the new issue #1769

fio_write_int32(fd, 3); /* the number of 80 character title strings */

strncpy(title_string, remarks, 240);
// Enforce null-termination for long remark strings.
// Not a problem for MDAnalysis but maybe for other readers.
title_string[239] = '\0';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we have a problem with 0 terminating a string if they are longer then 240 characters. This isn't an issue for writing! It might cause issues for reading with external libraries. Our own cython code seems to handle a missing 0 terminator just fine. There is almost no way for us to test this. The cython library converts the char array into proper python string internally. We never get to see the raw char buffer in python.

I can again to a test of the expected byte. But nothing more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the tests and docs for this change. This way we can be sure that any DCD written by MDAnalysis doesn't trip over another C library reading DCD.

@kain88-de kain88-de merged commit 2703b38 into develop Feb 7, 2018
@kain88-de kain88-de deleted the fix-dcd-witer branch February 7, 2018 16:20
@kain88-de kain88-de mentioned this pull request Mar 18, 2018
4 tasks
orbeckst added a commit that referenced this pull request Mar 23, 2018
- fixes #1819
- see PR #1832 and #1767 for discussions on DCD and istart; see
  also https://github.com/MDAnalysis/mdanalysis/wiki/FileFormats
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants