-
Notifications
You must be signed in to change notification settings - Fork 648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix header length in dcd #1767
fix header length in dcd #1767
Conversation
Would there be a way to test that? |
I asked @tclick to test out this branch. Once we know it works I can write a test to ensure consistency. We might also be able to rewrite the |
I tested this using the CHARMM script, and it works. However, the thermodynamic calculations differ considerably from the original trajectory. This is due to line 403 in coordinates/DCD.py. If you change |
Do DCD's always start at 1? Should we instead make this an option of the write method to let the user decide? |
In MDA 0.10.0, one was allowed to select the start time. You eventually dropped it, but AFAIK, CHARMM starts at one; I cannot say for NAMD, but VMD writes out a DCD starting at 0. That's why the thermodynamic output matched between your updated version and VMD. By allowing the user to select the start time, then s/he can decide where they want to begin the time. |
@tclick I will make it user selectable and set the default to 1. We anyway pretend to write CHARMM files so it's better if we try to follow it. |
@tclick can you test again. I now set the remarks section to be 240 chars long by default (instead of 160 I had before). I also added an option to set |
10df173
to
d257429
Compare
@kain88-de I see the review request -- will probably be Sunday at the earliest, though I'm guessing you're already fresher on the details than I. I do remember this being pretty fragile when we were porting to Python 3. |
I am happy to report that the fix works. Thanks for the change. |
772c1bd
to
f374de1
Compare
@tylerjereddy I'm done now. I appreciate if you have a look on the weekend. |
94d6403
to
528f80d
Compare
@tclick this PR should have another nice change for you now. The DCDReader now checks the |
@kain88-de <https://github.com/kain88-de>, that’s fine. Knowing that I should set `istart=1` for CHARMM is perfectly okay. I understand the need for consistency. Besides, NAMD may honestly start at 0 (although I’m not as familiar with NAMD). You might want to put a note in the docs about this difference, so users can set `istart=1` when wanting to use CHARMM for further work.
I appreciate the work on this; in my current project, I moved away from using CHARMM for certain features because of this bug, and I also realized that the results could be accomplished purely in Python with MDA and numpy (albeit possibly a little slower but with a reduction in external dependencies).
Cordially,
柯明 Timothy H. Click, Ph.D.
Department of Biological Science and Technology
Institute of Bioinformatics and Systems Biology
National Chiao Tung University
208 Lab Building 1, 75 Bo-Ai St.
Dong District, Hsinchu, Taiwan 30062
(R.O.C.)
+886-3-5712121 x56997
tclick@nctu.edu.tw
|
A DCD has two byte positions to store the length of the remark section. First a it reads number of bytes in the remark plus 4. Afterwards it has the number of 80 character sections. These two numbers have to match. So far we wrote first that we use 164 chars and three 80 character blocks, 164 != 80*3 + 4 Now we corrected that and state correctly we write 244 bytes in the remark section. We never noticed that because our reader ignores the information about the byte length on looks only at the number of blocks. As a side note here. DCD remarks always have have to be a multiple of 80 in length.
Programs who use DCD have different conventions for istart, the starting frame of the trajectory. Because of this we now allow it to be set by the user.
528f80d
to
862ca4b
Compare
with open(testfile, 'rb') as fh: | ||
header_bytes = fh.read() | ||
# check for magic number | ||
assert struct.unpack('i', header_bytes[:4])[0] == 84 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while since I had to reverse engineer Fortran binary, but this looks right-
the magic numbers are the length in bytes of the arrays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the magic number is the length of the data block. It looks like all data blocks in the DCD file are marked with numbers of the corresponding length at the beginning and end. As I can see the header of a DCD file has 3-4 blocks.
- 84 bytes time information like dt, nsavc, ...
- x * 80+4 bytes remarks
- 4 bytes natoms
- xx bytes fixed atoms (may not be written)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An end user has confirmed the validity of the fixes here and @kain88-de has carefully designed some low-level tests, plus the unit test suite passes, so I'm quite happy with this PR.
I added very minor comments & asked for clarification where test suites outside of the "DCD theme" had to be adjusted, but I suspect @kain88-de will simply confirm that in all such cases the modifications were related to test set-ups that depended on i.e., DCD start frame initialization.
If another core dev agrees with the above, then +1 to merge.
package/CHANGELOG
Outdated
@@ -26,6 +26,8 @@ Fixes | |||
(Issue #1759) | |||
* AtomGroup.dimensions now strictly returns a copy (Issue #1582) | |||
* lib.distances.transform_StoR now checks input type (Issue #1699) | |||
* libdcd now write correct length of remark section (Issue #1701) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write
-> writes
package/CHANGELOG
Outdated
@@ -26,6 +26,8 @@ Fixes | |||
(Issue #1759) | |||
* AtomGroup.dimensions now strictly returns a copy (Issue #1582) | |||
* lib.distances.transform_StoR now checks input type (Issue #1699) | |||
* libdcd now write correct length of remark section (Issue #1701) | |||
* DCDReader now reports to correct time based on istart information (PR #1767) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to
-> the
return [[0, 0, 0, 0, 0], | ||
[49, 49, 4.6997, 1.9154, 2.7139]] | ||
return [[0, 1000, 0, 0, 0], | ||
[49, 1049, 4.6997, 1.9154, 2.7139]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify why results in the test_rms
suite are changing because of subtle changes in DCD handling machinery?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e., if DCD
test file is being used to seed the tests, that might make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the changed time. The DCD file doesn't start at time point 0 but rather at time 1000. We have been treating this wrong for some time.
@@ -75,6 +75,8 @@ def reader(self, trajectory): | |||
|
|||
def iter_ts(self, i): | |||
ts = self.universe.trajectory[i] | |||
# correct time because memory reader doesn't read the correct time | |||
ts.time = ts.frame * self.dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you briefly clarify why DCD change is impacting memory reader here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the memory reader tests are based on the DCD file and assumed it started from 0. This code restores that assumption. The memory reader can't actually deal with a trajectory that doesn't start at 0 at the moment. See the new issue #1769
655bbed
to
0955fb9
Compare
fio_write_int32(fd, 3); /* the number of 80 character title strings */ | ||
|
||
strncpy(title_string, remarks, 240); | ||
// Enforce null-termination for long remark strings. | ||
// Not a problem for MDAnalysis but maybe for other readers. | ||
title_string[239] = '\0'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed we have a problem with 0 terminating a string if they are longer then 240 characters. This isn't an issue for writing! It might cause issues for reading with external libraries. Our own cython code seems to handle a missing 0 terminator just fine. There is almost no way for us to test this. The cython library converts the char array into proper python string internally. We never get to see the raw char buffer in python.
I can again to a test of the expected byte. But nothing more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjusted the tests and docs for this change. This way we can be sure that any DCD written by MDAnalysis doesn't trip over another C library reading DCD.
this might cause issues in other c libraries otherwise. document and fix null termination
65c2dc7
to
2c55dcd
Compare
- fixes #1819 - see PR #1832 and #1767 for discussions on DCD and istart; see also https://github.com/MDAnalysis/mdanalysis/wiki/FileFormats
Fixes #1701
Changes made in this Pull Request:
PR Checklist
istart
user selectable and default to1