Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix header length in dcd #1767

Merged
merged 6 commits into from
Feb 7, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ The rules for this file:
* release numbers follow "Semantic Versioning" http://semver.org

------------------------------------------------------------------------------
mm/dd/18 richardjgowers, palnabarun, bieniekmateusz
mm/dd/18 richardjgowers, palnabarun, bieniekmateusz, kain88-de

* 0.17.1

Expand All @@ -26,6 +26,8 @@ Fixes
(Issue #1759)
* AtomGroup.dimensions now strictly returns a copy (Issue #1582)
* lib.distances.transform_StoR now checks input type (Issue #1699)
* libdcd now writes correct length of remark section (Issue #1701)
* DCDReader now reports the correct time based on istart information (PR #1767)


01/24/18 richardjgowers, rathann, orbeckst, tylerjereddy, mtiberti, kain88-de,
Expand Down
7 changes: 5 additions & 2 deletions package/MDAnalysis/coordinates/DCD.py
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,7 @@ def Writer(self, filename, n_atoms=None, **kwargs):
def _frame_to_ts(self, frame, ts):
"""convert a dcd-frame to a :class:`TimeStep`"""
ts.frame = self._frame
ts.time = ts.frame * self.ts.dt
ts.time = (ts.frame + self._file.header['istart']) * self.ts.dt
ts.data['step'] = self._file.tell()

# The original unitcell is read as ``[A, gamma, B, beta, alpha, C]``
Expand Down Expand Up @@ -358,6 +358,7 @@ def __init__(self,
dt=1,
remarks='',
nsavc=1,
istart=0,
**kwargs):
"""Parameters
----------
Expand All @@ -380,6 +381,8 @@ def __init__(self,
how many MD steps is a frame saved to the DCD). By default, this
number is just set to one and this should be sufficient for almost
all cases but if required, nsavc can be changed.
istart : int (optional)
starting frame number. CHARMM defaults to 1.
**kwargs : dict
General writer arguments

Expand All @@ -400,7 +403,7 @@ def __init__(self,
nsavc=nsavc,
delta=delta,
is_periodic=1,
istart=0)
istart=istart)

def write_next_timestep(self, ts):
"""Write timestep object into trajectory.
Expand Down
7 changes: 5 additions & 2 deletions package/MDAnalysis/lib/formats/include/readdcd.h
Original file line number Diff line number Diff line change
Expand Up @@ -735,13 +735,16 @@ int write_dcdheader(fio_fd fd, const char *remarks, int N,
fio_write_int32(fd, 0);
}
fio_write_int32(fd, 84);
fio_write_int32(fd, 164);
fio_write_int32(fd, 244);
fio_write_int32(fd, 3); /* the number of 80 character title strings */

strncpy(title_string, remarks, 240);
// Enforce null-termination for long remark strings.
// Not a problem for MDAnalysis but maybe for other readers.
title_string[239] = '\0';
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed we have a problem with 0 terminating a string if they are longer then 240 characters. This isn't an issue for writing! It might cause issues for reading with external libraries. Our own cython code seems to handle a missing 0 terminator just fine. There is almost no way for us to test this. The cython library converts the char array into proper python string internally. We never get to see the raw char buffer in python.

I can again to a test of the expected byte. But nothing more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the tests and docs for this change. This way we can be sure that any DCD written by MDAnalysis doesn't trip over another C library reading DCD.

WRITE(fd, title_string, 240);

fio_write_int32(fd, 164);
fio_write_int32(fd, 244);
fio_write_int32(fd, 4);
fio_write_int32(fd, N);
fio_write_int32(fd, 4);
Expand Down
4 changes: 3 additions & 1 deletion package/MDAnalysis/lib/formats/libdcd.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -468,7 +468,8 @@ cdef class DCDFile:
Parameters
----------
remarks : str
remarks of DCD file. Shouldn't be more then 240 characters (ASCII)
remarks of DCD file. Writes up to 239 characters (ASCII). The
character 240 will be the null terminator
natoms : int
number of atoms to write
istart : int
Expand All @@ -479,6 +480,7 @@ cdef class DCDFile:
integrator time step. The time for 1 frame is nsavc * delta
is_periodic : bool
write unitcell information. Also pretends that file was written by CHARMM 24

"""
if not self.is_open:
raise IOError("No file open")
Expand Down
14 changes: 7 additions & 7 deletions testsuite/MDAnalysisTests/analysis/test_rms.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,21 +165,21 @@ def outfile(self, tmpdir):

@pytest.fixture()
def correct_values(self):
return [[0, 0, 0], [49, 49, 4.68953]]
return [[0, 1000, 0], [49, 1049, 4.68953]]

@pytest.fixture()
def correct_values_mass(self):
return [[0, 0, 0], [49, 49, 4.74920]]
return [[0, 1000, 0], [49, 1049, 4.74920]]

@pytest.fixture()
def correct_values_group(self):
return [[0, 0, 0, 0, 0],
[49, 49, 4.7857, 4.7048, 4.6924]]
return [[0, 1000, 0, 0, 0],
[49, 1049, 4.7857, 4.7048, 4.6924]]

@pytest.fixture()
def correct_values_backbone_group(self):
return [[0, 0, 0, 0, 0],
[49, 49, 4.6997, 1.9154, 2.7139]]
return [[0, 1000, 0, 0, 0],
[49, 1049, 4.6997, 1.9154, 2.7139]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify why results in the test_rms suite are changing because of subtle changes in DCD handling machinery?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., if DCD test file is being used to seed the tests, that might make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the changed time. The DCD file doesn't start at time point 0 but rather at time 1000. We have been treating this wrong for some time.



def test_progress_meter(self, capsys, universe):
Expand Down Expand Up @@ -217,7 +217,7 @@ def test_rmsd_atomgroup_selections(self, universe):
def test_rmsd_single_frame(self, universe):
RMSD = MDAnalysis.analysis.rms.RMSD(universe, select='name CA',
start=5, stop=6).run()
single_frame = [[5, 5, 0.91544906]]
single_frame = [[5, 1005, 0.91544906]]
assert_almost_equal(RMSD.rmsd, single_frame, 4,
err_msg="error: rmsd profile should match" +
"test values")
Expand Down
23 changes: 19 additions & 4 deletions testsuite/MDAnalysisTests/coordinates/test_dcd.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,20 @@ def test_with_statement(self):
np.arange(0, N),
err_msg="with_statement: DCDReader does not read all frames")

def test_set_time(self):
u = mda.Universe(PSF, DCD)
assert_almost_equal(u.trajectory.time, 1000, decimal=3)


@pytest.mark.parametrize('istart', (0, 1, 2, 3))
def test_write_istart(universe_dcd, tmpdir, istart):
u = universe_dcd
outfile = str(tmpdir.join('test.dcd'))
with mda.Writer(outfile, u.atoms.n_atoms, istart=istart) as w:
w.write(u.atoms)
u = mda.Universe(PSF, outfile)
assert u.trajectory._file.header['istart'] == istart


class TestDCDWriter(BaseWriterTest):
@staticmethod
Expand Down Expand Up @@ -107,6 +121,7 @@ def test_write_random_unitcell(tmpdir):
decimal=5)



################
# Legacy tests #
################
Expand Down Expand Up @@ -213,22 +228,22 @@ def test_reader_set_dt():
dt = 100
frame = 3
u = mda.Universe(PSF, DCD, dt=dt)
assert_almost_equal(u.trajectory[frame].time, frame*dt,
assert_almost_equal(u.trajectory[frame].time, (frame + 1000)*dt,
err_msg="setting time step dt={0} failed: "
"actually used dt={1}".format(
dt, u.trajectory._ts_kwargs['dt']))
assert_almost_equal(u.trajectory.dt, dt,
err_msg="trajectory.dt does not match set dt")


@pytest.mark.parametrize("ext, decimal", (("dcd", 5),
@pytest.mark.parametrize("ext, decimal", (("dcd", 4),
("xtc", 3)))
def test_writer_dt(tmpdir, ext, decimal):
dt = 5.0 # set time step to 5 ps
universe_dcd = mda.Universe(PSF, DCD, dt=dt)
t = universe_dcd.trajectory
outfile = str(tmpdir.join("test.{}".format(ext)))
with mda.Writer(outfile, n_atoms=t.n_atoms, dt=dt) as W:
with mda.Writer(outfile, n_atoms=t.n_atoms, dt=dt, istart=t._file.header['istart']) as W:
for ts in universe_dcd.trajectory:
W.write(universe_dcd.atoms)

Expand All @@ -237,7 +252,7 @@ def test_writer_dt(tmpdir, ext, decimal):
(uw.trajectory.n_frames - 1) * dt, decimal)
times = np.array([uw.trajectory.time for ts in uw.trajectory])
frames = np.array([ts.frame for ts in uw.trajectory])
assert_array_almost_equal(times, frames * dt, decimal)
assert_array_almost_equal(times, (frames + 1000) * dt, decimal)


@pytest.mark.parametrize("ext, decimal", (("dcd", 5),
Expand Down
2 changes: 2 additions & 0 deletions testsuite/MDAnalysisTests/coordinates/test_memory.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,8 @@ def reader(self, trajectory):

def iter_ts(self, i):
ts = self.universe.trajectory[i]
# correct time because memory reader doesn't read the correct time
ts.time = ts.frame * self.dt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you briefly clarify why DCD change is impacting memory reader here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the memory reader tests are based on the DCD file and assumed it started from 0. This code restores that assumption. The memory reader can't actually deal with a trajectory that doesn't start at 0 at the moment. See the new issue #1769

return ts


Expand Down
9 changes: 4 additions & 5 deletions testsuite/MDAnalysisTests/core/test_universe.py
Original file line number Diff line number Diff line change
Expand Up @@ -468,12 +468,11 @@ def test_slicing_step_with_start_stop(self):

def test_slicing_step_dt(self):
universe = MDAnalysis.Universe(PDB_small, DCD)
times = [ts.time for ts in universe.trajectory]
dt = universe.trajectory.dt
universe.transfer_to_memory(step=2)
times2 = [ts.time for ts in universe.trajectory]
assert_almost_equal(times[::2], times2,
err_msg="Unexpected in-memory timestep: "
+ "dt not updated with step information")
assert_almost_equal(dt * 2, universe.trajectory.dt,
err_msg="Unexpected in-memory timestep: "
+ "dt not updated with step information")

def test_slicing_negative_start(self):
universe = MDAnalysis.Universe(PDB_small, DCD)
Expand Down
23 changes: 21 additions & 2 deletions testsuite/MDAnalysisTests/formats/test_libdcd.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
from collections import namedtuple
import os
import string
import struct

import hypothesis.strategies as strategies
from hypothesis import example, given
Expand Down Expand Up @@ -242,6 +243,22 @@ def test_write_header(tmpdir):
assert header['nsavc'] == 10
assert np.allclose(header['delta'], .02)

# we also check the bytes written directly.
with open(testfile, 'rb') as fh:
header_bytes = fh.read()
# check for magic number
assert struct.unpack('i', header_bytes[:4])[0] == 84
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I had to reverse engineer Fortran binary, but this looks right-
the magic numbers are the length in bytes of the arrays

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the magic number is the length of the data block. It looks like all data blocks in the DCD file are marked with numbers of the corresponding length at the beginning and end. As I can see the header of a DCD file has 3-4 blocks.

  1. 84 bytes time information like dt, nsavc, ...
  2. x * 80+4 bytes remarks
  3. 4 bytes natoms
  4. xx bytes fixed atoms (may not be written)

# magic number should be written again before remark section
assert struct.unpack('i', header_bytes[88:92])[0] == 84
# length of remark section. We hard code this to 244 right now
assert struct.unpack('i', header_bytes[92:96])[0] == 244
# say we have 3 block of length 80
assert struct.unpack('i', header_bytes[96:100])[0] == 3
# after the remark section the length should be reported again
assert struct.unpack('i', header_bytes[340:344])[0] == 244
# this is a magic number as far as I see
assert struct.unpack('i', header_bytes[344:348])[0] == 4


def test_write_no_header(tmpdir):
fname = str(tmpdir.join('test.dcd'))
Expand Down Expand Up @@ -301,7 +318,7 @@ def write_dcd(in_name, out_name, remarks='testing', header=None):

@given(remarks=strategies.text(
alphabet=string.printable, min_size=0,
max_size=240)) # handle the printable ASCII strings
max_size=239)) # handle the printable ASCII strings
@example(remarks='')
def test_written_remarks_property(remarks, tmpdir, dcd):
# property based testing for writing of a wide range of string
Expand All @@ -310,7 +327,7 @@ def test_written_remarks_property(remarks, tmpdir, dcd):
header = dcd.header
header['remarks'] = remarks
write_dcd(DCD, testfile, header=header)
expected_remarks = remarks[:240]
expected_remarks = remarks
with DCDFile(testfile) as f:
assert f.header['remarks'] == expected_remarks

Expand All @@ -323,6 +340,8 @@ def written_dcd(tmpdir_factory):
testfile = str(testfile)
write_dcd(DCD, testfile)
Result = namedtuple("Result", "testfile, header, orgfile")
# throw away last char we didn't save due to null termination
header['remarks'] = header['remarks'][:-1]
return Result(testfile, header, DCD)


Expand Down