Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] u.transfer_to_memory also transfers the box #1537

Closed
wants to merge 6 commits into from

Conversation

jbarnoud
Copy link
Contributor

This commit makes the memory reader deal with a non-constant box
dimensions and makes `Universe.transfer_to_memory copy the box in
addition to the positions.

Until now, only one box was stored in the memory reader, making it
impractical for NPT systems.

Fixes #1041

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests!

@@ -278,10 +278,9 @@ def __init__(self, coordinate_array, order='fac',
"array ({})"
.format(provided_n_atoms, self.n_atoms))

self.dimensions = dimensions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd store this as _dimensions_array to avoid someone accidentally using it thinking it was ts.dimensions?

@@ -397,6 +396,11 @@ def _read_next_timestep(self, ts=None):
[self.ts.frame] +
[slice(None)]*(2-f_index))
ts.positions = self.coordinate_array[basic_slice]
if self.dimensions is not None:
if len(self.dimensions) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.dimensions = [np.array([1, 2, 3, 4, 5, 6])] this won't work

@@ -494,8 +495,22 @@ def transfer_to_memory(self, start=None, stop=None, step=None,
coordinates = [] # TODO: use pre-allocated array
for i, ts in enumerate(self.trajectory[start:stop:step]):
coordinates.append(np.copy(ts.positions))
dimensions.append(np.copy(ts.dimensions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annoyingly this will copy the box even for non NPT systems, but I guess it's a small cost compared to the number of atoms that will be floating around

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reliable way to detect a NPT system? I could detect boxes that are [0, 0, 0, 90, 90, 90] but I am not sure it is much better than copying the box.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's no big deal, and probably safer to do it this way

'copied to memory (frame {frame})')
pm = ProgressMeter(n_frames, interval=1,
verbose=verbose, format=pm_format)
for i, ts in enumerate(self.trajectory[start:stop:step]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

repeated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comes from an issue I have with Readers that implement timeseries (i.e. the DCD reader): the timeseries method only returns the coordinates. So whatever the method I use for to get the coordinates, I still need to go through the trajectory to get the box.

I see 3 ways of handling this:

  1. a timeseries like method for the box to go with the one for the positions
  2. having the box reading loginc twice: one if the timeseries is used that will iterate over the trajectory only to get the box, and one in the existing loop for readers that have no timeseries
  3. what I did: having the logic only once, but iterating twice in some cases

Solution 1 is the best but implies to add a chunk to the reader, 2 the most efficient for little new development, 3 is the simplest and avoid logic duplication.

I am not sure what is the best approach here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the box info reader to the timeseries PR #1400?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a good idea, indeed. Especially since, it would allow to just ditch the loop in transfer to memory.

I have one doubt, though. The way I see the API, I would add a ProtoReader.timeseries_dimensions method. This would mean that in all cases transfer_to-memory will iterate twice over the trajectory. This can be a performance issue with slow readers.

Any idea before I dive in it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeseries function gets a keyword that tells it the information we would like to retrieve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factor of 2 in trajectory reading makes a difference for processing large data sets. I/O is probably the biggest issue in speeding up analysis (and getting good parallel performance). I would very much like to have ways to use the fastest approaches that are possible and just fall back to slow solutions if we wouldn't have the feature otherwise.

I think the point of a high-level library such as MDA is that it handles these cases seamlessly. (Otherwise, what's the point vs just using the raw lib.formats readers.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the _timeseries_positions approach should take care of taking advantage of full-speed when possible.

We must be very clear about this in both the general and the DCD-specific Reader docs.

Copy link
Member

@richardjgowers richardjgowers Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if #1400 gets done, then we can just call Reader.timeseries trajectory.timeseries and that will use a fast version if a child class implements something fancy (DCD) or fall back onto essentially what is written here.

@kain88-de why do we need an xarray here? Everything is nice and square?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardjgowers, just for the record: organization-wise I think it makes more sense for specialized children to override Reader.timeseries and then themselves do the choice of whether to run their own fancy code or fall back to Reader.timeseries. This way we don't need DCD-specific checks under the base Reader class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry, that's what I meant. 1400 implements Reader.timeseries which DCD overrides later.

@richardjgowers richardjgowers self-assigned this Jul 21, 2017
jbarnoud and others added 6 commits January 5, 2018 14:26
This commit makes the memory reader deal with a non-constant box
dimensions and makes `Universe.transfer_to_memory copy the box in
addition to the positions.

Until now, only one box was stored in the memory reader, making it
unpractical for NPT systems.
@kain88-de
Copy link
Member

I added a dimensions keyword arg to the timeseries function. That makes the reading easier. It's simple solution for now.

@richardjgowers richardjgowers mentioned this pull request Sep 19, 2018
4 tasks
@orbeckst orbeckst added the close? Evaluate if issue/PR is stale and can be closed. label Sep 28, 2018
@RMeli RMeli deleted the memory-dimensions branch December 23, 2023 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
close? Evaluate if issue/PR is stale and can be closed. Work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants