Various contributions #86

salotz · 2023-09-20T04:08:19Z

Opening this up for the possibility of merging as some interesting features were added and various bug fixes were made.

This likely shouldn't be merged as-is. The HDF5 file IIRC has some compatibility issues as it added support for variable numbers of walkers.

This is needed for testing

Also removes the checks on the images for shapes and dtypes so that any object can be used as an image.

This is a WIP and subject to change. WARNING: This does not yet make all of the APIs aware of variable numbers of walkers. There are likely bugs in the analysis code if you have variable number of walkers. This support will likely get added after this is merged to master. What has changed? First in the WepyHDF5 class itself: - There are a number of new methods to support proper and robust frame/cycles: - run_cycle_idxs - run_cycle_walker_idxs - run_cycle_walker_traces - get_traj_field_names - get_traj_field_names_observables - get_traj_field_names_alt_reps - get_traj_cycle_idxs - get_traj_field_num_frames - A number of methods were altered to not be lazy about getting frame/cycle idxs etc. - num_traj_frames - get_traj_field_cycle_idxs The main method of supporting variable numbers of walkers is through sparse fields over all of the fields. These latter methods also now take into account the offset in the beginning of a trajectory for a new larger number of walkers. This is both for contiguous and sparse fields. This relies on the presence of a metadata attribute in each traj group that is the 'start_cycle_idx' that is set in the reporter. Additionally some small changes: - add_traj now returns the traj_grp that it created For the WepyHDF5Reporter there are some additions to the API. Mainly this is the semantics around specifying the 'sparse_fields' argument. It should be backwards compatible. But now there is the option of both specifying a single Ellipsis, which makes all fields sparse. This would be the default/base recommendation for a variable number of walker simulation. And now also there are 3 total choices for a field by field specification: - `int` for a frequency - `None` if it is not sparse (ensures it is not sparse. Can act as an "exclude") - `Ellipsis` which will initialize the field as sparse but will still save everything that it can. It is the `Ellipsis` option which is used for the variable number of walkers. It should be possible to always make this the default. But likely there is too much code relying on the datasets being uncomplicated by sparse fields. Additionally the alt_reps and all_atoms_rep_freq options support the Ellipsis option as well now. The other major change is that now the reporter will record much more metadata on the cycle indices for the runs and trajectories. The new trajectory group metadata fields are: - 'first_cycle_idx' :: previously it did set this as 'cycle_idx' kind of as a hail mary for supporting variable walkers. - 'num_cycles' :: Which gets updated with each call to report - 'last_cycle_idx' :: which also gets updated with each call to report This allows for some consistency checks with both the shape of the data and in a run as a whole which is much more efficient than actually counting them. Currently 'first_cycle_idx' is the only actualized dependency in the WepyHDF5 class used as offsets for cycle indices. The new run group metadata attributes are: - 'num_cycles' - 'last_cycle_idx'

also removed the sparse_fields_freq argument that was left in by mistake

The performance for both `run_cycle_walker_idxs` was abysmal because of the repeated calls to `get_traj_field_cycle_idxs`. So I added in a shim function `_get_all_run_traj_cycle_idxs` which gets all of the trajectory cycles in one batch. This can be used as a datastructure without having to read them all the time from disk. `run_cycle_walker_idxs` was changed to `run_cycles_walker_idxs` and supports getting multiple cycles at a time, which makes only one call to `_get_all_run_traj_cycle_idxs` each time. This was then used from `run_cycle_walker_traces` and dramatically improves performance.

There was a mistake where the number of frames in the data assumed it wasn't sparse. This was corrected.

The `get_traj_field` was broken with the new cycle_idxs semantics. First the weights were not sparse as they should always be saved. The current commit is a HACK to get around this. There is a shim function for getting the weights specifically as a field which will look at the sparse idxs of the 'positions' field. As noted in the issue this should change in the future in a more robust mechanism. We also have a split in the semantics of the 'frames' argument to `get_trajectory_field`. For that function it is actually the "cycle_idxs" as one would logically expect. The lower level functions for contiguous and sparse fields accept a "frames" but this is actually the 'frame_idx' which is just the counting index of the frames in that lane.

Previously there was just a 'frames' argument for choosing a subset of the structures in a trajectory. With the introduction of "sparse trajectories" and not just "sparse fields" there is now a distinction between 'frame_idxs' and 'cycle_idxs'. This was addressed in a lower level in earlier commits but the `get_traj_fields` API stayed the same with the 'frames' argument simply meaning 'cycle_idxs'. Components that were using `get_traj_fields` (like in the distributed analysis chunker) were using 'frames' in an ambiguous way since there functionally wasn't a difference. Now this difference matters. For tools like the chunker where the actual time dependence between walkers is irrelevant they needed a way to query just on the data shape itself, or the 'frame_idxs'. 'frame_idxs' was added as an argument to support this. The 'frames' argument is a bit redundant and confusing with this (and not clear it means actuall cycle_idxs) so we deprecated it and replaced it with 'cycle_idxs'. It still exists for now but raises a deprecation warning on usage and may be removed in the future.

Credit to @tom.dixon for the fix.

Westpa integration fixes RDAS-32

add/fix the progress section RDAS-37

Fix.gh.16 RDAS-38

port fixes for just the REVO merge bug RDAS-142

I also included all of the new networkx refactoring to get an extra test that it is indeed working

bugfix for wexplore

salotz and others added 24 commits March 30, 2021 19:18

fixes tests calling the SimMaker

2a1b779

add openmm_systems to the dev env

0844bb7

This is needed for testing

removes resampler_field_{dtypes,shapes} from resamplers

2d7ae86

Also removes the checks on the images for shapes and dtypes so that any object can be used as an image.

WIP: made initial changes

fa2082c

removes the networkx import in wepy.hdf5

3060a19

also removed the sparse_fields_freq argument that was left in by mistake

fixes get_traj_field for sparse fields

e33f083

There was a mistake where the number of frames in the data assumed it wasn't sparse. This was corrected.

fix bug in get_traj_field from last commit

60523d3

add/fix the progress section

05cf4c5

Credit to @tom.dixon for the fix.

removes image stuff from REVO RESAMPLER_* constants

ab6f684

Merge pull request #1 from stxinsite/westpa-integration-fixes

dc65a0d

Westpa integration fixes RDAS-32

Merge branch 'master' of github.com:ADicksonLab/wepy

b3fef55

Merge pull request #3 from stxinsite/fix.gh.27

1a9f32c

add/fix the progress section RDAS-37

Merge pull request #2 from stxinsite/fix.gh.16

0f375b4

Fix.gh.16 RDAS-38

Merge branch 'master' of github.com:stxinsite/wepy

02151ac

fixes missing self in some methods

dc3cb86

fix bug in traj_fields_to_mdtraj

96a8a76

port fixes for just the REVO merge bug

68c9ccb

Merge pull request #5 from stxinsite/revo-merge-bug

352d4c1

port fixes for just the REVO merge bug RDAS-142

bugfix for wexplore

e0e79c9

I also included all of the new networkx refactoring to get an extra test that it is indeed working

Merge pull request #6 from stxinsite/fix-wexplore

d665de0

bugfix for wexplore

salotz modified the milestones: Proper 1.2 version, 2.0 Release Sep 20, 2023

salotz added the invalid label Sep 20, 2023

salotz self-assigned this Sep 20, 2023

salotz marked this pull request as draft September 20, 2023 04:09

salotz mentioned this pull request Nov 29, 2023

Reading weights from the HDF5 is slow #123

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Various contributions #86

Various contributions #86

salotz commented Sep 20, 2023

Various contributions #86

Are you sure you want to change the base?

Various contributions #86

Conversation

salotz commented Sep 20, 2023