-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various contributions #86
Draft
salotz
wants to merge
24
commits into
ADicksonLab:master
Choose a base branch
from
stxinsite:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is needed for testing
Also removes the checks on the images for shapes and dtypes so that any object can be used as an image.
This is a WIP and subject to change. WARNING: This does not yet make all of the APIs aware of variable numbers of walkers. There are likely bugs in the analysis code if you have variable number of walkers. This support will likely get added after this is merged to master. What has changed? First in the WepyHDF5 class itself: - There are a number of new methods to support proper and robust frame/cycles: - run_cycle_idxs - run_cycle_walker_idxs - run_cycle_walker_traces - get_traj_field_names - get_traj_field_names_observables - get_traj_field_names_alt_reps - get_traj_cycle_idxs - get_traj_field_num_frames - A number of methods were altered to not be lazy about getting frame/cycle idxs etc. - num_traj_frames - get_traj_field_cycle_idxs The main method of supporting variable numbers of walkers is through sparse fields over all of the fields. These latter methods also now take into account the offset in the beginning of a trajectory for a new larger number of walkers. This is both for contiguous and sparse fields. This relies on the presence of a metadata attribute in each traj group that is the 'start_cycle_idx' that is set in the reporter. Additionally some small changes: - add_traj now returns the traj_grp that it created For the WepyHDF5Reporter there are some additions to the API. Mainly this is the semantics around specifying the 'sparse_fields' argument. It should be backwards compatible. But now there is the option of both specifying a single Ellipsis, which makes all fields sparse. This would be the default/base recommendation for a variable number of walker simulation. And now also there are 3 total choices for a field by field specification: - `int` for a frequency - `None` if it is not sparse (ensures it is not sparse. Can act as an "exclude") - `Ellipsis` which will initialize the field as sparse but will still save everything that it can. It is the `Ellipsis` option which is used for the variable number of walkers. It should be possible to always make this the default. But likely there is too much code relying on the datasets being uncomplicated by sparse fields. Additionally the alt_reps and all_atoms_rep_freq options support the Ellipsis option as well now. The other major change is that now the reporter will record much more metadata on the cycle indices for the runs and trajectories. The new trajectory group metadata fields are: - 'first_cycle_idx' :: previously it did set this as 'cycle_idx' kind of as a hail mary for supporting variable walkers. - 'num_cycles' :: Which gets updated with each call to report - 'last_cycle_idx' :: which also gets updated with each call to report This allows for some consistency checks with both the shape of the data and in a run as a whole which is much more efficient than actually counting them. Currently 'first_cycle_idx' is the only actualized dependency in the WepyHDF5 class used as offsets for cycle indices. The new run group metadata attributes are: - 'num_cycles' - 'last_cycle_idx'
also removed the sparse_fields_freq argument that was left in by mistake
The performance for both `run_cycle_walker_idxs` was abysmal because of the repeated calls to `get_traj_field_cycle_idxs`. So I added in a shim function `_get_all_run_traj_cycle_idxs` which gets all of the trajectory cycles in one batch. This can be used as a datastructure without having to read them all the time from disk. `run_cycle_walker_idxs` was changed to `run_cycles_walker_idxs` and supports getting multiple cycles at a time, which makes only one call to `_get_all_run_traj_cycle_idxs` each time. This was then used from `run_cycle_walker_traces` and dramatically improves performance.
There was a mistake where the number of frames in the data assumed it wasn't sparse. This was corrected.
The `get_traj_field` was broken with the new cycle_idxs semantics. First the weights were not sparse as they should always be saved. The current commit is a HACK to get around this. There is a shim function for getting the weights specifically as a field which will look at the sparse idxs of the 'positions' field. As noted in the issue this should change in the future in a more robust mechanism. We also have a split in the semantics of the 'frames' argument to `get_trajectory_field`. For that function it is actually the "cycle_idxs" as one would logically expect. The lower level functions for contiguous and sparse fields accept a "frames" but this is actually the 'frame_idx' which is just the counting index of the frames in that lane.
Previously there was just a 'frames' argument for choosing a subset of the structures in a trajectory. With the introduction of "sparse trajectories" and not just "sparse fields" there is now a distinction between 'frame_idxs' and 'cycle_idxs'. This was addressed in a lower level in earlier commits but the `get_traj_fields` API stayed the same with the 'frames' argument simply meaning 'cycle_idxs'. Components that were using `get_traj_fields` (like in the distributed analysis chunker) were using 'frames' in an ambiguous way since there functionally wasn't a difference. Now this difference matters. For tools like the chunker where the actual time dependence between walkers is irrelevant they needed a way to query just on the data shape itself, or the 'frame_idxs'. 'frame_idxs' was added as an argument to support this. The 'frames' argument is a bit redundant and confusing with this (and not clear it means actuall cycle_idxs) so we deprecated it and replaced it with 'cycle_idxs'. It still exists for now but raises a deprecation warning on usage and may be removed in the future.
Credit to @tom.dixon for the fix.
Westpa integration fixes RDAS-32
add/fix the progress section RDAS-37
Fix.gh.16 RDAS-38
port fixes for just the REVO merge bug RDAS-142
I also included all of the new networkx refactoring to get an extra test that it is indeed working
bugfix for wexplore
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Opening this up for the possibility of merging as some interesting features were added and various bug fixes were made.
This likely shouldn't be merged as-is. The HDF5 file IIRC has some compatibility issues as it added support for variable numbers of walkers.