Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve jump specification criteria #228

Merged
merged 34 commits into from
Jan 23, 2024
Merged

Improve jump specification criteria #228

merged 34 commits into from
Jan 23, 2024

Conversation

v1kko
Copy link
Contributor

@v1kko v1kko commented Dec 19, 2023

  • Specification of site radius in angstr
    sites = SitesData(structure=structure,
                      trajectory=vasp_traj,
                      floating_specie='Li',
                      site_radius=1, # This can be left out
                      site_inner_fraction=0.5)
  • Addition of Jumps class
    jumps = Jumps(transitions=transitions,
                  sites=sites,
                  conversion_method=<Function to convert transitions to Jumps>
                  minimal_residence=4)
    
  • transitions and jumps are now pandas dataframes
  • A generic conversion method is supplied: _generic_transitions_to_jumps which is used as default, and can be used to write your own converter.
  • The dataframe for transitions looks like:
      atom index  start site  destination site  time
0              0          94                -1   202
1              0          -1                94   204
2              0          94                -1   208
3              0          -1                94   219
4              0          94                -1   224
...          ...         ...               ...   ...
2100          47          47                -1  3421
2101          47          -1                47  3440
2102          47          47                -1  3665
2103          47          -1                55  3677
2104          47          55                -1  3691

[2105 rows x 4 columns]
  • The dataframe for jumps looks like:
     atom index  start site  destination site  start time  stop time
0             0          94                 0         282        284
1             0           0                94         385        442
2             0          94                 0         704        728
3             0           0                94         920        928
4             0          94                 0        1003       1006
..          ...         ...               ...         ...        ...
457          47          47                69         961        964
458          47          69                47        1014       1017
459          47          47                39        1212       1231
460          47          39                47        2751       2771
461          47          47                55        3665       3678
  • Specification of minimal residence time to count as jump
  • inner site radius specification
  • write tests
  • write docs

@v1kko v1kko linked an issue Dec 19, 2023 that may be closed by this pull request
v1kko added 5 commits January 9, 2024 13:11
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/e44462d6021bfe23dfb24b775cc7c390844f773d' (2023-11-12)
  → 'github:nixos/nixpkgs/91a00709aebb3602f172a0bf47ba1ef013e34835' (2023-12-17)
v1kko added 12 commits January 11, 2024 16:23
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:nixos/nixpkgs/91a00709aebb3602f172a0bf47ba1ef013e34835' (2023-12-17)
  → 'github:nixos/nixpkgs/317484b1ead87b9c1b8ac5261a8d2dd748a0492d' (2024-01-08)
Jumps now only count if they reach the inner fraction of a site
@v1kko
Copy link
Contributor Author

v1kko commented Jan 16, 2024

@AstyLavrinenko, @stefsmeets a notebook with the changes can be found here: https://gemdat.readthedocs.io/en/jumps_update/notebooks/jumps/

Copy link
Contributor

@stefsmeets stefsmeets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @v1kko , thanks for this. I understand the goal of the PR is to split off the jumps related code into its own class. Changing events into a dataframe will be super useful going forward.

To my taste this PR is too large, so I wasn't able to consider the changes as a whole. Some of the code became a bit too complex for me to fully grasp what is going on. So I just left some comments as I went through the code.

I have 2 major points of concern:

  1. The changes are useful, and decoupling of Jumps is a nice step forward. My concern is the complexity of the code and api, which has also taken a leap. I think it would be useful to go through the code and make sure the docstrings are up to date, that we avoid returning tuples (a code smell, if you ask me), and to refactor a bit to make sure that the api still pleasant to work with.

  2. The test data for the integration test are completely different. It's not clear to me exactly from the PR what has changed or why this is. This makes it difficult to verify the correctness of the code. Some small differences can be expected, but the results here are completely different. To me it indecates a problem with the code if we cannot reproduce the results from our test data.

src/gemdat/collective.py Outdated Show resolved Hide resolved
src/gemdat/collective.py Outdated Show resolved Hide resolved
src/gemdat/collective.py Outdated Show resolved Hide resolved
src/gemdat/collective.py Show resolved Hide resolved
src/gemdat/jumps.py Outdated Show resolved Hide resolved
tests/integration/sites_test.py Show resolved Hide resolved
tests/integration/sites_test.py Outdated Show resolved Hide resolved
tests/integration/sites_test.py Show resolved Hide resolved
tests/integration/sites_test.py Show resolved Hide resolved
tests/integration/sites_test.py Show resolved Hide resolved
@v1kko
Copy link
Contributor Author

v1kko commented Jan 18, 2024

Hi @v1kko , thanks for this. I understand the goal of the PR is to split off the jumps related code into its own class. Changing events into a dataframe will be super useful going forward.

To my taste this PR is too large, so I wasn't able to consider the changes as a whole. Some of the code became a bit too complex for me to fully grasp what is going on. So I just left some comments as I went through the code.

I have 2 major points of concern:

1. The changes are useful, and decoupling of `Jumps` is a nice step forward. My concern is the complexity of the code and api, which has also taken a leap. I think it would be useful to go through the code and make sure the docstrings are up to date, that we avoid returning tuples (a code smell, if you ask me), and to refactor a bit to make sure that the api still pleasant to work with.

2. The test data for the integration test are completely different. It's not clear to me exactly from the PR what has changed or why this is. This makes it difficult to verify the correctness of the code. Some small differences can be expected, but the results here are completely different. To me it indecates a problem with the code if we cannot reproduce the results from our test data.

Thanks for looking.

I don't think it can be helped that this PR is so large, introducing "configurable" jumps unfortunately touches almost everything, but suggestions are welcome.

I don't think returning tuples is always bad, but I can fix it here.

Number 2 is because most of these numbers did not make sense (especially the calculation of solo and collective jumps). This is now improved, so this change is expected, and discussed with Alexandros. I should have mentioned it before. If you look at the figures for jumps for example those are still similar.

@stefsmeets
Copy link
Contributor

stefsmeets commented Jan 18, 2024

I don't think returning tuples is always bad, but I can fix it here.

My gripe is that it leads to unmaintainable code. Adding another item breaks all code downstream (Similar to having functions with many positional arguments). I find that in most cases it is easier in the long run to return a dataclass, namedtuple or dict, so that variables can be accessed by name, rather than by index. Usually, I find that this leads to a better design.

Number 2 is because most of these numbers did not make sense (especially the calculation of solo and collective jumps). This is now improved, so this change is expected, and discussed with Alexandros. I should have mentioned it before. If you look at the figures for jumps for example those are still similar.

I see now that the 450 comes back in other places. 👍 I wasn't aware that there were some issues with the previous implementation. I'm still curious what you think about all those other values being so far off.

v1kko and others added 2 commits January 22, 2024 09:18
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
v1kko and others added 8 commits January 22, 2024 09:47
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
Co-authored-by: Stef Smeets <stefsmeets@users.noreply.github.com>
@v1kko v1kko merged commit b655775 into main Jan 23, 2024
3 checks passed
@v1kko v1kko deleted the jumps_update branch January 23, 2024 13:06
@stefsmeets stefsmeets mentioned this pull request Jan 23, 2024
@stefsmeets stefsmeets restored the jumps_update branch February 6, 2024 14:01
@stefsmeets stefsmeets deleted the jumps_update branch February 6, 2024 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refining the jump calculation
2 participants