Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync series and dataframe in statistical_inefficieny() #199

Merged
merged 10 commits into from
Jul 7, 2022

Conversation

ptmerz
Copy link
Contributor

@ptmerz ptmerz commented Jun 22, 2022

This adds a number of tests for slicing() and statistical_inefficieny():

  • Test that slicing() respects upper and lower time bounds (currently
    passes)
  • Test that statistical_inefficieny() respects upper and lower time bounds
    when it is used without series to subsample (currently passes)
  • Test that statistical_inefficieny() respects upper and lower time bounds
    when it is used with series to subsample (currently fails)
  • Test that first using slicing() on the data frame, then
    statistical_inefficieny() without time bounds yields the same results as
    a single call to statistical_inefficieny() with time bounds (currently
    fails)

It then makes sure that both the subsampling series and the dataframe are sliced
when calling statistical_inefficieny(). This ensure that the series and
the dataframe are in sync before the subsampled indices of the series
are applied to the dataframe. This fixes the above tests.

Fixes #198

ptmerz added 2 commits June 22, 2022 09:11
This adds a number of tests for slicing() and statistical_inefficieny():
* Test that slicing() respects upper and lower time bounds (currently
  passes)
* Test that statistical_inefficieny() respects upper and lower time bounds
  when it is used without series to subsample (currently passes)
* Test that statistical_inefficieny() respects upper and lower time bounds
  when it is used without series to subsample (currently fails)
* Test that first using slicing() on the data frame, then
  statistical_inefficieny() without time bounds yields the same results as
  a single call to statistical_inefficieny() with time bounds (currently
  fails)

Refs alchemistry#198
Make sure that both the subsampling series and the dataframe are sliced
when calling statistical_inefficieny(). This ensure that the series and
the dataframe are in sync before the subsampled indices of the series
are applied to the dataframe.

Fixes alchemistry#198
@codecov
Copy link

codecov bot commented Jun 22, 2022

Codecov Report

Merging #199 (cc122a8) into master (72622c9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #199   +/-   ##
=======================================
  Coverage   98.15%   98.15%           
=======================================
  Files          24       24           
  Lines        1352     1353    +1     
  Branches      295      295           
=======================================
+ Hits         1327     1328    +1     
  Misses          5        5           
  Partials       20       20           
Impacted Files Coverage Δ
src/alchemlyb/preprocessing/subsampling.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72622c9...cc122a8. Read the comment docs.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the contribution. This fix is very important.
I think the test could be simplified as bit using the @pytest.mark.parametrize

You could do something like

@pytest.mark.parametrize('data', (gmx_benzene_dHdl(), gmx_benzene_u_nk())
@pytest.mark.parametrize('test_series', (True, False))
def test_lower_and_upper_bound_slicer(self, data, test_series):
        """
        Test that the lower and upper time is respected when using statistical_inefficiency
        without a series. In this case, statistical_inefficiency should behave like slicing
        """
        lower, upper = 1000, 34000
        original_length = len(data)
        # Check that the input data is appropriate for the test
        assert any(data.reset_index()['time'] < lower)
        assert any(data.reset_index()['time'] > upper)

        if test_series:
           series=data.sum(axis=1)
        else:
           series=None

        # Slice data, and check that we don't observe times outside
        # the prescribed range
        sliced = self.slicer(data,
                             series= series,
                             lower=lower,
                             upper=upper,
                             step=5)
        assert all(sliced.reset_index()['time'] >= lower)
        assert all(sliced.reset_index()['time'] <= upper)

        # Make sure we didn't change input data
        assert len(data) == original_length

Then for test_slicing_inefficiency_equivalence just strip away all the checking.

src/alchemlyb/tests/test_preprocessing.py Outdated Show resolved Hide resolved
# Slice data, and check that we don't observe times outside
# the prescribed range
sliced = self.slicer(data,
series=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't quite see how is this different from test_lower_and_upper_bound as the series is default to None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this test class tests statistical_inefficiency(...), the one above tests slicing(...).

So this tests makes sure that statistical_inefficiency respects the upper and lower bounds when it's used without series. The test above makes sure that slicing respects the upper and lower bounds.

This test could be rewritten as equivalence between statistical_inefficiency and slicing, if you prefer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptmerz Why do you need to check slicing when you are changing the code in statistical_inefficiency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need to. While investigating the bug I reported I noticed that there was no test for ranges in slicing or in statistical_inefficiency, which made it possible for the bug in statistical_inefficiency to go unnoticed until now. I thought it would be helpful to broaden the test coverage also for slicing to avoid any future bugs. Do you prefer me to remove them?

src/alchemlyb/tests/test_preprocessing.py Show resolved Hide resolved
src/alchemlyb/tests/test_preprocessing.py Outdated Show resolved Hide resolved
# Slice data, and check that we don't observe times outside
# the prescribed range
sliced = self.slicer(data,
series=None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this test class tests statistical_inefficiency(...), the one above tests slicing(...).

So this tests makes sure that statistical_inefficiency respects the upper and lower bounds when it's used without series. The test above makes sure that slicing respects the upper and lower bounds.

This test could be rewritten as equivalence between statistical_inefficiency and slicing, if you prefer.

src/alchemlyb/tests/test_preprocessing.py Show resolved Hide resolved
assert all(sliced.reset_index()['time'] >= lower)
assert all(sliced.reset_index()['time'] <= upper)
# Make sure we didn't change input data
assert len(data) == original_length
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: We should treat this in the same way as the test above (see comment line 92).

Copy link
Collaborator

@xiki-tempula xiki-tempula Jul 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ptmerz It might be good to abstract out the core of these three tests out as fixture and use parametrize to go through them

something like

@pytest.mark.parametrize(
    'output_data,original_length',
    [
        (lazy_fixture('inefficiency'), 100),
        (lazy_fixture('slicing'), 1000),
    ]
def test_original_length(output_data, original_length):
    assert len(data) == original_length

src/alchemlyb/tests/test_preprocessing.py Show resolved Hide resolved
@ptmerz
Copy link
Contributor Author

ptmerz commented Jun 29, 2022

I think the test could be simplified as bit using the @pytest.mark.parametrize

Agreed that this could unify the second and third added test (TestStatisticalInefficiency. test_lower_and_upper_bound_slicer and TestStatisticalInefficiency.test_lower_and_upper_bound_inefficiency). Unless we decide to rewrite TestStatisticalInefficiency. test_lower_and_upper_bound_slicer as an equivalence between statistical_inefficiency and slicing, of course.

So I'd suggest to wait until we have agreed on how to proceed with the remaining points raised, and I will implement this simplification if it's still viable!

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks good to me — just small comments on the tests.

src/alchemlyb/tests/test_preprocessing.py Outdated Show resolved Hide resolved
Comment on lines 73 to 74
(gmx_benzene_dHdl(), 1000, 34000),
(gmx_benzene_u_nk(), 1000, 34000),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great to call the data provider functions here because this adds to the time during which tests are gathered in serial as opposed during test execution (which can be parallelized). I know that we already have plenty examples of doing exactly that elsewhere so it's really not your fault.

I am just wondering if there are better ways to do this, e.g., passing the function itself as data_generator and then calling it as data = data_generator() might be easiest. It would be better if we could generate them as fixtures (but parametrized fixtures are still awkward IIRC).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst The parametrized fixtures could be made easy with pytest-lazy-fixture

I have posted an example here as is noted in their website https://pypi.org/project/pytest-lazy-fixture/. This is conda installable as well.

@pytest.fixture(params=[
    pytest.lazy_fixture('one'),
    pytest.lazy_fixture('two')
])
def some(request):
    return request.param

@pytest.fixture
def one():
    return 1

@pytest.fixture
def two():
    return 2

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a lot of sense @orbeckst, I wasn't aware of this issue. Unfortunately I haven't had time to look into this yet. Current version therefore still has the greedy evaluation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lazy_fixture looks like a convenient solution. I looked at the package and I am not sure how well maintained it is. There's a commit from Jan 2022 in https://github.com/tvorog/pytest-lazy-fixture but the pypi page https://pypi.org/project/pytest-lazy-fixture/ states that the highest supported Python version is 3.7. @xiki-tempula , do you have experience with it in production code? It would be annoying to rewrite quite a few of our tests, just to discover that it breaks with a new release of pytest...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look around in our tests and we do have examples of evaluating the loading functions later: see e.g.

class TestMBAR(FEPestimatorMixin):
"""Tests for MBAR.
"""
cls = MBAR
@pytest.fixture(scope="class",
params=[(gmx_benzene_coul_u_nk, 3.041, 0.02088),
(gmx_benzene_vdw_u_nk, -3.007, 0.04519),
(gmx_expanded_ensemble_case_1, 75.923, 0.14124),
(gmx_expanded_ensemble_case_2, 75.915, 0.14372),
(gmx_expanded_ensemble_case_3, 76.173, 0.11345),
(gmx_water_particle_with_total_energy, -11.680, 0.083655),
(gmx_water_particle_with_potential_energy, -11.675, 0.083589),
(gmx_water_particle_without_energy, -11.654, 0.083415),
(amber_bace_example_complex_vdw, 2.40200, 0.0618453),
(gomc_benzene_u_nk, -0.79994, 0.091579),
])
def X_delta_f(self, request):
get_unk, E, dE = request.param
return get_unk(), E, dE
def test_mbar(self, X_delta_f):
self.compare_delta_f(X_delta_f)

This is not too bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised #206 for fixing the other instances of upfront data loading.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst yes, I used the lazy_fixture in our production code but the reason that I use it is that it was there. It works fine for me but I do share your concern about it not being updated for years.

On the other hand, one could also use the official pytest solution

@pytest.fixture(params=['dir1_fixture', 'dir2_fixture'])
def dirname(request):
    return request.getfixturevalue(request.param)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's use the pytest-internal solution!

I'll update #206.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks straightforward

@pytest.mark.parametrize(('dataloader', 'size'), [('gmx_benzene_dHdl', 661), 
                                                  ('gmx_benzene_u_nk', 661)]) 
     def test_basic_slicing(self, dataloader, size, request):
         data = request.getfixturevalue(dataloader)
         assert len(self.slicer(data, lower=1000, upper=34000, step=5)) == size 

src/alchemlyb/tests/test_preprocessing.py Outdated Show resolved Hide resolved
src/alchemlyb/tests/test_preprocessing.py Show resolved Hide resolved
* Extract test for unchanged data in separate test
* Parametrize tests of statistical_inefficiency to merge cases
  which use `series` with the ones that don't
* Parametrize all tests to use `conservative` `True` and `False`
* Remove check for unchanged data that are now unnecessary
@ptmerz
Copy link
Contributor Author

ptmerz commented Jul 4, 2022

@xiki-tempula @orbeckst

Thank you for your reviews. I did a first pass, which should address most of your comments. I'm copying the git log here as a summary:

  • Extract input sanity check in separate function
  • Extract test for unchanged data in separate test
  • Parametrize tests of statistical_inefficiency to merge cases which use series with the ones that don't
  • Parametrize all tests to use conservative True and False
  • Remove check for unchanged data that are now unnecessary (in the last equivalence test, since we now know that all operations don't change the input data)

This doesn't yet address the issue with greedy parametrization, I unfortunately did not have time to look into this yet.

Copy link
Collaborator

@xiki-tempula xiki-tempula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the work and for addressing the comments.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for all the work @ptmerz and explaining your choices to us — very much appreciated!

  1. After looking at the up-front loading I'd ask you to change your code to load during run time and not set up. I posted an example from our code that should make it relatively straightforward to implement.

    I will raise an issue to eventually change the other parts of the tests in a similar fashion.

  2. Please add .. versionchanged:: 0.7.0 entries to the docs to summarize the change in behavior. We count it as a bug fix but because people may have unwittingly relied on the old behavior and produced results with it, we want to be clear that numbers will change.

  3. Please also add yourself to AUTHORS and add an entry to CHANGES.

Thank you!!

@ptmerz
Copy link
Contributor Author

ptmerz commented Jul 7, 2022

@orbeckst , I uploaded new commits that should hopefully address your points. Note that currently, the data loading functions had to be duplicated, as they can't be normal functions and fixtures at the same time. If after resolving #206, you only use them as fixtures, you could remove this duplication.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some small changes that I will apply directly. Everything looks really good!

One source code line change, lots of other stuff around it ... but that's what makes the difference for a well-tested library. Thank you for your willingness to see the whole PR through to the end! We very much appreciated your contribution.

CHANGES Outdated Show resolved Hide resolved
src/alchemlyb/preprocessing/subsampling.py Outdated Show resolved Hide resolved
src/alchemlyb/tests/test_preprocessing.py Outdated Show resolved Hide resolved
src/alchemlyb/tests/test_preprocessing.py Show resolved Hide resolved
@orbeckst orbeckst merged commit bc39ef3 into alchemistry:master Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

statistical_inefficiency does not slice data frame
3 participants