Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up the iloc in the tests #254

Merged
merged 16 commits into from
Dec 6, 2022
Merged

Conversation

xiki-tempula
Copy link
Collaborator

Fix #202

I have cleaned up most of the iloc in the test but there are a couple of tests that still use iloc. Most of them are parameterised tests where different datasets demand different column labels.

    @pytest.mark.parametrize(('data', 'size'), [(gmx_benzene_dHdl(), 4001),
                                                (gmx_benzene_u_nk(), 4001)])
    def test_subsampling(self, data, size):
        assert len(self.slicer(data, series=data.iloc[:, 0])) <= size

In this case, the first column is always chosen which have different column name for different datasets.

Another case is

    def get_delta_f(self, est):
        ee = 0.0

        for i in range(len(est.d_delta_f_) - 1):
            ee += est.d_delta_f_.values[i][i+1]**2
        return est.delta_f_.iloc[0, -1], ee**0.5

Where depending on the dataset, this could be est.delta_f_[0.0][1.0] or est.delta_f_.loc[(0.0,0.0)][(1.0,1.0)]

@codecov
Copy link

codecov bot commented Oct 16, 2022

Codecov Report

Merging #254 (45c9e0a) into master (079a5b5) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #254   +/-   ##
=======================================
  Coverage   98.69%   98.69%           
=======================================
  Files          26       26           
  Lines        1761     1761           
  Branches      379      379           
=======================================
  Hits         1738     1738           
  Misses          3        3           
  Partials       20       20           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@orbeckst
Copy link
Member

I think the remaining iloc cases are fine. Could you add a short comment stating why these are ilocs, just so that what you found now is not forgotten?

@xiki-tempula xiki-tempula mentioned this pull request Oct 25, 2022
6 tasks
@xiki-tempula xiki-tempula requested a review from orbeckst December 4, 2022 10:50
@xiki-tempula xiki-tempula marked this pull request as ready for review December 4, 2022 10:50
@orbeckst
Copy link
Member

orbeckst commented Dec 5, 2022

PLease resolve the conflicts and ping me when you need a review.

@xiki-tempula
Copy link
Collaborator Author

@orbeckst I have merged the master into this branch.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates.

I have a bunch of suggestions & PEP8 fixes. Primarily, can we avoid df.loc[a][b] and instead write df.loc[a, b] instead?

I am approving but please do at least the PEP8 formatting. Thank you!

ds += u_nk.iloc[i][i]
for i, lambda_ in enumerate(u_nk.columns):
#18.6 is the time step
ds += u_nk.loc[i*186/10][lambda_].values[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a proper .loc, can't we use

Suggested change
ds += u_nk.loc[i*186/10][lambda_].values[0]
ds += u_nk.loc[i*186/10, lambda_].values[0]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a proper .loc.
the .loc[1.0] could in theory match both dataframe with row index (1.0, 0.0, 0.0) and (1.0).
But .loc[1.0, :] would only strictly match (1.0).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after looking at the data I see that this is a multiindex. But in this case wouldn't be

ds += u_nk.loc[i*186/10].loc[lambda_].values[0]

be cleaner?

(I have no idea how to get something like ".loc[lambda_, (0.0, 0.0)]" instead of the values[0] but I find pandas indexing confusing...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we cannot get lambda_ beforehand and the lambda_ could be (0.0, 0.0) or (0.0, 0.0, 0.0), which is why for some of them, I could only do iloc.
In some sense, we could get them by analysing the dataframe but that would make the unit test too complicated so I think the current form is a good balance.

@@ -124,7 +124,7 @@ def test_u_nk_with_total_energy():

# Check one specific value in the dataframe
assert_almost_equal(
extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0],
extract_u_nk(dataset['data']['AllStates'][0], T=300).loc[0][(0.0,0.0)].values[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 and using .loc

Suggested change
extract_u_nk(dataset['data']['AllStates'][0], T=300).loc[0][(0.0,0.0)].values[0],
extract_u_nk(dataset['data']['AllStates'][0], T=300).loc[0, (0.0, 0.0)].values[0],

@@ -142,7 +142,7 @@ def test_u_nk_with_potential_energy():

# Check one specific value in the dataframe
assert_almost_equal(
extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0],
extract_u_nk(dataset['data']['AllStates'][0], T=300).loc[0][(0.0,0.0)].values[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -161,7 +161,7 @@ def test_u_nk_without_energy():

# Check one specific value in the dataframe
assert_almost_equal(
extract_u_nk(dataset['data']['AllStates'][0], T=300).iloc[0][0],
extract_u_nk(dataset['data']['AllStates'][0], T=300).loc[0][(0.0,0.0)].values[0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above

@@ -139,6 +139,8 @@ def compare_delta_f(self, X_delta_f):
assert X_delta_f[2] == pytest.approx(d_delta_f, rel=1e-3)

def get_delta_f(self, est):
# Use .iloc[0, -1] as we want to cater for both
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the comment

@@ -60,7 +60,7 @@ def dhdl():

def test_kt2kt_number(self, dhdl):
new_dhdl = to_kT(dhdl)
assert 12.9 == pytest.approx(new_dhdl.iloc[0, 0], 0.1)
assert 12.9 == pytest.approx(new_dhdl.loc[(0.0,0.0)], 0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 space after comma

Suggested change
assert 12.9 == pytest.approx(new_dhdl.loc[(0.0,0.0)], 0.1)
assert 12.9 == pytest.approx(new_dhdl.loc[(0.0, 0.0)], 0.1)

@@ -74,7 +74,7 @@ def test_kj2kt_unit(self, dhdl):
def test_kj2kt_number(self, dhdl):
dhdl.attrs['energy_unit'] = 'kJ/mol'
new_dhdl = to_kT(dhdl)
assert 5.0 == pytest.approx(new_dhdl.iloc[0, 0], 0.1)
assert 5.0 == pytest.approx(new_dhdl.loc[(0.0,0.0)], 0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 space after comma

@@ -84,7 +84,7 @@ def test_kcal2kt_unit(self, dhdl):
def test_kcal2kt_number(self, dhdl):
dhdl.attrs['energy_unit'] = 'kcal/mol'
new_dhdl = to_kT(dhdl)
assert 21.0 == pytest.approx(new_dhdl.iloc[0, 0], 0.1)
assert 21.0 == pytest.approx(new_dhdl.loc[(0.0,0.0)], 0.1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 space after comma

Comment on lines +161 to +162
forward.append(estimate.delta_f_.loc[0.0,1.0])
forward_error.append(estimate.d_delta_f_.loc[0.0,1.0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 space after comma

Comment on lines +166 to +167
backward.append(estimate.delta_f_.loc[0.0,1.0])
backward_error.append(estimate.d_delta_f_.loc[0.0,1.0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 space after comma

@xiki-tempula
Copy link
Collaborator Author

@orbeckst For the PEP8, I guess I could just do a project-wide black, which is the same as the one for flamel?

@orbeckst
Copy link
Member

orbeckst commented Dec 5, 2022

Can we do black in a separate PR? Reformatting hides any relevant changes. For this PR, do the few fixes manually and then we can blackify in a separate PR.

@xiki-tempula
Copy link
Collaborator Author

@orbeckst Would it be easier if I merge this PR as it is. Then do a black PR?
For the .loc[0][0] related things, I cannot change them.
For the space after the ,, black will automatically sort them out.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fine then; run black later

pandas indexing is a bit confusing...

ds += u_nk.iloc[i][i]
for i, lambda_ in enumerate(u_nk.columns):
#18.6 is the time step
ds += u_nk.loc[i*186/10][lambda_].values[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after looking at the data I see that this is a multiindex. But in this case wouldn't be

ds += u_nk.loc[i*186/10].loc[lambda_].values[0]

be cleaner?

(I have no idea how to get something like ".loc[lambda_, (0.0, 0.0)]" instead of the values[0] but I find pandas indexing confusing...)

@orbeckst orbeckst self-assigned this Dec 6, 2022
@orbeckst orbeckst merged commit 380302d into alchemistry:master Dec 6, 2022
@xiki-tempula xiki-tempula deleted the feat_iloc branch December 6, 2022 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch tests to use .loc for all delta_f_ comparisons
2 participants