Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: fix DataFrame.isin docstring and doctests #22767

Merged
merged 7 commits into from
Sep 25, 2018

Conversation

Moisan
Copy link
Contributor

@Moisan Moisan commented Sep 19, 2018

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Based on #22459. Fix the docstring for DataFrame.isin. I also updated ci/doctests.sh.

@pep8speaks
Copy link

Hello @Moisan! Thanks for submitting the PR.

@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Sep 19, 2018
@codecov
Copy link

codecov bot commented Sep 20, 2018

Codecov Report

Merging #22767 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22767   +/-   ##
=======================================
  Coverage   92.18%   92.18%           
=======================================
  Files         169      169           
  Lines       50820    50820           
=======================================
  Hits        46850    46850           
  Misses       3970     3970
Flag Coverage Δ
#multiple 90.6% <ø> (ø) ⬆️
#single 42.38% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 97.2% <ø> (ø) ⬆️
pandas/core/generic.py 96.67% <0%> (ø) ⬆️
pandas/core/reshape/melt.py 97.34% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 27de8e6...938a21f. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good changes, added some ideas to improve the docstring even further.

pandas/core/frame.py Show resolved Hide resolved
@@ -7458,23 +7458,24 @@ def isin(self, values):
1 False False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you start the example before the "when values is a list" with the constructor, and displaying the content of the original Dataframe (i.e. >>> df)

Also, I think the docstring would be clearer if we use a more real world example. For example, if the index is an animal name, and the columns are num_legs and num_wings, a df.isin([2]) having a True in falcon/wings is very easy to see what it's doing. And not so much in a foo, A, 1, 2, 3 kind of example.

DataFrame of booleans
DataFrame
DataFrame of boolean showing whether each element in the DataFrame
is contained in values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a See Also section? I think DataFrame.eq and Series.str.isin are worth referencing.

@@ -7458,23 +7458,24 @@ def isin(self, values):
1 False False
2 True False

When ``values`` is a dict:
When ``values`` is a dict. Note that B doesn't match the 1 here.

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if we can use the same DataFrame for all the examples.

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements! Added some more comments

which must match. If `values` is a DataFrame,
then both the index and column labels must match.

Returns
-------
DataFrame
DataFrame of boolean showing whether each element in the DataFrame
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean -> booleans

pandas/core/frame.py Show resolved Hide resolved

When ``values`` is a dict:
When ``values`` is a dict.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add something to this sentence to explain what is happening when values is a dict. Something like "When values is a dict, we can pass values to check for each column separately:"

(similar for the example below as well)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Happy to merge it as it is, but added a couple of ideas of things that could avoid confusion. Just in case you think they are good ideas.

>>> df.isin([2])
num_legs num_wings
falcon True True
dog False False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about using df.isin([0, 2])? I like the example, but I don't like showing just the case with one value in values, as it'd be better in that case to simply do df == 2.

Also, I think adding a bit more description would help (e.g. When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings))

2 True True
num_legs num_wings
falcon False True
dog False False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the example, but I don't like that we have to use wrong information (a falcon without legs) ;)

Not sure if better of worse, but what do you think about using a DataFrame with just the num_wings column instead? And may be we can get rid of the parrot, and show df2 content, so it's clearer to see what's going on?

Also, small detail, but I think we used other instead of df2 in other docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see :). No problem I agree with the changes!

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for another great docstring @Moisan, keep them coming :)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@jorisvandenbossche jorisvandenbossche merged commit 302d43a into pandas-dev:master Sep 25, 2018
@jorisvandenbossche
Copy link
Member

@Moisan Thanks a lot!

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants