Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Make most arguments for read_html and read_json keyword-ony #27573

Merged
merged 13 commits into from
Apr 7, 2020

Conversation

alexitkes
Copy link
Contributor

@alexitkes alexitkes commented Jul 24, 2019

As mentioned in #27544 it is better to use keyword-only arguments for functions with too many arguments. A deprecation warning will be displayed if read_html get more than 2 positional arguments (io and match) or read_json get more than 1 positional argument (path_or_buf).

  • tests added / passed
    Three groups of tests are actually needed
    • Tests dof the deprecate_nonkeyword_arguments decoratorr
    • Check whether the read_json function emits a FutureWarning whenever necessary and does not emit it whenever not.
    • Check whether the read_html function emits a FutureWarning whenever necessary and does not emit it whenever not.
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@simonjayhawkins simonjayhawkins added IO HTML read_html, to_html, Styler.apply, Styler.applymap IO JSON read_json, to_json, json_normalize labels Jul 24, 2019
@alexitkes alexitkes changed the title WIP: Make most arguments for read_html and read_json keyword-ony Make most arguments for read_html and read_json keyword-ony Jul 24, 2019
@TomAugspurger
Copy link
Contributor

We need to deprecate the old behavior first. As written, this is an api breaking change.

And we shouldn’t do this until 1.0 or later.

@jreback
Copy link
Contributor

jreback commented Jul 25, 2019

We need to deprecate the old behavior first. As written, this is an api breaking change.

And we shouldn’t do this until 1.0 or later.

and how exactly would you deprecate ?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 25, 2019 via email

@alexitkes
Copy link
Contributor Author

Thank you. I also wanted to propose somewhat like this.

def Deprecated(*, n_args: int):
    def _deprecated(f):
        def _f(*args, **kwargs):
            if len(args) > n_args:
                warnings.warn("%s will only accept %i positional arguments in version X.Y.Z" % (f.__name__,n_args))
            return f(*args, **kwargs)
        return _f
    return _deprecated

Or do such a decorator already exist?

@pep8speaks
Copy link

pep8speaks commented Jul 26, 2019

Hello @alexitkes! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-03-31 18:09:11 UTC

@alexitkes alexitkes changed the title Make most arguments for read_html and read_json keyword-ony [WIPMake most arguments for read_html and read_json keyword-ony Jul 26, 2019
@alexitkes alexitkes changed the title [WIPMake most arguments for read_html and read_json keyword-ony [WIP] Make most arguments for read_html and read_json keyword-ony Jul 26, 2019
@alexitkes alexitkes changed the title [WIP] Make most arguments for read_html and read_json keyword-ony Make most arguments for read_html and read_json keyword-ony Jul 28, 2019
Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests to read_html / read_json to ensure that warnings are raised there?

doc/source/whatsnew/v0.25.1.rst Outdated Show resolved Hide resolved
pandas/io/html.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

Lifting #27573 (comment) to the top level

I'd like not to repeat the arguments here. That seems fragile. Is it possible to have the decorator inspect the signature and warn on any positional_or_keyword arg passed positionally?

What do you think about this @alexitkes?

@alexitkes
Copy link
Contributor Author

@TomAugspurger Thank you for review. Yes, writing argument list twice will be quite annoying, but I will need some time to find a way to avoid it, and I would like to try. However, I think it would be better to preserve the allowed_args argument for the decorator and default it to list of required arguments if None given.

@TomAugspurger
Copy link
Contributor

However, I think it would be better to preserve the allowed_args argument for the decorator and default it to list of required arguments if None given.

Why do you think we'll need that? That sounds more complicated than just detecting kwargs passed positionally. If we aren't going to need it then I'd rather not implement it.

@alexitkes
Copy link
Contributor Author

@TomAugspurger I have just written such a decorator and it does not look too complicated. However it is my first experience with inspect module so possibly more tests are needed though some tests are written and they all run successfully.

It will not be difficult to remove allowed_args argument and always set it to list of required arguments, if you think eventually all arguments having the default values should be keyword-only arguments. Should I do it?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 30, 2019 via email

@alexitkes
Copy link
Contributor Author

I'm not sure without looking more closely. My thought is that all keyword
arguments being keyword-only is easier to explain to users.

I have almost forgotten one thing - the path_or_buf argument for read_json has the default value None. So if we decorate the function in that way all arguments for read_json will be keyword-only. But I think most users are used to writing read_json(filename) and will not like to write read_json(path_or_buf=filename). For this reason I still think there should be a way to pass the list of arguments allowed to be positional to the decorator. However, the default decorator behavior (optional arguments are keyword only) should be used wherever there is no really significant reason to change it. The read_html function is decorated in this way.

@TomAugspurger
Copy link
Contributor

I have almost forgotten one thing - the path_or_buf argument for read_json has the default value None. So if we decorate the function in that way all arguments for read_json will be keyword-only.

Oh sorry. In that case we'll definitely need to have the ability to have some keyword-or-positional arguments.

@alexitkes
Copy link
Contributor Author

Well, it seems to work fine though I am a bit worried because it's my first experience with inspect module. I also don't know the proper number of version in which it will become impossible to use positionl arguments (currently written 1.1). What should it be?

pandas/io/html.py Outdated Show resolved Hide resolved
"""
df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2])
with tm.assert_produces_warning(FutureWarning):
assert_frame_equal(df, read_json(df.to_json(orient="split"), "split"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you minimize the amount of code in the context manager? I think just read_json(buf, "split").

Also, do we need to test all three of these? What differs between them?

pandas/tests/io/json/test_deprecated_kwargs.py Outdated Show resolved Hide resolved
pandas/util/_decorators.py Outdated Show resolved Hide resolved
def wrapper(*args, **kwargs):
if isinstance(allow_args, int) and len(args) > allow_args:
msg = (
"After version %s all arguments of %s "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version -> pandas version.

Use {}-style string templates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done it. I also added a separate function to form a better looking warning message depending on the number of arguments, but so far I only tested it manually. Is there a proper way to test how exactly does the produced warning look?

pandas/tests/util/test_deprecate_nonkeyword_arguments.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

@WillAyd @jreback @jorisvandenbossche do you have thoughts on this (the overall goal, the implementation seems fine).

  • Do we expect to expand this to more readers? Should these be done in the same release?
  • What about other places in the library? Same release?
  • What's a good general policy on whether an argument should be keyword only?

@alexitkes alexitkes force-pushed the alexitkes/keyword-args branch 2 times, most recently from cd84833 to c8284ef Compare August 6, 2019 12:57
pandas/tests/io/test_html.py Outdated Show resolved Hide resolved

def test_index(self):
df1 = self.read_html(self.spam_data, ".*Water.*", index_col=0)
df2 = self.read_html(self.spam_data, "Unit", index_col=0)
with tm.assert_produces_warning(FutureWarning):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment for non-duplicated tests; would rather not use the deprecated behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, such amount of duplicated tests looks quite ugly, so I removed all tests with deprecated behavior except for one. Would it be enough?

pandas/tests/util/test_deprecate_nonkeyword_arguments.py Outdated Show resolved Hide resolved
pandas/util/_decorators.py Outdated Show resolved Hide resolved
pandas/util/_decorators.py Outdated Show resolved Hide resolved
pandas/io/html.py Outdated Show resolved Hide resolved
@alexitkes
Copy link
Contributor Author

@WillAyd OK, done it.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be OK with moving forward here - @jreback or @TomAugspurger ?

@TomAugspurger
Copy link
Contributor

Do we have a general policy for which arguments are keyword only?

@jreback
Copy link
Contributor

jreback commented Apr 3, 2020

lgtm. on anything with lots of args (pretty much read_*), happy to only have the first arg positional (path_or_buf).

@jreback jreback added this to the 1.1 milestone Apr 6, 2020
@jreback
Copy link
Contributor

jreback commented Apr 6, 2020

looks good, @TomAugspurger ok here?

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Apr 6, 2020 via email

@jreback jreback merged commit 467e1c2 into pandas-dev:master Apr 7, 2020
@jreback
Copy link
Contributor

jreback commented Apr 7, 2020

thanks!

@jorisvandenbossche
Copy link
Member

Thanks @alexitkes !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap IO JSON read_json, to_json, json_normalize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants