BUG: validate_docstrings has many warnings #44642

Closed
3 tasks done
TomAugspurger opened this issue Nov 27, 2021 · 11 comments
Labels: Docs, good first issue, Warnings (Warnings that appear or should be added to pandas)

Comments

@TomAugspurger
Contributor

TomAugspurger commented Nov 27, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

Running ./ci/code_checks.sh docstrings results in many warnings:

$ ./ci/code_checks.sh docstrings
scripts/validate_docstrings.py:124: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  func = getattr(func, part)
scripts/validate_docstrings.py:124: FutureWarning: pandas.UInt64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  func = getattr(func, part)
scripts/validate_docstrings.py:124: FutureWarning: pandas.Float64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  func = getattr(func, part)
<doctest pandas.Index.equals[11]>:1: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  int64_idx = pd.Int64Index([1, 2, 3])
<doctest pandas.Index.equals[13]>:1: FutureWarning: pandas.UInt64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  uint64_idx = pd.UInt64Index([1, 2, 3])
<doctest pandas.Index.is_mixed[1]>:1: FutureWarning: Index.is_mixed is deprecated and will be removed in a future version. Check index.inferred_type directly instead.
  idx.is_mixed()
/home/taugspurger/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpydoc/validate.py:167: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  obj = getattr(obj, part)
/home/taugspurger/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpydoc/validate.py:167: FutureWarning: pandas.UInt64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  obj = getattr(obj, part)
/home/taugspurger/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/numpydoc/validate.py:167: FutureWarning: pandas.Float64Index is deprecated and will be removed from pandas in a future version. Use pandas.NumericIndex with the appropriate dtype instead.
  obj = getattr(obj, part)
<doctest pandas.Series.xs[5]>:1: PerformanceWarning: indexing past lexsort depth may impact performance.
  df.xs(('mammal', 'dog'))
<doctest pandas.DataFrame.empty[10]>:1: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  ser_empty = pd.Series()
<doctest pandas.DataFrame.xs[5]>:1: PerformanceWarning: indexing past lexsort depth may impact performance.
  df.xs(('mammal', 'dog'))
<doctest pandas.date_range[8]>:1: FutureWarning: Argument `closed` is deprecated in favor of `inclusive`.
  pd.date_range(start='2017-01-01', end='2017-01-04', closed=None)
<doctest pandas.core.groupby.SeriesGroupBy.transform[2]>:1: FutureWarning: Dropping invalid columns in DataFrameGroupBy.transform is deprecated. In a future version, a TypeError will be raised. Before calling .transform, select only columns which should be valid for the function.
  grouped.transform(lambda x: (x - x.mean()) / x.std())
<doctest pandas.core.groupby.SeriesGroupBy.transform[3]>:1: FutureWarning: Dropping invalid columns in DataFrameGroupBy.transform is deprecated. In a future version, a TypeError will be raised. Before calling .transform, select only columns which should be valid for the function.
  grouped.transform(lambda x: x.max() - x.min())
<doctest pandas.core.groupby.DataFrameGroupBy.transform[2]>:1: FutureWarning: Dropping invalid columns in DataFrameGroupBy.transform is deprecated. In a future version, a TypeError will be raised. Before calling .transform, select only columns which should be valid for the function.
  grouped.transform(lambda x: (x - x.mean()) / x.std())
<doctest pandas.core.groupby.DataFrameGroupBy.transform[3]>:1: FutureWarning: Dropping invalid columns in DataFrameGroupBy.transform is deprecated. In a future version, a TypeError will be raised. Before calling .transform, select only columns which should be valid for the function.
  grouped.transform(lambda x: x.max() - x.min())
<doctest pandas.errors.DtypeWarning[2]>:1: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  df2 = pd.read_csv('test.csv')
<doctest pandas.core.window.rolling.Rolling.count[1]>:1: FutureWarning: min_periods=None will default to the size of window consistent with other methods in a future version. Specify min_periods=0 instead.
  s.rolling(2).count()

Issue Description

We shouldn't have any unintentional warnings in our docstrings. These examples should be updated to not use the deprecated behavior.

Unfortunately, we don't have the line numbers from the warnings. To find the problematic docstring, we probably just need to do a text search for something like Int64Index(
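The text search mentioned above could be sketched roughly as follows. This is only an illustration using the standard library; `find_usages` is a hypothetical helper name, not something in the pandas repository, and the search root is assumed to be the `pandas` source directory of a local checkout.

```python
from pathlib import Path


def find_usages(root, needle, suffix=".py"):
    """Yield (path, line number, stripped line) for each line containing needle."""
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable files rather than abort the whole search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, lineno, line.strip()
```

Calling `list(find_usages("pandas", "Int64Index("))` from the repository root would then list candidate docstring lines to update; the same helper works for other strings taken from the warnings, such as `is_mixed(`.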

Expected Behavior

No warnings.

Installed Versions

Replace this line with the output of pd.show_versions()

@TomAugspurger TomAugspurger added the Docs, good first issue, and Warnings labels Nov 27, 2021
@IsNotMyIP

I'll take a look; if there's no answer or PR from my side in 12 hours, call a real developer pls 🥲

@IsNotMyIP

Sorry, but I can't replicate this issue on my computer... Anyway, if anyone would like to contribute, they just have to go to scripts/validate_docstrings.py and:

  • Update the Int64Index usage, which is deprecated, to NumericIndex.
  • Update the Index.is_mixed usage, which is also deprecated.

When I try to run the command, it throws this back:
Validate docstrings (GL01, GL02, GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS01, SS02, SS03, SS04, SS05, PR03, PR04, PR05, PR08, PRO9, PR10, EX04, RT01, RT04, RT05, SA02, SA03)
Traceback (most recent call last):
  File "./ci/../scripts/validate_docstrings.py", line 431, in <module>
    main(
  File "./ci/../scripts/validate_docstrings.py", line 377, in main
    return print_validate_all_results(
  File "./ci/../scripts/validate_docstrings.py", line 321, in print_validate_all_results
    result = validate_all(prefix, ignore_deprecated)
  File "./ci/../scripts/validate_docstrings.py", line 286, in validate_all
    api_items += list(get_api_items(f))
  File "./ci/../scripts/validate_docstrings.py", line 124, in get_api_items
    func = getattr(func, part)
AttributeError: type object 'DateOffset' has no attribute 'is_month_start'

@rhshadrach
Member

@IsNotMyIP - Just a guess, but I think you need to recompile the libs

@gourcool

@TomAugspurger can you assign this issue to me?

@IsNotMyIP

@rhshadrach I'll check if I can reproduce this on my Windows machine; I tried on macOS...

@gourcool Go for it! If you can replicate the bug on your computer, it should be easy to fix. Let me know if you have any questions.

@nakarinh14

@gourcool Any updates on this? I'm thinking of picking this issue up if there isn't much progress yet.

@eshirvana
Contributor

yeah, I couldn't reproduce this issue on my machine either

@Condielj

take

@TamDBe TamDBe removed their assignment Apr 10, 2022
@rhshadrach
Member

rhshadrach commented May 21, 2022

Fixed some of these in #47080, and opened #47079 due to some resample warnings. However, I don't believe many of those that remain can or should be fixed. Here are some categories of warnings I don't believe we can/should fix:

  • Third-party use of deprecated functionality (numba - the warning with np.MachAr will be fixed in the next release)
  • Docstrings of deprecated classes/methods - the docstring should remain until they are removed, but any examples will generate warnings
  • Performance warning - this I'm not sure about. But it demonstrates valid use of pandas. Examples include "indexing past lexsort" and "More than 20 figures have been opened" (for plotting)
  • Valid use that will change behavior. For example, pd.Series(). This generates a warning that the future dtype will be object. Do we specify pd.Series(dtype=object)?
  • Some docstrings purposefully generate warnings: pandas.core.ops.missing.mask_zero_div_zero divides by 0 to demonstrate different behaviors. pandas.util._validators.validate_axis_style_args includes "This emits a warning".
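One way to sort the remaining warnings into categories like these is to record, per docstring, which warning categories its examples emit. The following is a minimal sketch using only the standard library; `old_api` and `warnings_from_docstring` are hypothetical names made up for this illustration and are not part of pandas or scripts/validate_docstrings.py.

```python
import doctest
import warnings


def old_api():
    """Hypothetical deprecated function, used only for this demo.

    >>> old_api()
    1
    """
    warnings.warn("old_api is deprecated", FutureWarning, stacklevel=2)
    return 1


def warnings_from_docstring(obj):
    """Run obj's doctest examples and return (category name, message) pairs."""
    finder = doctest.DocTestFinder(recurse=False)
    runner = doctest.DocTestRunner(verbose=False)
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be shown once.
        warnings.simplefilter("always")
        for test in finder.find(obj, name=obj.__name__, globs={obj.__name__: obj}):
            # Discard doctest's own failure report; only warnings matter here.
            runner.run(test, out=lambda text: None)
    return [(w.category.__name__, str(w.message)) for w in caught]
```

Calling `warnings_from_docstring(old_api)` should include a `FutureWarning` entry for the example above. Grouping the results by category name would make it easier to separate deprecations worth fixing from performance warnings or warnings that a docstring emits on purpose.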

@seanjedi
Contributor

seanjedi commented Dec 1, 2022

Is this issue still open, or is there something I can work on?

@mroeschke
Member

I think these don't appear anymore since we enforced all the deprecations in the 1.x series, so closing. Can reopen if these appear again.
