Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: concat ignoring empty objects #52532

Merged
merged 28 commits into from
Jul 10, 2023

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Apr 7, 2023

cc @jorisvandenbossche

@mroeschke mroeschke added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Deprecate Functionality to remove in pandas labels Apr 10, 2023
@jbrockmendel jbrockmendel changed the title DEPR: concat with empty objects DEPR: concat ignoring empty objects Apr 10, 2023
Comment on lines 112 to 116
if len(non_empties) < len(to_concat) and not any(
obj.dtype == _dtype_obj for obj in non_empties
):
# Check for object dtype is an imperfect proxy for checking if
# the result dtype is going to change once the deprecation is
Copy link
Member

@jorisvandenbossche jorisvandenbossche Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we verify this exactly? (checking what the common dtype would be with or without the empties?)

Because I think the check above will easily give false positives (unless I am misreading it): for example, for just a mixture of int types, of which one is empty, will trigger the warning? (since none of them is object dtype) While it will typically not change the resulting dtype (except if the input arrays with the largest bitwidth are all empty)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a try! Will take a look

@jorisvandenbossche
Copy link
Member

Rereading some of the previous discussions, would you be OK with start doing this deprecation for empty objects that are not object/float dtype? (since those are used as generic "empty" dtype)

@jbrockmendel
Copy link
Member Author

Rereading some of the previous discussions, would you be OK with start doing this deprecation for empty objects that are not object/float dtype? (since those are used as generic "empty" dtype)

I'm not dead-set against that, but would like to push down this road a bit further first. IIRC the object/float (in particular float) cases were mostly a sticking point in the all-nan cases more than the empty cases.

@MarcoGorelli MarcoGorelli self-requested a review May 15, 2023 18:24
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like something went wrong when merging in the whatsnew

also, this is kind of hard to review, it looks like you're both refactoring and introducing the deprecation warning - any chance to split out the refactoring into a precursor PR please? (or to write a few changes summarising the changes)

doc/source/whatsnew/v2.1.0.rst Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member Author

also, this is kind of hard to review, it looks like you're both refactoring and introducing the deprecation warning - any chance to split out the refactoring into a precursor PR please? (or to write a few changes summarising the changes)

I think I can refactor out as a precursor, coming up shortly.

@MarcoGorelli MarcoGorelli self-requested a review May 25, 2023 18:35
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but I don't understand the is_na_without_isna_all -> is_na_after_size_and_isna_all_deprecation change - could you explain please?

Comment on lines -409 to -418
def is_na_without_isna_all(self) -> bool:
def is_na_after_size_and_isna_all_deprecation(self) -> bool:
"""
Will self.is_na be True after values.size == 0 deprecation and isna_all
deprecation are enforced?
"""
blk = self.block
if blk.dtype.kind == "V":
return True
if not blk._can_hold_na:
return False

values = blk.values
if values.size == 0:
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the future behavior won't depend on values.size == 0 (note the changed method name/docstring)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure but the deprecation hasn't been enforced yet, why is this changing already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is for checking on the future behavior to see if we need to issue a warning.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine to me

@jorisvandenbossche any objections?

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 27, 2023
@jbrockmendel
Copy link
Member Author

@jorisvandenbossche @MarcoGorelli gentle ping

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

I'd suggest self-merging if it's something you're confident about and have had an approval

@MarcoGorelli MarcoGorelli added this to the 2.1 milestone Jul 10, 2023
@MarcoGorelli MarcoGorelli merged commit 76854ce into pandas-dev:main Jul 10, 2023
@jorisvandenbossche
Copy link
Member

My apologies for the slow response here, but I would like to get back to this one point (#52532 (comment)):

Rereading some of the previous discussions, would you be OK with start doing this deprecation for empty objects that are not object/float dtype? (since those are used as generic "empty" dtype)

I'm not dead-set against that, but would like to push down this road a bit further first. IIRC the object/float (in particular float) cases were mostly a sticking point in the all-nan cases more than the empty cases.

I am not fully sure how to interpret the "push down this road a bit further first", but so you would not necessarily be against doing this deprecation only for non-object/float dtypes?

(I am also happy to do a follow-up PR for that)

@mroeschke
Copy link
Member

@jbrockmendel when you have a moment, could you also address this test failure (looks like this PR could have used a rebase)

FAILED pandas/tests/groupby/test_groupby.py::test_indices_concatenation_order - AssertionError: Did not see expected warning of class 'FutureWarning'

@jbrockmendel jbrockmendel deleted the depr-concat-empty branch July 10, 2023 23:48
@jbrockmendel
Copy link
Member Author

when you have a moment, could you also address this test failure (looks like this PR could have used a rebase)

will take a look now.

I am not fully sure how to interpret the "push down this road a bit further first", but so you would not necessarily be against doing this deprecation only for non-object/float dtypes?

IIRC at the time of that comment there were other review comments to address (#52532 (comment)) and i wanted to see how "clean" this PR would be once those were addressed.

Also as I said in that comment, my understanding of the motivation to limit to only non-object/float cases was about the all-NA case, not the empty case. What would the motivation be to special-case the deprecation here?

wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Nov 29, 2023
wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Dec 20, 2023
wshanks added a commit to wshanks/qiskit-experiments that referenced this pull request Dec 20, 2023
github-merge-queue bot pushed a commit to qiskit-community/qiskit-experiments that referenced this pull request Jan 19, 2024
This change rolls together several small changes to avoid warnings
generated when running the tests.

Here is a summary of the changes:

* Update use of `datetime.utcnow()` deprecated in Python 3.12
* Concatenate pandas dataframes more carefully to avoid `FutureWarning`
when concatenating an empty dataframe (see
pandas-dev/pandas#52532).
* Adjust plotter tests to avoid warnings
+ Use supported options rather than arbitrary options to avoid
unexpected option warning
+ Catch expected warning when passing an untrained discriminator to
`IQPlotter`
* Move the setting of the axis scale earlier in the logic of
`MplDrawer.format_canvas`. `format_canvas` queries the limits of each
`axes` object and uses those values. When no data set on a set of axes
(an odd edge case), the default values depend on the scale (because the
minimum for linear is 0 but for log is 0.1), so the scale needs to be
set first.
* Remove transpile options from `FineDragCal` (`_transpiled_circuits`
for `BaseCalibrationExperiment` warns about and ignores transpile
options)
* Replace `isinstance` checks with `pulse_type` checks for calibration
tests checking pulse shape
* Replace `qiskit.Aer` usage with `qiskit_aer`
* Set tolerance on cvxpy test to larger value, close to the tolerance
achieved in practice. The routine was maxing out its iterations without
achieving tolerance but still producing a close enough result for the
test to pass. Increasing the tolerance avoids the max iterations warning
and makes the test faster.
* Rename `TestLibrary` to `MutableTestLibrary`. The class name starting
with `Test*` causes pytest to warn about the class looking like a class
holding test cases. pytest is not the main way we run the tests but it
still seems best to avoid confusion between test case classes and test
helper classes.
* Catch warnings about insufficient trials in `QuantumVolume` tests
using small numbers of trials.
* Catch user and deprecation warnings about calibration saving and
loading to csv files.
* Convert existing calibration saving and loading tests from json to
csv, leaving basic coverage of csv saving. A bug was found with json
saving, resulting in one test being marked as an expected failure.
* Set data on the `ExperimentData` for
`TestFramework.test_run_analysis_experiment_data_pickle_roundtrip` to
avoid a warning about an empty experiment data object. Other cases of
this warning were addressed in 729014b
but this test case was added afterward in
a6c9e53.

In order to avoid warnings creeping in in the future, this change also
makes any warnings emitted by qiskit-experiments code be treated as
exceptions by the tests (with an exception for `ScatterTable` methods
for which it is planned to remove the deprecation). Any expected
warnings in tests can be handled by using `assertWarns` or
`warnings.catch_warnings`.

As all the changes relate to tests (except the `FineDragCal` one which
is a non-functional change to avoid a warning), no release note was
added.

---------

Co-authored-by: Helena Zhang <Helena.Zhang@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
4 participants