Is your feature request related to a problem?
I would like to use pd.cut to sort values into half-open bins whose lower boundary is the closed end (e.g. [0, 5), by setting right=False) while still including the upper bound of the last interval. In other words, the last interval should be closed on both ends, via something like include_highest=True, analogous to include_lowest=True for right=True. I also ran into this with infinite boundaries, where adding a small number to the last edge is not an option (although, as a workaround, one can fillna the result, since the only remaining NaNs are those at the infinite right boundary).
That is, while I can make the first interval closed for pd.cut with right=True by passing include_lowest=True, I can't do the same for right=False, where include_lowest=True seems to have no effect.
(I would want the last value, 3, to be in the final bin [2, 3), i.e. to have that bin behave as [2, 3].)
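A minimal sketch of the behaviour I mean, using the values 0 to 3 with bin edges 0 to 3. The variable names are only for illustration, and labels=False is used so the result is a plain array of bin indices with NaN for values that fall outside every bin:

```python
import numpy as np
import pandas as pd

values = [0, 1, 2, 3]
bins = [0, 1, 2, 3]

# right=True (the default): intervals (0, 1], (1, 2], (2, 3].
# The value 0 sits on the open left edge and is not assigned to any bin.
default_codes = pd.cut(values, bins, labels=False)

# include_lowest=True closes the first interval, so 0 lands in bin 0.
lowest_codes = pd.cut(values, bins, labels=False, include_lowest=True)

# right=False: intervals [0, 1), [1, 2), [2, 3).
# The value 3 sits on the open right edge and is not assigned to any bin;
# include_lowest=True does not change that.
left_closed_codes = pd.cut(
    values, bins, labels=False, right=False, include_lowest=True
)

print(default_codes)
print(lowest_codes)
print(left_closed_codes)
```

The last call is the case this request is about: there is no switch that pulls the value 3 into the final bin.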
Describe the solution you'd like
Either include_lowest could be renamed to final_interval_closed or similar, so that it works as include_lowest for right=True and as include_highest for right=False (which would break the API, see below). This would make the function behave roughly symmetrically for right=True and right=False. Alternatively, such a parameter could be added alongside include_lowest, though as far as I can see that would make include_lowest more or less obsolete. Or, to make the API more symmetric, one could add another parameter, include_highest, which does nothing for right=True but makes the last interval closed on both ends for right=False.
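To make the intended include_highest semantics concrete, here is a rough user-space sketch. The helper cut_include_highest is hypothetical and not part of pandas, and it is not a proposal for how pd.cut should implement this internally. It assumes numeric input and returns bin indices (as with labels=False) rather than intervals:

```python
import numpy as np
import pandas as pd


def cut_include_highest(values, bins):
    """Hypothetical helper (not a pandas function).

    Bins are left-closed as with right=False, except that values equal
    to the last edge are assigned to the last bin instead of NaN.
    """
    values = np.asarray(values, dtype=float)
    # Regular left-closed binning; values equal to bins[-1] come back as NaN.
    codes = pd.cut(values, bins, right=False, labels=False)
    codes = np.asarray(codes, dtype=float)
    # Pull values sitting exactly on the last edge into the last bin.
    codes[values == bins[-1]] = len(bins) - 2
    return codes


finite = cut_include_highest([0, 1, 2, 3], [0, 1, 2, 3])
infinite = cut_include_highest([0, 5, np.inf], [0, 5, np.inf])
outside = cut_include_highest([-1, 3], [0, 1, 2, 3])
```

Unlike a blanket fillna, this only touches values that equal the last edge, so values below the first edge (or missing values) stay NaN. It also works for an infinite right boundary, where nudging the edge by a small number is not possible.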
API breaking implications
Renaming the parameter include_lowest to final_interval_closed or similar would break the API. The alternative solutions (adding either final_interval_closed or include_highest) would add a parameter to pd.cut (and if the former were added, include_lowest could potentially be deprecated down the line).
Describe alternatives you've considered
See the three alternatives under "Describe the solution you'd like".
I think it is somewhat related, but that discussion seems to be about an issue with the right=True (default) case. I'm arguing for an option similar to include_lowest for right=False, which (AFAICS) doesn't exist. So basically, for left-open intervals you can (with the caveats discussed in #23164) make the outermost open end closed(-ish) by specifying include_lowest=True, and I propose extending this to allow the same for right-open intervals.
Nonetheless, depending on how #23164 is resolved (changing the docs vs. extending IntervalIndex to actually support a single closed interval), it might be a good idea to fix both together.