Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DISC: remove Index and EA subclasses #43002

Open
jbrockmendel opened this issue Aug 12, 2021 · 5 comments
Open

DISC: remove Index and EA subclasses #43002

jbrockmendel opened this issue Aug 12, 2021 · 5 comments
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Ideas Long-Term Enhancement Discussions Index Related to the Index class or subclasses Needs Discussion Requires discussion from core team before further action

Comments

@jbrockmendel
Copy link
Member

There has been discussion in eg #39133 about having EA-backed Indexes held in base-class Index objects, similar discussion regarding the new NumericIndex.

I do not like the idea of having some non-object dtypes return the base class Index, but would be on board for making all dtypes use the base class. i.e. only have Index and deprecate/remove all the subclasses. I'm envisioning roughly

  1. Implement relevant methods on the underlying EAs, eg DatetimeIndex.get_loc logic might live in DatetimeArray.get_loc.
  2. ??? for RangeIndex and MultiIndex, maybe make corresponding EAs?
  3. Methods currently available on subclasses but not Index would be accessed via accessor, e.g. dti.is_year_start becomes idx.foo.is_year_start

Related but independent, we could do the same with ExtensionArray subclasses. Instead of having authors implement methods on their EA subclass, they implement it on the ExtensionDtype subclass, which the EA object then dispatches to.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member Ideas Long-Term Enhancement Discussions Needs Discussion Requires discussion from core team before further action and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 12, 2021
@topper-123
Copy link
Contributor

topper-123 commented Aug 17, 2021

I think having index methods in the array and treating the Index class as a container seems logical and has potential to simplify things and allow using user created ExtensionArrays as indexes (or as inputs to a generic index class), which would be a win.

Having immutable arrays could be even more powerful. In that case the index would just be a container that requires an immutable array as its input. Having arrays being immutable would also make some operations on DataFrame value faster, e.g. df[df.col1 == "a"] could be very fast, if we know that df.col1 is immutable and it's e.g. monotonic.

@mroeschke mroeschke added Index Related to the Index class or subclasses ExtensionArray Extending pandas with custom dtypes or arrays. labels Aug 21, 2021
@jbrockmendel
Copy link
Member Author

@TomAugspurger @jorisvandenbossche we discussed this briefly the other day. can i get you to post any relevant thoughts while they're still fresh-ish

@jorisvandenbossche
Copy link
Member

Personally I don't care much about the index subclasses as such (I agree thinking about the Index class as a simple container of an array is nice implementation wise), but I think the main discussion point is the user API and not so much the implementation.

Currently the Index subclasses provide additional dtype specific methods. And as you mention in the third point in the top post, this would mean that those methods will have to be accesses via accessors instead. This is huge breaking change, though (I suppose mostly for DatetimeIndex, as probably the most used index subclass with custom functionality), that will require a long deprecation cycle.
I am personally not sure if I would find this deprecation/change worth it.

So I think we should first discuss this user API question (although in theory we could also still provide this in a backwards compatible manner from a single Index class ...).

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 22, 2021

The other aspect that I mentioned in our chat is that I question a bit the need for NumericIndex in our public interface (since this class doesn't even add additional public methods/attributes).
Although related, that's a bit a separate discussion (since it's also still new / only to be used in the future), so I will open a separate issue about that. Update: used the original issue for this -> #41272

@jbrockmendel
Copy link
Member Author

Getting rid of EA subclasses would cause trouble for any EAs that want to cache things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Ideas Long-Term Enhancement Discussions Index Related to the Index class or subclasses Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants