Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate pd.*Index*(names='') #19295

Closed
toobaz opened this issue Jan 18, 2018 · 13 comments
Closed

Deprecate pd.*Index*(names='') #19295

toobaz opened this issue Jan 18, 2018 · 13 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses MultiIndex

Comments

@toobaz
Copy link
Member

toobaz commented Jan 18, 2018

  • pd.Index accepts name - great
  • pd.MultiIndex accepts names - after all, they are multiple names
  • pd.Index accepts names for compatibility with pd.MultiIndex - well, OK, after all, that constructor can result in a MultiIndex
  • pd.Index should accept ( BUG: Creating Index name using names names argument, doesn't set index name #19082 ) names even when it results in a flat index - well, OK, still for compatibility
  • pd.MultiIndex accepts name for compatibility - wait, wasn't names already provided for compatibility?! OK, forget it, go for name.
  • pd.Index accepts name even for multiple levels' names, for compatibility - with MultiIndex, which accepts it for compatibility with pd.Index - aaaaalright
  • All pd.Index subclasses (which will never result in multiple levels, and which currently in some cases discard names, in other cases raise an error) should accept names for compatibility - no, wait.

Proposal 1

  1. There is one way to have compatibility across any kind of Index (sub)class constructor, and it is name. When the constructor results in a MultiIndex (which can happen with pd.Index or pd.MultiIndex), then name should be list-like, and each level will "receive" the respective component

  2. in those cases - and only in those cases - in which a constructor actually results in a MultiIndex, names can be used as an alias for name. In all other cases, it is not accepted.

or alternatively:

  1. (b) (my preference) names is deprecated (in all cases in which it is currently supported, and remains unsupported in other constructors/cases)

  2. (c) (tradeoff between mental health and backward compatibility) names is supported in pd.MultiIndex, still supported but deprecated in pd.Index when resulting in MultiIndex (and remains unsupported in other constructors/cases)

Corollary:

  1. names will not be supported by any constructor that is not pd.Index or pd.MultiIndex.

Notice that a 1-level MultiIndex is still a MultiIndex. That is,

  • pd.Index([('a',), ('b',)], name=('only_one_name',)) will still work
  • pd.Index([('a',), ('b',)], names=('only_one_name',)) will still work (at least as long as we don't deprecate names entirely)
  • pd.Index([('a',), ('b',)], name='only_one_name') will still not work

Proposal 2

  1. There is one way to have compatibility across any kind of Index (sub)class constructor, and it is names, which must always be list-like.

  2. In those cases in which the constructor results in a flat index, name is also accepted, and interpreted as names=(name,); instead name is deprecated for pd.MultiIndex, and for pd.Index when resulting in a MultiIndex (even if with 1 level)

Corollary:

  1. implementation-wise, we will want to decorate all __new__ of non-MultiIndex subclasses to convert names to name
@jreback
Copy link
Contributor

jreback commented Jan 18, 2018

its easiest just to only accept name only. If its a scalar, make it a list. Then the len of names must match the number of levels. Very simple this way. I don't think it would break any external APIS (of course we need to actually deprcate names).

SO Proposal 1. 2b

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves API Design Deprecate Functionality to remove in pandas labels Jan 18, 2018
@jreback jreback added this to the 0.23.0 milestone Jan 18, 2018
@toobaz
Copy link
Member Author

toobaz commented Jan 18, 2018

its easiest just to only accept name only.

I like this!

If its a scalar, make it a list.

I don't like this! You mean in a 1-level MultiIndex? I think the user must still pass ("name",), we don't need to guess.

@jreback
Copy link
Contributor

jreback commented Jan 18, 2018

yeah that’s fine ; let’s make it explicit

@toobaz toobaz changed the title Deprecate pd.*Index*(names=''), at least when resulting in flat index - or vice-versa Deprecate pd.*Index*(names='') Jan 20, 2018
@toobaz
Copy link
Member Author

toobaz commented Jan 29, 2018

What about pd.MultiIndex.from_tuples, pd.MultiIndex.from_product... ? Replace names -> name there too?

@toobaz
Copy link
Member Author

toobaz commented Jan 29, 2018

By the way, the more I think about it, the more my "Proposal 2" looks cleaner to me. But it's more a vague sensation than an objective assessment.

@jorisvandenbossche
Copy link
Member

I think we should look back here at what is the goal? Why do we need/want compatibility? Is it because people want to be able to write code to creates an index without knowing in advance the values and names it receives will contains tuples or not / will result in MultiIndex or not?

I think that is a legitimate case, but, one that is only relevant for Index itself.

But that would also lead me to conclue: for the single level Index subclasses (Int64Index, DatetimeIndex, ...) this will never be the case, so for those we don't care and we use the 'normal' keyword, i.e. name, and there is no reason here to support names.

So I think this is more one of the options of Proposal 1?

Can you explain more what you find attractive in Proposal 2? It feels very strange to me to see name as a kind of alias of names for DatetimeIndex, etc.

@toobaz
Copy link
Member Author

toobaz commented Feb 2, 2018

@jorisvandenbossche you are right that (in my proposal 2) the support for names is only compelling for Index, which is the only case in which we don't know what is going to come out.

This said, I think (although I don't think it is a priority) that

  • supporting names in all classes is just a matter of creating the decorator
  • we might want to support names e.g. in copy():
idx = Index(some_data_which_I_ignore, names=tuple_with_maybe_1_element)
idx2 = idx.copy(names=another_tuple_with_maybe_1_element)

... and hence (presumably) also in the constructor

  • people might reasonably expect that something that results (for instance) in a Int64Index (that is, Index()) takes similar arguments than a Int64Index, where applicable
  • I'm not claiming we should present (e.g. in the docs) name as an alias... it is just the way you could see it if you want to "think general"

But again, I'm relatively indifferent on this.

@jorisvandenbossche
Copy link
Member

supporting names in all classes is just a matter of creating the decorator

It might well be that it is not difficult to support it, but still, what's the point to support it?

we might want to support names e.g. in copy():

That could be interesting (not sure if it is needed, you can also do .copy().rename()), but I don't think that necessarily means it should be the same in the constructor.

people might reasonably expect that something that results (for instance) in a Int64Index (that is, Index()) takes similar arguments than a Int64Index, where applicable

"where applicable", yes, and names is not applicable, I would say :)

But again, I'm relatively indifferent on this.

Me as well, as indeed, we can also simply allow it and convert it under the hood, without advertising it. But, if we allow it, it should be documented, and that will add to the docstring for something that is not really needed IMO.

@jorisvandenbossche
Copy link
Member

So I would vote for: Index subclasses only name, MultiIndex only names, Index both

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

cc @PoppyBagel

@TomAugspurger
Copy link
Contributor

Pushing to the next release.

@toobaz toobaz added Index Related to the Index class or subclasses and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 29, 2019
@jbrockmendel
Copy link
Member

possibly closed by #38597

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Jun 19, 2021
@mroeschke
Copy link
Member

This looks fairly addressed by #38597 so closing. We can reopen a new issue if there were specific followups needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses MultiIndex
Projects
None yet
Development

No branches or pull requests

6 participants