Partial date indexing only works with single key #27180

Closed
toobaz opened this issue Jul 2, 2019 · 4 comments
Labels
Bug · Datetime (Datetime data dtype) · Indexing (Related to indexing on series/frames, not to indexes themselves)

Comments

@toobaz
Member

toobaz commented Jul 2, 2019

Code Sample, a copy-pastable example if possible

In [2]: index = pd.date_range('2001-01-01', periods=100)

In [3]: index.get_loc('2001-01')
Out[3]: slice(0, 31, None)

In [4]: index.get_indexer(['2001-01'])
Out[4]: array([-1])

In [5]: s = pd.Series(1, index=index)

In [6]: s.loc['2001-01']
Out[6]: 
2001-01-01    1
2001-01-02    1
2001-01-03    1
2001-01-04    1
2001-01-05    1
2001-01-06    1
2001-01-07    1
2001-01-08    1
2001-01-09    1
2001-01-10    1
2001-01-11    1
2001-01-12    1
2001-01-13    1
2001-01-14    1
2001-01-15    1
2001-01-16    1
2001-01-17    1
2001-01-18    1
2001-01-19    1
2001-01-20    1
2001-01-21    1
2001-01-22    1
2001-01-23    1
2001-01-24    1
2001-01-25    1
2001-01-26    1
2001-01-27    1
2001-01-28    1
2001-01-29    1
2001-01-30    1
2001-01-31    1
Freq: D, dtype: int64

In [7]: s.loc[['2001-01']]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-f0d6dc35665a> in <module>
----> 1 s.loc[['2001-01']]

/home/nobackup/repo/pandas/pandas/core/indexing.py in __getitem__(self, key)
   1427 
   1428             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1429             return self._getitem_axis(maybe_callable, axis=axis)
   1430 
   1431     def _is_scalar_access(self, key):

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1829                     raise ValueError('Cannot index with multidimensional key')
   1830 
-> 1831                 return self._getitem_iterable(key, axis=axis)
   1832 
   1833             # nested tuple slicing

/home/nobackup/repo/pandas/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1131             # A collection of keys
   1132             keyarr, indexer = self._get_listlike_indexer(key, axis,
-> 1133                                                          raise_missing=False)
   1134             return self.obj._reindex_with_indexers({axis: [keyarr, indexer]},
   1135                                                    copy=True, allow_dups=True)

/home/nobackup/repo/pandas/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1087         self._validate_read_indexer(keyarr, indexer,
   1088                                     o._get_axis_number(axis),
-> 1089                                     raise_missing=raise_missing)
   1090         return keyarr, indexer
   1091 

/home/nobackup/repo/pandas/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1172                 raise KeyError(
   1173                     "None of [{key}] are in the [{axis}]".format(
-> 1174                         key=key, axis=self.obj._get_axis_name(axis)))
   1175 
   1176             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['2001-01'], dtype='object')] are in the [index]"

Problem description

If an indexing mechanism works for an individual key, it would be expected to also work for a list of keys. This is more or less the status quo for other indexes that allow some partial/smart selection.

IntervalIndex works just fine:

In [2]: iidx = pd.IntervalIndex.from_breaks([0, 1, 4.6, 7])

In [3]: iidx.get_loc(5)
Out[3]: 2

In [4]: iidx.get_indexer([5])
Out[4]: array([2])

MultiIndex doesn't (I will open a bug for it now):

In [5]: midx = pd.MultiIndex.from_product([[0, 1], [2, 3]])

In [6]: midx.get_loc(0)
Out[6]: slice(0, 2, None)

In [7]: midx.get_indexer([0])
Out[7]: array([-1])

... but this is worked around in the indexing code (which shouldn't happen):

In [8]: pd.Series(index=midx).loc[[0]]
Out[8]: 
0  2   NaN
   3   NaN
dtype: float64

Expected Output

The same as when passing a single key.
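
For illustration, the expected result can be approximated today by selecting each partial-string key separately and concatenating the pieces. This is only a sketch: `loc_partial_many` is a hypothetical helper, not a pandas API, and it assumes a sorted (monotonic increasing) DatetimeIndex.

```python
import pandas as pd


def loc_partial_many(obj, keys):
    """Hypothetical helper (not part of pandas): select every row matched by
    any of the partial date strings in `keys`.

    Assumes `obj` has a sorted (monotonic increasing) DatetimeIndex.
    """
    keys = list(keys)
    if not keys:
        # Nothing requested: return an empty object of the same type.
        return obj.iloc[0:0]
    # Slicing with the same partial string on both ends selects the whole
    # period and always returns a Series/DataFrame, never a scalar.
    return pd.concat([obj.loc[key:key] for key in keys])


index = pd.date_range("2001-01-01", periods=100)
s = pd.Series(1, index=index)

# All of January plus all of March 2001: 31 + 31 = 62 rows.
selected = loc_partial_many(s, ["2001-01", "2001-03"])
print(len(selected))
```

Overlapping keys would produce duplicated rows, which matches how list-like `.loc` treats repeated labels.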

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7ceefb3
python : 3.7.3.candidate.1
python-bits : 64
OS : Linux
OS-release : 4.9.0-9-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : it_IT.UTF-8
LOCALE : it_IT.UTF-8

pandas : 0.25.0.dev0+861.g7ceefb3f2.dirty
numpy : 1.16.4
pytz : 2016.7
dateutil : 2.8.0
pip : 9.0.1
setuptools : 41.0.1
Cython : 0.29.2
pytest : 4.6.3
hypothesis : 3.71.11
sphinx : 1.4.9
blosc : None
feather : None
xlsxwriter : 0.9.6
lxml.etree : 4.3.2
html5lib : 0.999999999
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: 0.2.1
bs4 : 4.5.3
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.2
matplotlib : 3.0.2
numexpr : 2.6.9
openpyxl : 2.3.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.1.0
sqlalchemy : 1.0.15
tables : 3.4.4
xarray : None
xlrd : 1.0.0
xlwt : 1.3.0
xlsxwriter : 0.9.6

@toobaz added the Datetime (Datetime data dtype) and Index (Related to the Index class or subclasses) labels Jul 2, 2019
@toobaz
Member Author

toobaz commented Jul 7, 2019

> The partial string is really a shortcut to help you filter a datetime index more easily. When you provide a list, make sure the type matches.

I know ;-)

But there is no reason why get_indexer should have different rules from get_loc.
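
To make the point concrete, here is a rough sketch of a list-like lookup that simply applies the `get_loc` rules to each key. `positions_for_keys` is a hypothetical name, not an existing pandas function, and unlike `get_indexer` it raises `KeyError` for a missing key instead of returning -1.

```python
import numpy as np
import pandas as pd


def positions_for_keys(index, keys):
    """Hypothetical helper (not part of pandas): integer positions matched by
    each key, resolved with the same rules as `index.get_loc`.
    """
    all_positions = np.arange(len(index))
    parts = []
    for key in keys:
        loc = index.get_loc(key)  # may be an int, a slice, or an array
        # Indexing an arange with `loc` normalizes all three cases to positions.
        parts.append(np.atleast_1d(all_positions[loc]))
    if not parts:
        return np.array([], dtype=np.intp)
    return np.concatenate(parts)


index = pd.date_range("2001-01-01", periods=100)

# '2001-01' is a partial (month) key; '2001-04-05' matches one exact day.
pos = positions_for_keys(index, ["2001-01", "2001-04-05"])
print(pos[:3], pos[-1], len(pos))
```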

@jbrockmendel added the Indexing (Related to indexing on series/frames, not to indexes themselves) label and removed the Index (Related to the Index class or subclasses) label Feb 22, 2020
@mroeschke added the Bug label Apr 2, 2020
@jbrockmendel
Member

If we do change/fix this, I think DatetimeIndex._index_as_unique will just always be False.

@jreback
Contributor

jreback commented Nov 29, 2020

It isn't obvious that this is a useful feature. We have been trying to remove (deprecate) this from [], so expanding it is a -1 from me. This is also generally an interactive feature, so I'm not sure list-like partial string indexing would make it any more useful.
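
For non-interactive code, one explicit alternative that does not rely on partial string indexing at all is a boolean mask built from monthly periods. This is a sketch under stated assumptions: `select_months` is a hypothetical helper, and it only covers month-level keys.

```python
import pandas as pd


def select_months(obj, months):
    """Hypothetical helper (not part of pandas): keep rows whose timestamp
    falls in any of the given months, e.g. ['2001-01', '2001-03'].
    """
    wanted = pd.PeriodIndex(list(months), freq="M")
    # Convert each timestamp to its monthly period and test membership.
    mask = obj.index.to_period("M").isin(wanted)
    return obj.loc[mask]


index = pd.date_range("2001-01-01", periods=100)
s = pd.Series(1, index=index)

picked = select_months(s, ["2001-01", "2001-03"])
print(len(picked))
```

Because the mask is computed per row, this version also works on an unsorted index and never duplicates rows.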

@jreback added this to the No action milestone Dec 24, 2020
@jreback
Contributor

jreback commented Dec 24, 2020

Closing as won't fix.

@jreback closed this as completed Dec 24, 2020

4 participants