new row not created upon assignment to nonexistent row label in DataFrame with MultiIndex #6699

Closed
lebedov opened this issue Mar 24, 2014 · 3 comments
Labels: Error Reporting (incorrect or improved errors from pandas), Indexing (related to indexing on series/frames, not to indexes themselves), MultiIndex

Comments

@lebedov

lebedov commented Mar 24, 2014

According to the documentation for pandas 0.13+, assigning to a non-existent key via the .ix or .loc operations should create a new row in the DataFrame. Using pandas 0.13.1 with Python 2.7.5, I noticed that this doesn't seem to be the case when the DataFrame has a MultiIndex. To illustrate, consider a DataFrame created as follows:

import itertools
import numpy as np
import pandas

rows = 2
cols = 3
# Random 0/1 integers for column 'a'; 'b' scales them by random floats.
a = np.random.randint(0, 2, rows*cols).reshape((rows, cols))
b = a*np.random.rand(rows, cols)
# One (row, col) index entry per cell; range works on both Python 2 and 3.
idx = pandas.MultiIndex.from_tuples(
    list(itertools.product(range(rows), range(cols))),
    names=['rows', 'cols'])
df = pandas.DataFrame({'a': a.flatten(),
                       'b': b.flatten()}, index=idx)

If df is

           a         b
rows cols             
0    0     0  0.000000
     1     0  0.000000
     2     1  0.315458
1    0     0  0.000000
     1     0  0.000000
     2     1  0.061142

then running df.loc[(0, 1), :] = (10, 20.0) modifies df as follows:

            a          b
rows cols               
0    0      0   0.000000
     1     10  20.000000
     2      1   0.315458
1    0      0   0.000000
     1      0   0.000000
     2      1   0.061142

However, if one attempts to run df.loc[(0,3), :] = (10, 20.0) on the original df, one obtains

            a          b
rows cols               
0    0     10  20.000000
     1      0   0.000000
     2      1   0.315458
1    0     10  20.000000
     1      0   0.000000
     2      1   0.061142

Is this behavior expected?

@jreback
Contributor

jreback commented Mar 24, 2014

It's not implemented with a multi-index; it should probably just raise an error for now.

Want to do a PR to raise NotImplementedError?
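Until such an error exists in pandas itself, a user-side guard in the same spirit might look like the sketch below. The helper name set_existing_row is hypothetical and not part of pandas; it only illustrates refusing a missing MultiIndex label instead of letting the assignment silently touch other rows.

```python
import pandas as pd


def set_existing_row(df, key, values):
    """Assign to a row of ``df``, refusing keys absent from a MultiIndex.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if isinstance(df.index, pd.MultiIndex) and key not in df.index:
        raise NotImplementedError(
            "cannot create row %r: enlargement with a MultiIndex "
            "is not supported" % (key,)
        )
    df.loc[key, :] = values


# Deterministic stand-in for the random frame in the report.
idx = pd.MultiIndex.from_product([range(2), range(3)], names=["rows", "cols"])
df = pd.DataFrame(
    {"a": [0, 0, 1, 0, 0, 1], "b": [0.0, 0.0, 0.3, 0.0, 0.0, 0.1]},
    index=idx,
)

# Existing label: the row is updated in place.
set_existing_row(df, (0, 1), (10, 20.0))

# Missing label: rejected before any data is modified.
try:
    set_existing_row(df, (0, 3), (10, 20.0))
except NotImplementedError as exc:
    print(exc)
```

The membership test runs before the assignment, so a rejected key leaves the frame untouched.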

@toobaz
Member

toobaz commented Feb 22, 2017

This seems to be fixed (at least, the proposed example behaves fine in git):

In [2]: df.loc[(0,3), :] = (10, 20.0)

In [3]: df
Out[3]: 
              a          b
rows cols                 
0    0      1.0   0.274632
     1      0.0   0.000000
     2      1.0   0.056103
1    0      1.0   0.941639
     1      1.0   0.546224
     2      1.0   0.866284
0    3     10.0  20.000000

@lebedov
Author

lebedov commented Feb 22, 2017

I concur - looks good in pandas 0.19.2. Closing.
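For anyone landing here later, a deterministic version of the check may be easier to compare than the random frame above. This is a minimal sketch, assuming a pandas version that includes the fix (0.19.2 or later, going by the comments in this thread); the fixed values in the frame are made up for illustration.

```python
import pandas as pd

# Deterministic stand-in for the random frame in the report.
idx = pd.MultiIndex.from_product([range(2), range(3)], names=["rows", "cols"])
df = pd.DataFrame(
    {"a": [0, 0, 1, 0, 0, 1], "b": [0.0, 0.0, 0.3, 0.0, 0.0, 0.1]},
    index=idx,
)
n_before = len(df)

# (0, 3) is not in the index, so this should add a row
# rather than overwrite the existing (0, 0) and (1, 0) rows.
df.loc[(0, 3), :] = (10, 20.0)

print(df)
print("rows added:", len(df) - n_before)
```

On a fixed version the original six rows keep their values and a seventh row labelled (0, 3) appears; column a may be upcast to float, as in the output posted above.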

@lebedov lebedov closed this as completed Feb 22, 2017