Hi Erdogan,

I was recently working with a pandas DataFrame that had a column with an `Int64Dtype`, which is nullable. The column didn't actually contain any null values. Fitting on it gave me the following error:
```
  File "/usr/local/lib/python3.8/dist-packages/benfordslaw/benfordslaw.py", line 293, in _count_digit
    digits[Iloc] = list(map(lambda x: int(str(x)[d]), data[Iloc]))
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid `indices`
```
I looked into it, and it's because comparing a nullable integer series also produces a nullable boolean series. So the variable `Iloc` was actually a nullable boolean, which numpy apparently doesn't accept as an index. See below for a small reproducible example.
```python
import pandas
from benfordslaw import benfordslaw

bl = benfordslaw(alpha=0.05)
data = pandas.DataFrame({'value': [1, 2, 3, 4, 5]})
bl.fit(data['value'].astype(int))  # this works fine
bl.fit(data['value'].astype(pandas.Int64Dtype()))  # this throws an error
```
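The dtype difference behind this can be checked with pandas alone (no benfordslaw involved). This minimal sketch just compares the same values once as a plain integer series and once as a nullable `Int64` series, and prints the dtype of the resulting mask:

```python
import pandas as pd

values = [1, 2, 3, 4, 5]

# Comparing a plain integer Series gives a regular NumPy bool mask...
plain_mask = pd.Series(values).astype(int) >= 1

# ...while comparing a nullable Int64 Series gives a nullable "boolean" mask,
# even though there are no missing values in the data.
nullable_mask = pd.Series(values).astype(pd.Int64Dtype()) >= 1

print(plain_mask.dtype)     # bool
print(nullable_mask.dtype)  # boolean
```

Whether numpy then rejects the nullable mask as an index may depend on the pandas/numpy versions involved; the traceback above is from my setup.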
I think something like this would solve it (not tested):
```python
# Get the ith digit
digits = np.zeros_like(data)
Iloc = data>=np.power(10, d)
# ignore nulls and cast to non-nullable dtype just in case
Iloc = Iloc.fillna(False).astype(bool)
digits[Iloc] = list(map(lambda x: int(str(x)[d]), data[Iloc]))
```
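A slightly more defensive variant of the same idea, as a self-contained sketch. I haven't checked which type `_count_digit` actually receives; if `data` can also be a plain NumPy array, the mask would have no `.fillna`. Routing the mask through `pd.array(..., dtype="boolean")` should cover NumPy arrays, pandas Series, and nullable boolean arrays alike. The helper name `to_plain_bool_mask` and the example values are hypothetical, not part of benfordslaw:

```python
import numpy as np
import pandas as pd


def to_plain_bool_mask(mask):
    """Hypothetical helper: turn a NumPy bool array, a pandas Series, or a
    nullable BooleanArray into a plain NumPy bool array, treating missing
    values (pd.NA) as False."""
    # pd.array wraps the input as a nullable BooleanArray, so fillna is
    # available regardless of the original container type.
    return pd.array(mask, dtype="boolean").fillna(False).to_numpy(dtype=bool)


# Example mirroring the snippet above: d is a 0-based digit position.
data = pd.Series([1, 5, 12, 345, 7], dtype="Int64")
d = 1
digits = np.zeros(len(data), dtype=int)
loc = to_plain_bool_mask(data >= np.power(10, d))
digits[loc] = [int(str(x)[d]) for x in data[loc]]
print(digits)  # [0 0 2 4 0]
```

Only 12 and 345 have a second digit, so only their positions are filled in; all other entries stay 0.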
I wouldn't mind making a pull request with some test cases, but I'll leave it up to you. I can also imagine this isn't a high priority, since I think the nullable integer dtype (`Int64Dtype`) is still fairly experimental.
Kind regards,
Thomas