AttributeError: 'IntegerArray' object has no attribute 'tobytes' #406
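Note that you would get a much simpler traceback if you tried to write the same data using fastparquet's write() function directly. Indeed, the code is assuming that the data is a numpy array, which used to always be the case for integers. The new integer-with-nulls should follow the path that was previously one of the options for object-type arrays, to encode as integer-with-nulls. This is fixable but would take a little poking around. |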
Thank you for the prompt and helpful reply, Martin.
Would it be useful for me to cross-post this to a repository further up the
stack trace, or is it instead in FastParquet's domain?
On Sun, Feb 24, 2019, 2:25 PM Martin Durant wrote:
Note that you would get a much simpler traceback if you tried to write the
same data using fastparquet's write() function directly.
Indeed, the code is assuming that the data is a numpy array, which used to
always be the case for integers. The new integer-with-nulls should follow
the path that was previously one of the options for object-type arrays,
to encode as integer-with-nulls. This is fixable but would take a little
poking around.
|
This is certainly fastparquet's remit. I would appreciate any help in fixing it, though. The code is already in place for the object([int, int, None]) case from before, but needs logic to call it correctly in the right place. |
(it is likely that the IntegerArray already has the right structures internally to make writing them to parquet easy - @TomAugspurger ) |
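As a rough illustration of what "encode as integer-with-nulls" means at the Parquet level, the sketch below splits a list of integers and None into definition levels (1 for present, 0 for null) and a packed buffer holding only the non-null values. The function name `encode_optional_int64` is hypothetical, and this is not fastparquet's code path, only a minimal standard-library sketch of the idea under the assumption of a flat optional INT64 column with PLAIN encoding.

```python
import struct


def encode_optional_int64(values):
    """Split ints/None into definition levels and packed non-null values.

    Hypothetical helper for illustration; not part of fastparquet.
    """
    # One definition level per row: 1 if a value is present, 0 if it is null.
    definition_levels = [0 if v is None else 1 for v in values]
    # Only the non-null values are stored in the value buffer.
    present = [v for v in values if v is not None]
    # Little-endian signed 64-bit integers, as in a plain INT64 layout.
    packed = struct.pack(f"<{len(present)}q", *present)
    return definition_levels, packed


levels, packed = encode_optional_int64([1, 2, None])
print(levels, len(packed))
```

The point of the split is that nulls never occupy space in the value buffer, so a values array plus a mask maps onto this layout directly.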
We basically haven't addressed IO for extension arrays: pandas-dev/pandas#20612. Once that's solved, the idea would be for each ExtensionArray to determine how it should be serialized. I'm not sure whether fastparquet wants to get ahead of pandas here. The internal representation of IntegerArray is likely to change in the near future. |
Quite surprised to see that IntegerArray doesn't have public methods to get the values and mask separately. They are available as attributes _data and _mask, which is what we'll have to use. |
That was intentional, since we know they’ll be changing. The mask is currently a Boolean ndarray, but we know that we’ll use a bitmask in the future.
On Sunday, February 24, 2019, 2:23 PM, Martin Durant wrote:
Quite surprised to see that IntegerArray doesn't have public methods to get the values and mask separately. They are available as attributes _data and _mask, which is what we'll have to use.
|
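For readers following along, the values and mask discussed above can be inspected as in the sketch below. The sample data is made up, and `_data` and `_mask` are private attributes that, as noted in the thread, are expected to change, so this is an illustration rather than a stable recipe. `isna()` is the public way to obtain the mask.

```python
import pandas as pd

# Hypothetical sample: a nullable-integer array with one missing entry.
arr = pd.array([1, None, 3], dtype="Int64")

# Private attributes named in the thread; they may change between pandas versions.
values = arr._data  # NumPy integer array; the entry at a masked position is a placeholder
mask = arr._mask    # NumPy boolean array, True where the value is missing

# Public route to the same mask.
public_mask = arr.isna()

print(values.dtype, mask.tolist(), public_mask.tolist())
```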
Given
I think it's reasonable for fastparquet not to support this for the time being. It should become the standard thing that parquet integer columns create, and, as in this case, it should be valid input when writing - but not yet. I would ask that users keep to the standard object or float representation for now, even though it will be less efficient. |
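The interim workaround suggested here, keeping to a float or object representation, might look like the following sketch. The column name `a` and the sample data are made up, and the casts are plain pandas; nothing in the block is fastparquet-specific.

```python
import pandas as pd

# Hypothetical frame with one nullable-integer column (the new "Int64" dtype).
df = pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")})

# Option 1: float64, where missing values become NaN.
as_float = df.assign(a=df["a"].astype("float64"))

# Option 2: object dtype, holding the integers alongside a missing-value marker.
as_object = df.assign(a=df["a"].astype(object))

print(as_float.dtypes["a"], as_object.dtypes["a"])
```

Either cast frame, rather than the original `df`, is what would be handed to the writer. The float variant loses exactness for integers beyond 2**53, which is part of why this is described as less efficient and only a stopgap.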
Hello!
I am hoping you might know what is going on.
Using edge versions of Dask and FastParquet on Python 3.7.2 I execute:
Which raises:
While it's not doable for me to submit a full df.dtypes.values output, I can give you partial output:
Note my use of the new Nullable Integer data type (https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html#integer-na).
What do you think?
―James