Boolean column with NAs becomes obj64 after roundtrip to pandas #1730

Closed
st-pasha opened this issue Mar 21, 2019 · 0 comments · Fixed by #1731
Labels: High priority, improve

Comments

@st-pasha
Contributor

If a boolean column with NAs is converted to pandas and then back to a datatable Frame, the end result is an obj64 column:

>>> import datatable as dt
>>> dt.Frame(dt.Frame([True, False, None]).to_pandas()).stypes
(stype.obj64,)
>>> dt.Frame(dt.Frame([True, False, None]).to_pandas())
       C0
--  -----
 0   True
 1  False
 2    nan

[3 rows x 1 column]

The problem is that pandas does not support boolean columns with NAs, so it converts the column to object dtype, and we retain that type when converting back. This is problematic because obj64 columns have very limited support in datatable: they cannot be sorted, cannot be saved to jay or csv, cannot be used to compute stats, cannot participate in arithmetic, etc.
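
To illustrate what recovering the type would involve, here is a minimal pure-Python sketch. It is not datatable's implementation and not necessarily what the fix does; the helper names `is_na` and `recover_bools` are hypothetical. It assumes the object column arrives as a sequence of Python `bool` values mixed with `None` or `nan`, and treats any other value as a reason to leave the column as object.

```python
import math

def is_na(value):
    """Return True for the NA markers that may appear in an object column."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def recover_bools(values):
    """Return a list of True/False/None if `values` holds only booleans and NAs.

    Returns None when any other kind of value is present, meaning the
    column should stay an object column. (Hypothetical helper.)
    """
    recovered = []
    for v in values:
        if is_na(v):
            recovered.append(None)
        elif isinstance(v, bool):
            recovered.append(v)
        else:
            return None
    return recovered

print(recover_bools([True, False, float("nan")]))  # [True, False, None]
print(recover_bools([True, "x", None]))            # None
```

Note that the check is deliberately strict: an integer such as `1` is not accepted as a boolean, so a genuinely mixed object column is left alone.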

Note that integer columns with NAs have a similar problem: after a round trip to pandas they become float64. This, however, is not as bad because the float64 column is numerically equal to the original and will produce similar results for most operations.
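
The integer case can be sketched the same way. The helper below is hypothetical and only illustrates the idea on plain Python lists: if every non-NA value in a float column is a whole number, the column can be mapped back to integers with `None` for the NAs.

```python
import math

def floats_to_ints(values):
    """Map a float column back to ints, with None in place of NaN.

    Returns None if any non-NA value is not a whole number, meaning the
    column is genuinely floating-point. (Hypothetical helper.)
    """
    out = []
    for v in values:
        if v is None or math.isnan(v):
            out.append(None)
        elif float(v).is_integer():
            out.append(int(v))
        else:
            return None
    return out

print(floats_to_ints([1.0, 2.0, float("nan")]))  # [1, 2, None]
print(floats_to_ints([1.5, 2.0]))                # None
```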

@st-pasha added the bug label Mar 21, 2019
@st-pasha self-assigned this Mar 21, 2019
@st-pasha added this to the Release 0.9.0 milestone Mar 21, 2019
@st-pasha added the High priority and improve labels and removed the bug label Mar 21, 2019