If a boolean column with NAs is converted to pandas and then back to a datatable Frame, the end result is an obj64 column:
>>> import datatable as dt
>>> dt.Frame(dt.Frame([True, False, None]).to_pandas()).stypes
(stype.obj64,)
>>> dt.Frame(dt.Frame([True, False, None]).to_pandas())
   C0
-- -----
 0 True
 1 False
 2 nan

[3 rows x 1 column]
The problem here is that pandas does not support boolean columns with NAs, so it converts the column to object dtype, and we retain this type when converting back. This is problematic because obj64 columns have very limited support in datatable: they cannot be sorted, cannot be saved to Jay or CSV, cannot have stats computed on them, cannot participate in arithmetic, etc.
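As a stopgap on the user side, the NaN placeholders can be turned back into None before the values are handed to datatable again. The sketch below is only an illustration, not a fix inside datatable: `nan_to_none` is a hypothetical helper, and the sample list is hand-written to mimic what the object column appears to contain after the round trip (the `nan` in the output above suggests a float NaN).

```python
import math


def nan_to_none(values):
    """Replace float NaN placeholders with None; leave everything else as is."""
    return [None if isinstance(v, float) and math.isnan(v) else v for v in values]


# Hand-written stand-in for the values of the object column after the round trip.
roundtripped = [True, False, float("nan")]
cleaned = nan_to_none(roundtripped)
print(cleaned)

# Assumption: with datatable installed, the cleaned list could then be passed to
# dt.Frame(cleaned), mirroring the dt.Frame([True, False, None]) call shown earlier.
```

The `isinstance(v, float)` check keeps `math.isnan` away from booleans and None, so values that are already clean pass through unchanged.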
Note that integer columns with NAs have a similar problem: after a round-trip to pandas they become float64. This, however, is not as bad because the float64 column is numerically equal to the original, and will produce similar results for most operations.
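The integer case can be illustrated in plain Python, without pandas or datatable: integers survive a trip through float64 unchanged as long as their magnitude stays below 2**53, so the original values can be recovered once NaN is mapped back to None. `floats_to_ints` is a hypothetical helper written for this sketch, and the data is made up.

```python
import math


def floats_to_ints(values):
    """Convert whole-number floats back to int, mapping NaN to None."""
    out = []
    for v in values:
        if math.isnan(v):
            out.append(None)
        elif v.is_integer():
            out.append(int(v))
        else:
            raise ValueError(f"{v!r} is not a whole number")
    return out


# Simulate the round trip: None becomes NaN and integers become floats.
original = [1, 2, None]
as_float = [float("nan") if v is None else float(v) for v in original]
restored = floats_to_ints(as_float)
print(restored == original)
```

Raising on fractional or infinite values is a deliberate choice here, so that a column which was never integer-valued is not silently truncated.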
st-pasha added the bug label on Mar 21, 2019.
st-pasha added the High priority and improve labels and removed the bug label on Mar 21, 2019.