[BUG] Can't read pickle file written using standard pandas when using cudf.pandas #14692

Closed
yazabaza opened this issue Dec 31, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@yazabaza

Describe the bug
If you create a DataFrame in standard pandas, save it to a pickle file, and then try to load that pickle under cudf.pandas, the load crashes with this traceback:

Traceback (most recent call last):
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/site-packages/cudf/pandas/__main__.py", line 91, in <module>
    main()
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/site-packages/cudf/pandas/__main__.py", line 87, in main
    runpy.run_path(args.args[0], run_name="__main__")
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "scratch2.py", line 9, in <module>
    df = pickle.load(f)
  File "/home/ron/miniconda3/envs/rap/lib/python3.10/site-packages/cudf/pandas/fast_slow_proxy.py", line 205, in __setstate__
    unpickled_wrapped_obj = pickle.loads(state)
TypeError: a bytes-like object is required, not 'dict'
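
For anyone trying to understand the last frame, here is a minimal sketch of what I suspect is going on. It uses no pandas or cudf at all. The module name fake_pandas and the classes PlainFrame and ProxyFrame are hypothetical stand-ins, and the assumption (not confirmed from the cudf source) is that a pickle written by a plain class carries a dict as its state, while the class found under the same name at load time has a __setstate__ that expects bytes:

```python
import pickle
import sys
import types

# Hypothetical stand-in module; "fake_pandas" is not a real package.
fake = types.ModuleType("fake_pandas")
sys.modules["fake_pandas"] = fake


class PlainFrame:
    """Stand-in for a regular DataFrame: default pickling, dict state."""

    def __init__(self, data):
        self.data = data


class ProxyFrame:
    """Stand-in for a proxy type whose __setstate__ expects bytes."""

    def __setstate__(self, state):
        # Same call as the failing line in the traceback.
        self._wrapped = pickle.loads(state)


def install(cls):
    """Expose cls as fake_pandas.DataFrame so pickle resolves it by that name."""
    cls.__module__ = "fake_pandas"
    cls.__qualname__ = cls.__name__ = "DataFrame"
    fake.DataFrame = cls


# "Script A": pickle while the plain class is installed.
install(PlainFrame)
payload = pickle.dumps(fake.DataFrame({"A": [0.3, 5.6, 6]}))

# "Script B": the same name now resolves to the proxy class.
install(ProxyFrame)
try:
    pickle.loads(payload)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

If this matches what cudf.pandas does, the fix would presumably have to let the proxy's __setstate__ accept the state layout that standard pandas writes, but that is a guess on my part.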

Steps/Code to reproduce bug
Run this Script A in standard pandas via e.g. python script_a.py:

# SCRIPT A
import pandas as pd
import pickle
df = pd.DataFrame(
	{
		'A':[0.3, 5.6, 6],
		'B':[0.3, 5., 6],
		'C':[0.3, 5.6, 2.5]
	}
)
print(df)
file_path = "/home/ron/bucky/test_df.pickle" # set this path to your desired location
with open(file_path, 'wb') as f:
	pickle.dump(df, f)

Next, run this Script B in cudf.pandas via python -m cudf.pandas script_b.py:

# SCRIPT B
import pandas as pd
import pickle
file_path = '/home/ron/bucky/test_df.pickle' # set this path to your desired location
with open(file_path, 'rb') as f:
	df = pickle.load(f)
print(df)

Expected behavior
I expect that Script B should load the dataframe from the pickle file.

Environment overview (please complete the following information)

  • Environment location: Bare-metal
  • Method of cuDF install: conda

Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
I do not know where the above script is located. Attempting to run the above command yields:

-bash: cudf/print_env.sh: No such file or directory
Environment: Ubuntu 22.04.3 LTS
Conda:

Name                    Version                   Build  Channel
conda                     23.11.0         py311h06a4308_0  
conda-content-trust       0.2.0           py311h06a4308_0  
conda-libmamba-solver     23.12.0            pyhd3eb1b0_1  
conda-package-handling    2.2.0           py311h06a4308_0  
conda-package-streaming   0.9.0           py311h06a4308_0 

Conda env packages:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
aiohttp                   3.9.1           py310h2372a71_0    conda-forge
aiosignal                 1.3.1              pyhd8ed1ab_0    conda-forge
anyio                     4.2.0              pyhd8ed1ab_0    conda-forge
aom                       3.7.1                h59595ed_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
argon2-cffi               23.1.0             pyhd8ed1ab_0    conda-forge
argon2-cffi-bindings      21.2.0          py310h2372a71_4    conda-forge
arrow                     1.3.0              pyhd8ed1ab_0    conda-forge
astropy                   6.0.0           py310h1f7b6fc_0    conda-forge
astropy-iers-data         0.2023.12.25.0.30.16    pyhd8ed1ab_0    conda-forge
async-timeout             4.0.3              pyhd8ed1ab_0    conda-forge
attrs                     23.1.0             pyh71513ae_1    conda-forge
authlib                   1.3.0                    pypi_0    pypi
autopep8                  2.0.4                    pypi_0    pypi
aws-c-auth                0.7.8                hcf8cf63_3    conda-forge
aws-c-cal                 0.6.9                h5d48c4d_2    conda-forge
aws-c-common              0.9.10               hd590300_0    conda-forge
aws-c-compression         0.2.17               h7f92143_7    conda-forge
aws-c-event-stream        0.3.2                h0bcb0bb_8    conda-forge
aws-c-http                0.7.15               hd268abd_0    conda-forge
aws-c-io                  0.13.36              hb3b01f7_3    conda-forge
aws-c-mqtt                0.10.0               hbafccad_1    conda-forge
aws-c-s3                  0.4.6                h47b1690_0    conda-forge
aws-c-sdkutils            0.1.13               h7f92143_0    conda-forge
aws-checksums             0.1.17               h7f92143_6    conda-forge
aws-crt-cpp               0.25.0               hfa7cc67_4    conda-forge
aws-sdk-cpp               1.11.210             h0853bfa_5    conda-forge
azure-core-cpp            1.10.3               h91d86a7_0    conda-forge
azure-storage-blobs-cpp   12.10.0              h00ab1b0_0    conda-forge
azure-storage-common-cpp  12.5.0               hb858b4b_2    conda-forge
beautifulsoup4            4.12.2             pyha770c72_0    conda-forge
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
blosc                     1.21.5               h0f2a231_0    conda-forge
bokeh                     3.3.2              pyhd8ed1ab_0    conda-forge
branca                    0.7.0              pyhd8ed1ab_1    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0           py310hc6cd4ac_1    conda-forge
brunsli                   0.1                  h9c3ff4c_0    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.24.0               hd590300_0    conda-forge
c-blosc2                  2.12.0               hb4ffafa_0    conda-forge
ca-certificates           2023.11.17           hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
cachetools                5.3.2              pyhd8ed1ab_0    conda-forge
cairo                     1.18.0               h3faef2a_0    conda-forge
certifi                   2023.11.17         pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py310h2fee648_0    conda-forge
cfitsio                   4.3.1                hbdc6101_0    conda-forge
charls                    2.4.2                h59595ed_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
click-plugins             1.1.1                      py_0    conda-forge
cligj                     0.7.2              pyhd8ed1ab_1    conda-forge
cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
colorcet                  3.0.1              pyhd8ed1ab_0    conda-forge
contourpy                 1.2.0           py310hd41b1e2_0    conda-forge
cryptography              41.0.7                   pypi_0    pypi
cubinlinker               0.3.0           py310hfdf336d_0    rapidsai
cucim                     23.12.01        cuda11_py310_231211_ga3445df_0    rapidsai
cuda-profiler-api         11.8.86                       0    nvidia
cuda-python               11.8.3          py310h70a93da_0    conda-forge
cuda-version              11.5                 h6c6c5af_2    conda-forge
cudatoolkit               11.5.2              hbdc67f6_12    conda-forge
cudf                      23.12.01        cuda11_py310_231208_g2ce46216b5_0    rapidsai
cudf_kafka                23.12.01        cuda11_py310_231208_g2ce46216b5_0    rapidsai
cugraph                   23.12.00        cuda11_py310_231206_g1309813f_0    rapidsai
cuml                      23.12.00        cuda11_py310_231206_gad2bd2b65_0    rapidsai
cuproj                    23.12.01        cuda11_py310_231207_g16727064_0    rapidsai
cupy                      12.3.0          py310hf4db66c_0    conda-forge
cuspatial                 23.12.01        cuda11_py310_231207_g16727064_0    rapidsai
custreamz                 23.12.01        cuda11_py310_231208_g2ce46216b5_0    rapidsai
cuxfilter                 23.12.00        cuda11_py310_231206_g63dabeb_0    rapidsai
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
cyrus-sasl                2.1.27               h54b06d7_7    conda-forge
cytoolz                   0.12.2          py310h2372a71_1    conda-forge
dask                      2023.11.0          pyhd8ed1ab_0    conda-forge
dask-core                 2023.11.0          pyhd8ed1ab_0    conda-forge
dask-cuda                 23.12.00        py310_231206_ge1638ae_0    rapidsai
dask-cudf                 23.12.01        cuda11_py310_231208_g2ce46216b5_0    rapidsai
datashader                0.16.0             pyhd8ed1ab_0    conda-forge
dav1d                     1.2.1                hd590300_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
distributed               2023.11.0          pyhd8ed1ab_0    conda-forge
dlpack                    0.5                  h9c3ff4c_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
exceptiongroup            1.2.0              pyhd8ed1ab_0    conda-forge
expat                     2.5.0                hcb278e6_1    conda-forge
fastrlock                 0.8.2           py310hc6cd4ac_2    conda-forge
fiona                     1.9.5           py310h0a1e91f_2    conda-forge
fmt                       9.1.0                h924138e_0    conda-forge
folium                    0.15.1             pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_1    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
fonttools                 4.47.0          py310h2372a71_0    conda-forge
fqdn                      1.5.1              pyhd8ed1ab_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
freexl                    2.0.0                h743c826_0    conda-forge
frozenlist                1.4.1           py310h2372a71_0    conda-forge
fsspec                    2023.12.2          pyhca7485f_0    conda-forge
gdal                      3.8.1           py310haaa150b_3    conda-forge
gdk-pixbuf                2.42.10              h829c605_4    conda-forge
geopandas                 0.14.1             pyhd8ed1ab_0    conda-forge
geopandas-base            0.14.1             pyha770c72_0    conda-forge
geos                      3.12.1               h59595ed_0    conda-forge
geotiff                   1.7.1               hf074850_14    conda-forge
gettext                   0.21.1               h27087fc_0    conda-forge
gflags                    2.2.2             he1b5a44_1004    conda-forge
giflib                    5.2.1                h0b41bf4_3    conda-forge
glog                      0.6.0                h6f12383_0    conda-forge
gmock                     1.14.0               ha770c72_1    conda-forge
gtest                     1.14.0               h00ab1b0_1    conda-forge
h11                       0.14.0                   pypi_0    pypi
hdf4                      4.2.15               h2a13503_7    conda-forge
hdf5                      1.14.3          nompi_h4f84152_100    conda-forge
holoviews                 1.18.1             pyhd8ed1ab_0    conda-forge
httpcore                  1.0.2                    pypi_0    pypi
httpx                     0.26.0                   pypi_0    pypi
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.6                pyhd8ed1ab_0    conda-forge
imagecodecs               2023.9.18       py310h496a806_2    conda-forge
imageio                   2.33.1             pyh8c1a49c_0    conda-forge
importlib-metadata        7.0.1              pyha770c72_0    conda-forge
importlib_metadata        7.0.1                hd8ed1ab_0    conda-forge
importlib_resources       6.1.1              pyhd8ed1ab_0    conda-forge
isoduration               20.11.0            pyhd8ed1ab_0    conda-forge
jbig                      2.1               h7f98852_2003    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
joblib                    1.3.2              pyhd8ed1ab_0    conda-forge
json-c                    0.17                 h7ab15ed_0    conda-forge
jsonpointer               2.4             py310hff52083_3    conda-forge
jsonschema                4.20.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jsonschema-with-format-nongpl 4.20.0             pyhd8ed1ab_0    conda-forge
jupyter-server-proxy      4.1.0              pyhd8ed1ab_0    conda-forge
jupyter_client            8.6.0              pyhd8ed1ab_0    conda-forge
jupyter_core              5.6.0           py310hff52083_0    conda-forge
jupyter_events            0.9.0              pyhd8ed1ab_0    conda-forge
jupyter_server            2.12.1             pyhd8ed1ab_0    conda-forge
jupyter_server_terminals  0.5.1              pyhd8ed1ab_0    conda-forge
jupyterlab_pygments       0.3.0              pyhd8ed1ab_0    conda-forge
jxrlib                    1.1                  hd590300_3    conda-forge
kealib                    1.5.3                h2f55d51_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py310hd41b1e2_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lazy_loader               0.3                pyhd8ed1ab_0    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20230802.1      cxx17_h59595ed_0    conda-forge
libaec                    1.1.2                h59595ed_1    conda-forge
libarchive                3.7.2                h2aa1ff5_1    conda-forge
libarrow                  14.0.2           hfb4d3a9_0_cpu    conda-forge
libarrow-acero            14.0.2           h59595ed_0_cpu    conda-forge
libarrow-dataset          14.0.2           h59595ed_0_cpu    conda-forge
libarrow-flight           14.0.2           h120cb0d_0_cpu    conda-forge
libarrow-flight-sql       14.0.2           h61ff412_0_cpu    conda-forge
libarrow-gandiva          14.0.2           hacb8726_0_cpu    conda-forge
libarrow-substrait        14.0.2           h61ff412_0_cpu    conda-forge
libavif16                 1.0.3                hef5bec9_1    conda-forge
libblas                   3.9.0           20_linux64_openblas    conda-forge
libboost-headers          1.84.0               ha770c72_0    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           20_linux64_openblas    conda-forge
libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
libcublas                 11.11.3.6                     0    nvidia
libcublas-dev             11.11.3.6                     0    nvidia
libcucim                  23.12.01        cuda11_231211_ga3445df_0    rapidsai
libcudf                   23.12.01        cuda11_231208_g2ce46216b5_0    rapidsai
libcudf_kafka             23.12.01        cuda11_231208_g2ce46216b5_0    rapidsai
libcufft                  10.9.0.58                     0    nvidia
libcufile                 1.4.0.31                      0    nvidia
libcufile-dev             1.4.0.31                      0    nvidia
libcugraph                23.12.00        cuda11_231206_g1309813f_0    rapidsai
libcugraph_etl            23.12.00        cuda11_231206_g1309813f_0    rapidsai
libcugraphops             23.12.00        cuda11_231206_g42d08202_0    nvidia
libcuml                   23.12.00        cuda11_231206_gad2bd2b65_0    rapidsai
libcumlprims              23.12.00        cuda11_231206_gc120fe0_0    nvidia
libcurand                 10.3.0.86                     0    nvidia
libcurand-dev             10.3.0.86                     0    nvidia
libcurl                   8.5.0                hca28451_0    conda-forge
libcusolver               11.4.1.48                     0    nvidia
libcusolver-dev           11.4.1.48                     0    nvidia
libcusparse               11.7.5.86                     0    nvidia
libcusparse-dev           11.7.5.86                     0    nvidia
libcuspatial              23.12.01        cuda11_231207_g16727064_0    rapidsai
libdeflate                1.19                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libevent                  2.1.12               hf998b51_1    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_3    conda-forge
libgdal                   3.8.1                h4b8bffa_3    conda-forge
libgfortran-ng            13.2.0               h69a702a_3    conda-forge
libgfortran5              13.2.0               ha4646dd_3    conda-forge
libglib                   2.78.3               h783c2da_0    conda-forge
libgomp                   13.2.0               h807b86a_3    conda-forge
libgoogle-cloud           2.12.0               h5206363_4    conda-forge
libgrpc                   1.59.3               hd6c4280_0    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
libkml                    1.3.0             h01aab08_1018    conda-forge
libkvikio                 23.12.00        cuda11_231206_gf90bfbe_0    rapidsai
liblapack                 3.9.0           20_linux64_openblas    conda-forge
libllvm14                 14.0.6               hcd5def8_4    conda-forge
libllvm15                 15.0.7               hb3ce162_4    conda-forge
libnetcdf                 4.9.2           nompi_h9612171_113    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnl                     3.9.0                hd590300_0    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libntlm                   1.4               h7f98852_1002    conda-forge
libnuma                   2.0.16               h0b41bf4_1    conda-forge
libopenblas               0.3.25          pthreads_h413a1c8_0    conda-forge
libparquet                14.0.2           h352af49_0_cpu    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libpq                     16.1                 h33b98f1_7    conda-forge
libprotobuf               4.24.4               hf27288f_0    conda-forge
libraft                   23.12.00        cuda11_231206_g9e2d6277_0    rapidsai
libraft-headers           23.12.00        cuda11_231206_g9e2d6277_0    rapidsai
libraft-headers-only      23.12.00        cuda11_231206_g9e2d6277_0    rapidsai
librdkafka                1.9.2                ha5a0de0_2    conda-forge
libre2-11                 2023.06.02           h7a70373_0    conda-forge
librmm                    23.12.00        cuda11_231206_g2db5cbb3_0    rapidsai
librttopo                 1.1.0               h8917695_15    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libspatialindex           1.9.3                h9c3ff4c_4    conda-forge
libspatialite             5.1.0                h72606ae_3    conda-forge
libsqlite                 3.44.2               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               h7e041cc_3    conda-forge
libthrift                 0.19.0               hb90f79a_1    conda-forge
libtiff                   4.6.0                ha9c0a0a_2    conda-forge
libutf8proc               2.8.0                h166bdaf_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.46.0               hd590300_0    conda-forge
libwebp                   1.3.2                h658648e_1    conda-forge
libwebp-base              1.3.2                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxgboost                1.7.6           rapidsai_he275d05_7    rapidsai
libxml2                   2.12.3               h232c23b_0    conda-forge
libzip                    1.10.1               h2629f0a_3    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
linkify-it-py             2.0.0              pyhd8ed1ab_0    conda-forge
llvmlite                  0.40.1          py310h1b8f574_0    conda-forge
locket                    1.0.0              pyhd8ed1ab_0    conda-forge
lz4                       4.3.2           py310h350c4a5_1    conda-forge
lz4-c                     1.9.4                hcb278e6_0    conda-forge
lzo                       2.10              h516909a_1000    conda-forge
mapclassify               2.6.1              pyhd8ed1ab_0    conda-forge
markdown                  3.5.1              pyhd8ed1ab_0    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.3           py310h2372a71_1    conda-forge
matplotlib-base           3.8.2           py310h62c0568_0    conda-forge
mdit-py-plugins           0.4.0              pyhd8ed1ab_0    conda-forge
mdurl                     0.1.0              pyhd8ed1ab_0    conda-forge
minizip                   4.0.3                h0ab5242_0    conda-forge
mistune                   3.0.2              pyhd8ed1ab_0    conda-forge
msgpack-python            1.0.7           py310hd41b1e2_0    conda-forge
multidict                 6.0.4           py310h2372a71_1    conda-forge
multipledispatch          0.6.0                      py_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
nbclient                  0.8.0              pyhd8ed1ab_0    conda-forge
nbconvert-core            7.13.1             pyhd8ed1ab_0    conda-forge
nbformat                  5.9.2              pyhd8ed1ab_0    conda-forge
nccl                      2.19.4.1             h0800d71_0    conda-forge
ncurses                   6.4                  h59595ed_2    conda-forge
networkx                  3.2.1              pyhd8ed1ab_0    conda-forge
nodejs                    20.9.0               hb753e55_0    conda-forge
nspr                      4.35                 h27087fc_0    conda-forge
nss                       3.96                 h1d7d5a4_0    conda-forge
numba                     0.57.1          py310h0f6aa51_0    conda-forge
numpy                     1.24.4          py310ha4c1d20_0    conda-forge
nvcomp                    3.0.4                h838ba91_1    conda-forge
nvtx                      0.2.8           py310h2372a71_1    conda-forge
openjpeg                  2.5.0                h488ebb8_3    conda-forge
openslide                 3.4.1               h58ba908_12    conda-forge
openssl                   3.2.0                hd590300_1    conda-forge
orc                       1.9.2                h4b38347_0    conda-forge
outcome                   1.3.0.post0              pypi_0    pypi
overrides                 7.4.0              pyhd8ed1ab_0    conda-forge
packaging                 23.2               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3           py310h9b08913_1    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
panel                     1.3.6              pyhd8ed1ab_0    conda-forge
param                     2.0.1              pyhca7485f_0    conda-forge
partd                     1.4.1              pyhd8ed1ab_0    conda-forge
pcre2                     10.42                hcad00b1_0    conda-forge
pillow                    10.1.0          py310h01dd4db_0    conda-forge
pip                       23.3.2             pyhd8ed1ab_0    conda-forge
pixman                    0.42.2               h59595ed_0    conda-forge
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.1.0              pyhd8ed1ab_0    conda-forge
poppler                   23.12.0              h590f24d_0    conda-forge
poppler-data              0.4.12               hd8ed1ab_0    conda-forge
postgresql                16.1                 h7387d8b_7    conda-forge
proj                      9.3.0                h1d62c97_2    conda-forge
prometheus_client         0.19.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.43                   pypi_0    pypi
protobuf                  4.24.4          py310h620c231_0    conda-forge
psutil                    5.9.7           py310h2372a71_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
ptxcompiler               0.8.1           py310h70a93da_2    conda-forge
ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
py-xgboost                1.7.6           rapidsai_py310h4c2db5f_7    rapidsai
pyarrow                   14.0.2          py310hf9e7431_0_cpu    conda-forge
pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
pycodestyle               2.11.1                   pypi_0    pypi
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyct                      0.5.0              pyhd8ed1ab_0    conda-forge
pyee                      8.1.0              pyhd8ed1ab_0    conda-forge
pyerfa                    2.0.1.1         py310h1f7b6fc_0    conda-forge
pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
pylibcugraph              23.12.00        cuda11_py310_231206_g1309813f_0    rapidsai
pylibraft                 23.12.00        cuda11_py310_231206_g9e2d6277_0    rapidsai
pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
pyparsing                 3.1.1              pyhd8ed1ab_0    conda-forge
pyppeteer                 1.0.2              pyhd8ed1ab_0    conda-forge
pyproj                    3.6.1           py310h32c33b7_4    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.10.13         hd12c33a_1_cpython    conda-forge
python-confluent-kafka    1.9.2           py310h5764c6d_2    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.7              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    4_cp310    conda-forge
pytz                      2023.3.post1       pyhd8ed1ab_0    conda-forge
pyviz_comms               3.0.0              pyhd8ed1ab_0    conda-forge
pywavelets                1.4.1           py310h1f7b6fc_1    conda-forge
pyyaml                    6.0.1           py310h2372a71_1    conda-forge
pyzmq                     25.1.2          py310h795f18f_0    conda-forge
raft-dask                 23.12.00        cuda11_py310_231206_g9e2d6277_0    rapidsai
rapids                    23.12.00        cuda11_py310_231206_g1d8bed4_0    rapidsai
rapids-dask-dependency    23.12.01                      0    rapidsai
rapids-xgboost            23.12.00        cuda11_py310_231206_g1d8bed4_0    rapidsai
rav1e                     0.6.6                he8a937b_2    conda-forge
rdma-core                 49.0                 hd3aeb46_2    conda-forge
re2                       2023.06.02           h2873b5e_0    conda-forge
readline                  8.2                  h8228510_1    conda-forge
referencing               0.32.0             pyhd8ed1ab_0    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
rfc3339-validator         0.1.4              pyhd8ed1ab_0    conda-forge
rfc3986-validator         0.1.1              pyh9f0ad1d_0    conda-forge
rich                      13.7.0             pyhd8ed1ab_0    conda-forge
rmm                       23.12.00        cuda11_py310_231206_g2db5cbb3_0    rapidsai
rpds-py                   0.15.2          py310hcb5633a_0    conda-forge
rtree                     1.1.0           py310hbdcdc62_0    conda-forge
s2n                       1.4.1                h06160fa_0    conda-forge
scikit-image              0.21.0          py310hc6cd4ac_0    conda-forge
scikit-learn              1.3.2           py310h1fdf081_2    conda-forge
scipy                     1.11.4          py310hb13e2d6_0    conda-forge
selenium                  4.16.0                   pypi_0    pypi
send2trash                1.8.2              pyh41d4057_0    conda-forge
setuptools                68.2.2             pyhd8ed1ab_0    conda-forge
shapely                   2.0.2           py310hc3e127f_1    conda-forge
simpervisor               1.0.0              pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.1.10               h9fff704_0    conda-forge
sniffio                   1.3.0              pyhd8ed1ab_0    conda-forge
sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
soupsieve                 2.5                pyhd8ed1ab_1    conda-forge
spdlog                    1.11.0               h9b3ece8_1    conda-forge
sqlite                    3.44.2               h2c6b66d_0    conda-forge
streamz                   0.6.4              pyh6c4a22f_0    conda-forge
svt-av1                   1.8.0                h59595ed_0    conda-forge
tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
tda-api                   1.6.0                    pypi_0    pypi
terminado                 0.18.0             pyh0d859eb_0    conda-forge
threadpoolctl             3.2.0              pyha21a80b_0    conda-forge
tifffile                  2023.12.9          pyhd8ed1ab_0    conda-forge
tiledb                    2.18.3               hc1131af_1    conda-forge
tinycss2                  1.2.1              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tomli                     2.0.1                    pypi_0    pypi
toolz                     0.12.0             pyhd8ed1ab_0    conda-forge
tornado                   6.3.3           py310h2372a71_1    conda-forge
tqdm                      4.66.1             pyhd8ed1ab_0    conda-forge
traitlets                 5.14.0             pyhd8ed1ab_0    conda-forge
treelite                  3.9.1           py310h4a6579d_0    conda-forge
treelite-runtime          3.9.1                    pypi_0    pypi
trio                      0.23.2                   pypi_0    pypi
trio-websocket            0.11.1                   pypi_0    pypi
types-python-dateutil     2.8.19.14          pyhd8ed1ab_0    conda-forge
typing-extensions         4.9.0                hd8ed1ab_0    conda-forge
typing_extensions         4.9.0              pyha770c72_0    conda-forge
typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
tzcode                    2023d                h3f72095_0    conda-forge
tzdata                    2023d                h0c530f3_0    conda-forge
uc-micro-py               1.0.1              pyhd8ed1ab_0    conda-forge
ucx                       1.15.0               h75e419f_2    conda-forge
ucx-proc                  1.0.0                       gpu    rapidsai
ucx-py                    0.35.00         py310_231206_gb5f60ca_0    rapidsai
unicodedata2              15.1.0          py310h2372a71_0    conda-forge
uri-template              1.3.0              pyhd8ed1ab_0    conda-forge
uriparser                 0.9.7                hcb278e6_1    conda-forge
urllib3                   1.26.18            pyhd8ed1ab_0    conda-forge
wcwidth                   0.2.12                   pypi_0    pypi
webcolors                 1.13               pyhd8ed1ab_0    conda-forge
webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
websocket-client          1.7.0              pyhd8ed1ab_0    conda-forge
websockets                10.4            py310h5764c6d_1    conda-forge
wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
wsproto                   1.2.0                    pypi_0    pypi
xarray                    2023.12.0          pyhd8ed1ab_0    conda-forge
xerces-c                  3.2.5                hac6953d_0    conda-forge
xgboost                   1.7.6           rapidsai_py310h4c2db5f_7    rapidsai
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xyzservices               2023.10.1          pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
yarl                      1.9.3           py310h2372a71_0    conda-forge
zeromq                    4.3.5                h59595ed_0    conda-forge
zfp                       1.0.1                h59595ed_0    conda-forge
zict                      3.0.0              pyhd8ed1ab_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               hd590300_5    conda-forge
zlib-ng                   2.0.7                h0b41bf4_0    conda-forge
zstd                      1.5.5                hfc55251_0    conda-forge

Additional context
If you run Script B without cudf, the pickled dataframe loads correctly. Interestingly, if you run Script A with cudf, then Script B loads the file correctly. It seems that cudf uses its own file format, different from standard pandas?

@yazabaza added the Needs Triage and bug labels Dec 31, 2023
@shwina
Contributor

shwina commented Dec 31, 2023

I'd expect this behaviour: if you pickle with cudf.pandas, you must unpickle with cudf.pandas. If you pickle with regular pandas, you must unpickle with regular pandas.

It's not so much that the pickle format is different; it's that the DataFrame object looks different when you have cudf.pandas enabled versus without.
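
As a rough illustration of that point, here is a toy model using only the standard library. It is a sketch, not cudf's actual implementation: `PlainFrame`, `ProxyFrame`, and `AcceleratedUnpickler` are hypothetical stand-ins, and the assumption is simply that the proxy type stores its state as pickled bytes while a regular object stores a dict. Redirecting the class lookup during unpickling then produces the same kind of `TypeError` as in the report.

```python
import io
import pickle


class PlainFrame:
    """Hypothetical stand-in for a regular pandas DataFrame."""

    def __init__(self, data):
        self.data = data  # default pickling stores the instance __dict__


class ProxyFrame:
    """Hypothetical stand-in for a proxy type that wraps another object."""

    def __init__(self, wrapped=None):
        self._wrapped = wrapped

    def __getstate__(self):
        # Assumption for this sketch: the proxy's state is pickled bytes.
        return pickle.dumps(self._wrapped)

    def __setstate__(self, state):
        # Expects bytes; a dict written by PlainFrame fails here.
        self._wrapped = pickle.loads(state)


class AcceleratedUnpickler(pickle.Unpickler):
    """Mimics a class lookup being redirected to the proxy type."""

    def find_class(self, module, name):
        if name == "PlainFrame":
            return ProxyFrame
        return super().find_class(module, name)


# Pickled "without the accelerator": the stored state is a dict.
payload = pickle.dumps(PlainFrame({"a": [1, 2, 3]}))

# Unpickled "with the accelerator": the proxy receives the dict state.
try:
    AcceleratedUnpickler(io.BytesIO(payload)).load()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Pickled and unpickled through the proxy on both sides: this works.
roundtrip = pickle.loads(pickle.dumps(ProxyFrame(PlainFrame({"a": [1, 2, 3]}))))
```

In this toy model the bytes on disk are ordinary pickle data in both cases; what differs is which class receives the stored state on load.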

@yazabaza
Author

@shwina I can understand that if that's the case, but the docs emphasize how cudf.pandas is supposed to be 100% compatible with standard pandas. The docs should explain this incompatibility; I doubt I am the only one with big data files created in standard pandas.

@shwina
Contributor

shwina commented Dec 31, 2023

Thanks - yes, I agree that it would be helpful for the docs to clarify this.

@shwina
Contributor

shwina commented Dec 31, 2023

I opened #14693 to address the docs gap and would greatly appreciate it if you could take a look at the wording and suggest any necessary additions! Thanks again for reporting!

Also, below is a hack if you need to read large pickle files with cudf.pandas enabled. As a warning, the disable_module_accelerator functionality is undocumented and could change, and we don't recommend using it in other contexts:

%load_ext cudf.pandas

import pickle
import pandas as pd
from cudf.pandas.module_accelerator import disable_module_accelerator


with disable_module_accelerator():
    with open("test.pkl", "rb") as f:
        pandas_df = pickle.load(f)  # a "real" pandas DataFrame

df = pd.DataFrame(pandas_df)  # a cudf.pandas DataFrame

rapids-bot bot pushed a commit that referenced this issue Jan 12, 2024
Adds to the docs the unpickling expectations that were noted in #14692.

Authors:
  - Ashwin Srinath (https://github.com/shwina)

Approvers:
  - Matthew Roeschke (https://github.com/mroeschke)

URL: #14693
@notwopr

notwopr commented Jan 19, 2024

I have the same issue. I'm running WSL2 on Windows and using VSCode or the command line to run my Python script, so I can't use the Jupyter magic command %load_ext cudf.pandas.

So I must load cudf.pandas from the outset:
python3 -m cudf.pandas example.py

My code loads a dataframe that has been pickled. So, I get the same error as the OP's error:
File "/home/username/.local/lib/python3.10/site-packages/cudf/pandas/fast_slow_proxy.py", line 205, in __setstate__
unpickled_wrapped_obj = pickle.loads(state)
TypeError: a bytes-like object is required, not 'dict'

It'd be nice if there were a way to suspend cudf for just the pickle and unpickle commands.

@shwina
Contributor

shwina commented Jan 19, 2024

@notwopr does the snippet I posted above work for reading your pickle file? (you can skip the %load_ext if you are passing -m cudf.pandas on the command line)
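
For a plain script, the same workaround could be wrapped in a small helper along these lines. This is only a sketch: `load_plain_pickle` is a hypothetical name, the fallback to `contextlib.nullcontext` when the import fails is an assumption added so the helper also runs where cudf is not installed, and `disable_module_accelerator` remains the undocumented function mentioned earlier in this thread, which could change.

```python
import contextlib
import os
import pickle
import tempfile

try:
    # Undocumented cudf.pandas internal named in this thread; may move or change.
    from cudf.pandas.module_accelerator import disable_module_accelerator
except ImportError:
    # Assumption: if cudf is absent (or the import path changed), there is
    # nothing to disable, so fall back to a no-op context manager.
    disable_module_accelerator = contextlib.nullcontext


def load_plain_pickle(path):
    """Hypothetical helper: unpickle `path` with the accelerator switched off."""
    with disable_module_accelerator():
        with open(path, "rb") as f:
            return pickle.load(f)


# Small demo with a plain dict so the sketch runs without pandas or cudf.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.pkl")
    with open(path, "wb") as f:
        pickle.dump({"a": [1, 2, 3]}, f)
    loaded = load_plain_pickle(path)
```

In a script started with python3 -m cudf.pandas example.py, passing the result to pd.DataFrame(...) would then mirror the last line of the snippet above.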

@shwina changed the title from "[BUG]" to "[BUG] Can't read pickle file written using standard pandas when using cudf.pandas" Jan 19, 2024
@notwopr

notwopr commented Jan 19, 2024

Yes, the disable_module_accelerator() approach works. Thank you.

@bdice removed the Needs Triage label Mar 4, 2024
@bdice
Contributor

bdice commented Mar 4, 2024

Closing this issue -- #14693 clarified the documentation, and it seems all questions have been answered. Feel free to reopen if needed!

@bdice closed this as completed Mar 4, 2024