[BUG] Can't read pickle file written using standard pandas when using cudf.pandas
#14692
Comments
I'd expect this behaviour: if you pickle with It's not so much that the pickle format is different, it's that the
@shwina I can understand that if that's the case, but the docs emphasize how cudf.pandas is supposed to be 100% compatible with standard pandas. The docs should explain this incompatibility; I doubt I am the only one with big data files created in standard pandas.
Thanks - yes, I agree that it would be helpful for the docs to clarify this.
I opened #14693 to address the docs gap and would greatly appreciate it if you could take a look at the wording and suggest any necessary additions! Thanks again for reporting! Also, below is a hack if you need to read large pickle files in with %load_ext cudf.pandas
import pickle
import pandas as pd
from cudf.pandas.module_accelerator import disable_module_accelerator

with disable_module_accelerator():
    with open("test.pkl", "rb") as f:
        pandas_df = pickle.load(f)  # a "real" pandas DataFrame
df = pd.DataFrame(pandas_df)  # a cudf.pandas DataFrame
Adds to the docs the unpickling expectations that were noted in #14692.
Authors:
- Ashwin Srinath (https://github.com/shwina)
Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
URL: #14693
I have the same issue. I'm running WSL2 on Windows and using VSCode or the command line to run my Python script, so I can't use the Jupyter magic command %load_ext cudf.pandas. Instead, I must load cudf.pandas from the outset: My code loads a dataframe that has been pickled, so I get the same error as the OP: It'd be nice if there were a way to suspend cudf for just the pickle and unpickle commands.
@notwopr does the snippet I posted above work for reading your pickle file? (you can skip the |
cudf.pandas
Yes, the disable_module_accelerator() method works. Thank you.
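For scripts run from the command line rather than Jupyter, the workaround posted above could be wrapped in a small helper. This is only a minimal sketch, not part of cudf: the name load_plain_pickle is hypothetical, and the fallback to contextlib.nullcontext when cudf cannot be imported is an assumption added so the same script also runs under standard pandas.

```python
import contextlib
import pickle

try:
    # Same import as in the workaround posted earlier in this thread.
    from cudf.pandas.module_accelerator import disable_module_accelerator
except Exception:
    # Assumption: if cudf is missing or unusable (e.g. no GPU), there is
    # nothing to disable, so fall back to a do-nothing context manager.
    disable_module_accelerator = contextlib.nullcontext


def load_plain_pickle(path):
    """Unpickle `path` with the cudf.pandas accelerator switched off.

    Hypothetical helper: returns whatever object the file contains,
    e.g. a "real" pandas DataFrame written by standard pandas.
    """
    with disable_module_accelerator():
        with open(path, "rb") as f:
            return pickle.load(f)
```

The returned object can then be passed to pd.DataFrame(...) as in the earlier snippet if a cudf.pandas DataFrame is wanted.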
Closing this issue -- #14693 clarified the documentation, and it seems all questions have been answered. Feel free to reopen if needed!
Describe the bug
If, in standard pandas, you create a dataframe and save it to a pickle file, and then try to load that pickled dataframe in cudf.pandas, it crashes with this report:
Steps/Code to reproduce bug
Run this Script A in standard pandas via e.g. python script_a.py:
Next, run this Script B in cudf.pandas via python -m cudf.pandas script_b.py:
Expected behavior
I expect that Script B should load the dataframe from the pickle file.
Environment overview (please complete the following information)
Environment details
Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details
I do not know where the above script is located. Attempting to run the above command yields:
Conda env packages:
Additional context
If you run script B without cudf then the pickled dataframe loads correctly. Interestingly, if you run script A with cudf, then script B loads the file correctly. It seems that cudf is using its own file format, different than standard pandas?
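As general background on pickle (not a confirmed diagnosis of this bug): a pickle file records the module path and name of every class it needs, and the unpickler imports those names at load time. One way to see what a given file depends on is to record those lookups. The sketch below uses only the standard library; RecordingUnpickler and classes_in_pickle are hypothetical names. It really does unpickle the data, so it should only be run on trusted files.

```python
import io
import pickle


class RecordingUnpickler(pickle.Unpickler):
    """Unpickler that remembers every (module, name) it is asked to import."""

    def __init__(self, file):
        super().__init__(file)
        self.seen = []

    def find_class(self, module, name):
        # Called by pickle whenever the stream references a class or function.
        self.seen.append((module, name))
        return super().find_class(module, name)


def classes_in_pickle(data):
    """Load pickle bytes and return the list of (module, name) lookups made."""
    unpickler = RecordingUnpickler(io.BytesIO(data))
    unpickler.load()
    return unpickler.seen
```

For a file on disk, something like classes_in_pickle(open("test.pkl", "rb").read()) could be compared between a pickle written by standard pandas and one written under cudf.pandas.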