Regression with v0.5.13 introducing StringArray #172

he-rvb · 2023-04-25T13:38:13Z

Describe the bug

Starting with 0.5.13, string columns are returned as pandas StringArray, but this dtype is still experimental and not well supported by other libraries.
As a result, exporting a pandas DataFrame with to_hdf leads to the following error:
TypeError: objects of type ``StringArray`` are not supported in this context, sorry; supported objects are: NumPy array, record or scalar; homogeneous list or tuple, integer, float, complex or bytes

Steps to reproduce

Install clickhouse-connect 0.5.13 or a more recent version
Run the code example below

Code example

import clickhouse_connect

with clickhouse_connect.get_client(
    host="play.clickhouse.com", port=443, username="play"
) as client:
    df = client.query_df(query="SELECT 'TEST' as test")

print(df.dtypes)

df.to_hdf("./test.hdf", "df")

Expected behaviour

The export should not raise an exception, and df.dtypes should probably return

test    object
dtype: object

instead of

test    string
dtype: object

Configuration

Environment

- Python version: 3.10.10
- clickhouse-connect version: 0.5.13
- pandas version: 1.5.3
- tables version: 3.8.0
- Operating system: Linux


genzgd · 2023-04-25T14:13:39Z

It seems like this should perhaps be an option. Also, is there an easy workaround, such as updating the dtype before calling to_hdf?
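For reference, the dtype workaround could look roughly like the sketch below. It uses only pandas: the helper name `strings_to_object` is hypothetical (not part of clickhouse-connect), and the one-row frame built with `pd.array(..., dtype="string")` is an assumed stand-in for what `query_df` returned in 0.5.13. Missing values would remain `pd.NA` inside the object column, so this only covers the simple case from the report.

```python
import pandas as pd


def strings_to_object(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with pandas "string" columns cast to plain object.

    Hypothetical helper: StringDtype is the extension dtype behind StringArray.
    """
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.StringDtype):
            # Cast to the NumPy object dtype that PyTables knows how to store
            out[col] = out[col].astype(object)
    return out


# Assumed stand-in for the frame returned by query_df in 0.5.13
df = pd.DataFrame({"test": pd.array(["TEST"], dtype="string")})
fixed = strings_to_object(df)
print(fixed.dtypes)
```

The input frame is left untouched; only the copy passed to `to_hdf` changes dtype.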

genzgd · 2023-04-25T14:29:49Z

There's also a query option designed to disable "advanced" pandas types, which solves the problem. Please try:

df = client.query_df(query="SELECT 'TEST' as test", use_na_values=False)

he-rvb · 2023-04-25T15:12:36Z

Yes, changing the types of some columns before calling to_hdf was my first thought, but I felt it was important to let you know in case other users are impacted by this change.
Thanks for the quick answer and the workaround; using use_na_values=False seems to solve the issue more cleanly.

genzgd · 2023-04-25T15:19:31Z

Glad to hear it. I think that option should reduce the dtypes used to the basic NumPy types (plus pandas Timestamp), so it probably should have been named "use_advanced_dtypes" or something along those lines.
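To check whether a returned frame really is "basic" in this sense, one option is to list the columns whose dtype is not a plain `numpy.dtype`. The sketch below assumes that test is adequate: the helper name `extension_dtype_columns` is hypothetical, and naive Timestamp columns pass because pandas stores them as NumPy `datetime64[ns]`.

```python
import numpy as np
import pandas as pd


def extension_dtype_columns(df: pd.DataFrame) -> list:
    """Hypothetical helper: names of columns backed by a pandas extension
    dtype (StringDtype, Int64Dtype, ...) rather than a plain NumPy dtype."""
    return [col for col in df.columns if not isinstance(df[col].dtype, np.dtype)]


df = pd.DataFrame(
    {
        "s": pd.array(["a", "b"], dtype="string"),   # extension dtype
        "n": pd.array([1, None], dtype="Int64"),     # extension dtype
        "f": [1.5, 2.5],                             # float64 (NumPy)
        "ts": pd.to_datetime(["2023-04-25", "2023-04-26"]),  # datetime64[ns] (NumPy)
    }
)
print(extension_dtype_columns(df))
```

An empty list suggests the frame contains only NumPy dtypes.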

he-rvb · 2023-04-25T16:00:08Z

I agree that an option to disable or enable experimental dtypes might be useful.
However, that is not exactly what use_na_values does. For example, it is possible to get a similar error with the experimental IntegerArray even with use_na_values=False, as in the following example:

import clickhouse_connect

with clickhouse_connect.get_client(
    host="play.clickhouse.com", port=443, username="play"
) as client:
    df = client.query_df(query="SELECT 1 as test UNION ALL SELECT NULL", use_na_values=False)

print(df.dtypes)

df.to_hdf("./test.hdf", "df")
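Until a release covers this case, a pandas-only workaround is to convert masked integer columns before calling `to_hdf`. The sketch below assumes the column comes back as a nullable integer extension dtype (modelled here as `Int64`) and uses a hypothetical helper name. It casts to `float64` so that NULL becomes `NaN`; note that this differs from the object-array approach discussed in the thread and loses exactness for integers above 2**53.

```python
import numpy as np
import pandas as pd


def nullable_ints_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast masked integer columns (Int8 ... UInt64)
    to float64 so missing values become NaN instead of pd.NA."""
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Nullable integer dtypes report as integer but are not np.dtype instances
        if pd.api.types.is_integer_dtype(dtype) and not isinstance(dtype, np.dtype):
            out[col] = out[col].to_numpy(dtype="float64", na_value=np.nan)
    return out


# Assumed stand-in for "SELECT 1 as test UNION ALL SELECT NULL"
df = pd.DataFrame({"test": pd.array([1, None], dtype="Int64")})
fixed = nullable_ints_to_float(df)
print(fixed.dtypes)
```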

genzgd · 2023-04-25T16:01:39Z

I'll take a look at that; it's probably fairly easy to make the same option return an object array for numeric columns containing NULLs.

genzgd · 2023-04-25T16:40:09Z

Yes, it's an easy change, and I think it's more consistent to avoid all non-NumPy dtypes if that flag is set. It will be fixed in the next release (tentatively scheduled for next week).

genzgd · 2023-04-26T12:48:06Z

Renamed the flag to use_extended_dtypes in the new release 0.5.21. Setting it to False on query_df should return "basic" dataframes.

he-rvb · 2023-04-28T13:19:26Z

I tried this version, and calling query_df with use_extended_dtypes=False indeed works as expected.
Thank you.

genzgd · 2023-04-28T13:43:21Z

Thanks for testing and for reporting the result. Feedback is always much appreciated!

he-rvb added the bug label on Apr 25, 2023

genzgd added the enhancement label and removed the bug label on Apr 25, 2023

genzgd added the bug label and removed the enhancement label on Apr 25, 2023

genzgd closed this as completed Apr 26, 2023
