Insert from dataframe to Nullable(Float) column depends on how values are ordered #414

Closed
konst-ivanov opened this issue Oct 23, 2024 · 1 comment · Fixed by #415
Describe the bug

I am trying to insert an object column from a DataFrame into a Nullable(Float64) column in ClickHouse. The values in the DataFrame are either str values convertible to float or None. The insert succeeds when the first value in the block is a str, but fails when it is None (in the example below, float_col_1 succeeds and float_col_2 fails).

What I have found:
  • here, None values are replaced with 0
  • in the next step, here, if None is the first value in the block and has already become 0 (an int), none of the str values are converted, which leads to the error

N.B.: the problem does not occur with integers given as strings, because they are converted to int here
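
The order dependence described above can be reproduced without a server. The following is a minimal sketch, not the actual clickhouse-connect code: the function name is hypothetical, and the "convert only if the first value is a str" check is an assumption based on the findings listed above. Only the standard library struct module is used.

```python
import struct


def pack_float64_first_value_sniffing(column):
    """Hypothetical stand-in for the behaviour described in this issue."""
    # Step 1: None values are replaced with 0 (an int).
    column = [0 if v is None else v for v in column]
    # Step 2 (assumed): strings are converted only when the FIRST value is a str.
    if column and isinstance(column[0], str):
        column = [float(v) for v in column]
    # Step 3: pack as little-endian float64; a leftover str raises struct.error.
    return struct.pack(f"<{len(column)}d", *column)


# First value is a str: every value is converted and packing succeeds.
ok = pack_float64_first_value_sniffing(["1.0", None])

# First value is None: it becomes 0, the str is never converted, packing fails.
try:
    pack_float64_first_value_sniffing([None, "1.0"])
    failed = False
except struct.error:
    failed = True
```

With the values from the code example below, the float_col_1 ordering takes the first path and the float_col_2 ordering takes the second.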

Steps to reproduce

  1. run clickhouse server
  2. run code example

Expected behaviour

The same result no matter how the values are ordered.

Code example

import pandas as pd
from clickhouse_connect import get_client


def main():
    client = get_client(host="localhost", database="default")
    client.command("DROP TABLE IF EXISTS insert_df_test")
    client.command(
        """CREATE TABLE insert_df_test
        (
            row_id UInt64,
            float_col_1 Nullable(Float64),
            float_col_2 Nullable(Float64)
        )
        ENGINE MergeTree
        ORDER BY row_id
        """
    )
    df = pd.DataFrame(
        [[0, '1.0', None], [1, None, '1.0']],
        columns=["row_id", "float_col_1", "float_col_2"],
    )
    client.insert_df("insert_df_test", df)


if __name__ == '__main__':
    main()

clickhouse-connect and/or ClickHouse server logs

Error traceback
Error serializing column `float_col_2` into data type `Nullable(Float64)`
Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\driver\common.py", line 57, in write_array
    dest += buff.pack(*column)
            ^^^^^^^^^^^^^^^^^^
struct.error: required argument is not a float

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\driver\transform.py", line 104, in chunk_gen
    col_type.write_column(data, output, context)
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\datatypes\base.py", line 209, in write_column
    self.write_column_data(column, dest, ctx)
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\datatypes\base.py", line 224, in write_column_data
    self._write_column_binary(column, dest, ctx)
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\datatypes\base.py", line 341, in _write_column_binary
    write_array(self._array_type, column, dest, ctx.column_name)
  File "C:\Users\user\PycharmProjects\clicktest\venv\Lib\site-packages\clickhouse_connect\driver\common.py", line 62, in write_array
    raise DataError(f'Unable to create Python array{col_msg}.  This is usually caused by trying to insert None ' +
clickhouse_connect.driver.exceptions.DataError: Unable to create Python array for source column `float_col_2`.  This is usually caused by trying to insert None values into a ClickHouse column that is not Nullable

Configuration

Environment

  • Python version: 3.11.8
  • clickhouse-connect version: 0.8.3
  • pandas version: 2.2.3
  • Operating system: Windows

ClickHouse server

  • ClickHouse Server version: 24.9.2.42

konst-ivanov added the bug label Oct 23, 2024

genzgd (Collaborator) commented Oct 23, 2024

Thanks for the detailed investigation. I think the float or int conversion needs to be moved up into the _write_column_binary method of each relevant data type.
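
As an illustration of the idea in this comment, the sketch below converts every value at write time instead of inspecting the first one. It is not the fix merged in #415 and not the library's real _write_column_binary signature; the function name and the choice of 0.0 as the placeholder for None are assumptions made for the example.

```python
import struct


def pack_float64_any_order(column):
    """Hypothetical order-independent conversion for a nullable float column."""
    # Convert each value individually: None becomes a 0.0 placeholder,
    # and str or int values are passed through float().
    values = [0.0 if v is None else float(v) for v in column]
    return struct.pack(f"<{len(values)}d", *values)


first_is_str = pack_float64_any_order(["1.0", None])
first_is_none = pack_float64_any_order([None, "1.0"])
```

Because the conversion no longer depends on the type of the first element, both orderings from the original report pack without error.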
