Regression 0.20.15->0.20.16: ComputeError: conversion from null to struct[100] failed in column 'literal' for 0 out of 1 values #15476

Closed
2 tasks done
antonioalegria opened this issue Apr 4, 2024 · 5 comments
Labels
bug Something isn't working P-medium Priority: medium python Related to Python Polars

Comments

@antonioalegria

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

def _unnest_list_columns(df, list_columns):
    new_columns = []
    for col in list_columns:
        new_column = pl.when((pl.col(col).is_not_null()) & (pl.col(col).list.len() > 0)).then(pl.col(col).list.to_struct("max_width", lambda x: f"{x}", 100)).otherwise(pl.lit(None)).alias(col)  # This doesn't work with empty lists
        new_columns.append(new_column)

    return df.with_columns(new_columns)

df1 = pl.DataFrame(
    {"a": [1, 2, 3],
     "b": [[{"a": 1}], [{"a": 1}, {"a": 2}], [{"a": 1}, {"a": 2}, {"a": 3}]]
     }
    )

print(_unnest_list_columns(df1, ["b"])) # ComputeError: conversion from `null` to `struct[100]` failed in column 'literal' for 0 out of 1 values

Log output

Traceback (most recent call last):
  File "/Users/antonioalegria/Developer/hyperml/x.py", line 17, in <module>
    _unnest_list_columns(df1, ["b"]) # Boom!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/antonioalegria/Developer/hyperml/x.py", line 9, in _unnest_list_columns
    return df.with_columns(new_columns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/dataframe/frame.py", line 8366, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 1943, in collect
    return wrap_df(ldf.collect())
                   ^^^^^^^^^^^^^
polars.exceptions.ComputeError: conversion from `null` to `struct[100]` failed in column 'literal' for 0 out of 1 values: []

Issue description

In 0.20.15 this ran without any issues; in 0.20.16 it raises this exception.

Expected behavior

Unless I need to migrate some code, it should run as in 0.20.15 and print the following:

shape: (3, 2)
┌─────┬─────────────────────┐
│ a   ┆ b                   │
│ --- ┆ ---                 │
│ i64 ┆ struct[3]           │
╞═════╪═════════════════════╡
│ 1   ┆ {{1},{null},{null}} │
│ 2   ┆ {{1},{2},{null}}    │
│ 3   ┆ {{1},{2},{3}}       │
└─────┴─────────────────────┘

Installed versions

--------Version info---------
Polars:               0.20.16
Index type:           UInt32
Platform:             macOS-14.3.1-arm64-arm-64bit
Python:               3.11.6 (main, Oct  2 2023, 20:46:14) [Clang 14.0.3 (clang-1403.0.22.14.1)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          2.2.1
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               2023.6.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.7.1
numpy:                1.24.3
openpyxl:             3.1.2
pandas:               1.5.3
pyarrow:              12.0.1
pydantic:             1.10.9
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.18
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@antonioalegria antonioalegria added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Apr 4, 2024
@cmdlineluser
Contributor

Can reproduce the error.

On 0.20.15 I get this:

df1 = pl.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [[{"a": 1}], [{"a": 1}, {"a": 2}], [{"a": 1}, {"a": 2}, {"a": 3}], [], None]
})

pl.__version__

df1.with_columns(
    pl.when((pl.col("b").is_not_null()) & (pl.col("b").list.len() > 0))
      .then(pl.col("b").list.to_struct("max_width", lambda x: f"{x}", 100))
)

# '0.20.15'
# shape: (5, 2)
# ┌─────┬────────────────────────┐
# │ a   ┆ b                      │
# │ --- ┆ ---                    │
# │ i64 ┆ struct[3]              │
# ╞═════╪════════════════════════╡
# │ 1   ┆ {{1},{null},{null}}    │
# │ 2   ┆ {{1},{2},{null}}       │
# │ 3   ┆ {{1},{2},{3}}          │
# │ 4   ┆ {{null},{null},{null}} │
# │ 5   ┆ {{null},{null},{null}} │
# └─────┴────────────────────────┘

Does the .when() actually do anything in this case?

df1.with_columns(
    pl.col("b").list.to_struct("max_width", lambda x: f"{x}", 100)
)

# shape: (5, 2)
# ┌─────┬────────────────────────┐
# │ a   ┆ b                      │
# │ --- ┆ ---                    │
# │ i64 ┆ struct[3]              │
# ╞═════╪════════════════════════╡
# │ 1   ┆ {{1},{null},{null}}    │
# │ 2   ┆ {{1},{2},{null}}       │
# │ 3   ┆ {{1},{2},{3}}          │
# │ 4   ┆ {{null},{null},{null}} │
# │ 5   ┆ {{null},{null},{null}} │
# └─────┴────────────────────────┘

@reswqa reswqa added P-medium Priority: medium and removed needs triage Awaiting prioritization by a maintainer labels Apr 8, 2024
@reswqa reswqa self-assigned this Apr 8, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Apr 8, 2024
@reswqa
Collaborator

reswqa commented Apr 8, 2024

Thanks @antonioalegria and @cmdlineluser. This has probably been an issue for some time, but it was only revealed in 0.20.16, when type_coercion for when-then-otherwise was changed to strict_cast. But yes, we should fix this.

@reswqa
Collaborator

reswqa commented Apr 9, 2024

After some discussion, I think this should be fixed once we enable outer validity for StructChunked (see #3462).

Until then, you may need to set type_coercion=False as a workaround.

@reswqa reswqa removed their assignment Apr 9, 2024
@antonioalegria
Author

Where should I set type_coercion=False?

@coastalwhite
Collaborator

This no longer reproduces. Closing.

@github-project-automation github-project-automation bot moved this from Ready to Done in Backlog Dec 3, 2024