Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different treatment of schema on pl.concat for lazy and non-lazy frames in 1.18.0 #20501

Closed
2 tasks done
SebStrug opened this issue Dec 30, 2024 · 4 comments · Fixed by #20523
Closed
2 tasks done

Different treatment of schema on pl.concat for lazy and non-lazy frames in 1.18.0 #20501

SebStrug opened this issue Dec 30, 2024 · 4 comments · Fixed by #20523
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@SebStrug
Copy link

SebStrug commented Dec 30, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

a = pl.DataFrame({'id': [1], 'value': ['foo']})
b = pl.DataFrame({'id': [2], 'value': [None]})

pl.concat([a,b]) # succeeds
pl.concat([a.lazy(), b.lazy()]).collect() # errors

Log output

Traceback (most recent call last):
  File "/tmp/tmp.py", line 6, in <module>
    pl.concat([a.lazy(), b.lazy()]).collect()
  File "/Users/sebstrug/Library/Caches/uv/archive-v0/uKT2bjXk4fKatP8VuhFhL/lib/python3.12/site-packages/polars/lazyframe/frame.py", line 2043, in collect
    return wrap_df(ldf.collect(callback))
                   ^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.InvalidOperationError: 'union'/'concat' inputs should all have the same schema,got
Schema:
name: id, field: Int64
name: value, field: String
 and
Schema:
name: id, field: Int64
name: value, field: Null


Resolved plan until failure:

	---> FAILED HERE RESOLVING THIS_NODE <---
DF ["id", "value"]; PROJECT */2 COLUMNS; SELECTION: None

Issue description

Both concat with lazy and non-lazy worked with polars 1.17

Expected behavior

I would expect concat to work the same way with lazy and non-lazy frames

Installed versions

--------Version info---------
Polars:              1.18.0
Index type:          UInt32
Platform:            macOS-14.5-arm64-arm-64bit
Python:              3.12.5 (main, Aug 14 2024, 04:32:18) [Clang 18.1.8 ]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              <not installed>
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>```

</details>
@SebStrug SebStrug added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Dec 30, 2024
@SebStrug
Copy link
Author

SebStrug commented Dec 30, 2024

Is this actually the new desired behaviour? I didn't see this in the release notes https://github.com/pola-rs/polars/releases/tag/py-1.18.0

Passing argument how="vertical_relaxed" works

@SebStrug SebStrug changed the title Different treatment of schema on pl.concat for lazy and non-lazy frames in 1.8.0 Different treatment of schema on pl.concat for lazy and non-lazy frames in 1.18.0 Dec 30, 2024
@SebStrug
Copy link
Author

Updated title to refer to latest version 1.18 and updated issue description to note this worked in 1.17

@mcrumiller
Copy link
Contributor

Is this actually the new desired behaviour? I didn't see this in the release notes https://github.com/pola-rs/polars/releases/tag/py-1.18.0

The change occured in #20406. Concatenation of dataframes goes through a completely different route, which allows nulls to be casted: https://github.com/pola-rs/polars/blob/main/crates/polars-core/src/datatypes/dtype.rs#L872-L905. I think dsl_to_ir.rs should maybe include that capability as well for lazy. @ritchie46?

@ritchie46
Copy link
Member

ritchie46 commented Jan 2, 2025

Hmm... Either we always need to propagate casts for null, or we must always error in this case.

Note that swapping the arguments in eager, does lead to an error. Which would be resolved in vertical_relaxed.

Edit: come to think of it, a one way direction cast of nulls is allowed IMO as it doesn't change the schema of the original first argument DataFrame

EDIT: We can leave current behavior, where we respect the first argument as the output schema.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants