
fill_null only fills Int32 columns when filling with an untyped integer literal #19651

Closed
2 tasks done
adamreeve opened this issue Nov 5, 2024 · 1 comment · Fixed by #19656
Labels
accepted Ready for implementation bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars support

Comments

@adamreeve
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({
    'a': [1, None],
    'b': [None, 2],
})

# Returns a DataFrame that still contains nulls:
print(df.fill_null(pl.lit(0)))

# Works and fills with zero:
print(df.fill_null(pl.lit(0, dtype=pl.Int64)))

Log output

shape: (2, 2)
┌──────┬──────┐
│ a    ┆ b    │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 1    ┆ null │
│ null ┆ 2    │
└──────┴──────┘
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 0   │
│ 0   ┆ 2   │
└─────┴─────┘

Issue description

Expected behavior

If this is unsupported, an error could be raised instead, but it seems reasonable for this to work and fill the nulls with zeros.

Installed versions

--------Version info---------
Polars:              1.12.0
Index type:          UInt32
Platform:            Linux-6.11.5-200.fc40.x86_64-x86_64-with-glibc2.39
Python:              3.12.7 (main, Oct  1 2024, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.6.1
gevent               <not installed>
great_tables         <not installed>
matplotlib           3.8.2
nest_asyncio         1.5.8
numpy                1.26.3
openpyxl             <not installed>
pandas               2.2.0
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                2.4.1+cu121
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@adamreeve adamreeve added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Nov 5, 2024
@adamreeve
Contributor Author

adamreeve commented Nov 5, 2024

It looks like the literal is treated as an Int32 and only columns with a matching type are filled:

df = pl.DataFrame({
    'a': pl.Series([1, None], dtype=pl.Int32),
    'b': pl.Series([None, 2], dtype=pl.UInt32),
    'c': pl.Series([None, 2], dtype=pl.Int64),
})
print(df.fill_null(pl.lit(0)))
print(df.fill_null(pl.lit(0, dtype=pl.Int64)))

I had thought that `matches_supertype=True` would mean that an Int32 literal would also fill an Int64 column, but maybe I've misunderstood what that parameter does?

shape: (2, 3)
┌─────┬──────┬──────┐
│ a   ┆ b    ┆ c    │
│ --- ┆ ---  ┆ ---  │
│ i32 ┆ u32  ┆ i64  │
╞═════╪══════╪══════╡
│ 1   ┆ null ┆ null │
│ 0   ┆ 2    ┆ 2    │
└─────┴──────┴──────┘
shape: (2, 3)
┌──────┬──────┬─────┐
│ a    ┆ b    ┆ c   │
│ ---  ┆ ---  ┆ --- │
│ i32  ┆ u32  ┆ i64 │
╞══════╪══════╪═════╡
│ 1    ┆ null ┆ 0   │
│ null ┆ 2    ┆ 2   │
└──────┴──────┴─────┘
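The two outputs above are consistent with a typed literal being matched by exact dtype. To contrast that with what was expected from `matches_supertype=True`, here is a toy model in plain Python. The helpers `can_widen` and `columns_to_fill` are hypothetical, not part of Polars, and this is not how Polars implements `fill_null`; the model only compares exact-dtype selection with selecting every column whose dtype can losslessly hold the fill value's dtype:

```python
# Toy model only: this is NOT how Polars implements fill_null. It contrasts
# two hypothetical rules for choosing which columns a typed fill value applies to.

# Integer dtypes as (is_signed, bit_width), named after the Polars dtypes.
INT_DTYPES = {
    "Int8": (True, 8), "Int16": (True, 16), "Int32": (True, 32), "Int64": (True, 64),
    "UInt8": (False, 8), "UInt16": (False, 16), "UInt32": (False, 32), "UInt64": (False, 64),
}


def can_widen(src: str, dst: str) -> bool:
    """True if every value of integer dtype `src` fits in integer dtype `dst`."""
    src_signed, src_bits = INT_DTYPES[src]
    dst_signed, dst_bits = INT_DTYPES[dst]
    if src_signed == dst_signed:
        return dst_bits >= src_bits
    if dst_signed:
        # Unsigned -> signed needs strictly more bits (e.g. UInt32 -> Int64).
        return dst_bits > src_bits
    # Signed -> unsigned can never hold negative values.
    return False


def columns_to_fill(schema: dict[str, str], fill_dtype: str, allow_widening: bool) -> list[str]:
    """Hypothetical helper: names of the columns a fill value of `fill_dtype` would fill."""
    if allow_widening:
        # Column dtype is a supertype of the fill dtype, so the fill value widens losslessly.
        return [name for name, dtype in schema.items() if can_widen(fill_dtype, dtype)]
    # Exact dtype match only.
    return [name for name, dtype in schema.items() if dtype == fill_dtype]


schema = {"a": "Int32", "b": "UInt32", "c": "Int64"}
print(columns_to_fill(schema, "Int32", allow_widening=False))  # ['a']
print(columns_to_fill(schema, "Int64", allow_widening=False))  # ['c']
print(columns_to_fill(schema, "Int32", allow_widening=True))   # ['a', 'c']
```

In this model, the widening rule lets an Int32 fill value reach the Int64 column `c`, but still not the UInt32 column `b`, because UInt32 cannot represent negative Int32 values.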

Passing a Python int as the fill value seems to handle any integer-typed column, though:

print(df.fill_null(0))
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i32 ┆ u32 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 0   ┆ 0   │
│ 0   ┆ 2   ┆ 2   │
└─────┴─────┴─────┘
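The output above fills the Int32, UInt32 and Int64 columns alike. One way to picture that difference from the typed-literal case is to check the untyped value against each column's range instead of requiring a single inferred dtype. This is an assumption for illustration only: the helper `fill_null_untyped_int` and the dict-of-lists columns are hypothetical, and this is not a description of what Polars does internally:

```python
# Toy model only (not Polars internals): an untyped Python int is tested
# against each integer column's value range instead of a single inferred dtype.

INT_RANGES = {
    "Int32": (-(2**31), 2**31 - 1),
    "UInt32": (0, 2**32 - 1),
    "Int64": (-(2**63), 2**63 - 1),
}


def fill_null_untyped_int(columns: dict[str, list], schema: dict[str, str], value: int) -> dict[str, list]:
    """Hypothetical helper: replace None with `value` in every integer column
    whose dtype range contains `value`; other columns are copied unchanged."""
    out = {}
    for name, values in columns.items():
        bounds = INT_RANGES.get(schema[name])
        if bounds is not None and bounds[0] <= value <= bounds[1]:
            out[name] = [value if v is None else v for v in values]
        else:
            out[name] = list(values)
    return out


columns = {"a": [1, None], "b": [None, 2], "c": [None, 2]}
schema = {"a": "Int32", "b": "UInt32", "c": "Int64"}
print(fill_null_untyped_int(columns, schema, 0))
# {'a': [1, 0], 'b': [0, 2], 'c': [0, 2]}
```

Checking the value rather than a dtype is what lets `0` qualify for the unsigned column `b` in this model, reproducing the filled values in the table above.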

@adamreeve adamreeve changed the title from "fill_null does nothing when filling with an untyped literal" to "fill_null only fills Int32 columns when filling with an untyped integer literal" Nov 5, 2024
@c-peters c-peters added the accepted Ready for implementation label Nov 11, 2024
@c-peters c-peters added this to Backlog Nov 11, 2024
@c-peters c-peters moved this to Done in Backlog Nov 11, 2024