last left-on expression shows up as "literal" column in join output #9621

Closed
2 tasks done
mcrumiller opened this issue Jun 29, 2023 · 3 comments · Fixed by #17061
Assignees
Labels
accepted Ready for implementation bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@mcrumiller
Contributor

mcrumiller commented Jun 29, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

When a join uses expressions in left_on, the last expression is added to the output dataframe as a column named "literal". Because we are not allowed to alias these expressions, it doesn't make sense to include them in the final output. And because every such expression defaults to the name "literal", only one of them ends up in the dataframe, with no way to choose which.

Reproducible example

import polars as pl
from polars import col, when

df1 = pl.DataFrame({
    'a': [1, 2, 3, 4, 5]
})

df2 = pl.DataFrame({
    'b': [5, 4, 3, 2, 1]
})

df1.join(
    df2,
    left_on=when(False).then(0).otherwise(col('a')),
    right_on=when(False).then(0).otherwise(col('b')),
    how="left",
)

shape: (5, 3)
┌─────┬─────────┬─────┐
│ a   ┆ literal ┆ b   │
│ --- ┆ ---     ┆ --- │
│ i64 ┆ i64     ┆ i64 │
╞═════╪═════════╪═════╡
│ 1   ┆ 1       ┆ 1   │
│ 2   ┆ 2       ┆ 2   │
│ 3   ┆ 3       ┆ 3   │
│ 4   ┆ 4       ┆ 4   │
│ 5   ┆ 5       ┆ 5   │
└─────┴─────────┴─────┘

Expected behavior

The literal column should not be inserted into the resulting dataframe.

Installed versions

--------Version info---------
Polars:      0.18.1
Index type:  UInt32
Platform:    Windows-10-10.0.19045-SP0
Python:      3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)]

----Optional dependencies----
numpy:       1.24.3
pandas:      2.0.0
pyarrow:     11.0.0
connectorx:  0.3.2a3
deltalake:   <not installed>
fsspec:      <not installed>
matplotlib:  3.7.1
xlsx2csv:    0.8.1
xlsxwriter:  3.1.0
@mcrumiller mcrumiller added bug Something isn't working python Related to Python Polars labels Jun 29, 2023
@ritchie46
Member

We follow a simple rule for determining the name of the output expression (unless you provide an alias or keep_name): it takes the name of the left-most expression. Here that is a literal, hence the name.

@mcrumiller
Contributor Author

@ritchie46 -- my point is that this column shouldn't be in the output at all. Using an expression in a join means "join on this calculation", not necessarily "include the calculation in the output."

@ritchie46
Member

Oh, right! Now I see it ^^
