Concatenation of array columns, similar to concat_list #18090

Closed
adamreeve opened this issue Aug 7, 2024 · 6 comments · Fixed by #19881
Labels: accepted (Ready for implementation), enhancement (New feature or an improvement of an existing feature)

Comments

@adamreeve
Contributor

Description

Polars allows concatenation of List-typed columns with pl.concat_list. It would be useful to also allow concatenation of Array-typed columns.

E.g.:

import polars as pl

df = pl.DataFrame([
    pl.Series('x', [[0, 1], None, [2, 3]], dtype=pl.Array(pl.Int64, 2)),
    pl.Series('y', [[4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=pl.Array(pl.Int64, 3)),
])

df.with_columns(z=pl.concat_array('x', 'y'))

This should produce a new column equivalent to:

pl.Series('z', [[0, 1, 4, 5, 6], None, [2, 3, 10, 11, 12]], dtype=pl.Array(pl.Int64, 5))
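
The row-wise behaviour requested here can be sketched in plain Python, independent of Polars. This is only a reference model under stated assumptions: the helper name `concat_array_rows` is hypothetical, and the rule that a null in any input row makes the output row null is inferred from the expected output above.

```python
from typing import List, Optional, Sequence

Row = Optional[Sequence[int]]


def concat_array_rows(*columns: Sequence[Row]) -> List[Optional[List[int]]]:
    """Hypothetical reference model of row-wise array concatenation."""
    # All input columns must have the same number of rows.
    if len({len(col) for col in columns}) > 1:
        raise ValueError("all columns must have the same length")
    out: List[Optional[List[int]]] = []
    for rows in zip(*columns):
        if any(row is None for row in rows):
            # Assumption: a null in any input propagates to the output row.
            out.append(None)
        else:
            # Join the fixed-width rows end to end, left to right.
            out.append([value for row in rows for value in row])
    return out


x = [[0, 1], None, [2, 3]]                     # width 2
y = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]       # width 3
z = concat_array_rows(x, y)                    # width 2 + 3 = 5
```

The output width is the sum of the input widths, which is why the result column above has dtype pl.Array(pl.Int64, 5).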
adamreeve added the enhancement label on Aug 7, 2024
@m00ngoose

m00ngoose commented Aug 8, 2024

concat_list doesn't do what you think it does! It constructs a new list column where the entries of the list are the input exprs. I would like to do the same, but for arrays. E.g.:

import polars as pl

df = pl.DataFrame(
    {
        'a': [1,2,3],
        'b': [4,5,6],
    }
)
df.select(
    pl.concat_list(pl.col('a'), pl.col('b')), 
    pl.Series(df.select('a', 'b').to_numpy(), dtype=pl.Array(pl.Int64, 2)),  # this should be just pl.concat_array(pl.col('a'), pl.col('b'))
)

@adamreeve
Contributor Author

concat_list does do what I think it does and also what you think it does 😉 https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.concat_list.html

concat_array should probably work similarly and allow creating an array from scalars or concatenating existing arrays, or using a mix of arrays and scalars.

@m00ngoose

Ah fair! I only knew about Expr.list.concat for the other thing. I stand corrected.

I only care about one of those two cases, but as you say it's probably best to have both if it's going to be named analogously. Thinking about it more, I think concat_[list|array] is a bad name for the "make a list|array" case and they should be separate APIs. Out of scope though.

@cmdlineluser
Contributor

@m00ngoose There has been some discussion of that if it is of interest:

@adamreeve
Contributor Author

Based on the discussion linked above it looks like we most likely want to have separate methods for array construction (pl.array) and array concatenation (pl.concat_arrays), which seems much cleaner to me than one function that does both. Further discussion about the method split and naming should probably stay in that issue, but I think it makes sense to keep this issue open for implementing the array methods.
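
To make the proposed split concrete, here is a rough plain-Python model of the two operations. The names `make_array_rows` and `concat_array_rows` are hypothetical stand-ins for the construction and concatenation roles discussed above, not Polars functions, and null handling is left out to keep the contrast clear.

```python
from typing import Any, List, Sequence


def make_array_rows(*columns: Sequence[Any]) -> List[List[Any]]:
    """Construction: one scalar from each input column becomes one element per row."""
    return [list(values) for values in zip(*columns)]


def concat_array_rows(*columns: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Concatenation: existing fixed-width rows are joined end to end."""
    return [[value for row in rows for value in row] for rows in zip(*columns)]


a = [1, 2, 3]
b = [4, 5, 6]

# Construction from scalar columns gives rows of width 2.
built = make_array_rows(a, b)

# Concatenation of a width-2 column with a width-1 column gives width 3.
joined = concat_array_rows(built, [[7], [8], [9]])
```

Keeping the two roles separate means each function only accepts one kind of input (scalars or arrays), so the output width never depends on guessing how to treat a mixed argument list.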

@corwinjoy
Contributor

I have added a draft PR to discuss the design of a pl.array function in order to firm up what this would look like and how it should behave. @adamreeve
