feat: Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options #17109

Merged
ritchie46 merged 7 commits into pola-rs:main from sql-struct-field-select on Jun 24, 2024

Conversation

@alexander-beedie (Collaborator) commented Jun 21, 2024

Closes #16651.

  • Adds SQL support for Struct field selection and wildcard expansion.
  • Adds support for SELECT * RENAME (…) syntax (also works with struct wildcards).
  • Adds support for SELECT * REPLACE (…) syntax (also works with struct wildcards).

Also

  • Fixes an issue where ORDER BY <n> ordinal syntax picked the wrong columns to sort.
  • Consolidates QualifiedWildcard & CompoundIdentifier SQL → Expr translation.
  • Generates an informative error on use of currently unsupported wildcard options, e.g.
    SELECT tbl.* ILIKE … raises "ILIKE wildcard option is currently unsupported".

Note


Example

import polars as pl

df = pl.DataFrame(
  {
    "id": [100, 200, 300, 400],
    "name": ["Alice", "Bob", "David", "Zoe"],
    "age": [32, 27, 19, 45],
    "other": [{"n": 1.5}, {"n": None}, {"n": -0.5}, {"n": 2.0}],
  }
).select(pl.struct(pl.all()).alias("json_msg"))

# ┌─────────────────────────┐
# │ json_msg                │
# │ ---                     │
# │ struct[4]               │
# ╞═════════════════════════╡
# │ {100,"Alice",32,{1.5}}  │
# │ {200,"Bob",27,{null}}   │
# │ {300,"David",19,{-0.5}} │
# │ {400,"Zoe",45,{2.0}}    │
# └─────────────────────────┘

Select, filter, and sort on Struct field values from SQL:

df.sql(
  """
  SELECT
    json_msg.id   AS ID,
    json_msg.name AS NAME,
    json_msg.age  AS AGE
  FROM
    self
  WHERE
    json_msg.age > 20 AND
    json_msg.other.n IS NOT NULL
  ORDER BY
    json_msg.name DESC
  """
)
# ┌─────┬───────┬─────┐
# │ ID  ┆ NAME  ┆ AGE │
# │ --- ┆ ---   ┆ --- │
# │ i64 ┆ str   ┆ i64 │
# ╞═════╪═══════╪═════╡
# │ 400 ┆ Zoe   ┆ 45  │
# │ 100 ┆ Alice ┆ 32  │
# └─────┴───────┴─────┘

Wildcard Struct selection/expansion, with REPLACE wildcard option:

df.sql(
  """
  SELECT json_msg.* REPLACE (UPPER(name) AS name) FROM self
  """
)
# shape: (4, 4)
# ┌─────┬───────┬─────┬───────────┐
# │ id  ┆ name  ┆ age ┆ other     │
# │ --- ┆ ---   ┆ --- ┆ ---       │
# │ i64 ┆ str   ┆ i64 ┆ struct[1] │
# ╞═════╪═══════╪═════╪═══════════╡
# │ 100 ┆ ALICE ┆ 32  ┆ {1.5}     │
# │ 200 ┆ BOB   ┆ 27  ┆ {null}    │
# │ 300 ┆ DAVID ┆ 19  ┆ {-0.5}    │
# │ 400 ┆ ZOE   ┆ 45  ┆ {2.0}     │
# └─────┴───────┴─────┴───────────┘

Wildcard selection/expansion of nested Struct, with RENAME wildcard option:

df.sql(
  """
  SELECT json_msg.other.* RENAME (n AS value) FROM self
  """
)
# shape: (4, 1)
# ┌───────┐
# │ value │
# │ ---   │
# │ f64   │
# ╞═══════╡
# │ 1.5   │
# │ null  │
# │ -0.5  │
# │ 2.0   │
# └───────┘

@github-actions bot added the enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) labels Jun 21, 2024
@alexander-beedie added the A-sql (Area: Polars SQL functionality) label Jun 21, 2024
@alexander-beedie force-pushed the sql-struct-field-select branch from cc3f5dd to b90ae1d Jun 21, 2024 10:18
@alexander-beedie force-pushed the sql-struct-field-select branch from b90ae1d to 3bb8050 Jun 21, 2024 13:59
@alexander-beedie force-pushed the sql-struct-field-select branch from 239ddde to 2cfdfb2 Jun 21, 2024 20:39

codecov bot commented Jun 21, 2024

Codecov Report

Attention: Patch coverage is 94.76190% with 11 lines in your changes missing coverage. Please review.

Project coverage is 80.89%. Comparing base (8a6bf4b) to head (64982d7).
Report is 27 commits behind head on main.

Files                                Patch %   Lines
crates/polars-sql/src/sql_expr.rs    87.09%    8 Missing ⚠️
crates/polars-sql/src/context.rs     97.94%    3 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17109      +/-   ##
==========================================
+ Coverage   80.86%   80.89%   +0.03%     
==========================================
  Files        1456     1456              
  Lines      191141   191403     +262     
  Branches     2728     2742      +14     
==========================================
+ Hits       154562   154836     +274     
+ Misses      36073    36058      -15     
- Partials      506      509       +3     


@alexander-beedie changed the title from "feat: Support Struct field selection in the SQL engine" to "feat: Support Struct field selection in the SQL engine, RENAME and REPLACE select wildcard options" Jun 23, 2024
@ritchie46 merged commit f7ff2ef into pola-rs:main Jun 24, 2024
38 checks passed
@alexander-beedie deleted the sql-struct-field-select branch June 24, 2024 11:36
Labels
A-sql (Area: Polars SQL functionality), enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), rust (Related to Rust Polars)
Development

Successfully merging this pull request may close these issues.

SQLContext support for accessing Struct type key/value column
2 participants