feat: Support SQL SELECT * ILIKE wildcard syntax #17169

Merged: 7 commits, Jun 28, 2024

Conversation

@alexander-beedie (Collaborator) commented Jun 24, 2024

Following on from #17109, which added RENAME and REPLACE modifiers:

New

  • Added support for the SELECT * ILIKE … wildcard select-modifier [1].
    This is a case-insensitive pattern-match on column names, using the SQL "%" (any run of characters) and "_" (any single character) wildcards.
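
To make the matching rule concrete, here is a minimal pure-Python sketch of how an ILIKE pattern can select column names. It is an illustration only, not the code in this PR: the helper names `ilike_to_regex` and `select_ilike` are hypothetical, and the regex translation is an assumption about the usual SQL ILIKE semantics.

```python
import re


def ilike_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL ILIKE pattern into a case-insensitive regex (hypothetical helper)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # "%" matches any run of characters, including none
        elif ch == "_":
            parts.append(".")  # "_" matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is matched literally
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def select_ilike(columns: list[str], pattern: str) -> list[str]:
    """Return the column names that match the whole pattern, in their original order."""
    rx = ilike_to_regex(pattern)
    return [name for name in columns if rx.fullmatch(name)]


columns = ["ID", "FirstName", "LastName", "Address", "City"]
print(select_ilike(columns, "%a%e%"))  # ['FirstName', 'LastName', 'Address']
print(select_ilike(columns, "%I%"))    # ['ID', 'FirstName', 'City']
print(select_ilike(columns, "c_t_"))   # ['City']
```

The first two patterns are the ones used in the Examples section, and the selected names agree with the columns shown in those result frames (before RENAME is applied).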

Also

  • Fixed interaction of ORDER BY and REPLACE modifier (should sort on post-replace values).
  • Fixed use of ordinal values with ORDER BY clause if also using SELECT * EXCLUDE.
  • Improved use of ORDER BY on columns targeted by RENAME modifier.
  • Some code refactoring/consolidation to improve clarity around select-modifiers.
  • Extended select-modifier unit tests to cover additional permutations/combinations of modifiers.

Note

With this final addition, we will support all of the SELECT * wildcard modifiers for the 1.0 release 🎉

SELECT * EXCLUDE …
SELECT * ILIKE …
SELECT * RENAME …
SELECT * REPLACE …

Examples

import polars as pl

df = pl.DataFrame({
    "ID": [333, 666, 999],
    "FirstName": ["Bruce", "Diana", "Clark"],
    "LastName": ["Wayne", "Prince", "Kent"],
    "Address": ["The Batcave", "Paradise Island", "Fortress of Solitude"],
    "City": ["Gotham", "Themyscira", "Metropolis"],
})

Demonstrate use of SELECT * ILIKE:

df.sql("""
  SELECT * ILIKE '%a%e%'
  FROM self
  ORDER BY FirstName
""")
# shape: (3, 3)
# ┌───────────┬──────────┬──────────────────────┐
# │ FirstName ┆ LastName ┆ Address              │
# │ ---       ┆ ---      ┆ ---                  │
# │ str       ┆ str      ┆ str                  │
# ╞═══════════╪══════════╪══════════════════════╡
# │ Bruce     ┆ Wayne    ┆ The Batcave          │
# │ Clark     ┆ Kent     ┆ Fortress of Solitude │
# │ Diana     ┆ Prince   ┆ Paradise Island      │
# └───────────┴──────────┴──────────────────────┘

Combine ILIKE with RENAME, sort with ordinal-based ORDER BY:

df.sql("""
  SELECT * ILIKE '%I%' RENAME (FirstName AS Name)
  FROM self
  ORDER BY 3 DESC
""")
# shape: (3, 3)
# ┌─────┬───────┬────────────┐
# │ ID  ┆ Name  ┆ City       │
# │ --- ┆ ---   ┆ ---        │
# │ i64 ┆ str   ┆ str        │
# ╞═════╪═══════╪════════════╡
# │ 666 ┆ Diana ┆ Themyscira │
# │ 999 ┆ Clark ┆ Metropolis │
# │ 333 ┆ Bruce ┆ Gotham     │
# └─────┴───────┴────────────┘

Correctly apply ORDER BY on a column targeted by the RENAME modifier:

df.sql("""
  SELECT * EXCLUDE (ID, City, LastName) RENAME FirstName AS Name
  FROM self
  ORDER BY Name
""")
# shape: (3, 2)
# ┌───────┬──────────────────────┐
# │ Name  ┆ Address              │
# │ ---   ┆ ---                  │
# │ str   ┆ str                  │
# ╞═══════╪══════════════════════╡
# │ Bruce ┆ The Batcave          │
# │ Clark ┆ Fortress of Solitude │
# │ Diana ┆ Paradise Island      │
# └───────┴──────────────────────┘

Footnotes

  1. Note that these SELECT * wildcard options have become very popular in the last couple of years, with DuckDB, Snowflake, Databricks, and other vendors jumping on to the syntax, but PostgreSQL (our usual syntax target) actually has yet to implement it. However, given its popularity I think it's important that we have it (and it's purely additive; we lose nothing by supporting it) ✌️

@github-actions bot added the enhancement, python, and rust labels Jun 24, 2024
@alexander-beedie added the A-sql label Jun 24, 2024

codecov bot commented Jun 24, 2024

Codecov Report

Attention: Patch coverage is 97.75281% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.82%. Comparing base (6518a80) to head (8a77789).
Report is 7 commits behind head on main.

Files Patch % Lines
crates/polars-sql/src/context.rs 97.75% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17169      +/-   ##
==========================================
- Coverage   80.94%   80.82%   -0.13%     
==========================================
  Files        1464     1464              
  Lines      191956   192043      +87     
  Branches     2742     2744       +2     
==========================================
- Hits       155385   155224     -161     
- Misses      36062    36308     +246     
- Partials      509      511       +2     


feat: Support `SELECT * ILIKE` wildcard syntax

feat: Support `SELECT * ILIKE` wildcard syntax in SQL interface
@alexander-beedie force-pushed the select-wildcard-ilike branch 2 times, most recently from 3d04925 to 8a8260f, June 25, 2024 07:11
@alexander-beedie force-pushed the select-wildcard-ilike branch 2 times, most recently from 0e088ad to 8fff43c, June 25, 2024 20:29
@alexander-beedie changed the title from "feat: Support SELECT * ILIKE wildcard syntax in SQL interface" to "feat: Support SQL SELECT * ILIKE wildcard syntax" Jun 26, 2024
@ritchie46 merged commit 8f8427d into pola-rs:main Jun 28, 2024
27 checks passed
@alexander-beedie deleted the select-wildcard-ilike branch July 30, 2024 18:14