
predicate/projection pushdown causes ShapeError #19944

Closed · 2 tasks done
mdavis-xyz opened this issue Nov 23, 2024 · 10 comments · Fixed by #20181
Labels: accepted (Ready for implementation), bug (Something isn't working), needs repro (Bug does not yet have a reproducible example), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

mdavis-xyz (Contributor) commented Nov 23, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Note that this script will download 170 MB of sample files from AWS S3, but you shouldn't need any AWS credentials for it to work (I made the bucket public).

import os
import datetime as dt

import boto3
from botocore.config import Config
from botocore import UNSIGNED
import polars as pl

# download the data
filenames_short = ['bop1.parquet', 'bop2.parquet', 'bdo.parquet']
bucket_name = 'mdavis-xyz-public-mwe'
s3_prefix = 'pl-shape-error/01/'
local_dir = 'mwe-data'
region_name = "eu-west-3"

s3 = boto3.resource('s3',
                    config=Config(signature_version=UNSIGNED),
                    region_name=region_name)
bucket = s3.Bucket(bucket_name)

os.makedirs(local_dir, exist_ok=True)
local_paths = []
remote_paths = []
for f in filenames_short:
    local_path = os.path.join(local_dir, f)
    remote_path = os.path.join(s3_prefix, f)
    bucket.download_file(remote_path, local_path)
    local_paths.append(local_path)
    remote_paths.append(os.path.join('s3://' + bucket_name, remote_path))

# define the query which causes problems
def query(bop_paths, bdo_path, storage_options={}, predicate_pushdown=True, projection_pushdown=True):
    bidofferperiod_wide = (
        pl.concat([
            pl.scan_parquet(bop_paths[0], storage_options=storage_options),
            pl.scan_parquet(bop_paths[1], storage_options=storage_options)
        ], how='diagonal_relaxed')
    )
    
    biddayoffer_per_bid_wide = pl.scan_parquet(bdo_path, storage_options=storage_options)
    
    lf = (
        biddayoffer_per_bid_wide
        .rename({
            'SETTLEMENTDATE': 'TRADINGDATE',
            'OFFERDATE': 'OFFERDATETIME',
        })
        .with_columns(pl.col("TRADINGDATE").cast(dt.date))
        .join(bidofferperiod_wide, 
              how='inner', 
              on=['BIDTYPE', 'DIRECTION', 'DUID', 'OFFERDATETIME', 'TRADINGDATE'], 
              join_nulls=True,
              coalesce=True
        )
        .filter(pl.col("MAXAVAIL") > 0) # avoid divide by zero later
        .filter(pl.col("DUID").is_not_null())
    
        
        .tail()
        .group_by("DUID")
        .agg([
             pl.len().alias("LENGTH"),
        ])
    )
    lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)

# these ones work
query(remote_paths[:2], remote_paths[-1], storage_options={"aws_region": region_name})
query(local_paths[:2], local_paths[-1], predicate_pushdown=False, projection_pushdown=False)
query(local_paths[:2], local_paths[-1], predicate_pushdown=False, projection_pushdown=True)
query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=False)

# this throws an error
query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=True)

Log output

Async thread count: 4
async download_chunk_size: 67108864
join parallel: true
UNION: union is run in parallel
POLARS PREFETCH_SIZE: 96
querying metadata of 1/1 files...
POLARS PREFETCH_SIZE: 96
reading of 1/1 file...
querying metadata of 1/1 files...
POLARS PREFETCH_SIZE: 96
reading of 1/1 file...
querying metadata of 1/1 files...
reading of 1/1 file...
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
POLARS ROW_GROUP PREFETCH_SIZE: 128
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
POLARS ROW_GROUP PREFETCH_SIZE: 128
parquet scan with parallel = Columns
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet scan with parallel = RowGroups
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
sys:1: CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
parquet scan with parallel = RowGroups
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
/home/azureuser/after_holiday/mwe/gh/script.py:63: CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
  lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)
INNER join dataframes finished
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = RowGroups
parquet scan with parallel = Columns
parquet scan with parallel = Columns
INNER join dataframes finished
dataframe filtered
dataframe filtered
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = RowGroups
parquet scan with parallel = Columns
parquet scan with parallel = RowGroups
INNER join dataframes finished
dataframe filtered
dataframe filtered
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = Prefiltered
parquet live columns = 1, dead columns = 30
parquet scan with parallel = Columns
parquet scan with parallel = Prefiltered
parquet live columns = 2, dead columns = 26
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
INNER join dataframes finished
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = Columns
parquet scan with parallel = Prefiltered
parquet live columns = 1, dead columns = 4
parquet scan with parallel = Prefiltered
parquet live columns = 2, dead columns = 4
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
Traceback (most recent call last):
  File "/home/azureuser/after_holiday/mwe/gh/script.py", line 69, in <module>
    query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=True)
  File "/home/azureuser/after_holiday/mwe/gh/script.py", line 63, in query
    lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)
  File "/home/azureuser/venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 2029, in collect
    return wrap_df(ldf.collect(callback))
polars.exceptions.ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

Issue description

I get the error:

ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

for code that contains no vstack call. I do have a concat, but with how='diagonal_relaxed'.

Note that if I query from AWS S3 directly, I don't get the error. If I turn off either predicate or projection pushdown (or both), I don't get the error. The error occurs only when both pushdowns are enabled and the files are read locally.

I suspect the error may be related to enums/categoricals.

It may be related to #13381.
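Given those observations, a possible stopgap until upgrading to a fixed release (v1.17.0, per the thread below) is to disable one or both pushdowns at collect time. This is a sketch following directly from the observation above, not an official recommendation; "some.parquet" and the filter are placeholders:

import polars as pl

lf = pl.scan_parquet("some.parquet")  # placeholder for the real query

# Disabling either pushdown avoids the ShapeError on affected versions,
# at the cost of scanning more data than necessary.
result = lf.filter(pl.col("DUID").is_not_null()).collect(
    predicate_pushdown=False,
    projection_pushdown=False,
)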

Expected behavior

The script should run without throwing an exception.

Querying files locally should give the same result as querying from S3.

Installed versions

--------Version info---------
Polars:              1.14.0
Index type:          UInt32
Platform:            Linux-6.8.0-1018-azure-x86_64-with-glibc2.35
Python:              3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                1.35.68
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.12.0
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.0.0
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             0.8.4
xlsxwriter           3.2.0

mdavis-xyz added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels on Nov 23, 2024
cmdlineluser (Contributor) commented Nov 23, 2024

Attempted minimal repro.

import polars as pl

filename = '19944.parquet'

n = 1_500_000

values = list('abcdefghijlmno')
dtype = pl.Enum(values)
s = pl.Series(values, dtype=dtype)

pl.select(
    A = pl.lit('a'),
    B = s.sample(n, with_replacement=True),
    C = s.sample(n, with_replacement=True)
).write_parquet(filename)

r1 = pl.LazyFrame({'A': 'a', 'B': 'b'}).cast({'B': dtype})
r2 = pl.scan_parquet(filename).cast({'B': dtype, 'C': dtype})

l = pl.LazyFrame({'B': 'b'}).cast({'B': dtype})
r = pl.concat([r1, r2], how='diagonal_relaxed')

(l.join(r, on='B')
  .filter(A = pl.col('A'))
  .group_by('A')
  .first()
  .collect() 
)
# ShapeError: unable to vstack, column names don't match: "B" and "A"

Update: It seems this started to error in 1.13.0

ritchie46 (Member) commented:
(quoting cmdlineluser's minimal repro above)

I cannot reproduce this one.

@mdavis-xyz can you make a minimal repro? Try to take boto3 and AWS out, for instance, and use the minimal number of columns and operations needed to reproduce this.

ritchie46 added the needs repro (Bug does not yet have a reproducible example) label on Nov 25, 2024
mdavis-xyz (Contributor, Author) commented Nov 25, 2024

The MWE from @cmdlineluser does not produce the error for me. But here is one which does:

import polars as pl
(
    pl.LazyFrame({
        'e': ['a'],
    })
    .cast({
        'e': pl.Enum(['a']),
    })
    .sink_parquet('1.parquet')
)

(
    pl.LazyFrame({
        'x': [1],
    })
    .sink_parquet('2.parquet')
)


(
    pl.concat([
        pl.scan_parquet('1.parquet'),
        pl.scan_parquet('2.parquet'),
    ], how='diagonal')
    .collect()
)

SchemaError: type Categorical(Some(local), Physical) is incompatible with expected type Enum(Some(local), Physical)

It seems the issue arises when doing a diagonal concat where one of the LazyFrames is missing a column of type Enum.

If I concatenate the LazyFrames directly, without going via Parquet, I do not get the error. If I use Categorical instead of Enum, I do not get the error.
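A minimal sketch of those two non-failing variants, assuming the same file layout as the Enum repro above:

import polars as pl

# Variant 1: Categorical instead of Enum -- no error.
pl.LazyFrame({'e': ['a']}).cast({'e': pl.Categorical}).sink_parquet('1.parquet')
pl.LazyFrame({'x': [1]}).sink_parquet('2.parquet')
pl.concat(
    [pl.scan_parquet('1.parquet'), pl.scan_parquet('2.parquet')],
    how='diagonal',
).collect()

# Variant 2: diagonal concat of in-memory LazyFrames, without a Parquet
# round trip -- no error, even with the Enum column present.
lf1 = pl.LazyFrame({'e': ['a']}).cast({'e': pl.Enum(['a'])})
lf2 = pl.LazyFrame({'x': [1]})
pl.concat([lf1, lf2], how='diagonal').collect()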

mdavis-xyz (Contributor, Author) commented:
Hmm, whoops. That might actually be a different error from the one I originally reported:

SchemaError: type Categorical(Some(local), Physical) is incompatible with expected type Enum(Some(local), Physical)

vs

polars.exceptions.ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

cmdlineluser (Contributor) commented:
FWIW, I was getting both errors while trying to produce an MWE.

It seemed to go from no error -> SchemaError -> ShapeError depending on the number of rows.
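A sketch of that sweep, reusing the Enum repro above with the row count as a parameter (the specific counts are illustrative, not the exact thresholds):

import polars as pl
from polars.exceptions import SchemaError, ShapeError

dtype = pl.Enum(['a'])

for n in [1, 1_000, 100_000, 3_000_000]:
    pl.LazyFrame({'e': ['a'] * n}).cast({'e': dtype}).sink_parquet('1.parquet')
    pl.LazyFrame({'x': [1] * n}).sink_parquet('2.parquet')
    try:
        pl.concat(
            [pl.scan_parquet('1.parquet'), pl.scan_parquet('2.parquet')],
            how='diagonal',
        ).collect()
        outcome = 'no error'
    except SchemaError:
        outcome = 'SchemaError'
    except ShapeError:
        outcome = 'ShapeError'
    print(n, outcome)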

peterbuecker-form3 commented Dec 5, 2024
I'm encountering the same issue, starting with Polars 1.13.0. General system info and some other package versions (not sure if they're relevant):

  • Mac M1 (macOS 15.1.1)
  • CPython 3.12.6
  • numpy 2.1.3
  • pyarrow 17.0.0
--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          2.36.0
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

I've come up with a somewhat minimal reproduction case which shows the error happening.

Generally, data frames are constructed in-memory and written to Parquet files. Those are then read back using pl.scan_parquet(), then combined using pl.concat(..., how='diagonal_relaxed'). A basic query with a filter() is executed to test for the issue.
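Stripped of the binary-search harness, the core pipeline looks like this (a sketch distilled from the full reproduction case below; dtypes simplified to the defaults):

import polars as pl

n = 10  # per the summary table below, the sink_parquet variant fails from 10 rows
# Two frames with the same columns in a different physical order.
pl.LazyFrame({'col1': [0] * n, 'col2': [0] * n}).sink_parquet('df1.parquet')
pl.LazyFrame({'col2': [0] * n, 'col1': [0] * n}).sink_parquet('df2.parquet')

df = pl.concat(
    [pl.scan_parquet('df1.parquet'), pl.scan_parquet('df2.parquet')],
    how='diagonal_relaxed',
)
# On affected versions this raises:
# ShapeError: unable to vstack, column names don't match: "col1" and "col2"
df.filter(pl.col('col1') >= 0).collect(predicate_pushdown=True)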

At a high level, I've observed:

  1. The problem only occurs with .collect(predicate_pushdown=True)
     • Using pl.read_parquet() instead also works
  2. The problem is only triggered for data frames beyond a certain number of rows
  3. How many rows are needed to trigger the issue depends on how the Parquet files were written (pl.sink_parquet() vs. pl.write_parquet())

In summary:

Predicate pushdown | Written with pl.sink_parquet() | Row count at which the issue occurs
yes                | yes                            | 10
no                 | yes                            | (works as expected)
yes                | no                             | 2621440
no                 | no                             | (works as expected)

Here's the reproduction case. It's a bit verbose because it uses binary search to find the failing row count for each combination:

Reproduction case

import polars as pl
from polars.exceptions import ShapeError

pl.show_versions()

def _experiment(n, predicate_pushdown, lazy) -> bool:
    col1_type = pl.Int8
    col2_type = pl.Int8

    data = {
        'col1': [0] * n,
        'col2': [0] * n
    }

    if lazy:
        cls = pl.LazyFrame
    else:
        cls = pl.DataFrame

    df1 = cls(data, schema={
        'col1': col1_type,
        'col2': col2_type,
    })

    df2 = cls(data, schema={
        'col2': col2_type,
        'col1': col1_type,
    })

    if lazy:
        df1.sink_parquet('df1.parquet')
        df2.sink_parquet('df2.parquet')
    else:
        df1.write_parquet('df1.parquet')
        df2.write_parquet('df2.parquet')

    df1 = pl.scan_parquet('df1.parquet')
    df2 = pl.scan_parquet('df2.parquet')
    df = pl.concat([df1, df2], how='diagonal_relaxed')

    try:
        df.filter(pl.col('col1') >= 0).collect(predicate_pushdown=predicate_pushdown)
    except ShapeError as e:
        print(df1.collect().shape)
        print(e)
        return False

    return True

def experiment(predicate_pushdown, lazy) -> bool:
    print(f'*** Trying to reproduce issue with predicate_pushdown={predicate_pushdown} lazy={lazy}')
    failed = False
    high = 5000000
    low = 1

    while low < high:
        mid = (low + high) // 2
        if _experiment(mid, predicate_pushdown, lazy):
            low = mid + 1
        else:
            failed = True
            high = mid

    if failed:
        print(f'*** Stacking starts to fail with dataframes of length {low} and predicate_pushdown={predicate_pushdown} lazy={lazy}\n\n')
    else:
        print(f'*** Everything worked as expected with predicate_pushdown={predicate_pushdown} lazy={lazy}\n\n')

experiment(predicate_pushdown=True, lazy=True)
experiment(predicate_pushdown=False, lazy=True)
experiment(predicate_pushdown=True, lazy=False)
experiment(predicate_pushdown=False, lazy=False)

The failing results, as observed on my machine with Polars 1.16.0:

--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          2.36.0
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

*** Trying to reproduce issue with predicate_pushdown=True lazy=True
(2500000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(1250000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(625000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(312500, 2)
unable to vstack, column names don't match: "col1" and "col2"
(156250, 2)
unable to vstack, column names don't match: "col1" and "col2"
(78125, 2)
unable to vstack, column names don't match: "col1" and "col2"
(39063, 2)
unable to vstack, column names don't match: "col1" and "col2"
(19532, 2)
unable to vstack, column names don't match: "col1" and "col2"
(9766, 2)
unable to vstack, column names don't match: "col1" and "col2"
(4883, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2442, 2)
unable to vstack, column names don't match: "col1" and "col2"
(1221, 2)
unable to vstack, column names don't match: "col1" and "col2"
(611, 2)
unable to vstack, column names don't match: "col1" and "col2"
(306, 2)
unable to vstack, column names don't match: "col1" and "col2"
(153, 2)
unable to vstack, column names don't match: "col1" and "col2"
(77, 2)
unable to vstack, column names don't match: "col1" and "col2"
(39, 2)
unable to vstack, column names don't match: "col1" and "col2"
(20, 2)
unable to vstack, column names don't match: "col1" and "col2"
(10, 2)
unable to vstack, column names don't match: "col1" and "col2"
*** Stacking starts to fail with dataframes of length 10 and predicate_pushdown=True lazy=True


*** Trying to reproduce issue with predicate_pushdown=False lazy=True
*** Everything worked as expected with predicate_pushdown=False lazy=True


*** Trying to reproduce issue with predicate_pushdown=True lazy=False
(3750000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(3125000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2812500, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2656250, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2636719, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2626954, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2622071, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621461, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621442, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621440, 2)
unable to vstack, column names don't match: "col1" and "col2"
*** Stacking starts to fail with dataframes of length 2621440 and predicate_pushdown=True lazy=False


*** Trying to reproduce issue with predicate_pushdown=False lazy=False
*** Everything worked as expected with predicate_pushdown=False lazy=False

The OK results, as observed on my machine with Polars 1.12.0:

--------Version info---------
Polars:              1.12.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

*** Trying to reproduce issue with predicate_pushdown=True lazy=True
*** Everything worked as expected with predicate_pushdown=True lazy=True


*** Trying to reproduce issue with predicate_pushdown=False lazy=True
*** Everything worked as expected with predicate_pushdown=False lazy=True


*** Trying to reproduce issue with predicate_pushdown=True lazy=False
*** Everything worked as expected with predicate_pushdown=True lazy=False


*** Trying to reproduce issue with predicate_pushdown=False lazy=False
*** Everything worked as expected with predicate_pushdown=False lazy=False

peterbuecker-form3 commented:
Sorry @mdavis-xyz @ritchie46, I forgot to add that v1.13.0 is the first Polars release where I'm seeing this issue 👍 Let me know if you need any more details or if you have any questions.

peterbuecker-form3 commented:
Through git bisect between py-1.12.0 (good) and py-1.13.0 (bad), I've identified that the issue started with 3fe10a3, part of #19190.

mdavis-xyz (Contributor, Author) commented:
I've just run my full code (more complex than the MWE) using Polars v1.17.0, and now it works. Thanks!

peterbuecker-form3 commented:
Same here, thanks a lot @coastalwhite @ritchie46 @nameexhaustion @cmdlineluser, that was a very quick fix ❤️ Can confirm the issue is solved in v1.17.0 🥇 !
