predicate/projection pushdown causes ShapeError #19944

mdavis-xyz · 2024-11-23T21:04:03Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

Note that this script downloads 170 MB of sample files from AWS S3. You shouldn't need any AWS credentials for this to work (I made the bucket public).

import os
import datetime as dt

import boto3
from botocore.config import Config
from botocore import UNSIGNED
import polars as pl

# download the data
filenames_short = ['bop1.parquet', 'bop2.parquet', 'bdo.parquet']
bucket_name = 'mdavis-xyz-public-mwe'
s3_prefix = 'pl-shape-error/01/'
local_dir = 'mwe-data'
region_name = "eu-west-3"

s3 = boto3.resource('s3', 
                  config=Config(signature_version=UNSIGNED),
                  region_name=region_name)
bucket = s3.Bucket(bucket_name)

os.makedirs(local_dir, exist_ok=True)
local_paths = []
remote_paths = []
for f in filenames_short:
    local_path = os.path.join(local_dir, f)
    remote_path = os.path.join(s3_prefix, f)
    bucket.download_file(remote_path, local_path)
    local_paths.append(local_path)
    remote_paths.append(os.path.join('s3://' + bucket_name, remote_path))

# define the query which causes problems
def query(bop_paths, bdo_path, storage_options=None, predicate_pushdown=True, projection_pushdown=True):
    bidofferperiod_wide = (
        pl.concat([
            pl.scan_parquet(bop_paths[0], storage_options=storage_options),
            pl.scan_parquet(bop_paths[1], storage_options=storage_options)
        ], how='diagonal_relaxed')
    )
    
    biddayoffer_per_bid_wide = pl.scan_parquet(bdo_path, storage_options=storage_options)
    
    lf = (
        biddayoffer_per_bid_wide
        .rename({
            'SETTLEMENTDATE': 'TRADINGDATE',
            'OFFERDATE': 'OFFERDATETIME',
        })
        .with_columns(pl.col("TRADINGDATE").cast(dt.date))
        .join(bidofferperiod_wide, 
              how='inner', 
              on=['BIDTYPE', 'DIRECTION', 'DUID', 'OFFERDATETIME', 'TRADINGDATE'], 
              join_nulls=True,
              coalesce=True
        )
        .filter(pl.col("MAXAVAIL") > 0) # avoid divide by zero later
        .filter(pl.col("DUID").is_not_null())
    
        
        .tail()
        .group_by("DUID")
        .agg([
             pl.len().alias("LENGTH"),
        ])
    )
    return lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)

# these ones work
query(remote_paths[:2], remote_paths[-1], storage_options={"aws_region": region_name})
query(local_paths[:2], local_paths[-1], predicate_pushdown=False, projection_pushdown=False)
query(local_paths[:2], local_paths[-1], predicate_pushdown=False, projection_pushdown=True)
query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=False)

# this throws an error
query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=True)

Log output

Async thread count: 4
async download_chunk_size: 67108864
join parallel: true
UNION: union is run in parallel
POLARS PREFETCH_SIZE: 96
querying metadata of 1/1 files...
POLARS PREFETCH_SIZE: 96
reading of 1/1 file...
querying metadata of 1/1 files...
POLARS PREFETCH_SIZE: 96
reading of 1/1 file...
querying metadata of 1/1 files...
reading of 1/1 file...
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
POLARS ROW_GROUP PREFETCH_SIZE: 128
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
POLARS ROW_GROUP PREFETCH_SIZE: 128
parquet scan with parallel = Columns
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet scan with parallel = RowGroups
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
sys:1: CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
parquet scan with parallel = RowGroups
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
/home/azureuser/after_holiday/mwe/gh/script.py:63: CategoricalRemappingWarning: Local categoricals have different encodings, expensive re-encoding is done to perform this merge operation. Consider using a StringCache or an Enum type if the categories are known in advance
  lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)
INNER join dataframes finished
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = RowGroups
parquet scan with parallel = Columns
parquet scan with parallel = Columns
INNER join dataframes finished
dataframe filtered
dataframe filtered
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = RowGroups
parquet scan with parallel = Columns
parquet scan with parallel = RowGroups
INNER join dataframes finished
dataframe filtered
dataframe filtered
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = Prefiltered
parquet live columns = 1, dead columns = 30
parquet scan with parallel = Columns
parquet scan with parallel = Prefiltered
parquet live columns = 2, dead columns = 26
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
...
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
INNER join dataframes finished
FOUND SORTED KEY: running default HASH AGGREGATION
join parallel: true
UNION: union is run in parallel
parquet scan with parallel = Columns
parquet scan with parallel = Prefiltered
parquet live columns = 1, dead columns = 4
parquet scan with parallel = Prefiltered
parquet live columns = 2, dead columns = 4
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
parquet file must be read, statistics not sufficient for predicate.
...
parquet file must be read, statistics not sufficient for predicate.
parquet row group must be read, statistics not sufficient for predicate.
Traceback (most recent call last):
  File "/home/azureuser/after_holiday/mwe/gh/script.py", line 69, in <module>
    query(local_paths[:2], local_paths[-1], predicate_pushdown=True, projection_pushdown=True)
  File "/home/azureuser/after_holiday/mwe/gh/script.py", line 63, in query
    lf.collect(predicate_pushdown=predicate_pushdown, projection_pushdown=projection_pushdown)
  File "/home/azureuser/venv/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 2029, in collect
    return wrap_df(ldf.collect(callback))
polars.exceptions.ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

Issue description

I get the error:

ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

My code never calls vstack directly. The closest operation is a concat, and that uses how='diagonal_relaxed'.

Note that if I query from AWS S3 directly, I don't get the error. If I turn off either predicate or projection pushdown (or both), I don't get the error. Only when I turn on both, and read the file locally, do I get the error.

I suspect the error may be related to enums/categoricals.

It may be related to #13381

Expected behavior

The script should run without throwing an exception.

Querying files locally should give the same result as querying from S3.

Installed versions

--------Version info---------
Polars:              1.14.0
Index type:          UInt32
Platform:            Linux-6.8.0-1018-azure-x86_64-with-glibc2.35
Python:              3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                1.35.68
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.12.0
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           3.9.2
nest_asyncio         1.6.0
numpy                2.0.0
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             0.8.4
xlsxwriter           3.2.0


cmdlineluser · 2024-11-23T23:22:49Z

Attempted minimal repro.

import polars as pl

filename = '19944.parquet'

n = 1_500_000

values = list('abcdefghijlmno')
dtype = pl.Enum(values)
s = pl.Series(values, dtype=dtype)

pl.select(
    A = pl.lit('a'),
    B = s.sample(n, with_replacement=True),
    C = s.sample(n, with_replacement=True)
).write_parquet(filename)

r1 = pl.LazyFrame({'A': 'a', 'B': 'b'}).cast({'B': dtype})
r2 = pl.scan_parquet(filename).cast({'B': dtype, 'C': dtype})

l = pl.LazyFrame({'B': 'b'}).cast({'B': dtype})
r = pl.concat([r1, r2], how='diagonal_relaxed')

(l.join(r, on='B')
  .filter(A = pl.col('A'))
  .group_by('A')
  .first()
  .collect() 
)
# ShapeError: unable to vstack, column names don't match: "B" and "A"

Update: it seems this started erroring in 1.13.0.

ritchie46 · 2024-11-25T07:56:42Z


I cannot reproduce this one (the minimal repro from @cmdlineluser above).

@mdavis-xyz can you make a minimal repro? For instance, try to remove boto3 and AWS, and keep only the minimal set of columns and operations needed to reproduce this.

mdavis-xyz · 2024-11-25T16:06:37Z

The MWE from @cmdlineluser does not produce the error for me. But here is one which does:

import polars as pl
(
    pl.LazyFrame({
        'e': ['a'],
    })
    .cast({
        'e': pl.Enum(['a']),
    })
    .sink_parquet('1.parquet')
)

(
    pl.LazyFrame({
        'x': [1],
    })
    .sink_parquet('2.parquet')
)


(
    pl.concat([
        pl.scan_parquet('1.parquet'),
        pl.scan_parquet('2.parquet'),
    ], how='diagonal')
    .collect()
)

SchemaError: type Categorical(Some(local), Physical) is incompatible with expected type Enum(Some(local), Physical)

It seems the issue arises when doing a diagonal concat in which one of the LazyFrames is missing a column of type Enum.

If I directly concatenate LazyFrames without going via parquet, I do not get the error. If I use Categorical instead of enum, I do not get the error.

mdavis-xyz · 2024-11-25T16:08:20Z

Hmm, whoops. That might actually be a different error from the one I originally reported.

SchemaError: type Categorical(Some(local), Physical) is incompatible with expected type Enum(Some(local), Physical)

vs

polars.exceptions.ShapeError: unable to vstack, column names don't match: "BIDTYPE" and "DUID"

cmdlineluser · 2024-11-25T16:24:08Z

FWIW, I was getting both errors while trying to get a MWE.

It seemed to go from No Error -> SchemaError -> ShapeError depending on the number of rows.

peterbuecker-form3 · 2024-12-05T10:22:34Z

I'm encountering the same issue, starting with Polars 1.13.0. General system info and some other package versions (not sure if they're relevant):

Mac M1 (MacOS 15.1.1)
CPython 3.12.6
numpy 2.1.3
pyarrow 17.0.0

--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          2.36.0
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

I've come up with a somewhat minimal reproduction case which shows the error happening.

Generally, data frames are constructed in-memory and written to Parquet files. Those are then read back using pl.scan_parquet(), then combined using pl.concat(..., how='diagonal_relaxed'). A basic query with a filter() is executed to test for the issue.

At a high level, I've observed:

- The problem only occurs with .collect(predicate_pushdown=True).
- Using pl.read_parquet() instead also works.
- The problem is only triggered for data frames beyond a certain number of rows.
- How many rows are needed to trigger the issue depends on how the Parquet files were written (LazyFrame.sink_parquet() vs. DataFrame.write_parquet()).

In summary:

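| Predicate pushdown | Parquet files written with `LazyFrame.sink_parquet()` | Row count at which the issue occurs |
|---|---|---|
| yes | yes | 10 |
| no | yes | (works as expected) |
| yes | no (`DataFrame.write_parquet()`) | 2621440 |
| no | no (`DataFrame.write_parquet()`) | (works as expected) |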

Here's the reproduction case - it's a bit verbose because it tests for various combinations of the issue using binary search:

Reproduction case

import polars as pl
from polars.exceptions import ShapeError

pl.show_versions()

def _experiment(n, predicate_pushdown, lazy) -> bool:
    col1_type = pl.Int8
    col2_type = pl.Int8

    data = {
        'col1': [0] * n,
        'col2': [0] * n
    }

    if lazy:
        cls = pl.LazyFrame
    else:
        cls = pl.DataFrame

    df1 = cls(data, schema={
        'col1': col1_type,
        'col2': col2_type,
    })

    df2 = cls(data, schema={
        'col2': col2_type,
        'col1': col1_type,
    })

    if lazy:
        df1.sink_parquet('df1.parquet')
        df2.sink_parquet('df2.parquet')
    else:
        df1.write_parquet('df1.parquet')
        df2.write_parquet('df2.parquet')

    df1 = pl.scan_parquet('df1.parquet')
    df2 = pl.scan_parquet('df2.parquet')
    df = pl.concat([df1, df2], how='diagonal_relaxed')

    try:
        df.filter(pl.col('col1') >= 0).collect(predicate_pushdown=predicate_pushdown)
    except ShapeError as e:
        print(df1.collect().shape)
        print(e)
        return False

    return True

def experiment(predicate_pushdown, lazy) -> bool:
    print(f'*** Trying to reproduce issue with predicate_pushdown={predicate_pushdown} lazy={lazy}')
    failed = False
    high = 5000000
    low = 1

    while low < high:
        mid = (low + high) // 2
        if _experiment(mid, predicate_pushdown, lazy):
            low = mid + 1
        else:
            failed = True
            high = mid

    if failed:
        print(f'*** Stacking starts to fail with dataframes of length {low} and predicate_pushdown={predicate_pushdown} lazy={lazy}\n\n')
    else:
        print(f'*** Everything worked as expected with predicate_pushdown={predicate_pushdown} lazy={lazy}\n\n')

experiment(predicate_pushdown=True, lazy=True)
experiment(predicate_pushdown=False, lazy=True)
experiment(predicate_pushdown=True, lazy=False)
experiment(predicate_pushdown=False, lazy=False)
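
The loop in `experiment()` is a standard bisection for the smallest failing size, and it only gives the right answer if the failure is monotone in `n` (every size above the threshold also fails). Factored out as a standalone helper, it could look like the sketch below; `first_failing` is a hypothetical name, and the lambdas stand in for `_experiment` using the thresholds from the table above:

```python
from typing import Callable, Optional


def first_failing(works: Callable[[int], bool], low: int, high: int) -> Optional[int]:
    """Return the smallest n in [low, high) with works(n) == False, or None.

    Assumes monotonicity: once works(n) is False, it stays False for larger n.
    Mirrors experiment(): on success move `low` up, on failure move `high` down.
    """
    failed = False
    while low < high:
        mid = (low + high) // 2
        if works(mid):
            low = mid + 1
        else:
            failed = True
            high = mid
    return low if failed else None


# Stand-in predicates: "works" below the threshold, fails at and above it.
print(first_failing(lambda n: n < 10, 1, 5_000_000))
print(first_failing(lambda n: n < 2_621_440, 1, 5_000_000))
print(first_failing(lambda n: True, 1, 5_000_000))
```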

The failing results, as observed on my machine with Polars 1.16.0:

--------Version info---------
Polars:              1.16.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          2.36.0
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

*** Trying to reproduce issue with predicate_pushdown=True lazy=True
(2500000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(1250000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(625000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(312500, 2)
unable to vstack, column names don't match: "col1" and "col2"
(156250, 2)
unable to vstack, column names don't match: "col1" and "col2"
(78125, 2)
unable to vstack, column names don't match: "col1" and "col2"
(39063, 2)
unable to vstack, column names don't match: "col1" and "col2"
(19532, 2)
unable to vstack, column names don't match: "col1" and "col2"
(9766, 2)
unable to vstack, column names don't match: "col1" and "col2"
(4883, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2442, 2)
unable to vstack, column names don't match: "col1" and "col2"
(1221, 2)
unable to vstack, column names don't match: "col1" and "col2"
(611, 2)
unable to vstack, column names don't match: "col1" and "col2"
(306, 2)
unable to vstack, column names don't match: "col1" and "col2"
(153, 2)
unable to vstack, column names don't match: "col1" and "col2"
(77, 2)
unable to vstack, column names don't match: "col1" and "col2"
(39, 2)
unable to vstack, column names don't match: "col1" and "col2"
(20, 2)
unable to vstack, column names don't match: "col1" and "col2"
(10, 2)
unable to vstack, column names don't match: "col1" and "col2"
*** Stacking starts to fail with dataframes of length 10 and predicate_pushdown=True lazy=True


*** Trying to reproduce issue with predicate_pushdown=False lazy=True
*** Everything worked as expected with predicate_pushdown=False lazy=True


*** Trying to reproduce issue with predicate_pushdown=True lazy=False
(3750000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(3125000, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2812500, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2656250, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2636719, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2626954, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2622071, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621461, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621442, 2)
unable to vstack, column names don't match: "col1" and "col2"
(2621440, 2)
unable to vstack, column names don't match: "col1" and "col2"
*** Stacking starts to fail with dataframes of length 2621440 and predicate_pushdown=True lazy=False


*** Trying to reproduce issue with predicate_pushdown=False lazy=False
*** Everything worked as expected with predicate_pushdown=False lazy=False

The OK results, as observed on my machine with Polars 1.12.0:

--------Version info---------
Polars:              1.12.0
Index type:          UInt32
Platform:            macOS-15.1.1-arm64-arm-64bit
Python:              3.12.6 (main, Sep  6 2024, 19:03:47) [Clang 15.0.0 (clang-1500.3.9.4)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
great_tables         <not installed>
matplotlib           <not installed>
nest_asyncio         1.6.0
numpy                2.1.3
openpyxl             <not installed>
pandas               2.2.3
pyarrow              17.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>

*** Trying to reproduce issue with predicate_pushdown=True lazy=True
*** Everything worked as expected with predicate_pushdown=True lazy=True


*** Trying to reproduce issue with predicate_pushdown=False lazy=True
*** Everything worked as expected with predicate_pushdown=False lazy=True


*** Trying to reproduce issue with predicate_pushdown=True lazy=False
*** Everything worked as expected with predicate_pushdown=True lazy=False


*** Trying to reproduce issue with predicate_pushdown=False lazy=False
*** Everything worked as expected with predicate_pushdown=False lazy=False

peterbuecker-form3 · 2024-12-05T10:24:43Z

Sorry @mdavis-xyz @ritchie46, I forgot to mention that v1.13.0 is the first Polars release where I'm seeing this issue 👍 Let me know if you need any more details or if you have any questions.

peterbuecker-form3 · 2024-12-05T21:42:32Z

Using git bisect between py-1.12.0 (good) and py-1.13.0 (bad), I've traced the start of the issue to commit 3fe10a3, part of #19190.

mdavis-xyz · 2024-12-08T21:18:23Z

I've just run my full code (more complex than the MWE) using polars v1.17.0, and now it works. Thanks!

peterbuecker-form3 · 2024-12-09T07:47:38Z

Same here, thanks a lot @coastalwhite @ritchie46 @nameexhaustion @cmdlineluser, that was a very quick fix ❤️ Can confirm the issue is solved in v1.17.0 🥇 !
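
Based on the versions reported in this thread (first failing release 1.13.0, fixed in 1.17.0), code that has to run on several Polars versions could check whether the installed release falls in that window, for example to decide whether to disable predicate pushdown. A minimal sketch; `in_affected_range` and `parse_version` are hypothetical helpers, and the simple parser only handles the leading `X.Y.Z` of a version string:

```python
import re
from typing import Tuple

FIRST_BAD = (1, 13, 0)    # first release reported to fail in this thread
FIRST_FIXED = (1, 17, 0)  # release reported to contain the fix


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y.Z' of a version string into integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def in_affected_range(version: str) -> bool:
    """True if `version` is >= 1.13.0 and < 1.17.0."""
    return FIRST_BAD <= parse_version(version) < FIRST_FIXED


# Typical use: in_affected_range(polars.__version__)
for v in ["1.12.0", "1.13.0", "1.16.0", "1.17.0"]:
    print(v, in_affected_range(v))
```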

mdavis-xyz added the labels bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer) and python (Related to Python Polars) on Nov 23, 2024

ritchie46 added the needs repro label (Bug does not yet have a reproducible example) on Nov 25, 2024

cmdlineluser mentioned this issue Dec 5, 2024

polars 1.13.0 filtering by date column doesn't work in lazyframe obtained via scan_parquet #19766

Closed


alexander-beedie mentioned this issue Dec 5, 2024

scan_parquet filter/select optimisation error #20175

Closed


ritchie46 mentioned this issue Dec 6, 2024

fix: Serialize categories of Enum in arrow metadata #20181

Merged

ritchie46 closed this as completed in #20181 Dec 6, 2024

c-peters added the accepted label (Ready for implementation) on Dec 8, 2024

c-peters assigned ritchie46 Dec 8, 2024

github-project-automation bot added this to Backlog Dec 8, 2024

github-project-automation bot moved this to Ready in Backlog Dec 8, 2024

c-peters moved this from Ready to Done in Backlog Dec 8, 2024
