Getting panic when calling LazyFrame.group_by().map_groups and intermittent panic when calling LazyFrame.columns #16385

Closed
2 tasks done
kszlim opened this issue May 21, 2024 · 8 comments
Labels: bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@kszlim
Contributor

kszlim commented May 21, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

n/a

Log output

With RUST_BACKTRACE=full


thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/optimizer/predicate_pushdown/mod.rs:356:69:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:     0x7feacd22c108 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc65a86809eb3aa65
   1:     0x7feacab6c94b - core::fmt::write::hcd5b8dd8febb96a0
   2:     0x7feacd1fac4e - std::io::Write::write_fmt::hc422b42d0849f877
   3:     0x7feacd231169 - std::sys_common::backtrace::print::h286fd4354e2ba39e
   4:     0x7feacd230a79 - std::panicking::default_hook::{{closure}}::hf9d4b516f8220f92
   5:     0x7feacd231c25 - std::panicking::rust_panic_with_hook::h124d9722759d43e1
   6:     0x7feacd2314ba - std::panicking::begin_panic_handler::{{closure}}::h1123a3c792c1da95
   7:     0x7feacd231449 - std::sys_common::backtrace::__rust_end_short_backtrace::h33bd6640824974d0
   8:     0x7feacd231436 - rust_begin_unwind
   9:     0x7feac9a320b2 - core::panicking::panic_fmt::hea6c49867823d75c
  10:     0x7feac9a32184 - core::panicking::panic::hfd7eccb65c6169e0
  11:     0x7feac9a32548 - core::option::unwrap_failed::h86dc8dafdfc76144
  12:     0x7feaccdfaaa4 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  13:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  14:     0x7feaccdf900d - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  15:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  16:     0x7feaccdff37b - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  17:     0x7feaccdf46d2 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  18:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  19:     0x7feacce0a711 - core::iter::adapters::map::map_try_fold::{{closure}}::he7ef732b1c8e3e90
  20:     0x7feaccdfe882 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  21:     0x7feaccdf6584 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  22:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  23:     0x7feaccdfcb6e - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_assign::h97ce6db6241b6956
  24:     0x7feaccdf7c3f - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  25:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  26:     0x7feaccdff37b - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  27:     0x7feaccdf46d2 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  28:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  29:     0x7feaccdff37b - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  30:     0x7feaccdf46d2 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  31:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  32:     0x7feacce0a711 - core::iter::adapters::map::map_try_fold::{{closure}}::he7ef732b1c8e3e90
  33:     0x7feaccdfe882 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  34:     0x7feaccdf6584 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  35:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  36:     0x7feaccdfcb6e - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_assign::h97ce6db6241b6956
  37:     0x7feaccdf7c3f - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  38:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  39:     0x7feaccdff37b - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  40:     0x7feaccdf46d2 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  41:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  42:     0x7feaccdff37b - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::pushdown_and_continue::h23c73325c77176a6
  43:     0x7feaccdf46d2 - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::{{closure}}::h5a5bad65de362a3b
  44:     0x7feaccde79eb - polars_plan::logical_plan::optimizer::predicate_pushdown::PredicatePushDown::push_down::h61896591c8f43972
  45:     0x7feaccde614d - polars_plan::logical_plan::optimizer::optimize::hfb7b7173a02b0711
  46:     0x7feacbea3ff6 - polars_lazy::frame::LazyFrame::schema::h73004ade2e466225
  47:     0x7feacaa1429b - polars::lazyframe::PyLazyFrame::__pymethod_columns__::h242dc846d8ce3a0c
  48:     0x7feaca26f0e7 - pyo3::impl_::trampoline::trampoline::h04772e5c587c8251
  49:     0x7feadfaff99a - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5293
  50:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  51:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  52:     0x7feadfb57cf9 - _PyObject_VectorcallTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_call.h:92
  53:     0x7feadfb57cf9 - PyObject_CallOneArg
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:376
  54:     0x7feadfbbbe72 - _PyObject_GenericGetAttrWithDict
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/object.c:1278
  55:     0x7feadfbbb8bc - PyObject_GetAttr
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/object.c:916
  56:     0x7feadfafdd0b - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:3461
  57:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  58:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  59:     0x7feadfb5b26e - _PyObject_VectorcallTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_call.h:92
  60:     0x7feadfb5b26e - method_vectorcall
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/classobject.c:59
  61:     0x7feadfb57648 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:257
  62:     0x7feadfb57648 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
  63:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
  64:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
  65:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  66:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  67:     0x7feadfb575f8 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:245
  68:     0x7feadfb575f8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
  69:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
  70:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
  71:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  72:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  73:     0x7feadfb579c5 - _PyObject_FastCallDictTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:152
  74:     0x7feadfb57c1b - _PyObject_Call_Prepend
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:482
  75:     0x7feadfbd9572 - slot_tp_call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/typeobject.c:7623
  76:     0x7feadfb57841 - _PyObject_MakeTpCall
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:214
  77:     0x7feadfb00d59 - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:4769
  78:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  79:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  80:     0x7feadfb575f8 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:245
  81:     0x7feadfb575f8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
  82:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
  83:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
  84:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  85:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  86:     0x7feadfb579c5 - _PyObject_FastCallDictTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:152
  87:     0x7feadfb57c1b - _PyObject_Call_Prepend
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:482
  88:     0x7feadfbd9572 - slot_tp_call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/typeobject.c:7623
  89:     0x7feadfb575b8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:343
  90:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
  91:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
  92:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  93:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
  94:     0x7feadfb575f8 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:245
  95:     0x7feadfb575f8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
  96:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
  97:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
  98:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
  99:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 100:     0x7feadfb579c5 - _PyObject_FastCallDictTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:152
 101:     0x7feadfb57c1b - _PyObject_Call_Prepend
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:482
 102:     0x7feadfbd9572 - slot_tp_call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/typeobject.c:7623
 103:     0x7feadfb57841 - _PyObject_MakeTpCall
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:214
 104:     0x7feadfb00d59 - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:4769
 105:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
 106:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 107:     0x7feadfb575f8 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:245
 108:     0x7feadfb575f8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
 109:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
 110:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
 111:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
 112:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 113:     0x7feadfb579c5 - _PyObject_FastCallDictTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:152
 114:     0x7feadfb57c1b - _PyObject_Call_Prepend
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:482
 115:     0x7feadfbd9572 - slot_tp_call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/typeobject.c:7623
 116:     0x7feadfb57841 - _PyObject_MakeTpCall
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:214
 117:     0x7feadfb00d59 - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:4769
 118:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
 119:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 120:     0x7feadfb575f8 - _PyVectorcall_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:245
 121:     0x7feadfb575f8 - _PyObject_Call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:328
 122:     0x7feadfafe3ba - do_call_core
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:7352
 123:     0x7feadfafe3ba - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:5376
 124:     0x7feadfc5e37a - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
 125:     0x7feadfc5e37a - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 126:     0x7feadfb579c5 - _PyObject_FastCallDictTstate
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:152
 127:     0x7feadfb57c1b - _PyObject_Call_Prepend
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:482
 128:     0x7feadfbd9572 - slot_tp_call
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/typeobject.c:7623
 129:     0x7feadfb57841 - _PyObject_MakeTpCall
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Objects/call.c:214
 130:     0x7feadfb00d59 - _PyEval_EvalFrameDefault
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:4769
 131:     0x7feadfc5e219 - _PyEval_EvalFrame
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/./Include/internal/pycore_ceval.h:73
 132:     0x7feadfc5e219 - _PyEval_Vector
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:6434
 133:     0x7feadfc5e219 - PyEval_EvalCode
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/ceval.c:1148
 134:     0x7feadfca8121 - run_eval_code_obj
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/pythonrun.c:1710
 135:     0x7feadfca8121 - run_mod
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/pythonrun.c:1731
 136:     0x7feadfca9a20 - pyrun_file
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/pythonrun.c:1626
 137:     0x7feadfca9a20 - _PyRun_SimpleFileObject
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/pythonrun.c:440
 138:     0x7feadfca9f4c - _PyRun_AnyFileObject
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Python/pythonrun.c:79
 139:     0x7feadfccdbae - pymain_run_file_obj
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:360
 140:     0x7feadfccdbae - pymain_run_file
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:379
 141:     0x7feadfccdbae - pymain_run_python
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:601
 142:     0x7feadfccdbae - Py_RunMain
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:680
 143:     0x7feadfcce0a3 - pymain_main
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:710
 144:     0x7feadfcce0a3 - Py_BytesMain
                               at /tmp/python-build.20231205220011.25159/Python-3.11.7/Modules/main.c:734
 145:     0x7feaded0813a - __libc_start_main
 146:           0x40066a - _start
 147:                0x0 - <unknown>


Issue description

It panics intermittently while getting .columns from a LazyFrame. The LazyFrame is sourced from a cloud Parquet file.

Expected behavior

Should never panic.

Installed versions

--------Version info---------
Polars: 0.20.27
Index type: UInt32
Platform: Linux-5.10.216-182.855.x86_64-x86_64-with-glibc2.26
Python: 3.11.7 (main, Dec 5 2023, 22:00:36) [GCC 7.3.1 20180712 (Red Hat 7.3.1-17)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle: 3.0.0
connectorx:
deltalake:
fastexcel:
fsspec: 2024.5.0
gevent:
hvplot:
matplotlib: 3.8.4
nest_asyncio: 1.6.0
numpy: 1.26.4
openpyxl:
pandas: 2.2.2
pyarrow: 16.1.0
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
torch:
xlsx2csv:
xlsxwriter:

@kszlim added the bug, needs triage, and python labels on May 21, 2024
@kszlim
Contributor Author

kszlim commented May 21, 2024

Seems like it was introduced sometime between 0.20.22 -> 0.20.23

@kszlim
Contributor Author

kszlim commented May 21, 2024

@kszlim
Contributor Author

kszlim commented May 21, 2024

I managed to reproduce it with:

import os
import tempfile

import numpy as np
import pandas as pd
import polars as pl
import pyarrow as pa
import pyarrow.parquet as pq

# Parameters
num_records = 1000
num_ids = 10

# Generate random data
data = {
    'some_id': np.random.randint(0, num_ids, num_records),
    'a': np.random.rand(num_records),
    'b': np.random.rand(num_records),
    'c': np.random.rand(num_records)
}

# Convert to a Pandas DataFrame
df = pd.DataFrame(data)

# Convert Pandas DataFrame to a PyArrow Table
table = pa.Table.from_pandas(df)

# Use a temporary directory for output
with tempfile.TemporaryDirectory() as output_dir:
    # Write table to Parquet files partitioned by 'some_id'
    pq.write_to_dataset(
        table,
        root_path=output_dir,
        partition_cols=['some_id']
    )

    print(f"Data generation and partitioning complete. Files are stored in {output_dir}")
    print(os.listdir(output_dir))
    ldf = pl.scan_parquet(f"{output_dir}/**/*.parquet")
    df = ldf.filter(pl.col("some_id").is_in([0, 1, 2, 3])).group_by("some_id").map_groups(
        lambda df: df,
        schema=None
    ).collect()
    print(df)

Not sure if the repro is exactly the same as the columns issue, but I'm guessing it's likely related.

@kszlim changed the title from "Getting intermittent panic when calling LazyFrame.columns" to "Getting intermittent panic when calling LazyFrame.columns or LazyFrame.group_by().map_groups" on May 21, 2024
@kszlim changed the title from "Getting intermittent panic when calling LazyFrame.columns or LazyFrame.group_by().map_groups" to "Getting panic when calling LazyFrame.group_by().map_groups and intermittent panic when calling LazyFrame.columns" on May 21, 2024
@cmdlineluser
Contributor

I can replicate the error.

Out of interest, trying schema={} runs without error.

I removed the pandas/numpy stuff from your example just to rule them out as potential issues:

import tempfile
import polars as pl
import pyarrow.parquet as pq

with tempfile.TemporaryDirectory() as output_dir:
    pq.write_to_dataset(
        pl.DataFrame({"some_id": 0, "a": 1}).to_arrow(),
        root_path=output_dir,
        partition_cols=["some_id"]
    )
    ldf = pl.scan_parquet(f"{output_dir}/**/*.parquet")
    (ldf.filter(pl.col("a").is_in(0))
        .group_by("a")
        .map_groups(lambda df: df, schema=None))

# thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/optimizer/predicate_pushdown/mod.rs:356:69:
# called `Option::unwrap()` on a `None` value

@macukadam

macukadam commented May 23, 2024

I can replicate the error.

Out of interest, trying schema={} runs without error.

I removed the pandas/numpy stuff from your example just to rule them out as potential issues:

import tempfile
import polars as pl
import pyarrow.parquet as pq

with tempfile.TemporaryDirectory() as output_dir:
    pq.write_to_dataset(
        pl.DataFrame({"some_id": 0, "a": 1}).to_arrow(),
        root_path=output_dir,
        partition_cols=["some_id"]
    )
    ldf = pl.scan_parquet(f"{output_dir}/**/*.parquet")
    (ldf.filter(pl.col("a").is_in(0))
        .group_by("a")
        .map_groups(lambda df: df, schema=None))

# thread '<unnamed>' panicked at crates/polars-plan/src/logical_plan/optimizer/predicate_pushdown/mod.rs:356:69:
# called `Option::unwrap()` on a `None` value

Fork: I've tested your example with this small change and it works. It is a one-line check that hive_partition_eval is actually Some, which removes the unwrap call. I'm hesitant to open a pull request, though, since I don't know the root cause.
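
As a rough illustration of the kind of change described (checking for Some instead of calling unwrap), here is a Python analogy. It is not the Rust code from the fork or from Polars: the function names, the path strings, and the fallback of keeping every path when no evaluator is present are all assumptions made for the sketch. Only the name hive_partition_eval comes from the comment above.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a hive partition evaluator: decides whether a
# file path can satisfy the pushed-down predicate.
Evaluator = Callable[[str], bool]


def prune_with_unwrap(paths: List[str], hive_partition_eval: Optional[Evaluator]) -> List[str]:
    # Analogue of Option::unwrap(): a missing evaluator is treated as a bug.
    if hive_partition_eval is None:
        raise RuntimeError("called `Option::unwrap()` on a `None` value")
    return [p for p in paths if hive_partition_eval(p)]


def prune_with_check(paths: List[str], hive_partition_eval: Optional[Evaluator]) -> List[str]:
    # Guarded variant (assumed behavior): with no evaluator, skip pruning
    # and keep every path.
    if hive_partition_eval is None:
        return list(paths)
    return [p for p in paths if hive_partition_eval(p)]


paths = ["some_id=0/part.parquet", "some_id=7/part.parquet"]
only_zero = lambda p: p.startswith("some_id=0/")

print(prune_with_check(paths, only_zero))  # evaluator present: prune
print(prune_with_check(paths, None))       # evaluator missing: keep all
```

A guard like this removes the panic but does not explain why the value is None in the first place, which is the open question in this comment.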

@kszlim
Copy link
Contributor Author

kszlim commented May 23, 2024

I'm guessing your fix just entirely ignores the hive partitioning? I.e., if it's None, it's just not considered at all?

@ritchie46 might know why it's happening; I'm 60% sure it's related to what I posted earlier.

@kszlim
Copy link
Contributor Author

kszlim commented May 28, 2024

Not 100% sure if this is the case, but I believe this gets fixed by #16549 (notably the removal of Default::default() for the hive partition info).

I've compiled the latest main and the repro no longer panics and my full repro case seems to print appropriately.

@ritchie46 ritchie46 self-assigned this May 28, 2024
@ritchie46
Member

Fixed by #16549.

A schema call shouldn't do optimization at all.
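
The backtrace in the report shows the path in question: frames 45 to 47 go from the Python columns getter into LazyFrame::schema and from there into the optimizer. A small sketch for pulling that call chain out of a backtrace follows. The three frames are copied from the log above; the regular expression and the helper name polars_frames are assumptions made for this example, not part of Polars.

```python
import re

# Frames 45-47 copied from the backtrace in this issue.
FRAMES = """\
  45:     0x7feaccde614d - polars_plan::logical_plan::optimizer::optimize::hfb7b7173a02b0711
  46:     0x7feacbea3ff6 - polars_lazy::frame::LazyFrame::schema::h73004ade2e466225
  47:     0x7feacaa1429b - polars::lazyframe::PyLazyFrame::__pymethod_columns__::h242dc846d8ce3a0c
"""

# Frame number, address, symbol, and an optional trailing ::h<16 hex digits> hash.
FRAME_RE = re.compile(r"^\s*(\d+):\s+0x[0-9a-f]+ - (.+?)(?:::h[0-9a-f]{16})?$")


def polars_frames(backtrace: str) -> list[str]:
    """Return the polars symbols in a backtrace, hash suffix removed, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(2).startswith("polars"):
            names.append(match.group(2))
    return names


# Reverse so the chain reads caller -> callee.
call_chain = list(reversed(polars_frames(FRAMES)))
print(" -> ".join(call_chain))
```

Run against the full log, the same helper would also list the repeated PredicatePushDown frames, which is where the unwrap fails.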
