possible issue (sigsegv) with cached_context #340

tjb900 · 2022-07-27T05:47:08Z

Hi!

I've been tracking down a very rare crash in our application, and I think I've reached a point where I can see a problem in gmpy2 - would very much appreciate if you could either tell me I'm dreaming or not!

I believe we are crashing at this line:

/* Return borrowed reference to thread local context. */
static CTXT_Object *
GMPy_current_context(void)
{
    PyThreadState *tstate = PyThreadState_GET();

    if (cached_context && cached_context->tstate == tstate) {     /* <=== segfault in the second operand, dereferencing cached_context */
        return (CTXT_Object*)cached_context;
    }

    return current_context_from_dict();
}

gdb dump showing instruction where fault occurred

Dump of assembler code for function GMPy_current_context:
   0x00007f6b928eaaf0 <+0>:     push   %rbp
   0x00007f6b928eaaf1 <+1>:     push   %rbx
   0x00007f6b928eaaf2 <+2>:     sub    $0x8,%rsp

# I suspect this is the call to PyThreadState_GET
   0x00007f6b928eaaf6 <+6>:     callq  *0x70dec(%rip)        # 0x7f6b9295b8e8
   0x00007f6b928eaafc <+12>:    mov    0x77305(%rip),%rbx        # 0x7f6b92961e08 <cached_context>

# I'm fairly sure this is the `if cached_context`
   0x00007f6b928eab03 <+19>:    test   %rbx,%rbx
   0x00007f6b928eab06 <+22>:    je     0x7f6b928eab0e <GMPy_current_context+30>

# and then this is the `if cached_context->tstate == tstate` - and that's where the crash is
=> 0x00007f6b928eab08 <+24>:    cmp    %rax,0x78(%rbx)
   0x00007f6b928eab0c <+28>:    je     0x7f6b928eab4e <GMPy_current_context+94>
   0x00007f6b928eab0e <+30>:    callq  *0x7087c(%rip)        # 0x7f6b9295b390
   0x00007f6b928eab14 <+36>:    mov    %rax,%rbp
   0x00007f6b928eab17 <+39>:    test   %rax,%rax

Is it possible that the cached_context could point to a thread-local state for a thread that has since exited?

Thanks and Kind Regards,
Tim

This is the traceback from the crash - gmpy2 is being used via sympy:

#8  <signal handler called>
#9  0x00007f6b928eab08 in GMPy_current_context () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#10 0x00007f6b929124ac in GMPy_RichCompare_Slot () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#11 0x000055fceaa200eb in do_richcompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#12 PyObject_RichCompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#13 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:796
#14 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:782
#15 tuplerichcompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/tupleobject.c:655
#16 0x000055fceaa1ea60 in do_richcompare (op=2, w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2)) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#17 PyObject_RichCompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#18 0x000055fceaac25cb in cmp_outcome (w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), op=<optimized out>, tstate=<optimized out>)
    at /home/sat_bot/base/conda-bld/python_1648081724180/work/Python/ceval.c:5111

py-bt:

  File "..../site-packages/sympy/core/numbers.py", line 668, in _eval_evalf
    return Float._new(self._as_mpf_val(prec), prec)
  (frame information optimized out)
  File "..../site-packages/sympy/core/expr.py", line 912, in _eval_is_extended_negative
    return self._eval_is_extended_positive_negative(positive=False)
  File "..../site-packages/sympy/core/assumptions.py", line 501, in _ask
    a = evaluate(obj)
  File "..../site-packages/sympy/core/assumptions.py", line 513, in _ask
    _ask(pk, obj)
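
The hypothesis raised above, that a module-level cache can outlive the object it points to, can be modelled in pure Python. The following is only a minimal sketch, not gmpy2 code: `Context` and `worker` are hypothetical names, and a `weakref.ref` stands in for an uncounted (borrowed) C pointer. In Python the stale reference safely reports `None`; the equivalent C pointer would simply dangle, and dereferencing it could fault.

```python
import gc
import threading
import weakref


class Context:
    """Hypothetical stand-in for a per-thread context object."""


# Module-level cache, loosely analogous to a global C pointer.
cached_context = None


def worker():
    global cached_context
    ctx = Context()                    # owned only by this thread's frame
    cached_context = weakref.ref(ctx)  # the cache does NOT keep ctx alive
    assert cached_context() is ctx     # valid while the owner still exists


t = threading.Thread(target=worker)
t.start()
t.join()
gc.collect()  # not needed on CPython; helps on other interpreters

# The owning thread has exited, so the cached reference is now stale.
stale = cached_context() is None
```

A cache of this kind is only safe if it either owns a reference to the cached object or is invalidated before the object is freed.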

casevh · 2022-07-28T05:14:50Z

IIRC, there have been some changes in handling thread_state in recent versions (3.9 or 3.10 ??).

What version of Python are you using?

Is it reproducible enough to justify the effort in testing with an older version, say 3.7?

tjb900 · 2022-07-28T06:05:06Z

This is with 3.8 - I haven't had time to work on a reproducer yet but that's my next step, certainly.

casevh · 2022-09-11T05:26:59Z

I found a simple way to reproduce the issue without involving any other libraries. It is a reference counting bug. I don't know where it is yet, but I can trigger it within a few seconds. Interestingly, I can trigger it with Python 3.7 but not with any later version.

casevh · 2022-12-24T03:36:18Z

TL;DR I'm fairly certain I've solved the issue.

We've recently made some changes and are starting to work on the next major release - version 2.2. The minimum supported version of Python is now 3.7. Contextvars were introduced in Python 3.7 as a replacement for using thread local storage to manage application contexts (such as gmpy2). I converted to using contextvars and my intermittent crashes have stopped.
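For readers unfamiliar with contextvars, here is a minimal pure-Python sketch of the isolation they provide. It does not show gmpy2's actual C implementation; the variable `precision` and the function `worker` are hypothetical names chosen for illustration.

```python
import contextvars
import threading

# Hypothetical stand-in for a per-context setting such as precision.
precision = contextvars.ContextVar("precision", default=53)


def worker(value, results, index):
    # Each thread runs in its own context, so this set() is not
    # visible to the main thread or to the other worker.
    precision.set(value)
    results[index] = precision.get()


results = [None, None]
threads = [
    threading.Thread(target=worker, args=(value, results, index))
    for index, value in enumerate((100, 200))
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never called set(), so it still sees the default.
main_value = precision.get()
```

The interpreter manages the lifetime of each context, so an extension using this mechanism does not need to maintain its own cache keyed on the thread state.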

Can you test your application compiling from the latest source?

Case

skirpichev · 2023-10-04T08:19:51Z

@tjb900, now there are binary wheels for 2.2.0a1. Can you reproduce the issue?
