possible issue (sigsegv) with cached_context #340

Open
tjb900 opened this issue Jul 27, 2022 · 5 comments


tjb900 commented Jul 27, 2022

Hi!

I've been tracking down a very rare crash in our application, and I think I've reached a point where I can see a problem in gmpy2. I would very much appreciate it if you could tell me whether or not I'm dreaming!

I believe we are crashing at this line:

/* Return borrowed reference to thread local context. */
static CTXT_Object *
GMPy_current_context(void)
{
    PyThreadState *tstate = PyThreadState_GET();

    if (cached_context && cached_context->tstate == tstate) {     /* <=== the second part of this condition segfaults when dereferencing cached_context */
        return (CTXT_Object*)cached_context;
    }

    return current_context_from_dict();
}

gdb dump showing instruction where fault occurred

Dump of assembler code for function GMPy_current_context:
   0x00007f6b928eaaf0 <+0>:     push   %rbp
   0x00007f6b928eaaf1 <+1>:     push   %rbx
   0x00007f6b928eaaf2 <+2>:     sub    $0x8,%rsp

# I suspect this is the call to PyThreadState_GET
   0x00007f6b928eaaf6 <+6>:     callq  *0x70dec(%rip)        # 0x7f6b9295b8e8
   0x00007f6b928eaafc <+12>:    mov    0x77305(%rip),%rbx        # 0x7f6b92961e08 <cached_context>

# I'm fairly sure this is the `if cached_context`
   0x00007f6b928eab03 <+19>:    test   %rbx,%rbx
   0x00007f6b928eab06 <+22>:    je     0x7f6b928eab0e <GMPy_current_context+30>

# and then this is the `if cached_context->tstate == tstate` - and that's where the crash is
=> 0x00007f6b928eab08 <+24>:    cmp    %rax,0x78(%rbx)
   0x00007f6b928eab0c <+28>:    je     0x7f6b928eab4e <GMPy_current_context+94>
   0x00007f6b928eab0e <+30>:    callq  *0x7087c(%rip)        # 0x7f6b9295b390
   0x00007f6b928eab14 <+36>:    mov    %rax,%rbp
   0x00007f6b928eab17 <+39>:    test   %rax,%rax

Is it possible that the cached_context could point to a thread-local state for a thread that has since exited?
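
To make the question concrete, here is a minimal pure-Python model of the suspected hazard. It is a sketch under stated assumptions, not gmpy2's actual code: `Context`, `thread_dicts`, `current_context` and `thread_exit` are hypothetical names, and a weak reference stands in for the borrowed C pointer held in `cached_context`. It relies on CPython's reference counting to free the object as soon as its owner goes away.

```python
import weakref


class Context:
    """Hypothetical stand-in for gmpy2's CTXT_Object."""

    def __init__(self, tstate):
        self.tstate = tstate  # identifier of the owning thread


# Each "thread" owns its context through its own dictionary (strong reference).
thread_dicts = {}

# The global cache does not own the context. A weak reference plays the role
# of a borrowed C pointer: it does not keep the object alive.
cached_context = None


def current_context(tstate):
    """Return the context for `tstate`, consulting the one-slot cache first."""
    global cached_context
    ctx = cached_context() if cached_context is not None else None
    if ctx is not None and ctx.tstate == tstate:
        return ctx
    ctx = thread_dicts.setdefault(tstate, {}).setdefault("context", Context(tstate))
    cached_context = weakref.ref(ctx)
    return ctx


def thread_exit(tstate):
    """Drop the per-thread dictionary, as happens when a thread finishes."""
    thread_dicts.pop(tstate, None)


current_context(1)  # "thread 1" populates the cache
thread_exit(1)      # "thread 1" exits and its context is freed

# The cache slot is still set, but the object behind it is gone.
cache_is_set = cached_context is not None
target_is_dead = cached_context() is None
print(cache_is_set, target_is_dead)
```

In Python the weak reference simply reports that its target is gone. With a raw C pointer there is no such check, so the equivalent of `cached_context->tstate` would read freed memory.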

Thanks and Kind Regards,
Tim


This is the traceback from the crash - gmpy2 is being used via sympy:

#8  <signal handler called>
#9  0x00007f6b928eab08 in GMPy_current_context () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#10 0x00007f6b929124ac in GMPy_RichCompare_Slot () from ..../site-packages/gmpy2/gmpy2.cpython-38-x86_64-linux-gnu.so
#11 0x000055fceaa200eb in do_richcompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#12 PyObject_RichCompare (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#13 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:796
#14 PyObject_RichCompareBool (op=2, w=<mpz at remote 0x7f6b9298fc00>, v=<mpz at remote 0x7f6ab7ee9870>) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:782
#15 tuplerichcompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/tupleobject.c:655
#16 0x000055fceaa1ea60 in do_richcompare (op=2, w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2)) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:726
#17 PyObject_RichCompare (v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), op=2) at /home/sat_bot/base/conda-bld/python_1648081724180/work/Objects/object.c:774
#18 0x000055fceaac25cb in cmp_outcome (w=(0, <mpz at remote 0x7f6b9298fc00>, 0, 0), v=(0, <mpz at remote 0x7f6ab7ee9870>, 0, 2), op=<optimized out>, tstate=<optimized out>)
    at /home/sat_bot/base/conda-bld/python_1648081724180/work/Python/ceval.c:5111

py-bt:

  File "..../site-packages/sympy/core/numbers.py", line 668, in _eval_evalf
    return Float._new(self._as_mpf_val(prec), prec)
  (frame information optimized out)
  File "..../site-packages/sympy/core/expr.py", line 912, in _eval_is_extended_negative
    return self._eval_is_extended_positive_negative(positive=False)
  File "..../site-packages/sympy/core/assumptions.py", line 501, in _ask
    a = evaluate(obj)
  File "..../site-packages/sympy/core/assumptions.py", line 513, in _ask
    _ask(pk, obj)

Collaborator

casevh commented Jul 28, 2022

IIRC, there have been some changes in handling thread_state in recent versions (3.9 or 3.10 ??).

What version of Python are you using?

Is it reproducible enough to justify the effort in testing with an older version, say 3.7?

Author

tjb900 commented Jul 28, 2022

This is with 3.8 - I haven't had time to work on a reproducer yet but that's my next step, certainly.

Collaborator

casevh commented Sep 11, 2022

I found a simple way to reproduce the issue without involving any other libraries. It is a reference counting bug. Don't know where it is yet but I can trigger it within a few seconds. Interestingly, I can trigger with Python 3.7 but not any later versions.

Collaborator

casevh commented Dec 24, 2022

TL;DR I'm fairly certain I've solved the issue.

We've recently made some changes and are starting to work on the next major release, version 2.2. The minimum supported version of Python is now 3.7. Contextvars were introduced in Python 3.7 as a replacement for thread-local storage when managing application contexts (such as gmpy2's). I converted gmpy2 to use contextvars and my intermittent crashes have stopped.
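
As a rough illustration of the contextvars approach, here is a small sketch using only the standard library. It is not gmpy2's C implementation: `Context`, `get_context`, `set_context` and the `precision` attribute are hypothetical stand-ins chosen for this example. The point it shows is that a `ContextVar` holds a strong reference to its value and that a value set in one thread does not leak into another.

```python
import contextvars
import threading


class Context:
    """Hypothetical stand-in for a gmpy2 context object."""

    def __init__(self, precision=53):
        self.precision = precision


# The ContextVar owns a strong reference to whatever value is set in it.
_current = contextvars.ContextVar("gmpy2_context_sketch")


def get_context():
    """Return the active context, creating a default one on first use."""
    try:
        return _current.get()
    except LookupError:
        ctx = Context()
        _current.set(ctx)
        return ctx


def set_context(ctx):
    """Make `ctx` the active context for the current thread."""
    _current.set(ctx)


seen_in_worker = []


def worker():
    # A change made here stays in the worker thread's own context.
    set_context(Context(precision=200))
    seen_in_worker.append(get_context().precision)


set_context(Context(precision=100))
t = threading.Thread(target=worker)
t.start()
t.join()

# The main thread still sees the context it set before the worker ran.
main_precision = get_context().precision
```

Because the interpreter manages the lifetime of the stored value, there is no separately cached raw pointer that can outlive the thread that created it.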

Can you test your application against a build compiled from the latest source?

Case

@skirpichev
Contributor

@tjb900, now there are binary wheels for 2.2.0a1. Can you reproduce the issue?
