feat: add pybind11/gil_safe_call_once.h (to fix deadlocks in pybind11/numpy.h) #4877

Merged (31 commits) on Oct 12, 2023


rwgk
Collaborator

@rwgk rwgk commented Oct 9, 2023

Description

See comments in pybind11/gil_safe_call_once.h for usage and technical details.

The primary (and quite common!) use case is initialization of C++ static variables that involves interactions with the CPython API.

The original motivation for adding pybind11/gil_safe_call_once.h was to fix deadlocks observed after deploying PR #4857 Google-internally (we had to roll back after a few hours).

Explanation in a nutshell (with a lot of help from @tkoeppe and @jbms):

  • static npy_api api = lookup(); introduces a hidden mutex: https://eel.is/c++draft/stmt.dcl#3
  • The GIL is a mutex.
  • 1 mutex + 1 mutex without lock ordering is a classical setup for deadlocks.
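The "hidden mutex" in the first bullet can be made visible with a small standard-library-only sketch. It assumes nothing about pybind11: `lookup_sketch`, `get_api_sketch` and `run_static_init_demo` are hypothetical names, and the initializer is a trivial stand-in for the real `lookup()`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how often the initializer actually runs.
static std::atomic<int> init_calls{0};

// Hypothetical stand-in for `lookup()`. The real one calls into the CPython API.
int lookup_sketch() {
    ++init_calls;
    return 42;
}

int get_api_sketch() {
    // The compiler emits a hidden guard for this block-scope static:
    // concurrent callers wait until the first initialization has completed.
    static int api = lookup_sketch();
    return api;
}

// Calls get_api_sketch() from several threads and returns how often the
// initializer ran.
int run_static_init_demo(int num_threads = 8) {
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([] { (void) get_api_sketch(); });
    }
    for (auto &t : threads) {
        t.join();
    }
    return init_calls.load();
}
```

One plausible deadlock interleaving with the real code: thread A holds the GIL, enters the static initialization, and the initializer temporarily releases the GIL; thread B then takes the GIL and blocks on the hidden guard while A blocks trying to get the GIL back.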

In retrospect: This was a problem all along. PR #4857 only made it much more noticeable.

Suggested changelog entry:

``pybind11/gil_safe_call_once.h`` was added (it needs to be included explicitly). The primary use case is GIL-safe initialization of C++ ``static`` variables.

Ralf W. Grosse-Kunstleve added 2 commits October 8, 2023 22:22
@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

@mtsokol @EthanSteinberg @henryiii FYI

Not yet ready for review.

@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

For easy reference, a copy of the clang-tidy errors (GitHub Actions):

/__w/pybind11/pybind11/include/pybind11/numpy.h:65:5: error: use '= default' to define a trivial default constructor [modernize-use-equals-default,-warnings-as-errors]
    LazyInitializeAtLeastOnceDestroyNever() {}
    ^                                       ~~
                                            = default;
/__w/pybind11/pybind11/include/pybind11/numpy.h:66:5: error: use '= default' to define a trivial destructor [modernize-use-equals-default,-warnings-as-errors]
    ~LazyInitializeAtLeastOnceDestroyNever() {}
    ^                                        ~~
                                             = default;

@tkoeppe
Contributor

tkoeppe commented Oct 9, 2023

No, you can't = default these special members, but you can add constexpr to them!

@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

No, you can't = default these special members,

Thanks! That's what I suspected and yes = default makes it fail on godbolt.

but you can add constexpr to them!

Verified on godbolt, based on the link you shared with me: https://godbolt.org/z/jsE4MW99P

I'll try constexpr here.
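A sketch of why `= default` is not an option here, under the ASSUMPTION that the class at this point kept the value in an anonymous union (the page does not show that code; only the "empty ctor + dtor" trick is mentioned). The class names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// ASSUMPTION: the value lives in an anonymous union so that it is neither
// constructed nor destroyed automatically.
template <typename T>
class EmptyCtorDtorSketch {
public:
    EmptyCtorDtorSketch() {}  // user-provided: leaves value_ unconstructed
    ~EmptyCtorDtorSketch() {} // user-provided: never destroys value_

private:
    union {
        T value_;
    };
};

template <typename T>
class DefaultedCtorDtorSketch {
public:
    // With a union member of non-trivial type, both of these end up
    // defined as deleted (C++11 through C++20 rules).
    DefaultedCtorDtorSketch() = default;
    ~DefaultedCtorDtorSketch() = default;

private:
    union {
        T value_;
    };
};

constexpr bool kEmptyVersionUsable
    = std::is_default_constructible<EmptyCtorDtorSketch<std::string>>::value;
constexpr bool kDefaultedVersionUsable
    = std::is_default_constructible<DefaultedCtorDtorSketch<std::string>>::value;
```

Under that assumption the clang-tidy suggestion would silently turn a usable class into one that cannot be instantiated for non-trivial `T`.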

@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

Oh, hitting a wall much quicker than expected (g++ (Debian 13.2.0-4) 13.2.0):

g++ -o pybind11/tests/test_numpy_array.os -c -std=c++17 -fPIC -fvisibility=hidden -O0 -g -Wall -Wextra -Wconversion -Wcast-qual -Wdeprecated -Wundef -Wnon-virtual-dtor -Wunused-result -Werror -isystem /usr/include/python3.11 -isystem /usr/include/eigen3 -DPYBIND11_STRICT_ASSERTS_CLASS_HOLDER_VS_TYPE_CASTER_MIX -DPYBIND11_ENABLE_TYPE_CASTER_ODR_GUARD_IF_AVAILABLE -DPYBIND11_TEST_BOOST -Ipybind11/include -I/usr/local/google/home/rwgk/forked/pybind11/include -I/usr/local/google/home/rwgk/clone/pybind11/include /usr/local/google/home/rwgk/forked/pybind11/tests/test_numpy_array.cpp
In file included from /usr/local/google/home/rwgk/forked/pybind11/tests/test_numpy_array.cpp:10:
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/numpy.h:66:5: error: ‘constexpr’ destructors only available with ‘-std=c++20’ or ‘-std=gnu++20’
   66 |     constexpr ~LazyInitializeAtLeastOnceDestroyNever() {}
      |     ^~~~~~~~~

@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

Also fails with C++17 on godbolt: https://godbolt.org/z/K8dP1fhv7

…e trick (also suggested by jbms@) is to add empty ctor + dtor."

This reverts commit e7b8c4f.
@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

I tried the aligned char storage alternative on godbolt:

https://godbolt.org/z/q6oEj4dP9

This code (copied from there) compiles:

    using LX = LazyInitializeAtLeastOnceDestroyNever<X>;
    static_assert(std::is_trivially_destructible<LX>::value, "");
    static LX impl;

The Google-internal reproducer for the original deadlock also passes with the aligned char storage alternative.

@tkoeppe You brought up elsewhere:

We need to make sure that this class is constant-initializable

If I understand the godbolt output (link above) correctly, that version of the class is not constant-initializable, is that true?

When is it critical that the class is constant-initializable?

(I'll try the aligned char storage alternative in the GitHub Actions here.)
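For readers without access to the godbolt link, a minimal sketch of what an "aligned char storage" holder can look like. It is not the code from the link: `AlignedCharStorageSketch`, `emplace` and `aligned_storage_demo` are hypothetical names, and the GIL and initialization-flag handling is left out on purpose.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

template <typename T>
class AlignedCharStorageSketch {
public:
    // Constructs a T in place. The object is intentionally never destroyed.
    template <typename... Args>
    T &emplace(Args &&...args) {
        return *::new (static_cast<void *>(storage_)) T(std::forward<Args>(args)...);
    }

    // Only valid after emplace() has run.
    T &get() { return *reinterpret_cast<T *>(storage_); }

private:
    // Raw bytes with the size and alignment of T. No T exists until emplace().
    alignas(T) char storage_[sizeof(T)] = {};
};

// The holder is trivially destructible even though std::string is not.
static_assert(std::is_trivially_destructible<AlignedCharStorageSketch<std::string>>::value,
              "holder must not run T's destructor");

std::string aligned_storage_demo() {
    AlignedCharStorageSketch<std::string> holder;
    holder.emplace("numpy"); // short string, so nothing is leaked in practice
    return holder.get();
}
```

Because the holder only contains `char`s, no destructor needs to be registered for a `static` instance, which is the "destroy never" part of the design.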

@tkoeppe
Contributor

tkoeppe commented Oct 9, 2023

It's not critical, but not having it be constant-initialized and destroyed means that a) you pay for a static guard flag (and its check) after all, and b) you add an element to the global destruction sequence.

This should work in C++17: https://godbolt.org/z/Wa79nKz6e

#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <functional>
#include <numeric>
#include <sstream>
#include <stdalign.h>
Contributor

Remove this, this is a C interop header (and we're not writing C).

Collaborator Author

Done: 109a165

template <typename T>
class LazyInitializeAtLeastOnceDestroyNever {
public:
template <typename Initialize>
Contributor

Document that you expect the GIL to be held, or in any case that calls are serialized. (Otherwise the mutating accesses would cause races.)

Collaborator Author

Done: 1ce2715

T &Get(Initialize &&initialize) {
if (!initialized_) {
assert(PyGILState_Check());
// Multiple threads may run this concurrently, but that is fine.
Contributor

This comment is a bit misleading. Multiple threads may call Get, but multiple threads definitely mustn't get to this point! E.g. initialized_ is not an atomic variable and it'd be a race to access it concurrently.

What is true is that multiple threads may possibly reach the inside of initialize, but that's a separate matter.

Collaborator Author

@rwgk rwgk Oct 9, 2023

I'm uncertain I understand the two points correctly.

My understanding:

From getting burned in the past, I've developed the strong belief that multiple threads do in fact get (EDIT: ~~concurrently~~, see below) into the if (! initialized_) branch.

If the first thread calls back into the Python C API (e.g. for a Python import), Python can and does in the general case release the GIL. That gives other threads the opportunity to also pass the if (! initialized_) test, so the Python initialize() really can run multiple times.

But after the initialize() call we are sure to have the GIL again, therefore there is no race flipping initialized_ from false to true. One lucky thread gets there first. (All other threads that made it to the initialize() call will write the true again, but each under the GIL.) Only then will other threads no longer get into the if (! initialized_) branch.

Would it help to change the comment to like this?

// Multiple threads may reach `initialize()` EDIT~concurrently~(see below), but each holding the GIL at that time.

Fundamentally, the "at least once" aspect here isn't great. "Guaranteed once" would be better.

My thinking:

  • Let's get the "at least once" solution stable. It almost seems to be there, and it solves the original deadlock problem.

  • Then let's look at it again (this PR maybe) to see what it takes to achieve a "guaranteed once" implementation (right here in pybind11, within the limitations of C++11).

Contributor

Sorry, I think when I said "multiple threads can call Get" that's not entirely right. By assuming that Get is only called with the GIL held, we effectively say that only one thread can call Get at a time. What I meant, but didn't say right, was that "multiple threads can attempt to lock the GIL and call Get".

From getting burned in the past, I've developed the strong belief that multiple threads do in fact get concurrently into the if (! initialized_) branch.

I'm not sure how this would happen, and if it did, it'd definitely be a bug, since that's a race.

If the first thread calls back into the Python C API (e.g. for a Python import), Python can and does in the general case release the GIL. That gives other threads the opportunity to also pass the if (! initialized_) test, so the Python initialize() really can run multiple times.

OK, but that is not "concurrently". Only the thread holding the GIL can be accessing initialized_. In order for one thread to release the GIL it must already have completed this access. (And here "completed" can be made precise in the sense of the memory model.)

I don't think "initialize() really can run multiple times" is a precise enough statement to allow for a meaningful interpretation. It is certainly true that multiple threads can execute inside initialize, but it is not (or definitely should not be) true that multiple threads reach the expression initialize() on L56 "concurrently", in the sense of "without sequencing". If one thread is inside initialize and another thread reaches L56, then the former thread must be blocked.

Contributor

@tkoeppe tkoeppe Oct 9, 2023

Maybe we can say something more concrete and specific to this situation:

// It is possible that multiple threads execute `Get` with `initialized_` still being
// false, and thus proceed to execute `initialize()`. For this to happen, `initialize`
// has to release and reacquire the GIL internally. We accept this, and expect the
// operation to be both idempotent and cheap.
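The "at least once" behaviour described in that comment can be simulated without Python. In this sketch a plain `std::mutex` named `fake_gil` plays the role of the GIL (an ASSUMPTION made for illustration), and `AtLeastOnceSketch` and `run_at_least_once_demo` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// ASSUMPTION: a plain mutex stands in for the GIL.
static std::mutex fake_gil;

struct AtLeastOnceSketch {
    bool initialized = false; // only accessed while holding fake_gil
    int value = 0;
    int initialize_calls = 0;

    // PRECONDITION: `gil` owns fake_gil.
    int &get(std::unique_lock<std::mutex> &gil) {
        if (!initialized) {
            // Several threads can get here one after another, because
            // initialize() gives up the lock for a while.
            int candidate = initialize(gil);
            if (!initialized) { // the first thread to finish stores its result
                value = candidate;
                initialized = true;
            }
        }
        return value;
    }

private:
    // Hypothetical initializer that releases and reacquires the "GIL",
    // as a Python import may do.
    int initialize(std::unique_lock<std::mutex> &gil) {
        ++initialize_calls;
        gil.unlock();
        std::this_thread::yield();
        gil.lock();
        return 42;
    }
};

AtLeastOnceSketch run_at_least_once_demo(int num_threads = 4) {
    AtLeastOnceSketch state;
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&state] {
            std::unique_lock<std::mutex> gil(fake_gil);
            (void) state.get(gil);
        });
    }
    for (auto &t : threads) {
        t.join();
    }
    return state;
}
```

Every access to `initialized` happens under the lock, so there is no data race, yet `initialize_calls` may end up anywhere between 1 and the number of threads. That is why the initializer has to be idempotent and cheap.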

Collaborator Author

Sorry I appear to be mixing up terms, which is critical here.

What is the correct term? (I want to edit-correct my comment.)

E.g.

I've developed the strong belief that multiple threads do in fact get concurrently CORRECT_TERM_HERE into the if (! initialized_) branch.

I want to say what you are saying.

Although all I originally wanted: leave a meaningful, terse, and correct comment in the code, just enough to give readers a good clue to follow up on in case they want to fully understand.

Collaborator Author

Oh, I'm seeing your suggested comment only now. Will adopt.

Collaborator Author

Done: 1ce2715

Ralf W. Grosse-Kunstleve added 6 commits October 9, 2023 06:47
* `include\pybind11/numpy.h(24,10): fatal error C1083: Cannot open include file: 'stdalign.h': No such file or directory`

* @tkoeppe wrote: this is a C interop header (and we're not writing C)
```
include/pybind11/eigen/../numpy.h:63:53: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
         return *reinterpret_cast<T *>(value_storage_);
                                                     ^
```
Document PRECONDITION.

Adopt comment suggested by @tkoeppe: pybind#4877 (comment)
…ses:

```
g++ -o pybind11/tests/test_numpy_array.os -c -std=c++20 -fPIC -fvisibility=hidden -O0 -g -Wall -Wextra -Wconversion -Wcast-qual -Wdeprecated -Wundef -Wnon-virtual-dtor -Wunused-result -Werror -isystem /usr/include/python3.11 -isystem /usr/include/eigen3 -DPYBIND11_STRICT_ASSERTS_CLASS_HOLDER_VS_TYPE_CASTER_MIX -DPYBIND11_ENABLE_TYPE_CASTER_ODR_GUARD_IF_AVAILABLE -DPYBIND11_TEST_BOOST -Ipybind11/include -I/usr/local/google/home/rwgk/forked/pybind11/include -I/usr/local/google/home/rwgk/clone/pybind11/include /usr/local/google/home/rwgk/forked/pybind11/tests/test_numpy_array.cpp
```

```
In file included from /usr/local/google/home/rwgk/forked/pybind11/tests/test_numpy_array.cpp:10:
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/numpy.h: In static member function ‘static pybind11::detail::npy_api& pybind11::detail::npy_api::get()’:
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/numpy.h:258:82: error: ‘constinit’ variable ‘api_init’ does not have a constant initializer
  258 |         PYBIND11_CONSTINIT static LazyInitializeAtLeastOnceDestroyNever<npy_api> api_init;
      |                                                                                  ^~~~~~~~
```

```
In file included from /usr/local/google/home/rwgk/forked/pybind11/tests/test_numpy_array.cpp:10:
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/numpy.h: In static member function ‘static pybind11::object& pybind11::dtype::_dtype_from_pep3118()’:
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/numpy.h:697:13: error: ‘constinit’ variable ‘imported_obj’ does not have a constant initializer
  697 |             imported_obj;
      |             ^~~~~~~~~~~~
```
@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

Just to highlight with a full comment here: constinit didn't work for the current use cases: f07b28b

I think we'd need to get into class object and struct npy_api. Maybe better left for later?

@tkoeppe
Contributor

tkoeppe commented Oct 9, 2023

Just to highlight with a full comment here: constinit didn't work for the current use cases: f07b28b

I think we'd need to get into class object and struct npy_api. Maybe better left for later?

You're missing the DMI on buffer_:

... buffer_ {};

@tkoeppe
Contributor

tkoeppe commented Oct 9, 2023

Just to highlight with a full comment here: constinit didn't work for the current use cases: f07b28b
I think we'd need to get into class object and struct npy_api. Maybe better left for later?

You're missing the DMI on buffer_:

... buffer_ {};

Or rather:

alignas(T) char value_storage_[sizeof(T)] = {};
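A small sketch of the effect of that default member initializer (DMI), using hypothetical names (`ConstantInitSketch`, `kProbe`). A namespace-scope `constexpr` object is used as a C++17-compatible probe: the declaration only compiles if the object can be constant-initialized.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

template <typename T>
class ConstantInitSketch {
public:
    // Valid as `constexpr` because every member has a default member initializer.
    constexpr ConstantInitSketch() = default;

    constexpr bool is_initialized() const { return is_initialized_; }
    constexpr std::size_t capacity() const { return sizeof(value_storage_); }

private:
    alignas(T) char value_storage_[sizeof(T)] = {}; // the DMI suggested above
    bool is_initialized_ = false;
};

// Compiles only if the holder can be constant-initialized. T itself
// (std::string here) does not need to be constant-initializable.
constexpr ConstantInitSketch<std::string> kProbe{};
static_assert(!kProbe.is_initialized(), "a fresh holder starts out empty");
static_assert(kProbe.capacity() == sizeof(std::string), "storage matches T");
```

With constant initialization in place, a function-local `static` holder should need no runtime guard, which is the cost mentioned earlier in the thread.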

@EthanSteinberg
Collaborator

This looks really good! One minor comment: This code will eventually be completely removed when we switch to numpy's public Python API in about two months.

assert(PyGILState_Check());
auto value = initialize();
if (!initialized_) {
new (reinterpret_cast<T *>(value_storage_)) T(std::move(value));
Contributor

The inner cast is bogus. Placement-new takes a void*, so you should at best cast to void*. However, this conversion is also available implicitly, and so unless you want to avoid hitting an overloaded placement new operator, you can just drop it:

new (value_storage_) T(args...);   // OK
::new (static_cast<void*>(value_storage_)) T(args...);  // Paranoidly correct

Collaborator Author

This will be commit a864f21 (see below). Sorry I missed this before triggering the currently running GHA.

I'll work on moving things to the right places before running GHA again.

    Semi-paranoid placement new (based on https://github.com/pybind/pybind11/pull/4877#discussion_r1350573114).

diff --git a/include/pybind11/numpy.h b/include/pybind11/numpy.h
index 47050a36..1c76177f 100644
--- a/include/pybind11/numpy.h
+++ b/include/pybind11/numpy.h
@@ -66,7 +66,7 @@ public:
             assert(PyGILState_Check());
             auto value = initialize();
             if (!initialized_) {
-                new (reinterpret_cast<T *>(value_storage_)) T(std::move(value));
+                ::new (value_storage_) T(std::move(value));
                 initialized_ = true;
             }
         }

Comment on lines 78 to 79
~LazyInitializeAtLeastOnceDestroyNever()
= default;
Contributor

Can this go on a single line?

Collaborator Author

Only with

// clang-format off
...
// clang-format on

unfortunately. I feel it's better to just let clang-format do what it wants, but let me know if you prefer the override.

Contributor

Oh OK, sure, never mind!

Collaborator Author

I found a nicer solution: PYBIND11_DTOR_CONSTEXPR

@rwgk
Collaborator Author

rwgk commented Oct 9, 2023

This looks really good! One minor comment: This code will eventually be completely removed when we switch to numpy's public Python API in about two months.

Thanks, but

  • The future is always uncertain.
  • I don't want to fall behind master for more than a few days at most.
  • LazyInitializeAtLeastOnceDestroyNever is here to stay, independently from numpy.h: It solves a general problem.

@rwgk
Collaborator Author

rwgk commented Oct 10, 2023

With the latest insights and removal of std::atomic, I feel we're done with the details of the lock juggling magic.

Regarding the API: we could just have

class gil_safe_call_once
  T &get(Callable &&fn)

but that will

  1. Most likely leave readers of client code wondering what the heck it is doing. They'll have to find and read documentation and/or source code. In aggregate over the lifetime of pybind11 that'll be a lot of lost time.

  2. See the diff in docs/advanced/exceptions.rst:

    PYBIND11_CONSTINIT static py::gil_safe_call_once_and_store<py::object> exc_storage;
    exc_storage.call_once_and_store_result(
        [&]() { return py::exception<MyCustomException>(m, "MyCustomError"); });
    py::register_exception_translator([](std::exception_ptr p) {
        try {
            if (p) std::rethrow_exception(p);
        } catch (const MyCustomException &e) {
            py::set_error(exc_storage.get_stored(), e.what());
        } catch (const OtherException &e) {
            py::set_error(PyExc_RuntimeError, e.what());
        }

What would that look like with the simpler API?

@rwgk
Collaborator Author

rwgk commented Oct 11, 2023

Unfortunately I developed a suspicion that the lock juggling magic is actually still not complete, and it does not want to go away:

     template <typename Callable>
     gil_safe_call_once_and_store &call_once_and_store_result(Callable &&fn) {
         if (!is_initialized_) { // This read is guarded by the GIL.
             // Multiple threads may enter here, because CPython API calls in the
             // `fn()` call below may release and reacquire the GIL.
             gil_scoped_release gil_rel; // Needed to establish lock ordering.
             std::call_once(once_flag_, [&] {
                 // Only one thread will ever enter here.
                 gil_scoped_acquire gil_acq;
                 ::new (storage_) T(fn());
                 is_initialized_ = true; // This write is guarded by the GIL.
             });
+            // Is it possible that one (or more) of the multiple threads entering above
+            // reaches this point BEFORE `is_initialized_` is true?
         }
         // Intentionally not returning `T &` to ensure the calling code is self-documenting.
         return *this;
     }

Under https://en.cppreference.com/w/cpp/thread/call_once I found this:

If, by the time std::call_once is called, flag indicates that f was already called, std::call_once returns right away (such a call to std::call_once is known as passive).

I think we need to name the threads:

  • There is one thread that makes the fn() call: "Lucky"
  • There are one or more different threads for which is_initialized_ is false but std::call_once "returns right away": "Limbo"

What we need is a mechanism that blocks the Limbo threads until the Lucky thread has flipped is_initialized_ from false to true.

Is that somehow happening already? If not, how can we fix that?

@tkoeppe
Contributor

tkoeppe commented Oct 11, 2023

Under https://en.cppreference.com/w/cpp/thread/call_once I found this:

If, by the time std::call_once is called, flag indicates that f was already called, std::call_once returns right away (such a call to std::call_once is known as passive).

I think we need to name the threads:

  • There is one thread that makes the fn() call: "Lucky"
  • There are one or more different threads for which is_initialized_ is false but std::call_once "returns right away": "Limbo"

What we need is a mechanism that blocks the Limbo threads until the Lucky thread has flipped is_initialized_ from false to true.

Is that somehow happening already? If not, how can we fix that?

In the scenario you describe, the thread that calls std::call_once and that call returns right away must also see is_initialized_ being true on exit, since the return from std::call_once happens after the once-callee returns. I think this is described in the details of that document you linked.

To repeat: yes, is_initialized_ can definitely be observed as false by multiple threads on entry, but it is definitely true for all threads on exit. We don't check this, of course, but to reassure yourself you could try patching in assert(gil_held() && is_initialized_); after the call_once.
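The guarantee can also be checked outside pybind11 with a standard-library-only sketch. `all_threads_see_initialized` is a hypothetical helper, and the flag is deliberately a plain `bool` rather than an atomic.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Returns true if every thread observed the flag as true right after its
// std::call_once call returned, whether that call was active or passive.
bool all_threads_see_initialized(std::size_t num_threads = 8) {
    std::once_flag once;
    bool is_initialized = false; // plain bool, deliberately not atomic
    std::vector<char> seen(num_threads, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < num_threads; ++i) {
        threads.emplace_back([&, i] {
            std::call_once(once, [&] {
                // Make the "Lucky" thread slow so that the others arrive
                // while it is still inside the callable.
                std::this_thread::sleep_for(std::chrono::milliseconds(20));
                is_initialized = true;
            });
            // Each thread writes only its own slot.
            seen[i] = is_initialized ? 1 : 0;
        });
    }
    for (auto &t : threads) {
        t.join();
    }
    for (char s : seen) {
        if (!s) {
            return false;
        }
    }
    return true;
}
```

Threads that arrive while the callable is still running block inside `std::call_once` until it has returned, so no "Limbo" thread can observe `false` afterwards.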

@rwgk
Collaborator Author

rwgk commented Oct 11, 2023

To repeat: yes, is_initialized_ can definitely be observed as false by multiple threads on entry, but it is definitely true for all threads on exit.

Awesome, thanks! I was just beginning to think that must be the case after inspecting the absl::call_once code and seeing this there. Reading the cppreference page with that in mind, my doubts began to go away.

We don't check this, of course, but to reassure yourself you could try patching in assert(gil_held() && is_initialized_); after the call_once.

I don't think we need that, that's too much on the side of unit-testing the std::call_once implementation, but I'll work on a comment that documents our conclusion here.

@tkoeppe
Contributor

tkoeppe commented Oct 11, 2023

You shouldn't have to read any code to understand an API, especially not one as important as this one. If there's any aspect of any documentation that left you unsure about the semantics, please file appropriate bugs (or editorial issues).

@tkoeppe
Contributor

tkoeppe commented Oct 11, 2023

I think the fact that "every return from call_once(once, f) happens after the return from f()" is an absolutely central feature of this facility; hopefully this is clear from its documentation(s) and also clear to users. It is very much like static-variable initialization in that regard, as indeed it is a user-controlled replacement for that. It can't really be anything else. A post-condition of "call once" is that a call happened (and completed).

@rwgk
Collaborator Author

rwgk commented Oct 11, 2023

You shouldn't have to read any code to understand an API, especially not one as important as this one. If there's any aspect of any documentation that left you unsure about the semantics, please file appropriate bugs (or editorial issues).

I think the documentation is great.

It's just me: The problem for me is that I'm only vaguely familiar with the definition of terms used there, including some that may seem basic to someone more knowledgeable about threads. My first instinct in such situations (very common) is to look at the code.

@rwgk
Collaborator Author

rwgk commented Oct 11, 2023

With all comments stripped out, and defines resolved for a modern compiler, the gil_safe_call_once_and_store implementation is a mere 23 non-empty lines btw:

template <typename T>
class gil_safe_call_once_and_store {
public:
    template <typename Callable>
    gil_safe_call_once_and_store &call_once_and_store_result(Callable &&fn) {
        if (!is_initialized_) {
            gil_scoped_release gil_rel;
            std::call_once(once_flag_, [&] {
                gil_scoped_acquire gil_acq;
                ::new (storage_) T(fn());
                is_initialized_ = true;
            });
        }
        return *this;
    }

    T &get_stored() { return *reinterpret_cast<T *>(storage_); }

    constexpr gil_safe_call_once_and_store() = default;
    constexpr ~gil_safe_call_once_and_store() = default;

private:
    alignas(T) char storage_[sizeof(T)] = {};
    std::once_flag once_flag_ = {};
    bool is_initialized_ = false;
};
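To experiment with the locking pattern without a Python interpreter, the same structure can be sketched against a stand-in lock. This is not the pybind11 implementation: `fake_gil` is a plain `std::mutex` assumed to play the role of the GIL, the caller passes its lock in explicitly, and all names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

static std::mutex fake_gil; // ASSUMPTION: stand-in for the real GIL

template <typename T>
class call_once_and_store_sketch {
public:
    // PRECONDITION: `gil` owns fake_gil (the analogue of "the GIL is held").
    // Not exception-safe. The real code relies on RAII guards instead.
    template <typename Callable>
    call_once_and_store_sketch &
    call_once_and_store_result(std::unique_lock<std::mutex> &gil, Callable &&fn) {
        if (!is_initialized_) { // read guarded by fake_gil
            gil.unlock();       // role of gil_scoped_release: fixes the lock order
            std::call_once(once_flag_, [&] {
                // Only one thread ever gets here. Role of gil_scoped_acquire:
                std::lock_guard<std::mutex> reacquired(fake_gil);
                ::new (static_cast<void *>(storage_)) T(fn());
                is_initialized_ = true; // write guarded by fake_gil
            });
            gil.lock(); // gil_scoped_release would do this in its destructor
        }
        return *this;
    }

    // Only valid after call_once_and_store_result() has returned.
    T &get_stored() { return *reinterpret_cast<T *>(storage_); }

private:
    alignas(T) char storage_[sizeof(T)] = {};
    std::once_flag once_flag_;
    bool is_initialized_ = false;
};

// Runs the pattern from several threads and returns how often fn was called.
int run_call_once_demo(int num_threads = 8) {
    call_once_and_store_sketch<int> storage;
    int fn_calls = 0; // only modified inside the once-callable
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&] {
            std::unique_lock<std::mutex> gil(fake_gil);
            int stored = storage
                             .call_once_and_store_result(gil,
                                                         [&] {
                                                             ++fn_calls;
                                                             return 42;
                                                         })
                             .get_stored();
            assert(stored == 42);
        });
    }
    for (auto &t : threads) {
        t.join();
    }
    return fn_calls;
}
```

The point of releasing the lock before `std::call_once` is that every thread then takes the two locks in the same order (once-lock first, stand-in GIL second), which removes the inversion described at the top of the thread.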

Collaborator

@EthanSteinberg EthanSteinberg left a comment

Looks all good to me now! Thanks for explaining all the intricate parts.

Contributor

@tkoeppe tkoeppe left a comment

All pending comment threads are resolved as far as I'm concerned.

@rwgk rwgk changed the title from "Fix deadlocks in pybind11/numpy.h" to "Add pybind11/gil_safe_call_once.h (to fix deadlocks in pybind11/numpy.h)" on Oct 12, 2023
@rwgk
Collaborator Author

rwgk commented Oct 12, 2023

Thanks for all the help, and the reviews!

This PR was deployed Google-internally already a few hours ago.

@rwgk rwgk merged commit 0e2c3e5 into pybind:master Oct 12, 2023
85 checks passed
@rwgk rwgk deleted the numpy_h_v2_fix branch October 12, 2023 04:05
@github-actions github-actions bot added the "needs changelog" (Possibly needs a changelog entry) label on Oct 12, 2023
gigony added a commit to gigony/cucim that referenced this pull request Oct 25, 2023
This PR updates pybind11 to v2.11.1.

Even with the latest version of pybind11, we still have an issue
with `pybind11::array_t` when cuCIM is used in multithread without
importing numpy in the main thread.

pybind/pybind11#4877

Will need to wait for the next release of pybind11.
gigony added a commit to gigony/cucim that referenced this pull request Oct 26, 2023
This PR updates pybind11 to v2.11.1.

Even with the latest version of pybind11, we still have an issue
with `pybind11::array_t` when cuCIM is used in multithread without
importing numpy in the main thread.

pybind/pybind11#4877

Will need to wait for the next release of pybind11.

Signed-off-by: Gigon Bae <gbae@nvidia.com>
gigony added a commit to gigony/cucim that referenced this pull request Oct 26, 2023
This applies the following patches to pybind11:

- pybind/pybind11#4857
- pybind/pybind11#4877

to avoid deadlock when using pybind11 without importing numpy in
multi-threaded environment.
rapids-bot bot pushed a commit to rapidsai/cucim that referenced this pull request Oct 30, 2023
…mmand (#618)

### Update Catch2 to v3.4.0

Without upgrading Catch2, The following error occurs when building on
Ubuntu 22.04 due to glibc:
```
cucim/build-debug/_deps/deps-catch2-src/single_include/catch2/catch.hpp:10830:58: error: call to non-‘constexpr’ function
‘long int sysconf(int)’
10830 |     static constexpr std::size_t sigStackSize = 32768 >=  MINSIGSTKSZ ? 32768 : MINSIGSTKSZ;
```
### Update pybind11 to v2.11.1

Even with the latest version of pybind11, we still have an issue
with `pybind11::array_t` when cuCIM is used in multithread without
importing numpy in the main thread.

See pybind/pybind11#4877

Will need to wait for the next release of pybind11.

### Use runtime option instead of using nvidia-docker command

nvidia-docker binary is not available if user doesn't install
nvidia-docker2 package. This change uses runtime option instead
of using nvidia-docker command.

### Apply pybind11 patch to avoid deadlock (until new release is available)

This applies the following patches to pybind11:

- pybind/pybind11#4857
- pybind/pybind11#4877

to avoid deadlock when using pybind11 without importing numpy in
multi-threaded environment.

Authors:
  - Gigon Bae (https://github.com/gigony)
  - Gregory Lee (https://github.com/grlee77)

Approvers:
  - Gregory Lee (https://github.com/grlee77)
  - https://github.com/jakirkham

URL: #618
@henryiii henryiii changed the title from "Add pybind11/gil_safe_call_once.h (to fix deadlocks in pybind11/numpy.h)" to "feat: add pybind11/gil_safe_call_once.h (to fix deadlocks in pybind11/numpy.h)" on Nov 15, 2023
@henryiii henryiii removed the "needs changelog" (Possibly needs a changelog entry) label on Mar 27, 2024