Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add #[pyo3(allow_threads)] to release the GIL in (async) functions #3610

Closed
wants to merge 1 commit into from

Conversation

wyfo
Copy link
Contributor

@wyfo wyfo commented Nov 30, 2023

Relates to #1632

Draft based on #3609

@wyfo wyfo marked this pull request as draft November 30, 2023 02:59
@wyfo wyfo force-pushed the allow_threads branch 3 times, most recently from 32c2050 to 77b24fa Compare November 30, 2023 03:11
Copy link

codspeed-hq bot commented Nov 30, 2023

CodSpeed Performance Report

Merging #3610 will not alter performance

Comparing wyfo:allow_threads (f51c087) with main (7263fa9)

Summary

✅ 67 untouched benchmarks

@wyfo wyfo force-pushed the allow_threads branch 7 times, most recently from 905f1eb to 3bc7d67 Compare December 7, 2023 07:55
@wyfo wyfo marked this pull request as ready for review December 7, 2023 07:57
@wyfo
Copy link
Contributor Author

wyfo commented Dec 7, 2023

@davidhewitt Here is the next one! I'm more explicit now 😄

src/coroutine.rs Outdated Show resolved Hide resolved
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

src/coroutine.rs Outdated Show resolved Hide resolved
@davidhewitt davidhewitt added this pull request to the merge queue Dec 9, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 9, 2023
@adamreichold
Copy link
Member

@davidhewitt I have a few minutes now and the merge queue failed. If we could hold off on merging I could have a look as well?

@@ -16,7 +16,7 @@ error: expected argument from function definition `y` but got argument `x`
13 | #[pyo3(signature = (x))]
| ^

error: expected one of: `name`, `pass_module`, `signature`, `text_signature`, `crate`
error: expected one of: `allow_threads`, `name`, `pass_module`, `signature`, `text_signature`, `crate`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I would really want to have here is a test that checks that I cannot get py: Python<'_> as my argument when using allow_threads. This should not be possible to do as the GIL is released, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ideally, it would also have a nice error message explaining why these two contradict each other.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would have a compilation error, because Python is not Ungil, but you're right, I will add that case in the UI tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As expected, there is a compilation error, but the output will not make @davidhewitt happy 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add an error message for Python, but not really for things like &PyAny

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uff. Could you split the test into two, one for the GIL token and for the GIL bound reference?

For the GIL token: Since we are injecting this with our proc macro machinery, we should be able to do something about the error message for this one, shouldn't we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can add an error message for Python, but not really for things like &PyAny

Indeed, but I think we should do what we can. Sorry for putting more work on your table.

Copy link
Contributor Author

@wyfo wyfo Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an issue, it will just wait a few hours but I will do the modification 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just for coverage, we should also duplicate the compiler error test for plain and async functions.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a check for Python<'_> field for both allow_threads and async functions. The error raised by Bound<'_, PyAny> for these cases is still ugly, but as said earlier, I don't think it's possible to add a pretty error message, and this is already the case for Python::allow_threads tests in ui/not_send.rs and ui/not_send2.rs.

@davidhewitt
Copy link
Member

Sure thing, I'll leave this for you to review and merge 👍

src/coroutine.rs Outdated Show resolved Hide resolved
src/coroutine.rs Outdated Show resolved Hide resolved
src/coroutine.rs Outdated
self: Pin<&mut Self>,
py: Python<'_>,
waker: &Waker,
allow_threads: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be me going off into the weeds, but we turning a compile time information into a runtime flag here and generating code to handle both cases even only one will ever be executed.

Hence, I wonder if we should use a const generic const ALLOW_THREADS: bool on Coroutine to ensure that only the required code is generated?

Copy link
Contributor Author

@wyfo wyfo Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, that's what I have in my POC. I'd made the whole Coroutine generic and boxed it in the #[pyclass].
But I decide to go for simplicity here. I can change that if you think the performance impact will be significant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think simplicity is certainly good for the first iteration and this could definitely be a follow-up PR.

That said, I was wondering if we could hide this behind the boxed CoroutineFuture trait to avoid generics all the way up, e.g. a marker wrapper like

struct AllowThreads<F>(F);

impl<F, T, E> CoroutineFuture for AllowThreads<F>
where
    F: Future<Output = Result<T, E>> + Send,
    T: IntoPy<PyObject> + Send,
    E: Into<PyErr> + Send;

but I am not sure how well this can be tucked away to avoid complicating the instantiation too much.

Copy link
Contributor Author

@wyfo wyfo Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

EDIT: I've kept allow_thread as a function parameter instead of generic parameter in the constructor, because it will be more user-friendly when the constructor will be made public.

@wyfo
Copy link
Contributor Author

wyfo commented Apr 18, 2024

#4033 broke some code here, I will update the PR.
Could I ask then for a new review? @adamreichold? I would like to unblock the async PRs to close the async chapter for the next release.

@wyfo wyfo force-pushed the allow_threads branch 2 times, most recently from 40c0b9f to 08948df Compare April 25, 2024 09:19
@wyfo wyfo force-pushed the allow_threads branch from 08948df to f51c087 Compare May 7, 2024 08:26
Copy link
Contributor

@Icxolu Icxolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but this is definitely a new area for me, so someone with more experience here should have a look as well.

We might additionally want a test to verify that for (async) allow_threads methods only &self/&mut self receivers are allowed and Bound/PyRef/PyRefMut are rejected.

fn without_gil(_arg1: PyObject, _arg2: PyObject) {
GIL_COUNT.with(|c| assert_eq!(c.get(), 0));
}
Python::with_gil(|gil| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally call the Python token py. I think we should stick with that for consistency here as well.

@@ -88,10 +125,10 @@ impl Coroutine {
} else {
self.waker = Some(Arc::new(AsyncioWaker::new()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can split this here and do something like

let waker = self
    .waker
    .get_or_insert_with(|| Arc::new(AsyncioWaker::new()));

after the (possible) reset, to avoid the unwrap()s below?

@@ -54,6 +54,6 @@ fn test_compile_errors() {
t.compile_fail("tests/ui/invalid_pymodule_two_pymodule_init.rs");
#[cfg(feature = "experimental-async")]
#[cfg(any(not(Py_LIMITED_API), Py_3_10))] // to avoid PyFunctionArgument for &str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given you've changed the argument types to i32, I think this comment/feature gate is not accurate anymore.

{
ensure_spanned!(
self.asyncness.is_none(),
py_arg.ty.span() => "GIL token cannot be passed to async function"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it looks like this already doesn't work so this makes the error message nicer. One suggestion given the GIL is potentially an outdated idea:

Suggested change
py_arg.ty.span() => "GIL token cannot be passed to async function"
py_arg.ty.span() => "`Python<'_>` token cannot be passed to async functions"

);
ensure_spanned!(
self.allow_threads.is_none(),
py_arg.ty.span() => "GIL cannot be held in function annotated with `allow_threads`"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here I think we want to just name the type:

Suggested change
py_arg.ty.span() => "GIL cannot be held in function annotated with `allow_threads`"
py_arg.ty.span() => "`Python<'_>` cannot be passed to a function annotated with `allow_threads`"

quote! {{
#self_arg_decl
#(let #arg_names = #args;)*
py.allow_threads(|| function(#self_arg_name #(#arg_names),*))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a question in my head here about after #3646 which of the allow_threads variants we will use here (cc @adamreichold). I think it would be possible to create a scoped TLS issue by setting up a TLS scope and then calling into Python. Similar thinking applies to the async case below. I guess we have to stick to the principle that the soundness is important and use the "safe but slow" form for this attribute. If users want to go unsafe they can release the GIL themselves. This feels the right way to do it, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels the right way to do it, I think.

Yes, I think so too. Especially since we have already accumulated a significant amount of high risk changes for 0.22 and we can always optimize things in following releases.

@davidhewitt
Copy link
Member

@wyfo I may try to rebase and merge this tomorrow at the PyCon sprints. Any objections?

@davidhewitt davidhewitt mentioned this pull request Jun 16, 2024
5 tasks
@itamarst
Copy link
Contributor

If anyone has time to work on this, it could make a PR I need to write have rather smaller diffs 😁

@Icxolu
Copy link
Contributor

Icxolu commented Oct 31, 2024

@davidhewitt If we're good with including this in 0.23 I could go rebase this and fixup the last final remarks from above.

@davidhewitt
Copy link
Member

I would love to have this, but I am worried about interactions with #3646. I think merging this before that PR is potentially a problem (might have breakage), and I am worried that PR is necessary but also makes async extremely complicated (because that PR basically disables all TLS with allow_threads, but the tokio runtime is frequently accessed by TLS, for example). I haven't had a chance to sit down and reason through all of this myself, however I would be happy to hand over for someone else to evaluate the design space and understand how bad the interaction between #3646 and async is.

@itamarst
Copy link
Contributor

itamarst commented Nov 1, 2024

(My personal use case is non-async functions, so I'd be happy with this feature to land with initially just supporting normal functions. But I do see why this would be much more of a useful thing for async functions.)

@mejrs
Copy link
Member

mejrs commented Nov 1, 2024

The interaction between async runtimes, threadlocals and #3646 remains whether or not this PR is merged. Is your concern that we'll have more fallout if we merge this and then that?

@wyfo
Copy link
Contributor Author

wyfo commented Nov 1, 2024

@davidhewitt Why does the PR disable all TLS?

@mejrs
Copy link
Member

mejrs commented Nov 2, 2024

@davidhewitt Why does the PR disable all TLS?

Merging that PR means that allow_threads runs its closures on a threadpool, so it's executed on a different thread. It'd be surprising to people who are accessing threadlocals from it.

@davidhewitt
Copy link
Member

The interaction between async runtimes, threadlocals and #3646 remains whether or not this PR is merged. Is your concern that we'll have more fallout if we merge this and then that?

Yes, I still feel like we should aim to do something like #3646 but the more I look at it the more extreme the fallout seems to be. Merging this seems to encourage use of allow_threads by making it really cheap to slap as an attribute on basically everything. Which I do think is probably a good thing, but if #3646 then goes and breaks all those, I'm worried about getting the order of these additions right.

@itamarst
Copy link
Contributor

itamarst commented Nov 2, 2024

Merging that PR means that allow_threads runs its closures on a threadpool, so it's executed on a different thread. It'd be surprising to people who are accessing threadlocals from it.

As a user I would find the use of a thread pool very surprising and it would make #[pyo3(allow_threads)] into something I probably can't use. As context I'm looking at using this in Polars Python APIs, that all need to release the GIL before calling into Polars' internal thread pool (otherwise you get deadlocks). #[pyo3(allow_threads)] resulting in dispatching to a completely different thread pool first would I suspect be undesirable to Polars core devs.

(From my personal perspective the thread pool option seems very intrusive and in my own projects I would likely default to the unsound-if-you-put-Python-objects-into-TLS version that runs in current thread.)

More practically for this PR: there could be two different attributes, one for each proposed allow_threads strategy?

@Icxolu
Copy link
Contributor

Icxolu commented Nov 2, 2024

More practically for this PR: there could be two different attributes, one for each proposed allow_threads strategy?

I don't think we can add an attribute for any unsafe strategy. These rely on safety invariants that we can't validate within our macro code. I think it is important that this is spelled out in the code and not hidden within an attribute.

@wyfo
Copy link
Contributor Author

wyfo commented Nov 2, 2024

@davidhewitt
Copy link
Member

Yes, though it's still in RFC and we might want to see how that works out: rust-lang/rfcs#3715

@wyfo
Copy link
Contributor Author

wyfo commented Jan 14, 2025

@davidhewitt I've just realized that this feature shouldn't be an issue regarding TLS smuggling. Indeed, functions annotated with #[pyfunction(allow_threads)] are called from Python, and the allow_threads() call is done in the macro generated code, with no possibility to set a TLS before.
Better, annotate a function with #[pyfunction(allow_threads)] would be a safe way to release the GIL without needing to use a different thread.
Did I miss something?

@itamarst
Copy link
Contributor

itamarst commented Jan 14, 2025

You can have a stack that goes back and forth between Rust and Python, it isn't necessarily just Python → Rust:

Python function → Rust function that stores something in TLS → Python function → Rust function annotated with #[pyfunction(allow_threads)] which reads from TLS

@davidhewitt
Copy link
Member

Yes, it is exactly that kind of mixed language call stack which is my worry.

I really want this feature, and I think the practical way forward is probably to use the unsafe attribute as you suggest, though the RFC is in limbo.

I will be speaking at RustNation in a few weeks and will be bringing up this TLS pain as part of my talk, so maybe it will spur some bright minds in the Rust ecosystem to help us figure out new ways forward we haven't yet seen.

@wyfo
Copy link
Contributor Author

wyfo commented Jan 15, 2025

Indeed, sorry about the noise.

Anyway, let's recap for once why this PR exists. We don't care about synchronous function here, as I supported them only for the symmetry with asynchronous functions. The last one needed a special support, because to release the GIL at each poll, you need to use the unsafe but guaranteed assumption that the GIL is acquired when poll is called. So it was to avoid the user to type the unsafe keyword.
However, now that allow_threads become unsafe, we no longer care about saving this Python::assume_gil_acquired() to the user, has unsafe become inevitable. So it's up to the user to wrap his future in a newtype doing unsafe { Python::assume_gil_acquired().allow_threads_unsafe(/* call poll */) }. If the user wants to avoid unsafe with the allow_threads_safe, then the cost of the safe Python::with_gil becomes negligible anyway compared to using a second thread.

Therefore, I will simply close this PR.
I've a bit of time now, so I will try get back to the rest of the PR series, to finally have the full async support merged!

@wyfo wyfo closed this Jan 15, 2025
@davidhewitt
Copy link
Member

davidhewitt commented Jan 15, 2025

Ok, makes sense and awesome to hear you have some capacity!

I must apologise in advance as I am going extremely slowly recently and unlikely to be back to full capacity until March. But I will do my best to review anything you push, async support got stuck for too long.

Since we last collaborated on this I think I saw new Waker APIs got stabilised for data, so maybe new optimisations are possible. We also had to bring pyo3-asyncio into the PyO3 org as pyo3-async-runtimes.

@wyfo
Copy link
Contributor Author

wyfo commented Jan 15, 2025

Since we last collaborated on this I think I saw new Waker APIs got stabilised for data, so maybe new optimisations are possible.

It has indeed, so I could use that to save me my dirty TLS trick in the next PR. However, it would imply to radically bump the MSRV to 1.83, so I don't think we want to do that for now. However, I will let a TODO as a reminder for the next bump.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants