Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not catching panics across FFI boundaries #74

Open
SimonSapin opened this issue May 21, 2019 · 19 comments
Open

Not catching panics across FFI boundaries #74

SimonSapin opened this issue May 21, 2019 · 19 comments
Labels
bug Something isn't working

Comments

@SimonSapin
Copy link
Contributor

gtk-rs/cairo#257 added bindings for “user data” owned by cairo objects. Each user data entry has a destructor function pointer that cairo calls when the object is destroyed. This helps solve life time issues, since with reference counting it can be hard to predict when an object will be destroyed exactly.

The bindings are generic and accept user data of any type. They forward destructor function pointer calls to Drop, which potentially panic and unwind. However, until rust-lang/rust#58794 is resolved, unwinding into C is undefined behavior.

A possible fix is using std::panic::catch_unwind in the destructor function. However even if we stash the panic payload somewhere at that point, there is no good place in the code to call resume_unwind. So the best we can do might be to abort the process.

@sdroege
Copy link
Member

sdroege commented May 21, 2019

Abort seems the best, yes.

@SimonSapin
Copy link
Contributor Author

Or wait until the compiler does this abort for us? I think that’s the expected eventual resolution of rust-lang/rust#58794 (though with an opt-out that I’m not sure is implemented yet.)

@sdroege
Copy link
Member

sdroege commented May 21, 2019

Personally I'd wait for the compiler, I don't want to change everything back yet another time only to revert the change again in a bit.

@sdroege
Copy link
Member

sdroege commented Nov 10, 2020

Made this issue a bit more generic. This applies to all gtk-rs code currently.

@jhg
Copy link

jhg commented Dec 25, 2020

Are there news about if the compiler implement it or not yet?

@sdroege
Copy link
Member

sdroege commented Dec 25, 2020

There's a whole WG about that: https://rust-lang.github.io/rfcs/2797-project-ffi-unwind.html / https://github.com/rust-lang/project-ffi-unwind . It's not finished yet.

@jhg
Copy link

jhg commented Dec 26, 2020

Thanks for the information @sdroege 🤗

@SimonSapin
Copy link
Contributor Author

The status quo is that unwinding across FFI is Undefined Behavior. Therefore all callbacks called from C should be wrapped into something that uses catch_unwind + abort.

Only if and when that WG propose languages changes that end up implemented, maybe that wrapping will become unnecessary.

@jhg
Copy link

jhg commented Jan 4, 2021

@SimonSapin I found rust-lang/rust#58794 is closed and in rust-lang/rust#58760 link to rust-lang/rust#55982 merged PR. Is that also UB when the FFI is Rust function to be used from C or that fix the UB? (even if it's in nightly only yet and it is not unwind but at least is not undefined, it will abort).

And thanks about the information about FFI and unwind.

@sdroege
Copy link
Member

sdroege commented Jan 4, 2021

Currently any unwinding across extern "C" functions is UB, even if all those functions happens to be implemented in Rust. That's part of what that WG is working on solving.

For example this adds support for an extern "C-unwind" ABI that explicitly allows unwinding (and AFAIU causes unwinding through extern "C" to abort as it should).

@bilelmoussaoui
Copy link
Member

@sdroege this should be moved to gtk-rs-core

@sdroege sdroege transferred this issue from gtk-rs/gtk3-rs May 18, 2021
@sdroege sdroege added the bug Something isn't working label May 18, 2021
@piegamesde
Copy link
Contributor

I don't understand one thing in here: the initial issue description is specifically about Cairo "user data". But doesn't the problem also apply in any signal callbacks? I've already panicked in my code more than once and it's never blown up badly, so was it just by luck?

@sdroege
Copy link
Member

sdroege commented Jul 8, 2021

Yes, it's about any callbacks that go through FFI.

@piegamesde
Copy link
Contributor

piegamesde commented Jul 8, 2021

So if I'm getting a stack trace like

   2: glib::source::fnmut_callback_wrapper_local::{{closure}}
             at ~/.cache/cargo/registry/src/github.com-1ecc6299db9ec823/glib-0.14.0/src/source.rs:180:9
   3: glib::source::trampoline
             at ~/.cache/cargo/registry/src/github.com-1ecc6299db9ec823/glib-0.14.0/src/source.rs:90:5
   4: <unknown>
   5: g_main_context_dispatch
   6: <unknown>
   7: g_main_context_iteration
   8: g_application_run
   9: <O as gio::application::ApplicationExtManual>::run_with_args
             at ~/.cache/cargo/registry/src/github.com-1ecc6299db9ec823/gio-0.14.0/src/application.rs:30:13

then that's totally not guaranteed to work in the general case?


If I understand the linked threads above correctly, then this ought to work in a defined matter some time in the future. Nevertheless, I'd argue for running signal callbacks in a catch_unwind. This would allow at least a minimum of clean shutdown within Gtk. Because the current behavior straight up aborts the Gtk main loop, leaving any resources open.

How do other language bindings handle exceptions?

@sdroege
Copy link
Member

sdroege commented Jul 8, 2021

If I understand the linked threads above correctly, then this ought to work in a defined matter some time in the future. Nevertheless, I'd argue for running signal callbacks in a catch_unwind.

Something like that existed in gtk-rs for a long time. Then Rust changed to make it defined behaviour (directly abort the process) and we removed it. Then this change was reverted again and we reverted the removal. Then Rust again made it defined behaviour and we removed the gtk-rs code. And again it was reverted and promised to be added again in the near future.

And here we are now :) It seems like it's really close (I hope for real) now and actually works like that in nightly already if I understand correctly.

The behaviour in any case would be to kill the process because that's the only thing you can safely do in such situations.

How do other language bindings handle exceptions?

Unwinding across language boundaries is generally a bad idea and I'm not aware of any language binding where that is handled in any other way than UB or killing the process.

@sdroege
Copy link
Member

sdroege commented Aug 11, 2021

There are some updates in rust-lang/rust#86155 (comment) . It looks like this is finally going to be solved in Rust in the foreseeable future.

@A6GibKm
Copy link
Contributor

A6GibKm commented Aug 1, 2023

"C-unwind" was stabilized in 1.71, see https://blog.rust-lang.org/2023/07/13/Rust-1.71.0.html#c-unwind-abi.

@sdroege
Copy link
Member

sdroege commented Aug 7, 2023

That's not sufficient. What we need is that the "C" ABI aborts on panics across FFI boundaries again, which is what it's supposed to do.

With "C-unwind" we could allow panics across FFI boundaries in certain selected cases. Like, the callback to g_list_foreach() can panic and unwind from Rust to C to Rust without problems.

@sdroege
Copy link
Member

sdroege commented Jul 2, 2024

With 1.81 this should be solved finally: rust-lang/rust#116088

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants