Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in queue_write_texture #3974

Closed
Uriopass opened this issue Jul 25, 2023 · 6 comments
Closed

Segfault in queue_write_texture #3974

Uriopass opened this issue Jul 25, 2023 · 6 comments
Labels
area: correctness We're behaving incorrectly type: bug Something isn't working

Comments

@Uriopass
Copy link
Contributor

Uriopass commented Jul 25, 2023

Description
A segfault happened to one of my players inside the queue_write_texture using wgpu 0.16.0

Repro steps
I don't have repro steps as I couldn't do it either, I guess this issue can still be useful if anyone has the same issue or if anyone can spot where the UB is just by looking at the queue_write_texture code (I have seen some unsafe in there)

Expected vs observed behavior
Expected: no segfaults (rare in Rust!)
Observed: segfault

Extra materials
The GDB backtrace from the segfault:

#0  __memcpy_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:220 
#1  0x00005555559e965d in wgpu_core::device::queue::<impl wgpu_core::hub::Global<G>>::queue_write_texture::hb616310d97ebec18 () 
#2  0x000055555598af9f in <wgpu::backend::direct::Context as wgpu::context::Context>::queue_write_texture::h580e3bf7dcbbbd5b () 
#3  0x000055555598e0eb in <T as wgpu::context::DynContext>::queue_write_texture::hb2d45c638ad32280 () 
#4  0x00005555558f2474 in wgpu::Queue::write_texture::h95b36af508f38989 () 
#5  0x00005555557b1f17 in egui_wgpu::renderer::Renderer::update_texture::had6c9e6960ebf2bf () 
#6  0x00005555556fe770 in native_app::rendering::egui_wrapper::EguiWrapper::render::h57be7e83faaa5fca () 
#7  0x000055555566cc78 in native_app::game_loop::State::render_gui::h34ce7a6a15d81aa3 ()
#8  0x00005555556daf23 in native_app::context::Context::start::{{closure}}::he8979e2826527909 () 
#9  0x00005555557751f9 in winit::platform_impl::platform::x11::EventLoop<T>::run_return::single_iteration::h828929b7df330c0f () 
#10 0x0000555555776ecb in winit::platform_impl::platform::x11::EventLoop<T>::run::h307485526e140859 () 
#11 0x00005555556d010f in winit::platform_impl::platform::EventLoop<T>::run::hbd35f510cd9a248a () 
#12 0x0000555555790e9b in native_app::context::Context::start::h630bc63e454c7644 () 
#13 0x00005555557476d5 in native_app::run::{{closure}}::hf2af93603059b38a () 
#14 0x000055555572a7eb in beul::execute::haab737fd1672577d () #15 0x00005555556bb7a8 in native_app::main::ha4864cd8825a879d () 
#16 0x00005555557296c3 in std::sys_common::backtrace::__rust_begin_short_backtrace::h8f19b6a3440fb64d () 
#17 0x0000555555729ae9 in std::rt::lang_start::{{closure}}::h44b77de3d8fe9669 () 
#18 0x0000555555f66995 in core::ops::function::impls::{impl#2}::call_once<(), (dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/core/src/ops/function.rs:284 
#19 std::panicking::try::do_call<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panicking.rs:485 
#20 std::panicking::try<i32, &(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe)> () at library/std/src/panicking.rs:449 
#21 std::panic::catch_unwind<&(dyn core::ops::function::Fn<(), Output=i32> + core::marker::Sync + core::panic::unwind_safe::RefUnwindSafe), i32> () at library/std/src/panic.rs:140 
#22 std::rt::lang_start_internal::{closure#2} () at library/std/src/rt.rs:148 
#23 std::panicking::try::do_call<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panicking.rs:485 
#24 std::panicking::try<isize, std::rt::lang_start_internal::{closure_env#2}> () at library/std/src/panicking.rs:449 
#25 std::panic::catch_unwind<std::rt::lang_start_internal::{closure_env#2}, isize> () at library/std/src/panic.rs:140 
#26 std::rt::lang_start_internal () at library/std/src/rt.rs:148 #27 0x00005555556bbb65 in main () 

Platform
Linux+winit with X11 and Wayland (both lead to segfault)

@teoxoy teoxoy added type: bug Something isn't working area: correctness We're behaving incorrectly labels Aug 1, 2023
@teoxoy teoxoy added this to the WebGPU Specification V1 milestone Aug 1, 2023
@teoxoy
Copy link
Member

teoxoy commented Aug 1, 2023

I suspect this is happening in the code below but don't see anything off there.

if stage_bytes_per_row == bytes_per_row {
profiling::scope!("copy aligned");
// Fast path if the data is already being aligned optimally.
unsafe {
ptr::copy_nonoverlapping(
data.as_ptr().offset(data_layout.offset as isize),
staging_buffer_ptr,
stage_size as usize,
);
}
} else {
profiling::scope!("copy chunked");
// Copy row by row into the optimal alignment.
let copy_bytes_per_row = stage_bytes_per_row.min(bytes_per_row) as usize;
for layer in 0..size.depth_or_array_layers {
let rows_offset = layer * block_rows_per_image;
for row in 0..height_blocks {
unsafe {
ptr::copy_nonoverlapping(
data.as_ptr().offset(
data_layout.offset as isize
+ (rows_offset + row) as isize * bytes_per_row as isize,
),
staging_buffer_ptr.offset(
(rows_offset + row) as isize * stage_bytes_per_row as isize,
),
copy_bytes_per_row,
);
}
}
}
}

@teoxoy
Copy link
Member

teoxoy commented Aug 1, 2023

Reading through the original issue, I wonder what is happening in the debug build since the SIGSEGV (Address boundary error) is only happening in release.

@teoxoy
Copy link
Member

teoxoy commented Jan 18, 2024

@Uriopass are you still running into this? If so, would you be able to put together a minimum reproducible example?

@Uriopass
Copy link
Contributor Author

Uriopass commented Jan 18, 2024

I couldn't reproduce :/ I cross-posted in hope that someone else would trigger it too.
I can ask the original issuer if they can try again. Should I wait until I update to wgpu 0.19? Or did wgpu 0.17/0.18 change anything?

@teoxoy
Copy link
Member

teoxoy commented Jan 19, 2024

I'm not aware of any changes that would have fixed this issue.

I can ask the original issuer if they can try again.

If they could produce a trace that we could replay to try to reproduce the issue, that would be great!
Here are some instructions on how to do that.

@teoxoy
Copy link
Member

teoxoy commented Jul 26, 2024

It turns out there was UB caused by by our use of copy_nonoverlapping.
It was fixed by 6f16ea4 (#5946), 7e112ca (#5973) and 205f1e3 (#6009).

@teoxoy teoxoy closed this as completed Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: correctness We're behaving incorrectly type: bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants