Panic in FlowControl::send_data (self.window_size >= sz as usize) #607

Closed

nox opened this issue Mar 3, 2022 · 18 comments
@nox
Contributor

nox commented Mar 3, 2022

Application Specific Information:
assertion failed: self.window_size >= sz as usize

Thread 0 Crashed:
0   std                             0x5616f493ed64      std::sys_common::backtrace::__rust_end_short_backtrace (backtrace.rs:139)
1   <unknown>                       0x5616f49407e9      rust_begin_unwind (panicking.rs:498)
2   core                            0x5616f3628131      core::panicking::panic_fmt (panicking.rs:107)
3   core                            0x5616f362807d      core::panicking::panic (panicking.rs:48)
4   h2                              0x5616f477fa60      h2::proto::streams::flow_control::FlowControl::send_data (flow_control.rs:176)
5   h2                              0x5616f39efd4f      [inlined] h2::proto::streams::prioritize::Prioritize::pop_frame::{{closure}} (prioritize.rs:765)
6   tracing                         0x5616f39efd4f      tracing::span::Span::in_scope (span.rs:1100)
7   h2                              0x5616f36454e6      h2::proto::streams::prioritize::Prioritize::pop_frame (prioritize.rs:763)
8   h2                              0x5616f363a563      h2::proto::streams::prioritize::Prioritize::poll_complete (prioritize.rs:505)
9   h2                              0x5616f39dca72      [inlined] h2::proto::streams::send::Send::poll_complete (send.rs:301)
10  h2                              0x5616f39dca72      [inlined] h2::proto::streams::streams::Inner::poll_complete (streams.rs:852)
11  h2                              0x5616f39dca72      h2::proto::streams::streams::Streams<T>::poll_complete (streams.rs:181)
12  h2                              0x5616f39264e1      h2::proto::connection::Connection<T>::poll (connection.rs:266)
13  h2                              0x5616f383e761      [inlined] h2::server::Connection<T>::poll_closed (server.rs:504)
14  h2                              0x5616f383e761      h2::server::Connection<T>::poll_accept (server.rs:416)
15  hyper                           0x5616f3633b6d      [inlined] hyper::proto::h2::server::Serving<T>::poll_server (server.rs:268)
16  hyper                           0x5616f3633b6d      hyper::proto::h2::server::Server<T>::poll (server.rs:202)
17  hyper                           0x5616f382c015      [inlined] hyper::server::conn::ProtoServer<T>::poll (conn.rs:1135)
18  hyper                           0x5616f382c015      hyper::server::conn::upgrades::UpgradeableConnection<T>::poll (conn.rs:1337)
19  core                            0x5616f3b2ba9b      [inlined] core::pin::Pin<T>::poll (future.rs:119)
20  oxy_http_server                 0x5616f3b2ba9b      [inlined] oxy_http_server::http_impl::Http1And2::serve_requests_from_connection::{{closure}}::{{closure}} (select.rs:505)
21  tokio                           0x5616f3b2ba9b      tokio::future::poll_fn::PollFn<T>::poll (poll_fn.rs:38)
22  oxy_http_server                 0x5616f37b7eae      [inlined] oxy_http_server::http_impl::Http1And2::serve_requests_from_connection::{{closure}} (http_impl.rs:133)
23  core                            0x5616f37b7eae      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
24  core                            0x5616f37a85c0      [inlined] core::pin::Pin<T>::poll (future.rs:119)
25  oxy_http_server                 0x5616f37a85c0      [inlined] oxy_http_server::HttpServer<T>::serve::{{closure}} (lib.rs:99)
26  core                            0x5616f37a85c0      core::future::from_generator::GenFuture<T>::poll (mod.rs:80)
27  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:151)
28  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:461)
29  core                            0x5616f3b76b29      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
30  std                             0x5616f3b76b29      [inlined] std::panicking::try::do_call (panicking.rs:406)
31  std                             0x5616f3b76b29      [inlined] std::panicking::try (panicking.rs:370)
32  std                             0x5616f3b76b29      [inlined] std::panic::catch_unwind (panic.rs:133)
33  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::poll_future (harness.rs:449)
34  tokio                           0x5616f3b76b29      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:98)
35  tokio                           0x5616f3b76b29      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:53)
36  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
37  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::task::LocalNotified<T>::run (mod.rs:343)
38  tokio                           0x5616f47e2f3f      [inlined] tokio::runtime::thread_pool::worker::Context::run_task::{{closure}} (worker.rs:443)
39  tokio                           0x5616f47e2f3f      [inlined] tokio::coop::with_budget::{{closure}} (coop.rs:102)
40  std                             0x5616f47e2f3f      [inlined] std::thread::local::LocalKey<T>::try_with (local.rs:399)
41  std                             0x5616f47e2f3f      std::thread::local::LocalKey<T>::with (local.rs:375)
42  tokio                           0x5616f47dce7c      [inlined] tokio::coop::with_budget (coop.rs:95)
43  tokio                           0x5616f47dce7c      [inlined] tokio::coop::budget (coop.rs:72)
44  tokio                           0x5616f47dce7c      tokio::runtime::thread_pool::worker::Context::run_task (worker.rs:419)
45  tokio                           0x5616f47dbeb7      tokio::runtime::thread_pool::worker::Context::run
46  tokio                           0x5616f47c7745      [inlined] tokio::runtime::thread_pool::worker::run::{{closure}} (worker.rs:371)
47  tokio                           0x5616f47c7745      tokio::macros::scoped_tls::ScopedKey<T>::set (scoped_tls.rs:61)
48  tokio                           0x5616f47db88c      tokio::runtime::thread_pool::worker::run (worker.rs:368)
49  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::thread_pool::worker::Launch::launch::{{closure}} (worker.rs:347)
50  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::blocking::task::BlockingTask<T>::poll (task.rs:42)
51  tokio                           0x5616f47ddf6f      [inlined] tokio::runtime::task::core::CoreStage<T>::poll::{{closure}} (core.rs:161)
52  tokio                           0x5616f47ddf6f      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut (unsafe_cell.rs:14)
53  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::core::CoreStage<T>::poll (core.rs:151)
54  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::poll_future::{{closure}} (harness.rs:461)
55  core                            0x5616f47d2308      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
56  std                             0x5616f47d2308      [inlined] std::panicking::try::do_call (panicking.rs:406)
57  std                             0x5616f47d2308      [inlined] std::panicking::try (panicking.rs:370)
58  std                             0x5616f47d2308      [inlined] std::panic::catch_unwind (panic.rs:133)
59  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::poll_future (harness.rs:449)
60  tokio                           0x5616f47d2308      [inlined] tokio::runtime::task::harness::Harness<T>::poll_inner (harness.rs:98)
61  tokio                           0x5616f47d2308      tokio::runtime::task::harness::Harness<T>::poll (harness.rs:53)
62  tokio                           0x5616f47bd126      [inlined] tokio::runtime::task::raw::RawTask::poll (raw.rs:70)
63  tokio                           0x5616f47bd126      [inlined] tokio::runtime::task::UnownedTask<T>::run (mod.rs:379)
64  tokio                           0x5616f47bd126      tokio::runtime::blocking::pool::Inner::run (pool.rs:264)
65  tokio                           0x5616f47c1f87      [inlined] tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}} (pool.rs:244)
66  std                             0x5616f47c1f87      std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:123)
67  std                             0x5616f47e3552      [inlined] std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} (mod.rs:484)
68  core                            0x5616f47e3552      [inlined] core::panic::unwind_safe::AssertUnwindSafe<T>::call_once (unwind_safe.rs:271)
69  std                             0x5616f47e3552      [inlined] std::panicking::try::do_call (panicking.rs:406)
70  std                             0x5616f47e3552      [inlined] std::panicking::try (panicking.rs:370)
71  std                             0x5616f47e3552      [inlined] std::panic::catch_unwind (panic.rs:133)
72  std                             0x5616f47e3552      [inlined] std::thread::Builder::spawn_unchecked::{{closure}} (mod.rs:483)
73  core                            0x5616f47e3552      core::ops::function::FnOnce::call_once{{vtable.shim}} (function.rs:227)
74  alloc                           0x5616f49464c3      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1694)
75  alloc                           0x5616f49464c3      [inlined] alloc::boxed::Box<T>::call_once (boxed.rs:1694)
76  std                             0x5616f49464c3      std::sys::unix::thread::Thread::new::thread_start (thread.rs:106)
77  <unknown>                       0x7faf8b7dbea7      start_thread
78  <unknown>                       0x7faf8b59cdef      clone
79  <unknown>                       0x0                 <unknown>
@xujihui1985

@nox Hi nox, do you have any clue about this?

@Gowee

Gowee commented Sep 13, 2023

I encounter the same issue occasionally with v0.3.21. To capture more useful info, I manually added extra logging inside h2, but it is not easy to reproduce.

Does the panic indicate a programming error outside the h2 library, or a possible internal bug?

@mstyura
Contributor

mstyura commented Sep 6, 2024

The issue still happens on version 0.4.6:

[screenshot of the crash report]

Update: it also happens in the context of a client connection:

Crashed: Thread: SIGABRT  0x0000000000000000
#00 pc 0x8c9b4 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#01 pc 0x8c984 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#02 pc 0x4c47cc lib.so (panic_abort::__rust_start_panic::abort::h9b26335161c4d44c [lib.rs:49]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#03 pc 0x4c47c0 lib.so (panic_abort.__rust_start_panic [lib.rs:44]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#04 pc 0x4c0b38 lib.so (std.panicking.rust_panic [panicking.rs:847]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#05 pc 0x4c085c lib.so (std::panicking::rust_panic_with_hook::h9a9a95261333cc0c [panicking.rs:817]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#06 pc 0x4c05c0 lib.so (std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h3e3954ede52e3e1a [panicking.rs:656]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#07 pc 0x4bf874 lib.so (std::sys_common::backtrace::__rust_end_short_backtrace::hcdd454926dbe14e7 [backtrace.rs:171]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#08 pc 0x4c0360 lib.so (rust_begin_unwind [panicking.rs:652]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#09 pc 0x4d1350 lib.so (core::panicking::panic_fmt::hd43de60a0f4a7faa [panicking.rs:72]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#10 pc 0x4d13d0 lib.so (core::panicking::panic::hfb07255d1d63f30c [panicking.rs:146]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#11 pc 0x46e634 lib.so (h2::proto::streams::flow_control::FlowControl::send_data::h0bf7134d7505f503 [flow_control.rs:181]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#12 pc 0x2622e8 lib.so (h2::proto::streams::prioritize::Prioritize::pop_frame::h2c183c7b2e675956 [prioritize.rs:797]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#13 pc 0x2291d8 lib.so (_$LT$hyper..proto..h2..client..H2ClientFuture$LT$B$C$T$GT$$u20$as$u20$core..future..future..Future$GT$::poll::h8f0713c06de2f6ae [prioritize.rs:536]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#14 pc 0x2b48e4 lib.so (tokio::runtime::task::raw::poll::hb485f18e92e021d4 [core.rs:328]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#15 pc 0x4a7d08 lib.so (tokio::task::local::LocalSet::tick::h5a0aab602b50c9c1 [raw.rs:201]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#16 pc 0x4a7f00 lib.so (_$LT$tokio..task..local..LocalSet$u20$as$u20$core..future..future..Future$GT$::poll::he9c83869361e6673 [local.rs:876]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#17 pc 0x2b9c70 lib.so (tokio::runtime::runtime::Runtime::block_on_inner::h1543425d0e361b5f [future.rs:123]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#18 pc 0x26c744 lib.so (std::sys_common::backtrace::__rust_begin_short_backtrace::h26fe551535da8cb4 [runtime.rs:335]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#19 pc 0x273534 lib.so (core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h2f8529abcb194436 [mod.rs:542]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#20 pc 0x4c1eb4 lib.so (std::sys::pal::unix::thread::Thread::new::thread_start::h772a29ac11c0cc47 [boxed.rs:2022]) (BuildId: 2cdb981d84abef08b403fcdb9669eecf0561ddd7)
#21 pc 0xfaecc libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)
#22 pc 0x8e1f0 libc.so (BuildId: 21cd09af0ad2ccf35d6b8ba2d064b3c4)

mstyura added a commit to mstyura/h2 that referenced this issue Oct 24, 2024
@mstyura
Contributor

mstyura commented Nov 7, 2024

I've modified essentially the same assert in code that executes earlier, and added more context to the assert message to show the actual values that trigger it:

// There *must* be enough connection level
// capacity at this point.
assert!(
    len <= self.flow.window_size(),
    "data len out of connection flow capacity: len={len}, csz={csz}, sz={sz}, eos={eos}, window={window}, available={available}, requested={requested}, buffered={buffered}",
    csz = self.flow.window_size(),
    sz = sz,
    eos = frame.is_end_stream(),
    window = stream_capacity,
    available = stream.send_flow.available(),
    requested = stream.requested_send_capacity,
    buffered = stream.buffered_send_data
);

The assert is hit several times a day; so far I've collected these:

data len out of connection flow capacity: len=6720, csz=3528, sz=6720, eos=false, window=6720, available=6720, requested=8192, buffered=6720
data len out of connection flow capacity: len=8192, csz=1809, sz=8192, eos=false, window=10001, available=10001, requested=16384, buffered=10001
data len out of connection flow capacity: len=8192, csz=3392, sz=8192, eos=false, window=13872, available=13872, requested=16384, buffered=13872
data len out of connection flow capacity: len=6880, csz=5017, sz=6880, eos=false, window=6880, available=6880, requested=6880, buffered=6880
data len out of connection flow capacity: len=8192, csz=5456, sz=8192, eos=false, window=13648, available=13648, requested=16384, buffered=13648
data len out of connection flow capacity: len=8192, csz=7121, sz=8192, eos=false, window=15313, available=15313, requested=16384, buffered=15313
data len out of connection flow capacity: len=8192, csz=0, sz=8192, eos=false, window=8192, available=8192, requested=16384, buffered=8192
data len out of connection flow capacity: len=8192, csz=2638, sz=8192, eos=false, window=10830, available=10830, requested=16384, buffered=10830
data len out of connection flow capacity: len=8192, csz=6756, sz=8192, eos=false, window=14948, available=14948, requested=16384, buffered=14948
data len out of connection flow capacity: len=8192, csz=6756, sz=8192, eos=false, window=14948, available=14948, requested=16384, buffered=14948
data len out of connection flow capacity: len=3008, csz=1252, sz=3008, eos=false, window=3008, available=3008, requested=3008, buffered=3008

I've also observed some additional assertion failures on a modified version of h2 with an assert added in the flow control itself:

assert!(
    self.window_size.0 >= sz as i32,
    "size out of window capacity sz={}; window={}; available={}",
    sz,
    self.window_size,
    self.available
);
size out of window capacity sz=8192; window=3899; available=9083
size out of window capacity sz=8192; window=1879; available=8192
size out of window capacity sz=8192; window=6078; available=8192
size out of window capacity sz=8192; window=3657; available=8192
size out of window capacity sz=8192; window=1232; available=8192
size out of window capacity sz=4914; window=0; available=4914
size out of window capacity sz=8192; window=5544; available=8192

mstyura added a commit to mstyura/h2 that referenced this issue Nov 12, 2024
@mstyura
Contributor

mstyura commented Nov 18, 2024

I've enriched the assert even more; here is some data from crashes:

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(5997), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(310458), available: Window(14189), stream_id: StreamId(73), direction: Send }, sz=8192, eos=false, window=14189, requested=16384, buffered=13801

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(0), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(8192), available: Window(8192), stream_id: StreamId(137), direction: Send }, sz=8192, eos=false, window=8192, requested=16384, buffered=8192

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(3781), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(180224), available: Window(11973), stream_id: StreamId(123), direction: Send }, sz=8192, eos=false, window=11973, requested=16384, buffered=11973

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(4662), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(450206), available: Window(12854), stream_id: StreamId(297), direction: Send }, sz=8192, eos=false, window=12854, requested=16384, buffered=12854

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(7156), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(196608), available: Window(13880), stream_id: StreamId(63), direction: Send }, sz=8192, eos=false, window=13880, requested=16384, buffered=13880

data len out of connection flow capacity: len=8192, conn_flow=FlowControl { window_size: Window(279), available: Window(0), stream_id: StreamId(0), direction: Send }, stream_send_flow=FlowControl { window_size: Window(647168), available: Window(8471), stream_id: StreamId(727), direction: Send }, sz=8192, eos=false, window=8471, requested=16384, buffered=8471

@mstyura
Contributor

mstyura commented Nov 19, 2024

Hello @Noah-Kennedy,

Apologies for reaching out here as well, but I came across your message in Discord regarding this issue a few days ago. I’m not sure if you saw my question there, so I hope you don’t mind if I repeat it here: did you uncover any insights into the issue after your investigation?

From your message, it seems that you identified the problem as occurring during graceful shutdown. Could you please share any additional details? Perhaps with more context, it would be possible to create a stress test to replicate the issue.

So far, I’ve gathered enriched assertion messages from crashes, but they haven’t provided much useful information beyond indicating that the assertion fails on connections with moderately long lifespans (based on the stream ID value). I’m also working on collecting core dumps, though I’m unsure if they’ll yield more actionable insights.

I’ve tried several artificial stress tests, similar to hammer.rs, and attempted to multiplex 1,000 concurrent requests via a single H2 connection backed by tokio::io::DuplexStream, but haven’t managed to reproduce the issue. I also experimented with sending RST_STREAM responses to potentially disrupt window size accounting, but this didn’t lead to any breakthroughs either.

Any pointers or insights you could share would be greatly appreciated. Thank you in advance!

@seanmonstar
Member

I notice there are a couple of TODOs in there around "proper error handling" of self.flow.assign_capacity() (in a couple of spots, actually). Could that be related?

@mstyura
Contributor

mstyura commented Nov 19, 2024

@seanmonstar, I’ve also noticed that there are some TODOs related to flow control window handling. However, since I don’t have a complete picture of how this was originally intended to work, I’m unsure whether these TODOs could surface as issues in real-world scenarios.

On my end, I’m now considering trying a custom version of h2 where I add expect calls in the TODO areas and convert some debug_assert! into assert! around flow control. This might help identify whether the issue is actually triggered earlier in the lifetime of the connection.

@seanmonstar
Member

Looking a bit closer, could it be related to a connection adjusting the SETTINGS_INITIAL_WINDOW_SIZE? It's possible the increase or decrease might not be fully accounted for with h2's "assigned capacity"...

@nox
Contributor Author

nox commented Jan 13, 2025

AFAICT, the only place where Prioritize::flow::window_size is decreased is in Prioritize::pop_frame with the panicking FlowControl::send_data call.

// TODO: proper error handling
let _res = self.flow.send_data(len);
debug_assert!(_res.is_ok());

The only place where it is increased is in Prioritize::recv_connection_window_update with FlowControl::inc_window.

// Update the connection's window
self.flow.inc_window(inc)?;

In Prioritize::pop_frame, we can see that the length is capped by the stream window's capacity.

// Get the amount of capacity remaining for stream's
// window.
let stream_capacity = stream.send_flow.available();
// Only send up to the stream's window capacity
let len = cmp::min(len, stream_capacity.as_size() as usize) as WindowSize;

In Prioritize::try_assign_capacity, we can see that the stream window's capacity is increased based on the connection's available capacity (FlowControl::available, and not FlowControl::window_size).

// The amount of currently available capacity on the connection
let conn_available = self.flow.available().as_size();

// First check if capacity is immediately available
if conn_available > 0 {
    // The amount of capacity to assign to the stream
    // TODO: Should prioritization factor into this?
    let assign = cmp::min(conn_available, additional);

So I guess the issue is that Prioritize::flow::available at one point becomes greater than Prioritize::flow::window_size, and afterwards the latter gets decreased by a value larger than itself in Prioritize::pop_frame?
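
To make that suspected failure mode concrete, here is a toy model of the two connection-level counters (illustrative only, not h2's actual types), using the numbers from the first enriched assert above:

/// Toy model of the connection-level send flow (not h2's real code).
struct Flow {
    window_size: u32, // capacity granted by the peer
    available: u32,   // capacity h2 believes it can assign to streams
}

impl Flow {
    fn send_data(&mut self, sz: u32) {
        // The assertion that fires in this issue.
        assert!(
            self.window_size >= sz,
            "window={}, available={}, sz={}",
            self.window_size, self.available, sz
        );
        self.window_size -= sz;
        self.available -= sz;
    }
}

fn main() {
    // Suppose the bookkeeping has drifted so that available > window_size.
    let mut flow = Flow { window_size: 3528, available: 6720 };

    // pop_frame caps the frame length by the *stream's* available capacity,
    // which was assigned out of connection `available`, not `window_size`,
    // so a 6720-byte frame can reach send_data and panic
    // (matching "len=6720, csz=3528" above).
    flow.send_data(6720);
}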

@nox
Contributor Author

nox commented Jan 13, 2025

One weird thing I've noticed is that in Prioritize::pop_frame, we increase Prioritize::flow::available and then immediately decrease it back again.

// Assign the capacity back to the connection that
// was just consumed from the stream in the previous
// line.
// TODO: proper error handling
let _res = self.flow.assign_capacity(len);
debug_assert!(_res.is_ok());
// TODO: proper error handling
let _res = self.flow.send_data(len);
debug_assert!(_res.is_ok());

Why do we increase it at all? In which cases should available not be in sync with window_size in Prioritize::flow?

@nox
Contributor Author

nox commented Jan 13, 2025

Looking a bit closer, could it be related to a connection adjusting the SETTINGS_INITIAL_WINDOW_SIZE? It's possible the increase or decrease might not be fully accounted for with h2's "assigned capacity"...

It seems that in Send::apply_remote_settings, reducing SETTINGS_INITIAL_WINDOW_SIZE can increase Prioritize::flow::available while leaving Prioritize::flow::window_size unchanged.

self.prioritize
    .assign_connection_capacity(total_reclaimed, store, counts);

@nox
Contributor Author

nox commented Jan 14, 2025

I also find this snippet a little weird:

// Total requested should never go below actual assigned
// (Note: the window size can go lower than assigned)
debug_assert!(stream.send_flow.available() <= total_requested as usize);

// The amount of additional capacity that the stream requests.
// Don't assign more than the window has available!
let additional = cmp::min(
    total_requested - stream.send_flow.available().as_size(),
    // Can't assign more than what is available
    stream.send_flow.window_size() - stream.send_flow.available().as_size(),
);

The first assert says "Note: the window size can go lower than assigned", but then the second snippet computes stream.send_flow.window_size() - stream.send_flow.available().as_size(). Can't this underflow?
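
For example, plugging in the values from one of the assert messages above (window=0, available=4914) and assuming plain u32 arithmetic for WindowSize:

fn main() {
    // Values from "size out of window capacity sz=4914; window=0; available=4914".
    let window_size: u32 = 0;
    let available: u32 = 4914;

    // Mirrors `stream.send_flow.window_size() - stream.send_flow.available().as_size()`:
    // panics with "attempt to subtract with overflow" in debug builds and
    // wraps to a huge bogus `additional` value in release builds.
    let additional = window_size - available;
    println!("additional = {additional}");
}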

@nox
Contributor Author

nox commented Jan 14, 2025

I've been looking into an old assertion failure discovered with fuzzing at #692 (comment). Not sure whether it's related, but there are some weird window-size shenanigans going on there with send-closed streams.

@nox
Contributor Author

nox commented Jan 16, 2025

Why do we increase it at all? In which case should available not be in sync with window_size, in Prioritize::flow?

When capacity is assigned to a stream, Prioritize::flow::available is decreased by that much. In "updating stream flow" we increase it back, and then in "updating connection flow" we decrease it again at the same time as Prioritize::flow::window_size, which is indeed correct.
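
A trace with made-up numbers, for anyone following along:

// Start:                       conn.window_size = 100, conn.available = 100
//
// try_assign_capacity gives 10 to a stream:
//                              conn.window_size = 100, conn.available =  90
//
// pop_frame pops a 10-byte DATA frame for that stream:
//   assign_capacity(10) — the stream's share goes back to the connection:
//                              conn.window_size = 100, conn.available = 100
//   send_data(10) — the frame actually leaves:
//                              conn.window_size =  90, conn.available =  90
//
// Net effect: both counters drop by 10, so they stay in sync.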

@nox
Contributor Author

nox commented Jan 16, 2025

In Prioritize::reclaim_reserved_capacity, we give back to the connection requested_send_capacity - buffered_send_data.

/// Reclaim just reserved capacity, not buffered capacity, and re-assign
/// it to the connection
pub fn reclaim_reserved_capacity(&mut self, stream: &mut store::Ptr, counts: &mut Counts) {
    // only reclaim requested capacity that isn't already buffered
    if stream.requested_send_capacity as usize > stream.buffered_send_data {
        let reserved = stream.requested_send_capacity - stream.buffered_send_data as WindowSize;

        // TODO: proper error handling
        let _res = stream.send_flow.claim_capacity(reserved);
        debug_assert!(_res.is_ok());
        self.assign_connection_capacity(reserved, stream, counts);
    }
}

But we know that requested_send_capacity may be greater than the available capacity for that stream, as shown in Prioritize::reserve_capacity.

// If the stream has more assigned capacity than requested, reclaim
// some for the connection
if available as usize > capacity {
    let diff = available - capacity as WindowSize;

    // TODO: proper error handling
    let _res = stream.send_flow.claim_capacity(diff);
    debug_assert!(_res.is_ok());
    self.assign_connection_capacity(diff, stream, counts);
}

I think the issue is that we don't account for available capacity when reclaiming reserved capacity.
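
Something in this direction might be the fix (just a sketch reusing the names from the snippets above, not a tested patch; see the commit referenced below for the actual change):

/// Sketch: only reclaim capacity that was actually assigned to the
/// stream, instead of trusting `requested_send_capacity`, since
/// requested-but-never-assigned capacity never existed.
pub fn reclaim_reserved_capacity(&mut self, stream: &mut store::Ptr, counts: &mut Counts) {
    // Capacity currently assigned to the stream.
    let assigned = stream.send_flow.available().as_size();

    // Only reclaim assigned capacity that isn't already buffered.
    if assigned as usize > stream.buffered_send_data {
        let reserved = assigned - stream.buffered_send_data as WindowSize;

        let _res = stream.send_flow.claim_capacity(reserved);
        debug_assert!(_res.is_ok());
        self.assign_connection_capacity(reserved, stream, counts);
    }
}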

nox added a commit that referenced this issue Jan 20, 2025

Reclaiming requested capacity that has not been actually reserved yet
is wrong, as this capacity never existed to begin with.
@mstyura
Contributor

mstyura commented Feb 3, 2025

@nox thank you very much for investigating and fixing the issue. I no longer see crashes from the assert (4+ days) after upgrading to a version of h2 with your fix.

@mstyura
Contributor

mstyura commented Feb 11, 2025

Hello @seanmonstar, sorry for bothering you. Are there any plans to release a new version of h2 with the fix for this issue included?
Thanks a lot in advance!

GlenDC added a commit to plabayo/rama that referenced this issue Feb 20, 2025
* Fix reclaiming reserved capacity (fixes <hyperium/h2#607>)
  by @nox in <hyperium/h2#832>
* Fix busy loop on shutdown by @seanmonstar
  in <hyperium/h2#834>
* Fix window size decrement of send-closed streams
  by @nox in <hyperium/h2#830>
* Fix handle implicit resets at the right time
  by @nox in <hyperium/h2#833>
* Fix poll_flush after poll_shutdown
  by @bdbai in <hyperium/h2#836>

Co-authored-by: Sean McArthur <sean@seanmonstar.com>
Co-authored-by: 包布丁 <htbai1998m@hotmail.com>
Co-authored-by: Anthony Ramine <123095+nox@users.noreply.github.com>
Co-authored-by: Samuel Tardieu <sam@rfc1149.net>