Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Http2 Segmentation fault #29902

Closed
lomaster1 opened this issue Oct 9, 2019 · 16 comments
Closed

Http2 Segmentation fault #29902

lomaster1 opened this issue Oct 9, 2019 · 16 comments
Labels
http2 Issues or PRs related to the http2 subsystem.

Comments

@lomaster1
Copy link

I got it on production (many times per day):

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000956028 in node::http2::Http2Session::MaybeStopReading() [clone .part.109] ()
[Current thread is 1 (Thread 0x7f9a40aa4740 (LWP 12507))]
(gdb) bt
#0  0x0000000000956028 in node::http2::Http2Session::MaybeStopReading() [clone .part.109] ()
#1  0x0000000000959b96 in node::http2::Http2Session::SendPendingData() ()
#2  0x000000000095a478 in node::http2::Http2Session::MaybeScheduleWrite()::{lambda(node::Environment*, void*)#1}::_FUN(node::Environment*, void*) ()
#3  0x00000000008e337c in node::Environment::RunAndClearNativeImmediates() ()
#4  0x00000000008e3594 in node::Environment::CheckImmediate(uv_check_s*) ()
#5  0x0000000000a76f5c in uv__run_check (loop=loop@entry=0x2655860 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:70
#6  0x0000000000a6fe73 in uv_run (loop=0x2655860 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:365
#7  0x0000000000904725 in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&)
    ()
#8  0x000000000090297f in node::Start(int, char**) ()
#9  0x00007f9a3f619830 in __libc_start_main (main=0x8bbd50 <main>, argc=5, argv=0x7ffd529d2b88, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd529d2b78)
    at ../csu/libc-start.c:291
#10 0x00000000008bbe85 in _start ()

Can you help me investigate why it happens?
What can I do to provide you with information to fix it? I can't use debug node in production, because it's very slow.

@lomaster1 lomaster1 changed the title Http2 segfault Http2 Segmentation fault Oct 9, 2019
@lomaster1
Copy link
Author

Here is a stack which I catch with debug node:

Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f51f196d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f51f2de0740 (LWP 11051))]
(gdb) bt
#0  0x00007f51f196d428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f51f196f02a in __GI_abort () at abort.c:89
#2  0x0000000001ab1288 in node::Abort () at ../src/node.cc:1043
#3  0x0000000001ab1f15 in node::OnFatalError (location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope") at ../src/node.cc:1351
#4  0x0000000001e2f7d2 in v8::Utils::ReportApiFailure (location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope")
    at ../deps/v8/src/api.cc:435
#5  0x0000000001e1ccbb in v8::Utils::ApiCheck (condition=false, location=0x35e6798 "v8::HandleScope::CreateHandle()", message=0x35e6768 "Cannot create a handle without a HandleScope")
    at ../deps/v8/src/api.h:139
#6  0x00000000024ed4ce in v8::internal::HandleScope::Extend (isolate=0x52265e0) at ../deps/v8/src/handles.cc:69
#7  0x0000000001e05092 in v8::internal::HandleScope::CreateHandle (isolate=0x52265e0, value=0x15fd40874939) at ../deps/v8/src/handles-inl.h:169
#8  0x0000000001e323cd in v8::HandleScope::CreateHandle (isolate=0x52265e0, value=0x15fd40874939) at ../deps/v8/src/api.cc:1144
#9  0x0000000001a4d698 in v8::Local<v8::Object>::New (isolate=0x52265e0, that=0x9981950) at ../deps/v8/include/v8.h:9378
#10 0x0000000001a4c40c in v8::Local<v8::Object>::New (isolate=0x52265e0, that=...) at ../deps/v8/include/v8.h:9369
#11 0x0000000001a4c9c8 in node::WeakPersistentToLocal<v8::Object> (isolate=0x52265e0, persistent=...) at ../src/util-inl.h:190
#12 0x0000000001a4adb0 in node::PersistentToLocal<v8::Object> (isolate=0x52265e0, persistent=...) at ../src/util-inl.h:173
#13 0x0000000001a49a94 in node::BaseObject::object (this=0x8429340) at ../src/base_object-inl.h:65
#14 0x0000000001b20b9c in node::http2::Http2Scope::Http2Scope (this=0x7ffe4352c470, session=0x8429340) at ../src/node_http2.cc:87
#15 0x0000000001b20b20 in node::http2::Http2Scope::Http2Scope (this=0x7ffe4352c470, stream=0x142524e0) at ../src/node_http2.cc:70
#16 0x0000000001b2ab8f in node::http2::Http2Stream::FlushRstStream (this=0x142524e0) at ../src/node_http2.cc:2237
#17 0x0000000001b27edb in node::http2::Http2Session::ClearOutgoing (this=0x8429340, status=0) at ../src/node_http2.cc:1708
#18 0x0000000001b27818 in node::http2::Http2Session::OnStreamAfterWrite (this=0x8429340, w=0x5f82900, status=0) at ../src/node_http2.cc:1616
#19 0x0000000001a6dadb in node::StreamResource::EmitAfterWrite (this=0x1785b080, w=0x5f82900, status=0) at ../src/stream_base-inl.h:135
#20 0x0000000001a6dbf0 in node::WriteWrap::OnDone (this=0x5f82900, status=0) at ../src/stream_base-inl.h:430
#21 0x0000000001a9da21 in node::StreamReq::Done (this=0x5f82900, status=0, error_str=0x0) at ../src/stream_base-inl.h:442
#22 0x0000000001c991b4 in node::TLSWrap::InvokeQueued (this=0x1785af90, status=0, error_str=0x0) at ../src/tls_wrap.cc:96
#23 0x0000000001c9a472 in node::TLSWrap::EncOut (this=0x1785af90) at ../src/tls_wrap.cc:295
#24 0x0000000001c9ab2d in node::TLSWrap::OnStreamAfterWrite (this=0x1785af90, req_wrap=0x82c1bd0, status=0) at ../src/tls_wrap.cc:368
#25 0x0000000001a6dadb in node::StreamResource::EmitAfterWrite (this=0x6bfee40, w=0x82c1bd0, status=0) at ../src/stream_base-inl.h:135
#26 0x0000000001a6dbf0 in node::WriteWrap::OnDone (this=0x82c1bd0, status=0) at ../src/stream_base-inl.h:430
#27 0x0000000001a9da21 in node::StreamReq::Done (this=0x82c1bd0, status=0, error_str=0x0) at ../src/stream_base-inl.h:442
#28 0x0000000001a9dd7e in node::JSStream::Finish<node::WriteWrap> (args=...) at ../src/js_stream.cc:159
#29 0x0000000001f37714 in v8::internal::FunctionCallbackArguments::Call (this=0x7ffe4352ccb0, handler=0x17d923ae2849) at ../deps/v8/src/api-arguments-inl.h:94
#30 0x0000000001f3a53a in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x52265e0, function=..., new_target=..., fun_data=..., receiver=..., args=...)
    at ../deps/v8/src/builtins/builtins-api.cc:109
#31 0x0000000001f38430 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x52265e0) at ../deps/v8/src/builtins/builtins-api.cc:139
#32 0x0000000001f381b1 in v8::internal::Builtin_HandleApiCall (args_length=7, args_object=0x7ffe4352ced8, isolate=0x52265e0) at ../deps/v8/src/builtins/builtins-api.cc:127
#33 0x000003d4db841d64 in ?? ()
#34 0x000003d4db841cc1 in ?? ()
#35 0x00007ffe4352ce80 in ?? ()
#36 0x0000000000000006 in ?? ()
#37 0x00007ffe4352cf08 in ?? ()
#38 0x000003d4dc23f41a in ?? ()
#39 0x00001d2549c826f1 in ?? ()
#40 0x000017d923ae3599 in ?? ()
#41 0x0000000700000000 in ?? ()
#42 0x00001d2549c82801 in ?? ()
#43 0x0000000000000000 in ?? ()

@addaleax addaleax added the http2 Issues or PRs related to the http2 subsystem. label Oct 10, 2019
@addaleax
Copy link
Member

@lomaster1 Thanks for the report! I assume this would fix the debug mode crash, but maybe not the original crash? Can you verify?

diff --git a/src/node_http2.cc b/src/node_http2.cc
index 6bd282593e28..0ba88946fc5c 100644
--- a/src/node_http2.cc
+++ b/src/node_http2.cc
@@ -2100,6 +2100,7 @@ void Http2Stream::SubmitRstStream(const uint32_t code) {
 void Http2Stream::FlushRstStream() {
   if (IsDestroyed())
     return;
+  HandleScope handle_scope(env()->isolate());
   Http2Scope h2scope(this);
   CHECK_EQ(nghttp2_submit_rst_stream(**session_, NGHTTP2_FLAG_NONE,
                                      id_, code_), 0);

@lomaster1
Copy link
Author

lomaster1 commented Oct 10, 2019

@addaleax thanks for the fix. I applied it and got the same error as in production.

Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000001b27c83 in node::http2::Http2Session::MaybeStopReading (this=0x6975610) at ../src/node_http2.cc:1673
1673	    stream_->ReadStop();
[Current thread is 1 (Thread 0x7f39abdd0740 (LWP 20545))]
(gdb) bt
#0  0x0000000001b27c83 in node::http2::Http2Session::MaybeStopReading (this=0x6975610) at ../src/node_http2.cc:1673
#1  0x0000000001b2864f in node::http2::Http2Session::SendPendingData (this=0x6975610) at ../src/node_http2.cc:1815
#2  0x0000000001b27974 in node::http2::Http2Session::<lambda(node::Environment*, void*)>::operator()(node::Environment *, void *) const (__closure=0x0, env=0x7ffc13154330, data=0x6975610)
    at ../src/node_http2.cc:1662
#3  0x0000000001b279cb in node::http2::Http2Session::<lambda(node::Environment*, void*)>::_FUN(node::Environment *, void *) () at ../src/node_http2.cc:1663
#4  0x0000000001a78744 in node::Environment::<lambda()>::operator()(void) const (__closure=0x7ffc13154140) at ../src/env.cc:500
#5  0x0000000001a78993 in node::Environment::RunAndClearNativeImmediates (this=0x7ffc13154330) at ../src/env.cc:516
#6  0x0000000001a78acb in node::Environment::CheckImmediate (handle=0x7ffc13154498) at ../src/env.cc:536
#7  0x0000000001d1fbee in uv__run_check (loop=0x4af9120 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:67
#8  0x0000000001d166b7 in uv_run (loop=0x4af9120 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:365
#9  0x0000000001abbc11 in node::Start (isolate=0x5f045e0, isolate_data=0x5f6ce10, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
    at ../src/node.cc:2895
#10 0x0000000001abbf91 in node::Start (event_loop=0x4af9120 <default_loop_struct>, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
    at ../src/node.cc:2994
#11 0x0000000001ab960d in node::Start (argc=5, argv=0x5f02030) at ../src/node.cc:3053
#12 0x0000000002e8ebc0 in main (argc=5, argv=0x7ffc13154fc8) at ../src/node_main.cc:124

@addaleax
Copy link
Member

@lomaster1 Can you print the contents of stream_ on the crashing line? If that’s not NULL, is there any chance you could run the program under valgrind? That might give a better error than gdb here if this is e.g. use-after-free.

@lomaster1
Copy link
Author

lomaster1 commented Oct 10, 2019

@addaleax I added this before stream_->ReadStop(); (also I make listener_ and other fields public)

fprintf(stderr, "XXX this %p -> stream_ is not null: %d\n", this, stream_!=nullptr);
fprintf(stderr, "XXX this %p -> stream_ %p: listener %p, bR%d, bW%d\n", this, stream_, stream_->listener_, stream_->bytes_read_, stream_->bytes_written_);

And got crash with stack:

Core was generated by `/var/lib/nave/installed/10.16.3/bin/node --expose-internals --expose-gc --max-h'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000001b27cb5 in node::http2::Http2Session::MaybeStopReading (this=0x9c8f8c0) at ../src/node_http2.cc:1674
1674	    fprintf(stderr, "XXX this %p -> stream_ %p: listener %p, bR%d, bW%d\n", this, stream_, stream_->listener_, stream_->bytes_read_, stream_->bytes_written_);
[Current thread is 1 (Thread 0x7f10ecb58740 (LWP 24690))]
(gdb) bt
#0  0x0000000001b27cb5 in node::http2::Http2Session::MaybeStopReading (this=0x9c8f8c0) at ../src/node_http2.cc:1674
#1  0x0000000001b286d9 in node::http2::Http2Session::SendPendingData (this=0x9c8f8c0) at ../src/node_http2.cc:1817
#2  0x0000000001b27974 in node::http2::Http2Session::<lambda(node::Environment*, void*)>::operator()(node::Environment *, void *) const (__closure=0x0, env=0x7ffeca078e50, data=0x9c8f8c0)
    at ../src/node_http2.cc:1662
#3  0x0000000001b279cb in node::http2::Http2Session::<lambda(node::Environment*, void*)>::_FUN(node::Environment *, void *) () at ../src/node_http2.cc:1663
#4  0x0000000001a78744 in node::Environment::<lambda()>::operator()(void) const (__closure=0x7ffeca078c60) at ../src/env.cc:500
#5  0x0000000001a78993 in node::Environment::RunAndClearNativeImmediates (this=0x7ffeca078e50) at ../src/env.cc:516
#6  0x0000000001a78acb in node::Environment::CheckImmediate (handle=0x7ffeca078fb8) at ../src/env.cc:536
#7  0x0000000001d1fc78 in uv__run_check (loop=0x4af9120 <default_loop_struct>) at ../deps/uv/src/unix/loop-watcher.c:67
#8  0x0000000001d16741 in uv_run (loop=0x4af9120 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:365
#9  0x0000000001abbc11 in node::Start (isolate=0x63b35e0, isolate_data=0x641be10, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
    at ../src/node.cc:2895
#10 0x0000000001abbf91 in node::Start (event_loop=0x4af9120 <default_loop_struct>, args=std::vector of length 2, capacity 5 = {...}, exec_args=std::vector of length 3, capacity 4 = {...})
    at ../src/node.cc:2994
#11 0x0000000001ab960d in node::Start (argc=5, argv=0x63b1030) at ../src/node.cc:3053
#12 0x0000000002e8ebc0 in main (argc=5, argv=0x7ffeca079ae8) at ../src/node_main.cc:124

In logs:

XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR40, bW42
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR49, bW308
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR34751, bW334
XXX this 0x9c8f8c0 -> stream_ is not null: 1
XXX this 0x9c8f8c0 -> stream_ 0x9be5d30: listener 0x9c8f8f0, bR83060, bW360
XXX this 0x9c8f8c0 -> stream_ is not null: 0

That means that stream_ is null in some cases.

I didn't use valgrind before. All with valgrind will be work too slow then debug node. To reproduce the bug I send a lot of requests to service.

@addaleax
Copy link
Member

Okay, thanks for the info so far – that’s definitely helpful.

@lomaster1
Copy link
Author

Some times I got this error:

TypeError: Cannot read property 'finishWrite' of null
    at JSStreamWrap.finishWrite (internal/wrap_js_stream.js:195:12)
    at Immediate.setImmediate (internal/wrap_js_stream.js:218:12)
    at runCallback (timers.js:705:18)
    at tryOnImmediate (timers.js:676:5)
    at processImmediate (timers.js:658:5)

on the same steps when I'm repeating segfaults. Maybe it's related to issue.

Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy). I send a lot of requests to the service and server CPU is about 95-100% (maybe it's because node is in debug). and all works slow. I noticed when Duplex closed, for example, by timeout, this error or segfault happened. There are also caught errors in the log:

Error [ERR_STREAM_WRITE_AFTER_END]: write after end
    at writeAfterEnd (_stream_writable.js:248:12)
    at Duplex.Writable.write (_stream_writable.js:296:5) // it's Duplex stream to proxy
    at Duplex.fake_socket.write (/var/product/module.js:694:31) // our module
    at JSStreamWrap.doWrite (internal/wrap_js_stream.js:160:19)
    at JSStream.onwrite (internal/wrap_js_stream.js:20:57)

@addaleax
Copy link
Member

Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy)

That makes it likely that looking at your code would be very helpful – would you be willing to share it, or at least the parts that implement that Duplex stream?

@lomaster1
Copy link
Author

Some additional information: we're using TlsSocket around our own Duplex stream. We wrote own protocol to send/receive data from our proxy server (Http2->TlsSocket->Duplex->Proxy)

That makes it likely that looking at your code would be very helpful – would you be willing to share it, or at least the parts that implement that Duplex stream?

I can't share the code.
Duplex stream is a stream in multiplexing WS connection (like as http2 streams). It has a lot of code.

@lomaster1
Copy link
Author

lomaster1 commented Oct 15, 2019

New information: we have a timer to cancel a long query and we're calling http2stream.close(http2.constants.NGHTTP2_CANCEL); if timeout. There are no segfaults without this close call. UPD: there is one crash for many tries (>6).
Also, I checked destroyed, aborted and closed flags for http2stream and session before calling the close method and got a segfault. The same segfault if call the close method without arguments.

@lomaster1
Copy link
Author

lomaster1 commented Oct 16, 2019

In production, this solution (no http2stream.close if timeout) worked not good - segfaults returned.
UPD: not so many as before

@lomaster1
Copy link
Author

I created wrapper (new duplex) around our duplex stream to the proxy and segfaults gone. I'm passing with a wrapper to tls socket and then tls socket to http2.

class TunnelStreamWrapper extends Duplex {
    constructor(source, opt){
        super(opt);
        this.source = source;
        this._source_on_data = this._source_on_data.bind(this);
        this._source_on_error = this._source_on_error.bind(this);
        this._source_on_close = this._source_on_close.bind(this);
        this.source.on('data', this._source_on_data);
        this.source.on('error', this._source_on_error);
        this.source.once('close', this._source_on_close);
    }
    _source_on_data(d){ this.push(d); }
    _source_on_error(e){ this.emit('error', e); }
    _source_on_close(){
        // setImmediate important to prevent error 'Cannot read
        // property 'finishWrite' of null' and crash
        setImmediate(this.destroy.bind(this));
    }
    _write(chunk, encoding, cb){
        if (this.source.writable)
            this.source.write(chunk, encoding, cb);
        else
            cb(new Error('Source socket is not writable'));
    }
    _read(size){}
    _destroy(e, cb){
        this.source.off('data', this._source_on_data);
        this.source.off('error', this._source_on_error);
        this.source.destroy();
        this.source = undefined;
        cb(e);
    }
}

Also, I have been testing with emitClose: false for our Duplex to proxy (used without this wrapper) and segfaults gone too.
When I added this.emit('close') to wrapper when source stream emitted close, segfaults returned.

@pimterry
Copy link
Member

I'm having a similar-looking segfault issue in my server, only when using HTTP/2. @lomaster1 / @addaleax did you ever get to the bottom of this?

Reproducible for me in node 10.15.3, 10.21.0 and 12.18.3.

I'm also wrapping streams up as sockets in some cases in my app, using the built-in _stream_wrap, but I've now disabled that everywhere and I'm still seeing this behaviour, so I'm fairly confident it's unrelated.

Some example traces, via segfault-handler:

.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7fdb5a5fb127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fdb735f58a0]
node(_ZN4node7TLSWrap6EncOutEv+0x170)[0xa09010]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x2c7)[0xa0a6c7]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ce)[0x93b5ae]
node(_ZN4node5http212Http2Session5CloseEjb+0xda)[0x93c4fa]
node[0xb62a3f]
node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xb9)[0xb635a9]
[0xcec6c2dbe1d]
[1]    30818 segmentation fault (core dumped)  node ./bin/run start
PID 26794 received SIGSEGV for address: 0x20
.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7f7dd00b4127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f7dd1ebc8a0]
node(_ZN4node10StreamBase5WriteEP8uv_buf_tmP11uv_stream_sN2v85LocalINS5_6ObjectEEE+0x2b)[0xad97cb]
node(_ZN4node7TLSWrap6EncOutEv+0x2a7)[0xb53e07]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x438)[0xb554d8]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ec)[0xa40b4c]
node(_ZN4node5http212Http2Session5CloseEjb+0x26e)[0xa413be]
node[0xbf0239]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xb7)[0xbf2027]
node[0x13cf119]
[1]    26794 segmentation fault (core dumped)  node ./bin/run start
PID 32363 received SIGSEGV for address: 0x20
.../node_modules/segfault-handler/build/Release/segfault-handler.node(+0x3127)[0x7fc4c85f7127]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fc4d0d428a0]
node(_ZN4node10StreamBase5WriteEP8uv_buf_tmP11uv_stream_sN2v85LocalINS5_6ObjectEEE+0x2b)[0xad97cb]
node(_ZN4node7TLSWrap6EncOutEv+0x2a7)[0xb53e07]
node(_ZN4node7TLSWrap7DoWriteEPNS_9WriteWrapEP8uv_buf_tmP11uv_stream_s+0x438)[0xb554d8]
node(_ZN4node5http212Http2Session15SendPendingDataEv+0x4ec)[0xa40b4c]
node(_ZN4node5http212Http2Session5CloseEjb+0x26e)[0xa413be]
node[0xbf0239]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xb7)[0xbf2027]
node[0x13cf119]

Any clues would be much appreciated! I'm still doing some fairly funky networking tricks in places (unwrapping CONNECT requests, peeking at first bytes of connections), I'm going to remove each of those one by one and see if I can hunt down a clearer culprit.

For my case, all the code involved is open-source (see https://github.com/httptoolkit/httpolyglot/blob/master/lib/index.js and https://github.com/httptoolkit/mockttp/tree/h2, especially http-combo-server) but it's tricky to set up - currently my most reliable option for reproducing this is by proxying connections from Tiktok on my Android phone for 2 minutes 😄.

@pimterry
Copy link
Member

Ok, I'm not sure if I've narrowed down the root cause, but I've found a reliable workaround.

Currently, I accept CONNECT requests over HTTP/1, and then call server.emit('connection', socket) after the initial setup for my HTTP/2 server, effectively passing the tunnelled connection right back to myself. It sounds like @lomaster1 might be doing something similar. That looks like this:

h1Server.on('connect', (req, socket) => {
    socket.write('HTTP/' + req.httpVersion + ' 200 OK\r\n\r\n', 'utf-8', () => {
        h2Server.emit('connection', socket);
    });
});

If I add const SocketWrapper = require('_stream_wrap') and change the above to h2Server.emit('connection', new SocketWrapper(socket)), the segfault here disappears completely, I can't reproduce it at all.

Not sure why, I can only guess that using the stream wrapper moves it to the simpler (and slower) stream path, avoiding the underlying bug. Regardless of the root cause, that seems to be sufficient as a workaround for me for now.

@pimterry
Copy link
Member

I'm now fairly sure this was fixed by #41185 in v16.14+ and this can be closed. I no longer see this issue in production despite heavy use of all the above.

@bnoordhuis
Copy link
Member

You're almost certainly right. Thanks, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem.
Projects
None yet
Development

No branches or pull requests

4 participants