-
Notifications
You must be signed in to change notification settings - Fork 2.2k
fixes graceful termination when aleth --test
is invoked but require…
#5146
base: master
Are you sure you want to change the base?
fixes graceful termination when aleth --test
is invoked but require…
#5146
Conversation
libweb3jsonrpc/UnixSocketServer.cpp
Outdated
FD_ZERO(&out); | ||
FD_ZERO(&err); | ||
FD_SET(fd, &in); | ||
struct timeval tv; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a more C++ way this could be timeval const tv = { timeoutMillis / 1000, timeoutMillis % 1000 }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little conservative. Thx. done! :-)
libweb3jsonrpc/UnixSocketServer.cpp
Outdated
@@ -103,11 +104,27 @@ bool UnixDomainSocketServer::StopListening() | |||
return false; | |||
} | |||
|
|||
static bool waitForReadable(int fd, unsigned timeoutMillis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move it into unnamed namespace above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please add a comment describing why it's needed
Codecov Report
@@ Coverage Diff @@
## develop #5146 +/- ##
==========================================
- Coverage 59.92% 59.7% -0.23%
==========================================
Files 337 337
Lines 27306 27271 -35
Branches 3173 3175 +2
==========================================
- Hits 16362 16281 -81
- Misses 9860 9906 +46
Partials 1084 1084 |
Can there be any better approach than calling |
hey @gumb0. of course there is, by using However, I didn't wanna be too intrusive for a very first PR in this repo. I can however look into finding a somewhat minimal solution of the latter suggestion I made. |
Let's take a step back, can you describe how exactly do you reproduce it? Now when I tried the following steps
|
…d a `SIGKILL` to terminate. This was because the UnixSocketServer requires at least one accepting connection *after* SIGTERM/SIGINT has been received, which usually never is the case. This PR implements a trivial workaround by first waiting for up to N msecs (N=1000 for now) for any input to arrive, and if not, it'll just retry, hence, evaluating m_running, resulting in `aleth --test` to properly terminate upon receival of `SIGINT` or `SIGTERM`.
libweb3jsonrpc/UnixSocketServer.cpp
Outdated
FD_ZERO(&err); | ||
FD_SET(fd, &in); | ||
timeval tv { timeoutMillis / 1000, timeoutMillis % 1000 }; | ||
return select(fd + 1, &in, &out, &err, &tv) > 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What fd +
does?
libweb3jsonrpc/UnixSocketServer.cpp
Outdated
// We do this test before calling accept() in order to gracefully exit | ||
// this function when an external thread has set m_running to true | ||
// (such as a signal handler in the main thread). | ||
if (!waitForReadable(m_socket, 500)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is not way to add the timeout to accept()?
Try without " send some RPC calls". |
Still terminates fine without RPC calls. Also tried running soltest and SIGINT after that, also aleth is terminated fine. |
I tried it tool and it terminates quickly. Can it be macOS @christianparpart ? |
So I was facing this issue on "Windows Subsystem for Linux" (which basically is a Linux kernel & ABI re-implementation within the Windows NT kernel). However, I was trying to reproduce this behavior with a minimal test case, and I believe that I get the same behavior even on Ubuntu 16.04's Linux kernel 4.8.0 with the following test case (link below). So I may still be absolutely brain-dead right now, not finding the bug in my head (or we need this patch) https://gist.github.com/christianparpart/8b89c9dbfba83478d1308ecc32b9753d This code is a minimalistic TCP pong server that shares the same behavior this PR tries to fix within aleth's AF_UNIX listener code. |
I digged a little bit into it. it's actually not accept() being cancelled with -1 (EINTR) due to the SIGINT but a -1 (EINVAL) due to the unlink() on the socket. While that one should work on all UNIX'es, I don't think this is a clean solution, as only a side effect of closing a socket while being in a blocking I/O is made use of, which seems unreliable wrt. platform independence (?). I do like your solution (lean, but unfortunately wasn't quite obvious from the beginning), but it doesn't work on my end it seems. |
So we had in place a solution I hoped for, it just seems to not work on exotic systems. I would prefer to not merge this (it complicates things quite a bit) and resolving this by you @christianparpart submitting a bug report to Microsoft (is this even a bug?) Otherwise needs some work on better logging / better error reporting / better code formatting. |
|
I'd adapt the code to conform to coding style and "codecov" once you say I can proceed (:+1:). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please apply clang-format to your changes as described in https://github.com/ethereum/cpp-ethereum/blob/master/CONTRIBUTING.md
@@ -54,16 +57,45 @@ fs::path getIpcPathOrDataDir() | |||
return getDataDir(); | |||
return path; | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use only //
for multiline comments
* @retval true One file descriptor became available. | ||
* @retval false Most likely an error occured, or a timeout. | ||
*/ | ||
static bool waitForReadable(std::initializer_list<int> sockets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be declared static
as it's in the unnamed namespace
@@ -92,6 +124,10 @@ bool UnixDomainSocketServer::StartListening() | |||
|
|||
bool UnixDomainSocketServer::StopListening() | |||
{ | |||
int dummy = 0x90; | |||
if (::write(m_exitChannel[1], &dummy, sizeof(dummy)) < 0) | |||
cerr << "Failed to notify UNIX domain server loop. " << strerror(errno) << "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better output it to clog(VerbosityWarn, "rpc")
, It will not need \n
in the end.
{ | ||
if (pipe(m_exitChannel) < 0) | ||
abort(); // I don't like. Replace me with throw. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throw exception here.
Declare it in IpcServerBase.h
as
DEV_SIMPLE_EXCEPTION(FailedToCreatePipe);
Then throw here like
BOOST_THROW_EXCEPTION(FailedToCreatePipe());
Fixes graceful termination when
aleth --test
is invoked but required aSIGKILL
to terminate.This was because the UnixSocketServer requires at least one accepting
connection after SIGTERM/SIGINT has been received, which usually never
is the case.
This PR implements a trivial workaround by first waiting for up to N
msecs (N=1000 for now) for any input to arrive, and if not, it'll just
retry, hence, evaluating m_running, resulting in
aleth --test
toproperly terminate upon receival of
SIGINT
orSIGTERM
.