Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connection: propagate I/O errors to network filters. #13941

Closed
2 changes: 2 additions & 0 deletions include/envoy/api/io_error.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,8 @@ class IoError {
Interrupt,
// Requested a nonexistent interface or a non-local source address.
AddressNotAvailable,
// Connection reset by peer.
ConnectionResetByPeer,
// Bad file descriptor.
BadFd,
// Other error codes cannot be mapped to any one above in getErrorCode().
Expand Down
2 changes: 2 additions & 0 deletions include/envoy/common/platform.h
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,7 @@ struct msghdr {
#define SOCKET_ERROR_ADDR_NOT_AVAIL WSAEADDRNOTAVAIL
#define SOCKET_ERROR_INVAL WSAEINVAL
#define SOCKET_ERROR_ADDR_IN_USE WSAEADDRINUSE
#define SOCKET_ERROR_ECONNRESET WSAECONNRESET

#define HANDLE_ERROR_PERM ERROR_ACCESS_DENIED
#define HANDLE_ERROR_INVALID ERROR_INVALID_HANDLE
Expand Down Expand Up @@ -240,6 +241,7 @@ typedef int filesystem_os_id_t; // NOLINT(modernize-use-using)
#define SOCKET_ERROR_ADDR_NOT_AVAIL EADDRNOTAVAIL
#define SOCKET_ERROR_INVAL EINVAL
#define SOCKET_ERROR_ADDR_IN_USE EADDRINUSE
#define SOCKET_ERROR_ECONNRESET ECONNRESET

// Mapping POSIX file errors to common error names
#define HANDLE_ERROR_PERM EACCES
Expand Down
4 changes: 3 additions & 1 deletion include/envoy/network/post_io_action.h
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,9 @@ enum class PostIoAction {
// Close the connection.
Close,
// Keep the connection open.
KeepOpen
KeepOpen,
antoniovicente marked this conversation as resolved.
Show resolved Hide resolved
// Close the connection because of an error.
CloseError,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that a majority of Close cases are being renamed CloseError.

Should we also rename Close to CloseGraceful or similar as a way to improve our chances of catching all cases that need to be adjusted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

};

} // namespace Network
Expand Down
34 changes: 31 additions & 3 deletions source/common/network/connection_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,12 @@ namespace {

constexpr absl::string_view kTransportSocketConnectTimeoutTerminationDetails =
"transport socket timeout was reached";
constexpr absl::string_view kDownstreamConnectionTerminationDetails =
"downstream connection was terminated";
constexpr absl::string_view kUpstreamConnectionTerminationDetails =
"upstream connection was terminated";

}
} // namespace

void ConnectionImplUtility::updateBufferStats(uint64_t delta, uint64_t new_total,
uint64_t& previous_total, Stats::Counter& stat_total,
Expand Down Expand Up @@ -588,6 +592,19 @@ void ConnectionImpl::onReadReady() {
result.action_ = PostIoAction::Close;
}

if (result.action_ == PostIoAction::CloseError) {
if (dynamic_cast<ServerConnectionImpl*>(this)) {
stream_info_.setConnectionTerminationDetails(kDownstreamConnectionTerminationDetails);
stream_info_.setResponseFlag(StreamInfo::ResponseFlag::DownstreamConnectionTermination);
} else {
stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails);
stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails);
stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delegate stats update to virtual method so you can avoid the dynamic cast.

}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those values are set on the upstream connection's stream info, and they are never propagated to downstream connection's stream info or access logs. I'm not sure what's the best way to solve that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the high level request object query the connection as part of the termination process to determine if termination was graceful or abrupt?

// Force "end_stream" so that filters can process this error.
result.end_stream_read_ = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually associate end_stream_read_ with graceful termination. Is it appropriate to set this? Do we need alternate signaling to notify filters about abrupt terminations via either additional arguments to push pipeline or a new virtual method which is called on abrupt termination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree; I think of end_stream as always a graceful operation. I think a different signal for error is more appropriate.

Copy link
Contributor Author

@PiotrSikora PiotrSikora Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that there is no different signal right now, and if we don't set this, then network filters won't be executed at all.

Note that this PR is meant to be a temporary fix until we have a proper solution for #13940, but that's going to require a lot more changes and time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a massive hack. How much effort would it be to implement the real fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on what do you consider a real fix. I'm not too familiar with that part of the codebase, but if we want to propagate upstream connection events to network filters, then I imagine it's quite a lot of changes, so I think it would be better if one of maintainers could work on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway

I don't know what to suggest. It would be good to figure out how to do a catch-all cleanup of WASM resources on abnormal connection termination. Could WASM detect connection termination based on an L4 filter that does the relevant signaling on destruction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we have a workaround in #13836 that works for Wasm.

}

read_end_stream_ |= result.end_stream_read_;
if (result.bytes_processed_ != 0 || result.end_stream_read_ ||
(latched_dispatch_buffered_data && read_buffer_.length() > 0)) {
Expand All @@ -598,7 +615,7 @@ void ConnectionImpl::onReadReady() {
}

// The read callback may have already closed the connection.
if (result.action_ == PostIoAction::Close || bothSidesHalfClosed()) {
if (result.action_ != PostIoAction::KeepOpen || bothSidesHalfClosed()) {
ENVOY_CONN_LOG(debug, "remote close", *this);
closeSocket(ConnectionEvent::RemoteClose);
}
Expand Down Expand Up @@ -651,11 +668,22 @@ void ConnectionImpl::onWriteReady() {
uint64_t new_buffer_size = write_buffer_->length();
updateWriteBufferStats(result.bytes_processed_, new_buffer_size);

if (result.action_ == PostIoAction::CloseError) {
if (dynamic_cast<ServerConnectionImpl*>(this)) {
stream_info_.setConnectionTerminationDetails(kDownstreamConnectionTerminationDetails);
stream_info_.setResponseFlag(StreamInfo::ResponseFlag::DownstreamConnectionTermination);
} else {
stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails);
stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails);
stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
}
}

// NOTE: If the delayed_close_timer_ is set, it must only trigger after a delayed_close_timeout_
// period of inactivity from the last write event. Therefore, the timer must be reset to its
// original timeout value unless the socket is going to be closed as a result of the doWrite().

if (result.action_ == PostIoAction::Close) {
if (result.action_ != PostIoAction::KeepOpen) {
// It is possible (though unlikely) for the connection to have already been closed during the
// write callback. This can happen if we manage to complete the SSL handshake in the write
// callback, raise a connected event, and close the connection.
Expand Down
2 changes: 2 additions & 0 deletions source/common/network/io_socket_error_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ Api::IoError::IoErrorCode IoSocketError::getErrorCode() const {
return IoErrorCode::Interrupt;
case SOCKET_ERROR_ADDR_NOT_AVAIL:
return IoErrorCode::AddressNotAvailable;
case SOCKET_ERROR_ECONNRESET:
return IoErrorCode::ConnectionResetByPeer;
default:
ENVOY_LOG_MISC(debug, "Unknown error code {} details {}", errno_, getErrorDetails());
return IoErrorCode::UnknownError;
Expand Down
4 changes: 2 additions & 2 deletions source/common/network/raw_buffer_socket.cc
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ IoResult RawBufferSocket::doRead(Buffer::Instance& buffer) {
ENVOY_CONN_LOG(trace, "read error: {}", callbacks_->connection(),
result.err_->getErrorDetails());
if (result.err_->getErrorCode() != Api::IoError::IoErrorCode::Again) {
action = PostIoAction::Close;
action = PostIoAction::CloseError;
}
break;
}
Expand Down Expand Up @@ -73,7 +73,7 @@ IoResult RawBufferSocket::doWrite(Buffer::Instance& buffer, bool end_stream) {
if (result.err_->getErrorCode() == Api::IoError::IoErrorCode::Again) {
action = PostIoAction::KeepOpen;
} else {
action = PostIoAction::Close;
action = PostIoAction::CloseError;
}
break;
}
Expand Down
12 changes: 6 additions & 6 deletions source/extensions/transport_sockets/alts/tsi_socket.cc
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,7 @@ Network::PostIoAction TsiSocket::doHandshakeNextDone(NextResultPtr&& next_result

if (status != TSI_INCOMPLETE_DATA && status != TSI_OK) {
ENVOY_CONN_LOG(debug, "TSI: Handshake failed: status: {}", callbacks_->connection(), status);
return Network::PostIoAction::Close;
return Network::PostIoAction::CloseError;
}

if (next_result->to_send_->length() > 0) {
Expand Down Expand Up @@ -145,7 +145,7 @@ Network::PostIoAction TsiSocket::doHandshakeNextDone(NextResultPtr&& next_result
if (read_error_ || (!handshake_complete_ && end_stream_read_)) {
ENVOY_CONN_LOG(debug, "TSI: Handshake failed: end of stream without enough data",
callbacks_->connection());
return Network::PostIoAction::Close;
return Network::PostIoAction::CloseError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 remaining uses of Network::PostIoAction::Close; in this file when finding a read 0 during handshakes. Would it make sense to also consider those cases CloseError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed one to CloseError and one to CloseGraceful (when the certificate validation failed, since that's not and an I/O error).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ggreenway

I guess the question is wherever or it is worth having the extra complexity that comes from having 2 close types when out of the 3 remaining uses 2 of them are handshake errors. I guess it's fine to use CloseGraceful for the 3 remaining cases.

}

if (raw_read_buffer_.length() > 0) {
Expand All @@ -167,7 +167,7 @@ Network::IoResult TsiSocket::doRead(Buffer::Instance& buffer) {
ENVOY_CONN_LOG(debug, "TSI: raw read result action {} bytes {} end_stream {}",
callbacks_->connection(), enumToInt(result.action_), result.bytes_processed_,
result.end_stream_read_);
if (result.action_ == Network::PostIoAction::Close && result.bytes_processed_ == 0) {
if (result.action_ != Network::PostIoAction::KeepOpen && result.bytes_processed_ == 0) {
return result;
}

Expand All @@ -176,12 +176,12 @@ Network::IoResult TsiSocket::doRead(Buffer::Instance& buffer) {
}

end_stream_read_ = result.end_stream_read_;
read_error_ = result.action_ == Network::PostIoAction::Close;
read_error_ = result.action_ == Network::PostIoAction::CloseError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be:
result.action_ != Network::PostIoAction::KeepOpen;

or consider renaming read_error_ to got_close_error_

I guess one thing to keep in mind is that as of this PR RawBufferSocket can only return KeepOpen or CloseError from doRead. Should we consider all Close PostIoActions as errors?

}

if (!handshake_complete_) {
Network::PostIoAction action = doHandshake();
if (action == Network::PostIoAction::Close || !handshake_complete_) {
if (action != Network::PostIoAction::KeepOpen || !handshake_complete_) {
return {action, 0, false};
}
}
Expand Down Expand Up @@ -244,7 +244,7 @@ void TsiSocket::onNextDone(NextResultPtr&& result) {
handshaker_next_calling_ = false;

Network::PostIoAction action = doHandshakeNextDone(std::move(result));
if (action == Network::PostIoAction::Close) {
if (action != Network::PostIoAction::KeepOpen) {
callbacks_->connection().close(Network::ConnectionCloseType::NoFlush);
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ Network::IoResult UpstreamProxyProtocolSocket::writeHeader() {
ENVOY_CONN_LOG(trace, "write error: {}", callbacks_->connection(),
result.err_->getErrorDetails());
if (result.err_->getErrorCode() != Api::IoError::IoErrorCode::Again) {
action = Network::PostIoAction::Close;
action = Network::PostIoAction::CloseError;
}
break;
}
Expand Down
2 changes: 1 addition & 1 deletion source/extensions/transport_sockets/tls/ssl_handshaker.cc
Original file line number Diff line number Diff line change
Expand Up @@ -242,7 +242,7 @@ Network::PostIoAction SslHandshakerImpl::doHandshake() {
return PostIoAction::KeepOpen;
default:
handshake_callbacks_->onFailure();
return PostIoAction::Close;
return PostIoAction::CloseError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the PostIoAction::Close on line 233 above a graceful or error close?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graceful.

}
}
}
Expand Down
10 changes: 5 additions & 5 deletions source/extensions/transport_sockets/tls/ssl_socket.cc
Original file line number Diff line number Diff line change
Expand Up @@ -110,7 +110,7 @@ Network::IoResult SslSocket::doRead(Buffer::Instance& read_buffer) {
if (info_->state() != Ssl::SocketState::HandshakeComplete &&
info_->state() != Ssl::SocketState::ShutdownSent) {
PostIoAction action = doHandshake();
if (action == PostIoAction::Close || info_->state() != Ssl::SocketState::HandshakeComplete) {
if (action != PostIoAction::KeepOpen || info_->state() != Ssl::SocketState::HandshakeComplete) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also change post actions in NotReadySslSocket to CloseError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// end_stream is false because either a hard error occurred (action == Close) or
// the handshake isn't complete, so a half-close cannot occur yet.
return {action, 0, false};
Expand Down Expand Up @@ -154,7 +154,7 @@ Network::IoResult SslSocket::doRead(Buffer::Instance& read_buffer) {
// Renegotiation has started. We don't handle renegotiation so just fall through.
default:
drainErrorQueue();
action = PostIoAction::Close;
action = PostIoAction::CloseError;
break;
}

Expand Down Expand Up @@ -182,7 +182,7 @@ void SslSocket::onPrivateKeyMethodComplete() {

// Resume handshake.
PostIoAction action = doHandshake();
if (action == PostIoAction::Close) {
if (action != PostIoAction::KeepOpen) {
ENVOY_CONN_LOG(debug, "async handshake completion error", callbacks_->connection());
callbacks_->connection().close(Network::ConnectionCloseType::FlushWrite);
}
Expand Down Expand Up @@ -231,7 +231,7 @@ Network::IoResult SslSocket::doWrite(Buffer::Instance& write_buffer, bool end_st
if (info_->state() != Ssl::SocketState::HandshakeComplete &&
info_->state() != Ssl::SocketState::ShutdownSent) {
PostIoAction action = doHandshake();
if (action == PostIoAction::Close || info_->state() != Ssl::SocketState::HandshakeComplete) {
if (action != PostIoAction::KeepOpen || info_->state() != Ssl::SocketState::HandshakeComplete) {
return {action, 0, false};
}
}
Expand Down Expand Up @@ -270,7 +270,7 @@ Network::IoResult SslSocket::doWrite(Buffer::Instance& write_buffer, bool end_st
// Renegotiation has started. We don't handle renegotiation so just fall through.
default:
drainErrorQueue();
return {PostIoAction::Close, total_bytes_written, false};
return {PostIoAction::CloseError, total_bytes_written, false};
}

break;
Expand Down
8 changes: 5 additions & 3 deletions test/extensions/transport_sockets/alts/tsi_socket_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -327,12 +327,13 @@ TEST_F(TsiSocketTest, HandshakeWithImmediateReadError) {
initialize(nullptr, nullptr);

EXPECT_CALL(*client_.raw_socket_, doRead(_)).WillOnce(Invoke([&](Buffer::Instance& buffer) {
Network::IoResult result = {Network::PostIoAction::Close, server_to_client_.length(), false};
Network::IoResult result = {Network::PostIoAction::CloseError, server_to_client_.length(),
false};
buffer.move(server_to_client_);
return result;
}));
EXPECT_CALL(*client_.raw_socket_, doWrite(_, false)).Times(0);
expectIoResult({Network::PostIoAction::Close, 0UL, false},
expectIoResult({Network::PostIoAction::CloseError, 0UL, false},
client_.tsi_socket_->doRead(client_.read_buffer_));
EXPECT_EQ("", client_to_server_.toString());
EXPECT_EQ(0L, client_.read_buffer_.length());
Expand All @@ -344,7 +345,8 @@ TEST_F(TsiSocketTest, HandshakeWithReadError) {
doFakeInitHandshake();

EXPECT_CALL(*client_.raw_socket_, doRead(_)).WillOnce(Invoke([&](Buffer::Instance& buffer) {
Network::IoResult result = {Network::PostIoAction::Close, server_to_client_.length(), false};
Network::IoResult result = {Network::PostIoAction::CloseError, server_to_client_.length(),
false};
buffer.move(server_to_client_);
return result;
}));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -174,7 +174,7 @@ TEST_F(ProxyProtocolTest, ReturnsCloseWhenWriteErrorIsNotAgain) {
}

auto resp = proxy_protocol_socket_->doWrite(msg, false);
EXPECT_EQ(Network::PostIoAction::Close, resp.action_);
EXPECT_EQ(Network::PostIoAction::CloseError, resp.action_);
}

// Test injects V1 PROXY protocol using upstream addresses when transport options are null
Expand Down
10 changes: 5 additions & 5 deletions test/extensions/transport_sockets/tls/handshaker_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -134,7 +134,7 @@ TEST_F(HandshakerTest, NormalOperation) {

// Run the handshakes from the client and server until SslHandshakerImpl decides
// we're done and returns PostIoAction::Close.
while (post_io_action != Network::PostIoAction::Close) {
while (post_io_action == Network::PostIoAction::KeepOpen) {
SSL_do_handshake(client_ssl_.get());
post_io_action = handshaker.doHandshake();
}
Expand All @@ -159,13 +159,13 @@ TEST_F(HandshakerTest, ErrorCbOnAbnormalOperation) {

auto post_io_action = Network::PostIoAction::KeepOpen; // default enum

while (post_io_action != Network::PostIoAction::Close) {
while (post_io_action == Network::PostIoAction::KeepOpen) {
SSL_do_handshake(client_ssl_.get());
post_io_action = handshaker.doHandshake();
}

// In the error case, SslHandshakerImpl also closes the connection.
EXPECT_EQ(post_io_action, Network::PostIoAction::Close);
EXPECT_EQ(post_io_action, Network::PostIoAction::CloseError);
}

// Example SslHandshakerImpl demonstrating special-case behavior which necessitates
Expand Down Expand Up @@ -206,7 +206,7 @@ class SslHandshakerImplForTest : public SslHandshakerImpl {
return Network::PostIoAction::KeepOpen;
default:
handshakeCallbacks()->onFailure();
return Network::PostIoAction::Close;
return Network::PostIoAction::CloseError;
}
}
}
Expand All @@ -231,7 +231,7 @@ TEST_F(HandshakerTest, NormalOperationWithSslHandshakerImplForTest) {

auto post_io_action = Network::PostIoAction::KeepOpen; // default enum

while (post_io_action != Network::PostIoAction::Close) {
while (post_io_action == Network::PostIoAction::KeepOpen) {
SSL_do_handshake(client_ssl_.get());
post_io_action = handshaker.doHandshake();
}
Expand Down