Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overload: scale transport socket connect timeout #13800

Merged
merged 22 commits into from
Jan 25, 2021

Conversation

akonradi
Copy link
Contributor

@akonradi akonradi commented Oct 28, 2020

Commit Message: Scale transport socket connect timeout in response to load
Additional Description:
Add support for scaling the transport socket connect timeout with load.

Risk Level: low
Testing: added tests and ran affected tests
Docs Changes: none
Release Notes: none
Platform Specific Features: none
Fixes: #11426

/assign @KBaichoo

Splitting these will prevent circular dependencies when the Dispatcher
references thread-local overload state.

Signed-off-by: Alex Konradi <akonradi@google.com>
Thread the ThreadLocalOverloadState down into the ConnectionHandlerImpl
and ConnectionImpl and use it to create the timer for the transport
socket handshake timeout.

Signed-off-by: Alex Konradi <akonradi@google.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13800 was opened by akonradi.

see: more, trace.

KBaichoo
KBaichoo previously approved these changes Oct 28, 2020
Copy link
Contributor

@KBaichoo KBaichoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; most of it was refactoring / interface changes.

@@ -143,6 +142,8 @@ OverloadTimerType parseTimerType(
switch (config_timer_type) {
case Config::HTTP_DOWNSTREAM_CONNECTION_IDLE:
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout;
case Config::TRANSPORT_SOCKET_CONNECT:
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think you meant TransportSocketConnectTimeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's embarrassing, and that there wasn't a test that caught it is problematic. I've added a test to verify that the correct scale factors are applied for the different timer types.

@yanavlasov
Copy link
Contributor

format check failed

Add tests to verify that the correct scale factors are being applied
based on the overload manager action config.

Signed-off-by: Alex Konradi <akonradi@google.com>
Copy link
Contributor Author

@akonradi akonradi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed formatting too.

@@ -143,6 +142,8 @@ OverloadTimerType parseTimerType(
switch (config_timer_type) {
case Config::HTTP_DOWNSTREAM_CONNECTION_IDLE:
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout;
case Config::TRANSPORT_SOCKET_CONNECT:
return OverloadTimerType::HttpDownstreamIdleConnectionTimeout;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's embarrassing, and that there wasn't a test that caught it is problematic. I've added a test to verify that the correct scale factors are applied for the different timer types.

This addresses the clang-tidy issue

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@antoniovicente antoniovicente self-assigned this Nov 3, 2020
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see this happening. First set of comments related just to PR size and unrelated looking changes.

include/envoy/event/scaled_range_timer_manager.h Outdated Show resolved Hide resolved
// Test-only functions that need access to the internals of
// ScaledTimerMinimum. These are declared here but only defined in tests.
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&);
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to implement these functions outside tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if we want to write these as output anywhere. We don't need to yet, and this avoids (a tiny bit of) binary bloat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very unusual to implement these functions in a different library. Consider the case of "ForTest" methods. We tend to implement those in the same library as the main object despite they only being used in tests. I think that modern linkers are capable of removing unused code from the resulting binaries, but don't know if they do so by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sold, the confusion isn't worth the tiny bit of size gains. Moved these definitions inline.

#include "envoy/event/timer.h"
#include "envoy/thread_local/thread_local.h"
#include "envoy/event/dispatcher.h"
#include "envoy/server/overload/thread_local_overload_state.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactor to add a directory for overload manager components in the server subdirectory and thread_local_object.h adds a lot of noise to this review. Worth doing it as a separate change instead? Also see comments about forward declaring instead.

Example:

class Dispatcher;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do the refactor in a separate PR. I don't have strong opinions about forward declarations vs #includes other than an inclination to follow the style guide.


void ServerConnectionImpl::setTransportSocketConnectTimeout(std::chrono::milliseconds timeout) {
if (!transport_connect_pending_) {
return;
}
if (transport_socket_connect_timer_ == nullptr) {
transport_socket_connect_timer_ =
dispatcher_.createTimer([this] { onTransportSocketConnectTimeout(); });
overload_state_.createScaledTimer(Server::OverloadTimerType::TransportSocketConnectTimeout,
[this] { onTransportSocketConnectTimeout(); });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should revisit the possibility of having the dispatcher be the one that is aware of the overload state and provide a method to create scaled timers. Dispatcher is plumbed in widely.

I can see a counter argument involving our desire to bring overload manager closer to connections as a starting point for adjusting work done on wakeup based on overload state; something like smaller allocations/reads when under memory pressure. So this new plumbing may be more generally useful. But I think the direction we're taking there is to measure memory usage by connection, which actually doesn't require involvement from the overload mananger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, I'd rather keep scaled timer creation going through the Overload Manager since it is ultimately responsible for determining the scale factor of those timers. We can revisit making the Dispatcher aware of the OM later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of plumbing the OM widely, could the dispatcher hold a reference to it? I dislike that there are now two very different places to create timers: OM and dispatcher.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a chicken-and-egg problem there since the thread-local overload state requires a reference to the dispatcher. We could work around this, though, if that sounds preferable. It seems like the larger issue is that there is a divide between the dispatcher, which represents a thread worker, and thread-local state, which is managed elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dispatcher is also owned by the thread. I wonder if the solution would be to have the dispatcher own the thread overload state instead of having it be part of the thread local storage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dispatcher is also owned by the thread. I wonder if the solution would be to have the dispatcher own the thread overload state instead of having it be part of the thread local storage.

Yeah I think that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started working on this. It's doable but introduces some weirdness; see #14401 for more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a fresh attempt at this in #14679 which feels reasonably clean.

@@ -6,30 +6,13 @@

#include "envoy/common/pure.h"
#include "envoy/event/dispatcher.h"
#include "envoy/thread_local/thread_local_object.h"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the dispatcher could be forward declared and dispatcher interface library depedency removed as a dependency if that would make some of these changes easier.

include/envoy/event/dispatcher.h Outdated Show resolved Hide resolved
@@ -9,6 +9,7 @@
#include "envoy/api/api.h"
#include "envoy/network/listen_socket.h"
#include "envoy/network/listener.h"
#include "envoy/server/overload/overload_manager.h"
Copy link
Contributor

@antoniovicente antoniovicente Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this include needed? It seems that envoy/server/overload/thread_local_overload_state.h would be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither is needed since thread_local_overload_state.h is included in the interface header.

source/common/event/dispatcher_impl.h Outdated Show resolved Hide resolved
@htuch
Copy link
Member

htuch commented Nov 9, 2020

/lgtm api

Copy link
Contributor Author

@akonradi akonradi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do the refactoring in a separate PR, or to talk about making the thread-local overload state accessible via the dispatcher.

include/envoy/event/dispatcher.h Outdated Show resolved Hide resolved
include/envoy/event/scaled_range_timer_manager.h Outdated Show resolved Hide resolved
// Test-only functions that need access to the internals of
// ScaledTimerMinimum. These are declared here but only defined in tests.
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&);
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if we want to write these as output anywhere. We don't need to yet, and this avoids (a tiny bit of) binary bloat.

#include "envoy/event/timer.h"
#include "envoy/thread_local/thread_local.h"
#include "envoy/event/dispatcher.h"
#include "envoy/server/overload/thread_local_overload_state.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to do the refactor in a separate PR. I don't have strong opinions about forward declarations vs #includes other than an inclination to follow the style guide.

@antoniovicente
Copy link
Contributor

Happy to do the refactoring in a separate PR, or to talk about making the thread-local overload state accessible via the dispatcher.

See #13800 (comment)

If you decide to expose this via dispatcher instead, I suggest you add a createRangeTimer method to the dispatcher interface instead of providing a way to access thread locals via the dispatcher interface.

@akonradi
Copy link
Contributor Author

As discussed offline, I think I'd prefer keeping the scaled timer accessible via the OverloadManager, not the Dispatcher, since the scaling factor is determined by the OM. To that end, I've sent #13999 to do the refactoring requested above separate from this PR.

…-scaled-timeout

Signed-off-by: Alex Konradi <akonradi@google.com>
antoniovicente
antoniovicente previously approved these changes Nov 13, 2020
@@ -93,6 +93,11 @@ message ScaleTimersOverloadActionConfig {
// Adjusts the idle timer for downstream HTTP connections that takes effect when there are no active streams.
// This affects the value of :ref:`RouteAction.idle_timeout <envoy_v3_api_field_config.route.v3.RouteAction.idle_timeout>`.
HTTP_DOWNSTREAM_CONNECTION_IDLE = 1;

// Adjusts the timer for how long downstream clients have to finish transport-level negotiations before the connection is closed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This line seem very long. Can you rerun format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed manually since the formatter doesn't seem to touch proto comments.

// Test-only functions that need access to the internals of
// ScaledTimerMinimum. These are declared here but only defined in tests.
friend bool operator==(const ScaledTimerMinimum&, const ScaledTimerMinimum&);
friend std::ostream& operator<<(std::ostream&, const ScaledTimerMinimum&);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems very unusual to implement these functions in a different library. Consider the case of "ForTest" methods. We tend to implement those in the same library as the main object despite they only being used in tests. I think that modern linkers are capable of removing unused code from the resulting binaries, but don't know if they do so by default.

return ConnectionMocks{std::move(dispatcher), timer, std::move(transport_socket), file_event,
&file_ready_cb_};
return ConnectionMocks{std::move(dispatcher), {}, timer,
std::move(transport_socket), file_event, &file_ready_cb_};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unusual looking formatting, can you rerun format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reran format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that the {}'s around the arguments are confusing the formatter, ugh.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: will explicitly constructing Server::MockThreadLocalOverloadState() make formatter happy?

)YAML";

// These are the timer types according to the reduced timeouts config above.
constexpr std::pair<OverloadTimerType, Event::ScaledTimerMinimum> kReducedTimeoutsMinimums[]{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please move this list of cases inside the relevant test case to avoid having a large separation between these definitions and the test body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list matches exactly what kReducedTimeoutsConfig produces, so I wanted to keep them close together. Please push back if that's not a good enough reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know which of the two separations is more problematic. I would consider having both in the test case so everything is in 1 place, or move OverloadManagerImplTest.TimerTypesProduceCorrectMinimums so it is just below this list definition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the test case; that's a good option.

@akonradi
Copy link
Contributor Author

I'm trying to figure out what integrating the thread-local overload state into the dispatcher would look like. Should I consider that a blocker for getting this PR merged?

@antoniovicente
Copy link
Contributor

I'm trying to figure out what integrating the thread-local overload state into the dispatcher would look like. Should I consider that a blocker for getting this PR merged?

I defer to @ggreenway 's opinion.

Seems like you accumulated some merge conflicts as your other PRs which modify some of the same files got merged in.

@ggreenway
Copy link
Contributor

I'm trying to figure out what integrating the thread-local overload state into the dispatcher would look like. Should I consider that a blocker for getting this PR merged?

Yes, I'd like to have that sorted out before we merge this.

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Contributor Author

I've merged main. This is now using Dispatcher::createScaledTimer, which obviates the need for plumbing around the ThreadLocalOverloadState.

@akonradi
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13800 (comment) was created by @akonradi.

see: more, trace.

Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great otherwise.

@ggreenway could you take a look?

api/envoy/config/overload/v3/overload.proto Show resolved Hide resolved
…caled-timeout

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, aside from one nit.

include/envoy/event/scaled_timer.h Outdated Show resolved Hide resolved
…caled-timeout

Signed-off-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Alex Konradi <akonradi@google.com>
@akonradi
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #13800 (comment) was created by @akonradi.

see: more, trace.

@akonradi
Copy link
Contributor Author

@envoyproxy/api-shepherds for the new timer type

@lizan lizan merged commit 8814014 into envoyproxy:main Jan 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[tls] timeout for TLS handshakes
8 participants