Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tls] timeout for TLS handshakes #11426

Closed
antoniovicente opened this issue Jun 3, 2020 · 13 comments · Fixed by #13800
Closed

[tls] timeout for TLS handshakes #11426

antoniovicente opened this issue Jun 3, 2020 · 13 comments · Fixed by #13800
Assignees
Labels

Comments

@antoniovicente
Copy link
Contributor

It would be good to have individual config knobs to control timeouts for TLS connections that are in waiting for handshake to start and are in the middle of the handshake. AFAIK there is an existing connection idle_timeout config but it tends to be relatively long because it is meant to cover cases where an idle H1 connection is waiting for the next request.

The handshake timeout should cover the full operation since it is reasonable for clients to complete handshakes relatively quickly(within 5-15secs) once started. A longer timeout may be appropriate for connections that are waiting for the handshake to start in order to accommodate for connection prefetching.

The effective value of these timeouts could be reduced based on memory pressure to increase resiliency to memory exhaustion attacks and improving the proxy's ability to accept legitimate connections/requests. Config strawman: min/max timeouts and high/low memory thresholds at which the timeouts apply or high/low connection count at which timeouts apply

@akonradi
Copy link
Contributor

akonradi commented Jun 5, 2020

This seems like something that should be added to the DownstreamTlsContext proto. If that sounds reasonable, I'd be happy to address the first part of this issue (adding the configuration knobs).

@antoniovicente
Copy link
Contributor Author

CC @envoyproxy/api-shepherds

DownstreamTlsContext proto seems like the right place, but you should double check with the api-shepherds

@htuch
Copy link
Member

htuch commented Jun 8, 2020

Makes sense to me, @lizan WDYT?

@lizan
Copy link
Member

lizan commented Jun 8, 2020

For upstream TLS the handshake is included in connect_timeout, wondering whether we should do this similarly?

@antoniovicente
Copy link
Contributor Author

A longer timeout before handshake starts is appropriate in some cases to allow for TCP prefetches.

@ldemailly
Copy link

If i have a tls listener it seems that it will sit there permanently if the client just connects and doesn't ever send anything? is that what we expect? (when I read "For TLS connections, the connect timeout includes the TLS handshake." on https://www.envoyproxy.io/docs/envoy/latest/faq/configuration/timeouts#tcp I mistakenly expect that it applies to my inbound connection too but it isn't?) am I missing a listener side/tls filter side timeout setting?

@antoniovicente
Copy link
Contributor Author

I think that the idle HTTP connection timeout applies in this case. Unfortunately it has a very long default value of 1 hour.

google.protobuf.Duration idle_timeout = 1;

@ldemailly
Copy link

ldemailly commented Jul 10, 2020

thanks @antoniovicente but lets say I would set the idle_timeout to 10s; after the tls handshake completes, I don't want to drop established connections that idle for 10s - I really need a "please complete the layer 7 connection (ie post tls) within X" not a idle timeout for the whole duration of the flow (and/or the ability to change the timeout after tls is done)

@antoniovicente
Copy link
Contributor Author

thanks @antoniovicente but lets say I would set the idle_timeout to 10s; after the tls handshake completes, I don't want to drop established connections that idle for 10s - I really need a "please complete the layer 7 connection (ie post tls) within X" not a idle timeout for the whole duration of the flow (and/or the ability to change the timeout after tls is done)

I agree with you. I think we need more specific timeouts for various operations that happen prior to a request being active on the connection. Currently those specific timeouts don't exist. See #11427 for a related feature request.

@ldemailly
Copy link

I see the other issue is closed but I can't quite tell if I can now set a tls establishment timeout or not?

@akonradi
Copy link
Contributor

Yep, #13610 added that. #13800 adds the ability to scale the timeout in response to overload, which I think should close this issue out completely.

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jan 15, 2021
@akonradi
Copy link
Contributor

Not stale! #14679 implements some refactoring requested on #13800.

@antoniovicente antoniovicente removed the stale stalebot believes this issue/PR has not been touched recently label Jan 15, 2021
lizan pushed a commit that referenced this issue Jan 25, 2021
Add support for scaling the transport socket connect timeout with load.

Risk Level: low
Testing: added tests and ran affected tests
Docs Changes: none
Release Notes: none
Platform Specific Features: none
Fixes: #11426

Signed-off-by: Alex Konradi <akonradi@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants