
Add on-by-default option to use TCP keepalive in the REST client. #5319

Merged — 2 commits into dev, Sep 30, 2024

Conversation

@teo-tsirpanis (Member) commented Sep 25, 2024

SC-56102

This PR enables TCP keepalive by default on REST client connections and adds an option to disable it. TCP keepalive is expected to reduce failure rates on large remote queries.


TYPE: FEATURE
DESC: Connections to TileDB Cloud use TCP keepalive by default.


TYPE: CONFIG
DESC: Add the rest.curl.tcp_keepalive config option, which controls whether TCP keepalive is used for TileDB Cloud connections. It is enabled by default.
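
For reference, a minimal sketch of how a client would opt out of the new default, using the standard TileDB C++ config API (illustrative only, not part of this PR's diff):

```cpp
#include <tiledb/tiledb>

int main() {
  tiledb::Config cfg;
  // New option introduced by this PR; defaults to "true".
  cfg["rest.curl.tcp_keepalive"] = "false";  // opt out of TCP keepalive
  // REST connections made through this context will not use keepalive.
  tiledb::Context ctx(cfg);
  return 0;
}
```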

@teo-tsirpanis marked this pull request as ready for review September 25, 2024 23:49
@teo-tsirpanis requested a review from a team as a code owner September 25, 2024 23:49
@ypatia (Member) left a comment

If I understand correctly from the documentation (https://everything.curl.dev/transfers/conn/keepalive.html), using the defaults for the rest of the KEEPALIVE options can buy us 60 s (idle time) + 60 s (probe interval) × 9 (probe count) = 600 s = 10 minutes of idle processing time on the REST server side.

I am worried this might not be enough time for a large ingestion (that's where we typically hit those "Connection reset by peer" errors), and I think it'd make sense to set a larger number of probes. In fact, I don't see any downside to setting the probe count to, say, 120 (buying us 2 hours instead): our goal here is not to detect broken connections as soon as possible, but to keep our connection at the top of the active-connection list in intermediate devices such as firewalls for as long as possible, until our query is done processing.
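
For concreteness, here is a sketch of the libcurl knobs behind these numbers. CURLOPT_TCP_KEEPALIVE, CURLOPT_TCP_KEEPIDLE, and CURLOPT_TCP_KEEPINTVL are real libcurl options, but the snippet is illustrative and may not match the PR's actual diff; the probe count is not set here, so it falls back to the OS default (9 on stock Linux):

```cpp
#include <curl/curl.h>

void enable_tcp_keepalive(CURL* curl) {
  // Turn TCP keepalive on for this handle's connection.
  curl_easy_setopt(curl, CURLOPT_TCP_KEEPALIVE, 1L);
  // Seconds of silence before the first probe (libcurl's default is 60).
  curl_easy_setopt(curl, CURLOPT_TCP_KEEPIDLE, 60L);
  // Seconds between subsequent probes (libcurl's default is 60).
  curl_easy_setopt(curl, CURLOPT_TCP_KEEPINTVL, 60L);
  // With the stock-Linux default of 9 probes, a dead peer is detected after
  // 60 + 9 * 60 = 600 seconds -- the "10 minutes" computed above.
}
```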

@ypatia (Member) commented Sep 26, 2024

Let's also add a text description to this PR, and consider backporting the change to 2.26 once it's approved.

@teo-tsirpanis (Member, Author) commented

> can buy us […] 10 minutes of idle processing time on the REST server side

I don't understand this. The REST server doesn't have to respond by itself when receiving a keepalive packet, does it? Isn't the response to the TCP keepalive sent by the operating system's network stack?

@ypatia (Member) commented Sep 26, 2024

> > can buy us […] 10 minutes of idle processing time on the REST server side
>
> I don't understand this. The REST server doesn't have to respond by itself when receiving a keepalive packet, does it? Isn't the response to the TCP keepalive sent by the operating system's network stack?

Sorry, I wasn't clear. I am referring to case 2.4 here: https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html. The connection might be alive, but inactivity can make intermediate proxies and load balancers decide to drop it as inactive. So sending keepalives for longer can reduce that probability, and sending them for only 10 minutes might not be enough. I've heard of ingestions lasting an hour or so in the past.

Actually, I can't think of a use case where keepalives would help us other than the one I just described. Please keep me honest, as I might be missing some other scenario.

@teo-tsirpanis (Member, Author) commented

I think this ten-minute timer gets reset every time a side receives and acknowledges a keepalive. Therefore, an ingestion can last longer.
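
To make the timer-reset point concrete, here is a toy model of the keepalive bookkeeping (all names invented for illustration; this is neither kernel nor TileDB code):

```cpp
// Toy model: why an acknowledged probe keeps an idle connection alive.
struct KeepaliveState {
  int unanswered_probes = 0;
  static constexpr int keep_cnt = 9;  // stock-Linux default probe count

  // Any traffic from the peer (including a keepalive ACK) resets the count.
  void on_peer_traffic() { unanswered_probes = 0; }

  // Fires once per probe interval while the connection is otherwise silent.
  bool on_probe_unanswered() {
    return ++unanswered_probes >= keep_cnt;  // true => declare the peer dead
  }
};
// A healthy-but-idle peer ACKs every probe, so the counter never reaches
// keep_cnt and the connection can stay idle indefinitely; the ten minutes
// is only the detection window for a peer that has stopped responding.
```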

@ypatia (Member) commented Sep 26, 2024

> > can buy us […] 10 minutes of idle processing time on the REST server side
>
> I don't understand this. The REST server doesn't have to respond by itself when receiving a keepalive packet, does it? Isn't the response to the TCP keepalive sent by the operating system's network stack?

Sorry, I wasn't clear. While the REST server is processing (e.g. writing to S3) an incoming WRITE query with large data (say 2 GB) from the client, the TCP connection carries no traffic. So that TCP connection, which is not necessarily between the client and the server (it might be between the client and a load balancer), can remain inactive for the, say, 20 minutes it takes to process/ingest the incoming data. So even if the server replies with an ACK in the first 10 minutes, it might need […]

> I think this ten-minute timer gets reset every time a side receives and acknowledges a keepalive. Therefore, an ingestion can last longer.

You are right, I got it wrong. The default settings look OK then, indeed.

@ypatia (Member) left a comment

LGTM, but let's add a PR description.

@nickvigilante (Contributor) left a comment

Approved config.h and config_api_external.h

@shaunrd0 (Contributor) left a comment

Left two suggestions but LGTM 👍

Two suggestions on tiledb/sm/config/config.cc (outdated; resolved)
@teo-tsirpanis merged commit f67a4cb into dev on Sep 30, 2024
62 of 63 checks passed
@teo-tsirpanis deleted the teo/rest-tcp-keepalive branch September 30, 2024 10:30