Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Remove a unnecessary sleep in run server #32216

Conversation

lijunwangs
Copy link
Contributor

@lijunwangs lijunwangs commented Jun 21, 2023

Problem

The quic streamer run_server unnecessarily sleep. This can slow down processing connection in initializing state and build up memory pressure. We do not need it as we have already used timeout when doing

    let timeout_connection = timeout(WAIT_FOR_CONNECTION_TIMEOUT, incoming.accept()).await;

Summary of Changes

Remove the unnecessary sleep.

Fixes #

@behzadnouri
Copy link
Contributor

I am not sure how this argument

We do not need it as we have already used timeout when doing
let timeout_connection = timeout(WAIT_FOR_CONNECTION_TIMEOUT, incoming.accept()).await;

is relevant here. The timeout addresses idle periods were the future blocks indefinitely. The sleep on the other hand addresses periods where there is a flood of incoming new connections and we need some sort of throttling. These two seem to me opposite of each other.

Don't we need to somehow throttle flood of new connections?

@ryleung-solana
Copy link
Contributor

I don't have any super strong objections to this, but I do wonder what the original motivation was for including this wait int the first place. However, I agree that we don't want to leave incoming connections to build up in Quinn with this delay, and want to get them wrapped into the Timeout future ASAP, unless we have a good reason for why this wait was included in the first place.

@pgarg66
Copy link
Contributor

pgarg66 commented Jun 21, 2023

This is the PR where it was added: #26043
As @behzadnouri mentioned, this is just to throttle the flood of incoming connections. It'll be ideal if we can see the impact of this change with some test in a multi node cluster.

@lijunwangs
Copy link
Contributor Author

This is the PR where it was added: #26043
As @behzadnouri mentioned, this is just to throttle the flood of incoming connections. It'll be ideal if we can see the impact of this change with some test in a multi node cluster.

The problem is it is not throttling the flood of incoming connections -- especially the ones in initializing state. See https://github.com/quinn-rs/quinn/blob/main/quinn-proto/src/endpoint.rs#L539.

The rate limiting part has to be solved using a rate limiter which I will submit a PR soon. I found this alone to be very effective in limiting memory growth by starting a standalone streamer.

@pgarg66
Copy link
Contributor

pgarg66 commented Jun 21, 2023

This is the PR where it was added: #26043
As @behzadnouri mentioned, this is just to throttle the flood of incoming connections. It'll be ideal if we can see the impact of this change with some test in a multi node cluster.

The problem is it is not throttling the flood of incoming connections -- especially the ones in initializing state. See https://github.com/quinn-rs/quinn/blob/main/quinn-proto/src/endpoint.rs#L539.

The rate limiting part has to be solved using a rate limiter which I will submit a PR soon. I found this alone to be very effective in limiting memory growth by starting a standalone streamer.

Ok, that makes sense. I can approve if there are no other objections. Can you also rebase to fix the CI errors?

@lijunwangs lijunwangs force-pushed the remove_a_unnecessary_sleep_in_run_server branch from 7dfbc82 to 5c159a2 Compare June 21, 2023 19:31
@codecov
Copy link

codecov bot commented Jun 21, 2023

Codecov Report

Merging #32216 (5c159a2) into master (42ccc5c) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff            @@
##           master   #32216    +/-   ##
========================================
  Coverage    82.0%    82.0%            
========================================
  Files         769      769            
  Lines      208992   209095   +103     
========================================
+ Hits       171408   171497    +89     
- Misses      37584    37598    +14     

@lijunwangs lijunwangs merged commit 689ca50 into solana-labs:master Jun 22, 2023
@mvines
Copy link
Contributor

mvines commented Jun 23, 2023

Do we need a v1.16 backport of this?

@lijunwangs lijunwangs added v1.16 PRs that should be backported to v1.16 v1.14 labels Jun 23, 2023
mergify bot pushed a commit that referenced this pull request Jun 23, 2023
remove sleep; and handle initializing connection as soon as available

(cherry picked from commit 689ca50)
mergify bot pushed a commit that referenced this pull request Jun 23, 2023
remove sleep; and handle initializing connection as soon as available

(cherry picked from commit 689ca50)

# Conflicts:
#	streamer/src/nonblocking/quic.rs
lijunwangs added a commit that referenced this pull request Jun 26, 2023
…32251)

Remove a unnecessary sleep in run server (#32216)

remove sleep; and handle initializing connection as soon as available

(cherry picked from commit 689ca50)

Co-authored-by: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
wen-coding pushed a commit to wen-coding/solana that referenced this pull request Aug 15, 2023
remove sleep; and handle initializing connection as soon as available
wen-coding pushed a commit to wen-coding/solana that referenced this pull request Aug 15, 2023
remove sleep; and handle initializing connection as soon as available
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
v1.16 PRs that should be backported to v1.16
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants