Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move game traffic sockets to io-uring #850

Merged
merged 10 commits into from
Nov 22, 2023
Merged

Move game traffic sockets to io-uring #850

merged 10 commits into from
Nov 22, 2023

Conversation

XAMPPRocky
Copy link
Collaborator

There's a lot of sleeps added in the tests, because this requires spawning threads, the tests need to wait for the threads to spawn. We should probably have a better interface for tests to remove them.

@@ -13,6 +13,9 @@
# limitations under the License.

steps:
- name: ubuntu
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong cloudbuild.yaml, you want the one in the root of the repository. This one gets run on a chron once a day.

let sessions = SessionPool::new(config.clone(), tx, shutdown_rx);

proxy.run_recv_from(&config, &sessions, rx).unwrap();
tokio::time::sleep(Duration::from_millis(500)).await;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion here (if possible) would be to create a looping poll to check a condition (or run the whole test), with a sleep at the end of each loop, and break the loop on successful checking of the condition.

If I was doing this in Golang I would use https://pkg.go.dev/github.com/stretchr/testify/assert#Eventually but I've found a for loop in rust with a sleep at the end of it much easier since Rust closures and ownership are trickier to manage - and I've not been able to find a better way.

Copy link
Collaborator Author

@XAMPPRocky XAMPPRocky Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion here (if possible) would be to create a looping poll to check a condition (or run the whole test), with a sleep at the end of each loop, and break the loop on successful checking of the condition.

My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts. The problem isn't solvable with a looping poll because the issue isn't "the packet recv timeout expires before the system is ready", the problem is "the test sends the packet before the the socket is listening, so the packet is lost and it will never pass."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've definitely done tests (in other systems) where it's basically

  • set check = false
  • try 10 times
    • send packet to endpoint
    • timeout after 500ms on getting a packet
    • If successfully received, set check = true, and break
    • If not successful, go around the loop again.

So you get a polling operation that eventually passes, and generally isn't prone to hitting race conditions with a single sleep operation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My worry about that loop, is that it could mask some bugs, for the sake of passing race conditions, for example, it would be indistinguishable from a bug in the proxy that was dropping packets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep, I do see your point there. That's fair.

My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts.

That makes a lot of sense then - and shouldn't be impossible to do.

Are we thinking we keep the sleeps for now, and then adjust at a later date (or as we track potential flakiness in tests, and fix as needed?) - that would work for me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think keeping the sleeps for now is best, because it will require changing a lot more and it's not the worst right now, I've already implemented part of it in this PR, but there's a lot more code for updating our test framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

Did you want to merge this now, or wait until after performance tests are done?

Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions github-actions bot added size/xl and removed size/l labels Nov 20, 2023
XAMPPRocky and others added 4 commits November 21, 2023 13:25
This just changes the entry count to be kept as a single atomic, rather
than iterating and summing the entire map for every call to
num_of_endpoints/has_endpoints. Changes quilkin proxy loop from ~90MB/s
to ~125MB/s, so a fairly good increase for little work

The count may not be exactly accurate, but it's more of a boolean than
an exact number anyway
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Several tests were using the same port and would sporadically fail on my
machine, this just changes it so that all of the tests (some already
did this) bind to an ephemeral port unless otherwise specified, then
waits until all of the workers have spawned before sending back which
port was assigned. This also adds a configuration option to specify the
number of workers to use as running all the tests in parallel where each
uses the max number of workers also caused tests to unreliably fail due
to contention
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

This improved the benchmarks slightly, not as much as I had hoped, but
this initial implementation is super simple and can be improved.

One behavior change is that the snappy compression is no longer using
streaming compression in favor of just compressing a single chunk as
streaming compression is overkill for the size of data in a typical
packet, in fact one of the tests had to be changed to add more data to
the test packet as otherwise the packet could still be turned into a
utf8 string since the block compression only added a size header due to
how small it was
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@XAMPPRocky XAMPPRocky enabled auto-merge (squash) November 22, 2023 13:19
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@XAMPPRocky XAMPPRocky merged commit 6324a57 into main Nov 22, 2023
5 checks passed
@quilkin-bot
Copy link
Collaborator

Build Succeeded 🥳

Build Id: 10702350-868b-4f50-9731-e534aa60f40e

The following development images have been built, and will exist for the next 30 days:

To build this version:

git fetch git@github.com:googleforgames/quilkin.git pull/850/head:pr_850 && git checkout pr_850
cargo build

@Jake-Shadle Jake-Shadle deleted the ep/iouring branch November 22, 2023 14:14
@markmandel markmandel added kind/feature New feature or request area/performance Anything to do with Quilkin being slow, or making it go faster. labels Mar 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Anything to do with Quilkin being slow, or making it go faster. kind/feature New feature or request size/xl
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants