Move game traffic sockets to io-uring #850
Conversation
Force-pushed 30eeb47 to 96faf8b
build/ci/build-image/cloudbuild.yaml (Outdated)

```diff
@@ -13,6 +13,9 @@
 # limitations under the License.

 steps:
 - name: ubuntu
```
Wrong cloudbuild.yaml, you want the one in the root of the repository. This one gets run on a cron once a day.
```rust
let sessions = SessionPool::new(config.clone(), tx, shutdown_rx);

proxy.run_recv_from(&config, &sessions, rx).unwrap();
tokio::time::sleep(Duration::from_millis(500)).await;
```
My suggestion here (if possible) would be to create a looping poll that checks a condition (or runs the whole test), with a sleep at the end of each loop, breaking the loop once the condition checks out.
If I were doing this in Go I would use https://pkg.go.dev/github.com/stretchr/testify/assert#Eventually, but I've found a `for` loop in Rust with a sleep at the end of it much easier, since Rust closures and ownership are trickier to manage, and I've not been able to find a better way.
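A minimal sketch of that kind of polling loop in Rust, assuming a tokio test runtime; the helper name, retry count, and interval are illustrative, not from the PR:

```rust
use std::time::Duration;

/// Polls `condition` up to `attempts` times, sleeping `interval`
/// between tries; panics if it never becomes true. A rough analogue
/// of testify's `assert.Eventually`.
async fn eventually(mut condition: impl FnMut() -> bool, attempts: u32, interval: Duration) {
    for _ in 0..attempts {
        if condition() {
            return;
        }
        tokio::time::sleep(interval).await;
    }
    panic!("condition not met after {attempts} attempts");
}
```

A test would call something like `eventually(|| sessions.has_endpoints(), 10, Duration::from_millis(50)).await;` (the closure shown is illustrative).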
> My suggestion here (if possible) would be to create a looping poll to check a condition (or run the whole test), with a sleep at the end of each loop, and break the loop on successful checking of the condition.

My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts. The problem isn't solvable with a looping poll, because the issue isn't "the packet recv timeout expires before the system is ready"; the problem is "the test sends the packet before the socket is listening, so the packet is lost and the test will never pass."
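A sketch of what such a signalling primitive could look like, using a `tokio::sync::oneshot` channel; `spawn_proxy` and its body are hypothetical, not the PR's actual code:

```rust
use tokio::{net::UdpSocket, sync::oneshot};

/// Spawns a receive loop and returns a receiver that resolves only
/// once the socket is bound and listening.
fn spawn_proxy() -> oneshot::Receiver<()> {
    let (ready_tx, ready_rx) = oneshot::channel();
    tokio::spawn(async move {
        // Bind first, then signal readiness, so a test can await the
        // signal instead of sleeping for an arbitrary duration.
        let socket = UdpSocket::bind("127.0.0.1:0").await.unwrap();
        let _ = ready_tx.send(());
        let mut buf = [0u8; 1500];
        loop {
            // No test packet can arrive before the socket exists,
            // so none are silently lost.
            let _ = socket.recv_from(&mut buf).await.unwrap();
        }
    });
    ready_rx
}
```

The test then does `spawn_proxy().await.unwrap();` and only sends its first packet after that resolves.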
I've definitely done tests (in other systems) where it's basically:
- set `check = false`
- try 10 times:
  - send a packet to the endpoint
  - time out after 500ms waiting for a packet back
  - if one is successfully received, set `check = true` and break
  - if not, go around the loop again

So you get a polling operation that eventually passes and, unlike a single sleep, generally isn't prone to hitting race conditions.
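Roughly that shape in Rust, as a hedged sketch (the retry count, timeout, and buffer size are illustrative):

```rust
use std::time::Duration;
use tokio::net::UdpSocket;

/// Sends `payload` to `endpoint` up to 10 times, waiting 500ms for a
/// reply each time; returns true as soon as one round trip succeeds.
async fn eventually_receives(socket: &UdpSocket, endpoint: &str, payload: &[u8]) -> bool {
    let mut buf = [0u8; 1500];
    for _ in 0..10 {
        socket.send_to(payload, endpoint).await.unwrap();
        match tokio::time::timeout(Duration::from_millis(500), socket.recv_from(&mut buf)).await {
            Ok(Ok(_)) => return true, // got a reply: condition met
            _ => continue,            // timed out or errored: go around again
        }
    }
    false
}
```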
My worry about that loop is that, for the sake of getting past race conditions, it could mask real bugs: for example, it would be indistinguishable from a bug in the proxy that was dropping packets.
Ah yep, I do see your point there. That's fair.

> My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts.

That makes a lot of sense then, and shouldn't be impossible to do.
Are we thinking we keep the sleeps for now, and then adjust at a later date (or as we track potential flakiness in tests, fixing as needed)? That would work for me.
Yeah, I think keeping the sleeps for now is best, because removing them will require changing a lot more, and it's not the worst right now. I've already implemented part of it in this PR, but there's a lot more code needed to update our test framework.
SGTM!
Did you want to merge this now, or wait until after performance tests are done?
Force-pushed ea69281 to ca2443d
Force-pushed 6b53364 to 4921993
This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.
This just changes the entry count to be kept as a single atomic, rather than iterating and summing the entire map on every call to num_of_endpoints/has_endpoints. It takes the quilkin proxy loop from ~90MB/s to ~125MB/s, a fairly good increase for little work. The count may not be exactly accurate, but it's used more as a boolean than an exact number anyway.
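The shape of that change, as a hedged sketch (the struct and method names are illustrative, mirroring the `has_endpoints` call mentioned above):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct Endpoints {
    // The locality -> endpoints map is elided; only the counter that
    // is maintained alongside it is shown.
    count: AtomicUsize,
}

impl Endpoints {
    fn insert(&self /* , endpoint: ... */) {
        // ... insert into the map ...
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    fn has_endpoints(&self) -> bool {
        // A single atomic load instead of iterating and summing the
        // map; Relaxed is fine since the value is treated as a hint.
        self.count.load(Ordering::Relaxed) > 0
    }
}
```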
Force-pushed e7c69dd to 54ff435
Several tests were using the same port and would sporadically fail on my machine. This changes things so that all of the tests (some already did this) bind to an ephemeral port unless otherwise specified, then wait until all of the workers have spawned before sending back which port was assigned. This also adds a configuration option to specify the number of workers to use, as running all the tests in parallel, with each test using the max number of workers, also caused tests to fail unreliably due to contention.
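A minimal sketch of the ephemeral-port pattern (hypothetical names; the real change also coordinates worker startup before reporting the port):

```rust
use tokio::{net::UdpSocket, sync::oneshot};

fn spawn_on_ephemeral_port() -> oneshot::Receiver<u16> {
    let (port_tx, port_rx) = oneshot::channel();
    tokio::spawn(async move {
        // Port 0 asks the OS for any free port, so parallel tests
        // never collide on a hard-coded one.
        let socket = UdpSocket::bind("127.0.0.1:0").await.unwrap();
        let port = socket.local_addr().unwrap().port();
        // ... spawn the configured number of workers and wait for
        // them, then report the assigned port back to the test ...
        let _ = port_tx.send(port);
        // ... run the receive loop ...
    });
    port_rx
}
```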
This improved the benchmarks slightly, not as much as I had hoped, but this initial implementation is super simple and can be improved. One behavior change is that the snappy compression no longer uses streaming compression, in favor of compressing a single block, since streaming compression is overkill for the size of data in a typical packet. In fact, one of the tests had to be changed to add more data to the test packet, as otherwise the compressed packet could still be parsed as a UTF-8 string: block compression only added a size header, because the packet was so small.
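The block-compression shape, sketched with the `snap` crate's raw (non-streaming) API; an illustration of the described change, not the PR's code:

```rust
/// Compresses a whole packet as a single snappy block; for
/// packet-sized payloads this avoids the streaming format's framing
/// overhead.
fn compress(packet: &[u8]) -> Vec<u8> {
    snap::raw::Encoder::new()
        .compress_vec(packet)
        .expect("snappy block compression failed")
}

fn decompress(packet: &[u8]) -> Vec<u8> {
    snap::raw::Decoder::new()
        .decompress_vec(packet)
        .expect("snappy block decompression failed")
}
```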
Build Succeeded 🥳
Build Id: 10702350-868b-4f50-9731-e534aa60f40e
The following development images have been built, and will exist for the next 30 days.
To build this version:
There are a lot of sleeps added in the tests: because this requires spawning threads, the tests need to wait for the threads to spawn. We should probably have a better interface for tests so we can remove them.