Move game traffic sockets to io-uring #850

XAMPPRocky · 2023-11-07T10:32:47Z

There are a lot of sleeps added in the tests: because this requires spawning threads, the tests need to wait for the threads to spawn. We should probably have a better interface for tests so that we can remove them.

markmandel · 2023-11-07T17:16:40Z

build/ci/build-image/cloudbuild.yaml

@@ -13,6 +13,9 @@
 # limitations under the License.

 steps:
+  - name: ubuntu


Wrong cloudbuild.yaml; you want the one in the root of the repository. This one gets run on a cron once a day.

build/build-image/Dockerfile

markmandel · 2023-11-07T17:27:09Z

src/cli/proxy.rs

+        let sessions = SessionPool::new(config.clone(), tx, shutdown_rx);
+
+        proxy.run_recv_from(&config, &sessions, rx).unwrap();
+        tokio::time::sleep(Duration::from_millis(500)).await;


My suggestion here (if possible) would be to create a looping poll to check a condition (or run the whole test), with a sleep at the end of each loop, and break the loop on successful checking of the condition.

If I was doing this in Golang I would use https://pkg.go.dev/github.com/stretchr/testify/assert#Eventually but I've found a for loop in Rust with a sleep at the end of it much easier, since Rust closures and ownership are trickier to manage, and I've not been able to find a better way.
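For illustration, the looping poll described above could be wrapped in a small helper. This is only a sketch using the standard library: the name `eventually` and its parameters are hypothetical and are not part of this PR or of the existing test framework.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `condition` until it returns true or
/// `timeout` elapses, sleeping `interval` between checks.
/// Returns whether the condition was ever observed to be true.
fn eventually(
    timeout: Duration,
    interval: Duration,
    mut condition: impl FnMut() -> bool,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first, so a condition that already holds returns immediately.
        if condition() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Sleep at the end of each loop iteration, as suggested above.
        thread::sleep(interval);
    }
}
```

A test would call something like `assert!(eventually(Duration::from_secs(5), Duration::from_millis(50), || check()))`, where `check` is whatever condition the test is waiting on.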

My suggestion here (if possible) would be to create a looping poll to check a condition (or run the whole test), with a sleep at the end of each loop, and break the loop on successful checking of the condition.

My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts. The problem isn't solvable with a looping poll because the issue isn't "the packet recv timeout expires before the system is ready"; the problem is "the test sends the packet before the socket is listening, so the packet is lost and it will never pass."
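A minimal sketch of that idea, assuming plain `std` threads and an `mpsc` channel rather than the project's actual tokio/io-uring setup. The function names and the echo behaviour are hypothetical stand-ins for the proxy's receive thread, not code from this PR.

```rust
use std::net::{SocketAddr, UdpSocket};
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the receive thread: binds a socket, signals
/// readiness, then echoes a single packet back to its sender.
fn spawn_echo_listener() -> (mpsc::Receiver<SocketAddr>, thread::JoinHandle<()>) {
    let (ready_tx, ready_rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let socket = UdpSocket::bind("127.0.0.1:0").expect("bind listener");
        // Signal only after the bind has succeeded, so nothing sent after
        // this point can arrive before the socket exists.
        ready_tx
            .send(socket.local_addr().expect("local addr"))
            .expect("test dropped the receiver");
        let mut buf = [0u8; 1024];
        let (len, from) = socket.recv_from(&mut buf).expect("recv");
        socket.send_to(&buf[..len], from).expect("send");
    });
    (ready_rx, handle)
}

/// Waits for the readiness signal, then sends one packet and returns the echo.
fn send_when_ready(payload: &[u8]) -> Vec<u8> {
    let (ready_rx, handle) = spawn_echo_listener();
    // Block until the listener reports its address; no fixed sleep needed.
    let addr = ready_rx
        .recv_timeout(Duration::from_secs(5))
        .expect("listener never became ready");
    let client = UdpSocket::bind("127.0.0.1:0").expect("bind client");
    client
        .set_read_timeout(Some(Duration::from_secs(5)))
        .expect("set timeout");
    client.send_to(payload, addr).expect("send");
    let mut buf = [0u8; 1024];
    let (len, _) = client.recv_from(&mut buf).expect("echo");
    handle.join().expect("listener thread panicked");
    buf[..len].to_vec()
}
```

Because the test sends exactly one packet and only after the signal, a dropped packet still fails the test instead of being hidden by a retry.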

I've definitely done tests (in other systems) where it's basically:

set check = false
try 10 times:
    send packet to endpoint
    timeout after 500ms on getting a packet
    if successfully received, set check = true, and break
    if not successful, go around the loop again

So you get a polling operation that eventually passes, and generally isn't prone to hitting race conditions with a single sleep operation.
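As a rough Rust rendering of that retry loop, using only `std::net`: the function name `send_until_reply` and its parameters are hypothetical, and the client binds to loopback on the assumption that the endpoint under test is local.

```rust
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Hypothetical helper: sends `payload` to `endpoint` up to `attempts`
/// times, waiting `timeout` for any reply after each send.
/// Returns Ok(true) as soon as a reply arrives, Ok(false) if none did.
fn send_until_reply(
    endpoint: SocketAddr,
    payload: &[u8],
    attempts: u32,
    timeout: Duration,
) -> std::io::Result<bool> {
    // Assumption: the endpoint is reachable from loopback.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.set_read_timeout(Some(timeout))?;
    let mut buf = [0u8; 1024];
    for _ in 0..attempts {
        socket.send_to(payload, endpoint)?;
        // A timeout (or any other receive error) counts as "not yet":
        // go around the loop again.
        if socket.recv_from(&mut buf).is_ok() {
            return Ok(true);
        }
    }
    Ok(false)
}
```

With the numbers from the comment above, a test would call `send_until_reply(addr, b"ping", 10, Duration::from_millis(500))` and assert on the result.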

My worry about that loop is that it could mask some bugs for the sake of getting past race conditions. For example, a packet lost to the startup race would be indistinguishable from a bug in the proxy that was dropping packets.

Ah yep, I do see your point there. That's fair.

My preference would be that we return a signalling primitive that we await, so the moment it's ready the test starts.

That makes a lot of sense then - and shouldn't be impossible to do.

Are we thinking we keep the sleeps for now, and then adjust at a later date (or as we track potential flakiness in tests, and fix as needed)? That would work for me.

Yeah, I think keeping the sleeps for now is best, because removing them will require changing a lot more and it's not the worst right now. I've already implemented part of it in this PR, but there's a lot more code to update in our test framework.

SGTM!

Did you want to merge this now, or wait until after performance tests are done?

github-actions · 2023-11-20T16:25:35Z