-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test performance and reliability fixes #1096
Test performance and reliability fixes #1096
Conversation
If a test happens to use the same port as a previous test run (either by bad luck or hardcoding like TestMultiRepoUpdater) that happened within a minute, the second run will fail because TCP by default keeps sockets open for a while. Avoid this by explicitly saying re-use is fine in this case. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
This seems to have only worked because someone else usually imports formats. This fixes e.g. python3 test_download.py \ TestDownload.test_download_url_to_tempfileobj Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Interested in seeing the CI results... Even if they're green we might want to have more CI runs to make sure there are no flaky tests: based on git archeology this bit of code has been very prone to those. EDIT: obviously a bit of work left :| |
MSG_DONTWAIT does not exist on windows apparently: will have to set the socket to non-blocking mode explicitly |
The other issue is python2.7:
|
This avoids the need to sleep() before removing the temporary directories used, and makes sure we don't get ResourceWarning: subprocess N is still running messages. Use subprocess.communicate() instead of wait() if the process has a pipe (currently the return values are just dropped though). Practical results should be more reliability and a slightly reduced test runtime. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
fa0e30c
to
d33b8c4
Compare
Well the windows results imply that Windows is not fine with my hack of calling connect() multiple times... I guess the by-the-book method would be to select() after getting the first EINPROGRESS (I'm assuming that's received first on Windows, just guessing though) |
@jku would you please kindly mark this as a draft PR until ready? Thanks! |
b26c4a4
to
f6bfaf3
Compare
Okay, should be ready for review now. My original idea of non-blocking connect() did not work on Windows at all: Latest version is using a blocking socket but luckily that's fast on Linux and not slow on Windows ("Connection Refused" can unfortunately take even 2 seconds on Windows because it does retries internally). I've also bumped the timeout to 10 seconds since a) it does not increase the normal runtime at all and b) Travis actually managed to hit the 5 second mark once. I'm a little worried about Travis not managing to start simple_server.py in 5 seconds though... Doing some more CI runs might not be a bad idea: the full test runtime gives a rough indication of whether there were any really long waits. |
py35 on Travis CI failed the first time, which only indicates that the tests are still a little flakey. |
That was a fossa upload failure I think (but I don't know how to view that build anymore). |
WRT test speed:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic PR, thanks @jku. Very happy to see sleep
calls being removed!
I was expecting to see some import time
lines deleted too, are we still using time in all of the tests that are no longer calling time.sleep
?
Thanks @joshuagl all relevant comments. I'll push these changes one by one to get a few more CI runs, will mark the comments resolved after everything is pushed |
f3f03ee
to
c210883
Compare
Last commit is a fixup for an early commit: I'll squash it once the 2nd review is there (or I can do it now, just let me know). There are now 7 successive green CI runs for this branch: I feel confident in saying the reliability is on par or better than current develop branch. |
Excellent set of changes, thanks Jussi. The PR looks good to me, please rebase and squash/fixup and we'll be ready to approve. |
* Add utility function to wait on a socket until it responds * Use the function instead of sleeping in tests that need to wait for the server to start * Increase the max timeout to 5 seconds by default (as appveyor builds still seem to hit the 3 second mark sometimes) wait_for_server() functions quite differently depending on OS: Windows can take 2 seconds to respond with ECONNREFUSED whereas Linux is almost instant. There might be tricks to be faster on Windows (like setting a shorter socket timeout) but this was not done here. This makes a full Linux test run almost 40% faster and should be more reproducible on every platform. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
This silences "ResourceWarning: unclosed file" Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Silences "ResourceWarning: unclosed file" Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Travis managed to still timeout with 5 seconds: increasing the timeout (now that it's not a sleep) doesn't really hurt normal non-VM use cases so let's bump it to 10 seconds. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Our tests already expect localhost lookup to work to find test servers: use it consistently instead of sometimes using 127.0.0.1 Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
proxy_server.py is python2 only, don't try to setup the class as it leads to a confusing TimeoutError. This may prevent me from debugging this apparent test failure a third time. Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
c210883
to
b6661e0
Compare
This PR is ready to go from my perspective, in case that was unclear. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these great improvements to the test reliability.
Improve reliability and speed of the tests: altogether a full test run is > 40% faster on my laptop and should be more reliable as well (there's still a possibility that e.g. the
new wait_for_server()
is not handling some exceptions it should but the potential issues should be resolvable over time -- unlike sleep() that is inherently unreliable).Also add a missing import in download.py and close some unclosed handles that make it difficult to read test output
Fixes issue #1085
There's quite a bit of repetition in e.g. the simple-server using tests... I didn't try to refactor that at the same time but it might make sense to have a SimpleServerTestMixin that does the required initialization and cleanup.