-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition when executing parallel tests, passing context down and back #292
Fix race condition when executing parallel tests, passing context down and back #292
Conversation
Hi @phisco. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phisco, this looks promising. Thanks for jumping in.
I left couple of comments about returning context from env.Test() method.
Also, I would suggest to focus on the following location where the test is actually ran. I would do the followings
Change the name of the function param from t
to new
:
t.Run(s.Name, func(newT *testing.T) {
This is to doubly-ensure that the right t
variable is captured in the closure. I would also do the same here.
@@ -313,8 +321,8 @@ func (e *testEnv) TestInParallel(t *testing.T, testFeatures ...types.Feature) { | |||
// | |||
// BeforeTest and AfterTest operations are executed before and after | |||
// the feature is tested respectively. | |||
func (e *testEnv) Test(t *testing.T, testFeatures ...types.Feature) { | |||
e.processTests(t, false, testFeatures...) | |||
func (e *testEnv) Test(t *testing.T, testFeatures ...types.Feature) context.Context { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change isn't it? It looks additive so, it may not be. But we should check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the context coming out of Test is never used is it ? Is this change to allow multiple Test calls to chain together ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, we could actually omit it, I added it because I felt it could be useful for users to inspect the content of the context after calling it, but now that I think about it, it's more an implementation detail of testenv rather than of the Environment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@phisco funny thing is while researching the race condition issue, i ran into the need to inspect the context. However, I would lean toward not leaving the signature as is to avoid breakage.
Maybe a separate PR would be useful to update the signature of the Test and TestInParallel methods. That way, it's effect can be studied (or roll back if needed) with minimal disruption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimirvivien, I did some checks, adding a returned context to the signatures would not be breaking change, maybe some linter could complain, but it will build without any issue. However, not returning it, would mean there is no way to inspect the content of the ctx after a run as we were saying, which would mean that we would have no way to check the following test cases:
- https://github.com/kubernetes-sigs/e2e-framework/pull/292/files#diff-6da391e4e882fa4bb01f054410d115230af7831328b51bcf92939c9bfae905aaL318-R320 already existing
- https://github.com/kubernetes-sigs/e2e-framework/pull/292/files#diff-6da391e4e882fa4bb01f054410d115230af7831328b51bcf92939c9bfae905aaR374-R375 which I added, similar to the existing one
- https://github.com/kubernetes-sigs/e2e-framework/pull/292/files#diff-6da391e4e882fa4bb01f054410d115230af7831328b51bcf92939c9bfae905aaR424-R425 which I added, similar to the existing one
So, I tried adding two internal testEnv.test/testInParallel
functions that are returning the ctx, which are then called directly by their public versions Test/TestInParalle
, explicitly discarding the context returned, so that we can use those in tests if we need to check the ctx afterward, but they won't be accessible to users.
I would probably suggest adding the return ctx to the interface, but I totally understand not wanting to change the public API at least as part of this PR, so this could be a good intermediate step while we take a decision around that.
Thanks @harshanarayana, I'm on vacation, so I'm going to implement the changes suggested by @vladimirvivien by end of the week: avoid returning the context from Test and TesInParallel, and rename the two testing.T parameters. If you have any other request, let me know 🙏 |
e12794e
to
bbd1faf
Compare
@vladimirvivien @harshanarayana, I implemented the changes suggested, let me know what you think 🙏 |
bbd1faf
to
da401a8
Compare
This patch reduces the number of accesses to the context in the testEnv to only the ones that are strictly necessary and that should be synchronous. So, testEnv still contains a ctx, but is only to be used as a starting point and won't be updated except after the setup and finish phase. According to the following logic: - If `Test` is used, the same logic of now applies. So the context is passed to all features being run sequentially and the most updated version is then returned to the caller. - If `TestInParallel` is used (and parallel tests are enabled), the context is first populated by the `BeforeEachTest` functions and then a dedicated child of the parent context is provided to each test running in parallel, which is then discarded and only the original parent context is passed to the `AfterEachTest` functions and back into the environment. This is actually fixing a race condition in the previous implementation of `TestInParallel`. - If `t.Parallel` is used to run one or more features either using `Test` or `TestInParallel`, the same logic above applies, given that each caller will get its own context back. The only caveat is that the context is that the Finish phase will only be able to see the context from the Setup phase and not the one from the features themselves. Co-authored-by: Matteo Ruina <matteo.ruina@datadoghq.com> Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
Signed-off-by: Philippe Scorsolini <p.scorsolini@gmail.com>
da401a8
to
a57988a
Compare
@phisco Apologies, I hadn't look at this all week. |
/ok-to-test |
a57988a
to
25cc833
Compare
@vladimirvivien, no problem, I agree it's better to return the context, I dropped a57988a 🙏 |
@vladimirvivien what do you think it's missing for merging this PR? |
@phisco @maruina @vladimirvivien This is a wonderful change. I don't really have any change request for the changes pushed into this PR. Thanks for adding some great tests to avoid future regressions of race conditions @phisco |
Apologies for the delay (busy). /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: phisco, vladimirvivien The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This is another approach to solve the same problem as #265 and was developed with the help of @maruina.
Supersedes #291 as I think is a better approach w.r.t. the one proposed there.
This patch reduces the number of accesses to the context in the testEnv to only the ones that are strictly necessary and that should be synchronous. So, testEnv still contains a ctx, but is only to be used as a starting point and won't be updated except after the setup and finish phase.
According to the following logic:
Test
is used, the same logic of now applies. So the context is passed to all features being run sequentially and the most updated version is then returned to the caller.TestInParallel
is used (and parallel tests are enabled), the context is first populated by theBeforeEachTest
functions and then a dedicated child of the parent context is provided to each test running in parallel, which is then discarded and only the original parent context is passed to theAfterEachTest
functions and back into the environment. This is actually fixing a race condition in the previous implementation ofTestInParallel
.t.Parallel
is used to run one or more features either usingTest
orTestInParallel
, the same logic above applies, given that each caller will get its own context back.The only caveat is that the context is that the Finish phase will only be able to see the context from the Setup phase and not the one from the features themselves.