-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolved race condition in collector when calling Shutdown
#4878
Resolved race condition in collector when calling Shutdown
#4878
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4878 +/- ##
=======================================
Coverage 91.02% 91.02%
=======================================
Files 178 178
Lines 10597 10600 +3
=======================================
+ Hits 9646 9649 +3
Misses 734 734
Partials 217 217
Continue to review full report at Codecov.
|
service/collector_test.go
Outdated
@@ -135,6 +135,53 @@ func TestCollector_Start(t *testing.T) { | |||
}, time.Second*2, time.Millisecond*200) | |||
} | |||
|
|||
// Calling shutdown before collector is running should not stop it from running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The comment here should start with the func name, alternatively the comment could be moved inside func
68d8a83
to
79ebdb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR changes more than needed to fix the problem.
The adding of a new state is completely independent and should be a separate PR.
@bogdandrutu So I added a new state because with out it the Collector is in a You're right the new state isn't technically needed to solve this problem but without a way to tell that the collector has been created and not run , I'm curious if you have another solution to this were we can avoid a new state. |
79ebdb3
to
379de67
Compare
Maybe we change the title of the PR to @cpheps what do you think? |
@buptubuntu I think that's fair. I think without the new state we trade a race condition a less than ideal behavior pattern for the collector. |
@bogdandrutu Curious if I can get your thoughts on my reply and @buptubuntu's suggestion? |
379de67
to
fb83227
Compare
service/collector.go
Outdated
@@ -48,6 +48,7 @@ const ( | |||
Running | |||
Closing | |||
Closed | |||
Created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why at the end? I don't think we guarantee the numbers.
@cpheps this is probably the most important question. I think we should do what the "language pattern" is, so probably we should do exactly what http server does for shutdown https://pkg.go.dev/net/http#Server.Shutdown and document the behavior better. |
@bogdandrutu I think that's fair. Looks like if you call shutdown on a server it will not allow it to start or will cause any listen function to immediately close. So for the collector it should do the same. I think this eliminates the need for a new state. I'll remove the created state and rename the PR back to the way it was. Do you have any concerns on how fast the collector shutdown if |
fb83227
to
5a0f6e4
Compare
@bogdandrutu Removed the extra state. Do you mind taking a re-review? |
Looks like CI failures are unrelated to changes. |
Do you mean block in the shutdown function? I suppose we can, in For us I think we need to do some refactoring to figure out what shutdown is waiting on and ensure it doesn't block forever if called before
I agree on this but I felt like it was out of the scope of the PR as we probably need to restructure the startup logic. I'll write up issues for these. I think the former needs more discussion. |
Shutdown
@cpheps I think your test catch you :)
|
I'll double check. They were running locally even the service one that keeps failing. It might be flaky I'll check. |
Oh I see it's the race detector I'll take a look. |
Signed-off-by: Corbin Phelps <corbin.phelps@bluemedora.com>
Signed-off-by: Corbin Phelps <corbin.phelps@bluemedora.com>
…ine with language standards Signed-off-by: Corbin Phelps <corbin.phelps@bluemedora.com>
@bogdandrutu This is actually interesting as you brought this up on my last PR. The data race is due to access to the logger pointer is not atomic. I wrote up and issue to track the atomic log pointer at #4939. Would you be opposed to removing the log statement in the We could add it back once the log pointer is atomic but I don't think the message is useful enough to block merging this PR. |
+1 on removing the log |
Signed-off-by: Corbin Phelps <corbin.phelps@bluemedora.com>
5a0f6e4
to
df4dab0
Compare
Signed-off-by: Corbin Phelps <corbin.phelps@bluemedora.com>
Hi @cpheps please I was looking through this and I was wondering how you came up with time in the assert.eventually() function |
@Chinwendu20 the time is a bit arbitrary. Shutdown should be almost instant. I picked 2 seconds because that should be more than enough time for Shutdown to work. It's also short enough where it shutdown fails it does not slow the entire test suit down significantly. |
Thank you so much @cpheps. I see this PR is related to the shutdown not being honored if called right after run. I would like to please know why you called Shutdown() before run in this test case. I see the comment on the code right after, that call but I still do not get it. Thanks a lot |
@Chinwendu20 This specific test case is to ensure if you call The order of operations should be this:
We expect that collector should exit run within a reasonable amount of time (hence the 2 second timer). Currently though the collector has to run through its entire startup then detect the shutdown and exit. The issue you're working on should make it so as soon as |
Description:
Link to tracking Issue: Fixes #4843
Testing: Verified existing tests still work. Added unit test to verify calling
Shutdown
while collector is not running will cause an immediate shutdown when Run is called.