Fix slow shutdown #2113
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Walkthrough
The changes in this pull request enhance error handling in the ...
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Actionable comments posted: 1
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (3)
- cmd/headscale/cli/serve.go (2 hunks)
- hscontrol/app.go (3 hunks)
- hscontrol/notifier/notifier.go (7 hunks)
Additional comments not posted (10)
cmd/headscale/cli/serve.go (2)
4-5: Approved import additions.
The new imports for `errors` and `net/http` are necessary for the updated error handling logic and are correctly placed.
28-29: Improved error handling in server shutdown.
The updated error handling logic now correctly differentiates between a normal server closure and other errors, preventing the logging of misleading fatal errors during a normal shutdown. This is a significant improvement in the robustness of the application.
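The distinction described above can be sketched as follows. This is a minimal illustration, not the code from `serve.go`: the helper name `classifyServeError` and its return strings are made up for the example, and it assumes the server error is compared against the standard library's `http.ErrServerClosed` using `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// classifyServeError is a hypothetical helper, not code from the PR. It
// reports how the error returned by an http.Server's Serve or
// ListenAndServe call should be treated.
func classifyServeError(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, http.ErrServerClosed):
		// Returned once Shutdown or Close has been called: a normal exit.
		return "closed"
	default:
		return "fatal"
	}
}

// demo classifies a direct, a wrapped, and an unrelated error.
func demo() []string {
	return []string{
		classifyServeError(http.ErrServerClosed),
		classifyServeError(fmt.Errorf("serving: %w", http.ErrServerClosed)),
		classifyServeError(errors.New("listen tcp :8080: address in use")),
	}
}

func main() {
	fmt.Println(demo())
}
```

Using `errors.Is` rather than `==` means the check still matches when the sentinel error has been wrapped on its way up the call stack.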
hscontrol/notifier/notifier.go (7)
39-39: Approved addition of `closed` field.
The addition of the `closed` boolean field to the `Notifier` struct is a good practice for managing the lifecycle of the notifier. It helps ensure that no operations are performed on a closed notifier, which can prevent resource leaks and other errors.
47-47: Approved initialization of `closed` field in `NewNotifier`.
The initialization of the `closed` field to `false` in the `NewNotifier` function is correctly implemented. This ensures that the notifier starts in an active state, ready to handle operations.
85-87: Approved early return in `AddNode` method.
The addition of an early return in the `AddNode` method when the notifier is closed is a good practice. It prevents the addition of new nodes to a closed notifier, maintaining the integrity of the notifier's state and avoiding potential errors.
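The `closed` flag pattern approved in the comments above can be sketched roughly like this. The `notifier` type below is a simplified, hypothetical stand-in: its fields, channel type, boolean return values, and the behaviour of `Close` are assumptions made for illustration and do not reproduce `hscontrol/notifier/notifier.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// notifier is a simplified, hypothetical stand-in for the Notifier under
// review; the real type has other fields and channel types.
type notifier struct {
	mu     sync.Mutex
	nodes  map[uint64]chan string
	closed bool
}

func newNotifier() *notifier {
	// closed starts as false, so the notifier accepts nodes immediately.
	return &notifier{nodes: make(map[uint64]chan string), closed: false}
}

// AddNode registers c for id and reports whether it was accepted.
func (n *notifier) AddNode(id uint64, c chan string) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.closed {
		return false // early return: a closed notifier takes no new nodes
	}
	n.nodes[id] = c
	return true
}

// RemoveNode unregisters id and reports whether anything was removed.
func (n *notifier) RemoveNode(id uint64) bool {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.closed {
		return false // early return: state is frozen after Close
	}
	_, ok := n.nodes[id]
	delete(n.nodes, id)
	return ok
}

// Close marks the notifier closed and closes every registered channel so
// that receivers blocked on them wake up. Calling it twice is harmless.
func (n *notifier) Close() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.closed {
		return
	}
	n.closed = true
	for id, c := range n.nodes {
		close(c)
		delete(n.nodes, id)
	}
}

// demo returns the AddNode result before and after Close.
func demo() (before, after bool) {
	n := newNotifier()
	before = n.AddNode(1, make(chan string, 1))
	n.Close()
	n.Close() // second call is a no-op
	after = n.AddNode(2, make(chan string, 1))
	return before, after
}

func main() {
	before, after := demo()
	fmt.Println(before, after)
}
```

Checking the flag under the same mutex that guards the node map keeps the check and the mutation atomic, so a node cannot slip in between `Close` and a concurrent `AddNode`.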
115-117: Approved early return in `RemoveNode` method.
The implementation of an early return in the `RemoveNode` method when the notifier is closed is correctly done. This prevents modifications to the notifier's state after it has been closed, which is essential for maintaining the robustness of the system.
177-179: Approved check for closed state in `NotifyWithIgnore`.
The addition of a check for the `closed` state in the `NotifyWithIgnore` method is a good practice. It ensures that no updates are sent from a closed notifier, which helps maintain the stability and predictability of the notification system.
197-199: Approved consistency in checking closed state in `NotifyByNodeID`.
The implementation of a check for the `closed` state in the `NotifyByNodeID` method is consistent with other methods on the `Notifier` struct. This consistency is crucial for ensuring that the notifier does not perform operations after it has been closed, maintaining the robustness of the system.
236-238: Approved check for closed state in `sendAll`.
The implementation of a check for the `closed` state in the `sendAll` method is a good practice. It ensures that updates are not sent to all nodes from a closed notifier, which is crucial for preventing errors that could affect the entire network of nodes.
hscontrol/app.go (1)
797-802: Ensure proper resource management during shutdown.
The changes made in lines 797-802 address the slow shutdown issue by ensuring that resources are properly managed and released during the shutdown process. The reordering of the shutdown sequence, specifically the closing of the node notifier and waiting for the netmap stream to close, is a critical improvement.
However, it's important to ensure that all resources are accounted for and that there are no potential leaks or deadlocks. The use of `sync.WaitGroup` for managing concurrent operations is appropriate, but thorough testing is necessary to ensure that all goroutines are properly synchronized and that the `WaitGroup` counter is correctly managed throughout the application lifecycle.
Consider adding more detailed logging at each step of the shutdown process to help diagnose issues and ensure that each component is shut down in the correct order and in a timely manner. This could be particularly useful in production environments where shutdown behaviours might vary due to different loads or configurations.
Additionally, review the entire shutdown sequence to ensure that all network connections, file handles, and other resources are properly closed. This includes verifying that all goroutines are terminated, which can be done by checking for active goroutines before and after the shutdown sequence in a test environment.
Run the following script to verify the proper management of resources during shutdown:
Verification successful
Shutdown sequence is well-managed and logged.
The shutdown sequence in `hscontrol/app.go` is comprehensive and includes appropriate use of `sync.WaitGroup` and logging mechanisms. The sequence ensures that resources are closed in a specific order, and the logging provides visibility into the process, aiding in diagnosing potential issues. The original review comment's suggestion for detailed logging is already implemented in the codebase.
- The `sync.WaitGroup` is used effectively to manage concurrent operations during shutdown.
- Logging statements are present throughout the shutdown process, providing detailed insights.
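One way to make a stuck `sync.WaitGroup` visible while testing a shutdown path is to bound the wait with a timeout and log when it expires. The sketch below is an add-on idea, not something this PR implements: `waitTimeout` is a hypothetical helper, and the durations are arbitrary.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitTimeout waits for wg for at most d and reports whether the wait
// finished in time. It is a hypothetical helper, not part of the PR. If the
// timeout fires, the helper goroutine stays blocked until wg is released.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// demo returns the outcome for a group that completes and for one that is
// stuck because nothing ever calls Done in time.
func demo() (finished, stuck bool) {
	var done sync.WaitGroup
	done.Add(1)
	go done.Done()
	finished = waitTimeout(&done, 2*time.Second)

	var hung sync.WaitGroup
	hung.Add(1) // nothing calls Done yet, like a stream never told to close
	stuck = !waitTimeout(&hung, 50*time.Millisecond)
	hung.Done() // release the helper goroutine started by waitTimeout
	return finished, stuck
}

func main() {
	finished, stuck := demo()
	fmt.Println("first group finished in time:", finished)
	if stuck {
		fmt.Println("second group timed out: a shutdown step is hanging")
	}
}
```

A plain `wg.Wait()` gives no signal when a goroutine never calls `Done`; the timeout variant turns that silent hang into something a test or a log line can report.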
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all resources are properly managed during shutdown.
# Test: Check for resource leaks or deadlocks during shutdown. Expect: No leaks or deadlocks.
# This script assumes the presence of a testing framework that can simulate shutdown scenarios.
# Replace `simulate_shutdown` with the actual command or script that triggers the shutdown process.
simulate_shutdown
# Check system logs or monitoring tools to verify that all resources are released and no errors are reported.
Length of output: 104
Script:
#!/bin/bash
# Search for the usage of sync.WaitGroup in the codebase to understand its role in the shutdown process.
ast-grep --lang go --pattern 'sync.WaitGroup'
# Search for functions or methods related to shutdown processes.
rg 'shutdown|close|terminate' -A 5
# Check for logging mechanisms related to shutdown processes.
rg 'log|trace' -A 5
Length of output: 221008
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- CHANGELOG.md (1 hunks)
Additional comments not posted (1)
CHANGELOG.md (1)
77-77: Approve the changelog entry with a suggestion.
The entry clearly documents the fix for the slow shutdown issue and correctly links to PR #2113. Ensure that the release date `2023-XX-XX` for version `0.23.0` is updated to the actual release date upon finalising the version.
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (1)
- hscontrol/app.go (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- hscontrol/app.go
Turns out we were waiting for connections to close, without telling them to close...
Now:
Fixes #1968
Signed-off-by: Kristoffer Dalby <kristoffer@tailscale.com>
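The problem named in the description, waiting for connections to close without telling them to close, can be illustrated with a small sketch. It uses context cancellation as the "tell them" signal, which is an assumption for the example; the PR itself closes the node notifier. The names `serveStream` and `shutdownStreams` are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
)

// serveStream is a hypothetical long-lived handler. It blocks until ctx is
// cancelled, then records that it exited.
func serveStream(ctx context.Context, wg *sync.WaitGroup, exited *int64) {
	defer wg.Done()
	<-ctx.Done()
	atomic.AddInt64(exited, 1)
}

// shutdownStreams starts n handlers, signals them to stop, waits for them,
// and returns how many exited.
func shutdownStreams(n int) int64 {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	var exited int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go serveStream(ctx, &wg, &exited)
	}
	cancel()  // step 1: tell every handler to stop
	wg.Wait() // step 2: only now wait for them to finish
	return atomic.LoadInt64(&exited)
}

func main() {
	fmt.Println("handlers exited:", shutdownStreams(3))
}
```

If the `cancel()` call were left out, `wg.Wait()` would block until something else ended the handlers, which is the kind of slow shutdown the description refers to.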