Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix promtail deadlock #5170

Merged
merged 7 commits into from
Jan 18, 2022
Merged

Fix promtail deadlock #5170

merged 7 commits into from
Jan 18, 2022

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Jan 18, 2022

What this PR does / why we need it:

This PR fixes a deadlock in Promtail which occurred when targets were removed from the target groups by the discovery manager.

The problem was that upon removal of a target it was removed from the targets list, but the correlating file event watcher channel was not removed from from the watchers list. This caused events to be sent to channels that were not read any more and therefore resulting in a deadlock.

Which issue(s) this PR fixes:
Fixes #5162

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

@chaudum chaudum requested a review from a team as a code owner January 18, 2022 14:46
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@@ -319,6 +319,10 @@ func (s *targetSyncer) sync(groups []*targetgroup.Group, targetEventHandler chan
target.Stop()
s.metrics.targetsActive.Add(-1.)
delete(s.targets, key)

// close related file event watcher
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @slim-bean :)

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -319,6 +319,10 @@ func (s *targetSyncer) sync(groups []*targetgroup.Group, targetEventHandler chan
target.Stop()
s.metrics.targetsActive.Add(-1.)
delete(s.targets, key)

// close related file event watcher
close(s.fileEventWatchers[target.path])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be safe, should we check that this member exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea, on the other hand, there went something wrong beforehand if the member does not exist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good idea !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the only place we don't iterate. Let's do it.

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

Copy link
Contributor

@kavirajk kavirajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena cyriltovena merged commit 86ed98c into main Jan 18, 2022
@cyriltovena cyriltovena deleted the chaudum/promtail-deadlock branch January 18, 2022 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Promtail stuck on mutex while tailing files.
4 participants