Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in dispatcher code #2182

Closed
jtlisi opened this issue Feb 12, 2020 · 3 comments · Fixed by #2208
Closed

Race condition in dispatcher code #2182

jtlisi opened this issue Feb 12, 2020 · 3 comments · Fixed by #2208

Comments

@jtlisi
Copy link
Contributor

jtlisi commented Feb 12, 2020

I noticed a mild race condition in the dispatch package.

I wrote a small test in the dispatch package to demonstrate a race condition in the dispatch code:

Test Added to `dispatch/dispatch_test.go:

func TestDispatcherRace(t *testing.T) {
	logger := log.NewNopLogger()
	marker := types.NewMarker(prometheus.NewRegistry())
	alerts, err := mem.NewAlerts(context.Background(), marker, time.Hour, logger)
	if err != nil {
		t.Fatal(err)
	}
	defer alerts.Close()

	timeout := func(d time.Duration) time.Duration { return time.Duration(0) }
	dispatcher := NewDispatcher(alerts, nil, nil, marker, timeout, logger, NewDispatcherMetrics(prometheus.NewRegistry()))
	go dispatcher.Run()
	dispatcher.Stop()
}

Test Output:

$ go test ./dispatch -run TestDispatcherRace -race
==================
WARNING: DATA RACE
Write at 0x00c0002e0580 by goroutine 26:
  github.com/prometheus/alertmanager/dispatch.(*Dispatcher).Run()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch.go:112 +0x1d7

Previous read at 0x00c0002e0580 by goroutine 20:
  github.com/prometheus/alertmanager/dispatch.TestDispatcherRace()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch.go:254 +0x677
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199

Goroutine 26 (running) created at:
  github.com/prometheus/alertmanager/dispatch.TestDispatcherRace()
      /Users/jtlisi/tmp/alertmanager/dispatch/dispatch_test.go:537 +0x58e
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199

Goroutine 20 (finished) created at:
  testing.(*T).Run()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:960 +0x651
  testing.runTests.func1()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1202 +0xa6
  testing.tRunner()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:909 +0x199
  testing.runTests()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1200 +0x521
  testing.(*M).Run()
      /usr/local/Cellar/go/1.13.6/libexec/src/testing/testing.go:1117 +0x2ff
  main.main()
      _testmain.go:58 +0x223
==================
FAIL
FAIL	github.com/prometheus/alertmanager/dispatch	0.271s
FAIL

Although it will not happen as things stand, in theory simultaneous call to Stop() and Run() will cause a panic. The fix is simple. The operations on the cancel field in the Stop and Run functions require a lock.

Let me know if I'm mistaken. Otherwise, I can open a small PR with the fix.

@simonpasquier
Copy link
Member

You're right about the race. As for the fix, it might even be possible to avoid locking by moving some initialization happening in dispatcher.Run() to dispatch.New(...).

@simonpasquier
Copy link
Member

@jtlisi still interested to tackle this one?

@jtlisi
Copy link
Contributor Author

jtlisi commented Mar 10, 2020

Sure! However, I'm not sure how to fix this without locking. I do have a way to fix with this with locking if that's acceptable.

gotjosh added a commit to gotjosh/cortex that referenced this issue Aug 24, 2020
Signed-off-by: gotjosh <josue@grafana.com>
gouthamve pushed a commit to cortexproject/cortex that referenced this issue Aug 24, 2020
#3065)

* Remove TODO about prometheus/alertmanager#2182 as it got merged

Signed-off-by: gotjosh <josue@grafana.com>

* Remove TODO about prometheus/alertmanager#2200 as it got merged

Signed-off-by: gotjosh <josue@grafana.com>

* Register the Alertmanager API metrics

Signed-off-by: gotjosh <josue@grafana.com>

* Add a changelog entry

Signed-off-by: gotjosh <josue@grafana.com>

* Fix tests

Signed-off-by: gotjosh <josue@grafana.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants