Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data race in raft implementation #1180

Closed
sanga opened this issue Nov 27, 2014 · 5 comments
Closed

data race in raft implementation #1180

sanga opened this issue Nov 27, 2014 · 5 comments
Assignees
Milestone

Comments

@sanga
Copy link
Contributor

sanga commented Nov 27, 2014

Go has a built in race detector (http://blog.golang.org/race-detector). Enabling it is as simple as: go test -v -race . ./messaging ./influxql in .travis.yml

And it actually finds some errors against current master 8049c34.

@benbjohnson
Copy link
Contributor

@sanga We'll definitely get the race errors buttoned up. There was a lot of shuffling around of some pieces in messaging and raft while I was architecting them and I haven't had a chance to put the mutexes back in.

@otoolep
Copy link
Contributor

otoolep commented Nov 28, 2014

Yeah, definitely something we will enable. As @benbjohnson mentioned, we're working through some of the new tests in master, and ensuring everything is solid.

I'll make sure we get this turned on eventually.

@otoolep otoolep self-assigned this Nov 28, 2014
@toddboom toddboom added this to the 0.9.0 milestone Dec 5, 2014
@sanga
Copy link
Contributor Author

sanga commented Feb 12, 2015

For kicks I tried this again this morning (against 59a953d), and you still have racy tests (or code)

GORACE="halt_on_error=1" go test -v -race  ./...
....
==================
WARNING: DATA RACE
Read by goroutine 71:
  github.com/influxdb/influxdb/raft.(*Log).waitUncommitted()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:967 +0x78
  github.com/influxdb/influxdb/raft.(*Log).WaitUncommitted()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/internal_test.go:3 +0x51
  github.com/influxdb/influxdb/raft_test.(*Log).MustWaitUncommitted()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log_test.go:492 +0x5f
  github.com/influxdb/influxdb/raft_test.NewCluster()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log_test.go:377 +0x1109
  github.com/influxdb/influxdb/raft_test.TestCluster_ID_Sequential()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log_test.go:99 +0x37
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:447 +0x133

Previous write by goroutine 79:
  github.com/influxdb/influxdb/raft.(*Log).ReadFrom()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:1568 +0xbfa
  github.com/influxdb/influxdb/raft.func·006()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:654 +0x7a

Goroutine 71 (running) created at:
  testing.RunTests()
      /usr/local/go/src/testing/testing.go:555 +0xd4e
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:485 +0xe0
  main.main()
      github.com/influxdb/influxdb/raft/_test/_testmain.go:202 +0x28c

Goroutine 79 (running) created at:
  github.com/influxdb/influxdb/raft.(*Log).followerLoop()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:658 +0x921

@otoolep
Copy link
Contributor

otoolep commented Feb 12, 2015

Thanks @sanga -- yep, we know. We are fixing these as we work though it, though base features for the 0.9.0 release are obviously a high priority right now. We are holding off from enabling this on our CI system until the race count gets way down.

PRs -- any fixes or suggested changes for this stuff -- are more than welcome in the meantime.

@sanga sanga changed the title please consider adding -race to go test in travis.yml data race in raft implementation Mar 5, 2015
@sanga
Copy link
Contributor Author

sanga commented Mar 5, 2015

This is definitely in prod code and unrelated to tests. I just reproed it by building with -race enabled and running the 3_shards test. Stack:

 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

[srvr] 2015/03/05 12:34:59 influxdb started, version 0.9, commit unknown
[raft]  2015/03/05 12:34:59 log open: created at /tmp/influxdb/8086/broker/raft, with ID 1, term 1, last applied index of 38
[raft]  2015/03/05 12:34:59 log state change: stopped => follower
==================
WARNING: DATA RACE
Write by goroutine 9:
  github.com/influxdb/influxdb/raft.(*Log).stateLoop()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:647 +0x20d

Previous read by main goroutine:
  github.com/influxdb/influxdb/raft.(*Log).startStateLoop()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:634 +0x8f
  github.com/influxdb/influxdb/raft.(*Log).Open()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:289 +0xd05
  github.com/influxdb/influxdb/messaging.(*Broker).Open()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/messaging/broker.go:102 +0x36f
  main.openBroker()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/run.go:216 +0x10e
  main.Run()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/run.go:45 +0x447
  main.execRun()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:118 +0x8e1
  main.main()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:70 +0x4a9

Goroutine 9 (running) created at:
  github.com/influxdb/influxdb/raft.(*Log).startStateLoop()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:630 +0x7a
  github.com/influxdb/influxdb/raft.(*Log).Open()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/raft/log.go:289 +0xd05
  github.com/influxdb/influxdb/messaging.(*Broker).Open()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/messaging/broker.go:102 +0x36f
  main.openBroker()  
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/run.go:216 +0x10e
  main.Run()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/run.go:45 +0x447
  main.execRun()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:118 +0x8e1
  main.main()
      /home/sampti/Code/go/src/github.com/influxdb/influxdb/cmd/influxd/main.go:70 +0x4a9
==================
[srvr] 2015/03/05 12:34:59 broker listening on :8086
[srvr] 2015/03/05 12:34:59 Loading metadata index for foo
[srvr] 2015/03/05 12:34:59 broker enforcing retention policies with check interval of 10m0s
[srvr] 2015/03/05 12:34:59 data node #1 listening on :8086
[srvr] 2015/03/05 12:34:59 starting admin server on :8083
[srvr] 2015/03/05 12:34:59 Sending anonymous usage statistics to m.influxdb.com
[messaging] 2015/03/05 12:34:59 connected to broker: http://sampti-wheezy:8086/messaging/messages?replicaID=1
[raft]  2015/03/05 12:35:00 log state change: follower => candidate
[raft]  2015/03/05 12:35:02 log state change: candidate => follower

@dgnorton dgnorton assigned dgnorton and unassigned otoolep Mar 6, 2015
dgnorton added a commit that referenced this issue Mar 6, 2015
@dgnorton dgnorton removed the review label Mar 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants