Skip to content
This repository has been archived by the owner on Jul 12, 2023. It is now read-only.

Fix data races in tests #484

Merged
merged 3 commits into from
Sep 7, 2020
Merged

Fix data races in tests #484

merged 3 commits into from
Sep 7, 2020

Conversation

sethvargo
Copy link
Member

@sethvargo sethvargo commented Sep 7, 2020

While working on some other stuff locally and running the tests, I occasionally saw many data races. I think we've also seen these a few times on prow, so I figured I'd dig in. There were two separate data races:

  1. The DefaultOptions in migrations is a global mutable. Despite always setting the value to false (and the default being false), the race detector sees this as a data-write and barfs. I removed it, since it's the default value.

  2. In the event we failed to connect to the database, we were still registering the opencensus views. Furthermore, we could succeed connecting to the database, but fail to initialize gorm, which would trigger a re-connect. This separates the "connect to the database" and "connect to the database via gorm" steps into separate retries. It also moves the opencenusus and driver initialization into a package-level init.

    We've temporarily reverted the OC integration.

  3. The verification codes tests create verification codes in the struct with a set expiration. Those times are set on load, not "on test run". When the system is under heavy load, there could be 5-10s between when the struct is created and when the test is run. To fix this, we changed the test cases to return a function that returns a verification code, ensuring the time-based tests run right before use, not on load.

Release Note

Fix data race in database initialization

@googlebot googlebot added the cla: yes Auto: added by CLA bot when all committers have signed a CLA. label Sep 7, 2020
@sethvargo
Copy link
Member Author

It looks like either gorm or the opencensus library we're using has a data race:

==================
WARNING: DATA RACE
Write at 0x00c00022d0d8 by goroutine 32:
  github.com/jinzhu/gorm.(*CallbackProcessor).Register()
      /Users/sethvargo/Development/go/pkg/mod/github.com/jinzhu/gorm@v1.9.16/callback.go:107 +0x3d2
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:185 +0xd45
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.withRetries()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:481 +0x134
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:151 +0x284
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).Open()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:123 +0xb1e
  github.com/google/exposure-notifications-verification-server/pkg/database.NewTestDatabaseWithConfig()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database_util.go:125 +0xb29
  github.com/google/exposure-notifications-verification-server/pkg/database.NewTestDatabase()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database_util.go:145 +0x56
  github.com/google/exposure-notifications-verification-server/pkg/database.TestVerificationCode_FindVerificationCode()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/vercode_test.go:27 +0x72
  testing.tRunner()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1108 +0x202

Previous read at 0x00c00022d0d8 by goroutine 22:
  runtime.growslice()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/runtime/slice.go:125 +0x0
  github.com/jinzhu/gorm.(*CallbackProcessor).Register()
      /Users/sethvargo/Development/go/pkg/mod/github.com/jinzhu/gorm@v1.9.16/callback.go:107 +0x505
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:194 +0x19c4
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher.func2()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:153 +0xbd
  github.com/sethvargo/go-retry.Do()
      /Users/sethvargo/Development/go/pkg/mod/github.com/sethvargo/go-retry@v0.1.0/retry.go:53 +0x6a
  github.com/google/exposure-notifications-verification-server/pkg/database.withRetries()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:481 +0x134
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).OpenWithCacher()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:151 +0x284
  github.com/google/exposure-notifications-verification-server/pkg/database.(*Database).Open()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database.go:123 +0xb1e
  github.com/google/exposure-notifications-verification-server/pkg/database.NewTestDatabaseWithConfig()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database_util.go:125 +0xb29
  github.com/google/exposure-notifications-verification-server/pkg/database.NewTestDatabase()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/database_util.go:145 +0x56
  github.com/google/exposure-notifications-verification-server/pkg/database.TestSMSProvider()
      /Users/sethvargo/Development/exposure-notifications-verification-server/pkg/database/sms_config_test.go:119 +0x72
  testing.tRunner()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1108 +0x202

Goroutine 32 (running) created at:
  testing.(*T).Run()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1159 +0x796
  testing.runTests.func1()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1430 +0xa6
  testing.tRunner()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1108 +0x202
  testing.runTests()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1428 +0x5aa
  testing.(*M).Run()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1338 +0x4eb
  main.main()
      _testmain.go:77 +0x236

Goroutine 22 (running) created at:
  testing.(*T).Run()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1159 +0x796
  testing.runTests.func1()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1430 +0xa6
  testing.tRunner()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1108 +0x202
  testing.runTests()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1428 +0x5aa
  testing.(*M).Run()
      /Users/sethvargo/.homebrew/Cellar/go/1.15.1/libexec/src/testing/testing.go:1338 +0x4eb
  main.main()
      _testmain.go:77 +0x236
==================

I've temporarily removed it /cc @icco @taddari

@sethvargo
Copy link
Member Author

Hmm - pretty sure GH-396 broke some stuff. Gonna revert for now.

While working on some other stuff locally and running the tests, I occasionally saw many data races. I think we've also seen these a few times on prow, so I figured I'd dig in. There were two separate data races:

1. The DefaultOptions in migrations is a global mutable. Despite always setting the value to false (and the default being false), the race detector sees this as a data-write and barfs. I removed it, since it's the default value.

2. In the event we failed to connect to the database, we were still registering the opencensus views. Furthermore, we could succeed connecting to the database, but fail to initialize gorm, which would trigger a re-connect. This separates the "connect to the database" and "connect to the database via gorm" steps into separate retries. It also moves the opencenusus and driver initialization into a package-level init.
Under heavy load, this causes a race where the struct is created (and thus the time is set), but the test might not run for a few more seconds. By switching this to a function, we ensure it's only called immediately before executing instead of during scheduling.
@sethvargo
Copy link
Member Author

That passed. I'm gonna retest just to make sure...

/retest

@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mikehelmick, sethvargo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [mikehelmick,sethvargo]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot google-oss-robot merged commit f9f4f9a into main Sep 7, 2020
@google-oss-robot google-oss-robot deleted the sethvargo/race branch September 7, 2020 21:59
@icco
Copy link
Contributor

icco commented Sep 7, 2020

Woofta. Thanks for catching @sethvargo. /cc @taddari

@google google locked and limited conversation to collaborators Oct 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla: yes Auto: added by CLA bot when all committers have signed a CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants