Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a GitHub action that runs unit tests with -race, to CI build #4774

Closed
jgwest opened this issue Nov 6, 2020 · 1 comment
Closed

Add a GitHub action that runs unit tests with -race, to CI build #4774

jgwest opened this issue Nov 6, 2020 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@jgwest
Copy link
Member

jgwest commented Nov 6, 2020

Summary

At present, the Argo CD unit tests run as part of every CI build, but they do not run with with -race parameter which may be used to detect data races.

Since data races, in production applications, may lead to application instability and crashes, it's beneficial to ensure that we prevent data races from being introduced into Argo CD where possible. An easy way of improving our coverage here is what I propose below.

I've already fixed/opened PRs for a number of existing data races:

While some of them have existed for a while, a number of them were introduced fairly recently, and could have been caught if we were running the unit tests with -race as part of PR CI builds.

Proposal

I'm proposing:

  • Add a new make test-race entry to the makefile, to run the (supported) tests with -race
  • Segregate tests that currently fail with -race enabled, such that they do not run during make test-race, and ONLY run during make test
  • Failing tests are seggregated by moving them into a new test file, suffixed with _norace_test.go
  • Segregated tests will contain // +build !race at the top of the source file which prevents them from being run when -race is enabled.
  • Add a new go-test-race job to the ci-build.yaml as a check to run on every PR.

See the corresponding PR for this issue, as an example of how all of the above looks in practice.

Feedback definitely welcome!

Additional Questions Answered

Q: Are all the tests passing with data race detection enabled?

Unfortunately there are a few tests that are still throwing data races during unit tests, and thus I have moved them into _norace_test.go files so that they don't run during the -race tess pass. I've filed issues for a few, and PRs for the ones mentioned above... here are some examples:

TestUserAgent/Test_StaticHeaders in server/server_test.go

A data race in go-client's shared_informer.go, between sharedProcessor.run(...) and itself. Based on the data race, it APPEARS to be expected, but in any case it's nothing we are doing in Argo CD AFAICT that is causing this issue.

TestRandomPasswordVerificationDelay in util/session/sessionmanager_test.go

SessionManager.VerifyUsernamePassword uses bcrypt to prevent against time-based attacks and verify the hashed password; however this is a CPU intensive algorithm that is made significantly slower due to data race detection being enabled, which breaks through the maximum time limit required by TestRandomPasswordVerificationDelay.

TestResourceActionWildcards/TestPolicyInformer in util/rbac/rbac_test.go

A BUNCH of data race warnings thrown by running this test package, and it's tough to guess to what degree this is primarily a casbin issue or an Argo CD RBAC issue... at least one data race is in rbac.go with itself, and a bunch are in casbin. You can see the full list by doing a go test -race github.com/argoproj/argo-cd/util/rbac

It couldn't hurt to take a look at this code to decide if Argo CD is properly handling concurrent data access here, but in the mean time I have disabled data race testing of these tests.

Q: What if it makes the build unreliable?

I've stress tested the unit tests by running go test -race in a continous loop, and any tests that failed due to a race condition during the process were moved to _norace_ files. (with the only exception being those couple of test packages that fail when run too closely together, due to to github repo request limits)

I've likewise hit the GitHub 'rerun action' button a bunch of times on the newly introduced go test -race job, to reduce the potential for intermittent issues only reproducible from within GitHub internal runners.

If we do discover it makes the build unreliable, we should definitely segregate those affected tests, but as above I'm not expecting many issues there.

@jgwest jgwest added the enhancement New feature or request label Nov 6, 2020
jgwest added a commit to jgwest/argo-cd that referenced this issue Nov 6, 2020
jgwest added a commit to jgwest/argo-cd that referenced this issue Nov 7, 2020
jannfis pushed a commit that referenced this issue Nov 7, 2020
#4774) (#4775)

* chore: Add a GitHub action that runs unit tests with -race to CI build (#4774)

Signed-off-by: Jonathan West <jonwest@redhat.com>

* chore: Add a GitHub action that runs unit tests with -race to CI build (#4774)

Signed-off-by: Jonathan West <jonwest@redhat.com>
@jgwest
Copy link
Member Author

jgwest commented Nov 9, 2020

Merged 🎉🎉🎉

@jgwest jgwest closed this as completed Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant