Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway easier to debug when flaky #7134

Conversation

pracucci
Copy link
Collaborator

What this PR does

@charleskorn reported me that the TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway failed in a CI run with the following error:

--- FAIL: TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway (0.01s)
    --- FAIL: TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway/LabelNames() (0.00s)
panic: not enough mocked results [recovered]
	panic: not enough mocked results

goroutine 687 [running]:
testing.tRunner.func1.2({0x203a880, 0x35a4490})
	/usr/local/go/src/testing/testing.go:1545 +0x3f7
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1548 +0x716
panic({0x203a880?, 0x35a4490?})
	/usr/local/go/src/runtime/panic.go:920 +0x270
github.com/grafana/mimir/pkg/querier.(*blocksStoreSetMock).GetClientsFor(0xc0001c20c0, {0xc0011b98c8?, 0x6?}, {0x6?, 0x1?, 0x11?}, 0x38?)
	/__w/mimir/mimir/pkg/querier/blocks_store_queryable_test.go:2181 +0x266
github.com/grafana/mimir/pkg/querier.(*blocksStoreQuerier).queryWithConsistencyCheck(0xc00067c310, {0x35bcbd8, 0xc0001c2150}, 0x2191020?, 0x35a9060?, 0x42b9640?, {0x22f8cfd, 0x6}, 0x0, 0xc0011b9cf8)
	/__w/mimir/mimir/pkg/querier/blocks_store_queryable.go:554 +0x12c5
github.com/grafana/mimir/pkg/querier.(*blocksStoreQuerier).LabelNames(0xc00067c310, {0x35bcc10, 0xc000149270}, {0xc0001423e8, 0x1, 0x1})
	/__w/mimir/mimir/pkg/querier/blocks_store_queryable.go:373 +0xbd1
github.com/grafana/mimir/pkg/querier.TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway.func3(0x0?)
	/__w/mimir/mimir/pkg/querier/blocks_store_queryable_test.go:1077 +0x40c
testing.tRunner(0xc00086ed00, 0xc000665830)
	/usr/local/go/src/testing/testing.go:1595 +0x262
created by testing.(*T).Run in goroutine 661
	/usr/local/go/src/testing/testing.go:1648 +0x846
FAIL	github.com/grafana/mimir/pkg/querier	16.730s

The issue happens when a gRPC request is retried, while we expect not to be retried. I haven't been able to reproduce it locally (I've also tried to add some time.Sleep() in different places to try to trigger a timing issue). Unfortunately, due to the panic() we can't have more details from the actual CI run.

In this PR I suggest to print log messages and to mock the response multiple (3) times so that the panic() should go away and the actual behaviour logged when the test fails.

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

…nceledWhileRunningRequestOnStoreGateway easier to debug when flaky

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci marked this pull request as ready for review January 15, 2024 18:10
@pracucci pracucci requested a review from a team as a code owner January 15, 2024 18:10
Copy link
Contributor

@charleskorn charleskorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking suggestion: another thing that might make this easier to debug is changing the behaviour of blocksStoreSetMock.GetClientsFor() to return an error rather than panic when it is called more times than expected

@pracucci
Copy link
Collaborator Author

Non-blocking suggestion: another thing that might make this easier to debug is changing the behaviour of blocksStoreSetMock.GetClientsFor() to return an error rather than panic when it is called more times than expected

I'm not super convinced because it's not the same thing. The problem I see is that, from the caller perspective, there would be no difference between a mocked error and an error occurring because not enough responses have been mocked.

I'm going to keep it as is for now. We can revisit if the current changes will not help to debug it.

@pracucci pracucci merged commit 7928755 into main Jan 16, 2024
28 checks passed
@pracucci pracucci deleted the make-TestBlocksStoreQuerier_ShouldReturnContextCanceledIfContextWasCanceledWhileRunningRequestOnStoreGateway-debuggable branch January 16, 2024 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants