TestRegionLabelDenyScheduler is flaky #8339

Open
okJiang opened this issue Jun 27, 2024 · 7 comments · Fixed by #8340
Labels: type/ci (The issue is related to CI.)

Comments

@okJiang
Member

okJiang commented Jun 27, 2024

Flaky Test

Which jobs are failing

TestRegionLabelDenyScheduler

CI link

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/130/pipeline

Reason for failure (if possible)

The grant_leader scheduler did not schedule every region other than the one denied region.

The scheduled region IDs are in the following logs:

evict_leader.log

grant_leader.log

The grant_leader scheduler did not schedule regions 26 and 94:

comm -3 <(sort evict_leader.log) <(sort grant_leader.log)
26
94
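
The comparison above can also be done in Go. This is a minimal sketch, not code from the PD repository: the function name `onlyIn` and the sample IDs are made up for illustration, and the real ID lists would come from the attached logs.

```go
package main

import (
	"fmt"
	"sort"
)

// onlyIn returns the IDs present in a but absent from b, sorted ascending.
// Repeated IDs in a are reported once.
func onlyIn(a, b []uint64) []uint64 {
	seen := make(map[uint64]struct{}, len(b))
	for _, id := range b {
		seen[id] = struct{}{}
	}
	var out []uint64
	for _, id := range a {
		if _, ok := seen[id]; ok {
			continue
		}
		out = append(out, id)
		seen[id] = struct{}{} // suppress duplicates from a
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	// Hypothetical sample data, not the contents of the real logs.
	evict := []uint64{2, 26, 40, 94}
	grant := []uint64{2, 40}
	fmt.Println("only in evict_leader:", onlyIn(evict, grant))
	fmt.Println("only in grant_leader:", onlyIn(grant, evict))
}
```

Calling `onlyIn` in both directions covers the two columns that `comm -3` prints.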

Anything else

@okJiang
Member Author

okJiang commented Jun 27, 2024

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/144/pipeline/
test2.log
test.log

comm -3 <(grep "op finish duration less than 10s" test.log | grep -oP '\[region-id=\K\d+' | sort) <(grep "op finish duration less than 10s" test2.log | grep -oP '\[region-id=\K\d+' | sort)
114
28
38
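
The extraction step of the pipeline above (keep lines containing a message, then pull out every `region-id` value) could be sketched in Go as follows. The helper name `regionIDs` and the sample log lines are assumptions for illustration; only the `[region-id=<digits>` pattern and the message text are taken from the command above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Matches the same field as grep -oP '\[region-id=\K\d+'.
var regionIDPattern = regexp.MustCompile(`\[region-id=(\d+)`)

// regionIDs returns every region ID found on lines that contain marker,
// in the order they appear. Duplicates are kept, as grep -o would keep them.
func regionIDs(log, marker string) []string {
	var ids []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, marker) {
			continue
		}
		for _, m := range regionIDPattern.FindAllStringSubmatch(line, -1) {
			ids = append(ids, m[1])
		}
	}
	return ids
}

func main() {
	// Made-up log lines in roughly the shape the grep pattern expects.
	log := strings.Join([]string{
		`[INFO] ["op finish duration less than 10s"] [region-id=114]`,
		`[INFO] ["some other message"] [region-id=7]`,
		`[INFO] ["op finish duration less than 10s"] [region-id=28]`,
	}, "\n")
	fmt.Println(regionIDs(log, "op finish duration less than 10s"))
}
```

The IDs are returned as strings because the shell pipeline sorts them lexically, which is why 114 sorts before 28 in the output above.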

@okJiang
Member Author

okJiang commented Jun 28, 2024

/assign

@HuSharp
Member

HuSharp commented Jul 5, 2024

Hit this again:
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/250/pipeline

=== RUN   TestRegionLabelDenyScheduler
[2024/07/05 16:06:36.782 +08:00] [INFO] [pd_service_discovery.go:1018] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2382] [old-leader=http://127.0.0.1:2384]
    testutil.go:56: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/testutil/testutil.go:56
        	            				/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:105
        	Error:      	Condition never satisfied
        	Test:       	TestRegionLabelDenyScheduler
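
For context, "Condition never satisfied" is the message testify's Eventually assertion reports when a polled condition does not become true before its timeout. The sketch below shows that polling pattern in isolation. It is not the helper at testutil.go:56; the name `eventually` and the durations are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// eventually calls cond immediately and then once per tick until cond
// returns true (result: true) or the timeout has elapsed (result: false).
func eventually(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

func main() {
	// A condition that becomes true on the third poll.
	calls := 0
	ok := eventually(func() bool {
		calls++
		return calls >= 3
	}, time.Second, 10*time.Millisecond)
	fmt.Println("satisfied:", ok, "after polls:", calls)

	// A condition that never becomes true: the analogue of the failure above.
	ok = eventually(func() bool { return false }, 50*time.Millisecond, 10*time.Millisecond)
	fmt.Println("satisfied:", ok)
}
```

With this pattern a test fails the same way whether the condition is wrong or merely slow, so a scheduler that is still making progress at the deadline looks identical to one that is stuck.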

@HuSharp HuSharp reopened this Jul 5, 2024
@okJiang
Member Author

okJiang commented Jul 10, 2024

Hit this again: https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/250/pipeline

=== RUN   TestRegionLabelDenyScheduler
[2024/07/05 16:06:36.782 +08:00] [INFO] [pd_service_discovery.go:1018] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2382] [old-leader=http://127.0.0.1:2384]
    testutil.go:56: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/testutil/testutil.go:56
        	            				/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:105
        	Error:      	Condition never satisfied
        	Test:       	TestRegionLabelDenyScheduler

This failure was caused by the failure of the preceding test, which I have recorded in another issue: #8348 (comment). So we can still close this issue and discuss the instability of TestTransferLeader separately.

@okJiang okJiang closed this as completed Jul 10, 2024
@okJiang
Member Author

okJiang commented Jul 15, 2024

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/310/pipeline/

It seems that 'stream not found' interfered with the grant-leader process, causing a timeout.

Grant-leader was still in progress when the test timed out.

[image]

@okJiang
Member Author

okJiang commented Jul 16, 2024

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/310/pipeline/

It seems that 'stream not found' interfered with the grant-leader process, causing a timeout.

Grant-leader was still in progress when the test timed out.

[image]

Fixed by 5941965.

@okJiang okJiang closed this as completed Jul 16, 2024
@HuSharp
Member

HuSharp commented Aug 1, 2024

Hit this again:
https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/467/

--- PASS: TestReloadLabel (63.86s)
=== RUN   TestTransferLeader
--- PASS: TestTransferLeader (3.07s)
=== RUN   TestRegionLabelDenyScheduler
    testutil.go:56: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/testutil/testutil.go:56
        	            				/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:178
        	Error:      	Condition never satisfied
        	Test:       	TestRegionLabelDenyScheduler

@HuSharp HuSharp reopened this Aug 1, 2024