TestTransferLeader in realcluster is flaky #8348

Open
okJiang opened this issue Jul 1, 2024 · 2 comments
Labels
type/ci The issue is related to CI.

Comments

Member

okJiang commented Jul 1, 2024

Flaky Test

Which jobs are failing

TestTransferLeader

CI link

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/191/pipeline/

=== RUN   TestTransferLeader
    scheduler_test.go:67: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:67
        	Error:      	"[balance-hot-region-scheduler balance-leader-scheduler balance-region-scheduler]" should have 5 item(s), but has 3
        	Test:       	TestTransferLeader
--- FAIL: TestTransferLeader (4.62s)

Reason for failure (if possible)

Anything else

okJiang added the type/ci label on Jul 1, 2024
Member Author

okJiang commented Jul 10, 2024

[2024/07/05 16:05:09.779 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateServiceModeLoop\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/pd_service_discovery.go:586"]
[2024/07/05 16:05:09.779 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2024/07/05 16:05:09.779 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2382] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE"]
[2024/07/05 16:05:09.780 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2384] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE"]
[2024/07/05 16:05:09.880 +08:00] [ERROR] [pd_service_discovery.go:559] ["[pd] failed to update member"] [urls="[http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384]"] [error="[PD:client:ErrClientGetMember]get member failed"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateMemberLoop\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/pd_service_discovery.go:559"]
[2024/07/05 16:05:40.035 +08:00] [INFO] [reboot_pd_test.go:33] ["TiUP restart success"]
--- PASS: TestReloadLabel (63.30s)
=== RUN   TestTransferLeader
[2024/07/05 16:05:41.819 +08:00] [ERROR] [client.go:252] ["[pd] request failed with a non-200 status"] [caller-id=pd-http-client] [name=GetLeader] [uri=/pd/api/v1/leader] [method=GET] [target-url=] [source=pd-real-cluster-test] [url=http://127.0.0.1:2384/pd/api/v1/leader] [status="500 Internal Server Error"] [body="[PD:apiutil:ErrRedirectToNotLeader]redirect to not leader"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:252\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:155\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:171\ngithub.com/tikv/pd/client/http.(*client).request\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:397\ngithub.com/tikv/pd/client/http.(*client).GetLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/interface.go:152\ngithub.com/tikv/pd/tests/integrations/realcluster.TestTransferLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:61\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1595"]
    scheduler_test.go:67: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:67
        	Error:      	"[balance-hot-region-scheduler]" should have 5 item(s), but has 1
        	Test:       	TestTransferLeader

https://do.pingcap.net/jenkins/blue/rest/organizations/jenkins/pipelines/tikv/pipelines/pd/pipelines/pull_integration_realcluster_test/runs/250/nodes/72/steps/77/log/?start=0

Member Author

okJiang commented Jul 10, 2024

https://do.pingcap.net/jenkins/blue/organizations/jenkins/tikv%2Fpd%2Fpull_integration_realcluster_test/detail/pull_integration_realcluster_test/281/pipeline

[2024/07/10 17:27:38.295 +08:00] [ERROR] [pd_service_discovery.go:559] ["[pd] failed to update member"] [urls="[http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384]"] [error="[PD:client:ErrClientGetMember]get member failed"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateMemberLoop\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/pd_service_discovery.go:559"]
[2024/07/10 17:27:41.194 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateServiceModeLoop\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/pd_service_discovery.go:586"]
[2024/07/10 17:27:41.194 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2024/07/10 17:27:41.195 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2382] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2382: connect: connection refused\" target:127.0.0.1:2382 status:TRANSIENT_FAILURE"]
[2024/07/10 17:27:41.195 +08:00] [INFO] [pd_service_discovery.go:912] ["[pd] cannot update member from this url"] [url=http://127.0.0.1:2384] [error="[PD:client:ErrClientGetMember]error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2384: connect: connection refused\" target:127.0.0.1:2384 status:TRANSIENT_FAILURE"]
[2024/07/10 17:27:41.295 +08:00] [ERROR] [pd_service_discovery.go:559] ["[pd] failed to update member"] [urls="[http://127.0.0.1:2379,http://127.0.0.1:2382,http://127.0.0.1:2384]"] [error="[PD:client:ErrClientGetMember]get member failed"] [stack="github.com/tikv/pd/client.(*pdServiceDiscovery).updateMemberLoop\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/pd_service_discovery.go:559"]
[2024/07/10 17:28:08.196 +08:00] [INFO] [pd_service_discovery.go:1018] ["[pd] switch leader"] [new-leader=http://127.0.0.1:2382] [old-leader=http://127.0.0.1:2379]
[2024/07/10 17:28:10.619 +08:00] [INFO] [reboot_pd_test.go:33] ["TiUP restart success"]
--- PASS: TestReloadLabel (62.45s)
=== RUN   TestTransferLeader
[2024/07/10 17:28:12.292 +08:00] [ERROR] [client.go:252] ["[pd] request failed with a non-200 status"] [caller-id=pd-http-client] [name=GetLeader] [uri=/pd/api/v1/leader] [method=GET] [target-url=] [source=pd-real-cluster-test] [url=http://127.0.0.1:2382/pd/api/v1/leader] [status="500 Internal Server Error"] [body="[PD:apiutil:ErrRedirectToNotLeader]redirect to not leader"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:252\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:155\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:171\ngithub.com/tikv/pd/client/http.(*client).request\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:397\ngithub.com/tikv/pd/client/http.(*client).GetLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/interface.go:152\ngithub.com/tikv/pd/tests/integrations/realcluster.TestTransferLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:61\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1595"]
[2024/07/10 17:28:12.596 +08:00] [ERROR] [client.go:252] ["[pd] request failed with a non-200 status"] [caller-id=pd-http-client] [name=GetLeader] [uri=/pd/api/v1/leader] [method=GET] [target-url=] [source=pd-real-cluster-test] [url=http://127.0.0.1:2384/pd/api/v1/leader] [status="500 Internal Server Error"] [body="[PD:apiutil:ErrRedirectToNotLeader]redirect to not leader"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:252\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:155\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:171\ngithub.com/tikv/pd/client/http.(*client).request\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/client.go:397\ngithub.com/tikv/pd/client/http.(*client).GetLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/client/http/interface.go:152\ngithub.com/tikv/pd/tests/integrations/realcluster.TestTransferLeader\n\t/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:61\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:1595"]
    scheduler_test.go:67: 
        	Error Trace:	/home/jenkins/agent/workspace/tikv/pd/pull_integration_realcluster_test/pd/tests/integrations/realcluster/scheduler_test.go:67
        	Error:      	"[balance-hot-region-scheduler balance-leader-scheduler balance-region-scheduler evict-leader-scheduler]" should have 5 item(s), but has 4
        	Test:       	TestTransferLeader
--- FAIL: TestTransferLeader (2.37s)

lhy1024 changed the title from "TestTransferLeader is flaky" to "TestTransferLeader in realcluster is flaky" on Aug 22, 2024