TestUpdateESSecureSettings is failing #2380

Closed
barkbay opened this issue Jan 9, 2020 · 14 comments

barkbay commented Jan 9, 2020

TestUpdateESSecureSettings has failed several times on the last release candidate for 1.0.0 (rc5):

=== RUN   TestUpdateESSecureSettings/Elasticsearch_secure_settings_should_eventually_be_set_in_all_nodes_keystore#03
Retries (5m0s timeout): ........................................................................................
{"level":"error","@timestamp":"2020-01-09T01:23:48.555Z","message":"stopping early","ver":"0.0.0-00000000","error":"test failure","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.StepList.RunSequential\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:42\ngithub.com/elastic/cloud-on-k8s/test/e2e/es.TestUpdateESSecureSettings\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/keystore_test.go:136\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:909"}
    --- FAIL: TestUpdateESSecureSettings/Elasticsearch_secure_settings_should_eventually_be_set_in_all_nodes_keystore#03 (300.00s)
        utils.go:83: 
            	Error Trace:	utils.go:83
            	Error:      	Received unexpected error:
            	            	pod test-es-keystore-XXXX-es-masterdata-0 is not Ready.
            	            	Status:{
barkbay added the >test (Related to unit/integration/e2e tests) and v1.0.0 labels on Jan 9, 2020

pebrc commented Jan 9, 2020

Not sure if this is relevant, but I noticed

01:20:47  W0109 01:20:36.519197    8619 reflector.go:299] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: watch of *v1.Pod ended with: too old resource version: 2003 (2374)

in the logs, which I haven't seen before.


sebgl commented Jan 9, 2020

I reproduced it with the operator running locally and no webhook set up.
The test times out after 5 minutes because the three-node rolling upgrade is not over yet. In my local test the last node was still initializing after 5 minutes.
That makes sense, since we increased the preStop hook wait and the general rolling-upgrade E2E check timeout, but not the secure settings rolling-upgrade timeout.

Edit: my dev env is running on AKS, where Pods take 2 minutes to start up. I'm double-checking the timings on GKE; then I'll probably open a PR to fix the test timeout once I'm 100% confident this is causing the issue.
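
For context, each E2E step retries its check until a fixed deadline, printing one dot per attempt (the "Retries (5m0s timeout)" line above). A minimal sketch of that pattern, assuming a hypothetical checkAllPodsReady function and a 5-minute timeout value; this is not the actual ECK test framework API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntilTimeout polls check() until it succeeds or the deadline expires.
// The timeout is the knob that was raised for the generic rolling-upgrade
// checks but, per the comment above, not for the secure settings test.
func retryUntilTimeout(check func() error, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	var lastErr error
	for time.Now().Before(deadline) {
		if lastErr = check(); lastErr == nil {
			return nil
		}
		fmt.Print(".") // one dot per retry, as in the CI output
		time.Sleep(interval)
	}
	return fmt.Errorf("timeout after %s: %w", timeout, lastErr)
}

func main() {
	// Hypothetical check standing in for "all Pods of the cluster are Ready".
	checkAllPodsReady := func() error {
		return errors.New("pod test-es-keystore-xxxx-es-masterdata-0 is not Ready")
	}
	if err := retryUntilTimeout(checkAllPodsReady, 5*time.Minute, 3*time.Second); err != nil {
		fmt.Println(err)
	}
}
```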


pebrc commented Jan 9, 2020

On my 1.15 GKE cluster the rolling upgrade takes about 200 seconds for the three nodes, so well below the 5-minute threshold.


sebgl commented Jan 9, 2020

@pebrc you're right, my slow rolling upgrade was just due to running on AKS. After switching back to GKE I also get below the 5-minute threshold. Sorry for the noise!


sebgl commented Jan 9, 2020

It looks like the test is flaky and sometimes succeeds:

[CI build screenshots showing a mix of failed and successful runs]

I'm still investigating the slow rolling upgrade hypothesis.


sebgl commented Jan 9, 2020

I'm investigating this build.

Comparing the status of the unready pod in the test logs:

	pod test-es-keystore-qrtl-es-masterdata-0 is not Ready.
            	            	Status:{
            	            	    "phase": "Running",
            	            	    "conditions": [
            	            	        {
            	            	            "type": "Initialized",
            	            	            "status": "True",
            	            	            "lastProbeTime": null,
            	            	            "lastTransitionTime": "2020-01-08T20:49:18Z"
            	            	        },
            	            	        {
            	            	            "type": "Ready",
            	            	            "status": "False",
            	            	            "lastProbeTime": null,
            	            	            "lastTransitionTime": "2020-01-08T20:49:16Z",
            	            	            "reason": "ContainersNotReady",
            	            	            "message": "containers with unready status: [elasticsearch]"
            	            	        },

vs. status of the Pod from the support archive:


                "name": "test-es-keystore-qrtl-es-masterdata-0",
...
 "status": {
                "conditions": [
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2020-01-08T20:49:18Z",
                        "status": "True",
                        "type": "Initialized"
                    },
                    {
                        "lastProbeTime": null,
                        "lastTransitionTime": "2020-01-08T20:49:38Z",
                        "status": "True",
                        "type": "Ready"
                    },

22 seconds after the test ended (5-minute timeout), the container was ready, and the test would probably have succeeded.
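
For reference, the readiness comparison above boils down to the Pod's Ready condition. A minimal sketch of that check using the corev1 types; the hand-built Pod status mirrors the failing state above:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// isPodReady mirrors the comparison above: a Pod counts as Ready only if its
// Ready condition has status True.
func isPodReady(pod corev1.Pod) bool {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

func main() {
	// Pod status as reported in the test failure: Running, but ContainersNotReady.
	failing := corev1.Pod{Status: corev1.PodStatus{
		Phase: corev1.PodRunning,
		Conditions: []corev1.PodCondition{
			{Type: corev1.PodInitialized, Status: corev1.ConditionTrue},
			{Type: corev1.PodReady, Status: corev1.ConditionFalse, Reason: "ContainersNotReady"},
		},
	}}
	fmt.Println("ready:", isPodReady(failing)) // ready: false
}
```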

Looking at the operator logs:

  • Pod test-es-keystore-qrtl-es-masterdata-2 was restarted at 2020-01-08T20:44:25
  • Pod test-es-keystore-qrtl-es-masterdata-1 was restarted at 2020-01-08T20:45:22 (one minute later)
  • Pod test-es-keystore-qrtl-es-masterdata-0 was restarted at 2020-01-08T20:46:40 (one minute later)

Then we get Cache expectations are not satisfied yet, re-queueing until 2020-01-08T20:49:13, which probably matches the time when we finally see the Pod deletion in our cache. The log then shows the operator moving on with setting up the TLS certificates for that Pod.

That could be related to the error @pebrc pointed out:

W0108 20:44:06.048715       1 reflector.go:299] pkg/mod/k8s.io/client-go@v0.0.0-20191114101535-6c5935290e33/tools/cache/reflector.go:96: watch of *v1.Elasticsearch ended with: too old resource version: 9469 (11313)

Same investigation on build 179:

  • Pod test-es-keystore-xkg8-es-masterdata-2 deleted at 2020-01-09T01:22:58
  • Pod test-es-keystore-xkg8-es-masterdata-1 deleted at 2020-01-09T01:24:02
  • Pod test-es-keystore-xkg8-es-masterdata-0 deleted at 2020-01-09T01:25:15

But then we see Cache expectations are not satisfied yet, re-queueing until 2020-01-09T01:27:57, when we finally move on with the upgraded node.

I can also see a bunch of reflector.go errors in the logs, targeting various resource Kinds (ConfigMap, Secret, Elasticsearch, StatefulSet), but not Pods this time.


Same thing on build 146 (stack-versions test).


Since we added a 30-second wait to the Pod deletion, I think we can expect to see Cache expectations are not satisfied yet, re-queueing a bit more often (the Pod is still there for an additional 30 seconds, during which expectations cannot be satisfied). Anything much longer than 30 seconds could be caused by either:

  • a slow k8s client cache refresh
  • a slow Elasticsearch process stop (after the 30-second delay)

Based on the logs, the keystore test seems to be the first "real" rolling-upgrade test executed. I guess this failure would also happen for other rolling-upgrade tests.
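
For readers unfamiliar with the expectations mechanism mentioned above: after deleting a Pod, the operator records an expected deletion and re-queues reconciliations until its cache no longer returns that Pod. A rough sketch of the idea, not the actual ECK implementation (type and method names are illustrative):

```go
package expectations

import "time"

// deletionExpectation records a Pod deletion that was requested but not yet
// observed in the informer cache.
type deletionExpectation struct {
	podUID    string
	requested time.Time
}

// Expectations tracks in-flight Pod deletions for one Elasticsearch cluster.
type Expectations struct {
	pending map[string]deletionExpectation // keyed by Pod name
}

func New() *Expectations {
	return &Expectations{pending: map[string]deletionExpectation{}}
}

// ExpectDeletion is called right after the delete request is issued.
func (e *Expectations) ExpectDeletion(podName, podUID string) {
	e.pending[podName] = deletionExpectation{podUID: podUID, requested: time.Now()}
}

// Satisfied returns true once none of the expected deletions is still visible
// in the cached Pod list. While it returns false, reconciliation is re-queued,
// which is the "Cache expectations are not satisfied yet, re-queueing" log line.
func (e *Expectations) Satisfied(cachedPodUIDsByName map[string]string) bool {
	for name, exp := range e.pending {
		if uid, stillThere := cachedPodUIDsByName[name]; stillThere && uid == exp.podUID {
			return false // the deleted Pod (same UID) is still in the cache
		}
		delete(e.pending, name) // gone, or recreated with a new UID
	}
	return true
}
```

With a slow cache refresh or a slow Elasticsearch shutdown, Satisfied keeps returning false well beyond the expected 30 seconds, which matches the long re-queueing windows in the logs above.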


pebrc commented Jan 9, 2020

I am wondering whether we have some variability in the preStop hook as well. If the endpoint removal from the Service is slow, we might have longer preStop hook run times, worst case 50 seconds if I understand correctly.
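
For reference, the preStop hook under discussion is an ordinary exec lifecycle hook on the elasticsearch container. A schematic of how such a hook is wired with the 2020-era corev1 types (corev1.Handler was later renamed LifecycleHandler); the command and sleep value are placeholders, not the actual ECK preStop script:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// examplePreStopHook attaches a preStop exec hook to a container. Per the
// discussion above, the hook's purpose is to give the Pod time to be removed
// from the Service endpoints before Elasticsearch stops, which is where the
// run-time variability comes from.
func examplePreStopHook() corev1.Container {
	return corev1.Container{
		Name:  "elasticsearch",
		Image: "docker.elastic.co/elasticsearch/elasticsearch:7.5.1",
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.Handler{
				Exec: &corev1.ExecAction{
					// Placeholder: a fixed wait standing in for the real pre-stop script.
					Command: []string{"bash", "-c", "sleep 30"},
				},
			},
		},
	}
}

func main() {
	c := examplePreStopHook()
	fmt.Println("preStop command:", c.Lifecycle.PreStop.Exec.Command)
}
```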


thbkrkr commented Jan 9, 2020

Just to confirm that something happens at random when the container is killed/stopped:

Each time the test fails, the container kill/stop takes more than two minutes.

# https://devops-ci.elastic.co/view/cloud-on-k8s/job/cloud-on-k8s-versions-gke/179/1

2020-01-09T01:43:46Z Killing container with id docker://elasticsearch:Need to kill Pod test-es-keystore-q6kh-es-masterdata-0
2020-01-09T01:45:50Z Successfully assigned e2e-j2ibb-mercury/test-es-keystore-q6kh-es-masterdata-0 to gke-eck-gke12-179-e2e-default-pool-9085c232-svxp

# https://devops-ci.elastic.co/view/cloud-on-k8s/job/cloud-on-k8s-versions-gke/179/2

2020-01-09T01:25:15Z Stopping container elasticsearch
2020-01-09T01:27:59Z Successfully assigned e2e-gw6m8-mercury/test-es-keystore-xkg8-es-masterdata-0 to gke-eck-gke14-179-e2e-default-pool-ff5f4e13-j29p

# https://devops-ci.elastic.co/view/cloud-on-k8s/job/cloud-on-k8s-stack/146/3

2020-01-09T01:27:07Z Killing container with id docker://elasticsearch:Need to kill Pod test-es-keystore-vnrl-es-masterdata-0
2020-01-09T01:29:12Z Successfully assigned e2e-sife0-mercury/test-es-keystore-vnrl-es-masterdata-0 to gke-eck-73-146-e2e-default-pool-6f60a53a-69hr

When the test succeeds, it takes a few seconds:

# https://devops-ci.elastic.co/view/cloud-on-k8s/job/cloud-on-k8s-versions-gke/119/1

2019-12-11T01:11:24Z Killing container with id docker://elasticsearch:Need to kill Pod test-es-keystore-2drl-es-masterdata-1
2019-12-11T01:11:33Z Successfully assigned e2e-77ioh-mercury/test-es-keystore-2drl-es-masterdata-1 to gke-eck-68-119-e2e-default-pool-7e724caa-8012

2019-12-11T01:12:17Z Killing container with id docker://elasticsearch:Need to kill Pod test-es-keystore-2drl-es-masterdata-0
2019-12-11T01:12:18Z Successfully assigned e2e-77ioh-mercury/test-es-keystore-2drl-es-masterdata-0 to gke-eck-68-119-e2e-default-pool-2ee4afc7-dkq1


barkbay commented Jan 9, 2020

Just did a quick test while looking at the state of the Docker container on the host:

The container is stopped and deleted at 12:18:34:

gke-michael-dev-e2e-default-pool-80be6640-nvxl ~ # date && docker ps -a|grep elasticsearch
Thu Jan  9 12:18:33 UTC 2020
84b15905182b        2bd69c322e98                                          "/usr/local/bin/dock…"   3 minutes ago       Up 3 minutes                                   k8s_elasticsearch_test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury_b4d136c9-32d9-11ea-9732-42010a840129_0
gke-michael-dev-e2e-default-pool-80be6640-nvxl ~ # date && docker ps -a|grep elasticsearch
Thu Jan  9 12:18:34 UTC 2020

But the Pod is still in the Terminating state until 12:20:40:

Thu Jan  9 12:20:39 UTC 2020
NAME                                    READY   STATUS        RESTARTS   AGE
test-es-keystore-lmh4-es-masterdata-0   0/1     Terminating   0          5m18s
test-es-keystore-lmh4-es-masterdata-1   1/1     Running       0          3m
test-es-keystore-lmh4-es-masterdata-2   1/1     Running       0          4m11s
...
Thu Jan  9 12:20:40 UTC 2020
NAME                                    READY   STATUS    RESTARTS   AGE
test-es-keystore-lmh4-es-masterdata-1   1/1     Running   0          3m1s
test-es-keystore-lmh4-es-masterdata-2   1/1     Running   0          4m12s

Edit:

Here are the kubelet logs (most recent messages first):

Jan 09 12:20:40 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:40.588525    1809 kubelet.go:1918] SyncLoop (ADD, "api"): "test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury(732e9d44-32da-11ea-9732-42010a840129)"
Jan 09 12:20:40 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:40.256170    1809 kubelet.go:2130] Failed to delete pod "test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury(b4d136c9-32d9-11ea-9732-42010a840129)", err: pod not found
Jan 09 12:20:40 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:40.256068    1809 kubelet.go:1928] SyncLoop (REMOVE, "api"): "test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury(b4d136c9-32d9-11ea-9732-42010a840129)"
Jan 09 12:20:40 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:40.230240    1809 kubelet.go:1934] SyncLoop (DELETE, "api"): "test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury(b4d136c9-32d9-11ea-9732-42010a840129)"
Jan 09 12:20:36 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:36.318044    1809 reconciler.go:301] Volume detached for volume "elastic-internal-secure-settings" (UniqueName: "kubernetes.io/secret/b4d136c9-32d9-11ea-9732-42010a840129-elastic-internal-secure-settings") on node "gke-michael-dev-e2e-default-pool-80be6640-nvxl" DevicePath ""
Jan 09 12:20:36 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:36.233069    1809 operation_generator.go:693] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/b4d136c9-32d9-11ea-9732-42010a840129-elastic-internal-secure-settings" (OuterVolumeSpecName: "elastic-internal-secure-settings") pod "b4d136c9-32d9-11ea-9732-42010a840129" (UID: "b4d136c9-32d9-11ea-9732-42010a840129"). InnerVolumeSpecName "elastic-internal-secure-settings". PluginName "kubernetes.io/secret"
Jan 09 12:20:36 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:20:36.217727    1809 reconciler.go:181] operationExecutor.UnmountVolume started for volume "elastic-internal-secure-settings" (UniqueName: "kubernetes.io/secret/b4d136c9-32d9-11ea-9732-42010a840129-elastic-internal-secure-settings") pod "b4d136c9-32d9-11ea-9732-42010a840129" (UID: "b4d136c9-32d9-11ea-9732-42010a840129")
Jan 09 12:20:31 gke-michael-dev-e2e-default-pool-80be6640-nvxl google-accounts[591]: INFO Removing user gke-8b13d089812e7a94b51b.
Jan 09 12:18:38 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:18:38.962669    1809 log.go:172] http: superfluous response.WriteHeader call from k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/httplog.(*respLogger).WriteHeader (httplog.go:184)
Jan 09 12:18:38 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: E0109 12:18:38.958878    1809 remote_runtime.go:282] ContainerStatus "84b15905182b65e9408674feca483eb04ec9c47f070fa6e75ba7b4a26e7978e6" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 84b15905182b65e9408674feca483eb04ec9c47f070fa6e75ba7b4a26e7978e6
Jan 09 12:18:36 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:18:36.191672    1809 kubelet_pods.go:1073] Killing unwanted pod "test-es-keystore-lmh4-es-masterdata-0"
Jan 09 12:18:36 gke-michael-dev-e2e-default-pool-80be6640-nvxl kubelet[1809]: I0109 12:18:36.052573    1809 kubelet.go:1934] SyncLoop (DELETE, "api"): "test-es-keystore-lmh4-es-masterdata-0_e2e-gj4vg-mercury(b4d136c9-32d9-11ea-9732-42010a840129)"


sebgl commented Jan 10, 2020

#2388 should fix the test flakiness.
I mentioned the issue in #2270 to make sure we think about monitoring upgrade time in E2E tests.

sebgl closed this as completed on Jan 10, 2020

sebgl commented Mar 31, 2020

Reopening. This is flaky again :(

https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-stack-versions/32/testReport/github/com_elastic_cloud-on-k8s_test_e2e_es/Run_tests_for_different_ELK_stack_versions_in_GKE___7_6_0___TestMutationSecondMasterSetDown_ES_cluster_health_should_eventually_be_green_01/

=== RUN   TestMutationSecondMasterSetDown/ES_cluster_health_should_eventually_be_green#01
Retries (5m0s timeout): ....................................................................................................
{"log.level":"error","@timestamp":"2020-03-31T02:13:11.533Z","message":"stopping early","service.version":"0.0.0-00000000","service.type":"eck","ecs.version":"1.4.0","error":"test failure","error.stack_trace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.StepList.RunSequential\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:42\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.RunMutations\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/run_mutation.go:37\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.RunMutation\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/run_mutation.go:78\ngithub.com/elastic/cloud-on-k8s/test/e2e/es.RunESMutation\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/mutation_test.go:299\ngithub.com/elastic/cloud-on-k8s/test/e2e/es.TestMutationSecondMasterSetDown\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/es/mutation_test.go:168\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:909"}
    --- FAIL: TestMutationSecondMasterSetDown/ES_cluster_health_should_eventually_be_green#01 (300.00s)
        utils.go:84: 
            	Error Trace:	utils.go:84
            	Error:      	Received unexpected error:
            	            	health is red
            	Test:       	TestMutationSecondMasterSetDown/ES_cluster_health_should_eventually_be_green#01

sebgl reopened this on Mar 31, 2020
sebgl self-assigned this on Apr 6, 2020

sebgl commented Apr 6, 2020

I'm looking at https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-stack-versions/38/.

In the tests, we remove one of the two referenced secure settings secrets, then expect a rolling upgrade to happen.

  • the rolling upgrade starts at 01:26:46 with the deletion of the 3rd node (test-es-keystore-mkv6-es-masterdata-2) - based on operator logs
  • the Pod gets recreated at 01:27:19 - based on k8s events
  • this Pod gets ready at 01:28:10 - based on its status
  • the ES node is still not back in the cluster at 01:41:48 - the operator logs Some upgraded nodes are not back in the cluster yet, keeping shard allocations disabled, which results from a call to the nodes API (see the sketch after this list)
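
The "not back in the cluster" check mentioned in the last bullet boils down to listing the nodes Elasticsearch currently knows about via the _nodes API and looking for the upgraded Pod's name. A rough sketch of that call; the service URL and credentials are illustrative:

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
)

// nodesResponse captures the part of the _nodes API response we care about:
// the name of every node currently part of the cluster.
type nodesResponse struct {
	Nodes map[string]struct {
		Name string `json:"name"`
	} `json:"nodes"`
}

func main() {
	client := &http.Client{Transport: &http.Transport{
		// Self-signed certificates in the test cluster; the operator trusts its own CA instead.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	// Illustrative internal service URL and credentials.
	req, err := http.NewRequest("GET", "https://test-es-keystore-mkv6-es-http:9200/_nodes", nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("elastic", "changeme")
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var nr nodesResponse
	if err := json.NewDecoder(resp.Body).Decode(&nr); err != nil {
		panic(err)
	}
	inCluster := map[string]bool{}
	for _, n := range nr.Nodes {
		inCluster[n.Name] = true
	}
	// The upgraded Pod should appear here once it has rejoined the cluster.
	fmt.Println("masterdata-2 back in cluster:", inCluster["test-es-keystore-mkv6-es-masterdata-2"])
}
```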

Elasticsearch logs report a problem with TLS certificates, preventing the node from joining the cluster:

[2020-04-06T01:27:41,409][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [test-es-keystore-mkv6-es-masterdata-2] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.103.34.20:9300, remoteAddress=/10.103.32.34:55346}
[2020-04-06T01:27:41,485][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [test-es-keystore-mkv6-es-masterdata-2] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.103.34.20:9300, remoteAddress=/10.103.32.34:55348}
[2020-04-06T01:27:41,489][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [test-es-keystore-mkv6-es-masterdata-2] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.103.34.20:9300, remoteAddress=/10.103.32.34:55344}
[2020-04-06T01:27:41,612][WARN ][o.e.x.c.s.t.n.SecurityNetty4Transport] [test-es-keystore-mkv6-es-masterdata-2] client did not trust this server's certificate, closing connection Netty4TcpChannel{localAddress=/10.103.34.20:9300, remoteAddress=/10.103.32.34:55350}
[2020-04-06T01:27:41,622][INFO ][o.e.d.z.ZenDiscovery     ] [test-es-keystore-mkv6-es-masterdata-2] failed to send join request to master [{test-es-keystore-mkv6-es-masterdata-1}{T1n_KWimSuOlGd2pTbfbWg}{ULIVyNJ6QKyJCP6zrRX66w}{10.103.32.34}{10.103.32.34:9300}{ml.machine_memory=2147483648, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[test-es-keystore-mkv6-es-masterdata-1][10.103.32.34:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[test-es-keystore-mkv6-es-masterdata-2][10.103.34.20:9300] general node connection failure]; nested: TransportException[handshake failed because connection reset]; ]

Logs of other ES nodes seem to indicate something wrong with the TLS SANs:

[2020-04-06T01:36:32,392][WARN ][o.e.t.TcpTransport       ] [test-es-keystore-mkv6-es-masterdata-1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.103.32.34:34222, remoteAddress=10.103.34.20/10.103.34.20:9300}], closing connection
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: No subject alternative names matching IP address 10.103.34.20 found
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:830) [?:?]

10.103.34.20 is the new IP of the recreated 3rd node.
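
One way to confirm this by hand is to check whether the transport certificate presented by the new node actually lists its new IP among the subject alternative names. A small sketch using crypto/x509; the certificate file path is a placeholder:

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"net"
	"os"
)

func main() {
	// Placeholder path, e.g. a transport certificate extracted from the node's secret.
	pemBytes, err := os.ReadFile("transport.crt")
	if err != nil {
		panic(err)
	}
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		panic("no PEM block found")
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		panic(err)
	}

	// The handshake error above means this IP is missing from the IP SANs.
	want := net.ParseIP("10.103.34.20")
	found := false
	for _, ip := range cert.IPAddresses {
		if ip.Equal(want) {
			found = true
			break
		}
	}
	fmt.Printf("DNS SANs: %v\nIP SANs: %v\ncontains %s: %v\n", cert.DNSNames, cert.IPAddresses, want, found)
}
```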


sebgl commented Apr 6, 2020

Other E2E tests fail for the same reason; I think all rolling upgrades are impacted. I'm opening a dedicated issue: #2823.


sebgl commented Apr 8, 2020

#2831 should fix this.
Will reopen if needed.

sebgl closed this as completed on Apr 8, 2020